From patchwork Mon Mar 13 19:12:47 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: =?utf-8?q?Heiko_St=C3=BCbner?= <heiko@sntech.de>
X-Patchwork-Id: 69069
Return-Path: <linux-kernel-owner@vger.kernel.org>
Delivered-To: ouuuleilei@gmail.com
Received: by 2002:a5d:5915:0:0:0:0:0 with SMTP id v21csp1369268wrd;
        Mon, 13 Mar 2023 12:36:16 -0700 (PDT)
X-Google-Smtp-Source: 
 AK7set/Uset2Yf/aIh2oxIa46d4tXODo79IxOxPKkOTt7fozeK3ynNYf2q/Ifv/U8yEfX98xH5zA
X-Received: by 2002:a17:90b:1e05:b0:237:7149:b333 with SMTP id
 pg5-20020a17090b1e0500b002377149b333mr36702344pjb.43.1678736176101;
        Mon, 13 Mar 2023 12:36:16 -0700 (PDT)
ARC-Seal: i=1; a=rsa-sha256; t=1678736176; cv=none;
        d=google.com; s=arc-20160816;
        b=F9nZscs42XwyE9p8EnVCkxyAQrQIR0CQ5iqMVJNeU+LH6n4euMD+xEwMYBg/e+2xl4
         YnUfL6TfW11OWUqVVNstVibG6sM5MRWE0yaWM5mDcL6SmaxX3gablS8FgEWydwJuQ4as
         qdSxRho4fOVesFUohxhIq1BNY/vsPI00MA9ccZi2vMw3h3f3pqCYSA3kDvEkGFVZxc3B
         KArpJHdqjsXLOggT+wxsh4+mc/wd7tUIy4TvI952AqVa28jUHI9ApnkzqaGfhX0ZA5pk
         a9JLhYQahvNXH5t2dPkBLeMj1RStweOE954uTP0tvuQRVCN6uP6Mz7JPhW+buc0kA3Pf
         ksgg==
ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com;
 s=arc-20160816;
        h=list-id:precedence:content-transfer-encoding:mime-version
         :references:in-reply-to:message-id:date:subject:cc:to:from;
        bh=Q+cO1kxqWnTSmcL3EKqhWERKSfiTWJeaLTkYEiu9fiw=;
        b=pUsKzcDouujRWNSUoqcQH+jjVAC0hBKidDvCvtktItpsM1lmOveE0QMw0bHeX1bZ+d
         wt9Ht9Uv4yFWyzA4QshizA2dY+S3GrgNW1Vpv0KIXs3iQ7KwVWVqTbojt6mNi3z9p6NX
         hCCPHAAxveMVlRNJdO/NN4v4WBN7y0bRdTmwYrgeXmSinC5MVKbZ3r7JZyDRpmqd6H0M
         hN3M2d/CsT3UqGtm1EW17rjdaIE89FZD/Lj0sYY700RlGIgJ2Ou+nKg0Yg/rY3lPiJ/D
         uYNvrWULtzFyigEjYj06P4nBxHjKs6/oIlJPghrX3x64OQLptCfAVbXCwiofjlZKVzWV
         QmGA==
ARC-Authentication-Results: i=1; mx.google.com;
       spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org
 designates 2620:137:e000::1:20 as permitted sender)
 smtp.mailfrom=linux-kernel-owner@vger.kernel.org;
       dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=sntech.de
Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20])
        by mx.google.com with ESMTP id
 u3-20020a17090a890300b00234bc771801si461719pjn.57.2023.03.13.12.36.03;
        Mon, 13 Mar 2023 12:36:16 -0700 (PDT)
Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org
 designates 2620:137:e000::1:20 as permitted sender)
 client-ip=2620:137:e000::1:20;
Authentication-Results: mx.google.com;
       spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org
 designates 2620:137:e000::1:20 as permitted sender)
 smtp.mailfrom=linux-kernel-owner@vger.kernel.org;
       dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=sntech.de
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S230201AbjCMTNP (ORCPT <rfc822;realc9580@gmail.com> + 99 others);
        Mon, 13 Mar 2023 15:13:15 -0400
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:35522 "EHLO
        lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S229845AbjCMTNM (ORCPT
        <rfc822;linux-kernel@vger.kernel.org>);
        Mon, 13 Mar 2023 15:13:12 -0400
Received: from gloria.sntech.de (gloria.sntech.de [185.11.138.130])
        by lindbergh.monkeyblade.net (Postfix) with ESMTPS id A84DCA255
        for <linux-kernel@vger.kernel.org>;
 Mon, 13 Mar 2023 12:13:07 -0700 (PDT)
Received: from ip4d1634a9.dynamic.kabel-deutschland.de ([77.22.52.169]
 helo=phil.lan)
        by gloria.sntech.de with esmtpsa  (TLS1.3) tls
 TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384
        (Exim 4.94.2)
        (envelope-from <heiko@sntech.de>)
        id 1pbnbT-00028k-V2; Mon, 13 Mar 2023 20:13:04 +0100
From: Heiko Stuebner <heiko@sntech.de>
To: palmer@rivosinc.com
Cc: greentime.hu@sifive.com, conor@kernel.org,
        linux-kernel@vger.kernel.org, linux-riscv@lists.infradead.org,
        christoph.muellner@vrull.eu, heiko@sntech.de
Subject: [PATCH RFC v3 01/16] riscv: Add support for kernel mode vector
Date: Mon, 13 Mar 2023 20:12:47 +0100
Message-Id: <20230313191302.580787-2-heiko.stuebner@vrull.eu>
X-Mailer: git-send-email 2.39.0
In-Reply-To: <20230313191302.580787-1-heiko.stuebner@vrull.eu>
References: <20230313191302.580787-1-heiko.stuebner@vrull.eu>
MIME-Version: 1.0
X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00,SPF_PASS,
        T_SPF_HELO_TEMPERROR autolearn=ham autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on
        lindbergh.monkeyblade.net
Precedence: bulk
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org
X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?=
X-GMAIL-THRID: =?utf-8?q?1760282464524397023?=
X-GMAIL-MSGID: =?utf-8?q?1760282464524397023?=

From: Greentime Hu <greentime.hu@sifive.com>

Add kernel_rvv_begin() and kernel_rvv_end() function declarations
and corresponding definitions in kernel_mode_vector.c

These are needed to wrap uses of vector in kernel mode.

Co-developed-by: Vincent Chen <vincent.chen@sifive.com>
Signed-off-by: Vincent Chen <vincent.chen@sifive.com>
Signed-off-by: Greentime Hu <greentime.hu@sifive.com>
Signed-off-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
---
 arch/riscv/include/asm/vector.h        |  14 +++
 arch/riscv/kernel/Makefile             |   1 +
 arch/riscv/kernel/kernel_mode_vector.c | 132 +++++++++++++++++++++++++
 3 files changed, 147 insertions(+)
 create mode 100644 arch/riscv/kernel/kernel_mode_vector.c

diff --git a/arch/riscv/include/asm/vector.h b/arch/riscv/include/asm/vector.h
index 9aeab4074ca8..202df9ea28d7 100644
--- a/arch/riscv/include/asm/vector.h
+++ b/arch/riscv/include/asm/vector.h
@@ -147,6 +147,20 @@ static inline void __switch_to_vector(struct task_struct *prev,
 	riscv_v_vstate_restore(next, task_pt_regs(next));
 }
 
+static inline void riscv_v_flush_cpu_state(void)
+{
+	asm volatile (
+		"vsetvli	t0, x0, e8, m8, ta, ma\n\t"
+		"vmv.v.i	v0, 0\n\t"
+		"vmv.v.i	v8, 0\n\t"
+		"vmv.v.i	v16, 0\n\t"
+		"vmv.v.i	v24, 0\n\t"
+		: : : "t0");
+}
+
+void kernel_rvv_begin(void);
+void kernel_rvv_end(void);
+
 #else /* ! CONFIG_RISCV_ISA_V  */
 
 struct pt_regs;
diff --git a/arch/riscv/kernel/Makefile b/arch/riscv/kernel/Makefile
index 48d345a5f326..304c500cc1f7 100644
--- a/arch/riscv/kernel/Makefile
+++ b/arch/riscv/kernel/Makefile
@@ -56,6 +56,7 @@ obj-$(CONFIG_MMU) += vdso.o vdso/
 obj-$(CONFIG_RISCV_M_MODE)	+= traps_misaligned.o
 obj-$(CONFIG_FPU)		+= fpu.o
 obj-$(CONFIG_RISCV_ISA_V)	+= vector.o
+obj-$(CONFIG_RISCV_ISA_V)	+= kernel_mode_vector.o
 obj-$(CONFIG_SMP)		+= smpboot.o
 obj-$(CONFIG_SMP)		+= smp.o
 obj-$(CONFIG_SMP)		+= cpu_ops.o
diff --git a/arch/riscv/kernel/kernel_mode_vector.c b/arch/riscv/kernel/kernel_mode_vector.c
new file mode 100644
index 000000000000..2d704190c054
--- /dev/null
+++ b/arch/riscv/kernel/kernel_mode_vector.c
@@ -0,0 +1,132 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * Copyright (C) 2012 ARM Ltd.
+ * Author: Catalin Marinas <catalin.marinas@arm.com>
+ * Copyright (C) 2017 Linaro Ltd. <ard.biesheuvel@linaro.org>
+ * Copyright (C) 2021 SiFive
+ */
+#include <linux/compiler.h>
+#include <linux/irqflags.h>
+#include <linux/percpu.h>
+#include <linux/preempt.h>
+#include <linux/types.h>
+
+#include <asm/vector.h>
+#include <asm/switch_to.h>
+
+DECLARE_PER_CPU(bool, vector_context_busy);
+DEFINE_PER_CPU(bool, vector_context_busy);
+
+/*
+ * may_use_vector - whether it is allowable at this time to issue vector
+ *                instructions or access the vector register file
+ *
+ * Callers must not assume that the result remains true beyond the next
+ * preempt_enable() or return from softirq context.
+ */
+static __must_check inline bool may_use_vector(void)
+{
+	/*
+	 * vector_context_busy is only set while preemption is disabled,
+	 * and is clear whenever preemption is enabled. Since
+	 * this_cpu_read() is atomic w.r.t. preemption, vector_context_busy
+	 * cannot change under our feet -- if it's set we cannot be
+	 * migrated, and if it's clear we cannot be migrated to a CPU
+	 * where it is set.
+	 */
+	return !in_irq() && !irqs_disabled() && !in_nmi() &&
+	       !this_cpu_read(vector_context_busy);
+}
+
+/*
+ * Claim ownership of the CPU vector context for use by the calling context.
+ *
+ * The caller may freely manipulate the vector context metadata until
+ * put_cpu_vector_context() is called.
+ */
+static void get_cpu_vector_context(void)
+{
+	bool busy;
+
+	preempt_disable();
+	busy = __this_cpu_xchg(vector_context_busy, true);
+
+	WARN_ON(busy);
+}
+
+/*
+ * Release the CPU vector context.
+ *
+ * Must be called from a context in which get_cpu_vector_context() was
+ * previously called, with no call to put_cpu_vector_context() in the
+ * meantime.
+ */
+static void put_cpu_vector_context(void)
+{
+	bool busy = __this_cpu_xchg(vector_context_busy, false);
+
+	WARN_ON(!busy);
+	preempt_enable();
+}
+
+/*
+ * kernel_rvv_begin(): obtain the CPU vector registers for use by the calling
+ * context
+ *
+ * Must not be called unless may_use_vector() returns true.
+ * Task context in the vector registers is saved back to memory as necessary.
+ *
+ * A matching call to kernel_rvv_end() must be made before returning from the
+ * calling context.
+ *
+ * The caller may freely use the vector registers until kernel_rvv_end() is
+ * called.
+ */
+void kernel_rvv_begin(void)
+{
+	if (WARN_ON(!has_vector()))
+		return;
+
+	WARN_ON(!may_use_vector());
+
+	/* Acquire kernel mode vector */
+	get_cpu_vector_context();
+
+	/* Save vector state, if any */
+	riscv_v_vstate_save(current, task_pt_regs(current));
+
+	/* Enable vector */
+	riscv_v_enable();
+
+	/* Invalidate vector regs */
+	riscv_v_flush_cpu_state();
+}
+EXPORT_SYMBOL_GPL(kernel_rvv_begin);
+
+/*
+ * kernel_rvv_end(): give the CPU vector registers back to the current task
+ *
+ * Must be called from a context in which kernel_rvv_begin() was previously
+ * called, with no call to kernel_rvv_end() in the meantime.
+ *
+ * The caller must not use the vector registers after this function is called,
+ * unless kernel_rvv_begin() is called again in the meantime.
+ */
+void kernel_rvv_end(void)
+{
+	if (WARN_ON(!has_vector()))
+		return;
+
+	/* Invalidate vector regs */
+	riscv_v_flush_cpu_state();
+
+	/* Restore vector state, if any */
+	riscv_v_vstate_restore(current, task_pt_regs(current));
+
+	/* disable vector */
+	riscv_v_disable();
+
+	/* release kernel mode vector */
+	put_cpu_vector_context();
+}
+EXPORT_SYMBOL_GPL(kernel_rvv_end);

From patchwork Mon Mar 13 19:12:48 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: =?utf-8?q?Heiko_St=C3=BCbner?= <heiko@sntech.de>
X-Patchwork-Id: 69068
Return-Path: <linux-kernel-owner@vger.kernel.org>
Delivered-To: ouuuleilei@gmail.com
Received: by 2002:a5d:5915:0:0:0:0:0 with SMTP id v21csp1367334wrd;
        Mon, 13 Mar 2023 12:31:43 -0700 (PDT)
X-Google-Smtp-Source: 
 AK7set/M0z87LZsN+q4stuAh6NHxMtTX8DCIKf6ci+Aj1vLa9Yit4q3SG8RmQnyuP/IHb1z931u1
X-Received: by 2002:a17:90b:38c3:b0:236:704d:ab8c with SMTP id
 nn3-20020a17090b38c300b00236704dab8cmr38035369pjb.26.1678735903578;
        Mon, 13 Mar 2023 12:31:43 -0700 (PDT)
ARC-Seal: i=1; a=rsa-sha256; t=1678735903; cv=none;
        d=google.com; s=arc-20160816;
        b=tr9lgZB3Oy+JTgcSXBqH8bPT3/LsBIlugZVzFdFywvHrPDLA1VcdbWYg1ntaQBdSJO
         oBD8rxPQ9Ys8R8ki4bcUwJDSNbO+lapAHOp7miykcvIxs7JlZh2NUM36P5uYOwiGaOTf
         JioAhEJ1FwqZUho1aznYo4OWz3+UMaAQkJTbgVJtcVVExToFC0mVx4lIgWSSGCb/RXPn
         pwUXW8znzQ56TGRKrLX0q9H6B/PGCHeNgI9NFBFV8fxz3tTq4xY5IlKOOYllpCyHoPsL
         MwnfaYe5UIOoQLPk3IA0C9ZnbXihfJuAUxGA9oLjDWvcWZODfeH+rAf5b9w/thIJ8MPL
         XizA==
ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com;
 s=arc-20160816;
        h=list-id:precedence:content-transfer-encoding:mime-version
         :references:in-reply-to:message-id:date:subject:cc:to:from;
        bh=Zu1irl9P9YB5ITwf6EXO2Umq9HhJCyQ+UbXvIt20cPQ=;
        b=bf/0zmrnCI1/ryQ/bLgtkf2dD7nmkKJbKtFLxcsbh958Um+xzTMfe8AXxIIVDf3cOf
         xOqPI4mLreewxQtZc95hCx7UZNpUHA3py+6KkniZWM2huXoGYSHha5f3is2xJPfdkBrE
         nXQ5opyr/FgYkvAp0wSincxoME+qzfz5zJKVznLRuCajMgZI7sCocnHNBMEc0WHXqDFH
         4m4HpX3aZw0RufAddytOs9WD4AAGsL+xA/OV0OB0F6BeGfj62ntEZnhC3FE+2uvQ81dV
         Z9os433ylQiTI/PHeq66Rsi8PhgUc7Z8XVRktw+cY4qZSeDDAwMyhejN8tPXfhOKYpZf
         Tjrg==
ARC-Authentication-Results: i=1; mx.google.com;
       spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org
 designates 2620:137:e000::1:20 as permitted sender)
 smtp.mailfrom=linux-kernel-owner@vger.kernel.org;
       dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=sntech.de
Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20])
        by mx.google.com with ESMTP id
 z3-20020a17090abd8300b00234bfb1c4c6si453103pjr.58.2023.03.13.12.31.27;
        Mon, 13 Mar 2023 12:31:43 -0700 (PDT)
Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org
 designates 2620:137:e000::1:20 as permitted sender)
 client-ip=2620:137:e000::1:20;
Authentication-Results: mx.google.com;
       spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org
 designates 2620:137:e000::1:20 as permitted sender)
 smtp.mailfrom=linux-kernel-owner@vger.kernel.org;
       dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=sntech.de
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S231166AbjCMTNf (ORCPT <rfc822;realc9580@gmail.com> + 99 others);
        Mon, 13 Mar 2023 15:13:35 -0400
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:35642 "EHLO
        lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S229977AbjCMTNO (ORCPT
        <rfc822;linux-kernel@vger.kernel.org>);
        Mon, 13 Mar 2023 15:13:14 -0400
Received: from gloria.sntech.de (gloria.sntech.de [185.11.138.130])
        by lindbergh.monkeyblade.net (Postfix) with ESMTPS id A9063B762
        for <linux-kernel@vger.kernel.org>;
 Mon, 13 Mar 2023 12:13:07 -0700 (PDT)
Received: from ip4d1634a9.dynamic.kabel-deutschland.de ([77.22.52.169]
 helo=phil.lan)
        by gloria.sntech.de with esmtpsa  (TLS1.3) tls
 TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384
        (Exim 4.94.2)
        (envelope-from <heiko@sntech.de>)
        id 1pbnbU-00028k-6z; Mon, 13 Mar 2023 20:13:04 +0100
From: Heiko Stuebner <heiko@sntech.de>
To: palmer@rivosinc.com
Cc: greentime.hu@sifive.com, conor@kernel.org,
        linux-kernel@vger.kernel.org, linux-riscv@lists.infradead.org,
        christoph.muellner@vrull.eu, heiko@sntech.de
Subject: [PATCH RFC v3 02/16] riscv: Add vector extension XOR implementation
Date: Mon, 13 Mar 2023 20:12:48 +0100
Message-Id: <20230313191302.580787-3-heiko.stuebner@vrull.eu>
X-Mailer: git-send-email 2.39.0
In-Reply-To: <20230313191302.580787-1-heiko.stuebner@vrull.eu>
References: <20230313191302.580787-1-heiko.stuebner@vrull.eu>
MIME-Version: 1.0
X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00,SPF_PASS,
        T_SPF_HELO_TEMPERROR autolearn=ham autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on
        lindbergh.monkeyblade.net
Precedence: bulk
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org
X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?=
X-GMAIL-THRID: =?utf-8?q?1760282178725440852?=
X-GMAIL-MSGID: =?utf-8?q?1760282178725440852?=

From: Greentime Hu <greentime.hu@sifive.com>

This patch adds support for vector optimized XOR and it is tested in
qemu.

Co-developed-by: Han-Kuan Chen <hankuan.chen@sifive.com>
Signed-off-by: Han-Kuan Chen <hankuan.chen@sifive.com>
Signed-off-by: Greentime Hu <greentime.hu@sifive.com>
Signed-off-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
---
 arch/riscv/include/asm/xor.h | 82 ++++++++++++++++++++++++++++++++++++
 arch/riscv/lib/Makefile      |  1 +
 arch/riscv/lib/xor.S         | 81 +++++++++++++++++++++++++++++++++++
 3 files changed, 164 insertions(+)
 create mode 100644 arch/riscv/include/asm/xor.h
 create mode 100644 arch/riscv/lib/xor.S

diff --git a/arch/riscv/include/asm/xor.h b/arch/riscv/include/asm/xor.h
new file mode 100644
index 000000000000..74867c7fd955
--- /dev/null
+++ b/arch/riscv/include/asm/xor.h
@@ -0,0 +1,82 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/*
+ * Copyright (C) 2021 SiFive
+ */
+
+#include <linux/hardirq.h>
+#include <asm-generic/xor.h>
+#ifdef CONFIG_VECTOR
+#include <asm/vector.h>
+#include <asm/switch_to.h>
+
+void xor_regs_2_(unsigned long bytes, unsigned long *__restrict p1,
+		 const unsigned long *__restrict p2);
+void xor_regs_3_(unsigned long bytes, unsigned long *__restrict p1,
+		 const unsigned long *__restrict p2,
+		 const unsigned long *__restrict p3);
+void xor_regs_4_(unsigned long bytes, unsigned long *__restrict p1,
+		 const unsigned long *__restrict p2,
+		 const unsigned long *__restrict p3,
+		 const unsigned long *__restrict p4);
+void xor_regs_5_(unsigned long bytes, unsigned long *__restrict p1,
+		 const unsigned long *__restrict p2,
+		 const unsigned long *__restrict p3,
+		 const unsigned long *__restrict p4,
+		 const unsigned long *__restrict p5);
+
+static void xor_rvv_2(unsigned long bytes, unsigned long *__restrict p1,
+		      const unsigned long *__restrict p2)
+{
+	kernel_rvv_begin();
+	xor_regs_2_(bytes, p1, p2);
+	kernel_rvv_end();
+}
+
+static void xor_rvv_3(unsigned long bytes, unsigned long *__restrict p1,
+		      const unsigned long *__restrict p2,
+		      const unsigned long *__restrict p3)
+{
+	kernel_rvv_begin();
+	xor_regs_3_(bytes, p1, p2, p3);
+	kernel_rvv_end();
+}
+
+static void xor_rvv_4(unsigned long bytes, unsigned long *__restrict p1,
+		      const unsigned long *__restrict p2,
+		      const unsigned long *__restrict p3,
+		      const unsigned long *__restrict p4)
+{
+	kernel_rvv_begin();
+	xor_regs_4_(bytes, p1, p2, p3, p4);
+	kernel_rvv_end();
+}
+
+static void xor_rvv_5(unsigned long bytes, unsigned long *__restrict p1,
+		      const unsigned long *__restrict p2,
+		      const unsigned long *__restrict p3,
+		      const unsigned long *__restrict p4,
+		      const unsigned long *__restrict p5)
+{
+	kernel_rvv_begin();
+	xor_regs_5_(bytes, p1, p2, p3, p4, p5);
+	kernel_rvv_end();
+}
+
+static struct xor_block_template xor_block_rvv = {
+	.name = "rvv",
+	.do_2 = xor_rvv_2,
+	.do_3 = xor_rvv_3,
+	.do_4 = xor_rvv_4,
+	.do_5 = xor_rvv_5
+};
+
+#undef XOR_TRY_TEMPLATES
+#define XOR_TRY_TEMPLATES           \
+	do {        \
+		xor_speed(&xor_block_8regs);    \
+		xor_speed(&xor_block_32regs);    \
+		if (has_vector()) { \
+			xor_speed(&xor_block_rvv);\
+		} \
+	} while (0)
+#endif
diff --git a/arch/riscv/lib/Makefile b/arch/riscv/lib/Makefile
index 6c74b0bedd60..d87e0b6fe1d0 100644
--- a/arch/riscv/lib/Makefile
+++ b/arch/riscv/lib/Makefile
@@ -10,3 +10,4 @@ lib-$(CONFIG_MMU)	+= uaccess.o
 lib-$(CONFIG_64BIT)	+= tishift.o
 
 obj-$(CONFIG_FUNCTION_ERROR_INJECTION) += error-inject.o
+lib-$(CONFIG_VECTOR)	+= xor.o
diff --git a/arch/riscv/lib/xor.S b/arch/riscv/lib/xor.S
new file mode 100644
index 000000000000..3bc059e18171
--- /dev/null
+++ b/arch/riscv/lib/xor.S
@@ -0,0 +1,81 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/*
+ * Copyright (C) 2021 SiFive
+ */
+#include <linux/linkage.h>
+#include <asm-generic/export.h>
+#include <asm/asm.h>
+
+ENTRY(xor_regs_2_)
+	vsetvli a3, a0, e8, m8, ta, ma
+	vle8.v v0, (a1)
+	vle8.v v8, (a2)
+	sub a0, a0, a3
+	vxor.vv v16, v0, v8
+	add a2, a2, a3
+	vse8.v v16, (a1)
+	add a1, a1, a3
+	bnez a0, xor_regs_2_
+	ret
+END(xor_regs_2_)
+EXPORT_SYMBOL(xor_regs_2_)
+
+ENTRY(xor_regs_3_)
+	vsetvli a4, a0, e8, m8, ta, ma
+	vle8.v v0, (a1)
+	vle8.v v8, (a2)
+	sub a0, a0, a4
+	vxor.vv v0, v0, v8
+	vle8.v v16, (a3)
+	add a2, a2, a4
+	vxor.vv v16, v0, v16
+	add a3, a3, a4
+	vse8.v v16, (a1)
+	add a1, a1, a4
+	bnez a0, xor_regs_3_
+	ret
+END(xor_regs_3_)
+EXPORT_SYMBOL(xor_regs_3_)
+
+ENTRY(xor_regs_4_)
+	vsetvli a5, a0, e8, m8, ta, ma
+	vle8.v v0, (a1)
+	vle8.v v8, (a2)
+	sub a0, a0, a5
+	vxor.vv v0, v0, v8
+	vle8.v v16, (a3)
+	add a2, a2, a5
+	vxor.vv v0, v0, v16
+	vle8.v v24, (a4)
+	add a3, a3, a5
+	vxor.vv v16, v0, v24
+	add a4, a4, a5
+	vse8.v v16, (a1)
+	add a1, a1, a5
+	bnez a0, xor_regs_4_
+	ret
+END(xor_regs_4_)
+EXPORT_SYMBOL(xor_regs_4_)
+
+ENTRY(xor_regs_5_)
+	vsetvli a6, a0, e8, m8, ta, ma
+	vle8.v v0, (a1)
+	vle8.v v8, (a2)
+	sub a0, a0, a6
+	vxor.vv v0, v0, v8
+	vle8.v v16, (a3)
+	add a2, a2, a6
+	vxor.vv v0, v0, v16
+	vle8.v v24, (a4)
+	add a3, a3, a6
+	vxor.vv v0, v0, v24
+	vle8.v v8, (a5)
+	add a4, a4, a6
+	vxor.vv v16, v0, v8
+	add a5, a5, a6
+	vse8.v v16, (a1)
+	add a1, a1, a6
+	bnez a0, xor_regs_5_
+	ret
+END(xor_regs_5_)
+EXPORT_SYMBOL(xor_regs_5_)

From patchwork Mon Mar 13 19:12:49 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: =?utf-8?q?Heiko_St=C3=BCbner?= <heiko@sntech.de>
X-Patchwork-Id: 69072
Return-Path: <linux-kernel-owner@vger.kernel.org>
Delivered-To: ouuuleilei@gmail.com
Received: by 2002:a5d:5915:0:0:0:0:0 with SMTP id v21csp1372130wrd;
        Mon, 13 Mar 2023 12:44:10 -0700 (PDT)
X-Google-Smtp-Source: 
 AK7set+MjlSgyVt04+nYI7rO/bAGiOJT8/J7obkxjj3UExtBAtQmKL8Cy4zEW3ZHBhAO5dtx73N+
X-Received: by 2002:a05:6a20:63a1:b0:c7:4bf5:fa0a with SMTP id
 m33-20020a056a2063a100b000c74bf5fa0amr28941066pzg.48.1678736650495;
        Mon, 13 Mar 2023 12:44:10 -0700 (PDT)
ARC-Seal: i=1; a=rsa-sha256; t=1678736650; cv=none;
        d=google.com; s=arc-20160816;
        b=IubsiT6ZMYvYAdp/4KlNqSj56TeOJswY5f1om+pN60kCNL3Hk2KTzxH1Pk1GeHFSYC
         2WBPSpz06f83wGadnMmUViigGGz21FFgrnESBsfaOlmhTp75Y3ljZugJJEFnTpbqZkm2
         EfEqGXFIPHQzKVTOx8jahbQeI8fuDi7s3AKM0jBZklV+OeqWr0y1Qw2PrDQW4xp/FmGw
         Q2w5cCRVulhadEObEkpJQe8vP0x/7RTni+FfaZC8xuMsjfFnIkojQh2HrUV74X7RxgQd
         4QjVRtGapppO/9vILKqR7B4/mlhS5oPsN7u29wqJHUfX/NsQ+KMA98SKaO7UtlYuVKU+
         ILSA==
ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com;
 s=arc-20160816;
        h=list-id:precedence:content-transfer-encoding:mime-version
         :references:in-reply-to:message-id:date:subject:cc:to:from;
        bh=WmnFDaMdtJq1tUGJsS3Yk7fBodS3EDm6zwlJrFpC2CQ=;
        b=V15CNyzYtpsHVyDqoUGEY2+YtKEMYR0PlKwgUGiT2R5X7pI8ARC7fUM85YRT2D0kaS
         LqoH9fgamren1/tpklYv7nqPQqh2CqDcRaMqwMWX8WiQtibBmJE/7vk417A29GNYcryy
         TF7rkMoF/fsT0/eKk17RCfHHIMg+mfuISheR6LvQEsaJAlk4bYObQv6VrVP6BVFxeyAj
         dvQ7WoxtCUYNcfg8BysINDtEnQdyrbjbFXK48i2SLg7uLwU/ToCXYzpXwAcrtk4nwKYS
         loEEy74fUSuL6dQAAFZawVxDXZDF9WqO4nP87Je+lANH9hxdsqr6MzfAyNQZqiJUY0RA
         oArQ==
ARC-Authentication-Results: i=1; mx.google.com;
       spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org
 designates 2620:137:e000::1:20 as permitted sender)
 smtp.mailfrom=linux-kernel-owner@vger.kernel.org;
       dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=sntech.de
Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20])
        by mx.google.com with ESMTP id
 q20-20020a635054000000b00502f9a7edb4si259885pgl.340.2023.03.13.12.43.56;
        Mon, 13 Mar 2023 12:44:10 -0700 (PDT)
Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org
 designates 2620:137:e000::1:20 as permitted sender)
 client-ip=2620:137:e000::1:20;
Authentication-Results: mx.google.com;
       spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org
 designates 2620:137:e000::1:20 as permitted sender)
 smtp.mailfrom=linux-kernel-owner@vger.kernel.org;
       dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=sntech.de
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S229813AbjCMTNS (ORCPT <rfc822;realc9580@gmail.com> + 99 others);
        Mon, 13 Mar 2023 15:13:18 -0400
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:35608 "EHLO
        lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S230071AbjCMTNN (ORCPT
        <rfc822;linux-kernel@vger.kernel.org>);
        Mon, 13 Mar 2023 15:13:13 -0400
Received: from gloria.sntech.de (gloria.sntech.de [185.11.138.130])
        by lindbergh.monkeyblade.net (Postfix) with ESMTPS id A86E4A267
        for <linux-kernel@vger.kernel.org>;
 Mon, 13 Mar 2023 12:13:07 -0700 (PDT)
Received: from ip4d1634a9.dynamic.kabel-deutschland.de ([77.22.52.169]
 helo=phil.lan)
        by gloria.sntech.de with esmtpsa  (TLS1.3) tls
 TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384
        (Exim 4.94.2)
        (envelope-from <heiko@sntech.de>)
        id 1pbnbU-00028k-F7; Mon, 13 Mar 2023 20:13:04 +0100
From: Heiko Stuebner <heiko@sntech.de>
To: palmer@rivosinc.com
Cc: greentime.hu@sifive.com, conor@kernel.org,
        linux-kernel@vger.kernel.org, linux-riscv@lists.infradead.org,
        christoph.muellner@vrull.eu, heiko@sntech.de
Subject: [PATCH RFC v3 03/16] RISC-V: add Zbc extension detection
Date: Mon, 13 Mar 2023 20:12:49 +0100
Message-Id: <20230313191302.580787-4-heiko.stuebner@vrull.eu>
X-Mailer: git-send-email 2.39.0
In-Reply-To: <20230313191302.580787-1-heiko.stuebner@vrull.eu>
References: <20230313191302.580787-1-heiko.stuebner@vrull.eu>
MIME-Version: 1.0
X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00,SPF_PASS,
        T_SPF_HELO_TEMPERROR autolearn=ham autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on
        lindbergh.monkeyblade.net
Precedence: bulk
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org
X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?=
X-GMAIL-THRID: =?utf-8?q?1760282962297569342?=
X-GMAIL-MSGID: =?utf-8?q?1760282962297569342?=

From: Heiko Stuebner <heiko.stuebner@vrull.eu>

Add handling for Zbc extension.

Zbc provides instruction for carry-less multiplication.

Signed-off-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
---
 arch/riscv/Kconfig             | 22 ++++++++++++++++++++++
 arch/riscv/include/asm/hwcap.h |  1 +
 arch/riscv/kernel/cpu.c        |  1 +
 arch/riscv/kernel/cpufeature.c |  1 +
 4 files changed, 25 insertions(+)

diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
index eb691dd8ee4f..8d83935f77d2 100644
--- a/arch/riscv/Kconfig
+++ b/arch/riscv/Kconfig
@@ -459,6 +459,28 @@ config RISCV_ISA_ZBB
 
 	   If you don't know what to do here, say Y.
 
+config TOOLCHAIN_HAS_ZBC
+	bool
+	default y
+	depends on !64BIT || $(cc-option,-mabi=lp64 -march=rv64ima_zbc)
+	depends on !32BIT || $(cc-option,-mabi=ilp32 -march=rv32ima_zbc)
+	depends on LLD_VERSION >= 150000 || LD_VERSION >= 23900
+	depends on AS_IS_GNU
+
+config RISCV_ISA_ZBC
+	bool "Zbc extension support for bit manipulation instructions"
+	depends on TOOLCHAIN_HAS_ZBC
+	depends on !XIP_KERNEL && MMU
+	default y
+	help
+	   Adds support to dynamically detect the presence of the ZBC
+	   extension (carry-less multiplication) and enable its usage.
+
+	   The Zbc extension provides instructions clmul, clmulh and clmulr
+	   to accelerate carry-less multiplications.
+
+	   If you don't know what to do here, say Y.
+
 config RISCV_ISA_ZICBOM
 	bool "Zicbom extension support for non-coherent DMA operation"
 	depends on !XIP_KERNEL && MMU
diff --git a/arch/riscv/include/asm/hwcap.h b/arch/riscv/include/asm/hwcap.h
index ee73341bb1d4..92f234c6cb4c 100644
--- a/arch/riscv/include/asm/hwcap.h
+++ b/arch/riscv/include/asm/hwcap.h
@@ -43,6 +43,7 @@
 #define RISCV_ISA_EXT_ZBB		30
 #define RISCV_ISA_EXT_ZICBOM		31
 #define RISCV_ISA_EXT_ZIHINTPAUSE	32
+#define RISCV_ISA_EXT_ZBC		33
 
 #define RISCV_ISA_EXT_MAX		64
 #define RISCV_ISA_EXT_NAME_LEN_MAX	32
diff --git a/arch/riscv/kernel/cpu.c b/arch/riscv/kernel/cpu.c
index 8400f0cc9704..5d47a0c75c69 100644
--- a/arch/riscv/kernel/cpu.c
+++ b/arch/riscv/kernel/cpu.c
@@ -188,6 +188,7 @@ static struct riscv_isa_ext_data isa_ext_arr[] = {
 	__RISCV_ISA_EXT_DATA(zicbom, RISCV_ISA_EXT_ZICBOM),
 	__RISCV_ISA_EXT_DATA(zihintpause, RISCV_ISA_EXT_ZIHINTPAUSE),
 	__RISCV_ISA_EXT_DATA(zbb, RISCV_ISA_EXT_ZBB),
+	__RISCV_ISA_EXT_DATA(zbc, RISCV_ISA_EXT_ZBC),
 	__RISCV_ISA_EXT_DATA(sscofpmf, RISCV_ISA_EXT_SSCOFPMF),
 	__RISCV_ISA_EXT_DATA(sstc, RISCV_ISA_EXT_SSTC),
 	__RISCV_ISA_EXT_DATA(svinval, RISCV_ISA_EXT_SVINVAL),
diff --git a/arch/riscv/kernel/cpufeature.c b/arch/riscv/kernel/cpufeature.c
index e6d53e2e672b..c9099694b5bb 100644
--- a/arch/riscv/kernel/cpufeature.c
+++ b/arch/riscv/kernel/cpufeature.c
@@ -228,6 +228,7 @@ void __init riscv_fill_hwcap(void)
 				SET_ISA_EXT_MAP("svinval", RISCV_ISA_EXT_SVINVAL);
 				SET_ISA_EXT_MAP("svpbmt", RISCV_ISA_EXT_SVPBMT);
 				SET_ISA_EXT_MAP("zbb", RISCV_ISA_EXT_ZBB);
+				SET_ISA_EXT_MAP("zbc", RISCV_ISA_EXT_ZBC);
 				SET_ISA_EXT_MAP("zicbom", RISCV_ISA_EXT_ZICBOM);
 				SET_ISA_EXT_MAP("zihintpause", RISCV_ISA_EXT_ZIHINTPAUSE);
 			}

From patchwork Mon Mar 13 19:12:50 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: =?utf-8?q?Heiko_St=C3=BCbner?= <heiko@sntech.de>
X-Patchwork-Id: 69064
Return-Path: <linux-kernel-owner@vger.kernel.org>
Delivered-To: ouuuleilei@gmail.com
Received: by 2002:a5d:5915:0:0:0:0:0 with SMTP id v21csp1363472wrd;
        Mon, 13 Mar 2023 12:22:57 -0700 (PDT)
X-Google-Smtp-Source: 
 AK7set8krK+dj9Zb5jFNK7UKl6ms4xcB83mhXNEtNzafTDC86PaHcYlT0I+//0Tnu48/393d7zQE
X-Received: by 2002:a17:902:da91:b0:19c:d7a9:8bf0 with SMTP id
 j17-20020a170902da9100b0019cd7a98bf0mr12952525plx.10.1678735377604;
        Mon, 13 Mar 2023 12:22:57 -0700 (PDT)
ARC-Seal: i=1; a=rsa-sha256; t=1678735377; cv=none;
        d=google.com; s=arc-20160816;
        b=nSaA1Anqao2sOCrjz5MGYwh17K/ilhAxCMp/2svAAfLz0uGR0oDftKv2xhC+k+IeQH
         AV/4OGgrFPxH18UQ0bS7yCDHPk8sxa44GHWaSVrCXWxUTder7fGCF9Qdv6ZQYMuS3102
         tkkP+e1tmyh3+R2qRGwq9PYUA8NuTwpzrtciIEDoQb0MnKzl5Wn6elI59HLA7IDBB2rW
         qmrXMAP9eKJhin9zNJNFjtYsVfCy1Wjn155AQZpsNBxWLwRtMTdwkWAHQAa5Uwt6S/xJ
         XK2TsG05UozRAFM8Byqku8Re9MxGO2UCqmnwxzqxuGGdVYjT9tB0rC+JImcTtRKVfT/P
         xfKA==
ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com;
 s=arc-20160816;
        h=list-id:precedence:content-transfer-encoding:mime-version
         :references:in-reply-to:message-id:date:subject:cc:to:from;
        bh=KtF1lFlpDWkxFgxPqL4VlyI53O1VGo/XkpSBIOHBQX4=;
        b=jXS9AgvUrIrUBl3bnYDEt8hUDEXtMJZ0znNMv/LqmNTeKkshj4l209P3D7j/I0SP/8
         elhXU6pdZWfZBH33IDEzPFUs6hqmoJ70qGvfSUxvLDmFqIgv6LACASpwW877DaJ5600j
         LCNvwAmS7FvtofJ0hobDuL3DNGJNdnAVqCkyNmUQVpIREQ5xEojRNlIs4vf0sx4dRjWt
         +4b1mYyESJOCZo5tCegepLOHNebTL+kUq+p0i9qRIqKfiAEmwPv+eNJ+Fl5bp9tEJtcR
         56YBi9wTYZoZKqXG+FEd2pw+T6Q+Ap9A1Uo74H6v+heJ58BYdzh9Rl0+XuxCQeUQf/pz
         yGPg==
ARC-Authentication-Results: i=1; mx.google.com;
       spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org
 designates 2620:137:e000::1:20 as permitted sender)
 smtp.mailfrom=linux-kernel-owner@vger.kernel.org;
       dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=sntech.de
Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20])
        by mx.google.com with ESMTP id
 b11-20020a170902d50b00b00196711a94f6si460924plg.353.2023.03.13.12.22.43;
        Mon, 13 Mar 2023 12:22:57 -0700 (PDT)
Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org
 designates 2620:137:e000::1:20 as permitted sender)
 client-ip=2620:137:e000::1:20;
Authentication-Results: mx.google.com;
       spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org
 designates 2620:137:e000::1:20 as permitted sender)
 smtp.mailfrom=linux-kernel-owner@vger.kernel.org;
       dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=sntech.de
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S230251AbjCMTNV (ORCPT <rfc822;realc9580@gmail.com> + 99 others);
        Mon, 13 Mar 2023 15:13:21 -0400
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:35612 "EHLO
        lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S230104AbjCMTNN (ORCPT
        <rfc822;linux-kernel@vger.kernel.org>);
        Mon, 13 Mar 2023 15:13:13 -0400
Received: from gloria.sntech.de (gloria.sntech.de [185.11.138.130])
        by lindbergh.monkeyblade.net (Postfix) with ESMTPS id A96F5CA2D
        for <linux-kernel@vger.kernel.org>;
 Mon, 13 Mar 2023 12:13:07 -0700 (PDT)
Received: from ip4d1634a9.dynamic.kabel-deutschland.de ([77.22.52.169]
 helo=phil.lan)
        by gloria.sntech.de with esmtpsa  (TLS1.3) tls
 TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384
        (Exim 4.94.2)
        (envelope-from <heiko@sntech.de>)
        id 1pbnbU-00028k-NQ; Mon, 13 Mar 2023 20:13:04 +0100
From: Heiko Stuebner <heiko@sntech.de>
To: palmer@rivosinc.com
Cc: greentime.hu@sifive.com, conor@kernel.org,
        linux-kernel@vger.kernel.org, linux-riscv@lists.infradead.org,
        christoph.muellner@vrull.eu, heiko@sntech.de
Subject: [PATCH RFC v3 04/16] RISC-V: add Zbkb extension detection
Date: Mon, 13 Mar 2023 20:12:50 +0100
Message-Id: <20230313191302.580787-5-heiko.stuebner@vrull.eu>
X-Mailer: git-send-email 2.39.0
In-Reply-To: <20230313191302.580787-1-heiko.stuebner@vrull.eu>
References: <20230313191302.580787-1-heiko.stuebner@vrull.eu>
MIME-Version: 1.0
X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00,SPF_PASS,
        T_SPF_HELO_TEMPERROR autolearn=ham autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on
        lindbergh.monkeyblade.net
Precedence: bulk
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org
X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?=
X-GMAIL-THRID: =?utf-8?q?1760281627565937956?=
X-GMAIL-MSGID: =?utf-8?q?1760281627565937956?=

From: Heiko Stuebner <heiko.stuebner@vrull.eu>

Add detection for Zbkb extension.

Zbkb is part of the set of scalar cryptography extensions and provides
bitmanip instructions for cryptography, with them being a "subset of the
Zbb extension particularly useful for cryptography".

Zbkb was ratified in january 2022.

Expect code using the extension to pre-encode zbkb instructions, so
don't introduce special toolchain requirements for now.

Signed-off-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
---
 arch/riscv/include/asm/hwcap.h | 1 +
 arch/riscv/kernel/cpu.c        | 1 +
 arch/riscv/kernel/cpufeature.c | 1 +
 3 files changed, 3 insertions(+)

diff --git a/arch/riscv/include/asm/hwcap.h b/arch/riscv/include/asm/hwcap.h
index 92f234c6cb4c..b28548fb10f3 100644
--- a/arch/riscv/include/asm/hwcap.h
+++ b/arch/riscv/include/asm/hwcap.h
@@ -44,6 +44,7 @@
 #define RISCV_ISA_EXT_ZICBOM		31
 #define RISCV_ISA_EXT_ZIHINTPAUSE	32
 #define RISCV_ISA_EXT_ZBC		33
+#define RISCV_ISA_EXT_ZBKB		34
 
 #define RISCV_ISA_EXT_MAX		64
 #define RISCV_ISA_EXT_NAME_LEN_MAX	32
diff --git a/arch/riscv/kernel/cpu.c b/arch/riscv/kernel/cpu.c
index 5d47a0c75c69..6f65aac68018 100644
--- a/arch/riscv/kernel/cpu.c
+++ b/arch/riscv/kernel/cpu.c
@@ -189,6 +189,7 @@ static struct riscv_isa_ext_data isa_ext_arr[] = {
 	__RISCV_ISA_EXT_DATA(zihintpause, RISCV_ISA_EXT_ZIHINTPAUSE),
 	__RISCV_ISA_EXT_DATA(zbb, RISCV_ISA_EXT_ZBB),
 	__RISCV_ISA_EXT_DATA(zbc, RISCV_ISA_EXT_ZBC),
+	__RISCV_ISA_EXT_DATA(zbkb, RISCV_ISA_EXT_ZBKB),
 	__RISCV_ISA_EXT_DATA(sscofpmf, RISCV_ISA_EXT_SSCOFPMF),
 	__RISCV_ISA_EXT_DATA(sstc, RISCV_ISA_EXT_SSTC),
 	__RISCV_ISA_EXT_DATA(svinval, RISCV_ISA_EXT_SVINVAL),
diff --git a/arch/riscv/kernel/cpufeature.c b/arch/riscv/kernel/cpufeature.c
index c9099694b5bb..eb7be8e7f24e 100644
--- a/arch/riscv/kernel/cpufeature.c
+++ b/arch/riscv/kernel/cpufeature.c
@@ -229,6 +229,7 @@ void __init riscv_fill_hwcap(void)
 				SET_ISA_EXT_MAP("svpbmt", RISCV_ISA_EXT_SVPBMT);
 				SET_ISA_EXT_MAP("zbb", RISCV_ISA_EXT_ZBB);
 				SET_ISA_EXT_MAP("zbc", RISCV_ISA_EXT_ZBC);
+				SET_ISA_EXT_MAP("zbkb", RISCV_ISA_EXT_ZBKB);
 				SET_ISA_EXT_MAP("zicbom", RISCV_ISA_EXT_ZICBOM);
 				SET_ISA_EXT_MAP("zihintpause", RISCV_ISA_EXT_ZIHINTPAUSE);
 			}

From patchwork Mon Mar 13 19:12:51 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: =?utf-8?q?Heiko_St=C3=BCbner?= <heiko@sntech.de>
X-Patchwork-Id: 69067
Return-Path: <linux-kernel-owner@vger.kernel.org>
Delivered-To: ouuuleilei@gmail.com
Received: by 2002:a5d:5915:0:0:0:0:0 with SMTP id v21csp1365567wrd;
        Mon, 13 Mar 2023 12:27:54 -0700 (PDT)
X-Google-Smtp-Source: 
 AK7set/Nv7ODvfuLEZDKlgrnjparPtCX4TDypTpd0BLN7SZzlvhdU88lrszY+Zwlr+/KlgFenZSe
X-Received: by 2002:a17:90a:bd0d:b0:23b:4b4a:d778 with SMTP id
 y13-20020a17090abd0d00b0023b4b4ad778mr7306919pjr.34.1678735673966;
        Mon, 13 Mar 2023 12:27:53 -0700 (PDT)
ARC-Seal: i=1; a=rsa-sha256; t=1678735673; cv=none;
        d=google.com; s=arc-20160816;
        b=xF9pEGPWbKkTXbyJAKfPoluhD/JQb4nbzh7WPp8H+ZydQS17XGPjiRSqOYHGe9lCAf
         ytMpNFhG5ag3+L+v9om2L1R1X2X8VvFwZErl38gRLhxTPTKs3TlRwv81PMrXqhDzkigX
         Lytamwauta+yw2lOy4nLlvCwMuBGtqevYqXaEh6M2TEZ8nI/qcvXj+5jHtpqRyfzLRyM
         zo7+jkNXkTbJnYs1mPvArf10ApfPHV46CjXpyYB3ENE0i/2o+2NW1yEd3ryehEXJYPMW
         Kr/gh32eC41Z5ybwPbSTIuP8AmaDMr1vfn1f1NT1Y0A+7kii5V0g8tQaofsXUi47vL4O
         zkUw==
ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com;
 s=arc-20160816;
        h=list-id:precedence:content-transfer-encoding:mime-version
         :references:in-reply-to:message-id:date:subject:cc:to:from;
        bh=vJF6CvlSXvmq7qX4eunn3BoeOyl7pdqADErsJfHPImU=;
        b=NeV1gs63rji0UAbFA80bp0TGgsyvJhuHr++/JG0vuzMpKGLzyPJKU1GhxVF+HL8e/4
         q62AFpf9b3CO0CVNxU6kqrcxQhTy2/KmLJv6ojpvmpyjF0XZ93faK2azdpE3QAgmZhrd
         uKTiBjrkUirob488AhxXRqqKIl1trRxcE3oAIpM9iIDpo3b4h5wAS/94JaSJGvW1orpj
         xyEf2f8jwcJYT1cIb8GwzS1mdvjtLrGF/HE40B0E58bokVAX1hyPapRott9P6gytVt3h
         4eWlLKrigpsMoj7NdN/1gdR5bosWkalq7R7bc2Nyjd280uPs9CWKHUkga5beRrZ8N252
         it/w==
ARC-Authentication-Results: i=1; mx.google.com;
       spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org
 designates 2620:137:e000::1:20 as permitted sender)
 smtp.mailfrom=linux-kernel-owner@vger.kernel.org;
       dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=sntech.de
Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20])
        by mx.google.com with ESMTP id
 c8-20020a170902724800b001a05e6d4b08si480210pll.40.2023.03.13.12.27.29;
        Mon, 13 Mar 2023 12:27:53 -0700 (PDT)
Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org
 designates 2620:137:e000::1:20 as permitted sender)
 client-ip=2620:137:e000::1:20;
Authentication-Results: mx.google.com;
       spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org
 designates 2620:137:e000::1:20 as permitted sender)
 smtp.mailfrom=linux-kernel-owner@vger.kernel.org;
       dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=sntech.de
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S230450AbjCMTN0 (ORCPT <rfc822;realc9580@gmail.com> + 99 others);
        Mon, 13 Mar 2023 15:13:26 -0400
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:35610 "EHLO
        lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S230135AbjCMTNN (ORCPT
        <rfc822;linux-kernel@vger.kernel.org>);
        Mon, 13 Mar 2023 15:13:13 -0400
Received: from gloria.sntech.de (gloria.sntech.de [185.11.138.130])
        by lindbergh.monkeyblade.net (Postfix) with ESMTPS id A92E6BDF2
        for <linux-kernel@vger.kernel.org>;
 Mon, 13 Mar 2023 12:13:07 -0700 (PDT)
Received: from ip4d1634a9.dynamic.kabel-deutschland.de ([77.22.52.169]
 helo=phil.lan)
        by gloria.sntech.de with esmtpsa  (TLS1.3) tls
 TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384
        (Exim 4.94.2)
        (envelope-from <heiko@sntech.de>)
        id 1pbnbV-00028k-07; Mon, 13 Mar 2023 20:13:05 +0100
From: Heiko Stuebner <heiko@sntech.de>
To: palmer@rivosinc.com
Cc: greentime.hu@sifive.com, conor@kernel.org,
        linux-kernel@vger.kernel.org, linux-riscv@lists.infradead.org,
        christoph.muellner@vrull.eu, heiko@sntech.de
Subject: [PATCH RFC v3 05/16] RISC-V: hook new crypto subdir into build-system
Date: Mon, 13 Mar 2023 20:12:51 +0100
Message-Id: <20230313191302.580787-6-heiko.stuebner@vrull.eu>
X-Mailer: git-send-email 2.39.0
In-Reply-To: <20230313191302.580787-1-heiko.stuebner@vrull.eu>
References: <20230313191302.580787-1-heiko.stuebner@vrull.eu>
MIME-Version: 1.0
X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00,SPF_PASS,
        T_SPF_HELO_TEMPERROR autolearn=ham autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on
        lindbergh.monkeyblade.net
Precedence: bulk
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org
X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?=
X-GMAIL-THRID: =?utf-8?q?1760281938371209739?=
X-GMAIL-MSGID: =?utf-8?q?1760281938371209739?=

From: Heiko Stuebner <heiko.stuebner@vrull.eu>

Create a crypto subdirectory for added accelerated cryptography routines
and hook it into the riscv Kbuild and the main crypto Kconfig.

Signed-off-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
---
 arch/riscv/Kbuild          | 1 +
 arch/riscv/crypto/Kconfig  | 5 +++++
 arch/riscv/crypto/Makefile | 4 ++++
 crypto/Kconfig             | 3 +++
 4 files changed, 13 insertions(+)
 create mode 100644 arch/riscv/crypto/Kconfig
 create mode 100644 arch/riscv/crypto/Makefile

diff --git a/arch/riscv/Kbuild b/arch/riscv/Kbuild
index afa83e307a2e..250d1fd38618 100644
--- a/arch/riscv/Kbuild
+++ b/arch/riscv/Kbuild
@@ -2,6 +2,7 @@
 
 obj-y += kernel/ mm/ net/
 obj-$(CONFIG_BUILTIN_DTB) += boot/dts/
+obj-$(CONFIG_CRYPTO) += crypto/
 obj-y += errata/
 obj-$(CONFIG_KVM) += kvm/
 
diff --git a/arch/riscv/crypto/Kconfig b/arch/riscv/crypto/Kconfig
new file mode 100644
index 000000000000..10d60edc0110
--- /dev/null
+++ b/arch/riscv/crypto/Kconfig
@@ -0,0 +1,5 @@
+# SPDX-License-Identifier: GPL-2.0
+
+menu "Accelerated Cryptographic Algorithms for CPU (riscv)"
+
+endmenu
diff --git a/arch/riscv/crypto/Makefile b/arch/riscv/crypto/Makefile
new file mode 100644
index 000000000000..b3b6332c9f6d
--- /dev/null
+++ b/arch/riscv/crypto/Makefile
@@ -0,0 +1,4 @@
+# SPDX-License-Identifier: GPL-2.0-only
+#
+# linux/arch/riscv/crypto/Makefile
+#
diff --git a/crypto/Kconfig b/crypto/Kconfig
index 9c86f7045157..003921cb0301 100644
--- a/crypto/Kconfig
+++ b/crypto/Kconfig
@@ -1401,6 +1401,9 @@ endif
 if PPC
 source "arch/powerpc/crypto/Kconfig"
 endif
+if RISCV
+source "arch/riscv/crypto/Kconfig"
+endif
 if S390
 source "arch/s390/crypto/Kconfig"
 endif

From patchwork Mon Mar 13 19:12:52 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: =?utf-8?q?Heiko_St=C3=BCbner?= <heiko@sntech.de>
X-Patchwork-Id: 69070
Return-Path: <linux-kernel-owner@vger.kernel.org>
Delivered-To: ouuuleilei@gmail.com
Received: by 2002:a5d:5915:0:0:0:0:0 with SMTP id v21csp1370006wrd;
        Mon, 13 Mar 2023 12:38:17 -0700 (PDT)
X-Google-Smtp-Source: 
 AK7set/8x68rcVjrEWDx25gVMyUZZ6D0Ww4JMqcxXlQJQLjWH/Aul4TvFJLXUw8nL1Yn1uq7eXEY
X-Received: by 2002:a62:63c2:0:b0:5de:3c49:b06 with SMTP id
 x185-20020a6263c2000000b005de3c490b06mr12251504pfb.3.1678736297393;
        Mon, 13 Mar 2023 12:38:17 -0700 (PDT)
ARC-Seal: i=1; a=rsa-sha256; t=1678736297; cv=none;
        d=google.com; s=arc-20160816;
        b=sA4ZnNFl8onlcltlT+/iK83w1uNLS1DLxX0gz9YRSRD4MWFo2j8nUXIb+ALxZUWc0p
         V4YXNWK/2JyCqDvP9uWTCbCW1P+i2xbzv/IapCTv8dz0GyWZOA+vhlgCsJvu0J6NKQuu
         a5VbKXpHDp1u41+PxpefM0BN4DY3JLYmrM1GeF2zmqT4fpP1XsgbkG81+vxI3t8fu0Xe
         nZQ4VG21hHwhCBskIo5k+fMZYPB/79+scn0650JWmMsQjKEho+DGRYeYpr1PheHXdjvd
         w+y1i/XWUfZaQlZ2s1y0yS3EOmhsrsePPQXZqs2D7oUDpbSPlp61nGnnCvme9u/FOKgR
         7nzA==
ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com;
 s=arc-20160816;
        h=list-id:precedence:content-transfer-encoding:mime-version
         :references:in-reply-to:message-id:date:subject:cc:to:from;
        bh=kAgPGimJs8ABktzXFns1e6nuPh8j25GVJOc1kftWB5Y=;
        b=D2/634blbxjfI97guvCQwCAoDYr9gQb3O7ljhRcZ9OPJIsS7P2UJNDUKdc2+NKDVSS
         kDijeF+n2VwSi9LtwHeb1OJkocfCRDA0Eg7H03WN/w1f4r89OMNWT5G4Fx3Le6VLv7qP
         bFW4WB9hcIJ6nMQGz6i8DFHCtzxP+Z/ln6RYiurTt00X9hYxJZfI6whZZ7sceRvmrpqa
         Kh7hg7AqwK/hLWFwjS9nsXSM/zwiByo4CXk1HKJ/JQWoiRn34RGyzwHI+w6Q+JJbiWkM
         Zh6u7wSyAOZySjEsPneyjH66vFhsum+47SWcCWV+aq+j5ldxkdI4212mhJ9h7kmvlzt7
         1ipg==
ARC-Authentication-Results: i=1; mx.google.com;
       spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org
 designates 2620:137:e000::1:20 as permitted sender)
 smtp.mailfrom=linux-kernel-owner@vger.kernel.org;
       dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=sntech.de
Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20])
        by mx.google.com with ESMTP id
 205-20020a6306d6000000b0050b19d24c3csi237487pgg.509.2023.03.13.12.38.04;
        Mon, 13 Mar 2023 12:38:17 -0700 (PDT)
Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org
 designates 2620:137:e000::1:20 as permitted sender)
 client-ip=2620:137:e000::1:20;
Authentication-Results: mx.google.com;
       spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org
 designates 2620:137:e000::1:20 as permitted sender)
 smtp.mailfrom=linux-kernel-owner@vger.kernel.org;
       dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=sntech.de
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S230254AbjCMTNm (ORCPT <rfc822;realc9580@gmail.com> + 99 others);
        Mon, 13 Mar 2023 15:13:42 -0400
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:35642 "EHLO
        lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S229845AbjCMTNQ (ORCPT
        <rfc822;linux-kernel@vger.kernel.org>);
        Mon, 13 Mar 2023 15:13:16 -0400
Received: from gloria.sntech.de (gloria.sntech.de [185.11.138.130])
        by lindbergh.monkeyblade.net (Postfix) with ESMTPS id A8C90AD30
        for <linux-kernel@vger.kernel.org>;
 Mon, 13 Mar 2023 12:13:07 -0700 (PDT)
Received: from ip4d1634a9.dynamic.kabel-deutschland.de ([77.22.52.169]
 helo=phil.lan)
        by gloria.sntech.de with esmtpsa  (TLS1.3) tls
 TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384
        (Exim 4.94.2)
        (envelope-from <heiko@sntech.de>)
        id 1pbnbV-00028k-95; Mon, 13 Mar 2023 20:13:05 +0100
From: Heiko Stuebner <heiko@sntech.de>
To: palmer@rivosinc.com
Cc: greentime.hu@sifive.com, conor@kernel.org,
        linux-kernel@vger.kernel.org, linux-riscv@lists.infradead.org,
        christoph.muellner@vrull.eu, heiko@sntech.de
Subject: [PATCH RFC v3 06/16] RISC-V: crypto: add accelerated GCM GHASH
 implementation
Date: Mon, 13 Mar 2023 20:12:52 +0100
Message-Id: <20230313191302.580787-7-heiko.stuebner@vrull.eu>
X-Mailer: git-send-email 2.39.0
In-Reply-To: <20230313191302.580787-1-heiko.stuebner@vrull.eu>
References: <20230313191302.580787-1-heiko.stuebner@vrull.eu>
MIME-Version: 1.0
X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00,SPF_PASS,
        T_SPF_HELO_TEMPERROR autolearn=ham autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on
        lindbergh.monkeyblade.net
Precedence: bulk
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org
X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?=
X-GMAIL-THRID: =?utf-8?q?1760282591795809626?=
X-GMAIL-MSGID: =?utf-8?q?1760282591795809626?=

From: Heiko Stuebner <heiko.stuebner@vrull.eu>

With different sets of available extensions a number of different
implementation variants are possible. Quite a number of them are already
implemented in openSSL or are in the process of being implemented, so pick
the relevant openSSL coden and add suitable glue code similar to arm64 and
powerpc to use it for kernel-specific cryptography.

The prioritization of the algorithms follows the ifdef chain for the
assembly callbacks done in openssl but here algorithms will get registered
separately so that all of them can be part of the crypto selftests.

The crypto subsystem will select the most performant of all registered
algorithms on the running system but will selftest all registered ones.

In a first step this adds scalar variants using the Zbc, Zbb and
possible Zbkb (bitmanip crypto extension) and the perl implementation
stems from openSSL pull request on
    https://github.com/openssl/openssl/pull/20078

Co-developed-by: Christoph Müllner <christoph.muellner@vrull.eu>
Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>
Signed-off-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
---
 arch/riscv/crypto/Kconfig              |  11 +
 arch/riscv/crypto/Makefile             |  14 +
 arch/riscv/crypto/ghash-riscv64-glue.c | 258 ++++++++++++++++
 arch/riscv/crypto/ghash-riscv64-zbc.pl | 400 +++++++++++++++++++++++++
 arch/riscv/crypto/riscv.pm             | 230 ++++++++++++++
 5 files changed, 913 insertions(+)
 create mode 100644 arch/riscv/crypto/ghash-riscv64-glue.c
 create mode 100644 arch/riscv/crypto/ghash-riscv64-zbc.pl
 create mode 100644 arch/riscv/crypto/riscv.pm

diff --git a/arch/riscv/crypto/Kconfig b/arch/riscv/crypto/Kconfig
index 10d60edc0110..010adbbb058a 100644
--- a/arch/riscv/crypto/Kconfig
+++ b/arch/riscv/crypto/Kconfig
@@ -2,4 +2,15 @@
 
 menu "Accelerated Cryptographic Algorithms for CPU (riscv)"
 
+config CRYPTO_GHASH_RISCV64
+	tristate "Hash functions: GHASH"
+	depends on 64BIT && RISCV_ISA_ZBC
+	select CRYPTO_HASH
+	select CRYPTO_LIB_GF128MUL
+	help
+	  GCM GHASH function (NIST SP800-38D)
+
+	  Architecture: riscv64 using one of:
+	  - ZBC extension
+
 endmenu
diff --git a/arch/riscv/crypto/Makefile b/arch/riscv/crypto/Makefile
index b3b6332c9f6d..0a158919e9da 100644
--- a/arch/riscv/crypto/Makefile
+++ b/arch/riscv/crypto/Makefile
@@ -2,3 +2,17 @@
 #
 # linux/arch/riscv/crypto/Makefile
 #
+
+obj-$(CONFIG_CRYPTO_GHASH_RISCV64) += ghash-riscv64.o
+ghash-riscv64-y := ghash-riscv64-glue.o
+ifdef CONFIG_RISCV_ISA_ZBC
+ghash-riscv64-y += ghash-riscv64-zbc.o
+endif
+
+quiet_cmd_perlasm = PERLASM $@
+      cmd_perlasm = $(PERL) $(<) void $(@)
+
+$(obj)/ghash-riscv64-zbc.S: $(src)/ghash-riscv64-zbc.pl
+	$(call cmd,perlasm)
+
+clean-files += ghash-riscv64-zbc.S
diff --git a/arch/riscv/crypto/ghash-riscv64-glue.c b/arch/riscv/crypto/ghash-riscv64-glue.c
new file mode 100644
index 000000000000..6a6c39e16702
--- /dev/null
+++ b/arch/riscv/crypto/ghash-riscv64-glue.c
@@ -0,0 +1,258 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * RISC-V optimized GHASH routines
+ *
+ * Copyright (C) 2023 VRULL GmbH
+ * Author: Heiko Stuebner <heiko.stuebner@vrull.eu>
+ */
+
+#include <linux/types.h>
+#include <linux/err.h>
+#include <linux/crypto.h>
+#include <linux/module.h>
+#include <asm/simd.h>
+#include <crypto/ghash.h>
+#include <crypto/internal/hash.h>
+#include <crypto/internal/simd.h>
+
+/* Zbc (optional with zbkb improvements) */
+void gcm_ghash_rv64i_zbc(u64 Xi[2], const u128 Htable[16],
+			 const u8 *inp, size_t len);
+void gcm_ghash_rv64i_zbc__zbkb(u64 Xi[2], const u128 Htable[16],
+			       const u8 *inp, size_t len);
+
+struct riscv64_ghash_ctx {
+	void (*ghash_func)(u64 Xi[2], const u128 Htable[16],
+			   const u8 *inp, size_t len);
+
+	/* key used by vector asm */
+	u128 htable[16];
+	/* key used by software fallback */
+	be128 key;
+};
+
+struct riscv64_ghash_desc_ctx {
+	u64 shash[2];
+	u8 buffer[GHASH_DIGEST_SIZE];
+	int bytes;
+};
+
+static int riscv64_ghash_init(struct shash_desc *desc)
+{
+	struct riscv64_ghash_desc_ctx *dctx = shash_desc_ctx(desc);
+
+	dctx->bytes = 0;
+	memset(dctx->shash, 0, GHASH_DIGEST_SIZE);
+	return 0;
+}
+
+#ifdef CONFIG_RISCV_ISA_ZBC
+
+#define RISCV64_ZBC_SETKEY(VARIANT, GHASH)				\
+void gcm_init_rv64i_ ## VARIANT(u128 Htable[16], const u64 Xi[2]);	\
+static int riscv64_zbc_ghash_setkey_ ## VARIANT(struct crypto_shash *tfm,	\
+					   const u8 *key,		\
+					   unsigned int keylen)		\
+{									\
+	struct riscv64_ghash_ctx *ctx = crypto_tfm_ctx(crypto_shash_tfm(tfm)); \
+	const u64 k[2] = { cpu_to_be64(((const u64 *)key)[0]),		\
+			   cpu_to_be64(((const u64 *)key)[1]) };	\
+									\
+	if (keylen != GHASH_BLOCK_SIZE)					\
+		return -EINVAL;						\
+									\
+	memcpy(&ctx->key, key, GHASH_BLOCK_SIZE);			\
+	gcm_init_rv64i_ ## VARIANT(ctx->htable, k);			\
+									\
+	ctx->ghash_func = gcm_ghash_rv64i_ ## GHASH;			\
+									\
+	return 0;							\
+}
+
+static int riscv64_zbc_ghash_update(struct shash_desc *desc,
+			   const u8 *src, unsigned int srclen)
+{
+	unsigned int len;
+	struct riscv64_ghash_ctx *ctx = crypto_tfm_ctx(crypto_shash_tfm(desc->tfm));
+	struct riscv64_ghash_desc_ctx *dctx = shash_desc_ctx(desc);
+
+	if (dctx->bytes) {
+		if (dctx->bytes + srclen < GHASH_DIGEST_SIZE) {
+			memcpy(dctx->buffer + dctx->bytes, src,
+				srclen);
+			dctx->bytes += srclen;
+			return 0;
+		}
+		memcpy(dctx->buffer + dctx->bytes, src,
+			GHASH_DIGEST_SIZE - dctx->bytes);
+
+		ctx->ghash_func(dctx->shash, ctx->htable,
+				dctx->buffer, GHASH_DIGEST_SIZE);
+
+		src += GHASH_DIGEST_SIZE - dctx->bytes;
+		srclen -= GHASH_DIGEST_SIZE - dctx->bytes;
+		dctx->bytes = 0;
+	}
+	len = srclen & ~(GHASH_DIGEST_SIZE - 1);
+
+	if (len) {
+		gcm_ghash_rv64i_zbc(dctx->shash, ctx->htable,
+				src, len);
+		src += len;
+		srclen -= len;
+	}
+
+	if (srclen) {
+		memcpy(dctx->buffer, src, srclen);
+		dctx->bytes = srclen;
+	}
+	return 0;
+}
+
+static int riscv64_zbc_ghash_final(struct shash_desc *desc, u8 *out)
+{
+	int i;
+	struct riscv64_ghash_ctx *ctx = crypto_tfm_ctx(crypto_shash_tfm(desc->tfm));
+	struct riscv64_ghash_desc_ctx *dctx = shash_desc_ctx(desc);
+
+	if (dctx->bytes) {
+		for (i = dctx->bytes; i < GHASH_DIGEST_SIZE; i++)
+			dctx->buffer[i] = 0;
+		ctx->ghash_func(dctx->shash, ctx->htable,
+				dctx->buffer, GHASH_DIGEST_SIZE);
+		dctx->bytes = 0;
+	}
+	memcpy(out, dctx->shash, GHASH_DIGEST_SIZE);
+	return 0;
+}
+
+RISCV64_ZBC_SETKEY(zbc, zbc);
+struct shash_alg riscv64_zbc_ghash_alg = {
+	.digestsize = GHASH_DIGEST_SIZE,
+	.init = riscv64_ghash_init,
+	.update = riscv64_zbc_ghash_update,
+	.final = riscv64_zbc_ghash_final,
+	.setkey = riscv64_zbc_ghash_setkey_zbc,
+	.descsize = sizeof(struct riscv64_ghash_desc_ctx)
+		    + sizeof(struct ghash_desc_ctx),
+	.base = {
+		 .cra_name = "ghash",
+		 .cra_driver_name = "riscv64_zbc_ghash",
+		 .cra_priority = 250,
+		 .cra_blocksize = GHASH_BLOCK_SIZE,
+		 .cra_ctxsize = sizeof(struct riscv64_ghash_ctx),
+		 .cra_module = THIS_MODULE,
+	},
+};
+
+RISCV64_ZBC_SETKEY(zbc__zbb, zbc);
+struct shash_alg riscv64_zbc_zbb_ghash_alg = {
+	.digestsize = GHASH_DIGEST_SIZE,
+	.init = riscv64_ghash_init,
+	.update = riscv64_zbc_ghash_update,
+	.final = riscv64_zbc_ghash_final,
+	.setkey = riscv64_zbc_ghash_setkey_zbc__zbb,
+	.descsize = sizeof(struct riscv64_ghash_desc_ctx)
+		    + sizeof(struct ghash_desc_ctx),
+	.base = {
+		 .cra_name = "ghash",
+		 .cra_driver_name = "riscv64_zbc_zbb_ghash",
+		 .cra_priority = 251,
+		 .cra_blocksize = GHASH_BLOCK_SIZE,
+		 .cra_ctxsize = sizeof(struct riscv64_ghash_ctx),
+		 .cra_module = THIS_MODULE,
+	},
+};
+
+RISCV64_ZBC_SETKEY(zbc__zbkb, zbc__zbkb);
+struct shash_alg riscv64_zbc_zbkb_ghash_alg = {
+	.digestsize = GHASH_DIGEST_SIZE,
+	.init = riscv64_ghash_init,
+	.update = riscv64_zbc_ghash_update,
+	.final = riscv64_zbc_ghash_final,
+	.setkey = riscv64_zbc_ghash_setkey_zbc__zbkb,
+	.descsize = sizeof(struct riscv64_ghash_desc_ctx)
+		    + sizeof(struct ghash_desc_ctx),
+	.base = {
+		 .cra_name = "ghash",
+		 .cra_driver_name = "riscv64_zbc_zbkb_ghash",
+		 .cra_priority = 252,
+		 .cra_blocksize = GHASH_BLOCK_SIZE,
+		 .cra_ctxsize = sizeof(struct riscv64_ghash_ctx),
+		 .cra_module = THIS_MODULE,
+	},
+};
+
+#endif /* CONFIG_RISCV_ISA_ZBC */
+
+#define RISCV64_DEFINED_GHASHES		7
+
+static struct shash_alg *riscv64_ghashes[RISCV64_DEFINED_GHASHES];
+static int num_riscv64_ghashes;
+
+static int __init riscv64_ghash_register(struct shash_alg *ghash)
+{
+	int ret;
+
+	ret = crypto_register_shash(ghash);
+	if (ret < 0) {
+		int i;
+
+		for (i = num_riscv64_ghashes - 1; i >= 0 ; i--)
+			crypto_unregister_shash(riscv64_ghashes[i]);
+
+		num_riscv64_ghashes = 0;
+
+		return ret;
+	}
+
+	pr_debug("Registered RISC-V ghash %s\n", ghash->base.cra_driver_name);
+	riscv64_ghashes[num_riscv64_ghashes] = ghash;
+	num_riscv64_ghashes++;
+	return 0;
+}
+
+static int __init riscv64_ghash_mod_init(void)
+{
+	int ret = 0;
+
+#ifdef CONFIG_RISCV_ISA_ZBC
+	if (riscv_isa_extension_available(NULL, ZBC)) {
+		ret = riscv64_ghash_register(&riscv64_zbc_ghash_alg);
+		if (ret < 0)
+			return ret;
+
+		if (riscv_isa_extension_available(NULL, ZBB)) {
+			ret = riscv64_ghash_register(&riscv64_zbc_zbb_ghash_alg);
+			if (ret < 0)
+				return ret;
+		}
+
+		if (riscv_isa_extension_available(NULL, ZBKB)) {
+			ret = riscv64_ghash_register(&riscv64_zbc_zbkb_ghash_alg);
+			if (ret < 0)
+				return ret;
+		}
+	}
+#endif
+
+	return 0;
+}
+
+static void __exit riscv64_ghash_mod_fini(void)
+{
+	int i;
+
+	for (i = num_riscv64_ghashes - 1; i >= 0 ; i--)
+		crypto_unregister_shash(riscv64_ghashes[i]);
+
+	num_riscv64_ghashes = 0;
+}
+
+module_init(riscv64_ghash_mod_init);
+module_exit(riscv64_ghash_mod_fini);
+
+MODULE_DESCRIPTION("GSM GHASH (accelerated)");
+MODULE_AUTHOR("Heiko Stuebner <heiko.stuebner@vrull.eu>");
+MODULE_LICENSE("GPL v2");
+MODULE_ALIAS_CRYPTO("ghash");
diff --git a/arch/riscv/crypto/ghash-riscv64-zbc.pl b/arch/riscv/crypto/ghash-riscv64-zbc.pl
new file mode 100644
index 000000000000..691231ffa11c
--- /dev/null
+++ b/arch/riscv/crypto/ghash-riscv64-zbc.pl
@@ -0,0 +1,400 @@
+#! /usr/bin/env perl
+# Copyright 2022 The OpenSSL Project Authors. All Rights Reserved.
+#
+# Licensed under the Apache License 2.0 (the "License").  You may not use
+# this file except in compliance with the License.  You can obtain a copy
+# in the file LICENSE in the source distribution or at
+# https://www.openssl.org/source/license.html
+
+use strict;
+use warnings;
+
+use FindBin qw($Bin);
+use lib "$Bin";
+use lib "$Bin/../../perlasm";
+use riscv;
+
+# $output is the last argument if it looks like a file (it has an extension)
+# $flavour is the first argument if it doesn't look like a file
+my $output = $#ARGV >= 0 && $ARGV[$#ARGV] =~ m|\.\w+$| ? pop : undef;
+my $flavour = $#ARGV >= 0 && $ARGV[0] !~ m|\.| ? shift : undef;
+
+$output and open STDOUT,">$output";
+
+my $code=<<___;
+.text
+___
+
+################################################################################
+# void gcm_init_rv64i_zbc(u128 Htable[16], const u64 H[2]);
+# void gcm_init_rv64i_zbc__zbb(u128 Htable[16], const u64 H[2]);
+# void gcm_init_rv64i_zbc__zbkb(u128 Htable[16], const u64 H[2]);
+#
+# input:  H: 128-bit H - secret parameter E(K, 0^128)
+# output: Htable: Preprocessed key data for gcm_gmult_rv64i_zbc* and
+#                 gcm_ghash_rv64i_zbc*
+#
+# All callers of this function revert the byte-order unconditionally
+# on little-endian machines. So we need to revert the byte-order back.
+# Additionally we reverse the bits of each byte.
+
+{
+my ($Htable,$H,$VAL0,$VAL1,$TMP0,$TMP1,$TMP2) = ("a0","a1","a2","a3","t0","t1","t2");
+
+$code .= <<___;
+.p2align 3
+.globl gcm_init_rv64i_zbc
+.type gcm_init_rv64i_zbc,\@function
+gcm_init_rv64i_zbc:
+    ld      $VAL0,0($H)
+    ld      $VAL1,8($H)
+    @{[brev8_rv64i   $VAL0, $TMP0, $TMP1, $TMP2]}
+    @{[brev8_rv64i   $VAL1, $TMP0, $TMP1, $TMP2]}
+    @{[sd_rev8_rv64i $VAL0, $Htable, 0, $TMP0]}
+    @{[sd_rev8_rv64i $VAL1, $Htable, 8, $TMP0]}
+    ret
+.size gcm_init_rv64i_zbc,.-gcm_init_rv64i_zbc
+___
+}
+
+{
+my ($Htable,$H,$VAL0,$VAL1,$TMP0,$TMP1,$TMP2) = ("a0","a1","a2","a3","t0","t1","t2");
+
+$code .= <<___;
+.p2align 3
+.globl gcm_init_rv64i_zbc__zbb
+.type gcm_init_rv64i_zbc__zbb,\@function
+gcm_init_rv64i_zbc__zbb:
+    ld      $VAL0,0($H)
+    ld      $VAL1,8($H)
+    @{[brev8_rv64i $VAL0, $TMP0, $TMP1, $TMP2]}
+    @{[brev8_rv64i $VAL1, $TMP0, $TMP1, $TMP2]}
+    @{[rev8 $VAL0, $VAL0]}
+    @{[rev8 $VAL1, $VAL1]}
+    sd      $VAL0,0($Htable)
+    sd      $VAL1,8($Htable)
+    ret
+.size gcm_init_rv64i_zbc__zbb,.-gcm_init_rv64i_zbc__zbb
+___
+}
+
+{
+my ($Htable,$H,$TMP0,$TMP1) = ("a0","a1","t0","t1");
+
+$code .= <<___;
+.p2align 3
+.globl gcm_init_rv64i_zbc__zbkb
+.type gcm_init_rv64i_zbc__zbkb,\@function
+gcm_init_rv64i_zbc__zbkb:
+    ld      $TMP0,0($H)
+    ld      $TMP1,8($H)
+    @{[brev8 $TMP0, $TMP0]}
+    @{[brev8 $TMP1, $TMP1]}
+    @{[rev8 $TMP0, $TMP0]}
+    @{[rev8 $TMP1, $TMP1]}
+    sd      $TMP0,0($Htable)
+    sd      $TMP1,8($Htable)
+    ret
+.size gcm_init_rv64i_zbc__zbkb,.-gcm_init_rv64i_zbc__zbkb
+___
+}
+
+################################################################################
+# void gcm_gmult_rv64i_zbc(u64 Xi[2], const u128 Htable[16]);
+# void gcm_gmult_rv64i_zbc__zbkb(u64 Xi[2], const u128 Htable[16]);
+#
+# input:  Xi: current hash value
+#         Htable: copy of H
+# output: Xi: next hash value Xi
+#
+# Compute GMULT (Xi*H mod f) using the Zbc (clmul) and Zbb (basic bit manip)
+# extensions. Using the no-Karatsuba approach and clmul for the final reduction.
+# This results in an implementation with minimized number of instructions.
+# HW with clmul latencies higher than 2 cycles might observe a performance
+# improvement with Karatsuba. HW with clmul latencies higher than 6 cycles
+# might observe a performance improvement with additionally converting the
+# reduction to shift&xor. For a full discussion of this estimates see
+# https://github.com/riscv/riscv-crypto/blob/master/doc/supp/gcm-mode-cmul.adoc
+{
+my ($Xi,$Htable,$x0,$x1,$y0,$y1) = ("a0","a1","a4","a5","a6","a7");
+my ($z0,$z1,$z2,$z3,$t0,$t1,$polymod) = ("t0","t1","t2","t3","t4","t5","t6");
+
+$code .= <<___;
+.p2align 3
+.globl gcm_gmult_rv64i_zbc
+.type gcm_gmult_rv64i_zbc,\@function
+gcm_gmult_rv64i_zbc:
+    # Load Xi and bit-reverse it
+    ld        $x0, 0($Xi)
+    ld        $x1, 8($Xi)
+    @{[brev8_rv64i $x0, $z0, $z1, $z2]}
+    @{[brev8_rv64i $x1, $z0, $z1, $z2]}
+
+    # Load the key (already bit-reversed)
+    ld        $y0, 0($Htable)
+    ld        $y1, 8($Htable)
+
+    # Load the reduction constant
+    la        $polymod, Lpolymod
+    lbu       $polymod, 0($polymod)
+
+    # Multiplication (without Karatsuba)
+    @{[clmulh $z3, $x1, $y1]}
+    @{[clmul  $z2, $x1, $y1]}
+    @{[clmulh $t1, $x0, $y1]}
+    @{[clmul  $z1, $x0, $y1]}
+    xor       $z2, $z2, $t1
+    @{[clmulh $t1, $x1, $y0]}
+    @{[clmul  $t0, $x1, $y0]}
+    xor       $z2, $z2, $t1
+    xor       $z1, $z1, $t0
+    @{[clmulh $t1, $x0, $y0]}
+    @{[clmul  $z0, $x0, $y0]}
+    xor       $z1, $z1, $t1
+
+    # Reduction with clmul
+    @{[clmulh $t1, $z3, $polymod]}
+    @{[clmul  $t0, $z3, $polymod]}
+    xor       $z2, $z2, $t1
+    xor       $z1, $z1, $t0
+    @{[clmulh $t1, $z2, $polymod]}
+    @{[clmul  $t0, $z2, $polymod]}
+    xor       $x1, $z1, $t1
+    xor       $x0, $z0, $t0
+
+    # Bit-reverse Xi back and store it
+    @{[brev8_rv64i $x0, $z0, $z1, $z2]}
+    @{[brev8_rv64i $x1, $z0, $z1, $z2]}
+    sd        $x0, 0($Xi)
+    sd        $x1, 8($Xi)
+    ret
+.size gcm_gmult_rv64i_zbc,.-gcm_gmult_rv64i_zbc
+___
+}
+
+{
+my ($Xi,$Htable,$x0,$x1,$y0,$y1) = ("a0","a1","a4","a5","a6","a7");
+my ($z0,$z1,$z2,$z3,$t0,$t1,$polymod) = ("t0","t1","t2","t3","t4","t5","t6");
+
+$code .= <<___;
+.p2align 3
+.globl gcm_gmult_rv64i_zbc__zbkb
+.type gcm_gmult_rv64i_zbc__zbkb,\@function
+gcm_gmult_rv64i_zbc__zbkb:
+    # Load Xi and bit-reverse it
+    ld        $x0, 0($Xi)
+    ld        $x1, 8($Xi)
+    @{[brev8  $x0, $x0]}
+    @{[brev8  $x1, $x1]}
+
+    # Load the key (already bit-reversed)
+    ld        $y0, 0($Htable)
+    ld        $y1, 8($Htable)
+
+    # Load the reduction constant
+    la        $polymod, Lpolymod
+    lbu       $polymod, 0($polymod)
+
+    # Multiplication (without Karatsuba)
+    @{[clmulh $z3, $x1, $y1]}
+    @{[clmul  $z2, $x1, $y1]}
+    @{[clmulh $t1, $x0, $y1]}
+    @{[clmul  $z1, $x0, $y1]}
+    xor       $z2, $z2, $t1
+    @{[clmulh $t1, $x1, $y0]}
+    @{[clmul  $t0, $x1, $y0]}
+    xor       $z2, $z2, $t1
+    xor       $z1, $z1, $t0
+    @{[clmulh $t1, $x0, $y0]}
+    @{[clmul  $z0, $x0, $y0]}
+    xor       $z1, $z1, $t1
+
+    # Reduction with clmul
+    @{[clmulh $t1, $z3, $polymod]}
+    @{[clmul  $t0, $z3, $polymod]}
+    xor       $z2, $z2, $t1
+    xor       $z1, $z1, $t0
+    @{[clmulh $t1, $z2, $polymod]}
+    @{[clmul  $t0, $z2, $polymod]}
+    xor       $x1, $z1, $t1
+    xor       $x0, $z0, $t0
+
+    # Bit-reverse Xi back and store it
+    @{[brev8  $x0, $x0]}
+    @{[brev8  $x1, $x1]}
+    sd        $x0, 0($Xi)
+    sd        $x1, 8($Xi)
+    ret
+.size gcm_gmult_rv64i_zbc__zbkb,.-gcm_gmult_rv64i_zbc__zbkb
+___
+}
+
+################################################################################
+# void gcm_ghash_rv64i_zbc(u64 Xi[2], const u128 Htable[16],
+#                          const u8 *inp, size_t len);
+# void gcm_ghash_rv64i_zbc__zbkb(u64 Xi[2], const u128 Htable[16],
+#                                const u8 *inp, size_t len);
+#
+# input:  Xi: current hash value
+#         Htable: copy of H
+#         inp: pointer to input data
+#         len: length of input data in bytes (mutiple of block size)
+# output: Xi: Xi+1 (next hash value Xi)
+{
+my ($Xi,$Htable,$inp,$len,$x0,$x1,$y0,$y1) = ("a0","a1","a2","a3","a4","a5","a6","a7");
+my ($z0,$z1,$z2,$z3,$t0,$t1,$polymod) = ("t0","t1","t2","t3","t4","t5","t6");
+
+$code .= <<___;
+.p2align 3
+.globl gcm_ghash_rv64i_zbc
+.type gcm_ghash_rv64i_zbc,\@function
+gcm_ghash_rv64i_zbc:
+    # Load Xi and bit-reverse it
+    ld        $x0, 0($Xi)
+    ld        $x1, 8($Xi)
+    @{[brev8_rv64i $x0, $z0, $z1, $z2]}
+    @{[brev8_rv64i $x1, $z0, $z1, $z2]}
+
+    # Load the key (already bit-reversed)
+    ld        $y0, 0($Htable)
+    ld        $y1, 8($Htable)
+
+    # Load the reduction constant
+    la        $polymod, Lpolymod
+    lbu       $polymod, 0($polymod)
+
+Lstep:
+    # Load the input data, bit-reverse them, and XOR them with Xi
+    ld        $t0, 0($inp)
+    ld        $t1, 8($inp)
+    add       $inp, $inp, 16
+    add       $len, $len, -16
+    @{[brev8_rv64i $t0, $z0, $z1, $z2]}
+    @{[brev8_rv64i $t1, $z0, $z1, $z2]}
+    xor       $x0, $x0, $t0
+    xor       $x1, $x1, $t1
+
+    # Multiplication (without Karatsuba)
+    @{[clmulh $z3, $x1, $y1]}
+    @{[clmul  $z2, $x1, $y1]}
+    @{[clmulh $t1, $x0, $y1]}
+    @{[clmul  $z1, $x0, $y1]}
+    xor       $z2, $z2, $t1
+    @{[clmulh $t1, $x1, $y0]}
+    @{[clmul  $t0, $x1, $y0]}
+    xor       $z2, $z2, $t1
+    xor       $z1, $z1, $t0
+    @{[clmulh $t1, $x0, $y0]}
+    @{[clmul  $z0, $x0, $y0]}
+    xor       $z1, $z1, $t1
+
+    # Reduction with clmul
+    @{[clmulh $t1, $z3, $polymod]}
+    @{[clmul  $t0, $z3, $polymod]}
+    xor       $z2, $z2, $t1
+    xor       $z1, $z1, $t0
+    @{[clmulh $t1, $z2, $polymod]}
+    @{[clmul  $t0, $z2, $polymod]}
+    xor       $x1, $z1, $t1
+    xor       $x0, $z0, $t0
+
+    # Iterate over all blocks
+    bnez      $len, Lstep
+
+    # Bit-reverse final Xi back and store it
+    @{[brev8_rv64i $x0, $z0, $z1, $z2]}
+    @{[brev8_rv64i $x1, $z0, $z1, $z2]}
+    sd        $x0, 0($Xi)
+    sd        $x1, 8($Xi)
+    ret
+.size gcm_ghash_rv64i_zbc,.-gcm_ghash_rv64i_zbc
+___
+}
+
+{
+my ($Xi,$Htable,$inp,$len,$x0,$x1,$y0,$y1) = ("a0","a1","a2","a3","a4","a5","a6","a7");
+my ($z0,$z1,$z2,$z3,$t0,$t1,$polymod) = ("t0","t1","t2","t3","t4","t5","t6");
+
+$code .= <<___;
+.p2align 3
+.globl gcm_ghash_rv64i_zbc__zbkb
+.type gcm_ghash_rv64i_zbc__zbkb,\@function
+gcm_ghash_rv64i_zbc__zbkb:
+    # Load Xi and bit-reverse it
+    ld        $x0, 0($Xi)
+    ld        $x1, 8($Xi)
+    @{[brev8  $x0, $x0]}
+    @{[brev8  $x1, $x1]}
+
+    # Load the key (already bit-reversed)
+    ld        $y0, 0($Htable)
+    ld        $y1, 8($Htable)
+
+    # Load the reduction constant
+    la        $polymod, Lpolymod
+    lbu       $polymod, 0($polymod)
+
+Lstep_zkbk:
+    # Load the input data, bit-reverse them, and XOR them with Xi
+    ld        $t0, 0($inp)
+    ld        $t1, 8($inp)
+    add       $inp, $inp, 16
+    add       $len, $len, -16
+    @{[brev8  $t0, $t0]}
+    @{[brev8  $t1, $t1]}
+    xor       $x0, $x0, $t0
+    xor       $x1, $x1, $t1
+
+    # Multiplication (without Karatsuba)
+    @{[clmulh $z3, $x1, $y1]}
+    @{[clmul  $z2, $x1, $y1]}
+    @{[clmulh $t1, $x0, $y1]}
+    @{[clmul  $z1, $x0, $y1]}
+    xor       $z2, $z2, $t1
+    @{[clmulh $t1, $x1, $y0]}
+    @{[clmul  $t0, $x1, $y0]}
+    xor       $z2, $z2, $t1
+    xor       $z1, $z1, $t0
+    @{[clmulh $t1, $x0, $y0]}
+    @{[clmul  $z0, $x0, $y0]}
+    xor       $z1, $z1, $t1
+
+    # Reduction with clmul
+    @{[clmulh $t1, $z3, $polymod]}
+    @{[clmul  $t0, $z3, $polymod]}
+    xor       $z2, $z2, $t1
+    xor       $z1, $z1, $t0
+    @{[clmulh $t1, $z2, $polymod]}
+    @{[clmul  $t0, $z2, $polymod]}
+    xor       $x1, $z1, $t1
+    xor       $x0, $z0, $t0
+
+    # Iterate over all blocks
+    bnez      $len, Lstep_zkbk
+
+    # Bit-reverse final Xi back and store it
+    @{[brev8  $x0, $x0]}
+    @{[brev8  $x1, $x1]}
+    sd $x0,  0($Xi)
+    sd $x1,  8($Xi)
+    ret
+.size gcm_ghash_rv64i_zbc__zbkb,.-gcm_ghash_rv64i_zbc__zbkb
+___
+}
+
+$code .= <<___;
+.p2align 3
+Lbrev8_const:
+    .dword  0xAAAAAAAAAAAAAAAA
+    .dword  0xCCCCCCCCCCCCCCCC
+    .dword  0xF0F0F0F0F0F0F0F0
+.size Lbrev8_const,.-Lbrev8_const
+
+Lpolymod:
+    .byte 0x87
+.size Lpolymod,.-Lpolymod
+___
+
+print $code;
+
+close STDOUT or die "error closing STDOUT: $!";
diff --git a/arch/riscv/crypto/riscv.pm b/arch/riscv/crypto/riscv.pm
new file mode 100644
index 000000000000..61bc4fc41a43
--- /dev/null
+++ b/arch/riscv/crypto/riscv.pm
@@ -0,0 +1,230 @@
+#! /usr/bin/env perl
+# Copyright 2023 The OpenSSL Project Authors. All Rights Reserved.
+#
+# Licensed under the Apache License 2.0 (the "License").  You may not use
+# this file except in compliance with the License.  You can obtain a copy
+# in the file LICENSE in the source distribution or at
+# https://www.openssl.org/source/license.html
+
+use strict;
+use warnings;
+
+# Set $have_stacktrace to 1 if we have Devel::StackTrace
+my $have_stacktrace = 0;
+if (eval {require Devel::StackTrace;1;}) {
+    $have_stacktrace = 1;
+}
+
+my @regs = map("x$_",(0..31));
+my @regaliases = ('zero','ra','sp','gp','tp','t0','t1','t2','s0','s1',
+    map("a$_",(0..7)),
+    map("s$_",(2..11)),
+    map("t$_",(3..6))
+);
+
+my %reglookup;
+@reglookup{@regs} = @regs;
+@reglookup{@regaliases} = @regs;
+
+# Takes a register name, possibly an alias, and converts it to a register index
+# from 0 to 31
+sub read_reg {
+    my $reg = lc shift;
+    if (!exists($reglookup{$reg})) {
+        my $trace = "";
+        if ($have_stacktrace) {
+            $trace = Devel::StackTrace->new->as_string;
+        }
+        die("Unknown register ".$reg."\n".$trace);
+    }
+    my $regstr = $reglookup{$reg};
+    if (!($regstr =~ /^x([0-9]+)$/)) {
+        my $trace = "";
+        if ($have_stacktrace) {
+            $trace = Devel::StackTrace->new->as_string;
+        }
+        die("Could not process register ".$reg."\n".$trace);
+    }
+    return $1;
+}
+
+# Helper functions
+
+sub brev8_rv64i {
+    # brev8 without `brev8` instruction (only in Zkbk)
+    # Bit-reverses the first argument and needs three scratch registers
+    my $val = shift;
+    my $t0 = shift;
+    my $t1 = shift;
+    my $brev8_const = shift;
+    my $seq = <<___;
+        la      $brev8_const, Lbrev8_const
+
+        ld      $t0, 0($brev8_const)  # 0xAAAAAAAAAAAAAAAA
+        slli    $t1, $val, 1
+        and     $t1, $t1, $t0
+        and     $val, $val, $t0
+        srli    $val, $val, 1
+        or      $val, $t1, $val
+
+        ld      $t0, 8($brev8_const)  # 0xCCCCCCCCCCCCCCCC
+        slli    $t1, $val, 2
+        and     $t1, $t1, $t0
+        and     $val, $val, $t0
+        srli    $val, $val, 2
+        or      $val, $t1, $val
+
+        ld      $t0, 16($brev8_const) # 0xF0F0F0F0F0F0F0F0
+        slli    $t1, $val, 4
+        and     $t1, $t1, $t0
+        and     $val, $val, $t0
+        srli    $val, $val, 4
+        or      $val, $t1, $val
+___
+    return $seq;
+}
+
+sub sd_rev8_rv64i {
+    # rev8 without `rev8` instruction (only in Zbb or Zbkb)
+    # Stores the given value byte-reversed and needs one scratch register
+    my $val = shift;
+    my $addr = shift;
+    my $off = shift;
+    my $tmp = shift;
+    my $off0 = ($off + 0);
+    my $off1 = ($off + 1);
+    my $off2 = ($off + 2);
+    my $off3 = ($off + 3);
+    my $off4 = ($off + 4);
+    my $off5 = ($off + 5);
+    my $off6 = ($off + 6);
+    my $off7 = ($off + 7);
+    my $seq = <<___;
+        sb      $val, $off7($addr)
+        srli    $tmp, $val, 8
+        sb      $tmp, $off6($addr)
+        srli    $tmp, $val, 16
+        sb      $tmp, $off5($addr)
+        srli    $tmp, $val, 24
+        sb      $tmp, $off4($addr)
+        srli    $tmp, $val, 32
+        sb      $tmp, $off3($addr)
+        srli    $tmp, $val, 40
+        sb      $tmp, $off2($addr)
+        srli    $tmp, $val, 48
+        sb      $tmp, $off1($addr)
+        srli    $tmp, $val, 56
+        sb      $tmp, $off0($addr)
+___
+    return $seq;
+}
+
+# Scalar crypto instructions
+
+sub aes64ds {
+    # Encoding for aes64ds rd, rs1, rs2 instruction on RV64
+    #                XXXXXXX_ rs2 _ rs1 _XXX_ rd  _XXXXXXX
+    my $template = 0b0011101_00000_00000_000_00000_0110011;
+    my $rd = read_reg shift;
+    my $rs1 = read_reg shift;
+    my $rs2 = read_reg shift;
+    return ".word ".($template | ($rs2 << 20) | ($rs1 << 15) | ($rd << 7));
+}
+
+sub aes64dsm {
+    # Encoding for aes64dsm rd, rs1, rs2 instruction on RV64
+    #                XXXXXXX_ rs2 _ rs1 _XXX_ rd  _XXXXXXX
+    my $template = 0b0011111_00000_00000_000_00000_0110011;
+    my $rd = read_reg shift;
+    my $rs1 = read_reg shift;
+    my $rs2 = read_reg shift;
+    return ".word ".($template | ($rs2 << 20) | ($rs1 << 15) | ($rd << 7));
+}
+
+sub aes64es {
+    # Encoding for aes64es rd, rs1, rs2 instruction on RV64
+    #                XXXXXXX_ rs2 _ rs1 _XXX_ rd  _XXXXXXX
+    my $template = 0b0011001_00000_00000_000_00000_0110011;
+    my $rd = read_reg shift;
+    my $rs1 = read_reg shift;
+    my $rs2 = read_reg shift;
+    return ".word ".($template | ($rs2 << 20) | ($rs1 << 15) | ($rd << 7));
+}
+
+sub aes64esm {
+    # Encoding for aes64esm rd, rs1, rs2 instruction on RV64
+    #                XXXXXXX_ rs2 _ rs1 _XXX_ rd  _XXXXXXX
+    my $template = 0b0011011_00000_00000_000_00000_0110011;
+    my $rd = read_reg shift;
+    my $rs1 = read_reg shift;
+    my $rs2 = read_reg shift;
+    return ".word ".($template | ($rs2 << 20) | ($rs1 << 15) | ($rd << 7));
+}
+
+sub aes64im {
+    # Encoding for aes64im rd, rs1 instruction on RV64
+    #                XXXXXXXXXXXX_ rs1 _XXX_ rd  _XXXXXXX
+    my $template = 0b001100000000_00000_001_00000_0010011;
+    my $rd = read_reg shift;
+    my $rs1 = read_reg shift;
+    return ".word ".($template | ($rs1 << 15) | ($rd << 7));
+}
+
+sub aes64ks1i {
+    # Encoding for aes64ks1i rd, rs1, rnum instruction on RV64
+    #                XXXXXXXX_rnum_ rs1 _XXX_ rd  _XXXXXXX
+    my $template = 0b00110001_0000_00000_001_00000_0010011;
+    my $rd = read_reg shift;
+    my $rs1 = read_reg shift;
+    my $rnum = shift;
+    return ".word ".($template | ($rnum << 20) | ($rs1 << 15) | ($rd << 7));
+}
+
+sub aes64ks2 {
+    # Encoding for aes64ks2 rd, rs1, rs2 instruction on RV64
+    #                XXXXXXX_ rs2 _ rs1 _XXX_ rd  _XXXXXXX
+    my $template = 0b0111111_00000_00000_000_00000_0110011;
+    my $rd = read_reg shift;
+    my $rs1 = read_reg shift;
+    my $rs2 = read_reg shift;
+    return ".word ".($template | ($rs2 << 20) | ($rs1 << 15) | ($rd << 7));
+}
+
+sub brev8 {
+    # brev8 rd, rs
+    my $template = 0b011010000111_00000_101_00000_0010011;
+    my $rd = read_reg shift;
+    my $rs = read_reg shift;
+    return ".word ".($template | ($rs << 15) | ($rd << 7));
+}
+
+sub clmul {
+    # Encoding for clmul rd, rs1, rs2 instruction on RV64
+    #                XXXXXXX_ rs2 _ rs1 _XXX_ rd  _XXXXXXX
+    my $template = 0b0000101_00000_00000_001_00000_0110011;
+    my $rd = read_reg shift;
+    my $rs1 = read_reg shift;
+    my $rs2 = read_reg shift;
+    return ".word ".($template | ($rs2 << 20) | ($rs1 << 15) | ($rd << 7));
+}
+
+sub clmulh {
+    # Encoding for clmulh rd, rs1, rs2 instruction on RV64
+    #                XXXXXXX_ rs2 _ rs1 _XXX_ rd  _XXXXXXX
+    my $template = 0b0000101_00000_00000_011_00000_0110011;
+    my $rd = read_reg shift;
+    my $rs1 = read_reg shift;
+    my $rs2 = read_reg shift;
+    return ".word ".($template | ($rs2 << 20) | ($rs1 << 15) | ($rd << 7));
+}
+
+sub rev8 {
+    # Encoding for rev8 rd, rs instruction on RV64
+    #               XXXXXXXXXXXXX_ rs  _XXX_ rd  _XXXXXXX
+    my $template = 0b011010111000_00000_101_00000_0010011;
+    my $rd = read_reg shift;
+    my $rs = read_reg shift;
+    return ".word ".($template | ($rs << 15) | ($rd << 7));
+}
+
+1;

From patchwork Mon Mar 13 19:12:53 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: =?utf-8?q?Heiko_St=C3=BCbner?= <heiko@sntech.de>
X-Patchwork-Id: 69063
Return-Path: <linux-kernel-owner@vger.kernel.org>
Delivered-To: ouuuleilei@gmail.com
Received: by 2002:a5d:5915:0:0:0:0:0 with SMTP id v21csp1362130wrd;
        Mon, 13 Mar 2023 12:20:25 -0700 (PDT)
X-Google-Smtp-Source: 
 AK7set/n3bXvGIhdXHFMJDwYU9s8Wk3fvT/BlmfYVLvz4a5b72oSRBuCuCB5LexUGp9RJEyI7KM5
X-Received: by 2002:a62:1792:0:b0:5a8:cec9:6ab6 with SMTP id
 140-20020a621792000000b005a8cec96ab6mr28893503pfx.31.1678735223049;
        Mon, 13 Mar 2023 12:20:23 -0700 (PDT)
ARC-Seal: i=1; a=rsa-sha256; t=1678735223; cv=none;
        d=google.com; s=arc-20160816;
        b=0d6bCUsnCadf/9/fi+/dk9t8dpI1ZvsIgBnWhjbgHcw1PESEhiZ1B0N1aZICgt3REy
         /fKTyhvvlCp1SG5rKoQtk8afNpb1oKBAhe3FGbPkhEISInoPVtvybc4sFQCpIGczBF8E
         tQlV6lHIkrhgmGMi5Z8GtWFPEnZax5WXt0EnaiLFkcXfgvynm8ioNIb0a33LXTKVBH3C
         tZz4irCE7lwwO+2tg/9SzCnuELqmvq8NhbWCtTRUZlb5H3ZmlyN49V2I2iwlX3MWHd4S
         sRT772dNTAFv+qKkAh6tOBb9Q4mJMzDG7AIjpKuSQwrggOXyrJ99ozdXGF6bJMcP0DO3
         0XUQ==
ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com;
 s=arc-20160816;
        h=list-id:precedence:content-transfer-encoding:mime-version
         :references:in-reply-to:message-id:date:subject:cc:to:from;
        bh=7I3aclIWEe20n6d9rt8YZV1H1jCvecFjvx48YFQ5AF0=;
        b=xKds37usWekIjVwqsch7UACSymOdvPaJH8kIL3rCk5Hc7M0eXpCzF+1uQi//kUqm0f
         XV9McVSxox3lZhqCZ1oMCPRbGOfGq3iQOqiW/jSAO8n/G+BGBORFplShaDksrX3p02hB
         ++slgbKus1XBAcLzJkWOOu9giGupa0HKPhF72kMaGDTGEiaUvA6WK94vyf1wEAwOWS6r
         sYxXVp0PPzf8C5MYVRDOwQVfaIy4JncX2OfFjNKiVvivfgxOArTCVdI4NgKQYEfJjzvq
         +b97V7wKl9WPImDUWMZxfv7ktzflQJlNPGp+d7PpKSwY2dDbTM8/0zkruOS3paHBdMCw
         dFIQ==
ARC-Authentication-Results: i=1; mx.google.com;
       spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org
 designates 2620:137:e000::1:20 as permitted sender)
 smtp.mailfrom=linux-kernel-owner@vger.kernel.org;
       dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=sntech.de
Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20])
        by mx.google.com with ESMTP id
 k189-20020a6284c6000000b0062275c53889si191258pfd.267.2023.03.13.12.20.08;
        Mon, 13 Mar 2023 12:20:23 -0700 (PDT)
Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org
 designates 2620:137:e000::1:20 as permitted sender)
 client-ip=2620:137:e000::1:20;
Authentication-Results: mx.google.com;
       spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org
 designates 2620:137:e000::1:20 as permitted sender)
 smtp.mailfrom=linux-kernel-owner@vger.kernel.org;
       dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=sntech.de
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S230287AbjCMTN3 (ORCPT <rfc822;realc9580@gmail.com> + 99 others);
        Mon, 13 Mar 2023 15:13:29 -0400
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:35606 "EHLO
        lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S229608AbjCMTNN (ORCPT
        <rfc822;linux-kernel@vger.kernel.org>);
        Mon, 13 Mar 2023 15:13:13 -0400
Received: from gloria.sntech.de (gloria.sntech.de [185.11.138.130])
        by lindbergh.monkeyblade.net (Postfix) with ESMTPS id A9505CA20
        for <linux-kernel@vger.kernel.org>;
 Mon, 13 Mar 2023 12:13:08 -0700 (PDT)
Received: from ip4d1634a9.dynamic.kabel-deutschland.de ([77.22.52.169]
 helo=phil.lan)
        by gloria.sntech.de with esmtpsa  (TLS1.3) tls
 TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384
        (Exim 4.94.2)
        (envelope-from <heiko@sntech.de>)
        id 1pbnbV-00028k-In; Mon, 13 Mar 2023 20:13:05 +0100
From: Heiko Stuebner <heiko@sntech.de>
To: palmer@rivosinc.com
Cc: greentime.hu@sifive.com, conor@kernel.org,
        linux-kernel@vger.kernel.org, linux-riscv@lists.infradead.org,
        christoph.muellner@vrull.eu, heiko@sntech.de
Subject: [PATCH RFC v3 07/16] RISC-V: add helper function to read the vector
 VLEN
Date: Mon, 13 Mar 2023 20:12:53 +0100
Message-Id: <20230313191302.580787-8-heiko.stuebner@vrull.eu>
X-Mailer: git-send-email 2.39.0
In-Reply-To: <20230313191302.580787-1-heiko.stuebner@vrull.eu>
References: <20230313191302.580787-1-heiko.stuebner@vrull.eu>
MIME-Version: 1.0
X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00,SPF_PASS,
        T_SPF_HELO_TEMPERROR autolearn=ham autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on
        lindbergh.monkeyblade.net
Precedence: bulk
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org
X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?=
X-GMAIL-THRID: =?utf-8?q?1760281465062527761?=
X-GMAIL-MSGID: =?utf-8?q?1760281465062527761?=

From: Heiko Stuebner <heiko.stuebner@vrull.eu>

VLEN describes the length of each vector register and some instructions
need specific minimal VLENs to work correctly.

The vector code already includes a variable riscv_vsize that contains the
value of "32 vector registers with vlenb length" that gets filled during
boot. vlenb is the value contained in the CSR_VLENB register and
the value represents "VLEN / 8".

So add riscv_vector_vlen() to return the actual VLEN value for in-kernel
users when they need to check the available VLEN.

Signed-off-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
---
 arch/riscv/include/asm/vector.h | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/arch/riscv/include/asm/vector.h b/arch/riscv/include/asm/vector.h
index 202df9ea28d7..e466e7787d25 100644
--- a/arch/riscv/include/asm/vector.h
+++ b/arch/riscv/include/asm/vector.h
@@ -178,4 +178,15 @@ static inline bool riscv_v_vstate_query(struct pt_regs *regs) { return false; }
 
 #endif /* CONFIG_RISCV_ISA_V */
 
+/*
+ * Return the implementation's vlen value.
+ *
+ * riscv_vsize contains the value of "32 vector registers with vlenb length"
+ * so rebuild the vlen value in bits from it.
+ */
+static inline int riscv_vector_vlen(void)
+{
+	return riscv_v_vsize / 32 * 8;
+}
+
 #endif /* ! __ASM_RISCV_VECTOR_H */

From patchwork Mon Mar 13 19:12:54 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: =?utf-8?q?Heiko_St=C3=BCbner?= <heiko@sntech.de>
X-Patchwork-Id: 69073
Return-Path: <linux-kernel-owner@vger.kernel.org>
Delivered-To: ouuuleilei@gmail.com
Received: by 2002:a5d:5915:0:0:0:0:0 with SMTP id v21csp1372455wrd;
        Mon, 13 Mar 2023 12:45:06 -0700 (PDT)
X-Google-Smtp-Source: 
 AK7set8ruDrJuJjgQ+DRy1O+r9ArpwW0JN7/JgGC3J2fbnJrnTx4+gPKI7dIpQS9tHQ5VVXBK80G
X-Received: by 2002:a05:6a20:4c1f:b0:cc:6d4d:57b3 with SMTP id
 fm31-20020a056a204c1f00b000cc6d4d57b3mr26699660pzb.12.1678736706016;
        Mon, 13 Mar 2023 12:45:06 -0700 (PDT)
ARC-Seal: i=1; a=rsa-sha256; t=1678736705; cv=none;
        d=google.com; s=arc-20160816;
        b=UPLG9kPKswCIWbWFmTJ95IoitFPm0RL4k1V4H/Lgy6LZmPyGmwzdtPGtmOWFdgISmE
         utwqt6jovSoi1EV3+vlpcBl4DSir4Pl+uqYNrBL0yiU8j+oAesNn0IwGpj6gJysX58EW
         T4FSNBnS0oVG7prbJOJRE6xMsurm+M6RptiEYyQn4lTzegrIjmQmwBHHsN+qLc0T+xFA
         PGoKIEt2Zyf4cAJsWXeVg1qWXk+q5aS856vflKicHL6dZuMB4IlRO0H5IkfIkAT1b1eC
         NzYuSoHvIoPvhgwV4LnvPesnhkBoRv2Zn6sMO0q5dmj6hPwAQfn1eknBtSNwTICT+2ze
         mv6w==
ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com;
 s=arc-20160816;
        h=list-id:precedence:content-transfer-encoding:mime-version
         :references:in-reply-to:message-id:date:subject:cc:to:from;
        bh=tx36PPvgqYGv33TKRKLppWsdAiL2MHoLSrap+v9eH3o=;
        b=dLGpgZ6NDP0mI9e71tFN7Ay/rVLSF0xv+zV4pAmM4gnL8b38HlJQgm1gEJHjL3T1of
         A+ZLeFdI+HU4Gq02dvYR7nUUcACk6Dkd2SGgcg9eWmMMk/40pC+LanC8ScMozujjI1QY
         WhRyrpQlngcoOQWarSbnUSMbWSTk//EkeNArucVhdFEEIJAPF8QpKO92VJt/3LXClUPa
         +OcJgLUgpYsdOKClwQnQuuIvoAGfDTZuoCa0lW67IOZLhN7Cz+Fhhm/4LaIjL+MJirCa
         N3b641z3NZbeDrs8j8hrl+TWej/LbIJIrKCAS3MHToKCHeROBHc5W9vZmfRxlH3krNGj
         ed3Q==
ARC-Authentication-Results: i=1; mx.google.com;
       spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org
 designates 2620:137:e000::1:20 as permitted sender)
 smtp.mailfrom=linux-kernel-owner@vger.kernel.org;
       dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=sntech.de
Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20])
        by mx.google.com with ESMTP id
 s37-20020a634525000000b004fbe8661fdfsi262873pga.309.2023.03.13.12.44.51;
        Mon, 13 Mar 2023 12:45:05 -0700 (PDT)
Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org
 designates 2620:137:e000::1:20 as permitted sender)
 client-ip=2620:137:e000::1:20;
Authentication-Results: mx.google.com;
       spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org
 designates 2620:137:e000::1:20 as permitted sender)
 smtp.mailfrom=linux-kernel-owner@vger.kernel.org;
       dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=sntech.de
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S230409AbjCMTNX (ORCPT <rfc822;realc9580@gmail.com> + 99 others);
        Mon, 13 Mar 2023 15:13:23 -0400
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:35614 "EHLO
        lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S230117AbjCMTNN (ORCPT
        <rfc822;linux-kernel@vger.kernel.org>);
        Mon, 13 Mar 2023 15:13:13 -0400
Received: from gloria.sntech.de (gloria.sntech.de [185.11.138.130])
        by lindbergh.monkeyblade.net (Postfix) with ESMTPS id A9939CC3B
        for <linux-kernel@vger.kernel.org>;
 Mon, 13 Mar 2023 12:13:08 -0700 (PDT)
Received: from ip4d1634a9.dynamic.kabel-deutschland.de ([77.22.52.169]
 helo=phil.lan)
        by gloria.sntech.de with esmtpsa  (TLS1.3) tls
 TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384
        (Exim 4.94.2)
        (envelope-from <heiko@sntech.de>)
        id 1pbnbV-00028k-Qn; Mon, 13 Mar 2023 20:13:05 +0100
From: Heiko Stuebner <heiko@sntech.de>
To: palmer@rivosinc.com
Cc: greentime.hu@sifive.com, conor@kernel.org,
        linux-kernel@vger.kernel.org, linux-riscv@lists.infradead.org,
        christoph.muellner@vrull.eu, heiko@sntech.de
Subject: [PATCH RFC v3 08/16] RISC-V: add vector crypto extension detection
Date: Mon, 13 Mar 2023 20:12:54 +0100
Message-Id: <20230313191302.580787-9-heiko.stuebner@vrull.eu>
X-Mailer: git-send-email 2.39.0
In-Reply-To: <20230313191302.580787-1-heiko.stuebner@vrull.eu>
References: <20230313191302.580787-1-heiko.stuebner@vrull.eu>
MIME-Version: 1.0
X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00,SPF_PASS,
        T_SPF_HELO_TEMPERROR autolearn=ham autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on
        lindbergh.monkeyblade.net
Precedence: bulk
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org
X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?=
X-GMAIL-THRID: =?utf-8?q?1760283020141740556?=
X-GMAIL-MSGID: =?utf-8?q?1760283020141740556?=

From: Heiko Stuebner <heiko.stuebner@vrull.eu>

Add detection for some extensions of the vector-crypto specification, namely
- Zvkb: Vector Bit-manipulation used in Cryptography
- Zvkg: Vector GCM/GMAC
- Zvknha and Zvknhb: NIST Algorithm Suite
- Zvkns: AES-128, AES-256 Single Round Suite
- Zvksed: ShangMi Algorithm Suite
- Zvksh: ShangMi Algorithm Suite

As their use is very specific and will likely be limited to special places
we expect current code to just pre-encode those instructions, so right now
we don't introduce toolchain requirements.

Signed-off-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
---
 arch/riscv/include/asm/hwcap.h | 7 +++++++
 arch/riscv/kernel/cpu.c        | 7 +++++++
 arch/riscv/kernel/cpufeature.c | 7 +++++++
 3 files changed, 21 insertions(+)

diff --git a/arch/riscv/include/asm/hwcap.h b/arch/riscv/include/asm/hwcap.h
index b28548fb10f3..914559e0e136 100644
--- a/arch/riscv/include/asm/hwcap.h
+++ b/arch/riscv/include/asm/hwcap.h
@@ -45,6 +45,13 @@
 #define RISCV_ISA_EXT_ZIHINTPAUSE	32
 #define RISCV_ISA_EXT_ZBC		33
 #define RISCV_ISA_EXT_ZBKB		34
+#define RISCV_ISA_EXT_ZVKB		35
+#define RISCV_ISA_EXT_ZVKG		36
+#define RISCV_ISA_EXT_ZVKNED		37
+#define RISCV_ISA_EXT_ZVKNHA		38
+#define RISCV_ISA_EXT_ZVKNHB		39
+#define RISCV_ISA_EXT_ZVKSED		40
+#define RISCV_ISA_EXT_ZVKSH		41
 
 #define RISCV_ISA_EXT_MAX		64
 #define RISCV_ISA_EXT_NAME_LEN_MAX	32
diff --git a/arch/riscv/kernel/cpu.c b/arch/riscv/kernel/cpu.c
index 6f65aac68018..c01e6673a947 100644
--- a/arch/riscv/kernel/cpu.c
+++ b/arch/riscv/kernel/cpu.c
@@ -190,6 +190,13 @@ static struct riscv_isa_ext_data isa_ext_arr[] = {
 	__RISCV_ISA_EXT_DATA(zbb, RISCV_ISA_EXT_ZBB),
 	__RISCV_ISA_EXT_DATA(zbc, RISCV_ISA_EXT_ZBC),
 	__RISCV_ISA_EXT_DATA(zbkb, RISCV_ISA_EXT_ZBKB),
+	__RISCV_ISA_EXT_DATA(zvkb, RISCV_ISA_EXT_ZVKB),
+	__RISCV_ISA_EXT_DATA(zvkg, RISCV_ISA_EXT_ZVKG),
+	__RISCV_ISA_EXT_DATA(zvkned, RISCV_ISA_EXT_ZVKNED),
+	__RISCV_ISA_EXT_DATA(zvknha, RISCV_ISA_EXT_ZVKNHA),
+	__RISCV_ISA_EXT_DATA(zvknhb, RISCV_ISA_EXT_ZVKNHB),
+	__RISCV_ISA_EXT_DATA(zvksed, RISCV_ISA_EXT_ZVKSED),
+	__RISCV_ISA_EXT_DATA(zvksh, RISCV_ISA_EXT_ZVKSH),
 	__RISCV_ISA_EXT_DATA(sscofpmf, RISCV_ISA_EXT_SSCOFPMF),
 	__RISCV_ISA_EXT_DATA(sstc, RISCV_ISA_EXT_SSTC),
 	__RISCV_ISA_EXT_DATA(svinval, RISCV_ISA_EXT_SVINVAL),
diff --git a/arch/riscv/kernel/cpufeature.c b/arch/riscv/kernel/cpufeature.c
index eb7be8e7f24e..ad866321ae37 100644
--- a/arch/riscv/kernel/cpufeature.c
+++ b/arch/riscv/kernel/cpufeature.c
@@ -232,6 +232,13 @@ void __init riscv_fill_hwcap(void)
 				SET_ISA_EXT_MAP("zbkb", RISCV_ISA_EXT_ZBKB);
 				SET_ISA_EXT_MAP("zicbom", RISCV_ISA_EXT_ZICBOM);
 				SET_ISA_EXT_MAP("zihintpause", RISCV_ISA_EXT_ZIHINTPAUSE);
+				SET_ISA_EXT_MAP("zvkb", RISCV_ISA_EXT_ZVKB);
+				SET_ISA_EXT_MAP("zvkg", RISCV_ISA_EXT_ZVKG);
+				SET_ISA_EXT_MAP("zvkned", RISCV_ISA_EXT_ZVKNED);
+				SET_ISA_EXT_MAP("zvknha", RISCV_ISA_EXT_ZVKNHA);
+				SET_ISA_EXT_MAP("zvknhb", RISCV_ISA_EXT_ZVKNHB);
+				SET_ISA_EXT_MAP("zvksed", RISCV_ISA_EXT_ZVKSED);
+				SET_ISA_EXT_MAP("zvksh", RISCV_ISA_EXT_ZVKSH);
 			}
 #undef SET_ISA_EXT_MAP
 		}

From patchwork Mon Mar 13 19:12:55 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: =?utf-8?q?Heiko_St=C3=BCbner?= <heiko@sntech.de>
X-Patchwork-Id: 69062
Return-Path: <linux-kernel-owner@vger.kernel.org>
Delivered-To: ouuuleilei@gmail.com
Received: by 2002:a5d:5915:0:0:0:0:0 with SMTP id v21csp1362115wrd;
        Mon, 13 Mar 2023 12:20:23 -0700 (PDT)
X-Google-Smtp-Source: 
 AK7set9bepMhLijXvEyuZj5gF5yuBQ9gGR1BRUsYh3Kll50i9A+SebXiyc2wiJP1+KY0X4wDnAve
X-Received: by 2002:a17:902:ce89:b0:19e:6cb9:4c8f with SMTP id
 f9-20020a170902ce8900b0019e6cb94c8fmr44699811plg.41.1678735223415;
        Mon, 13 Mar 2023 12:20:23 -0700 (PDT)
ARC-Seal: i=1; a=rsa-sha256; t=1678735223; cv=none;
        d=google.com; s=arc-20160816;
        b=TpggRw7ZYVj3eDpQbv2fz70Ckd16+hee93gv2CQs8b6vcUYj2oByQz1GSxHWiq0Var
         rexOzSMO69GBizt5HEOHsfRdGLEvnrCVTLzTcajTX9W++NHH9z53gl8M91YwH6dBVM8N
         Xo7i1d2ZTpSBWKFngIpczSProIQ2rAf9CD48Up7T29Uk6UDDSZio+hr1DtcPeUF/qLie
         jmk0tRUcuWYW8hAeSzGt8m5bl9iMzU0oFaDtufoE/Cp29L0273X8s4k6m8o2mVjPMulX
         by7t5ixyGoKdKPb/Q1tHcmiC7pKAgYt3CbOczmIhtyQsSbD7UbiKzfPrAC7tAOX4lOZ7
         IS/w==
ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com;
 s=arc-20160816;
        h=list-id:precedence:content-transfer-encoding:mime-version
         :references:in-reply-to:message-id:date:subject:cc:to:from;
        bh=3ME6kGGSq8N1B7fh9UKNG+KzI9iIShrWG6DRAEhS6Hc=;
        b=mpZH+RBhxaeXJnIUB46b424w68VJHhpnMAqgiMny3rIh1vcdnuRyPV3qP1YniCkOtS
         NAj1QAcghHAPov79IqJrh++Vd1Pb/Dsp0G/eF+QP311AoJ3MNXCVytiqIzunk26/7029
         xhdy41zUDvYNrkXS3jSTWVh4vbPIzin+Eegbyk2xc2Mh0C3iFc9ofqX6WF4ASnKU22tx
         H/OLAPj4x5bKOOpvBlXwof25KqsKzzWK6w2DiyQq6ddeO6dEVzkLMisOLMFXuQbvP/Oh
         ANNopmGpTjo/7VVqCmowR3uACOC2LwQRCJsCP0LGfb/YUO8+N7aF5NcZMkQMECxa1jgD
         y+JA==
ARC-Authentication-Results: i=1; mx.google.com;
       spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org
 designates 2620:137:e000::1:20 as permitted sender)
 smtp.mailfrom=linux-kernel-owner@vger.kernel.org;
       dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=sntech.de
Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20])
        by mx.google.com with ESMTP id
 jh6-20020a170903328600b001a05a18d269si460972plb.121.2023.03.13.12.20.08;
        Mon, 13 Mar 2023 12:20:23 -0700 (PDT)
Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org
 designates 2620:137:e000::1:20 as permitted sender)
 client-ip=2620:137:e000::1:20;
Authentication-Results: mx.google.com;
       spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org
 designates 2620:137:e000::1:20 as permitted sender)
 smtp.mailfrom=linux-kernel-owner@vger.kernel.org;
       dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=sntech.de
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S231175AbjCMTNj (ORCPT <rfc822;realc9580@gmail.com> + 99 others);
        Mon, 13 Mar 2023 15:13:39 -0400
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:35608 "EHLO
        lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S230187AbjCMTNP (ORCPT
        <rfc822;linux-kernel@vger.kernel.org>);
        Mon, 13 Mar 2023 15:13:15 -0400
Received: from gloria.sntech.de (gloria.sntech.de [185.11.138.130])
        by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 1B8A1D33E
        for <linux-kernel@vger.kernel.org>;
 Mon, 13 Mar 2023 12:13:08 -0700 (PDT)
Received: from ip4d1634a9.dynamic.kabel-deutschland.de ([77.22.52.169]
 helo=phil.lan)
        by gloria.sntech.de with esmtpsa  (TLS1.3) tls
 TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384
        (Exim 4.94.2)
        (envelope-from <heiko@sntech.de>)
        id 1pbnbW-00028k-3C; Mon, 13 Mar 2023 20:13:06 +0100
From: Heiko Stuebner <heiko@sntech.de>
To: palmer@rivosinc.com
Cc: greentime.hu@sifive.com, conor@kernel.org,
        linux-kernel@vger.kernel.org, linux-riscv@lists.infradead.org,
        christoph.muellner@vrull.eu, heiko@sntech.de
Subject: [PATCH RFC v3 09/16] RISC-V: crypto: update perl include with helpers
 for vector (crypto) instructions
Date: Mon, 13 Mar 2023 20:12:55 +0100
Message-Id: <20230313191302.580787-10-heiko.stuebner@vrull.eu>
X-Mailer: git-send-email 2.39.0
In-Reply-To: <20230313191302.580787-1-heiko.stuebner@vrull.eu>
References: <20230313191302.580787-1-heiko.stuebner@vrull.eu>
MIME-Version: 1.0
X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00,SPF_PASS,
        T_SPF_HELO_TEMPERROR autolearn=ham autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on
        lindbergh.monkeyblade.net
Precedence: bulk
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org
X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?=
X-GMAIL-THRID: =?utf-8?q?1760281465534693498?=
X-GMAIL-MSGID: =?utf-8?q?1760281465534693498?=

From: Heiko Stuebner <heiko.stuebner@vrull.eu>

The openSSL scripts use a number of helpers for handling vector
instructions and instructions from the vector-crypto-extensions.

Therefore port these over from openSSL.

Co-developed-by: Christoph Müllner <christoph.muellner@vrull.eu>
Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>
Signed-off-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
---
 arch/riscv/crypto/riscv.pm | 433 ++++++++++++++++++++++++++++++++++++-
 1 file changed, 431 insertions(+), 2 deletions(-)

diff --git a/arch/riscv/crypto/riscv.pm b/arch/riscv/crypto/riscv.pm
index 61bc4fc41a43..a707dd3a68fb 100644
--- a/arch/riscv/crypto/riscv.pm
+++ b/arch/riscv/crypto/riscv.pm
@@ -48,11 +48,34 @@ sub read_reg {
     return $1;
 }
 
+my @vregs = map("v$_",(0..31));
+my %vreglookup;
+@vreglookup{@vregs} = @vregs;
+
+sub read_vreg {
+    my $vreg = lc shift;
+    if (!exists($vreglookup{$vreg})) {
+        my $trace = "";
+        if ($have_stacktrace) {
+            $trace = Devel::StackTrace->new->as_string;
+        }
+        die("Unknown vector register ".$vreg."\n".$trace);
+    }
+    if (!($vreg =~ /^v([0-9]+)$/)) {
+        my $trace = "";
+        if ($have_stacktrace) {
+            $trace = Devel::StackTrace->new->as_string;
+        }
+        die("Could not process vector register ".$vreg."\n".$trace);
+    }
+    return $1;
+}
+
 # Helper functions
 
 sub brev8_rv64i {
-    # brev8 without `brev8` instruction (only in Zkbk)
-    # Bit-reverses the first argument and needs three scratch registers
+    # brev8 without `brev8` instruction (only in Zbkb)
+    # Bit-reverses the first argument and needs two scratch registers
     my $val = shift;
     my $t0 = shift;
     my $t1 = shift;
@@ -227,4 +250,410 @@ sub rev8 {
     return ".word ".($template | ($rs << 15) | ($rd << 7));
 }
 
+# Vector instructions
+
+sub vadd_vv {
+    # vadd.vv vd, vs2, vs1
+    my $template = 0b0000001_00000_00000_000_00000_1010111;
+    my $vd = read_vreg shift;
+    my $vs2 = read_vreg shift;
+    my $vs1 = read_vreg shift;
+    return ".word ".($template | ($vs2 << 20) | ($vs1 << 15) | ($vd << 7));
+}
+
+sub vid_v {
+    # vid.v vd
+    my $template = 0b0101001_00000_10001_010_00000_1010111;
+    my $vd = read_vreg shift;
+    return ".word ".($template | ($vd << 7));
+}
+
+sub vle32_v {
+    # vle32.v vd, (rs1)
+    my $template = 0b0000001_00000_00000_110_00000_0000111;
+    my $vd = read_vreg shift;
+    my $rs1 = read_reg shift;
+    return ".word ".($template | ($rs1 << 15) | ($vd << 7));
+}
+
+sub vle64_v {
+    # vle64.v vd, (rs1)
+    my $template = 0b0000001_00000_00000_111_00000_0000111;
+    my $vd = read_vreg shift;
+    my $rs1 = read_reg shift;
+    return ".word ".($template | ($rs1 << 15) | ($vd << 7));
+}
+
+sub vlse32_v {
+    # vlse32.v vd, (rs1), rs2
+    my $template = 0b0000101_00000_00000_110_00000_0000111;
+    my $vd = read_vreg shift;
+    my $rs1 = read_reg shift;
+    my $rs2 = read_reg shift;
+    return ".word ".($template | ($rs2 << 20) | ($rs1 << 15) | ($vd << 7));
+}
+
+sub vlse64_v {
+    # vlse64.v vd, (rs1), rs2
+    my $template = 0b0000101_00000_00000_111_00000_0000111;
+    my $vd = read_vreg shift;
+    my $rs1 = read_reg shift;
+    my $rs2 = read_reg shift;
+    return ".word ".($template | ($rs2 << 20) | ($rs1 << 15) | ($vd << 7));
+}
+
+sub vmerge_vim {
+    # vmerge.vim vd, vs2, imm, v0
+    my $template = 0b0101110_00000_00000_011_00000_1010111;
+    my $vd = read_vreg shift;
+    my $vs2 = read_vreg shift;
+    my $imm = shift;
+    return ".word ".($template | ($vs2 << 20) | ($imm << 15) | ($vd << 7));
+}
+
+sub vmerge_vvm {
+    # vmerge.vvm vd vs2 vs1
+    my $template = 0b0101110_00000_00000_000_00000_1010111;
+    my $vd = read_vreg shift;
+    my $vs2 = read_vreg shift;
+    my $vs1 = read_vreg shift;
+    return ".word ".($template | ($vs2 << 20) | ($vs1 <<   15) | ($vd << 7))
+}
+
+sub vmseq_vi {
+    # vmseq vd vs1, imm
+    my $template = 0b0110001_00000_00000_011_00000_1010111;
+    my $vd = read_vreg shift;
+    my $vs1 = read_vreg shift;
+    my $imm = shift;
+    return ".word ".($template | ($vs1 << 20) | ($imm <<   15) | ($vd << 7))
+}
+
+sub vmv_v_i {
+    # vmv.v.i vd, imm
+    my $template = 0b0101111_00000_00000_011_00000_1010111;
+    my $vd = read_vreg shift;
+    my $imm = shift;
+    return ".word ".($template | ($imm << 15) | ($vd << 7));
+}
+
+sub vmv_v_v {
+    # vmv.v.v vd, vs1
+    my $template = 0b0101111_00000_00000_000_00000_1010111;
+    my $vd = read_vreg shift;
+    my $vs1 = read_vreg shift;
+    return ".word ".($template | ($vs1 << 15) | ($vd << 7));
+}
+
+sub vor_vv_v0t {
+    # vor.vv vd, vs2, vs1, v0.t
+    my $template = 0b0010100_00000_00000_000_00000_1010111;
+    my $vd = read_vreg shift;
+    my $vs2 = read_vreg shift;
+    my $vs1 = read_vreg shift;
+    return ".word ".($template | ($vs2 << 20) | ($vs1 << 15) | ($vd << 7));
+}
+
+sub vse32_v {
+    # vse32.v vd, (rs1)
+    my $template = 0b0000001_00000_00000_110_00000_0100111;
+    my $vd = read_vreg shift;
+    my $rs1 = read_reg shift;
+    return ".word ".($template | ($rs1 << 15) | ($vd << 7));
+}
+
+sub vse64_v {
+    # vse64.v vd, (rs1)
+    my $template = 0b0000001_00000_00000_111_00000_0100111;
+    my $vd = read_vreg shift;
+    my $rs1 = read_reg shift;
+    return ".word ".($template | ($rs1 << 15) | ($vd << 7));
+}
+
+sub vsetivli__x0_2_e64_m1_ta_ma {
+    # vsetivli x0, 2, e64, m1, ta, ma
+    return ".word 0xcd817057";
+}
+
+sub vsetivli__x0_4_e32_m1_ta_ma {
+    # vsetivli x0, 4, e32, m1, ta, ma
+    return ".word 0xcd027057";
+}
+
+sub vsetivli__x0_4_e64_m1_ta_ma {
+    # vsetivli x0,4,e64,m1,ta,ma
+    return ".word 0xcd827057";
+}
+
+sub vsetivli__x0_8_e32_m1_ta_ma {
+    return ".word 0xcd047057";
+}
+
+sub vslidedown_vi {
+    # vslidedown.vi vd, vs2, uimm
+    my $template = 0b0011111_00000_00000_011_00000_1010111;
+    my $vd = read_vreg shift;
+    my $vs2 = read_vreg shift;
+    my $uimm = shift;
+    return ".word ".($template | ($vs2 << 20) | ($uimm << 15) | ($vd << 7));
+}
+
+sub vslideup_vi_v0t {
+    # vslideup.vi vd, vs2, uimm, v0.t
+    my $template = 0b0011100_00000_00000_011_00000_1010111;
+    my $vd = read_vreg shift;
+    my $vs2 = read_vreg shift;
+    my $uimm = shift;
+    return ".word ".($template | ($vs2 << 20) | ($uimm << 15) | ($vd << 7));
+}
+
+sub vslideup_vi {
+    # vslideup.vi vd, vs2, uimm
+    my $template = 0b0011101_00000_00000_011_00000_1010111;
+    my $vd = read_vreg shift;
+    my $vs2 = read_vreg shift;
+    my $uimm = shift;
+    return ".word ".($template | ($vs2 << 20) | ($uimm << 15) | ($vd << 7));
+}
+
+sub vsll_vi {
+    # vsll.vi vd, vs2, uimm, vm
+    my $template = 0b1001011_00000_00000_011_00000_1010111;
+    my $vd = read_vreg shift;
+    my $vs2 = read_vreg shift;
+    my $uimm = shift;
+    return ".word ".($template | ($vs2 << 20) | ($uimm << 15) | ($vd << 7));
+}
+
+sub vsrl_vx {
+    # vsrl.vx vd, vs2, rs1
+    my $template = 0b1010001_00000_00000_100_00000_1010111;
+    my $vd = read_vreg shift;
+    my $vs2 = read_vreg shift;
+    my $rs1 = read_reg shift;
+    return ".word ".($template | ($vs2 << 20) | ($rs1 << 15) | ($vd << 7));
+}
+
+sub vsse32_v {
+    # vse32.v vs3, (rs1), rs2
+    my $template = 0b0000101_00000_00000_110_00000_0100111;
+    my $vs3 = read_vreg shift;
+    my $rs1 = read_reg shift;
+    my $rs2 = read_reg shift;
+    return ".word ".($template | ($rs2 << 20) | ($rs1 << 15) | ($vs3 << 7));
+}
+
+sub vsse64_v {
+    # vsse64.v vs3, (rs1), rs2
+    my $template = 0b0000101_00000_00000_111_00000_0100111;
+    my $vs3 = read_vreg shift;
+    my $rs1 = read_reg shift;
+    my $rs2 = read_reg shift;
+    return ".word ".($template | ($rs2 << 20) | ($rs1 << 15) | ($vs3 << 7));
+}
+
+sub vxor_vv_v0t {
+    # vxor.vv vd, vs2, vs1, v0.t
+    my $template = 0b0010110_00000_00000_000_00000_1010111;
+    my $vd = read_vreg shift;
+    my $vs2 = read_vreg shift;
+    my $vs1 = read_vreg shift;
+    return ".word ".($template | ($vs2 << 20) | ($vs1 << 15) | ($vd << 7));
+}
+
+sub vxor_vv {
+    # vxor.vv vd, vs2, vs1
+    my $template = 0b0010111_00000_00000_000_00000_1010111;
+    my $vd = read_vreg shift;
+    my $vs2 = read_vreg shift;
+    my $vs1 = read_vreg shift;
+    return ".word ".($template | ($vs2 << 20) | ($vs1 << 15) | ($vd << 7));
+}
+
+# Vector crypto instructions
+
+## Zvkb instructions
+
+sub vclmulh_vx {
+    # vclmulh.vx vd, vs2, rs1
+    my $template = 0b0011011_00000_00000_110_00000_1010111;
+    my $vd = read_vreg shift;
+    my $vs2 = read_vreg shift;
+    my $rs1 = read_reg shift;
+    return ".word ".($template | ($vs2 << 20) | ($rs1 << 15) | ($vd << 7));
+}
+
+sub vclmul_vx_v0t {
+    # vclmul.vx vd, vs2, rs1, v0.t
+    my $template = 0b0011000_00000_00000_110_00000_1010111;
+    my $vd = read_vreg shift;
+    my $vs2 = read_vreg shift;
+    my $rs1 = read_reg shift;
+    return ".word ".($template | ($vs2 << 20) | ($rs1 << 15) | ($vd << 7));
+}
+
+sub vclmul_vx {
+    # vclmul.vx vd, vs2, rs1
+    my $template = 0b0011001_00000_00000_110_00000_1010111;
+    my $vd = read_vreg shift;
+    my $vs2 = read_vreg shift;
+    my $rs1 = read_reg shift;
+    return ".word ".($template | ($vs2 << 20) | ($rs1 << 15) | ($vd << 7));
+}
+
+sub vrev8_v {
+    # vrev8.v vd, vs2
+    my $template = 0b0100101_00000_01001_010_00000_1010111;
+    my $vd = read_vreg shift;
+    my $vs2 = read_vreg shift;
+    return ".word ".($template | ($vs2 << 20) | ($vd << 7));
+}
+
+## Zvkg instructions
+
+sub vghsh_vv {
+    # vghsh.vv vd, vs2, vs1
+    my $template = 0b1011001_00000_00000_010_00000_1110111;
+    my $vd = read_vreg shift;
+    my $vs2 = read_vreg shift;
+    my $vs1 = read_vreg shift;
+    return ".word ".($template | ($vs2 << 20) | ($vs1 << 15) | ($vd << 7));
+}
+
+sub vgmul_vv {
+    # vgmul.vv vd, vs2
+    my $template = 0b1010001_00000_10001_010_00000_1110111;
+    my $vd = read_vreg shift;
+    my $vs2 = read_vreg shift;
+    return ".word ".($template | ($vs2 << 20) | ($vd << 7));
+}
+
+## Zvkned instructions
+
+sub vaesdf_vs {
+    # vaesdf.vs vd, vs2
+    my $template = 0b101001_1_00000_00001_010_00000_1110111;
+    my $vd = read_vreg  shift;
+    my $vs2 = read_vreg  shift;
+    return ".word ".($template | ($vs2 << 20) | ($vd << 7));
+}
+
+sub vaesdm_vs {
+    # vaesdm.vs vd, vs2
+    my $template = 0b101001_1_00000_00000_010_00000_1110111;
+    my $vd = read_vreg shift;
+    my $vs2 = read_vreg shift;
+    return ".word ".($template | ($vs2 << 20) | ($vd << 7));
+}
+
+sub vaesef_vs {
+    # vaesef.vs vd, vs2
+    my $template = 0b101001_1_00000_00011_010_00000_1110111;
+    my $vd = read_vreg  shift;
+    my $vs2 = read_vreg  shift;
+    return ".word ".($template | ($vs2 << 20) | ($vd << 7));
+}
+
+sub vaesem_vs {
+    # vaesem.vs vd, vs2
+    my $template = 0b101001_1_00000_00010_010_00000_1110111;
+    my $vd = read_vreg  shift;
+    my $vs2 = read_vreg  shift;
+    return ".word ".($template | ($vs2 << 20) | ($vd << 7));
+}
+
+sub vaeskf1_vi {
+    # vaeskf1.vi vd, vs2, uimmm
+    my $template = 0b100010_1_00000_00000_010_00000_1110111;
+    my $vd = read_vreg  shift;
+    my $vs2 = read_vreg  shift;
+    my $uimm = shift;
+    return ".word ".($template | ($uimm << 15) | ($vs2 << 20) | ($vd << 7));
+}
+
+sub vaeskf2_vi {
+    # vaeskf2.vi vd, vs2, uimm
+    my $template = 0b101010_1_00000_00000_010_00000_1110111;
+    my $vd = read_vreg shift;
+    my $vs2 = read_vreg shift;
+    my $uimm = shift;
+    return ".word ".($template | ($vs2 << 20) | ($uimm << 15) | ($vd << 7));
+}
+
+sub vaesz_vs {
+    # vaesz.vs vd, vs2
+    my $template = 0b101001_1_00000_00111_010_00000_1110111;
+    my $vd = read_vreg  shift;
+    my $vs2 = read_vreg  shift;
+    return ".word ".($template | ($vs2 << 20) | ($vd << 7));
+}
+
+## Zvknha and Zvknhb instructions
+
+sub vsha2ms_vv {
+    # vsha2ms.vv vd, vs2, vs1
+    my $template = 0b1011011_00000_00000_010_00000_1110111;
+    my $vd = read_vreg shift;
+    my $vs2 = read_vreg shift;
+    my $vs1 = read_vreg shift;
+    return ".word ".($template | ($vs2 << 20)| ($vs1 << 15 )| ($vd << 7));
+}
+
+sub vsha2ch_vv {
+    # vsha2ch.vv vd, vs2, vs1
+    my $template = 0b101110_10000_00000_001_00000_01110111;
+    my $vd = read_vreg shift;
+    my $vs2 = read_vreg shift;
+    my $vs1 = read_vreg shift;
+    return ".word ".($template | ($vs2 << 20)| ($vs1 << 15 )| ($vd << 7));
+}
+
+sub vsha2cl_vv {
+    # vsha2cl.vv vd, vs2, vs1
+    my $template = 0b101111_10000_00000_001_00000_01110111;
+    my $vd = read_vreg shift;
+    my $vs2 = read_vreg shift;
+    my $vs1 = read_vreg shift;
+    return ".word ".($template | ($vs2 << 20)| ($vs1 << 15 )| ($vd << 7));
+}
+
+## Zvksed instructions
+
+sub vsm4k_vi {
+    # vsm4k.vi vd, vs2, uimm
+    my $template = 0b1000011_00000_00000_010_00000_1110111;
+    my $vd = read_vreg shift;
+    my $vs2 = read_vreg shift;
+    my $uimm = shift;
+    return ".word ".($template | ($vs2 << 20) | ($uimm << 15) | ($vd << 7));
+}
+
+sub vsm4r_vs {
+    # vsm4r.vs vd, vs2
+    my $template = 0b1010011_00000_10000_010_00000_1110111;
+    my $vd = read_vreg shift;
+    my $vs2 = read_vreg shift;
+    return ".word ".($template | ($vs2 << 20) | ($vd << 7));
+}
+
+## zvksh instructions
+
+sub vsm3c_vi {
+    # vsm3c.vi vd, vs2, uimm
+    my $template = 0b1010111_00000_00000_010_00000_1110111;
+    my $vd = read_vreg shift;
+    my $vs2 = read_vreg shift;
+    my $uimm = shift;
+    return ".word ".($template | ($vs2 << 20) | ($uimm << 15 ) | ($vd << 7));
+}
+
+sub vsm3me_vv {
+    # vsm3me.vv vd, vs2, vs1
+    my $template = 0b1000001_00000_00000_010_00000_1110111;
+    my $vd = read_vreg shift;
+    my $vs2 = read_vreg shift;
+    my $vs1 = read_vreg shift;
+    return ".word ".($template | ($vs2 << 20) | ($vs1 << 15 ) | ($vd << 7));
+}
+
 1;

From patchwork Mon Mar 13 19:12:56 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: =?utf-8?q?Heiko_St=C3=BCbner?= <heiko@sntech.de>
X-Patchwork-Id: 69076
Return-Path: <linux-kernel-owner@vger.kernel.org>
Delivered-To: ouuuleilei@gmail.com
Received: by 2002:a5d:5915:0:0:0:0:0 with SMTP id v21csp1378718wrd;
        Mon, 13 Mar 2023 13:00:51 -0700 (PDT)
X-Google-Smtp-Source: 
 AK7set/TCcQtEPT/Pf6OYPGjWoW+lTWLR1qmq/sGMVokVM2fNHc5YQ1cTK6cUY4ncB50ysF5pFGo
X-Received: by 2002:a17:903:124c:b0:19e:b2ed:6ffc with SMTP id
 u12-20020a170903124c00b0019eb2ed6ffcmr34301949plh.57.1678737651245;
        Mon, 13 Mar 2023 13:00:51 -0700 (PDT)
ARC-Seal: i=1; a=rsa-sha256; t=1678737651; cv=none;
        d=google.com; s=arc-20160816;
        b=A4Uq5yauhoBtCVVdhYOCc1lw3MLER2r+zA+t4R2I+7XP1TKsNkpTMhjXofvZMXYZbx
         +2twZXytI594zdDhRNCmXyStexrsx/MG84+xpgkWlnkr68snRwjGmUpJ5AnNBaFDmg5s
         qT/028Qbr/sRJoD5kp/ja7ePq38sPKUq+o2P53VNOrHqeAGDNc6ukYRjFw/G1ekJHPfz
         JobT9Ak7DxDyVepCopYBW4MWoewl1KIccE52SEWX3s7fOUvgNtwxBcb8tXyVLuK5ngYy
         gj+mb8YMzQFR8LWr5wJ0bTTNx+TfFirtIuu/Va1IEg6YbiaHNhXBCfcNw5QdiVP83f0B
         otsQ==
ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com;
 s=arc-20160816;
        h=list-id:precedence:content-transfer-encoding:mime-version
         :references:in-reply-to:message-id:date:subject:cc:to:from;
        bh=wZpLNihvj8XX8BKZONIK5UN2oqvL4wBYP4vqF3MMMcA=;
        b=iSQRm+7EyyTQAEaI75BTN+xyIZHY9xPupPqm1kOTm5IDoEWd1kyA2tJWpfwHV5frWo
         OXH8PoNDdOjyEG6y2oU+jyOAjV5qJ7NTyKLG2hWFk7oiX/jY6DNbRBCtwWEAOnZFZx9w
         HloAn9f3MxYbrpYUUAMuog+puD1x9Tk0mQKqPkclSjHRsiI+UFFZwrJhrJIenihpV03r
         RBfcobo9X+tZ32l1pB5cicZuR4kcSTtgwkIXb/+nrmC0e5uEOz+Rg/5n+T8t6/mDATm8
         d/SuazxdjNhZ0htVU8jV1I7o9IDn+3xaRT4+oTHAhgu0eh4wI21TPQQrFBUg+xx4mfZx
         Tm4g==
ARC-Authentication-Results: i=1; mx.google.com;
       spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org
 designates 2620:137:e000::1:20 as permitted sender)
 smtp.mailfrom=linux-kernel-owner@vger.kernel.org;
       dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=sntech.de
Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20])
        by mx.google.com with ESMTP id
 kt5-20020a170903088500b0019fb1cb8dc7si456427plb.359.2023.03.13.13.00.34;
        Mon, 13 Mar 2023 13:00:51 -0700 (PDT)
Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org
 designates 2620:137:e000::1:20 as permitted sender)
 client-ip=2620:137:e000::1:20;
Authentication-Results: mx.google.com;
       spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org
 designates 2620:137:e000::1:20 as permitted sender)
 smtp.mailfrom=linux-kernel-owner@vger.kernel.org;
       dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=sntech.de
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S230511AbjCMTRl (ORCPT <rfc822;realc9580@gmail.com> + 99 others);
        Mon, 13 Mar 2023 15:17:41 -0400
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:44876 "EHLO
        lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S230464AbjCMTRj (ORCPT
        <rfc822;linux-kernel@vger.kernel.org>);
        Mon, 13 Mar 2023 15:17:39 -0400
Received: from gloria.sntech.de (gloria.sntech.de [185.11.138.130])
        by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 023DE2CFC1
        for <linux-kernel@vger.kernel.org>;
 Mon, 13 Mar 2023 12:17:09 -0700 (PDT)
Received: from ip4d1634a9.dynamic.kabel-deutschland.de ([77.22.52.169]
 helo=phil.lan)
        by gloria.sntech.de with esmtpsa  (TLS1.3) tls
 TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384
        (Exim 4.94.2)
        (envelope-from <heiko@sntech.de>)
        id 1pbnbW-00028k-Be; Mon, 13 Mar 2023 20:13:06 +0100
From: Heiko Stuebner <heiko@sntech.de>
To: palmer@rivosinc.com
Cc: greentime.hu@sifive.com, conor@kernel.org,
        linux-kernel@vger.kernel.org, linux-riscv@lists.infradead.org,
        christoph.muellner@vrull.eu, heiko@sntech.de
Subject: [PATCH RFC v3 10/16] RISC-V: crypto: add Zvkb accelerated GCM GHASH
 implementation
Date: Mon, 13 Mar 2023 20:12:56 +0100
Message-Id: <20230313191302.580787-11-heiko.stuebner@vrull.eu>
X-Mailer: git-send-email 2.39.0
In-Reply-To: <20230313191302.580787-1-heiko.stuebner@vrull.eu>
References: <20230313191302.580787-1-heiko.stuebner@vrull.eu>
MIME-Version: 1.0
X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00,SPF_PASS,
        T_SPF_HELO_TEMPERROR autolearn=ham autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on
        lindbergh.monkeyblade.net
Precedence: bulk
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org
X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?=
X-GMAIL-THRID: =?utf-8?q?1760284011450761741?=
X-GMAIL-MSGID: =?utf-8?q?1760284011450761741?=

From: Heiko Stuebner <heiko.stuebner@vrull.eu>

Add a gcm hash implementation using the Zvkb crypto extension.
It gets possibly registered alongside the Zbc-based variant, with a higher
priority so that the crypto subsystem will be able to select the most
performant variant, but the algorithm itself will still be part of the
crypto selftests that run during registration.

Co-developed-by: Christoph Müllner <christoph.muellner@vrull.eu>
Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>
Signed-off-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
---
 arch/riscv/crypto/Kconfig               |   3 +-
 arch/riscv/crypto/Makefile              |   8 +-
 arch/riscv/crypto/ghash-riscv64-glue.c  | 147 ++++++++++
 arch/riscv/crypto/ghash-riscv64-zvkb.pl | 349 ++++++++++++++++++++++++
 4 files changed, 505 insertions(+), 2 deletions(-)
 create mode 100644 arch/riscv/crypto/ghash-riscv64-zvkb.pl

diff --git a/arch/riscv/crypto/Kconfig b/arch/riscv/crypto/Kconfig
index 010adbbb058a..404fd9b3cb7c 100644
--- a/arch/riscv/crypto/Kconfig
+++ b/arch/riscv/crypto/Kconfig
@@ -4,7 +4,7 @@ menu "Accelerated Cryptographic Algorithms for CPU (riscv)"
 
 config CRYPTO_GHASH_RISCV64
 	tristate "Hash functions: GHASH"
-	depends on 64BIT && RISCV_ISA_ZBC
+	depends on 64BIT && (RISCV_ISA_ZBC || RISCV_ISA_V)
 	select CRYPTO_HASH
 	select CRYPTO_LIB_GF128MUL
 	help
@@ -12,5 +12,6 @@ config CRYPTO_GHASH_RISCV64
 
 	  Architecture: riscv64 using one of:
 	  - ZBC extension
+	  - ZVKB vector crypto extension
 
 endmenu
diff --git a/arch/riscv/crypto/Makefile b/arch/riscv/crypto/Makefile
index 0a158919e9da..8ab9a0ae8f2d 100644
--- a/arch/riscv/crypto/Makefile
+++ b/arch/riscv/crypto/Makefile
@@ -8,6 +8,9 @@ ghash-riscv64-y := ghash-riscv64-glue.o
 ifdef CONFIG_RISCV_ISA_ZBC
 ghash-riscv64-y += ghash-riscv64-zbc.o
 endif
+ifdef CONFIG_RISCV_ISA_V
+ghash-riscv64-y += ghash-riscv64-zvkb.o
+endif
 
 quiet_cmd_perlasm = PERLASM $@
       cmd_perlasm = $(PERL) $(<) void $(@)
@@ -15,4 +18,7 @@ quiet_cmd_perlasm = PERLASM $@
 $(obj)/ghash-riscv64-zbc.S: $(src)/ghash-riscv64-zbc.pl
 	$(call cmd,perlasm)
 
-clean-files += ghash-riscv64-zbc.S
+$(obj)/ghash-riscv64-zvkb.S: $(src)/ghash-riscv64-zvkb.pl
+	$(call cmd,perlasm)
+
+clean-files += ghash-riscv64-zbc.S ghash-riscv64-zvkb.S
diff --git a/arch/riscv/crypto/ghash-riscv64-glue.c b/arch/riscv/crypto/ghash-riscv64-glue.c
index 6a6c39e16702..004a1a11d7d8 100644
--- a/arch/riscv/crypto/ghash-riscv64-glue.c
+++ b/arch/riscv/crypto/ghash-riscv64-glue.c
@@ -11,6 +11,7 @@
 #include <linux/crypto.h>
 #include <linux/module.h>
 #include <asm/simd.h>
+#include <asm/vector.h>
 #include <crypto/ghash.h>
 #include <crypto/internal/hash.h>
 #include <crypto/internal/simd.h>
@@ -21,6 +22,10 @@ void gcm_ghash_rv64i_zbc(u64 Xi[2], const u128 Htable[16],
 void gcm_ghash_rv64i_zbc__zbkb(u64 Xi[2], const u128 Htable[16],
 			       const u8 *inp, size_t len);
 
+/* Zvkb (vector crypto with vclmul) based routines. */
+void gcm_ghash_rv64i_zvkb(u64 Xi[2], const u128 Htable[16],
+			  const u8 *inp, size_t len);
+
 struct riscv64_ghash_ctx {
 	void (*ghash_func)(u64 Xi[2], const u128 Htable[16],
 			   const u8 *inp, size_t len);
@@ -46,6 +51,140 @@ static int riscv64_ghash_init(struct shash_desc *desc)
 	return 0;
 }
 
+#ifdef CONFIG_RISCV_ISA_V
+
+#define RISCV64_ZVK_SETKEY(VARIANT, GHASH)				\
+void gcm_init_rv64i_ ## VARIANT(u128 Htable[16], const u64 Xi[2]);	\
+static int riscv64_zvk_ghash_setkey_ ## VARIANT(struct crypto_shash *tfm,	\
+					   const u8 *key,		\
+					   unsigned int keylen)		\
+{									\
+	struct riscv64_ghash_ctx *ctx = crypto_tfm_ctx(crypto_shash_tfm(tfm)); \
+	const u64 k[2] = { cpu_to_be64(((const u64 *)key)[0]),		\
+			   cpu_to_be64(((const u64 *)key)[1]) };	\
+									\
+	if (keylen != GHASH_BLOCK_SIZE)					\
+		return -EINVAL;						\
+									\
+	memcpy(&ctx->key, key, GHASH_BLOCK_SIZE);			\
+	kernel_rvv_begin();						\
+	gcm_init_rv64i_ ## VARIANT(ctx->htable, k);			\
+	kernel_rvv_end();						\
+									\
+	ctx->ghash_func = gcm_ghash_rv64i_ ## GHASH;			\
+									\
+	return 0;							\
+}
+
+static inline void __ghash_block(struct riscv64_ghash_ctx *ctx,
+				 struct riscv64_ghash_desc_ctx *dctx)
+{
+	if (crypto_simd_usable()) {
+		kernel_rvv_begin();
+		ctx->ghash_func(dctx->shash, ctx->htable,
+				dctx->buffer, GHASH_DIGEST_SIZE);
+		kernel_rvv_end();
+	} else {
+		crypto_xor((u8 *)dctx->shash, dctx->buffer, GHASH_BLOCK_SIZE);
+		gf128mul_lle((be128 *)dctx->shash, &ctx->key);
+	}
+}
+
+static inline void __ghash_blocks(struct riscv64_ghash_ctx *ctx,
+				  struct riscv64_ghash_desc_ctx *dctx,
+				  const u8 *src, unsigned int srclen)
+{
+	if (crypto_simd_usable()) {
+		kernel_rvv_begin();
+		ctx->ghash_func(dctx->shash, ctx->htable,
+				src, srclen);
+		kernel_rvv_end();
+	} else {
+		while (srclen >= GHASH_BLOCK_SIZE) {
+			crypto_xor((u8 *)dctx->shash, src, GHASH_BLOCK_SIZE);
+			gf128mul_lle((be128 *)dctx->shash, &ctx->key);
+			srclen -= GHASH_BLOCK_SIZE;
+			src += GHASH_BLOCK_SIZE;
+		}
+	}
+}
+
+static int riscv64_zvk_ghash_update(struct shash_desc *desc,
+			   const u8 *src, unsigned int srclen)
+{
+	unsigned int len;
+	struct riscv64_ghash_ctx *ctx = crypto_tfm_ctx(crypto_shash_tfm(desc->tfm));
+	struct riscv64_ghash_desc_ctx *dctx = shash_desc_ctx(desc);
+
+	if (dctx->bytes) {
+		if (dctx->bytes + srclen < GHASH_DIGEST_SIZE) {
+			memcpy(dctx->buffer + dctx->bytes, src,
+				srclen);
+			dctx->bytes += srclen;
+			return 0;
+		}
+		memcpy(dctx->buffer + dctx->bytes, src,
+			GHASH_DIGEST_SIZE - dctx->bytes);
+
+		__ghash_block(ctx, dctx);
+
+		src += GHASH_DIGEST_SIZE - dctx->bytes;
+		srclen -= GHASH_DIGEST_SIZE - dctx->bytes;
+		dctx->bytes = 0;
+	}
+	len = srclen & ~(GHASH_DIGEST_SIZE - 1);
+
+	if (len) {
+		__ghash_blocks(ctx, dctx, src, len);
+		src += len;
+		srclen -= len;
+	}
+
+	if (srclen) {
+		memcpy(dctx->buffer, src, srclen);
+		dctx->bytes = srclen;
+	}
+	return 0;
+}
+
+static int riscv64_zvk_ghash_final(struct shash_desc *desc, u8 *out)
+{
+	struct riscv64_ghash_ctx *ctx = crypto_tfm_ctx(crypto_shash_tfm(desc->tfm));
+	struct riscv64_ghash_desc_ctx *dctx = shash_desc_ctx(desc);
+	int i;
+
+	if (dctx->bytes) {
+		for (i = dctx->bytes; i < GHASH_DIGEST_SIZE; i++)
+			dctx->buffer[i] = 0;
+		__ghash_block(ctx, dctx);
+		dctx->bytes = 0;
+	}
+
+	memcpy(out, dctx->shash, GHASH_DIGEST_SIZE);
+	return 0;
+}
+
+RISCV64_ZVK_SETKEY(zvkb, zvkb);
+struct shash_alg riscv64_zvkb_ghash_alg = {
+	.digestsize = GHASH_DIGEST_SIZE,
+	.init = riscv64_ghash_init,
+	.update = riscv64_zvk_ghash_update,
+	.final = riscv64_zvk_ghash_final,
+	.setkey = riscv64_zvk_ghash_setkey_zvkb,
+	.descsize = sizeof(struct riscv64_ghash_desc_ctx)
+		    + sizeof(struct ghash_desc_ctx),
+	.base = {
+		 .cra_name = "ghash",
+		 .cra_driver_name = "riscv64_zvkb_ghash",
+		 .cra_priority = 300,
+		 .cra_blocksize = GHASH_BLOCK_SIZE,
+		 .cra_ctxsize = sizeof(struct riscv64_ghash_ctx),
+		 .cra_module = THIS_MODULE,
+	},
+};
+
+#endif /* CONFIG_RISCV_ISA_V */
+
 #ifdef CONFIG_RISCV_ISA_ZBC
 
 #define RISCV64_ZBC_SETKEY(VARIANT, GHASH)				\
@@ -236,6 +375,14 @@ static int __init riscv64_ghash_mod_init(void)
 	}
 #endif
 
+#ifdef CONFIG_RISCV_ISA_V
+	if (riscv_isa_extension_available(NULL, ZVKB)) {
+		ret = riscv64_ghash_register(&riscv64_zvkb_ghash_alg);
+		if (ret < 0)
+			return ret;
+	}
+#endif
+
 	return 0;
 }
 
diff --git a/arch/riscv/crypto/ghash-riscv64-zvkb.pl b/arch/riscv/crypto/ghash-riscv64-zvkb.pl
new file mode 100644
index 000000000000..3b81c082ba5a
--- /dev/null
+++ b/arch/riscv/crypto/ghash-riscv64-zvkb.pl
@@ -0,0 +1,349 @@
+#! /usr/bin/env perl
+# Copyright 2023 The OpenSSL Project Authors. All Rights Reserved.
+#
+# Licensed under the Apache License 2.0 (the "License").  You may not use
+# this file except in compliance with the License.  You can obtain a copy
+# in the file LICENSE in the source distribution or at
+# https://www.openssl.org/source/license.html
+
+# - RV64I
+# - RISC-V vector ('V') with VLEN >= 128
+# - Vector Bit-manipulation used in Cryptography ('Zvkb')
+
+use strict;
+use warnings;
+
+use FindBin qw($Bin);
+use lib "$Bin";
+use lib "$Bin/../../perlasm";
+use riscv;
+
+# $output is the last argument if it looks like a file (it has an extension)
+# $flavour is the first argument if it doesn't look like a file
+my $output = $#ARGV >= 0 && $ARGV[$#ARGV] =~ m|\.\w+$| ? pop : undef;
+my $flavour = $#ARGV >= 0 && $ARGV[0] !~ m|\.| ? shift : undef;
+
+$output and open STDOUT,">$output";
+
+my $code=<<___;
+.text
+___
+
+################################################################################
+# void gcm_init_rv64i_zvkg(u128 Htable[16], const u64 H[2]);
+#
+# input:	H: 128-bit H - secret parameter E(K, 0^128)
+# output:	Htable: Preprocessed key data for gcm_gmult_rv64i_zvkb and
+#                       gcm_ghash_rv64i_zvkb
+{
+my ($Htable,$H,$TMP0,$TMP1,$TMP2) = ("a0","a1","t0","t1","t2");
+my ($V0,$V1,$V2,$V3,$V4,$V5,$V6) = ("v0","v1","v2","v3","v4","v5","v6");
+
+$code .= <<___;
+.p2align 3
+.globl gcm_init_rv64i_zvkb
+.type gcm_init_rv64i_zvkb,\@function
+gcm_init_rv64i_zvkb:
+    # Load/store data in reverse order.
+    # This is needed as a part of endianness swap.
+    add $H, $H, 8
+    li $TMP0, -8
+    li $TMP1, 63
+    la $TMP2, Lpolymod
+
+    @{[vsetivli__x0_2_e64_m1_ta_ma]} # vsetivli x0, 2, e64, m1, ta, ma
+
+    @{[vlse64_v  $V1, $H, $TMP0]}    # vlse64.v v1, (a1), t0
+    @{[vle64_v $V2, $TMP2]}          # vle64.v v2, (t2)
+
+    # Shift one left and get the carry bits.
+    @{[vsrl_vx $V3, $V1, $TMP1]}     # vsrl.vx v3, v1, t1
+    @{[vsll_vi $V1, $V1, 1]}         # vsll.vi v1, v1, 1
+
+    # Use the fact that the polynomial degree is no more than 128,
+    # i.e. only the LSB of the upper half could be set.
+    # Thanks to we don't need to do the full reduction here.
+    # Instead simply subtract the reduction polynomial.
+    # This idea was taken from x86 ghash implementation in OpenSSL.
+    @{[vslideup_vi $V4, $V3, 1]}     # vslideup.vi v4, v3, 1
+    @{[vslidedown_vi $V3, $V3, 1]}   # vslidedown.vi v3, v3, 1
+
+    @{[vmv_v_i $V0, 2]}              # vmv.v.i v0, 2
+    @{[vor_vv_v0t $V1, $V1, $V4]}    # vor.vv v1, v1, v4, v0.t
+
+    # Need to set the mask to 3, if the carry bit is set.
+    @{[vmv_v_v $V0, $V3]}            # vmv.v.v v0, v3
+    @{[vmv_v_i $V3, 0]}              # vmv.v.i v3, 0
+    @{[vmerge_vim $V3, $V3, 3]}      # vmerge.vim v3, v3, 3, v0
+    @{[vmv_v_v $V0, $V3]}            # vmv.v.v v0, v3
+
+    @{[vxor_vv_v0t $V1, $V1, $V2]}   # vxor.vv v1, v1, v2, v0.t
+
+    @{[vse64_v $V1, $Htable]}        # vse64.v v1, (a0)
+    ret
+.size gcm_init_rv64i_zvkb,.-gcm_init_rv64i_zvkb
+___
+}
+
+################################################################################
+# void gcm_gmult_rv64i_zvkb(u64 Xi[2], const u128 Htable[16]);
+#
+# input:	Xi: current hash value
+#		Htable: preprocessed H
+# output:	Xi: next hash value Xi = (Xi * H mod f)
+{
+my ($Xi,$Htable,$TMP0,$TMP1,$TMP2,$TMP3,$TMP4) = ("a0","a1","t0","t1","t2","t3","t4");
+my ($V0,$V1,$V2,$V3,$V4,$V5,$V6) = ("v0","v1","v2","v3","v4","v5","v6");
+
+$code .= <<___;
+.text
+.p2align 3
+.globl gcm_gmult_rv64i_zvkb
+.type gcm_gmult_rv64i_zvkb,\@function
+gcm_gmult_rv64i_zvkb:
+    ld $TMP0, ($Htable)
+    ld $TMP1, 8($Htable)
+    li $TMP2, 63
+    la $TMP3, Lpolymod
+    ld $TMP3, 8($TMP3)
+
+    # Load/store data in reverse order.
+    # This is needed as a part of endianness swap.
+    add $Xi, $Xi, 8
+    li $TMP4, -8
+
+    @{[vsetivli__x0_2_e64_m1_ta_ma]} # vsetivli x0, 2, e64, m1, ta, ma
+
+    @{[vlse64_v $V5, $Xi, $TMP4]}    # vlse64.v v5, (a0), t4
+    @{[vrev8_v $V5, $V5]}            # vrev8.v v5, v5
+
+    # Multiplication
+
+    # Do two 64x64 multiplications in one go to save some time
+    # and simplify things.
+
+    # A = a1a0 (t1, t0)
+    # B = b1b0 (v5)
+    # C = c1c0 (256 bit)
+    # c1 = a1b1 + (a0b1)h + (a1b0)h
+    # c0 = a0b0 + (a0b1)l + (a1b0)h
+
+    # v1 = (a0b1)l,(a0b0)l
+    @{[vclmul_vx $V1, $V5, $TMP0]}   # vclmul.vx v1, v5, t0
+    # v3 = (a0b1)h,(a0b0)h
+    @{[vclmulh_vx $V3, $V5, $TMP0]}  # vclmulh.vx v3, v5, t0
+
+    # v4 = (a1b1)l,(a1b0)l
+    @{[vclmul_vx $V4, $V5, $TMP1]}   # vclmul.vx v4, v5, t1
+    # v2 = (a1b1)h,(a1b0)h
+    @{[vclmulh_vx $V2, $V5, $TMP1]}   # vclmulh.vx v2, v5, t1
+
+    # Is there a better way to do this?
+    # Would need to swap the order of elements within a vector register.
+    @{[vslideup_vi $V5, $V3, 1]}     # vslideup.vi v5, v3, 1
+    @{[vslideup_vi $V6, $V4, 1]}     # vslideup.vi v6, v4, 1
+    @{[vslidedown_vi $V3, $V3, 1]}   # vslidedown.vi v3, v3, 1
+    @{[vslidedown_vi $V4, $V4, 1]}   # vslidedown.vi v4, v4, 1
+
+    @{[vmv_v_i $V0, 1]}              # vmv.v.i v0, 1
+    # v2 += (a0b1)h
+    @{[vxor_vv_v0t $V2, $V2, $V3]}   # vxor.vv v2, v2, v3, v0.t
+    # v2 += (a1b1)l
+    @{[vxor_vv_v0t $V2, $V2, $V4]}   # vxor.vv v2, v2, v4, v0.t
+
+    @{[vmv_v_i $V0, 2]}              # vmv.v.i v0, 2
+    # v1 += (a0b0)h,0
+    @{[vxor_vv_v0t $V1, $V1, $V5]}   # vxor.vv v1, v1, v5, v0.t
+    # v1 += (a1b0)l,0
+    @{[vxor_vv_v0t $V1, $V1, $V6]}   # vxor.vv v1, v1, v6, v0.t
+
+    # Now the 256bit product should be stored in (v2,v1)
+    # v1 = (a0b1)l + (a0b0)h + (a1b0)l, (a0b0)l
+    # v2 = (a1b1)h, (a1b0)h + (a0b1)h + (a1b1)l
+
+    # Reduction
+    # Let C := A*B = c3,c2,c1,c0 = v2[1],v2[0],v1[1],v1[0]
+    # This is a slight variation of the Gueron's Montgomery reduction.
+    # The difference being the order of some operations has been changed,
+    # to make a better use of vclmul(h) instructions.
+
+    # First step:
+    # c1 += (c0 * P)l
+    # vmv.v.i v0, 2
+    @{[vslideup_vi_v0t $V3, $V1, 1]} # vslideup.vi v3, v1, 1, v0.t
+    @{[vclmul_vx_v0t $V3, $V3, $TMP3]} # vclmul.vx v3, v3, t3, v0.t
+    @{[vxor_vv_v0t $V1, $V1, $V3]}   # vxor.vv v1, v1, v3, v0.t
+
+    # Second step:
+    # D = d1,d0 is final result
+    # We want:
+    # m1 = c1 + (c1 * P)h
+    # m0 = (c1 * P)l + (c0 * P)h + c0
+    # d1 = c3 + m1
+    # d0 = c2 + m0
+
+    #v3 = (c1 * P)l, 0
+    @{[vclmul_vx_v0t $V3, $V1, $TMP3]} # vclmul.vx v3, v1, t3, v0.t
+    #v4 = (c1 * P)h, (c0 * P)h
+    @{[vclmulh_vx $V4, $V1, $TMP3]}   # vclmulh.vx v4, v1, t3
+
+    @{[vmv_v_i $V0, 1]}              # vmv.v.i v0, 1
+    @{[vslidedown_vi $V3, $V3, 1]}   # vslidedown.vi v3, v3, 1
+
+    @{[vxor_vv $V1, $V1, $V4]}       # vxor.vv v1, v1, v4
+    @{[vxor_vv_v0t $V1, $V1, $V3]}   # vxor.vv v1, v1, v3, v0.t
+
+    # XOR in the upper upper part of the product
+    @{[vxor_vv $V2, $V2, $V1]}       # vxor.vv v2, v2, v1
+
+    @{[vrev8_v $V2, $V2]}            # vrev8.v v2, v2
+    @{[vsse64_v $V2, $Xi, $TMP4]}    # vsse64.v v2, (a0), t4
+    ret
+.size gcm_gmult_rv64i_zvkb,.-gcm_gmult_rv64i_zvkb
+___
+}
+
+################################################################################
+# void gcm_ghash_rv64i_zvkb(u64 Xi[2], const u128 Htable[16],
+#                           const u8 *inp, size_t len);
+#
+# input:	Xi: current hash value
+#		Htable: preprocessed H
+#		inp: pointer to input data
+#		len: length of input data in bytes (mutiple of block size)
+# output:	Xi: Xi+1 (next hash value Xi)
+{
+my ($Xi,$Htable,$inp,$len,$TMP0,$TMP1,$TMP2,$TMP3,$M8,$TMP5,$TMP6) = ("a0","a1","a2","a3","t0","t1","t2","t3","t4","t5","t6");
+my ($V0,$V1,$V2,$V3,$V4,$V5,$V6,$Vinp) = ("v0","v1","v2","v3","v4","v5","v6","v7");
+
+$code .= <<___;
+.p2align 3
+.globl gcm_ghash_rv64i_zvkb
+.type gcm_ghash_rv64i_zvkb,\@function
+gcm_ghash_rv64i_zvkb:
+    ld $TMP0, ($Htable)
+    ld $TMP1, 8($Htable)
+    li $TMP2, 63
+    la $TMP3, Lpolymod
+    ld $TMP3, 8($TMP3)
+
+    # Load/store data in reverse order.
+    # This is needed as a part of endianness swap.
+    add $Xi, $Xi, 8
+    add $inp, $inp, 8
+    li $M8, -8
+
+    @{[vsetivli__x0_2_e64_m1_ta_ma]} # vsetivli x0, 2, e64, m1, ta, ma
+
+    @{[vlse64_v $V5, $Xi, $M8]}      # vlse64.v v5, (a0), t4
+
+Lstep:
+    # Read input data
+    @{[vlse64_v $Vinp, $inp, $M8]}   # vle64.v v0, (a2)
+    add $inp, $inp, 16
+    add $len, $len, -16
+    # XOR them into Xi
+    @{[vxor_vv $V5, $V5, $Vinp]}       # vxor.vv v0, v0, v1
+
+    @{[vrev8_v $V5, $V5]}            # vrev8.v v5, v5
+
+    # Multiplication
+
+    # Do two 64x64 multiplications in one go to save some time
+    # and simplify things.
+
+    # A = a1a0 (t1, t0)
+    # B = b1b0 (v5)
+    # C = c1c0 (256 bit)
+    # c1 = a1b1 + (a0b1)h + (a1b0)h
+    # c0 = a0b0 + (a0b1)l + (a1b0)h
+
+    # v1 = (a0b1)l,(a0b0)l
+    @{[vclmul_vx $V1, $V5, $TMP0]}   # vclmul.vx v1, v5, t0
+    # v3 = (a0b1)h,(a0b0)h
+    @{[vclmulh_vx $V3, $V5, $TMP0]}  # vclmulh.vx v3, v5, t0
+
+    # v4 = (a1b1)l,(a1b0)l
+    @{[vclmul_vx $V4, $V5, $TMP1]}   # vclmul.vx v4, v5, t1
+    # v2 = (a1b1)h,(a1b0)h
+    @{[vclmulh_vx $V2, $V5, $TMP1]}   # vclmulh.vx v2, v5, t1
+
+    # Is there a better way to do this?
+    # Would need to swap the order of elements within a vector register.
+    @{[vslideup_vi $V5, $V3, 1]}     # vslideup.vi v5, v3, 1
+    @{[vslideup_vi $V6, $V4, 1]}     # vslideup.vi v6, v4, 1
+    @{[vslidedown_vi $V3, $V3, 1]}   # vslidedown.vi v3, v3, 1
+    @{[vslidedown_vi $V4, $V4, 1]}   # vslidedown.vi v4, v4, 1
+
+    @{[vmv_v_i $V0, 1]}              # vmv.v.i v0, 1
+    # v2 += (a0b1)h
+    @{[vxor_vv_v0t $V2, $V2, $V3]}   # vxor.vv v2, v2, v3, v0.t
+    # v2 += (a1b1)l
+    @{[vxor_vv_v0t $V2, $V2, $V4]}   # vxor.vv v2, v2, v4, v0.t
+
+    @{[vmv_v_i $V0, 2]}              # vmv.v.i v0, 2
+    # v1 += (a0b0)h,0
+    @{[vxor_vv_v0t $V1, $V1, $V5]}   # vxor.vv v1, v1, v5, v0.t
+    # v1 += (a1b0)l,0
+    @{[vxor_vv_v0t $V1, $V1, $V6]}   # vxor.vv v1, v1, v6, v0.t
+
+    # Now the 256bit product should be stored in (v2,v1)
+    # v1 = (a0b1)l + (a0b0)h + (a1b0)l, (a0b0)l
+    # v2 = (a1b1)h, (a1b0)h + (a0b1)h + (a1b1)l
+
+    # Reduction
+    # Let C := A*B = c3,c2,c1,c0 = v2[1],v2[0],v1[1],v1[0]
+    # This is a slight variation of the Gueron's Montgomery reduction.
+    # The difference being the order of some operations has been changed,
+    # to make a better use of vclmul(h) instructions.
+
+    # First step:
+    # c1 += (c0 * P)l
+    # vmv.v.i v0, 2
+    @{[vslideup_vi_v0t $V3, $V1, 1]} # vslideup.vi v3, v1, 1, v0.t
+    @{[vclmul_vx_v0t $V3, $V3, $TMP3]} # vclmul.vx v3, v3, t3, v0.t
+    @{[vxor_vv_v0t $V1, $V1, $V3]}   # vxor.vv v1, v1, v3, v0.t
+
+    # Second step:
+    # D = d1,d0 is final result
+    # We want:
+    # m1 = c1 + (c1 * P)h
+    # m0 = (c1 * P)l + (c0 * P)h + c0
+    # d1 = c3 + m1
+    # d0 = c2 + m0
+
+    #v3 = (c1 * P)l, 0
+    @{[vclmul_vx_v0t $V3, $V1, $TMP3]} # vclmul.vx v3, v1, t3, v0.t
+    #v4 = (c1 * P)h, (c0 * P)h
+    @{[vclmulh_vx $V4, $V1, $TMP3]}   # vclmulh.vx v4, v1, t3
+
+    @{[vmv_v_i $V0, 1]}              # vmv.v.i v0, 1
+    @{[vslidedown_vi $V3, $V3, 1]}   # vslidedown.vi v3, v3, 1
+
+    @{[vxor_vv $V1, $V1, $V4]}       # vxor.vv v1, v1, v4
+    @{[vxor_vv_v0t $V1, $V1, $V3]}   # vxor.vv v1, v1, v3, v0.t
+
+    # XOR in the upper upper part of the product
+    @{[vxor_vv $V2, $V2, $V1]}       # vxor.vv v2, v2, v1
+
+    @{[vrev8_v $V5, $V2]}            # vrev8.v v2, v2
+
+    bnez $len, Lstep
+
+    @{[vsse64_v $V5, $Xi, $M8]}    # vsse64.v v2, (a0), t4
+    ret
+.size gcm_ghash_rv64i_zvkb,.-gcm_ghash_rv64i_zvkb
+___
+}
+
+$code .= <<___;
+.p2align 4
+Lpolymod:
+        .dword 0x0000000000000001
+        .dword 0xc200000000000000
+.size Lpolymod,.-Lpolymod
+___
+
+print $code;
+
+close STDOUT or die "error closing STDOUT: $!";

From patchwork Mon Mar 13 19:12:57 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: =?utf-8?q?Heiko_St=C3=BCbner?= <heiko@sntech.de>
X-Patchwork-Id: 69065
Return-Path: <linux-kernel-owner@vger.kernel.org>
Delivered-To: ouuuleilei@gmail.com
Received: by 2002:a5d:5915:0:0:0:0:0 with SMTP id v21csp1363997wrd;
        Mon, 13 Mar 2023 12:24:03 -0700 (PDT)
X-Google-Smtp-Source: 
 AK7set8SjJVYTdd9r9BfqTji2jNCxNa7EPn6RCBIlGYELnd7B8NjgI8RZZKhdRU3OlQvHopDLkGh
X-Received: by 2002:a17:902:d2d0:b0:19e:65db:7ac3 with SMTP id
 n16-20020a170902d2d000b0019e65db7ac3mr45473808plc.68.1678735443410;
        Mon, 13 Mar 2023 12:24:03 -0700 (PDT)
ARC-Seal: i=1; a=rsa-sha256; t=1678735443; cv=none;
        d=google.com; s=arc-20160816;
        b=w8/OhOG3mhah1/65aHwurQi7qb9zR7Y/BsRPzTZJl5ELuVFjjC8Wze4f9v4vgVJJ1b
         XLBg6SruYM7/nStOKa486ubSB0IKTjntq3zGuccXIL9kSn6mmnYn4hVb6Lr+/O7R3wnY
         fbsmmJftcZJwBoluihg5PtDQb5I9J+tDCOXHPSwu6FOmH5RE1irxVKkH4U52qI2vdNU0
         mW/3Br0J/Y/HaWuagm8H3DL13fjnTNPbQOQm7svRBVIxa/FqIgZ7U1B0dxGBBwu4F5l1
         p0YAqCe8njUoCF8BErsAtJAOl72QArezvlpcT2lXaK9FaTrLBzmVpfU+KNMEtC+MfrRF
         lnZQ==
ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com;
 s=arc-20160816;
        h=list-id:precedence:content-transfer-encoding:mime-version
         :references:in-reply-to:message-id:date:subject:cc:to:from;
        bh=xST8Jct9HnqAqddp8dvaQgLM0OC67YRnG9StxVmaRBk=;
        b=t2r31wgOB2C1OsmyuDtAnSL/W51fJzB1EmsVwPeXAhvVdS2tiksZtC4QU30UirZzFG
         fLxW6r8+anImsGgwnjYiG7erXXF39PuP5QpzJOdHxW96A2htpp8HgiJ7h4K7eIOqK1e+
         PxFizj1y4qUQ7ibSiU2LJxBp2nsbZ4ShSfvf3Yv39b9DdxIhu+ZTRKsRdQxcMqOmNAMA
         xjRcEcEEzZSCLarPWAyfpa7HM1PZAA5VwRLYv4uo1MWlytidsAza12FllP99QKQlIp1r
         i6KW79BHfBqlEhojna9yhV+QXLvRhopMnzOyr5BtJFhRepXD57FVvwfldS0zytz9/plA
         G8yA==
ARC-Authentication-Results: i=1; mx.google.com;
       spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org
 designates 2620:137:e000::1:20 as permitted sender)
 smtp.mailfrom=linux-kernel-owner@vger.kernel.org;
       dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=sntech.de
Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20])
        by mx.google.com with ESMTP id
 ko14-20020a17090307ce00b0019f23a15f43si414950plb.578.2023.03.13.12.23.50;
        Mon, 13 Mar 2023 12:24:03 -0700 (PDT)
Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org
 designates 2620:137:e000::1:20 as permitted sender)
 client-ip=2620:137:e000::1:20;
Authentication-Results: mx.google.com;
       spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org
 designates 2620:137:e000::1:20 as permitted sender)
 smtp.mailfrom=linux-kernel-owner@vger.kernel.org;
       dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=sntech.de
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S229694AbjCMTVZ (ORCPT <rfc822;realc9580@gmail.com> + 99 others);
        Mon, 13 Mar 2023 15:21:25 -0400
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:48480 "EHLO
        lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S230448AbjCMTVO (ORCPT
        <rfc822;linux-kernel@vger.kernel.org>);
        Mon, 13 Mar 2023 15:21:14 -0400
Received: from gloria.sntech.de (gloria.sntech.de [185.11.138.130])
        by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 673896A06F
        for <linux-kernel@vger.kernel.org>;
 Mon, 13 Mar 2023 12:20:43 -0700 (PDT)
Received: from ip4d1634a9.dynamic.kabel-deutschland.de ([77.22.52.169]
 helo=phil.lan)
        by gloria.sntech.de with esmtpsa  (TLS1.3) tls
 TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384
        (Exim 4.94.2)
        (envelope-from <heiko@sntech.de>)
        id 1pbnbW-00028k-K2; Mon, 13 Mar 2023 20:13:06 +0100
From: Heiko Stuebner <heiko@sntech.de>
To: palmer@rivosinc.com
Cc: greentime.hu@sifive.com, conor@kernel.org,
        linux-kernel@vger.kernel.org, linux-riscv@lists.infradead.org,
        christoph.muellner@vrull.eu, heiko@sntech.de
Subject: [PATCH RFC v3 11/16] RISC-V: crypto: add Zvkg accelerated GCM GHASH
 implementation
Date: Mon, 13 Mar 2023 20:12:57 +0100
Message-Id: <20230313191302.580787-12-heiko.stuebner@vrull.eu>
X-Mailer: git-send-email 2.39.0
In-Reply-To: <20230313191302.580787-1-heiko.stuebner@vrull.eu>
References: <20230313191302.580787-1-heiko.stuebner@vrull.eu>
MIME-Version: 1.0
X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00,SPF_PASS,
        T_SPF_HELO_TEMPERROR autolearn=ham autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on
        lindbergh.monkeyblade.net
Precedence: bulk
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org
X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?=
X-GMAIL-THRID: =?utf-8?q?1760281696311367576?=
X-GMAIL-MSGID: =?utf-8?q?1760281696311367576?=

From: Heiko Stuebner <heiko.stuebner@vrull.eu>

When the Zvkg vector crypto extension is available another optimized
gcm ghash variant is possible, so add it as another implmentation.

Co-developed-by: Christoph Müllner <christoph.muellner@vrull.eu>
Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>
Signed-off-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
---
 arch/riscv/crypto/Kconfig               |   1 +
 arch/riscv/crypto/Makefile              |   7 +-
 arch/riscv/crypto/ghash-riscv64-glue.c  |  80 ++++++++++++
 arch/riscv/crypto/ghash-riscv64-zvkg.pl | 161 ++++++++++++++++++++++++
 4 files changed, 247 insertions(+), 2 deletions(-)
 create mode 100644 arch/riscv/crypto/ghash-riscv64-zvkg.pl

diff --git a/arch/riscv/crypto/Kconfig b/arch/riscv/crypto/Kconfig
index 404fd9b3cb7c..84da19bdde8b 100644
--- a/arch/riscv/crypto/Kconfig
+++ b/arch/riscv/crypto/Kconfig
@@ -13,5 +13,6 @@ config CRYPTO_GHASH_RISCV64
 	  Architecture: riscv64 using one of:
 	  - ZBC extension
 	  - ZVKB vector crypto extension
+	  - ZVKG vector crypto extension
 
 endmenu
diff --git a/arch/riscv/crypto/Makefile b/arch/riscv/crypto/Makefile
index 8ab9a0ae8f2d..1ee0ce7d3264 100644
--- a/arch/riscv/crypto/Makefile
+++ b/arch/riscv/crypto/Makefile
@@ -9,7 +9,7 @@ ifdef CONFIG_RISCV_ISA_ZBC
 ghash-riscv64-y += ghash-riscv64-zbc.o
 endif
 ifdef CONFIG_RISCV_ISA_V
-ghash-riscv64-y += ghash-riscv64-zvkb.o
+ghash-riscv64-y += ghash-riscv64-zvkb.o ghash-riscv64-zvkg.o
 endif
 
 quiet_cmd_perlasm = PERLASM $@
@@ -21,4 +21,7 @@ $(obj)/ghash-riscv64-zbc.S: $(src)/ghash-riscv64-zbc.pl
 $(obj)/ghash-riscv64-zvkb.S: $(src)/ghash-riscv64-zvkb.pl
 	$(call cmd,perlasm)
 
-clean-files += ghash-riscv64-zbc.S ghash-riscv64-zvkb.S
+$(obj)/ghash-riscv64-zvkg.S: $(src)/ghash-riscv64-zvkg.pl
+	$(call cmd,perlasm)
+
+clean-files += ghash-riscv64-zbc.S ghash-riscv64-zvkb.S ghash-riscv64-zvkg.S
diff --git a/arch/riscv/crypto/ghash-riscv64-glue.c b/arch/riscv/crypto/ghash-riscv64-glue.c
index 004a1a11d7d8..9c7a8616049e 100644
--- a/arch/riscv/crypto/ghash-riscv64-glue.c
+++ b/arch/riscv/crypto/ghash-riscv64-glue.c
@@ -26,6 +26,10 @@ void gcm_ghash_rv64i_zbc__zbkb(u64 Xi[2], const u128 Htable[16],
 void gcm_ghash_rv64i_zvkb(u64 Xi[2], const u128 Htable[16],
 			  const u8 *inp, size_t len);
 
+/* Zvkg (vector crypto with vghmac.vv). */
+void gcm_ghash_rv64i_zvkg(u64 Xi[2], const u128 Htable[16],
+			  const u8 *inp, size_t len);
+
 struct riscv64_ghash_ctx {
 	void (*ghash_func)(u64 Xi[2], const u128 Htable[16],
 			   const u8 *inp, size_t len);
@@ -183,6 +187,63 @@ struct shash_alg riscv64_zvkb_ghash_alg = {
 	},
 };
 
+RISCV64_ZVK_SETKEY(zvkg, zvkg);
+struct shash_alg riscv64_zvkg_ghash_alg = {
+	.digestsize = GHASH_DIGEST_SIZE,
+	.init = riscv64_ghash_init,
+	.update = riscv64_zvk_ghash_update,
+	.final = riscv64_zvk_ghash_final,
+	.setkey = riscv64_zvk_ghash_setkey_zvkg,
+	.descsize = sizeof(struct riscv64_ghash_desc_ctx)
+		    + sizeof(struct ghash_desc_ctx),
+	.base = {
+		 .cra_name = "ghash",
+		 .cra_driver_name = "riscv64_zvkg_ghash",
+		 .cra_priority = 301,
+		 .cra_blocksize = GHASH_BLOCK_SIZE,
+		 .cra_ctxsize = sizeof(struct riscv64_ghash_ctx),
+		 .cra_module = THIS_MODULE,
+	},
+};
+
+RISCV64_ZVK_SETKEY(zvkg__zbb_or_zbkb, zvkg);
+struct shash_alg riscv64_zvkg_zbb_or_zbkb_ghash_alg = {
+	.digestsize = GHASH_DIGEST_SIZE,
+	.init = riscv64_ghash_init,
+	.update = riscv64_zvk_ghash_update,
+	.final = riscv64_zvk_ghash_final,
+	.setkey = riscv64_zvk_ghash_setkey_zvkg__zbb_or_zbkb,
+	.descsize = sizeof(struct riscv64_ghash_desc_ctx)
+		    + sizeof(struct ghash_desc_ctx),
+	.base = {
+		 .cra_name = "ghash",
+		 .cra_driver_name = "riscv64_zvkg_zbb_or_zbkb_ghash",
+		 .cra_priority = 302,
+		 .cra_blocksize = GHASH_BLOCK_SIZE,
+		 .cra_ctxsize = sizeof(struct riscv64_ghash_ctx),
+		 .cra_module = THIS_MODULE,
+	},
+};
+
+RISCV64_ZVK_SETKEY(zvkg__zvkb, zvkg);
+struct shash_alg riscv64_zvkg_zvkb_ghash_alg = {
+	.digestsize = GHASH_DIGEST_SIZE,
+	.init = riscv64_ghash_init,
+	.update = riscv64_zvk_ghash_update,
+	.final = riscv64_zvk_ghash_final,
+	.setkey = riscv64_zvk_ghash_setkey_zvkg__zvkb,
+	.descsize = sizeof(struct riscv64_ghash_desc_ctx)
+		    + sizeof(struct ghash_desc_ctx),
+	.base = {
+		 .cra_name = "ghash",
+		 .cra_driver_name = "riscv64_zvkg_zvkb_ghash",
+		 .cra_priority = 303,
+		 .cra_blocksize = GHASH_BLOCK_SIZE,
+		 .cra_ctxsize = sizeof(struct riscv64_ghash_ctx),
+		 .cra_module = THIS_MODULE,
+	},
+};
+
 #endif /* CONFIG_RISCV_ISA_V */
 
 #ifdef CONFIG_RISCV_ISA_ZBC
@@ -381,6 +442,25 @@ static int __init riscv64_ghash_mod_init(void)
 		if (ret < 0)
 			return ret;
 	}
+
+	if (riscv_isa_extension_available(NULL, ZVKG)) {
+		ret = riscv64_ghash_register(&riscv64_zvkg_ghash_alg);
+		if (ret < 0)
+			return ret;
+
+		if (riscv_isa_extension_available(NULL, ZVKB)) {
+			ret = riscv64_ghash_register(&riscv64_zvkg_zvkb_ghash_alg);
+			if (ret < 0)
+				return ret;
+		}
+
+		if (riscv_isa_extension_available(NULL, ZBB) ||
+		    riscv_isa_extension_available(NULL, ZBKB)) {
+			ret = riscv64_ghash_register(&riscv64_zvkg_zbb_or_zbkb_ghash_alg);
+			if (ret < 0)
+				return ret;
+		}
+	}
 #endif
 
 	return 0;
diff --git a/arch/riscv/crypto/ghash-riscv64-zvkg.pl b/arch/riscv/crypto/ghash-riscv64-zvkg.pl
new file mode 100644
index 000000000000..c13dd9c4ee31
--- /dev/null
+++ b/arch/riscv/crypto/ghash-riscv64-zvkg.pl
@@ -0,0 +1,161 @@
+#! /usr/bin/env perl
+# Copyright 2023 The OpenSSL Project Authors. All Rights Reserved.
+#
+# Licensed under the Apache License 2.0 (the "License").  You may not use
+# this file except in compliance with the License.  You can obtain a copy
+# in the file LICENSE in the source distribution or at
+# https://www.openssl.org/source/license.html
+
+# - RV64I
+# - RISC-V vector ('V') with VLEN >= 128
+# - RISC-V vector crypto GHASH extension ('Zvkg')
+
+use strict;
+use warnings;
+
+use FindBin qw($Bin);
+use lib "$Bin";
+use lib "$Bin/../../perlasm";
+use riscv;
+
+# $output is the last argument if it looks like a file (it has an extension)
+# $flavour is the first argument if it doesn't look like a file
+my $output = $#ARGV >= 0 && $ARGV[$#ARGV] =~ m|\.\w+$| ? pop : undef;
+my $flavour = $#ARGV >= 0 && $ARGV[0] !~ m|\.| ? shift : undef;
+
+$output and open STDOUT,">$output";
+
+my $code=<<___;
+.text
+___
+
+################################################################################
+# void gcm_init_rv64i_zvkg(u128 Htable[16], const u64 H[2]);
+# void gcm_init_rv64i_zvkg__zbb_or_zbkb(u128 Htable[16], const u64 H[2]);
+# void gcm_init_rv64i_zvkg__zvkb(u128 Htable[16], const u64 H[2]);
+#
+# input: H: 128-bit H - secret parameter E(K, 0^128)
+# output: Htable: Copy of secret parameter (in normalized byte order)
+#
+# All callers of this function revert the byte-order unconditionally
+# on little-endian machines. So we need to revert the byte-order back.
+{
+my ($Htable,$H,$VAL0,$VAL1,$TMP0) = ("a0","a1","a2","a3","t0");
+
+$code .= <<___;
+.p2align 3
+.globl gcm_init_rv64i_zvkg
+.type gcm_init_rv64i_zvkg,\@function
+gcm_init_rv64i_zvkg:
+    # First word
+    ld      $VAL0, 0($H)
+    ld      $VAL1, 8($H)
+    @{[sd_rev8_rv64i $VAL0, $Htable, 0, $TMP0]}
+    @{[sd_rev8_rv64i $VAL1, $Htable, 8, $TMP0]}
+    ret
+.size gcm_init_rv64i_zvkg,.-gcm_init_rv64i_zvkg
+___
+}
+
+{
+my ($Htable,$H,$TMP0,$TMP1) = ("a0","a1","t0","t1");
+
+$code .= <<___;
+.p2align 3
+.globl gcm_init_rv64i_zvkg__zbb_or_zbkb
+.type gcm_init_rv64i_zvkg__zbb_or_zbkb,\@function
+gcm_init_rv64i_zvkg__zbb_or_zbkb:
+    ld      $TMP0,0($H)
+    ld      $TMP1,8($H)
+    @{[rev8 $TMP0, $TMP0]}           #rev8    $TMP0, $TMP0
+    @{[rev8 $TMP1, $TMP1]}           #rev8    $TMP1, $TMP1
+    sd      $TMP0,0($Htable)
+    sd      $TMP1,8($Htable)
+    ret
+.size gcm_init_rv64i_zvkg__zbb_or_zbkb,.-gcm_init_rv64i_zvkg__zbb_or_zbkb
+___
+}
+
+{
+my ($Htable,$H,$V0) = ("a0","a1","v0");
+
+$code .= <<___;
+.p2align 3
+.globl gcm_init_rv64i_zvkg__zvkb
+.type gcm_init_rv64i_zvkg__zvkb,\@function
+gcm_init_rv64i_zvkg__zvkb:
+    # All callers of this function revert the byte-order unconditionally
+    # on little-endian machines. So we need to revert the byte-order back.
+    @{[vsetivli__x0_2_e64_m1_ta_ma]} # vsetivli x0, 2, e64, m1, ta, ma
+    @{[vle64_v $V0, $H]}             # vle64.v v0, (a1)
+    @{[vrev8_v $V0, $V0]}            # vrev8.v v0, v0
+    @{[vse64_v $V0, $Htable]}        # vse64.v v0, (a0)
+    ret
+.size gcm_init_rv64i_zvkg__zvkb,.-gcm_init_rv64i_zvkg__zvkb
+___
+}
+
+################################################################################
+# void gcm_gmult_rv64i_zvkg(u64 Xi[2], const u128 Htable[16]);
+#
+# input: Xi: current hash value
+#        Htable: copy of H
+# output: Xi: next hash value Xi
+{
+my ($Xi,$Htable) = ("a0","a1");
+my ($VD,$VS2) = ("v1","v2");
+
+$code .= <<___;
+.p2align 3
+.globl gcm_gmult_rv64i_zvkg
+.type gcm_gmult_rv64i_zvkg,\@function
+gcm_gmult_rv64i_zvkg:
+    @{[vsetivli__x0_4_e32_m1_ta_ma]}
+    @{[vle32_v $VS2, $Htable]}
+    @{[vle32_v $VD, $Xi]}
+    @{[vgmul_vv $VD, $VS2]}
+    @{[vse32_v $VD, $Xi]}
+    ret
+.size gcm_gmult_rv64i_zvkg,.-gcm_gmult_rv64i_zvkg
+___
+}
+
+################################################################################
+# void gcm_ghash_rv64i_zvkg(u64 Xi[2], const u128 Htable[16],
+#                           const u8 *inp, size_t len);
+#
+# input: Xi: current hash value
+#        Htable: copy of H
+#        inp: pointer to input data
+#        len: length of input data in bytes (mutiple of block size)
+# output: Xi: Xi+1 (next hash value Xi)
+{
+my ($Xi,$Htable,$inp,$len) = ("a0","a1","a2","a3");
+my ($vXi,$vH,$vinp,$Vzero) = ("v1","v2","v3","v4");
+
+$code .= <<___;
+.p2align 3
+.globl gcm_ghash_rv64i_zvkg
+.type gcm_ghash_rv64i_zvkg,\@function
+gcm_ghash_rv64i_zvkg:
+    @{[vsetivli__x0_4_e32_m1_ta_ma]}
+    @{[vle32_v $vH, $Htable]}
+    @{[vle32_v $vXi, $Xi]}
+
+Lstep:
+    @{[vle32_v $vinp, $inp]}
+    add $inp, $inp, 16
+    add $len, $len, -16
+    @{[vghsh_vv $vXi, $vinp, $vH]}
+    bnez $len, Lstep
+
+    @{[vse32_v $vXi, $Xi]}
+    ret
+
+.size gcm_ghash_rv64i_zvkg,.-gcm_ghash_rv64i_zvkg
+___
+}
+
+print $code;
+
+close STDOUT or die "error closing STDOUT: $!";

From patchwork Mon Mar 13 19:12:58 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: =?utf-8?q?Heiko_St=C3=BCbner?= <heiko@sntech.de>
X-Patchwork-Id: 69061
Return-Path: <linux-kernel-owner@vger.kernel.org>
Delivered-To: ouuuleilei@gmail.com
Received: by 2002:a5d:5915:0:0:0:0:0 with SMTP id v21csp1361726wrd;
        Mon, 13 Mar 2023 12:19:29 -0700 (PDT)
X-Google-Smtp-Source: 
 AK7set8Og2/W23dTko4//bR4XgnPhh9Rl6mciocqcLtQYcLOTwmkcfjj7nFvSOTsDIvD36p2Kl9n
X-Received: by 2002:a17:902:8bcb:b0:19e:ad18:da5c with SMTP id
 r11-20020a1709028bcb00b0019ead18da5cmr24705455plo.37.1678735169149;
        Mon, 13 Mar 2023 12:19:29 -0700 (PDT)
ARC-Seal: i=1; a=rsa-sha256; t=1678735169; cv=none;
        d=google.com; s=arc-20160816;
        b=UAmtg1AwBWL1GnQEJOU8KIHSA3weYohLjai2djaRJ8dlX6LhRM+3uHZxX/G0+BluPB
         YNaR0GlRYuiP+vcgHRLOx01NcZYVCgICFvCpbJwFlNTr+9JJo6mFLLPmrXbVgqyFcFvh
         b3GDcN/iNOaKH1PapkhpuLrj4nvUlxJandOwXML0y9pA3/ekJBByA+Wb/zAVCmP/Y8Eo
         twDB0v2AHcfg2Oj+Qa5L5I1OafywnsXAoKnXfLvAO6unTWD+MkW4tJCa0AfrjPBacWFt
         UpXgQ6LEyP1R+iYIZ1GZwzkRQl5ZcKo0Jz2k9QKiynILgCmL9VvRPs9Q2zaml1Q5Nh6m
         /jYA==
ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com;
 s=arc-20160816;
        h=list-id:precedence:content-transfer-encoding:mime-version
         :references:in-reply-to:message-id:date:subject:cc:to:from;
        bh=bw5yiJQTAxNLkWR3Tk3nByraK0m2ufI4SaHxvRLm4MU=;
        b=OaY8z7I5/6mMcWCADAGQ4888wPY/D+gw6FznkOHwuClgxrD+0Qoif/YDrKpt0G5Qra
         o7O9Jia6MpnJydWDiyg0O1ASEfqI6pk5xG8lDTkOvH/9mrIpEhRe7YPC6kH3aHp0Bdm3
         CE63pURorrucWGd+wc2SFGQ20tDS0XJxlwj5bNA3fItnNruSH73gBQJvXyU+M9MIY9Pf
         +2TvcfG0iCnqgKNvIPI5iawIb+UPUM8Jwutz79Ch7h4GiYt+vQTdXNQELU1XpIZexfbg
         +Y+ZUGaCqubP28/XLYK0tsXTFot6DExKYWB2E5tBjpN7ySBxRoYGXqx+PrZvNGHblsZP
         A2QQ==
ARC-Authentication-Results: i=1; mx.google.com;
       spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org
 designates 2620:137:e000::1:20 as permitted sender)
 smtp.mailfrom=linux-kernel-owner@vger.kernel.org;
       dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=sntech.de
Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20])
        by mx.google.com with ESMTP id
 d18-20020a170903231200b0019d04cfd41asi410800plh.593.2023.03.13.12.19.12;
        Mon, 13 Mar 2023 12:19:29 -0700 (PDT)
Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org
 designates 2620:137:e000::1:20 as permitted sender)
 client-ip=2620:137:e000::1:20;
Authentication-Results: mx.google.com;
       spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org
 designates 2620:137:e000::1:20 as permitted sender)
 smtp.mailfrom=linux-kernel-owner@vger.kernel.org;
       dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=sntech.de
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S230455AbjCMTQm (ORCPT <rfc822;realc9580@gmail.com> + 99 others);
        Mon, 13 Mar 2023 15:16:42 -0400
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:42358 "EHLO
        lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S230395AbjCMTQd (ORCPT
        <rfc822;linux-kernel@vger.kernel.org>);
        Mon, 13 Mar 2023 15:16:33 -0400
Received: from gloria.sntech.de (gloria.sntech.de [185.11.138.130])
        by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 73DD37B4AC
        for <linux-kernel@vger.kernel.org>;
 Mon, 13 Mar 2023 12:16:12 -0700 (PDT)
Received: from ip4d1634a9.dynamic.kabel-deutschland.de ([77.22.52.169]
 helo=phil.lan)
        by gloria.sntech.de with esmtpsa  (TLS1.3) tls
 TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384
        (Exim 4.94.2)
        (envelope-from <heiko@sntech.de>)
        id 1pbnbW-00028k-Sz; Mon, 13 Mar 2023 20:13:06 +0100
From: Heiko Stuebner <heiko@sntech.de>
To: palmer@rivosinc.com
Cc: greentime.hu@sifive.com, conor@kernel.org,
        linux-kernel@vger.kernel.org, linux-riscv@lists.infradead.org,
        christoph.muellner@vrull.eu, heiko@sntech.de
Subject: [PATCH RFC v3 12/16] RISC-V: crypto: add a vector-crypto-accelerated
 SHA256 implementation
Date: Mon, 13 Mar 2023 20:12:58 +0100
Message-Id: <20230313191302.580787-13-heiko.stuebner@vrull.eu>
X-Mailer: git-send-email 2.39.0
In-Reply-To: <20230313191302.580787-1-heiko.stuebner@vrull.eu>
References: <20230313191302.580787-1-heiko.stuebner@vrull.eu>
MIME-Version: 1.0
X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00,SPF_PASS,
        T_SPF_HELO_TEMPERROR autolearn=ham autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on
        lindbergh.monkeyblade.net
Precedence: bulk
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org
X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?=
X-GMAIL-THRID: =?utf-8?q?1760281408729327811?=
X-GMAIL-MSGID: =?utf-8?q?1760281408729327811?=

From: Heiko Stuebner <heiko.stuebner@vrull.eu>

This adds an accelerated SHA256 algorithm using either the Zvknha
or Zvknhb vector crypto extensions. The spec says that

    Zvknhb supports SHA-256 and SHA-512. Zvknha supports only SHA-256.

so the relevant acclerating instructions are included in both.

Co-developed-by: Charalampos Mitrodimas <charalampos.mitrodimas@vrull.eu>
Signed-off-by: Charalampos Mitrodimas <charalampos.mitrodimas@vrull.eu>
Signed-off-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
---
 arch/riscv/crypto/Kconfig                  |  11 +
 arch/riscv/crypto/Makefile                 |   7 +
 arch/riscv/crypto/sha256-riscv64-glue.c    | 114 +++++++++
 arch/riscv/crypto/sha256-riscv64-zvknha.pl | 284 +++++++++++++++++++++
 4 files changed, 416 insertions(+)
 create mode 100644 arch/riscv/crypto/sha256-riscv64-glue.c
 create mode 100644 arch/riscv/crypto/sha256-riscv64-zvknha.pl

diff --git a/arch/riscv/crypto/Kconfig b/arch/riscv/crypto/Kconfig
index 84da19bdde8b..8645e02171f7 100644
--- a/arch/riscv/crypto/Kconfig
+++ b/arch/riscv/crypto/Kconfig
@@ -15,4 +15,15 @@ config CRYPTO_GHASH_RISCV64
 	  - ZVKB vector crypto extension
 	  - ZVKG vector crypto extension
 
+config CRYPTO_SHA256_RISCV64
+	tristate "Hash functions: SHA-256"
+	depends on 64BIT && RISCV_ISA_V
+	select CRYPTO_HASH
+	select CRYPTO_LIB_SHA256
+	help
+	  SHA-256 secure hash algorithm (FIPS 180)
+
+	  Architecture: riscv64 using
+	  - Zvknha or Zvknhb vector crypto extensions
+
 endmenu
diff --git a/arch/riscv/crypto/Makefile b/arch/riscv/crypto/Makefile
index 1ee0ce7d3264..02b3b4c32672 100644
--- a/arch/riscv/crypto/Makefile
+++ b/arch/riscv/crypto/Makefile
@@ -12,6 +12,9 @@ ifdef CONFIG_RISCV_ISA_V
 ghash-riscv64-y += ghash-riscv64-zvkb.o ghash-riscv64-zvkg.o
 endif
 
+obj-$(CONFIG_CRYPTO_SHA256_RISCV64) += sha256-riscv64.o
+sha256-riscv64-y := sha256-riscv64-glue.o sha256-riscv64-zvknhb.o
+
 quiet_cmd_perlasm = PERLASM $@
       cmd_perlasm = $(PERL) $(<) void $(@)
 
@@ -24,4 +27,8 @@ $(obj)/ghash-riscv64-zvkb.S: $(src)/ghash-riscv64-zvkb.pl
 $(obj)/ghash-riscv64-zvkg.S: $(src)/ghash-riscv64-zvkg.pl
 	$(call cmd,perlasm)
 
+$(obj)/sha256-riscv64-zvknhb.S: $(src)/sha256-riscv64-zvknha.pl
+	$(call cmd,perlasm)
+
 clean-files += ghash-riscv64-zbc.S ghash-riscv64-zvkb.S ghash-riscv64-zvkg.S
+clean-files += sha256-riscv64-zvknha.S
diff --git a/arch/riscv/crypto/sha256-riscv64-glue.c b/arch/riscv/crypto/sha256-riscv64-glue.c
new file mode 100644
index 000000000000..8e3bb1deaad5
--- /dev/null
+++ b/arch/riscv/crypto/sha256-riscv64-glue.c
@@ -0,0 +1,114 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * Linux/riscv64 port of the OpenSSL SHA256 implementation for RISCV64
+ *
+ * Copyright (C) 2022 VRULL GmbH
+ * Author: Heiko Stuebner <heiko.stuebner@vrull.eu>
+ */
+
+#include <linux/types.h>
+#include <asm/simd.h>
+#include <asm/vector.h>
+#include <crypto/internal/hash.h>
+#include <crypto/internal/simd.h>
+#include <crypto/sha2.h>
+#include <crypto/sha256_base.h>
+
+asmlinkage void sha256_block_data_order_zvknha(u32 *digest, const void *data,
+					unsigned int num_blks);
+
+static void __sha256_block_data_order(struct sha256_state *sst, u8 const *src,
+				      int blocks)
+{
+	sha256_block_data_order_zvknha(sst->state, src, blocks);
+}
+
+static int riscv64_sha256_update(struct shash_desc *desc, const u8 *data,
+			 unsigned int len)
+{
+	if (crypto_simd_usable()) {
+		int ret;
+
+		kernel_rvv_begin();
+		ret = sha256_base_do_update(desc, data, len,
+					    __sha256_block_data_order);
+		kernel_rvv_end();
+		return ret;
+	} else { 
+		sha256_update(shash_desc_ctx(desc), data, len);
+		return 0;
+	}
+}
+
+static int riscv64_sha256_finup(struct shash_desc *desc, const u8 *data,
+			unsigned int len, u8 *out)
+{
+	if (!crypto_simd_usable()) {
+		sha256_update(shash_desc_ctx(desc), data, len);
+		sha256_final(shash_desc_ctx(desc), out);
+		return 0;
+	}
+
+	kernel_rvv_begin();
+	if (len)
+		sha256_base_do_update(desc, data, len,
+				      __sha256_block_data_order);
+
+	sha256_base_do_finalize(desc, __sha256_block_data_order);
+	kernel_rvv_end();
+
+	return sha256_base_finish(desc, out);
+}
+
+static int riscv64_sha256_final(struct shash_desc *desc, u8 *out)
+{
+	return riscv64_sha256_finup(desc, NULL, 0, out);
+}
+
+static struct shash_alg sha256_alg = {
+	.digestsize		= SHA256_DIGEST_SIZE,
+	.init			= sha256_base_init,
+	.update			= riscv64_sha256_update,
+	.final			= riscv64_sha256_final,
+	.finup			= riscv64_sha256_finup,
+	.descsize		= sizeof(struct sha256_state),
+	.base.cra_name		= "sha256",
+	.base.cra_driver_name	= "sha256-riscv64-zvknha",
+	.base.cra_priority	= 150,
+	.base.cra_blocksize	= SHA256_BLOCK_SIZE,
+	.base.cra_module	= THIS_MODULE,
+};
+
+static int __init sha256_mod_init(void)
+{
+	/*
+	 * From the spec:
+	 * Zvknhb supports SHA-256 and SHA-512. Zvknha supports only SHA-256.
+	 */
+	if ((riscv_isa_extension_available(NULL, ZVKNHA) ||
+	     riscv_isa_extension_available(NULL, ZVKNHB)) &&
+	     riscv_isa_extension_available(NULL, ZVKB) &&
+	     riscv_vector_vlen() >= 128)
+
+		return crypto_register_shash(&sha256_alg);
+
+	return 0;
+}
+
+static void __exit sha256_mod_fini(void)
+{
+	if ((riscv_isa_extension_available(NULL, ZVKNHA) ||
+	     riscv_isa_extension_available(NULL, ZVKNHB)) &&
+	     riscv_isa_extension_available(NULL, ZVKB) &&
+	     riscv_vector_vlen() >= 128)
+		crypto_unregister_shash(&sha256_alg);
+}
+
+module_init(sha256_mod_init);
+module_exit(sha256_mod_fini);
+
+MODULE_DESCRIPTION("SHA-256 secure hash for riscv64");
+MODULE_AUTHOR("Andy Polyakov <appro@openssl.org>");
+MODULE_AUTHOR("Heiko Stuebner <heiko.stuebner@vrull.eu>");
+MODULE_LICENSE("GPL v2");
+MODULE_ALIAS_CRYPTO("sha256");
diff --git a/arch/riscv/crypto/sha256-riscv64-zvknha.pl b/arch/riscv/crypto/sha256-riscv64-zvknha.pl
new file mode 100644
index 000000000000..c4ac20d7d138
--- /dev/null
+++ b/arch/riscv/crypto/sha256-riscv64-zvknha.pl
@@ -0,0 +1,284 @@
+#! /usr/bin/env perl
+# Copyright 2023 The OpenSSL Project Authors. All Rights Reserved.
+#
+# Licensed under the Apache License 2.0 (the "License").  You may not use
+# this file except in compliance with the License.  You can obtain a copy
+# in the file LICENSE in the source distribution or at
+# https://www.openssl.org/source/license.html
+
+# The generated code of this file depends on the following RISC-V extensions:
+# - RV64I
+# - RISC-V vector ('V') with VLEN >= 128
+# - Vector Bit-manipulation used in Cryptography ('Zvkb')
+# - Vector SHA-2 Secure Hash ('Zvknha')
+
+use strict;
+use warnings;
+
+use FindBin qw($Bin);
+use lib "$Bin";
+use lib "$Bin/../../perlasm";
+use riscv;
+
+# $output is the last argument if it looks like a file (it has an extension)
+# $flavour is the first argument if it doesn't look like a file
+my $output = $#ARGV >= 0 && $ARGV[$#ARGV] =~ m|\.\w+$| ? pop : undef;
+my $flavour = $#ARGV >= 0 && $ARGV[0] !~ m|\.| ? shift : undef;
+
+$output and open STDOUT,">$output";
+
+my $code=<<___;
+.text
+___
+
+my ($V0, $V10, $V11, $V12, $V13, $V14, $V15, $V16, $V17) = ("v0", "v10", "v11", "v12", "v13", "v14","v15", "v16", "v17");
+my ($V26, $V27) = ("v26", "v27");
+
+my $K256 = "K256";
+
+# Function arguments
+my ($H, $INP, $LEN, $KT, $STRIDE) = ("a0", "a1", "a2", "a3", "t3");
+
+################################################################################
+# void sha256_block_data_order(void *c, const void *p, size_t len)
+$code .= <<___;
+.p2align 2
+.globl sha256_block_data_order_zvknha
+.type   sha256_block_data_order_zvknha,\@function
+sha256_block_data_order_zvknha:
+    @{[vsetivli__x0_4_e32_m1_ta_ma]}
+
+    # H is stored as {a,b,c,d},{e,f,g,h}, but we need {f,e,b,a},{h,g,d,c}
+    # We achieve this by reading with a negative stride followed by
+    # element sliding.
+    li $STRIDE, -4
+    addi $H, $H, 12
+    @{[vlse32_v $V16, $H, $STRIDE]} # {d,c,b,a}
+    addi $H, $H, 16
+    @{[vlse32_v $V17, $H, $STRIDE]} # {h,g,f,e}
+    # Keep H advanced by 12
+    addi $H, $H, -16
+
+    @{[vmv_v_v $V27, $V16]} # {d,c,b,a}
+    @{[vslidedown_vi $V26, $V16, 2]} # {b,a,0,0}
+    @{[vslidedown_vi $V16, $V17, 2]} # {f,e,0,0}
+    @{[vslideup_vi $V16, $V26, 2]} # {f,e,b,a}
+    @{[vslideup_vi $V17, $V27, 2]} # {h,g,d,c}
+
+    # Keep the old state as we need it later: H' = H+{a',b',c',...,h'}.
+    @{[vmv_v_v $V26, $V16]}
+    @{[vmv_v_v $V27, $V17]}
+
+L_round_loop:
+    la $KT, $K256 # Load round constants K256
+
+    # Load the 512-bits of the message block in v10-v13 and perform
+    # an endian swap on each 4 bytes element.
+    @{[vle32_v $V10, $INP]}
+    @{[vrev8_v $V10, $V10]}
+    add $INP, $INP, 16
+    @{[vle32_v $V11, $INP]}
+    @{[vrev8_v $V11, $V11]}
+    add $INP, $INP, 16
+    @{[vle32_v $V12, $INP]}
+    @{[vrev8_v $V12, $V12]}
+    add $INP, $INP, 16
+    @{[vle32_v $V13, $INP]}
+    @{[vrev8_v $V13, $V13]}
+    add $INP, $INP, 16
+
+    # Decrement length by 1
+    add $LEN, $LEN, -1
+
+    # Set v0 up for the vmerge that replaces the first word (idx==0)
+    @{[vid_v $V0]}
+    @{[vmseq_vi $V0, $V0, 0x0]}    # v0.mask[i] = (i == 0 ? 1 : 0)
+
+    # Quad-round 0 (+0, Wt from oldest to newest in v10->v11->v12->v13)
+    @{[vle32_v $V15, $KT]}
+    addi $KT, $KT, 16
+    @{[vadd_vv $V14, $V15, $V10]}
+    @{[vsha2cl_vv $V17, $V16, $V14]}
+    @{[vsha2ch_vv $V16, $V17, $V14]}
+    @{[vmerge_vvm $V14, $V12, $V11, $V0]}
+    @{[vsha2ms_vv $V10, $V14, $V13]}  # Generate W[19:16]
+
+    # Quad-round 1 (+1, v11->v12->v13->v10)
+    @{[vle32_v $V15, $KT]}
+    addi $KT, $KT, 16
+    @{[vadd_vv $V14, $V15, $V11]}
+    @{[vsha2cl_vv $V17, $V16, $V14]}
+    @{[vsha2ch_vv $V16, $V17, $V14]}
+    @{[vmerge_vvm $V14, $V13, $V12, $V0]}
+    @{[vsha2ms_vv $V11, $V14, $V10]}  # Generate W[23:20]
+
+    # Quad-round 2 (+2, v12->v13->v10->v11)
+    @{[vle32_v $V15, $KT]}
+    addi $KT, $KT, 16
+    @{[vadd_vv $V14, $V15, $V12]}
+    @{[vsha2cl_vv $V17, $V16, $V14]}
+    @{[vsha2ch_vv $V16, $V17, $V14]}
+    @{[vmerge_vvm $V14, $V10, $V13, $V0]}
+    @{[vsha2ms_vv $V12, $V14, $V11]}  # Generate W[27:24]
+
+    # Quad-round 3 (+3, v13->v10->v11->v12)
+    @{[vle32_v $V15, $KT]}
+    addi $KT, $KT, 16
+    @{[vadd_vv $V14, $V15, $V13]}
+    @{[vsha2cl_vv $V17, $V16, $V14]}
+    @{[vsha2ch_vv $V16, $V17, $V14]}
+    @{[vmerge_vvm $V14, $V11, $V10, $V0]}
+    @{[vsha2ms_vv $V13, $V14, $V12]}  # Generate W[31:28]
+
+    # Quad-round 4 (+0, v10->v11->v12->v13)
+    @{[vle32_v $V15, $KT]}
+    addi $KT, $KT, 16
+    @{[vadd_vv $V14, $V15, $V10]}
+    @{[vsha2cl_vv $V17, $V16, $V14]}
+    @{[vsha2ch_vv $V16, $V17, $V14]}
+    @{[vmerge_vvm $V14, $V12, $V11, $V0]}
+    @{[vsha2ms_vv $V10, $V14, $V13]}  # Generate W[35:32]
+
+    # Quad-round 5 (+1, v11->v12->v13->v10)
+    @{[vle32_v $V15, $KT]}
+    addi $KT, $KT, 16
+    @{[vadd_vv $V14, $V15, $V11]}
+    @{[vsha2cl_vv $V17, $V16, $V14]}
+    @{[vsha2ch_vv $V16, $V17, $V14]}
+    @{[vmerge_vvm $V14, $V13, $V12, $V0]}
+    @{[vsha2ms_vv $V11, $V14, $V10]}  # Generate W[39:36]
+
+    # Quad-round 6 (+2, v12->v13->v10->v11)
+    @{[vle32_v $V15, $KT]}
+    addi $KT, $KT, 16
+    @{[vadd_vv $V14, $V15, $V12]}
+    @{[vsha2cl_vv $V17, $V16, $V14]}
+    @{[vsha2ch_vv $V16, $V17, $V14]}
+    @{[vmerge_vvm $V14, $V10, $V13, $V0]}
+    @{[vsha2ms_vv $V12, $V14, $V11]}  # Generate W[43:40]
+
+    # Quad-round 7 (+3, v13->v10->v11->v12)
+    @{[vle32_v $V15, $KT]}
+    addi $KT, $KT, 16
+    @{[vadd_vv $V14, $V15, $V13]}
+    @{[vsha2cl_vv $V17, $V16, $V14]}
+    @{[vsha2ch_vv $V16, $V17, $V14]}
+    @{[vmerge_vvm $V14, $V11, $V10, $V0]}
+    @{[vsha2ms_vv $V13, $V14, $V12]}  # Generate W[47:44]
+
+    # Quad-round 8 (+0, v10->v11->v12->v13)
+    @{[vle32_v $V15, $KT]}
+    addi $KT, $KT, 16
+    @{[vadd_vv $V14, $V15, $V10]}
+    @{[vsha2cl_vv $V17, $V16, $V14]}
+    @{[vsha2ch_vv $V16, $V17, $V14]}
+    @{[vmerge_vvm $V14, $V12, $V11, $V0]}
+    @{[vsha2ms_vv $V10, $V14, $V13]}  # Generate W[51:48]
+
+    # Quad-round 9 (+1, v11->v12->v13->v10)
+    @{[vle32_v $V15, $KT]}
+    addi $KT, $KT, 16
+    @{[vadd_vv $V14, $V15, $V11]}
+    @{[vsha2cl_vv $V17, $V16, $V14]}
+    @{[vsha2ch_vv $V16, $V17, $V14]}
+    @{[vmerge_vvm $V14, $V13, $V12, $V0]}
+    @{[vsha2ms_vv $V11, $V14, $V10]}  # Generate W[55:52]
+
+    # Quad-round 10 (+2, v12->v13->v10->v11)
+    @{[vle32_v $V15, $KT]}
+    addi $KT, $KT, 16
+    @{[vadd_vv $V14, $V15, $V12]}
+    @{[vsha2cl_vv $V17, $V16, $V14]}
+    @{[vsha2ch_vv $V16, $V17, $V14]}
+    @{[vmerge_vvm $V14, $V10, $V13, $V0]}
+    @{[vsha2ms_vv $V12, $V14, $V11]}  # Generate W[59:56]
+
+    # Quad-round 11 (+3, v13->v10->v11->v12)
+    @{[vle32_v $V15, $KT]}
+    addi $KT, $KT, 16
+    @{[vadd_vv $V14, $V15, $V13]}
+    @{[vsha2cl_vv $V17, $V16, $V14]}
+    @{[vsha2ch_vv $V16, $V17, $V14]}
+    @{[vmerge_vvm $V14, $V11, $V10, $V0]}
+    @{[vsha2ms_vv $V13, $V14, $V12]}  # Generate W[63:60]
+
+    # Quad-round 12 (+0, v10->v11->v12->v13)
+    # Note that we stop generating new message schedule words (Wt, v10-13)
+    # as we already generated all the words we end up consuming (i.e., W[63:60]).
+    @{[vle32_v $V15, $KT]}
+    addi $KT, $KT, 16
+    @{[vadd_vv $V14, $V15, $V10]}
+    @{[vsha2cl_vv $V17, $V16, $V14]}
+    @{[vsha2ch_vv $V16, $V17, $V14]}
+
+    # Quad-round 13 (+1, v11->v12->v13->v10)
+    @{[vle32_v $V15, $KT]}
+    addi $KT, $KT, 16
+    @{[vadd_vv $V14, $V15, $V11]}
+    @{[vsha2cl_vv $V17, $V16, $V14]}
+    @{[vsha2ch_vv $V16, $V17, $V14]}
+
+    # Quad-round 14 (+2, v12->v13->v10->v11)
+    @{[vle32_v $V15, $KT]}
+    addi $KT, $KT, 16
+    @{[vadd_vv $V14, $V15, $V12]}
+    @{[vsha2cl_vv $V17, $V16, $V14]}
+    @{[vsha2ch_vv $V16, $V17, $V14]}
+
+    # Quad-round 15 (+3, v13->v10->v11->v12)
+    @{[vle32_v $V15, $KT]}
+    # No kt increment needed.
+    @{[vadd_vv $V14, $V15, $V13]}
+    @{[vsha2cl_vv $V17, $V16, $V14]}
+    @{[vsha2ch_vv $V16, $V17, $V14]}
+
+    # H' = H+{a',b',c',...,h'}
+    @{[vadd_vv $V16, $V26, $V16]}
+    @{[vadd_vv $V17, $V27, $V17]}
+    @{[vmv_v_v $V26, $V16]}
+    @{[vmv_v_v $V27, $V17]}
+    bnez $LEN, L_round_loop
+
+    # v26 = v16 = {f,e,b,a}
+    # v27 = v17 = {h,g,d,c}
+    # Let's do the opposit transformation like on entry.
+
+    @{[vslideup_vi $V17, $V16, 2]} # {h,g,f,e}
+
+    @{[vslidedown_vi $V16, $V27, 2]} # {d,c,0,0}
+    @{[vslidedown_vi $V26, $V26, 2]} # {b,a,0,0}
+    @{[vslideup_vi $V16, $V26, 2]} # {d,c,b,a}
+
+    # H is already advanced by 12
+    @{[vsse32_v $V16, $H, $STRIDE]} # {a,b,c,d}
+    addi $H, $H, 16
+    @{[vsse32_v $V17, $H, $STRIDE]} # {e,f,g,h}
+
+    ret
+.size sha256_block_data_order_zvknha,.-sha256_block_data_order_zvknha
+
+.p2align 2
+.type $K256,\@object
+$K256:
+    .word 0x428a2f98, 0x71374491, 0xb5c0fbcf, 0xe9b5dba5
+    .word 0x3956c25b, 0x59f111f1, 0x923f82a4, 0xab1c5ed5
+    .word 0xd807aa98, 0x12835b01, 0x243185be, 0x550c7dc3
+    .word 0x72be5d74, 0x80deb1fe, 0x9bdc06a7, 0xc19bf174
+    .word 0xe49b69c1, 0xefbe4786, 0x0fc19dc6, 0x240ca1cc
+    .word 0x2de92c6f, 0x4a7484aa, 0x5cb0a9dc, 0x76f988da
+    .word 0x983e5152, 0xa831c66d, 0xb00327c8, 0xbf597fc7
+    .word 0xc6e00bf3, 0xd5a79147, 0x06ca6351, 0x14292967
+    .word 0x27b70a85, 0x2e1b2138, 0x4d2c6dfc, 0x53380d13
+    .word 0x650a7354, 0x766a0abb, 0x81c2c92e, 0x92722c85
+    .word 0xa2bfe8a1, 0xa81a664b, 0xc24b8b70, 0xc76c51a3
+    .word 0xd192e819, 0xd6990624, 0xf40e3585, 0x106aa070
+    .word 0x19a4c116, 0x1e376c08, 0x2748774c, 0x34b0bcb5
+    .word 0x391c0cb3, 0x4ed8aa4a, 0x5b9cca4f, 0x682e6ff3
+    .word 0x748f82ee, 0x78a5636f, 0x84c87814, 0x8cc70208
+    .word 0x90befffa, 0xa4506ceb, 0xbef9a3f7, 0xc67178f2
+.size $K256,.-$K256
+___
+
+print $code;
+
+close STDOUT or die "error closing STDOUT: $!";

From patchwork Mon Mar 13 19:12:59 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: =?utf-8?q?Heiko_St=C3=BCbner?= <heiko@sntech.de>
X-Patchwork-Id: 69071
Return-Path: <linux-kernel-owner@vger.kernel.org>
Delivered-To: ouuuleilei@gmail.com
Received: by 2002:a5d:5915:0:0:0:0:0 with SMTP id v21csp1371195wrd;
        Mon, 13 Mar 2023 12:41:30 -0700 (PDT)
X-Google-Smtp-Source: 
 AK7set8Bo1+JE7sFppb4koDVhwWjkWGQuxk2opqHVsdrnMG/DcwelFCfOwg6724UzaeYB/9JzEOw
X-Received: by 2002:a17:90b:38d0:b0:22c:816e:d67d with SMTP id
 nn16-20020a17090b38d000b0022c816ed67dmr35769210pjb.24.1678736490646;
        Mon, 13 Mar 2023 12:41:30 -0700 (PDT)
ARC-Seal: i=1; a=rsa-sha256; t=1678736490; cv=none;
        d=google.com; s=arc-20160816;
        b=s1o9wMawKQ7YLTodjtTK15NNMEBj9WZ6WJbsKO1Dbz9z5kwOVfF4fOXOQ2W/8ziZj4
         4GyTPGEbCKz66c2PTCfXkW4bDk3q8eR0Ojoh7Qfwif8PVQuAZQ39kMw6tmx0kxV5hAXz
         Rk8lcXjVdrZf1EmJKggvENTz3A3LEa7BfSq9N7Ktb1q54vzy7YojGQvoQMtgyzLWjGje
         peQAO25LVp1YEPAn0TMIcnpmjxPwtNp4kAz22XUI2+sBUIrHUzWphOO/vGVCdJCdG+KB
         R6DueCjnhprBy4pWwmKW6DL3h2PN24T+Vw+/impFErayIrVovwwz+jfHoRee4r8HNWpz
         WP1g==
ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com;
 s=arc-20160816;
        h=list-id:precedence:content-transfer-encoding:mime-version
         :references:in-reply-to:message-id:date:subject:cc:to:from;
        bh=MWiccSq8Jjz3j5dCdf59xPoZ8vANtcyTQmrOs/4RXfE=;
        b=X41BI/kmPFDINtObzpDoVnJ+6QBmLiDjxY7jOzzMQWq0aFCkJ53iuWHOPBCWQafch1
         2DgqHhP9m1uMIXd2v585sPkmcvANBkL0VlS2lOrvBA8gW8nImubvJBVtGd5ewL9Sq2sF
         EdbfAWaGRbpWf2QsM7QKTgsesuqNq/LBmwfDQjzEZ7Fh8XifXNQFMqHyUkvWjGDwucOi
         7sWNM82frdJb3q4kCNsT5QaxAG+26EXhZ4roWZiSPS25oK/04eY0xxad4YaEZScGgY1O
         Xv8WWukFNzxso7ZTOJccxMkmbQw1Q37vwh/AGja+WeOROy0ufSI7Ptqn5fZxVO1tmM/Z
         jzYg==
ARC-Authentication-Results: i=1; mx.google.com;
       spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org
 designates 2620:137:e000::1:20 as permitted sender)
 smtp.mailfrom=linux-kernel-owner@vger.kernel.org;
       dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=sntech.de
Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20])
        by mx.google.com with ESMTP id
 x6-20020a17090aca0600b00233866f90e3si452732pjt.125.2023.03.13.12.41.15;
        Mon, 13 Mar 2023 12:41:30 -0700 (PDT)
Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org
 designates 2620:137:e000::1:20 as permitted sender)
 client-ip=2620:137:e000::1:20;
Authentication-Results: mx.google.com;
       spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org
 designates 2620:137:e000::1:20 as permitted sender)
 smtp.mailfrom=linux-kernel-owner@vger.kernel.org;
       dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=sntech.de
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S229932AbjCMTPu (ORCPT <rfc822;realc9580@gmail.com> + 99 others);
        Mon, 13 Mar 2023 15:15:50 -0400
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:41946 "EHLO
        lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S229871AbjCMTPt (ORCPT
        <rfc822;linux-kernel@vger.kernel.org>);
        Mon, 13 Mar 2023 15:15:49 -0400
Received: from gloria.sntech.de (gloria.sntech.de [185.11.138.130])
        by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 0733B17142
        for <linux-kernel@vger.kernel.org>;
 Mon, 13 Mar 2023 12:15:00 -0700 (PDT)
Received: from ip4d1634a9.dynamic.kabel-deutschland.de ([77.22.52.169]
 helo=phil.lan)
        by gloria.sntech.de with esmtpsa  (TLS1.3) tls
 TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384
        (Exim 4.94.2)
        (envelope-from <heiko@sntech.de>)
        id 1pbnbX-00028k-5S; Mon, 13 Mar 2023 20:13:07 +0100
From: Heiko Stuebner <heiko@sntech.de>
To: palmer@rivosinc.com
Cc: greentime.hu@sifive.com, conor@kernel.org,
        linux-kernel@vger.kernel.org, linux-riscv@lists.infradead.org,
        christoph.muellner@vrull.eu, heiko@sntech.de
Subject: [PATCH RFC v3 13/16] RISC-V: crypto: add a vector-crypto-accelerated
 SHA512 implementation
Date: Mon, 13 Mar 2023 20:12:59 +0100
Message-Id: <20230313191302.580787-14-heiko.stuebner@vrull.eu>
X-Mailer: git-send-email 2.39.0
In-Reply-To: <20230313191302.580787-1-heiko.stuebner@vrull.eu>
References: <20230313191302.580787-1-heiko.stuebner@vrull.eu>
MIME-Version: 1.0
X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00,SPF_PASS,
        T_SPF_HELO_TEMPERROR autolearn=ham autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on
        lindbergh.monkeyblade.net
Precedence: bulk
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org
X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?=
X-GMAIL-THRID: =?utf-8?q?1760282794285913295?=
X-GMAIL-MSGID: =?utf-8?q?1760282794285913295?=

From: Heiko Stuebner <heiko.stuebner@vrull.eu>

This adds an accelerated SHA512 algorithm using either the Zvknhb
vector crypto extension.

Co-developed-by: Charalampos Mitrodimas <charalampos.mitrodimas@vrull.eu>
Signed-off-by: Charalampos Mitrodimas <charalampos.mitrodimas@vrull.eu>
Signed-off-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
---
 arch/riscv/crypto/Kconfig                  |  11 +
 arch/riscv/crypto/Makefile                 |   8 +-
 arch/riscv/crypto/sha512-riscv64-glue.c    | 104 ++++++
 arch/riscv/crypto/sha512-riscv64-zvknhb.pl | 347 +++++++++++++++++++++
 4 files changed, 469 insertions(+), 1 deletion(-)
 create mode 100644 arch/riscv/crypto/sha512-riscv64-glue.c
 create mode 100644 arch/riscv/crypto/sha512-riscv64-zvknhb.pl

diff --git a/arch/riscv/crypto/Kconfig b/arch/riscv/crypto/Kconfig
index 8645e02171f7..da6244f0c0c4 100644
--- a/arch/riscv/crypto/Kconfig
+++ b/arch/riscv/crypto/Kconfig
@@ -26,4 +26,15 @@ config CRYPTO_SHA256_RISCV64
 	  Architecture: riscv64 using
 	  - Zvknha or Zvknhb vector crypto extensions
 
+config CRYPTO_SHA512_RISCV64
+	tristate "Hash functions: SHA-512"
+	depends on 64BIT && RISCV_ISA_V
+	select CRYPTO_HASH
+	select CRYPTO_SHA512
+	help
+	  SHA-512 secure hash algorithm (FIPS 180)
+
+	  Architecture: riscv64
+	  - Zvknhb vector crypto extension
+
 endmenu
diff --git a/arch/riscv/crypto/Makefile b/arch/riscv/crypto/Makefile
index 02b3b4c32672..3c94753affdf 100644
--- a/arch/riscv/crypto/Makefile
+++ b/arch/riscv/crypto/Makefile
@@ -15,6 +15,9 @@ endif
 obj-$(CONFIG_CRYPTO_SHA256_RISCV64) += sha256-riscv64.o
 sha256-riscv64-y := sha256-riscv64-glue.o sha256-riscv64-zvknhb.o
 
+obj-$(CONFIG_CRYPTO_SHA512_RISCV64) += sha512-riscv64.o
+sha512-riscv64-y := sha512-riscv64-glue.o sha512-riscv64-zvknhb.o
+
 quiet_cmd_perlasm = PERLASM $@
       cmd_perlasm = $(PERL) $(<) void $(@)
 
@@ -30,5 +33,8 @@ $(obj)/ghash-riscv64-zvkg.S: $(src)/ghash-riscv64-zvkg.pl
 $(obj)/sha256-riscv64-zvknhb.S: $(src)/sha256-riscv64-zvknha.pl
 	$(call cmd,perlasm)
 
+$(obj)/sha512-riscv64-zvknhb.S: $(src)/sha512-riscv64-zvknhb.pl
+	$(call cmd,perlasm)
+
 clean-files += ghash-riscv64-zbc.S ghash-riscv64-zvkb.S ghash-riscv64-zvkg.S
-clean-files += sha256-riscv64-zvknha.S
+clean-files += sha256-riscv64-zvknha.S sha512-riscv64-zvknhb.S
diff --git a/arch/riscv/crypto/sha512-riscv64-glue.c b/arch/riscv/crypto/sha512-riscv64-glue.c
new file mode 100644
index 000000000000..fc35ba269bbc
--- /dev/null
+++ b/arch/riscv/crypto/sha512-riscv64-glue.c
@@ -0,0 +1,104 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * Linux/riscv64 port of the OpenSSL SHA512 implementation for RISCV64
+ *
+ * Copyright (C) 2023 VRULL GmbH
+ * Author: Heiko Stuebner <heiko.stuebner@vrull.eu>
+ */
+
+#include <linux/types.h>
+#include <asm/simd.h>
+#include <asm/vector.h>
+#include <crypto/internal/hash.h>
+#include <crypto/internal/simd.h>
+#include <crypto/sha2.h>
+#include <crypto/sha512_base.h>
+
+asmlinkage void sha512_block_data_order_zvknhb(u64 *digest, const void *data,
+					unsigned int num_blks);
+
+static void __sha512_block_data_order(struct sha512_state *sst, u8 const *src,
+				      int blocks)
+{
+	sha512_block_data_order_zvknhb(sst->state, src, blocks);
+}
+
+static int sha512_update(struct shash_desc *desc, const u8 *data,
+			 unsigned int len)
+{
+	if (crypto_simd_usable()) {
+		int ret;
+
+		kernel_rvv_begin();
+		ret = sha512_base_do_update(desc, data, len,
+					    __sha512_block_data_order);
+		kernel_rvv_end();
+		return ret;
+	} else { 
+		return crypto_sha512_update(desc, data, len);
+	}
+}
+
+static int sha512_finup(struct shash_desc *desc, const u8 *data,
+			unsigned int len, u8 *out)
+{
+	if (!crypto_simd_usable())
+		return crypto_sha512_finup(desc, data, len, out);
+
+	kernel_rvv_begin();
+	if (len)
+		sha512_base_do_update(desc, data, len,
+				      __sha512_block_data_order);
+
+	sha512_base_do_finalize(desc, __sha512_block_data_order);
+	kernel_rvv_end();
+
+	return sha512_base_finish(desc, out);
+}
+
+static int sha512_final(struct shash_desc *desc, u8 *out)
+{
+	return sha512_finup(desc, NULL, 0, out);
+}
+
+static struct shash_alg sha512_alg = {
+	.digestsize		= SHA512_DIGEST_SIZE,
+	.init			= sha512_base_init,
+	.update			= sha512_update,
+	.final			= sha512_final,
+	.finup			= sha512_finup,
+	.descsize		= sizeof(struct sha512_state),
+	.base.cra_name		= "sha512",
+	.base.cra_driver_name	= "sha512-riscv64-zvknhb",
+	.base.cra_priority	= 150,
+	.base.cra_blocksize	= SHA512_BLOCK_SIZE,
+	.base.cra_module	= THIS_MODULE,
+};
+
+static int __init sha512_mod_init(void)
+{
+	/* sha512 needs at least a vlen of 256 to work correctly */
+	if (riscv_isa_extension_available(NULL, ZVKNHB) &&
+	    riscv_isa_extension_available(NULL, ZVKB) &&
+	    riscv_vector_vlen() >= 256)
+		return crypto_register_shash(&sha512_alg);
+
+	return 0;
+}
+
+static void __exit sha512_mod_fini(void)
+{
+	if (riscv_isa_extension_available(NULL, ZVKNHB) &&
+	    riscv_isa_extension_available(NULL, ZVKB) &&
+	    riscv_vector_vlen() >= 256)
+		crypto_unregister_shash(&sha512_alg);
+}
+
+module_init(sha512_mod_init);
+module_exit(sha512_mod_fini);
+
+MODULE_DESCRIPTION("SHA-512 secure hash for riscv64");
+MODULE_AUTHOR("Andy Polyakov <appro@openssl.org>");
+MODULE_AUTHOR("Ard Biesheuvel <ard.biesheuvel@linaro.org>");
+MODULE_LICENSE("GPL v2");
+MODULE_ALIAS_CRYPTO("sha512");
diff --git a/arch/riscv/crypto/sha512-riscv64-zvknhb.pl b/arch/riscv/crypto/sha512-riscv64-zvknhb.pl
new file mode 100644
index 000000000000..f7d609003358
--- /dev/null
+++ b/arch/riscv/crypto/sha512-riscv64-zvknhb.pl
@@ -0,0 +1,347 @@
+#! /usr/bin/env perl
+# Copyright 2023 The OpenSSL Project Authors. All Rights Reserved.
+#
+# Licensed under the Apache License 2.0 (the "License").  You may not use
+# this file except in compliance with the License.  You can obtain a copy
+# in the file LICENSE in the source distribution or at
+# https://www.openssl.org/source/license.html
+
+# The generated code of this file depends on the following RISC-V extensions:
+# - RV64I
+# - RISC-V vector ('V') with VLEN >= 256
+# - Vector Bit-manipulation used in Cryptography ('Zvkb')
+# - Vector SHA-2 Secure Hash ('Zvknhb')
+
+use strict;
+use warnings;
+
+use FindBin qw($Bin);
+use lib "$Bin";
+use lib "$Bin/../../perlasm";
+use riscv;
+
+# $output is the last argument if it looks like a file (it has an extension)
+# $flavour is the first argument if it doesn't look like a file
+my $output = $#ARGV >= 0 && $ARGV[$#ARGV] =~ m|\.\w+$| ? pop : undef;
+my $flavour = $#ARGV >= 0 && $ARGV[0] !~ m|\.| ? shift : undef;
+
+$output and open STDOUT,">$output";
+
+my $code=<<___;
+.text
+___
+
+my ($V0, $V10, $V11, $V12, $V13, $V14, $V15, $V16, $V17) = ("v0", "v10", "v11", "v12", "v13", "v14","v15", "v16", "v17");
+my ($V26, $V27) = ("v26", "v27");
+
+my $K512 = "K512";
+
+# Function arguments
+my ($H, $INP, $LEN, $KT, $STRIDE) = ("a0", "a1", "a2", "a3", "t3");
+
+################################################################################
+# void sha512_block_data_order(void *c, const void *p, size_t len)
+$code .= <<___;
+.p2align 2
+.globl sha512_block_data_order_zvknhb
+.type sha512_block_data_order_zvknhb,\@function
+sha512_block_data_order_zvknhb:
+    @{[vsetivli__x0_4_e64_m1_ta_ma]}
+
+    # H is stored as {a,b,c,d},{e,f,g,h}, but we need {f,e,b,a},{h,g,d,c}
+    # We achieve this by reading with a negative stride followed by
+    # element sliding.
+    li $STRIDE, -8
+    addi $H, $H, 24
+    @{[vlse64_v $V16, $H, $STRIDE]} # {d,c,b,a}
+    addi $H, $H, 32
+    @{[vlse64_v $V17, $H, $STRIDE]} # {h,g,f,e}
+    # Keep H advanced by 24
+    addi $H, $H, -32
+
+    @{[vmv_v_v $V27, $V16]} # {d,c,b,a}
+    @{[vslidedown_vi $V26, $V16, 2]} # {b,a,0,0}
+    @{[vslidedown_vi $V16, $V17, 2]} # {f,e,0,0}
+    @{[vslideup_vi $V16, $V26, 2]} # {f,e,b,a}
+    @{[vslideup_vi $V17, $V27, 2]} # {h,g,d,c}
+
+    # Keep the old state as we need it later: H' = H+{a',b',c',...,h'}.
+    @{[vmv_v_v $V26, $V16]}
+    @{[vmv_v_v $V27, $V17]}
+
+L_round_loop:
+    la $KT, $K512 # Load round constants K512
+
+    # Load the 1024-bits of the message block in v10-v13 and perform
+    # an endian swap on each 4 bytes element.
+    @{[vle64_v $V10, $INP]}
+    @{[vrev8_v $V10, $V10]}
+    add $INP, $INP, 32
+    @{[vle64_v $V11, $INP]}
+    @{[vrev8_v $V11, $V11]}
+    add $INP, $INP, 32
+    @{[vle64_v $V12, $INP]}
+    @{[vrev8_v $V12, $V12]}
+    add $INP, $INP, 32
+    @{[vle64_v $V13, $INP]}
+    @{[vrev8_v $V13, $V13]}
+    add $INP, $INP, 32
+
+    # Decrement length by 1
+    add $LEN, $LEN, -1
+
+    # Set v0 up for the vmerge that replaces the first word (idx==0)
+    @{[vid_v $V0]}
+    @{[vmseq_vi $V0, $V0, 0x0]} # v0.mask[i] = (i == 0 ? 1 : 0)
+
+    # Quad-round 0 (+0, v10->v11->v12->v13)
+    @{[vle64_v $V15, ($KT)]}
+    addi $KT, $KT, 32
+    @{[vadd_vv $V14, $V15, $V10]}
+    @{[vsha2cl_vv $V17, $V16, $V14]}
+    @{[vsha2ch_vv $V16, $V17, $V14]}
+    @{[vmerge_vvm $V14, $V12, $V11, $V0]}
+    @{[vsha2ms_vv $V10, $V14, $V13]}
+
+    # Quad-round 1 (+1, v11->v12->v13->v10)
+    @{[vle64_v $V15, ($KT)]}
+    addi $KT, $KT, 32
+    @{[vadd_vv $V14, $V15, $V11]}
+    @{[vsha2cl_vv $V17, $V16, $V14]}
+    @{[vsha2ch_vv $V16, $V17, $V14]}
+    @{[vmerge_vvm $V14, $V13, $V12, $V0]}
+    @{[vsha2ms_vv $V11, $V14, $V10]}
+
+    # Quad-round 2 (+2, v12->v13->v10->v11)
+    @{[vle64_v $V15, ($KT)]}
+    addi $KT, $KT, 32
+    @{[vadd_vv $V14, $V15, $V12]}
+    @{[vsha2cl_vv $V17, $V16, $V14]}
+    @{[vsha2ch_vv $V16, $V17, $V14]}
+    @{[vmerge_vvm $V14, $V10, $V13, $V0]}
+    @{[vsha2ms_vv $V12, $V14, $V11]}
+
+    # Quad-round 3 (+3, v13->v10->v11->v12)
+    @{[vle64_v $V15, ($KT)]}
+    addi $KT, $KT, 32
+    @{[vadd_vv $V14, $V15, $V13]}
+    @{[vsha2cl_vv $V17, $V16, $V14]}
+    @{[vsha2ch_vv $V16, $V17, $V14]}
+    @{[vmerge_vvm $V14, $V11, $V10, $V0]}
+    @{[vsha2ms_vv $V13, $V14, $V12]}
+
+    # Quad-round 4 (+0, v10->v11->v12->v13)
+    @{[vle64_v $V15, ($KT)]}
+    addi $KT, $KT, 32
+    @{[vadd_vv $V14, $V15, $V10]}
+    @{[vsha2cl_vv $V17, $V16, $V14]}
+    @{[vsha2ch_vv $V16, $V17, $V14]}
+    @{[vmerge_vvm $V14, $V12, $V11, $V0]}
+    @{[vsha2ms_vv $V10, $V14, $V13]}
+
+    # Quad-round 5 (+1, v11->v12->v13->v10)
+    @{[vle64_v $V15, ($KT)]}
+    addi $KT, $KT, 32
+    @{[vadd_vv $V14, $V15, $V11]}
+    @{[vsha2cl_vv $V17, $V16, $V14]}
+    @{[vsha2ch_vv $V16, $V17, $V14]}
+    @{[vmerge_vvm $V14, $V13, $V12, $V0]}
+    @{[vsha2ms_vv $V11, $V14, $V10]}
+
+    # Quad-round 6 (+2, v12->v13->v10->v11)
+    @{[vle64_v $V15, ($KT)]}
+    addi $KT, $KT, 32
+    @{[vadd_vv $V14, $V15, $V12]}
+    @{[vsha2cl_vv $V17, $V16, $V14]}
+    @{[vsha2ch_vv $V16, $V17, $V14]}
+    @{[vmerge_vvm $V14, $V10, $V13, $V0]}
+    @{[vsha2ms_vv $V12, $V14, $V11]}
+
+    # Quad-round 7 (+3, v13->v10->v11->v12)
+    @{[vle64_v $V15, ($KT)]}
+    addi $KT, $KT, 32
+    @{[vadd_vv $V14, $V15, $V13]}
+    @{[vsha2cl_vv $V17, $V16, $V14]}
+    @{[vsha2ch_vv $V16, $V17, $V14]}
+    @{[vmerge_vvm $V14, $V11, $V10, $V0]}
+    @{[vsha2ms_vv $V13, $V14, $V12]}
+
+    # Quad-round 8 (+0, v10->v11->v12->v13)
+    @{[vle64_v $V15, ($KT)]}
+    addi $KT, $KT, 32
+    @{[vadd_vv $V14, $V15, $V10]}
+    @{[vsha2cl_vv $V17, $V16, $V14]}
+    @{[vsha2ch_vv $V16, $V17, $V14]}
+    @{[vmerge_vvm $V14, $V12, $V11, $V0]}
+    @{[vsha2ms_vv $V10, $V14, $V13]}
+
+    # Quad-round 9 (+1, v11->v12->v13->v10)
+    @{[vle64_v $V15, ($KT)]}
+    addi $KT, $KT, 32
+    @{[vadd_vv $V14, $V15, $V11]}
+    @{[vsha2cl_vv $V17, $V16, $V14]}
+    @{[vsha2ch_vv $V16, $V17, $V14]}
+    @{[vmerge_vvm $V14, $V13, $V12, $V0]}
+    @{[vsha2ms_vv $V11, $V14, $V10]}
+
+    # Quad-round 10 (+2, v12->v13->v10->v11)
+    @{[vle64_v $V15, ($KT)]}
+    addi $KT, $KT, 32
+    @{[vadd_vv $V14, $V15, $V12]}
+    @{[vsha2cl_vv $V17, $V16, $V14]}
+    @{[vsha2ch_vv $V16, $V17, $V14]}
+    @{[vmerge_vvm $V14, $V10, $V13, $V0]}
+    @{[vsha2ms_vv $V12, $V14, $V11]}
+
+    # Quad-round 11 (+3, v13->v10->v11->v12)
+    @{[vle64_v $V15, ($KT)]}
+    addi $KT, $KT, 32
+    @{[vadd_vv $V14, $V15, $V13]}
+    @{[vsha2cl_vv $V17, $V16, $V14]}
+    @{[vsha2ch_vv $V16, $V17, $V14]}
+    @{[vmerge_vvm $V14, $V11, $V10, $V0]}
+    @{[vsha2ms_vv $V13, $V14, $V12]}
+
+    # Quad-round 12 (+0, v10->v11->v12->v13)
+    @{[vle64_v $V15, ($KT)]}
+    addi $KT, $KT, 32
+    @{[vadd_vv $V14, $V15, $V10]}
+    @{[vsha2cl_vv $V17, $V16, $V14]}
+    @{[vsha2ch_vv $V16, $V17, $V14]}
+    @{[vmerge_vvm $V14, $V12, $V11, $V0]}
+    @{[vsha2ms_vv $V10, $V14, $V13]}
+
+    # Quad-round 13 (+1, v11->v12->v13->v10)
+    @{[vle64_v $V15, ($KT)]}
+    addi $KT, $KT, 32
+    @{[vadd_vv $V14, $V15, $V11]}
+    @{[vsha2cl_vv $V17, $V16, $V14]}
+    @{[vsha2ch_vv $V16, $V17, $V14]}
+    @{[vmerge_vvm $V14, $V13, $V12, $V0]}
+    @{[vsha2ms_vv $V11, $V14, $V10]}
+
+    # Quad-round 14 (+2, v12->v13->v10->v11)
+    @{[vle64_v $V15, ($KT)]}
+    addi $KT, $KT, 32
+    @{[vadd_vv $V14, $V15, $V12]}
+    @{[vsha2cl_vv $V17, $V16, $V14]}
+    @{[vsha2ch_vv $V16, $V17, $V14]}
+    @{[vmerge_vvm $V14, $V10, $V13, $V0]}
+    @{[vsha2ms_vv $V12, $V14, $V11]}
+
+    # Quad-round 15 (+3, v13->v10->v11->v12)
+    @{[vle64_v $V15, ($KT)]}
+    addi $KT, $KT, 32
+    @{[vadd_vv $V14, $V15, $V13]}
+    @{[vsha2cl_vv $V17, $V16, $V14]}
+    @{[vsha2ch_vv $V16, $V17, $V14]}
+    @{[vmerge_vvm $V14, $V11, $V10, $V0]}
+    @{[vsha2ms_vv $V13, $V14, $V12]}
+
+    # Quad-round 16 (+0, v10->v11->v12->v13)
+    # Note that we stop generating new message schedule words (Wt, v10-13)
+    # as we already generated all the words we end up consuming (i.e., W[79:76]).
+    @{[vle64_v $V15, ($KT)]}
+    addi $KT, $KT, 32
+    @{[vadd_vv $V14, $V15, $V10]}
+    @{[vsha2cl_vv $V17, $V16, $V14]}
+    @{[vsha2ch_vv $V16, $V17, $V14]}
+    @{[vmerge_vvm $V14, $V12, $V11, $V0]}
+
+    # Quad-round 17 (+1, v11->v12->v13->v10)
+    @{[vle64_v $V15, ($KT)]}
+    addi $KT, $KT, 32
+    @{[vadd_vv $V14, $V15, $V11]}
+    @{[vsha2cl_vv $V17, $V16, $V14]}
+    @{[vsha2ch_vv $V16, $V17, $V14]}
+    @{[vmerge_vvm $V14, $V13, $V12, $V0]}
+
+    # Quad-round 18 (+2, v12->v13->v10->v11)
+    @{[vle64_v $V15, ($KT)]}
+    addi $KT, $KT, 32
+    @{[vadd_vv $V14, $V15, $V12]}
+    @{[vsha2cl_vv $V17, $V16, $V14]}
+    @{[vsha2ch_vv $V16, $V17, $V14]}
+    @{[vmerge_vvm $V14, $V10, $V13, $V0]}
+
+    # Quad-round 19 (+3, v13->v10->v11->v12)
+    @{[vle64_v $V15, ($KT)]}
+    # No t1 increment needed.
+    @{[vadd_vv $V14, $V15, $V13]}
+    @{[vsha2cl_vv $V17, $V16, $V14]}
+    @{[vsha2ch_vv $V16, $V17, $V14]}
+
+    # H' = H+{a',b',c',...,h'}
+    @{[vadd_vv $V16, $V26, $V16]}
+    @{[vadd_vv $V17, $V27, $V17]}
+    @{[vmv_v_v $V26, $V16]}
+    @{[vmv_v_v $V27, $V17]}
+    bnez $LEN, L_round_loop
+
+    # v26 = v16 = {f,e,b,a}
+    # v27 = v17 = {h,g,d,c}
+    # Let's do the opposit transformation like on entry.
+
+    @{[vslideup_vi $V17, $V16, 2]} # {h,g,f,e}
+
+    @{[vslidedown_vi $V16, $V27, 2]} # {d,c,0,0}
+    @{[vslidedown_vi $V26, $V26, 2]} # {b,a,0,0}
+    @{[vslideup_vi $V16, $V26, 2]} # {d,c,b,a}
+
+    # H is already advanced by 24
+    @{[vsse64_v $V16, $H, $STRIDE]} # {a,b,c,d}
+    addi $H, $H, 32
+    @{[vsse64_v $V17, $H, $STRIDE]} # {e,f,g,h}
+
+    ret
+.size sha512_block_data_order_zvknhb,.-sha512_block_data_order_zvknhb
+
+.p2align 3
+.type $K512,\@object
+$K512:
+    .dword 0x428a2f98d728ae22, 0x7137449123ef65cd
+    .dword 0xb5c0fbcfec4d3b2f, 0xe9b5dba58189dbbc
+    .dword 0x3956c25bf348b538, 0x59f111f1b605d019
+    .dword 0x923f82a4af194f9b, 0xab1c5ed5da6d8118
+    .dword 0xd807aa98a3030242, 0x12835b0145706fbe
+    .dword 0x243185be4ee4b28c, 0x550c7dc3d5ffb4e2
+    .dword 0x72be5d74f27b896f, 0x80deb1fe3b1696b1
+    .dword 0x9bdc06a725c71235, 0xc19bf174cf692694
+    .dword 0xe49b69c19ef14ad2, 0xefbe4786384f25e3
+    .dword 0x0fc19dc68b8cd5b5, 0x240ca1cc77ac9c65
+    .dword 0x2de92c6f592b0275, 0x4a7484aa6ea6e483
+    .dword 0x5cb0a9dcbd41fbd4, 0x76f988da831153b5
+    .dword 0x983e5152ee66dfab, 0xa831c66d2db43210
+    .dword 0xb00327c898fb213f, 0xbf597fc7beef0ee4
+    .dword 0xc6e00bf33da88fc2, 0xd5a79147930aa725
+    .dword 0x06ca6351e003826f, 0x142929670a0e6e70
+    .dword 0x27b70a8546d22ffc, 0x2e1b21385c26c926
+    .dword 0x4d2c6dfc5ac42aed, 0x53380d139d95b3df
+    .dword 0x650a73548baf63de, 0x766a0abb3c77b2a8
+    .dword 0x81c2c92e47edaee6, 0x92722c851482353b
+    .dword 0xa2bfe8a14cf10364, 0xa81a664bbc423001
+    .dword 0xc24b8b70d0f89791, 0xc76c51a30654be30
+    .dword 0xd192e819d6ef5218, 0xd69906245565a910
+    .dword 0xf40e35855771202a, 0x106aa07032bbd1b8
+    .dword 0x19a4c116b8d2d0c8, 0x1e376c085141ab53
+    .dword 0x2748774cdf8eeb99, 0x34b0bcb5e19b48a8
+    .dword 0x391c0cb3c5c95a63, 0x4ed8aa4ae3418acb
+    .dword 0x5b9cca4f7763e373, 0x682e6ff3d6b2b8a3
+    .dword 0x748f82ee5defb2fc, 0x78a5636f43172f60
+    .dword 0x84c87814a1f0ab72, 0x8cc702081a6439ec
+    .dword 0x90befffa23631e28, 0xa4506cebde82bde9
+    .dword 0xbef9a3f7b2c67915, 0xc67178f2e372532b
+    .dword 0xca273eceea26619c, 0xd186b8c721c0c207
+    .dword 0xeada7dd6cde0eb1e, 0xf57d4f7fee6ed178
+    .dword 0x06f067aa72176fba, 0x0a637dc5a2c898a6
+    .dword 0x113f9804bef90dae, 0x1b710b35131c471b
+    .dword 0x28db77f523047d84, 0x32caab7b40c72493
+    .dword 0x3c9ebe0a15c9bebc, 0x431d67c49c100d4c
+    .dword 0x4cc5d4becb3e42b6, 0x597f299cfc657e2a
+    .dword 0x5fcb6fab3ad6faec, 0x6c44198c4a475817
+.size $K512,.-$K512
+___
+
+print $code;
+
+close STDOUT or die "error closing STDOUT: $!";

From patchwork Mon Mar 13 19:13:00 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: =?utf-8?q?Heiko_St=C3=BCbner?= <heiko@sntech.de>
X-Patchwork-Id: 69066
Return-Path: <linux-kernel-owner@vger.kernel.org>
Delivered-To: ouuuleilei@gmail.com
Received: by 2002:a5d:5915:0:0:0:0:0 with SMTP id v21csp1365202wrd;
        Mon, 13 Mar 2023 12:27:02 -0700 (PDT)
X-Google-Smtp-Source: 
 AK7set/qHzZ5lK73Rd1clJH87uqaSEdeowO6S5txafi/Eincy6AXMgGa9a3WG3IMyoBOxpNB3/51
X-Received: by 2002:a17:902:dacf:b0:1a0:6010:dd0d with SMTP id
 q15-20020a170902dacf00b001a06010dd0dmr1315575plx.36.1678735621880;
        Mon, 13 Mar 2023 12:27:01 -0700 (PDT)
ARC-Seal: i=1; a=rsa-sha256; t=1678735621; cv=none;
        d=google.com; s=arc-20160816;
        b=pMkEOS0yzeJocLZvesImpFBr1T5JKDYEnpJiqu60FCPmB11+khyD3HrDCqgfP3ULzl
         6L1L3xgapbr2xMaaMv0tDe1oRTYekTEv0Cbw9G0RSI84tqiQWVrI8Xihn/7qqlnEQGmf
         o7GlIPFD6uP+uVMDsV6ioGk+wR/a+7VO3gRt143TvS9U9xHqj9pl8jCZ/7vLAOydsHjX
         DtvViMXtMmhmsoqQNDev4k4jedCVda1qEzDhSNsknfSdxYbLVc84tr781+c1ioN0fkvZ
         x0L3k7KBgdsfLq8/lspqa9TbMtcZJG2FiI7mCYOoQoozsl5CxLBVEaY5OfzAXWPBjelo
         5Xgg==
ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com;
 s=arc-20160816;
        h=list-id:precedence:content-transfer-encoding:mime-version
         :references:in-reply-to:message-id:date:subject:cc:to:from;
        bh=A9u9hc1S1QECegBlwDOljP1DV4KYpwT9fkH6i+GEyUY=;
        b=i0ag+FYxbuvwFmDuHXR38+CTuD6ByDDpRx2uOuD/MhElL/IAiBa6pY2VeTHa03Jb9u
         5o/eDpoPHdSE1JWCWrXHSPTahuZyZOApStoNNX+mFSIWYbgDUFmNb0YbFy2bOcdBcoYW
         zSRhC/G7KzRdboKlu3ir+BLRhdy6jahkmzZK1DPpQ+zop3GaQlDbEOZxc+ux1ZxmKeEn
         fJpAOZrF9jR519vs4Z4RQcIyVJo8ucuFhgVhchSmBeJk0g/m/tjqNl0PixRUSgEA10px
         Afszc4JTrZPs068DITJBg6iCV+JT4VgvDSD0OzqfqD31pDedbtLXokmXFaB2eM2BzL10
         9BPg==
ARC-Authentication-Results: i=1; mx.google.com;
       spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org
 designates 2620:137:e000::1:20 as permitted sender)
 smtp.mailfrom=linux-kernel-owner@vger.kernel.org;
       dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=sntech.de
Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20])
        by mx.google.com with ESMTP id
 c8-20020a170902724800b001a05e6d4b08si480210pll.40.2023.03.13.12.26.47;
        Mon, 13 Mar 2023 12:27:01 -0700 (PDT)
Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org
 designates 2620:137:e000::1:20 as permitted sender)
 client-ip=2620:137:e000::1:20;
Authentication-Results: mx.google.com;
       spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org
 designates 2620:137:e000::1:20 as permitted sender)
 smtp.mailfrom=linux-kernel-owner@vger.kernel.org;
       dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=sntech.de
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S231144AbjCMTUm (ORCPT <rfc822;realc9580@gmail.com> + 99 others);
        Mon, 13 Mar 2023 15:20:42 -0400
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:51214 "EHLO
        lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S230039AbjCMTUe (ORCPT
        <rfc822;linux-kernel@vger.kernel.org>);
        Mon, 13 Mar 2023 15:20:34 -0400
Received: from gloria.sntech.de (gloria.sntech.de [185.11.138.130])
        by lindbergh.monkeyblade.net (Postfix) with ESMTPS id CEAD51165E
        for <linux-kernel@vger.kernel.org>;
 Mon, 13 Mar 2023 12:20:01 -0700 (PDT)
Received: from ip4d1634a9.dynamic.kabel-deutschland.de ([77.22.52.169]
 helo=phil.lan)
        by gloria.sntech.de with esmtpsa  (TLS1.3) tls
 TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384
        (Exim 4.94.2)
        (envelope-from <heiko@sntech.de>)
        id 1pbnbX-00028k-Ds; Mon, 13 Mar 2023 20:13:07 +0100
From: Heiko Stuebner <heiko@sntech.de>
To: palmer@rivosinc.com
Cc: greentime.hu@sifive.com, conor@kernel.org,
        linux-kernel@vger.kernel.org, linux-riscv@lists.infradead.org,
        christoph.muellner@vrull.eu, heiko@sntech.de
Subject: [PATCH RFC v3 14/16] RISC-V: crypto: add Zvkned accelerated AES
 encryption implementation
Date: Mon, 13 Mar 2023 20:13:00 +0100
Message-Id: <20230313191302.580787-15-heiko.stuebner@vrull.eu>
X-Mailer: git-send-email 2.39.0
In-Reply-To: <20230313191302.580787-1-heiko.stuebner@vrull.eu>
References: <20230313191302.580787-1-heiko.stuebner@vrull.eu>
MIME-Version: 1.0
X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00,SPF_PASS,
        T_SPF_HELO_TEMPERROR autolearn=ham autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on
        lindbergh.monkeyblade.net
Precedence: bulk
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org
X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?=
X-GMAIL-THRID: =?utf-8?q?1760281883260865783?=
X-GMAIL-MSGID: =?utf-8?q?1760281883260865783?=

From: Heiko Stuebner <heiko.stuebner@vrull.eu>

This adds an AES implementation using the Zvkned vector crypto instructions.

Co-developed-by: Christoph Müllner <christoph.muellner@vrull.eu>
Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>
Signed-off-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
---
 arch/riscv/crypto/Kconfig               |  14 +
 arch/riscv/crypto/Makefile              |   7 +
 arch/riscv/crypto/aes-riscv-glue.c      | 169 ++++++++
 arch/riscv/crypto/aes-riscv64-zvkned.pl | 500 ++++++++++++++++++++++++
 4 files changed, 690 insertions(+)
 create mode 100644 arch/riscv/crypto/aes-riscv-glue.c
 create mode 100644 arch/riscv/crypto/aes-riscv64-zvkned.pl

diff --git a/arch/riscv/crypto/Kconfig b/arch/riscv/crypto/Kconfig
index da6244f0c0c4..c8abb29bb49b 100644
--- a/arch/riscv/crypto/Kconfig
+++ b/arch/riscv/crypto/Kconfig
@@ -2,6 +2,20 @@
 
 menu "Accelerated Cryptographic Algorithms for CPU (riscv)"
 
+config CRYPTO_AES_RISCV
+	tristate "Ciphers: AES (RISCV)"
+	depends on 64BIT && RISCV_ISA_V
+	select CRYPTO_AES
+	help
+	  Block ciphers: AES cipher algorithms (FIPS-197)
+	  Length-preserving ciphers: AES with ECB, CBC, CTR, CTS,
+	    XCTR, and XTS modes
+	  AEAD cipher: AES with CBC, ESSIV, and SHA-256
+	    for fscrypt and dm-crypt
+
+	  Architecture: riscv using one of
+	  - Zvkns
+
 config CRYPTO_GHASH_RISCV64
 	tristate "Hash functions: GHASH"
 	depends on 64BIT && (RISCV_ISA_ZBC || RISCV_ISA_V)
diff --git a/arch/riscv/crypto/Makefile b/arch/riscv/crypto/Makefile
index 3c94753affdf..e5c702dff883 100644
--- a/arch/riscv/crypto/Makefile
+++ b/arch/riscv/crypto/Makefile
@@ -3,6 +3,9 @@
 # linux/arch/riscv/crypto/Makefile
 #
 
+obj-$(CONFIG_CRYPTO_AES_RISCV) += aes-riscv.o
+aes-riscv-y := aes-riscv-glue.o aes-riscv64-zvkned.o
+
 obj-$(CONFIG_CRYPTO_GHASH_RISCV64) += ghash-riscv64.o
 ghash-riscv64-y := ghash-riscv64-glue.o
 ifdef CONFIG_RISCV_ISA_ZBC
@@ -21,6 +24,9 @@ sha512-riscv64-y := sha512-riscv64-glue.o sha512-riscv64-zvknhb.o
 quiet_cmd_perlasm = PERLASM $@
       cmd_perlasm = $(PERL) $(<) void $(@)
 
+$(obj)/aes-riscv64-zvkned.S: $(src)/aes-riscv64-zvkned.pl
+	$(call cmd,perlasm)
+
 $(obj)/ghash-riscv64-zbc.S: $(src)/ghash-riscv64-zbc.pl
 	$(call cmd,perlasm)
 
@@ -36,5 +42,6 @@ $(obj)/sha256-riscv64-zvknhb.S: $(src)/sha256-riscv64-zvknha.pl
 $(obj)/sha512-riscv64-zvknhb.S: $(src)/sha512-riscv64-zvknhb.pl
 	$(call cmd,perlasm)
 
+clean-files += aes-riscv64-zvkned.S
 clean-files += ghash-riscv64-zbc.S ghash-riscv64-zvkb.S ghash-riscv64-zvkg.S
 clean-files += sha256-riscv64-zvknha.S sha512-riscv64-zvknhb.S
diff --git a/arch/riscv/crypto/aes-riscv-glue.c b/arch/riscv/crypto/aes-riscv-glue.c
new file mode 100644
index 000000000000..f0b73058bb54
--- /dev/null
+++ b/arch/riscv/crypto/aes-riscv-glue.c
@@ -0,0 +1,169 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Linux/riscv port of the OpenSSL AES implementation for RISCV
+ *
+ * Copyright (C) 2023 VRULL GmbH
+ * Author: Heiko Stuebner <heiko.stuebner@vrull.eu>
+ */
+
+#include <linux/crypto.h>
+#include <linux/delay.h>
+#include <linux/err.h>
+#include <linux/module.h>
+#include <linux/types.h>
+#include <asm/simd.h>
+#include <asm/vector.h>
+#include <crypto/aes.h>
+#include <crypto/internal/cipher.h>
+#include <crypto/internal/simd.h>
+
+struct aes_key {
+	u8 key[AES_MAX_KEYLENGTH];
+	int rounds;
+};
+
+/* variant using the zvkned vector crypto extension */
+void rv64i_zvkned_encrypt(const u8 *in, u8 *out, const struct aes_key *key);
+void rv64i_zvkned_decrypt(const u8 *in, u8 *out, const struct aes_key *key);
+int rv64i_zvkned_set_encrypt_key(const u8 *userKey, const int bits,
+				struct aes_key *key);
+int rv64i_zvkned_set_decrypt_key(const u8 *userKey, const int bits,
+				struct aes_key *key);
+
+struct riscv_aes_ctx {
+	struct crypto_cipher *fallback;
+	struct aes_key enc_key;
+	struct aes_key dec_key;
+	unsigned int keylen;
+};
+
+static int riscv64_aes_init_zvkned(struct crypto_tfm *tfm)
+{
+	struct riscv_aes_ctx *ctx = crypto_tfm_ctx(tfm);
+	const char *alg = crypto_tfm_alg_name(tfm);
+	struct crypto_cipher *fallback;
+
+	fallback = crypto_alloc_cipher(alg, 0, CRYPTO_ALG_NEED_FALLBACK);
+	if (IS_ERR(fallback)) {
+		printk(KERN_ERR
+		       "Failed to allocate transformation for '%s': %ld\n",
+		       alg, PTR_ERR(fallback));
+		return PTR_ERR(fallback);
+	}
+
+	crypto_cipher_set_flags(fallback,
+				crypto_cipher_get_flags((struct
+							 crypto_cipher *)
+							tfm));
+	ctx->fallback = fallback;
+
+	return 0;
+}
+
+static void riscv_aes_exit(struct crypto_tfm *tfm)
+{
+	struct riscv_aes_ctx *ctx = crypto_tfm_ctx(tfm);
+
+	if (ctx->fallback) {
+		crypto_free_cipher(ctx->fallback);
+		ctx->fallback = NULL;
+	}
+}
+
+static int riscv64_aes_setkey_zvkned(struct crypto_tfm *tfm, const u8 *key,
+			 unsigned int keylen)
+{
+	struct riscv_aes_ctx *ctx = crypto_tfm_ctx(tfm);
+	int ret;
+
+	ctx->keylen = keylen;
+
+	if (keylen == 16 || keylen == 32) {
+		kernel_rvv_begin();
+		ret = rv64i_zvkned_set_encrypt_key(key, keylen * 8, &ctx->enc_key);
+		if (ret != 1) {
+			kernel_rvv_end();
+			return -EINVAL;
+		}
+
+		ret = rv64i_zvkned_set_decrypt_key(key, keylen * 8, &ctx->dec_key);
+		kernel_rvv_end();
+		if (ret != 1)
+			return -EINVAL;
+	}
+
+	ret = crypto_cipher_setkey(ctx->fallback, key, keylen);
+
+	return ret ? -EINVAL : 0;
+}
+
+static void riscv64_aes_encrypt_zvkned(struct crypto_tfm *tfm, u8 *dst, const u8 *src)
+{
+	struct riscv_aes_ctx *ctx = crypto_tfm_ctx(tfm);
+
+	if (crypto_simd_usable() && (ctx->keylen == 16 || ctx->keylen == 32)) {
+		kernel_rvv_begin();
+		rv64i_zvkned_encrypt(src, dst, &ctx->enc_key);
+		kernel_rvv_end();
+	} else {
+		crypto_cipher_encrypt_one(ctx->fallback, dst, src);
+	}
+}
+
+static void riscv64_aes_decrypt_zvkned(struct crypto_tfm *tfm, u8 *dst, const u8 *src)
+{
+	struct riscv_aes_ctx *ctx = crypto_tfm_ctx(tfm);
+
+	if (crypto_simd_usable() && (ctx->keylen == 16 || ctx->keylen == 32)) {
+		kernel_rvv_begin();
+		rv64i_zvkned_decrypt(src, dst, &ctx->dec_key);
+		kernel_rvv_end();
+	} else {
+		crypto_cipher_decrypt_one(ctx->fallback, dst, src);
+	}
+}
+
+struct crypto_alg riscv64_aes_zvkned_alg = {
+	.cra_name = "aes",
+	.cra_driver_name = "riscv-aes-zvkned",
+	.cra_module = THIS_MODULE,
+	.cra_priority = 300,
+	.cra_type = NULL,
+	.cra_flags = CRYPTO_ALG_TYPE_CIPHER | CRYPTO_ALG_NEED_FALLBACK,
+	.cra_alignmask = 0,
+	.cra_blocksize = AES_BLOCK_SIZE,
+	.cra_ctxsize = sizeof(struct riscv_aes_ctx),
+	.cra_init = riscv64_aes_init_zvkned,
+	.cra_exit = riscv_aes_exit,
+	.cra_cipher = {
+		.cia_min_keysize = AES_MIN_KEY_SIZE,
+		.cia_max_keysize = AES_MAX_KEY_SIZE,
+		.cia_setkey = riscv64_aes_setkey_zvkned,
+		.cia_encrypt = riscv64_aes_encrypt_zvkned,
+		.cia_decrypt = riscv64_aes_decrypt_zvkned,
+	},
+};
+
+static int __init riscv_aes_mod_init(void)
+{
+	if (riscv_isa_extension_available(NULL, ZVKNED) &&
+	    riscv_vector_vlen() >= 128)
+		return crypto_register_alg(&riscv64_aes_zvkned_alg);
+
+	return 0;
+}
+
+static void __exit riscv_aes_mod_fini(void)
+{
+	if (riscv_isa_extension_available(NULL, ZVKNED) &&
+	    riscv_vector_vlen() >= 128)
+		return crypto_unregister_alg(&riscv64_aes_zvkned_alg);
+}
+
+module_init(riscv_aes_mod_init);
+module_exit(riscv_aes_mod_fini);
+
+MODULE_DESCRIPTION("AES (accelerated)");
+MODULE_AUTHOR("Heiko Stuebner <heiko.stuebner@vrull.eu>");
+MODULE_LICENSE("GPL v2");
+MODULE_ALIAS_CRYPTO("aes");
diff --git a/arch/riscv/crypto/aes-riscv64-zvkned.pl b/arch/riscv/crypto/aes-riscv64-zvkned.pl
new file mode 100644
index 000000000000..176588723220
--- /dev/null
+++ b/arch/riscv/crypto/aes-riscv64-zvkned.pl
@@ -0,0 +1,500 @@
+#! /usr/bin/env perl
+# Copyright 2023 The OpenSSL Project Authors. All Rights Reserved.
+#
+# Licensed under the Apache License 2.0 (the "License").  You may not use
+# this file except in compliance with the License.  You can obtain a copy
+# in the file LICENSE in the source distribution or at
+# https://www.openssl.org/source/license.html
+
+# - RV64I
+# - RISC-V vector ('V') with VLEN >= 128
+# - RISC-V vector crypto AES extension ('Zvkned')
+
+use strict;
+use warnings;
+
+use FindBin qw($Bin);
+use lib "$Bin";
+use lib "$Bin/../../perlasm";
+use riscv;
+
+# $output is the last argument if it looks like a file (it has an extension)
+# $flavour is the first argument if it doesn't look like a file
+my $output = $#ARGV >= 0 && $ARGV[$#ARGV] =~ m|\.\w+$| ? pop : undef;
+my $flavour = $#ARGV >= 0 && $ARGV[0] !~ m|\.| ? shift : undef;
+
+$output and open STDOUT,">$output";
+
+my $code=<<___;
+.text
+___
+
+################################################################################
+# int rv64i_zvkned_set_encrypt_key(const unsigned char *userKey, const int bits,
+#                                  AES_KEY *key)
+# int rv64i_zvkned_set_decrypt_key(const unsigned char *userKey, const int bits,
+#                                  AES_KEY *key)
+{
+my ($UKEY,$BITS,$KEYP) = ("a0", "a1", "a2");
+my ($T0,$T1,$T4) = ("t1", "t2", "t4");
+my ($v0,  $v1,  $v2,  $v3,  $v4,  $v5,  $v6,
+          $v7,  $v8,  $v9,  $v10, $v11, $v12,
+          $v13, $v14, $v15, $v16, $v17, $v18,
+          $v19, $v20, $v21, $v22, $v23, $v24,
+) = map("v$_",(0..24));
+
+$code .= <<___;
+.p2align 3
+.globl rv64i_zvkned_set_encrypt_key
+.type rv64i_zvkned_set_encrypt_key,\@function
+rv64i_zvkned_set_encrypt_key:
+    beqz $UKEY, L_fail_m1
+    beqz $KEYP, L_fail_m1
+
+    # Get proper routine for key size
+    li $T0, 256
+    beq $BITS, $T0, L_set_key_256
+    li $T0, 128
+    beq $BITS, $T0, L_set_key_128
+
+    j L_fail_m2
+
+.size rv64i_zvkned_set_encrypt_key,.-rv64i_zvkned_set_encrypt_key
+___
+
+$code .= <<___;
+.p2align 3
+.globl rv64i_zvkned_set_decrypt_key
+.type rv64i_zvkned_set_decrypt_key,\@function
+rv64i_zvkned_set_decrypt_key:
+    beqz $UKEY, L_fail_m1
+    beqz $KEYP, L_fail_m1
+
+    # Get proper routine for key size
+    li $T0, 256
+    beq $BITS, $T0, L_set_key_256
+    li $T0, 128
+    beq $BITS, $T0, L_set_key_128
+
+    j L_fail_m2
+
+.size rv64i_zvkned_set_decrypt_key,.-rv64i_zvkned_set_decrypt_key
+___
+
+$code .= <<___;
+.p2align 3
+L_set_key_128:
+    # Store the number of rounds
+    li $T1, 10
+    sw $T1, 240($KEYP)
+
+    @{[vsetivli__x0_4_e32_m1_ta_ma]}
+
+    # Load the key
+    @{[vle32_v $v10, ($UKEY)]}
+
+    # Generate keys for round 2-11 into registers v11-v20.
+    @{[vaeskf1_vi $v11, $v10, 1]}   # v11 <- rk2  (w[ 4, 7])
+    @{[vaeskf1_vi $v12, $v11, 2]}   # v12 <- rk3  (w[ 8,11])
+    @{[vaeskf1_vi $v13, $v12, 3]}   # v13 <- rk4  (w[12,15])
+    @{[vaeskf1_vi $v14, $v13, 4]}   # v14 <- rk5  (w[16,19])
+    @{[vaeskf1_vi $v15, $v14, 5]}   # v15 <- rk6  (w[20,23])
+    @{[vaeskf1_vi $v16, $v15, 6]}   # v16 <- rk7  (w[24,27])
+    @{[vaeskf1_vi $v17, $v16, 7]}   # v17 <- rk8  (w[28,31])
+    @{[vaeskf1_vi $v18, $v17, 8]}   # v18 <- rk9  (w[32,35])
+    @{[vaeskf1_vi $v19, $v18, 9]}   # v19 <- rk10 (w[36,39])
+    @{[vaeskf1_vi $v20, $v19, 10]}  # v20 <- rk11 (w[40,43])
+
+    # Store the round keys
+    @{[vse32_v $v10, ($KEYP)]}
+    addi $KEYP, $KEYP, 16
+    @{[vse32_v $v11, ($KEYP)]}
+    addi $KEYP, $KEYP, 16
+    @{[vse32_v $v12, ($KEYP)]}
+    addi $KEYP, $KEYP, 16
+    @{[vse32_v $v13, ($KEYP)]}
+    addi $KEYP, $KEYP, 16
+    @{[vse32_v $v14, ($KEYP)]}
+    addi $KEYP, $KEYP, 16
+    @{[vse32_v $v15, ($KEYP)]}
+    addi $KEYP, $KEYP, 16
+    @{[vse32_v $v16, ($KEYP)]}
+    addi $KEYP, $KEYP, 16
+    @{[vse32_v $v17, ($KEYP)]}
+    addi $KEYP, $KEYP, 16
+    @{[vse32_v $v18, ($KEYP)]}
+    addi $KEYP, $KEYP, 16
+    @{[vse32_v $v19, ($KEYP)]}
+    addi $KEYP, $KEYP, 16
+    @{[vse32_v $v20, ($KEYP)]}
+
+    li a0, 1
+    ret
+.size L_set_key_128,.-L_set_key_128
+___
+
+$code .= <<___;
+.p2align 3
+L_set_key_256:
+    # Store the number of rounds
+    li $T1, 14
+    sw $T1, 240($KEYP)
+
+    @{[vsetivli__x0_4_e32_m1_ta_ma]}
+
+    # Load the key
+    @{[vle32_v $v10, ($UKEY)]}
+    addi $UKEY, $UKEY, 16
+    @{[vle32_v $v11, ($UKEY)]}
+
+    @{[vmv_v_v $v12, $v10]}
+    @{[vaeskf2_vi $v12, $v11, 1]}
+    @{[vmv_v_v $v13, $v11]}
+    @{[vaeskf2_vi $v13, $v12, 2]}
+    @{[vmv_v_v $v14, $v12]}
+    @{[vaeskf2_vi $v14, $v13, 3]}
+    @{[vmv_v_v $v15, $v13]}
+    @{[vaeskf2_vi $v15, $v14, 4]}
+    @{[vmv_v_v $v16, $v14]}
+    @{[vaeskf2_vi $v16, $v15, 5]}
+    @{[vmv_v_v $v17, $v15]}
+    @{[vaeskf2_vi $v17, $v16, 6]}
+    @{[vmv_v_v $v18, $v16]}
+    @{[vaeskf2_vi $v18, $v17, 7]}
+    @{[vmv_v_v $v19, $v17]}
+    @{[vaeskf2_vi $v19, $v18, 8]}
+    @{[vmv_v_v $v20, $v18]}
+    @{[vaeskf2_vi $v20, $v19, 9]}
+    @{[vmv_v_v $v21, $v19]}
+    @{[vaeskf2_vi $v21, $v20, 10]}
+    @{[vmv_v_v $v22, $v20]}
+    @{[vaeskf2_vi $v22, $v21, 11]}
+    @{[vmv_v_v $v23, $v21]}
+    @{[vaeskf2_vi $v23, $v22, 12]}
+    @{[vmv_v_v $v24, $v22]}
+    @{[vaeskf2_vi $v24, $v23, 13]}
+
+    @{[vse32_v $v10, ($KEYP)]}
+    addi $KEYP, $KEYP, 16
+    @{[vse32_v $v11, ($KEYP)]}
+    addi $KEYP, $KEYP, 16
+    @{[vse32_v $v12, ($KEYP)]}
+    addi $KEYP, $KEYP, 16
+    @{[vse32_v $v13, ($KEYP)]}
+    addi $KEYP, $KEYP, 16
+    @{[vse32_v $v14, ($KEYP)]}
+    addi $KEYP, $KEYP, 16
+    @{[vse32_v $v15, ($KEYP)]}
+    addi $KEYP, $KEYP, 16
+    @{[vse32_v $v16, ($KEYP)]}
+    addi $KEYP, $KEYP, 16
+    @{[vse32_v $v17, ($KEYP)]}
+    addi $KEYP, $KEYP, 16
+    @{[vse32_v $v18, ($KEYP)]}
+    addi $KEYP, $KEYP, 16
+    @{[vse32_v $v19, ($KEYP)]}
+    addi $KEYP, $KEYP, 16
+    @{[vse32_v $v20, ($KEYP)]}
+    addi $KEYP, $KEYP, 16
+    @{[vse32_v $v21, ($KEYP)]}
+    addi $KEYP, $KEYP, 16
+    @{[vse32_v $v22, ($KEYP)]}
+    addi $KEYP, $KEYP, 16
+    @{[vse32_v $v23, ($KEYP)]}
+    addi $KEYP, $KEYP, 16
+    @{[vse32_v $v24, ($KEYP)]}
+
+    li a0, 1
+    ret
+.size L_set_key_256,.-L_set_key_256
+___
+}
+
+################################################################################
+# void rv64i_zvkned_encrypt(const unsigned char *in, unsigned char *out,
+#                           const AES_KEY *key);
+{
+my ($INP,$OUTP,$KEYP) = ("a0", "a1", "a2");
+my ($T0,$T1, $rounds, $T6) = ("a3", "a4", "t5", "t6");
+my ($v0,  $v1,  $v2,  $v3,  $v4,  $v5,  $v6,
+          $v7,  $v8,  $v9,  $v10, $v11, $v12,
+          $v13, $v14, $v15, $v16, $v17, $v18,
+          $v19, $v20, $v21, $v22, $v23, $v24,
+) = map("v$_",(0..24));
+
+$code .= <<___;
+.p2align 3
+.globl rv64i_zvkned_encrypt
+.type rv64i_zvkned_encrypt,\@function
+rv64i_zvkned_encrypt:
+    # Load number of rounds
+    lwu     $rounds, 240($KEYP)
+
+    # Get proper routine for key size
+    li $T6, 14
+    beq $rounds, $T6, L_enc_256
+    li $T6, 10
+    beq $rounds, $T6, L_enc_128
+
+    j L_fail_m2
+.size rv64i_zvkned_encrypt,.-rv64i_zvkned_encrypt
+___
+
+$code .= <<___;
+.p2align 3
+L_enc_128:
+    @{[vsetivli__x0_4_e32_m1_ta_ma]}
+
+    @{[vle32_v $v10, ($KEYP)]}
+    addi $KEYP, $KEYP, 16
+    @{[vle32_v $v11, ($KEYP)]}
+    addi $KEYP, $KEYP, 16
+    @{[vle32_v $v12, ($KEYP)]}
+    addi $KEYP, $KEYP, 16
+    @{[vle32_v $v13, ($KEYP)]}
+    addi $KEYP, $KEYP, 16
+    @{[vle32_v $v14, ($KEYP)]}
+    addi $KEYP, $KEYP, 16
+    @{[vle32_v $v15, ($KEYP)]}
+    addi $KEYP, $KEYP, 16
+    @{[vle32_v $v16, ($KEYP)]}
+    addi $KEYP, $KEYP, 16
+    @{[vle32_v $v17, ($KEYP)]}
+    addi $KEYP, $KEYP, 16
+    @{[vle32_v $v18, ($KEYP)]}
+    addi $KEYP, $KEYP, 16
+    @{[vle32_v $v19, ($KEYP)]}
+    addi $KEYP, $KEYP, 16
+    @{[vle32_v $v20, ($KEYP)]}
+
+    @{[vle32_v $v1, ($INP)]}
+
+    @{[vaesz_vs $v1, $v10]}    # with round key w[ 0, 3]
+    @{[vaesem_vs $v1, $v11]}   # with round key w[ 4, 7]
+    @{[vaesem_vs $v1, $v12]}   # with round key w[ 8,11]
+    @{[vaesem_vs $v1, $v13]}   # with round key w[12,15]
+    @{[vaesem_vs $v1, $v14]}   # with round key w[16,19]
+    @{[vaesem_vs $v1, $v15]}   # with round key w[20,23]
+    @{[vaesem_vs $v1, $v16]}   # with round key w[24,27]
+    @{[vaesem_vs $v1, $v17]}   # with round key w[28,31]
+    @{[vaesem_vs $v1, $v18]}   # with round key w[32,35]
+    @{[vaesem_vs $v1, $v19]}   # with round key w[36,39]
+    @{[vaesef_vs $v1, $v20]}   # with round key w[40,43]
+
+    @{[vse32_v $v1, ($OUTP)]}
+
+    ret
+.size L_enc_128,.-L_enc_128
+___
+
+$code .= <<___;
+.p2align 3
+L_enc_256:
+    @{[vsetivli__x0_4_e32_m1_ta_ma]}
+
+    @{[vle32_v $v10, ($KEYP)]}
+    addi $KEYP, $KEYP, 16
+    @{[vle32_v $v11, ($KEYP)]}
+    addi $KEYP, $KEYP, 16
+    @{[vle32_v $v12, ($KEYP)]}
+    addi $KEYP, $KEYP, 16
+    @{[vle32_v $v13, ($KEYP)]}
+    addi $KEYP, $KEYP, 16
+    @{[vle32_v $v14, ($KEYP)]}
+    addi $KEYP, $KEYP, 16
+    @{[vle32_v $v15, ($KEYP)]}
+    addi $KEYP, $KEYP, 16
+    @{[vle32_v $v16, ($KEYP)]}
+    addi $KEYP, $KEYP, 16
+    @{[vle32_v $v17, ($KEYP)]}
+    addi $KEYP, $KEYP, 16
+    @{[vle32_v $v18, ($KEYP)]}
+    addi $KEYP, $KEYP, 16
+    @{[vle32_v $v19, ($KEYP)]}
+    addi $KEYP, $KEYP, 16
+    @{[vle32_v $v20, ($KEYP)]}
+    addi $KEYP, $KEYP, 16
+    @{[vle32_v $v21, ($KEYP)]}
+    addi $KEYP, $KEYP, 16
+    @{[vle32_v $v22, ($KEYP)]}
+    addi $KEYP, $KEYP, 16
+    @{[vle32_v $v23, ($KEYP)]}
+    addi $KEYP, $KEYP, 16
+    @{[vle32_v $v24, ($KEYP)]}
+
+    @{[vle32_v $v1, ($INP)]}
+
+    @{[vaesz_vs $v1, $v10]}     # with round key w[ 0, 3]
+    @{[vaesem_vs $v1, $v11]}
+    @{[vaesem_vs $v1, $v12]}
+    @{[vaesem_vs $v1, $v13]}
+    @{[vaesem_vs $v1, $v14]}
+    @{[vaesem_vs $v1, $v15]}
+    @{[vaesem_vs $v1, $v16]}
+    @{[vaesem_vs $v1, $v17]}
+    @{[vaesem_vs $v1, $v18]}
+    @{[vaesem_vs $v1, $v19]}
+    @{[vaesem_vs $v1, $v20]}
+    @{[vaesem_vs $v1, $v21]}
+    @{[vaesem_vs $v1, $v22]}
+    @{[vaesem_vs $v1, $v23]}
+    @{[vaesef_vs $v1, $v24]}
+
+    @{[vse32_v $v1, ($OUTP)]}
+    ret
+.size L_enc_256,.-L_enc_256
+___
+}
+
+################################################################################
+# void rv64i_zvkned_decrypt(const unsigned char *in, unsigned char *out,
+#                           const AES_KEY *key);
+{
+my ($INP,$OUTP,$KEYP) = ("a0", "a1", "a2");
+my ($T0,$T1, $rounds, $T6) = ("a3", "a4", "t5", "t6");
+my ($v0,  $v1,  $v2,  $v3,  $v4,  $v5,  $v6,
+          $v7,  $v8,  $v9,  $v10, $v11, $v12,
+          $v13, $v14, $v15, $v16, $v17, $v18,
+          $v19, $v20, $v21, $v22, $v23, $v24,
+) = map("v$_",(0..24));
+
+$code .= <<___;
+.p2align 3
+.globl rv64i_zvkned_decrypt
+.type rv64i_zvkned_decrypt,\@function
+rv64i_zvkned_decrypt:
+    # Load number of rounds
+    lwu     $rounds, 240($KEYP)
+
+    # Get proper routine for key size
+    li $T6, 14
+    beq $rounds, $T6, L_dec_256
+    li $T6, 10
+    beq $rounds, $T6, L_dec_128
+
+    j L_fail_m2
+.size rv64i_zvkned_decrypt,.-rv64i_zvkned_decrypt
+___
+
+$code .= <<___;
+.p2align 3
+L_dec_128:
+    @{[vsetivli__x0_4_e32_m1_ta_ma]}
+
+    @{[vle32_v $v10, ($KEYP)]}
+    addi $KEYP, $KEYP, 16
+    @{[vle32_v $v11, ($KEYP)]}
+    addi $KEYP, $KEYP, 16
+    @{[vle32_v $v12, ($KEYP)]}
+    addi $KEYP, $KEYP, 16
+    @{[vle32_v $v13, ($KEYP)]}
+    addi $KEYP, $KEYP, 16
+    @{[vle32_v $v14, ($KEYP)]}
+    addi $KEYP, $KEYP, 16
+    @{[vle32_v $v15, ($KEYP)]}
+    addi $KEYP, $KEYP, 16
+    @{[vle32_v $v16, ($KEYP)]}
+    addi $KEYP, $KEYP, 16
+    @{[vle32_v $v17, ($KEYP)]}
+    addi $KEYP, $KEYP, 16
+    @{[vle32_v $v18, ($KEYP)]}
+    addi $KEYP, $KEYP, 16
+    @{[vle32_v $v19, ($KEYP)]}
+    addi $KEYP, $KEYP, 16
+    @{[vle32_v $v20, ($KEYP)]}
+
+    @{[vle32_v $v1, ($INP)]}
+
+    @{[vaesz_vs $v1, $v20]}    # with round key w[43,47]
+    @{[vaesdm_vs $v1, $v19]}   # with round key w[36,39]
+    @{[vaesdm_vs $v1, $v18]}   # with round key w[32,35]
+    @{[vaesdm_vs $v1, $v17]}   # with round key w[28,31]
+    @{[vaesdm_vs $v1, $v16]}   # with round key w[24,27]
+    @{[vaesdm_vs $v1, $v15]}   # with round key w[20,23]
+    @{[vaesdm_vs $v1, $v14]}   # with round key w[16,19]
+    @{[vaesdm_vs $v1, $v13]}   # with round key w[12,15]
+    @{[vaesdm_vs $v1, $v12]}   # with round key w[ 8,11]
+    @{[vaesdm_vs $v1, $v11]}   # with round key w[ 4, 7]
+    @{[vaesdf_vs $v1, $v10]}   # with round key w[ 0, 3]
+
+    @{[vse32_v $v1, ($OUTP)]}
+
+    ret
+.size L_dec_128,.-L_dec_128
+___
+
+$code .= <<___;
+.p2align 3
+L_dec_256:
+    @{[vsetivli__x0_4_e32_m1_ta_ma]}
+
+    @{[vle32_v $v10, ($KEYP)]}
+    addi $KEYP, $KEYP, 16
+    @{[vle32_v $v11, ($KEYP)]}
+    addi $KEYP, $KEYP, 16
+    @{[vle32_v $v12, ($KEYP)]}
+    addi $KEYP, $KEYP, 16
+    @{[vle32_v $v13, ($KEYP)]}
+    addi $KEYP, $KEYP, 16
+    @{[vle32_v $v14, ($KEYP)]}
+    addi $KEYP, $KEYP, 16
+    @{[vle32_v $v15, ($KEYP)]}
+    addi $KEYP, $KEYP, 16
+    @{[vle32_v $v16, ($KEYP)]}
+    addi $KEYP, $KEYP, 16
+    @{[vle32_v $v17, ($KEYP)]}
+    addi $KEYP, $KEYP, 16
+    @{[vle32_v $v18, ($KEYP)]}
+    addi $KEYP, $KEYP, 16
+    @{[vle32_v $v19, ($KEYP)]}
+    addi $KEYP, $KEYP, 16
+    @{[vle32_v $v20, ($KEYP)]}
+    addi $KEYP, $KEYP, 16
+    @{[vle32_v $v21, ($KEYP)]}
+    addi $KEYP, $KEYP, 16
+    @{[vle32_v $v22, ($KEYP)]}
+    addi $KEYP, $KEYP, 16
+    @{[vle32_v $v23, ($KEYP)]}
+    addi $KEYP, $KEYP, 16
+    @{[vle32_v $v24, ($KEYP)]}
+
+    @{[vle32_v $v1, ($INP)]}
+
+    @{[vaesz_vs $v1, $v24]}    # with round key w[56,59]
+    @{[vaesdm_vs $v1, $v23]}   # with round key w[52,55]
+    @{[vaesdm_vs $v1, $v22]}   # with round key w[48,51]
+    @{[vaesdm_vs $v1, $v21]}   # with round key w[44,47]
+    @{[vaesdm_vs $v1, $v20]}   # with round key w[40,43]
+    @{[vaesdm_vs $v1, $v19]}   # with round key w[36,39]
+    @{[vaesdm_vs $v1, $v18]}   # with round key w[32,35]
+    @{[vaesdm_vs $v1, $v17]}   # with round key w[28,31]
+    @{[vaesdm_vs $v1, $v16]}   # with round key w[24,27]
+    @{[vaesdm_vs $v1, $v15]}   # with round key w[20,23]
+    @{[vaesdm_vs $v1, $v14]}   # with round key w[16,19]
+    @{[vaesdm_vs $v1, $v13]}   # with round key w[12,15]
+    @{[vaesdm_vs $v1, $v12]}   # with round key w[ 8,11]
+    @{[vaesdm_vs $v1, $v11]}   # with round key w[ 4, 7]
+    @{[vaesdf_vs $v1, $v10]}   # with round key w[ 0, 3]
+
+    @{[vse32_v $v1, ($OUTP)]}
+
+    ret
+.size L_dec_256,.-L_dec_256
+___
+}
+
+$code .= <<___;
+L_fail_m1:
+    li a0, -1
+    ret
+.size L_fail_m1,.-L_fail_m1
+
+L_fail_m2:
+    li a0, -2
+    ret
+.size L_fail_m2,.-L_fail_m2
+___
+
+print $code;
+
+close STDOUT or die "error closing STDOUT: $!";

From patchwork Mon Mar 13 19:13:01 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: =?utf-8?q?Heiko_St=C3=BCbner?= <heiko@sntech.de>
X-Patchwork-Id: 69074
Return-Path: <linux-kernel-owner@vger.kernel.org>
Delivered-To: ouuuleilei@gmail.com
Received: by 2002:a5d:5915:0:0:0:0:0 with SMTP id v21csp1376736wrd;
        Mon, 13 Mar 2023 12:56:09 -0700 (PDT)
X-Google-Smtp-Source: 
 AK7set8JES53tBA9HUZeSYbDjGUfsHzKT+vkiM9XDu8k2QXmPjgfmpvMljYxer7CMsiDYwEmwFfe
X-Received: by 2002:a17:902:d50f:b0:19c:d6fe:39c7 with SMTP id
 b15-20020a170902d50f00b0019cd6fe39c7mr42439835plg.41.1678737369708;
        Mon, 13 Mar 2023 12:56:09 -0700 (PDT)
ARC-Seal: i=1; a=rsa-sha256; t=1678737369; cv=none;
        d=google.com; s=arc-20160816;
        b=Ehn8AzU9EiNa+KnFnM+uSCXKs2wRbB7GzsA42qnZPr2HNBd3V5K0Sev1msGpYbxpLG
         4pkUqWKg4Amq+dIOQ76l9uiiHfuGxxC+xc1K09XU9DzHuyJ0hOuBBXDSF/ZUDM+WSm9Q
         20MBy6uGCahQS/ya9tjcTidjKv7NBBljwJN2nLUfEjtRuWR1FfBTGECIMI7/gZCrk5wD
         zqCmOgP+nfOjVPw30lQsXAECqjvVGPWrCsyzlMWZ3Lf4aWMXIm0Z1d+/UAT1JLfaWYeQ
         msHStM6vgoea+V+tQjMKdtOw+pB+UQ56Q8VnaJ0Z/zeejFy955+JM/egoyWE9hlm9iGn
         bpaA==
ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com;
 s=arc-20160816;
        h=list-id:precedence:content-transfer-encoding:mime-version
         :references:in-reply-to:message-id:date:subject:cc:to:from;
        bh=n27U1p5WWr+zI6nYR6TPegjYDh/bXNoiu8ma5pThcUQ=;
        b=ZKwXXYxXlbmiXuKFBHUrpw3vTmTzElwTgObXzBN9zkqOO+suOCcwqX+4+Lhusv60MQ
         t9ZZlbXNHy+EjBQBoPtU/qA9mBoh7CAVYSFFI13YZCuwoTROif+yMx/QwqjqBoy5SHY9
         N/ob2lQzgHSowpjZWt/DiB8mu2ReKfNzovQk4gnkWBg6K4KMjMgM/8ejNRw8ZxRe0hPt
         GDCnXSCKZnhrFBLNFjqi7/HDaY0qd8oIMpU6wLMVxrPiWhPKvUal2GQW8llU1AIu7xfQ
         /gTb49rl3Hapk9B1U3eAsSeG8GR/p9yAjHYXvvwRJe2QWFsEdQg9225keXNlD66dSrzz
         71DQ==
ARC-Authentication-Results: i=1; mx.google.com;
       spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org
 designates 2620:137:e000::1:20 as permitted sender)
 smtp.mailfrom=linux-kernel-owner@vger.kernel.org;
       dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=sntech.de
Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20])
        by mx.google.com with ESMTP id
 33-20020a630e61000000b004fb359820cfsi200184pgo.573.2023.03.13.12.55.53;
        Mon, 13 Mar 2023 12:56:09 -0700 (PDT)
Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org
 designates 2620:137:e000::1:20 as permitted sender)
 client-ip=2620:137:e000::1:20;
Authentication-Results: mx.google.com;
       spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org
 designates 2620:137:e000::1:20 as permitted sender)
 smtp.mailfrom=linux-kernel-owner@vger.kernel.org;
       dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=sntech.de
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S230308AbjCMTWr (ORCPT <rfc822;realc9580@gmail.com> + 99 others);
        Mon, 13 Mar 2023 15:22:47 -0400
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:48286 "EHLO
        lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S231152AbjCMTTo (ORCPT
        <rfc822;linux-kernel@vger.kernel.org>);
        Mon, 13 Mar 2023 15:19:44 -0400
Received: from gloria.sntech.de (gloria.sntech.de [185.11.138.130])
        by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 140B6265AB
        for <linux-kernel@vger.kernel.org>;
 Mon, 13 Mar 2023 12:19:12 -0700 (PDT)
Received: from ip4d1634a9.dynamic.kabel-deutschland.de ([77.22.52.169]
 helo=phil.lan)
        by gloria.sntech.de with esmtpsa  (TLS1.3) tls
 TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384
        (Exim 4.94.2)
        (envelope-from <heiko@sntech.de>)
        id 1pbnbX-00028k-Mx; Mon, 13 Mar 2023 20:13:07 +0100
From: Heiko Stuebner <heiko@sntech.de>
To: palmer@rivosinc.com
Cc: greentime.hu@sifive.com, conor@kernel.org,
        linux-kernel@vger.kernel.org, linux-riscv@lists.infradead.org,
        christoph.muellner@vrull.eu, heiko@sntech.de
Subject: [PATCH RFC v3 15/16] RISC-V: crypto: add Zvksed accelerated SM4
 encryption implementation
Date: Mon, 13 Mar 2023 20:13:01 +0100
Message-Id: <20230313191302.580787-16-heiko.stuebner@vrull.eu>
X-Mailer: git-send-email 2.39.0
In-Reply-To: <20230313191302.580787-1-heiko.stuebner@vrull.eu>
References: <20230313191302.580787-1-heiko.stuebner@vrull.eu>
MIME-Version: 1.0
X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00,SPF_PASS,
        T_SPF_HELO_TEMPERROR autolearn=ham autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on
        lindbergh.monkeyblade.net
Precedence: bulk
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org
X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?=
X-GMAIL-THRID: =?utf-8?q?1760283716466694827?=
X-GMAIL-MSGID: =?utf-8?q?1760283716466694827?=

From: Heiko Stuebner <heiko.stuebner@vrull.eu>

Add support for the SM4 symmetric cipher implemented using the special
instructions provided by the Zvksed vector crypto instructions.

Co-developed-by: Christoph Müllner <christoph.muellner@vrull.eu>
Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>
Signed-off-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
---
 arch/riscv/crypto/Kconfig               |  17 ++
 arch/riscv/crypto/Makefile              |   7 +
 arch/riscv/crypto/sm4-riscv64-glue.c    | 163 ++++++++++++++
 arch/riscv/crypto/sm4-riscv64-zvksed.pl | 270 ++++++++++++++++++++++++
 4 files changed, 457 insertions(+)
 create mode 100644 arch/riscv/crypto/sm4-riscv64-glue.c
 create mode 100644 arch/riscv/crypto/sm4-riscv64-zvksed.pl

diff --git a/arch/riscv/crypto/Kconfig b/arch/riscv/crypto/Kconfig
index c8abb29bb49b..a78c4fcb4127 100644
--- a/arch/riscv/crypto/Kconfig
+++ b/arch/riscv/crypto/Kconfig
@@ -51,4 +51,21 @@ config CRYPTO_SHA512_RISCV64
 	  Architecture: riscv64
 	  - Zvknhb vector crypto extension
 
+config CRYPTO_SM4_RISCV64
+	tristate "Ciphers: SM4 (ShangMi 4)"
+	depends on 64BIT && RISCV_ISA_V
+	select CRYPTO_ALGAPI
+	select CRYPTO_SM4
+	select CRYPTO_SM4_GENERIC
+	help
+	  SM4 cipher algorithms (OSCCA GB/T 32907-2016,
+	  ISO/IEC 18033-3:2010/Amd 1:2021)
+
+	  SM4 (GBT.32907-2016) is a cryptographic standard issued by the
+	  Organization of State Commercial Administration of China (OSCCA)
+	  as an authorized cryptographic algorithms for the use within China.
+
+	  Architecture: riscv64
+	  - Zvksed vector crypto extension
+
 endmenu
diff --git a/arch/riscv/crypto/Makefile b/arch/riscv/crypto/Makefile
index e5c702dff883..c721da42af4c 100644
--- a/arch/riscv/crypto/Makefile
+++ b/arch/riscv/crypto/Makefile
@@ -21,6 +21,9 @@ sha256-riscv64-y := sha256-riscv64-glue.o sha256-riscv64-zvknhb.o
 obj-$(CONFIG_CRYPTO_SHA512_RISCV64) += sha512-riscv64.o
 sha512-riscv64-y := sha512-riscv64-glue.o sha512-riscv64-zvknhb.o
 
+obj-$(CONFIG_CRYPTO_SM4_RISCV64) += sm4-riscv64.o
+sm4-riscv64-y := sm4-riscv64-glue.o sm4-riscv64-zvksed.o
+
 quiet_cmd_perlasm = PERLASM $@
       cmd_perlasm = $(PERL) $(<) void $(@)
 
@@ -42,6 +45,10 @@ $(obj)/sha256-riscv64-zvknhb.S: $(src)/sha256-riscv64-zvknha.pl
 $(obj)/sha512-riscv64-zvknhb.S: $(src)/sha512-riscv64-zvknhb.pl
 	$(call cmd,perlasm)
 
+$(obj)/sm4-riscv64-zvksed.S: $(src)/sm4-riscv64-zvksed.pl
+	$(call cmd,perlasm)
+
 clean-files += aes-riscv64-zvkned.S
 clean-files += ghash-riscv64-zbc.S ghash-riscv64-zvkb.S ghash-riscv64-zvkg.S
 clean-files += sha256-riscv64-zvknha.S sha512-riscv64-zvknhb.S
+clean-files += sm4-riscv64-zvksed.S
diff --git a/arch/riscv/crypto/sm4-riscv64-glue.c b/arch/riscv/crypto/sm4-riscv64-glue.c
new file mode 100644
index 000000000000..3eb37441f37c
--- /dev/null
+++ b/arch/riscv/crypto/sm4-riscv64-glue.c
@@ -0,0 +1,163 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Linux/riscv64 port of the OpenSSL SM4 implementation for RISCV64
+ *
+ * Copyright (C) 2023 VRULL GmbH
+ * Author: Heiko Stuebner <heiko.stuebner@vrull.eu>
+ */
+
+#include <linux/crypto.h>
+#include <linux/delay.h>
+#include <linux/err.h>
+#include <linux/module.h>
+#include <linux/types.h>
+#include <asm/simd.h>
+#include <asm/vector.h>
+#include <crypto/sm4.h>
+#include <crypto/internal/cipher.h>
+#include <crypto/internal/simd.h>
+
+struct sm4_key {
+	u32 rkey[SM4_RKEY_WORDS];
+};
+
+void rv64i_zvksed_sm4_encrypt(const u8 *in, u8 *out, const struct sm4_key *key);
+void rv64i_zvksed_sm4_decrypt(const u8 *in, u8 *out, const struct sm4_key *key);
+int rv64i_zvksed_sm4_set_encrypt_key(const u8 *userKey, struct sm4_key *key);
+int rv64i_zvksed_sm4_set_decrypt_key(const u8 *userKey, struct sm4_key *key);
+
+struct riscv_sm4_ctx {
+	struct crypto_cipher *fallback;
+	struct sm4_key enc_key;
+	struct sm4_key dec_key;
+	unsigned int keylen;
+};
+
+static int riscv64_sm4_init_zvksed(struct crypto_tfm *tfm)
+{
+	struct riscv_sm4_ctx *ctx = crypto_tfm_ctx(tfm);
+	const char *alg = crypto_tfm_alg_name(tfm);
+	struct crypto_cipher *fallback;
+
+	fallback = crypto_alloc_cipher(alg, 0, CRYPTO_ALG_NEED_FALLBACK);
+	if (IS_ERR(fallback)) {
+		printk(KERN_ERR
+		       "Failed to allocate fallback for '%s': %ld\n",
+		       alg, PTR_ERR(fallback));
+		return PTR_ERR(fallback);
+	}
+
+	crypto_cipher_set_flags(fallback,
+				crypto_cipher_get_flags((struct
+							 crypto_cipher *)
+							tfm));
+	ctx->fallback = fallback;
+
+	return 0;
+}
+
+static void riscv64_sm4_exit_zvksed(struct crypto_tfm *tfm)
+{
+	struct riscv_sm4_ctx *ctx = crypto_tfm_ctx(tfm);
+
+	if (ctx->fallback) {
+		crypto_free_cipher(ctx->fallback);
+		ctx->fallback = NULL;
+	}
+}
+
+static int riscv64_sm4_setkey_zvksed(struct crypto_tfm *tfm, const u8 *key,
+				     unsigned int keylen)
+{
+	struct riscv_sm4_ctx *ctx = crypto_tfm_ctx(tfm);
+	int ret;
+
+	ctx->keylen = keylen;
+
+	kernel_rvv_begin();
+	ret = rv64i_zvksed_sm4_set_encrypt_key(key, &ctx->enc_key);
+	if (ret != 1) {
+		kernel_rvv_end();
+		return -EINVAL;
+	}
+
+	ret = rv64i_zvksed_sm4_set_decrypt_key(key, &ctx->dec_key);
+	kernel_rvv_end();
+	if (ret != 1)
+		return -EINVAL;
+
+	ret = crypto_cipher_setkey(ctx->fallback, key, keylen);
+
+	return ret ? -EINVAL : 0;
+}
+
+static void riscv64_sm4_encrypt_zvksed(struct crypto_tfm *tfm, u8 *dst, const u8 *src)
+{
+	struct riscv_sm4_ctx *ctx = crypto_tfm_ctx(tfm);
+
+	if (crypto_simd_usable()) {
+		kernel_rvv_begin();
+		rv64i_zvksed_sm4_encrypt(src, dst, &ctx->enc_key);
+		kernel_rvv_end();
+	} else {
+		crypto_cipher_encrypt_one(ctx->fallback, dst, src);
+	}
+}
+
+static void riscv64_sm4_decrypt_zvksed(struct crypto_tfm *tfm, u8 *dst, const u8 *src)
+{
+	struct riscv_sm4_ctx *ctx = crypto_tfm_ctx(tfm);
+
+	if (crypto_simd_usable()) {
+		kernel_rvv_begin();
+		rv64i_zvksed_sm4_decrypt(src, dst, &ctx->dec_key);
+		kernel_rvv_end();
+	} else {
+		crypto_cipher_decrypt_one(ctx->fallback, dst, src);
+	}
+}
+
+struct crypto_alg riscv64_sm4_zvksed_alg = {
+	.cra_name = "sm4",
+	.cra_driver_name = "riscv-sm4-zvksed",
+	.cra_module = THIS_MODULE,
+	.cra_priority = 300,
+	.cra_flags = CRYPTO_ALG_TYPE_CIPHER | CRYPTO_ALG_NEED_FALLBACK,
+	.cra_blocksize = SM4_BLOCK_SIZE,
+	.cra_ctxsize = sizeof(struct riscv_sm4_ctx),
+	.cra_init = riscv64_sm4_init_zvksed,
+	.cra_exit = riscv64_sm4_exit_zvksed,
+	.cra_cipher = {
+		.cia_min_keysize = SM4_KEY_SIZE,
+		.cia_max_keysize = SM4_KEY_SIZE,
+		.cia_setkey = riscv64_sm4_setkey_zvksed,
+		.cia_encrypt = riscv64_sm4_encrypt_zvksed,
+		.cia_decrypt = riscv64_sm4_decrypt_zvksed,
+	},
+};
+
+static int __init riscv64_sm4_mod_init(void)
+{
+	if (riscv_isa_extension_available(NULL, ZVKSED) &&
+	    riscv_isa_extension_available(NULL, ZVKB) &&
+	    riscv_vector_vlen() >= 128)
+		return crypto_register_alg(&riscv64_sm4_zvksed_alg);
+
+	return 0;
+}
+
+static void __exit riscv64_sm4_mod_fini(void)
+{
+	if (riscv_isa_extension_available(NULL, ZVKSED) &&
+	    riscv_isa_extension_available(NULL, ZVKB) &&
+	    riscv_vector_vlen() >= 128)
+		crypto_unregister_alg(&riscv64_sm4_zvksed_alg);
+}
+
+module_init(riscv64_sm4_mod_init);
+module_exit(riscv64_sm4_mod_fini);
+
+MODULE_DESCRIPTION("SM4 (accelerated)");
+MODULE_AUTHOR("Heiko Stuebner <heiko.stuebner@vrull.eu>");
+MODULE_LICENSE("GPL v2");
+MODULE_ALIAS_CRYPTO("sm4");
diff --git a/arch/riscv/crypto/sm4-riscv64-zvksed.pl b/arch/riscv/crypto/sm4-riscv64-zvksed.pl
new file mode 100644
index 000000000000..3c948f273071
--- /dev/null
+++ b/arch/riscv/crypto/sm4-riscv64-zvksed.pl
@@ -0,0 +1,270 @@
+#! /usr/bin/env perl
+# Copyright 2023 The OpenSSL Project Authors. All Rights Reserved.
+#
+# Licensed under the Apache License 2.0 (the "License").  You may not use
+# this file except in compliance with the License.  You can obtain a copy
+# in the file LICENSE in the source distribution or at
+# https://www.openssl.org/source/license.html
+
+# The generated code of this file depends on the following RISC-V extensions:
+# - RV64I
+# - RISC-V vector ('V') with VLEN >= 128
+# - Vector Bit-manipulation used in Cryptography ('Zvkb')
+# - Vector ShangMi Suite: SM4 Block Cipher ('Zvksed')
+
+use strict;
+use warnings;
+
+use FindBin qw($Bin);
+use lib "$Bin";
+use lib "$Bin/../../perlasm";
+use riscv;
+
+# $output is the last argument if it looks like a file (it has an extension)
+# $flavour is the first argument if it doesn't look like a file
+my $output = $#ARGV >= 0 && $ARGV[$#ARGV] =~ m|\.\w+$| ? pop : undef;
+my $flavour = $#ARGV >= 0 && $ARGV[0] !~ m|\.| ? shift : undef;
+
+$output and open STDOUT,">$output";
+
+my $code=<<___;
+.text
+___
+
+####
+# int rv64i_zvksed_sm4_set_encrypt_key(const unsigned char *userKey,
+#                                      SM4_KEY *key);
+#
+{
+my ($ukey,$keys,$fk)=("a0","a1","t0");
+my ($vukey,$vfk,$vk0,$vk1,$vk2,$vk3,$vk4,$vk5,$vk6,$vk7)=("v1","v2","v3","v4","v5","v6","v7","v8","v9","v10");
+$code .= <<___;
+.p2align 3
+.globl rv64i_zvksed_sm4_set_encrypt_key
+.type rv64i_zvksed_sm4_set_encrypt_key,\@function
+rv64i_zvksed_sm4_set_encrypt_key:
+    @{[vsetivli__x0_4_e32_m1_ta_ma]}
+
+    # Load the user key
+    @{[vle32_v $vukey, $ukey]}
+    @{[vrev8_v $vukey, $vukey]}
+
+    # Load the FK.
+    la $fk, FK
+    @{[vle32_v $vfk, $fk]}
+
+    # Generate round keys.
+    @{[vxor_vv $vukey, $vukey, $vfk]}
+    @{[vsm4k_vi $vk0, $vukey, 0]} # rk[0:3]
+    @{[vsm4k_vi $vk1, $vk0, 1]} # rk[4:7]
+    @{[vsm4k_vi $vk2, $vk1, 2]} # rk[8:11]
+    @{[vsm4k_vi $vk3, $vk2, 3]} # rk[12:15]
+    @{[vsm4k_vi $vk4, $vk3, 4]} # rk[16:19]
+    @{[vsm4k_vi $vk5, $vk4, 5]} # rk[20:23]
+    @{[vsm4k_vi $vk6, $vk5, 6]} # rk[24:27]
+    @{[vsm4k_vi $vk7, $vk6, 7]} # rk[28:31]
+
+    # Store round keys
+    @{[vse32_v $vk0, $keys]} # rk[0:3]
+    addi $keys, $keys, 16
+    @{[vse32_v $vk1, $keys]} # rk[4:7]
+    addi $keys, $keys, 16
+    @{[vse32_v $vk2, $keys]} # rk[8:11]
+    addi $keys, $keys, 16
+    @{[vse32_v $vk3, $keys]} # rk[12:15]
+    addi $keys, $keys, 16
+    @{[vse32_v $vk4, $keys]} # rk[16:19]
+    addi $keys, $keys, 16
+    @{[vse32_v $vk5, $keys]} # rk[20:23]
+    addi $keys, $keys, 16
+    @{[vse32_v $vk6, $keys]} # rk[24:27]
+    addi $keys, $keys, 16
+    @{[vse32_v $vk7, $keys]} # rk[28:31]
+
+    li a0, 1
+    ret
+.size rv64i_zvksed_sm4_set_encrypt_key,.-rv64i_zvksed_sm4_set_encrypt_key
+___
+}
+
+####
+# int rv64i_zvksed_sm4_set_decrypt_key(const unsigned char *userKey,
+#                                      SM4_KEY *key);
+#
+{
+my ($ukey,$keys,$fk,$stride)=("a0","a1","t0","t1");
+my ($vukey,$vfk,$vk0,$vk1,$vk2,$vk3,$vk4,$vk5,$vk6,$vk7)=("v1","v2","v3","v4","v5","v6","v7","v8","v9","v10");
+$code .= <<___;
+.p2align 3
+.globl rv64i_zvksed_sm4_set_decrypt_key
+.type rv64i_zvksed_sm4_set_decrypt_key,\@function
+rv64i_zvksed_sm4_set_decrypt_key:
+    @{[vsetivli__x0_4_e32_m1_ta_ma]}
+
+    # Load the user key
+    @{[vle32_v $vukey, $ukey]}
+    @{[vrev8_v $vukey, $vukey]}
+
+    # Load the FK.
+    la $fk, FK
+    @{[vle32_v $vfk, $fk]}
+
+    # Generate round keys.
+    @{[vxor_vv $vukey, $vukey, $vfk]}
+    @{[vsm4k_vi $vk0, $vukey, 0]} # rk[0:3]
+    @{[vsm4k_vi $vk1, $vk0, 1]} # rk[4:7]
+    @{[vsm4k_vi $vk2, $vk1, 2]} # rk[8:11]
+    @{[vsm4k_vi $vk3, $vk2, 3]} # rk[12:15]
+    @{[vsm4k_vi $vk4, $vk3, 4]} # rk[16:19]
+    @{[vsm4k_vi $vk5, $vk4, 5]} # rk[20:23]
+    @{[vsm4k_vi $vk6, $vk5, 6]} # rk[24:27]
+    @{[vsm4k_vi $vk7, $vk6, 7]} # rk[28:31]
+
+    # Store round keys in reverse order
+    addi $keys, $keys, 12
+    li $stride, -4
+    @{[vsse32_v $vk7, $keys, $stride]} # rk[31:28]
+    addi $keys, $keys, 16
+    @{[vsse32_v $vk6, $keys, $stride]} # rk[27:24]
+    addi $keys, $keys, 16
+    @{[vsse32_v $vk5, $keys, $stride]} # rk[23:20]
+    addi $keys, $keys, 16
+    @{[vsse32_v $vk4, $keys, $stride]} # rk[19:16]
+    addi $keys, $keys, 16
+    @{[vsse32_v $vk3, $keys, $stride]} # rk[15:12]
+    addi $keys, $keys, 16
+    @{[vsse32_v $vk2, $keys, $stride]} # rk[11:8]
+    addi $keys, $keys, 16
+    @{[vsse32_v $vk1, $keys, $stride]} # rk[7:4]
+    addi $keys, $keys, 16
+    @{[vsse32_v $vk0, $keys, $stride]} # rk[3:0]
+
+    li a0, 1
+    ret
+.size rv64i_zvksed_sm4_set_decrypt_key,.-rv64i_zvksed_sm4_set_decrypt_key
+___
+}
+
+####
+# void rv64i_zvksed_sm4_encrypt(const unsigned char *in, unsigned char *out,
+#                               const SM4_KEY *key);
+#
+{
+my ($in,$out,$keys,$stride)=("a0","a1","a2","t0");
+my ($vdata,$vk0,$vk1,$vk2,$vk3,$vk4,$vk5,$vk6,$vk7,$vgen)=("v1","v2","v3","v4","v5","v6","v7","v8","v9","v10");
+$code .= <<___;
+.p2align 3
+.globl rv64i_zvksed_sm4_encrypt
+.type rv64i_zvksed_sm4_encrypt,\@function
+rv64i_zvksed_sm4_encrypt:
+    @{[vsetivli__x0_4_e32_m1_ta_ma]}
+
+    # Order of elements was adjusted in set_encrypt_key()
+    @{[vle32_v $vk0, $keys]} # rk[0:3]
+    addi $keys, $keys, 16
+    @{[vle32_v $vk1, $keys]} # rk[4:7]
+    addi $keys, $keys, 16
+    @{[vle32_v $vk2, $keys]} # rk[8:11]
+    addi $keys, $keys, 16
+    @{[vle32_v $vk3, $keys]} # rk[12:15]
+    addi $keys, $keys, 16
+    @{[vle32_v $vk4, $keys]} # rk[16:19]
+    addi $keys, $keys, 16
+    @{[vle32_v $vk5, $keys]} # rk[20:23]
+    addi $keys, $keys, 16
+    @{[vle32_v $vk6, $keys]} # rk[24:27]
+    addi $keys, $keys, 16
+    @{[vle32_v $vk7, $keys]} # rk[28:31]
+
+    # Load input data
+    @{[vle32_v $vdata, $in]}
+    @{[vrev8_v $vdata, $vdata]}
+
+    # Encrypt with all keys
+    @{[vsm4r_vs $vdata, $vk0]}
+    @{[vsm4r_vs $vdata, $vk1]}
+    @{[vsm4r_vs $vdata, $vk2]}
+    @{[vsm4r_vs $vdata, $vk3]}
+    @{[vsm4r_vs $vdata, $vk4]}
+    @{[vsm4r_vs $vdata, $vk5]}
+    @{[vsm4r_vs $vdata, $vk6]}
+    @{[vsm4r_vs $vdata, $vk7]}
+
+    # Save the ciphertext (in reverse element order)
+    @{[vrev8_v $vdata, $vdata]}
+    li $stride, -4
+    addi $out, $out, 12
+    @{[vsse32_v $vdata, $out, $stride]}
+
+    ret
+.size rv64i_zvksed_sm4_encrypt,.-rv64i_zvksed_sm4_encrypt
+___
+}
+
+####
+# void rv64i_zvksed_sm4_decrypt(const unsigned char *in, unsigned char *out,
+#                               const SM4_KEY *key);
+#
+{
+my ($in,$out,$keys,$stride)=("a0","a1","a2","t0");
+my ($vdata,$vk0,$vk1,$vk2,$vk3,$vk4,$vk5,$vk6,$vk7,$vgen)=("v1","v2","v3","v4","v5","v6","v7","v8","v9","v10");
+$code .= <<___;
+.p2align 3
+.globl rv64i_zvksed_sm4_decrypt
+.type rv64i_zvksed_sm4_decrypt,\@function
+rv64i_zvksed_sm4_decrypt:
+    @{[vsetivli__x0_4_e32_m1_ta_ma]}
+
+    # Order of elements was adjusted in set_decrypt_key()
+    @{[vle32_v $vk7, $keys]} # rk[31:28]
+    addi $keys, $keys, 16
+    @{[vle32_v $vk6, $keys]} # rk[27:24]
+    addi $keys, $keys, 16
+    @{[vle32_v $vk5, $keys]} # rk[23:20]
+    addi $keys, $keys, 16
+    @{[vle32_v $vk4, $keys]} # rk[19:16]
+    addi $keys, $keys, 16
+    @{[vle32_v $vk3, $keys]} # rk[15:11]
+    addi $keys, $keys, 16
+    @{[vle32_v $vk2, $keys]} # rk[11:8]
+    addi $keys, $keys, 16
+    @{[vle32_v $vk1, $keys]} # rk[7:4]
+    addi $keys, $keys, 16
+    @{[vle32_v $vk0, $keys]} # rk[3:0]
+
+    # Load input data
+    @{[vle32_v $vdata, $in]}
+    @{[vrev8_v $vdata, $vdata]}
+
+    # Encrypt with all keys
+    @{[vsm4r_vs $vdata, $vk7]}
+    @{[vsm4r_vs $vdata, $vk6]}
+    @{[vsm4r_vs $vdata, $vk5]}
+    @{[vsm4r_vs $vdata, $vk4]}
+    @{[vsm4r_vs $vdata, $vk3]}
+    @{[vsm4r_vs $vdata, $vk2]}
+    @{[vsm4r_vs $vdata, $vk1]}
+    @{[vsm4r_vs $vdata, $vk0]}
+
+    # Save the ciphertext (in reverse element order)
+    @{[vrev8_v $vdata, $vdata]}
+    li $stride, -4
+    addi $out, $out, 12
+    @{[vsse32_v $vdata, $out, $stride]}
+
+    ret
+.size rv64i_zvksed_sm4_decrypt,.-rv64i_zvksed_sm4_decrypt
+___
+}
+
+$code .= <<___;
+# Family Key (little-endian 32-bit chunks)
+.p2align 3
+FK:
+    .word 0xA3B1BAC6, 0x56AA3350, 0x677D9197, 0xB27022DC
+.size FK,.-FK
+___
+
+print $code;
+
+close STDOUT or die "error closing STDOUT: $!";

From patchwork Mon Mar 13 19:13:02 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: =?utf-8?q?Heiko_St=C3=BCbner?= <heiko@sntech.de>
X-Patchwork-Id: 69075
Return-Path: <linux-kernel-owner@vger.kernel.org>
Delivered-To: ouuuleilei@gmail.com
Received: by 2002:a5d:5915:0:0:0:0:0 with SMTP id v21csp1378719wrd;
        Mon, 13 Mar 2023 13:00:51 -0700 (PDT)
X-Google-Smtp-Source: 
 AK7set+2Y9O9Hs8GSEBRXbN56H+R4+Zb6diMT5paWigpMq5jqi6g/XmONUzIi/EMpWrSiFxWl3tA
X-Received: by 2002:a05:6a20:431a:b0:cb:77ef:b502 with SMTP id
 h26-20020a056a20431a00b000cb77efb502mr42195347pzk.5.1678737651388;
        Mon, 13 Mar 2023 13:00:51 -0700 (PDT)
ARC-Seal: i=1; a=rsa-sha256; t=1678737651; cv=none;
        d=google.com; s=arc-20160816;
        b=hwGFH+BjczN8axTnbDBQM9vkgBin8HYtpHeVSkRSX25HIvrDee8HQqX+JbC+A+b4j3
         Mw34J7znn9/DFknZGm4ADFpgC2RFQRScXJkh7IgmL8GJGdMHFnRFk4pH+RMQgkorwpmd
         6RzfqJnmF+mRqtQBAIpmx56MhG5riBpjOEc0n15EBNQSzgBT2FmVas961dddaN+fRMsn
         zFM4MK3oDMvncgLbwCvJbovVdyua6q0w1f8XwQ6+sHjUUw0cwc9PcS3lLd8iF/cdM/pv
         SGT2HiUd6TEAVbHOwLgQELqTrvmNAVqomauGCMfmaoWNh6DrJLifum7UUJVcIULv/ZR3
         RVjQ==
ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com;
 s=arc-20160816;
        h=list-id:precedence:content-transfer-encoding:mime-version
         :references:in-reply-to:message-id:date:subject:cc:to:from;
        bh=omJtUFaS2JOlvXouatGep2he25yHHuqqZuBfFBCPUiA=;
        b=GU1W1Wx2q5Ta5w9+Tokn5025hQmC8LnLx2Efd4t0YMICkn/Dj4cStaFCY62PaWiUCg
         gS+3SSU8Pox0oqDHDMT4Qy1fSHXcYloBwxncmj6+aAY9bZff/yl/cS44Zn15q6WKLbbw
         5fPhb0Skl0AzNiqJjZcrvlV+autvbKen3auM7xW4W1l8r5mr8n+5xMVsPeP0IF4tnqS1
         KBilN7aYDFSzScsU8NxTps7FNgfly8CGeRXB73oD1HAmqBJw4/lkcGeBY5s7Q3SEqrCo
         gOkjM2XJVxRfWTn+9N5EcXcuJ3Fqb33QyAJKbYtJpswt7YfmkuXnn3GNeTiEDF7fQh3X
         OTOQ==
ARC-Authentication-Results: i=1; mx.google.com;
       spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org
 designates 2620:137:e000::1:20 as permitted sender)
 smtp.mailfrom=linux-kernel-owner@vger.kernel.org;
       dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=sntech.de
Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20])
        by mx.google.com with ESMTP id
 g14-20020aa79dce000000b00576c9c3c4aasi332457pfq.5.2023.03.13.13.00.35;
        Mon, 13 Mar 2023 13:00:51 -0700 (PDT)
Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org
 designates 2620:137:e000::1:20 as permitted sender)
 client-ip=2620:137:e000::1:20;
Authentication-Results: mx.google.com;
       spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org
 designates 2620:137:e000::1:20 as permitted sender)
 smtp.mailfrom=linux-kernel-owner@vger.kernel.org;
       dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=sntech.de
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S231410AbjCMTTK (ORCPT <rfc822;realc9580@gmail.com> + 99 others);
        Mon, 13 Mar 2023 15:19:10 -0400
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:48446 "EHLO
        lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S231313AbjCMTSk (ORCPT
        <rfc822;linux-kernel@vger.kernel.org>);
        Mon, 13 Mar 2023 15:18:40 -0400
Received: from gloria.sntech.de (gloria.sntech.de [185.11.138.130])
        by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 67D4723312
        for <linux-kernel@vger.kernel.org>;
 Mon, 13 Mar 2023 12:18:02 -0700 (PDT)
Received: from ip4d1634a9.dynamic.kabel-deutschland.de ([77.22.52.169]
 helo=phil.lan)
        by gloria.sntech.de with esmtpsa  (TLS1.3) tls
 TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384
        (Exim 4.94.2)
        (envelope-from <heiko@sntech.de>)
        id 1pbnbX-00028k-Uz; Mon, 13 Mar 2023 20:13:08 +0100
From: Heiko Stuebner <heiko@sntech.de>
To: palmer@rivosinc.com
Cc: greentime.hu@sifive.com, conor@kernel.org,
        linux-kernel@vger.kernel.org, linux-riscv@lists.infradead.org,
        christoph.muellner@vrull.eu, heiko@sntech.de
Subject: [PATCH RFC v3 16/16] RISC-V: crypto: add Zvksh accelerated SM3 hash
 implementation
Date: Mon, 13 Mar 2023 20:13:02 +0100
Message-Id: <20230313191302.580787-17-heiko.stuebner@vrull.eu>
X-Mailer: git-send-email 2.39.0
In-Reply-To: <20230313191302.580787-1-heiko.stuebner@vrull.eu>
References: <20230313191302.580787-1-heiko.stuebner@vrull.eu>
MIME-Version: 1.0
X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00,SPF_PASS,
        T_SPF_HELO_TEMPERROR autolearn=ham autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on
        lindbergh.monkeyblade.net
Precedence: bulk
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org
X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?=
X-GMAIL-THRID: =?utf-8?q?1760284011632018712?=
X-GMAIL-MSGID: =?utf-8?q?1760284011632018712?=

From: Heiko Stuebner <heiko.stuebner@vrull.eu>

Add support for the SM3 hash function implemented using the special
instructions provided by the Zvksh vector crypto instructions.

Co-developed-by: Charalampos Mitrodimas <charalampos.mitrodimas@vrull.eu>
Signed-off-by: Charalampos Mitrodimas <charalampos.mitrodimas@vrull.eu>
Signed-off-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
---
 arch/riscv/crypto/Kconfig              |  11 ++
 arch/riscv/crypto/Makefile             |   8 +-
 arch/riscv/crypto/sm3-riscv64-glue.c   | 112 ++++++++++++++
 arch/riscv/crypto/sm3-riscv64-zvksh.pl | 195 +++++++++++++++++++++++++
 4 files changed, 325 insertions(+), 1 deletion(-)
 create mode 100644 arch/riscv/crypto/sm3-riscv64-glue.c
 create mode 100644 arch/riscv/crypto/sm3-riscv64-zvksh.pl

diff --git a/arch/riscv/crypto/Kconfig b/arch/riscv/crypto/Kconfig
index a78c4fcb4127..9e50e7236036 100644
--- a/arch/riscv/crypto/Kconfig
+++ b/arch/riscv/crypto/Kconfig
@@ -51,6 +51,17 @@ config CRYPTO_SHA512_RISCV64
 	  Architecture: riscv64
 	  - Zvknhb vector crypto extension
 
+config CRYPTO_SM3_RISCV64
+	tristate "Hash functions: SM3 (ShangMi 3)"
+	depends on 64BIT && RISCV_ISA_V
+	select CRYPTO_HASH
+	select CRYPTO_SM3
+	help
+	  SHA-512 secure hash algorithm (FIPS 180)
+
+	  Architecture: riscv64
+	  - Zvknhb vector crypto extension
+
 config CRYPTO_SM4_RISCV64
 	tristate "Ciphers: SM4 (ShangMi 4)"
 	depends on 64BIT && RISCV_ISA_V
diff --git a/arch/riscv/crypto/Makefile b/arch/riscv/crypto/Makefile
index c721da42af4c..79ff81a05d8b 100644
--- a/arch/riscv/crypto/Makefile
+++ b/arch/riscv/crypto/Makefile
@@ -21,6 +21,9 @@ sha256-riscv64-y := sha256-riscv64-glue.o sha256-riscv64-zvknhb.o
 obj-$(CONFIG_CRYPTO_SHA512_RISCV64) += sha512-riscv64.o
 sha512-riscv64-y := sha512-riscv64-glue.o sha512-riscv64-zvknhb.o
 
+obj-$(CONFIG_CRYPTO_SM3_RISCV64) += sm3-riscv64.o
+sm3-riscv64-y := sm3-riscv64-glue.o sm3-riscv64-zvksh.o
+
 obj-$(CONFIG_CRYPTO_SM4_RISCV64) += sm4-riscv64.o
 sm4-riscv64-y := sm4-riscv64-glue.o sm4-riscv64-zvksed.o
 
@@ -45,10 +48,13 @@ $(obj)/sha256-riscv64-zvknhb.S: $(src)/sha256-riscv64-zvknha.pl
 $(obj)/sha512-riscv64-zvknhb.S: $(src)/sha512-riscv64-zvknhb.pl
 	$(call cmd,perlasm)
 
+$(obj)/sm3-riscv64-zvksh.S: $(src)/sm3-riscv64-zvksh.pl
+	$(call cmd,perlasm)
+
 $(obj)/sm4-riscv64-zvksed.S: $(src)/sm4-riscv64-zvksed.pl
 	$(call cmd,perlasm)
 
 clean-files += aes-riscv64-zvkned.S
 clean-files += ghash-riscv64-zbc.S ghash-riscv64-zvkb.S ghash-riscv64-zvkg.S
 clean-files += sha256-riscv64-zvknha.S sha512-riscv64-zvknhb.S
-clean-files += sm4-riscv64-zvksed.S
+clean-files += sm3-riscv64-zvksh.S sm4-riscv64-zvksed.S
diff --git a/arch/riscv/crypto/sm3-riscv64-glue.c b/arch/riscv/crypto/sm3-riscv64-glue.c
new file mode 100644
index 000000000000..455f73c27d1f
--- /dev/null
+++ b/arch/riscv/crypto/sm3-riscv64-glue.c
@@ -0,0 +1,112 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * Linux/riscv64 port of the OpenSSL SM3 implementation for RISCV64
+ *
+ * Copyright (C) 2023 VRULL GmbH
+ * Author: Heiko Stuebner <heiko.stuebner@vrull.eu>
+ */
+
+#include <linux/types.h>
+#include <asm/simd.h>
+#include <asm/vector.h>
+#include <crypto/internal/hash.h>
+#include <crypto/internal/simd.h>
+#include <crypto/sha2.h>
+#include <crypto/sm3_base.h>
+
+asmlinkage void ossl_hwsm3_block_data_order_zvksh(u32 *digest, const void *o,
+						  unsigned int num);
+
+static void __sm3_block_data_order(struct sm3_state *sst, u8 const *src,
+				      int blocks)
+{
+	ossl_hwsm3_block_data_order_zvksh(sst->state, src, blocks);
+}
+
+static int riscv64_sm3_update(struct shash_desc *desc, const u8 *data,
+			 unsigned int len)
+{
+	if (crypto_simd_usable()) {
+		int ret;
+
+		kernel_rvv_begin();
+		ret = sm3_base_do_update(desc, data, len,
+					    __sm3_block_data_order);
+		kernel_rvv_end();
+		return ret;
+	} else { 
+		sm3_update(shash_desc_ctx(desc), data, len);
+		return 0;
+	}
+}
+
+static int riscv64_sm3_finup(struct shash_desc *desc, const u8 *data,
+			unsigned int len, u8 *out)
+{
+
+	if (!crypto_simd_usable()) {
+		struct sm3_state *sctx = shash_desc_ctx(desc);
+
+		if (len)
+			sm3_update(sctx, data, len);
+		sm3_final(sctx, out);
+		return 0;
+	}
+
+	kernel_rvv_begin();
+	if (len)
+		sm3_base_do_update(desc, data, len,
+				   __sm3_block_data_order);
+
+	sm3_base_do_finalize(desc, __sm3_block_data_order);
+	kernel_rvv_end();
+
+	return sm3_base_finish(desc, out);
+}
+
+static int riscv64_sm3_final(struct shash_desc *desc, u8 *out)
+{
+	return riscv64_sm3_finup(desc, NULL, 0, out);
+}
+
+static struct shash_alg sm3_alg = {
+	.digestsize		= SM3_DIGEST_SIZE,
+	.init			= sm3_base_init,
+	.update			= riscv64_sm3_update,
+	.final			= riscv64_sm3_final,
+	.finup			= riscv64_sm3_finup,
+	.descsize		= sizeof(struct sm3_state),
+	.base.cra_name		= "sm3",
+	.base.cra_driver_name	= "sm3-riscv64-zvksh",
+	.base.cra_priority	= 150,
+	.base.cra_blocksize	= SM3_BLOCK_SIZE,
+	.base.cra_module	= THIS_MODULE,
+};
+
+static int __init sm3_mod_init(void)
+{
+	/* sm3 needs at least a vlen of 256 to work correctly */
+	if (riscv_isa_extension_available(NULL, ZVKSH) &&
+	    riscv_isa_extension_available(NULL, ZVKB) &&
+	    riscv_vector_vlen() >= 256)
+		return crypto_register_shash(&sm3_alg);
+
+	return 0;
+}
+
+static void __exit sm3_mod_fini(void)
+{
+	if (riscv_isa_extension_available(NULL, ZVKSH) &&
+	    riscv_isa_extension_available(NULL, ZVKB) &&
+	    riscv_vector_vlen() >= 256)
+		crypto_unregister_shash(&sm3_alg);
+}
+
+module_init(sm3_mod_init);
+module_exit(sm3_mod_fini);
+
+MODULE_DESCRIPTION("SM3 secure hash for riscv64");
+MODULE_AUTHOR("Andy Polyakov <appro@openssl.org>");
+MODULE_AUTHOR("Ard Biesheuvel <ard.biesheuvel@linaro.org>");
+MODULE_LICENSE("GPL v2");
+MODULE_ALIAS_CRYPTO("sm3");
diff --git a/arch/riscv/crypto/sm3-riscv64-zvksh.pl b/arch/riscv/crypto/sm3-riscv64-zvksh.pl
new file mode 100644
index 000000000000..d6006ef32e4e
--- /dev/null
+++ b/arch/riscv/crypto/sm3-riscv64-zvksh.pl
@@ -0,0 +1,195 @@
+#! /usr/bin/env perl
+# Copyright 2023 The OpenSSL Project Authors. All Rights Reserved.
+#
+# Licensed under the Apache License 2.0 (the "License").  You may not use
+# this file except in compliance with the License.  You can obtain a copy
+# in the file LICENSE in the source distribution or at
+# https://www.openssl.org/source/license.html
+
+# The generated code of this file depends on the following RISC-V extensions:
+# - RV64I
+# - RISC-V vector ('V') with VLEN >= 256
+# - Vector Bit-manipulation used in Cryptography ('Zvkb')
+# - ShangMi Suite: SM3 Secure Hash ('Zvksh')
+
+use strict;
+use warnings;
+
+use FindBin qw($Bin);
+use lib "$Bin";
+use lib "$Bin/../../perlasm";
+use riscv;
+
+# $output is the last argument if it looks like a file (it has an extension)
+# $flavour is the first argument if it doesn't look like a file
+my $output = $#ARGV >= 0 && $ARGV[$#ARGV] =~ m|\.\w+$| ? pop : undef;
+my $flavour = $#ARGV >= 0 && $ARGV[0] !~ m|\.| ? shift : undef;
+
+$output and open STDOUT,">$output";
+
+my $code=<<___;
+.text
+___
+
+################################################################################
+# ossl_hwsm3_block_data_order_zvksh(SM3_CTX *c, const void *p, size_t num);
+{
+my ($CTX, $INPUT, $NUM) = ("a0", "a1", "a2");
+my ($V0, $V1, $V2, $V3, $V4) = ("v0", "v1", "v2", "v3", "v4");
+
+$code .= <<___;
+.text
+.p2align 3
+.globl ossl_hwsm3_block_data_order_zvksh
+.type ossl_hwsm3_block_data_order_zvksh,\@function
+ossl_hwsm3_block_data_order_zvksh:
+    @{[vsetivli__x0_8_e32_m1_ta_ma]}
+
+    # Load initial state of hash context (c->A-H).
+    @{[vle32_v $V0, $CTX]}
+    @{[vrev8_v $V0, $V0]}
+
+L_sm3_loop:
+    # Copy the previous state to v1.
+    # It will be XOR'ed with the current state at the end of the round.
+    @{[vmv_v_v $V1, $V0]}
+
+    # Load the 64B block in 2x32B chunks.
+    @{[vle32_v $V3, $INPUT]} # v3 := {w0, ..., w7}
+    add $INPUT, $INPUT, 32
+
+    @{[vle32_v $V4, $INPUT]} # v4 := {w8, ..., w15}
+    add $INPUT, $INPUT, 32
+
+    add $NUM, $NUM, -1
+
+    # As vsm3c consumes only w0, w1, w4, w5 we need to slide the input
+    # 2 elements down so we process elements w2, w3, w6, w7
+    # This will be repeated for each odd round.
+    @{[vslidedown_vi $V2, $V3, 2]} # v2 := {w2, ..., w7, 0, 0}
+
+    @{[vsm3c_vi $V0, $V3, 0]}
+    @{[vsm3c_vi $V0, $V2, 1]}
+
+    # Prepare a vector with {w4, ..., w11}
+    @{[vslidedown_vi $V2, $V2, 2]} # v2 := {w4, ..., w7, 0, 0, 0, 0}
+    @{[vslideup_vi $V2, $V4, 4]}   # v2 := {w4, w5, w6, w7, w8, w9, w10, w11}
+
+    @{[vsm3c_vi $V0, $V2, 2]}
+    @{[vslidedown_vi $V2, $V2, 2]} # v2 := {w6, w7, w8, w9, w10, w11, 0, 0}
+    @{[vsm3c_vi $V0, $V2, 3]}
+
+    @{[vsm3c_vi $V0, $V4, 4]}
+    @{[vslidedown_vi $V2, $V4, 2]} # v2 := {w10, w11, w12, w13, w14, w15, 0, 0}
+    @{[vsm3c_vi $V0, $V2, 5]}
+
+    @{[vsm3me_vv $V3, $V4, $V3]}  # v3 := {w16, w17, w18, w19, w20, w21, w22, w23}
+
+    # Prepare a register with {w12, w13, w14, w15, w16, w17, w18, w19}
+    @{[vslidedown_vi $V2, $V2, 2]} # v2 := {w12, w13, w14, w15, 0, 0, 0, 0}
+    @{[vslideup_vi $V2, $V3, 4]}   # v2 := {w12, w13, w14, w15, w16, w17, w18, w19}
+
+    @{[vsm3c_vi $V0, $V2, 6]}
+    @{[vslidedown_vi $V2, $V2, 2]} # v2 := {w14, w15, w16, w17, w18, w19, 0, 0}
+    @{[vsm3c_vi $V0, $V2, 7]}
+
+    @{[vsm3c_vi $V0, $V3, 8]}
+    @{[vslidedown_vi $V2, $V3, 2]} # v2 := {w18, w19, w20, w21, w22, w23, 0, 0}
+    @{[vsm3c_vi $V0, $V2, 9]}
+
+    @{[vsm3me_vv $V4, $V3, $V4]} # v4 := {w24, w25, w26, w27, w28, w29, w30, w31}
+
+    # Prepare a register with {w20, w21, w22, w23, w24, w25, w26, w27}
+    @{[vslidedown_vi $V2, $V2, 2]} # v2 := {w20, w21, w22, w23, 0, 0, 0, 0}
+    @{[vslideup_vi $V2, $V4, 4]}   # v2 := {w20, w21, w22, w23, w24, w25, w26, w27}
+
+    @{[vsm3c_vi $V0, $V2, 10]}
+    @{[vslidedown_vi $V2, $V2, 2]} # v2 := {w22, w23, w24, w25, w26, w27, 0, 0}
+    @{[vsm3c_vi $V0, $V2, 11]}
+
+    @{[vsm3c_vi $V0, $V4, 12]}
+    @{[vslidedown_vi $V2, $V4, 2]} # v2 := {w26, w27, w28, w29, w30, w31, 0, 0}
+    @{[vsm3c_vi $V0, $V2, 13]}
+
+    @{[vsm3me_vv $V3, $V4, $V3]} # v3 := {w32, w33, w34, w35, w36, w37, w38, w39}
+
+    # Prepare a register with {w28, w29, w30, w31, w32, w33, w34, w35}
+    @{[vslidedown_vi $V2, $V2, 2]} # v2 := {w28, w29, w30, w31, 0, 0, 0, 0}
+    @{[vslideup_vi $V2, $V3, 4]}   # v2 := {w28, w29, w30, w31, w32, w33, w34, w35}
+
+    @{[vsm3c_vi $V0, $V2, 14]}
+    @{[vslidedown_vi $V2, $V2, 2]} # v2 := {w30, w31, w32, w33, w34, w35, 0, 0}
+    @{[vsm3c_vi $V0, $V2, 15]}
+
+    @{[vsm3c_vi $V0, $V3, 16]}
+    @{[vslidedown_vi $V2, $V3, 2]} # v2 := {w34, w35, w36, w37, w38, w39, 0, 0}
+    @{[vsm3c_vi $V0, $V2, 17]}
+
+    @{[vsm3me_vv $V4, $V3, $V4]}   # v4 := {w40, w41, w42, w43, w44, w45, w46, w47}
+
+    # Prepare a register with {w36, w37, w38, w39, w40, w41, w42, w43}
+    @{[vslidedown_vi $V2, $V2, 2]} # v2 := {w36, w37, w38, w39, 0, 0, 0, 0}
+    @{[vslideup_vi $V2, $V4, 4]}   # v2 := {w36, w37, w38, w39, w40, w41, w42, w43}
+
+    @{[vsm3c_vi $V0, $V2, 18]}
+    @{[vslidedown_vi $V2, $V2, 2]} # v2 := {w38, w39, w40, w41, w42, w43, 0, 0}
+    @{[vsm3c_vi $V0, $V2, 19]}
+
+    @{[vsm3c_vi $V0, $V4, 20]}
+    @{[vslidedown_vi $V2, $V4, 2]} # v2 := {w42, w43, w44, w45, w46, w47, 0, 0}
+    @{[vsm3c_vi $V0, $V2, 21]}
+
+    @{[vsm3me_vv $V3, $V4, $V3]}   # v3 := {w48, w49, w50, w51, w52, w53, w54, w55}
+
+    # Prepare a register with {w44, w45, w46, w47, w48, w49, w50, w51}
+    @{[vslidedown_vi $V2, $V2, 2]} # v2 := {w44, w45, w46, w47, 0, 0, 0, 0}
+    @{[vslideup_vi $V2, $V3, 4]}   # v2 := {w44, w45, w46, w47, w48, w49, w50, w51}
+
+    @{[vsm3c_vi $V0, $V2, 22]}
+    @{[vslidedown_vi $V2, $V2, 2]} # v2 := {w46, w47, w48, w49, w50, w51, 0, 0}
+    @{[vsm3c_vi $V0, $V2, 23]}
+
+    @{[vsm3c_vi $V0, $V3, 24]}
+    @{[vslidedown_vi $V2, $V3, 2]} # v2 := {w50, w51, w52, w53, w54, w55, 0, 0}
+    @{[vsm3c_vi $V0, $V2, 25]}
+
+    @{[vsm3me_vv $V4, $V3, $V4]}   # v4 := {w56, w57, w58, w59, w60, w61, w62, w63}
+
+    # Prepare a register with {w52, w53, w54, w55, w56, w57, w58, w59}
+    @{[vslidedown_vi $V2, $V2, 2]} # v2 := {w52, w53, w54, w55, 0, 0, 0, 0}
+    @{[vslideup_vi $V2, $V4, 4]}   # v2 := {w52, w53, w54, w55, w56, w57, w58, w59}
+
+    @{[vsm3c_vi $V0, $V2, 26]}
+    @{[vslidedown_vi $V2, $V2, 2]} # v2 := {w54, w55, w56, w57, w58, w59, 0, 0}
+    @{[vsm3c_vi $V0, $V2, 27]}
+
+    @{[vsm3c_vi $V0, $V4, 28]}
+    @{[vslidedown_vi $V2, $V4, 2]} # v2 := {w58, w59, w60, w61, w62, w63, 0, 0}
+    @{[vsm3c_vi $V0, $V2, 29]}
+
+    @{[vsm3me_vv $V3, $V4, $V3]}   # v3 := {w64, w65, w66, w67, w68, w69, w70, w71}
+
+    # Prepare a register with {w60, w61, w62, w63, w64, w65, w66, w67}
+    @{[vslidedown_vi $V2, $V2, 2]} # v2 := {w60, w61, w62, w63, 0, 0, 0, 0}
+    @{[vslideup_vi $V2, $V3, 4]}   # v2 := {w60, w61, w62, w63, w64, w65, w66, w67}
+
+    @{[vsm3c_vi $V0, $V2, 30]}
+    @{[vslidedown_vi $V2, $V2, 2]} # v2 := {w62, w63, w64, w65, w66, w67, 0, 0}
+    @{[vsm3c_vi $V0, $V2, 31]}
+
+    # XOR in the previous state.
+    @{[vxor_vv $V0, $V0, $V1]}
+
+    bnez $NUM, L_sm3_loop     # Check if there are any more block to process
+L_sm3_end:
+    @{[vrev8_v $V0, $V0]}
+    @{[vse32_v $V0, $CTX]}
+    ret
+
+.size ossl_hwsm3_block_data_order_zvksh,.-ossl_hwsm3_block_data_order_zvksh
+___
+}
+
+print $code;
+
+close STDOUT or die "error closing STDOUT: $!";