[v4] crypto: stm32 - Save and restore between each request

Message ID Y/3N6zFOZeehJQ/p@gondor.apana.org.au
State New
Series [v4] crypto: stm32 - Save and restore between each request

Commit Message

Herbert Xu Feb. 28, 2023, 9:48 a.m. UTC
  v4 fixes hmac to not reload the key over and over again causing
the hash state to be corrupted.

---8<---
The Crypto API hashing paradigm requires the hardware state to
be exported between *each* request because multiple unrelated
hashes may be processed concurrently.

The stm32 hardware is capable of producing the hardware hashing
state but it was only doing it in the export function.  This is
not only broken for export as you can't export a kernel pointer
and reimport it, but it also means that concurrent hashing was
fundamentally broken.

Fix this by moving the saving and restoring of hardware hash
state between each and every hashing request.

Also change the emptymsg check in stm32_hash_copy_hash to rely
on whether we have any existing hash state, rather than whether
this particular update request is empty.

Fixes: 8a1012d3f2ab ("crypto: stm32 - Support for STM32 HASH module")
Reported-by: Li kunyu <kunyu@nfschina.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
  

Comments

Linus Walleij Feb. 28, 2023, 8:50 p.m. UTC | #1
On Tue, Feb 28, 2023 at 10:48 AM Herbert Xu <herbert@gondor.apana.org.au> wrote:

> v4 fixes hmac to not reload the key over and over again causing
> the hash state to be corrupted.

OK I tested this, sadly the same results.

Notice though: the HMAC versions fail on test vector 0 and
the non-MAC:ed fail on vector 1, so I guess that means test
vector 0 works with those?

Here is the complete log:

[    2.997312] alg: extra crypto tests enabled.  This is intended for
developer use only.
[   15.203609] Key type encrypted registered
[   22.553791] stm32-hash a03c2000.hash: allocated hmac(sha256) fallback
[   22.561976] alg: ahash: stm32-hmac-sha256 test failed (wrong
result) on test vector 0, cfg="init+update+final aligned buffer"
[   22.573387] Expected:
[   22.575674] 00000000: a2 1b 1f 5d 4c f4 f7 3a 4d d9 39 75 0f 7a 06 6a
[   22.582160] 00000010: 7f 98 cc 13 1c b1 6a 66 92 75 90 21 cf ab 81 81
[   22.588613] Obtained:
[   22.590917] 00000000: 46 24 76 a8 97 dd fd bd 40 d1 42 0e 08 a5 bc fe
[   22.597368] 00000010: eb 25 c3 e2 ad e6 a0 a9 08 3b 32 7b 9e f9 fc a1
[   22.603865] alg: self-tests for hmac(sha256) using
stm32-hmac-sha256 failed (rc=-22)
[   22.603887] ------------[ cut here ]------------
[   22.616297] WARNING: CPU: 1 PID: 75 at crypto/testmgr.c:5864
alg_test.part.0+0x4d0/0x4dc
[   22.624437] alg: self-tests for hmac(sha256) using
stm32-hmac-sha256 failed (rc=-22)
[   22.624448] Modules linked in:
[   22.635258] CPU: 1 PID: 75 Comm: cryptomgr_test Not tainted
6.2.0-12020-g1c3e1a0051be #67
[   22.643437] Hardware name: ST-Ericsson Ux5x0 platform (Device Tree Support)
[   22.650405]  unwind_backtrace from show_stack+0x10/0x14
[   22.655650]  show_stack from dump_stack_lvl+0x40/0x4c
[   22.660724]  dump_stack_lvl from __warn+0x94/0xc0
[   22.665447]  __warn from warn_slowpath_fmt+0x118/0x164
[   22.670601]  warn_slowpath_fmt from alg_test.part.0+0x4d0/0x4dc
[   22.676537]  alg_test.part.0 from cryptomgr_test+0x18/0x38
[   22.682037]  cryptomgr_test from kthread+0xc0/0xc4
[   22.686843]  kthread from ret_from_fork+0x14/0x2c
[   22.691553] Exception stack(0xf0f45fb0 to 0xf0f45ff8)
[   22.696604] 5fa0:                                     00000000
00000000 00000000 00000000
[   22.704779] 5fc0: 00000000 00000000 00000000 00000000 00000000
00000000 00000000 00000000
[   22.712953] 5fe0: 00000000 00000000 00000000 00000000 00000013 00000000
[   22.719596] ---[ end trace 0000000000000000 ]---
[   22.724494] stm32-hash a03c2000.hash: allocated sha256 fallback
[   22.769732] alg: ahash: stm32-sha256 test failed (wrong result) on
test vector 1, cfg="init+update+final aligned buffer"
[   22.780648] Expected:
[   22.782952] 00000000: ba 78 16 bf 8f 01 cf ea 41 41 40 de 5d ae 22 23
[   22.789392] 00000010: b0 03 61 a3 96 17 7a 9c b4 10 ff 61 f2 00 15 ad
[   22.795874] Obtained:
[   22.798147] 00000000: e3 b0 c4 42 98 fc 1c 14 9a fb f4 c8 99 6f b9 24
[   22.804607] 00000010: 27 ae 41 e4 64 9b 93 4c a4 95 99 1b 78 52 b8 55
[   22.811074] alg: self-tests for sha256 using stm32-sha256 failed (rc=-22)
[   22.811083] ------------[ cut here ]------------
[   22.822480] WARNING: CPU: 1 PID: 85 at crypto/testmgr.c:5864
alg_test.part.0+0x4d0/0x4dc
[   22.830607] alg: self-tests for sha256 using stm32-sha256 failed (rc=-22)
[   22.830615] Modules linked in:
[   22.840457] CPU: 1 PID: 85 Comm: cryptomgr_test Tainted: G        W
         6.2.0-12020-g1c3e1a0051be #67
[   22.850109] Hardware name: ST-Ericsson Ux5x0 platform (Device Tree Support)
[   22.857069]  unwind_backtrace from show_stack+0x10/0x14
[   22.862307]  show_stack from dump_stack_lvl+0x40/0x4c
[   22.867373]  dump_stack_lvl from __warn+0x94/0xc0
[   22.872090]  __warn from warn_slowpath_fmt+0x118/0x164
[   22.877237]  warn_slowpath_fmt from alg_test.part.0+0x4d0/0x4dc
[   22.883167]  alg_test.part.0 from cryptomgr_test+0x18/0x38
[   22.888662]  cryptomgr_test from kthread+0xc0/0xc4
[   22.893462]  kthread from ret_from_fork+0x14/0x2c
[   22.898169] Exception stack(0xf0f6dfb0 to 0xf0f6dff8)
[   22.903216] dfa0:                                     00000000
00000000 00000000 00000000
[   22.911388] dfc0: 00000000 00000000 00000000 00000000 00000000
00000000 00000000 00000000
[   22.919559] dfe0: 00000000 00000000 00000000 00000000 00000013 00000000
[   22.926182] ---[ end trace 0000000000000000 ]---
[   36.677933] stm32-hash a03c2000.hash: allocated hmac(sha1) fallback
[   36.686991] alg: ahash: stm32-hmac-sha1 test failed (wrong result)
on test vector 0, cfg="init+update+final aligned buffer"
[   36.698242] Expected:
[   36.700547] 00000000: b6 17 31 86 55 05 72 64 e2 8b c0 b6 fb 37 8c 8e
[   36.707002] 00000010: f1 46 be 00
[   36.710345] Obtained:
[   36.712624] 00000000: 12 3f d7 8b da 01 00 78 6a e8 6b 76 f5 0f 01 bd
[   36.719072] 00000010: 18 e4 77 f3
[   36.722450] alg: self-tests for hmac(sha1) using stm32-hmac-sha1
failed (rc=-22)
[   36.722472] ------------[ cut here ]------------
[   36.734495] WARNING: CPU: 1 PID: 88 at crypto/testmgr.c:5864
alg_test.part.0+0x4d0/0x4dc
[   36.742628] alg: self-tests for hmac(sha1) using stm32-hmac-sha1
failed (rc=-22)
[   36.742637] Modules linked in:
[   36.753097] CPU: 1 PID: 88 Comm: cryptomgr_test Tainted: G        W
         6.2.0-12020-g1c3e1a0051be #67
[   36.762754] Hardware name: ST-Ericsson Ux5x0 platform (Device Tree Support)
[   36.769719]  unwind_backtrace from show_stack+0x10/0x14
[   36.774963]  show_stack from dump_stack_lvl+0x40/0x4c
[   36.780036]  dump_stack_lvl from __warn+0x94/0xc0
[   36.784759]  __warn from warn_slowpath_fmt+0x118/0x164
[   36.789912]  warn_slowpath_fmt from alg_test.part.0+0x4d0/0x4dc
[   36.795847]  alg_test.part.0 from cryptomgr_test+0x18/0x38
[   36.801347]  cryptomgr_test from kthread+0xc0/0xc4
[   36.806153]  kthread from ret_from_fork+0x14/0x2c
[   36.810862] Exception stack(0xf0f79fb0 to 0xf0f79ff8)
[   36.815912] 9fa0:                                     00000000
00000000 00000000 00000000
[   36.824087] 9fc0: 00000000 00000000 00000000 00000000 00000000
00000000 00000000 00000000
[   36.832261] 9fe0: 00000000 00000000 00000000 00000000 00000013 00000000
[   36.838902] ---[ end trace 0000000000000000 ]---
[   36.843762] stm32-hash a03c2000.hash: allocated sha1 fallback
[   36.889782] alg: ahash: stm32-sha1 test failed (wrong result) on
test vector 1, cfg="init+update+final aligned buffer"
[   36.900507] Expected:
[   36.902786] 00000000: a9 99 3e 36 47 06 81 6a ba 3e 25 71 78 50 c2 6c
[   36.909225] 00000010: 9c d0 d8 9d
[   36.912564] Obtained:
[   36.914834] 00000000: da 39 a3 ee 5e 6b 4b 0d 32 55 bf ef 95 60 18 90
[   36.921296] 00000010: af d8 07 09
[   36.924627] alg: self-tests for sha1 using stm32-sha1 failed (rc=-22)
[   36.924635] ------------[ cut here ]------------
[   36.935687] WARNING: CPU: 1 PID: 100 at crypto/testmgr.c:5864
alg_test.part.0+0x4d0/0x4dc
[   36.943902] alg: self-tests for sha1 using stm32-sha1 failed (rc=-22)
[   36.943909] Modules linked in:
[   36.953406] CPU: 1 PID: 100 Comm: cryptomgr_test Tainted: G
W          6.2.0-12020-g1c3e1a0051be #67
[   36.963144] Hardware name: ST-Ericsson Ux5x0 platform (Device Tree Support)
[   36.970103]  unwind_backtrace from show_stack+0x10/0x14
[   36.975340]  show_stack from dump_stack_lvl+0x40/0x4c
[   36.980404]  dump_stack_lvl from __warn+0x94/0xc0
[   36.985120]  __warn from warn_slowpath_fmt+0x118/0x164
[   36.990266]  warn_slowpath_fmt from alg_test.part.0+0x4d0/0x4dc
[   36.996195]  alg_test.part.0 from cryptomgr_test+0x18/0x38
[   37.001689]  cryptomgr_test from kthread+0xc0/0xc4
[   37.006488]  kthread from ret_from_fork+0x14/0x2c
[   37.011193] Exception stack(0xf0f8dfb0 to 0xf0f8dff8)
[   37.016240] dfa0:                                     00000000
00000000 00000000 00000000
[   37.024411] dfc0: 00000000 00000000 00000000 00000000 00000000
00000000 00000000 00000000
[   37.032581] dfe0: 00000000 00000000 00000000 00000000 00000013 00000000
[   37.039222] ---[ end trace 0000000000000000 ]---

Here I have applied a patch like this to see the failing vectors:

commit 1c3e1a0051be234ef109e97075783c28e3b07452 (HEAD ->
ux500-fixup-stm32-cryp-herbert-v4)
Author: Linus Walleij <linus.walleij@linaro.org>
Date:   Mon Dec 26 09:53:10 2022 +0100

    test hacks

diff --git a/crypto/testmgr.c b/crypto/testmgr.c
index c91e93ece20b..db511293933b 100644
--- a/crypto/testmgr.c
+++ b/crypto/testmgr.c
@@ -1203,6 +1203,10 @@ static int check_hash_result(const char *type,
        if (memcmp(result, vec->digest, digestsize) != 0) {
                pr_err("alg: %s: %s test failed (wrong result) on test vector %s, cfg=\"%s\"\n",
                       type, driver, vec_name, cfg->name);
+               pr_err("Expected:\n");
+               hexdump(vec->digest, digestsize);
+               pr_err("Obtained:\n");
+               hexdump(result, digestsize);
                return -EINVAL;

I'm a bit lost on what to try next :/

Yours,
Linus Walleij
  
Herbert Xu March 1, 2023, 1:30 a.m. UTC | #2
On Tue, Feb 28, 2023 at 09:50:55PM +0100, Linus Walleij wrote:
> 
> OK I tested this, sadly the same results.
> 
> Notice though: the HMAC versions fail on test vector 0 and
> the non-MAC:ed fail on vector 1, so I guess that means test
> vector 0 works with those?

Hah, test vector 0 for sha256 is an empty message.  While test
vector 1 is the same as test vector 0 for hmac(sha256).

So I guess at least the fallback is still working :)
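
The log bears this out: the "Obtained" bytes for stm32-sha256 and stm32-sha1 on test vector 1 are the published digests of the zero-length message, so the driver returned the empty-message result instead of hashing the data. A minimal check in C, with the byte values copied from the log above; the helper names same_digest() and log_shows_empty_digests() are made up for this illustration:

```c
#include <stddef.h>
#include <string.h>

/* "Obtained" bytes for stm32-sha256, test vector 1, copied from the log. */
static const unsigned char obtained_sha256[32] = {
	0xe3, 0xb0, 0xc4, 0x42, 0x98, 0xfc, 0x1c, 0x14,
	0x9a, 0xfb, 0xf4, 0xc8, 0x99, 0x6f, 0xb9, 0x24,
	0x27, 0xae, 0x41, 0xe4, 0x64, 0x9b, 0x93, 0x4c,
	0xa4, 0x95, 0x99, 0x1b, 0x78, 0x52, 0xb8, 0x55,
};

/* "Obtained" bytes for stm32-sha1, test vector 1, copied from the log. */
static const unsigned char obtained_sha1[20] = {
	0xda, 0x39, 0xa3, 0xee, 0x5e, 0x6b, 0x4b, 0x0d,
	0x32, 0x55, 0xbf, 0xef, 0x95, 0x60, 0x18, 0x90,
	0xaf, 0xd8, 0x07, 0x09,
};

/* Published SHA-256 digest of the zero-length message. */
static const unsigned char sha256_empty[32] = {
	0xe3, 0xb0, 0xc4, 0x42, 0x98, 0xfc, 0x1c, 0x14,
	0x9a, 0xfb, 0xf4, 0xc8, 0x99, 0x6f, 0xb9, 0x24,
	0x27, 0xae, 0x41, 0xe4, 0x64, 0x9b, 0x93, 0x4c,
	0xa4, 0x95, 0x99, 0x1b, 0x78, 0x52, 0xb8, 0x55,
};

/* Published SHA-1 digest of the zero-length message. */
static const unsigned char sha1_empty[20] = {
	0xda, 0x39, 0xa3, 0xee, 0x5e, 0x6b, 0x4b, 0x0d,
	0x32, 0x55, 0xbf, 0xef, 0x95, 0x60, 0x18, 0x90,
	0xaf, 0xd8, 0x07, 0x09,
};

/* Returns 1 when the two digests are byte-for-byte identical. */
static int same_digest(const unsigned char *a, const unsigned char *b,
		       size_t len)
{
	return memcmp(a, b, len) == 0;
}

/* Returns 1 when both logged results equal the empty-message digests. */
static int log_shows_empty_digests(void)
{
	return same_digest(obtained_sha256, sha256_empty,
			   sizeof(sha256_empty)) &&
	       same_digest(obtained_sha1, sha1_empty, sizeof(sha1_empty));
}
```

Calling log_shows_empty_digests() from a small main() is enough to confirm the match.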

Cheers,
  
Herbert Xu March 1, 2023, 1:36 a.m. UTC | #3
On Tue, Feb 28, 2023 at 09:50:55PM +0100, Linus Walleij wrote:
> 
> Notice though: the HMAC versions fail on test vector 0 and
> the non-MAC:ed fail on vector 1, so I guess that means test
> vector 0 works with those?

The failing vector is the first one where we save the state from
the hardware and then try to restore it.

Is your device ux500 or stm32? Perhaps state saving/restoring is
simply broken on ux500 (as the original ux500 driver didn't support
export/import and always used a fallback)?

Thanks,
  
Herbert Xu March 1, 2023, 1:46 a.m. UTC | #4
On Wed, Mar 01, 2023 at 09:36:08AM +0800, Herbert Xu wrote:
>
> Is your device ux500 or stm32? Perhaps state saving/restoring is
> simply broken on ux500 (as the original ux500 driver didn't support
> export/import and always used a fallback)?

Interesting, I dug up the old ux500 driver and even though
it doesn't have export/import hooked up, it does actually appear
to save/restore hardware state.  In fact it seems to do it multiple
times per request, even when it's unnecessary.

I'll try to see if the saving/restoring is subtly different
between ux500 and stm32.

Cheers,
  
Linus Walleij March 1, 2023, 12:22 p.m. UTC | #5
On Wed, Mar 1, 2023 at 2:36 AM Herbert Xu <herbert@gondor.apana.org.au> wrote:

> The failing vector is the first one where we save the state from
> the hardware and then try to restore it.

Yeah that's typical :/

> Is your device ux500 or stm32? Perhaps state saving/restoring is
> simply broken on ux500 (as the original ux500 driver didn't support
> export/import and always used a fallback)?

It's Ux500 but I had no problem with import/export before,
and yeah it has state save/restore in HW.

Yours,
Linus Walleij
  
Herbert Xu March 2, 2023, 1:16 a.m. UTC | #6
On Wed, Mar 01, 2023 at 01:22:13PM +0100, Linus Walleij wrote:
>
> It's Ux500 but I had no problem with import/export before,
> and yeah it has state save/restore in HW.

So with the stm32 driver your ux500 is able to pass the extra
fuzz tests, right? That should indeed test export and import.

Thanks,
  
Herbert Xu March 2, 2023, 6:04 a.m. UTC | #7
On Wed, Mar 01, 2023 at 01:22:13PM +0100, Linus Walleij wrote:
>
> It's Ux500 but I had no problem with import/export before,
> and yeah it has state save/restore in HW.

I think I see the problem.  My patch wasn't waiting for the hash
computation to complete before saving the state so obviously it
will get the wrong hash state every single time.
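
To illustrate the race, here is a userspace sketch with a made-up device model (struct toy_hw and every toy_* or demo_* name are assumptions for this example, not driver code). The state register only changes once the busy period ends, so a save taken right after the write captures the old value. In the real driver the wait would presumably go through stm32_hash_wait_busy().

```c
#include <stdint.h>

/* Toy device: a written word is folded into the state only after a
 * few busy "ticks", loosely mimicking an in-flight block computation. */
struct toy_hw {
	uint32_t state;		/* value a state save would read */
	uint32_t pending;	/* result that becomes visible when idle */
	int busy_ticks;		/* non-zero while the engine is busy */
};

static void toy_write(struct toy_hw *hw, uint32_t word)
{
	hw->pending = hw->state * 31u + word;
	hw->busy_ticks = 3;
}

/* Advance the simulated engine by one step. */
static void toy_tick(struct toy_hw *hw)
{
	if (hw->busy_ticks && --hw->busy_ticks == 0)
		hw->state = hw->pending;
}

/* Bounded poll on the busy flag; returns 0 once idle, -1 on timeout. */
static int toy_wait_busy(struct toy_hw *hw)
{
	int tries;

	for (tries = 0; tries < 10; tries++) {
		if (!hw->busy_ticks)
			return 0;
		toy_tick(hw);
	}
	return hw->busy_ticks ? -1 : 0;
}

/* Save immediately after the write: reads the stale state. */
static uint32_t demo_save_too_early(void)
{
	struct toy_hw hw = { .state = 1 };

	toy_write(&hw, 5);
	return hw.state;
}

/* Wait for the engine to go idle, then save: reads the new state. */
static uint32_t demo_save_after_wait(void)
{
	struct toy_hw hw = { .state = 1 };
	uint32_t saved = 0;

	toy_write(&hw, 5);
	if (toy_wait_busy(&hw) == 0)
		saved = hw.state;
	return saved;
}
```

With the starting state of 1 and a written word of 5, the early save returns 1 while the save after waiting returns 1 * 31 + 5 = 36.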

I'll fix this up and some other inconsistencies (my reading of the
documentation is that there are 54 registers (0-53), not 53) and
resend the patch.

Cheers,
  
lionel.debieve@foss.st.com March 7, 2023, 1:55 p.m. UTC | #8
Hi All,

Sorry for the very (very very) late response.
Thanks for highlighting the issue. I'm worried about the issue seen that
we've fixed at our downstream level.
We (ST) are currently working on upstreaming the new peripheral update for
STM32MP13 that fixed the old issue seen (such as CSR register numbers), and
so on....

The issue about the context management relies on a question I've get time to
ask you. There is no internal test purpose (using test manager) that really
show the need of a hash update that needs to be "self-content". We've seen
the issue using openssl use cases that is not using import/export.
I'm wondering to understand the real need of import/export in the framework
if the request must be safe itself?

From hardware point of view, it is a penalty to wait for completion to save
the context after each request. I understand the need of multiple hash
request in // but I was wondering that it can be managed by the
import/export, but it seems I was wrong. The penalty of the context saving
will impact all hash requests where, in a runtime context is probably not
the most important use case.
I'm looking deeper to check with the DMA use case and there is some new HW
restriction on the coming hash version that doesn't allow the read of CSR
register at some times.

BR,
Lionel



-----Original Message-----
From: Herbert Xu <herbert@gondor.apana.org.au> 
Sent: Monday, March 6, 2023 5:42 AM
To: Linus Walleij <linus.walleij@linaro.org>
Cc: Lionel Debieve <lionel.debieve@foss.st.com>; Li kunyu
<kunyu@nfschina.com>; davem@davemloft.net;
linux-arm-kernel@lists.infradead.org; linux-crypto@vger.kernel.org;
linux-kernel@vger.kernel.org; linux-stm32@st-md-mailman.stormreply.com;
mcoquelin.stm32@gmail.com
Subject: [v6 PATCH 0/7] crypto: stm32 - Save and restore between each
request

On Sat, Mar 04, 2023 at 05:34:04PM +0800, Herbert Xu wrote:
> 
> I've split the patch up into smaller chunks for easier testing.

v6 fixes a bug in the finup patch that caused the new data to be discarded
instead of hashed.

This patch series fixes the import/export functions in the stm32 driver.  As
usual, a failure in import/export indicates a general bug in the hash driver
that may break as soon as two concurrent users show up and hash at the same
time using any method other than digest or init+finup.

Cheers,
--
Email: Herbert Xu <herbert@gondor.apana.org.au> Home Page:
http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
  
Herbert Xu March 8, 2023, 3:46 a.m. UTC | #9
On Tue, Mar 07, 2023 at 02:55:29PM +0100, lionel.debieve@foss.st.com wrote:
>
> The issue about the context management relies on a question I've get time to
> ask you. There is no internal test purpose (using test manager) that really
> show the need of a hash update that needs to be "self-content". We've seen

Indeed this functionality is sorely missed.  It shouldn't be hard
to implement, because simply hashing two different requests
interleaved with each other should show the problem:

init(A) => update(A) => init(B) => update(B) => final(A) => final(B)
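
A rough userspace sketch of that sequence, using a toy single-register "engine" and FNV-1a in place of the real hash (all names here are invented for the illustration and none of this is driver code). The naive variant keeps the state only in the shared engine, so B clobbers A; the second variant restores before and saves after every update, which is what the patch aims for:

```c
#include <stddef.h>
#include <stdint.h>

#define FNV_OFFSET 2166136261u
#define FNV_PRIME  16777619u

/* Toy stand-in for the single shared hardware engine. */
struct toy_hw { uint32_t state; };

/* Per-request context holding a private copy of the engine state. */
struct toy_req { uint32_t saved; };

/* Feed bytes into the engine (FNV-1a step per byte). */
static void hw_feed(struct toy_hw *hw, const char *d, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++) {
		hw->state ^= (unsigned char)d[i];
		hw->state *= FNV_PRIME;
	}
}

/* Reference result computed on a private engine. */
static uint32_t fnv1a(const char *d, size_t n)
{
	struct toy_hw hw = { FNV_OFFSET };

	hw_feed(&hw, d, n);
	return hw.state;
}

/* Fixed update: restore the request's state, hash, save it back. */
static void req_update(struct toy_hw *hw, struct toy_req *r,
		       const char *d, size_t n)
{
	hw->state = r->saved;
	hw_feed(hw, d, n);
	r->saved = hw->state;
}

/* init(A), update(A), init(B), update(B), final(A), final(B) with the
 * state living only in the shared engine. Returns 1 if both match. */
static int naive_interleaved_ok(void)
{
	struct toy_hw hw;
	uint32_t a, b;

	hw.state = FNV_OFFSET;		/* init(A) */
	hw_feed(&hw, "abc", 3);		/* update(A) */
	hw.state = FNV_OFFSET;		/* init(B) wipes A's progress */
	hw_feed(&hw, "xyz", 3);		/* update(B) */
	a = hw.state;			/* final(A) reads B's state */
	b = hw.state;			/* final(B) */
	return a == fnv1a("abc", 3) && b == fnv1a("xyz", 3);
}

/* Same sequence with per-request save and restore. */
static int saved_interleaved_ok(void)
{
	struct toy_hw hw = { 0 };
	struct toy_req ra, rb;

	ra.saved = FNV_OFFSET;			/* init(A) */
	req_update(&hw, &ra, "abc", 3);		/* update(A) */
	rb.saved = FNV_OFFSET;			/* init(B) */
	req_update(&hw, &rb, "xyz", 3);		/* update(B) */
	return ra.saved == fnv1a("abc", 3) &&	/* final(A) */
	       rb.saved == fnv1a("xyz", 3);	/* final(B) */
}
```

naive_interleaved_ok() returns 0 because final(A) sees B's state, while saved_interleaved_ok() returns 1. A testmgr case along these lines would need two real ahash requests on the same tfm rather than this toy model.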

> the issue using openssl use cases that is not using import/export.
> I'm wondering to understand the real need of import/export in the framework
> if the request must be safe itself?

The hash state is normally stored in the request context.  The
import/export functions let you save a copy of the state for
subsequent processing.  The request could then be freed after
the export and re-allocated prior to import, or in other contexts
the request could be reused for a completely different hash in
the time being (init/update/final).
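
As a sketch of that lifecycle, again with a toy FNV-1a state rather than the real driver structures (struct toy_state, struct toy_req and the toy_* helpers are assumptions made for this example): the exported blob is plain data with no pointers, so the request can be freed after export and a fresh one can pick up where it left off after import.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Plain-data hash state: no pointers, so a byte copy is a valid export. */
struct toy_state {
	uint32_t h;		/* running FNV-1a value */
	uint32_t nbytes;	/* bytes consumed so far */
};

/* Toy request context; the state normally lives here. */
struct toy_req { struct toy_state state; };

static void toy_init(struct toy_req *r)
{
	r->state.h = 2166136261u;
	r->state.nbytes = 0;
}

static void toy_update(struct toy_req *r, const char *d, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++) {
		r->state.h ^= (unsigned char)d[i];
		r->state.h *= 16777619u;
	}
	r->state.nbytes += (uint32_t)n;
}

/* Copy the state out of, or back into, a request context. */
static void toy_export(const struct toy_req *r, void *out)
{
	memcpy(out, &r->state, sizeof(r->state));
}

static void toy_import(struct toy_req *r, const void *in)
{
	memcpy(&r->state, in, sizeof(r->state));
}

/* Hash "hello " then "world" across a free and re-allocation. */
static uint32_t hash_with_export_import(void)
{
	unsigned char blob[sizeof(struct toy_state)];
	struct toy_req *r = malloc(sizeof(*r));
	uint32_t h;

	assert(r);
	toy_init(r);
	toy_update(r, "hello ", 6);
	toy_export(r, blob);
	free(r);			/* request gone, state survives */

	r = malloc(sizeof(*r));
	assert(r);
	toy_import(r, blob);
	toy_update(r, "world", 5);
	h = r->state.h;
	free(r);
	return h;
}

/* Same message hashed in a single request for comparison. */
static uint32_t hash_one_shot(void)
{
	struct toy_req r;

	toy_init(&r);
	toy_update(&r, "hello world", 11);
	return r.state.h;
}
```

hash_with_export_import() and hash_one_shot() return the same value, which is the property the import/export self-tests rely on.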

> From hardware point of view, it is a penalty to wait for completion to save
> the context after each request. I understand the need of multiple hash
> request in // but I was wondering that it can be managed by the
> import/export, but it seems I was wrong. The penalty of the context saving
> will impact all hash requests where, in a runtime context is probably not
> the most important use case.

Oh of course we try to avoid unnecessary saving/restoring as much
as we can.  That's why we encourage users to use finup/digest as
much as possible, in which case there may be no need to save and
restore at all.

However, if the user has to do a partial update through the update
function, then we have to save the state.

Cheers,
  

Patch

diff --git a/drivers/crypto/stm32/stm32-hash.c b/drivers/crypto/stm32/stm32-hash.c
index 7bf805563ac2..a4c4cb1735d4 100644
--- a/drivers/crypto/stm32/stm32-hash.c
+++ b/drivers/crypto/stm32/stm32-hash.c
@@ -7,7 +7,6 @@ 
  */
 
 #include <linux/clk.h>
-#include <linux/crypto.h>
 #include <linux/delay.h>
 #include <linux/dma-mapping.h>
 #include <linux/dmaengine.h>
@@ -127,6 +126,16 @@  struct stm32_hash_ctx {
 	int			keylen;
 };
 
+struct stm32_hash_state {
+	u16			bufcnt;
+	u16			buflen;
+
+	u8 buffer[HASH_BUFLEN] __aligned(4);
+
+	/* hash state */
+	u32			hw_context[3 + HASH_CSR_REGISTER_NUMBER];
+};
+
 struct stm32_hash_request_ctx {
 	struct stm32_hash_dev	*hdev;
 	unsigned long		flags;
@@ -134,8 +143,6 @@  struct stm32_hash_request_ctx {
 
 	u8 digest[SHA256_DIGEST_SIZE] __aligned(sizeof(u32));
 	size_t			digcnt;
-	size_t			bufcnt;
-	size_t			buflen;
 
 	/* DMA */
 	struct scatterlist	*sg;
@@ -149,10 +156,7 @@  struct stm32_hash_request_ctx {
 
 	u8			data_type;
 
-	u8 buffer[HASH_BUFLEN] __aligned(sizeof(u32));
-
-	/* Export Context */
-	u32			*hw_context;
+	struct stm32_hash_state state;
 };
 
 struct stm32_hash_algs_info {
@@ -183,7 +187,6 @@  struct stm32_hash_dev {
 	struct ahash_request	*req;
 	struct crypto_engine	*engine;
 
-	int			err;
 	unsigned long		flags;
 
 	struct dma_chan		*dma_lch;
@@ -326,11 +329,12 @@  static void stm32_hash_write_ctrl(struct stm32_hash_dev *hdev, int bufcnt)
 
 static void stm32_hash_append_sg(struct stm32_hash_request_ctx *rctx)
 {
+	struct stm32_hash_state *state = &rctx->state;
 	size_t count;
 
-	while ((rctx->bufcnt < rctx->buflen) && rctx->total) {
+	while ((state->bufcnt < state->buflen) && rctx->total) {
 		count = min(rctx->sg->length - rctx->offset, rctx->total);
-		count = min(count, rctx->buflen - rctx->bufcnt);
+		count = min_t(size_t, count, state->buflen - state->bufcnt);
 
 		if (count <= 0) {
 			if ((rctx->sg->length == 0) && !sg_is_last(rctx->sg)) {
@@ -341,10 +345,10 @@  static void stm32_hash_append_sg(struct stm32_hash_request_ctx *rctx)
 			}
 		}
 
-		scatterwalk_map_and_copy(rctx->buffer + rctx->bufcnt, rctx->sg,
-					 rctx->offset, count, 0);
+		scatterwalk_map_and_copy(state->buffer + state->bufcnt,
+					 rctx->sg, rctx->offset, count, 0);
 
-		rctx->bufcnt += count;
+		state->bufcnt += count;
 		rctx->offset += count;
 		rctx->total -= count;
 
@@ -413,26 +417,27 @@  static int stm32_hash_xmit_cpu(struct stm32_hash_dev *hdev,
 static int stm32_hash_update_cpu(struct stm32_hash_dev *hdev)
 {
 	struct stm32_hash_request_ctx *rctx = ahash_request_ctx(hdev->req);
+	struct stm32_hash_state *state = &rctx->state;
 	int bufcnt, err = 0, final;
 
 	dev_dbg(hdev->dev, "%s flags %lx\n", __func__, rctx->flags);
 
 	final = (rctx->flags & HASH_FLAGS_FINUP);
 
-	while ((rctx->total >= rctx->buflen) ||
-	       (rctx->bufcnt + rctx->total >= rctx->buflen)) {
+	while ((rctx->total >= state->buflen) ||
+	       (state->bufcnt + rctx->total >= state->buflen)) {
 		stm32_hash_append_sg(rctx);
-		bufcnt = rctx->bufcnt;
-		rctx->bufcnt = 0;
-		err = stm32_hash_xmit_cpu(hdev, rctx->buffer, bufcnt, 0);
+		bufcnt = state->bufcnt;
+		state->bufcnt = 0;
+		err = stm32_hash_xmit_cpu(hdev, state->buffer, bufcnt, 0);
 	}
 
 	stm32_hash_append_sg(rctx);
 
 	if (final) {
-		bufcnt = rctx->bufcnt;
-		rctx->bufcnt = 0;
-		err = stm32_hash_xmit_cpu(hdev, rctx->buffer, bufcnt, 1);
+		bufcnt = state->bufcnt;
+		state->bufcnt = 0;
+		err = stm32_hash_xmit_cpu(hdev, state->buffer, bufcnt, 1);
 
 		/* If we have an IRQ, wait for that, else poll for completion */
 		if (hdev->polled) {
@@ -441,8 +446,20 @@  static int stm32_hash_update_cpu(struct stm32_hash_dev *hdev)
 			hdev->flags |= HASH_FLAGS_OUTPUT_READY;
 			err = 0;
 		}
+	} else {
+		u32 *preg = state->hw_context;
+		int i;
+
+		if (!hdev->pdata->ux500)
+			*preg++ = stm32_hash_read(hdev, HASH_IMR);
+		*preg++ = stm32_hash_read(hdev, HASH_STR);
+		*preg++ = stm32_hash_read(hdev, HASH_CR);
+		for (i = 0; i < HASH_CSR_REGISTER_NUMBER; i++)
+			*preg++ = stm32_hash_read(hdev, HASH_CSR(i));
 	}
 
+	rctx->flags |= HASH_FLAGS_INIT;
+
 	return err;
 }
 
@@ -584,10 +601,10 @@  static int stm32_hash_dma_init(struct stm32_hash_dev *hdev)
 static int stm32_hash_dma_send(struct stm32_hash_dev *hdev)
 {
 	struct stm32_hash_request_ctx *rctx = ahash_request_ctx(hdev->req);
+	u32 *buffer = (void *)rctx->state.buffer;
 	struct scatterlist sg[1], *tsg;
 	int err = 0, len = 0, reg, ncp = 0;
 	unsigned int i;
-	u32 *buffer = (void *)rctx->buffer;
 
 	rctx->sg = hdev->req->src;
 	rctx->total = hdev->req->nbytes;
@@ -615,7 +632,7 @@  static int stm32_hash_dma_send(struct stm32_hash_dev *hdev)
 
 				ncp = sg_pcopy_to_buffer(
 					rctx->sg, rctx->nents,
-					rctx->buffer, sg->length - len,
+					rctx->state.buffer, sg->length - len,
 					rctx->total - sg->length + len);
 
 				sg->length = len;
@@ -671,6 +688,8 @@  static int stm32_hash_dma_send(struct stm32_hash_dev *hdev)
 		err = stm32_hash_hmac_dma_send(hdev);
 	}
 
+	rctx->flags |= HASH_FLAGS_INIT;
+
 	return err;
 }
 
@@ -749,14 +768,12 @@  static int stm32_hash_init(struct ahash_request *req)
 		return -EINVAL;
 	}
 
-	rctx->bufcnt = 0;
-	rctx->buflen = HASH_BUFLEN;
+	rctx->state.bufcnt = 0;
+	rctx->state.buflen = HASH_BUFLEN;
 	rctx->total = 0;
 	rctx->offset = 0;
 	rctx->data_type = HASH_DATA_8_BITS;
 
-	memset(rctx->buffer, 0, HASH_BUFLEN);
-
 	if (ctx->flags & HASH_FLAGS_HMAC)
 		rctx->flags |= HASH_FLAGS_HMAC;
 
@@ -774,15 +791,16 @@  static int stm32_hash_final_req(struct stm32_hash_dev *hdev)
 {
 	struct ahash_request *req = hdev->req;
 	struct stm32_hash_request_ctx *rctx = ahash_request_ctx(req);
+	struct stm32_hash_state *state = &rctx->state;
+	int buflen = state->bufcnt;
 	int err;
-	int buflen = rctx->bufcnt;
 
-	rctx->bufcnt = 0;
+	state->bufcnt = 0;
 
 	if (!(rctx->flags & HASH_FLAGS_CPU))
 		err = stm32_hash_dma_send(hdev);
 	else
-		err = stm32_hash_xmit_cpu(hdev, rctx->buffer, buflen, 1);
+		err = stm32_hash_xmit_cpu(hdev, state->buffer, buflen, 1);
 
 	/* If we have an IRQ, wait for that, else poll for completion */
 	if (hdev->polled) {
@@ -832,7 +850,7 @@  static void stm32_hash_copy_hash(struct ahash_request *req)
 	__be32 *hash = (void *)rctx->digest;
 	unsigned int i, hashsize;
 
-	if (hdev->pdata->broken_emptymsg && !req->nbytes)
+	if (hdev->pdata->broken_emptymsg && !(rctx->flags & HASH_FLAGS_INIT))
 		return stm32_hash_emptymsg_fallback(req);
 
 	switch (rctx->flags & HASH_FLAGS_ALGO_MASK) {
@@ -882,11 +900,6 @@  static void stm32_hash_finish_req(struct ahash_request *req, int err)
 	if (!err && (HASH_FLAGS_FINAL & hdev->flags)) {
 		stm32_hash_copy_hash(req);
 		err = stm32_hash_finish(req);
-		hdev->flags &= ~(HASH_FLAGS_FINAL | HASH_FLAGS_CPU |
-				 HASH_FLAGS_INIT | HASH_FLAGS_DMA_READY |
-				 HASH_FLAGS_OUTPUT_READY | HASH_FLAGS_HMAC |
-				 HASH_FLAGS_HMAC_INIT | HASH_FLAGS_HMAC_FINAL |
-				 HASH_FLAGS_HMAC_KEY);
 	} else {
 		rctx->flags |= HASH_FLAGS_ERRORS;
 	}
@@ -897,67 +910,61 @@  static void stm32_hash_finish_req(struct ahash_request *req, int err)
 	crypto_finalize_hash_request(hdev->engine, req, err);
 }
 
-static int stm32_hash_hw_init(struct stm32_hash_dev *hdev,
+static void stm32_hash_hw_init(struct stm32_hash_dev *hdev,
 			      struct stm32_hash_request_ctx *rctx)
 {
 	pm_runtime_get_sync(hdev->dev);
-
-	if (!(HASH_FLAGS_INIT & hdev->flags)) {
-		stm32_hash_write(hdev, HASH_CR, HASH_CR_INIT);
-		stm32_hash_write(hdev, HASH_STR, 0);
-		stm32_hash_write(hdev, HASH_DIN, 0);
-		stm32_hash_write(hdev, HASH_IMR, 0);
-		hdev->err = 0;
-	}
-
-	return 0;
 }
 
-static int stm32_hash_one_request(struct crypto_engine *engine, void *areq);
-static int stm32_hash_prepare_req(struct crypto_engine *engine, void *areq);
-
 static int stm32_hash_handle_queue(struct stm32_hash_dev *hdev,
 				   struct ahash_request *req)
 {
 	return crypto_transfer_hash_request_to_engine(hdev->engine, req);
 }
 
-static int stm32_hash_prepare_req(struct crypto_engine *engine, void *areq)
+static int stm32_hash_one_request(struct crypto_engine *engine, void *areq)
 {
 	struct ahash_request *req = container_of(areq, struct ahash_request,
 						 base);
 	struct stm32_hash_ctx *ctx = crypto_ahash_ctx(crypto_ahash_reqtfm(req));
 	struct stm32_hash_dev *hdev = stm32_hash_find_dev(ctx);
 	struct stm32_hash_request_ctx *rctx;
+	int err = 0;
 
 	if (!hdev)
 		return -ENODEV;
 
 	hdev->req = req;
+	hdev->flags = 0;
 
 	rctx = ahash_request_ctx(req);
 
 	dev_dbg(hdev->dev, "processing new req, op: %lu, nbytes %d\n",
 		rctx->op, req->nbytes);
+
+	stm32_hash_hw_init(hdev, rctx);
+
+	if (rctx->flags & HASH_FLAGS_INIT) {
+		u32 *preg = rctx->state.hw_context;
+		u32 reg;
+		int i;
 
-	return stm32_hash_hw_init(hdev, rctx);
-}
-
-static int stm32_hash_one_request(struct crypto_engine *engine, void *areq)
-{
-	struct ahash_request *req = container_of(areq, struct ahash_request,
-						 base);
-	struct stm32_hash_ctx *ctx = crypto_ahash_ctx(crypto_ahash_reqtfm(req));
-	struct stm32_hash_dev *hdev = stm32_hash_find_dev(ctx);
-	struct stm32_hash_request_ctx *rctx;
-	int err = 0;
+		if (!hdev->pdata->ux500)
+			stm32_hash_write(hdev, HASH_IMR, *preg++);
+		stm32_hash_write(hdev, HASH_STR, *preg++);
+		stm32_hash_write(hdev, HASH_CR, *preg);
+		reg = *preg++ | HASH_CR_INIT;
+		stm32_hash_write(hdev, HASH_CR, reg);
 
-	if (!hdev)
-		return -ENODEV;
+		for (i = 0; i < HASH_CSR_REGISTER_NUMBER; i++)
+			stm32_hash_write(hdev, HASH_CSR(i), *preg++);
 
-	hdev->req = req;
+		hdev->flags |= HASH_FLAGS_INIT;
 
-	rctx = ahash_request_ctx(req);
+		if (rctx->flags & HASH_FLAGS_HMAC)
+			hdev->flags |= HASH_FLAGS_HMAC |
+				       HASH_FLAGS_HMAC_KEY;
+	}
 
 	if (rctx->op == HASH_OP_UPDATE)
 		err = stm32_hash_update_req(hdev);
@@ -985,6 +992,7 @@  static int stm32_hash_enqueue(struct ahash_request *req, unsigned int op)
 static int stm32_hash_update(struct ahash_request *req)
 {
 	struct stm32_hash_request_ctx *rctx = ahash_request_ctx(req);
+	struct stm32_hash_state *state = &rctx->state;
 
 	if (!req->nbytes || !(rctx->flags & HASH_FLAGS_CPU))
 		return 0;
@@ -993,7 +1001,7 @@  static int stm32_hash_update(struct ahash_request *req)
 	rctx->sg = req->src;
 	rctx->offset = 0;
 
-	if ((rctx->bufcnt + rctx->total < rctx->buflen)) {
+	if ((state->bufcnt + rctx->total < state->buflen)) {
 		stm32_hash_append_sg(rctx);
 		return 0;
 	}
@@ -1044,35 +1052,13 @@  static int stm32_hash_digest(struct ahash_request *req)
 static int stm32_hash_export(struct ahash_request *req, void *out)
 {
 	struct stm32_hash_request_ctx *rctx = ahash_request_ctx(req);
-	struct stm32_hash_ctx *ctx = crypto_ahash_ctx(crypto_ahash_reqtfm(req));
-	struct stm32_hash_dev *hdev = stm32_hash_find_dev(ctx);
-	u32 *preg;
-	unsigned int i;
-	int ret;
+	bool empty = !(rctx->flags & HASH_FLAGS_INIT);
+	u8 *p = out;
 
-	pm_runtime_get_sync(hdev->dev);
-
-	ret = stm32_hash_wait_busy(hdev);
-	if (ret)
-		return ret;
-
-	rctx->hw_context = kmalloc_array(3 + HASH_CSR_REGISTER_NUMBER,
-					 sizeof(u32),
-					 GFP_KERNEL);
+	*(u8 *)p = empty;
 
-	preg = rctx->hw_context;
-
-	if (!hdev->pdata->ux500)
-		*preg++ = stm32_hash_read(hdev, HASH_IMR);
-	*preg++ = stm32_hash_read(hdev, HASH_STR);
-	*preg++ = stm32_hash_read(hdev, HASH_CR);
-	for (i = 0; i < HASH_CSR_REGISTER_NUMBER; i++)
-		*preg++ = stm32_hash_read(hdev, HASH_CSR(i));
-
-	pm_runtime_mark_last_busy(hdev->dev);
-	pm_runtime_put_autosuspend(hdev->dev);
-
-	memcpy(out, rctx, sizeof(*rctx));
+	if (!empty)
+		memcpy(p + 1, &rctx->state, sizeof(rctx->state));
 
 	return 0;
 }
@@ -1080,32 +1066,14 @@  static int stm32_hash_export(struct ahash_request *req, void *out)
 static int stm32_hash_import(struct ahash_request *req, const void *in)
 {
 	struct stm32_hash_request_ctx *rctx = ahash_request_ctx(req);
-	struct stm32_hash_ctx *ctx = crypto_ahash_ctx(crypto_ahash_reqtfm(req));
-	struct stm32_hash_dev *hdev = stm32_hash_find_dev(ctx);
-	const u32 *preg = in;
-	u32 reg;
-	unsigned int i;
-
-	memcpy(rctx, in, sizeof(*rctx));
+	const u8 *p = in;
 
-	preg = rctx->hw_context;
-
-	pm_runtime_get_sync(hdev->dev);
+	stm32_hash_init(req);
 
-	if (!hdev->pdata->ux500)
-		stm32_hash_write(hdev, HASH_IMR, *preg++);
-	stm32_hash_write(hdev, HASH_STR, *preg++);
-	stm32_hash_write(hdev, HASH_CR, *preg);
-	reg = *preg++ | HASH_CR_INIT;
-	stm32_hash_write(hdev, HASH_CR, reg);
-
-	for (i = 0; i < HASH_CSR_REGISTER_NUMBER; i++)
-		stm32_hash_write(hdev, HASH_CSR(i), *preg++);
-
-	pm_runtime_mark_last_busy(hdev->dev);
-	pm_runtime_put_autosuspend(hdev->dev);
-
-	kfree(rctx->hw_context);
+	if (!*(u8 *)p) {
+		rctx->flags |= HASH_FLAGS_INIT;
+		memcpy(&rctx->state, p + 1, sizeof(rctx->state));
+	}
 
 	return 0;
 }
@@ -1162,8 +1130,6 @@  static int stm32_hash_cra_init_algs(struct crypto_tfm *tfm,
 		ctx->flags |= HASH_FLAGS_HMAC;
 
 	ctx->enginectx.op.do_one_request = stm32_hash_one_request;
-	ctx->enginectx.op.prepare_request = stm32_hash_prepare_req;
-	ctx->enginectx.op.unprepare_request = NULL;
 
 	return stm32_hash_init_fallback(tfm);
 }
@@ -1255,7 +1221,7 @@  static struct ahash_alg algs_md5[] = {
 		.import = stm32_hash_import,
 		.halg = {
 			.digestsize = MD5_DIGEST_SIZE,
-			.statesize = sizeof(struct stm32_hash_request_ctx),
+			.statesize = sizeof(struct stm32_hash_state) + 1,
 			.base = {
 				.cra_name = "md5",
 				.cra_driver_name = "stm32-md5",
@@ -1282,7 +1248,7 @@  static struct ahash_alg algs_md5[] = {
 		.setkey = stm32_hash_setkey,
 		.halg = {
 			.digestsize = MD5_DIGEST_SIZE,
-			.statesize = sizeof(struct stm32_hash_request_ctx),
+			.statesize = sizeof(struct stm32_hash_state) + 1,
 			.base = {
 				.cra_name = "hmac(md5)",
 				.cra_driver_name = "stm32-hmac-md5",
@@ -1311,7 +1277,7 @@  static struct ahash_alg algs_sha1[] = {
 		.import = stm32_hash_import,
 		.halg = {
 			.digestsize = SHA1_DIGEST_SIZE,
-			.statesize = sizeof(struct stm32_hash_request_ctx),
+			.statesize = sizeof(struct stm32_hash_state) + 1,
 			.base = {
 				.cra_name = "sha1",
 				.cra_driver_name = "stm32-sha1",
@@ -1338,7 +1304,7 @@  static struct ahash_alg algs_sha1[] = {
 		.setkey = stm32_hash_setkey,
 		.halg = {
 			.digestsize = SHA1_DIGEST_SIZE,
-			.statesize = sizeof(struct stm32_hash_request_ctx),
+			.statesize = sizeof(struct stm32_hash_state) + 1,
 			.base = {
 				.cra_name = "hmac(sha1)",
 				.cra_driver_name = "stm32-hmac-sha1",
@@ -1367,7 +1333,7 @@  static struct ahash_alg algs_sha224[] = {
 		.import = stm32_hash_import,
 		.halg = {
 			.digestsize = SHA224_DIGEST_SIZE,
-			.statesize = sizeof(struct stm32_hash_request_ctx),
+			.statesize = sizeof(struct stm32_hash_state) + 1,
 			.base = {
 				.cra_name = "sha224",
 				.cra_driver_name = "stm32-sha224",
@@ -1394,7 +1360,7 @@  static struct ahash_alg algs_sha224[] = {
 		.import = stm32_hash_import,
 		.halg = {
 			.digestsize = SHA224_DIGEST_SIZE,
-			.statesize = sizeof(struct stm32_hash_request_ctx),
+			.statesize = sizeof(struct stm32_hash_state) + 1,
 			.base = {
 				.cra_name = "hmac(sha224)",
 				.cra_driver_name = "stm32-hmac-sha224",
@@ -1423,7 +1389,7 @@  static struct ahash_alg algs_sha256[] = {
 		.import = stm32_hash_import,
 		.halg = {
 			.digestsize = SHA256_DIGEST_SIZE,
-			.statesize = sizeof(struct stm32_hash_request_ctx),
+			.statesize = sizeof(struct stm32_hash_state) + 1,
 			.base = {
 				.cra_name = "sha256",
 				.cra_driver_name = "stm32-sha256",
@@ -1450,7 +1416,7 @@  static struct ahash_alg algs_sha256[] = {
 		.setkey = stm32_hash_setkey,
 		.halg = {
 			.digestsize = SHA256_DIGEST_SIZE,
-			.statesize = sizeof(struct stm32_hash_request_ctx),
+			.statesize = sizeof(struct stm32_hash_state) + 1,
 			.base = {
 				.cra_name = "hmac(sha256)",
 				.cra_driver_name = "stm32-hmac-sha256",