[v5,7/7] crypto: stm32 - Save and restore between each request

Message ID E1pZ2fs-000e27-4H@formenos.hmeau.com
State New
Headers
Series crypto: stm32 - Save and restore between each request |

Commit Message

Herbert Xu March 6, 2023, 4:42 a.m. UTC
  The Crypto API hashing paradigm requires the hardware state to
be exported between *each* request because multiple unrelated
hashes may be processed concurrently.

The stm32 hardware is capable of producing the hardware hashing
state but it was only doing it in the export function.  This is
not only broken for export as you can't export a kernel pointer
and reimport it, but it also means that concurrent hashing was
fundamentally broken.

Fix this by moving the saving and restoring of hardware hash
state between each and every hashing request.

Also change the emptymsg check in stm32_hash_copy_hash to rely
on whether we have any existing hash state, rather than whether
this particular update request is empty.  

Fixes: 8a1012d3f2ab ("crypto: stm32 - Support for STM32 HASH module")
Reported-by: Li kunyu <kunyu@nfschina.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
---

 drivers/crypto/stm32/stm32-hash.c |  172 +++++++++++++-------------------------
 1 file changed, 62 insertions(+), 110 deletions(-)
  

Comments

Linus Walleij March 6, 2023, 10:08 a.m. UTC | #1
On Mon, Mar 6, 2023 at 5:42 AM Herbert Xu <herbert@gondor.apana.org.au> wrote:

> The Crypto API hashing paradigm requires the hardware state to
> be exported between *each* request because multiple unrelated
> hashes may be processed concurrently.
>
> The stm32 hardware is capable of producing the hardware hashing
> state but it was only doing it in the export function.  This is
> not only broken for export as you can't export a kernel pointer
> and reimport it, but it also means that concurrent hashing was
> fundamentally broken.
>
> Fix this by moving the saving and restoring of hardware hash
> state between each and every hashing request.
>
> Also change the emptymsg check in stm32_hash_copy_hash to rely
> on whether we have any existing hash state, rather than whether
> this particular update request is empty.
>
> Fixes: 8a1012d3f2ab ("crypto: stm32 - Support for STM32 HASH module")
> Reported-by: Li kunyu <kunyu@nfschina.com>
> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

This partly works (after my folded in fix in patch 5)!

Clean SHA1 and SHA256 works flawlessly.
HMAC still fails, but not until we start testing random vectors:

[    7.541954] alg: ahash: stm32-hmac-sha256 digest() failed on test
vector "random: psize=0 ksize=80"; expected_error=0,
actual_error=-110, cfg="random: may_sleep"
[    7.567212] alg: self-tests for hmac(sha256) using
stm32-hmac-sha256 failed (rc=-110)
[    7.567222] ------------[ cut here ]------------
[    7.579669] WARNING: CPU: 0 PID: 89 at crypto/testmgr.c:5858
alg_test.part.0+0x4d0/0x4dc
[    7.587809] alg: self-tests for hmac(sha256) using
stm32-hmac-sha256 failed (rc=-110)
[    7.587817] Modules linked in:
[    7.598702] CPU: 0 PID: 89 Comm: cryptomgr_test Not tainted
6.2.0-13572-gcdc48b2701b2 #87
[    7.606877] Hardware name: ST-Ericsson Ux5x0 platform (Device Tree Support)
[    7.613842]  unwind_backtrace from show_stack+0x10/0x14
[    7.619080]  show_stack from dump_stack_lvl+0x40/0x4c
[    7.624145]  dump_stack_lvl from __warn+0x94/0xc0
[    7.628861]  __warn from warn_slowpath_fmt+0x118/0x164
[    7.634007]  warn_slowpath_fmt from alg_test.part.0+0x4d0/0x4dc
[    7.639936]  alg_test.part.0 from cryptomgr_test+0x18/0x38
[    7.645430]  cryptomgr_test from kthread+0xc0/0xc4
[    7.650229]  kthread from ret_from_fork+0x14/0x2c
[    7.654936] Exception stack(0xf10b5fb0 to 0xf10b5ff8)
[    7.659984] 5fa0:                                     00000000
00000000 00000000 00000000
[    7.668154] 5fc0: 00000000 00000000 00000000 00000000 00000000
00000000 00000000 00000000
[    7.676325] 5fe0: 00000000 00000000 00000000 00000000 00000013 00000000
[    7.682986] ---[ end trace 0000000000000000 ]---
[    7.688219] stm32-hash a03c2000.hash: allocated sha256 fallback
[   10.675002] stm32-hash a03c2000.hash: allocated hmac(sha1) fallback
[   11.269604] alg: ahash: stm32-hmac-sha1 finup() failed with err
-110 on test vector "random: psize=0 ksize=15", cfg="random: use_finup
src_divs=[100.0%@+4081] i"
[   11.285037] alg: self-tests for hmac(sha1) using stm32-hmac-sha1
failed (rc=-110)
[   11.285048] ------------[ cut here ]------------
[   11.297141] WARNING: CPU: 1 PID: 102 at crypto/testmgr.c:5858
alg_test.part.0+0x4d0/0x4dc
[   11.305352] alg: self-tests for hmac(sha1) using stm32-hmac-sha1
failed (rc=-110)
[   11.305361] Modules linked in:
[   11.315894] CPU: 1 PID: 102 Comm: cryptomgr_test Tainted: G
W          6.2.0-13572-gcdc48b2701b2 #87
[   11.325633] Hardware name: ST-Ericsson Ux5x0 platform (Device Tree Support)
[   11.332594]  unwind_backtrace from show_stack+0x10/0x14
[   11.337832]  show_stack from dump_stack_lvl+0x40/0x4c
[   11.342897]  dump_stack_lvl from __warn+0x94/0xc0
[   11.347611]  __warn from warn_slowpath_fmt+0x118/0x164
[   11.352758]  warn_slowpath_fmt from alg_test.part.0+0x4d0/0x4dc
[   11.358687]  alg_test.part.0 from cryptomgr_test+0x18/0x38
[   11.364181]  cryptomgr_test from kthread+0xc0/0xc4
[   11.368981]  kthread from ret_from_fork+0x14/0x2c
[   11.373687] Exception stack(0xf10f1fb0 to 0xf10f1ff8)
[   11.378734] 1fa0:                                     00000000
00000000 00000000 00000000
[   11.386906] 1fc0: 00000000 00000000 00000000 00000000 00000000
00000000 00000000 00000000
[   11.395076] 1fe0: 00000000 00000000 00000000 00000000 00000013 00000000
[   11.401724] ---[ end trace 0000000000000000 ]---
[   11.407162] stm32-hash a03c2000.hash: allocated sha1 fallback

I will try to investigate further as time permits. (Right now I have
to go off for lunch...)

Yours,
Linus Walleij
  
Herbert Xu March 7, 2023, 10:10 a.m. UTC | #2
On Mon, Mar 06, 2023 at 11:08:13AM +0100, Linus Walleij wrote:
>
> This partly works (after my folded in fix in patch 5)!
> 
> Clean SHA1 and SHA256 works flawlessly.
> HMAC still fails, but not until we start testing random vectors:
> 
> [    7.541954] alg: ahash: stm32-hmac-sha256 digest() failed on test
> vector "random: psize=0 ksize=80"; expected_error=0,
> actual_error=-110, cfg="random: may_sleep"
> [    7.567212] alg: self-tests for hmac(sha256) using
> stm32-hmac-sha256 failed (rc=-110)

So it's timing out.  I wonder if the timeout in stm32_hash_wait_busy
is long enough.  Perhaps try adding a zero so that the timeout becomes
100,000us and see if it still breaks?

Thanks,
  
Linus Walleij March 7, 2023, 3:31 p.m. UTC | #3
On Tue, Mar 7, 2023 at 11:10 AM Herbert Xu <herbert@gondor.apana.org.au> wrote:
> On Mon, Mar 06, 2023 at 11:08:13AM +0100, Linus Walleij wrote:
> >
> > This partly works (after my folded in fix in patch 5)!
> >
> > Clean SHA1 and SHA256 works flawlessly.
> > HMAC still fails, but not until we start testing random vectors:
> >
> > [    7.541954] alg: ahash: stm32-hmac-sha256 digest() failed on test
> > vector "random: psize=0 ksize=80"; expected_error=0,
> > actual_error=-110, cfg="random: may_sleep"
> > [    7.567212] alg: self-tests for hmac(sha256) using
> > stm32-hmac-sha256 failed (rc=-110)
>
> So it's timing out.  I wonder if the timeout in stm32_hash_wait_busy
> is long enough.  Perhaps try adding a zero so that the timeout becomes
> 100,000us and see if it still breaks?

Sadly this doesn't work.

I tried increasing with one and even two orders of magnitude,
but the timeouts still happen, usually two of them, sometimes
one sometimes three, depending on randomness, as can be
expected.

I think you mentioned something about that we need to store
the key in the state as well though?

Yours,
Linus Walleij
  
Herbert Xu March 8, 2023, 3:23 a.m. UTC | #4
On Tue, Mar 07, 2023 at 04:31:44PM +0100, Linus Walleij wrote:
>
> Sadly this doesn't work.

Nevermind, I know why it doesn't work.  It's specific to ux500
where you're polling the DCAL bit.  But the DCAL bit only works
for the final hash, it doesn't work for the intermediate state.

Let me check how the old ux500 handled this case.

Thanks,
  
Herbert Xu March 8, 2023, 3:40 a.m. UTC | #5
On Wed, Mar 08, 2023 at 11:23:24AM +0800, Herbert Xu wrote:
>
> Nevermind, I know why it doesn't work.  It's specific to ux500
> where you're polling the DCAL bit.  But the DCAL bit only works
> for the final hash, it doesn't work for the intermediate state.
> 
> Let me check how the old ux500 handled this case.

Hmm, it seems to use the same bit.  I guess the meaning must be
different with ux500.

Could you check for me which wait_busy() call is actually failing?
Is it the one I added right before we save the state, or is it
something else?

If it's something perhaps we aren't restoring the state in the
right way, because the stm32 state restoring code is quite different
compared to the ux500 code.

Thanks,
  
Herbert Xu March 8, 2023, 3:52 a.m. UTC | #6
On Wed, Mar 08, 2023 at 11:40:02AM +0800, Herbert Xu wrote:
>
> If it's something perhaps we aren't restoring the state in the
> right way, because the stm32 state restoring code is quite different
> compared to the ux500 code.

Could you also confirm that the old ux500 driver actually passes
all the extra tests on your hardware? It literally saves and
restores the state every 256 bytes :)

Thanks,
  
Linus Walleij March 8, 2023, 9:05 a.m. UTC | #7
On Wed, Mar 8, 2023 at 4:40 AM Herbert Xu <herbert@gondor.apana.org.au> wrote:
> On Wed, Mar 08, 2023 at 11:23:24AM +0800, Herbert Xu wrote:
> >
> > Nevermind, I know why it doesn't work.  It's specific to ux500
> > where you're polling the DCAL bit.  But the DCAL bit only works
> > for the final hash, it doesn't work for the intermediate state.
> >
> > Let me check how the old ux500 handled this case.
>
> Hmm, it seems to use the same bit.  I guess the meaning must be
> different with ux500.

Not sure if it's that or if we were just lucky, or the tests were not
as extensive back on the old driver :/

> Could you check for me which wait_busy() call is actually failing?
> Is it the one I added right before we save the state, or is it
> something else?

It times out - always - before writing the HMAC key in
stm32_hash_xmit_cpu().

So this:

                stm32_hash_set_nblw(hdev, length);
                reg = stm32_hash_read(hdev, HASH_STR);
                reg |= HASH_STR_DCAL;
                stm32_hash_write(hdev, HASH_STR, reg);
                if (hdev->flags & HASH_FLAGS_HMAC) {
-                       if (stm32_hash_wait_busy(hdev))
+                       if (stm32_hash_wait_busy(hdev)) {
+                               dev_err(hdev->dev,
+                                       "timeout before writing key in
stm32_hash_xmit_cpu()\n");
                                return -ETIMEDOUT;
+                       }
                        stm32_hash_write_key(hdev);
                }
                return -EINPROGRESS;

Gives:

[    4.812106] stm32-hash a03c2000.hash: allocated hmac(sha256) fallback
[    5.008829] stm32-hash a03c2000.hash: timeout before writing key in
stm32_hash_xmit_cpu()
[    5.017167] alg: ahash: stm32-hmac-sha256 final() failed with err
-110 on test vector "random: psize=0 ksize=70", cfg="random: may_sleep
use_final src_divs=[<fl"
[    5.041034] alg: self-tests for hmac(sha256) using
stm32-hmac-sha256 failed (rc=-110)

I put error messages at all other timeouts too but they do not
trigger.

This is why non-HMAC works fine all of the time.

> If it's something perhaps we aren't restoring the state in the
> right way, because the stm32 state restoring code is quite different
> compared to the ux500 code.

It's just the HMAC that is failing so I'm puzzled, because until
this point it is essentially a clean SHA sum.

I also noticed that it only happens with long keys (more than
64 bytes) that need to set bit 16 (HASH_CR_LKEY) in the control
register.

> Could you also confirm that the old ux500 driver actually passes
> all the extra tests on your hardware? It literally saves and
> restores the state every 256 bytes :)

I had this old patch set where I tried to clean up the old driver:
https://lore.kernel.org/linux-crypto/20220816140049.102306-1-linus.walleij@linaro.org/
I put this old patch set on top of v6.0.

First I enabled the old crypto driver without extended tests: all works fine.

Then I enabled the extended tests. It does seem like it has
the same problem!

[   25.486878] CPU: 1 PID: 91 Comm: cryptomgr_test Not tainted
6.0.0-00016-g23791565aac5 #3
[   25.494967] Hardware name: ST-Ericsson Ux5x0 platform (Device Tree Support)
[   25.501932]  unwind_backtrace from show_stack+0x10/0x14
[   25.507175]  show_stack from dump_stack_lvl+0x40/0x4c
[   25.512236]  dump_stack_lvl from nmi_cpu_backtrace+0xd4/0x104
[   25.517988]  nmi_cpu_backtrace from nmi_trigger_cpumask_backtrace+0xdc/0x128
[   25.525034]  nmi_trigger_cpumask_backtrace from
trigger_single_cpu_backtrace+0x24/0x2c
[   25.532952]  trigger_single_cpu_backtrace from
rcu_dump_cpu_stacks+0x10c/0x150
[   25.540173]  rcu_dump_cpu_stacks from print_cpu_stall+0x14c/0x1d0
[   25.546268]  print_cpu_stall from check_cpu_stall+0x1cc/0x26c
[   25.552013]  check_cpu_stall from rcu_sched_clock_irq+0x74/0x16c
[   25.558017]  rcu_sched_clock_irq from update_process_times+0x68/0x94
[   25.564377]  update_process_times from tick_sched_timer+0x60/0xec
[   25.570470]  tick_sched_timer from __hrtimer_run_queues+0x15c/0x218
[   25.576737]  __hrtimer_run_queues from hrtimer_interrupt+0x124/0x2b0
[   25.583090]  hrtimer_interrupt from twd_handler+0x34/0x3c
[   25.588489]  twd_handler from handle_percpu_devid_irq+0x78/0x134
[   25.594498]  handle_percpu_devid_irq from generic_handle_domain_irq+0x24/0x34
[   25.601640]  generic_handle_domain_irq from gic_handle_irq+0x74/0x88
[   25.608000]  gic_handle_irq from generic_handle_arch_irq+0x34/0x44
[   25.614187]  generic_handle_arch_irq from call_with_stack+0x18/0x20
[   25.620462]  call_with_stack from __irq_svc+0x98/0xb0
[   25.625514] Exception stack(0xf10ad970 to 0xf10ad9b8)
[   25.630563] d960:                                     c1d59a00
40000013 c056bbfc 00005f32
[   25.638734] d980: 00000000 00000008 f10ad9e0 00000020 c1f69bc0
c1f69bc0 c2364140 f10adb34
[   25.646904] d9a0: 00000000 f10ad9c0 c056da50 c099d1ec 20000013 ffffffff
[   25.653512]  __irq_svc from _raw_spin_unlock_irqrestore+0x1c/0x20
[   25.659605]  _raw_spin_unlock_irqrestore from regmap_read+0x50/0x60
[   25.665882]  regmap_read from hash_hw_write_key+0xd8/0x100
[   25.671377]  hash_hw_write_key from init_hash_hw+0xd8/0xfc
[   25.676862]  init_hash_hw from hash_hw_final+0x88/0x36c
[   25.682089]  hash_hw_final from ahash_final+0x58/0x9c
[   25.687138]  ahash_final from do_ahash_op+0x20/0x98
[   25.692019]  do_ahash_op from test_ahash_vec_cfg+0x68c/0x984
[   25.697677]  test_ahash_vec_cfg from test_hash_vec+0x64/0x168
[   25.703421]  test_hash_vec from __alg_test_hash+0x158/0x2d8
[   25.708991]  __alg_test_hash from alg_test_hash+0xc0/0x170
[   25.714474]  alg_test_hash from alg_test.part.0+0x378/0x4b8
[   25.720044]  alg_test.part.0 from cryptomgr_test+0x24/0x44
[   25.725537]  cryptomgr_test from kthread+0xc0/0xc4
[   25.730334]  kthread from ret_from_fork+0x14/0x2c
[   25.735036] Exception stack(0xf10adfb0 to 0xf10adff8)
[   25.740081] dfa0:                                     00000000
00000000 00000000 00000000
[   25.748252] dfc0: 00000000 00000000 00000000 00000000 00000000
00000000 00000000 00000000
[   25.756422] dfe0: 00000000 00000000 00000000 00000000 00000013 00000000

I can't see 100% if it is the same issue but it does hang on something
related to writing keys.

So for Ux500 at least I suppose it would be best to inhibit .import and
.export when using HMAC with long keys unless I can figure out exactly
what the issue is here. I wonder if that is possible?
Or do I have to remove it from the HMAC algos altogether?

Yours,
Linus Walleij
  
Herbert Xu March 8, 2023, 9:13 a.m. UTC | #8
On Wed, Mar 08, 2023 at 10:05:14AM +0100, Linus Walleij wrote:
>
> So for Ux500 at least I suppose it would be best to inhibit .import and
> .export when using HMAC with long keys unless I can figure out exactly
> what the issue is here. I wonder if that is possible?
> Or do I have to remove it from the HMAC algos altogether?

If the hardware is buggered it's not a big deal if it's just HMAC.
Because any HMAC hash can easily be broken down into three underlying
hash operations.

But let me digest your new information first, and see if we can
figure out a way to get it to work.  If not then we could just disable
hmac (unless we can get confirmation from stm32 hardware we should
just disable it for everything in stm32) and I'll fix the generic
hmac to actually work with hardware drivers.

Thanks,
  
Herbert Xu March 8, 2023, 10:10 a.m. UTC | #9
On Wed, Mar 08, 2023 at 10:05:14AM +0100, Linus Walleij wrote:
>
> [    4.812106] stm32-hash a03c2000.hash: allocated hmac(sha256) fallback
> [    5.008829] stm32-hash a03c2000.hash: timeout before writing key in
> stm32_hash_xmit_cpu()
> [    5.017167] alg: ahash: stm32-hmac-sha256 final() failed with err
> -110 on test vector "random: psize=0 ksize=70", cfg="random: may_sleep
> use_final src_divs=[<fl"

Wait a second, this is an empty message.  Can you reproduce the
hang if you exclude all psize=0 test vectors?

If it's just empty messages, which we know are broken with ux500
to begin with, then we can simply not do the hash at all (doing
it and then throwing it away seems pointless).

Thanks,
  
Herbert Xu March 8, 2023, 10:19 a.m. UTC | #10
On Wed, Mar 08, 2023 at 06:10:14PM +0800, Herbert Xu wrote:
>
> If it's just empty messages, which we know are broken with ux500
> to begin with, then we can simply not do the hash at all (doing
> it and then throwing it away seems pointless).

Here is a patch to not process empty messages at all.

Cheers,
  
Linus Walleij March 8, 2023, 9:19 p.m. UTC | #11
On Wed, Mar 8, 2023 at 11:19 AM Herbert Xu <herbert@gondor.apana.org.au> wrote:
> On Wed, Mar 08, 2023 at 06:10:14PM +0800, Herbert Xu wrote:
> >
> > If it's just empty messages, which we know are broken with ux500
> > to begin with, then we can simply not do the hash at all (doing
> > it and then throwing it away seems pointless).
>
> Here is a patch to not process empty messages at all.

Hey it works :D

I had to tweak it slightly because we need to set the state right:

@@ -374,9 +387,20 @@ static int stm32_hash_xmit_cpu(struct stm32_hash_dev *hdev,
        const u32 *buffer = (const u32 *)buf;
        u32 reg;

-       if (final)
+       if (final) {
                hdev->flags |= HASH_FLAGS_FINAL;

+               /* Do not process empty messages if hw is buggy. */
+               if (!(hdev->flags & HASH_FLAGS_INIT) && !length &&
+                   hdev->pdata->broken_emptymsg) {
+                       struct stm32_hash_request_ctx *rctx =
ahash_request_ctx(hdev->req);
+                       struct stm32_hash_state *state = &rctx->state;
+
+                       state->flags |= HASH_FLAGS_EMPTY;
+                       return 0;
+               }
+       }
+
        len32 = DIV_ROUND_UP(length, sizeof(u32));

        dev_dbg(hdev->dev, "%s: length: %zd, final: %x len32 %i\n",


After this it WORKS!

For the "v5.5" patch and this (and my other patch) folded in:
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Tested-by: Linus Walleij <linus.walleij@linaro.org>

So now the driver is fixed from a Ux500 point of view.

I actually have the STM32MP board, I can try to find some
time to test the final patch set on it as well if it has the same
has as the other STM32 SoCs.

Yours,
Linus Walleij
  
Herbert Xu March 9, 2023, 5:58 a.m. UTC | #12
On Wed, Mar 08, 2023 at 10:19:48PM +0100, Linus Walleij wrote:
> 
> So now the driver is fixed from a Ux500 point of view.

I think there is actually a nasty bug in it that may be hard to
trigger.

The stm32 driver as it stands will write up to 256 bytes into
the FIFO which on the ux500 is limited to 64 bytes.  We need to
change the fixed 256-byte size to be dependent on the hardware
type.

Cheers,
  
Linus Walleij March 9, 2023, 7:35 a.m. UTC | #13
On Thu, Mar 9, 2023 at 6:58 AM Herbert Xu <herbert@gondor.apana.org.au> wrote:
> On Wed, Mar 08, 2023 at 10:19:48PM +0100, Linus Walleij wrote:
> >
> > So now the driver is fixed from a Ux500 point of view.
>
> I think there is actually a nasty bug in it that may be hard to
> trigger.
>
> The stm32 driver as it stands will write up to 256 bytes into
> the FIFO which on the ux500 is limited to 64 bytes.  We need to
> change the fixed 256-byte size to be dependent on the hardware
> type.

Right so that is done implicitly by using a buffer of 256 bytes.

But actually I think the bug will never trigger, because the datasheet
for the DB8500 (Ux500) says this:

"Then the message can be sent, by writing it word per word into the
HASH_DIN register.
When a block of 512 bits, i.e. 16 words have been written, a partial
digest computation will
start upon writing the first data of the next block. The AHB bus will
be busy for 82 cycles for
SHA-1 algorithm (66 cycles for SHA-256 algorithm)."

The way I interpret it is that if you write 64 bytes (16 32bit words)
the AHB bus will simply
stall until the data is processed, so the writel() hangs there and
then 66/82 bus cycles
later it will continue.

This isn't the prettiest from a system PoV, as it can stall interrupt
handling and
cause latency jitter, but it's not actually a bug. It's kind of
similar to that user
experience "bug" on x86 PCs where the sound starts breaking up if you have too
intense graphics going on, because the bus is too busy so the sound FIFO goes
empty.

But I can certainly make a patch to shrink the buffer from 256 to 64 bytes on
Ux500 if it's the right thing to do.

Yours,
Linus Walleij
  
Herbert Xu March 9, 2023, 9:59 a.m. UTC | #14
On Thu, Mar 09, 2023 at 08:35:21AM +0100, Linus Walleij wrote:
>
> The way I interpret it is that if you write 64 bytes (16 32bit words)
> the AHB bus will simply
> stall until the data is processed, so the writel() hangs there and
> then 66/82 bus cycles
> later it will continue.

You're right.  I'll respin the patches with your updates in it.

Thanks,
  
David Laight March 9, 2023, 10:19 p.m. UTC | #15
From: Linus Walleij
> Sent: 09 March 2023 07:35
...
> But actually I think the bug will never trigger, because the datasheet
> for the DB8500 (Ux500) says this:
> 
> "Then the message can be sent, by writing it word per word into the
> HASH_DIN register.
> When a block of 512 bits, i.e. 16 words have been written, a partial
> digest computation will
> start upon writing the first data of the next block. The AHB bus will
> be busy for 82 cycles for
> SHA-1 algorithm (66 cycles for SHA-256 algorithm)."

What speed clock is that?

4 or 5 extra clocks/word may (or may not) be significant.

In terms of latency it may be noise compared to some PCIe
reads done by hardware interrupt handlers.
Some slow PCIe targets (like the fpga one we use) pretty
much take 1us to handle a read cycle.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
  
Linus Walleij March 10, 2023, 8:07 a.m. UTC | #16
On Thu, Mar 9, 2023 at 11:19 PM David Laight <David.Laight@aculab.com> wrote:

> > But actually I think the bug will never trigger, because the datasheet
> > for the DB8500 (Ux500) says this:
> >
> > "Then the message can be sent, by writing it word per word into the
> > HASH_DIN register.
> > When a block of 512 bits, i.e. 16 words have been written, a partial
> > digest computation will
> > start upon writing the first data of the next block. The AHB bus will
> > be busy for 82 cycles for
> > SHA-1 algorithm (66 cycles for SHA-256 algorithm)."
>
> What speed clock is that?

133 MHz.

> 4 or 5 extra clocks/word may (or may not) be significant.
>
> In terms of latency it may be noise compared to some PCIe
> reads done by hardware interrupt handlers.
> Some slow PCIe targets (like the fpga one we use) pretty
> much take 1us to handle a read cycle.

So in this case it's 1/133M s = 8ns cycle time, 82 in worst case,
so 82*8 = 656 ns < 1 us.

Yours,
Linus Walleij
  

Patch

diff --git a/drivers/crypto/stm32/stm32-hash.c b/drivers/crypto/stm32/stm32-hash.c
index 95cd8876689b..e87a909dd97e 100644
--- a/drivers/crypto/stm32/stm32-hash.c
+++ b/drivers/crypto/stm32/stm32-hash.c
@@ -95,6 +95,7 @@ 
 #define HASH_FLAGS_SHA1			BIT(19)
 #define HASH_FLAGS_SHA224		BIT(20)
 #define HASH_FLAGS_SHA256		BIT(21)
+#define HASH_FLAGS_EMPTY		BIT(22)
 #define HASH_FLAGS_HMAC			BIT(23)
 
 #define HASH_OP_UPDATE			1
@@ -134,7 +135,7 @@  struct stm32_hash_state {
 	u8 buffer[HASH_BUFLEN] __aligned(4);
 
 	/* hash state */
-	u32			*hw_context;
+	u32			hw_context[3 + HASH_CSR_REGISTER_NUMBER];
 };
 
 struct stm32_hash_request_ctx {
@@ -314,8 +315,11 @@  static void stm32_hash_write_ctrl(struct stm32_hash_dev *hdev, int bufcnt)
 		 * On the Ux500 we need to set a special flag to indicate that
 		 * the message is zero length.
 		 */
-		if (hdev->pdata->ux500 && bufcnt == 0)
+		if (hdev->pdata->ux500 && bufcnt == 0 &&
+		    (state->flags & HASH_FLAGS_FINAL)) {
 			reg |= HASH_CR_UX500_EMPTYMSG;
+			state->flags |= HASH_FLAGS_EMPTY;
+		}
 
 		if (!hdev->polled)
 			stm32_hash_write(hdev, HASH_IMR, HASH_DCIE);
@@ -419,7 +423,9 @@  static int stm32_hash_update_cpu(struct stm32_hash_dev *hdev)
 {
 	struct stm32_hash_request_ctx *rctx = ahash_request_ctx(hdev->req);
 	struct stm32_hash_state *state = &rctx->state;
+	u32 *preg = state->hw_context;
 	int bufcnt, err = 0, final;
+	int i;
 
 	dev_dbg(hdev->dev, "%s flags %x\n", __func__, state->flags);
 
@@ -440,9 +446,24 @@  static int stm32_hash_update_cpu(struct stm32_hash_dev *hdev)
 	if (final) {
 		bufcnt = state->bufcnt;
 		state->bufcnt = 0;
-		err = stm32_hash_xmit_cpu(hdev, state->buffer, bufcnt, 1);
+		return stm32_hash_xmit_cpu(hdev, state->buffer, bufcnt, 1);
 	}
 
+	if (!(hdev->flags & HASH_FLAGS_INIT))
+		return 0;
+
+	if (stm32_hash_wait_busy(hdev))
+		return -ETIMEDOUT;
+
+	if (!hdev->pdata->ux500)
+		*preg++ = stm32_hash_read(hdev, HASH_IMR);
+	*preg++ = stm32_hash_read(hdev, HASH_STR);
+	*preg++ = stm32_hash_read(hdev, HASH_CR);
+	for (i = 0; i < HASH_CSR_REGISTER_NUMBER; i++)
+		*preg++ = stm32_hash_read(hdev, HASH_CSR(i));
+
+	state->flags |= HASH_FLAGS_INIT;
+
 	return err;
 }
 
@@ -824,7 +845,7 @@  static void stm32_hash_copy_hash(struct ahash_request *req)
 	__be32 *hash = (void *)rctx->digest;
 	unsigned int i, hashsize;
 
-	if (hdev->pdata->broken_emptymsg && !req->nbytes)
+	if (hdev->pdata->broken_emptymsg && (state->flags & HASH_FLAGS_EMPTY))
 		return stm32_hash_emptymsg_fallback(req);
 
 	switch (state->flags & HASH_FLAGS_ALGO_MASK) {
@@ -874,11 +895,6 @@  static void stm32_hash_finish_req(struct ahash_request *req, int err)
 	if (!err && (HASH_FLAGS_FINAL & hdev->flags)) {
 		stm32_hash_copy_hash(req);
 		err = stm32_hash_finish(req);
-		hdev->flags &= ~(HASH_FLAGS_FINAL | HASH_FLAGS_CPU |
-				 HASH_FLAGS_INIT | HASH_FLAGS_DMA_READY |
-				 HASH_FLAGS_OUTPUT_READY | HASH_FLAGS_HMAC |
-				 HASH_FLAGS_HMAC_INIT | HASH_FLAGS_HMAC_FINAL |
-				 HASH_FLAGS_HMAC_KEY);
 	}
 
 	pm_runtime_mark_last_busy(hdev->dev);
@@ -887,66 +903,54 @@  static void stm32_hash_finish_req(struct ahash_request *req, int err)
 	crypto_finalize_hash_request(hdev->engine, req, err);
 }
 
-static int stm32_hash_hw_init(struct stm32_hash_dev *hdev,
-			      struct stm32_hash_request_ctx *rctx)
-{
-	pm_runtime_get_sync(hdev->dev);
-
-	if (!(HASH_FLAGS_INIT & hdev->flags)) {
-		stm32_hash_write(hdev, HASH_CR, HASH_CR_INIT);
-		stm32_hash_write(hdev, HASH_STR, 0);
-		stm32_hash_write(hdev, HASH_DIN, 0);
-		stm32_hash_write(hdev, HASH_IMR, 0);
-	}
-
-	return 0;
-}
-
-static int stm32_hash_one_request(struct crypto_engine *engine, void *areq);
-static int stm32_hash_prepare_req(struct crypto_engine *engine, void *areq);
-
 static int stm32_hash_handle_queue(struct stm32_hash_dev *hdev,
 				   struct ahash_request *req)
 {
 	return crypto_transfer_hash_request_to_engine(hdev->engine, req);
 }
 
-static int stm32_hash_prepare_req(struct crypto_engine *engine, void *areq)
+static int stm32_hash_one_request(struct crypto_engine *engine, void *areq)
 {
 	struct ahash_request *req = container_of(areq, struct ahash_request,
 						 base);
 	struct stm32_hash_ctx *ctx = crypto_ahash_ctx(crypto_ahash_reqtfm(req));
+	struct stm32_hash_request_ctx *rctx = ahash_request_ctx(req);
 	struct stm32_hash_dev *hdev = stm32_hash_find_dev(ctx);
-	struct stm32_hash_request_ctx *rctx;
+	struct stm32_hash_state *state = &rctx->state;
+	int err = 0;
 
 	if (!hdev)
 		return -ENODEV;
 
-	hdev->req = req;
-
-	rctx = ahash_request_ctx(req);
-
 	dev_dbg(hdev->dev, "processing new req, op: %lu, nbytes %d\n",
 		rctx->op, req->nbytes);
 
-	return stm32_hash_hw_init(hdev, rctx);
-}
+	pm_runtime_get_sync(hdev->dev);
 
-static int stm32_hash_one_request(struct crypto_engine *engine, void *areq)
-{
-	struct ahash_request *req = container_of(areq, struct ahash_request,
-						 base);
-	struct stm32_hash_ctx *ctx = crypto_ahash_ctx(crypto_ahash_reqtfm(req));
-	struct stm32_hash_dev *hdev = stm32_hash_find_dev(ctx);
-	struct stm32_hash_request_ctx *rctx;
-	int err = 0;
+	hdev->req = req;
+	hdev->flags = 0;
+
+	if (state->flags & HASH_FLAGS_INIT) {
+		u32 *preg = rctx->state.hw_context;
+		u32 reg;
+		int i;
+
+		if (!hdev->pdata->ux500)
+			stm32_hash_write(hdev, HASH_IMR, *preg++);
+		stm32_hash_write(hdev, HASH_STR, *preg++);
+		stm32_hash_write(hdev, HASH_CR, *preg);
+		reg = *preg++ | HASH_CR_INIT;
+		stm32_hash_write(hdev, HASH_CR, reg);
 
-	if (!hdev)
-		return -ENODEV;
+		for (i = 0; i < HASH_CSR_REGISTER_NUMBER; i++)
+			stm32_hash_write(hdev, HASH_CSR(i), *preg++);
 
-	hdev->req = req;
+		hdev->flags |= HASH_FLAGS_INIT;
 
-	rctx = ahash_request_ctx(req);
+		if (state->flags & HASH_FLAGS_HMAC)
+			hdev->flags |= HASH_FLAGS_HMAC |
+				       HASH_FLAGS_HMAC_KEY;
+	}
 
 	if (rctx->op == HASH_OP_UPDATE)
 		err = stm32_hash_update_req(hdev);
@@ -1041,34 +1045,8 @@  static int stm32_hash_digest(struct ahash_request *req)
 static int stm32_hash_export(struct ahash_request *req, void *out)
 {
 	struct stm32_hash_request_ctx *rctx = ahash_request_ctx(req);
-	struct stm32_hash_ctx *ctx = crypto_ahash_ctx(crypto_ahash_reqtfm(req));
-	struct stm32_hash_dev *hdev = stm32_hash_find_dev(ctx);
-	struct stm32_hash_state *state = &rctx->state;
-	u32 *preg;
-	unsigned int i;
-	int ret;
-
-	pm_runtime_get_sync(hdev->dev);
-
-	ret = stm32_hash_wait_busy(hdev);
-	if (ret)
-		return ret;
-
-	state->hw_context = kmalloc_array(3 + HASH_CSR_REGISTER_NUMBER,
-					  sizeof(u32), GFP_KERNEL);
-	preg = state->hw_context;
 
-	if (!hdev->pdata->ux500)
-		*preg++ = stm32_hash_read(hdev, HASH_IMR);
-	*preg++ = stm32_hash_read(hdev, HASH_STR);
-	*preg++ = stm32_hash_read(hdev, HASH_CR);
-	for (i = 0; i < HASH_CSR_REGISTER_NUMBER; i++)
-		*preg++ = stm32_hash_read(hdev, HASH_CSR(i));
-
-	pm_runtime_mark_last_busy(hdev->dev);
-	pm_runtime_put_autosuspend(hdev->dev);
-
-	memcpy(out, rctx, sizeof(*rctx));
+	memcpy(out, &rctx->state, sizeof(rctx->state));
 
 	return 0;
 }
@@ -1076,33 +1054,9 @@  static int stm32_hash_export(struct ahash_request *req, void *out)
 static int stm32_hash_import(struct ahash_request *req, const void *in)
 {
 	struct stm32_hash_request_ctx *rctx = ahash_request_ctx(req);
-	struct stm32_hash_ctx *ctx = crypto_ahash_ctx(crypto_ahash_reqtfm(req));
-	struct stm32_hash_dev *hdev = stm32_hash_find_dev(ctx);
-	struct stm32_hash_state *state = &rctx->state;
-	const u32 *preg = in;
-	u32 reg;
-	unsigned int i;
-
-	memcpy(rctx, in, sizeof(*rctx));
-
-	preg = state->hw_context;
-
-	pm_runtime_get_sync(hdev->dev);
-
-	if (!hdev->pdata->ux500)
-		stm32_hash_write(hdev, HASH_IMR, *preg++);
-	stm32_hash_write(hdev, HASH_STR, *preg++);
-	stm32_hash_write(hdev, HASH_CR, *preg);
-	reg = *preg++ | HASH_CR_INIT;
-	stm32_hash_write(hdev, HASH_CR, reg);
-
-	for (i = 0; i < HASH_CSR_REGISTER_NUMBER; i++)
-		stm32_hash_write(hdev, HASH_CSR(i), *preg++);
-
-	pm_runtime_mark_last_busy(hdev->dev);
-	pm_runtime_put_autosuspend(hdev->dev);
 
-	kfree(state->hw_context);
+	stm32_hash_init(req);
+	memcpy(&rctx->state, in, sizeof(rctx->state));
 
 	return 0;
 }
@@ -1159,8 +1113,6 @@  static int stm32_hash_cra_init_algs(struct crypto_tfm *tfm,
 		ctx->flags |= HASH_FLAGS_HMAC;
 
 	ctx->enginectx.op.do_one_request = stm32_hash_one_request;
-	ctx->enginectx.op.prepare_request = stm32_hash_prepare_req;
-	ctx->enginectx.op.unprepare_request = NULL;
 
 	return stm32_hash_init_fallback(tfm);
 }
@@ -1252,7 +1204,7 @@  static struct ahash_alg algs_md5[] = {
 		.import = stm32_hash_import,
 		.halg = {
 			.digestsize = MD5_DIGEST_SIZE,
-			.statesize = sizeof(struct stm32_hash_request_ctx),
+			.statesize = sizeof(struct stm32_hash_state),
 			.base = {
 				.cra_name = "md5",
 				.cra_driver_name = "stm32-md5",
@@ -1279,7 +1231,7 @@  static struct ahash_alg algs_md5[] = {
 		.setkey = stm32_hash_setkey,
 		.halg = {
 			.digestsize = MD5_DIGEST_SIZE,
-			.statesize = sizeof(struct stm32_hash_request_ctx),
+			.statesize = sizeof(struct stm32_hash_state),
 			.base = {
 				.cra_name = "hmac(md5)",
 				.cra_driver_name = "stm32-hmac-md5",
@@ -1308,7 +1260,7 @@  static struct ahash_alg algs_sha1[] = {
 		.import = stm32_hash_import,
 		.halg = {
 			.digestsize = SHA1_DIGEST_SIZE,
-			.statesize = sizeof(struct stm32_hash_request_ctx),
+			.statesize = sizeof(struct stm32_hash_state),
 			.base = {
 				.cra_name = "sha1",
 				.cra_driver_name = "stm32-sha1",
@@ -1335,7 +1287,7 @@  static struct ahash_alg algs_sha1[] = {
 		.setkey = stm32_hash_setkey,
 		.halg = {
 			.digestsize = SHA1_DIGEST_SIZE,
-			.statesize = sizeof(struct stm32_hash_request_ctx),
+			.statesize = sizeof(struct stm32_hash_state),
 			.base = {
 				.cra_name = "hmac(sha1)",
 				.cra_driver_name = "stm32-hmac-sha1",
@@ -1364,7 +1316,7 @@  static struct ahash_alg algs_sha224[] = {
 		.import = stm32_hash_import,
 		.halg = {
 			.digestsize = SHA224_DIGEST_SIZE,
-			.statesize = sizeof(struct stm32_hash_request_ctx),
+			.statesize = sizeof(struct stm32_hash_state),
 			.base = {
 				.cra_name = "sha224",
 				.cra_driver_name = "stm32-sha224",
@@ -1391,7 +1343,7 @@  static struct ahash_alg algs_sha224[] = {
 		.import = stm32_hash_import,
 		.halg = {
 			.digestsize = SHA224_DIGEST_SIZE,
-			.statesize = sizeof(struct stm32_hash_request_ctx),
+			.statesize = sizeof(struct stm32_hash_state),
 			.base = {
 				.cra_name = "hmac(sha224)",
 				.cra_driver_name = "stm32-hmac-sha224",
@@ -1420,7 +1372,7 @@  static struct ahash_alg algs_sha256[] = {
 		.import = stm32_hash_import,
 		.halg = {
 			.digestsize = SHA256_DIGEST_SIZE,
-			.statesize = sizeof(struct stm32_hash_request_ctx),
+			.statesize = sizeof(struct stm32_hash_state),
 			.base = {
 				.cra_name = "sha256",
 				.cra_driver_name = "stm32-sha256",
@@ -1447,7 +1399,7 @@  static struct ahash_alg algs_sha256[] = {
 		.setkey = stm32_hash_setkey,
 		.halg = {
 			.digestsize = SHA256_DIGEST_SIZE,
-			.statesize = sizeof(struct stm32_hash_request_ctx),
+			.statesize = sizeof(struct stm32_hash_state),
 			.base = {
 				.cra_name = "hmac(sha256)",
 				.cra_driver_name = "stm32-hmac-sha256",