[v4,00/24] crypto: fix RCU stalls

Message ID 20221116041342.3841-1-elliott@hpe.com
Headers
Series crypto: fix RCU stalls |

Message

Elliott, Robert (Servers) Nov. 16, 2022, 4:13 a.m. UTC
  This series fixes the RCU stalls triggered by the x86 crypto
modules discussed in
https://lore.kernel.org/all/MW5PR84MB18426EBBA3303770A8BC0BDFAB759@MW5PR84MB1842.NAMPRD84.PROD.OUTLOOK.COM/

Two root causes were:
- too much data processed between kernel_fpu_begin and
  kernel_fpu_end calls (which are heavily used by the x86
  optimized drivers)
- tcrypt not calling cond_resched during speed test loops

These problems have always been lurking, but improving the
loading of the x86/sha512 module led to it happening a lot
during boot when using SHA-512 for module signature checking.

Fixing these problems makes it safer to improve loading
the rest of the x86 modules like the sha512 module.

This series only handles the x86 modules.

Version 4 tackles lingering comments from version 2.

1. Unlike the hash functions, skcipher and aead functions
accept pointers to scatter-gather lists, and the helper
functions that walk through those lists limit processing
to a page size at a time.

The aegis module did everything inside one pair of
kernel_fpu_begin() and kernel_fpu_end() calls including
walking through the sglist, so it could preempt the CPU
without constraint.

The aesni aead functions for gcm process the additional
data (data that is included in the authentication tag
calculation but not encrypted) in one FPU context, so
that can be a problem. This will require some asm changes
to fix. However, I don't think that is a typical use case,
so this series defers fixing that.

The series adds device table matching for all the x86
crypto modules.

2. I replaced all the positive and negative prints with
module parameters, including enough clues in modinfo
descriptions that a user can determine what is
working and not working.


Robert Elliott (24):
  crypto: tcrypt - test crc32
  crypto: tcrypt - test nhpoly1305
  crypto: tcrypt - reschedule during cycles speed tests
  crypto: x86/sha - limit FPU preemption
  crypto: x86/crc - limit FPU preemption
  crypto: x86/sm3 - limit FPU preemption
  crypto: x86/ghash - use u8 rather than char
  crypto: x86/ghash - restructure FPU context saving
  crypto: x86/ghash - limit FPU preemption
  crypto: x86/poly - limit FPU preemption
  crypto: x86/aegis - limit FPU preemption
  crypto: x86/sha - register all variations
  crypto: x86/sha - minimize time in FPU context
  crypto: x86/sha - load based on CPU features
  crypto: x86/crc - load based on CPU features
  crypto: x86/sm3 - load based on CPU features
  crypto: x86/poly - load based on CPU features
  crypto: x86/ghash - load based on CPU features
  crypto: x86/aesni - avoid type conversions
  crypto: x86/ciphers - load based on CPU features
  crypto: x86 - report used CPU features via module parameters
  crypto: x86 - report missing CPU features via module parameters
  crypto: x86 - report suboptimal CPUs via module parameters
  crypto: x86 - standarize module descriptions

 arch/x86/crypto/aegis128-aesni-glue.c      |  66 +++--
 arch/x86/crypto/aesni-intel_glue.c         |  45 ++--
 arch/x86/crypto/aria_aesni_avx_glue.c      |  43 ++-
 arch/x86/crypto/blake2s-glue.c             |  18 +-
 arch/x86/crypto/blowfish_glue.c            |  39 ++-
 arch/x86/crypto/camellia_aesni_avx2_glue.c |  40 ++-
 arch/x86/crypto/camellia_aesni_avx_glue.c  |  38 ++-
 arch/x86/crypto/camellia_glue.c            |  37 ++-
 arch/x86/crypto/cast5_avx_glue.c           |  30 ++-
 arch/x86/crypto/cast6_avx_glue.c           |  30 ++-
 arch/x86/crypto/chacha_glue.c              |  18 +-
 arch/x86/crypto/crc32-pclmul_asm.S         |   6 +-
 arch/x86/crypto/crc32-pclmul_glue.c        |  39 ++-
 arch/x86/crypto/crc32c-intel_glue.c        |  66 +++--
 arch/x86/crypto/crct10dif-pclmul_glue.c    |  56 ++--
 arch/x86/crypto/curve25519-x86_64.c        |  29 +-
 arch/x86/crypto/des3_ede_glue.c            |  36 ++-
 arch/x86/crypto/ghash-clmulni-intel_asm.S  |   4 +-
 arch/x86/crypto/ghash-clmulni-intel_glue.c |  45 ++--
 arch/x86/crypto/nhpoly1305-avx2-glue.c     |  36 ++-
 arch/x86/crypto/nhpoly1305-sse2-glue.c     |  22 +-
 arch/x86/crypto/poly1305_glue.c            |  56 +++-
 arch/x86/crypto/polyval-clmulni_glue.c     |  31 ++-
 arch/x86/crypto/serpent_avx2_glue.c        |  36 ++-
 arch/x86/crypto/serpent_avx_glue.c         |  31 ++-
 arch/x86/crypto/serpent_sse2_glue.c        |  13 +-
 arch/x86/crypto/sha1_ssse3_glue.c          | 298 ++++++++++++++-------
 arch/x86/crypto/sha256_ssse3_glue.c        | 294 +++++++++++++-------
 arch/x86/crypto/sha512_ssse3_glue.c        | 205 +++++++++-----
 arch/x86/crypto/sm3_avx_glue.c             |  70 +++--
 arch/x86/crypto/sm4_aesni_avx2_glue.c      |  37 ++-
 arch/x86/crypto/sm4_aesni_avx_glue.c       |  39 ++-
 arch/x86/crypto/twofish_avx_glue.c         |  29 +-
 arch/x86/crypto/twofish_glue.c             |  12 +-
 arch/x86/crypto/twofish_glue_3way.c        |  36 ++-
 crypto/aes_ti.c                            |   2 +-
 crypto/blake2b_generic.c                   |   2 +-
 crypto/blowfish_common.c                   |   2 +-
 crypto/crct10dif_generic.c                 |   2 +-
 crypto/curve25519-generic.c                |   1 +
 crypto/sha256_generic.c                    |   2 +-
 crypto/sha512_generic.c                    |   2 +-
 crypto/sm3.c                               |   2 +-
 crypto/sm4.c                               |   2 +-
 crypto/tcrypt.c                            |  56 ++--
 crypto/twofish_common.c                    |   2 +-
 crypto/twofish_generic.c                   |   2 +-
 47 files changed, 1377 insertions(+), 630 deletions(-)
  

Comments

Herbert Xu Nov. 17, 2022, 3:58 a.m. UTC | #1
On Tue, Nov 15, 2022 at 10:13:18PM -0600, Robert Elliott wrote:
> This series fixes the RCU stalls triggered by the x86 crypto
> modules discussed in
> https://lore.kernel.org/all/MW5PR84MB18426EBBA3303770A8BC0BDFAB759@MW5PR84MB1842.NAMPRD84.PROD.OUTLOOK.COM/
> 
> Two root causes were:
> - too much data processed between kernel_fpu_begin and
>   kernel_fpu_end calls (which are heavily used by the x86
>   optimized drivers)
> - tcrypt not calling cond_resched during speed test loops
> 
> These problems have always been lurking, but improving the
> loading of the x86/sha512 module led to it happening a lot
> during boot when using SHA-512 for module signature checking.

Can we split this series up please? The fixes to the stalls should
stand separately from the changes to how modules are loaded.  The
latter is more of an improvement while the former should be applied
ASAP.

Thanks,
  
Elliott, Robert (Servers) Nov. 17, 2022, 3:13 p.m. UTC | #2
> -----Original Message-----
> From: Herbert Xu <herbert@gondor.apana.org.au>
> Sent: Wednesday, November 16, 2022 9:59 PM
> Subject: Re: [PATCH v4 00/24] crypto: fix RCU stalls
> 
> On Tue, Nov 15, 2022 at 10:13:18PM -0600, Robert Elliott wrote:
...
> > These problems have always been lurking, but improving the
> > loading of the x86/sha512 module led to it happening a lot
> > during boot when using SHA-512 for module signature checking.
> 
> Can we split this series up please? The fixes to the stalls should
> stand separately from the changes to how modules are loaded.  The
> latter is more of an improvement while the former should be applied
> ASAP.

Yes. With the v4 patch numbers:
[PATCH v4 01/24] crypto: tcrypt - test crc32
[PATCH v4 02/24] crypto: tcrypt - test nhpoly1305

Those ensure the changes to those hash modules are testable.

[PATCH v4 03/24] crypto: tcrypt - reschedule during cycles speed

That's only for tcrypt so not urgent for users, but pretty
simple.

[PATCH v4 04/24] crypto: x86/sha - limit FPU preemption
[PATCH v4 05/24] crypto: x86/crc - limit FPU preemption
[PATCH v4 06/24] crypto: x86/sm3 - limit FPU preemption
[PATCH v4 07/24] crypto: x86/ghash - use u8 rather than char
[PATCH v4 08/24] crypto: x86/ghash - restructure FPU context saving
[PATCH v4 09/24] crypto: x86/ghash - limit FPU preemption
[PATCH v4 10/24] crypto: x86/poly - limit FPU preemption
[PATCH v4 11/24] crypto: x86/aegis - limit FPU preemption
[PATCH v4 12/24] crypto: x86/sha - register all variations
[PATCH v4 13/24] crypto: x86/sha - minimize time in FPU context

That's the end of the fixes set.

[PATCH v4 14/24] crypto: x86/sha - load based on CPU features
[PATCH v4 15/24] crypto: x86/crc - load based on CPU features
[PATCH v4 16/24] crypto: x86/sm3 - load based on CPU features
[PATCH v4 17/24] crypto: x86/poly - load based on CPU features
[PATCH v4 18/24] crypto: x86/ghash - load based on CPU features
[PATCH v4 19/24] crypto: x86/aesni - avoid type conversions
[PATCH v4 20/24] crypto: x86/ciphers - load based on CPU features
[PATCH v4 21/24] crypto: x86 - report used CPU features via module
[PATCH v4 22/24] crypto: x86 - report missing CPU features via module
[PATCH v4 23/24] crypto: x86 - report suboptimal CPUs via module
[PATCH v4 24/24] crypto: x86 - standardize module descriptions

I'll put those in a new series.

For 6.1, I still suggest reverting aa031b8f702e ("crypto: x86/sha512 -
load based on CPU features) since that exposed the problem. Target
the fixes for 6.2 and module loading for 6.2 or 6.3.
  
Jason A. Donenfeld Nov. 17, 2022, 3:15 p.m. UTC | #3
On Thu, Nov 17, 2022 at 4:14 PM Elliott, Robert (Servers)
<elliott@hpe.com> wrote:
> > -----Original Message-----
> > From: Herbert Xu <herbert@gondor.apana.org.au>
> > Sent: Wednesday, November 16, 2022 9:59 PM
> > Subject: Re: [PATCH v4 00/24] crypto: fix RCU stalls
> >
> > On Tue, Nov 15, 2022 at 10:13:18PM -0600, Robert Elliott wrote:
> ...
> > > These problems have always been lurking, but improving the
> > > loading of the x86/sha512 module led to it happening a lot
> > > during boot when using SHA-512 for module signature checking.
> >
> > Can we split this series up please? The fixes to the stalls should
> > stand separately from the changes to how modules are loaded.  The
> > latter is more of an improvement while the former should be applied
> > ASAP.
>
> Yes. With the v4 patch numbers:
> [PATCH v4 01/24] crypto: tcrypt - test crc32
> [PATCH v4 02/24] crypto: tcrypt - test nhpoly1305
>
> Those ensure the changes to those hash modules are testable.
>
> [PATCH v4 03/24] crypto: tcrypt - reschedule during cycles speed
>
> That's only for tcrypt so not urgent for users, but pretty
> simple.
>
> [PATCH v4 04/24] crypto: x86/sha - limit FPU preemption
> [PATCH v4 05/24] crypto: x86/crc - limit FPU preemption
> [PATCH v4 06/24] crypto: x86/sm3 - limit FPU preemption
> [PATCH v4 07/24] crypto: x86/ghash - use u8 rather than char
> [PATCH v4 08/24] crypto: x86/ghash - restructure FPU context saving
> [PATCH v4 09/24] crypto: x86/ghash - limit FPU preemption
> [PATCH v4 10/24] crypto: x86/poly - limit FPU preemption
> [PATCH v4 11/24] crypto: x86/aegis - limit FPU preemption
> [PATCH v4 12/24] crypto: x86/sha - register all variations
> [PATCH v4 13/24] crypto: x86/sha - minimize time in FPU context
>
> That's the end of the fixes set.
>
> [PATCH v4 14/24] crypto: x86/sha - load based on CPU features
> [PATCH v4 15/24] crypto: x86/crc - load based on CPU features
> [PATCH v4 16/24] crypto: x86/sm3 - load based on CPU features
> [PATCH v4 17/24] crypto: x86/poly - load based on CPU features
> [PATCH v4 18/24] crypto: x86/ghash - load based on CPU features
> [PATCH v4 19/24] crypto: x86/aesni - avoid type conversions
> [PATCH v4 20/24] crypto: x86/ciphers - load based on CPU features
> [PATCH v4 21/24] crypto: x86 - report used CPU features via module
> [PATCH v4 22/24] crypto: x86 - report missing CPU features via module
> [PATCH v4 23/24] crypto: x86 - report suboptimal CPUs via module
> [PATCH v4 24/24] crypto: x86 - standardize module descriptions
>
> I'll put those in a new series.

Thanks. Please take into account my review feedback this time for your
next series.

Jason