[00/13] crypto: x86 - yield FPU context during long loops

Message ID 20221219220223.3982176-1-elliott@hpe.com

Message

Elliott, Robert (Servers) Dec. 19, 2022, 10:02 p.m. UTC
  This is an offshoot of the previous patch series at:
  https://lore.kernel.org/linux-crypto/20221219202910.3063036-1-elliott@hpe.com

Add a kernel_fpu_yield() function for x86 crypto drivers to call
periodically during long loops.
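
As a rough illustration of the calling pattern only, here is a userspace
sketch. It is not the code from the patches: kernel_fpu_begin(),
kernel_fpu_end() and kernel_fpu_yield() are stubbed out, and the
resched_pending flag, the CHUNK_BYTES value and the checksum loop are all
hypothetical stand-ins. It assumes the yield helper releases and reacquires
the FPU context only when something else is waiting for the CPU.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-ins for the kernel FPU helpers (assumptions, not kernel code). */
static int fpu_sections;       /* how many times the FPU context was acquired */
static bool resched_pending;   /* hypothetical stand-in for "a reschedule is wanted" */

static void kernel_fpu_begin(void) { fpu_sections++; }
static void kernel_fpu_end(void) { }

/* Assumed behaviour: give up the FPU context only if someone is waiting. */
static void kernel_fpu_yield(void)
{
	if (resched_pending) {
		kernel_fpu_end();
		resched_pending = false;  /* pretend the scheduler ran here */
		kernel_fpu_begin();
	}
}

#define CHUNK_BYTES 4096u  /* hypothetical amount of work between yield points */

/* Placeholder for a SIMD routine: a plain byte sum over one chunk. */
static unsigned int sum_chunk(unsigned int sum, const unsigned char *p, size_t len)
{
	for (size_t i = 0; i < len; i++)
		sum += p[i];
	return sum;
}

/* Process a long buffer in chunks, offering to yield between chunks. */
static unsigned int process_long_buffer(const unsigned char *data, size_t len)
{
	unsigned int sum = 0;

	assert(data != NULL || len == 0);

	kernel_fpu_begin();
	while (len) {
		size_t n = len < CHUNK_BYTES ? len : CHUNK_BYTES;

		sum = sum_chunk(sum, data, n);
		data += n;
		len -= n;

		if (len)  /* no need to yield after the final chunk */
			kernel_fpu_yield();
	}
	kernel_fpu_end();

	return sum;
}
```

With nothing waiting, the whole buffer is handled in a single FPU section;
when a reschedule is pending, the section is split at the next chunk
boundary, so the cost of saving and restoring FPU state is paid only when
needed.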

Test results
============
I created 28 tcrypt modules so modprobe can run concurrent tests,
added 1 MiB functional and speed tests to tcrypt, and ran three processes
spawning 28 subprocesses (one per physical CPU core) each looping forever
through all the tcrypt test modes. This keeps the system quite busy,
generating RCU stalls and soft lockups during both generic and x86
crypto function processing.

In conjunction with these patch series:
* [PATCH 0/8] crypto: kernel-doc for assembly language
  https://lore.kernel.org/linux-crypto/20221219185555.433233-1-elliott@hpe.com
* [PATCH 0/3] crypto/rcu: suppress unnecessary CPU stall warnings
  https://lore.kernel.org/linux-crypto/20221219202910.3063036-1-elliott@hpe.com
* [PATCH 0/3] crypto: yield at end of operations
  https://lore.kernel.org/linux-crypto/20221219203733.3063192-1-elliott@hpe.com

while using the default RCU values (60 s stalls, 21 s expedited stalls),
several nights of testing did not result in any RCU stall warnings or soft
lockups in any of these preemption modes:
   preempt=none
   preempt=voluntary
   preempt=full

Setting the shortest possible RCU timeouts (3 s, 20 ms) did still result
in RCU stalls, but only about once every 2 hours, and they were not tied
to particular modules like sha512_ssse3 and sm4-generic.

systemd usually crashes and restarts when its journal becomes full from
all the tcrypt printk messages. Without the patches, that triggered more
RCU stall reports and soft lockups; with the patches, only userspace
seems perturbed.


Robert Elliott (13):
  x86:  protect simd.h header file
  x86: add yield FPU context utility function
  crypto: x86/sha - yield FPU context during long loops
  crypto: x86/crc - yield FPU context during long loops
  crypto: x86/sm3 - yield FPU context during long loops
  crypto: x86/ghash - use u8 rather than char
  crypto: x86/ghash - restructure FPU context saving
  crypto: x86/ghash - yield FPU context during long loops
  crypto: x86/poly - yield FPU context only when needed
  crypto: x86/aegis - yield FPU context during long loops
  crypto: x86/blake - yield FPU context only when needed
  crypto: x86/chacha - yield FPU context only when needed
  crypto: x86/aria - yield FPU context only when needed

 arch/x86/crypto/aegis128-aesni-glue.c      |  49 ++++++---
 arch/x86/crypto/aria_aesni_avx_glue.c      |   7 +-
 arch/x86/crypto/blake2s-glue.c             |  41 +++----
 arch/x86/crypto/chacha_glue.c              |  22 ++--
 arch/x86/crypto/crc32-pclmul_glue.c        |  49 +++++----
 arch/x86/crypto/crc32c-intel_glue.c        | 118 ++++++++++++++------
 arch/x86/crypto/crct10dif-pclmul_glue.c    |  65 ++++++++---
 arch/x86/crypto/ghash-clmulni-intel_asm.S  |   6 +-
 arch/x86/crypto/ghash-clmulni-intel_glue.c |  37 +++++--
 arch/x86/crypto/nhpoly1305-avx2-glue.c     |  22 ++--
 arch/x86/crypto/nhpoly1305-sse2-glue.c     |  22 ++--
 arch/x86/crypto/poly1305_glue.c            |  47 ++++----
 arch/x86/crypto/polyval-clmulni_glue.c     |  46 +++++---
 arch/x86/crypto/sha1_avx2_x86_64_asm.S     |   6 +-
 arch/x86/crypto/sha1_ni_asm.S              |   8 +-
 arch/x86/crypto/sha1_ssse3_glue.c          | 120 +++++++++++++++++----
 arch/x86/crypto/sha256_ni_asm.S            |   8 +-
 arch/x86/crypto/sha256_ssse3_glue.c        | 115 ++++++++++++++++----
 arch/x86/crypto/sha512_ssse3_glue.c        |  89 ++++++++++++---
 arch/x86/crypto/sm3_avx_glue.c             |  34 +++++-
 arch/x86/include/asm/simd.h                |  23 ++++
 include/crypto/internal/blake2s.h          |   8 +-
 lib/crypto/blake2s-generic.c               |  12 +--
 23 files changed, 687 insertions(+), 267 deletions(-)
  

Comments

Eric Biggers Dec. 20, 2022, 8:02 p.m. UTC | #1
On Mon, Dec 19, 2022 at 04:02:10PM -0600, Robert Elliott wrote:
> This is an offshoot of the previous patch series at:
>   https://lore.kernel.org/linux-crypto/20221219202910.3063036-1-elliott@hpe.com
> 
> Add a kernel_fpu_yield() function for x86 crypto drivers to call
> periodically during long loops.
> 
> Test results
> ============
> I created 28 tcrypt modules so modprobe can run concurrent tests,
> added 1 MiB functional and speed tests to tcrypt, and ran three processes
> spawning 28 subprocesses (one per physical CPU core) each looping forever
> through all the tcrypt test modes. This keeps the system quite busy,
> generating RCU stalls and soft lockups during both generic and x86
> crypto function processing.
> 
> In conjunction with these patch series:
> * [PATCH 0/8] crypto: kernel-doc for assembly language
>   https://lore.kernel.org/linux-crypto/20221219185555.433233-1-elliott@hpe.com
> * [PATCH 0/3] crypto/rcu: suppress unnecessary CPU stall warnings
>   https://lore.kernel.org/linux-crypto/20221219202910.3063036-1-elliott@hpe.com
> * [PATCH 0/3] crypto: yield at end of operations
>   https://lore.kernel.org/linux-crypto/20221219203733.3063192-1-elliott@hpe.com
> 
> while using the default RCU values (60 s stalls, 21 s expedited stalls),
> several nights of testing did not result in any RCU stall warnings or soft
> lockups in any of these preemption modes:
>    preempt=none
>    preempt=voluntary
>    preempt=full
> 
> Setting the shortest possible RCU timeouts (3 s, 20 ms) did still result
> in RCU stalls, but only about one every 2 hours, and not occurring
> on particular modules like sha512_ssse3 and sm4-generic.
> 
> systemd usually crashes and restarts when its journal becomes full from
> all the tcrypt printk messages. Without the patches, that triggered more
> RCU stall reports and soft lockups; with the patches, only userspace
> seems perturbed.
> 

Where does this patch series apply to?

- Eric