Message ID | 20221116041342.3841-1-elliott@hpe.com |
---|---|
Headers |
From: Robert Elliott <elliott@hpe.com> To: herbert@gondor.apana.org.au, davem@davemloft.net, tim.c.chen@linux.intel.com, ap420073@gmail.com, ardb@kernel.org, Jason@zx2c4.com, David.Laight@ACULAB.COM, ebiggers@kernel.org, linux-crypto@vger.kernel.org, linux-kernel@vger.kernel.org Cc: Robert Elliott <elliott@hpe.com> Subject: [PATCH v4 00/24] crypto: fix RCU stalls Date: Tue, 15 Nov 2022 22:13:18 -0600 Message-Id: <20221116041342.3841-1-elliott@hpe.com> X-Mailer: git-send-email 2.38.1 In-Reply-To: <20221103042740.6556-1-elliott@hpe.com> References: <20221103042740.6556-1-elliott@hpe.com> List-ID: <linux-kernel.vger.kernel.org> |
Series | crypto: fix RCU stalls |
Message
Elliott, Robert (Servers)
Nov. 16, 2022, 4:13 a.m. UTC
This series fixes the RCU stalls triggered by the x86 crypto
modules discussed in
https://lore.kernel.org/all/MW5PR84MB18426EBBA3303770A8BC0BDFAB759@MW5PR84MB1842.NAMPRD84.PROD.OUTLOOK.COM/

Two root causes were:
- too much data processed between kernel_fpu_begin and
  kernel_fpu_end calls (which are heavily used by the x86
  optimized drivers)
- tcrypt not calling cond_resched during speed test loops

These problems have always been lurking, but improving the
loading of the x86/sha512 module led to it happening a lot
during boot when using SHA-512 for module signature checking.

Fixing these problems makes it safer to improve loading
the rest of the x86 modules like the sha512 module.

This series only handles the x86 modules.

Version 4 tackles lingering comments from version 2.

1. Unlike the hash functions, skcipher and aead functions
accept pointers to scatter-gather lists, and the helper functions
that walk through those lists limit processing to a page size
at a time.

The aegis module did everything inside one pair of
kernel_fpu_begin() and kernel_fpu_end() calls including
walking through the sglist, so it could preempt the CPU
without constraint.

The aesni aead functions for gcm process the additional
data (data that is included in the authentication tag
calculation but not encrypted) in one FPU context, so
that can be a problem. This will require some asm changes
to fix. However, I don't think that is a typical use case,
so this series defers fixing that.

The series adds device table matching for all the x86
crypto modules.

2. I replaced all the positive and negative prints with
module parameters, including enough clues in modinfo
descriptions that a user can determine what is working
and not working.
Robert Elliott (24):
  crypto: tcrypt - test crc32
  crypto: tcrypt - test nhpoly1305
  crypto: tcrypt - reschedule during cycles speed tests
  crypto: x86/sha - limit FPU preemption
  crypto: x86/crc - limit FPU preemption
  crypto: x86/sm3 - limit FPU preemption
  crypto: x86/ghash - use u8 rather than char
  crypto: x86/ghash - restructure FPU context saving
  crypto: x86/ghash - limit FPU preemption
  crypto: x86/poly - limit FPU preemption
  crypto: x86/aegis - limit FPU preemption
  crypto: x86/sha - register all variations
  crypto: x86/sha - minimize time in FPU context
  crypto: x86/sha - load based on CPU features
  crypto: x86/crc - load based on CPU features
  crypto: x86/sm3 - load based on CPU features
  crypto: x86/poly - load based on CPU features
  crypto: x86/ghash - load based on CPU features
  crypto: x86/aesni - avoid type conversions
  crypto: x86/ciphers - load based on CPU features
  crypto: x86 - report used CPU features via module parameters
  crypto: x86 - report missing CPU features via module parameters
  crypto: x86 - report suboptimal CPUs via module parameters
  crypto: x86 - standardize module descriptions

 arch/x86/crypto/aegis128-aesni-glue.c      |  66 +++--
 arch/x86/crypto/aesni-intel_glue.c         |  45 ++--
 arch/x86/crypto/aria_aesni_avx_glue.c      |  43 ++-
 arch/x86/crypto/blake2s-glue.c             |  18 +-
 arch/x86/crypto/blowfish_glue.c            |  39 ++-
 arch/x86/crypto/camellia_aesni_avx2_glue.c |  40 ++-
 arch/x86/crypto/camellia_aesni_avx_glue.c  |  38 ++-
 arch/x86/crypto/camellia_glue.c            |  37 ++-
 arch/x86/crypto/cast5_avx_glue.c           |  30 ++-
 arch/x86/crypto/cast6_avx_glue.c           |  30 ++-
 arch/x86/crypto/chacha_glue.c              |  18 +-
 arch/x86/crypto/crc32-pclmul_asm.S         |   6 +-
 arch/x86/crypto/crc32-pclmul_glue.c        |  39 ++-
 arch/x86/crypto/crc32c-intel_glue.c        |  66 +++--
 arch/x86/crypto/crct10dif-pclmul_glue.c    |  56 ++--
 arch/x86/crypto/curve25519-x86_64.c        |  29 +-
 arch/x86/crypto/des3_ede_glue.c            |  36 ++-
 arch/x86/crypto/ghash-clmulni-intel_asm.S  |   4 +-
 arch/x86/crypto/ghash-clmulni-intel_glue.c |  45 ++--
 arch/x86/crypto/nhpoly1305-avx2-glue.c     |  36 ++-
 arch/x86/crypto/nhpoly1305-sse2-glue.c     |  22 +-
 arch/x86/crypto/poly1305_glue.c            |  56 +++-
 arch/x86/crypto/polyval-clmulni_glue.c     |  31 ++-
 arch/x86/crypto/serpent_avx2_glue.c        |  36 ++-
 arch/x86/crypto/serpent_avx_glue.c         |  31 ++-
 arch/x86/crypto/serpent_sse2_glue.c        |  13 +-
 arch/x86/crypto/sha1_ssse3_glue.c          | 298 ++++++++++++++-------
 arch/x86/crypto/sha256_ssse3_glue.c        | 294 +++++++++++++-------
 arch/x86/crypto/sha512_ssse3_glue.c        | 205 +++++++++-----
 arch/x86/crypto/sm3_avx_glue.c             |  70 +++--
 arch/x86/crypto/sm4_aesni_avx2_glue.c      |  37 ++-
 arch/x86/crypto/sm4_aesni_avx_glue.c       |  39 ++-
 arch/x86/crypto/twofish_avx_glue.c         |  29 +-
 arch/x86/crypto/twofish_glue.c             |  12 +-
 arch/x86/crypto/twofish_glue_3way.c        |  36 ++-
 crypto/aes_ti.c                            |   2 +-
 crypto/blake2b_generic.c                   |   2 +-
 crypto/blowfish_common.c                   |   2 +-
 crypto/crct10dif_generic.c                 |   2 +-
 crypto/curve25519-generic.c                |   1 +
 crypto/sha256_generic.c                    |   2 +-
 crypto/sha512_generic.c                    |   2 +-
 crypto/sm3.c                               |   2 +-
 crypto/sm4.c                               |   2 +-
 crypto/tcrypt.c                            |  56 ++--
 crypto/twofish_common.c                    |   2 +-
 crypto/twofish_generic.c                   |   2 +-
 47 files changed, 1377 insertions(+), 630 deletions(-)
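Editorial illustration (not part of the posted series): the first root
cause in the cover letter, too much data processed between
kernel_fpu_begin and kernel_fpu_end, is commonly addressed by splitting
a long update into bounded chunks so that each FPU section stays short.
The userspace sketch below only models that idea. FPU_CHUNK_BYTES,
fpu_begin_stub(), fpu_end_stub(), transform_stub() and update_chunked()
are hypothetical names invented for this example; the stubs stand in for
kernel_fpu_begin()/kernel_fpu_end() and for a SIMD transform, and the
4096-byte limit is an assumption, not the value chosen by the patches.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-section byte limit; not taken from the series. */
#define FPU_CHUNK_BYTES 4096u

/* Userspace stand-ins for kernel_fpu_begin()/kernel_fpu_end(). */
static int fpu_depth;          /* 1 while inside a simulated FPU section */
static unsigned fpu_sections;  /* number of sections entered so far */

static void fpu_begin_stub(void)
{
	assert(fpu_depth == 0);    /* sections must not nest */
	fpu_depth = 1;
	fpu_sections++;
}

static void fpu_end_stub(void)
{
	assert(fpu_depth == 1);
	fpu_depth = 0;
}

/* Placeholder for a SIMD hash transform: a plain byte sum. */
static uint32_t transform_stub(uint32_t state, const uint8_t *data, size_t len)
{
	for (size_t i = 0; i < len; i++)
		state += data[i];
	return state;
}

/*
 * Process len bytes, but never more than FPU_CHUNK_BYTES inside one
 * simulated FPU section. In the kernel, preemption would be possible
 * again in the gap between fpu_end_stub() and the next fpu_begin_stub().
 */
static uint32_t update_chunked(uint32_t state, const uint8_t *data, size_t len)
{
	while (len > 0) {
		size_t chunk = len < FPU_CHUNK_BYTES ? len : FPU_CHUNK_BYTES;

		fpu_begin_stub();
		state = transform_stub(state, data, chunk);
		fpu_end_stub();

		data += chunk;
		len -= chunk;
	}
	return state;
}
```

With these assumed values, a 10000-byte buffer is handled in three
separate sections instead of one long one; a smaller limit trades more
begin/end overhead for shorter non-preemptible stretches.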
Comments
On Tue, Nov 15, 2022 at 10:13:18PM -0600, Robert Elliott wrote:
> This series fixes the RCU stalls triggered by the x86 crypto
> modules discussed in
> https://lore.kernel.org/all/MW5PR84MB18426EBBA3303770A8BC0BDFAB759@MW5PR84MB1842.NAMPRD84.PROD.OUTLOOK.COM/
>
> Two root causes were:
> - too much data processed between kernel_fpu_begin and
>   kernel_fpu_end calls (which are heavily used by the x86
>   optimized drivers)
> - tcrypt not calling cond_resched during speed test loops
>
> These problems have always been lurking, but improving the
> loading of the x86/sha512 module led to it happening a lot
> during boot when using SHA-512 for module signature checking.

Can we split this series up please? The fixes to the stalls should
stand separately from the changes to how modules are loaded. The
latter is more of an improvement while the former should be applied
ASAP.

Thanks,
> -----Original Message-----
> From: Herbert Xu <herbert@gondor.apana.org.au>
> Sent: Wednesday, November 16, 2022 9:59 PM
> Subject: Re: [PATCH v4 00/24] crypto: fix RCU stalls
>
> On Tue, Nov 15, 2022 at 10:13:18PM -0600, Robert Elliott wrote:
...
> > These problems have always been lurking, but improving the
> > loading of the x86/sha512 module led to it happening a lot
> > during boot when using SHA-512 for module signature checking.
>
> Can we split this series up please? The fixes to the stalls should
> stand separately from the changes to how modules are loaded. The
> latter is more of an improvement while the former should be applied
> ASAP.

Yes. With the v4 patch numbers:
[PATCH v4 01/24] crypto: tcrypt - test crc32
[PATCH v4 02/24] crypto: tcrypt - test nhpoly1305

Those ensure the changes to those hash modules are testable.

[PATCH v4 03/24] crypto: tcrypt - reschedule during cycles speed

That's only for tcrypt so not urgent for users, but pretty
simple.

[PATCH v4 04/24] crypto: x86/sha - limit FPU preemption
[PATCH v4 05/24] crypto: x86/crc - limit FPU preemption
[PATCH v4 06/24] crypto: x86/sm3 - limit FPU preemption
[PATCH v4 07/24] crypto: x86/ghash - use u8 rather than char
[PATCH v4 08/24] crypto: x86/ghash - restructure FPU context saving
[PATCH v4 09/24] crypto: x86/ghash - limit FPU preemption
[PATCH v4 10/24] crypto: x86/poly - limit FPU preemption
[PATCH v4 11/24] crypto: x86/aegis - limit FPU preemption
[PATCH v4 12/24] crypto: x86/sha - register all variations
[PATCH v4 13/24] crypto: x86/sha - minimize time in FPU context

That's the end of the fixes set.

[PATCH v4 14/24] crypto: x86/sha - load based on CPU features
[PATCH v4 15/24] crypto: x86/crc - load based on CPU features
[PATCH v4 16/24] crypto: x86/sm3 - load based on CPU features
[PATCH v4 17/24] crypto: x86/poly - load based on CPU features
[PATCH v4 18/24] crypto: x86/ghash - load based on CPU features
[PATCH v4 19/24] crypto: x86/aesni - avoid type conversions
[PATCH v4 20/24] crypto: x86/ciphers - load based on CPU features
[PATCH v4 21/24] crypto: x86 - report used CPU features via module
[PATCH v4 22/24] crypto: x86 - report missing CPU features via module
[PATCH v4 23/24] crypto: x86 - report suboptimal CPUs via module
[PATCH v4 24/24] crypto: x86 - standardize module descriptions

I'll put those in a new series.

For 6.1, I still suggest reverting aa031b8f702e ("crypto: x86/sha512 -
load based on CPU features") since that exposed the problem.

Target the fixes for 6.2 and module loading for 6.2 or 6.3.
On Thu, Nov 17, 2022 at 4:14 PM Elliott, Robert (Servers) <elliott@hpe.com> wrote:
> > -----Original Message-----
> > From: Herbert Xu <herbert@gondor.apana.org.au>
> > Sent: Wednesday, November 16, 2022 9:59 PM
> > Subject: Re: [PATCH v4 00/24] crypto: fix RCU stalls
> >
> > On Tue, Nov 15, 2022 at 10:13:18PM -0600, Robert Elliott wrote:
> ...
> > > These problems have always been lurking, but improving the
> > > loading of the x86/sha512 module led to it happening a lot
> > > during boot when using SHA-512 for module signature checking.
> >
> > Can we split this series up please? The fixes to the stalls should
> > stand separately from the changes to how modules are loaded. The
> > latter is more of an improvement while the former should be applied
> > ASAP.
>
> Yes. With the v4 patch numbers:
> [PATCH v4 01/24] crypto: tcrypt - test crc32
> [PATCH v4 02/24] crypto: tcrypt - test nhpoly1305
>
> Those ensure the changes to those hash modules are testable.
>
> [PATCH v4 03/24] crypto: tcrypt - reschedule during cycles speed
>
> That's only for tcrypt so not urgent for users, but pretty
> simple.
>
> [PATCH v4 04/24] crypto: x86/sha - limit FPU preemption
> [PATCH v4 05/24] crypto: x86/crc - limit FPU preemption
> [PATCH v4 06/24] crypto: x86/sm3 - limit FPU preemption
> [PATCH v4 07/24] crypto: x86/ghash - use u8 rather than char
> [PATCH v4 08/24] crypto: x86/ghash - restructure FPU context saving
> [PATCH v4 09/24] crypto: x86/ghash - limit FPU preemption
> [PATCH v4 10/24] crypto: x86/poly - limit FPU preemption
> [PATCH v4 11/24] crypto: x86/aegis - limit FPU preemption
> [PATCH v4 12/24] crypto: x86/sha - register all variations
> [PATCH v4 13/24] crypto: x86/sha - minimize time in FPU context
>
> That's the end of the fixes set.
>
> [PATCH v4 14/24] crypto: x86/sha - load based on CPU features
> [PATCH v4 15/24] crypto: x86/crc - load based on CPU features
> [PATCH v4 16/24] crypto: x86/sm3 - load based on CPU features
> [PATCH v4 17/24] crypto: x86/poly - load based on CPU features
> [PATCH v4 18/24] crypto: x86/ghash - load based on CPU features
> [PATCH v4 19/24] crypto: x86/aesni - avoid type conversions
> [PATCH v4 20/24] crypto: x86/ciphers - load based on CPU features
> [PATCH v4 21/24] crypto: x86 - report used CPU features via module
> [PATCH v4 22/24] crypto: x86 - report missing CPU features via module
> [PATCH v4 23/24] crypto: x86 - report suboptimal CPUs via module
> [PATCH v4 24/24] crypto: x86 - standardize module descriptions
>
> I'll put those in a new series.

Thanks. Please take into account my review feedback this time for your
next series.

Jason