From patchwork Wed Nov 16 04:13:28 2022
X-Patchwork-Submitter: "Elliott, Robert (Servers)"
X-Patchwork-Id: 20700
From: Robert Elliott
To: herbert@gondor.apana.org.au, davem@davemloft.net, tim.c.chen@linux.intel.com,
 ap420073@gmail.com, ardb@kernel.org, Jason@zx2c4.com, David.Laight@ACULAB.COM,
 ebiggers@kernel.org, linux-crypto@vger.kernel.org, linux-kernel@vger.kernel.org
Cc: Robert Elliott
Subject: [PATCH v4 10/24] crypto: x86/poly - limit FPU preemption
Date: Tue, 15 Nov 2022 22:13:28 -0600
Message-Id: <20221116041342.3841-11-elliott@hpe.com>
In-Reply-To: <20221116041342.3841-1-elliott@hpe.com>
References: <20221103042740.6556-1-elliott@hpe.com>
 <20221116041342.3841-1-elliott@hpe.com>

Use a static const unsigned int for the limit on the number of bytes
processed between kernel_fpu_begin() and kernel_fpu_end(), rather than
the SZ_4K macro (which is a signed value) or a magic value of 4096U
embedded in the C code.

Use unsigned int rather than size_t for some of the arguments so that
min() can be used without the typecasting that min_t() requires.

Signed-off-by: Robert Elliott

---
v3: use a static int rather than a macro; change to while loops rather
than do/while loops
---
 arch/x86/crypto/nhpoly1305-avx2-glue.c | 11 ++++++++---
 arch/x86/crypto/nhpoly1305-sse2-glue.c | 11 ++++++++---
 arch/x86/crypto/poly1305_glue.c        | 37 ++++++++++++++++++++++++-------------
 arch/x86/crypto/polyval-clmulni_glue.c |  8 ++++++--
 4 files changed, 46 insertions(+), 21 deletions(-)

diff --git a/arch/x86/crypto/nhpoly1305-avx2-glue.c b/arch/x86/crypto/nhpoly1305-avx2-glue.c
index 8ea5ab0f1ca7..f7dc9c563bb5 100644
--- a/arch/x86/crypto/nhpoly1305-avx2-glue.c
+++ b/arch/x86/crypto/nhpoly1305-avx2-glue.c
@@ -13,6 +13,9 @@
 #include
 #include
 
+/* avoid kernel_fpu_begin/end scheduler/rcu stalls */
+static const unsigned int bytes_per_fpu = 337 * 1024;
+
 asmlinkage void nh_avx2(const u32 *key, const u8 *message, size_t message_len,
 			u8 hash[NH_HASH_BYTES]);
 
@@ -26,18 +29,20 @@ static void _nh_avx2(const u32 *key, const u8 *message, size_t message_len,
 static int nhpoly1305_avx2_update(struct shash_desc *desc,
 				  const u8 *src, unsigned int srclen)
 {
+	BUILD_BUG_ON(bytes_per_fpu == 0);
+
 	if (srclen < 64 || !crypto_simd_usable())
 		return crypto_nhpoly1305_update(desc, src, srclen);
 
-	do {
-		unsigned int n = min_t(unsigned int, srclen, SZ_4K);
+	while (srclen) {
+		unsigned int n = min(srclen, bytes_per_fpu);
 
 		kernel_fpu_begin();
 		crypto_nhpoly1305_update_helper(desc, src, n, _nh_avx2);
 		kernel_fpu_end();
 		src += n;
 		srclen -= n;
-	} while (srclen);
+	}
 	return 0;
 }
 
diff --git a/arch/x86/crypto/nhpoly1305-sse2-glue.c b/arch/x86/crypto/nhpoly1305-sse2-glue.c
index 2b353d42ed13..daffcc7019ad 100644
--- a/arch/x86/crypto/nhpoly1305-sse2-glue.c
+++ b/arch/x86/crypto/nhpoly1305-sse2-glue.c
@@ -13,6 +13,9 @@
 #include
 #include
 
+/* avoid kernel_fpu_begin/end scheduler/rcu stalls */
+static const unsigned int bytes_per_fpu = 199 * 1024;
+
 asmlinkage void nh_sse2(const u32 *key, const u8 *message, size_t message_len,
 			u8 hash[NH_HASH_BYTES]);
 
@@ -26,18 +29,20 @@ static void _nh_sse2(const u32 *key, const u8 *message, size_t message_len,
 static int nhpoly1305_sse2_update(struct shash_desc *desc,
 				  const u8 *src, unsigned int srclen)
 {
+	BUILD_BUG_ON(bytes_per_fpu == 0);
+
 	if (srclen < 64 || !crypto_simd_usable())
 		return crypto_nhpoly1305_update(desc, src, srclen);
 
-	do {
-		unsigned int n = min_t(unsigned int, srclen, SZ_4K);
+	while (srclen) {
+		unsigned int n = min(srclen, bytes_per_fpu);
 
 		kernel_fpu_begin();
 		crypto_nhpoly1305_update_helper(desc, src, n, _nh_sse2);
 		kernel_fpu_end();
 		src += n;
 		srclen -= n;
-	} while (srclen);
+	}
 	return 0;
 }
 
diff --git a/arch/x86/crypto/poly1305_glue.c b/arch/x86/crypto/poly1305_glue.c
index 1dfb8af48a3c..16831c036d71 100644
--- a/arch/x86/crypto/poly1305_glue.c
+++ b/arch/x86/crypto/poly1305_glue.c
@@ -15,20 +15,27 @@
 #include
 #include
 
+#define POLY1305_BLOCK_SIZE_MASK (~(POLY1305_BLOCK_SIZE - 1))
+
+/* avoid kernel_fpu_begin/end scheduler/rcu stalls */
+static const unsigned int bytes_per_fpu = 217 * 1024;
+
 asmlinkage void poly1305_init_x86_64(void *ctx,
 				     const u8 key[POLY1305_BLOCK_SIZE]);
 asmlinkage void poly1305_blocks_x86_64(void *ctx, const u8 *inp,
-				       const size_t len, const u32 padbit);
+				       const unsigned int len,
+				       const u32 padbit);
 asmlinkage void poly1305_emit_x86_64(void *ctx, u8 mac[POLY1305_DIGEST_SIZE],
 				     const u32 nonce[4]);
 asmlinkage void poly1305_emit_avx(void *ctx, u8 mac[POLY1305_DIGEST_SIZE],
 				  const u32 nonce[4]);
-asmlinkage void poly1305_blocks_avx(void *ctx, const u8 *inp, const size_t len,
-				    const u32 padbit);
-asmlinkage void poly1305_blocks_avx2(void *ctx, const u8 *inp, const size_t len,
-				     const u32 padbit);
+asmlinkage void poly1305_blocks_avx(void *ctx, const u8 *inp,
+				    const unsigned int len, const u32 padbit);
+asmlinkage void poly1305_blocks_avx2(void *ctx, const u8 *inp,
+				     const unsigned int len, const u32 padbit);
 asmlinkage void poly1305_blocks_avx512(void *ctx, const u8 *inp,
-				       const size_t len, const u32 padbit);
+				       const unsigned int len,
+				       const u32 padbit);
 
 static __ro_after_init DEFINE_STATIC_KEY_FALSE(poly1305_use_avx);
 static __ro_after_init DEFINE_STATIC_KEY_FALSE(poly1305_use_avx2);
@@ -86,14 +93,12 @@ static void poly1305_simd_init(void *ctx, const u8 key[POLY1305_BLOCK_SIZE])
 	poly1305_init_x86_64(ctx, key);
 }
 
-static void poly1305_simd_blocks(void *ctx, const u8 *inp, size_t len,
+static void poly1305_simd_blocks(void *ctx, const u8 *inp, unsigned int len,
 				 const u32 padbit)
 {
 	struct poly1305_arch_internal *state = ctx;
 
-	/* SIMD disables preemption, so relax after processing each page. */
-	BUILD_BUG_ON(SZ_4K < POLY1305_BLOCK_SIZE ||
-		     SZ_4K % POLY1305_BLOCK_SIZE);
+	BUILD_BUG_ON(bytes_per_fpu < POLY1305_BLOCK_SIZE);
 
 	if (!static_branch_likely(&poly1305_use_avx) ||
 	    (len < (POLY1305_BLOCK_SIZE * 18) && !state->is_base2_26) ||
@@ -103,8 +108,14 @@ static void poly1305_simd_blocks(void *ctx, const u8 *inp, size_t len,
 		return;
 	}
 
-	do {
-		const size_t bytes = min_t(size_t, len, SZ_4K);
+	while (len) {
+		unsigned int bytes;
+
+		if (len < POLY1305_BLOCK_SIZE)
+			bytes = len;
+		else
+			bytes = min(len,
+				    bytes_per_fpu & POLY1305_BLOCK_SIZE_MASK);
 
 		kernel_fpu_begin();
 		if (IS_ENABLED(CONFIG_AS_AVX512) && static_branch_likely(&poly1305_use_avx512))
@@ -117,7 +128,7 @@ static void poly1305_simd_blocks(void *ctx, const u8 *inp, size_t len,
 
 		len -= bytes;
 		inp += bytes;
-	} while (len);
+	}
 }
 
 static void poly1305_simd_emit(void *ctx, u8 mac[POLY1305_DIGEST_SIZE],
diff --git a/arch/x86/crypto/polyval-clmulni_glue.c b/arch/x86/crypto/polyval-clmulni_glue.c
index b7664d018851..de1c908f7412 100644
--- a/arch/x86/crypto/polyval-clmulni_glue.c
+++ b/arch/x86/crypto/polyval-clmulni_glue.c
@@ -29,6 +29,9 @@
 
 #define NUM_KEY_POWERS	8
 
+/* avoid kernel_fpu_begin/end scheduler/rcu stalls */
+static const unsigned int bytes_per_fpu = 393 * 1024;
+
 struct polyval_tfm_ctx {
 	/*
 	 * These powers must be in the order h^8, ..., h^1.
@@ -107,6 +110,8 @@ static int polyval_x86_update(struct shash_desc *desc,
 	unsigned int nblocks;
 	unsigned int n;
 
+	BUILD_BUG_ON(bytes_per_fpu < POLYVAL_BLOCK_SIZE);
+
 	if (dctx->bytes) {
 		n = min(srclen, dctx->bytes);
 		pos = dctx->buffer + POLYVAL_BLOCK_SIZE - dctx->bytes;
@@ -123,8 +128,7 @@ static int polyval_x86_update(struct shash_desc *desc,
 	}
 
 	while (srclen >= POLYVAL_BLOCK_SIZE) {
-		/* Allow rescheduling every 4K bytes. */
-		nblocks = min(srclen, 4096U) / POLYVAL_BLOCK_SIZE;
+		nblocks = min(srclen, bytes_per_fpu) / POLYVAL_BLOCK_SIZE;
 		internal_polyval_update(tctx, src, nblocks, dctx->buffer);
 		srclen -= nblocks * POLYVAL_BLOCK_SIZE;
 		src += nblocks * POLYVAL_BLOCK_SIZE;