From patchwork Mon Dec 19 22:02:19 2022
X-Patchwork-Submitter: "Elliott, Robert (Servers)"
X-Patchwork-Id: 34791
From: Robert Elliott
To: herbert@gondor.apana.org.au, davem@davemloft.net, Jason@zx2c4.com,
	ardb@kernel.org, ap420073@gmail.com, David.Laight@ACULAB.COM,
	ebiggers@kernel.org, tim.c.chen@linux.intel.com, peter@n8pjl.ca,
	tglx@linutronix.de, mingo@redhat.com, bp@alien8.de,
	dave.hansen@linux.intel.com
Cc: linux-crypto@vger.kernel.org, x86@kernel.org,
	linux-kernel@vger.kernel.org, Robert Elliott
Subject: [PATCH 09/13] crypto: x86/poly - yield FPU context only when needed
Date: Mon, 19 Dec 2022 16:02:19 -0600
Message-Id: <20221219220223.3982176-10-elliott@hpe.com>
X-Mailer: git-send-email 2.38.1
In-Reply-To: <20221219220223.3982176-1-elliott@hpe.com>
References: <20221219220223.3982176-1-elliott@hpe.com>
The x86 assembly language implementations using SIMD process data
between kernel_fpu_begin() and kernel_fpu_end() calls. That disables
scheduler preemption, which prevents the CPU core from being used by
other threads.

The update() and finup() functions might be called to process large
quantities of data, which can result in RCU stalls and soft lockups.

Rather than break the processing into 4 KiB passes, each of which
unconditionally calls kernel_fpu_begin() and kernel_fpu_end(),
periodically check whether the kernel scheduler wants to run something
else on the CPU. If so, yield the kernel FPU context and let the
scheduler intervene.

Suggested-by: Herbert Xu
Signed-off-by: Robert Elliott
---
 arch/x86/crypto/nhpoly1305-avx2-glue.c | 22 +++++++-----
 arch/x86/crypto/nhpoly1305-sse2-glue.c | 22 +++++++-----
 arch/x86/crypto/poly1305_glue.c        | 47 ++++++++++++--------------
 arch/x86/crypto/polyval-clmulni_glue.c | 46 +++++++++++++++----------
 4 files changed, 79 insertions(+), 58 deletions(-)

diff --git a/arch/x86/crypto/nhpoly1305-avx2-glue.c b/arch/x86/crypto/nhpoly1305-avx2-glue.c
index 46b036204ed9..4afbfd35afda 100644
--- a/arch/x86/crypto/nhpoly1305-avx2-glue.c
+++ b/arch/x86/crypto/nhpoly1305-avx2-glue.c
@@ -22,15 +22,21 @@ static int nhpoly1305_avx2_update(struct shash_desc *desc,
 	if (srclen < 64 || !crypto_simd_usable())
 		return crypto_nhpoly1305_update(desc, src, srclen);
 
-	do {
-		unsigned int n = min_t(unsigned int, srclen, SZ_4K);
+	kernel_fpu_begin();
+	for (;;) {
+		const unsigned int chunk = min(srclen, 4096U);
+
+		crypto_nhpoly1305_update_helper(desc, src, chunk, nh_avx2);
+		srclen -= chunk;
+
+		if (!srclen)
+			break;
+
+		src += chunk;
+		kernel_fpu_yield();
+	}
+	kernel_fpu_end();
 
-		kernel_fpu_begin();
-		crypto_nhpoly1305_update_helper(desc, src, n, nh_avx2);
-		kernel_fpu_end();
-		src += n;
-		srclen -= n;
-	} while (srclen);
 	return 0;
 }
diff --git a/arch/x86/crypto/nhpoly1305-sse2-glue.c b/arch/x86/crypto/nhpoly1305-sse2-glue.c
index 4a4970d75107..f5c757f6f781 100644
--- a/arch/x86/crypto/nhpoly1305-sse2-glue.c
+++ b/arch/x86/crypto/nhpoly1305-sse2-glue.c
@@ -22,15 +22,21 @@ static int nhpoly1305_sse2_update(struct shash_desc *desc,
 	if (srclen < 64 || !crypto_simd_usable())
 		return crypto_nhpoly1305_update(desc, src, srclen);
 
-	do {
-		unsigned int n = min_t(unsigned int, srclen, SZ_4K);
+	kernel_fpu_begin();
+	for (;;) {
+		const unsigned int chunk = min(srclen, 4096U);
+
+		crypto_nhpoly1305_update_helper(desc, src, chunk, nh_sse2);
+		srclen -= chunk;
+
+		if (!srclen)
+			break;
+
+		src += chunk;
+		kernel_fpu_yield();
+	}
+	kernel_fpu_end();
 
-		kernel_fpu_begin();
-		crypto_nhpoly1305_update_helper(desc, src, n, nh_sse2);
-		kernel_fpu_end();
-		src += n;
-		srclen -= n;
-	} while (srclen);
 	return 0;
 }
diff --git a/arch/x86/crypto/poly1305_glue.c b/arch/x86/crypto/poly1305_glue.c
index 1dfb8af48a3c..13e2e134b458 100644
--- a/arch/x86/crypto/poly1305_glue.c
+++ b/arch/x86/crypto/poly1305_glue.c
@@ -15,20 +15,13 @@
 #include
 #include
 
-asmlinkage void poly1305_init_x86_64(void *ctx,
-				     const u8 key[POLY1305_BLOCK_SIZE]);
-asmlinkage void poly1305_blocks_x86_64(void *ctx, const u8 *inp,
-				       const size_t len, const u32 padbit);
-asmlinkage void poly1305_emit_x86_64(void *ctx, u8 mac[POLY1305_DIGEST_SIZE],
-				     const u32 nonce[4]);
-asmlinkage void poly1305_emit_avx(void *ctx, u8 mac[POLY1305_DIGEST_SIZE],
-				  const u32 nonce[4]);
-asmlinkage void poly1305_blocks_avx(void *ctx, const u8 *inp, const size_t len,
-				    const u32 padbit);
-asmlinkage void poly1305_blocks_avx2(void *ctx, const u8 *inp, const size_t len,
-				     const u32 padbit);
-asmlinkage void poly1305_blocks_avx512(void *ctx, const u8 *inp,
-				       const size_t len, const u32 padbit);
+asmlinkage void poly1305_init_x86_64(void *ctx, const u8 key[POLY1305_BLOCK_SIZE]);
+asmlinkage void poly1305_blocks_x86_64(void *ctx, const u8 *inp, unsigned int len, u32 padbit);
+asmlinkage void poly1305_emit_x86_64(void *ctx, u8 mac[POLY1305_DIGEST_SIZE], const u32 nonce[4]);
+asmlinkage void poly1305_emit_avx(void *ctx, u8 mac[POLY1305_DIGEST_SIZE], const u32 nonce[4]);
+asmlinkage void poly1305_blocks_avx(void *ctx, const u8 *inp, unsigned int len, const u32 padbit);
+asmlinkage void poly1305_blocks_avx2(void *ctx, const u8 *inp, unsigned int len, u32 padbit);
+asmlinkage void poly1305_blocks_avx512(void *ctx, const u8 *inp, unsigned int len, u32 padbit);
 
 static __ro_after_init DEFINE_STATIC_KEY_FALSE(poly1305_use_avx);
 static __ro_after_init DEFINE_STATIC_KEY_FALSE(poly1305_use_avx2);
@@ -86,7 +79,7 @@ static void poly1305_simd_init(void *ctx, const u8 key[POLY1305_BLOCK_SIZE])
 		poly1305_init_x86_64(ctx, key);
 }
 
-static void poly1305_simd_blocks(void *ctx, const u8 *inp, size_t len,
+static void poly1305_simd_blocks(void *ctx, const u8 *inp, unsigned int len,
 				 const u32 padbit)
 {
 	struct poly1305_arch_internal *state = ctx;
@@ -103,21 +96,25 @@ static void poly1305_simd_blocks(void *ctx, const u8 *inp, size_t len,
 		return;
 	}
 
-	do {
-		const size_t bytes = min_t(size_t, len, SZ_4K);
+	kernel_fpu_begin();
+	for (;;) {
+		const unsigned int chunk = min(len, 4096U);
 
-		kernel_fpu_begin();
 		if (IS_ENABLED(CONFIG_AS_AVX512) &&
 		    static_branch_likely(&poly1305_use_avx512))
-			poly1305_blocks_avx512(ctx, inp, bytes, padbit);
+			poly1305_blocks_avx512(ctx, inp, chunk, padbit);
 		else if (static_branch_likely(&poly1305_use_avx2))
-			poly1305_blocks_avx2(ctx, inp, bytes, padbit);
+			poly1305_blocks_avx2(ctx, inp, chunk, padbit);
 		else
-			poly1305_blocks_avx(ctx, inp, bytes, padbit);
-		kernel_fpu_end();
+			poly1305_blocks_avx(ctx, inp, chunk, padbit);
+		len -= chunk;
 
-		len -= bytes;
-		inp += bytes;
-	} while (len);
+		if (!len)
+			break;
+
+		inp += chunk;
+		kernel_fpu_yield();
+	}
+	kernel_fpu_end();
 }
 
 static void poly1305_simd_emit(void *ctx, u8 mac[POLY1305_DIGEST_SIZE],
diff --git a/arch/x86/crypto/polyval-clmulni_glue.c b/arch/x86/crypto/polyval-clmulni_glue.c
index 8fa58b0f3cb3..a3d72e87d58d 100644
--- a/arch/x86/crypto/polyval-clmulni_glue.c
+++ b/arch/x86/crypto/polyval-clmulni_glue.c
@@ -45,8 +45,8 @@ struct polyval_desc_ctx {
 	u32 bytes;
 };
 
-asmlinkage void clmul_polyval_update(const struct polyval_tfm_ctx *keys,
-	const u8 *in, size_t nblocks, u8 *accumulator);
+asmlinkage void clmul_polyval_update(const struct polyval_tfm_ctx *keys, const u8 *in,
+	unsigned int nblocks, u8 *accumulator);
 asmlinkage void clmul_polyval_mul(u8 *op1, const u8 *op2);
 
 static inline struct polyval_tfm_ctx *polyval_tfm_ctx(struct crypto_shash *tfm)
@@ -55,27 +55,40 @@ static inline struct polyval_tfm_ctx *polyval_tfm_ctx(struct crypto_shash *tfm)
 }
 
 static void internal_polyval_update(const struct polyval_tfm_ctx *keys,
-	const u8 *in, size_t nblocks, u8 *accumulator)
+	const u8 *in, unsigned int nblocks, u8 *accumulator)
 {
-	if (likely(crypto_simd_usable())) {
-		kernel_fpu_begin();
-		clmul_polyval_update(keys, in, nblocks, accumulator);
-		kernel_fpu_end();
-	} else {
+	if (!crypto_simd_usable()) {
 		polyval_update_non4k(keys->key_powers[NUM_KEY_POWERS-1],
 			in, nblocks, accumulator);
+		return;
 	}
+
+	kernel_fpu_begin();
+	for (;;) {
+		const unsigned int chunks = min(nblocks, 4096U / POLYVAL_BLOCK_SIZE);
+
+		clmul_polyval_update(keys, in, chunks, accumulator);
+		nblocks -= chunks;
+
+		if (!nblocks)
+			break;
+
+		in += chunks * POLYVAL_BLOCK_SIZE;
+		kernel_fpu_yield();
+	}
+	kernel_fpu_end();
 }
 
 static void internal_polyval_mul(u8 *op1, const u8 *op2)
 {
-	if (likely(crypto_simd_usable())) {
-		kernel_fpu_begin();
-		clmul_polyval_mul(op1, op2);
-		kernel_fpu_end();
-	} else {
+	if (!crypto_simd_usable()) {
 		polyval_mul_non4k(op1, op2);
+		return;
 	}
+
+	kernel_fpu_begin();
+	clmul_polyval_mul(op1, op2);
+	kernel_fpu_end();
 }
 
 static int polyval_x86_setkey(struct crypto_shash *tfm,
@@ -113,7 +126,6 @@ static int polyval_x86_update(struct shash_desc *desc,
 	struct polyval_desc_ctx *dctx = shash_desc_ctx(desc);
 	const struct polyval_tfm_ctx *tctx = polyval_tfm_ctx(desc->tfm);
 	u8 *pos;
-	unsigned int nblocks;
 	unsigned int n;
 
 	if (dctx->bytes) {
@@ -131,9 +143,9 @@ static int polyval_x86_update(struct shash_desc *desc,
 			tctx->key_powers[NUM_KEY_POWERS-1]);
 	}
 
-	while (srclen >= POLYVAL_BLOCK_SIZE) {
-		/* Allow rescheduling every 4K bytes. */
-		nblocks = min(srclen, 4096U) / POLYVAL_BLOCK_SIZE;
+	if (srclen >= POLYVAL_BLOCK_SIZE) {
+		const unsigned int nblocks = srclen / POLYVAL_BLOCK_SIZE;
+
 		internal_polyval_update(tctx, src, nblocks, dctx->buffer);
 		srclen -= nblocks * POLYVAL_BLOCK_SIZE;
 		src += nblocks * POLYVAL_BLOCK_SIZE;
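
Note: the converted loops above all call kernel_fpu_yield() between 4 KiB
chunks. That helper is not part of mainline at the time of this posting and
is expected to come from an earlier patch in this series. As a rough,
hypothetical sketch of the intended behavior (not the series' actual
implementation), such a helper could drop and re-acquire the kernel FPU
context only when the scheduler has pending work:

#include <linux/sched.h>	/* need_resched(), cond_resched() */
#include <asm/fpu/api.h>	/* kernel_fpu_begin(), kernel_fpu_end() */

/* Hypothetical sketch only; the real definition may differ. */
static inline void kernel_fpu_yield(void)
{
	/* Pay the FPU save/restore cost only when a reschedule is due. */
	if (need_resched()) {
		kernel_fpu_end();	/* save SIMD state, re-enable preemption */
		cond_resched();		/* let other tasks run on this CPU */
		kernel_fpu_begin();	/* reclaim the FPU for more SIMD work */
	}
}

With this shape, the common case (no pending reschedule) costs only a
need_resched() check per 4 KiB chunk, rather than a full
kernel_fpu_end()/kernel_fpu_begin() round trip per chunk as in the old code.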