From: Jakub Jelinek
To: gcc-patches@gcc.gnu.org
Cc: Richard Earnshaw, Richard Sandiford, Kyrylo Tkachov
Date: Tue, 28 Nov 2023 13:38:24 +0100
Subject: [committed] libiberty: Use x86 HW optimized sha1
X-Patchwork-Submitter: Jakub Jelinek
X-Patchwork-Id: 170744

Hi!

Nick has approved this patch (+ small ld change to use it for
--build-id=), so I'm committing it to GCC master as well.

If anyone from ARM would be willing to implement it similarly with
vsha1{cq,mq,pq,h,su0q,su1q}_u32 intrinsics, it could be a useful linker
speedup on those hosts as well.  The intent in sha1.c was that the
sha1_hw_process_bytes and sha1_hw_process_block functions would be
defined whenever
  defined (HAVE_X86_SHA1_HW_SUPPORT) || defined (HAVE_WHATEVERELSE_SHA1_HW_SUPPORT)
holds, while the bodies of sha1_hw_process_block and
sha1_choose_process_bytes would then have
  #elif defined (HAVE_WHATEVERELSE_SHA1_HW_SUPPORT)
for the other arch support, and similarly for any target attributes on
sha1_hw_process_block if needed.

Bootstrapped/regtested on x86_64-linux and i686-linux, committed to trunk
(and to binutils-gdb together with the additional ld hunk).

2023-11-28  Jakub Jelinek

include/
	* sha1.h (sha1_process_bytes_fn): New typedef.
	(sha1_choose_process_bytes): Declare.
libiberty/
	* configure.ac (HAVE_X86_SHA1_HW_SUPPORT): New check.
	* sha1.c: If HAVE_X86_SHA1_HW_SUPPORT is defined, include x86intrin.h
	and cpuid.h.
	(sha1_hw_process_bytes, sha1_hw_process_block,
	sha1_choose_process_bytes): New functions.
	* config.in: Regenerated.
	* configure: Regenerated.

	Jakub

--- include/sha1.h.jj	2023-01-16 11:52:16.315730646 +0100
+++ include/sha1.h	2023-11-25 12:22:13.191136098 +0100
@@ -108,6 +108,13 @@ extern void sha1_process_block (const vo
 extern void sha1_process_bytes (const void *buffer, size_t len,
                                 struct sha1_ctx *ctx);
 
+typedef void (*sha1_process_bytes_fn) (const void *, size_t,
+                                       struct sha1_ctx *);
+
+/* Return sha1_process_bytes or some hardware optimized version thereof
+   depending on current CPU.  */
+extern sha1_process_bytes_fn sha1_choose_process_bytes (void);
+
 /* Process the remaining bytes in the buffer and put result from CTX
    in first 20 bytes following RESBUF.  The result is always in little
    endian byte order, so that a byte-wise output yields to the wanted
--- libiberty/configure.ac.jj	2023-11-11 08:52:20.968837498 +0100
+++ libiberty/configure.ac	2023-11-25 12:51:05.540291805 +0100
@@ -742,6 +742,46 @@ case "${host}" in
 esac
 AC_SUBST(pexecute)
 
+AC_MSG_CHECKING([for SHA1 HW acceleration support])
+AC_COMPILE_IFELSE([AC_LANG_PROGRAM([[
+#include <x86intrin.h>
+#include <cpuid.h>
+
+__attribute__((__target__ ("sse4.1,sha")))
+void foo (__m128i *buf, unsigned int e, __m128i msg0, __m128i msg1)
+{
+  __m128i abcd = _mm_loadu_si128 ((const __m128i *) buf);
+  __m128i e0 = _mm_set_epi32 (e, 0, 0, 0);
+  abcd = _mm_shuffle_epi32 (abcd, 0x1b);
+  const __m128i shuf_mask = _mm_set_epi64x (0x0001020304050607ULL, 0x08090a0b0c0d0e0fULL);
+  abcd = _mm_shuffle_epi8 (abcd, shuf_mask);
+  e0 = _mm_sha1nexte_epu32 (e0, msg1);
+  abcd = _mm_sha1rnds4_epu32 (abcd, e0, 0);
+  msg0 = _mm_sha1msg1_epu32 (msg0, msg1);
+  msg0 = _mm_sha1msg2_epu32 (msg0, msg1);
+  msg0 = _mm_xor_si128 (msg0, msg1);
+  e0 = _mm_add_epi32 (e0, msg0);
+  e0 = abcd;
+  _mm_storeu_si128 (buf, abcd);
+  e = _mm_extract_epi32 (e0, 3);
+}
+
+int bar (void)
+{
+  unsigned int eax, ebx, ecx, edx;
+  if (__get_cpuid_count (7, 0, &eax, &ebx, &ecx, &edx)
+      && (ebx & bit_SHA) != 0
+      && __get_cpuid (1, &eax, &ebx, &ecx, &edx)
+      && (ecx & bit_SSE4_1) != 0)
+    return 1;
+  return 0;
+}
+]], [[bar ();]])],
+  [AC_MSG_RESULT([x86 SHA1])
+   AC_DEFINE(HAVE_X86_SHA1_HW_SUPPORT, 1,
+             [Define if you have x86 SHA1 HW acceleration support.])],
+  [AC_MSG_RESULT([no])])
+
 libiberty_AC_FUNC_STRNCMP
 
 # Install a library built with a cross compiler in $(tooldir) rather
--- libiberty/sha1.c.jj	2023-01-16 11:52:16.874722408 +0100
+++ libiberty/sha1.c	2023-11-25 12:48:36.301348519 +0100
@@ -29,6 +29,11 @@
 #include <stddef.h>
 #include <string.h>
 
+#ifdef HAVE_X86_SHA1_HW_SUPPORT
+# include <x86intrin.h>
+# include <cpuid.h>
+#endif
+
 #if USE_UNLOCKED_IO
 # include "unlocked-io.h"
 #endif
@@ -412,3 +417,303 @@ sha1_process_block (const void *buffer,
       e = ctx->E += e;
     }
 }
+
+#if defined(HAVE_X86_SHA1_HW_SUPPORT)
+/* HW specific version of sha1_process_bytes.  */
+
+static void sha1_hw_process_block (const void *, size_t, struct sha1_ctx *);
+
+static void
+sha1_hw_process_bytes (const void *buffer, size_t len, struct sha1_ctx *ctx)
+{
+  /* When we already have some bits in our internal buffer concatenate
+     both inputs first.  */
+  if (ctx->buflen != 0)
+    {
+      size_t left_over = ctx->buflen;
+      size_t add = 128 - left_over > len ? len : 128 - left_over;
+
+      memcpy (&((char *) ctx->buffer)[left_over], buffer, add);
+      ctx->buflen += add;
+
+      if (ctx->buflen > 64)
+        {
+          sha1_hw_process_block (ctx->buffer, ctx->buflen & ~63, ctx);
+
+          ctx->buflen &= 63;
+          /* The regions in the following copy operation cannot overlap.  */
+          memcpy (ctx->buffer,
+                  &((char *) ctx->buffer)[(left_over + add) & ~63],
+                  ctx->buflen);
+        }
+
+      buffer = (const char *) buffer + add;
+      len -= add;
+    }
+
+  /* Process available complete blocks.  */
+  if (len >= 64)
+    {
+#if !_STRING_ARCH_unaligned
+# define alignof(type) offsetof (struct { char c; type x; }, x)
+# define UNALIGNED_P(p) (((size_t) p) % alignof (sha1_uint32) != 0)
+      if (UNALIGNED_P (buffer))
+        while (len > 64)
+          {
+            sha1_hw_process_block (memcpy (ctx->buffer, buffer, 64), 64, ctx);
+            buffer = (const char *) buffer + 64;
+            len -= 64;
+          }
+      else
+#endif
+        {
+          sha1_hw_process_block (buffer, len & ~63, ctx);
+          buffer = (const char *) buffer + (len & ~63);
+          len &= 63;
+        }
+    }
+
+  /* Move remaining bytes in internal buffer.  */
+  if (len > 0)
+    {
+      size_t left_over = ctx->buflen;
+
+      memcpy (&((char *) ctx->buffer)[left_over], buffer, len);
+      left_over += len;
+      if (left_over >= 64)
+        {
+          sha1_hw_process_block (ctx->buffer, 64, ctx);
+          left_over -= 64;
+          memmove (ctx->buffer, &ctx->buffer[16], left_over);
+        }
+      ctx->buflen = left_over;
+    }
+}
+
+/* Process LEN bytes of BUFFER, accumulating context into CTX.
+   Using CPU specific intrinsics.  */
+
+#ifdef HAVE_X86_SHA1_HW_SUPPORT
+__attribute__((__target__ ("sse4.1,sha")))
+#endif
+static void
+sha1_hw_process_block (const void *buffer, size_t len, struct sha1_ctx *ctx)
+{
+#ifdef HAVE_X86_SHA1_HW_SUPPORT
+  /* Implemented from
+     https://www.intel.com/content/www/us/en/developer/articles/technical/intel-sha-extensions.html  */
+  const __m128i *words = (const __m128i *) buffer;
+  const __m128i *endp = (const __m128i *) ((const char *) buffer + len);
+  __m128i abcd, abcd_save, e0, e0_save, e1, msg0, msg1, msg2, msg3;
+  const __m128i shuf_mask
+    = _mm_set_epi64x (0x0001020304050607ULL, 0x08090a0b0c0d0e0fULL);
+  char check[((offsetof (struct sha1_ctx, B)
+               == offsetof (struct sha1_ctx, A) + sizeof (ctx->A))
+              && (offsetof (struct sha1_ctx, C)
+                  == offsetof (struct sha1_ctx, A) + 2 * sizeof (ctx->A))
+              && (offsetof (struct sha1_ctx, D)
+                  == offsetof (struct sha1_ctx, A) + 3 * sizeof (ctx->A)))
+             ? 1 : -1];
+
+  /* First increment the byte count.  RFC 1321 specifies the possible
+     length of the file up to 2^64 bits.  Here we only compute the
+     number of bytes.  Do a double word increment.  */
+  ctx->total[0] += len;
+  ctx->total[1] += ((len >> 31) >> 1) + (ctx->total[0] < len);
+
+  (void) &check[0];
+  abcd = _mm_loadu_si128 ((const __m128i *) &ctx->A);
+  e0 = _mm_set_epi32 (ctx->E, 0, 0, 0);
+  abcd = _mm_shuffle_epi32 (abcd, 0x1b); /* 0, 1, 2, 3 */
+
+  while (words < endp)
+    {
+      abcd_save = abcd;
+      e0_save = e0;
+
+      /* 0..3 */
+      msg0 = _mm_loadu_si128 (words);
+      msg0 = _mm_shuffle_epi8 (msg0, shuf_mask);
+      e0 = _mm_add_epi32 (e0, msg0);
+      e1 = abcd;
+      abcd = _mm_sha1rnds4_epu32 (abcd, e0, 0);
+
+      /* 4..7 */
+      msg1 = _mm_loadu_si128 (words + 1);
+      msg1 = _mm_shuffle_epi8 (msg1, shuf_mask);
+      e1 = _mm_sha1nexte_epu32 (e1, msg1);
+      e0 = abcd;
+      abcd = _mm_sha1rnds4_epu32 (abcd, e1, 0);
+      msg0 = _mm_sha1msg1_epu32 (msg0, msg1);
+
+      /* 8..11 */
+      msg2 = _mm_loadu_si128 (words + 2);
+      msg2 = _mm_shuffle_epi8 (msg2, shuf_mask);
+      e0 = _mm_sha1nexte_epu32 (e0, msg2);
+      e1 = abcd;
+      abcd = _mm_sha1rnds4_epu32 (abcd, e0, 0);
+      msg1 = _mm_sha1msg1_epu32 (msg1, msg2);
+      msg0 = _mm_xor_si128 (msg0, msg2);
+
+      /* 12..15 */
+      msg3 = _mm_loadu_si128 (words + 3);
+      msg3 = _mm_shuffle_epi8 (msg3, shuf_mask);
+      e1 = _mm_sha1nexte_epu32 (e1, msg3);
+      e0 = abcd;
+      msg0 = _mm_sha1msg2_epu32 (msg0, msg3);
+      abcd = _mm_sha1rnds4_epu32 (abcd, e1, 0);
+      msg2 = _mm_sha1msg1_epu32 (msg2, msg3);
+      msg1 = _mm_xor_si128 (msg1, msg3);
+
+      /* 16..19 */
+      e0 = _mm_sha1nexte_epu32 (e0, msg0);
+      e1 = abcd;
+      msg1 = _mm_sha1msg2_epu32 (msg1, msg0);
+      abcd = _mm_sha1rnds4_epu32 (abcd, e0, 0);
+      msg3 = _mm_sha1msg1_epu32 (msg3, msg0);
+      msg2 = _mm_xor_si128 (msg2, msg0);
+
+      /* 20..23 */
+      e1 = _mm_sha1nexte_epu32 (e1, msg1);
+      e0 = abcd;
+      msg2 = _mm_sha1msg2_epu32 (msg2, msg1);
+      abcd = _mm_sha1rnds4_epu32 (abcd, e1, 1);
+      msg0 = _mm_sha1msg1_epu32 (msg0, msg1);
+      msg3 = _mm_xor_si128 (msg3, msg1);
+
+      /* 24..27 */
+      e0 = _mm_sha1nexte_epu32 (e0, msg2);
+      e1 = abcd;
+      msg3 = _mm_sha1msg2_epu32 (msg3, msg2);
+      abcd = _mm_sha1rnds4_epu32 (abcd, e0, 1);
+      msg1 = _mm_sha1msg1_epu32 (msg1, msg2);
+      msg0 = _mm_xor_si128 (msg0, msg2);
+
+      /* 28..31 */
+      e1 = _mm_sha1nexte_epu32 (e1, msg3);
+      e0 = abcd;
+      msg0 = _mm_sha1msg2_epu32 (msg0, msg3);
+      abcd = _mm_sha1rnds4_epu32 (abcd, e1, 1);
+      msg2 = _mm_sha1msg1_epu32 (msg2, msg3);
+      msg1 = _mm_xor_si128 (msg1, msg3);
+
+      /* 32..35 */
+      e0 = _mm_sha1nexte_epu32 (e0, msg0);
+      e1 = abcd;
+      msg1 = _mm_sha1msg2_epu32 (msg1, msg0);
+      abcd = _mm_sha1rnds4_epu32 (abcd, e0, 1);
+      msg3 = _mm_sha1msg1_epu32 (msg3, msg0);
+      msg2 = _mm_xor_si128 (msg2, msg0);
+
+      /* 36..39 */
+      e1 = _mm_sha1nexte_epu32 (e1, msg1);
+      e0 = abcd;
+      msg2 = _mm_sha1msg2_epu32 (msg2, msg1);
+      abcd = _mm_sha1rnds4_epu32 (abcd, e1, 1);
+      msg0 = _mm_sha1msg1_epu32 (msg0, msg1);
+      msg3 = _mm_xor_si128 (msg3, msg1);
+
+      /* 40..43 */
+      e0 = _mm_sha1nexte_epu32 (e0, msg2);
+      e1 = abcd;
+      msg3 = _mm_sha1msg2_epu32 (msg3, msg2);
+      abcd = _mm_sha1rnds4_epu32 (abcd, e0, 2);
+      msg1 = _mm_sha1msg1_epu32 (msg1, msg2);
+      msg0 = _mm_xor_si128 (msg0, msg2);
+
+      /* 44..47 */
+      e1 = _mm_sha1nexte_epu32 (e1, msg3);
+      e0 = abcd;
+      msg0 = _mm_sha1msg2_epu32 (msg0, msg3);
+      abcd = _mm_sha1rnds4_epu32 (abcd, e1, 2);
+      msg2 = _mm_sha1msg1_epu32 (msg2, msg3);
+      msg1 = _mm_xor_si128 (msg1, msg3);
+
+      /* 48..51 */
+      e0 = _mm_sha1nexte_epu32 (e0, msg0);
+      e1 = abcd;
+      msg1 = _mm_sha1msg2_epu32 (msg1, msg0);
+      abcd = _mm_sha1rnds4_epu32 (abcd, e0, 2);
+      msg3 = _mm_sha1msg1_epu32 (msg3, msg0);
+      msg2 = _mm_xor_si128 (msg2, msg0);
+
+      /* 52..55 */
+      e1 = _mm_sha1nexte_epu32 (e1, msg1);
+      e0 = abcd;
+      msg2 = _mm_sha1msg2_epu32 (msg2, msg1);
+      abcd = _mm_sha1rnds4_epu32 (abcd, e1, 2);
+      msg0 = _mm_sha1msg1_epu32 (msg0, msg1);
+      msg3 = _mm_xor_si128 (msg3, msg1);
+
+      /* 56..59 */
+      e0 = _mm_sha1nexte_epu32 (e0, msg2);
+      e1 = abcd;
+      msg3 = _mm_sha1msg2_epu32 (msg3, msg2);
+      abcd = _mm_sha1rnds4_epu32 (abcd, e0, 2);
+      msg1 = _mm_sha1msg1_epu32 (msg1, msg2);
+      msg0 = _mm_xor_si128 (msg0, msg2);
+
+      /* 60..63 */
+      e1 = _mm_sha1nexte_epu32 (e1, msg3);
+      e0 = abcd;
+      msg0 = _mm_sha1msg2_epu32 (msg0, msg3);
+      abcd = _mm_sha1rnds4_epu32 (abcd, e1, 3);
+      msg2 = _mm_sha1msg1_epu32 (msg2, msg3);
+      msg1 = _mm_xor_si128 (msg1, msg3);
+
+      /* 64..67 */
+      e0 = _mm_sha1nexte_epu32 (e0, msg0);
+      e1 = abcd;
+      msg1 = _mm_sha1msg2_epu32 (msg1, msg0);
+      abcd = _mm_sha1rnds4_epu32 (abcd, e0, 3);
+      msg3 = _mm_sha1msg1_epu32 (msg3, msg0);
+      msg2 = _mm_xor_si128 (msg2, msg0);
+
+      /* 68..71 */
+      e1 = _mm_sha1nexte_epu32 (e1, msg1);
+      e0 = abcd;
+      msg2 = _mm_sha1msg2_epu32 (msg2, msg1);
+      abcd = _mm_sha1rnds4_epu32 (abcd, e1, 3);
+      msg3 = _mm_xor_si128 (msg3, msg1);
+
+      /* 72..75 */
+      e0 = _mm_sha1nexte_epu32 (e0, msg2);
+      e1 = abcd;
+      msg3 = _mm_sha1msg2_epu32 (msg3, msg2);
+      abcd = _mm_sha1rnds4_epu32 (abcd, e0, 3);
+
+      /* 76..79 */
+      e1 = _mm_sha1nexte_epu32 (e1, msg3);
+      e0 = abcd;
+      abcd = _mm_sha1rnds4_epu32 (abcd, e1, 3);
+
+      /* Finalize. */
+      e0 = _mm_sha1nexte_epu32 (e0, e0_save);
+      abcd = _mm_add_epi32 (abcd, abcd_save);
+
+      words = words + 4;
+    }
+
+  abcd = _mm_shuffle_epi32 (abcd, 0x1b); /* 0, 1, 2, 3 */
+  _mm_storeu_si128 ((__m128i *) &ctx->A, abcd);
+  ctx->E = _mm_extract_epi32 (e0, 3);
+#endif
+}
+#endif
+
+/* Return sha1_process_bytes or some hardware optimized version thereof
+   depending on current CPU.  */
+
+sha1_process_bytes_fn
+sha1_choose_process_bytes (void)
+{
+#ifdef HAVE_X86_SHA1_HW_SUPPORT
+  unsigned int eax, ebx, ecx, edx;
+  if (__get_cpuid_count (7, 0, &eax, &ebx, &ecx, &edx)
+      && (ebx & bit_SHA) != 0
+      && __get_cpuid (1, &eax, &ebx, &ecx, &edx)
+      && (ecx & bit_SSE4_1) != 0)
+    return sha1_hw_process_bytes;
+#endif
+  return sha1_process_bytes;
+}
--- libiberty/config.in.jj	2023-11-11 08:52:20.964837553 +0100
+++ libiberty/config.in	2023-11-25 12:49:08.231908560 +0100
@@ -441,6 +441,9 @@
 /* Define to 1 if `vfork' works. */
 #undef HAVE_WORKING_VFORK
 
+/* Define if you have x86 SHA1 HW acceleration support. */
+#undef HAVE_X86_SHA1_HW_SUPPORT
+
 /* Define to 1 if you have the `_doprnt' function. */
 #undef HAVE__DOPRNT
 
--- libiberty/configure.jj	2023-11-11 08:52:20.967837512 +0100
+++ libiberty/configure	2023-11-25 12:51:16.375142489 +0100
@@ -7546,6 +7546,64 @@ case "${host}" in
 esac
 
 
+{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for SHA1 HW acceleration support" >&5
+$as_echo_n "checking for SHA1 HW acceleration support... " >&6; }
+cat confdefs.h - <<_ACEOF >conftest.$ac_ext
+/* end confdefs.h.  */
+
+#include <x86intrin.h>
+#include <cpuid.h>
+
+__attribute__((__target__ ("sse4.1,sha")))
+void foo (__m128i *buf, unsigned int e, __m128i msg0, __m128i msg1)
+{
+  __m128i abcd = _mm_loadu_si128 ((const __m128i *) buf);
+  __m128i e0 = _mm_set_epi32 (e, 0, 0, 0);
+  abcd = _mm_shuffle_epi32 (abcd, 0x1b);
+  const __m128i shuf_mask = _mm_set_epi64x (0x0001020304050607ULL, 0x08090a0b0c0d0e0fULL);
+  abcd = _mm_shuffle_epi8 (abcd, shuf_mask);
+  e0 = _mm_sha1nexte_epu32 (e0, msg1);
+  abcd = _mm_sha1rnds4_epu32 (abcd, e0, 0);
+  msg0 = _mm_sha1msg1_epu32 (msg0, msg1);
+  msg0 = _mm_sha1msg2_epu32 (msg0, msg1);
+  msg0 = _mm_xor_si128 (msg0, msg1);
+  e0 = _mm_add_epi32 (e0, msg0);
+  e0 = abcd;
+  _mm_storeu_si128 (buf, abcd);
+  e = _mm_extract_epi32 (e0, 3);
+}
+
+int bar (void)
+{
+  unsigned int eax, ebx, ecx, edx;
+  if (__get_cpuid_count (7, 0, &eax, &ebx, &ecx, &edx)
+      && (ebx & bit_SHA) != 0
+      && __get_cpuid (1, &eax, &ebx, &ecx, &edx)
+      && (ecx & bit_SSE4_1) != 0)
+    return 1;
+  return 0;
+}
+
+int
+main ()
+{
+bar ();
+  ;
+  return 0;
+}
+_ACEOF
+if ac_fn_c_try_compile "$LINENO"; then :
+  { $as_echo "$as_me:${as_lineno-$LINENO}: result: x86 SHA1" >&5
+$as_echo "x86 SHA1" >&6; }
+
+$as_echo "#define HAVE_X86_SHA1_HW_SUPPORT 1" >>confdefs.h
+
+else
+  { $as_echo "$as_me:${as_lineno-$LINENO}: result: no" >&5
+$as_echo "no" >&6; }
+fi
+rm -f core conftest.err conftest.$ac_objext conftest.$ac_ext
+
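
For anyone who wants to look at the dispatch pattern in isolation, below is
a minimal standalone sketch of it.  It is not part of the patch.  All demo_*
and DEMO_* names are hypothetical stand-ins, the two "process" functions only
count bytes rather than hashing anything, and the compiler test that turns on
the x86 path is my assumption in place of the configure check.  Only the
cpuid.h calls and the bit_SHA / bit_SSE4_1 tests mirror what
sha1_choose_process_bytes does above.

```c
#include <stddef.h>

/* Assumption: stand-in for the configure-time HAVE_X86_SHA1_HW_SUPPORT
   check; here we simply test for a GNU-compatible compiler on x86.  */
#if defined (__GNUC__) && (defined (__x86_64__) || defined (__i386__))
# include <cpuid.h>
# define DEMO_HAVE_X86_SHA1_HW_SUPPORT 1
#endif

/* Hypothetical stand-in for struct sha1_ctx; it only tracks a byte count
   and which variant ran last.  */
struct demo_ctx
{
  size_t total;
  int used_hw;
};

/* Same shape as the sha1_process_bytes_fn typedef in the patch.  */
typedef void (*demo_process_bytes_fn) (const void *, size_t,
                                       struct demo_ctx *);

/* Portable fallback, playing the role of sha1_process_bytes.  */
static void
demo_process_bytes (const void *buffer, size_t len, struct demo_ctx *ctx)
{
  (void) buffer;
  ctx->total += len;
  ctx->used_hw = 0;
}

#if defined (DEMO_HAVE_X86_SHA1_HW_SUPPORT)
/* Placeholder for the accelerated variant, playing the role of
   sha1_hw_process_bytes.  It uses no SHA instructions itself.  */
static void
demo_hw_process_bytes (const void *buffer, size_t len, struct demo_ctx *ctx)
{
  (void) buffer;
  ctx->total += len;
  ctx->used_hw = 1;
}
#endif

/* Pick an implementation once, based on what the running CPU reports.  */
demo_process_bytes_fn
demo_choose_process_bytes (void)
{
#if defined (DEMO_HAVE_X86_SHA1_HW_SUPPORT)
  unsigned int eax, ebx, ecx, edx;
  if (__get_cpuid_count (7, 0, &eax, &ebx, &ecx, &edx)
      && (ebx & bit_SHA) != 0
      && __get_cpuid (1, &eax, &ebx, &ecx, &edx)
      && (ecx & bit_SSE4_1) != 0)
    return demo_hw_process_bytes;
  /* A port to another architecture would add an #elif branch with its
     own feature test before the #endif below.  */
#endif
  return demo_process_bytes;
}
```

A caller would fetch the pointer once, e.g.
demo_process_bytes_fn fn = demo_choose_process_bytes ();, and then call
fn (buf, len, &ctx) for every chunk, so the CPUID query is not repeated per
call.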