From patchwork Sat Nov 25 17:33:52 2023
X-Patchwork-Submitter: Jakub Jelinek
X-Patchwork-Id: 169759
Date: Sat, 25 Nov 2023 18:33:52 +0100
From: Jakub Jelinek
To: binutils@sourceware.org
Subject: [PATCH] libiberty, ld: Use x86 HW optimized sha1

Hi!

The following patch attempts to use the x86 SHA ISA when available to
speed up sha1 build-id processing; in my testing it is about 2.5x faster
(on an AMD Ryzen 5 3600) while producing the same result.  I believe
AArch64 has similar HW acceleration for SHA1; perhaps it could be added
similarly.

Note, lld seems to use BLAKE3 rather than md5/sha1.  I think it would be
a bad idea to lie to users: if they choose --build-id=sha1, we should be
using SHA1, not some other checksum.  But perhaps we could add some other
--build-id= styles and make one of the new ones the default.
2023-11-25  Jakub Jelinek

include/
	* sha1.h (sha1_process_bytes_fn): New typedef.
	(sha1_choose_process_bytes): Declare.
libiberty/
	* configure.ac (HAVE_X86_SHA1_HW_SUPPORT): New check.
	* sha1.c: If HAVE_X86_SHA1_HW_SUPPORT is defined, include
	x86intrin.h and cpuid.h.
	(sha1_hw_process_bytes, sha1_hw_process_block,
	sha1_choose_process_bytes): New functions.
	* config.in: Regenerated.
	* configure: Regenerated.
ld/
	* ldbuildid.c (generate_build_id): Use sha1_choose_process_bytes ()
	instead of &sha1_process_bytes.

	Jakub

--- include/sha1.h.jj	2023-01-16 11:52:16.315730646 +0100
+++ include/sha1.h	2023-11-25 12:22:13.191136098 +0100
@@ -108,6 +108,13 @@ extern void sha1_process_block (const vo
 extern void sha1_process_bytes (const void *buffer, size_t len,
 				struct sha1_ctx *ctx);
 
+typedef void (*sha1_process_bytes_fn) (const void *, size_t,
+				       struct sha1_ctx *);
+
+/* Return sha1_process_bytes or some hardware optimized version thereof
+   depending on current CPU.  */
+extern sha1_process_bytes_fn sha1_choose_process_bytes (void);
+
 /* Process the remaining bytes in the buffer and put result from CTX
    in first 20 bytes following RESBUF.  The result is always in little
    endian byte order, so that a byte-wise output yields to the wanted
--- libiberty/configure.ac.jj	2023-11-11 08:52:20.968837498 +0100
+++ libiberty/configure.ac	2023-11-25 12:51:05.540291805 +0100
@@ -742,6 +742,46 @@ case "${host}" in
 esac
 AC_SUBST(pexecute)
 
+AC_MSG_CHECKING([for SHA1 HW acceleration support])
+AC_COMPILE_IFELSE([AC_LANG_PROGRAM([[
+#include <x86intrin.h>
+#include <cpuid.h>
+
+__attribute__((__target__ ("sse4.1,sha")))
+void foo (__m128i *buf, unsigned int e, __m128i msg0, __m128i msg1)
+{
+  __m128i abcd = _mm_loadu_si128 ((const __m128i *) buf);
+  __m128i e0 = _mm_set_epi32 (e, 0, 0, 0);
+  abcd = _mm_shuffle_epi32 (abcd, 0x1b);
+  const __m128i shuf_mask = _mm_set_epi64x (0x0001020304050607ULL, 0x08090a0b0c0d0e0fULL);
+  abcd = _mm_shuffle_epi8 (abcd, shuf_mask);
+  e0 = _mm_sha1nexte_epu32 (e0, msg1);
+  abcd = _mm_sha1rnds4_epu32 (abcd, e0, 0);
+  msg0 = _mm_sha1msg1_epu32 (msg0, msg1);
+  msg0 = _mm_sha1msg2_epu32 (msg0, msg1);
+  msg0 = _mm_xor_si128 (msg0, msg1);
+  e0 = _mm_add_epi32 (e0, msg0);
+  e0 = abcd;
+  _mm_storeu_si128 (buf, abcd);
+  e = _mm_extract_epi32 (e0, 3);
+}
+
+int bar (void)
+{
+  unsigned int eax, ebx, ecx, edx;
+  if (__get_cpuid_count (7, 0, &eax, &ebx, &ecx, &edx)
+      && (ebx & bit_SHA) != 0
+      && __get_cpuid (1, &eax, &ebx, &ecx, &edx)
+      && (ecx & bit_SSE4_1) != 0)
+    return 1;
+  return 0;
+}
+]], [[bar ();]])],
+  [AC_MSG_RESULT([x86 SHA1])
+   AC_DEFINE(HAVE_X86_SHA1_HW_SUPPORT, 1,
+	     [Define if you have x86 SHA1 HW acceleration support.])],
+  [AC_MSG_RESULT([no])])
+
 libiberty_AC_FUNC_STRNCMP
 
 # Install a library built with a cross compiler in $(tooldir) rather
--- libiberty/sha1.c.jj	2023-01-16 11:52:16.874722408 +0100
+++ libiberty/sha1.c	2023-11-25 12:48:36.301348519 +0100
@@ -29,6 +29,11 @@
 #include <stddef.h>
 #include <string.h>
 
+#ifdef HAVE_X86_SHA1_HW_SUPPORT
+# include <x86intrin.h>
+# include <cpuid.h>
+#endif
+
 #if USE_UNLOCKED_IO
 # include "unlocked-io.h"
 #endif
@@ -412,3 +417,303 @@ sha1_process_block (const void *buffer,
       e = ctx->E += e;
     }
 }
+
+#if defined(HAVE_X86_SHA1_HW_SUPPORT)
+/* HW specific version of sha1_process_bytes.  */
+
+static void sha1_hw_process_block (const void *, size_t, struct sha1_ctx *);
+
+static void
+sha1_hw_process_bytes (const void *buffer, size_t len, struct sha1_ctx *ctx)
+{
+  /* When we already have some bits in our internal buffer concatenate
+     both inputs first.  */
+  if (ctx->buflen != 0)
+    {
+      size_t left_over = ctx->buflen;
+      size_t add = 128 - left_over > len ? len : 128 - left_over;
+
+      memcpy (&((char *) ctx->buffer)[left_over], buffer, add);
+      ctx->buflen += add;
+
+      if (ctx->buflen > 64)
+	{
+	  sha1_hw_process_block (ctx->buffer, ctx->buflen & ~63, ctx);
+
+	  ctx->buflen &= 63;
+	  /* The regions in the following copy operation cannot overlap.  */
+	  memcpy (ctx->buffer,
+		  &((char *) ctx->buffer)[(left_over + add) & ~63],
+		  ctx->buflen);
+	}
+
+      buffer = (const char *) buffer + add;
+      len -= add;
+    }
+
+  /* Process available complete blocks.  */
+  if (len >= 64)
+    {
+#if !_STRING_ARCH_unaligned
+# define alignof(type) offsetof (struct { char c; type x; }, x)
+# define UNALIGNED_P(p) (((size_t) p) % alignof (sha1_uint32) != 0)
+      if (UNALIGNED_P (buffer))
+	while (len > 64)
+	  {
+	    sha1_hw_process_block (memcpy (ctx->buffer, buffer, 64), 64, ctx);
+	    buffer = (const char *) buffer + 64;
+	    len -= 64;
+	  }
+      else
+#endif
+	{
+	  sha1_hw_process_block (buffer, len & ~63, ctx);
+	  buffer = (const char *) buffer + (len & ~63);
+	  len &= 63;
+	}
+    }
+
+  /* Move remaining bytes in internal buffer.  */
+  if (len > 0)
+    {
+      size_t left_over = ctx->buflen;
+
+      memcpy (&((char *) ctx->buffer)[left_over], buffer, len);
+      left_over += len;
+      if (left_over >= 64)
+	{
+	  sha1_hw_process_block (ctx->buffer, 64, ctx);
+	  left_over -= 64;
+	  memmove (ctx->buffer, &ctx->buffer[16], left_over);
+	}
+      ctx->buflen = left_over;
+    }
+}
+
+/* Process LEN bytes of BUFFER, accumulating context into CTX.
+   Using CPU specific intrinsics.
+   */
+
+#ifdef HAVE_X86_SHA1_HW_SUPPORT
+__attribute__((__target__ ("sse4.1,sha")))
+#endif
+static void
+sha1_hw_process_block (const void *buffer, size_t len, struct sha1_ctx *ctx)
+{
+#ifdef HAVE_X86_SHA1_HW_SUPPORT
+  /* Implemented from
+     https://www.intel.com/content/www/us/en/developer/articles/technical/intel-sha-extensions.html  */
+  const __m128i *words = (const __m128i *) buffer;
+  const __m128i *endp = (const __m128i *) ((const char *) buffer + len);
+  __m128i abcd, abcd_save, e0, e0_save, e1, msg0, msg1, msg2, msg3;
+  const __m128i shuf_mask
+    = _mm_set_epi64x (0x0001020304050607ULL, 0x08090a0b0c0d0e0fULL);
+  char check[((offsetof (struct sha1_ctx, B)
+	       == offsetof (struct sha1_ctx, A) + sizeof (ctx->A))
+	      && (offsetof (struct sha1_ctx, C)
+		  == offsetof (struct sha1_ctx, A) + 2 * sizeof (ctx->A))
+	      && (offsetof (struct sha1_ctx, D)
+		  == offsetof (struct sha1_ctx, A) + 3 * sizeof (ctx->A)))
+	     ? 1 : -1];
+
+  /* First increment the byte count.  RFC 1321 specifies the possible
+     length of the file up to 2^64 bits.  Here we only compute the
+     number of bytes.  Do a double word increment.
+     */
+  ctx->total[0] += len;
+  ctx->total[1] += ((len >> 31) >> 1) + (ctx->total[0] < len);
+
+  (void) &check[0];
+  abcd = _mm_loadu_si128 ((const __m128i *) &ctx->A);
+  e0 = _mm_set_epi32 (ctx->E, 0, 0, 0);
+  abcd = _mm_shuffle_epi32 (abcd, 0x1b); /* 0, 1, 2, 3 */
+
+  while (words < endp)
+    {
+      abcd_save = abcd;
+      e0_save = e0;
+
+      /* 0..3 */
+      msg0 = _mm_loadu_si128 (words);
+      msg0 = _mm_shuffle_epi8 (msg0, shuf_mask);
+      e0 = _mm_add_epi32 (e0, msg0);
+      e1 = abcd;
+      abcd = _mm_sha1rnds4_epu32 (abcd, e0, 0);
+
+      /* 4..7 */
+      msg1 = _mm_loadu_si128 (words + 1);
+      msg1 = _mm_shuffle_epi8 (msg1, shuf_mask);
+      e1 = _mm_sha1nexte_epu32 (e1, msg1);
+      e0 = abcd;
+      abcd = _mm_sha1rnds4_epu32 (abcd, e1, 0);
+      msg0 = _mm_sha1msg1_epu32 (msg0, msg1);
+
+      /* 8..11 */
+      msg2 = _mm_loadu_si128 (words + 2);
+      msg2 = _mm_shuffle_epi8 (msg2, shuf_mask);
+      e0 = _mm_sha1nexte_epu32 (e0, msg2);
+      e1 = abcd;
+      abcd = _mm_sha1rnds4_epu32 (abcd, e0, 0);
+      msg1 = _mm_sha1msg1_epu32 (msg1, msg2);
+      msg0 = _mm_xor_si128 (msg0, msg2);
+
+      /* 12..15 */
+      msg3 = _mm_loadu_si128 (words + 3);
+      msg3 = _mm_shuffle_epi8 (msg3, shuf_mask);
+      e1 = _mm_sha1nexte_epu32 (e1, msg3);
+      e0 = abcd;
+      msg0 = _mm_sha1msg2_epu32 (msg0, msg3);
+      abcd = _mm_sha1rnds4_epu32 (abcd, e1, 0);
+      msg2 = _mm_sha1msg1_epu32 (msg2, msg3);
+      msg1 = _mm_xor_si128 (msg1, msg3);
+
+      /* 16..19 */
+      e0 = _mm_sha1nexte_epu32 (e0, msg0);
+      e1 = abcd;
+      msg1 = _mm_sha1msg2_epu32 (msg1, msg0);
+      abcd = _mm_sha1rnds4_epu32 (abcd, e0, 0);
+      msg3 = _mm_sha1msg1_epu32 (msg3, msg0);
+      msg2 = _mm_xor_si128 (msg2, msg0);
+
+      /* 20..23 */
+      e1 = _mm_sha1nexte_epu32 (e1, msg1);
+      e0 = abcd;
+      msg2 = _mm_sha1msg2_epu32 (msg2, msg1);
+      abcd = _mm_sha1rnds4_epu32 (abcd, e1, 1);
+      msg0 = _mm_sha1msg1_epu32 (msg0, msg1);
+      msg3 = _mm_xor_si128 (msg3, msg1);
+
+      /* 24..27 */
+      e0 = _mm_sha1nexte_epu32 (e0, msg2);
+      e1 = abcd;
+      msg3 = _mm_sha1msg2_epu32 (msg3, msg2);
+      abcd = _mm_sha1rnds4_epu32 (abcd, e0, 1);
+      msg1 = _mm_sha1msg1_epu32 (msg1, msg2);
+      msg0 = _mm_xor_si128 (msg0, msg2);
+
+      /* 28..31 */
+      e1 = _mm_sha1nexte_epu32 (e1, msg3);
+      e0 = abcd;
+      msg0 = _mm_sha1msg2_epu32 (msg0, msg3);
+      abcd = _mm_sha1rnds4_epu32 (abcd, e1, 1);
+      msg2 = _mm_sha1msg1_epu32 (msg2, msg3);
+      msg1 = _mm_xor_si128 (msg1, msg3);
+
+      /* 32..35 */
+      e0 = _mm_sha1nexte_epu32 (e0, msg0);
+      e1 = abcd;
+      msg1 = _mm_sha1msg2_epu32 (msg1, msg0);
+      abcd = _mm_sha1rnds4_epu32 (abcd, e0, 1);
+      msg3 = _mm_sha1msg1_epu32 (msg3, msg0);
+      msg2 = _mm_xor_si128 (msg2, msg0);
+
+      /* 36..39 */
+      e1 = _mm_sha1nexte_epu32 (e1, msg1);
+      e0 = abcd;
+      msg2 = _mm_sha1msg2_epu32 (msg2, msg1);
+      abcd = _mm_sha1rnds4_epu32 (abcd, e1, 1);
+      msg0 = _mm_sha1msg1_epu32 (msg0, msg1);
+      msg3 = _mm_xor_si128 (msg3, msg1);
+
+      /* 40..43 */
+      e0 = _mm_sha1nexte_epu32 (e0, msg2);
+      e1 = abcd;
+      msg3 = _mm_sha1msg2_epu32 (msg3, msg2);
+      abcd = _mm_sha1rnds4_epu32 (abcd, e0, 2);
+      msg1 = _mm_sha1msg1_epu32 (msg1, msg2);
+      msg0 = _mm_xor_si128 (msg0, msg2);
+
+      /* 44..47 */
+      e1 = _mm_sha1nexte_epu32 (e1, msg3);
+      e0 = abcd;
+      msg0 = _mm_sha1msg2_epu32 (msg0, msg3);
+      abcd = _mm_sha1rnds4_epu32 (abcd, e1, 2);
+      msg2 = _mm_sha1msg1_epu32 (msg2, msg3);
+      msg1 = _mm_xor_si128 (msg1, msg3);
+
+      /* 48..51 */
+      e0 = _mm_sha1nexte_epu32 (e0, msg0);
+      e1 = abcd;
+      msg1 = _mm_sha1msg2_epu32 (msg1, msg0);
+      abcd = _mm_sha1rnds4_epu32 (abcd, e0, 2);
+      msg3 = _mm_sha1msg1_epu32 (msg3, msg0);
+      msg2 = _mm_xor_si128 (msg2, msg0);
+
+      /* 52..55 */
+      e1 = _mm_sha1nexte_epu32 (e1, msg1);
+      e0 = abcd;
+      msg2 = _mm_sha1msg2_epu32 (msg2, msg1);
+      abcd = _mm_sha1rnds4_epu32 (abcd, e1, 2);
+      msg0 = _mm_sha1msg1_epu32 (msg0, msg1);
+      msg3 = _mm_xor_si128 (msg3, msg1);
+
+      /* 56..59 */
+      e0 = _mm_sha1nexte_epu32 (e0, msg2);
+      e1 = abcd;
+      msg3 = _mm_sha1msg2_epu32 (msg3, msg2);
+      abcd = _mm_sha1rnds4_epu32 (abcd, e0, 2);
+      msg1 = _mm_sha1msg1_epu32 (msg1, msg2);
+      msg0 = _mm_xor_si128 (msg0, msg2);
+
+      /* 60..63 */
+      e1 = _mm_sha1nexte_epu32 (e1, msg3);
+      e0 = abcd;
+      msg0 = _mm_sha1msg2_epu32 (msg0, msg3);
+      abcd = _mm_sha1rnds4_epu32 (abcd, e1, 3);
+      msg2 = _mm_sha1msg1_epu32 (msg2, msg3);
+      msg1 = _mm_xor_si128 (msg1, msg3);
+
+      /* 64..67 */
+      e0 = _mm_sha1nexte_epu32 (e0, msg0);
+      e1 = abcd;
+      msg1 = _mm_sha1msg2_epu32 (msg1, msg0);
+      abcd = _mm_sha1rnds4_epu32 (abcd, e0, 3);
+      msg3 = _mm_sha1msg1_epu32 (msg3, msg0);
+      msg2 = _mm_xor_si128 (msg2, msg0);
+
+      /* 68..71 */
+      e1 = _mm_sha1nexte_epu32 (e1, msg1);
+      e0 = abcd;
+      msg2 = _mm_sha1msg2_epu32 (msg2, msg1);
+      abcd = _mm_sha1rnds4_epu32 (abcd, e1, 3);
+      msg3 = _mm_xor_si128 (msg3, msg1);
+
+      /* 72..75 */
+      e0 = _mm_sha1nexte_epu32 (e0, msg2);
+      e1 = abcd;
+      msg3 = _mm_sha1msg2_epu32 (msg3, msg2);
+      abcd = _mm_sha1rnds4_epu32 (abcd, e0, 3);
+
+      /* 76..79 */
+      e1 = _mm_sha1nexte_epu32 (e1, msg3);
+      e0 = abcd;
+      abcd = _mm_sha1rnds4_epu32 (abcd, e1, 3);
+
+      /* Finalize.  */
+      e0 = _mm_sha1nexte_epu32 (e0, e0_save);
+      abcd = _mm_add_epi32 (abcd, abcd_save);
+
+      words = words + 4;
+    }
+
+  abcd = _mm_shuffle_epi32 (abcd, 0x1b); /* 0, 1, 2, 3 */
+  _mm_storeu_si128 ((__m128i *) &ctx->A, abcd);
+  ctx->E = _mm_extract_epi32 (e0, 3);
+#endif
+}
+#endif
+
+/* Return sha1_process_bytes or some hardware optimized version thereof
+   depending on current CPU.  */
+
+sha1_process_bytes_fn
+sha1_choose_process_bytes (void)
+{
+#ifdef HAVE_X86_SHA1_HW_SUPPORT
+  unsigned int eax, ebx, ecx, edx;
+  if (__get_cpuid_count (7, 0, &eax, &ebx, &ecx, &edx)
+      && (ebx & bit_SHA) != 0
+      && __get_cpuid (1, &eax, &ebx, &ecx, &edx)
+      && (ecx & bit_SSE4_1) != 0)
+    return sha1_hw_process_bytes;
+#endif
+  return sha1_process_bytes;
+}
--- libiberty/config.in.jj	2023-11-11 08:52:20.964837553 +0100
+++ libiberty/config.in	2023-11-25 12:49:08.231908560 +0100
@@ -441,6 +441,9 @@
 /* Define to 1 if `vfork' works. */
 #undef HAVE_WORKING_VFORK
 
+/* Define if you have x86 SHA1 HW acceleration support. */
+#undef HAVE_X86_SHA1_HW_SUPPORT
+
 /* Define to 1 if you have the `_doprnt' function.
    */
 #undef HAVE__DOPRNT
--- libiberty/configure.jj	2023-11-11 08:52:20.967837512 +0100
+++ libiberty/configure	2023-11-25 12:51:16.375142489 +0100
@@ -7546,6 +7546,64 @@ case "${host}" in
 esac
 
 
+{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for SHA1 HW acceleration support" >&5
+$as_echo_n "checking for SHA1 HW acceleration support... " >&6; }
+cat confdefs.h - <<_ACEOF >conftest.$ac_ext
+/* end confdefs.h.  */
+
+#include <x86intrin.h>
+#include <cpuid.h>
+
+__attribute__((__target__ ("sse4.1,sha")))
+void foo (__m128i *buf, unsigned int e, __m128i msg0, __m128i msg1)
+{
+  __m128i abcd = _mm_loadu_si128 ((const __m128i *) buf);
+  __m128i e0 = _mm_set_epi32 (e, 0, 0, 0);
+  abcd = _mm_shuffle_epi32 (abcd, 0x1b);
+  const __m128i shuf_mask = _mm_set_epi64x (0x0001020304050607ULL, 0x08090a0b0c0d0e0fULL);
+  abcd = _mm_shuffle_epi8 (abcd, shuf_mask);
+  e0 = _mm_sha1nexte_epu32 (e0, msg1);
+  abcd = _mm_sha1rnds4_epu32 (abcd, e0, 0);
+  msg0 = _mm_sha1msg1_epu32 (msg0, msg1);
+  msg0 = _mm_sha1msg2_epu32 (msg0, msg1);
+  msg0 = _mm_xor_si128 (msg0, msg1);
+  e0 = _mm_add_epi32 (e0, msg0);
+  e0 = abcd;
+  _mm_storeu_si128 (buf, abcd);
+  e = _mm_extract_epi32 (e0, 3);
+}
+
+int bar (void)
+{
+  unsigned int eax, ebx, ecx, edx;
+  if (__get_cpuid_count (7, 0, &eax, &ebx, &ecx, &edx)
+      && (ebx & bit_SHA) != 0
+      && __get_cpuid (1, &eax, &ebx, &ecx, &edx)
+      && (ecx & bit_SSE4_1) != 0)
+    return 1;
+  return 0;
+}
+
+int
+main ()
+{
+bar ();
+  ;
+  return 0;
+}
+_ACEOF
+if ac_fn_c_try_compile "$LINENO"; then :
+  { $as_echo "$as_me:${as_lineno-$LINENO}: result: x86 SHA1" >&5
+$as_echo "x86 SHA1" >&6; }
+
+$as_echo "#define HAVE_X86_SHA1_HW_SUPPORT 1" >>confdefs.h
+
+else
+  { $as_echo "$as_me:${as_lineno-$LINENO}: result: no" >&5
+$as_echo "no" >&6; }
+fi
+rm -f core conftest.err conftest.$ac_objext conftest.$ac_ext
+
--- ld/ldbuildid.c.jj	2023-11-10 16:47:32.608718016 +0100
+++ ld/ldbuildid.c	2023-11-25 18:20:06.734134874 +0100
@@ -114,7 +114,8 @@ generate_build_id (bfd *abfd,
       struct sha1_ctx ctx;
 
       sha1_init_ctx (&ctx);
-      if (!(*checksum_contents) (abfd, (sum_fn) &sha1_process_bytes, &ctx))
+      if (!(*checksum_contents) (abfd, (sum_fn) sha1_choose_process_bytes (),
+				 &ctx))
 	return false;
       sha1_finish_ctx (&ctx, id_bits);
     }