From patchwork Sat Jan 27 07:53:42 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Jakub Jelinek <jakub@redhat.com>
X-Patchwork-Id: 192985
Return-Path: <gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org>
Delivered-To: ouuuleilei@gmail.com
Received: by 2002:a05:7301:2395:b0:106:343:edcb with SMTP id gw21csp373549dyb;
        Sat, 27 Jan 2024 00:15:11 -0800 (PST)
X-Google-Smtp-Source: 
 AGHT+IFQZnnWnn3KCy4cfI0dvHEuMC2L5OVI10cEbD0gXJEsjSPkiQNX219Cxu+vD9BG2VCP9vU0
X-Received: by 2002:ac8:5f8f:0:b0:42a:41fc:39a3 with SMTP id
 j15-20020ac85f8f000000b0042a41fc39a3mr1039021qta.24.1706343311209;
        Sat, 27 Jan 2024 00:15:11 -0800 (PST)
ARC-Seal: i=2; a=rsa-sha256; t=1706343311; cv=pass;
        d=google.com; s=arc-20160816;
        b=0b20dgpfdyjDQ39murBP8bLpIZlsjUkFrozfNCxluWWwQFzP+53nRunsq3Iz46Y+Z9
         TlRMCH43i23dbFTX3stgcydpVdhTRi94BTHyAejpzNPbhRBEYIvO0/oHidUHdFxE76ma
         7mdz6QYqDnx2G35T9Zdba9Iv+IpiqP4Y8Ba/gUIfPd3fq9qzhw7dEhN5mGy/XLn+A3js
         PA12mp/hjDshK7GjDr+jczmWe843J8/ASZEK8acFEUQ2THk1pfuhi8wEX0HheZJg6GTO
         9WMhYJioRb+EKB/3jJNuc4F01LuhKryBJboiSl74RuXLLwrvBPZmwsW8bC7AtX1TBJby
         Q/wQ==
ARC-Message-Signature: i=2; a=rsa-sha256; c=relaxed/relaxed; d=google.com;
 s=arc-20160816;
        h=errors-to:reply-to:list-subscribe:list-help:list-post:list-archive
         :list-unsubscribe:list-id:precedence:content-disposition
         :mime-version:message-id:subject:cc:to:from:date:resent-to
         :resent-message-id:resent-date:resent-from:dkim-signature:arc-filter
         :dmarc-filter:delivered-to;
        bh=7fieHIJlL7fotuHy4zQspIlM97UNRhw7MBW2vOr1kMQ=;
        fh=ihL1ztYjrjXoo7NHmqkZNAmZ6R6lAJfRTG/XGmQ67yo=;
        b=xGmtrqd3yid9lKzVA5kF4a8gcdeggDLfJ7oGPcFquVWxQ4XM1MRnYWG5msH0Hh3Uvn
         9pmqJK+W+6+9HuNFU/Nh3pjuJ07a33RxZG84FGlrBF5qJkYGPlpBOwkir03fL0ItcXj+
         YqAs5P990L9iBv0+tjcHFJeUY0mNtpmpgZUus12loXP4LEa0TonV4Pi4r6VJTRvw8xk2
         A2NV47QUwZ5+JBxJi3gEWZWHuzhkOWoYvYCVGEOprgE9pyj9LdA2WEBZFKyLNuybJcyf
         tfTl8DYbyAk9VKgbc+u8kjH5Lgbl0EsWGuBG+3GOyrL0+rcMs06I0Zpk/JWsWNJtrN25
         hdSA==
ARC-Authentication-Results: i=2; mx.google.com;
       dkim=pass header.i=@redhat.com header.s=mimecast20190719
 header.b=d14wUGgx;
       arc=pass (i=1);
       spf=pass (google.com: domain of
 gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org designates 8.43.85.97 as
 permitted sender)
 smtp.mailfrom="gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org";
       dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com
Received: from server2.sourceware.org (server2.sourceware.org. [8.43.85.97])
        by mx.google.com with ESMTPS id
 e2-20020ac85982000000b0042a93934ddfsi258495qte.545.2024.01.27.00.15.11
        for <ouuuleilei@gmail.com>
        (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256);
        Sat, 27 Jan 2024 00:15:11 -0800 (PST)
Received-SPF: pass (google.com: domain of
 gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org designates 8.43.85.97 as
 permitted sender) client-ip=8.43.85.97;
Authentication-Results: mx.google.com;
       dkim=pass header.i=@redhat.com header.s=mimecast20190719
 header.b=d14wUGgx;
       arc=pass (i=1);
       spf=pass (google.com: domain of
 gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org designates 8.43.85.97 as
 permitted sender)
 smtp.mailfrom="gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org";
       dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com
Received: from server2.sourceware.org (localhost [IPv6:::1])
	by sourceware.org (Postfix) with ESMTP id D68C33858289
	for <ouuuleilei@gmail.com>; Sat, 27 Jan 2024 08:15:10 +0000 (GMT)
X-Original-To: gcc-patches@gcc.gnu.org
Delivered-To: gcc-patches@gcc.gnu.org
Received: from us-smtp-delivery-124.mimecast.com
 (us-smtp-delivery-124.mimecast.com [170.10.129.124])
 by sourceware.org (Postfix) with ESMTPS id 83F073858D28
 for <gcc-patches@gcc.gnu.org>; Sat, 27 Jan 2024 08:13:43 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 83F073858D28
Authentication-Results: sourceware.org;
 dmarc=pass (p=none dis=none) header.from=redhat.com
Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=redhat.com
ARC-Filter: OpenARC Filter v1.0.0 sourceware.org 83F073858D28
Authentication-Results: server2.sourceware.org;
 arc=none smtp.remote-ip=170.10.129.124
ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1706343227; cv=none;
 b=Rrvk6KRIJzoLA/x8dVtUTd1HX+mXkY5DiBugDDsYIk0M7yTHNLCZoTrB8IA8xv00gPIvzf0/iDhwcGcGifQZv4VPUCRpmtxdNGgVYBC21QCiQul1w2ghD94XkIo+wO3VEbJE+SfQsEU1Otw1mSi/ei25VfpCA0UaCZlAi7UzMKI=
ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key;
 t=1706343227; c=relaxed/simple;
 bh=LLNF2hXE8WLPHtQFt70oMNAiI1STnVqZTWMjp7CFUzs=;
 h=DKIM-Signature:Date:From:To:Subject:Message-ID:MIME-Version;
 b=x5Jq8Zs/TQQMzX+SkAxHqIoR8Fv8XGN7oNnFKzbV5qfY0x+SJFUuUTunNt9AJG3pKrHsNXMEQwDdEHWQEaQGTU1hsLf4pm+W7J0lsJnStyvBNAf1pziCPu8XT/pNT9zMwpURv/4p2/IAJzLvsGeJ20/cVV4eQpllAWsSVgHi4uk=
ARC-Authentication-Results: i=1; server2.sourceware.org
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com;
 s=mimecast20190719; t=1706343223;
 h=from:from:reply-to:reply-to:subject:subject:date:date:
 message-id:message-id:to:to:cc:cc:mime-version:mime-version:
 content-type:content-type:resent-to:resent-from:resent-message-id;
 bh=7fieHIJlL7fotuHy4zQspIlM97UNRhw7MBW2vOr1kMQ=;
 b=d14wUGgxfqCws2a9QkNMrXXsng5q2jJtlb3eaF/7kYW8aEb4/FoxlnuS5cuJHcT2AOsLUS
 zcxtlNCmDHt9rtELfF+AYDTnvv0Jvaj0EBE9XKmrS5KteRjcBLDaoJD67kno9iD3UCdcEE
 x0oPv/daINgmyScDPw0FZ+JAjVYi0wc=
Received: from mimecast-mx02.redhat.com (mx-ext.redhat.com [66.187.233.73])
 by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3,
 cipher=TLS_AES_256_GCM_SHA384) id us-mta-589-rJYV3Z5BOmm9QkGERxjcaw-1; Sat,
 27 Jan 2024 03:13:41 -0500
X-MC-Unique: rJYV3Z5BOmm9QkGERxjcaw-1
Received: from smtp.corp.redhat.com (int-mx01.intmail.prod.int.rdu2.redhat.com
 [10.11.54.1])
 (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
 key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest
 SHA256)
 (No client certificate requested)
 by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 238DB386915F
 for <gcc-patches@gcc.gnu.org>; Sat, 27 Jan 2024 08:13:41 +0000 (UTC)
Received: from tucnak.zalov.cz (unknown [10.39.192.70])
 by smtp.corp.redhat.com (Postfix) with ESMTPS id A7615AD1;
 Sat, 27 Jan 2024 08:13:40 +0000 (UTC)
Received: from tucnak.zalov.cz (localhost [127.0.0.1])
 by tucnak.zalov.cz (8.17.1/8.17.1) with ESMTPS id 40R8Dc97141154
 (version=TLSv1.3 cipher=TLS_AES_256_GCM_SHA384 bits=256 verify=NOT);
 Sat, 27 Jan 2024 09:13:38 +0100
Received: (from jakub@localhost)
 by tucnak.zalov.cz (8.17.1/8.17.1/Submit) id 40R8DbiS141153;
 Sat, 27 Jan 2024 09:13:37 +0100
Resent-From: Jakub Jelinek <jakub@redhat.com>
Resent-Date: Sat, 27 Jan 2024 09:13:37 +0100
Resent-Message-ID: <ZbS7MerUbF6mdvDP@tucnak>
Resent-To: "Joseph S. Myers" <josmyers@redhat.com>, gcc-patches@gcc.gnu.org
Date: Sat, 27 Jan 2024 08:53:42 +0100
From: Jakub Jelinek <jakub@redhat.com>
To: "Joseph S. Myers" <josmyers@redhat.com>
Cc: gcc-patches@gcc.gnu.org
Subject: [PATCH] libgcc: Fix up _BitInt division [PR113604]
Message-ID: <ZbS2hfdIwvtpFuGZ@tucnak>
MIME-Version: 1.0
X-Scanned-By: MIMEDefang 3.4.1 on 10.11.54.1
X-Mimecast-Spam-Score: 0
X-Mimecast-Originator: redhat.com
Content-Disposition: inline
X-Spam-Status: No, score=-4.3 required=5.0 tests=BAYES_00, DKIMWL_WL_HIGH,
 DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, RCVD_IN_DNSWL_NONE,
 RCVD_IN_MSPIKE_H4, RCVD_IN_MSPIKE_WL, SPF_HELO_NONE, SPF_NONE, TXREP,
 T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on
 server2.sourceware.org
X-BeenThere: gcc-patches@gcc.gnu.org
X-Mailman-Version: 2.1.30
Precedence: list
List-Id: Gcc-patches mailing list <gcc-patches.gcc.gnu.org>
List-Unsubscribe: <https://gcc.gnu.org/mailman/options/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=unsubscribe>
List-Archive: <https://gcc.gnu.org/pipermail/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-request@gcc.gnu.org?subject=help>
List-Subscribe: <https://gcc.gnu.org/mailman/listinfo/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=subscribe>
Reply-To: Jakub Jelinek <jakub@redhat.com>
Errors-To: gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org
X-getmail-retrieved-from-mailbox: INBOX
X-GMAIL-THRID: 1789230643968088911
X-GMAIL-MSGID: 1789230643968088911

Hi!

The following testcase ends up with SIGFPE in __divmodbitint4.
The problem is a thinko in my attempt to implement Knuth's algorithm.

The algorithm does (where b is 65536, i.e. one larger than what
fits in their unsigned short word):
	// Compute estimate qhat of q[j].
	qhat = (un[j+n]*b + un[j+n-1])/vn[n-1];
	rhat = (un[j+n]*b + un[j+n-1]) - qhat*vn[n-1];
      again:
	if (qhat >= b || qhat*vn[n-2] > b*rhat + un[j+n-2])
	{ qhat = qhat - 1;
	  rhat = rhat + vn[n-1];
	  if (rhat < b) goto again;
	}
The problem is that it uses a double-word / word -> double-word
division (and modulo), while all we have is udiv_qrnnd unless
we'd want to do further library calls, and udiv_qrnnd is a
double-word / word -> word division and modulo.
Now, as the algorithm description says, it can produce at most
word bits + 1 bit quotient.  And I believe that actually the
highest qhat the original algorithm can produce is
(1 << word_bits) + 1.  The algorithm performs earlier canonicalization
where both the divisor and dividend are shifted left such that divisor
has msb set.  If it has msb set already before, no shifting occurs but
we start with added 0 limb, so in the first uv1:uv0 double-word uv1
is 0 and so we can't get too high qhat, if shifting occurs, the first
limb of dividend is shifted right by UWtype bits - shift count into
a new limb, so again in the first iteration in the uv1:uv0 double-word
uv1 doesn't have msb set while vv1 does and qhat has to fit into word.
In the following iterations, previous iteration should guarantee that
the previous quotient digit is correct.  Even if the divisor was the
maximal possible vv1:all_ones_in_all_lower_limbs, if the old uv0:lower_limbs
would be larger or equal to the divisor, the previous quotient digit
would increase and another divisor would be subtracted, which I think
implies that in the next iteration in uv1:uv0 double-word uv1 <= vv1,
but uv0 could be up to all ones, e.g. in case of all lower limbs
of divisor being all ones and at least one dividend limb below uv0
being not all ones.  So, we can e.g. for 64-bit UWtype see
uv1:uv0 / vv1 0x8000000000000000UL:0xffffffffffffffffUL / 0x8000000000000000UL
or 0xffffffffffffffffUL:0xffffffffffffffffUL / 0xffffffffffffffffUL
In all these cases (when uv1 == vv1 && uv0 >= uv1), qhat is
0x10000000000000001UL, i.e. 2 more than fits into UWtype result,
if uv1 == vv1 && uv0 < uv1 it would be 0x10000000000000000UL, i.e.
1 more than fits into UWtype result.
Because we only have udiv_qrnnd which can't deal with those too large
cases (SIGFPEs or otherwise invokes undefined behavior on those), I've
tried to handle the uv1 >= vv1 case separately, but for one thing
I thought it would be at most 1 larger than what fits, and for two
have actually subtracted vv1:vv1 from uv1:uv0 instead of subtracting
0:vv1 from uv1:uv0.
For the uv1 < vv1 case, the implementation already performs roughly
what the algorithm does.
Now, let's see what happens with the two possible extra cases in
the original algorithm.
If uv1 == vv1 && uv0 < uv1, qhat above would be b, so we take
if (qhat >= b, decrement qhat by 1 (it becomes b - 1), add
vn[n-1] aka vv1 to rhat and goto again if rhat < b (but because
qhat already fits we can goto to the again label in the uv1 < vv1
code).  rhat in this case is uv0 and rhat + vv1 can but doesn't
have to overflow, say for uv0 42UL and vv1 0x8000000000000000UL
it will not (and so we should goto again), while for uv0
0x8000000000000000UL and vv1 0x8000000000000001UL it will (and
we shouldn't goto again).
If uv1 == vv1 && uv0 >= uv1, qhat above would be b + 1, so we
take if (qhat >= b, decrement qhat by 1 (it becomes b), add
vn[n-1] aka vv1 to rhat. But because vv1 has msb set and
rhat in this case is uv0 - vv1, the rhat + vv1 addition
certainly doesn't overflow, because (uv0 - vv1) + vv1 is uv0,
so in the algorithm we goto again, again take if (qhat >= b and
decrement qhat so it finally becomes b - 1, and add vn[n-1]
aka vv1 to rhat again.  But this time I believe it must always
overflow, simply because we added (uv0 - vv1) + vv1 + vv1 and
vv1 has msb set, so already vv1 + vv1 must overflow.  And
because it overflowed, it will not goto again.
So, I believe the following patch implements this correctly, by
subtracting vv1 from uv1:uv0 double-word once, then comparing
again if uv1 >= vv1.  If that is true, subtract vv1 from uv1:uv0
again and add 2 * vv1 to rhat, no __builtin_add_overflow is needed
as we know it always overflowed and so won't goto again.
If after the first subtraction uv1 < vv1, use __builtin_add_overflow 
when adding vv1 to rhat, because it can but doesn't have to overflow.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2024-01-27  Jakub Jelinek  <jakub@redhat.com>

	PR libgcc/113604
	* libgcc2.c (__divmodbitint4): If uv1 >= vv1, subtract
	vv1 from uv1:uv0 once or twice as needed, rather than
	subtracting vv1:vv1.

	* gcc.dg/torture/bitint-53.c: New test.


	Jakub

--- libgcc/libgcc2.c.jj	2024-01-12 10:10:16.000000000 +0100
+++ libgcc/libgcc2.c	2024-01-26 10:27:10.932017695 +0100
@@ -1863,12 +1863,46 @@ __divmodbitint4 (UBILtype *q, SItype qpr
 	      if (uv1 >= vv1)
 		{
 		  /* udiv_qrnnd doesn't support quotients which don't
-		     fit into UWtype, so subtract from uv1:uv0 vv1
-		     first.  */
-		  uv1 -= vv1 + __builtin_sub_overflow (uv0, vv1, &uv0);
-		  udiv_qrnnd (qhat, rhat, uv1, uv0, vv1);
-		  if (!__builtin_add_overflow (rhat, vv1, &rhat))
-		    goto again;
+		     fit into UWtype, while Knuth's algorithm originally
+		     uses a double-word by word to double-word division.
+		     Fortunately, the algorithm guarantees that uv1 <= vv1,
+		     because if uv1 > vv1, then even if v would have all
+		     bits in all words below vv1 set, the previous iteration
+		     would be supposed to use qhat larger by 1 and subtract
+		     v.  With uv1 == vv1 and uv0 >= vv1 the double-word
+		     qhat in Knuth's algorithm would be 1 in the upper word
+		     and 1 in the lower word, say for
+		     uv1 0x8000000000000000ULL
+		     uv0 0xffffffffffffffffULL
+		     vv1 0x8000000000000000ULL
+		     0x8000000000000000ffffffffffffffffuwb
+		     / 0x8000000000000000uwb == 0x10000000000000001uwb, and
+		     exactly like that also for any other value
+		     > 0x8000000000000000ULL in uv1 and vv1 and uv0 >= uv1.
+		     So we need to subtract one or at most two vv1s from
+		     uv1:uv0 (qhat because of that decreases by 1 or 2 and
+		     is then representable in UWtype) and need to increase
+		     rhat by vv1 once or twice because of that.  Now, if
+		     we need to subtract 2 vv1s, i.e. if
+		     uv1 == vv1 && uv0 >= vv1, then rhat (which is uv0 - vv1)
+		     + vv1 computation can't overflow, because it is equal
+		     to uv0 and therefore the original algorithm in that case
+		     performs goto again, but the second vv1 addition must
+		     overflow already because vv1 has msb set from the
+		     canonicalization.  */
+		  uv1 -= __builtin_sub_overflow (uv0, vv1, &uv0);
+		  if (uv1 >= vv1)
+		    {
+		      uv1 -= __builtin_sub_overflow (uv0, vv1, &uv0);
+		      udiv_qrnnd (qhat, rhat, uv1, uv0, vv1);
+		      rhat += 2 * vv1;
+		    }
+		  else
+		    {
+		      udiv_qrnnd (qhat, rhat, uv1, uv0, vv1);
+		      if (!__builtin_add_overflow (rhat, vv1, &rhat))
+			goto again;
+		    }
 		}
 	      else
 		{
--- gcc/testsuite/gcc.dg/torture/bitint-53.c.jj	2024-01-26 07:23:31.651790252 +0100
+++ gcc/testsuite/gcc.dg/torture/bitint-53.c	2024-01-26 07:23:20.608945843 +0100
@@ -0,0 +1,26 @@
+/* PR libgcc/113604 */
+/* { dg-do run { target bitint } } */
+/* { dg-options "-std=c23 -pedantic-errors" } */
+/* { dg-skip-if "" { ! run_expensive_tests }  { "*" } { "-O0" "-O2" } } */
+/* { dg-skip-if "" { ! run_expensive_tests } { "-flto" } { "" } } */
+
+#if __BITINT_MAXWIDTH__ >= 256
+unsigned _BitInt (256) x;
+
+void
+foo (unsigned _BitInt (256) a, unsigned _BitInt (128) b)
+{
+  x = a / b;
+}
+#endif
+
+int
+main ()
+{
+#if __BITINT_MAXWIDTH__ >= 256
+  foo (0xfffffffffffffffffffffc0000000000000000000004uwb, 0x7ffffffffffffffffffffffffffuwb);
+  if (x != 0x1fffffffffffffffffuwb)
+    __builtin_abort ();
+#endif
+  return 0;
+}