From patchwork Wed Nov 29 11:13:19 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Jakub Jelinek <jakub@redhat.com>
X-Patchwork-Id: 171261
Return-Path: <gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org>
Delivered-To: ouuuleilei@gmail.com
Received: by 2002:a59:a5a7:0:b0:403:3b70:6f57 with SMTP id d7csp263513vqn;
        Wed, 29 Nov 2023 03:13:53 -0800 (PST)
X-Google-Smtp-Source: 
 AGHT+IGQN3I8kxhAxMLuH/Y0gXD/cgqmj0Am4eaIUVzT8CPL4FJb5/avvayyi62JM5kea81llQ6D
X-Received: by 2002:a05:6214:16d:b0:67a:2942:988 with SMTP id
 y13-20020a056214016d00b0067a29420988mr13726603qvs.21.1701256433109;
        Wed, 29 Nov 2023 03:13:53 -0800 (PST)
ARC-Seal: i=2; a=rsa-sha256; t=1701256433; cv=pass;
        d=google.com; s=arc-20160816;
        b=loGhWXoCTBbdjgf+cqu4pws022L2w/tzu5tWC8shRC46cGxYcOFpOU00dZpFIhpkhM
         GoVSsY3iw/MIzC4gkgHiy5kH7++Ax7FSMigZSPKxxVqUhGLBI1lwYdxElmCPZPXzgV2p
         nLCU1GemFObrYvrDBfzJSc4WhCjxRhEb98F3tKCF138sG6gbJTvIHHfFlBbQmHn10846
         D6eUMrQqg/HFTpm8NMrK7g93MDvX17N8F/46PYaa7tMB8ifVHU3oC3lgDIYaudKmGu3X
         JcLxtRwcYrF8fluPG7+xnfDtxzF9N3dLiolr5IbcgxaINLo0RFBJfl/IZlrH7aDn2XDT
         PBNQ==
ARC-Message-Signature: i=2; a=rsa-sha256; c=relaxed/relaxed; d=google.com;
 s=arc-20160816;
        h=errors-to:reply-to:list-subscribe:list-help:list-post:list-archive
         :list-unsubscribe:list-id:precedence:content-disposition
         :mime-version:message-id:subject:cc:to:from:date:dkim-signature
         :arc-filter:dmarc-filter:delivered-to;
        bh=beTx46oX2QTv7Wc9DmtQ6A7fTvIjiDzvNzFvPRz3B/w=;
        fh=eM2nYbNDZ6Theupq9WaNJQV2QgkrsvQk+3X75erbI8c=;
        b=jX4j1h7rQ9AI8buPoQZ1iHvljh0w9v+bXT8+9/ilbo10Fc2rFiv5NxEZ3ZMy/9c2+J
         64xZD0VXQJyBcbw20PviPLbFPcB2nUEOdAVNy8Jm9/L07DK9sijVMOtFAOFHweQi3d2B
         fRvBQLYFj2m//9JBhd1vKZTe8/3TEA3o2gMpfFA63fN2cZJS55Sn1zr5dc25Gn3UOjlo
         5kTifk/jd6XMol1XQ1ZMNSKm/W+M9DICCRl3Ik0k4q8GXZEVX588f/nBN08xlbz+fxr8
         66IIRElsBppnFkkYngy6KqLy3tCjyHCS1zI3W4ISGzm7yPjifSQe7hDQxOqTvymjbU6V
         djSA==
ARC-Authentication-Results: i=2; mx.google.com;
       dkim=pass header.i=@redhat.com header.s=mimecast20190719
 header.b=SFx+DeG8;
       arc=pass (i=1);
       spf=pass (google.com: domain of
 gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org designates
 2620:52:3:1:0:246e:9693:128c as permitted sender)
 smtp.mailfrom="gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org";
       dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com
Received: from server2.sourceware.org (server2.sourceware.org.
 [2620:52:3:1:0:246e:9693:128c])
        by mx.google.com with ESMTPS id
 o8-20020a0ce408000000b00679ce542c8bsi13197959qvl.330.2023.11.29.03.13.53
        for <ouuuleilei@gmail.com>
        (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256);
        Wed, 29 Nov 2023 03:13:53 -0800 (PST)
Received-SPF: pass (google.com: domain of
 gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org designates
 2620:52:3:1:0:246e:9693:128c as permitted sender)
 client-ip=2620:52:3:1:0:246e:9693:128c;
Authentication-Results: mx.google.com;
       dkim=pass header.i=@redhat.com header.s=mimecast20190719
 header.b=SFx+DeG8;
       arc=pass (i=1);
       spf=pass (google.com: domain of
 gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org designates
 2620:52:3:1:0:246e:9693:128c as permitted sender)
 smtp.mailfrom="gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org";
       dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com
Received: from server2.sourceware.org (localhost [IPv6:::1])
	by sourceware.org (Postfix) with ESMTP id D523A385AC2B
	for <ouuuleilei@gmail.com>; Wed, 29 Nov 2023 11:13:52 +0000 (GMT)
X-Original-To: gcc-patches@gcc.gnu.org
Delivered-To: gcc-patches@gcc.gnu.org
Received: from us-smtp-delivery-124.mimecast.com
 (us-smtp-delivery-124.mimecast.com [170.10.129.124])
 by sourceware.org (Postfix) with ESMTPS id AB0AC3858C60
 for <gcc-patches@gcc.gnu.org>; Wed, 29 Nov 2023 11:13:27 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org AB0AC3858C60
Authentication-Results: sourceware.org;
 dmarc=pass (p=none dis=none) header.from=redhat.com
Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=redhat.com
ARC-Filter: OpenARC Filter v1.0.0 sourceware.org AB0AC3858C60
Authentication-Results: server2.sourceware.org;
 arc=none smtp.remote-ip=170.10.129.124
ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1701256409; cv=none;
 b=HqZSeNfbkNo5t/lrrf5FwJfICAzjYJOQ5qtTFVD+ARNjYZTRzKT2B1LjTlnF3hBRwtV4pU2wes1embpdHRM6r1LLNgmGshyNid8qEchgD7f4oW+a3ZnnmXw4uHBwhx7mOtSXsUmxl+JM8bH/ZOWbxD5Hm6DLM0zXsC5i7eTr1hs=
ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key;
 t=1701256409; c=relaxed/simple;
 bh=bi1tWnxiXrvkR06GjDa+n0O+3c/QU1zaxhZOmxIrh/M=;
 h=DKIM-Signature:Date:From:To:Subject:Message-ID:MIME-Version;
 b=nTAz4s8Qdc/Nc0JBujJDAUO5AY3DFsxqVV8ScJFjVCdwm2QYE/bQ1KF1jtHEVGLCEChlcp/qK1hrOvnj9gIprU2dgpm2zlf93v/1MJt+hUvlazjt/khw2SdZp1qT5lNINsJgykLyWD5O6yxI2w6CW00D66/JMZMcJ99H+uG993g=
ARC-Authentication-Results: i=1; server2.sourceware.org
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com;
 s=mimecast20190719; t=1701256407;
 h=from:from:reply-to:reply-to:subject:subject:date:date:
 message-id:message-id:to:to:cc:cc:mime-version:mime-version:
 content-type:content-type; bh=beTx46oX2QTv7Wc9DmtQ6A7fTvIjiDzvNzFvPRz3B/w=;
 b=SFx+DeG8vV1YMtt+vne7fyW9JQ47iLk1CFYS3fjvPKkV6JXCDf4/3q5SoPdYo0aNhWDc5c
 LO5Ao26sfg8Ff81NEPJBy+TX9rwd09txw1ys3zMKyqQQ/emr4hOkxaS1rUdyglk8yGqP/O
 N/JzVxqGIcgCjozrEY0E2/cEIIr4JF0=
Received: from mimecast-mx02.redhat.com (mx-ext.redhat.com [66.187.233.73])
 by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3,
 cipher=TLS_AES_256_GCM_SHA384) id us-mta-215-SYkjWTk1Opqq2AKkWE-rKA-1; Wed,
 29 Nov 2023 06:13:24 -0500
X-MC-Unique: SYkjWTk1Opqq2AKkWE-rKA-1
Received: from smtp.corp.redhat.com (int-mx09.intmail.prod.int.rdu2.redhat.com
 [10.11.54.9])
 (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
 key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest
 SHA256)
 (No client certificate requested)
 by mimecast-mx02.redhat.com (Postfix) with ESMTPS id B14F038116E3;
 Wed, 29 Nov 2023 11:13:23 +0000 (UTC)
Received: from tucnak.zalov.cz (unknown [10.39.194.53])
 by smtp.corp.redhat.com (Postfix) with ESMTPS id 6AD2C492BE0;
 Wed, 29 Nov 2023 11:13:23 +0000 (UTC)
Received: from tucnak.zalov.cz (localhost [127.0.0.1])
 by tucnak.zalov.cz (8.17.1/8.17.1) with ESMTPS id 3ATBDKkQ3296836
 (version=TLSv1.3 cipher=TLS_AES_256_GCM_SHA384 bits=256 verify=NOT);
 Wed, 29 Nov 2023 12:13:20 +0100
Received: (from jakub@localhost)
 by tucnak.zalov.cz (8.17.1/8.17.1/Submit) id 3ATBDJqf3296835;
 Wed, 29 Nov 2023 12:13:19 +0100
Date: Wed, 29 Nov 2023 12:13:19 +0100
From: Jakub Jelinek <jakub@redhat.com>
To: Richard Biener <rguenther@suse.de>,
 Richard Sandiford <richard.sandiford@arm.com>
Cc: gcc-patches@gcc.gnu.org
Subject: [PATCH] wide-int: Fix wide_int division/remainder [PR112733]
Message-ID: <ZWccz0kiLGvC9CIc@tucnak>
MIME-Version: 1.0
X-Scanned-By: MIMEDefang 3.4.1 on 10.11.54.9
X-Mimecast-Spam-Score: 0
X-Mimecast-Originator: redhat.com
Content-Disposition: inline
X-Spam-Status: No, score=-3.7 required=5.0 tests=BAYES_00, DKIMWL_WL_HIGH,
 DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, RCVD_IN_DNSWL_NONE,
 RCVD_IN_MSPIKE_H4, RCVD_IN_MSPIKE_WL, SPF_HELO_NONE, SPF_NONE, TXREP,
 T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on
 server2.sourceware.org
X-BeenThere: gcc-patches@gcc.gnu.org
X-Mailman-Version: 2.1.30
Precedence: list
List-Id: Gcc-patches mailing list <gcc-patches.gcc.gnu.org>
List-Unsubscribe: <https://gcc.gnu.org/mailman/options/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=unsubscribe>
List-Archive: <https://gcc.gnu.org/pipermail/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-request@gcc.gnu.org?subject=help>
List-Subscribe: <https://gcc.gnu.org/mailman/listinfo/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=subscribe>
Reply-To: Jakub Jelinek <jakub@redhat.com>
Errors-To: gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org
X-getmail-retrieved-from-mailbox: INBOX
X-GMAIL-THRID: 1783896665330544504
X-GMAIL-MSGID: 1783896665330544504

Hi!

This patch fixes the other half of the PR, where we were crashing on the
UNSIGNED widest_int modulo of 1 % -128 (where the -128 in UNSIGNED is
131072 bit 0xfffff...fffff80).
The patch actually has 2 independent parts, if you want, they could be
split out.
The fix for the divmod buffer overflow is in the 2nd-5th and 7th-8th hunks of
the patch.  abs (remainder) <= min (abs (dividend), abs (divisor)) and the
wide-int.h callers estimate the remainder size to be the same as the
quotient size, i.e. for UNSIGNED dividend with msb set quotient (same as
remainder) precision based estimation, for SIGNED or dividend with msb
clear estimate based on divident len (+ 1).
For quotient (if any), wi_pack is called as
  quotient_len = wi_pack (quotient, b_quotient, m, dividend_prec);
where m is
  m = dividend_blocks_needed;
  while (m > 1 && b_dividend[m - 1] == 0)
    m--;
and
  dividend_blocks_needed = 2 * BLOCKS_NEEDED (dividend_prec);
where wi_pack stores to result (in_len + 1) / 2 limbs (or one more
if (in_len & 1) == 0 and in_len / 2 < BLOCKS_NEEDED (dividend_prec)).
In any case, that is never more than BLOCKS_NEEDED (dividend_prec) and
matches caller's estimations (then it is canonicalized and perhaps shrunk
that way).

The problematic case is the remainder, where wi_pack is called as
  *remainder_len = wi_pack (remainder, b_remainder, n, dividend_prec);
The last argument is again based on the dividend precision like for
quotient, but n is instead:
  n = divisor_blocks_needed;
  while (n > 1 && b_divisor[n - 1] == 0)
    n--;
so based on the divisor rather than dividend.  Now, in the
wi::multiple_of_p (1, -128, UNSIGNED) widest_int precision case,
dividend_prec is very small, the dividend is 1 and while the official
precision is 131072 bits, dividend_len is 1 and
  if (sgn == SIGNED || dividend_val[dividend_len - 1] >= 0)
    dividend_prec = MIN ((dividend_len + 1) * HOST_BITS_PER_WIDE_INT,
                         dividend_prec);
shrinks the estimate to 128 bits in this case; but divisor_prec
is huge, because the divisor is negative UNSIGNED, so 131072 bits,
2048 limbs, 4096 half-limbs (so divisor_blocks_needed and n are 4096).
Now, wi_pack relies on the penultimate argument to be smaller or equal
to 2 * BLOCKS_NEEDED of the last argument and that is not the case here,
4096 is certainly larger than 2 * BLOCKS_NEEDED (128) aka 4 and so
wi_pack overflows the array.
Unfortunately looking around, this isn't the only overflow, already
in divmod_internal_2 we have an overflow, though there not overflowing
a buffer during writing, but during reading.  When divmod_internal_2
is called with the last 2 arguments like m=1, n=4096 as in this case
(but generally any where m < n), it has a special case for n == 1 (which
won't be the problem, we never have m or n 0), then decides based on
msb of divisor whether there should be some shift canonicalization, then
performs a
  for (j = m - n; j >= 0; j--)
main loop and finally initializes b_remainder:
  if (s)
    for (i = 0; i < n; i++)
      b_remainder[i] = (b_dividend[i] >> s)
        | (b_dividend[i+1] << (HOST_BITS_PER_HALF_WIDE_INT - s));
  else
    for (i = 0; i < n; i++)
      b_remainder[i] = b_dividend[i];
In the usual case of m >= n, the main loop will iterate at least once,
b_dividend has m + 1 valid elements and the copying to b_remainder
is correct.  But for the m < n unusual case, the main loop never iterates
and we want to copy the b_dividend right into b_remainder (possibly with
shifts).  Except when it copies n elements, it copies garbage which isn't
there at all for the last n - m elements.
Because the decision whether the main loop should iterate at all or not
and the s computation should be based on the original n, the following
patch sets n to MIN (n, m) only after the main loop, such that we copy
to b_remainder at most what is initialized in b_dividend, and returns
that adjusted value to the caller which then passes it to wi_pack and
so satisfies again the requirement there, rather than trying to
do the n = MIN (n, m) before calling divmod_internal_2 or after it.

I believe the bug is mostly something I've only introduced myself in the
> 576 bit wide_int/widest_int support changes, before that there was
no attempt to decrease the precisions and so I think dividend_prec
and divisor_prec were pretty much always the same (or always?),
and even the copying of garbage wasn't the case because the only time
m or n was decreased from the same precision was if there were 0s in the
upper half-limbs.

The 1st and 6th hunks in the patch is just that while looking at the
code, I've noticed I computed wrong the sizes in the XALLOCAVEC case,
somehow the 4 * in the static arrays led me believe I need to make the
alloced buffers twice (in the multiplication) or 4 times (in the
division/modulo cases) larger than needed.

Earlier version of the patch has been bootstrapped/regtested on x86_64-linux
and i686-linux and due to forgotten adjustment of memset argument in 6th
hunk regressed gcc.dg/bitint-3{8,9}.c tests, this version so far passed
make check-gcc check-g++ -j32 -k GCC_TEST_RUN_EXPENSIVE=1 RUNTESTFLAGS="GCC_TEST_RUN_EXPENSIVE=1 dg.exp='*bitint* pr112673.c builtin-stdc-bit-*.c pr112566-2.c pr112511.c' dg-torture.exp=*bitint* dfp.exp=*bitint*"

Ok for trunk if it passes full bootstrap/regtest again?
And do you want to split those 2 hunks into a separate commit or not?

2023-11-29  Jakub Jelinek  <jakub@redhat.com>

	PR middle-end/112733
	* wide-int.cc (wi::mul_internal): Don't allocate twice as much
	space for u, v and r as needed.
	(divmod_internal_2): Change return type from void to int, for n == 1
	return 1, otherwise before writing b_dividend into b_remainder set
	n to MIN (n, m) and at the end return it.
	(wi::divmod_internal): Don't allocate 4 times as much space for
	b_quotient, b_remainder, b_dividend and b_divisor.  Set n to
	result of divmod_internal_2.
	(wide_int_cc_tests): Add test for unsigned widest_int
	wi::multiple_of_p of 1 and -128.


	Jakub

--- gcc/wide-int.cc.jj	2023-11-28 16:56:50.000000000 +0100
+++ gcc/wide-int.cc	2023-11-29 10:53:14.329098216 +0100
@@ -1477,10 +1477,10 @@ wi::mul_internal (HOST_WIDE_INT *val, co
   if (UNLIKELY (prec > WIDE_INT_MAX_INL_PRECISION))
     {
       unsigned HOST_HALF_WIDE_INT *buf
-	= XALLOCAVEC (unsigned HOST_HALF_WIDE_INT, 4 * 4 * blocks_needed);
+	= XALLOCAVEC (unsigned HOST_HALF_WIDE_INT, 4 * half_blocks_needed);
       u = buf;
-      v = u + 4 * blocks_needed;
-      r = v + 4 * blocks_needed;
+      v = u + half_blocks_needed;
+      r = v + half_blocks_needed;
     }
 
   /* We do unsigned mul and then correct it.  */
@@ -1675,8 +1675,9 @@ wi::sub_large (HOST_WIDE_INT *val, const
    Delight by Warren, which itself is a small modification of Knuth's
    algorithm.  M is the number of significant elements of U however
    there needs to be at least one extra element of B_DIVIDEND
-   allocated, N is the number of elements of B_DIVISOR.  */
-static void
+   allocated, N is the number of elements of B_DIVISOR.
+   Return new value for N.  */
+static int
 divmod_internal_2 (unsigned HOST_HALF_WIDE_INT *b_quotient,
 		   unsigned HOST_HALF_WIDE_INT *b_remainder,
 		   unsigned HOST_HALF_WIDE_INT *b_dividend,
@@ -1707,7 +1708,7 @@ divmod_internal_2 (unsigned HOST_HALF_WI
 		  * (unsigned HOST_WIDE_INT)b_divisor[0]));
 	}
       b_remainder[0] = k;
-      return;
+      return 1;
     }
 
   s = clz_hwi (b_divisor[n-1]) - HOST_BITS_PER_HALF_WIDE_INT; /* CHECK clz */
@@ -1770,6 +1771,10 @@ divmod_internal_2 (unsigned HOST_HALF_WI
 	  b_dividend[j+n] += k;
 	}
     }
+  /* If N > M, the main loop was skipped, quotient will be 0 and
+     we can't copy more than M half-limbs into the remainder, as they
+     aren't present in b_dividend (which has .  */
+  n = MIN (n, m);
   if (s)
     for (i = 0; i < n; i++)
       b_remainder[i] = (b_dividend[i] >> s)
@@ -1777,6 +1782,7 @@ divmod_internal_2 (unsigned HOST_HALF_WI
   else
     for (i = 0; i < n; i++)
       b_remainder[i] = b_dividend[i];
+  return n;
 }
 
 
@@ -1943,14 +1949,14 @@ wi::divmod_internal (HOST_WIDE_INT *quot
     {
       unsigned HOST_HALF_WIDE_INT *buf
 	= XALLOCAVEC (unsigned HOST_HALF_WIDE_INT,
-		      12 * dividend_blocks_needed
-		      + 4 * divisor_blocks_needed + 1);
+		      3 * dividend_blocks_needed + 1
+		      + divisor_blocks_needed);
       b_quotient = buf;
-      b_remainder = b_quotient + 4 * dividend_blocks_needed;
-      b_dividend = b_remainder + 4 * dividend_blocks_needed;
-      b_divisor = b_dividend + 4 * dividend_blocks_needed + 1;
+      b_remainder = b_quotient + dividend_blocks_needed;
+      b_dividend = b_remainder + dividend_blocks_needed;
+      b_divisor = b_dividend + dividend_blocks_needed + 1;
       memset (b_quotient, 0,
-	      4 * dividend_blocks_needed * sizeof (HOST_HALF_WIDE_INT));
+	      dividend_blocks_needed * sizeof (HOST_HALF_WIDE_INT));
     }
   wi_unpack (b_dividend, dividend.get_val (), dividend.get_len (),
 	     dividend_blocks_needed, dividend_prec, UNSIGNED);
@@ -1969,7 +1975,7 @@ wi::divmod_internal (HOST_WIDE_INT *quot
   if (b_quotient == b_quotient_buf)
     memset (b_quotient_buf, 0, sizeof (b_quotient_buf));
 
-  divmod_internal_2 (b_quotient, b_remainder, b_dividend, b_divisor, m, n);
+  n = divmod_internal_2 (b_quotient, b_remainder, b_dividend, b_divisor, m, n);
 
   unsigned int quotient_len = 0;
   if (quotient)
@@ -2673,6 +2679,9 @@ wide_int_cc_tests ()
 	     wi::shifted_mask (0, 128, false, 128));
   ASSERT_EQ (wi::mask (128, true, 128),
 	     wi::shifted_mask (0, 128, true, 128));
+  ASSERT_EQ (wi::multiple_of_p (from_int <widest_int> (1),
+				from_int <widest_int> (-128), UNSIGNED),
+	     false);
 }
 
 } // namespace selftest