Message ID | ZMKlEWqSJ941v3UV@tucnak |
---|---|
Headers | From: Jakub Jelinek via Gcc-patches <gcc-patches@gcc.gnu.org>; Reply-To: Jakub Jelinek <jakub@redhat.com>; To: gcc-patches@gcc.gnu.org; Cc: Richard Biener <rguenther@suse.de>, "Joseph S. Myers" <joseph@codesourcery.com>, Uros Bizjak <ubizjak@gmail.com>, hjl.tools@gmail.com; Date: Thu, 27 Jul 2023 19:10:41 +0200; Subject: [PATCH 0/5] GCC _BitInt support [PR102989] |
Series | GCC _BitInt support [PR102989] |
Message
Jakub Jelinek
July 27, 2023, 5:10 p.m. UTC
[PATCH 0/5] GCC _BitInt support [PR102989]
The following patch series introduces support for C23 bit-precise integer
types. In short, they are similar to other integral types in many ways,
but they aren't subject to integer promotions even when narrower than int,
and they can have much wider precisions than ordinary integer types.
It is enabled only on targets which have agreed on a processor-specific
ABI (psABI) describing how to lay those types out and how to pass them as
function arguments/return values; I believe that is currently just x86-64.
It would be nice if target maintainers helped to get agreement on the psABI
changes, so that GCC 14 could enable it on far more architectures than just one.
C23 says that <limits.h> defines the BITINT_MAXWIDTH macro, which is the
largest supported precision of the _BitInt types; it must be at least the
precision of unsigned long long (but due to the lack of psABI agreement we'll
violate that on architectures which don't have the support done yet).
For now, the series just uses WIDE_INT_MAX_PRECISION as BITINT_MAXWIDTH,
with the intent to increase it incrementally later on.
WIDE_INT_MAX_PRECISION is 575 bits on x86_64, but will be even smaller
on lots of architectures. This is the largest precision we can support
without changes to the wide_int/widest_int representation (making those
non-POD and allowing the use of some allocated buffer rather than the
included fixed-size one). Once that is overcome, there is another internally
enforced limit: INTEGER_CST in its current layout allows at most 255 64-bit
limbs, which caps the precision at 16320 bits. And if that is overcome too,
TYPE_PRECISION is limited to 16 bits, so 65535 is the maximum precision.
Perhaps later we could make TYPE_PRECISION depend on BITINT_TYPE vs. other
types and use a 32-bit precision in that case. I think the latest Clang/LLVM
supports up to 8388608 bits on paper, but it is hardly usable even with much
shorter precisions.
Besides this hopefully temporary cap on supported precision and support
only on targets which buy into it, the support has the following limitations:
- _BitInt(N) bit-fields aren't supported yet (the patch rejects them); I'd like
to enable those incrementally, but I don't really see any details on how such
bit-fields should be laid out in memory or passed inside function
arguments; LLVM implements something, but it is questionable whether that is
what the various ABIs want
- conversions between large/huge (see later) _BitInt and _Decimal{32,64,128}
aren't supported and emit a sorry; I'm not familiar enough with DFP stuff
to implement that
- _Complex _BitInt(N) isn't supported; again mainly because none of the psABIs
mention how those should be passed/returned; in a limited way they are
supported internally, because the internal functions into which
__builtin_{add,sub,mul}_overflow{,_p} is lowered return COMPLEX_TYPE as a
hack to return 2 values without using references/pointers
- vectors of _BitInt(N) aren't supported, both because the psABIs don't specify
how that works and because I'm not really sure it would be useful given the
lack of hw support for anything but bit-precise integers with the same
bit precision as standard integer types
Because the bit-precise types behave differently in the C FE
(e.g. the lack of promotion) and do or can behave differently in type
layout and in function argument passing/value returning, the patch introduces
a new integral type, BITINT_TYPE, so various spots which explicitly check
for INTEGER_TYPE rather than, say, using the INTEGRAL_TYPE_P macro need to be
adjusted. Also, the assumption that all integral types have a scalar integer
mode is no longer true; larger BITINT_TYPEs have BLKmode.
The patch puts _BitInt types into 4 different categories, depending on the target
hook decisions and on their precision. The x86-64 psABI says that _BitInt types
which fit into signed/unsigned char, short, int, long and long long are laid out
and passed as those types (with the padding bits undefined if they don't have
mode precision). Such smallest-precision bit-precise integer types are
categorized as small: for such a precision the target hook returns a scalar
integer mode, a single one of which holds all the bits. Small _BitInt types are
generally kept in the IL until expansion into RTL, with minor tweaks during
expansion to avoid relying on the padding bit values. All larger-precision
_BitInt types are supposed to be handled as a structure containing an array
of limbs or so, where a limb has some integral mode (for libgcc purposes, best
if it is word-sized) and the limbs are ordered either little or big endian
in the array. The padding bits in the most significant limb, if any, are
either undefined or should always be sign/zero extended (but support for the
latter isn't in yet; we don't know if any psABI will require it). As mentioned
in some psABI proposals, while currently there is just one limb mode, if the
limb ordering followed the normal target endianness, there would always be the
possibility of having two limb modes: one used for ABI purposes (in
alignment/size decisions) and another one used during the actual lowering or in
libgcc helpers.
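A minimal sketch of the limb layout just described, assuming 64-bit limbs stored least significant limb first (as on x86-64). Both helper names are hypothetical. `bitint_extend_padding` illustrates the "always sign/zero extended" padding variant, which per the text the series does not implement yet; while padding bits are undefined, code has to mask or extend them like this before relying on them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 64-bit limbs, least significant limb first (x86-64 style). */
#define LIMB_BITS 64

/* Number of limbs needed for a _BitInt of precision PREC.  */
int bitint_nlimbs(int prec)
{
  return (prec + LIMB_BITS - 1) / LIMB_BITS;
}

/* Hypothetical helper: make the padding bits above PREC in the most
   significant limb copies of the sign bit (signed) or zeros (unsigned).
   In the series as described these bits are simply undefined, so this
   normalization is what code would need before relying on them.  */
void bitint_extend_padding(uint64_t *limbs, int prec, bool is_signed)
{
  int top = bitint_nlimbs(prec) - 1;
  int used = prec - top * LIMB_BITS;   /* bits of the top limb in use, 1..64 */
  if (used == LIMB_BITS)
    return;                            /* no padding bits at all */
  uint64_t mask = (UINT64_C(1) << used) - 1;
  bool neg = is_signed && ((limbs[top] >> (used - 1)) & 1);
  limbs[top] = neg ? (limbs[top] | ~mask) : (limbs[top] & mask);
}
```

For example, a _BitInt(100) occupies 2 limbs, and only the low 36 bits of the second limb carry value bits.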
The second _BitInt category is called medium in the series: those are _BitInt
precisions which need more than one limb, but where the precision is still
smaller than TImode precision (or DImode on targets which don't support __int128).
Most arithmetic on such types can be lowered simply to casts to the {,unsigned}
{long long,__int128} type of larger or equal precision, performing the arithmetic
on normal integers and then casting back. Larger _BitInt precisions typically
have BLKmode and are lowered in a new bitintlower* pass right after complex
lowering (for -O1+ that is shortly after IPA) into a series of operations on
individual limbs. The series distinguishes large and huge _BitInts: large
ones have precisions up to one bit less than 4 limbs and are lowered in most
places into straight-line code iterating over the limbs, while huge ones use
some loop to handle most of the limbs and handle only up to 2 limbs before or
after the loop.
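The category boundaries and the medium-category lowering can be sketched as follows. This assumes 64-bit limbs and the GCC/Clang `unsigned __int128` extension, and it treats 65 to 128 bits as medium, i.e. it assumes `_BitInt(128)` itself also counts as medium because it fits exactly in `__int128`; the real thresholds come from target hooks. All names are hypothetical. `medium_add` performs the arithmetic in 128 bits and then "casts back" by reducing modulo 2^prec and re-extending.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumptions (x86-64-like target): 64-bit limbs and unsigned __int128
   (a GCC/Clang extension) as the widest fixed-size integer type.  */
#define LIMB_BITS 64
#define MAX_FIXED_BITS 128

enum bitint_kind { BITINT_SMALL, BITINT_MEDIUM, BITINT_LARGE, BITINT_HUGE };

/* Hypothetical classifier mirroring the four categories in the text.  */
enum bitint_kind bitint_classify(int prec)
{
  if (prec <= LIMB_BITS)
    return BITINT_SMALL;   /* one scalar integer mode holds all the bits */
  if (prec <= MAX_FIXED_BITS)
    return BITINT_MEDIUM;  /* lowered via casts to __int128 */
  if (prec < 4 * LIMB_BITS)
    return BITINT_LARGE;   /* straight-line code over all limbs */
  return BITINT_HUGE;      /* a loop over most of the limbs */
}

typedef unsigned __int128 u128;

/* "Cast back" to _BitInt(PREC): keep PREC bits, then sign- or
   zero-extend them to the full 128 bits.  */
u128 wrap_to_prec(u128 v, int prec, bool is_signed)
{
  if (prec >= 128)
    return v;
  u128 mask = ((u128) 1 << prec) - 1;
  v &= mask;
  if (is_signed && ((v >> (prec - 1)) & 1))
    v |= ~mask;
  return v;
}

/* Medium _BitInt addition: widen, add as an ordinary integer, narrow.  */
u128 medium_add(u128 a, u128 b, int prec, bool is_signed)
{
  return wrap_to_prec(a + b, prec, is_signed);
}
```

For instance, adding 1 to an unsigned _BitInt(100) holding 2^100 - 1 wraps to 0 after the narrowing step.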
Most operations, like bitwise operations, addition, subtraction, left shifts by
a constant smaller than the limb precision, some casts, ==/!= comparisons and
loads/stores, are handled in a loop with 2 limbs per iteration followed by 0, 1
or 2 limbs handled after it; the series calls these mergeable, and a single loop
can handle perhaps many different such operations with single uses in the same bb.
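To illustrate what merging buys, here is a hypothetical function that evaluates `(a + b) ^ c` on multi-limb operands in a single limb loop, 2 limbs per iteration plus a tail, instead of materializing `a + b` first. It assumes 64-bit limbs stored least significant first and leaves the padding bits of the top limb undefined; the exact split between loop and tail in the real pass may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One limb of a + b with carry in/out.  */
static inline uint64_t add_limb(uint64_t a, uint64_t b, unsigned *carry)
{
  uint64_t s = a + b;
  unsigned c1 = s < a;            /* carry out of a + b */
  uint64_t r = s + *carry;
  unsigned c2 = r < s;            /* carry out of adding the carry in */
  *carry = c1 | c2;               /* at most one of them can be set */
  return r;
}

/* dst = (a + b) ^ c over N limbs in one pass: the addition and the XOR
   are "merged" into the same loop, 2 limbs per iteration, then a tail.  */
void add_xor_merged(uint64_t *dst, const uint64_t *a, const uint64_t *b,
                    const uint64_t *c, size_t n)
{
  unsigned carry = 0;
  size_t i = 0;
  for (; i + 2 <= n; i += 2)
    {
      dst[i] = add_limb(a[i], b[i], &carry) ^ c[i];
      dst[i + 1] = add_limb(a[i + 1], b[i + 1], &carry) ^ c[i + 1];
    }
  for (; i < n; i++)              /* tail: at most one limb here */
    dst[i] = add_limb(a[i], b[i], &carry) ^ c[i];
}
```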
Relational (>/>=/</<=) comparisons are handled, optionally together with operand
casts and loads, by an optional straight-line handling of the most significant
limb (skipped for unsigned types whose precision is a multiple of the limb
precision), followed by a loop handling one limb at a time from more significant
down to less significant limbs.
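A sketch of that comparison scheme, assuming 64-bit limbs stored least significant first and undefined padding bits. The most significant limb is shifted so that the padding bits drop out and is then compared as signed or unsigned; the remaining limbs compare as unsigned from the top down. `bitint_cmp` is a hypothetical name, and the sketch relies on GCC's modular conversion of out-of-range `uint64_t` values to `int64_t`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define LIMB_BITS 64

/* Compare two _BitInt(PREC) values stored as limbs, least significant
   first.  Returns -1, 0 or 1.  Padding bits are treated as undefined.  */
int bitint_cmp(const uint64_t *a, const uint64_t *b, int prec, bool is_signed)
{
  int n = (prec + LIMB_BITS - 1) / LIMB_BITS;
  int i = n - 1;
  int used = prec - i * LIMB_BITS;
  /* Straight-line handling of the most significant limb, skipped only for
     unsigned types whose precision is a multiple of the limb size.  */
  if (is_signed || used != LIMB_BITS)
    {
      int sh = LIMB_BITS - used;   /* shifting left discards padding bits */
      if (is_signed)
        {
          /* The sign bit ends up in bit 63; scaling keeps the ordering.  */
          int64_t x = (int64_t) (a[i] << sh), y = (int64_t) (b[i] << sh);
          if (x != y)
            return x < y ? -1 : 1;
        }
      else
        {
          uint64_t x = a[i] << sh, y = b[i] << sh;
          if (x != y)
            return x < y ? -1 : 1;
        }
      i--;
    }
  /* Remaining limbs compare as unsigned, most significant first.  */
  for (; i >= 0; i--)
    if (a[i] != b[i])
      return a[i] < b[i] ? -1 : 1;
  return 0;
}
```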
Other operations, like arbitrary left shifts or any right shifts, are also handled
in a loop processing one limb at a time, but possibly accessing some other limb.
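For shifts by an arbitrary amount, each destination limb is assembled from at most two source limbs at an offset, which is where the "other limb" accesses come from. A minimal sketch of a logical right shift, assuming 64-bit limbs stored least significant first and zero padding bits; an arithmetic right shift would fill with copies of the sign bit instead of zeros. The function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LIMB_BITS 64

/* dst = src >> shift for an unsigned value of N limbs (least significant
   first), assuming zero padding bits.  Destination limb I is built from
   source limbs I + q and I + q + 1.  */
void bitint_lshr(uint64_t *dst, const uint64_t *src, size_t n, unsigned shift)
{
  size_t q = shift / LIMB_BITS;     /* whole limbs shifted out */
  unsigned r = shift % LIMB_BITS;   /* remaining bit shift */
  for (size_t i = 0; i < n; i++)
    {
      uint64_t lo = i + q < n ? src[i + q] : 0;
      uint64_t hi = i + q + 1 < n ? src[i + q + 1] : 0;
      /* Avoid the undefined shift by 64 when r == 0.  */
      dst[i] = r ? (lo >> r) | (hi << (LIMB_BITS - r)) : lo;
    }
}
```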
Multiplication, division, modulo and floating point to/from _BitInt conversions
are handled using libgcc library routines.
__builtin_{add,sub}_overflow are handled similarly to addition/subtraction, but
they aren't mergeable with anything except implicit or explicit casts/loads,
and they track the carry so that overflow can be determined at the end.
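A sketch of the carry tracking for the simplest `__builtin_add_overflow` case, where both operands and the result are the same unsigned `_BitInt(prec)`: the addition runs limb by limb like an ordinary addition, and overflow is decided at the end from the final carry and from any sum bits that spill into the padding of the top limb. The function is hypothetical and assumes 64-bit limbs with zero padding bits in the inputs; mixed-type and signed cases need more care.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define LIMB_BITS 64

/* Unsigned _BitInt(PREC) addition with overflow detection.  Inputs must
   have zero padding bits.  *RES receives the sum modulo 2^PREC; returns
   true if the mathematically exact sum does not fit.  */
bool bitint_uadd_overflow(uint64_t *res, const uint64_t *a,
                          const uint64_t *b, int prec)
{
  int n = (prec + LIMB_BITS - 1) / LIMB_BITS;
  unsigned carry = 0;
  for (int i = 0; i < n; i++)
    {
      uint64_t s = a[i] + b[i];
      unsigned c1 = s < a[i];          /* carry out of a[i] + b[i] */
      uint64_t r = s + carry;
      carry = c1 | (r < s);            /* at most one of these can be set */
      res[i] = r;
    }
  /* Overflow: a carry out of the top limb, or sum bits at or above PREC
     landing in the padding bits of the top limb.  */
  int used = prec - (n - 1) * LIMB_BITS;
  bool overflow = carry != 0;
  if (used < LIMB_BITS)
    {
      uint64_t mask = (UINT64_C(1) << used) - 1;
      overflow |= (res[n - 1] & ~mask) != 0;
      res[n - 1] &= mask;              /* keep the result modulo 2^PREC */
    }
  return overflow;
}
```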
__builtin_mul_overflow is implemented using infinite-precision library
multiplication (from range info we determine the ranges of the operands and
possibly use a temporary array to hold a large enough result) and then checking
whether all the bits above the result precision are zeros (for an unsigned
result) or copies of the sign bit (for a signed result).
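The "full product, then check the high bits" approach can be sketched with a plain schoolbook multiplication, assuming 64-bit limbs and the `unsigned __int128` extension for the 64x64->128-bit partial products. Only the unsigned-result check is shown (for a signed result the high bits would have to be sign-bit copies rather than zeros), the range-info based sizing of the temporary array is omitted, and the names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define LIMB_BITS 64
typedef unsigned __int128 u128;   /* GCC/Clang extension */

/* Full ("infinite precision") product: r[0 .. na+nb-1] = a * b,
   all arrays least significant limb first.  */
void limbs_mul(uint64_t *r, const uint64_t *a, size_t na,
               const uint64_t *b, size_t nb)
{
  for (size_t i = 0; i < na + nb; i++)
    r[i] = 0;
  for (size_t i = 0; i < na; i++)
    {
      uint64_t carry = 0;
      for (size_t j = 0; j < nb; j++)
        {
          /* Cannot overflow: (2^64-1)^2 + 2 * (2^64-1) == 2^128 - 1.  */
          u128 t = (u128) a[i] * b[j] + r[i + j] + carry;
          r[i + j] = (uint64_t) t;
          carry = (uint64_t) (t >> 64);
        }
      r[i + nb] = carry;
    }
}

/* Unsigned result of precision PREC: the multiplication did not overflow
   iff every bit of the full product at or above PREC is zero.  */
bool fits_unsigned(const uint64_t *r, size_t nr, unsigned prec)
{
  size_t k = prec / LIMB_BITS;
  unsigned bit = prec % LIMB_BITS;
  if (k < nr && (r[k] >> bit) != 0)   /* bits of limb K at or above PREC */
    return false;
  for (k++; k < nr; k++)
    if (r[k] != 0)
      return false;
  return true;
}
```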
The libgcc library routines, for multiplication, division and modulo as well as
for conversions from/to floating point, use a special calling convention, where
for each _BitInt a pointer to an array of limbs and a precision are passed. The
precision is a signed SImode value: if positive, it is a known minimum precision
in bits of an unsigned operand; if negative, its absolute value is a known
minimum precision in bits of a signed operand. That way, the compiler can
already pre-reduce the precision using e.g. range information, and at runtime
libgcc can reduce it further by skipping over most significant limbs which
contain just zeros or sign bit copies. Small _BitInt types can be passed
differently in general, but to pass them to the libgcc routines they need to be
forced into an array of limbs as well (typically just one or two limbs).
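The runtime precision reduction under this convention can be sketched as below. `reduce_prec` is a hypothetical helper, not the actual libgcc code; it assumes 64-bit limbs stored least significant first and top-limb padding bits that are already zero/sign extended, and it only reduces at whole-limb granularity.

```c
#include <assert.h>
#include <stdint.h>

#define LIMB_BITS 64

/* PREC > 0: unsigned operand needing at most PREC bits; PREC < 0: signed
   operand needing at most -PREC bits.  Returns a possibly smaller
   precision with the same sign convention, dropping most significant limbs
   that are all zeros (unsigned) or only copies of the sign bit (signed).  */
int32_t reduce_prec(const uint64_t *limbs, int32_t prec)
{
  int is_signed = prec < 0;
  int32_t bits = is_signed ? -prec : prec;
  int32_t n = (bits + LIMB_BITS - 1) / LIMB_BITS;
  while (n > 1)
    {
      /* What a redundant top limb looks like: zeros for unsigned values;
         for signed ones, copies of the sign bit of the limb below it.  */
      uint64_t fill = 0;
      if (is_signed && (limbs[n - 2] >> (LIMB_BITS - 1)))
        fill = UINT64_MAX;
      if (limbs[n - 1] != fill)
        break;
      n--;
      bits = n * LIMB_BITS;
    }
  return is_signed ? -bits : bits;
}
```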
The whole series has been successfully bootstrapped/regtested on x86_64-linux
and i686-linux.
Jakub Jelinek (5):
Middle-end _BitInt support [PR102989]
libgcc _BitInt support [PR102989]
C _BitInt support [PR102989]
testsuite part 1 for _BitInt support [PR102989]
testsuite part 2 for _BitInt support [PR102989]
gcc/Makefile.in | 1
gcc/builtins.cc | 7
gcc/c-family/c-common.cc | 11
gcc/c-family/c-common.h | 2
gcc/c-family/c-cppbuiltin.cc | 23
gcc/c-family/c-lex.cc | 164
gcc/c-family/c-pretty-print.cc | 32
gcc/c-family/c-ubsan.cc | 4
gcc/c/c-convert.cc | 1
gcc/c/c-decl.cc | 181 -
gcc/c/c-parser.cc | 27
gcc/c/c-tree.h | 18
gcc/c/c-typeck.cc | 119
gcc/calls.cc | 18
gcc/cfgexpand.cc | 13
gcc/config/i386/i386.cc | 33
gcc/convert.cc | 8
gcc/doc/tm.texi | 5
gcc/doc/tm.texi.in | 2
gcc/dwarf2out.cc | 48
gcc/expr.cc | 53
gcc/fold-const.cc | 75
gcc/gimple-expr.cc | 9
gcc/gimple-fold.cc | 5
gcc/gimple-lower-bitint.cc | 5495 +++++++++++++++++++++++++++++++
gcc/gimple-lower-bitint.h | 31
gcc/glimits.h | 5
gcc/internal-fn.cc | 117
gcc/internal-fn.def | 6
gcc/internal-fn.h | 4
gcc/lto-streamer-in.cc | 2
gcc/match.pd | 1
gcc/passes.def | 3
gcc/pretty-print.h | 19
gcc/stor-layout.cc | 70
gcc/target.def | 9
gcc/target.h | 14
gcc/targhooks.cc | 8
gcc/targhooks.h | 1
gcc/testsuite/gcc.dg/bitint-1.c | 26
gcc/testsuite/gcc.dg/bitint-10.c | 15
gcc/testsuite/gcc.dg/bitint-11.c | 9
gcc/testsuite/gcc.dg/bitint-12.c | 31
gcc/testsuite/gcc.dg/bitint-13.c | 17
gcc/testsuite/gcc.dg/bitint-14.c | 11
gcc/testsuite/gcc.dg/bitint-15.c | 10
gcc/testsuite/gcc.dg/bitint-2.c | 116
gcc/testsuite/gcc.dg/bitint-3.c | 40
gcc/testsuite/gcc.dg/bitint-4.c | 39
gcc/testsuite/gcc.dg/bitint-5.c | 63
gcc/testsuite/gcc.dg/bitint-6.c | 15
gcc/testsuite/gcc.dg/bitint-7.c | 16
gcc/testsuite/gcc.dg/bitint-8.c | 34
gcc/testsuite/gcc.dg/bitint-9.c | 52
gcc/testsuite/gcc.dg/torture/bitint-1.c | 114
gcc/testsuite/gcc.dg/torture/bitint-10.c | 38
gcc/testsuite/gcc.dg/torture/bitint-11.c | 77
gcc/testsuite/gcc.dg/torture/bitint-12.c | 128
gcc/testsuite/gcc.dg/torture/bitint-13.c | 171
gcc/testsuite/gcc.dg/torture/bitint-14.c | 140
gcc/testsuite/gcc.dg/torture/bitint-15.c | 264 +
gcc/testsuite/gcc.dg/torture/bitint-16.c | 385 ++
gcc/testsuite/gcc.dg/torture/bitint-17.c | 82
gcc/testsuite/gcc.dg/torture/bitint-18.c | 117
gcc/testsuite/gcc.dg/torture/bitint-19.c | 190 +
gcc/testsuite/gcc.dg/torture/bitint-2.c | 118
gcc/testsuite/gcc.dg/torture/bitint-20.c | 190 +
gcc/testsuite/gcc.dg/torture/bitint-21.c | 282 +
gcc/testsuite/gcc.dg/torture/bitint-22.c | 282 +
gcc/testsuite/gcc.dg/torture/bitint-23.c | 804 ++++
gcc/testsuite/gcc.dg/torture/bitint-24.c | 804 ++++
gcc/testsuite/gcc.dg/torture/bitint-25.c | 91
gcc/testsuite/gcc.dg/torture/bitint-26.c | 66
gcc/testsuite/gcc.dg/torture/bitint-27.c | 373 ++
gcc/testsuite/gcc.dg/torture/bitint-28.c | 20
gcc/testsuite/gcc.dg/torture/bitint-29.c | 24
gcc/testsuite/gcc.dg/torture/bitint-3.c | 134
gcc/testsuite/gcc.dg/torture/bitint-30.c | 19
gcc/testsuite/gcc.dg/torture/bitint-31.c | 23
gcc/testsuite/gcc.dg/torture/bitint-32.c | 24
gcc/testsuite/gcc.dg/torture/bitint-33.c | 24
gcc/testsuite/gcc.dg/torture/bitint-34.c | 24
gcc/testsuite/gcc.dg/torture/bitint-35.c | 23
gcc/testsuite/gcc.dg/torture/bitint-36.c | 23
gcc/testsuite/gcc.dg/torture/bitint-37.c | 23
gcc/testsuite/gcc.dg/torture/bitint-38.c | 56
gcc/testsuite/gcc.dg/torture/bitint-39.c | 57
gcc/testsuite/gcc.dg/torture/bitint-4.c | 134
gcc/testsuite/gcc.dg/torture/bitint-40.c | 40
gcc/testsuite/gcc.dg/torture/bitint-41.c | 34
gcc/testsuite/gcc.dg/torture/bitint-5.c | 359 ++
gcc/testsuite/gcc.dg/torture/bitint-6.c | 359 ++
gcc/testsuite/gcc.dg/torture/bitint-7.c | 386 ++
gcc/testsuite/gcc.dg/torture/bitint-8.c | 391 ++
gcc/testsuite/gcc.dg/torture/bitint-9.c | 391 ++
gcc/testsuite/gcc.dg/ubsan/bitint-1.c | 49
gcc/testsuite/gcc.dg/ubsan/bitint-2.c | 49
gcc/testsuite/gcc.dg/ubsan/bitint-3.c | 45
gcc/testsuite/lib/target-supports.exp | 27
gcc/tree-pass.h | 3
gcc/tree-pretty-print.cc | 23
gcc/tree-ssa-coalesce.cc | 148
gcc/tree-ssa-live.cc | 8
gcc/tree-ssa-live.h | 8
gcc/tree-ssa-sccvn.cc | 11
gcc/tree-switch-conversion.cc | 75
gcc/tree.cc | 67
gcc/tree.def | 5
gcc/tree.h | 90
gcc/typeclass.h | 3
gcc/ubsan.cc | 89
gcc/ubsan.h | 3
gcc/varasm.cc | 55
gcc/vr-values.cc | 27
libcpp/expr.cc | 29
libcpp/include/cpplib.h | 1
libgcc/Makefile.in | 5
libgcc/config/aarch64/t-softfp | 2
libgcc/config/i386/64/t-softfp | 2
libgcc/config/i386/libgcc-glibc.ver | 10
libgcc/config/i386/t-softfp | 5
libgcc/config/riscv/t-softfp32 | 6
libgcc/config/rs6000/t-e500v1-fp | 2
libgcc/config/rs6000/t-e500v2-fp | 2
libgcc/config/t-softfp | 2
libgcc/config/t-softfp-sfdftf | 1
libgcc/config/t-softfp-tf | 1
libgcc/libgcc-std.ver.in | 10
libgcc/libgcc2.c | 681 +++
libgcc/libgcc2.h | 15
libgcc/soft-fp/bitint.h | 306 +
libgcc/soft-fp/fixdfbitint.c | 71
libgcc/soft-fp/fixsfbitint.c | 71
libgcc/soft-fp/fixtfbitint.c | 81
libgcc/soft-fp/fixxfbitint.c | 82
libgcc/soft-fp/floatbitintbf.c | 59
libgcc/soft-fp/floatbitintdf.c | 64
libgcc/soft-fp/floatbitinthf.c | 59
libgcc/soft-fp/floatbitintsf.c | 59
libgcc/soft-fp/floatbitinttf.c | 73
libgcc/soft-fp/floatbitintxf.c | 74
libgcc/soft-fp/op-common.h | 31
142 files changed, 16814 insertions(+), 197 deletions(-)
Jakub
Comments
On Thu, 27 Jul 2023, Jakub Jelinek via Gcc-patches wrote:

> - _BitInt(N) bit-fields aren't supported yet (the patch rejects them); I'd like
> to enable those incrementally, but don't really see details on how such
> bit-fields should be laid-out in memory nor passed inside of function
> arguments; LLVM implements something, but it is a question if that is what
> the various ABIs want

So if the x86-64 ABI (or any other _BitInt ABI that already exists) doesn't specify this adequately then an issue should be filed (at https://gitlab.com/x86-psABIs/x86-64-ABI/-/issues in the x86-64 case).

(Note that the language specifies that e.g. _BitInt(123):45 gets promoted to _BitInt(123) by the integer promotions, rather than left as a type with the bit-field width.)

> - conversions between large/huge (see later) _BitInt and _Decimal{32,64,128}
> aren't support and emit a sorry; I'm not familiar enough with DFP stuff
> to implement that

Doing things incrementally might indicate first doing this only for BID (so sufficing for x86-64), with DPD support to be added when _BitInt support is added for an architecture using DPD, i.e. powerpc / s390.

This conversion is a mix of base conversion and things specific to DFP types.

For conversion *from DFP to _BitInt*, the DFP value needs to be interpreted (hopefully using existing libbid code) as the product of a sign, an integer and a power of 10, with appropriate truncation of the fractional part if there is one (and appropriate handling of infinity / NaN / values where the integer part obviously doesn't fit in the type as raising "invalid" and returning an arbitrary result). Then it's just a matter of doing an integer multiplication and producing an appropriately signed result (which might itself overflow the range of representable values with the given sign, meaning "invalid" should be raised). Precomputed tables of powers of 10 in binary might speed up the multiplication process (don't know if various existing tables in libbid are usable for that). It's unspecified whether "inexact" is raised for non-integer DFP values.

For conversion *from _BitInt to DFP*, the _BitInt value needs to be expressed in decimal. In the absence of optimized multiplication / division for _BitInt, it seems reasonable enough to do this naively (repeatedly dividing by a power of 10 that fits in one limb to determine base 10^N digits from the least significant end, for example), modulo detecting obvious overflow cases up front (if the absolute value is at least 10^97, conversion to _Decimal32 definitely overflows in all rounding modes, for example, so you just need to do an overflowing computation that produces a result with the right sign in order to get the correct rounding-mode-dependent result and exceptions). Probably it isn't necessary to convert most of those base 10^N digits into base 10 digits. Rather, it's enough to find the leading M (= precision of the DFP type in decimal digits) base 10 digits, plus to know whether what follows is exactly 0, exactly 0.5, between 0 and 0.5, or between 0.5 and 1. Then adding two appropriate DFP values with the right sign produces the final DFP result. Those DFP values would need to be produced from integer digits together with the relevant power of 10. And there might be multiple possible choices for the DFP quantum exponent; the preferred exponent for exact results is 0, so the resulting exponent needs to be chosen to be as close to 0 as possible (which also produces correct results when the result is inexact). (If the result is 0, note that quantum exponent of 0 is not the same as the zero from default initialization, which has the least exponent possible.)
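The naive _BitInt-to-decimal step suggested in this comment (repeatedly dividing by a power of 10 that fits in one limb to get base 10^N digits from the least significant end) can be sketched as follows, assuming 64-bit limbs, so N = 19, and the `unsigned __int128` extension for the 128-by-64-bit division. The function names are hypothetical; selecting the leading M decimal digits, rounding and the overflow checks are not shown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef unsigned __int128 u128;                 /* GCC/Clang extension */
#define TEN19 UINT64_C(10000000000000000000)    /* largest 10^k in 64 bits */

/* Divide the N-limb unsigned value X (least significant limb first) in
   place by the single-limb divisor D; returns the remainder.  */
uint64_t limbs_divmod_1(uint64_t *x, size_t n, uint64_t d)
{
  uint64_t rem = 0;
  for (size_t i = n; i-- > 0;)
    {
      u128 cur = ((u128) rem << 64) | x[i];     /* rem < d, so quotient fits */
      x[i] = (uint64_t) (cur / d);
      rem = (uint64_t) (cur % d);
    }
  return rem;
}

/* Split X (destroyed) into base-10^19 digits, least significant first.
   DIGITS needs room for N + N / 64 + 1 entries; returns the digit count.  */
size_t to_base_1e19(uint64_t *x, size_t n, uint64_t *digits)
{
  size_t nd = 0;
  for (;;)
    {
      digits[nd++] = limbs_divmod_1(x, n, TEN19);
      while (n > 0 && x[n - 1] == 0)            /* drop limbs now zero */
        n--;
      if (n == 0)
        return nd;
    }
}
```

As the thread notes, this takes quadratic time in the number of limbs without optimized division.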
On Thu, Jul 27, 2023 at 06:41:44PM +0000, Joseph Myers wrote:

> On Thu, 27 Jul 2023, Jakub Jelinek via Gcc-patches wrote:
>
> > - _BitInt(N) bit-fields aren't supported yet (the patch rejects them); I'd like
> > to enable those incrementally, but don't really see details on how such
> > bit-fields should be laid-out in memory nor passed inside of function
> > arguments; LLVM implements something, but it is a question if that is what
> > the various ABIs want
>
> So if the x86-64 ABI (or any other _BitInt ABI that already exists)
> doesn't specify this adequately then an issue should be filed (at
> https://gitlab.com/x86-psABIs/x86-64-ABI/-/issues in the x86-64 case).
>
> (Note that the language specifies that e.g. _BitInt(123):45 gets promoted
> to _BitInt(123) by the integer promotions, rather than left as a type with
> the bit-field width.)

Ok, I'll try to investigate in detail what LLVM does and what GCC would do if I just enabled the bit-field support, and report. Still, I'd like to handle this only as an incremental step after the rest of the _BitInt support goes in.

> > - conversions between large/huge (see later) _BitInt and _Decimal{32,64,128}
> > aren't support and emit a sorry; I'm not familiar enough with DFP stuff
> > to implement that
>
> Doing things incrementally might indicate first doing this only for BID
> (so sufficing for x86-64), with DPD support to be added when _BitInt
> support is added for an architecture using DPD, i.e. powerpc / s390.
>
> This conversion is a mix of base conversion and things specific to DFP
> types.

I had a brief look at libbid and am totally unimpressed. It seems we don't implement {,unsigned} __int128 <-> _Decimal{32,64,128} conversions at all (we emit calls to __bid_* functions which don't exist), the library (or the way we configure it) doesn't care about exceptions nor rounding mode (see the following testcase) and for integral <-> _Decimal32 conversions implements them as integral <-> _Decimal64 <-> _Decimal32 conversions. While in the _Decimal32 -> _Decimal64 -> integral direction that is probably ok, even if exceptions and rounding (other than to nearest) were supported, the other direction I'm sure can suffer from double rounding.

So, I wonder if it wouldn't be better to implement these in the soft-fp infrastructure which at least has the exception and rounding mode support. Unlike DPD, decoding BID seems to be about 2 simple tests of the 4 bits below the sign bit and doing some shifts, so not something one needs a 10MB library for. Now, sure, 5MB out of that are generated tables in bid_binarydecimal.c, but unfortunately those are static and not in a form which could be directly fed into multiplication (unless we'd want to go through conversions to/from strings). So, it seems to be easier to guess the needed power of 10 from the number of binary digits or vice versa, have a small table of powers of 10 (say those which fit into a limb) and construct larger powers of 10 by multiplying those several times; _Decimal128 has exponent up to 6144 which is ~ 2552 bytes or 319 64-bit limbs, but having a table with all the 6144 powers of ten would be just huge. A 64-bit limb fits powers of ten up to 10^19, so we might need say < 32 multiplications to cover it all (but with the current 575 bits limitation far fewer). Perhaps later on we could write a few selected powers of 10 as _BitInt to decrease that number.

> For conversion *from _BitInt to DFP*, the _BitInt value needs to be
> expressed in decimal. In the absence of optimized multiplication /
> division for _BitInt, it seems reasonable enough to do this naively
> (repeatedly dividing by a power of 10 that fits in one limb to determine
> base 10^N digits from the least significant end, for example), modulo
> detecting obvious overflow cases up front (if the absolute value is at

Wouldn't it be cheaper to guess using the 10^3 ~= 2^10 approximation and instead repeatedly multiply like in the other direction and then just divide once with remainder?

Jakub

```c
/* Checks whether the _Decimal64 <-> long long conversions honor
   exceptions and rounding modes.  */
#include <fenv.h>

int
main ()
{
  volatile _Decimal64 d;
  volatile long long l;
  int e;
  feclearexcept (FE_ALL_EXCEPT);
  d = __builtin_infd64 ();
  l = d;
  e = fetestexcept (FE_INVALID);
  feclearexcept (FE_ALL_EXCEPT);
  __builtin_printf ("%016llx %d\n", l, e != 0);
  l = 999999999999999950LL;
  fesetround (FE_TONEAREST);
  d = l;
  __builtin_printf ("%lld\n", (long long) d);
  fesetround (FE_UPWARD);
  d = l;
  fesetround (FE_TONEAREST);
  __builtin_printf ("%lld\n", (long long) d);
  fesetround (FE_DOWNWARD);
  d = l;
  fesetround (FE_TONEAREST);
  __builtin_printf ("%lld\n", (long long) d);
  l = 999999999999999901LL;
  fesetround (FE_TONEAREST);
  d = l;
  __builtin_printf ("%lld\n", (long long) d);
  fesetround (FE_UPWARD);
  d = l;
  fesetround (FE_TONEAREST);
  __builtin_printf ("%lld\n", (long long) d);
  fesetround (FE_DOWNWARD);
  d = l;
  fesetround (FE_TONEAREST);
  __builtin_printf ("%lld\n", (long long) d);
}
```
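The powers-of-ten construction proposed in the reply above (a small table of powers of 10 that fit in one limb, combined by multiplication) can be sketched like this. For simplicity this hypothetical version performs one bignum-by-limb multiplication per factor of 10^19, i.e. about k/19 multiplications; keeping larger precomputed powers, as discussed in the thread, would reduce that count. It assumes little-endian limb order and the `unsigned __int128` extension.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef unsigned __int128 u128;   /* GCC/Clang extension */

/* Multiply the N-limb value X (least significant limb first) in place by
   the single limb M; returns the new limb count (X needs room for N + 1).  */
size_t limbs_mul_1(uint64_t *x, size_t n, uint64_t m)
{
  uint64_t carry = 0;
  for (size_t i = 0; i < n; i++)
    {
      u128 t = (u128) x[i] * m + carry;
      x[i] = (uint64_t) t;
      carry = (uint64_t) (t >> 64);
    }
  if (carry)
    x[n++] = carry;
  return n;
}

/* X = 10^K built only from powers of ten that fit in one 64-bit limb:
   start from 10^(K % 19), then multiply by 10^19 K / 19 times.
   X must be large enough; returns the limb count.  */
size_t pow10_limbs(uint64_t *x, unsigned k)
{
  uint64_t small = 1;             /* 10^(K % 19) <= 10^18, fits a limb */
  for (unsigned i = 0; i < k % 19; i++)
    small *= 10;
  x[0] = small;
  size_t n = 1;
  for (unsigned i = 0; i < k / 19; i++)
    n = limbs_mul_1(x, n, UINT64_C(10000000000000000000));
  return n;
}
```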
On Fri, 28 Jul 2023, Jakub Jelinek via Gcc-patches wrote: > I had a brief look at libbid and am totally unimpressed. > Seems we don't implement {,unsigned} __int128 <-> _Decimal{32,64,128} > conversions at all (we emit calls to __bid_* functions which don't exist), That's bug 65833. > the library (or the way we configure it) doesn't care about exceptions nor > rounding mode (see following testcase) And this is related to the never-properly-resolved issue about the split of responsibility between libgcc, libdfp and glibc. Decimal floating point has its own rounding mode, set with fe_dec_setround and read with fe_dec_getround (so this test is incorrect). In some cases (e.g. Power), that's a hardware rounding mode. In others, it needs to be implemented in software as a TLS variable. In either case, it's part of the floating-point environment, so should be included in the state manipulated by functions using fenv_t or femode_t. Exceptions are shared with binary floating point. libbid in libgcc has its own TLS rounding mode and exceptions state, but the former isn't connected to fe_dec_setround / fe_dec_getround functions, while the latter isn't the right way to do things when there's hardware exceptions state. libdfp - https://github.com/libdfp/libdfp - is a separate library, not part of libgcc or glibc (and with its own range of correctness bugs) - maintained, but not very actively (maybe more so than the DFP support in GCC - we haven't had a listed DFP maintainer since 2019). It has various standard DFP library functions - maybe not the full C23 set, though some of the TS 18661-2 functions did get added, so it's not just the old TR 24732 set. That includes its own version of the libgcc support, which I think has some more support for using exceptions and rounding modes. It includes the fe_dec_getround and fe_dec_setround functions. It doesn't do anything to help with the issue of including the DFP rounding state in the state manipulated by functions such as fegetenv. 
Being a separate library probably in turn means that it's less likely to be used (although any code that uses DFP can probably readily enough choose to use a separate library if it wishes). And it introduces issues with linker command line ordering, if the user intends to use libdfp's copy of the functions but the linker processes -lgcc first. For full correctness, at least some functionality (such as the rounding modes and associated inclusion in fenv_t) would probably need to go in glibc. See https://sourceware.org/pipermail/libc-alpha/2019-September/106579.html for more discussion. But if you do put some things in glibc, maybe you still don't want the _BitInt conversions there? Rather, if you keep the _BitInt conversions in libgcc (even when the other support is in glibc), you'd have some libc-provided interface for libgcc code to get the DFP rounding mode from glibc in the case where it's handled in software, like some interfaces already present in the soft-float powerpc case to provide access to its floating-point state from libc (and something along the lines of sfp-machine.h could tell libgcc how to use either that interface or hardware instructions to access the rounding mode and exceptions as needed). > and for integral <-> _Decimal32 > conversions implement them as integral <-> _Decimal64 <-> _Decimal32 > conversions. While in the _Decimal32 -> _Decimal64 -> integral > direction that is probably ok, even if exceptions and rounding (other than > to nearest) were supported, the other direction I'm sure can suffer from > double rounding. Yes, double rounding would be an issue for converting 64-bit integers to _Decimal32 via _Decimal64 (it would be fine to convert 32-bit integers like that since they can be exactly represented in _Decimal64; it would be fine to convert 64-bit integers via _Decimal128). > So, wonder if it wouldn't be better to implement these in the soft-fp > infrastructure which at least has the exception and rounding mode support. 
> Unlike DPD, decoding BID seems to be about 2 simple tests of the 4 bits
> below the sign bit and doing some shifts, so not something one needs a 10MB
> of a library for. Now, sure, 5MB out of that are generated tables in

Note that representations with too-large significand are defined to be noncanonical representations of zero, so you need to take care of that in decoding BID.

> bid_binarydecimal.c, but unfortunately those are static and not in a form
> which could be directly fed into multiplication (unless we'd want to go
> through conversions to/from strings).
> So, it seems to be easier to guess needed power of 10 from number of binary
> digits or vice versa, have a small table of powers of 10 (say those which
> fit into a limb) and construct larger powers of 10 by multiplying those
> several times, _Decimal128 has exponent up to 6144 which is ~ 2552 bytes
> or 319 64-bit limbs, but having a table with all the 6144 powers of ten
> would be just huge. In 64-bit limb fit power of ten until 10^19, so we
> might need say < 32 multiplications to cover it all (but with the current
> 575 bits limitation far less). Perhaps later on write a few selected powers
> of 10 as _BitInt to decrease that number.

You could e.g. have a table up to 10^(N-1) for some N, and 10^N, 10^2N etc. up to 10^6144 (or rather up to 10^6111, which can then be multiplied by a 34-digit integer significand), so that only one multiplication is needed to get the power of 10 and then a second multiplication by the significand. (Or split into three parts at the cost of an extra multiplication, or multiply the significand by 1, 10, 100, 1000 or 10000 as a multiplication within 128 bits and so only need to compute 10^k for k a multiple of 5, or any number of variations on those themes.)

> > For conversion *from _BitInt to DFP*, the _BitInt value needs to be
> > expressed in decimal.
> > In the absence of optimized multiplication /
> > division for _BitInt, it seems reasonable enough to do this naively
> > (repeatedly dividing by a power of 10 that fits in one limb to determine
> > base 10^N digits from the least significant end, for example), modulo
> > detecting obvious overflow cases up front (if the absolute value is at
>
> Wouldn't it be cheaper to guess using the 10^3 ~= 2^10 approximation
> and instead repeatedly multiply like in the other direction and then just
> divide once with remainder?

I don't know what's most efficient here, given that it's quadratic in the absence of optimized multiplication / division (so a choice between different approaches that take quadratic time).
On Fri, Jul 28, 2023 at 06:03:33PM +0000, Joseph Myers wrote:
> You could e.g. have a table up to 10^(N-1) for some N, and 10^N, 10^2N
> etc. up to 10^6144 (or rather up to 10^6111, which can then be multiplied
> by a 34-digit integer significand), so that only one multiplication is
> needed to get the power of 10 and then a second multiplication by the
> significand. (Or split into three parts at the cost of an extra
> multiplication, or multiply the significand by 1, 10, 100, 1000 or 10000
> as a multiplication within 128 bits and so only need to compute 10^k for k
> a multiple of 5, or any number of variations on those themes.)

So, I've done some quick counting. If we want at most one multiplication to get 10^X for X in 0..6111 (plus another to multiply the mantissa by that), with one table holding 10^1..10^(N-1) and another holding 10^(YN) for Y in 1..6111/N, I get for 64-bit limbs:

S1 - size of the 10^1..10^(N-1) table in bytes
S2 - size of the 10^(YN) table in bytes
S  - S1 + S2

  N      S1      S2       S
 20     152  388792  388944
 32     344  241848  242192
 64    1104  121560  122664
128    3896   60144   64040
255   14472   29320   43792
256   14584   29440   44024
266   15704   28032   43736
384   32072   19192   51264
512   56384   14080   70464

266 seems to be the minimum, though the difference from 256 is minimal and having N a power of 2 seems cheaper.

Though, the above just counts the bytes of the 64-bit limb arrays concatenated together; I think it will be helpful to also have an unsigned short table with the indexes into the limb array (so another 256*2 + 24*2 bytes). For something not in libgcc_s.so but in libgcc.a, I guess 43.5KiB of .rodata might be acceptable to make it fast.

	Jakub