From patchwork Thu Sep 29 15:55:50 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Jakub Jelinek <jakub@redhat.com>
X-Patchwork-Id: 1551
Return-Path: <gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org>
Delivered-To: ouuuleilei@gmail.com
Received: by 2002:a5d:4ac7:0:0:0:0:0 with SMTP id y7csp47179wrs;
        Thu, 29 Sep 2022 08:57:10 -0700 (PDT)
X-Google-Smtp-Source: 
 AMsMyM6CiwG6PHTLKfC99wlUW4fjPEXz0OQtYL+e0ZuqWTKKq4D7g3f0ZbT/mhiF2k4QZFTLiXbV
X-Received: by 2002:a17:907:2d9e:b0:782:69f2:a0ec with SMTP id
 gt30-20020a1709072d9e00b0078269f2a0ecmr3085824ejc.680.1664467030476;
        Thu, 29 Sep 2022 08:57:10 -0700 (PDT)
ARC-Seal: i=1; a=rsa-sha256; t=1664467030; cv=none;
        d=google.com; s=arc-20160816;
        b=Ni8CE8LabdPL8ewNrjLqsAB8Ut7gxrXj9LKOlkfjV5I6Q1qNrWaHxabnmNtpAvh8Iv
         XsQYobv+z2Rgv6MBFJEIlM4e59sFucCtbRXZblI70fIc+RHG+mSrjBVK3xc5WOEn0NME
         ukMnh6WaSMgi8btdQfnUYoGq0UCgBjBf0N2H1McrzUvinKjxYXqfrmHYSnmHy4x9OUwv
         I4Sp29DoHU40uRmffDMVNwiWX95nP8QN6FJGCghYraa13YrPb4AWPXu/u1TYsg2YdrMp
         v6+riz9d0IsgF7zi/qGUUAvjMz/zFJ6FiyIHyVyJKjZNcmScrvbXyp1o0U+s1C59/hnc
         JiXQ==
ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com;
 s=arc-20160816;
        h=sender:errors-to:cc:reply-to:from:list-subscribe:list-help
         :list-post:list-archive:list-unsubscribe:list-id:precedence
         :content-disposition:mime-version:message-id:subject:to:date
         :dmarc-filter:delivered-to:dkim-signature:dkim-filter;
        bh=0tZY4UvMqFpc9gpOO8T5mQii0ndq+aVmMhxcJIjCDjk=;
        b=XyefGTLAkdpciEpub1jUjXSrDOXouASx+mSTgWR1iNt5+WrWBv+5253Lv0I9OJL50H
         0nn+0n673G6diepms0wvG5/UDMIvoNK+JY2dYycuX0CV1fHu5KZoRnKfi7Ic8CNKBkhj
         ozS+AplC/BmLMEhslhzF9gV+HkY9qQOrOxl9UNJLfvHjiwJT1yPH7ZxNFMl+8ZH6lPoN
         nL+Y6u7ywSIT4K36G9D7hV04J9zf9HhtunlG2BNfZoYjYCQB9Rw+QYuqePpHJ2ck/Hn6
         g6IIuONux4dvgzTeC2VXQOVjI+Tme2lGsok1kghI/G6s8bg/zfyObfo4JZ3TaHogP1Hk
         x48g==
ARC-Authentication-Results: i=1; mx.google.com;
       dkim=pass header.i=@gcc.gnu.org header.s=default header.b=vgIGUevR;
       spf=pass (google.com: domain of
 gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org designates 8.43.85.97 as
 permitted sender)
 smtp.mailfrom="gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org";
       dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=gnu.org
Received: from sourceware.org (ip-8-43-85-97.sourceware.org. [8.43.85.97])
        by mx.google.com with ESMTPS id
 g18-20020a50d5d2000000b0044ea2b1c817si7338816edj.244.2022.09.29.08.57.10
        for <ouuuleilei@gmail.com>
        (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256);
        Thu, 29 Sep 2022 08:57:10 -0700 (PDT)
Received-SPF: pass (google.com: domain of
 gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org designates 8.43.85.97 as
 permitted sender) client-ip=8.43.85.97;
Authentication-Results: mx.google.com;
       dkim=pass header.i=@gcc.gnu.org header.s=default header.b=vgIGUevR;
       spf=pass (google.com: domain of
 gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org designates 8.43.85.97 as
 permitted sender)
 smtp.mailfrom="gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org";
       dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=gnu.org
Received: from server2.sourceware.org (localhost [IPv6:::1])
	by sourceware.org (Postfix) with ESMTP id 4648638582A0
	for <ouuuleilei@gmail.com>; Thu, 29 Sep 2022 15:57:09 +0000 (GMT)
DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 4648638582A0
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org;
	s=default; t=1664467029;
	bh=0tZY4UvMqFpc9gpOO8T5mQii0ndq+aVmMhxcJIjCDjk=;
	h=Date:To:Subject:List-Id:List-Unsubscribe:List-Archive:List-Post:
	 List-Help:List-Subscribe:From:Reply-To:Cc:From;
	b=vgIGUevR4A6meuWziXIElUC7gpeeGWX6U4njhTaMrawS2xR5gSTJ0jsKW0yMW6BFv
	 XFXEDebEqSqoAJp8D3PqmWCvz4luIJerawkKYpdywCvNm5KWZ57bJ7UXWRYNPdPt9J
	 tN0sOF7fuMawvjnWTAKXR9vBkgJPyfAYHY0bzRDk=
X-Original-To: gcc-patches@gcc.gnu.org
Delivered-To: gcc-patches@gcc.gnu.org
Received: from us-smtp-delivery-124.mimecast.com
 (us-smtp-delivery-124.mimecast.com [170.10.129.124])
 by sourceware.org (Postfix) with ESMTPS id 75A983857BBF
 for <gcc-patches@gcc.gnu.org>; Thu, 29 Sep 2022 15:56:02 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 75A983857BBF
Received: from mimecast-mx02.redhat.com (mx3-rdu2.redhat.com
 [66.187.233.73]) by relay.mimecast.com with ESMTP with STARTTLS
 (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id
 us-mta-591-Ms9BQxNVPt-o4WOGB1JNQg-1; Thu, 29 Sep 2022 11:55:57 -0400
X-MC-Unique: Ms9BQxNVPt-o4WOGB1JNQg-1
Received: from smtp.corp.redhat.com (int-mx08.intmail.prod.int.rdu2.redhat.com
 [10.11.54.8])
 (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 6BB2E3806658;
 Thu, 29 Sep 2022 15:55:56 +0000 (UTC)
Received: from tucnak.zalov.cz (unknown [10.39.192.194])
 by smtp.corp.redhat.com (Postfix) with ESMTPS id B3D98C15BA4;
 Thu, 29 Sep 2022 15:55:55 +0000 (UTC)
Received: from tucnak.zalov.cz (localhost [127.0.0.1])
 by tucnak.zalov.cz (8.17.1/8.17.1) with ESMTPS id 28TFtqlq3938028
 (version=TLSv1.3 cipher=TLS_AES_256_GCM_SHA384 bits=256 verify=NOT);
 Thu, 29 Sep 2022 17:55:52 +0200
Received: (from jakub@localhost)
 by tucnak.zalov.cz (8.17.1/8.17.1/Submit) id 28TFtocF3938027;
 Thu, 29 Sep 2022 17:55:50 +0200
Date: Thu, 29 Sep 2022 17:55:50 +0200
To: Jason Merrill <jason@redhat.com>,
 "Joseph S. Myers" <joseph@codesourcery.com>,
 Hongtao Liu <crazylht@gmail.com>, hjl.tools@gmail.com,
 Richard Earnshaw <richard.earnshaw@arm.com>,
 Kyrylo Tkachov <kyrylo.tkachov@arm.com>, richard.sandiford@arm.com
Subject: [RFC PATCH] c++, i386, arm, aarch64, libgcc: std::bfloat16_t and
 __bf16 arithmetic support
Message-ID: <YzXABvJX2wl3gHkK@tucnak>
MIME-Version: 1.0
X-Scanned-By: MIMEDefang 3.1 on 10.11.54.8
X-Mimecast-Spam-Score: 0
X-Mimecast-Originator: redhat.com
Content-Disposition: inline
X-Spam-Status: No, score=-25.9 required=5.0 tests=BAYES_00, DKIM_INVALID,
 DKIM_SIGNED, KAM_DMARC_NONE, KAM_DMARC_STATUS, KAM_MANYTO, KAM_SHORT,
 RCVD_IN_DNSWL_LOW, SPF_HELO_NONE, SPF_NONE,
 TXREP autolearn=no autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on
 server2.sourceware.org
X-BeenThere: gcc-patches@gcc.gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Gcc-patches mailing list <gcc-patches.gcc.gnu.org>
List-Unsubscribe: <https://gcc.gnu.org/mailman/options/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=unsubscribe>
List-Archive: <https://gcc.gnu.org/pipermail/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-request@gcc.gnu.org?subject=help>
List-Subscribe: <https://gcc.gnu.org/mailman/listinfo/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=subscribe>
X-Patchwork-Original-From: Jakub Jelinek via Gcc-patches
 <gcc-patches@gcc.gnu.org>
From: Jakub Jelinek <jakub@redhat.com>
Reply-To: Jakub Jelinek <jakub@redhat.com>
Cc: gcc-patches@gcc.gnu.org
Errors-To: gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org
Sender: "Gcc-patches" <gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org>
X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?=
X-GMAIL-THRID: =?utf-8?q?1745320180652338316?=
X-GMAIL-MSGID: =?utf-8?q?1745320180652338316?=

Hi!

Here is more complete patch to add std::bfloat16_t support on
x86, AArch64 and (only partially) on ARM 32-bit.  No BFmode optabs
are added by the patch, so for binops/unops it extends to SFmode
first and then truncates back to BFmode.
For {HF,SF,DF,XF,TF}mode -> BFmode conversions libgcc has implementations
of all those conversions so that we avoid double rounding, for
BFmode -> {DF,XF,TF}mode conversions to avoid growing libgcc too much
it emits BFmode -> SFmode conversion first and then converts to the even
wider mode, neither step should be imprecise.
For BFmode -> HFmode, it first emits a precise BFmode -> SFmode conversion
and then SFmode -> HFmode, because neither format is subset or superset
of the other, while SFmode is superset of both.
expr.cc then contains a -ffast-math optimization of the BF -> SF and
SF -> BF conversions if we don't optimize for space (and for the latter
if -frounding-math isn't enabled either).
For x86, perhaps truncsfbf2 optab could be defined for TARGET_AVX512BF16
but IMNSHO should FAIL if !flag_finite_math || flag_rounding_math
|| !flag_unsafe_math_optimizations, because I think the insn doesn't
raise on sNaNs, hardcodes round to nearest and flushes denormals to zero.
In C by default (unless x86 -fexcess-precision=16) we use float excess
precision for BFmode, so truncate only on explicit casts and assignments.
In C++ unfortunately (but that is the case of also _Float16) we don't
support excess precision yet which means that for
__bf16 (__bf16 a, __bf16 b, __bf16 c, __bf16 d) { return a * b + c * d; }
we do a lot of conversions.
The aarch64 part is untested but has a chance of working (IMHO),
though I'd appreciate if ARM maintainers could decide whether it is
acceptable for them that __bf16 changes mangling and will allow arithmetics
and conversions.
The arm part is partial, libgcc side is missing as the target doesn't really
seem to use soft-fp right now.  Perhaps the config/arm/ changes can be
left out from the patch (thus keep ARM 32-bit __bf16 as before) and support
for it can be done at some later time.

Thoughts on this?

2022-09-29  Jakub Jelinek  <jakub@redhat.com>

gcc/
	* tree-core.h (enum tree_index): Add TI_BFLOAT16_TYPE.
	* tree.h (bfloat16_type_node): Define.
	* tree.cc (excess_precision_type): Promote bfloat16_type_mode
	like float16_type_mode.
	* expmed.h (maybe_expand_shift): Declare.
	* expmed.cc (maybe_expand_shift): No longer static.
	* expr.cc (convert_mode_scalar): Don't ICE on BF -> HF or HF -> BF
	conversions.  If there is no optab, handle BF -> {DF,XF,TF,HF}
	conversions as separate BF -> SF -> {DF,XF,TF,HF} conversions, add
	-ffast-math generic implementation for BF -> SF and SF -> BF
	conversions.
	* config/arm/arm.h (arm_bf16_type_node): Remove.
	(arm_bf16_ptr_type_node): Adjust comment.
	* config/arm/arm.cc (TARGET_INVALID_UNARY_OP,
	TARGET_INVALID_BINARY_OP): Don't redefine.
	(arm_mangle_type): Mangle BFmode as DFb16_.
	(arm_invalid_conversion): Only reject BF <-> HF conversions if
	HFmode is non-IEEE format.
	(arm_invalid_unary_op, arm_invalid_binary_op): Remove.
	* config/arm/arm-builtins.cc (arm_bf16_type_node): Remove.
	(arm_simd_builtin_std_type): Use bfloat16_type_node rather than
	arm_bf16_type_node.
	(arm_init_simd_builtin_types): Likewise.
	(arm_init_simd_builtin_scalar_types): Likewise.
	(arm_init_bf16_types): Likewise.
	* config/i386/i386.cc (ix86_mangle_type): Mangle BFmode as DFb16_.
	(ix86_invalid_conversion, ix86_invalid_unary_op,
	ix86_invalid_binary_op): Remove.
	(TARGET_INVALID_CONVERSION, TARGET_INVALID_UNARY_OP,
	TARGET_INVALID_BINARY_OP): Don't redefine.
	* config/i386/i386-builtins.cc (ix86_bf16_type_node): Remove.
	(ix86_register_bf16_builtin_type): Use bfloat16_type_node rather than
	ix86_bf16_type_node.
	* config/i386/i386-builtin-types.def (BFLOAT16): Likewise.
	* config/aarch64/aarch64.h (aarch64_bf16_type_node): Remove.
	(aarch64_bf16_ptr_type_node): Adjust comment.
	* config/aarch64/aarch64.cc (aarch64_gimplify_va_arg_expr): Use
	bfloat16_type_node rather than aarch64_bf16_type_node.
	(aarch64_mangle_type): Mangle BFmode as DFb16_.
	(aarch64_invalid_conversion, aarch64_invalid_unary_op): Remove.
	aarch64_invalid_binary_op): Remove BFmode related rejections.
	(TARGET_INVALID_CONVERSION, TARGET_INVALID_UNARY_OP): Don't redefine.
	* config/aarch64/aarch64-builtins.cc (aarch64_bf16_type_node): Remove.
	(aarch64_int_or_fp_type): Use bfloat16_type_node rather than
	aarch64_bf16_type_node.
	(aarch64_init_simd_builtin_types, aarch64_init_bf16_types): Likewise.
	* config/aarch64/aarch64-sve-builtins.def (svbfloat16_t): Likewise.
gcc/c-family/
	* c-cppbuiltin.cc (c_cpp_builtins): If bfloat16_type_node,
	predefine for C++ __BFLT16_*__ macros and for C++23 also
	__STDCPP_BFLOAT16_T__.
	* c-lex.cc (interpret_float): Handle CPP_N_BFLOAT16 for C++.
gcc/cp/
	* cp-tree.h (extended_float_type_p): Return true for
	bfloat16_type_node.
	* typeck.cc (cp_compare_floating_point_conversion_ranks): Set
	extended{1,2} if mv{1,2} is bfloat16_type_node.  Adjust comment.
libcpp/
	* include/cpplib.h (CPP_N_BFLOAT16): Define.
	* expr.cc (interpret_float_suffix): Handle bf16 and BF16 suffixes for
	C++.
libgcc/
	* config/arm/sfp-machine.h (_FP_NANFRAC_B): Define.
	* config/aarch64/t-softfp (softfp_extensions): Add bfsf.
	(softfp_truncations): Add tfbf dfbf sfbf hfbf.
	* config/aarch64/libgcc-softfp.ver (GCC_13.0.0): Export
	__extendbfsf2 and __trunc{s,d,t,h}fbf2.
	* config/aarch64/sfp-machine.h (_FP_NANFRAC_B): Define.
	* config/i386/t-softfp (softfp_extensions): Add bfsf.
	(softfp_truncations): Add tfbf xfbf dfbf sfbf hfbf.
	* config/i386/libgcc-glibc.ver (GCC_13.0.0): Export
	__extendbfsf2 and __trunc{s,d,x,t,h}fbf2.
	* config/i386/sfp-machine.h (_FP_NANSIGN_B): Define.
	* config/i386/64/sfp-machine.h (_FP_NANFRAC_B): Define.
	* config/i386/32/sfp-machine.h (_FP_NANFRAC_B): Define.
	* soft-fp/brain.h: New file.
	* soft-fp/truncsfbf2.c: New file.
	* soft-fp/truncdfbf2.c: New file.
	* soft-fp/truncxfbf2.c: New file.
	* soft-fp/trunctfbf2.c: New file.
	* soft-fp/trunchfbf2.c: New file.
	* soft-fp/truncbfhf2.c: New file.
	* soft-fp/extendbfsf2.c: New file.
libiberty/
	* cp-demangle.h (D_BUILTIN_TYPE_COUNT): Increment.
	* cp-demangle.c (cplus_demangle_builtin_types): Add std::bfloat16_t
	entry.
	(cplus_demangle_type): Demangle DFb16_.
	* testsuite/demangle-expected (_Z3xxxDFb16_): New test.


	Jakub

--- gcc/tree-core.h.jj	2022-09-29 09:13:25.717718458 +0200
+++ gcc/tree-core.h	2022-09-29 12:40:17.417778754 +0200
@@ -665,6 +665,9 @@ enum tree_index {
   TI_DOUBLE_TYPE,
   TI_LONG_DOUBLE_TYPE,
 
+  /* __bf16 type if supported (used in C++ as std::bfloat16_t).  */
+  TI_BFLOAT16_TYPE,
+
   /* The _FloatN and _FloatNx types must be consecutive, and in the
      same sequence as the corresponding complex types, which must also
      be consecutive; _FloatN must come before _FloatNx; the order must
--- gcc/tree.h.jj	2022-09-29 09:13:25.720718416 +0200
+++ gcc/tree.h	2022-09-29 12:40:17.416778768 +0200
@@ -4285,6 +4285,7 @@ tree_strip_any_location_wrapper (tree ex
 #define float_type_node			global_trees[TI_FLOAT_TYPE]
 #define double_type_node		global_trees[TI_DOUBLE_TYPE]
 #define long_double_type_node		global_trees[TI_LONG_DOUBLE_TYPE]
+#define bfloat16_type_node		global_trees[TI_BFLOAT16_TYPE]
 
 /* Nodes for particular _FloatN and _FloatNx types in sequence.  */
 #define FLOATN_TYPE_NODE(IDX)		global_trees[TI_FLOATN_TYPE_FIRST + (IDX)]
--- gcc/tree.cc.jj	2022-09-29 09:13:31.328641080 +0200
+++ gcc/tree.cc	2022-09-29 12:40:17.400778985 +0200
@@ -7711,7 +7711,7 @@ excess_precision_type (tree type)
     = (flag_excess_precision == EXCESS_PRECISION_FAST
        ? EXCESS_PRECISION_TYPE_FAST
        : (flag_excess_precision == EXCESS_PRECISION_FLOAT16
-	  ? EXCESS_PRECISION_TYPE_FLOAT16 :EXCESS_PRECISION_TYPE_STANDARD));
+	  ? EXCESS_PRECISION_TYPE_FLOAT16 : EXCESS_PRECISION_TYPE_STANDARD));
 
   enum flt_eval_method target_flt_eval_method
     = targetm.c.excess_precision (requested_type);
@@ -7736,6 +7736,9 @@ excess_precision_type (tree type)
   machine_mode float16_type_mode = (float16_type_node
 				    ? TYPE_MODE (float16_type_node)
 				    : VOIDmode);
+  machine_mode bfloat16_type_mode = (bfloat16_type_node
+				     ? TYPE_MODE (bfloat16_type_node)
+				     : VOIDmode);
   machine_mode float_type_mode = TYPE_MODE (float_type_node);
   machine_mode double_type_mode = TYPE_MODE (double_type_node);
 
@@ -7747,16 +7750,19 @@ excess_precision_type (tree type)
 	switch (target_flt_eval_method)
 	  {
 	  case FLT_EVAL_METHOD_PROMOTE_TO_FLOAT:
-	    if (type_mode == float16_type_mode)
+	    if (type_mode == float16_type_mode
+		|| type_mode == bfloat16_type_mode)
 	      return float_type_node;
 	    break;
 	  case FLT_EVAL_METHOD_PROMOTE_TO_DOUBLE:
 	    if (type_mode == float16_type_mode
+		|| type_mode == bfloat16_type_mode
 		|| type_mode == float_type_mode)
 	      return double_type_node;
 	    break;
 	  case FLT_EVAL_METHOD_PROMOTE_TO_LONG_DOUBLE:
 	    if (type_mode == float16_type_mode
+		|| type_mode == bfloat16_type_mode
 		|| type_mode == float_type_mode
 		|| type_mode == double_type_mode)
 	      return long_double_type_node;
@@ -7774,16 +7780,19 @@ excess_precision_type (tree type)
 	switch (target_flt_eval_method)
 	  {
 	  case FLT_EVAL_METHOD_PROMOTE_TO_FLOAT:
-	    if (type_mode == float16_type_mode)
+	    if (type_mode == float16_type_mode
+		|| type_mode == bfloat16_type_mode)
 	      return complex_float_type_node;
 	    break;
 	  case FLT_EVAL_METHOD_PROMOTE_TO_DOUBLE:
 	    if (type_mode == float16_type_mode
+		|| type_mode == bfloat16_type_mode
 		|| type_mode == float_type_mode)
 	      return complex_double_type_node;
 	    break;
 	  case FLT_EVAL_METHOD_PROMOTE_TO_LONG_DOUBLE:
 	    if (type_mode == float16_type_mode
+		|| type_mode == bfloat16_type_mode
 		|| type_mode == float_type_mode
 		|| type_mode == double_type_mode)
 	      return complex_long_double_type_node;
--- gcc/expmed.h.jj	2022-07-26 10:32:23.681271790 +0200
+++ gcc/expmed.h	2022-09-29 15:18:46.457023535 +0200
@@ -707,6 +707,8 @@ extern rtx expand_variable_shift (enum t
 				  rtx, tree, rtx, int);
 extern rtx expand_shift (enum tree_code, machine_mode, rtx, poly_int64, rtx,
 			 int);
+extern rtx maybe_expand_shift (enum tree_code, machine_mode, rtx, int, rtx,
+			       int);
 #ifdef GCC_OPTABS_H
 extern rtx expand_divmod (int, enum tree_code, machine_mode, rtx, rtx,
 			  rtx, int, enum optab_methods = OPTAB_LIB_WIDEN);
--- gcc/expmed.cc.jj	2022-08-31 10:20:20.000000000 +0200
+++ gcc/expmed.cc	2022-09-29 15:17:52.224769673 +0200
@@ -2705,7 +2705,7 @@ expand_shift (enum tree_code code, machi
 
 /* Likewise, but return 0 if that cannot be done.  */
 
-static rtx
+rtx
 maybe_expand_shift (enum tree_code code, machine_mode mode, rtx shifted,
 		    int amount, rtx target, int unsignedp)
 {
--- gcc/expr.cc.jj	2022-09-09 09:50:35.228575531 +0200
+++ gcc/expr.cc	2022-09-29 17:09:46.716352938 +0200
@@ -344,7 +344,11 @@ convert_mode_scalar (rtx to, rtx from, i
       gcc_assert ((GET_MODE_PRECISION (from_mode)
 		   != GET_MODE_PRECISION (to_mode))
 		  || (DECIMAL_FLOAT_MODE_P (from_mode)
-		      != DECIMAL_FLOAT_MODE_P (to_mode)));
+		      != DECIMAL_FLOAT_MODE_P (to_mode))
+		  || (REAL_MODE_FORMAT (from_mode) == &arm_bfloat_half_format
+		      && REAL_MODE_FORMAT (to_mode) == &ieee_half_format)
+		  || (REAL_MODE_FORMAT (to_mode) == &arm_bfloat_half_format
+		      && REAL_MODE_FORMAT (from_mode) == &ieee_half_format));
 
       if (GET_MODE_PRECISION (from_mode) == GET_MODE_PRECISION (to_mode))
 	/* Conversion between decimal float and binary float, same size.  */
@@ -364,6 +368,150 @@ convert_mode_scalar (rtx to, rtx from, i
 	  return;
 	}
 
+#ifdef HAVE_SFmode
+      if (REAL_MODE_FORMAT (from_mode) == &arm_bfloat_half_format
+	  && REAL_MODE_FORMAT (SFmode) == &ieee_single_format)
+	{
+	  if (GET_MODE_PRECISION (to_mode) > GET_MODE_PRECISION (SFmode))
+	    {
+	      /* To cut down on libgcc size, implement
+		 BFmode -> {DF,XF,TF}mode conversions by
+		 BFmode -> SFmode -> {DF,XF,TF}mode conversions.  */
+	      rtx temp = gen_reg_rtx (SFmode);
+	      convert_mode_scalar (temp, from, unsignedp);
+	      convert_mode_scalar (to, temp, unsignedp);
+	      return;
+	    }
+	  if (REAL_MODE_FORMAT (to_mode) == &ieee_half_format)
+	    {
+	      /* Similarly, implement BFmode -> HFmode as
+		 BFmode -> SFmode -> HFmode conversion where SFmode
+		 has superset of BFmode values.  We don't need
+		 to handle sNaNs by raising exception and turning
+		 into into qNaN though, as that can be done in the
+		 SFmode -> HFmode conversion too.  */
+	      rtx temp = gen_reg_rtx (SFmode);
+	      int save_flag_finite_math_only = flag_finite_math_only;
+	      flag_finite_math_only = true;
+	      convert_mode_scalar (temp, from, unsignedp);
+	      flag_finite_math_only = save_flag_finite_math_only;
+	      convert_mode_scalar (to, temp, unsignedp);
+	      return;
+	    }
+	  if (to_mode == SFmode
+	      && !HONOR_NANS (from_mode)
+	      && !HONOR_NANS (to_mode)
+	      && optimize_insn_for_speed_p ())
+	    {
+	      /* If we don't expect sNaNs, for BFmode -> SFmode we can just
+		 shift the bits up.  */
+	      machine_mode fromi_mode, toi_mode;
+	      if (int_mode_for_size (GET_MODE_BITSIZE (from_mode),
+				     0).exists (&fromi_mode)
+		  && int_mode_for_size (GET_MODE_BITSIZE (to_mode),
+					0).exists (&toi_mode))
+		{
+		  start_sequence ();
+		  rtx fromi = lowpart_subreg (fromi_mode, from, from_mode);
+		  rtx tof = NULL_RTX;
+		  if (fromi)
+		    {
+		      rtx toi = gen_reg_rtx (toi_mode);
+		      convert_mode_scalar (toi, fromi, 1);
+		      toi
+			= maybe_expand_shift (LSHIFT_EXPR, toi_mode, toi,
+					      GET_MODE_PRECISION (to_mode)
+					      - GET_MODE_PRECISION (from_mode),
+					      NULL_RTX, 1);
+		      if (toi)
+			{
+			  tof = lowpart_subreg (to_mode, toi, toi_mode);
+			  if (tof)
+			    emit_move_insn (to, tof);
+			}
+		    }
+		  insns = get_insns ();
+		  end_sequence ();
+		  if (tof)
+		    {
+		      emit_insn (insns);
+		      return;
+		    }
+		}
+	    }
+	}
+      if (REAL_MODE_FORMAT (from_mode) == &ieee_single_format
+	  && REAL_MODE_FORMAT (to_mode) == &arm_bfloat_half_format
+	  && !HONOR_NANS (from_mode)
+	  && !HONOR_NANS (to_mode)
+	  && !flag_rounding_math
+	  && optimize_insn_for_speed_p ())
+	{
+	  /* If we don't expect qNaNs nor sNaNs and can assume rounding
+	     to nearest, we can expand the conversion inline as
+	     (fromi + 0x7fff + ((fromi >> 16) & 1)) >> 16.  */
+	  machine_mode fromi_mode, toi_mode;
+	  if (int_mode_for_size (GET_MODE_BITSIZE (from_mode),
+				 0).exists (&fromi_mode)
+	      && int_mode_for_size (GET_MODE_BITSIZE (to_mode),
+				    0).exists (&toi_mode))
+	    {
+	      start_sequence ();
+	      rtx fromi = lowpart_subreg (fromi_mode, from, from_mode);
+	      rtx tof = NULL_RTX;
+	      do
+		{
+		  if (!fromi)
+		    break;
+		  int shift = (GET_MODE_PRECISION (from_mode)
+			       - GET_MODE_PRECISION (to_mode));
+		  rtx temp1
+		    = maybe_expand_shift (RSHIFT_EXPR, fromi_mode, fromi,
+					  shift, NULL_RTX, 1);
+		  if (!temp1)
+		    break;
+		  rtx temp2
+		    = expand_binop (fromi_mode, and_optab, temp1, const1_rtx,
+				    NULL_RTX, 1, OPTAB_DIRECT);
+		  if (!temp2)
+		    break;
+		  rtx temp3
+		    = expand_binop (fromi_mode, add_optab, fromi,
+				    gen_int_mode ((HOST_WIDE_INT_1U
+						   << (shift - 1)) - 1,
+						  fromi_mode), NULL_RTX,
+				    1, OPTAB_DIRECT);
+		  if (!temp3)
+		    break;
+		  rtx temp4
+		    = expand_binop (fromi_mode, add_optab, temp3, temp2,
+				    NULL_RTX, 1, OPTAB_DIRECT);
+		  if (!temp4)
+		    break;
+		  rtx temp5 = maybe_expand_shift (RSHIFT_EXPR, fromi_mode,
+						  temp4, shift, NULL_RTX, 1);
+		  if (!temp5)
+		    break;
+		  rtx temp6 = lowpart_subreg (toi_mode, temp5, fromi_mode);
+		  if (!temp6)
+		    break;
+		  tof = lowpart_subreg (to_mode, force_reg (toi_mode, temp6),
+					toi_mode);
+		  if (tof)
+		    emit_move_insn (to, tof);
+		}
+	      while (0);
+	      insns = get_insns ();
+	      end_sequence ();
+	      if (tof)
+		{
+		  emit_insn (insns);
+		  return;
+		}
+	    }
+	}
+#endif
+
       /* Otherwise use a libcall.  */
       libcall = convert_optab_libfunc (tab, to_mode, from_mode);
 
--- gcc/config/arm/arm.h.jj	2022-09-29 09:13:25.709718568 +0200
+++ gcc/config/arm/arm.h	2022-09-29 12:40:17.401778971 +0200
@@ -78,9 +78,8 @@ extern void (*arm_lang_output_object_att
    the backend.  Defined in arm-builtins.cc.  */
 extern tree arm_fp16_type_node;
 
-/* This type is the user-visible __bf16.  We need it in a few places in
-   the backend.  Defined in arm-builtins.cc.  */
-extern tree arm_bf16_type_node;
+/* The user-visible __bf16 uses bfloat16_type_node, but for pointer to that
+   use backend specific tree.  Defined in arm-builtins.cc.  */
 extern tree arm_bf16_ptr_type_node;
 
 
--- gcc/config/arm/arm.cc.jj	2022-09-29 09:13:25.709718568 +0200
+++ gcc/config/arm/arm.cc	2022-09-29 15:33:07.997170885 +0200
@@ -688,12 +688,6 @@ static const struct attribute_spec arm_a
 #undef TARGET_INVALID_CONVERSION
 #define TARGET_INVALID_CONVERSION arm_invalid_conversion
 
-#undef TARGET_INVALID_UNARY_OP
-#define TARGET_INVALID_UNARY_OP arm_invalid_unary_op
-
-#undef TARGET_INVALID_BINARY_OP
-#define TARGET_INVALID_BINARY_OP arm_invalid_binary_op
-
 #undef TARGET_ATOMIC_ASSIGN_EXPAND_FENV
 #define TARGET_ATOMIC_ASSIGN_EXPAND_FENV arm_atomic_assign_expand_fenv
 
@@ -30360,7 +30354,7 @@ arm_mangle_type (const_tree type)
   if (TREE_CODE (type) == REAL_TYPE && TYPE_PRECISION (type) == 16)
     {
       if (TYPE_MODE (type) == BFmode)
-	return "u6__bf16";
+	return "DFb16_";
       else
 	return "Dh";
     }
@@ -33996,47 +33990,22 @@ arm_invalid_conversion (const_tree fromt
 {
   if (element_mode (fromtype) != element_mode (totype))
     {
-      /* Do no allow conversions to/from BFmode scalar types.  */
-      if (TYPE_MODE (fromtype) == BFmode)
-	return N_("invalid conversion from type %<bfloat16_t%>");
-      if (TYPE_MODE (totype) == BFmode)
-	return N_("invalid conversion to type %<bfloat16_t%>");
+      /* Do no allow conversions from BFmode to non-ieee HFmode
+	 scalar types or vice versa.  */
+      if (TYPE_MODE (fromtype) == BFmode
+	  && TYPE_MODE (totype) == HFmode
+	  && arm_fp16_format == ARM_FP16_FORMAT_ALTERNATIVE)
+	return N_("invalid conversion from type %<bfloat16_t%> to %<__fp16%>");
+      if (TYPE_MODE (totype) == BFmode
+	  && TYPE_MODE (fromtype) == HFmode
+	  && arm_fp16_format == ARM_FP16_FORMAT_ALTERNATIVE)
+	return N_("invalid conversion to type %<bfloat16_t%> from %<__fp16%>");
     }
 
   /* Conversion allowed.  */
   return NULL;
 }
 
-/* Return the diagnostic message string if the unary operation OP is
-   not permitted on TYPE, NULL otherwise.  */
-
-static const char *
-arm_invalid_unary_op (int op, const_tree type)
-{
-  /* Reject all single-operand operations on BFmode except for &.  */
-  if (element_mode (type) == BFmode && op != ADDR_EXPR)
-    return N_("operation not permitted on type %<bfloat16_t%>");
-
-  /* Operation allowed.  */
-  return NULL;
-}
-
-/* Return the diagnostic message string if the binary operation OP is
-   not permitted on TYPE1 and TYPE2, NULL otherwise.  */
-
-static const char *
-arm_invalid_binary_op (int op ATTRIBUTE_UNUSED, const_tree type1,
-			   const_tree type2)
-{
-  /* Reject all 2-operand operations on BFmode.  */
-  if (element_mode (type1) == BFmode
-      || element_mode (type2) == BFmode)
-    return N_("operation not permitted on type %<bfloat16_t%>");
-
-  /* Operation allowed.  */
-  return NULL;
-}
-
 /* Implement TARGET_CAN_CHANGE_MODE_CLASS.
 
    In VFPv1, VFP registers could only be accessed in the mode they were
--- gcc/config/arm/arm-builtins.cc.jj	2022-09-29 09:13:25.681718954 +0200
+++ gcc/config/arm/arm-builtins.cc	2022-09-29 12:40:17.405778917 +0200
@@ -1370,7 +1370,6 @@ struct arm_simd_type_info arm_simd_types
 tree arm_fp16_type_node = NULL_TREE;
 
 /* Back-end node type for brain float (bfloat) types.  */
-tree arm_bf16_type_node = NULL_TREE;
 tree arm_bf16_ptr_type_node = NULL_TREE;
 
 static tree arm_simd_intOI_type_node = NULL_TREE;
@@ -1459,7 +1458,7 @@ arm_simd_builtin_std_type (machine_mode
     case E_DFmode:
       return double_type_node;
     case E_BFmode:
-      return arm_bf16_type_node;
+      return bfloat16_type_node;
     default:
       gcc_unreachable ();
     }
@@ -1570,9 +1569,9 @@ arm_init_simd_builtin_types (void)
   arm_simd_types[Float32x4_t].eltype = float_type_node;
 
   /* Init Bfloat vector types with underlying __bf16 scalar type.  */
-  arm_simd_types[Bfloat16x2_t].eltype = arm_bf16_type_node;
-  arm_simd_types[Bfloat16x4_t].eltype = arm_bf16_type_node;
-  arm_simd_types[Bfloat16x8_t].eltype = arm_bf16_type_node;
+  arm_simd_types[Bfloat16x2_t].eltype = bfloat16_type_node;
+  arm_simd_types[Bfloat16x4_t].eltype = bfloat16_type_node;
+  arm_simd_types[Bfloat16x8_t].eltype = bfloat16_type_node;
 
   for (i = 0; i < nelts; i++)
     {
@@ -1658,7 +1657,7 @@ arm_init_simd_builtin_scalar_types (void
 					     "__builtin_neon_df");
   (*lang_hooks.types.register_builtin_type) (intTI_type_node,
 					     "__builtin_neon_ti");
-  (*lang_hooks.types.register_builtin_type) (arm_bf16_type_node,
+  (*lang_hooks.types.register_builtin_type) (bfloat16_type_node,
                                              "__builtin_neon_bf");
   /* Unsigned integer types for various mode sizes.  */
   (*lang_hooks.types.register_builtin_type) (unsigned_intQI_type_node,
@@ -1797,13 +1796,13 @@ arm_init_builtin (unsigned int fcode, ar
 static void
 arm_init_bf16_types (void)
 {
-  arm_bf16_type_node = make_node (REAL_TYPE);
-  TYPE_PRECISION (arm_bf16_type_node) = 16;
-  SET_TYPE_MODE (arm_bf16_type_node, BFmode);
-  layout_type (arm_bf16_type_node);
+  bfloat16_type_node = make_node (REAL_TYPE);
+  TYPE_PRECISION (bfloat16_type_node) = 16;
+  SET_TYPE_MODE (bfloat16_type_node, BFmode);
+  layout_type (bfloat16_type_node);
 
-  lang_hooks.types.register_builtin_type (arm_bf16_type_node, "__bf16");
-  arm_bf16_ptr_type_node = build_pointer_type (arm_bf16_type_node);
+  lang_hooks.types.register_builtin_type (bfloat16_type_node, "__bf16");
+  arm_bf16_ptr_type_node = build_pointer_type (bfloat16_type_node);
 }
 
 /* Set up ACLE builtins, even builtins for instructions that are not
--- gcc/config/i386/i386.cc.jj	2022-09-29 12:03:12.073350093 +0200
+++ gcc/config/i386/i386.cc	2022-09-29 12:40:17.409778863 +0200
@@ -22728,7 +22728,7 @@ ix86_mangle_type (const_tree type)
   switch (TYPE_MODE (type))
     {
     case E_BFmode:
-      return "u6__bf16";
+      return "DFb16_";
     case E_HFmode:
       /* _Float16 is "DF16_".
 	 Align with clang's decision in https://reviews.llvm.org/D33719. */
@@ -22747,55 +22747,6 @@ ix86_mangle_type (const_tree type)
     }
 }
 
-/* Return the diagnostic message string if conversion from FROMTYPE to
-   TOTYPE is not allowed, NULL otherwise.  */
-
-static const char *
-ix86_invalid_conversion (const_tree fromtype, const_tree totype)
-{
-  if (element_mode (fromtype) != element_mode (totype))
-    {
-      /* Do no allow conversions to/from BFmode scalar types.  */
-      if (TYPE_MODE (fromtype) == BFmode)
-	return N_("invalid conversion from type %<__bf16%>");
-      if (TYPE_MODE (totype) == BFmode)
-	return N_("invalid conversion to type %<__bf16%>");
-    }
-
-  /* Conversion allowed.  */
-  return NULL;
-}
-
-/* Return the diagnostic message string if the unary operation OP is
-   not permitted on TYPE, NULL otherwise.  */
-
-static const char *
-ix86_invalid_unary_op (int op, const_tree type)
-{
-  /* Reject all single-operand operations on BFmode except for &.  */
-  if (element_mode (type) == BFmode && op != ADDR_EXPR)
-    return N_("operation not permitted on type %<__bf16%>");
-
-  /* Operation allowed.  */
-  return NULL;
-}
-
-/* Return the diagnostic message string if the binary operation OP is
-   not permitted on TYPE1 and TYPE2, NULL otherwise.  */
-
-static const char *
-ix86_invalid_binary_op (int op ATTRIBUTE_UNUSED, const_tree type1,
-			   const_tree type2)
-{
-  /* Reject all 2-operand operations on BFmode.  */
-  if (element_mode (type1) == BFmode
-      || element_mode (type2) == BFmode)
-    return N_("operation not permitted on type %<__bf16%>");
-
-  /* Operation allowed.  */
-  return NULL;
-}
-
 static GTY(()) tree ix86_tls_stack_chk_guard_decl;
 
 static tree
@@ -24853,15 +24804,6 @@ ix86_libgcc_floating_mode_supported_p
 #undef TARGET_MANGLE_TYPE
 #define TARGET_MANGLE_TYPE ix86_mangle_type
 
-#undef TARGET_INVALID_CONVERSION
-#define TARGET_INVALID_CONVERSION ix86_invalid_conversion
-
-#undef TARGET_INVALID_UNARY_OP
-#define TARGET_INVALID_UNARY_OP ix86_invalid_unary_op
-
-#undef TARGET_INVALID_BINARY_OP
-#define TARGET_INVALID_BINARY_OP ix86_invalid_binary_op
-
 #undef TARGET_STACK_PROTECT_GUARD
 #define TARGET_STACK_PROTECT_GUARD ix86_stack_protect_guard
 
--- gcc/config/i386/i386-builtins.cc.jj	2022-09-29 09:13:25.710718554 +0200
+++ gcc/config/i386/i386-builtins.cc	2022-09-29 12:40:17.406778903 +0200
@@ -126,7 +126,6 @@ BDESC_VERIFYS (IX86_BUILTIN_MAX,
 static GTY(()) tree ix86_builtin_type_tab[(int) IX86_BT_LAST_CPTR + 1];
 
 tree ix86_float16_type_node = NULL_TREE;
-tree ix86_bf16_type_node = NULL_TREE;
 tree ix86_bf16_ptr_type_node = NULL_TREE;
 
 /* Retrieve an element from the above table, building some of
@@ -1372,16 +1371,15 @@ ix86_register_float16_builtin_type (void
 static void
 ix86_register_bf16_builtin_type (void)
 {
-  ix86_bf16_type_node = make_node (REAL_TYPE);
-  TYPE_PRECISION (ix86_bf16_type_node) = 16;
-  SET_TYPE_MODE (ix86_bf16_type_node, BFmode);
-  layout_type (ix86_bf16_type_node);
+  bfloat16_type_node = make_node (REAL_TYPE);
+  TYPE_PRECISION (bfloat16_type_node) = 16;
+  SET_TYPE_MODE (bfloat16_type_node, BFmode);
+  layout_type (bfloat16_type_node);
 
   if (!maybe_get_identifier ("__bf16") && TARGET_SSE2)
     {
-      lang_hooks.types.register_builtin_type (ix86_bf16_type_node,
-					    "__bf16");
-      ix86_bf16_ptr_type_node = build_pointer_type (ix86_bf16_type_node);
+      lang_hooks.types.register_builtin_type (bfloat16_type_node, "__bf16");
+      ix86_bf16_ptr_type_node = build_pointer_type (bfloat16_type_node);
     }
 }
 
--- gcc/config/i386/i386-builtin-types.def.jj	2022-09-29 09:13:25.709718568 +0200
+++ gcc/config/i386/i386-builtin-types.def	2022-09-29 12:40:17.406778903 +0200
@@ -69,7 +69,7 @@ DEF_PRIMITIVE_TYPE (UINT16, short_unsign
 DEF_PRIMITIVE_TYPE (INT64, long_long_integer_type_node)
 DEF_PRIMITIVE_TYPE (UINT64, long_long_unsigned_type_node)
 DEF_PRIMITIVE_TYPE (FLOAT16, ix86_float16_type_node)
-DEF_PRIMITIVE_TYPE (BFLOAT16, ix86_bf16_type_node)
+DEF_PRIMITIVE_TYPE (BFLOAT16, bfloat16_type_node)
 DEF_PRIMITIVE_TYPE (FLOAT, float_type_node)
 DEF_PRIMITIVE_TYPE (DOUBLE, double_type_node)
 DEF_PRIMITIVE_TYPE (FLOAT80, float80_type_node)
--- gcc/config/aarch64/aarch64.h.jj	2022-09-29 09:13:25.680718968 +0200
+++ gcc/config/aarch64/aarch64.h	2022-09-29 12:40:17.409778863 +0200
@@ -1337,9 +1337,8 @@ extern const char *aarch64_rewrite_mcpu
 extern GTY(()) tree aarch64_fp16_type_node;
 extern GTY(()) tree aarch64_fp16_ptr_type_node;
 
-/* This type is the user-visible __bf16, and a pointer to that type.  Defined
-   in aarch64-builtins.cc.  */
-extern GTY(()) tree aarch64_bf16_type_node;
+/* Pointer to the user-visible __bf16 type.  __bf16 itself is generic
+   bfloat16_type_node.  Defined in aarch64-builtins.cc.  */
 extern GTY(()) tree aarch64_bf16_ptr_type_node;
 
 /* The generic unwind code in libgcc does not initialize the frame pointer.
--- gcc/config/aarch64/aarch64.cc.jj	2022-09-29 09:13:25.680718968 +0200
+++ gcc/config/aarch64/aarch64.cc	2022-09-29 12:40:17.413778808 +0200
@@ -19741,7 +19741,7 @@ aarch64_gimplify_va_arg_expr (tree valis
 	  field_ptr_t = aarch64_fp16_ptr_type_node;
 	  break;
 	case E_BFmode:
-	  field_t = aarch64_bf16_type_node;
+	  field_t = bfloat16_type_node;
 	  field_ptr_t = aarch64_bf16_ptr_type_node;
 	  break;
 	case E_V2SImode:
@@ -20645,7 +20645,7 @@ aarch64_mangle_type (const_tree type)
   if (TREE_CODE (type) == REAL_TYPE && TYPE_PRECISION (type) == 16)
     {
       if (TYPE_MODE (type) == BFmode)
-	return "u6__bf16";
+	return "DFb16_";
       else
 	return "Dh";
     }
@@ -26820,39 +26820,6 @@ aarch64_stack_protect_guard (void)
   return NULL_TREE;
 }
 
-/* Return the diagnostic message string if conversion from FROMTYPE to
-   TOTYPE is not allowed, NULL otherwise.  */
-
-static const char *
-aarch64_invalid_conversion (const_tree fromtype, const_tree totype)
-{
-  if (element_mode (fromtype) != element_mode (totype))
-    {
-      /* Do no allow conversions to/from BFmode scalar types.  */
-      if (TYPE_MODE (fromtype) == BFmode)
-	return N_("invalid conversion from type %<bfloat16_t%>");
-      if (TYPE_MODE (totype) == BFmode)
-	return N_("invalid conversion to type %<bfloat16_t%>");
-    }
-
-  /* Conversion allowed.  */
-  return NULL;
-}
-
-/* Return the diagnostic message string if the unary operation OP is
-   not permitted on TYPE, NULL otherwise.  */
-
-static const char *
-aarch64_invalid_unary_op (int op, const_tree type)
-{
-  /* Reject all single-operand operations on BFmode except for &.  */
-  if (element_mode (type) == BFmode && op != ADDR_EXPR)
-    return N_("operation not permitted on type %<bfloat16_t%>");
-
-  /* Operation allowed.  */
-  return NULL;
-}
-
 /* Return the diagnostic message string if the binary operation OP is
    not permitted on TYPE1 and TYPE2, NULL otherwise.  */
 
@@ -26860,11 +26827,6 @@ static const char *
 aarch64_invalid_binary_op (int op ATTRIBUTE_UNUSED, const_tree type1,
 			   const_tree type2)
 {
-  /* Reject all 2-operand operations on BFmode.  */
-  if (element_mode (type1) == BFmode
-      || element_mode (type2) == BFmode)
-    return N_("operation not permitted on type %<bfloat16_t%>");
-
   if (VECTOR_TYPE_P (type1)
       && VECTOR_TYPE_P (type2)
       && !TYPE_INDIVISIBLE_P (type1)
@@ -27461,12 +27423,6 @@ aarch64_libgcc_floating_mode_supported_p
 #undef TARGET_MANGLE_TYPE
 #define TARGET_MANGLE_TYPE aarch64_mangle_type
 
-#undef TARGET_INVALID_CONVERSION
-#define TARGET_INVALID_CONVERSION aarch64_invalid_conversion
-
-#undef TARGET_INVALID_UNARY_OP
-#define TARGET_INVALID_UNARY_OP aarch64_invalid_unary_op
-
 #undef TARGET_INVALID_BINARY_OP
 #define TARGET_INVALID_BINARY_OP aarch64_invalid_binary_op
 
--- gcc/config/aarch64/aarch64-builtins.cc.jj	2022-09-29 09:13:25.676719023 +0200
+++ gcc/config/aarch64/aarch64-builtins.cc	2022-09-29 12:40:17.410778849 +0200
@@ -918,7 +918,6 @@ tree aarch64_fp16_type_node = NULL_TREE;
 tree aarch64_fp16_ptr_type_node = NULL_TREE;
 
 /* Back-end node type for brain float (bfloat) types.  */
-tree aarch64_bf16_type_node = NULL_TREE;
 tree aarch64_bf16_ptr_type_node = NULL_TREE;
 
 /* Wrapper around add_builtin_function.  NAME is the name of the built-in
@@ -1010,7 +1009,7 @@ aarch64_int_or_fp_type (machine_mode mod
     case E_DFmode:
       return double_type_node;
     case E_BFmode:
-      return aarch64_bf16_type_node;
+      return bfloat16_type_node;
     default:
       gcc_unreachable ();
     }
@@ -1124,8 +1123,8 @@ aarch64_init_simd_builtin_types (void)
   aarch64_simd_types[Float64x2_t].eltype = double_type_node;
 
   /* Init Bfloat vector types with underlying __bf16 type.  */
-  aarch64_simd_types[Bfloat16x4_t].eltype = aarch64_bf16_type_node;
-  aarch64_simd_types[Bfloat16x8_t].eltype = aarch64_bf16_type_node;
+  aarch64_simd_types[Bfloat16x4_t].eltype = bfloat16_type_node;
+  aarch64_simd_types[Bfloat16x8_t].eltype = bfloat16_type_node;
 
   for (i = 0; i < nelts; i++)
     {
@@ -1197,7 +1196,7 @@ aarch64_init_simd_builtin_scalar_types (
 					     "__builtin_aarch64_simd_poly128");
   (*lang_hooks.types.register_builtin_type) (intTI_type_node,
 					     "__builtin_aarch64_simd_ti");
-  (*lang_hooks.types.register_builtin_type) (aarch64_bf16_type_node,
+  (*lang_hooks.types.register_builtin_type) (bfloat16_type_node,
 					     "__builtin_aarch64_simd_bf");
   /* Unsigned integer types for various mode sizes.  */
   (*lang_hooks.types.register_builtin_type) (unsigned_intQI_type_node,
@@ -1682,13 +1681,13 @@ aarch64_init_fp16_types (void)
 static void
 aarch64_init_bf16_types (void)
 {
-  aarch64_bf16_type_node = make_node (REAL_TYPE);
-  TYPE_PRECISION (aarch64_bf16_type_node) = 16;
-  SET_TYPE_MODE (aarch64_bf16_type_node, BFmode);
-  layout_type (aarch64_bf16_type_node);
+  bfloat16_type_node = make_node (REAL_TYPE);
+  TYPE_PRECISION (bfloat16_type_node) = 16;
+  SET_TYPE_MODE (bfloat16_type_node, BFmode);
+  layout_type (bfloat16_type_node);
 
-  lang_hooks.types.register_builtin_type (aarch64_bf16_type_node, "__bf16");
-  aarch64_bf16_ptr_type_node = build_pointer_type (aarch64_bf16_type_node);
+  lang_hooks.types.register_builtin_type (bfloat16_type_node, "__bf16");
+  aarch64_bf16_ptr_type_node = build_pointer_type (bfloat16_type_node);
 }
 
 /* Pointer authentication builtins that will become NOP on legacy platform.
--- gcc/config/aarch64/aarch64-sve-builtins.def.jj	2022-09-29 09:13:25.676719023 +0200
+++ gcc/config/aarch64/aarch64-sve-builtins.def	2022-09-29 12:40:17.413778808 +0200
@@ -61,7 +61,7 @@ DEF_SVE_MODE (u64offset, none, svuint64_
 DEF_SVE_MODE (vnum, none, none, vectors)
 
 DEF_SVE_TYPE (svbool_t, 10, __SVBool_t, boolean_type_node)
-DEF_SVE_TYPE (svbfloat16_t, 14, __SVBfloat16_t, aarch64_bf16_type_node)
+DEF_SVE_TYPE (svbfloat16_t, 14, __SVBfloat16_t, bfloat16_type_node)
 DEF_SVE_TYPE (svfloat16_t, 13, __SVFloat16_t, aarch64_fp16_type_node)
 DEF_SVE_TYPE (svfloat32_t, 13, __SVFloat32_t, float_type_node)
 DEF_SVE_TYPE (svfloat64_t, 13, __SVFloat64_t, double_type_node)
--- gcc/c-family/c-cppbuiltin.cc.jj	2022-09-29 09:13:25.675719037 +0200
+++ gcc/c-family/c-cppbuiltin.cc	2022-09-29 12:40:17.416778768 +0200
@@ -1264,6 +1264,13 @@ c_cpp_builtins (cpp_reader *pfile)
       builtin_define_float_constants (prefix, ggc_strdup (csuffix), "%s",
 				      csuffix, FLOATN_NX_TYPE_NODE (i));
     }
+  if (bfloat16_type_node && c_dialect_cxx ())
+    {
+      if (cxx_dialect > cxx20)
+	cpp_define (pfile, "__STDCPP_BFLOAT16_T__=1");
+      builtin_define_float_constants ("BFLT16", "BF16", "%s",
+				      "BF16", bfloat16_type_node);
+    }
 
   /* For float.h.  */
   if (targetm.decimal_float_supported_p ())
--- gcc/c-family/c-lex.cc.jj	2022-09-29 09:13:25.675719037 +0200
+++ gcc/c-family/c-lex.cc	2022-09-29 12:40:17.416778768 +0200
@@ -995,6 +995,19 @@ interpret_float (const cpp_token *token,
 	  pedwarn (input_location, OPT_Wpedantic,
 		   "non-standard suffix on floating constant");
       }
+    else if ((flags & CPP_N_BFLOAT16) != 0 && c_dialect_cxx ())
+      {
+	type = bfloat16_type_node;
+	if (type == NULL_TREE)
+	  {
+	    error ("unsupported non-standard suffix on floating constant");
+	    return error_mark_node;
+	  }
+	if (cxx_dialect < cxx23)
+	  pedwarn (input_location, OPT_Wpedantic,
+		   "%<bf16%> or %<BF16%> suffix on floating constant only "
+		   "available with %<-std=c++2b%> or %<-std=gnu++2b%>");
+      }
     else if ((flags & CPP_N_WIDTH) == CPP_N_LARGE)
       type = long_double_type_node;
     else if ((flags & CPP_N_WIDTH) == CPP_N_SMALL
--- gcc/cp/cp-tree.h.jj	2022-09-29 09:13:31.164643341 +0200
+++ gcc/cp/cp-tree.h	2022-09-29 12:40:17.414778795 +0200
@@ -8714,6 +8714,8 @@ extended_float_type_p (tree type)
   for (int i = 0; i < NUM_FLOATN_NX_TYPES; ++i)
     if (type == FLOATN_TYPE_NODE (i))
       return true;
+  if (type == bfloat16_type_node)
+    return true;
   return false;
 }
 
--- gcc/cp/typeck.cc.jj	2022-09-29 09:13:25.716718472 +0200
+++ gcc/cp/typeck.cc	2022-09-29 12:40:17.415778781 +0200
@@ -293,6 +293,10 @@ cp_compare_floating_point_conversion_ran
       if (mv2 == FLOATN_NX_TYPE_NODE (i))
 	extended2 = i + 1;
     }
+  if (mv1 == bfloat16_type_node)
+    extended1 = true;
+  if (mv2 == bfloat16_type_node)
+    extended2 = true;
   if (extended2 && !extended1)
     {
       int ret = cp_compare_floating_point_conversion_ranks (t2, t1);
@@ -390,7 +394,9 @@ cp_compare_floating_point_conversion_ran
   if (cnt > 1 && mv2 == long_double_type_node)
     return -2;
   /* Otherwise, they have equal rank, but extended types
-     (other than std::bfloat16_t) have higher subrank.  */
+     (other than std::bfloat16_t) have higher subrank.
+     std::bfloat16_t shouldn't have equal rank to any standard
+     floating point type.  */
   return 1;
 }
 
--- libcpp/include/cpplib.h.jj	2022-09-08 13:01:19.853771383 +0200
+++ libcpp/include/cpplib.h	2022-09-28 19:06:59.615380690 +0200
@@ -1275,6 +1275,7 @@ struct cpp_num
 #define CPP_N_USERDEF	0x1000000 /* C++11 user-defined literal.  */
 
 #define CPP_N_SIZE_T	0x2000000 /* C++23 size_t literal.  */
+#define CPP_N_BFLOAT16	0x4000000 /* std::bfloat16_t type.  */
 
 #define CPP_N_WIDTH_FLOATN_NX	0xF0000000 /* _FloatN / _FloatNx value
 					      of N, divided by 16.  */
--- libcpp/expr.cc.jj	2022-09-27 08:03:27.119982735 +0200
+++ libcpp/expr.cc	2022-09-28 17:55:36.667177540 +0200
@@ -91,10 +91,10 @@ interpret_float_suffix (cpp_reader *pfil
   size_t orig_len = len;
   const uchar *orig_s = s;
   size_t flags;
-  size_t f, d, l, w, q, i, fn, fnx, fn_bits;
+  size_t f, d, l, w, q, i, fn, fnx, fn_bits, bf16;
 
   flags = 0;
-  f = d = l = w = q = i = fn = fnx = fn_bits = 0;
+  f = d = l = w = q = i = fn = fnx = fn_bits = bf16 = 0;
 
   /* The following decimal float suffixes, from TR 24732:2009, TS
      18661-2:2015 and C2X, are supported:
@@ -131,7 +131,8 @@ interpret_float_suffix (cpp_reader *pfil
      w, W - machine-specific type such as __float80 (GNU extension).
      q, Q - machine-specific type such as __float128 (GNU extension).
      fN, FN - _FloatN (TS 18661-3:2015).
-     fNx, FNx - _FloatNx (TS 18661-3:2015).  */
+     fNx, FNx - _FloatNx (TS 18661-3:2015).
+     bf16, BF16 - std::bfloat16_t (ISO C++23).  */
 
   /* Process decimal float suffixes, which are two letters starting
      with d or D.  Order and case are significant.  */
@@ -239,6 +240,20 @@ interpret_float_suffix (cpp_reader *pfil
 		fn++;
 	    }
 	  break;
+	case 'b': case 'B':
+	  if (len > 2
+	      /* Except for bf16 / BF16 where case is significant.  */
+	      && s[1] == (s[0] == 'b' ? 'f' : 'F')
+	      && s[2] == '1'
+	      && s[3] == '6'
+	      && CPP_OPTION (pfile, cplusplus))
+	    {
+	      bf16++;
+	      len -= 3;
+	      s += 3;
+	      break;
+	    }
+	  return 0;
 	case 'd': case 'D': d++; break;
 	case 'l': case 'L': l++; break;
 	case 'w': case 'W': w++; break;
@@ -257,7 +272,7 @@ interpret_float_suffix (cpp_reader *pfil
      of N larger than can be represented in the return value.  The
      caller is responsible for rejecting _FloatN suffixes where
      _FloatN is not supported on the chosen target.  */
-  if (f + d + l + w + q + fn + fnx > 1 || i > 1)
+  if (f + d + l + w + q + fn + fnx + bf16 > 1 || i > 1)
     return 0;
   if (fn_bits > CPP_FLOATN_MAX)
     return 0;
@@ -295,6 +310,7 @@ interpret_float_suffix (cpp_reader *pfil
 	     q ? CPP_N_MD_Q :
 	     fn ? CPP_N_FLOATN | (fn_bits << CPP_FLOATN_SHIFT) :
 	     fnx ? CPP_N_FLOATNX | (fn_bits << CPP_FLOATN_SHIFT) :
+	     bf16 ? CPP_N_BFLOAT16 :
 	     CPP_N_DEFAULT));
 }
 
--- libgcc/config/arm/sfp-machine.h.jj	2020-01-12 11:54:38.615380187 +0100
+++ libgcc/config/arm/sfp-machine.h	2022-09-28 19:02:51.922710542 +0200
@@ -22,6 +22,7 @@ typedef int __gcc_CMPtype __attribute__
 /* According to RTABI, QNAN is only with the most significant bit of the
    significand set, and all other significand bits zero.  */
 #define _FP_NANFRAC_H		_FP_QNANBIT_H
+#define _FP_NANFRAC_B		_FP_QNANBIT_B
 #define _FP_NANFRAC_S		_FP_QNANBIT_S
 #define _FP_NANFRAC_D		_FP_QNANBIT_D, 0
 #define _FP_NANFRAC_Q		_FP_QNANBIT_Q, 0, 0, 0
--- libgcc/config/aarch64/t-softfp.jj	2020-09-29 11:32:02.988602194 +0200
+++ libgcc/config/aarch64/t-softfp	2022-09-28 18:59:43.381246466 +0200
@@ -1,7 +1,7 @@
 softfp_float_modes := tf
 softfp_int_modes := si di ti
-softfp_extensions := sftf dftf hftf
-softfp_truncations := tfsf tfdf tfhf
+softfp_extensions := sftf dftf hftf bfsf
+softfp_truncations := tfsf tfdf tfhf tfbf dfbf sfbf hfbf
 softfp_exclude_libgcc2 := n
 softfp_extras := fixhfti fixunshfti floattihf floatuntihf
 
--- libgcc/config/aarch64/libgcc-softfp.ver.jj	2022-01-11 23:11:23.691271871 +0100
+++ libgcc/config/aarch64/libgcc-softfp.ver	2022-09-28 19:00:36.050537146 +0200
@@ -26,3 +26,12 @@ GCC_11.0 {
   __mulhc3
   __trunctfhf2
 }
+
+%inherit GCC_13.0.0 GCC_11.0.0
+GCC_13.0.0 {
+  __extendbfsf2
+  __truncdfbf2
+  __truncsfbf2
+  __trunctfbf2
+  __trunchfbf2
+}
--- libgcc/config/aarch64/sfp-machine.h.jj	2022-01-11 23:11:23.691271871 +0100
+++ libgcc/config/aarch64/sfp-machine.h	2022-09-28 19:02:10.303270053 +0200
@@ -43,6 +43,7 @@ typedef int __gcc_CMPtype __attribute__
 #define _FP_DIV_MEAT_Q(R,X,Y)	_FP_DIV_MEAT_2_udiv(Q,R,X,Y)
 
 #define _FP_NANFRAC_H		((_FP_QNANBIT_H << 1) - 1)
+#define _FP_NANFRAC_B		((_FP_QNANBIT_B << 1) - 1)
 #define _FP_NANFRAC_S		((_FP_QNANBIT_S << 1) - 1)
 #define _FP_NANFRAC_D		((_FP_QNANBIT_D << 1) - 1)
 #define _FP_NANFRAC_Q		((_FP_QNANBIT_Q << 1) - 1), -1
--- libgcc/config/i386/t-softfp.jj	2022-09-23 09:02:31.759659479 +0200
+++ libgcc/config/i386/t-softfp	2022-09-28 18:58:09.114520943 +0200
@@ -6,8 +6,9 @@ LIB2FUNCS_EXCLUDE += $(libgcc2-hf-functi
 libgcc2-hf-extras = $(addsuffix .c, $(libgcc2-hf-functions))
 LIB2ADD += $(addprefix $(srcdir)/config/i386/, $(libgcc2-hf-extras))
 
-softfp_extensions := hfsf hfdf hftf hfxf sfdf sftf dftf xftf
-softfp_truncations := tfhf xfhf dfhf sfhf tfsf dfsf tfdf tfxf
+softfp_extensions := hfsf hfdf hftf hfxf sfdf sftf dftf xftf bfsf
+softfp_truncations := tfhf xfhf dfhf sfhf tfsf dfsf tfdf tfxf \
+		      tfbf xfbf dfbf sfbf hfbf
 
 softfp_extras += eqhf2
 
@@ -20,6 +21,7 @@ CFLAGS-truncsfhf2.c += -msse2
 CFLAGS-truncdfhf2.c += -msse2
 CFLAGS-truncxfhf2.c += -msse2
 CFLAGS-trunctfhf2.c += -msse2
+CFLAGS-trunchfbf2.c += -msse2
 
 CFLAGS-eqhf2.c += -msse2
 CFLAGS-_divhc3.c += -msse2
--- libgcc/config/i386/libgcc-glibc.ver.jj	2022-09-23 09:02:31.746659658 +0200
+++ libgcc/config/i386/libgcc-glibc.ver	2022-09-28 18:58:09.114520943 +0200
@@ -214,3 +214,13 @@ GCC_12.0.0 {
   __trunctfhf2
   __truncxfhf2
 }
+
+%inherit GCC_13.0.0 GCC_12.0.0
+GCC_13.0.0 {
+  __extendbfsf2
+  __truncdfbf2
+  __truncsfbf2
+  __trunctfbf2
+  __truncxfbf2
+  __trunchfbf2
+}
--- libgcc/config/i386/sfp-machine.h.jj	2022-09-23 09:02:31.747659644 +0200
+++ libgcc/config/i386/sfp-machine.h	2022-09-28 18:58:09.114520943 +0200
@@ -18,6 +18,7 @@ typedef int __gcc_CMPtype __attribute__
 #define _FP_QNANNEGATEDP 0
 
 #define _FP_NANSIGN_H		1
+#define _FP_NANSIGN_B		1
 #define _FP_NANSIGN_S		1
 #define _FP_NANSIGN_D		1
 #define _FP_NANSIGN_E		1
--- libgcc/config/i386/64/sfp-machine.h.jj	2022-09-23 09:02:31.700660291 +0200
+++ libgcc/config/i386/64/sfp-machine.h	2022-09-28 18:58:09.114520943 +0200
@@ -14,6 +14,7 @@ typedef unsigned int UTItype __attribute
 #define _FP_DIV_MEAT_Q(R,X,Y)   _FP_DIV_MEAT_2_udiv(Q,R,X,Y)
 
 #define _FP_NANFRAC_H		_FP_QNANBIT_H
+#define _FP_NANFRAC_B		_FP_QNANBIT_B
 #define _FP_NANFRAC_S		_FP_QNANBIT_S
 #define _FP_NANFRAC_D		_FP_QNANBIT_D
 #define _FP_NANFRAC_E		_FP_QNANBIT_E, 0
--- libgcc/config/i386/32/sfp-machine.h.jj	2022-09-23 09:02:31.683660526 +0200
+++ libgcc/config/i386/32/sfp-machine.h	2022-09-28 18:58:09.115520929 +0200
@@ -87,6 +87,7 @@
 #define _FP_DIV_MEAT_Q(R,X,Y)   _FP_DIV_MEAT_4_udiv(Q,R,X,Y)
 
 #define _FP_NANFRAC_H		_FP_QNANBIT_H
+#define _FP_NANFRAC_B		_FP_QNANBIT_B
 #define _FP_NANFRAC_S		_FP_QNANBIT_S
 #define _FP_NANFRAC_D		_FP_QNANBIT_D, 0
 /* Even if XFmode is 12byte,  we have to pad it to
--- libgcc/soft-fp/brain.h.jj	2022-09-28 18:58:09.113520956 +0200
+++ libgcc/soft-fp/brain.h	2022-09-28 18:58:09.113520956 +0200
@@ -0,0 +1,172 @@
+/* Software floating-point emulation.
+   Definitions for Brain Floating Point format (bfloat16).
+   Copyright (C) 1997-2022 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   In addition to the permissions in the GNU Lesser General Public
+   License, the Free Software Foundation gives you unlimited
+   permission to link the compiled version of this file into
+   combinations with other programs, and to distribute those
+   combinations without any restriction coming from the use of this
+   file.  (The Lesser General Public License restrictions do apply in
+   other respects; for example, they cover modification of the file,
+   and distribution when not linked into a combine executable.)
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#ifndef SOFT_FP_BRAIN_H
+#define SOFT_FP_BRAIN_H	1
+
+#if _FP_W_TYPE_SIZE < 32
+# error "Here's a nickel kid.  Go buy yourself a real computer."
+#endif
+
+#define _FP_FRACTBITS_B		(_FP_W_TYPE_SIZE)
+
+#define _FP_FRACTBITS_DW_B	(_FP_W_TYPE_SIZE)
+
+#define _FP_FRACBITS_B		8
+#define _FP_FRACXBITS_B		(_FP_FRACTBITS_B - _FP_FRACBITS_B)
+#define _FP_WFRACBITS_B		(_FP_WORKBITS + _FP_FRACBITS_B)
+#define _FP_WFRACXBITS_B	(_FP_FRACTBITS_B - _FP_WFRACBITS_B)
+#define _FP_EXPBITS_B		8
+#define _FP_EXPBIAS_B		127
+#define _FP_EXPMAX_B		255
+
+#define _FP_QNANBIT_B		((_FP_W_TYPE) 1 << (_FP_FRACBITS_B-2))
+#define _FP_QNANBIT_SH_B	((_FP_W_TYPE) 1 << (_FP_FRACBITS_B-2+_FP_WORKBITS))
+#define _FP_IMPLBIT_B		((_FP_W_TYPE) 1 << (_FP_FRACBITS_B-1))
+#define _FP_IMPLBIT_SH_B	((_FP_W_TYPE) 1 << (_FP_FRACBITS_B-1+_FP_WORKBITS))
+#define _FP_OVERFLOW_B		((_FP_W_TYPE) 1 << (_FP_WFRACBITS_B))
+
+#define _FP_WFRACBITS_DW_B	(2 * _FP_WFRACBITS_B)
+#define _FP_WFRACXBITS_DW_B	(_FP_FRACTBITS_DW_B - _FP_WFRACBITS_DW_B)
+#define _FP_HIGHBIT_DW_B	\
+  ((_FP_W_TYPE) 1 << (_FP_WFRACBITS_DW_B - 1) % _FP_W_TYPE_SIZE)
+
+/* The implementation of _FP_MUL_MEAT_B and _FP_DIV_MEAT_B should be
+   chosen by the target machine.  */
+
+typedef float BFtype __attribute__ ((mode (BF)));
+
+union _FP_UNION_B
+{
+  BFtype flt;
+  struct _FP_STRUCT_LAYOUT
+  {
+#if __BYTE_ORDER == __BIG_ENDIAN
+    unsigned sign : 1;
+    unsigned exp  : _FP_EXPBITS_B;
+    unsigned frac : _FP_FRACBITS_B - (_FP_IMPLBIT_B != 0);
+#else
+    unsigned frac : _FP_FRACBITS_B - (_FP_IMPLBIT_B != 0);
+    unsigned exp  : _FP_EXPBITS_B;
+    unsigned sign : 1;
+#endif
+  } bits;
+};
+
+#define FP_DECL_B(X)		_FP_DECL (1, X)
+#define FP_UNPACK_RAW_B(X, val)	_FP_UNPACK_RAW_1 (B, X, (val))
+#define FP_UNPACK_RAW_BP(X, val)	_FP_UNPACK_RAW_1_P (B, X, (val))
+#define FP_PACK_RAW_B(val, X)	_FP_PACK_RAW_1 (B, (val), X)
+#define FP_PACK_RAW_BP(val, X)			\
+  do						\
+    {						\
+      if (!FP_INHIBIT_RESULTS)			\
+	_FP_PACK_RAW_1_P (B, (val), X);		\
+    }						\
+  while (0)
+
+#define FP_UNPACK_B(X, val)			\
+  do						\
+    {						\
+      _FP_UNPACK_RAW_1 (B, X, (val));		\
+      _FP_UNPACK_CANONICAL (B, 1, X);		\
+    }						\
+  while (0)
+
+#define FP_UNPACK_BP(X, val)			\
+  do						\
+    {						\
+      _FP_UNPACK_RAW_1_P (B, X, (val));		\
+      _FP_UNPACK_CANONICAL (B, 1, X);		\
+    }						\
+  while (0)
+
+#define FP_UNPACK_SEMIRAW_B(X, val)		\
+  do						\
+    {						\
+      _FP_UNPACK_RAW_1 (B, X, (val));		\
+      _FP_UNPACK_SEMIRAW (B, 1, X);		\
+    }						\
+  while (0)
+
+#define FP_UNPACK_SEMIRAW_BP(X, val)		\
+  do						\
+    {						\
+      _FP_UNPACK_RAW_1_P (B, X, (val));		\
+      _FP_UNPACK_SEMIRAW (B, 1, X);		\
+    }						\
+  while (0)
+
+#define FP_PACK_B(val, X)			\
+  do						\
+    {						\
+      _FP_PACK_CANONICAL (B, 1, X);		\
+      _FP_PACK_RAW_1 (B, (val), X);		\
+    }						\
+  while (0)
+
+#define FP_PACK_BP(val, X)			\
+  do						\
+    {						\
+      _FP_PACK_CANONICAL (B, 1, X);		\
+      if (!FP_INHIBIT_RESULTS)			\
+	_FP_PACK_RAW_1_P (B, (val), X);		\
+    }						\
+  while (0)
+
+#define FP_PACK_SEMIRAW_B(val, X)		\
+  do						\
+    {						\
+      _FP_PACK_SEMIRAW (B, 1, X);		\
+      _FP_PACK_RAW_1 (B, (val), X);		\
+    }						\
+  while (0)
+
+#define FP_PACK_SEMIRAW_BP(val, X)		\
+  do						\
+    {						\
+      _FP_PACK_SEMIRAW (B, 1, X);		\
+      if (!FP_INHIBIT_RESULTS)			\
+	_FP_PACK_RAW_1_P (B, (val), X);		\
+    }						\
+  while (0)
+
+#define FP_TO_INT_B(r, X, rsz, rsg)	_FP_TO_INT (B, 1, (r), X, (rsz), (rsg))
+#define FP_TO_INT_ROUND_B(r, X, rsz, rsg)	\
+  _FP_TO_INT_ROUND (B, 1, (r), X, (rsz), (rsg))
+#define FP_FROM_INT_B(X, r, rs, rt)	_FP_FROM_INT (B, 1, X, (r), (rs), rt)
+
+/* BFmode arithmetic is not implemented.  */
+
+#define _FP_FRAC_HIGH_B(X)	_FP_FRAC_HIGH_1 (X)
+#define _FP_FRAC_HIGH_RAW_B(X)	_FP_FRAC_HIGH_1 (X)
+#define _FP_FRAC_HIGH_DW_B(X)	_FP_FRAC_HIGH_1 (X)
+
+#define FP_CMP_EQ_B(r, X, Y, ex)       _FP_CMP_EQ (B, 1, (r), X, Y, (ex))
+
+#endif /* !SOFT_FP_BRAIN_H */
--- libgcc/soft-fp/truncsfbf2.c.jj	2022-09-28 18:58:09.113520956 +0200
+++ libgcc/soft-fp/truncsfbf2.c	2022-09-28 18:58:09.113520956 +0200
@@ -0,0 +1,48 @@
+/* Software floating-point emulation.
+   Truncate IEEE single into bfloat16.
+   Copyright (C) 2022 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   In addition to the permissions in the GNU Lesser General Public
+   License, the Free Software Foundation gives you unlimited
+   permission to link the compiled version of this file into
+   combinations with other programs, and to distribute those
+   combinations without any restriction coming from the use of this
+   file.  (The Lesser General Public License restrictions do apply in
+   other respects; for example, they cover modification of the file,
+   and distribution when not linked into a combine executable.)
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include "soft-fp.h"
+#include "brain.h"
+#include "single.h"
+
+BFtype
+__truncsfbf2 (SFtype a)
+{
+  FP_DECL_EX;
+  FP_DECL_S (A);
+  FP_DECL_B (R);
+  BFtype r;
+
+  FP_INIT_ROUNDMODE;
+  FP_UNPACK_SEMIRAW_S (A, a);
+  FP_TRUNC (B, S, 1, 1, R, A);
+  FP_PACK_SEMIRAW_B (r, R);
+  FP_HANDLE_EXCEPTIONS;
+
+  return r;
+}
--- libgcc/soft-fp/truncdfbf2.c.jj	2022-09-28 18:58:09.114520943 +0200
+++ libgcc/soft-fp/truncdfbf2.c	2022-09-28 18:58:09.114520943 +0200
@@ -0,0 +1,52 @@
+/* Software floating-point emulation.
+   Truncate IEEE double into bfloat16.
+   Copyright (C) 2022 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   In addition to the permissions in the GNU Lesser General Public
+   License, the Free Software Foundation gives you unlimited
+   permission to link the compiled version of this file into
+   combinations with other programs, and to distribute those
+   combinations without any restriction coming from the use of this
+   file.  (The Lesser General Public License restrictions do apply in
+   other respects; for example, they cover modification of the file,
+   and distribution when not linked into a combine executable.)
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include "soft-fp.h"
+#include "brain.h"
+#include "double.h"
+
+BFtype
+__truncdfbf2 (DFtype a)
+{
+  FP_DECL_EX;
+  FP_DECL_D (A);
+  FP_DECL_B (R);
+  BFtype r;
+
+  FP_INIT_ROUNDMODE;
+  FP_UNPACK_SEMIRAW_D (A, a);
+#if _FP_W_TYPE_SIZE < _FP_FRACBITS_D
+  FP_TRUNC (B, D, 1, 2, R, A);
+#else
+  FP_TRUNC (B, D, 1, 1, R, A);
+#endif
+  FP_PACK_SEMIRAW_B (r, R);
+  FP_HANDLE_EXCEPTIONS;
+
+  return r;
+}
--- libgcc/soft-fp/truncxfbf2.c.jj	2022-09-28 18:58:09.113520956 +0200
+++ libgcc/soft-fp/truncxfbf2.c	2022-09-28 18:58:09.113520956 +0200
@@ -0,0 +1,52 @@
+/* Software floating-point emulation.
+   Truncate IEEE extended into bfloat16.
+   Copyright (C) 2022 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   In addition to the permissions in the GNU Lesser General Public
+   License, the Free Software Foundation gives you unlimited
+   permission to link the compiled version of this file into
+   combinations with other programs, and to distribute those
+   combinations without any restriction coming from the use of this
+   file.  (The Lesser General Public License restrictions do apply in
+   other respects; for example, they cover modification of the file,
+   and distribution when not linked into a combine executable.)
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include "soft-fp.h"
+#include "brain.h"
+#include "extended.h"
+
+BFtype
+__truncxfbf2 (XFtype a)
+{
+  FP_DECL_EX;
+  FP_DECL_E (A);
+  FP_DECL_B (R);
+  BFtype r;
+
+  FP_INIT_ROUNDMODE;
+  FP_UNPACK_SEMIRAW_E (A, a);
+#if _FP_W_TYPE_SIZE < 64
+  FP_TRUNC (B, E, 1, 4, R, A);
+#else
+  FP_TRUNC (B, E, 1, 2, R, A);
+#endif
+  FP_PACK_SEMIRAW_B (r, R);
+  FP_HANDLE_EXCEPTIONS;
+
+  return r;
+}
--- libgcc/soft-fp/trunctfbf2.c.jj	2022-09-28 18:58:09.114520943 +0200
+++ libgcc/soft-fp/trunctfbf2.c	2022-09-28 18:58:09.114520943 +0200
@@ -0,0 +1,52 @@
+/* Software floating-point emulation.
+   Truncate IEEE quad into bfloat16.
+   Copyright (C) 2022 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   In addition to the permissions in the GNU Lesser General Public
+   License, the Free Software Foundation gives you unlimited
+   permission to link the compiled version of this file into
+   combinations with other programs, and to distribute those
+   combinations without any restriction coming from the use of this
+   file.  (The Lesser General Public License restrictions do apply in
+   other respects; for example, they cover modification of the file,
+   and distribution when not linked into a combine executable.)
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include "soft-fp.h"
+#include "brain.h"
+#include "quad.h"
+
+BFtype
+__trunctfbf2 (TFtype a)
+{
+  FP_DECL_EX;
+  FP_DECL_Q (A);
+  FP_DECL_B (R);
+  BFtype r;
+
+  FP_INIT_ROUNDMODE;
+  FP_UNPACK_SEMIRAW_Q (A, a);
+#if _FP_W_TYPE_SIZE < 64
+  FP_TRUNC (B, Q, 1, 4, R, A);
+#else
+  FP_TRUNC (B, Q, 1, 2, R, A);
+#endif
+  FP_PACK_SEMIRAW_B (r, R);
+  FP_HANDLE_EXCEPTIONS;
+
+  return r;
+}
--- libgcc/soft-fp/trunchfbf2.c.jj	2022-09-28 18:58:09.114520943 +0200
+++ libgcc/soft-fp/trunchfbf2.c	2022-09-28 18:58:09.114520943 +0200
@@ -0,0 +1,58 @@
+/* Software floating-point emulation.
+   Truncate IEEE half into bfloat16.
+   Copyright (C) 2022 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   In addition to the permissions in the GNU Lesser General Public
+   License, the Free Software Foundation gives you unlimited
+   permission to link the compiled version of this file into
+   combinations with other programs, and to distribute those
+   combinations without any restriction coming from the use of this
+   file.  (The Lesser General Public License restrictions do apply in
+   other respects; for example, they cover modification of the file,
+   and distribution when not linked into a combine executable.)
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include "soft-fp.h"
+#include "brain.h"
+#include "half.h"
+#include "single.h"
+
+/* BFtype and HFtype are unordered, neither is a superset or subset
+   of each other.  Convert HFtype to SFtype (lossless) and then
+   truncate to BFtype.  */
+
+BFtype
+__trunchfbf2 (HFtype a)
+{
+  FP_DECL_EX;
+  FP_DECL_H (A);
+  FP_DECL_S (B);
+  FP_DECL_B (R);
+  SFtype b;
+  BFtype r;
+
+  FP_INIT_ROUNDMODE;
+  FP_UNPACK_RAW_H (A, a);
+  FP_EXTEND (S, H, 1, 1, B, A);
+  FP_PACK_RAW_S (b, B);
+  FP_UNPACK_SEMIRAW_S (B, b);
+  FP_TRUNC (B, S, 1, 1, R, B);
+  FP_PACK_SEMIRAW_B (r, R);
+  FP_HANDLE_EXCEPTIONS;
+
+  return r;
+}
--- libgcc/soft-fp/truncbfhf2.c.jj	2022-09-28 18:58:09.113520956 +0200
+++ libgcc/soft-fp/truncbfhf2.c	2022-09-28 18:58:09.113520956 +0200
@@ -0,0 +1,75 @@
+/* Software floating-point emulation.
+   Truncate bfloat16 into IEEE half.
+   Copyright (C) 2022 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   In addition to the permissions in the GNU Lesser General Public
+   License, the Free Software Foundation gives you unlimited
+   permission to link the compiled version of this file into
+   combinations with other programs, and to distribute those
+   combinations without any restriction coming from the use of this
+   file.  (The Lesser General Public License restrictions do apply in
+   other respects; for example, they cover modification of the file,
+   and distribution when not linked into a combine executable.)
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include "soft-fp.h"
+#include "half.h"
+#include "brain.h"
+#include "single.h"
+
+/* BFtype and HFtype are unordered, neither is a superset or subset
+   of each other.  Convert BFtype to SFtype (lossless) and then
+   truncate to HFtype.  */
+
+HFtype
+__truncbfhf2 (BFtype a)
+{
+  FP_DECL_EX;
+  FP_DECL_H (A);
+  FP_DECL_S (B);
+  FP_DECL_B (R);
+  SFtype b;
+  HFtype r;
+
+  FP_INIT_ROUNDMODE;
+  /* Optimize BFtype to SFtype conversion to simple left shift
+     by 16 if possible, we don't need to raise exceptions on sNaN
+     here as the SFtype to HFtype truncation should do that too.  */
+  if (sizeof (BFtype) == 2
+      && sizeof (unsigned short) == 2
+      && sizeof (SFtype) == 4
+      && sizeof (unsigned int) == 4)
+    {
+      union { BFtype a; unsigned short b; } u1;
+      union { SFtype a; unsigned int b; } u2;
+      u1.a = a;
+      u2.b = (u1.b << 8) << 8;
+      b = u2.a;
+    }
+  else
+    {
+      FP_UNPACK_RAW_B (A, a);
+      FP_EXTEND (S, B, 1, 1, B, A);
+      FP_PACK_RAW_S (b, B);
+    }
+  FP_UNPACK_SEMIRAW_S (B, b);
+  FP_TRUNC (H, S, 1, 1, R, B);
+  FP_PACK_SEMIRAW_H (r, R);
+  FP_HANDLE_EXCEPTIONS;
+
+  return r;
+}
--- libgcc/soft-fp/extendbfsf2.c.jj	2022-09-28 18:58:09.114520943 +0200
+++ libgcc/soft-fp/extendbfsf2.c	2022-09-28 18:58:09.114520943 +0200
@@ -0,0 +1,49 @@
+/* Software floating-point emulation.
+   Return an bfloat16 converted to IEEE single
+   Copyright (C) 2022 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   In addition to the permissions in the GNU Lesser General Public
+   License, the Free Software Foundation gives you unlimited
+   permission to link the compiled version of this file into
+   combinations with other programs, and to distribute those
+   combinations without any restriction coming from the use of this
+   file.  (The Lesser General Public License restrictions do apply in
+   other respects; for example, they cover modification of the file,
+   and distribution when not linked into a combine executable.)
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#define FP_NO_EXACT_UNDERFLOW
+#include "soft-fp.h"
+#include "brain.h"
+#include "single.h"
+
+SFtype
+__extendbfsf2 (BFtype a)
+{
+  FP_DECL_EX;
+  FP_DECL_B (A);
+  FP_DECL_S (R);
+  SFtype r;
+
+  FP_INIT_EXCEPTIONS;
+  FP_UNPACK_RAW_B (A, a);
+  FP_EXTEND (S, B, 1, 1, R, A);
+  FP_PACK_RAW_S (r, R);
+  FP_HANDLE_EXCEPTIONS;
+
+  return r;
+}
--- libiberty/cp-demangle.h.jj	2022-09-27 08:03:27.142982423 +0200
+++ libiberty/cp-demangle.h	2022-09-29 12:42:47.291727886 +0200
@@ -180,7 +180,7 @@ d_advance (struct d_info *di, int i)
 extern const struct demangle_operator_info cplus_demangle_operators[];
 #endif
 
-#define D_BUILTIN_TYPE_COUNT (35)
+#define D_BUILTIN_TYPE_COUNT (36)
 
 CP_STATIC_IF_GLIBCPP_V3
 const struct demangle_builtin_type_info
--- libiberty/cp-demangle.c.jj	2022-09-27 08:03:27.141982437 +0200
+++ libiberty/cp-demangle.c	2022-09-29 13:04:57.083526204 +0200
@@ -2489,6 +2489,7 @@ cplus_demangle_builtin_types[D_BUILTIN_T
   /* 33 */ { NL ("decltype(nullptr)"),	NL ("decltype(nullptr)"),
 	     D_PRINT_DEFAULT },
   /* 34 */ { NL ("_Float"),	NL ("_Float"),		D_PRINT_FLOAT },
+  /* 35 */ { NL ("std::bfloat16_t"), NL ("std::bfloat16_t"), D_PRINT_FLOAT },
 };
 
 CP_STATIC_IF_GLIBCPP_V3
@@ -2753,8 +2754,20 @@ cplus_demangle_type (struct d_info *di)
 
 	case 'F':
 	  /* DF<number>_ - _Float<number>.
-	     DF<number>x - _Float<number>x.  */
+	     DF<number>x - _Float<number>x
+	     DFb16_ - std::bfloat16_t.  */
 	  {
+	    if (d_peek_char (di) == 'b')
+	      {
+		d_advance (di, 1);
+		if (d_number (di) != 16 || d_peek_char (di) != '_')
+		  return NULL;
+		d_advance (di, 1);
+		ret = d_make_builtin_type (di,
+					   &cplus_demangle_builtin_types[35]);
+		di->expansion += ret->u.s_builtin.type->len;
+		break;
+	      }
 	    int arg = d_number (di);
 	    char buf[12];
 	    char suffix = 0;
--- libiberty/testsuite/demangle-expected.jj	2022-09-27 08:03:27.168982071 +0200
+++ libiberty/testsuite/demangle-expected	2022-09-29 12:49:02.181597532 +0200
@@ -1249,6 +1249,10 @@ xxx
 _Z3xxxDF32xDF64xDF128xCDF32xVb
 xxx(_Float32x, _Float64x, _Float128x, _Float32x _Complex, bool volatile)
 xxx
+--format=auto --no-params
+_Z3xxxDFb16_
+xxx(std::bfloat16_t)
+xxx
 # https://sourceware.org/bugzilla/show_bug.cgi?id=16817
 --format=auto --no-params
 _QueueNotification_QueueController__$4PPPPPPPM_A_INotice___Z