From patchwork Fri Oct 28 06:20:06 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Li, Pan2 via Gcc-patches"
X-Patchwork-Id: 12085
X-Patchwork-Original-From: "Kong, Lingling via Gcc-patches"
From: "Li, Pan2 via Gcc-patches"
Reply-To: "Kong, Lingling"
To: "Liu, Hongtao", "gcc-patches@gcc.gnu.org"
Subject: [PATCH] i386: using __bf16 for AVX512BF16 intrinsics
Date: Fri, 28 Oct 2022 06:20:06 +0000
References: <20221028060808.1637178-1-lingling.kong@intel.com>
In-Reply-To: <20221028060808.1637178-1-lingling.kong@intel.com>
List-Id: Gcc-patches mailing list

Hi,

Previously we used unsigned short to represent bf16. That was not a good
representation, but at the time the front end did not support a bf16 type.
Now that __bf16 has been introduced in the x86 psABI, we can switch the
intrinsics to the new type.

OK for trunk?

Thanks,
Lingling

gcc/ChangeLog:

	* config/i386/avx512bf16intrin.h (__attribute__): Change short to bf16.
	(_mm_cvtsbh_ss): Ditto.
	(_mm512_cvtne2ps_pbh): Ditto.
	(_mm512_mask_cvtne2ps_pbh): Ditto.
	(_mm512_maskz_cvtne2ps_pbh): Ditto.
	* config/i386/avx512bf16vlintrin.h (__attribute__): Ditto.
	(_mm256_cvtne2ps_pbh): Ditto.
	(_mm256_mask_cvtne2ps_pbh): Ditto.
	(_mm256_maskz_cvtne2ps_pbh): Ditto.
	(_mm_cvtne2ps_pbh): Ditto.
	(_mm_mask_cvtne2ps_pbh): Ditto.
	(_mm_maskz_cvtne2ps_pbh): Ditto.
	(_mm_cvtness_sbh): Ditto.
	* config/i386/i386-builtin-types.def (V8BF): Add new
	DEF_VECTOR_TYPE for BFmode.
	(V16BF): Ditto.
	(V32BF): Ditto.
	* config/i386/i386-builtin.def (BDESC): Fixed builtins.
	* config/i386/i386-expand.cc (ix86_expand_args_builtin): Changed
	avx512bf16 ix86_builtin_func_type included HI to BF.
	* config/i386/immintrin.h: Add SSE2 depend for avx512bf16.
	* config/i386/sse.md (TARGET_AVX512VL): Changed HI vector to BF
	vector.
	(avx512f_cvtneps2bf16_v4sf): New define_expand.
	(*avx512f_cvtneps2bf16_v4sf): New define_insn.
	(avx512f_cvtneps2bf16_v4sf_maskz): Ditto.
	(avx512f_cvtneps2bf16_v4sf_mask): Ditto.
	(avx512f_cvtneps2bf16_v4sf_mask_1): Ditto.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/avx512bf16-cvtsbh2ss-1.c: Add fpmath option.
	* gcc.target/i386/avx512bf16-vdpbf16ps-2.c: Fixed scan-assembler.
	* gcc.target/i386/avx512bf16vl-cvtness2sbh-1.c: Add x/y suffix
	for vcvtneps2bf16.
	* gcc.target/i386/avx512bf16vl-vcvtneps2bf16-1.c: Ditto.
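
For readers less familiar with the format: bf16 keeps the sign bit, the 8
exponent bits and the top 7 mantissa bits of an IEEE binary32 float, which
is why a 16-bit integer could serve as storage before a real type existed.
The following is a minimal, portable sketch of that bit-level relationship.
It is not part of the patch: the helper names bf16_bits_to_float and
float_to_bf16_bits are hypothetical, and it deliberately uses uint16_t
instead of __bf16 so that it builds without AVX512BF16 support. The
narrowing helper assumes round-to-nearest-even, as the "ne" in
vcvtneps2bf16 indicates, and does not try to model the instruction's exact
handling of denormals or NaN payloads.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper, not a GCC intrinsic: widen raw bf16 bits to float
   by placing them in the upper 16 bits of a 32-bit word.  */
static float
bf16_bits_to_float (uint16_t bits)
{
  uint32_t u = (uint32_t) bits << 16;
  float f;

  assert (sizeof f == sizeof u);	/* Assumes a 32-bit IEEE float.  */
  memcpy (&f, &u, sizeof f);
  return f;
}

/* Hypothetical helper: narrow a float to raw bf16 bits, rounding to
   nearest with ties to even.  NaN inputs come back as quiet NaNs.  */
static uint16_t
float_to_bf16_bits (float f)
{
  uint32_t u;

  assert (sizeof f == sizeof u);	/* Assumes a 32-bit IEEE float.  */
  memcpy (&u, &f, sizeof u);
  if ((u & 0x7fffffffu) > 0x7f800000u)
    /* NaN: keep the top bits and force the quiet bit.  */
    return (uint16_t) ((u >> 16) | 0x0040u);
  /* Add just under half of the discarded range, plus the lowest kept bit,
     so that exact ties round to the even result.  */
  u += 0x7fffu + ((u >> 16) & 1u);
  return (uint16_t) (u >> 16);
}
```

With these helpers, float_to_bf16_bits (1.0f) yields 0x3f80 and
bf16_bits_to_float (0x3f80) gives back 1.0f.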
---
 gcc/config/i386/avx512bf16intrin.h            |  12 +--
 gcc/config/i386/avx512bf16vlintrin.h          |  29 ++---
 gcc/config/i386/i386-builtin-types.def        |  51 ++++-----
 gcc/config/i386/i386-builtin.def              |  54 +++++-----
 gcc/config/i386/i386-expand.cc                |  48 ++++-----
 gcc/config/i386/immintrin.h                   |   2 +
 gcc/config/i386/sse.md                        | 101 ++++++++++++++----
 .../gcc.target/i386/avx512bf16-cvtsbh2ss-1.c  |   2 +-
 .../gcc.target/i386/avx512bf16-vdpbf16ps-2.c  |   2 +-
 .../i386/avx512bf16vl-cvtness2sbh-1.c         |   2 +-
 .../i386/avx512bf16vl-vcvtneps2bf16-1.c       |  12 +--
 11 files changed, 189 insertions(+), 126 deletions(-)

diff --git a/gcc/config/i386/avx512bf16intrin.h b/gcc/config/i386/avx512bf16intrin.h
index b6e9ddad157..ea1d0125b3f 100644
--- a/gcc/config/i386/avx512bf16intrin.h
+++ b/gcc/config/i386/avx512bf16intrin.h
@@ -35,16 +35,16 @@
 #endif /* __AVX512BF16__ */
 
 /* Internal data types for implementing the intrinsics.  */
-typedef short __v32bh __attribute__ ((__vector_size__ (64)));
+typedef __bf16 __v32bf __attribute__ ((__vector_size__ (64)));
 
 /* The Intel API is flexible enough that we must allow aliasing with other
    vector types, and their scalar components.  */
-typedef short __m512bh __attribute__ ((__vector_size__ (64), __may_alias__));
+typedef __bf16 __m512bh __attribute__ ((__vector_size__ (64), __may_alias__));
 
 /* Convert One BF16 Data to One Single Float Data.  */
 extern __inline float
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm_cvtsbh_ss (__bfloat16 __A)
+_mm_cvtsbh_ss (__bf16 __A)
 {
   union{ float a; unsigned int b;} __tmp;
   __tmp.b = ((unsigned int)(__A)) << 16;
@@ -57,21 +57,21 @@ extern __inline __m512bh
 __attribute__((__gnu_inline__, __always_inline__, __artificial__))
 _mm512_cvtne2ps_pbh (__m512 __A, __m512 __B)
 {
-  return (__m512bh)__builtin_ia32_cvtne2ps2bf16_v32hi(__A, __B);
+  return (__m512bh)__builtin_ia32_cvtne2ps2bf16_v32bf(__A, __B);
 }
 
 extern __inline __m512bh
 __attribute__((__gnu_inline__, __always_inline__, __artificial__))
 _mm512_mask_cvtne2ps_pbh (__m512bh __A, __mmask32 __B, __m512 __C, __m512 __D)
 {
-  return (__m512bh)__builtin_ia32_cvtne2ps2bf16_v32hi_mask(__C, __D, __A, __B);
+  return (__m512bh)__builtin_ia32_cvtne2ps2bf16_v32bf_mask(__C, __D, __A, __B);
 }
 
 extern __inline __m512bh
 __attribute__((__gnu_inline__, __always_inline__, __artificial__))
 _mm512_maskz_cvtne2ps_pbh (__mmask32 __A, __m512 __B, __m512 __C)
 {
-  return (__m512bh)__builtin_ia32_cvtne2ps2bf16_v32hi_maskz(__B, __C, __A);
+  return (__m512bh)__builtin_ia32_cvtne2ps2bf16_v32bf_maskz(__B, __C, __A);
 }
 
 /* vcvtneps2bf16 */
diff --git a/gcc/config/i386/avx512bf16vlintrin.h b/gcc/config/i386/avx512bf16vlintrin.h
index 969335ff358..56c28f14cf6 100644
--- a/gcc/config/i386/avx512bf16vlintrin.h
+++ b/gcc/config/i386/avx512bf16vlintrin.h
@@ -35,57 +35,58 @@
 #endif /* __AVX512BF16__ */
 
 /* Internal data types for implementing the intrinsics.  */
-typedef short __v16bh __attribute__ ((__vector_size__ (32)));
-typedef short __v8bh __attribute__ ((__vector_size__ (16)));
+typedef __bf16 __v16bf __attribute__ ((__vector_size__ (32)));
+typedef __bf16 __v8bf __attribute__ ((__vector_size__ (16)));
 
 /* The Intel API is flexible enough that we must allow aliasing with other
    vector types, and their scalar components.  */
-typedef short __m256bh __attribute__ ((__vector_size__ (32), __may_alias__));
-typedef short __m128bh __attribute__ ((__vector_size__ (16), __may_alias__));
+typedef __bf16 __m256bh __attribute__ ((__vector_size__ (32), __may_alias__));
+typedef __bf16 __m128bh __attribute__ ((__vector_size__ (16), __may_alias__));
+
+typedef __bf16 __bfloat16;
 
-typedef unsigned short __bfloat16;
 /* vcvtne2ps2bf16 */
 
 extern __inline __m256bh
 __attribute__((__gnu_inline__, __always_inline__, __artificial__))
 _mm256_cvtne2ps_pbh (__m256 __A, __m256 __B)
 {
-  return (__m256bh)__builtin_ia32_cvtne2ps2bf16_v16hi(__A, __B);
+  return (__m256bh)__builtin_ia32_cvtne2ps2bf16_v16bf(__A, __B);
 }
 
 extern __inline __m256bh
 __attribute__((__gnu_inline__, __always_inline__, __artificial__))
 _mm256_mask_cvtne2ps_pbh (__m256bh __A, __mmask16 __B, __m256 __C, __m256 __D)
 {
-  return (__m256bh)__builtin_ia32_cvtne2ps2bf16_v16hi_mask(__C, __D, __A, __B);
+  return (__m256bh)__builtin_ia32_cvtne2ps2bf16_v16bf_mask(__C, __D, __A, __B);
 }
 
 extern __inline __m256bh
 __attribute__((__gnu_inline__, __always_inline__, __artificial__))
 _mm256_maskz_cvtne2ps_pbh (__mmask16 __A, __m256 __B, __m256 __C)
 {
-  return (__m256bh)__builtin_ia32_cvtne2ps2bf16_v16hi_maskz(__B, __C, __A);
+  return (__m256bh)__builtin_ia32_cvtne2ps2bf16_v16bf_maskz(__B, __C, __A);
 }
 
 extern __inline __m128bh
 __attribute__((__gnu_inline__, __always_inline__, __artificial__))
 _mm_cvtne2ps_pbh (__m128 __A, __m128 __B)
 {
-  return (__m128bh)__builtin_ia32_cvtne2ps2bf16_v8hi(__A, __B);
+  return (__m128bh)__builtin_ia32_cvtne2ps2bf16_v8bf(__A, __B);
 }
 
 extern __inline __m128bh
 __attribute__((__gnu_inline__, __always_inline__, __artificial__))
 _mm_mask_cvtne2ps_pbh (__m128bh __A, __mmask8 __B, __m128 __C, __m128 __D)
 {
-  return (__m128bh)__builtin_ia32_cvtne2ps2bf16_v8hi_mask(__C, __D, __A, __B);
+  return (__m128bh)__builtin_ia32_cvtne2ps2bf16_v8bf_mask(__C, __D, __A, __B);
 }
 
 extern __inline __m128bh
 __attribute__((__gnu_inline__, __always_inline__, __artificial__))
 _mm_maskz_cvtne2ps_pbh (__mmask8 __A, __m128 __B, __m128 __C)
 {
-  return (__m128bh)__builtin_ia32_cvtne2ps2bf16_v8hi_maskz(__B, __C, __A);
+  return (__m128bh)__builtin_ia32_cvtne2ps2bf16_v8bf_maskz(__B, __C, __A);
 }
 
 /* vcvtneps2bf16 */
@@ -176,13 +177,13 @@ _mm_maskz_dpbf16_ps (__mmask8 __A, __m128 __B, __m128bh __C, __m128bh __D)
   return (__m128)__builtin_ia32_dpbf16ps_v4sf_maskz(__B, __C, __D, __A);
 }
 
-extern __inline __bfloat16
+extern __inline __bf16
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm_cvtness_sbh (float __A)
 {
   __v4sf __V = {__A, 0, 0, 0};
-  __v8hi __R = __builtin_ia32_cvtneps2bf16_v4sf_mask ((__v4sf)__V,
-	       (__v8hi)_mm_undefined_si128 (), (__mmask8)-1);
+  __v8bf __R = __builtin_ia32_cvtneps2bf16_v4sf_mask ((__v4sf)__V,
+	       (__v8bf)_mm_undefined_si128 (), (__mmask8)-1);
   return __R[0];
 }
 
diff --git a/gcc/config/i386/i386-builtin-types.def b/gcc/config/i386/i386-builtin-types.def
index 63a360b0f8b..aedae2d7750 100644
--- a/gcc/config/i386/i386-builtin-types.def
+++ b/gcc/config/i386/i386-builtin-types.def
@@ -87,6 +87,7 @@ DEF_VECTOR_TYPE (V8QI, QI)
 DEF_VECTOR_TYPE (V2DF, DOUBLE)
 DEF_VECTOR_TYPE (V4SF, FLOAT)
 DEF_VECTOR_TYPE (V8HF, FLOAT16)
+DEF_VECTOR_TYPE (V8BF, BFLOAT16)
 DEF_VECTOR_TYPE (V2DI, DI)
 DEF_VECTOR_TYPE (V4SI, SI)
 DEF_VECTOR_TYPE (V8HI, HI)
@@ -100,6 +101,7 @@ DEF_VECTOR_TYPE (V16UQI, UQI, V16QI)
 DEF_VECTOR_TYPE (V4DF, DOUBLE)
 DEF_VECTOR_TYPE (V8SF, FLOAT)
 DEF_VECTOR_TYPE (V16HF, FLOAT16)
+DEF_VECTOR_TYPE (V16BF, BFLOAT16)
 DEF_VECTOR_TYPE (V4DI, DI)
 DEF_VECTOR_TYPE (V8SI, SI)
 DEF_VECTOR_TYPE (V16HI, HI)
@@ -111,6 +113,7 @@ DEF_VECTOR_TYPE (V16UHI, UHI, V16HI)
 # AVX512F vectors
 DEF_VECTOR_TYPE (V32SF, FLOAT)
 DEF_VECTOR_TYPE (V32HF, FLOAT16)
+DEF_VECTOR_TYPE (V32BF, BFLOAT16)
 DEF_VECTOR_TYPE (V16SF, FLOAT)
 DEF_VECTOR_TYPE (V8DF, DOUBLE)
 DEF_VECTOR_TYPE (V8DI, DI)
@@ -1273,30 +1276,30 @@ DEF_FUNCTION_TYPE (V4SI, V4SI, V4SI, UHI)
 DEF_FUNCTION_TYPE (V8SI, V8SI, V8SI, UHI)
 
 # BF16 builtins
-DEF_FUNCTION_TYPE (V32HI, V16SF, V16SF)
-DEF_FUNCTION_TYPE (V32HI, V16SF, V16SF, V32HI, USI)
-DEF_FUNCTION_TYPE (V32HI, V16SF, V16SF, USI)
-DEF_FUNCTION_TYPE (V16HI, V8SF, V8SF)
-DEF_FUNCTION_TYPE (V16HI, V8SF, V8SF, V16HI, UHI)
-DEF_FUNCTION_TYPE (V16HI, V8SF, V8SF, UHI)
-DEF_FUNCTION_TYPE (V8HI, V4SF, V4SF)
-DEF_FUNCTION_TYPE (V8HI, V4SF, V4SF, V8HI, UQI)
-DEF_FUNCTION_TYPE (V8HI, V4SF, V4SF, UQI)
-DEF_FUNCTION_TYPE (V16HI, V16SF)
-DEF_FUNCTION_TYPE (V16HI, V16SF, V16HI, UHI)
-DEF_FUNCTION_TYPE (V16HI, V16SF, UHI)
-DEF_FUNCTION_TYPE (V8HI, V8SF)
-DEF_FUNCTION_TYPE (V8HI, V8SF, V8HI, UQI)
-DEF_FUNCTION_TYPE (V8HI, V8SF, UQI)
-DEF_FUNCTION_TYPE (V8HI, V4SF)
-DEF_FUNCTION_TYPE (V8HI, V4SF, V8HI, UQI)
-DEF_FUNCTION_TYPE (V8HI, V4SF, UQI)
-DEF_FUNCTION_TYPE (V16SF, V16SF, V32HI, V32HI)
-DEF_FUNCTION_TYPE (V16SF, V16SF, V32HI, V32HI, UHI)
-DEF_FUNCTION_TYPE (V8SF, V8SF, V16HI, V16HI)
-DEF_FUNCTION_TYPE (V8SF, V8SF, V16HI, V16HI, UQI)
-DEF_FUNCTION_TYPE (V4SF, V4SF, V8HI, V8HI)
-DEF_FUNCTION_TYPE (V4SF, V4SF, V8HI, V8HI, UQI)
+DEF_FUNCTION_TYPE (V32BF, V16SF, V16SF)
+DEF_FUNCTION_TYPE (V32BF, V16SF, V16SF, V32BF, USI)
+DEF_FUNCTION_TYPE (V32BF, V16SF, V16SF, USI)
+DEF_FUNCTION_TYPE (V16BF, V8SF, V8SF)
+DEF_FUNCTION_TYPE (V16BF, V8SF, V8SF, V16BF, UHI)
+DEF_FUNCTION_TYPE (V16BF, V8SF, V8SF, UHI)
+DEF_FUNCTION_TYPE (V8BF, V4SF, V4SF)
+DEF_FUNCTION_TYPE (V8BF, V4SF, V4SF, V8BF, UQI)
+DEF_FUNCTION_TYPE (V8BF, V4SF, V4SF, UQI)
+DEF_FUNCTION_TYPE (V16BF, V16SF)
+DEF_FUNCTION_TYPE (V16BF, V16SF, V16BF, UHI)
+DEF_FUNCTION_TYPE (V16BF, V16SF, UHI)
+DEF_FUNCTION_TYPE (V8BF, V8SF)
+DEF_FUNCTION_TYPE (V8BF, V8SF, V8BF, UQI)
+DEF_FUNCTION_TYPE (V8BF, V8SF, UQI)
+DEF_FUNCTION_TYPE (V8BF, V4SF)
+DEF_FUNCTION_TYPE (V8BF, V4SF, V8BF, UQI)
+DEF_FUNCTION_TYPE (V8BF, V4SF, UQI)
+DEF_FUNCTION_TYPE (V16SF, V16SF, V32BF, V32BF)
+DEF_FUNCTION_TYPE (V16SF, V16SF, V32BF, V32BF, UHI)
+DEF_FUNCTION_TYPE (V8SF, V8SF, V16BF, V16BF)
+DEF_FUNCTION_TYPE (V8SF, V8SF, V16BF, V16BF, UQI)
+DEF_FUNCTION_TYPE (V4SF, V4SF, V8BF, V8BF)
+DEF_FUNCTION_TYPE (V4SF, V4SF, V8BF, V8BF, UQI)
 
 # KEYLOCKER builtins
 DEF_FUNCTION_TYPE (UINT, UINT, V2DI, V2DI, PVOID)
diff --git a/gcc/config/i386/i386-builtin.def b/gcc/config/i386/i386-builtin.def
index e35306e27d0..5802e2049a8 100644
--- a/gcc/config/i386/i386-builtin.def
+++ b/gcc/config/i386/i386-builtin.def
@@ -2779,33 +2779,33 @@ BDESC (0, OPTION_MASK_ISA2_VAES, CODE_FOR_vaesenclast_v32qi, "__builtin_ia32_vae
 BDESC (0, OPTION_MASK_ISA2_VAES, CODE_FOR_vaesenclast_v64qi, "__builtin_ia32_vaesenclast_v64qi", IX86_BUILTIN_VAESENCLAST64, UNKNOWN, (int) V64QI_FTYPE_V64QI_V64QI)
 
 /* BF16 */
-BDESC (0, OPTION_MASK_ISA2_AVX512BF16, CODE_FOR_avx512f_cvtne2ps2bf16_v32hi, "__builtin_ia32_cvtne2ps2bf16_v32hi", IX86_BUILTIN_CVTNE2PS2HI16_V32HI, UNKNOWN, (int) V32HI_FTYPE_V16SF_V16SF)
-BDESC (0, OPTION_MASK_ISA2_AVX512BF16, CODE_FOR_avx512f_cvtne2ps2bf16_v32hi_mask, "__builtin_ia32_cvtne2ps2bf16_v32hi_mask", IX86_BUILTIN_CVTNE2PS2HI16_V32HI_MASK, UNKNOWN, (int) V32HI_FTYPE_V16SF_V16SF_V32HI_USI)
-BDESC (0, OPTION_MASK_ISA2_AVX512BF16, CODE_FOR_avx512f_cvtne2ps2bf16_v32hi_maskz, "__builtin_ia32_cvtne2ps2bf16_v32hi_maskz", IX86_BUILTIN_CVTNE2PS2HI16_V32HI_MASKZ, UNKNOWN, (int) V32HI_FTYPE_V16SF_V16SF_USI)
-BDESC (0, OPTION_MASK_ISA2_AVX512BF16, CODE_FOR_avx512f_cvtne2ps2bf16_v16hi, "__builtin_ia32_cvtne2ps2bf16_v16hi", IX86_BUILTIN_CVTNE2PS2HI16_V16HI, UNKNOWN, (int) V16HI_FTYPE_V8SF_V8SF)
-BDESC (0, OPTION_MASK_ISA2_AVX512BF16, CODE_FOR_avx512f_cvtne2ps2bf16_v16hi_mask, "__builtin_ia32_cvtne2ps2bf16_v16hi_mask", IX86_BUILTIN_CVTNE2PS2HI16_V16HI_MASK, UNKNOWN, (int) V16HI_FTYPE_V8SF_V8SF_V16HI_UHI)
-BDESC (0, OPTION_MASK_ISA2_AVX512BF16, CODE_FOR_avx512f_cvtne2ps2bf16_v16hi_maskz, "__builtin_ia32_cvtne2ps2bf16_v16hi_maskz", IX86_BUILTIN_CVTNE2PS2HI16_V16HI_MASKZ, UNKNOWN, (int) V16HI_FTYPE_V8SF_V8SF_UHI)
-BDESC (0, OPTION_MASK_ISA2_AVX512BF16, CODE_FOR_avx512f_cvtne2ps2bf16_v8hi, "__builtin_ia32_cvtne2ps2bf16_v8hi", IX86_BUILTIN_CVTNE2PS2HI16_V8HI, UNKNOWN, (int) V8HI_FTYPE_V4SF_V4SF)
-BDESC (0, OPTION_MASK_ISA2_AVX512BF16, CODE_FOR_avx512f_cvtne2ps2bf16_v8hi_mask, "__builtin_ia32_cvtne2ps2bf16_v8hi_mask", IX86_BUILTIN_CVTNE2PS2HI16_V8HI_MASK, UNKNOWN, (int) V8HI_FTYPE_V4SF_V4SF_V8HI_UQI)
-BDESC (0, OPTION_MASK_ISA2_AVX512BF16, CODE_FOR_avx512f_cvtne2ps2bf16_v8hi_maskz, "__builtin_ia32_cvtne2ps2bf16_v8hi_maskz", IX86_BUILTIN_CVTNE2PS2HI16_V8HI_MASKZ, UNKNOWN, (int) V8HI_FTYPE_V4SF_V4SF_UQI)
-BDESC (0, OPTION_MASK_ISA2_AVX512BF16, CODE_FOR_avx512f_cvtneps2bf16_v16sf, "__builtin_ia32_cvtneps2bf16_v16sf", IX86_BUILTIN_CVTNEPS2HI16_V16SF, UNKNOWN, (int) V16HI_FTYPE_V16SF)
-BDESC (0, OPTION_MASK_ISA2_AVX512BF16, CODE_FOR_avx512f_cvtneps2bf16_v16sf_mask, "__builtin_ia32_cvtneps2bf16_v16sf_mask", IX86_BUILTIN_CVTNEPS2HI16_V16SF_MASK, UNKNOWN, (int) V16HI_FTYPE_V16SF_V16HI_UHI)
-BDESC (0, OPTION_MASK_ISA2_AVX512BF16, CODE_FOR_avx512f_cvtneps2bf16_v16sf_maskz, "__builtin_ia32_cvtneps2bf16_v16sf_maskz", IX86_BUILTIN_CVTNE2PS2HI16_V16SF_MASKZ, UNKNOWN, (int) V16HI_FTYPE_V16SF_UHI)
-BDESC (0, OPTION_MASK_ISA2_AVX512BF16, CODE_FOR_avx512f_cvtneps2bf16_v8sf, "__builtin_ia32_cvtneps2bf16_v8sf", IX86_BUILTIN_CVTNEPS2HI16_V8SF, UNKNOWN, (int) V8HI_FTYPE_V8SF)
-BDESC (0, OPTION_MASK_ISA2_AVX512BF16, CODE_FOR_avx512f_cvtneps2bf16_v8sf_mask, "__builtin_ia32_cvtneps2bf16_v8sf_mask", IX86_BUILTIN_CVTNEPS2HI16_V8SF_MASK, UNKNOWN, (int) V8HI_FTYPE_V8SF_V8HI_UQI)
-BDESC (0, OPTION_MASK_ISA2_AVX512BF16, CODE_FOR_avx512f_cvtneps2bf16_v8sf_maskz, "__builtin_ia32_cvtneps2bf16_v8sf_maskz", IX86_BUILTIN_CVTNE2PS2HI16_V8SF_MASKZ, UNKNOWN, (int) V8HI_FTYPE_V8SF_UQI)
-BDESC (0, OPTION_MASK_ISA2_AVX512BF16, CODE_FOR_avx512f_cvtneps2bf16_v4sf, "__builtin_ia32_cvtneps2bf16_v4sf", IX86_BUILTIN_CVTNEPS2HI16_V4SF, UNKNOWN, (int) V8HI_FTYPE_V4SF)
-BDESC (0, OPTION_MASK_ISA2_AVX512BF16, CODE_FOR_avx512f_cvtneps2bf16_v4sf_mask, "__builtin_ia32_cvtneps2bf16_v4sf_mask", IX86_BUILTIN_CVTNEPS2HI16_V4SF_MASK, UNKNOWN, (int) V8HI_FTYPE_V4SF_V8HI_UQI)
-BDESC (0, OPTION_MASK_ISA2_AVX512BF16, CODE_FOR_avx512f_cvtneps2bf16_v4sf_maskz, "__builtin_ia32_cvtneps2bf16_v4sf_maskz", IX86_BUILTIN_CVTNE2PS2HI16_V4SF_MASKZ, UNKNOWN, (int) V8HI_FTYPE_V4SF_UQI)
-BDESC (0, OPTION_MASK_ISA2_AVX512BF16, CODE_FOR_avx512f_dpbf16ps_v16sf, "__builtin_ia32_dpbf16ps_v16sf", IX86_BUILTIN_DPHI16PS_V16SF, UNKNOWN, (int) V16SF_FTYPE_V16SF_V32HI_V32HI)
-BDESC (0, OPTION_MASK_ISA2_AVX512BF16, CODE_FOR_avx512f_dpbf16ps_v16sf_mask, "__builtin_ia32_dpbf16ps_v16sf_mask", IX86_BUILTIN_DPHI16PS_V16SF_MASK, UNKNOWN, (int) V16SF_FTYPE_V16SF_V32HI_V32HI_UHI)
-BDESC (0, OPTION_MASK_ISA2_AVX512BF16, CODE_FOR_avx512f_dpbf16ps_v16sf_maskz, "__builtin_ia32_dpbf16ps_v16sf_maskz", IX86_BUILTIN_DPHI16PS_V16SF_MASKZ, UNKNOWN, (int) V16SF_FTYPE_V16SF_V32HI_V32HI_UHI)
-BDESC (0, OPTION_MASK_ISA2_AVX512BF16, CODE_FOR_avx512f_dpbf16ps_v8sf, "__builtin_ia32_dpbf16ps_v8sf", IX86_BUILTIN_DPHI16PS_V8SF, UNKNOWN, (int) V8SF_FTYPE_V8SF_V16HI_V16HI)
-BDESC (0, OPTION_MASK_ISA2_AVX512BF16, CODE_FOR_avx512f_dpbf16ps_v8sf_mask, "__builtin_ia32_dpbf16ps_v8sf_mask", IX86_BUILTIN_DPHI16PS_V8SF_MASK, UNKNOWN, (int) V8SF_FTYPE_V8SF_V16HI_V16HI_UQI)
-BDESC (0, OPTION_MASK_ISA2_AVX512BF16, CODE_FOR_avx512f_dpbf16ps_v8sf_maskz, "__builtin_ia32_dpbf16ps_v8sf_maskz", IX86_BUILTIN_DPHI16PS_V8SF_MASKZ, UNKNOWN, (int) V8SF_FTYPE_V8SF_V16HI_V16HI_UQI)
-BDESC (0, OPTION_MASK_ISA2_AVX512BF16, CODE_FOR_avx512f_dpbf16ps_v4sf, "__builtin_ia32_dpbf16ps_v4sf", IX86_BUILTIN_DPHI16PS_V4SF, UNKNOWN, (int) V4SF_FTYPE_V4SF_V8HI_V8HI)
-BDESC (0, OPTION_MASK_ISA2_AVX512BF16, CODE_FOR_avx512f_dpbf16ps_v4sf_mask, "__builtin_ia32_dpbf16ps_v4sf_mask", IX86_BUILTIN_DPHI16PS_V4SF_MASK, UNKNOWN, (int) V4SF_FTYPE_V4SF_V8HI_V8HI_UQI)
-BDESC (0, OPTION_MASK_ISA2_AVX512BF16, CODE_FOR_avx512f_dpbf16ps_v4sf_maskz, "__builtin_ia32_dpbf16ps_v4sf_maskz", IX86_BUILTIN_DPHI16PS_V4SF_MASKZ, UNKNOWN, (int) V4SF_FTYPE_V4SF_V8HI_V8HI_UQI)
+BDESC (0, OPTION_MASK_ISA2_AVX512BF16, CODE_FOR_avx512f_cvtne2ps2bf16_v32bf, "__builtin_ia32_cvtne2ps2bf16_v32bf", IX86_BUILTIN_CVTNE2PS2BF16_V32BF, UNKNOWN, (int) V32BF_FTYPE_V16SF_V16SF)
+BDESC (0, OPTION_MASK_ISA2_AVX512BF16, CODE_FOR_avx512f_cvtne2ps2bf16_v32bf_mask, "__builtin_ia32_cvtne2ps2bf16_v32bf_mask", IX86_BUILTIN_CVTNE2PS2BF16_V32BF_MASK, UNKNOWN, (int) V32BF_FTYPE_V16SF_V16SF_V32BF_USI)
+BDESC (0, OPTION_MASK_ISA2_AVX512BF16, CODE_FOR_avx512f_cvtne2ps2bf16_v32bf_maskz, "__builtin_ia32_cvtne2ps2bf16_v32bf_maskz", IX86_BUILTIN_CVTNE2PS2BF16_V32BF_MASKZ, UNKNOWN, (int) V32BF_FTYPE_V16SF_V16SF_USI)
+BDESC (0, OPTION_MASK_ISA2_AVX512BF16, CODE_FOR_avx512f_cvtne2ps2bf16_v16bf, "__builtin_ia32_cvtne2ps2bf16_v16bf", IX86_BUILTIN_CVTNE2PS2BF16_V16BF, UNKNOWN, (int) V16BF_FTYPE_V8SF_V8SF)
+BDESC (0, OPTION_MASK_ISA2_AVX512BF16, CODE_FOR_avx512f_cvtne2ps2bf16_v16bf_mask, "__builtin_ia32_cvtne2ps2bf16_v16bf_mask", IX86_BUILTIN_CVTNE2PS2BF16_V16BF_MASK, UNKNOWN, (int) V16BF_FTYPE_V8SF_V8SF_V16BF_UHI)
+BDESC (0, OPTION_MASK_ISA2_AVX512BF16, CODE_FOR_avx512f_cvtne2ps2bf16_v16bf_maskz, "__builtin_ia32_cvtne2ps2bf16_v16bf_maskz", IX86_BUILTIN_CVTNE2PS2BF16_V16BF_MASKZ, UNKNOWN, (int) V16BF_FTYPE_V8SF_V8SF_UHI)
+BDESC (0, OPTION_MASK_ISA2_AVX512BF16, CODE_FOR_avx512f_cvtne2ps2bf16_v8bf, "__builtin_ia32_cvtne2ps2bf16_v8bf", IX86_BUILTIN_CVTNE2PS2BF16_V8BF, UNKNOWN, (int) V8BF_FTYPE_V4SF_V4SF)
+BDESC (0, OPTION_MASK_ISA2_AVX512BF16, CODE_FOR_avx512f_cvtne2ps2bf16_v8bf_mask, "__builtin_ia32_cvtne2ps2bf16_v8bf_mask", IX86_BUILTIN_CVTNE2PS2BF16_V8BF_MASK, UNKNOWN, (int) V8BF_FTYPE_V4SF_V4SF_V8BF_UQI)
+BDESC (0, OPTION_MASK_ISA2_AVX512BF16, CODE_FOR_avx512f_cvtne2ps2bf16_v8bf_maskz, "__builtin_ia32_cvtne2ps2bf16_v8bf_maskz", IX86_BUILTIN_CVTNE2PS2BF16_V8BF_MASKZ, UNKNOWN, (int) V8BF_FTYPE_V4SF_V4SF_UQI)
+BDESC (0, OPTION_MASK_ISA2_AVX512BF16, CODE_FOR_avx512f_cvtneps2bf16_v16sf, "__builtin_ia32_cvtneps2bf16_v16sf", IX86_BUILTIN_CVTNEPS2BF16_V16SF, UNKNOWN, (int) V16BF_FTYPE_V16SF)
+BDESC (0, OPTION_MASK_ISA2_AVX512BF16, CODE_FOR_avx512f_cvtneps2bf16_v16sf_mask, "__builtin_ia32_cvtneps2bf16_v16sf_mask", IX86_BUILTIN_CVTNEPS2BF16_V16SF_MASK, UNKNOWN, (int) V16BF_FTYPE_V16SF_V16BF_UHI)
+BDESC (0, OPTION_MASK_ISA2_AVX512BF16, CODE_FOR_avx512f_cvtneps2bf16_v16sf_maskz, "__builtin_ia32_cvtneps2bf16_v16sf_maskz", IX86_BUILTIN_CVTNE2PS2BF16_V16SF_MASKZ, UNKNOWN, (int) V16BF_FTYPE_V16SF_UHI)
+BDESC (0, OPTION_MASK_ISA2_AVX512BF16, CODE_FOR_avx512f_cvtneps2bf16_v8sf, "__builtin_ia32_cvtneps2bf16_v8sf", IX86_BUILTIN_CVTNEPS2BF16_V8SF, UNKNOWN, (int) V8BF_FTYPE_V8SF)
+BDESC (0, OPTION_MASK_ISA2_AVX512BF16, CODE_FOR_avx512f_cvtneps2bf16_v8sf_mask, "__builtin_ia32_cvtneps2bf16_v8sf_mask", IX86_BUILTIN_CVTNEPS2BF16_V8SF_MASK, UNKNOWN, (int) V8BF_FTYPE_V8SF_V8BF_UQI)
+BDESC (0, OPTION_MASK_ISA2_AVX512BF16, CODE_FOR_avx512f_cvtneps2bf16_v8sf_maskz, "__builtin_ia32_cvtneps2bf16_v8sf_maskz", IX86_BUILTIN_CVTNE2PS2BF16_V8SF_MASKZ, UNKNOWN, (int) V8BF_FTYPE_V8SF_UQI)
+BDESC (0, OPTION_MASK_ISA2_AVX512BF16, CODE_FOR_avx512f_cvtneps2bf16_v4sf, "__builtin_ia32_cvtneps2bf16_v4sf", IX86_BUILTIN_CVTNEPS2BF16_V4SF, UNKNOWN, (int) V8BF_FTYPE_V4SF)
+BDESC (0, OPTION_MASK_ISA2_AVX512BF16, CODE_FOR_avx512f_cvtneps2bf16_v4sf_mask, "__builtin_ia32_cvtneps2bf16_v4sf_mask", IX86_BUILTIN_CVTNEPS2BF16_V4SF_MASK, UNKNOWN, (int) V8BF_FTYPE_V4SF_V8BF_UQI)
+BDESC (0, OPTION_MASK_ISA2_AVX512BF16, CODE_FOR_avx512f_cvtneps2bf16_v4sf_maskz, "__builtin_ia32_cvtneps2bf16_v4sf_maskz", IX86_BUILTIN_CVTNE2PS2BF16_V4SF_MASKZ, UNKNOWN, (int) V8BF_FTYPE_V4SF_UQI)
+BDESC (0, OPTION_MASK_ISA2_AVX512BF16, CODE_FOR_avx512f_dpbf16ps_v16sf, "__builtin_ia32_dpbf16ps_v16sf", IX86_BUILTIN_DPBF16PS_V16SF, UNKNOWN, (int) V16SF_FTYPE_V16SF_V32BF_V32BF)
+BDESC (0, OPTION_MASK_ISA2_AVX512BF16, CODE_FOR_avx512f_dpbf16ps_v16sf_mask, "__builtin_ia32_dpbf16ps_v16sf_mask", IX86_BUILTIN_DPBF16PS_V16SF_MASK, UNKNOWN, (int) V16SF_FTYPE_V16SF_V32BF_V32BF_UHI)
+BDESC (0, OPTION_MASK_ISA2_AVX512BF16, CODE_FOR_avx512f_dpbf16ps_v16sf_maskz, "__builtin_ia32_dpbf16ps_v16sf_maskz", IX86_BUILTIN_DPBF16PS_V16SF_MASKZ, UNKNOWN, (int) V16SF_FTYPE_V16SF_V32BF_V32BF_UHI)
+BDESC (0, OPTION_MASK_ISA2_AVX512BF16, CODE_FOR_avx512f_dpbf16ps_v8sf, "__builtin_ia32_dpbf16ps_v8sf", IX86_BUILTIN_DPBF16PS_V8SF, UNKNOWN, (int) V8SF_FTYPE_V8SF_V16BF_V16BF)
+BDESC (0, OPTION_MASK_ISA2_AVX512BF16, CODE_FOR_avx512f_dpbf16ps_v8sf_mask, "__builtin_ia32_dpbf16ps_v8sf_mask", IX86_BUILTIN_DPBF16PS_V8SF_MASK, UNKNOWN, (int) V8SF_FTYPE_V8SF_V16BF_V16BF_UQI)
+BDESC (0, OPTION_MASK_ISA2_AVX512BF16, CODE_FOR_avx512f_dpbf16ps_v8sf_maskz, "__builtin_ia32_dpbf16ps_v8sf_maskz", IX86_BUILTIN_DPBF16PS_V8SF_MASKZ, UNKNOWN, (int) V8SF_FTYPE_V8SF_V16BF_V16BF_UQI)
+BDESC (0, OPTION_MASK_ISA2_AVX512BF16, CODE_FOR_avx512f_dpbf16ps_v4sf, "__builtin_ia32_dpbf16ps_v4sf", IX86_BUILTIN_DPBF16PS_V4SF, UNKNOWN, (int) V4SF_FTYPE_V4SF_V8BF_V8BF)
+BDESC (0, OPTION_MASK_ISA2_AVX512BF16, CODE_FOR_avx512f_dpbf16ps_v4sf_mask, "__builtin_ia32_dpbf16ps_v4sf_mask", IX86_BUILTIN_DPBF16PS_V4SF_MASK, UNKNOWN, (int) V4SF_FTYPE_V4SF_V8BF_V8BF_UQI)
+BDESC (0, OPTION_MASK_ISA2_AVX512BF16, CODE_FOR_avx512f_dpbf16ps_v4sf_maskz, "__builtin_ia32_dpbf16ps_v4sf_maskz", IX86_BUILTIN_DPBF16PS_V4SF_MASKZ, UNKNOWN, (int) V4SF_FTYPE_V4SF_V8BF_V8BF_UQI)
 
 /* AVX512FP16.  */
 BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_addv8hf3_mask, "__builtin_ia32_addph128_mask", IX86_BUILTIN_ADDPH128_MASK, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI)
diff --git a/gcc/config/i386/i386-expand.cc b/gcc/config/i386/i386-expand.cc
index 5d9e5a12f7e..8e1ef0b4c4a 100644
--- a/gcc/config/i386/i386-expand.cc
+++ b/gcc/config/i386/i386-expand.cc
@@ -10462,9 +10462,9 @@ ix86_expand_args_builtin (const struct builtin_description *d,
     case V8DF_FTYPE_V2DF:
     case V8DF_FTYPE_V8DF:
     case V4DI_FTYPE_V4DI:
-    case V16HI_FTYPE_V16SF:
-    case V8HI_FTYPE_V8SF:
-    case V8HI_FTYPE_V4SF:
+    case V16BF_FTYPE_V16SF:
+    case V8BF_FTYPE_V8SF:
+    case V8BF_FTYPE_V4SF:
       nargs = 1;
       break;
     case V4SF_FTYPE_V4SF_VEC_MERGE:
@@ -10592,12 +10592,12 @@ ix86_expand_args_builtin (const struct builtin_description *d,
     case USI_FTYPE_USI_USI:
     case UDI_FTYPE_UDI_UDI:
     case V16SI_FTYPE_V8DF_V8DF:
-    case V32HI_FTYPE_V16SF_V16SF:
-    case V16HI_FTYPE_V8SF_V8SF:
-    case V8HI_FTYPE_V4SF_V4SF:
-    case V16HI_FTYPE_V16SF_UHI:
-    case V8HI_FTYPE_V8SF_UQI:
-    case V8HI_FTYPE_V4SF_UQI:
+    case V32BF_FTYPE_V16SF_V16SF:
+    case V16BF_FTYPE_V8SF_V8SF:
+    case V8BF_FTYPE_V4SF_V4SF:
+    case V16BF_FTYPE_V16SF_UHI:
+    case V8BF_FTYPE_V8SF_UQI:
+    case V8BF_FTYPE_V4SF_UQI:
       nargs = 2;
       break;
     case V2DI_FTYPE_V2DI_INT_CONVERT:
@@ -10803,15 +10803,15 @@ ix86_expand_args_builtin (const struct builtin_description *d,
     case V16HI_FTYPE_V16HI_V16HI_V16HI:
     case V8SI_FTYPE_V8SI_V8SI_V8SI:
     case V8HI_FTYPE_V8HI_V8HI_V8HI:
-    case V32HI_FTYPE_V16SF_V16SF_USI:
-    case V16HI_FTYPE_V8SF_V8SF_UHI:
-    case V8HI_FTYPE_V4SF_V4SF_UQI:
-    case V16HI_FTYPE_V16SF_V16HI_UHI:
-    case V8HI_FTYPE_V8SF_V8HI_UQI:
-    case V8HI_FTYPE_V4SF_V8HI_UQI:
-    case V16SF_FTYPE_V16SF_V32HI_V32HI:
-    case V8SF_FTYPE_V8SF_V16HI_V16HI:
-    case V4SF_FTYPE_V4SF_V8HI_V8HI:
+    case V32BF_FTYPE_V16SF_V16SF_USI:
+    case V16BF_FTYPE_V8SF_V8SF_UHI:
+    case V8BF_FTYPE_V4SF_V4SF_UQI:
+    case V16BF_FTYPE_V16SF_V16BF_UHI:
+    case V8BF_FTYPE_V8SF_V8BF_UQI:
+    case
V8BF_FTYPE_V4SF_V8BF_UQI:
+    case V16SF_FTYPE_V16SF_V32BF_V32BF:
+    case V8SF_FTYPE_V8SF_V16BF_V16BF:
+    case V4SF_FTYPE_V4SF_V8BF_V8BF:
       nargs = 3;
       break;
     case V32QI_FTYPE_V32QI_V32QI_INT:
@@ -10958,9 +10958,9 @@ ix86_expand_args_builtin (const struct builtin_description *d,
     case V16HI_FTYPE_V32QI_V32QI_V16HI_UHI:
     case V8SI_FTYPE_V16HI_V16HI_V8SI_UQI:
     case V4SI_FTYPE_V8HI_V8HI_V4SI_UQI:
-    case V32HI_FTYPE_V16SF_V16SF_V32HI_USI:
-    case V16HI_FTYPE_V8SF_V8SF_V16HI_UHI:
-    case V8HI_FTYPE_V4SF_V4SF_V8HI_UQI:
+    case V32BF_FTYPE_V16SF_V16SF_V32BF_USI:
+    case V16BF_FTYPE_V8SF_V8SF_V16BF_UHI:
+    case V8BF_FTYPE_V4SF_V4SF_V8BF_UQI:
       nargs = 4;
       break;
     case V2DF_FTYPE_V2DF_V2DF_V2DI_INT:
@@ -10998,9 +10998,9 @@ ix86_expand_args_builtin (const struct builtin_description *d,
       break;
     case UCHAR_FTYPE_UCHAR_UINT_UINT_PUNSIGNED:
     case UCHAR_FTYPE_UCHAR_ULONGLONG_ULONGLONG_PULONGLONG:
-    case V16SF_FTYPE_V16SF_V32HI_V32HI_UHI:
-    case V8SF_FTYPE_V8SF_V16HI_V16HI_UQI:
-    case V4SF_FTYPE_V4SF_V8HI_V8HI_UQI:
+    case V16SF_FTYPE_V16SF_V32BF_V32BF_UHI:
+    case V8SF_FTYPE_V8SF_V16BF_V16BF_UQI:
+    case V4SF_FTYPE_V4SF_V8BF_V8BF_UQI:
       nargs = 4;
       break;
     case UQI_FTYPE_V8DI_V8DI_INT_UQI:
diff --git a/gcc/config/i386/immintrin.h b/gcc/config/i386/immintrin.h
index ddea249d09b..c62d50f1951 100644
--- a/gcc/config/i386/immintrin.h
+++ b/gcc/config/i386/immintrin.h
@@ -118,9 +118,11 @@
 #include
+#ifdef __SSE2__
 #include
 #include
+#endif
 #include
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index f4b5506703f..fba81a93c1a 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -187,8 +187,6 @@
   UNSPEC_VP2INTERSECT
   ;; For AVX512BF16 support
-  UNSPEC_VCVTNE2PS2BF16
-  UNSPEC_VCVTNEPS2BF16
   UNSPEC_VDPBF16PS
   ;; For AVX512FP16 suppport
@@ -28918,41 +28916,101 @@
   "vp2intersectd\t{%2, %1, %0|%0, %1, %2}"
   [(set_attr ("prefix") ("evex"))])
-(define_mode_iterator BF16 [V32HI (V16HI "TARGET_AVX512VL") (V8HI "TARGET_AVX512VL")])
+(define_mode_iterator VF_AVX512BF16VL
+  [V32BF (V16BF "TARGET_AVX512VL") (V8BF "TARGET_AVX512VL")])
 ;; Converting from BF to SF
 (define_mode_attr bf16_cvt_2sf
-  [(V32HI "V16SF") (V16HI "V8SF") (V8HI "V4SF")])
+  [(V32BF "V16SF") (V16BF "V8SF") (V8BF "V4SF")])
 ;; Converting from SF to BF
 (define_mode_attr sf_cvt_bf16
-  [(V4SF "V8HI") (V8SF "V8HI") (V16SF "V16HI")])
+  [(V8SF "V8BF") (V16SF "V16BF")])
 ;; Mapping from BF to SF
 (define_mode_attr sf_bf16
-  [(V4SF "V8HI") (V8SF "V16HI") (V16SF "V32HI")])
+  [(V4SF "V8BF") (V8SF "V16BF") (V16SF "V32BF")])
 (define_expand "avx512f_cvtne2ps2bf16__maskz"
-  [(match_operand:BF16 0 "register_operand")
+  [(match_operand:VF_AVX512BF16VL 0 "register_operand")
    (match_operand: 1 "register_operand")
-   (match_operand: 2 "register_operand")
+   (match_operand: 2 "nonimmediate_operand")
    (match_operand: 3 "register_operand")]
   "TARGET_AVX512BF16"
 {
-  emit_insn (gen_avx512f_cvtne2ps2bf16__mask(operands[0], operands[1],
-    operands[2], CONST0_RTX(mode), operands[3]));
+  emit_insn (gen_avx512f_cvtne2ps2bf16__mask(operands[0], operands[2],
+    operands[1], CONST0_RTX(mode), operands[3]));
   DONE;
 })
 (define_insn "avx512f_cvtne2ps2bf16_"
-  [(set (match_operand:BF16 0 "register_operand" "=v")
-	(unspec:BF16
-	  [(match_operand: 1 "register_operand" "v")
-	   (match_operand: 2 "register_operand" "v")]
-	  UNSPEC_VCVTNE2PS2BF16))]
+  [(set (match_operand:VF_AVX512BF16VL 0 "register_operand" "=v")
+	(vec_concat:VF_AVX512BF16VL
+	  (float_truncate:
+	    (match_operand: 2 "nonimmediate_operand" "vm"))
+	  (float_truncate:
+	    (match_operand: 1 "register_operand" "v"))))]
   "TARGET_AVX512BF16"
   "vcvtne2ps2bf16\t{%2, %1, %0|%0, %1, %2}")
+(define_expand "avx512f_cvtneps2bf16_v4sf"
+  [(set (match_operand:V8BF 0 "register_operand")
+	(vec_concat:V8BF
+	  (float_truncate:V4BF
+	    (match_operand:V4SF 1 "nonimmediate_operand"))
+	  (match_dup 2)))]
+  "TARGET_AVX512BF16 && TARGET_AVX512VL"
+  "operands[2] = CONST0_RTX (V4BFmode);")
+
+(define_insn "*avx512f_cvtneps2bf16_v4sf"
+  [(set (match_operand:V8BF 0 "register_operand" "=v")
+	(vec_concat:V8BF
+	  (float_truncate:V4BF
+	    (match_operand:V4SF 1 "nonimmediate_operand" "vm"))
+	  (match_operand:V4BF 2 "const0_operand")))]
+  "TARGET_AVX512BF16 && TARGET_AVX512VL"
+  "vcvtneps2bf16{x}\t{%1, %0|%0, %1}")
+
+(define_expand "avx512f_cvtneps2bf16_v4sf_maskz"
+  [(match_operand:V8BF 0 "register_operand")
+   (match_operand:V4SF 1 "nonimmediate_operand")
+   (match_operand:QI 2 "register_operand")]
+  "TARGET_AVX512BF16 && TARGET_AVX512VL"
+{
+  emit_insn (gen_avx512f_cvtneps2bf16_v4sf_mask_1(operands[0], operands[1],
+    CONST0_RTX(V8BFmode), operands[2], CONST0_RTX(V4BFmode)));
+  DONE;
+})
+
+(define_expand "avx512f_cvtneps2bf16_v4sf_mask"
+  [(match_operand:V8BF 0 "register_operand")
+   (match_operand:V4SF 1 "nonimmediate_operand")
+   (match_operand:V8BF 2 "nonimm_or_0_operand")
+   (match_operand:QI 3 "register_operand")]
+  "TARGET_AVX512BF16 && TARGET_AVX512VL"
+{
+  emit_insn (gen_avx512f_cvtneps2bf16_v4sf_mask_1(operands[0], operands[1],
+    operands[2], operands[3], CONST0_RTX(V4BFmode)));
+  DONE;
+})
+
+(define_insn "avx512f_cvtneps2bf16_v4sf_mask_1"
+  [(set (match_operand:V8BF 0 "register_operand" "=v")
+	(vec_concat:V8BF
+	  (vec_merge:V4BF
+	    (float_truncate:V4BF
+	      (match_operand:V4SF 1 "nonimmediate_operand" "vm"))
+	    (vec_select:V4BF
+	      (match_operand:V8BF 2 "nonimm_or_0_operand" "0C")
+	      (parallel [(const_int 0) (const_int 1)
+			 (const_int 2) (const_int 3)]))
+	    (match_operand:QI 3 "register_operand" "Yk"))
+	  (match_operand:V4BF 4 "const0_operand")))]
+  "TARGET_AVX512BF16 && TARGET_AVX512VL"
+  "vcvtneps2bf16{x}\t{%1, %0%{%3%}%N2|%0%{%3%}%N2, %1}")
+
+(define_mode_iterator VF1_AVX512_256 [V16SF (V8SF "TARGET_AVX512VL")])
+
 (define_expand "avx512f_cvtneps2bf16__maskz"
   [(match_operand: 0 "register_operand")
-   (match_operand:VF1_AVX512VL 1 "register_operand")
+   (match_operand:VF1_AVX512_256 1 "nonimmediate_operand")
    (match_operand: 2 "register_operand")]
   "TARGET_AVX512BF16"
 {
@@ -28963,11 +29021,10 @@
 (define_insn "avx512f_cvtneps2bf16_"
   [(set (match_operand: 0 "register_operand" "=v")
-	(unspec:
-	  [(match_operand:VF1_AVX512VL 1 "register_operand" "v")]
-	  UNSPEC_VCVTNEPS2BF16))]
+	(float_truncate:
+	  (match_operand:VF1_AVX512_256 1 "nonimmediate_operand" "vm")))]
   "TARGET_AVX512BF16"
-  "vcvtneps2bf16\t{%1, %0|%0, %1}")
+  "vcvtneps2bf16\t{%1, %0|%0, %1}")
 (define_expand "avx512f_dpbf16ps__maskz"
   [(match_operand:VF1_AVX512VL 0 "register_operand")
@@ -28987,7 +29044,7 @@
	(unspec:VF1_AVX512VL
	  [(match_operand:VF1_AVX512VL 1 "register_operand" "0")
	   (match_operand: 2 "register_operand" "v")
-	   (match_operand: 3 "register_operand" "v")]
+	   (match_operand: 3 "nonimmediate_operand" "vm")]
	  UNSPEC_VDPBF16PS))]
   "TARGET_AVX512BF16"
   "vdpbf16ps\t{%3, %2, %0|%0, %2, %3}")
@@ -28998,7 +29055,7 @@
	  (unspec:VF1_AVX512VL
	    [(match_operand:VF1_AVX512VL 1 "register_operand" "0")
	     (match_operand: 2 "register_operand" "v")
-	     (match_operand: 3 "register_operand" "v")]
+	     (match_operand: 3 "nonimmediate_operand" "vm")]
	    UNSPEC_VDPBF16PS)
	  (match_dup 1)
	  (match_operand: 4 "register_operand" "Yk")))]
diff --git a/gcc/testsuite/gcc.target/i386/avx512bf16-cvtsbh2ss-1.c b/gcc/testsuite/gcc.target/i386/avx512bf16-cvtsbh2ss-1.c
index 831abd37d80..8e929e6f159 100644
--- a/gcc/testsuite/gcc.target/i386/avx512bf16-cvtsbh2ss-1.c
+++ b/gcc/testsuite/gcc.target/i386/avx512bf16-cvtsbh2ss-1.c
@@ -1,6 +1,6 @@
 /* { dg-do compile } */
 /* { dg-options "-mavx512bf16 -O2" } */
-/* { dg-additional-options "-fno-PIE" { target ia32 } } */
+/* { dg-additional-options "-fno-PIE -mfpmath=sse" { target ia32 } } */
 /* { dg-final { scan-assembler-times "sall\[ \\t\]+\[^\{\n\]*16" 1 } } */
 /* { dg-final { scan-assembler-times "movl" 1 } } */
diff --git a/gcc/testsuite/gcc.target/i386/avx512bf16-vdpbf16ps-2.c b/gcc/testsuite/gcc.target/i386/avx512bf16-vdpbf16ps-2.c
index b64ad7b84dd..02ebdd8cf5b 100644
--- a/gcc/testsuite/gcc.target/i386/avx512bf16-vdpbf16ps-2.c
+++ b/gcc/testsuite/gcc.target/i386/avx512bf16-vdpbf16ps-2.c
@@ -1,6 +1,6 @@
 /* { dg-do compile } */
 /* { dg-options "-mavx512bf16 -O2" } */
-/* { dg-final { scan-assembler-times "vdpbf16ps\[ \\t\]+\[^\{\n\]*%zmm\[0-9\]+\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vdpbf16ps\[ \\t\]+\[^\{\n\]*\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
 #include
diff --git a/gcc/testsuite/gcc.target/i386/avx512bf16vl-cvtness2sbh-1.c b/gcc/testsuite/gcc.target/i386/avx512bf16vl-cvtness2sbh-1.c
index 8f21b1bfdae..b71addd6301 100644
--- a/gcc/testsuite/gcc.target/i386/avx512bf16vl-cvtness2sbh-1.c
+++ b/gcc/testsuite/gcc.target/i386/avx512bf16vl-cvtness2sbh-1.c
@@ -1,6 +1,6 @@
 /* { dg-do compile } */
 /* { dg-options "-mavx512bf16 -mavx512vl -O2" } */
-/* { dg-final { scan-assembler-times "vcvtneps2bf16\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtneps2bf16x\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
 #include
diff --git a/gcc/testsuite/gcc.target/i386/avx512bf16vl-vcvtneps2bf16-1.c b/gcc/testsuite/gcc.target/i386/avx512bf16vl-vcvtneps2bf16-1.c
index 0969ae1b35e..d3a9bdf8c34 100644
--- a/gcc/testsuite/gcc.target/i386/avx512bf16vl-vcvtneps2bf16-1.c
+++ b/gcc/testsuite/gcc.target/i386/avx512bf16vl-vcvtneps2bf16-1.c
@@ -1,11 +1,11 @@
 /* { dg-do compile } */
 /* { dg-options "-mavx512bf16 -mavx512vl -O2" } */
-/* { dg-final { scan-assembler-times "vcvtneps2bf16\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
-/* { dg-final { scan-assembler-times "vcvtneps2bf16\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */
-/* { dg-final { scan-assembler-times "vcvtneps2bf16\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
-/* { dg-final { scan-assembler-times "vcvtneps2bf16\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
-/* { dg-final { scan-assembler-times "vcvtneps2bf16\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */
-/* { dg-final { scan-assembler-times "vcvtneps2bf16\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtneps2bf16y\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtneps2bf16y\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtneps2bf16y\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtneps2bf16x\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtneps2bf16x\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtneps2bf16x\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
 #include