From patchwork Fri Oct 14 07:54:40 2022
From: "Jiang, Haochen"
Reply-To: Haochen Jiang
To: gcc-patches@gcc.gnu.org
Cc: hongtao.liu@intel.com, Hongyu Wang
Subject: [PATCH 1/6] Support Intel AVX-IFMA
Date: Fri, 14 Oct 2022 15:54:40 +0800
Message-Id: <20221014075445.7938-2-haochen.jiang@intel.com>
In-Reply-To: <20221014075445.7938-1-haochen.jiang@intel.com>
References: <20221014075445.7938-1-haochen.jiang@intel.com>

From: Hongyu
Wang

gcc/

	* common/config/i386/i386-common.cc (OPTION_MASK_ISA2_AVXIFMA_SET,
	OPTION_MASK_ISA2_AVXIFMA_UNSET): New macros.
	(OPTION_MASK_ISA2_AVX2_UNSET): Add OPTION_MASK_ISA2_AVXIFMA_UNSET.
	(ix86_handle_option): Handle -mavxifma.
	* common/config/i386/i386-cpuinfo.h (processor_features): Add
	FEATURE_AVXIFMA.
	* common/config/i386/i386-isas.h: Add ISA_NAMES_TABLE_ENTRY for avxifma.
	* common/config/i386/cpuinfo.h (get_available_features): Detect avxifma.
	* config.gcc: Add avxifmaintrin.h.
	* config/i386/avxifmaintrin.h: New file.
	* config/i386/cpuid.h (bit_AVXIFMA): New.
	* config/i386/i386-builtin.def: Add new builtins.
	* config/i386/i386-c.cc (ix86_target_macros_internal): Define __AVXIFMA__.
	* config/i386/i386-isa.def (AVXIFMA): Add DEF_PTA.
	* config/i386/i386-options.cc (isa2_opts): Add -mavxifma.
	(ix86_valid_target_attribute_inner_p): Handle avxifma.
	* config/i386/i386.opt: Add option -mavxifma.
	* config/i386/immintrin.h: Include avxifmaintrin.h.
	* config/i386/sse.md (vpamdd52): Remove.
	(avx_vpmadd52_, vpamdd52, vpamdd52_maskz_1): New define_insn.
	* doc/invoke.texi: Document -mavxifma.
	* doc/extend.texi: Document avxifma.
	* doc/sourcebuild.texi: Document target avxifma.

gcc/testsuite/

	* gcc.target/i386/avx512ifma-vpmaddhuq-1.c: Rename to...
	* gcc.target/i386/avx512ifma-vpmaddhuq-1a.c: ...this.
	* gcc.target/i386/avx512ifma-vpmaddluq-1.c: Ditto.
	* gcc.target/i386/avx512ifma-vpmaddluq-1a.c: Ditto.
	* gcc.target/i386/avx512vl-vpmaddhuq-2.c: Ditto.
	* gcc.target/i386/avx512vl-vpmaddhuq-2a.c: Ditto.
	* gcc.target/i386/avx512vl-vpmaddluq-2.c: Ditto.
	* gcc.target/i386/avx512vl-vpmaddluq-2a.c: Ditto.
	* gcc.target/i386/avx-check.h: Add avxifma check.
	* gcc.target/i386/avx512ifma-vpmaddhuq-1b.c: New test.
	* gcc.target/i386/avx512ifma-vpmaddluq-1b.c: Ditto.
	* gcc.target/i386/avx512vl-vpmaddhuq-2b.c: Ditto.
	* gcc.target/i386/avx512vl-vpmaddluq-2b.c: Ditto.
	* gcc.target/i386/avx-ifma-1.c: Ditto.
	* gcc.target/i386/avx-ifma-vpmaddhuq-2.c: Ditto.
	* gcc.target/i386/avx-ifma-vpmaddluq-2.c: Ditto.
	* gcc.target/i386/sse-12.c: Add -mavxifma.
	* gcc.target/i386/sse-13.c: Ditto.
	* gcc.target/i386/sse-14.c: Ditto.
	* gcc.target/i386/sse-22.c: Ditto.
	* gcc.target/i386/sse-23.c: Ditto.
	* g++.dg/other/i386-2.C: Ditto.
	* g++.dg/other/i386-3.C: Ditto.
	* gcc.target/i386/builtin_target.c: Detect avxifma.
	* gcc.target/i386/funcspec-56.inc: Add new target attribute.
	* lib/target-supports.exp (check_effective_target_avxifma): New.
---
 gcc/common/config/i386/cpuinfo.h              |  2 +
 gcc/common/config/i386/i386-common.cc         | 20 ++++-
 gcc/common/config/i386/i386-cpuinfo.h         |  1 +
 gcc/common/config/i386/i386-isas.h            |  1 +
 gcc/config.gcc                                |  3 +-
 gcc/config/i386/avxifmaintrin.h               | 78 +++++++++++++++++++
 gcc/config/i386/cpuid.h                       |  1 +
 gcc/config/i386/i386-builtin.def              |  6 ++
 gcc/config/i386/i386-c.cc                     |  2 +
 gcc/config/i386/i386-isa.def                  |  1 +
 gcc/config/i386/i386-options.cc               |  4 +-
 gcc/config/i386/i386.opt                      |  5 ++
 gcc/config/i386/immintrin.h                   |  2 +
 gcc/config/i386/sse.md                        | 42 +++++++++-
 gcc/doc/extend.texi                           |  5 ++
 gcc/doc/invoke.texi                           |  9 ++-
 gcc/doc/sourcebuild.texi                      |  3 +
 gcc/testsuite/g++.dg/other/i386-2.C           |  2 +-
 gcc/testsuite/g++.dg/other/i386-3.C           |  2 +-
 gcc/testsuite/gcc.target/i386/avx-check.h     |  6 +-
 gcc/testsuite/gcc.target/i386/avx-ifma-1.c    | 20 +++++
 .../gcc.target/i386/avx-ifma-vpmaddhuq-2.c    | 72 +++++++++++++++++
 .../gcc.target/i386/avx-ifma-vpmaddluq-2.c    | 61 +++++++++++++++
 ...pmaddhuq-1.c => avx512ifma-vpmaddhuq-1a.c} |  0
 .../gcc.target/i386/avx512ifma-vpmaddhuq-1b.c | 33 ++++++++
 ...pmaddluq-1.c => avx512ifma-vpmaddluq-1a.c} |  0
 .../gcc.target/i386/avx512ifma-vpmaddluq-1b.c | 33 ++++++++
 gcc/testsuite/gcc.target/i386/funcspec-56.inc |  2 +
 gcc/testsuite/gcc.target/i386/sse-12.c        |  2 +-
 gcc/testsuite/gcc.target/i386/sse-13.c        |  2 +-
 gcc/testsuite/gcc.target/i386/sse-14.c        |  2 +-
 gcc/testsuite/gcc.target/i386/sse-22.c        |  4 +-
 gcc/testsuite/gcc.target/i386/sse-23.c        |  2 +-
 gcc/testsuite/lib/target-supports.exp         | 12 +++
 34 files changed, 423 insertions(+), 17 deletions(-)
 create mode 100644
gcc/config/i386/avxifmaintrin.h create mode 100644 gcc/testsuite/gcc.target/i386/avx-ifma-1.c create mode 100644 gcc/testsuite/gcc.target/i386/avx-ifma-vpmaddhuq-2.c create mode 100644 gcc/testsuite/gcc.target/i386/avx-ifma-vpmaddluq-2.c rename gcc/testsuite/gcc.target/i386/{avx512ifma-vpmaddhuq-1.c => avx512ifma-vpmaddhuq-1a.c} (100%) create mode 100644 gcc/testsuite/gcc.target/i386/avx512ifma-vpmaddhuq-1b.c rename gcc/testsuite/gcc.target/i386/{avx512ifma-vpmaddluq-1.c => avx512ifma-vpmaddluq-1a.c} (100%) create mode 100644 gcc/testsuite/gcc.target/i386/avx512ifma-vpmaddluq-1b.c diff --git a/gcc/common/config/i386/cpuinfo.h b/gcc/common/config/i386/cpuinfo.h index b5c1b21e554..9bb21c6cacc 100644 --- a/gcc/common/config/i386/cpuinfo.h +++ b/gcc/common/config/i386/cpuinfo.h @@ -793,6 +793,8 @@ get_available_features (struct __processor_model *cpu_model, { if (eax & bit_AVXVNNI) set_feature (FEATURE_AVXVNNI); + if (eax & bit_AVXIFMA) + set_feature (FEATURE_AVXIFMA); } if (avx512_usable) { diff --git a/gcc/common/config/i386/i386-common.cc b/gcc/common/config/i386/i386-common.cc index d6a68dc9b1d..4de7906b247 100644 --- a/gcc/common/config/i386/i386-common.cc +++ b/gcc/common/config/i386/i386-common.cc @@ -76,6 +76,7 @@ along with GCC; see the file COPYING3. If not see (OPTION_MASK_ISA_AVX512VL | OPTION_MASK_ISA_AVX512F_SET) #define OPTION_MASK_ISA_AVX512IFMA_SET \ (OPTION_MASK_ISA_AVX512IFMA | OPTION_MASK_ISA_AVX512F_SET) +#define OPTION_MASK_ISA2_AVXIFMA_SET OPTION_MASK_ISA2_AVXIFMA #define OPTION_MASK_ISA_AVX512VBMI_SET \ (OPTION_MASK_ISA_AVX512VBMI | OPTION_MASK_ISA_AVX512BW_SET) #define OPTION_MASK_ISA2_AVX5124FMAPS_SET OPTION_MASK_ISA2_AVX5124FMAPS @@ -212,7 +213,8 @@ along with GCC; see the file COPYING3. 
If not see #define OPTION_MASK_ISA_AVX2_UNSET \ (OPTION_MASK_ISA_AVX2 | OPTION_MASK_ISA_AVX512F_UNSET) #define OPTION_MASK_ISA2_AVX2_UNSET \ - (OPTION_MASK_ISA2_AVXVNNI_UNSET | OPTION_MASK_ISA2_AVX512F_UNSET) + (OPTION_MASK_ISA2_AVXIFMA_UNSET | OPTION_MASK_ISA2_AVXVNNI_UNSET \ + | OPTION_MASK_ISA2_AVX512F_UNSET) #define OPTION_MASK_ISA_AVX512F_UNSET \ (OPTION_MASK_ISA_AVX512F | OPTION_MASK_ISA_AVX512CD_UNSET \ | OPTION_MASK_ISA_AVX512PF_UNSET | OPTION_MASK_ISA_AVX512ER_UNSET \ @@ -230,6 +232,7 @@ along with GCC; see the file COPYING3. If not see (OPTION_MASK_ISA_AVX512BW | OPTION_MASK_ISA_AVX512VBMI_UNSET) #define OPTION_MASK_ISA_AVX512VL_UNSET OPTION_MASK_ISA_AVX512VL #define OPTION_MASK_ISA_AVX512IFMA_UNSET OPTION_MASK_ISA_AVX512IFMA +#define OPTION_MASK_ISA2_AVXIFMA_UNSET OPTION_MASK_ISA2_AVXIFMA #define OPTION_MASK_ISA_AVX512VBMI_UNSET OPTION_MASK_ISA_AVX512VBMI #define OPTION_MASK_ISA2_AVX5124FMAPS_UNSET OPTION_MASK_ISA2_AVX5124FMAPS #define OPTION_MASK_ISA2_AVX5124VNNIW_UNSET OPTION_MASK_ISA2_AVX5124VNNIW @@ -1124,6 +1127,21 @@ ix86_handle_option (struct gcc_options *opts, } return true; + case OPT_mavxifma: + if (value) + { + opts->x_ix86_isa_flags2 |= OPTION_MASK_ISA2_AVXIFMA_SET; + opts->x_ix86_isa_flags2_explicit |= OPTION_MASK_ISA2_AVXIFMA_SET; + opts->x_ix86_isa_flags |= OPTION_MASK_ISA_AVX2_SET; + opts->x_ix86_isa_flags_explicit |= OPTION_MASK_ISA_AVX2_SET; + } + else + { + opts->x_ix86_isa_flags2 &= ~OPTION_MASK_ISA2_AVXIFMA_UNSET; + opts->x_ix86_isa_flags2_explicit |= OPTION_MASK_ISA2_AVXIFMA_UNSET; + } + return true; + case OPT_mfma: if (value) { diff --git a/gcc/common/config/i386/i386-cpuinfo.h b/gcc/common/config/i386/i386-cpuinfo.h index 643fbd97378..968f9a56a6c 100644 --- a/gcc/common/config/i386/i386-cpuinfo.h +++ b/gcc/common/config/i386/i386-cpuinfo.h @@ -240,6 +240,7 @@ enum processor_features FEATURE_X86_64_V2, FEATURE_X86_64_V3, FEATURE_X86_64_V4, + FEATURE_AVXIFMA, CPU_FEATURE_MAX }; diff --git a/gcc/common/config/i386/i386-isas.h 
b/gcc/common/config/i386/i386-isas.h index 2d0646a68f8..b05b4bb8f0d 100644 --- a/gcc/common/config/i386/i386-isas.h +++ b/gcc/common/config/i386/i386-isas.h @@ -175,4 +175,5 @@ ISA_NAMES_TABLE_START ISA_NAMES_TABLE_ENTRY("x86-64-v2", FEATURE_X86_64_V2, P_X86_64_V2, NULL) ISA_NAMES_TABLE_ENTRY("x86-64-v3", FEATURE_X86_64_V3, P_X86_64_V3, NULL) ISA_NAMES_TABLE_ENTRY("x86-64-v4", FEATURE_X86_64_V4, P_X86_64_V4, NULL) + ISA_NAMES_TABLE_ENTRY("avxifma", FEATURE_AVXIFMA, P_NONE, "-mavxifma") ISA_NAMES_TABLE_END diff --git a/gcc/config.gcc b/gcc/config.gcc index 8d5972fecf7..12365abbf86 100644 --- a/gcc/config.gcc +++ b/gcc/config.gcc @@ -421,7 +421,8 @@ i[34567]86-*-* | x86_64-*-*) tsxldtrkintrin.h amxtileintrin.h amxint8intrin.h amxbf16intrin.h x86gprintrin.h uintrintrin.h hresetintrin.h keylockerintrin.h avxvnniintrin.h - mwaitintrin.h avx512fp16intrin.h avx512fp16vlintrin.h" + mwaitintrin.h avx512fp16intrin.h avx512fp16vlintrin.h + avxifmaintrin.h" ;; ia64-*-*) extra_headers=ia64intrin.h diff --git a/gcc/config/i386/avxifmaintrin.h b/gcc/config/i386/avxifmaintrin.h new file mode 100644 index 00000000000..8f512c3ecb0 --- /dev/null +++ b/gcc/config/i386/avxifmaintrin.h @@ -0,0 +1,78 @@ +/* Copyright (C) 2020 Free Software Foundation, Inc. + + This file is part of GCC. + + GCC is free software; you can redistribute it and/or modify + it under the terms of the GNU General Public License as published by + the Free Software Foundation; either version 3, or (at your option) + any later version. + + GCC is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + Under Section 7 of GPL version 3, you are granted additional + permissions described in the GCC Runtime Library Exception, version + 3.1, as published by the Free Software Foundation. 
+ + You should have received a copy of the GNU General Public License and + a copy of the GCC Runtime Library Exception along with this program; + see the files COPYING3 and COPYING.RUNTIME respectively. If not, see + . */ + +#ifndef _IMMINTRIN_H_INCLUDED +#error "Never use directly; include instead." +#endif + +#ifndef _AVXIFMAINTRIN_H_INCLUDED +#define _AVXIFMAINTRIN_H_INCLUDED + +#ifndef __AVXIFMA__ +#pragma GCC push_options +#pragma GCC target("avxifma") +#define __DISABLE_AVXIFMA__ +#endif /* __AVXIFMA__ */ + +extern __inline __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_madd52lo_avx_epu64 (__m128i __X, __m128i __Y, __m128i __Z) +{ + return (__m128i) __builtin_ia32_avx_vpmadd52luq128 ((__v2di) __X, + (__v2di) __Y, + (__v2di) __Z); +} + +extern __inline __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_madd52hi_avx_epu64 (__m128i __X, __m128i __Y, __m128i __Z) +{ + return (__m128i) __builtin_ia32_avx_vpmadd52huq128 ((__v2di) __X, + (__v2di) __Y, + (__v2di) __Z); +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_madd52lo_avx_epu64 (__m256i __X, __m256i __Y, __m256i __Z) +{ + return (__m256i) __builtin_ia32_avx_vpmadd52luq256 ((__v4di) __X, + (__v4di) __Y, + (__v4di) __Z); +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_madd52hi_avx_epu64 (__m256i __X, __m256i __Y, __m256i __Z) +{ + return (__m256i) __builtin_ia32_avx_vpmadd52huq256 ((__v4di) __X, + (__v4di) __Y, + (__v4di) __Z); +} + +#ifdef __DISABLE_AVXIFMA__ +#undef __DISABLE_AVXIFMA__ +#pragma GCC pop_options +#endif /* __DISABLE_AVXIFMA__ */ + +#endif /* _AVXIFMAINTRIN_H_INCLUDED */ diff --git a/gcc/config/i386/cpuid.h b/gcc/config/i386/cpuid.h index a4c2fed7eda..9885699efd5 100644 --- a/gcc/config/i386/cpuid.h +++ b/gcc/config/i386/cpuid.h @@ -28,6 +28,7 @@ #define bit_AVXVNNI (1 << 4) #define bit_AVX512BF16 (1 << 5) #define 
bit_HRESET (1 << 22) +#define bit_AVXIFMA (1 << 23) /* %ecx */ #define bit_SSE3 (1 << 0) diff --git a/gcc/config/i386/i386-builtin.def b/gcc/config/i386/i386-builtin.def index dea52a28d28..4a89099a00f 100644 --- a/gcc/config/i386/i386-builtin.def +++ b/gcc/config/i386/i386-builtin.def @@ -2499,6 +2499,12 @@ BDESC (OPTION_MASK_ISA_AVX512IFMA | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpamdd BDESC (OPTION_MASK_ISA_AVX512IFMA | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpamdd52huqv2di_mask, "__builtin_ia32_vpmadd52huq128_mask", IX86_BUILTIN_VPMADD52HUQ128, UNKNOWN, (int) V2DI_FTYPE_V2DI_V2DI_V2DI_UQI) BDESC (OPTION_MASK_ISA_AVX512IFMA | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpamdd52huqv2di_maskz, "__builtin_ia32_vpmadd52huq128_maskz", IX86_BUILTIN_VPMADD52HUQ128_MASKZ, UNKNOWN, (int) V2DI_FTYPE_V2DI_V2DI_V2DI_UQI) +/* AVX_IFMA */ +BDESC (0, OPTION_MASK_ISA2_AVXIFMA, CODE_FOR_avx_vpmadd52luq_v4di, "__builtin_ia32_avx_vpmadd52luq256", IX86_BUILTIN_AVX_VPMADD52LUQ256, UNKNOWN, (int) V4DI_FTYPE_V4DI_V4DI_V4DI) +BDESC (0, OPTION_MASK_ISA2_AVXIFMA, CODE_FOR_avx_vpmadd52huq_v4di, "__builtin_ia32_avx_vpmadd52huq256", IX86_BUILTIN_AVX_VPMADD52HUQ256, UNKNOWN, (int) V4DI_FTYPE_V4DI_V4DI_V4DI) +BDESC (0, OPTION_MASK_ISA2_AVXIFMA, CODE_FOR_avx_vpmadd52luq_v2di, "__builtin_ia32_avx_vpmadd52luq128", IX86_BUILTIN_AVX_VPMADD52LUQ128, UNKNOWN, (int) V2DI_FTYPE_V2DI_V2DI_V2DI) +BDESC (0, OPTION_MASK_ISA2_AVXIFMA, CODE_FOR_avx_vpmadd52huq_v2di, "__builtin_ia32_avx_vpmadd52huq128", IX86_BUILTIN_AVX_VPMADD52HUQ128, UNKNOWN, (int) V2DI_FTYPE_V2DI_V2DI_V2DI) + /* AVX512VBMI */ BDESC (OPTION_MASK_ISA_AVX512VBMI, 0, CODE_FOR_vpmultishiftqbv64qi_mask, "__builtin_ia32_vpmultishiftqb512_mask", IX86_BUILTIN_VPMULTISHIFTQB512, UNKNOWN, (int) V64QI_FTYPE_V64QI_V64QI_V64QI_UDI) BDESC (OPTION_MASK_ISA_AVX512VBMI | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpmultishiftqbv32qi_mask, "__builtin_ia32_vpmultishiftqb256_mask", IX86_BUILTIN_VPMULTISHIFTQB256, UNKNOWN, 
(int) V32QI_FTYPE_V32QI_V32QI_V32QI_USI) diff
--git a/gcc/config/i386/i386-c.cc b/gcc/config/i386/i386-c.cc index eb0e3b36a76..3494ec035d5 100644 --- a/gcc/config/i386/i386-c.cc +++ b/gcc/config/i386/i386-c.cc @@ -633,6 +633,8 @@ ix86_target_macros_internal (HOST_WIDE_INT isa_flag, def_or_undef (parse_in, "__WIDEKL__"); if (isa_flag2 & OPTION_MASK_ISA2_AVXVNNI) def_or_undef (parse_in, "__AVXVNNI__"); + if (isa_flag2 & OPTION_MASK_ISA2_AVXIFMA) + def_or_undef (parse_in, "__AVXIFMA__"); if (TARGET_IAMCU) { def_or_undef (parse_in, "__iamcu"); diff --git a/gcc/config/i386/i386-isa.def b/gcc/config/i386/i386-isa.def index 83659d0bea4..6e0254ce418 100644 --- a/gcc/config/i386/i386-isa.def +++ b/gcc/config/i386/i386-isa.def @@ -109,3 +109,4 @@ DEF_PTA(KL) DEF_PTA(WIDEKL) DEF_PTA(AVXVNNI) DEF_PTA(AVX512FP16) +DEF_PTA(AVXIFMA) diff --git a/gcc/config/i386/i386-options.cc b/gcc/config/i386/i386-options.cc index acb2291e70f..5facb64c2a8 100644 --- a/gcc/config/i386/i386-options.cc +++ b/gcc/config/i386/i386-options.cc @@ -226,7 +226,8 @@ static struct ix86_target_opts isa2_opts[] = { "-mkl", OPTION_MASK_ISA2_KL }, { "-mwidekl", OPTION_MASK_ISA2_WIDEKL }, { "-mavxvnni", OPTION_MASK_ISA2_AVXVNNI }, - { "-mavx512fp16", OPTION_MASK_ISA2_AVX512FP16 } + { "-mavx512fp16", OPTION_MASK_ISA2_AVX512FP16 }, + { "-mavxifma", OPTION_MASK_ISA2_AVXIFMA } }; static struct ix86_target_opts isa_opts[] = { @@ -1072,6 +1073,7 @@ ix86_valid_target_attribute_inner_p (tree fndecl, tree args, char *p_strings[], IX86_ATTR_ISA ("hreset", OPT_mhreset), IX86_ATTR_ISA ("avxvnni", OPT_mavxvnni), IX86_ATTR_ISA ("avx512fp16", OPT_mavx512fp16), + IX86_ATTR_ISA ("avxifma", OPT_mavxifma), /* enum options */ IX86_ATTR_ENUM ("fpmath=", OPT_mfpmath_), diff --git a/gcc/config/i386/i386.opt b/gcc/config/i386/i386.opt index 0dbaacb57ed..36e28b7063d 100644 --- a/gcc/config/i386/i386.opt +++ b/gcc/config/i386/i386.opt @@ -1214,3 +1214,8 @@ Do not use GOT to access external symbols. 
-param=x86-stlf-window-ninsns= Target Joined UInteger Var(x86_stlf_window_ninsns) Init(64) Param Instructions number above which STFL stall penalty can be compensated. + +mavxifma +Target Mask(ISA2_AVXIFMA) Var(ix86_isa_flags2) Save +Support MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AVX, AVX2, and +AVXIFMA built-in functions and code generation. diff --git a/gcc/config/i386/immintrin.h b/gcc/config/i386/immintrin.h index 6afd78c2b6f..e9d4e975243 100644 --- a/gcc/config/i386/immintrin.h +++ b/gcc/config/i386/immintrin.h @@ -44,6 +44,8 @@ #include +#include + #include #include diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md index 076064f97e6..331347569ea 100644 --- a/gcc/config/i386/sse.md +++ b/gcc/config/i386/sse.md @@ -27867,6 +27867,19 @@ (define_int_attr vpmadd52type [(UNSPEC_VPMADD52LUQ "luq") (UNSPEC_VPMADD52HUQ "huq")]) +(define_insn "avx_vpmadd52_" + [(set (match_operand:VI8_AVX2 0 "register_operand" "=x") + (unspec:VI8_AVX2 + [(match_operand:VI8_AVX2 1 "register_operand" "0") + (match_operand:VI8_AVX2 2 "register_operand" "x") + (match_operand:VI8_AVX2 3 "nonimmediate_operand" "xm")] + VPMADD52))] + "TARGET_AVXIFMA" + "%{vex%} vpmadd52\t{%3, %2, %0|%0, %2, %3}" + [(set_attr "type" "ssemuladd") + (set_attr "prefix" "vex") + (set_attr "mode" "")]) + (define_expand "vpamdd52huq_maskz" [(match_operand:VI8_AVX512VL 0 "register_operand") (match_operand:VI8_AVX512VL 1 "register_operand") @@ -27895,7 +27908,7 @@ DONE; }) -(define_insn "vpamdd52" +(define_insn "vpamdd52" [(set (match_operand:VI8_AVX512VL 0 "register_operand" "=v") (unspec:VI8_AVX512VL [(match_operand:VI8_AVX512VL 1 "register_operand" "0") @@ -27903,7 +27916,32 @@ (match_operand:VI8_AVX512VL 3 "nonimmediate_operand" "vm")] VPMADD52))] "TARGET_AVX512IFMA" - "vpmadd52\t{%3, %2, %0|%0, %2, %3}" +{ + if ( <=32 + && TARGET_AVXIFMA + && !EXT_REX_SSE_REG_P (operands[1]) + && !EXT_REX_SSE_REG_P (operands[2]) + && !EXT_REX_SSE_REG_P (operands[3])) + return "%{vex%} vpmadd52\t{%3, %2, %0|%0, 
%2, %3}"; + else + return "vpmadd52\t{%3, %2, %0|%0, %2, %3}"; +} + [(set_attr "type" "ssemuladd") + (set_attr "prefix" "maybe_evex") + (set_attr "mode" "")]) + +(define_insn "vpamdd52_maskz_1" + [(set (match_operand:VI8_AVX512VL 0 "register_operand" "=v") + (vec_merge:VI8_AVX512VL + (unspec:VI8_AVX512VL + [(match_operand:VI8_AVX512VL 1 "register_operand" "0") + (match_operand:VI8_AVX512VL 2 "register_operand" "v") + (match_operand:VI8_AVX512VL 3 "nonimmediate_operand" "vm")] + VPMADD52) + (match_operand:VI8_AVX512VL 4 "const0_operand" "C") + (match_operand: 5 "register_operand" "Yk")))] + "TARGET_AVX512IFMA" + "vpmadd52\t{%3, %2, %0%{%5%}%{z%}|%0%{%5%}%{z%}, %2, %3}" [(set_attr "type" "ssemuladd") (set_attr "prefix" "evex") (set_attr "mode" "")]) diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi index cfbe32afce9..edecf5c0070 100644 --- a/gcc/doc/extend.texi +++ b/gcc/doc/extend.texi @@ -7060,6 +7060,11 @@ Enable/disable the generation of the WIDEKL instructions. @cindex @code{target("avxvnni")} function attribute, x86 Enable/disable the generation of the AVXVNNI instructions. +@item avxifma +@itemx no-avxifma +@cindex @code{target("avxifma")} function attribute, x86 +Enable/disable the generation of the AVXIFMA instructions. + @item cld @itemx no-cld @cindex @code{target("cld")} function attribute, x86 diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi index a9ecc4426a4..886fc1d0164 100644 --- a/gcc/doc/invoke.texi +++ b/gcc/doc/invoke.texi @@ -1436,7 +1436,7 @@ See RS/6000 and PowerPC Options. 
-mavx5124fmaps -mavx512vnni -mavx5124vnniw -mprfchw -mrdpid @gol -mrdseed -msgx -mavx512vp2intersect -mserialize -mtsxldtrk@gol -mamx-tile -mamx-int8 -mamx-bf16 -muintr -mhreset -mavxvnni@gol --mavx512fp16 @gol +-mavx512fp16 -mavxifma @gol -mcldemote -mms-bitfields -mno-align-stringops -minline-all-stringops @gol -minline-stringops-dynamically -mstringop-strategy=@var{alg} @gol -mkl -mwidekl @gol @@ -32893,6 +32893,9 @@ preferred alignment to @option{-mpreferred-stack-boundary=2}. @need 200 @itemx -mwidekl @opindex mwidekl +@need 200 +@itemx -mavxifma +@opindex mavxifma These switches enable the use of instructions in the MMX, SSE, SSE2, SSE3, SSSE3, SSE4, SSE4A, SSE4.1, SSE4.2, AVX, AVX2, AVX512F, AVX512PF, AVX512ER, AVX512CD, AVX512VL, AVX512BW, AVX512DQ, AVX512IFMA, AVX512VBMI, SHA, @@ -32902,8 +32905,8 @@ WBNOINVD, FMA4, PREFETCHW, RDPID, PREFETCHWT1, RDSEED, SGX, XOP, LWP, XSAVEOPT, XSAVEC, XSAVES, RTM, HLE, TBM, MWAITX, CLZERO, PKU, AVX512VBMI2, GFNI, VAES, WAITPKG, VPCLMULQDQ, AVX512BITALG, MOVDIRI, MOVDIR64B, AVX512BF16, ENQCMD, AVX512VPOPCNTDQ, AVX5124FMAPS, AVX512VNNI, AVX5124VNNIW, SERIALIZE, -UINTR, HRESET, AMXTILE, AMXINT8, AMXBF16, KL, WIDEKL, AVXVNNI, AVX512FP16 -or CLDEMOTE extended instruction sets. Each has a corresponding +UINTR, HRESET, AMXTILE, AMXINT8, AMXBF16, KL, WIDEKL, AVXVNNI, AVX512FP16, +AVXIFMA or CLDEMOTE extended instruction sets. Each has a corresponding @option{-mno-} option to disable use of these instructions. These extensions are also available as built-in functions: see diff --git a/gcc/doc/sourcebuild.texi b/gcc/doc/sourcebuild.texi index c81e2ffd43a..0173acf4a65 100644 --- a/gcc/doc/sourcebuild.texi +++ b/gcc/doc/sourcebuild.texi @@ -2490,6 +2490,9 @@ Target supports the execution of @code{avx512f} instructions. @item avx512vp2intersect Target supports the execution of @code{avx512vp2intersect} instructions. +@item avxifma +Target supports the execution of @code{avxifma} instructions. 
+ @item amx_tile Target supports the execution of @code{amx-tile} instructions. diff --git a/gcc/testsuite/g++.dg/other/i386-2.C b/gcc/testsuite/g++.dg/other/i386-2.C index fba3d1ac684..5388606779b 100644 --- a/gcc/testsuite/g++.dg/other/i386-2.C +++ b/gcc/testsuite/g++.dg/other/i386-2.C @@ -1,5 +1,5 @@ /* { dg-do compile { target i?86-*-* x86_64-*-* } } */ -/* { dg-options "-O -pedantic-errors -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha -mprefetchwt1 -mxsavec -mxsaves -mclflushopt -mavx512dq -mavx512bw -mavx512vl -mavx512ifma -mavx512vbmi -mavx512vbmi2 -mavx5124fmaps -mavx5124vnniw -mavx512vpopcntdq -mclwb -mmwaitx -mclzero -mpku -msgx -mrdpid -mgfni -mavx512bitalg -mpconfig -mwbnoinvd -mavx512bf16 -menqcmd -mavx512vp2intersect -mserialize -mtsxldtrk -mamx-tile -mamx-int8 -mamx-bf16 -mkl -mwidekl -mavxvnni -mavx512fp16" } */ +/* { dg-options "-O -pedantic-errors -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha -mprefetchwt1 -mxsavec -mxsaves -mclflushopt -mavx512dq -mavx512bw -mavx512vl -mavx512ifma -mavx512vbmi -mavx512vbmi2 -mavx5124fmaps -mavx5124vnniw -mavx512vpopcntdq -mclwb -mmwaitx -mclzero -mpku -msgx -mrdpid -mgfni -mavx512bitalg -mpconfig -mwbnoinvd -mavx512bf16 -menqcmd -mavx512vp2intersect -mserialize -mtsxldtrk -mamx-tile -mamx-int8 -mamx-bf16 -mkl -mwidekl -mavxvnni -mavx512fp16 -mavxifma" } */ /* Test that {,x,e,p,t,s,w,a,b,i}mmintrin.h, mm3dnow.h, fma4intrin.h, xopintrin.h, abmintrin.h, bmiintrin.h, tbmintrin.h, lwpintrin.h, diff --git a/gcc/testsuite/g++.dg/other/i386-3.C b/gcc/testsuite/g++.dg/other/i386-3.C index 5cc0fa83457..86cedd3d32f 100644 --- 
a/gcc/testsuite/g++.dg/other/i386-3.C +++ b/gcc/testsuite/g++.dg/other/i386-3.C @@ -1,5 +1,5 @@ /* { dg-do compile { target i?86-*-* x86_64-*-* } } */ -/* { dg-options "-O -fkeep-inline-functions -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha -mprefetchwt1 -mxsavec -mxsaves -mclflushopt -mavx512dq -mavx512bw -mavx512vl -mavx512ifma -mavx512vbmi -mavx512vbmi2 -mavx5124fmaps -mavx5124vnniw -mavx512vpopcntdq -mclwb -mmwaitx -mclzero -mpku -msgx -mrdpid -mgfni -mavx512bitalg -mpconfig -mwbnoinvd -mavx512bf16 -menqcmd -mavx512vp2intersect -mserialize -mtsxldtrk -mamx-tile -mamx-int8 -mamx-bf16 -mkl -mwidekl -mavxvnni -mavx512fp16" } */ +/* { dg-options "-O -fkeep-inline-functions -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha -mprefetchwt1 -mxsavec -mxsaves -mclflushopt -mavx512dq -mavx512bw -mavx512vl -mavx512ifma -mavx512vbmi -mavx512vbmi2 -mavx5124fmaps -mavx5124vnniw -mavx512vpopcntdq -mclwb -mmwaitx -mclzero -mpku -msgx -mrdpid -mgfni -mavx512bitalg -mpconfig -mwbnoinvd -mavx512bf16 -menqcmd -mavx512vp2intersect -mserialize -mtsxldtrk -mamx-tile -mamx-int8 -mamx-bf16 -mkl -mwidekl -mavxvnni -mavx512fp16 -mavxifma" } */ /* Test that {,x,e,p,t,s,w,a,b,i}mmintrin.h, mm3dnow.h, fma4intrin.h, xopintrin.h, abmintrin.h, bmiintrin.h, tbmintrin.h, lwpintrin.h, diff --git a/gcc/testsuite/gcc.target/i386/avx-check.h b/gcc/testsuite/gcc.target/i386/avx-check.h index 7ddca9d7b80..24ee6ab4efd 100644 --- a/gcc/testsuite/gcc.target/i386/avx-check.h +++ b/gcc/testsuite/gcc.target/i386/avx-check.h @@ -22,7 +22,11 @@ main () /* Run AVX test only if host has AVX support. 
*/ if (((ecx & (bit_AVX | bit_OSXSAVE)) == (bit_AVX | bit_OSXSAVE)) - && avx_os_support ()) + && avx_os_support () +#ifdef AVXIFMA + && __builtin_cpu_supports ("avxifma") +#endif + ) { do_test (); #ifdef DEBUG diff --git a/gcc/testsuite/gcc.target/i386/avx-ifma-1.c b/gcc/testsuite/gcc.target/i386/avx-ifma-1.c new file mode 100644 index 00000000000..6388373123c --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx-ifma-1.c @@ -0,0 +1,20 @@ +/* { dg-do compile } */ +/* { dg-options "-mavxifma -O2" } */ +/* { dg-final { scan-assembler-times "\{vex\} vpmadd52huq\[ \\t\]+\[^\n\]*%xmm\[0-9\]+\[^\n\]*%xmm\[0-9\]+\[^\n\]*%xmm\[0-9\]+" 1 } } */ +/* { dg-final { scan-assembler-times "\{vex\} vpmadd52luq\[ \\t\]+\[^\n\]*%xmm\[0-9\]+\[^\n\]*%xmm\[0-9\]+\[^\n\]*%xmm\[0-9\]+" 1 } } */ +/* { dg-final { scan-assembler-times "\{vex\} vpmadd52huq\[ \\t\]+\[^\n\]*%ymm\[0-9\]+\[^\n\]*%ymm\[0-9\]+\[^\n\]*%ymm\[0-9\]+" 1 } } */ +/* { dg-final { scan-assembler-times "\{vex\} vpmadd52luq\[ \\t\]+\[^\n\]*%ymm\[0-9\]+\[^\n\]*%ymm\[0-9\]+\[^\n\]*%ymm\[0-9\]+" 1 } } */ + +#include + +volatile __m256i x,y,z; +volatile __m128i x_,y_,z_; + +void extern +avxifma_test (void) +{ + x = _mm256_madd52hi_avx_epu64 (x, y, z); + x = _mm256_madd52lo_avx_epu64 (x, y, z); + x_ = _mm_madd52hi_avx_epu64 (x_, y_, z_); + x_ = _mm_madd52lo_avx_epu64 (x_, y_, z_); +} diff --git a/gcc/testsuite/gcc.target/i386/avx-ifma-vpmaddhuq-2.c b/gcc/testsuite/gcc.target/i386/avx-ifma-vpmaddhuq-2.c new file mode 100644 index 00000000000..c9efee33091 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx-ifma-vpmaddhuq-2.c @@ -0,0 +1,72 @@ +/* { dg-do run } */ +/* { dg-options "-O2 -mavxifma" } */ +/* { dg-require-effective-target avxifma } */ +#define AVXIFMA +#ifndef CHECK +#define CHECK "avx-check.h" +#endif + +#ifndef TEST +#define TEST avx_test +#endif + +#include CHECK + +void +CALC (long long *r, long long *s1, long long *s2, long long *s3, int size) +{ + int i; + long long a,b; + + for (i = 0; i < size; i++) + { + /* 
Simulate higher 52 bits out of 104 bits, + by shifting operands with 0 in lower 26 bits. */ + a = s2[i] >> 26; + b = s3[i] >> 26; + r[i] = a * b + s1[i]; + } +} + +void +TEST (void) +{ + union256i_q src1_256, src2_256, dst_256; + union128i_q src1_128, src2_128, dst_128; + long long dst_ref_256[4], dst_ref_128[2]; + int i; + + for (i = 0; i < 4; i++) + { + src1_256.a[i] = 15 + 3467 * i; + src2_256.a[i] = 9217 + i; + src1_256.a[i] = src1_256.a[i] << 26; + src2_256.a[i] = src2_256.a[i] << 26; + src1_256.a[i] &= ((1LL << 52) - 1); + src2_256.a[i] &= ((1LL << 52) - 1); + dst_256.a[i] = -1; + } + + for (i = 0; i < 2; i++) + { + src1_128.a[i] = 16 + 3467 * i; + src2_128.a[i] = 9127 + i; + src1_128.a[i] = src1_128.a[i] << 26; + src2_128.a[i] = src2_128.a[i] << 26; + src1_128.a[i] &= ((1LL << 52) - 1); + src2_128.a[i] &= ((1LL << 52) - 1); + dst_128.a[i] = -1; + } + + CALC (dst_ref_256, dst_256.a, src1_256.a, src2_256.a, 4); + dst_256.x = _mm256_madd52hi_avx_epu64 (dst_256.x, src1_256.x, src2_256.x); + if (check_union256i_q (dst_256, dst_ref_256)) + abort (); + + CALC (dst_ref_128, dst_128.a, src1_128.a, src2_128.a, 2); + dst_128.x = _mm_madd52hi_avx_epu64 (dst_128.x, src1_128.x, src2_128.x); + if (check_union128i_q (dst_128, dst_ref_128)) + abort (); + +} + diff --git a/gcc/testsuite/gcc.target/i386/avx-ifma-vpmaddluq-2.c b/gcc/testsuite/gcc.target/i386/avx-ifma-vpmaddluq-2.c new file mode 100644 index 00000000000..600978ea9ad --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx-ifma-vpmaddluq-2.c @@ -0,0 +1,61 @@ +/* { dg-do run } */ +/* { dg-options "-O2 -mavxifma" } */ +/* { dg-require-effective-target avxifma } */ +#define AVXIFMA +#ifndef CHECK +#define CHECK "avx-check.h" +#endif + +#ifndef TEST +#define TEST avx_test +#endif + +#include CHECK + +void +CALC (unsigned long long *r, unsigned long long *s1, + unsigned long long *s2, unsigned long long *s3, + int size) +{ + int i; + + for (i = 0; i < size; i++) + { + r[i] = s2[i] * s3[i] + s1[i]; + } +} + +void +TEST
(void) +{ + union256i_q src1_256, src2_256, dst_256; + union128i_q src1_128, src2_128, dst_128; + unsigned long long dst_ref_256[4], dst_ref_128[2]; + int i; + + for (i = 0; i < 4; i++) + { + src1_256.a[i] = 3450 * i; + src2_256.a[i] = 7863 * i; + dst_256.a[i] = 117; + } + + for (i = 0; i < 2; i++) + { + src1_128.a[i] = 3540 * i; + src2_128.a[i] = 7683 * i; + dst_128.a[i] = 117; + } + + CALC (dst_ref_256, dst_256.a, src1_256.a, src2_256.a, 4); + dst_256.x = _mm256_madd52lo_avx_epu64 (dst_256.x, src1_256.x, src2_256.x); + if (check_union256i_q (dst_256, dst_ref_256)) + abort (); + + CALC (dst_ref_128, dst_128.a, src1_128.a, src2_128.a, 2); + dst_128.x = _mm_madd52lo_avx_epu64 (dst_128.x, src1_128.x, src2_128.x); + if (check_union128i_q (dst_128, dst_ref_128)) + abort (); + +} + diff --git a/gcc/testsuite/gcc.target/i386/avx512ifma-vpmaddhuq-1.c b/gcc/testsuite/gcc.target/i386/avx512ifma-vpmaddhuq-1a.c similarity index 100% rename from gcc/testsuite/gcc.target/i386/avx512ifma-vpmaddhuq-1.c rename to gcc/testsuite/gcc.target/i386/avx512ifma-vpmaddhuq-1a.c diff --git a/gcc/testsuite/gcc.target/i386/avx512ifma-vpmaddhuq-1b.c b/gcc/testsuite/gcc.target/i386/avx512ifma-vpmaddhuq-1b.c new file mode 100644 index 00000000000..67e94baa01b --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512ifma-vpmaddhuq-1b.c @@ -0,0 +1,33 @@ +/* { dg-do compile } */ +/* { dg-options "-mavx512ifma -mavx512vl -mavxifma -O2" } */ +/* { dg-final { scan-assembler-times "vpmadd52huq\[ \\t\]+\[^\n\]*%xmm\[0-9\]+\[^\n\]*%xmm\[0-9\]+\[^\n\]*%xmm\[0-9\]+" 3 } } */ +/* { dg-final { scan-assembler-times "\{vex\} vpmadd52huq\[ \\t\]+\[^\n\]*%xmm\[0-9\]+\[^\n\]*%xmm\[0-9\]+\[^\n\]*%xmm\[0-9\]+" 1 } } */ +/* { dg-final { scan-assembler-times "vpmadd52huq\[ \\t\]+\[^\n\]*%xmm\[0-9\]+\[^\n\]*%xmm\[0-9\]+\[^\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}\[^\{\]" 1 } } */ +/* { dg-final { scan-assembler-times "vpmadd52huq\[ \\t\]+\[^\n\]*%xmm\[0-9\]+\[^\n\]*%xmm\[0-9\]+\[^\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}\{z\}" 1 } } */ +/* { 
dg-final { scan-assembler-times "vpmadd52huq\[ \\t\]+\[^\n\]*%ymm\[0-9\]+\[^\n\]*%ymm\[0-9\]+\[^\n\]*%ymm\[0-9\]+" 3 } } */ +/* { dg-final { scan-assembler-times "\{vex\} vpmadd52huq\[ \\t\]+\[^\n\]*%ymm\[0-9\]+\[^\n\]*%ymm\[0-9\]+\[^\n\]*%ymm\[0-9\]+" 1 } } */ +/* { dg-final { scan-assembler-times "vpmadd52huq\[ \\t\]+\[^\n\]*%ymm\[0-9\]+\[^\n\]*%ymm\[0-9\]+\[^\n\]*%ymm\[0-9\]+\{%k\[1-7\]\}\[^\{\]" 1 } } */ +/* { dg-final { scan-assembler-times "vpmadd52huq\[ \\t\]+\[^\n\]*%ymm\[0-9\]+\[^\n\]*%ymm\[0-9\]+\[^\n\]*%ymm\[0-9\]+\{%k\[1-7\]\}\{z\}" 1 } } */ +/* { dg-final { scan-assembler-times "vpmadd52huq\[ \\t\]+\[^\n\]*%zmm\[0-9\]+\[^\n\]*%zmm\[0-9\]+\[^\n\]*%zmm\[0-9\]+" 3 } } */ +/* { dg-final { scan-assembler-times "vpmadd52huq\[ \\t\]+\[^\n\]*%zmm\[0-9\]+\[^\n\]*%zmm\[0-9\]+\[^\n\]*%zmm\[0-9\]+\{%k\[1-7\]\}\[^\{\]" 1 } } */ +/* { dg-final { scan-assembler-times "vpmadd52huq\[ \\t\]+\[^\n\]*%zmm\[0-9\]+\[^\n\]*%zmm\[0-9\]+\[^\n\]*%zmm\[0-9\]+\{%k\[1-7\]\}\{z\}" 1 } } */ + +#include + +volatile __m512i _x1, _y1, _z1; +volatile __m256i _x2, _y2, _z2; +volatile __m128i _x3, _y3, _z3; + +void extern +avx512ifma_test (void) +{ + _x3 = _mm_madd52hi_epu64 (_x3, _y3, _z3); + _x3 = _mm_mask_madd52hi_epu64 (_x3, 2, _y3, _z3); + _x3 = _mm_maskz_madd52hi_epu64 (2, _x3, _y3, _z3); + _x2 = _mm256_madd52hi_epu64 (_x2, _y2, _z2); + _x2 = _mm256_mask_madd52hi_epu64 (_x2, 3, _y2, _z2); + _x2 = _mm256_maskz_madd52hi_epu64 (3, _x2, _y2, _z2); + _x1 = _mm512_madd52hi_epu64 (_x1, _y1, _z1); + _x1 = _mm512_mask_madd52hi_epu64 (_x1, 3, _y1, _z1); + _x1 = _mm512_maskz_madd52hi_epu64 (3, _x1, _y1, _z1); +} diff --git a/gcc/testsuite/gcc.target/i386/avx512ifma-vpmaddluq-1.c b/gcc/testsuite/gcc.target/i386/avx512ifma-vpmaddluq-1a.c similarity index 100% rename from gcc/testsuite/gcc.target/i386/avx512ifma-vpmaddluq-1.c rename to gcc/testsuite/gcc.target/i386/avx512ifma-vpmaddluq-1a.c diff --git a/gcc/testsuite/gcc.target/i386/avx512ifma-vpmaddluq-1b.c 
b/gcc/testsuite/gcc.target/i386/avx512ifma-vpmaddluq-1b.c new file mode 100644 index 00000000000..4b8ea27f403 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512ifma-vpmaddluq-1b.c @@ -0,0 +1,33 @@ +/* { dg-do compile } */ +/* { dg-options "-mavx512ifma -mavx512vl -mavxifma -O2" } */ +/* { dg-final { scan-assembler-times "vpmadd52luq\[ \\t\]+\[^\n\]*%xmm\[0-9\]+\[^\n\]*%xmm\[0-9\]+\[^\n\]*%xmm\[0-9\]+" 3 } } */ +/* { dg-final { scan-assembler-times "\{vex\} vpmadd52luq\[ \\t\]+\[^\n\]*%xmm\[0-9\]+\[^\n\]*%xmm\[0-9\]+\[^\n\]*%xmm\[0-9\]+" 1 } } */ +/* { dg-final { scan-assembler-times "vpmadd52luq\[ \\t\]+\[^\n\]*%xmm\[0-9\]+\[^\n\]*%xmm\[0-9\]+\[^\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}\[^\{\]" 1 } } */ +/* { dg-final { scan-assembler-times "vpmadd52luq\[ \\t\]+\[^\n\]*%xmm\[0-9\]+\[^\n\]*%xmm\[0-9\]+\[^\n\]*%xmm\[0-9\]+\{%k\[1-7\]\}\{z\}" 1 } } */ +/* { dg-final { scan-assembler-times "vpmadd52luq\[ \\t\]+\[^\n\]*%ymm\[0-9\]+\[^\n\]*%ymm\[0-9\]+\[^\n\]*%ymm\[0-9\]+" 3 } } */ +/* { dg-final { scan-assembler-times "\{vex\} vpmadd52luq\[ \\t\]+\[^\n\]*%ymm\[0-9\]+\[^\n\]*%ymm\[0-9\]+\[^\n\]*%ymm\[0-9\]+" 1 } } */ +/* { dg-final { scan-assembler-times "vpmadd52luq\[ \\t\]+\[^\n\]*%ymm\[0-9\]+\[^\n\]*%ymm\[0-9\]+\[^\n\]*%ymm\[0-9\]+\{%k\[1-7\]\}\[^\{\]" 1 } } */ +/* { dg-final { scan-assembler-times "vpmadd52luq\[ \\t\]+\[^\n\]*%ymm\[0-9\]+\[^\n\]*%ymm\[0-9\]+\[^\n\]*%ymm\[0-9\]+\{%k\[1-7\]\}\{z\}" 1 } } */ +/* { dg-final { scan-assembler-times "vpmadd52luq\[ \\t\]+\[^\n\]*%zmm\[0-9\]+\[^\n\]*%zmm\[0-9\]+\[^\n\]*%zmm\[0-9\]+" 3 } } */ +/* { dg-final { scan-assembler-times "vpmadd52luq\[ \\t\]+\[^\n\]*%zmm\[0-9\]+\[^\n\]*%zmm\[0-9\]+\[^\n\]*%zmm\[0-9\]+\{%k\[1-7\]\}\[^\{\]" 1 } } */ +/* { dg-final { scan-assembler-times "vpmadd52luq\[ \\t\]+\[^\n\]*%zmm\[0-9\]+\[^\n\]*%zmm\[0-9\]+\[^\n\]*%zmm\[0-9\]+\{%k\[1-7\]\}\{z\}" 1 } } */ + +#include + +volatile __m512i _x1, _y1, _z1; +volatile __m256i _x2, _y2, _z2; +volatile __m128i _x3, _y3, _z3; + +void extern +avx512ifma_test 
(void) +{ + _x3 = _mm_madd52lo_epu64 (_x3, _y3, _z3); + _x3 = _mm_mask_madd52lo_epu64 (_x3, 2, _y3, _z3); + _x3 = _mm_maskz_madd52lo_epu64 (2, _x3, _y3, _z3); + _x2 = _mm256_madd52lo_epu64 (_x2, _y2, _z2); + _x2 = _mm256_mask_madd52lo_epu64 (_x2, 3, _y2, _z2); + _x2 = _mm256_maskz_madd52lo_epu64 (3, _x2, _y2, _z2); + _x1 = _mm512_madd52lo_epu64 (_x1, _y1, _z1); + _x1 = _mm512_mask_madd52lo_epu64 (_x1, 3, _y1, _z1); + _x1 = _mm512_maskz_madd52lo_epu64 (3, _x1, _y1, _z1); +} diff --git a/gcc/testsuite/gcc.target/i386/funcspec-56.inc b/gcc/testsuite/gcc.target/i386/funcspec-56.inc index b76dddb86a2..466555c0d06 100644 --- a/gcc/testsuite/gcc.target/i386/funcspec-56.inc +++ b/gcc/testsuite/gcc.target/i386/funcspec-56.inc @@ -80,6 +80,7 @@ extern void test_keylocker (void) __attribute__((__target__("kl"))); extern void test_widekl (void) __attribute__((__target__("widekl"))); extern void test_avxvnni (void) __attribute__((__target__("avxvnni"))); extern void test_avx512fp16 (void) __attribute__((__target__("avx512fp16"))); +extern void test_avxifma (void) __attribute__((__target__("avxifma"))); extern void test_no_sgx (void) __attribute__((__target__("no-sgx"))); extern void test_no_avx5124fmaps(void) __attribute__((__target__("no-avx5124fmaps"))); @@ -161,6 +162,7 @@ extern void test_no_keylocker (void) __attribute__((__target__("no-kl"))); extern void test_no_widekl (void) __attribute__((__target__("no-widekl"))); extern void test_no_avxvnni (void) __attribute__((__target__("no-avxvnni"))); extern void test_no_avx512fp16 (void) __attribute__((__target__("no-avx512fp16"))); +extern void test_no_avxifma (void) __attribute__((__target__("no-avxifma"))); extern void test_arch_nocona (void) __attribute__((__target__("arch=nocona"))); extern void test_arch_core2 (void) __attribute__((__target__("arch=core2"))); diff --git a/gcc/testsuite/gcc.target/i386/sse-12.c b/gcc/testsuite/gcc.target/i386/sse-12.c index 375d4d1b4de..fde56261d8f 100644 --- 
a/gcc/testsuite/gcc.target/i386/sse-12.c +++ b/gcc/testsuite/gcc.target/i386/sse-12.c @@ -3,7 +3,7 @@ popcntintrin.h gfniintrin.h and mm_malloc.h are usable with -O -std=c89 -pedantic-errors. */ /* { dg-do compile } */ -/* { dg-options "-O -std=c89 -pedantic-errors -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha -mprefetchwt1 -mxsavec -mxsaves -mclflushopt -mavx512bw -mavx512dq -mavx512vl -mavx512vbmi -mavx512vbmi2 -mavx512ifma -mavx5124fmaps -mavx5124vnniw -mavx512vpopcntdq -mclwb -mmwaitx -mclzero -mpku -msgx -mrdpid -mgfni -mavx512bitalg -mpconfig -mwbnoinvd -mavx512bf16 -menqcmd -mavx512vp2intersect -mserialize -mtsxldtrk -mamx-tile -mamx-int8 -mamx-bf16 -mkl -mwidekl -mavxvnni" } */ +/* { dg-options "-O -std=c89 -pedantic-errors -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha -mprefetchwt1 -mxsavec -mxsaves -mclflushopt -mavx512bw -mavx512dq -mavx512vl -mavx512vbmi -mavx512vbmi2 -mavx512ifma -mavx5124fmaps -mavx5124vnniw -mavx512vpopcntdq -mclwb -mmwaitx -mclzero -mpku -msgx -mrdpid -mgfni -mavx512bitalg -mpconfig -mwbnoinvd -mavx512bf16 -menqcmd -mavx512vp2intersect -mserialize -mtsxldtrk -mamx-tile -mamx-int8 -mamx-bf16 -mkl -mwidekl -mavxvnni -mavxifma" } */ #include diff --git a/gcc/testsuite/gcc.target/i386/sse-13.c b/gcc/testsuite/gcc.target/i386/sse-13.c index e285c307d00..bb29555babe 100644 --- a/gcc/testsuite/gcc.target/i386/sse-13.c +++ b/gcc/testsuite/gcc.target/i386/sse-13.c @@ -1,5 +1,5 @@ /* { dg-do compile } */ -/* { dg-options "-O2 -Werror-implicit-function-declaration -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt 
-mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha -mprefetchwt1 -mxsavec -mxsaves -mclflushopt -mavx512vl -mavx512dq -mavx512bw -mavx512vbmi -mavx512vbmi2 -mavx512ifma -mavx5124fmaps -mavx5124vnniw -mavx512vpopcntdq -mavx512vp2intersect -mclwb -mmwaitx -mclzero -mpku -msgx -mrdpid -mgfni -mavx512bitalg -mpconfig -mwbnoinvd -mavx512bf16 -menqcmd -mserialize -mtsxldtrk -mamx-tile -mamx-int8 -mamx-bf16 -mkl -mwidekl -mavxvnni -mavx512fp16" } */ +/* { dg-options "-O2 -Werror-implicit-function-declaration -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha -mprefetchwt1 -mxsavec -mxsaves -mclflushopt -mavx512vl -mavx512dq -mavx512bw -mavx512vbmi -mavx512vbmi2 -mavx512ifma -mavx5124fmaps -mavx5124vnniw -mavx512vpopcntdq -mavx512vp2intersect -mclwb -mmwaitx -mclzero -mpku -msgx -mrdpid -mgfni -mavx512bitalg -mpconfig -mwbnoinvd -mavx512bf16 -menqcmd -mserialize -mtsxldtrk -mamx-tile -mamx-int8 -mamx-bf16 -mkl -mwidekl -mavxvnni -mavx512fp16 -mavxifma" } */ /* { dg-add-options bind_pic_locally } */ #include diff --git a/gcc/testsuite/gcc.target/i386/sse-14.c b/gcc/testsuite/gcc.target/i386/sse-14.c index f41493b93f3..f2701ddaaf9 100644 --- a/gcc/testsuite/gcc.target/i386/sse-14.c +++ b/gcc/testsuite/gcc.target/i386/sse-14.c @@ -1,5 +1,5 @@ /* { dg-do compile } */ -/* { dg-options "-O0 -Werror-implicit-function-declaration -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha -mprefetchwt1 -mxsavec -mxsaves -mclflushopt -mavx512dq -mavx512bw -mavx512vl -mavx512ifma -mavx512vbmi -mavx512vbmi2 
-mavx5124fmaps -mavx5124vnniw -mavx512vpopcntdq -mclwb -mmwaitx -mclzero -mpku -msgx -mrdpid -mgfni -mpconfig -mwbnoinvd -mavx512vl -mavx512bf16 -menqcmd -mavx512vp2intersect -mserialize -mtsxldtrk -mamx-tile -mamx-int8 -mamx-bf16 -mkl -mwidekl -mavxvnni -mavx512fp16" } */ +/* { dg-options "-O0 -Werror-implicit-function-declaration -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha -mprefetchwt1 -mxsavec -mxsaves -mclflushopt -mavx512dq -mavx512bw -mavx512vl -mavx512ifma -mavx512vbmi -mavx512vbmi2 -mavx5124fmaps -mavx5124vnniw -mavx512vpopcntdq -mclwb -mmwaitx -mclzero -mpku -msgx -mrdpid -mgfni -mpconfig -mwbnoinvd -mavx512vl -mavx512bf16 -menqcmd -mavx512vp2intersect -mserialize -mtsxldtrk -mamx-tile -mamx-int8 -mamx-bf16 -mkl -mwidekl -mavxvnni -mavx512fp16 -mavxifma" } */ /* { dg-add-options bind_pic_locally } */ #include diff --git a/gcc/testsuite/gcc.target/i386/sse-22.c b/gcc/testsuite/gcc.target/i386/sse-22.c index 31492ef3697..3d196975b1e 100644 --- a/gcc/testsuite/gcc.target/i386/sse-22.c +++ b/gcc/testsuite/gcc.target/i386/sse-22.c @@ -103,7 +103,7 @@ #ifndef DIFFERENT_PRAGMAS -#pragma GCC target ("sse4a,3dnow,avx,avx2,fma4,xop,aes,pclmul,popcnt,abm,lzcnt,bmi,bmi2,tbm,lwp,fsgsbase,rdrnd,f16c,rtm,rdseed,prfchw,adx,fxsr,xsaveopt,avx512f,avx512er,avx512cd,avx512pf,sha,prefetchwt1,avx512vl,avx512bw,avx512dq,avx512vbmi,avx512vbmi2,avx512ifma,avx5124fmaps,avx5124vnniw,avx512vpopcntdq,gfni,avx512bitalg,avx512bf16,avx512vp2intersect,serialize,tsxldtrk,amx-tile,amx-int8,amx-bf16,kl,widekl,avxvnni,avx512fp16") +#pragma GCC target 
("sse4a,3dnow,avx,avx2,fma4,xop,aes,pclmul,popcnt,abm,lzcnt,bmi,bmi2,tbm,lwp,fsgsbase,rdrnd,f16c,rtm,rdseed,prfchw,adx,fxsr,xsaveopt,avx512f,avx512er,avx512cd,avx512pf,sha,prefetchwt1,avx512vl,avx512bw,avx512dq,avx512vbmi,avx512vbmi2,avx512ifma,avx5124fmaps,avx5124vnniw,avx512vpopcntdq,gfni,avx512bitalg,avx512bf16,avx512vp2intersect,serialize,tsxldtrk,amx-tile,amx-int8,amx-bf16,kl,widekl,avxvnni,avx512fp16,avxifma") #endif /* Following intrinsics require immediate arguments. They @@ -220,7 +220,7 @@ test_4 (_mm_cmpestrz, int, __m128i, int, __m128i, int, 1) /* immintrin.h (AVX/AVX2/RDRND/FSGSBASE/F16C/RTM/AVX512F/SHA) */ #ifdef DIFFERENT_PRAGMAS -#pragma GCC target ("avx,avx2,rdrnd,fsgsbase,f16c,rtm,avx512f,avx512er,avx512cd,avx512pf,sha,avx512vl,avx512bw,avx512dq,avx512ifma,avx512vbmi,avx512vbmi2,avx5124fmaps,avx5124vnniw,avx512vpopcntdq,gfni,avx512bitalg,avx512bf16,avx512vp2intersect,serialize,tsxldtrk,amx-tile,amx-int8,amx-bf16,kl,widekl,avxvnni,avx512fp16") +#pragma GCC target ("avx,avx2,rdrnd,fsgsbase,f16c,rtm,avx512f,avx512er,avx512cd,avx512pf,sha,avx512vl,avx512bw,avx512dq,avx512ifma,avx512vbmi,avx512vbmi2,avx5124fmaps,avx5124vnniw,avx512vpopcntdq,gfni,avx512bitalg,avx512bf16,avx512vp2intersect,serialize,tsxldtrk,amx-tile,amx-int8,amx-bf16,kl,widekl,avxvnni,avx512fp16,avxifma") #endif #include test_1 (_cvtss_sh, unsigned short, float, 1) diff --git a/gcc/testsuite/gcc.target/i386/sse-23.c b/gcc/testsuite/gcc.target/i386/sse-23.c index f71a7b29157..d3a233f90fc 100644 --- a/gcc/testsuite/gcc.target/i386/sse-23.c +++ b/gcc/testsuite/gcc.target/i386/sse-23.c @@ -843,6 +843,6 @@ #define __builtin_ia32_vpclmulqdq_v2di(A, B, C) __builtin_ia32_vpclmulqdq_v2di(A, B, 1) #define __builtin_ia32_vpclmulqdq_v8di(A, B, C) __builtin_ia32_vpclmulqdq_v8di(A, B, 1) -#pragma GCC target 
("sse4a,3dnow,avx,avx2,fma4,xop,aes,pclmul,popcnt,abm,lzcnt,bmi,bmi2,tbm,lwp,fsgsbase,rdrnd,f16c,fma,rtm,rdseed,prfchw,adx,fxsr,xsaveopt,avx512f,avx512er,avx512cd,avx512pf,sha,prefetchwt1,xsavec,xsaves,clflushopt,avx512bw,avx512dq,avx512vl,avx512vbmi,avx512ifma,avx5124fmaps,avx5124vnniw,avx512vpopcntdq,clwb,mwaitx,clzero,pku,sgx,rdpid,gfni,avx512vbmi2,vpclmulqdq,avx512bitalg,pconfig,wbnoinvd,avx512bf16,enqcmd,avx512vp2intersect,serialize,tsxldtrk,amx-tile,amx-int8,amx-bf16,kl,widekl,avxvnni,avx512fp16") +#pragma GCC target ("sse4a,3dnow,avx,avx2,fma4,xop,aes,pclmul,popcnt,abm,lzcnt,bmi,bmi2,tbm,lwp,fsgsbase,rdrnd,f16c,fma,rtm,rdseed,prfchw,adx,fxsr,xsaveopt,avx512f,avx512er,avx512cd,avx512pf,sha,prefetchwt1,xsavec,xsaves,clflushopt,avx512bw,avx512dq,avx512vl,avx512vbmi,avx512ifma,avx5124fmaps,avx5124vnniw,avx512vpopcntdq,clwb,mwaitx,clzero,pku,sgx,rdpid,gfni,avx512vbmi2,vpclmulqdq,avx512bitalg,pconfig,wbnoinvd,avx512bf16,enqcmd,avx512vp2intersect,serialize,tsxldtrk,amx-tile,amx-int8,amx-bf16,kl,widekl,avxvnni,avx512fp16,avxifma") #include diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp index fdd88e6a516..69de3b96bfc 100644 --- a/gcc/testsuite/lib/target-supports.exp +++ b/gcc/testsuite/lib/target-supports.exp @@ -9506,6 +9506,18 @@ proc check_effective_target_avxvnni { } { } "-mavxvnni" ] } +# Return 1 if avxifma instructions can be compiled. +proc check_effective_target_avxifma { } { + return [check_no_compiler_messages avxifma object { + typedef long long __v4di __attribute__ ((__vector_size__ (32))); + __v4di + _mm256_maddlo_avx_epu64 (__v4di __X, __v4di __Y, __v4di __Z) + { + return __builtin_ia32_avx_vpmadd52luq256 (__X, __Y, __Z); + } + } "-O0 -mavxifma" ] +} + # Return 1 if sse instructions can be compiled. 
proc check_effective_target_sse { } { return [check_no_compiler_messages sse object {
From patchwork Fri Oct 14 07:54:41 2022
X-Patchwork-Submitter: "Jiang, Haochen"
X-Patchwork-Id: 2556
To: gcc-patches@gcc.gnu.org
Subject: [PATCH 2/6] Support Intel AVX-VNNI-INT8
Date: Fri, 14 Oct 2022 15:54:41 +0800
Message-Id: <20221014075445.7938-3-haochen.jiang@intel.com>
In-Reply-To: <20221014075445.7938-1-haochen.jiang@intel.com>
References: <20221014075445.7938-1-haochen.jiang@intel.com>
From: "Jiang, Haochen"
Reply-To: Haochen Jiang
Cc: hongtao.liu@intel.com

From: Kong Lingling

gcc/ChangeLog

	* common/config/i386/cpuinfo.h (get_available_features):
	Detect avxvnniint8.
	* common/config/i386/i386-common.cc
	(OPTION_MASK_ISA2_AVXVNNIINT8_SET): New.
	(OPTION_MASK_ISA2_AVXVNNIINT8_UNSET): Ditto.
	(ix86_handle_option): Handle -mavxvnniint8.
	* common/config/i386/i386-cpuinfo.h (enum processor_features):
	Add FEATURE_AVXVNNIINT8.
	* common/config/i386/i386-isas.h: Add ISA_NAMES_TABLE_ENTRY
	for avxvnniint8.
	* config.gcc: Add avxvnniint8intrin.h.
	* config/i386/avxvnniint8intrin.h: New file.
	* config/i386/cpuid.h (bit_AVXVNNIINT8): New.
	* config/i386/i386-builtin.def: Add new builtins.
	* config/i386/i386-c.cc (ix86_target_macros_internal): Define
	__AVXVNNIINT8__.
	* config/i386/i386-options.cc (isa2_opts): Add -mavxvnniint8.
	(ix86_valid_target_attribute_inner_p): Handle avxvnniint8.
	* config/i386/i386-isa.def: Add DEF_PTA(AVXVNNIINT8).
	* config/i386/i386.opt: Add option -mavxvnniint8.
	* config/i386/immintrin.h: Include avxvnniint8intrin.h.
	* config/i386/sse.md (vpdp_): New define_insn.
	* doc/extend.texi: Document avxvnniint8.
	* doc/invoke.texi: Document -mavxvnniint8.
	* doc/sourcebuild.texi: Document target avxvnniint8.

gcc/testsuite/ChangeLog

	* g++.dg/other/i386-2.C: Add -mavxvnniint8.
	* g++.dg/other/i386-3.C: Ditto.
	* gcc.target/i386/avx-check.h: Add avxvnniint8 check.
	* gcc.target/i386/sse-12.c: Add -mavxvnniint8.
	* gcc.target/i386/sse-13.c: Ditto.
	* gcc.target/i386/sse-14.c: Ditto.
	* gcc.target/i386/sse-22.c: Ditto.
	* gcc.target/i386/sse-23.c: Ditto.
	* gcc.target/i386/funcspec-56.inc: Add new target attribute.
	* lib/target-supports.exp (check_effective_target_avxvnniint8): New.
	* gcc.target/i386/avxvnniint8-1.c: Ditto.
	* gcc.target/i386/avxvnniint8-vpdpbssd-2.c: Ditto.
	* gcc.target/i386/avxvnniint8-vpdpbssds-2.c: Ditto.
	* gcc.target/i386/avxvnniint8-vpdpbsud-2.c: Ditto.
	* gcc.target/i386/avxvnniint8-vpdpbsuds-2.c: Ditto.
* gcc.target/i386/avxvnniint8-vpdpbuud-2.c: Ditto. * gcc.target/i386/avxvnniint8-vpdpbuuds-2.c: Ditto. Co-authored-by: Hongyu Wang Co-authored-by: Haochen Jiang --- gcc/common/config/i386/cpuinfo.h | 2 + gcc/common/config/i386/i386-common.cc | 22 ++- gcc/common/config/i386/i386-cpuinfo.h | 1 + gcc/common/config/i386/i386-isas.h | 2 + gcc/config.gcc | 2 +- gcc/config/i386/avxvnniint8intrin.h | 138 ++++++++++++++++++ gcc/config/i386/cpuid.h | 1 + gcc/config/i386/i386-builtin.def | 14 ++ gcc/config/i386/i386-c.cc | 2 + gcc/config/i386/i386-isa.def | 1 + gcc/config/i386/i386-options.cc | 4 +- gcc/config/i386/i386.opt | 5 + gcc/config/i386/immintrin.h | 2 + gcc/config/i386/sse.md | 31 ++++ gcc/doc/extend.texi | 5 + gcc/doc/invoke.texi | 9 +- gcc/doc/sourcebuild.texi | 3 + gcc/testsuite/g++.dg/other/i386-2.C | 2 +- gcc/testsuite/g++.dg/other/i386-3.C | 2 +- gcc/testsuite/gcc.target/i386/avx-check.h | 3 + gcc/testsuite/gcc.target/i386/avxvnniint8-1.c | 43 ++++++ .../gcc.target/i386/avxvnniint8-vpdpbssd-2.c | 72 +++++++++ .../gcc.target/i386/avxvnniint8-vpdpbssds-2.c | 72 +++++++++ .../gcc.target/i386/avxvnniint8-vpdpbsud-2.c | 72 +++++++++ .../gcc.target/i386/avxvnniint8-vpdpbsuds-2.c | 72 +++++++++ .../gcc.target/i386/avxvnniint8-vpdpbuud-2.c | 72 +++++++++ .../gcc.target/i386/avxvnniint8-vpdpbuuds-2.c | 72 +++++++++ gcc/testsuite/gcc.target/i386/funcspec-56.inc | 2 + gcc/testsuite/gcc.target/i386/sse-12.c | 2 +- gcc/testsuite/gcc.target/i386/sse-13.c | 2 +- gcc/testsuite/gcc.target/i386/sse-14.c | 2 +- gcc/testsuite/gcc.target/i386/sse-22.c | 4 +- gcc/testsuite/gcc.target/i386/sse-23.c | 2 +- gcc/testsuite/lib/target-supports.exp | 12 ++ 34 files changed, 738 insertions(+), 14 deletions(-) create mode 100644 gcc/config/i386/avxvnniint8intrin.h create mode 100644 gcc/testsuite/gcc.target/i386/avxvnniint8-1.c create mode 100644 gcc/testsuite/gcc.target/i386/avxvnniint8-vpdpbssd-2.c create mode 100644 gcc/testsuite/gcc.target/i386/avxvnniint8-vpdpbssds-2.c create mode 
100644 gcc/testsuite/gcc.target/i386/avxvnniint8-vpdpbsud-2.c create mode 100644 gcc/testsuite/gcc.target/i386/avxvnniint8-vpdpbsuds-2.c create mode 100644 gcc/testsuite/gcc.target/i386/avxvnniint8-vpdpbuud-2.c create mode 100644 gcc/testsuite/gcc.target/i386/avxvnniint8-vpdpbuuds-2.c diff --git a/gcc/common/config/i386/cpuinfo.h b/gcc/common/config/i386/cpuinfo.h index 9bb21c6cacc..bed88003f8e 100644 --- a/gcc/common/config/i386/cpuinfo.h +++ b/gcc/common/config/i386/cpuinfo.h @@ -795,6 +795,8 @@ get_available_features (struct __processor_model *cpu_model, set_feature (FEATURE_AVXVNNI); if (eax & bit_AVXIFMA) set_feature (FEATURE_AVXIFMA); + if (edx & bit_AVXVNNIINT8) + set_feature (FEATURE_AVXVNNIINT8); } if (avx512_usable) { diff --git a/gcc/common/config/i386/i386-common.cc b/gcc/common/config/i386/i386-common.cc index 4de7906b247..6a2a7e3d25a 100644 --- a/gcc/common/config/i386/i386-common.cc +++ b/gcc/common/config/i386/i386-common.cc @@ -108,6 +108,7 @@ along with GCC; see the file COPYING3. If not see #define OPTION_MASK_ISA2_AMX_TILE_SET OPTION_MASK_ISA2_AMX_TILE #define OPTION_MASK_ISA2_AMX_INT8_SET OPTION_MASK_ISA2_AMX_INT8 #define OPTION_MASK_ISA2_AMX_BF16_SET OPTION_MASK_ISA2_AMX_BF16 +#define OPTION_MASK_ISA2_AVXVNNIINT8_SET OPTION_MASK_ISA2_AVXVNNIINT8 /* SSE4 includes both SSE4.1 and SSE4.2. -msse4 should be the same as -msse4.2. */ @@ -214,7 +215,7 @@ along with GCC; see the file COPYING3. If not see (OPTION_MASK_ISA_AVX2 | OPTION_MASK_ISA_AVX512F_UNSET) #define OPTION_MASK_ISA2_AVX2_UNSET \ (OPTION_MASK_ISA2_AVXIFMA_UNSET | OPTION_MASK_ISA2_AVXVNNI_UNSET \ - | OPTION_MASK_ISA2_AVX512F_UNSET) + | OPTION_MASK_ISA2_AVXVNNIINT8_UNSET | OPTION_MASK_ISA2_AVX512F_UNSET) #define OPTION_MASK_ISA_AVX512F_UNSET \ (OPTION_MASK_ISA_AVX512F | OPTION_MASK_ISA_AVX512CD_UNSET \ | OPTION_MASK_ISA_AVX512PF_UNSET | OPTION_MASK_ISA_AVX512ER_UNSET \ @@ -278,6 +279,7 @@ along with GCC; see the file COPYING3. 
If not see #define OPTION_MASK_ISA2_KL_UNSET \ (OPTION_MASK_ISA2_KL | OPTION_MASK_ISA2_WIDEKL_UNSET) #define OPTION_MASK_ISA2_WIDEKL_UNSET OPTION_MASK_ISA2_WIDEKL +#define OPTION_MASK_ISA2_AVXVNNIINT8_UNSET OPTION_MASK_ISA2_AVXVNNIINT8 /* SSE4 includes both SSE4.1 and SSE4.2. -mno-sse4 should the same as -mno-sse4.1. */ @@ -1142,6 +1144,24 @@ ix86_handle_option (struct gcc_options *opts, } return true; + case OPT_mavxvnniint8: + if (value) + { + opts->x_ix86_isa_flags2 |= OPTION_MASK_ISA2_AVXVNNIINT8_SET; + opts->x_ix86_isa_flags2_explicit |= + OPTION_MASK_ISA2_AVXVNNIINT8_SET; + opts->x_ix86_isa_flags |= OPTION_MASK_ISA_AVX2_SET; + opts->x_ix86_isa_flags_explicit |= OPTION_MASK_ISA_AVX2_SET; + } + else + { + opts->x_ix86_isa_flags2 &= + ~OPTION_MASK_ISA2_AVXVNNIINT8_UNSET; + opts->x_ix86_isa_flags2_explicit |= + OPTION_MASK_ISA2_AVXVNNIINT8_UNSET; + } + return true; + case OPT_mfma: if (value) { diff --git a/gcc/common/config/i386/i386-cpuinfo.h b/gcc/common/config/i386/i386-cpuinfo.h index 968f9a56a6c..9a6b92fab79 100644 --- a/gcc/common/config/i386/i386-cpuinfo.h +++ b/gcc/common/config/i386/i386-cpuinfo.h @@ -241,6 +241,7 @@ enum processor_features FEATURE_X86_64_V3, FEATURE_X86_64_V4, FEATURE_AVXIFMA, + FEATURE_AVXVNNIINT8, CPU_FEATURE_MAX }; diff --git a/gcc/common/config/i386/i386-isas.h b/gcc/common/config/i386/i386-isas.h index b05b4bb8f0d..8c1f351056c 100644 --- a/gcc/common/config/i386/i386-isas.h +++ b/gcc/common/config/i386/i386-isas.h @@ -176,4 +176,6 @@ ISA_NAMES_TABLE_START ISA_NAMES_TABLE_ENTRY("x86-64-v3", FEATURE_X86_64_V3, P_X86_64_V3, NULL) ISA_NAMES_TABLE_ENTRY("x86-64-v4", FEATURE_X86_64_V4, P_X86_64_V4, NULL) ISA_NAMES_TABLE_ENTRY("avxifma", FEATURE_AVXIFMA, P_NONE, "-mavxifma") + ISA_NAMES_TABLE_ENTRY("avxvnniint8", FEATURE_AVXVNNIINT8, + P_NONE, "-mavxvnniint8") ISA_NAMES_TABLE_END diff --git a/gcc/config.gcc b/gcc/config.gcc index 12365abbf86..4df78238910 100644 --- a/gcc/config.gcc +++ b/gcc/config.gcc @@ -422,7 +422,7 @@ i[34567]86-*-* 
| x86_64-*-*) amxbf16intrin.h x86gprintrin.h uintrintrin.h hresetintrin.h keylockerintrin.h avxvnniintrin.h mwaitintrin.h avx512fp16intrin.h avx512fp16vlintrin.h - avxifmaintrin.h" + avxifmaintrin.h avxvnniint8intrin.h" ;; ia64-*-*) extra_headers=ia64intrin.h diff --git a/gcc/config/i386/avxvnniint8intrin.h b/gcc/config/i386/avxvnniint8intrin.h new file mode 100644 index 00000000000..362e6f65c2a --- /dev/null +++ b/gcc/config/i386/avxvnniint8intrin.h @@ -0,0 +1,138 @@ +/* Copyright (C) 2020 Free Software Foundation, Inc. + + This file is part of GCC. + + GCC is free software; you can redistribute it and/or modify + it under the terms of the GNU General Public License as published by + the Free Software Foundation; either version 3, or (at your option) + any later version. + + GCC is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + Under Section 7 of GPL version 3, you are granted additional + permissions described in the GCC Runtime Library Exception, version + 3.1, as published by the Free Software Foundation. + + You should have received a copy of the GNU General Public License and + a copy of the GCC Runtime Library Exception along with this program; + see the files COPYING3 and COPYING.RUNTIME respectively. If not, see + <http://www.gnu.org/licenses/>. */ + +#if !defined _IMMINTRIN_H_INCLUDED +#error "Never use <avxvnniint8intrin.h> directly; include <immintrin.h> instead."
+#endif + +#ifndef _AVXVNNIINT8INTRIN_H_INCLUDED +#define _AVXVNNIINT8INTRIN_H_INCLUDED + +#if !defined(__AVXVNNIINT8__) +#pragma GCC push_options +#pragma GCC target("avxvnniint8") +#define __DISABLE_AVXVNNIINT8__ +#endif /* __AVXVNNIINT8__ */ + +extern __inline __m128i +__attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_dpbssd_epi32 (__m128i __W, __m128i __A, __m128i __B) +{ + return (__m128i) + __builtin_ia32_vpdpbssd128 ((__v4si) __W, (__v4si) __A, (__v4si) __B); +} + +extern __inline __m128i +__attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_dpbssds_epi32 (__m128i __W, __m128i __A, __m128i __B) +{ + return (__m128i) + __builtin_ia32_vpdpbssds128 ((__v4si) __W, (__v4si) __A, (__v4si) __B); +} + +extern __inline __m128i +__attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_dpbsud_epi32 (__m128i __W, __m128i __A, __m128i __B) +{ + return (__m128i) + __builtin_ia32_vpdpbsud128 ((__v4si) __W, (__v4si) __A, (__v4si) __B); +} + +extern __inline __m128i +__attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_dpbsuds_epi32 (__m128i __W, __m128i __A, __m128i __B) +{ + return (__m128i) + __builtin_ia32_vpdpbsuds128 ((__v4si) __W, (__v4si) __A, (__v4si) __B); +} + +extern __inline __m128i +__attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_dpbuud_epi32 (__m128i __W, __m128i __A, __m128i __B) +{ + return (__m128i) + __builtin_ia32_vpdpbuud128 ((__v4si) __W, (__v4si) __A, (__v4si) __B); +} + +extern __inline __m128i +__attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm_dpbuuds_epi32 (__m128i __W, __m128i __A, __m128i __B) +{ + return (__m128i) + __builtin_ia32_vpdpbuuds128 ((__v4si) __W, (__v4si) __A, (__v4si) __B); +} + +extern __inline __m256i +__attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_dpbssd_epi32 (__m256i __W, __m256i __A, __m256i __B) +{ + return (__m256i) + __builtin_ia32_vpdpbssd256 ((__v8si) __W, (__v8si) __A, 
(__v8si) __B); +} + +extern __inline __m256i +__attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_dpbssds_epi32 (__m256i __W, __m256i __A, __m256i __B) +{ + return (__m256i) + __builtin_ia32_vpdpbssds256 ((__v8si) __W, (__v8si) __A, (__v8si) __B); +} + +extern __inline __m256i +__attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_dpbsud_epi32 (__m256i __W, __m256i __A, __m256i __B) +{ + return (__m256i) + __builtin_ia32_vpdpbsud256 ((__v8si) __W, (__v8si) __A, (__v8si) __B); +} + +extern __inline __m256i +__attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_dpbsuds_epi32 (__m256i __W, __m256i __A, __m256i __B) +{ + return (__m256i) + __builtin_ia32_vpdpbsuds256 ((__v8si) __W, (__v8si) __A, (__v8si) __B); +} + +extern __inline __m256i +__attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_dpbuud_epi32 (__m256i __W, __m256i __A, __m256i __B) +{ + return (__m256i) + __builtin_ia32_vpdpbuud256 ((__v8si) __W, (__v8si) __A, (__v8si) __B); +} + +extern __inline __m256i +__attribute__((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_dpbuuds_epi32 (__m256i __W, __m256i __A, __m256i __B) +{ + return (__m256i) + __builtin_ia32_vpdpbuuds256 ((__v8si) __W, (__v8si) __A, (__v8si) __B); +} + +#ifdef __DISABLE_AVXVNNIINT8__ +#undef __DISABLE_AVXVNNIINT8__ +#pragma GCC pop_options +#endif /* __DISABLE_AVXVNNIINT8__ */ + +#endif /* _AVXVNNIINT8INTRIN_H_INCLUDED */ diff --git a/gcc/config/i386/cpuid.h b/gcc/config/i386/cpuid.h index 9885699efd5..f5fad22149a 100644 --- a/gcc/config/i386/cpuid.h +++ b/gcc/config/i386/cpuid.h @@ -49,6 +49,7 @@ #define bit_RDRND (1 << 30) /* %edx */ +#define bit_AVXVNNIINT8 (1 << 4) #define bit_CMPXCHG8B (1 << 8) #define bit_CMOV (1 << 15) #define bit_MMX (1 << 23) diff --git a/gcc/config/i386/i386-builtin.def b/gcc/config/i386/i386-builtin.def index 4a89099a00f..e6edae5728b 100644 --- 
a/gcc/config/i386/i386-builtin.def +++
b/gcc/config/i386/i386-builtin.def @@ -2696,6 +2696,20 @@ BDESC (OPTION_MASK_ISA_AVX512VNNI | OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_A BDESC (OPTION_MASK_ISA_AVX512VNNI | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpdpwssds_v4si_mask, "__builtin_ia32_vpdpwssds_v4si_mask", IX86_BUILTIN_VPDPWSSDSV4SI_MASK, UNKNOWN, (int) V4SI_FTYPE_V4SI_V4SI_V4SI_UQI) BDESC (OPTION_MASK_ISA_AVX512VNNI | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpdpwssds_v4si_maskz, "__builtin_ia32_vpdpwssds_v4si_maskz", IX86_BUILTIN_VPDPWSSDSV4SI_MASKZ, UNKNOWN, (int) V4SI_FTYPE_V4SI_V4SI_V4SI_UQI) +/* AVXVNNIINT8 */ +BDESC (0, OPTION_MASK_ISA2_AVXVNNIINT8, CODE_FOR_vpdpbssd_v8si, "__builtin_ia32_vpdpbssd256", IX86_BUILTIN_VPDPBSSDV8SI, UNKNOWN, (int) V8SI_FTYPE_V8SI_V8SI_V8SI) +BDESC (0, OPTION_MASK_ISA2_AVXVNNIINT8, CODE_FOR_vpdpbssds_v8si, "__builtin_ia32_vpdpbssds256", IX86_BUILTIN_VPDPBSSDSV8SI, UNKNOWN, (int) V8SI_FTYPE_V8SI_V8SI_V8SI) +BDESC (0, OPTION_MASK_ISA2_AVXVNNIINT8, CODE_FOR_vpdpbsud_v8si, "__builtin_ia32_vpdpbsud256", IX86_BUILTIN_VPDPBSUDV8SI, UNKNOWN, (int) V8SI_FTYPE_V8SI_V8SI_V8SI) +BDESC (0, OPTION_MASK_ISA2_AVXVNNIINT8, CODE_FOR_vpdpbsuds_v8si, "__builtin_ia32_vpdpbsuds256", IX86_BUILTIN_VPDPBSUDSV8SI, UNKNOWN, (int) V8SI_FTYPE_V8SI_V8SI_V8SI) +BDESC (0, OPTION_MASK_ISA2_AVXVNNIINT8, CODE_FOR_vpdpbuud_v8si, "__builtin_ia32_vpdpbuud256", IX86_BUILTIN_VPDPBUUDV8SI, UNKNOWN, (int) V8SI_FTYPE_V8SI_V8SI_V8SI) +BDESC (0, OPTION_MASK_ISA2_AVXVNNIINT8, CODE_FOR_vpdpbuuds_v8si, "__builtin_ia32_vpdpbuuds256", IX86_BUILTIN_VPDPBUUDSV8SI, UNKNOWN, (int) V8SI_FTYPE_V8SI_V8SI_V8SI) +BDESC (0, OPTION_MASK_ISA2_AVXVNNIINT8, CODE_FOR_vpdpbssd_v4si, "__builtin_ia32_vpdpbssd128", IX86_BUILTIN_VPDPBSSDV4SI, UNKNOWN, (int) V4SI_FTYPE_V4SI_V4SI_V4SI) +BDESC (0, OPTION_MASK_ISA2_AVXVNNIINT8, CODE_FOR_vpdpbssds_v4si, "__builtin_ia32_vpdpbssds128", IX86_BUILTIN_VPDPBSSDSV4SI, UNKNOWN, (int) V4SI_FTYPE_V4SI_V4SI_V4SI) +BDESC (0, OPTION_MASK_ISA2_AVXVNNIINT8, CODE_FOR_vpdpbsud_v4si, 
"__builtin_ia32_vpdpbsud128", IX86_BUILTIN_VPDPBSUDV4SI, UNKNOWN, (int) V4SI_FTYPE_V4SI_V4SI_V4SI) +BDESC (0, OPTION_MASK_ISA2_AVXVNNIINT8, CODE_FOR_vpdpbsuds_v4si, "__builtin_ia32_vpdpbsuds128", IX86_BUILTIN_VPDPBSUDSV4SI, UNKNOWN, (int) V4SI_FTYPE_V4SI_V4SI_V4SI) +BDESC (0, OPTION_MASK_ISA2_AVXVNNIINT8, CODE_FOR_vpdpbuud_v4si, "__builtin_ia32_vpdpbuud128", IX86_BUILTIN_VPDPBUUDV4SI, UNKNOWN, (int) V4SI_FTYPE_V4SI_V4SI_V4SI) +BDESC (0, OPTION_MASK_ISA2_AVXVNNIINT8, CODE_FOR_vpdpbuuds_v4si, "__builtin_ia32_vpdpbuuds128", IX86_BUILTIN_VPDPBUUDSV4SI, UNKNOWN, (int) V4SI_FTYPE_V4SI_V4SI_V4SI) + /* VPCLMULQDQ */ BDESC (OPTION_MASK_ISA_VPCLMULQDQ | OPTION_MASK_ISA_AVX512VL, 0, CODE_FOR_vpclmulqdq_v2di, "__builtin_ia32_vpclmulqdq_v2di", IX86_BUILTIN_VPCLMULQDQ2, UNKNOWN, (int) V2DI_FTYPE_V2DI_V2DI_INT) BDESC (OPTION_MASK_ISA_VPCLMULQDQ | OPTION_MASK_ISA_AVX, 0, CODE_FOR_vpclmulqdq_v4di, "__builtin_ia32_vpclmulqdq_v4di", IX86_BUILTIN_VPCLMULQDQ4, UNKNOWN, (int) V4DI_FTYPE_V4DI_V4DI_INT) diff --git a/gcc/config/i386/i386-c.cc b/gcc/config/i386/i386-c.cc index 3494ec035d5..a9a35c0a18a 100644 --- a/gcc/config/i386/i386-c.cc +++ b/gcc/config/i386/i386-c.cc @@ -635,6 +635,8 @@ ix86_target_macros_internal (HOST_WIDE_INT isa_flag, def_or_undef (parse_in, "__AVXVNNI__"); if (isa_flag2 & OPTION_MASK_ISA2_AVXIFMA) def_or_undef (parse_in, "__AVXIFMA__"); + if (isa_flag2 & OPTION_MASK_ISA2_AVXVNNIINT8) + def_or_undef (parse_in, "__AVXVNNIINT8__"); if (TARGET_IAMCU) { def_or_undef (parse_in, "__iamcu"); diff --git a/gcc/config/i386/i386-isa.def b/gcc/config/i386/i386-isa.def index 6e0254ce418..c95b917c6ce 100644 --- a/gcc/config/i386/i386-isa.def +++ b/gcc/config/i386/i386-isa.def @@ -110,3 +110,4 @@ DEF_PTA(WIDEKL) DEF_PTA(AVXVNNI) DEF_PTA(AVX512FP16) DEF_PTA(AVXIFMA) +DEF_PTA(AVXVNNIINT8) diff --git a/gcc/config/i386/i386-options.cc b/gcc/config/i386/i386-options.cc index 5facb64c2a8..3e6d04433a6 100644 --- a/gcc/config/i386/i386-options.cc +++ b/gcc/config/i386/i386-options.cc @@ 
-227,7 +227,8 @@ static struct ix86_target_opts isa2_opts[] = { "-mwidekl", OPTION_MASK_ISA2_WIDEKL }, { "-mavxvnni", OPTION_MASK_ISA2_AVXVNNI }, { "-mavx512fp16", OPTION_MASK_ISA2_AVX512FP16 }, - { "-mavxifma", OPTION_MASK_ISA2_AVXIFMA } + { "-mavxifma", OPTION_MASK_ISA2_AVXIFMA }, + { "-mavxvnniint8", OPTION_MASK_ISA2_AVXVNNIINT8 } }; static struct ix86_target_opts isa_opts[] = { @@ -1074,6 +1075,7 @@ ix86_valid_target_attribute_inner_p (tree fndecl, tree args, char *p_strings[], IX86_ATTR_ISA ("avxvnni", OPT_mavxvnni), IX86_ATTR_ISA ("avx512fp16", OPT_mavx512fp16), IX86_ATTR_ISA ("avxifma", OPT_mavxifma), + IX86_ATTR_ISA ("avxvnniint8", OPT_mavxvnniint8), /* enum options */ IX86_ATTR_ENUM ("fpmath=", OPT_mfpmath_), diff --git a/gcc/config/i386/i386.opt b/gcc/config/i386/i386.opt index 36e28b7063d..53d534f6392 100644 --- a/gcc/config/i386/i386.opt +++ b/gcc/config/i386/i386.opt @@ -1219,3 +1219,8 @@ mavxifma Target Mask(ISA2_AVXIFMA) Var(ix86_isa_flags2) Save Support MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AVX, AVX2, and AVXIFMA built-in functions and code generation. + +mavxvnniint8 +Target Mask(ISA2_AVXVNNIINT8) Var(ix86_isa_flags2) Save +Support MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AVX, AVX2 and +AVXVNNIINT8 built-in functions and code generation. 
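For reviewers, the per-lane arithmetic that the CALC reference functions in the new run-time tests check can be sketched as a scalar C model. This is a sketch written from those tests under stated assumptions, not code from this patch, and the helper names widen_byte and vpdp_lane are hypothetical. Each 32-bit lane adds four byte products to the accumulator dword (__W). In vpdpbXYd[s], X and Y give the signedness (s or u) of the first (__A) and second (__B) byte sources, and a trailing "s" selects saturation instead of wraparound; the model assumes the accumulator is a signed dword whenever either source is signed.

```c
#include <assert.h>   /* assert(), handy for quick checks against this model */
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: widen a raw byte to 64 bits as int8 or uint8.  */
static int64_t
widen_byte (uint8_t byte, bool is_signed)
{
  return (is_signed && byte >= 0x80) ? (int64_t) byte - 256 : (int64_t) byte;
}

/* Hypothetical scalar model of one dword lane of VPDPB{SS,SU,UU}D[S]:
   acc + a[0]*b[0] + a[1]*b[1] + a[2]*b[2] + a[3]*b[3].  */
static uint32_t
vpdp_lane (uint32_t acc, const uint8_t a[4], const uint8_t b[4],
           bool a_signed, bool b_signed, bool saturate)
{
  /* The four products and their sum cannot overflow 64 bits.  */
  int64_t sum = 0;
  for (int j = 0; j < 4; j++)
    sum += widen_byte (a[j], a_signed) * widen_byte (b[j], b_signed);

  if (a_signed || b_signed)
    {
      /* BSSD/BSUD: reinterpret the accumulator as a signed dword.  */
      int64_t total = (acc >= 0x80000000u ? (int64_t) acc - 0x100000000LL
                                          : (int64_t) acc) + sum;
      if (saturate && total > INT32_MAX)
        total = INT32_MAX;
      if (saturate && total < INT32_MIN)
        total = INT32_MIN;
      /* Without saturation, the conversion wraps modulo 2^32.  */
      return (uint32_t) total;
    }

  /* BUUD: unsigned accumulator; sum is non-negative on this path.  */
  uint64_t total = (uint64_t) acc + (uint64_t) sum;
  if (saturate && total > UINT32_MAX)
    total = UINT32_MAX;
  return (uint32_t) total;
}
```

Under this model, _mm_dpbssd_epi32 (W, A, B) corresponds to calling vpdp_lane on each of its 4 dwords with a_signed = b_signed = true and saturate = false, where bytes 4i..4i+3 of A and B feed dword i. The _mm256 forms do the same over 8 dwords.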
diff --git a/gcc/config/i386/immintrin.h b/gcc/config/i386/immintrin.h index e9d4e975243..ddea249d09b 100644 --- a/gcc/config/i386/immintrin.h +++ b/gcc/config/i386/immintrin.h @@ -46,6 +46,8 @@ #include +#include <avxvnniint8intrin.h> + #include #include diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md index 331347569ea..49490a213ea 100644 --- a/gcc/config/i386/sse.md +++ b/gcc/config/i386/sse.md @@ -200,6 +200,13 @@ UNSPEC_COMPLEX_FCMUL UNSPEC_COMPLEX_MASK + ;; For AVX-VNNI-INT8 support + UNSPEC_VPDPBSSD + UNSPEC_VPDPBSSDS + UNSPEC_VPDPBSUD + UNSPEC_VPDPBSUDS + UNSPEC_VPDPBUUD + UNSPEC_VPDPBUUDS ]) (define_c_enum "unspecv" [ @@ -29241,3 +29248,27 @@ gcc_unreachable (); DONE; }) + +(define_int_iterator VPDOTPROD + [UNSPEC_VPDPBSSD + UNSPEC_VPDPBSSDS + UNSPEC_VPDPBSUD + UNSPEC_VPDPBSUDS + UNSPEC_VPDPBUUD + UNSPEC_VPDPBUUDS]) + +(define_int_attr vpdotprodtype + [(UNSPEC_VPDPBSSD "bssd") (UNSPEC_VPDPBSSDS "bssds") + (UNSPEC_VPDPBSUD "bsud") (UNSPEC_VPDPBSUDS "bsuds") + (UNSPEC_VPDPBUUD "buud") (UNSPEC_VPDPBUUDS "buuds")]) + +(define_insn "vpdp<vpdotprodtype>_<mode>" + [(set (match_operand:VI4_AVX 0 "register_operand" "=x") + (unspec:VI4_AVX + [(match_operand:VI4_AVX 1 "register_operand" "0") + (match_operand:VI4_AVX 2 "register_operand" "x") + (match_operand:VI4_AVX 3 "nonimmediate_operand" "xm")] + VPDOTPROD))] + "TARGET_AVXVNNIINT8" + "vpdp<vpdotprodtype>\t{%3, %2, %0|%0, %2, %3}" + [(set_attr "prefix" "vex")]) diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi index edecf5c0070..9a8de9fc226 100644 --- a/gcc/doc/extend.texi +++ b/gcc/doc/extend.texi @@ -7065,6 +7065,11 @@ Enable/disable the generation of the AVXVNNI instructions. @cindex @code{target("avxifma")} function attribute, x86 Enable/disable the generation of the AVXIFMA instructions. +@item avxvnniint8 +@itemx no-avxvnniint8 +@cindex @code{target("avxvnniint8")} function attribute, x86 +Enable/disable the generation of the AVXVNNIINT8 instructions.
+ @item cld @itemx no-cld @cindex @code{target("cld")} function attribute, x86 diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi index 886fc1d0164..d4ff7549bf3 100644 --- a/gcc/doc/invoke.texi +++ b/gcc/doc/invoke.texi @@ -1436,7 +1436,7 @@ See RS/6000 and PowerPC Options. -mavx5124fmaps -mavx512vnni -mavx5124vnniw -mprfchw -mrdpid @gol -mrdseed -msgx -mavx512vp2intersect -mserialize -mtsxldtrk@gol -mamx-tile -mamx-int8 -mamx-bf16 -muintr -mhreset -mavxvnni@gol --mavx512fp16 -mavxifma @gol +-mavx512fp16 -mavxifma -mavxvnniint8 @gol -mcldemote -mms-bitfields -mno-align-stringops -minline-all-stringops @gol -minline-stringops-dynamically -mstringop-strategy=@var{alg} @gol -mkl -mwidekl @gol @@ -32896,6 +32896,9 @@ preferred alignment to @option{-mpreferred-stack-boundary=2}. @need 200 @itemx -mavxifma @opindex mavxifma +@need 200 +@itemx -mavxvnniint8 +@opindex mavxvnniint8 These switches enable the use of instructions in the MMX, SSE, SSE2, SSE3, SSSE3, SSE4, SSE4A, SSE4.1, SSE4.2, AVX, AVX2, AVX512F, AVX512PF, AVX512ER, AVX512CD, AVX512VL, AVX512BW, AVX512DQ, AVX512IFMA, AVX512VBMI, SHA, @@ -32906,8 +32909,8 @@ XSAVEOPT, XSAVEC, XSAVES, RTM, HLE, TBM, MWAITX, CLZERO, PKU, AVX512VBMI2, GFNI, VAES, WAITPKG, VPCLMULQDQ, AVX512BITALG, MOVDIRI, MOVDIR64B, AVX512BF16, ENQCMD, AVX512VPOPCNTDQ, AVX5124FMAPS, AVX512VNNI, AVX5124VNNIW, SERIALIZE, UINTR, HRESET, AMXTILE, AMXINT8, AMXBF16, KL, WIDEKL, AVXVNNI, AVX512FP16, -AVXIFMA or CLDEMOTE extended instruction sets. Each has a corresponding -@option{-mno-} option to disable use of these instructions. +AVXIFMA, AVXVNNIINT8 or CLDEMOTE extended instruction sets. Each has a +corresponding @option{-mno-} option to disable use of these instructions. 
These extensions are also available as built-in functions: see @ref{x86 Built-in Functions}, for details of the functions enabled and diff --git a/gcc/doc/sourcebuild.texi b/gcc/doc/sourcebuild.texi index 0173acf4a65..e21a1d381e0 100644 --- a/gcc/doc/sourcebuild.texi +++ b/gcc/doc/sourcebuild.texi @@ -2493,6 +2493,9 @@ Target supports the execution of @code{avx512vp2intersect} instructions. @item avxifma Target supports the execution of @code{avxifma} instructions. +@item avxvnniint8 +Target supports the execution of @code{avxvnniint8} instructions. + @item amx_tile Target supports the execution of @code{amx-tile} instructions. diff --git a/gcc/testsuite/g++.dg/other/i386-2.C b/gcc/testsuite/g++.dg/other/i386-2.C index 5388606779b..ebd01fe47bc 100644 --- a/gcc/testsuite/g++.dg/other/i386-2.C +++ b/gcc/testsuite/g++.dg/other/i386-2.C @@ -1,5 +1,5 @@ /* { dg-do compile { target i?86-*-* x86_64-*-* } } */ -/* { dg-options "-O -pedantic-errors -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha -mprefetchwt1 -mxsavec -mxsaves -mclflushopt -mavx512dq -mavx512bw -mavx512vl -mavx512ifma -mavx512vbmi -mavx512vbmi2 -mavx5124fmaps -mavx5124vnniw -mavx512vpopcntdq -mclwb -mmwaitx -mclzero -mpku -msgx -mrdpid -mgfni -mavx512bitalg -mpconfig -mwbnoinvd -mavx512bf16 -menqcmd -mavx512vp2intersect -mserialize -mtsxldtrk -mamx-tile -mamx-int8 -mamx-bf16 -mkl -mwidekl -mavxvnni -mavx512fp16 -mavxifma" } */ +/* { dg-options "-O -pedantic-errors -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha -mprefetchwt1 -mxsavec -mxsaves -mclflushopt -mavx512dq -mavx512bw -mavx512vl -mavx512ifma -mavx512vbmi 
-mavx512vbmi2 -mavx5124fmaps -mavx5124vnniw -mavx512vpopcntdq -mclwb -mmwaitx -mclzero -mpku -msgx -mrdpid -mgfni -mavx512bitalg -mpconfig -mwbnoinvd -mavx512bf16 -menqcmd -mavx512vp2intersect -mserialize -mtsxldtrk -mamx-tile -mamx-int8 -mamx-bf16 -mkl -mwidekl -mavxvnni -mavx512fp16 -mavxifma -mavxvnniint8" } */ /* Test that {,x,e,p,t,s,w,a,b,i}mmintrin.h, mm3dnow.h, fma4intrin.h, xopintrin.h, abmintrin.h, bmiintrin.h, tbmintrin.h, lwpintrin.h, diff --git a/gcc/testsuite/g++.dg/other/i386-3.C b/gcc/testsuite/g++.dg/other/i386-3.C index 86cedd3d32f..b66498f1d4c 100644 --- a/gcc/testsuite/g++.dg/other/i386-3.C +++ b/gcc/testsuite/g++.dg/other/i386-3.C @@ -1,5 +1,5 @@ /* { dg-do compile { target i?86-*-* x86_64-*-* } } */ -/* { dg-options "-O -fkeep-inline-functions -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha -mprefetchwt1 -mxsavec -mxsaves -mclflushopt -mavx512dq -mavx512bw -mavx512vl -mavx512ifma -mavx512vbmi -mavx512vbmi2 -mavx5124fmaps -mavx5124vnniw -mavx512vpopcntdq -mclwb -mmwaitx -mclzero -mpku -msgx -mrdpid -mgfni -mavx512bitalg -mpconfig -mwbnoinvd -mavx512bf16 -menqcmd -mavx512vp2intersect -mserialize -mtsxldtrk -mamx-tile -mamx-int8 -mamx-bf16 -mkl -mwidekl -mavxvnni -mavx512fp16 -mavxifma" } */ +/* { dg-options "-O -fkeep-inline-functions -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha -mprefetchwt1 -mxsavec -mxsaves -mclflushopt -mavx512dq -mavx512bw -mavx512vl -mavx512ifma -mavx512vbmi -mavx512vbmi2 -mavx5124fmaps -mavx5124vnniw -mavx512vpopcntdq -mclwb -mmwaitx -mclzero -mpku -msgx -mrdpid -mgfni -mavx512bitalg -mpconfig -mwbnoinvd -mavx512bf16 -menqcmd 
-mavx512vp2intersect -mserialize -mtsxldtrk -mamx-tile -mamx-int8 -mamx-bf16 -mkl -mwidekl -mavxvnni -mavx512fp16 -mavxifma -mavxvnniint8" } */ /* Test that {,x,e,p,t,s,w,a,b,i}mmintrin.h, mm3dnow.h, fma4intrin.h, xopintrin.h, abmintrin.h, bmiintrin.h, tbmintrin.h, lwpintrin.h, diff --git a/gcc/testsuite/gcc.target/i386/avx-check.h b/gcc/testsuite/gcc.target/i386/avx-check.h index 24ee6ab4efd..77507ca2edc 100644 --- a/gcc/testsuite/gcc.target/i386/avx-check.h +++ b/gcc/testsuite/gcc.target/i386/avx-check.h @@ -25,6 +25,9 @@ main () && avx_os_support () #ifdef AVXIFMA && __builtin_cpu_supports ("avxifma") +#endif +#ifdef AVXVNNIINT8 + && __builtin_cpu_supports ("avxvnniint8") #endif ) { diff --git a/gcc/testsuite/gcc.target/i386/avxvnniint8-1.c b/gcc/testsuite/gcc.target/i386/avxvnniint8-1.c new file mode 100644 index 00000000000..d6942f34d6e --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avxvnniint8-1.c @@ -0,0 +1,43 @@ +/* { dg-do compile } */ +/* { dg-options "-mavxvnniint8 -O2" } */ +/* { dg-final { scan-assembler-times "vpdpbssd\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vpdpbssd\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vpdpbssds\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vpdpbssds\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vpdpbsud\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vpdpbsud\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vpdpbsuds\[ 
\\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vpdpbsuds\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vpdpbuud\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vpdpbuud\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vpdpbuuds\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+\[^\n\r]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vpdpbuuds\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ + + +#include <immintrin.h> + +volatile __m256i x,y,z; +volatile __m128i x_,y_,z_; +volatile __mmask8 m; + +void extern +avxvnniint8_test (void) +{ + x = _mm256_dpbssd_epi32 (x, y, z); + x_ = _mm_dpbssd_epi32 (x_, y_, z_); + + x = _mm256_dpbssds_epi32 (x, y, z); + x_ = _mm_dpbssds_epi32 (x_, y_, z_); + + x = _mm256_dpbsud_epi32 (x, y, z); + x_ = _mm_dpbsud_epi32 (x_, y_, z_); + + x = _mm256_dpbsuds_epi32 (x, y, z); + x_ = _mm_dpbsuds_epi32 (x_, y_, z_); + + x = _mm256_dpbuud_epi32 (x, y, z); + x_ = _mm_dpbuud_epi32 (x_, y_, z_); + + x = _mm256_dpbuuds_epi32 (x, y, z); + x_ = _mm_dpbuuds_epi32 (x_, y_, z_); +} diff --git a/gcc/testsuite/gcc.target/i386/avxvnniint8-vpdpbssd-2.c b/gcc/testsuite/gcc.target/i386/avxvnniint8-vpdpbssd-2.c new file mode 100644 index 00000000000..5016de39621 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avxvnniint8-vpdpbssd-2.c @@ -0,0 +1,72 @@ +/* { dg-do run } */ +/* { dg-options "-O2 -mavxvnniint8" } */ +/* { dg-require-effective-target avxvnniint8 } */ +#define AVXVNNIINT8 +#ifndef CHECK +#define CHECK "avx-check.h" +#endif + +#ifndef TEST +#define TEST avx_test +#endif + +#include CHECK + +static void +CALC (int *r, int
*dst, char *s1, char *s2, int size) +{ + short tempres[32]; + for (int i = 0; i < size; i++) { + tempres[i] = (short) s1[i] * (short) s2[i]; + } + for (int i = 0; i < size / 4; i++) { + long long test = (long long) dst[i] + tempres[i * 4] + tempres[i * 4 + 1] + + tempres[i * 4 + 2] + tempres[i * 4 + 3]; + r[i] = test; + } +} + +void +TEST (void) +{ + int i; + union256i_d res_256; + union256i_b src2_256; + union256i_b src1_256; + int res_ref_256[8]; + + for (i = 0; i < 32; i++) + { + int sign = i % 2 ? 1 : -1; + src1_256.a[i] = 10 + 3 * i + sign; + src2_256.a[i] = sign * 10 * i * i; + } + + for (i = 0; i < 8; i++) + res_256.a[i] = 0x7fffffff; + + CALC (res_ref_256, res_256.a, src1_256.a, src2_256.a, 32); + res_256.x = _mm256_dpbssd_epi32 (res_256.x, src1_256.x, src2_256.x); + if (check_union256i_d (res_256, res_ref_256)) + abort (); + + union128i_d res_128; + union128i_b src2_128; + union128i_b src1_128; + int res_ref_128[4]; + + for (i = 0; i < 16; i++) + { + int sign = i % 2 ? 1 : -1; + src1_128.a[i] = 10 + 3 * i * i + sign; + src2_128.a[i] = sign * 10 * i * i; + } + + for (i = 0; i < 4; i++) + res_128.a[i] = 0x7fffffff; + + CALC (res_ref_128, res_128.a, src1_128.a, src2_128.a, 16); + res_128.x = _mm_dpbssd_epi32 (res_128.x, src1_128.x, src2_128.x); + if (check_union128i_d (res_128, res_ref_128)) + abort (); +} diff --git a/gcc/testsuite/gcc.target/i386/avxvnniint8-vpdpbssds-2.c b/gcc/testsuite/gcc.target/i386/avxvnniint8-vpdpbssds-2.c new file mode 100644 index 00000000000..6de5062e917 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avxvnniint8-vpdpbssds-2.c @@ -0,0 +1,72 @@ +/* { dg-do run } */ +/* { dg-options "-O2 -mavxvnniint8" } */ +/* { dg-require-effective-target avxvnniint8 } */ +#define AVXVNNIINT8 +#ifndef CHECK +#define CHECK "avx-check.h" +#endif + +#ifndef TEST +#define TEST avx_test +#endif + +#include CHECK + +static void +CALC (int *r, int *dst, char *s1, char *s2, int size) +{ + short tempres[32]; + for (int i = 0; i < size; i++) { + tempres[i] 
= (short) s1[i] * (short) s2[i]; + } + for (int i = 0; i < size / 4; i++) { + long long test = (long long) dst[i] + tempres[i * 4] + tempres[i * 4 + 1] + + tempres[i * 4 + 2] + tempres[i * 4 + 3]; + r[i] = test > 0x7FFFFFFF ? 0x7FFFFFFF : test; + } +} + +void +TEST (void) +{ + int i; + union256i_d res_256; + union256i_b src2_256; + union256i_b src1_256; + int res_ref_256[8]; + + for (i = 0; i < 32; i++) + { + int sign = i % 2 ? 1 : -1; + src1_256.a[i] = 10 + 3 * i + sign; + src2_256.a[i] = sign * 10 * i * i; + } + + for (i = 0; i < 8; i++) + res_256.a[i] = 0x7fffffff; + + CALC (res_ref_256, res_256.a, src1_256.a, src2_256.a, 32); + res_256.x = _mm256_dpbssds_epi32 (res_256.x, src1_256.x, src2_256.x); + if (check_union256i_d (res_256, res_ref_256)) + abort (); + + union128i_d res_128; + union128i_b src2_128; + union128i_b src1_128; + int res_ref_128[4]; + + for (i = 0; i < 16; i++) + { + int sign = i % 2 ? 1 : -1; + src1_128.a[i] = 10 + 3 * i * i + sign; + src2_128.a[i] = sign * 10 * i * i; + } + + for (i = 0; i < 4; i++) + res_128.a[i] = 0x7fffffff; + + CALC (res_ref_128, res_128.a, src1_128.a, src2_128.a, 16); + res_128.x = _mm_dpbssds_epi32 (res_128.x, src1_128.x, src2_128.x); + if (check_union128i_d (res_128, res_ref_128)) + abort (); +} diff --git a/gcc/testsuite/gcc.target/i386/avxvnniint8-vpdpbsud-2.c b/gcc/testsuite/gcc.target/i386/avxvnniint8-vpdpbsud-2.c new file mode 100644 index 00000000000..6e4ffd1c7be --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avxvnniint8-vpdpbsud-2.c @@ -0,0 +1,72 @@ +/* { dg-do run } */ +/* { dg-options "-O2 -mavxvnniint8" } */ +/* { dg-require-effective-target avxvnniint8 } */ +#define AVXVNNIINT8 +#ifndef CHECK +#define CHECK "avx-check.h" +#endif + +#ifndef TEST +#define TEST avx_test +#endif + +#include CHECK + +static void +CALC (int *r, int *dst, char *s1, unsigned char *s2, int size) +{ + short tempres[32]; + for (int i = 0; i < size; i++) { + tempres[i] = (short) s1[i] * (unsigned short) s2[i]; + } + for (int i = 0; i 
< size / 4; i++) { + long long test = (long long) dst[i] + tempres[i * 4] + tempres[i * 4 + 1] + + tempres[i * 4 + 2] + tempres[i * 4 + 3]; + r[i] = test; + } +} + +void +TEST (void) +{ + int i; + union256i_d res_256; + union256i_b src1_256; + union256i_ub src2_256; + int res_ref_256[8]; + + for (i = 0; i < 32; i++) + { + int sign = i % 2 ? 1 : -1; + src1_256.a[i] = 10 + 3 * i + sign; + src2_256.a[i] = sign * 10 * i * i; + } + + for (i = 0; i < 8; i++) + res_256.a[i] = 0x7fffffff; + + CALC (res_ref_256, res_256.a, src1_256.a, src2_256.a, 32); + res_256.x = _mm256_dpbsud_epi32 (res_256.x, src1_256.x, src2_256.x); + if (check_union256i_d (res_256, res_ref_256)) + abort (); + + union128i_d res_128; + union128i_b src1_128; + union128i_ub src2_128; + int res_ref_128[4]; + + for (i = 0; i < 16; i++) + { + int sign = i % 2 ? 1 : -1; + src1_128.a[i] = 10 + 3 * i * i + sign; + src2_128.a[i] = sign * 10 * i * i; + } + + for (i = 0; i < 4; i++) + res_128.a[i] = 0x7fffffff; + + CALC (res_ref_128, res_128.a, src1_128.a, src2_128.a, 16); + res_128.x = _mm_dpbsud_epi32 (res_128.x, src1_128.x, src2_128.x); + if (check_union128i_d (res_128, res_ref_128)) + abort (); +} diff --git a/gcc/testsuite/gcc.target/i386/avxvnniint8-vpdpbsuds-2.c b/gcc/testsuite/gcc.target/i386/avxvnniint8-vpdpbsuds-2.c new file mode 100644 index 00000000000..ad4b6047ecd --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avxvnniint8-vpdpbsuds-2.c @@ -0,0 +1,72 @@ +/* { dg-do run } */ +/* { dg-options "-O2 -mavxvnniint8" } */ +/* { dg-require-effective-target avxvnniint8 } */ +#define AVXVNNIINT8 +#ifndef CHECK +#define CHECK "avx-check.h" +#endif + +#ifndef TEST +#define TEST avx_test +#endif + +#include CHECK + +static void +CALC (int *r, int *dst, char *s1, unsigned char *s2, int size) +{ + short tempres[32]; + for (int i = 0; i < size; i++) { + tempres[i] = (short) s1[i] * (unsigned short) s2[i]; + } + for (int i = 0; i < size / 4; i++) { + long long test = (long long) dst[i] + tempres[i * 4] + tempres[i * 
4 + 1] + + tempres[i * 4 + 2] + tempres[i * 4 + 3]; + r[i] = test > 0x7FFFFFFF ? 0x7FFFFFFF : test; + } +} + +void +TEST (void) +{ + int i; + union256i_d res_256; + union256i_b src1_256; + union256i_ub src2_256; + int res_ref_256[8]; + + for (i = 0; i < 32; i++) + { + int sign = i % 2 ? 1 : -1; + src1_256.a[i] = 10 + 3 * i + sign; + src2_256.a[i] = sign * 10 * i * i; + } + + for (i = 0; i < 8; i++) + res_256.a[i] = 0x7fffffff; + + CALC (res_ref_256, res_256.a, src1_256.a, src2_256.a, 32); + res_256.x = _mm256_dpbsuds_epi32 (res_256.x, src1_256.x, src2_256.x); + if (check_union256i_d (res_256, res_ref_256)) + abort (); + + union128i_d res_128; + union128i_b src1_128; + union128i_ub src2_128; + int res_ref_128[4]; + + for (i = 0; i < 16; i++) + { + int sign = i % 2 ? 1 : -1; + src1_128.a[i] = 10 + 3 * i * i + sign; + src2_128.a[i] = sign * 10 * i * i; + } + + for (i = 0; i < 4; i++) + res_128.a[i] = 0x7fffffff; + + CALC (res_ref_128, res_128.a, src1_128.a, src2_128.a, 16); + res_128.x = _mm_dpbsuds_epi32 (res_128.x, src1_128.x, src2_128.x); + if (check_union128i_d (res_128, res_ref_128)) + abort (); +} diff --git a/gcc/testsuite/gcc.target/i386/avxvnniint8-vpdpbuud-2.c b/gcc/testsuite/gcc.target/i386/avxvnniint8-vpdpbuud-2.c new file mode 100644 index 00000000000..6590915a459 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avxvnniint8-vpdpbuud-2.c @@ -0,0 +1,72 @@ +/* { dg-do run } */ +/* { dg-options "-O2 -mavxvnniint8" } */ +/* { dg-require-effective-target avxvnniint8 } */ +#define AVXVNNIINT8 +#ifndef CHECK +#define CHECK "avx-check.h" +#endif + +#ifndef TEST +#define TEST avx_test +#endif + +#include CHECK + +static void +CALC (unsigned int *r, unsigned int *dst, unsigned char *s1, unsigned char *s2, int size) +{ + unsigned short tempres[32]; + for (int i = 0; i < size; i++) { + tempres[i] = (unsigned short) s1[i] * (unsigned short) s2[i]; + } + for (int i = 0; i < size / 4; i++) { + unsigned int test = (unsigned int) dst[i] + tempres[i * 4] + tempres[i * 4 + 
1] + + tempres[i * 4 + 2] + tempres[i * 4 + 3]; + r[i] = test; + } +} + +void +TEST (void) +{ + int i; + union256i_ud res_256; + union256i_ub src2_256; + union256i_ub src1_256; + unsigned int res_ref_256[8]; + + for (i = 0; i < 32; i++) + { + int sign = i % 2 ? 1 : -1; + src1_256.a[i] = 10 + 3 * i + sign; + src2_256.a[i] = sign * 10 * i * i; + } + + for (i = 0; i < 8; i++) + res_256.a[i] = 0x7fffffff; + + CALC (res_ref_256, res_256.a, src1_256.a, src2_256.a, 32); + res_256.x = _mm256_dpbuud_epi32 (res_256.x, src1_256.x, src2_256.x); + if (check_union256i_ud (res_256, res_ref_256)) + abort (); + + union128i_ud res_128; + union128i_ub src2_128; + union128i_ub src1_128; + unsigned int res_ref_128[4]; + + for (i = 0; i < 16; i++) + { + int sign = i % 2 ? 1 : -1; + src1_128.a[i] = 10 + 3 * i * i + sign; + src2_128.a[i] = sign * 10 * i * i; + } + + for (i = 0; i < 4; i++) + res_128.a[i] = 0x7fffffff; + + CALC (res_ref_128, res_128.a, src1_128.a, src2_128.a, 16); + res_128.x = _mm_dpbuud_epi32 (res_128.x, src1_128.x, src2_128.x); + if (check_union128i_ud (res_128, res_ref_128)) + abort (); +} diff --git a/gcc/testsuite/gcc.target/i386/avxvnniint8-vpdpbuuds-2.c b/gcc/testsuite/gcc.target/i386/avxvnniint8-vpdpbuuds-2.c new file mode 100644 index 00000000000..970e4a5d408 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avxvnniint8-vpdpbuuds-2.c @@ -0,0 +1,72 @@ +/* { dg-do run } */ +/* { dg-options "-O2 -mavxvnniint8" } */ +/* { dg-require-effective-target avxvnniint8 } */ +#define AVXVNNIINT8 +#ifndef CHECK +#define CHECK "avx-check.h" +#endif + +#ifndef TEST +#define TEST avx_test +#endif + +#include CHECK + +static void +CALC (unsigned int *r, unsigned int *dst, unsigned char *s1, unsigned char *s2, int size) +{ + unsigned short tempres[32]; + for (int i = 0; i < size; i++) { + tempres[i] = (unsigned short) s1[i] * (unsigned short) s2[i]; + } + for (int i = 0; i < size / 4; i++) { + unsigned long long test = (unsigned long long) dst[i] + tempres[i * 4] + tempres[i * 4 + 1] +
tempres[i * 4 + 2] + tempres[i * 4 + 3]; + r[i] = test > 0xFFFFFFFF ? 0xFFFFFFFF : test; + } +} + +void +TEST (void) +{ + int i; + union256i_ud res_256; + union256i_ub src2_256; + union256i_ub src1_256; + unsigned int res_ref_256[8]; + + for (i = 0; i < 32; i++) + { + int sign = i % 2 ? 1 : -1; + src1_256.a[i] = 10 + 3 * i + sign; + src2_256.a[i] = sign * 10 * i * i; + } + + for (i = 0; i < 8; i++) + res_256.a[i] = 0x7fffffff; + + CALC (res_ref_256, res_256.a, src1_256.a, src2_256.a, 32); + res_256.x = _mm256_dpbuuds_epi32 (res_256.x, src1_256.x, src2_256.x); + if (check_union256i_ud (res_256, res_ref_256)) + abort (); + + union128i_ud res_128; + union128i_ub src2_128; + union128i_ub src1_128; + unsigned int res_ref_128[4]; + + for (i = 0; i < 16; i++) + { + int sign = i % 2 ? 1 : -1; + src1_128.a[i] = 10 + 3 * i * i + sign; + src2_128.a[i] = sign * 10 * i * i; + } + + for (i = 0; i < 4; i++) + res_128.a[i] = 0x7fffffff; + + CALC (res_ref_128, res_128.a, src1_128.a, src2_128.a, 16); + res_128.x = _mm_dpbuuds_epi32 (res_128.x, src1_128.x, src2_128.x); + if (check_union128i_ud (res_128, res_ref_128)) + abort (); +} diff --git a/gcc/testsuite/gcc.target/i386/funcspec-56.inc b/gcc/testsuite/gcc.target/i386/funcspec-56.inc index 466555c0d06..a681bffe3e7 100644 --- a/gcc/testsuite/gcc.target/i386/funcspec-56.inc +++ b/gcc/testsuite/gcc.target/i386/funcspec-56.inc @@ -81,6 +81,7 @@ extern void test_widekl (void) __attribute__((__target__("widekl"))); extern void test_avxvnni (void) __attribute__((__target__("avxvnni"))); extern void test_avx512fp16 (void) __attribute__((__target__("avx512fp16"))); extern void test_avxifma (void) __attribute__((__target__("avxifma"))); +extern void test_avxvnniint8 (void) __attribute__((__target__("avxvnniint8"))); extern void test_no_sgx (void) __attribute__((__target__("no-sgx"))); extern void test_no_avx5124fmaps(void) __attribute__((__target__("no-avx5124fmaps"))); @@ -163,6 +164,7 @@ extern void test_no_widekl (void) 
__attribute__((__target__("no-widekl"))); extern void test_no_avxvnni (void) __attribute__((__target__("no-avxvnni"))); extern void test_no_avx512fp16 (void) __attribute__((__target__("no-avx512fp16"))); extern void test_no_avxifma (void) __attribute__((__target__("no-avxifma"))); +extern void test_no_avxvnniint8 (void) __attribute__((__target__("no-avxvnniint8"))); extern void test_arch_nocona (void) __attribute__((__target__("arch=nocona"))); extern void test_arch_core2 (void) __attribute__((__target__("arch=core2"))); diff --git a/gcc/testsuite/gcc.target/i386/sse-12.c b/gcc/testsuite/gcc.target/i386/sse-12.c index fde56261d8f..ddde2df6657 100644 --- a/gcc/testsuite/gcc.target/i386/sse-12.c +++ b/gcc/testsuite/gcc.target/i386/sse-12.c @@ -3,7 +3,7 @@ popcntintrin.h gfniintrin.h and mm_malloc.h are usable with -O -std=c89 -pedantic-errors. */ /* { dg-do compile } */ -/* { dg-options "-O -std=c89 -pedantic-errors -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha -mprefetchwt1 -mxsavec -mxsaves -mclflushopt -mavx512bw -mavx512dq -mavx512vl -mavx512vbmi -mavx512vbmi2 -mavx512ifma -mavx5124fmaps -mavx5124vnniw -mavx512vpopcntdq -mclwb -mmwaitx -mclzero -mpku -msgx -mrdpid -mgfni -mavx512bitalg -mpconfig -mwbnoinvd -mavx512bf16 -menqcmd -mavx512vp2intersect -mserialize -mtsxldtrk -mamx-tile -mamx-int8 -mamx-bf16 -mkl -mwidekl -mavxvnni -mavxifma" } */ +/* { dg-options "-O -std=c89 -pedantic-errors -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha -mprefetchwt1 -mxsavec -mxsaves -mclflushopt -mavx512bw -mavx512dq -mavx512vl -mavx512vbmi -mavx512vbmi2 -mavx512ifma -mavx5124fmaps 
-mavx5124vnniw -mavx512vpopcntdq -mclwb -mmwaitx -mclzero -mpku -msgx -mrdpid -mgfni -mavx512bitalg -mpconfig -mwbnoinvd -mavx512bf16 -menqcmd -mavx512vp2intersect -mserialize -mtsxldtrk -mamx-tile -mamx-int8 -mamx-bf16 -mkl -mwidekl -mavxvnni -mavxifma -mavxvnniint8" } */ #include diff --git a/gcc/testsuite/gcc.target/i386/sse-13.c b/gcc/testsuite/gcc.target/i386/sse-13.c index bb29555babe..2b293216c6f 100644 --- a/gcc/testsuite/gcc.target/i386/sse-13.c +++ b/gcc/testsuite/gcc.target/i386/sse-13.c @@ -1,5 +1,5 @@ /* { dg-do compile } */ -/* { dg-options "-O2 -Werror-implicit-function-declaration -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha -mprefetchwt1 -mxsavec -mxsaves -mclflushopt -mavx512vl -mavx512dq -mavx512bw -mavx512vbmi -mavx512vbmi2 -mavx512ifma -mavx5124fmaps -mavx5124vnniw -mavx512vpopcntdq -mavx512vp2intersect -mclwb -mmwaitx -mclzero -mpku -msgx -mrdpid -mgfni -mavx512bitalg -mpconfig -mwbnoinvd -mavx512bf16 -menqcmd -mserialize -mtsxldtrk -mamx-tile -mamx-int8 -mamx-bf16 -mkl -mwidekl -mavxvnni -mavx512fp16 -mavxifma" } */ +/* { dg-options "-O2 -Werror-implicit-function-declaration -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha -mprefetchwt1 -mxsavec -mxsaves -mclflushopt -mavx512vl -mavx512dq -mavx512bw -mavx512vbmi -mavx512vbmi2 -mavx512ifma -mavx5124fmaps -mavx5124vnniw -mavx512vpopcntdq -mavx512vp2intersect -mclwb -mmwaitx -mclzero -mpku -msgx -mrdpid -mgfni -mavx512bitalg -mpconfig -mwbnoinvd -mavx512bf16 -menqcmd -mserialize -mtsxldtrk -mamx-tile -mamx-int8 -mamx-bf16 -mkl -mwidekl -mavxvnni -mavx512fp16 -mavxifma -mavxvnniint8" } */ /* { dg-add-options 
bind_pic_locally } */ #include diff --git a/gcc/testsuite/gcc.target/i386/sse-14.c b/gcc/testsuite/gcc.target/i386/sse-14.c index f2701ddaaf9..78b51048b90 100644 --- a/gcc/testsuite/gcc.target/i386/sse-14.c +++ b/gcc/testsuite/gcc.target/i386/sse-14.c @@ -1,5 +1,5 @@ /* { dg-do compile } */ -/* { dg-options "-O0 -Werror-implicit-function-declaration -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha -mprefetchwt1 -mxsavec -mxsaves -mclflushopt -mavx512dq -mavx512bw -mavx512vl -mavx512ifma -mavx512vbmi -mavx512vbmi2 -mavx5124fmaps -mavx5124vnniw -mavx512vpopcntdq -mclwb -mmwaitx -mclzero -mpku -msgx -mrdpid -mgfni -mpconfig -mwbnoinvd -mavx512vl -mavx512bf16 -menqcmd -mavx512vp2intersect -mserialize -mtsxldtrk -mamx-tile -mamx-int8 -mamx-bf16 -mkl -mwidekl -mavxvnni -mavx512fp16 -mavxifma" } */ +/* { dg-options "-O0 -Werror-implicit-function-declaration -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha -mprefetchwt1 -mxsavec -mxsaves -mclflushopt -mavx512dq -mavx512bw -mavx512vl -mavx512ifma -mavx512vbmi -mavx512vbmi2 -mavx5124fmaps -mavx5124vnniw -mavx512vpopcntdq -mclwb -mmwaitx -mclzero -mpku -msgx -mrdpid -mgfni -mpconfig -mwbnoinvd -mavx512vl -mavx512bf16 -menqcmd -mavx512vp2intersect -mserialize -mtsxldtrk -mamx-tile -mamx-int8 -mamx-bf16 -mkl -mwidekl -mavxvnni -mavx512fp16 -mavxifma -mavxvnniint8" } */ /* { dg-add-options bind_pic_locally } */ #include diff --git a/gcc/testsuite/gcc.target/i386/sse-22.c b/gcc/testsuite/gcc.target/i386/sse-22.c index 3d196975b1e..cc1c8cfa4be 100644 --- a/gcc/testsuite/gcc.target/i386/sse-22.c +++ b/gcc/testsuite/gcc.target/i386/sse-22.c @@ -103,7 
+103,7 @@ #ifndef DIFFERENT_PRAGMAS -#pragma GCC target ("sse4a,3dnow,avx,avx2,fma4,xop,aes,pclmul,popcnt,abm,lzcnt,bmi,bmi2,tbm,lwp,fsgsbase,rdrnd,f16c,rtm,rdseed,prfchw,adx,fxsr,xsaveopt,avx512f,avx512er,avx512cd,avx512pf,sha,prefetchwt1,avx512vl,avx512bw,avx512dq,avx512vbmi,avx512vbmi2,avx512ifma,avx5124fmaps,avx5124vnniw,avx512vpopcntdq,gfni,avx512bitalg,avx512bf16,avx512vp2intersect,serialize,tsxldtrk,amx-tile,amx-int8,amx-bf16,kl,widekl,avxvnni,avx512fp16,avxifma") +#pragma GCC target ("sse4a,3dnow,avx,avx2,fma4,xop,aes,pclmul,popcnt,abm,lzcnt,bmi,bmi2,tbm,lwp,fsgsbase,rdrnd,f16c,rtm,rdseed,prfchw,adx,fxsr,xsaveopt,avx512f,avx512er,avx512cd,avx512pf,sha,prefetchwt1,avx512vl,avx512bw,avx512dq,avx512vbmi,avx512vbmi2,avx512ifma,avx5124fmaps,avx5124vnniw,avx512vpopcntdq,gfni,avx512bitalg,avx512bf16,avx512vp2intersect,serialize,tsxldtrk,amx-tile,amx-int8,amx-bf16,kl,widekl,avxvnni,avx512fp16,avxifma,avxvnniint8") #endif /* Following intrinsics require immediate arguments. They @@ -220,7 +220,7 @@ test_4 (_mm_cmpestrz, int, __m128i, int, __m128i, int, 1) /* immintrin.h (AVX/AVX2/RDRND/FSGSBASE/F16C/RTM/AVX512F/SHA) */ #ifdef DIFFERENT_PRAGMAS -#pragma GCC target ("avx,avx2,rdrnd,fsgsbase,f16c,rtm,avx512f,avx512er,avx512cd,avx512pf,sha,avx512vl,avx512bw,avx512dq,avx512ifma,avx512vbmi,avx512vbmi2,avx5124fmaps,avx5124vnniw,avx512vpopcntdq,gfni,avx512bitalg,avx512bf16,avx512vp2intersect,serialize,tsxldtrk,amx-tile,amx-int8,amx-bf16,kl,widekl,avxvnni,avx512fp16,avxifma") +#pragma GCC target ("avx,avx2,rdrnd,fsgsbase,f16c,rtm,avx512f,avx512er,avx512cd,avx512pf,sha,avx512vl,avx512bw,avx512dq,avx512ifma,avx512vbmi,avx512vbmi2,avx5124fmaps,avx5124vnniw,avx512vpopcntdq,gfni,avx512bitalg,avx512bf16,avx512vp2intersect,serialize,tsxldtrk,amx-tile,amx-int8,amx-bf16,kl,widekl,avxvnni,avx512fp16,avxifma,avxvnniint8") #endif #include test_1 (_cvtss_sh, unsigned short, float, 1) diff --git a/gcc/testsuite/gcc.target/i386/sse-23.c b/gcc/testsuite/gcc.target/i386/sse-23.c index 
d3a233f90fc..270f4483491 100644 --- a/gcc/testsuite/gcc.target/i386/sse-23.c +++ b/gcc/testsuite/gcc.target/i386/sse-23.c @@ -843,6 +843,6 @@ #define __builtin_ia32_vpclmulqdq_v2di(A, B, C) __builtin_ia32_vpclmulqdq_v2di(A, B, 1) #define __builtin_ia32_vpclmulqdq_v8di(A, B, C) __builtin_ia32_vpclmulqdq_v8di(A, B, 1) -#pragma GCC target ("sse4a,3dnow,avx,avx2,fma4,xop,aes,pclmul,popcnt,abm,lzcnt,bmi,bmi2,tbm,lwp,fsgsbase,rdrnd,f16c,fma,rtm,rdseed,prfchw,adx,fxsr,xsaveopt,avx512f,avx512er,avx512cd,avx512pf,sha,prefetchwt1,xsavec,xsaves,clflushopt,avx512bw,avx512dq,avx512vl,avx512vbmi,avx512ifma,avx5124fmaps,avx5124vnniw,avx512vpopcntdq,clwb,mwaitx,clzero,pku,sgx,rdpid,gfni,avx512vbmi2,vpclmulqdq,avx512bitalg,pconfig,wbnoinvd,avx512bf16,enqcmd,avx512vp2intersect,serialize,tsxldtrk,amx-tile,amx-int8,amx-bf16,kl,widekl,avxvnni,avx512fp16,avxifma") +#pragma GCC target ("sse4a,3dnow,avx,avx2,fma4,xop,aes,pclmul,popcnt,abm,lzcnt,bmi,bmi2,tbm,lwp,fsgsbase,rdrnd,f16c,fma,rtm,rdseed,prfchw,adx,fxsr,xsaveopt,avx512f,avx512er,avx512cd,avx512pf,sha,prefetchwt1,xsavec,xsaves,clflushopt,avx512bw,avx512dq,avx512vl,avx512vbmi,avx512ifma,avx5124fmaps,avx5124vnniw,avx512vpopcntdq,clwb,mwaitx,clzero,pku,sgx,rdpid,gfni,avx512vbmi2,vpclmulqdq,avx512bitalg,pconfig,wbnoinvd,avx512bf16,enqcmd,avx512vp2intersect,serialize,tsxldtrk,amx-tile,amx-int8,amx-bf16,kl,widekl,avxvnni,avx512fp16,avxifma,avxvnniint8") #include diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp index 69de3b96bfc..64ccfc746bd 100644 --- a/gcc/testsuite/lib/target-supports.exp +++ b/gcc/testsuite/lib/target-supports.exp @@ -9518,6 +9518,18 @@ proc check_effective_target_avxifma { } { } "-O0 -mavxifma" ] } +# Return 1 if avxvnniint8 instructions can be compiled. 
+proc check_effective_target_avxvnniint8 { } { + return [check_no_compiler_messages avxvnniint8 object { + typedef int __v8si __attribute__ ((__vector_size__ (32))); + __v8si + _mm256_dpbssd_epi32 (__v8si __A, __v8si __B, __v8si __C) + { + return __builtin_ia32_vpdpbssd256 (__A, __B, __C); + } + } "-O0 -mavxvnniint8" ] +} + # Return 1 if sse instructions can be compiled. proc check_effective_target_sse { } { return [check_no_compiler_messages sse object {
From patchwork Fri Oct 14 07:54:42 2022 X-Patchwork-Submitter: "Jiang, Haochen" X-Patchwork-Id: 2554 To: gcc-patches@gcc.gnu.org Subject: [PATCH 3/6] i386: Add intrinsic for vector __bf16 Date: Fri, 14 Oct 2022 15:54:42 +0800 Message-Id: <20221014075445.7938-4-haochen.jiang@intel.com> In-Reply-To: <20221014075445.7938-1-haochen.jiang@intel.com> References: <20221014075445.7938-1-haochen.jiang@intel.com> From: "Jiang, Haochen" Reply-To: Haochen Jiang Cc: hongtao.liu@intel.com From: konglin1 gcc/ChangeLog: * config/i386/avx512fp16intrin.h: New intrinsic. (_mm_load_sbf16): Ditto. (_mm_mask_load_sbf16): Ditto. (_mm_maskz_load_sbf16): Ditto. (_mm_mask_store_sbf16): Ditto. (_mm_mask_move_sbf16): Ditto. (_mm_maskz_move_sbf16): Ditto. * config/i386/avx512bf16intrin.h: New intrinsic. (_mm_setzero_pbf16): Ditto. (_mm256_setzero_pbf16): Ditto. (_mm512_setzero_pbf16): Ditto. (_mm512_undefined_pbf16): Ditto. (_mm512_set1_pbf16): Ditto. (_mm512_set_pbf16): Ditto. (_mm512_setr_pbf16): Ditto. (_mm_castpbf16_ps): Ditto. (_mm256_castpbf16_ps): Ditto. (_mm512_castpbf16_ps): Ditto. (_mm_castpbf16_pd): Ditto. (_mm256_castpbf16_pd): Ditto. (_mm512_castpbf16_pd): Ditto. (_mm_castpbf16_si128): Ditto. (_mm256_castpbf16_si256): Ditto. (_mm512_castpbf16_si512): Ditto. (_mm_castps_pbf16): Ditto. (_mm256_castps_pbf16): Ditto. (_mm512_castps_pbf16): Ditto. (_mm_castpd_pbf16): Ditto. (_mm256_castpd_pbf16): Ditto. (_mm512_castpd_pbf16): Ditto. (_mm_castsi128_pbf16): Ditto. (_mm256_castsi256_pbf16): Ditto. (_mm512_castsi512_pbf16): Ditto. (_mm256_castpbf16256_pbf16128): Ditto. (_mm512_castpbf16512_pbf16128): Ditto. (_mm512_castpbf16512_pbf16256): Ditto. (_mm256_castpbf16128_pbf16256): Ditto. (_mm512_castpbf16128_pbf16512): Ditto. (_mm512_castpbf16256_pbf16512): Ditto. (_mm256_zextpbf16128_pbf16256): Ditto. (_mm512_zextpbf16128_pbf16512): Ditto. (_mm512_zextpbf16256_pbf16512): Ditto.
(_mm512_abs_pbf16): Ditto. (_mm512_load_pbf16): Ditto. (_mm256_load_pbf16): Ditto. (_mm_load_pbf16): Ditto. (_mm512_loadu_pbf16): Ditto. (_mm256_loadu_pbf16): Ditto. (_mm_loadu_pbf16): Ditto. (_mm_store_sbf16): Ditto. (_mm512_store_pbf16): Ditto. (_mm256_store_pbf16): Ditto. (_mm_store_pbf16): Ditto. (_mm512_storeu_pbf16): Ditto. (_mm256_storeu_pbf16): Ditto. (_mm_storeu_pbf16): Ditto. (_mm_move_sbf16): Ditto. (_mm512_mask_blend_pbf16): Ditto. (_mm512_permutex2var_pbf16): Ditto. (_mm512_permutexvar_pbf16): Ditto. (_mm512_bcstnebf16_ps): Ditto. (_mm512_mask_bcstnebf16_ps): Ditto. (_mm512_bcstnesh_ps): Ditto. (_mm512_mask_bcstnesh_ps): Ditto. (_mm512_maskz_bcstnesh_ps): Ditto. (_mm512_cvtne2ps_ph): Ditto. (_mm512_mask_cvtne2ps_ph): Ditto. (_mm512_cvtne_round2ps_ph): Ditto. (_mm512_mask_cvtne_round2ps_ph): Ditto. (_mm512_cvtneebf16_ps): Ditto. (_mm512_mask_cvtneebf16_ps): Ditto. (_mm512_maskz_cvtneebf16_ps): Ditto. (_mm512_cvtneeph_ps): Ditto. (_mm512_mask_cvtneeph_ps): Ditto. (_mm512_cvtneobf16_ps): Ditto. (_mm512_mask_cvtneobf16_ps): Ditto. (_mm512_maskz_cvtneobf16_ps): Ditto. (_mm512_cvtneoph_ps): Ditto. (_mm512_mask_cvtneoph_ps): Ditto. * config/i386/avx512bf16vlintrin.h (__attribute__): Ditto. (_mm_cvtsbf16_bf16): Ditto. (_mm256_cvtsbf16_bf16): Ditto. (_mm256_undefined_pbf16): Ditto. (_mm_undefined_pbf16): Ditto. (_mm_set_sbf16): Ditto. (_mm_set1_pbf16): Ditto. (_mm256_set1_pbf16): Ditto. (_mm_set_pbf16): Ditto. (_mm256_set_pbf16): Ditto. (_mm_setr_pbf16): Ditto. (_mm256_setr_pbf16): Ditto. (_mm256_abs_pbf16): Ditto. (_mm_abs_pbf16): Ditto. (_mm_mask_blend_pbf16): Ditto. (_mm256_mask_blend_pbf16): Ditto. (_mm_permutex2var_pbf16): Ditto. (_mm256_permutex2var_pbf16): Ditto. (_mm_permutexvar_pbf16): Ditto. (_mm256_permutexvar_pbf16): Ditto. (_mm_cvtneebf16_ps): Change bf16 mode. (_mm256_cvtneebf16_ps): Ditto. (_mm_cvtneobf16_ps): Ditto. (_mm256_cvtneobf16_ps): Ditto. (_mm_mask_cvtneebf16_ps): Ditto. (_mm_maskz_cvtneebf16_ps): Ditto.
(_mm256_mask_cvtneebf16_ps): Ditto. (_mm256_maskz_cvtneebf16_ps): Ditto. (_mm_mask_cvtneobf16_ps): Ditto. (_mm_maskz_cvtneobf16_ps): Ditto. (_mm256_mask_cvtneobf16_ps): Ditto. (_mm256_maskz_cvtneobf16_ps): Ditto. * config/i386/immintrin.h: Add SSE2 dependency for avx512bf16. --- gcc/config/i386/avx512bf16intrin.h | 418 +++++++++++++++++++++++++++ gcc/config/i386/avx512bf16vlintrin.h | 177 ++++++++++++ gcc/config/i386/avx512fp16intrin.h | 70 +++++ gcc/config/i386/immintrin.h | 2 + 4 files changed, 667 insertions(+) diff --git a/gcc/config/i386/avx512bf16intrin.h b/gcc/config/i386/avx512bf16intrin.h index b6e9ddad157..d09a59c1509 100644 --- a/gcc/config/i386/avx512bf16intrin.h +++ b/gcc/config/i386/avx512bf16intrin.h @@ -51,6 +51,424 @@ _mm_cvtsbh_ss (__bfloat16 __A) return __tmp.a; } +extern __inline __m512bf16 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_setzero_pbf16 (void) +{ + return (__m512bf16)(__v32bf) _mm512_setzero_ps (); +} + +extern __inline __m512bf16 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_undefined_pbf16 (void) +{ + __m512bf16 __Y = __Y; + return __Y; +} + +extern __inline __m512bf16 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_set1_pbf16 (__bf16 __h) +{ + return (__m512bf16)(__v32bf) {__h, __h, __h, __h, __h, __h, __h, __h, + __h, __h, __h, __h, __h, __h, __h, __h, + __h, __h, __h, __h, __h, __h, __h, __h, + __h, __h, __h, __h, __h, __h, __h, __h}; +} + +extern __inline __m512bf16 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_set_pbf16 (__bf16 __h1, __bf16 __h2, __bf16 __h3, __bf16 __h4, + __bf16 __h5, __bf16 __h6, __bf16 __h7, __bf16 __h8, + __bf16 __h9, __bf16 __h10, __bf16 __h11, __bf16 __h12, + __bf16 __h13, __bf16 __h14, __bf16 __h15, __bf16 __h16, + __bf16 __h17, __bf16 __h18, __bf16 __h19, __bf16 __h20, + __bf16 __h21, __bf16 __h22, __bf16 __h23, __bf16 __h24, + __bf16 __h25, __bf16 __h26, __bf16 __h27, __bf16 __h28, +
__bf16 __h29, __bf16 __h30, __bf16 __h31, __bf16 __h32) +{ + return + (__m512bf16)(__v32bf) {__h32, __h31, __h30, __h29, __h28, __h27, __h26, + __h25, __h24, __h23, __h22, __h21, __h20, __h19, + __h18, __h17, __h16, __h15, __h14, __h13, __h12, + __h11, __h10, __h9, __h8, __h7, __h6, __h5, + __h4, __h3, __h2, __h1}; +} + +#define _mm512_setr_pbf16(h1, h2, h3, h4, h5, h6, h7, h8, h9, h10, h11, h12, \ + h13, h14, h15, h16, h17, h18, h19, h20, h21, h22, \ + h23, h24, h25, h26, h27, h28, h29, h30, h31, h32) \ + _mm512_set_pbf16 ((h32), (h31), (h30), (h29), (h28), (h27), (h26), (h25), \ + (h24), (h23), (h22), (h21), (h20), (h19), (h18), (h17), \ + (h16), (h15), (h14), (h13), (h12), (h11), (h10), (h9), \ + (h8), (h7), (h6), (h5), (h4), (h3), (h2), (h1)) + +extern __inline __m128 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_castpbf16_ps (__m128bf16 __a) +{ + return (__m128) __a; +} + +extern __inline __m256 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_castpbf16_ps (__m256bf16 __a) +{ + return (__m256) __a; +} + +extern __inline __m512 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_castpbf16_ps (__m512bf16 __a) +{ + return (__m512) __a; +} + +extern __inline __m128d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_castpbf16_pd (__m128bf16 __a) +{ + return (__m128d) __a; +} + +extern __inline __m256d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_castpbf16_pd (__m256bf16 __a) +{ + return (__m256d) __a; +} + +extern __inline __m512d +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_castpbf16_pd (__m512bf16 __a) +{ + return (__m512d) __a; +} + +extern __inline __m128i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_castpbf16_si128 (__m128bf16 __a) +{ + return (__m128i) __a; +} + +extern __inline __m256i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) 
+_mm256_castpbf16_si256 (__m256bf16 __a) +{ + return (__m256i) __a; +} + +extern __inline __m512i +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_castpbf16_si512 (__m512bf16 __a) +{ + return (__m512i) __a; +} + +extern __inline __m128bf16 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_castps_pbf16 (__m128 __a) +{ + return (__m128bf16) __a; +} + +extern __inline __m256bf16 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_castps_pbf16 (__m256 __a) +{ + return (__m256bf16) __a; +} + +extern __inline __m512bf16 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_castps_pbf16 (__m512 __a) +{ + return (__m512bf16) __a; +} + +extern __inline __m128bf16 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_castpd_pbf16 (__m128d __a) +{ + return (__m128bf16) __a; +} + +extern __inline __m256bf16 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_castpd_pbf16 (__m256d __a) +{ + return (__m256bf16) __a; +} + +extern __inline __m512bf16 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_castpd_pbf16 (__m512d __a) +{ + return (__m512bf16) __a; +} + +extern __inline __m128bf16 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_castsi128_pbf16 (__m128i __a) +{ + return (__m128bf16) __a; +} + +extern __inline __m256bf16 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_castsi256_pbf16 (__m256i __a) +{ + return (__m256bf16) __a; +} + +extern __inline __m512bf16 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_castsi512_pbf16 (__m512i __a) +{ + return (__m512bf16) __a; +} + +extern __inline __m128bf16 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_castpbf16256_pbf16128 (__m256bf16 __a) +{ + return __builtin_shufflevector (__a, __a, 0, 1, 2, 3, 4, 5, 6, 7); +} + +extern __inline __m128bf16 
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_castpbf16512_pbf16128 (__m512bf16 __a) +{ + return __builtin_shufflevector (__a, __a, 0, 1, 2, 3, 4, 5, 6, 7); +} + +extern __inline __m256bf16 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_castpbf16512_pbf16256 (__m512bf16 __a) +{ + return __builtin_shufflevector (__a, __a, 0, 1, 2, 3, 4, 5, 6, 7, + 8, 9, 10, 11, 12, 13, 14, 15); +} + +extern __inline __m256bf16 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_castpbf16128_pbf16256 (__m128bf16 __a) +{ + return __builtin_shufflevector (__a, __a, 0, 1, 2, 3, 4, 5, 6, 7, + -1, -1, -1, -1, -1, -1, -1, -1); +} + +extern __inline __m512bf16 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_castpbf16128_pbf16512 (__m128bf16 __a) +{ + return __builtin_shufflevector (__a, __a, 0, 1, 2, 3, 4, 5, 6, 7, -1, -1, + -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, + -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1); +} + +extern __inline __m512bf16 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_castpbf16256_pbf16512 (__m256bf16 __a) +{ + return __builtin_shufflevector (__a, __a, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, + 11, 12, 13, 14, 15, -1, -1, -1, -1, -1, -1, + -1, -1, -1, -1, -1, -1, -1, -1, -1, -1); +} + +extern __inline __m256bf16 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_zextpbf16128_pbf16256 (__m128bf16 __A) +{ + return (__m256bf16) _mm256_insertf128_ps (_mm256_setzero_ps (), + (__m128) __A, 0); +} + +extern __inline __m512bf16 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_zextpbf16128_pbf16512 (__m128bf16 __A) +{ + return (__m512bf16) _mm512_insertf32x4 (_mm512_setzero_ps (), + (__m128) __A, 0); +} + +extern __inline __m512bf16 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_zextpbf16256_pbf16512 (__m256bf16 __A) +{ + return (__m512bf16) 
_mm512_insertf64x4 (_mm512_setzero_pd (), + (__m256d) __A, 0); +} + +extern __inline __m512bf16 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_abs_pbf16 (__m512bf16 __A) +{ + return + (__m512bf16) _mm512_and_epi32 (_mm512_set1_epi32 (0x7FFF7FFF), + (__m512i) __A); +} + +/* Loads with vmovsh if avx512fp16 is enabled.  */ +extern __inline __m512bf16 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_load_pbf16 (void const *__p) +{ + return *(const __m512bf16 *) __p; +} + +extern __inline __m256bf16 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_load_pbf16 (void const *__p) +{ + return *(const __m256bf16 *) __p; +} + +extern __inline __m128bf16 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_load_pbf16 (void const *__p) +{ + return *(const __m128bf16 *) __p; +} + +extern __inline __m512bf16 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_loadu_pbf16 (void const *__p) +{ + struct __loadu_pbf16 + { + __m512bf16_u __v; + } __attribute__((__packed__, __may_alias__)); + return ((const struct __loadu_pbf16 *) __p)->__v; +} + +extern __inline __m256bf16 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_loadu_pbf16 (void const *__p) +{ + struct __loadu_pbf16 + { + __m256bf16_u __v; + } __attribute__((__packed__, __may_alias__)); + return ((const struct __loadu_pbf16 *) __p)->__v; +} + +extern __inline __m128bf16 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_loadu_pbf16 (void const *__p) +{ + struct __loadu_pbf16 + { + __m128bf16_u __v; + } __attribute__((__packed__, __may_alias__)); + return ((const struct __loadu_pbf16 *) __p)->__v; +} + +/* Stores with vmovsh if avx512fp16 is enabled.  */ +extern __inline void +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_store_sbf16 (void *__dp, __m128bf16 __a) +{ + struct __mm_store_sbf16_struct + { + __bf16 __u; + }
__attribute__((__packed__, __may_alias__)); + ((struct __mm_store_sbf16_struct *) __dp)->__u = __a[0]; +} + +extern __inline void +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_store_pbf16 (void *__P, __m512bf16 __A) +{ + *(__m512bf16 *) __P = __A; +} + +extern __inline void +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_store_pbf16 (void *__P, __m256bf16 __A) +{ + *(__m256bf16 *) __P = __A; +} + +extern __inline void +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_store_pbf16 (void *__P, __m128bf16 __A) +{ + *(__m128bf16 *) __P = __A; +} + +extern __inline void +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_storeu_pbf16 (void *__P, __m512bf16 __A) +{ + struct __storeu_pbf16 { + __m512bf16_u __v; + } __attribute__((__packed__, __may_alias__)); + ((struct __storeu_pbf16 *) __P)->__v = __A; +} + +extern __inline void +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_storeu_pbf16 (void *__P, __m256bf16 __A) +{ + struct __storeu_pbf16 + { + __m256bf16_u __v; + } __attribute__((__packed__, __may_alias__)); + ((struct __storeu_pbf16 *) __P)->__v = __A; +} + +extern __inline void +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_storeu_pbf16 (void *__P, __m128bf16 __A) +{ + struct __storeu_pbf16 + { + __m128bf16_u __v; + } __attribute__((__packed__, __may_alias__)); + ((struct __storeu_pbf16 *) __P)->__v = __A; +} + +/* Moves with vmovsh if avx512fp16 is enabled.  */ +extern __inline __m128bf16 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_move_sbf16 (__m128bf16 __a, __m128bf16 __b) +{ + __a[0] = __b[0]; + return __a; +} + +extern __inline __m512bf16 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_mask_blend_pbf16 (__mmask32 __U, __m512bf16 __A, __m512bf16 __W) +{ + return (__m512bf16) __builtin_ia32_movdquhi512_mask ((__v32hi) __W, + (__v32hi) __A, +
(__mmask32) __U); +} + +extern __inline __m512bf16 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_permutex2var_pbf16 (__m512bf16 __A, __m512i __I, __m512bf16 __B) +{ + return (__m512bf16) __builtin_ia32_vpermi2varhi512_mask ((__v32hi) __A, + (__v32hi) __I, + (__v32hi) __B, + (__mmask32)-1); +} + +extern __inline __m512bf16 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm512_permutexvar_pbf16 (__m512i __A, __m512bf16 __B) +{ + return (__m512bf16) __builtin_ia32_permvarhi512_mask ((__v32hi) __B, + (__v32hi) __A, + (__v32hi) + (_mm512_setzero_si512 ()), + (__mmask32)-1); +} + /* vcvtne2ps2bf16 */ extern __inline __m512bh diff --git a/gcc/config/i386/avx512bf16vlintrin.h b/gcc/config/i386/avx512bf16vlintrin.h index 969335ff358..732623a94a2 100644 --- a/gcc/config/i386/avx512bf16vlintrin.h +++ b/gcc/config/i386/avx512bf16vlintrin.h @@ -44,6 +44,183 @@ typedef short __m256bh __attribute__ ((__vector_size__ (32), __may_alias__)); typedef short __m128bh __attribute__ ((__vector_size__ (16), __may_alias__)); typedef unsigned short __bfloat16; + +extern __inline __bf16 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cvtsbf16_bf16 (__m128bf16 __a) +{ + return __a[0]; +} + +extern __inline __bf16 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_cvtsbf16_bf16 (__m256bf16 __a) +{ + return __a[0]; +} + +extern __inline __m256bf16 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_undefined_pbf16 (void) +{ + __m256bf16 __Y = __Y; + return __Y; +} + +extern __inline __m128bf16 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_undefined_pbf16 (void) +{ + __m128bf16 __Y = __Y; + return __Y; +} + +extern __inline __m128bf16 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_setzero_pbf16 (void) +{ + return (__m128bf16)(__v8bf) _mm_setzero_ps (); +} + +extern __inline __m256bf16 +__attribute__ 
((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_setzero_pbf16 (void) +{ + return (__m256bf16)(__v16bf) _mm256_setzero_ps (); +} + +extern __inline __m128bf16 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_set_sbf16 (__bf16 bf) +{ + return (__v8bf) + __builtin_shufflevector ((__v8bf){bf, bf, bf, bf, bf, bf, bf, bf}, + (__v8bf) _mm_setzero_pbf16 (), 0, + 8, 8, 8, 8, 8, 8, 8); +} + +extern __inline __m128bf16 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_set1_pbf16 (__bf16 bf) +{ + return (__m128bf16)(__v8bf) {bf, bf, bf, bf, bf, bf, bf, bf}; +} + +extern __inline __m256bf16 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_set1_pbf16 (__bf16 bf) +{ + return (__m256bf16)(__v16bf) {bf, bf, bf, bf, bf, bf, bf, bf, + bf, bf, bf, bf, bf, bf, bf, bf}; +} + +extern __inline __m128bf16 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_set_pbf16 (__bf16 bf1, __bf16 bf2, __bf16 bf3, __bf16 bf4, + __bf16 bf5, __bf16 bf6, __bf16 bf7, __bf16 bf8) +{ + return (__m128bf16)(__v8bf) {bf8, bf7, bf6, bf5, bf4, bf3, bf2, bf1}; +} + +extern __inline __m256bf16 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_set_pbf16 (__bf16 bf1, __bf16 bf2, __bf16 bf3, __bf16 bf4, + __bf16 bf5, __bf16 bf6, __bf16 bf7, __bf16 bf8, + __bf16 bf9, __bf16 bf10, __bf16 bf11, __bf16 bf12, + __bf16 bf13, __bf16 bf14, __bf16 bf15, __bf16 bf16) +{ + return (__m256bf16)(__v16bf) {bf16, bf15, bf14, bf13, bf12, bf11, bf10, bf9, + bf8, bf7, bf6, bf5, bf4, bf3, + bf2, bf1}; +} + +#define _mm_setr_pbf16(bf1, bf2, bf3, bf4, bf5, bf6, bf7, bf8) \ + _mm_set_pbf16 ((bf8), (bf7), (bf6), (bf5), (bf4), (bf3), (bf2), (bf1)) + +#define _mm256_setr_pbf16(bf1, bf2, bf3, bf4, bf5, bf6, bf7, bf8, bf9, bf10, \ + bf11, bf12, bf13, bf14, bf15, bf16) \ + _mm256_set_pbf16 ((bf16), (bf15), (bf14), (bf13), (bf12), (bf11), (bf10), \ + (bf9), (bf8), (bf7), 
(bf6), (bf5), (bf4), (bf3), (bf2), \
(bf1)) + +extern __inline __m256bf16 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_abs_pbf16 (__m256bf16 __A) +{ + return (__m256bf16) _mm256_and_si256 (_mm256_set1_epi32 (0x7FFF7FFF), + (__m256i)__A); +} + +extern __inline __m128bf16 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_abs_pbf16 (__m128bf16 __A) +{ + return (__m128bf16) _mm_and_si128 (_mm_set1_epi32 (0x7FFF7FFF), + (__m128i)__A); +} + +extern __inline __m128bf16 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mask_blend_pbf16 (__mmask8 __U, __m128bf16 __A, __m128bf16 __W) +{ + return (__m128bf16) + __builtin_ia32_movdquhi128_mask ((__v8hi) __W, + (__v8hi) __A, + (__mmask8) __U); +} + +extern __inline __m256bf16 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_mask_blend_pbf16 (__mmask16 __U, __m256bf16 __A, __m256bf16 __W) +{ + return (__m256bf16) + __builtin_ia32_movdquhi256_mask ((__v16hi) __W, + (__v16hi) __A, + (__mmask16) __U); +} + +extern __inline __m128bf16 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_permutex2var_pbf16 (__m128bf16 __A, __m128i __I, __m128bf16 __B) +{ + return (__m128bf16) + __builtin_ia32_vpermi2varhi128_mask ((__v8hi) __A, + (__v8hi) __I, + (__v8hi) __B, + (__mmask8) -1); +} + +extern __inline __m256bf16 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_permutex2var_pbf16 (__m256bf16 __A, __m256i __I, __m256bf16 __B) +{ + return (__m256bf16) __builtin_ia32_vpermi2varhi256_mask ((__v16hi) __A, + (__v16hi) __I, + (__v16hi) __B, + (__mmask16)-1); +} + +extern __inline __m128bf16 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_permutexvar_pbf16 (__m128i __A, __m128bf16 __B) +{ + return (__m128bf16) __builtin_ia32_permvarhi128_mask ((__v8hi) __B, + (__v8hi) __A, + (__v8hi) + (_mm_setzero_si128 ()), + (__mmask8) -1); +} + +extern __inline __m256bf16 +__attribute__ ((__gnu_inline__, 
__always_inline__, __artificial__)) +_mm256_permutexvar_pbf16 (__m256i __A, __m256bf16 __B) +{ + return (__m256bf16) __builtin_ia32_permvarhi256_mask ((__v16hi) __B, + (__v16hi) __A, + (__v16hi) + (_mm256_setzero_si256 ()), + (__mmask16) -1); +} /* vcvtne2ps2bf16 */ extern __inline __m256bh diff --git a/gcc/config/i386/avx512fp16intrin.h b/gcc/config/i386/avx512fp16intrin.h index 75f7475ad18..82b814abde2 100644 --- a/gcc/config/i386/avx512fp16intrin.h +++ b/gcc/config/i386/avx512fp16intrin.h @@ -53,6 +53,18 @@ typedef _Float16 __m256h_u __attribute__ ((__vector_size__ (32), \ typedef _Float16 __m512h_u __attribute__ ((__vector_size__ (64), \ __may_alias__, __aligned__ (1))); + +/* Internal data types for implementing the bf16 intrinsics. */ +typedef __bf16 __v32bf __attribute__((__vector_size__(64), __aligned__(64))); +typedef __bf16 __m512bf16 __attribute__((__vector_size__(64), __aligned__(64))); +typedef __bf16 __m512bf16_u __attribute__((__vector_size__(64), __aligned__(1))); +typedef __bf16 __v8bf __attribute__((__vector_size__(16), __aligned__(16))); +typedef __bf16 __m128bf16 __attribute__((__vector_size__(16), __aligned__(16))); +typedef __bf16 __m128bf16_u __attribute__((__vector_size__(16), __aligned__(1))); +typedef __bf16 __v16bf __attribute__((__vector_size__(32), __aligned__(32))); +typedef __bf16 __m256bf16 __attribute__((__vector_size__(32), __aligned__(32))); +typedef __bf16 __m256bf16_u __attribute__((__vector_size__(32), __aligned__(1))); + extern __inline __m128h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) _mm_set_ph (_Float16 __A7, _Float16 __A6, _Float16 __A5, @@ -2771,6 +2783,44 @@ _mm_mask_store_sh (_Float16 const* __A, __mmask8 __B, __m128h __C) __builtin_ia32_storesh_mask (__A, __C, __B); } +extern __inline __m128bf16 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_load_sbf16 (void const *__dp) +{ + return (__m128bf16) + __builtin_ia32_loadsh_mask ((_Float16 const*) __dp, + 
_mm_setzero_ph(), + (__mmask8) -1); +} + +extern __inline __m128bf16 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mask_load_sbf16 (__m128bf16 __A, __mmask8 __B, const void *__C) +{ + return (__m128bf16) + __builtin_ia32_loadsh_mask ((_Float16 const*) __C, + (__v8hf) __A, + (__mmask8) __B); +} + +extern __inline __m128bf16 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_maskz_load_sbf16 (__mmask8 __A, const void *__B) +{ + return (__m128bf16) + __builtin_ia32_loadsh_mask ((_Float16 const*) __B, + _mm_setzero_ph(), + (__mmask8) __A); +} + +extern __inline void +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mask_store_sbf16 (const void *__A, __mmask8 __B, __m128bf16 __C) +{ + __builtin_ia32_storesh_mask ((_Float16 const*) __A, + (__v8hf) __C, (__mmask8) __B); +} + extern __inline __m128h __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) _mm_move_sh (__m128h __A, __m128h __B) @@ -2793,6 +2843,26 @@ _mm_maskz_move_sh (__mmask8 __A, __m128h __B, __m128h __C) return __builtin_ia32_vmovsh_mask (__B, __C, _mm_setzero_ph (), __A); } +extern __inline __m128bf16 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_mask_move_sbf16 (__m128bf16 __A, __mmask8 __B, + __m128bf16 __C, __m128bf16 __D) +{ + return (__m128bf16) + __builtin_ia32_vmovsh_mask ((__v8hf) __C, (__v8hf) __D, + (__v8hf) __A, (__mmask8) __B); +} + +extern __inline __m128bf16 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_maskz_move_sbf16 (__mmask8 __A, __m128bf16 __B, __m128bf16 __C) +{ + return (__m128bf16) + __builtin_ia32_vmovsh_mask ((__v8hf) __B, (__v8hf) __C, + _mm_setzero_ph(), + (__mmask8) __A); +} + /* Intrinsics vcvtph2dq. 
*/ extern __inline __m512i __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) diff --git a/gcc/config/i386/immintrin.h b/gcc/config/i386/immintrin.h index ddea249d09b..c62d50f1951 100644 --- a/gcc/config/i386/immintrin.h +++ b/gcc/config/i386/immintrin.h @@ -118,9 +118,11 @@ #include +#ifdef __SSE2__ #include #include +#endif #include From patchwork Fri Oct 14 07:54:43 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Jiang, Haochen" X-Patchwork-Id: 2559 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a5d:4ac7:0:0:0:0:0 with SMTP id y7csp54741wrs; Fri, 14 Oct 2022 00:59:11 -0700 (PDT) X-Google-Smtp-Source: AMsMyM6QgxDTkXxgveLDxljbxBJOSi5K/HrhQ5HMes1b9/MInvb7sD7XiWsZO7DdaFsNtMgt4CJw X-Received: by 2002:a17:907:6e87:b0:782:2d55:f996 with SMTP id sh7-20020a1709076e8700b007822d55f996mr2674913ejc.502.1665734351738; Fri, 14 Oct 2022 00:59:11 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1665734351; cv=none; d=google.com; s=arc-20160816; b=LeXPKW8NOEMsNceGMm7t04IKBMoZY4+0ec593VC5Cn+f5KRBIpodUtWxVwARUb++1g SfqOxDHJO7qkMGdlvKhd88XNDsIYdqhRVved2fTbAokshpk9g4erouop2fMRqj4D6fEM leZ3ZH3Eu0IuP0sIxkAJHit+6w0MYTTAje1iZoIooQ93Y98ZsMXKGFNbQZ7NjTAFlslS um6g6bMQxsENIpgGQjp287XgxxCTeaw/kuUpvHXBUGjmAnvtOF2ZIeNNUMoB4WqpHgnk TmgfNeO1fVocxe7XEFb0IRvrjc2vDE2wyO6fjOm6+6lCggkG0jpCzNPj7iiy7bfpUMuh t/Cg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=sender:errors-to:cc:reply-to:from:list-subscribe:list-help :list-post:list-archive:list-unsubscribe:list-id:precedence :references:in-reply-to:message-id:date:subject:to:dmarc-filter :delivered-to:dkim-signature:dkim-filter; bh=D6kAHsgkX5iK00PjWFMXypzw5EeqSwDxpGW0mDrY66o=; b=PNJ/noL2IwbO/wBwLUBggoWm38fVFsc1EXGckpG/VULURxeT5+QjW+cMtquKcz5LbP +wik8U/hY1/WtoNvA4bCRycLWiReAo2AO/Z9GcwgzgaTiwEdUqVA+zj4D55B/G2jwVrU 9Fwu4EJeB8logAmxQgCWMIaruIVB+E5x+YhlG+OBYKtIbSo3Zn/eAad2lCAYSxUatDi7 
nujdn1o4KJsEaMJgg2Hc9ZiCJ0U/kbIJbsT8RQqtBlIL3rUTZUgG5dw35+zDlPkjDVbM x7uilTF0XMMQiNLepF3RRq6MR+4qECecdK3PjoEUOzNowIWfP6ZNBhqfAZml/C9mZdy9 CFrw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@gcc.gnu.org header.s=default header.b=jaD8RssR; spf=pass (google.com: domain of gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org designates 8.43.85.97 as permitted sender) smtp.mailfrom="gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=gnu.org Received: from sourceware.org (ip-8-43-85-97.sourceware.org. [8.43.85.97]) by mx.google.com with ESMTPS id m3-20020a509983000000b0045c3f6ad4b0si1578543edb.484.2022.10.14.00.59.11 for (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 14 Oct 2022 00:59:11 -0700 (PDT) Received-SPF: pass (google.com: domain of gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org designates 8.43.85.97 as permitted sender) client-ip=8.43.85.97; Authentication-Results: mx.google.com; dkim=pass header.i=@gcc.gnu.org header.s=default header.b=jaD8RssR; spf=pass (google.com: domain of gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org designates 8.43.85.97 as permitted sender) smtp.mailfrom="gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=gnu.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 7AB753884523 for ; Fri, 14 Oct 2022 07:57:45 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 7AB753884523 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org; s=default; t=1665734265; bh=D6kAHsgkX5iK00PjWFMXypzw5EeqSwDxpGW0mDrY66o=; h=To:Subject:Date:In-Reply-To:References:List-Id:List-Unsubscribe: List-Archive:List-Post:List-Help:List-Subscribe:From:Reply-To:Cc: From; b=jaD8RssRLRfDZ8LmvQbiTdqie++SzvEMQRpUaCcqavsckBmuhMs29DTYu9EVwLyEq Q2XBuhnRONLhv5gWurB8TYytkBRNrL9b7DVB0aF9scp1vxElQQy4eyLRS3YTSIse2c 
2Fhrov0VhngaEpfWtpnmFFM37ICJ5kDFaXHeb8DI= X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from mga18.intel.com (mga18.intel.com [134.134.136.126]) by sourceware.org (Postfix) with ESMTPS id E0233385741B for ; Fri, 14 Oct 2022 07:55:03 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org E0233385741B X-IronPort-AV: E=McAfee;i="6500,9779,10499"; a="288597869" X-IronPort-AV: E=Sophos;i="5.95,182,1661842800"; d="scan'208";a="288597869" Received: from orsmga002.jf.intel.com ([10.7.209.21]) by orsmga106.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 14 Oct 2022 00:54:57 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6500,9779,10499"; a="627488393" X-IronPort-AV: E=Sophos;i="5.95,182,1661842800"; d="scan'208";a="627488393" Received: from shvmail03.sh.intel.com ([10.239.245.20]) by orsmga002.jf.intel.com with ESMTP; 14 Oct 2022 00:54:48 -0700 Received: from shliclel320.sh.intel.com (shliclel320.sh.intel.com [10.239.240.127]) by shvmail03.sh.intel.com (Postfix) with ESMTP id ED54E1009C8F; Fri, 14 Oct 2022 15:54:47 +0800 (CST) To: gcc-patches@gcc.gnu.org Subject: [PATCH 4/6] Support Intel AVX-NE-CONVERT Date: Fri, 14 Oct 2022 15:54:43 +0800 Message-Id: <20221014075445.7938-5-haochen.jiang@intel.com> X-Mailer: git-send-email 2.18.1 In-Reply-To: <20221014075445.7938-1-haochen.jiang@intel.com> References: <20221014075445.7938-1-haochen.jiang@intel.com> X-Spam-Status: No, score=-12.0 required=5.0 tests=BAYES_00, DKIMWL_WL_HIGH, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, KAM_SHORT, SPF_HELO_NONE, SPF_NONE, TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-Patchwork-Original-From: Haochen Jiang via Gcc-patches From: "Jiang, 
Haochen" Reply-To: Haochen Jiang Cc: hongtao.liu@intel.com Errors-To: gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org Sender: "Gcc-patches" X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1746649063240626115?= X-GMAIL-MSGID: =?utf-8?q?1746649063240626115?= From: Kong Lingling gcc/ChangeLog: * common/config/i386/i386-common.cc (OPTION_MASK_ISA2_AVXNECONVERT_SET, OPTION_MASK_ISA2_AVXNECONVERT_UNSET): New. (ix86_handle_option): Handle -mavxneconvert, unset avxneconvert when avx2 is disabled. * common/config/i386/i386-cpuinfo.h (processor_features): Add FEATURE_AVXNECONVERT. * common/config/i386/i386-isas.h: Add ISA_NAMES_TABLE_ENTRY for avxneconvert. * common/config/i386/cpuinfo.h (get_available_features): Detect avxneconvert. * config.gcc: Add avxneconvertintrin.h. * config/i386/avxneconvertintrin.h: New. * config/i386/cpuid.h (bit_AVXNECONVERT): New. * config/i386/i386-builtin-types.def: Add DEF_POINTER_TYPE (PCV8HF, V8HF, CONST), DEF_POINTER_TYPE (PCV16HF, V16HF, CONST), DEF_FUNCTION_TYPE (V4SF, PCSHORT), DEF_FUNCTION_TYPE (V8SF, PCSHORT), DEF_FUNCTION_TYPE (V4SF, PCV8HF), DEF_FUNCTION_TYPE (V4SF, PCV8BF), DEF_FUNCTION_TYPE (V8SF, PCV16HF), DEF_FUNCTION_TYPE (V8SF, PCV16BF). * config/i386/i386-builtin.def: Add new builtins. * config/i386/i386-c.cc (ix86_target_macros_internal): Define __AVXNECONVERT__. * config/i386/i386-expand.cc (ix86_expand_special_args_builtin): Handle V4SF_FTYPE_PCSHORT, V8SF_FTYPE_PCSHORT, V4SF_FTYPE_PCV8BF, V4SF_FTYPE_PCV8HF, V8SF_FTYPE_PCV16BF, V8SF_FTYPE_PCV16HF. * config/i386/i386-isa.def: Add DEF_PTA(AVXNECONVERT). * config/i386/i386-options.cc (isa2_opts): Add -mavxneconvert. (ix86_valid_target_attribute_inner_p): Handle avxneconvert. * config/i386/i386.opt: Add option -mavxneconvert. * config/i386/immintrin.h: Include avxneconvertintrin.h. * config/i386/sse.md (avx_vbcstne2ps_), (avx_vcvtne2ps_), (avx_vcvtne2ps_), (avx_vcvtneps2bf16_): New define_insn. (avx512f_cvtneps2bf16_): Ditto.
(avx512f_cvtneps2bf16__mask): Ditto. * doc/invoke.texi: Document -mavxneconvert. * doc/extend.texi: Document avxneconvert. * doc/sourcebuild.texi: Document target avxneconvert. gcc/testsuite/ChangeLog: * gcc.target/i386/avx-check.h: Add avxneconvert check. * gcc.target/i386/funcspec-56.inc: Add new target attribute. * gcc.target/i386/sse-12.c: Add -mavxneconvert. * gcc.target/i386/sse-13.c: Ditto. * gcc.target/i386/sse-14.c: Ditto. * gcc.target/i386/sse-22.c: Ditto. * gcc.target/i386/sse-23.c: Ditto. * g++.dg/other/i386-2.C: Ditto. * g++.dg/other/i386-3.C: Ditto. * lib/target-supports.exp: Add check_effective_target_avxneconvert. * gcc.target/i386/avx-ne-convert-1.c: New test. * gcc.target/i386/avx-ne-convert-vbcstnebf162ps-2.c: Ditto. * gcc.target/i386/avx-ne-convert-vbcstnesh2ps-2.c: Ditto. * gcc.target/i386/avx-ne-convert-vcvtneebf162ps-2.c: Ditto. * gcc.target/i386/avx-ne-convert-vcvtneeph2ps-2.c: Ditto. * gcc.target/i386/avx-ne-convert-vcvtneobf162ps-2.c: Ditto. * gcc.target/i386/avx-ne-convert-vcvtneoph2ps-2.c: Ditto. * gcc.target/i386/avx-ne-convert-vcvtneps2bf16-2.c: Ditto. * gcc.target/i386/avx512bf16vl-vcvtneps2bf16-1.c: Rename. * gcc.target/i386/avx512bf16vl-vcvtneps2bf16-1a.c: To this. * gcc.target/i386/avx512bf16vl-vcvtneps2bf16-1b.c: New test.
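For reviewers, the per-lane behaviour of the new BF16 instructions can be modelled in portable C. The sketch below is a hypothetical scalar model. The `model_*` function names are made up and are not part of this patch or of any GCC API.

- The vcvtneps2bf16 model follows the Intel SDM's `convert_fp32_to_bfloat16` pseudocode as recalled here: round to nearest even, denormal inputs flushed to a signed zero, NaNs quieted, and MXCSR not consulted.
- vcvtneebf162ps, vcvtneobf162ps and vbcstnebf162ps are modelled as exact BF16-to-FP32 widening of, respectively, the even-indexed elements, the odd-indexed elements, or one broadcast element.

Please check these details against the SDM before relying on them.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of one lane of VCVTNEPS2BF16: FP32 -> BF16,
   round to nearest even, MXCSR not consulted (assumed from the SDM).  */
static uint16_t
model_cvtneps2bf16 (float f)
{
  uint32_t x;
  memcpy (&x, &f, sizeof x);            /* Reinterpret the FP32 bits.  */

  uint32_t exp = (x >> 23) & 0xFF;
  uint32_t man = x & 0x7FFFFF;

  if (exp == 0)                         /* Zero or denormal: signed zero.  */
    return (uint16_t) ((x >> 16) & 0x8000);
  if (exp == 0xFF)
    {
      if (man == 0)                     /* Infinity: plain truncation.  */
        return (uint16_t) (x >> 16);
      return (uint16_t) ((x >> 16) | 0x40);   /* NaN: set the quiet bit.  */
    }

  /* Normal number: add 0x7FFF plus the lowest kept bit, then truncate.
     This rounds to nearest, with ties going to the even result.  */
  x += 0x7FFF + ((x >> 16) & 1);
  return (uint16_t) (x >> 16);
}

/* BF16 -> FP32 is exact: the BF16 bits become the upper half.  */
static float
model_bf16_to_fp32 (uint16_t h)
{
  uint32_t x = (uint32_t) h << 16;
  float f;
  memcpy (&f, &x, sizeof f);
  return f;
}

/* Hypothetical model of VCVTNEEBF162PS (odd == 0) and VCVTNEOBF162PS
   (odd != 0): from N packed BF16 values (N = 8 or 16), widen the
   elements at even (or odd) indices into N / 2 FP32 results.  */
static void
model_cvtne_eo_bf16_ps (const uint16_t *src, size_t n, int odd, float *dst)
{
  for (size_t i = 0; i < n / 2; i++)
    dst[i] = model_bf16_to_fp32 (src[2 * i + (odd ? 1 : 0)]);
}

/* Hypothetical model of VBCSTNEBF162PS: widen one BF16 value loaded
   from memory and broadcast it to all M FP32 lanes (M = 4 or 8).  */
static void
model_bcstnebf16_ps (const uint16_t *src, size_t m, float *dst)
{
  float v = model_bf16_to_fp32 (*src);
  for (size_t i = 0; i < m; i++)
    dst[i] = v;
}
```

Widening BF16 to FP32 needs no rounding logic because a BF16 value is exactly the upper 16 bits of an FP32 value. Only the FP32-to-BF16 direction (vcvtneps2bf16) has to choose a rounding rule.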
--- gcc/common/config/i386/cpuinfo.h | 2 + gcc/common/config/i386/i386-common.cc | 21 ++- gcc/common/config/i386/i386-cpuinfo.h | 1 + gcc/common/config/i386/i386-isas.h | 2 + gcc/config.gcc | 2 +- gcc/config/i386/avxneconvertintrin.h | 140 ++++++++++++++++++ gcc/config/i386/cpuid.h | 1 + gcc/config/i386/i386-builtin-types.def | 17 +++ gcc/config/i386/i386-builtin.def | 18 +++ gcc/config/i386/i386-c.cc | 2 + gcc/config/i386/i386-expand.cc | 8 + gcc/config/i386/i386-isa.def | 1 + gcc/config/i386/i386-options.cc | 4 +- gcc/config/i386/i386.opt | 5 + gcc/config/i386/immintrin.h | 4 + gcc/config/i386/sse.md | 100 ++++++++++++- gcc/doc/extend.texi | 5 + gcc/doc/invoke.texi | 9 +- gcc/doc/sourcebuild.texi | 3 + gcc/testsuite/g++.dg/other/i386-2.C | 2 +- gcc/testsuite/g++.dg/other/i386-3.C | 2 +- gcc/testsuite/gcc.target/i386/avx-check.h | 3 + .../gcc.target/i386/avx-ne-convert-1.c | 45 ++++++ .../i386/avx-ne-convert-vbcstnebf162ps-2.c | 54 +++++++ .../i386/avx-ne-convert-vbcstnesh2ps-2.c | 42 ++++++ .../i386/avx-ne-convert-vcvtneebf162ps-2.c | 73 +++++++++ .../i386/avx-ne-convert-vcvtneeph2ps-2.c | 66 +++++++++ .../i386/avx-ne-convert-vcvtneobf162ps-2.c | 75 ++++++++++ .../i386/avx-ne-convert-vcvtneoph2ps-2.c | 66 +++++++++ .../i386/avx-ne-convert-vcvtneps2bf16-2.c | 58 ++++++++ ...16-1.c => avx512bf16vl-vcvtneps2bf16-1a.c} | 0 .../i386/avx512bf16vl-vcvtneps2bf16-1b.c | 27 ++++ gcc/testsuite/gcc.target/i386/funcspec-56.inc | 2 + gcc/testsuite/gcc.target/i386/sse-12.c | 2 +- gcc/testsuite/gcc.target/i386/sse-13.c | 2 +- gcc/testsuite/gcc.target/i386/sse-14.c | 2 +- gcc/testsuite/gcc.target/i386/sse-22.c | 4 +- gcc/testsuite/gcc.target/i386/sse-23.c | 2 +- gcc/testsuite/lib/target-supports.exp | 12 ++ 39 files changed, 868 insertions(+), 16 deletions(-) create mode 100644 gcc/config/i386/avxneconvertintrin.h create mode 100644 gcc/testsuite/gcc.target/i386/avx-ne-convert-1.c create mode 100644 gcc/testsuite/gcc.target/i386/avx-ne-convert-vbcstnebf162ps-2.c create mode 
100644 gcc/testsuite/gcc.target/i386/avx-ne-convert-vbcstnesh2ps-2.c create mode 100644 gcc/testsuite/gcc.target/i386/avx-ne-convert-vcvtneebf162ps-2.c create mode 100644 gcc/testsuite/gcc.target/i386/avx-ne-convert-vcvtneeph2ps-2.c create mode 100644 gcc/testsuite/gcc.target/i386/avx-ne-convert-vcvtneobf162ps-2.c create mode 100644 gcc/testsuite/gcc.target/i386/avx-ne-convert-vcvtneoph2ps-2.c create mode 100644 gcc/testsuite/gcc.target/i386/avx-ne-convert-vcvtneps2bf16-2.c rename gcc/testsuite/gcc.target/i386/{avx512bf16vl-vcvtneps2bf16-1.c => avx512bf16vl-vcvtneps2bf16-1a.c} (100%) create mode 100644 gcc/testsuite/gcc.target/i386/avx512bf16vl-vcvtneps2bf16-1b.c diff --git a/gcc/common/config/i386/cpuinfo.h b/gcc/common/config/i386/cpuinfo.h index bed88003f8e..e9fd586704d 100644 --- a/gcc/common/config/i386/cpuinfo.h +++ b/gcc/common/config/i386/cpuinfo.h @@ -797,6 +797,8 @@ get_available_features (struct __processor_model *cpu_model, set_feature (FEATURE_AVXIFMA); if (edx & bit_AVXVNNIINT8) set_feature (FEATURE_AVXVNNIINT8); + if (edx & bit_AVXNECONVERT) + set_feature (FEATURE_AVXNECONVERT); } if (avx512_usable) { diff --git a/gcc/common/config/i386/i386-common.cc b/gcc/common/config/i386/i386-common.cc index 6a2a7e3d25a..f9c906f75cf 100644 --- a/gcc/common/config/i386/i386-common.cc +++ b/gcc/common/config/i386/i386-common.cc @@ -109,6 +109,7 @@ along with GCC; see the file COPYING3. If not see #define OPTION_MASK_ISA2_AMX_INT8_SET OPTION_MASK_ISA2_AMX_INT8 #define OPTION_MASK_ISA2_AMX_BF16_SET OPTION_MASK_ISA2_AMX_BF16 #define OPTION_MASK_ISA2_AVXVNNIINT8_SET OPTION_MASK_ISA2_AVXVNNIINT8 +#define OPTION_MASK_ISA2_AVXNECONVERT_SET OPTION_MASK_ISA2_AVXNECONVERT /* SSE4 includes both SSE4.1 and SSE4.2. -msse4 should be the same as -msse4.2. */ @@ -215,7 +216,8 @@ along with GCC; see the file COPYING3. 
If not see (OPTION_MASK_ISA_AVX2 | OPTION_MASK_ISA_AVX512F_UNSET) #define OPTION_MASK_ISA2_AVX2_UNSET \ (OPTION_MASK_ISA2_AVXIFMA_UNSET | OPTION_MASK_ISA2_AVXVNNI_UNSET \ - | OPTION_MASK_ISA2_AVXVNNIINT8_UNSET | OPTION_MASK_ISA2_AVX512F_UNSET) + | OPTION_MASK_ISA2_AVXVNNIINT8_UNSET | OPTION_MASK_ISA2_AVXNECONVERT_UNSET \ + | OPTION_MASK_ISA2_AVX512F_UNSET) #define OPTION_MASK_ISA_AVX512F_UNSET \ (OPTION_MASK_ISA_AVX512F | OPTION_MASK_ISA_AVX512CD_UNSET \ | OPTION_MASK_ISA_AVX512PF_UNSET | OPTION_MASK_ISA_AVX512ER_UNSET \ @@ -280,6 +282,7 @@ along with GCC; see the file COPYING3. If not see (OPTION_MASK_ISA2_KL | OPTION_MASK_ISA2_WIDEKL_UNSET) #define OPTION_MASK_ISA2_WIDEKL_UNSET OPTION_MASK_ISA2_WIDEKL #define OPTION_MASK_ISA2_AVXVNNIINT8_UNSET OPTION_MASK_ISA2_AVXVNNIINT8 +#define OPTION_MASK_ISA2_AVXNECONVERT_UNSET OPTION_MASK_ISA2_AVXNECONVERT /* SSE4 includes both SSE4.1 and SSE4.2. -mno-sse4 should the same as -mno-sse4.1. */ @@ -1162,6 +1165,22 @@ ix86_handle_option (struct gcc_options *opts, } return true; + case OPT_mavxneconvert: + if (value) + { + opts->x_ix86_isa_flags2 |= OPTION_MASK_ISA2_AVXNECONVERT_SET; + opts->x_ix86_isa_flags2_explicit |= OPTION_MASK_ISA2_AVXNECONVERT_SET; + opts->x_ix86_isa_flags |= OPTION_MASK_ISA_AVX2_SET; + opts->x_ix86_isa_flags_explicit |= OPTION_MASK_ISA_AVX2_SET; + } + else + { + opts->x_ix86_isa_flags2 &= ~OPTION_MASK_ISA2_AVXNECONVERT_UNSET; + opts->x_ix86_isa_flags2_explicit + |= OPTION_MASK_ISA2_AVXNECONVERT_UNSET; + } + return true; + case OPT_mfma: if (value) { diff --git a/gcc/common/config/i386/i386-cpuinfo.h b/gcc/common/config/i386/i386-cpuinfo.h index 9a6b92fab79..2d3fbfc817a 100644 --- a/gcc/common/config/i386/i386-cpuinfo.h +++ b/gcc/common/config/i386/i386-cpuinfo.h @@ -242,6 +242,7 @@ enum processor_features FEATURE_X86_64_V4, FEATURE_AVXIFMA, FEATURE_AVXVNNIINT8, + FEATURE_AVXNECONVERT, CPU_FEATURE_MAX }; diff --git a/gcc/common/config/i386/i386-isas.h b/gcc/common/config/i386/i386-isas.h index 
8c1f351056c..bceaee589ee 100644 --- a/gcc/common/config/i386/i386-isas.h +++ b/gcc/common/config/i386/i386-isas.h @@ -178,4 +178,6 @@ ISA_NAMES_TABLE_START ISA_NAMES_TABLE_ENTRY("avxifma", FEATURE_AVXIFMA, P_NONE, "-mavxifma") ISA_NAMES_TABLE_ENTRY("avxvnniint8", FEATURE_AVXVNNIINT8, P_NONE, "-mavxvnniint8") + ISA_NAMES_TABLE_ENTRY("avxneconvert", FEATURE_AVXNECONVERT, + P_NONE, "-mavxneconvert") ISA_NAMES_TABLE_END diff --git a/gcc/config.gcc b/gcc/config.gcc index 4df78238910..840b62aee61 100644 --- a/gcc/config.gcc +++ b/gcc/config.gcc @@ -422,7 +422,7 @@ i[34567]86-*-* | x86_64-*-*) amxbf16intrin.h x86gprintrin.h uintrintrin.h hresetintrin.h keylockerintrin.h avxvnniintrin.h mwaitintrin.h avx512fp16intrin.h avx512fp16vlintrin.h - avxifmaintrin.h avxvnniint8intrin.h" + avxifmaintrin.h avxvnniint8intrin.h avxneconvertintrin.h" ;; ia64-*-*) extra_headers=ia64intrin.h diff --git a/gcc/config/i386/avxneconvertintrin.h b/gcc/config/i386/avxneconvertintrin.h new file mode 100644 index 00000000000..30199384725 --- /dev/null +++ b/gcc/config/i386/avxneconvertintrin.h @@ -0,0 +1,140 @@ +/* Copyright (C) 2021 Free Software Foundation, Inc. + + This file is part of GCC. + + GCC is free software; you can redistribute it and/or modify + it under the terms of the GNU General Public License as published by + the Free Software Foundation; either version 3, or (at your option) + any later version. + + GCC is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + Under Section 7 of GPL version 3, you are granted additional + permissions described in the GCC Runtime Library Exception, version + 3.1, as published by the Free Software Foundation. 
+ + You should have received a copy of the GNU General Public License and + a copy of the GCC Runtime Library Exception along with this program; + see the files COPYING3 and COPYING.RUNTIME respectively. If not, see + . */ + +#ifndef _IMMINTRIN_H_INCLUDED +#error "Never use directly; include instead." +#endif + +#ifndef _AVXNECONVERTINTRIN_H_INCLUDED +#define _AVXNECONVERTINTRIN_H_INCLUDED + +#ifndef __AVXNECONVERT__ +#pragma GCC push_options +#pragma GCC target ("avxneconvert") +#define __DISABLE_AVXNECONVERT__ +#endif /* __AVXNECONVERT__ */ + +extern __inline __m128 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_bcstnebf16_ps (const void *__P) +{ + return (__m128) __builtin_ia32_vbcstnebf162ps128 ((const short *) __P); +} + +extern __inline __m256 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_bcstnebf16_ps (const void *__P) +{ + return (__m256) __builtin_ia32_vbcstnebf162ps256 ((const short *) __P); +} + +extern __inline __m128 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_bcstnesh_ps (const void *__P) +{ + return (__m128) __builtin_ia32_vbcstnesh2ps128 ((const short *) __P); +} + +extern __inline __m256 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_bcstnesh_ps (const void *__P) +{ + return (__m256) __builtin_ia32_vbcstnesh2ps256 ((const short *) __P); +} + +extern __inline __m128 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cvtneebf16_ps (const __m128bf16 *__A) +{ + return (__m128) __builtin_ia32_vcvtneebf162ps128 ((const __v8bf *) __A); +} + +extern __inline __m256 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_cvtneebf16_ps (const __m256bf16 *__A) +{ + return (__m256) __builtin_ia32_vcvtneebf162ps256 ((const __v16bf *) __A); +} + +extern __inline __m128 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cvtneeph_ps (const __m128h *__A) +{ + return (__m128) 
__builtin_ia32_vcvtneeph2ps128 ((const __v8hf *) __A); +} + +extern __inline __m256 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_cvtneeph_ps (const __m256h *__A) +{ + return (__m256) __builtin_ia32_vcvtneeph2ps256 ((const __v16hf *) __A); +} + +extern __inline __m128 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cvtneobf16_ps (const __m128bf16 *__A) +{ + return (__m128) __builtin_ia32_vcvtneobf162ps128 ((const __v8bf *) __A); +} + +extern __inline __m256 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_cvtneobf16_ps (const __m256bf16 *__A) +{ + return (__m256) __builtin_ia32_vcvtneobf162ps256 ((const __v16bf *) __A); +} + +extern __inline __m128 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cvtneoph_ps (const __m128h *__A) +{ + return (__m128) __builtin_ia32_vcvtneoph2ps128 ((const __v8hf *) __A); +} + +extern __inline __m256 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_cvtneoph_ps (const __m256h *__A) +{ + return (__m256) __builtin_ia32_vcvtneoph2ps256 ((const __v16hf *) __A); +} + +extern __inline __m128bf16 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm_cvtneps_avx_pbh (__m128 __A) +{ + return (__m128bf16) __builtin_ia32_vcvtneps2bf16128 (__A); +} + +extern __inline __m128bf16 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_mm256_cvtneps_avx_pbh (__m256 __A) +{ + return (__m128bf16) __builtin_ia32_vcvtneps2bf16256 (__A); +} + +#ifdef __DISABLE_AVXNECONVERT__ +#undef __DISABLE_AVXNECONVERT__ +#pragma GCC pop_options +#endif /* __DISABLE_AVXNECONVERT__ */ + +#endif /* _AVXNECONVERTINTRIN_H_INCLUDED */ diff --git a/gcc/config/i386/cpuid.h b/gcc/config/i386/cpuid.h index f5fad22149a..18bbc0cb7be 100644 --- a/gcc/config/i386/cpuid.h +++ b/gcc/config/i386/cpuid.h @@ -50,6 +50,7 @@ /* %edx */ #define bit_AVXVNNIINT8 (1 << 4) +#define bit_AVXNECONVERT (1 << 5) #define 
bit_CMPXCHG8B (1 << 8) #define bit_CMOV (1 << 15) #define bit_MMX (1 << 23) diff --git a/gcc/config/i386/i386-builtin-types.def b/gcc/config/i386/i386-builtin-types.def index 63a360b0f8b..ebf6e5b4ad8 100644 --- a/gcc/config/i386/i386-builtin-types.def +++ b/gcc/config/i386/i386-builtin-types.def @@ -87,6 +87,7 @@ DEF_VECTOR_TYPE (V8QI, QI) DEF_VECTOR_TYPE (V2DF, DOUBLE) DEF_VECTOR_TYPE (V4SF, FLOAT) DEF_VECTOR_TYPE (V8HF, FLOAT16) +DEF_VECTOR_TYPE (V8BF, BFLOAT16) DEF_VECTOR_TYPE (V2DI, DI) DEF_VECTOR_TYPE (V4SI, SI) DEF_VECTOR_TYPE (V8HI, HI) @@ -100,6 +101,7 @@ DEF_VECTOR_TYPE (V16UQI, UQI, V16QI) DEF_VECTOR_TYPE (V4DF, DOUBLE) DEF_VECTOR_TYPE (V8SF, FLOAT) DEF_VECTOR_TYPE (V16HF, FLOAT16) +DEF_VECTOR_TYPE (V16BF, BFLOAT16) DEF_VECTOR_TYPE (V4DI, DI) DEF_VECTOR_TYPE (V8SI, SI) DEF_VECTOR_TYPE (V16HI, HI) @@ -111,6 +113,7 @@ DEF_VECTOR_TYPE (V16UHI, UHI, V16HI) # AVX512F vectors DEF_VECTOR_TYPE (V32SF, FLOAT) DEF_VECTOR_TYPE (V32HF, FLOAT16) +DEF_VECTOR_TYPE (V32BF, BFLOAT16) DEF_VECTOR_TYPE (V16SF, FLOAT) DEF_VECTOR_TYPE (V8DF, DOUBLE) DEF_VECTOR_TYPE (V8DI, DI) @@ -179,6 +182,10 @@ DEF_POINTER_TYPE (PCV4DF, V4DF, CONST) DEF_POINTER_TYPE (PCV4SF, V4SF, CONST) DEF_POINTER_TYPE (PCV8DF, V8DF, CONST) DEF_POINTER_TYPE (PCV8SF, V8SF, CONST) +DEF_POINTER_TYPE (PCV8HF, V8HF, CONST) +DEF_POINTER_TYPE (PCV8BF, V8BF, CONST) +DEF_POINTER_TYPE (PCV16HF, V16HF, CONST) +DEF_POINTER_TYPE (PCV16BF, V16BF, CONST) DEF_POINTER_TYPE (PCV16SF, V16SF, CONST) DEF_POINTER_TYPE (PCV2DI, V2DI, CONST) @@ -254,12 +261,14 @@ DEF_FUNCTION_TYPE (V4DF, V4SI) DEF_FUNCTION_TYPE (V8DF, V8DF) DEF_FUNCTION_TYPE (V4HI, V4HI) DEF_FUNCTION_TYPE (V4SF, PCFLOAT) +DEF_FUNCTION_TYPE (V4SF, PCSHORT) DEF_FUNCTION_TYPE (V4SF, V2DF) DEF_FUNCTION_TYPE (V4SF, V2DF, V4SF, UQI) DEF_FUNCTION_TYPE (V4SF, V4DF) DEF_FUNCTION_TYPE (V4SF, V4DF, V4SF, UQI) DEF_FUNCTION_TYPE (V4SF, V4SF) DEF_FUNCTION_TYPE (V4SF, PCV4SF) +DEF_FUNCTION_TYPE (V4SF, PCV8HF) DEF_FUNCTION_TYPE (V4SF, V4SI) DEF_FUNCTION_TYPE (V4SF, V8SF) 
DEF_FUNCTION_TYPE (V4SF, V8HI) @@ -275,8 +284,10 @@ DEF_FUNCTION_TYPE (V8HI, V16QI) DEF_FUNCTION_TYPE (V8HI, V8HI) DEF_FUNCTION_TYPE (V8QI, V8QI) DEF_FUNCTION_TYPE (V8SF, PCFLOAT) +DEF_FUNCTION_TYPE (V8SF, PCSHORT) DEF_FUNCTION_TYPE (V8SF, PCV4SF) DEF_FUNCTION_TYPE (V8SF, PCV8SF) +DEF_FUNCTION_TYPE (V8SF, PCV16HF) DEF_FUNCTION_TYPE (V8SF, V4SF) DEF_FUNCTION_TYPE (V8SF, V8SF) DEF_FUNCTION_TYPE (V8SF, V8SI) @@ -1389,3 +1400,9 @@ DEF_FUNCTION_TYPE (V32HF, V32HF) DEF_FUNCTION_TYPE_ALIAS (V8HF_FTYPE_V8HF, ROUND) DEF_FUNCTION_TYPE_ALIAS (V16HF_FTYPE_V16HF, ROUND) DEF_FUNCTION_TYPE_ALIAS (V32HF_FTYPE_V32HF, ROUND) + +# AVXNECONVERT builtins +DEF_FUNCTION_TYPE (V8BF, V8SF) +DEF_FUNCTION_TYPE (V8BF, V4SF) +DEF_FUNCTION_TYPE (V4SF, PCV8BF) +DEF_FUNCTION_TYPE (V8SF, PCV16BF) diff --git a/gcc/config/i386/i386-builtin.def b/gcc/config/i386/i386-builtin.def index e6edae5728b..a429577180c 100644 --- a/gcc/config/i386/i386-builtin.def +++ b/gcc/config/i386/i386-builtin.def @@ -274,6 +274,20 @@ BDESC (OPTION_MASK_ISA_RTM, 0, CODE_FOR_xbegin, "__builtin_ia32_xbegin", IX86_BU BDESC (OPTION_MASK_ISA_RTM, 0, CODE_FOR_xend, "__builtin_ia32_xend", IX86_BUILTIN_XEND, UNKNOWN, (int) VOID_FTYPE_VOID) BDESC (OPTION_MASK_ISA_RTM, 0, CODE_FOR_xtest, "__builtin_ia32_xtest", IX86_BUILTIN_XTEST, UNKNOWN, (int) INT_FTYPE_VOID) +/* AVX-NE-CONVERT */ +BDESC (0, OPTION_MASK_ISA2_AVXNECONVERT, CODE_FOR_vbcstnebf162ps_v4sf, "__builtin_ia32_vbcstnebf162ps128", IX86_BUILTIN_VBCSTNEBF162PS128, UNKNOWN, (int) V4SF_FTYPE_PCSHORT) +BDESC (0, OPTION_MASK_ISA2_AVXNECONVERT, CODE_FOR_vbcstnebf162ps_v8sf, "__builtin_ia32_vbcstnebf162ps256", IX86_BUILTIN_VBCSTNEBF162PS256, UNKNOWN, (int) V8SF_FTYPE_PCSHORT) +BDESC (0, OPTION_MASK_ISA2_AVXNECONVERT, CODE_FOR_vbcstnesh2ps_v4sf, "__builtin_ia32_vbcstnesh2ps128", IX86_BUILTIN_VBCSTNESH2PS128, UNKNOWN, (int) V4SF_FTYPE_PCSHORT) +BDESC (0, OPTION_MASK_ISA2_AVXNECONVERT, CODE_FOR_vbcstnesh2ps_v8sf, "__builtin_ia32_vbcstnesh2ps256", IX86_BUILTIN_VBCSTNESH2PS256, UNKNOWN, 
(int) V8SF_FTYPE_PCSHORT) +BDESC (0, OPTION_MASK_ISA2_AVXNECONVERT, CODE_FOR_vcvtneebf162ps_v4sf, "__builtin_ia32_vcvtneebf162ps128", IX86_BUILTIN_VCVTNEEBF162PS128, UNKNOWN, (int) V4SF_FTYPE_PCV8BF) +BDESC (0, OPTION_MASK_ISA2_AVXNECONVERT, CODE_FOR_vcvtneebf162ps_v8sf, "__builtin_ia32_vcvtneebf162ps256", IX86_BUILTIN_VCVTNEEBF162PS256, UNKNOWN, (int) V8SF_FTYPE_PCV16BF) +BDESC (0, OPTION_MASK_ISA2_AVXNECONVERT, CODE_FOR_vcvtneeph2ps_v4sf, "__builtin_ia32_vcvtneeph2ps128", IX86_BUILTIN_VCVTNEEPH2PS128, UNKNOWN, (int) V4SF_FTYPE_PCV8HF) +BDESC (0, OPTION_MASK_ISA2_AVXNECONVERT, CODE_FOR_vcvtneeph2ps_v8sf, "__builtin_ia32_vcvtneeph2ps256", IX86_BUILTIN_VCVTNEEPH2PS256, UNKNOWN, (int) V8SF_FTYPE_PCV16HF) +BDESC (0, OPTION_MASK_ISA2_AVXNECONVERT, CODE_FOR_vcvtneobf162ps_v4sf, "__builtin_ia32_vcvtneobf162ps128", IX86_BUILTIN_VCVTNEOBF162PS128, UNKNOWN, (int) V4SF_FTYPE_PCV8BF) +BDESC (0, OPTION_MASK_ISA2_AVXNECONVERT, CODE_FOR_vcvtneobf162ps_v8sf, "__builtin_ia32_vcvtneobf162ps256", IX86_BUILTIN_VCVTNEOBF162PS256, UNKNOWN, (int) V8SF_FTYPE_PCV16BF) +BDESC (0, OPTION_MASK_ISA2_AVXNECONVERT, CODE_FOR_vcvtneoph2ps_v4sf, "__builtin_ia32_vcvtneoph2ps128", IX86_BUILTIN_VCVTNEOPH2PS128, UNKNOWN, (int) V4SF_FTYPE_PCV8HF) +BDESC (0, OPTION_MASK_ISA2_AVXNECONVERT, CODE_FOR_vcvtneoph2ps_v8sf, "__builtin_ia32_vcvtneoph2ps256", IX86_BUILTIN_VCVTNEOPH2PS256, UNKNOWN, (int) V8SF_FTYPE_PCV16HF) + /* AVX512BW */ BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_avx512bw_loadv32hi_mask, "__builtin_ia32_loaddquhi512_mask", IX86_BUILTIN_LOADDQUHI512_MASK, UNKNOWN, (int) V32HI_FTYPE_PCSHORT_V32HI_USI) BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_avx512bw_loadv64qi_mask, "__builtin_ia32_loaddquqi512_mask", IX86_BUILTIN_LOADDQUQI512_MASK, UNKNOWN, (int) V64QI_FTYPE_PCCHAR_V64QI_UDI) @@ -2809,6 +2823,10 @@ BDESC (0, OPTION_MASK_ISA2_AVX512BF16, CODE_FOR_avx512f_dpbf16ps_v4sf, "__builti BDESC (0, OPTION_MASK_ISA2_AVX512BF16, CODE_FOR_avx512f_dpbf16ps_v4sf_mask, "__builtin_ia32_dpbf16ps_v4sf_mask", 
IX86_BUILTIN_DPHI16PS_V4SF_MASK, UNKNOWN, (int) V4SF_FTYPE_V4SF_V8HI_V8HI_UQI) BDESC (0, OPTION_MASK_ISA2_AVX512BF16, CODE_FOR_avx512f_dpbf16ps_v4sf_maskz, "__builtin_ia32_dpbf16ps_v4sf_maskz", IX86_BUILTIN_DPHI16PS_V4SF_MASKZ, UNKNOWN, (int) V4SF_FTYPE_V4SF_V8HI_V8HI_UQI) +/* AVX-NE-CONVERT */ +BDESC (0, OPTION_MASK_ISA2_AVXNECONVERT, CODE_FOR_avx_vcvtneps2bf16_v4sf, "__builtin_ia32_vcvtneps2bf16128", IX86_BUILTIN_VCVTNEPS2BF16128, UNKNOWN, (int) V8BF_FTYPE_V4SF) +BDESC (0, OPTION_MASK_ISA2_AVXNECONVERT, CODE_FOR_avx_vcvtneps2bf16_v8sf, "__builtin_ia32_vcvtneps2bf16256", IX86_BUILTIN_VCVTNEPS2BF16256, UNKNOWN, (int) V8BF_FTYPE_V8SF) + /* AVX512FP16. */ BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_addv8hf3_mask, "__builtin_ia32_addph128_mask", IX86_BUILTIN_ADDPH128_MASK, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI) BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_addv16hf3_mask, "__builtin_ia32_addph256_mask", IX86_BUILTIN_ADDPH256_MASK, UNKNOWN, (int) V16HF_FTYPE_V16HF_V16HF_V16HF_UHI) diff --git a/gcc/config/i386/i386-c.cc b/gcc/config/i386/i386-c.cc index a9a35c0a18a..48934df664c 100644 --- a/gcc/config/i386/i386-c.cc +++ b/gcc/config/i386/i386-c.cc @@ -637,6 +637,8 @@ ix86_target_macros_internal (HOST_WIDE_INT isa_flag, def_or_undef (parse_in, "__AVXIFMA__"); if (isa_flag2 & OPTION_MASK_ISA2_AVXVNNIINT8) def_or_undef (parse_in, "__AVXVNNIINT8__"); + if (isa_flag2 & OPTION_MASK_ISA2_AVXNECONVERT) + def_or_undef (parse_in, "__AVXNECONVERT__"); if (TARGET_IAMCU) { def_or_undef (parse_in, "__iamcu"); diff --git a/gcc/config/i386/i386-expand.cc b/gcc/config/i386/i386-expand.cc index a0f8a98986e..1e29fe584af 100644 --- a/gcc/config/i386/i386-expand.cc +++ b/gcc/config/i386/i386-expand.cc @@ -10427,7 +10427,9 @@ ix86_expand_args_builtin (const struct builtin_description *d, case V4DI_FTYPE_V4DI: case V16HI_FTYPE_V16SF: case V8HI_FTYPE_V8SF: + case V8BF_FTYPE_V8SF: case V8HI_FTYPE_V4SF: + case V8BF_FTYPE_V4SF: nargs = 1; 
break; case V4SF_FTYPE_V4SF_VEC_MERGE: @@ -11860,6 +11862,12 @@ ix86_expand_special_args_builtin (const struct builtin_description *d, case V8SF_FTYPE_PCV4SF: case V8SF_FTYPE_PCFLOAT: case V4SF_FTYPE_PCFLOAT: + case V4SF_FTYPE_PCSHORT: + case V4SF_FTYPE_PCV8BF: + case V4SF_FTYPE_PCV8HF: + case V8SF_FTYPE_PCSHORT: + case V8SF_FTYPE_PCV16BF: + case V8SF_FTYPE_PCV16HF: case V4DF_FTYPE_PCV2DF: case V4DF_FTYPE_PCDOUBLE: case V2DF_FTYPE_PCDOUBLE: diff --git a/gcc/config/i386/i386-isa.def b/gcc/config/i386/i386-isa.def index c95b917c6ce..4ea3f96f69f 100644 --- a/gcc/config/i386/i386-isa.def +++ b/gcc/config/i386/i386-isa.def @@ -111,3 +111,4 @@ DEF_PTA(AVXVNNI) DEF_PTA(AVX512FP16) DEF_PTA(AVXIFMA) DEF_PTA(AVXVNNIINT8) +DEF_PTA(AVXNECONVERT) diff --git a/gcc/config/i386/i386-options.cc b/gcc/config/i386/i386-options.cc index 3e6d04433a6..e59e2d8aeaf 100644 --- a/gcc/config/i386/i386-options.cc +++ b/gcc/config/i386/i386-options.cc @@ -228,7 +228,8 @@ static struct ix86_target_opts isa2_opts[] = { "-mavxvnni", OPTION_MASK_ISA2_AVXVNNI }, { "-mavx512fp16", OPTION_MASK_ISA2_AVX512FP16 }, { "-mavxifma", OPTION_MASK_ISA2_AVXIFMA }, - { "-mavxvnniint8", OPTION_MASK_ISA2_AVXVNNIINT8 } + { "-mavxvnniint8", OPTION_MASK_ISA2_AVXVNNIINT8 }, + { "-mavxneconvert", OPTION_MASK_ISA2_AVXNECONVERT } }; static struct ix86_target_opts isa_opts[] = { @@ -1076,6 +1077,7 @@ ix86_valid_target_attribute_inner_p (tree fndecl, tree args, char *p_strings[], IX86_ATTR_ISA ("avx512fp16", OPT_mavx512fp16), IX86_ATTR_ISA ("avxifma", OPT_mavxifma), IX86_ATTR_ISA ("avxvnniint8", OPT_mavxvnniint8), + IX86_ATTR_ISA ("avxneconvert", OPT_mavxneconvert), /* enum options */ IX86_ATTR_ENUM ("fpmath=", OPT_mfpmath_), diff --git a/gcc/config/i386/i386.opt b/gcc/config/i386/i386.opt index 53d534f6392..6e07b89ac4c 100644 --- a/gcc/config/i386/i386.opt +++ b/gcc/config/i386/i386.opt @@ -1224,3 +1224,8 @@ mavxvnniint8 Target Mask(ISA2_AVXVNNIINT8) Var(ix86_isa_flags2) Save Support MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, 
SSE4.2, AVX, AVX2 and AVXVNNIINT8 built-in functions and code generation. + +mavxneconvert +Target Mask(ISA2_AVXNECONVERT) Var(ix86_isa_flags2) Save +Support MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AVX, AVX2, and +AVXNECONVERT built-in functions and code generation. diff --git a/gcc/config/i386/immintrin.h b/gcc/config/i386/immintrin.h index c62d50f1951..d7433f639c8 100644 --- a/gcc/config/i386/immintrin.h +++ b/gcc/config/i386/immintrin.h @@ -124,6 +124,10 @@ #include #endif +#ifdef __AVX2__ +#include +#endif + #include #include diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md index 49490a213ea..bef4447de62 100644 --- a/gcc/config/i386/sse.md +++ b/gcc/config/i386/sse.md @@ -171,6 +171,14 @@ UNSPEC_VPMADDWDACCD UNSPEC_VPMADDWDACCSSD + ;; For AVXNECONVERT support + UNSPEC_VCVTNEBF16SF + UNSPEC_VCVTNESHSF + UNSPEC_VCVTNEEBF16SF + UNSPEC_VCVTNEEPHSF + UNSPEC_VCVTNEOBF16SF + UNSPEC_VCVTNEOPHSF + ;; For VAES support UNSPEC_VAESDEC UNSPEC_VAESDECLAST @@ -28930,9 +28938,69 @@ ;; Converting from SF to BF (define_mode_attr sf_cvt_bf16 [(V4SF "V8HI") (V8SF "V8HI") (V16SF "V16HI")]) +(define_mode_attr sf_cvt_bfloat16 + [(V4SF "V8BF") (V8SF "V8BF")]) ;; Mapping from BF to SF (define_mode_attr sf_bf16 [(V4SF "V8HI") (V8SF "V16HI") (V16SF "V32HI")]) +(define_mode_attr sf_bfloat16 + [(V4SF "V8BF") (V8SF "V16BF") (V16SF "V32BF")]) +;; Mapping from PH to SF +(define_mode_attr ph_cvt_sf + [(V4SF "V8HF") (V8SF "V16HF")]) + +(define_int_iterator VBCSTNE + [UNSPEC_VCVTNEBF16SF + UNSPEC_VCVTNESHSF]) + +(define_int_attr vbcstnetype + [(UNSPEC_VCVTNEBF16SF "bf16") (UNSPEC_VCVTNESHSF "sh")]) + +(define_insn "vbcstne2ps_" + [(set (match_operand:VF1_128_256 0 "register_operand" "=x") + (vec_duplicate:VF1_128_256 + (unspec:SF + [(match_operand:HI 1 "memory_operand" "m")] + VBCSTNE)))] + "TARGET_AVXNECONVERT" + "vbcstne2ps\t{%1, %0|%0, %1}" + [(set_attr "prefix" "vex") + (set_attr "mode" "")]) + +(define_int_iterator VCVTNEBF16 + [UNSPEC_VCVTNEEBF16SF + UNSPEC_VCVTNEOBF16SF])
+ +(define_int_attr vcvtnebf16type + [(UNSPEC_VCVTNEEBF16SF "ebf16") + (UNSPEC_VCVTNEOBF16SF "obf16")]) +(define_insn "vcvtne2ps_" + [(set (match_operand:VF1_128_256 0 "register_operand" "=x") + (unspec:VF1_128_256 + [(match_operand: 1 "memory_operand" "m")] + VCVTNEBF16))] + "TARGET_AVXNECONVERT" + "vcvtne2ps\t{%1, %0|%0, %1}" + [(set_attr "prefix" "vex") + (set_attr "mode" "")]) + +(define_int_iterator VCVTNEPH + [UNSPEC_VCVTNEEPHSF + UNSPEC_VCVTNEOPHSF]) + +(define_int_attr vcvtnephtype + [(UNSPEC_VCVTNEEPHSF "eph") + (UNSPEC_VCVTNEOPHSF "oph")]) + +(define_insn "vcvtne2ps_" + [(set (match_operand:VF1_128_256 0 "register_operand" "=x") + (unspec:VF1_128_256 + [(match_operand: 1 "memory_operand" "m")] + VCVTNEPH))] + "TARGET_AVXNECONVERT" + "vcvtne2ps\t{%1, %0|%0, %1}" + [(set_attr "prefix" "vex") + (set_attr "mode" "")]) (define_expand "avx512f_cvtne2ps2bf16__maskz" [(match_operand:BF16 0 "register_operand") @@ -28966,13 +29034,41 @@ DONE; }) -(define_insn "avx512f_cvtneps2bf16_" +(define_insn "avx_vcvtneps2bf16_" + [(set (match_operand: 0 "register_operand" "=v") + (unspec: + [(match_operand:VF1_128_256 1 "register_operand" "v")] + UNSPEC_VCVTNEPS2BF16))] + "TARGET_AVXNECONVERT" + "%{vex%} vcvtneps2bf16\t{%1, %0|%0, %1}" + [(set_attr "prefix" "vex")]) + +(define_insn "avx512f_cvtneps2bf16_" [(set (match_operand: 0 "register_operand" "=v") (unspec: [(match_operand:VF1_AVX512VL 1 "register_operand" "v")] UNSPEC_VCVTNEPS2BF16))] "TARGET_AVX512BF16" - "vcvtneps2bf16\t{%1, %0|%0, %1}") + { + if ( <=32 + && TARGET_AVXNECONVERT + && !EXT_REX_SSE_REG_P (operands[0]) + && !EXT_REX_SSE_REG_P (operands[1])) + return "%{vex%} vcvtneps2bf16\t{%1, %0|%0, %1}"; + else + return "vcvtneps2bf16\t{%1, %0|%0, %1}"; + }) + +(define_insn "avx512f_cvtneps2bf16__mask" + [(set (match_operand: 0 "register_operand" "=v") + (vec_merge: + (unspec: + [(match_operand:VF1_AVX512VL 1 "register_operand" "v")] + UNSPEC_VCVTNEPS2BF16) + (match_operand: 2 "nonimm_or_0_operand" "0C") + 
(match_operand: 3 "register_operand" "Yk")))] + "TARGET_AVX512BF16" + "vcvtneps2bf16\t{%1, %0%{%3%}%N2|%0%{%3%}%N2, %1}") (define_expand "avx512f_dpbf16ps__maskz" [(match_operand:VF1_AVX512VL 0 "register_operand") diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi index 9a8de9fc226..0a4396f92bb 100644 --- a/gcc/doc/extend.texi +++ b/gcc/doc/extend.texi @@ -7070,6 +7070,11 @@ Enable/disable the generation of the AVXIFMA instructions. @cindex @code{target("avxvnniint8")} function attribute, x86 Enable/disable the generation of the AVXVNNIINT8 instructions. +@item avxneconvert +@itemx no-avxneconvert +@cindex @code{target("avxneconvert")} function attribute, x86 +Enable/disable the generation of the AVXNECONVERT instructions. + @item cld @itemx no-cld @cindex @code{target("cld")} function attribute, x86 diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi index d4ff7549bf3..307fb7fa441 100644 --- a/gcc/doc/invoke.texi +++ b/gcc/doc/invoke.texi @@ -1436,7 +1436,7 @@ See RS/6000 and PowerPC Options. -mavx5124fmaps -mavx512vnni -mavx5124vnniw -mprfchw -mrdpid @gol -mrdseed -msgx -mavx512vp2intersect -mserialize -mtsxldtrk@gol -mamx-tile -mamx-int8 -mamx-bf16 -muintr -mhreset -mavxvnni@gol --mavx512fp16 -mavxifma -mavxvnniint8 @gol +-mavx512fp16 -mavxifma -mavxvnniint8 -mavxneconvert @gol -mcldemote -mms-bitfields -mno-align-stringops -minline-all-stringops @gol -minline-stringops-dynamically -mstringop-strategy=@var{alg} @gol -mkl -mwidekl @gol @@ -32899,6 +32899,9 @@ preferred alignment to @option{-mpreferred-stack-boundary=2}. 
@need 200 @itemx -mavxvnniint8 @opindex mavxvnniint8 +@need 200 +@itemx -mavxneconvert +@opindex mavxneconvert These switches enable the use of instructions in the MMX, SSE, SSE2, SSE3, SSSE3, SSE4, SSE4A, SSE4.1, SSE4.2, AVX, AVX2, AVX512F, AVX512PF, AVX512ER, AVX512CD, AVX512VL, AVX512BW, AVX512DQ, AVX512IFMA, AVX512VBMI, SHA, @@ -32909,8 +32912,8 @@ XSAVEOPT, XSAVEC, XSAVES, RTM, HLE, TBM, MWAITX, CLZERO, PKU, AVX512VBMI2, GFNI, VAES, WAITPKG, VPCLMULQDQ, AVX512BITALG, MOVDIRI, MOVDIR64B, AVX512BF16, ENQCMD, AVX512VPOPCNTDQ, AVX5124FMAPS, AVX512VNNI, AVX5124VNNIW, SERIALIZE, UINTR, HRESET, AMXTILE, AMXINT8, AMXBF16, KL, WIDEKL, AVXVNNI, AVX512FP16, -AVXIFMA, AVXVNNIINT8 or CLDEMOTE extended instruction sets. Each has a -corresponding @option{-mno-} option to disable use of these instructions. +AVXIFMA, AVXVNNIINT8, AVXNECONVERT or CLDEMOTE extended instruction sets. Each +has a corresponding @option{-mno-} option to disable use of these instructions. These extensions are also available as built-in functions: see @ref{x86 Built-in Functions}, for details of the functions enabled and diff --git a/gcc/doc/sourcebuild.texi b/gcc/doc/sourcebuild.texi index e21a1d381e0..a12175b6498 100644 --- a/gcc/doc/sourcebuild.texi +++ b/gcc/doc/sourcebuild.texi @@ -2493,6 +2493,9 @@ Target supports the execution of @code{avx512vp2intersect} instructions. @item avxifma Target supports the execution of @code{avxifma} instructions. +@item avxneconvert +Target supports the execution of @code{avxneconvert} instructions. + @item avxvnniint8 Target supports the execution of @code{avxvnniint8} instructions. 
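A scalar reference model of the bf16 conversions may help when reading the run-time tests in this patch. It is a minimal sketch, not part of the patch and not GCC code; all function names are hypothetical. Two assumptions are built in. First, the even/odd element selection (fp32 lane i takes bf16 element 2*i for the "even" forms and 2*i+1 for the "odd" forms) is inferred from the tests, which place the even bf16 value in the low half of each dword and the odd value in the high half on little-endian x86. Second, fp32-to-bf16 narrowing is modelled as round-to-nearest-even for finite inputs only; NaN and denormal handling of the real vcvtneps2bf16 instruction is deliberately not modelled. The fp16 (`eph`/`oph`/`sh`) forms use the same element selection but convert IEEE binary16, which is not covered here.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: bf16 -> fp32 is exact; the 16 bf16 bits
   become the high half of the fp32 bit pattern.  */
static float
bf16_to_fp32 (uint16_t h)
{
  uint32_t bits = (uint32_t) h << 16;
  float f;
  memcpy (&f, &bits, sizeof f);
  return f;
}

/* Hypothetical model of the even/odd selection (assumed from the tests):
   fp32 lane i is converted from packed bf16 element 2*i + odd.  */
static void
cvtne_bf16_ps_model (const uint16_t *src, float *dst, int lanes, int odd)
{
  assert (odd == 0 || odd == 1);
  for (int i = 0; i < lanes; i++)
    dst[i] = bf16_to_fp32 (src[2 * i + odd]);
}

/* Hypothetical fp32 -> bf16 narrowing with round-to-nearest-even.
   Valid for finite inputs only; NaN and denormal behaviour of the
   hardware instruction is not modelled.  */
static uint16_t
fp32_to_bf16_rne (float f)
{
  uint32_t bits;
  memcpy (&bits, &f, sizeof bits);
  /* Add 0x7fff plus the current LSB of the kept half, so exact ties
     round towards an even bf16 mantissa.  */
  uint32_t bias = 0x7fff + ((bits >> 16) & 1);
  return (uint16_t) ((bits + bias) >> 16);
}
```

The run-time tests compute their expected bf16 values by plain truncation (`>> 16`); that agrees with round-to-nearest-even only because every input they use has a zero low half, which is why the model above keeps the tie-breaking bias explicit.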
diff --git a/gcc/testsuite/g++.dg/other/i386-2.C b/gcc/testsuite/g++.dg/other/i386-2.C index ebd01fe47bc..dd3e71f25ed 100644 --- a/gcc/testsuite/g++.dg/other/i386-2.C +++ b/gcc/testsuite/g++.dg/other/i386-2.C @@ -1,5 +1,5 @@ /* { dg-do compile { target i?86-*-* x86_64-*-* } } */ -/* { dg-options "-O -pedantic-errors -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha -mprefetchwt1 -mxsavec -mxsaves -mclflushopt -mavx512dq -mavx512bw -mavx512vl -mavx512ifma -mavx512vbmi -mavx512vbmi2 -mavx5124fmaps -mavx5124vnniw -mavx512vpopcntdq -mclwb -mmwaitx -mclzero -mpku -msgx -mrdpid -mgfni -mavx512bitalg -mpconfig -mwbnoinvd -mavx512bf16 -menqcmd -mavx512vp2intersect -mserialize -mtsxldtrk -mamx-tile -mamx-int8 -mamx-bf16 -mkl -mwidekl -mavxvnni -mavx512fp16 -mavxifma -mavxvnniint8" } */ +/* { dg-options "-O -pedantic-errors -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha -mprefetchwt1 -mxsavec -mxsaves -mclflushopt -mavx512dq -mavx512bw -mavx512vl -mavx512ifma -mavx512vbmi -mavx512vbmi2 -mavx5124fmaps -mavx5124vnniw -mavx512vpopcntdq -mclwb -mmwaitx -mclzero -mpku -msgx -mrdpid -mgfni -mavx512bitalg -mpconfig -mwbnoinvd -mavx512bf16 -menqcmd -mavx512vp2intersect -mserialize -mtsxldtrk -mamx-tile -mamx-int8 -mamx-bf16 -mkl -mwidekl -mavxvnni -mavx512fp16 -mavxifma -mavxvnniint8 -mavxneconvert" } */ /* Test that {,x,e,p,t,s,w,a,b,i}mmintrin.h, mm3dnow.h, fma4intrin.h, xopintrin.h, abmintrin.h, bmiintrin.h, tbmintrin.h, lwpintrin.h, diff --git a/gcc/testsuite/g++.dg/other/i386-3.C b/gcc/testsuite/g++.dg/other/i386-3.C index b66498f1d4c..cd7045cc4e4 100644 --- 
a/gcc/testsuite/g++.dg/other/i386-3.C +++ b/gcc/testsuite/g++.dg/other/i386-3.C @@ -1,5 +1,5 @@ /* { dg-do compile { target i?86-*-* x86_64-*-* } } */ -/* { dg-options "-O -fkeep-inline-functions -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha -mprefetchwt1 -mxsavec -mxsaves -mclflushopt -mavx512dq -mavx512bw -mavx512vl -mavx512ifma -mavx512vbmi -mavx512vbmi2 -mavx5124fmaps -mavx5124vnniw -mavx512vpopcntdq -mclwb -mmwaitx -mclzero -mpku -msgx -mrdpid -mgfni -mavx512bitalg -mpconfig -mwbnoinvd -mavx512bf16 -menqcmd -mavx512vp2intersect -mserialize -mtsxldtrk -mamx-tile -mamx-int8 -mamx-bf16 -mkl -mwidekl -mavxvnni -mavx512fp16 -mavxifma -mavxvnniint8" } */ +/* { dg-options "-O -fkeep-inline-functions -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha -mprefetchwt1 -mxsavec -mxsaves -mclflushopt -mavx512dq -mavx512bw -mavx512vl -mavx512ifma -mavx512vbmi -mavx512vbmi2 -mavx5124fmaps -mavx5124vnniw -mavx512vpopcntdq -mclwb -mmwaitx -mclzero -mpku -msgx -mrdpid -mgfni -mavx512bitalg -mpconfig -mwbnoinvd -mavx512bf16 -menqcmd -mavx512vp2intersect -mserialize -mtsxldtrk -mamx-tile -mamx-int8 -mamx-bf16 -mkl -mwidekl -mavxvnni -mavx512fp16 -mavxifma -mavxvnniint8 -mavxneconvert" } */ /* Test that {,x,e,p,t,s,w,a,b,i}mmintrin.h, mm3dnow.h, fma4intrin.h, xopintrin.h, abmintrin.h, bmiintrin.h, tbmintrin.h, lwpintrin.h, diff --git a/gcc/testsuite/gcc.target/i386/avx-check.h b/gcc/testsuite/gcc.target/i386/avx-check.h index 77507ca2edc..666eff50780 100644 --- a/gcc/testsuite/gcc.target/i386/avx-check.h +++ b/gcc/testsuite/gcc.target/i386/avx-check.h @@ -28,6 +28,9 @@ main () #endif #ifdef 
AVXVNNIINT8 && __builtin_cpu_supports ("avxvnniint8") +#endif +#ifdef AVXNECONVERT + && __builtin_cpu_supports ("avxneconvert") #endif ) { diff --git a/gcc/testsuite/gcc.target/i386/avx-ne-convert-1.c b/gcc/testsuite/gcc.target/i386/avx-ne-convert-1.c new file mode 100644 index 00000000000..b1848037e81 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx-ne-convert-1.c @@ -0,0 +1,45 @@ +/* { dg-do compile } */ +/* { dg-options "-mavxneconvert -O2" } */ +/* { dg-final { scan-assembler-times "vbcstnebf162ps\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vbcstnebf162ps\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vbcstnesh2ps\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vbcstnesh2ps\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtneebf162ps\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtneebf162ps\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtneeph2ps\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtneeph2ps\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtneobf162ps\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtneobf162ps\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtneoph2ps\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "vcvtneoph2ps\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "\{vex\} vcvtneps2bf16\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +/* { dg-final { scan-assembler-times "\{vex\} vcvtneps2bf16\[ 
\\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */ +#include + +volatile __m128 x1; +volatile __m256 x2; +volatile __m128bf16 res1, res2; +const void *a; +__m128bf16 *b; +__m256bf16 *c; +__m128h *d; +__m256h *e; + +void extern +avx_ne_convert_test (void) +{ + x1 = _mm_bcstnebf16_ps (a); + x2 = _mm256_bcstnebf16_ps (a); + x1 = _mm_bcstnesh_ps (a); + x2 = _mm256_bcstnesh_ps (a); + x1 = _mm_cvtneebf16_ps (b); + x2 = _mm256_cvtneebf16_ps (c); + x1 = _mm_cvtneeph_ps (d); + x2 = _mm256_cvtneeph_ps (e); + x1 = _mm_cvtneobf16_ps (b); + x2 = _mm256_cvtneobf16_ps (c); + x1 = _mm_cvtneoph_ps (d); + x2 = _mm256_cvtneoph_ps (e); + res1 = _mm_cvtneps_avx_pbh (x1); + res2 = _mm256_cvtneps_avx_pbh (x2); +} diff --git a/gcc/testsuite/gcc.target/i386/avx-ne-convert-vbcstnebf162ps-2.c b/gcc/testsuite/gcc.target/i386/avx-ne-convert-vbcstnebf162ps-2.c new file mode 100644 index 00000000000..2707c58f7cd --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx-ne-convert-vbcstnebf162ps-2.c @@ -0,0 +1,54 @@ +/* { dg-do run } */ +/* { dg-options "-mavxneconvert -O2" } */ +/* { dg-require-effective-target avxneconvert } */ +#define AVXNECONVERT +#include + +#ifndef CHECK +#define CHECK "avx-check.h" +#endif + +#ifndef TEST +#define TEST avx_test +#endif + +#include CHECK + +typedef union +{ + uint32_t int32; + float flt; +} float_int_t; + +static uint16_t convert_fp32_to_bf16 (float fp) +{ + float_int_t fi; + fi.flt = fp; + return ((fi.int32 >> 16) & 0xffff); +} + +void TEST (void) +{ + union128 dst_128; + union256 dst_256; + float res_ref_128[4], res_ref_256[8], fp32; + uint16_t var; + fp32 = (float) 3 * 2 + 5.5; + for (int i = 0; i < 4; i++) + { + res_ref_128[i] = fp32; + dst_128.a[i] = 117; + } + for (int i = 0; i < 8; i++) + { + res_ref_256[i] = fp32; + dst_256.a[i] = 117; + } + var = convert_fp32_to_bf16 (fp32); + dst_128.x = _mm_bcstnebf16_ps (&var); + dst_256.x = _mm256_bcstnebf16_ps (&var); + if (check_union128 (dst_128, res_ref_128)) + abort(); + if 
(check_union256 (dst_256, res_ref_256)) + abort(); +} diff --git a/gcc/testsuite/gcc.target/i386/avx-ne-convert-vbcstnesh2ps-2.c b/gcc/testsuite/gcc.target/i386/avx-ne-convert-vbcstnesh2ps-2.c new file mode 100644 index 00000000000..0e6f38334b8 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx-ne-convert-vbcstnesh2ps-2.c @@ -0,0 +1,42 @@ +/* { dg-do run } */ +/* { dg-options "-mavxneconvert -mf16c -O2" } */ +/* { dg-require-effective-target avxneconvert } */ +#define AVXNECONVERT +#include +#include + +#ifndef CHECK +#define CHECK "avx-check.h" +#endif + +#ifndef TEST +#define TEST avx_test +#endif + +#include CHECK + +void TEST (void) +{ + union128 dst_128; + union256 dst_256; + float res_ref_128[4], res_ref_256[8], fp32; + uint16_t var; + fp32 = (float) 3 * 2 + 8.5; + for (int i = 0; i < 4; i++) + { + res_ref_128[i] = fp32; + dst_128.a[i] = 117; + } + for (int i = 0; i < 8; i++) + { + res_ref_256[i] = fp32; + dst_256.a[i] = 117; + } + var = _cvtss_sh (fp32, 0); + dst_128.x = _mm_bcstnesh_ps (&var); + dst_256.x = _mm256_bcstnesh_ps (&var); + if (check_union128 (dst_128, res_ref_128)) + abort(); + if (check_union256 (dst_256, res_ref_256)) + abort(); +} diff --git a/gcc/testsuite/gcc.target/i386/avx-ne-convert-vcvtneebf162ps-2.c b/gcc/testsuite/gcc.target/i386/avx-ne-convert-vcvtneebf162ps-2.c new file mode 100644 index 00000000000..c80f3fdedec --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx-ne-convert-vcvtneebf162ps-2.c @@ -0,0 +1,73 @@ +/* { dg-do run } */ +/* { dg-options "-mavxneconvert -O2" } */ +/* { dg-require-effective-target avxneconvert } */ +#define AVXNECONVERT +#include + +#ifndef CHECK +#define CHECK "avx-check.h" +#endif + +#ifndef TEST +#define TEST avx_test +#endif + +#include CHECK + +typedef union +{ + uint32_t int32; + float flt; +} float_int_t; + +typedef union +{ + __m128bf16 x; + uint32_t a[4]; +} union128bf16_i; + +typedef union +{ + __m256bf16 x; + uint32_t a[8]; +} union256bf16_i; + +static uint16_t convert_fp32_to_bf16 (float 
fp) +{ + float_int_t fi; + fi.flt = fp; + return ((fi.int32 >> 16) & 0xffff); +} + +void TEST (void) +{ + union128 dst_128; + union256 dst_256; + float res_ref_128[4], res_ref_256[8], fp32; + uint16_t bf16; + union128bf16_i src_128bh; + union256bf16_i src_256bh; + + for (int i = 0; i < 4; i++) + { + fp32 = (float) 3 * i + 5 + i * 0.5; + bf16 = convert_fp32_to_bf16 (fp32); + src_128bh.a[i] = bf16; // store bf16 at the lower part of the dword + res_ref_128[i] = fp32; + dst_128.a[i] = 117; + } + for (int i = 0; i < 8; i++) + { + fp32 = (float) 3 * i + 5 + i * 0.5; + bf16 = convert_fp32_to_bf16 (fp32); + src_256bh.a[i] = bf16; // store bf16 at the lower part of the dword + res_ref_256[i] = fp32; + dst_256.a[i] = 117; + } + dst_128.x = _mm_cvtneebf16_ps (&src_128bh.x); + dst_256.x = _mm256_cvtneebf16_ps (&src_256bh.x); + if (check_union128 (dst_128, res_ref_128)) + abort(); + if (check_union256 (dst_256, res_ref_256)) + abort(); +} diff --git a/gcc/testsuite/gcc.target/i386/avx-ne-convert-vcvtneeph2ps-2.c b/gcc/testsuite/gcc.target/i386/avx-ne-convert-vcvtneeph2ps-2.c new file mode 100644 index 00000000000..a862894746d --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx-ne-convert-vcvtneeph2ps-2.c @@ -0,0 +1,66 @@ +/* { dg-do run } */ +/* { dg-options "-mavxneconvert -mf16c -O2" } */ +/* { dg-require-effective-target avxneconvert } */ +#define AVXNECONVERT +#include + +#ifndef CHECK +#define CHECK "avx-check.h" +#endif + +#ifndef TEST +#define TEST avx_test +#endif + +#include CHECK + +typedef union +{ + uint32_t int32; + float flt; +} float_int_t; + +typedef union +{ + __m128h x; + uint32_t a[4]; +} union128h; + +typedef union +{ + __m256h x; + uint32_t a[8]; +} union256h; + +void TEST (void) +{ + union128 dst_128; + union256 dst_256; + float res_ref_128[4], res_ref_256[8], fp32; + uint16_t fp16; + union128h src_128h; + union256h src_256h; + + for (int i = 0; i < 4; i++) + { + fp32 = (float) 3 * i + 5 + i * 0.5; + fp16 = _cvtss_sh (fp32, 0); + src_128h.a[i] = fp16; + 
res_ref_128[i] = fp32; + dst_128.a[i] = 117; + } + for (int i = 0; i < 8; i++) + { + fp32 = (float) 3 * i + 5 + i * 0.5; + fp16 = _cvtss_sh (fp32, 0); + src_256h.a[i] = fp16; + res_ref_256[i] = fp32; + dst_256.a[i] = 117; + } + dst_128.x = _mm_cvtneeph_ps (&src_128h.x); + dst_256.x = _mm256_cvtneeph_ps (&src_256h.x); + if (check_union128 (dst_128, res_ref_128)) + abort(); + if (check_union256 (dst_256, res_ref_256)) + abort(); +} diff --git a/gcc/testsuite/gcc.target/i386/avx-ne-convert-vcvtneobf162ps-2.c b/gcc/testsuite/gcc.target/i386/avx-ne-convert-vcvtneobf162ps-2.c new file mode 100644 index 00000000000..d95aee067ae --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx-ne-convert-vcvtneobf162ps-2.c @@ -0,0 +1,75 @@ +/* { dg-do run } */ +/* { dg-options "-mavxneconvert -O2" } */ +/* { dg-require-effective-target avxneconvert } */ +#define AVXNECONVERT +#include + +#ifndef CHECK +#define CHECK "avx-check.h" +#endif + +#ifndef TEST +#define TEST avx_test +#endif + +#include CHECK + +typedef union +{ + uint32_t int32; + float flt; +} float_int_t; + +typedef union +{ + __m128bf16 x; + uint32_t a[4]; +} union128bf16_i; + +typedef union +{ + __m256bf16 x; + uint32_t a[8]; +} union256bf16_i; + +static uint16_t convert_fp32_to_bf16 (float fp) +{ + float_int_t fi; + fi.flt = fp; + return ((fi.int32 >> 16) & 0xffff); +} + +void TEST (void) +{ + union128 dst_128; + union256 dst_256; + float res_ref_128[4], res_ref_256[8], fp32; + uint16_t bf16; + union128bf16_i src_128bh; + union256bf16_i src_256bh; + + for (int i = 0; i < 4; i++) + { + fp32 = (float) 3 * i + 5 + i * 0.5; + bf16 = convert_fp32_to_bf16 (fp32); + // store bf16 at the upper part of the dword + src_128bh.a[i] = (bf16 << 16) & 0xffff0000; + res_ref_128[i] = fp32; + dst_128.a[i] = 117; + } + for (int i = 0; i < 8; i++) + { + fp32 = (float) 3 * i + 5 + i * 0.5; + bf16 = convert_fp32_to_bf16 (fp32); + // store bf16 at the upper part of the dword + src_256bh.a[i] = (bf16 << 16) & 0xffff0000; + res_ref_256[i] = 
fp32; + dst_256.a[i] = 117; + } + dst_128.x = _mm_cvtneobf16_ps (&src_128bh.x); + dst_256.x = _mm256_cvtneobf16_ps (&src_256bh.x); + if (check_union128 (dst_128, res_ref_128)) + abort(); + if (check_union256 (dst_256, res_ref_256)) + abort(); +} diff --git a/gcc/testsuite/gcc.target/i386/avx-ne-convert-vcvtneoph2ps-2.c b/gcc/testsuite/gcc.target/i386/avx-ne-convert-vcvtneoph2ps-2.c new file mode 100644 index 00000000000..95eb5d74765 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx-ne-convert-vcvtneoph2ps-2.c @@ -0,0 +1,66 @@ +/* { dg-do run } */ +/* { dg-options "-mavxneconvert -mf16c -O2" } */ +/* { dg-require-effective-target avxneconvert } */ +#define AVXNECONVERT +#include + +#ifndef CHECK +#define CHECK "avx-check.h" +#endif + +#ifndef TEST +#define TEST avx_test +#endif + +#include CHECK + +typedef union +{ + uint32_t int32; + float flt; +} float_int_t; + +typedef union +{ + __m128h x; + uint32_t a[4]; +} union128h; + +typedef union +{ + __m256h x; + uint32_t a[8]; +} union256h; + +void TEST (void) +{ + union128 dst_128; + union256 dst_256; + float res_ref_128[4], res_ref_256[8], fp32; + uint16_t fp16; + union128h src_128h; + union256h src_256h; + + for (int i = 0; i < 4; i++) + { + fp32 = (float) 3 * i + 5 + i * 0.5; + fp16 = _cvtss_sh (fp32, 0); + src_128h.a[i] = fp16 << 16; + res_ref_128[i] = fp32; + dst_128.a[i] = 117; + } + for (int i = 0; i < 8; i++) + { + fp32 = (float) 3 * i + 5 + i * 0.5; + fp16 = _cvtss_sh (fp32, 0); + src_256h.a[i] = fp16 << 16; + res_ref_256[i] = fp32; + dst_256.a[i] = 117; + } + dst_128.x = _mm_cvtneoph_ps (&src_128h.x); + dst_256.x = _mm256_cvtneoph_ps (&src_256h.x); + if (check_union128 (dst_128, res_ref_128)) + abort(); + if (check_union256 (dst_256, res_ref_256)) + abort(); +} diff --git a/gcc/testsuite/gcc.target/i386/avx-ne-convert-vcvtneps2bf16-2.c b/gcc/testsuite/gcc.target/i386/avx-ne-convert-vcvtneps2bf16-2.c new file mode 100644 index 00000000000..0861521111a --- /dev/null +++ 
b/gcc/testsuite/gcc.target/i386/avx-ne-convert-vcvtneps2bf16-2.c @@ -0,0 +1,58 @@ +/* { dg-do run } */ +/* { dg-options "-mavxneconvert -O2" } */ +/* { dg-require-effective-target avxneconvert } */ +#define AVXNECONVERT +#include + +#ifndef CHECK +#define CHECK "avx-check.h" +#endif + +#ifndef TEST +#define TEST avx_test +#endif + +#include CHECK + +typedef union +{ + uint32_t int32; + float flt; +} float_int_t; + +typedef union +{ + __m128bf16 x; + unsigned short a[8]; +} union128bf16; + +void TEST (void) +{ + union128 src_128; + union256 src_256; + union128bf16 dst_128, dst_256; + uint16_t res_ref_128[8] = {0}, res_ref_256[8]; + float_int_t fp32; + for (int i = 0; i < 4; i++) + { + fp32.flt = (float) 2 * i + 7 + i * 0.25; + src_128.a[i] = fp32.flt; + res_ref_128[i] = fp32.int32 >> 16; + dst_128.a[i] = 117; + } + + for (int i = 0; i < 8; i++) + { + fp32.flt = (float) 2 * i + 7 + i * 0.25; + src_256.a[i] = fp32.flt; + res_ref_256[i] = fp32.int32 >> 16; + dst_256.a[i] = 117; + } + dst_128.x = _mm_cvtneps_avx_pbh (src_128.x); + dst_256.x = _mm256_cvtneps_avx_pbh (src_256.x); + + if (checkVus (dst_128.a, res_ref_128, 8)) + abort(); + if (checkVus (dst_256.a, res_ref_256, 8)) + abort(); +} diff --git a/gcc/testsuite/gcc.target/i386/avx512bf16vl-vcvtneps2bf16-1.c b/gcc/testsuite/gcc.target/i386/avx512bf16vl-vcvtneps2bf16-1a.c similarity index 100% rename from gcc/testsuite/gcc.target/i386/avx512bf16vl-vcvtneps2bf16-1.c rename to gcc/testsuite/gcc.target/i386/avx512bf16vl-vcvtneps2bf16-1a.c diff --git a/gcc/testsuite/gcc.target/i386/avx512bf16vl-vcvtneps2bf16-1b.c b/gcc/testsuite/gcc.target/i386/avx512bf16vl-vcvtneps2bf16-1b.c new file mode 100644 index 00000000000..8b5d6a644bc --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/avx512bf16vl-vcvtneps2bf16-1b.c @@ -0,0 +1,27 @@ +/* { dg-do compile } */ +/* { dg-options "-mavx512bf16 -mavx512vl -mavxneconvert -O2" } */ +/* { dg-final { scan-assembler-times "\{vex\} vcvtneps2bf16\[
\\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtneps2bf16\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtneps2bf16\[ \\t\]+\[^\{\n\]*%ymm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "\{vex\} vcvtneps2bf16\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtneps2bf16\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[ \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vcvtneps2bf16\[ \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ \\t\]+#)" 1 } } */
+
+#include <immintrin.h>
+
+volatile __m128bh res1, res2;
+volatile __m128 x1;
+volatile __m256 x2;
+volatile __mmask8 m8;
+
+void extern
+avx512bf16_test (void)
+{
+  res2 = _mm256_cvtneps_pbh (x2);
+  res2 = _mm256_mask_cvtneps_pbh (res2, m8, x2);
+  res2 = _mm256_maskz_cvtneps_pbh (m8, x2);
+
+  res1 = _mm_cvtneps_pbh (x1);
+  res1 = _mm_mask_cvtneps_pbh (res1, m8, x1);
+  res1 = _mm_maskz_cvtneps_pbh (m8, x1);
+}
diff --git a/gcc/testsuite/gcc.target/i386/funcspec-56.inc b/gcc/testsuite/gcc.target/i386/funcspec-56.inc
index a681bffe3e7..b3d33df7c9c 100644
--- a/gcc/testsuite/gcc.target/i386/funcspec-56.inc
+++ b/gcc/testsuite/gcc.target/i386/funcspec-56.inc
@@ -82,6 +82,7 @@ extern void test_avxvnni (void) __attribute__((__target__("avxvnni")));
 extern void test_avx512fp16 (void) __attribute__((__target__("avx512fp16")));
 extern void test_avxifma (void) __attribute__((__target__("avxifma")));
 extern void test_avxvnniint8 (void) __attribute__((__target__("avxvnniint8")));
+extern void test_avxneconvert (void) __attribute__((__target__("avxneconvert")));
 extern void test_no_sgx (void) __attribute__((__target__("no-sgx")));
 extern void test_no_avx5124fmaps(void)
__attribute__((__target__("no-avx5124fmaps")));
@@ -165,6 +166,7 @@ extern void test_no_avxvnni (void) __attribute__((__target__("no-avxvnni")));
 extern void test_no_avx512fp16 (void) __attribute__((__target__("no-avx512fp16")));
 extern void test_no_avxifma (void) __attribute__((__target__("no-avxifma")));
 extern void test_no_avxvnniint8 (void) __attribute__((__target__("no-avxvnniint8")));
+extern void test_no_avxneconvert (void) __attribute__((__target__("no-avxneconvert")));
 extern void test_arch_nocona (void) __attribute__((__target__("arch=nocona")));
 extern void test_arch_core2 (void) __attribute__((__target__("arch=core2")));
diff --git a/gcc/testsuite/gcc.target/i386/sse-12.c b/gcc/testsuite/gcc.target/i386/sse-12.c
index ddde2df6657..3eabc49a6ab 100644
--- a/gcc/testsuite/gcc.target/i386/sse-12.c
+++ b/gcc/testsuite/gcc.target/i386/sse-12.c
@@ -3,7 +3,7 @@
    popcntintrin.h gfniintrin.h and mm_malloc.h are usable with
    -O -std=c89 -pedantic-errors. */
 /* { dg-do compile } */
-/* { dg-options "-O -std=c89 -pedantic-errors -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha -mprefetchwt1 -mxsavec -mxsaves -mclflushopt -mavx512bw -mavx512dq -mavx512vl -mavx512vbmi -mavx512vbmi2 -mavx512ifma -mavx5124fmaps -mavx5124vnniw -mavx512vpopcntdq -mclwb -mmwaitx -mclzero -mpku -msgx -mrdpid -mgfni -mavx512bitalg -mpconfig -mwbnoinvd -mavx512bf16 -menqcmd -mavx512vp2intersect -mserialize -mtsxldtrk -mamx-tile -mamx-int8 -mamx-bf16 -mkl -mwidekl -mavxvnni -mavxifma -mavxvnniint8" } */
+/* { dg-options "-O -std=c89 -pedantic-errors -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha -mprefetchwt1
-mxsavec -mxsaves -mclflushopt -mavx512bw -mavx512dq -mavx512vl -mavx512vbmi -mavx512vbmi2 -mavx512ifma -mavx5124fmaps -mavx5124vnniw -mavx512vpopcntdq -mclwb -mmwaitx -mclzero -mpku -msgx -mrdpid -mgfni -mavx512bitalg -mpconfig -mwbnoinvd -mavx512bf16 -menqcmd -mavx512vp2intersect -mserialize -mtsxldtrk -mamx-tile -mamx-int8 -mamx-bf16 -mkl -mwidekl -mavxvnni -mavxifma -mavxvnniint8 -mavxneconvert" } */
 #include
diff --git a/gcc/testsuite/gcc.target/i386/sse-13.c b/gcc/testsuite/gcc.target/i386/sse-13.c
index 2b293216c6f..b9cdfb690d1 100644
--- a/gcc/testsuite/gcc.target/i386/sse-13.c
+++ b/gcc/testsuite/gcc.target/i386/sse-13.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -Werror-implicit-function-declaration -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha -mprefetchwt1 -mxsavec -mxsaves -mclflushopt -mavx512vl -mavx512dq -mavx512bw -mavx512vbmi -mavx512vbmi2 -mavx512ifma -mavx5124fmaps -mavx5124vnniw -mavx512vpopcntdq -mavx512vp2intersect -mclwb -mmwaitx -mclzero -mpku -msgx -mrdpid -mgfni -mavx512bitalg -mpconfig -mwbnoinvd -mavx512bf16 -menqcmd -mserialize -mtsxldtrk -mamx-tile -mamx-int8 -mamx-bf16 -mkl -mwidekl -mavxvnni -mavx512fp16 -mavxifma -mavxvnniint8" } */
+/* { dg-options "-O2 -Werror-implicit-function-declaration -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha -mprefetchwt1 -mxsavec -mxsaves -mclflushopt -mavx512vl -mavx512dq -mavx512bw -mavx512vbmi -mavx512vbmi2 -mavx512ifma -mavx5124fmaps -mavx5124vnniw -mavx512vpopcntdq -mavx512vp2intersect -mclwb -mmwaitx -mclzero -mpku -msgx -mrdpid -mgfni -mavx512bitalg -mpconfig -mwbnoinvd -mavx512bf16
-menqcmd -mserialize -mtsxldtrk -mamx-tile -mamx-int8 -mamx-bf16 -mkl -mwidekl -mavxvnni -mavx512fp16 -mavxifma -mavxvnniint8 -mavxneconvert" } */
 /* { dg-add-options bind_pic_locally } */
 #include
diff --git a/gcc/testsuite/gcc.target/i386/sse-14.c b/gcc/testsuite/gcc.target/i386/sse-14.c
index 78b51048b90..b6ee3806dcc 100644
--- a/gcc/testsuite/gcc.target/i386/sse-14.c
+++ b/gcc/testsuite/gcc.target/i386/sse-14.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O0 -Werror-implicit-function-declaration -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha -mprefetchwt1 -mxsavec -mxsaves -mclflushopt -mavx512dq -mavx512bw -mavx512vl -mavx512ifma -mavx512vbmi -mavx512vbmi2 -mavx5124fmaps -mavx5124vnniw -mavx512vpopcntdq -mclwb -mmwaitx -mclzero -mpku -msgx -mrdpid -mgfni -mpconfig -mwbnoinvd -mavx512vl -mavx512bf16 -menqcmd -mavx512vp2intersect -mserialize -mtsxldtrk -mamx-tile -mamx-int8 -mamx-bf16 -mkl -mwidekl -mavxvnni -mavx512fp16 -mavxifma -mavxvnniint8" } */
+/* { dg-options "-O0 -Werror-implicit-function-declaration -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha -mprefetchwt1 -mxsavec -mxsaves -mclflushopt -mavx512dq -mavx512bw -mavx512vl -mavx512ifma -mavx512vbmi -mavx512vbmi2 -mavx5124fmaps -mavx5124vnniw -mavx512vpopcntdq -mclwb -mmwaitx -mclzero -mpku -msgx -mrdpid -mgfni -mpconfig -mwbnoinvd -mavx512vl -mavx512bf16 -menqcmd -mavx512vp2intersect -mserialize -mtsxldtrk -mamx-tile -mamx-int8 -mamx-bf16 -mkl -mwidekl -mavxvnni -mavx512fp16 -mavxifma -mavxvnniint8 -mavxneconvert" } */
 /* { dg-add-options bind_pic_locally } */
 #include
diff --git
a/gcc/testsuite/gcc.target/i386/sse-22.c b/gcc/testsuite/gcc.target/i386/sse-22.c
index cc1c8cfa4be..71ac0f3da19 100644
--- a/gcc/testsuite/gcc.target/i386/sse-22.c
+++ b/gcc/testsuite/gcc.target/i386/sse-22.c
@@ -103,7 +103,7 @@
 #ifndef DIFFERENT_PRAGMAS
-#pragma GCC target ("sse4a,3dnow,avx,avx2,fma4,xop,aes,pclmul,popcnt,abm,lzcnt,bmi,bmi2,tbm,lwp,fsgsbase,rdrnd,f16c,rtm,rdseed,prfchw,adx,fxsr,xsaveopt,avx512f,avx512er,avx512cd,avx512pf,sha,prefetchwt1,avx512vl,avx512bw,avx512dq,avx512vbmi,avx512vbmi2,avx512ifma,avx5124fmaps,avx5124vnniw,avx512vpopcntdq,gfni,avx512bitalg,avx512bf16,avx512vp2intersect,serialize,tsxldtrk,amx-tile,amx-int8,amx-bf16,kl,widekl,avxvnni,avx512fp16,avxifma,avxvnniint8")
+#pragma GCC target ("sse4a,3dnow,avx,avx2,fma4,xop,aes,pclmul,popcnt,abm,lzcnt,bmi,bmi2,tbm,lwp,fsgsbase,rdrnd,f16c,rtm,rdseed,prfchw,adx,fxsr,xsaveopt,avx512f,avx512er,avx512cd,avx512pf,sha,prefetchwt1,avx512vl,avx512bw,avx512dq,avx512vbmi,avx512vbmi2,avx512ifma,avx5124fmaps,avx5124vnniw,avx512vpopcntdq,gfni,avx512bitalg,avx512bf16,avx512vp2intersect,serialize,tsxldtrk,amx-tile,amx-int8,amx-bf16,kl,widekl,avxvnni,avx512fp16,avxifma,avxvnniint8,avxneconvert")
 #endif
 /* Following intrinsics require immediate arguments.
They
@@ -220,7 +220,7 @@ test_4 (_mm_cmpestrz, int, __m128i, int, __m128i, int, 1)
 /* immintrin.h (AVX/AVX2/RDRND/FSGSBASE/F16C/RTM/AVX512F/SHA) */
 #ifdef DIFFERENT_PRAGMAS
-#pragma GCC target ("avx,avx2,rdrnd,fsgsbase,f16c,rtm,avx512f,avx512er,avx512cd,avx512pf,sha,avx512vl,avx512bw,avx512dq,avx512ifma,avx512vbmi,avx512vbmi2,avx5124fmaps,avx5124vnniw,avx512vpopcntdq,gfni,avx512bitalg,avx512bf16,avx512vp2intersect,serialize,tsxldtrk,amx-tile,amx-int8,amx-bf16,kl,widekl,avxvnni,avx512fp16,avxifma,avxvnniint8")
+#pragma GCC target ("avx,avx2,rdrnd,fsgsbase,f16c,rtm,avx512f,avx512er,avx512cd,avx512pf,sha,avx512vl,avx512bw,avx512dq,avx512ifma,avx512vbmi,avx512vbmi2,avx5124fmaps,avx5124vnniw,avx512vpopcntdq,gfni,avx512bitalg,avx512bf16,avx512vp2intersect,serialize,tsxldtrk,amx-tile,amx-int8,amx-bf16,kl,widekl,avxvnni,avx512fp16,avxifma,avxvnniint8,avxneconvert")
 #endif
 #include <immintrin.h>
 test_1 (_cvtss_sh, unsigned short, float, 1)
diff --git a/gcc/testsuite/gcc.target/i386/sse-23.c b/gcc/testsuite/gcc.target/i386/sse-23.c
index 270f4483491..898dde80c8f 100644
--- a/gcc/testsuite/gcc.target/i386/sse-23.c
+++ b/gcc/testsuite/gcc.target/i386/sse-23.c
@@ -843,6 +843,6 @@
 #define __builtin_ia32_vpclmulqdq_v2di(A, B, C) __builtin_ia32_vpclmulqdq_v2di(A, B, 1)
 #define __builtin_ia32_vpclmulqdq_v8di(A, B, C) __builtin_ia32_vpclmulqdq_v8di(A, B, 1)
-#pragma GCC target ("sse4a,3dnow,avx,avx2,fma4,xop,aes,pclmul,popcnt,abm,lzcnt,bmi,bmi2,tbm,lwp,fsgsbase,rdrnd,f16c,fma,rtm,rdseed,prfchw,adx,fxsr,xsaveopt,avx512f,avx512er,avx512cd,avx512pf,sha,prefetchwt1,xsavec,xsaves,clflushopt,avx512bw,avx512dq,avx512vl,avx512vbmi,avx512ifma,avx5124fmaps,avx5124vnniw,avx512vpopcntdq,clwb,mwaitx,clzero,pku,sgx,rdpid,gfni,avx512vbmi2,vpclmulqdq,avx512bitalg,pconfig,wbnoinvd,avx512bf16,enqcmd,avx512vp2intersect,serialize,tsxldtrk,amx-tile,amx-int8,amx-bf16,kl,widekl,avxvnni,avx512fp16,avxifma,avxvnniint8")
+#pragma GCC target
("sse4a,3dnow,avx,avx2,fma4,xop,aes,pclmul,popcnt,abm,lzcnt,bmi,bmi2,tbm,lwp,fsgsbase,rdrnd,f16c,fma,rtm,rdseed,prfchw,adx,fxsr,xsaveopt,avx512f,avx512er,avx512cd,avx512pf,sha,prefetchwt1,xsavec,xsaves,clflushopt,avx512bw,avx512dq,avx512vl,avx512vbmi,avx512ifma,avx5124fmaps,avx5124vnniw,avx512vpopcntdq,clwb,mwaitx,clzero,pku,sgx,rdpid,gfni,avx512vbmi2,vpclmulqdq,avx512bitalg,pconfig,wbnoinvd,avx512bf16,enqcmd,avx512vp2intersect,serialize,tsxldtrk,amx-tile,amx-int8,amx-bf16,kl,widekl,avxvnni,avx512fp16,avxifma,avxvnniint8,avxneconvert")
 #include
diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index 64ccfc746bd..9228e810c45 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -9530,6 +9530,18 @@ proc check_effective_target_avxvnniint8 { } {
     } "-O0 -mavxvnniint8" ]
 }
+# Return 1 if avxneconvert instructions can be compiled.
+proc check_effective_target_avxneconvert { } {
+    return [check_no_compiler_messages avxneconvert object {
+        typedef float __m128 __attribute__ ((__vector_size__ (16), __may_alias__));
+        __m128
+        _mm_bcstnebf16_ps (const void *__P)
+        {
+          return (__m128) __builtin_ia32_vbcstnebf162ps128 ((const short *) __P);
+        }
+    } "-O0 -mavxneconvert" ]
+}
+
 # Return 1 if sse instructions can be compiled.
 proc check_effective_target_sse { } {
     return [check_no_compiler_messages sse object {
From patchwork Fri Oct 14 07:54:44 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Jiang, Haochen"
X-Patchwork-Id: 2558
To: gcc-patches@gcc.gnu.org
Subject: [PATCH 5/6] Support Intel CMPccXADD
Date: Fri, 14 Oct 2022 15:54:44 +0800
Message-Id: <20221014075445.7938-6-haochen.jiang@intel.com>
In-Reply-To: <20221014075445.7938-1-haochen.jiang@intel.com>
References: <20221014075445.7938-1-haochen.jiang@intel.com>
From: "Jiang, Haochen"
Cc: hongtao.liu@intel.com
gcc/ChangeLog:

	* common/config/i386/cpuinfo.h (get_available_features):
	Detect cmpccxadd.
	* common/config/i386/i386-common.cc
	(OPTION_MASK_ISA2_CMPCCXADD_SET,
	OPTION_MASK_ISA2_CMPCCXADD_UNSET): New.
	(ix86_handle_option): Handle -mcmpccxadd.
	* common/config/i386/i386-cpuinfo.h (enum processor_features):
	Add FEATURE_CMPCCXADD.
	* common/config/i386/i386-isas.h: Add ISA_NAMES_TABLE_ENTRY for
	cmpccxadd.
	* config.gcc: Add cmpccxaddintrin.h.
	* config/i386/cpuid.h (bit_CMPCCXADD): New.
	* config/i386/i386-builtin-types.def:
	Add DEF_FUNCTION_TYPE(INT, PINT, INT, INT, INT)
	and DEF_FUNCTION_TYPE(LONGLONG, PLONGLONG, LONGLONG, LONGLONG, INT).
	* config/i386/i386-builtin.def (BDESC): Add new builtins.
	* config/i386/i386-c.cc (ix86_target_macros_internal): Define
	__CMPCCXADD__.
	* config/i386/i386-expand.cc (ix86_expand_special_args_builtin):
	Add new variable to indicate constant position.  Handle
	INT_FTYPE_PINT_INT_INT_INT and
	LONGLONG_FTYPE_PLONGLONG_LONGLONG_LONGLONG_INT.
	* config/i386/i386-isa.def (CMPCCXADD): Add DEF_PTA(CMPCCXADD).
	* config/i386/i386-options.cc (isa2_opts): Add -mcmpccxadd.
	(ix86_valid_target_attribute_inner_p): Handle cmpccxadd.
	* config/i386/i386.opt: Add option -mcmpccxadd.
	* config/i386/sync.md (@cmpccxadd_<mode>_1): New define_insn.
	(cmpccxadd_<mode>): New define_expand.
	* config/i386/x86gprintrin.h: Include cmpccxaddintrin.h.
	* doc/extend.texi: Document cmpccxadd.
	* doc/invoke.texi: Document -mcmpccxadd.
	* doc/sourcebuild.texi: Document target cmpccxadd.
	* config/i386/cmpccxaddintrin.h: New file.

gcc/testsuite/ChangeLog:

	* g++.dg/other/i386-2.C: Add -mcmpccxadd.
	* g++.dg/other/i386-3.C: Ditto.
	* gcc.target/i386/avx-1.c: Add builtin define for enum.
	* gcc.target/i386/funcspec-56.inc: Add new target attribute.
	* gcc.target/i386/sse-13.c: Add builtin define for enum.
	* gcc.target/i386/sse-23.c: Ditto.
	* gcc.target/i386/x86gprintrin-1.c: Add -mcmpccxadd for 64 bit
	target.
	* gcc.target/i386/x86gprintrin-2.c: Add -mcmpccxadd for 64 bit
	target.  Add builtin define for enum.
	* gcc.target/i386/x86gprintrin-3.c: Add -mcmpccxadd for 64 bit
	target.
	* gcc.target/i386/x86gprintrin-4.c: Add -mcmpccxadd for 64 bit
	target.
	* gcc.target/i386/x86gprintrin-5.c: Add -mcmpccxadd for 64 bit
	target.  Add builtin define for enum.
	* gcc.target/i386/cmpccxadd-1.c: New test.
	* gcc.target/i386/cmpccxadd-2.c: New test.
---
 gcc/common/config/i386/cpuinfo.h | 2 +
 gcc/common/config/i386/i386-common.cc | 15 ++
 gcc/common/config/i386/i386-cpuinfo.h | 1 +
 gcc/common/config/i386/i386-isas.h | 1 +
 gcc/config.gcc | 3 +-
 gcc/config/i386/cmpccxaddintrin.h | 89 +++++++++++
 gcc/config/i386/cpuid.h | 1 +
 gcc/config/i386/i386-builtin-types.def | 4 +
 gcc/config/i386/i386-builtin.def | 4 +
 gcc/config/i386/i386-c.cc | 2 +
 gcc/config/i386/i386-expand.cc | 22 ++-
 gcc/config/i386/i386-isa.def | 1 +
 gcc/config/i386/i386-options.cc | 4 +-
 gcc/config/i386/i386.opt | 5 +
 gcc/config/i386/sync.md | 42 ++++++
 gcc/config/i386/x86gprintrin.h | 2 +
 gcc/doc/extend.texi | 5 +
 gcc/doc/invoke.texi | 10 +-
 gcc/doc/sourcebuild.texi | 3 +
 gcc/testsuite/g++.dg/other/i386-2.C | 2 +-
 gcc/testsuite/g++.dg/other/i386-3.C | 2 +-
 gcc/testsuite/gcc.target/i386/avx-1.c | 4 +
 gcc/testsuite/gcc.target/i386/cmpccxadd-1.c | 61 ++++++++
 gcc/testsuite/gcc.target/i386/cmpccxadd-2.c | 138 ++++++++++++++++++
 gcc/testsuite/gcc.target/i386/funcspec-56.inc | 2 +
 gcc/testsuite/gcc.target/i386/sse-13.c | 6 +-
 gcc/testsuite/gcc.target/i386/sse-23.c | 6 +-
 .../gcc.target/i386/x86gprintrin-1.c | 2 +-
 .../gcc.target/i386/x86gprintrin-2.c | 6 +-
 .../gcc.target/i386/x86gprintrin-3.c | 2 +-
 .../gcc.target/i386/x86gprintrin-4.c | 2 +-
 .../gcc.target/i386/x86gprintrin-5.c | 6 +-
 gcc/testsuite/lib/target-supports.exp | 10 ++
 33 files changed, 450 insertions(+), 15 deletions(-)
 create mode 100644 gcc/config/i386/cmpccxaddintrin.h
 create mode 100644 gcc/testsuite/gcc.target/i386/cmpccxadd-1.c
 create mode 100644
gcc/testsuite/gcc.target/i386/cmpccxadd-2.c

diff --git a/gcc/common/config/i386/cpuinfo.h b/gcc/common/config/i386/cpuinfo.h
index e9fd586704d..f73834b086c 100644
--- a/gcc/common/config/i386/cpuinfo.h
+++ b/gcc/common/config/i386/cpuinfo.h
@@ -789,6 +789,8 @@ get_available_features (struct __processor_model *cpu_model,
       __cpuid_count (7, 1, eax, ebx, ecx, edx);
       if (eax & bit_HRESET)
         set_feature (FEATURE_HRESET);
+      if (eax & bit_CMPCCXADD)
+        set_feature (FEATURE_CMPCCXADD);
       if (avx_usable)
         {
           if (eax & bit_AVXVNNI)
diff --git a/gcc/common/config/i386/i386-common.cc b/gcc/common/config/i386/i386-common.cc
index f9c906f75cf..75966779d82 100644
--- a/gcc/common/config/i386/i386-common.cc
+++ b/gcc/common/config/i386/i386-common.cc
@@ -110,6 +110,7 @@ along with GCC; see the file COPYING3. If not see
 #define OPTION_MASK_ISA2_AMX_BF16_SET OPTION_MASK_ISA2_AMX_BF16
 #define OPTION_MASK_ISA2_AVXVNNIINT8_SET OPTION_MASK_ISA2_AVXVNNIINT8
 #define OPTION_MASK_ISA2_AVXNECONVERT_SET OPTION_MASK_ISA2_AVXNECONVERT
+#define OPTION_MASK_ISA2_CMPCCXADD_SET OPTION_MASK_ISA2_CMPCCXADD
 /* SSE4 includes both SSE4.1 and SSE4.2. -msse4 should be the same as -msse4.2. */
@@ -283,6 +284,7 @@ along with GCC; see the file COPYING3. If not see
 #define OPTION_MASK_ISA2_WIDEKL_UNSET OPTION_MASK_ISA2_WIDEKL
 #define OPTION_MASK_ISA2_AVXVNNIINT8_UNSET OPTION_MASK_ISA2_AVXVNNIINT8
 #define OPTION_MASK_ISA2_AVXNECONVERT_UNSET OPTION_MASK_ISA2_AVXNECONVERT
+#define OPTION_MASK_ISA2_CMPCCXADD_UNSET OPTION_MASK_ISA2_CMPCCXADD
 /* SSE4 includes both SSE4.1 and SSE4.2. -mno-sse4 should the same as -mno-sse4.1.
*/
@@ -1181,6 +1183,19 @@ ix86_handle_option (struct gcc_options *opts,
         }
       return true;
+    case OPT_mcmpccxadd:
+      if (value)
+        {
+          opts->x_ix86_isa_flags2 |= OPTION_MASK_ISA2_CMPCCXADD_SET;
+          opts->x_ix86_isa_flags2_explicit |= OPTION_MASK_ISA2_CMPCCXADD_SET;
+        }
+      else
+        {
+          opts->x_ix86_isa_flags2 &= ~OPTION_MASK_ISA2_CMPCCXADD_UNSET;
+          opts->x_ix86_isa_flags2_explicit |= OPTION_MASK_ISA2_CMPCCXADD_UNSET;
+        }
+      return true;
+
     case OPT_mfma:
       if (value)
         {
diff --git a/gcc/common/config/i386/i386-cpuinfo.h b/gcc/common/config/i386/i386-cpuinfo.h
index 2d3fbfc817a..5a61d817007 100644
--- a/gcc/common/config/i386/i386-cpuinfo.h
+++ b/gcc/common/config/i386/i386-cpuinfo.h
@@ -243,6 +243,7 @@ enum processor_features
   FEATURE_AVXIFMA,
   FEATURE_AVXVNNIINT8,
   FEATURE_AVXNECONVERT,
+  FEATURE_CMPCCXADD,
   CPU_FEATURE_MAX
 };
diff --git a/gcc/common/config/i386/i386-isas.h b/gcc/common/config/i386/i386-isas.h
index bceaee589ee..3035e4a8186 100644
--- a/gcc/common/config/i386/i386-isas.h
+++ b/gcc/common/config/i386/i386-isas.h
@@ -180,4 +180,5 @@ ISA_NAMES_TABLE_START
     P_NONE, "-mavxvnniint8")
   ISA_NAMES_TABLE_ENTRY("avxneconvert", FEATURE_AVXNECONVERT,
     P_NONE, "-mavxneconvert")
+  ISA_NAMES_TABLE_ENTRY("cmpccxadd", FEATURE_CMPCCXADD, P_NONE, "-mcmpccxadd")
 ISA_NAMES_TABLE_END
diff --git a/gcc/config.gcc b/gcc/config.gcc
index 840b62aee61..fe063bfbb26 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -422,7 +422,8 @@ i[34567]86-*-* | x86_64-*-*)
                        amxbf16intrin.h x86gprintrin.h uintrintrin.h
                        hresetintrin.h keylockerintrin.h avxvnniintrin.h
                        mwaitintrin.h avx512fp16intrin.h avx512fp16vlintrin.h
-                       avxifmaintrin.h avxvnniint8intrin.h avxneconvertintrin.h"
+                       avxifmaintrin.h avxvnniint8intrin.h avxneconvertintrin.h
+                       cmpccxaddintrin.h"
         ;;
 ia64-*-*)
         extra_headers=ia64intrin.h
diff --git a/gcc/config/i386/cmpccxaddintrin.h b/gcc/config/i386/cmpccxaddintrin.h
new file mode 100644
index 00000000000..74ae015476d
--- /dev/null
+++ b/gcc/config/i386/cmpccxaddintrin.h
@@ -0,0 +1,89 @@
+/* Copyright
(C) 2012-2021 Free Software Foundation, Inc.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   GCC is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#ifndef _X86GPRINTRIN_H_INCLUDED
+#error "Never use <cmpccxaddintrin.h> directly; include <x86gprintrin.h> instead."
+#endif
+
+#ifndef _CMPCCXADDINTRIN_H_INCLUDED
+#define _CMPCCXADDINTRIN_H_INCLUDED
+
+#ifdef __x86_64__
+
+#ifndef __CMPCCXADD__
+#pragma GCC push_options
+#pragma GCC target("cmpccxadd")
+#define __DISABLE_CMPCCXADD__
+#endif /* __CMPCCXADD__ */
+
+typedef enum {
+    _CMPCCX_BE,  /* Below or equal.  */
+    _CMPCCX_B,   /* Below.  */
+    _CMPCCX_LE,  /* Less or equal.  */
+    _CMPCCX_L,   /* Less.  */
+    _CMPCCX_NBE, /* Neither below nor equal.  */
+    _CMPCCX_NB,  /* Not below.  */
+    _CMPCCX_NLE, /* Neither less nor equal.  */
+    _CMPCCX_NL,  /* Not less.  */
+    _CMPCCX_NO,  /* No overflow.  */
+    _CMPCCX_NP,  /* No parity.  */
+    _CMPCCX_NS,  /* No sign.  */
+    _CMPCCX_NZ,  /* Not zero.  */
+    _CMPCCX_O,   /* Overflow.  */
+    _CMPCCX_P,   /* Parity.  */
+    _CMPCCX_S,   /* Sign.  */
+    _CMPCCX_Z,   /* Zero.
*/
+} _CMPCCX_ENUM;
+
+#ifdef __OPTIMIZE__
+extern __inline int
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+__cmpccxadd_epi32 (int *__A, int __B, int __C, const _CMPCCX_ENUM __D)
+{
+  return __builtin_ia32_cmpccxadd (__A, __B, __C, __D);
+}
+
+extern __inline long long
+__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
+__cmpccxadd_epi64 (long long *__A, long long __B, long long __C,
+                   const _CMPCCX_ENUM __D)
+{
+  return __builtin_ia32_cmpccxadd64 (__A, __B, __C, __D);
+}
+#else
+#define __cmpccxadd_epi32(A,B,C,D) \
+__builtin_ia32_cmpccxadd((int *) (A), (int) (B), (int) (C), \
+                         (_CMPCCX_ENUM)(D))
+#define __cmpccxadd_epi64(A,B,C,D) \
+__builtin_ia32_cmpccxadd64((long long *) (A), (long long) (B), \
+                           (long long) (C), (_CMPCCX_ENUM)(D))
+#endif
+
+#ifdef __DISABLE_CMPCCXADD__
+#undef __DISABLE_CMPCCXADD__
+#pragma GCC pop_options
+#endif /* __DISABLE_CMPCCXADD__ */
+
+#endif
+
+#endif /* _CMPCCXADDINTRIN_H_INCLUDED */
diff --git a/gcc/config/i386/cpuid.h b/gcc/config/i386/cpuid.h
index 18bbc0cb7be..19c0d033921 100644
--- a/gcc/config/i386/cpuid.h
+++ b/gcc/config/i386/cpuid.h
@@ -27,6 +27,7 @@
 /* %eax */
 #define bit_AVXVNNI (1 << 4)
 #define bit_AVX512BF16 (1 << 5)
+#define bit_CMPCCXADD (1 << 7)
 #define bit_HRESET (1 << 22)
 #define bit_AVXIFMA (1 << 23)
diff --git a/gcc/config/i386/i386-builtin-types.def b/gcc/config/i386/i386-builtin-types.def
index ebf6e5b4ad8..922348fcd60 100644
--- a/gcc/config/i386/i386-builtin-types.def
+++ b/gcc/config/i386/i386-builtin-types.def
@@ -1406,3 +1406,7 @@ DEF_FUNCTION_TYPE (V8BF, V8SF)
 DEF_FUNCTION_TYPE (V8BF, V4SF)
 DEF_FUNCTION_TYPE (V4SF, PCV8BF)
 DEF_FUNCTION_TYPE (V8SF, PCV16BF)
+
+# CMPccXADD builtins
+DEF_FUNCTION_TYPE (INT, PINT, INT, INT, INT)
+DEF_FUNCTION_TYPE (LONGLONG, PLONGLONG, LONGLONG, LONGLONG, INT)
diff --git a/gcc/config/i386/i386-builtin.def b/gcc/config/i386/i386-builtin.def
index a429577180c..d4d4fda1d4a 100644
--- a/gcc/config/i386/i386-builtin.def
+++
b/gcc/config/i386/i386-builtin.def
@@ -288,6 +288,10 @@ BDESC (0, OPTION_MASK_ISA2_AVXNECONVERT, CODE_FOR_vcvtneobf162ps_v8sf, "__builti
 BDESC (0, OPTION_MASK_ISA2_AVXNECONVERT, CODE_FOR_vcvtneoph2ps_v4sf, "__builtin_ia32_vcvtneoph2ps128", IX86_BUILTIN_VCVTNEOPH2PS128, UNKNOWN, (int) V4SF_FTYPE_PCV8HF)
 BDESC (0, OPTION_MASK_ISA2_AVXNECONVERT, CODE_FOR_vcvtneoph2ps_v8sf, "__builtin_ia32_vcvtneoph2ps256", IX86_BUILTIN_VCVTNEOPH2PS256, UNKNOWN, (int) V8SF_FTYPE_PCV16HF)
+/* CMPCCXADD */
+BDESC (OPTION_MASK_ISA_64BIT, OPTION_MASK_ISA2_CMPCCXADD, CODE_FOR_cmpccxadd_si, "__builtin_ia32_cmpccxadd", IX86_BUILTIN_CMPCCXADD, UNKNOWN, (int) INT_FTYPE_PINT_INT_INT_INT)
+BDESC (OPTION_MASK_ISA_64BIT, OPTION_MASK_ISA2_CMPCCXADD, CODE_FOR_cmpccxadd_di, "__builtin_ia32_cmpccxadd64", IX86_BUILTIN_CMPCCXADD64, UNKNOWN, (int) LONGLONG_FTYPE_PLONGLONG_LONGLONG_LONGLONG_INT)
+
 /* AVX512BW */
 BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_avx512bw_loadv32hi_mask, "__builtin_ia32_loaddquhi512_mask", IX86_BUILTIN_LOADDQUHI512_MASK, UNKNOWN, (int) V32HI_FTYPE_PCSHORT_V32HI_USI)
 BDESC (OPTION_MASK_ISA_AVX512BW, 0, CODE_FOR_avx512bw_loadv64qi_mask, "__builtin_ia32_loaddquqi512_mask", IX86_BUILTIN_LOADDQUQI512_MASK, UNKNOWN, (int) V64QI_FTYPE_PCCHAR_V64QI_UDI)
diff --git a/gcc/config/i386/i386-c.cc b/gcc/config/i386/i386-c.cc
index 48934df664c..9885a724d0f 100644
--- a/gcc/config/i386/i386-c.cc
+++ b/gcc/config/i386/i386-c.cc
@@ -639,6 +639,8 @@ ix86_target_macros_internal (HOST_WIDE_INT isa_flag,
     def_or_undef (parse_in, "__AVXVNNIINT8__");
   if (isa_flag2 & OPTION_MASK_ISA2_AVXNECONVERT)
     def_or_undef (parse_in, "__AVXNECONVERT__");
+  if (isa_flag2 & OPTION_MASK_ISA2_CMPCCXADD)
+    def_or_undef (parse_in, "__CMPCCXADD__");
   if (TARGET_IAMCU)
     {
       def_or_undef (parse_in, "__iamcu");
diff --git a/gcc/config/i386/i386-expand.cc b/gcc/config/i386/i386-expand.cc
index 1e29fe584af..cad2eb728fd 100644
--- a/gcc/config/i386/i386-expand.cc
+++ b/gcc/config/i386/i386-expand.cc
@@ -11825,8 +11825,9 @@
ix86_expand_special_args_builtin (const struct builtin_description *d,
   tree arg;
   rtx pat, op;
   unsigned int i, nargs, arg_adjust, memory;
+  unsigned int constant = 100;
   bool aligned_mem = false;
-  rtx xops[3];
+  rtx xops[4];
   enum insn_code icode = d->icode;
   const struct insn_data_d *insn_p = &insn_data[icode];
   machine_mode tmode = insn_p->operand[0].mode;
@@ -12115,6 +12116,13 @@ ix86_expand_special_args_builtin (const struct builtin_description *d,
       klass = load;
       memory = 0;
       break;
+    case INT_FTYPE_PINT_INT_INT_INT:
+    case LONGLONG_FTYPE_PLONGLONG_LONGLONG_LONGLONG_INT:
+      nargs = 4;
+      klass = load;
+      memory = 0;
+      constant = 3;
+      break;
     default:
       gcc_unreachable ();
     }
@@ -12180,6 +12188,15 @@ ix86_expand_special_args_builtin (const struct builtin_description *d,
           if (MEM_ALIGN (op) < align)
             set_mem_align (op, align);
         }
+      else if (i == constant)
+        {
+          /* This must be the constant.  */
+          if (!insn_p->operand[nargs].predicate (op, SImode))
+            {
+              error ("the fourth argument must be one of enum %qs", "_CMPCCX_ENUM");
+              return const0_rtx;
+            }
+        }
       else
         {
           /* This must be register.
*/ @@ -12221,6 +12238,9 @@ ix86_expand_special_args_builtin (const struct builtin_description *d, case 3: pat = GEN_FCN (icode) (target, xops[0], xops[1], xops[2]); break; + case 4: + pat = GEN_FCN (icode) (target, xops[0], xops[1], xops[2], xops[3]); + break; default: gcc_unreachable (); } diff --git a/gcc/config/i386/i386-isa.def b/gcc/config/i386/i386-isa.def index 4ea3f96f69f..7ffc73ba23e 100644 --- a/gcc/config/i386/i386-isa.def +++ b/gcc/config/i386/i386-isa.def @@ -112,3 +112,4 @@ DEF_PTA(AVX512FP16) DEF_PTA(AVXIFMA) DEF_PTA(AVXVNNIINT8) DEF_PTA(AVXNECONVERT) +DEF_PTA(CMPCCXADD) diff --git a/gcc/config/i386/i386-options.cc b/gcc/config/i386/i386-options.cc index e59e2d8aeaf..fb872afdfb5 100644 --- a/gcc/config/i386/i386-options.cc +++ b/gcc/config/i386/i386-options.cc @@ -229,7 +229,8 @@ static struct ix86_target_opts isa2_opts[] = { "-mavx512fp16", OPTION_MASK_ISA2_AVX512FP16 }, { "-mavxifma", OPTION_MASK_ISA2_AVXIFMA }, { "-mavxvnniint8", OPTION_MASK_ISA2_AVXVNNIINT8 }, - { "-mavxneconvert", OPTION_MASK_ISA2_AVXNECONVERT } + { "-mavxneconvert", OPTION_MASK_ISA2_AVXNECONVERT }, + { "-mcmpccxadd", OPTION_MASK_ISA2_CMPCCXADD } }; static struct ix86_target_opts isa_opts[] = { @@ -1078,6 +1079,7 @@ ix86_valid_target_attribute_inner_p (tree fndecl, tree args, char *p_strings[], IX86_ATTR_ISA ("avxifma", OPT_mavxifma), IX86_ATTR_ISA ("avxvnniint8", OPT_mavxvnniint8), IX86_ATTR_ISA ("avxneconvert", OPT_mavxneconvert), + IX86_ATTR_ISA ("cmpccxadd", OPT_mcmpccxadd), /* enum options */ IX86_ATTR_ENUM ("fpmath=", OPT_mfpmath_), diff --git a/gcc/config/i386/i386.opt b/gcc/config/i386/i386.opt index 6e07b89ac4c..c4a3bdcf960 100644 --- a/gcc/config/i386/i386.opt +++ b/gcc/config/i386/i386.opt @@ -1229,3 +1229,8 @@ mavxneconvert Target Mask(ISA2_AVXNECONVERT) Var(ix86_isa_flags2) Save Support MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AVX, AVX2, and AVXNECONVERT build-in functions and code generation. 
+ +mcmpccxadd +Target Mask(ISA2_CMPCCXADD) Var(ix86_isa_flags2) Save +Support MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AVX, AVX2, and +CMPCCXADD built-in functions and code generation. diff --git a/gcc/config/i386/sync.md b/gcc/config/i386/sync.md index 92634d538cb..2b6f2f4c826 100644 --- a/gcc/config/i386/sync.md +++ b/gcc/config/i386/sync.md @@ -37,6 +37,9 @@ UNSPECV_CMPXCHG UNSPECV_XCHG UNSPECV_LOCK + + ;; For CMPccXADD support + UNSPECV_CMPCCXADD ]) (define_expand "sse2_lfence" @@ -1061,3 +1064,42 @@ (any_logic:SWI (match_dup 0) (match_dup 1)))] "" "lock{%;} %K2{}\t{%1, %0|%0, %1}") + +;; CMPCCXADD + +(define_insn "@cmpccxadd__1" + [(set (match_operand:SWI48x 1 "register_operand" "+r") + (match_operand:SWI48x 0 "memory_operand" "+m")) + (set (match_dup 0) + (unspec_volatile:SWI48x + [(match_dup 0) + (match_dup 1) + (match_operand:SWI48x 2 "register_operand" "r") + (match_operand:SI 3 "const_0_to_15_operand" "n")] + UNSPECV_CMPCCXADD)) + (clobber (reg:CC FLAGS_REG))] + "TARGET_CMPCCXADD && TARGET_64BIT" +{ + char buf[128]; + const char *ops = "cmp%sxadd\t{%%2, %%1, %%0|%%0, %%1, %%2}"; + char const *cc[16] = {"be", "b", "le", "l", "nbe", "nb", "nle", "nl", + "no", "np", "ns", "nz", "o", "p", "s", "z"}; + + snprintf (buf, sizeof (buf), ops, cc[INTVAL (operands[3])]); + output_asm_insn (buf, operands); + return ""; +}) + +(define_expand "cmpccxadd_" + [(match_operand:SWI48x 0 "register_operand") + (match_operand:SWI48x 1 "memory_operand") + (match_operand:SWI48x 2 "register_operand") + (match_operand:SWI48x 3 "register_operand") + (match_operand:SI 4 "const_0_to_15_operand")] + "TARGET_CMPCCXADD && TARGET_64BIT" +{ + emit_insn (gen_cmpccxadd_1 (mode, operands[1], operands[2], + operands[3], operands[4])); + emit_move_insn (operands[0], operands[2]); + DONE; +}) diff --git a/gcc/config/i386/x86gprintrin.h b/gcc/config/i386/x86gprintrin.h index e0be01d5e78..a84fbe9137d 100644 --- a/gcc/config/i386/x86gprintrin.h +++ b/gcc/config/i386/x86gprintrin.h @@ -52,6 
+52,8 @@ #include +#include + #include #include diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi index 0a4396f92bb..34c23240dfb 100644 --- a/gcc/doc/extend.texi +++ b/gcc/doc/extend.texi @@ -7075,6 +7075,11 @@ Enable/disable the generation of the AVXVNNIINT8 instructions. @cindex @code{target("avxneconvert")} function attribute, x86 Enable/disable the generation of the AVXNECONVERT instructions. +@item cmpccxadd +@itemx no-cmpccxadd +@cindex @code{target("cmpccxadd")} function attribute, x86 +Enable/disable the generation of the CMPccXADD instructions. + @item cld @itemx no-cld @cindex @code{target("cld")} function attribute, x86 diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi index 307fb7fa441..cbbc0201828 100644 --- a/gcc/doc/invoke.texi +++ b/gcc/doc/invoke.texi @@ -1436,7 +1436,7 @@ See RS/6000 and PowerPC Options. -mavx5124fmaps -mavx512vnni -mavx5124vnniw -mprfchw -mrdpid @gol -mrdseed -msgx -mavx512vp2intersect -mserialize -mtsxldtrk@gol -mamx-tile -mamx-int8 -mamx-bf16 -muintr -mhreset -mavxvnni@gol --mavx512fp16 -mavxifma -mavxvnniint8 -mavxneconvert @gol +-mavx512fp16 -mavxifma -mavxvnniint8 -mavxneconvert -mcmpccxadd @gol -mcldemote -mms-bitfields -mno-align-stringops -minline-all-stringops @gol -minline-stringops-dynamically -mstringop-strategy=@var{alg} @gol -mkl -mwidekl @gol @@ -32902,6 +32902,9 @@ preferred alignment to @option{-mpreferred-stack-boundary=2}. 
@need 200 @itemx -mavxneconvert @opindex mavxneconvert +@need 200 +@itemx -mcmpccxadd +@opindex mcmpccxadd These switches enable the use of instructions in the MMX, SSE, SSE2, SSE3, SSSE3, SSE4, SSE4A, SSE4.1, SSE4.2, AVX, AVX2, AVX512F, AVX512PF, AVX512ER, AVX512CD, AVX512VL, AVX512BW, AVX512DQ, AVX512IFMA, AVX512VBMI, SHA, @@ -32912,8 +32915,9 @@ XSAVEOPT, XSAVEC, XSAVES, RTM, HLE, TBM, MWAITX, CLZERO, PKU, AVX512VBMI2, GFNI, VAES, WAITPKG, VPCLMULQDQ, AVX512BITALG, MOVDIRI, MOVDIR64B, AVX512BF16, ENQCMD, AVX512VPOPCNTDQ, AVX5124FMAPS, AVX512VNNI, AVX5124VNNIW, SERIALIZE, UINTR, HRESET, AMXTILE, AMXINT8, AMXBF16, KL, WIDEKL, AVXVNNI, AVX512FP16, -AVXIFMA, AVXVNNIINT8, AVXNECONVERT or CLDEMOTE extended instruction sets. Each -has a corresponding @option{-mno-} option to disable use of these instructions. +AVXIFMA, AVXVNNIINT8, AVXNECONVERT, CMPCCXADD or CLDEMOTE extended instruction +sets. Each has a corresponding @option{-mno-} option to disable use of these +instructions. These extensions are also available as built-in functions: see @ref{x86 Built-in Functions}, for details of the functions enabled and diff --git a/gcc/doc/sourcebuild.texi b/gcc/doc/sourcebuild.texi index a12175b6498..714595d33bf 100644 --- a/gcc/doc/sourcebuild.texi +++ b/gcc/doc/sourcebuild.texi @@ -2511,6 +2511,9 @@ Target supports the execution of @code{amx-bf16} instructions. @item cell_hw Test system can execute AltiVec and Cell PPU instructions. +@item cmpccxadd +Target supports the execution of @code{cmpccxadd} instructions. + @item coldfire_fpu Target uses a ColdFire FPU. 
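The expected values in the run test cmpccxadd-2.c follow a simple model of the instruction: the flags are those of comparing the memory value with the second argument (memory minus comparand, as CMP would compute), the memory word becomes memory plus the third argument only when the selected condition holds, and the original memory value is returned either way. The C sketch below restates that model for the 32-bit case so the test's table of expected results can be checked by hand. It is a hypothetical, non-atomic reference model and not part of the patch: the `ref_cc` enum and the `ref_cond_holds`/`ref_cmpccxadd32` names are invented for illustration, and the enum's numeric values are not claimed to match `_CMPCCX_ENUM` in the intrinsic header. The instruction itself is intended to perform the read-modify-write atomically.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical condition selector for this sketch only.  The names mirror
   the sixteen CMPccXADD condition mnemonics; the numeric values are NOT
   claimed to match _CMPCCX_ENUM.  */
typedef enum
{
  CC_O, CC_NO, CC_B, CC_NB, CC_Z, CC_NZ, CC_BE, CC_NBE,
  CC_S, CC_NS, CC_P, CC_NP, CC_L, CC_NL, CC_LE, CC_NLE
} ref_cc;

/* Return true if COND holds for the flags that "cmp mem, cmp_val"
   (i.e. mem - cmp_val) would produce.  */
static bool
ref_cond_holds (int32_t mem, int32_t cmp_val, ref_cc cond)
{
  uint32_t a = (uint32_t) mem, b = (uint32_t) cmp_val;
  uint32_t diff = a - b;              /* Wraps modulo 2^32.  */
  bool cf = a < b;                    /* Unsigned borrow.  */
  bool zf = diff == 0;
  bool sf = (diff >> 31) & 1u;
  /* Signed overflow: the operands differ in sign and the result's sign
     differs from the minuend's.  */
  bool of = (((a ^ b) & (a ^ diff)) >> 31) & 1u;
  /* PF: even number of 1 bits in the low byte of the result.  */
  unsigned ones = 0;
  for (unsigned v = diff & 0xffu; v != 0; v >>= 1)
    ones += v & 1u;
  bool pf = (ones % 2) == 0;

  switch (cond)
    {
    case CC_O:   return of;
    case CC_NO:  return !of;
    case CC_B:   return cf;
    case CC_NB:  return !cf;
    case CC_Z:   return zf;
    case CC_NZ:  return !zf;
    case CC_BE:  return cf || zf;
    case CC_NBE: return !cf && !zf;
    case CC_S:   return sf;
    case CC_NS:  return !sf;
    case CC_P:   return pf;
    case CC_NP:  return !pf;
    case CC_L:   return sf != of;
    case CC_NL:  return sf == of;
    case CC_LE:  return zf || (sf != of);
    case CC_NLE: return !zf && (sf == of);
    }
  return false;
}

/* Non-atomic model of the 32-bit operation: always returns the original
   value of *MEM, and stores *MEM + ADD (wrapping) only if COND holds.  */
static int32_t
ref_cmpccxadd32 (int32_t *mem, int32_t cmp_val, int32_t add, ref_cc cond)
{
  int32_t old = *mem;
  if (ref_cond_holds (old, cmp_val, cond))
    *mem = (int32_t) ((uint32_t) old + (uint32_t) add);
  return old;
}
```

The arithmetic is done on `uint32_t` so that inputs such as the overflow case in the test (INT_MIN compared with 1) wrap without undefined behaviour; converting the sum back to `int32_t` relies on GCC's documented modulo reduction for out-of-range integer conversions.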
diff --git a/gcc/testsuite/g++.dg/other/i386-2.C b/gcc/testsuite/g++.dg/other/i386-2.C index dd3e71f25ed..f7dbbbbf619 100644 --- a/gcc/testsuite/g++.dg/other/i386-2.C +++ b/gcc/testsuite/g++.dg/other/i386-2.C @@ -1,5 +1,5 @@ /* { dg-do compile { target i?86-*-* x86_64-*-* } } */ -/* { dg-options "-O -pedantic-errors -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha -mprefetchwt1 -mxsavec -mxsaves -mclflushopt -mavx512dq -mavx512bw -mavx512vl -mavx512ifma -mavx512vbmi -mavx512vbmi2 -mavx5124fmaps -mavx5124vnniw -mavx512vpopcntdq -mclwb -mmwaitx -mclzero -mpku -msgx -mrdpid -mgfni -mavx512bitalg -mpconfig -mwbnoinvd -mavx512bf16 -menqcmd -mavx512vp2intersect -mserialize -mtsxldtrk -mamx-tile -mamx-int8 -mamx-bf16 -mkl -mwidekl -mavxvnni -mavx512fp16 -mavxifma -mavxvnniint8 -mavxneconvert" } */ +/* { dg-options "-O -pedantic-errors -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha -mprefetchwt1 -mxsavec -mxsaves -mclflushopt -mavx512dq -mavx512bw -mavx512vl -mavx512ifma -mavx512vbmi -mavx512vbmi2 -mavx5124fmaps -mavx5124vnniw -mavx512vpopcntdq -mclwb -mmwaitx -mclzero -mpku -msgx -mrdpid -mgfni -mavx512bitalg -mpconfig -mwbnoinvd -mavx512bf16 -menqcmd -mavx512vp2intersect -mserialize -mtsxldtrk -mamx-tile -mamx-int8 -mamx-bf16 -mkl -mwidekl -mavxvnni -mavx512fp16 -mavxifma -mavxvnniint8 -mavxneconvert -mcmpccxadd" } */ /* Test that {,x,e,p,t,s,w,a,b,i}mmintrin.h, mm3dnow.h, fma4intrin.h, xopintrin.h, abmintrin.h, bmiintrin.h, tbmintrin.h, lwpintrin.h, diff --git a/gcc/testsuite/g++.dg/other/i386-3.C b/gcc/testsuite/g++.dg/other/i386-3.C index cd7045cc4e4..2ac5d9f2df5 100644 --- 
a/gcc/testsuite/g++.dg/other/i386-3.C +++ b/gcc/testsuite/g++.dg/other/i386-3.C @@ -1,5 +1,5 @@ /* { dg-do compile { target i?86-*-* x86_64-*-* } } */ -/* { dg-options "-O -fkeep-inline-functions -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha -mprefetchwt1 -mxsavec -mxsaves -mclflushopt -mavx512dq -mavx512bw -mavx512vl -mavx512ifma -mavx512vbmi -mavx512vbmi2 -mavx5124fmaps -mavx5124vnniw -mavx512vpopcntdq -mclwb -mmwaitx -mclzero -mpku -msgx -mrdpid -mgfni -mavx512bitalg -mpconfig -mwbnoinvd -mavx512bf16 -menqcmd -mavx512vp2intersect -mserialize -mtsxldtrk -mamx-tile -mamx-int8 -mamx-bf16 -mkl -mwidekl -mavxvnni -mavx512fp16 -mavxifma -mavxvnniint8 -mavxneconvert" } */ +/* { dg-options "-O -fkeep-inline-functions -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha -mprefetchwt1 -mxsavec -mxsaves -mclflushopt -mavx512dq -mavx512bw -mavx512vl -mavx512ifma -mavx512vbmi -mavx512vbmi2 -mavx5124fmaps -mavx5124vnniw -mavx512vpopcntdq -mclwb -mmwaitx -mclzero -mpku -msgx -mrdpid -mgfni -mavx512bitalg -mpconfig -mwbnoinvd -mavx512bf16 -menqcmd -mavx512vp2intersect -mserialize -mtsxldtrk -mamx-tile -mamx-int8 -mamx-bf16 -mkl -mwidekl -mavxvnni -mavx512fp16 -mavxifma -mavxvnniint8 -mavxneconvert -mcmpccxadd" } */ /* Test that {,x,e,p,t,s,w,a,b,i}mmintrin.h, mm3dnow.h, fma4intrin.h, xopintrin.h, abmintrin.h, bmiintrin.h, tbmintrin.h, lwpintrin.h, diff --git a/gcc/testsuite/gcc.target/i386/avx-1.c b/gcc/testsuite/gcc.target/i386/avx-1.c index 154e7b3b107..051a1b59b5b 100644 --- a/gcc/testsuite/gcc.target/i386/avx-1.c +++ b/gcc/testsuite/gcc.target/i386/avx-1.c @@ -835,6 +835,10 @@ #define 
__builtin_ia32_bextri_u32(X, Y) __builtin_ia32_bextri_u32 (X, 1) #define __builtin_ia32_bextri_u64(X, Y) __builtin_ia32_bextri_u64 (X, 1) +/* cmpccxadd.h */ +#define __builtin_ia32_cmpccxadd(A, B, C, D) __builtin_ia32_cmpccxadd(A, B, C, 1) +#define __builtin_ia32_cmpccxadd64(A, B, C, D) __builtin_ia32_cmpccxadd64(A, B, C, 1) + #include #include #include diff --git a/gcc/testsuite/gcc.target/i386/cmpccxadd-1.c b/gcc/testsuite/gcc.target/i386/cmpccxadd-1.c new file mode 100644 index 00000000000..699ed9b2dc2 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/cmpccxadd-1.c @@ -0,0 +1,61 @@ +/* { dg-do compile { target { ! ia32 } } } */ +/* { dg-options "-O2 -mcmpccxadd" } */ +/* { dg-final { scan-assembler-times "cmpbexadd\[ \\t\]" 2 } } */ +/* { dg-final { scan-assembler-times "cmpbxadd\[ \\t\]" 2 } } */ +/* { dg-final { scan-assembler-times "cmplexadd\[ \\t\]" 2 } } */ +/* { dg-final { scan-assembler-times "cmplxadd\[ \\t\]" 2 } } */ +/* { dg-final { scan-assembler-times "cmpnbexadd\[ \\t\]" 2 } } */ +/* { dg-final { scan-assembler-times "cmpnbxadd\[ \\t\]" 2 } } */ +/* { dg-final { scan-assembler-times "cmpnlexadd\[ \\t\]" 2 } } */ +/* { dg-final { scan-assembler-times "cmpnlxadd\[ \\t\]" 2 } } */ +/* { dg-final { scan-assembler-times "cmpnoxadd\[ \\t\]" 2 } } */ +/* { dg-final { scan-assembler-times "cmpnpxadd\[ \\t\]" 2 } } */ +/* { dg-final { scan-assembler-times "cmpnsxadd\[ \\t\]" 2 } } */ +/* { dg-final { scan-assembler-times "cmpnzxadd\[ \\t\]" 2 } } */ +/* { dg-final { scan-assembler-times "cmpoxadd\[ \\t\]" 2 } } */ +/* { dg-final { scan-assembler-times "cmppxadd\[ \\t\]" 2 } } */ +/* { dg-final { scan-assembler-times "cmpsxadd\[ \\t\]" 2 } } */ +/* { dg-final { scan-assembler-times "cmpzxadd\[ \\t\]" 2 } } */ +#include + +int *a; +int b, c; +long long *d; +long long e, f; + +void extern +cmpccxadd_test(void) +{ + b = __cmpccxadd_epi32 (a, b, c, _CMPCCX_BE); + e = __cmpccxadd_epi64 (d, e, f, _CMPCCX_BE); + b = __cmpccxadd_epi32 (a, b, c, _CMPCCX_B); + e = 
__cmpccxadd_epi64 (d, e, f, _CMPCCX_B); + b = __cmpccxadd_epi32 (a, b, c, _CMPCCX_LE); + e = __cmpccxadd_epi64 (d, e, f, _CMPCCX_LE); + b = __cmpccxadd_epi32 (a, b, c, _CMPCCX_L); + e = __cmpccxadd_epi64 (d, e, f, _CMPCCX_L); + b = __cmpccxadd_epi32 (a, b, c, _CMPCCX_NBE); + e = __cmpccxadd_epi64 (d, e, f, _CMPCCX_NBE); + b = __cmpccxadd_epi32 (a, b, c, _CMPCCX_NB); + e = __cmpccxadd_epi64 (d, e, f, _CMPCCX_NB); + b = __cmpccxadd_epi32 (a, b, c, _CMPCCX_NLE); + e = __cmpccxadd_epi64 (d, e, f, _CMPCCX_NLE); + b = __cmpccxadd_epi32 (a, b, c, _CMPCCX_NL); + e = __cmpccxadd_epi64 (d, e, f, _CMPCCX_NL); + b = __cmpccxadd_epi32 (a, b, c, _CMPCCX_NO); + e = __cmpccxadd_epi64 (d, e, f, _CMPCCX_NO); + b = __cmpccxadd_epi32 (a, b, c, _CMPCCX_NP); + e = __cmpccxadd_epi64 (d, e, f, _CMPCCX_NP); + b = __cmpccxadd_epi32 (a, b, c, _CMPCCX_NS); + e = __cmpccxadd_epi64 (d, e, f, _CMPCCX_NS); + b = __cmpccxadd_epi32 (a, b, c, _CMPCCX_NZ); + e = __cmpccxadd_epi64 (d, e, f, _CMPCCX_NZ); + b = __cmpccxadd_epi32 (a, b, c, _CMPCCX_O); + e = __cmpccxadd_epi64 (d, e, f, _CMPCCX_O); + b = __cmpccxadd_epi32 (a, b, c, _CMPCCX_P); + e = __cmpccxadd_epi64 (d, e, f, _CMPCCX_P); + b = __cmpccxadd_epi32 (a, b, c, _CMPCCX_S); + e = __cmpccxadd_epi64 (d, e, f, _CMPCCX_S); + b = __cmpccxadd_epi32 (a, b, c, _CMPCCX_Z); + e = __cmpccxadd_epi64 (d, e, f, _CMPCCX_Z); +} diff --git a/gcc/testsuite/gcc.target/i386/cmpccxadd-2.c b/gcc/testsuite/gcc.target/i386/cmpccxadd-2.c new file mode 100644 index 00000000000..76d17803fbb --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/cmpccxadd-2.c @@ -0,0 +1,138 @@ +/* { dg-do run { target { ! 
ia32 } } } */ +/* { dg-options "-O2 -mcmpccxadd" } */ +/* { dg-require-effective-target cmpccxadd } */ + +#include +#include + +int +main() +{ + if (!__builtin_cpu_supports("cmpccxadd")) + return 0; + + int srcdest1[16] = { 1,1,1,1,2,1,2,1,1,2,2,2,-2147483648,4,1,1 }; + int srcdest2[16] = { 1,2,1,2,1,1,1,1,1,1,1,1,1,1,2,1 }; + int src3[16] = { 1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1 }; + int _srcdest1[16], _srcdest2[16], res[16], cond[16]; + long long srcdest1_64[16] = { 1,1,1,1,2,1,2,1,1,2,2,2,-9223372036854775807LL-1,4,1,1 }; + long long srcdest2_64[16] = { 1,2,1,2,1,1,1,1,1,1,1,1,1,1,2,1 }; + long long src3_64[16] = { 1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1 }; + long long _srcdest1_64[16], _srcdest2_64[16], res_64[16], cond_64[16]; + + int tmp2[16]; + long long tmp2_64[16]; + + int cf[16] = { 0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0 }; + int of[16] = { 0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0 }; + int sf[16] = { 0,0,0,1,0,0,0,0,0,0,0,0,0,0,1,0 }; + int zf[16] = { 1,0,1,0,0,0,0,0,0,0,0,0,0,0,0,1 }; + int af[16] = { 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0 }; + int pf[16] = { 0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0 }; + + for (int i = 0; i < 16; i++) + { + tmp2[i] = srcdest1[i] + src3[i]; + tmp2_64[i] = srcdest1_64[i] + src3_64[i]; + } + + cond[0] = (cf[0] || zf[0]) == 1 ? 1 : 0; + cond[1] = cf[1] == 1 ? 1 : 0; + cond[2] = (((sf[2] && !of[2]) || (!sf[2] && of[2])) || zf[2]) == 1 ? 1 : 0; + cond[3] = ((sf[3] && !of[3]) || (!sf[3] && of[3])) == 1 ? 1 : 0; + cond[4] = (cf[4] || zf[4]) == 0 ? 1 : 0; + cond[5] = cf[5] == 0 ? 1 : 0; + cond[6] = (((sf[6] && !of[6]) || (!sf[6] && of[6])) || zf[6]) == 0 ? 1 : 0; + cond[7] = ((sf[7] && !of[7]) || (!sf[7] && of[7])) == 0 ? 1 : 0; + cond[8] = of[8] == 0 ? 1 : 0; + cond[9] = pf[9] == 0 ? 1 : 0; + cond[10] = sf[10] == 0 ? 1 : 0; + cond[11] = zf[11] == 0 ? 1 : 0; + cond[12] = of[12] == 1 ? 1 : 0; + cond[13] = pf[13] == 1 ? 1 : 0; + cond[14] = sf[14] == 1 ? 1 : 0; + cond[15] = zf[15] == 1 ? 1 : 0; + + cond_64[0] = (cf[0] || zf[0]) == 1 ? 1 : 0; + cond_64[1] = cf[1] == 1 ? 
1 : 0; + cond_64[2] = (((sf[2] && !of[2]) || (!sf[2] && of[2])) || zf[2]) == 1 ? 1 : 0; + cond_64[3] = ((sf[3] && !of[3]) || (!sf[3] && of[3])) == 1 ? 1 : 0; + cond_64[4] = (cf[4] || zf[4]) == 0 ? 1 : 0; + cond_64[5] = cf[5] == 0 ? 1 : 0; + cond_64[6] = (((sf[6] && !of[6]) || (!sf[6] && of[6])) || zf[6]) == 0 ? 1 : 0; + cond_64[7] = ((sf[7] && !of[7]) || (!sf[7] && of[7])) == 0 ? 1 : 0; + cond_64[8] = of[8] == 0 ? 1 : 0; + cond_64[9] = pf[9] == 0 ? 1 : 0; + cond_64[10] = sf[10] == 0 ? 1 : 0; + cond_64[11] = zf[11] == 0 ? 1 : 0; + cond_64[12] = of[12] == 1 ? 1 : 0; + cond_64[13] = pf[13] == 1 ? 1 : 0; + cond_64[14] = sf[14] == 1 ? 1 : 0; + cond_64[15] = zf[15] == 1 ? 1 : 0; + + for (int i = 0; i < 16; i++) + { + if (cond[i] == 1) + { + _srcdest1[i] = tmp2[i]; + } + else + { + _srcdest1[i] = srcdest1[i]; + } + if (cond_64[i] == 1) + { + _srcdest1_64[i] = tmp2_64[i]; + } + else + { + _srcdest1_64[i] = srcdest1_64[i]; + } + _srcdest2[i] = srcdest1[i]; + _srcdest2_64[i] = srcdest1_64[i]; + } + + res[0] = __cmpccxadd_epi32 (&srcdest1[0], srcdest2[0], src3[0], _CMPCCX_BE); + res[1] = __cmpccxadd_epi32 (&srcdest1[1], srcdest2[1], src3[1], _CMPCCX_B); + res[2] = __cmpccxadd_epi32 (&srcdest1[2], srcdest2[2], src3[2], _CMPCCX_LE); + res[3] = __cmpccxadd_epi32 (&srcdest1[3], srcdest2[3], src3[3], _CMPCCX_L); + res[4] = __cmpccxadd_epi32 (&srcdest1[4], srcdest2[4], src3[4], _CMPCCX_NBE); + res[5] = __cmpccxadd_epi32 (&srcdest1[5], srcdest2[5], src3[5], _CMPCCX_NB); + res[6] = __cmpccxadd_epi32 (&srcdest1[6], srcdest2[6], src3[6], _CMPCCX_NLE); + res[7] = __cmpccxadd_epi32 (&srcdest1[7], srcdest2[7], src3[7], _CMPCCX_NL); + res[8] = __cmpccxadd_epi32 (&srcdest1[8], srcdest2[8], src3[8], _CMPCCX_NO); + res[9] = __cmpccxadd_epi32 (&srcdest1[9], srcdest2[9], src3[9], _CMPCCX_NP); + res[10] = __cmpccxadd_epi32 (&srcdest1[10], srcdest2[10], src3[10], _CMPCCX_NS); + res[11] = __cmpccxadd_epi32 (&srcdest1[11], srcdest2[11], src3[11], _CMPCCX_NZ); + res[12] = __cmpccxadd_epi32 
(&srcdest1[12], srcdest2[12], src3[12], _CMPCCX_O); + res[13] = __cmpccxadd_epi32 (&srcdest1[13], srcdest2[13], src3[13], _CMPCCX_P); + res[14] = __cmpccxadd_epi32 (&srcdest1[14], srcdest2[14], src3[14], _CMPCCX_S); + res[15] = __cmpccxadd_epi32 (&srcdest1[15], srcdest2[15], src3[15], _CMPCCX_Z); + + res_64[0] = __cmpccxadd_epi64(&srcdest1_64[0], srcdest2_64[0], src3_64[0], _CMPCCX_BE); + res_64[1] = __cmpccxadd_epi64(&srcdest1_64[1], srcdest2_64[1], src3_64[1], _CMPCCX_B); + res_64[2] = __cmpccxadd_epi64(&srcdest1_64[2], srcdest2_64[2], src3_64[2], _CMPCCX_LE); + res_64[3] = __cmpccxadd_epi64(&srcdest1_64[3], srcdest2_64[3], src3_64[3], _CMPCCX_L); + res_64[4] = __cmpccxadd_epi64(&srcdest1_64[4], srcdest2_64[4], src3_64[4], _CMPCCX_NBE); + res_64[5] = __cmpccxadd_epi64(&srcdest1_64[5], srcdest2_64[5], src3_64[5], _CMPCCX_NB); + res_64[6] = __cmpccxadd_epi64(&srcdest1_64[6], srcdest2_64[6], src3_64[6], _CMPCCX_NLE); + res_64[7] = __cmpccxadd_epi64(&srcdest1_64[7], srcdest2_64[7], src3_64[7], _CMPCCX_NL); + res_64[8] = __cmpccxadd_epi64(&srcdest1_64[8], srcdest2_64[8], src3_64[8], _CMPCCX_NO); + res_64[9] = __cmpccxadd_epi64(&srcdest1_64[9], srcdest2_64[9], src3_64[9], _CMPCCX_NP); + res_64[10] = __cmpccxadd_epi64(&srcdest1_64[10], srcdest2_64[10], src3_64[10], _CMPCCX_NS); + res_64[11] = __cmpccxadd_epi64(&srcdest1_64[11], srcdest2_64[11], src3_64[11], _CMPCCX_NZ); + res_64[12] = __cmpccxadd_epi64(&srcdest1_64[12], srcdest2_64[12], src3_64[12], _CMPCCX_O); + res_64[13] = __cmpccxadd_epi64(&srcdest1_64[13], srcdest2_64[13], src3_64[13], _CMPCCX_P); + res_64[14] = __cmpccxadd_epi64(&srcdest1_64[14], srcdest2_64[14], src3_64[14], _CMPCCX_S); + res_64[15] = __cmpccxadd_epi64(&srcdest1_64[15], srcdest2_64[15], src3_64[15], _CMPCCX_Z); + + for (int i = 0; i < 16; i++) + { + if ((srcdest1[i] != _srcdest1[i]) || (res[i] != _srcdest2[i])) + abort(); + if ((srcdest1_64[i] != _srcdest1_64[i]) || (res_64[i] != _srcdest2_64[i])) + abort(); + } + + return 0; +} diff --git 
a/gcc/testsuite/gcc.target/i386/funcspec-56.inc b/gcc/testsuite/gcc.target/i386/funcspec-56.inc index b3d33df7c9c..2e35a7ae50e 100644 --- a/gcc/testsuite/gcc.target/i386/funcspec-56.inc +++ b/gcc/testsuite/gcc.target/i386/funcspec-56.inc @@ -83,6 +83,7 @@ extern void test_avx512fp16 (void) __attribute__((__target__("avx512fp16"))); extern void test_avxifma (void) __attribute__((__target__("avxifma"))); extern void test_avxvnniint8 (void) __attribute__((__target__("avxvnniint8"))); extern void test_avxneconvert (void) __attribute__((__target__("avxneconvert"))); +extern void test_cmpccxadd (void) __attribute__((__target__("cmpccxadd"))); extern void test_no_sgx (void) __attribute__((__target__("no-sgx"))); extern void test_no_avx5124fmaps(void) __attribute__((__target__("no-avx5124fmaps"))); @@ -167,6 +168,7 @@ extern void test_no_avx512fp16 (void) __attribute__((__target__("no-avx512fp16" extern void test_no_avxifma (void) __attribute__((__target__("no-avxifma"))); extern void test_no_avxvnniint8 (void) __attribute__((__target__("no-avxvnniint8"))); extern void test_no_avxneconvert (void) __attribute__((__target__("no-avxneconvert"))); +extern void test_no_cmpccxadd (void) __attribute__((__target__("no-cmpccxadd"))); extern void test_arch_nocona (void) __attribute__((__target__("arch=nocona"))); extern void test_arch_core2 (void) __attribute__((__target__("arch=core2"))); diff --git a/gcc/testsuite/gcc.target/i386/sse-13.c b/gcc/testsuite/gcc.target/i386/sse-13.c index b9cdfb690d1..e947b4347f4 100644 --- a/gcc/testsuite/gcc.target/i386/sse-13.c +++ b/gcc/testsuite/gcc.target/i386/sse-13.c @@ -1,5 +1,5 @@ /* { dg-do compile } */ -/* { dg-options "-O2 -Werror-implicit-function-declaration -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha -mprefetchwt1 -mxsavec -mxsaves 
-mclflushopt -mavx512vl -mavx512dq -mavx512bw -mavx512vbmi -mavx512vbmi2 -mavx512ifma -mavx5124fmaps -mavx5124vnniw -mavx512vpopcntdq -mavx512vp2intersect -mclwb -mmwaitx -mclzero -mpku -msgx -mrdpid -mgfni -mavx512bitalg -mpconfig -mwbnoinvd -mavx512bf16 -menqcmd -mserialize -mtsxldtrk -mamx-tile -mamx-int8 -mamx-bf16 -mkl -mwidekl -mavxvnni -mavx512fp16 -mavxifma -mavxvnniint8 -mavxneconvert" } */ +/* { dg-options "-O2 -Werror-implicit-function-declaration -march=k8 -msse4a -m3dnow -mavx -mavx2 -mfma4 -mxop -maes -mpclmul -mpopcnt -mabm -mlzcnt -mbmi -mbmi2 -mtbm -mlwp -mfsgsbase -mrdrnd -mf16c -mfma -mrtm -mrdseed -mprfchw -madx -mfxsr -mxsaveopt -mavx512f -mavx512er -mavx512cd -mavx512pf -msha -mprefetchwt1 -mxsavec -mxsaves -mclflushopt -mavx512vl -mavx512dq -mavx512bw -mavx512vbmi -mavx512vbmi2 -mavx512ifma -mavx5124fmaps -mavx5124vnniw -mavx512vpopcntdq -mavx512vp2intersect -mclwb -mmwaitx -mclzero -mpku -msgx -mrdpid -mgfni -mavx512bitalg -mpconfig -mwbnoinvd -mavx512bf16 -menqcmd -mserialize -mtsxldtrk -mamx-tile -mamx-int8 -mamx-bf16 -mkl -mwidekl -mavxvnni -mavx512fp16 -mavxifma -mavxvnniint8 -mavxneconvert -mcmpccxadd" } */ /* { dg-add-options bind_pic_locally } */ #include @@ -842,4 +842,8 @@ #define __builtin_ia32_vpclmulqdq_v2di(A, B, C) __builtin_ia32_vpclmulqdq_v2di(A, B, 1) #define __builtin_ia32_vpclmulqdq_v8di(A, B, C) __builtin_ia32_vpclmulqdq_v8di(A, B, 1) +/* cmpccxadd.h */ +#define __builtin_ia32_cmpccxadd(A, B, C, D) __builtin_ia32_cmpccxadd(A, B, C, 1) +#define __builtin_ia32_cmpccxadd64(A, B, C, D) __builtin_ia32_cmpccxadd64(A, B, C, 1) + #include diff --git a/gcc/testsuite/gcc.target/i386/sse-23.c b/gcc/testsuite/gcc.target/i386/sse-23.c index 898dde80c8f..757ba9c9a7d 100644 --- a/gcc/testsuite/gcc.target/i386/sse-23.c +++ b/gcc/testsuite/gcc.target/i386/sse-23.c @@ -843,6 +843,10 @@ #define __builtin_ia32_vpclmulqdq_v2di(A, B, C) __builtin_ia32_vpclmulqdq_v2di(A, B, 1) #define __builtin_ia32_vpclmulqdq_v8di(A, B, C) 
__builtin_ia32_vpclmulqdq_v8di(A, B, 1) -#pragma GCC target ("sse4a,3dnow,avx,avx2,fma4,xop,aes,pclmul,popcnt,abm,lzcnt,bmi,bmi2,tbm,lwp,fsgsbase,rdrnd,f16c,fma,rtm,rdseed,prfchw,adx,fxsr,xsaveopt,avx512f,avx512er,avx512cd,avx512pf,sha,prefetchwt1,xsavec,xsaves,clflushopt,avx512bw,avx512dq,avx512vl,avx512vbmi,avx512ifma,avx5124fmaps,avx5124vnniw,avx512vpopcntdq,clwb,mwaitx,clzero,pku,sgx,rdpid,gfni,avx512vbmi2,vpclmulqdq,avx512bitalg,pconfig,wbnoinvd,avx512bf16,enqcmd,avx512vp2intersect,serialize,tsxldtrk,amx-tile,amx-int8,amx-bf16,kl,widekl,avxvnni,avx512fp16,avxifma,avxvnniint8,avxneconvert") +/* cmpccxadd.h */ +#define __builtin_ia32_cmpccxadd(A, B, C, D) __builtin_ia32_cmpccxadd(A, B, C, 1) +#define __builtin_ia32_cmpccxadd64(A, B, C, D) __builtin_ia32_cmpccxadd64(A, B, C, 1) + +#pragma GCC target ("sse4a,3dnow,avx,avx2,fma4,xop,aes,pclmul,popcnt,abm,lzcnt,bmi,bmi2,tbm,lwp,fsgsbase,rdrnd,f16c,fma,rtm,rdseed,prfchw,adx,fxsr,xsaveopt,avx512f,avx512er,avx512cd,avx512pf,sha,prefetchwt1,xsavec,xsaves,clflushopt,avx512bw,avx512dq,avx512vl,avx512vbmi,avx512ifma,avx5124fmaps,avx5124vnniw,avx512vpopcntdq,clwb,mwaitx,clzero,pku,sgx,rdpid,gfni,avx512vbmi2,vpclmulqdq,avx512bitalg,pconfig,wbnoinvd,avx512bf16,enqcmd,avx512vp2intersect,serialize,tsxldtrk,amx-tile,amx-int8,amx-bf16,kl,widekl,avxvnni,avx512fp16,avxifma,avxvnniint8,avxneconvert,cmpccxadd") #include diff --git a/gcc/testsuite/gcc.target/i386/x86gprintrin-1.c b/gcc/testsuite/gcc.target/i386/x86gprintrin-1.c index 293be094b78..76de89d0cb7 100644 --- a/gcc/testsuite/gcc.target/i386/x86gprintrin-1.c +++ b/gcc/testsuite/gcc.target/i386/x86gprintrin-1.c @@ -1,7 +1,7 @@ /* Test that is usable with -O -std=c89 -pedantic-errors. 
*/ /* { dg-do compile } */ /* { dg-options "-O -std=c89 -pedantic-errors -march=x86-64 -madx -mbmi -mbmi2 -mcldemote -mclflushopt -mclwb -mclzero -menqcmd -mfsgsbase -mfxsr -mhreset -mlzcnt -mlwp -mmovdiri -mmwaitx -mpconfig -mpopcnt -mpku -mptwrite -mrdpid -mrdrnd -mrdseed -mrtm -mserialize -msgx -mshstk -mtbm -mtsxldtrk -mwaitpkg -mwbnoinvd -mxsave -mxsavec -mxsaveopt -mxsaves -mno-sse -mno-mmx" } */ -/* { dg-additional-options "-muintr" { target { ! ia32 } } } */ +/* { dg-additional-options "-mcmpccxadd -muintr" { target { ! ia32 } } } */ #include diff --git a/gcc/testsuite/gcc.target/i386/x86gprintrin-2.c b/gcc/testsuite/gcc.target/i386/x86gprintrin-2.c index c6330275746..aefad77f864 100644 --- a/gcc/testsuite/gcc.target/i386/x86gprintrin-2.c +++ b/gcc/testsuite/gcc.target/i386/x86gprintrin-2.c @@ -1,7 +1,7 @@ /* { dg-do compile } */ /* { dg-options "-O2 -Werror-implicit-function-declaration -march=x86-64 -madx -mbmi -mbmi2 -mcldemote -mclflushopt -mclwb -mclzero -menqcmd -mfsgsbase -mfxsr -mhreset -mlzcnt -mlwp -mmovdiri -mmwaitx -mpconfig -mpopcnt -mpku -mptwrite -mrdpid -mrdrnd -mrdseed -mrtm -mserialize -msgx -mshstk -mtbm -mtsxldtrk -mwaitpkg -mwbnoinvd -mxsave -mxsavec -mxsaveopt -mxsaves -mno-sse -mno-mmx" } */ /* { dg-add-options bind_pic_locally } */ -/* { dg-additional-options "-muintr" { target { ! ia32 } } } */ +/* { dg-additional-options "-mcmpccxadd -muintr" { target { ! ia32 } } } */ /* Test that the intrinsics in compile with optimization. 
All of them are defined as inline functions that reference the proper @@ -28,4 +28,8 @@ /* rtmintrin.h */ #define __builtin_ia32_xabort(N) __builtin_ia32_xabort(1) +/* cmpccxadd.h */ +#define __builtin_ia32_cmpccxadd(A, B, C, D) __builtin_ia32_cmpccxadd(A, B, C, 1) +#define __builtin_ia32_cmpccxadd64(A, B, C, D) __builtin_ia32_cmpccxadd64(A, B, C, 1) + #include diff --git a/gcc/testsuite/gcc.target/i386/x86gprintrin-3.c b/gcc/testsuite/gcc.target/i386/x86gprintrin-3.c index 3a7e1f4a10d..261c9180aa0 100644 --- a/gcc/testsuite/gcc.target/i386/x86gprintrin-3.c +++ b/gcc/testsuite/gcc.target/i386/x86gprintrin-3.c @@ -1,7 +1,7 @@ /* { dg-do compile } */ /* { dg-options "-O0 -Werror-implicit-function-declaration -march=x86-64 -madx -mbmi -mbmi2 -mcldemote -mclflushopt -mclwb -mclzero -menqcmd -mfsgsbase -mfxsr -mhreset -mlzcnt -mlwp -mmovdiri -mmwaitx -mpconfig -mpopcnt -mpku -mptwrite -mrdpid -mrdrnd -mrdseed -mrtm -mserialize -msgx -mshstk -mtbm -mtsxldtrk -mwaitpkg -mwbnoinvd -mxsave -mxsavec -mxsaveopt -mxsaves -mno-sse -mno-mmx" } */ /* { dg-add-options bind_pic_locally } */ -/* { dg-additional-options "-muintr" { target { ! ia32 } } } */ +/* { dg-additional-options "-mcmpccxadd -muintr" { target { ! ia32 } } } */ /* Test that the intrinsics in compile without optimization. 
All of them are defined as inline functions that reference the proper diff --git a/gcc/testsuite/gcc.target/i386/x86gprintrin-4.c b/gcc/testsuite/gcc.target/i386/x86gprintrin-4.c index d8a6126e5dc..7f76b870934 100644 --- a/gcc/testsuite/gcc.target/i386/x86gprintrin-4.c +++ b/gcc/testsuite/gcc.target/i386/x86gprintrin-4.c @@ -15,7 +15,7 @@ #ifndef DIFFERENT_PRAGMAS #ifdef __x86_64__ -#pragma GCC target ("adx,bmi,bmi2,fsgsbase,fxsr,hreset,lwp,lzcnt,popcnt,rdrnd,rdseed,tbm,rtm,serialize,tsxldtrk,uintr,xsaveopt") +#pragma GCC target ("adx,bmi,bmi2,cmpccxadd,fsgsbase,fxsr,hreset,lwp,lzcnt,popcnt,rdrnd,rdseed,tbm,rtm,serialize,tsxldtrk,uintr,xsaveopt") #else #pragma GCC target ("adx,bmi,bmi2,fsgsbase,fxsr,hreset,lwp,lzcnt,popcnt,rdrnd,rdseed,tbm,rtm,serialize,tsxldtrk,xsaveopt") #endif diff --git a/gcc/testsuite/gcc.target/i386/x86gprintrin-5.c b/gcc/testsuite/gcc.target/i386/x86gprintrin-5.c index 9ef66fdad54..54d826c4f46 100644 --- a/gcc/testsuite/gcc.target/i386/x86gprintrin-5.c +++ b/gcc/testsuite/gcc.target/i386/x86gprintrin-5.c @@ -27,8 +27,12 @@ /* rtmintrin.h */ #define __builtin_ia32_xabort(M) __builtin_ia32_xabort(1) +/* cmpccxadd.h */ +#define __builtin_ia32_cmpccxadd(A, B, C, D) __builtin_ia32_cmpccxadd(A, B, C, 1) +#define __builtin_ia32_cmpccxadd64(A, B, C, D) __builtin_ia32_cmpccxadd64(A, B, C, 1) + #ifdef __x86_64__ -#pragma GCC target ("adx,bmi,bmi2,clflushopt,clwb,clzero,enqcmd,fsgsbase,fxsr,hreset,lwp,lzcnt,mwaitx,pconfig,pku,popcnt,rdpid,rdrnd,rdseed,tbm,rtm,serialize,sgx,tsxldtrk,uintr,xsavec,xsaveopt,xsaves,wbnoinvd") +#pragma GCC target ("adx,bmi,bmi2,clflushopt,clwb,clzero,cmpccxadd,enqcmd,fsgsbase,fxsr,hreset,lwp,lzcnt,mwaitx,pconfig,pku,popcnt,rdpid,rdrnd,rdseed,tbm,rtm,serialize,sgx,tsxldtrk,uintr,xsavec,xsaveopt,xsaves,wbnoinvd") #else #pragma GCC target ("adx,bmi,bmi2,clflushopt,clwb,clzero,enqcmd,fsgsbase,fxsr,hreset,lwp,lzcnt,mwaitx,pconfig,pku,popcnt,rdpid,rdrnd,rdseed,tbm,rtm,serialize,sgx,tsxldtrk,xsavec,xsaveopt,xsaves,wbnoinvd") #endif 
diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp index 9228e810c45..d3b9aafb8f0 100644 --- a/gcc/testsuite/lib/target-supports.exp +++ b/gcc/testsuite/lib/target-supports.exp @@ -9542,6 +9542,16 @@ proc check_effective_target_avxneconvert { } { } "-O0 -mavxneconvert" ] } +# Return 1 if cmpccxadd instructions can be compiled. +proc check_effective_target_cmpccxadd { } { + return [check_no_compiler_messages cmpccxadd object { + int _cmpccxadd_epi32 (int *__A, int __B, int __C, const int __D) + { + return (int)__builtin_ia32_cmpccxadd (__A, __B, __C, 1); + } + } "-mcmpccxadd" ] +} + # Return 1 if sse instructions can be compiled. proc check_effective_target_sse { } { return [check_no_compiler_messages sse object { From patchwork Fri Oct 14 07:54:45 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Jiang, Haochen" X-Patchwork-Id: 2555 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a5d:4ac7:0:0:0:0:0 with SMTP id y7csp54222wrs; Fri, 14 Oct 2022 00:57:40 -0700 (PDT) X-Google-Smtp-Source: AMsMyM5Xpx+Vjw5za50wV5eWYX77LEI9pR6KHhj+QEk2gEwV/qCdWx72ijaReJG+6D0KcBFRs6D+ X-Received: by 2002:a17:906:9b93:b0:78d:eb36:1ce7 with SMTP id dd19-20020a1709069b9300b0078deb361ce7mr2669510ejc.621.1665734260462; Fri, 14 Oct 2022 00:57:40 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1665734260; cv=none; d=google.com; s=arc-20160816; b=XW6zdOEYevC8Jw4PdVholQmudzroq8mCB9tyyEW+8dBObN/4lSUXvqnR+sGs8Cdv9L g4aMVxdFPXVWdKZkhAuE3EYRiuy4l9lrgfm8sey7SNpcNyvPRJn5+h8dt6rCp2fwJsKS GNcNGNZNcwVaQBgePnrIaPaE137TL+qOSesoUEzcT57/qjBZGaRdRe9hTyQvY8twhF0N +0ftNMHaG6NT/6fNVY0kQifeHjO4TfLaWTX+s6sp0oq0Uxafyz6eHN56cVV8TqWFbiVl IezvdLxGP7I74U+k1WLSYS0lpHJlCNPWpDYi0urljmWkWEwmjJd/7y4MyRfUi2A+8mDc 1oUw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=sender:errors-to:cc:reply-to:from:list-subscribe:list-help 
To: gcc-patches@gcc.gnu.org
Subject: [PATCH 6/6] Initial Sierra Forest Support
Date: Fri, 14 Oct 2022 15:54:45 +0800
Message-Id: <20221014075445.7938-7-haochen.jiang@intel.com>
X-Mailer: git-send-email 2.18.1
In-Reply-To: <20221014075445.7938-1-haochen.jiang@intel.com>
References: <20221014075445.7938-1-haochen.jiang@intel.com>
From: "Jiang, Haochen"
Reply-To: Haochen Jiang
Cc: hongtao.liu@intel.com

gcc/ChangeLog:

	* common/config/i386/cpuinfo.h (get_intel_cpu): Add Sierra Forest.
	* common/config/i386/i386-common.cc (processor_names): Add Sierra Forest.
	(processor_alias_table): Ditto.
	* common/config/i386/i386-cpuinfo.h (enum processor_types): Add INTEL_SIERRAFOREST.
	* config.gcc: Add -march=sierraforest.
	* config/i386/driver-i386.cc (host_detect_local_cpu): Handle Sierra Forest.
	* config/i386/i386-c.cc (ix86_target_macros_internal): Ditto.
	* config/i386/i386-options.cc (m_SIERRAFOREST): New define.
	(processor_cost_table): Add Sierra Forest.
	* config/i386/i386.h (enum processor_type): Add PROCESSOR_SIERRAFOREST.
	(PTA_SIERRAFOREST): Ditto.
	* doc/extend.texi: Add Sierra Forest.
	* doc/invoke.texi: Ditto.

gcc/testsuite/ChangeLog:

	* g++.target/i386/mv16.C: Add Sierra Forest.
	* gcc.target/i386/funcspec-56.inc: Handle new march.
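For family 6 parts, get_intel_cpu selects the CPU by display model, which combines the base model field (bits 7:4) and the extended model field (bits 19:16) of CPUID leaf 1 EAX; that is how Sierra Forest shows up as case 0xaf in the cpuinfo.h hunk. A minimal sketch of that decoding, following the usual Intel SDM convention and written against a plain EAX value instead of the cpuid instruction so it runs on any host. The helper names and the example EAX value 0x000A06F0 are illustrative assumptions, not taken from GCC or from a real part.

```c
#include <stddef.h>

/* Display family from CPUID leaf 1 EAX: the extended family field
   (bits 27:20) is only added when the base family is 0xf.  */
static unsigned display_family (unsigned eax)
{
  unsigned family = (eax >> 8) & 0x0f;
  if (family == 0x0f)
    family += (eax >> 20) & 0xff;
  return family;
}

/* Display model: for base family 6 or 0xf, the extended model field
   (bits 19:16) supplies the high nibble of the model number.  */
static unsigned display_model (unsigned eax)
{
  unsigned family = (eax >> 8) & 0x0f;
  unsigned model = (eax >> 4) & 0x0f;
  if (family == 0x06 || family == 0x0f)
    model += ((eax >> 16) & 0x0f) << 4;
  return model;
}

/* Hypothetical miniature of the family-6 switch in get_intel_cpu,
   limited to the model number this patch adds.  */
static const char *demo_family6_name (unsigned model)
{
  switch (model)
    {
    case 0xaf:
      return "sierraforest";
    default:
      return NULL;   /* unknown model: the driver falls back to feature probing */
    }
}
```

With EAX = 0x000A06F0 (base family 6, extended model 0xa, base model 0xf), display_model yields 0xaf and the lookup returns "sierraforest".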
---
 gcc/common/config/i386/cpuinfo.h              | 6 ++++++
 gcc/common/config/i386/i386-common.cc         | 3 +++
 gcc/common/config/i386/i386-cpuinfo.h         | 1 +
 gcc/config.gcc                                | 3 ++-
 gcc/config/i386/driver-i386.cc                | 5 ++++-
 gcc/config/i386/i386-c.cc                     | 7 +++++++
 gcc/config/i386/i386-options.cc               | 2 ++
 gcc/config/i386/i386.h                        | 3 +++
 gcc/doc/extend.texi                           | 3 +++
 gcc/doc/invoke.texi                           | 8 ++++++++
 gcc/testsuite/g++.target/i386/mv16.C          | 6 ++++++
 gcc/testsuite/gcc.target/i386/funcspec-56.inc | 1 +
 12 files changed, 46 insertions(+), 2 deletions(-)

diff --git a/gcc/common/config/i386/cpuinfo.h b/gcc/common/config/i386/cpuinfo.h
index f73834b086c..cc499c46ed0 100644
--- a/gcc/common/config/i386/cpuinfo.h
+++ b/gcc/common/config/i386/cpuinfo.h
@@ -516,6 +516,12 @@ get_intel_cpu (struct __processor_model *cpu_model,
       cpu_model->__cpu_type = INTEL_COREI7;
       cpu_model->__cpu_subtype = INTEL_COREI7_SAPPHIRERAPIDS;
       break;
+    case 0xaf:
+      /* Sierra Forest. */
+      cpu = "sierraforest";
+      CHECK___builtin_cpu_is ("sierraforest");
+      cpu_model->__cpu_type = INTEL_SIERRAFOREST;
+      break;
     case 0x17:
     case 0x1d:
       /* Penryn. */
diff --git a/gcc/common/config/i386/i386-common.cc b/gcc/common/config/i386/i386-common.cc
index 75966779d82..6ccc4d2f03c 100644
--- a/gcc/common/config/i386/i386-common.cc
+++ b/gcc/common/config/i386/i386-common.cc
@@ -1874,6 +1874,7 @@ const char *const processor_names[] =
   "goldmont",
   "goldmont-plus",
   "tremont",
+  "sierraforest",
   "knl",
   "knm",
   "skylake",
@@ -2019,6 +2020,8 @@ const pta processor_alias_table[] =
     M_CPU_TYPE (INTEL_GOLDMONT_PLUS), P_PROC_SSE4_2},
   {"tremont", PROCESSOR_TREMONT, CPU_HASWELL, PTA_TREMONT,
     M_CPU_TYPE (INTEL_TREMONT), P_PROC_SSE4_2},
+  {"sierraforest", PROCESSOR_SIERRAFOREST, CPU_HASWELL, PTA_SIERRAFOREST,
+    M_CPU_SUBTYPE (INTEL_SIERRAFOREST), P_PROC_AVX2},
   {"knl", PROCESSOR_KNL, CPU_SLM, PTA_KNL,
     M_CPU_TYPE (INTEL_KNL), P_PROC_AVX512F},
   {"knm", PROCESSOR_KNM, CPU_SLM, PTA_KNM,
diff --git a/gcc/common/config/i386/i386-cpuinfo.h b/gcc/common/config/i386/i386-cpuinfo.h
index 5a61d817007..a71a10ebbd7 100644
--- a/gcc/common/config/i386/i386-cpuinfo.h
+++ b/gcc/common/config/i386/i386-cpuinfo.h
@@ -58,6 +58,7 @@ enum processor_types
   INTEL_TREMONT,
   AMDFAM19H,
   ZHAOXIN_FAM7H,
+  INTEL_SIERRAFOREST,
   CPU_TYPE_MAX,
   BUILTIN_CPU_TYPE_MAX = CPU_TYPE_MAX
 };
diff --git a/gcc/config.gcc b/gcc/config.gcc
index fe063bfbb26..c0e10a72bd5 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -665,7 +665,8 @@ slm nehalem westmere sandybridge ivybridge haswell broadwell bonnell \
 silvermont knl knm skylake-avx512 cannonlake icelake-client icelake-server \
 skylake goldmont goldmont-plus tremont cascadelake tigerlake cooperlake \
 sapphirerapids alderlake rocketlake eden-x2 nano nano-1000 nano-2000 nano-3000 \
-nano-x2 eden-x4 nano-x4 lujiazui x86-64 x86-64-v2 x86-64-v3 x86-64-v4 native"
+nano-x2 eden-x4 nano-x4 lujiazui x86-64 x86-64-v2 x86-64-v3 x86-64-v4 \
+sierraforest native"
 
 # Additional x86 processors supported by --with-cpu=. Each processor
 # MUST be separated by exactly one space.
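Each processor_alias_table entry ties an -march=/-mtune= spelling to an internal processor enum, a cost model, an ISA mask, a CPU type/subtype code used by __builtin_cpu_is, and a P_PROC_* priority. The following is a heavily simplified, hypothetical model of looking a name up in such a table; the struct, field names and enum values are illustrative and GCC's real entries carry more fields. It also shows why new enumerators such as INTEL_SIERRAFOREST are appended just before the CPU_TYPE_MAX sentinel: earlier enumerators keep their numeric values.

```c
#include <stddef.h>
#include <string.h>

/* Illustrative subset of a CPU-type enum.  The new entry goes right
   before the sentinel, so the values of earlier entries do not change.  */
enum demo_cpu_type
{
  DEMO_INTEL_TREMONT = 1,
  DEMO_ZHAOXIN_FAM7H,
  DEMO_INTEL_SIERRAFOREST,   /* appended */
  DEMO_CPU_TYPE_MAX
};

/* Hypothetical, trimmed-down alias-table entry.  */
struct demo_alias
{
  const char *name;          /* -march= / -mtune= spelling */
  enum demo_cpu_type type;   /* code a runtime CPU check would compare */
  const char *isa_level;     /* rough stand-in for the P_PROC_* field */
};

static const struct demo_alias demo_alias_table[] =
{
  {"tremont", DEMO_INTEL_TREMONT, "sse4.2"},
  {"sierraforest", DEMO_INTEL_SIERRAFOREST, "avx2"},
};

/* Linear search by exact name, similar in spirit to how option
   processing scans its alias table with strcmp.  */
static const struct demo_alias *demo_lookup (const char *name)
{
  size_t n = sizeof demo_alias_table / sizeof demo_alias_table[0];
  for (size_t i = 0; i < n; i++)
    if (strcmp (demo_alias_table[i].name, name) == 0)
      return &demo_alias_table[i];
  return NULL;   /* unknown name: a compiler would report an error */
}
```

A lookup of "sierraforest" returns the new entry, while a misspelling such as "sierra-forest" returns NULL, which matches the exact-spelling requirement that config.gcc states for its processor list.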
diff --git a/gcc/config/i386/driver-i386.cc b/gcc/config/i386/driver-i386.cc
index ef567045c67..be205a56ea2 100644
--- a/gcc/config/i386/driver-i386.cc
+++ b/gcc/config/i386/driver-i386.cc
@@ -589,8 +589,11 @@ const char *host_detect_local_cpu (int argc, const char **argv)
           /* This is unknown family 0x6 CPU. */
           if (has_feature (FEATURE_AVX))
             {
+              /* Assume Sierra Forest. */
+              if (has_feature (FEATURE_AVXVNNIINT8))
+                cpu = "sierraforest";
               /* Assume Tiger Lake */
-              if (has_feature (FEATURE_AVX512VP2INTERSECT))
+              else if (has_feature (FEATURE_AVX512VP2INTERSECT))
                 cpu = "tigerlake";
               /* Assume Sapphire Rapids. */
               else if (has_feature (FEATURE_TSXLDTRK))
diff --git a/gcc/config/i386/i386-c.cc b/gcc/config/i386/i386-c.cc
index 9885a724d0f..4494c412995 100644
--- a/gcc/config/i386/i386-c.cc
+++ b/gcc/config/i386/i386-c.cc
@@ -198,6 +198,10 @@ ix86_target_macros_internal (HOST_WIDE_INT isa_flag,
       def_or_undef (parse_in, "__tremont");
       def_or_undef (parse_in, "__tremont__");
       break;
+    case PROCESSOR_SIERRAFOREST:
+      def_or_undef (parse_in, "__sierraforest");
+      def_or_undef (parse_in, "__sierraforest__");
+      break;
     case PROCESSOR_KNL:
       def_or_undef (parse_in, "__knl");
       def_or_undef (parse_in, "__knl__");
@@ -377,6 +381,9 @@ ix86_target_macros_internal (HOST_WIDE_INT isa_flag,
     case PROCESSOR_TREMONT:
       def_or_undef (parse_in, "__tune_tremont__");
       break;
+    case PROCESSOR_SIERRAFOREST:
+      def_or_undef (parse_in, "__tune_sierraforest__");
+      break;
     case PROCESSOR_KNL:
       def_or_undef (parse_in, "__tune_knl__");
       break;
diff --git a/gcc/config/i386/i386-options.cc b/gcc/config/i386/i386-options.cc
index fb872afdfb5..4526dc09fc4 100644
--- a/gcc/config/i386/i386-options.cc
+++ b/gcc/config/i386/i386-options.cc
@@ -136,6 +136,7 @@ along with GCC; see the file COPYING3. If not see
 #define m_GOLDMONT (HOST_WIDE_INT_1U<