From patchwork Thu Sep 21 07:19:59 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Hu, Lin1" X-Patchwork-Id: 142766 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a05:612c:172:b0:3f2:4152:657d with SMTP id h50csp4670092vqi; Thu, 21 Sep 2023 00:33:01 -0700 (PDT) X-Google-Smtp-Source: AGHT+IF0sUZooH+4ULWIm/02oCZR/YrCYA7AnbS+gVwto6ZnLql2vq/sJuvd2T/Uq2I8SbnQ9VdN X-Received: by 2002:a05:6512:2208:b0:503:3cc:cd39 with SMTP id h8-20020a056512220800b0050303cccd39mr5627315lfu.8.1695281580736; Thu, 21 Sep 2023 00:33:00 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1695281580; cv=none; d=google.com; s=arc-20160816; b=iDSqx+qBv1vOH5rfdgKZ45pZpF8+zgP4NPgNh6nwjZ8vEeU1ZKp2fD0cHYZiS5RE/s mb01OtjL4OwngHmGfShfcwcC7L3KULoVTmE6u5xn1F6iJuliX9v16q6bFA6P24qF6PnH PNP4rj60o/k2bv6u3S0LnPHel1D0UiOmZc/EKTuKUedOCmS+VAN2yI9dr7cPcWCNvDGm mnjei/OUbBDfFxF5XRi8QCaI/wgK5kY2F6eGKstj89k+LOZ6ZR82tRw7nCz0+bpR4YJe PJXORwrxEqjXEDkkSd+RDBqa16kbAl0D1YcrSrE1DQ4mEwHTVoWySA3eHsePDGHqO/BR 4v2g== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=errors-to:list-subscribe:list-help:list-post:list-archive :list-unsubscribe:list-id:precedence:content-transfer-encoding :mime-version:references:in-reply-to:message-id:date:subject:cc:to :from:dkim-signature:dmarc-filter:delivered-to; bh=XSwnk9g+6oHfJBw6T1I3dYxeQSPJvoRX8BjRp5ky9Rk=; fh=sF/NGAqCthaRflPLk0tS85YHxOxp+1SAOPUzsMi6/xU=; b=se9eXIq3Ztw8ShV36eM769z5BOQsSUvyZaBCjhlXCQr4vt3hIk8ZIOlxdRzJjjU+08 rKm24S4bMbqGA3wGO0jzpmxnGRYtjCsHEq2hUyO/7dLx7ZHiqU+mX6gorzoNgd+QwBBD f3XuWvwZKQAyO45e1Rkik2JYsKvnd1vJ/XvRu2ISaAWjSi2B8T0I0JSEhikI+Kzv2DuA kacMYlZrAXkOP12ywCBLR9xsNGCEuSo3hvaisTnyfSBBUpjrrcA+oMni4TnvXCgyIB+v u/NjexaIkHa1oxbaUCfnFXQ/4aQ3zSnySbJ36Ck+7CGugXSzSCT5WgzoqjhITcFRuuBf 02lA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=CY0F57Qt; spf=pass (google.com: domain of gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org designates 8.43.85.97 as permitted sender) smtp.mailfrom="gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: from server2.sourceware.org (ip-8-43-85-97.sourceware.org. [8.43.85.97]) by mx.google.com with ESMTPS id b23-20020a05640202d700b0052548800d54si603800edx.265.2023.09.21.00.33.00 for (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 21 Sep 2023 00:33:00 -0700 (PDT) Received-SPF: pass (google.com: domain of gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org designates 8.43.85.97 as permitted sender) client-ip=8.43.85.97; Authentication-Results: mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=CY0F57Qt; spf=pass (google.com: domain of gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org designates 8.43.85.97 as permitted sender) smtp.mailfrom="gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 57D853948438 for ; Thu, 21 Sep 2023 07:25:37 +0000 (GMT) X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from mgamail.intel.com (mgamail.intel.com [192.55.52.120]) by sourceware.org (Postfix) with ESMTPS id 2075938A816D for ; Thu, 21 Sep 2023 07:25:07 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 2075938A816D Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=intel.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=intel.com DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1695281108; x=1726817108; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=2L9cFyOhJAzgfzzHORMo+XuPFht1TvfffXUnqkMW2Cw=; b=CY0F57QtjuY3Hnjm5tvd0sirY6OykUjlVoL/tD2DQvyG/zdknP04nDHe zvG2Z5iHrxrHmf+cfxtcvyanUhCHGAytxbn/1xrAn0hSaFEbSVY3UavA8 pxcMbaDBUQye04PioA5EYj86xLPPF5Gs3zNCrobomvsroaGCM6ZG4qR5t D8ENMeyoKU355OD6QzyAcsf0hwJsikH6DMbM13uap4hjpTb2IQV51JzIn 03edXql62NXoMjSSgIM3h1a1T96g8ncHUyB6QVe8x6Tc6A3eqdpnClQmE PlPuhVHgrBbIAkvwc8MfuW8x02/d3QEPBpDG0zlmhzxDMuCxGrwIerUk7 g==; X-IronPort-AV: E=McAfee;i="6600,9927,10839"; a="379326677" X-IronPort-AV: E=Sophos;i="6.03,164,1694761200"; d="scan'208";a="379326677" Received: from fmsmga004.fm.intel.com ([10.253.24.48]) by fmsmga104.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 21 Sep 2023 00:22:18 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10839"; a="817262186" X-IronPort-AV: E=Sophos;i="6.03,164,1694761200"; d="scan'208";a="817262186" Received: from shvmail03.sh.intel.com ([10.239.245.20]) by fmsmga004.fm.intel.com with ESMTP; 21 Sep 2023 00:22:15 -0700 Received: from shliclel4217.sh.intel.com (shliclel4217.sh.intel.com [10.239.240.127]) by shvmail03.sh.intel.com (Postfix) with ESMTP id 1A8821005132; Thu, 21 Sep 2023 15:22:14 +0800 (CST) From: "Hu, Lin1" To: gcc-patches@gcc.gnu.org Cc: hongtao.liu@intel.com, ubizjak@gmail.com, haochen.jiang@intel.com Subject: [PATCH 04/18] [PATCH 3/5] Push evex512 target for 512 bit intrins Date: Thu, 21 Sep 2023 15:19:59 +0800 Message-Id: <20230921072013.2124750-5-lin1.hu@intel.com> X-Mailer: git-send-email 2.31.1 In-Reply-To: <20230921072013.2124750-1-lin1.hu@intel.com> References: <20230921072013.2124750-1-lin1.hu@intel.com> MIME-Version: 1.0 X-Spam-Status: No, score=-1.6 required=5.0 tests=BAYES_00, DKIMWL_WL_HIGH, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, SPF_HELO_NONE, SPF_NONE, TXREP, UNWANTED_LANGUAGE_BODY autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.30 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org X-getmail-retrieved-from-mailbox: INBOX X-GMAIL-THRID: 1777631271629256484 X-GMAIL-MSGID: 1777631578841641875 From: Haochen Jiang gcc/ChangeLog: * config/i386/avx512bwintrin.h: Add evex512 target for 512 bit intrins. --- gcc/config/i386/avx512bwintrin.h | 291 ++++++++++++++++--------------- 1 file changed, 153 insertions(+), 138 deletions(-) diff --git a/gcc/config/i386/avx512bwintrin.h b/gcc/config/i386/avx512bwintrin.h index d1cd549ce18..925bae1457c 100644 --- a/gcc/config/i386/avx512bwintrin.h +++ b/gcc/config/i386/avx512bwintrin.h @@ -34,16 +34,6 @@ #define __DISABLE_AVX512BW__ #endif /* __AVX512BW__ */ -/* Internal data types for implementing the intrinsics. */ -typedef short __v32hi __attribute__ ((__vector_size__ (64))); -typedef short __v32hi_u __attribute__ ((__vector_size__ (64), \ - __may_alias__, __aligned__ (1))); -typedef char __v64qi __attribute__ ((__vector_size__ (64))); -typedef char __v64qi_u __attribute__ ((__vector_size__ (64), \ - __may_alias__, __aligned__ (1))); - -typedef unsigned long long __mmask64; - extern __inline unsigned char __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) _ktest_mask32_u8 (__mmask32 __A, __mmask32 __B, unsigned char *__CF) @@ -54,229 +44,292 @@ _ktest_mask32_u8 (__mmask32 __A, __mmask32 __B, unsigned char *__CF) extern __inline unsigned char __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_ktest_mask64_u8 (__mmask64 __A, __mmask64 __B, unsigned char *__CF) +_ktestz_mask32_u8 (__mmask32 __A, __mmask32 __B) { - *__CF = (unsigned char) __builtin_ia32_ktestcdi (__A, __B); - return (unsigned char) __builtin_ia32_ktestzdi (__A, __B); + return (unsigned char) __builtin_ia32_ktestzsi (__A, __B); } extern __inline unsigned char __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_ktestz_mask32_u8 (__mmask32 __A, __mmask32 __B) +_ktestc_mask32_u8 (__mmask32 __A, __mmask32 __B) { - return (unsigned char) __builtin_ia32_ktestzsi (__A, __B); + return (unsigned char) __builtin_ia32_ktestcsi (__A, __B); } extern __inline unsigned char __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_ktestz_mask64_u8 (__mmask64 __A, __mmask64 __B) +_kortest_mask32_u8 (__mmask32 __A, __mmask32 __B, unsigned char *__CF) { - return (unsigned char) __builtin_ia32_ktestzdi (__A, __B); + *__CF = (unsigned char) __builtin_ia32_kortestcsi (__A, __B); + return (unsigned char) __builtin_ia32_kortestzsi (__A, __B); } extern __inline unsigned char __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_ktestc_mask32_u8 (__mmask32 __A, __mmask32 __B) +_kortestz_mask32_u8 (__mmask32 __A, __mmask32 __B) { - return (unsigned char) __builtin_ia32_ktestcsi (__A, __B); + return (unsigned char) __builtin_ia32_kortestzsi (__A, __B); } extern __inline unsigned char __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_ktestc_mask64_u8 (__mmask64 __A, __mmask64 __B) +_kortestc_mask32_u8 (__mmask32 __A, __mmask32 __B) { - return (unsigned char) __builtin_ia32_ktestcdi (__A, __B); + return (unsigned char) __builtin_ia32_kortestcsi (__A, __B); } -extern __inline unsigned char +extern __inline __mmask32 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_kortest_mask32_u8 (__mmask32 __A, __mmask32 __B, unsigned char *__CF) +_kadd_mask32 (__mmask32 __A, __mmask32 __B) { - *__CF = (unsigned char) __builtin_ia32_kortestcsi (__A, __B); - return (unsigned char) __builtin_ia32_kortestzsi (__A, __B); + return (__mmask32) __builtin_ia32_kaddsi ((__mmask32) __A, (__mmask32) __B); } -extern __inline unsigned char +extern __inline unsigned int __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_kortest_mask64_u8 (__mmask64 __A, __mmask64 __B, unsigned char *__CF) +_cvtmask32_u32 (__mmask32 __A) { - *__CF = (unsigned char) __builtin_ia32_kortestcdi (__A, __B); - return (unsigned char) __builtin_ia32_kortestzdi (__A, __B); + return (unsigned int) __builtin_ia32_kmovd ((__mmask32) __A); } -extern __inline unsigned char +extern __inline __mmask32 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_kortestz_mask32_u8 (__mmask32 __A, __mmask32 __B) +_cvtu32_mask32 (unsigned int __A) { - return (unsigned char) __builtin_ia32_kortestzsi (__A, __B); + return (__mmask32) __builtin_ia32_kmovd ((__mmask32) __A); } -extern __inline unsigned char +extern __inline __mmask32 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_kortestz_mask64_u8 (__mmask64 __A, __mmask64 __B) +_load_mask32 (__mmask32 *__A) { - return (unsigned char) __builtin_ia32_kortestzdi (__A, __B); + return (__mmask32) __builtin_ia32_kmovd (*__A); } -extern __inline unsigned char +extern __inline void __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_kortestc_mask32_u8 (__mmask32 __A, __mmask32 __B) +_store_mask32 (__mmask32 *__A, __mmask32 __B) { - return (unsigned char) __builtin_ia32_kortestcsi (__A, __B); + *(__mmask32 *) __A = __builtin_ia32_kmovd (__B); } -extern __inline unsigned char +extern __inline __mmask32 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_kortestc_mask64_u8 (__mmask64 __A, __mmask64 __B) +_knot_mask32 (__mmask32 __A) { - return (unsigned char) __builtin_ia32_kortestcdi (__A, __B); + return (__mmask32) __builtin_ia32_knotsi ((__mmask32) __A); } extern __inline __mmask32 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_kadd_mask32 (__mmask32 __A, __mmask32 __B) +_kor_mask32 (__mmask32 __A, __mmask32 __B) { - return (__mmask32) __builtin_ia32_kaddsi ((__mmask32) __A, (__mmask32) __B); + return (__mmask32) __builtin_ia32_korsi ((__mmask32) __A, (__mmask32) __B); } -extern __inline __mmask64 +extern __inline __mmask32 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_kadd_mask64 (__mmask64 __A, __mmask64 __B) +_kxnor_mask32 (__mmask32 __A, __mmask32 __B) { - return (__mmask64) __builtin_ia32_kadddi ((__mmask64) __A, (__mmask64) __B); + return (__mmask32) __builtin_ia32_kxnorsi ((__mmask32) __A, (__mmask32) __B); } -extern __inline unsigned int +extern __inline __mmask32 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_cvtmask32_u32 (__mmask32 __A) +_kxor_mask32 (__mmask32 __A, __mmask32 __B) { - return (unsigned int) __builtin_ia32_kmovd ((__mmask32) __A); + return (__mmask32) __builtin_ia32_kxorsi ((__mmask32) __A, (__mmask32) __B); } -extern __inline unsigned long long +extern __inline __mmask32 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_cvtmask64_u64 (__mmask64 __A) +_kand_mask32 (__mmask32 __A, __mmask32 __B) { - return (unsigned long long) __builtin_ia32_kmovq ((__mmask64) __A); + return (__mmask32) __builtin_ia32_kandsi ((__mmask32) __A, (__mmask32) __B); } extern __inline __mmask32 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_cvtu32_mask32 (unsigned int __A) +_kandn_mask32 (__mmask32 __A, __mmask32 __B) { - return (__mmask32) __builtin_ia32_kmovd ((__mmask32) __A); + return (__mmask32) __builtin_ia32_kandnsi ((__mmask32) __A, (__mmask32) __B); } -extern __inline __mmask64 +extern __inline __mmask32 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_cvtu64_mask64 (unsigned long long __A) +_mm512_kunpackw (__mmask32 __A, __mmask32 __B) { - return (__mmask64) __builtin_ia32_kmovq ((__mmask64) __A); + return (__mmask32) __builtin_ia32_kunpcksi ((__mmask32) __A, + (__mmask32) __B); } extern __inline __mmask32 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_load_mask32 (__mmask32 *__A) +_kunpackw_mask32 (__mmask16 __A, __mmask16 __B) { - return (__mmask32) __builtin_ia32_kmovd (*__A); + return (__mmask32) __builtin_ia32_kunpcksi ((__mmask32) __A, + (__mmask32) __B); } -extern __inline __mmask64 +#if __OPTIMIZE__ +extern __inline __mmask32 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_load_mask64 (__mmask64 *__A) +_kshiftli_mask32 (__mmask32 __A, unsigned int __B) { - return (__mmask64) __builtin_ia32_kmovq (*(__mmask64 *) __A); + return (__mmask32) __builtin_ia32_kshiftlisi ((__mmask32) __A, + (__mmask8) __B); } -extern __inline void +extern __inline __mmask32 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_store_mask32 (__mmask32 *__A, __mmask32 __B) +_kshiftri_mask32 (__mmask32 __A, unsigned int __B) { - *(__mmask32 *) __A = __builtin_ia32_kmovd (__B); + return (__mmask32) __builtin_ia32_kshiftrisi ((__mmask32) __A, + (__mmask8) __B); } -extern __inline void +#else +#define _kshiftli_mask32(X, Y) \ + ((__mmask32) __builtin_ia32_kshiftlisi ((__mmask32)(X), (__mmask8)(Y))) + +#define _kshiftri_mask32(X, Y) \ + ((__mmask32) __builtin_ia32_kshiftrisi ((__mmask32)(X), (__mmask8)(Y))) + +#endif + +#ifdef __DISABLE_AVX512BW__ +#undef __DISABLE_AVX512BW__ +#pragma GCC pop_options +#endif /* __DISABLE_AVX512BW__ */ + +#if !defined (__AVX512BW__) || !defined (__EVEX512__) +#pragma GCC push_options +#pragma GCC target("avx512bw,evex512") +#define __DISABLE_AVX512BW_512__ +#endif /* __AVX512BW_512__ */ + +/* Internal data types for implementing the intrinsics. */ +typedef short __v32hi __attribute__ ((__vector_size__ (64))); +typedef short __v32hi_u __attribute__ ((__vector_size__ (64), \ + __may_alias__, __aligned__ (1))); +typedef char __v64qi __attribute__ ((__vector_size__ (64))); +typedef char __v64qi_u __attribute__ ((__vector_size__ (64), \ + __may_alias__, __aligned__ (1))); + +typedef unsigned long long __mmask64; + +extern __inline unsigned char __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_store_mask64 (__mmask64 *__A, __mmask64 __B) +_ktest_mask64_u8 (__mmask64 __A, __mmask64 __B, unsigned char *__CF) { - *(__mmask64 *) __A = __builtin_ia32_kmovq (__B); + *__CF = (unsigned char) __builtin_ia32_ktestcdi (__A, __B); + return (unsigned char) __builtin_ia32_ktestzdi (__A, __B); } -extern __inline __mmask32 +extern __inline unsigned char __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_knot_mask32 (__mmask32 __A) +_ktestz_mask64_u8 (__mmask64 __A, __mmask64 __B) { - return (__mmask32) __builtin_ia32_knotsi ((__mmask32) __A); + return (unsigned char) __builtin_ia32_ktestzdi (__A, __B); } -extern __inline __mmask64 +extern __inline unsigned char __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_knot_mask64 (__mmask64 __A) +_ktestc_mask64_u8 (__mmask64 __A, __mmask64 __B) { - return (__mmask64) __builtin_ia32_knotdi ((__mmask64) __A); + return (unsigned char) __builtin_ia32_ktestcdi (__A, __B); } -extern __inline __mmask32 +extern __inline unsigned char __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_kor_mask32 (__mmask32 __A, __mmask32 __B) +_kortest_mask64_u8 (__mmask64 __A, __mmask64 __B, unsigned char *__CF) { - return (__mmask32) __builtin_ia32_korsi ((__mmask32) __A, (__mmask32) __B); + *__CF = (unsigned char) __builtin_ia32_kortestcdi (__A, __B); + return (unsigned char) __builtin_ia32_kortestzdi (__A, __B); +} + +extern __inline unsigned char +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_kortestz_mask64_u8 (__mmask64 __A, __mmask64 __B) +{ + return (unsigned char) __builtin_ia32_kortestzdi (__A, __B); +} + +extern __inline unsigned char +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_kortestc_mask64_u8 (__mmask64 __A, __mmask64 __B) +{ + return (unsigned char) __builtin_ia32_kortestcdi (__A, __B); } extern __inline __mmask64 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_kor_mask64 (__mmask64 __A, __mmask64 __B) +_kadd_mask64 (__mmask64 __A, __mmask64 __B) { - return (__mmask64) __builtin_ia32_kordi ((__mmask64) __A, (__mmask64) __B); + return (__mmask64) __builtin_ia32_kadddi ((__mmask64) __A, (__mmask64) __B); } -extern __inline __mmask32 +extern __inline unsigned long long __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_kxnor_mask32 (__mmask32 __A, __mmask32 __B) +_cvtmask64_u64 (__mmask64 __A) { - return (__mmask32) __builtin_ia32_kxnorsi ((__mmask32) __A, (__mmask32) __B); + return (unsigned long long) __builtin_ia32_kmovq ((__mmask64) __A); } extern __inline __mmask64 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_kxnor_mask64 (__mmask64 __A, __mmask64 __B) +_cvtu64_mask64 (unsigned long long __A) { - return (__mmask64) __builtin_ia32_kxnordi ((__mmask64) __A, (__mmask64) __B); + return (__mmask64) __builtin_ia32_kmovq ((__mmask64) __A); } -extern __inline __mmask32 +extern __inline __mmask64 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_kxor_mask32 (__mmask32 __A, __mmask32 __B) +_load_mask64 (__mmask64 *__A) { - return (__mmask32) __builtin_ia32_kxorsi ((__mmask32) __A, (__mmask32) __B); + return (__mmask64) __builtin_ia32_kmovq (*(__mmask64 *) __A); +} + +extern __inline void +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_store_mask64 (__mmask64 *__A, __mmask64 __B) +{ + *(__mmask64 *) __A = __builtin_ia32_kmovq (__B); } extern __inline __mmask64 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_kxor_mask64 (__mmask64 __A, __mmask64 __B) +_knot_mask64 (__mmask64 __A) { - return (__mmask64) __builtin_ia32_kxordi ((__mmask64) __A, (__mmask64) __B); + return (__mmask64) __builtin_ia32_knotdi ((__mmask64) __A); } -extern __inline __mmask32 +extern __inline __mmask64 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_kand_mask32 (__mmask32 __A, __mmask32 __B) +_kor_mask64 (__mmask64 __A, __mmask64 __B) { - return (__mmask32) __builtin_ia32_kandsi ((__mmask32) __A, (__mmask32) __B); + return (__mmask64) __builtin_ia32_kordi ((__mmask64) __A, (__mmask64) __B); } extern __inline __mmask64 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_kand_mask64 (__mmask64 __A, __mmask64 __B) +_kxnor_mask64 (__mmask64 __A, __mmask64 __B) { - return (__mmask64) __builtin_ia32_kanddi ((__mmask64) __A, (__mmask64) __B); + return (__mmask64) __builtin_ia32_kxnordi ((__mmask64) __A, (__mmask64) __B); } -extern __inline __mmask32 +extern __inline __mmask64 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_kandn_mask32 (__mmask32 __A, __mmask32 __B) +_kxor_mask64 (__mmask64 __A, __mmask64 __B) { - return (__mmask32) __builtin_ia32_kandnsi ((__mmask32) __A, (__mmask32) __B); + return (__mmask64) __builtin_ia32_kxordi ((__mmask64) __A, (__mmask64) __B); +} + +extern __inline __mmask64 +__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) +_kand_mask64 (__mmask64 __A, __mmask64 __B) +{ + return (__mmask64) __builtin_ia32_kanddi ((__mmask64) __A, (__mmask64) __B); } extern __inline __mmask64 @@ -366,22 +419,6 @@ _mm512_maskz_mov_epi8 (__mmask64 __U, __m512i __A) (__mmask64) __U); } -extern __inline __mmask32 -__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_mm512_kunpackw (__mmask32 __A, __mmask32 __B) -{ - return (__mmask32) __builtin_ia32_kunpcksi ((__mmask32) __A, - (__mmask32) __B); -} - -extern __inline __mmask32 -__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_kunpackw_mask32 (__mmask16 __A, __mmask16 __B) -{ - return (__mmask32) __builtin_ia32_kunpcksi ((__mmask32) __A, - (__mmask32) __B); -} - extern __inline __mmask64 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) _mm512_kunpackd (__mmask64 __A, __mmask64 __B) @@ -2776,14 +2813,6 @@ _mm512_mask_packus_epi32 (__m512i __W, __mmask32 __M, __m512i __A, } #ifdef __OPTIMIZE__ -extern __inline __mmask32 -__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_kshiftli_mask32 (__mmask32 __A, unsigned int __B) -{ - return (__mmask32) __builtin_ia32_kshiftlisi ((__mmask32) __A, - (__mmask8) __B); -} - extern __inline __mmask64 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) _kshiftli_mask64 (__mmask64 __A, unsigned int __B) @@ -2792,14 +2821,6 @@ _kshiftli_mask64 (__mmask64 __A, unsigned int __B) (__mmask8) __B); } -extern __inline __mmask32 -__attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) -_kshiftri_mask32 (__mmask32 __A, unsigned int __B) -{ - return (__mmask32) __builtin_ia32_kshiftrisi ((__mmask32) __A, - (__mmask8) __B); -} - extern __inline __mmask64 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) _kshiftri_mask64 (__mmask64 __A, unsigned int __B) @@ -3145,15 +3166,9 @@ _mm512_bsrli_epi128 (__m512i __A, const int __N) } #else -#define _kshiftli_mask32(X, Y) \ - ((__mmask32) __builtin_ia32_kshiftlisi ((__mmask32)(X), (__mmask8)(Y))) - #define _kshiftli_mask64(X, Y) \ ((__mmask64) __builtin_ia32_kshiftlidi ((__mmask64)(X), (__mmask8)(Y))) -#define _kshiftri_mask32(X, Y) \ - ((__mmask32) __builtin_ia32_kshiftrisi ((__mmask32)(X), (__mmask8)(Y))) - #define _kshiftri_mask64(X, Y) \ ((__mmask64) __builtin_ia32_kshiftridi ((__mmask64)(X), (__mmask8)(Y))) @@ -3328,9 +3343,9 @@ _mm512_bsrli_epi128 (__m512i __A, const int __N) #endif -#ifdef __DISABLE_AVX512BW__ -#undef __DISABLE_AVX512BW__ +#ifdef __DISABLE_AVX512BW_512__ +#undef __DISABLE_AVX512BW_512__ #pragma GCC pop_options -#endif /* __DISABLE_AVX512BW__ */ +#endif /* __DISABLE_AVX512BW_512__ */ #endif /* _AVX512BWINTRIN_H_INCLUDED */