From patchwork Tue Oct 31 06:37:01 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Jiang, Haochen"
X-Patchwork-Id: 159977
From:
 Haochen Jiang
To: gcc-patches@gcc.gnu.org
Cc: ubizjak@gmail.com, hongtao.liu@intel.com
Subject: [PATCH 2/4] [PATCH 2/3] Change internal intrin call for AVX512 intrins
Date: Tue, 31 Oct 2023 14:37:01 +0800
Message-Id: <20231031063703.2643896-3-haochen.jiang@intel.com>
X-Mailer: git-send-email 2.31.1
In-Reply-To: <20231031063703.2643896-1-haochen.jiang@intel.com>
References: <20231031063703.2643896-1-haochen.jiang@intel.com>
MIME-Version: 1.0

gcc/ChangeLog:

	* config/i386/avx512bf16vlintrin.h: Change intrin call.
	* config/i386/avx512fintrin.h (_mm_avx512_undefined_ps): New.
	(_mm_avx512_undefined_pd): Ditto.
	(__attribute__): Change intrin call.
	* config/i386/avx512vbmivlintrin.h: Ditto.
	* config/i386/avx512vlbwintrin.h: Ditto.
	* config/i386/avx512vldqintrin.h: Ditto.
	* config/i386/avx512vlintrin.h (_mm_avx512_undefined_si128): New.
	(_mm256_avx512_undefined_ps): Ditto.
	(_mm256_avx512_undefined_pd): Ditto.
	(_mm256_avx512_undefined_si256): Ditto.
	(__attribute__): Change intrin call.
---
 gcc/config/i386/avx512bf16vlintrin.h |   2 +-
 gcc/config/i386/avx512fintrin.h      |  24 +++++-
 gcc/config/i386/avx512vbmivlintrin.h |   8 +-
 gcc/config/i386/avx512vlbwintrin.h   |  12 +--
 gcc/config/i386/avx512vldqintrin.h   |  10 +--
 gcc/config/i386/avx512vlintrin.h     | 110 ++++++++++++++++++---------
 6 files changed, 113 insertions(+), 53 deletions(-)

diff --git a/gcc/config/i386/avx512bf16vlintrin.h b/gcc/config/i386/avx512bf16vlintrin.h
index 6e8a6a09511..517544c5b89 100644
--- a/gcc/config/i386/avx512bf16vlintrin.h
+++ b/gcc/config/i386/avx512bf16vlintrin.h
@@ -174,7 +174,7 @@ _mm_cvtness_sbh (float __A)
 {
   __v4sf __V = {__A, 0, 0, 0};
   __v8bf __R = __builtin_ia32_cvtneps2bf16_v4sf_mask ((__v4sf)__V,
-      (__v8bf)_mm_undefined_si128 (), (__mmask8)-1);
+      (__v8bf)_mm_avx512_undefined_si128 (), (__mmask8)-1);
   return __R[0];
 }
 
diff --git a/gcc/config/i386/avx512fintrin.h b/gcc/config/i386/avx512fintrin.h
index 530be29eefa..90a00bec09a 100644
--- a/gcc/config/i386/avx512fintrin.h
+++ b/gcc/config/i386/avx512fintrin.h
@@ -59,6 +59,26 @@ typedef enum
    when calling AVX512 intrins implemented with these intrins under no-evex512
    function attribute.  All AVX512 intrins calling those AVX2 intrins or before
    will change their calls to these AVX512 version.  */
+extern __inline __m128 __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_avx512_undefined_ps (void)
+{
+#pragma GCC diagnostic push
+#pragma GCC diagnostic ignored "-Winit-self"
+  __m128 __Y = __Y;
+#pragma GCC diagnostic pop
+  return __Y;
+}
+
+extern __inline __m128d __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_avx512_undefined_pd (void)
+{
+#pragma GCC diagnostic push
+#pragma GCC diagnostic ignored "-Winit-self"
+  __m128d __Y = __Y;
+#pragma GCC diagnostic pop
+  return __Y;
+}
+
 extern __inline __m128 __attribute__((__gnu_inline__, __always_inline__, __artificial__))
 _mm_avx512_setzero_ps (void)
 {
@@ -674,13 +694,13 @@ _mm_maskz_scalef_round_ss (__mmask8 __U, __m128 __A, __m128 __B, const int __R)
 #define _mm_scalef_round_sd(A, B, C) \
   ((__m128d) \
    __builtin_ia32_scalefsd_mask_round ((A), (B), \
-      (__v2df) _mm_undefined_pd (), \
+      (__v2df) _mm_avx512_undefined_pd (), \
       -1, (C)))
 
 #define _mm_scalef_round_ss(A, B, C) \
   ((__m128) \
    __builtin_ia32_scalefss_mask_round ((A), (B), \
-      (__v4sf) _mm_undefined_ps (), \
+      (__v4sf) _mm_avx512_undefined_ps (), \
       -1, (C)))
 
 #define _mm_mask_scalef_round_sd(W, U, A, B, C) \
diff --git a/gcc/config/i386/avx512vbmivlintrin.h b/gcc/config/i386/avx512vbmivlintrin.h
index 270e9406db5..acec23b742f 100644
--- a/gcc/config/i386/avx512vbmivlintrin.h
+++ b/gcc/config/i386/avx512vbmivlintrin.h
@@ -62,7 +62,7 @@ _mm256_multishift_epi64_epi8 (__m256i __X, __m256i __Y)
   return (__m256i) __builtin_ia32_vpmultishiftqb256_mask ((__v32qi) __X,
       (__v32qi) __Y,
       (__v32qi)
-      _mm256_undefined_si256 (),
+      _mm256_avx512_undefined_si256 (),
       (__mmask32) -1);
 }
 
@@ -94,7 +94,7 @@ _mm_multishift_epi64_epi8 (__m128i __X, __m128i __Y)
   return (__m128i) __builtin_ia32_vpmultishiftqb128_mask ((__v16qi) __X,
       (__v16qi) __Y,
       (__v16qi)
-      _mm_undefined_si128 (),
+      _mm_avx512_undefined_si128 (),
       (__mmask16) -1);
 }
 
@@ -105,7 +105,7 @@ _mm256_permutexvar_epi8 (__m256i __A, __m256i __B)
   return (__m256i) __builtin_ia32_permvarqi256_mask ((__v32qi) __B,
       (__v32qi) __A,
       (__v32qi)
-      _mm256_undefined_si256 (),
+      _mm256_avx512_undefined_si256 (),
       (__mmask32) -1);
 }
 
@@ -139,7 +139,7 @@ _mm_permutexvar_epi8 (__m128i __A, __m128i __B)
   return (__m128i) __builtin_ia32_permvarqi128_mask ((__v16qi) __B,
       (__v16qi) __A,
       (__v16qi)
-      _mm_undefined_si128 (),
+      _mm_avx512_undefined_si128 (),
       (__mmask16) -1);
 }
 
diff --git a/gcc/config/i386/avx512vlbwintrin.h b/gcc/config/i386/avx512vlbwintrin.h
index 7654bfaa87e..d7c8ea46df8 100644
--- a/gcc/config/i386/avx512vlbwintrin.h
+++ b/gcc/config/i386/avx512vlbwintrin.h
@@ -299,7 +299,7 @@ _mm256_cvtepi16_epi8 (__m256i __A)
 {
 
   return (__m128i) __builtin_ia32_pmovwb256_mask ((__v16hi) __A,
-      (__v16qi)_mm_undefined_si128(),
+      (__v16qi)_mm_avx512_undefined_si128(),
       (__mmask16) -1);
 }
 
@@ -334,7 +334,7 @@ _mm_cvtsepi16_epi8 (__m128i __A)
 {
 
   return (__m128i) __builtin_ia32_pmovswb128_mask ((__v8hi) __A,
-      (__v16qi)_mm_undefined_si128(),
+      (__v16qi)_mm_avx512_undefined_si128(),
       (__mmask8) -1);
 }
 
@@ -369,7 +369,7 @@ _mm256_cvtsepi16_epi8 (__m256i __A)
 {
 
   return (__m128i) __builtin_ia32_pmovswb256_mask ((__v16hi) __A,
-      (__v16qi)_mm_undefined_si128(),
+      (__v16qi)_mm_avx512_undefined_si128(),
       (__mmask16) -1);
 }
 
@@ -404,7 +404,7 @@ _mm_cvtusepi16_epi8 (__m128i __A)
 {
 
   return (__m128i) __builtin_ia32_pmovuswb128_mask ((__v8hi) __A,
-      (__v16qi)_mm_undefined_si128(),
+      (__v16qi)_mm_avx512_undefined_si128(),
       (__mmask8) -1);
 }
 
@@ -440,7 +440,7 @@ _mm256_cvtusepi16_epi8 (__m256i __A)
 {
 
   return (__m128i) __builtin_ia32_pmovuswb256_mask ((__v16hi) __A,
-      (__v16qi)_mm_undefined_si128(),
+      (__v16qi)_mm_avx512_undefined_si128(),
       (__mmask16) -1);
 }
 
@@ -4089,7 +4089,7 @@ _mm_cvtepi16_epi8 (__m128i __A)
 {
 
   return (__m128i) __builtin_ia32_pmovwb128_mask ((__v8hi) __A,
-      (__v16qi)_mm_undefined_si128(),
+      (__v16qi)_mm_avx512_undefined_si128(),
       (__mmask8) -1);
 }
 
diff --git a/gcc/config/i386/avx512vldqintrin.h b/gcc/config/i386/avx512vldqintrin.h
index 7bb87bbd9be..1949737fe9c 100644
--- a/gcc/config/i386/avx512vldqintrin.h
+++ b/gcc/config/i386/avx512vldqintrin.h
@@ -388,7 +388,7 @@ _mm256_broadcast_f64x2 (__m128d __A)
 {
   return (__m256d) __builtin_ia32_broadcastf64x2_256_mask ((__v2df)
       __A,
-      (__v4df)_mm256_undefined_pd(),
+      (__v4df)_mm256_avx512_undefined_pd(),
       (__mmask8) -1);
 }
 
@@ -419,7 +419,7 @@ _mm256_broadcast_i64x2 (__m128i __A)
 {
   return (__m256i) __builtin_ia32_broadcasti64x2_256_mask ((__v2di)
       __A,
-      (__v4di)_mm256_undefined_si256(),
+      (__v4di)_mm256_avx512_undefined_si256(),
       (__mmask8) -1);
 }
 
@@ -449,7 +449,7 @@ __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm256_broadcast_f32x2 (__m128 __A)
 {
   return (__m256) __builtin_ia32_broadcastf32x2_256_mask ((__v4sf) __A,
-      (__v8sf)_mm256_undefined_ps(),
+      (__v8sf)_mm256_avx512_undefined_ps(),
       (__mmask8) -1);
 }
 
@@ -478,7 +478,7 @@ _mm256_broadcast_i32x2 (__m128i __A)
 {
   return (__m256i) __builtin_ia32_broadcasti32x2_256_mask ((__v4si)
       __A,
-      (__v8si)_mm256_undefined_si256(),
+      (__v8si)_mm256_avx512_undefined_si256(),
       (__mmask8) -1);
 }
 
@@ -509,7 +509,7 @@ _mm_broadcast_i32x2 (__m128i __A)
 {
   return (__m128i) __builtin_ia32_broadcasti32x2_128_mask ((__v4si)
       __A,
-      (__v4si)_mm_undefined_si128(),
+      (__v4si)_mm_avx512_undefined_si128(),
       (__mmask8) -1);
 }
 
diff --git a/gcc/config/i386/avx512vlintrin.h b/gcc/config/i386/avx512vlintrin.h
index 2b33b82b7ef..d4932f29b56 100644
--- a/gcc/config/i386/avx512vlintrin.h
+++ b/gcc/config/i386/avx512vlintrin.h
@@ -46,15 +46,49 @@ typedef long long __v4di_u __attribute__ ((__vector_size__ (32), \
       __may_alias__, __aligned__ (1)));
 
 extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
-_mm_avx512_setzero_si128 (void)
+_mm_avx512_undefined_si128 (void)
 {
-  return __extension__ (__m128i)(__v4si){ 0, 0, 0, 0 };
+#pragma GCC diagnostic push
+#pragma GCC diagnostic ignored "-Winit-self"
+  __m128i __Y = __Y;
+#pragma GCC diagnostic pop
+  return __Y;
+}
+
+extern __inline __m256 __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_avx512_undefined_ps (void)
+{
+#pragma GCC diagnostic push
+#pragma GCC diagnostic ignored "-Winit-self"
+  __m256 __Y = __Y;
+#pragma GCC diagnostic pop
+  return __Y;
 }
 
 extern __inline __m256d __attribute__((__gnu_inline__, __always_inline__, __artificial__))
-_mm256_avx512_setzero_pd (void)
+_mm256_avx512_undefined_pd (void)
 {
-  return __extension__ (__m256d){ 0.0, 0.0, 0.0, 0.0 };
+#pragma GCC diagnostic push
+#pragma GCC diagnostic ignored "-Winit-self"
+  __m256d __Y = __Y;
+#pragma GCC diagnostic pop
+  return __Y;
+}
+
+extern __inline __m256i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_avx512_undefined_si256 (void)
+{
+#pragma GCC diagnostic push
+#pragma GCC diagnostic ignored "-Winit-self"
+  __m256i __Y = __Y;
+#pragma GCC diagnostic pop
+  return __Y;
+}
+
+extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm_avx512_setzero_si128 (void)
+{
+  return __extension__ (__m128i)(__v4si){ 0, 0, 0, 0 };
 }
 
 extern __inline __m256 __attribute__((__gnu_inline__, __always_inline__, __artificial__))
@@ -64,6 +98,12 @@ _mm256_avx512_setzero_ps (void)
       0.0, 0.0, 0.0, 0.0 };
 }
 
+extern __inline __m256d __attribute__((__gnu_inline__, __always_inline__, __artificial__))
+_mm256_avx512_setzero_pd (void)
+{
+  return __extension__ (__m256d){ 0.0, 0.0, 0.0, 0.0 };
+}
+
 extern __inline __m256i __attribute__((__gnu_inline__, __always_inline__, __artificial__))
 _mm256_avx512_setzero_si256 (void)
 {
@@ -1652,7 +1692,7 @@ _mm_cvtepi32_epi8 (__m128i __A)
 {
   return (__m128i) __builtin_ia32_pmovdb128_mask ((__v4si) __A,
       (__v16qi)
-      _mm_undefined_si128 (),
+      _mm_avx512_undefined_si128 (),
       (__mmask8) -1);
 }
 
@@ -1687,7 +1727,7 @@ _mm256_cvtepi32_epi8 (__m256i __A)
 {
   return (__m128i) __builtin_ia32_pmovdb256_mask ((__v8si) __A,
       (__v16qi)
-      _mm_undefined_si128 (),
+      _mm_avx512_undefined_si128 (),
       (__mmask8) -1);
 }
 
@@ -1722,7 +1762,7 @@ _mm_cvtsepi32_epi8 (__m128i __A)
 {
   return (__m128i) __builtin_ia32_pmovsdb128_mask ((__v4si) __A,
       (__v16qi)
-      _mm_undefined_si128 (),
+      _mm_avx512_undefined_si128 (),
       (__mmask8) -1);
 }
 
@@ -1757,7 +1797,7 @@ _mm256_cvtsepi32_epi8 (__m256i __A)
 {
   return (__m128i) __builtin_ia32_pmovsdb256_mask ((__v8si) __A,
       (__v16qi)
-      _mm_undefined_si128 (),
+      _mm_avx512_undefined_si128 (),
       (__mmask8) -1);
 }
 
@@ -1792,7 +1832,7 @@ _mm_cvtusepi32_epi8 (__m128i __A)
 {
   return (__m128i) __builtin_ia32_pmovusdb128_mask ((__v4si) __A,
       (__v16qi)
-      _mm_undefined_si128 (),
+      _mm_avx512_undefined_si128 (),
       (__mmask8) -1);
 }
 
@@ -1828,7 +1868,7 @@ _mm256_cvtusepi32_epi8 (__m256i __A)
 {
   return (__m128i) __builtin_ia32_pmovusdb256_mask ((__v8si) __A,
       (__v16qi)
-      _mm_undefined_si128 (),
+      _mm_avx512_undefined_si128 (),
       (__mmask8) -1);
 }
 
@@ -1970,7 +2010,7 @@ _mm256_cvtsepi32_epi16 (__m256i __A)
 {
   return (__m128i) __builtin_ia32_pmovsdw256_mask ((__v8si) __A,
       (__v8hi)
-      _mm_undefined_si128 (),
+      _mm_avx512_undefined_si128 (),
       (__mmask8) -1);
 }
 
@@ -2005,7 +2045,7 @@ _mm_cvtusepi32_epi16 (__m128i __A)
 {
   return (__m128i) __builtin_ia32_pmovusdw128_mask ((__v4si) __A,
       (__v8hi)
-      _mm_undefined_si128 (),
+      _mm_avx512_undefined_si128 (),
       (__mmask8) -1);
 }
 
@@ -2040,7 +2080,7 @@ _mm256_cvtusepi32_epi16 (__m256i __A)
 {
   return (__m128i) __builtin_ia32_pmovusdw256_mask ((__v8si) __A,
       (__v8hi)
-      _mm_undefined_si128 (),
+      _mm_avx512_undefined_si128 (),
       (__mmask8) -1);
 }
 
@@ -2075,7 +2115,7 @@ _mm_cvtepi64_epi8 (__m128i __A)
 {
   return (__m128i) __builtin_ia32_pmovqb128_mask ((__v2di) __A,
       (__v16qi)
-      _mm_undefined_si128 (),
+      _mm_avx512_undefined_si128 (),
       (__mmask8) -1);
 }
 
@@ -2110,7 +2150,7 @@ _mm256_cvtepi64_epi8 (__m256i __A)
 {
   return (__m128i) __builtin_ia32_pmovqb256_mask ((__v4di) __A,
       (__v16qi)
-      _mm_undefined_si128 (),
+      _mm_avx512_undefined_si128 (),
       (__mmask8) -1);
 }
 
@@ -2145,7 +2185,7 @@ _mm_cvtsepi64_epi8 (__m128i __A)
 {
   return (__m128i) __builtin_ia32_pmovsqb128_mask ((__v2di) __A,
       (__v16qi)
-      _mm_undefined_si128 (),
+      _mm_avx512_undefined_si128 (),
       (__mmask8) -1);
 }
 
@@ -2180,7 +2220,7 @@ _mm256_cvtsepi64_epi8 (__m256i __A)
 {
   return (__m128i) __builtin_ia32_pmovsqb256_mask ((__v4di) __A,
       (__v16qi)
-      _mm_undefined_si128 (),
+      _mm_avx512_undefined_si128 (),
       (__mmask8) -1);
 }
 
@@ -2215,7 +2255,7 @@ _mm_cvtusepi64_epi8 (__m128i __A)
 {
   return (__m128i) __builtin_ia32_pmovusqb128_mask ((__v2di) __A,
       (__v16qi)
-      _mm_undefined_si128 (),
+      _mm_avx512_undefined_si128 (),
       (__mmask8) -1);
 }
 
@@ -2251,7 +2291,7 @@ _mm256_cvtusepi64_epi8 (__m256i __A)
 {
   return (__m128i) __builtin_ia32_pmovusqb256_mask ((__v4di) __A,
       (__v16qi)
-      _mm_undefined_si128 (),
+      _mm_avx512_undefined_si128 (),
       (__mmask8) -1);
 }
 
@@ -2287,7 +2327,7 @@ _mm_cvtepi64_epi16 (__m128i __A)
 {
   return (__m128i) __builtin_ia32_pmovqw128_mask ((__v2di) __A,
       (__v8hi)
-      _mm_undefined_si128 (),
+      _mm_avx512_undefined_si128 (),
       (__mmask8) -1);
 }
 
@@ -2323,7 +2363,7 @@ _mm256_cvtepi64_epi16 (__m256i __A)
 {
   return (__m128i) __builtin_ia32_pmovqw256_mask ((__v4di) __A,
       (__v8hi)
-      _mm_undefined_si128 (),
+      _mm_avx512_undefined_si128 (),
       (__mmask8) -1);
 }
 
@@ -2358,7 +2398,7 @@ _mm_cvtsepi64_epi16 (__m128i __A)
 {
   return (__m128i) __builtin_ia32_pmovsqw128_mask ((__v2di) __A,
       (__v8hi)
-      _mm_undefined_si128 (),
+      _mm_avx512_undefined_si128 (),
       (__mmask8) -1);
 }
 
@@ -2393,7 +2433,7 @@ _mm256_cvtsepi64_epi16 (__m256i __A)
 {
   return (__m128i) __builtin_ia32_pmovsqw256_mask ((__v4di) __A,
       (__v8hi)
-      _mm_undefined_si128 (),
+      _mm_avx512_undefined_si128 (),
       (__mmask8) -1);
 }
 
@@ -2428,7 +2468,7 @@ _mm_cvtusepi64_epi16 (__m128i __A)
 {
   return (__m128i) __builtin_ia32_pmovusqw128_mask ((__v2di) __A,
       (__v8hi)
-      _mm_undefined_si128 (),
+      _mm_avx512_undefined_si128 (),
       (__mmask8) -1);
 }
 
@@ -2463,7 +2503,7 @@ _mm256_cvtusepi64_epi16 (__m256i __A)
 {
   return (__m128i) __builtin_ia32_pmovusqw256_mask ((__v4di) __A,
       (__v8hi)
-      _mm_undefined_si128 (),
+      _mm_avx512_undefined_si128 (),
       (__mmask8) -1);
 }
 
@@ -2498,7 +2538,7 @@ _mm_cvtepi64_epi32 (__m128i __A)
 {
   return (__m128i) __builtin_ia32_pmovqd128_mask ((__v2di) __A,
       (__v4si)
-      _mm_undefined_si128 (),
+      _mm_avx512_undefined_si128 (),
       (__mmask8) -1);
 }
 
@@ -2534,7 +2574,7 @@ _mm256_cvtepi64_epi32 (__m256i __A)
 {
   return (__m128i) __builtin_ia32_pmovqd256_mask ((__v4di) __A,
       (__v4si)
-      _mm_undefined_si128 (),
+      _mm_avx512_undefined_si128 (),
       (__mmask8) -1);
 }
 
@@ -2569,7 +2609,7 @@ _mm_cvtsepi64_epi32 (__m128i __A)
 {
   return (__m128i) __builtin_ia32_pmovsqd128_mask ((__v2di) __A,
       (__v4si)
-      _mm_undefined_si128 (),
+      _mm_avx512_undefined_si128 (),
       (__mmask8) -1);
 }
 
@@ -2604,7 +2644,7 @@ _mm256_cvtsepi64_epi32 (__m256i __A)
 {
   return (__m128i) __builtin_ia32_pmovsqd256_mask ((__v4di) __A,
       (__v4si)
-      _mm_undefined_si128 (),
+      _mm_avx512_undefined_si128 (),
       (__mmask8) -1);
 }
 
@@ -2640,7 +2680,7 @@ _mm_cvtusepi64_epi32 (__m128i __A)
 {
   return (__m128i) __builtin_ia32_pmovusqd128_mask ((__v2di) __A,
       (__v4si)
-      _mm_undefined_si128 (),
+      _mm_avx512_undefined_si128 (),
       (__mmask8) -1);
 }
 
@@ -2675,7 +2715,7 @@ _mm256_cvtusepi64_epi32 (__m256i __A)
 {
   return (__m128i) __builtin_ia32_pmovusqd256_mask ((__v4di) __A,
       (__v4si)
-      _mm_undefined_si128 (),
+      _mm_avx512_undefined_si128 (),
       (__mmask8) -1);
 }
 
@@ -2914,7 +2954,7 @@ __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm256_broadcast_f32x4 (__m128 __A)
 {
   return (__m256) __builtin_ia32_broadcastf32x4_256_mask ((__v4sf) __A,
-      (__v8sf)_mm256_undefined_pd (),
+      (__v8sf)_mm256_avx512_undefined_pd (),
       (__mmask8) -1);
 }
 
@@ -2943,7 +2983,7 @@ _mm256_broadcast_i32x4 (__m128i __A)
 {
   return (__m256i) __builtin_ia32_broadcasti32x4_256_mask ((__v4si)
       __A,
-      (__v8si)_mm256_undefined_si256 (),
+      (__v8si)_mm256_avx512_undefined_si256 (),
       (__mmask8) -1);
 }
 
@@ -12315,7 +12355,7 @@ _mm256_permutex_pd (__m256d __X, const int __M)
 {
   return (__m256d) __builtin_ia32_permdf256_mask ((__v4df) __X, __M,
       (__v4df)
-      _mm256_undefined_pd (),
+      _mm256_avx512_undefined_pd (),
       (__mmask8) -1);
 }
 
@@ -12323,7 +12363,7 @@ _mm256_permutex_pd (__m256d __X, const int __M)
 #define _mm256_permutex_pd(X, M) \
   ((__m256d) __builtin_ia32_permdf256_mask ((__v4df)(__m256d)(X), (int)(M), \
       (__v4df)(__m256d) \
-      _mm256_undefined_pd (), \
+      _mm256_avx512_undefined_pd (), \
       (__mmask8)-1))
 
 #define _mm256_permutex_epi64(X, I) \