From patchwork Thu Apr 20 00:46:14 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: liuhongt
X-Patchwork-Id: 85668
To: gcc-patches@gcc.gnu.org
Cc: crazylht@gmail.com, hjl.tools@gmail.com
Subject: [PATCH 1/2] Use NO_REGS in cost calculation when the preferred register class are not known yet.
Date: Thu, 20 Apr 2023 08:46:14 +0800
Message-Id: <20230420004615.2434390-1-hongtao.liu@intel.com>
List-Id: Gcc-patches mailing list
From: liuhongt
Reply-To: liuhongt

  /* If this insn loads a parameter from its stack slot, then it
     represents a savings, rather than a cost, if the parameter is
     stored in memory. Record this fact.

     Similarly if we're loading other constants from memory (constant
     pool, TOC references, small data areas, etc) and this is the only
     assignment to the destination pseudo.

At that point the preferred register class is unknown, so GENERAL_REGS is
used to record the memory move cost. This is inaccurate, especially for
large vector modes, e.g. 512-bit vectors on x86, which would most probably
be allocated to SSE_REGS instead of GENERAL_REGS. Using GENERAL_REGS here
overestimates the cost of this load and makes RA propagate the memory
operand into many consumer instructions, which causes worse performance.

Fortunately, NO_REGS is used to record the best scenario, so the patch uses
NO_REGS instead of GENERAL_REGS here. This could help RA in PR108707.

Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}
and aarch64-linux-gnu.
Ok for trunk?

gcc/ChangeLog:

	PR rtl-optimization/108707
	* ira-costs.cc (scan_one_insn): Use NO_REGS instead of
	GENERAL_REGS when preferred reg_class is not known.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/pr108707.c: New test.
---
 gcc/ira-costs.cc                         |  5 ++++-
 gcc/testsuite/gcc.target/i386/pr108707.c | 16 ++++++++++++++++
 2 files changed, 20 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr108707.c

diff --git a/gcc/ira-costs.cc b/gcc/ira-costs.cc
index c0fdef807dd..d2a801ab9b0 100644
--- a/gcc/ira-costs.cc
+++ b/gcc/ira-costs.cc
@@ -1572,7 +1572,10 @@ scan_one_insn (rtx_insn *insn)
           && (! ira_use_lra_p || ! pic_offset_table_rtx
               || ! contains_symbol_ref_p (XEXP (note, 0))))
         {
-          enum reg_class cl = GENERAL_REGS;
+          /* Costs for NO_REGS are used in cost calculation on the
+             1st pass when the preferred register classes are not
+             known yet. In this case we take the best scenario. */
+          enum reg_class cl = NO_REGS;
           rtx reg = SET_DEST (set);
           int num = COST_INDEX (REGNO (reg));
 
diff --git a/gcc/testsuite/gcc.target/i386/pr108707.c b/gcc/testsuite/gcc.target/i386/pr108707.c
new file mode 100644
index 00000000000..bc1a476f551
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr108707.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512f -O2" } */
+/* { dg-final { scan-assembler-not {(?n)vfmadd[1-3]*ps.*\(} } } */
+/* { dg-final { scan-assembler-times {(?n)vfmadd[1-3]*ps[ \t]*} 3 } } */
+
+#include
+
+void
+foo (__m512 pv, __m512 a, __m512 b, __m512 c,
+     __m512* pdest, __m512* p1)
+{
+  __m512 t = *p1;
+  pdest[0] = _mm512_fmadd_ps (t, pv, a);
+  pdest[1] = _mm512_fmadd_ps (t, pv, b);
+  pdest[2] = _mm512_fmadd_ps (t, pv, c);
+}

From patchwork Thu Apr 20 00:46:15 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: liuhongt
X-Patchwork-Id: 85669
To: gcc-patches@gcc.gnu.org
Cc: crazylht@gmail.com, hjl.tools@gmail.com
Subject: [PATCH 2/2] Adjust testcases after better RA decision.
Date: Thu, 20 Apr 2023 08:46:15 +0800
Message-Id: <20230420004615.2434390-2-hongtao.liu@intel.com>
In-Reply-To: <20230420004615.2434390-1-hongtao.liu@intel.com>
References: <20230420004615.2434390-1-hongtao.liu@intel.com>
List-Id: Gcc-patches mailing list
From: liuhongt
Reply-To: liuhongt

After the RA optimization, a memory operand is no longer propagated into
more than one instruction. As a result the testcases stop generating vxorps,
because the memory is loaded into the dest and the dest is never unused now.
So rewrite the testcases to make the codegen more stable.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/avx2-dest-false-dep-for-glc.c: Rewrite
	testcase to make the codegen more stable.
	* gcc.target/i386/avx512dq-dest-false-dep-for-glc.c: Ditto.
	* gcc.target/i386/avx512f-dest-false-dep-for-glc.c: Ditto.
	* gcc.target/i386/avx512fp16-dest-false-dep-for-glc.c: Ditto.
	* gcc.target/i386/avx512vl-dest-false-dep-for-glc.c: Ditto.
---
 .../i386/avx2-dest-false-dep-for-glc.c        |  28 +-
 .../i386/avx512dq-dest-false-dep-for-glc.c    | 257 ++++++++++---
 .../i386/avx512f-dest-false-dep-for-glc.c     | 348 ++++++++++++++----
 .../i386/avx512fp16-dest-false-dep-for-glc.c  | 118 ++++--
 .../i386/avx512vl-dest-false-dep-for-glc.c    | 243 +++++++++---
 gcc/testsuite/gcc.target/i386/pr108707.c      |   2 +-
 6 files changed, 791 insertions(+), 205 deletions(-)

diff --git a/gcc/testsuite/gcc.target/i386/avx2-dest-false-dep-for-glc.c b/gcc/testsuite/gcc.target/i386/avx2-dest-false-dep-for-glc.c
index fe331fe5e2c..e260888627f 100644
--- a/gcc/testsuite/gcc.target/i386/avx2-dest-false-dep-for-glc.c
+++ b/gcc/testsuite/gcc.target/i386/avx2-dest-false-dep-for-glc.c
@@ -5,16 +5,28 @@
 
 #include
 
-extern __m256i i1, i2, i3, i4;
-extern __m256d d1, d2;
-extern __m256 f1, f2;
+__m256i
+foo0 (__m256i i3, __m256i i1, __m256i i2)
+{
+  return _mm256_permutevar8x32_epi32 (i1, i2);
+}
+
+__m256i
+foo1 (__m256i i2, __m256i i1)
+{
+  return _mm256_permute4x64_epi64 (i1, 12);
+}
+
+__m256d
+foo2 (__m256d d2, __m256d d1)
+{
+  return _mm256_permute4x64_pd (d1, 12);
+}
 
-void vperm_test (void)
+__m256
+foo3 (__m256 f2, __m256i i2, __m256 f1)
 {
-  i3 = _mm256_permutevar8x32_epi32 (i1, i2);
-  i4 = _mm256_permute4x64_epi64 (i1, 12);
-  d2 = _mm256_permute4x64_pd (d1, 12);
-  f2 = _mm256_permutevar8x32_ps (f1, i2);
+  return _mm256_permutevar8x32_ps (f1, i2);
 }
 
 /* { dg-final { scan-assembler-times "vxorps" 4 } } */
diff --git a/gcc/testsuite/gcc.target/i386/avx512dq-dest-false-dep-for-glc.c
b/gcc/testsuite/gcc.target/i386/avx512dq-dest-false-dep-for-glc.c index b334b88194b..b615b55558d 100644 --- a/gcc/testsuite/gcc.target/i386/avx512dq-dest-false-dep-for-glc.c +++ b/gcc/testsuite/gcc.target/i386/avx512dq-dest-false-dep-for-glc.c @@ -13,56 +13,219 @@ extern __m512 f1, f11; extern __m256 f2; extern __m128 f3, f33; -__mmask32 m32; __mmask16 m16; __mmask8 m8; -void mullo_test (void) -{ - i1 = _mm512_mullo_epi64 (i1, i1); - i1 = _mm512_mask_mullo_epi64 (i1, m8, i1, i1); - i1 = _mm512_maskz_mullo_epi64 (m8, i1, i1); - i2 = _mm256_mullo_epi64 (i2, i2); - i2 = _mm256_mask_mullo_epi64 (i2, m8, i2, i2); - i2 = _mm256_maskz_mullo_epi64 (m8, i2, i2); - i3 = _mm_mullo_epi64 (i3, i3); - i3 = _mm_mask_mullo_epi64 (i3, m8, i3, i3); - i3 = _mm_maskz_mullo_epi64 (m8, i3, i3); -} - -void range_test (void) -{ - d1 = _mm512_range_pd (d1, d11, 15); - d11 = _mm512_range_round_pd (d11, d1, 15, 8); - d1 = _mm512_mask_range_pd (d1, m8, d11, d11, 15); - d11 = _mm512_mask_range_round_pd (d11, m8, d1, d1, 15, 8); - d1 = _mm512_maskz_range_pd (m8, d11, d11, 15); - d11 = _mm512_maskz_range_round_pd (m8, d1, d1, 15, 8); - d2 = _mm256_range_pd (d2, d2, 15); - d2 = _mm256_mask_range_pd (d2, m8, d2, d2, 15); - d2 = _mm256_maskz_range_pd (m8, d2, d2, 15); - d3 = _mm_range_pd (d3, d3, 15); - d3 = _mm_mask_range_pd (d3, m8, d3, d3, 15); - d3 = _mm_maskz_range_pd (m8, d3, d3, 15); - d33 = _mm_range_sd (d33, d33, 15); - d33 = _mm_mask_range_sd (d33, m8, d33, d33, 15); - d33 = _mm_maskz_range_sd (m8, d33, d33, 15); - - f1 = _mm512_range_ps (f1, f11, 15); - f11 = _mm512_range_round_ps (f11, f1, 15, 8); - f1 = _mm512_mask_range_ps (f1, m16, f11, f11, 15); - f11 = _mm512_mask_range_round_ps (f11, m16, f1, f1, 15, 8); - f1 = _mm512_maskz_range_ps (m16, f11, f11, 15); - f11 = _mm512_maskz_range_round_ps (m16, f1, f1, 15, 8); - f2 = _mm256_range_ps (f2, f2, 15); - f2 = _mm256_mask_range_ps (f2, m8, f2, f2, 15); - f2 = _mm256_maskz_range_ps (m8, f2, f2, 15); - f3 = _mm_range_ps (f3, f3, 15); - f3 
= _mm_mask_range_ps (f3, m8, f3, f3, 15); - f3 = _mm_maskz_range_ps (m8, f3, f3, 15); - f33 = _mm_range_ss (f33, f33, 15); - f33 = _mm_mask_range_ss (f33, m8, f33, f33, 15); - f33 = _mm_maskz_range_ss (m8, f33, f33, 15); +#define MULLO(func, type) \ + type \ + mullo##type (type i2, type i1) \ + { \ + return func (i1, i1); \ + } + +#define MULLO_MASK(func, type) \ + type \ + mullo_mask##type (type i2, type i1) \ + { \ + return func (i1, m8, i1, i1); \ + } + +#define MULLO_MASKZ(func, type) \ + type \ + mullo_maksz##type (type i2, type i1) \ + { \ + return func (m8, i1, i1); \ + } + +MULLO (_mm512_mullo_epi64, __m512i); +MULLO_MASK (_mm512_mask_mullo_epi64, __m512i); +MULLO_MASKZ (_mm512_maskz_mullo_epi64, __m512i); +MULLO (_mm256_mullo_epi64, __m256i); +MULLO_MASK (_mm256_mask_mullo_epi64, __m256i); +MULLO_MASKZ (_mm256_maskz_mullo_epi64, __m256i); +MULLO (_mm_mullo_epi64, __m128i); +MULLO_MASK (_mm_mask_mullo_epi64, __m128i); +MULLO_MASKZ (_mm_maskz_mullo_epi64, __m128i); + + +__m512d +foo1 (__m512d d2, __m512d d1, __m512d d11) +{ + return _mm512_range_pd (d1, d11, 15); +} + +__m512d +foo2 (__m512d d2, __m512d d1, __m512d d11) +{ + return _mm512_range_round_pd (d11, d1, 15, 8); +} + +__m512d +foo3 (__m512d d2, __m512d d1, __m512d d11) +{ + return _mm512_mask_range_pd (d1, m8, d11, d11, 15); +} + +__m512d +foo4 (__m512d d2, __m512d d1, __m512d d11) +{ + return _mm512_mask_range_round_pd (d11, m8, d1, d1, 15, 8); +} + +__m512d +foo5 (__m512d d2, __m512d d1, __m512d d11) +{ + return _mm512_maskz_range_pd (m8, d11, d11, 15); +} + +__m512d +foo6 (__m512d d2, __m512d d1, __m512d d11) +{ + return _mm512_maskz_range_round_pd (m8, d1, d1, 15, 8); +} + +__m256d +foo7 (__m256d d1, __m256d d2) +{ + return _mm256_range_pd (d2, d2, 15); +} + +__m256d +foo8 (__m256d d1, __m256d d2) +{ + return _mm256_mask_range_pd (d2, m8, d2, d2, 15); +} + +__m256d +foo9 (__m256d d1, __m256d d2) +{ + return _mm256_maskz_range_pd (m8, d2, d2, 15); +} + +__m128d +foo10 (__m128d d1, __m128d d3) +{ 
+ return _mm_range_pd (d3, d3, 15); +} + +__m128d +foo11 (__m128d d1, __m128d d3) +{ + return _mm_mask_range_pd (d3, m8, d3, d3, 15); +} + +__m128d +foo12 (__m128d d1, __m128d d3) +{ + return _mm_maskz_range_pd (m8, d3, d3, 15); +} + +__m128d +foo13 (__m128d d1, __m128d d33) +{ + return _mm_range_sd (d33, d33, 15); +} + +__m128d +foo14 (__m128d d1, __m128d d33) +{ + return _mm_mask_range_sd (d33, m8, d33, d33, 15); +} + +__m128d +foo15 (__m128d d1, __m128d d33) +{ + return _mm_maskz_range_sd (m8, d33, d33, 15); +} + +__m512 +bar1 (__m512 d2, __m512 d1, __m512 d11) +{ + return _mm512_range_ps (d1, d11, 15); +} + +__m512 +bar2 (__m512 d2, __m512 d1, __m512 d11) +{ + return _mm512_range_round_ps (d11, d1, 15, 8); +} + +__m512 +bar3 (__m512 d2, __m512 d1, __m512 d11) +{ + return _mm512_mask_range_ps (d1, m16, d11, d11, 15); +} + +__m512 +bar4 (__m512 d2, __m512 d1, __m512 d11) +{ + return _mm512_mask_range_round_ps (d11, m16, d1, d1, 15, 8); +} + +__m512 +bar5 (__m512 d2, __m512 d1, __m512 d11) +{ + return _mm512_maskz_range_ps (m16, d11, d11, 15); +} + +__m512 +bar6 (__m512 d2, __m512 d1, __m512 d11) +{ + return _mm512_maskz_range_round_ps (m16, d1, d1, 15, 8); +} + +__m256 +bar7 (__m256 d1, __m256 d2) +{ + return _mm256_range_ps (d2, d2, 15); +} + +__m256 +bar8 (__m256 d1, __m256 d2) +{ + return _mm256_mask_range_ps (d2, m8, d2, d2, 15); +} + +__m256 +bar9 (__m256 d1, __m256 d2) +{ + return _mm256_maskz_range_ps (m8, d2, d2, 15); +} + +__m128 +bar10 (__m128 d1, __m128 d3) +{ + return _mm_range_ps (d3, d3, 15); +} + +__m128 +bar11 (__m128 d1, __m128 d3) +{ + return _mm_mask_range_ps (d3, m8, d3, d3, 15); +} + +__m128 +bar12 (__m128 d1, __m128 d3) +{ + return _mm_maskz_range_ps (m8, d3, d3, 15); +} + +__m128 +bar13 (__m128 d1, __m128 d33) +{ + return _mm_range_ss (d33, d33, 15); +} + +__m128 +bar14 (__m128 d1, __m128 d33) +{ + return _mm_mask_range_ss (d33, m8, d33, d33, 15); +} + +__m128 +bar15 (__m128 d1, __m128 d33) +{ + return _mm_maskz_range_ss (m8, d33, d33, 15); 
} /* { dg-final { scan-assembler-times "vxorps" 26 } } */ diff --git a/gcc/testsuite/gcc.target/i386/avx512f-dest-false-dep-for-glc.c b/gcc/testsuite/gcc.target/i386/avx512f-dest-false-dep-for-glc.c index 26e4ba7e969..1517878ef85 100644 --- a/gcc/testsuite/gcc.target/i386/avx512f-dest-false-dep-for-glc.c +++ b/gcc/testsuite/gcc.target/i386/avx512f-dest-false-dep-for-glc.c @@ -13,86 +13,288 @@ volatile __m512d *pd11; __mmask16 m16; __mmask8 m8; -void vperm_test (void) -{ - d1 = _mm512_permutex_pd (d1, 12); - d1 = _mm512_mask_permutex_pd (d1, m8, d1, 13); - d1 = _mm512_maskz_permutex_pd (m8, d1, 14); - d11 = _mm512_permutexvar_pd (i1, d11); - d11 = _mm512_mask_permutexvar_pd (d11, m8, i2, d11); - d11 = _mm512_maskz_permutexvar_pd (m8, i3, d11); - - f1 = _mm512_permutexvar_ps (i1, f1); - f1 = _mm512_mask_permutexvar_ps (f1, m16, i1, f1); - f1 = _mm512_maskz_permutexvar_ps (m16, i1, f1); - - i3 = _mm512_permutexvar_epi64 (i3, i3); - i3 = _mm512_mask_permutexvar_epi64 (i3, m8, i1, i1); - i3 = _mm512_maskz_permutexvar_epi64 (m8, i3, i1); - i1 = _mm512_permutex_epi64 (i3, 12); - i1 = _mm512_mask_permutex_epi64 (i1, m8, i1, 12); - i1 = _mm512_maskz_permutex_epi64 (m8, i1, 12); - - i2 = _mm512_permutexvar_epi32 (i2, i2); - i2 = _mm512_mask_permutexvar_epi32 (i2, m16, i2, i2); - i3 = _mm512_maskz_permutexvar_epi32 (m16, i3, i3); -} - -void getmant_test (void) -{ - d1 = _mm512_getmant_pd (*pd1, _MM_MANT_NORM_p75_1p5, - _MM_MANT_SIGN_src); - d1 = _mm512_getmant_round_pd (*pd11, _MM_MANT_NORM_p75_1p5, - _MM_MANT_SIGN_src, 8); - d1 = _mm512_mask_getmant_pd (d1, m8, *pd1, _MM_MANT_NORM_p75_1p5, - _MM_MANT_SIGN_src); - d1 = _mm512_mask_getmant_round_pd (d1, m8, *pd1, _MM_MANT_NORM_p75_1p5, - _MM_MANT_SIGN_src, 8); - d1 = _mm512_maskz_getmant_pd (m8, *pd1, _MM_MANT_NORM_p75_1p5, - _MM_MANT_SIGN_src); - d1 = _mm512_maskz_getmant_round_pd (m8, *pd1, _MM_MANT_NORM_p75_1p5, - _MM_MANT_SIGN_src, 8); - f1 = _mm512_getmant_ps (*pf1, _MM_MANT_NORM_p75_1p5, - _MM_MANT_SIGN_src); - f1 = 
_mm512_getmant_round_ps (*pf1, _MM_MANT_NORM_p75_1p5, - _MM_MANT_SIGN_src, 8); - f1 = _mm512_mask_getmant_ps (f1, m16, *pf1, _MM_MANT_NORM_p75_1p5, - _MM_MANT_SIGN_src); - f1 = _mm512_mask_getmant_round_ps (f1, m16, *pf1, _MM_MANT_NORM_p75_1p5, - _MM_MANT_SIGN_src, 8); - f1 = _mm512_maskz_getmant_ps (m16, *pf1, _MM_MANT_NORM_p75_1p5, - _MM_MANT_SIGN_src); - f1 = _mm512_maskz_getmant_round_ps (m16, *pf1, _MM_MANT_NORM_p75_1p5, - _MM_MANT_SIGN_src, 8); - - d2 = _mm_getmant_sd (d2, d2, _MM_MANT_NORM_p75_1p5, - _MM_MANT_SIGN_src); - d2 = _mm_getmant_round_sd (d2, d2, _MM_MANT_NORM_p75_1p5, - _MM_MANT_SIGN_src, 8); - d2 = _mm_mask_getmant_sd (d2, m8, d2, d2, _MM_MANT_NORM_p75_1p5, +__m512d +foo1 (__m512d d2, __m512d d1) +{ + return _mm512_permutex_pd (d1, 12); +} + +__m512d +foo2 (__m512d d2, __m512d d1) +{ + return _mm512_mask_permutex_pd (d1, m8, d1, 13); +} + +__m512d +foo3 (__m512d d2, __m512d d1) +{ + return _mm512_maskz_permutex_pd (m8, d1, 14); +} + +__m512d +foo4 (__m512d d2, __m512d d11, __m512i i1) +{ + return _mm512_permutexvar_pd (i1, d11); +} + +__m512d +foo5 (__m512d d2, __m512d d11, __m512i i2) +{ + return _mm512_mask_permutexvar_pd (d11, m8, i2, d11); +} + +__m512d +foo6 (__m512d d2, __m512d d11, __m512i i3) +{ + return _mm512_maskz_permutexvar_pd (m8, i3, d11); +} + +__m512i +ioo1 (__m512i d2, __m512i d1) +{ + return _mm512_permutex_epi64 (d1, 12); +} + +__m512i +ioo2 (__m512i d2, __m512i d1) +{ + return _mm512_mask_permutex_epi64 (d1, m8, d1, 13); +} + +__m512i +ioo3 (__m512i d2, __m512i d1) +{ + return _mm512_maskz_permutex_epi64 (m8, d1, 14); +} + +__m512i +ioo4 (__m512i d2, __m512i d11, __m512i i1) +{ + return _mm512_permutexvar_epi64 (i1, d11); +} + +__m512i +ioo5 (__m512i d2, __m512i d11, __m512i i2) +{ + return _mm512_mask_permutexvar_epi64 (d11, m8, i2, d11); +} + +__m512i +ioo6 (__m512i d2, __m512i d11, __m512i i3) +{ + return _mm512_maskz_permutexvar_epi64 (m8, i3, d11); +} + +__m512 +koo1 (__m512 f2, __m512i i1, __m512 f1) +{ + return 
_mm512_permutexvar_ps (i1, f1); +} + +__m512 +koo2 (__m512 f2, __m512i i1, __m512 f1) +{ + return _mm512_mask_permutexvar_ps (f1, m16, i1, f1); +} + +__m512 +koo3 (__m512 f2, __m512i i1, __m512 f1) +{ + return _mm512_maskz_permutexvar_ps (m16, i1, f1); +} + +__m512i +hoo1 (__m512i f2, __m512i i1, __m512i f1) +{ + return _mm512_permutexvar_epi32 (i1, f1); +} + +__m512i +hoo2 (__m512i f2, __m512i i1, __m512i f1) +{ + return _mm512_mask_permutexvar_epi32 (f1, m16, i1, f1); +} + +__m512i +hoo3 (__m512i f2, __m512i i1, __m512i f1) +{ + return _mm512_maskz_permutexvar_epi32 (m16, i1, f1); +} + +__m512d +moo1 (__m512d d2, __m512d* d1) +{ + return _mm512_getmant_pd (*d1, _MM_MANT_NORM_p75_1p5, + _MM_MANT_SIGN_src); +} + +__m512d +moo2 (__m512d d2, __m512d* d1) +{ + return _mm512_getmant_round_pd (*d1, _MM_MANT_NORM_p75_1p5, + _MM_MANT_SIGN_src, 8); +} + +__m512d +moo3 (__m512d d2, __m512d d1, __m512d* d3) +{ + + return _mm512_mask_getmant_pd (d1, m8, *d3, _MM_MANT_NORM_p75_1p5, + _MM_MANT_SIGN_src); +} + +__m512d +moo4 (__m512d d2, __m512d d1, __m512d* d3) +{ + return _mm512_mask_getmant_round_pd (d1, m8, *d3, _MM_MANT_NORM_p75_1p5, + _MM_MANT_SIGN_src, 8); +} + +__m512d +moo5 (__m512d d2, __m512d* d1) +{ + return _mm512_maskz_getmant_pd (m8, *d1, _MM_MANT_NORM_p75_1p5, + _MM_MANT_SIGN_src); +} + +__m512d +moo6 (__m512d d2, __m512d* d1, __m512d d3) +{ + return _mm512_maskz_getmant_round_pd (m8, *d1, _MM_MANT_NORM_p75_1p5, + _MM_MANT_SIGN_src, 8); +} + +__m512 +noo1 (__m512 d2, __m512* d1) +{ + return _mm512_getmant_ps (*d1, _MM_MANT_NORM_p75_1p5, _MM_MANT_SIGN_src); - d2 = _mm_mask_getmant_round_sd (d2, m8, d2, d2, _MM_MANT_NORM_p75_1p5, +} + +__m512 +noo2 (__m512 d2, __m512* d1) +{ + return _mm512_getmant_round_ps (*d1, _MM_MANT_NORM_p75_1p5, + _MM_MANT_SIGN_src, 8); +} + +__m512 +noo3 (__m512 d2, __m512 d1, __m512* d3) +{ + + return _mm512_mask_getmant_ps (d1, m16, *d3, _MM_MANT_NORM_p75_1p5, + _MM_MANT_SIGN_src); +} + +__m512 +noo4 (__m512 d2, __m512 d1, __m512* d3) +{ 
+ return _mm512_mask_getmant_round_ps (d1, m16, *d3, _MM_MANT_NORM_p75_1p5, + _MM_MANT_SIGN_src, 8); +} + +__m512 +noo5 (__m512 d2, __m512* d1) +{ + return _mm512_maskz_getmant_ps (m16, *d1, _MM_MANT_NORM_p75_1p5, + _MM_MANT_SIGN_src); +} + +__m512 +noo6 (__m512 d2, __m512* d1, __m512 d3) +{ + return _mm512_maskz_getmant_round_ps (m16, *d1, _MM_MANT_NORM_p75_1p5, + _MM_MANT_SIGN_src, 8); +} + + +__m128d +ooo1 (__m128d d2, __m128d d1) +{ + return _mm_getmant_sd (d1, d1, _MM_MANT_NORM_p75_1p5, + _MM_MANT_SIGN_src); +} + +__m128d +ooo2 (__m128d d2, __m128d d1) +{ + return _mm_getmant_round_sd (d1, d1, _MM_MANT_NORM_p75_1p5, _MM_MANT_SIGN_src, 8); - d2 = _mm_maskz_getmant_sd (m8, d2, d2, _MM_MANT_NORM_p75_1p5, - _MM_MANT_SIGN_src); - d2 = _mm_maskz_getmant_round_sd (m8, d2, d2, _MM_MANT_NORM_p75_1p5, - _MM_MANT_SIGN_src, 8); - f2 = _mm_getmant_ss (f2, f2, _MM_MANT_NORM_p75_1p5, - _MM_MANT_SIGN_src); - f2 = _mm_getmant_round_ss (f2, f2, _MM_MANT_NORM_p75_1p5, - _MM_MANT_SIGN_src, 8); - f2 = _mm_mask_getmant_ss (f2, m8, f2, f2, _MM_MANT_NORM_p75_1p5, +} + +__m128d +ooo3 (__m128d d2, __m128d d1, __m128d d3) +{ + + return _mm_mask_getmant_sd (d1, m8, d3, d1, _MM_MANT_NORM_p75_1p5, + _MM_MANT_SIGN_src); +} + +__m128d +ooo4 (__m128d d2, __m128d d1, __m128d d3) +{ + return _mm_mask_getmant_round_sd (d1, m8, d3, d1, _MM_MANT_NORM_p75_1p5, + _MM_MANT_SIGN_src, 8); +} + +__m128d +ooo5 (__m128d d2, __m128d d1) +{ + return _mm_maskz_getmant_sd (m8, d1, d1, _MM_MANT_NORM_p75_1p5, + _MM_MANT_SIGN_src); +} + +__m128d +ooo6 (__m128d d2, __m128d d1, __m128d d3) +{ + return _mm_maskz_getmant_round_sd (m8, d1, d3, _MM_MANT_NORM_p75_1p5, + _MM_MANT_SIGN_src, 8); +} + +__m128 +poo1 (__m128 d2, __m128 d1) +{ + return _mm_getmant_ss (d1, d1, _MM_MANT_NORM_p75_1p5, _MM_MANT_SIGN_src); - f2 = _mm_mask_getmant_round_ss (f2, m8, f2, f2, _MM_MANT_NORM_p75_1p5, +} + +__m128 +poo2 (__m128 d2, __m128 d1) +{ + return _mm_getmant_round_ss (d1, d1, _MM_MANT_NORM_p75_1p5, _MM_MANT_SIGN_src, 8); - f2 = 
_mm_maskz_getmant_ss (m8, f2, f2, _MM_MANT_NORM_p75_1p5, - _MM_MANT_SIGN_src); - f2 = _mm_maskz_getmant_round_ss (m8, f2, f2, _MM_MANT_NORM_p75_1p5, - _MM_MANT_SIGN_src, 8); +} + +__m128 +poo3 (__m128 d2, __m128 d1, __m128 d3) +{ + + return _mm_mask_getmant_ss (d1, m8, d3, d1, _MM_MANT_NORM_p75_1p5, + _MM_MANT_SIGN_src); +} + +__m128 +poo4 (__m128 d2, __m128 d1, __m128 d3) +{ + return _mm_mask_getmant_round_ss (d1, m8, d3, d1, _MM_MANT_NORM_p75_1p5, + _MM_MANT_SIGN_src, 8); +} +__m128 +poo5 (__m128 d2, __m128 d1) +{ + return _mm_maskz_getmant_ss (m8, d1, d1, _MM_MANT_NORM_p75_1p5, + _MM_MANT_SIGN_src); +} + +__m128 +poo6 (__m128 d2, __m128 d1, __m128 d3) +{ + return _mm_maskz_getmant_round_ss (m8, d1, d3, _MM_MANT_NORM_p75_1p5, + _MM_MANT_SIGN_src, 8); } -/* { dg-final { scan-assembler-times "vxorps" 22 } } */ +/* { dg-final { scan-assembler-times "vxorps" 24 } } */ /* { dg-final { scan-assembler-times "vpermd" 3 } } */ /* { dg-final { scan-assembler-times "vpermq" 6 } } */ /* { dg-final { scan-assembler-times "vpermps" 3 } } */ diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-dest-false-dep-for-glc.c b/gcc/testsuite/gcc.target/i386/avx512fp16-dest-false-dep-for-glc.c index 990d65b0904..55c7399da3b 100644 --- a/gcc/testsuite/gcc.target/i386/avx512fp16-dest-false-dep-for-glc.c +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-dest-false-dep-for-glc.c @@ -11,32 +11,98 @@ __mmask32 m32; __mmask16 m16; __mmask8 m8; -void complex_mul_test (void) -{ - h1 = _mm512_fmul_pch (h1, h1); - h1 = _mm512_fmul_round_pch (h1, h1, 8); - h1 = _mm512_mask_fmul_pch (h1, m32, h1, h1); - h1 = _mm512_mask_fmul_round_pch (h1, m32, h1, h1, 8); - h1 = _mm512_maskz_fmul_pch (m32, h1, h1); - h1 = _mm512_maskz_fmul_round_pch (m32, h1, h1, 11); - - h3 = _mm_fmul_sch (h3, h3); - h3 = _mm_fmul_round_sch (h3, h3, 8); - h3 = _mm_mask_fmul_sch (h3, m8, h3, h3); - h3 = _mm_mask_fmul_round_sch (h3, m8, h3, h3, 8); - h3 = _mm_maskz_fmul_sch (m8, h3, h3); - h3 = _mm_maskz_fmul_round_sch (m8, h3, h3, 
11); -} - -void vgetmant_test (void) -{ - h3 = _mm_getmant_sh (h3, h3, _MM_MANT_NORM_p75_1p5, - _MM_MANT_SIGN_src); - h3 = _mm_mask_getmant_sh (h3, m8, h3, h3, _MM_MANT_NORM_p75_1p5, - _MM_MANT_SIGN_src); - h3 = _mm_maskz_getmant_sh (m8, h3, h3, _MM_MANT_NORM_p75_1p5, - _MM_MANT_SIGN_src); -} +__m512h +foo1 (__m512h h2, __m512h h1) +{ + return _mm512_fmul_pch (h1, h1); +} + +__m512h +foo2 (__m512h h2, __m512h h1) +{ + return _mm512_fmul_round_pch (h1, h1, 8); +} + +__m512h +foo3 (__m512h h2, __m512h h1) +{ + return _mm512_mask_fmul_pch (h1, m32, h1, h1); +} + +__m512h +foo4 (__m512h h2, __m512h h1) +{ + return _mm512_mask_fmul_round_pch (h1, m32, h1, h1, 8); +} + +__m512h +foo5 (__m512h h2, __m512h h1) +{ + return _mm512_maskz_fmul_pch (m32, h1, h1); +} + +__m512h +foo6 (__m512h h2, __m512h h1) +{ + return _mm512_maskz_fmul_round_pch (m32, h1, h1, 11); +} + +__m128h +bar1 (__m128h h2, __m128h h1) +{ + return _mm_fmul_sch (h1, h1); +} + +__m128h +bar2 (__m128h h2, __m128h h1) +{ + return _mm_fmul_round_sch (h1, h1, 8); +} + +__m128h +bar3 (__m128h h2, __m128h h1) +{ + return _mm_mask_fmul_sch (h1, m8, h1, h1); +} + +__m128h +bar4 (__m128h h2, __m128h h1) +{ + return _mm_mask_fmul_round_sch (h1, m8, h1, h1, 8); +} + +__m128h +bar5 (__m128h h2, __m128h h1) +{ + return _mm_maskz_fmul_sch (m8, h1, h1); +} + +__m128h +bar6 (__m128h h2, __m128h h1) +{ + return _mm_maskz_fmul_round_sch (m8, h1, h1, 11); +} + +__m128h +zoo1 (__m128h h1, __m128h h3) +{ + return _mm_getmant_sh (h3, h3, _MM_MANT_NORM_p75_1p5, + _MM_MANT_SIGN_src); +} + +__m128h +zoo2 (__m128h h1, __m128h h3) +{ + return _mm_mask_getmant_sh (h3, m8, h3, h3, _MM_MANT_NORM_p75_1p5, + _MM_MANT_SIGN_src); +} + +__m128h +zoo3 (__m128h h1, __m128h h3) +{ + return _mm_maskz_getmant_sh (m8, h3, h3, _MM_MANT_NORM_p75_1p5, + _MM_MANT_SIGN_src); +} /* { dg-final { scan-assembler-times "vxorps" 10 } } */ /* { dg-final { scan-assembler-times "vfmulcph" 6 } } */ diff --git 
a/gcc/testsuite/gcc.target/i386/avx512vl-dest-false-dep-for-glc.c b/gcc/testsuite/gcc.target/i386/avx512vl-dest-false-dep-for-glc.c index 37d3ba51452..1437254d3ce 100644 --- a/gcc/testsuite/gcc.target/i386/avx512vl-dest-false-dep-for-glc.c +++ b/gcc/testsuite/gcc.target/i386/avx512vl-dest-false-dep-for-glc.c @@ -13,60 +13,203 @@ extern __m128 f2, *pf2; __mmask16 m16; __mmask8 m8; -void vperm_test (void) -{ - d1 = _mm256_permutex_pd (d1, 12); - d1 = _mm256_mask_permutex_pd (d1, m8, d1, 12); - d1 = _mm256_maskz_permutex_pd (m8, d1, 12); - d11 = _mm256_permutexvar_pd (i1, d11); - d11 = _mm256_mask_permutexvar_pd (d11, m8, i1, d11); - d11 = _mm256_maskz_permutexvar_pd (m8, i1, d11); - - f1 = _mm256_permutexvar_ps (i1, f1); - f1 = _mm256_mask_permutexvar_ps (f1, m8, i1, f1); - f1 = _mm256_maskz_permutexvar_ps (m8, i1, f1); - - i1 = _mm256_permutexvar_epi64 (i1, i1); - i1 = _mm256_mask_permutexvar_epi64 (i1, m8, i1, i1); - i1 = _mm256_maskz_permutexvar_epi64 (m8, i1, i1); - i1 = _mm256_permutex_epi64 (i1, 12); - i1 = _mm256_mask_permutex_epi64 (i1, m8, i1, 12); - i1 = _mm256_maskz_permutex_epi64 (m8, i1, 12); - - i2 = _mm256_permutexvar_epi32 (i2, i2); - i2 = _mm256_mask_permutexvar_epi32 (i2, m8, i2, i2); - i3 = _mm256_maskz_permutexvar_epi32 (m8, i3, i3); -} - -void getmant_test (void) -{ - d1 = _mm256_getmant_pd (*pd1, _MM_MANT_NORM_p75_1p5, - _MM_MANT_SIGN_src); - d1 = _mm256_mask_getmant_pd (d1, m8, *pd1, _MM_MANT_NORM_p75_1p5, - _MM_MANT_SIGN_src); - d1 = _mm256_maskz_getmant_pd (m8, *pd1, _MM_MANT_NORM_p75_1p5, - _MM_MANT_SIGN_src); - d2 = _mm_getmant_pd (*pd2, _MM_MANT_NORM_p75_1p5, - _MM_MANT_SIGN_src); - d2 = _mm_mask_getmant_pd (d2, m8, *pd2, _MM_MANT_NORM_p75_1p5, +__m256d +foo1 (__m256d d2, __m256d d1) +{ + return _mm256_permutex_pd (d1, 12); +} + +__m256d +foo2 (__m256d d2, __m256d d1) +{ + return _mm256_mask_permutex_pd (d1, m8, d1, 13); +} + +__m256d +foo3 (__m256d d2, __m256d d1) +{ + return _mm256_maskz_permutex_pd (m8, d1, 14); +} + +__m256d +foo4 
(__m256d d2, __m256d d11, __m256i i1) +{ + return _mm256_permutexvar_pd (i1, d11); +} + +__m256d +foo5 (__m256d d2, __m256d d11, __m256i i2) +{ + return _mm256_mask_permutexvar_pd (d11, m8, i2, d11); +} + +__m256d +foo6 (__m256d d2, __m256d d11, __m256i i3) +{ + return _mm256_maskz_permutexvar_pd (m8, i3, d11); +} + +__m256i +ioo1 (__m256i d2, __m256i d1) +{ + return _mm256_permutex_epi64 (d1, 12); +} + +__m256i +ioo2 (__m256i d2, __m256i d1) +{ + return _mm256_mask_permutex_epi64 (d1, m8, d1, 13); +} + +__m256i +ioo3 (__m256i d2, __m256i d1) +{ + return _mm256_maskz_permutex_epi64 (m8, d1, 14); +} + +__m256i +ioo4 (__m256i d2, __m256i d11, __m256i i1) +{ + return _mm256_permutexvar_epi64 (i1, d11); +} + +__m256i +ioo5 (__m256i d2, __m256i d11, __m256i i2) +{ + return _mm256_mask_permutexvar_epi64 (d11, m8, i2, d11); +} + +__m256i +ioo6 (__m256i d2, __m256i d11, __m256i i3) +{ + return _mm256_maskz_permutexvar_epi64 (m8, i3, d11); +} + +__m256 +koo1 (__m256 f2, __m256i i1, __m256 f1) +{ + return _mm256_permutexvar_ps (i1, f1); +} + +__m256 +koo2 (__m256 f2, __m256i i1, __m256 f1) +{ + return _mm256_mask_permutexvar_ps (f1, m8, i1, f1); +} + +__m256 +koo3 (__m256 f2, __m256i i1, __m256 f1) +{ + return _mm256_maskz_permutexvar_ps (m8, i1, f1); +} + +__m256i +hoo1 (__m256i f2, __m256i i1, __m256i f1) +{ + return _mm256_permutexvar_epi32 (i1, f1); +} + +__m256i +hoo2 (__m256i f2, __m256i i1, __m256i f1) +{ + return _mm256_mask_permutexvar_epi32 (f1, m8, i1, f1); +} + +__m256i +hoo3 (__m256i f2, __m256i i1, __m256i f1) +{ + return _mm256_maskz_permutexvar_epi32 (m8, i1, f1); +} + +__m256d +moo1 (__m256d d2, __m256d* d1) +{ + return _mm256_getmant_pd (*d1, _MM_MANT_NORM_p75_1p5, _MM_MANT_SIGN_src); - d2 = _mm_maskz_getmant_pd (m8, *pd2, _MM_MANT_NORM_p75_1p5, - _MM_MANT_SIGN_src); - f1 = _mm256_getmant_ps (*pf1, _MM_MANT_NORM_p75_1p5, - _MM_MANT_SIGN_src); - f1 = _mm256_mask_getmant_ps (f1, m8, *pf1, _MM_MANT_NORM_p75_1p5, - _MM_MANT_SIGN_src); - f1 = 
_mm256_maskz_getmant_ps (m8, *pf1, _MM_MANT_NORM_p75_1p5, - _MM_MANT_SIGN_src); - f2 = _mm_getmant_ps (*pf2, _MM_MANT_NORM_p75_1p5, - _MM_MANT_SIGN_src); - f2 = _mm_mask_getmant_ps (f2, m8, *pf2, _MM_MANT_NORM_p75_1p5, +} + +__m256d +moo3 (__m256d d2, __m256d d1, __m256d* d3) +{ + + return _mm256_mask_getmant_pd (d1, m8, *d3, _MM_MANT_NORM_p75_1p5, + _MM_MANT_SIGN_src); +} + +__m256d +moo5 (__m256d d2, __m256d* d1) +{ + return _mm256_maskz_getmant_pd (m8, *d1, _MM_MANT_NORM_p75_1p5, + _MM_MANT_SIGN_src); +} + +__m128d +moo2 (__m128d d2, __m128d* d1) +{ + return _mm_getmant_pd (*d1, _MM_MANT_NORM_p75_1p5, _MM_MANT_SIGN_src); - f2 = _mm_maskz_getmant_ps (m8, *pf2, _MM_MANT_NORM_p75_1p5, - _MM_MANT_SIGN_src); } -/* { dg-final { scan-assembler-times "vxorps" 19 } } */ +__m128d +moo4 (__m128d d2, __m128d d1, __m128d* d3) +{ + + return _mm_mask_getmant_pd (d1, m8, *d3, _MM_MANT_NORM_p75_1p5, + _MM_MANT_SIGN_src); +} + +__m128d +moo6 (__m128d d2, __m128d* d1) +{ + return _mm_maskz_getmant_pd (m8, *d1, _MM_MANT_NORM_p75_1p5, + _MM_MANT_SIGN_src); +} + +__m256 +noo1 (__m256 d2, __m256* d1) +{ + return _mm256_getmant_ps (*d1, _MM_MANT_NORM_p75_1p5, + _MM_MANT_SIGN_src); +} + +__m256 +noo3 (__m256 d2, __m256 d1, __m256* d3) +{ + + return _mm256_mask_getmant_ps (d1, m8, *d3, _MM_MANT_NORM_p75_1p5, + _MM_MANT_SIGN_src); +} + +__m256 +noo5 (__m256 d2, __m256* d1) +{ + return _mm256_maskz_getmant_ps (m8, *d1, _MM_MANT_NORM_p75_1p5, + _MM_MANT_SIGN_src); +} + +__m128 +noo2 (__m128 d2, __m128* d1) +{ + return _mm_getmant_ps (*d1, _MM_MANT_NORM_p75_1p5, + _MM_MANT_SIGN_src); +} + +__m128 +noo4 (__m128 d2, __m128 d1, __m128* d3) +{ + + return _mm_mask_getmant_ps (d1, m8, *d3, _MM_MANT_NORM_p75_1p5, + _MM_MANT_SIGN_src); +} + +__m128 +noo6 (__m128 d2, __m128* d1) +{ + return _mm_maskz_getmant_ps (m8, *d1, _MM_MANT_NORM_p75_1p5, + _MM_MANT_SIGN_src); +} + +/* { dg-final { scan-assembler-times "vxorps" 20 } } */ /* { dg-final { scan-assembler-times "vpermpd" 6 } } */ /* { dg-final { 
scan-assembler-times "vpermps" 3 } } */
 /* { dg-final { scan-assembler-times "vpermq" 6 } } */
diff --git a/gcc/testsuite/gcc.target/i386/pr108707.c b/gcc/testsuite/gcc.target/i386/pr108707.c
index bc1a476f551..6405cfe7cdc 100644
--- a/gcc/testsuite/gcc.target/i386/pr108707.c
+++ b/gcc/testsuite/gcc.target/i386/pr108707.c
@@ -1,6 +1,6 @@
 /* { dg-do compile } */
 /* { dg-options "-mavx512f -O2" } */
-/* { dg-final { scan-assembler-not {(?n)vfmadd[1-3]*ps.*\(} } } */
+/* { dg-final { scan-assembler-not {(?n)vfmadd[1-3]*ps.*\(} { target { ! ia32 } } } } */
 /* { dg-final { scan-assembler-times {(?n)vfmadd[1-3]*ps[ \t]*} 3 } } */
 #include