From patchwork Tue Apr 18 06:52:23 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Li, Pan2 via Gcc-patches" X-Patchwork-Id: 84603 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:b0ea:0:b0:3b6:4342:cba0 with SMTP id b10csp2629069vqo; Mon, 17 Apr 2023 23:53:13 -0700 (PDT) X-Google-Smtp-Source: AKy350aoKtADceaQRfGlKUtAF+0GoTppS4DHjAj16df7tkTpEd2OLzUjtTR1Lx/VlwW7WNmuDhPR X-Received: by 2002:a17:907:d1d:b0:94f:3eec:f6b5 with SMTP id gn29-20020a1709070d1d00b0094f3eecf6b5mr8403289ejc.57.1681800793602; Mon, 17 Apr 2023 23:53:13 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1681800793; cv=none; d=google.com; s=arc-20160816; b=oeDN5muMnt6jBXzphXMeQ7b2klph6M+pmS1BwQ6kOc47z1EIMxrqcKsAGAFcPR8jNG teVxWk9pM5C1I51oczsXGNp3suBAX7fMrV1hVCnE4D6+CV6QMyhutNicTXxwmOMb+4b8 STgalIW664JS6uCn9q+7D0GUZqAJYHC4jLS2Bta6nuey4TxP+VU1rrEaARiwNPNQn7jx JUREoYcOUHEBmFwNNJS6kP1IAJknxQNVgHzupGOkFGiDCuVp13QGYAc6DZRwqshf50Tp Jd0i50GWD/tzxE3l++fikbxlUP0cKVQ8Ihynz1djCOeevJGcHSrxHIgTTxIh3rdiMYPv fOeA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=sender:errors-to:reply-to:from:list-subscribe:list-help:list-post :list-archive:list-unsubscribe:list-id:precedence :content-transfer-encoding:mime-version:message-id:date:subject:cc :to:dmarc-filter:delivered-to:dkim-signature:dkim-filter; bh=Bl59ygNzHHqwbQ7dLK/xiDzn+VPUKY9SHM3RYJozRvk=; b=hsuky/QUqTQkWbQuEFW3fI7BjKtnS/5YxmTnNm1dDhNRJpxvEsqIL/i0jDVVOc5v5l n7rt7M/FWqVt/YIa23NaZZbsLMbuTUl94faodb2d/t2HonX0ofHWgjtvkQ0wksz+jiim OqsYC5lFXpQEEdDuKc7Qnms7z0n6WsNto4sLYtOB1/DIhiTiCysIRo6mgtQ/+L5fQ8AE 77iA5ZQ9VZU5Ca36S2UXjnOkUqbzFJ3qA1lAvgRuHkdo8oQ1kNgwMQvC5Sfoh56vqkxt u602ecHf2US0+eKFhg0F9uUHq3rIEFU8VAqzYjoHSBXwmgVAecxr/wMvtlfHQ2hCtReV 3sYQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@gcc.gnu.org header.s=default header.b=hwTj246E; spf=pass (google.com: domain of gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org designates 2620:52:3:1:0:246e:9693:128c as permitted sender) smtp.mailfrom="gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=gnu.org Received: from sourceware.org (server2.sourceware.org. [2620:52:3:1:0:246e:9693:128c]) by mx.google.com with ESMTPS id fp29-20020a1709069e1d00b0094ed0028f57si10007877ejc.847.2023.04.17.23.53.13 for (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 17 Apr 2023 23:53:13 -0700 (PDT) Received-SPF: pass (google.com: domain of gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org designates 2620:52:3:1:0:246e:9693:128c as permitted sender) client-ip=2620:52:3:1:0:246e:9693:128c; Authentication-Results: mx.google.com; dkim=pass header.i=@gcc.gnu.org header.s=default header.b=hwTj246E; spf=pass (google.com: domain of gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org designates 2620:52:3:1:0:246e:9693:128c as permitted sender) smtp.mailfrom="gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=gnu.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 2CD3C3858C5E for ; Tue, 18 Apr 2023 06:53:12 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 2CD3C3858C5E DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org; s=default; t=1681800792; bh=Bl59ygNzHHqwbQ7dLK/xiDzn+VPUKY9SHM3RYJozRvk=; h=To:Cc:Subject:Date:List-Id:List-Unsubscribe:List-Archive: List-Post:List-Help:List-Subscribe:From:Reply-To:From; b=hwTj246EaaZnEtYUS8ntrsDY3w0Ftxrr5dGk+OUJVEMmTzsXArYFw8pXhUxW+iCXG ucU3tBmO/r2AP3Sx2s6lL+v0Bu45B7hZpBI4ql5/ohcbiJ9oXFwjhYJDUMGA+dgGab GMwguNi3zr+mG3VY2t7QKUZdcnHChrt3pNyOoCEk= X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from mga14.intel.com (mga14.intel.com [192.55.52.115]) by sourceware.org (Postfix) with ESMTPS id 43BE63858D1E for ; Tue, 18 Apr 2023 06:52:27 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 43BE63858D1E X-IronPort-AV: E=McAfee;i="6600,9927,10683"; a="345090343" X-IronPort-AV: E=Sophos;i="5.99,206,1677571200"; d="scan'208";a="345090343" Received: from orsmga001.jf.intel.com ([10.7.209.18]) by fmsmga103.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 17 Apr 2023 23:52:26 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10683"; a="723530162" X-IronPort-AV: E=Sophos;i="5.99,206,1677571200"; d="scan'208";a="723530162" Received: from shvmail03.sh.intel.com ([10.239.245.20]) by orsmga001.jf.intel.com with ESMTP; 17 Apr 2023 23:52:24 -0700 Received: from shliclel4214.sh.intel.com (shliclel4214.sh.intel.com [10.239.240.214]) by shvmail03.sh.intel.com (Postfix) with ESMTP id 62A7210081C8; Tue, 18 Apr 2023 14:52:23 +0800 (CST) To: gcc-patches@gcc.gnu.org Cc: hongtao.liu@intel.com, ubizjak@gmail.com Subject: [PATCH] i386: Optimize vshuf{i, f}{32x4, 64x2} ymm and vperm{i, f}128 ymm Date: Tue, 18 Apr 2023 14:52:23 +0800 Message-Id: <20230418065223.3862113-1-lin1.hu@intel.com> X-Mailer: git-send-email 2.31.1 MIME-Version: 1.0 X-Spam-Status: No, score=-11.7 required=5.0 tests=BAYES_00, DKIMWL_WL_HIGH, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, KAM_SHORT, SPF_HELO_NONE, SPF_NONE, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-Patchwork-Original-From: "Hu, Lin1 via Gcc-patches" From: "Li, Pan2 via Gcc-patches" Reply-To: "Hu, Lin1" Errors-To: gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org Sender: "Gcc-patches" X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1763495948827965574?= X-GMAIL-MSGID: =?utf-8?q?1763495948827965574?= Hi, all The patch aims to optimize vshuf{i,f}{32x4,64x2} ymm and vperm{i,f}128. And it has regtested on x86_64-pc-linux-gnu. OK for trunk? Thanks. Lin vshuf{i,f}{32x4,64x2} ymm and vperm{i,f}128 ymm are 3 clk. We can optimze them to vblend, vmovaps when there's no cross-lane. gcc/ChangeLog: * config/i386/sse.md: Modify insn vperm{i,f} and vshuf{i,f}. gcc/testsuite/ChangeLog: * gcc.target/i386/avx512vl-vshuff32x4-1.c: Modify test. * gcc.target/i386/avx512vl-vshuff64x2-1.c: Ditto. * gcc.target/i386/avx512vl-vshufi32x4-1.c: Ditto. * gcc.target/i386/avx512vl-vshufi64x2-1.c: Ditto. * gcc.target/i386/opt-vperm-vshuf-1.c: New test. * gcc.target/i386/opt-vperm-vshuf-2.c: Ditto. * gcc.target/i386/opt-vperm-vshuf-3.c: Ditto. --- gcc/config/i386/sse.md | 36 ++++++++-- .../gcc.target/i386/avx512vl-vshuff32x4-1.c | 2 +- .../gcc.target/i386/avx512vl-vshuff64x2-1.c | 2 +- .../gcc.target/i386/avx512vl-vshufi32x4-1.c | 2 +- .../gcc.target/i386/avx512vl-vshufi64x2-1.c | 2 +- .../gcc.target/i386/opt-vperm-vshuf-1.c | 51 ++++++++++++++ .../gcc.target/i386/opt-vperm-vshuf-2.c | 68 +++++++++++++++++++ .../gcc.target/i386/opt-vperm-vshuf-3.c | 63 +++++++++++++++++ 8 files changed, 218 insertions(+), 8 deletions(-) create mode 100644 gcc/testsuite/gcc.target/i386/opt-vperm-vshuf-1.c create mode 100644 gcc/testsuite/gcc.target/i386/opt-vperm-vshuf-2.c create mode 100644 gcc/testsuite/gcc.target/i386/opt-vperm-vshuf-3.c diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md index 513960e8f33..5b6b2427460 100644 --- a/gcc/config/i386/sse.md +++ b/gcc/config/i386/sse.md @@ -18437,6 +18437,8 @@ mask = INTVAL (operands[3]) / 2; mask |= (INTVAL (operands[5]) - 4) / 2 << 1; operands[3] = GEN_INT (mask); + if (INTVAL (operands[3]) == 2 && !) + return "vblendps\t{$240, %2, %1, %0|%0, %1, %2, 240}"; return "vshuf64x2\t{%3, %2, %1, %0|%0, %1, %2, %3}"; } [(set_attr "type" "sselog") @@ -18595,6 +18597,9 @@ mask |= (INTVAL (operands[7]) - 8) / 4 << 1; operands[3] = GEN_INT (mask); + if (INTVAL (operands[3]) == 2 && !) + return "vblendps\t{$240, %2, %1, %0|%0, %1, %2, 240}"; + return "vshuf32x4\t{%3, %2, %1, %0|%0, %1, %2, %3}"; } [(set_attr "type" "sselog") @@ -25663,7 +25668,28 @@ (match_operand:SI 3 "const_0_to_255_operand")] UNSPEC_VPERMTI))] "TARGET_AVX2" - "vperm2i128\t{%3, %2, %1, %0|%0, %1, %2, %3}" + { + int mask = INTVAL (operands[3]); + if ((mask & 0xbb) == 16) + { + if (rtx_equal_p (operands[0], operands[1])) + return ""; + else + return "vmovaps\t{%1, %0|%0, %1}"; + } + if ((mask & 0xbb) == 50) + { + if (rtx_equal_p (operands[0], operands[2])) + return ""; + else + return "vmovaps\t{%2, %0|%0, %2}"; + } + if ((mask & 0xbb) == 18) + return "vblendps\t{$15, %2, %1, %0|%0, %1, %2, 15}"; + if ((mask & 0xbb) == 48) + return "vblendps\t{$240, %2, %1, %0|%0, %1, %2, 240}"; + return "vperm2i128\t{%3, %2, %1, %0|%0, %1, %2, %3}"; + } [(set_attr "type" "sselog") (set_attr "prefix" "vex") (set_attr "mode" "OI")]) @@ -26226,9 +26252,11 @@ && avx_vperm2f128_parallel (operands[3], mode)" { int mask = avx_vperm2f128_parallel (operands[3], mode) - 1; - if (mask == 0x12) - return "vinsert\t{$0, %x2, %1, %0|%0, %1, %x2, 0}"; - if (mask == 0x20) + if ((mask & 0xbb) == 0x12) + return "vblendps\t{$15, %2, %1, %0|%0, %1, %2, 15}"; + if ((mask & 0xbb) == 0x30) + return "vblendps\t{$240, %2, %1, %0|%0, %1, %2, 240}"; + if ((mask & 0xbb) == 0x20) return "vinsert\t{$1, %x2, %1, %0|%0, %1, %x2, 1}"; operands[3] = GEN_INT (mask); return "vperm2\t{%3, %2, %1, %0|%0, %1, %2, %3}"; diff --git a/gcc/testsuite/gcc.target/i386/avx512vl-vshuff32x4-1.c b/gcc/testsuite/gcc.target/i386/avx512vl-vshuff32x4-1.c index 6c2fb2f184a..02aecf4edce 100644 --- a/gcc/testsuite/gcc.target/i386/avx512vl-vshuff32x4-1.c +++ b/gcc/testsuite/gcc.target/i386/avx512vl-vshuff32x4-1.c @@ -12,7 +12,7 @@ volatile __mmask8 m; void extern avx512vl_test (void) { - x = _mm256_shuffle_f32x4 (x, x, 2); + x = _mm256_shuffle_f32x4 (x, x, 3); x = _mm256_mask_shuffle_f32x4 (x, m, x, x, 2); x = _mm256_maskz_shuffle_f32x4 (m, x, x, 2); } diff --git a/gcc/testsuite/gcc.target/i386/avx512vl-vshuff64x2-1.c b/gcc/testsuite/gcc.target/i386/avx512vl-vshuff64x2-1.c index 1191b400134..563ded5d9df 100644 --- a/gcc/testsuite/gcc.target/i386/avx512vl-vshuff64x2-1.c +++ b/gcc/testsuite/gcc.target/i386/avx512vl-vshuff64x2-1.c @@ -12,7 +12,7 @@ volatile __mmask8 m; void extern avx512vl_test (void) { - x = _mm256_shuffle_f64x2 (x, x, 2); + x = _mm256_shuffle_f64x2 (x, x, 3); x = _mm256_mask_shuffle_f64x2 (x, m, x, x, 2); x = _mm256_maskz_shuffle_f64x2 (m, x, x, 2); } diff --git a/gcc/testsuite/gcc.target/i386/avx512vl-vshufi32x4-1.c b/gcc/testsuite/gcc.target/i386/avx512vl-vshufi32x4-1.c index ef9a441e7a5..e89c4140d37 100644 --- a/gcc/testsuite/gcc.target/i386/avx512vl-vshufi32x4-1.c +++ b/gcc/testsuite/gcc.target/i386/avx512vl-vshufi32x4-1.c @@ -12,7 +12,7 @@ volatile __mmask8 m; void extern avx512vl_test (void) { - x = _mm256_shuffle_i32x4 (x, x, 2); + x = _mm256_shuffle_i32x4 (x, x, 3); x = _mm256_mask_shuffle_i32x4 (x, m, x, x, 2); x = _mm256_maskz_shuffle_i32x4 (m, x, x, 2); } diff --git a/gcc/testsuite/gcc.target/i386/avx512vl-vshufi64x2-1.c b/gcc/testsuite/gcc.target/i386/avx512vl-vshufi64x2-1.c index 0bd117e85d4..8e8e47eda38 100644 --- a/gcc/testsuite/gcc.target/i386/avx512vl-vshufi64x2-1.c +++ b/gcc/testsuite/gcc.target/i386/avx512vl-vshufi64x2-1.c @@ -12,7 +12,7 @@ volatile __mmask8 m; void extern avx512vl_test (void) { - x = _mm256_shuffle_i64x2 (x, x, 2); + x = _mm256_shuffle_i64x2 (x, x, 3); x = _mm256_mask_shuffle_i64x2 (x, m, x, x, 2); x = _mm256_maskz_shuffle_i64x2 (m, x, x, 2); } diff --git a/gcc/testsuite/gcc.target/i386/opt-vperm-vshuf-1.c b/gcc/testsuite/gcc.target/i386/opt-vperm-vshuf-1.c new file mode 100644 index 00000000000..1ee00b6b4a1 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/opt-vperm-vshuf-1.c @@ -0,0 +1,51 @@ +/* { dg-do compile } */ +/* { dg-options "-Ofast -march=sapphirerapids" } */ +/* { dg-final { scan-assembler-times "vmovaps" 1 } } */ +/* { dg-final { scan-assembler-times "vblendps\t\\\$15" 1 } } */ +/* { dg-final { scan-assembler-times "vblendps\t\\\$240" 5 } } */ + +#include + +/* Vpermi128/Vpermf128 */ +__m256i +perm0 (__m256i a, __m256i b) +{ + return _mm256_permute2x128_si256 (a, b, 50); +} + +__m256i +perm1 (__m256i a, __m256i b) +{ + return _mm256_permute2x128_si256 (a, b, 18); +} + +__m256i +perm2 (__m256i a, __m256i b) +{ + return _mm256_permute2x128_si256 (a, b, 48); +} + +/* vshuf{i,f}{32x4,64x2} ymm .*/ +__m256i +shuff0 (__m256i a, __m256i b) +{ + return _mm256_shuffle_i32x4(a, b, 2); +} + +__m256 +shuff1 (__m256 a, __m256 b) +{ + return _mm256_shuffle_f32x4(a, b, 2); +} + +__m256i +shuff2 (__m256i a, __m256i b) +{ + return _mm256_shuffle_i64x2(a, b, 2); +} + +__m256d +shuff3 (__m256d a, __m256d b) +{ + return _mm256_shuffle_f64x2(a, b, 2); +} diff --git a/gcc/testsuite/gcc.target/i386/opt-vperm-vshuf-2.c b/gcc/testsuite/gcc.target/i386/opt-vperm-vshuf-2.c new file mode 100644 index 00000000000..9775072b97a --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/opt-vperm-vshuf-2.c @@ -0,0 +1,68 @@ +/* { dg-do compile } */ +/* { dg-options "-Ofast -march=sapphirerapids" } */ +/* { dg-final { scan-assembler-not "vmovaps" } } */ +/* { dg-final { scan-assembler-not "vblendps" } } */ +/* { dg-final { scan-assembler-not "vperm2i128" } } */ +/* { dg-final { scan-assembler-not "vperm2f128" } } */ + +#include + +__m256i +perm0 (__m256i a, __m256i b) +{ + return _mm256_permute2x128_si256 (a, b, 16); +} + +__m256d +perm1 (__m256d a, __m256d b) +{ + return _mm256_permute2f128_pd (a, b, 16); +} + +__m256 +perm2 (__m256 a, __m256 b) +{ + return _mm256_permute2f128_ps (a, b, 16); +} + +__m256i +perm3 (__m256i a, __m256i b) +{ + return _mm256_permute2f128_si256 (a, b, 16); +} + +__m256i +perm4 (__m256i a, __m256i b) +{ + return _mm256_permute2x128_si256 (a, b, 20); +} + +__m256d +perm5 (__m256d a, __m256d b) +{ + return _mm256_permute2f128_pd (a, b, 20); +} + +__m256i +perm6 (__m256i a, __m256i b) +{ + return _mm256_permute2x128_si256 (a, b, 80); +} + +__m256d +perm7 (__m256d a, __m256d b) +{ + return _mm256_permute2f128_pd (a, b, 80); +} + +__m256i +perm8 (__m256i a, __m256i b) +{ + return _mm256_permute2x128_si256 (a, b, 84); +} + +__m256d +perm9 (__m256d a, __m256d b) +{ + return _mm256_permute2f128_pd (a, b, 84); +} diff --git a/gcc/testsuite/gcc.target/i386/opt-vperm-vshuf-3.c b/gcc/testsuite/gcc.target/i386/opt-vperm-vshuf-3.c new file mode 100644 index 00000000000..a330b14caca --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/opt-vperm-vshuf-3.c @@ -0,0 +1,63 @@ +/* { dg-do compile } */ +/* { dg-options "-Ofast -march=sapphirerapids" } */ +/* { dg-final { scan-assembler-times "vmov..." 3 } } */ +/* { dg-final { scan-assembler-times "vblendps\t\\\$15" 3 } } */ +/* { dg-final { scan-assembler-times "vblendps\t\\\$240" 3 } } */ +/* { dg-final { scan-assembler-not "vperm2f128" } } */ + +#include + +/* Vpermf128 */ +__m256 +perm0 (__m256 a, __m256 b) +{ + return _mm256_permute2f128_ps (a, b, 50); +} + +__m256 +perm1 (__m256 a, __m256 b) +{ + return _mm256_permute2f128_ps (a, b, 18); +} + +__m256 +perm2 (__m256 a, __m256 b) +{ + return _mm256_permute2f128_ps (a, b, 48); +} + +__m256i +perm3 (__m256i a, __m256i b) +{ + return _mm256_permute2f128_si256 (a, b, 50); +} + +__m256i +perm4 (__m256i a, __m256i b) +{ + return _mm256_permute2f128_si256 (a, b, 18); +} + +__m256i +perm5 (__m256i a, __m256i b) +{ + return _mm256_permute2f128_si256 (a, b, 48); +} + +__m256d +perm6 (__m256d a, __m256d b) +{ + return _mm256_permute2f128_pd (a, b, 50); +} + +__m256d +perm7 (__m256d a, __m256d b) +{ + return _mm256_permute2f128_pd (a, b, 18); +} + +__m256d +perm8 (__m256d a, __m256d b) +{ + return _mm256_permute2f128_pd (a, b, 48); +}