From: Uros Bizjak
Date: Wed, 15 Mar 2023 20:37:40 +0100
Subject: [PATCH] i386: Fix blend vector permutation for 8-byte modes
To: gcc-patches@gcc.gnu.org

8-byte modes should be processed only for TARGET_MMX_WITH_SSE.
Handle V2SFmode and fix V2HImode handling.

The resulting BLEND instructions are always faster than MOVSS/MOVSD,
so prioritize them w.r.t MOVSS/MOVSD for TARGET_SSE4_1.

gcc/ChangeLog:

	* config/i386/i386-expand.cc (expand_vec_perm_blend):
	Handle 8-byte modes only with TARGET_MMX_WITH_SSE.  Handle V2SFmode
	and fix V2HImode handling.
	(expand_vec_perm_1): Try to emit BLEND instruction
	before MOVSS/MOVSD.
	* config/i386/mmx.md (*mmx_blendps): New insn pattern.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/merge-1.c (dg-options): Use -mno-sse4.
	* gcc.target/i386/sse2-mmx-21.c (dg-options): Ditto.
	* gcc.target/i386/sse-movss-4.c (dg-options):
	Use -mno-sse4.  Simplify scan-assembler-not strings.
	* gcc.target/i386/sse2-movsd-3.c (dg-options): Ditto.
	* gcc.target/i386/sse2-mmx-movss-1.c: New test.

Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

Pushed to master.

Uros.

diff --git a/gcc/config/i386/i386-expand.cc b/gcc/config/i386/i386-expand.cc
index e89abf2e817..1545d4365b7 100644
--- a/gcc/config/i386/i386-expand.cc
+++ b/gcc/config/i386/i386-expand.cc
@@ -19007,9 +19007,10 @@ expand_vec_perm_blend (struct expand_vec_perm_d *d)
     ;
   else if (TARGET_AVX && (vmode == V4DFmode || vmode == V8SFmode))
     ;
-  else if (TARGET_SSE4_1 && (GET_MODE_SIZE (vmode) == 16
-                             || GET_MODE_SIZE (vmode) == 8
-                             || GET_MODE_SIZE (vmode) == 4))
+  else if (TARGET_SSE4_1
+           && (GET_MODE_SIZE (vmode) == 16
+               || (TARGET_MMX_WITH_SSE && GET_MODE_SIZE (vmode) == 8)
+               || GET_MODE_SIZE (vmode) == 4))
     ;
   else
     return false;
@@ -19042,6 +19043,8 @@ expand_vec_perm_blend (struct expand_vec_perm_d *d)
     case E_V8SFmode:
     case E_V2DFmode:
     case E_V4SFmode:
+    case E_V2SFmode:
+    case E_V2HImode:
     case E_V4HImode:
     case E_V8HImode:
     case E_V8SImode:
@@ -19897,11 +19900,15 @@ expand_vec_perm_1 (struct expand_vec_perm_d *d)
         }
     }
 
+  /* Try the SSE4.1 blend variable merge instructions.  */
+  if (expand_vec_perm_blend (d))
+    return true;
+
   /* Try movss/movsd instructions.  */
   if (expand_vec_perm_movs (d))
     return true;
 
-  /* Finally, try the fully general two operand permute.  */
+  /* Try the fully general two operand permute.  */
   if (expand_vselect_vconcat (d->target, d->op0, d->op1, d->perm, nelt,
                               d->testing_p))
     return true;
@@ -19924,10 +19931,6 @@ expand_vec_perm_1 (struct expand_vec_perm_d *d)
         return true;
     }
 
-  /* Try the SSE4.1 blend variable merge instructions.  */
-  if (expand_vec_perm_blend (d))
-    return true;
-
   /* Try one of the AVX vpermil variable permutations.  */
   if (expand_vec_perm_vpermil (d))
     return true;
diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md
index f9c66115f81..18dae03ad0a 100644
--- a/gcc/config/i386/mmx.md
+++ b/gcc/config/i386/mmx.md
@@ -1154,6 +1154,25 @@ (define_expand "vcondv2sf"
   DONE;
 })
 
+(define_insn "*mmx_blendps"
+  [(set (match_operand:V2SF 0 "register_operand" "=Yr,*x,x")
+        (vec_merge:V2SF
+          (match_operand:V2SF 2 "register_operand" "Yr,*x,x")
+          (match_operand:V2SF 1 "register_operand" "0,0,x")
+          (match_operand:SI 3 "const_0_to_3_operand")))]
+  "TARGET_SSE4_1 && TARGET_MMX_WITH_SSE"
+  "@
+   blendps\t{%3, %2, %0|%0, %2, %3}
+   blendps\t{%3, %2, %0|%0, %2, %3}
+   vblendps\t{%3, %2, %1, %0|%0, %1, %2, %3}"
+  [(set_attr "isa" "noavx,noavx,avx")
+   (set_attr "type" "ssemov")
+   (set_attr "length_immediate" "1")
+   (set_attr "prefix_data16" "1,1,*")
+   (set_attr "prefix_extra" "1")
+   (set_attr "prefix" "orig,orig,vex")
+   (set_attr "mode" "V4SF")])
+
 (define_insn "mmx_blendvps"
   [(set (match_operand:V2SF 0 "register_operand" "=Yr,*x,x")
         (unspec:V2SF
diff --git a/gcc/testsuite/gcc.target/i386/merge-1.c b/gcc/testsuite/gcc.target/i386/merge-1.c
index d5256851096..b018eb19205 100644
--- a/gcc/testsuite/gcc.target/i386/merge-1.c
+++ b/gcc/testsuite/gcc.target/i386/merge-1.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O1 -msse2" } */
+/* { dg-options "-O1 -msse2 -mno-sse4" } */
 
 #include
 
diff --git a/gcc/testsuite/gcc.target/i386/sse-movss-4.c b/gcc/testsuite/gcc.target/i386/sse-movss-4.c
index ec3019c8e54..d8a8a03b147 100644
--- a/gcc/testsuite/gcc.target/i386/sse-movss-4.c
+++ b/gcc/testsuite/gcc.target/i386/sse-movss-4.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -msse" } */
+/* { dg-options "-O2 -msse -mno-sse4" } */
 
 typedef unsigned int v4si __attribute__((vector_size(16)));
 typedef float v4sf __attribute__((vector_size(16)));
@@ -7,7 +7,7 @@ typedef float v4sf __attribute__((vector_size(16)));
 v4si foo(v4si x,v4si y) { return (v4si){y[0],x[1],x[2],x[3]}; }
 v4sf bar(v4sf x,v4sf y) { return (v4sf){y[0],x[1],x[2],x[3]}; }
 
-/* { dg-final { scan-assembler-times "\tv?movss\t" 2 } } */
+/* { dg-final { scan-assembler-times "\tmovss\t" 2 } } */
 /* { dg-final { scan-assembler-not "movaps" } } */
 /* { dg-final { scan-assembler-not "shufps" } } */
-/* { dg-final { scan-assembler-not "vpblendw" } } */
+/* { dg-final { scan-assembler-not "pblendw" } } */
diff --git a/gcc/testsuite/gcc.target/i386/sse2-mmx-21.c b/gcc/testsuite/gcc.target/i386/sse2-mmx-21.c
index 8f5341e2de6..7f8098aa631 100644
--- a/gcc/testsuite/gcc.target/i386/sse2-mmx-21.c
+++ b/gcc/testsuite/gcc.target/i386/sse2-mmx-21.c
@@ -1,5 +1,5 @@
 /* { dg-do compile { target { ! ia32 } } } */
-/* { dg-options "-O2 -msse2 -mno-mmx" } */
+/* { dg-options "-O2 -msse2 -mno-mmx -mno-sse4" } */
 /* { dg-final { scan-assembler-times "pshufd" 1 } } */
 /* { dg-final { scan-assembler-times "movd" 1 } } */
 /* { dg-final { scan-assembler-not "%mm" } } */
diff --git a/gcc/testsuite/gcc.target/i386/sse2-mmx-movss-1.c b/gcc/testsuite/gcc.target/i386/sse2-mmx-movss-1.c
new file mode 100644
index 00000000000..bb7962848b7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/sse2-mmx-movss-1.c
@@ -0,0 +1,13 @@
+/* { dg-do compile { target { ! ia32 } } } */
+/* { dg-options "-O2 -msse2 -mno-sse4" } */
+
+typedef unsigned int v2si __attribute__((vector_size(8)));
+typedef float v2sf __attribute__((vector_size(8)));
+
+v2si foo(v2si x,v2si y) { return (v2si){y[0],x[1]}; }
+v2sf bar(v2sf x,v2sf y) { return (v2sf){y[0],x[1]}; }
+
+/* { dg-final { scan-assembler-times "\tmovss\t" 2 } } */
+/* { dg-final { scan-assembler-not "movaps" } } */
+/* { dg-final { scan-assembler-not "shufps" } } */
+/* { dg-final { scan-assembler-not "pblendw" } } */
diff --git a/gcc/testsuite/gcc.target/i386/sse2-movsd-3.c b/gcc/testsuite/gcc.target/i386/sse2-movsd-3.c
index fadbe2bf2f6..edd4a445fc3 100644
--- a/gcc/testsuite/gcc.target/i386/sse2-movsd-3.c
+++ b/gcc/testsuite/gcc.target/i386/sse2-movsd-3.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -msse2" } */
+/* { dg-options "-O2 -msse2 -mno-sse4" } */
 
 typedef unsigned long long v2di __attribute__((vector_size(16)));
 typedef double v2df __attribute__((vector_size(16)));
@@ -7,9 +7,9 @@ typedef double v2df __attribute__((vector_size(16)));
 v2di foo(v2di x,v2di y) { return (v2di){y[0],x[1]}; }
 v2df bar(v2df x,v2df y) { return (v2df){y[0],x[1]}; }
 
-/* { dg-final { scan-assembler-times "\tv?movsd\t" 2 } } */
-/* { dg-final { scan-assembler-not "v?shufpd" } } */
+/* { dg-final { scan-assembler-times "\tmovsd\t" 2 } } */
+/* { dg-final { scan-assembler-not "shufpd" } } */
 /* { dg-final { scan-assembler-not "movdqa" } } */
 /* { dg-final { scan-assembler-not "pshufd" } } */
-/* { dg-final { scan-assembler-not "v?punpckldq" } } */
-/* { dg-final { scan-assembler-not "v?movq" } } */
+/* { dg-final { scan-assembler-not "punpckldq" } } */
+/* { dg-final { scan-assembler-not "movq" } } */
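The blend path, which expand_vec_perm_1 now tries before MOVSS/MOVSD, only applies when every result element stays in its own lane. Element i must come either from op0 (permutation index i) or from op1 (index i + nelt). The following is a minimal sketch of that check and of how a BLENDPS-style immediate follows from it. The helper name perm_to_blend_mask is hypothetical. It is loosely modeled on the lane test in expand_vec_perm_blend and is not GCC's code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper (not GCC's code).  PERM holds NELT indices into
   the concatenation {op0, op1}: 0..NELT-1 select op0 and
   NELT..2*NELT-1 select op1.  The permutation is a blend only if every
   result element stays in its lane.  On success, *MASK receives the
   blend immediate: bit i set means element i comes from op1.  */
static bool
perm_to_blend_mask (const unsigned *perm, unsigned nelt, unsigned *mask)
{
  unsigned m = 0;

  /* One immediate bit per element must fit in an unsigned.  */
  assert (nelt <= 8 * sizeof (unsigned));

  for (unsigned i = 0; i < nelt; ++i)
    {
      if (perm[i] == i)
        continue;               /* Keep op0[i].  */
      else if (perm[i] == i + nelt)
        m |= 1u << i;           /* Take op1[i].  */
      else
        return false;           /* Element leaves its lane: not a blend.  */
    }

  *mask = m;
  return true;
}
```

Consider bar in the new sse2-mmx-movss-1.c test, which returns (v2sf){y[0],x[1]} with op0 = x and op1 = y. This is the permutation {2, 1}, so the helper reports a blend with mask 1. A lane-swapping permutation such as {1, 0} is rejected and is left to the other expanders.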
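The new *mmx_blendps pattern describes BLENDPS on V2SF as (vec_merge:V2SF op2 op1 imm). A set bit i in the immediate selects op2[i], and a clear bit keeps op1[i]; in the non-AVX alternatives, op1 is tied to the destination. The following illustrative scalar model uses hypothetical function names and omits the upper lanes of the XMM register. It shows that with imm == 1 the blend produces the same two low elements as a register-to-register MOVSS. This is presumably why the tests that scan for MOVSS/MOVSD now pass -mno-sse4: with SSE4.1 enabled, the blend is tried first.

```c
#include <assert.h>

/* Illustrative scalar model (hypothetical name) of
   (vec_merge:V2SF op2 op1 imm) as used by *mmx_blendps:
   element i is taken from OP2 when bit i of IMM is set,
   otherwise from OP1.  */
static void
vec_merge_v2sf (float dst[2], const float op2[2], const float op1[2],
                unsigned imm)
{
  /* The pattern's const_0_to_3_operand predicate limits IMM to 2 bits.  */
  assert (imm <= 3);
  for (unsigned i = 0; i < 2; ++i)
    dst[i] = ((imm >> i) & 1) ? op2[i] : op1[i];
}

/* Illustrative model (hypothetical name) of register-to-register MOVSS,
   restricted to the two low lanes: DST[0] is replaced by SRC[0] and
   DST[1] is left unchanged.  */
static void
movss_v2sf (float dst[2], const float src[2])
{
  dst[0] = src[0];
}
```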