From patchwork Thu Dec 22 23:19:21 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Roger Sayle
X-Patchwork-Id: 35991
From: "Roger Sayle"
To: "'GCC Patches'"
Cc: "'Uros Bizjak'"
Subject: [x86 PATCH] PR target/107548: Handle vec_select in STV.
Date: Thu, 22 Dec 2022 23:19:21 -0000
Message-ID: <001d01d9165b$d4690e30$7d3b2a90$@nextmovesoftware.com>

This patch enhances x86's STV pass to handle VEC_SELECT during general
scalar chain conversion, performing SImode scalar extraction from V4SI
and DImode scalar extraction from V2DI vector registers.

The motivating test case from bugzilla is:

typedef unsigned int v4si __attribute__((vector_size(16)));

unsigned int f (v4si a, v4si b)
{
  a[0] += b[0];
  return a[0] + a[1];
}

Currently, with -O2 -march=znver2, this generates:

	vpextrd	$1, %xmm0, %edx
	vmovd	%xmm0, %eax
	addl	%edx, %eax
	vmovd	%xmm1, %edx
	addl	%edx, %eax
	ret

which performs three transfers from the vector unit to the scalar unit,
and performs the two additions there. With this patch, we now generate:

	vmovdqa	%xmm0, %xmm2
	vpshufd	$85, %xmm0, %xmm0
	vpaddd	%xmm0, %xmm2, %xmm0
	vpaddd	%xmm1, %xmm0, %xmm0
	vmovd	%xmm0, %eax
	ret

which performs the two additions in the vector unit, and then transfers
the result to the scalar unit. Technically the (cheap) movdqa isn't
needed with better register allocation (or this could be cleaned up
during peephole2), but even so this transform is still a win.

This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
and make -k check, both with and without --target_board=unix{-m32},
with no new failures. Ok for mainline?

2022-12-22  Roger Sayle

gcc/ChangeLog
	PR target/107548
	* config/i386/i386-features.cc (scalar_chain::add_insn): The
	operands of a VEC_SELECT don't need to be added to the scalar chain.
	(general_scalar_chain::compute_convert_gain): Provide gains for
	performing STV on a VEC_SELECT.
	(general_scalar_chain::convert_insn): Convert VEC_SELECT to pshufd,
	psrldq or no-op.
	(general_scalar_to_vector_candidate_p): Handle VEC_SELECT of a
	single element from a vector register to a scalar register.

gcc/testsuite/ChangeLog
	PR target/107548
	* gcc.target/i386/pr107548-1.c: New test V4SI case.
	* gcc.target/i386/pr107548-2.c: New test V2DI case.

Thanks in advance,
Roger

diff --git a/gcc/config/i386/i386-features.cc b/gcc/config/i386/i386-features.cc
index fd212262..cb21d3b 100644
--- a/gcc/config/i386/i386-features.cc
+++ b/gcc/config/i386/i386-features.cc
@@ -429,6 +429,11 @@ scalar_chain::add_insn (bitmap candidates, unsigned int insn_uid)
   for (ref = DF_INSN_UID_DEFS (insn_uid); ref; ref = DF_REF_NEXT_LOC (ref))
     if (!HARD_REGISTER_P (DF_REF_REG (ref)))
       analyze_register_chain (candidates, ref);
+
+  /* The operand(s) of VEC_SELECT don't need to be converted/convertible.  */
+  if (def_set && GET_CODE (SET_SRC (def_set)) == VEC_SELECT)
+    return;
+
   for (ref = DF_INSN_UID_USES (insn_uid); ref; ref = DF_REF_NEXT_LOC (ref))
     if (!DF_REF_REG_MEM_P (ref))
       analyze_register_chain (candidates, ref);
@@ -629,6 +634,23 @@ general_scalar_chain::compute_convert_gain ()
               }
             break;
 
+          case VEC_SELECT:
+            if (XVECEXP (XEXP (src, 1), 0, 0) == const0_rtx)
+              {
+                // movd (4 bytes) replaced with movdqa (4 bytes).
+                if (!optimize_insn_for_size_p ())
+                  igain += ix86_cost->sse_to_integer - ix86_cost->xmm_move;
+              }
+            else
+              {
+                // pshufd; movd replaced with pshufd.
+                if (optimize_insn_for_size_p ())
+                  igain += COSTS_N_BYTES (4);
+                else
+                  igain += ix86_cost->sse_to_integer;
+              }
+            break;
+
           default:
             gcc_unreachable ();
           }
@@ -1167,6 +1189,24 @@ general_scalar_chain::convert_insn (rtx_insn *insn)
       convert_op (&src, insn);
       break;
 
+    case VEC_SELECT:
+      if (XVECEXP (XEXP (src, 1), 0, 0) == const0_rtx)
+        src = XEXP (src, 0);
+      else if (smode == DImode)
+        {
+          rtx tmp = gen_lowpart (V1TImode, XEXP (src, 0));
+          dst = gen_lowpart (V1TImode, dst);
+          src = gen_rtx_LSHIFTRT (V1TImode, tmp, GEN_INT (64));
+        }
+      else
+        {
+          rtx tmp = XVECEXP (XEXP (src, 1), 0, 0);
+          rtvec vec = gen_rtvec (4, tmp, tmp, tmp, tmp);
+          rtx par = gen_rtx_PARALLEL (VOIDmode, vec);
+          src = gen_rtx_VEC_SELECT (vmode, XEXP (src, 0), par);
+        }
+      break;
+
     default:
       gcc_unreachable ();
     }
@@ -1917,6 +1957,16 @@ general_scalar_to_vector_candidate_p (rtx_insn *insn, enum machine_mode mode)
     case CONST_INT:
       return REG_P (dst);
 
+    case VEC_SELECT:
+      /* Excluding MEM_P (dst) avoids interfering with vpextr[dq].  */
+      return REG_P (dst)
+             && REG_P (XEXP (src, 0))
+             && GET_MODE (XEXP (src, 0)) == (mode == DImode ? V2DImode
+                                                            : V4SImode)
+             && GET_CODE (XEXP (src, 1)) == PARALLEL
+             && XVECLEN (XEXP (src, 1), 0) == 1
+             && CONST_INT_P (XVECEXP (XEXP (src, 1), 0, 0));
+
     default:
       return false;
     }
diff --git a/gcc/testsuite/gcc.target/i386/pr107548-1.c b/gcc/testsuite/gcc.target/i386/pr107548-1.c
new file mode 100644
index 0000000..da78f75
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr107548-1.c
@@ -0,0 +1,25 @@
+/* { dg-do compile { target { ! ia32 } } } */
+/* { dg-options "-O2 -mstv -mno-stackrealign" } */
+typedef unsigned int v4si __attribute__((vector_size(16)));
+
+unsigned int foo1 (v4si a, v4si b)
+{
+  a[0] += b[0];
+  return a[0] + a[1];
+}
+
+unsigned int foo2 (v4si a, v4si b)
+{
+  a[0] += b[0];
+  return a[0] + a[2];
+}
+
+unsigned int foo3 (v4si a, v4si b)
+{
+  a[0] += b[0];
+  return a[0] + a[3];
+}
+
+/* { dg-final { scan-assembler-times "\tmovd\t" 3 } } */
+/* { dg-final { scan-assembler-times "paddd" 6 } } */
+/* { dg-final { scan-assembler-not "addl" } } */
diff --git a/gcc/testsuite/gcc.target/i386/pr107548-2.c b/gcc/testsuite/gcc.target/i386/pr107548-2.c
new file mode 100644
index 0000000..b57594e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr107548-2.c
@@ -0,0 +1,13 @@
+/* { dg-do compile { target { ! ia32 } } } */
+/* { dg-options "-O2 -mstv -mno-stackrealign" } */
+typedef unsigned long long v2di __attribute__((vector_size(16)));
+
+unsigned long long foo(v2di a, v2di b)
+{
+  a[0] += b[0];
+  return a[0] + a[1];
+}
+
+/* { dg-final { scan-assembler-not "\taddq\t" } } */
+/* { dg-final { scan-assembler-times "paddq" 2 } } */
+/* { dg-final { scan-assembler "psrldq" } } */
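
For readers who want to check the lane arithmetic behind the new code sequences, here is a minimal sketch in portable C. It is not part of the patch and does not use GCC internals or real SSE instructions: the types v4si_model and v2di_model and the emu_* helpers are hypothetical names that only emulate what pshufd, paddd, movd and psrldq do to the lanes. It shows why an immediate of 85 (binary 01010101) replicates element 1 into every lane, and why a 64-bit right shift of the full 128-bit register brings element 1 of a V2DI down to element 0.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical models of a 128-bit register (illustration only). */
typedef struct { uint32_t lane[4]; } v4si_model;
typedef struct { uint64_t lane[2]; } v2di_model;

/* Emulates pshufd: 2-bit field i of imm picks the source lane
   for result lane i.  */
static v4si_model emu_pshufd (v4si_model x, unsigned imm)
{
  v4si_model r;
  for (int i = 0; i < 4; i++)
    r.lane[i] = x.lane[(imm >> (2 * i)) & 3];
  return r;
}

/* Emulates paddd: lane-wise 32-bit addition with wraparound.  */
static v4si_model emu_paddd (v4si_model x, v4si_model y)
{
  v4si_model r;
  for (int i = 0; i < 4; i++)
    r.lane[i] = x.lane[i] + y.lane[i];
  return r;
}

/* Emulates movd from a vector register: read lane 0.  */
static uint32_t emu_movd (v4si_model x)
{
  return x.lane[0];
}

/* Emulates psrldq $8, i.e. a logical right shift of the whole
   128-bit value by 64 bits: the high lane moves to the low lane.  */
static v2di_model emu_psrldq8 (v2di_model x)
{
  v2di_model r = { { x.lane[1], 0 } };
  return r;
}

/* Scalar reference: the motivating function f from the description.  */
static uint32_t f_scalar (v4si_model a, v4si_model b)
{
  a.lane[0] += b.lane[0];
  return a.lane[0] + a.lane[1];
}

/* Mirrors the new sequence: pshufd $85, two paddd, one movd.  */
static uint32_t f_vector (v4si_model a, v4si_model b)
{
  v4si_model t = emu_pshufd (a, 85);  /* 85 = 0b01010101: lane 1 everywhere */
  t = emu_paddd (t, a);               /* lane 0 now holds a[1] + a[0] */
  t = emu_paddd (b, t);               /* lane 0 now holds b[0] + a[0] + a[1] */
  return emu_movd (t);                /* single vector-to-scalar transfer */
}
```

Under this model f_vector returns the same value as f_scalar for any inputs, because only lane 0 of the final sum is read and 32-bit addition is commutative and associative modulo 2^32. The other lanes hold values that are never used.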