Message ID | 032901d8a2cf$fc07cfd0$f4176f70$@nextmovesoftware.com |
---|---|
State | New, archived |
Headers |
Return-Path: <gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org> Delivered-To: ouuuleilei@gmail.com Received: by 2002:a05:6a10:b5d6:b0:2b9:3548:2db5 with SMTP id v22csp485737pxt; Thu, 28 Jul 2022 15:19:14 -0700 (PDT) X-Google-Smtp-Source: AGRyM1sSksWk81YTWGMg556smHnmeaqbWSaJEvoh48bvy+bJokiEDKeD6rfMyXOJoFSrT47ZIKKa X-Received: by 2002:a17:906:7949:b0:72f:d5ce:c242 with SMTP id l9-20020a170906794900b0072fd5cec242mr738822ejo.207.1659046753956; Thu, 28 Jul 2022 15:19:13 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1659046753; cv=none; d=google.com; s=arc-20160816; b=vIVf/KcA0bguuvTWZ3mrgoFyg/KHZl2U9FSdZ5LBtis8C2LRDqhWYET8IMUThJrTsX th5LfB9oRG8WsZMIIJhfSmjFZ8Rgf6GLuY30zatRwlvDNI1WNr0Lt8pa7JImll2FZpHY fzy1VhHVHyi6wjtMrWPX8mywCiEOjXR21AGDhHAz8noD8mFcYeHLbGwfCFCyP5Fw+6Xw eAd04fDdWlCRTdEK1/8LcyCthUi9daGfFv8B4B0VraI4iqW7QrtjZcsS2HsgcdVW3ynD meWLCSZyhoA9z+9Uhp1NjWvrnHNFSxXZ+GNC7I45rwdxkI7qAB03WwEXKbTDOgZ2m00Q Vtqg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=sender:errors-to:list-subscribe:list-help:list-post:list-archive :list-unsubscribe:list-id:precedence:content-language:thread-index :mime-version:message-id:date:subject:to:from:dkim-signature :dmarc-filter:delivered-to; bh=rhl//RD+WvGbT4+Grr2nfvx2utq7Ryl9GBwlYTh9SLk=; b=tDDET6+QbfeulT9LwzgxiH4W+c/PBJgKmNC3+V9GErztSrB1y6uSbv4V1cdK6pZ/ua bKgNkFeuEQt1HlSBRjq95r9zouDyLl1Ul5bAjDyCxiGu1sUqQQcMELeTd/MTO+/ddmk4 y2s7uDvS9ntdGgOozaPktmOf4MiysMTeyKJg1FKmv4MN+EZmNiQDNhyWpzkyQh4y/0+B TzXD0SM+rWzVsh2al+yHLIw5AQtw8jUnhC+z6JC7d2AraCdu5rZj21t7IuyF71hfA43i c9uuxdlUjJEv6jeOHgPogAxynkMSAho/RU/QiNkaY9bLS0Xx30ePp67Gm+aPa1XXt3cF KPwA== ARC-Authentication-Results: i=1; mx.google.com; dkim=fail header.i=@nextmovesoftware.com header.s=default header.b=rukAWG8f; spf=pass (google.com: domain of gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org designates 2620:52:3:1:0:246e:9693:128c as permitted sender) smtp.mailfrom="gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org" Received: from sourceware.org (server2.sourceware.org. [2620:52:3:1:0:246e:9693:128c]) by mx.google.com with ESMTPS id rs9-20020a170907036900b0072b978aed19si1355376ejb.375.2022.07.28.15.19.13 for <ouuuleilei@gmail.com> (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 28 Jul 2022 15:19:13 -0700 (PDT) Received-SPF: pass (google.com: domain of gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org designates 2620:52:3:1:0:246e:9693:128c as permitted sender) client-ip=2620:52:3:1:0:246e:9693:128c; Authentication-Results: mx.google.com; dkim=fail header.i=@nextmovesoftware.com header.s=default header.b=rukAWG8f; spf=pass (google.com: domain of gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org designates 2620:52:3:1:0:246e:9693:128c as permitted sender) smtp.mailfrom="gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org" Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 8398E385E45D for <ouuuleilei@gmail.com>; Thu, 28 Jul 2022 22:19:04 +0000 (GMT) X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from server.nextmovesoftware.com (server.nextmovesoftware.com [162.254.253.69]) by sourceware.org (Postfix) with ESMTPS id 9AE0038582A4 for <gcc-patches@gcc.gnu.org>; Thu, 28 Jul 2022 22:18:38 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 9AE0038582A4 Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=nextmovesoftware.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=nextmovesoftware.com DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=nextmovesoftware.com; s=default; h=Content-Type:MIME-Version:Message-ID: Date:Subject:Cc:To:From:Sender:Reply-To:Content-Transfer-Encoding:Content-ID: Content-Description:Resent-Date:Resent-From:Resent-Sender:Resent-To:Resent-Cc :Resent-Message-ID:In-Reply-To:References:List-Id:List-Help:List-Unsubscribe: List-Subscribe:List-Post:List-Owner:List-Archive; bh=rhl//RD+WvGbT4+Grr2nfvx2utq7Ryl9GBwlYTh9SLk=; b=rukAWG8fXqjzRljLskNJFRRNJl Vmz1iH9b95owFCH2guC8af7fqpTYkGGdCIhxqP3SIvcVwOiUuiqNhfMc7uBpy1k9KgvUwnyfgmEDo 7JctjDV7DkonWE/k6VaVXnWsq1Lcp2XoBO1AMvCpvBWcGCuqNOS4/fwktUCy8wLzDWva+ye9882d4 kedsWFw2BGSEb18LKVn+ML+Ww8Z9D2Cn4KhevLcguyBjDGVNvTFYzqTSWYueYPkxn/34lgS0QhrR3 AbGTTBvIacBH0DURN98dBQHhFzN2t8R39PUqXsPImoL5h/6qoVuIqFBnh/ZANOQ4k1KtHhPC/hEMZ SPcXGGWw==; Received: from host86-169-41-119.range86-169.btcentralplus.com ([86.169.41.119]:61682 helo=Dell) by server.nextmovesoftware.com with esmtpsa (TLS1.2) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from <roger@nextmovesoftware.com>) id 1oHBq1-00069H-RV; Thu, 28 Jul 2022 18:18:38 -0400 From: "Roger Sayle" <roger@nextmovesoftware.com> To: <gcc-patches@gcc.gnu.org> Subject: [x86 PATCH] Support logical shifts by (some) integer constants in TImode STV. Date: Thu, 28 Jul 2022 23:18:36 +0100 Message-ID: <032901d8a2cf$fc07cfd0$f4176f70$@nextmovesoftware.com> MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="----=_NextPart_000_032A_01D8A2D8.5DCC37D0" X-Mailer: Microsoft Outlook 16.0 Thread-Index: AdiizyfTDo7NmXGBSnqaoLwU/+UPrw== Content-Language: en-gb X-AntiAbuse: This header was added to track abuse, please include it with any abuse report X-AntiAbuse: Primary Hostname - server.nextmovesoftware.com X-AntiAbuse: Original Domain - gcc.gnu.org X-AntiAbuse: Originator/Caller UID/GID - [47 12] / [47 12] X-AntiAbuse: Sender Address Domain - nextmovesoftware.com X-Get-Message-Sender-Via: server.nextmovesoftware.com: authenticated_id: roger@nextmovesoftware.com X-Authenticated-Sender: server.nextmovesoftware.com: roger@nextmovesoftware.com X-Source: X-Source-Args: X-Source-Dir: X-Spam-Status: No, score=-11.0 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, KAM_SHORT, RCVD_IN_BARRACUDACENTRAL, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list <gcc-patches.gcc.gnu.org> List-Unsubscribe: <https://gcc.gnu.org/mailman/options/gcc-patches>, <mailto:gcc-patches-request@gcc.gnu.org?subject=unsubscribe> List-Archive: <https://gcc.gnu.org/pipermail/gcc-patches/> List-Post: <mailto:gcc-patches@gcc.gnu.org> List-Help: <mailto:gcc-patches-request@gcc.gnu.org?subject=help> List-Subscribe: <https://gcc.gnu.org/mailman/listinfo/gcc-patches>, <mailto:gcc-patches-request@gcc.gnu.org?subject=subscribe> Errors-To: gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org Sender: "Gcc-patches" <gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org> X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1739636608920476030?= X-GMAIL-MSGID: =?utf-8?q?1739636608920476030?= |
Series |
[x86] Support logical shifts by (some) integer constants in TImode STV.
|
|
Commit Message
Roger Sayle
July 28, 2022, 10:18 p.m. UTC
This patch improves TImode STV by adding support for logical shifts by integer constants that are multiples of 8. For the test case: __int128 a, b; void foo() { a = b << 16; } on x86_64, gcc -O2 currently generates: movq b(%rip), %rax movq b+8(%rip), %rdx shldq $16, %rax, %rdx salq $16, %rax movq %rax, a(%rip) movq %rdx, a+8(%rip) ret with this patch we now generate: movdqa b(%rip), %xmm0 pslldq $2, %xmm0 movaps %xmm0, a(%rip) ret This patch has been tested on x86_64-pc-linux-gnu with make bootstrap and make -k check. both with and without --target_board=unix{-m32}, with no new failures. Ok for mainline? 2022-07-28 Roger Sayle <roger@nextmovesoftware.com> gcc/ChangeLog * config/i386/i386-features.cc (compute_convert_gain): Add gain for converting suitable TImode shift to a V1TImode shift. (timode_scalar_chain::convert_insn): Add support for converting suitable ASHIFT and LSHIFTRT. (timode_scalar_to_vector_candidate_p): Consider logical shifts by integer constants that are multiples of 8 to be candidates. gcc/testsuite/ChangeLog * gcc.target/i386/sse4_1-stv-7.c: New test case. Thanks again, Roger --
Comments
On Fri, Jul 29, 2022 at 12:18 AM Roger Sayle <roger@nextmovesoftware.com> wrote: > > > This patch improves TImode STV by adding support for logical shifts by > integer constants that are multiples of 8. For the test case: > > __int128 a, b; > void foo() { a = b << 16; } > > on x86_64, gcc -O2 currently generates: > > movq b(%rip), %rax > movq b+8(%rip), %rdx > shldq $16, %rax, %rdx > salq $16, %rax > movq %rax, a(%rip) > movq %rdx, a+8(%rip) > ret > > with this patch we now generate: > > movdqa b(%rip), %xmm0 > pslldq $2, %xmm0 > movaps %xmm0, a(%rip) > ret > > This patch has been tested on x86_64-pc-linux-gnu with make bootstrap > and make -k check. both with and without --target_board=unix{-m32}, > with no new failures. Ok for mainline? > > > 2022-07-28 Roger Sayle <roger@nextmovesoftware.com> > > gcc/ChangeLog > * config/i386/i386-features.cc (compute_convert_gain): Add gain > for converting suitable TImode shift to a V1TImode shift. > (timode_scalar_chain::convert_insn): Add support for converting > suitable ASHIFT and LSHIFTRT. > (timode_scalar_to_vector_candidate_p): Consider logical shifts > by integer constants that are multiples of 8 to be candidates. > > gcc/testsuite/ChangeLog > * gcc.target/i386/sse4_1-stv-7.c: New test case. + case ASHIFT: + case LSHIFTRT: + /* For logical shifts by constant multiples of 8. */ + igain = optimize_insn_for_size_p () ? COSTS_N_BYTES (4) + : COSTS_N_INSNS (1); Isn't the conversion an universal win for -O2 as well as for -Os? The conversion to/from XMM register is already accounted for, so for -Os substituting shldq/salq with pslldq should always be a win. I'd expect the cost calculation to be similar to the general_scalar_chain::compute_convert_gain cost calculation with m = 2. Uros.
Hi Uros, > From: Uros Bizjak <ubizjak@gmail.com> > Sent: 31 July 2022 18:23 > To: Roger Sayle <roger@nextmovesoftware.com> > On Fri, Jul 29, 2022 at 12:18 AM Roger Sayle <roger@nextmovesoftware.com> > wrote: > > > > This patch improves TImode STV by adding support for logical shifts by > > integer constants that are multiples of 8. For the test case: > > > > __int128 a, b; > > void foo() { a = b << 16; } > > > > on x86_64, gcc -O2 currently generates: > > > > movq b(%rip), %rax > > movq b+8(%rip), %rdx > > shldq $16, %rax, %rdx > > salq $16, %rax > > movq %rax, a(%rip) > > movq %rdx, a+8(%rip) > > ret > > > > with this patch we now generate: > > > > movdqa b(%rip), %xmm0 > > pslldq $2, %xmm0 > > movaps %xmm0, a(%rip) > > ret > > > > This patch has been tested on x86_64-pc-linux-gnu with make bootstrap > > and make -k check. both with and without --target_board=unix{-m32}, > > with no new failures. Ok for mainline? > > > > > > 2022-07-28 Roger Sayle <roger@nextmovesoftware.com> > > > > gcc/ChangeLog > > * config/i386/i386-features.cc (compute_convert_gain): Add gain > > for converting suitable TImode shift to a V1TImode shift. > > (timode_scalar_chain::convert_insn): Add support for converting > > suitable ASHIFT and LSHIFTRT. > > (timode_scalar_to_vector_candidate_p): Consider logical shifts > > by integer constants that are multiples of 8 to be candidates. > > > > gcc/testsuite/ChangeLog > > * gcc.target/i386/sse4_1-stv-7.c: New test case. > > + case ASHIFT: > + case LSHIFTRT: > + /* For logical shifts by constant multiples of 8. */ igain = > + optimize_insn_for_size_p () ? COSTS_N_BYTES (4) > + : COSTS_N_INSNS (1); > > Isn't the conversion an universal win for -O2 as well as for -Os? The conversion > to/from XMM register is already accounted for, so for -Os substituting > shldq/salq with pslldq should always be a win. I'd expect the cost calculation to > be similar to the general_scalar_chain::compute_convert_gain cost calculation > with m = 2. I agree that the terminology is perhaps a little confusing. The compute_convert_gain function calculates the total "gain" from an STV chain, summing the igain of each instruction, and performs the STV transformation if this total gain is greater than zero. Hence positive values are good and negative values are bad. In this case, of a logical shift by multiple of 8, converting the chain is indeed always beneficial, reducing by 4 bytes in size when optimizing for size, and avoiding 1 fast instruction when optimizing for speed. Having a "positive gain of four bytes" sounds bad, but in this sense the gain is used as a synonym of "benefit" not "magnitude". By comparison, shifting by a single bit 128 bit value is always a net loss, requiring three addition fast instructions, or 15 extra bytes in size. However, it's still worth considering/capturing these counter-productive (i.e. negative) values, as they might be compensated for by other wins in the chain. Dealing with COSTS_N_BYTES (when optimizing for size) and COSTS_N_INSNS (when optimizing for speed) allows much finer granularity. For example, the constant number of bits used in a shift/rotate, or the value of an immediate constant in a compare have significant effects on the size/speed of scalar vs. vector code, and this isn't (yet) something easily handled by the simple "m" approximation used in general_scalar_chain::compute_convert_gain. See (comment #5 of) PR target/105034 which mentions the need for more accurate parameterization of compute_convert_gain (in response to the undesirable suggestion of simply disabling STV when optimizing for size). I hope this explains the above idiom. Hopefully, things will become clearer when support for shifts by other bit counts, and arithmetic shifts, are added to this part of the code (STV). I'll be sure to add more comments. [Ok for mainline?] Cheers, Roger --
On Tue, Aug 2, 2022 at 7:02 PM Roger Sayle <roger@nextmovesoftware.com> wrote: > > > Hi Uros, > > > From: Uros Bizjak <ubizjak@gmail.com> > > Sent: 31 July 2022 18:23 > > To: Roger Sayle <roger@nextmovesoftware.com> > > On Fri, Jul 29, 2022 at 12:18 AM Roger Sayle <roger@nextmovesoftware.com> > > wrote: > > > > > > This patch improves TImode STV by adding support for logical shifts by > > > integer constants that are multiples of 8. For the test case: > > > > > > __int128 a, b; > > > void foo() { a = b << 16; } > > > > > > on x86_64, gcc -O2 currently generates: > > > > > > movq b(%rip), %rax > > > movq b+8(%rip), %rdx > > > shldq $16, %rax, %rdx > > > salq $16, %rax > > > movq %rax, a(%rip) > > > movq %rdx, a+8(%rip) > > > ret > > > > > > with this patch we now generate: > > > > > > movdqa b(%rip), %xmm0 > > > pslldq $2, %xmm0 > > > movaps %xmm0, a(%rip) > > > ret > > > > > > This patch has been tested on x86_64-pc-linux-gnu with make bootstrap > > > and make -k check. both with and without --target_board=unix{-m32}, > > > with no new failures. Ok for mainline? > > > > > > > > > 2022-07-28 Roger Sayle <roger@nextmovesoftware.com> > > > > > > gcc/ChangeLog > > > * config/i386/i386-features.cc (compute_convert_gain): Add gain > > > for converting suitable TImode shift to a V1TImode shift. > > > (timode_scalar_chain::convert_insn): Add support for converting > > > suitable ASHIFT and LSHIFTRT. > > > (timode_scalar_to_vector_candidate_p): Consider logical shifts > > > by integer constants that are multiples of 8 to be candidates. > > > > > > gcc/testsuite/ChangeLog > > > * gcc.target/i386/sse4_1-stv-7.c: New test case. > > > > + case ASHIFT: > > + case LSHIFTRT: > > + /* For logical shifts by constant multiples of 8. */ igain = > > + optimize_insn_for_size_p () ? COSTS_N_BYTES (4) > > + : COSTS_N_INSNS (1); > > > > Isn't the conversion an universal win for -O2 as well as for -Os? The conversion > > to/from XMM register is already accounted for, so for -Os substituting > > shldq/salq with pslldq should always be a win. I'd expect the cost calculation to > > be similar to the general_scalar_chain::compute_convert_gain cost calculation > > with m = 2. > > I agree that the terminology is perhaps a little confusing. The > compute_convert_gain function calculates the total "gain" from an > STV chain, summing the igain of each instruction, and performs > the STV transformation if this total gain is greater than zero. > Hence positive values are good and negative values are bad. > > In this case, of a logical shift by multiple of 8, converting the chain is indeed always > beneficial, reducing by 4 bytes in size when optimizing for size, and avoiding 1 fast > instruction when optimizing for speed. Having a "positive gain of four bytes" sounds bad, > but in this sense the gain is used as a synonym of "benefit" not "magnitude". > > By comparison, shifting by a single bit 128 bit value is always a net loss, requiring > three addition fast instructions, or 15 extra bytes in size. However, it's still worth > considering/capturing these counter-productive (i.e. negative) values, as they > might be compensated for by other wins in the chain. > > Dealing with COSTS_N_BYTES (when optimizing for size) and COSTS_N_INSNS > (when optimizing for speed) allows much finer granularity. For example, > the constant number of bits used in a shift/rotate, or the value of an > immediate constant in a compare have significant effects on the size/speed > of scalar vs. vector code, and this isn't (yet) something easily handled by the > simple "m" approximation used in general_scalar_chain::compute_convert_gain. > > See (comment #5 of) PR target/105034 which mentions the need for more > accurate parameterization of compute_convert_gain (in response to the > undesirable suggestion of simply disabling STV when optimizing for size). > > I hope this explains the above idiom. Hopefully, things will become clearer > when support for shifts by other bit counts, and arithmetic shifts, are added > to this part of the code (STV). I'll be sure to add more comments. Thanks for the explanation, I was confused by the differences between 32bit DImode and 64bit TImode costs (these should IMO be quite similar, so if the TImode cost function is now more precise, DImode cost function should follow it). > [Ok for mainline?] OK. The cost functions are always in need of more accuracy.. Thanks, Uros.
diff --git a/gcc/config/i386/i386-features.cc b/gcc/config/i386/i386-features.cc index aa5de71..e1e0645 100644 --- a/gcc/config/i386/i386-features.cc +++ b/gcc/config/i386/i386-features.cc @@ -1221,6 +1221,13 @@ timode_scalar_chain::compute_convert_gain () igain = COSTS_N_INSNS (1); break; + case ASHIFT: + case LSHIFTRT: + /* For logical shifts by constant multiples of 8. */ + igain = optimize_insn_for_size_p () ? COSTS_N_BYTES (4) + : COSTS_N_INSNS (1); + break; + default: break; } @@ -1462,6 +1469,12 @@ timode_scalar_chain::convert_insn (rtx_insn *insn) src = convert_compare (XEXP (src, 0), XEXP (src, 1), insn); break; + case ASHIFT: + case LSHIFTRT: + convert_op (&XEXP (src, 0), insn); + PUT_MODE (src, V1TImode); + break; + default: gcc_unreachable (); } @@ -1796,6 +1809,14 @@ timode_scalar_to_vector_candidate_p (rtx_insn *insn) case NOT: return REG_P (XEXP (src, 0)) || timode_mem_p (XEXP (src, 0)); + case ASHIFT: + case LSHIFTRT: + /* Handle logical shifts by integer constants between 0 and 120 + that are multiples of 8. */ + return REG_P (XEXP (src, 0)) + && CONST_INT_P (XEXP (src, 1)) + && (INTVAL (XEXP (src, 1)) & ~0x78) == 0; + default: return false; } diff --git a/gcc/testsuite/gcc.target/i386/sse4_1-stv-7.c b/gcc/testsuite/gcc.target/i386/sse4_1-stv-7.c new file mode 100644 index 0000000..b0d5fce --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/sse4_1-stv-7.c @@ -0,0 +1,18 @@ +/* { dg-do compile { target int128 } } */ +/* { dg-options "-O2 -msse4.1 -mstv -mno-stackrealign" } */ + +unsigned __int128 a; +unsigned __int128 b; + +void foo() +{ + a = b << 16; +} + +void bar() +{ + a = b >> 16; +} + +/* { dg-final { scan-assembler "pslldq" } } */ +/* { dg-final { scan-assembler "psrldq" } } */