From patchwork Sat Oct 14 23:32:58 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Roger Sayle
X-Patchwork-Id: 152976
From: "Roger Sayle"
To:
Cc: "'Jeff Law'"
Subject: [PATCH] Improved RTL expansion of 1LL << x.
Date: Sun, 15 Oct 2023 00:32:58 +0100
Message-ID: <020d01d9fef6$c4fff920$4effeb60$@nextmovesoftware.com>
List-Id: Gcc-patches mailing list

This patch improves the initial RTL expanded for double word shifts
on architectures with conditional moves, so that later passes don't
need to clean up unnecessary and/or unused instructions.

Consider the general case, x << y, which is expanded well as:

        t1 = y & 32;
        t2 = 0;
        t3 = x_lo >> 1;
        t4 = y ^ ~0;
        t5 = t3 >> t4;
        tmp_hi = x_hi << y;
        tmp_hi |= t5;
        tmp_lo = x_lo << y;
        out_hi = t1 ? tmp_lo : tmp_hi;
        out_lo = t1 ? t2 : tmp_lo;

which is nearly optimal; the only thing that can be improved is that
using a unary NOT operation "t4 = ~y" is better than XOR with -1, on
targets that support it.  [Note the one_cmpl_optab expander didn't fall
back to XOR when this code was originally written, but has been
improved since].

Now consider the relatively common idiom of 1LL << y, which currently
produces the RTL equivalent of:

        t1 = y & 32;
        t2 = 0;
        t3 = 1 >> 1;
        t4 = y ^ ~0;
        t5 = t3 >> t4;
        tmp_hi = 0 << y;
        tmp_hi |= t5;
        tmp_lo = 1 << y;
        out_hi = t1 ? tmp_lo : tmp_hi;
        out_lo = t1 ? t2 : tmp_lo;

Notice here that t3 is always zero, so the assignment of t5 is a
variable shift of zero, which expands to a loop on many smaller
targets; that the first tmp_hi assignment is a similar shift of zero
(another loop); that the value of t4 is no longer required (as t3 is
zero); and that the ultimate value of tmp_hi is always zero.

Fortunately, for many (but perhaps not all) targets this mess gets
cleaned up by later optimization passes.  However, this patch avoids
generating unnecessary RTL at expand time, by calling
simplify_expand_binop instead of expand_binop, and avoiding generating
dead or unnecessary code when intermediate values are known to be zero.
For the 1LL << y test case above, we now generate:

        t1 = y & 32;
        t2 = 0;
        tmp_hi = 0;
        tmp_lo = 1 << y;
        out_hi = t1 ? tmp_lo : tmp_hi;
        out_lo = t1 ? t2 : tmp_lo;

On arc-elf, for example, there are 18 RTL INSN_P instructions generated
by expand before this patch, but only 12 with this patch (improving
both compile-time and memory usage).

This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
and make -k check, both with and without --target_board=unix{-m32}
with no new failures.  Ok for mainline?


2023-10-15  Roger Sayle

gcc/ChangeLog
        * optabs.cc (expand_subword_shift): Call simplify_expand_binop
        instead of expand_binop.  Optimize cases (i.e. avoid generating
        RTL) when CARRIES or INTO_INPUT is zero.  Use one_cmpl_optab
        (i.e. NOT) instead of xor_optab with ~0 to calculate ~OP1.


Thanks in advance,
Roger

diff --git a/gcc/optabs.cc b/gcc/optabs.cc
index e1898da..f0a048a 100644
--- a/gcc/optabs.cc
+++ b/gcc/optabs.cc
@@ -533,15 +533,13 @@ expand_subword_shift (scalar_int_mode op1_mode, optab binoptab,
          has unknown behavior.  Do a single shift first, then shift by the
          remainder.  It's OK to use ~OP1 as the remainder if shift counts
          are truncated to the mode size.  */
-      carries = expand_binop (word_mode, reverse_unsigned_shift,
-                              outof_input, const1_rtx, 0, unsignedp, methods);
-      if (shift_mask == BITS_PER_WORD - 1)
-        {
-          tmp = immed_wide_int_const
-            (wi::minus_one (GET_MODE_PRECISION (op1_mode)), op1_mode);
-          tmp = simplify_expand_binop (op1_mode, xor_optab, op1, tmp,
-                                       0, true, methods);
-        }
+      carries = simplify_expand_binop (word_mode, reverse_unsigned_shift,
+                                       outof_input, const1_rtx, 0,
+                                       unsignedp, methods);
+      if (carries == const0_rtx)
+        tmp = const0_rtx;
+      else if (shift_mask == BITS_PER_WORD - 1)
+        tmp = expand_unop (op1_mode, one_cmpl_optab, op1, 0, true);
       else
         {
           tmp = immed_wide_int_const (wi::shwi (BITS_PER_WORD - 1,
@@ -552,22 +550,29 @@ expand_subword_shift (scalar_int_mode op1_mode, optab binoptab,
     }
   if (tmp == 0 || carries == 0)
     return false;
-  carries = expand_binop (word_mode, reverse_unsigned_shift,
-                          carries, tmp, 0, unsignedp, methods);
+  if (carries != const0_rtx && tmp != const0_rtx)
+    carries = simplify_expand_binop (word_mode, reverse_unsigned_shift,
+                                     carries, tmp, 0, unsignedp, methods);
   if (carries == 0)
     return false;
 
-  /* Shift INTO_INPUT logically by OP1.  This is the last use of INTO_INPUT
-     so the result can go directly into INTO_TARGET if convenient.  */
-  tmp = expand_binop (word_mode, unsigned_shift, into_input, op1,
-                      into_target, unsignedp, methods);
-  if (tmp == 0)
-    return false;
+  if (into_input != const0_rtx)
+    {
+      /* Shift INTO_INPUT logically by OP1.  This is the last use of
+         INTO_INPUT so the result can go directly into INTO_TARGET if
+         convenient.  */
+      tmp = simplify_expand_binop (word_mode, unsigned_shift, into_input,
+                                   op1, into_target, unsignedp, methods);
+      if (tmp == 0)
+        return false;
 
-  /* Now OR in the bits carried over from OUTOF_INPUT.  */
-  if (!force_expand_binop (word_mode, ior_optab, tmp, carries,
-                           into_target, unsignedp, methods))
-    return false;
+      /* Now OR in the bits carried over from OUTOF_INPUT.  */
+      if (!force_expand_binop (word_mode, ior_optab, tmp, carries,
+                               into_target, unsignedp, methods))
+        return false;
+    }
+  else
+    emit_move_insn (into_target, carries);
 
   /* Use a standard word_mode shift for the out-of half.  */
   if (outof_target != 0)
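
For readers who want to check the pseudocode in the description by hand, the following is a minimal C sketch that models the general x << y sequence and the reduced 1LL << y sequence. It is not part of the patch. It assumes 32-bit words, shift counts in the range 0..63, and word shifts that truncate their count to 5 bits (the shift_mask == BITS_PER_WORD - 1 case), which is modelled here with "& 31". The names shl_general, shl_one and count_mismatches are hypothetical and exist only for this illustration.

```c
#include <stdint.h>

/* Model of the general expansion of (x_hi:x_lo) << y with 32-bit words.
   Assumption: 0 <= y <= 63, and word shifts truncate the count to 5 bits.  */
uint64_t
shl_general (uint32_t x_lo, uint32_t x_hi, unsigned y)
{
  uint32_t t1 = y & 32;               /* nonzero when y >= 32 */
  uint32_t t2 = 0;
  uint32_t t3 = x_lo >> 1;            /* single shift first ... */
  unsigned t4 = ~y & 31;              /* ... then by the remainder, 31 - (y & 31) */
  uint32_t t5 = t3 >> t4;             /* bits carried from the low word */
  uint32_t tmp_hi = x_hi << (y & 31);
  tmp_hi |= t5;
  uint32_t tmp_lo = x_lo << (y & 31);
  uint32_t out_hi = t1 ? tmp_lo : tmp_hi;
  uint32_t out_lo = t1 ? t2 : tmp_lo;
  return ((uint64_t) out_hi << 32) | out_lo;
}

/* Model of the reduced sequence for 1LL << y: the carried bits and the
   shifted high word are both known to be zero, so tmp_hi is simply 0.  */
uint64_t
shl_one (unsigned y)
{
  uint32_t t1 = y & 32;
  uint32_t t2 = 0;
  uint32_t tmp_hi = 0;
  uint32_t tmp_lo = (uint32_t) 1 << (y & 31);
  uint32_t out_hi = t1 ? tmp_lo : tmp_hi;
  uint32_t out_lo = t1 ? t2 : tmp_lo;
  return ((uint64_t) out_hi << 32) | out_lo;
}

/* Count the shift amounts 0..63 for which either model disagrees with a
   native 64-bit shift of the constant 1.  */
int
count_mismatches (void)
{
  int bad = 0;
  for (unsigned y = 0; y < 64; y++)
    {
      uint64_t want = (uint64_t) 1 << y;
      if (shl_general (1, 0, y) != want || shl_one (y) != want)
        bad++;
    }
  return bad;
}
```

Calling count_mismatches () from a small test driver compares both sequences against a native 64-bit shift for every count. The sketch only models the arithmetic; it says nothing about how many insns a particular target emits.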