From patchwork Sun Oct 29 09:16:44 2023
X-Patchwork-Submitter: Roger Sayle
X-Patchwork-Id: 159346
From: "Roger Sayle" <roger@nextmovesoftware.com>
To: gcc-patches@gcc.gnu.org
Cc: "'Claudiu Zissulescu'"
Subject: [ARC PATCH] Improved ARC rtx_costs/insn_cost for SHIFTs and ROTATEs.
Date: Sun, 29 Oct 2023 09:16:44 -0000
Message-ID: <014601da0a48$a3d6b010$eb841030$@nextmovesoftware.com>

This patch overhauls the ARC backend's insn_cost target hook, and makes
some related improvements to rtx_costs, BRANCH_COST, etc.  The primary
goal is to allow the backend to indicate that shifts and rotates are
slow (discouraged) when the CPU doesn't have a barrel shifter.  I should
also acknowledge Richard Sandiford for inspiring the use of set_cost in
this rewrite of arc_insn_cost; this implementation borrows heavily from
the target hooks for AArch64 and ARM.

The motivating example is derived from PR rtl-optimization/110717.
struct S { int a : 5; };
unsigned int foo (struct S *p)
{
  return p->a;
}

With a barrel shifter, GCC -O2 generates the reasonable:

foo:	ldb_s	r0,[r0]
	asl_s	r0,r0,27
	j_s.d	[blink]
	asr_s	r0,r0,27

What's interesting is that during combine, the middle-end actually has
two shifts by three bits, and a sign-extension from QI to SI:

Trying 8, 9 -> 11:
    8: r158:SI=r157:QI#0<<0x3
      REG_DEAD r157:QI
    9: r159:SI=sign_extend(r158:SI#0)
      REG_DEAD r158:SI
   11: r155:SI=r159:SI>>0x3
      REG_DEAD r159:SI

Whilst it's reasonable to simplify this to two shifts by 27 bits when
the CPU has a barrel shifter, it's actually a significant pessimization
when these shifts are implemented by loops.  This combination can be
prevented if the backend provides accurate-ish estimates for insn_cost.

Previously, without a barrel shifter, GCC -O2 -mcpu=em generated:

foo:	ldb_s	r0,[r0]
	mov	lp_count,27
	lp	2f
	add	r0,r0,r0
	nop
2:	# end single insn loop
	mov	lp_count,27
	lp	2f
	asr	r0,r0
	nop
2:	# end single insn loop
	j_s	[blink]

which contains two loops and requires ~113 cycles to execute.

With this patch to rtx_cost/insn_cost, GCC -O2 -mcpu=em generates:

foo:	ldb_s	r0,[r0]
	mov_s	r2,0	;3
	add3	r0,r2,r0
	sexb_s	r0,r0
	asr_s	r0,r0
	asr_s	r0,r0
	j_s.d	[blink]
	asr_s	r0,r0

which requires only ~6 cycles, for the shorter shifts by 3 and sign
extension.

Tested with a cross-compiler to arc-linux hosted on x86_64, with no new
(compile-only) regressions from make -k check.
Ok for mainline if this passes Claudiu's nightly testing?

2023-10-29  Roger Sayle  <roger@nextmovesoftware.com>

gcc/ChangeLog
	* config/arc/arc.cc (arc_rtx_costs): Improve cost estimates.
	Provide reasonable values for SHIFTS and ROTATES by constant
	bit counts depending upon TARGET_BARREL_SHIFTER.
	(arc_insn_cost): Use insn attributes if the instruction is
	recognized.  Avoid calling get_attr_length for type "multi",
	i.e. define_insn_and_split patterns without explicit type.
	Fall-back to set_rtx_cost for single_set and pattern_cost
	otherwise.
	* config/arc/arc.h (COSTS_N_BYTES): Define helper macro.
	(BRANCH_COST): Improve/correct definition.
	(LOGICAL_OP_NON_SHORT_CIRCUIT): Preserve previous behavior.

Thanks again,
Roger

diff --git a/gcc/config/arc/arc.cc b/gcc/config/arc/arc.cc
index 353ac69..ae83e5e 100644
--- a/gcc/config/arc/arc.cc
+++ b/gcc/config/arc/arc.cc
@@ -5492,7 +5492,7 @@ arc_rtx_costs (rtx x, machine_mode mode, int outer_code,
     case CONST:
     case LABEL_REF:
     case SYMBOL_REF:
-      *total = speed ? COSTS_N_INSNS (1) : COSTS_N_INSNS (4);
+      *total = speed ? COSTS_N_INSNS (1) : COSTS_N_BYTES (4);
       return true;
 
     case CONST_DOUBLE:
@@ -5516,26 +5516,32 @@ arc_rtx_costs (rtx x, machine_mode mode, int outer_code,
     case ASHIFT:
     case ASHIFTRT:
     case LSHIFTRT:
+    case ROTATE:
+    case ROTATERT:
+      if (mode == DImode)
+	return false;
       if (TARGET_BARREL_SHIFTER)
	{
-	  if (CONSTANT_P (XEXP (x, 0)))
+	  *total = COSTS_N_INSNS (1);
+	  if (CONSTANT_P (XEXP (x, 1)))
	    {
-	      *total += rtx_cost (XEXP (x, 1), mode, (enum rtx_code) code,
+	      *total += rtx_cost (XEXP (x, 0), mode, (enum rtx_code) code,
				  0, speed);
	      return true;
	    }
-	  *total = COSTS_N_INSNS (1);
	}
       else if (GET_CODE (XEXP (x, 1)) != CONST_INT)
-	*total = COSTS_N_INSNS (16);
+	*total = speed ? COSTS_N_INSNS (16) : COSTS_N_INSNS (4);
       else
	{
-	  *total = COSTS_N_INSNS (INTVAL (XEXP ((x), 1)));
-	  /* ??? want_to_gcse_p can throw negative shift counts at us,
-	     and then panics when it gets a negative cost as result.
-	     Seen for gcc.c-torture/compile/20020710-1.c -Os .  */
-	  if (*total < 0)
-	    *total = 0;
+	  int n = INTVAL (XEXP (x, 1)) & 31;
+	  if (n < 4)
+	    *total = COSTS_N_INSNS (n);
+	  else
+	    *total = speed ? COSTS_N_INSNS (n + 2) : COSTS_N_INSNS (4);
+	  *total += rtx_cost (XEXP (x, 0), mode, (enum rtx_code) code,
+			      0, speed);
+	  return true;
	}
       return false;
@@ -5567,6 +5573,8 @@ arc_rtx_costs (rtx x, machine_mode mode, int outer_code,
       return false;
 
     case PLUS:
+      if (mode == DImode)
+	return false;
       if (outer_code == MEM && CONST_INT_P (XEXP (x, 1))
	  && RTX_OK_FOR_OFFSET_P (mode, XEXP (x, 1)))
	{
@@ -11101,35 +11109,37 @@
 static int
 arc_insn_cost (rtx_insn *insn, bool speed)
 {
   int cost;
-  if (recog_memoized (insn) < 0)
-    return 0;
-
-  /* If optimizing for size, we want the insn size.  */
-  if (!speed)
-    return get_attr_length (insn);
-
-  /* Use cost if provided.  */
-  cost = get_attr_cost (insn);
-  if (cost > 0)
-    return cost;
-
-  /* For speed make a simple cost model: memory access is more
-     expensive than any other instruction.  */
-  enum attr_type type = get_attr_type (insn);
-
-  switch (type)
+  enum attr_type type;
+  if (recog_memoized (insn) >= 0)
     {
-    case TYPE_LOAD:
-    case TYPE_STORE:
-      cost = COSTS_N_INSNS (2);
-      break;
-
-    default:
-      cost = COSTS_N_INSNS (1);
-      break;
+      if (speed)
+	{
+	  /* Use cost if provided.  */
+	  cost = get_attr_cost (insn);
+	  if (cost > 0)
+	    return cost;
+	  /* For speed make a simple cost model: memory access is more
+	     expensive than any other instruction.  */
+	  type = get_attr_type (insn);
+	  if (type == TYPE_LOAD || type == TYPE_STORE)
+	    return COSTS_N_INSNS (2);
+	}
+      else
+	{
+	  /* If optimizing for size, we want the insn size.  */
+	  type = get_attr_type (insn);
+	  if (type != TYPE_MULTI)
+	    return get_attr_length (insn);
+	}
     }
-  return cost;
+  if (rtx set = single_set (insn))
+    cost = set_rtx_cost (set, speed);
+  else
+    cost = pattern_cost (PATTERN (insn), speed);
+  /* If the cost is zero, then it's likely a complex insn.  We don't
+     want the cost of these to be less than something we know about.  */
+  return cost ? cost : COSTS_N_INSNS (2);
 }
 
 static unsigned

diff --git a/gcc/config/arc/arc.h b/gcc/config/arc/arc.h
index 5877389..b34f0b2 100644
--- a/gcc/config/arc/arc.h
+++ b/gcc/config/arc/arc.h
@@ -956,10 +956,16 @@ arc_select_cc_mode (OP, X, Y)
 
 /* Costs.  */
 
+/* Analog of COSTS_N_INSNS when optimizing for size.  */
+#ifndef COSTS_N_BYTES
+#define COSTS_N_BYTES(N) (N)
+#endif
+
 /* The cost of a branch insn.  */
 /* ??? What's the right value here?  Branches are certainly more
    expensive than reg->reg moves.  */
-#define BRANCH_COST(speed_p, predictable_p) 2
+#define BRANCH_COST(speed_p, predictable_p) \
+  (speed_p ? COSTS_N_INSNS (2) : COSTS_N_INSNS (1))
 
 /* Scc sets the destination to 1 and then conditionally zeroes it.
    Best case, ORed SCCs can be made into clear - condset - condset.
@@ -971,11 +977,8 @@ arc_select_cc_mode (OP, X, Y)
    beging decisive of p0, we want:
    p0 * (branch_cost - 4) > (1 - p0) * 5
    ??? We don't get to see that probability to evaluate, so we can
-   only wildly guess that it might be 50%.
-   ??? The compiler also lacks the notion of branch predictability.  */
-#define LOGICAL_OP_NON_SHORT_CIRCUIT \
-  (BRANCH_COST (optimize_function_for_speed_p (cfun), \
-		false) > 9)
+   only wildly guess that it might be 50%.  */
+#define LOGICAL_OP_NON_SHORT_CIRCUIT false
 
 /* Nonzero if access to memory by bytes is slow and undesirable.
    For RISC chips, it means that access to memory by bytes is no