From patchwork Sun Oct 29 09:16:44 2023
X-Patchwork-Submitter: Roger Sayle
X-Patchwork-Id: 159346
From: "Roger Sayle" <roger@nextmovesoftware.com>
To: gcc-patches@gcc.gnu.org
Cc: "'Claudiu Zissulescu'"
Subject: [ARC PATCH] Improved ARC rtx_costs/insn_cost for SHIFTs and ROTATEs.
Date: Sun, 29 Oct 2023 09:16:44 -0000
Message-ID: <014601da0a48$a3d6b010$eb841030$@nextmovesoftware.com>

This patch overhauls the ARC backend's insn_cost target hook, and makes
some related improvements to rtx_costs, BRANCH_COST, etc.  The primary
goal is to allow the backend to indicate that shifts and rotates are
slow (discouraged) when the CPU doesn't have a barrel shifter.  I should
also acknowledge Richard Sandiford for inspiring the use of set_cost in
this rewrite of arc_insn_cost; this implementation borrows heavily from
the target hooks for AArch64 and ARM.

The motivating example is derived from PR rtl-optimization/110717.
struct S { int a : 5; };
unsigned int foo (struct S *p)
{
  return p->a;
}

With a barrel shifter, GCC -O2 generates the reasonable:

foo:	ldb_s	r0,[r0]
	asl_s	r0,r0,27
	j_s.d	[blink]
	asr_s	r0,r0,27

What's interesting is that during combine, the middle-end actually has
two shifts by three bits, and a sign-extension from QI to SI:

Trying 8, 9 -> 11:
    8: r158:SI=r157:QI#0<<0x3
      REG_DEAD r157:QI
    9: r159:SI=sign_extend(r158:SI#0)
      REG_DEAD r158:SI
   11: r155:SI=r159:SI>>0x3
      REG_DEAD r159:SI

Whilst it's reasonable to simplify this to two shifts by 27 bits when
the CPU has a barrel shifter, it's actually a significant pessimization
when these shifts are implemented by loops.  This combination can be
prevented if the backend provides accurate-ish estimates for insn_cost.

Previously, without a barrel shifter, GCC -O2 -mcpu=em generated:

foo:	ldb_s	r0,[r0]
	mov	lp_count,27
	lp	2f
	add	r0,r0,r0
	nop
2:	# end single insn loop
	mov	lp_count,27
	lp	2f
	asr	r0,r0
	nop
2:	# end single insn loop
	j_s	[blink]

which contains two loops and requires ~113 cycles to execute.

With this patch to rtx_cost/insn_cost, GCC -O2 -mcpu=em generates:

foo:	ldb_s	r0,[r0]
	mov_s	r2,0	;3
	add3	r0,r2,r0
	sexb_s	r0,r0
	asr_s	r0,r0
	asr_s	r0,r0
	j_s.d	[blink]
	asr_s	r0,r0

which requires only ~6 cycles, for the shorter shifts by 3 and sign
extension.

Tested with a cross-compiler to arc-linux hosted on x86_64, with no new
(compile-only) regressions from make -k check.
Ok for mainline if this passes Claudiu's nightly testing?

2023-10-29  Roger Sayle  <roger@nextmovesoftware.com>

gcc/ChangeLog
	* config/arc/arc.cc (arc_rtx_costs): Improve cost estimates.
	Provide reasonable values for SHIFTS and ROTATES by constant
	bit counts depending upon TARGET_BARREL_SHIFTER.
	(arc_insn_cost): Use insn attributes if the instruction is
	recognized.  Avoid calling get_attr_length for type "multi",
	i.e. define_insn_and_split patterns without explicit type.
	Fall-back to set_rtx_cost for single_set and pattern_cost
	otherwise.
	* config/arc/arc.h (COSTS_N_BYTES): Define helper macro.
	(BRANCH_COST): Improve/correct definition.
	(LOGICAL_OP_NON_SHORT_CIRCUIT): Preserve previous behavior.

Thanks again,
Roger

diff --git a/gcc/config/arc/arc.cc b/gcc/config/arc/arc.cc
index 353ac69..ae83e5e 100644
--- a/gcc/config/arc/arc.cc
+++ b/gcc/config/arc/arc.cc
@@ -5492,7 +5492,7 @@ arc_rtx_costs (rtx x, machine_mode mode, int outer_code,
     case CONST:
     case LABEL_REF:
     case SYMBOL_REF:
-      *total = speed ? COSTS_N_INSNS (1) : COSTS_N_INSNS (4);
+      *total = speed ? COSTS_N_INSNS (1) : COSTS_N_BYTES (4);
       return true;
 
     case CONST_DOUBLE:
@@ -5516,26 +5516,32 @@ arc_rtx_costs (rtx x, machine_mode mode, int outer_code,
     case ASHIFT:
     case ASHIFTRT:
     case LSHIFTRT:
+    case ROTATE:
+    case ROTATERT:
+      if (mode == DImode)
+	return false;
       if (TARGET_BARREL_SHIFTER)
	{
-	  if (CONSTANT_P (XEXP (x, 0)))
+	  *total = COSTS_N_INSNS (1);
+	  if (CONSTANT_P (XEXP (x, 1)))
	    {
-	      *total += rtx_cost (XEXP (x, 1), mode, (enum rtx_code) code,
+	      *total += rtx_cost (XEXP (x, 0), mode, (enum rtx_code) code,
				  0, speed);
	      return true;
	    }
-	  *total = COSTS_N_INSNS (1);
	}
       else if (GET_CODE (XEXP (x, 1)) != CONST_INT)
-	*total = COSTS_N_INSNS (16);
+	*total = speed ? COSTS_N_INSNS (16) : COSTS_N_INSNS (4);
       else
	{
-	  *total = COSTS_N_INSNS (INTVAL (XEXP ((x), 1)));
-	  /* ??? want_to_gcse_p can throw negative shift counts at us,
-	     and then panics when it gets a negative cost as result.
-	     Seen for gcc.c-torture/compile/20020710-1.c -Os .  */
-	  if (*total < 0)
-	    *total = 0;
+	  int n = INTVAL (XEXP (x, 1)) & 31;
+	  if (n < 4)
+	    *total = COSTS_N_INSNS (n);
+	  else
+	    *total = speed ? COSTS_N_INSNS (n + 2) : COSTS_N_INSNS (4);
+	  *total += rtx_cost (XEXP (x, 0), mode, (enum rtx_code) code,
+			      0, speed);
+	  return true;
	}
       return false;
@@ -5567,6 +5573,8 @@ arc_rtx_costs (rtx x, machine_mode mode, int outer_code,
       return false;
 
     case PLUS:
+      if (mode == DImode)
+	return false;
       if (outer_code == MEM && CONST_INT_P (XEXP (x, 1))
	  && RTX_OK_FOR_OFFSET_P (mode, XEXP (x, 1)))
	{
@@ -11101,35 +11109,37 @@
 static int
 arc_insn_cost (rtx_insn *insn, bool speed)
 {
   int cost;
-  if (recog_memoized (insn) < 0)
-    return 0;
-
-  /* If optimizing for size, we want the insn size.  */
-  if (!speed)
-    return get_attr_length (insn);
-
-  /* Use cost if provided.  */
-  cost = get_attr_cost (insn);
-  if (cost > 0)
-    return cost;
-
-  /* For speed make a simple cost model: memory access is more
-     expensive than any other instruction.  */
-  enum attr_type type = get_attr_type (insn);
-
-  switch (type)
+  enum attr_type type;
+  if (recog_memoized (insn) >= 0)
     {
-    case TYPE_LOAD:
-    case TYPE_STORE:
-      cost = COSTS_N_INSNS (2);
-      break;
-
-    default:
-      cost = COSTS_N_INSNS (1);
-      break;
+      if (speed)
+	{
+	  /* Use cost if provided.  */
+	  cost = get_attr_cost (insn);
+	  if (cost > 0)
+	    return cost;
+	  /* For speed make a simple cost model: memory access is more
+	     expensive than any other instruction.  */
+	  type = get_attr_type (insn);
+	  if (type == TYPE_LOAD || type == TYPE_STORE)
+	    return COSTS_N_INSNS (2);
+	}
+      else
+	{
+	  /* If optimizing for size, we want the insn size.  */
+	  type = get_attr_type (insn);
+	  if (type != TYPE_MULTI)
+	    return get_attr_length (insn);
+	}
     }
-  return cost;
+  if (rtx set = single_set (insn))
+    cost = set_rtx_cost (set, speed);
+  else
+    cost = pattern_cost (PATTERN (insn), speed);
+  /* If the cost is zero, then it's likely a complex insn.  We don't
+     want the cost of these to be less than something we know about.  */
+  return cost ? cost : COSTS_N_INSNS (2);
 }
 
 static unsigned

diff --git a/gcc/config/arc/arc.h b/gcc/config/arc/arc.h
index 5877389..b34f0b2 100644
--- a/gcc/config/arc/arc.h
+++ b/gcc/config/arc/arc.h
@@ -956,10 +956,16 @@ arc_select_cc_mode (OP, X, Y)
 
 /* Costs.  */
 
+/* Analog of COSTS_N_INSNS when optimizing for size.  */
+#ifndef COSTS_N_BYTES
+#define COSTS_N_BYTES(N) (N)
+#endif
+
 /* The cost of a branch insn.  */
 /* ??? What's the right value here?  Branches are certainly more
    expensive than reg->reg moves.  */
-#define BRANCH_COST(speed_p, predictable_p) 2
+#define BRANCH_COST(speed_p, predictable_p) \
+  (speed_p ? COSTS_N_INSNS (2) : COSTS_N_INSNS (1))
 
 /* Scc sets the destination to 1 and then conditionally zeroes it.
    Best case, ORed SCCs can be made into clear - condset - condset.
@@ -971,11 +977,8 @@ arc_select_cc_mode (OP, X, Y)
    beging decisive of p0, we want:
    p0 * (branch_cost - 4) > (1 - p0) * 5
    ??? We don't get to see that probability to evaluate, so we can
-   only wildly guess that it might be 50%.
-   ??? The compiler also lacks the notion of branch predictability.  */
-#define LOGICAL_OP_NON_SHORT_CIRCUIT \
-  (BRANCH_COST (optimize_function_for_speed_p (cfun), \
-		false) > 9)
+   only wildly guess that it might be 50%.  */
+#define LOGICAL_OP_NON_SHORT_CIRCUIT false
 
 /* Nonzero if access to memory by bytes is slow and undesirable.
    For RISC chips, it means that access to memory by bytes is no