[1/2] RTX_COST: Count instructions

Message ID 20231229174649.2811234-1-syq@gcc.gnu.org
State Unresolved
Series [1/2] RTX_COST: Count instructions

Checks

Context: snail/gcc-patch-check; Check: warning; Description: Git am fail log

Commit Message

YunQiang Su Dec. 29, 2023, 5:46 p.m. UTC
When we try to combine RTLs, the result may be very complex,
and `rtx_cost` may assign it a high cost.  In fact, however, it
may match a pattern in the machine description that emits only
1 or 2 hardware instructions, so the combination may be rejected
because the cost comparison fails.

A high cost may also come from a genuinely more expensive
operation.  To tell the two cases apart, we also need
information about the instruction count.
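A minimal, self-contained model of the problem described above. The names `cost_and_count`, `accept_by_cost` and `accept_with_count` are hypothetical, and the count-aware rule is only one possible way to use the extra information; this excerpt does not show how the patch itself uses the count.

```c
#include <assert.h>
#include <stdbool.h>

/* Same scale as GCC's COSTS_N_INSNS: one simple insn costs 4 units.  */
#define COSTS_N_INSNS(N) ((N) * 4)

/* Hypothetical pair: the rtx cost plus the number of hardware
   instructions the RTL is expected to expand to.  */
struct cost_and_count
{
  int cost;
  int count;
};

/* Cost-only rule (simplified): the combined insn replaces the original
   insns only if it is not more expensive than they were together.  */
static bool
accept_by_cost (struct cost_and_count old_insns,
		struct cost_and_count new_insn)
{
  return new_insn.cost <= old_insns.cost;
}

/* Hypothetical count-aware rule: a higher cost is tolerated when the
   combined insn still emits fewer hardware instructions, assuming the
   extra cost is an artifact of costing a complex RTL rather than a
   genuinely more expensive operation.  */
static bool
accept_with_count (struct cost_and_count old_insns,
		   struct cost_and_count new_insn)
{
  assert (new_insn.count >= 0 && old_insns.count >= 0);
  if (accept_by_cost (old_insns, new_insn))
    return true;
  return new_insn.count < old_insns.count;
}
```

For example, two one-instruction insns ({COSTS_N_INSNS (2), 2} together) combined into a pattern costed at COSTS_N_INSNS (3) that emits a single instruction are rejected by `accept_by_cost` but accepted by `accept_with_count`. The catch is that such a rule cannot distinguish an over-costed pattern from one instruction that really is slower.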

gcc

	* rtl.h (struct full_rtx_costs): Add new members,
	speed_count and size_count.
	(init_costs_to_zero): Ditto.
	(costs_add_n_insns): Add new argument, expensive.
	(rtx_cost_and_count): New function.
	* rtlanal.cc (rtx_cost): Call rtx_cost_and_count now.
	(rtx_cost_and_count): New function.
	(get_full_rtx_cost): Call rtx_cost_and_count now.
	* hooks.cc (hook_bool_rtx_mode_int_int_intp_intp_bool_false):
	New fallback hook function.
	* hooks.h (hook_bool_rtx_mode_int_int_intp_intp_bool_false):
	New fallback hook function.
	* target.def (rtx_costs): Add new argument, count.
	* doc/tm.texi (TARGET_RTX_COSTS): Ditto.
	* config/aarch64/aarch64.cc (aarch64_rtx_costs_wrapper): Ditto.
	* config/alpha/alpha.cc (alpha_rtx_costs): Ditto.
	* config/arc/arc.cc (arc_rtx_costs): Ditto.
	* config/arm/arm.cc (arm_rtx_costs): Ditto.
	* config/avr/avr.cc (avr_rtx_costs): Ditto.
	(avr_operand_rtx_cost): Update call to avr_rtx_costs.
	* config/bfin/bfin.cc (bfin_rtx_costs): Ditto.
	* config/bpf/bpf.cc (bpf_rtx_costs): Ditto.
	* config/c6x/c6x.cc (c6x_rtx_costs): Ditto.
	* config/cris/cris.cc (cris_rtx_costs): Ditto.
	* config/csky/csky.cc (ck860_rtx_costs, csky_rtx_costs): Ditto.
	* config/epiphany/epiphany.cc (epiphany_rtx_costs): Ditto.
	* config/frv/frv.cc (frv_rtx_costs): Ditto.
	* config/gcn/gcn.cc (gcn_rtx_costs): Ditto.
	* config/h8300/h8300.cc (h8300_rtx_costs): Ditto.
	* config/i386/i386.cc (ix86_rtx_costs): Ditto.
	* config/ia64/ia64.cc (ia64_rtx_costs): Ditto.
	* config/iq2000/iq2000.cc (iq2000_rtx_costs): Ditto.
	* config/lm32/lm32.cc (lm32_rtx_costs): Ditto.
	* config/loongarch/loongarch.cc (loongarch_rtx_costs): Ditto.
	* config/m32c/m32c.cc (m32c_rtx_costs): Ditto.
	* config/m32r/m32r.cc (m32r_rtx_costs): Ditto.
	* config/m68k/m68k.cc (m68k_rtx_costs): Ditto.
	* config/mcore/mcore.cc (mcore_rtx_costs): Ditto.
	* config/microblaze/microblaze.cc (microblaze_rtx_costs): Ditto.
	* config/mips/mips.cc (mips_rtx_costs): Ditto.
	* config/mmix/mmix.cc (mmix_rtx_costs): Ditto.
	* config/mn10300/mn10300.cc (mn10300_rtx_costs): Ditto.
	* config/msp430/msp430.cc (msp430_rtx_costs): Ditto.
	* config/nds32/nds32.cc (nds32_rtx_costs): Ditto.
	* config/nios2/nios2.cc (nios2_rtx_costs): Ditto.
	* config/or1k/or1k.cc (or1k_rtx_costs): Ditto.
	* config/pa/pa.cc (hppa_rtx_costs): Ditto.
	* config/pdp11/pdp11.cc (pdp11_rtx_costs): Ditto.
	* config/pru/pru.cc (pru_rtx_costs): Ditto.
	* config/riscv/riscv.cc (riscv_rtx_costs): Ditto.
	* config/rl78/rl78.cc (rl78_rtx_costs): Ditto.
	* config/rs6000/rs6000.cc (rs6000_rtx_costs): Ditto.
	(rs6000_debug_rtx_costs): Ditto.
	* config/rx/rx.cc (rx_rtx_costs): Ditto.
	* config/s390/s390.cc (s390_rtx_costs): Ditto.
	* config/sh/sh.cc (sh_rtx_costs): Ditto.
	* config/sparc/sparc.cc (sparc_rtx_costs): Ditto.
	* config/stormy16/stormy16.cc (xstormy16_rtx_costs): Ditto.
	* config/v850/v850.cc (v850_rtx_costs): Ditto.
	* config/vax/vax.cc (vax_rtx_costs): Ditto.
	* config/visium/visium.cc (visium_rtx_costs): Ditto.
	* config/xtensa/xtensa.cc (xtensa_rtx_costs): Ditto.
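The rtl.h entries above can be pictured with a sketch like the following. Upstream `struct full_rtx_costs` has only `speed` and `size`; the two `*_count` members follow the ChangeLog, but their exact semantics, and those of the new `expensive` argument to `costs_add_n_insns`, are not visible in this excerpt. The sketch therefore assumes that adding N insns also adds N to each count, and leaves `expensive` out.

```c
#include <assert.h>

/* Same scale as GCC's COSTS_N_INSNS.  */
#define COSTS_N_INSNS(N) ((N) * 4)

/* Sketch of the extended struct; the *_count members are assumed to
   hold instruction counts for the speed and size estimates.  */
struct full_rtx_costs
{
  int speed;
  int size;
  int speed_count;
  int size_count;
};

static inline void
init_costs_to_zero (struct full_rtx_costs *c)
{
  c->speed = 0;
  c->size = 0;
  c->speed_count = 0;
  c->size_count = 0;
}

/* Assumption: N insns add COSTS_N_INSNS (N) to each cost and N to each
   count.  The patch's EXPENSIVE argument is intentionally omitted.  */
static inline void
costs_add_n_insns (struct full_rtx_costs *c, int n)
{
  assert (n >= 0);
  c->speed += COSTS_N_INSNS (n);
  c->size += COSTS_N_INSNS (n);
  c->speed_count += n;
  c->size_count += n;
}
```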
---
 gcc/config/aarch64/aarch64.cc       |  3 +-
 gcc/config/alpha/alpha.cc           |  6 ++-
 gcc/config/arc/arc.cc               |  4 +-
 gcc/config/arm/arm.cc               |  7 +++-
 gcc/config/avr/avr.cc               | 10 +++--
 gcc/config/bfin/bfin.cc             |  4 +-
 gcc/config/bpf/bpf.cc               |  1 +
 gcc/config/c6x/c6x.cc               |  6 ++-
 gcc/config/cris/cris.cc             |  6 ++-
 gcc/config/csky/csky.cc             |  9 ++++-
 gcc/config/epiphany/epiphany.cc     |  5 ++-
 gcc/config/frv/frv.cc               |  5 ++-
 gcc/config/gcn/gcn.cc               |  4 +-
 gcc/config/h8300/h8300.cc           |  5 ++-
 gcc/config/i386/i386.cc             |  4 +-
 gcc/config/ia64/ia64.cc             |  6 ++-
 gcc/config/iq2000/iq2000.cc         |  7 +++-
 gcc/config/lm32/lm32.cc             |  7 +++-
 gcc/config/loongarch/loongarch.cc   |  5 ++-
 gcc/config/m32c/m32c.cc             |  4 +-
 gcc/config/m32r/m32r.cc             |  8 +++-
 gcc/config/m68k/m68k.cc             |  6 ++-
 gcc/config/mcore/mcore.cc           |  6 ++-
 gcc/config/microblaze/microblaze.cc |  4 +-
 gcc/config/mips/mips.cc             |  5 ++-
 gcc/config/mmix/mmix.cc             |  3 +-
 gcc/config/mn10300/mn10300.cc       |  5 ++-
 gcc/config/msp430/msp430.cc         |  2 +
 gcc/config/nds32/nds32.cc           |  1 +
 gcc/config/nios2/nios2.cc           |  6 ++-
 gcc/config/or1k/or1k.cc             |  4 +-
 gcc/config/pa/pa.cc                 |  6 ++-
 gcc/config/pdp11/pdp11.cc           |  7 +++-
 gcc/config/pru/pru.cc               |  4 +-
 gcc/config/riscv/riscv.cc           |  9 ++++-
 gcc/config/rl78/rl78.cc             |  3 ++
 gcc/config/rs6000/rs6000.cc         | 12 ++++--
 gcc/config/rx/rx.cc                 |  4 +-
 gcc/config/s390/s390.cc             |  4 +-
 gcc/config/sh/sh.cc                 |  6 ++-
 gcc/config/sparc/sparc.cc           |  6 ++-
 gcc/config/stormy16/stormy16.cc     |  5 ++-
 gcc/config/v850/v850.cc             |  5 ++-
 gcc/config/vax/vax.cc               |  6 ++-
 gcc/config/visium/visium.cc         |  6 ++-
 gcc/config/xtensa/xtensa.cc         |  6 ++-
 gcc/doc/tm.texi                     |  7 +++-
 gcc/hooks.cc                        |  7 ++++
 gcc/hooks.h                         |  5 +++
 gcc/rtl.h                           | 21 ++++++++++-
 gcc/rtlanal.cc                      | 58 +++++++++++++++++++++++++----
 gcc/target.def                      | 11 +++++-
 52 files changed, 277 insertions(+), 79 deletions(-)
  

Comments

Jeff Law Dec. 30, 2023, 4:14 a.m. UTC
On 12/29/23 10:46, YunQiang Su wrote:
> When we try to combine RTLs, the result may be very complex,
> and `rtx_cost` may think that it need lots of costs. But in
> fact, it may match a pattern in machine descriptions, which
> may emit only 1 or 2 hardware instructions.  This combination
> may be refused due to cost comparison failure.
Then that's a problem with the backend's implementation of RTX_COST.

> 
> Since the high cost may be due to a more expsensive operation.
> To get real reason, we also need information about instruction
> count.
Then cost the *operations*, not the number of instructions.  Also note 
that a single insn may generate multiple assembler instructions.

Even with all its warts, the real solution here is to fix the port's RTX 
costs.

jeff
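"Cost the operations" can be illustrated with a toy model (hypothetical types and numbers, not GCC's rtx API): the backend recognises `(plus (mult a b) c)` as a single multiply-add operation and costs that operation, so the combined form comes out cheaper than the separate multiply and add, and a cost comparison has no reason to reject it.

```c
#include <assert.h>
#include <stddef.h>

/* Same scale as GCC's COSTS_N_INSNS.  */
#define COSTS_N_INSNS(N) ((N) * 4)

/* Toy expression tree standing in for RTL (hypothetical).  */
enum toy_code { TOY_REG, TOY_PLUS, TOY_MULT };

struct toy_expr
{
  enum toy_code code;
  const struct toy_expr *op0, *op1;
};

/* Hypothetical target: an add costs 1 insn, a multiply costs 3, and a
   fused multiply-add takes as long as a multiply.  */
static int
toy_rtx_cost (const struct toy_expr *x)
{
  switch (x->code)
    {
    case TOY_REG:
      return 0;
    case TOY_MULT:
      return COSTS_N_INSNS (3) + toy_rtx_cost (x->op0) + toy_rtx_cost (x->op1);
    case TOY_PLUS:
      if (x->op0->code == TOY_MULT)
	/* (plus (mult a b) c) matches the multiply-add pattern: cost that
	   one operation instead of a separate multiply plus an add.  */
	return (COSTS_N_INSNS (3)
		+ toy_rtx_cost (x->op0->op0) + toy_rtx_cost (x->op0->op1)
		+ toy_rtx_cost (x->op1));
      return COSTS_N_INSNS (1) + toy_rtx_cost (x->op0) + toy_rtx_cost (x->op1);
    }
  assert (0 && "unhandled code");
  return 0;
}
```

Here the multiply (12) and the add (4) cost 16 as separate insns, while the combined multiply-add costs 12.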
  
Segher Boessenkool Dec. 30, 2023, 5:49 p.m. UTC
On Fri, Dec 29, 2023 at 09:14:52PM -0700, Jeff Law wrote:
> On 12/29/23 10:46, YunQiang Su wrote:
> >When we try to combine RTLs, the result may be very complex,
> >and `rtx_cost` may think that it need lots of costs. But in
> >fact, it may match a pattern in machine descriptions, which
> >may emit only 1 or 2 hardware instructions.  This combination
> >may be refused due to cost comparison failure.
> Then that's a problem with the backend's implementation of RTX_COST.
> 
> >Since the high cost may be due to a more expsensive operation.
> >To get real reason, we also need information about instruction
> >count.
> Then cost the *operations*, not the number of instructions.  Also note 
> that a single insn may generate multiple assembler instructions.
> 
> Even with all its warts, the real solution here is to fix the port's RTX 
> costs.

Or implement the insn_cost hook instead; it will then be used in
preference to rtx_costs in most places, including in the combiner.
insn_cost is much easier to implement, and it is even possible to make
good cost estimates with it :-)


Segher
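A toy model of the insn_cost approach: instead of summing the costs of sub-expressions, the port costs a whole recognised insn, for example from how many machine instructions its output template emits (which also covers an insn that expands to several assembler instructions). In GCC the hook is `TARGET_INSN_COST`, taking an `rtx_insn *` and a `bool speed` (compare `arm_insn_cost` in the diff); the `toy_insn` struct and its fields below are hypothetical stand-ins for the insn attributes a real port would query.

```c
#include <assert.h>
#include <stdbool.h>

/* Same scale as GCC's COSTS_N_INSNS.  */
#define COSTS_N_INSNS(N) ((N) * 4)

/* Hypothetical stand-in for a recognised insn: the pattern it matched,
   how many machine instructions its template emits, and any extra
   latency (in COSTS_N_INSNS units) that matters only for speed.  */
struct toy_insn
{
  const char *pattern;
  int n_machine_insns;
  int extra_latency;
};

/* Models the shape of TARGET_INSN_COST: cost the whole insn at once.  */
static int
toy_insn_cost (const struct toy_insn *insn, bool speed)
{
  assert (insn->n_machine_insns > 0);
  int cost = COSTS_N_INSNS (insn->n_machine_insns);
  if (speed)
    cost += COSTS_N_INSNS (insn->extra_latency);
  return cost;
}
```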
  

Patch

diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index f9850320f61..549adc664e7 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -15292,7 +15292,8 @@  cost_plus:
    if the total cost of X was calculated.  */
 static bool
 aarch64_rtx_costs_wrapper (rtx x, machine_mode mode, int outer,
-		   int param, int *cost, bool speed)
+		   int param, int *cost, int *count ATTRIBUTE_UNUSED,
+		   bool speed)
 {
   bool result = aarch64_rtx_costs (x, mode, outer, param, cost, speed);
 
diff --git a/gcc/config/alpha/alpha.cc b/gcc/config/alpha/alpha.cc
index 6aa93783226..57a262620cf 100644
--- a/gcc/config/alpha/alpha.cc
+++ b/gcc/config/alpha/alpha.cc
@@ -1359,13 +1359,15 @@  alpha_memory_move_cost (machine_mode /*mode*/, reg_class_t /*regclass*/,
    scanned.  In either case, *TOTAL contains the cost result.  */
 
 static bool
-alpha_rtx_costs (rtx x, machine_mode mode, int outer_code, int opno, int *total,
-		 bool speed)
+alpha_rtx_costs (rtx x, machine_mode mode, int outer_code, int opno,
+		 int *total, int *count, bool speed)
 {
   int code = GET_CODE (x);
   bool float_mode_p = FLOAT_MODE_P (mode);
   const struct alpha_rtx_cost_data *cost_data;
 
+  *count = 0;
+
   if (!speed)
     cost_data = &alpha_rtx_cost_size;
   else
diff --git a/gcc/config/arc/arc.cc b/gcc/config/arc/arc.cc
index 3f4eb5a5736..d895698b413 100644
--- a/gcc/config/arc/arc.cc
+++ b/gcc/config/arc/arc.cc
@@ -5485,10 +5485,12 @@  void arc_file_end (void)
 
 static bool
 arc_rtx_costs (rtx x, machine_mode mode, int outer_code,
-	       int opno ATTRIBUTE_UNUSED, int *total, bool speed)
+	       int opno ATTRIBUTE_UNUSED, int *total, int *count, bool speed)
 {
   int code = GET_CODE (x);
 
+  *count = 0;
+
   switch (code)
     {
       /* Small integers are as cheap as registers.  */
diff --git a/gcc/config/arm/arm.cc b/gcc/config/arm/arm.cc
index 0c0cb14a8a4..e7701b9835f 100644
--- a/gcc/config/arm/arm.cc
+++ b/gcc/config/arm/arm.cc
@@ -181,7 +181,7 @@  static void arm_output_mi_thunk (FILE *, tree, HOST_WIDE_INT, HOST_WIDE_INT,
 static bool arm_have_conditional_execution (void);
 static bool arm_cannot_force_const_mem (machine_mode, rtx);
 static bool arm_legitimate_constant_p (machine_mode, rtx);
-static bool arm_rtx_costs (rtx, machine_mode, int, int, int *, bool);
+static bool arm_rtx_costs (rtx, machine_mode, int, int, int *, int *, bool);
 static int arm_insn_cost (rtx_insn *, bool);
 static int arm_address_cost (rtx, machine_mode, addr_space_t, bool);
 static int arm_register_move_cost (machine_mode, reg_class_t, reg_class_t);
@@ -12128,12 +12128,15 @@  arm_rtx_costs_internal (rtx x, enum rtx_code code, enum rtx_code outer_code,
 
 static bool
 arm_rtx_costs (rtx x, machine_mode mode ATTRIBUTE_UNUSED, int outer_code,
-	       int opno ATTRIBUTE_UNUSED, int *total, bool speed)
+	       int opno ATTRIBUTE_UNUSED,
+	       int *total, int *count, bool speed)
 {
   bool result;
   int code = GET_CODE (x);
   gcc_assert (current_tune->insn_extra_cost);
 
+  *count = 0;
+
   result =  arm_rtx_costs_internal (x, (enum rtx_code) code,
 				(enum rtx_code) outer_code,
 				current_tune->insn_extra_cost,
diff --git a/gcc/config/avr/avr.cc b/gcc/config/avr/avr.cc
index c5e9ccf9663..1458d638f0c 100644
--- a/gcc/config/avr/avr.cc
+++ b/gcc/config/avr/avr.cc
@@ -167,7 +167,7 @@  static struct machine_function * avr_init_machine_status (void);
 
 /* Prototypes for hook implementors if needed before their implementation.  */
 
-static bool avr_rtx_costs (rtx, machine_mode, int, int, int*, bool);
+static bool avr_rtx_costs (rtx, machine_mode, int, int, int*, int *, bool);
 
 
 /* Allocate registers from r25 to r8 for parameters for function calls.  */
@@ -11404,6 +11404,7 @@  avr_operand_rtx_cost (rtx x, machine_mode mode, enum rtx_code outer,
 {
   enum rtx_code code = GET_CODE (x);
   int total;
+  int count;
 
   switch (code)
     {
@@ -11421,7 +11422,8 @@  avr_operand_rtx_cost (rtx x, machine_mode mode, enum rtx_code outer,
     }
 
   total = 0;
-  avr_rtx_costs (x, mode, outer, opno, &total, speed);
+  count = 0;
+  avr_rtx_costs (x, mode, outer, opno, &total, &count, speed);
   return total;
 }
 
@@ -12315,8 +12317,10 @@  avr_rtx_costs_1 (rtx x, machine_mode mode, int outer_code,
 
 static bool
 avr_rtx_costs (rtx x, machine_mode mode, int outer_code,
-	       int opno, int *total, bool speed)
+	       int opno, int *total, int *count, bool speed)
 {
+  *count = 0;
+
   bool done = avr_rtx_costs_1 (x, mode, outer_code, opno, total, speed);
 
   if (avr_log.rtx_costs)
diff --git a/gcc/config/bfin/bfin.cc b/gcc/config/bfin/bfin.cc
index c02136f5e0c..f131a5022ea 100644
--- a/gcc/config/bfin/bfin.cc
+++ b/gcc/config/bfin/bfin.cc
@@ -2804,13 +2804,15 @@  bfin_legitimate_constant_p (machine_mode mode ATTRIBUTE_UNUSED, rtx x)
 
 static bool
 bfin_rtx_costs (rtx x, machine_mode mode, int outer_code_i, int opno,
-		int *total, bool speed)
+		int *total, int *count, bool speed)
 {
   enum rtx_code code = GET_CODE (x);
   enum rtx_code outer_code = (enum rtx_code) outer_code_i;
   int cost2 = COSTS_N_INSNS (1);
   rtx op0, op1;
 
+  *count = 0;
+
   switch (code)
     {
     case CONST_INT:
diff --git a/gcc/config/bpf/bpf.cc b/gcc/config/bpf/bpf.cc
index f7a5c772e16..5c5c93928cb 100644
--- a/gcc/config/bpf/bpf.cc
+++ b/gcc/config/bpf/bpf.cc
@@ -597,6 +597,7 @@  bpf_rtx_costs (rtx x ATTRIBUTE_UNUSED,
 	       int outer_code ATTRIBUTE_UNUSED,
 	       int opno ATTRIBUTE_UNUSED,
                int *total ATTRIBUTE_UNUSED,
+	       int *count ATTRIBUTE_UNUSED,
 	       bool speed ATTRIBUTE_UNUSED)
 {
   /* To be written.  */
diff --git a/gcc/config/c6x/c6x.cc b/gcc/config/c6x/c6x.cc
index 72e8b4c5345..076df0e3b4c 100644
--- a/gcc/config/c6x/c6x.cc
+++ b/gcc/config/c6x/c6x.cc
@@ -5994,13 +5994,15 @@  shift_p (rtx x, enum rtx_code code, int amount)
    scanned.  In either case, *TOTAL contains the cost result.  */
 
 static bool
-c6x_rtx_costs (rtx x, machine_mode mode, int outer_code, int opno, int *total,
-	       bool speed)
+c6x_rtx_costs (rtx x, machine_mode mode, int outer_code, int opno,
+	       int *total, int *count, bool speed)
 {
   int cost2 = COSTS_N_INSNS (1);
   rtx op0, op1;
   int code = GET_CODE (x);
 
+  *count = 0;
+
   switch (code)
     {
     case CONST_INT:
diff --git a/gcc/config/cris/cris.cc b/gcc/config/cris/cris.cc
index 38a4dd29114..b9fa5a6e6f7 100644
--- a/gcc/config/cris/cris.cc
+++ b/gcc/config/cris/cris.cc
@@ -139,7 +139,7 @@  static reg_class_t cris_spill_class (reg_class_t, machine_mode);
 static int cris_register_move_cost (machine_mode, reg_class_t, reg_class_t);
 static int cris_memory_move_cost (machine_mode, reg_class_t, bool);
 static machine_mode cris_cc_modes_compatible (machine_mode, machine_mode);
-static bool cris_rtx_costs (rtx, machine_mode, int, int, int *, bool);
+static bool cris_rtx_costs (rtx, machine_mode, int, int, int *, int *, bool);
 static int cris_address_cost (rtx, machine_mode, addr_space_t, bool);
 static bool cris_pass_by_reference (cumulative_args_t,
 				    const function_arg_info &);
@@ -1907,10 +1907,12 @@  cris_expand_return (bool on_stack)
 
 static bool
 cris_rtx_costs (rtx x, machine_mode mode, int outer_code, int opno,
-		int *total, bool speed)
+		int *total, int *count, bool speed)
 {
   int code = GET_CODE (x);
 
+  *count = 0;
+
   switch (code)
     {
     case CONST_INT:
diff --git a/gcc/config/csky/csky.cc b/gcc/config/csky/csky.cc
index ac089feea62..04af1d309a8 100644
--- a/gcc/config/csky/csky.cc
+++ b/gcc/config/csky/csky.cc
@@ -6866,8 +6866,11 @@  ck807_ck810_rtx_costs (rtx x, int code,
 static bool
 ck860_rtx_costs (rtx x, int code, machine_mode mode,
 		 int outer_code ATTRIBUTE_UNUSED,
-		 int *total, bool speed ATTRIBUTE_UNUSED)
+		 int *total, int *count,
+		 bool speed ATTRIBUTE_UNUSED)
 {
+  *count = 0;
+
   switch (code)
     {
     case PLUS:
@@ -6915,10 +6918,12 @@  ck860_rtx_costs (rtx x, int code, machine_mode mode,
 
 static bool
 csky_rtx_costs (rtx x, machine_mode mode ATTRIBUTE_UNUSED, int outer_code,
-		int opno ATTRIBUTE_UNUSED, int *total, bool speed)
+		int opno ATTRIBUTE_UNUSED, int *total, int *count, bool speed)
 {
   int code = GET_CODE (x);
 
+  *count = 0;
+
   if (CSKY_TARGET_ARCH (CK802) || CSKY_TARGET_ARCH (CK801))
     return ck802_ck801_rtx_costs (x, code, outer_code, total, speed);
   else if (CSKY_TARGET_ARCH (CK803))
diff --git a/gcc/config/epiphany/epiphany.cc b/gcc/config/epiphany/epiphany.cc
index e10e64de823..4cd04be0b45 100644
--- a/gcc/config/epiphany/epiphany.cc
+++ b/gcc/config/epiphany/epiphany.cc
@@ -773,10 +773,13 @@  epiphany_arg_partial_bytes (cumulative_args_t cum,
 static bool
 epiphany_rtx_costs (rtx x, machine_mode mode, int outer_code,
 		    int opno ATTRIBUTE_UNUSED,
-		    int *total, bool speed ATTRIBUTE_UNUSED)
+		    int *total, int *count,
+		    bool speed ATTRIBUTE_UNUSED)
 {
   int code = GET_CODE (x);
 
+  *count = 0;
+
   switch (code)
     {
       /* Small integers in the right context are as cheap as registers.  */
diff --git a/gcc/config/frv/frv.cc b/gcc/config/frv/frv.cc
index 03976ba7b71..34912e2c893 100644
--- a/gcc/config/frv/frv.cc
+++ b/gcc/config/frv/frv.cc
@@ -365,7 +365,7 @@  static void frv_setup_incoming_varargs		(cumulative_args_t,
 static rtx frv_expand_builtin_saveregs		(void);
 static void frv_expand_builtin_va_start		(tree, rtx);
 static bool frv_rtx_costs			(rtx, machine_mode, int, int,
-						 int*, bool);
+						 int*, int *, bool);
 static int frv_register_move_cost		(machine_mode,
 						 reg_class_t, reg_class_t);
 static int frv_memory_move_cost			(machine_mode,
@@ -9323,10 +9323,13 @@  frv_rtx_costs (rtx x,
                int outer_code,
 	       int opno ATTRIBUTE_UNUSED,
                int *total,
+	       int *count,
 	       bool speed ATTRIBUTE_UNUSED)
 {
   int code = GET_CODE (x);
 
+  *count = 0;
+
   if (outer_code == MEM)
     {
       /* Don't differentiate between memory addresses.  All the ones
diff --git a/gcc/config/gcn/gcn.cc b/gcc/config/gcn/gcn.cc
index b67551a2e8e..20e61810c7d 100644
--- a/gcc/config/gcn/gcn.cc
+++ b/gcc/config/gcn/gcn.cc
@@ -3946,8 +3946,10 @@  gcn_emutls_var_init (tree, tree decl, tree)
    scanned.  In either case, *TOTAL contains the cost result.  */
 
 static bool
-gcn_rtx_costs (rtx x, machine_mode, int, int, int *total, bool)
+gcn_rtx_costs (rtx x, machine_mode, int, int, int *total, int *count, bool)
 {
+  *count = 0;
+
   enum rtx_code code = GET_CODE (x);
   switch (code)
     {
diff --git a/gcc/config/h8300/h8300.cc b/gcc/config/h8300/h8300.cc
index f906286d65d..5ae9b39fc87 100644
--- a/gcc/config/h8300/h8300.cc
+++ b/gcc/config/h8300/h8300.cc
@@ -1189,10 +1189,13 @@  h8300_shift_costs (rtx x)
 
 static bool
 h8300_rtx_costs (rtx x, machine_mode mode ATTRIBUTE_UNUSED, int outer_code,
-		 int opno ATTRIBUTE_UNUSED, int *total, bool speed)
+		 int opno ATTRIBUTE_UNUSED,
+		 int *total, int *count, bool speed)
 {
   int code = GET_CODE (x);
 
+  *count = 0;
+
   if (TARGET_H8300SX && outer_code == MEM)
     {
       /* Estimate the number of execution states needed to calculate
diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index 38d515dac04..7c770816541 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -21469,7 +21469,7 @@  ix86_shift_rotate_cost (const struct processor_costs *cost,
 
 static bool
 ix86_rtx_costs (rtx x, machine_mode mode, int outer_code_i, int opno,
-		int *total, bool speed)
+		int *total, int *count, bool speed)
 {
   rtx mask;
   enum rtx_code code = GET_CODE (x);
@@ -21478,6 +21478,8 @@  ix86_rtx_costs (rtx x, machine_mode mode, int outer_code_i, int opno,
     = speed ? ix86_tune_cost : &ix86_size_cost;
   int src_cost;
 
+  *count = 0;
+
   switch (code)
     {
     case SET:
diff --git a/gcc/config/ia64/ia64.cc b/gcc/config/ia64/ia64.cc
index ac566efcf19..094fa738305 100644
--- a/gcc/config/ia64/ia64.cc
+++ b/gcc/config/ia64/ia64.cc
@@ -220,7 +220,7 @@  static int ia64_register_move_cost (machine_mode, reg_class_t,
                                     reg_class_t);
 static int ia64_memory_move_cost (machine_mode mode, reg_class_t,
 				  bool);
-static bool ia64_rtx_costs (rtx, machine_mode, int, int, int *, bool);
+static bool ia64_rtx_costs (rtx, machine_mode, int, int, int *, int *, bool);
 static int ia64_unspec_may_trap_p (const_rtx, unsigned);
 static void fix_range (const char *);
 static struct machine_function * ia64_init_machine_status (void);
@@ -5688,10 +5688,12 @@  ia64_print_operand_punct_valid_p (unsigned char code)
 static bool
 ia64_rtx_costs (rtx x, machine_mode mode, int outer_code,
 		int opno ATTRIBUTE_UNUSED,
-		int *total, bool speed ATTRIBUTE_UNUSED)
+		int *total, int *count, bool speed ATTRIBUTE_UNUSED)
 {
   int code = GET_CODE (x);
 
+  *count = 0;
+
   switch (code)
     {
     case CONST_INT:
diff --git a/gcc/config/iq2000/iq2000.cc b/gcc/config/iq2000/iq2000.cc
index 54404e8d05a..52c87de276e 100644
--- a/gcc/config/iq2000/iq2000.cc
+++ b/gcc/config/iq2000/iq2000.cc
@@ -154,7 +154,8 @@  static bool iq2000_return_in_memory   (const_tree, const_tree);
 static void iq2000_setup_incoming_varargs (cumulative_args_t,
 					   const function_arg_info &,
 					   int *, int);
-static bool iq2000_rtx_costs          (rtx, machine_mode, int, int, int *, bool);
+static bool iq2000_rtx_costs (rtx, machine_mode, int, int,
+			      int *, int *, bool);
 static int  iq2000_address_cost       (rtx, machine_mode, addr_space_t,
 				       bool);
 static rtx  iq2000_legitimize_address (rtx, rtx, machine_mode);
@@ -3280,11 +3281,13 @@  iq2000_legitimize_address (rtx xinsn, rtx old_x ATTRIBUTE_UNUSED,
 
 static bool
 iq2000_rtx_costs (rtx x, machine_mode mode, int outer_code ATTRIBUTE_UNUSED,
-		  int opno ATTRIBUTE_UNUSED, int * total,
+		  int opno ATTRIBUTE_UNUSED, int * total, int * count,
 		  bool speed ATTRIBUTE_UNUSED)
 {
   int code = GET_CODE (x);
 
+  *count = 0;
+
   switch (code)
     {
     case MEM:
diff --git a/gcc/config/lm32/lm32.cc b/gcc/config/lm32/lm32.cc
index 9d65d66719c..e86e0c2d0d6 100644
--- a/gcc/config/lm32/lm32.cc
+++ b/gcc/config/lm32/lm32.cc
@@ -67,7 +67,7 @@  static void lm32_setup_incoming_varargs (cumulative_args_t cum,
 					 const function_arg_info &,
 					 int *pretend_size, int no_rtl);
 static bool lm32_rtx_costs (rtx x, machine_mode mode, int outer_code, int opno,
-			    int *total, bool speed);
+			    int *total, int *count, bool speed);
 static bool lm32_can_eliminate (const int, const int);
 static bool lm32_legitimate_address_p (machine_mode mode, rtx x, bool strict,
 				       code_helper = ERROR_MARK);
@@ -923,7 +923,8 @@  nonpic_symbol_mentioned_p (rtx x)
 
 static bool
 lm32_rtx_costs (rtx x, machine_mode mode, int outer_code,
-		int opno ATTRIBUTE_UNUSED, int *total, bool speed)
+		int opno ATTRIBUTE_UNUSED,
+		int *total, int *count, bool speed)
 {
   int code = GET_CODE (x);
   bool small_mode;
@@ -935,6 +936,8 @@  lm32_rtx_costs (rtx x, machine_mode mode, int outer_code,
   const int load_latency = 3;
   const int libcall_size_cost = 5;
 
+  *count = 0;
+
   /* Determine if we can handle the given mode size in a single instruction.  */
   small_mode = (mode == QImode) || (mode == HImode) || (mode == SImode);
 
diff --git a/gcc/config/loongarch/loongarch.cc b/gcc/config/loongarch/loongarch.cc
index 1d4d8f0b256..267a9c4b632 100644
--- a/gcc/config/loongarch/loongarch.cc
+++ b/gcc/config/loongarch/loongarch.cc
@@ -3594,13 +3594,16 @@  loongarch_set_reg_reg_cost (machine_mode mode)
 
 static bool
 loongarch_rtx_costs (rtx x, machine_mode mode, int outer_code,
-		     int opno ATTRIBUTE_UNUSED, int *total, bool speed)
+		     int opno ATTRIBUTE_UNUSED,
+		     int *total, int *count, bool speed)
 {
   int code = GET_CODE (x);
   bool float_mode_p = FLOAT_MODE_P (mode);
   int cost;
   rtx addr;
 
+  *count = 0;
+
   if (outer_code == COMPARE)
     {
       gcc_assert (CONSTANT_P (x));
diff --git a/gcc/config/m32c/m32c.cc b/gcc/config/m32c/m32c.cc
index c63c75a6709..b9527152c7a 100644
--- a/gcc/config/m32c/m32c.cc
+++ b/gcc/config/m32c/m32c.cc
@@ -2213,8 +2213,10 @@  m32c_memory_move_cost (machine_mode mode ATTRIBUTE_UNUSED,
 static bool
 m32c_rtx_costs (rtx x, machine_mode mode, int outer_code,
 		int opno ATTRIBUTE_UNUSED,
-		int *total, bool speed ATTRIBUTE_UNUSED)
+		int *total, int *count, bool speed ATTRIBUTE_UNUSED)
 {
+  *count = 0;
+
   int code = GET_CODE (x);
   switch (code)
     {
diff --git a/gcc/config/m32r/m32r.cc b/gcc/config/m32r/m32r.cc
index 1a9c8ef1391..580fa83489f 100644
--- a/gcc/config/m32r/m32r.cc
+++ b/gcc/config/m32r/m32r.cc
@@ -92,7 +92,8 @@  static void m32r_setup_incoming_varargs (cumulative_args_t,
 					 const function_arg_info &,
 					 int *, int);
 static void init_idents (void);
-static bool m32r_rtx_costs (rtx, machine_mode, int, int, int *, bool speed);
+static bool m32r_rtx_costs (rtx, machine_mode, int, int,
+			    int *, int *, bool speed);
 static int m32r_memory_move_cost (machine_mode, reg_class_t, bool);
 static bool m32r_pass_by_reference (cumulative_args_t,
 				    const function_arg_info &arg);
@@ -1367,11 +1368,14 @@  m32r_memory_move_cost (machine_mode mode,
 static bool
 m32r_rtx_costs (rtx x, machine_mode mode ATTRIBUTE_UNUSED,
 		int outer_code ATTRIBUTE_UNUSED,
-		int opno ATTRIBUTE_UNUSED, int *total,
+		int opno ATTRIBUTE_UNUSED,
+		int *total, int *count,
 		bool speed ATTRIBUTE_UNUSED)
 {
   int code = GET_CODE (x);
 
+  *count = 0;
+
   switch (code)
     {
       /* Small integers are as cheap as registers.  4 byte values can be
diff --git a/gcc/config/m68k/m68k.cc b/gcc/config/m68k/m68k.cc
index 001cf5bd997..00fac5238c3 100644
--- a/gcc/config/m68k/m68k.cc
+++ b/gcc/config/m68k/m68k.cc
@@ -175,7 +175,7 @@  static bool m68k_save_reg (unsigned int regno, bool interrupt_handler);
 static bool m68k_ok_for_sibcall_p (tree, tree);
 static bool m68k_tls_symbol_p (rtx);
 static rtx m68k_legitimize_address (rtx, rtx, machine_mode);
-static bool m68k_rtx_costs (rtx, machine_mode, int, int, int *, bool);
+static bool m68k_rtx_costs (rtx, machine_mode, int, int, int *, int *, bool);
 #if M68K_HONOR_TARGET_STRICT_ALIGNMENT
 static bool m68k_return_in_memory (const_tree, const_tree);
 #endif
@@ -2996,10 +2996,12 @@  const_int_cost (HOST_WIDE_INT i)
 static bool
 m68k_rtx_costs (rtx x, machine_mode mode, int outer_code,
 		int opno ATTRIBUTE_UNUSED,
-		int *total, bool speed ATTRIBUTE_UNUSED)
+		int *total, int *count, bool speed ATTRIBUTE_UNUSED)
 {
   int code = GET_CODE (x);
 
+  *count = 0;
+
   switch (code)
     {
     case CONST_INT:
diff --git a/gcc/config/mcore/mcore.cc b/gcc/config/mcore/mcore.cc
index ca672547494..98c9a076217 100644
--- a/gcc/config/mcore/mcore.cc
+++ b/gcc/config/mcore/mcore.cc
@@ -127,7 +127,7 @@  static int        mcore_const_costs             (rtx, RTX_CODE);
 static int        mcore_and_cost                (rtx);
 static int        mcore_ior_cost                (rtx);
 static bool       mcore_rtx_costs		(rtx, machine_mode, int, int,
-						 int *, bool);
+						 int *, int *, bool);
 static void       mcore_external_libcall	(rtx);
 static bool       mcore_return_in_memory	(const_tree, const_tree);
 static int        mcore_arg_partial_bytes       (cumulative_args_t,
@@ -538,10 +538,12 @@  mcore_ior_cost (rtx x)
 static bool
 mcore_rtx_costs (rtx x, machine_mode mode ATTRIBUTE_UNUSED, int outer_code,
 		 int opno ATTRIBUTE_UNUSED,
-		 int * total, bool speed ATTRIBUTE_UNUSED)
+		 int *total, int *count, bool speed ATTRIBUTE_UNUSED)
 {
   int code = GET_CODE (x);
 
+  *count = 0;
+
   switch (code)
     {
     case CONST_INT:
diff --git a/gcc/config/microblaze/microblaze.cc b/gcc/config/microblaze/microblaze.cc
index 3ea177b835e..7eb77ebb732 100644
--- a/gcc/config/microblaze/microblaze.cc
+++ b/gcc/config/microblaze/microblaze.cc
@@ -1298,11 +1298,13 @@  microblaze_expand_block_move (rtx dest, rtx src, rtx length, rtx align_rtx)
 
 static bool
 microblaze_rtx_costs (rtx x, machine_mode mode, int outer_code ATTRIBUTE_UNUSED,
-		      int opno ATTRIBUTE_UNUSED, int *total,
+		      int opno ATTRIBUTE_UNUSED, int *total, int *count,
 		      bool speed ATTRIBUTE_UNUSED)
 {
   int code = GET_CODE (x);
 
+  *count = 0;
+
   switch (code)
     {
     case MEM:
diff --git a/gcc/config/mips/mips.cc b/gcc/config/mips/mips.cc
index 9180dbbf843..647095b6c81 100644
--- a/gcc/config/mips/mips.cc
+++ b/gcc/config/mips/mips.cc
@@ -4174,13 +4174,16 @@  mips_set_reg_reg_cost (machine_mode mode)
 
 static bool
 mips_rtx_costs (rtx x, machine_mode mode, int outer_code,
-		int opno ATTRIBUTE_UNUSED, int *total, bool speed)
+		int opno ATTRIBUTE_UNUSED,
+		int *total, int *count, bool speed)
 {
   int code = GET_CODE (x);
   bool float_mode_p = FLOAT_MODE_P (mode);
   int cost;
   rtx addr;
 
+  *count = 0;
+
   /* The cost of a COMPARE is hard to define for MIPS.  COMPAREs don't
      appear in the instruction stream, and the cost of a comparison is
      really the cost of the branch or scc condition.  At the time of
diff --git a/gcc/config/mmix/mmix.cc b/gcc/config/mmix/mmix.cc
index 34743092749..cd264186dcd 100644
--- a/gcc/config/mmix/mmix.cc
+++ b/gcc/config/mmix/mmix.cc
@@ -143,7 +143,7 @@  static void mmix_setup_incoming_varargs
 static void mmix_file_start (void);
 static void mmix_file_end (void);
 static void mmix_init_libfuncs (void);
-static bool mmix_rtx_costs (rtx, machine_mode, int, int, int *, bool);
+static bool mmix_rtx_costs (rtx, machine_mode, int, int, int *, int *, bool);
 static int mmix_register_move_cost (machine_mode,
 				    reg_class_t, reg_class_t);
 static rtx mmix_struct_value_rtx (tree, int);
@@ -1227,6 +1227,7 @@  mmix_rtx_costs (rtx x ATTRIBUTE_UNUSED,
 		int outer_code ATTRIBUTE_UNUSED,
 		int opno ATTRIBUTE_UNUSED,
 		int *total ATTRIBUTE_UNUSED,
+		int *count ATTRIBUTE_UNUSED,
 		bool speed ATTRIBUTE_UNUSED)
 {
   /* For the time being, this is just a stub and we'll accept the
diff --git a/gcc/config/mn10300/mn10300.cc b/gcc/config/mn10300/mn10300.cc
index d56247afc08..ec1da26e6b2 100644
--- a/gcc/config/mn10300/mn10300.cc
+++ b/gcc/config/mn10300/mn10300.cc
@@ -2313,7 +2313,8 @@  mn10300_memory_move_cost (machine_mode mode ATTRIBUTE_UNUSED,
 
 static bool
 mn10300_rtx_costs (rtx x, machine_mode mode, int outer_code,
-		   int opno ATTRIBUTE_UNUSED, int *ptotal, bool speed)
+		   int opno ATTRIBUTE_UNUSED,
+		   int *ptotal, int *pcount, bool speed)
 {
   /* This value is used for SYMBOL_REF etc where we want to pretend
      we have a full 32-bit constant.  */
@@ -2321,6 +2322,8 @@  mn10300_rtx_costs (rtx x, machine_mode mode, int outer_code,
   int total;
   int code = GET_CODE (x);
 
+  *pcount = 0;
+
   switch (code)
     {
     case CONST_INT:
diff --git a/gcc/config/msp430/msp430.cc b/gcc/config/msp430/msp430.cc
index 85f499f175d..33cc67d6fcc 100644
--- a/gcc/config/msp430/msp430.cc
+++ b/gcc/config/msp430/msp430.cc
@@ -1510,6 +1510,7 @@  msp430_rtx_costs (rtx x,
 		  int	   outer_code ATTRIBUTE_UNUSED,
 		  int	   opno ATTRIBUTE_UNUSED,
 		  int *	   total,
+		  int *	   count,
 		  bool	   speed)
 {
   enum rtx_code code = GET_CODE (x);
@@ -1517,6 +1518,7 @@  msp430_rtx_costs (rtx x,
   rtx dst_inner, src_inner;
 
   *total = 0;
+  *count = 0;
   dst = XEXP (x, 0);
   if (GET_RTX_LENGTH (code) == 1)
     /* Some RTX that are single-op in GCC are double-op when translated to
diff --git a/gcc/config/nds32/nds32.cc b/gcc/config/nds32/nds32.cc
index 921102df51b..76296e6890e 100644
--- a/gcc/config/nds32/nds32.cc
+++ b/gcc/config/nds32/nds32.cc
@@ -3060,6 +3060,7 @@  nds32_rtx_costs (rtx x,
 		 int outer_code,
 		 int opno,
 		 int *total,
+		 int *count ATTRIBUTE_UNUSED,
 		 bool speed)
 {
   return nds32_rtx_costs_impl (x, mode, outer_code, opno, total, speed);
diff --git a/gcc/config/nios2/nios2.cc b/gcc/config/nios2/nios2.cc
index b435d7475f9..44c345ca3b7 100644
--- a/gcc/config/nios2/nios2.cc
+++ b/gcc/config/nios2/nios2.cc
@@ -1464,10 +1464,14 @@  static bool
 nios2_rtx_costs (rtx x, machine_mode mode,
 		 int outer_code,
 		 int opno,
-		 int *total, bool speed)
+		 int *total,
+		 int *count,
+		 bool speed)
 {
   int code = GET_CODE (x);
 
+  *count = 0;
+
   switch (code)
     {
       case CONST_INT:
diff --git a/gcc/config/or1k/or1k.cc b/gcc/config/or1k/or1k.cc
index 5eeed0e91be..76580f7e429 100644
--- a/gcc/config/or1k/or1k.cc
+++ b/gcc/config/or1k/or1k.cc
@@ -1593,8 +1593,10 @@  or1k_function_ok_for_sibcall (tree decl, tree /* exp */)
 
 static bool
 or1k_rtx_costs (rtx x, machine_mode mode, int outer_code, int /* opno */,
-		int *total, bool /* speed */)
+		int *total, int *count, bool /* speed */)
 {
+  *count = 0;
+
   switch (GET_CODE (x))
     {
     case CONST_INT:
diff --git a/gcc/config/pa/pa.cc b/gcc/config/pa/pa.cc
index 2ee987796f6..1737c2e7c8f 100644
--- a/gcc/config/pa/pa.cc
+++ b/gcc/config/pa/pa.cc
@@ -98,7 +98,7 @@  static void fix_range (const char *);
 static int hppa_register_move_cost (machine_mode mode, reg_class_t,
 				    reg_class_t);
 static int hppa_address_cost (rtx, machine_mode mode, addr_space_t, bool);
-static bool hppa_rtx_costs (rtx, machine_mode, int, int, int *, bool);
+static bool hppa_rtx_costs (rtx, machine_mode, int, int, int *, int *, bool);
 static inline rtx force_mode (machine_mode, rtx);
 static void pa_reorg (void);
 static void pa_combine_instructions (void);
@@ -1540,10 +1540,12 @@  hppa_rtx_costs_shadd_p (rtx x)
 static bool
 hppa_rtx_costs (rtx x, machine_mode mode, int outer_code,
 		int opno ATTRIBUTE_UNUSED,
-		int *total, bool speed)
+		int *total, int *count, bool speed)
 {
   int code = GET_CODE (x);
 
+  *count = 0;
+
   switch (code)
     {
     case CONST_INT:
diff --git a/gcc/config/pdp11/pdp11.cc b/gcc/config/pdp11/pdp11.cc
index 478297e4a58..6f28f19cf1f 100644
--- a/gcc/config/pdp11/pdp11.cc
+++ b/gcc/config/pdp11/pdp11.cc
@@ -150,7 +150,7 @@  decode_pdp11_d (const struct real_format *fmt ATTRIBUTE_UNUSED,
 
 static const char *singlemove_string (rtx *);
 static bool pdp11_assemble_integer (rtx, unsigned int, int);
-static bool pdp11_rtx_costs (rtx, machine_mode, int, int, int *, bool);
+static bool pdp11_rtx_costs (rtx, machine_mode, int, int, int *, int *, bool);
 static int pdp11_addr_cost (rtx, machine_mode, addr_space_t, bool);
 static int pdp11_insn_cost (rtx_insn *insn, bool speed);
 static rtx_insn *pdp11_md_asm_adjust (vec<rtx> &, vec<rtx> &,
@@ -973,13 +973,16 @@  pdp11_register_move_cost (machine_mode mode ATTRIBUTE_UNUSED,
    inevitably a rough approximation.  */
 static bool
 pdp11_rtx_costs (rtx x, machine_mode mode, int outer_code,
-		 int opno ATTRIBUTE_UNUSED, int *total, bool speed)
+		 int opno ATTRIBUTE_UNUSED,
+		 int *total, int *count, bool speed)
 {
   const int code = GET_CODE (x);
   const int asize = (mode == QImode) ? 2 : GET_MODE_SIZE (mode);
   rtx src, dest;
   const char *fmt;
   
+  *count = 0;
+
   switch (code)
     {
     case CONST_INT:
diff --git a/gcc/config/pru/pru.cc b/gcc/config/pru/pru.cc
index fd1924e38dc..539d037b479 100644
--- a/gcc/config/pru/pru.cc
+++ b/gcc/config/pru/pru.cc
@@ -635,10 +635,12 @@  pru_option_override (void)
 static bool
 pru_rtx_costs (rtx x, machine_mode mode,
 	       int outer_code, int opno ATTRIBUTE_UNUSED,
-	       int *total, bool speed ATTRIBUTE_UNUSED)
+	       int *total, int *count, bool speed ATTRIBUTE_UNUSED)
 {
   const int code = GET_CODE (x);
 
+  *count = 0;
+
   switch (code)
     {
     case CONST_INT:
diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 0d1cbc5cb5f..881a5e0c984 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -2931,8 +2931,10 @@  riscv_extend_cost (rtx op, bool unsigned_p)
 
 static bool
 riscv_rtx_costs (rtx x, machine_mode mode, int outer_code, int opno ATTRIBUTE_UNUSED,
-		 int *total, bool speed)
+		 int *total, int *count, bool speed)
 {
+  *count = 0;
+
   /* TODO: We set RVV instruction cost as 1 by default.
      Cost Model need to be well analyzed and supported in the future. */
   if (riscv_v_ext_mode_p (mode))
@@ -2951,7 +2953,10 @@  riscv_rtx_costs (rtx x, machine_mode mode, int outer_code, int opno ATTRIBUTE_UN
 	 then cost based on the SET_SRC alone.  */
       if (outer_code == INSN && REG_P (SET_DEST (x)))
 	{
-	  riscv_rtx_costs (SET_SRC (x), mode, outer_code, opno, total, speed);
+	  int cnt;
+	  riscv_rtx_costs (SET_SRC (x), mode, outer_code, opno,
+			   total, &cnt, speed);
+	  *count = cnt;
 	  return true;
 	}
 
diff --git a/gcc/config/rl78/rl78.cc b/gcc/config/rl78/rl78.cc
index f3507280859..66b8ccf0c65 100644
--- a/gcc/config/rl78/rl78.cc
+++ b/gcc/config/rl78/rl78.cc
@@ -4365,10 +4365,13 @@  rl78_rtx_costs (rtx          x,
 		int          outer_code ATTRIBUTE_UNUSED,
 		int          opno ATTRIBUTE_UNUSED,
 		int *        total,
+		int *	     count,
 		bool         speed ATTRIBUTE_UNUSED)
 {
   int code = GET_CODE (x);
 
+  *count = 0;
+
   if (code == IF_THEN_ELSE)
     {
       *total = COSTS_N_INSNS (10);
diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index 6b9a40fcc66..1f91c632622 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -22261,8 +22261,11 @@  rs6000_cannot_copy_insn_p (rtx_insn *insn)
 
 static bool
 rs6000_rtx_costs (rtx x, machine_mode mode, int outer_code,
-		  int opno ATTRIBUTE_UNUSED, int *total, bool speed)
+		  int opno ATTRIBUTE_UNUSED,
+		  int *total, int *count, bool speed)
 {
+  *count = 0;
+
   int code = GET_CODE (x);
 
   switch (code)
@@ -22632,18 +22635,19 @@  rs6000_rtx_costs (rtx x, machine_mode mode, int outer_code,
 
 static bool
 rs6000_debug_rtx_costs (rtx x, machine_mode mode, int outer_code,
-			int opno, int *total, bool speed)
+			int opno, int *total, int *count, bool speed)
 {
-  bool ret = rs6000_rtx_costs (x, mode, outer_code, opno, total, speed);
+  bool ret = rs6000_rtx_costs (x, mode, outer_code, opno, total, count, speed);
 
   fprintf (stderr,
 	   "\nrs6000_rtx_costs, return = %s, mode = %s, outer_code = %s, "
-	   "opno = %d, total = %d, speed = %s, x:\n",
+	   "opno = %d, total = %d, count = %d, speed = %s, x:\n",
 	   ret ? "complete" : "scan inner",
 	   GET_MODE_NAME (mode),
 	   GET_RTX_NAME (outer_code),
 	   opno,
 	   *total,
+	   *count,
 	   speed ? "true" : "false");
 
   debug_rtx (x);
diff --git a/gcc/config/rx/rx.cc b/gcc/config/rx/rx.cc
index 0754e286552..e42ee2286ca 100644
--- a/gcc/config/rx/rx.cc
+++ b/gcc/config/rx/rx.cc
@@ -3008,8 +3008,10 @@  rx_address_cost (rtx addr, machine_mode mode ATTRIBUTE_UNUSED,
 
 static bool
 rx_rtx_costs (rtx x, machine_mode mode, int outer_code ATTRIBUTE_UNUSED,
-	      int opno ATTRIBUTE_UNUSED, int* total, bool speed)
+	      int opno ATTRIBUTE_UNUSED, int* total, int* count, bool speed)
 {
+  *count = 0;
+
   if (x == const0_rtx)
     {
       *total = 0;
diff --git a/gcc/config/s390/s390.cc b/gcc/config/s390/s390.cc
index a5c36b43972..6f628c364d8 100644
--- a/gcc/config/s390/s390.cc
+++ b/gcc/config/s390/s390.cc
@@ -3768,8 +3768,10 @@  s390_memory_move_cost (machine_mode mode ATTRIBUTE_UNUSED,
 static bool
 s390_rtx_costs (rtx x, machine_mode mode, int outer_code,
 		int opno ATTRIBUTE_UNUSED,
-		int *total, bool speed ATTRIBUTE_UNUSED)
+		int *total, int *count, bool speed ATTRIBUTE_UNUSED)
 {
+  *count = 0;
+
   int code = GET_CODE (x);
   switch (code)
     {
diff --git a/gcc/config/sh/sh.cc b/gcc/config/sh/sh.cc
index 8c378b28b6d..896fb628f4e 100644
--- a/gcc/config/sh/sh.cc
+++ b/gcc/config/sh/sh.cc
@@ -258,7 +258,7 @@  static int multcosts (rtx);
 static bool unspec_caller_rtx_p (rtx);
 static bool sh_cannot_copy_insn_p (rtx_insn *);
 static bool sh_cannot_force_const_mem_p (machine_mode, rtx);
-static bool sh_rtx_costs (rtx, machine_mode, int, int, int *, bool);
+static bool sh_rtx_costs (rtx, machine_mode, int, int, int *, int *, bool);
 static int sh_address_cost (rtx, machine_mode, addr_space_t, bool);
 static int sh_pr_n_sets (void);
 static rtx sh_allocate_initial_value (rtx);
@@ -3227,10 +3227,12 @@  multcosts (rtx x ATTRIBUTE_UNUSED)
 static bool
 sh_rtx_costs (rtx x, machine_mode mode ATTRIBUTE_UNUSED, int outer_code,
 	      int opno ATTRIBUTE_UNUSED,
-	      int *total, bool speed ATTRIBUTE_UNUSED)
+	      int *total, int *count, bool speed ATTRIBUTE_UNUSED)
 {
   int code = GET_CODE (x);
 
+  *count = 0;
+
   switch (code)
     {
       /* The lower-subreg pass decides whether to split multi-word regs
diff --git a/gcc/config/sparc/sparc.cc b/gcc/config/sparc/sparc.cc
index c09dbcde7c5..229fd79b9d1 100644
--- a/gcc/config/sparc/sparc.cc
+++ b/gcc/config/sparc/sparc.cc
@@ -649,7 +649,7 @@  static rtx sparc_tls_get_addr (void);
 static rtx sparc_tls_got (void);
 static int sparc_register_move_cost (machine_mode,
 				     reg_class_t, reg_class_t);
-static bool sparc_rtx_costs (rtx, machine_mode, int, int, int *, bool);
+static bool sparc_rtx_costs (rtx, machine_mode, int, int, int *, int *, bool);
 static machine_mode sparc_promote_function_mode (const_tree, machine_mode,
 						      int *, const_tree, int);
 static bool sparc_strict_argument_naming (cumulative_args_t);
@@ -12055,11 +12055,13 @@  sparc_fold_builtin (tree fndecl, int n_args ATTRIBUTE_UNUSED,
 static bool
 sparc_rtx_costs (rtx x, machine_mode mode, int outer_code,
 		 int opno ATTRIBUTE_UNUSED,
-		 int *total, bool speed ATTRIBUTE_UNUSED)
+		 int *total, int *count, bool speed ATTRIBUTE_UNUSED)
 {
   int code = GET_CODE (x);
   bool float_mode_p = FLOAT_MODE_P (mode);
 
+  *count = 0;
+
   switch (code)
     {
     case CONST_INT:
diff --git a/gcc/config/stormy16/stormy16.cc b/gcc/config/stormy16/stormy16.cc
index 071043b4128..070eb28563b 100644
--- a/gcc/config/stormy16/stormy16.cc
+++ b/gcc/config/stormy16/stormy16.cc
@@ -74,10 +74,13 @@  static GTY(()) section *bss100_section;
 static bool
 xstormy16_rtx_costs (rtx x, machine_mode mode,
 		     int outer_code ATTRIBUTE_UNUSED,
-		     int opno ATTRIBUTE_UNUSED, int *total, bool speed_p)
+		     int opno ATTRIBUTE_UNUSED,
+		     int *total, int *count, bool speed_p)
 {
   rtx_code code = GET_CODE (x);
 
+  *count = 0;
+
   switch (code)
     {
     case CONST_INT:
diff --git a/gcc/config/v850/v850.cc b/gcc/config/v850/v850.cc
index 50c91c68b8b..a59dad267d0 100644
--- a/gcc/config/v850/v850.cc
+++ b/gcc/config/v850/v850.cc
@@ -319,10 +319,13 @@  const_costs (rtx r, enum rtx_code c)
 
 static bool
 v850_rtx_costs (rtx x, machine_mode mode, int outer_code,
-		int opno ATTRIBUTE_UNUSED, int *total, bool speed)
+		int opno ATTRIBUTE_UNUSED,
+		int *total, int *count, bool speed)
 {
   enum rtx_code code = GET_CODE (x);
 
+  *count = 0;
+
   switch (code)
     {
     case CONST_INT:
diff --git a/gcc/config/vax/vax.cc b/gcc/config/vax/vax.cc
index ccaf14b27d2..a6c970194e2 100644
--- a/gcc/config/vax/vax.cc
+++ b/gcc/config/vax/vax.cc
@@ -54,7 +54,7 @@  static void vax_output_mi_thunk (FILE *, tree, HOST_WIDE_INT,
 				 HOST_WIDE_INT, tree);
 static int vax_address_cost_1 (rtx);
 static int vax_address_cost (rtx, machine_mode, addr_space_t, bool);
-static bool vax_rtx_costs (rtx, machine_mode, int, int, int *, bool);
+static bool vax_rtx_costs (rtx, machine_mode, int, int, int *, int *, bool);
 static machine_mode vax_cc_modes_compatible (machine_mode, machine_mode);
 static rtx_insn *vax_md_asm_adjust (vec<rtx> &, vec<rtx> &,
 				    vec<machine_mode> &, vec<const char *> &,
@@ -783,12 +783,14 @@  vax_address_cost (rtx x, machine_mode mode ATTRIBUTE_UNUSED,
 static bool
 vax_rtx_costs (rtx x, machine_mode mode, int outer_code,
 	       int opno ATTRIBUTE_UNUSED,
-	       int *total, bool speed ATTRIBUTE_UNUSED)
+	       int *total, int *count, bool speed ATTRIBUTE_UNUSED)
 {
   enum rtx_code code = GET_CODE (x);
   int i = 0;				   /* may be modified in switch */
   const char *fmt = GET_RTX_FORMAT (code); /* may be modified in switch */
 
+  *count = 0;
+
   switch (code)
     {
       /* On a VAX, constants from 0..63 are cheap because they can use the
diff --git a/gcc/config/visium/visium.cc b/gcc/config/visium/visium.cc
index 0691ea2ad13..33749216b33 100644
--- a/gcc/config/visium/visium.cc
+++ b/gcc/config/visium/visium.cc
@@ -223,7 +223,7 @@  static int visium_register_move_cost (machine_mode, reg_class_t,
 
 static int visium_memory_move_cost (machine_mode, reg_class_t, bool);
 
-static bool visium_rtx_costs (rtx, machine_mode, int, int, int *, bool);
+static bool visium_rtx_costs (rtx, machine_mode, int, int, int *, int *, bool);
 
 static void visium_option_override (void);
 
@@ -2021,11 +2021,13 @@  visium_memory_move_cost (machine_mode mode,
 
 static bool
 visium_rtx_costs (rtx x, machine_mode mode, int outer_code ATTRIBUTE_UNUSED,
-		  int opno ATTRIBUTE_UNUSED, int *total,
+		  int opno ATTRIBUTE_UNUSED, int *total, int *count,
 		  bool speed ATTRIBUTE_UNUSED)
 {
   int code = GET_CODE (x);
 
+  *count = 0;
+
   switch (code)
     {
     case CONST_INT:
diff --git a/gcc/config/xtensa/xtensa.cc b/gcc/config/xtensa/xtensa.cc
index a4f8e3e49d0..0c7a6b6dbb0 100644
--- a/gcc/config/xtensa/xtensa.cc
+++ b/gcc/config/xtensa/xtensa.cc
@@ -129,7 +129,7 @@  static unsigned int xtensa_multibss_section_type_flags (tree, const char *,
 							int) ATTRIBUTE_UNUSED;
 static section *xtensa_select_rtx_section (machine_mode, rtx,
 					   unsigned HOST_WIDE_INT);
-static bool xtensa_rtx_costs (rtx, machine_mode, int, int, int *, bool);
+static bool xtensa_rtx_costs (rtx, machine_mode, int, int, int *, int *, bool);
 static int xtensa_insn_cost (rtx_insn *, bool);
 static int xtensa_register_move_cost (machine_mode, reg_class_t,
 				      reg_class_t);
@@ -4420,10 +4420,12 @@  xtensa_register_move_cost (machine_mode mode ATTRIBUTE_UNUSED,
 static bool
 xtensa_rtx_costs (rtx x, machine_mode mode, int outer_code,
 		  int opno ATTRIBUTE_UNUSED,
-		  int *total, bool speed)
+		  int *total, int *count, bool speed)
 {
   int code = GET_CODE (x);
 
+  *count = 0;
+
   switch (code)
     {
     case CONST_INT:
diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index 22a1bebc0f4..7b4c9845af8 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -7170,7 +7170,7 @@  optimizers should use optab @code{rintdf2}.
 The default hook returns true for all inputs.
 @end deftypefn
 
-@deftypefn {Target Hook} bool TARGET_RTX_COSTS (rtx @var{x}, machine_mode @var{mode}, int @var{outer_code}, int @var{opno}, int *@var{total}, bool @var{speed})
+@deftypefn {Target Hook} bool TARGET_RTX_COSTS (rtx @var{x}, machine_mode @var{mode}, int @var{outer_code}, int @var{opno}, int *@var{total}, int *@var{count}, bool @var{speed})
 This target hook describes the relative costs of RTL expressions.
 
 The cost may depend on the precise form of the expression, which is
@@ -7194,6 +7194,11 @@  necessary.  Traditionally, the default costs are @code{COSTS_N_INSNS (5)}
 for multiplications, @code{COSTS_N_INSNS (7)} for division and modulus
 operations, and @code{COSTS_N_INSNS (1)} for all other operations.
 
+On entry to the hook, @code{*@var{count}} contains a default estimate
+for the instruction count of the expression.  The hook should modify this
+value as necessary.  If 0, @code{*@var{total} / COSTS_N_INSNS (1)}
+should be assumed.
+
 When optimizing for code size, i.e.@: when @code{speed} is
 false, this target hook should be used to estimate the relative
 size cost of an expression, again relative to @code{COSTS_N_INSNS}.
diff --git a/gcc/hooks.cc b/gcc/hooks.cc
index fe59bbd7d26..0edd607523e 100644
--- a/gcc/hooks.cc
+++ b/gcc/hooks.cc
@@ -368,6 +368,13 @@  hook_bool_rtx_mode_int_int_intp_bool_false (rtx, machine_mode, int, int,
   return false;
 }
 
+bool
+hook_bool_rtx_mode_int_int_intp_intp_bool_false (rtx, machine_mode, int, int,
+						 int *, int *, bool)
+{
+  return false;
+}
+
 bool
 hook_bool_wint_wint_uint_bool_true (const widest_int &, const widest_int &,
 				    unsigned int, bool)
diff --git a/gcc/hooks.h b/gcc/hooks.h
index 3a02b6c8ac5..cea0de7d22e 100644
--- a/gcc/hooks.h
+++ b/gcc/hooks.h
@@ -69,6 +69,11 @@  extern bool hook_bool_mode_reg_class_t_reg_class_t_false (machine_mode,
 							  reg_class_t);
 extern bool hook_bool_rtx_mode_int_int_intp_bool_false (rtx, machine_mode,
 							int, int, int *, bool);
+extern bool hook_bool_rtx_mode_int_int_intp_intp_bool_false (rtx,
+							     machine_mode,
+							     int, int,
+							     int *, int *,
+							     bool);
 extern bool hook_bool_tree_tree_false (tree, tree);
 extern bool hook_bool_tree_tree_true (tree, tree);
 extern bool hook_bool_tree_bool_false (tree, bool);
diff --git a/gcc/rtl.h b/gcc/rtl.h
index e4b6cc0dbb5..cdae6212852 100644
--- a/gcc/rtl.h
+++ b/gcc/rtl.h
@@ -2079,6 +2079,8 @@  struct full_rtx_costs
 {
   int speed;
   int size;
+  int speed_count;
+  int size_count;
 };
 
 /* Initialize a full_rtx_costs structure C to the maximum cost.  */
@@ -2095,6 +2097,8 @@  init_costs_to_zero (struct full_rtx_costs *c)
 {
   c->speed = 0;
   c->size = 0;
+  c->speed_count = 0;
+  c->size_count = 0;
 }
 
 /* Compare two full_rtx_costs structures A and B, returning true
@@ -2112,12 +2116,23 @@  costs_lt_p (struct full_rtx_costs *a, struct full_rtx_costs *b,
 }
 
 /* Increase both members of the full_rtx_costs structure C by the
-   cost of N insns.  */
+   cost of N insns.  If that cost comes from a single expensive insn,
+   set EXPENSIVE to true so that the counts grow by 1 rather than N.  */
 inline void
-costs_add_n_insns (struct full_rtx_costs *c, int n)
+costs_add_n_insns (struct full_rtx_costs *c, int n, bool expensive = false)
 {
   c->speed += COSTS_N_INSNS (n);
   c->size += COSTS_N_INSNS (n);
+  if (expensive)
+    {
+      c->speed_count += 1;
+      c->size_count += 1;
+    }
+  else
+    {
+      c->speed_count += n;
+      c->size_count += n;
+    }
 }
 
 /* Describes the shape of a subreg:
@@ -2424,6 +2439,8 @@  poly_int_rtx_p (const_rtx x, poly_int64 *res)
 
 extern void init_rtlanal (void);
 extern int rtx_cost (rtx, machine_mode, enum rtx_code, int, bool);
+extern int rtx_cost_and_count (rtx, machine_mode, enum rtx_code,
+			       int, bool, int *);
 extern int address_cost (rtx, machine_mode, addr_space_t, bool);
 extern void get_full_rtx_cost (rtx, machine_mode, enum rtx_code, int,
 			       struct full_rtx_costs *);
diff --git a/gcc/rtlanal.cc b/gcc/rtlanal.cc
index 87267ee3b88..adf5acd43c4 100644
--- a/gcc/rtlanal.cc
+++ b/gcc/rtlanal.cc
@@ -4520,13 +4520,25 @@  label_is_jump_target_p (const_rtx label, const rtx_insn *jump_insn)
 int
 rtx_cost (rtx x, machine_mode mode, enum rtx_code outer_code,
 	  int opno, bool speed)
+{
+  int count;
+  return rtx_cost_and_count (x, mode, outer_code, opno, speed, &count);
+}
+
+int
+rtx_cost_and_count (rtx x, machine_mode mode, enum rtx_code outer_code,
+		    int opno, bool speed, int *insn_count)
 {
   int i, j;
   enum rtx_code code;
   const char *fmt;
-  int total;
+  int total, cost;
   int factor;
   unsigned mode_size;
+  int sub_count;
+  bool finish;
+
+  *insn_count = 0;
 
   if (x == 0)
     return 0;
@@ -4560,6 +4572,7 @@  rtx_cost (rtx x, machine_mode mode, enum rtx_code outer_code,
 	 number of units (translated from digits) when using
 	 schoolbook long multiplication.  */
       total = factor * factor * COSTS_N_INSNS (5);
+      (*insn_count)++;
       break;
     case DIV:
     case UDIV:
@@ -4569,38 +4582,53 @@  rtx_cost (rtx x, machine_mode mode, enum rtx_code outer_code,
     case US_DIV:
       /* Similarly, complexity for schoolbook long division.  */
       total = factor * factor * COSTS_N_INSNS (7);
+      (*insn_count)++;
       break;
     case USE:
       /* Used in combine.cc as a marker.  */
       total = 0;
+      *insn_count = 0;
       break;
     default:
       total = factor * COSTS_N_INSNS (1);
+      (*insn_count)++;
     }
 
   switch (code)
     {
     case REG:
+      *insn_count = 0;
       return 0;
 
     case SUBREG:
+      *insn_count = 0;
       total = 0;
       /* If we can't tie these modes, make this expensive.  The larger
 	 the mode, the more expensive it is.  */
       if (!targetm.modes_tieable_p (mode, GET_MODE (SUBREG_REG (x))))
-	return COSTS_N_INSNS (2 + factor);
+	{
+	  (*insn_count)++;
+	  return COSTS_N_INSNS (2 + factor);
+	}
       break;
 
     case TRUNCATE:
       if (targetm.modes_tieable_p (mode, GET_MODE (XEXP (x, 0))))
 	{
+	  *insn_count = 0;
 	  total = 0;
 	  break;
 	}
       /* FALLTHRU */
     default:
-      if (targetm.rtx_costs (x, mode, outer_code, opno, &total, speed))
-	return total;
+      sub_count = *insn_count;
+      finish = targetm.rtx_costs (x, mode, outer_code, opno,
+				  &total, &sub_count, speed);
+      *insn_count = sub_count ? sub_count : total / COSTS_N_INSNS (1);
+      if (finish)
+	{
+	  return total;
+	}
       break;
     }
 
@@ -4610,10 +4638,22 @@  rtx_cost (rtx x, machine_mode mode, enum rtx_code outer_code,
   fmt = GET_RTX_FORMAT (code);
   for (i = GET_RTX_LENGTH (code) - 1; i >= 0; i--)
     if (fmt[i] == 'e')
-      total += rtx_cost (XEXP (x, i), mode, code, i, speed);
+      {
+	cost = rtx_cost_and_count (XEXP (x, i), mode,
+				   code, i, speed, &sub_count);
+	total += cost;
+	sub_count = sub_count ? sub_count : (cost / COSTS_N_INSNS (1));
+	(*insn_count) += sub_count;
+      }
     else if (fmt[i] == 'E')
       for (j = 0; j < XVECLEN (x, i); j++)
-	total += rtx_cost (XVECEXP (x, i, j), mode, code, i, speed);
+	{
+	  cost = rtx_cost_and_count (XVECEXP (x, i, j),
+				     mode, code, i, speed, &sub_count);
+	  total += cost;
+	  sub_count = sub_count ? sub_count : (cost / COSTS_N_INSNS (1));
+	  (*insn_count) += sub_count;
+	}
 
   return total;
 }
@@ -4625,8 +4665,10 @@  void
 get_full_rtx_cost (rtx x, machine_mode mode, enum rtx_code outer, int opno,
 		   struct full_rtx_costs *c)
 {
-  c->speed = rtx_cost (x, mode, outer, opno, true);
-  c->size = rtx_cost (x, mode, outer, opno, false);
+  c->speed = rtx_cost_and_count (x, mode, outer, opno,
+				 true, &c->speed_count);
+  c->size = rtx_cost_and_count (x, mode, outer, opno,
+				false, &c->size_count);
 }
 
 
diff --git a/gcc/target.def b/gcc/target.def
index 0509e07d6b8..3b0f88f41f6 100644
--- a/gcc/target.def
+++ b/gcc/target.def
@@ -3897,14 +3897,21 @@  necessary.  Traditionally, the default costs are @code{COSTS_N_INSNS (5)}\n\
 for multiplications, @code{COSTS_N_INSNS (7)} for division and modulus\n\
 operations, and @code{COSTS_N_INSNS (1)} for all other operations.\n\
 \n\
+On entry to the hook, @code{*@var{count}} contains a default estimate\n\
+for the instruction count of the expression.  The hook should modify this\n\
+value as necessary.  If 0, @code{*@var{total} / COSTS_N_INSNS (1)}\n\
+should be assumed.\n\
+\n\
 When optimizing for code size, i.e.@: when @code{speed} is\n\
 false, this target hook should be used to estimate the relative\n\
 size cost of an expression, again relative to @code{COSTS_N_INSNS}.\n\
 \n\
 The hook returns true when all subexpressions of @var{x} have been\n\
 processed, and false when @code{rtx_cost} should recurse.",
- bool, (rtx x, machine_mode mode, int outer_code, int opno, int *total, bool speed),
- hook_bool_rtx_mode_int_int_intp_bool_false)
+ bool,
+ (rtx x, machine_mode mode, int outer_code, int opno,
+  int *total, int *count, bool speed),
+ hook_bool_rtx_mode_int_int_intp_intp_bool_false)
 
 /* Compute the cost of X, used as an address.  Never called with
    invalid addresses.  */
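To make the intended accounting concrete, here is a minimal standalone C++ sketch. It is not GCC code and not part of the patch. It takes only two things from GCC: the rtl.h definition of COSTS_N_INSNS (N) as (N) * 4, and the arithmetic of the patched costs_add_n_insns. The types full_costs and node and the functions effective_count and toy_cost_and_count are hypothetical stand-ins for full_rtx_costs, rtx and rtx_cost_and_count. The toy recursion applies the same fallback as the patch, where a reported count of 0 is replaced by cost / COSTS_N_INSNS (1), but it ignores the default estimate that the real code passes to the hook on entry.

```cpp
#include <cassert>
#include <vector>

// Same definition as GCC's rtl.h: one "typical" insn costs 4 units.
#define COSTS_N_INSNS(N) ((N) * 4)

// Hypothetical stand-in for the patched struct full_rtx_costs.
struct full_costs
{
  int speed = 0;
  int size = 0;
  int speed_count = 0;
  int size_count = 0;
};

// Same arithmetic as the patched costs_add_n_insns: both costs grow by
// COSTS_N_INSNS (N); both counts grow by N, or by 1 if EXPENSIVE.
inline void
costs_add_n_insns (full_costs *c, int n, bool expensive = false)
{
  assert (n >= 0);
  c->speed += COSTS_N_INSNS (n);
  c->size += COSTS_N_INSNS (n);
  const int insns = expensive ? 1 : n;
  c->speed_count += insns;
  c->size_count += insns;
}

// Hypothetical toy expression node standing in for an rtx.  COST and
// COUNT are what a target hook would report for this node alone.
struct node
{
  int cost;
  int count;              // 0 means "no estimate"
  std::vector<node> ops;  // operands
};

// Fallback rule from the patch: a count of 0 is derived from the cost.
static int
effective_count (int cost, int count)
{
  return count ? count : cost / COSTS_N_INSNS (1);
}

// Toy version of the recursion in rtx_cost_and_count: sum the cost of
// the node and its operands, and sum their effective insn counts.
static int
toy_cost_and_count (const node &x, int *insn_count)
{
  int total = x.cost;
  *insn_count = effective_count (x.cost, x.count);
  for (const node &op : x.ops)
    {
      int sub_count;
      int cost = toy_cost_and_count (op, &sub_count);
      total += cost;
      *insn_count += effective_count (cost, sub_count);
    }
  return total;
}
```

For example, a multiply costed at COSTS_N_INSNS (5) but reported as one instruction, with two free register operands, gives a total of 20 and a count of 1, whereas total / COSTS_N_INSNS (1) alone would suggest 5 insns. Wrapping it in a plus costed at COSTS_N_INSNS (1) whose hook leaves the count at 0 gives a total of 24 and a count of 2. Likewise, costs_add_n_insns (&c, 5, true) adds 20 to both costs but only 1 to both counts.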