RISC-V: Add RTX costs for `if_then_else' expressions

Message ID alpine.DEB.2.20.2203031440390.11552@tpp.orcam.me.uk
State New, archived
Headers
Series RISC-V: Add RTX costs for `if_then_else' expressions |

Commit Message

Maciej W. Rozycki July 18, 2022, 4:48 p.m. UTC
  Fix a performance regression from commit 391500af1932 ("Do not ignore 
costs of jump insns in combine."), a part of the m68k series for MODE_CC 
conversion (<https://gcc.gnu.org/ml/gcc-patches/2019-11/msg01028.html>), 
observed in soft-fp code in libgcc used by some of the embench-iot 
benchmarks.

The immediate origin of the regression is the middle end, which in the 
absence of cost information from the backend estimates the cost of an 
RTL expression by assuming a single machine instruction for each of the 
expression's subexpression.

So for `if_then_else', which takes 3 operands, the estimated cost is 3 
instructions (i.e. 12 units) even though a branch instruction evaluates 
it in a single machine cycle (ignoring the cost of actually taking the 
branch of course, which is handled elsewhere).  Consequently an insn 
sequence like:

(insn 595 594 596 43 (set (reg:DI 305)
        (lshiftrt:DI (reg/v:DI 160 [ R_f ])
            (const_int 55 [0x37]))) ".../libgcc/soft-fp/adddf3.c":46:3 216 {lshrdi3}
     (nil))
(insn 596 595 597 43 (set (reg:DI 304)
        (and:DI (reg:DI 305)
            (const_int 1 [0x1]))) ".../libgcc/soft-fp/adddf3.c":46:3 109 {anddi3}
     (expr_list:REG_DEAD (reg:DI 305)
        (nil)))
(jump_insn 597 596 598 43 (set (pc)
        (if_then_else (eq (reg:DI 304)
                (const_int 0 [0]))
            (label_ref:DI 1644)
            (pc))) ".../libgcc/soft-fp/adddf3.c":46:3 237 {*branchdi}
     (expr_list:REG_DEAD (reg:DI 304)
        (int_list:REG_BR_PROB 536870916 (nil)))
 -> 1644)

does not (anymore, as from the commit referred) get combined into:

(note 595 594 596 43 NOTE_INSN_DELETED)
(note 596 595 597 43 NOTE_INSN_DELETED)
(jump_insn 597 596 598 43 (parallel [
            (set (pc)
                (if_then_else (eq (zero_extract:DI (reg/v:DI 160 [ R_f ])
                            (const_int 1 [0x1])
                            (const_int 55 [0x37]))
                        (const_int 0 [0]))
                    (label_ref:DI 1644)
                    (pc)))
            (clobber (scratch:DI))
        ]) ".../libgcc/soft-fp/adddf3.c":46:3 243 {*branch_on_bitdi}
     (int_list:REG_BR_PROB 536870916 (nil))
 -> 1644)

This is because the new cost is incorrectly calculated as 28 units while 
the cost of the original 3 instructions was 24:

rejecting combination of insns 595, 596 and 597
original costs 4 + 4 + 16 = 24
replacement cost 28

Before the commit referred the cost of jump instruction was ignored and 
considered 0 (i.e. unknown) and a sequence of instructions of a known 
cost used to win:

allowing combination of insns 595, 596 and 597
original costs 4 + 4 + 0 = 0
replacement cost 28

Add the missing costs for the 3 variants of `if_then_else' expressions 
we currently define in the backend.

With the fix in place the cost of this particular `if_then_else' pattern 
is 2 instructions or 8 units (because of the shift operation) and 
therefore the ultimate cost of the original 3 RTL insns will work out at 
16 units (4 + 4 + 8), however the replacement single RTL insn will cost 
8 units only.

	gcc/
	* config/riscv/riscv.cc (riscv_rtx_costs) <IF_THEN_ELSE>: New 
	case.
---
 gcc/config/riscv/riscv.cc |   27 +++++++++++++++++++++++++++
 1 file changed, 27 insertions(+)

gcc-riscv-rtx-costs-if-then-else.diff
  

Comments

Li, Pan2 via Gcc-patches July 21, 2022, 7:25 a.m. UTC | #1
Hi Maciej:

LGTM, thanks for modeling this in cost model!

On Tue, Jul 19, 2022 at 12:48 AM Maciej W. Rozycki <macro@embecosm.com> wrote:
>
> Fix a performance regression from commit 391500af1932 ("Do not ignore
> costs of jump insns in combine."), a part of the m68k series for MODE_CC
> conversion (<https://gcc.gnu.org/ml/gcc-patches/2019-11/msg01028.html>),
> observed in soft-fp code in libgcc used by some of the embench-iot
> benchmarks.
>
> The immediate origin of the regression is the middle end, which in the
> absence of cost information from the backend estimates the cost of an
> RTL expression by assuming a single machine instruction for each of the
> expression's subexpression.
>
> So for `if_then_else', which takes 3 operands, the estimated cost is 3
> instructions (i.e. 12 units) even though a branch instruction evaluates
> it in a single machine cycle (ignoring the cost of actually taking the
> branch of course, which is handled elsewhere).  Consequently an insn
> sequence like:
>
> (insn 595 594 596 43 (set (reg:DI 305)
>         (lshiftrt:DI (reg/v:DI 160 [ R_f ])
>             (const_int 55 [0x37]))) ".../libgcc/soft-fp/adddf3.c":46:3 216 {lshrdi3}
>      (nil))
> (insn 596 595 597 43 (set (reg:DI 304)
>         (and:DI (reg:DI 305)
>             (const_int 1 [0x1]))) ".../libgcc/soft-fp/adddf3.c":46:3 109 {anddi3}
>      (expr_list:REG_DEAD (reg:DI 305)
>         (nil)))
> (jump_insn 597 596 598 43 (set (pc)
>         (if_then_else (eq (reg:DI 304)
>                 (const_int 0 [0]))
>             (label_ref:DI 1644)
>             (pc))) ".../libgcc/soft-fp/adddf3.c":46:3 237 {*branchdi}
>      (expr_list:REG_DEAD (reg:DI 304)
>         (int_list:REG_BR_PROB 536870916 (nil)))
>  -> 1644)
>
> does not (anymore, as from the commit referred) get combined into:
>
> (note 595 594 596 43 NOTE_INSN_DELETED)
> (note 596 595 597 43 NOTE_INSN_DELETED)
> (jump_insn 597 596 598 43 (parallel [
>             (set (pc)
>                 (if_then_else (eq (zero_extract:DI (reg/v:DI 160 [ R_f ])
>                             (const_int 1 [0x1])
>                             (const_int 55 [0x37]))
>                         (const_int 0 [0]))
>                     (label_ref:DI 1644)
>                     (pc)))
>             (clobber (scratch:DI))
>         ]) ".../libgcc/soft-fp/adddf3.c":46:3 243 {*branch_on_bitdi}
>      (int_list:REG_BR_PROB 536870916 (nil))
>  -> 1644)
>
> This is because the new cost is incorrectly calculated as 28 units while
> the cost of the original 3 instructions was 24:
>
> rejecting combination of insns 595, 596 and 597
> original costs 4 + 4 + 16 = 24
> replacement cost 28
>
> Before the commit referred the cost of jump instruction was ignored and
> considered 0 (i.e. unknown) and a sequence of instructions of a known
> cost used to win:
>
> allowing combination of insns 595, 596 and 597
> original costs 4 + 4 + 0 = 0
> replacement cost 28
>
> Add the missing costs for the 3 variants of `if_then_else' expressions
> we currently define in the backend.
>
> With the fix in place the cost of this particular `if_then_else' pattern
> is 2 instructions or 8 units (because of the shift operation) and
> therefore the ultimate cost of the original 3 RTL insns will work out at
> 16 units (4 + 4 + 8), however the replacement single RTL insn will cost
> 8 units only.
>
>         gcc/
>         * config/riscv/riscv.cc (riscv_rtx_costs) <IF_THEN_ELSE>: New
>         case.
> ---
>  gcc/config/riscv/riscv.cc |   27 +++++++++++++++++++++++++++
>  1 file changed, 27 insertions(+)
>
> gcc-riscv-rtx-costs-if-then-else.diff
> Index: gcc/gcc/config/riscv/riscv.cc
> ===================================================================
> --- gcc.orig/gcc/config/riscv/riscv.cc
> +++ gcc/gcc/config/riscv/riscv.cc
> @@ -1853,6 +1853,33 @@ riscv_rtx_costs (rtx x, machine_mode mod
>        /* Otherwise use the default handling.  */
>        return false;
>
> +    case IF_THEN_ELSE:
> +      if (TARGET_SFB_ALU
> +         && register_operand (XEXP (x, 1), mode)
> +         && sfb_alu_operand (XEXP (x, 2), mode)
> +         && comparison_operator (XEXP (x, 0), VOIDmode))
> +       {
> +         /* For predicated conditional-move operations we assume the cost
> +            of a single instruction even though there are actually two.  */
> +         *total = COSTS_N_INSNS (1);
> +         return true;
> +       }
> +      else if (LABEL_REF_P (XEXP (x, 1)) && XEXP (x, 2) == pc_rtx)
> +       {
> +         if (equality_operator (XEXP (x, 0), mode)
> +             && GET_CODE (XEXP (XEXP (x, 0), 0)) == ZERO_EXTRACT)
> +           {
> +             *total = COSTS_N_INSNS (SINGLE_SHIFT_COST + 1);
> +             return true;
> +           }
> +         if (order_operator (XEXP (x, 0), mode))
> +           {
> +             *total = COSTS_N_INSNS (1);
> +             return true;
> +           }
> +       }
> +      return false;
> +
>      case NOT:
>        *total = COSTS_N_INSNS (GET_MODE_SIZE (mode) > UNITS_PER_WORD ? 2 : 1);
>        return false;
  
Maciej W. Rozycki July 27, 2022, 10:27 a.m. UTC | #2
On Thu, 21 Jul 2022, Kito Cheng wrote:

> LGTM, thanks for modeling this in cost model!

 Patch applied now, thank you for your review.

  Maciej
  

Patch

Index: gcc/gcc/config/riscv/riscv.cc
===================================================================
--- gcc.orig/gcc/config/riscv/riscv.cc
+++ gcc/gcc/config/riscv/riscv.cc
@@ -1853,6 +1853,33 @@  riscv_rtx_costs (rtx x, machine_mode mod
       /* Otherwise use the default handling.  */
       return false;
 
+    case IF_THEN_ELSE:
+      if (TARGET_SFB_ALU
+	  && register_operand (XEXP (x, 1), mode)
+	  && sfb_alu_operand (XEXP (x, 2), mode)
+	  && comparison_operator (XEXP (x, 0), VOIDmode))
+	{
+	  /* For predicated conditional-move operations we assume the cost
+	     of a single instruction even though there are actually two.  */
+	  *total = COSTS_N_INSNS (1);
+	  return true;
+	}
+      else if (LABEL_REF_P (XEXP (x, 1)) && XEXP (x, 2) == pc_rtx)
+	{
+	  if (equality_operator (XEXP (x, 0), mode)
+	      && GET_CODE (XEXP (XEXP (x, 0), 0)) == ZERO_EXTRACT)
+	    {
+	      *total = COSTS_N_INSNS (SINGLE_SHIFT_COST + 1);
+	      return true;
+	    }
+	  if (order_operator (XEXP (x, 0), mode))
+	    {
+	      *total = COSTS_N_INSNS (1);
+	      return true;
+	    }
+	}
+      return false;
+
     case NOT:
       *total = COSTS_N_INSNS (GET_MODE_SIZE (mode) > UNITS_PER_WORD ? 2 : 1);
       return false;