[v2] RISC-V: costs: support shift-and-add in strength-reduction

Message ID 20221110213403.3592364-1-philipp.tomsich@vrull.eu
State Accepted

Checks

Context                  Check    Description
snail/gcc-patch-check    success  Github commit url

Commit Message

Philipp Tomsich Nov. 10, 2022, 9:34 p.m. UTC
The strength-reduction implementation in expmed.cc assesses the
profitability of using shift-and-add with an RTL expression that wraps
a MULT (with a power-of-2) in a PLUS.  Unless the RISC-V rtx_costs
function recognizes this as expressing a sh[123]add instruction, we
will return an inflated cost, thus defeating the optimization.

This change adds the necessary idiom recognition to provide an
accurate cost for this form of expressing sh[123]add.

Instead of expanding to
	li	a5,200
	mulw	a0,a5,a0
with this change, the expression 'a * 200' is synthesized as:
	sh2add	a0,a0,a0   // *5 = a + 4 * a
	sh2add	a0,a0,a0   // *5 = a + 4 * a
	slli	a0,a0,3    // *8

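The arithmetic behind that sequence can be checked with a small C model.
This is only a sketch: sh2add() below is a hypothetical helper that mimics
the Zba semantics (rd = rs2 + (rs1 << 2)), and mul200() and check_mul200()
are illustrative names, not part of the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the Zba instruction "sh2add rd, rs1, rs2":
   rd = rs2 + (rs1 << 2).  */
static uint64_t
sh2add (uint64_t rs1, uint64_t rs2)
{
  return rs2 + (rs1 << 2);
}

/* a * 200 using the three-instruction sequence: 200 = 5 * 5 * 8.  */
static uint64_t
mul200 (uint64_t a)
{
  a = sh2add (a, a);	/* a * 5  = a + 4 * a */
  a = sh2add (a, a);	/* a * 25 = (a * 5) + 4 * (a * 5) */
  return a << 3;	/* a * 200 = (a * 25) * 8 */
}

/* Compare the sequence against a plain multiplication.  Unsigned
   arithmetic wraps modulo 2^64 in both forms, so they also agree
   on overflow.  */
void
check_mul200 (void)
{
  for (uint64_t a = 0; a < 1000; a++)
    assert (mul200 (a) == a * 200);
  assert (mul200 (UINT64_MAX) == UINT64_MAX * 200);
}
```

Calling check_mul200() from a test driver exercises the sequence for a
range of inputs.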
gcc/ChangeLog:

	* config/riscv/riscv.cc (riscv_rtx_costs): Recognize shNadd,
	if expressed as a plus and multiplication with a power-of-2.
	Split costing for MINUS from PLUS.

gcc/testsuite/ChangeLog:

	* gcc.target/riscv/zba-shNadd-07.c: New test.

Signed-off-by: Philipp Tomsich <philipp.tomsich@vrull.eu>
---

Changes in v2:
- Split rtx_costs calculation for MINUS from PLUS to ensure that
  (minus reg (ashift reg SHAMT)) is not mistaken for a shNadd
- Add testcase
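
To illustrate why PLUS and MINUS need separate costing, here is a
minimal, self-contained sketch of the idiom recognition on a toy
expression tree.  Everything in it is an assumption made for
illustration: struct expr, enum expr_code, log2_exact(), is_shnadd(),
cost() and the MULT cost of 4 are hypothetical stand-ins, not GCC's rtx
representation or the actual riscv_rtx_costs logic.  The sketch tests
the shift amount (log2 of the multiplier) directly, since sh[123]add
only covers shifts of 1 to 3.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for rtx codes and rtx nodes.  */
enum expr_code { REG, CONST_INT, PLUS, MINUS, MULT };

struct expr
{
  enum expr_code code;
  long value;			/* Used by CONST_INT only.  */
  const struct expr *op0, *op1;	/* Used by binary operators only.  */
};

/* Return log2 (v) if v is a power of two, otherwise -1.  */
static int
log2_exact (long v)
{
  if (v <= 0 || (v & (v - 1)) != 0)
    return -1;
  int n = 0;
  while (v > 1)
    {
      v >>= 1;
      n++;
    }
  return n;
}

/* True if x is (plus (mult reg 2|4|8) ...), i.e. one shNadd.
   A MINUS with the same operands is deliberately rejected.  */
static bool
is_shnadd (const struct expr *x)
{
  if (x->code != PLUS || x->op0->code != MULT)
    return false;
  const struct expr *m = x->op0;
  if (m->op0->code != REG || m->op1->code != CONST_INT)
    return false;
  int shamt = log2_exact (m->op1->value);
  return shamt >= 1 && shamt <= 3;
}

/* Crude cost model: 1 for a recognized shNadd, otherwise 4 per MULT
   and 1 per PLUS/MINUS, plus the cost of the operands.  */
static int
cost (const struct expr *x)
{
  if (x->code == REG || x->code == CONST_INT)
    return 0;
  if (is_shnadd (x))
    return 1;
  int self = (x->code == MULT) ? 4 : 1;
  return self + cost (x->op0) + cost (x->op1);
}

void
check_costs (void)
{
  struct expr r = { REG, 0, NULL, NULL };
  struct expr c4 = { CONST_INT, 4, NULL, NULL };
  struct expr c16 = { CONST_INT, 16, NULL, NULL };
  struct expr m4 = { MULT, 0, &r, &c4 };
  struct expr m16 = { MULT, 0, &r, &c16 };
  struct expr plus4 = { PLUS, 0, &m4, &r };
  struct expr minus4 = { MINUS, 0, &m4, &r };
  struct expr plus16 = { PLUS, 0, &m16, &r };

  assert (cost (&plus4) == 1);	/* sh2add */
  assert (cost (&minus4) == 5);	/* not a shNadd: MINUS */
  assert (cost (&plus16) == 5);	/* not a shNadd: shift by 4 */
}
```

In this model only the PLUS form with a multiplier of 2, 4 or 8
collapses to a single instruction; the MINUS form keeps its full cost.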

 gcc/config/riscv/riscv.cc                     | 19 ++++++++++++
 .../gcc.target/riscv/zba-shNadd-07.c          | 31 +++++++++++++++++++
 2 files changed, 50 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/zba-shNadd-07.c
  

Comments

Jeff Law Nov. 13, 2022, 1:23 a.m. UTC | #1
On 11/10/22 14:34, Philipp Tomsich wrote:
> The strength-reduction implementation in expmed.cc assesses the
> profitability of using shift-and-add with an RTL expression that wraps
> a MULT (with a power-of-2) in a PLUS.  Unless the RISC-V rtx_costs
> function recognizes this as expressing a sh[123]add instruction, we
> will return an inflated cost, thus defeating the optimization.
>
> This change adds the necessary idiom recognition to provide an
> accurate cost for this form of expressing sh[123]add.
>
> Instead of expanding to
> 	li	a5,200
> 	mulw	a0,a5,a0
> with this change, the expression 'a * 200' is synthesized as:
> 	sh2add	a0,a0,a0   // *5 = a + 4 * a
> 	sh2add	a0,a0,a0   // *5 = a + 4 * a
> 	slli	a0,a0,3    // *8
>
> gcc/ChangeLog:
>
> 	* config/riscv/riscv.cc (riscv_rtx_costs): Recognize shNadd,
> 	if expressed as a plus and multiplication with a power-of-2.
> 	Split costing for MINUS from PLUS.
>
> gcc/testsuite/ChangeLog:
>
> 	* gcc.target/riscv/zba-shNadd-07.c: New test.

OK.  Note that getting this right can notably impact one of the SPEC2017
integer benchmarks.  I don't recall which one, but it has a div and a
mod by the same constant, which can reasonably be implemented with
shifts and adds.  You won't see it in instruction count data, but you would
see it if you had cycle count data or instrumented for div/mod instructions.


Jeff
  
Philipp Tomsich Nov. 13, 2022, 3:40 p.m. UTC | #2
Applied to master. Thanks!

Note that the multiply-by-200 (in the testcase) originates from Dhrystone.

Philipp.


  

Patch

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 3e2dc8192e4..2a94482b8ed 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -2428,6 +2428,12 @@  riscv_rtx_costs (rtx x, machine_mode mode, int outer_code, int opno ATTRIBUTE_UN
       return false;
 
     case MINUS:
+      if (float_mode_p)
+	*total = tune_param->fp_add[mode == DFmode];
+      else
+	*total = riscv_binary_cost (x, 1, 4);
+      return false;
+
     case PLUS:
       /* add.uw pattern for zba.  */
       if (TARGET_ZBA
@@ -2451,6 +2457,19 @@  riscv_rtx_costs (rtx x, machine_mode mode, int outer_code, int opno ATTRIBUTE_UN
 	  *total = COSTS_N_INSNS (1);
 	  return true;
 	}
+      /* Before strength-reduction, the shNadd can be expressed as the addition
+	 of a multiplication with a power-of-two.  If this case is not handled,
+	 the strength-reduction in expmed.c will calculate an inflated cost. */
+      if (TARGET_ZBA
+	  && mode == word_mode
+	  && GET_CODE (XEXP (x, 0)) == MULT
+	  && REG_P (XEXP (XEXP (x, 0), 0))
+	  && CONST_INT_P (XEXP (XEXP (x, 0), 1))
+	  && IN_RANGE (pow2p_hwi (INTVAL (XEXP (XEXP (x, 0), 1))), 1, 3))
+	{
+	  *total = COSTS_N_INSNS (1);
+	  return true;
+	}
       /* shNadd.uw pattern for zba.
 	 [(set (match_operand:DI 0 "register_operand" "=r")
 	       (plus:DI
diff --git a/gcc/testsuite/gcc.target/riscv/zba-shNadd-07.c b/gcc/testsuite/gcc.target/riscv/zba-shNadd-07.c
new file mode 100644
index 00000000000..98d35e1da9b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/zba-shNadd-07.c
@@ -0,0 +1,31 @@ 
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc_zba -mabi=lp64 -O2" } */
+
+unsigned long
+f1 (unsigned long i)
+{
+  return i * 200;
+}
+
+unsigned long
+f2 (unsigned long i)
+{
+  return i * 783;
+}
+
+unsigned long
+f3 (unsigned long i)
+{
+  return i * 784;
+}
+
+unsigned long
+f4 (unsigned long i)
+{
+  return i * 1574;
+}
+
+/* { dg-final { scan-assembler-times "sh2add" 2 } } */
+/* { dg-final { scan-assembler-times "sh1add" 2 } } */
+/* { dg-final { scan-assembler-times "slli" 5 } } */
+/* { dg-final { scan-assembler-times "mul" 1 } } */