[v2] RISC-V: costs: support shift-and-add in strength-reduction
Commit Message
The strength-reduction implementation in expmed.cc will assess the
profitability of using shift-and-add using a RTL expression that wraps
a MULT (with a power-of-2) in a PLUS. Unless the RISC-V rtx_costs
function recognizes this as expressing a sh[123]add instruction, we
will return an inflated cost---thus defeating the optimization.
This change adds the necessary idiom recognition to provide an
accurate cost for this form of expressing sh[123]add.
Instead of expanding to
    li      a5,200
    mulw    a0,a5,a0
with this change, the expression 'a * 200' is synthesized as:
    sh2add  a0,a0,a0  // *5 = a + 4 * a
    sh2add  a0,a0,a0  // *5 = a + 4 * a
    slli    a0,a0,3   // *8
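The decomposition behind that sequence is 200 = 5 * 5 * 8. As a minimal sketch, the three instructions can be modelled in plain C. The helper names sh2add_model, mul200 and mul200_selfcheck are made up for illustration and are not part of the patch; sh2add_model only mirrors the Zba semantics rd = (rs1 << 2) + rs2.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of Zba "sh2add rd, rs1, rs2": rd = (rs1 << 2) + rs2.  */
static uint64_t
sh2add_model (uint64_t rs1, uint64_t rs2)
{
  return (rs1 << 2) + rs2;
}

/* a * 200 computed as a * 5 * 5 * 8, one step per instruction.  */
static uint64_t
mul200 (uint64_t a)
{
  a = sh2add_model (a, a);	/* a * 5   = a + 4 * a  */
  a = sh2add_model (a, a);	/* a * 25  = previous value * 5  */
  return a << 3;		/* a * 200 = previous value * 8  */
}

/* Compare the model against a plain multiplication for small inputs.  */
static void
mul200_selfcheck (void)
{
  for (uint64_t a = 0; a < 1000; a++)
    assert (mul200 (a) == a * 200);
}
```

Unsigned arithmetic is used so that wrap-around is well defined; the sketch says nothing about the 32-bit mulw case.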
gcc/ChangeLog:
* config/riscv/riscv.cc (riscv_rtx_costs): Recognize shNadd,
if expressed as a plus and multiplication with a power-of-2.
Split costing for MINUS from PLUS.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/zba-shNadd-07.c: New test.
Signed-off-by: Philipp Tomsich <philipp.tomsich@vrull.eu>
---
Changes in v2:
- Split rtx_costs calculation for MINUS from PLUS to ensure that
(minus reg (ashift reg SHAMT)) is not mistaken for a shNadd
- Add testcase
gcc/config/riscv/riscv.cc | 19 ++++++++++++
.../gcc.target/riscv/zba-shNadd-07.c | 31 +++++++++++++++++++
2 files changed, 50 insertions(+)
create mode 100644 gcc/testsuite/gcc.target/riscv/zba-shNadd-07.c
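
For readers unfamiliar with the idiom, the condition the cost function needs to recognize can be sketched outside of GCC as a toy predicate. This is an illustration under assumptions, not the code in riscv.cc: toy_code, shnadd_shift and is_shnadd_idiom are hypothetical names, and real RTL matching also checks the mode and operand kinds. The two points it shows are that only multipliers 2, 4 and 8 map to sh1add/sh2add/sh3add, and that a MINUS with the same operands must not match, since there is no subtracting form of shNadd.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the two RTL codes involved; not GCC's enum.  */
enum toy_code { TOY_PLUS, TOY_MINUS };

/* Return N if MULTIPLIER == 1 << N with N in 1..3, otherwise 0.
   A bare "is it a power of two" test would not be enough here,
   because 1, 16, 32, ... have no matching shNadd.  */
static int
shnadd_shift (int64_t multiplier)
{
  for (int n = 1; n <= 3; n++)
    if (multiplier == (INT64_C (1) << n))
      return n;
  return 0;
}

/* Toy predicate: can (CODE (mult reg MULTIPLIER) reg) be a single shNadd?  */
static bool
is_shnadd_idiom (enum toy_code code, int64_t multiplier)
{
  return code == TOY_PLUS && shnadd_shift (multiplier) != 0;
}
```

With this model, (plus (mult reg 4) reg) is accepted as sh2add, while (minus (mult reg 4) reg) and (plus (mult reg 16) reg) are both rejected.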
Comments
On 11/10/22 14:34, Philipp Tomsich wrote:
> The strength-reduction implementation in expmed.cc will assess the
> profitability of using shift-and-add using a RTL expression that wraps
> a MULT (with a power-of-2) in a PLUS. Unless the RISC-V rtx_costs
> function recognizes this as expressing a sh[123]add instruction, we
> will return an inflated cost---thus defeating the optimization.
>
> This change adds the necessary idiom recognition to provide an
> accurate cost for this form of expressing sh[123]add.
>
> Instead of expanding to
>     li      a5,200
>     mulw    a0,a5,a0
> with this change, the expression 'a * 200' is synthesized as:
>     sh2add  a0,a0,a0  // *5 = a + 4 * a
>     sh2add  a0,a0,a0  // *5 = a + 4 * a
>     slli    a0,a0,3   // *8
>
> gcc/ChangeLog:
>
> * config/riscv/riscv.cc (riscv_rtx_costs): Recognize shNadd,
> if expressed as a plus and multiplication with a power-of-2.
> Split costing for MINUS from PLUS.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/zba-shNadd-07.c: New test.
OK. Note that getting this right can impact one of the spec2017 integer
benchmarks notably. I don't recall which one, but it has a div and a
mod by the same constant, which can be implemented fairly reasonably with
shifts and adds. You won't see it in instruction count data, but would
see it if you had cycle count data or instrumented for div/mod instructions.
Jeff
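
One possible reading of the div/mod remark, as a hedged sketch: once the quotient by a constant is available, the remainder is x - q * c, and the q * c product is exactly the kind of multiplication that shift-and-add costing affects. The benchmark and its constant are not named above, so the constant 10 and the helpers div10 and mod10 are assumptions chosen purely for illustration, and this is not a claim about the code GCC emits for that benchmark. The quotient uses the well-known 32-bit reciprocal 0xCCCCCCCD with a shift by 35.

```c
#include <stdint.h>

/* x / 10 for 32-bit unsigned x, via multiplication by a fixed-point
   reciprocal (ceil (2^35 / 10) == 0xCCCCCCCD) and a right shift.  */
static uint32_t
div10 (uint32_t x)
{
  return (uint32_t) (((uint64_t) x * UINT64_C (0xCCCCCCCD)) >> 35);
}

/* x % 10 reusing the quotient: q * 10 is formed without a multiply
   as ((q << 2) + q) << 1, i.e. one sh2add plus one slli on Zba.  */
static uint32_t
mod10 (uint32_t x)
{
  uint32_t q = div10 (x);
  uint32_t q10 = ((q << 2) + q) << 1;
  return x - q10;
}
```

If the q * 10 step were costed as more expensive than a mul, the compiler could prefer a real multiply (or a separate rem), which would not change the instruction count much but could change the cycle count.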
Applied to master. Thanks!
Note that the multiply-by-200 (in the testcase) originates from Dhrystone.
Philipp.
On Sun, 13 Nov 2022 at 02:23, Jeff Law <jeffreyalaw@gmail.com> wrote:
>
>
> OK. Note that getting this right can impact one of the spec2017 integer
> benchmarks notably. I don't recall which one, but it has a div and a
> mod by the same constant, which can be implemented fairly reasonably with
> shifts and adds. You won't see it in instruction count data, but would
> see it if you had cycle count data or instrumented for div/mod instructions.
>
>
> Jeff
>
>
@@ -2428,6 +2428,12 @@ riscv_rtx_costs (rtx x, machine_mode mode, int outer_code, int opno ATTRIBUTE_UN
return false;
case MINUS:
+ if (float_mode_p)
+ *total = tune_param->fp_add[mode == DFmode];
+ else
+ *total = riscv_binary_cost (x, 1, 4);
+ return false;
+
case PLUS:
/* add.uw pattern for zba. */
if (TARGET_ZBA
@@ -2451,6 +2457,19 @@ riscv_rtx_costs (rtx x, machine_mode mode, int outer_code, int opno ATTRIBUTE_UN
*total = COSTS_N_INSNS (1);
return true;
}
+ /* Before strength-reduction, the shNadd can be expressed as the addition
+ of a multiplication with a power-of-two. If this case is not handled,
+ the strength-reduction in expmed.cc will calculate an inflated cost. */
+ if (TARGET_ZBA
+ && mode == word_mode
+ && GET_CODE (XEXP (x, 0)) == MULT
+ && REG_P (XEXP (XEXP (x, 0), 0))
+ && CONST_INT_P (XEXP (XEXP (x, 0), 1))
+ && IN_RANGE (exact_log2 (INTVAL (XEXP (XEXP (x, 0), 1))), 1, 3))
+ {
+ *total = COSTS_N_INSNS (1);
+ return true;
+ }
/* shNadd.uw pattern for zba.
[(set (match_operand:DI 0 "register_operand" "=r")
(plus:DI
new file mode 100644
@@ -0,0 +1,31 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc_zba -mabi=lp64 -O2" } */
+
+unsigned long
+f1 (unsigned long i)
+{
+ return i * 200;
+}
+
+unsigned long
+f2 (unsigned long i)
+{
+ return i * 783;
+}
+
+unsigned long
+f3 (unsigned long i)
+{
+ return i * 784;
+}
+
+unsigned long
+f4 (unsigned long i)
+{
+ return i * 1574;
+}
+
+/* { dg-final { scan-assembler-times "sh2add" 2 } } */
+/* { dg-final { scan-assembler-times "sh1add" 2 } } */
+/* { dg-final { scan-assembler-times "slli" 5 } } */
+/* { dg-final { scan-assembler-times "mul" 1 } } */