RISC-V: Enable Hoist to GCSE simple constants
Checks
Commit Message
Hoist want_to_gcse_p () calls rtx_cost () to compute max distance for
hoist candidates . For a const with cost 1 backend currently returns 0,
causing Hoist to bail and elide GCSE.
Note that constants requiring more than 1 insns to setup were working
already since backend is returning 1 as well. Arguably that needs updating
as well to reflect cost better, but that should be a different change
anyways.
To keep testsuite parity, some V tests need updatinge which started failing
in the new costing regime.
gcc/ChangeLog:
* gcc/config/riscv.cc (riscv_rtx_costs): Adjust const_int cost.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/gcse-const.c: New Test
* gcc.target/riscv/rvv/vsetvl/vlmax_conflict-7.c: Disable for
-O2.
* gcc.target/riscv/rvv/vsetvl/vlmax_conflict-8.c: Ditto.
Signed-off-by: Vineet Gupta <vineetg@rivosinc.com>
---
gcc/config/riscv/riscv.cc | 9 ++-------
gcc/testsuite/gcc.target/riscv/gcse-const.c | 13 +++++++++++++
.../gcc.target/riscv/rvv/vsetvl/vlmax_conflict-7.c | 2 +-
.../gcc.target/riscv/rvv/vsetvl/vlmax_conflict-8.c | 2 +-
4 files changed, 17 insertions(+), 9 deletions(-)
create mode 100644 gcc/testsuite/gcc.target/riscv/gcse-const.c
Comments
On 8/9/23 18:30, Vineet Gupta wrote:
> Hoist want_to_gcse_p () calls rtx_cost () to compute max distance for
> hoist candidates . For a const with cost 1 backend currently returns 0,
> causing Hoist to bail and elide GCSE.
>
> Note that constants requiring more than 1 insns to setup were working
> already since backend is returning 1 as well. Arguably that needs updating
> as well to reflect cost better, but that should be a different change
> anyways.
>
> To keep testsuite parity, some V tests need updatinge which started failing
> in the new costing regime.
>
> gcc/ChangeLog:
> * gcc/config/riscv.cc (riscv_rtx_costs): Adjust const_int cost.
>
> gcc/testsuite/ChangeLog:
> * gcc.target/riscv/gcse-const.c: New Test
> * gcc.target/riscv/rvv/vsetvl/vlmax_conflict-7.c: Disable for
> -O2.
> * gcc.target/riscv/rvv/vsetvl/vlmax_conflict-8.c: Ditto.
Thanks for your patience on this. I needed a bit of time to gather my
thoughts and review some code.
> index 8b7256108157..1802eef908fc 100644
> --- a/gcc/config/riscv/riscv.cc
> +++ b/gcc/config/riscv/riscv.cc
> @@ -2464,14 +2464,9 @@ riscv_rtx_costs (rtx x, machine_mode mode, int outer_code, int opno ATTRIBUTE_UN
> case CONST:
> if ((cost = riscv_const_insns (x)) > 0)
> {
> - /* If the constant is likely to be stored in a GPR, SETs of
> - single-insn constants are as cheap as register sets; we
> - never want to CSE them. */
> + /* Hoist will GCSE constants only if cost is non-zero. */
> if (cost == 1 && outer_code == SET)
> - *total = 0;
> - /* When we load a constant more than once, it usually is better
> - to duplicate the last operation in the sequence than to CSE
> - the constant itself. */
> + *total = COSTS_N_INSNS (1);
> else if (outer_code == SET || GET_MODE (x) == VOIDmode)
> *total = COSTS_N_INSNS (1);
> }
So the concern here was we have two classes of constants which can be
synthesized in a single instruction.
One class would be those constants that can be used as-is in most
instructions. (const_int 0) being the most obvious, but of course
there's many others.
The other class can be synthesized in a single instruction, but aren't
typically usable in something like addi, andi, etc. A good example
might be (const_int 0x40000000).
I wanted to make sure we were doing something sensible across those two
cases. And I think we probably are as we have an earlier check in the
CONST_INT case (no I don't like the case fallthrus at all :(
So I think your change makes sense. But I think it can be refined to
simplify the larger chunk of code we're looking at:
> /* If the constant is likely to be stored in a GPR, SETs of
> single-insn constants are as cheap as register sets; we
> never want to CSE them. */
> if (cost == 1 && outer_code == SET)
> *total = 0;
> /* When we load a constant more than once, it usually is better
> to duplicate the last operation in the sequence than to CSE
> the constant itself. */
> else if (outer_code == SET || GET_MODE (x) == VOIDmode)
> *total = COSTS_N_INSNS (1);
Turns into
if (outer_code == SET || GET_MODE (x) == VOIDmode)
*total = COSTS_N_INSNS (1);
With a suitable comment about GCSE and the general desire to duplicate
the last op rather than CSE the constant for multi instruction constant
synthesis cases.
If you agree, then consider the patch pre-approved with that change. If
not, then state why and your original patch is OK as well.
Again, thanks for your patience on this.
jeff
On 8/23/23 13:04, Jeff Law wrote:
> Thanks for your patience on this. I needed a bit of time to gather my
> thoughts and review some code.
No worries at all.
>> index 8b7256108157..1802eef908fc 100644
>> --- a/gcc/config/riscv/riscv.cc
>> +++ b/gcc/config/riscv/riscv.cc
>> @@ -2464,14 +2464,9 @@ riscv_rtx_costs (rtx x, machine_mode mode, int
>> outer_code, int opno ATTRIBUTE_UN
>> case CONST:
>> if ((cost = riscv_const_insns (x)) > 0)
>> {
>> - /* If the constant is likely to be stored in a GPR, SETs of
>> - single-insn constants are as cheap as register sets; we
>> - never want to CSE them. */
>> + /* Hoist will GCSE constants only if cost is non-zero. */
>> if (cost == 1 && outer_code == SET)
>> - *total = 0;
>> - /* When we load a constant more than once, it usually is better
>> - to duplicate the last operation in the sequence than to CSE
>> - the constant itself. */
>> + *total = COSTS_N_INSNS (1);
>> else if (outer_code == SET || GET_MODE (x) == VOIDmode)
>> *total = COSTS_N_INSNS (1);
>> }
> So the concern here was we have two classes of constants which can be
> synthesized in a single instruction.
>
> One class would be those constants that can be used as-is in most
> instructions. (const_int 0) being the most obvious, but of course
> there's many others.
>
> The other class can be synthesized in a single instruction, but aren't
> typically usable in something like addi, andi, etc. A good example
> might be (const_int 0x40000000).
>
>
> I wanted to make sure we were doing something sensible across those
> two cases. And I think we probably are as we have an earlier check in
> the CONST_INT case
Exactly, I have a note written down to trace the call flow to remind
myself how this works. I'll add a comment here to make this clear.
> (no I don't like the case fallthrus at all :(
Seriously, I detest it too, but the irony is I've now made my 2nd change
in there and keep adding to ugliness :-(
>
> So I think your change makes sense. But I think it can be refined to
> simplify the larger chunk of code we're looking at:
>
>> /* If the constant is likely to be stored in a GPR, SETs of
>> single-insn constants are as cheap as register sets; we
>> never want to CSE them. */
>> if (cost == 1 && outer_code == SET)
>> *total = 0;
>> /* When we load a constant more than once, it usually is
>> better
>> to duplicate the last operation in the sequence than to CSE
>> the constant itself. */
>> else if (outer_code == SET || GET_MODE (x) == VOIDmode)
>> *total = COSTS_N_INSNS (1);
>
> Turns into
> if (outer_code == SET || GET_MODE (x) == VOIDmode)
> *total = COSTS_N_INSNS (1);
Yep that's what I started with too but then left it, leaving it as an
visual indication to fix things up when ultimately cost model returns
the actual num of insns for a non trivial large const.Leaving the code
there meant we But I agree I'll fold it and add a TODO comment for
improving the cost model.
For the current proposal I do want to understand/reason what is left
there - what cases are we trying to filter out with #2467 ?
| case CONST:
| if ((cost = riscv_const_insns (x)) > 0) # 2465
| {
| if (outer_code == SET || GET_MODE (x) == VOIDmode) # 2467
| *total = COSTS_N_INSNS
(1); # 2468
(1) AFAIU, VOIDmode is for const_int - and supposedly true for symbolic
addresses etc whose actual values are not known at compile time ? Or is
it needed just as an artifact of the weird fall through.
(2) outer_code SET will kick in for set_src_cost( ) call from Hoist,
which passes a const_int rtx directly.
But in case of say expand_mult () -> set_src_cost ( ) called for say
(mult:DI (reg:DI 134)
(const_int [0x202020202020202]))
In the eventual call for const_int operand, outer_code is MULT,
so we elide #2468
But wait we don't even hit #2467 since #2465 has a weird cap
inside which caps 6 to 0.
| case CONST_INT:
| {
| int cost = riscv_integer_cost (INTVAL (x));
| /* Force complicated constants to memory. */
| return cost < 4 ? cost : 0; #1321
| }
This definitely needs to be tracked separately in a PR
>
> With a suitable comment about GCSE and the general desire to duplicate
> the last op rather than CSE the constant for multi instruction
> constant synthesis cases.
Hmm, do we not prefer GCSE/CSE a 3-4 insn const sequence. It seems the
RA is capable enough of undoing that ;-)
For now I can I can keep the comment with current philosophy
>
> If you agree, then consider the patch pre-approved with that change.
> If not, then state why and your original patch is OK as well.
I'm afraid I have more questions than either of us were hoping for :-)
But it seems we can chunk up the work for just Hoist enabling and then
improve the const cost model with that new PR.
I'll spin up a buildroot testbed to get a wider impact of that change.
Thx,
-Vineet
On 8/23/23 18:42, Vineet Gupta wrote:
>
> Seriously, I detest it too, but the irony is I've now made my 2nd change
> in there and keep adding to ugliness :-(
Happens to all of us sometimes.
>
>>
>> So I think your change makes sense. But I think it can be refined to
>> simplify the larger chunk of code we're looking at:
>>
>>> /* If the constant is likely to be stored in a GPR, SETs of
>>> single-insn constants are as cheap as register sets; we
>>> never want to CSE them. */
>>> if (cost == 1 && outer_code == SET)
>>> *total = 0;
>>> /* When we load a constant more than once, it usually is
>>> better
>>> to duplicate the last operation in the sequence than to CSE
>>> the constant itself. */
>>> else if (outer_code == SET || GET_MODE (x) == VOIDmode)
>>> *total = COSTS_N_INSNS (1);
>>
>> Turns into
>> if (outer_code == SET || GET_MODE (x) == VOIDmode)
>> *total = COSTS_N_INSNS (1);
>
> Yep that's what I started with too but then left it, leaving it as an
> visual indication to fix things up when ultimately cost model returns
> the actual num of insns for a non trivial large const.Leaving the code
> there meant we But I agree I'll fold it and add a TODO comment for
> improving the cost model.
>
> For the current proposal I do want to understand/reason what is left
> there - what cases are we trying to filter out with #2467 ?
>
> | case CONST:
> | if ((cost = riscv_const_insns (x)) > 0) # 2465
> | {
> | if (outer_code == SET || GET_MODE (x) == VOIDmode) # 2467
> | *total = COSTS_N_INSNS
> (1); # 2468
>
> (1) AFAIU, VOIDmode is for const_int - and supposedly true for symbolic
> addresses etc whose actual values are not known at compile time ? Or is
> it needed just as an artifact of the weird fall through.
I'd expect it to filter away some symbolics and perhaps floating point
constants.
>
> (2) outer_code SET will kick in for set_src_cost( ) call from Hoist,
> which passes a const_int rtx directly.
> But in case of say expand_mult () -> set_src_cost ( ) called for say
> (mult:DI (reg:DI 134)
> (const_int [0x202020202020202]))
>
> In the eventual call for const_int operand, outer_code is MULT,
> so we elide #2468
>
> But wait we don't even hit #2467 since #2465 has a weird cap
> inside which caps 6 to 0.
>
> | case CONST_INT:
> | {
> | int cost = riscv_integer_cost (INTVAL (x));
> | /* Force complicated constants to memory. */
> | return cost < 4 ? cost : 0; #1321
> | }
>
> This definitely needs to be tracked separately in a PR
And when we solve it, it'll likely need to be uarch driven. I've hinted
before that there'll be changes in this area that will allow some
targets to handle most, if not all, of those complicated constants in a
single cycle.
>
>>
>> With a suitable comment about GCSE and the general desire to duplicate
>> the last op rather than CSE the constant for multi instruction
>> constant synthesis cases.
>
> Hmm, do we not prefer GCSE/CSE a 3-4 insn const sequence. It seems the
> RA is capable enough of undoing that ;-)
> For now I can I can keep the comment with current philosophy
I mentally set aside the multi-insn sequence concern. Presumably the
code was written that way for a reason and without a good one, backed by
data, I was hesitant to push back on that choice.
>
>>
>> If you agree, then consider the patch pre-approved with that change.
>> If not, then state why and your original patch is OK as well.
>
> I'm afraid I have more questions than either of us were hoping for :-)
>
> But it seems we can chunk up the work for just Hoist enabling and then
> improve the const cost model with that new PR.
> I'll spin up a buildroot testbed to get a wider impact of that change.
Exactly. THere's multiple issues, but I think they can be tackled
independently.
jeff
@@ -2464,14 +2464,9 @@ riscv_rtx_costs (rtx x, machine_mode mode, int outer_code, int opno ATTRIBUTE_UN
case CONST:
if ((cost = riscv_const_insns (x)) > 0)
{
- /* If the constant is likely to be stored in a GPR, SETs of
- single-insn constants are as cheap as register sets; we
- never want to CSE them. */
+ /* Hoist will GCSE constants only if cost is non-zero. */
if (cost == 1 && outer_code == SET)
- *total = 0;
- /* When we load a constant more than once, it usually is better
- to duplicate the last operation in the sequence than to CSE
- the constant itself. */
+ *total = COSTS_N_INSNS (1);
else if (outer_code == SET || GET_MODE (x) == VOIDmode)
*total = COSTS_N_INSNS (1);
}
new file mode 100644
@@ -0,0 +1,13 @@
+/* Slightly modified copy of gcc.target/arm/pr40956.c. */
+/* { dg-options "-Os" } */
+/* Make sure the constant "6" is loaded into register only once. */
+/* { dg-final { scan-assembler-times "\tli.*6" 1 } } */
+
+int foo(int p, int* q)
+{
+ if (p!=9)
+ *q = 6;
+ else
+ *(q+1) = 6;
+ return 3;
+}
@@ -21,5 +21,5 @@ void f (int32_t * restrict in, int32_t * restrict out, size_t n, size_t cond, si
}
/* { dg-final { scan-assembler-times {vsetvli} 4 { target { no-opts "-O0" no-opts "-O1" no-opts "-Os" no-opts "-Oz" no-opts "-funroll-loops" no-opts "-g" } } } } */
-/* { dg-final { scan-assembler-times {j\s+\.L[0-9]+\s+\.L[0-9]+:\s+vlm\.v} 1 { target { no-opts "-O0" no-opts "-O1" no-opts "-Os" no-opts "-Oz" no-opts "-funroll-loops" no-opts "-g" } } } } */
+/* { dg-final { scan-assembler-times {j\s+\.L[0-9]+\s+\.L[0-9]+:\s+vlm\.v} 1 { target { no-opts "-O0" no-opts "-O1" no-opts "-O2" no-opts "-Os" no-opts "-Oz" no-opts "-funroll-loops" no-opts "-g" } } } } */
/* { dg-final { scan-assembler-times {vsetvli\s+[a-x0-9]+,\s*zero,\s*e8,\s*m8,\s*t[au],\s*m[au]} 3 { target { no-opts "-O0" no-opts "-O1" no-opts "-Os" no-opts "-Oz" no-opts "-funroll-loops" no-opts "-g" } } } } */
@@ -21,7 +21,7 @@ void f (int32_t * restrict in, int32_t * restrict out, size_t n, size_t cond, si
}
/* { dg-final { scan-assembler-times {vsetvli} 5 { target { no-opts "-O0" no-opts "-O1" no-opts "-Os" no-opts "-Oz" no-opts "-funroll-loops" no-opts "-g" } } } } */
-/* { dg-final { scan-assembler-times {j\s+\.L[0-9]+\s+\.L[0-9]+:\s+vlm\.v} 1 { target { no-opts "-O0" no-opts "-O1" no-opts "-Os" no-opts "-Oz" no-opts "-funroll-loops" no-opts "-g" } } } } */
+/* { dg-final { scan-assembler-times {j\s+\.L[0-9]+\s+\.L[0-9]+:\s+vlm\.v} 1 { target { no-opts "-O0" no-opts "-O1" no-opts "-O2" no-opts "-Os" no-opts "-Oz" no-opts "-funroll-loops" no-opts "-g" } } } } */
/* { dg-final { scan-assembler-times {vsetvli\s+[a-x0-9]+,\s*zero,\s*e8,\s*m8,\s*t[au],\s*m[au]} 3 { target { no-opts "-O0" no-opts "-O1" no-opts "-Os" no-opts "-Oz" no-opts "-funroll-loops" no-opts "-g" } } } } */
/* { dg-final { scan-assembler-times {vsetvli\s+[a-x0-9]+,\s*zero,\s*e8,\s*mf8,\s*t[au],\s*m[au]} 1 { target { no-opts "-O0" no-opts "-O1" no-opts "-Os" no-opts "-Oz" no-opts "-funroll-loops" no-opts "-g" } } } } */
/* { dg-final { scan-assembler-times {vsetvli\s+[a-x0-9]+,\s*zero,\s*e16,\s*mf2,\s*t[au],\s*m[au]} 1 { target { no-opts "-O0" no-opts "-O1" no-opts "-Os" no-opts "-Oz" no-opts "-funroll-loops" no-opts "-g" } } } } */