[RFC] RISC-V: elide sign extend when expanding cmp_and_jump
Checks
Commit Message
RV64 comapre and branch instructions only support 64-bit operands.
The backend unconditionally generates zero/sign extend at Expand time
for compare and branch operands even if they are already as such
e.g. function args which ABI guarantees to be sign-extended (at callsite).
And subsequently REE fails to eliminate them as
"missing defintion(s)"
specially for function args since they show up as missing explicit
definition.
So elide the sign extensions at Expand time for a subreg promoted var
with inner word sized value whcih doesn't need explicit sign extending
(fairly good represntative of an incoming function arg).
There are patches floating to enhance REE and/or new passes to elimiate
extensions, but it is always better to not generate them if possible,
given Expand is so early, the elimiated extend would help improve outcomes
of later passes such as combine if they have fewer operations/combinations
to deal with.
The test used was existing gcc.c-torture/execute/20020506-1.c -O2 zbb
It elimiates the SEXT.W and an additional branch
before after
------- ------
test2: test2:
sext.b a0,a0
blt a0,zero,.L15
bne a1,zero,.L17 bne a1,zero,.L17
This is marked RFC as I only ran a spot check with a directed test and
want to use Patrick's pre-commit CI to do the A/B testing for me.
gcc/ChangeLog:
* config/riscv/riscv.cc (riscv_extend_comparands): Don't
sign-extend operand if subreg promoted with inner word size.
Signed-off-by: Vineet Gupta <vineetg@rivosinc.com>
---
gcc/config/riscv/riscv.cc | 11 ++++++++++-
1 file changed, 10 insertions(+), 1 deletion(-)
Comments
Hi Vineet,
I was thinking of two things while skimming the code:
- Couldn't we do this in the expanders directly? Or is the
subreg-promoted info gone until we reach that?
- Should some common-code part be more suited to handle that?
We already elide redundant sign-zero extensions for other
reasons. Maybe we could add subreg promoted handling there?
Regards
Robin
On 10/25/23 01:12, Robin Dapp wrote:
> Hi Vineet,
>
> I was thinking of two things while skimming the code:
>
> - Couldn't we do this in the expanders directly? Or is the
> subreg-promoted info gone until we reach that?
Well, it doesn't seem like there's a lot of difference between doing it
in the generic expander bits vs target expander bits -- the former just
calls into the latter for the most part. Thus if the subreg-promoted
state is available in the target expander, I'd expect it to be available
in the generic expander.
It may be the case that we have more places to fix because we have
different expander paths (think conditional branches, conditional moves,
sCC and probably others). By catching it in riscv_expand_comparands he
caught a nice little choke point. I think what Vineet has done will
also work for RTL if-conversion.
>
> - Should some common-code part be more suited to handle that?
> We already elide redundant sign-zero extensions for other
> reasons. Maybe we could add subreg promoted handling there?
Unsure. I wouldn't be surprised if we were able to find similar code in
simplify-rtx or something like that. It's probably worth a quick looksie.
I also wonder if Vineet's work would subsume this local change. I've
been meaning to find testcases for this and determine if we should drop
it or clean it up and submit it upstream:
> +(define_insn "*branch<mode>_equals_zero"
> + [(set (pc)
> + (if_then_else
> + (match_operator 1 "equality_operator"
> + [(match_operand:ANYI 2 "register_operand" "r")
> + (const_int 0)])
> + (label_ref (match_operand 0 "" ""))
> + (pc)))]
> + "!partial_subreg_p (operands[2])"
> + "b%C1\t%2,zero,%0"
> + [(set_attr "type" "branch")
> + (set_attr "mode" "none")])
My sense is it's just papering over a missed simplification elsewhere.
Jeff
> Well, it doesn't seem like there's a lot of difference between doing
> it in the generic expander bits vs target expander bits -- the former
> just calls into the latter for the most part. Thus if the
> subreg-promoted state is available in the target expander, I'd expect
> it to be available in the generic expander.
Ah, sorry I meant in the [sign|zero]extend expanders rather than the
compare expanders in order to catch promoted subregs from other origins
as well. Maybe that doesn't work, though?
Regards
Robin
On 10/25/23 07:47, Robin Dapp wrote:
>
>> Well, it doesn't seem like there's a lot of difference between doing
>> it in the generic expander bits vs target expander bits -- the former
>> just calls into the latter for the most part. Thus if the
>> subreg-promoted state is available in the target expander, I'd expect
>> it to be available in the generic expander.
>
> Ah, sorry I meant in the [sign|zero]extend expanders rather than the
> compare expanders in order to catch promoted subregs from other origins
> as well. Maybe that doesn't work, though?
That's a *really* interesting idea. Interested in playing around a bit
with that Vineet?
jeff
Hey Robin,
On 10/25/23 00:12, Robin Dapp wrote:
> Hi Vineet,
>
> I was thinking of two things while skimming the code:
>
> - Couldn't we do this in the expanders directly? Or is the
> subreg-promoted info gone until we reach that?
Following is the call stack involved:
expand_gimple_cond
do_compare_and_jump
emit_cmp_and_jump_insns
gen_cbranchqi4
riscv_expand_conditional_branch
riscv_emit_int_compare
riscv_extend_comparands
Last function is what introduces the extraneous sign extends, w/o taking
subreg-promoted into consideration and what my patch attempts to address.
> - Should some common-code part be more suited to handle that?
> We already elide redundant sign-zero extensions for other
> reasons. Maybe we could add subreg promoted handling there?
Not in the context of this specific issue.
-Vineet
On 10/25/23 10:25, Vineet Gupta wrote:
> Hey Robin,
>
> On 10/25/23 00:12, Robin Dapp wrote:
>> Hi Vineet,
>>
>> I was thinking of two things while skimming the code:
>>
>> - Couldn't we do this in the expanders directly? Or is the
>> subreg-promoted info gone until we reach that?
>
> Following is the call stack involved:
>
> expand_gimple_cond
> do_compare_and_jump
> emit_cmp_and_jump_insns
> gen_cbranchqi4
> riscv_expand_conditional_branch
> riscv_emit_int_compare
> riscv_extend_comparands
>
>
> Last function is what introduces the extraneous sign extends, w/o taking
> subreg-promoted into consideration and what my patch attempts to address.
>
>> - Should some common-code part be more suited to handle that?
>> We already elide redundant sign-zero extensions for other
>> reasons. Maybe we could add subreg promoted handling there?
>
> Not in the context of this specific issue.
Robin's point (IIUC) is that if we put this logic into a zero/sign
extend expander, then it'll get used for *any* attempt to zero/sign
extend that goes through the target expander.
It doesn't work for your case because we use gen_rtx_{ZERO,SIGN}_EXTEND
directly. But if those were adjusted to use the expander, then Robin's
idea would be applicable to this case too.
Jeff
On 10/25/23 09:30, Jeff Law wrote:
>>> - Should some common-code part be more suited to handle that?
>>> We already elide redundant sign-zero extensions for other
>>> reasons. Maybe we could add subreg promoted handling there?
>>
>> Not in the context of this specific issue.
> Robin's point (IIUC) is that if we put this logic into a zero/sign
> extend expander, then it'll get used for *any* attempt to zero/sign
> extend that goes through the target expander.
>
> It doesn't work for your case because we use
> gen_rtx_{ZERO,SIGN}_EXTEND directly. But if those were adjusted to
> use the expander, then Robin's idea would be applicable to this case too
Understood. Definitely solid idea.
-Vineet
On 10/25/23 06:52, Jeff Law wrote:
>
> On 10/25/23 07:47, Robin Dapp wrote:
>>
>>> Well, it doesn't seem like there's a lot of difference between doing
>>> it in the generic expander bits vs target expander bits -- the former
>>> just calls into the latter for the most part. Thus if the
>>> subreg-promoted state is available in the target expander, I'd expect
>>> it to be available in the generic expander.
>>
>> Ah, sorry I meant in the [sign|zero]extend expanders rather than the
>> compare expanders in order to catch promoted subregs from other origins
>> as well. Maybe that doesn't work, though?
> That's a *really* interesting idea. Interested in playing around a
> bit with that Vineet?
Sure I'll tinker with the {sign,zero}expanders.
And there's a third playing field :-) There seem to be still cases where
subreg-promoted note is not set when it probably should.
So we end up in riscv_extend_comparands but with note not being there
(for something corresponding to function arg) thus can't skip the extension.
Thx,
-Vineet
On 10/24/23 22:01, Vineet Gupta wrote:
> RV64 comapre and branch instructions only support 64-bit operands.
> The backend unconditionally generates zero/sign extend at Expand time
> for compare and branch operands even if they are already as such
> e.g. function args which ABI guarantees to be sign-extended (at callsite).
>
> And subsequently REE fails to eliminate them as
> "missing defintion(s)"
> specially for function args since they show up as missing explicit
> definition.
>
> So elide the sign extensions at Expand time for a subreg promoted var
> with inner word sized value whcih doesn't need explicit sign extending
> (fairly good represntative of an incoming function arg).
>
> There are patches floating to enhance REE and/or new passes to elimiate
> extensions, but it is always better to not generate them if possible,
> given Expand is so early, the elimiated extend would help improve outcomes
> of later passes such as combine if they have fewer operations/combinations
> to deal with.
>
> The test used was existing gcc.c-torture/execute/20020506-1.c -O2 zbb
> It elimiates the SEXT.W and an additional branch
>
> before after
> ------- ------
> test2: test2:
> sext.b a0,a0
> blt a0,zero,.L15
> bne a1,zero,.L17 bne a1,zero,.L17
>
> This is marked RFC as I only ran a spot check with a directed test and
> want to use Patrick's pre-commit CI to do the A/B testing for me.
>
> gcc/ChangeLog:
> * config/riscv/riscv.cc (riscv_extend_comparands): Don't
> sign-extend operand if subreg promoted with inner word size.
>
> Signed-off-by: Vineet Gupta <vineetg@rivosinc.com>
> ---
> gcc/config/riscv/riscv.cc | 11 ++++++++++-
> 1 file changed, 10 insertions(+), 1 deletion(-)
>
> diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
> index f2dcb0db6fbd..a8d12717e43d 100644
> --- a/gcc/config/riscv/riscv.cc
> +++ b/gcc/config/riscv/riscv.cc
> @@ -3707,7 +3707,16 @@ riscv_extend_comparands (rtx_code code, rtx *op0, rtx *op1)
> }
> else
> {
> - *op0 = gen_rtx_SIGN_EXTEND (word_mode, *op0);
> + /* A subreg promoted SI of DI could be peeled to expose DI, eliding
> + an unnecessary sign extension. */
> + if (GET_CODE (*op0) == SUBREG
> + && SUBREG_PROMOTED_VAR_P (*op0)
> + && GET_MODE_SIZE (GET_MODE (XEXP (*op0, 0))).to_constant ()
> + == GET_MODE_SIZE (word_mode))
> + *op0 = XEXP (*op0, 0);
> + else
> + *op0 = gen_rtx_SIGN_EXTEND (word_mode, *op0);
> +
> if (*op1 != const0_rtx)
> *op1 = gen_rtx_SIGN_EXTEND (word_mode, *op1);
> }
Just a quick update that testing revealed additional failures and
unpacking which took me a while and in hindsight embarrassing. I was
misunderstanding what ABI guarantees and what ISA actually does :-)
The ABI guarantees sign extension this for 32-bit things in 64-bit reg
(so in rv64, int), but *not* 8-bit things in a reg. i.e. not for char
arguments.
e.g.
uint8_t abs8(uint8_t x)
{
if (x & 0x80) // SEXT.b a4, a0
...
}
main()
{
if (abs8(128) != 127) // LI a0, 128 => ADDI a0, x0, 128
__builtin_abort();
}
So my patch was optimizing away the SEXT.B (despite claiming that subreg
prom of SI....) which is wrong.
Sure ADDI in main () sign-extends, but it does that for 12-bit imm
0x080, which comes out to 0x80, but sign-extending for char scope 0x80
would yield 0xFFFF....0x80. Hence the issue.
I'll spin a v2 after more testing.
This is slightly disappointing as it reduces the scope of optimization,
but correctness in this trade is non-negotiable :-)
-Vineet
@@ -3707,7 +3707,16 @@ riscv_extend_comparands (rtx_code code, rtx *op0, rtx *op1)
}
else
{
- *op0 = gen_rtx_SIGN_EXTEND (word_mode, *op0);
+ /* A subreg promoted SI of DI could be peeled to expose DI, eliding
+ an unnecessary sign extension. */
+ if (GET_CODE (*op0) == SUBREG
+ && SUBREG_PROMOTED_VAR_P (*op0)
+ && GET_MODE_SIZE (GET_MODE (XEXP (*op0, 0))).to_constant ()
+ == GET_MODE_SIZE (word_mode))
+ *op0 = XEXP (*op0, 0);
+ else
+ *op0 = gen_rtx_SIGN_EXTEND (word_mode, *op0);
+
if (*op1 != const0_rtx)
*op1 = gen_rtx_SIGN_EXTEND (word_mode, *op1);
}