[RFC] RISC-V: elide sign extend when expanding cmp_and_jump

Message ID 20231025050155.627837-1-vineetg@rivosinc.com
State Accepted
Headers
Series [RFC] RISC-V: elide sign extend when expanding cmp_and_jump |

Checks

Context Check Description
snail/gcc-patch-check success Github commit url

Commit Message

Vineet Gupta Oct. 25, 2023, 5:01 a.m. UTC
  RV64 comapre and branch instructions only support 64-bit operands.
The backend unconditionally generates zero/sign extend at Expand time
for compare and branch operands even if they are already as such
e.g. function args which ABI guarantees to be sign-extended (at callsite).

And subsequently REE fails to eliminate them as
	"missing defintion(s)"
specially for function args since they show up as missing explicit
definition.

So elide the sign extensions at Expand time for a subreg promoted var
with inner word sized value whcih doesn't need explicit sign extending
(fairly good represntative of an incoming function arg).

There are patches floating to enhance REE and/or new passes to elimiate
extensions, but it is always better to not generate them if possible,
given Expand is so early, the elimiated extend would help improve outcomes
of later passes such as combine if they have fewer operations/combinations
to deal with.

The test used was existing gcc.c-torture/execute/20020506-1.c -O2 zbb
It elimiates the SEXT.W and an additional branch

before					after
-------					------
test2:					test2:
	sext.b	a0,a0
	blt	a0,zero,.L15
	bne	a1,zero,.L17			bne     a1,zero,.L17

This is marked RFC as I only ran a spot check with a directed test and
want to use Patrick's pre-commit CI to do the A/B testing for me.

gcc/ChangeLog:
	* config/riscv/riscv.cc (riscv_extend_comparands): Don't
	sign-extend operand if subreg promoted with inner word size.

Signed-off-by: Vineet Gupta <vineetg@rivosinc.com>
---
 gcc/config/riscv/riscv.cc | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)
  

Comments

Robin Dapp Oct. 25, 2023, 7:12 a.m. UTC | #1
Hi Vineet,

I was thinking of two things while skimming the code:

 - Couldn't we do this in the expanders directly?  Or is the
   subreg-promoted info gone until we reach that?

 - Should some common-code part be more suited to handle that?
   We already elide redundant sign-zero extensions for other
   reasons.  Maybe we could add subreg promoted handling there?

Regards
 Robin
  
Jeff Law Oct. 25, 2023, 1:32 p.m. UTC | #2
On 10/25/23 01:12, Robin Dapp wrote:
> Hi Vineet,
> 
> I was thinking of two things while skimming the code:
> 
>   - Couldn't we do this in the expanders directly?  Or is the
>     subreg-promoted info gone until we reach that?
Well, it doesn't seem like there's a lot of difference between doing it 
in the generic expander bits vs target expander bits -- the former just 
calls into the latter for the most part.  Thus if the subreg-promoted 
state is available in the target expander, I'd expect it to be available 
in the generic expander.

It may be the case that we have more places to fix because we have 
different expander paths (think conditional branches, conditional moves, 
sCC and probably others).  By catching it in riscv_expand_comparands he 
caught a nice little choke point.  I think what Vineet has done will 
also work for RTL if-conversion.


> 
>   - Should some common-code part be more suited to handle that?
>     We already elide redundant sign-zero extensions for other
>     reasons.  Maybe we could add subreg promoted handling there?
Unsure.  I wouldn't be surprised if we were able to find similar code in 
simplify-rtx or something like that.  It's probably worth a quick looksie.

I also wonder if Vineet's work would subsume this local change.  I've 
been meaning to find testcases for this and determine if we should drop 
it or clean it up and submit it upstream:

> +(define_insn "*branch<mode>_equals_zero"
> +  [(set (pc)
> +       (if_then_else
> +        (match_operator 1 "equality_operator"
> +                        [(match_operand:ANYI 2 "register_operand" "r")
> +                         (const_int 0)])
> +        (label_ref (match_operand 0 "" ""))
> +        (pc)))]
> +  "!partial_subreg_p (operands[2])"
> +  "b%C1\t%2,zero,%0"
> +  [(set_attr "type" "branch")
> +   (set_attr "mode" "none")])


My sense is it's just papering over a missed simplification elsewhere.

Jeff
  
Robin Dapp Oct. 25, 2023, 1:47 p.m. UTC | #3
> Well, it doesn't seem like there's a lot of difference between doing
> it in the generic expander bits vs target expander bits -- the former
> just calls into the latter for the most part.  Thus if the
> subreg-promoted state is available in the target expander, I'd expect
> it to be available in the generic expander.

Ah, sorry I meant in the [sign|zero]extend expanders rather than the
compare expanders in order to catch promoted subregs from other origins
as well.  Maybe that doesn't work, though?

Regards
 Robin
  
Jeff Law Oct. 25, 2023, 1:52 p.m. UTC | #4
On 10/25/23 07:47, Robin Dapp wrote:
> 
>> Well, it doesn't seem like there's a lot of difference between doing
>> it in the generic expander bits vs target expander bits -- the former
>> just calls into the latter for the most part.  Thus if the
>> subreg-promoted state is available in the target expander, I'd expect
>> it to be available in the generic expander.
> 
> Ah, sorry I meant in the [sign|zero]extend expanders rather than the
> compare expanders in order to catch promoted subregs from other origins
> as well.  Maybe that doesn't work, though?
That's a *really* interesting idea.  Interested in playing around a bit 
with that Vineet?

jeff
  
Vineet Gupta Oct. 25, 2023, 4:25 p.m. UTC | #5
Hey Robin,

On 10/25/23 00:12, Robin Dapp wrote:
> Hi Vineet,
>
> I was thinking of two things while skimming the code:
>
>   - Couldn't we do this in the expanders directly?  Or is the
>     subreg-promoted info gone until we reach that?

Following is the call stack involved:

   expand_gimple_cond
     do_compare_and_jump
        emit_cmp_and_jump_insns
            gen_cbranchqi4
                riscv_expand_conditional_branch
                    riscv_emit_int_compare
                       riscv_extend_comparands


Last function is what introduces the extraneous sign extends, w/o taking 
subreg-promoted into consideration and what my patch attempts to address.

>   - Should some common-code part be more suited to handle that?
>     We already elide redundant sign-zero extensions for other
>     reasons.  Maybe we could add subreg promoted handling there?

Not in the context of this specific issue.

-Vineet
  
Jeff Law Oct. 25, 2023, 4:30 p.m. UTC | #6
On 10/25/23 10:25, Vineet Gupta wrote:
> Hey Robin,
> 
> On 10/25/23 00:12, Robin Dapp wrote:
>> Hi Vineet,
>>
>> I was thinking of two things while skimming the code:
>>
>>   - Couldn't we do this in the expanders directly?  Or is the
>>     subreg-promoted info gone until we reach that?
> 
> Following is the call stack involved:
> 
>    expand_gimple_cond
>      do_compare_and_jump
>         emit_cmp_and_jump_insns
>             gen_cbranchqi4
>                 riscv_expand_conditional_branch
>                     riscv_emit_int_compare
>                        riscv_extend_comparands
> 
> 
> Last function is what introduces the extraneous sign extends, w/o taking 
> subreg-promoted into consideration and what my patch attempts to address.
> 
>>   - Should some common-code part be more suited to handle that?
>>     We already elide redundant sign-zero extensions for other
>>     reasons.  Maybe we could add subreg promoted handling there?
> 
> Not in the context of this specific issue.
Robin's point (IIUC) is that if we put this logic into a zero/sign 
extend expander, then it'll get used for *any* attempt to zero/sign 
extend that goes through the target expander.

It doesn't work for your case because we use gen_rtx_{ZERO,SIGN}_EXTEND 
directly.   But if those were adjusted to use the expander, then Robin's 
idea would be applicable to this case too.

Jeff
  
Vineet Gupta Oct. 25, 2023, 4:31 p.m. UTC | #7
On 10/25/23 09:30, Jeff Law wrote:
>>>   - Should some common-code part be more suited to handle that?
>>>     We already elide redundant sign-zero extensions for other
>>>     reasons.  Maybe we could add subreg promoted handling there?
>>
>> Not in the context of this specific issue.
> Robin's point (IIUC) is that if we put this logic into a zero/sign 
> extend expander, then it'll get used for *any* attempt to zero/sign 
> extend that goes through the target expander.
>
> It doesn't work for your case because we use 
> gen_rtx_{ZERO,SIGN}_EXTEND directly.   But if those were adjusted to 
> use the expander, then Robin's idea would be applicable to this case too

Understood. Definitely solid idea.

-Vineet
  
Vineet Gupta Oct. 25, 2023, 4:37 p.m. UTC | #8
On 10/25/23 06:52, Jeff Law wrote:
>
> On 10/25/23 07:47, Robin Dapp wrote:
>>
>>> Well, it doesn't seem like there's a lot of difference between doing
>>> it in the generic expander bits vs target expander bits -- the former
>>> just calls into the latter for the most part.  Thus if the
>>> subreg-promoted state is available in the target expander, I'd expect
>>> it to be available in the generic expander.
>>
>> Ah, sorry I meant in the [sign|zero]extend expanders rather than the
>> compare expanders in order to catch promoted subregs from other origins
>> as well.  Maybe that doesn't work, though?
> That's a *really* interesting idea.  Interested in playing around a 
> bit with that Vineet?

Sure I'll tinker with the {sign,zero}expanders.

And there's a third playing field :-) There seem to be still cases where 
subreg-promoted note is not set when it probably should.
So we end up in riscv_extend_comparands but with note not being there 
(for something corresponding to function arg) thus can't skip the extension.

Thx,
-Vineet
  
Vineet Gupta Oct. 25, 2023, 11:28 p.m. UTC | #9
On 10/24/23 22:01, Vineet Gupta wrote:
> RV64 comapre and branch instructions only support 64-bit operands.
> The backend unconditionally generates zero/sign extend at Expand time
> for compare and branch operands even if they are already as such
> e.g. function args which ABI guarantees to be sign-extended (at callsite).
>
> And subsequently REE fails to eliminate them as
> 	"missing defintion(s)"
> specially for function args since they show up as missing explicit
> definition.
>
> So elide the sign extensions at Expand time for a subreg promoted var
> with inner word sized value whcih doesn't need explicit sign extending
> (fairly good represntative of an incoming function arg).
>
> There are patches floating to enhance REE and/or new passes to elimiate
> extensions, but it is always better to not generate them if possible,
> given Expand is so early, the elimiated extend would help improve outcomes
> of later passes such as combine if they have fewer operations/combinations
> to deal with.
>
> The test used was existing gcc.c-torture/execute/20020506-1.c -O2 zbb
> It elimiates the SEXT.W and an additional branch
>
> before					after
> -------					------
> test2:					test2:
> 	sext.b	a0,a0
> 	blt	a0,zero,.L15
> 	bne	a1,zero,.L17			bne     a1,zero,.L17
>
> This is marked RFC as I only ran a spot check with a directed test and
> want to use Patrick's pre-commit CI to do the A/B testing for me.
>
> gcc/ChangeLog:
> 	* config/riscv/riscv.cc (riscv_extend_comparands): Don't
> 	sign-extend operand if subreg promoted with inner word size.
>
> Signed-off-by: Vineet Gupta <vineetg@rivosinc.com>
> ---
>   gcc/config/riscv/riscv.cc | 11 ++++++++++-
>   1 file changed, 10 insertions(+), 1 deletion(-)
>
> diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
> index f2dcb0db6fbd..a8d12717e43d 100644
> --- a/gcc/config/riscv/riscv.cc
> +++ b/gcc/config/riscv/riscv.cc
> @@ -3707,7 +3707,16 @@ riscv_extend_comparands (rtx_code code, rtx *op0, rtx *op1)
>   	}
>         else
>   	{
> -	  *op0 = gen_rtx_SIGN_EXTEND (word_mode, *op0);
> +	/* A subreg promoted SI of DI could be peeled to expose DI, eliding
> +	   an unnecessary sign extension.  */
> +	if (GET_CODE (*op0) == SUBREG
> +	    && SUBREG_PROMOTED_VAR_P (*op0)
> +	    && GET_MODE_SIZE (GET_MODE (XEXP (*op0, 0))).to_constant ()
> +	       == GET_MODE_SIZE (word_mode))
> +	     *op0 = XEXP (*op0, 0);
> +	else
> +	     *op0 = gen_rtx_SIGN_EXTEND (word_mode, *op0);
> +
>   	  if (*op1 != const0_rtx)
>   	    *op1 = gen_rtx_SIGN_EXTEND (word_mode, *op1);
>   	}

Just a quick update that testing revealed additional failures and 
unpacking which took me a while and in hindsight embarrassing. I was 
misunderstanding what ABI guarantees and what ISA actually does :-)

The ABI guarantees sign extension this for 32-bit things in 64-bit reg 
(so in rv64, int), but *not* 8-bit things in a reg. i.e. not for char 
arguments.
e.g.

uint8_t abs8(uint8_t x)
{
    if (x & 0x80) // SEXT.b a4, a0
       ...
}

main()
{
     if (abs8(128) != 127)     // LI a0, 128  => ADDI a0, x0, 128
        __builtin_abort();
}

So my patch was optimizing away the SEXT.B (despite claiming that subreg 
prom of SI....) which is wrong.
Sure ADDI in main () sign-extends, but it does that for 12-bit imm 
0x080, which comes out to 0x80, but sign-extending for char scope 0x80 
would yield 0xFFFF....0x80. Hence the issue.

I'll spin a v2 after more testing.

This is slightly disappointing as it reduces the scope of optimization, 
but correctness in this trade is non-negotiable :-)

-Vineet
  

Patch

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index f2dcb0db6fbd..a8d12717e43d 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -3707,7 +3707,16 @@  riscv_extend_comparands (rtx_code code, rtx *op0, rtx *op1)
 	}
       else
 	{
-	  *op0 = gen_rtx_SIGN_EXTEND (word_mode, *op0);
+	/* A subreg promoted SI of DI could be peeled to expose DI, eliding
+	   an unnecessary sign extension.  */
+	if (GET_CODE (*op0) == SUBREG
+	    && SUBREG_PROMOTED_VAR_P (*op0)
+	    && GET_MODE_SIZE (GET_MODE (XEXP (*op0, 0))).to_constant ()
+	       == GET_MODE_SIZE (word_mode))
+	     *op0 = XEXP (*op0, 0);
+	else
+	     *op0 = gen_rtx_SIGN_EXTEND (word_mode, *op0);
+
 	  if (*op1 != const0_rtx)
 	    *op1 = gen_rtx_SIGN_EXTEND (word_mode, *op1);
 	}