[2/4,ifcvt] if convert x=c ? y+z : y by RISC-V Zicond like insns

Message ID 20231030072523.26818-3-gaofei@eswincomputing.com
State Unresolved
Headers
Series add support for conditional zero operation |

Checks

Context Check Description
snail/gcc-patch-check warning Git am fail log

Commit Message

Fei Gao Oct. 30, 2023, 7:25 a.m. UTC
  Conditional add, if zero
rd = (rc == 0) ? (rs1 + rs2) : rs1
-->
czero.nez rd, rs2, rc
add rd, rs1, rd

Conditional add, if non-zero
rd = (rc != 0) ? (rs1 + rs2) : rs1
-->
czero.eqz rd, rs2, rc
add rd, rs1, rd

Co-authored-by: Xiao Zeng<zengxiao@eswincomputing.com>

gcc/ChangeLog:

        * ifcvt.cc (noce_emit_czero): helper for noce_try_cond_zero_arith
        (noce_try_cond_zero_arith): handler for condtional zero op
        (noce_process_if_block): add noce_try_cond_zero_arith with hook control

gcc/testsuite/ChangeLog:

        * gcc.target/riscv/zicond_ifcvt_opt.c: New test.
---
 gcc/ifcvt.cc                                  | 112 +++++++++++++++
 .../gcc.target/riscv/zicond_ifcvt_opt.c       | 130 ++++++++++++++++++
 2 files changed, 242 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/zicond_ifcvt_opt.c
  

Comments

Jeff Law Oct. 30, 2023, 4:36 p.m. UTC | #1
On 10/30/23 01:25, Fei Gao wrote:
> Conditional add, if zero
> rd = (rc == 0) ? (rs1 + rs2) : rs1
> -->
> czero.nez rd, rs2, rc
> add rd, rs1, rd
> 
> Conditional add, if non-zero
> rd = (rc != 0) ? (rs1 + rs2) : rs1
> -->
> czero.eqz rd, rs2, rc
> add rd, rs1, rd
> 
> Co-authored-by: Xiao Zeng<zengxiao@eswincomputing.com>
> 
> gcc/ChangeLog:
> 
>          * ifcvt.cc (noce_emit_czero): helper for noce_try_cond_zero_arith
>          (noce_try_cond_zero_arith): handler for condtional zero op
>          (noce_process_if_block): add noce_try_cond_zero_arith with hook control
> 
> gcc/testsuite/ChangeLog:
> 
>          * gcc.target/riscv/zicond_ifcvt_opt.c: New test.
So the idea here is to improve upon the current code we generate for 
conditional arithmetic.  Right now we support conditional arithmetic 
using zicond, but the sequence is poor.

Basically the if-converter knows how to generate a conditional add, but 
it does so in a way that isn't as efficient as it could be.

In effect ifcvt wants to generate

t = a + b
res = cond ? t : b


We want to change it to

t = cond ? b : 0;
res = a + t;

The latter sequence expands to more efficient code trivially for risc-v.

I wandered a bit through the combine dumps to see if it would be easy to 
capture this class of cases.  We never get anything useful, and while I 
can imagine "bridge" patterns that would potentially expose enough RTL 
to allow us to rewrite without changing ifcvt, it'd just be a hack IMHO.

So going back to ifcvt...

In the first sequence the addition must wait for both "a" and "b" to be 
available and the conditional move can fire on the next cycle.

In the second sequence the conditional move can fire when just "b" is 
available.  So that gives "a" another cycle to become ready (say if it's 
coming from memory or a multi-cycle operation like multiply).

On the other hand the second sequence does keep "a" live longer.

In the end I strongly suspect neither sequence is significantly better 
than the other.  Meaning I don't think we need to conditionalize using 
condzero arith at all.


I'll note that subsequent patches add MINUS, IOR, XOR and AND.  It's 
also possible (and important) to handle shifts.  There's a conditional 
shift-by-6 in leela's hot path.

Overall this looks a lot like the VRULL code, but just less complete. 
My inclination is to do a cleanup pass on the VRULL code verify it 
handles all the cases in your tests and commit the VRULL implementation 
with your tests.

I'll do some further poking at this today.  Thanks for re-submitting 
these bits.  Getting this target independent work cleaned up has been on 
my TODO for a while now.

jeff
  
Jeff Law Oct. 30, 2023, 6:41 p.m. UTC | #2
On 10/30/23 01:25, Fei Gao wrote:
> Conditional add, if zero
> rd = (rc == 0) ? (rs1 + rs2) : rs1
> -->
> czero.nez rd, rs2, rc
> add rd, rs1, rd
> 
> Conditional add, if non-zero
> rd = (rc != 0) ? (rs1 + rs2) : rs1
> -->
> czero.eqz rd, rs2, rc
> add rd, rs1, rd
> 
> Co-authored-by: Xiao Zeng<zengxiao@eswincomputing.com>
> 
> gcc/ChangeLog:
> 
>          * ifcvt.cc (noce_emit_czero): helper for noce_try_cond_zero_arith
>          (noce_try_cond_zero_arith): handler for condtional zero op
>          (noce_process_if_block): add noce_try_cond_zero_arith with hook control
> 
> gcc/testsuite/ChangeLog:
> 
>          * gcc.target/riscv/zicond_ifcvt_opt.c: New test.
Just an intermediate follow-up.  As I expected, the work we have 
internally fixes all the cases included in this patch.  Similarly for 
patch #3.

jeff
  
Jeff Law Oct. 30, 2023, 7:16 p.m. UTC | #3
On 10/30/23 01:25, Fei Gao wrote:
> Conditional add, if zero
> rd = (rc == 0) ? (rs1 + rs2) : rs1
> -->
> czero.nez rd, rs2, rc
> add rd, rs1, rd
> 
> Conditional add, if non-zero
> rd = (rc != 0) ? (rs1 + rs2) : rs1
> -->
> czero.eqz rd, rs2, rc
> add rd, rs1, rd
> 
> Co-authored-by: Xiao Zeng<zengxiao@eswincomputing.com>
> 
> gcc/ChangeLog:
> 
>          * ifcvt.cc (noce_emit_czero): helper for noce_try_cond_zero_arith
>          (noce_try_cond_zero_arith): handler for condtional zero op
>          (noce_process_if_block): add noce_try_cond_zero_arith with hook control
> 
> gcc/testsuite/ChangeLog:
> 
>          * gcc.target/riscv/zicond_ifcvt_opt.c: New test.
> ---
>   gcc/ifcvt.cc                                  | 112 +++++++++++++++
>   .../gcc.target/riscv/zicond_ifcvt_opt.c       | 130 ++++++++++++++++++
>   2 files changed, 242 insertions(+)
>   create mode 100644 gcc/testsuite/gcc.target/riscv/zicond_ifcvt_opt.c
> 
> diff --git a/gcc/ifcvt.cc b/gcc/ifcvt.cc
> index a0af553b9ff..4f98c1c7bf9 100644
> --- a/gcc/ifcvt.cc
> +++ b/gcc/ifcvt.cc
> +static rtx
> +noce_emit_czero (struct noce_if_info *if_info, enum rtx_code czero_code, rtx z, rtx target)
> +{
> +  machine_mode mode = GET_MODE (target);
> +  rtx cond_op0 = XEXP (if_info->cond, 0);
> +  rtx czero_cond
> +    = gen_rtx_fmt_ee (czero_code, GET_MODE (cond_op0), cond_op0, const0_rtx);
> +  rtx if_then_else = gen_rtx_IF_THEN_ELSE (mode, czero_cond, const0_rtx, z);
> +  rtx set = gen_rtx_SET (target, if_then_else);
> +
> +  start_sequence ();
> +  rtx_insn *insn = emit_insn (set);
> +
> +  if (recog_memoized (insn) >= 0)
> +    {
> +      rtx_insn *seq = get_insns ();
> +      end_sequence ();
> +      emit_insn (seq);
> +
> +      return target;
> +    }
> +
> +  end_sequence ();
> +  return NULL_RTX;
> +}
So just a few notes to further illustrate why I'm currently looking to 
take the VRULL+Ventana implementation.  The code above would be much 
better handled by just calling noce_emit_cmove.  noce_emit_cmove will go 
through the conditional move expander.  So any improvement we make in 
the expander "just work" when called from the if-converter.
> +
>   /* Try only simple constants and registers here.  More complex cases
>      are handled in noce_try_cmove_arith after noce_try_store_flag_arith
>      has had a go at it.  */
> @@ -2880,6 +2908,88 @@ noce_try_sign_mask (struct noce_if_info *if_info)
>     return true;
>   }
>   
> +/* Convert x = c ? y + z : y or x = c ? y : y + z. */
> +
> +static bool
> +noce_try_cond_zero_arith (struct noce_if_info *if_info)
> +{
> +  rtx target;
> +  rtx_insn *seq;
> +  machine_mode mode = GET_MODE (if_info->x);
> +  rtx common = NULL_RTX;
> +  enum rtx_code czero_code = UNKNOWN;
> +  rtx a = if_info->a;
> +  rtx b = if_info->b;
> +  rtx z = NULL_RTX;
> +  rtx cond = if_info->cond;
> +
> +  if (!noce_simple_bbs (if_info))
> +    return false;
[ ... ]
So the internal code we have does a bit of canonicalization before the 
optimizing transformations.  In particular we may be presented with

(a == 0) ? b : a which we transform into (a != 0 ? a : b) which allows 
us to pick up more cases.  (b != 0 ? b : a) gets similar handling.

As I mentioned earlier, the VRULL+Ventana code handles wrapping 
extensions & subregs.  Our code also handles if-converting shifts/rotates.

Hopefully that explains a bit more why I think cleaning up the 
VRULL+Ventana code is a better choice.

jeff
  
Fei Gao Oct. 31, 2023, 2:53 a.m. UTC | #4
On 2023-10-31 00:36  Jeff Law <jeffreyalaw@gmail.com> wrote:
>
>
>
>On 10/30/23 01:25, Fei Gao wrote:
>> Conditional add, if zero
>> rd = (rc == 0) ? (rs1 + rs2) : rs1
>> -->
>> czero.nez rd, rs2, rc
>> add rd, rs1, rd
>>
>> Conditional add, if non-zero
>> rd = (rc != 0) ? (rs1 + rs2) : rs1
>> -->
>> czero.eqz rd, rs2, rc
>> add rd, rs1, rd
>>
>> Co-authored-by: Xiao Zeng<zengxiao@eswincomputing.com>
>>
>> gcc/ChangeLog:
>>
>>          * ifcvt.cc (noce_emit_czero): helper for noce_try_cond_zero_arith
>>          (noce_try_cond_zero_arith): handler for condtional zero op
>>          (noce_process_if_block): add noce_try_cond_zero_arith with hook control
>>
>> gcc/testsuite/ChangeLog:
>>
>>          * gcc.target/riscv/zicond_ifcvt_opt.c: New test.
>So the idea here is to improve upon the current code we generate for
>conditional arithmetic.  Right now we support conditional arithmetic
>using zicond, but the sequence is poor.
>
>Basically the if-converter knows how to generate a conditional add, but
>it does so in a way that isn't as efficient as it could be.
>
>In effect ifcvt wants to generate
>
>t = a + b
>res = cond ? t : b
>
>
>We want to change it to
>
>t = cond ? b : 0;
>res = a + t;
>
>The latter sequence expands to more efficient code trivially for risc-v. 
Exactly. 2 less insns for add case below:
long test_ADD_ceqz(long x, long y, long z, long c){
  if (c)
    x = y + z;  
  else
    x = y;  
  return x;
  }
  
test_ADD_ceqz(before this patch): 
  add a2,a1,a2
  czero.eqz a0,a2,a3
  czero.nez a3,a1,a3
  or a0,a3,a0
ret

test_ADD_ceqz(after this patch):
  czero.eqz a3,a2,a3
  add a0,a1,a3
  ret
>
>I wandered a bit through the combine dumps to see if it would be easy to
>capture this class of cases.  We never get anything useful, and while I
>can imagine "bridge" patterns that would potentially expose enough RTL
>to allow us to rewrite without changing ifcvt, it'd just be a hack IMHO.
>
>So going back to ifcvt...
>
>In the first sequence the addition must wait for both "a" and "b" to be
>available and the conditional move can fire on the next cycle.
>
>In the second sequence the conditional move can fire when just "b" is
>available.  So that gives "a" another cycle to become ready (say if it's
>coming from memory or a multi-cycle operation like multiply).
>
>On the other hand the second sequence does keep "a" live longer.
>
>In the end I strongly suspect neither sequence is significantly better
>than the other.  Meaning I don't think we need to conditionalize using
>condzero arith at all. 
As shown case above, 2 less insns with using condzero arith.

>
>
>I'll note that subsequent patches add MINUS, IOR, XOR and AND.  It's
>also possible (and important) to handle shifts.  There's a conditional
>shift-by-6 in leela's hot path. 
This series is a initial framework for simple condzero arith. Shift may come later
as it involes sugreg stuff.

>
>Overall this looks a lot like the VRULL code, but just less complete.
>My inclination is to do a cleanup pass on the VRULL code verify it
>handles all the cases in your tests and commit the VRULL implementation
>with your tests. 
I searched and didn't find VRULL codes, could you please provide a link at
your convience? My colleague Zeng Xiao posted monthes ago 
https://patchwork.sourceware.org/project/gcc/patch/20230719101156.21771-6-zengxiao@eswincomputing.com/
But after fixing several bugs, we realized the previous implementation is quite complex and
come up with this patch series.

>
>I'll do some further poking at this today.  Thanks for re-submitting
>these bits.  Getting this target independent work cleaned up has been on
>my TODO for a while now. 
Thanks for your patience.

BR, 
Fei

>
>jeff
  
Fei Gao Oct. 31, 2023, 3:35 a.m. UTC | #5
On 2023-10-31 03:16  Jeff Law <jeffreyalaw@gmail.com> wrote:
>
>
>
>On 10/30/23 01:25, Fei Gao wrote:
>> Conditional add, if zero
>> rd = (rc == 0) ? (rs1 + rs2) : rs1
>> -->
>> czero.nez rd, rs2, rc
>> add rd, rs1, rd
>>
>> Conditional add, if non-zero
>> rd = (rc != 0) ? (rs1 + rs2) : rs1
>> -->
>> czero.eqz rd, rs2, rc
>> add rd, rs1, rd
>>
>> Co-authored-by: Xiao Zeng<zengxiao@eswincomputing.com>
>>
>> gcc/ChangeLog:
>>
>>          * ifcvt.cc (noce_emit_czero): helper for noce_try_cond_zero_arith
>>          (noce_try_cond_zero_arith): handler for condtional zero op
>>          (noce_process_if_block): add noce_try_cond_zero_arith with hook control
>>
>> gcc/testsuite/ChangeLog:
>>
>>          * gcc.target/riscv/zicond_ifcvt_opt.c: New test.
>> ---
>>   gcc/ifcvt.cc                                  | 112 +++++++++++++++
>>   .../gcc.target/riscv/zicond_ifcvt_opt.c       | 130 ++++++++++++++++++
>>   2 files changed, 242 insertions(+)
>>   create mode 100644 gcc/testsuite/gcc.target/riscv/zicond_ifcvt_opt.c
>>
>> diff --git a/gcc/ifcvt.cc b/gcc/ifcvt.cc
>> index a0af553b9ff..4f98c1c7bf9 100644
>> --- a/gcc/ifcvt.cc
>> +++ b/gcc/ifcvt.cc
>> +static rtx
>> +noce_emit_czero (struct noce_if_info *if_info, enum rtx_code czero_code, rtx z, rtx target)
>> +{
>> +  machine_mode mode = GET_MODE (target);
>> +  rtx cond_op0 = XEXP (if_info->cond, 0);
>> +  rtx czero_cond
>> +    = gen_rtx_fmt_ee (czero_code, GET_MODE (cond_op0), cond_op0, const0_rtx);
>> +  rtx if_then_else = gen_rtx_IF_THEN_ELSE (mode, czero_cond, const0_rtx, z);
>> +  rtx set = gen_rtx_SET (target, if_then_else);
>> +
>> +  start_sequence ();
>> +  rtx_insn *insn = emit_insn (set);
>> +
>> +  if (recog_memoized (insn) >= 0)
>> +    {
>> +      rtx_insn *seq = get_insns ();
>> +      end_sequence ();
>> +      emit_insn (seq);
>> +
>> +      return target;
>> +    }
>> +
>> +  end_sequence ();
>> +  return NULL_RTX;
>> +}
>So just a few notes to further illustrate why I'm currently looking to
>take the VRULL+Ventana implementation.  The code above would be much
>better handled by just calling noce_emit_cmove.  noce_emit_cmove will go
>through the conditional move expander.  So any improvement we make in
>the expander "just work" when called from the if-converter. 
noce_emit_czero is used here to make sure czero insns are emited. 
noce_emit_cmove includes SFB and Thead movcc, which will take precedence
over zicond in RISCV if enabled. Unfortunately we have products with SFB and Zicond
both available and saw such conflict. 
And that is also the reason to add hook TARGET_HAVE_COND_ZERO
in [PATCH 1/4] to disallow ineffient code emited by SFB enable and Zicond disabled case. 

>> +
>>   /* Try only simple constants and registers here.  More complex cases
>>      are handled in noce_try_cmove_arith after noce_try_store_flag_arith
>>      has had a go at it.  */
>> @@ -2880,6 +2908,88 @@ noce_try_sign_mask (struct noce_if_info *if_info)
>>     return true;
>>   }
>>  
>> +/* Convert x = c ? y + z : y or x = c ? y : y + z. */
>> +
>> +static bool
>> +noce_try_cond_zero_arith (struct noce_if_info *if_info)
>> +{
>> +  rtx target;
>> +  rtx_insn *seq;
>> +  machine_mode mode = GET_MODE (if_info->x);
>> +  rtx common = NULL_RTX;
>> +  enum rtx_code czero_code = UNKNOWN;
>> +  rtx a = if_info->a;
>> +  rtx b = if_info->b;
>> +  rtx z = NULL_RTX;
>> +  rtx cond = if_info->cond;
>> +
>> +  if (!noce_simple_bbs (if_info))
>> +    return false;
>[ ... ]
>So the internal code we have does a bit of canonicalization before the
>optimizing transformations.  In particular we may be presented with
>
>(a == 0) ? b : a which we transform into (a != 0 ? a : b) which allows
>us to pick up more cases.  (b != 0 ? b : a) gets similar handling.
>
>As I mentioned earlier, the VRULL+Ventana code handles wrapping
>extensions & subregs.  Our code also handles if-converting shifts/rotates. 
Cool and waiting for your submit. Shifts/rotates can be added in noce_try_cond_zero_arith.
I tried to keep noce_try_cond_zero_arith simple without introducing SCC and other stuff
as addtional insns will be generated for greater than like comparision
but may not be generated for branch-insn based SFB.
IMHO, the earlier the noce_try* function emerges in noce_process_if_block, the simpler
optimization scenarios are and more efficent codes shall be generated,
then the later function like noce_try_cmove_arith will handle the more general case.

BR, 
Fei
>
>Hopefully that explains a bit more why I think cleaning up the
>VRULL+Ventana code is a better choice. 

>
>jeff
  
Jeff Law Nov. 20, 2023, 6:46 a.m. UTC | #6
On 10/30/23 21:35, Fei Gao wrote:

>> So just a few notes to further illustrate why I'm currently looking to
>> take the VRULL+Ventana implementation.  The code above would be much
>> better handled by just calling noce_emit_cmove.  noce_emit_cmove will go
>> through the conditional move expander.  So any improvement we make in
>> the expander "just work" when called from the if-converter.
> noce_emit_czero is used here to make sure czero insns are emited.
> noce_emit_cmove includes SFB and Thead movcc, which will take precedence
> over zicond in RISCV if enabled. Unfortunately we have products with SFB and Zicond
> both available and saw such conflict.
> And that is also the reason to add hook TARGET_HAVE_COND_ZERO
> in [PATCH 1/4] to disallow ineffient code emited by SFB enable and Zicond disabled case.
I understand what you're trying to do, but I would consider the 
TARGET_HAVE_COND_ZERO fundamentally the wrong approach.

I'm willing to defer routing everything through noce_emit_cmove for now, 
but that's really where this code needs to be going.  If that's causing 
a conflict for a particular implementation with both SFB and Zicond, 
then we'll have to look at the details and adjust things in the target 
files.


> Cool and waiting for your submit. Shifts/rotates can be added in noce_try_cond_zero_arith.
Fully agreed.  Those are easy.

> I tried to keep noce_try_cond_zero_arith simple without introducing SCC and other stuff
> as addtional insns will be generated for greater than like comparision
> but may not be generated for branch-insn based SFB.
And I think the result is we're going to fail to implement many 
profitable if-conversions.


Jeff
  
Jeff Law Nov. 20, 2023, 6:59 a.m. UTC | #7
On 10/30/23 01:25, Fei Gao wrote:
> Conditional add, if zero
> rd = (rc == 0) ? (rs1 + rs2) : rs1
> -->
> czero.nez rd, rs2, rc
> add rd, rs1, rd
> 
> Conditional add, if non-zero
> rd = (rc != 0) ? (rs1 + rs2) : rs1
> -->
> czero.eqz rd, rs2, rc
> add rd, rs1, rd
> 
> Co-authored-by: Xiao Zeng<zengxiao@eswincomputing.com>
> 
> gcc/ChangeLog:
> 
>          * ifcvt.cc (noce_emit_czero): helper for noce_try_cond_zero_arith
>          (noce_try_cond_zero_arith): handler for condtional zero op
>          (noce_process_if_block): add noce_try_cond_zero_arith with hook control
> 
> gcc/testsuite/ChangeLog:
> 
>          * gcc.target/riscv/zicond_ifcvt_opt.c: New test.
> ---
>   gcc/ifcvt.cc                                  | 112 +++++++++++++++
>   .../gcc.target/riscv/zicond_ifcvt_opt.c       | 130 ++++++++++++++++++
>   2 files changed, 242 insertions(+)
>   create mode 100644 gcc/testsuite/gcc.target/riscv/zicond_ifcvt_opt.c
> 
> diff --git a/gcc/ifcvt.cc b/gcc/ifcvt.cc
> index a0af553b9ff..4f98c1c7bf9 100644
> --- a/gcc/ifcvt.cc
> +++ b/gcc/ifcvt.cc
> @@ -781,12 +781,14 @@ static bool noce_try_store_flag_constants (struct noce_if_info *);
>   static bool noce_try_store_flag_mask (struct noce_if_info *);
>   static rtx noce_emit_cmove (struct noce_if_info *, rtx, enum rtx_code, rtx,
>   			    rtx, rtx, rtx, rtx = NULL, rtx = NULL);
> +static rtx noce_emit_czero (struct noce_if_info *, enum rtx_code, rtx, rtx);
>   static bool noce_try_cmove (struct noce_if_info *);
>   static bool noce_try_cmove_arith (struct noce_if_info *);
>   static rtx noce_get_alt_condition (struct noce_if_info *, rtx, rtx_insn **);
>   static bool noce_try_minmax (struct noce_if_info *);
>   static bool noce_try_abs (struct noce_if_info *);
>   static bool noce_try_sign_mask (struct noce_if_info *);
> +static bool noce_try_cond_zero_arith (struct noce_if_info *);
>   
>   /* Return the comparison code for reversed condition for IF_INFO,
>      or UNKNOWN if reversing the condition is not possible.  */
> @@ -1831,6 +1833,32 @@ noce_emit_cmove (struct noce_if_info *if_info, rtx x, enum rtx_code code,
>       return NULL_RTX;
>   }
>   
> +static rtx
> +noce_emit_czero (struct noce_if_info *if_info, enum rtx_code czero_code, rtx z, rtx target)
Every function needs a comment describing what the function does, it's 
return value(s) and its arguments.  There are many examples in ifcvt.cc 
you can use to guide you.  I might start with something like this:

/* Emit a conditional zero, returning the location of the result
    or NULL_RTX upon failure.

    IF_INFO describes the if-conversion scenario under consideration.
    CZERO_CODE selects the condition (EQ/NE).
    Z is the nonzero operand of the conditional move
    TARGET is the desired output register.  */

Or something like that.  I would suggest renaming "Z" to something more 
meaningful.



>   
> +/* Convert x = c ? y + z : y or x = c ? y : y + z. */
> +
> +static bool
> +noce_try_cond_zero_arith (struct noce_if_info *if_info)
The function comment really should be improved.  For example it doesn't 
indicate what the return value is.

> +
> +  /* cond must be EQ or NEQ comparision of a reg and 0.  */
In general when you refer to a variable in a comment, do so in upper 
case.  Use NE rather than NEQ as the former is how most code refers to a 
not-equal rtx code.


> +  if (GET_CODE (cond) != NE && GET_CODE (cond) != EQ)
> +    return false;
> +  if (!REG_P (XEXP (cond, 0)) || !rtx_equal_p (XEXP (cond, 1), const0_rtx))
> +    return false;
> +
> +  /* check y + z:y*/
> +  if (GET_CODE (a) == PLUS && REG_P (XEXP (a, 0)) && REG_P (XEXP (a, 1))
> +      && REG_P (b) && rtx_equal_p (XEXP (a, 0), b))
Write comments as complete sentences.

> +    {
> +      common = b;
> +      z = XEXP (a, 1);
Rather than "z" use a more descriptive variable name.


> +
> +  /* If we have x = c ? x + z : x, use a new reg to avoid modifying x  */
> +  if (common && rtx_equal_p (common, if_info->x))
> +    target = gen_reg_rtx (mode);
> +  else
> +    target = if_info->x;
if-conversion runs both before and after register allocation.  So you 
have to handle the case where you can not generate new registers.  Use 
can_create_pseudo_p () to test for that.  You may need to fail if you 
can't generate a new register.

> +
> +  target = noce_emit_czero (if_info, czero_code, z, target);
> +  if (!target)
> +    {
> +      end_sequence ();
> +      return false;
> +    }
> +
> +  target = expand_simple_binop (mode, PLUS, common, target, if_info->x, 0,
> +				OPTAB_DIRECT);
I think you want OPTAB_WIDEN and in the other calls.

> @@ -3975,6 +4085,8 @@ noce_process_if_block (struct noce_if_info *if_info)
>   	goto success;
>         if (noce_try_store_flag_mask (if_info))
>   	goto success;
> +      if (targetm.have_cond_zero () && noce_try_cond_zero_arith (if_info))
> +	goto success;
Replace targetm.have_cond_zero with HAVE_conditional_move since that's 
the RTL primitive we're building from.


>
> +**test_ADD_ceqz:
> +**	czero\.eqz	a3,a2,a3
> +**	add	a0,a1,a3
> +**	ret
Please don't use explicit registers unless you know they will always be 
correct.  In this sequence there's no guarantee the register allocator 
will put the result of the czero.eqz into $a3.  Use a suitable regexp 
instead to match a variety of registers.  This will be an issue for all 
your new tests.



Jeff
  
Fei Gao Nov. 28, 2023, 2:46 a.m. UTC | #8
On 2023-11-20 14:46  Jeff Law <jeffreyalaw@gmail.com> wrote:
>
>
>
>On 10/30/23 21:35, Fei Gao wrote:
>
>>> So just a few notes to further illustrate why I'm currently looking to
>>> take the VRULL+Ventana implementation.  The code above would be much
>>> better handled by just calling noce_emit_cmove.  noce_emit_cmove will go
>>> through the conditional move expander.  So any improvement we make in
>>> the expander "just work" when called from the if-converter.
>> noce_emit_czero is used here to make sure czero insns are emited.
>> noce_emit_cmove includes SFB and Thead movcc, which will take precedence
>> over zicond in RISCV if enabled. Unfortunately we have products with SFB and Zicond
>> both available and saw such conflict.
>> And that is also the reason to add hook TARGET_HAVE_COND_ZERO
>> in [PATCH 1/4] to disallow ineffient code emited by SFB enable and Zicond disabled case.
>I understand what you're trying to do, but I would consider the
>TARGET_HAVE_COND_ZERO fundamentally the wrong approach. 
Hi Jeff

Thanks for your review. I just post the new series.
https://www.mail-archive.com/gcc-patches@gcc.gnu.org/msg327148.html
https://www.mail-archive.com/gcc-patches@gcc.gnu.org/msg327151.html
https://www.mail-archive.com/gcc-patches@gcc.gnu.org/msg327149.html
https://www.mail-archive.com/gcc-patches@gcc.gnu.org/msg327150.html

TARGET_HAVE_COND_ZERO has been deleted. 

>
>I'm willing to defer routing everything through noce_emit_cmove for now,
>but that's really where this code needs to be going.  If that's causing
>a conflict for a particular implementation with both SFB and Zicond,
>then we'll have to look at the details and adjust things in the target
>files. 
Agree. We can try noce_emit_cmove later with more TCs integrated recently.
Also I tried to solve the conflict found in my TCs in [PATCH 1/4] and [PATCH 4/4].

>
>
>> Cool and waiting for your submit. Shifts/rotates can be added in noce_try_cond_zero_arith.
>Fully agreed.  Those are easy. 
Shifts/rotates have been added. 

BR, 
Fei
>
>> I tried to keep noce_try_cond_zero_arith simple without introducing SCC and other stuff
>> as addtional insns will be generated for greater than like comparision
>> but may not be generated for branch-insn based SFB.
>And I think the result is we're going to fail to implement many
>profitable if-conversions.
>
>
>Jeff
  
Fei Gao Nov. 28, 2023, 2:57 a.m. UTC | #9
On 2023-11-20 14:59  Jeff Law <jeffreyalaw@gmail.com> wrote:
>
>
>
>On 10/30/23 01:25, Fei Gao wrote:
>> Conditional add, if zero
>> rd = (rc == 0) ? (rs1 + rs2) : rs1
>> -->
>> czero.nez rd, rs2, rc
>> add rd, rs1, rd
>>
>> Conditional add, if non-zero
>> rd = (rc != 0) ? (rs1 + rs2) : rs1
>> -->
>> czero.eqz rd, rs2, rc
>> add rd, rs1, rd
>>
>> Co-authored-by: Xiao Zeng<zengxiao@eswincomputing.com>
>>
>> gcc/ChangeLog:
>>
>>          * ifcvt.cc (noce_emit_czero): helper for noce_try_cond_zero_arith
>>          (noce_try_cond_zero_arith): handler for condtional zero op
>>          (noce_process_if_block): add noce_try_cond_zero_arith with hook control
>>
>> gcc/testsuite/ChangeLog:
>>
>>          * gcc.target/riscv/zicond_ifcvt_opt.c: New test.
>> ---
>>   gcc/ifcvt.cc                                  | 112 +++++++++++++++
>>   .../gcc.target/riscv/zicond_ifcvt_opt.c       | 130 ++++++++++++++++++
>>   2 files changed, 242 insertions(+)
>>   create mode 100644 gcc/testsuite/gcc.target/riscv/zicond_ifcvt_opt.c
>>
>> diff --git a/gcc/ifcvt.cc b/gcc/ifcvt.cc
>> index a0af553b9ff..4f98c1c7bf9 100644
>> --- a/gcc/ifcvt.cc
>> +++ b/gcc/ifcvt.cc
>> @@ -781,12 +781,14 @@ static bool noce_try_store_flag_constants (struct noce_if_info *);
>>   static bool noce_try_store_flag_mask (struct noce_if_info *);
>>   static rtx noce_emit_cmove (struct noce_if_info *, rtx, enum rtx_code, rtx,
>>       rtx, rtx, rtx, rtx = NULL, rtx = NULL);
>> +static rtx noce_emit_czero (struct noce_if_info *, enum rtx_code, rtx, rtx);
>>   static bool noce_try_cmove (struct noce_if_info *);
>>   static bool noce_try_cmove_arith (struct noce_if_info *);
>>   static rtx noce_get_alt_condition (struct noce_if_info *, rtx, rtx_insn **);
>>   static bool noce_try_minmax (struct noce_if_info *);
>>   static bool noce_try_abs (struct noce_if_info *);
>>   static bool noce_try_sign_mask (struct noce_if_info *);
>> +static bool noce_try_cond_zero_arith (struct noce_if_info *);
>>  
>>   /* Return the comparison code for reversed condition for IF_INFO,
>>      or UNKNOWN if reversing the condition is not possible.  */
>> @@ -1831,6 +1833,32 @@ noce_emit_cmove (struct noce_if_info *if_info, rtx x, enum rtx_code code,
>>       return NULL_RTX;
>>   }
>>  
>> +static rtx
>> +noce_emit_czero (struct noce_if_info *if_info, enum rtx_code czero_code, rtx z, rtx target)
>Every function needs a comment describing what the function does, it's
>return value(s) and its arguments.  There are many examples in ifcvt.cc
>you can use to guide you.  I might start with something like this:
>
>/* Emit a conditional zero, returning the location of the result
>    or NULL_RTX upon failure.
>
>    IF_INFO describes the if-conversion scenario under consideration.
>    CZERO_CODE selects the condition (EQ/NE).
>    Z is the nonzero operand of the conditional move
>    TARGET is the desired output register.  */
>
>Or something like that.  I would suggest renaming "Z" to something more
>meaningful. 
Hi Jeff

Thanks for your patients. All comments regarding coding style have been addressed in new patches.

>
>
>
>>  
>> +/* Convert x = c ? y + z : y or x = c ? y : y + z. */
>> +
>> +static bool
>> +noce_try_cond_zero_arith (struct noce_if_info *if_info)
>The function comment really should be improved.  For example it doesn't
>indicate what the return value is.
>
>> +
>> +  /* cond must be EQ or NEQ comparision of a reg and 0.  */
>In general when you refer to a variable in a comment, do so in upper
>case.  Use NE rather than NEQ as the former is how most code refers to a
>not-equal rtx code.
>
>
>> +  if (GET_CODE (cond) != NE && GET_CODE (cond) != EQ)
>> +    return false;
>> +  if (!REG_P (XEXP (cond, 0)) || !rtx_equal_p (XEXP (cond, 1), const0_rtx))
>> +    return false;
>> +
>> +  /* check y + z:y*/
>> +  if (GET_CODE (a) == PLUS && REG_P (XEXP (a, 0)) && REG_P (XEXP (a, 1))
>> +      && REG_P (b) && rtx_equal_p (XEXP (a, 0), b))
>Write comments as complete sentences.
>
>> +    {
>> +      common = b;
>> +      z = XEXP (a, 1);
>Rather than "z" use a more descriptive variable name.
>
>
>> +
>> +  /* If we have x = c ? x + z : x, use a new reg to avoid modifying x  */
>> +  if (common && rtx_equal_p (common, if_info->x))
>> +    target = gen_reg_rtx (mode);
>> +  else
>> +    target = if_info->x;
>if-conversion runs both before and after register allocation.  So you
>have to handle the case where you can not generate new registers.  Use
>can_create_pseudo_p () to test for that.  You may need to fail if you
>can't generate a new register. 
1. In find_if_header function, I found the following piece of codes:
if (!reload_completed && noce_find_if_block(...)), and find_if_header must 
be called before noce_try_cond_zero_arith().

2. In noce_try_strore_flag_constants, new registers are also generated
without can_create_pseudo_p() check.

So I guess no need to add can_create_pseudo_p() here.

>
>> +
>> +  target = noce_emit_czero (if_info, czero_code, z, target);
>> +  if (!target)
>> +    {
>> +      end_sequence ();
>> +      return false;
>> +    }
>> +
>> +  target = expand_simple_binop (mode, PLUS, common, target, if_info->x, 0,
>> +	OPTAB_DIRECT);
>I think you want OPTAB_WIDEN and in the other calls.
>
>> @@ -3975,6 +4085,8 @@ noce_process_if_block (struct noce_if_info *if_info)
>>   goto success;
>>         if (noce_try_store_flag_mask (if_info))
>>   goto success;
>> +      if (targetm.have_cond_zero () && noce_try_cond_zero_arith (if_info))
>> +	goto success;
>Replace targetm.have_cond_zero with HAVE_conditional_move since that's
>the RTL primitive we're building from. 
Done. 

>
>
>>
>> +**test_ADD_ceqz:
>> +**	czero\.eqz	a3,a2,a3
>> +**	add	a0,a1,a3
>> +**	ret
>Please don't use explicit registers unless you know they will always be
>correct.  In this sequence there's no guarantee the register allocator
>will put the result of the czero.eqz into $a3.  Use a suitable regexp
>instead to match a variety of registers.  This will be an issue for all
>your new tests. 
Done.

BR, 
Fei
>
>
>
>Jeff
  
Jeff Law Nov. 28, 2023, 5:05 a.m. UTC | #10
On 11/27/23 19:46, Fei Gao wrote:
> On 2023-11-20 14:46  Jeff Law <jeffreyalaw@gmail.com> wrote:
>>
>>
>>
>> On 10/30/23 21:35, Fei Gao wrote:
>>
>>>> So just a few notes to further illustrate why I'm currently looking to
>>>> take the VRULL+Ventana implementation.  The code above would be much
>>>> better handled by just calling noce_emit_cmove.  noce_emit_cmove will go
>>>> through the conditional move expander.  So any improvement we make in
>>>> the expander "just work" when called from the if-converter.
>>> noce_emit_czero is used here to make sure czero insns are emited.
>>> noce_emit_cmove includes SFB and Thead movcc, which will take precedence
>>> over zicond in RISCV if enabled. Unfortunately we have products with SFB and Zicond
>>> both available and saw such conflict.
>>> And that is also the reason to add hook TARGET_HAVE_COND_ZERO
>>> in [PATCH 1/4] to disallow ineffient code emited by SFB enable and Zicond disabled case.
>> I understand what you're trying to do, but I would consider the
>> TARGET_HAVE_COND_ZERO fundamentally the wrong approach.
> Hi Jeff
> 
> Thanks for your review. I just post the new series.
> https://www.mail-archive.com/gcc-patches@gcc.gnu.org/msg327148.html
> https://www.mail-archive.com/gcc-patches@gcc.gnu.org/msg327151.html
> https://www.mail-archive.com/gcc-patches@gcc.gnu.org/msg327149.html
> https://www.mail-archive.com/gcc-patches@gcc.gnu.org/msg327150.html
> 
> TARGET_HAVE_COND_ZERO has been deleted.
THanks for the V2.  I'll see what I can do with them this week.  The 
series was posted prior to close of stage1, so it can still be 
integrated into gcc-14 if we get it cleaned up.

Jeff
  
Jeff Law Nov. 29, 2023, 4:46 a.m. UTC | #11
On 11/27/23 19:57, Fei Gao wrote:

> 1. In find_if_header function, I found the following piece of codes:
> if (!reload_completed && noce_find_if_block(...)), and find_if_header must
> be called before noce_try_cond_zero_arith().
Ah good.


> 
> 2. In noce_try_strore_flag_constants, new registers are also generated
> without can_create_pseudo_p() check.
> 
> So I guess no need to add can_create_pseudo_p() here.
Agreed.  I could possibly make an argument that it might be nice to be 
able to look for these things after reload has completed, but I think we 
can ignore that for now.  Thanks!

Jeff
  

Patch

diff --git a/gcc/ifcvt.cc b/gcc/ifcvt.cc
index a0af553b9ff..4f98c1c7bf9 100644
--- a/gcc/ifcvt.cc
+++ b/gcc/ifcvt.cc
@@ -781,12 +781,14 @@  static bool noce_try_store_flag_constants (struct noce_if_info *);
 static bool noce_try_store_flag_mask (struct noce_if_info *);
 static rtx noce_emit_cmove (struct noce_if_info *, rtx, enum rtx_code, rtx,
 			    rtx, rtx, rtx, rtx = NULL, rtx = NULL);
+static rtx noce_emit_czero (struct noce_if_info *, enum rtx_code, rtx, rtx);
 static bool noce_try_cmove (struct noce_if_info *);
 static bool noce_try_cmove_arith (struct noce_if_info *);
 static rtx noce_get_alt_condition (struct noce_if_info *, rtx, rtx_insn **);
 static bool noce_try_minmax (struct noce_if_info *);
 static bool noce_try_abs (struct noce_if_info *);
 static bool noce_try_sign_mask (struct noce_if_info *);
+static bool noce_try_cond_zero_arith (struct noce_if_info *);
 
 /* Return the comparison code for reversed condition for IF_INFO,
    or UNKNOWN if reversing the condition is not possible.  */
@@ -1831,6 +1833,32 @@  noce_emit_cmove (struct noce_if_info *if_info, rtx x, enum rtx_code code,
     return NULL_RTX;
 }
 
+static rtx
+noce_emit_czero (struct noce_if_info *if_info, enum rtx_code czero_code, rtx z, rtx target)
+{
+  machine_mode mode = GET_MODE (target);
+  rtx cond_op0 = XEXP (if_info->cond, 0);
+  rtx czero_cond
+    = gen_rtx_fmt_ee (czero_code, GET_MODE (cond_op0), cond_op0, const0_rtx);
+  rtx if_then_else = gen_rtx_IF_THEN_ELSE (mode, czero_cond, const0_rtx, z);
+  rtx set = gen_rtx_SET (target, if_then_else);
+
+  start_sequence ();
+  rtx_insn *insn = emit_insn (set);
+
+  if (recog_memoized (insn) >= 0)
+    {
+      rtx_insn *seq = get_insns ();
+      end_sequence ();
+      emit_insn (seq);
+
+      return target;
+    }
+
+  end_sequence ();
+  return NULL_RTX;
+}
+
 /* Try only simple constants and registers here.  More complex cases
    are handled in noce_try_cmove_arith after noce_try_store_flag_arith
    has had a go at it.  */
@@ -2880,6 +2908,88 @@  noce_try_sign_mask (struct noce_if_info *if_info)
   return true;
 }
 
+/* Convert x = c ? y + z : y or x = c ? y : y + z. */
+
+static bool
+noce_try_cond_zero_arith (struct noce_if_info *if_info)
+{
+  rtx target;
+  rtx_insn *seq;
+  machine_mode mode = GET_MODE (if_info->x);
+  rtx common = NULL_RTX;
+  enum rtx_code czero_code = UNKNOWN;
+  rtx a = if_info->a;
+  rtx b = if_info->b;
+  rtx z = NULL_RTX;
+  rtx cond = if_info->cond;
+
+  if (!noce_simple_bbs (if_info))
+    return false;
+
+  /* cond must be EQ or NEQ comparision of a reg and 0.  */
+  if (GET_CODE (cond) != NE && GET_CODE (cond) != EQ)
+    return false;
+  if (!REG_P (XEXP (cond, 0)) || !rtx_equal_p (XEXP (cond, 1), const0_rtx))
+    return false;
+
+  /* check y + z:y*/
+  if (GET_CODE (a) == PLUS && REG_P (XEXP (a, 0)) && REG_P (XEXP (a, 1))
+      && REG_P (b) && rtx_equal_p (XEXP (a, 0), b))
+    {
+      common = b;
+      z = XEXP (a, 1);
+      /* x = c ? y+z : y, cond = !c --> x = cond ? y : y+z  */
+      czero_code = GET_CODE (cond);
+    }
+  /* check y : y+z  */
+  else if (GET_CODE (b) == PLUS && REG_P (XEXP (b, 0)) && REG_P (XEXP (b, 1))
+	   && REG_P (a) && rtx_equal_p (a, XEXP (b, 0)))
+    {
+      common = a;
+      z = XEXP (b, 1);
+      /* x = c ? y : y+z, cond = !c --> x = !cond ? y : y+z  */
+      czero_code = noce_reversed_cond_code (if_info);
+    }
+  else
+    return false;
+
+  if (czero_code == UNKNOWN)
+    return false;
+
+  start_sequence ();
+
+  /* If we have x = c ? x + z : x, use a new reg to avoid modifying x  */
+  if (common && rtx_equal_p (common, if_info->x))
+    target = gen_reg_rtx (mode);
+  else
+    target = if_info->x;
+
+  target = noce_emit_czero (if_info, czero_code, z, target);
+  if (!target)
+    {
+      end_sequence ();
+      return false;
+    }
+
+  target = expand_simple_binop (mode, PLUS, common, target, if_info->x, 0,
+				OPTAB_DIRECT);
+  if (!target)
+    {
+      end_sequence ();
+      return false;
+    }
+
+  if (target != if_info->x)
+    noce_emit_move_insn (if_info->x, target);
+
+  seq = end_ifcvt_sequence (if_info);
+  if (!seq || !targetm.noce_conversion_profitable_p (seq, if_info))
+    return false;
+
+  emit_insn_before_setloc (seq, if_info->jump, INSN_LOCATION (if_info->insn_a));
+  if_info->transform_name = "noce_try_cond_zero_arith";
+  return true;
+}
 
 /* Optimize away "if (x & C) x |= C" and similar bit manipulation
    transformations.  */
@@ -3975,6 +4085,8 @@  noce_process_if_block (struct noce_if_info *if_info)
 	goto success;
       if (noce_try_store_flag_mask (if_info))
 	goto success;
+      if (targetm.have_cond_zero () && noce_try_cond_zero_arith (if_info))
+	goto success;
       if (HAVE_conditional_move
 	  && noce_try_cmove_arith (if_info))
 	goto success;
diff --git a/gcc/testsuite/gcc.target/riscv/zicond_ifcvt_opt.c b/gcc/testsuite/gcc.target/riscv/zicond_ifcvt_opt.c
new file mode 100644
index 00000000000..068c1443413
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/zicond_ifcvt_opt.c
@@ -0,0 +1,130 @@ 
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc_zicond -mabi=lp64d -O2" } */
+/* { dg-skip-if "" { *-*-* } {"-O0" "-O1" "-Os" "-Og" "-O3" "-Oz" "-flto"} } */
+/* { dg-final { check-function-bodies "**" ""  } } */
+
+/*
+**test_ADD_ceqz:
+**	czero\.eqz	a3,a2,a3
+**	add	a0,a1,a3
+**	ret
+*/
+// x = c ? y+z : y
+long test_ADD_ceqz(long x, long y, long z, long c)
+{
+  if (c)
+    x = y + z;
+  else
+    x = y;
+  return x;
+}
+
+/*
+**test_ADD_ceqz_x:
+**	czero\.eqz	a2,a1,a2
+**	add	a0,a0,a2
+**	ret
+*/
+// x = c ? x+z : x
+long test_ADD_ceqz_x(long x, long z, long c)
+{
+  if (c)
+    x = x + z;
+
+  return x;
+}
+
+/*
+**test_ADD_nez:
+**	czero\.nez	a3,a2,a3
+**	add	a0,a1,a3
+**	ret
+*/
+// x = c ? y : y+z
+long test_ADD_nez(long x, long y, long z, long c)
+{
+  if (c)
+    x = y;
+  else
+    x = y + z;
+  return x;
+}
+
+/*
+**test_ADD_nez_x:
+**	czero\.nez	a2,a1,a2
+**	add	a0,a0,a2
+**	ret
+*/
+// x = c ? x : x+z
+long test_ADD_nez_x(long x, long z, long c)
+{
+  if (c)
+  {}
+  else
+    x = x + z;
+  return x;
+}
+
+/*
+**test_ADD_nez_2:
+**	czero\.nez	a3,a2,a3
+**	add	a0,a1,a3
+**	ret
+*/
+// x = !c ? y+z : y
+long test_ADD_nez_2(long x, long y, long z, long c)
+{
+  if (!c)
+    x = y + z;
+  else
+    x = y;
+  return x;
+}
+
+/*
+**test_ADD_nez_x_2:
+**	czero\.nez	a2,a1,a2
+**	add	a0,a0,a2
+**	ret
+*/
+// x = !c ? x+z : x
+long test_ADD_nez_x_2(long x, long z, long c)
+{
+  if (!c)
+    x = x + z;
+
+  return x;
+}
+
+/*
+**test_ADD_eqz_2:
+**	czero\.eqz	a3,a2,a3
+**	add	a0,a1,a3
+**	ret
+*/
+// x = !c ? y : y+z
+long test_ADD_eqz_2(long x, long y, long z, long c)
+{
+  if (!c)
+    x = y;
+  else
+    x = y + z;
+  return x;
+}
+
+/*
+**test_ADD_eqz_x_2:
+**	czero\.eqz	a2,a1,a2
+**	add	a0,a0,a2
+**	ret
+*/
+// x = !c ? x : x+z
+long test_ADD_eqz_x_2(long x, long z, long c)
+{
+  if (!c)
+  {}
+  else
+    x = x + z;
+  return x;
+}