[1/2] middle-end: RFC: On expansion of conditional branches, give hint if argument is a truth type to backend

Message ID patch-16248-tamar@arm.com
State New, archived
Headers
Series [1/2] middle-end: RFC: On expansion of conditional branches, give hint if argument is a truth type to backend |

Commit Message

Tamar Christina Sept. 23, 2022, 9:24 a.m. UTC
  Hi All,

This is an RFC to figure out how to deal with targets that don't have native
comparisons against QImode values.

Booleans, at least in C99 and higher are 0-1 valued.  This means that we only
really need to test a single bit.  However in RTL we no longer have this
information available and just have an SImode value (due to the promotion of
QImode to SImode).

This RFC fixes it by emitting an explicit & 1 during the expansion of the
conditional branch.

However it's unlikely that we want to do this unconditionally.  Most targets
I've tested seem to have harmless code changes, like x86 changes from testb to
andl, $1.

So I have two questions:

1. Should I limit this behind a target macro? Or should I just leave it for all
   targets and deal with the fallout.
2. How can I tell whether the C99 0-1 values bools are being used or the older
   0, non-0 variant?

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
However there are some benign codegen changes on x86, testb changed to andl $1.

This pattern occurs more than 120,000 times in SPECCPU 2017 and so is quite common.

Thanks,
Tamar

gcc/ChangeLog:

	* tree.h (tree_zero_one_valued_p): New.
	* dojump.cc (do_jump): Add & 1 if truth type.

--- inline copy of patch -- 
diff --git a/gcc/dojump.cc b/gcc/dojump.cc
index 2af0cd1aca3b6af13d5d8799094ee93f18022296..8eaf1be49cd12298e61c6946ae79ca9de6197864 100644




--
diff --git a/gcc/dojump.cc b/gcc/dojump.cc
index 2af0cd1aca3b6af13d5d8799094ee93f18022296..8eaf1be49cd12298e61c6946ae79ca9de6197864 100644
--- a/gcc/dojump.cc
+++ b/gcc/dojump.cc
@@ -605,7 +605,17 @@ do_jump (tree exp, rtx_code_label *if_false_label,
       /* Fall through and generate the normal code.  */
     default:
     normal:
-      temp = expand_normal (exp);
+      tree cmp = exp;
+      /* If the expression is a truth type then explicitly generate an & 1
+	 to indicate to the target that it's a zero-one values type.  This
+	 allows the target to further optimize the comparison should it
+	 choose to.  */
+      if (tree_zero_one_valued_p (exp))
+	{
+	  type = TREE_TYPE (exp);
+	  cmp = build2 (BIT_AND_EXPR, type, exp, build_int_cstu (type, 1));
+	}
+      temp = expand_normal (cmp);
       do_pending_stack_adjust ();
       /* The RTL optimizers prefer comparisons against pseudos.  */
       if (GET_CODE (temp) == SUBREG)
diff --git a/gcc/tree.h b/gcc/tree.h
index 8f8a9660c9e0605eb516de194640b8c1b531b798..be3d2dee82f692e81082cf21c878c10f9fe9e1f1 100644
--- a/gcc/tree.h
+++ b/gcc/tree.h
@@ -4690,6 +4690,7 @@ extern tree signed_or_unsigned_type_for (int, tree);
 extern tree signed_type_for (tree);
 extern tree unsigned_type_for (tree);
 extern bool is_truth_type_for (tree, tree);
+extern bool tree_zero_one_valued_p (tree);
 extern tree truth_type_for (tree);
 extern tree build_pointer_type_for_mode (tree, machine_mode, bool);
 extern tree build_pointer_type (tree);
  

Comments

Richard Biener Sept. 26, 2022, 10:35 a.m. UTC | #1
On Fri, 23 Sep 2022, Tamar Christina wrote:

> Hi All,
> 
> This is an RFC to figure out how to deal with targets that don't have native
> comparisons against QImode values.
> 
> Booleans, at least in C99 and higher are 0-1 valued.  This means that we only
> really need to test a single bit.  However in RTL we no longer have this
> information available and just have an SImode value (due to the promotion of
> QImode to SImode).
>
> This RFC fixes it by emitting an explicit & 1 during the expansion of the
> conditional branch.
> 
> However it's unlikely that we want to do this unconditionally.  Most targets
> I've tested seem to have harmless code changes, like x86 changes from testb to
> andl, $1.
> 
> So I have two questions:
> 
> 1. Should I limit this behind a target macro? Or should I just leave it for all
>    targets and deal with the fallout.
> 2. How can I tell whether the C99 0-1 values bools are being used or the older
>    0, non-0 variant?
> 
> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> However there are some benign codegen changes on x86, testb changed to andl $1.
> 
> This pattern occurs more than 120,000 times in SPECCPU 2017 and so is quite common.

How does this help a target?  Why does RTL nonzerop bits not recover this
information and the desired optimization done later during combine
for example?  Why's a SImode compare not OK if there's no QImode compare?
We have undocumented addcc, negcc, etc. patterns, should we have a
andcc pattern for this indicating support for andcc + jump as opposed
to cmpcc + jump?

So - what's the target and what's a testcase?

Thanks,
Richard.
 
> Thanks,
> Tamar
> 
> gcc/ChangeLog:
> 
> 	* tree.h (tree_zero_one_valued_p): New.
> 	* dojump.cc (do_jump): Add & 1 if truth type.
> 
> --- inline copy of patch -- 
> diff --git a/gcc/dojump.cc b/gcc/dojump.cc
> index 2af0cd1aca3b6af13d5d8799094ee93f18022296..8eaf1be49cd12298e61c6946ae79ca9de6197864 100644
> --- a/gcc/dojump.cc
> +++ b/gcc/dojump.cc
> @@ -605,7 +605,17 @@ do_jump (tree exp, rtx_code_label *if_false_label,
>        /* Fall through and generate the normal code.  */
>      default:
>      normal:
> -      temp = expand_normal (exp);
> +      tree cmp = exp;
> +      /* If the expression is a truth type then explicitly generate an & 1
> +	 to indicate to the target that it's a zero-one values type.  This
> +	 allows the target to further optimize the comparison should it
> +	 choose to.  */
> +      if (tree_zero_one_valued_p (exp))
> +	{
> +	  type = TREE_TYPE (exp);
> +	  cmp = build2 (BIT_AND_EXPR, type, exp, build_int_cstu (type, 1));
> +	}
> +      temp = expand_normal (cmp);
>        do_pending_stack_adjust ();
>        /* The RTL optimizers prefer comparisons against pseudos.  */
>        if (GET_CODE (temp) == SUBREG)
> diff --git a/gcc/tree.h b/gcc/tree.h
> index 8f8a9660c9e0605eb516de194640b8c1b531b798..be3d2dee82f692e81082cf21c878c10f9fe9e1f1 100644
> --- a/gcc/tree.h
> +++ b/gcc/tree.h
> @@ -4690,6 +4690,7 @@ extern tree signed_or_unsigned_type_for (int, tree);
>  extern tree signed_type_for (tree);
>  extern tree unsigned_type_for (tree);
>  extern bool is_truth_type_for (tree, tree);
> +extern bool tree_zero_one_valued_p (tree);
>  extern tree truth_type_for (tree);
>  extern tree build_pointer_type_for_mode (tree, machine_mode, bool);
>  extern tree build_pointer_type (tree);
> 
> 
> 
> 
>
  
Tamar Christina Sept. 26, 2022, 11:05 a.m. UTC | #2
> > This pattern occurs more than 120,000 times in SPECCPU 2017 and so is quite common.

> How does this help a target?

So the idea is to communicate that only the bottom bit of the value is relevant and not the entire value itself.

> Why does RTL nonzerop bits not recover thisinformation and the desired optimization done later during combinefor example?

I'm not sure it works here. We (AArch64) don't have QImode integer registers, so our apcs says that the top bits of the 32-bit registers it's passed in are undefined.

We have to zero extend the value first if we really want it as an 8-bit value. So the problem is if you e.g. Pass a boolean as an argument of a function I don't think nonzero bits will return anything useful.

> Why's a SImode compare not OK if there's no QImode compare?

The mode then becomes irrelevant because we're telling the backend that only a single bit matters. And we have instructions to test and branch on the value of a single bit. See https://gcc.gnu.org/pipermail/gcc-patches/2022-September/602090.html for the testcases

> We have undocumented addcc, negcc, etc. patterns, should we have aandcc pattern for this indicating support for andcc + jump as opposedto cmpcc + jump?

This could work yeah. I didn't know these existed.

> So - what's the target and what's a testcase?

See https://gcc.gnu.org/pipermail/gcc-patches/2022-September/602090.html :)

Thanks,
Tamar
  
Richard Biener Sept. 26, 2022, 11:32 a.m. UTC | #3
On Mon, 26 Sep 2022, Tamar Christina wrote:

> > > This pattern occurs more than 120,000 times in SPECCPU 2017 and so is quite common.
> 
> > How does this help a target?
> 
> So the idea is to communicate that only the bottom bit of the value is relevant and not the entire value itself.
> 
> > Why does RTL nonzerop bits not recover thisinformation and the desired optimization done later during combinefor example?
> 
> I'm not sure it works here. We (AArch64) don't have QImode integer registers, so our apcs says that the top bits of the 32-bit registers it's passed in are undefined.
> 
> We have to zero extend the value first if we really want it as an 8-bit value. So the problem is if you e.g. Pass a boolean as an argument of a function I don't think nonzero bits will return anything useful.
> 
> > Why's a SImode compare not OK if there's no QImode compare?
> 
> The mode then becomes irrelevant because we're telling the backend that only a single bit matters. And we have instructions to test and branch on the value of a single bit. See https://gcc.gnu.org/pipermail/gcc-patches/2022-September/602090.html for the testcases

Maybe the target could use (subreg:SI (reg:BI ...)) as argument.  Heh.

> > We have undocumented addcc, negcc, etc. patterns, should we have aandcc pattern for this indicating support for andcc + jump as opposedto cmpcc + jump?
> 
> This could work yeah. I didn't know these existed.

Ah, so they are conditional add, not add setting CC, so andcc wouldn't
be appropriate.

So I'm not sure how we'd handle such situation - maybe looking at
REG_DECL and recognizing a _Bool PARM_DECL is OK?

Richard.

> > So - what's the target and what's a testcase?
> 
> See https://gcc.gnu.org/pipermail/gcc-patches/2022-September/602090.html :)
> 
> Thanks,
> Tamar
> 
> ________________________________
> From: Richard Biener <rguenther@suse.de>
> Sent: Monday, September 26, 2022 11:35 AM
> To: Tamar Christina <Tamar.Christina@arm.com>
> Cc: gcc-patches@gcc.gnu.org <gcc-patches@gcc.gnu.org>; nd <nd@arm.com>; jeffreyalaw@gmail.com <jeffreyalaw@gmail.com>; Richard Sandiford <Richard.Sandiford@arm.com>
> Subject: Re: [PATCH 1/2]middle-end: RFC: On expansion of conditional branches, give hint if argument is a truth type to backend
> 
> On Fri, 23 Sep 2022, Tamar Christina wrote:
> 
> > Hi All,
> >
> > This is an RFC to figure out how to deal with targets that don't have native
> > comparisons against QImode values.
> >
> > Booleans, at least in C99 and higher are 0-1 valued.  This means that we only
> > really need to test a single bit.  However in RTL we no longer have this
> > information available and just have an SImode value (due to the promotion of
> > QImode to SImode).
> >
> > This RFC fixes it by emitting an explicit & 1 during the expansion of the
> > conditional branch.
> >
> > However it's unlikely that we want to do this unconditionally.  Most targets
> > I've tested seem to have harmless code changes, like x86 changes from testb to
> > andl, $1.
> >
> > So I have two questions:
> >
> > 1. Should I limit this behind a target macro? Or should I just leave it for all
> >    targets and deal with the fallout.
> > 2. How can I tell whether the C99 0-1 values bools are being used or the older
> >    0, non-0 variant?
> >
> > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> > However there are some benign codegen changes on x86, testb changed to andl $1.
> >
> > This pattern occurs more than 120,000 times in SPECCPU 2017 and so is quite common.
> 
> How does this help a target?  Why does RTL nonzerop bits not recover this
> information and the desired optimization done later during combine
> for example?  Why's a SImode compare not OK if there's no QImode compare?
> We have undocumented addcc, negcc, etc. patterns, should we have a
> andcc pattern for this indicating support for andcc + jump as opposed
> to cmpcc + jump?
> 
> So - what's the target and what's a testcase?
> 
> Thanks,
> Richard.
> 
> > Thanks,
> > Tamar
> >
> > gcc/ChangeLog:
> >
> >        * tree.h (tree_zero_one_valued_p): New.
> >        * dojump.cc (do_jump): Add & 1 if truth type.
> >
> > --- inline copy of patch --
> > diff --git a/gcc/dojump.cc b/gcc/dojump.cc
> > index 2af0cd1aca3b6af13d5d8799094ee93f18022296..8eaf1be49cd12298e61c6946ae79ca9de6197864 100644
> > --- a/gcc/dojump.cc
> > +++ b/gcc/dojump.cc
> > @@ -605,7 +605,17 @@ do_jump (tree exp, rtx_code_label *if_false_label,
> >        /* Fall through and generate the normal code.  */
> >      default:
> >      normal:
> > -      temp = expand_normal (exp);
> > +      tree cmp = exp;
> > +      /* If the expression is a truth type then explicitly generate an & 1
> > +      to indicate to the target that it's a zero-one values type.  This
> > +      allows the target to further optimize the comparison should it
> > +      choose to.  */
> > +      if (tree_zero_one_valued_p (exp))
> > +     {
> > +       type = TREE_TYPE (exp);
> > +       cmp = build2 (BIT_AND_EXPR, type, exp, build_int_cstu (type, 1));
> > +     }
> > +      temp = expand_normal (cmp);
> >        do_pending_stack_adjust ();
> >        /* The RTL optimizers prefer comparisons against pseudos.  */
> >        if (GET_CODE (temp) == SUBREG)
> > diff --git a/gcc/tree.h b/gcc/tree.h
> > index 8f8a9660c9e0605eb516de194640b8c1b531b798..be3d2dee82f692e81082cf21c878c10f9fe9e1f1 100644
> > --- a/gcc/tree.h
> > +++ b/gcc/tree.h
> > @@ -4690,6 +4690,7 @@ extern tree signed_or_unsigned_type_for (int, tree);
> >  extern tree signed_type_for (tree);
> >  extern tree unsigned_type_for (tree);
> >  extern bool is_truth_type_for (tree, tree);
> > +extern bool tree_zero_one_valued_p (tree);
> >  extern tree truth_type_for (tree);
> >  extern tree build_pointer_type_for_mode (tree, machine_mode, bool);
> >  extern tree build_pointer_type (tree);
> >
> >
> >
> >
> >
> 
> --
> Richard Biener <rguenther@suse.de>
> SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg,
> Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman;
> HRB 36809 (AG Nuernberg)
>
  
Tamar Christina Sept. 26, 2022, 11:46 a.m. UTC | #4
> Maybe the target could use (subreg:SI (reg:BI ...)) as argument. Heh.

But then I'd still need to change the expansion code. I suppose this could prevent the issue with changes to code on other targets.

> > > We have undocumented addcc, negcc, etc. patterns, should we have aandcc pattern for this indicating support for andcc + jump as opposedto cmpcc + jump?
> >
> > This could work yeah. I didn't know these existed.

> Ah, so they are conditional add, not add setting CC, so andcc wouldn't
> be appropriate.

> So I'm not sure how we'd handle such situation - maybe looking at
> REG_DECL and recognizing a _Bool PARM_DECL is OK?

I have a slight suspicion that Richard Sandiford would likely reject this though.. The additional AND seemed less hacky as it's just communicating range.

I still need to also figure out which representation of bool is being used, because only the 0-1 variant works. Is there a way to check that?

Thanks,
Tamar.
  
Richard Biener Sept. 26, 2022, 12:34 p.m. UTC | #5
On Mon, 26 Sep 2022, Tamar Christina wrote:

> > Maybe the target could use (subreg:SI (reg:BI ...)) as argument. Heh.
> 
> But then I'd still need to change the expansion code. I suppose this could prevent the issue with changes to code on other targets.
> 
> > > > We have undocumented addcc, negcc, etc. patterns, should we have aandcc pattern for this indicating support for andcc + jump as opposedto cmpcc + jump?
> > >
> > > This could work yeah. I didn't know these existed.
> 
> > Ah, so they are conditional add, not add setting CC, so andcc wouldn't
> > be appropriate.
> 
> > So I'm not sure how we'd handle such situation - maybe looking at
> > REG_DECL and recognizing a _Bool PARM_DECL is OK?
> 
> I have a slight suspicion that Richard Sandiford would likely reject this though.. The additional AND seemed less hacky as it's just communicating range.
> 
> I still need to also figure out which representation of bool is being used, because only the 0-1 variant works. Is there a way to check that?

So another option would be, in case you have (subreg:SI (reg:QI)),
if we expand

 if (b != 0)

expand that to

 !((b & 255) == 0)

basically invert the comparison and the leverage the paradoxical subreg
to specify a narrower immediate to AND with?  Just hoping that arm
can do 255 as immediate and still efficiently handle this?

Wouldn't this transform be possible in combine with the appropriate
backend pattern and combine synthesizing the and for paradoxical subregs?

Richard.
  
Richard Biener Sept. 26, 2022, 12:43 p.m. UTC | #6
On Mon, 26 Sep 2022, Richard Biener wrote:

> On Mon, 26 Sep 2022, Tamar Christina wrote:
> 
> > > Maybe the target could use (subreg:SI (reg:BI ...)) as argument. Heh.
> > 
> > But then I'd still need to change the expansion code. I suppose this could prevent the issue with changes to code on other targets.
> > 
> > > > > We have undocumented addcc, negcc, etc. patterns, should we have aandcc pattern for this indicating support for andcc + jump as opposedto cmpcc + jump?
> > > >
> > > > This could work yeah. I didn't know these existed.
> > 
> > > Ah, so they are conditional add, not add setting CC, so andcc wouldn't
> > > be appropriate.
> > 
> > > So I'm not sure how we'd handle such situation - maybe looking at
> > > REG_DECL and recognizing a _Bool PARM_DECL is OK?
> > 
> > I have a slight suspicion that Richard Sandiford would likely reject this though.. The additional AND seemed less hacky as it's just communicating range.
> > 
> > I still need to also figure out which representation of bool is being used, because only the 0-1 variant works. Is there a way to check that?
> 
> So another option would be, in case you have (subreg:SI (reg:QI)),
> if we expand
> 
>  if (b != 0)
> 
> expand that to
> 
>  !((b & 255) == 0)
> 
> basically invert the comparison and the leverage the paradoxical subreg
> to specify a narrower immediate to AND with?  Just hoping that arm
> can do 255 as immediate and still efficiently handle this?
> 
> Wouldn't this transform be possible in combine with the appropriate
> backend pattern and combine synthesizing the and for paradoxical subregs?

Looking at what we produce on aarch64 it seems 'bool' is using
an SImode register but your characterization that the upper 24 bits
have undefined content suggests that is a wrong representation?
If the ABI doesn't say anything about the upper bits we should
reflect that somehow?

Richard.
  
Tamar Christina Sept. 26, 2022, 2:02 p.m. UTC | #7
> -----Original Message-----
> From: Richard Biener <rguenther@suse.de>
> Sent: Monday, September 26, 2022 1:43 PM
> To: Tamar Christina <Tamar.Christina@arm.com>
> Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>; jeffreyalaw@gmail.com;
> Richard Sandiford <Richard.Sandiford@arm.com>
> Subject: Re: [PATCH 1/2]middle-end: RFC: On expansion of conditional
> branches, give hint if argument is a truth type to backend
> 
> On Mon, 26 Sep 2022, Richard Biener wrote:
> 
> > On Mon, 26 Sep 2022, Tamar Christina wrote:
> >
> > > > Maybe the target could use (subreg:SI (reg:BI ...)) as argument. Heh.
> > >
> > > But then I'd still need to change the expansion code. I suppose this could
> prevent the issue with changes to code on other targets.
> > >
> > > > > > We have undocumented addcc, negcc, etc. patterns, should we
> have aandcc pattern for this indicating support for andcc + jump as
> opposedto cmpcc + jump?
> > > > >
> > > > > This could work yeah. I didn't know these existed.
> > >
> > > > Ah, so they are conditional add, not add setting CC, so andcc
> > > > wouldn't be appropriate.
> > >
> > > > So I'm not sure how we'd handle such situation - maybe looking at
> > > > REG_DECL and recognizing a _Bool PARM_DECL is OK?
> > >
> > > I have a slight suspicion that Richard Sandiford would likely reject this
> though.. The additional AND seemed less hacky as it's just communicating
> range.
> > >
> > > I still need to also figure out which representation of bool is being used,
> because only the 0-1 variant works. Is there a way to check that?
> >
> > So another option would be, in case you have (subreg:SI (reg:QI)), if
> > we expand
> >
> >  if (b != 0)
> >
> > expand that to
> >
> >  !((b & 255) == 0)
> >
> > basically invert the comparison and the leverage the paradoxical
> > subreg to specify a narrower immediate to AND with?  Just hoping that
> > arm can do 255 as immediate and still efficiently handle this?

We can and already do, and don't need that representation to do so.
The problem is, handling 255 is already inefficient. It requires us to use an additional
Instruction to test the value. Whereas we have a fused test single bit and branch instruction.

> >
> > Wouldn't this transform be possible in combine with the appropriate
> > backend pattern and combine synthesizing the and for paradoxical
> subregs?

Not unless we have enough range information in RTL to know that whatever value has
been fed into the cbranch has a range of 1 bit. A range of 8 bits we already have and isn't value useful.

The idea was to transform what we currently have:

        tst     w0, 255
        bne     .L4
        ret

i.e. test the bottom 8 bits, into

        tbnz    w0, #0, .L4
        ret

i.e. test only bit 0 and branch based on that bit. We cannot do this when all we know is that the range is 8 bits.

> 
> Looking at what we produce on aarch64 it seems 'bool' is using an SImode
> register but your characterization that the upper 24 bits have undefined
> content suggests that is a wrong representation?
> If the ABI doesn't say anything about the upper bits we should reflect that
> somehow?

It does. And no "bool" is using QImode. The expansion of

extern void h ();

void g1(bool x)
{
  if (__builtin_expect (x, 0))
    h ();
}

Shows that the argument x is passed as a QI mode, but like many RISC targets (and even i386) we promote the argument during expansion:

(insn 2 4 3 2 (set (reg/v:SI 92 [ x ])
        (zero_extend:SI (reg:QI 0 x0 [ x ]))) "/app/example.cpp":4:1 -1
     (nil))

But the value is passed as QImode.

We use this fact to know that the range is 8 bits in the cbanch instruction.  If no operation was done that requires a bigger
range then combine will push the zero extend into the cbranch and we have various patterns to handle different forms of this.

For instance:

void g1(bool *x)
{
  if (__builtin_expect (*x, 0))
    h ();
}

Because of the load of x we generate:

        ldrb    w0, [x0]
        cbnz    w0, .L7
        ret

because we know the top bits are defined to 0 in this case and can just test the entire register.

The reason for this promotion for us and many other backends is one of efficiency. If we don't promote to something
we have native instructions for we would have to promote and demote the value at *every* instruction in RTL.

This causes significant noise in the RTL.  So we can't do anything different here.  I have plans to try to fix this, but not in GCC 13.

But even then it won't help with this case, because we explicitly need to know that the range is a single bit. Not 8 bits.

Regards,
Tamar

> 
> Richard.
  
Richard Sandiford Sept. 28, 2022, 3:04 p.m. UTC | #8
Tamar Christina <Tamar.Christina@arm.com> writes:
>> Maybe the target could use (subreg:SI (reg:BI ...)) as argument. Heh.
>
> But then I'd still need to change the expansion code. I suppose this could
> prevent the issue with changes to code on other targets. 
>
>> > > We have undocumented addcc, negcc, etc. patterns, should we have aandcc
> pattern for this indicating support for andcc + jump as opposedto cmpcc + jump?
>> >
>> > This could work yeah. I didn't know these existed.
>
>> Ah, so they are conditional add, not add setting CC, so andcc wouldn't
>> be appropriate.
>
>> So I'm not sure how we'd handle such situation - maybe looking at
>> REG_DECL and recognizing a _Bool PARM_DECL is OK?
>
> I have a slight suspicion that Richard Sandiford would likely reject this
> though..

Good guess :-P  We shouldn't rely on something like that for correctness.

Would it help if we promoted the test-and-branch instructions to optabs,
alongside cbranch?  The jump expanders could then target it directly.

IMO that'd be a reasonable thing to do if it does help.  It's a relatively
common operation, especially on CISCy targets.

Thanks,
Richard

> The additional AND seemed less hacky as it's just communicating range.
>
> I still need to also figure out which representation of bool is being used,
> because only the 0-1 variant works. Is there a way to check that?
>
> Thanks, 
> Tamar. 
>
> -------------------------------------------------------------------------------
> From: Richard Biener <rguenther@suse.de>
> Sent: Monday, September 26, 2022 12:32 PM
> To: Tamar Christina <Tamar.Christina@arm.com>
> Cc: gcc-patches@gcc.gnu.org <gcc-patches@gcc.gnu.org>; nd <nd@arm.com>;
> jeffreyalaw@gmail.com <jeffreyalaw@gmail.com>; Richard Sandiford
> <Richard.Sandiford@arm.com>
> Subject: Re: [PATCH 1/2]middle-end: RFC: On expansion of conditional branches,
> give hint if argument is a truth type to backend
>  
> On Mon, 26 Sep 2022, Tamar Christina wrote:
>
>> > > This pattern occurs more than 120,000 times in SPECCPU 2017 and so is
> quite common.
>>
>> > How does this help a target?
>>
>> So the idea is to communicate that only the bottom bit of the value is
> relevant and not the entire value itself.
>>
>> > Why does RTL nonzerop bits not recover thisinformation and the desired
> optimization done later during combinefor example?
>>
>> I'm not sure it works here. We (AArch64) don't have QImode integer registers,
> so our apcs says that the top bits of the 32-bit registers it's passed in are
> undefined.
>>
>> We have to zero extend the value first if we really want it as an 8-bit
> value. So the problem is if you e.g. Pass a boolean as an argument of a
> function I don't think nonzero bits will return anything useful.
>>
>> > Why's a SImode compare not OK if there's no QImode compare?
>>
>> The mode then becomes irrelevant because we're telling the backend that only
> a single bit matters. And we have instructions to test and branch on the value
> of a single bit. See https://gcc.gnu.org/pipermail/gcc-patches/2022-September/
> 602090.html for the testcases
>
> Maybe the target could use (subreg:SI (reg:BI ...)) as argument.  Heh.
>
>> > We have undocumented addcc, negcc, etc. patterns, should we have aandcc
> pattern for this indicating support for andcc + jump as opposedto cmpcc + jump?
>>
>> This could work yeah. I didn't know these existed.
>
> Ah, so they are conditional add, not add setting CC, so andcc wouldn't
> be appropriate.
>
> So I'm not sure how we'd handle such situation - maybe looking at
> REG_DECL and recognizing a _Bool PARM_DECL is OK?
>
> Richard.
>
>> > So - what's the target and what's a testcase?
>>
>> See https://gcc.gnu.org/pipermail/gcc-patches/2022-September/602090.html :)
>>
>> Thanks,
>> Tamar
>>
>> ________________________________
>> From: Richard Biener <rguenther@suse.de>
>> Sent: Monday, September 26, 2022 11:35 AM
>> To: Tamar Christina <Tamar.Christina@arm.com>
>> Cc: gcc-patches@gcc.gnu.org <gcc-patches@gcc.gnu.org>; nd <nd@arm.com>;
> jeffreyalaw@gmail.com <jeffreyalaw@gmail.com>; Richard Sandiford
> <Richard.Sandiford@arm.com>
>> Subject: Re: [PATCH 1/2]middle-end: RFC: On expansion of conditional
> branches, give hint if argument is a truth type to backend
>>
>> On Fri, 23 Sep 2022, Tamar Christina wrote:
>>
>> > Hi All,
>> >
>> > This is an RFC to figure out how to deal with targets that don't have
> native
>> > comparisons against QImode values.
>> >
>> > Booleans, at least in C99 and higher are 0-1 valued.  This means that we
> only
>> > really need to test a single bit.  However in RTL we no longer have this
>> > information available and just have an SImode value (due to the promotion
> of
>> > QImode to SImode).
>> >
>> > This RFC fixes it by emitting an explicit & 1 during the expansion of the
>> > conditional branch.
>> >
>> > However it's unlikely that we want to do this unconditionally.  Most
> targets
>> > I've tested seem to have harmless code changes, like x86 changes from testb
> to
>> > andl, $1.
>> >
>> > So I have two questions:
>> >
>> > 1. Should I limit this behind a target macro? Or should I just leave it for
> all
>> >    targets and deal with the fallout.
>> > 2. How can I tell whether the C99 0-1 values bools are being used or the
> older
>> >    0, non-0 variant?
>> >
>> > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
>> > However there are some benign codegen changes on x86, testb changed to andl
> $1.
>> >
>> > This pattern occurs more than 120,000 times in SPECCPU 2017 and so is quite
> common.
>>
>> How does this help a target?  Why does RTL nonzerop bits not recover this
>> information and the desired optimization done later during combine
>> for example?  Why's a SImode compare not OK if there's no QImode compare?
>> We have undocumented addcc, negcc, etc. patterns, should we have a
>> andcc pattern for this indicating support for andcc + jump as opposed
>> to cmpcc + jump?
>>
>> So - what's the target and what's a testcase?
>>
>> Thanks,
>> Richard.
>>
>> > Thanks,
>> > Tamar
>> >
>> > gcc/ChangeLog:
>> >
>> >        * tree.h (tree_zero_one_valued_p): New.
>> >        * dojump.cc (do_jump): Add & 1 if truth type.
>> >
>> > --- inline copy of patch --
>> > diff --git a/gcc/dojump.cc b/gcc/dojump.cc
>> > index
> 2af0cd1aca3b6af13d5d8799094ee93f18022296..8eaf1be49cd12298e61c6946ae79ca9de6197864
> 100644
>> > --- a/gcc/dojump.cc
>> > +++ b/gcc/dojump.cc
>> > @@ -605,7 +605,17 @@ do_jump (tree exp, rtx_code_label *if_false_label,
>> >        /* Fall through and generate the normal code.  */
>> >      default:
>> >      normal:
>> > -      temp = expand_normal (exp);
>> > +      tree cmp = exp;
>> > +      /* If the expression is a truth type then explicitly generate an & 1
>> > +      to indicate to the target that it's a zero-one values type.  This
>> > +      allows the target to further optimize the comparison should it
>> > +      choose to.  */
>> > +      if (tree_zero_one_valued_p (exp))
>> > +     {
>> > +       type = TREE_TYPE (exp);
>> > +       cmp = build2 (BIT_AND_EXPR, type, exp, build_int_cstu (type, 1));
>> > +     }
>> > +      temp = expand_normal (cmp);
>> >        do_pending_stack_adjust ();
>> >        /* The RTL optimizers prefer comparisons against pseudos.  */
>> >        if (GET_CODE (temp) == SUBREG)
>> > diff --git a/gcc/tree.h b/gcc/tree.h
>> > index
> 8f8a9660c9e0605eb516de194640b8c1b531b798..be3d2dee82f692e81082cf21c878c10f9fe9e1f1
> 100644
>> > --- a/gcc/tree.h
>> > +++ b/gcc/tree.h
>> > @@ -4690,6 +4690,7 @@ extern tree signed_or_unsigned_type_for (int, tree);
>> >  extern tree signed_type_for (tree);
>> >  extern tree unsigned_type_for (tree);
>> >  extern bool is_truth_type_for (tree, tree);
>> > +extern bool tree_zero_one_valued_p (tree);
>> >  extern tree truth_type_for (tree);
>> >  extern tree build_pointer_type_for_mode (tree, machine_mode, bool);
>> >  extern tree build_pointer_type (tree);
>> >
>> >
>> >
>> >
>> >
>>
>> --
>> Richard Biener <rguenther@suse.de>
>> SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg,
>> Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman;
>> HRB 36809 (AG Nuernberg)
>>
>
> --
> Richard Biener <rguenther@suse.de>
> SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg,
> Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman;
> HRB 36809 (AG Nuernberg)
  
Jeff Law Sept. 28, 2022, 5:23 p.m. UTC | #9
On 9/28/22 09:04, Richard Sandiford wrote:
> Tamar Christina <Tamar.Christina@arm.com> writes:
>>> Maybe the target could use (subreg:SI (reg:BI ...)) as argument. Heh.
>> But then I'd still need to change the expansion code. I suppose this could
>> prevent the issue with changes to code on other targets.
>>
>>>>> We have undocumented addcc, negcc, etc. patterns, should we have aandcc
>> pattern for this indicating support for andcc + jump as opposedto cmpcc + jump?
>>>> This could work yeah. I didn't know these existed.
>>> Ah, so they are conditional add, not add setting CC, so andcc wouldn't
>>> be appropriate.
>>> So I'm not sure how we'd handle such situation - maybe looking at
>>> REG_DECL and recognizing a _Bool PARM_DECL is OK?
>> I have a slight suspicion that Richard Sandiford would likely reject this
>> though..
> Good guess :-P  We shouldn't rely on something like that for correctness.
>
> Would it help if we promoted the test-and-branch instructions to optabs,
> alongside cbranch?  The jump expanders could then target it directly.
>
> IMO that'd be a reasonable thing to do if it does help.  It's a relatively
> common operation, especially on CISCy targets.

But don't we represent these single bit tests using zero_extract as the 
condition of the branch?  I guess if we can generate them directly 
rather than waiting for combine to deduce that we're dealing with a 
single bit test and constructing the zero_extract form would be an 
improvement and might help aarch at the same time.


jeff
  
Richard Sandiford Sept. 29, 2022, 9:37 a.m. UTC | #10
Jeff Law <jeffreyalaw@gmail.com> writes:
> On 9/28/22 09:04, Richard Sandiford wrote:
>> Tamar Christina <Tamar.Christina@arm.com> writes:
>>>> Maybe the target could use (subreg:SI (reg:BI ...)) as argument. Heh.
>>> But then I'd still need to change the expansion code. I suppose this could
>>> prevent the issue with changes to code on other targets.
>>>
>>>>>> We have undocumented addcc, negcc, etc. patterns, should we have aandcc
>>> pattern for this indicating support for andcc + jump as opposedto cmpcc + jump?
>>>>> This could work yeah. I didn't know these existed.
>>>> Ah, so they are conditional add, not add setting CC, so andcc wouldn't
>>>> be appropriate.
>>>> So I'm not sure how we'd handle such situation - maybe looking at
>>>> REG_DECL and recognizing a _Bool PARM_DECL is OK?
>>> I have a slight suspicion that Richard Sandiford would likely reject this
>>> though..
>> Good guess :-P  We shouldn't rely on something like that for correctness.
>>
>> Would it help if we promoted the test-and-branch instructions to optabs,
>> alongside cbranch?  The jump expanders could then target it directly.
>>
>> IMO that'd be a reasonable thing to do if it does help.  It's a relatively
>> common operation, especially on CISCy targets.
>
> But don't we represent these single bit tests using zero_extract as the 
> condition of the branch?  I guess if we can generate them directly 
> rather than waiting for combine to deduce that we're dealing with a 
> single bit test and constructing the zero_extract form would be an 
> improvement and might help aarch at the same time.

Do you mean that the promote_mode stuff should use ext(z)v rather than
zero_extend to promote a bool, where available?  If so, I agree that
might help.  But it sounds like it would have downsides too.  Currently
a bool memory can be zero-extended on the fly using a load, but if we
used the zero_extract form instead, we'd have to extract the bit after
the load.  And (as an alternative) choosing different behaviour based
on whether expand sees a REG or a MEM sounds like it could still cause
problems, since REGs could be replaced by MEMs (or vice versa) later in
the RTL passes.

ISTM that the original patch was inserting an extra operation in the
branch expansion in order to target a specific instruction.  Targeting
the instruction in expand seems good, but IMO we should do it directly,
based on knowledge of whether the instruction actually exists.

Thanks,
Richard
  
Richard Biener Sept. 29, 2022, 9:40 a.m. UTC | #11
On Thu, 29 Sep 2022, Richard Sandiford wrote:

> Jeff Law <jeffreyalaw@gmail.com> writes:
> > On 9/28/22 09:04, Richard Sandiford wrote:
> >> Tamar Christina <Tamar.Christina@arm.com> writes:
> >>>> Maybe the target could use (subreg:SI (reg:BI ...)) as argument. Heh.
> >>> But then I'd still need to change the expansion code. I suppose this could
> >>> prevent the issue with changes to code on other targets.
> >>>
> >>>>>> We have undocumented addcc, negcc, etc. patterns, should we have aandcc
> >>> pattern for this indicating support for andcc + jump as opposedto cmpcc + jump?
> >>>>> This could work yeah. I didn't know these existed.
> >>>> Ah, so they are conditional add, not add setting CC, so andcc wouldn't
> >>>> be appropriate.
> >>>> So I'm not sure how we'd handle such situation - maybe looking at
> >>>> REG_DECL and recognizing a _Bool PARM_DECL is OK?
> >>> I have a slight suspicion that Richard Sandiford would likely reject this
> >>> though..
> >> Good guess :-P  We shouldn't rely on something like that for correctness.
> >>
> >> Would it help if we promoted the test-and-branch instructions to optabs,
> >> alongside cbranch?  The jump expanders could then target it directly.
> >>
> >> IMO that'd be a reasonable thing to do if it does help.  It's a relatively
> >> common operation, especially on CISCy targets.
> >
> > But don't we represent these single bit tests using zero_extract as the 
> > condition of the branch?  I guess if we can generate them directly 
> > rather than waiting for combine to deduce that we're dealing with a 
> > single bit test and constructing the zero_extract form would be an 
> > improvement and might help aarch at the same time.
> 
> Do you mean that the promote_mode stuff should use ext(z)v rather than
> zero_extend to promote a bool, where available?  If so, I agree that
> might help.  But it sounds like it would have downsides too.  Currently
> a bool memory can be zero-extended on the fly using a load, but if we
> used the zero_extract form instead, we'd have to extract the bit after
> the load.  And (as an alternative) choosing different behaviour based
> on whether expand sees a REG or a MEM sounds like it could still cause
> problems, since REGs could be replaced by MEMs (or vice versa) later in
> the RTL passes.
> 
> ISTM that the original patch was inserting an extra operation in the
> branch expansion in order to target a specific instruction.  Targeting
> the instruction in expand seems good, but IMO we should do it directly,
> based on knowledge of whether the instruction actually exists.

Yes, I think a compare-and-branch pattern is the best fit here.  Note
on GIMPLE we'd rely on the fact this is a BOOLEAN_TYPE (so
even 8 bit precision bools only have 1 and 0 as meaningful values).
So the 'compare-' bit in compare-and-branch would be interpreting
a BOOLEAN_TYPE, not so much a general compare.

Richard.
  
Tamar Christina Sept. 29, 2022, 10:21 a.m. UTC | #12
> -----Original Message-----
> From: Richard Biener <rguenther@suse.de>
> Sent: Thursday, September 29, 2022 10:41 AM
> To: Richard Sandiford <Richard.Sandiford@arm.com>
> Cc: Jeff Law <jeffreyalaw@gmail.com>; Tamar Christina
> <Tamar.Christina@arm.com>; gcc-patches@gcc.gnu.org; nd <nd@arm.com>
> Subject: Re: [PATCH 1/2]middle-end: RFC: On expansion of conditional
> branches, give hint if argument is a truth type to backend
> 
> On Thu, 29 Sep 2022, Richard Sandiford wrote:
> 
> > Jeff Law <jeffreyalaw@gmail.com> writes:
> > > On 9/28/22 09:04, Richard Sandiford wrote:
> > >> Tamar Christina <Tamar.Christina@arm.com> writes:
> > >>>> Maybe the target could use (subreg:SI (reg:BI ...)) as argument. Heh.
> > >>> But then I'd still need to change the expansion code. I suppose
> > >>> this could prevent the issue with changes to code on other targets.
> > >>>
> > >>>>>> We have undocumented addcc, negcc, etc. patterns, should we
> > >>>>>> have aandcc
> > >>> pattern for this indicating support for andcc + jump as opposedto
> cmpcc + jump?
> > >>>>> This could work yeah. I didn't know these existed.
> > >>>> Ah, so they are conditional add, not add setting CC, so andcc
> > >>>> wouldn't be appropriate.
> > >>>> So I'm not sure how we'd handle such situation - maybe looking at
> > >>>> REG_DECL and recognizing a _Bool PARM_DECL is OK?
> > >>> I have a slight suspicion that Richard Sandiford would likely
> > >>> reject this though..
> > >> Good guess :-P  We shouldn't rely on something like that for
> correctness.
> > >>
> > >> Would it help if we promoted the test-and-branch instructions to
> > >> optabs, alongside cbranch?  The jump expanders could then target it
> directly.
> > >>
> > >> IMO that'd be a reasonable thing to do if it does help.  It's a
> > >> relatively common operation, especially on CISCy targets.
> > >
> > > But don't we represent these single bit tests using zero_extract as
> > > the condition of the branch?  I guess if we can generate them
> > > directly rather than waiting for combine to deduce that we're
> > > dealing with a single bit test and constructing the zero_extract
> > > form would be an improvement and might help aarch at the same time.
> >
> > Do you mean that the promote_mode stuff should use ext(z)v rather than
> > zero_extend to promote a bool, where available?  If so, I agree that
> > might help.  But it sounds like it would have downsides too.
> > Currently a bool memory can be zero-extended on the fly using a load,
> > but if we used the zero_extract form instead, we'd have to extract the
> > bit after the load.  And (as an alternative) choosing different
> > behaviour based on whether expand sees a REG or a MEM sounds like it
> > could still cause problems, since REGs could be replaced by MEMs (or
> > vice versa) later in the RTL passes.
> >
> > ISTM that the original patch was inserting an extra operation in the
> > branch expansion in order to target a specific instruction.  Targeting
> > the instruction in expand seems good, but IMO we should do it
> > directly, based on knowledge of whether the instruction actually exists.
> 
> Yes, I think a compare-and-branch pattern is the best fit here.  Note on
> GIMPLE we'd rely on the fact this is a BOOLEAN_TYPE (so even 8 bit precision
> bools only have 1 and 0 as meaningful values).
> So the 'compare-' bit in compare-and-branch would be interpreting a
> BOOLEAN_TYPE, not so much a general compare.

Oh, I was thinking of adding a constant argument representing the precision that
is relevant for the compare in order to make this a bit more general/future proof.

Are you thinking I should instead just make the optab implicitly only work for 1-bit
precision comparisons?

Thanks,
Tamar

> 
> Richard.
  
Richard Biener Sept. 29, 2022, 11:09 a.m. UTC | #13
> Am 29.09.2022 um 12:23 schrieb Tamar Christina via Gcc-patches <gcc-patches@gcc.gnu.org>:
> 
> 
>> 
>> -----Original Message-----
>> From: Richard Biener <rguenther@suse.de>
>> Sent: Thursday, September 29, 2022 10:41 AM
>> To: Richard Sandiford <Richard.Sandiford@arm.com>
>> Cc: Jeff Law <jeffreyalaw@gmail.com>; Tamar Christina
>> <Tamar.Christina@arm.com>; gcc-patches@gcc.gnu.org; nd <nd@arm.com>
>> Subject: Re: [PATCH 1/2]middle-end: RFC: On expansion of conditional
>> branches, give hint if argument is a truth type to backend
>> 
>>> On Thu, 29 Sep 2022, Richard Sandiford wrote:
>>> 
>>> Jeff Law <jeffreyalaw@gmail.com> writes:
>>>> On 9/28/22 09:04, Richard Sandiford wrote:
>>>>> Tamar Christina <Tamar.Christina@arm.com> writes:
>>>>>>> Maybe the target could use (subreg:SI (reg:BI ...)) as argument. Heh.
>>>>>> But then I'd still need to change the expansion code. I suppose
>>>>>> this could prevent the issue with changes to code on other targets.
>>>>>> 
>>>>>>>>> We have undocumented addcc, negcc, etc. patterns, should we
>>>>>>>>> have aandcc
>>>>>> pattern for this indicating support for andcc + jump as opposedto
>> cmpcc + jump?
>>>>>>>> This could work yeah. I didn't know these existed.
>>>>>>> Ah, so they are conditional add, not add setting CC, so andcc
>>>>>>> wouldn't be appropriate.
>>>>>>> So I'm not sure how we'd handle such situation - maybe looking at
>>>>>>> REG_DECL and recognizing a _Bool PARM_DECL is OK?
>>>>>> I have a slight suspicion that Richard Sandiford would likely
>>>>>> reject this though..
>>>>> Good guess :-P  We shouldn't rely on something like that for
>> correctness.
>>>>> 
>>>>> Would it help if we promoted the test-and-branch instructions to
>>>>> optabs, alongside cbranch?  The jump expanders could then target it
>> directly.
>>>>> 
>>>>> IMO that'd be a reasonable thing to do if it does help.  It's a
>>>>> relatively common operation, especially on CISCy targets.
>>>> 
>>>> But don't we represent these single bit tests using zero_extract as
>>>> the condition of the branch?  I guess if we can generate them
>>>> directly rather than waiting for combine to deduce that we're
>>>> dealing with a single bit test and constructing the zero_extract
>>>> form would be an improvement and might help aarch at the same time.
>>> 
>>> Do you mean that the promote_mode stuff should use ext(z)v rather than
>>> zero_extend to promote a bool, where available?  If so, I agree that
>>> might help.  But it sounds like it would have downsides too.
>>> Currently a bool memory can be zero-extended on the fly using a load,
>>> but if we used the zero_extract form instead, we'd have to extract the
>>> bit after the load.  And (as an alternative) choosing different
>>> behaviour based on whether expand sees a REG or a MEM sounds like it
>>> could still cause problems, since REGs could be replaced by MEMs (or
>>> vice versa) later in the RTL passes.
>>> 
>>> ISTM that the original patch was inserting an extra operation in the
>>> branch expansion in order to target a specific instruction.  Targeting
>>> the instruction in expand seems good, but IMO we should do it
>>> directly, based on knowledge of whether the instruction actually exists.
>> 
>> Yes, I think a compare-and-branch pattern is the best fit here.  Note on
>> GIMPLE we'd rely on the fact this is a BOOLEAN_TYPE (so even 8 bit precision
>> bools only have 1 and 0 as meaningful values).
>> So the 'compare-' bit in compare-and-branch would be interpreting a
>> BOOLEAN_TYPE, not so much a general compare.
> 
> Oh, I was thinking of adding a constant argument representing the precision that
> is relevant for the compare in order to make this a bit more general/future proof.
> 
> Are you thinking I should instead just make the optab implicitly only work for 1-bit
> precision comparisons?

What’s the optab you propose (cite also the documentation part)?

> 
> Thanks,
> Tamar
> 
>> 
>> Richard.
  
Jeff Law Sept. 29, 2022, 8:49 p.m. UTC | #14
On 9/29/22 03:37, Richard Sandiford wrote:
> Jeff Law <jeffreyalaw@gmail.com> writes:
>> On 9/28/22 09:04, Richard Sandiford wrote:
>>> Tamar Christina <Tamar.Christina@arm.com> writes:
>>>>> Maybe the target could use (subreg:SI (reg:BI ...)) as argument. Heh.
>>>> But then I'd still need to change the expansion code. I suppose this could
>>>> prevent the issue with changes to code on other targets.
>>>>
>>>>>>> We have undocumented addcc, negcc, etc. patterns, should we have aandcc
>>>> pattern for this indicating support for andcc + jump as opposedto cmpcc + jump?
>>>>>> This could work yeah. I didn't know these existed.
>>>>> Ah, so they are conditional add, not add setting CC, so andcc wouldn't
>>>>> be appropriate.
>>>>> So I'm not sure how we'd handle such situation - maybe looking at
>>>>> REG_DECL and recognizing a _Bool PARM_DECL is OK?
>>>> I have a slight suspicion that Richard Sandiford would likely reject this
>>>> though..
>>> Good guess :-P  We shouldn't rely on something like that for correctness.
>>>
>>> Would it help if we promoted the test-and-branch instructions to optabs,
>>> alongside cbranch?  The jump expanders could then target it directly.
>>>
>>> IMO that'd be a reasonable thing to do if it does help.  It's a relatively
>>> common operation, especially on CISCy targets.
>> But don't we represent these single bit tests using zero_extract as the
>> condition of the branch?  I guess if we can generate them directly
>> rather than waiting for combine to deduce that we're dealing with a
>> single bit test and constructing the zero_extract form would be an
>> improvement and might help aarch at the same time.
> Do you mean that the promote_mode stuff should use ext(z)v rather than
> zero_extend to promote a bool, where available?

No, just that if we're doing a single bit test that the way to handle 
that is with a zero_extract and the earlier we can generate that form, 
the better.


Jeff
  
Tamar Christina Sept. 30, 2022, 8 a.m. UTC | #15
> -----Original Message-----
> From: Gcc-patches <gcc-patches-
> bounces+tamar.christina=arm.com@gcc.gnu.org> On Behalf Of Richard
> Biener via Gcc-patches
> Sent: Thursday, September 29, 2022 12:09 PM
> To: Tamar Christina via Gcc-patches <gcc-patches@gcc.gnu.org>
> Cc: Richard Sandiford <Richard.Sandiford@arm.com>; nd <nd@arm.com>
> Subject: Re: [PATCH 1/2]middle-end: RFC: On expansion of conditional
> branches, give hint if argument is a truth type to backend
> 
> 
> 
> > Am 29.09.2022 um 12:23 schrieb Tamar Christina via Gcc-patches <gcc-
> patches@gcc.gnu.org>:
> >
> > 
> >>
> >> -----Original Message-----
> >> From: Richard Biener <rguenther@suse.de>
> >> Sent: Thursday, September 29, 2022 10:41 AM
> >> To: Richard Sandiford <Richard.Sandiford@arm.com>
> >> Cc: Jeff Law <jeffreyalaw@gmail.com>; Tamar Christina
> >> <Tamar.Christina@arm.com>; gcc-patches@gcc.gnu.org; nd
> <nd@arm.com>
> >> Subject: Re: [PATCH 1/2]middle-end: RFC: On expansion of conditional
> >> branches, give hint if argument is a truth type to backend
> >>
> >>> On Thu, 29 Sep 2022, Richard Sandiford wrote:
> >>>
> >>> Jeff Law <jeffreyalaw@gmail.com> writes:
> >>>> On 9/28/22 09:04, Richard Sandiford wrote:
> >>>>> Tamar Christina <Tamar.Christina@arm.com> writes:
> >>>>>>> Maybe the target could use (subreg:SI (reg:BI ...)) as argument.
> Heh.
> >>>>>> But then I'd still need to change the expansion code. I suppose
> >>>>>> this could prevent the issue with changes to code on other targets.
> >>>>>>
> >>>>>>>>> We have undocumented addcc, negcc, etc. patterns, should we
> >>>>>>>>> have aandcc
> >>>>>> pattern for this indicating support for andcc + jump as opposedto
> >> cmpcc + jump?
> >>>>>>>> This could work yeah. I didn't know these existed.
> >>>>>>> Ah, so they are conditional add, not add setting CC, so andcc
> >>>>>>> wouldn't be appropriate.
> >>>>>>> So I'm not sure how we'd handle such situation - maybe looking
> >>>>>>> at REG_DECL and recognizing a _Bool PARM_DECL is OK?
> >>>>>> I have a slight suspicion that Richard Sandiford would likely
> >>>>>> reject this though..
> >>>>> Good guess :-P  We shouldn't rely on something like that for
> >> correctness.
> >>>>>
> >>>>> Would it help if we promoted the test-and-branch instructions to
> >>>>> optabs, alongside cbranch?  The jump expanders could then target
> >>>>> it
> >> directly.
> >>>>>
> >>>>> IMO that'd be a reasonable thing to do if it does help.  It's a
> >>>>> relatively common operation, especially on CISCy targets.
> >>>>
> >>>> But don't we represent these single bit tests using zero_extract as
> >>>> the condition of the branch?  I guess if we can generate them
> >>>> directly rather than waiting for combine to deduce that we're
> >>>> dealing with a single bit test and constructing the zero_extract
> >>>> form would be an improvement and might help aarch at the same time.
> >>>
> >>> Do you mean that the promote_mode stuff should use ext(z)v rather
> >>> than zero_extend to promote a bool, where available?  If so, I agree
> >>> that might help.  But it sounds like it would have downsides too.
> >>> Currently a bool memory can be zero-extended on the fly using a
> >>> load, but if we used the zero_extract form instead, we'd have to
> >>> extract the bit after the load.  And (as an alternative) choosing
> >>> different behaviour based on whether expand sees a REG or a MEM
> >>> sounds like it could still cause problems, since REGs could be
> >>> replaced by MEMs (or vice versa) later in the RTL passes.
> >>>
> >>> ISTM that the original patch was inserting an extra operation in the
> >>> branch expansion in order to target a specific instruction.
> >>> Targeting the instruction in expand seems good, but IMO we should do
> >>> it directly, based on knowledge of whether the instruction actually
> exists.
> >>
> >> Yes, I think a compare-and-branch pattern is the best fit here.  Note
> >> on GIMPLE we'd rely on the fact this is a BOOLEAN_TYPE (so even 8 bit
> >> precision bools only have 1 and 0 as meaningful values).
> >> So the 'compare-' bit in compare-and-branch would be interpreting a
> >> BOOLEAN_TYPE, not so much a general compare.
> >
> > Oh, I was thinking of adding a constant argument representing the
> > precision that is relevant for the compare in order to make this a bit more
> general/future proof.
> >
> > Are you thinking I should instead just make the optab implicitly only
> > work for 1-bit precision comparisons?
> 
> What’s the optab you propose (cite also the documentation part)?

tbranchmode5
  Conditional branch instruction combined with a bit test instruction. Operand 0 is a comparison operator.
  Operand 1 and Operand 2 are the first and second operands of the comparison, respectively.
  Operand 3 is the number of low-order bits that are relevant for the comparison.
  Operand 4 is the code_label to jump to.

Specifically this representation would allow us to emit all our different conditional branching instructions
without needing to rely on combine.  We have some cases that happen during optimization that sometimes prevent
the optimal sequence from being generated. This would also solve that as we would expand to what we want to
start with.

Tamar.

> 
> >
> > Thanks,
> > Tamar
> >
> >>
> >> Richard.
  
Richard Sandiford Sept. 30, 2022, 8:28 a.m. UTC | #16
Tamar Christina <Tamar.Christina@arm.com> writes:
>> -----Original Message-----
>> From: Gcc-patches <gcc-patches-
>> bounces+tamar.christina=arm.com@gcc.gnu.org> On Behalf Of Richard
>> Biener via Gcc-patches
>> Sent: Thursday, September 29, 2022 12:09 PM
>> To: Tamar Christina via Gcc-patches <gcc-patches@gcc.gnu.org>
>> Cc: Richard Sandiford <Richard.Sandiford@arm.com>; nd <nd@arm.com>
>> Subject: Re: [PATCH 1/2]middle-end: RFC: On expansion of conditional
>> branches, give hint if argument is a truth type to backend
>>
>>
>>
>> > Am 29.09.2022 um 12:23 schrieb Tamar Christina via Gcc-patches <gcc-
>> patches@gcc.gnu.org>:
>> >
>> >
>> >>
>> >> -----Original Message-----
>> >> From: Richard Biener <rguenther@suse.de>
>> >> Sent: Thursday, September 29, 2022 10:41 AM
>> >> To: Richard Sandiford <Richard.Sandiford@arm.com>
>> >> Cc: Jeff Law <jeffreyalaw@gmail.com>; Tamar Christina
>> >> <Tamar.Christina@arm.com>; gcc-patches@gcc.gnu.org; nd
>> <nd@arm.com>
>> >> Subject: Re: [PATCH 1/2]middle-end: RFC: On expansion of conditional
>> >> branches, give hint if argument is a truth type to backend
>> >>
>> >>> On Thu, 29 Sep 2022, Richard Sandiford wrote:
>> >>>
>> >>> Jeff Law <jeffreyalaw@gmail.com> writes:
>> >>>> On 9/28/22 09:04, Richard Sandiford wrote:
>> >>>>> Tamar Christina <Tamar.Christina@arm.com> writes:
>> >>>>>>> Maybe the target could use (subreg:SI (reg:BI ...)) as argument.
>> Heh.
>> >>>>>> But then I'd still need to change the expansion code. I suppose
>> >>>>>> this could prevent the issue with changes to code on other targets.
>> >>>>>>
>> >>>>>>>>> We have undocumented addcc, negcc, etc. patterns, should we
>> >>>>>>>>> have aandcc
>> >>>>>> pattern for this indicating support for andcc + jump as opposedto
>> >> cmpcc + jump?
>> >>>>>>>> This could work yeah. I didn't know these existed.
>> >>>>>>> Ah, so they are conditional add, not add setting CC, so andcc
>> >>>>>>> wouldn't be appropriate.
>> >>>>>>> So I'm not sure how we'd handle such situation - maybe looking
>> >>>>>>> at REG_DECL and recognizing a _Bool PARM_DECL is OK?
>> >>>>>> I have a slight suspicion that Richard Sandiford would likely
>> >>>>>> reject this though..
>> >>>>> Good guess :-P  We shouldn't rely on something like that for
>> >> correctness.
>> >>>>>
>> >>>>> Would it help if we promoted the test-and-branch instructions to
>> >>>>> optabs, alongside cbranch?  The jump expanders could then target
>> >>>>> it
>> >> directly.
>> >>>>>
>> >>>>> IMO that'd be a reasonable thing to do if it does help.  It's a
>> >>>>> relatively common operation, especially on CISCy targets.
>> >>>>
>> >>>> But don't we represent these single bit tests using zero_extract as
>> >>>> the condition of the branch?  I guess if we can generate them
>> >>>> directly rather than waiting for combine to deduce that we're
>> >>>> dealing with a single bit test and constructing the zero_extract
>> >>>> form would be an improvement and might help aarch at the same time.
>> >>>
>> >>> Do you mean that the promote_mode stuff should use ext(z)v rather
>> >>> than zero_extend to promote a bool, where available?  If so, I agree
>> >>> that might help.  But it sounds like it would have downsides too.
>> >>> Currently a bool memory can be zero-extended on the fly using a
>> >>> load, but if we used the zero_extract form instead, we'd have to
>> >>> extract the bit after the load.  And (as an alternative) choosing
>> >>> different behaviour based on whether expand sees a REG or a MEM
>> >>> sounds like it could still cause problems, since REGs could be
>> >>> replaced by MEMs (or vice versa) later in the RTL passes.
>> >>>
>> >>> ISTM that the original patch was inserting an extra operation in the
>> >>> branch expansion in order to target a specific instruction.
>> >>> Targeting the instruction in expand seems good, but IMO we should do
>> >>> it directly, based on knowledge of whether the instruction actually
>> exists.
>> >>
>> >> Yes, I think a compare-and-branch pattern is the best fit here.  Note
>> >> on GIMPLE we'd rely on the fact this is a BOOLEAN_TYPE (so even 8 bit
>> >> precision bools only have 1 and 0 as meaningful values).
>> >> So the 'compare-' bit in compare-and-branch would be interpreting a
>> >> BOOLEAN_TYPE, not so much a general compare.
>> >
>> > Oh, I was thinking of adding a constant argument representing the
>> > precision that is relevant for the compare in order to make this a bit more
>> general/future proof.
>> >
>> > Are you thinking I should instead just make the optab implicitly only
>> > work for 1-bit precision comparisons?
>>
>> What’s the optab you propose (cite also the documentation part)?
>
> tbranchmode5
>   Conditional branch instruction combined with a bit test instruction. Operand 0 is a comparison operator.
>   Operand 1 and Operand 2 are the first and second operands of the comparison, respectively.
>   Operand 3 is the number of low-order bits that are relevant for the comparison.
>   Operand 4 is the code_label to jump to.

For the TB instructions (and for other similar instructions that I've
seen on other architectures) it would be more useful to have a single-bit
test, with operand 4 specifying the bit position.  Arguably it might then
be better to have separate eq and ne optabs, to avoid the awkward doubling
of the operands (operand 1 contains operands 2 and 3).

I guess a more general way of achieving the same thing would be to make
operand 4 in the optab above a mask rather than a bit count.  But that
might be overly general, if there are no known architectures that have
such an instruction.

Thanks,
Richard

> Specifically this representation would allow us to emit all our different conditional branching instructions
> without needing to rely on combine.  We have some cases that happen during optimization that sometimes prevent
> the optimal sequence from being generated. This would also solve that as we would expand to what we want to
> start with.
>
> Tamar.
>
>>
>> >
>> > Thanks,
>> > Tamar
>> >
>> >>
>> >> Richard.
  
Tamar Christina Sept. 30, 2022, 8:38 a.m. UTC | #17
> -----Original Message-----
> From: Richard Sandiford <richard.sandiford@arm.com>
> Sent: Friday, September 30, 2022 9:29 AM
> To: Tamar Christina <Tamar.Christina@arm.com>
> Cc: Richard Biener <rguenther@suse.de>; Tamar Christina via Gcc-patches
> <gcc-patches@gcc.gnu.org>; nd <nd@arm.com>; Jeff Law
> <jeffreyalaw@gmail.com>
> Subject: Re: [PATCH 1/2]middle-end: RFC: On expansion of conditional
> branches, give hint if argument is a truth type to backend
> 
> Tamar Christina <Tamar.Christina@arm.com> writes:
> >> -----Original Message-----
> >> From: Gcc-patches <gcc-patches-
> >> bounces+tamar.christina=arm.com@gcc.gnu.org> On Behalf Of Richard
> >> Biener via Gcc-patches
> >> Sent: Thursday, September 29, 2022 12:09 PM
> >> To: Tamar Christina via Gcc-patches <gcc-patches@gcc.gnu.org>
> >> Cc: Richard Sandiford <Richard.Sandiford@arm.com>; nd <nd@arm.com>
> >> Subject: Re: [PATCH 1/2]middle-end: RFC: On expansion of conditional
> >> branches, give hint if argument is a truth type to backend
> >>
> >>
> >>
> >> > Am 29.09.2022 um 12:23 schrieb Tamar Christina via Gcc-patches
> >> > <gcc-
> >> patches@gcc.gnu.org>:
> >> >
> >> >
> >> >>
> >> >> -----Original Message-----
> >> >> From: Richard Biener <rguenther@suse.de>
> >> >> Sent: Thursday, September 29, 2022 10:41 AM
> >> >> To: Richard Sandiford <Richard.Sandiford@arm.com>
> >> >> Cc: Jeff Law <jeffreyalaw@gmail.com>; Tamar Christina
> >> >> <Tamar.Christina@arm.com>; gcc-patches@gcc.gnu.org; nd
> >> <nd@arm.com>
> >> >> Subject: Re: [PATCH 1/2]middle-end: RFC: On expansion of
> >> >> conditional branches, give hint if argument is a truth type to
> >> >> backend
> >> >>
> >> >>> On Thu, 29 Sep 2022, Richard Sandiford wrote:
> >> >>>
> >> >>> Jeff Law <jeffreyalaw@gmail.com> writes:
> >> >>>> On 9/28/22 09:04, Richard Sandiford wrote:
> >> >>>>> Tamar Christina <Tamar.Christina@arm.com> writes:
> >> >>>>>>> Maybe the target could use (subreg:SI (reg:BI ...)) as argument.
> >> Heh.
> >> >>>>>> But then I'd still need to change the expansion code. I
> >> >>>>>> suppose this could prevent the issue with changes to code on
> other targets.
> >> >>>>>>
> >> >>>>>>>>> We have undocumented addcc, negcc, etc. patterns, should
> we
> >> >>>>>>>>> have aandcc
> >> >>>>>> pattern for this indicating support for andcc + jump as
> >> >>>>>> opposedto
> >> >> cmpcc + jump?
> >> >>>>>>>> This could work yeah. I didn't know these existed.
> >> >>>>>>> Ah, so they are conditional add, not add setting CC, so andcc
> >> >>>>>>> wouldn't be appropriate.
> >> >>>>>>> So I'm not sure how we'd handle such situation - maybe
> >> >>>>>>> looking at REG_DECL and recognizing a _Bool PARM_DECL is OK?
> >> >>>>>> I have a slight suspicion that Richard Sandiford would likely
> >> >>>>>> reject this though..
> >> >>>>> Good guess :-P  We shouldn't rely on something like that for
> >> >> correctness.
> >> >>>>>
> >> >>>>> Would it help if we promoted the test-and-branch instructions
> >> >>>>> to optabs, alongside cbranch?  The jump expanders could then
> >> >>>>> target it
> >> >> directly.
> >> >>>>>
> >> >>>>> IMO that'd be a reasonable thing to do if it does help.  It's a
> >> >>>>> relatively common operation, especially on CISCy targets.
> >> >>>>
> >> >>>> But don't we represent these single bit tests using zero_extract
> >> >>>> as the condition of the branch?  I guess if we can generate them
> >> >>>> directly rather than waiting for combine to deduce that we're
> >> >>>> dealing with a single bit test and constructing the zero_extract
> >> >>>> form would be an improvement and might help aarch at the same
> time.
> >> >>>
> >> >>> Do you mean that the promote_mode stuff should use ext(z)v rather
> >> >>> than zero_extend to promote a bool, where available?  If so, I
> >> >>> agree that might help.  But it sounds like it would have downsides
> too.
> >> >>> Currently a bool memory can be zero-extended on the fly using a
> >> >>> load, but if we used the zero_extract form instead, we'd have to
> >> >>> extract the bit after the load.  And (as an alternative) choosing
> >> >>> different behaviour based on whether expand sees a REG or a MEM
> >> >>> sounds like it could still cause problems, since REGs could be
> >> >>> replaced by MEMs (or vice versa) later in the RTL passes.
> >> >>>
> >> >>> ISTM that the original patch was inserting an extra operation in
> >> >>> the branch expansion in order to target a specific instruction.
> >> >>> Targeting the instruction in expand seems good, but IMO we should
> >> >>> do it directly, based on knowledge of whether the instruction
> >> >>> actually
> >> exists.
> >> >>
> >> >> Yes, I think a compare-and-branch pattern is the best fit here.
> >> >> Note on GIMPLE we'd rely on the fact this is a BOOLEAN_TYPE (so
> >> >> even 8 bit precision bools only have 1 and 0 as meaningful values).
> >> >> So the 'compare-' bit in compare-and-branch would be interpreting
> >> >> a BOOLEAN_TYPE, not so much a general compare.
> >> >
> >> > Oh, I was thinking of adding a constant argument representing the
> >> > precision that is relevant for the compare in order to make this a
> >> > bit more
> >> general/future proof.
> >> >
> >> > Are you thinking I should instead just make the optab implicitly
> >> > only work for 1-bit precision comparisons?
> >>
> >> What’s the optab you propose (cite also the documentation part)?
> >
> > tbranchmode5
> >   Conditional branch instruction combined with a bit test instruction.
> Operand 0 is a comparison operator.
> >   Operand 1 and Operand 2 are the first and second operands of the
> comparison, respectively.
> >   Operand 3 is the number of low-order bits that are relevant for the
> comparison.
> >   Operand 4 is the code_label to jump to.
> 
> For the TB instructions (and for other similar instructions that I've seen on
> other architectures) it would be more useful to have a single-bit test, with
> operand 4 specifying the bit position.  Arguably it might then be better to
> have separate eq and ne optabs, to avoid the awkward doubling of the
> operands (operand 1 contains operands 2 and 3).
> 
> I guess a more general way of achieving the same thing would be to make
> operand 4 in the optab above a mask rather than a bit count.  But that might
> be overly general, if there are no known architectures that have such an
> instruction.

One of the reasons I wanted a range rather than a single bit is that I can the use
this to generate cbz/cbnz early on as well.  This would mean we could use my earlier
patch that tried to drop the QI/HI promotions without needing the any_extend additional
pass if we wanted to.

We'd also no longer need to rely on seeing a paradoxical subreg for a tst.

Tamar.

> 
> Thanks,
> Richard
> 
> > Specifically this representation would allow us to emit all our
> > different conditional branching instructions without needing to rely
> > on combine.  We have some cases that happen during optimization that
> > sometimes prevent the optimal sequence from being generated. This
> would also solve that as we would expand to what we want to start with.
> >
> > Tamar.
> >
> >>
> >> >
> >> > Thanks,
> >> > Tamar
> >> >
> >> >>
> >> >> Richard.
  
Richard Sandiford Sept. 30, 2022, 8:48 a.m. UTC | #18
Tamar Christina <Tamar.Christina@arm.com> writes:
>> -----Original Message-----
>> From: Richard Sandiford <richard.sandiford@arm.com>
>> Sent: Friday, September 30, 2022 9:29 AM
>> To: Tamar Christina <Tamar.Christina@arm.com>
>> Cc: Richard Biener <rguenther@suse.de>; Tamar Christina via Gcc-patches
>> <gcc-patches@gcc.gnu.org>; nd <nd@arm.com>; Jeff Law
>> <jeffreyalaw@gmail.com>
>> Subject: Re: [PATCH 1/2]middle-end: RFC: On expansion of conditional
>> branches, give hint if argument is a truth type to backend
>> 
>> Tamar Christina <Tamar.Christina@arm.com> writes:
>> >> -----Original Message-----
>> >> From: Gcc-patches <gcc-patches-
>> >> bounces+tamar.christina=arm.com@gcc.gnu.org> On Behalf Of Richard
>> >> Biener via Gcc-patches
>> >> Sent: Thursday, September 29, 2022 12:09 PM
>> >> To: Tamar Christina via Gcc-patches <gcc-patches@gcc.gnu.org>
>> >> Cc: Richard Sandiford <Richard.Sandiford@arm.com>; nd <nd@arm.com>
>> >> Subject: Re: [PATCH 1/2]middle-end: RFC: On expansion of conditional
>> >> branches, give hint if argument is a truth type to backend
>> >>
>> >>
>> >>
>> >> > Am 29.09.2022 um 12:23 schrieb Tamar Christina via Gcc-patches
>> >> > <gcc-
>> >> patches@gcc.gnu.org>:
>> >> >
>> >> >
>> >> >>
>> >> >> -----Original Message-----
>> >> >> From: Richard Biener <rguenther@suse.de>
>> >> >> Sent: Thursday, September 29, 2022 10:41 AM
>> >> >> To: Richard Sandiford <Richard.Sandiford@arm.com>
>> >> >> Cc: Jeff Law <jeffreyalaw@gmail.com>; Tamar Christina
>> >> >> <Tamar.Christina@arm.com>; gcc-patches@gcc.gnu.org; nd
>> >> <nd@arm.com>
>> >> >> Subject: Re: [PATCH 1/2]middle-end: RFC: On expansion of
>> >> >> conditional branches, give hint if argument is a truth type to
>> >> >> backend
>> >> >>
>> >> >>> On Thu, 29 Sep 2022, Richard Sandiford wrote:
>> >> >>>
>> >> >>> Jeff Law <jeffreyalaw@gmail.com> writes:
>> >> >>>> On 9/28/22 09:04, Richard Sandiford wrote:
>> >> >>>>> Tamar Christina <Tamar.Christina@arm.com> writes:
>> >> >>>>>>> Maybe the target could use (subreg:SI (reg:BI ...)) as argument.
>> >> Heh.
>> >> >>>>>> But then I'd still need to change the expansion code. I
>> >> >>>>>> suppose this could prevent the issue with changes to code on
>> other targets.
>> >> >>>>>>
>> >> >>>>>>>>> We have undocumented addcc, negcc, etc. patterns, should
>> we
>> >> >>>>>>>>> have aandcc
>> >> >>>>>> pattern for this indicating support for andcc + jump as
>> >> >>>>>> opposedto
>> >> >> cmpcc + jump?
>> >> >>>>>>>> This could work yeah. I didn't know these existed.
>> >> >>>>>>> Ah, so they are conditional add, not add setting CC, so andcc
>> >> >>>>>>> wouldn't be appropriate.
>> >> >>>>>>> So I'm not sure how we'd handle such situation - maybe
>> >> >>>>>>> looking at REG_DECL and recognizing a _Bool PARM_DECL is OK?
>> >> >>>>>> I have a slight suspicion that Richard Sandiford would likely
>> >> >>>>>> reject this though..
>> >> >>>>> Good guess :-P  We shouldn't rely on something like that for
>> >> >> correctness.
>> >> >>>>>
>> >> >>>>> Would it help if we promoted the test-and-branch instructions
>> >> >>>>> to optabs, alongside cbranch?  The jump expanders could then
>> >> >>>>> target it
>> >> >> directly.
>> >> >>>>>
>> >> >>>>> IMO that'd be a reasonable thing to do if it does help.  It's a
>> >> >>>>> relatively common operation, especially on CISCy targets.
>> >> >>>>
>> >> >>>> But don't we represent these single bit tests using zero_extract
>> >> >>>> as the condition of the branch?  I guess if we can generate them
>> >> >>>> directly rather than waiting for combine to deduce that we're
>> >> >>>> dealing with a single bit test and constructing the zero_extract
>> >> >>>> form would be an improvement and might help aarch at the same
>> time.
>> >> >>>
>> >> >>> Do you mean that the promote_mode stuff should use ext(z)v rather
>> >> >>> than zero_extend to promote a bool, where available?  If so, I
>> >> >>> agree that might help.  But it sounds like it would have downsides
>> too.
>> >> >>> Currently a bool memory can be zero-extended on the fly using a
>> >> >>> load, but if we used the zero_extract form instead, we'd have to
>> >> >>> extract the bit after the load.  And (as an alternative) choosing
>> >> >>> different behaviour based on whether expand sees a REG or a MEM
>> >> >>> sounds like it could still cause problems, since REGs could be
>> >> >>> replaced by MEMs (or vice versa) later in the RTL passes.
>> >> >>>
>> >> >>> ISTM that the original patch was inserting an extra operation in
>> >> >>> the branch expansion in order to target a specific instruction.
>> >> >>> Targeting the instruction in expand seems good, but IMO we should
>> >> >>> do it directly, based on knowledge of whether the instruction
>> >> >>> actually
>> >> exists.
>> >> >>
>> >> >> Yes, I think a compare-and-branch pattern is the best fit here.
>> >> >> Note on GIMPLE we'd rely on the fact this is a BOOLEAN_TYPE (so
>> >> >> even 8 bit precision bools only have 1 and 0 as meaningful values).
>> >> >> So the 'compare-' bit in compare-and-branch would be interpreting
>> >> >> a BOOLEAN_TYPE, not so much a general compare.
>> >> >
>> >> > Oh, I was thinking of adding a constant argument representing the
>> >> > precision that is relevant for the compare in order to make this a
>> >> > bit more
>> >> general/future proof.
>> >> >
>> >> > Are you thinking I should instead just make the optab implicitly
>> >> > only work for 1-bit precision comparisons?
>> >>
>> >> What’s the optab you propose (cite also the documentation part)?
>> >
>> > tbranchmode5
>> >   Conditional branch instruction combined with a bit test instruction.
>> Operand 0 is a comparison operator.
>> >   Operand 1 and Operand 2 are the first and second operands of the
>> comparison, respectively.
>> >   Operand 3 is the number of low-order bits that are relevant for the
>> comparison.
>> >   Operand 4 is the code_label to jump to.
>> 
>> For the TB instructions (and for other similar instructions that I've seen on
>> other architectures) it would be more useful to have a single-bit test, with
>> operand 4 specifying the bit position.  Arguably it might then be better to
>> have separate eq and ne optabs, to avoid the awkward doubling of the
>> operands (operand 1 contains operands 2 and 3).
>> 
>> I guess a more general way of achieving the same thing would be to make
>> operand 4 in the optab above a mask rather than a bit count.  But that might
>> be overly general, if there are no known architectures that have such an
>> instruction.
>
> One of the reasons I wanted a range rather than a single bit is that I can the use
> this to generate cbz/cbnz early on as well.

We already have the opportunity to do that via cbranch<mode>4.
But at the moment aarch64.md always forces the separate comparison
instead.  (Not sure why TBH.  Does it enable more ifcvt opportunities?)

If we change the body of cbranch<mode>4 to:

  if ((GET_CODE (operands[0]) != EQ && GET_CODE (operands[0]) != NE)
      || operands[2] != const0_rtx)
    {
      operands[1] = aarch64_gen_compare_reg (GET_CODE (operands[0]),
					     operands[1], operands[2]);
      operands[2] = const0_rtx;
    }

then we generate the cbz/cbnz directly.

Thanks,
Richard


> This would mean we could use my earlier
> patch that tried to drop the QI/HI promotions without needing the any_extend additional
> pass if we wanted to.
>
> We'd also no longer need to rely on seeing a paradoxical subreg for a tst.
>
> Tamar.
>
>> 
>> Thanks,
>> Richard
>> 
>> > Specifically this representation would allow us to emit all our
>> > different conditional branching instructions without needing to rely
>> > on combine.  We have some cases that happen during optimization that
>> > sometimes prevent the optimal sequence from being generated. This
>> would also solve that as we would expand to what we want to start with.
>> >
>> > Tamar.
>> >
>> >>
>> >> >
>> >> > Thanks,
>> >> > Tamar
>> >> >
>> >> >>
>> >> >> Richard.
  
Tamar Christina Sept. 30, 2022, 9:15 a.m. UTC | #19
> -----Original Message-----
> From: Richard Sandiford <richard.sandiford@arm.com>
> Sent: Friday, September 30, 2022 9:49 AM
> To: Tamar Christina <Tamar.Christina@arm.com>
> Cc: Richard Biener <rguenther@suse.de>; Tamar Christina via Gcc-patches
> <gcc-patches@gcc.gnu.org>; nd <nd@arm.com>; Jeff Law
> <jeffreyalaw@gmail.com>
> Subject: Re: [PATCH 1/2]middle-end: RFC: On expansion of conditional
> branches, give hint if argument is a truth type to backend
> 
> Tamar Christina <Tamar.Christina@arm.com> writes:
> >> -----Original Message-----
> >> From: Richard Sandiford <richard.sandiford@arm.com>
> >> Sent: Friday, September 30, 2022 9:29 AM
> >> To: Tamar Christina <Tamar.Christina@arm.com>
> >> Cc: Richard Biener <rguenther@suse.de>; Tamar Christina via
> >> Gcc-patches <gcc-patches@gcc.gnu.org>; nd <nd@arm.com>; Jeff Law
> >> <jeffreyalaw@gmail.com>
> >> Subject: Re: [PATCH 1/2]middle-end: RFC: On expansion of conditional
> >> branches, give hint if argument is a truth type to backend
> >>
> >> Tamar Christina <Tamar.Christina@arm.com> writes:
> >> >> -----Original Message-----
> >> >> From: Gcc-patches <gcc-patches-
> >> >> bounces+tamar.christina=arm.com@gcc.gnu.org> On Behalf Of
> Richard
> >> >> Biener via Gcc-patches
> >> >> Sent: Thursday, September 29, 2022 12:09 PM
> >> >> To: Tamar Christina via Gcc-patches <gcc-patches@gcc.gnu.org>
> >> >> Cc: Richard Sandiford <Richard.Sandiford@arm.com>; nd
> <nd@arm.com>
> >> >> Subject: Re: [PATCH 1/2]middle-end: RFC: On expansion of
> >> >> conditional branches, give hint if argument is a truth type to
> >> >> backend
> >> >>
> >> >>
> >> >>
> >> >> > Am 29.09.2022 um 12:23 schrieb Tamar Christina via Gcc-patches
> >> >> > <gcc-
> >> >> patches@gcc.gnu.org>:
> >> >> >
> >> >> >
> >> >> >>
> >> >> >> -----Original Message-----
> >> >> >> From: Richard Biener <rguenther@suse.de>
> >> >> >> Sent: Thursday, September 29, 2022 10:41 AM
> >> >> >> To: Richard Sandiford <Richard.Sandiford@arm.com>
> >> >> >> Cc: Jeff Law <jeffreyalaw@gmail.com>; Tamar Christina
> >> >> >> <Tamar.Christina@arm.com>; gcc-patches@gcc.gnu.org; nd
> >> >> <nd@arm.com>
> >> >> >> Subject: Re: [PATCH 1/2]middle-end: RFC: On expansion of
> >> >> >> conditional branches, give hint if argument is a truth type to
> >> >> >> backend
> >> >> >>
> >> >> >>> On Thu, 29 Sep 2022, Richard Sandiford wrote:
> >> >> >>>
> >> >> >>> Jeff Law <jeffreyalaw@gmail.com> writes:
> >> >> >>>> On 9/28/22 09:04, Richard Sandiford wrote:
> >> >> >>>>> Tamar Christina <Tamar.Christina@arm.com> writes:
> >> >> >>>>>>> Maybe the target could use (subreg:SI (reg:BI ...)) as
> argument.
> >> >> Heh.
> >> >> >>>>>> But then I'd still need to change the expansion code. I
> >> >> >>>>>> suppose this could prevent the issue with changes to code
> >> >> >>>>>> on
> >> other targets.
> >> >> >>>>>>
> >> >> >>>>>>>>> We have undocumented addcc, negcc, etc. patterns,
> should
> >> we
> >> >> >>>>>>>>> have aandcc
> >> >> >>>>>> pattern for this indicating support for andcc + jump as
> >> >> >>>>>> opposedto
> >> >> >> cmpcc + jump?
> >> >> >>>>>>>> This could work yeah. I didn't know these existed.
> >> >> >>>>>>> Ah, so they are conditional add, not add setting CC, so
> >> >> >>>>>>> andcc wouldn't be appropriate.
> >> >> >>>>>>> So I'm not sure how we'd handle such situation - maybe
> >> >> >>>>>>> looking at REG_DECL and recognizing a _Bool PARM_DECL is
> OK?
> >> >> >>>>>> I have a slight suspicion that Richard Sandiford would
> >> >> >>>>>> likely reject this though..
> >> >> >>>>> Good guess :-P  We shouldn't rely on something like that for
> >> >> >> correctness.
> >> >> >>>>>
> >> >> >>>>> Would it help if we promoted the test-and-branch
> >> >> >>>>> instructions to optabs, alongside cbranch?  The jump
> >> >> >>>>> expanders could then target it
> >> >> >> directly.
> >> >> >>>>>
> >> >> >>>>> IMO that'd be a reasonable thing to do if it does help.
> >> >> >>>>> It's a relatively common operation, especially on CISCy targets.
> >> >> >>>>
> >> >> >>>> But don't we represent these single bit tests using
> >> >> >>>> zero_extract as the condition of the branch?  I guess if we
> >> >> >>>> can generate them directly rather than waiting for combine to
> >> >> >>>> deduce that we're dealing with a single bit test and
> >> >> >>>> constructing the zero_extract form would be an improvement
> >> >> >>>> and might help aarch at the same
> >> time.
> >> >> >>>
> >> >> >>> Do you mean that the promote_mode stuff should use ext(z)v
> >> >> >>> rather than zero_extend to promote a bool, where available?
> >> >> >>> If so, I agree that might help.  But it sounds like it would
> >> >> >>> have downsides
> >> too.
> >> >> >>> Currently a bool memory can be zero-extended on the fly using
> >> >> >>> a load, but if we used the zero_extract form instead, we'd
> >> >> >>> have to extract the bit after the load.  And (as an
> >> >> >>> alternative) choosing different behaviour based on whether
> >> >> >>> expand sees a REG or a MEM sounds like it could still cause
> >> >> >>> problems, since REGs could be replaced by MEMs (or vice versa)
> later in the RTL passes.
> >> >> >>>
> >> >> >>> ISTM that the original patch was inserting an extra operation
> >> >> >>> in the branch expansion in order to target a specific instruction.
> >> >> >>> Targeting the instruction in expand seems good, but IMO we
> >> >> >>> should do it directly, based on knowledge of whether the
> >> >> >>> instruction actually
> >> >> exists.
> >> >> >>
> >> >> >> Yes, I think a compare-and-branch pattern is the best fit here.
> >> >> >> Note on GIMPLE we'd rely on the fact this is a BOOLEAN_TYPE (so
> >> >> >> even 8 bit precision bools only have 1 and 0 as meaningful values).
> >> >> >> So the 'compare-' bit in compare-and-branch would be
> >> >> >> interpreting a BOOLEAN_TYPE, not so much a general compare.
> >> >> >
> >> >> > Oh, I was thinking of adding a constant argument representing
> >> >> > the precision that is relevant for the compare in order to make
> >> >> > this a bit more
> >> >> general/future proof.
> >> >> >
> >> >> > Are you thinking I should instead just make the optab implicitly
> >> >> > only work for 1-bit precision comparisons?
> >> >>
> >> >> What’s the optab you propose (cite also the documentation part)?
> >> >
> >> > tbranchmode5
> >> >   Conditional branch instruction combined with a bit test instruction.
> >> Operand 0 is a comparison operator.
> >> >   Operand 1 and Operand 2 are the first and second operands of the
> >> comparison, respectively.
> >> >   Operand 3 is the number of low-order bits that are relevant for
> >> > the
> >> comparison.
> >> >   Operand 4 is the code_label to jump to.
> >>
> >> For the TB instructions (and for other similar instructions that I've
> >> seen on other architectures) it would be more useful to have a
> >> single-bit test, with operand 4 specifying the bit position.
> >> Arguably it might then be better to have separate eq and ne optabs,
> >> to avoid the awkward doubling of the operands (operand 1 contains
> operands 2 and 3).
> >>
> >> I guess a more general way of achieving the same thing would be to
> >> make operand 4 in the optab above a mask rather than a bit count.
> >> But that might be overly general, if there are no known architectures
> >> that have such an instruction.
> >
> > One of the reasons I wanted a range rather than a single bit is that I
> > can the use this to generate cbz/cbnz early on as well.
> 
> We already have the opportunity to do that via cbranch<mode>4.
> But at the moment aarch64.md always forces the separate comparison
> instead.  (Not sure why TBH.  Does it enable more ifcvt opportunities?)
> 
> If we change the body of cbranch<mode>4 to:
> 
>   if ((GET_CODE (operands[0]) != EQ && GET_CODE (operands[0]) != NE)
>       || operands[2] != const0_rtx)
>     {
>       operands[1] = aarch64_gen_compare_reg (GET_CODE (operands[0]),
> 					     operands[1], operands[2]);
>       operands[2] = const0_rtx;
>     }
> 
> then we generate the cbz/cbnz directly.
>

Ah ok, then if Richi agrees, bitpos it is then instead of bit count.

Tamar
 

> Thanks,
> Richard
> 
> 
> > This would mean we could use my earlier patch that tried to drop the
> > QI/HI promotions without needing the any_extend additional pass if we
> > wanted to.
> >
> > We'd also no longer need to rely on seeing a paradoxical subreg for a tst.
> >
> > Tamar.
> >
> >>
> >> Thanks,
> >> Richard
> >>
> >> > Specifically this representation would allow us to emit all our
> >> > different conditional branching instructions without needing to
> >> > rely on combine.  We have some cases that happen during
> >> > optimization that sometimes prevent the optimal sequence from being
> >> > generated. This
> >> would also solve that as we would expand to what we want to start with.
> >> >
> >> > Tamar.
> >> >
> >> >>
> >> >> >
> >> >> > Thanks,
> >> >> > Tamar
> >> >> >
> >> >> >>
> >> >> >> Richard.
  
Richard Biener Sept. 30, 2022, 10:16 a.m. UTC | #20
On Fri, 30 Sep 2022, Tamar Christina wrote:

> 
> 
> > -----Original Message-----
> > From: Richard Sandiford <richard.sandiford@arm.com>
> > Sent: Friday, September 30, 2022 9:49 AM
> > To: Tamar Christina <Tamar.Christina@arm.com>
> > Cc: Richard Biener <rguenther@suse.de>; Tamar Christina via Gcc-patches
> > <gcc-patches@gcc.gnu.org>; nd <nd@arm.com>; Jeff Law
> > <jeffreyalaw@gmail.com>
> > Subject: Re: [PATCH 1/2]middle-end: RFC: On expansion of conditional
> > branches, give hint if argument is a truth type to backend
> > 
> > Tamar Christina <Tamar.Christina@arm.com> writes:
> > >> -----Original Message-----
> > >> From: Richard Sandiford <richard.sandiford@arm.com>
> > >> Sent: Friday, September 30, 2022 9:29 AM
> > >> To: Tamar Christina <Tamar.Christina@arm.com>
> > >> Cc: Richard Biener <rguenther@suse.de>; Tamar Christina via
> > >> Gcc-patches <gcc-patches@gcc.gnu.org>; nd <nd@arm.com>; Jeff Law
> > >> <jeffreyalaw@gmail.com>
> > >> Subject: Re: [PATCH 1/2]middle-end: RFC: On expansion of conditional
> > >> branches, give hint if argument is a truth type to backend
> > >>
> > >> Tamar Christina <Tamar.Christina@arm.com> writes:
> > >> >> -----Original Message-----
> > >> >> From: Gcc-patches <gcc-patches-
> > >> >> bounces+tamar.christina=arm.com@gcc.gnu.org> On Behalf Of
> > Richard
> > >> >> Biener via Gcc-patches
> > >> >> Sent: Thursday, September 29, 2022 12:09 PM
> > >> >> To: Tamar Christina via Gcc-patches <gcc-patches@gcc.gnu.org>
> > >> >> Cc: Richard Sandiford <Richard.Sandiford@arm.com>; nd
> > <nd@arm.com>
> > >> >> Subject: Re: [PATCH 1/2]middle-end: RFC: On expansion of
> > >> >> conditional branches, give hint if argument is a truth type to
> > >> >> backend
> > >> >>
> > >> >>
> > >> >>
> > >> >> > Am 29.09.2022 um 12:23 schrieb Tamar Christina via Gcc-patches
> > >> >> > <gcc-
> > >> >> patches@gcc.gnu.org>:
> > >> >> >
> > >> >> >
> > >> >> >>
> > >> >> >> -----Original Message-----
> > >> >> >> From: Richard Biener <rguenther@suse.de>
> > >> >> >> Sent: Thursday, September 29, 2022 10:41 AM
> > >> >> >> To: Richard Sandiford <Richard.Sandiford@arm.com>
> > >> >> >> Cc: Jeff Law <jeffreyalaw@gmail.com>; Tamar Christina
> > >> >> >> <Tamar.Christina@arm.com>; gcc-patches@gcc.gnu.org; nd
> > >> >> <nd@arm.com>
> > >> >> >> Subject: Re: [PATCH 1/2]middle-end: RFC: On expansion of
> > >> >> >> conditional branches, give hint if argument is a truth type to
> > >> >> >> backend
> > >> >> >>
> > >> >> >>> On Thu, 29 Sep 2022, Richard Sandiford wrote:
> > >> >> >>>
> > >> >> >>> Jeff Law <jeffreyalaw@gmail.com> writes:
> > >> >> >>>> On 9/28/22 09:04, Richard Sandiford wrote:
> > >> >> >>>>> Tamar Christina <Tamar.Christina@arm.com> writes:
> > >> >> >>>>>>> Maybe the target could use (subreg:SI (reg:BI ...)) as
> > argument.
> > >> >> Heh.
> > >> >> >>>>>> But then I'd still need to change the expansion code. I
> > >> >> >>>>>> suppose this could prevent the issue with changes to code
> > >> >> >>>>>> on
> > >> other targets.
> > >> >> >>>>>>
> > >> >> >>>>>>>>> We have undocumented addcc, negcc, etc. patterns,
> > should
> > >> we
> > >> >> >>>>>>>>> have aandcc
> > >> >> >>>>>> pattern for this indicating support for andcc + jump as
> > >> >> >>>>>> opposedto
> > >> >> >> cmpcc + jump?
> > >> >> >>>>>>>> This could work yeah. I didn't know these existed.
> > >> >> >>>>>>> Ah, so they are conditional add, not add setting CC, so
> > >> >> >>>>>>> andcc wouldn't be appropriate.
> > >> >> >>>>>>> So I'm not sure how we'd handle such situation - maybe
> > >> >> >>>>>>> looking at REG_DECL and recognizing a _Bool PARM_DECL is
> > OK?
> > >> >> >>>>>> I have a slight suspicion that Richard Sandiford would
> > >> >> >>>>>> likely reject this though..
> > >> >> >>>>> Good guess :-P  We shouldn't rely on something like that for
> > >> >> >> correctness.
> > >> >> >>>>>
> > >> >> >>>>> Would it help if we promoted the test-and-branch
> > >> >> >>>>> instructions to optabs, alongside cbranch?  The jump
> > >> >> >>>>> expanders could then target it
> > >> >> >> directly.
> > >> >> >>>>>
> > >> >> >>>>> IMO that'd be a reasonable thing to do if it does help.
> > >> >> >>>>> It's a relatively common operation, especially on CISCy targets.
> > >> >> >>>>
> > >> >> >>>> But don't we represent these single bit tests using
> > >> >> >>>> zero_extract as the condition of the branch?  I guess if we
> > >> >> >>>> can generate them directly rather than waiting for combine to
> > >> >> >>>> deduce that we're dealing with a single bit test and
> > >> >> >>>> constructing the zero_extract form would be an improvement
> > >> >> >>>> and might help aarch at the same
> > >> time.
> > >> >> >>>
> > >> >> >>> Do you mean that the promote_mode stuff should use ext(z)v
> > >> >> >>> rather than zero_extend to promote a bool, where available?
> > >> >> >>> If so, I agree that might help.  But it sounds like it would
> > >> >> >>> have downsides
> > >> too.
> > >> >> >>> Currently a bool memory can be zero-extended on the fly using
> > >> >> >>> a load, but if we used the zero_extract form instead, we'd
> > >> >> >>> have to extract the bit after the load.  And (as an
> > >> >> >>> alternative) choosing different behaviour based on whether
> > >> >> >>> expand sees a REG or a MEM sounds like it could still cause
> > >> >> >>> problems, since REGs could be replaced by MEMs (or vice versa)
> > later in the RTL passes.
> > >> >> >>>
> > >> >> >>> ISTM that the original patch was inserting an extra operation
> > >> >> >>> in the branch expansion in order to target a specific instruction.
> > >> >> >>> Targeting the instruction in expand seems good, but IMO we
> > >> >> >>> should do it directly, based on knowledge of whether the
> > >> >> >>> instruction actually
> > >> >> exists.
> > >> >> >>
> > >> >> >> Yes, I think a compare-and-branch pattern is the best fit here.
> > >> >> >> Note on GIMPLE we'd rely on the fact this is a BOOLEAN_TYPE (so
> > >> >> >> even 8 bit precision bools only have 1 and 0 as meaningful values).
> > >> >> >> So the 'compare-' bit in compare-and-branch would be
> > >> >> >> interpreting a BOOLEAN_TYPE, not so much a general compare.
> > >> >> >
> > >> >> > Oh, I was thinking of adding a constant argument representing
> > >> >> > the precision that is relevant for the compare in order to make
> > >> >> > this a bit more
> > >> >> general/future proof.
> > >> >> >
> > >> >> > Are you thinking I should instead just make the optab implicitly
> > >> >> > only work for 1-bit precision comparisons?
> > >> >>
> > >> >> What?s the optab you propose (cite also the documentation part)?
> > >> >
> > >> > tbranchmode5
> > >> >   Conditional branch instruction combined with a bit test instruction.
> > >> Operand 0 is a comparison operator.
> > >> >   Operand 1 and Operand 2 are the first and second operands of the
> > >> comparison, respectively.
> > >> >   Operand 3 is the number of low-order bits that are relevant for
> > >> > the
> > >> comparison.
> > >> >   Operand 4 is the code_label to jump to.
> > >>
> > >> For the TB instructions (and for other similar instructions that I've
> > >> seen on other architectures) it would be more useful to have a
> > >> single-bit test, with operand 4 specifying the bit position.
> > >> Arguably it might then be better to have separate eq and ne optabs,
> > >> to avoid the awkward doubling of the operands (operand 1 contains
> > operands 2 and 3).
> > >>
> > >> I guess a more general way of achieving the same thing would be to
> > >> make operand 4 in the optab above a mask rather than a bit count.
> > >> But that might be overly general, if there are no known architectures
> > >> that have such an instruction.
> > >
> > > One of the reasons I wanted a range rather than a single bit is that I
> > > can the use this to generate cbz/cbnz early on as well.
> > 
> > We already have the opportunity to do that via cbranch<mode>4.
> > But at the moment aarch64.md always forces the separate comparison
> > instead.  (Not sure why TBH.  Does it enable more ifcvt opportunities?)
> > 
> > If we change the body of cbranch<mode>4 to:
> > 
> >   if ((GET_CODE (operands[0]) != EQ && GET_CODE (operands[0]) != NE)
> >       || operands[2] != const0_rtx)
> >     {
> >       operands[1] = aarch64_gen_compare_reg (GET_CODE (operands[0]),
> > 					     operands[1], operands[2]);
> >       operands[2] = const0_rtx;
> >     }
> > 
> > then we generate the cbz/cbnz directly.
> >
> 
> Ah ok, then if Richi agrees, bitpos it is then instead of bit count.

Somehow I understood that cbranch<>4 is already fully capable of the
optimization?

On your earlier proposal I'd have commented that if it wouldn't make
sense to instead have a CCmode setter instead of an expander with
a branch label?  That would be a bit test, like {sign,zero}_extract
compared against zero which can then be combined with a branch?

Of course if the ISA is really bit-test-and-branch then cbranch<>4
itself or a variant of it might be more useful.  Maybe
cbranchbi4 would be "abused" for this?

> Tamar
>  
> 
> > Thanks,
> > Richard
> > 
> > 
> > > This would mean we could use my earlier patch that tried to drop the
> > > QI/HI promotions without needing the any_extend additional pass if we
> > > wanted to.
> > >
> > > We'd also no longer need to rely on seeing a paradoxical subreg for a tst.
> > >
> > > Tamar.
> > >
> > >>
> > >> Thanks,
> > >> Richard
> > >>
> > >> > Specifically this representation would allow us to emit all our
> > >> > different conditional branching instructions without needing to
> > >> > rely on combine.  We have some cases that happen during
> > >> > optimization that sometimes prevent the optimal sequence from being
> > >> > generated. This
> > >> would also solve that as we would expand to what we want to start with.
> > >> >
> > >> > Tamar.
> > >> >
> > >> >>
> > >> >> >
> > >> >> > Thanks,
> > >> >> > Tamar
> > >> >> >
> > >> >> >>
> > >> >> >> Richard.
>
  
Tamar Christina Sept. 30, 2022, 11:11 a.m. UTC | #21
> -----Original Message-----
> From: Richard Biener <rguenther@suse.de>
> Sent: Friday, September 30, 2022 11:17 AM
> To: Tamar Christina <Tamar.Christina@arm.com>
> Cc: Richard Sandiford <Richard.Sandiford@arm.com>; Tamar Christina via
> Gcc-patches <gcc-patches@gcc.gnu.org>; nd <nd@arm.com>; Jeff Law
> <jeffreyalaw@gmail.com>
> Subject: RE: [PATCH 1/2]middle-end: RFC: On expansion of conditional
> branches, give hint if argument is a truth type to backend
> 
> On Fri, 30 Sep 2022, Tamar Christina wrote:
> 
> >
> >
> > > -----Original Message-----
> > > From: Richard Sandiford <richard.sandiford@arm.com>
> > > Sent: Friday, September 30, 2022 9:49 AM
> > > To: Tamar Christina <Tamar.Christina@arm.com>
> > > Cc: Richard Biener <rguenther@suse.de>; Tamar Christina via
> > > Gcc-patches <gcc-patches@gcc.gnu.org>; nd <nd@arm.com>; Jeff Law
> > > <jeffreyalaw@gmail.com>
> > > Subject: Re: [PATCH 1/2]middle-end: RFC: On expansion of conditional
> > > branches, give hint if argument is a truth type to backend
> > >
> > > Tamar Christina <Tamar.Christina@arm.com> writes:
> > > >> -----Original Message-----
> > > >> From: Richard Sandiford <richard.sandiford@arm.com>
> > > >> Sent: Friday, September 30, 2022 9:29 AM
> > > >> To: Tamar Christina <Tamar.Christina@arm.com>
> > > >> Cc: Richard Biener <rguenther@suse.de>; Tamar Christina via
> > > >> Gcc-patches <gcc-patches@gcc.gnu.org>; nd <nd@arm.com>; Jeff
> Law
> > > >> <jeffreyalaw@gmail.com>
> > > >> Subject: Re: [PATCH 1/2]middle-end: RFC: On expansion of
> > > >> conditional branches, give hint if argument is a truth type to
> > > >> backend
> > > >>
> > > >> Tamar Christina <Tamar.Christina@arm.com> writes:
> > > >> >> -----Original Message-----
> > > >> >> From: Gcc-patches <gcc-patches-
> > > >> >> bounces+tamar.christina=arm.com@gcc.gnu.org> On Behalf Of
> > > Richard
> > > >> >> Biener via Gcc-patches
> > > >> >> Sent: Thursday, September 29, 2022 12:09 PM
> > > >> >> To: Tamar Christina via Gcc-patches <gcc-patches@gcc.gnu.org>
> > > >> >> Cc: Richard Sandiford <Richard.Sandiford@arm.com>; nd
> > > <nd@arm.com>
> > > >> >> Subject: Re: [PATCH 1/2]middle-end: RFC: On expansion of
> > > >> >> conditional branches, give hint if argument is a truth type to
> > > >> >> backend
> > > >> >>
> > > >> >>
> > > >> >>
> > > >> >> > Am 29.09.2022 um 12:23 schrieb Tamar Christina via
> > > >> >> > Gcc-patches
> > > >> >> > <gcc-
> > > >> >> patches@gcc.gnu.org>:
> > > >> >> >
> > > >> >> >
> > > >> >> >>
> > > >> >> >> -----Original Message-----
> > > >> >> >> From: Richard Biener <rguenther@suse.de>
> > > >> >> >> Sent: Thursday, September 29, 2022 10:41 AM
> > > >> >> >> To: Richard Sandiford <Richard.Sandiford@arm.com>
> > > >> >> >> Cc: Jeff Law <jeffreyalaw@gmail.com>; Tamar Christina
> > > >> >> >> <Tamar.Christina@arm.com>; gcc-patches@gcc.gnu.org; nd
> > > >> >> <nd@arm.com>
> > > >> >> >> Subject: Re: [PATCH 1/2]middle-end: RFC: On expansion of
> > > >> >> >> conditional branches, give hint if argument is a truth type
> > > >> >> >> to backend
> > > >> >> >>
> > > >> >> >>> On Thu, 29 Sep 2022, Richard Sandiford wrote:
> > > >> >> >>>
> > > >> >> >>> Jeff Law <jeffreyalaw@gmail.com> writes:
> > > >> >> >>>> On 9/28/22 09:04, Richard Sandiford wrote:
> > > >> >> >>>>> Tamar Christina <Tamar.Christina@arm.com> writes:
> > > >> >> >>>>>>> Maybe the target could use (subreg:SI (reg:BI ...)) as
> > > argument.
> > > >> >> Heh.
> > > >> >> >>>>>> But then I'd still need to change the expansion code. I
> > > >> >> >>>>>> suppose this could prevent the issue with changes to
> > > >> >> >>>>>> code on
> > > >> other targets.
> > > >> >> >>>>>>
> > > >> >> >>>>>>>>> We have undocumented addcc, negcc, etc. patterns,
> > > should
> > > >> we
> > > >> >> >>>>>>>>> have aandcc
> > > >> >> >>>>>> pattern for this indicating support for andcc + jump as
> > > >> >> >>>>>> opposedto
> > > >> >> >> cmpcc + jump?
> > > >> >> >>>>>>>> This could work yeah. I didn't know these existed.
> > > >> >> >>>>>>> Ah, so they are conditional add, not add setting CC,
> > > >> >> >>>>>>> so andcc wouldn't be appropriate.
> > > >> >> >>>>>>> So I'm not sure how we'd handle such situation - maybe
> > > >> >> >>>>>>> looking at REG_DECL and recognizing a _Bool PARM_DECL
> > > >> >> >>>>>>> is
> > > OK?
> > > >> >> >>>>>> I have a slight suspicion that Richard Sandiford would
> > > >> >> >>>>>> likely reject this though..
> > > >> >> >>>>> Good guess :-P  We shouldn't rely on something like that
> > > >> >> >>>>> for
> > > >> >> >> correctness.
> > > >> >> >>>>>
> > > >> >> >>>>> Would it help if we promoted the test-and-branch
> > > >> >> >>>>> instructions to optabs, alongside cbranch?  The jump
> > > >> >> >>>>> expanders could then target it
> > > >> >> >> directly.
> > > >> >> >>>>>
> > > >> >> >>>>> IMO that'd be a reasonable thing to do if it does help.
> > > >> >> >>>>> It's a relatively common operation, especially on CISCy
> targets.
> > > >> >> >>>>
> > > >> >> >>>> But don't we represent these single bit tests using
> > > >> >> >>>> zero_extract as the condition of the branch?  I guess if
> > > >> >> >>>> we can generate them directly rather than waiting for
> > > >> >> >>>> combine to deduce that we're dealing with a single bit
> > > >> >> >>>> test and constructing the zero_extract form would be an
> > > >> >> >>>> improvement and might help aarch at the same
> > > >> time.
> > > >> >> >>>
> > > >> >> >>> Do you mean that the promote_mode stuff should use ext(z)v
> > > >> >> >>> rather than zero_extend to promote a bool, where available?
> > > >> >> >>> If so, I agree that might help.  But it sounds like it
> > > >> >> >>> would have downsides
> > > >> too.
> > > >> >> >>> Currently a bool memory can be zero-extended on the fly
> > > >> >> >>> using a load, but if we used the zero_extract form
> > > >> >> >>> instead, we'd have to extract the bit after the load.  And
> > > >> >> >>> (as an
> > > >> >> >>> alternative) choosing different behaviour based on whether
> > > >> >> >>> expand sees a REG or a MEM sounds like it could still
> > > >> >> >>> cause problems, since REGs could be replaced by MEMs (or
> > > >> >> >>> vice versa)
> > > later in the RTL passes.
> > > >> >> >>>
> > > >> >> >>> ISTM that the original patch was inserting an extra
> > > >> >> >>> operation in the branch expansion in order to target a specific
> instruction.
> > > >> >> >>> Targeting the instruction in expand seems good, but IMO we
> > > >> >> >>> should do it directly, based on knowledge of whether the
> > > >> >> >>> instruction actually
> > > >> >> exists.
> > > >> >> >>
> > > >> >> >> Yes, I think a compare-and-branch pattern is the best fit here.
> > > >> >> >> Note on GIMPLE we'd rely on the fact this is a BOOLEAN_TYPE
> > > >> >> >> (so even 8 bit precision bools only have 1 and 0 as meaningful
> values).
> > > >> >> >> So the 'compare-' bit in compare-and-branch would be
> > > >> >> >> interpreting a BOOLEAN_TYPE, not so much a general compare.
> > > >> >> >
> > > >> >> > Oh, I was thinking of adding a constant argument
> > > >> >> > representing the precision that is relevant for the compare
> > > >> >> > in order to make this a bit more
> > > >> >> general/future proof.
> > > >> >> >
> > > >> >> > Are you thinking I should instead just make the optab
> > > >> >> > implicitly only work for 1-bit precision comparisons?
> > > >> >>
> > > >> >> What?s the optab you propose (cite also the documentation part)?
> > > >> >
> > > >> > tbranchmode5
> > > >> >   Conditional branch instruction combined with a bit test instruction.
> > > >> Operand 0 is a comparison operator.
> > > >> >   Operand 1 and Operand 2 are the first and second operands of
> > > >> > the
> > > >> comparison, respectively.
> > > >> >   Operand 3 is the number of low-order bits that are relevant
> > > >> > for the
> > > >> comparison.
> > > >> >   Operand 4 is the code_label to jump to.
> > > >>
> > > >> For the TB instructions (and for other similar instructions that
> > > >> I've seen on other architectures) it would be more useful to have
> > > >> a single-bit test, with operand 4 specifying the bit position.
> > > >> Arguably it might then be better to have separate eq and ne
> > > >> optabs, to avoid the awkward doubling of the operands (operand 1
> > > >> contains
> > > operands 2 and 3).
> > > >>
> > > >> I guess a more general way of achieving the same thing would be
> > > >> to make operand 4 in the optab above a mask rather than a bit count.
> > > >> But that might be overly general, if there are no known
> > > >> architectures that have such an instruction.
> > > >
> > > > One of the reasons I wanted a range rather than a single bit is
> > > > that I can the use this to generate cbz/cbnz early on as well.
> > >
> > > We already have the opportunity to do that via cbranch<mode>4.
> > > But at the moment aarch64.md always forces the separate comparison
> > > instead.  (Not sure why TBH.  Does it enable more ifcvt
> > > opportunities?)
> > >
> > > If we change the body of cbranch<mode>4 to:
> > >
> > >   if ((GET_CODE (operands[0]) != EQ && GET_CODE (operands[0]) != NE)
> > >       || operands[2] != const0_rtx)
> > >     {
> > >       operands[1] = aarch64_gen_compare_reg (GET_CODE (operands[0]),
> > > 					     operands[1], operands[2]);
> > >       operands[2] = const0_rtx;
> > >     }
> > >
> > > then we generate the cbz/cbnz directly.
> > >
> >
> > Ah ok, then if Richi agrees, bitpos it is then instead of bit count.
> 
> Somehow I understood that cbranch<>4 is already fully capable of the
> optimization?
> 
> On your earlier proposal I'd have commented that if it wouldn't make sense
> to instead have a CCmode setter instead of an expander with a branch label?
> That would be a bit test, like {sign,zero}_extract compared against zero which
> can then be combined with a branch?
>

I missed that part, that could work too.

> Of course if the ISA is really bit-test-and-branch then cbranch<>4 itself or a
> variant of it might be more useful.  Maybe
> cbranchbi4 would be "abused" for this?

The instruction is an actual bit-test-and-branch with any arbitrary bitpos.
Yes we can abuse cbranchbi4 for this, but then it also means we can't e.g.
use this to optimize a < 0 where a is a signed value.  With the new optab
this would just be a bit-test-and-branch of the sign bit.

But also I'm not entirely convinced that using `BImode` and assuming a single
bit is safe here. What happens if I compile my source with -std=c89?

So I personally think the new optab makes more sense here. The CC setter would work too.

I guess my question is, between you folks, which approach would you like. It seems that Richi
You'd like a CC setter. Richard do you have a preference of one over the other?

Tamar

> 
> > Tamar
> >
> >
> > > Thanks,
> > > Richard
> > >
> > >
> > > > This would mean we could use my earlier patch that tried to drop
> > > > the QI/HI promotions without needing the any_extend additional
> > > > pass if we wanted to.
> > > >
> > > > We'd also no longer need to rely on seeing a paradoxical subreg for a
> tst.
> > > >
> > > > Tamar.
> > > >
> > > >>
> > > >> Thanks,
> > > >> Richard
> > > >>
> > > >> > Specifically this representation would allow us to emit all our
> > > >> > different conditional branching instructions without needing to
> > > >> > rely on combine.  We have some cases that happen during
> > > >> > optimization that sometimes prevent the optimal sequence from
> > > >> > being generated. This
> > > >> would also solve that as we would expand to what we want to start
> with.
> > > >> >
> > > >> > Tamar.
> > > >> >
> > > >> >>
> > > >> >> >
> > > >> >> > Thanks,
> > > >> >> > Tamar
> > > >> >> >
> > > >> >> >>
> > > >> >> >> Richard.
> >
> 
> --
> Richard Biener <rguenther@suse.de>
> SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461
> Nuernberg, Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald,
> Boudien Moerman; HRB 36809 (AG Nuernberg)
  
Richard Biener Sept. 30, 2022, 11:52 a.m. UTC | #22
On Fri, 30 Sep 2022, Tamar Christina wrote:

> > -----Original Message-----
> > From: Richard Biener <rguenther@suse.de>
> > Sent: Friday, September 30, 2022 11:17 AM
> > To: Tamar Christina <Tamar.Christina@arm.com>
> > Cc: Richard Sandiford <Richard.Sandiford@arm.com>; Tamar Christina via
> > Gcc-patches <gcc-patches@gcc.gnu.org>; nd <nd@arm.com>; Jeff Law
> > <jeffreyalaw@gmail.com>
> > Subject: RE: [PATCH 1/2]middle-end: RFC: On expansion of conditional
> > branches, give hint if argument is a truth type to backend
> > 
> > On Fri, 30 Sep 2022, Tamar Christina wrote:
> > 
> > >
> > >
> > > > -----Original Message-----
> > > > From: Richard Sandiford <richard.sandiford@arm.com>
> > > > Sent: Friday, September 30, 2022 9:49 AM
> > > > To: Tamar Christina <Tamar.Christina@arm.com>
> > > > Cc: Richard Biener <rguenther@suse.de>; Tamar Christina via
> > > > Gcc-patches <gcc-patches@gcc.gnu.org>; nd <nd@arm.com>; Jeff Law
> > > > <jeffreyalaw@gmail.com>
> > > > Subject: Re: [PATCH 1/2]middle-end: RFC: On expansion of conditional
> > > > branches, give hint if argument is a truth type to backend
> > > >
> > > > Tamar Christina <Tamar.Christina@arm.com> writes:
> > > > >> -----Original Message-----
> > > > >> From: Richard Sandiford <richard.sandiford@arm.com>
> > > > >> Sent: Friday, September 30, 2022 9:29 AM
> > > > >> To: Tamar Christina <Tamar.Christina@arm.com>
> > > > >> Cc: Richard Biener <rguenther@suse.de>; Tamar Christina via
> > > > >> Gcc-patches <gcc-patches@gcc.gnu.org>; nd <nd@arm.com>; Jeff
> > Law
> > > > >> <jeffreyalaw@gmail.com>
> > > > >> Subject: Re: [PATCH 1/2]middle-end: RFC: On expansion of
> > > > >> conditional branches, give hint if argument is a truth type to
> > > > >> backend
> > > > >>
> > > > >> Tamar Christina <Tamar.Christina@arm.com> writes:
> > > > >> >> -----Original Message-----
> > > > >> >> From: Gcc-patches <gcc-patches-
> > > > >> >> bounces+tamar.christina=arm.com@gcc.gnu.org> On Behalf Of
> > > > Richard
> > > > >> >> Biener via Gcc-patches
> > > > >> >> Sent: Thursday, September 29, 2022 12:09 PM
> > > > >> >> To: Tamar Christina via Gcc-patches <gcc-patches@gcc.gnu.org>
> > > > >> >> Cc: Richard Sandiford <Richard.Sandiford@arm.com>; nd
> > > > <nd@arm.com>
> > > > >> >> Subject: Re: [PATCH 1/2]middle-end: RFC: On expansion of
> > > > >> >> conditional branches, give hint if argument is a truth type to
> > > > >> >> backend
> > > > >> >>
> > > > >> >>
> > > > >> >>
> > > > >> >> > Am 29.09.2022 um 12:23 schrieb Tamar Christina via
> > > > >> >> > Gcc-patches
> > > > >> >> > <gcc-
> > > > >> >> patches@gcc.gnu.org>:
> > > > >> >> >
> > > > >> >> >
> > > > >> >> >>
> > > > >> >> >> -----Original Message-----
> > > > >> >> >> From: Richard Biener <rguenther@suse.de>
> > > > >> >> >> Sent: Thursday, September 29, 2022 10:41 AM
> > > > >> >> >> To: Richard Sandiford <Richard.Sandiford@arm.com>
> > > > >> >> >> Cc: Jeff Law <jeffreyalaw@gmail.com>; Tamar Christina
> > > > >> >> >> <Tamar.Christina@arm.com>; gcc-patches@gcc.gnu.org; nd
> > > > >> >> <nd@arm.com>
> > > > >> >> >> Subject: Re: [PATCH 1/2]middle-end: RFC: On expansion of
> > > > >> >> >> conditional branches, give hint if argument is a truth type
> > > > >> >> >> to backend
> > > > >> >> >>
> > > > >> >> >>> On Thu, 29 Sep 2022, Richard Sandiford wrote:
> > > > >> >> >>>
> > > > >> >> >>> Jeff Law <jeffreyalaw@gmail.com> writes:
> > > > >> >> >>>> On 9/28/22 09:04, Richard Sandiford wrote:
> > > > >> >> >>>>> Tamar Christina <Tamar.Christina@arm.com> writes:
> > > > >> >> >>>>>>> Maybe the target could use (subreg:SI (reg:BI ...)) as
> > > > argument.
> > > > >> >> Heh.
> > > > >> >> >>>>>> But then I'd still need to change the expansion code. I
> > > > >> >> >>>>>> suppose this could prevent the issue with changes to
> > > > >> >> >>>>>> code on
> > > > >> other targets.
> > > > >> >> >>>>>>
> > > > >> >> >>>>>>>>> We have undocumented addcc, negcc, etc. patterns,
> > > > should
> > > > >> we
> > > > >> >> >>>>>>>>> have aandcc
> > > > >> >> >>>>>> pattern for this indicating support for andcc + jump as
> > > > >> >> >>>>>> opposedto
> > > > >> >> >> cmpcc + jump?
> > > > >> >> >>>>>>>> This could work yeah. I didn't know these existed.
> > > > >> >> >>>>>>> Ah, so they are conditional add, not add setting CC,
> > > > >> >> >>>>>>> so andcc wouldn't be appropriate.
> > > > >> >> >>>>>>> So I'm not sure how we'd handle such situation - maybe
> > > > >> >> >>>>>>> looking at REG_DECL and recognizing a _Bool PARM_DECL
> > > > >> >> >>>>>>> is
> > > > OK?
> > > > >> >> >>>>>> I have a slight suspicion that Richard Sandiford would
> > > > >> >> >>>>>> likely reject this though..
> > > > >> >> >>>>> Good guess :-P  We shouldn't rely on something like that
> > > > >> >> >>>>> for
> > > > >> >> >> correctness.
> > > > >> >> >>>>>
> > > > >> >> >>>>> Would it help if we promoted the test-and-branch
> > > > >> >> >>>>> instructions to optabs, alongside cbranch?  The jump
> > > > >> >> >>>>> expanders could then target it
> > > > >> >> >> directly.
> > > > >> >> >>>>>
> > > > >> >> >>>>> IMO that'd be a reasonable thing to do if it does help.
> > > > >> >> >>>>> It's a relatively common operation, especially on CISCy
> > targets.
> > > > >> >> >>>>
> > > > >> >> >>>> But don't we represent these single bit tests using
> > > > >> >> >>>> zero_extract as the condition of the branch?  I guess if
> > > > >> >> >>>> we can generate them directly rather than waiting for
> > > > >> >> >>>> combine to deduce that we're dealing with a single bit
> > > > >> >> >>>> test and constructing the zero_extract form would be an
> > > > >> >> >>>> improvement and might help aarch at the same
> > > > >> time.
> > > > >> >> >>>
> > > > >> >> >>> Do you mean that the promote_mode stuff should use ext(z)v
> > > > >> >> >>> rather than zero_extend to promote a bool, where available?
> > > > >> >> >>> If so, I agree that might help.  But it sounds like it
> > > > >> >> >>> would have downsides
> > > > >> too.
> > > > >> >> >>> Currently a bool memory can be zero-extended on the fly
> > > > >> >> >>> using a load, but if we used the zero_extract form
> > > > >> >> >>> instead, we'd have to extract the bit after the load.  And
> > > > >> >> >>> (as an
> > > > >> >> >>> alternative) choosing different behaviour based on whether
> > > > >> >> >>> expand sees a REG or a MEM sounds like it could still
> > > > >> >> >>> cause problems, since REGs could be replaced by MEMs (or
> > > > >> >> >>> vice versa)
> > > > later in the RTL passes.
> > > > >> >> >>>
> > > > >> >> >>> ISTM that the original patch was inserting an extra
> > > > >> >> >>> operation in the branch expansion in order to target a specific
> > instruction.
> > > > >> >> >>> Targeting the instruction in expand seems good, but IMO we
> > > > >> >> >>> should do it directly, based on knowledge of whether the
> > > > >> >> >>> instruction actually
> > > > >> >> exists.
> > > > >> >> >>
> > > > >> >> >> Yes, I think a compare-and-branch pattern is the best fit here.
> > > > >> >> >> Note on GIMPLE we'd rely on the fact this is a BOOLEAN_TYPE
> > > > >> >> >> (so even 8 bit precision bools only have 1 and 0 as meaningful
> > values).
> > > > >> >> >> So the 'compare-' bit in compare-and-branch would be
> > > > >> >> >> interpreting a BOOLEAN_TYPE, not so much a general compare.
> > > > >> >> >
> > > > >> >> > Oh, I was thinking of adding a constant argument
> > > > >> >> > representing the precision that is relevant for the compare
> > > > >> >> > in order to make this a bit more
> > > > >> >> general/future proof.
> > > > >> >> >
> > > > >> >> > Are you thinking I should instead just make the optab
> > > > >> >> > implicitly only work for 1-bit precision comparisons?
> > > > >> >>
> > > > >> >> What?s the optab you propose (cite also the documentation part)?
> > > > >> >
> > > > >> > tbranchmode5
> > > > >> >   Conditional branch instruction combined with a bit test instruction.
> > > > >> Operand 0 is a comparison operator.
> > > > >> >   Operand 1 and Operand 2 are the first and second operands of
> > > > >> > the
> > > > >> comparison, respectively.
> > > > >> >   Operand 3 is the number of low-order bits that are relevant
> > > > >> > for the
> > > > >> comparison.
> > > > >> >   Operand 4 is the code_label to jump to.
> > > > >>
> > > > >> For the TB instructions (and for other similar instructions that
> > > > >> I've seen on other architectures) it would be more useful to have
> > > > >> a single-bit test, with operand 4 specifying the bit position.
> > > > >> Arguably it might then be better to have separate eq and ne
> > > > >> optabs, to avoid the awkward doubling of the operands (operand 1
> > > > >> contains
> > > > operands 2 and 3).
> > > > >>
> > > > >> I guess a more general way of achieving the same thing would be
> > > > >> to make operand 4 in the optab above a mask rather than a bit count.
> > > > >> But that might be overly general, if there are no known
> > > > >> architectures that have such an instruction.
> > > > >
> > > > > One of the reasons I wanted a range rather than a single bit is
> > > > > that I can the use this to generate cbz/cbnz early on as well.
> > > >
> > > > We already have the opportunity to do that via cbranch<mode>4.
> > > > But at the moment aarch64.md always forces the separate comparison
> > > > instead.  (Not sure why TBH.  Does it enable more ifcvt
> > > > opportunities?)
> > > >
> > > > If we change the body of cbranch<mode>4 to:
> > > >
> > > >   if ((GET_CODE (operands[0]) != EQ && GET_CODE (operands[0]) != NE)
> > > >       || operands[2] != const0_rtx)
> > > >     {
> > > >       operands[1] = aarch64_gen_compare_reg (GET_CODE (operands[0]),
> > > > 					     operands[1], operands[2]);
> > > >       operands[2] = const0_rtx;
> > > >     }
> > > >
> > > > then we generate the cbz/cbnz directly.
> > > >
> > >
> > > Ah ok, then if Richi agrees, bitpos it is then instead of bit count.
> > 
> > Somehow I understood that cbranch<>4 is already fully capable of the
> > optimization?
> > 
> > On your earlier proposal I'd have commented that if it wouldn't make sense
> > to instead have a CCmode setter instead of an expander with a branch label?
> > That would be a bit test, like {sign,zero}_extract compared against zero which
> > can then be combined with a branch?
> >
> 
> I missed that part, that could work too.
> 
> > Of course if the ISA is really bit-test-and-branch then cbranch<>4 itself or a
> > variant of it might be more useful.  Maybe
> > cbranchbi4 would be "abused" for this?
> 
> The instruction is an actual bit-test-and-branch with any arbitrary bitpos.
> Yes we can abuse cbranchbi4 for this, but then it also means we can't e.g.
> use this to optimize a < 0 where a is a signed value.  With the new optab
> this would just be a bit-test-and-branch of the sign bit.
> 
> But also I'm not entirely convinced that using `BImode` and assuming a single
> bit is safe here. What happens if I compile my source with -std=c89?
> 
> So I personally think the new optab makes more sense here. The CC setter would work too.
> 
> I guess my question is, between you folks, which approach would you like. It seems that Richi
> You'd like a CC setter. Richard do you have a preference of one over the other?

My order of preference is

a) an existing pattern, if possible
b) something usable by N > 1 targets we know of, even if it requires
some combine magic
c) something that fits the actual ISA

For b), x86 has BEXTR which performs a zero_extract from reg/mem
and sets ZF when the result is zero.  For a branch on sign bit there's
easier ways, so it's probably not very useful for general compare and 
branch optimization and if (a & 0x30) is probably handled by combine(?).
It also seems that if (a & 1) is handled for aarch64 already and
it's just a lack of an optab that allows RTL expansion to expand
if (bool) as if (bool & 1)?

Richard.

> Tamar
> 
> > 
> > > Tamar
> > >
> > >
> > > > Thanks,
> > > > Richard
> > > >
> > > >
> > > > > This would mean we could use my earlier patch that tried to drop
> > > > > the QI/HI promotions without needing the any_extend additional
> > > > > pass if we wanted to.
> > > > >
> > > > > We'd also no longer need to rely on seeing a paradoxical subreg for a
> > tst.
> > > > >
> > > > > Tamar.
> > > > >
> > > > >>
> > > > >> Thanks,
> > > > >> Richard
> > > > >>
> > > > >> > Specifically this representation would allow us to emit all our
> > > > >> > different conditional branching instructions without needing to
> > > > >> > rely on combine.  We have some cases that happen during
> > > > >> > optimization that sometimes prevent the optimal sequence from
> > > > >> > being generated. This
> > > > >> would also solve that as we would expand to what we want to start
> > with.
> > > > >> >
> > > > >> > Tamar.
> > > > >> >
> > > > >> >>
> > > > >> >> >
> > > > >> >> > Thanks,
> > > > >> >> > Tamar
> > > > >> >> >
> > > > >> >> >>
> > > > >> >> >> Richard.
> > >
> > 
> > --
> > Richard Biener <rguenther@suse.de>
> > SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461
> > Nuernberg, Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald,
> > Boudien Moerman; HRB 36809 (AG Nuernberg)
>
  
Tamar Christina Sept. 30, 2022, 12:48 p.m. UTC | #23
> -----Original Message-----
> From: Richard Biener <rguenther@suse.de>
> Sent: Friday, September 30, 2022 12:53 PM
> To: Tamar Christina <Tamar.Christina@arm.com>
> Cc: Richard Sandiford <Richard.Sandiford@arm.com>; Tamar Christina via
> Gcc-patches <gcc-patches@gcc.gnu.org>; nd <nd@arm.com>; Jeff Law
> <jeffreyalaw@gmail.com>
> Subject: RE: [PATCH 1/2]middle-end: RFC: On expansion of conditional
> branches, give hint if argument is a truth type to backend
> 
> On Fri, 30 Sep 2022, Tamar Christina wrote:
> 
> > > -----Original Message-----
> > > From: Richard Biener <rguenther@suse.de>
> > > Sent: Friday, September 30, 2022 11:17 AM
> > > To: Tamar Christina <Tamar.Christina@arm.com>
> > > Cc: Richard Sandiford <Richard.Sandiford@arm.com>; Tamar Christina
> > > via Gcc-patches <gcc-patches@gcc.gnu.org>; nd <nd@arm.com>; Jeff
> Law
> > > <jeffreyalaw@gmail.com>
> > > Subject: RE: [PATCH 1/2]middle-end: RFC: On expansion of conditional
> > > branches, give hint if argument is a truth type to backend
> > >
> > > On Fri, 30 Sep 2022, Tamar Christina wrote:
> > >
> > > >
> > > >
> > > > > -----Original Message-----
> > > > > From: Richard Sandiford <richard.sandiford@arm.com>
> > > > > Sent: Friday, September 30, 2022 9:49 AM
> > > > > To: Tamar Christina <Tamar.Christina@arm.com>
> > > > > Cc: Richard Biener <rguenther@suse.de>; Tamar Christina via
> > > > > Gcc-patches <gcc-patches@gcc.gnu.org>; nd <nd@arm.com>; Jeff
> Law
> > > > > <jeffreyalaw@gmail.com>
> > > > > Subject: Re: [PATCH 1/2]middle-end: RFC: On expansion of
> > > > > conditional branches, give hint if argument is a truth type to
> > > > > backend
> > > > >
> > > > > Tamar Christina <Tamar.Christina@arm.com> writes:
> > > > > >> -----Original Message-----
> > > > > >> From: Richard Sandiford <richard.sandiford@arm.com>
> > > > > >> Sent: Friday, September 30, 2022 9:29 AM
> > > > > >> To: Tamar Christina <Tamar.Christina@arm.com>
> > > > > >> Cc: Richard Biener <rguenther@suse.de>; Tamar Christina via
> > > > > >> Gcc-patches <gcc-patches@gcc.gnu.org>; nd <nd@arm.com>; Jeff
> > > Law
> > > > > >> <jeffreyalaw@gmail.com>
> > > > > >> Subject: Re: [PATCH 1/2]middle-end: RFC: On expansion of
> > > > > >> conditional branches, give hint if argument is a truth type
> > > > > >> to backend
> > > > > >>
> > > > > >> Tamar Christina <Tamar.Christina@arm.com> writes:
> > > > > >> >> -----Original Message-----
> > > > > >> >> From: Gcc-patches <gcc-patches-
> > > > > >> >> bounces+tamar.christina=arm.com@gcc.gnu.org> On Behalf Of
> > > > > Richard
> > > > > >> >> Biener via Gcc-patches
> > > > > >> >> Sent: Thursday, September 29, 2022 12:09 PM
> > > > > >> >> To: Tamar Christina via Gcc-patches
> > > > > >> >> <gcc-patches@gcc.gnu.org>
> > > > > >> >> Cc: Richard Sandiford <Richard.Sandiford@arm.com>; nd
> > > > > <nd@arm.com>
> > > > > >> >> Subject: Re: [PATCH 1/2]middle-end: RFC: On expansion of
> > > > > >> >> conditional branches, give hint if argument is a truth
> > > > > >> >> type to backend
> > > > > >> >>
> > > > > >> >>
> > > > > >> >>
> > > > > >> >> > Am 29.09.2022 um 12:23 schrieb Tamar Christina via
> > > > > >> >> > Gcc-patches
> > > > > >> >> > <gcc-
> > > > > >> >> patches@gcc.gnu.org>:
> > > > > >> >> >
> > > > > >> >> >
> > > > > >> >> >>
> > > > > >> >> >> -----Original Message-----
> > > > > >> >> >> From: Richard Biener <rguenther@suse.de>
> > > > > >> >> >> Sent: Thursday, September 29, 2022 10:41 AM
> > > > > >> >> >> To: Richard Sandiford <Richard.Sandiford@arm.com>
> > > > > >> >> >> Cc: Jeff Law <jeffreyalaw@gmail.com>; Tamar Christina
> > > > > >> >> >> <Tamar.Christina@arm.com>; gcc-patches@gcc.gnu.org; nd
> > > > > >> >> <nd@arm.com>
> > > > > >> >> >> Subject: Re: [PATCH 1/2]middle-end: RFC: On expansion
> > > > > >> >> >> of conditional branches, give hint if argument is a
> > > > > >> >> >> truth type to backend
> > > > > >> >> >>
> > > > > >> >> >>> On Thu, 29 Sep 2022, Richard Sandiford wrote:
> > > > > >> >> >>>
> > > > > >> >> >>> Jeff Law <jeffreyalaw@gmail.com> writes:
> > > > > >> >> >>>> On 9/28/22 09:04, Richard Sandiford wrote:
> > > > > >> >> >>>>> Tamar Christina <Tamar.Christina@arm.com> writes:
> > > > > >> >> >>>>>>> Maybe the target could use (subreg:SI (reg:BI
> > > > > >> >> >>>>>>> ...)) as
> > > > > argument.
> > > > > >> >> Heh.
> > > > > >> >> >>>>>> But then I'd still need to change the expansion
> > > > > >> >> >>>>>> code. I suppose this could prevent the issue with
> > > > > >> >> >>>>>> changes to code on
> > > > > >> other targets.
> > > > > >> >> >>>>>>
> > > > > >> >> >>>>>>>>> We have undocumented addcc, negcc, etc.
> > > > > >> >> >>>>>>>>> patterns,
> > > > > should
> > > > > >> we
> > > > > >> >> >>>>>>>>> have aandcc
> > > > > >> >> >>>>>> pattern for this indicating support for andcc +
> > > > > >> >> >>>>>> jump as opposedto
> > > > > >> >> >> cmpcc + jump?
> > > > > >> >> >>>>>>>> This could work yeah. I didn't know these existed.
> > > > > >> >> >>>>>>> Ah, so they are conditional add, not add setting
> > > > > >> >> >>>>>>> CC, so andcc wouldn't be appropriate.
> > > > > >> >> >>>>>>> So I'm not sure how we'd handle such situation -
> > > > > >> >> >>>>>>> maybe looking at REG_DECL and recognizing a _Bool
> > > > > >> >> >>>>>>> PARM_DECL is
> > > > > OK?
> > > > > >> >> >>>>>> I have a slight suspicion that Richard Sandiford
> > > > > >> >> >>>>>> would likely reject this though..
> > > > > >> >> >>>>> Good guess :-P  We shouldn't rely on something like
> > > > > >> >> >>>>> that for
> > > > > >> >> >> correctness.
> > > > > >> >> >>>>>
> > > > > >> >> >>>>> Would it help if we promoted the test-and-branch
> > > > > >> >> >>>>> instructions to optabs, alongside cbranch?  The jump
> > > > > >> >> >>>>> expanders could then target it
> > > > > >> >> >> directly.
> > > > > >> >> >>>>>
> > > > > >> >> >>>>> IMO that'd be a reasonable thing to do if it does help.
> > > > > >> >> >>>>> It's a relatively common operation, especially on
> > > > > >> >> >>>>> CISCy
> > > targets.
> > > > > >> >> >>>>
> > > > > >> >> >>>> But don't we represent these single bit tests using
> > > > > >> >> >>>> zero_extract as the condition of the branch?  I guess
> > > > > >> >> >>>> if we can generate them directly rather than waiting
> > > > > >> >> >>>> for combine to deduce that we're dealing with a
> > > > > >> >> >>>> single bit test and constructing the zero_extract
> > > > > >> >> >>>> form would be an improvement and might help aarch at
> > > > > >> >> >>>> the same
> > > > > >> time.
> > > > > >> >> >>>
> > > > > >> >> >>> Do you mean that the promote_mode stuff should use
> > > > > >> >> >>> ext(z)v rather than zero_extend to promote a bool, where
> available?
> > > > > >> >> >>> If so, I agree that might help.  But it sounds like it
> > > > > >> >> >>> would have downsides
> > > > > >> too.
> > > > > >> >> >>> Currently a bool memory can be zero-extended on the
> > > > > >> >> >>> fly using a load, but if we used the zero_extract form
> > > > > >> >> >>> instead, we'd have to extract the bit after the load.
> > > > > >> >> >>> And (as an
> > > > > >> >> >>> alternative) choosing different behaviour based on
> > > > > >> >> >>> whether expand sees a REG or a MEM sounds like it
> > > > > >> >> >>> could still cause problems, since REGs could be
> > > > > >> >> >>> replaced by MEMs (or vice versa)
> > > > > later in the RTL passes.
> > > > > >> >> >>>
> > > > > >> >> >>> ISTM that the original patch was inserting an extra
> > > > > >> >> >>> operation in the branch expansion in order to target a
> > > > > >> >> >>> specific
> > > instruction.
> > > > > >> >> >>> Targeting the instruction in expand seems good, but
> > > > > >> >> >>> IMO we should do it directly, based on knowledge of
> > > > > >> >> >>> whether the instruction actually
> > > > > >> >> exists.
> > > > > >> >> >>
> > > > > >> >> >> Yes, I think a compare-and-branch pattern is the best fit
> here.
> > > > > >> >> >> Note on GIMPLE we'd rely on the fact this is a
> > > > > >> >> >> BOOLEAN_TYPE (so even 8 bit precision bools only have 1
> > > > > >> >> >> and 0 as meaningful
> > > values).
> > > > > >> >> >> So the 'compare-' bit in compare-and-branch would be
> > > > > >> >> >> interpreting a BOOLEAN_TYPE, not so much a general
> compare.
> > > > > >> >> >
> > > > > >> >> > Oh, I was thinking of adding a constant argument
> > > > > >> >> > representing the precision that is relevant for the
> > > > > >> >> > compare in order to make this a bit more
> > > > > >> >> general/future proof.
> > > > > >> >> >
> > > > > >> >> > Are you thinking I should instead just make the optab
> > > > > >> >> > implicitly only work for 1-bit precision comparisons?
> > > > > >> >>
> > > > > >> >> What?s the optab you propose (cite also the documentation
> part)?
> > > > > >> >
> > > > > >> > tbranchmode5
> > > > > >> >   Conditional branch instruction combined with a bit test
> instruction.
> > > > > >> Operand 0 is a comparison operator.
> > > > > >> >   Operand 1 and Operand 2 are the first and second operands
> > > > > >> > of the
> > > > > >> comparison, respectively.
> > > > > >> >   Operand 3 is the number of low-order bits that are
> > > > > >> > relevant for the
> > > > > >> comparison.
> > > > > >> >   Operand 4 is the code_label to jump to.
> > > > > >>
> > > > > >> For the TB instructions (and for other similar instructions
> > > > > >> that I've seen on other architectures) it would be more
> > > > > >> useful to have a single-bit test, with operand 4 specifying the bit
> position.
> > > > > >> Arguably it might then be better to have separate eq and ne
> > > > > >> optabs, to avoid the awkward doubling of the operands
> > > > > >> (operand 1 contains
> > > > > operands 2 and 3).
> > > > > >>
> > > > > >> I guess a more general way of achieving the same thing would
> > > > > >> be to make operand 4 in the optab above a mask rather than a bit
> count.
> > > > > >> But that might be overly general, if there are no known
> > > > > >> architectures that have such an instruction.
> > > > > >
> > > > > > One of the reasons I wanted a range rather than a single bit
> > > > > > is that I can the use this to generate cbz/cbnz early on as well.
> > > > >
> > > > > We already have the opportunity to do that via cbranch<mode>4.
> > > > > But at the moment aarch64.md always forces the separate
> > > > > comparison instead.  (Not sure why TBH.  Does it enable more
> > > > > ifcvt
> > > > > opportunities?)
> > > > >
> > > > > If we change the body of cbranch<mode>4 to:
> > > > >
> > > > >   if ((GET_CODE (operands[0]) != EQ && GET_CODE (operands[0]) !=
> NE)
> > > > >       || operands[2] != const0_rtx)
> > > > >     {
> > > > >       operands[1] = aarch64_gen_compare_reg (GET_CODE
> (operands[0]),
> > > > > 					     operands[1], operands[2]);
> > > > >       operands[2] = const0_rtx;
> > > > >     }
> > > > >
> > > > > then we generate the cbz/cbnz directly.
> > > > >
> > > >
> > > > Ah ok, then if Richi agrees, bitpos it is then instead of bit count.
> > >
> > > Somehow I understood that cbranch<>4 is already fully capable of the
> > > optimization?
> > >
> > > On your earlier proposal I'd have commented that if it wouldn't make
> > > sense to instead have a CCmode setter instead of an expander with a
> branch label?
> > > That would be a bit test, like {sign,zero}_extract compared against
> > > zero which can then be combined with a branch?
> > >
> >
> > I missed that part, that could work too.
> >
> > > Of course if the ISA is really bit-test-and-branch then cbranch<>4
> > > itself or a variant of it might be more useful.  Maybe
> > > cbranchbi4 would be "abused" for this?
> >
> > The instruction is an actual bit-test-and-branch with any arbitrary bitpos.
> > Yes we can abuse cbranchbi4 for this, but then it also means we can't e.g.
> > use this to optimize a < 0 where a is a signed value.  With the new
> > optab this would just be a bit-test-and-branch of the sign bit.
> >
> > But also I'm not entirely convinced that using `BImode` and assuming a
> > single bit is safe here. What happens if I compile my source with -std=c89?
> >
> > So I personally think the new optab makes more sense here. The CC setter
> would work too.
> >
> > I guess my question is, between you folks, which approach would you
> > like. It seems that Richi You'd like a CC setter. Richard do you have a
> preference of one over the other?
> 
> My order of preference is
> 
> a) an existing pattern, if possible
> b) something usable by N > 1 targets we know of, even if it requires some
> combine magic
> c) something that fits the actual ISA
>
> For b), x86 has BEXTR which performs a zero_extract from reg/mem and sets
> ZF when the result is zero.  For a branch on sign bit there's easier ways, so it's
> probably not very useful for general compare and branch optimization and if
> (a & 0x30) is probably handled by combine(?).

Agreed, I was more giving an example of other uses. But ok.

> It also seems that if (a & 1) is handled for aarch64 already and it's just a lack of
> an optab that allows RTL expansion to expand if (bool) as if (bool & 1)?

Yes this was what I was originally after.  Your original suggestion was to abuse BImode
and cbranch for this.  That could work, but I worry about introducing BImode in the RTL,
as I don't think we currently use it at all and wonder about new missed optimizations.

Richard Sandiford, would this be OK with you? I also don't want to emit an extract here as I think
that's gonna have a bigger effect on other targets than & 1 did.

I would rather have something I can explicitly check for target support for. I also wonder if just a
target hook asking the target how to expand a given tree Boolean argument would work.

Tamar

> 
> Richard.
> 
> > Tamar
> >
> > >
> > > > Tamar
> > > >
> > > >
> > > > > Thanks,
> > > > > Richard
> > > > >
> > > > >
> > > > > > This would mean we could use my earlier patch that tried to
> > > > > > drop the QI/HI promotions without needing the any_extend
> > > > > > additional pass if we wanted to.
> > > > > >
> > > > > > We'd also no longer need to rely on seeing a paradoxical
> > > > > > subreg for a
> > > tst.
> > > > > >
> > > > > > Tamar.
> > > > > >
> > > > > >>
> > > > > >> Thanks,
> > > > > >> Richard
> > > > > >>
> > > > > >> > Specifically this representation would allow us to emit all
> > > > > >> > our different conditional branching instructions without
> > > > > >> > needing to rely on combine.  We have some cases that happen
> > > > > >> > during optimization that sometimes prevent the optimal
> > > > > >> > sequence from being generated. This
> > > > > >> would also solve that as we would expand to what we want to
> > > > > >> start
> > > with.
> > > > > >> >
> > > > > >> > Tamar.
> > > > > >> >
> > > > > >> >>
> > > > > >> >> >
> > > > > >> >> > Thanks,
> > > > > >> >> > Tamar
> > > > > >> >> >
> > > > > >> >> >>
> > > > > >> >> >> Richard.
> > > >
> > >
> > > --
> > > Richard Biener <rguenther@suse.de>
> > > SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461
> > > Nuernberg, Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald,
> > > Boudien Moerman; HRB 36809 (AG Nuernberg)
> >
> 
> --
> Richard Biener <rguenther@suse.de>
> SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461
> Nuernberg, Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald,
> Boudien Moerman; HRB 36809 (AG Nuernberg)
  
Richard Sandiford Sept. 30, 2022, 2:28 p.m. UTC | #24
Tamar Christina <Tamar.Christina@arm.com> writes:
>> -----Original Message-----
>> From: Richard Biener <rguenther@suse.de>
>> Sent: Friday, September 30, 2022 12:53 PM
>> To: Tamar Christina <Tamar.Christina@arm.com>
>> Cc: Richard Sandiford <Richard.Sandiford@arm.com>; Tamar Christina via
>> Gcc-patches <gcc-patches@gcc.gnu.org>; nd <nd@arm.com>; Jeff Law
>> <jeffreyalaw@gmail.com>
>> Subject: RE: [PATCH 1/2]middle-end: RFC: On expansion of conditional
>> branches, give hint if argument is a truth type to backend
>>
>> On Fri, 30 Sep 2022, Tamar Christina wrote:
>>
>> > > -----Original Message-----
>> > > From: Richard Biener <rguenther@suse.de>
>> > > Sent: Friday, September 30, 2022 11:17 AM
>> > > To: Tamar Christina <Tamar.Christina@arm.com>
>> > > Cc: Richard Sandiford <Richard.Sandiford@arm.com>; Tamar Christina
>> > > via Gcc-patches <gcc-patches@gcc.gnu.org>; nd <nd@arm.com>; Jeff
>> Law
>> > > <jeffreyalaw@gmail.com>
>> > > Subject: RE: [PATCH 1/2]middle-end: RFC: On expansion of conditional
>> > > branches, give hint if argument is a truth type to backend
>> > >
>> > > On Fri, 30 Sep 2022, Tamar Christina wrote:
>> > >
>> > > >
>> > > >
>> > > > > -----Original Message-----
>> > > > > From: Richard Sandiford <richard.sandiford@arm.com>
>> > > > > Sent: Friday, September 30, 2022 9:49 AM
>> > > > > To: Tamar Christina <Tamar.Christina@arm.com>
>> > > > > Cc: Richard Biener <rguenther@suse.de>; Tamar Christina via
>> > > > > Gcc-patches <gcc-patches@gcc.gnu.org>; nd <nd@arm.com>; Jeff
>> Law
>> > > > > <jeffreyalaw@gmail.com>
>> > > > > Subject: Re: [PATCH 1/2]middle-end: RFC: On expansion of
>> > > > > conditional branches, give hint if argument is a truth type to
>> > > > > backend
>> > > > >
>> > > > > Tamar Christina <Tamar.Christina@arm.com> writes:
>> > > > > >> -----Original Message-----
>> > > > > >> From: Richard Sandiford <richard.sandiford@arm.com>
>> > > > > >> Sent: Friday, September 30, 2022 9:29 AM
>> > > > > >> To: Tamar Christina <Tamar.Christina@arm.com>
>> > > > > >> Cc: Richard Biener <rguenther@suse.de>; Tamar Christina via
>> > > > > >> Gcc-patches <gcc-patches@gcc.gnu.org>; nd <nd@arm.com>; Jeff
>> > > Law
>> > > > > >> <jeffreyalaw@gmail.com>
>> > > > > >> Subject: Re: [PATCH 1/2]middle-end: RFC: On expansion of
>> > > > > >> conditional branches, give hint if argument is a truth type
>> > > > > >> to backend
>> > > > > >>
>> > > > > >> Tamar Christina <Tamar.Christina@arm.com> writes:
>> > > > > >> >> -----Original Message-----
>> > > > > >> >> From: Gcc-patches <gcc-patches-
>> > > > > >> >> bounces+tamar.christina=arm.com@gcc.gnu.org> On Behalf Of
>> > > > > Richard
>> > > > > >> >> Biener via Gcc-patches
>> > > > > >> >> Sent: Thursday, September 29, 2022 12:09 PM
>> > > > > >> >> To: Tamar Christina via Gcc-patches
>> > > > > >> >> <gcc-patches@gcc.gnu.org>
>> > > > > >> >> Cc: Richard Sandiford <Richard.Sandiford@arm.com>; nd
>> > > > > <nd@arm.com>
>> > > > > >> >> Subject: Re: [PATCH 1/2]middle-end: RFC: On expansion of
>> > > > > >> >> conditional branches, give hint if argument is a truth
>> > > > > >> >> type to backend
>> > > > > >> >>
>> > > > > >> >>
>> > > > > >> >>
>> > > > > >> >> > Am 29.09.2022 um 12:23 schrieb Tamar Christina via
>> > > > > >> >> > Gcc-patches
>> > > > > >> >> > <gcc-
>> > > > > >> >> patches@gcc.gnu.org>:
>> > > > > >> >> >
>> > > > > >> >> >
>> > > > > >> >> >>
>> > > > > >> >> >> -----Original Message-----
>> > > > > >> >> >> From: Richard Biener <rguenther@suse.de>
>> > > > > >> >> >> Sent: Thursday, September 29, 2022 10:41 AM
>> > > > > >> >> >> To: Richard Sandiford <Richard.Sandiford@arm.com>
>> > > > > >> >> >> Cc: Jeff Law <jeffreyalaw@gmail.com>; Tamar Christina
>> > > > > >> >> >> <Tamar.Christina@arm.com>; gcc-patches@gcc.gnu.org; nd
>> > > > > >> >> <nd@arm.com>
>> > > > > >> >> >> Subject: Re: [PATCH 1/2]middle-end: RFC: On expansion
>> > > > > >> >> >> of conditional branches, give hint if argument is a
>> > > > > >> >> >> truth type to backend
>> > > > > >> >> >>
>> > > > > >> >> >>> On Thu, 29 Sep 2022, Richard Sandiford wrote:
>> > > > > >> >> >>>
>> > > > > >> >> >>> Jeff Law <jeffreyalaw@gmail.com> writes:
>> > > > > >> >> >>>> On 9/28/22 09:04, Richard Sandiford wrote:
>> > > > > >> >> >>>>> Tamar Christina <Tamar.Christina@arm.com> writes:
>> > > > > >> >> >>>>>>> Maybe the target could use (subreg:SI (reg:BI
>> > > > > >> >> >>>>>>> ...)) as
>> > > > > argument.
>> > > > > >> >> Heh.
>> > > > > >> >> >>>>>> But then I'd still need to change the expansion
>> > > > > >> >> >>>>>> code. I suppose this could prevent the issue with
>> > > > > >> >> >>>>>> changes to code on
>> > > > > >> other targets.
>> > > > > >> >> >>>>>>
>> > > > > >> >> >>>>>>>>> We have undocumented addcc, negcc, etc.
>> > > > > >> >> >>>>>>>>> patterns,
>> > > > > should
>> > > > > >> we
>> > > > > >> >> >>>>>>>>> have aandcc
>> > > > > >> >> >>>>>> pattern for this indicating support for andcc +
>> > > > > >> >> >>>>>> jump as opposedto
>> > > > > >> >> >> cmpcc + jump?
>> > > > > >> >> >>>>>>>> This could work yeah. I didn't know these existed.
>> > > > > >> >> >>>>>>> Ah, so they are conditional add, not add setting
>> > > > > >> >> >>>>>>> CC, so andcc wouldn't be appropriate.
>> > > > > >> >> >>>>>>> So I'm not sure how we'd handle such situation -
>> > > > > >> >> >>>>>>> maybe looking at REG_DECL and recognizing a _Bool
>> > > > > >> >> >>>>>>> PARM_DECL is
>> > > > > OK?
>> > > > > >> >> >>>>>> I have a slight suspicion that Richard Sandiford
>> > > > > >> >> >>>>>> would likely reject this though..
>> > > > > >> >> >>>>> Good guess :-P  We shouldn't rely on something like
>> > > > > >> >> >>>>> that for
>> > > > > >> >> >> correctness.
>> > > > > >> >> >>>>>
>> > > > > >> >> >>>>> Would it help if we promoted the test-and-branch
>> > > > > >> >> >>>>> instructions to optabs, alongside cbranch?  The jump
>> > > > > >> >> >>>>> expanders could then target it
>> > > > > >> >> >> directly.
>> > > > > >> >> >>>>>
>> > > > > >> >> >>>>> IMO that'd be a reasonable thing to do if it does help.
>> > > > > >> >> >>>>> It's a relatively common operation, especially on
>> > > > > >> >> >>>>> CISCy
>> > > targets.
>> > > > > >> >> >>>>
>> > > > > >> >> >>>> But don't we represent these single bit tests using
>> > > > > >> >> >>>> zero_extract as the condition of the branch?  I guess
>> > > > > >> >> >>>> if we can generate them directly rather than waiting
>> > > > > >> >> >>>> for combine to deduce that we're dealing with a
>> > > > > >> >> >>>> single bit test and constructing the zero_extract
>> > > > > >> >> >>>> form would be an improvement and might help aarch at
>> > > > > >> >> >>>> the same
>> > > > > >> time.
>> > > > > >> >> >>>
>> > > > > >> >> >>> Do you mean that the promote_mode stuff should use
>> > > > > >> >> >>> ext(z)v rather than zero_extend to promote a bool, where
>> available?
>> > > > > >> >> >>> If so, I agree that might help.  But it sounds like it
>> > > > > >> >> >>> would have downsides
>> > > > > >> too.
>> > > > > >> >> >>> Currently a bool memory can be zero-extended on the
>> > > > > >> >> >>> fly using a load, but if we used the zero_extract form
>> > > > > >> >> >>> instead, we'd have to extract the bit after the load.
>> > > > > >> >> >>> And (as an
>> > > > > >> >> >>> alternative) choosing different behaviour based on
>> > > > > >> >> >>> whether expand sees a REG or a MEM sounds like it
>> > > > > >> >> >>> could still cause problems, since REGs could be
>> > > > > >> >> >>> replaced by MEMs (or vice versa)
>> > > > > later in the RTL passes.
>> > > > > >> >> >>>
>> > > > > >> >> >>> ISTM that the original patch was inserting an extra
>> > > > > >> >> >>> operation in the branch expansion in order to target a
>> > > > > >> >> >>> specific
>> > > instruction.
>> > > > > >> >> >>> Targeting the instruction in expand seems good, but
>> > > > > >> >> >>> IMO we should do it directly, based on knowledge of
>> > > > > >> >> >>> whether the instruction actually
>> > > > > >> >> exists.
>> > > > > >> >> >>
>> > > > > >> >> >> Yes, I think a compare-and-branch pattern is the best fit
>> here.
>> > > > > >> >> >> Note on GIMPLE we'd rely on the fact this is a
>> > > > > >> >> >> BOOLEAN_TYPE (so even 8 bit precision bools only have 1
>> > > > > >> >> >> and 0 as meaningful
>> > > values).
>> > > > > >> >> >> So the 'compare-' bit in compare-and-branch would be
>> > > > > >> >> >> interpreting a BOOLEAN_TYPE, not so much a general
>> compare.
>> > > > > >> >> >
>> > > > > >> >> > Oh, I was thinking of adding a constant argument
>> > > > > >> >> > representing the precision that is relevant for the
>> > > > > >> >> > compare in order to make this a bit more
>> > > > > >> >> general/future proof.
>> > > > > >> >> >
>> > > > > >> >> > Are you thinking I should instead just make the optab
>> > > > > >> >> > implicitly only work for 1-bit precision comparisons?
>> > > > > >> >>
>> > > > > >> >> What?s the optab you propose (cite also the documentation
>> part)?
>> > > > > >> >
>> > > > > >> > tbranchmode5
>> > > > > >> >   Conditional branch instruction combined with a bit test
>> instruction.
>> > > > > >> Operand 0 is a comparison operator.
>> > > > > >> >   Operand 1 and Operand 2 are the first and second operands
>> > > > > >> > of the
>> > > > > >> comparison, respectively.
>> > > > > >> >   Operand 3 is the number of low-order bits that are
>> > > > > >> > relevant for the
>> > > > > >> comparison.
>> > > > > >> >   Operand 4 is the code_label to jump to.
>> > > > > >>
>> > > > > >> For the TB instructions (and for other similar instructions
>> > > > > >> that I've seen on other architectures) it would be more
>> > > > > >> useful to have a single-bit test, with operand 4 specifying the bit
>> position.
>> > > > > >> Arguably it might then be better to have separate eq and ne
>> > > > > >> optabs, to avoid the awkward doubling of the operands
>> > > > > >> (operand 1 contains
>> > > > > operands 2 and 3).
>> > > > > >>
>> > > > > >> I guess a more general way of achieving the same thing would
>> > > > > >> be to make operand 4 in the optab above a mask rather than a bit
>> count.
>> > > > > >> But that might be overly general, if there are no known
>> > > > > >> architectures that have such an instruction.
>> > > > > >
>> > > > > > One of the reasons I wanted a range rather than a single bit
>> > > > > > is that I can the use this to generate cbz/cbnz early on as well.
>> > > > >
>> > > > > We already have the opportunity to do that via cbranch<mode>4.
>> > > > > But at the moment aarch64.md always forces the separate
>> > > > > comparison instead.  (Not sure why TBH.  Does it enable more
>> > > > > ifcvt
>> > > > > opportunities?)
>> > > > >
>> > > > > If we change the body of cbranch<mode>4 to:
>> > > > >
>> > > > >   if ((GET_CODE (operands[0]) != EQ && GET_CODE (operands[0]) !=
>> NE)
>> > > > >       || operands[2] != const0_rtx)
>> > > > >     {
>> > > > >       operands[1] = aarch64_gen_compare_reg (GET_CODE
>> (operands[0]),
>> > > > >                                            operands[1], operands[2]);
>> > > > >       operands[2] = const0_rtx;
>> > > > >     }
>> > > > >
>> > > > > then we generate the cbz/cbnz directly.
>> > > > >
>> > > >
>> > > > Ah ok, then if Richi agrees, bitpos it is then instead of bit count.
>> > >
>> > > Somehow I understood that cbranch<>4 is already fully capable of the
>> > > optimization?
>> > >
>> > > On your earlier proposal I'd have commented that if it wouldn't make
>> > > sense to instead have a CCmode setter instead of an expander with a
>> branch label?
>> > > That would be a bit test, like {sign,zero}_extract compared against
>> > > zero which can then be combined with a branch?
>> > >
>> >
>> > I missed that part, that could work too.
>> >
>> > > Of course if the ISA is really bit-test-and-branch then cbranch<>4
>> > > itself or a variant of it might be more useful.  Maybe
>> > > cbranchbi4 would be "abused" for this?
>> >
>> > The instruction is an actual bit-test-and-branch with any arbitrary bitpos.
>> > Yes we can abuse cbranchbi4 for this, but then it also means we can't e.g.
>> > use this to optimize a < 0 where a is a signed value.  With the new
>> > optab this would just be a bit-test-and-branch of the sign bit.
>> >
>> > But also I'm not entirely convinced that using `BImode` and assuming a
>> > single bit is safe here. What happens if I compile my source with -std=c89?
>> >
>> > So I personally think the new optab makes more sense here. The CC setter
>> would work too.
>> >
>> > I guess my question is, between you folks, which approach would you
>> > like. It seems that Richi You'd like a CC setter. Richard do you have a
>> preference of one over the other?
>>
>> My order of preference is
>>
>> a) an existing pattern, if possible
>> b) something usable by N > 1 targets we know of, even if it requires some
>> combine magic
>> c) something that fits the actual ISA
>>
>> For b), x86 has BEXTR which performs a zero_extract from reg/mem and sets
>> ZF when the result is zero.  For a branch on sign bit there's easier ways, so it's
>> probably not very useful for general compare and branch optimization and if
>> (a & 0x30) is probably handled by combine(?).
>
> Agreed, I was more giving an example of other uses. But ok.
>
>> It also seems that if (a & 1) is handled for aarch64 already and it's just a lack of
>> an optab that allows RTL expansion to expand if (bool) as if (bool & 1)?
>
> Yes this was what I was originally after.  Your original suggestion was to abuse BImode
> and cbranch for this.  That could work, but I worry about introducing BImode in the RTL,
> as I don't think we currently use it at all and wonder about new missed optimizations.
>
> Richard Sandiford, would this be OK with you? I also don't want to emit an extract here as I think
> that's gonna have a bigger effect on other targets than & 1 did.
>
> I would rather have something I can explicitly check for target support for. I also wonder if just a
> target hook asking the target how to expand a given tree Boolean argument would work.

Personally I'd prefer the test-and-branch optab.  We used to have
separate compare optabs and branch optabs in the cc0 days, but having
the combined pattern turned out to be much easier.  We don't currently
expose CC registers at the optabs level and I think that's a good thing.

We could have a test-bit-and-set-pseudo optab.  But using that optab
here would reintroduce the problem that I had with the original patch,
which is:

- Currently, the branch on the boolean value expands to a single optab
  (cbranch<mode>4).  That optab could easily expand to a single cb(n)z
  instruction on aarch64 (even though the port chooses not to do that
  currently).  So taken on its own, the branch is already maximally
  efficient.

  The patch instead adds an extra instruction to the branch expansion,
  so that it uses two optabs rather than one.  On the face of it,
  this extra instruction is entirely redundant.  But emitting it can
  sometimes allow profitable combines.  That is, rather than expand to:

    and reg, reg, 0xFF (from promote_mode)
    cbranch on reg     (from the branch expansion)

  the patch expanded to:

    and reg, reg, 0xFF (from promote_mode)
    and reg, reg, 0x1  (from the branch expansion)
    cbranch on reg     (from the branch expansion)

  The first two ANDs come from separate sources.  When both ANDs exist,
  combine can get rid of the redundancy and can then see that the
  retained AND 0x1 + cbranch optimise to a test-and-branch instruction.

  But if the target can't in fact use a test-and-branch in a particular
  situation, we'd be left with the AND 0x1 and the cbranch.  That's not
  too bad if the AND from promote_mode and the AND 0x1 can be combined.
  But if the AND with 0xff happens "too far" from the AND with 0x1
  (e.g. because it's in a loop preheader), or if there's intermediate
  logic, we might end up keeping all three instructions.

Emitting an extra bit test instruction as part of the branch expansion
IMO has the same problem.  We're relying on combine combining the
(redundant) bit test with the cbranch.  If combine doesn't do that
(e.g. because the target doesn't support the combination in certain
cases) then we could be left with three instructions rather than the
original two.

That's why I think the fix is either for promote_mode to use a different
expansion for booleans (which could also have knock-on effects) or for
the branch expanders to target the test-and-branch instruction directly.

The optab wouldn't be an aarch64 special.  arc, avr, and h8300sx have
similar instructions.  There are problem others too: h8300sx was from
memory and I stopped looking after avr. :-)

Thanks,
Richard
  
Richard Biener Sept. 30, 2022, 2:33 p.m. UTC | #25
> Am 30.09.2022 um 16:29 schrieb Richard Sandiford via Gcc-patches <gcc-patches@gcc.gnu.org>:
> 
> Tamar Christina <Tamar.Christina@arm.com> writes:
>>> -----Original Message-----
>>> From: Richard Biener <rguenther@suse.de>
>>> Sent: Friday, September 30, 2022 12:53 PM
>>> To: Tamar Christina <Tamar.Christina@arm.com>
>>> Cc: Richard Sandiford <Richard.Sandiford@arm.com>; Tamar Christina via
>>> Gcc-patches <gcc-patches@gcc.gnu.org>; nd <nd@arm.com>; Jeff Law
>>> <jeffreyalaw@gmail.com>
>>> Subject: RE: [PATCH 1/2]middle-end: RFC: On expansion of conditional
>>> branches, give hint if argument is a truth type to backend
>>> 
>>>> On Fri, 30 Sep 2022, Tamar Christina wrote:
>>> 
>>>>> -----Original Message-----
>>>>> From: Richard Biener <rguenther@suse.de>
>>>>> Sent: Friday, September 30, 2022 11:17 AM
>>>>> To: Tamar Christina <Tamar.Christina@arm.com>
>>>>> Cc: Richard Sandiford <Richard.Sandiford@arm.com>; Tamar Christina
>>>>> via Gcc-patches <gcc-patches@gcc.gnu.org>; nd <nd@arm.com>; Jeff
>>> Law
>>>>> <jeffreyalaw@gmail.com>
>>>>> Subject: RE: [PATCH 1/2]middle-end: RFC: On expansion of conditional
>>>>> branches, give hint if argument is a truth type to backend
>>>>> 
>>>>> On Fri, 30 Sep 2022, Tamar Christina wrote:
>>>>> 
>>>>>> 
>>>>>> 
>>>>>>> -----Original Message-----
>>>>>>> From: Richard Sandiford <richard.sandiford@arm.com>
>>>>>>> Sent: Friday, September 30, 2022 9:49 AM
>>>>>>> To: Tamar Christina <Tamar.Christina@arm.com>
>>>>>>> Cc: Richard Biener <rguenther@suse.de>; Tamar Christina via
>>>>>>> Gcc-patches <gcc-patches@gcc.gnu.org>; nd <nd@arm.com>; Jeff
>>> Law
>>>>>>> <jeffreyalaw@gmail.com>
>>>>>>> Subject: Re: [PATCH 1/2]middle-end: RFC: On expansion of
>>>>>>> conditional branches, give hint if argument is a truth type to
>>>>>>> backend
>>>>>>> 
>>>>>>> Tamar Christina <Tamar.Christina@arm.com> writes:
>>>>>>>>> -----Original Message-----
>>>>>>>>> From: Richard Sandiford <richard.sandiford@arm.com>
>>>>>>>>> Sent: Friday, September 30, 2022 9:29 AM
>>>>>>>>> To: Tamar Christina <Tamar.Christina@arm.com>
>>>>>>>>> Cc: Richard Biener <rguenther@suse.de>; Tamar Christina via
>>>>>>>>> Gcc-patches <gcc-patches@gcc.gnu.org>; nd <nd@arm.com>; Jeff
>>>>> Law
>>>>>>>>> <jeffreyalaw@gmail.com>
>>>>>>>>> Subject: Re: [PATCH 1/2]middle-end: RFC: On expansion of
>>>>>>>>> conditional branches, give hint if argument is a truth type
>>>>>>>>> to backend
>>>>>>>>> 
>>>>>>>>> Tamar Christina <Tamar.Christina@arm.com> writes:
>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>> From: Gcc-patches <gcc-patches-
>>>>>>>>>>> bounces+tamar.christina=arm.com@gcc.gnu.org> On Behalf Of
>>>>>>> Richard
>>>>>>>>>>> Biener via Gcc-patches
>>>>>>>>>>> Sent: Thursday, September 29, 2022 12:09 PM
>>>>>>>>>>> To: Tamar Christina via Gcc-patches
>>>>>>>>>>> <gcc-patches@gcc.gnu.org>
>>>>>>>>>>> Cc: Richard Sandiford <Richard.Sandiford@arm.com>; nd
>>>>>>> <nd@arm.com>
>>>>>>>>>>> Subject: Re: [PATCH 1/2]middle-end: RFC: On expansion of
>>>>>>>>>>> conditional branches, give hint if argument is a truth
>>>>>>>>>>> type to backend
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>>> Am 29.09.2022 um 12:23 schrieb Tamar Christina via
>>>>>>>>>>>> Gcc-patches
>>>>>>>>>>>> <gcc-
>>>>>>>>>>> patches@gcc.gnu.org>:
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>>>> From: Richard Biener <rguenther@suse.de>
>>>>>>>>>>>>> Sent: Thursday, September 29, 2022 10:41 AM
>>>>>>>>>>>>> To: Richard Sandiford <Richard.Sandiford@arm.com>
>>>>>>>>>>>>> Cc: Jeff Law <jeffreyalaw@gmail.com>; Tamar Christina
>>>>>>>>>>>>> <Tamar.Christina@arm.com>; gcc-patches@gcc.gnu.org; nd
>>>>>>>>>>> <nd@arm.com>
>>>>>>>>>>>>> Subject: Re: [PATCH 1/2]middle-end: RFC: On expansion
>>>>>>>>>>>>> of conditional branches, give hint if argument is a
>>>>>>>>>>>>> truth type to backend
>>>>>>>>>>>>> 
>>>>>>>>>>>>>> On Thu, 29 Sep 2022, Richard Sandiford wrote:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Jeff Law <jeffreyalaw@gmail.com> writes:
>>>>>>>>>>>>>>> On 9/28/22 09:04, Richard Sandiford wrote:
>>>>>>>>>>>>>>>> Tamar Christina <Tamar.Christina@arm.com> writes:
>>>>>>>>>>>>>>>>>> Maybe the target could use (subreg:SI (reg:BI
>>>>>>>>>>>>>>>>>> ...)) as
>>>>>>> argument.
>>>>>>>>>>> Heh.
>>>>>>>>>>>>>>>>> But then I'd still need to change the expansion
>>>>>>>>>>>>>>>>> code. I suppose this could prevent the issue with
>>>>>>>>>>>>>>>>> changes to code on
>>>>>>>>> other targets.
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> We have undocumented addcc, negcc, etc.
>>>>>>>>>>>>>>>>>>>> patterns,
>>>>>>> should
>>>>>>>>> we
>>>>>>>>>>>>>>>>>>>> have aandcc
>>>>>>>>>>>>>>>>> pattern for this indicating support for andcc +
>>>>>>>>>>>>>>>>> jump as opposedto
>>>>>>>>>>>>> cmpcc + jump?
>>>>>>>>>>>>>>>>>>> This could work yeah. I didn't know these existed.
>>>>>>>>>>>>>>>>>> Ah, so they are conditional add, not add setting
>>>>>>>>>>>>>>>>>> CC, so andcc wouldn't be appropriate.
>>>>>>>>>>>>>>>>>> So I'm not sure how we'd handle such situation -
>>>>>>>>>>>>>>>>>> maybe looking at REG_DECL and recognizing a _Bool
>>>>>>>>>>>>>>>>>> PARM_DECL is
>>>>>>> OK?
>>>>>>>>>>>>>>>>> I have a slight suspicion that Richard Sandiford
>>>>>>>>>>>>>>>>> would likely reject this though..
>>>>>>>>>>>>>>>> Good guess :-P  We shouldn't rely on something like
>>>>>>>>>>>>>>>> that for
>>>>>>>>>>>>> correctness.
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Would it help if we promoted the test-and-branch
>>>>>>>>>>>>>>>> instructions to optabs, alongside cbranch?  The jump
>>>>>>>>>>>>>>>> expanders could then target it
>>>>>>>>>>>>> directly.
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> IMO that'd be a reasonable thing to do if it does help.
>>>>>>>>>>>>>>>> It's a relatively common operation, especially on
>>>>>>>>>>>>>>>> CISCy
>>>>> targets.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> But don't we represent these single bit tests using
>>>>>>>>>>>>>>> zero_extract as the condition of the branch?  I guess
>>>>>>>>>>>>>>> if we can generate them directly rather than waiting
>>>>>>>>>>>>>>> for combine to deduce that we're dealing with a
>>>>>>>>>>>>>>> single bit test and constructing the zero_extract
>>>>>>>>>>>>>>> form would be an improvement and might help aarch at
>>>>>>>>>>>>>>> the same
>>>>>>>>> time.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Do you mean that the promote_mode stuff should use
>>>>>>>>>>>>>> ext(z)v rather than zero_extend to promote a bool, where
>>> available?
>>>>>>>>>>>>>> If so, I agree that might help.  But it sounds like it
>>>>>>>>>>>>>> would have downsides
>>>>>>>>> too.
>>>>>>>>>>>>>> Currently a bool memory can be zero-extended on the
>>>>>>>>>>>>>> fly using a load, but if we used the zero_extract form
>>>>>>>>>>>>>> instead, we'd have to extract the bit after the load.
>>>>>>>>>>>>>> And (as an
>>>>>>>>>>>>>> alternative) choosing different behaviour based on
>>>>>>>>>>>>>> whether expand sees a REG or a MEM sounds like it
>>>>>>>>>>>>>> could still cause problems, since REGs could be
>>>>>>>>>>>>>> replaced by MEMs (or vice versa)
>>>>>>> later in the RTL passes.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> ISTM that the original patch was inserting an extra
>>>>>>>>>>>>>> operation in the branch expansion in order to target a
>>>>>>>>>>>>>> specific
>>>>> instruction.
>>>>>>>>>>>>>> Targeting the instruction in expand seems good, but
>>>>>>>>>>>>>> IMO we should do it directly, based on knowledge of
>>>>>>>>>>>>>> whether the instruction actually
>>>>>>>>>>> exists.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Yes, I think a compare-and-branch pattern is the best fit
>>> here.
>>>>>>>>>>>>> Note on GIMPLE we'd rely on the fact this is a
>>>>>>>>>>>>> BOOLEAN_TYPE (so even 8 bit precision bools only have 1
>>>>>>>>>>>>> and 0 as meaningful
>>>>> values).
>>>>>>>>>>>>> So the 'compare-' bit in compare-and-branch would be
>>>>>>>>>>>>> interpreting a BOOLEAN_TYPE, not so much a general
>>> compare.
>>>>>>>>>>>> 
>>>>>>>>>>>> Oh, I was thinking of adding a constant argument
>>>>>>>>>>>> representing the precision that is relevant for the
>>>>>>>>>>>> compare in order to make this a bit more
>>>>>>>>>>> general/future proof.
>>>>>>>>>>>> 
>>>>>>>>>>>> Are you thinking I should instead just make the optab
>>>>>>>>>>>> implicitly only work for 1-bit precision comparisons?
>>>>>>>>>>> 
>>>>>>>>>>> What?s the optab you propose (cite also the documentation
>>> part)?
>>>>>>>>>> 
>>>>>>>>>> tbranchmode5
>>>>>>>>>>  Conditional branch instruction combined with a bit test
>>> instruction.
>>>>>>>>> Operand 0 is a comparison operator.
>>>>>>>>>>  Operand 1 and Operand 2 are the first and second operands
>>>>>>>>>> of the
>>>>>>>>> comparison, respectively.
>>>>>>>>>>  Operand 3 is the number of low-order bits that are
>>>>>>>>>> relevant for the
>>>>>>>>> comparison.
>>>>>>>>>>  Operand 4 is the code_label to jump to.
>>>>>>>>> 
>>>>>>>>> For the TB instructions (and for other similar instructions
>>>>>>>>> that I've seen on other architectures) it would be more
>>>>>>>>> useful to have a single-bit test, with operand 4 specifying the bit
>>> position.
>>>>>>>>> Arguably it might then be better to have separate eq and ne
>>>>>>>>> optabs, to avoid the awkward doubling of the operands
>>>>>>>>> (operand 1 contains
>>>>>>> operands 2 and 3).
>>>>>>>>> 
>>>>>>>>> I guess a more general way of achieving the same thing would
>>>>>>>>> be to make operand 4 in the optab above a mask rather than a bit
>>> count.
>>>>>>>>> But that might be overly general, if there are no known
>>>>>>>>> architectures that have such an instruction.
>>>>>>>> 
>>>>>>>> One of the reasons I wanted a range rather than a single bit
>>>>>>>> is that I can the use this to generate cbz/cbnz early on as well.
>>>>>>> 
>>>>>>> We already have the opportunity to do that via cbranch<mode>4.
>>>>>>> But at the moment aarch64.md always forces the separate
>>>>>>> comparison instead.  (Not sure why TBH.  Does it enable more
>>>>>>> ifcvt
>>>>>>> opportunities?)
>>>>>>> 
>>>>>>> If we change the body of cbranch<mode>4 to:
>>>>>>> 
>>>>>>>  if ((GET_CODE (operands[0]) != EQ && GET_CODE (operands[0]) !=
>>> NE)
>>>>>>>      || operands[2] != const0_rtx)
>>>>>>>    {
>>>>>>>      operands[1] = aarch64_gen_compare_reg (GET_CODE
>>> (operands[0]),
>>>>>>>                                           operands[1], operands[2]);
>>>>>>>      operands[2] = const0_rtx;
>>>>>>>    }
>>>>>>> 
>>>>>>> then we generate the cbz/cbnz directly.
>>>>>>> 
>>>>>> 
>>>>>> Ah ok, then if Richi agrees, bitpos it is then instead of bit count.
>>>>> 
>>>>> Somehow I understood that cbranch<>4 is already fully capable of the
>>>>> optimization?
>>>>> 
>>>>> On your earlier proposal I'd have commented that if it wouldn't make
>>>>> sense to instead have a CCmode setter instead of an expander with a
>>> branch label?
>>>>> That would be a bit test, like {sign,zero}_extract compared against
>>>>> zero which can then be combined with a branch?
>>>>> 
>>>> 
>>>> I missed that part, that could work too.
>>>> 
>>>>> Of course if the ISA is really bit-test-and-branch then cbranch<>4
>>>>> itself or a variant of it might be more useful.  Maybe
>>>>> cbranchbi4 would be "abused" for this?
>>>> 
>>>> The instruction is an actual bit-test-and-branch with any arbitrary bitpos.
>>>> Yes we can abuse cbranchbi4 for this, but then it also means we can't e.g.
>>>> use this to optimize a < 0 where a is a signed value.  With the new
>>>> optab this would just be a bit-test-and-branch of the sign bit.
>>>> 
>>>> But also I'm not entirely convinced that using `BImode` and assuming a
>>>> single bit is safe here. What happens if I compile my source with -std=c89?
>>>> 
>>>> So I personally think the new optab makes more sense here. The CC setter
>>> would work too.
>>>> 
>>>> I guess my question is, between you folks, which approach would you
>>>> like. It seems that Richi You'd like a CC setter. Richard do you have a
>>> preference of one over the other?
>>> 
>>> My order of preference is
>>> 
>>> a) an existing pattern, if possible
>>> b) something usable by N > 1 targets we know of, even if it requires some
>>> combine magic
>>> c) something that fits the actual ISA
>>> 
>>> For b), x86 has BEXTR which performs a zero_extract from reg/mem and sets
>>> ZF when the result is zero.  For a branch on sign bit there's easier ways, so it's
>>> probably not very useful for general compare and branch optimization and if
>>> (a & 0x30) is probably handled by combine(?).
>> 
>> Agreed, I was more giving an example of other uses. But ok.
>> 
>>> It also seems that if (a & 1) is handled for aarch64 already and it's just a lack of
>>> an optab that allows RTL expansion to expand if (bool) as if (bool & 1)?
>> 
>> Yes this was what I was originally after.  Your original suggestion was to abuse BImode
>> and cbranch for this.  That could work, but I worry about introducing BImode in the RTL,
>> as I don't think we currently use it at all and wonder about new missed optimizations.
>> 
>> Richard Sandiford, would this be OK with you? I also don't want to emit an extract here as I think
>> that's gonna have a bigger effect on other targets than & 1 did.
>> 
>> I would rather have something I can explicitly check for target support for. I also wonder if just a
>> target hook asking the target how to expand a given tree Boolean argument would work.
> 
> Personally I'd prefer the test-and-branch optab.  We used to have
> separate compare optabs and branch optabs in the cc0 days, but having
> the combined pattern turned out to be much easier.  We don't currently
> expose CC registers at the optabs level and I think that's a good thing.
> 
> We could have a test-bit-and-set-pseudo optab.  But using that optab
> here would reintroduce the problem that I had with the original patch,
> which is:
> 
> - Currently, the branch on the boolean value expands to a single optab
>  (cbranch<mode>4).  That optab could easily expand to a single cb(n)z
>  instruction on aarch64 (even though the port chooses not to do that
>  currently).  So taken on its own, the branch is already maximally
>  efficient.
> 
>  The patch instead adds an extra instruction to the branch expansion,
>  so that it uses two optabs rather than one.  On the face of it,
>  this extra instruction is entirely redundant.  But emitting it can
>  sometimes allow profitable combines.  That is, rather than expand to:
> 
>    and reg, reg, 0xFF (from promote_mode)
>    cbranch on reg     (from the branch expansion)
> 
>  the patch expanded to:
> 
>    and reg, reg, 0xFF (from promote_mode)
>    and reg, reg, 0x1  (from the branch expansion)
>    cbranch on reg     (from the branch expansion)
> 
>  The first two ANDs come from separate sources.  When both ANDs exist,
>  combine can get rid of the redundancy and can then see that the
>  retained AND 0x1 + cbranch optimise to a test-and-branch instruction.
> 
>  But if the target can't in fact use a test-and-branch in a particular
>  situation, we'd be left with the AND 0x1 and the cbranch.  That's not
>  too bad if the AND from promote_mode and the AND 0x1 can be combined.
>  But if the AND with 0xff happens "too far" from the AND with 0x1
>  (e.g. because it's in a loop preheader), or if there's intermediate
>  logic, we might end up keeping all three instructions.
> 
> Emitting an extra bit test instruction as part of the branch expansion
> IMO has the same problem.  We're relying on combine combining the
> (redundant) bit test with the cbranch.  If combine doesn't do that
> (e.g. because the target doesn't support the combination in certain
> cases) then we could be left with three instructions rather than the
> original two.
> 
> That's why I think the fix is either for promote_mode to use a different
> expansion for booleans (which could also have knock-on effects) or for
> the branch expanders to target the test-and-branch instruction directly.
> 
> The optab wouldn't be an aarch64 special.  arc, avr, and h8300sx have
> similar instructions.  There are problem others too: h8300sx was from
> memory and I stopped looking after avr. :-)

Fair enough, let’s go with that then.

Richard 


> Thanks,
> Richard
  
Andrew Pinski Oct. 27, 2022, 3:22 a.m. UTC | #26
On Fri, Sep 23, 2022 at 2:26 AM Tamar Christina via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
> Hi All,
>
> This is an RFC to figure out how to deal with targets that don't have native
> comparisons against QImode values.
>
> Booleans, at least in C99 and higher are 0-1 valued.  This means that we only
> really need to test a single bit.  However in RTL we no longer have this
> information available and just have an SImode value (due to the promotion of
> QImode to SImode).
>
> This RFC fixes it by emitting an explicit & 1 during the expansion of the
> conditional branch.
>
> However it's unlikely that we want to do this unconditionally.  Most targets
> I've tested seem to have harmless code changes, like x86 changes from testb to
> andl, $1.
>
> So I have two questions:
>
> 1. Should I limit this behind a target macro? Or should I just leave it for all
>    targets and deal with the fallout.
> 2. How can I tell whether the C99 0-1 values bools are being used or the older
>    0, non-0 variant?
>
> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> However there are some benign codegen changes on x86, testb changed to andl $1.
>
> This pattern occurs more than 120,000 times in SPECCPU 2017 and so is quite common.
>
> Thanks,
> Tamar
>
> gcc/ChangeLog:
>
>         * tree.h (tree_zero_one_valued_p): New.
>         * dojump.cc (do_jump): Add & 1 if truth type.

Just FYI, I think this fixes/improves https://gcc.gnu.org/PR101806 .

Thanks,
Andrew

>
> --- inline copy of patch --
> diff --git a/gcc/dojump.cc b/gcc/dojump.cc
> index 2af0cd1aca3b6af13d5d8799094ee93f18022296..8eaf1be49cd12298e61c6946ae79ca9de6197864 100644
> --- a/gcc/dojump.cc
> +++ b/gcc/dojump.cc
> @@ -605,7 +605,17 @@ do_jump (tree exp, rtx_code_label *if_false_label,
>        /* Fall through and generate the normal code.  */
>      default:
>      normal:
> -      temp = expand_normal (exp);
> +      tree cmp = exp;
> +      /* If the expression is a truth type then explicitly generate an & 1
> +        to indicate to the target that it's a zero-one values type.  This
> +        allows the target to further optimize the comparison should it
> +        choose to.  */
> +      if (tree_zero_one_valued_p (exp))
> +       {
> +         type = TREE_TYPE (exp);
> +         cmp = build2 (BIT_AND_EXPR, type, exp, build_int_cstu (type, 1));
> +       }
> +      temp = expand_normal (cmp);
>        do_pending_stack_adjust ();
>        /* The RTL optimizers prefer comparisons against pseudos.  */
>        if (GET_CODE (temp) == SUBREG)
> diff --git a/gcc/tree.h b/gcc/tree.h
> index 8f8a9660c9e0605eb516de194640b8c1b531b798..be3d2dee82f692e81082cf21c878c10f9fe9e1f1 100644
> --- a/gcc/tree.h
> +++ b/gcc/tree.h
> @@ -4690,6 +4690,7 @@ extern tree signed_or_unsigned_type_for (int, tree);
>  extern tree signed_type_for (tree);
>  extern tree unsigned_type_for (tree);
>  extern bool is_truth_type_for (tree, tree);
> +extern bool tree_zero_one_valued_p (tree);
>  extern tree truth_type_for (tree);
>  extern tree build_pointer_type_for_mode (tree, machine_mode, bool);
>  extern tree build_pointer_type (tree);
>
>
>
>
> --
  

Patch

--- a/gcc/dojump.cc
+++ b/gcc/dojump.cc
@@ -605,7 +605,17 @@  do_jump (tree exp, rtx_code_label *if_false_label,
       /* Fall through and generate the normal code.  */
     default:
     normal:
-      temp = expand_normal (exp);
+      tree cmp = exp;
+      /* If the expression is a truth type then explicitly generate an & 1
+	 to indicate to the target that it's a zero-one values type.  This
+	 allows the target to further optimize the comparison should it
+	 choose to.  */
+      if (tree_zero_one_valued_p (exp))
+	{
+	  type = TREE_TYPE (exp);
+	  cmp = build2 (BIT_AND_EXPR, type, exp, build_int_cstu (type, 1));
+	}
+      temp = expand_normal (cmp);
       do_pending_stack_adjust ();
       /* The RTL optimizers prefer comparisons against pseudos.  */
       if (GET_CODE (temp) == SUBREG)
diff --git a/gcc/tree.h b/gcc/tree.h
index 8f8a9660c9e0605eb516de194640b8c1b531b798..be3d2dee82f692e81082cf21c878c10f9fe9e1f1 100644
--- a/gcc/tree.h
+++ b/gcc/tree.h
@@ -4690,6 +4690,7 @@  extern tree signed_or_unsigned_type_for (int, tree);
 extern tree signed_type_for (tree);
 extern tree unsigned_type_for (tree);
 extern bool is_truth_type_for (tree, tree);
+extern bool tree_zero_one_valued_p (tree);
 extern tree truth_type_for (tree);
 extern tree build_pointer_type_for_mode (tree, machine_mode, bool);
 extern tree build_pointer_type (tree);