[1/2] rs6000: Emit vector fp comparison directly in rs6000_emit_vector_compare

Message ID e73c1320-0738-7645-b0fa-1da62a31ab94@linux.ibm.com
State Accepted

Checks

Context Check Description
snail/gcc-patch-check success Github commit url

Commit Message

Kewen.Lin Nov. 16, 2022, 6:48 a.m. UTC
  Hi,

All kinds of vector float comparison operators are already
supported by one rtl comparison pattern in vector.md, so we can
just emit an rtx comparison insn with the given comparison
operator in function rs6000_emit_vector_compare instead of
checking and handling the reverse condition cases.

This also prepares for a subsequent patch that deals with some
comparison operators with trapping math enabled or disabled,
so it's important to have one centralized place for vector
float comparison handling, for better maintenance.

Bootstrapped and regtested on powerpc64-linux-gnu P7 and P8,
and powerpc64le-linux-gnu P9 and P10.

I'm going to push this later this week if no objections.

BR,
Kewen
-----

gcc/ChangeLog:

	* config/rs6000/rs6000.cc (rs6000_emit_vector_compare_inner): Remove
	float only comparison operators.
	(rs6000_emit_vector_compare): Emit vector comparison insn directly for
	float modes.
---
 gcc/config/rs6000/rs6000.cc | 26 +++++++++++++++-----------
 1 file changed, 15 insertions(+), 11 deletions(-)

--
2.27.0
  

Comments

Segher Boessenkool Nov. 16, 2022, 6:44 p.m. UTC | #1
Hi!

On Wed, Nov 16, 2022 at 02:48:25PM +0800, Kewen.Lin wrote:
> 	* config/rs6000/rs6000.cc (rs6000_emit_vector_compare_inner): Remove
> 	float only comparison operators.

Why?  Is that correct?  Your mail says nothing about this :-(

Is there any testcase that covers this, and that shows things still
generate the same code?


Segher
  
Kewen.Lin Nov. 17, 2022, 6:59 a.m. UTC | #2
Hi Segher,

Thanks for the comments!

on 2022/11/17 02:44, Segher Boessenkool wrote:
> Hi!
> 
> On Wed, Nov 16, 2022 at 02:48:25PM +0800, Kewen.Lin wrote:
>> 	* config/rs6000/rs6000.cc (rs6000_emit_vector_compare_inner): Remove
>> 	float only comparison operators.
> 
> Why?  Is that correct?  Your mail says nothing about this :-(
> 
> Is there any testcase that covers this, and that shows things still
> generate the same code?
> 

Sorry for the unclear description, I mistakenly thought that it was
probably straightforward.

With the change in this patch, all 14 vector float comparison operators
(unordered/ordered/eq/ne/gt/lt/ge/le/ungt/unge/unlt/unle/uneq/ltgt)
would be handled early in rs6000_emit_vector_compare.

For unordered/ordered/ltgt/uneq, the new way is exactly the same
as what we do in rs6000_emit_vector_compare_inner, it means there is
no chance to get into rs6000_emit_vector_compare_inner with any of them.
For eq/ge/gt, it's the same too, but they are shared with vector integer
comparison, I just left them alone here.  Just noticed we can remove ge
safely too as it's guarded with !MODE_VECTOR_INT.

For ne/ungt/unlt/unge/unle, rs6000_emit_vector_compare changes the code
with reverse_condition_maybe_unordered and inverts the result, which is
the same as what we have in vector.md.

; unge(a,b) = ~lt(a,b)
; unle(a,b) = ~gt(a,b)
; ne(a,b)   = ~eq(a,b)
; ungt(a,b) = ~le(a,b)
; unlt(a,b) = ~ge(a,b)

Then eq/ge/gt on the right side would match the cases that were mentioned
above.  So we just need to focus on lt and le then.

For lt, rs6000_emit_vector_compare swaps operands and the operator to gt,
it's the same as what we have in vector.md:

; lt(a,b)   = gt(b,a)

, and further matches the case mentioned above.

As to le, rs6000_emit_vector_compare tries to split it into lt IOR eq,
and further handles lt recursively, that is:
   le = lt(a,b) || eq(a,b)
      = gt(b,a) || eq(a,b)

This is actually worse than what vector.md supports:

; le(a,b)   = ge(b,a)
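
The identities listed above can be checked on scalars with a small C
sketch.  This is only an illustration and not part of the patch: the
helper names (ungt, unge, unlt, unle, count_identity_mismatches) are
made up here, and it relies on the ISO C quiet comparison macros from
<math.h>, compiled without -ffast-math so that NaNs are honored.

```c
#include <math.h>
#include <stdbool.h>
#include <stddef.h>

/* "Unordered or X" predicates, spelled out explicitly (hypothetical helpers).  */
static bool ungt (double a, double b) { return isunordered (a, b) || isgreater (a, b); }
static bool unge (double a, double b) { return isunordered (a, b) || isgreaterequal (a, b); }
static bool unlt (double a, double b) { return isunordered (a, b) || isless (a, b); }
static bool unle (double a, double b) { return isunordered (a, b) || islessequal (a, b); }

/* Return the number of (a, b) pairs for which any identity from the
   thread fails.  The sample values include NaN, infinities and signed
   zeros.  */
int
count_identity_mismatches (void)
{
  const double vals[] = { NAN, -INFINITY, -1.0, -0.0, 0.0, 1.0, INFINITY };
  const size_t n = sizeof vals / sizeof vals[0];
  int bad = 0;

  for (size_t i = 0; i < n; i++)
    for (size_t j = 0; j < n; j++)
      {
        double a = vals[i], b = vals[j];
        bool ok
          = unge (a, b) == !isless (a, b)                 /* unge = ~lt */
            && unle (a, b) == !isgreater (a, b)           /* unle = ~gt */
            && (a != b) == !(a == b)                      /* ne   = ~eq */
            && ungt (a, b) == !islessequal (a, b)         /* ungt = ~le */
            && unlt (a, b) == !isgreaterequal (a, b)      /* unlt = ~ge */
            && (bool) isless (a, b) == (bool) isgreater (b, a)
            && (bool) islessequal (a, b) == (bool) isgreaterequal (b, a)
            /* The old two-instruction split of le: gt(b,a) || eq(a,b).  */
            && (bool) islessequal (a, b) == (isgreater (b, a) || a == b);
        bad += !ok;
      }
  return bad;
}
```

Calling count_identity_mismatches () should find no failing pair, which
matches the claim that le can be emitted as a single ge with swapped
operands instead of gt IOR eq.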

In short, the function rs6000_emit_vector_compare_inner is only called
twice, both times in rs6000_emit_vector_compare, and there is no chance to
enter rs6000_emit_vector_compare_inner with codes unordered/ordered/ltgt/uneq
any more, so I think it's safe to make the change in function
rs6000_emit_vector_compare_inner.  Besides, the proposed way to handle
vector float comparison slightly improves the handling of UNGT and LE.

I constructed a test case, compiled with options -O2 -ftree-vectorize
-fno-vect-cost-model on ppc64le, which goes into the function
rs6000_emit_vector_compare with all 14 vector float comparison codes.
The assembly of most functions doesn't change after this patch,
except for test_UNGT_{float,double} and test_LE_{float,double}.

one example from 
before:

          lxvx 12,3,9
          lxvx 11,4,9
          xvcmpgtsp 0,11,12
          xvcmpeqsp 12,12,11
          xxlor 0,0,12
          xxlandc 0,32,0
          stxvx 0,5,9
          addi 9,9,16
          bdnz .L77

vs. 

after: (good to be unrolled)

          lxvx 0,4,10
          lxvx 12,3,10
          addi 9,10,16
          lxvx 11,3,9
          xvcmpgesp 12,0,12
          lxvx 0,4,9
          xvcmpgesp 0,0,11
          xxlandc 12,32,12
          stxvx 12,5,10
          addi 10,10,32
          xxlandc 0,32,0
          stxvx 0,5,9
          bdnz .L77


===============
$ cat test.h

#define UNORD(a, b) (__builtin_isunordered ((a), (b)))
#define ORD(a, b) (!__builtin_isunordered ((a), (b)))
#define LTGT(a, b) (__builtin_islessgreater ((a), (b)))
#define UNEQ(a, b) (!__builtin_islessgreater ((a), (b)))
#define UNGT(a, b) (!__builtin_islessequal ((a), (b)))
#define UNGE(a, b) (!__builtin_isless ((a), (b)))
#define UNLT(a, b) (!__builtin_isgreaterequal ((a), (b)))
#define UNLE(a, b) (!__builtin_isgreater ((a), (b)))
#define GT(a, b) (((a) > (b)))
#define GE(a, b) (((a) >= (b)))
#define LT(a, b) (((a) < (b)))
#define LE(a, b) (((a) <= (b)))
#define EQ(a, b) (((a) == (b)))
#define NE(a, b) (((a) != (b)))

#define TEST_VECT(NAME, TYPE)                                                  \
  __attribute__ ((noipa)) void test_##NAME##_##TYPE (TYPE *x, TYPE *y,         \
                                                     int *res, int n)          \
  {                                                                            \
    for (int i = 0; i < n; i++)                                                \
      res[i] = NAME (x[i], y[i]);                                              \
  }

===============
$ cat test.c

#include "test.h"

#define TEST(TYPE)                                                             \
  TEST_VECT (UNORD, TYPE)                                                      \
  TEST_VECT (ORD, TYPE)                                                        \
  TEST_VECT (LTGT, TYPE)                                                       \
  TEST_VECT (UNEQ, TYPE)                                                       \
  TEST_VECT (UNGT, TYPE)                                                       \
  TEST_VECT (UNGE, TYPE)                                                       \
  TEST_VECT (UNLT, TYPE)                                                       \
  TEST_VECT (UNLE, TYPE)                                                       \
  TEST_VECT (GT, TYPE)                                                         \
  TEST_VECT (GE, TYPE)                                                         \
  TEST_VECT (LT, TYPE)                                                         \
  TEST_VECT (LE, TYPE)                                                         \
  TEST_VECT (EQ, TYPE)                                                         \
  TEST_VECT (NE, TYPE)

TEST (float)
TEST (double)
===============

Maybe it's good to add one test case with functions test_{UNGT,LE}_{float,double}
and scan the assembly to check that xvcmp{gt,eq}[sd]p is not generated.
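
A minimal sketch of what such a test file might look like, reusing the
macros from test.h above.  The dg- directives are the usual DejaGnu
ones, but the exact options are assumptions: a real test would likely
need additional target selectors or options to make sure VSX is
enabled, and the scan patterns are only a guess at the intended check.

```c
/* { dg-do compile } */
/* { dg-options "-O2 -ftree-vectorize -fno-vect-cost-model" } */

/* Only the two comparisons whose code generation changes.  */
#define UNGT(a, b) (!__builtin_islessequal ((a), (b)))
#define LE(a, b) (((a) <= (b)))

#define TEST_VECT(NAME, TYPE)                                                  \
  __attribute__ ((noipa)) void test_##NAME##_##TYPE (TYPE *x, TYPE *y,         \
                                                     int *res, int n)          \
  {                                                                            \
    for (int i = 0; i < n; i++)                                                \
      res[i] = NAME (x[i], y[i]);                                              \
  }

TEST_VECT (UNGT, float)
TEST_VECT (UNGT, double)
TEST_VECT (LE, float)
TEST_VECT (LE, double)

/* Assumed expectation: only xvcmpge[sd]p remains for these functions.  */
/* { dg-final { scan-assembler-not {\mxvcmpgt[sd]p\M} } } */
/* { dg-final { scan-assembler-not {\mxvcmpeq[sd]p\M} } } */
```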

With the above explanation, does this patch look good to you?

BR,
Kewen
  
Segher Boessenkool Nov. 18, 2022, 3:10 p.m. UTC | #3
Hi!

On Thu, Nov 17, 2022 at 02:59:00PM +0800, Kewen.Lin wrote:
> on 2022/11/17 02:44, Segher Boessenkool wrote:
> > On Wed, Nov 16, 2022 at 02:48:25PM +0800, Kewen.Lin wrote:
> >> 	* config/rs6000/rs6000.cc (rs6000_emit_vector_compare_inner): Remove
> >> 	float only comparison operators.
> > 
> > Why?  Is that correct?  Your mail says nothing about this :-(
> > 
> > Is there any testcase that covers this, and that shows things still
> > generate the same code?
> > 
> 
> Sorry for the unclear description, I thought mistakenly that it's
> probably straightforward.
> 
> With the change in this patch, all 14 vector float comparison operators
> (unordered/ordered/eq/ne/gt/lt/ge/le/ungt/unge/unlt/unle/uneq/ltgt)
> would be handled early in rs6000_emit_vector_compare.
> 
> For unordered/ordered/ltgt/uneq, the new way is exactly the same
> as what we do in rs6000_emit_vector_compare_inner, it means there is
> no chance to get into rs6000_emit_vector_compare_inner with any of them.

Ah!  In that case, please add an assert there.  It helps catch problems,
but much more importantly even, it helps the reader understand what is
going on :-)
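
As a rough, self-contained model of the kind of assert being asked for
(this is not the GCC code: enum cmp_code, float_only_code_p and
inner_can_emit_directly are hypothetical stand-ins for rtx_code and
rs6000_emit_vector_compare_inner, and plain assert stands in for
gcc_assert):

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the rtx codes discussed in this thread.  */
enum cmp_code
{
  CMP_EQ, CMP_GT, CMP_GTU, CMP_GE, CMP_LT, CMP_LE, CMP_NE,
  CMP_ORDERED, CMP_UNORDERED, CMP_UNEQ, CMP_LTGT
};

/* Codes that only make sense for floating-point comparisons.  */
static bool
float_only_code_p (enum cmp_code code)
{
  switch (code)
    {
    case CMP_ORDERED:
    case CMP_UNORDERED:
    case CMP_UNEQ:
    case CMP_LTGT:
      return true;
    default:
      return false;
    }
}

/* Model of the inner helper: return true when CODE can be emitted as
   is, false when the caller has to rewrite the comparison first.  */
bool
inner_can_emit_directly (enum cmp_code code)
{
  /* In this model the caller returns early for float modes, so the
     float-only codes must never arrive here.  The assert documents
     that assumption and catches violations.  */
  assert (!float_only_code_p (code));

  switch (code)
    {
    case CMP_EQ:
    case CMP_GT:
    case CMP_GTU:
      return true;
    default:
      return false;
    }
}
```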

> For eq/ge/gt, it's the same too, but they are shared with vector integer
> comparison, I just left them alone here.  Just noticed we can remove ge
> safely too as it's guarded with !MODE_VECTOR_INT.

ge is nasty for float, it means something different with and without
-ffast-math.  With fast-math, ge means not lt and le means not gt; both can
be done with a simple single condition, no cror needed.  (Compare to ne,
which is the same with and without -ffast-math; that is because it has a
"not" in its definition!)
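
A small scalar C sketch of that difference, assuming IEEE semantics
(that is, compiled without -ffast-math).  The function names are made
up for illustration.  When one operand is a NaN, "ge" and "not lt"
disagree, while "ne" and "not eq" always agree:

```c
#include <math.h>
#include <stdbool.h>

/* ge as written: false when either operand is a NaN.  */
bool ge_direct (double a, double b) { return a >= b; }

/* ge as "not lt": true when either operand is a NaN, so this only
   matches ge_direct if NaNs can be assumed away (as with fast-math).  */
bool ge_as_not_lt (double a, double b) { return !(a < b); }

/* ne and "not eq" agree for every input, NaNs included.  */
bool ne_direct (double a, double b) { return a != b; }
bool ne_as_not_eq (double a, double b) { return !(a == b); }
```

For ordinary operands such as (2.0, 1.0) both ge variants return true;
for (NAN, 1.0) ge_direct returns false and ge_as_not_lt returns true.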

> For ne/ungt/unlt/unge/unle, rs6000_emit_vector_compare changes the code
> with reverse_condition_maybe_unordered and invert the result, it's the
> same as what we have in vector.md.
> 
> ; unge(a,b) = ~lt(a,b)
> ; unle(a,b) = ~gt(a,b)
> ; ne(a,b)   = ~eq(a,b)
> ; ungt(a,b) = ~le(a,b)
> ; unlt(a,b) = ~ge(a,b)

But for these last two do we generate identical code still?  Since
forever we have only used cror here (with CCEQ), not crnor etc. (and will
CCEQ still do the correct thing always then?)

> Then eq/ge/gt on the right side would match the cases that were mentioned
> above.  So we just need to focus on lt and le then.
> 
> For lt, rs6000_emit_vector_compare swaps operands and the operator to gt,
> it's the same as what we have in vector.md:
> 
> ; lt(a,b)   = gt(b,a)
> 
> , and further matches the case mentioned above.
> 
> As to le, rs6000_emit_vector_compare tries to split it into lt IOR eq,
> and further handle lt recursively, that is:
>    le = lt(a,b) || eq(a,b)
>       = gt(b,a) || eq(a,b)
> 
> actually this is worse than what vector.md supports:
> 
> ; le(a,b)   = ge(b,a)
> 
> In short, the function rs6000_emit_vector_compare_inner is only called
> twice in rs6000_emit_vector_compare, there is no chance to enter
> rs6000_emit_vector_compare_inner with codes unordered/ordered/ltgt/uneq
> any more, I think it's safe to make the change in function
> rs6000_emit_vector_compare_inner.  Besides, the proposed way to handle
> vector float comparison can improve slightly for UNGT and LE handlings.

Thanks for the explanation!

Can you do this in multiple steps, which will make it much easier to
review, and to spot the problem if some unexpected problem shows up?

> I constructed a test case, compiled with option -O2 -ftree-vectorize
> -fno-vect-cost-model on ppc64le, which goes into this function
> rs6000_emit_vector_compare with all 14 vector float comparison codes,
> the assembly of most functions doesn't change after this patch,
> excepting for test_UNGT_{float,double} and test_LE_{float,double}.

So this is a separate change, for a separate patch, and the other patches
will show no changes in generated code at all.

> Maybe it's good to add one test case with function test_{UNGT,LE}_{float,double}
> and scan not xvcmp{gt,eq}[sd]p.

In the patch that changes code gen for those, sure :-)


Segher
  
Kewen.Lin Nov. 21, 2022, 2:01 a.m. UTC | #4
Hi Segher,

on 2022/11/18 23:10, Segher Boessenkool wrote:
> Hi!
> 
> On Thu, Nov 17, 2022 at 02:59:00PM +0800, Kewen.Lin wrote:
>> on 2022/11/17 02:44, Segher Boessenkool wrote:
>>> On Wed, Nov 16, 2022 at 02:48:25PM +0800, Kewen.Lin wrote:
>>>> 	* config/rs6000/rs6000.cc (rs6000_emit_vector_compare_inner): Remove
>>>> 	float only comparison operators.
>>>
>>> Why?  Is that correct?  Your mail says nothing about this :-(
>>>
>>> Is there any testcase that covers this, and that shows things still
>>> generate the same code?
>>>
>>
>> Sorry for the unclear description, I thought mistakenly that it's
>> probably straightforward.
>>
>> With the change in this patch, all 14 vector float comparison operators
>> (unordered/ordered/eq/ne/gt/lt/ge/le/ungt/unge/unlt/unle/uneq/ltgt)
>> would be handled early in rs6000_emit_vector_compare.
>>
>> For unordered/ordered/ltgt/uneq, the new way is exactly the same
>> as what we do in rs6000_emit_vector_compare_inner, it means there is
>> no chance to get into rs6000_emit_vector_compare_inner with any of them.
> 
> Ah!  In that case, please add an assert there.  It helps catch problems,
> but much more importantly even, it helps the reader understand what is
> going on :-)

Good idea, will do.

> 
>> For eq/ge/gt, it's the same too, but they are shared with vector integer
>> comparison, I just left them alone here.  Just noticed we can remove ge
>> safely too as it's guarded with !MODE_VECTOR_INT.
> 
> ge is nasty for float, it means something different with and without
> -ffast-math.  With fast-math, ge means not lt and le means not gt; both can
> be done with a simple single condition, no cror needed.  (Compare to ne,
> which is the same with and without -ffast-math; that is because it has a
> "not" in its definition!)
> 

That's true for scalar float comparison, but the context here is vector
comparison: the result of the comparison is still a vector (of booleans),
and we have the corresponding vector comparison instruction for ge, so I
think it should be fine here.
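
To illustrate "the result of comparison is still vector (of boolean)",
here is a small sketch using the GCC vector extensions.  The type and
function names are assumptions made for this example.  Each lane of the
result is -1 (all ones) where the comparison holds and 0 where it does
not, including lanes with a NaN operand:

```c
/* Four-lane float vector and the matching integer mask type
   (GCC/Clang vector extension).  */
typedef float v4sf __attribute__ ((vector_size (16)));
typedef int v4si __attribute__ ((vector_size (16)));

/* Lane-wise ge: each result lane is -1 if a[i] >= b[i], else 0.  */
v4si
vec_ge_mask (v4sf a, v4sf b)
{
  return a >= b;
}
```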

>> For ne/ungt/unlt/unge/unle, rs6000_emit_vector_compare changes the code
>> with reverse_condition_maybe_unordered and invert the result, it's the
>> same as what we have in vector.md.
>>
>> ; unge(a,b) = ~lt(a,b)
>> ; unle(a,b) = ~gt(a,b)
>> ; ne(a,b)   = ~eq(a,b)
>> ; ungt(a,b) = ~le(a,b)
>> ; unlt(a,b) = ~ge(a,b)
> 
> But for these last two do we generate identical code still?  Since
> forever we have only used cror here (with CCEQ), not crnor etc. (and will
> CCEQ still do the correct thing always then?)

For ge (~ge), yes; for le (~le), no, as explained earlier (quoted below).

> 
>> Then eq/ge/gt on the right side would match the cases that were mentioned
>> above.  So we just need to focus on lt and le then.
>>
>> For lt, rs6000_emit_vector_compare swaps operands and the operator to gt,
>> it's the same as what we have in vector.md:
>>
>> ; lt(a,b)   = gt(b,a)
>>
>> , and further matches the case mentioned above.
>>
>> As to le, rs6000_emit_vector_compare tries to split it into lt IOR eq,
>> and further handle lt recursively, that is:
>>    le = lt(a,b) || eq(a,b)
>>       = gt(b,a) || eq(a,b)
>>
>> actually this is worse than what vector.md supports:
>>
>> ; le(a,b)   = ge(b,a)
>>
>> In short, the function rs6000_emit_vector_compare_inner is only called
>> twice in rs6000_emit_vector_compare, there is no chance to enter
>> rs6000_emit_vector_compare_inner with codes unordered/ordered/ltgt/uneq
>> any more, I think it's safe to make the change in function
>> rs6000_emit_vector_compare_inner.  Besides, the proposed way to handle
>> vector float comparison can improve slightly for UNGT and LE handlings.
> 
> Thanks for the explanation!
> 
> Can you do this in multiple steps, which will make it much easier to
> review, and to spot the problem if some unexpected problem shows up?

Sure, I'll try my best to separate it into some steps and show how it
evolves gradually.

> 
>> I constructed a test case, compiled with option -O2 -ftree-vectorize
>> -fno-vect-cost-model on ppc64le, which goes into this function
>> rs6000_emit_vector_compare with all 14 vector float comparison codes,
>> the assembly of most functions doesn't change after this patch,
>> excepting for test_UNGT_{float,double} and test_LE_{float,double}.
> 
> So this is a separate change, for a separate patch, and the other patches
> will show no changes in generated code at all.

Good point, will separate it.

> 
>> Maybe it's good to add one test case with function test_{UNGT,LE}_{float,double}
>> and scan not xvcmp{gt,eq}[sd]p.
> 
> In the patch that changes code gen for those, sure :-)
> 

Thanks for all the comments again.

BR,
Kewen
  
Segher Boessenkool Nov. 27, 2022, 6:16 p.m. UTC | #5
Hi!

Whoops I missed following up to this.

On Mon, Nov 21, 2022 at 10:01:14AM +0800, Kewen.Lin wrote:
> on 2022/11/18 23:10, Segher Boessenkool wrote:
> > ge is nasty for float, it means something different with and without
> > -ffast-math.  With fast-math, ge means not lt and le means not gt; both can
> > be done with a simple single condition, no cror needed.  (Compare to ne,
> > which is the same with and without -ffast-math; that is because it has a
> > "not" in its definition!)
> 
> It's true for scalar float comparison, but the context here is for vector
> comparison, the result of comparison is still vector (of boolean), and we
> have the corresponding vector comparison instruction for ge, so I think it
> should be fine here.

It is fine if all contexts it is used in allow ge insns, sure.  But you
have to make sure that is true; ge still is nasty, it truly means
something different with fast-math (which applies to vector float just
the same as it does to scalar float).

> > Thanks for the explanation!
> > 
> > Can you do this in multiple steps, which will make it much easier to
> > review, and to spot the problem if some unexpected problem shows up?
> 
> Sure, I'll try my best to separate it into some steps and show how it
> evolves gradually.

If you can make the bulk of the series not actually change code
generation, just rearrange and massage the compiler code, that is much
easier to review (and it also helps to spot the problems if there are
regressions, as a bonus).

Cheers,


Segher
  

Patch

diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index 635aced6105..56db12f08a0 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -15660,10 +15660,6 @@  rs6000_emit_vector_compare_inner (enum rtx_code code, rtx op0, rtx op1)
     case EQ:
     case GT:
     case GTU:
-    case ORDERED:
-    case UNORDERED:
-    case UNEQ:
-    case LTGT:
       mask = gen_reg_rtx (mode);
       emit_insn (gen_rtx_SET (mask, gen_rtx_fmt_ee (code, mode, op0, op1)));
       return mask;
@@ -15681,12 +15677,24 @@  rs6000_emit_vector_compare (enum rtx_code rcode,
 			    machine_mode dmode)
 {
   rtx mask;
-  bool swap_operands = false;
-  bool try_again = false;
-
   gcc_assert (VECTOR_UNIT_ALTIVEC_OR_VSX_P (dmode));
   gcc_assert (GET_MODE (op0) == GET_MODE (op1));

+  /* In vector.md, we support all kinds of vector float point
+     comparison operators in a comparison rtl pattern, we can
+     just emit the comparison rtx insn directly here.  Besides,
+     we should have a centralized place to handle the possibility
+     of raising invalid exception.  */
+  if (GET_MODE_CLASS (dmode) == MODE_VECTOR_FLOAT)
+    {
+      mask = gen_reg_rtx (dmode);
+      emit_insn (gen_rtx_SET (mask, gen_rtx_fmt_ee (rcode, dmode, op0, op1)));
+      return mask;
+    }
+
+  bool swap_operands = false;
+  bool try_again = false;
+
   /* See if the comparison works as is.  */
   mask = rs6000_emit_vector_compare_inner (rcode, op0, op1);
   if (mask)
@@ -15705,10 +15713,6 @@  rs6000_emit_vector_compare (enum rtx_code rcode,
       try_again = true;
       break;
     case NE:
-    case UNLE:
-    case UNLT:
-    case UNGE:
-    case UNGT:
       /* Invert condition and try again.
 	 e.g., A != B becomes ~(A==B).  */
       {