rs6000: Canonicalize copysign (x, -1) back to -abs (x) in the backend [PR112606]

Message ID ZWHJzN4hJHFSZ28f@tucnak
State Unresolved

Commit Message

Jakub Jelinek Nov. 25, 2023, 10:17 a.m. UTC
  Hi!

The middle-end has been changed quite recently to canonicalize
-abs (x) to copysign (x, -1) rather than the other way around.
While I agree with that at GIMPLE level, since it matches the GIMPLE
goal of as few operations as possible for a canonical form (-abs (x)
is 2 GIMPLE statements, copysign (x, -1) is just one), I must say
I don't really like that being done on RTL as well (or at least
not canonicalizing (COPYSIGN x, negative) back to (NEG (ABS x))),
because on most targets most floating point constants need to be loaded
from memory; there are a few exceptions, but -1 is often not one of them.

Anyway, the following patch fixes the rs6000 regression caused by the
change in GIMPLE canonicalization (i.e. the desirable one).  rs6000
clearly prefers the -abs (x) form: it has a single instruction for it,
and while it also has a copysign instruction, using that requires loading
the -1 from memory.  The patch therefore ensures the copysign expander
can actually see the floating point constant and in that case emits the
-abs (x) code (or, in the hypothetical case of copysign with a non-negative
constant, abs (x); copysign (x, 1) is already canonicalized to abs (x)
in GIMPLE).  Otherwise it forces the operand to be the expected
gpc_reg_operand and does what it did before.
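
For readers who want to convince themselves that the two forms are interchangeable, here is a minimal C sketch (not part of the patch).  The helper names bits_of, via_copysign, via_neg_abs and same_result are made up for illustration, and the comparison assumes IEEE 754 doubles that are 64 bits wide.

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Return the raw bit pattern of a double (assumes a 64-bit IEEE 754 double).  */
static uint64_t bits_of(double x)
{
    uint64_t u;
    assert(sizeof u == sizeof x);
    memcpy(&u, &x, sizeof u);
    return u;
}

/* The form GIMPLE now canonicalizes to: a single operation.  */
static double via_copysign(double x)
{
    return copysign(x, -1.0);
}

/* The form rs6000 prefers: negated absolute value.  */
static double via_neg_abs(double x)
{
    return -fabs(x);
}

/* Nonzero when both forms produce the same bit pattern for x.  */
static int same_result(double x)
{
    return bits_of(via_copysign(x)) == bits_of(via_neg_abs(x));
}
```

Comparing bit patterns rather than values is deliberate: it also distinguishes -0.0 from 0.0, which a plain == would not.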

Bootstrapped/regtested on powerpc64le-linux, ok for trunk?

2023-11-25  Jakub Jelinek  <jakub@redhat.com>

	PR target/112606
	* config/rs6000/rs6000.md (copysign<mode>3): Change predicate
	of the last argument from gpc_reg_operand to any_operand.  If
	operands[2] is CONST_DOUBLE, emit abs or neg abs depending on
	its sign, otherwise if it doesn't satisfy gpc_reg_operand,
	force it to REG using copy_to_mode_reg.


	Jakub
  

Comments

Xi Ruoyao Nov. 25, 2023, 10:42 a.m. UTC | #1
On Sat, 2023-11-25 at 11:17 +0100, Jakub Jelinek wrote:
> The middle-end has been changed quite recently to canonicalize
> -abs (x) to copysign (x, -1) rather than the other way around.
> While I agree with that at GIMPLE level, since it matches the GIMPLE
> goal of as few operations as possible for a canonical form (-abs (x)
> is 2 GIMPLE statements, copysign (x, -1) is just one), I must say
> I don't really like that being done on RTL as well (or at least
> not canonicalizing (COPYSIGN x, negative) back to (NEG (ABS x))),
> because on most targets most of floating point constants need to be loaded
> from memory, there are a few exceptions but -1 is often not one of them.

On LoongArch, fneg+fabs is even slower than loading a -1 from memory and then
doing copysign, for some micro-architectural reason I don't know.  (FWIW, on
LoongArch with LSX the fastest way may be to set the sign bit directly
with the LSX vbitseti instruction.  It will also set the sign bits for
"junk" elements in the high bits of the vector register, but there is no
harm.)
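
The sign-bit idea can be sketched in scalar C as follows.  This is only a model of the operation, not LSX code: set_sign_bit is a hypothetical helper name, and the sketch assumes a 64-bit IEEE 754 double whose sign lives in bit 63.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Scalar model of "set the sign bit directly": OR bit 63 into the
   bit pattern of a double, which yields -|x| for any input.  */
static double set_sign_bit(double x)
{
    uint64_t u;
    assert(sizeof u == sizeof x);
    memcpy(&u, &x, sizeof u);   /* reinterpret without aliasing problems */
    u |= UINT64_C(1) << 63;     /* force the sign bit on */
    memcpy(&x, &u, sizeof x);
    return x;
}
```

Because the operation only ever sets a bit, applying it to lanes whose contents are irrelevant cannot disturb the lane that matters, which is the "no harm" point above.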

Can we make a target hook to control this?
  
Tamar Christina Nov. 25, 2023, 12:03 p.m. UTC | #2
> -----Original Message-----
> From: Xi Ruoyao <xry111@xry111.site>
> Sent: Saturday, November 25, 2023 10:43 AM
> To: Jakub Jelinek <jakub@redhat.com>; Segher Boessenkool
> <segher@kernel.crashing.org>; David Edelsohn <dje.gcc@gmail.com>
> Cc: gcc-patches@gcc.gnu.org; Tamar Christina <Tamar.Christina@arm.com>;
> Andrew Pinski <apinski@marvell.com>
> Subject: Re: [PATCH] rs6000: Canonicalize copysign (x, -1) back to -abs (x) in
> the backend [PR112606]
> 
> On Sat, 2023-11-25 at 11:17 +0100, Jakub Jelinek wrote:
> > The middle-end has been changed quite recently to canonicalize -abs
> > (x) to copysign (x, -1) rather than the other way around.
> > While I agree with that at GIMPLE level, since it matches the GIMPLE
> > goal of as few operations as possible for a canonical form (-abs (x)
> > is 2 GIMPLE statements, copysign (x, -1) is just one), I must say I
> > don't really like that being done on RTL as well (or at least not
> > canonicalizing (COPYSIGN x, negative) back to (NEG (ABS x))), because
> > on most targets most of floating point constants need to be loaded
> > from memory, there are a few exceptions but -1 is often not one of them.
> 
> On LoongArch fneg+fabs is even slower than loading a -1 from mem then do
> copysign for some micro-architectural reason I don't know.  (FWIW on
> LoongArch with LSX, the fastest way may be directly setting the sign bit with
> LSX vbitseti instruction - it will also set the sign bits for "junk" elements in the
> high bits of the vector register but there is no
> harm.)
> 
> Can we make a target hook to control this?

There already is.  I have been looking into this and this is the situation:

For the C99 versions of copysign, expand_COPYSIGN has optimized expansions in place.
One of the hooks there forces it to abs/neg.  There is also code in place for when the target
prefers integer expansion over the floating point one, etc.

There are several problems with it though.  IFN expansions don't go through expand_COPYSIGN,
so copysignf (x, -1.f) and IFN_COPYSIGN (x, -1.f) are not treated the same, even though
operationally they are.

The expansion also doesn't work for vector types, i.e. it's only doing it for types which have a C99
version of copysign.

match.pd has an unofficial "canonicalized" form for integer copysign, and expand_COPYSIGN expands
to a different one.  So most targets deal efficiently with the form match.pd generates, but not with expand_COPYSIGN's.

All the optimizations only happen if the target does not implement the copysign optab.  Once you do, it's all up to you.

So even if we use expand_COPYSIGN for scalar expansions of IFN_COPYSIGN PPC would still need to reject the -1 case
or remove the optab and use combine to form the copysign instruction.

However, the issue here is that IFNs at the moment only support direct expansion.  That is, you need an optab to get the
rewriting done.  So you have a catch-22, unlike the C99 versions, which have a libcall fallback.

I have a patch locally that adds support for non-direct IFN expansions by providing hooks that a target can
implement should they want to handle expansion or control which optimizations happen.  The patch also treats IFN and
C99 copysign the same: since we know we can always lower them to either equivalent integer or fp operations,
we always allow the rewriting.
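
As an illustration of what "equivalent integer operations" means here, the following is a minimal C sketch and not the code from the local patch.  The name copysign_via_integer_ops is hypothetical, and the sketch assumes a 64-bit IEEE 754 double with the sign in bit 63.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Integer lowering of copysign (x, y): keep the magnitude bits of x
   and take only the sign bit from y.  */
static double copysign_via_integer_ops(double x, double y)
{
    const uint64_t sign_mask = UINT64_C(1) << 63;
    uint64_t ux, uy;
    assert(sizeof ux == sizeof x);
    memcpy(&ux, &x, sizeof ux);
    memcpy(&uy, &y, sizeof uy);
    ux = (ux & ~sign_mask) | (uy & sign_mask);
    memcpy(&x, &ux, sizeof x);
    return x;
}
```

When y is a known negative constant, the second operand of the OR folds to the sign mask itself, so the whole thing reduces to a single OR with a constant.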

This allows most targets to just be able to remove the copysign optab implementation and get the same or better code than
before.

It's not a terribly big patch, but I missed the stage 1 deadline and was unsure whether it's suitable for stage 3.  It does fix years of copysign
issues once and for all though.

If maintainers want to see the patch I can finish regtesting and post it next week.

Cheers,
Tamar

> 
> --
> Xi Ruoyao <xry111@xry111.site>
> School of Aerospace Science and Technology, Xidian University
  
Jakub Jelinek Nov. 25, 2023, 12:09 p.m. UTC | #3
On Sat, Nov 25, 2023 at 12:03:56PM +0000, Tamar Christina wrote:
> For the C99 versions of copysign, expand_COPYSIGN has optimized expansions inplace.
> One of the hooks there forces it to abs/neg.  There is also  code in place for if the target
> prefers integer expansion over floating point one etc.

I thought the simplify-rtx.cc canonicalization of copysign (x, negative) to
(neg (abs (x))) was disabled shortly after your change.

> However the issue here is that IFNs at the moment only support direct expansion.  That is, you need an optab to get the
> rewriting done.  So you have a catch 22, unlike the C99 versions which have a libcall fallback.

For POPCOUNT I've introduced recently a way to provide custom expand_*
function and decide there what optimizations to use, even when it otherwise
is an integral unary optab ifn.

	Jakub
  
Tamar Christina Nov. 27, 2023, 7:55 a.m. UTC | #4
> On Sat, Nov 25, 2023 at 12:03:56PM +0000, Tamar Christina wrote:
> > For the C99 versions of copysign, expand_COPYSIGN has optimized
> > expansions inplace.
> > One of the hooks there forces it to abs/neg.  There is also  code in
> > place for if the target prefers integer expansion over floating point one etc.
> 
> I thought the simplify-rtx.cc canonicalization of copysign (x, negative) to (neg
> (abs ()) was disabled shortly after your change.

Yeah, but it was unclear what that code was supposed to do as it was dead code
before.  Tbf the patch you posted is probably the right thing to do for now.

> 
> > However the issue here is that IFNs at the moment only support direct
> > expansion.  That is, you need an optab to get the rewriting done.  So you
> > have a catch 22, unlike the C99 versions which have a libcall fallback.
> 
> For POPCOUNT I've introduced recently a way to provide custom expand_*
> function and decide there what optimizations to use, even when it otherwise
> is an integral unary optab ifn.
> 

Oh that sounds interesting, do you have a commit for me to look at? I couldn't
spot anything obvious in the history.

Cheers,
Tamar

> 	Jakub
  
Jakub Jelinek Nov. 27, 2023, 8:12 a.m. UTC | #5
On Mon, Nov 27, 2023 at 07:55:52AM +0000, Tamar Christina wrote:
> > For POPCOUNT I've introduced recently a way to provide custom expand_*
> > function and decide there what optimizations to use, even when it otherwise
> > is an integral unary optab ifn.
> > 
> 
> Oh that sounds interesting, do you have a commit for me to look at? I couldn't
> Spot anything obvious in the history.

https://gcc.gnu.org/r14-5613

	Jakub
  
Tamar Christina Nov. 27, 2023, 8:22 a.m. UTC | #6
> -----Original Message-----
> From: Jakub Jelinek <jakub@redhat.com>
> Sent: Monday, November 27, 2023 8:13 AM
> To: Tamar Christina <Tamar.Christina@arm.com>
> Cc: Xi Ruoyao <xry111@xry111.site>; Segher Boessenkool
> <segher@kernel.crashing.org>; David Edelsohn <dje.gcc@gmail.com>; gcc-
> patches@gcc.gnu.org; Andrew Pinski <apinski@marvell.com>
> Subject: Re: [PATCH] rs6000: Canonicalize copysign (x, -1) back to -abs (x) in
> the backend [PR112606]
> 
> On Mon, Nov 27, 2023 at 07:55:52AM +0000, Tamar Christina wrote:
> > > For POPCOUNT I've introduced recently a way to provide custom
> > > expand_* function and decide there what optimizations to use, even
> > > when it otherwise is an integral unary optab ifn.
> > >
> >
> > Oh that sounds interesting, do you have a commit for me to look at? I
> > couldn't Spot anything obvious in the history.
> 
> https://gcc.gnu.org/r14-5613

Oh, that's nice! If that's the case a simpler fix could be to let COPYSIGN become
one of these as well, and then just have PPC do a FAIL on the abs and neg cases.

expand_copysign already does the fneg (fabs ()) rewriting if the target rejects the
optab, through expand_copysign_absneg.
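
The control flow being proposed can be modelled in plain C roughly as below.  This is a sketch under stated assumptions, not GCC code: target_copysign, generic_absneg and expand_copysign_model are hypothetical names, a false return stands in for FAIL in a define_expand, and signbit stands in for the sign test on the constant.

```c
#include <assert.h>
#include <math.h>
#include <stdbool.h>

/* Hypothetical stand-in for a target's copysign expander.  It returns
   false (modelling FAIL) when the sign operand is a known constant, so
   that the generic code gets to choose the abs/neg form instead.  */
static bool target_copysign(double x, double y, bool y_is_const, double *out)
{
    assert(out);
    if (y_is_const)
        return false;
    *out = copysign(x, y);
    return true;
}

/* Hypothetical generic fallback: take the absolute value, then negate
   it if the sign source has its sign bit set.  */
static double generic_absneg(double x, double y)
{
    double magnitude = fabs(x);
    return signbit(y) ? -magnitude : magnitude;
}

/* Try the target expander first and fall back when it declines.  */
static double expand_copysign_model(double x, double y, bool y_is_const)
{
    double out;
    if (target_copysign(x, y, y_is_const, &out))
        return out;
    return generic_absneg(x, y);
}
```

The point of this arrangement is that the target only states which cases it does not want, and the choice between abs and neg (abs) stays in one generic place.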

That would also fix the i386 and Arm assembly scan failures and the phi-opts case
when the IFN isn't available.  I can do that if you prefer, since those are on my list
to fix anyway.

Thanks,
Tamar

> 
> 	Jakub
  
Jakub Jelinek Dec. 4, 2023, 8:35 a.m. UTC | #7
Hi!

I'd like to ping this patch.

Thanks

	Jakub
  
Kewen.Lin Dec. 4, 2023, 9:21 a.m. UTC | #8
Hi Jakub,

on 2023/11/25 18:17, Jakub Jelinek wrote:
> Hi!
> 
> The middle-end has been changed quite recently to canonicalize
> -abs (x) to copysign (x, -1) rather than the other way around.
> While I agree with that at GIMPLE level, since it matches the GIMPLE
> goal of as few operations as possible for a canonical form (-abs (x)
> is 2 GIMPLE statements, copysign (x, -1) is just one), I must say
> I don't really like that being done on RTL as well (or at least
> not canonicalizing (COPYSIGN x, negative) back to (NEG (ABS x))),
> because on most targets most of floating point constants need to be loaded
> from memory, there are a few exceptions but -1 is often not one of them.
> 
> Anyway, the following patch fixes the rs6000 regression caused by the
> change in GIMPLE canonicalization (i.e. the desirable one).  As rs6000
> clearly prefers -abs (x) form because it has a single instruction to do
> that while it also has copysign instruction, but that requires loading the
> -1 from memory, the following patch just ensures the copysign expander
> can actually see the floating point constant and in that case emits the
> -abs (x) code (or in the hypothetical case of copysign with non-negative
> constant abs (x) - but there copysign (x, 1) in GIMPLE is canonicalized
> to abs (x)), otherwise forces the operand to be the expected gpc_reg_operand
> and does what it did before.
> 
> Bootstrapped/regtested on powerpc64le-linux, ok for trunk?

Thanks for fixing this!  IIUC, even with Tamar's proposed further improvement
we would still need some rs6000-specific work, so updating this copysign
expansion looks more straightforward.  So okay for trunk, thanks!

BR,
Kewen

  

Patch

--- gcc/config/rs6000/rs6000.md.jj	2023-10-13 19:34:43.927834877 +0200
+++ gcc/config/rs6000/rs6000.md	2023-11-24 18:54:13.587876170 +0100
@@ -5358,7 +5358,7 @@  (define_expand "copysign<mode>3"
    (set (match_dup 4)
 	(neg:SFDF (abs:SFDF (match_dup 1))))
    (set (match_operand:SFDF 0 "gpc_reg_operand")
-        (if_then_else:SFDF (ge (match_operand:SFDF 2 "gpc_reg_operand")
+	(if_then_else:SFDF (ge (match_operand:SFDF 2 "any_operand")
 			       (match_dup 5))
 			 (match_dup 3)
 			 (match_dup 4)))]
@@ -5369,6 +5369,24 @@  (define_expand "copysign<mode>3"
        || TARGET_CMPB
        || VECTOR_UNIT_VSX_P (<MODE>mode))"
 {
+  /* Middle-end canonicalizes -fabs (x) to copysign (x, -1),
+     but PowerPC prefers -fabs (x).  */
+  if (CONST_DOUBLE_AS_FLOAT_P (operands[2]))
+    {
+      if (real_isneg (CONST_DOUBLE_REAL_VALUE (operands[2])))
+	{
+	  operands[3] = gen_reg_rtx (<MODE>mode);
+	  emit_insn (gen_abs<mode>2 (operands[3], operands[1]));
+	  emit_insn (gen_neg<mode>2 (operands[0], operands[3]));
+	}
+      else
+	emit_insn (gen_abs<mode>2 (operands[0], operands[1]));
+      DONE;
+    }
+
+  if (!gpc_reg_operand (operands[2], <MODE>mode))
+    operands[2] = copy_to_mode_reg (<MODE>mode, operands[2]);
+
   if (TARGET_CMPB || VECTOR_UNIT_VSX_P (<MODE>mode))
     {
       emit_insn (gen_copysign<mode>3_fcpsgn (operands[0], operands[1],