[powerpc] Add a peephole2 to eliminate redundant move from VSX_REGS to GENERAL_REGS when it's from memory.

Message ID 20230504055446.1675940-1-hongtao.liu@intel.com
State Accepted
Headers
Series [powerpc] Add a peephole2 to eliminate redundant move from VSX_REGS to GENERAL_REGS when it's from memory. |

Checks

Context Check Description
snail/gcc-patch-check success Github commit url

Commit Message

liuhongt May 4, 2023, 5:54 a.m. UTC
  r14-172-g0368d169492017 use NO_REGS instead of GENERAL_REGS in memory cost
calculation when preferred register class is unkown.
+      /* Costs for NO_REGS are used in cost calculation on the
+        1st pass when the preferred register classes are not
+        known yet.  In this case we take the best scenario.  */

It regressed gcc.target/powerpc/dform-3.c which has inline asm explicitly
put a vector mode into a general register, then create an extra move.
RA doesn't allocate GENERAL_REGS for it because the backend pattern
explicitly disparage the alternative (<??r>, r), (??r, Y) which moves
from GENERAL_REGS/MEM to GENERAL_REGS.

(define_insn "vsx_mov<mode>_64bit"
  [(set (match_operand:VSX_M 0 "nonimmediate_operand"
               "=ZwO,      wa,        wa,        r,         we,        ?wQ,
                ?&r,       ??r,       ??Y,       <??r>,     wa,        v,
                wa,        wa,
                ?wa,       v,         <??r>,     wZ,        v")

	(match_operand:VSX_M 1 "input_operand" 
               "wa,        ZwO,       wa,        we,        r,         r,
                wQ,        Y,         r,         r,         wE,        jwM,
                eQ,        eP,
                ?jwM,      W,         <nW>,      v,         wZ"))]

  "TARGET_POWERPC64 && VECTOR_MEM_VSX_P (<MODE>mode)
   && (register_operand (operands[0], <MODE>mode) 
       || register_operand (operands[1], <MODE>mode))"
{
  return rs6000_output_move_128bit (operands);
}

Normally the extra move can be eliminated by pass_reload when src and
dest has same reg_class, but for that case, src and dest have
different reg_classes.

The patch adds a peephole2 to eliminate the extra move.

Bootstrapped and regtested on powerpc64le-linux-gnu.
Ok for trunk?


gcc/ChangeLog:

	PR target/109610
	* config/rs6000/vsx.md (define_peephole2): New peephole2 to
	catch memory loads to VSX_REGS and then moves to GENERAL_REGS.
---
 gcc/config/rs6000/vsx.md | 10 ++++++++++
 1 file changed, 10 insertions(+)
  

Comments

Segher Boessenkool May 15, 2023, 8:37 a.m. UTC | #1
On Thu, May 04, 2023 at 01:54:46PM +0800, liuhongt wrote:
> r14-172-g0368d169492017 use NO_REGS instead of GENERAL_REGS in memory cost
> calculation when preferred register class is unkown.
> +      /* Costs for NO_REGS are used in cost calculation on the
> +        1st pass when the preferred register classes are not
> +        known yet.  In this case we take the best scenario.  */
> 
> It regressed gcc.target/powerpc/dform-3.c which has inline asm explicitly
> put a vector mode into a general register, then create an extra move.
> RA doesn't allocate GENERAL_REGS for it because the backend pattern
> explicitly disparage the alternative (<??r>, r), (??r, Y) which moves
> from GENERAL_REGS/MEM to GENERAL_REGS.

And that is correct and wanted.

> Normally the extra move can be eliminated by pass_reload

Which is completely beside the point: reload is not in the business of
making good code.  Instead, it should make reasonable code when the
good code we attempted to make did not work out.  Think of it like a
last resort register allocation fixup.

> The patch adds a peephole2 to eliminate the extra move.

Nope.  Not okay.  This is not what peepholes are for *at all*, and this
path leads to insanity and millionfold maintenance cost.

Peepholes are *only*, I repeat *only*, for situations where we did a
*correct* cost estimation but due to some target detail we want to
fine-tune the generated code a bit.

If you want a peephole you most likely have a fundamental cost function
problem elsewhere.  Fix *that*, that is actually useful, and won't get
you into hotter water.

> Ok for trunk?

Not okay, no.  Please fix the actual bug?  Revert the previous patch,
for example :-(

> +;; Peephole to catch memory loads to VSX_REG and then moves to GENERAL_REGS.
> +(define_peephole2
> +  [(set (match_operand:VSX_M 0 "vsx_register_operand")
> +	(match_operand:VSX_M 1 "memory_operand"))
> +   (set (match_operand:VSX_M 2 "int_reg_operand")
> +	(match_dup 0))]
> +  "TARGET_POWERPC64 && VECTOR_MEM_VSX_P (<MODE>mode)
> +  && peep2_reg_dead_p (2, operands[0])"
> +  [(set (match_dup 2) (match_dup 1))])

The condition does not make sense, even assuming the peephole does (it
does not).  Why would you care if the compiler is allowed to generate
64-bit insns here?

The formatting is messed up as well.


Segher
  

Patch

diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index 7d845df5c2d..a0808ccff9a 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -1075,6 +1075,16 @@  (define_peephole2
    && peep2_reg_dead_p (2, operands[0])"
    [(set (match_dup 2) (match_dup 1))])
 
+;; Peephole to catch memory loads to VSX_REG and then moves to GENERAL_REGS.
+(define_peephole2
+  [(set (match_operand:VSX_M 0 "vsx_register_operand")
+	(match_operand:VSX_M 1 "memory_operand"))
+   (set (match_operand:VSX_M 2 "int_reg_operand")
+	(match_dup 0))]
+  "TARGET_POWERPC64 && VECTOR_MEM_VSX_P (<MODE>mode)
+  && peep2_reg_dead_p (2, operands[0])"
+  [(set (match_dup 2) (match_dup 1))])
+
 ;; Peephole to catch memory to memory transfers for TImode if TImode landed in
 ;; VSX registers on a little endian system.  The vector types and IEEE 128-bit
 ;; floating point are handled by the more generic swap elimination pass.