[2/2] allow constant splitter run in split1 pass

Message ID 20220901032407.23751-1-guojiufu@linux.ibm.com
State New, archived
Headers
Series [1/2] Using pli(paddi) and rotate to build 64bit constants |

Commit Message

Jiufu Guo Sept. 1, 2022, 3:24 a.m. UTC
  Hi,

Currently, these two splitters (touched in this patch) are using predicate
`int_reg_operand_not_pseudo`, then they work in split2 pass after RA in
most times, and can not run before RA.

It would not be a bad idea to allow these splitters before RA.  Then more
passes (between split1 and split2) could optimize the emitted instructions.
And if splitting before RA, for these constant splitters, we may have more
freedom to create pseduo to generate more parallel instructions.

For the example in the leading patch [PATCH 1/2]: pli+plit+rldimi would be
better than pli+sldi+paddi.

Test this patch with spec, we could see performance gain some times; while
the improvement is not stable and woud caused by the patch indirectly.

This patch pass boostrap and regtest on ppc64le(includes p10).
Is it ok for trunk?  Thanks for comments!


BR,
Jeff(Jiufu)

gcc/ChangeLog:

	* config/rs6000/rs6000.md (const splitter): Update predicate.

---
 gcc/config/rs6000/rs6000.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
  

Comments

Segher Boessenkool Sept. 1, 2022, 10:33 p.m. UTC | #1
Hi!

On Thu, Sep 01, 2022 at 11:24:07AM +0800, Jiufu Guo wrote:
> Currently, these two splitters (touched in this patch) are using predicate
> `int_reg_operand_not_pseudo`, then they work in split2 pass after RA in
> most times, and can not run before RA.
> 
> It would not be a bad idea to allow these splitters before RA.  Then more
> passes (between split1 and split2) could optimize the emitted instructions.

The splitters can be used earlier even.  For example, often combine will
use them.

> And if splitting before RA, for these constant splitters, we may have more
> freedom to create pseduo to generate more parallel instructions.
> 
> For the example in the leading patch [PATCH 1/2]: pli+plit+rldimi would be
> better than pli+sldi+paddi.

Yes.  If you split after reload you have to do all local optimisations
(that would have been done in earlier passes) manually.  And all more
global ones (involving just one or two more insns already) are
essentially impossible to do.

Splitting after reload is necessary in some cases.  For example, all the
integer "dot" insns split to the base insn and an explicit compare, if
for some reason RA did not get cr0 here.  Importantly, this happens very
seldomly: RA knows it is two insns instead of one, and it chooses
accordingly.  Also it *has* to be after reload, it directly depends on
what RA chose to do.

Splitting dependent on if a VSR or a GPR (pair) was used is a losing
proposition.  It usually costs much more than it can gain.

> gcc/ChangeLog:
> 
> 	* config/rs6000/rs6000.md (const splitter): Update predicate.

	* config/rs6000/rs6000.md (splitter for set to and_mask constants):
	Use int_reg_operand (instead of int_reg_operand_not_pseudo).
	(splitter for multi-insn constant loads): Ditto.

You should mention the changed to *both* splitters.  For nameless
splitters it helps if you can describe it a bit.  This is hard, yes :-/

Okay for trunk like that.  Thanks!


Segher
  
Jiufu Guo Sept. 2, 2022, 3:35 a.m. UTC | #2
Segher Boessenkool <segher@kernel.crashing.org> writes:

> Hi!
>
> On Thu, Sep 01, 2022 at 11:24:07AM +0800, Jiufu Guo wrote:
>> Currently, these two splitters (touched in this patch) are using predicate
>> `int_reg_operand_not_pseudo`, then they work in split2 pass after RA in
>> most times, and can not run before RA.
>> 
>> It would not be a bad idea to allow these splitters before RA.  Then more
>> passes (between split1 and split2) could optimize the emitted instructions.
>
> The splitters can be used earlier even.  For example, often combine will
> use them.
Oh, yes! 
>
>> And if splitting before RA, for these constant splitters, we may have more
>> freedom to create pseduo to generate more parallel instructions.
>> 
>> For the example in the leading patch [PATCH 1/2]: pli+plit+rldimi would be
>> better than pli+sldi+paddi.
>
> Yes.  If you split after reload you have to do all local optimisations
> (that would have been done in earlier passes) manually.  And all more
> global ones (involving just one or two more insns already) are
> essentially impossible to do.
>
> Splitting after reload is necessary in some cases.  For example, all the
> integer "dot" insns split to the base insn and an explicit compare, if
> for some reason RA did not get cr0 here.  Importantly, this happens very
> seldomly: RA knows it is two insns instead of one, and it chooses
> accordingly.  Also it *has* to be after reload, it directly depends on
> what RA chose to do.
>
> Splitting dependent on if a VSR or a GPR (pair) was used is a losing
> proposition.  It usually costs much more than it can gain.
>
Thanks for your detailed explain which makes more reasonable!

>> gcc/ChangeLog:
>> 
>> 	* config/rs6000/rs6000.md (const splitter): Update predicate.
>
> 	* config/rs6000/rs6000.md (splitter for set to and_mask constants):
> 	Use int_reg_operand (instead of int_reg_operand_not_pseudo).
> 	(splitter for multi-insn constant loads): Ditto.
>
> You should mention the changed to *both* splitters.  For nameless
> splitters it helps if you can describe it a bit.  This is hard, yes :-/
>
Thanks for your always helpful comments!

BR,
Jeff(Jiufu)

> Okay for trunk like that.  Thanks!
>
>
> Segher
  

Patch

diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index 1367a2cb779..ec39676f628 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -9641,7 +9641,7 @@  (define_insn "*movdi_internal64"
 ; Some DImode loads are best done as a load of -1 followed by a mask
 ; instruction.
 (define_split
-  [(set (match_operand:DI 0 "int_reg_operand_not_pseudo")
+  [(set (match_operand:DI 0 "int_reg_operand")
 	(match_operand:DI 1 "const_int_operand"))]
   "TARGET_POWERPC64
    && num_insns_constant (operands[1], DImode) > 1
@@ -9659,7 +9659,7 @@  (define_split
 ;; When non-easy constants can go in the TOC, this should use
 ;; easy_fp_constant predicate.
 (define_split
-  [(set (match_operand:DI 0 "int_reg_operand_not_pseudo")
+  [(set (match_operand:DI 0 "int_reg_operand")
 	(match_operand:DI 1 "const_int_operand"))]
   "TARGET_POWERPC64 && num_insns_constant (operands[1], DImode) > 1"
   [(set (match_dup 0) (match_dup 2))