Don't call emit_clobber in lower-subreg.cc's resolve_simple_move.

Message ID 009901d9801a$57573ba0$0605b2e0$@nextmovesoftware.com
State Accepted
Series Don't call emit_clobber in lower-subreg.cc's resolve_simple_move.

Checks

Context                 Check    Description
snail/gcc-patch-check   success  Github commit url

Commit Message

Roger Sayle May 6, 2023, 12:57 p.m. UTC
  Following up on posts/reviews by Segher and Uros, there's some question
over why the middle-end's lower subreg pass emits a clobber (of a
multi-word register) into the instruction stream before emitting the
sequence of moves of the word-sized parts.  This clobber interferes
with (LRA) register allocation, preventing the multi-word pseudo from
remaining in the same hard registers.  This patch eliminates this
(presumably superfluous) clobber and thereby improves register allocation.

A concrete example of the observed improvement is PR target/43644.
For the test case:
__int128 foo(__int128 x, __int128 y) { return x+y; }

on x86_64-pc-linux-gnu, gcc -O2 currently generates:

foo:    movq    %rsi, %rax
        movq    %rdi, %r8
        movq    %rax, %rdi
        movq    %rdx, %rax
        movq    %rcx, %rdx
        addq    %r8, %rax
        adcq    %rdi, %rdx
        ret

with this patch, we now generate the much improved:

foo:    movq    %rdx, %rax
        movq    %rcx, %rdx
        addq    %rdi, %rax
        adcq    %rsi, %rdx
        ret
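
For readers less familiar with double-word arithmetic, here is a minimal sketch in C of what the improved sequence computes: the low words are added first (the addq), and the carry out of that addition is folded into the sum of the high words (the adcq). The type `u128_parts` and the function `add_by_parts` are hypothetical names used only for illustration; they are not part of GCC or of the patch.

```c
#include <stdint.h>

/* Hypothetical two-word view of a 128-bit value (illustration only). */
typedef struct {
    uint64_t lo;   /* least significant 64 bits */
    uint64_t hi;   /* most significant 64 bits */
} u128_parts;

/* Add two 128-bit values one word at a time. */
static u128_parts add_by_parts(u128_parts x, u128_parts y)
{
    u128_parts r;
    uint64_t carry;

    r.lo = x.lo + y.lo;           /* corresponds to addq */
    carry = (r.lo < x.lo);        /* unsigned wrap-around means carry out */
    r.hi = x.hi + y.hi + carry;   /* corresponds to adcq */
    return r;
}
```

Because each result word depends only on the matching input words (plus the carry), the two halves of the multi-word pseudo can in principle stay in the hard registers they arrived in, which is the allocation the improved sequence reflects.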

This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
and make -k check, both with and without --target_board=unix{-m32} with
no new failures.  OK for mainline?


2023-05-06  Roger Sayle  <roger@nextmovesoftware.com>

gcc/ChangeLog
        PR target/43644
        * lower-subreg.cc (resolve_simple_move): Don't emit a clobber
        immediately before moving a multi-word register by parts.

gcc/testsuite/ChangeLog
        PR target/43644
        * gcc.target/i386/pr43644.c: New test case.


Thanks in advance,
Roger
--
  

Comments

Jeff Law May 6, 2023, 6:46 p.m. UTC | #1
On 5/6/23 06:57, Roger Sayle wrote:
> 
> Following up on posts/reviews by Segher and Uros, there's some question
> over why the middle-end's lower subreg pass emits a clobber (of a
> multi-word register) into the instruction stream before emitting the
> sequence of moves of the word-sized parts.  This clobber interferes
> with (LRA) register allocation, preventing the multi-word pseudo from
> remaining in the same hard registers.  This patch eliminates this
> (presumably superfluous) clobber and thereby improves register allocation.
Those clobbers used to help dataflow analysis know that a multi word 
register was fully assigned by a subsequent sequence.  I suspect they 
haven't been terribly useful in quite a while.
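
The dataflow point can be illustrated with a toy model. This is a rough sketch only, assuming liveness is tracked with a single bit per pseudo (whole-register granularity); it is not GCC's df or flow.c code, and the names `toy_insn` and `toy_live_on_entry` are hypothetical. Under that assumption a word-sized store is only a partial definition, so it cannot end the register's live range on its own, whereas a clobber can.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy instruction kinds acting on one multi-word pseudo (illustration only). */
enum toy_insn {
    TOY_CLOBBER,    /* like (clobber (reg)): the whole register dies here */
    TOY_SET_WORD,   /* like (set (subreg (reg) N) ...): writes one word only */
    TOY_USE         /* reads the whole register */
};

/* Backward liveness scan with one bit for the whole register.
   Returns true if the register is live on entry to the sequence. */
static bool toy_live_on_entry(const enum toy_insn *seq, size_t n)
{
    bool live = false;   /* assume the register is dead after the last insn */

    for (size_t i = n; i-- > 0; ) {
        switch (seq[i]) {
        case TOY_USE:
            live = true;     /* a read makes the register live above it */
            break;
        case TOY_CLOBBER:
            live = false;    /* a full kill ends the live range */
            break;
        case TOY_SET_WORD:
            break;           /* partial definition: does not kill */
        }
    }
    return live;
}
```

In this model the sequence clobber, set word 0, set word 1, use leaves the register dead on entry, while the same sequence without the clobber leaves it live on entry, which would extend its lifetime upwards. Whether current GCC dataflow still needs that hint is exactly the open question in this thread.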


> 
> A concrete example of the observed improvement is PR target/43644.
> For the test case:
> __int128 foo(__int128 x, __int128 y) { return x+y; }
> 
> on x86_64-pc-linux-gnu, gcc -O2 currently generates:
> 
> foo:    movq    %rsi, %rax
>          movq    %rdi, %r8
>          movq    %rax, %rdi
>          movq    %rdx, %rax
>          movq    %rcx, %rdx
>          addq    %r8, %rax
>          adcq    %rdi, %rdx
>          ret
> 
> with this patch, we now generate the much improved:
> 
> foo:    movq    %rdx, %rax
>          movq    %rcx, %rdx
>          addq    %rdi, %rax
>          adcq    %rsi, %rdx
>          ret
> 
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check, both with and without --target_board=unix{-m32} with
> no new failures.  OK for mainline?
> 
> 
> 2023-05-06  Roger Sayle  <roger@nextmovesoftware.com>
> 
> gcc/ChangeLog
>          PR target/43644
>          * lower-subreg.cc (resolve_simple_move): Don't emit a clobber
>          immediately before moving a multi-word register by parts.
> 
> gcc/testsuite/ChangeLog
>          PR target/43644
>          * gcc.target/i386/pr43644.c: New test case.
OK for the trunk.  I won't be at all surprised to see fallout in the 
various target tests.  We can fault in fixes as needed.  More 
importantly I think we want as much soak time for this change as we can 
in case there are unexpected consequences.

jeff
  
Richard Biener May 8, 2023, 6:43 a.m. UTC | #2
On Sat, May 6, 2023 at 8:46 PM Jeff Law via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
>
>
> On 5/6/23 06:57, Roger Sayle wrote:
> >
> > Following up on posts/reviews by Segher and Uros, there's some question
> > over why the middle-end's lower subreg pass emits a clobber (of a
> > multi-word register) into the instruction stream before emitting the
> > sequence of moves of the word-sized parts.  This clobber interferes
> > with (LRA) register allocation, preventing the multi-word pseudo from
> > remaining in the same hard registers.  This patch eliminates this
> > (presumably superfluous) clobber and thereby improves register allocation.
> Those clobbers used to help dataflow analysis know that a multi word
> register was fully assigned by a subsequent sequence.  I suspect they
> haven't been terribly useful in quite a while.

Likely - maybe they still make a difference for some targets though.
It might be interesting to see whether combining the clobber with the
first set or making the set a multi-set with a parallel would be any
better?

  
Jeff Law May 8, 2023, 10:02 p.m. UTC | #3
On 5/8/23 00:43, Richard Biener wrote:
> On Sat, May 6, 2023 at 8:46 PM Jeff Law via Gcc-patches
> <gcc-patches@gcc.gnu.org> wrote:
>>
>>
>>
>> On 5/6/23 06:57, Roger Sayle wrote:
>>>
>>> Following up on posts/reviews by Segher and Uros, there's some question
>>> over why the middle-end's lower subreg pass emits a clobber (of a
>>> multi-word register) into the instruction stream before emitting the
>>> sequence of moves of the word-sized parts.  This clobber interferes
>>> with (LRA) register allocation, preventing the multi-word pseudo from
>>> remaining in the same hard registers.  This patch eliminates this
>>> (presumably superfluous) clobber and thereby improves register allocation.
>> Those clobbers used to help dataflow analysis know that a multi word
>> register was fully assigned by a subsequent sequence.  I suspect they
>> haven't been terribly useful in quite a while.
> 
> Likely - maybe they still make a difference for some targets though.
> It might be interesting to see whether combining the clobber with the
> first set or making the set a multi-set with a parallel would be any
> better?
Wrapping them inside a PARALLEL might be better, but probably isn't 
worth the effort.  I think all this stuff dates back to the era where we 
had flow.c to provide the register lifetimes used by local-alloc.  We 
also had things like REG_NO_CONFLICT to indicate that the sub-object 
assignments didn't conflict.  In all it was rather hackish.

Jeff
  

Patch

diff --git a/gcc/lower-subreg.cc b/gcc/lower-subreg.cc
index 81fc5380..7c9cc3c 100644
--- a/gcc/lower-subreg.cc
+++ b/gcc/lower-subreg.cc
@@ -1086,9 +1086,6 @@  resolve_simple_move (rtx set, rtx_insn *insn)
     {
       unsigned int i;
 
-      if (REG_P (dest) && !HARD_REGISTER_NUM_P (REGNO (dest)))
-	emit_clobber (dest);
-
       for (i = 0; i < words; ++i)
 	{
 	  rtx t = simplify_gen_subreg_concatn (word_mode, dest,
diff --git a/gcc/testsuite/gcc.target/i386/pr43644.c b/gcc/testsuite/gcc.target/i386/pr43644.c
new file mode 100644
index 0000000..ffdf31c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr43644.c
@@ -0,0 +1,11 @@ 
+/* { dg-do compile { target int128 } } */
+/* { dg-options "-O2" } */
+
+__int128 foo(__int128 x, __int128 y)
+{
+  return x+y;
+}
+
+/* { dg-final { scan-assembler-times "movq" 2 } } */
+/* { dg-final { scan-assembler-not "push" } } */
+/* { dg-final { scan-assembler-not "pop" } } */
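
As a sanity check on the dg-final directives in the new test, the sketch below counts occurrences of "movq" in the two assembly listings quoted in the commit message. It is an approximation under stated assumptions: the real scan-assembler-times directive matches a regular expression against the compiler's actual output, whereas this uses a plain substring search over hand-copied text. The helper `count_occurrences` and the string constants are hypothetical names for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Count non-overlapping occurrences of needle in text.
   Plain substring match, not a regular expression. */
static size_t count_occurrences(const char *text, const char *needle)
{
    size_t count = 0;
    size_t len = strlen(needle);

    if (len == 0)
        return 0;
    for (const char *p = strstr(text, needle); p != NULL;
         p = strstr(p + len, needle))
        ++count;
    return count;
}

/* Listing quoted in the commit message as the output before the patch. */
static const char asm_before[] =
    "movq %rsi, %rax\n"
    "movq %rdi, %r8\n"
    "movq %rax, %rdi\n"
    "movq %rdx, %rax\n"
    "movq %rcx, %rdx\n"
    "addq %r8, %rax\n"
    "adcq %rdi, %rdx\n"
    "ret\n";

/* Listing quoted in the commit message as the output after the patch. */
static const char asm_after[] =
    "movq %rdx, %rax\n"
    "movq %rcx, %rdx\n"
    "addq %rdi, %rax\n"
    "adcq %rsi, %rdx\n"
    "ret\n";
```

Counting "movq" gives five for the old listing and two for the new one, so the `scan-assembler-times "movq" 2` check would fail on the old output and pass on the new; neither listing contains "push" or "pop".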