Don't call emit_clobber in lower-subreg.cc's resolve_simple_move.
Commit Message
Following up on posts/reviews by Segher and Uros, there's some question
over why the middle-end's lower subreg pass emits a clobber (of a
multi-word register) into the instruction stream before emitting the
sequence of moves of the word-sized parts. This clobber interferes
with (LRA) register allocation, preventing the multi-word pseudo from
remaining in the same hard registers. This patch eliminates this
(presumably superfluous) clobber and thereby improves register allocation.
A concrete example of the observed improvement is PR target/43644.
For the test case:
__int128 foo(__int128 x, __int128 y) { return x+y; }
on x86_64-pc-linux-gnu, gcc -O2 currently generates:
foo:    movq    %rsi, %rax
        movq    %rdi, %r8
        movq    %rax, %rdi
        movq    %rdx, %rax
        movq    %rcx, %rdx
        addq    %r8, %rax
        adcq    %rdi, %rdx
        ret
with this patch, we now generate the much improved:
foo:    movq    %rdx, %rax
        movq    %rcx, %rdx
        addq    %rdi, %rax
        adcq    %rsi, %rdx
        ret
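For readers less familiar with the addq/adcq pair in the listings above, the following is a minimal C sketch of the same word-by-word addition. The struct u128_parts type and the add_by_parts helper are hypothetical names used only for illustration; they are not part of the patch or of GCC.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical two-word view of a 128-bit value (illustration only).  */
struct u128_parts
{
  uint64_t lo;
  uint64_t hi;
};

/* Add two 128-bit values one 64-bit word at a time.  The low-word add
   corresponds to addq; the high-word add, which consumes the carry out
   of the low word, corresponds to adcq.  */
static struct u128_parts
add_by_parts (struct u128_parts x, struct u128_parts y)
{
  struct u128_parts r;
  r.lo = x.lo + y.lo;               /* addq: may wrap around */
  uint64_t carry = r.lo < x.lo;     /* 1 if the low add wrapped */
  r.hi = x.hi + y.hi + carry;       /* adcq: add with carry-in */
  return r;
}
```

Since each half of the result depends only on the incoming halves and the carry, the four argument words can be consumed directly from their incoming registers, which is what the shorter sequence does.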
This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
and make -k check, both with and without --target_board=unix{-m32} with
no new failures. OK for mainline?
2023-05-06 Roger Sayle <roger@nextmovesoftware.com>
gcc/ChangeLog
PR target/43644
* lower-subreg.cc (resolve_simple_move): Don't emit a clobber
immediately before moving a multi-word register by parts.
gcc/testsuite/ChangeLog
PR target/43644
* gcc.target/i386/pr43644.c: New test case.
Thanks in advance,
Roger
Comments
On 5/6/23 06:57, Roger Sayle wrote:
>
> Following up on posts/reviews by Segher and Uros, there's some question
> over why the middle-end's lower subreg pass emits a clobber (of a
> multi-word register) into the instruction stream before emitting the
> sequence of moves of the word-sized parts. This clobber interferes
> with (LRA) register allocation, preventing the multi-word pseudo from
> remaining in the same hard registers. This patch eliminates this
> (presumably superfluous) clobber and thereby improves register allocation.
Those clobbers used to help dataflow analysis know that a multi-word
register was fully assigned by a subsequent sequence. I suspect they
haven't been terribly useful in quite a while.
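As a rough illustration of the point about dataflow, here is a toy C model of a backward liveness scan that tracks a multi-word register only as a whole. It is a sketch under that assumption, not GCC's df or flow.c code, and the toy_kind, toy_insn and live_on_entry names are invented for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy instruction kinds acting on one multi-word pseudo (hypothetical).  */
enum toy_kind
{
  TOY_CLOBBER,   /* clobber of the whole register */
  TOY_SET_WORD,  /* store to one word of the register */
  TOY_USE        /* read of the whole register */
};

struct toy_insn
{
  enum toy_kind kind;
};

/* Backward scan with whole-register granularity.  Returns true if the
   register is live on entry to the sequence.  A word-sized store does
   not kill the register here, because this simplified analysis cannot
   tell that the remaining words are overwritten as well.  */
static bool
live_on_entry (const struct toy_insn *insns, size_t n)
{
  bool live = false;  /* assume the register is dead after the sequence */
  for (size_t i = n; i-- > 0; )
    switch (insns[i].kind)
      {
      case TOY_USE:
	live = true;
	break;
      case TOY_CLOBBER:
	live = false;
	break;
      case TOY_SET_WORD:
	/* Partial definition: liveness is left unchanged.  */
	break;
      }
  return live;
}
```

In this model a sequence of two word stores followed by a use leaves the register live on entry, while the same sequence preceded by a clobber does not. An analysis that tracks each word separately would not need the clobber, which is consistent with the suspicion that it is no longer useful.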
OK for the trunk. I won't be at all surprised to see fallout in the
various target tests. We can fault in fixes as needed. More
importantly I think we want as much soak time for this change as we can
in case there are unexpected consequences.
jeff
On Sat, May 6, 2023 at 8:46 PM Jeff Law via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
> Those clobbers used to help dataflow analysis know that a multi-word
> register was fully assigned by a subsequent sequence. I suspect they
> haven't been terribly useful in quite a while.
Likely - maybe they still make a difference for some targets though.
It might be interesting to see whether combining the clobber with the
first set or making the set a multi-set with a parallel would be any
better?
On 5/8/23 00:43, Richard Biener wrote:
> On Sat, May 6, 2023 at 8:46 PM Jeff Law via Gcc-patches
> <gcc-patches@gcc.gnu.org> wrote:
>> Those clobbers used to help dataflow analysis know that a multi-word
>> register was fully assigned by a subsequent sequence. I suspect they
>> haven't been terribly useful in quite a while.
>
> Likely - maybe they still make a difference for some targets though.
> It might be interesting to see whether combining the clobber with the
> first set or making the set a multi-set with a parallel would be any
> better?
Wrapping them inside a PARALLEL might be better, but probably isn't
worth the effort. I think all this stuff dates back to the era where we
had flow.c to provide the register lifetimes used by local-alloc. We
also had things like REG_NO_CONFLICT to indicate that the sub-object
assignments didn't conflict. In all it was rather hackish.
Jeff
diff --git a/gcc/lower-subreg.cc b/gcc/lower-subreg.cc
--- a/gcc/lower-subreg.cc
+++ b/gcc/lower-subreg.cc
@@ -1086,9 +1086,6 @@ resolve_simple_move (rtx set, rtx_insn *insn)
     {
       unsigned int i;
 
-      if (REG_P (dest) && !HARD_REGISTER_NUM_P (REGNO (dest)))
-        emit_clobber (dest);
-
       for (i = 0; i < words; ++i)
         {
           rtx t = simplify_gen_subreg_concatn (word_mode, dest,
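diff --git a/gcc/testsuite/gcc.target/i386/pr43644.c b/gcc/testsuite/gcc.target/i386/pr43644.c
new file mode 100644
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr43644.c
@@ -0,0 +1,11 @@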
+/* { dg-do compile { target int128 } } */
+/* { dg-options "-O2" } */
+
+__int128 foo(__int128 x, __int128 y)
+{
+  return x+y;
+}
+
+/* { dg-final { scan-assembler-times "movq" 2 } } */
+/* { dg-final { scan-assembler-not "push" } } */
+/* { dg-final { scan-assembler-not "pop" } } */