[x86] Fix FAIL of gcc.target/i386/pr91681-1.c
Commit Message
The recent change in TImode parameter passing on x86_64 results in the
FAIL of pr91681-1.c. The issue is that with the extra flexibility,
the combine pass is now spoilt for choice between using either the
*add<dwi>3_doubleword_concat or the *add<dwi>3_doubleword_zext
patterns, when one operand is a *concat and the other is a zero_extend.
The solution proposed below is to provide an *add<dwi>3_doubleword_concat_zext
define_insn_and_split, that can benefit both from the register allocation
of *concat, and still avoid the xor normally required by zero extension.
I'm investigating a follow-up refinement to improve register allocation
further by avoiding the early clobber in the =&r, and handling (custom)
reloads explicitly, but this piece resolves the testcase failure.
This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
and make -k check, both with and without --target_board=unix{-m32}
with no new failures. Ok for mainline?
2023-07-11 Roger Sayle <roger@nextmovesoftware.com>
gcc/ChangeLog
PR target/91681
* config/i386/i386.md (*add<dwi>3_doubleword_concat_zext): New
define_insn_and_split derived from *add<dwi>3_doubleword_concat
and *add<dwi>3_doubleword_zext.
Thanks,
Roger
--
Comments
On Tue, Jul 11, 2023 at 10:07 PM Roger Sayle <roger@nextmovesoftware.com> wrote:
>
>
> The recent change in TImode parameter passing on x86_64 results in the
> FAIL of pr91681-1.c. The issue is that with the extra flexibility,
> the combine pass is now spoilt for choice between using either the
> *add<dwi>3_doubleword_concat or the *add<dwi>3_doubleword_zext
> patterns, when one operand is a *concat and the other is a zero_extend.
> The solution proposed below is to provide an *add<dwi>3_doubleword_concat_zext
> define_insn_and_split, that can benefit both from the register allocation
> of *concat, and still avoid the xor normally required by zero extension.
>
> I'm investigating a follow-up refinement to improve register allocation
> further by avoiding the early clobber in the =&r, and handling (custom)
> reloads explicitly, but this piece resolves the testcase failure.
>
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check, both with and without --target_board=unix{-m32}
> with no new failures. Ok for mainline?
>
>
> 2023-07-11 Roger Sayle <roger@nextmovesoftware.com>
>
> gcc/ChangeLog
> PR target/91681
> * config/i386/i386.md (*add<dwi>3_doubleword_concat_zext): New
> define_insn_and_split derived from *add<dwi>3_doubleword_concat
> and *add<dwi>3_doubleword_zext.
OK.
Thanks,
Uros.
>
>
> Thanks,
> Roger
> --
>
Hi Roger,
This commit currently changed the codegen of testcase p443644-2.c from:
movq %rdx, %rax
xorl %edx, %edx
addq %rdi, %rax
adcq %rsi, %rdx
to:
movq %rdx, %rcx
movq %rdi, %rax
movq %rsi, %rdx
addq %rcx, %rax
adcq $0, %rdx
which causes the testcase to fail under -m64.
Is this within your expectation?
BRs,
Haochen
>
>
> Thanks,
> Roger
> --
> -----Original Message-----
> From: Jiang, Haochen
> Sent: Friday, July 14, 2023 10:50 AM
> To: Roger Sayle <roger@nextmovesoftware.com>; gcc-patches@gcc.gnu.org
> Cc: 'Uros Bizjak' <ubizjak@gmail.com>
> Subject: RE: [x86 PATCH] Fix FAIL of gcc.target/i386/pr91681-1.c
>
> Hi Roger,
>
> This commit currently changed the codegen of testcase p443644-2.c from:
Oops, a typo, I mean pr43644-2.c.
Haochen
>
> movq %rdx, %rax
> xorl %edx, %edx
> addq %rdi, %rax
> adcq %rsi, %rdx
> to:
>
> movq %rdx, %rcx
> movq %rdi, %rax
> movq %rsi, %rdx
> addq %rcx, %rax
> adcq $0, %rdx
>
> which causes the testcase fail under -m64.
>
> Is this within your expectation?
>
> BRs,
> Haochen
>
> >
> >
> > Thanks,
> > Roger
> > --
> From: Jiang, Haochen <haochen.jiang@intel.com>
> Sent: 17 July 2023 02:50
>
> > From: Jiang, Haochen
> > Sent: Friday, July 14, 2023 10:50 AM
> >
> > Hi Roger,
> >
> > This commit currently changed the codegen of testcase p443644-2.c from:
>
> Oops, a typo, I mean pr43644-2.c.
>
> Haochen
I'm working on a fix and hope to have this resolved soon (unfortunately
fixing things in a post-reload splitter isn't working out due to reload's
choices, so the solution will likely be a peephole2).
The problem is that pr91681-1.c and pr43644-2.c can't both PASS (as written)!
The operation x = y + 0, can be generated as either "mov y,x; add $0,x" or as
"xor x,x; add y,x". pr91681-1.c checks there isn't an xor, pr43644-2.c checks
there isn't a mov. Doh! As the author of both these test cases, I've painted
myself into a corner.
The solution is that add $0,x should be generated (optimal) when y is already
in x, and "xor x,x; add y,x" used otherwise (as this is shorter than
"mov y,x; add $0,x", both sequences being approximately equal performance-wise).
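For readers following along, here is a rough C model of the double-word addition being discussed. It is only a sketch: the u128_parts struct and the add_zext and check_add_zext names are made up for illustration and appear nowhere in the patch. It shows why the high word of the sum only has to absorb the carry when the second operand is zero-extended, which is the job either "adc $0" or the xor-based sequence performs.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a double-word register pair (hi:lo). */
typedef struct { uint64_t lo, hi; } u128_parts;

/* Add a zero-extended 64-bit y to the pair x.
   The high half of zext(y) is zero, so the high word of the result
   is just x.hi plus the carry out of the low-word addition.  */
static u128_parts add_zext(u128_parts x, uint64_t y)
{
    u128_parts r;
    r.lo = x.lo + y;               /* low-word add, may wrap around */
    uint64_t carry = r.lo < x.lo;  /* wrapped iff the sum is below an input */
    r.hi = x.hi + carry;           /* high word: x.hi + 0 + carry */
    return r;
}

/* Small self-check: a carry out of the low word must reach the high word. */
void check_add_zext(void)
{
    u128_parts x = { UINT64_MAX, 1 };  /* lo = all ones, hi = 1 */
    u128_parts r = add_zext(x, 1);
    assert(r.lo == 0);
    assert(r.hi == 2);
}
```

Whether the compiler copies x.hi into the destination and adds the carry, or zeroes the destination and adds x.hi with carry, the arithmetic result is the same; the two testcases only disagree about which instruction sequence they want to see.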
> > movq %rdx, %rax
> > xorl %edx, %edx
> > addq %rdi, %rax
> > adcq %rsi, %rdx
> > to:
> > movq %rdx, %rcx
> > movq %rdi, %rax
> > movq %rsi, %rdx
> > addq %rcx, %rax
> > adcq $0, %rdx
> >
> > which causes the testcase fail under -m64.
> > Is this within your expectation?
You're right that the original (using xor) is better for pr43644-2.c's test case.
unsigned __int128 foo(unsigned __int128 x, unsigned long long y) { return x+y; }
but the closely related (swapping the argument order):
unsigned __int128 bar(unsigned long long y, unsigned __int128 x) { return x+y; }
is better using "adcq $0", than having a superfluous xor.
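A self-contained way to try both variants is sketched below. The make_u128 helper and the check_foo_bar function are illustrative additions, not part of either testcase, and the code assumes a 64-bit target where GCC provides unsigned __int128. Compiling this with -O2 -S should let you compare the sequences emitted for foo and bar; the exact output will depend on the compiler revision.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative helper (not from the thread): build a 128-bit value
   from two 64-bit halves, i.e. (hi << 64) | lo.  */
static unsigned __int128 make_u128(uint64_t hi, uint64_t lo)
{
    return ((unsigned __int128)hi << 64) | lo;
}

/* The two functions quoted above; only the argument order differs,
   which changes the registers the incoming values arrive in.  */
unsigned __int128 foo(unsigned __int128 x, unsigned long long y) { return x + y; }
unsigned __int128 bar(unsigned long long y, unsigned __int128 x) { return x + y; }

/* Both orders must give the same sum, including a carry into the high word. */
void check_foo_bar(void)
{
    unsigned __int128 x = make_u128(1, UINT64_MAX);
    assert(foo(x, 1) == make_u128(2, 0));
    assert(bar(1, x) == make_u128(2, 0));
}
```

The functions are semantically identical, so any difference in the generated assembly comes from where the x86_64 calling convention places x and y on entry.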
Executive summary: This FAIL isn't serious. I'll silence it soon.
> > BRs,
> > Haochen
> >
> > >
> > >
> > > Thanks,
> > > Roger
> > > --
@@ -6222,6 +6222,39 @@
(clobber (reg:CC FLAGS_REG))])]
"split_double_mode (<DWI>mode, &operands[0], 2, &operands[0], &operands[5]);")
+(define_insn_and_split "*add<dwi>3_doubleword_concat_zext"
+ [(set (match_operand:<DWI> 0 "register_operand" "=&r")
+ (plus:<DWI>
+ (any_or_plus:<DWI>
+ (ashift:<DWI>
+ (zero_extend:<DWI>
+ (match_operand:DWIH 2 "nonimmediate_operand" "rm"))
+ (match_operand:QI 3 "const_int_operand"))
+ (zero_extend:<DWI>
+ (match_operand:DWIH 4 "nonimmediate_operand" "rm")))
+ (zero_extend:<DWI>
+ (match_operand:DWIH 1 "nonimmediate_operand" "rm"))))
+ (clobber (reg:CC FLAGS_REG))]
+ "INTVAL (operands[3]) == <MODE_SIZE> * BITS_PER_UNIT"
+ "#"
+ "&& reload_completed"
+ [(set (match_dup 0) (match_dup 4))
+ (set (match_dup 5) (match_dup 2))
+ (parallel [(set (reg:CCC FLAGS_REG)
+ (compare:CCC
+ (plus:DWIH (match_dup 0) (match_dup 1))
+ (match_dup 0)))
+ (set (match_dup 0)
+ (plus:DWIH (match_dup 0) (match_dup 1)))])
+ (parallel [(set (match_dup 5)
+ (plus:DWIH
+ (plus:DWIH
+ (ltu:DWIH (reg:CC FLAGS_REG) (const_int 0))
+ (match_dup 5))
+ (const_int 0)))
+ (clobber (reg:CC FLAGS_REG))])]
+ "split_double_mode (<DWI>mode, &operands[0], 1, &operands[0], &operands[5]);")
+
(define_insn "*add<mode>_1"
[(set (match_operand:SWI48 0 "nonimmediate_operand" "=rm,r,r,r")
(plus:SWI48