[x86] PR target/31985: Improve memory operand use with doubleword add.
Checks
Commit Message
This patch addresses the last remaining issue with PR target/31985, that
GCC could make better use of memory addressing modes when implementing
double word addition. This is achieved by adding a define_insn_and_split
that combines an *add<dwi>3_doubleword with a *concat<mode><dwi>3, so
that the components of the concat can be used directly, without first
being loaded into a double word register.
For test_c in the bugzilla PR:
Before:
pushl %ebx
subl $16, %esp
movl 28(%esp), %eax
movl 36(%esp), %ecx
movl 32(%esp), %ebx
movl 24(%esp), %edx
addl %ecx, %eax
adcl %ebx, %edx
movl %eax, 8(%esp)
movl %edx, 12(%esp)
addl $16, %esp
popl %ebx
ret
After:
test_c:
subl $20, %esp
movl 36(%esp), %eax
movl 32(%esp), %edx
addl 28(%esp), %eax
adcl 24(%esp), %edx
movl %eax, 8(%esp)
movl %edx, 12(%esp)
addl $20, %esp
ret
If this approach is considered acceptable, similar splitters can be
used for other doubleword operations.
This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
and make -k check, both with and without --target_board=unix{-m32}
with no new failures. Ok for mainline?
2023-06-07 Roger Sayle <roger@nextmovesoftware.com>
gcc/ChangeLog
PR target/31985
* config/i386/i386.md (*add<dwi>3_doubleword_concat): New
define_insn_and_split combine *add<dwi>3_doubleword with a
*concat<mode><dwi>3 for more efficient lowering after reload.
gcc/testsuite/ChangeLog
PR target/31985
* gcc.target/i386/pr31985.c: New test case.
Roger
--
Comments
On Wed, Jun 7, 2023 at 1:05 AM Roger Sayle <roger@nextmovesoftware.com> wrote:
>
>
> This patch addresses the last remaining issue with PR target/31985, that
> GCC could make better use of memory addressing modes when implementing
> double word addition. This is achieved by adding a define_insn_and_split
> that combines an *add<dwi>3_doubleword with a *concat<mode><dwi>3, so
> that the components of the concat can be used directly, without first
> being loaded into a double word register.
>
> For test_c in the bugzilla PR:
>
> Before:
> pushl %ebx
> subl $16, %esp
> movl 28(%esp), %eax
> movl 36(%esp), %ecx
> movl 32(%esp), %ebx
> movl 24(%esp), %edx
> addl %ecx, %eax
> adcl %ebx, %edx
> movl %eax, 8(%esp)
> movl %edx, 12(%esp)
> addl $16, %esp
> popl %ebx
> ret
>
> After:
> test_c:
> subl $20, %esp
> movl 36(%esp), %eax
> movl 32(%esp), %edx
> addl 28(%esp), %eax
> adcl 24(%esp), %edx
> movl %eax, 8(%esp)
> movl %edx, 12(%esp)
> addl $20, %esp
> ret
>
>
> If this approach is considered acceptable, similar splitters can be
> used for other doubleword operations.
Yes, the approach looks good to me. But please take care not to
interfere with STV transformations.
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check, both with and without --target_board=unix{-m32}
> with no new failures. Ok for mainline?
>
> 2023-06-07 Roger Sayle <roger@nextmovesoftware.com>
>
> gcc/ChangeLog
> PR target/31985
> * config/i386/i386.md (*add<dwi>3_doubleword_concat): New
> define_insn_and_split combine *add<dwi>3_doubleword with a
> *concat<mode><dwi>3 for more efficient lowering after reload.
>
> gcc/testsuite/ChangeLog
> PR target/31985
> * gcc.target/i386/pr31985.c: New test case.
OK.
Thanks,
Uros.
>
>
> Roger
> --
>
On Wed, Jun 7, 2023 at 8:32 AM Uros Bizjak <ubizjak@gmail.com> wrote:
>
> On Wed, Jun 7, 2023 at 1:05 AM Roger Sayle <roger@nextmovesoftware.com> wrote:
> >
> >
> > This patch addresses the last remaining issue with PR target/31985, that
> > GCC could make better use of memory addressing modes when implementing
> > double word addition. This is achieved by adding a define_insn_and_split
> > that combines an *add<dwi>3_doubleword with a *concat<mode><dwi>3, so
> > that the components of the concat can be used directly, without first
> > being loaded into a double word register.
> >
> > For test_c in the bugzilla PR:
> >
> > Before:
> > pushl %ebx
> > subl $16, %esp
> > movl 28(%esp), %eax
> > movl 36(%esp), %ecx
> > movl 32(%esp), %ebx
> > movl 24(%esp), %edx
> > addl %ecx, %eax
> > adcl %ebx, %edx
> > movl %eax, 8(%esp)
> > movl %edx, 12(%esp)
> > addl $16, %esp
> > popl %ebx
> > ret
> >
> > After:
> > test_c:
> > subl $20, %esp
> > movl 36(%esp), %eax
> > movl 32(%esp), %edx
> > addl 28(%esp), %eax
> > adcl 24(%esp), %edx
> > movl %eax, 8(%esp)
> > movl %edx, 12(%esp)
> > addl $20, %esp
> > ret
> >
> >
> > If this approach is considered acceptable, similar splitters can be
> > used for other doubleword operations.
>
> Yes, the approach looks good to me. But please take care not to
> interfere with STV transformations.
Hm, I have one second thought:
+(define_insn_and_split "*add<dwi>3_doubleword_concat"
+ [(set (match_operand:<DWI> 0 "register_operand" "=r")
+ (plus:<DWI>
+ (any_or_plus:<DWI>
+ (ashift:<DWI>
+ (zero_extend:<DWI>
+ (match_operand:DWIH 2 "nonimmediate_operand" "rm"))
+ (match_operand:<DWI> 3 "const_int_operand"))
+ (zero_extend:<DWI>
+ (match_operand:DWIH 4 "nonimmediate_operand" "rm")))
+ (match_operand:<DWI> 1 "register_operand" "0")))
+ (clobber (reg:CC FLAGS_REG))]
+ "INTVAL (operands[3]) == <MODE_SIZE> * BITS_PER_UNIT"
+ "#"
+ "&& reload_completed"
+ [(parallel [(set (reg:CCC FLAGS_REG)
+ (compare:CCC
+ (plus:DWIH (match_dup 1) (match_dup 4))
+ (match_dup 1)))
+ (set (match_dup 0)
+ (plus:DWIH (match_dup 1) (match_dup 4)))])
+ (parallel [(set (match_dup 5)
+ (plus:DWIH
+ (plus:DWIH
+ (ltu:DWIH (reg:CC FLAGS_REG) (const_int 0))
+ (match_dup 6))
+ (match_dup 2)))
+ (clobber (reg:CC FLAGS_REG))])]
+ "split_double_mode (<DWI>mode, &operands[0], 2, &operands[0], &operands[5]);")
The register allocator considers the instruction-to-be-split as one
instruction, so it can allocate output register to match an input
register (or a register that forms an input address), So, you have to
either add an early clobber to the output, or somehow prevent output
to clobber registers in the second pattern.
Uros.
>
> > This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> > and make -k check, both with and without --target_board=unix{-m32}
> > with no new failures. Ok for mainline?
> >
> > 2023-06-07 Roger Sayle <roger@nextmovesoftware.com>
> >
> > gcc/ChangeLog
> > PR target/31985
> > * config/i386/i386.md (*add<dwi>3_doubleword_concat): New
> > define_insn_and_split combine *add<dwi>3_doubleword with a
> > *concat<mode><dwi>3 for more efficient lowering after reload.
> >
> > gcc/testsuite/ChangeLog
> > PR target/31985
> > * gcc.target/i386/pr31985.c: New test case.
>
> OK.
>
> Thanks,
> Uros.
>
> >
> >
> > Roger
> > --
> >
Hi Uros,
> On the 7th June 2023, Uros Bizkak wrote:
> The register allocator considers the instruction-to-be-split as one instruction, so it
> can allocate output register to match an input register (or a register that forms an
> input address), So, you have to either add an early clobber to the output, or
> somehow prevent output to clobber registers in the second pattern.
This implements your suggestion of adding an early clobber to the output, a
one character ('&') change from the previous version of this patch. Retested
with make bootstrap and make -k check, with and without -m32, to confirm
there are no issues, and this still fixes the pr31985.c test case.
As you've suggested, I'm also working on improving STV in this area.
Ok for mainline?
2023-06-15 Roger Sayle <roger@nextmovesoftware.com>
Uros Bizjak <ubizjak@gmail.com>
gcc/ChangeLog
PR target/31985
* config/i386/i386.md (*add<dwi>3_doubleword_concat): New
define_insn_and_split combine *add<dwi>3_doubleword with a
*concat<mode><dwi>3 for more efficient lowering after reload.
gcc/testsuite/ChangeLog
PR target/31985
* gcc.target/i386/pr31985.c: New test case.
Roger
--
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index e6ebc46..42c302d 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -6124,6 +6124,36 @@
(clobber (reg:CC FLAGS_REG))])]
"split_double_mode (<DWI>mode, &operands[0], 2, &operands[0], &operands[3]);")
+(define_insn_and_split "*add<dwi>3_doubleword_concat"
+ [(set (match_operand:<DWI> 0 "register_operand" "=&r")
+ (plus:<DWI>
+ (any_or_plus:<DWI>
+ (ashift:<DWI>
+ (zero_extend:<DWI>
+ (match_operand:DWIH 2 "nonimmediate_operand" "rm"))
+ (match_operand:<DWI> 3 "const_int_operand"))
+ (zero_extend:<DWI>
+ (match_operand:DWIH 4 "nonimmediate_operand" "rm")))
+ (match_operand:<DWI> 1 "register_operand" "0")))
+ (clobber (reg:CC FLAGS_REG))]
+ "INTVAL (operands[3]) == <MODE_SIZE> * BITS_PER_UNIT"
+ "#"
+ "&& reload_completed"
+ [(parallel [(set (reg:CCC FLAGS_REG)
+ (compare:CCC
+ (plus:DWIH (match_dup 1) (match_dup 4))
+ (match_dup 1)))
+ (set (match_dup 0)
+ (plus:DWIH (match_dup 1) (match_dup 4)))])
+ (parallel [(set (match_dup 5)
+ (plus:DWIH
+ (plus:DWIH
+ (ltu:DWIH (reg:CC FLAGS_REG) (const_int 0))
+ (match_dup 6))
+ (match_dup 2)))
+ (clobber (reg:CC FLAGS_REG))])]
+ "split_double_mode (<DWI>mode, &operands[0], 2, &operands[0], &operands[5]);")
+
(define_insn "*add<mode>_1"
[(set (match_operand:SWI48 0 "nonimmediate_operand" "=rm,r,r,r")
(plus:SWI48
diff --git a/gcc/testsuite/gcc.target/i386/pr31985.c b/gcc/testsuite/gcc.target/i386/pr31985.c
new file mode 100644
index 0000000..a6de1b5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr31985.c
@@ -0,0 +1,14 @@
+/* { dg-do compile { target ia32 } } */
+/* { dg-options "-O2" } */
+
+void test_c (unsigned int a, unsigned int b, unsigned int c, unsigned int d)
+{
+ volatile unsigned int x, y;
+ unsigned long long __a = b | ((unsigned long long)a << 32);
+ unsigned long long __b = d | ((unsigned long long)c << 32);
+ unsigned long long __c = __a + __b;
+ x = (unsigned int)(__c & 0xffffffff);
+ y = (unsigned int)(__c >> 32);
+}
+
+/* { dg-final { scan-assembler-times "movl" 4 } } */
On Fri, Jun 16, 2023 at 12:04 AM Roger Sayle <roger@nextmovesoftware.com> wrote:
>
>
> Hi Uros,
>
> > On the 7th June 2023, Uros Bizkak wrote:
> > The register allocator considers the instruction-to-be-split as one instruction, so it
> > can allocate output register to match an input register (or a register that forms an
> > input address), So, you have to either add an early clobber to the output, or
> > somehow prevent output to clobber registers in the second pattern.
>
> This implements your suggestion of adding an early clobber to the output, a
> one character ('&') change from the previous version of this patch. Retested
> with make bootstrap and make -k check, with and without -m32, to confirm
> there are no issues, and this still fixes the pr31985.c test case.
>
> As you've suggested, I'm also working on improving STV in this area.
>
> Ok for mainline?
>
>
> 2023-06-15 Roger Sayle <roger@nextmovesoftware.com>
> Uros Bizjak <ubizjak@gmail.com>
>
> gcc/ChangeLog
> PR target/31985
> * config/i386/i386.md (*add<dwi>3_doubleword_concat): New
> define_insn_and_split combine *add<dwi>3_doubleword with a
> *concat<mode><dwi>3 for more efficient lowering after reload.
>
> gcc/testsuite/ChangeLog
> PR target/31985
> * gcc.target/i386/pr31985.c: New test case.
OK with a small change below.
Thanks,
Uros.
+(define_insn_and_split "*add<dwi>3_doubleword_concat"
+ [(set (match_operand:<DWI> 0 "register_operand" "=&r")
+ (plus:<DWI>
+ (any_or_plus:<DWI>
+ (ashift:<DWI>
+ (zero_extend:<DWI>
+ (match_operand:DWIH 2 "nonimmediate_operand" "rm"))
+ (match_operand:<DWI> 3 "const_int_operand"))
The above mode should be QImode, all shifts have QImode on x86.
+ (zero_extend:<DWI>
+ (match_operand:DWIH 4 "nonimmediate_operand" "rm")))
+ (match_operand:<DWI> 1 "register_operand" "0")))
+ (clobber (reg:CC FLAGS_REG))]
+ "INTVAL (operands[3]) == <MODE_SIZE> * BITS_PER_UNIT"
+ "#"
+ "&& reload_completed"
+ [(parallel [(set (reg:CCC FLAGS_REG)
+ (compare:CCC
+ (plus:DWIH (match_dup 1) (match_dup 4))
+ (match_dup 1)))
+ (set (match_dup 0)
+ (plus:DWIH (match_dup 1) (match_dup 4)))])
+ (parallel [(set (match_dup 5)
+ (plus:DWIH
+ (plus:DWIH
+ (ltu:DWIH (reg:CC FLAGS_REG) (const_int 0))
+ (match_dup 6))
+ (match_dup 2)))
+ (clobber (reg:CC FLAGS_REG))])]
+ "split_double_mode (<DWI>mode, &operands[0], 2, &operands[0], &operands[5]);")
+
> Roger
> --
>
@@ -6124,6 +6124,36 @@
(clobber (reg:CC FLAGS_REG))])]
"split_double_mode (<DWI>mode, &operands[0], 2, &operands[0], &operands[3]);")
+(define_insn_and_split "*add<dwi>3_doubleword_concat"
+ [(set (match_operand:<DWI> 0 "register_operand" "=r")
+ (plus:<DWI>
+ (any_or_plus:<DWI>
+ (ashift:<DWI>
+ (zero_extend:<DWI>
+ (match_operand:DWIH 2 "nonimmediate_operand" "rm"))
+ (match_operand:<DWI> 3 "const_int_operand"))
+ (zero_extend:<DWI>
+ (match_operand:DWIH 4 "nonimmediate_operand" "rm")))
+ (match_operand:<DWI> 1 "register_operand" "0")))
+ (clobber (reg:CC FLAGS_REG))]
+ "INTVAL (operands[3]) == <MODE_SIZE> * BITS_PER_UNIT"
+ "#"
+ "&& reload_completed"
+ [(parallel [(set (reg:CCC FLAGS_REG)
+ (compare:CCC
+ (plus:DWIH (match_dup 1) (match_dup 4))
+ (match_dup 1)))
+ (set (match_dup 0)
+ (plus:DWIH (match_dup 1) (match_dup 4)))])
+ (parallel [(set (match_dup 5)
+ (plus:DWIH
+ (plus:DWIH
+ (ltu:DWIH (reg:CC FLAGS_REG) (const_int 0))
+ (match_dup 6))
+ (match_dup 2)))
+ (clobber (reg:CC FLAGS_REG))])]
+ "split_double_mode (<DWI>mode, &operands[0], 2, &operands[0], &operands[5]);")
+
(define_insn "*add<mode>_1"
[(set (match_operand:SWI48 0 "nonimmediate_operand" "=rm,r,r,r")
(plus:SWI48
new file mode 100644
@@ -0,0 +1,14 @@
+/* { dg-do compile { target ia32 } } */
+/* { dg-options "-O2" } */
+
+void test_c (unsigned int a, unsigned int b, unsigned int c, unsigned int d)
+{
+ volatile unsigned int x, y;
+ unsigned long long __a = b | ((unsigned long long)a << 32);
+ unsigned long long __b = d | ((unsigned long long)c << 32);
+ unsigned long long __c = __a + __b;
+ x = (unsigned int)(__c & 0xffffffff);
+ y = (unsigned int)(__c >> 32);
+}
+
+/* { dg-final { scan-assembler-times "movl" 4 } } */