[x86_64] Introduce insvti_highpart define_insn_and_split.
Checks
Commit Message
This patch adds a convenient post-reload splitter for setting/updating
the highpart of a TImode variable, using the recently added
split_double_concat infrastructure.
For the new test case below:
__int128 foo(__int128 x, unsigned long long y)
{
__int128 t = (__int128)y << 64;
__int128 r = (x & ~0ull) | t;
return r;
}
mainline GCC with -O2 currently generates:
foo: movq %rdi, %rcx
xorl %eax, %eax
xorl %edi, %edi
orq %rcx, %rax
orq %rdi, %rdx
ret
with this patch, GCC instead now generates the much better:
foo: movq %rdi, %rcx
movq %rcx, %rax
ret
[which interestingly shows the deeper (ABI) issue that I'm trying
to fix, this new define_insn_and_split being the next piece].
It turns out that the -m32 equivalent of this testcase, already
avoids using explict orl/xor instructions, as it gets optimized
(in combine) by a completely different path. Given that this idiom
isn't seen in 32-bit code (and therefore this pattern wouldn't match),
and also that the shorter 32-bit AND bitmask is represented as a
CONST_INT rather than a CONST_WIDE_INT, this new define_insn_and_split
is implemented for just TARGET_64BIT rather than contort a "generic"
implementation using DWI mode iterators.
This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
and make -k check, both with and without --target_board=unix{-m32},
with no new failures. Ok for mainline? Please let me know if you'd
prefer a different pattern name [insv seemed better than mov].
2023-01-05 Roger Sayle <roger@nextmovesoftware.com>
gcc/ChangeLog
* config/i386/i386.md (any_or_plus): Move definition earlier.
(*insvti_highpart_1): New define_insn_and_split to overwrite
(insv) the highpart of a TImode register/memory.
gcc/testsuite/ChangeLog
* gcc.target/i386/insvti_highpart-1.c: New test case.
Thanks in advance,
Roger
--
Comments
On Thu, Jan 5, 2023 at 3:10 PM Roger Sayle <roger@nextmovesoftware.com> wrote:
>
>
> This patch adds a convenient post-reload splitter for setting/updating
> the highpart of a TImode variable, using the recently added
> split_double_concat infrastructure.
>
> For the new test case below:
>
> __int128 foo(__int128 x, unsigned long long y)
> {
> __int128 t = (__int128)y << 64;
> __int128 r = (x & ~0ull) | t;
> return r;
> }
>
> mainline GCC with -O2 currently generates:
>
> foo: movq %rdi, %rcx
> xorl %eax, %eax
> xorl %edi, %edi
> orq %rcx, %rax
> orq %rdi, %rdx
> ret
>
> with this patch, GCC instead now generates the much better:
>
> foo: movq %rdi, %rcx
> movq %rcx, %rax
> ret
>
> [which interestingly shows the deeper (ABI) issue that I'm trying
> to fix, this new define_insn_and_split being the next piece].
>
> It turns out that the -m32 equivalent of this testcase, already
> avoids using explict orl/xor instructions, as it gets optimized
> (in combine) by a completely different path. Given that this idiom
> isn't seen in 32-bit code (and therefore this pattern wouldn't match),
> and also that the shorter 32-bit AND bitmask is represented as a
> CONST_INT rather than a CONST_WIDE_INT, this new define_insn_and_split
> is implemented for just TARGET_64BIT rather than contort a "generic"
> implementation using DWI mode iterators.
>
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check, both with and without --target_board=unix{-m32},
> with no new failures. Ok for mainline? Please let me know if you'd
> prefer a different pattern name [insv seemed better than mov].
>
>
> 2023-01-05 Roger Sayle <roger@nextmovesoftware.com>
>
> gcc/ChangeLog
> * config/i386/i386.md (any_or_plus): Move definition earlier.
> (*insvti_highpart_1): New define_insn_and_split to overwrite
> (insv) the highpart of a TImode register/memory.
>
> gcc/testsuite/ChangeLog
> * gcc.target/i386/insvti_highpart-1.c: New test case.
LGTM, but the new pattern looks complex enough to make me a bit
nervous at this development stage. If the patch just extended the
existing SI/DI mode pattern to also handle DI/TI case for 64-bit
targets, I would approve it without hesitation, but since 128 bit
types are not that common ATM, I'd rather postpone introduction of
complex new pattern to the next stage 1.
Target specific part of the compiler does have some more freedom to
introduce new functionality during "general bugfixing" development
stage, so simple patches that extend existing patterns to handle new
modes or constraints should be OK, but let's not stretch this rule too
much.
To follow with the example, your recent extendditi2 splitter patch was
OK even at this development stage, because it just extended existing
patterns and existing approach to also handle 64bit instructions (that
are actually same as 32bit ones).
Uros.
@@ -3401,6 +3401,31 @@
"mov{b}\t{%h1, %h0|%h0, %h1}"
[(set_attr "type" "imov")
(set_attr "mode" "QI")])
+
+(define_code_iterator any_or_plus [plus ior xor])
+(define_insn_and_split "*insvti_highpart_1"
+ [(set (match_operand:TI 0 "nonimmediate_operand" "=ro,r,r,&r")
+ (any_or_plus:TI
+ (and:TI
+ (match_operand:TI 1 "nonimmediate_operand" "r,m,r,m")
+ (match_operand:TI 3 "const_scalar_int_operand" "n,n,n,n"))
+ (ashift:TI
+ (zero_extend:TI
+ (match_operand:DI 2 "nonimmediate_operand" "r,r,m,m"))
+ (const_int 64))))]
+ "TARGET_64BIT
+ && CONST_WIDE_INT_P (operands[3])
+ && CONST_WIDE_INT_NUNITS (operands[3]) == 2
+ && CONST_WIDE_INT_ELT (operands[3], 0) == -1
+ && CONST_WIDE_INT_ELT (operands[3], 1) == 0"
+ "#"
+ "&& reload_completed"
+ [(clobber (const_int 0))]
+{
+ operands[4] = gen_lowpart (DImode, operands[1]);
+ split_double_concat (TImode, operands[0], operands[4], operands[2]);
+ DONE;
+})
;; Floating point push instructions.
@@ -11418,7 +11443,6 @@
(set_attr "mode" "QI")])
;; Split DST = (HI<<32)|LO early to minimize register usage.
-(define_code_iterator any_or_plus [plus ior xor])
(define_insn_and_split "*concat<mode><dwi>3_1"
[(set (match_operand:<DWI> 0 "nonimmediate_operand" "=ro,r")
(any_or_plus:<DWI>
new file mode 100644
@@ -0,0 +1,12 @@
+/* { dg-do compile { target int128 } } */
+/* { dg-options "-O2" } */
+
+__int128 foo(__int128 x, unsigned long long y)
+{
+ __int128 t = (__int128)y << 64;
+ __int128 r = (x & ~0ull) | t;
+ return r;
+}
+
+/* { dg-final { scan-assembler-not "xorl" } } */
+/* { dg-final { scan-assembler-not "orq" } } */