[x86_64] PR target/113690: Fix-up MULT REG_EQUAL notes in STV.
Checks
Commit Message
This patch fixes PR target/113690, an ICE-on-valid regression on x86_64
that exhibits with a specific combination of command line options. The
cause is that x86's scalar-to-vector pass converts a chain of instructions
from TImode to V1TImode, but fails to appropriately update the attached
REG_EQUAL note. Given that multiplication isn't supported in V1TImode,
the REG_NOTE handling code wasn't expecting to see a MULT. Easily solved
with additional handling for other binary operators that may potentially
(in future) have an immediate constant as the second operand that needs
handling. For convenience, this code (re)factors the logic to convert
a TImode constant into a V1TImode constant vector into a subroutine and
reuses it.
For the record, STV is actually doing something useful in this strange
testcase, GCC with -O2 -fno-dce -fno-forward-propagate
-fno-split-wide-types
-funroll-loops generates:
foo: movl $v, %eax
pxor %xmm0, %xmm0
movaps %xmm0, 48(%rax)
movaps %xmm0, (%rax)
movaps %xmm0, 16(%rax)
movaps %xmm0, 32(%rax)
ret
With the addition of -mno-stv (to disable the patched code) it gives:
foo: movl $v, %eax
movq $0, 48(%rax)
movq $0, 56(%rax)
movq $0, (%rax)
movq $0, 8(%rax)
movq $0, 16(%rax)
movq $0, 24(%rax)
movq $0, 32(%rax)
movq $0, 40(%rax)
ret
This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
and make -k check, both with and without --target_board=unix{-m32}
with no new failures. Ok for mainline?
2024-02-05 Roger Sayle <roger@nextmovesoftware.com>
gcc/ChangeLog
PR target/113690
* config/i386/i386-features.cc (timode_convert_cst): New helper
function to convert a TImode CONST_SCALAR_INT_P to a V1TImode
CONST_VECTOR.
(timode_scalar_chain::convert_op): Use timode_convert_cst.
(timode_scalar_chain::convert_insn): If a REG_EQUAL note contains
a binary operator where the second operand is an immediate integer
constant, convert it to V1TImode using timode_convert_cst.
Use timode_convert_cst.
gcc/testsuite/ChangeLog
PR target/113690
* gcc.target/i386/pr113690.c: New test case.
Thanks in advance,
Roger
--
Comments
On Mon, Feb 5, 2024 at 1:24 AM Roger Sayle <roger@nextmovesoftware.com> wrote:
>
>
> This patch fixes PR target/113690, an ICE-on-valid regression on x86_64
> that exhibits with a specific combination of command line options. The
> cause is that x86's scalar-to-vector pass converts a chain of instructions
> from TImode to V1TImode, but fails to appropriately update the attached
> REG_EQUAL note. Given that multiplication isn't supported in V1TImode,
> the REG_NOTE handling code wasn't expecting to see a MULT. Easily solved
> with additional handling for other binary operators that may potentially
> (in future) have an immediate constant as the second operand that needs
> handling. For convenience, this code (re)factors the logic to convert
> a TImode constant into a V1TImode constant vector into a subroutine and
> reuses it.
>
> For the record, STV is actually doing something useful in this strange
> testcase, GCC with -O2 -fno-dce -fno-forward-propagate
> -fno-split-wide-types
> -funroll-loops generates:
>
> foo: movl $v, %eax
> pxor %xmm0, %xmm0
> movaps %xmm0, 48(%rax)
> movaps %xmm0, (%rax)
> movaps %xmm0, 16(%rax)
> movaps %xmm0, 32(%rax)
> ret
>
> With the addition of -mno-stv (to disable the patched code) it gives:
>
> foo: movl $v, %eax
> movq $0, 48(%rax)
> movq $0, 56(%rax)
> movq $0, (%rax)
> movq $0, 8(%rax)
> movq $0, 16(%rax)
> movq $0, 24(%rax)
> movq $0, 32(%rax)
> movq $0, 40(%rax)
> ret
>
>
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check, both with and without --target_board=unix{-m32}
> with no new failures. Ok for mainline?
>
>
> 2024-02-05 Roger Sayle <roger@nextmovesoftware.com>
>
> gcc/ChangeLog
> PR target/113690
> * config/i386/i386-features.cc (timode_convert_cst): New helper
> function to convert a TImode CONST_SCALAR_INT_P to a V1TImode
> CONST_VECTOR.
> (timode_scalar_chain::convert_op): Use timode_convert_cst.
> (timode_scalar_chain::convert_insn): If a REG_EQUAL note contains
> a binary operator where the second operand is an immediate integer
> constant, convert it to V1TImode using timode_convert_cst.
> Use timode_convert_cst.
>
> gcc/testsuite/ChangeLog
> PR target/113690
> * gcc.target/i386/pr113690.c: New test case.
OK.
Thanks,
Uros.
>
>
> Thanks in advance,
> Roger
> --
>
On Mon, Feb 5, 2024 at 9:06 AM Uros Bizjak <ubizjak@gmail.com> wrote:
>
> On Mon, Feb 5, 2024 at 1:24 AM Roger Sayle <roger@nextmovesoftware.com> wrote:
> >
> >
> > This patch fixes PR target/113690, an ICE-on-valid regression on x86_64
> > that exhibits with a specific combination of command line options. The
> > cause is that x86's scalar-to-vector pass converts a chain of instructions
> > from TImode to V1TImode, but fails to appropriately update the attached
> > REG_EQUAL note. Given that multiplication isn't supported in V1TImode,
> > the REG_NOTE handling code wasn't expecting to see a MULT. Easily solved
> > with additional handling for other binary operators that may potentially
> > (in future) have an immediate constant as the second operand that needs
> > handling. For convenience, this code (re)factors the logic to convert
> > a TImode constant into a V1TImode constant vector into a subroutine and
> > reuses it.
> >
> > For the record, STV is actually doing something useful in this strange
> > testcase, GCC with -O2 -fno-dce -fno-forward-propagate
> > -fno-split-wide-types
> > -funroll-loops generates:
> >
> > foo: movl $v, %eax
> > pxor %xmm0, %xmm0
> > movaps %xmm0, 48(%rax)
> > movaps %xmm0, (%rax)
> > movaps %xmm0, 16(%rax)
> > movaps %xmm0, 32(%rax)
> > ret
> >
> > With the addition of -mno-stv (to disable the patched code) it gives:
> >
> > foo: movl $v, %eax
> > movq $0, 48(%rax)
> > movq $0, 56(%rax)
> > movq $0, (%rax)
> > movq $0, 8(%rax)
> > movq $0, 16(%rax)
> > movq $0, 24(%rax)
> > movq $0, 32(%rax)
> > movq $0, 40(%rax)
> > ret
> >
> >
> > This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> > and make -k check, both with and without --target_board=unix{-m32}
> > with no new failures. Ok for mainline?
> >
> >
> > 2024-02-05 Roger Sayle <roger@nextmovesoftware.com>
> >
> > gcc/ChangeLog
> > PR target/113690
> > * config/i386/i386-features.cc (timode_convert_cst): New helper
> > function to convert a TImode CONST_SCALAR_INT_P to a V1TImode
> > CONST_VECTOR.
> > (timode_scalar_chain::convert_op): Use timode_convert_cst.
> > (timode_scalar_chain::convert_insn): If a REG_EQUAL note contains
> > a binary operator where the second operand is an immediate integer
> > constant, convert it to V1TImode using timode_convert_cst.
> > Use timode_convert_cst.
> >
> > gcc/testsuite/ChangeLog
> > PR target/113690
> > * gcc.target/i386/pr113690.c: New test case.
>
> OK.
OTOH, how about we follow the approach from
general_scalar_chain::convert_insn and just kill the note?
Uros.
@@ -1749,6 +1749,19 @@ timode_scalar_chain::fix_debug_reg_uses (rtx reg)
}
}
+/* Helper function to convert immediate constant X to V1TImode. */
+static rtx
+timode_convert_cst (rtx x)
+{
+ /* Prefer all ones vector in case of -1. */
+ if (constm1_operand (x, TImode))
+ return CONSTM1_RTX (V1TImode);
+
+ rtx *v = XALLOCAVEC (rtx, 1);
+ v[0] = x;
+ return gen_rtx_CONST_VECTOR (V1TImode, gen_rtvec_v (1, v));
+}
+
/* Convert operand OP in INSN from TImode to V1TImode. */
void
@@ -1775,18 +1788,8 @@ timode_scalar_chain::convert_op (rtx *op, rtx_insn *insn)
}
else if (CONST_SCALAR_INT_P (*op))
{
- rtx vec_cst;
rtx tmp = gen_reg_rtx (V1TImode);
-
- /* Prefer all ones vector in case of -1. */
- if (constm1_operand (*op, TImode))
- vec_cst = CONSTM1_RTX (V1TImode);
- else
- {
- rtx *v = XALLOCAVEC (rtx, 1);
- v[0] = *op;
- vec_cst = gen_rtx_CONST_VECTOR (V1TImode, gen_rtvec_v (1, v));
- }
+ rtx vec_cst = timode_convert_cst (*op);
if (!standard_sse_constant_p (vec_cst, V1TImode))
{
@@ -1830,12 +1833,28 @@ timode_scalar_chain::convert_insn (rtx_insn *insn)
tmp = find_reg_equal_equiv_note (insn);
if (tmp)
{
- if (GET_MODE (XEXP (tmp, 0)) == TImode)
- PUT_MODE (XEXP (tmp, 0), V1TImode);
- else if (CONST_SCALAR_INT_P (XEXP (tmp, 0)))
- XEXP (tmp, 0)
- = gen_rtx_CONST_VECTOR (V1TImode,
- gen_rtvec (1, XEXP (tmp, 0)));
+ rtx expr = XEXP (tmp, 0);
+ if (GET_MODE (expr) == TImode)
+ {
+ PUT_MODE (expr, V1TImode);
+ switch (GET_CODE (expr))
+ {
+ case PLUS:
+ case MINUS:
+ case MULT:
+ case AND:
+ case IOR:
+ case XOR:
+ if (CONST_SCALAR_INT_P (XEXP (expr, 1)))
+ XEXP (expr, 1) = timode_convert_cst (XEXP (expr, 1));
+ break;
+
+ default:
+ break;
+ }
+ }
+ else if (CONST_SCALAR_INT_P (expr))
+ XEXP (tmp, 0) = timode_convert_cst (expr);
}
}
break;
@@ -1876,7 +1895,7 @@ timode_scalar_chain::convert_insn (rtx_insn *insn)
}
else
{
- src = gen_rtx_CONST_VECTOR (V1TImode, gen_rtvec (1, src));
+ src = timode_convert_cst (src);
src = validize_mem (force_const_mem (V1TImode, src));
use_move = MEM_P (dst);
}
new file mode 100644
@@ -0,0 +1,12 @@
+/* { dg-do compile { target int128 } } */
+/* { dg-options "-O2 -fno-dce -fno-forward-propagate -fno-split-wide-types -funroll-loops" } */
+int i;
+__attribute__((__vector_size__(64))) __int128 v;
+
+void
+foo(void)
+{
+ v <<= 127;
+ __builtin_mul_overflow(0, i, &v[3]);
+ v *= 6;
+}