i386: Move vzeroupper pass from after reload pass to after postreload_cse [PR112760]
Commit Message
Hi!
Regardless of the outcome of the REG_UNUSED discussions, I think
it is a good idea to move the vzeroupper pass one pass later.
As can be seen in multiple PRs and as postreload.cc documents,
reload/LRA is known to create dead instructions quite often, which
is the reason why we have the postreload_cse pass at all.
Running the vzeroupper pass before that cleanup means the pass,
including the df_analyze done for it, has to process more instructions
than necessary, and because mode switching adds the df note problem,
there is also a higher chance of encountering stale REG_UNUSED notes.
And, I really don't see why vzeroupper can't wait until those cleanups
are done.
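For a bit of context (this is only an illustration, not part of the patch):
with -O2 -mavx, a function like the one below uses 256-bit ymm registers, so
the vzeroupper pass inserts a vzeroupper before the call and before the return
to avoid the SSE/AVX transition penalty; bar is a made-up external function
used just for the sketch.

#include <immintrin.h>

extern void bar (void);

void
foo (float *p)
{
  __m256 v = _mm256_loadu_ps (p);              /* 256-bit load into a ymm register.  */
  _mm256_storeu_ps (p, _mm256_add_ps (v, v));
  bar ();                                      /* vzeroupper expected before this call.  */
}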
Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
2023-12-05 Jakub Jelinek <jakub@redhat.com>
PR rtl-optimization/112760
* config/i386/i386-passes.def (pass_insert_vzeroupper): Insert
after pass_postreload_cse rather than pass_reload.
* config/i386/i386-features.cc (rest_of_handle_insert_vzeroupper):
Adjust comment for it.
* gcc.dg/pr112760.c: New test.
Jakub
Comments
On Wed, Dec 6, 2023 at 6:23 AM Jakub Jelinek <jakub@redhat.com> wrote:
>
> Hi!
>
> Regardless of the outcome of the REG_UNUSED discussions, I think
> it is a good idea to move the vzeroupper pass one pass later.
> As can be seen in multiple PRs and as postreload.cc documents,
> reload/LRA is known to create dead instructions quite often, which
> is the reason why we have the postreload_cse pass at all.
> Running the vzeroupper pass before that cleanup means the pass,
> including the df_analyze done for it, has to process more instructions
> than necessary, and because mode switching adds the df note problem,
> there is also a higher chance of encountering stale REG_UNUSED notes.
> And, I really don't see why vzeroupper can't wait until those cleanups
> are done.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
LGTM.
>
> 2023-12-05 Jakub Jelinek <jakub@redhat.com>
>
> PR rtl-optimization/112760
> * config/i386/i386-passes.def (pass_insert_vzeroupper): Insert
> after pass_postreload_cse rather than pass_reload.
> * config/i386/i386-features.cc (rest_of_handle_insert_vzeroupper):
> Adjust comment for it.
>
> * gcc.dg/pr112760.c: New test.
>
> --- gcc/config/i386/i386-passes.def.jj 2023-01-16 11:52:15.960735877 +0100
> +++ gcc/config/i386/i386-passes.def 2023-12-05 19:15:01.748279329 +0100
> @@ -24,7 +24,7 @@ along with GCC; see the file COPYING3.
> REPLACE_PASS (PASS, INSTANCE, TGT_PASS)
> */
>
> - INSERT_PASS_AFTER (pass_reload, 1, pass_insert_vzeroupper);
> + INSERT_PASS_AFTER (pass_postreload_cse, 1, pass_insert_vzeroupper);
> INSERT_PASS_AFTER (pass_combine, 1, pass_stv, false /* timode_p */);
> /* Run the 64-bit STV pass before the CSE pass so that CONST0_RTX and
> CONSTM1_RTX generated by the STV pass can be CSEed. */
> --- gcc/config/i386/i386-features.cc.jj 2023-11-02 07:49:15.029894060 +0100
> +++ gcc/config/i386/i386-features.cc 2023-12-05 19:15:48.658620698 +0100
> @@ -2627,10 +2627,11 @@ convert_scalars_to_vector (bool timode_p
> static unsigned int
> rest_of_handle_insert_vzeroupper (void)
> {
> - /* vzeroupper instructions are inserted immediately after reload to
> - account for possible spills from 256bit or 512bit registers. The pass
> - reuses mode switching infrastructure by re-running mode insertion
> - pass, so disable entities that have already been processed. */
> + /* vzeroupper instructions are inserted immediately after reload and
> + postreload_cse to clean up after it a little bit to account for possible
> + spills from 256bit or 512bit registers. The pass reuses mode switching
> + infrastructure by re-running mode insertion pass, so disable entities
> + that have already been processed. */
> for (int i = 0; i < MAX_386_ENTITIES; i++)
> ix86_optimize_mode_switching[i] = 0;
>
> --- gcc/testsuite/gcc.dg/pr112760.c.jj 2023-12-01 13:46:57.444746529 +0100
> +++ gcc/testsuite/gcc.dg/pr112760.c 2023-12-01 13:46:36.729036971 +0100
> @@ -0,0 +1,22 @@
> +/* PR rtl-optimization/112760 */
> +/* { dg-do run } */
> +/* { dg-options "-O2 -fno-dce -fno-guess-branch-probability --param=max-cse-insns=0" } */
> +/* { dg-additional-options "-m8bit-idiv -mavx" { target i?86-*-* x86_64-*-* } } */
> +
> +unsigned g;
> +
> +__attribute__((__noipa__)) unsigned short
> +foo (unsigned short a, unsigned short b)
> +{
> + unsigned short x = __builtin_add_overflow_p (a, g, (unsigned short) 0);
> + g -= g / b;
> + return x;
> +}
> +
> +int
> +main ()
> +{
> + unsigned short x = foo (40, 6);
> + if (x != 0)
> + __builtin_abort ();
> +}
>
> Jakub
>