[x86] Some tidy up for RA related hooks.

Message ID 20221121021150.3348406-1-hongtao.liu@intel.com
State Accepted
Headers
Series [x86] Some tidy up for RA related hooks. |

Checks

Context Check Description
snail/gcc-patch-check success Github commit url

Commit Message

liuhongt Nov. 21, 2022, 2:11 a.m. UTC
  When i'm working at [1] for ix86_can_change_mode_class,
I notice there're some incorrectness/misoptimization in current RA-related hook.
This patch tries to do some fix and tidy up for them:

1. We also need to guard size of TO to be
less than TARGET_SSE2 ? 2 : 4 in ix86_can_change_mode_class.
2. Merge VALID_AVX512FP16_SCALAR_MODE plus BFmode
into VALID_AVX512F_SCALAR_MODE since we've support 16-bit data move
above SSE2, so no need for the condition of AVX512FP16 for those evex
sse registers.
3. Allocate DI/HImode to sse register for SSE2 above just like
SImode since we've supported 16-bit data move between sse and gpr
above SSE2, this will help RA to handle cases like (subreg:HI (reg:V8HI)
0) or else RA will spill it. This enable optimization for
pieces-memset-{3,37,39}.c
4. Guard 64/32-bit vector move patterns with ix86_hard_reg_move_ok.

[1] https://gcc.gnu.org/pipermail/gcc-patches/2022-November/606373.html

Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
Ok for trunk?

gcc/ChangeLog:

	* config/i386/i386.cc (ix86_can_change_mode_class): Also guard
	size of TO.
	(ix86_hard_regno_mode_ok): Remove VALID_AVX512FP16_SCALAR_MODE
	* config/i386/i386.h (VALID_AVX512FP16_SCALAR_MODE): Merged to
	..
	(VALID_AVX512F_SCALAR_MODE): .. this, also add HImode.
	(VALID_SSE_REG_MODE): Add DI/HImode.
	* config/i386/mmx.md (*mov<mode>_internal): Add
	ix86_hard_reg_move_ok to condition.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/pieces-memset-3.c: Remove xfail.
	* gcc.target/i386/pieces-memset-37.c: Remove xfail.
	* gcc.target/i386/pieces-memset-39.c: Remove xfail.
---
 gcc/config/i386/i386.cc                          |  9 ++-------
 gcc/config/i386/i386.h                           | 16 ++++++++--------
 gcc/config/i386/mmx.md                           |  6 ++++--
 gcc/testsuite/gcc.target/i386/pieces-memset-3.c  |  4 ++--
 gcc/testsuite/gcc.target/i386/pieces-memset-37.c |  4 ++--
 gcc/testsuite/gcc.target/i386/pieces-memset-39.c |  4 ++--
 6 files changed, 20 insertions(+), 23 deletions(-)
  

Comments

Hongtao Liu Nov. 21, 2022, 5:28 a.m. UTC | #1
On Mon, Nov 21, 2022 at 10:13 AM liuhongt <hongtao.liu@intel.com> wrote:
>
> When i'm working at [1] for ix86_can_change_mode_class,
> I notice there're some incorrectness/misoptimization in current RA-related hook.
> This patch tries to do some fix and tidy up for them:
>
> 1. We also need to guard size of TO to be
> less than TARGET_SSE2 ? 2 : 4 in ix86_can_change_mode_class.
> 2. Merge VALID_AVX512FP16_SCALAR_MODE plus BFmode
> into VALID_AVX512F_SCALAR_MODE since we've support 16-bit data move
> above SSE2, so no need for the condition of AVX512FP16 for those evex
> sse registers.
> 3. Allocate DI/HImode to sse register for SSE2 above just like
> SImode since we've supported 16-bit data move between sse and gpr
> above SSE2, this will help RA to handle cases like (subreg:HI (reg:V8HI)
> 0) or else RA will spill it. This enable optimization for
> pieces-memset-{3,37,39}.c
> 4. Guard 64/32-bit vector move patterns with ix86_hard_reg_move_ok.
>
> [1] https://gcc.gnu.org/pipermail/gcc-patches/2022-November/606373.html
>
> Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
> Ok for trunk?
>
> gcc/ChangeLog:
>
>         * config/i386/i386.cc (ix86_can_change_mode_class): Also guard
>         size of TO.
>         (ix86_hard_regno_mode_ok): Remove VALID_AVX512FP16_SCALAR_MODE
>         * config/i386/i386.h (VALID_AVX512FP16_SCALAR_MODE): Merged to
>         ..
>         (VALID_AVX512F_SCALAR_MODE): .. this, also add HImode.
>         (VALID_SSE_REG_MODE): Add DI/HImode.
>         * config/i386/mmx.md (*mov<mode>_internal): Add
>         ix86_hard_reg_move_ok to condition.
>
> gcc/testsuite/ChangeLog:
>
>         * gcc.target/i386/pieces-memset-3.c: Remove xfail.
>         * gcc.target/i386/pieces-memset-37.c: Remove xfail.
>         * gcc.target/i386/pieces-memset-39.c: Remove xfail.
> ---
>  gcc/config/i386/i386.cc                          |  9 ++-------
>  gcc/config/i386/i386.h                           | 16 ++++++++--------
>  gcc/config/i386/mmx.md                           |  6 ++++--
>  gcc/testsuite/gcc.target/i386/pieces-memset-3.c  |  4 ++--
>  gcc/testsuite/gcc.target/i386/pieces-memset-37.c |  4 ++--
>  gcc/testsuite/gcc.target/i386/pieces-memset-39.c |  4 ++--
>  6 files changed, 20 insertions(+), 23 deletions(-)
>
> diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
> index 292b32c5e99..030c26965ab 100644
> --- a/gcc/config/i386/i386.cc
> +++ b/gcc/config/i386/i386.cc
> @@ -19725,7 +19725,8 @@ ix86_can_change_mode_class (machine_mode from, machine_mode to,
>          the vec_dupv4hi pattern.
>          NB: SSE2 can load 16bit data to sse register via pinsrw.  */
>        int mov_size = MAYBE_SSE_CLASS_P (regclass) && TARGET_SSE2 ? 2 : 4;
> -      if (GET_MODE_SIZE (from) < mov_size)
> +      if (GET_MODE_SIZE (from) < mov_size
> +         || GET_MODE_SIZE (to) < mov_size)
>         return false;
>      }
>
> @@ -20089,12 +20090,6 @@ ix86_hard_regno_mode_ok (unsigned int regno, machine_mode mode)
>               || VALID_AVX512F_SCALAR_MODE (mode)))
>         return true;
>
> -      /* For AVX512FP16, vmovw supports movement of HImode
> -        and HFmode between GPR and SSE registers.  */
> -      if (TARGET_AVX512FP16
> -         && VALID_AVX512FP16_SCALAR_MODE (mode))
> -       return true;
> -
>        /* For AVX-5124FMAPS or AVX-5124VNNIW
>          allow V64SF and V64SI modes for special regnos.  */
>        if ((TARGET_AVX5124FMAPS || TARGET_AVX5124VNNIW)
> diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
> index 3869db8f2d3..d9a1fb0e420 100644
> --- a/gcc/config/i386/i386.h
> +++ b/gcc/config/i386/i386.h
> @@ -1017,11 +1017,9 @@ extern const char *host_detect_local_cpu (int argc, const char **argv);
>    (VALID_AVX256_REG_MODE (MODE) || (MODE) == OImode)
>
>  #define VALID_AVX512F_SCALAR_MODE(MODE)                                        \
> -  ((MODE) == DImode || (MODE) == DFmode || (MODE) == SImode            \
> -   || (MODE) == SFmode)
> -
> -#define VALID_AVX512FP16_SCALAR_MODE(MODE)     \
> -  ((MODE) == HImode || (MODE) == HFmode)
> +  ((MODE) == DImode || (MODE) == DFmode                                        \
> +   || (MODE) == SImode || (MODE) == SFmode                             \
> +   || (MODE) == HImode || (MODE) == HFmode || (MODE) == BFmode)
>
>  #define VALID_AVX512F_REG_MODE(MODE)                                   \
>    ((MODE) == V8DImode || (MODE) == V8DFmode || (MODE) == V64QImode     \
> @@ -1045,13 +1043,15 @@ extern const char *host_detect_local_cpu (int argc, const char **argv);
>     || (MODE) == V8HFmode || (MODE) == V4HFmode || (MODE) == V2HFmode   \
>     || (MODE) == V8BFmode || (MODE) == V4BFmode || (MODE) == V2BFmode   \
>     || (MODE) == V4QImode || (MODE) == V2HImode || (MODE) == V1SImode   \
> -   || (MODE) == V2DImode || (MODE) == V2QImode || (MODE) == DFmode     \
> -   || (MODE) == HFmode || (MODE) == BFmode)
> +   || (MODE) == V2DImode || (MODE) == V2QImode                         \
> +   || (MODE) == DFmode || (MODE) == DImode                             \
> +   || (MODE) == HFmode || (MODE) == BFmode || (MODE) == HImode)
I realize there's no TARGET_SSE2 for VALID_SSE_REG_MODE in
hard_regno_mode_ok, it's ok for other modes except HImode which must
require SSE2, orelse we can move 16-bit from/to integer

So Remove HImode from it, and add it separately in hard_regno_mode_ok
 as +      /* Use pinsrw/pextrw to mov 16-bit data from/to sse to/from
integer.  */
+      if (TARGET_SSE2 && mode == HImode)
+ return true;
+


>
>  #define VALID_SSE_REG_MODE(MODE)                                       \
>    ((MODE) == V1TImode || (MODE) == TImode                              \
>     || (MODE) == V4SFmode || (MODE) == V4SImode                         \
> -   || (MODE) == SFmode || (MODE) == TFmode || (MODE) == TDmode)
> +   || (MODE) == SFmode || (MODE) == SImode                             \
> +   || (MODE) == TFmode || (MODE) == TDmode)
>
>  #define VALID_MMX_REG_MODE_3DNOW(MODE) \
>    ((MODE) == V2SFmode || (MODE) == SFmode)
> diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md
> index d5134cc351e..63aff287795 100644
> --- a/gcc/config/i386/mmx.md
> +++ b/gcc/config/i386/mmx.md
> @@ -133,7 +133,8 @@ (define_insn "*mov<mode>_internal"
>         (match_operand:MMXMODE 1 "nonimm_or_0_operand"
>      "rCo,rC,C,rm,rC,C  ,!y,m  ,?!y,?!y,r  ,C,v,m,v,v,r,*x,!y"))]
>    "(TARGET_MMX || TARGET_MMX_WITH_SSE)
> -   && !(MEM_P (operands[0]) && MEM_P (operands[1]))"
> +   && !(MEM_P (operands[0]) && MEM_P (operands[1]))
> +   && ix86_hardreg_mov_ok (operands[0], operands[1])"
>  {
>    switch (get_attr_type (insn))
>      {
> @@ -286,7 +287,8 @@ (define_insn "*mov<mode>_internal"
>      "=r ,m ,v,v,v,m,r,v")
>         (match_operand:V_32 1 "general_operand"
>      "rmC,rC,C,v,m,v,v,r"))]
> -  "!(MEM_P (operands[0]) && MEM_P (operands[1]))"
> +  "!(MEM_P (operands[0]) && MEM_P (operands[1]))
> +   && ix86_hardreg_mov_ok (operands[0], operands[1])"
>  {
>    switch (get_attr_type (insn))
>      {
> diff --git a/gcc/testsuite/gcc.target/i386/pieces-memset-3.c b/gcc/testsuite/gcc.target/i386/pieces-memset-3.c
> index 765441a7c4a..4f105f58b26 100644
> --- a/gcc/testsuite/gcc.target/i386/pieces-memset-3.c
> +++ b/gcc/testsuite/gcc.target/i386/pieces-memset-3.c
> @@ -13,6 +13,6 @@ foo (int x)
>  /* { dg-final { scan-assembler-times "vinserti64x4\[ \\t\]+\[^\n\]*%zmm" 1 } } */
>  /* { dg-final { scan-assembler-times "vmovdqu64\[ \\t\]+\[^\n\]*%zmm" 1 } } */
>  /* No need to dynamically realign the stack here.  */
> -/* { dg-final { scan-assembler-not "and\[^\n\r]*%\[re\]sp" { xfail *-*-* } } } */
> +/* { dg-final { scan-assembler-not "and\[^\n\r]*%\[re\]sp" } } */
>  /* Nor use a frame pointer.  */
> -/* { dg-final { scan-assembler-not "%\[re\]bp" { xfail *-*-* } } } */
> +/* { dg-final { scan-assembler-not "%\[re\]bp" } } */
> diff --git a/gcc/testsuite/gcc.target/i386/pieces-memset-37.c b/gcc/testsuite/gcc.target/i386/pieces-memset-37.c
> index 0c5056be54d..fd09bd153ce 100644
> --- a/gcc/testsuite/gcc.target/i386/pieces-memset-37.c
> +++ b/gcc/testsuite/gcc.target/i386/pieces-memset-37.c
> @@ -10,6 +10,6 @@ foo (int a1, int a2, int a3, int a4, int a5, int a6, int x, char *dst)
>  /* { dg-final { scan-assembler-times "vpbroadcastb\[ \\t\]+\[^\n\]*%ymm" 1 } } */
>  /* { dg-final { scan-assembler-times "vmovdqu\[ \\t\]+\[^\n\]*%ymm" 2 } } */
>  /* No need to dynamically realign the stack here.  */
> -/* { dg-final { scan-assembler-not "and\[^\n\r]*%\[re\]sp" { xfail *-*-* } } } */
> +/* { dg-final { scan-assembler-not "and\[^\n\r]*%\[re\]sp" } } */
>  /* Nor use a frame pointer.  */
> -/* { dg-final { scan-assembler-not "%\[re\]bp" { xfail *-*-* } } } */
> +/* { dg-final { scan-assembler-not "%\[re\]bp" } } */
> diff --git a/gcc/testsuite/gcc.target/i386/pieces-memset-39.c b/gcc/testsuite/gcc.target/i386/pieces-memset-39.c
> index e33644c2f10..0ed88b274bd 100644
> --- a/gcc/testsuite/gcc.target/i386/pieces-memset-39.c
> +++ b/gcc/testsuite/gcc.target/i386/pieces-memset-39.c
> @@ -11,6 +11,6 @@ foo (int a1, int a2, int a3, int a4, int a5, int a6, int x, char *dst)
>  /* { dg-final { scan-assembler-not "vinserti64x4" } } */
>  /* { dg-final { scan-assembler-times "vmovdqu8\[ \\t\]+\[^\n\]*%zmm" 1 } } */
>  /* No need to dynamically realign the stack here.  */
> -/* { dg-final { scan-assembler-not "and\[^\n\r]*%\[re\]sp" { xfail *-*-* } } } */
> +/* { dg-final { scan-assembler-not "and\[^\n\r]*%\[re\]sp" } } */
>  /* Nor use a frame pointer.  */
> -/* { dg-final { scan-assembler-not "%\[re\]bp" { xfail *-*-* } } } */
> +/* { dg-final { scan-assembler-not "%\[re\]bp" } } */
> --
> 2.27.0
>
  
Uros Bizjak Nov. 21, 2022, 7:17 a.m. UTC | #2
On Mon, Nov 21, 2022 at 6:24 AM Hongtao Liu <crazylht@gmail.com> wrote:
>
> On Mon, Nov 21, 2022 at 10:13 AM liuhongt <hongtao.liu@intel.com> wrote:
> >
> > When i'm working at [1] for ix86_can_change_mode_class,
> > I notice there're some incorrectness/misoptimization in current RA-related hook.
> > This patch tries to do some fix and tidy up for them:
> >
> > 1. We also need to guard size of TO to be
> > less than TARGET_SSE2 ? 2 : 4 in ix86_can_change_mode_class.
> > 2. Merge VALID_AVX512FP16_SCALAR_MODE plus BFmode
> > into VALID_AVX512F_SCALAR_MODE since we've support 16-bit data move
> > above SSE2, so no need for the condition of AVX512FP16 for those evex
> > sse registers.
> > 3. Allocate DI/HImode to sse register for SSE2 above just like
> > SImode since we've supported 16-bit data move between sse and gpr
> > above SSE2, this will help RA to handle cases like (subreg:HI (reg:V8HI)
> > 0) or else RA will spill it. This enable optimization for
> > pieces-memset-{3,37,39}.c
> > 4. Guard 64/32-bit vector move patterns with ix86_hard_reg_move_ok.
> >
> > [1] https://gcc.gnu.org/pipermail/gcc-patches/2022-November/606373.html
> >
> > Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
> > Ok for trunk?
> >
> > gcc/ChangeLog:
> >
> >         * config/i386/i386.cc (ix86_can_change_mode_class): Also guard
> >         size of TO.
> >         (ix86_hard_regno_mode_ok): Remove VALID_AVX512FP16_SCALAR_MODE
> >         * config/i386/i386.h (VALID_AVX512FP16_SCALAR_MODE): Merged to
> >         ..
> >         (VALID_AVX512F_SCALAR_MODE): .. this, also add HImode.
> >         (VALID_SSE_REG_MODE): Add DI/HImode.
> >         * config/i386/mmx.md (*mov<mode>_internal): Add
> >         ix86_hard_reg_move_ok to condition.
> >
> > gcc/testsuite/ChangeLog:
> >
> >         * gcc.target/i386/pieces-memset-3.c: Remove xfail.
> >         * gcc.target/i386/pieces-memset-37.c: Remove xfail.
> >         * gcc.target/i386/pieces-memset-39.c: Remove xfail.

OK.

This is somehow tricky part of the compiler, so it would be nice if
the patch can be split to a couple of patches to ease bisecting if
something goes wrong. OTOH, recently there were a couple of similar
changes in this area, and there were no problems.

Thanks,
Uros.

> > ---
> >  gcc/config/i386/i386.cc                          |  9 ++-------
> >  gcc/config/i386/i386.h                           | 16 ++++++++--------
> >  gcc/config/i386/mmx.md                           |  6 ++++--
> >  gcc/testsuite/gcc.target/i386/pieces-memset-3.c  |  4 ++--
> >  gcc/testsuite/gcc.target/i386/pieces-memset-37.c |  4 ++--
> >  gcc/testsuite/gcc.target/i386/pieces-memset-39.c |  4 ++--
> >  6 files changed, 20 insertions(+), 23 deletions(-)
> >
> > diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
> > index 292b32c5e99..030c26965ab 100644
> > --- a/gcc/config/i386/i386.cc
> > +++ b/gcc/config/i386/i386.cc
> > @@ -19725,7 +19725,8 @@ ix86_can_change_mode_class (machine_mode from, machine_mode to,
> >          the vec_dupv4hi pattern.
> >          NB: SSE2 can load 16bit data to sse register via pinsrw.  */
> >        int mov_size = MAYBE_SSE_CLASS_P (regclass) && TARGET_SSE2 ? 2 : 4;
> > -      if (GET_MODE_SIZE (from) < mov_size)
> > +      if (GET_MODE_SIZE (from) < mov_size
> > +         || GET_MODE_SIZE (to) < mov_size)
> >         return false;
> >      }
> >
> > @@ -20089,12 +20090,6 @@ ix86_hard_regno_mode_ok (unsigned int regno, machine_mode mode)
> >               || VALID_AVX512F_SCALAR_MODE (mode)))
> >         return true;
> >
> > -      /* For AVX512FP16, vmovw supports movement of HImode
> > -        and HFmode between GPR and SSE registers.  */
> > -      if (TARGET_AVX512FP16
> > -         && VALID_AVX512FP16_SCALAR_MODE (mode))
> > -       return true;
> > -
> >        /* For AVX-5124FMAPS or AVX-5124VNNIW
> >          allow V64SF and V64SI modes for special regnos.  */
> >        if ((TARGET_AVX5124FMAPS || TARGET_AVX5124VNNIW)
> > diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
> > index 3869db8f2d3..d9a1fb0e420 100644
> > --- a/gcc/config/i386/i386.h
> > +++ b/gcc/config/i386/i386.h
> > @@ -1017,11 +1017,9 @@ extern const char *host_detect_local_cpu (int argc, const char **argv);
> >    (VALID_AVX256_REG_MODE (MODE) || (MODE) == OImode)
> >
> >  #define VALID_AVX512F_SCALAR_MODE(MODE)                                        \
> > -  ((MODE) == DImode || (MODE) == DFmode || (MODE) == SImode            \
> > -   || (MODE) == SFmode)
> > -
> > -#define VALID_AVX512FP16_SCALAR_MODE(MODE)     \
> > -  ((MODE) == HImode || (MODE) == HFmode)
> > +  ((MODE) == DImode || (MODE) == DFmode                                        \
> > +   || (MODE) == SImode || (MODE) == SFmode                             \
> > +   || (MODE) == HImode || (MODE) == HFmode || (MODE) == BFmode)
> >
> >  #define VALID_AVX512F_REG_MODE(MODE)                                   \
> >    ((MODE) == V8DImode || (MODE) == V8DFmode || (MODE) == V64QImode     \
> > @@ -1045,13 +1043,15 @@ extern const char *host_detect_local_cpu (int argc, const char **argv);
> >     || (MODE) == V8HFmode || (MODE) == V4HFmode || (MODE) == V2HFmode   \
> >     || (MODE) == V8BFmode || (MODE) == V4BFmode || (MODE) == V2BFmode   \
> >     || (MODE) == V4QImode || (MODE) == V2HImode || (MODE) == V1SImode   \
> > -   || (MODE) == V2DImode || (MODE) == V2QImode || (MODE) == DFmode     \
> > -   || (MODE) == HFmode || (MODE) == BFmode)
> > +   || (MODE) == V2DImode || (MODE) == V2QImode                         \
> > +   || (MODE) == DFmode || (MODE) == DImode                             \
> > +   || (MODE) == HFmode || (MODE) == BFmode || (MODE) == HImode)
> I realize there's no TARGET_SSE2 for VALID_SSE_REG_MODE in
> hard_regno_mode_ok, it's ok for other modes except HImode which must
> require SSE2, orelse we can move 16-bit from/to integer
>
> So Remove HImode from it, and add it separately in hard_regno_mode_ok
>  as +      /* Use pinsrw/pextrw to mov 16-bit data from/to sse to/from
> integer.  */
> +      if (TARGET_SSE2 && mode == HImode)
> + return true;
> +
>
>
> >
> >  #define VALID_SSE_REG_MODE(MODE)                                       \
> >    ((MODE) == V1TImode || (MODE) == TImode                              \
> >     || (MODE) == V4SFmode || (MODE) == V4SImode                         \
> > -   || (MODE) == SFmode || (MODE) == TFmode || (MODE) == TDmode)
> > +   || (MODE) == SFmode || (MODE) == SImode                             \
> > +   || (MODE) == TFmode || (MODE) == TDmode)
> >
> >  #define VALID_MMX_REG_MODE_3DNOW(MODE) \
> >    ((MODE) == V2SFmode || (MODE) == SFmode)
> > diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md
> > index d5134cc351e..63aff287795 100644
> > --- a/gcc/config/i386/mmx.md
> > +++ b/gcc/config/i386/mmx.md
> > @@ -133,7 +133,8 @@ (define_insn "*mov<mode>_internal"
> >         (match_operand:MMXMODE 1 "nonimm_or_0_operand"
> >      "rCo,rC,C,rm,rC,C  ,!y,m  ,?!y,?!y,r  ,C,v,m,v,v,r,*x,!y"))]
> >    "(TARGET_MMX || TARGET_MMX_WITH_SSE)
> > -   && !(MEM_P (operands[0]) && MEM_P (operands[1]))"
> > +   && !(MEM_P (operands[0]) && MEM_P (operands[1]))
> > +   && ix86_hardreg_mov_ok (operands[0], operands[1])"
> >  {
> >    switch (get_attr_type (insn))
> >      {
> > @@ -286,7 +287,8 @@ (define_insn "*mov<mode>_internal"
> >      "=r ,m ,v,v,v,m,r,v")
> >         (match_operand:V_32 1 "general_operand"
> >      "rmC,rC,C,v,m,v,v,r"))]
> > -  "!(MEM_P (operands[0]) && MEM_P (operands[1]))"
> > +  "!(MEM_P (operands[0]) && MEM_P (operands[1]))
> > +   && ix86_hardreg_mov_ok (operands[0], operands[1])"
> >  {
> >    switch (get_attr_type (insn))
> >      {
> > diff --git a/gcc/testsuite/gcc.target/i386/pieces-memset-3.c b/gcc/testsuite/gcc.target/i386/pieces-memset-3.c
> > index 765441a7c4a..4f105f58b26 100644
> > --- a/gcc/testsuite/gcc.target/i386/pieces-memset-3.c
> > +++ b/gcc/testsuite/gcc.target/i386/pieces-memset-3.c
> > @@ -13,6 +13,6 @@ foo (int x)
> >  /* { dg-final { scan-assembler-times "vinserti64x4\[ \\t\]+\[^\n\]*%zmm" 1 } } */
> >  /* { dg-final { scan-assembler-times "vmovdqu64\[ \\t\]+\[^\n\]*%zmm" 1 } } */
> >  /* No need to dynamically realign the stack here.  */
> > -/* { dg-final { scan-assembler-not "and\[^\n\r]*%\[re\]sp" { xfail *-*-* } } } */
> > +/* { dg-final { scan-assembler-not "and\[^\n\r]*%\[re\]sp" } } */
> >  /* Nor use a frame pointer.  */
> > -/* { dg-final { scan-assembler-not "%\[re\]bp" { xfail *-*-* } } } */
> > +/* { dg-final { scan-assembler-not "%\[re\]bp" } } */
> > diff --git a/gcc/testsuite/gcc.target/i386/pieces-memset-37.c b/gcc/testsuite/gcc.target/i386/pieces-memset-37.c
> > index 0c5056be54d..fd09bd153ce 100644
> > --- a/gcc/testsuite/gcc.target/i386/pieces-memset-37.c
> > +++ b/gcc/testsuite/gcc.target/i386/pieces-memset-37.c
> > @@ -10,6 +10,6 @@ foo (int a1, int a2, int a3, int a4, int a5, int a6, int x, char *dst)
> >  /* { dg-final { scan-assembler-times "vpbroadcastb\[ \\t\]+\[^\n\]*%ymm" 1 } } */
> >  /* { dg-final { scan-assembler-times "vmovdqu\[ \\t\]+\[^\n\]*%ymm" 2 } } */
> >  /* No need to dynamically realign the stack here.  */
> > -/* { dg-final { scan-assembler-not "and\[^\n\r]*%\[re\]sp" { xfail *-*-* } } } */
> > +/* { dg-final { scan-assembler-not "and\[^\n\r]*%\[re\]sp" } } */
> >  /* Nor use a frame pointer.  */
> > -/* { dg-final { scan-assembler-not "%\[re\]bp" { xfail *-*-* } } } */
> > +/* { dg-final { scan-assembler-not "%\[re\]bp" } } */
> > diff --git a/gcc/testsuite/gcc.target/i386/pieces-memset-39.c b/gcc/testsuite/gcc.target/i386/pieces-memset-39.c
> > index e33644c2f10..0ed88b274bd 100644
> > --- a/gcc/testsuite/gcc.target/i386/pieces-memset-39.c
> > +++ b/gcc/testsuite/gcc.target/i386/pieces-memset-39.c
> > @@ -11,6 +11,6 @@ foo (int a1, int a2, int a3, int a4, int a5, int a6, int x, char *dst)
> >  /* { dg-final { scan-assembler-not "vinserti64x4" } } */
> >  /* { dg-final { scan-assembler-times "vmovdqu8\[ \\t\]+\[^\n\]*%zmm" 1 } } */
> >  /* No need to dynamically realign the stack here.  */
> > -/* { dg-final { scan-assembler-not "and\[^\n\r]*%\[re\]sp" { xfail *-*-* } } } */
> > +/* { dg-final { scan-assembler-not "and\[^\n\r]*%\[re\]sp" } } */
> >  /* Nor use a frame pointer.  */
> > -/* { dg-final { scan-assembler-not "%\[re\]bp" { xfail *-*-* } } } */
> > +/* { dg-final { scan-assembler-not "%\[re\]bp" } } */
> > --
> > 2.27.0
> >
>
>
> --
> BR,
> Hongtao
  
Hongtao Liu Nov. 21, 2022, 8:11 a.m. UTC | #3
On Mon, Nov 21, 2022 at 3:17 PM Uros Bizjak <ubizjak@gmail.com> wrote:
>
> On Mon, Nov 21, 2022 at 6:24 AM Hongtao Liu <crazylht@gmail.com> wrote:
> >
> > On Mon, Nov 21, 2022 at 10:13 AM liuhongt <hongtao.liu@intel.com> wrote:
> > >
> > > When i'm working at [1] for ix86_can_change_mode_class,
> > > I notice there're some incorrectness/misoptimization in current RA-related hook.
> > > This patch tries to do some fix and tidy up for them:
> > >
> > > 1. We also need to guard size of TO to be
> > > less than TARGET_SSE2 ? 2 : 4 in ix86_can_change_mode_class.
> > > 2. Merge VALID_AVX512FP16_SCALAR_MODE plus BFmode
> > > into VALID_AVX512F_SCALAR_MODE since we've support 16-bit data move
> > > above SSE2, so no need for the condition of AVX512FP16 for those evex
> > > sse registers.
> > > 3. Allocate DI/HImode to sse register for SSE2 above just like
> > > SImode since we've supported 16-bit data move between sse and gpr
> > > above SSE2, this will help RA to handle cases like (subreg:HI (reg:V8HI)
> > > 0) or else RA will spill it. This enable optimization for
> > > pieces-memset-{3,37,39}.c
> > > 4. Guard 64/32-bit vector move patterns with ix86_hard_reg_move_ok.
> > >
> > > [1] https://gcc.gnu.org/pipermail/gcc-patches/2022-November/606373.html
> > >
> > > Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
> > > Ok for trunk?
> > >
> > > gcc/ChangeLog:
> > >
> > >         * config/i386/i386.cc (ix86_can_change_mode_class): Also guard
> > >         size of TO.
> > >         (ix86_hard_regno_mode_ok): Remove VALID_AVX512FP16_SCALAR_MODE
> > >         * config/i386/i386.h (VALID_AVX512FP16_SCALAR_MODE): Merged to
> > >         ..
> > >         (VALID_AVX512F_SCALAR_MODE): .. this, also add HImode.
> > >         (VALID_SSE_REG_MODE): Add DI/HImode.
> > >         * config/i386/mmx.md (*mov<mode>_internal): Add
> > >         ix86_hard_reg_move_ok to condition.
> > >
> > > gcc/testsuite/ChangeLog:
> > >
> > >         * gcc.target/i386/pieces-memset-3.c: Remove xfail.
> > >         * gcc.target/i386/pieces-memset-37.c: Remove xfail.
> > >         * gcc.target/i386/pieces-memset-39.c: Remove xfail.
>
> OK.
>
> This is somehow tricky part of the compiler, so it would be nice if
> the patch can be split to a couple of patches to ease bisecting if
> something goes wrong. OTOH, recently there were a couple of similar
> changes in this area, and there were no problems.
Ok, I'll separate ix86_hard_reg_move_ok part into a separate patch.
>
> Thanks,
> Uros.
>
> > > ---
> > >  gcc/config/i386/i386.cc                          |  9 ++-------
> > >  gcc/config/i386/i386.h                           | 16 ++++++++--------
> > >  gcc/config/i386/mmx.md                           |  6 ++++--
> > >  gcc/testsuite/gcc.target/i386/pieces-memset-3.c  |  4 ++--
> > >  gcc/testsuite/gcc.target/i386/pieces-memset-37.c |  4 ++--
> > >  gcc/testsuite/gcc.target/i386/pieces-memset-39.c |  4 ++--
> > >  6 files changed, 20 insertions(+), 23 deletions(-)
> > >
> > > diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
> > > index 292b32c5e99..030c26965ab 100644
> > > --- a/gcc/config/i386/i386.cc
> > > +++ b/gcc/config/i386/i386.cc
> > > @@ -19725,7 +19725,8 @@ ix86_can_change_mode_class (machine_mode from, machine_mode to,
> > >          the vec_dupv4hi pattern.
> > >          NB: SSE2 can load 16bit data to sse register via pinsrw.  */
> > >        int mov_size = MAYBE_SSE_CLASS_P (regclass) && TARGET_SSE2 ? 2 : 4;
> > > -      if (GET_MODE_SIZE (from) < mov_size)
> > > +      if (GET_MODE_SIZE (from) < mov_size
> > > +         || GET_MODE_SIZE (to) < mov_size)
> > >         return false;
> > >      }
> > >
> > > @@ -20089,12 +20090,6 @@ ix86_hard_regno_mode_ok (unsigned int regno, machine_mode mode)
> > >               || VALID_AVX512F_SCALAR_MODE (mode)))
> > >         return true;
> > >
> > > -      /* For AVX512FP16, vmovw supports movement of HImode
> > > -        and HFmode between GPR and SSE registers.  */
> > > -      if (TARGET_AVX512FP16
> > > -         && VALID_AVX512FP16_SCALAR_MODE (mode))
> > > -       return true;
> > > -
> > >        /* For AVX-5124FMAPS or AVX-5124VNNIW
> > >          allow V64SF and V64SI modes for special regnos.  */
> > >        if ((TARGET_AVX5124FMAPS || TARGET_AVX5124VNNIW)
> > > diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
> > > index 3869db8f2d3..d9a1fb0e420 100644
> > > --- a/gcc/config/i386/i386.h
> > > +++ b/gcc/config/i386/i386.h
> > > @@ -1017,11 +1017,9 @@ extern const char *host_detect_local_cpu (int argc, const char **argv);
> > >    (VALID_AVX256_REG_MODE (MODE) || (MODE) == OImode)
> > >
> > >  #define VALID_AVX512F_SCALAR_MODE(MODE)                                        \
> > > -  ((MODE) == DImode || (MODE) == DFmode || (MODE) == SImode            \
> > > -   || (MODE) == SFmode)
> > > -
> > > -#define VALID_AVX512FP16_SCALAR_MODE(MODE)     \
> > > -  ((MODE) == HImode || (MODE) == HFmode)
> > > +  ((MODE) == DImode || (MODE) == DFmode                                        \
> > > +   || (MODE) == SImode || (MODE) == SFmode                             \
> > > +   || (MODE) == HImode || (MODE) == HFmode || (MODE) == BFmode)
> > >
> > >  #define VALID_AVX512F_REG_MODE(MODE)                                   \
> > >    ((MODE) == V8DImode || (MODE) == V8DFmode || (MODE) == V64QImode     \
> > > @@ -1045,13 +1043,15 @@ extern const char *host_detect_local_cpu (int argc, const char **argv);
> > >     || (MODE) == V8HFmode || (MODE) == V4HFmode || (MODE) == V2HFmode   \
> > >     || (MODE) == V8BFmode || (MODE) == V4BFmode || (MODE) == V2BFmode   \
> > >     || (MODE) == V4QImode || (MODE) == V2HImode || (MODE) == V1SImode   \
> > > -   || (MODE) == V2DImode || (MODE) == V2QImode || (MODE) == DFmode     \
> > > -   || (MODE) == HFmode || (MODE) == BFmode)
> > > +   || (MODE) == V2DImode || (MODE) == V2QImode                         \
> > > +   || (MODE) == DFmode || (MODE) == DImode                             \
> > > +   || (MODE) == HFmode || (MODE) == BFmode || (MODE) == HImode)
> > I realize there's no TARGET_SSE2 for VALID_SSE_REG_MODE in
> > hard_regno_mode_ok, it's ok for other modes except HImode which must
> > require SSE2, orelse we can move 16-bit from/to integer
> >
> > So Remove HImode from it, and add it separately in hard_regno_mode_ok
> >  as +      /* Use pinsrw/pextrw to mov 16-bit data from/to sse to/from
> > integer.  */
> > +      if (TARGET_SSE2 && mode == HImode)
> > + return true;
> > +
> >
> >
> > >
> > >  #define VALID_SSE_REG_MODE(MODE)                                       \
> > >    ((MODE) == V1TImode || (MODE) == TImode                              \
> > >     || (MODE) == V4SFmode || (MODE) == V4SImode                         \
> > > -   || (MODE) == SFmode || (MODE) == TFmode || (MODE) == TDmode)
> > > +   || (MODE) == SFmode || (MODE) == SImode                             \
> > > +   || (MODE) == TFmode || (MODE) == TDmode)
> > >
> > >  #define VALID_MMX_REG_MODE_3DNOW(MODE) \
> > >    ((MODE) == V2SFmode || (MODE) == SFmode)
> > > diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md
> > > index d5134cc351e..63aff287795 100644
> > > --- a/gcc/config/i386/mmx.md
> > > +++ b/gcc/config/i386/mmx.md
> > > @@ -133,7 +133,8 @@ (define_insn "*mov<mode>_internal"
> > >         (match_operand:MMXMODE 1 "nonimm_or_0_operand"
> > >      "rCo,rC,C,rm,rC,C  ,!y,m  ,?!y,?!y,r  ,C,v,m,v,v,r,*x,!y"))]
> > >    "(TARGET_MMX || TARGET_MMX_WITH_SSE)
> > > -   && !(MEM_P (operands[0]) && MEM_P (operands[1]))"
> > > +   && !(MEM_P (operands[0]) && MEM_P (operands[1]))
> > > +   && ix86_hardreg_mov_ok (operands[0], operands[1])"
> > >  {
> > >    switch (get_attr_type (insn))
> > >      {
> > > @@ -286,7 +287,8 @@ (define_insn "*mov<mode>_internal"
> > >      "=r ,m ,v,v,v,m,r,v")
> > >         (match_operand:V_32 1 "general_operand"
> > >      "rmC,rC,C,v,m,v,v,r"))]
> > > -  "!(MEM_P (operands[0]) && MEM_P (operands[1]))"
> > > +  "!(MEM_P (operands[0]) && MEM_P (operands[1]))
> > > +   && ix86_hardreg_mov_ok (operands[0], operands[1])"
> > >  {
> > >    switch (get_attr_type (insn))
> > >      {
> > > diff --git a/gcc/testsuite/gcc.target/i386/pieces-memset-3.c b/gcc/testsuite/gcc.target/i386/pieces-memset-3.c
> > > index 765441a7c4a..4f105f58b26 100644
> > > --- a/gcc/testsuite/gcc.target/i386/pieces-memset-3.c
> > > +++ b/gcc/testsuite/gcc.target/i386/pieces-memset-3.c
> > > @@ -13,6 +13,6 @@ foo (int x)
> > >  /* { dg-final { scan-assembler-times "vinserti64x4\[ \\t\]+\[^\n\]*%zmm" 1 } } */
> > >  /* { dg-final { scan-assembler-times "vmovdqu64\[ \\t\]+\[^\n\]*%zmm" 1 } } */
> > >  /* No need to dynamically realign the stack here.  */
> > > -/* { dg-final { scan-assembler-not "and\[^\n\r]*%\[re\]sp" { xfail *-*-* } } } */
> > > +/* { dg-final { scan-assembler-not "and\[^\n\r]*%\[re\]sp" } } */
> > >  /* Nor use a frame pointer.  */
> > > -/* { dg-final { scan-assembler-not "%\[re\]bp" { xfail *-*-* } } } */
> > > +/* { dg-final { scan-assembler-not "%\[re\]bp" } } */
> > > diff --git a/gcc/testsuite/gcc.target/i386/pieces-memset-37.c b/gcc/testsuite/gcc.target/i386/pieces-memset-37.c
> > > index 0c5056be54d..fd09bd153ce 100644
> > > --- a/gcc/testsuite/gcc.target/i386/pieces-memset-37.c
> > > +++ b/gcc/testsuite/gcc.target/i386/pieces-memset-37.c
> > > @@ -10,6 +10,6 @@ foo (int a1, int a2, int a3, int a4, int a5, int a6, int x, char *dst)
> > >  /* { dg-final { scan-assembler-times "vpbroadcastb\[ \\t\]+\[^\n\]*%ymm" 1 } } */
> > >  /* { dg-final { scan-assembler-times "vmovdqu\[ \\t\]+\[^\n\]*%ymm" 2 } } */
> > >  /* No need to dynamically realign the stack here.  */
> > > -/* { dg-final { scan-assembler-not "and\[^\n\r]*%\[re\]sp" { xfail *-*-* } } } */
> > > +/* { dg-final { scan-assembler-not "and\[^\n\r]*%\[re\]sp" } } */
> > >  /* Nor use a frame pointer.  */
> > > -/* { dg-final { scan-assembler-not "%\[re\]bp" { xfail *-*-* } } } */
> > > +/* { dg-final { scan-assembler-not "%\[re\]bp" } } */
> > > diff --git a/gcc/testsuite/gcc.target/i386/pieces-memset-39.c b/gcc/testsuite/gcc.target/i386/pieces-memset-39.c
> > > index e33644c2f10..0ed88b274bd 100644
> > > --- a/gcc/testsuite/gcc.target/i386/pieces-memset-39.c
> > > +++ b/gcc/testsuite/gcc.target/i386/pieces-memset-39.c
> > > @@ -11,6 +11,6 @@ foo (int a1, int a2, int a3, int a4, int a5, int a6, int x, char *dst)
> > >  /* { dg-final { scan-assembler-not "vinserti64x4" } } */
> > >  /* { dg-final { scan-assembler-times "vmovdqu8\[ \\t\]+\[^\n\]*%zmm" 1 } } */
> > >  /* No need to dynamically realign the stack here.  */
> > > -/* { dg-final { scan-assembler-not "and\[^\n\r]*%\[re\]sp" { xfail *-*-* } } } */
> > > +/* { dg-final { scan-assembler-not "and\[^\n\r]*%\[re\]sp" } } */
> > >  /* Nor use a frame pointer.  */
> > > -/* { dg-final { scan-assembler-not "%\[re\]bp" { xfail *-*-* } } } */
> > > +/* { dg-final { scan-assembler-not "%\[re\]bp" } } */
> > > --
> > > 2.27.0
> > >
> >
> >
> > --
> > BR,
> > Hongtao
  

Patch

diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index 292b32c5e99..030c26965ab 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -19725,7 +19725,8 @@  ix86_can_change_mode_class (machine_mode from, machine_mode to,
 	 the vec_dupv4hi pattern.
 	 NB: SSE2 can load 16bit data to sse register via pinsrw.  */
       int mov_size = MAYBE_SSE_CLASS_P (regclass) && TARGET_SSE2 ? 2 : 4;
-      if (GET_MODE_SIZE (from) < mov_size)
+      if (GET_MODE_SIZE (from) < mov_size
+	  || GET_MODE_SIZE (to) < mov_size)
 	return false;
     }
 
@@ -20089,12 +20090,6 @@  ix86_hard_regno_mode_ok (unsigned int regno, machine_mode mode)
 	      || VALID_AVX512F_SCALAR_MODE (mode)))
 	return true;
 
-      /* For AVX512FP16, vmovw supports movement of HImode
-	 and HFmode between GPR and SSE registers.  */
-      if (TARGET_AVX512FP16
-	  && VALID_AVX512FP16_SCALAR_MODE (mode))
-	return true;
-
       /* For AVX-5124FMAPS or AVX-5124VNNIW
 	 allow V64SF and V64SI modes for special regnos.  */
       if ((TARGET_AVX5124FMAPS || TARGET_AVX5124VNNIW)
diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
index 3869db8f2d3..d9a1fb0e420 100644
--- a/gcc/config/i386/i386.h
+++ b/gcc/config/i386/i386.h
@@ -1017,11 +1017,9 @@  extern const char *host_detect_local_cpu (int argc, const char **argv);
   (VALID_AVX256_REG_MODE (MODE) || (MODE) == OImode)
 
 #define VALID_AVX512F_SCALAR_MODE(MODE)					\
-  ((MODE) == DImode || (MODE) == DFmode || (MODE) == SImode		\
-   || (MODE) == SFmode)
-
-#define VALID_AVX512FP16_SCALAR_MODE(MODE)	\
-  ((MODE) == HImode || (MODE) == HFmode)
+  ((MODE) == DImode || (MODE) == DFmode					\
+   || (MODE) == SImode || (MODE) == SFmode				\
+   || (MODE) == HImode || (MODE) == HFmode || (MODE) == BFmode)
 
 #define VALID_AVX512F_REG_MODE(MODE)					\
   ((MODE) == V8DImode || (MODE) == V8DFmode || (MODE) == V64QImode	\
@@ -1045,13 +1043,15 @@  extern const char *host_detect_local_cpu (int argc, const char **argv);
    || (MODE) == V8HFmode || (MODE) == V4HFmode || (MODE) == V2HFmode	\
    || (MODE) == V8BFmode || (MODE) == V4BFmode || (MODE) == V2BFmode	\
    || (MODE) == V4QImode || (MODE) == V2HImode || (MODE) == V1SImode	\
-   || (MODE) == V2DImode || (MODE) == V2QImode || (MODE) == DFmode	\
-   || (MODE) == HFmode || (MODE) == BFmode)
+   || (MODE) == V2DImode || (MODE) == V2QImode				\
+   || (MODE) == DFmode	|| (MODE) == DImode				\
+   || (MODE) == HFmode || (MODE) == BFmode || (MODE) == HImode)
 
 #define VALID_SSE_REG_MODE(MODE)					\
   ((MODE) == V1TImode || (MODE) == TImode				\
    || (MODE) == V4SFmode || (MODE) == V4SImode				\
-   || (MODE) == SFmode || (MODE) == TFmode || (MODE) == TDmode)
+   || (MODE) == SFmode || (MODE) == SImode				\
+   || (MODE) == TFmode || (MODE) == TDmode)
 
 #define VALID_MMX_REG_MODE_3DNOW(MODE) \
   ((MODE) == V2SFmode || (MODE) == SFmode)
diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md
index d5134cc351e..63aff287795 100644
--- a/gcc/config/i386/mmx.md
+++ b/gcc/config/i386/mmx.md
@@ -133,7 +133,8 @@  (define_insn "*mov<mode>_internal"
 	(match_operand:MMXMODE 1 "nonimm_or_0_operand"
     "rCo,rC,C,rm,rC,C  ,!y,m  ,?!y,?!y,r  ,C,v,m,v,v,r,*x,!y"))]
   "(TARGET_MMX || TARGET_MMX_WITH_SSE)
-   && !(MEM_P (operands[0]) && MEM_P (operands[1]))"
+   && !(MEM_P (operands[0]) && MEM_P (operands[1]))
+   && ix86_hardreg_mov_ok (operands[0], operands[1])"
 {
   switch (get_attr_type (insn))
     {
@@ -286,7 +287,8 @@  (define_insn "*mov<mode>_internal"
     "=r ,m ,v,v,v,m,r,v")
 	(match_operand:V_32 1 "general_operand"
     "rmC,rC,C,v,m,v,v,r"))]
-  "!(MEM_P (operands[0]) && MEM_P (operands[1]))"
+  "!(MEM_P (operands[0]) && MEM_P (operands[1]))
+   && ix86_hardreg_mov_ok (operands[0], operands[1])"
 {
   switch (get_attr_type (insn))
     {
diff --git a/gcc/testsuite/gcc.target/i386/pieces-memset-3.c b/gcc/testsuite/gcc.target/i386/pieces-memset-3.c
index 765441a7c4a..4f105f58b26 100644
--- a/gcc/testsuite/gcc.target/i386/pieces-memset-3.c
+++ b/gcc/testsuite/gcc.target/i386/pieces-memset-3.c
@@ -13,6 +13,6 @@  foo (int x)
 /* { dg-final { scan-assembler-times "vinserti64x4\[ \\t\]+\[^\n\]*%zmm" 1 } } */
 /* { dg-final { scan-assembler-times "vmovdqu64\[ \\t\]+\[^\n\]*%zmm" 1 } } */
 /* No need to dynamically realign the stack here.  */
-/* { dg-final { scan-assembler-not "and\[^\n\r]*%\[re\]sp" { xfail *-*-* } } } */
+/* { dg-final { scan-assembler-not "and\[^\n\r]*%\[re\]sp" } } */
 /* Nor use a frame pointer.  */
-/* { dg-final { scan-assembler-not "%\[re\]bp" { xfail *-*-* } } } */
+/* { dg-final { scan-assembler-not "%\[re\]bp" } } */
diff --git a/gcc/testsuite/gcc.target/i386/pieces-memset-37.c b/gcc/testsuite/gcc.target/i386/pieces-memset-37.c
index 0c5056be54d..fd09bd153ce 100644
--- a/gcc/testsuite/gcc.target/i386/pieces-memset-37.c
+++ b/gcc/testsuite/gcc.target/i386/pieces-memset-37.c
@@ -10,6 +10,6 @@  foo (int a1, int a2, int a3, int a4, int a5, int a6, int x, char *dst)
 /* { dg-final { scan-assembler-times "vpbroadcastb\[ \\t\]+\[^\n\]*%ymm" 1 } } */
 /* { dg-final { scan-assembler-times "vmovdqu\[ \\t\]+\[^\n\]*%ymm" 2 } } */
 /* No need to dynamically realign the stack here.  */
-/* { dg-final { scan-assembler-not "and\[^\n\r]*%\[re\]sp" { xfail *-*-* } } } */
+/* { dg-final { scan-assembler-not "and\[^\n\r]*%\[re\]sp" } } */
 /* Nor use a frame pointer.  */
-/* { dg-final { scan-assembler-not "%\[re\]bp" { xfail *-*-* } } } */
+/* { dg-final { scan-assembler-not "%\[re\]bp" } } */
diff --git a/gcc/testsuite/gcc.target/i386/pieces-memset-39.c b/gcc/testsuite/gcc.target/i386/pieces-memset-39.c
index e33644c2f10..0ed88b274bd 100644
--- a/gcc/testsuite/gcc.target/i386/pieces-memset-39.c
+++ b/gcc/testsuite/gcc.target/i386/pieces-memset-39.c
@@ -11,6 +11,6 @@  foo (int a1, int a2, int a3, int a4, int a5, int a6, int x, char *dst)
 /* { dg-final { scan-assembler-not "vinserti64x4" } } */
 /* { dg-final { scan-assembler-times "vmovdqu8\[ \\t\]+\[^\n\]*%zmm" 1 } } */
 /* No need to dynamically realign the stack here.  */
-/* { dg-final { scan-assembler-not "and\[^\n\r]*%\[re\]sp" { xfail *-*-* } } } */
+/* { dg-final { scan-assembler-not "and\[^\n\r]*%\[re\]sp" } } */
 /* Nor use a frame pointer.  */
-/* { dg-final { scan-assembler-not "%\[re\]bp" { xfail *-*-* } } } */
+/* { dg-final { scan-assembler-not "%\[re\]bp" } } */