[11/13,APX,EGPR] Handle legacy insns that only support GPR16 (3/5)
Commit Message
From: Kong Lingling <lingling.kong@intel.com>
Disable EGPR usage for the legacy insns below in opcode map2/3 that have a
VEX but no EVEX counterpart.
insn list:
1. phminposuw/vphminposuw
2. ptest/vptest
3. roundps/vroundps, roundpd/vroundpd,
roundss/vroundss, roundsd/vroundsd
4. pcmpestri/vpcmpestri, pcmpestrm/vpcmpestrm
5. pcmpistri/vpcmpistri, pcmpistrm/vpcmpistrm
6. aesimc/vaesimc, aeskeygenassist/vaeskeygenassist
gcc/ChangeLog:
* config/i386/i386-protos.h (x86_evex_reg_mentioned_p): New
prototype.
* config/i386/i386.cc (x86_evex_reg_mentioned_p): New
function.
* config/i386/i386.md (sse4_1_round<mode>2): Set attr gpr32 0
and constraint Bt/BT to all non-evex alternatives, adjust
alternative outputs if evex reg is mentioned.
* config/i386/sse.md (<sse4_1>_ptest<mode>): Set attr gpr32 0
and constraint Bt/BT to all non-evex alternatives.
(ptesttf2): Likewise.
(<sse4_1>_round<ssemodesuffix><avxsizesuffix>): Likewise.
(sse4_1_round<ssescalarmodesuffix>): Likewise.
(sse4_2_pcmpestri): Likewise.
(sse4_2_pcmpestrm): Likewise.
(sse4_2_pcmpestr_cconly): Likewise.
(sse4_2_pcmpistr): Likewise.
(sse4_2_pcmpistri): Likewise.
(sse4_2_pcmpistrm): Likewise.
(sse4_2_pcmpistr_cconly): Likewise.
(aesimc): Likewise.
(aeskeygenassist): Likewise.
gcc/testsuite/ChangeLog:
* gcc.target/i386/apx-legacy-insn-check-norex2.c: Add intrinsic
tests.
---
gcc/config/i386/i386-protos.h | 1 +
gcc/config/i386/i386.cc | 13 +++
gcc/config/i386/i386.md | 3 +-
gcc/config/i386/sse.md | 93 +++++++++++++------
.../i386/apx-legacy-insn-check-norex2.c | 55 ++++++++++-
5 files changed, 132 insertions(+), 33 deletions(-)
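
The encoding rule behind the commit message can be sketched as a small C model. This is only an illustration, assuming that REX2 reaches legacy opcode maps 0 and 1 only, that VEX has no bits for the extra registers, and that EVEX does; the enum and function names are hypothetical and do not exist in GCC.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model, not GCC code: how an insn is encoded.  */
enum insn_encoding { ENC_LEGACY, ENC_VEX, ENC_EVEX };

/* Return true if an insn with encoding ENC in opcode map OPCODE_MAP can
   address r16-r31.  Assumptions: REX2 only covers legacy maps 0 and 1,
   VEX cannot encode EGPRs at all, and EVEX can.  */
static bool
encoding_supports_egpr (enum insn_encoding enc, int opcode_map)
{
  assert (opcode_map >= 0);
  switch (enc)
    {
    case ENC_LEGACY:
      return opcode_map == 0 || opcode_map == 1;
    case ENC_VEX:
      return false;
    case ENC_EVEX:
      return true;
    }
  return false;
}
```

Under these assumptions, encoding_supports_egpr (ENC_LEGACY, 2) is false for ptest and encoding_supports_egpr (ENC_VEX, 2) is false for vptest, which is why neither form may see an EGPR in its memory operand.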
Comments
On Thu, Aug 31, 2023 at 10:25 AM Hongyu Wang via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
> From: Kong Lingling <lingling.kong@intel.com>
>
> Disable EGPR usage for below legacy insns in opcode map2/3 that have vex
> but no evex counterpart.
>
> insn list:
> 1. phminposuw/vphminposuw
> 2. ptest/vptest
> 3. roundps/vroundps, roundpd/vroundpd,
> roundss/vroundss, roundsd/vroundsd
> 4. pcmpestri/vpcmpestri, pcmpestrm/vpcmpestrm
> 5. pcmpistri/vpcmpistri, pcmpistrm/vpcmpistrm
How are GPRs involved in the above? Or did I misunderstand something?
> 6. aesimc/vaesimc, aeskeygenassist/vaeskeygenassist
>
> gcc/ChangeLog:
>
> * config/i386/i386-protos.h (x86_evex_reg_mentioned_p): New
> prototype.
> * config/i386/i386.cc (x86_evex_reg_mentioned_p): New
> function.
> * config/i386/i386.md (sse4_1_round<mode>2): Set attr gpr32 0
> and constraint Bt/BT to all non-evex alternatives, adjust
> alternative outputs if evex reg is mentioned.
> * config/i386/sse.md (<sse4_1>_ptest<mode>): Set attr gpr32 0
> and constraint Bt/BT to all non-evex alternatives.
> (ptesttf2): Likewise.
> (<sse4_1>_round<ssemodesuffix><avxsizesuffix>): Likewise.
> (sse4_1_round<ssescalarmodesuffix>): Likewise.
> (sse4_2_pcmpestri): Likewise.
> (sse4_2_pcmpestrm): Likewise.
> (sse4_2_pcmpestr_cconly): Likewise.
> (sse4_2_pcmpistr): Likewise.
> (sse4_2_pcmpistri): Likewise.
> (sse4_2_pcmpistrm): Likewise.
> (sse4_2_pcmpistr_cconly): Likewise.
> (aesimc): Likewise.
> (aeskeygenassist): Likewise.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/i386/apx-legacy-insn-check-norex2.c: Add intrinsic
> tests.
> ---
> gcc/config/i386/i386-protos.h | 1 +
> gcc/config/i386/i386.cc | 13 +++
> gcc/config/i386/i386.md | 3 +-
> gcc/config/i386/sse.md | 93 +++++++++++++------
> .../i386/apx-legacy-insn-check-norex2.c | 55 ++++++++++-
> 5 files changed, 132 insertions(+), 33 deletions(-)
>
> diff --git a/gcc/config/i386/i386-protos.h b/gcc/config/i386/i386-protos.h
> index 78eb3e0f584..bbb219e3039 100644
> --- a/gcc/config/i386/i386-protos.h
> +++ b/gcc/config/i386/i386-protos.h
> @@ -65,6 +65,7 @@ extern bool extended_reg_mentioned_p (rtx);
> extern bool x86_extended_QIreg_mentioned_p (rtx_insn *);
> extern bool x86_extended_reg_mentioned_p (rtx);
> extern bool x86_extended_rex2reg_mentioned_p (rtx);
> +extern bool x86_evex_reg_mentioned_p (rtx [], int);
> extern bool x86_maybe_negate_const_int (rtx *, machine_mode);
> extern machine_mode ix86_cc_mode (enum rtx_code, rtx, rtx);
>
> diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
> index f5d642948bc..ec93c5bab97 100644
> --- a/gcc/config/i386/i386.cc
> +++ b/gcc/config/i386/i386.cc
> @@ -22936,6 +22936,19 @@ x86_extended_rex2reg_mentioned_p (rtx insn)
> return false;
> }
>
> +/* Return true if any of the NOPS rtx OPERANDS mentions a register that must
> + be encoded using an EVEX prefix. */
> +bool
> +x86_evex_reg_mentioned_p (rtx operands[], int nops)
> +{
> + int i;
> + for (i = 0; i < nops; i++)
> + if (EXT_REX_SSE_REG_P (operands[i])
> + || x86_extended_rex2reg_mentioned_p (operands[i]))
> + return true;
> + return false;
> +}
> +
> /* If profitable, negate (without causing overflow) integer constant
> of mode MODE at location LOC. Return true in this case. */
> bool
> diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
> index 83ad01b43c1..4c305e72389 100644
> --- a/gcc/config/i386/i386.md
> +++ b/gcc/config/i386/i386.md
> @@ -21603,7 +21603,7 @@ (define_expand "significand<mode>2"
> (define_insn "sse4_1_round<mode>2"
> [(set (match_operand:MODEFH 0 "register_operand" "=x,x,x,v,v")
> (unspec:MODEFH
> - [(match_operand:MODEFH 1 "nonimmediate_operand" "0,x,m,v,m")
> + [(match_operand:MODEFH 1 "nonimmediate_operand" "0,x,Bt,v,m")
> (match_operand:SI 2 "const_0_to_15_operand")]
> UNSPEC_ROUND))]
> "TARGET_SSE4_1"
> @@ -21616,6 +21616,7 @@ (define_insn "sse4_1_round<mode>2"
> [(set_attr "type" "ssecvt")
> (set_attr "prefix_extra" "1,1,1,*,*")
> (set_attr "length_immediate" "1")
> + (set_attr "gpr32" "1,1,0,1,1")
> (set_attr "prefix" "maybe_vex,maybe_vex,maybe_vex,evex,evex")
> (set_attr "isa" "noavx512f,noavx512f,noavx512f,avx512f,avx512f")
> (set_attr "avx_partial_xmm_update" "false,false,true,false,true")
> diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
> index 05963de9219..456713b991a 100644
> --- a/gcc/config/i386/sse.md
> +++ b/gcc/config/i386/sse.md
> @@ -22617,11 +22617,12 @@ (define_insn "avx2_pblendd<mode>"
>
> (define_insn "sse4_1_phminposuw"
> [(set (match_operand:V8HI 0 "register_operand" "=Yr,*x,x")
> - (unspec:V8HI [(match_operand:V8HI 1 "vector_operand" "YrBm,*xBm,xm")]
> + (unspec:V8HI [(match_operand:V8HI 1 "vector_operand" "YrBT,*xBT,xBt")]
> UNSPEC_PHMINPOSUW))]
> "TARGET_SSE4_1"
> "%vphminposuw\t{%1, %0|%0, %1}"
> [(set_attr "isa" "noavx,noavx,avx")
> + (set_attr "gpr32" "0")
> (set_attr "type" "sselog1")
> (set_attr "prefix_extra" "1")
> (set_attr "prefix" "orig,orig,vex")
> @@ -23810,12 +23811,13 @@ (define_insn "avx_vtest<ssemodesuffix><avxsizesuffix>"
> (define_insn "*<sse4_1>_ptest<mode>"
> [(set (reg FLAGS_REG)
> (unspec [(match_operand:V_AVX 0 "register_operand" "Yr, *x, x")
> - (match_operand:V_AVX 1 "vector_operand" "YrBm, *xBm, xm")]
> + (match_operand:V_AVX 1 "vector_operand" "YrBT, *xBT, xBt")]
> UNSPEC_PTEST))]
> "TARGET_SSE4_1 && ix86_match_ptest_ccmode (insn)"
> "%vptest\t{%1, %0|%0, %1}"
> [(set_attr "isa" "noavx,noavx,avx")
> (set_attr "type" "ssecomi")
> + (set_attr "gpr32" "0")
> (set_attr "prefix_extra" "1")
> (set_attr "prefix" "orig,orig,vex")
> (set (attr "btver2_decode")
> @@ -23852,12 +23854,13 @@ (define_expand "<sse4_1>_ptest<mode>"
> (define_insn "ptesttf2"
> [(set (reg:CC FLAGS_REG)
> (unspec:CC [(match_operand:TF 0 "register_operand" "Yr, *x, x")
> - (match_operand:TF 1 "vector_operand" "YrBm, *xBm, xm")]
> + (match_operand:TF 1 "vector_operand" "YrBT, *xBT, xBt")]
> UNSPEC_PTEST))]
> "TARGET_SSE4_1"
> "%vptest\t{%1, %0|%0, %1}"
> [(set_attr "isa" "noavx,noavx,avx")
> (set_attr "type" "ssecomi")
> + (set_attr "gpr32" "0")
> (set_attr "prefix_extra" "1")
> (set_attr "prefix" "orig,orig,vex")
> (set_attr "mode" "TI")])
> @@ -23968,13 +23971,14 @@ (define_expand "lrint<mode><sseintvecmodelower>2"
> (define_insn "<sse4_1>_round<ssemodesuffix><avxsizesuffix>"
> [(set (match_operand:VF_128_256 0 "register_operand" "=Yr,*x,x")
> (unspec:VF_128_256
> - [(match_operand:VF_128_256 1 "vector_operand" "YrBm,*xBm,xm")
> + [(match_operand:VF_128_256 1 "vector_operand" "YrBT,*xBT,xBt")
> (match_operand:SI 2 "const_0_to_15_operand")]
> UNSPEC_ROUND))]
> "TARGET_SSE4_1"
> "%vround<ssemodesuffix>\t{%2, %1, %0|%0, %1, %2}"
> [(set_attr "isa" "noavx,noavx,avx")
> (set_attr "type" "ssecvt")
> + (set_attr "gpr32" "0")
> (set_attr "prefix_data16" "1,1,*")
> (set_attr "prefix_extra" "1")
> (set_attr "length_immediate" "1")
> @@ -24061,19 +24065,32 @@ (define_insn "sse4_1_round<ssescalarmodesuffix>"
> [(set (match_operand:VF_128 0 "register_operand" "=Yr,*x,x,v")
> (vec_merge:VF_128
> (unspec:VF_128
> - [(match_operand:VF_128 2 "nonimmediate_operand" "Yrm,*xm,xm,vm")
> + [(match_operand:VF_128 2 "nonimmediate_operand" "YrBt,*xBt,xBt,vm")
> (match_operand:SI 3 "const_0_to_15_operand")]
> UNSPEC_ROUND)
> (match_operand:VF_128 1 "register_operand" "0,0,x,v")
> (const_int 1)))]
> "TARGET_SSE4_1"
> - "@
> - round<ssescalarmodesuffix>\t{%3, %2, %0|%0, %<iptr>2, %3}
> - round<ssescalarmodesuffix>\t{%3, %2, %0|%0, %<iptr>2, %3}
> - vround<ssescalarmodesuffix>\t{%3, %2, %1, %0|%0, %1, %<iptr>2, %3}
> - vrndscale<ssescalarmodesuffix>\t{%3, %2, %1, %0|%0, %1, %<iptr>2, %3}"
> - [(set_attr "isa" "noavx,noavx,avx,avx512f")
> +{
> + switch (which_alternative)
> + {
> + case 0:
> + case 1:
> + return "round<ssescalarmodesuffix>\t{%3, %2, %0|%0, %<iptr>2, %3}";
> + case 2:
> + return "vround<ssescalarmodesuffix>\t{%3, %2, %1, %0|%0, %1, %<iptr>2, %3}";
> + case 3:
> + if (x86_evex_reg_mentioned_p (operands, 3))
> + return "vrndscale<ssescalarmodesuffix>\t{%3, %2, %1, %0|%0, %1, %<iptr>2, %3}";
> + else
> + return "vround<ssescalarmodesuffix>\t{%3, %2, %1, %0|%0, %1, %<iptr>2, %3}";
> + default:
> + gcc_unreachable ();
> + }
> +}
> + [(set_attr "isa" "noavx,noavx,noavx512f,avx512f")
> (set_attr "type" "ssecvt")
> + (set_attr "gpr32" "0,0,0,1")
> (set_attr "length_immediate" "1")
> (set_attr "prefix_data16" "1,1,*,*")
> (set_attr "prefix_extra" "1")
> @@ -24085,19 +24102,32 @@ (define_insn "*sse4_1_round<ssescalarmodesuffix>"
> (vec_merge:VFH_128
> (vec_duplicate:VFH_128
> (unspec:<ssescalarmode>
> - [(match_operand:<ssescalarmode> 2 "nonimmediate_operand" "Yrm,*xm,xm,vm")
> + [(match_operand:<ssescalarmode> 2 "nonimmediate_operand" "YrBt,*xBt,xBt,vm")
> (match_operand:SI 3 "const_0_to_15_operand")]
> UNSPEC_ROUND))
> (match_operand:VFH_128 1 "register_operand" "0,0,x,v")
> (const_int 1)))]
> "TARGET_SSE4_1"
> - "@
> - round<ssescalarmodesuffix>\t{%3, %2, %0|%0, %2, %3}
> - round<ssescalarmodesuffix>\t{%3, %2, %0|%0, %2, %3}
> - vround<ssescalarmodesuffix>\t{%3, %2, %1, %0|%0, %1, %2, %3}
> - vrndscale<ssescalarmodesuffix>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
> - [(set_attr "isa" "noavx,noavx,avx,avx512f")
> +{
> + switch (which_alternative)
> + {
> + case 0:
> + case 1:
> + return "round<ssescalarmodesuffix>\t{%3, %2, %0|%0, %2, %3}";
> + case 2:
> + return "vround<ssescalarmodesuffix>\t{%3, %2, %1, %0|%0, %1, %2, %3}";
> + case 3:
> + if (x86_evex_reg_mentioned_p (operands, 3) || <MODE>mode == V8HFmode)
> + return "vrndscale<ssescalarmodesuffix>\t{%3, %2, %1, %0|%0, %1, %2, %3}";
> + else
> + return "vround<ssescalarmodesuffix>\t{%3, %2, %1, %0|%0, %1, %2, %3}";
> + default:
> + gcc_unreachable ();
> + }
> +}
> + [(set_attr "isa" "noavx,noavx,noavx512f,avx512f")
> (set_attr "type" "ssecvt")
> + (set_attr "gpr32" "0,0,0,1")
> (set_attr "length_immediate" "1")
> (set_attr "prefix_data16" "1,1,*,*")
> (set_attr "prefix_extra" "1")
> @@ -24318,7 +24348,7 @@ (define_insn "sse4_2_pcmpestri"
> (unspec:SI
> [(match_operand:V16QI 1 "register_operand" "x,x")
> (match_operand:SI 2 "register_operand" "a,a")
> - (match_operand:V16QI 3 "nonimmediate_operand" "x,m")
> + (match_operand:V16QI 3 "nonimmediate_operand" "x,Bt")
> (match_operand:SI 4 "register_operand" "d,d")
> (match_operand:SI 5 "const_0_to_255_operand")]
> UNSPEC_PCMPESTR))
> @@ -24333,6 +24363,7 @@ (define_insn "sse4_2_pcmpestri"
> "TARGET_SSE4_2"
> "%vpcmpestri\t{%5, %3, %1|%1, %3, %5}"
> [(set_attr "type" "sselog")
> + (set_attr "gpr32" "0")
> (set_attr "prefix_extra" "1")
> (set_attr "prefix" "maybe_vex")
> (set_attr "length_immediate" "1")
> @@ -24345,7 +24376,7 @@ (define_insn "sse4_2_pcmpestrm"
> (unspec:V16QI
> [(match_operand:V16QI 1 "register_operand" "x,x")
> (match_operand:SI 2 "register_operand" "a,a")
> - (match_operand:V16QI 3 "nonimmediate_operand" "x,m")
> + (match_operand:V16QI 3 "nonimmediate_operand" "x,Bt")
> (match_operand:SI 4 "register_operand" "d,d")
> (match_operand:SI 5 "const_0_to_255_operand")]
> UNSPEC_PCMPESTR))
> @@ -24360,6 +24391,7 @@ (define_insn "sse4_2_pcmpestrm"
> "TARGET_SSE4_2"
> "%vpcmpestrm\t{%5, %3, %1|%1, %3, %5}"
> [(set_attr "type" "sselog")
> + (set_attr "gpr32" "0")
> (set_attr "prefix_extra" "1")
> (set_attr "length_immediate" "1")
> (set_attr "prefix" "maybe_vex")
> @@ -24372,7 +24404,7 @@ (define_insn "sse4_2_pcmpestr_cconly"
> (unspec:CC
> [(match_operand:V16QI 2 "register_operand" "x,x,x,x")
> (match_operand:SI 3 "register_operand" "a,a,a,a")
> - (match_operand:V16QI 4 "nonimmediate_operand" "x,m,x,m")
> + (match_operand:V16QI 4 "nonimmediate_operand" "x,Bt,x,Bt")
> (match_operand:SI 5 "register_operand" "d,d,d,d")
> (match_operand:SI 6 "const_0_to_255_operand")]
> UNSPEC_PCMPESTR))
> @@ -24385,6 +24417,7 @@ (define_insn "sse4_2_pcmpestr_cconly"
> %vpcmpestri\t{%6, %4, %2|%2, %4, %6}
> %vpcmpestri\t{%6, %4, %2|%2, %4, %6}"
> [(set_attr "type" "sselog")
> + (set_attr "gpr32" "0")
> (set_attr "prefix_extra" "1")
> (set_attr "length_immediate" "1")
> (set_attr "memory" "none,load,none,load")
> @@ -24396,7 +24429,7 @@ (define_insn_and_split "sse4_2_pcmpistr"
> [(set (match_operand:SI 0 "register_operand" "=c,c")
> (unspec:SI
> [(match_operand:V16QI 2 "register_operand" "x,x")
> - (match_operand:V16QI 3 "nonimmediate_operand" "x,m")
> + (match_operand:V16QI 3 "nonimmediate_operand" "x,Bt")
> (match_operand:SI 4 "const_0_to_255_operand")]
> UNSPEC_PCMPISTR))
> (set (match_operand:V16QI 1 "register_operand" "=Yz,Yz")
> @@ -24439,6 +24472,7 @@ (define_insn_and_split "sse4_2_pcmpistr"
> DONE;
> }
> [(set_attr "type" "sselog")
> + (set_attr "gpr32" "0")
> (set_attr "prefix_extra" "1")
> (set_attr "length_immediate" "1")
> (set_attr "memory" "none,load")
> @@ -24448,7 +24482,7 @@ (define_insn "sse4_2_pcmpistri"
> [(set (match_operand:SI 0 "register_operand" "=c,c")
> (unspec:SI
> [(match_operand:V16QI 1 "register_operand" "x,x")
> - (match_operand:V16QI 2 "nonimmediate_operand" "x,m")
> + (match_operand:V16QI 2 "nonimmediate_operand" "x,Bt")
> (match_operand:SI 3 "const_0_to_255_operand")]
> UNSPEC_PCMPISTR))
> (set (reg:CC FLAGS_REG)
> @@ -24460,6 +24494,7 @@ (define_insn "sse4_2_pcmpistri"
> "TARGET_SSE4_2"
> "%vpcmpistri\t{%3, %2, %1|%1, %2, %3}"
> [(set_attr "type" "sselog")
> + (set_attr "gpr32" "0")
> (set_attr "prefix_extra" "1")
> (set_attr "length_immediate" "1")
> (set_attr "prefix" "maybe_vex")
> @@ -24471,7 +24506,7 @@ (define_insn "sse4_2_pcmpistrm"
> [(set (match_operand:V16QI 0 "register_operand" "=Yz,Yz")
> (unspec:V16QI
> [(match_operand:V16QI 1 "register_operand" "x,x")
> - (match_operand:V16QI 2 "nonimmediate_operand" "x,m")
> + (match_operand:V16QI 2 "nonimmediate_operand" "x,Bt")
> (match_operand:SI 3 "const_0_to_255_operand")]
> UNSPEC_PCMPISTR))
> (set (reg:CC FLAGS_REG)
> @@ -24483,6 +24518,7 @@ (define_insn "sse4_2_pcmpistrm"
> "TARGET_SSE4_2"
> "%vpcmpistrm\t{%3, %2, %1|%1, %2, %3}"
> [(set_attr "type" "sselog")
> + (set_attr "gpr32" "0")
> (set_attr "prefix_extra" "1")
> (set_attr "length_immediate" "1")
> (set_attr "prefix" "maybe_vex")
> @@ -24494,7 +24530,7 @@ (define_insn "sse4_2_pcmpistr_cconly"
> [(set (reg:CC FLAGS_REG)
> (unspec:CC
> [(match_operand:V16QI 2 "register_operand" "x,x,x,x")
> - (match_operand:V16QI 3 "nonimmediate_operand" "x,m,x,m")
> + (match_operand:V16QI 3 "nonimmediate_operand" "x,Bt,x,Bt")
> (match_operand:SI 4 "const_0_to_255_operand")]
> UNSPEC_PCMPISTR))
> (clobber (match_scratch:V16QI 0 "=Yz,Yz,X,X"))
> @@ -24506,6 +24542,7 @@ (define_insn "sse4_2_pcmpistr_cconly"
> %vpcmpistri\t{%4, %3, %2|%2, %3, %4}
> %vpcmpistri\t{%4, %3, %2|%2, %3, %4}"
> [(set_attr "type" "sselog")
> + (set_attr "gpr32" "0")
> (set_attr "prefix_extra" "1")
> (set_attr "length_immediate" "1")
> (set_attr "memory" "none,load,none,load")
> @@ -25990,23 +26027,25 @@ (define_insn "aesdeclast"
>
> (define_insn "aesimc"
> [(set (match_operand:V2DI 0 "register_operand" "=x")
> - (unspec:V2DI [(match_operand:V2DI 1 "vector_operand" "xBm")]
> + (unspec:V2DI [(match_operand:V2DI 1 "vector_operand" "xBT")]
> UNSPEC_AESIMC))]
> "TARGET_AES"
> "%vaesimc\t{%1, %0|%0, %1}"
> [(set_attr "type" "sselog1")
> + (set_attr "gpr32" "0")
> (set_attr "prefix_extra" "1")
> (set_attr "prefix" "maybe_vex")
> (set_attr "mode" "TI")])
>
> (define_insn "aeskeygenassist"
> [(set (match_operand:V2DI 0 "register_operand" "=x")
> - (unspec:V2DI [(match_operand:V2DI 1 "vector_operand" "xBm")
> + (unspec:V2DI [(match_operand:V2DI 1 "vector_operand" "xBT")
> (match_operand:SI 2 "const_0_to_255_operand")]
> UNSPEC_AESKEYGENASSIST))]
> "TARGET_AES"
> "%vaeskeygenassist\t{%2, %1, %0|%0, %1, %2}"
> [(set_attr "type" "sselog1")
> + (set_attr "gpr32" "0")
> (set_attr "prefix_extra" "1")
> (set_attr "length_immediate" "1")
> (set_attr "prefix" "maybe_vex")
> diff --git a/gcc/testsuite/gcc.target/i386/apx-legacy-insn-check-norex2.c b/gcc/testsuite/gcc.target/i386/apx-legacy-insn-check-norex2.c
> index 510213a6ca7..771bcb078e1 100644
> --- a/gcc/testsuite/gcc.target/i386/apx-legacy-insn-check-norex2.c
> +++ b/gcc/testsuite/gcc.target/i386/apx-legacy-insn-check-norex2.c
> @@ -45,13 +45,22 @@ typedef union
> DTYPE a[16];
> } tmp_u;
>
> -__attribute__((target("sse4.2")))
> +__attribute__((target("sse4.2,aes")))
> void sse_test ()
> {
> register tmp_u *tdst __asm__("%r16");
> register tmp_u *src1 __asm__("%r17");
> register tmp_u *src2 __asm__("%r18");
> -
> +
> + src1->xi[0] = _mm_minpos_epu16 (src1->xi[1]);
> + src1->a[2] = _mm_testc_si128 (src1->xi[3], src2->xi[4]);
> + src1->xf[3] = _mm_round_ss (src1->xf[5], src2->xf[6],
> + _MM_FROUND_CUR_DIRECTION);
> + src1->xf[4] = _mm_round_ps (src1->xf[7], _MM_FROUND_CUR_DIRECTION);
> + src1->xd[0] = _mm_round_sd (src1->xd[2], src2->xd[3],
> + _MM_FROUND_CUR_DIRECTION);
> + src1->xd[1] = _mm_round_pd (src1->xd[4], _MM_FROUND_CUR_DIRECTION);
> +
> src1->xi[0] = _mm_hadd_epi16 (tdst->xi[2], src2->xi[3]);
> src1->xi[1] = _mm_hadd_epi32 (tdst->xi[0], src2->xi[1]);
> tdst->xi[2] = _mm_hadds_epi16 (src1->xi[4], src2->xi[5]);
> @@ -77,16 +86,33 @@ void sse_test ()
> tdst->xi[1] = _mm_sign_epi8 (src1->xi[5], src2->xi[6]);
> tdst->xi[2] = _mm_sign_epi16 (src1->xi[7], src2->xi[0]);
> tdst->xi[3] = _mm_sign_epi32 (src1->xi[1], src2->xi[2]);
> +
> + tdst->a[2] = _mm_cmpestri (src1->xi[3], 16, src2->xi[4], 16, 0x0c);
> + tdst->xi[4] = _mm_cmpestrm (src1->xi[3], 16, src2->xi[4], 16, 0x20);
> + tdst->a[5] = _mm_cmpistri (src1->xi[5], src2->xi[6], 0x30);
> + tdst->xi[6] = _mm_cmpistrm (src1->xi[5], src2->xi[6], 0x40);
> +
> + tdst->xi[7] = _mm_aesimc_si128 (src1->xi[7]);
> + tdst->xi[0] = _mm_aeskeygenassist_si128 (src1->xi[1], 0x1b);
> }
>
> -__attribute__((target("avx2")))
> +__attribute__((target("avx2,aes")))
> void vex_test ()
> {
>
> register tmp_u *tdst __asm__("%r16");
> register tmp_u *src1 __asm__("%r17");
> register tmp_u *src2 __asm__("%r18");
> -
> +
> + src1->xi[0] = _mm_minpos_epu16 (src1->xi[1]);
> + src1->a[2] = _mm256_testc_si256 (src1->yi[2], src2->yi[3]);
> + src1->xf[3] = _mm_round_ss (src1->xf[5], src2->xf[6],
> + _MM_FROUND_CUR_DIRECTION);
> + src1->yf[4] = _mm256_round_ps (src1->yf[2], _MM_FROUND_CUR_DIRECTION);
> + src1->xd[0] = _mm_round_sd (src1->xd[2], src2->xd[3],
> + _MM_FROUND_CUR_DIRECTION);
> + src1->yd[1] = _mm256_round_pd (src1->yd[3], _MM_FROUND_CUR_DIRECTION);
> +
> src1->yi[1] = _mm256_hadd_epi16 (tdst->yi[2], src2->yi[3]);
> src1->yi[2] = _mm256_hadd_epi32 (tdst->yi[0], src2->yi[1]);
> tdst->yi[3] = _mm256_hadds_epi16 (src1->yi[1], src2->yi[2]);
> @@ -98,7 +124,6 @@ void vex_test ()
> src1->yi[1] = _mm256_cmpgt_epi64 (tdst->yi[3], src2->yi[0]);
>
> tdst->yf[2] = _mm256_dp_ps (src1->yf[0], src2->yf[1], 0xbf);
> - tdst->xd[3] = _mm_dp_pd (src1->xd[0], src2->xd[1], 0xbf);
>
> tdst->yi[3] = _mm256_mpsadbw_epu8 (src1->yi[1], src2->yi[1], 0xc1);
>
> @@ -112,6 +137,14 @@ void vex_test ()
> tdst->yi[2] = _mm256_sign_epi8 (src1->yi[0], src2->yi[1]);
> tdst->yi[3] = _mm256_sign_epi16 (src1->yi[2], src2->yi[3]);
> tdst->yi[0] = _mm256_sign_epi32 (src1->yi[0], src2->yi[1]);
> +
> + tdst->a[2] = _mm_cmpestri (src1->xi[3], 16, src2->xi[4], 16, 0x0c);
> + tdst->xi[4] = _mm_cmpestrm (src1->xi[3], 16, src2->xi[4], 16, 0x20);
> + tdst->a[5] = _mm_cmpistri (src1->xi[5], src2->xi[6], 0x30);
> + tdst->xi[6] = _mm_cmpistrm (src1->xi[5], src2->xi[6], 0x40);
> +
> + tdst->xi[7] = _mm_aesimc_si128 (src1->xi[7]);
> + tdst->xi[0] = _mm_aeskeygenassist_si128 (src1->xi[1], 0x1b);
> }
>
> /* { dg-final { scan-assembler-not "v?pcmpeqq\[ \\t]+\\\.\\\*r\(1\[6-9\]\|2\[0-9\]|30\|31\)" } } */
> @@ -134,3 +167,15 @@ void vex_test ()
> /* { dg-final { scan-assembler-not "v?psignb\[ \\t]+\\\.\\\*r\(1\[6-9\]\|2\[0-9\]|30\|31\)" } } */
> /* { dg-final { scan-assembler-not "v?psignw\[ \\t]+\\\.\\\*r\(1\[6-9\]\|2\[0-9\]|30\|31\)" } } */
> /* { dg-final { scan-assembler-not "v?psignd\[ \\t]+\\\.\\\*r\(1\[6-9\]\|2\[0-9\]|30\|31\)" } } */
> +/* { dg-final { scan-assembler-not "v?phminposuw\[ \\t]+\\\.\\\*r\(1\[6-9\]\|2\[0-9\]|30\|31\)" } } */
> +/* { dg-final { scan-assembler-not "v?ptest\[ \\t]+\\\.\\\*r\(1\[6-9\]\|2\[0-9\]|30\|31\)" } } */
> +/* { dg-final { scan-assembler-not "v?roundss\[ \\t]+\\\.\\\*r\(1\[6-9\]\|2\[0-9\]|30\|31\)" } } */
> +/* { dg-final { scan-assembler-not "v?roundsd\[ \\t]+\\\.\\\*r\(1\[6-9\]\|2\[0-9\]|30\|31\)" } } */
> +/* { dg-final { scan-assembler-not "v?roundps\[ \\t]+\\\.\\\*r\(1\[6-9\]\|2\[0-9\]|30\|31\)" } } */
> +/* { dg-final { scan-assembler-not "v?roundpd\[ \\t]+\\\.\\\*r\(1\[6-9\]\|2\[0-9\]|30\|31\)" } } */
> +/* { dg-final { scan-assembler-not "v?pcmpestri\[ \\t]+\\\.\\\*r\(1\[6-9\]\|2\[0-9\]|30\|31\)" } } */
> +/* { dg-final { scan-assembler-not "v?pcmpistri\[ \\t]+\\\.\\\*r\(1\[6-9\]\|2\[0-9\]|30\|31\)" } } */
> +/* { dg-final { scan-assembler-not "v?pcmpestrm\[ \\t]+\\\.\\\*r\(1\[6-9\]\|2\[0-9\]|30\|31\)" } } */
> +/* { dg-final { scan-assembler-not "v?pcmpistrm\[ \\t]+\\\.\\\*r\(1\[6-9\]\|2\[0-9\]|30\|31\)" } } */
> +/* { dg-final { scan-assembler-not "v?aesimc\[ \\t]+\\\.\\\*r\(1\[6-9\]\|2\[0-9\]|30\|31\)" } } */
> +/* { dg-final { scan-assembler-not "v?aeskeygenassist\[ \\t]+\\\.\\\*r\(1\[6-9\]\|2\[0-9\]|30\|31\)" } } */
> --
> 2.31.1
>
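
The output selection that the quoted patch adds to the last alternative of the two scalar round patterns can be modelled roughly as below. This is a simplified sketch, not GCC code: struct operand_model, evex_reg_mentioned and select_round_form are hypothetical names, operands are reduced to plain register numbers, and the extra V8HFmode check in the second pattern is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an rtx operand; not a GCC type.  */
struct operand_model
{
  int xmm_regno;      /* SSE register number, or -1 if not an SSE reg.  */
  int addr_gpr_regno; /* Highest GPR number used in a memory address,
                         or -1 if the operand has no address.  */
};

enum round_form { FORM_VROUND, FORM_VRNDSCALE };

/* Models the intent of x86_evex_reg_mentioned_p: true if any operand
   uses xmm16-xmm31 or an address register r16-r31.  */
static bool
evex_reg_mentioned (const struct operand_model *ops, size_t nops)
{
  assert (ops != NULL || nops == 0);
  for (size_t i = 0; i < nops; i++)
    if (ops[i].xmm_regno >= 16 || ops[i].addr_gpr_regno >= 16)
      return true;
  return false;
}

/* Pick the EVEX-encoded form only when some operand requires it;
   otherwise fall back to the shorter VEX-encoded form.  */
static enum round_form
select_round_form (const struct operand_model *ops, size_t nops)
{
  return evex_reg_mentioned (ops, nops) ? FORM_VRNDSCALE : FORM_VROUND;
}
```

With operands limited to xmm0-xmm15 and legacy address registers the model picks FORM_VROUND; a single xmm16+ register or an r16+ address register switches it to FORM_VRNDSCALE.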
On Thu, Aug 31, 2023 at 11:26 AM Richard Biener
<richard.guenther@gmail.com> wrote:
>
> On Thu, Aug 31, 2023 at 10:25 AM Hongyu Wang via Gcc-patches
> <gcc-patches@gcc.gnu.org> wrote:
> >
> > From: Kong Lingling <lingling.kong@intel.com>
> >
> > Disable EGPR usage for below legacy insns in opcode map2/3 that have vex
> > but no evex counterpart.
> >
> > insn list:
> > 1. phminposuw/vphminposuw
> > 2. ptest/vptest
> > 3. roundps/vroundps, roundpd/vroundpd,
> > roundss/vroundss, roundsd/vroundsd
> > 4. pcmpestri/vpcmpestri, pcmpestrm/vpcmpestrm
> > 5. pcmpistri/vpcmpistri, pcmpistrm/vpcmpistrm
>
> How are GPRs involved in the above? Or did I misunderstand something?
Following up on my own question: this is for the memory operand alternatives,
I guess. How about simply disabling the memory alternatives when EGPR is
active? Wouldn't that simplify the initial patchset a lot? They could then be
re-enabled in a follow-up once deemed important.
Richard.
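
The difference between the two approaches discussed here can be illustrated with a small C model. It is only a sketch under simplifying assumptions: the patch is taken to keep the memory alternative but reject addresses that use r16-r31, while the suggestion above is taken to drop the memory alternative whenever EGPR is enabled. All names are hypothetical and none of this is GCC code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical policies, named for this sketch only.  */
enum egpr_policy
{
  RESTRICT_ADDRESS_TO_GPR16,  /* Keep memory alternative, forbid EGPR address.  */
  DISABLE_MEMORY_ALTERNATIVE  /* Drop memory alternative when EGPR is enabled.  */
};

struct alt_query
{
  bool is_memory_alt;     /* The alternative takes a memory operand.  */
  bool egpr_enabled;      /* EGPR usage is active for this function.  */
  bool address_uses_egpr; /* The candidate address mentions r16-r31.  */
};

/* Return true if the alternative may be used under POLICY.  */
static bool
alternative_usable (enum egpr_policy policy, struct alt_query q)
{
  /* An EGPR address can only arise when EGPRs are enabled.  */
  assert (!q.address_uses_egpr || q.egpr_enabled);

  if (!q.is_memory_alt)
    return true;

  switch (policy)
    {
    case RESTRICT_ADDRESS_TO_GPR16:
      return !q.address_uses_egpr;
    case DISABLE_MEMORY_ALTERNATIVE:
      return !q.egpr_enabled;
    }
  return false;
}
```

In this model the two policies differ only for a memory operand whose address stays within the legacy registers while EGPR is enabled: the first policy still allows it, the second forces a separate load.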
> > (set_attr "length_immediate" "1")
> > @@ -24345,7 +24376,7 @@ (define_insn "sse4_2_pcmpestrm"
> > (unspec:V16QI
> > [(match_operand:V16QI 1 "register_operand" "x,x")
> > (match_operand:SI 2 "register_operand" "a,a")
> > - (match_operand:V16QI 3 "nonimmediate_operand" "x,m")
> > + (match_operand:V16QI 3 "nonimmediate_operand" "x,Bt")
> > (match_operand:SI 4 "register_operand" "d,d")
> > (match_operand:SI 5 "const_0_to_255_operand")]
> > UNSPEC_PCMPESTR))
> > @@ -24360,6 +24391,7 @@ (define_insn "sse4_2_pcmpestrm"
> > "TARGET_SSE4_2"
> > "%vpcmpestrm\t{%5, %3, %1|%1, %3, %5}"
> > [(set_attr "type" "sselog")
> > + (set_attr "gpr32" "0")
> > (set_attr "prefix_extra" "1")
> > (set_attr "length_immediate" "1")
> > (set_attr "prefix" "maybe_vex")
> > @@ -24372,7 +24404,7 @@ (define_insn "sse4_2_pcmpestr_cconly"
> > (unspec:CC
> > [(match_operand:V16QI 2 "register_operand" "x,x,x,x")
> > (match_operand:SI 3 "register_operand" "a,a,a,a")
> > - (match_operand:V16QI 4 "nonimmediate_operand" "x,m,x,m")
> > + (match_operand:V16QI 4 "nonimmediate_operand" "x,Bt,x,Bt")
> > (match_operand:SI 5 "register_operand" "d,d,d,d")
> > (match_operand:SI 6 "const_0_to_255_operand")]
> > UNSPEC_PCMPESTR))
> > @@ -24385,6 +24417,7 @@ (define_insn "sse4_2_pcmpestr_cconly"
> > %vpcmpestri\t{%6, %4, %2|%2, %4, %6}
> > %vpcmpestri\t{%6, %4, %2|%2, %4, %6}"
> > [(set_attr "type" "sselog")
> > + (set_attr "gpr32" "0")
> > (set_attr "prefix_extra" "1")
> > (set_attr "length_immediate" "1")
> > (set_attr "memory" "none,load,none,load")
> > @@ -24396,7 +24429,7 @@ (define_insn_and_split "sse4_2_pcmpistr"
> > [(set (match_operand:SI 0 "register_operand" "=c,c")
> > (unspec:SI
> > [(match_operand:V16QI 2 "register_operand" "x,x")
> > - (match_operand:V16QI 3 "nonimmediate_operand" "x,m")
> > + (match_operand:V16QI 3 "nonimmediate_operand" "x,Bt")
> > (match_operand:SI 4 "const_0_to_255_operand")]
> > UNSPEC_PCMPISTR))
> > (set (match_operand:V16QI 1 "register_operand" "=Yz,Yz")
> > @@ -24439,6 +24472,7 @@ (define_insn_and_split "sse4_2_pcmpistr"
> > DONE;
> > }
> > [(set_attr "type" "sselog")
> > + (set_attr "gpr32" "0")
> > (set_attr "prefix_extra" "1")
> > (set_attr "length_immediate" "1")
> > (set_attr "memory" "none,load")
> > @@ -24448,7 +24482,7 @@ (define_insn "sse4_2_pcmpistri"
> > [(set (match_operand:SI 0 "register_operand" "=c,c")
> > (unspec:SI
> > [(match_operand:V16QI 1 "register_operand" "x,x")
> > - (match_operand:V16QI 2 "nonimmediate_operand" "x,m")
> > + (match_operand:V16QI 2 "nonimmediate_operand" "x,Bt")
> > (match_operand:SI 3 "const_0_to_255_operand")]
> > UNSPEC_PCMPISTR))
> > (set (reg:CC FLAGS_REG)
> > @@ -24460,6 +24494,7 @@ (define_insn "sse4_2_pcmpistri"
> > "TARGET_SSE4_2"
> > "%vpcmpistri\t{%3, %2, %1|%1, %2, %3}"
> > [(set_attr "type" "sselog")
> > + (set_attr "gpr32" "0")
> > (set_attr "prefix_extra" "1")
> > (set_attr "length_immediate" "1")
> > (set_attr "prefix" "maybe_vex")
> > @@ -24471,7 +24506,7 @@ (define_insn "sse4_2_pcmpistrm"
> > [(set (match_operand:V16QI 0 "register_operand" "=Yz,Yz")
> > (unspec:V16QI
> > [(match_operand:V16QI 1 "register_operand" "x,x")
> > - (match_operand:V16QI 2 "nonimmediate_operand" "x,m")
> > + (match_operand:V16QI 2 "nonimmediate_operand" "x,Bt")
> > (match_operand:SI 3 "const_0_to_255_operand")]
> > UNSPEC_PCMPISTR))
> > (set (reg:CC FLAGS_REG)
> > @@ -24483,6 +24518,7 @@ (define_insn "sse4_2_pcmpistrm"
> > "TARGET_SSE4_2"
> > "%vpcmpistrm\t{%3, %2, %1|%1, %2, %3}"
> > [(set_attr "type" "sselog")
> > + (set_attr "gpr32" "0")
> > (set_attr "prefix_extra" "1")
> > (set_attr "length_immediate" "1")
> > (set_attr "prefix" "maybe_vex")
> > @@ -24494,7 +24530,7 @@ (define_insn "sse4_2_pcmpistr_cconly"
> > [(set (reg:CC FLAGS_REG)
> > (unspec:CC
> > [(match_operand:V16QI 2 "register_operand" "x,x,x,x")
> > - (match_operand:V16QI 3 "nonimmediate_operand" "x,m,x,m")
> > + (match_operand:V16QI 3 "nonimmediate_operand" "x,Bt,x,Bt")
> > (match_operand:SI 4 "const_0_to_255_operand")]
> > UNSPEC_PCMPISTR))
> > (clobber (match_scratch:V16QI 0 "=Yz,Yz,X,X"))
> > @@ -24506,6 +24542,7 @@ (define_insn "sse4_2_pcmpistr_cconly"
> > %vpcmpistri\t{%4, %3, %2|%2, %3, %4}
> > %vpcmpistri\t{%4, %3, %2|%2, %3, %4}"
> > [(set_attr "type" "sselog")
> > + (set_attr "gpr32" "0")
> > (set_attr "prefix_extra" "1")
> > (set_attr "length_immediate" "1")
> > (set_attr "memory" "none,load,none,load")
> > @@ -25990,23 +26027,25 @@ (define_insn "aesdeclast"
> >
> > (define_insn "aesimc"
> > [(set (match_operand:V2DI 0 "register_operand" "=x")
> > - (unspec:V2DI [(match_operand:V2DI 1 "vector_operand" "xBm")]
> > + (unspec:V2DI [(match_operand:V2DI 1 "vector_operand" "xBT")]
> > UNSPEC_AESIMC))]
> > "TARGET_AES"
> > "%vaesimc\t{%1, %0|%0, %1}"
> > [(set_attr "type" "sselog1")
> > + (set_attr "gpr32" "0")
> > (set_attr "prefix_extra" "1")
> > (set_attr "prefix" "maybe_vex")
> > (set_attr "mode" "TI")])
> >
> > (define_insn "aeskeygenassist"
> > [(set (match_operand:V2DI 0 "register_operand" "=x")
> > - (unspec:V2DI [(match_operand:V2DI 1 "vector_operand" "xBm")
> > + (unspec:V2DI [(match_operand:V2DI 1 "vector_operand" "xBT")
> > (match_operand:SI 2 "const_0_to_255_operand")]
> > UNSPEC_AESKEYGENASSIST))]
> > "TARGET_AES"
> > "%vaeskeygenassist\t{%2, %1, %0|%0, %1, %2}"
> > [(set_attr "type" "sselog1")
> > + (set_attr "gpr32" "0")
> > (set_attr "prefix_extra" "1")
> > (set_attr "length_immediate" "1")
> > (set_attr "prefix" "maybe_vex")
> > diff --git a/gcc/testsuite/gcc.target/i386/apx-legacy-insn-check-norex2.c b/gcc/testsuite/gcc.target/i386/apx-legacy-insn-check-norex2.c
> > index 510213a6ca7..771bcb078e1 100644
> > --- a/gcc/testsuite/gcc.target/i386/apx-legacy-insn-check-norex2.c
> > +++ b/gcc/testsuite/gcc.target/i386/apx-legacy-insn-check-norex2.c
> > @@ -45,13 +45,22 @@ typedef union
> > DTYPE a[16];
> > } tmp_u;
> >
> > -__attribute__((target("sse4.2")))
> > +__attribute__((target("sse4.2,aes")))
> > void sse_test ()
> > {
> > register tmp_u *tdst __asm__("%r16");
> > register tmp_u *src1 __asm__("%r17");
> > register tmp_u *src2 __asm__("%r18");
> > -
> > +
> > + src1->xi[0] = _mm_minpos_epu16 (src1->xi[1]);
> > + src1->a[2] = _mm_testc_si128 (src1->xi[3], src2->xi[4]);
> > + src1->xf[3] = _mm_round_ss (src1->xf[5], src2->xf[6],
> > + _MM_FROUND_CUR_DIRECTION);
> > + src1->xf[4] = _mm_round_ps (src1->xf[7], _MM_FROUND_CUR_DIRECTION);
> > + src1->xd[0] = _mm_round_sd (src1->xd[2], src2->xd[3],
> > + _MM_FROUND_CUR_DIRECTION);
> > + src1->xd[1] = _mm_round_pd (src1->xd[4], _MM_FROUND_CUR_DIRECTION);
> > +
> > src1->xi[0] = _mm_hadd_epi16 (tdst->xi[2], src2->xi[3]);
> > src1->xi[1] = _mm_hadd_epi32 (tdst->xi[0], src2->xi[1]);
> > tdst->xi[2] = _mm_hadds_epi16 (src1->xi[4], src2->xi[5]);
> > @@ -77,16 +86,33 @@ void sse_test ()
> > tdst->xi[1] = _mm_sign_epi8 (src1->xi[5], src2->xi[6]);
> > tdst->xi[2] = _mm_sign_epi16 (src1->xi[7], src2->xi[0]);
> > tdst->xi[3] = _mm_sign_epi32 (src1->xi[1], src2->xi[2]);
> > +
> > + tdst->a[2] = _mm_cmpestri (src1->xi[3], 16, src2->xi[4], 16, 0x0c);
> > + tdst->xi[4] = _mm_cmpestrm (src1->xi[3], 16, src2->xi[4], 16, 0x20);
> > + tdst->a[5] = _mm_cmpistri (src1->xi[5], src2->xi[6], 0x30);
> > + tdst->xi[6] = _mm_cmpistrm (src1->xi[5], src2->xi[6], 0x40);
> > +
> > + tdst->xi[7] = _mm_aesimc_si128 (src1->xi[7]);
> > + tdst->xi[0] = _mm_aeskeygenassist_si128 (src1->xi[1], 0x1b);
> > }
> >
> > -__attribute__((target("avx2")))
> > +__attribute__((target("avx2,aes")))
> > void vex_test ()
> > {
> >
> > register tmp_u *tdst __asm__("%r16");
> > register tmp_u *src1 __asm__("%r17");
> > register tmp_u *src2 __asm__("%r18");
> > -
> > +
> > + src1->xi[0] = _mm_minpos_epu16 (src1->xi[1]);
> > + src1->a[2] = _mm256_testc_si256 (src1->yi[2], src2->yi[3]);
> > + src1->xf[3] = _mm_round_ss (src1->xf[5], src2->xf[6],
> > + _MM_FROUND_CUR_DIRECTION);
> > + src1->yf[4] = _mm256_round_ps (src1->yf[2], _MM_FROUND_CUR_DIRECTION);
> > + src1->xd[0] = _mm_round_sd (src1->xd[2], src2->xd[3],
> > + _MM_FROUND_CUR_DIRECTION);
> > + src1->yd[1] = _mm256_round_pd (src1->yd[3], _MM_FROUND_CUR_DIRECTION);
> > +
> > src1->yi[1] = _mm256_hadd_epi16 (tdst->yi[2], src2->yi[3]);
> > src1->yi[2] = _mm256_hadd_epi32 (tdst->yi[0], src2->yi[1]);
> > tdst->yi[3] = _mm256_hadds_epi16 (src1->yi[1], src2->yi[2]);
> > @@ -98,7 +124,6 @@ void vex_test ()
> > src1->yi[1] = _mm256_cmpgt_epi64 (tdst->yi[3], src2->yi[0]);
> >
> > tdst->yf[2] = _mm256_dp_ps (src1->yf[0], src2->yf[1], 0xbf);
> > - tdst->xd[3] = _mm_dp_pd (src1->xd[0], src2->xd[1], 0xbf);
> >
> > tdst->yi[3] = _mm256_mpsadbw_epu8 (src1->yi[1], src2->yi[1], 0xc1);
> >
> > @@ -112,6 +137,14 @@ void vex_test ()
> > tdst->yi[2] = _mm256_sign_epi8 (src1->yi[0], src2->yi[1]);
> > tdst->yi[3] = _mm256_sign_epi16 (src1->yi[2], src2->yi[3]);
> > tdst->yi[0] = _mm256_sign_epi32 (src1->yi[0], src2->yi[1]);
> > +
> > + tdst->a[2] = _mm_cmpestri (src1->xi[3], 16, src2->xi[4], 16, 0x0c);
> > + tdst->xi[4] = _mm_cmpestrm (src1->xi[3], 16, src2->xi[4], 16, 0x20);
> > + tdst->a[5] = _mm_cmpistri (src1->xi[5], src2->xi[6], 0x30);
> > + tdst->xi[6] = _mm_cmpistrm (src1->xi[5], src2->xi[6], 0x40);
> > +
> > + tdst->xi[7] = _mm_aesimc_si128 (src1->xi[7]);
> > + tdst->xi[0] = _mm_aeskeygenassist_si128 (src1->xi[1], 0x1b);
> > }
> >
> > /* { dg-final { scan-assembler-not "v?pcmpeqq\[ \\t]+\\\.\\\*r\(1\[6-9\]\|2\[0-9\]|30\|31\)" } } */
> > @@ -134,3 +167,15 @@ void vex_test ()
> > /* { dg-final { scan-assembler-not "v?psignb\[ \\t]+\\\.\\\*r\(1\[6-9\]\|2\[0-9\]|30\|31\)" } } */
> > /* { dg-final { scan-assembler-not "v?psignw\[ \\t]+\\\.\\\*r\(1\[6-9\]\|2\[0-9\]|30\|31\)" } } */
> > /* { dg-final { scan-assembler-not "v?psignd\[ \\t]+\\\.\\\*r\(1\[6-9\]\|2\[0-9\]|30\|31\)" } } */
> > +/* { dg-final { scan-assembler-not "v?phminposuw\[ \\t]+\\\.\\\*r\(1\[6-9\]\|2\[0-9\]|30\|31\)" } } */
> > +/* { dg-final { scan-assembler-not "v?ptest\[ \\t]+\\\.\\\*r\(1\[6-9\]\|2\[0-9\]|30\|31\)" } } */
> > +/* { dg-final { scan-assembler-not "v?roundss\[ \\t]+\\\.\\\*r\(1\[6-9\]\|2\[0-9\]|30\|31\)" } } */
> > +/* { dg-final { scan-assembler-not "v?roundsd\[ \\t]+\\\.\\\*r\(1\[6-9\]\|2\[0-9\]|30\|31\)" } } */
> > +/* { dg-final { scan-assembler-not "v?roundps\[ \\t]+\\\.\\\*r\(1\[6-9\]\|2\[0-9\]|30\|31\)" } } */
> > +/* { dg-final { scan-assembler-not "v?roundpd\[ \\t]+\\\.\\\*r\(1\[6-9\]\|2\[0-9\]|30\|31\)" } } */
> > +/* { dg-final { scan-assembler-not "v?pcmpestri\[ \\t]+\\\.\\\*r\(1\[6-9\]\|2\[0-9\]|30\|31\)" } } */
> > +/* { dg-final { scan-assembler-not "v?pcmpistri\[ \\t]+\\\.\\\*r\(1\[6-9\]\|2\[0-9\]|30\|31\)" } } */
> > +/* { dg-final { scan-assembler-not "v?pcmpestrm\[ \\t]+\\\.\\\*r\(1\[6-9\]\|2\[0-9\]|30\|31\)" } } */
> > +/* { dg-final { scan-assembler-not "v?pcmpistrm\[ \\t]+\\\.\\\*r\(1\[6-9\]\|2\[0-9\]|30\|31\)" } } */
> > +/* { dg-final { scan-assembler-not "v?aesimc\[ \\t]+\\\.\\\*r\(1\[6-9\]\|2\[0-9\]|30\|31\)" } } */
> > +/* { dg-final { scan-assembler-not "v?aeskeygenassist\[ \\t]+\\\.\\\*r\(1\[6-9\]\|2\[0-9\]|30\|31\)" } } */
> > --
> > 2.31.1
> >
On Thu, Aug 31, 2023 at 11:26:26AM +0200, Richard Biener wrote:
> On Thu, Aug 31, 2023 at 10:25 AM Hongyu Wang via Gcc-patches
> <gcc-patches@gcc.gnu.org> wrote:
> >
> > From: Kong Lingling <lingling.kong@intel.com>
> >
> > Disable EGPR usage for below legacy insns in opcode map2/3 that have vex
> > but no evex counterpart.
> >
> > insn list:
> > 1. phminposuw/vphminposuw
> > 2. ptest/vptest
> > 3. roundps/vroundps, roundpd/vroundpd,
> > roundss/vroundss, roundsd/vroundsd
> > 4. pcmpestri/vpcmpestri, pcmpestrm/vpcmpestrm
> > 5. pcmpistri/vpcmpistri, pcmpistrm/vpcmpistrm
>
> How are GPRs involved in the above? Or did I misunderstand something?
Those instructions allow memory operands, and, for example,
vptest (%r18), %xmm7 isn't supported.
Jakub
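
To make the constraint above concrete, here is a minimal C sketch. It is not GCC code. It assumes a simplified model in which a memory operand is described only by its base register number, and an instruction either has an encoding that can reach r16-r31 or does not. The names mem_operand, uses_egpr and encodable are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical model, not GCC code: a memory operand reduced to the
   number of its base GPR (0-31).  */
struct mem_operand
{
  int base_gpr;
};

/* r16-r31 are the extended GPRs (EGPRs) added by APX.  */
bool
uses_egpr (struct mem_operand m)
{
  return m.base_gpr >= 16 && m.base_gpr <= 31;
}

/* Simplifying assumption: an insn can address memory through an EGPR
   only if it has an EGPR-capable encoding (for example an EVEX form).  */
bool
encodable (bool has_egpr_capable_encoding, struct mem_operand m)
{
  return has_egpr_capable_encoding || !uses_egpr (m);
}
```

Under this model, encodable (false, (struct mem_operand) { 18 }) is false, which corresponds to the vptest (%r18), %xmm7 case, while the same call with base register 7 is true.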
Richard Biener via Gcc-patches <gcc-patches@gcc.gnu.org> wrote on Thu, Aug 31, 2023 at 17:31:
>
> On Thu, Aug 31, 2023 at 11:26 AM Richard Biener
> <richard.guenther@gmail.com> wrote:
> >
> > On Thu, Aug 31, 2023 at 10:25 AM Hongyu Wang via Gcc-patches
> > <gcc-patches@gcc.gnu.org> wrote:
> > >
> > > From: Kong Lingling <lingling.kong@intel.com>
> > >
> > > Disable EGPR usage for below legacy insns in opcode map2/3 that have vex
> > > but no evex counterpart.
> > >
> > > insn list:
> > > 1. phminposuw/vphminposuw
> > > 2. ptest/vptest
> > > 3. roundps/vroundps, roundpd/vroundpd,
> > > roundss/vroundss, roundsd/vroundsd
> > > 4. pcmpestri/vpcmpestri, pcmpestrm/vpcmpestrm
> > > 5. pcmpistri/vpcmpistri, pcmpistrm/vpcmpistrm
> >
> > How are GPRs involved in the above? Or did I misunderstand something?
>
> Following up myself - for the memory operand alternatives I guess. How about
> simply disabling the memory alternatives when EGPR is active? Wouldn't
> that simplify the initial patchset a lot? Re-enabling them when deemed
> important could be done as followup then?
>
That would also require a per-pattern change, setting the isa attr of each mem
alternative to "noapx_egpr". We also have a series of patterns in which some
alternatives support EGPR and others do not, such as the commonly used vec_set
patterns, so we would still need to identify which alternatives support EGPR.
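
The point about patterns with mixed EGPR support can be sketched in C as follows. This is a hypothetical model, not GCC code: it assumes each constraint alternative carries a single boolean flag standing in for the per-alternative "gpr32" attribute, and it ignores every other enabling condition (such as the "isa" attribute). The names alternative and pick_alternative are made up for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model, not GCC code: one constraint alternative of an
   insn pattern, carrying a per-alternative "gpr32" flag.  */
struct alternative
{
  const char *mnemonic;
  bool gpr32;   /* true if this alternative may use r16-r31 */
};

/* Return the index of the first alternative usable for the given
   operands, or -1 if none is.  An alternative with gpr32 == false is
   rejected whenever an EGPR is needed.  */
int
pick_alternative (const struct alternative *alts, size_t n, bool needs_egpr)
{
  for (size_t i = 0; i < n; i++)
    if (alts[i].gpr32 || !needs_egpr)
      return (int) i;
  return -1;
}
```

With a table such as { "roundss", false }, { "vroundss", false }, { "vrndscaless", true }, the function returns 2 when an EGPR is needed and 0 otherwise, which is the kind of per-alternative decision either approach still has to encode.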
> Richard.
>
> > > 6. aesimc/vaesimc, aeskeygenassist/vaeskeygenassist
> > >
> > > gcc/ChangeLog:
> > >
> > > * config/i386/i386-protos.h (x86_evex_reg_mentioned_p): New
> > > prototype.
> > > * config/i386/i386.cc (x86_evex_reg_mentioned_p): New
> > > function.
> > > * config/i386/i386.md (sse4_1_round<mode>2): Set attr gpr32 0
> > > and constraint Bt/BM to all non-evex alternatives, adjust
> > > alternative outputs if evex reg is mentioned.
> > > * config/i386/sse.md (<sse4_1>_ptest<mode>): Set attr gpr32 0
> > > and constraint Bt/BM to all non-evex alternatives.
> > > (ptesttf2): Likewise.
> > > > (<sse4_1>_round<ssemodesuffix><avxsizesuffix>): Likewise.
> > > (sse4_1_round<ssescalarmodesuffix>): Likewise.
> > > (sse4_2_pcmpestri): Likewise.
> > > (sse4_2_pcmpestrm): Likewise.
> > > (sse4_2_pcmpestr_cconly): Likewise.
> > > (sse4_2_pcmpistr): Likewise.
> > > (sse4_2_pcmpistri): Likewise.
> > > (sse4_2_pcmpistrm): Likewise.
> > > (sse4_2_pcmpistr_cconly): Likewise.
> > > (aesimc): Likewise.
> > > (aeskeygenassist): Likewise.
> > >
> > > gcc/testsuite/ChangeLog:
> > >
> > > * gcc.target/i386/apx-legacy-insn-check-norex2.c: Add intrinsic
> > > tests.
> > > ---
> > > gcc/config/i386/i386-protos.h | 1 +
> > > gcc/config/i386/i386.cc | 13 +++
> > > gcc/config/i386/i386.md | 3 +-
> > > gcc/config/i386/sse.md | 93 +++++++++++++------
> > > .../i386/apx-legacy-insn-check-norex2.c | 55 ++++++++++-
> > > 5 files changed, 132 insertions(+), 33 deletions(-)
> > >
> > > diff --git a/gcc/config/i386/i386-protos.h b/gcc/config/i386/i386-protos.h
> > > index 78eb3e0f584..bbb219e3039 100644
> > > --- a/gcc/config/i386/i386-protos.h
> > > +++ b/gcc/config/i386/i386-protos.h
> > > @@ -65,6 +65,7 @@ extern bool extended_reg_mentioned_p (rtx);
> > > extern bool x86_extended_QIreg_mentioned_p (rtx_insn *);
> > > extern bool x86_extended_reg_mentioned_p (rtx);
> > > extern bool x86_extended_rex2reg_mentioned_p (rtx);
> > > +extern bool x86_evex_reg_mentioned_p (rtx [], int);
> > > extern bool x86_maybe_negate_const_int (rtx *, machine_mode);
> > > extern machine_mode ix86_cc_mode (enum rtx_code, rtx, rtx);
> > >
> > > diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
> > > index f5d642948bc..ec93c5bab97 100644
> > > --- a/gcc/config/i386/i386.cc
> > > +++ b/gcc/config/i386/i386.cc
> > > @@ -22936,6 +22936,19 @@ x86_extended_rex2reg_mentioned_p (rtx insn)
> > > return false;
> > > }
> > >
> > > +/* Return true when rtx operands mentions register that must be encoded using
> > > + evex prefix. */
> > > +bool
> > > +x86_evex_reg_mentioned_p (rtx operands[], int nops)
> > > +{
> > > + int i;
> > > + for (i = 0; i < nops; i++)
> > > + if (EXT_REX_SSE_REG_P (operands[i])
> > > + || x86_extended_rex2reg_mentioned_p (operands[i]))
> > > + return true;
> > > + return false;
> > > +}
> > > +
> > > /* If profitable, negate (without causing overflow) integer constant
> > > of mode MODE at location LOC. Return true in this case. */
> > > bool
> > > diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
> > > index 83ad01b43c1..4c305e72389 100644
> > > --- a/gcc/config/i386/i386.md
> > > +++ b/gcc/config/i386/i386.md
> > > @@ -21603,7 +21603,7 @@ (define_expand "significand<mode>2"
> > > (define_insn "sse4_1_round<mode>2"
> > > [(set (match_operand:MODEFH 0 "register_operand" "=x,x,x,v,v")
> > > (unspec:MODEFH
> > > - [(match_operand:MODEFH 1 "nonimmediate_operand" "0,x,m,v,m")
> > > + [(match_operand:MODEFH 1 "nonimmediate_operand" "0,x,Bt,v,m")
> > > (match_operand:SI 2 "const_0_to_15_operand")]
> > > UNSPEC_ROUND))]
> > > "TARGET_SSE4_1"
> > > @@ -21616,6 +21616,7 @@ (define_insn "sse4_1_round<mode>2"
> > > [(set_attr "type" "ssecvt")
> > > (set_attr "prefix_extra" "1,1,1,*,*")
> > > (set_attr "length_immediate" "1")
> > > + (set_attr "gpr32" "1,1,0,1,1")
> > > (set_attr "prefix" "maybe_vex,maybe_vex,maybe_vex,evex,evex")
> > > (set_attr "isa" "noavx512f,noavx512f,noavx512f,avx512f,avx512f")
> > > (set_attr "avx_partial_xmm_update" "false,false,true,false,true")
> > > diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
> > > index 05963de9219..456713b991a 100644
> > > --- a/gcc/config/i386/sse.md
> > > +++ b/gcc/config/i386/sse.md
> > > @@ -22617,11 +22617,12 @@ (define_insn "avx2_pblendd<mode>"
> > >
> > > (define_insn "sse4_1_phminposuw"
> > > [(set (match_operand:V8HI 0 "register_operand" "=Yr,*x,x")
> > > - (unspec:V8HI [(match_operand:V8HI 1 "vector_operand" "YrBm,*xBm,xm")]
> > > + (unspec:V8HI [(match_operand:V8HI 1 "vector_operand" "YrBT,*xBT,xBt")]
> > > UNSPEC_PHMINPOSUW))]
> > > "TARGET_SSE4_1"
> > > "%vphminposuw\t{%1, %0|%0, %1}"
> > > [(set_attr "isa" "noavx,noavx,avx")
> > > + (set_attr "gpr32" "0")
> > > (set_attr "type" "sselog1")
> > > (set_attr "prefix_extra" "1")
> > > (set_attr "prefix" "orig,orig,vex")
> > > @@ -23810,12 +23811,13 @@ (define_insn "avx_vtest<ssemodesuffix><avxsizesuffix>"
> > > (define_insn "*<sse4_1>_ptest<mode>"
> > > [(set (reg FLAGS_REG)
> > > (unspec [(match_operand:V_AVX 0 "register_operand" "Yr, *x, x")
> > > - (match_operand:V_AVX 1 "vector_operand" "YrBm, *xBm, xm")]
> > > + (match_operand:V_AVX 1 "vector_operand" "YrBT, *xBT, xBt")]
> > > UNSPEC_PTEST))]
> > > "TARGET_SSE4_1 && ix86_match_ptest_ccmode (insn)"
> > > "%vptest\t{%1, %0|%0, %1}"
> > > [(set_attr "isa" "noavx,noavx,avx")
> > > (set_attr "type" "ssecomi")
> > > + (set_attr "gpr32" "0")
> > > (set_attr "prefix_extra" "1")
> > > (set_attr "prefix" "orig,orig,vex")
> > > (set (attr "btver2_decode")
> > > @@ -23852,12 +23854,13 @@ (define_expand "<sse4_1>_ptest<mode>"
> > > (define_insn "ptesttf2"
> > > [(set (reg:CC FLAGS_REG)
> > > (unspec:CC [(match_operand:TF 0 "register_operand" "Yr, *x, x")
> > > - (match_operand:TF 1 "vector_operand" "YrBm, *xBm, xm")]
> > > + (match_operand:TF 1 "vector_operand" "YrBT, *xBT, xBt")]
> > > UNSPEC_PTEST))]
> > > "TARGET_SSE4_1"
> > > "%vptest\t{%1, %0|%0, %1}"
> > > [(set_attr "isa" "noavx,noavx,avx")
> > > (set_attr "type" "ssecomi")
> > > + (set_attr "gpr32" "0")
> > > (set_attr "prefix_extra" "1")
> > > (set_attr "prefix" "orig,orig,vex")
> > > (set_attr "mode" "TI")])
> > > @@ -23968,13 +23971,14 @@ (define_expand "lrint<mode><sseintvecmodelower>2"
> > > (define_insn "<sse4_1>_round<ssemodesuffix><avxsizesuffix>"
> > > [(set (match_operand:VF_128_256 0 "register_operand" "=Yr,*x,x")
> > > (unspec:VF_128_256
> > > - [(match_operand:VF_128_256 1 "vector_operand" "YrBm,*xBm,xm")
> > > + [(match_operand:VF_128_256 1 "vector_operand" "YrBT,*xBT,xBt")
> > > (match_operand:SI 2 "const_0_to_15_operand")]
> > > UNSPEC_ROUND))]
> > > "TARGET_SSE4_1"
> > > "%vround<ssemodesuffix>\t{%2, %1, %0|%0, %1, %2}"
> > > [(set_attr "isa" "noavx,noavx,avx")
> > > (set_attr "type" "ssecvt")
> > > + (set_attr "gpr32" "0")
> > > (set_attr "prefix_data16" "1,1,*")
> > > (set_attr "prefix_extra" "1")
> > > (set_attr "length_immediate" "1")
> > > @@ -24061,19 +24065,32 @@ (define_insn "sse4_1_round<ssescalarmodesuffix>"
> > > [(set (match_operand:VF_128 0 "register_operand" "=Yr,*x,x,v")
> > > (vec_merge:VF_128
> > > (unspec:VF_128
> > > - [(match_operand:VF_128 2 "nonimmediate_operand" "Yrm,*xm,xm,vm")
> > > + [(match_operand:VF_128 2 "nonimmediate_operand" "YrBt,*xBt,xBt,vm")
> > > (match_operand:SI 3 "const_0_to_15_operand")]
> > > UNSPEC_ROUND)
> > > (match_operand:VF_128 1 "register_operand" "0,0,x,v")
> > > (const_int 1)))]
> > > "TARGET_SSE4_1"
> > > - "@
> > > - round<ssescalarmodesuffix>\t{%3, %2, %0|%0, %<iptr>2, %3}
> > > - round<ssescalarmodesuffix>\t{%3, %2, %0|%0, %<iptr>2, %3}
> > > - vround<ssescalarmodesuffix>\t{%3, %2, %1, %0|%0, %1, %<iptr>2, %3}
> > > - vrndscale<ssescalarmodesuffix>\t{%3, %2, %1, %0|%0, %1, %<iptr>2, %3}"
> > > - [(set_attr "isa" "noavx,noavx,avx,avx512f")
> > > +{
> > > + switch (which_alternative)
> > > + {
> > > + case 0:
> > > + case 1:
> > > + return "round<ssescalarmodesuffix>\t{%3, %2, %0|%0, %<iptr>2, %3}";
> > > + case 2:
> > > + return "vround<ssescalarmodesuffix>\t{%3, %2, %1, %0|%0, %1, %<iptr>2, %3}";
> > > + case 3:
> > > + if (x86_evex_reg_mentioned_p (operands, 3))
> > > + return "vrndscale<ssescalarmodesuffix>\t{%3, %2, %1, %0|%0, %1, %<iptr>2, %3}";
> > > + else
> > > + return "vround<ssescalarmodesuffix>\t{%3, %2, %1, %0|%0, %1, %<iptr>2, %3}";
> > > + default:
> > > + gcc_unreachable ();
> > > + }
> > > +}
> > > + [(set_attr "isa" "noavx,noavx,noavx512f,avx512f")
> > > (set_attr "type" "ssecvt")
> > > + (set_attr "gpr32" "0,0,0,1")
> > > (set_attr "length_immediate" "1")
> > > (set_attr "prefix_data16" "1,1,*,*")
> > > (set_attr "prefix_extra" "1")
> > > @@ -24085,19 +24102,32 @@ (define_insn "*sse4_1_round<ssescalarmodesuffix>"
> > > (vec_merge:VFH_128
> > > (vec_duplicate:VFH_128
> > > (unspec:<ssescalarmode>
> > > - [(match_operand:<ssescalarmode> 2 "nonimmediate_operand" "Yrm,*xm,xm,vm")
> > > + [(match_operand:<ssescalarmode> 2 "nonimmediate_operand" "YrBt,*xBt,xBt,vm")
> > > (match_operand:SI 3 "const_0_to_15_operand")]
> > > UNSPEC_ROUND))
> > > (match_operand:VFH_128 1 "register_operand" "0,0,x,v")
> > > (const_int 1)))]
> > > "TARGET_SSE4_1"
> > > - "@
> > > - round<ssescalarmodesuffix>\t{%3, %2, %0|%0, %2, %3}
> > > - round<ssescalarmodesuffix>\t{%3, %2, %0|%0, %2, %3}
> > > - vround<ssescalarmodesuffix>\t{%3, %2, %1, %0|%0, %1, %2, %3}
> > > - vrndscale<ssescalarmodesuffix>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
> > > - [(set_attr "isa" "noavx,noavx,avx,avx512f")
> > > +{
> > > + switch (which_alternative)
> > > + {
> > > + case 0:
> > > + case 1:
> > > + return "round<ssescalarmodesuffix>\t{%3, %2, %0|%0, %2, %3}";
> > > + case 2:
> > > + return "vround<ssescalarmodesuffix>\t{%3, %2, %1, %0|%0, %1, %2, %3}";
> > > + case 3:
> > > + if (x86_evex_reg_mentioned_p (operands, 3) || <MODE>mode == V8HFmode)
> > > + return "vrndscale<ssescalarmodesuffix>\t{%3, %2, %1, %0|%0, %1, %2, %3}";
> > > + else
> > > + return "vround<ssescalarmodesuffix>\t{%3, %2, %1, %0|%0, %1, %2, %3}";
> > > + default:
> > > + gcc_unreachable ();
> > > + }
> > > +}
> > > + [(set_attr "isa" "noavx,noavx,noavx512f,avx512f")
> > > (set_attr "type" "ssecvt")
> > > + (set_attr "gpr32" "0,0,0,1")
> > > (set_attr "length_immediate" "1")
> > > (set_attr "prefix_data16" "1,1,*,*")
> > > (set_attr "prefix_extra" "1")
> > > @@ -24318,7 +24348,7 @@ (define_insn "sse4_2_pcmpestri"
> > > (unspec:SI
> > > [(match_operand:V16QI 1 "register_operand" "x,x")
> > > (match_operand:SI 2 "register_operand" "a,a")
> > > - (match_operand:V16QI 3 "nonimmediate_operand" "x,m")
> > > + (match_operand:V16QI 3 "nonimmediate_operand" "x,Bt")
> > > (match_operand:SI 4 "register_operand" "d,d")
> > > (match_operand:SI 5 "const_0_to_255_operand")]
> > > UNSPEC_PCMPESTR))
> > > @@ -24333,6 +24363,7 @@ (define_insn "sse4_2_pcmpestri"
> > > "TARGET_SSE4_2"
> > > "%vpcmpestri\t{%5, %3, %1|%1, %3, %5}"
> > > [(set_attr "type" "sselog")
> > > + (set_attr "gpr32" "0")
> > > (set_attr "prefix_extra" "1")
> > > (set_attr "prefix" "maybe_vex")
> > > (set_attr "length_immediate" "1")
> > > @@ -24345,7 +24376,7 @@ (define_insn "sse4_2_pcmpestrm"
> > > (unspec:V16QI
> > > [(match_operand:V16QI 1 "register_operand" "x,x")
> > > (match_operand:SI 2 "register_operand" "a,a")
> > > - (match_operand:V16QI 3 "nonimmediate_operand" "x,m")
> > > + (match_operand:V16QI 3 "nonimmediate_operand" "x,Bt")
> > > (match_operand:SI 4 "register_operand" "d,d")
> > > (match_operand:SI 5 "const_0_to_255_operand")]
> > > UNSPEC_PCMPESTR))
> > > @@ -24360,6 +24391,7 @@ (define_insn "sse4_2_pcmpestrm"
> > > "TARGET_SSE4_2"
> > > "%vpcmpestrm\t{%5, %3, %1|%1, %3, %5}"
> > > [(set_attr "type" "sselog")
> > > + (set_attr "gpr32" "0")
> > > (set_attr "prefix_extra" "1")
> > > (set_attr "length_immediate" "1")
> > > (set_attr "prefix" "maybe_vex")
> > > @@ -24372,7 +24404,7 @@ (define_insn "sse4_2_pcmpestr_cconly"
> > > (unspec:CC
> > > [(match_operand:V16QI 2 "register_operand" "x,x,x,x")
> > > (match_operand:SI 3 "register_operand" "a,a,a,a")
> > > - (match_operand:V16QI 4 "nonimmediate_operand" "x,m,x,m")
> > > + (match_operand:V16QI 4 "nonimmediate_operand" "x,Bt,x,Bt")
> > > (match_operand:SI 5 "register_operand" "d,d,d,d")
> > > (match_operand:SI 6 "const_0_to_255_operand")]
> > > UNSPEC_PCMPESTR))
> > > @@ -24385,6 +24417,7 @@ (define_insn "sse4_2_pcmpestr_cconly"
> > > %vpcmpestri\t{%6, %4, %2|%2, %4, %6}
> > > %vpcmpestri\t{%6, %4, %2|%2, %4, %6}"
> > > [(set_attr "type" "sselog")
> > > + (set_attr "gpr32" "0")
> > > (set_attr "prefix_extra" "1")
> > > (set_attr "length_immediate" "1")
> > > (set_attr "memory" "none,load,none,load")
> > > @@ -24396,7 +24429,7 @@ (define_insn_and_split "sse4_2_pcmpistr"
> > > [(set (match_operand:SI 0 "register_operand" "=c,c")
> > > (unspec:SI
> > > [(match_operand:V16QI 2 "register_operand" "x,x")
> > > - (match_operand:V16QI 3 "nonimmediate_operand" "x,m")
> > > + (match_operand:V16QI 3 "nonimmediate_operand" "x,Bt")
> > > (match_operand:SI 4 "const_0_to_255_operand")]
> > > UNSPEC_PCMPISTR))
> > > (set (match_operand:V16QI 1 "register_operand" "=Yz,Yz")
> > > @@ -24439,6 +24472,7 @@ (define_insn_and_split "sse4_2_pcmpistr"
> > > DONE;
> > > }
> > > [(set_attr "type" "sselog")
> > > + (set_attr "gpr32" "0")
> > > (set_attr "prefix_extra" "1")
> > > (set_attr "length_immediate" "1")
> > > (set_attr "memory" "none,load")
> > > @@ -24448,7 +24482,7 @@ (define_insn "sse4_2_pcmpistri"
> > > [(set (match_operand:SI 0 "register_operand" "=c,c")
> > > (unspec:SI
> > > [(match_operand:V16QI 1 "register_operand" "x,x")
> > > - (match_operand:V16QI 2 "nonimmediate_operand" "x,m")
> > > + (match_operand:V16QI 2 "nonimmediate_operand" "x,Bt")
> > > (match_operand:SI 3 "const_0_to_255_operand")]
> > > UNSPEC_PCMPISTR))
> > > (set (reg:CC FLAGS_REG)
> > > @@ -24460,6 +24494,7 @@ (define_insn "sse4_2_pcmpistri"
> > > "TARGET_SSE4_2"
> > > "%vpcmpistri\t{%3, %2, %1|%1, %2, %3}"
> > > [(set_attr "type" "sselog")
> > > + (set_attr "gpr32" "0")
> > > (set_attr "prefix_extra" "1")
> > > (set_attr "length_immediate" "1")
> > > (set_attr "prefix" "maybe_vex")
> > > @@ -24471,7 +24506,7 @@ (define_insn "sse4_2_pcmpistrm"
> > > [(set (match_operand:V16QI 0 "register_operand" "=Yz,Yz")
> > > (unspec:V16QI
> > > [(match_operand:V16QI 1 "register_operand" "x,x")
> > > - (match_operand:V16QI 2 "nonimmediate_operand" "x,m")
> > > + (match_operand:V16QI 2 "nonimmediate_operand" "x,Bt")
> > > (match_operand:SI 3 "const_0_to_255_operand")]
> > > UNSPEC_PCMPISTR))
> > > (set (reg:CC FLAGS_REG)
> > > @@ -24483,6 +24518,7 @@ (define_insn "sse4_2_pcmpistrm"
> > > "TARGET_SSE4_2"
> > > "%vpcmpistrm\t{%3, %2, %1|%1, %2, %3}"
> > > [(set_attr "type" "sselog")
> > > + (set_attr "gpr32" "0")
> > > (set_attr "prefix_extra" "1")
> > > (set_attr "length_immediate" "1")
> > > (set_attr "prefix" "maybe_vex")
> > > @@ -24494,7 +24530,7 @@ (define_insn "sse4_2_pcmpistr_cconly"
> > > [(set (reg:CC FLAGS_REG)
> > > (unspec:CC
> > > [(match_operand:V16QI 2 "register_operand" "x,x,x,x")
> > > - (match_operand:V16QI 3 "nonimmediate_operand" "x,m,x,m")
> > > + (match_operand:V16QI 3 "nonimmediate_operand" "x,Bt,x,Bt")
> > > (match_operand:SI 4 "const_0_to_255_operand")]
> > > UNSPEC_PCMPISTR))
> > > (clobber (match_scratch:V16QI 0 "=Yz,Yz,X,X"))
> > > @@ -24506,6 +24542,7 @@ (define_insn "sse4_2_pcmpistr_cconly"
> > > %vpcmpistri\t{%4, %3, %2|%2, %3, %4}
> > > %vpcmpistri\t{%4, %3, %2|%2, %3, %4}"
> > > [(set_attr "type" "sselog")
> > > + (set_attr "gpr32" "0")
> > > (set_attr "prefix_extra" "1")
> > > (set_attr "length_immediate" "1")
> > > (set_attr "memory" "none,load,none,load")
> > > @@ -25990,23 +26027,25 @@ (define_insn "aesdeclast"
> > >
> > > (define_insn "aesimc"
> > > [(set (match_operand:V2DI 0 "register_operand" "=x")
> > > - (unspec:V2DI [(match_operand:V2DI 1 "vector_operand" "xBm")]
> > > + (unspec:V2DI [(match_operand:V2DI 1 "vector_operand" "xBT")]
> > > UNSPEC_AESIMC))]
> > > "TARGET_AES"
> > > "%vaesimc\t{%1, %0|%0, %1}"
> > > [(set_attr "type" "sselog1")
> > > + (set_attr "gpr32" "0")
> > > (set_attr "prefix_extra" "1")
> > > (set_attr "prefix" "maybe_vex")
> > > (set_attr "mode" "TI")])
> > >
> > > (define_insn "aeskeygenassist"
> > > [(set (match_operand:V2DI 0 "register_operand" "=x")
> > > - (unspec:V2DI [(match_operand:V2DI 1 "vector_operand" "xBm")
> > > + (unspec:V2DI [(match_operand:V2DI 1 "vector_operand" "xBT")
> > > (match_operand:SI 2 "const_0_to_255_operand")]
> > > UNSPEC_AESKEYGENASSIST))]
> > > "TARGET_AES"
> > > "%vaeskeygenassist\t{%2, %1, %0|%0, %1, %2}"
> > > [(set_attr "type" "sselog1")
> > > + (set_attr "gpr32" "0")
> > > (set_attr "prefix_extra" "1")
> > > (set_attr "length_immediate" "1")
> > > (set_attr "prefix" "maybe_vex")
> > > diff --git a/gcc/testsuite/gcc.target/i386/apx-legacy-insn-check-norex2.c b/gcc/testsuite/gcc.target/i386/apx-legacy-insn-check-norex2.c
> > > index 510213a6ca7..771bcb078e1 100644
> > > --- a/gcc/testsuite/gcc.target/i386/apx-legacy-insn-check-norex2.c
> > > +++ b/gcc/testsuite/gcc.target/i386/apx-legacy-insn-check-norex2.c
> > > @@ -45,13 +45,22 @@ typedef union
> > > DTYPE a[16];
> > > } tmp_u;
> > >
> > > -__attribute__((target("sse4.2")))
> > > +__attribute__((target("sse4.2,aes")))
> > > void sse_test ()
> > > {
> > > register tmp_u *tdst __asm__("%r16");
> > > register tmp_u *src1 __asm__("%r17");
> > > register tmp_u *src2 __asm__("%r18");
> > > -
> > > +
> > > + src1->xi[0] = _mm_minpos_epu16 (src1->xi[1]);
> > > + src1->a[2] = _mm_testc_si128 (src1->xi[3], src2->xi[4]);
> > > + src1->xf[3] = _mm_round_ss (src1->xf[5], src2->xf[6],
> > > + _MM_FROUND_CUR_DIRECTION);
> > > + src1->xf[4] = _mm_round_ps (src1->xf[7], _MM_FROUND_CUR_DIRECTION);
> > > + src1->xd[0] = _mm_round_sd (src1->xd[2], src2->xd[3],
> > > + _MM_FROUND_CUR_DIRECTION);
> > > + src1->xd[1] = _mm_round_pd (src1->xd[4], _MM_FROUND_CUR_DIRECTION);
> > > +
> > > src1->xi[0] = _mm_hadd_epi16 (tdst->xi[2], src2->xi[3]);
> > > src1->xi[1] = _mm_hadd_epi32 (tdst->xi[0], src2->xi[1]);
> > > tdst->xi[2] = _mm_hadds_epi16 (src1->xi[4], src2->xi[5]);
> > > @@ -77,16 +86,33 @@ void sse_test ()
> > > tdst->xi[1] = _mm_sign_epi8 (src1->xi[5], src2->xi[6]);
> > > tdst->xi[2] = _mm_sign_epi16 (src1->xi[7], src2->xi[0]);
> > > tdst->xi[3] = _mm_sign_epi32 (src1->xi[1], src2->xi[2]);
> > > +
> > > + tdst->a[2] = _mm_cmpestri (src1->xi[3], 16, src2->xi[4], 16, 0x0c);
> > > + tdst->xi[4] = _mm_cmpestrm (src1->xi[3], 16, src2->xi[4], 16, 0x20);
> > > + tdst->a[5] = _mm_cmpistri (src1->xi[5], src2->xi[6], 0x30);
> > > + tdst->xi[6] = _mm_cmpistrm (src1->xi[5], src2->xi[6], 0x40);
> > > +
> > > + tdst->xi[7] = _mm_aesimc_si128 (src1->xi[7]);
> > > + tdst->xi[0] = _mm_aeskeygenassist_si128 (src1->xi[1], 0x1b);
> > > }
> > >
> > > -__attribute__((target("avx2")))
> > > +__attribute__((target("avx2,aes")))
> > > void vex_test ()
> > > {
> > >
> > > register tmp_u *tdst __asm__("%r16");
> > > register tmp_u *src1 __asm__("%r17");
> > > register tmp_u *src2 __asm__("%r18");
> > > -
> > > +
> > > + src1->xi[0] = _mm_minpos_epu16 (src1->xi[1]);
> > > + src1->a[2] = _mm256_testc_si256 (src1->yi[2], src2->yi[3]);
> > > + src1->xf[3] = _mm_round_ss (src1->xf[5], src2->xf[6],
> > > + _MM_FROUND_CUR_DIRECTION);
> > > + src1->yf[4] = _mm256_round_ps (src1->yf[2], _MM_FROUND_CUR_DIRECTION);
> > > + src1->xd[0] = _mm_round_sd (src1->xd[2], src2->xd[3],
> > > + _MM_FROUND_CUR_DIRECTION);
> > > + src1->yd[1] = _mm256_round_pd (src1->yd[3], _MM_FROUND_CUR_DIRECTION);
> > > +
> > > src1->yi[1] = _mm256_hadd_epi16 (tdst->yi[2], src2->yi[3]);
> > > src1->yi[2] = _mm256_hadd_epi32 (tdst->yi[0], src2->yi[1]);
> > > tdst->yi[3] = _mm256_hadds_epi16 (src1->yi[1], src2->yi[2]);
> > > @@ -98,7 +124,6 @@ void vex_test ()
> > > src1->yi[1] = _mm256_cmpgt_epi64 (tdst->yi[3], src2->yi[0]);
> > >
> > > tdst->yf[2] = _mm256_dp_ps (src1->yf[0], src2->yf[1], 0xbf);
> > > - tdst->xd[3] = _mm_dp_pd (src1->xd[0], src2->xd[1], 0xbf);
> > >
> > > tdst->yi[3] = _mm256_mpsadbw_epu8 (src1->yi[1], src2->yi[1], 0xc1);
> > >
> > > @@ -112,6 +137,14 @@ void vex_test ()
> > > tdst->yi[2] = _mm256_sign_epi8 (src1->yi[0], src2->yi[1]);
> > > tdst->yi[3] = _mm256_sign_epi16 (src1->yi[2], src2->yi[3]);
> > > tdst->yi[0] = _mm256_sign_epi32 (src1->yi[0], src2->yi[1]);
> > > +
> > > + tdst->a[2] = _mm_cmpestri (src1->xi[3], 16, src2->xi[4], 16, 0x0c);
> > > + tdst->xi[4] = _mm_cmpestrm (src1->xi[3], 16, src2->xi[4], 16, 0x20);
> > > + tdst->a[5] = _mm_cmpistri (src1->xi[5], src2->xi[6], 0x30);
> > > + tdst->xi[6] = _mm_cmpistrm (src1->xi[5], src2->xi[6], 0x40);
> > > +
> > > + tdst->xi[7] = _mm_aesimc_si128 (src1->xi[7]);
> > > + tdst->xi[0] = _mm_aeskeygenassist_si128 (src1->xi[1], 0x1b);
> > > }
> > >
> > > /* { dg-final { scan-assembler-not "v?pcmpeqq\[ \\t]+\\\.\\\*r\(1\[6-9\]\|2\[0-9\]|30\|31\)" } } */
> > > @@ -134,3 +167,15 @@ void vex_test ()
> > > /* { dg-final { scan-assembler-not "v?psignb\[ \\t]+\\\.\\\*r\(1\[6-9\]\|2\[0-9\]|30\|31\)" } } */
> > > /* { dg-final { scan-assembler-not "v?psignw\[ \\t]+\\\.\\\*r\(1\[6-9\]\|2\[0-9\]|30\|31\)" } } */
> > > /* { dg-final { scan-assembler-not "v?psignd\[ \\t]+\\\.\\\*r\(1\[6-9\]\|2\[0-9\]|30\|31\)" } } */
> > > +/* { dg-final { scan-assembler-not "v?phminposuw\[ \\t]+\\\.\\\*r\(1\[6-9\]\|2\[0-9\]|30\|31\)" } } */
> > > +/* { dg-final { scan-assembler-not "v?ptest\[ \\t]+\\\.\\\*r\(1\[6-9\]\|2\[0-9\]|30\|31\)" } } */
> > > +/* { dg-final { scan-assembler-not "v?roundss\[ \\t]+\\\.\\\*r\(1\[6-9\]\|2\[0-9\]|30\|31\)" } } */
> > > +/* { dg-final { scan-assembler-not "v?roundsd\[ \\t]+\\\.\\\*r\(1\[6-9\]\|2\[0-9\]|30\|31\)" } } */
> > > +/* { dg-final { scan-assembler-not "v?roundps\[ \\t]+\\\.\\\*r\(1\[6-9\]\|2\[0-9\]|30\|31\)" } } */
> > > +/* { dg-final { scan-assembler-not "v?roundpd\[ \\t]+\\\.\\\*r\(1\[6-9\]\|2\[0-9\]|30\|31\)" } } */
> > > +/* { dg-final { scan-assembler-not "v?pcmpestri\[ \\t]+\\\.\\\*r\(1\[6-9\]\|2\[0-9\]|30\|31\)" } } */
> > > +/* { dg-final { scan-assembler-not "v?pcmpistri\[ \\t]+\\\.\\\*r\(1\[6-9\]\|2\[0-9\]|30\|31\)" } } */
> > > +/* { dg-final { scan-assembler-not "v?pcmpestrm\[ \\t]+\\\.\\\*r\(1\[6-9\]\|2\[0-9\]|30\|31\)" } } */
> > > +/* { dg-final { scan-assembler-not "v?pcmpistrm\[ \\t]+\\\.\\\*r\(1\[6-9\]\|2\[0-9\]|30\|31\)" } } */
> > > +/* { dg-final { scan-assembler-not "v?aesimc\[ \\t]+\\\.\\\*r\(1\[6-9\]\|2\[0-9\]|30\|31\)" } } */
> > > +/* { dg-final { scan-assembler-not "v?aeskeygenassist\[ \\t]+\\\.\\\*r\(1\[6-9\]\|2\[0-9\]|30\|31\)" } } */
> > > --
> > > 2.31.1
> > >
On Thu, Aug 31, 2023 at 5:31 PM Richard Biener via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
> On Thu, Aug 31, 2023 at 11:26 AM Richard Biener
> <richard.guenther@gmail.com> wrote:
> >
> > On Thu, Aug 31, 2023 at 10:25 AM Hongyu Wang via Gcc-patches
> > <gcc-patches@gcc.gnu.org> wrote:
> > >
> > > From: Kong Lingling <lingling.kong@intel.com>
> > >
> > > Disable EGPR usage for below legacy insns in opcode map2/3 that have vex
> > > but no evex counterpart.
> > >
> > > insn list:
> > > 1. phminposuw/vphminposuw
> > > 2. ptest/vptest
> > > 3. roundps/vroundps, roundpd/vroundpd,
> > > roundss/vroundss, roundsd/vroundsd
> > > 4. pcmpestri/vpcmpestri, pcmpestrm/vpcmpestrm
> > > 5. pcmpistri/vpcmpistri, pcmpistrm/vpcmpistrm
> >
> > How are GPRs involved in the above? Or did I misunderstand something?
>
> Following up myself - for the memory operand alternatives I guess. How about
> simply disabling the memory alternatives when EGPR is active? Wouldn't
> that simplify the initial patchset a lot? Re-enabling them when deemed
> important could be done as followup then?
>
There are instructions that only support a memory operand but don't
support gpr32 (e.g. xsave), so there is no memory alternative to disable.
We still need to handle them in the initial patch.
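
To illustrate the point, here is a minimal sketch in plain C (not code
from the patch). It assumes a simplified address model: `struct mem_addr`,
`is_egpr`, `addr_uses_egpr` and `mem_operand_ok` are hypothetical names,
and registers are identified by their architectural number (r16..r31 are
the EGPRs), not by GCC hard register numbers. A pattern with gpr32 set to
0 can keep its memory operand as long as the address avoids r16..r31:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical address model: architectural GPR numbers (0..31) for the
   base and index registers, or -1 when the component is absent.  */
struct mem_addr
{
  int base;
  int index;
};

/* APX extended GPRs are r16..r31.  */
static bool
is_egpr (int regno)
{
  return regno >= 16 && regno <= 31;
}

/* Return true if the address mentions an EGPR as base or index.  */
static bool
addr_uses_egpr (const struct mem_addr *a)
{
  assert (a->base >= -1 && a->index >= -1);
  return is_egpr (a->base) || is_egpr (a->index);
}

/* GPR32 == false models (set_attr "gpr32" "0"): the memory operand is
   acceptable only if its address stays within r0..r15.  */
static bool
mem_operand_ok (const struct mem_addr *a, bool gpr32)
{
  return gpr32 || !addr_uses_egpr (a);
}
```

With this model, an address based on r17 is rejected for a gpr32=0
pattern, but the same insn with a base of r3 keeps its memory form. That
is the behaviour a memory-only insn needs, since it has no register
alternative to fall back on.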
> Richard.
>
> > > 6. aesimc/vaesimc, aeskeygenassist/vaeskeygenassist
> > >
> > > gcc/ChangeLog:
> > >
> > > * config/i386/i386-protos.h (x86_evex_reg_mentioned_p): New
> > > prototype.
> > > * config/i386/i386.cc (x86_evex_reg_mentioned_p): New
> > > function.
> > > * config/i386/i386.md (sse4_1_round<mode>2): Set attr gpr32 0
> > > and constraint Bt/BM to all non-evex alternatives, adjust
> > > alternative outputs if evex reg is mentioned.
> > > * config/i386/sse.md (<sse4_1>_ptest<mode>): Set attr gpr32 0
> > > and constraint Bt/BM to all non-evex alternatives.
> > > (ptesttf2): Likewise.
> > > (<sse4_1>_round<ssemodesuffix><avxsizesuffix): Likewise.
> > > (sse4_1_round<ssescalarmodesuffix>): Likewise.
> > > (sse4_2_pcmpestri): Likewise.
> > > (sse4_2_pcmpestrm): Likewise.
> > > (sse4_2_pcmpestr_cconly): Likewise.
> > > (sse4_2_pcmpistr): Likewise.
> > > (sse4_2_pcmpistri): Likewise.
> > > (sse4_2_pcmpistrm): Likewise.
> > > (sse4_2_pcmpistr_cconly): Likewise.
> > > (aesimc): Likewise.
> > > (aeskeygenassist): Likewise.
> > >
> > > gcc/testsuite/ChangeLog:
> > >
> > > * gcc.target/i386/apx-legacy-insn-check-norex2.c: Add intrinsic
> > > tests.
> > > ---
> > > gcc/config/i386/i386-protos.h | 1 +
> > > gcc/config/i386/i386.cc | 13 +++
> > > gcc/config/i386/i386.md | 3 +-
> > > gcc/config/i386/sse.md | 93 +++++++++++++------
> > > .../i386/apx-legacy-insn-check-norex2.c | 55 ++++++++++-
> > > 5 files changed, 132 insertions(+), 33 deletions(-)
> > >
> > > diff --git a/gcc/config/i386/i386-protos.h b/gcc/config/i386/i386-protos.h
> > > index 78eb3e0f584..bbb219e3039 100644
> > > --- a/gcc/config/i386/i386-protos.h
> > > +++ b/gcc/config/i386/i386-protos.h
> > > @@ -65,6 +65,7 @@ extern bool extended_reg_mentioned_p (rtx);
> > > extern bool x86_extended_QIreg_mentioned_p (rtx_insn *);
> > > extern bool x86_extended_reg_mentioned_p (rtx);
> > > extern bool x86_extended_rex2reg_mentioned_p (rtx);
> > > +extern bool x86_evex_reg_mentioned_p (rtx [], int);
> > > extern bool x86_maybe_negate_const_int (rtx *, machine_mode);
> > > extern machine_mode ix86_cc_mode (enum rtx_code, rtx, rtx);
> > >
> > > diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
> > > index f5d642948bc..ec93c5bab97 100644
> > > --- a/gcc/config/i386/i386.cc
> > > +++ b/gcc/config/i386/i386.cc
> > > @@ -22936,6 +22936,19 @@ x86_extended_rex2reg_mentioned_p (rtx insn)
> > > return false;
> > > }
> > >
> > > +/* Return true when rtx operands mentions register that must be encoded using
> > > + evex prefix. */
> > > +bool
> > > +x86_evex_reg_mentioned_p (rtx operands[], int nops)
> > > +{
> > > + int i;
> > > + for (i = 0; i < nops; i++)
> > > + if (EXT_REX_SSE_REG_P (operands[i])
> > > + || x86_extended_rex2reg_mentioned_p (operands[i]))
> > > + return true;
> > > + return false;
> > > +}
> > > +
> > > /* If profitable, negate (without causing overflow) integer constant
> > > of mode MODE at location LOC. Return true in this case. */
> > > bool
> > > diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
> > > index 83ad01b43c1..4c305e72389 100644
> > > --- a/gcc/config/i386/i386.md
> > > +++ b/gcc/config/i386/i386.md
> > > @@ -21603,7 +21603,7 @@ (define_expand "significand<mode>2"
> > > (define_insn "sse4_1_round<mode>2"
> > > [(set (match_operand:MODEFH 0 "register_operand" "=x,x,x,v,v")
> > > (unspec:MODEFH
> > > - [(match_operand:MODEFH 1 "nonimmediate_operand" "0,x,m,v,m")
> > > + [(match_operand:MODEFH 1 "nonimmediate_operand" "0,x,Bt,v,m")
> > > (match_operand:SI 2 "const_0_to_15_operand")]
> > > UNSPEC_ROUND))]
> > > "TARGET_SSE4_1"
> > > @@ -21616,6 +21616,7 @@ (define_insn "sse4_1_round<mode>2"
> > > [(set_attr "type" "ssecvt")
> > > (set_attr "prefix_extra" "1,1,1,*,*")
> > > (set_attr "length_immediate" "1")
> > > + (set_attr "gpr32" "1,1,0,1,1")
> > > (set_attr "prefix" "maybe_vex,maybe_vex,maybe_vex,evex,evex")
> > > (set_attr "isa" "noavx512f,noavx512f,noavx512f,avx512f,avx512f")
> > > (set_attr "avx_partial_xmm_update" "false,false,true,false,true")
> > > diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
> > > index 05963de9219..456713b991a 100644
> > > --- a/gcc/config/i386/sse.md
> > > +++ b/gcc/config/i386/sse.md
> > > @@ -22617,11 +22617,12 @@ (define_insn "avx2_pblendd<mode>"
> > >
> > > (define_insn "sse4_1_phminposuw"
> > > [(set (match_operand:V8HI 0 "register_operand" "=Yr,*x,x")
> > > - (unspec:V8HI [(match_operand:V8HI 1 "vector_operand" "YrBm,*xBm,xm")]
> > > + (unspec:V8HI [(match_operand:V8HI 1 "vector_operand" "YrBT,*xBT,xBt")]
> > > UNSPEC_PHMINPOSUW))]
> > > "TARGET_SSE4_1"
> > > "%vphminposuw\t{%1, %0|%0, %1}"
> > > [(set_attr "isa" "noavx,noavx,avx")
> > > + (set_attr "gpr32" "0")
> > > (set_attr "type" "sselog1")
> > > (set_attr "prefix_extra" "1")
> > > (set_attr "prefix" "orig,orig,vex")
> > > @@ -23810,12 +23811,13 @@ (define_insn "avx_vtest<ssemodesuffix><avxsizesuffix>"
> > > (define_insn "*<sse4_1>_ptest<mode>"
> > > [(set (reg FLAGS_REG)
> > > (unspec [(match_operand:V_AVX 0 "register_operand" "Yr, *x, x")
> > > - (match_operand:V_AVX 1 "vector_operand" "YrBm, *xBm, xm")]
> > > + (match_operand:V_AVX 1 "vector_operand" "YrBT, *xBT, xBt")]
> > > UNSPEC_PTEST))]
> > > "TARGET_SSE4_1 && ix86_match_ptest_ccmode (insn)"
> > > "%vptest\t{%1, %0|%0, %1}"
> > > [(set_attr "isa" "noavx,noavx,avx")
> > > (set_attr "type" "ssecomi")
> > > + (set_attr "gpr32" "0")
> > > (set_attr "prefix_extra" "1")
> > > (set_attr "prefix" "orig,orig,vex")
> > > (set (attr "btver2_decode")
> > > @@ -23852,12 +23854,13 @@ (define_expand "<sse4_1>_ptest<mode>"
> > > (define_insn "ptesttf2"
> > > [(set (reg:CC FLAGS_REG)
> > > (unspec:CC [(match_operand:TF 0 "register_operand" "Yr, *x, x")
> > > - (match_operand:TF 1 "vector_operand" "YrBm, *xBm, xm")]
> > > + (match_operand:TF 1 "vector_operand" "YrBT, *xBT, xBt")]
> > > UNSPEC_PTEST))]
> > > "TARGET_SSE4_1"
> > > "%vptest\t{%1, %0|%0, %1}"
> > > [(set_attr "isa" "noavx,noavx,avx")
> > > (set_attr "type" "ssecomi")
> > > + (set_attr "gpr32" "0")
> > > (set_attr "prefix_extra" "1")
> > > (set_attr "prefix" "orig,orig,vex")
> > > (set_attr "mode" "TI")])
> > > @@ -23968,13 +23971,14 @@ (define_expand "lrint<mode><sseintvecmodelower>2"
> > > (define_insn "<sse4_1>_round<ssemodesuffix><avxsizesuffix>"
> > > [(set (match_operand:VF_128_256 0 "register_operand" "=Yr,*x,x")
> > > (unspec:VF_128_256
> > > - [(match_operand:VF_128_256 1 "vector_operand" "YrBm,*xBm,xm")
> > > + [(match_operand:VF_128_256 1 "vector_operand" "YrBT,*xBT,xBt")
> > > (match_operand:SI 2 "const_0_to_15_operand")]
> > > UNSPEC_ROUND))]
> > > "TARGET_SSE4_1"
> > > "%vround<ssemodesuffix>\t{%2, %1, %0|%0, %1, %2}"
> > > [(set_attr "isa" "noavx,noavx,avx")
> > > (set_attr "type" "ssecvt")
> > > + (set_attr "gpr32" "0")
> > > (set_attr "prefix_data16" "1,1,*")
> > > (set_attr "prefix_extra" "1")
> > > (set_attr "length_immediate" "1")
> > > @@ -24061,19 +24065,32 @@ (define_insn "sse4_1_round<ssescalarmodesuffix>"
> > > [(set (match_operand:VF_128 0 "register_operand" "=Yr,*x,x,v")
> > > (vec_merge:VF_128
> > > (unspec:VF_128
> > > - [(match_operand:VF_128 2 "nonimmediate_operand" "Yrm,*xm,xm,vm")
> > > + [(match_operand:VF_128 2 "nonimmediate_operand" "YrBt,*xBt,xBt,vm")
> > > (match_operand:SI 3 "const_0_to_15_operand")]
> > > UNSPEC_ROUND)
> > > (match_operand:VF_128 1 "register_operand" "0,0,x,v")
> > > (const_int 1)))]
> > > "TARGET_SSE4_1"
> > > - "@
> > > - round<ssescalarmodesuffix>\t{%3, %2, %0|%0, %<iptr>2, %3}
> > > - round<ssescalarmodesuffix>\t{%3, %2, %0|%0, %<iptr>2, %3}
> > > - vround<ssescalarmodesuffix>\t{%3, %2, %1, %0|%0, %1, %<iptr>2, %3}
> > > - vrndscale<ssescalarmodesuffix>\t{%3, %2, %1, %0|%0, %1, %<iptr>2, %3}"
> > > - [(set_attr "isa" "noavx,noavx,avx,avx512f")
> > > +{
> > > + switch (which_alternative)
> > > + {
> > > + case 0:
> > > + case 1:
> > > + return "round<ssescalarmodesuffix>\t{%3, %2, %0|%0, %<iptr>2, %3}";
> > > + case 2:
> > > + return "vround<ssescalarmodesuffix>\t{%3, %2, %1, %0|%0, %1, %<iptr>2, %3}";
> > > + case 3:
> > > + if (x86_evex_reg_mentioned_p (operands, 3))
> > > + return "vrndscale<ssescalarmodesuffix>\t{%3, %2, %1, %0|%0, %1, %<iptr>2, %3}";
> > > + else
> > > + return "vround<ssescalarmodesuffix>\t{%3, %2, %1, %0|%0, %1, %<iptr>2, %3}";
> > > + default:
> > > + gcc_unreachable ();
> > > + }
> > > +}
> > > + [(set_attr "isa" "noavx,noavx,noavx512f,avx512f")
> > > (set_attr "type" "ssecvt")
> > > + (set_attr "gpr32" "0,0,0,1")
> > > (set_attr "length_immediate" "1")
> > > (set_attr "prefix_data16" "1,1,*,*")
> > > (set_attr "prefix_extra" "1")
> > > @@ -24085,19 +24102,32 @@ (define_insn "*sse4_1_round<ssescalarmodesuffix>"
> > > (vec_merge:VFH_128
> > > (vec_duplicate:VFH_128
> > > (unspec:<ssescalarmode>
> > > - [(match_operand:<ssescalarmode> 2 "nonimmediate_operand" "Yrm,*xm,xm,vm")
> > > + [(match_operand:<ssescalarmode> 2 "nonimmediate_operand" "YrBt,*xBt,xBt,vm")
> > > (match_operand:SI 3 "const_0_to_15_operand")]
> > > UNSPEC_ROUND))
> > > (match_operand:VFH_128 1 "register_operand" "0,0,x,v")
> > > (const_int 1)))]
> > > "TARGET_SSE4_1"
> > > - "@
> > > - round<ssescalarmodesuffix>\t{%3, %2, %0|%0, %2, %3}
> > > - round<ssescalarmodesuffix>\t{%3, %2, %0|%0, %2, %3}
> > > - vround<ssescalarmodesuffix>\t{%3, %2, %1, %0|%0, %1, %2, %3}
> > > - vrndscale<ssescalarmodesuffix>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
> > > - [(set_attr "isa" "noavx,noavx,avx,avx512f")
> > > +{
> > > + switch (which_alternative)
> > > + {
> > > + case 0:
> > > + case 1:
> > > + return "round<ssescalarmodesuffix>\t{%3, %2, %0|%0, %2, %3}";
> > > + case 2:
> > > + return "vround<ssescalarmodesuffix>\t{%3, %2, %1, %0|%0, %1, %2, %3}";
> > > + case 3:
> > > + if (x86_evex_reg_mentioned_p (operands, 3) || <MODE>mode == V8HFmode)
> > > + return "vrndscale<ssescalarmodesuffix>\t{%3, %2, %1, %0|%0, %1, %2, %3}";
> > > + else
> > > + return "vround<ssescalarmodesuffix>\t{%3, %2, %1, %0|%0, %1, %2, %3}";
> > > + default:
> > > + gcc_unreachable ();
> > > + }
> > > +}
> > > + [(set_attr "isa" "noavx,noavx,noavx512f,avx512f")
> > > (set_attr "type" "ssecvt")
> > > + (set_attr "gpr32" "0,0,0,1")
> > > (set_attr "length_immediate" "1")
> > > (set_attr "prefix_data16" "1,1,*,*")
> > > (set_attr "prefix_extra" "1")
> > > @@ -24318,7 +24348,7 @@ (define_insn "sse4_2_pcmpestri"
> > > (unspec:SI
> > > [(match_operand:V16QI 1 "register_operand" "x,x")
> > > (match_operand:SI 2 "register_operand" "a,a")
> > > - (match_operand:V16QI 3 "nonimmediate_operand" "x,m")
> > > + (match_operand:V16QI 3 "nonimmediate_operand" "x,Bt")
> > > (match_operand:SI 4 "register_operand" "d,d")
> > > (match_operand:SI 5 "const_0_to_255_operand")]
> > > UNSPEC_PCMPESTR))
> > > @@ -24333,6 +24363,7 @@ (define_insn "sse4_2_pcmpestri"
> > > "TARGET_SSE4_2"
> > > "%vpcmpestri\t{%5, %3, %1|%1, %3, %5}"
> > > [(set_attr "type" "sselog")
> > > + (set_attr "gpr32" "0")
> > > (set_attr "prefix_extra" "1")
> > > (set_attr "prefix" "maybe_vex")
> > > (set_attr "length_immediate" "1")
> > > @@ -24345,7 +24376,7 @@ (define_insn "sse4_2_pcmpestrm"
> > > (unspec:V16QI
> > > [(match_operand:V16QI 1 "register_operand" "x,x")
> > > (match_operand:SI 2 "register_operand" "a,a")
> > > - (match_operand:V16QI 3 "nonimmediate_operand" "x,m")
> > > + (match_operand:V16QI 3 "nonimmediate_operand" "x,Bt")
> > > (match_operand:SI 4 "register_operand" "d,d")
> > > (match_operand:SI 5 "const_0_to_255_operand")]
> > > UNSPEC_PCMPESTR))
> > > @@ -24360,6 +24391,7 @@ (define_insn "sse4_2_pcmpestrm"
> > > "TARGET_SSE4_2"
> > > "%vpcmpestrm\t{%5, %3, %1|%1, %3, %5}"
> > > [(set_attr "type" "sselog")
> > > + (set_attr "gpr32" "0")
> > > (set_attr "prefix_extra" "1")
> > > (set_attr "length_immediate" "1")
> > > (set_attr "prefix" "maybe_vex")
> > > @@ -24372,7 +24404,7 @@ (define_insn "sse4_2_pcmpestr_cconly"
> > > (unspec:CC
> > > [(match_operand:V16QI 2 "register_operand" "x,x,x,x")
> > > (match_operand:SI 3 "register_operand" "a,a,a,a")
> > > - (match_operand:V16QI 4 "nonimmediate_operand" "x,m,x,m")
> > > + (match_operand:V16QI 4 "nonimmediate_operand" "x,Bt,x,Bt")
> > > (match_operand:SI 5 "register_operand" "d,d,d,d")
> > > (match_operand:SI 6 "const_0_to_255_operand")]
> > > UNSPEC_PCMPESTR))
> > > @@ -24385,6 +24417,7 @@ (define_insn "sse4_2_pcmpestr_cconly"
> > > %vpcmpestri\t{%6, %4, %2|%2, %4, %6}
> > > %vpcmpestri\t{%6, %4, %2|%2, %4, %6}"
> > > [(set_attr "type" "sselog")
> > > + (set_attr "gpr32" "0")
> > > (set_attr "prefix_extra" "1")
> > > (set_attr "length_immediate" "1")
> > > (set_attr "memory" "none,load,none,load")
> > > @@ -24396,7 +24429,7 @@ (define_insn_and_split "sse4_2_pcmpistr"
> > > [(set (match_operand:SI 0 "register_operand" "=c,c")
> > > (unspec:SI
> > > [(match_operand:V16QI 2 "register_operand" "x,x")
> > > - (match_operand:V16QI 3 "nonimmediate_operand" "x,m")
> > > + (match_operand:V16QI 3 "nonimmediate_operand" "x,Bt")
> > > (match_operand:SI 4 "const_0_to_255_operand")]
> > > UNSPEC_PCMPISTR))
> > > (set (match_operand:V16QI 1 "register_operand" "=Yz,Yz")
> > > @@ -24439,6 +24472,7 @@ (define_insn_and_split "sse4_2_pcmpistr"
> > > DONE;
> > > }
> > > [(set_attr "type" "sselog")
> > > + (set_attr "gpr32" "0")
> > > (set_attr "prefix_extra" "1")
> > > (set_attr "length_immediate" "1")
> > > (set_attr "memory" "none,load")
> > > @@ -24448,7 +24482,7 @@ (define_insn "sse4_2_pcmpistri"
> > > [(set (match_operand:SI 0 "register_operand" "=c,c")
> > > (unspec:SI
> > > [(match_operand:V16QI 1 "register_operand" "x,x")
> > > - (match_operand:V16QI 2 "nonimmediate_operand" "x,m")
> > > + (match_operand:V16QI 2 "nonimmediate_operand" "x,Bt")
> > > (match_operand:SI 3 "const_0_to_255_operand")]
> > > UNSPEC_PCMPISTR))
> > > (set (reg:CC FLAGS_REG)
> > > @@ -24460,6 +24494,7 @@ (define_insn "sse4_2_pcmpistri"
> > > "TARGET_SSE4_2"
> > > "%vpcmpistri\t{%3, %2, %1|%1, %2, %3}"
> > > [(set_attr "type" "sselog")
> > > + (set_attr "gpr32" "0")
> > > (set_attr "prefix_extra" "1")
> > > (set_attr "length_immediate" "1")
> > > (set_attr "prefix" "maybe_vex")
> > > @@ -24471,7 +24506,7 @@ (define_insn "sse4_2_pcmpistrm"
> > > [(set (match_operand:V16QI 0 "register_operand" "=Yz,Yz")
> > > (unspec:V16QI
> > > [(match_operand:V16QI 1 "register_operand" "x,x")
> > > - (match_operand:V16QI 2 "nonimmediate_operand" "x,m")
> > > + (match_operand:V16QI 2 "nonimmediate_operand" "x,Bt")
> > > (match_operand:SI 3 "const_0_to_255_operand")]
> > > UNSPEC_PCMPISTR))
> > > (set (reg:CC FLAGS_REG)
> > > @@ -24483,6 +24518,7 @@ (define_insn "sse4_2_pcmpistrm"
> > > "TARGET_SSE4_2"
> > > "%vpcmpistrm\t{%3, %2, %1|%1, %2, %3}"
> > > [(set_attr "type" "sselog")
> > > + (set_attr "gpr32" "0")
> > > (set_attr "prefix_extra" "1")
> > > (set_attr "length_immediate" "1")
> > > (set_attr "prefix" "maybe_vex")
> > > @@ -24494,7 +24530,7 @@ (define_insn "sse4_2_pcmpistr_cconly"
> > > [(set (reg:CC FLAGS_REG)
> > > (unspec:CC
> > > [(match_operand:V16QI 2 "register_operand" "x,x,x,x")
> > > - (match_operand:V16QI 3 "nonimmediate_operand" "x,m,x,m")
> > > + (match_operand:V16QI 3 "nonimmediate_operand" "x,Bt,x,Bt")
> > > (match_operand:SI 4 "const_0_to_255_operand")]
> > > UNSPEC_PCMPISTR))
> > > (clobber (match_scratch:V16QI 0 "=Yz,Yz,X,X"))
> > > @@ -24506,6 +24542,7 @@ (define_insn "sse4_2_pcmpistr_cconly"
> > > %vpcmpistri\t{%4, %3, %2|%2, %3, %4}
> > > %vpcmpistri\t{%4, %3, %2|%2, %3, %4}"
> > > [(set_attr "type" "sselog")
> > > + (set_attr "gpr32" "0")
> > > (set_attr "prefix_extra" "1")
> > > (set_attr "length_immediate" "1")
> > > (set_attr "memory" "none,load,none,load")
> > > @@ -25990,23 +26027,25 @@ (define_insn "aesdeclast"
> > >
> > > (define_insn "aesimc"
> > > [(set (match_operand:V2DI 0 "register_operand" "=x")
> > > - (unspec:V2DI [(match_operand:V2DI 1 "vector_operand" "xBm")]
> > > + (unspec:V2DI [(match_operand:V2DI 1 "vector_operand" "xBT")]
> > > UNSPEC_AESIMC))]
> > > "TARGET_AES"
> > > "%vaesimc\t{%1, %0|%0, %1}"
> > > [(set_attr "type" "sselog1")
> > > + (set_attr "gpr32" "0")
> > > (set_attr "prefix_extra" "1")
> > > (set_attr "prefix" "maybe_vex")
> > > (set_attr "mode" "TI")])
> > >
> > > (define_insn "aeskeygenassist"
> > > [(set (match_operand:V2DI 0 "register_operand" "=x")
> > > - (unspec:V2DI [(match_operand:V2DI 1 "vector_operand" "xBm")
> > > + (unspec:V2DI [(match_operand:V2DI 1 "vector_operand" "xBT")
> > > (match_operand:SI 2 "const_0_to_255_operand")]
> > > UNSPEC_AESKEYGENASSIST))]
> > > "TARGET_AES"
> > > "%vaeskeygenassist\t{%2, %1, %0|%0, %1, %2}"
> > > [(set_attr "type" "sselog1")
> > > + (set_attr "gpr32" "0")
> > > (set_attr "prefix_extra" "1")
> > > (set_attr "length_immediate" "1")
> > > (set_attr "prefix" "maybe_vex")
> > > diff --git a/gcc/testsuite/gcc.target/i386/apx-legacy-insn-check-norex2.c b/gcc/testsuite/gcc.target/i386/apx-legacy-insn-check-norex2.c
> > > index 510213a6ca7..771bcb078e1 100644
> > > --- a/gcc/testsuite/gcc.target/i386/apx-legacy-insn-check-norex2.c
> > > +++ b/gcc/testsuite/gcc.target/i386/apx-legacy-insn-check-norex2.c
> > > @@ -45,13 +45,22 @@ typedef union
> > > DTYPE a[16];
> > > } tmp_u;
> > >
> > > -__attribute__((target("sse4.2")))
> > > +__attribute__((target("sse4.2,aes")))
> > > void sse_test ()
> > > {
> > > register tmp_u *tdst __asm__("%r16");
> > > register tmp_u *src1 __asm__("%r17");
> > > register tmp_u *src2 __asm__("%r18");
> > > -
> > > +
> > > + src1->xi[0] = _mm_minpos_epu16 (src1->xi[1]);
> > > + src1->a[2] = _mm_testc_si128 (src1->xi[3], src2->xi[4]);
> > > + src1->xf[3] = _mm_round_ss (src1->xf[5], src2->xf[6],
> > > + _MM_FROUND_CUR_DIRECTION);
> > > + src1->xf[4] = _mm_round_ps (src1->xf[7], _MM_FROUND_CUR_DIRECTION);
> > > + src1->xd[0] = _mm_round_sd (src1->xd[2], src2->xd[3],
> > > + _MM_FROUND_CUR_DIRECTION);
> > > + src1->xd[1] = _mm_round_pd (src1->xd[4], _MM_FROUND_CUR_DIRECTION);
> > > +
> > > src1->xi[0] = _mm_hadd_epi16 (tdst->xi[2], src2->xi[3]);
> > > src1->xi[1] = _mm_hadd_epi32 (tdst->xi[0], src2->xi[1]);
> > > tdst->xi[2] = _mm_hadds_epi16 (src1->xi[4], src2->xi[5]);
> > > @@ -77,16 +86,33 @@ void sse_test ()
> > > tdst->xi[1] = _mm_sign_epi8 (src1->xi[5], src2->xi[6]);
> > > tdst->xi[2] = _mm_sign_epi16 (src1->xi[7], src2->xi[0]);
> > > tdst->xi[3] = _mm_sign_epi32 (src1->xi[1], src2->xi[2]);
> > > +
> > > + tdst->a[2] = _mm_cmpestri (src1->xi[3], 16, src2->xi[4], 16, 0x0c);
> > > + tdst->xi[4] = _mm_cmpestrm (src1->xi[3], 16, src2->xi[4], 16, 0x20);
> > > + tdst->a[5] = _mm_cmpistri (src1->xi[5], src2->xi[6], 0x30);
> > > + tdst->xi[6] = _mm_cmpistrm (src1->xi[5], src2->xi[6], 0x40);
> > > +
> > > + tdst->xi[7] = _mm_aesimc_si128 (src1->xi[7]);
> > > + tdst->xi[0] = _mm_aeskeygenassist_si128 (src1->xi[1], 0x1b);
> > > }
> > >
> > > -__attribute__((target("avx2")))
> > > +__attribute__((target("avx2,aes")))
> > > void vex_test ()
> > > {
> > >
> > > register tmp_u *tdst __asm__("%r16");
> > > register tmp_u *src1 __asm__("%r17");
> > > register tmp_u *src2 __asm__("%r18");
> > > -
> > > +
> > > + src1->xi[0] = _mm_minpos_epu16 (src1->xi[1]);
> > > + src1->a[2] = _mm256_testc_si256 (src1->yi[2], src2->yi[3]);
> > > + src1->xf[3] = _mm_round_ss (src1->xf[5], src2->xf[6],
> > > + _MM_FROUND_CUR_DIRECTION);
> > > + src1->yf[4] = _mm256_round_ps (src1->yf[2], _MM_FROUND_CUR_DIRECTION);
> > > + src1->xd[0] = _mm_round_sd (src1->xd[2], src2->xd[3],
> > > + _MM_FROUND_CUR_DIRECTION);
> > > + src1->yd[1] = _mm256_round_pd (src1->yd[3], _MM_FROUND_CUR_DIRECTION);
> > > +
> > > src1->yi[1] = _mm256_hadd_epi16 (tdst->yi[2], src2->yi[3]);
> > > src1->yi[2] = _mm256_hadd_epi32 (tdst->yi[0], src2->yi[1]);
> > > tdst->yi[3] = _mm256_hadds_epi16 (src1->yi[1], src2->yi[2]);
> > > @@ -98,7 +124,6 @@ void vex_test ()
> > > src1->yi[1] = _mm256_cmpgt_epi64 (tdst->yi[3], src2->yi[0]);
> > >
> > > tdst->yf[2] = _mm256_dp_ps (src1->yf[0], src2->yf[1], 0xbf);
> > > - tdst->xd[3] = _mm_dp_pd (src1->xd[0], src2->xd[1], 0xbf);
> > >
> > > tdst->yi[3] = _mm256_mpsadbw_epu8 (src1->yi[1], src2->yi[1], 0xc1);
> > >
> > > @@ -112,6 +137,14 @@ void vex_test ()
> > > tdst->yi[2] = _mm256_sign_epi8 (src1->yi[0], src2->yi[1]);
> > > tdst->yi[3] = _mm256_sign_epi16 (src1->yi[2], src2->yi[3]);
> > > tdst->yi[0] = _mm256_sign_epi32 (src1->yi[0], src2->yi[1]);
> > > +
> > > + tdst->a[2] = _mm_cmpestri (src1->xi[3], 16, src2->xi[4], 16, 0x0c);
> > > + tdst->xi[4] = _mm_cmpestrm (src1->xi[3], 16, src2->xi[4], 16, 0x20);
> > > + tdst->a[5] = _mm_cmpistri (src1->xi[5], src2->xi[6], 0x30);
> > > + tdst->xi[6] = _mm_cmpistrm (src1->xi[5], src2->xi[6], 0x40);
> > > +
> > > + tdst->xi[7] = _mm_aesimc_si128 (src1->xi[7]);
> > > + tdst->xi[0] = _mm_aeskeygenassist_si128 (src1->xi[1], 0x1b);
> > > }
> > >
> > > /* { dg-final { scan-assembler-not "v?pcmpeqq\[ \\t]+\\\.\\\*r\(1\[6-9\]\|2\[0-9\]|30\|31\)" } } */
> > > @@ -134,3 +167,15 @@ void vex_test ()
> > > /* { dg-final { scan-assembler-not "v?psignb\[ \\t]+\\\.\\\*r\(1\[6-9\]\|2\[0-9\]|30\|31\)" } } */
> > > /* { dg-final { scan-assembler-not "v?psignw\[ \\t]+\\\.\\\*r\(1\[6-9\]\|2\[0-9\]|30\|31\)" } } */
> > > /* { dg-final { scan-assembler-not "v?psignd\[ \\t]+\\\.\\\*r\(1\[6-9\]\|2\[0-9\]|30\|31\)" } } */
> > > +/* { dg-final { scan-assembler-not "v?phminposuw\[ \\t]+\\\.\\\*r\(1\[6-9\]\|2\[0-9\]|30\|31\)" } } */
> > > +/* { dg-final { scan-assembler-not "v?ptest\[ \\t]+\\\.\\\*r\(1\[6-9\]\|2\[0-9\]|30\|31\)" } } */
> > > +/* { dg-final { scan-assembler-not "v?roundss\[ \\t]+\\\.\\\*r\(1\[6-9\]\|2\[0-9\]|30\|31\)" } } */
> > > +/* { dg-final { scan-assembler-not "v?roundsd\[ \\t]+\\\.\\\*r\(1\[6-9\]\|2\[0-9\]|30\|31\)" } } */
> > > +/* { dg-final { scan-assembler-not "v?roundps\[ \\t]+\\\.\\\*r\(1\[6-9\]\|2\[0-9\]|30\|31\)" } } */
> > > +/* { dg-final { scan-assembler-not "v?roundpd\[ \\t]+\\\.\\\*r\(1\[6-9\]\|2\[0-9\]|30\|31\)" } } */
> > > +/* { dg-final { scan-assembler-not "v?pcmpestri\[ \\t]+\\\.\\\*r\(1\[6-9\]\|2\[0-9\]|30\|31\)" } } */
> > > +/* { dg-final { scan-assembler-not "v?pcmpistri\[ \\t]+\\\.\\\*r\(1\[6-9\]\|2\[0-9\]|30\|31\)" } } */
> > > +/* { dg-final { scan-assembler-not "v?pcmpestrm\[ \\t]+\\\.\\\*r\(1\[6-9\]\|2\[0-9\]|30\|31\)" } } */
> > > +/* { dg-final { scan-assembler-not "v?pcmpistrm\[ \\t]+\\\.\\\*r\(1\[6-9\]\|2\[0-9\]|30\|31\)" } } */
> > > +/* { dg-final { scan-assembler-not "v?aesimc\[ \\t]+\\\.\\\*r\(1\[6-9\]\|2\[0-9\]|30\|31\)" } } */
> > > +/* { dg-final { scan-assembler-not "v?aeskeygenassist\[ \\t]+\\\.\\\*r\(1\[6-9\]\|2\[0-9\]|30\|31\)" } } */
> > > --
> > > 2.31.1
> > >
@@ -65,6 +65,7 @@ extern bool extended_reg_mentioned_p (rtx);
extern bool x86_extended_QIreg_mentioned_p (rtx_insn *);
extern bool x86_extended_reg_mentioned_p (rtx);
extern bool x86_extended_rex2reg_mentioned_p (rtx);
+extern bool x86_evex_reg_mentioned_p (rtx [], int);
extern bool x86_maybe_negate_const_int (rtx *, machine_mode);
extern machine_mode ix86_cc_mode (enum rtx_code, rtx, rtx);
@@ -22936,6 +22936,19 @@ x86_extended_rex2reg_mentioned_p (rtx insn)
return false;
}
+/* Return true when rtx operands mentions register that must be encoded using
+ evex prefix. */
+bool
+x86_evex_reg_mentioned_p (rtx operands[], int nops)
+{
+ int i;
+ for (i = 0; i < nops; i++)
+ if (EXT_REX_SSE_REG_P (operands[i])
+ || x86_extended_rex2reg_mentioned_p (operands[i]))
+ return true;
+ return false;
+}
+
/* If profitable, negate (without causing overflow) integer constant
of mode MODE at location LOC. Return true in this case. */
bool
@@ -21603,7 +21603,7 @@ (define_expand "significand<mode>2"
(define_insn "sse4_1_round<mode>2"
[(set (match_operand:MODEFH 0 "register_operand" "=x,x,x,v,v")
(unspec:MODEFH
- [(match_operand:MODEFH 1 "nonimmediate_operand" "0,x,m,v,m")
+ [(match_operand:MODEFH 1 "nonimmediate_operand" "0,x,Bt,v,m")
(match_operand:SI 2 "const_0_to_15_operand")]
UNSPEC_ROUND))]
"TARGET_SSE4_1"
@@ -21616,6 +21616,7 @@ (define_insn "sse4_1_round<mode>2"
[(set_attr "type" "ssecvt")
(set_attr "prefix_extra" "1,1,1,*,*")
(set_attr "length_immediate" "1")
+ (set_attr "gpr32" "1,1,0,1,1")
(set_attr "prefix" "maybe_vex,maybe_vex,maybe_vex,evex,evex")
(set_attr "isa" "noavx512f,noavx512f,noavx512f,avx512f,avx512f")
(set_attr "avx_partial_xmm_update" "false,false,true,false,true")
@@ -22617,11 +22617,12 @@ (define_insn "avx2_pblendd<mode>"
(define_insn "sse4_1_phminposuw"
[(set (match_operand:V8HI 0 "register_operand" "=Yr,*x,x")
- (unspec:V8HI [(match_operand:V8HI 1 "vector_operand" "YrBm,*xBm,xm")]
+ (unspec:V8HI [(match_operand:V8HI 1 "vector_operand" "YrBT,*xBT,xBt")]
UNSPEC_PHMINPOSUW))]
"TARGET_SSE4_1"
"%vphminposuw\t{%1, %0|%0, %1}"
[(set_attr "isa" "noavx,noavx,avx")
+ (set_attr "gpr32" "0")
(set_attr "type" "sselog1")
(set_attr "prefix_extra" "1")
(set_attr "prefix" "orig,orig,vex")
@@ -23810,12 +23811,13 @@ (define_insn "avx_vtest<ssemodesuffix><avxsizesuffix>"
(define_insn "*<sse4_1>_ptest<mode>"
[(set (reg FLAGS_REG)
(unspec [(match_operand:V_AVX 0 "register_operand" "Yr, *x, x")
- (match_operand:V_AVX 1 "vector_operand" "YrBm, *xBm, xm")]
+ (match_operand:V_AVX 1 "vector_operand" "YrBT, *xBT, xBt")]
UNSPEC_PTEST))]
"TARGET_SSE4_1 && ix86_match_ptest_ccmode (insn)"
"%vptest\t{%1, %0|%0, %1}"
[(set_attr "isa" "noavx,noavx,avx")
(set_attr "type" "ssecomi")
+ (set_attr "gpr32" "0")
(set_attr "prefix_extra" "1")
(set_attr "prefix" "orig,orig,vex")
(set (attr "btver2_decode")
@@ -23852,12 +23854,13 @@ (define_expand "<sse4_1>_ptest<mode>"
(define_insn "ptesttf2"
[(set (reg:CC FLAGS_REG)
(unspec:CC [(match_operand:TF 0 "register_operand" "Yr, *x, x")
- (match_operand:TF 1 "vector_operand" "YrBm, *xBm, xm")]
+ (match_operand:TF 1 "vector_operand" "YrBT, *xBT, xBt")]
UNSPEC_PTEST))]
"TARGET_SSE4_1"
"%vptest\t{%1, %0|%0, %1}"
[(set_attr "isa" "noavx,noavx,avx")
(set_attr "type" "ssecomi")
+ (set_attr "gpr32" "0")
(set_attr "prefix_extra" "1")
(set_attr "prefix" "orig,orig,vex")
(set_attr "mode" "TI")])
@@ -23968,13 +23971,14 @@ (define_expand "lrint<mode><sseintvecmodelower>2"
(define_insn "<sse4_1>_round<ssemodesuffix><avxsizesuffix>"
[(set (match_operand:VF_128_256 0 "register_operand" "=Yr,*x,x")
(unspec:VF_128_256
- [(match_operand:VF_128_256 1 "vector_operand" "YrBm,*xBm,xm")
+ [(match_operand:VF_128_256 1 "vector_operand" "YrBT,*xBT,xBt")
(match_operand:SI 2 "const_0_to_15_operand")]
UNSPEC_ROUND))]
"TARGET_SSE4_1"
"%vround<ssemodesuffix>\t{%2, %1, %0|%0, %1, %2}"
[(set_attr "isa" "noavx,noavx,avx")
(set_attr "type" "ssecvt")
+ (set_attr "gpr32" "0")
(set_attr "prefix_data16" "1,1,*")
(set_attr "prefix_extra" "1")
(set_attr "length_immediate" "1")
@@ -24061,19 +24065,32 @@ (define_insn "sse4_1_round<ssescalarmodesuffix>"
[(set (match_operand:VF_128 0 "register_operand" "=Yr,*x,x,v")
(vec_merge:VF_128
(unspec:VF_128
- [(match_operand:VF_128 2 "nonimmediate_operand" "Yrm,*xm,xm,vm")
+ [(match_operand:VF_128 2 "nonimmediate_operand" "YrBt,*xBt,xBt,vm")
(match_operand:SI 3 "const_0_to_15_operand")]
UNSPEC_ROUND)
(match_operand:VF_128 1 "register_operand" "0,0,x,v")
(const_int 1)))]
"TARGET_SSE4_1"
- "@
- round<ssescalarmodesuffix>\t{%3, %2, %0|%0, %<iptr>2, %3}
- round<ssescalarmodesuffix>\t{%3, %2, %0|%0, %<iptr>2, %3}
- vround<ssescalarmodesuffix>\t{%3, %2, %1, %0|%0, %1, %<iptr>2, %3}
- vrndscale<ssescalarmodesuffix>\t{%3, %2, %1, %0|%0, %1, %<iptr>2, %3}"
- [(set_attr "isa" "noavx,noavx,avx,avx512f")
+{
+ switch (which_alternative)
+ {
+ case 0:
+ case 1:
+ return "round<ssescalarmodesuffix>\t{%3, %2, %0|%0, %<iptr>2, %3}";
+ case 2:
+ return "vround<ssescalarmodesuffix>\t{%3, %2, %1, %0|%0, %1, %<iptr>2, %3}";
+ case 3:
+ if (x86_evex_reg_mentioned_p (operands, 3))
+ return "vrndscale<ssescalarmodesuffix>\t{%3, %2, %1, %0|%0, %1, %<iptr>2, %3}";
+ else
+ return "vround<ssescalarmodesuffix>\t{%3, %2, %1, %0|%0, %1, %<iptr>2, %3}";
+ default:
+ gcc_unreachable ();
+ }
+}
+ [(set_attr "isa" "noavx,noavx,noavx512f,avx512f")
(set_attr "type" "ssecvt")
+ (set_attr "gpr32" "0,0,0,1")
(set_attr "length_immediate" "1")
(set_attr "prefix_data16" "1,1,*,*")
(set_attr "prefix_extra" "1")
@@ -24085,19 +24102,32 @@ (define_insn "*sse4_1_round<ssescalarmodesuffix>"
(vec_merge:VFH_128
(vec_duplicate:VFH_128
(unspec:<ssescalarmode>
- [(match_operand:<ssescalarmode> 2 "nonimmediate_operand" "Yrm,*xm,xm,vm")
+ [(match_operand:<ssescalarmode> 2 "nonimmediate_operand" "YrBt,*xBt,xBt,vm")
(match_operand:SI 3 "const_0_to_15_operand")]
UNSPEC_ROUND))
(match_operand:VFH_128 1 "register_operand" "0,0,x,v")
(const_int 1)))]
"TARGET_SSE4_1"
- "@
- round<ssescalarmodesuffix>\t{%3, %2, %0|%0, %2, %3}
- round<ssescalarmodesuffix>\t{%3, %2, %0|%0, %2, %3}
- vround<ssescalarmodesuffix>\t{%3, %2, %1, %0|%0, %1, %2, %3}
- vrndscale<ssescalarmodesuffix>\t{%3, %2, %1, %0|%0, %1, %2, %3}"
- [(set_attr "isa" "noavx,noavx,avx,avx512f")
+{
+ switch (which_alternative)
+ {
+ case 0:
+ case 1:
+ return "round<ssescalarmodesuffix>\t{%3, %2, %0|%0, %2, %3}";
+ case 2:
+ return "vround<ssescalarmodesuffix>\t{%3, %2, %1, %0|%0, %1, %2, %3}";
+ case 3:
+ if (x86_evex_reg_mentioned_p (operands, 3) || <MODE>mode == V8HFmode)
+ return "vrndscale<ssescalarmodesuffix>\t{%3, %2, %1, %0|%0, %1, %2, %3}";
+ else
+ return "vround<ssescalarmodesuffix>\t{%3, %2, %1, %0|%0, %1, %2, %3}";
+ default:
+ gcc_unreachable ();
+ }
+}
+ [(set_attr "isa" "noavx,noavx,noavx512f,avx512f")
(set_attr "type" "ssecvt")
+ (set_attr "gpr32" "0,0,0,1")
(set_attr "length_immediate" "1")
(set_attr "prefix_data16" "1,1,*,*")
(set_attr "prefix_extra" "1")
@@ -24318,7 +24348,7 @@ (define_insn "sse4_2_pcmpestri"
(unspec:SI
[(match_operand:V16QI 1 "register_operand" "x,x")
(match_operand:SI 2 "register_operand" "a,a")
- (match_operand:V16QI 3 "nonimmediate_operand" "x,m")
+ (match_operand:V16QI 3 "nonimmediate_operand" "x,Bt")
(match_operand:SI 4 "register_operand" "d,d")
(match_operand:SI 5 "const_0_to_255_operand")]
UNSPEC_PCMPESTR))
@@ -24333,6 +24363,7 @@ (define_insn "sse4_2_pcmpestri"
"TARGET_SSE4_2"
"%vpcmpestri\t{%5, %3, %1|%1, %3, %5}"
[(set_attr "type" "sselog")
+ (set_attr "gpr32" "0")
(set_attr "prefix_extra" "1")
(set_attr "prefix" "maybe_vex")
(set_attr "length_immediate" "1")
@@ -24345,7 +24376,7 @@ (define_insn "sse4_2_pcmpestrm"
(unspec:V16QI
[(match_operand:V16QI 1 "register_operand" "x,x")
(match_operand:SI 2 "register_operand" "a,a")
- (match_operand:V16QI 3 "nonimmediate_operand" "x,m")
+ (match_operand:V16QI 3 "nonimmediate_operand" "x,Bt")
(match_operand:SI 4 "register_operand" "d,d")
(match_operand:SI 5 "const_0_to_255_operand")]
UNSPEC_PCMPESTR))
@@ -24360,6 +24391,7 @@ (define_insn "sse4_2_pcmpestrm"
"TARGET_SSE4_2"
"%vpcmpestrm\t{%5, %3, %1|%1, %3, %5}"
[(set_attr "type" "sselog")
+ (set_attr "gpr32" "0")
(set_attr "prefix_extra" "1")
(set_attr "length_immediate" "1")
(set_attr "prefix" "maybe_vex")
@@ -24372,7 +24404,7 @@ (define_insn "sse4_2_pcmpestr_cconly"
(unspec:CC
[(match_operand:V16QI 2 "register_operand" "x,x,x,x")
(match_operand:SI 3 "register_operand" "a,a,a,a")
- (match_operand:V16QI 4 "nonimmediate_operand" "x,m,x,m")
+ (match_operand:V16QI 4 "nonimmediate_operand" "x,Bt,x,Bt")
(match_operand:SI 5 "register_operand" "d,d,d,d")
(match_operand:SI 6 "const_0_to_255_operand")]
UNSPEC_PCMPESTR))
@@ -24385,6 +24417,7 @@ (define_insn "sse4_2_pcmpestr_cconly"
%vpcmpestri\t{%6, %4, %2|%2, %4, %6}
%vpcmpestri\t{%6, %4, %2|%2, %4, %6}"
[(set_attr "type" "sselog")
+ (set_attr "gpr32" "0")
(set_attr "prefix_extra" "1")
(set_attr "length_immediate" "1")
(set_attr "memory" "none,load,none,load")
@@ -24396,7 +24429,7 @@ (define_insn_and_split "sse4_2_pcmpistr"
[(set (match_operand:SI 0 "register_operand" "=c,c")
(unspec:SI
[(match_operand:V16QI 2 "register_operand" "x,x")
- (match_operand:V16QI 3 "nonimmediate_operand" "x,m")
+ (match_operand:V16QI 3 "nonimmediate_operand" "x,Bt")
(match_operand:SI 4 "const_0_to_255_operand")]
UNSPEC_PCMPISTR))
(set (match_operand:V16QI 1 "register_operand" "=Yz,Yz")
@@ -24439,6 +24472,7 @@ (define_insn_and_split "sse4_2_pcmpistr"
DONE;
}
[(set_attr "type" "sselog")
+ (set_attr "gpr32" "0")
(set_attr "prefix_extra" "1")
(set_attr "length_immediate" "1")
(set_attr "memory" "none,load")
@@ -24448,7 +24482,7 @@ (define_insn "sse4_2_pcmpistri"
[(set (match_operand:SI 0 "register_operand" "=c,c")
(unspec:SI
[(match_operand:V16QI 1 "register_operand" "x,x")
- (match_operand:V16QI 2 "nonimmediate_operand" "x,m")
+ (match_operand:V16QI 2 "nonimmediate_operand" "x,Bt")
(match_operand:SI 3 "const_0_to_255_operand")]
UNSPEC_PCMPISTR))
(set (reg:CC FLAGS_REG)
@@ -24460,6 +24494,7 @@ (define_insn "sse4_2_pcmpistri"
"TARGET_SSE4_2"
"%vpcmpistri\t{%3, %2, %1|%1, %2, %3}"
[(set_attr "type" "sselog")
+ (set_attr "gpr32" "0")
(set_attr "prefix_extra" "1")
(set_attr "length_immediate" "1")
(set_attr "prefix" "maybe_vex")
@@ -24471,7 +24506,7 @@ (define_insn "sse4_2_pcmpistrm"
[(set (match_operand:V16QI 0 "register_operand" "=Yz,Yz")
(unspec:V16QI
[(match_operand:V16QI 1 "register_operand" "x,x")
- (match_operand:V16QI 2 "nonimmediate_operand" "x,m")
+ (match_operand:V16QI 2 "nonimmediate_operand" "x,Bt")
(match_operand:SI 3 "const_0_to_255_operand")]
UNSPEC_PCMPISTR))
(set (reg:CC FLAGS_REG)
@@ -24483,6 +24518,7 @@ (define_insn "sse4_2_pcmpistrm"
"TARGET_SSE4_2"
"%vpcmpistrm\t{%3, %2, %1|%1, %2, %3}"
[(set_attr "type" "sselog")
+ (set_attr "gpr32" "0")
(set_attr "prefix_extra" "1")
(set_attr "length_immediate" "1")
(set_attr "prefix" "maybe_vex")
@@ -24494,7 +24530,7 @@ (define_insn "sse4_2_pcmpistr_cconly"
[(set (reg:CC FLAGS_REG)
(unspec:CC
[(match_operand:V16QI 2 "register_operand" "x,x,x,x")
- (match_operand:V16QI 3 "nonimmediate_operand" "x,m,x,m")
+ (match_operand:V16QI 3 "nonimmediate_operand" "x,Bt,x,Bt")
(match_operand:SI 4 "const_0_to_255_operand")]
UNSPEC_PCMPISTR))
(clobber (match_scratch:V16QI 0 "=Yz,Yz,X,X"))
@@ -24506,6 +24542,7 @@ (define_insn "sse4_2_pcmpistr_cconly"
%vpcmpistri\t{%4, %3, %2|%2, %3, %4}
%vpcmpistri\t{%4, %3, %2|%2, %3, %4}"
[(set_attr "type" "sselog")
+ (set_attr "gpr32" "0")
(set_attr "prefix_extra" "1")
(set_attr "length_immediate" "1")
(set_attr "memory" "none,load,none,load")
@@ -25990,23 +26027,25 @@ (define_insn "aesdeclast"
(define_insn "aesimc"
[(set (match_operand:V2DI 0 "register_operand" "=x")
- (unspec:V2DI [(match_operand:V2DI 1 "vector_operand" "xBm")]
+ (unspec:V2DI [(match_operand:V2DI 1 "vector_operand" "xBT")]
UNSPEC_AESIMC))]
"TARGET_AES"
"%vaesimc\t{%1, %0|%0, %1}"
[(set_attr "type" "sselog1")
+ (set_attr "gpr32" "0")
(set_attr "prefix_extra" "1")
(set_attr "prefix" "maybe_vex")
(set_attr "mode" "TI")])
(define_insn "aeskeygenassist"
[(set (match_operand:V2DI 0 "register_operand" "=x")
- (unspec:V2DI [(match_operand:V2DI 1 "vector_operand" "xBm")
+ (unspec:V2DI [(match_operand:V2DI 1 "vector_operand" "xBT")
(match_operand:SI 2 "const_0_to_255_operand")]
UNSPEC_AESKEYGENASSIST))]
"TARGET_AES"
"%vaeskeygenassist\t{%2, %1, %0|%0, %1, %2}"
[(set_attr "type" "sselog1")
+ (set_attr "gpr32" "0")
(set_attr "prefix_extra" "1")
(set_attr "length_immediate" "1")
(set_attr "prefix" "maybe_vex")
@@ -45,13 +45,22 @@ typedef union
DTYPE a[16];
} tmp_u;
-__attribute__((target("sse4.2")))
+__attribute__((target("sse4.2,aes")))
void sse_test ()
{
register tmp_u *tdst __asm__("%r16");
register tmp_u *src1 __asm__("%r17");
register tmp_u *src2 __asm__("%r18");
-
+
+ src1->xi[0] = _mm_minpos_epu16 (src1->xi[1]);
+ src1->a[2] = _mm_testc_si128 (src1->xi[3], src2->xi[4]);
+ src1->xf[3] = _mm_round_ss (src1->xf[5], src2->xf[6],
+ _MM_FROUND_CUR_DIRECTION);
+ src1->xf[4] = _mm_round_ps (src1->xf[7], _MM_FROUND_CUR_DIRECTION);
+ src1->xd[0] = _mm_round_sd (src1->xd[2], src2->xd[3],
+ _MM_FROUND_CUR_DIRECTION);
+ src1->xd[1] = _mm_round_pd (src1->xd[4], _MM_FROUND_CUR_DIRECTION);
+
src1->xi[0] = _mm_hadd_epi16 (tdst->xi[2], src2->xi[3]);
src1->xi[1] = _mm_hadd_epi32 (tdst->xi[0], src2->xi[1]);
tdst->xi[2] = _mm_hadds_epi16 (src1->xi[4], src2->xi[5]);
@@ -77,16 +86,33 @@ void sse_test ()
tdst->xi[1] = _mm_sign_epi8 (src1->xi[5], src2->xi[6]);
tdst->xi[2] = _mm_sign_epi16 (src1->xi[7], src2->xi[0]);
tdst->xi[3] = _mm_sign_epi32 (src1->xi[1], src2->xi[2]);
+
+ tdst->a[2] = _mm_cmpestri (src1->xi[3], 16, src2->xi[4], 16, 0x0c);
+ tdst->xi[4] = _mm_cmpestrm (src1->xi[3], 16, src2->xi[4], 16, 0x20);
+ tdst->a[5] = _mm_cmpistri (src1->xi[5], src2->xi[6], 0x30);
+ tdst->xi[6] = _mm_cmpistrm (src1->xi[5], src2->xi[6], 0x40);
+
+ tdst->xi[7] = _mm_aesimc_si128 (src1->xi[7]);
+ tdst->xi[0] = _mm_aeskeygenassist_si128 (src1->xi[1], 0x1b);
}
-__attribute__((target("avx2")))
+__attribute__((target("avx2,aes")))
void vex_test ()
{
register tmp_u *tdst __asm__("%r16");
register tmp_u *src1 __asm__("%r17");
register tmp_u *src2 __asm__("%r18");
-
+
+ src1->xi[0] = _mm_minpos_epu16 (src1->xi[1]);
+ src1->a[2] = _mm256_testc_si256 (src1->yi[2], src2->yi[3]);
+ src1->xf[3] = _mm_round_ss (src1->xf[5], src2->xf[6],
+ _MM_FROUND_CUR_DIRECTION);
+ src1->yf[4] = _mm256_round_ps (src1->yf[2], _MM_FROUND_CUR_DIRECTION);
+ src1->xd[0] = _mm_round_sd (src1->xd[2], src2->xd[3],
+ _MM_FROUND_CUR_DIRECTION);
+ src1->yd[1] = _mm256_round_pd (src1->yd[3], _MM_FROUND_CUR_DIRECTION);
+
src1->yi[1] = _mm256_hadd_epi16 (tdst->yi[2], src2->yi[3]);
src1->yi[2] = _mm256_hadd_epi32 (tdst->yi[0], src2->yi[1]);
tdst->yi[3] = _mm256_hadds_epi16 (src1->yi[1], src2->yi[2]);
@@ -98,7 +124,6 @@ void vex_test ()
src1->yi[1] = _mm256_cmpgt_epi64 (tdst->yi[3], src2->yi[0]);
tdst->yf[2] = _mm256_dp_ps (src1->yf[0], src2->yf[1], 0xbf);
- tdst->xd[3] = _mm_dp_pd (src1->xd[0], src2->xd[1], 0xbf);
tdst->yi[3] = _mm256_mpsadbw_epu8 (src1->yi[1], src2->yi[1], 0xc1);
@@ -112,6 +137,14 @@ void vex_test ()
tdst->yi[2] = _mm256_sign_epi8 (src1->yi[0], src2->yi[1]);
tdst->yi[3] = _mm256_sign_epi16 (src1->yi[2], src2->yi[3]);
tdst->yi[0] = _mm256_sign_epi32 (src1->yi[0], src2->yi[1]);
+
+ tdst->a[2] = _mm_cmpestri (src1->xi[3], 16, src2->xi[4], 16, 0x0c);
+ tdst->xi[4] = _mm_cmpestrm (src1->xi[3], 16, src2->xi[4], 16, 0x20);
+ tdst->a[5] = _mm_cmpistri (src1->xi[5], src2->xi[6], 0x30);
+ tdst->xi[6] = _mm_cmpistrm (src1->xi[5], src2->xi[6], 0x40);
+
+ tdst->xi[7] = _mm_aesimc_si128 (src1->xi[7]);
+ tdst->xi[0] = _mm_aeskeygenassist_si128 (src1->xi[1], 0x1b);
}
/* { dg-final { scan-assembler-not "v?pcmpeqq\[ \\t]+\\\.\\\*r\(1\[6-9\]\|2\[0-9\]|30\|31\)" } } */
@@ -134,3 +167,15 @@ void vex_test ()
/* { dg-final { scan-assembler-not "v?psignb\[ \\t]+\\\.\\\*r\(1\[6-9\]\|2\[0-9\]|30\|31\)" } } */
/* { dg-final { scan-assembler-not "v?psignw\[ \\t]+\\\.\\\*r\(1\[6-9\]\|2\[0-9\]|30\|31\)" } } */
/* { dg-final { scan-assembler-not "v?psignd\[ \\t]+\\\.\\\*r\(1\[6-9\]\|2\[0-9\]|30\|31\)" } } */
+/* { dg-final { scan-assembler-not "v?phminposuw\[ \\t]+\\\.\\\*r\(1\[6-9\]\|2\[0-9\]|30\|31\)" } } */
+/* { dg-final { scan-assembler-not "v?ptest\[ \\t]+\\\.\\\*r\(1\[6-9\]\|2\[0-9\]|30\|31\)" } } */
+/* { dg-final { scan-assembler-not "v?roundss\[ \\t]+\\\.\\\*r\(1\[6-9\]\|2\[0-9\]|30\|31\)" } } */
+/* { dg-final { scan-assembler-not "v?roundsd\[ \\t]+\\\.\\\*r\(1\[6-9\]\|2\[0-9\]|30\|31\)" } } */
+/* { dg-final { scan-assembler-not "v?roundps\[ \\t]+\\\.\\\*r\(1\[6-9\]\|2\[0-9\]|30\|31\)" } } */
+/* { dg-final { scan-assembler-not "v?roundpd\[ \\t]+\\\.\\\*r\(1\[6-9\]\|2\[0-9\]|30\|31\)" } } */
+/* { dg-final { scan-assembler-not "v?pcmpestri\[ \\t]+\\\.\\\*r\(1\[6-9\]\|2\[0-9\]|30\|31\)" } } */
+/* { dg-final { scan-assembler-not "v?pcmpistri\[ \\t]+\\\.\\\*r\(1\[6-9\]\|2\[0-9\]|30\|31\)" } } */
+/* { dg-final { scan-assembler-not "v?pcmpestrm\[ \\t]+\\\.\\\*r\(1\[6-9\]\|2\[0-9\]|30\|31\)" } } */
+/* { dg-final { scan-assembler-not "v?pcmpistrm\[ \\t]+\\\.\\\*r\(1\[6-9\]\|2\[0-9\]|30\|31\)" } } */
+/* { dg-final { scan-assembler-not "v?aesimc\[ \\t]+\\\.\\\*r\(1\[6-9\]\|2\[0-9\]|30\|31\)" } } */
+/* { dg-final { scan-assembler-not "v?aeskeygenassist\[ \\t]+\\\.\\\*r\(1\[6-9\]\|2\[0-9\]|30\|31\)" } } */
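The dg-final directives above are meant to reject any assembly line where one of the listed legacy or VEX mnemonics is followed by an APX extended GPR (r16 through r31). A minimal standalone sketch of that check in C follows. The helper names mentions_egpr and insn_mentions_egpr are hypothetical and not part of the patch or the testsuite, and the matching is stricter than the unanchored scan-assembler-not pattern because it requires the mnemonic to start the line.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of the patch: return 1 if TEXT mentions
   one of the APX extended GPRs r16..r31 (a b/w/d suffix may follow),
   0 otherwise.  */
static int
mentions_egpr (const char *text)
{
  for (const char *p = text; (p = strchr (p, 'r')) != NULL; p++)
    {
      /* Skip an 'r' in the middle of an identifier, e.g. "vroundss".  */
      if (p != text && (isalnum ((unsigned char) p[-1]) || p[-1] == '_'))
        continue;
      /* Need exactly two digits after the 'r'.  */
      if (!isdigit ((unsigned char) p[1]) || !isdigit ((unsigned char) p[2]))
        continue;
      if (isdigit ((unsigned char) p[3]))
        continue;
      int n = (p[1] - '0') * 10 + (p[2] - '0');
      if (n >= 16 && n <= 31)
        return 1;
    }
  return 0;
}

/* Hypothetical helper: return 1 if LINE is an instruction whose mnemonic
   is MNEMONIC, optionally with a 'v' prefix, and whose operands mention
   r16..r31.  */
static int
insn_mentions_egpr (const char *line, const char *mnemonic)
{
  size_t len = strlen (mnemonic);

  while (*line == ' ' || *line == '\t')
    line++;
  /* Accept the VEX spelling, mirroring the "v?" in the directives.  */
  if (*line == 'v' && strncmp (line + 1, mnemonic, len) == 0)
    line++;
  if (strncmp (line, mnemonic, len) != 0)
    return 0;
  line += len;
  /* The mnemonic must end here, so "ptest" does not match "ptestx".  */
  if (*line != ' ' && *line != '\t')
    return 0;
  return mentions_egpr (line);
}
```

With gpr32 set to 0 and the Bt/BT constraints in place, the expectation is that no line of the generated assembly for these insns would make insn_mentions_egpr return 1.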