diff mbox series

[v3,4/9] Support APX GPR32 with extend evex prefix

Message ID	20231124070213.3886483-4-lili.cui@intel.com
State	Unresolved
Headers	Received-SPF: pass (google.com: domain of binutils-bounces+ouuuleilei=gmail.com@sourceware.org designates 8.43.85.97 as permitted sender) client-ip=8.43.85.97; DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 776873858D3C From: "Cui, Lili" <lili.cui@intel.com> To: binutils@sourceware.org Cc: jbeulich@suse.com, hongjiu.lu@intel.com Subject: [PATCH v3 4/9] Support APX GPR32 with extend evex prefix Date: Fri, 24 Nov 2023 07:02:08 +0000 Message-Id: <20231124070213.3886483-4-lili.cui@intel.com> In-Reply-To: <20231124070213.3886483-1-lili.cui@intel.com> References: <20231124070213.3886483-1-lili.cui@intel.com> MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Precedence: list Errors-To: binutils-bounces+ouuuleilei=gmail.com@sourceware.org X-getmail-retrieved-from-mailbox: INBOX
Series	[1/9] Make const_1_mode print $1 in AT&T syntax \| [1/9] Make const_1_mode print $1 in AT&T syntax [v3,2/9] Support APX GPR32 with rex2 prefix [v3,4/9] Support APX GPR32 with extend evex prefix [v3,5/9] Add tests for APX GPR32 with extend evex prefix [v3,6/9] Support APX NDD [v3,7/9] Support APX Push2/Pop2 [v3,8/9] Support APX NDD optimized encoding. [v3,9/9] Support APX JMPABS for disassembler

Checks

Context	Check	Description
snail/binutils-gdb-check	warning	Git am fail log

Commit Message

Cui, Lili Nov. 24, 2023, 7:02 a.m. UTC

  This patch adds non-ND, non-NF forms of EVEX promotion insn.

EVEX extension of legacy instructions:
  All promoted legacy instructions are placed in EVEX map 4, which is
  currently reserved.
EVEX extension of EVEX instructions:
  All existing EVEX instructions are extended by APX using the extended
  EVEX prefix, so that they can access all 32 GPRs.
EVEX extension of VEX instructions:
  Promoting a VEX instruction into the EVEX space does not change the map
  id, the opcode, or the operand encoding of the VEX instruction.

Note: The promoted versions of MOVBE will be extended to include the “MOVBE
  reg1, reg2”.

  gas/ChangeLog:

  2023-11-21  Lingling Kong <lingling.kong@intel.com>
	      H.J. Lu  <hongjiu.lu@intel.com>
	      Lili Cui <lili.cui@intel.com>
	      Lin Hu   <lin1.hu@intel.com>

	* config/tc-i386.c (cpu_flags_not_or_check): Add a new
	function for APX cpu flag checking.
	(cpu_flags_match): handle cpu_flags_not_or_check.
	(install_template): Add AMX_TILE and APX combine.
	(is_any_apx_evex_encoding): Test apx evex encoding.
	(build_apx_evex_prefix): Enabe APX evex prefix.
	(md_assemble): Handle apx with evex encoding.
	(check_EgprOperands): Add nodgpr check for apx.
	(process_suffix): Handle apx map4 prefix.
	(check_register): Assign i.vec_encoding for APX evex instructions.
	* testsuite/gas/i386/x86-64-evex.d: Adjust test cases.
	* gas/testsuite/gas/i386/x86-64-inval-movbe.s: Ditto.
	* gas/testsuite/gas/i386/x86-64-inval-movbe.l: Ditto.

opcodes/ChangeLog:

	* i386-dis-evex-len.h: Handle EVEX_LEN_0F38F2, EVEX_LEN_0F38F3.
	* i386-dis-evex-prefix.h: Handle PREFIX_EVEX_0F38F2_L_0,
	PREFIX_EVEX_0F38F3_L_0, PREFIX_EVEX_MAP4_D8,
	PREFIX_EVEX_MAP4_DA, PREFIX_EVEX_MAP4_D8,
	PREFIX_EVEX_MAP4_DA, PREFIX_EVEX_MAP4_DB,
	PREFIX_EVEX_MAP4_DC, PREFIX_EVEX_MAP4_DD,
	PREFIX_EVEX_MAP4_DE, PREFIX_EVEX_MAP4_DF,
	PREFIX_EVEX_MAP4_F0, PREFIX_EVEX_MAP4_F1,
	PREFIX_EVEX_MAP4_F2, PREFIX_EVEX_MAP4_F8.
	* i386-dis-evex-reg.h: Handle REG_EVEX_0F38F3_L_0_P_0.
	* i386-dis-evex.h: Add EVEX_MAP4_ for legacy insn
	promote to apx to use gpr32
	* opcodes/i386-dis-evex-x86-64.h: Handle Add X86_64_EVEX_0F90,
	X86_64_EVEX_0F92, X86_64_EVEX_0F93, X86_64_EVEX_0F3849,
	X86_64_EVEX_0F384B, X86_64_EVEX_0F38E0, X86_64_EVEX_0F38E1,
	X86_64_EVEX_0F38E2, X86_64_EVEX_0F38E3, X86_64_EVEX_0F38E4,
	X86_64_EVEX_0F38E5, X86_64_EVEX_0F38E6, X86_64_EVEX_0F38E7,
	X86_64_EVEX_0F38E8, X86_64_EVEX_0F38E9, X86_64_EVEX_0F38EA,
	X86_64_EVEX_0F38EB, X86_64_EVEX_0F38EC, X86_64_EVEX_0F38ED,
	X86_64_EVEX_0F38EE, X86_64_EVEX_0F38EF, X86_64_EVEX_0F38F2,
	X86_64_EVEX_0F38F3, X86_64_EVEX_0F38F5, X86_64_EVEX_0F38F6,
	X86_64_EVEX_0F38F7, X86_64_EVEX_0F3AF0, X86_64_EVEX_0F91.
	* i386-dis.c
	(struct instr_info): Deleted bool r.
	(PREFIX_NP_OR_DATA): New.
	(NO_PREFIX): New.
	(putop): Ditto.
	(X86_64_EVEX_FROM_VEX_TABLE): Diito.
	(get_valid_dis386): Decode insn erex in extend evex prefix.
	Handle EVEX_MAP4
	(print_insn): Handle PREFIX_DATA_AND_NP_ONLY.
	(print_register): Handle apx instructions decode.
	(OP_E_memory): Diito.
	(OP_G): Diito.
	(OP_XMM): Diito.
	(DistinctDest_Fixup): Diito.
	* i386-gen.c (process_i386_opcode_modifier):
	* i386-opc.h (SPACE_EVEXMAP4): Add legacy insn
	promote to evex.
	* i386-opc.tbl: Handle some legacy and vex insns don't
	support gpr32. And add some legacy insn (map2 / 3) promote
	to evex.
---
 gas/config/tc-i386.c                 |  81 ++++++++++++---
 gas/testsuite/gas/i386/x86-64-evex.d |   2 +-
 gas/testsuite/gas/i386/x86-64.exp    |   2 +-
 opcodes/i386-dis-evex-len.h          |  10 ++
 opcodes/i386-dis-evex-prefix.h       |  66 +++++++++++++
 opcodes/i386-dis-evex-reg.h          |   7 ++
 opcodes/i386-dis-evex-x86-64.h       |  60 +++++++++++
 opcodes/i386-dis-evex.h              |  94 +++++++++---------
 opcodes/i386-dis.c                   | 142 +++++++++++++++++++++++----
 opcodes/i386-gen.c                   |   2 +
 opcodes/i386-opc.h                   |  10 ++
 opcodes/i386-opc.tbl                 |  90 +++++++++++------
 12 files changed, 455 insertions(+), 111 deletions(-)
 create mode 100644 opcodes/i386-dis-evex-x86-64.h

Comments

Jan Beulich Dec. 7, 2023, 12:38 p.m. UTC | #1

On 24.11.2023 08:02, Cui, Lili wrote:
> --- a/gas/config/tc-i386.c
> +++ b/gas/config/tc-i386.c
> @@ -409,6 +409,9 @@ struct _i386_insn
>      /* Compressed disp8*N attribute.  */
>      unsigned int memshift;
>  
> +    /* No CSPAZO flags update.*/
> +    bool has_nf;

As before I don't see the point in adding this field when it's not used
in the change. Note that this is unrelated to the introduction of the NF
attribute right here, which has a reason.

> @@ -3670,10 +3673,11 @@ install_template (const insn_template *t)
>  
>    /* Dual VEX/EVEX templates need stripping one of the possible variants.  */
>    if (t->opcode_modifier.vex && t->opcode_modifier.evex)
> -  {
> -      if ((maybe_cpu (t, CpuAVX) || maybe_cpu (t, CpuAVX2)
> -	   || maybe_cpu (t, CpuFMA))
> -	  && (maybe_cpu (t, CpuAVX512F) || maybe_cpu (t, CpuAVX512VL)))
> +    {
> +      if (AVX512F(CpuAVX) || AVX512F(CpuAVX2) || AVX512F(CpuFMA)
> +	  || AVX512VL(CpuAVX) || AVX512VL(CpuAVX2) || APX_F(CpuCMPCCXADD)
> +	  || APX_F(CpuAMX_TILE) || APX_F(CpuAVX512F) || APX_F(CpuAVX512DQ)
> +	  || APX_F(CpuAVX512BW) || APX_F(CpuBMI) || APX_F(CpuBMI2))
>  	{
>  	  if (need_evex_encoding ())

There are several issues here:
- Why did you need to change (to the worse) the original code?
- Why did you not model the addition after that original code?
- How come APX_F (CpuAVX512*) constructs appear here, when no AVX512 insn
  can be VEX-encoded?
- If these new macros are really needed for whatever reason, they shouldn't
  be added to opcodes/i386-opc.h when they're useful only in the assembler.
- Style requires a blank before the opening parenthesis in function
  invocations (which also covers function-like macro invocations).

I think I asked before: How is it that you get away without altering
cpu_flags_match(), containing related and quite similar logic?

> @@ -3873,6 +3877,14 @@ is_any_vex_encoding (const insn_template *t)
>    return t->opcode_modifier.vex || t->opcode_modifier.evex;
>  }
>  
> +static INLINE bool
> +is_apx_evex_encoding (void)
> +{
> +  return i.rex2 || i.tm.opcode_space == SPACE_EVEXMAP4
> +    || (i.vex.register_specifier
> +	&& i.vex.register_specifier->reg_flags & RegRex2);
> +}

If you want this to be a function despite being used just once, you'll
need to add a comment mentioning the constraint when calling it (or
else the use of i.rex2 in particular is confusing). I'm sure I commented
on this before, and I thought such a comment had already appeared.

> @@ -5655,17 +5693,17 @@ md_assemble (char *line)
>       instruction already has a prefix, we need to convert old
>       registers to new ones.  */
>  
> -  if ((i.types[0].bitfield.class == Reg && i.types[0].bitfield.byte
> -       && (i.op[0].regs->reg_flags & RegRex64) != 0)
> -      || (i.types[1].bitfield.class == Reg && i.types[1].bitfield.byte
> -	  && (i.op[1].regs->reg_flags & RegRex64) != 0)
> -      || (((i.types[0].bitfield.class == Reg && i.types[0].bitfield.byte)
> -	   || (i.types[1].bitfield.class == Reg && i.types[1].bitfield.byte))
> -	  && (i.rex != 0 || i.rex2 != 0)))
> +  if (((i.types[0].bitfield.class == Reg && i.types[0].bitfield.byte
> +	&& (i.op[0].regs->reg_flags & RegRex64) != 0)
> +       || (i.types[1].bitfield.class == Reg && i.types[1].bitfield.byte
> +	   && (i.op[1].regs->reg_flags & RegRex64) != 0)
> +       || (((i.types[0].bitfield.class == Reg && i.types[0].bitfield.byte)
> +	    || (i.types[1].bitfield.class == Reg && i.types[1].bitfield.byte))
> +	   && (i.rex != 0 || i.rex2 != 0))))

I'm having trouble spotting the change here: There's an outer pair of
parentheses being added, but that's for no reason unless there's another
change well hidden. Please clarify.

>      {
>        int x;
>  
> -      if (!i.rex2)
> +      if (!is_apx_rex2_encoding () && !is_any_vex_encoding(&i.tm))
>  	i.rex |= REX_OPCODE;

Why the change to is_apx_rex2_encoding()? If that's wanted / needed
here, shouldn't that be put in place by the earlier patch?

> @@ -14233,6 +14276,12 @@ static bool check_register (const reg_entry *r)
>        if (!cpu_arch_flags.bitfield.cpuapx_f
>  	  || flag_code != CODE_64BIT)
>  	return false;
> +
> +      /* When using RegRex2, dual VEX/EVEX templates need to be marked as EVEX.
> +	 For the later install_template function.  */
> +      if (current_templates->start->opcode_modifier.vex
> +	  && current_templates->start->opcode_modifier.evex)
> +	i.vec_encoding = vex_encoding_evex;

I'm afraid I don't understand the 2nd sentence of the comment. This may
be related to my question regarding cpu_flags_match() further up.

The first sentence isn't quite correct either - you don't mark any
template here (and you can't, because we don't even know yet which
template we're going to use).

Finally - do you really need the .evex check here? (I won't exclude
that this yields a better diagnostic in certain cases, but this wants
clarifying if so.)

> --- a/gas/testsuite/gas/i386/x86-64.exp
> +++ b/gas/testsuite/gas/i386/x86-64.exp
> @@ -250,7 +250,7 @@ run_dump_test "x86-64-sse-noavx"
>  run_dump_test "x86-64-movbe"
>  run_dump_test "x86-64-movbe-intel"
>  run_dump_test "x86-64-movbe-suffix"
> -run_list_test "x86-64-inval-movbe" "-al"
> +run_list_test "x86-64-inval-movbe" "-I${srcdir}/$subdir -march=+noapx_f -al"

I can see why you add the -march=, as we've been through this before.
But why the -I ?

> @@ -896,7 +897,7 @@ rex.wrxb, 0x4f, x64, NoSuf|IsPrefix, {}
>  <pseudopfx:ident:cpu, disp8:Disp8:0, disp16:Disp16:0, disp32:Disp32:0, +
>                        load:Load:0, store:Store:0, +
>                        vex:VEX:0, vex2:VEX:0, vex3:VEX3:0, evex:EVEX:0, +
> -                      rex:REX:x64, rex2:REX2:x64, nooptimize:NoOptimize:0>
> +                      rex:REX:x64, rex2:REX2:APX_F, nooptimize:NoOptimize:0>

This change wants to go into the earlier patch?

> @@ -1319,13 +1320,16 @@ getsec, 0xf37, SMX, NoSuf, {}
>  
>  invept, 0x660f3880, EPT&No64, Modrm|IgnoreSize|NoSuf, { Oword|Unspecified|BaseIndex, Reg32 }
>  invept, 0x660f3880, EPT&x64, Modrm|NoSuf|NoRex64, { Oword|Unspecified|BaseIndex, Reg64 }
> +invept, 0xf3f0, EPT&APX_F, Modrm|NoSuf|EVex128|EVexMap4, { Oword|Unspecified|BaseIndex, Reg64 }
>  invvpid, 0x660f3881, EPT&No64, Modrm|IgnoreSize|NoSuf, { Oword|Unspecified|BaseIndex, Reg32 }
>  invvpid, 0x660f3881, EPT&x64, Modrm|NoSuf|NoRex64, { Oword|Unspecified|BaseIndex, Reg64 }
> +invvpid, 0xf3f1, EPT&APX_F, Modrm|NoSuf|EVex128|EVexMap4, { Oword|Unspecified|BaseIndex, Reg64 }

Seeing these: Are there any Map4 encodings which aren't EVex128? If not (and
if you're also not hiddenly aware of some appearing in the near future),
please consider making EVexMap4 include this right away. Even if in the
longer run other encodings appear, it'll then be easy to simply replace all
the EVexMap4 uses in a purely mechanical way. Until then shorter template
lines are preferable.

> @@ -1437,7 +1443,6 @@ xgetbv, 0xf01d0, Xsave, NoSuf, {}
>  xsetbv, 0xf01d1, Xsave, NoSuf, {}
>  
>  // xsaveopt
> -
>  xsaveopt, 0xfae/6, Xsaveopt, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|NoEgpr, { Unspecified|BaseIndex }
>  xsaveopt64, 0xfae/6, Xsaveopt&x64, Modrm|NoSuf|Size64|NoEgpr, { Unspecified|BaseIndex }

Iirc the earlier patch added that blank line. Why would you do such back
and forth?

> @@ -1837,14 +1842,14 @@ xtest, 0xf01d6, HLE|RTM, NoSuf, {}
>  
>  // BMI2 instructions.
>  
> -bzhi, 0xf5, BMI2, Modrm|CheckOperandSize|Vex128|Space0F38|VexVVVV|SwapSources|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64, Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }
> -mulx, 0xf2f6, BMI2, Modrm|CheckOperandSize|Vex128|Space0F38|VexVVVV|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64, Reg32|Reg64 }
> -pdep, 0xf2f5, BMI2, Modrm|CheckOperandSize|Vex128|Space0F38|VexVVVV|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64, Reg32|Reg64 }
> -pext, 0xf3f5, BMI2, Modrm|CheckOperandSize|Vex128|Space0F38|VexVVVV|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64, Reg32|Reg64 }
> -rorx, 0xf2f0, BMI2, Modrm|CheckOperandSize|Vex128|Space0F3A|No_bSuf|No_wSuf|No_sSuf, { Imm8|Imm8S, Reg32|Reg64|Dword|Qword|Unspecified|BaseIndex, Reg32|Reg64 }
> -sarx, 0xf3f7, BMI2, Modrm|CheckOperandSize|Vex128|Space0F38|VexVVVV|SwapSources|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64, Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }
> -shlx, 0x66f7, BMI2, Modrm|CheckOperandSize|Vex128|Space0F38|VexVVVV|SwapSources|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64, Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }
> -shrx, 0xf2f7, BMI2, Modrm|CheckOperandSize|Vex128|Space0F38|VexVVVV|SwapSources|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64, Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }
> +bzhi, 0xf5, BMI2&(BMI2|APX_F), Modrm|CheckOperandSize|Vex128|EVex128|Space0F38|VexVVVV|SwapSources|No_bSuf|No_wSuf|No_sSuf|NF, { Reg32|Reg64, Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }

Hmm, I had specifically suggested a pre-processor macro to use in place of the
open-coded BMI2&(BMI2|APX_F). Is there a reason you didn't use that (here and
below)?

Jan

Jan Beulich Dec. 7, 2023, 1:34 p.m. UTC | #2

On 24.11.2023 08:02, Cui, Lili wrote:
> --- /dev/null
> +++ b/opcodes/i386-dis-evex-x86-64.h
> @@ -0,0 +1,60 @@
> +  /* X86_64_EVEX_0F90 */
> +  {
> +    { Bad_Opcode },
> +    { VEX_LEN_TABLE (VEX_LEN_0F90) },
> +  },
> +  /* X86_64_EVEX_0F91 */
> +  {
> +    { Bad_Opcode },
> +    { VEX_LEN_TABLE (VEX_LEN_0F91) },
> +  },
> +  /* X86_64_EVEX_0F92 */
> +  {
> +    { Bad_Opcode },
> +    { VEX_LEN_TABLE (VEX_LEN_0F92) },
> +  },
> +  /* X86_64_EVEX_0F93 */
> +  {
> +    { Bad_Opcode },
> +    { VEX_LEN_TABLE (VEX_LEN_0F93) },
> +  },
> +  /* X86_64_EVEX_0F3849 */
> +  {
> +    { Bad_Opcode },
> +    { VEX_LEN_TABLE (VEX_LEN_0F3849_X86_64) },
> +  },
> +  /* X86_64_EVEX_0F384B */
> +  {
> +    { Bad_Opcode },
> +    { VEX_LEN_TABLE (VEX_LEN_0F384B_X86_64) },
> +  },
> +  /* X86_64_EVEX_0F38F2 */
> +  {
> +    { Bad_Opcode },
> +    { EVEX_LEN_TABLE (EVEX_LEN_0F38F2) },
> +  },
> +  /* X86_64_EVEX_0F38F3 */
> +  {
> +    { Bad_Opcode },
> +    { EVEX_LEN_TABLE (EVEX_LEN_0F38F3) },
> +  },
> +  /* X86_64_EVEX_0F38F5 */
> +  {
> +    { Bad_Opcode },
> +    { VEX_LEN_TABLE (VEX_LEN_0F38F5) },
> +  },
> +  /* X86_64_EVEX_0F38F6 */
> +  {
> +    { Bad_Opcode },
> +    { VEX_LEN_TABLE (VEX_LEN_0F38F6) },
> +  },
> +  /* X86_64_EVEX_0F38F7 */
> +  {
> +    { Bad_Opcode },
> +    { VEX_LEN_TABLE (VEX_LEN_0F38F7) },
> +  },
> +  /* X86_64_EVEX_0F3AF0 */
> +  {
> +    { Bad_Opcode },
> +    { VEX_LEN_TABLE (VEX_LEN_0F3AF0) },
> +  },

I'm puzzled here: There are two uses of EVEX_LEN_TABLE() and several more
of VEX_LEN_TABLE(). Yet the underlying pattern of those insns is all the
same. I may guess that this is related to PREFIX_OPCODE use in the
respective VEX table entries, yet isn't it then cheaper overall to have
VEX encodings also go through prefix_table[], and then sharing those
entries with EVEX encodings?

What's further puzzling: When setting evex_from_vex you already check
L'L == 0, so there's no reason to go through evex_len_table[] /
vex_len_table[].

> @@ -1268,7 +1296,21 @@ enum
>    X86_64_VEX_0F38ED,
>    X86_64_VEX_0F38EE,
>    X86_64_VEX_0F38EF,
> +
>    X86_64_VEX_MAP7_F8_L_0_W_0_R_0,
> +
> +  X86_64_EVEX_0F90,
> +  X86_64_EVEX_0F91,
> +  X86_64_EVEX_0F92,
> +  X86_64_EVEX_0F93,
> +  X86_64_EVEX_0F3849,
> +  X86_64_EVEX_0F384B,

For these two, won't the respective VEX enumerators and table entries
do?

> @@ -4524,10 +4568,11 @@ static const struct dis386 x86_64_table[][2] = {
>  
>    /* X86_64_VEX_MAP7_F8_L_0_W_0_R_0 */
>    {
> -    { Bad_Opcode },
> -    { PREFIX_TABLE (PREFIX_VEX_MAP7_F8_L_0_W_0_R_0_X86_64) },
> +      { Bad_Opcode },
> +      { PREFIX_TABLE (PREFIX_VEX_MAP7_F8_L_0_W_0_R_0_X86_64) },
>    },

Actively corrupting indentation here?

> @@ -8733,6 +8778,17 @@ get_valid_dis386 (const struct dis386 *dp, instr_info *ins)
>        dp = &prefix_table[dp->op[1].bytemode][vindex];
>        break;
>  
> +    case USE_X86_64_EVEX_FROM_VEX_TABLE:
> +      ins->evex_type = evex_from_vex;
> +      /* EVEX from evex instrucions require that EVEX.z, EVEX.L’L, EVEX.b and

"EVEX from VEX ..."?

> +	 the lower 2 bits of EVEX.aaa must be 0.  */
> +      if ((ins->vex.mask_register_specifier & 0x3) != 0
> +	  || ins->vex.ll != 0
> +	  || ins->vex.zeroing != 0
> +	  || ins->vex.b)
> +	return &bad_opcode;
> +
> +      /* Fall through.  */
>      case USE_X86_64_TABLE:

Instead of falling through here to go through x86_64_table[] (where in all
cases the non-64-bit slot is "bad"), can't you avoid that step and go to
the next step (uniformly the LEN one) right away, saving all those new
table entries (along the lines of what you do below when processing into
evex_from_legacy)?

> @@ -8978,9 +9034,13 @@ get_valid_dis386 (const struct dis386 *dp, instr_info *ins)
>        if (!fetch_code (ins->info, ins->codep + 4))
>  	return &err_opcode;
>        /* The first byte after 0x62.  */
> +      if (*ins->codep & 0x8)
> +	ins->rex2 |= REX_B;
> +      if (!(*ins->codep & 0x10))
> +	ins->rex2 |= REX_R;
> +
>        ins->rex = ~(*ins->codep >> 5) & 0x7;
> -      ins->vex.r = *ins->codep & 0x10;
> -      switch ((*ins->codep & 0xf))
> +      switch ((*ins->codep & 0x7))

Please can you take the opportunity and drop the excess parentheses?

> @@ -9041,12 +9106,24 @@ get_valid_dis386 (const struct dis386 *dp, instr_info *ins)
>  
>        if (ins->address_mode != mode_64bit)
>  	{
> +	  if (ins->evex_type != evex_default
> +	      || (ins->rex2 & (REX_B | REX_X)))
> +	    return &bad_opcode;

What's special about X and B?

> @@ -9460,6 +9537,13 @@ print_insn (bfd_vma pc, disassemble_info *info, int intel_syntax)
>        dp = get_valid_dis386 (dp, &ins);
>        if (dp == &err_opcode)
>  	goto fetch_error_out;
> +
> +      /* For APX instructions promoted from legacy maps 0/1, prefix
> +	 0x66 is interpreted as the operand size override.  */
> +      if (ins.evex_type == evex_from_legacy
> +	  && ins.vex.prefix == DATA_PREFIX_OPCODE)
> +	sizeflag ^= DFLAG;

I think the comment wants to say "embedded prefix", as "prefix 0x66" is
simply invalid to use with EVEX.

> @@ -9639,6 +9723,24 @@ print_insn (bfd_vma pc, disassemble_info *info, int intel_syntax)
>        if (ins.last_repnz_prefix >= 0)
>  	ins.all_prefixes[ins.last_repnz_prefix] = 0xf2;
>        break;
> +
> +    case PREFIX_NP_OR_DATA:
> +      if (ins.vex.prefix & ~DATA_PREFIX_OPCODE)

~DATA_PREFIX_OPCODE == 0x99, which likely isn't what you mean here? Do
you perhaps mean e.g. "> DATA_PREFIX_OPCODE"? (Using the opcodes in
vex.prefix is questionable anyway, but that's a pre-existing oddity.)

Jan

Cui, Lili Dec. 8, 2023, 3:21 p.m. UTC | #3

> -----Original Message-----
> From: Jan Beulich <jbeulich@suse.com>
> Sent: Thursday, December 7, 2023 8:39 PM
> To: Cui, Lili <lili.cui@intel.com>
> Cc: Lu, Hongjiu <hongjiu.lu@intel.com>; binutils@sourceware.org
> Subject: Re: [PATCH v3 4/9] Support APX GPR32 with extend evex prefix
> 
> On 24.11.2023 08:02, Cui, Lili wrote:
> > --- a/gas/config/tc-i386.c
> > +++ b/gas/config/tc-i386.c
> > @@ -409,6 +409,9 @@ struct _i386_insn
> >      /* Compressed disp8*N attribute.  */
> >      unsigned int memshift;
> >
> > +    /* No CSPAZO flags update.*/
> > +    bool has_nf;
> 
> As before I don't see the point in adding this field when it's not used in the
> change. Note that this is unrelated to the introduction of the NF attribute right
> here, which has a reason.
> 

Moved.

> > @@ -3670,10 +3673,11 @@ install_template (const insn_template *t)
> >
> >    /* Dual VEX/EVEX templates need stripping one of the possible variants.  */
> >    if (t->opcode_modifier.vex && t->opcode_modifier.evex)
> > -  {
> > -      if ((maybe_cpu (t, CpuAVX) || maybe_cpu (t, CpuAVX2)
> > -	   || maybe_cpu (t, CpuFMA))
> > -	  && (maybe_cpu (t, CpuAVX512F) || maybe_cpu (t, CpuAVX512VL)))
> > +    {
> > +      if (AVX512F(CpuAVX) || AVX512F(CpuAVX2) || AVX512F(CpuFMA)
> > +	  || AVX512VL(CpuAVX) || AVX512VL(CpuAVX2) ||
> APX_F(CpuCMPCCXADD)
> > +	  || APX_F(CpuAMX_TILE) || APX_F(CpuAVX512F) ||
> APX_F(CpuAVX512DQ)
> > +	  || APX_F(CpuAVX512BW) || APX_F(CpuBMI) || APX_F(CpuBMI2))
> >  	{
> >  	  if (need_evex_encoding ())
> 
> There are several issues here:
> - Why did you need to change (to the worse) the original code?
> - Why did you not model the addition after that original code?
> - How come APX_F (CpuAVX512*) constructs appear here, when no AVX512
> insn can be VEX-encoded?

 I don't understand what you mean, we have this combination.

kmov<dq>, 0x<dq:kpfx>90, AVX512BW&(AVX512BW|APX_F), Modrm|Vex128|EVex128|Space0F|VexW1|<dq:kvsz>|NoSuf, { RegMask|<dq:elem>|Unspecified|BaseIndex, RegMask }

> - If these new macros are really needed for whatever reason, they shouldn't
>   be added to opcodes/i386-opc.h when they're useful only in the assembler.
> - Style requires a blank before the opening parenthesis in function
>   invocations (which also covers function-like macro invocations).
> 
> I think I asked before: How is it that you get away without altering
> cpu_flags_match(), containing related and quite similar logic?
> 

For the original logic ( ... || ... ) && ( ... || ...), the content in the first bracket and the content in the following brackets can be combined arbitrarily. I think it is Inaccurate. So I give examples one by one for each identified combination.

Just found cpu_flags_match() has similar logic, I think the following is the only code related to CPUID alerts, but none of our combinations are related to cpuavx.

          if (all.bitfield.cpuavx)
            {
              /* We need to check SSE2AVX with AVX.  */
              if (!t->opcode_modifier.sse2avx
                  || (sse2avx && !i.prefix[DATA_PREFIX]))
                match |= CPU_FLAGS_ARCH_MATCH;
            }

> > @@ -3873,6 +3877,14 @@ is_any_vex_encoding (const insn_template *t)
> >    return t->opcode_modifier.vex || t->opcode_modifier.evex;  }
> >
> > +static INLINE bool
> > +is_apx_evex_encoding (void)
> > +{
> > +  return i.rex2 || i.tm.opcode_space == SPACE_EVEXMAP4
> > +    || (i.vex.register_specifier
> > +	&& i.vex.register_specifier->reg_flags & RegRex2); }
> 
> If you want this to be a function despite being used just once, you'll need to
> add a comment mentioning the constraint when calling it (or else the use of
> i.rex2 in particular is confusing). I'm sure I commented on this before, and I
> thought such a comment had already appeared.
> 

I also have the impression that it was added, anyway I will add it.

+/* We can use this function only when the current encoding is evex.  */
 static INLINE bool
 is_apx_evex_encoding (void)
 {

> > @@ -5655,17 +5693,17 @@ md_assemble (char *line)
> >       instruction already has a prefix, we need to convert old
> >       registers to new ones.  */
> >
> > -  if ((i.types[0].bitfield.class == Reg && i.types[0].bitfield.byte
> > -       && (i.op[0].regs->reg_flags & RegRex64) != 0)
> > -      || (i.types[1].bitfield.class == Reg && i.types[1].bitfield.byte
> > -	  && (i.op[1].regs->reg_flags & RegRex64) != 0)
> > -      || (((i.types[0].bitfield.class == Reg && i.types[0].bitfield.byte)
> > -	   || (i.types[1].bitfield.class == Reg && i.types[1].bitfield.byte))
> > -	  && (i.rex != 0 || i.rex2 != 0)))
> > +  if (((i.types[0].bitfield.class == Reg && i.types[0].bitfield.byte
> > +	&& (i.op[0].regs->reg_flags & RegRex64) != 0)
> > +       || (i.types[1].bitfield.class == Reg && i.types[1].bitfield.byte
> > +	   && (i.op[1].regs->reg_flags & RegRex64) != 0)
> > +       || (((i.types[0].bitfield.class == Reg && i.types[0].bitfield.byte)
> > +	    || (i.types[1].bitfield.class == Reg && i.types[1].bitfield.byte))
> > +	   && (i.rex != 0 || i.rex2 != 0))))
> 
> I'm having trouble spotting the change here: There's an outer pair of
> parentheses being added, but that's for no reason unless there's another
> change well hidden. Please clarify.
> 

Removed.

> >      {
> >        int x;
> >
> > -      if (!i.rex2)
> > +      if (!is_apx_rex2_encoding () && !is_any_vex_encoding(&i.tm))
> >  	i.rex |= REX_OPCODE;
> 
> Why the change to is_apx_rex2_encoding()? If that's wanted / needed here,
> shouldn't that be put in place by the earlier patch?
>

Moved to the corresponding patch.

> > @@ -14233,6 +14276,12 @@ static bool check_register (const reg_entry
> *r)
> >        if (!cpu_arch_flags.bitfield.cpuapx_f
> >  	  || flag_code != CODE_64BIT)
> >  	return false;
> > +
> > +      /* When using RegRex2, dual VEX/EVEX templates need to be marked as
> EVEX.
> > +	 For the later install_template function.  */
> > +      if (current_templates->start->opcode_modifier.vex
> > +	  && current_templates->start->opcode_modifier.evex)
> > +	i.vec_encoding = vex_encoding_evex;
> 
> I'm afraid I don't understand the 2nd sentence of the comment. This may be
> related to my question regarding cpu_flags_match() further up.
> 
> The first sentence isn't quite correct either - you don't mark any template here
> (and you can't, because we don't even know yet which template we're going
> to use).
> 
> Finally - do you really need the .evex check here? (I won't exclude that this
> yields a better diagnostic in certain cases, but this wants clarifying if so.)
> 

If you look at install_template(), you'll see that before this function we need to know if the current encoding is evex. We need to check opcode_modifier.evex here, it is a fix for issues caused by the merge of VEX and EVEX.
  if (t->opcode_modifier.vex && t->opcode_modifier.evex)
    {
      if (AVX512F(CpuAVX) || AVX512F(CpuAVX2) || AVX512F(CpuFMA)
          || AVX512VL(CpuAVX) || AVX512VL(CpuAVX2) || APX_F(CpuCMPCCXADD)
          || APX_F(CpuAMX_TILE) || APX_F(CpuAVX512F) || APX_F(CpuAVX512DQ)
          || APX_F(CpuAVX512BW) || APX_F(CpuBMI) || APX_F(CpuBMI2))
        {
          if (need_evex_encoding ())
            {

> > --- a/gas/testsuite/gas/i386/x86-64.exp
> > +++ b/gas/testsuite/gas/i386/x86-64.exp
> > @@ -250,7 +250,7 @@ run_dump_test "x86-64-sse-noavx"
> >  run_dump_test "x86-64-movbe"
> >  run_dump_test "x86-64-movbe-intel"
> >  run_dump_test "x86-64-movbe-suffix"
> > -run_list_test "x86-64-inval-movbe" "-al"
> > +run_list_test "x86-64-inval-movbe" "-I${srcdir}/$subdir -march=+noapx_f -
> al"
> 
> I can see why you add the -march=, as we've been through this before.
> But why the -I ?
> 

Removed, It is redundant.

> > @@ -896,7 +897,7 @@ rex.wrxb, 0x4f, x64, NoSuf|IsPrefix, {}
> > <pseudopfx:ident:cpu, disp8:Disp8:0, disp16:Disp16:0, disp32:Disp32:0, +
> >                        load:Load:0, store:Store:0, +
> >                        vex:VEX:0, vex2:VEX:0, vex3:VEX3:0, evex:EVEX:0, +
> > -                      rex:REX:x64, rex2:REX2:x64, nooptimize:NoOptimize:0>
> > +                      rex:REX:x64, rex2:REX2:APX_F,
> > + nooptimize:NoOptimize:0>
> 
> This change wants to go into the earlier patch?
> 

Done.

> > @@ -1319,13 +1320,16 @@ getsec, 0xf37, SMX, NoSuf, {}
> >
> >  invept, 0x660f3880, EPT&No64, Modrm|IgnoreSize|NoSuf, {
> > Oword|Unspecified|BaseIndex, Reg32 }  invept, 0x660f3880, EPT&x64,
> > Modrm|NoSuf|NoRex64, { Oword|Unspecified|BaseIndex, Reg64 }
> > +invept, 0xf3f0, EPT&APX_F, Modrm|NoSuf|EVex128|EVexMap4, {
> > +Oword|Unspecified|BaseIndex, Reg64 }
> >  invvpid, 0x660f3881, EPT&No64, Modrm|IgnoreSize|NoSuf, {
> > Oword|Unspecified|BaseIndex, Reg32 }  invvpid, 0x660f3881, EPT&x64,
> > Modrm|NoSuf|NoRex64, { Oword|Unspecified|BaseIndex, Reg64 }
> > +invvpid, 0xf3f1, EPT&APX_F, Modrm|NoSuf|EVex128|EVexMap4, {
> > +Oword|Unspecified|BaseIndex, Reg64 }
> 
> Seeing these: Are there any Map4 encodings which aren't EVex128? If not
> (and if you're also not hiddenly aware of some appearing in the near future),
> please consider making EVexMap4 include this right away. Even if in the longer
> run other encodings appear, it'll then be easy to simply replace all the
> EVexMap4 uses in a purely mechanical way. Until then shorter template lines
> are preferable.
> 

Would you mind defining it this way? Since #define EVex128 is behind it. Considering that you don't like unnecessary changes.

+#define EVexMap4 OpcodeSpace=SPACE_EVEXMAP4|EVex=EVEX128

> > @@ -1437,7 +1443,6 @@ xgetbv, 0xf01d0, Xsave, NoSuf, {}  xsetbv,
> > 0xf01d1, Xsave, NoSuf, {}
> >
> >  // xsaveopt
> > -
> >  xsaveopt, 0xfae/6, Xsaveopt,
> > Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|NoEgpr,
> { Unspecified|BaseIndex
> > }  xsaveopt64, 0xfae/6, Xsaveopt&x64, Modrm|NoSuf|Size64|NoEgpr, {
> > Unspecified|BaseIndex }
> 
> Iirc the earlier patch added that blank line. Why would you do such back and
> forth?
> 

Done.

> > @@ -1837,14 +1842,14 @@ xtest, 0xf01d6, HLE|RTM, NoSuf, {}
> >
> >  // BMI2 instructions.
> >
> > -bzhi, 0xf5, BMI2,
> >
> Modrm|CheckOperandSize|Vex128|Space0F38|VexVVVV|SwapSources|No
> _bSuf|No
> > _wSuf|No_sSuf, { Reg32|Reg64, Reg32|Reg64|Unspecified|BaseIndex,
> > Reg32|Reg64 } -mulx, 0xf2f6, BMI2,
> >
> Modrm|CheckOperandSize|Vex128|Space0F38|VexVVVV|No_bSuf|No_wS
> uf|No_sSu
> > f, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64, Reg32|Reg64 }
> > -pdep, 0xf2f5, BMI2,
> >
> Modrm|CheckOperandSize|Vex128|Space0F38|VexVVVV|No_bSuf|No_wS
> uf|No_sSu
> > f, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64, Reg32|Reg64 }
> > -pext, 0xf3f5, BMI2,
> >
> Modrm|CheckOperandSize|Vex128|Space0F38|VexVVVV|No_bSuf|No_wS
> uf|No_sSu
> > f, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64, Reg32|Reg64 }
> > -rorx, 0xf2f0, BMI2,
> >
> Modrm|CheckOperandSize|Vex128|Space0F3A|No_bSuf|No_wSuf|No_sSu
> f, {
> > Imm8|Imm8S, Reg32|Reg64|Dword|Qword|Unspecified|BaseIndex,
> Reg32|Reg64
> > } -sarx, 0xf3f7, BMI2,
> >
> Modrm|CheckOperandSize|Vex128|Space0F38|VexVVVV|SwapSources|No
> _bSuf|No
> > _wSuf|No_sSuf, { Reg32|Reg64, Reg32|Reg64|Unspecified|BaseIndex,
> > Reg32|Reg64 } -shlx, 0x66f7, BMI2,
> >
> Modrm|CheckOperandSize|Vex128|Space0F38|VexVVVV|SwapSources|No
> _bSuf|No
> > _wSuf|No_sSuf, { Reg32|Reg64, Reg32|Reg64|Unspecified|BaseIndex,
> > Reg32|Reg64 } -shrx, 0xf2f7, BMI2,
> >
> Modrm|CheckOperandSize|Vex128|Space0F38|VexVVVV|SwapSources|No
> _bSuf|No
> > _wSuf|No_sSuf, { Reg32|Reg64, Reg32|Reg64|Unspecified|BaseIndex,
> > Reg32|Reg64 }
> > +bzhi, 0xf5, BMI2&(BMI2|APX_F),
> >
> +Modrm|CheckOperandSize|Vex128|EVex128|Space0F38|VexVVVV|SwapS
> ources|N
> > +o_bSuf|No_wSuf|No_sSuf|NF, { Reg32|Reg64,
> > +Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }
> 
> Hmm, I had specifically suggested a pre-processor macro to use in place of the
> open-coded BMI2&(BMI2|APX_F). Is there a reason you didn't use that (here
> and below)?
> 

There are many different types of combinations, and each combination appears relatively few times, so I think adding a #define for each combination feels a bit wasteful.

Thanks,
Lili.

Cui, Lili Dec. 11, 2023, 6:16 a.m. UTC | #4

> On 24.11.2023 08:02, Cui, Lili wrote:
> > --- /dev/null
> > +++ b/opcodes/i386-dis-evex-x86-64.h
> > @@ -0,0 +1,60 @@
> > +  /* X86_64_EVEX_0F90 */
> > +  {
> > +    { Bad_Opcode },
> > +    { VEX_LEN_TABLE (VEX_LEN_0F90) },  },
> > +  /* X86_64_EVEX_0F91 */
> > +  {
> > +    { Bad_Opcode },
> > +    { VEX_LEN_TABLE (VEX_LEN_0F91) },  },
> > +  /* X86_64_EVEX_0F92 */
> > +  {
> > +    { Bad_Opcode },
> > +    { VEX_LEN_TABLE (VEX_LEN_0F92) },  },
> > +  /* X86_64_EVEX_0F93 */
> > +  {
> > +    { Bad_Opcode },
> > +    { VEX_LEN_TABLE (VEX_LEN_0F93) },  },
> > +  /* X86_64_EVEX_0F3849 */
> > +  {
> > +    { Bad_Opcode },
> > +    { VEX_LEN_TABLE (VEX_LEN_0F3849_X86_64) },  },
> > +  /* X86_64_EVEX_0F384B */
> > +  {
> > +    { Bad_Opcode },
> > +    { VEX_LEN_TABLE (VEX_LEN_0F384B_X86_64) },  },
> > +  /* X86_64_EVEX_0F38F2 */
> > +  {
> > +    { Bad_Opcode },
> > +    { EVEX_LEN_TABLE (EVEX_LEN_0F38F2) },  },
> > +  /* X86_64_EVEX_0F38F3 */
> > +  {
> > +    { Bad_Opcode },
> > +    { EVEX_LEN_TABLE (EVEX_LEN_0F38F3) },  },
> > +  /* X86_64_EVEX_0F38F5 */
> > +  {
> > +    { Bad_Opcode },
> > +    { VEX_LEN_TABLE (VEX_LEN_0F38F5) },  },
> > +  /* X86_64_EVEX_0F38F6 */
> > +  {
> > +    { Bad_Opcode },
> > +    { VEX_LEN_TABLE (VEX_LEN_0F38F6) },  },
> > +  /* X86_64_EVEX_0F38F7 */
> > +  {
> > +    { Bad_Opcode },
> > +    { VEX_LEN_TABLE (VEX_LEN_0F38F7) },  },
> > +  /* X86_64_EVEX_0F3AF0 */
> > +  {
> > +    { Bad_Opcode },
> > +    { VEX_LEN_TABLE (VEX_LEN_0F3AF0) },  },
> 
> I'm puzzled here: There are two uses of EVEX_LEN_TABLE() and several more
> of VEX_LEN_TABLE(). Yet the underlying pattern of those insns is all the same. I
> may guess that this is related to PREFIX_OPCODE use in the respective VEX
> table entries, yet isn't it then cheaper overall to have VEX encodings also go
> through prefix_table[], and then sharing those entries with EVEX encodings?
> 

Done.

> What's further puzzling: When setting evex_from_vex you already check L'L ==
> 0, so there's no reason to go through evex_len_table[] / vex_len_table[].
> 

Directly use the next level of len_table[].

> > @@ -1268,7 +1296,21 @@ enum
> >    X86_64_VEX_0F38ED,
> >    X86_64_VEX_0F38EE,
> >    X86_64_VEX_0F38EF,
> > +
> >    X86_64_VEX_MAP7_F8_L_0_W_0_R_0,
> > +
> > +  X86_64_EVEX_0F90,
> > +  X86_64_EVEX_0F91,
> > +  X86_64_EVEX_0F92,
> > +  X86_64_EVEX_0F93,
> > +  X86_64_EVEX_0F3849,
> > +  X86_64_EVEX_0F384B,
> 
> For these two, won't the respective VEX enumerators and table entries do?
> 

Done.

> > @@ -4524,10 +4568,11 @@ static const struct dis386 x86_64_table[][2] =
> > {
> >
> >    /* X86_64_VEX_MAP7_F8_L_0_W_0_R_0 */
> >    {
> > -    { Bad_Opcode },
> > -    { PREFIX_TABLE (PREFIX_VEX_MAP7_F8_L_0_W_0_R_0_X86_64) },
> > +      { Bad_Opcode },
> > +      { PREFIX_TABLE (PREFIX_VEX_MAP7_F8_L_0_W_0_R_0_X86_64) },
> >    },
> 
> Actively corrupting indentation here?
> 

Done.

> > @@ -8733,6 +8778,17 @@ get_valid_dis386 (const struct dis386 *dp,
> instr_info *ins)
> >        dp = &prefix_table[dp->op[1].bytemode][vindex];
> >        break;
> >
> > +    case USE_X86_64_EVEX_FROM_VEX_TABLE:
> > +      ins->evex_type = evex_from_vex;
> > +      /* EVEX from evex instrucions require that EVEX.z, EVEX.L’L,
> > + EVEX.b and
> 
> "EVEX from VEX ..."?
> 

Done.

> > +	 the lower 2 bits of EVEX.aaa must be 0.  */
> > +      if ((ins->vex.mask_register_specifier & 0x3) != 0
> > +	  || ins->vex.ll != 0
> > +	  || ins->vex.zeroing != 0
> > +	  || ins->vex.b)
> > +	return &bad_opcode;
> > +
> > +      /* Fall through.  */
> >      case USE_X86_64_TABLE:
> 
> Instead of falling through here to go through x86_64_table[] (where in all
> cases the non-64-bit slot is "bad"), can't you avoid that step and go to the
> next step (uniformly the LEN one) right away, saving all those new table entries
> (along the lines of what you do below when processing into
> evex_from_legacy)?
> 

It's not very clear to me here, do you want to add the vex_len_table to delete all entries in i386-dis-evex-x86-64.h?  but in this way, there are still some instructions that need to go through x86_64_table[], such as X86_64_VEX_0F38E*.

    case USE_X86_64_EVEX_FROM_VEX_TABLE:
      ins->evex_type = evex_from_vex;
      /* EVEX from VEX instrucions require that EVEX.z, EVEX.L’L, EVEX.b and
         the lower 2 bits of EVEX.aaa must be 0.  */
      if ((ins->vex.mask_register_specifier & 0x3) != 0
          || ins->vex.ll != 0
          || ins->vex.zeroing != 0
          || ins->vex.b)
        return &bad_opcode;

     dp = &vex_len_table[dp->op[1].bytemode][0];
break;

> > @@ -8978,9 +9034,13 @@ get_valid_dis386 (const struct dis386 *dp,
> instr_info *ins)
> >        if (!fetch_code (ins->info, ins->codep + 4))
> >  	return &err_opcode;
> >        /* The first byte after 0x62.  */
> > +      if (*ins->codep & 0x8)
> > +	ins->rex2 |= REX_B;
> > +      if (!(*ins->codep & 0x10))
> > +	ins->rex2 |= REX_R;
> > +
> >        ins->rex = ~(*ins->codep >> 5) & 0x7;
> > -      ins->vex.r = *ins->codep & 0x10;
> > -      switch ((*ins->codep & 0xf))
> > +      switch ((*ins->codep & 0x7))
> 
> Please can you take the opportunity and drop the excess parentheses?
> 

Done.

> > @@ -9041,12 +9106,24 @@ get_valid_dis386 (const struct dis386 *dp,
> > instr_info *ins)
> >
> >        if (ins->address_mode != mode_64bit)
> >  	{
> > +	  if (ins->evex_type != evex_default
> > +	      || (ins->rex2 & (REX_B | REX_X)))
> > +	    return &bad_opcode;
> 
> What's special about X and B?
> 

For evex_default, the values of these two bits are fixed. Comment added.

      if (ins->address_mode != mode_64bit)
        {
          /* Report bad for !evex_default and when two fixed values of evex
             change..  */
          if (ins->evex_type != evex_default
              || (ins->rex2 & (REX_B | REX_X)))
            return &bad_opcode;

> > @@ -9460,6 +9537,13 @@ print_insn (bfd_vma pc, disassemble_info *info,
> int intel_syntax)
> >        dp = get_valid_dis386 (dp, &ins);
> >        if (dp == &err_opcode)
> >  	goto fetch_error_out;
> > +
> > +      /* For APX instructions promoted from legacy maps 0/1, prefix
> > +	 0x66 is interpreted as the operand size override.  */
> > +      if (ins.evex_type == evex_from_legacy
> > +	  && ins.vex.prefix == DATA_PREFIX_OPCODE)
> > +	sizeflag ^= DFLAG;
> 
> I think the comment wants to say "embedded prefix", as "prefix 0x66" is
> simply invalid to use with EVEX.
> 

Done, thanks.

> > @@ -9639,6 +9723,24 @@ print_insn (bfd_vma pc, disassemble_info *info,
> int intel_syntax)
> >        if (ins.last_repnz_prefix >= 0)
> >  	ins.all_prefixes[ins.last_repnz_prefix] = 0xf2;
> >        break;
> > +
> > +    case PREFIX_NP_OR_DATA:
> > +      if (ins.vex.prefix & ~DATA_PREFIX_OPCODE)
> 
> ~DATA_PREFIX_OPCODE == 0x99, which likely isn't what you mean here? Do
> you perhaps mean e.g. "> DATA_PREFIX_OPCODE"? (Using the opcodes in
> vex.prefix is questionable anyway, but that's a pre-existing oddity.)
> 

(A || 0) & ~A must be 0. It's hard to read.  

How about this ? This is more intuitive and easy to understand.

    case PREFIX_NP_OR_DATA:
      if (ins.vex.prefix == REPE_PREFIX_OPCODE
          || ins.vex.prefix == REPNE_PREFIX_OPCODE)
        {
          i386_dis_printf (info, dis_style_text, "(bad)");
          ret = ins.end_codep - priv.the_buffer;
          goto out;
        }

Thanks,
Lili.

Jan Beulich Dec. 11, 2023, 8:34 a.m. UTC | #5

On 08.12.2023 16:21, Cui, Lili wrote:
>> -----Original Message-----
>> From: Jan Beulich <jbeulich@suse.com>
>> Sent: Thursday, December 7, 2023 8:39 PM
>>
>> On 24.11.2023 08:02, Cui, Lili wrote:
>>> @@ -3670,10 +3673,11 @@ install_template (const insn_template *t)
>>>
>>>    /* Dual VEX/EVEX templates need stripping one of the possible variants.  */
>>>    if (t->opcode_modifier.vex && t->opcode_modifier.evex)
>>> -  {
>>> -      if ((maybe_cpu (t, CpuAVX) || maybe_cpu (t, CpuAVX2)
>>> -	   || maybe_cpu (t, CpuFMA))
>>> -	  && (maybe_cpu (t, CpuAVX512F) || maybe_cpu (t, CpuAVX512VL)))
>>> +    {
>>> +      if (AVX512F(CpuAVX) || AVX512F(CpuAVX2) || AVX512F(CpuFMA)
>>> +	  || AVX512VL(CpuAVX) || AVX512VL(CpuAVX2) ||
>> APX_F(CpuCMPCCXADD)
>>> +	  || APX_F(CpuAMX_TILE) || APX_F(CpuAVX512F) ||
>> APX_F(CpuAVX512DQ)
>>> +	  || APX_F(CpuAVX512BW) || APX_F(CpuBMI) || APX_F(CpuBMI2))
>>>  	{
>>>  	  if (need_evex_encoding ())
>>
>> There are several issues here:
>> - Why did you need to change (to the worse) the original code?
>> - Why did you not model the addition after that original code?
>> - How come APX_F (CpuAVX512*) constructs appear here, when no AVX512
>> insn can be VEX-encoded?
> 
>  I don't understand what you mean, we have this combination.
> 
> kmov<dq>, 0x<dq:kpfx>90, AVX512BW&(AVX512BW|APX_F), Modrm|Vex128|EVex128|Space0F|VexW1|<dq:kvsz>|NoSuf, { RegMask|<dq:elem>|Unspecified|BaseIndex, RegMask }

Oh, I'm sorry: I forgot about the mask register insns.

>> - If these new macros are really needed for whatever reason, they shouldn't
>>   be added to opcodes/i386-opc.h when they're useful only in the assembler.
>> - Style requires a blank before the opening parenthesis in function
>>   invocations (which also covers function-like macro invocations).
>>
>> I think I asked before: How is it that you get away without altering
>> cpu_flags_match(), containing related and quite similar logic?
>>
> 
> For the original logic ( ... || ... ) && ( ... || ...), the content in the first bracket and the content in the following brackets can be combined arbitrarily. I think it is Inaccurate.

In which way? If there are issues with the existing code, these issues want
taking care of in separate (prereq) patches. Of course there are assumptions
made here about the CPU combinations that can (and cannot) occur in any of
our templates. Similar assumptions are imo fine to make in the APX additions.

Note how I used two nested if()s despite that not having been necessary at
that time. I did so in anticipation that for APX you'd want to add another
(separate) inner if(), rather than altering the one that's there.

> So I give examples one by one for each identified combination.

Which examples are you talking about? I see none given in your reply.

> Just found cpu_flags_match() has similar logic, I think the following is the only code related to CPUID alerts, but none of our combinations are related to cpuavx.
> 
>           if (all.bitfield.cpuavx)
>             {
>               /* We need to check SSE2AVX with AVX.  */
>               if (!t->opcode_modifier.sse2avx
>                   || (sse2avx && !i.prefix[DATA_PREFIX]))
>                 match |= CPU_FLAGS_ARCH_MATCH;
>             }

Not sure why you pick out this one. This special case is needed for sse2avx;
I don't see how it's related here. What I've been pointing you at is the
code in that function which follows a similar "Dual VEX/EVEX templates ..."
comment.

>>> @@ -14233,6 +14276,12 @@ static bool check_register (const reg_entry
>> *r)
>>>        if (!cpu_arch_flags.bitfield.cpuapx_f
>>>  	  || flag_code != CODE_64BIT)
>>>  	return false;
>>> +
>>> +      /* When using RegRex2, dual VEX/EVEX templates need to be marked as
>> EVEX.
>>> +	 For the later install_template function.  */
>>> +      if (current_templates->start->opcode_modifier.vex
>>> +	  && current_templates->start->opcode_modifier.evex)
>>> +	i.vec_encoding = vex_encoding_evex;
>>
>> I'm afraid I don't understand the 2nd sentence of the comment. This may be
>> related to my question regarding cpu_flags_match() further up.
>>
>> The first sentence isn't quite correct either - you don't mark any template here
>> (and you can't, because we don't even know yet which template we're going
>> to use).
>>
>> Finally - do you really need the .evex check here? (I won't exclude that this
>> yields a better diagnostic in certain cases, but this wants clarifying if so.)
>>
> 
> If you look at install_template(), you'll see that before this function we need to know if the current encoding is evex.

"This function" being check_register()? If so, then no, we can't know up front
whether EVEX encoding is going to be needed, as operand parsing happens ahead
of template selection. If instead you mean "that function" and hence
install_template(), then yes, we need to know whether to use EVEX there. Yet
how does that result in a need for the .evex check here? (Or maybe your reply
was really to the first of the three parts of my earlier one?)

But anyway - as said earlier on, using current_templates here looks wrong in
the first place. check_register() deals with only a register, without regard
to the context it is used in (with the sole exception of allow_pseudo_reg).
May I remind you that earlier on I already indicated that I suspect you'll
need a new enumerator to put in i.vec_encoding for this new purpose?

> We need to check opcode_modifier.evex here, it is a fix for issues caused by the merge of VEX and EVEX.
>   if (t->opcode_modifier.vex && t->opcode_modifier.evex)
>     {
>       if (AVX512F(CpuAVX) || AVX512F(CpuAVX2) || AVX512F(CpuFMA)
>           || AVX512VL(CpuAVX) || AVX512VL(CpuAVX2) || APX_F(CpuCMPCCXADD)
>           || APX_F(CpuAMX_TILE) || APX_F(CpuAVX512F) || APX_F(CpuAVX512DQ)
>           || APX_F(CpuAVX512BW) || APX_F(CpuBMI) || APX_F(CpuBMI2))
>         {
>           if (need_evex_encoding ())
>             {
>[...]
>>> @@ -1319,13 +1320,16 @@ getsec, 0xf37, SMX, NoSuf, {}
>>>
>>>  invept, 0x660f3880, EPT&No64, Modrm|IgnoreSize|NoSuf, {
>>> Oword|Unspecified|BaseIndex, Reg32 }  invept, 0x660f3880, EPT&x64,
>>> Modrm|NoSuf|NoRex64, { Oword|Unspecified|BaseIndex, Reg64 }
>>> +invept, 0xf3f0, EPT&APX_F, Modrm|NoSuf|EVex128|EVexMap4, {
>>> +Oword|Unspecified|BaseIndex, Reg64 }
>>>  invvpid, 0x660f3881, EPT&No64, Modrm|IgnoreSize|NoSuf, {
>>> Oword|Unspecified|BaseIndex, Reg32 }  invvpid, 0x660f3881, EPT&x64,
>>> Modrm|NoSuf|NoRex64, { Oword|Unspecified|BaseIndex, Reg64 }
>>> +invvpid, 0xf3f1, EPT&APX_F, Modrm|NoSuf|EVex128|EVexMap4, {
>>> +Oword|Unspecified|BaseIndex, Reg64 }
>>
>> Seeing these: Are there any Map4 encodings which aren't EVex128? If not
>> (and if you're also not hiddenly aware of some appearing in the near future),
>> please consider making EVexMap4 include this right away. Even if in the longer
>> run other encodings appear, it'll then be easy to simply replace all the
>> EVexMap4 uses in a purely mechanical way. Until then shorter template lines
>> are preferable.
>>
> 
> Would you mind defining it this way? Since #define EVex128 is behind it. Considering that you don't like unnecessary changes.
> 
> +#define EVexMap4 OpcodeSpace=SPACE_EVEXMAP4|EVex=EVEX128

The order of #define-s doesn't matter. There's no reason not to use EVex128 here
even if it's #define-d only a few lines later.

>>> @@ -1837,14 +1842,14 @@ xtest, 0xf01d6, HLE|RTM, NoSuf, {}
>>>
>>>  // BMI2 instructions.
>>>
>>> -bzhi, 0xf5, BMI2,
>>>
>> Modrm|CheckOperandSize|Vex128|Space0F38|VexVVVV|SwapSources|No
>> _bSuf|No
>>> _wSuf|No_sSuf, { Reg32|Reg64, Reg32|Reg64|Unspecified|BaseIndex,
>>> Reg32|Reg64 } -mulx, 0xf2f6, BMI2,
>>>
>> Modrm|CheckOperandSize|Vex128|Space0F38|VexVVVV|No_bSuf|No_wS
>> uf|No_sSu
>>> f, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64, Reg32|Reg64 }
>>> -pdep, 0xf2f5, BMI2,
>>>
>> Modrm|CheckOperandSize|Vex128|Space0F38|VexVVVV|No_bSuf|No_wS
>> uf|No_sSu
>>> f, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64, Reg32|Reg64 }
>>> -pext, 0xf3f5, BMI2,
>>>
>> Modrm|CheckOperandSize|Vex128|Space0F38|VexVVVV|No_bSuf|No_wS
>> uf|No_sSu
>>> f, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64, Reg32|Reg64 }
>>> -rorx, 0xf2f0, BMI2,
>>>
>> Modrm|CheckOperandSize|Vex128|Space0F3A|No_bSuf|No_wSuf|No_sSu
>> f, {
>>> Imm8|Imm8S, Reg32|Reg64|Dword|Qword|Unspecified|BaseIndex,
>> Reg32|Reg64
>>> } -sarx, 0xf3f7, BMI2,
>>>
>> Modrm|CheckOperandSize|Vex128|Space0F38|VexVVVV|SwapSources|No
>> _bSuf|No
>>> _wSuf|No_sSuf, { Reg32|Reg64, Reg32|Reg64|Unspecified|BaseIndex,
>>> Reg32|Reg64 } -shlx, 0x66f7, BMI2,
>>>
>> Modrm|CheckOperandSize|Vex128|Space0F38|VexVVVV|SwapSources|No
>> _bSuf|No
>>> _wSuf|No_sSuf, { Reg32|Reg64, Reg32|Reg64|Unspecified|BaseIndex,
>>> Reg32|Reg64 } -shrx, 0xf2f7, BMI2,
>>>
>> Modrm|CheckOperandSize|Vex128|Space0F38|VexVVVV|SwapSources|No
>> _bSuf|No
>>> _wSuf|No_sSuf, { Reg32|Reg64, Reg32|Reg64|Unspecified|BaseIndex,
>>> Reg32|Reg64 }
>>> +bzhi, 0xf5, BMI2&(BMI2|APX_F),
>>>
>> +Modrm|CheckOperandSize|Vex128|EVex128|Space0F38|VexVVVV|SwapS
>> ources|N
>>> +o_bSuf|No_wSuf|No_sSuf|NF, { Reg32|Reg64,
>>> +Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }
>>
>> Hmm, I had specifically suggested a pre-processor macro to use in place of the
>> open-coded BMI2&(BMI2|APX_F). Is there a reason you didn't use that (here
>> and below)?
> 
> There are many different types of combinations, and each combination appears relatively few times, so I think adding a #define for each combination feels a bit wasteful.

I never suggested using multiple #define-s. I suggested a single APX_F()
macro which would be used uniformly here and elsewhere (here: APX_F(BMI2)).
And that macro would come with a comment explaining why the expression is
the (seemingly strange) way it is. Right now there's no such explanation
anywhere, and it would also be hard to find a good (central) place where to
put it.

Jan

Jan Beulich Dec. 11, 2023, 8:43 a.m. UTC | #6

On 11.12.2023 07:16, Cui, Lili wrote:
>> On 24.11.2023 08:02, Cui, Lili wrote:
>>> +	 the lower 2 bits of EVEX.aaa must be 0.  */
>>> +      if ((ins->vex.mask_register_specifier & 0x3) != 0
>>> +	  || ins->vex.ll != 0
>>> +	  || ins->vex.zeroing != 0
>>> +	  || ins->vex.b)
>>> +	return &bad_opcode;
>>> +
>>> +      /* Fall through.  */
>>>      case USE_X86_64_TABLE:
>>
>> Instead of falling through here to go through x86_64_table[] (where in all
>> cases the non-64-bit slot is "bad"), can't you avoid that step and go to the
>> next step (uniformly the LEN one) right away, saving all those new table entries
>> (along the lines of what you do below when processing into
>> evex_from_legacy)?
>>
> 
> It's not very clear to me here, do you want to add the vex_len_table to delete all entries in i386-dis-evex-x86-64.h?

Indeed I think that nothing there is really needed / warranted. The case can
be handled with no new table entries at all, I think.

>  but in this way, there are still some instructions that need to go through x86_64_table[], such as X86_64_VEX_0F38E*.

Why would these need special treatment? All EVEX-from-VEX encodings are uniform
in being defined for 64-bit code only.

>>> @@ -9041,12 +9106,24 @@ get_valid_dis386 (const struct dis386 *dp,
>>> instr_info *ins)
>>>
>>>        if (ins->address_mode != mode_64bit)
>>>  	{
>>> +	  if (ins->evex_type != evex_default
>>> +	      || (ins->rex2 & (REX_B | REX_X)))
>>> +	    return &bad_opcode;
>>
>> What's special about X and B?
>>
> 
> For evex_default, the values of these two bits are fixed. Comment added.
> 
>       if (ins->address_mode != mode_64bit)
>         {
>           /* Report bad for !evex_default and when two fixed values of evex
>              change..  */
>           if (ins->evex_type != evex_default
>               || (ins->rex2 & (REX_B | REX_X)))
>             return &bad_opcode;

Maybe you didn't get my point: What's wrong with just checking ins->rex2 here
as a whole, rather than specially treating two of the bits?

>>> @@ -9639,6 +9723,24 @@ print_insn (bfd_vma pc, disassemble_info *info,
>> int intel_syntax)
>>>        if (ins.last_repnz_prefix >= 0)
>>>  	ins.all_prefixes[ins.last_repnz_prefix] = 0xf2;
>>>        break;
>>> +
>>> +    case PREFIX_NP_OR_DATA:
>>> +      if (ins.vex.prefix & ~DATA_PREFIX_OPCODE)
>>
>> ~DATA_PREFIX_OPCODE == 0x99, which likely isn't what you mean here? Do
>> you perhaps mean e.g. "> DATA_PREFIX_OPCODE"? (Using the opcodes in
>> vex.prefix is questionable anyway, but that's a pre-existing oddity.)
>>
> 
> (A || 0) & ~A must be 0. It's hard to read.  
> 
> How about this ? This is more intuitive and easy to understand.
> 
>     case PREFIX_NP_OR_DATA:
>       if (ins.vex.prefix == REPE_PREFIX_OPCODE
>           || ins.vex.prefix == REPNE_PREFIX_OPCODE)
>         {
>           i386_dis_printf (info, dis_style_text, "(bad)");
>           ret = ins.end_codep - priv.the_buffer;
>           goto out;
>         }

That's fine with me as well, sure.

Jan

Jan Beulich Dec. 11, 2023, 11:50 a.m. UTC | #7

On 24.11.2023 08:02, Cui, Lili wrote:
> This patch adds non-ND, non-NF forms of EVEX promotion insn.
> 
> EVEX extension of legacy instructions:
>   All promoted legacy instructions are placed in EVEX map 4, which is
>   currently reserved.
> EVEX extension of EVEX instructions:
>   All existing EVEX instructions are extended by APX using the extended
>   EVEX prefix, so that they can access all 32 GPRs.
> EVEX extension of VEX instructions:
>   Promoting a VEX instruction into the EVEX space does not change the map
>   id, the opcode, or the operand encoding of the VEX instruction.
> 
> Note: The promoted versions of MOVBE will be extended to include the “MOVBE
>   reg1, reg2”.
> 
>   gas/ChangeLog:
> 
>   2023-11-21  Lingling Kong <lingling.kong@intel.com>
> 	      H.J. Lu  <hongjiu.lu@intel.com>
> 	      Lili Cui <lili.cui@intel.com>
> 	      Lin Hu   <lin1.hu@intel.com>
> 
> 	* config/tc-i386.c (cpu_flags_not_or_check): Add a new
> 	function for APX cpu flag checking.
> 	(cpu_flags_match): handle cpu_flags_not_or_check.
> 	(install_template): Add AMX_TILE and APX combine.
> 	(is_any_apx_evex_encoding): Test apx evex encoding.
> 	(build_apx_evex_prefix): Enabe APX evex prefix.
> 	(md_assemble): Handle apx with evex encoding.
> 	(check_EgprOperands): Add nodgpr check for apx.

Btw - these mechanical ChangeLog entries also need keeping up to date. Afaics
check_EgprOperands() isn't touched here (anymore?) at all, as I merely happened
to notice by searching for where the function first appears.

Jan

Cui, Lili Dec. 12, 2023, 10:44 a.m. UTC | #8

> >> On 24.11.2023 08:02, Cui, Lili wrote:
> >>> @@ -3670,10 +3673,11 @@ install_template (const insn_template *t)
> >>>
> >>>    /* Dual VEX/EVEX templates need stripping one of the possible
> variants.  */
> >>>    if (t->opcode_modifier.vex && t->opcode_modifier.evex)
> >>> -  {
> >>> -      if ((maybe_cpu (t, CpuAVX) || maybe_cpu (t, CpuAVX2)
> >>> -	   || maybe_cpu (t, CpuFMA))
> >>> -	  && (maybe_cpu (t, CpuAVX512F) || maybe_cpu (t, CpuAVX512VL)))
> >>> +    {
> >>> +      if (AVX512F(CpuAVX) || AVX512F(CpuAVX2) || AVX512F(CpuFMA)
> >>> +	  || AVX512VL(CpuAVX) || AVX512VL(CpuAVX2) ||
> >> APX_F(CpuCMPCCXADD)
> >>> +	  || APX_F(CpuAMX_TILE) || APX_F(CpuAVX512F) ||
> >> APX_F(CpuAVX512DQ)
> >>> +	  || APX_F(CpuAVX512BW) || APX_F(CpuBMI) || APX_F(CpuBMI2))
> >>>  	{
> >>>  	  if (need_evex_encoding ())
> >>
> >> There are several issues here:
> >> - Why did you need to change (to the worse) the original code?
> >> - Why did you not model the addition after that original code?
> >> - How come APX_F (CpuAVX512*) constructs appear here, when no
> AVX512
> >> insn can be VEX-encoded?
> >
> >  I don't understand what you mean, we have this combination.
> >
> > kmov<dq>, 0x<dq:kpfx>90, AVX512BW&(AVX512BW|APX_F),
> > Modrm|Vex128|EVex128|Space0F|VexW1|<dq:kvsz>|NoSuf, {
> > RegMask|<dq:elem>|Unspecified|BaseIndex, RegMask }
> 
> Oh, I'm sorry: I forgot about the mask register insns.
> 
> >> - If these new macros are really needed for whatever reason, they
> shouldn't
> >>   be added to opcodes/i386-opc.h when they're useful only in the
> assembler.
> >> - Style requires a blank before the opening parenthesis in function
> >>   invocations (which also covers function-like macro invocations).
> >>
> >> I think I asked before: How is it that you get away without altering
> >> cpu_flags_match(), containing related and quite similar logic?
> >>
> >
> > For the original logic ( ... || ... ) && ( ... || ...), the content in the first bracket
> and the content in the following brackets can be combined arbitrarily. I think
> it is Inaccurate.
> 
> In which way? If there are issues with the existing code, these issues want
> taking care of in separate (prereq) patches. Of course there are assumptions
> made here about the CPU combinations that can (and cannot) occur in any of
> our templates. Similar assumptions are imo fine to make in the APX additions.
> 
> Note how I used two nested if()s despite that not having been necessary at
> that time. I did so in anticipation that for APX you'd want to add another
> (separate) inner if(), rather than altering the one that's there.
> 
Hi Jan, 

Could we remove the CPU check here? it's a bit ugly and has limited effectiveness.

  if (t->opcode_modifier.vex && t->opcode_modifier.evex)
    {
      if (AVX512F(CpuAVX) || AVX512F(CpuAVX2) || AVX512F(CpuFMA)
          || AVX512VL(CpuAVX) || AVX512VL(CpuAVX2) || APX_F(CpuCMPCCXADD)
          || APX_F(CpuAMX_TILE) || APX_F(CpuAVX512F) || APX_F(CpuAVX512DQ)
          || APX_F(CpuAVX512BW) || APX_F(CpuBMI) || APX_F(CpuBMI2))


> > So I give examples one by one for each identified combination.
> 
> Which examples are you talking about? I see none given in your reply.
> 

Sorry, I want to say "I've listed every possible combination".

> > Just found cpu_flags_match() has similar logic, I think the following is the
> only code related to CPUID alerts, but none of our combinations are related
> to cpuavx.
> >
> >           if (all.bitfield.cpuavx)
> >             {
> >               /* We need to check SSE2AVX with AVX.  */
> >               if (!t->opcode_modifier.sse2avx
> >                   || (sse2avx && !i.prefix[DATA_PREFIX]))
> >                 match |= CPU_FLAGS_ARCH_MATCH;
> >             }
> 
> Not sure why you pick out this one. This special case is needed for sse2avx; I
> don't see how it's related here. What I've been pointing you at is the code in
> that function which follows a similar "Dual VEX/EVEX templates ..."
> comment.
> 

I know you're talking about this code, I'm just guessing what it does? Don't know what I missed.

For example 

.arch .nobmi
andn    (%eax), %eax, %eax

---------------------------------------------------------------------------------------------
  if (flag_code != CODE_64BIT)
    active = cpu_flags_and_not (cpu_arch_flags, cpu_64_flags);
  else
    active = cpu_arch_flags;                   ---> cpubmi = 0;
  cpu = cpu_flags_and (all, active);      ---> cpuapx =1; cpubmi = 0;
  if (cpu_flags_equal (&cpu, &all))       ---> &cpu and &all are not same.
    {
    ...
    }    
Return  CPU_FLAGS_64BIT_MATCH
----------------------------------------------------------------------------------------------
Then we will report an arch error.

          if (supported != CPU_FLAGS_PERFECT_MATCH)
            {
              as_bad (_("`%s' is not supported on `%s%s'"),
                      insn_name (current_templates.start),
                      cpu_arch_name ? cpu_arch_name : default_arch,
                      cpu_sub_arch_name ? cpu_sub_arch_name : "");
              return NULL;
            }

Thanks,
Lili.

Jan Beulich Dec. 12, 2023, 11:16 a.m. UTC | #9

On 12.12.2023 11:44, Cui, Lili wrote:
>>>> On 24.11.2023 08:02, Cui, Lili wrote:
>>>>> @@ -3670,10 +3673,11 @@ install_template (const insn_template *t)
>>>>>
>>>>>    /* Dual VEX/EVEX templates need stripping one of the possible
>> variants.  */
>>>>>    if (t->opcode_modifier.vex && t->opcode_modifier.evex)
>>>>> -  {
>>>>> -      if ((maybe_cpu (t, CpuAVX) || maybe_cpu (t, CpuAVX2)
>>>>> -	   || maybe_cpu (t, CpuFMA))
>>>>> -	  && (maybe_cpu (t, CpuAVX512F) || maybe_cpu (t, CpuAVX512VL)))
>>>>> +    {
>>>>> +      if (AVX512F(CpuAVX) || AVX512F(CpuAVX2) || AVX512F(CpuFMA)
>>>>> +	  || AVX512VL(CpuAVX) || AVX512VL(CpuAVX2) ||
>>>> APX_F(CpuCMPCCXADD)
>>>>> +	  || APX_F(CpuAMX_TILE) || APX_F(CpuAVX512F) ||
>>>> APX_F(CpuAVX512DQ)
>>>>> +	  || APX_F(CpuAVX512BW) || APX_F(CpuBMI) || APX_F(CpuBMI2))
>>>>>  	{
>>>>>  	  if (need_evex_encoding ())
>>>>
>>>> There are several issues here:
>>>> - Why did you need to change (to the worse) the original code?
>>>> - Why did you not model the addition after that original code?
>>>> - How come APX_F (CpuAVX512*) constructs appear here, when no
>> AVX512
>>>> insn can be VEX-encoded?
>>>
>>>  I don't understand what you mean, we have this combination.
>>>
>>> kmov<dq>, 0x<dq:kpfx>90, AVX512BW&(AVX512BW|APX_F),
>>> Modrm|Vex128|EVex128|Space0F|VexW1|<dq:kvsz>|NoSuf, {
>>> RegMask|<dq:elem>|Unspecified|BaseIndex, RegMask }
>>
>> Oh, I'm sorry: I forgot about the mask register insns.
>>
>>>> - If these new macros are really needed for whatever reason, they
>> shouldn't
>>>>   be added to opcodes/i386-opc.h when they're useful only in the
>> assembler.
>>>> - Style requires a blank before the opening parenthesis in function
>>>>   invocations (which also covers function-like macro invocations).
>>>>
>>>> I think I asked before: How is it that you get away without altering
>>>> cpu_flags_match(), containing related and quite similar logic?
>>>>
>>>
>>> For the original logic ( ... || ... ) && ( ... || ...), the content in the first bracket
>> and the content in the following brackets can be combined arbitrarily. I think
>> it is Inaccurate.
>>
>> In which way? If there are issues with the existing code, these issues want
>> taking care of in separate (prereq) patches. Of course there are assumptions
>> made here about the CPU combinations that can (and cannot) occur in any of
>> our templates. Similar assumptions are imo fine to make in the APX additions.
>>
>> Note how I used two nested if()s despite that not having been necessary at
>> that time. I did so in anticipation that for APX you'd want to add another
>> (separate) inner if(), rather than altering the one that's there.
> 
> Could we remove the CPU check here? it's a bit ugly and has limited effectiveness.
> 
>   if (t->opcode_modifier.vex && t->opcode_modifier.evex)
>     {
>       if (AVX512F(CpuAVX) || AVX512F(CpuAVX2) || AVX512F(CpuFMA)
>           || AVX512VL(CpuAVX) || AVX512VL(CpuAVX2) || APX_F(CpuCMPCCXADD)
>           || APX_F(CpuAMX_TILE) || APX_F(CpuAVX512F) || APX_F(CpuAVX512DQ)
>           || APX_F(CpuAVX512BW) || APX_F(CpuBMI) || APX_F(CpuBMI2))

I agree on the "a bit ugly" part, but taking what's there right now I
don't understand "has limited effectiveness". Of course you can remove
any code you want, provided you can prove nothing breaks.

>>> So I give examples one by one for each identified combination.
>>
>> Which examples are you talking about? I see none given in your reply.
>>
> 
> Sorry, I want to say "I've listed every possible combination".
> 
>>> Just found cpu_flags_match() has similar logic, I think the following is the
>> only code related to CPUID alerts, but none of our combinations are related
>> to cpuavx.
>>>
>>>           if (all.bitfield.cpuavx)
>>>             {
>>>               /* We need to check SSE2AVX with AVX.  */
>>>               if (!t->opcode_modifier.sse2avx
>>>                   || (sse2avx && !i.prefix[DATA_PREFIX]))
>>>                 match |= CPU_FLAGS_ARCH_MATCH;
>>>             }
>>
>> Not sure why you pick out this one. This special case is needed for sse2avx; I
>> don't see how it's related here. What I've been pointing you at is the code in
>> that function which follows a similar "Dual VEX/EVEX templates ..."
>> comment.
>>
> 
> I know you're talking about this code, I'm just guessing what it does? Don't know what I missed.

You pulled out this sse2avx code. Hence I was expecting you to tell me
why you consider it relevant here.

> For example 
> 
> .arch .nobmi
> andn    (%eax), %eax, %eax
> 
> ---------------------------------------------------------------------------------------------
>   if (flag_code != CODE_64BIT)
>     active = cpu_flags_and_not (cpu_arch_flags, cpu_64_flags);
>   else
>     active = cpu_arch_flags;                   ---> cpubmi = 0;
>   cpu = cpu_flags_and (all, active);      ---> cpuapx =1; cpubmi = 0;
>   if (cpu_flags_equal (&cpu, &all))       ---> &cpu and &all are not same.
>     {
>     ...
>     }    
> Return  CPU_FLAGS_64BIT_MATCH
> ----------------------------------------------------------------------------------------------
> Then we will report an arch error.
> 
>           if (supported != CPU_FLAGS_PERFECT_MATCH)
>             {
>               as_bad (_("`%s' is not supported on `%s%s'"),
>                       insn_name (current_templates.start),
>                       cpu_arch_name ? cpu_arch_name : default_arch,
>                       cpu_sub_arch_name ? cpu_sub_arch_name : "");
>               return NULL;
>             }

Which is what we want, I think (for the particular example you picked)? Yet
again, I don't think I can see what you're trying to tell me. I also have
to confess I've lost track of whether we're discussing install_template(),
cpu_flag_match(), or both. For example in install_template() you may
indeed be able to get away with little or no changes, as long as there's
no used features tracking for APX (see the early ELF-specific part of
output_insn()). Things would be somewhat inconsistent then, but that may be
tolerable (as long as properly justified in the patch description). Not
getting this into proper shape right with the introduction of APX may bite
us later, though.

Jan

Cui, Lili Dec. 12, 2023, 12:32 p.m. UTC | #10

> >>>>> @@ -3670,10 +3673,11 @@ install_template (const insn_template *t)
> >>>>>
> >>>>>    /* Dual VEX/EVEX templates need stripping one of the possible
> >> variants.  */
> >>>>>    if (t->opcode_modifier.vex && t->opcode_modifier.evex)
> >>>>> -  {
> >>>>> -      if ((maybe_cpu (t, CpuAVX) || maybe_cpu (t, CpuAVX2)
> >>>>> -	   || maybe_cpu (t, CpuFMA))
> >>>>> -	  && (maybe_cpu (t, CpuAVX512F) || maybe_cpu (t, CpuAVX512VL)))
> >>>>> +    {
> >>>>> +      if (AVX512F(CpuAVX) || AVX512F(CpuAVX2) || AVX512F(CpuFMA)
> >>>>> +	  || AVX512VL(CpuAVX) || AVX512VL(CpuAVX2) ||
> >>>> APX_F(CpuCMPCCXADD)
> >>>>> +	  || APX_F(CpuAMX_TILE) || APX_F(CpuAVX512F) ||
> >>>> APX_F(CpuAVX512DQ)
> >>>>> +	  || APX_F(CpuAVX512BW) || APX_F(CpuBMI) ||
> APX_F(CpuBMI2))
> >>>>>  	{
> >>>>>  	  if (need_evex_encoding ())
> >>>>
> >>>> There are several issues here:
> >>>> - Why did you need to change (to the worse) the original code?
> >>>> - Why did you not model the addition after that original code?
> >>>> - How come APX_F (CpuAVX512*) constructs appear here, when no
> >> AVX512
> >>>> insn can be VEX-encoded?
> >>>
> >>>  I don't understand what you mean, we have this combination.
> >>>
> >>> kmov<dq>, 0x<dq:kpfx>90, AVX512BW&(AVX512BW|APX_F),
> >>> Modrm|Vex128|EVex128|Space0F|VexW1|<dq:kvsz>|NoSuf, {
> >>> RegMask|<dq:elem>|Unspecified|BaseIndex, RegMask }
> >>
> >> Oh, I'm sorry: I forgot about the mask register insns.
> >>
> >>>> - If these new macros are really needed for whatever reason, they
> >> shouldn't
> >>>>   be added to opcodes/i386-opc.h when they're useful only in the
> >> assembler.
> >>>> - Style requires a blank before the opening parenthesis in function
> >>>>   invocations (which also covers function-like macro invocations).
> >>>>
> >>>> I think I asked before: How is it that you get away without
> >>>> altering cpu_flags_match(), containing related and quite similar logic?
> >>>>
> >>>
> >>> For the original logic ( ... || ... ) && ( ... || ...), the content
> >>> in the first bracket
> >> and the content in the following brackets can be combined
> >> arbitrarily. I think it is Inaccurate.
> >>
> >> In which way? If there are issues with the existing code, these
> >> issues want taking care of in separate (prereq) patches. Of course
> >> there are assumptions made here about the CPU combinations that can
> >> (and cannot) occur in any of our templates. Similar assumptions are imo
> fine to make in the APX additions.
> >>
> >> Note how I used two nested if()s despite that not having been
> >> necessary at that time. I did so in anticipation that for APX you'd
> >> want to add another
> >> (separate) inner if(), rather than altering the one that's there.
> >
> > Could we remove the CPU check here? it's a bit ugly and has limited
> effectiveness.
> >
> >   if (t->opcode_modifier.vex && t->opcode_modifier.evex)
> >     {
> >       if (AVX512F(CpuAVX) || AVX512F(CpuAVX2) || AVX512F(CpuFMA)
> >           || AVX512VL(CpuAVX) || AVX512VL(CpuAVX2) ||
> APX_F(CpuCMPCCXADD)
> >           || APX_F(CpuAMX_TILE) || APX_F(CpuAVX512F) ||
> APX_F(CpuAVX512DQ)
> >           || APX_F(CpuAVX512BW) || APX_F(CpuBMI) || APX_F(CpuBMI2))
> 
> I agree on the "a bit ugly" part, but taking what's there right now I don't
> understand "has limited effectiveness". Of course you can remove any code
> you want, provided you can prove nothing breaks.
> 

Here is install_template().
All I can say is that after removing the CPU check, no test cases failed. I know it's hard to convince you to delete this place, or what do you suggest to do with this? APX requires this, otherwise the test cases will fail.

-      if (AVX512F(CpuAVX) || AVX512F(CpuAVX2) || AVX512F(CpuFMA)
-         || AVX512VL(CpuAVX) || AVX512VL(CpuAVX2) || APX_F(CpuCMPCCXADD)
-         || APX_F(CpuAMX_TILE) || APX_F(CpuAVX512F) || APX_F(CpuAVX512DQ)
-         || APX_F(CpuAVX512BW) || APX_F(CpuBMI) || APX_F(CpuBMI2))
-       {

> >>> So I give examples one by one for each identified combination.
> >>
> >> Which examples are you talking about? I see none given in your reply.
> >>
> >
> > Sorry, I want to say "I've listed every possible combination".
> >
> >>> Just found cpu_flags_match() has similar logic, I think the
> >>> following is the
> >> only code related to CPUID alerts, but none of our combinations are
> >> related to cpuavx.
> >>>
> >>>           if (all.bitfield.cpuavx)
> >>>             {
> >>>               /* We need to check SSE2AVX with AVX.  */
> >>>               if (!t->opcode_modifier.sse2avx
> >>>                   || (sse2avx && !i.prefix[DATA_PREFIX]))
> >>>                 match |= CPU_FLAGS_ARCH_MATCH;
> >>>             }
> >>
> >> Not sure why you pick out this one. This special case is needed for
> >> sse2avx; I don't see how it's related here. What I've been pointing
> >> you at is the code in that function which follows a similar "Dual VEX/EVEX
> templates ..."
> >> comment.
> >>
> >
> > I know you're talking about this code, I'm just guessing what it does? Don't
> know what I missed.
> 
> You pulled out this sse2avx code. Hence I was expecting you to tell me why
> you consider it relevant here.
> 
Here is cpu_flag_match().

I rechecked the code, maybe you want to say I missed the outer loop.

      cpu = cpu_flags_and (any, active);
      if (cpu_flags_all_zero (&any) || !cpu_flags_all_zero (&cpu))
        {
          if (all.bitfield.cpuavx)
            {
              /* We need to check SSE2AVX with AVX.  */
              if (!t->opcode_modifier.sse2avx
                  || (sse2avx && !i.prefix[DATA_PREFIX]))
                match |= CPU_FLAGS_ARCH_MATCH;
            }
          else
            match |= CPU_FLAGS_ARCH_MATCH;
        }

> > For example
> >
> > .arch .nobmi
> > andn    (%eax), %eax, %eax
> >
> > ---------------------------------------------------------------------------------------------
> >   if (flag_code != CODE_64BIT)
> >     active = cpu_flags_and_not (cpu_arch_flags, cpu_64_flags);
> >   else
> >     active = cpu_arch_flags;                   ---> cpubmi = 0;
> >   cpu = cpu_flags_and (all, active);      ---> cpuapx =1; cpubmi = 0;
> >   if (cpu_flags_equal (&cpu, &all))       ---> &cpu and &all are not same.
> >     {
> >     ...
> >     }
> > Return  CPU_FLAGS_64BIT_MATCH
> > ----------------------------------------------------------------------
> > ------------------------
> > Then we will report an arch error.
> >
> >           if (supported != CPU_FLAGS_PERFECT_MATCH)
> >             {
> >               as_bad (_("`%s' is not supported on `%s%s'"),
> >                       insn_name (current_templates.start),
> >                       cpu_arch_name ? cpu_arch_name : default_arch,
> >                       cpu_sub_arch_name ? cpu_sub_arch_name : "");
> >               return NULL;
> >             }
> 
> Which is what we want, I think (for the particular example you picked)? Yet
> again, I don't think I can see what you're trying to tell me. I also have to
> confess I've lost track of whether we're discussing install_template(),
> cpu_flag_match(), or both. For example in install_template() you may indeed
> be able to get away with little or no changes, as long as there's no used
> features tracking for APX (see the early ELF-specific part of output_insn()).
> Things would be somewhat inconsistent then, but that may be tolerable (as
> long as properly justified in the patch description). Not getting this into
> proper shape right with the introduction of APX may bite us later, though.
> 

Here is cpu_flag_match().
I just want to say that for the APX part we don't need to handle it in the "Double VEX/EVEX Template...".

Thanks,
Lili.

Jan Beulich Dec. 12, 2023, 12:39 p.m. UTC | #11

On 12.12.2023 13:32, Cui, Lili wrote:
>>>>>>> @@ -3670,10 +3673,11 @@ install_template (const insn_template *t)
>>>>>>>
>>>>>>>    /* Dual VEX/EVEX templates need stripping one of the possible
>>>> variants.  */
>>>>>>>    if (t->opcode_modifier.vex && t->opcode_modifier.evex)
>>>>>>> -  {
>>>>>>> -      if ((maybe_cpu (t, CpuAVX) || maybe_cpu (t, CpuAVX2)
>>>>>>> -	   || maybe_cpu (t, CpuFMA))
>>>>>>> -	  && (maybe_cpu (t, CpuAVX512F) || maybe_cpu (t, CpuAVX512VL)))
>>>>>>> +    {
>>>>>>> +      if (AVX512F(CpuAVX) || AVX512F(CpuAVX2) || AVX512F(CpuFMA)
>>>>>>> +	  || AVX512VL(CpuAVX) || AVX512VL(CpuAVX2) ||
>>>>>> APX_F(CpuCMPCCXADD)
>>>>>>> +	  || APX_F(CpuAMX_TILE) || APX_F(CpuAVX512F) ||
>>>>>> APX_F(CpuAVX512DQ)
>>>>>>> +	  || APX_F(CpuAVX512BW) || APX_F(CpuBMI) ||
>> APX_F(CpuBMI2))
>>>>>>>  	{
>>>>>>>  	  if (need_evex_encoding ())
>>>>>>
>>>>>> There are several issues here:
>>>>>> - Why did you need to change (to the worse) the original code?
>>>>>> - Why did you not model the addition after that original code?
>>>>>> - How come APX_F (CpuAVX512*) constructs appear here, when no
>>>> AVX512
>>>>>> insn can be VEX-encoded?
>>>>>
>>>>>  I don't understand what you mean, we have this combination.
>>>>>
>>>>> kmov<dq>, 0x<dq:kpfx>90, AVX512BW&(AVX512BW|APX_F),
>>>>> Modrm|Vex128|EVex128|Space0F|VexW1|<dq:kvsz>|NoSuf, {
>>>>> RegMask|<dq:elem>|Unspecified|BaseIndex, RegMask }
>>>>
>>>> Oh, I'm sorry: I forgot about the mask register insns.
>>>>
>>>>>> - If these new macros are really needed for whatever reason, they
>>>> shouldn't
>>>>>>   be added to opcodes/i386-opc.h when they're useful only in the
>>>> assembler.
>>>>>> - Style requires a blank before the opening parenthesis in function
>>>>>>   invocations (which also covers function-like macro invocations).
>>>>>>
>>>>>> I think I asked before: How is it that you get away without
>>>>>> altering cpu_flags_match(), containing related and quite similar logic?
>>>>>>
>>>>>
>>>>> For the original logic ( ... || ... ) && ( ... || ...), the content
>>>>> in the first bracket
>>>> and the content in the following brackets can be combined
>>>> arbitrarily. I think it is Inaccurate.
>>>>
>>>> In which way? If there are issues with the existing code, these
>>>> issues want taking care of in separate (prereq) patches. Of course
>>>> there are assumptions made here about the CPU combinations that can
>>>> (and cannot) occur in any of our templates. Similar assumptions are imo
>> fine to make in the APX additions.
>>>>
>>>> Note how I used two nested if()s despite that not having been
>>>> necessary at that time. I did so in anticipation that for APX you'd
>>>> want to add another
>>>> (separate) inner if(), rather than altering the one that's there.
>>>
>>> Could we remove the CPU check here? it's a bit ugly and has limited
>> effectiveness.
>>>
>>>   if (t->opcode_modifier.vex && t->opcode_modifier.evex)
>>>     {
>>>       if (AVX512F(CpuAVX) || AVX512F(CpuAVX2) || AVX512F(CpuFMA)
>>>           || AVX512VL(CpuAVX) || AVX512VL(CpuAVX2) ||
>> APX_F(CpuCMPCCXADD)
>>>           || APX_F(CpuAMX_TILE) || APX_F(CpuAVX512F) ||
>> APX_F(CpuAVX512DQ)
>>>           || APX_F(CpuAVX512BW) || APX_F(CpuBMI) || APX_F(CpuBMI2))
>>
>> I agree on the "a bit ugly" part, but taking what's there right now I don't
>> understand "has limited effectiveness". Of course you can remove any code
>> you want, provided you can prove nothing breaks.
>>
> 
> Here is install_template().
> All I can say is that after removing the CPU check, no test cases failed. I know it's hard to convince you to delete this place, or what do you suggest to do with this? APX requires this, otherwise the test cases will fail.
> 
> -      if (AVX512F(CpuAVX) || AVX512F(CpuAVX2) || AVX512F(CpuFMA)
> -         || AVX512VL(CpuAVX) || AVX512VL(CpuAVX2) || APX_F(CpuCMPCCXADD)
> -         || APX_F(CpuAMX_TILE) || APX_F(CpuAVX512F) || APX_F(CpuAVX512DQ)
> -         || APX_F(CpuAVX512BW) || APX_F(CpuBMI) || APX_F(CpuBMI2))
> -       {

So be it then (assuming you don't delete any pre-existing code there). As
said, I expect this will bite us later.

>>>>> Just found cpu_flags_match() has similar logic, I think the
>>>>> following is the
>>>> only code related to CPUID alerts, but none of our combinations are
>>>> related to cpuavx.
>>>>>
>>>>>           if (all.bitfield.cpuavx)
>>>>>             {
>>>>>               /* We need to check SSE2AVX with AVX.  */
>>>>>               if (!t->opcode_modifier.sse2avx
>>>>>                   || (sse2avx && !i.prefix[DATA_PREFIX]))
>>>>>                 match |= CPU_FLAGS_ARCH_MATCH;
>>>>>             }
>>>>
>>>> Not sure why you pick out this one. This special case is needed for
>>>> sse2avx; I don't see how it's related here. What I've been pointing
>>>> you at is the code in that function which follows a similar "Dual VEX/EVEX
>> templates ..."
>>>> comment.
>>>>
>>>
>>> I know you're talking about this code, I'm just guessing what it does? Don't
>> know what I missed.
>>
>> You pulled out this sse2avx code. Hence I was expecting you to tell me why
>> you consider it relevant here.
>>
> Here is cpu_flag_match().
> 
> I rechecked the code, maybe you want to say I missed the outer loop.
> 
>       cpu = cpu_flags_and (any, active);
>       if (cpu_flags_all_zero (&any) || !cpu_flags_all_zero (&cpu))
>         {
>           if (all.bitfield.cpuavx)
>             {
>               /* We need to check SSE2AVX with AVX.  */
>               if (!t->opcode_modifier.sse2avx
>                   || (sse2avx && !i.prefix[DATA_PREFIX]))
>                 match |= CPU_FLAGS_ARCH_MATCH;
>             }
>           else
>             match |= CPU_FLAGS_ARCH_MATCH;
>         }

No, ...

>>> For example
>>>
>>> .arch .nobmi
>>> andn    (%eax), %eax, %eax
>>>
>>> ---------------------------------------------------------------------------------------------
>>>   if (flag_code != CODE_64BIT)
>>>     active = cpu_flags_and_not (cpu_arch_flags, cpu_64_flags);
>>>   else
>>>     active = cpu_arch_flags;                   ---> cpubmi = 0;
>>>   cpu = cpu_flags_and (all, active);      ---> cpuapx =1; cpubmi = 0;
>>>   if (cpu_flags_equal (&cpu, &all))       ---> &cpu and &all are not same.
>>>     {
>>>     ...
>>>     }
>>> Return  CPU_FLAGS_64BIT_MATCH
>>> ----------------------------------------------------------------------
>>> ------------------------
>>> Then we will report an arch error.
>>>
>>>           if (supported != CPU_FLAGS_PERFECT_MATCH)
>>>             {
>>>               as_bad (_("`%s' is not supported on `%s%s'"),
>>>                       insn_name (current_templates.start),
>>>                       cpu_arch_name ? cpu_arch_name : default_arch,
>>>                       cpu_sub_arch_name ? cpu_sub_arch_name : "");
>>>               return NULL;
>>>             }
>>
>> Which is what we want, I think (for the particular example you picked)? Yet
>> again, I don't think I can see what you're trying to tell me. I also have to
>> confess I've lost track of whether we're discussing install_template(),
>> cpu_flag_match(), or both. For example in install_template() you may indeed
>> be able to get away with little or no changes, as long as there's no used
>> features tracking for APX (see the early ELF-specific part of output_insn()).
>> Things would be somewhat inconsistent then, but that may be tolerable (as
>> long as properly justified in the patch description). Not getting this into
>> proper shape right with the introduction of APX may bite us later, though.
>>
> 
> Here is cpu_flag_match().
> I just want to say that for the APX part we don't need to handle it in the "Double VEX/EVEX Template...".

... I was referring to the dual VEX/EVEX logic. I have to admit I still don't
understand how you get away without touching that, but if everything works,
all is fine of course.

Jan

Cui, Lili Dec. 12, 2023, 12:58 p.m. UTC | #12

> 
> >>> @@ -14233,6 +14276,12 @@ static bool check_register (const reg_entry
> >> *r)
> >>>        if (!cpu_arch_flags.bitfield.cpuapx_f
> >>>  	  || flag_code != CODE_64BIT)
> >>>  	return false;
> >>> +
> >>> +      /* When using RegRex2, dual VEX/EVEX templates need to be
> >>> + marked as
> >> EVEX.
> >>> +	 For the later install_template function.  */
> >>> +      if (current_templates->start->opcode_modifier.vex
> >>> +	  && current_templates->start->opcode_modifier.evex)
> >>> +	i.vec_encoding = vex_encoding_evex;
> >>
> >> I'm afraid I don't understand the 2nd sentence of the comment. This
> >> may be related to my question regarding cpu_flags_match() further up.
> >>
> >> The first sentence isn't quite correct either - you don't mark any
> >> template here (and you can't, because we don't even know yet which
> >> template we're going to use).
> >>
> >> Finally - do you really need the .evex check here? (I won't exclude
> >> that this yields a better diagnostic in certain cases, but this wants
> >> clarifying if so.)
> >>
> >
> > If you look at install_template(), you'll see that before this function we
> need to know if the current encoding is evex.
> 
> "This function" being check_register()? If so, then no, we can't know up front
> whether EVEX encoding is going to be needed, as operand parsing happens
> ahead of template selection. If instead you mean "that function" and hence
> install_template(), then yes, we need to know whether to use EVEX there.
> Yet how does that result in a need for the .evex check here? (Or maybe your
> reply was really to the first of the three parts of my earlier one?)
>

Agree with you, put them here is unreasonable. 

For example 

vtestps (%r27),%ymm6

we should report unsupported  Egpr. But without .evex check, it will report "Error: no EVEX encoding for `vtestps'"

> But anyway - as said earlier on, using current_templates here looks wrong in
> the first place. check_register() deals with only a register, without regard to
> the context it is used in (with the sole exception of allow_pseudo_reg).
> May I remind you that earlier on I already indicated that I suspect you'll need
> a new enumerator to put in i.vec_encoding for this new purpose?
> 

If we don't put it in check_register(), we need to add a for loop at the beginning of the install_template() to check RegRex2. Do you think it is okay? Or create a function for it.

for (unsigned int op = 0; op < i.operands; op++)
    {
      if (i.types[op].bitfield.class != Reg)
        continue;

      if (i.op[op].regs->reg_flags & RegRex2)
        i.vec_encoding = vex_encoding_evex;
    }

  if ((i.index_reg && (i.index_reg->reg_flags & RegRex2))
      || (i.base_reg && (i.base_reg->reg_flags & RegRex2)))
    i.vec_encoding = vex_encoding_evex; 


> > We need to check opcode_modifier.evex here, it is a fix for issues caused by
> the merge of VEX and EVEX.
> >   if (t->opcode_modifier.vex && t->opcode_modifier.evex)
> >     {
> >       if (AVX512F(CpuAVX) || AVX512F(CpuAVX2) || AVX512F(CpuFMA)
> >           || AVX512VL(CpuAVX) || AVX512VL(CpuAVX2) ||
> APX_F(CpuCMPCCXADD)
> >           || APX_F(CpuAMX_TILE) || APX_F(CpuAVX512F) ||
> APX_F(CpuAVX512DQ)
> >           || APX_F(CpuAVX512BW) || APX_F(CpuBMI) || APX_F(CpuBMI2))
> >         {
> >           if (need_evex_encoding ())
> >             {
> >[...]
> >>> @@ -1319,13 +1320,16 @@ getsec, 0xf37, SMX, NoSuf, {}
> >>>
> >>>  invept, 0x660f3880, EPT&No64, Modrm|IgnoreSize|NoSuf, {
> >>> Oword|Unspecified|BaseIndex, Reg32 }  invept, 0x660f3880, EPT&x64,
> >>> Modrm|NoSuf|NoRex64, { Oword|Unspecified|BaseIndex, Reg64 }
> >>> +invept, 0xf3f0, EPT&APX_F, Modrm|NoSuf|EVex128|EVexMap4, {
> >>> +Oword|Unspecified|BaseIndex, Reg64 }
> >>>  invvpid, 0x660f3881, EPT&No64, Modrm|IgnoreSize|NoSuf, {
> >>> Oword|Unspecified|BaseIndex, Reg32 }  invvpid, 0x660f3881, EPT&x64,
> >>> Modrm|NoSuf|NoRex64, { Oword|Unspecified|BaseIndex, Reg64 }
> >>> +invvpid, 0xf3f1, EPT&APX_F, Modrm|NoSuf|EVex128|EVexMap4, {
> >>> +Oword|Unspecified|BaseIndex, Reg64 }
> >>
> >> Seeing these: Are there any Map4 encodings which aren't EVex128? If
> >> not (and if you're also not hiddenly aware of some appearing in the
> >> near future), please consider making EVexMap4 include this right
> >> away. Even if in the longer run other encodings appear, it'll then be
> >> easy to simply replace all the
> >> EVexMap4 uses in a purely mechanical way. Until then shorter template
> >> lines are preferable.
> >>
> >
> > Would you mind defining it this way? Since #define EVex128 is behind it.
> Considering that you don't like unnecessary changes.
> >
> > +#define EVexMap4 OpcodeSpace=SPACE_EVEXMAP4|EVex=EVEX128
> 
> The order of #define-s doesn't matter. There's no reason not to use EVex128
> here even if it's #define-d only a few lines later.
> 

OK

#define EVex128 EVex=EVEX128
#define EVex256 EVex=EVEX256
#define EVex512 EVex=EVEX512
#define EVexLIG EVex=EVEXLIG
#define EVexDYN EVex=EVEXDYN

+#define Space0F    OpcodeSpace=SPACE_0F
+#define Space0F38  OpcodeSpace=SPACE_0F38
+#define Space0F3A  OpcodeSpace=SPACE_0F3A
+#define SpaceXOP08 OpcodeSpace=SPACE_XOP08
+#define SpaceXOP09 OpcodeSpace=SPACE_XOP09
+#define SpaceXOP0A OpcodeSpace=SPACE_XOP0A
+
+#define EVexMap4 OpcodeSpace=MAP4|EVex128
+#define EVexMap5 OpcodeSpace=SPACE_EVEXMAP5
+#define EVexMap6 OpcodeSpace=SPACE_EVEXMAP6

> Modrm|CheckOperandSize|Vex128|Space0F38|VexVVVV|SwapSources|No
> >> _bSuf|No
> >>> _wSuf|No_sSuf, { Reg32|Reg64, Reg32|Reg64|Unspecified|BaseIndex,
> >>> Reg32|Reg64 }
> >>> +bzhi, 0xf5, BMI2&(BMI2|APX_F),
> >>>
> >>
> +Modrm|CheckOperandSize|Vex128|EVex128|Space0F38|VexVVVV|SwapS
> >> ources|N
> >>> +o_bSuf|No_wSuf|No_sSuf|NF, { Reg32|Reg64,
> >>> +Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }
> >>
> >> Hmm, I had specifically suggested a pre-processor macro to use in
> >> place of the open-coded BMI2&(BMI2|APX_F). Is there a reason you
> >> didn't use that (here and below)?
> >
> > There are many different types of combinations, and each combination
> appears relatively few times, so I think adding a #define for each combination
> feels a bit wasteful.
> 
> I never suggested using multiple #define-s. I suggested a single APX_F()
> macro which would be used uniformly here and elsewhere (here:
> APX_F(BMI2)).
> And that macro would come with a comment explaining why the expression
> is the (seemingly strange) way it is. Right now there's no such explanation
> anywhere, and it would also be hard to find a good (central) place where to
> put it.
> 

Oh, get you.

Thanks,
Lili.

Cui, Lili Dec. 12, 2023, 1:15 p.m. UTC | #13

> On 12.12.2023 13:32, Cui, Lili wrote:
> >>>>>>> @@ -3670,10 +3673,11 @@ install_template (const insn_template
> >>>>>>> *t)
> >>>>>>>
> >>>>>>>    /* Dual VEX/EVEX templates need stripping one of the possible
> >>>> variants.  */
> >>>>>>>    if (t->opcode_modifier.vex && t->opcode_modifier.evex)
> >>>>>>> -  {
> >>>>>>> -      if ((maybe_cpu (t, CpuAVX) || maybe_cpu (t, CpuAVX2)
> >>>>>>> -	   || maybe_cpu (t, CpuFMA))
> >>>>>>> -	  && (maybe_cpu (t, CpuAVX512F) || maybe_cpu (t,
> CpuAVX512VL)))
> >>>>>>> +    {
> >>>>>>> +      if (AVX512F(CpuAVX) || AVX512F(CpuAVX2) ||
> AVX512F(CpuFMA)
> >>>>>>> +	  || AVX512VL(CpuAVX) || AVX512VL(CpuAVX2) ||
> >>>>>> APX_F(CpuCMPCCXADD)
> >>>>>>> +	  || APX_F(CpuAMX_TILE) || APX_F(CpuAVX512F) ||
> >>>>>> APX_F(CpuAVX512DQ)
> >>>>>>> +	  || APX_F(CpuAVX512BW) || APX_F(CpuBMI) ||
> >> APX_F(CpuBMI2))
> >>>>>>>  	{
> >>>>>>>  	  if (need_evex_encoding ())
> >>>>>>
> >>>>>> There are several issues here:
> >>>>>> - Why did you need to change (to the worse) the original code?
> >>>>>> - Why did you not model the addition after that original code?
> >>>>>> - How come APX_F (CpuAVX512*) constructs appear here, when no
> >>>> AVX512
> >>>>>> insn can be VEX-encoded?
> >>>>>
> >>>>>  I don't understand what you mean, we have this combination.
> >>>>>
> >>>>> kmov<dq>, 0x<dq:kpfx>90, AVX512BW&(AVX512BW|APX_F),
> >>>>> Modrm|Vex128|EVex128|Space0F|VexW1|<dq:kvsz>|NoSuf, {
> >>>>> RegMask|<dq:elem>|Unspecified|BaseIndex, RegMask }
> >>>>
> >>>> Oh, I'm sorry: I forgot about the mask register insns.
> >>>>
> >>>>>> - If these new macros are really needed for whatever reason, they
> >>>> shouldn't
> >>>>>>   be added to opcodes/i386-opc.h when they're useful only in the
> >>>> assembler.
> >>>>>> - Style requires a blank before the opening parenthesis in function
> >>>>>>   invocations (which also covers function-like macro invocations).
> >>>>>>
> >>>>>> I think I asked before: How is it that you get away without
> >>>>>> altering cpu_flags_match(), containing related and quite similar
> logic?
> >>>>>>
> >>>>>
> >>>>> For the original logic ( ... || ... ) && ( ... || ...), the
> >>>>> content in the first bracket
> >>>> and the content in the following brackets can be combined
> >>>> arbitrarily. I think it is Inaccurate.
> >>>>
> >>>> In which way? If there are issues with the existing code, these
> >>>> issues want taking care of in separate (prereq) patches. Of course
> >>>> there are assumptions made here about the CPU combinations that can
> >>>> (and cannot) occur in any of our templates. Similar assumptions are
> >>>> imo
> >> fine to make in the APX additions.
> >>>>
> >>>> Note how I used two nested if()s despite that not having been
> >>>> necessary at that time. I did so in anticipation that for APX you'd
> >>>> want to add another
> >>>> (separate) inner if(), rather than altering the one that's there.
> >>>
> >>> Could we remove the CPU check here? it's a bit ugly and has limited
> >> effectiveness.
> >>>
> >>>   if (t->opcode_modifier.vex && t->opcode_modifier.evex)
> >>>     {
> >>>       if (AVX512F(CpuAVX) || AVX512F(CpuAVX2) || AVX512F(CpuFMA)
> >>>           || AVX512VL(CpuAVX) || AVX512VL(CpuAVX2) ||
> >> APX_F(CpuCMPCCXADD)
> >>>           || APX_F(CpuAMX_TILE) || APX_F(CpuAVX512F) ||
> >> APX_F(CpuAVX512DQ)
> >>>           || APX_F(CpuAVX512BW) || APX_F(CpuBMI) || APX_F(CpuBMI2))
> >>
> >> I agree on the "a bit ugly" part, but taking what's there right now I
> >> don't understand "has limited effectiveness". Of course you can
> >> remove any code you want, provided you can prove nothing breaks.
> >>
> >
> > Here is install_template().
> > All I can say is that after removing the CPU check, no test cases failed. I
> know it's hard to convince you to delete this place, or what do you suggest to
> do with this? APX requires this, otherwise the test cases will fail.
> >
> > -      if (AVX512F(CpuAVX) || AVX512F(CpuAVX2) || AVX512F(CpuFMA)
> > -         || AVX512VL(CpuAVX) || AVX512VL(CpuAVX2) ||
> APX_F(CpuCMPCCXADD)
> > -         || APX_F(CpuAMX_TILE) || APX_F(CpuAVX512F) ||
> APX_F(CpuAVX512DQ)
> > -         || APX_F(CpuAVX512BW) || APX_F(CpuBMI) || APX_F(CpuBMI2))
> > -       {
> 
> So be it then (assuming you don't delete any pre-existing code there). As said,
> I expect this will bite us later.
> 

Done.

+      if ((maybe_cpu (t, CpuAVX) || maybe_cpu (t, CpuAVX2)
+          || maybe_cpu (t, CpuFMA))
+         && (maybe_cpu (t, CpuAVX512F) || maybe_cpu (t, CpuAVX512VL))
+         || APX_F(CpuCMPCCXADD) || APX_F(CpuAMX_TILE) || APX_F(CpuAVX512F)
+         || APX_F(CpuAVX512DQ) || APX_F(CpuAVX512BW) || APX_F(CpuBMI)
+         || APX_F(CpuBMI2))

> >>>>> Just found cpu_flags_match() has similar logic, I think the
> >>>>> following is the
> >>>> only code related to CPUID alerts, but none of our combinations are
> >>>> related to cpuavx.
> >>>>>
> >>>>>           if (all.bitfield.cpuavx)
> >>>>>             {
> >>>>>               /* We need to check SSE2AVX with AVX.  */
> >>>>>               if (!t->opcode_modifier.sse2avx
> >>>>>                   || (sse2avx && !i.prefix[DATA_PREFIX]))
> >>>>>                 match |= CPU_FLAGS_ARCH_MATCH;
> >>>>>             }
> >>>>
> >>>> Not sure why you pick out this one. This special case is needed for
> >>>> sse2avx; I don't see how it's related here. What I've been pointing
> >>>> you at is the code in that function which follows a similar "Dual
> >>>> VEX/EVEX
> >> templates ..."
> >>>> comment.
> >>>>
> >>>
> >>> I know you're talking about this code, I'm just guessing what it
> >>> does? Don't
> >> know what I missed.
> >>
> >> You pulled out this sse2avx code. Hence I was expecting you to tell
> >> me why you consider it relevant here.
> >>
> > Here is cpu_flag_match().
> >
> > I rechecked the code, maybe you want to say I missed the outer loop.
> >
> >       cpu = cpu_flags_and (any, active);
> >       if (cpu_flags_all_zero (&any) || !cpu_flags_all_zero (&cpu))
> >         {
> >           if (all.bitfield.cpuavx)
> >             {
> >               /* We need to check SSE2AVX with AVX.  */
> >               if (!t->opcode_modifier.sse2avx
> >                   || (sse2avx && !i.prefix[DATA_PREFIX]))
> >                 match |= CPU_FLAGS_ARCH_MATCH;
> >             }
> >           else
> >             match |= CPU_FLAGS_ARCH_MATCH;
> >         }
> 
> No, ...
> 
> >>> For example
> >>>
> >>> .arch .nobmi
> >>> andn    (%eax), %eax, %eax
> >>>
> >>> ---------------------------------------------------------------------------------------------
> >>>   if (flag_code != CODE_64BIT)
> >>>     active = cpu_flags_and_not (cpu_arch_flags, cpu_64_flags);
> >>>   else
> >>>     active = cpu_arch_flags;                   ---> cpubmi = 0;
> >>>   cpu = cpu_flags_and (all, active);      ---> cpuapx =1; cpubmi = 0;
> >>>   if (cpu_flags_equal (&cpu, &all))       ---> &cpu and &all are not same.
> >>>     {
> >>>     ...
> >>>     }
> >>> Return  CPU_FLAGS_64BIT_MATCH
> >>> --------------------------------------------------------------------
> >>> --
> >>> ------------------------
> >>> Then we will report an arch error.
> >>>
> >>>           if (supported != CPU_FLAGS_PERFECT_MATCH)
> >>>             {
> >>>               as_bad (_("`%s' is not supported on `%s%s'"),
> >>>                       insn_name (current_templates.start),
> >>>                       cpu_arch_name ? cpu_arch_name : default_arch,
> >>>                       cpu_sub_arch_name ? cpu_sub_arch_name : "");
> >>>               return NULL;
> >>>             }
> >>
> >> Which is what we want, I think (for the particular example you
> >> picked)? Yet again, I don't think I can see what you're trying to
> >> tell me. I also have to confess I've lost track of whether we're
> >> discussing install_template(), cpu_flag_match(), or both. For example
> >> in install_template() you may indeed be able to get away with little
> >> or no changes, as long as there's no used features tracking for APX (see the
> early ELF-specific part of output_insn()).
> >> Things would be somewhat inconsistent then, but that may be tolerable
> >> (as long as properly justified in the patch description). Not getting
> >> this into proper shape right with the introduction of APX may bite us later,
> though.
> >>
> >
> > Here is cpu_flag_match().
> > I just want to say that for the APX part we don't need to handle it in the
> "Double VEX/EVEX Template...".
> 
> ... I was referring to the dual VEX/EVEX logic. I have to admit I still don't
> understand how you get away without touching that, but if everything
> works, all is fine of course.
> 

Maybe, when FMA is combined with AVX512F, if we disable FMA, but the current instruction belongs to AVX512F, there is no need to report cpu errors for it. But it's different with APX. The combination of APX and BMI requires that both are indispensable.

Thanks,
Lili.

Jan Beulich Dec. 12, 2023, 2:04 p.m. UTC | #14

On 12.12.2023 13:58, Cui, Lili wrote:
>>
>>>>> @@ -14233,6 +14276,12 @@ static bool check_register (const reg_entry
>>>> *r)
>>>>>        if (!cpu_arch_flags.bitfield.cpuapx_f
>>>>>  	  || flag_code != CODE_64BIT)
>>>>>  	return false;
>>>>> +
>>>>> +      /* When using RegRex2, dual VEX/EVEX templates need to be
>>>>> + marked as
>>>> EVEX.
>>>>> +	 For the later install_template function.  */
>>>>> +      if (current_templates->start->opcode_modifier.vex
>>>>> +	  && current_templates->start->opcode_modifier.evex)
>>>>> +	i.vec_encoding = vex_encoding_evex;
>>>>
>>>> I'm afraid I don't understand the 2nd sentence of the comment. This
>>>> may be related to my question regarding cpu_flags_match() further up.
>>>>
>>>> The first sentence isn't quite correct either - you don't mark any
>>>> template here (and you can't, because we don't even know yet which
>>>> template we're going to use).
>>>>
>>>> Finally - do you really need the .evex check here? (I won't exclude
>>>> that this yields a better diagnostic in certain cases, but this wants
>>>> clarifying if so.)
>>>>
>>>
>>> If you look at install_template(), you'll see that before this function we
>> need to know if the current encoding is evex.
>>
>> "This function" being check_register()? If so, then no, we can't know up front
>> whether EVEX encoding is going to be needed, as operand parsing happens
>> ahead of template selection. If instead you mean "that function" and hence
>> install_template(), then yes, we need to know whether to use EVEX there.
>> Yet how does that result in a need for the .evex check here? (Or maybe your
>> reply was really to the first of the three parts of my earlier one?)
>>
> 
> Agree with you, put them here is unreasonable. 
> 
> For example 
> 
> vtestps (%r27),%ymm6
> 
> we should report unsupported  Egpr. But without .evex check, it will report "Error: no EVEX encoding for `vtestps'"
> 
>> But anyway - as said earlier on, using current_templates here looks wrong in
>> the first place. check_register() deals with only a register, without regard to
>> the context it is used in (with the sole exception of allow_pseudo_reg).
>> May I remind you that earlier on I already indicated that I suspect you'll need
>> a new enumerator to put in i.vec_encoding for this new purpose?
>>
> 
> If we don't put it in check_register(), we need to add a for loop at the beginning of the install_template() to check RegRex2. Do you think it is okay? Or create a function for it.
> 
> for (unsigned int op = 0; op < i.operands; op++)
>     {
>       if (i.types[op].bitfield.class != Reg)
>         continue;
> 
>       if (i.op[op].regs->reg_flags & RegRex2)
>         i.vec_encoding = vex_encoding_evex;
>     }
> 
>   if ((i.index_reg && (i.index_reg->reg_flags & RegRex2))
>       || (i.base_reg && (i.base_reg->reg_flags & RegRex2)))
>     i.vec_encoding = vex_encoding_evex; 

As a last resort this may be an option. But until my suggestion wasn't at
least tried or demonstrated to be worse, I don't think the above would be
acceptable.

Jan

Jan Beulich Dec. 12, 2023, 2:13 p.m. UTC | #15

On 12.12.2023 14:15, Cui, Lili wrote:
>> On 12.12.2023 13:32, Cui, Lili wrote:
>>>>>>>>> @@ -3670,10 +3673,11 @@ install_template (const insn_template
>>>>>>>>> *t)
>>>>>>>>>
>>>>>>>>>    /* Dual VEX/EVEX templates need stripping one of the possible
>>>>>> variants.  */
>>>>>>>>>    if (t->opcode_modifier.vex && t->opcode_modifier.evex)
>>>>>>>>> -  {
>>>>>>>>> -      if ((maybe_cpu (t, CpuAVX) || maybe_cpu (t, CpuAVX2)
>>>>>>>>> -	   || maybe_cpu (t, CpuFMA))
>>>>>>>>> -	  && (maybe_cpu (t, CpuAVX512F) || maybe_cpu (t,
>> CpuAVX512VL)))
>>>>>>>>> +    {
>>>>>>>>> +      if (AVX512F(CpuAVX) || AVX512F(CpuAVX2) ||
>> AVX512F(CpuFMA)
>>>>>>>>> +	  || AVX512VL(CpuAVX) || AVX512VL(CpuAVX2) ||
>>>>>>>> APX_F(CpuCMPCCXADD)
>>>>>>>>> +	  || APX_F(CpuAMX_TILE) || APX_F(CpuAVX512F) ||
>>>>>>>> APX_F(CpuAVX512DQ)
>>>>>>>>> +	  || APX_F(CpuAVX512BW) || APX_F(CpuBMI) ||
>>>> APX_F(CpuBMI2))
>>>>>>>>>  	{
>>>>>>>>>  	  if (need_evex_encoding ())
>>>>>>>>
>>>>>>>> There are several issues here:
>>>>>>>> - Why did you need to change (to the worse) the original code?
>>>>>>>> - Why did you not model the addition after that original code?
>>>>>>>> - How come APX_F (CpuAVX512*) constructs appear here, when no
>>>>>> AVX512
>>>>>>>> insn can be VEX-encoded?
>>>>>>>
>>>>>>>  I don't understand what you mean, we have this combination.
>>>>>>>
>>>>>>> kmov<dq>, 0x<dq:kpfx>90, AVX512BW&(AVX512BW|APX_F),
>>>>>>> Modrm|Vex128|EVex128|Space0F|VexW1|<dq:kvsz>|NoSuf, {
>>>>>>> RegMask|<dq:elem>|Unspecified|BaseIndex, RegMask }
>>>>>>
>>>>>> Oh, I'm sorry: I forgot about the mask register insns.
>>>>>>
>>>>>>>> - If these new macros are really needed for whatever reason, they
>>>>>> shouldn't
>>>>>>>>   be added to opcodes/i386-opc.h when they're useful only in the
>>>>>> assembler.
>>>>>>>> - Style requires a blank before the opening parenthesis in function
>>>>>>>>   invocations (which also covers function-like macro invocations).
>>>>>>>>
>>>>>>>> I think I asked before: How is it that you get away without
>>>>>>>> altering cpu_flags_match(), containing related and quite similar
>> logic?
>>>>>>>>
>>>>>>>
>>>>>>> For the original logic ( ... || ... ) && ( ... || ...), the
>>>>>>> content in the first bracket
>>>>>> and the content in the following brackets can be combined
>>>>>> arbitrarily. I think it is Inaccurate.
>>>>>>
>>>>>> In which way? If there are issues with the existing code, these
>>>>>> issues want taking care of in separate (prereq) patches. Of course
>>>>>> there are assumptions made here about the CPU combinations that can
>>>>>> (and cannot) occur in any of our templates. Similar assumptions are
>>>>>> imo
>>>> fine to make in the APX additions.
>>>>>>
>>>>>> Note how I used two nested if()s despite that not having been
>>>>>> necessary at that time. I did so in anticipation that for APX you'd
>>>>>> want to add another
>>>>>> (separate) inner if(), rather than altering the one that's there.
>>>>>
>>>>> Could we remove the CPU check here? it's a bit ugly and has limited
>>>> effectiveness.
>>>>>
>>>>>   if (t->opcode_modifier.vex && t->opcode_modifier.evex)
>>>>>     {
>>>>>       if (AVX512F(CpuAVX) || AVX512F(CpuAVX2) || AVX512F(CpuFMA)
>>>>>           || AVX512VL(CpuAVX) || AVX512VL(CpuAVX2) ||
>>>> APX_F(CpuCMPCCXADD)
>>>>>           || APX_F(CpuAMX_TILE) || APX_F(CpuAVX512F) ||
>>>> APX_F(CpuAVX512DQ)
>>>>>           || APX_F(CpuAVX512BW) || APX_F(CpuBMI) || APX_F(CpuBMI2))
>>>>
>>>> I agree on the "a bit ugly" part, but taking what's there right now I
>>>> don't understand "has limited effectiveness". Of course you can
>>>> remove any code you want, provided you can prove nothing breaks.
>>>>
>>>
>>> Here is install_template().
>>> All I can say is that after removing the CPU check, no test cases failed. I
>> know it's hard to convince you to delete this place, or what do you suggest to
>> do with this? APX requires this, otherwise the test cases will fail.
>>>
>>> -      if (AVX512F(CpuAVX) || AVX512F(CpuAVX2) || AVX512F(CpuFMA)
>>> -         || AVX512VL(CpuAVX) || AVX512VL(CpuAVX2) ||
>> APX_F(CpuCMPCCXADD)
>>> -         || APX_F(CpuAMX_TILE) || APX_F(CpuAVX512F) ||
>> APX_F(CpuAVX512DQ)
>>> -         || APX_F(CpuAVX512BW) || APX_F(CpuBMI) || APX_F(CpuBMI2))
>>> -       {
>>
>> So be it then (assuming you don't delete any pre-existing code there). As said,
>> I expect this will bite us later.
> 
> Done.

I can't connect this with ...

> +      if ((maybe_cpu (t, CpuAVX) || maybe_cpu (t, CpuAVX2)
> +          || maybe_cpu (t, CpuFMA))
> +         && (maybe_cpu (t, CpuAVX512F) || maybe_cpu (t, CpuAVX512VL))
> +         || APX_F(CpuCMPCCXADD) || APX_F(CpuAMX_TILE) || APX_F(CpuAVX512F)
> +         || APX_F(CpuAVX512DQ) || APX_F(CpuAVX512BW) || APX_F(CpuBMI)
> +         || APX_F(CpuBMI2))

... this: You said you want to remove all the new checks. And now you say
"done" with the checks all still there? And even if I misunderstood you,
I still don't see why you'd modify the existing condition: The adjustments
made in the body of the if() aren't applicable to APX afaict. Plus there
are still the odd APX_F() uses; I'm sure I commented on that before. If
any adjustments need making for APX, you want to add a 2nd inner if()
inside the enclosing one.

Jan

Cui, Lili Dec. 13, 2023, 7:36 a.m. UTC | #16

> >>>>>>>>> @@ -3670,10 +3673,11 @@ install_template (const
> insn_template
> >>>>>>>>> *t)
> >>>>>>>>>
> >>>>>>>>>    /* Dual VEX/EVEX templates need stripping one of the
> >>>>>>>>> possible
> >>>>>> variants.  */
> >>>>>>>>>    if (t->opcode_modifier.vex && t->opcode_modifier.evex)
> >>>>>>>>> -  {
> >>>>>>>>> -      if ((maybe_cpu (t, CpuAVX) || maybe_cpu (t, CpuAVX2)
> >>>>>>>>> -	   || maybe_cpu (t, CpuFMA))
> >>>>>>>>> -	  && (maybe_cpu (t, CpuAVX512F) || maybe_cpu (t,
> >> CpuAVX512VL)))
> >>>>>>>>> +    {
> >>>>>>>>> +      if (AVX512F(CpuAVX) || AVX512F(CpuAVX2) ||
> >> AVX512F(CpuFMA)
> >>>>>>>>> +	  || AVX512VL(CpuAVX) || AVX512VL(CpuAVX2) ||
> >>>>>>>> APX_F(CpuCMPCCXADD)
> >>>>>>>>> +	  || APX_F(CpuAMX_TILE) || APX_F(CpuAVX512F) ||
> >>>>>>>> APX_F(CpuAVX512DQ)
> >>>>>>>>> +	  || APX_F(CpuAVX512BW) || APX_F(CpuBMI) ||
> >>>> APX_F(CpuBMI2))
> >>>>>>>>>  	{
> >>>>>>>>>  	  if (need_evex_encoding ())
> >>>>>>>>
> >>>>>>>> There are several issues here:
> >>>>>>>> - Why did you need to change (to the worse) the original code?
> >>>>>>>> - Why did you not model the addition after that original code?
> >>>>>>>> - How come APX_F (CpuAVX512*) constructs appear here, when no
> >>>>>> AVX512
> >>>>>>>> insn can be VEX-encoded?
> >>>>>>>
> >>>>>>>  I don't understand what you mean, we have this combination.
> >>>>>>>
> >>>>>>> kmov<dq>, 0x<dq:kpfx>90, AVX512BW&(AVX512BW|APX_F),
> >>>>>>> Modrm|Vex128|EVex128|Space0F|VexW1|<dq:kvsz>|NoSuf, {
> >>>>>>> RegMask|<dq:elem>|Unspecified|BaseIndex, RegMask }
> >>>>>>
> >>>>>> Oh, I'm sorry: I forgot about the mask register insns.
> >>>>>>
> >>>>>>>> - If these new macros are really needed for whatever reason,
> >>>>>>>> they
> >>>>>> shouldn't
> >>>>>>>>   be added to opcodes/i386-opc.h when they're useful only in
> >>>>>>>> the
> >>>>>> assembler.
> >>>>>>>> - Style requires a blank before the opening parenthesis in function
> >>>>>>>>   invocations (which also covers function-like macro invocations).
> >>>>>>>>
> >>>>>>>> I think I asked before: How is it that you get away without
> >>>>>>>> altering cpu_flags_match(), containing related and quite
> >>>>>>>> similar
> >> logic?
> >>>>>>>>
> >>>>>>>
> >>>>>>> For the original logic ( ... || ... ) && ( ... || ...), the
> >>>>>>> content in the first bracket
> >>>>>> and the content in the following brackets can be combined
> >>>>>> arbitrarily. I think it is Inaccurate.
> >>>>>>
> >>>>>> In which way? If there are issues with the existing code, these
> >>>>>> issues want taking care of in separate (prereq) patches. Of
> >>>>>> course there are assumptions made here about the CPU combinations
> >>>>>> that can (and cannot) occur in any of our templates. Similar
> >>>>>> assumptions are imo
> >>>> fine to make in the APX additions.
> >>>>>>
> >>>>>> Note how I used two nested if()s despite that not having been
> >>>>>> necessary at that time. I did so in anticipation that for APX
> >>>>>> you'd want to add another
> >>>>>> (separate) inner if(), rather than altering the one that's there.
> >>>>>
> >>>>> Could we remove the CPU check here? it's a bit ugly and has
> >>>>> limited
> >>>> effectiveness.
> >>>>>
> >>>>>   if (t->opcode_modifier.vex && t->opcode_modifier.evex)
> >>>>>     {
> >>>>>       if (AVX512F(CpuAVX) || AVX512F(CpuAVX2) || AVX512F(CpuFMA)
> >>>>>           || AVX512VL(CpuAVX) || AVX512VL(CpuAVX2) ||
> >>>> APX_F(CpuCMPCCXADD)
> >>>>>           || APX_F(CpuAMX_TILE) || APX_F(CpuAVX512F) ||
> >>>> APX_F(CpuAVX512DQ)
> >>>>>           || APX_F(CpuAVX512BW) || APX_F(CpuBMI) ||
> >>>>> APX_F(CpuBMI2))
> >>>>
> >>>> I agree on the "a bit ugly" part, but taking what's there right now
> >>>> I don't understand "has limited effectiveness". Of course you can
> >>>> remove any code you want, provided you can prove nothing breaks.
> >>>>
> >>>
> >>> Here is install_template().
> >>> All I can say is that after removing the CPU check, no test cases
> >>> failed. I
> >> know it's hard to convince you to delete this place, or what do you
> >> suggest to do with this? APX requires this, otherwise the test cases will fail.
> >>>
> >>> -      if (AVX512F(CpuAVX) || AVX512F(CpuAVX2) || AVX512F(CpuFMA)
> >>> -         || AVX512VL(CpuAVX) || AVX512VL(CpuAVX2) ||
> >> APX_F(CpuCMPCCXADD)
> >>> -         || APX_F(CpuAMX_TILE) || APX_F(CpuAVX512F) ||
> >> APX_F(CpuAVX512DQ)
> >>> -         || APX_F(CpuAVX512BW) || APX_F(CpuBMI) || APX_F(CpuBMI2))
> >>> -       {
> >>
> >> So be it then (assuming you don't delete any pre-existing code
> >> there). As said, I expect this will bite us later.
> >
> > Done.
> 
> I can't connect this with ...
> 
> > +      if ((maybe_cpu (t, CpuAVX) || maybe_cpu (t, CpuAVX2)
> > +          || maybe_cpu (t, CpuFMA))
> > +         && (maybe_cpu (t, CpuAVX512F) || maybe_cpu (t, CpuAVX512VL))
> > +         || APX_F(CpuCMPCCXADD) || APX_F(CpuAMX_TILE) ||
> APX_F(CpuAVX512F)
> > +         || APX_F(CpuAVX512DQ) || APX_F(CpuAVX512BW) ||
> APX_F(CpuBMI)
> > +         || APX_F(CpuBMI2))
> 
> ... this: You said you want to remove all the new checks. And now you say
> "done" with the checks all still there? And even if I misunderstood you, I still
> don't see why you'd modify the existing condition: The adjustments made in
> the body of the if() aren't applicable to APX afaict. Plus there are still the odd
> APX_F() uses; I'm sure I commented on that before. If any adjustments need
> making for APX, you want to add a 2nd inner if() inside the enclosing one.
> 

I want to remove all, including your pre-existing code,  there is an EVEX testcase failure due to not clean i.tm.opcode_modifier.vex = 0;  As you required that don't delete any pre-existing code, so I still need to add my new combination,  

How about this ?


I want to remove all code, including your pre-existing code, VEX test case fails because it wasn't cleaned up i.tm.opcode_modifier.evex = 0; As you asked, don't remove any pre-existing code, so I still need to add my new combinations.

How about this?

  /* Dual VEX/EVEX templates need stripping one of the possible variants.  */
  if (t->opcode_modifier.vex && t->opcode_modifier.evex)
    {
      if ((maybe_cpu (t, CpuAVX) || maybe_cpu (t, CpuAVX2)
           || maybe_cpu (t, CpuFMA))
          && (maybe_cpu (t, CpuAVX512F) || maybe_cpu (t, CpuAVX512VL)))
        {
          if (need_evex_encoding ())
            {
              i.tm.opcode_modifier.vex = 0;
              i.tm.cpu.bitfield.cpuavx512f = i.tm.cpu_any.bitfield.cpuavx512f;
              i.tm.cpu.bitfield.cpuavx512vl = i.tm.cpu_any.bitfield.cpuavx512vl;
            }
          else
            {
              i.tm.opcode_modifier.evex = 0;
              if (i.tm.cpu_any.bitfield.cpuavx)
                i.tm.cpu.bitfield.cpuavx = 1;
              else if (!i.tm.cpu.bitfield.isa)
                i.tm.cpu.bitfield.isa = i.tm.cpu_any.bitfield.isa;
              else
                gas_assert (i.tm.cpu.bitfield.isa == i.tm.cpu_any.bitfield.isa);
            }
        }

      if (APX_F(CpuCMPCCXADD) || APX_F(CpuAMX_TILE) || APX_F(CpuAVX512F)
          || APX_F(CpuAVX512DQ) || APX_F(CpuAVX512BW) || APX_F(CpuBMI)
          || APX_F(CpuBMI2))
        if (need_evex_encoding ())
          i.tm.opcode_modifier.vex = 0;
        else
          i.tm.opcode_modifier.evex = 0;
    }

Thanks,
Lili.

Jan Beulich Dec. 13, 2023, 7:48 a.m. UTC | #17

On 13.12.2023 08:36, Cui, Lili wrote:
>>>>>>>>>>> @@ -3670,10 +3673,11 @@ install_template (const
>> insn_template
>>>>>>>>>>> *t)
>>>>>>>>>>>
>>>>>>>>>>>    /* Dual VEX/EVEX templates need stripping one of the
>>>>>>>>>>> possible
>>>>>>>> variants.  */
>>>>>>>>>>>    if (t->opcode_modifier.vex && t->opcode_modifier.evex)
>>>>>>>>>>> -  {
>>>>>>>>>>> -      if ((maybe_cpu (t, CpuAVX) || maybe_cpu (t, CpuAVX2)
>>>>>>>>>>> -	   || maybe_cpu (t, CpuFMA))
>>>>>>>>>>> -	  && (maybe_cpu (t, CpuAVX512F) || maybe_cpu (t,
>>>> CpuAVX512VL)))
>>>>>>>>>>> +    {
>>>>>>>>>>> +      if (AVX512F(CpuAVX) || AVX512F(CpuAVX2) ||
>>>> AVX512F(CpuFMA)
>>>>>>>>>>> +	  || AVX512VL(CpuAVX) || AVX512VL(CpuAVX2) ||
>>>>>>>>>> APX_F(CpuCMPCCXADD)
>>>>>>>>>>> +	  || APX_F(CpuAMX_TILE) || APX_F(CpuAVX512F) ||
>>>>>>>>>> APX_F(CpuAVX512DQ)
>>>>>>>>>>> +	  || APX_F(CpuAVX512BW) || APX_F(CpuBMI) ||
>>>>>> APX_F(CpuBMI2))
>>>>>>>>>>>  	{
>>>>>>>>>>>  	  if (need_evex_encoding ())
>>>>>>>>>>
>>>>>>>>>> There are several issues here:
>>>>>>>>>> - Why did you need to change (to the worse) the original code?
>>>>>>>>>> - Why did you not model the addition after that original code?
>>>>>>>>>> - How come APX_F (CpuAVX512*) constructs appear here, when no
>>>>>>>> AVX512
>>>>>>>>>> insn can be VEX-encoded?
>>>>>>>>>
>>>>>>>>>  I don't understand what you mean, we have this combination.
>>>>>>>>>
>>>>>>>>> kmov<dq>, 0x<dq:kpfx>90, AVX512BW&(AVX512BW|APX_F),
>>>>>>>>> Modrm|Vex128|EVex128|Space0F|VexW1|<dq:kvsz>|NoSuf, {
>>>>>>>>> RegMask|<dq:elem>|Unspecified|BaseIndex, RegMask }
>>>>>>>>
>>>>>>>> Oh, I'm sorry: I forgot about the mask register insns.
>>>>>>>>
>>>>>>>>>> - If these new macros are really needed for whatever reason,
>>>>>>>>>> they
>>>>>>>> shouldn't
>>>>>>>>>>   be added to opcodes/i386-opc.h when they're useful only in
>>>>>>>>>> the
>>>>>>>> assembler.
>>>>>>>>>> - Style requires a blank before the opening parenthesis in function
>>>>>>>>>>   invocations (which also covers function-like macro invocations).
>>>>>>>>>>
>>>>>>>>>> I think I asked before: How is it that you get away without
>>>>>>>>>> altering cpu_flags_match(), containing related and quite
>>>>>>>>>> similar
>>>> logic?
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> For the original logic ( ... || ... ) && ( ... || ...), the
>>>>>>>>> content in the first bracket
>>>>>>>> and the content in the following brackets can be combined
>>>>>>>> arbitrarily. I think it is Inaccurate.
>>>>>>>>
>>>>>>>> In which way? If there are issues with the existing code, these
>>>>>>>> issues want taking care of in separate (prereq) patches. Of
>>>>>>>> course there are assumptions made here about the CPU combinations
>>>>>>>> that can (and cannot) occur in any of our templates. Similar
>>>>>>>> assumptions are imo
>>>>>> fine to make in the APX additions.
>>>>>>>>
>>>>>>>> Note how I used two nested if()s despite that not having been
>>>>>>>> necessary at that time. I did so in anticipation that for APX
>>>>>>>> you'd want to add another
>>>>>>>> (separate) inner if(), rather than altering the one that's there.
>>>>>>>
>>>>>>> Could we remove the CPU check here? it's a bit ugly and has
>>>>>>> limited
>>>>>> effectiveness.
>>>>>>>
>>>>>>>   if (t->opcode_modifier.vex && t->opcode_modifier.evex)
>>>>>>>     {
>>>>>>>       if (AVX512F(CpuAVX) || AVX512F(CpuAVX2) || AVX512F(CpuFMA)
>>>>>>>           || AVX512VL(CpuAVX) || AVX512VL(CpuAVX2) ||
>>>>>> APX_F(CpuCMPCCXADD)
>>>>>>>           || APX_F(CpuAMX_TILE) || APX_F(CpuAVX512F) ||
>>>>>> APX_F(CpuAVX512DQ)
>>>>>>>           || APX_F(CpuAVX512BW) || APX_F(CpuBMI) ||
>>>>>>> APX_F(CpuBMI2))
>>>>>>
>>>>>> I agree on the "a bit ugly" part, but taking what's there right now
>>>>>> I don't understand "has limited effectiveness". Of course you can
>>>>>> remove any code you want, provided you can prove nothing breaks.
>>>>>>
>>>>>
>>>>> Here is install_template().
>>>>> All I can say is that after removing the CPU check, no test cases
>>>>> failed. I
>>>> know it's hard to convince you to delete this place, or what do you
>>>> suggest to do with this? APX requires this, otherwise the test cases will fail.
>>>>>
>>>>> -      if (AVX512F(CpuAVX) || AVX512F(CpuAVX2) || AVX512F(CpuFMA)
>>>>> -         || AVX512VL(CpuAVX) || AVX512VL(CpuAVX2) ||
>>>> APX_F(CpuCMPCCXADD)
>>>>> -         || APX_F(CpuAMX_TILE) || APX_F(CpuAVX512F) ||
>>>> APX_F(CpuAVX512DQ)
>>>>> -         || APX_F(CpuAVX512BW) || APX_F(CpuBMI) || APX_F(CpuBMI2))
>>>>> -       {
>>>>
>>>> So be it then (assuming you don't delete any pre-existing code
>>>> there). As said, I expect this will bite us later.
>>>
>>> Done.
>>
>> I can't connect this with ...
>>
>>> +      if ((maybe_cpu (t, CpuAVX) || maybe_cpu (t, CpuAVX2)
>>> +          || maybe_cpu (t, CpuFMA))
>>> +         && (maybe_cpu (t, CpuAVX512F) || maybe_cpu (t, CpuAVX512VL))
>>> +         || APX_F(CpuCMPCCXADD) || APX_F(CpuAMX_TILE) ||
>> APX_F(CpuAVX512F)
>>> +         || APX_F(CpuAVX512DQ) || APX_F(CpuAVX512BW) ||
>> APX_F(CpuBMI)
>>> +         || APX_F(CpuBMI2))
>>
>> ... this: You said you want to remove all the new checks. And now you say
>> "done" with the checks all still there? And even if I misunderstood you, I still
>> don't see why you'd modify the existing condition: The adjustments made in
>> the body of the if() aren't applicable to APX afaict. Plus there are still the odd
>> APX_F() uses; I'm sure I commented on that before. If any adjustments need
>> making for APX, you want to add a 2nd inner if() inside the enclosing one.
>>
> 
> I want to remove all, including your pre-existing code,  there is an EVEX testcase failure due to not clean i.tm.opcode_modifier.vex = 0;  As you required that don't delete any pre-existing code, so I still need to add my new combination,  
> 
> How about this ?
> 
> 
> I want to remove all code, including your pre-existing code, VEX test case fails because it wasn't cleaned up i.tm.opcode_modifier.evex = 0; As you asked, don't remove any pre-existing code, so I still need to add my new combinations.
> 
> How about this?
> 
>   /* Dual VEX/EVEX templates need stripping one of the possible variants.  */
>   if (t->opcode_modifier.vex && t->opcode_modifier.evex)
>     {
>       if ((maybe_cpu (t, CpuAVX) || maybe_cpu (t, CpuAVX2)
>            || maybe_cpu (t, CpuFMA))
>           && (maybe_cpu (t, CpuAVX512F) || maybe_cpu (t, CpuAVX512VL)))
>         {
>           if (need_evex_encoding ())
>             {
>               i.tm.opcode_modifier.vex = 0;
>               i.tm.cpu.bitfield.cpuavx512f = i.tm.cpu_any.bitfield.cpuavx512f;
>               i.tm.cpu.bitfield.cpuavx512vl = i.tm.cpu_any.bitfield.cpuavx512vl;
>             }
>           else
>             {
>               i.tm.opcode_modifier.evex = 0;
>               if (i.tm.cpu_any.bitfield.cpuavx)
>                 i.tm.cpu.bitfield.cpuavx = 1;
>               else if (!i.tm.cpu.bitfield.isa)
>                 i.tm.cpu.bitfield.isa = i.tm.cpu_any.bitfield.isa;
>               else
>                 gas_assert (i.tm.cpu.bitfield.isa == i.tm.cpu_any.bitfield.isa);
>             }
>         }
> 
>       if (APX_F(CpuCMPCCXADD) || APX_F(CpuAMX_TILE) || APX_F(CpuAVX512F)
>           || APX_F(CpuAVX512DQ) || APX_F(CpuAVX512BW) || APX_F(CpuBMI)
>           || APX_F(CpuBMI2))
>         if (need_evex_encoding ())
>           i.tm.opcode_modifier.vex = 0;
>         else
>           i.tm.opcode_modifier.evex = 0;
>     }

Something along these lines, indeed. But without APX_F(). I've just looked
it up again:

#define APX_F(cpuid) (maybe_cpu (t, CpuAPX_F) && maybe_cpu (t, cpuid))

Why would you test CpuAPX_F over and over again in the conditional? See
how the code that has been there for a little while checks each CpuXYZ
exactly once.

Plus, simply as a style remark, you want to add braces around the if/else,
to make entirely clear that the else belongs to the inner if() (iirc some
compiler versions warn about code as you have it above).

Jan

Cui, Lili Dec. 13, 2023, 8:35 a.m. UTC | #18

> >>>>> @@ -14233,6 +14276,12 @@ static bool check_register (const
> >>>>> reg_entry
> >>>> *r)
> >>>>>        if (!cpu_arch_flags.bitfield.cpuapx_f
> >>>>>  	  || flag_code != CODE_64BIT)
> >>>>>  	return false;
> >>>>> +
> >>>>> +      /* When using RegRex2, dual VEX/EVEX templates need to be
> >>>>> + marked as
> >>>> EVEX.
> >>>>> +	 For the later install_template function.  */
> >>>>> +      if (current_templates->start->opcode_modifier.vex
> >>>>> +	  && current_templates->start->opcode_modifier.evex)
> >>>>> +	i.vec_encoding = vex_encoding_evex;
> >>>>
> >>>> I'm afraid I don't understand the 2nd sentence of the comment. This
> >>>> may be related to my question regarding cpu_flags_match() further up.
> >>>>
> >>>> The first sentence isn't quite correct either - you don't mark any
> >>>> template here (and you can't, because we don't even know yet which
> >>>> template we're going to use).
> >>>>
> >>>> Finally - do you really need the .evex check here? (I won't exclude
> >>>> that this yields a better diagnostic in certain cases, but this
> >>>> wants clarifying if so.)
> >>>>
> >>>
> >>> If you look at install_template(), you'll see that before this
> >>> function we
> >> need to know if the current encoding is evex.
> >>
> >> "This function" being check_register()? If so, then no, we can't know
> >> up front whether EVEX encoding is going to be needed, as operand
> >> parsing happens ahead of template selection. If instead you mean
> >> "that function" and hence install_template(), then yes, we need to know
> whether to use EVEX there.
> >> Yet how does that result in a need for the .evex check here? (Or
> >> maybe your reply was really to the first of the three parts of my
> >> earlier one?)
> >>
> >
> > Agree with you, put them here is unreasonable.
> >
> > For example
> >
> > vtestps (%r27),%ymm6
> >
> > we should report unsupported  Egpr. But without .evex check, it will report
> "Error: no EVEX encoding for `vtestps'"
> >
> >> But anyway - as said earlier on, using current_templates here looks
> >> wrong in the first place. check_register() deals with only a
> >> register, without regard to the context it is used in (with the sole exception
> of allow_pseudo_reg).
> >> May I remind you that earlier on I already indicated that I suspect
> >> you'll need a new enumerator to put in i.vec_encoding for this new
> purpose?
> >>
> >
> > If we don't put it in check_register(), we need to add a for loop at the
> beginning of the install_template() to check RegRex2. Do you think it is okay?
> Or create a function for it.
> >
> > for (unsigned int op = 0; op < i.operands; op++)
> >     {
> >       if (i.types[op].bitfield.class != Reg)
> >         continue;
> >
> >       if (i.op[op].regs->reg_flags & RegRex2)
> >         i.vec_encoding = vex_encoding_evex;
> >     }
> >
> >   if ((i.index_reg && (i.index_reg->reg_flags & RegRex2))
> >       || (i.base_reg && (i.base_reg->reg_flags & RegRex2)))
> >     i.vec_encoding = vex_encoding_evex;
> 
> As a last resort this may be an option. But until my suggestion wasn't at least
> tried or demonstrated to be worse, I don't think the above would be
> acceptable.
> 

> >> May I remind you that earlier on I already indicated that I suspect
> >> you'll need a new enumerator to put in i.vec_encoding for this new
> purpose?

Jan, I didn't get your point, I think the enumerator vex_encoding_evex works well, the question is how to filter if Egpr is used in the current instruction, we should make a choice before install_template, whether it's an evex template or a vex template.

Thanks,
Lili

Jan Beulich Dec. 13, 2023, 9:13 a.m. UTC | #19

On 13.12.2023 09:35, Cui, Lili wrote:
>>>>>>> @@ -14233,6 +14276,12 @@ static bool check_register (const
>>>>>>> reg_entry
>>>>>> *r)
>>>>>>>        if (!cpu_arch_flags.bitfield.cpuapx_f
>>>>>>>  	  || flag_code != CODE_64BIT)
>>>>>>>  	return false;
>>>>>>> +
>>>>>>> +      /* When using RegRex2, dual VEX/EVEX templates need to be
>>>>>>> + marked as
>>>>>> EVEX.
>>>>>>> +	 For the later install_template function.  */
>>>>>>> +      if (current_templates->start->opcode_modifier.vex
>>>>>>> +	  && current_templates->start->opcode_modifier.evex)
>>>>>>> +	i.vec_encoding = vex_encoding_evex;
>>>>>>
>>>>>> I'm afraid I don't understand the 2nd sentence of the comment. This
>>>>>> may be related to my question regarding cpu_flags_match() further up.
>>>>>>
>>>>>> The first sentence isn't quite correct either - you don't mark any
>>>>>> template here (and you can't, because we don't even know yet which
>>>>>> template we're going to use).
>>>>>>
>>>>>> Finally - do you really need the .evex check here? (I won't exclude
>>>>>> that this yields a better diagnostic in certain cases, but this
>>>>>> wants clarifying if so.)
>>>>>>
>>>>>
>>>>> If you look at install_template(), you'll see that before this
>>>>> function we
>>>> need to know if the current encoding is evex.
>>>>
>>>> "This function" being check_register()? If so, then no, we can't know
>>>> up front whether EVEX encoding is going to be needed, as operand
>>>> parsing happens ahead of template selection. If instead you mean
>>>> "that function" and hence install_template(), then yes, we need to know
>> whether to use EVEX there.
>>>> Yet how does that result in a need for the .evex check here? (Or
>>>> maybe your reply was really to the first of the three parts of my
>>>> earlier one?)
>>>>
>>>
>>> Agree with you, put them here is unreasonable.
>>>
>>> For example
>>>
>>> vtestps (%r27),%ymm6
>>>
>>> we should report unsupported  Egpr. But without .evex check, it will report
>> "Error: no EVEX encoding for `vtestps'"
>>>
>>>> But anyway - as said earlier on, using current_templates here looks
>>>> wrong in the first place. check_register() deals with only a
>>>> register, without regard to the context it is used in (with the sole exception
>> of allow_pseudo_reg).
>>>> May I remind you that earlier on I already indicated that I suspect
>>>> you'll need a new enumerator to put in i.vec_encoding for this new
>> purpose?
>>>>
>>>
>>> If we don't put it in check_register(), we need to add a for loop at the
>> beginning of the install_template() to check RegRex2. Do you think it is okay?
>> Or create a function for it.
>>>
>>> for (unsigned int op = 0; op < i.operands; op++)
>>>     {
>>>       if (i.types[op].bitfield.class != Reg)
>>>         continue;
>>>
>>>       if (i.op[op].regs->reg_flags & RegRex2)
>>>         i.vec_encoding = vex_encoding_evex;
>>>     }
>>>
>>>   if ((i.index_reg && (i.index_reg->reg_flags & RegRex2))
>>>       || (i.base_reg && (i.base_reg->reg_flags & RegRex2)))
>>>     i.vec_encoding = vex_encoding_evex;
>>
>> As a last resort this may be an option. But until my suggestion wasn't at least
>> tried or demonstrated to be worse, I don't think the above would be
>> acceptable.
>>
> 
>>>> May I remind you that earlier on I already indicated that I suspect
>>>> you'll need a new enumerator to put in i.vec_encoding for this new
>> purpose?
> 
> Jan, I didn't get your point, I think the enumerator vex_encoding_evex works well, the question is how to filter if Egpr is used in the current instruction, we should make a choice before install_template, whether it's an evex template or a vex template.

Well, of course you can make the existing enumerator work, by - see above -
adding yet another loop over all operands. In a similar way it may have
been possible to avoid the introduction of vex_encoding_evex512. But it is
generally better to record the precise reason for requiring a certain
encoding / putting in place a certain restriction, i.e. going even beyond
the desire to avoid introducing new loops over all operands if at all
possible.

Here as soon as an eGPR is used, various encodings cannot be used anymore.
That's best expressed explicitly. (It might even be that such a new
enumerator would help with the REX2 encodings. Recall that I said earlier
already that both field and enumerator names aren't fully appropriate
anymore. Yet changing them isn't a priority, so we can defer that until
after all the APX work has landed.)

Jan

diff mbox series

Patch

diff --git a/gas/config/tc-i386.c b/gas/config/tc-i386.c
index 638d3aa07c8..ba8001fe1c8 100644
--- a/gas/config/tc-i386.c
+++ b/gas/config/tc-i386.c
@@ -409,6 +409,9 @@  struct _i386_insn
     /* Compressed disp8*N attribute.  */
     unsigned int memshift;
 
+    /* No CSPAZO flags update.*/
+    bool has_nf;
+
     /* Prefer load or store in encoding.  */
     enum
       {
@@ -3670,10 +3673,11 @@  install_template (const insn_template *t)
 
   /* Dual VEX/EVEX templates need stripping one of the possible variants.  */
   if (t->opcode_modifier.vex && t->opcode_modifier.evex)
-  {
-      if ((maybe_cpu (t, CpuAVX) || maybe_cpu (t, CpuAVX2)
-	   || maybe_cpu (t, CpuFMA))
-	  && (maybe_cpu (t, CpuAVX512F) || maybe_cpu (t, CpuAVX512VL)))
+    {
+      if (AVX512F(CpuAVX) || AVX512F(CpuAVX2) || AVX512F(CpuFMA)
+	  || AVX512VL(CpuAVX) || AVX512VL(CpuAVX2) || APX_F(CpuCMPCCXADD)
+	  || APX_F(CpuAMX_TILE) || APX_F(CpuAVX512F) || APX_F(CpuAVX512DQ)
+	  || APX_F(CpuAVX512BW) || APX_F(CpuBMI) || APX_F(CpuBMI2))
 	{
 	  if (need_evex_encoding ())
 	    {
@@ -3692,7 +3696,7 @@  install_template (const insn_template *t)
 		gas_assert (i.tm.cpu.bitfield.isa == i.tm.cpu_any.bitfield.isa);
 	    }
 	}
-  }
+    }
 
   /* Note that for pseudo prefixes this produces a length of 1. But for them
      the length isn't interesting at all.  */
@@ -3873,6 +3877,14 @@  is_any_vex_encoding (const insn_template *t)
   return t->opcode_modifier.vex || t->opcode_modifier.evex;
 }
 
+static INLINE bool
+is_apx_evex_encoding (void)
+{
+  return i.rex2 || i.tm.opcode_space == SPACE_EVEXMAP4
+    || (i.vex.register_specifier
+	&& i.vex.register_specifier->reg_flags & RegRex2);
+}
+
 static INLINE bool
 is_apx_rex2_encoding (void)
 {
@@ -4149,6 +4161,27 @@  build_rex2_prefix (void)
 		    | (i.rex2 << 4) | i.rex);
 }
 
+/* Build the EVEX prefix (4-byte) for evex insn
+   | 62h |
+   | `R`X`B`R' | B'mmm |
+   | W | v`v`v`v | `x' | pp |
+   | z| L'L | b | `v | aaa |
+*/
+static void
+build_apx_evex_prefix (void)
+{
+  build_evex_prefix ();
+  if (i.rex2 & REX_R)
+    i.vex.bytes[1] &= ~0x10;
+  if (i.rex2 & REX_B)
+    i.vex.bytes[1] |= 0x08;
+  if (i.rex2 & REX_X)
+    i.vex.bytes[2] &= ~0x04;
+  if (i.vex.register_specifier
+      && i.vex.register_specifier->reg_flags & RegRex2)
+    i.vex.bytes[3] &= ~0x08;
+}
+
 static void
 process_immext (void)
 {
@@ -5622,13 +5655,18 @@  md_assemble (char *line)
 	  return;
 	}
 
-      if (i.tm.opcode_modifier.vex)
+      if (is_apx_evex_encoding ())
+	build_apx_evex_prefix ();
+      else if (i.tm.opcode_modifier.vex)
 	build_vex_prefix (t);
       else
 	build_evex_prefix ();
 
       /* The individual REX.RXBW bits got consumed.  */
       i.rex &= REX_OPCODE;
+
+      /* The rex2 bits got consumed.  */
+      i.rex2 = 0;
     }
 
   /* Handle conversion of 'int $3' --> special int3 insn.  */
@@ -5655,17 +5693,17 @@  md_assemble (char *line)
      instruction already has a prefix, we need to convert old
      registers to new ones.  */
 
-  if ((i.types[0].bitfield.class == Reg && i.types[0].bitfield.byte
-       && (i.op[0].regs->reg_flags & RegRex64) != 0)
-      || (i.types[1].bitfield.class == Reg && i.types[1].bitfield.byte
-	  && (i.op[1].regs->reg_flags & RegRex64) != 0)
-      || (((i.types[0].bitfield.class == Reg && i.types[0].bitfield.byte)
-	   || (i.types[1].bitfield.class == Reg && i.types[1].bitfield.byte))
-	  && (i.rex != 0 || i.rex2 != 0)))
+  if (((i.types[0].bitfield.class == Reg && i.types[0].bitfield.byte
+	&& (i.op[0].regs->reg_flags & RegRex64) != 0)
+       || (i.types[1].bitfield.class == Reg && i.types[1].bitfield.byte
+	   && (i.op[1].regs->reg_flags & RegRex64) != 0)
+       || (((i.types[0].bitfield.class == Reg && i.types[0].bitfield.byte)
+	    || (i.types[1].bitfield.class == Reg && i.types[1].bitfield.byte))
+	   && (i.rex != 0 || i.rex2 != 0))))
     {
       int x;
 
-      if (!i.rex2)
+      if (!is_apx_rex2_encoding () && !is_any_vex_encoding(&i.tm))
 	i.rex |= REX_OPCODE;
       for (x = 0; x < 2; x++)
 	{
@@ -8020,7 +8058,8 @@  process_suffix (void)
       if (i.suffix != QWORD_MNEM_SUFFIX
 	  && i.tm.opcode_modifier.mnemonicsize != IGNORESIZE
 	  && !i.tm.opcode_modifier.floatmf
-	  && !is_any_vex_encoding (&i.tm)
+	  && (!is_any_vex_encoding (&i.tm)
+	      || i.tm.opcode_space == SPACE_EVEXMAP4)
 	  && ((i.suffix == LONG_MNEM_SUFFIX) == (flag_code == CODE_16BIT)
 	      || (flag_code == CODE_64BIT
 		  && i.tm.opcode_modifier.jump == JUMP_BYTE)))
@@ -8030,7 +8069,11 @@  process_suffix (void)
 	  if (i.tm.opcode_modifier.jump == JUMP_BYTE) /* jcxz, loop */
 	    prefix = ADDR_PREFIX_OPCODE;
 
-	  if (!add_prefix (prefix))
+	  /* The DATA PREFIX of EVEX promoted from legacy APX instructions
+	     needs to be adjusted.  */
+	  if (i.tm.opcode_space == SPACE_EVEXMAP4)
+	    i.tm.opcode_modifier.opcodeprefix = PREFIX_0X66;
+	  else if (!add_prefix (prefix))
 	    return 0;
 	}
 
@@ -14233,6 +14276,12 @@  static bool check_register (const reg_entry *r)
       if (!cpu_arch_flags.bitfield.cpuapx_f
 	  || flag_code != CODE_64BIT)
 	return false;
+
+      /* When using RegRex2, dual VEX/EVEX templates need to be marked as EVEX.
+	 For the later install_template function.  */
+      if (current_templates->start->opcode_modifier.vex
+	  && current_templates->start->opcode_modifier.evex)
+	i.vec_encoding = vex_encoding_evex;
     }
 
   if (((r->reg_flags & (RegRex64 | RegRex)) || r->reg_type.bitfield.qword)
diff --git a/gas/testsuite/gas/i386/x86-64-evex.d b/gas/testsuite/gas/i386/x86-64-evex.d
index 041747db892..5d974c312da 100644
--- a/gas/testsuite/gas/i386/x86-64-evex.d
+++ b/gas/testsuite/gas/i386/x86-64-evex.d
@@ -17,6 +17,6 @@  Disassembly of section .text:
  +[a-f0-9]+:	62 f1 d6 38 7b f0    	vcvtusi2ss %rax,\{rd-sae\},%xmm5,%xmm6
  +[a-f0-9]+:	62 f1 57 38 7b f0    	vcvtusi2sd %eax,\{rd-bad\},%xmm5,%xmm6
  +[a-f0-9]+:	62 f1 d7 38 7b f0    	vcvtusi2sd %rax,\{rd-sae\},%xmm5,%xmm6
- +[a-f0-9]+:	62 e1 7e 08 2d c0    	vcvtss2si %xmm0,\(bad\)
+ +[a-f0-9]+:	62 e1 7e 08 2d c0    	vcvtss2si %xmm0,%r16d
  +[a-f0-9]+:	62 e1 7c 08 c2 c0 00 	vcmpeqps %xmm0,%xmm0,\(bad\)
 #pass
diff --git a/gas/testsuite/gas/i386/x86-64.exp b/gas/testsuite/gas/i386/x86-64.exp
index 2be0df0e981..4a59a726ecb 100644
--- a/gas/testsuite/gas/i386/x86-64.exp
+++ b/gas/testsuite/gas/i386/x86-64.exp
@@ -250,7 +250,7 @@  run_dump_test "x86-64-sse-noavx"
 run_dump_test "x86-64-movbe"
 run_dump_test "x86-64-movbe-intel"
 run_dump_test "x86-64-movbe-suffix"
-run_list_test "x86-64-inval-movbe" "-al"
+run_list_test "x86-64-inval-movbe" "-I${srcdir}/$subdir -march=+noapx_f -al"
 run_dump_test "x86-64-ept"
 run_dump_test "x86-64-ept-intel"
 run_list_test "x86-64-inval-ept" "-al"
diff --git a/opcodes/i386-dis-evex-len.h b/opcodes/i386-dis-evex-len.h
index a02609c50f2..ad59a559e0d 100644
--- a/opcodes/i386-dis-evex-len.h
+++ b/opcodes/i386-dis-evex-len.h
@@ -62,6 +62,16 @@  static const struct dis386 evex_len_table[][3] = {
     { REG_TABLE (REG_EVEX_0F38C7_L_2) },
   },
 
+  /* EVEX_LEN_0F38F2 */
+  {
+    { PREFIX_TABLE (PREFIX_EVEX_0F38F2_L_0) },
+  },
+
+  /* EVEX_LEN_0F38F3 */
+  {
+    { PREFIX_TABLE (PREFIX_EVEX_0F38F3_L_0) },
+  },
+
   /* EVEX_LEN_0F3A00 */
   {
     { Bad_Opcode },
diff --git a/opcodes/i386-dis-evex-prefix.h b/opcodes/i386-dis-evex-prefix.h
index 28da54922c7..b11b7adb443 100644
--- a/opcodes/i386-dis-evex-prefix.h
+++ b/opcodes/i386-dis-evex-prefix.h
@@ -285,6 +285,14 @@ 
     { "%XEvfmsub213s%XW",	{ XMScalar, VexScalar, EXdq, EXxEVexR }, 0 },
     { "v4fnmadds%XS",	{ XMScalar, VexScalar, Mxmm }, 0 },
   },
+  /* PREFIX_EVEX_0F38F2_L_0 */
+  {
+    { "andnS",	{ Gdq, VexGdq, Edq }, 0 },
+  },
+  /* PREFIX_EVEX_0F38F3_L_0 */
+  {
+    { REG_TABLE (REG_EVEX_0F38F3_L_0_P_0) },
+  },
   /* PREFIX_EVEX_0F3A08 */
   {
     { "vrndscalep%XH",  { XM, EXxh, EXxEVexS, Ib }, 0 },
@@ -338,6 +346,64 @@ 
     { "vcmpp%XH", { MaskG, Vex, EXxh, EXxEVexS, CMP }, 0 },
     { "vcmps%XH", { MaskG, VexScalar, EXw, EXxEVexS, CMP }, 0 },
   },
+  /* PREFIX_EVEX_MAP4_D8 */
+  {
+    { "sha1nexte", { XM, EXxmm }, 0 },
+    { REG_TABLE (REG_0F38D8_PREFIX_1) },
+  },
+  /* PREFIX_EVEX_MAP4_DA */
+  {
+    { "sha1msg2", { XM, EXxmm }, 0 },
+    { "encodekey128", { Gd, Rd }, 0 },
+  },
+  /* PREFIX_EVEX_MAP4_DB */
+  {
+    { "sha256rnds2", { XM, EXxmm, XMM0 }, 0 },
+    { "encodekey256", { Gd, Rd }, 0 },
+  },
+  /* PREFIX_EVEX_MAP4_DC */
+  {
+    { "sha256msg1", { XM, EXxmm }, 0 },
+    { "aesenc128kl", { XM, M }, 0 },
+  },
+  /* PREFIX_EVEX_MAP4_DD */
+  {
+    { "sha256msg2", { XM, EXxmm }, 0 },
+    { "aesdec128kl", { XM, M }, 0 },
+  },
+  /* PREFIX_EVEX_MAP4_DE */
+  {
+    { Bad_Opcode },
+    { "aesenc256kl", { XM, M }, 0 },
+  },
+  /* PREFIX_EVEX_MAP4_DF */
+  {
+    { Bad_Opcode },
+    { "aesdec256kl", { XM, M }, 0 },
+  },
+  /* PREFIX_EVEX_MAP4_F0 */
+  {
+    { "crc32A", { Gdq, Eb }, 0 },
+    { "invept", { Gm, Mo }, 0 },
+  },
+  /* PREFIX_EVEX_MAP4_F1 */
+  {
+    { "crc32Q", { Gdq, Ev }, 0 },
+    { "invvpid", { Gm, Mo }, 0 },
+    { "crc32Q", { Gdq, Ev }, 0 },
+  },
+  /* PREFIX_EVEX_MAP4_F2 */
+  {
+    { Bad_Opcode },
+    { "invpcid", { Gm, M }, 0 },
+  },
+  /* PREFIX_EVEX_MAP4_F8 */
+  {
+    { Bad_Opcode },
+    { "enqcmds", { Gva, M },  0 },
+    { "movdir64b", { Gva, M }, 0 },
+    { "enqcmd", { Gva, M }, 0 },
+  },
   /* PREFIX_EVEX_MAP5_10 */
   {
     { Bad_Opcode },
diff --git a/opcodes/i386-dis-evex-reg.h b/opcodes/i386-dis-evex-reg.h
index 2885063628b..8374f0ea93a 100644
--- a/opcodes/i386-dis-evex-reg.h
+++ b/opcodes/i386-dis-evex-reg.h
@@ -49,3 +49,10 @@ 
     { "vscatterpf0qp%XW",  { MVexVSIBQWpX }, PREFIX_DATA },
     { "vscatterpf1qp%XW",  { MVexVSIBQWpX }, PREFIX_DATA },
   },
+  /* REG_EVEX_0F38F3_L_0_P_0 */
+  {
+    { Bad_Opcode },
+    { "blsrS",	{ VexGdq, Edq }, 0 },
+    { "blsmskS",	{ VexGdq, Edq }, 0 },
+    { "blsiS",	{ VexGdq, Edq }, 0 },
+  },
diff --git a/opcodes/i386-dis-evex-x86-64.h b/opcodes/i386-dis-evex-x86-64.h
new file mode 100644
index 00000000000..18c297feb86
--- /dev/null
+++ b/opcodes/i386-dis-evex-x86-64.h
@@ -0,0 +1,60 @@ 
+  /* X86_64_EVEX_0F90 */
+  {
+    { Bad_Opcode },
+    { VEX_LEN_TABLE (VEX_LEN_0F90) },
+  },
+  /* X86_64_EVEX_0F91 */
+  {
+    { Bad_Opcode },
+    { VEX_LEN_TABLE (VEX_LEN_0F91) },
+  },
+  /* X86_64_EVEX_0F92 */
+  {
+    { Bad_Opcode },
+    { VEX_LEN_TABLE (VEX_LEN_0F92) },
+  },
+  /* X86_64_EVEX_0F93 */
+  {
+    { Bad_Opcode },
+    { VEX_LEN_TABLE (VEX_LEN_0F93) },
+  },
+  /* X86_64_EVEX_0F3849 */
+  {
+    { Bad_Opcode },
+    { VEX_LEN_TABLE (VEX_LEN_0F3849_X86_64) },
+  },
+  /* X86_64_EVEX_0F384B */
+  {
+    { Bad_Opcode },
+    { VEX_LEN_TABLE (VEX_LEN_0F384B_X86_64) },
+  },
+  /* X86_64_EVEX_0F38F2 */
+  {
+    { Bad_Opcode },
+    { EVEX_LEN_TABLE (EVEX_LEN_0F38F2) },
+  },
+  /* X86_64_EVEX_0F38F3 */
+  {
+    { Bad_Opcode },
+    { EVEX_LEN_TABLE (EVEX_LEN_0F38F3) },
+  },
+  /* X86_64_EVEX_0F38F5 */
+  {
+    { Bad_Opcode },
+    { VEX_LEN_TABLE (VEX_LEN_0F38F5) },
+  },
+  /* X86_64_EVEX_0F38F6 */
+  {
+    { Bad_Opcode },
+    { VEX_LEN_TABLE (VEX_LEN_0F38F6) },
+  },
+  /* X86_64_EVEX_0F38F7 */
+  {
+    { Bad_Opcode },
+    { VEX_LEN_TABLE (VEX_LEN_0F38F7) },
+  },
+  /* X86_64_EVEX_0F3AF0 */
+  {
+    { Bad_Opcode },
+    { VEX_LEN_TABLE (VEX_LEN_0F3AF0) },
+  },
diff --git a/opcodes/i386-dis-evex.h b/opcodes/i386-dis-evex.h
index 7ad1edbe72d..ea0a4c0b2a5 100644
--- a/opcodes/i386-dis-evex.h
+++ b/opcodes/i386-dis-evex.h
@@ -164,10 +164,10 @@  static const struct dis386 evex_table[][256] = {
     { Bad_Opcode },
     { Bad_Opcode },
     /* 90 */
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
+    { X86_64_EVEX_FROM_VEX_TABLE (X86_64_EVEX_0F90) },
+    { X86_64_EVEX_FROM_VEX_TABLE (X86_64_EVEX_0F91) },
+    { X86_64_EVEX_FROM_VEX_TABLE (X86_64_EVEX_0F92) },
+    { X86_64_EVEX_FROM_VEX_TABLE (X86_64_EVEX_0F93) },
     { Bad_Opcode },
     { Bad_Opcode },
     { Bad_Opcode },
@@ -375,9 +375,9 @@  static const struct dis386 evex_table[][256] = {
     { "vpsllv%DQ",	{ XM, Vex, EXx }, PREFIX_DATA },
     /* 48 */
     { Bad_Opcode },
+    { X86_64_EVEX_FROM_VEX_TABLE (X86_64_EVEX_0F3849) },
     { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
+    { X86_64_EVEX_FROM_VEX_TABLE (X86_64_EVEX_0F384B) },
     { "vrcp14p%XW",	{ XM, EXx }, PREFIX_DATA },
     { "vrcp14s%XW",	{ XMScalar, VexScalar, EXdq }, PREFIX_DATA },
     { "vrsqrt14p%XW",	{ XM, EXx }, 0 },
@@ -545,32 +545,32 @@  static const struct dis386 evex_table[][256] = {
     { "%XEvaesdecY",	{ XM, Vex, EXx }, PREFIX_DATA },
     { "%XEvaesdeclastY", { XM, Vex, EXx }, PREFIX_DATA },
     /* E0 */
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
+    { X86_64_EVEX_FROM_VEX_TABLE (X86_64_VEX_0F38E0) },
+    { X86_64_EVEX_FROM_VEX_TABLE (X86_64_VEX_0F38E1) },
+    { X86_64_EVEX_FROM_VEX_TABLE (X86_64_VEX_0F38E2) },
+    { X86_64_EVEX_FROM_VEX_TABLE (X86_64_VEX_0F38E3) },
+    { X86_64_EVEX_FROM_VEX_TABLE (X86_64_VEX_0F38E4) },
+    { X86_64_EVEX_FROM_VEX_TABLE (X86_64_VEX_0F38E5) },
+    { X86_64_EVEX_FROM_VEX_TABLE (X86_64_VEX_0F38E6) },
+    { X86_64_EVEX_FROM_VEX_TABLE (X86_64_VEX_0F38E7) },
     /* E8 */
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
+    { X86_64_EVEX_FROM_VEX_TABLE (X86_64_VEX_0F38E8) },
+    { X86_64_EVEX_FROM_VEX_TABLE (X86_64_VEX_0F38E9) },
+    { X86_64_EVEX_FROM_VEX_TABLE (X86_64_VEX_0F38EA) },
+    { X86_64_EVEX_FROM_VEX_TABLE (X86_64_VEX_0F38EB) },
+    { X86_64_EVEX_FROM_VEX_TABLE (X86_64_VEX_0F38EC) },
+    { X86_64_EVEX_FROM_VEX_TABLE (X86_64_VEX_0F38ED) },
+    { X86_64_EVEX_FROM_VEX_TABLE (X86_64_VEX_0F38EE) },
+    { X86_64_EVEX_FROM_VEX_TABLE (X86_64_VEX_0F38EF) },
     /* F0 */
     { Bad_Opcode },
     { Bad_Opcode },
+    { X86_64_EVEX_FROM_VEX_TABLE (X86_64_EVEX_0F38F2) },
+    { X86_64_EVEX_FROM_VEX_TABLE (X86_64_EVEX_0F38F3) },
     { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
+    { X86_64_EVEX_FROM_VEX_TABLE (X86_64_EVEX_0F38F5) },
+    { X86_64_EVEX_FROM_VEX_TABLE (X86_64_EVEX_0F38F6) },
+    { X86_64_EVEX_FROM_VEX_TABLE (X86_64_EVEX_0F38F7) },
     /* F8 */
     { Bad_Opcode },
     { Bad_Opcode },
@@ -854,7 +854,7 @@  static const struct dis386 evex_table[][256] = {
     { Bad_Opcode },
     { Bad_Opcode },
     /* F0 */
-    { Bad_Opcode },
+    { X86_64_EVEX_FROM_VEX_TABLE (X86_64_EVEX_0F3AF0) },
     { Bad_Opcode },
     { Bad_Opcode },
     { Bad_Opcode },
@@ -983,13 +983,13 @@  static const struct dis386 evex_table[][256] = {
     { Bad_Opcode },
     { Bad_Opcode },
     /* 60 */
+    { "movbeS",	{ Gv, Ev }, PREFIX_NP_OR_DATA },
+    { "movbeS",	{ Ev, Gv }, PREFIX_NP_OR_DATA },
     { Bad_Opcode },
     { Bad_Opcode },
     { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
+    { "wrussK",	{ M, Gdq }, PREFIX_DATA },
+    { PREFIX_TABLE (PREFIX_0F38F6) },
     { Bad_Opcode },
     /* 68 */
     { Bad_Opcode },
@@ -1113,19 +1113,19 @@  static const struct dis386 evex_table[][256] = {
     { Bad_Opcode },
     { Bad_Opcode },
     { Bad_Opcode },
-    { Bad_Opcode },
+    { "sha1rnds4",	{ XM, EXxmm, Ib }, NO_PREFIX },
     { Bad_Opcode },
     { Bad_Opcode },
     { Bad_Opcode },
     /* D8 */
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
+    { PREFIX_TABLE (PREFIX_EVEX_MAP4_D8) },
+    { "sha1msg1",	{ XM, EXxmm }, NO_PREFIX },
+    { PREFIX_TABLE (PREFIX_EVEX_MAP4_DA) },
+    { PREFIX_TABLE (PREFIX_EVEX_MAP4_DB) },
+    { PREFIX_TABLE (PREFIX_EVEX_MAP4_DC) },
+    { PREFIX_TABLE (PREFIX_EVEX_MAP4_DD) },
+    { PREFIX_TABLE (PREFIX_EVEX_MAP4_DE) },
+    { PREFIX_TABLE (PREFIX_EVEX_MAP4_DF) },
     /* E0 */
     { Bad_Opcode },
     { Bad_Opcode },
@@ -1145,20 +1145,20 @@  static const struct dis386 evex_table[][256] = {
     { Bad_Opcode },
     { Bad_Opcode },
     /* F0 */
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
+    { PREFIX_TABLE (PREFIX_EVEX_MAP4_F0) },
+    { PREFIX_TABLE (PREFIX_EVEX_MAP4_F1) },
+    { PREFIX_TABLE (PREFIX_EVEX_MAP4_F2) },
     { Bad_Opcode },
     { Bad_Opcode },
     { Bad_Opcode },
     { Bad_Opcode },
     { Bad_Opcode },
     /* F8 */
+    { PREFIX_TABLE (PREFIX_EVEX_MAP4_F8) },
+    { "movdiri",	{ Mdq, Gdq }, NO_PREFIX },
     { Bad_Opcode },
     { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
+    { PREFIX_TABLE (PREFIX_0F38FC) },
     { Bad_Opcode },
     { Bad_Opcode },
     { Bad_Opcode },
diff --git a/opcodes/i386-dis.c b/opcodes/i386-dis.c
index 3f1a8644930..b81e75aa786 100644
--- a/opcodes/i386-dis.c
+++ b/opcodes/i386-dis.c
@@ -132,6 +132,13 @@  enum x86_64_isa
   intel64
 };
 
+enum evex_type
+{
+  evex_default = 0,
+  evex_from_legacy,
+  evex_from_vex,
+};
+
 struct instr_info
 {
   enum address_mode address_mode;
@@ -211,7 +218,6 @@  struct instr_info
     int ll;
     bool w;
     bool evex;
-    bool r;
     bool v;
     bool zeroing;
     bool b;
@@ -219,6 +225,8 @@  struct instr_info
   }
   vex;
 
+  enum evex_type evex_type;
+
   /* Remember if the current op is a jump instruction.  */
   bool op_is_jump;
 
@@ -304,6 +312,8 @@  struct dis_private {
 #define PREFIX_ADDR 0x400
 #define PREFIX_FWAIT 0x800
 #define PREFIX_REX2 0x1000
+#define PREFIX_NP_OR_DATA 0x2000
+#define NO_PREFIX   0x4000
 
 /* Make sure that bytes from INFO->PRIVATE_DATA->BUFFER (inclusive)
    to ADDR (exclusive) are valid.  Returns true for success, false
@@ -801,6 +811,7 @@  enum
   USE_RM_TABLE,
   USE_PREFIX_TABLE,
   USE_X86_64_TABLE,
+  USE_X86_64_EVEX_FROM_VEX_TABLE,
   USE_3BYTE_TABLE,
   USE_XOP_8F_TABLE,
   USE_VEX_C4_TABLE,
@@ -819,6 +830,8 @@  enum
 #define RM_TABLE(I)		DIS386 (USE_RM_TABLE, (I))
 #define PREFIX_TABLE(I)		DIS386 (USE_PREFIX_TABLE, (I))
 #define X86_64_TABLE(I)		DIS386 (USE_X86_64_TABLE, (I))
+#define X86_64_EVEX_FROM_VEX_TABLE(I) \
+  DIS386 (USE_X86_64_EVEX_FROM_VEX_TABLE, (I))
 #define THREE_BYTE_TABLE(I)	DIS386 (USE_3BYTE_TABLE, (I))
 #define XOP_8F_TABLE()		DIS386 (USE_XOP_8F_TABLE, 0)
 #define VEX_C4_TABLE()		DIS386 (USE_VEX_C4_TABLE, 0)
@@ -879,7 +892,8 @@  enum
   REG_EVEX_0F72,
   REG_EVEX_0F73,
   REG_EVEX_0F38C6_L_2,
-  REG_EVEX_0F38C7_L_2
+  REG_EVEX_0F38C7_L_2,
+  REG_EVEX_0F38F3_L_0_P_0,
 };
 
 enum
@@ -1146,6 +1160,8 @@  enum
   PREFIX_EVEX_0F389B,
   PREFIX_EVEX_0F38AA,
   PREFIX_EVEX_0F38AB,
+  PREFIX_EVEX_0F38F2_L_0,
+  PREFIX_EVEX_0F38F3_L_0,
 
   PREFIX_EVEX_0F3A08,
   PREFIX_EVEX_0F3A0A,
@@ -1157,6 +1173,18 @@  enum
   PREFIX_EVEX_0F3A67,
   PREFIX_EVEX_0F3AC2,
 
+  PREFIX_EVEX_MAP4_D8,
+  PREFIX_EVEX_MAP4_DA,
+  PREFIX_EVEX_MAP4_DB,
+  PREFIX_EVEX_MAP4_DC,
+  PREFIX_EVEX_MAP4_DD,
+  PREFIX_EVEX_MAP4_DE,
+  PREFIX_EVEX_MAP4_DF,
+  PREFIX_EVEX_MAP4_F0,
+  PREFIX_EVEX_MAP4_F1,
+  PREFIX_EVEX_MAP4_F2,
+  PREFIX_EVEX_MAP4_F8,
+
   PREFIX_EVEX_MAP5_10,
   PREFIX_EVEX_MAP5_11,
   PREFIX_EVEX_MAP5_1D,
@@ -1268,7 +1296,21 @@  enum
   X86_64_VEX_0F38ED,
   X86_64_VEX_0F38EE,
   X86_64_VEX_0F38EF,
+
   X86_64_VEX_MAP7_F8_L_0_W_0_R_0,
+
+  X86_64_EVEX_0F90,
+  X86_64_EVEX_0F91,
+  X86_64_EVEX_0F92,
+  X86_64_EVEX_0F93,
+  X86_64_EVEX_0F3849,
+  X86_64_EVEX_0F384B,
+  X86_64_EVEX_0F38F2,
+  X86_64_EVEX_0F38F3,
+  X86_64_EVEX_0F38F5,
+  X86_64_EVEX_0F38F6,
+  X86_64_EVEX_0F38F7,
+  X86_64_EVEX_0F3AF0,
 };
 
 enum
@@ -1453,6 +1495,8 @@  enum
   EVEX_LEN_0F385B,
   EVEX_LEN_0F38C6,
   EVEX_LEN_0F38C7,
+  EVEX_LEN_0F38F2,
+  EVEX_LEN_0F38F3,
   EVEX_LEN_0F3A00,
   EVEX_LEN_0F3A01,
   EVEX_LEN_0F3A18,
@@ -4524,10 +4568,11 @@  static const struct dis386 x86_64_table[][2] = {
 
   /* X86_64_VEX_MAP7_F8_L_0_W_0_R_0 */
   {
-    { Bad_Opcode },
-    { PREFIX_TABLE (PREFIX_VEX_MAP7_F8_L_0_W_0_R_0_X86_64) },
+      { Bad_Opcode },
+      { PREFIX_TABLE (PREFIX_VEX_MAP7_F8_L_0_W_0_R_0_X86_64) },
   },
 
+#include "i386-dis-evex-x86-64.h"
 };
 
 static const struct dis386 three_byte_table[][256] = {
@@ -8733,6 +8778,17 @@  get_valid_dis386 (const struct dis386 *dp, instr_info *ins)
       dp = &prefix_table[dp->op[1].bytemode][vindex];
       break;
 
+    case USE_X86_64_EVEX_FROM_VEX_TABLE:
+      ins->evex_type = evex_from_vex;
+      /* EVEX from evex instrucions require that EVEX.z, EVEX.L’L, EVEX.b and
+	 the lower 2 bits of EVEX.aaa must be 0.  */
+      if ((ins->vex.mask_register_specifier & 0x3) != 0
+	  || ins->vex.ll != 0
+	  || ins->vex.zeroing != 0
+	  || ins->vex.b)
+	return &bad_opcode;
+
+      /* Fall through.  */
     case USE_X86_64_TABLE:
       vindex = ins->address_mode == mode_64bit ? 1 : 0;
       dp = &x86_64_table[dp->op[1].bytemode][vindex];
@@ -8978,9 +9034,13 @@  get_valid_dis386 (const struct dis386 *dp, instr_info *ins)
       if (!fetch_code (ins->info, ins->codep + 4))
 	return &err_opcode;
       /* The first byte after 0x62.  */
+      if (*ins->codep & 0x8)
+	ins->rex2 |= REX_B;
+      if (!(*ins->codep & 0x10))
+	ins->rex2 |= REX_R;
+
       ins->rex = ~(*ins->codep >> 5) & 0x7;
-      ins->vex.r = *ins->codep & 0x10;
-      switch ((*ins->codep & 0xf))
+      switch ((*ins->codep & 0x7))
 	{
 	default:
 	  return &bad_opcode;
@@ -8993,6 +9053,12 @@  get_valid_dis386 (const struct dis386 *dp, instr_info *ins)
 	case 0x3:
 	  vex_table_index = EVEX_0F3A;
 	  break;
+	case 0x4:
+	  vex_table_index = EVEX_MAP4;
+	  ins->evex_type = evex_from_legacy;
+	  if (ins->address_mode != mode_64bit)
+	    return &bad_opcode;
+	  break;
 	case 0x5:
 	  vex_table_index = EVEX_MAP5;
 	  break;
@@ -9009,9 +9075,8 @@  get_valid_dis386 (const struct dis386 *dp, instr_info *ins)
 
       ins->vex.register_specifier = (~(*ins->codep >> 3)) & 0xf;
 
-      /* The U bit.  */
       if (!(*ins->codep & 0x4))
-	return &bad_opcode;
+	ins->rex2 |= REX_X;
 
       switch ((*ins->codep & 0x3))
 	{
@@ -9041,12 +9106,24 @@  get_valid_dis386 (const struct dis386 *dp, instr_info *ins)
 
       if (ins->address_mode != mode_64bit)
 	{
+	  if (ins->evex_type != evex_default
+	      || (ins->rex2 & (REX_B | REX_X)))
+	    return &bad_opcode;
 	  /* In 16/32-bit mode silently ignore following bits.  */
 	  ins->rex &= ~REX_B;
-	  ins->vex.r = true;
+	  ins->rex2 &= ~REX_R;
 	}
 
       ins->need_vex = 4;
+
+      /* EVEX from legacy instructions require that EVEX.z, EVEX.L’L and the
+	 lower 2 bits of EVEX.aaa must be 0.  */
+      if (ins->evex_type == evex_from_legacy
+	  && ((ins->vex.mask_register_specifier & 0x3) != 0
+	      || ins->vex.ll != 0
+	      || ins->vex.zeroing != 0))
+	return &bad_opcode;
+
       ins->codep++;
       vindex = *ins->codep++;
       dp = &evex_table[vex_table_index][vindex];
@@ -9460,6 +9537,13 @@  print_insn (bfd_vma pc, disassemble_info *info, int intel_syntax)
       dp = get_valid_dis386 (dp, &ins);
       if (dp == &err_opcode)
 	goto fetch_error_out;
+
+      /* For APX instructions promoted from legacy maps 0/1, prefix
+	 0x66 is interpreted as the operand size override.  */
+      if (ins.evex_type == evex_from_legacy
+	  && ins.vex.prefix == DATA_PREFIX_OPCODE)
+	sizeflag ^= DFLAG;
+
       if (dp != NULL && putop (&ins, dp->name, sizeflag) == 0)
 	{
 	  if (!get_sib (&ins, sizeflag))
@@ -9639,6 +9723,24 @@  print_insn (bfd_vma pc, disassemble_info *info, int intel_syntax)
       if (ins.last_repnz_prefix >= 0)
 	ins.all_prefixes[ins.last_repnz_prefix] = 0xf2;
       break;
+
+    case PREFIX_NP_OR_DATA:
+      if (ins.vex.prefix & ~DATA_PREFIX_OPCODE)
+	{
+	  i386_dis_printf (info, dis_style_text, "(bad)");
+	  ret = ins.end_codep - priv.the_buffer;
+	  goto out;
+	}
+      break;
+
+    case NO_PREFIX:
+      if (ins.vex.prefix)
+	{
+	  i386_dis_printf (info, dis_style_text, "(bad)");
+	  ret = ins.end_codep - priv.the_buffer;
+	  goto out;
+	}
+      break;
     }
 
   /* Check if the REX prefix is used.  */
@@ -10344,7 +10446,7 @@  putop (instr_info *ins, const char *in_template, int sizeflag)
 		{
 		case 'X':
 		  if (!ins->vex.evex || ins->vex.b || ins->vex.ll >= 2
-		      || !ins->vex.r
+		      || (ins->rex2 & REX_R)
 		      || (ins->modrm.mod == 3 && (ins->rex & REX_X))
 		      || !ins->vex.v || ins->vex.mask_register_specifier)
 		    break;
@@ -11456,7 +11558,7 @@  OP_E_memory (instr_info *ins, int bytemode, int sizeflag)
 
   add += (ins->rex2 & REX_B) ? 16 : 0;
 
-  if (ins->vex.evex)
+  if (ins->vex.evex && ins->evex_type == evex_default)
     {
 
       /* Zeroing-masking is invalid for memory destinations. Set the flag
@@ -11604,6 +11706,13 @@  OP_E_memory (instr_info *ins, int bytemode, int sizeflag)
 		abort ();
 	      if (ins->vex.evex)
 		{
+		  /* S/G EVEX insns require EVEX.X4 not to be set.  */
+		  if (ins->rex2 & REX_X)
+		    {
+		      oappend (ins, "(bad)");
+		      return true;
+		    }
+
 		  if (!ins->vex.v)
 		    vindex += 16;
 		  check_gather = ins->obufp == ins->op_out[1];
@@ -11803,7 +11912,7 @@  OP_E_memory (instr_info *ins, int bytemode, int sizeflag)
 
 	      if (ins->rex & REX_R)
 	        modrm_reg += 8;
-	      if (!ins->vex.r)
+	      if (ins->rex2 & REX_R)
 	        modrm_reg += 16;
 	      if (vindex == modrm_reg)
 		oappend (ins, "/(bad)");
@@ -12009,10 +12118,7 @@  OP_indirE (instr_info *ins, int bytemode, int sizeflag)
 static bool
 OP_G (instr_info *ins, int bytemode, int sizeflag)
 {
-  if (ins->vex.evex && !ins->vex.r && ins->address_mode == mode_64bit)
-    oappend (ins, "(bad)");
-  else
-    print_register (ins, ins->modrm.reg, REX_R, bytemode, sizeflag);
+  print_register (ins, ins->modrm.reg, REX_R, bytemode, sizeflag);
   return true;
 }
 
@@ -12644,7 +12750,7 @@  OP_XMM (instr_info *ins, int bytemode, int sizeflag ATTRIBUTE_UNUSED)
     reg += 8;
   if (ins->vex.evex)
     {
-      if (!ins->vex.r)
+      if (ins->rex2 & REX_R)
 	reg += 16;
     }
 
@@ -13652,7 +13758,7 @@  DistinctDest_Fixup (instr_info *ins, int bytemode, int sizeflag)
   /* Calc destination register number.  */
   if (ins->rex & REX_R)
     modrm_reg += 8;
-  if (!ins->vex.r)
+  if (ins->rex2 & REX_R)
     modrm_reg += 16;
 
   /* Calc src1 register number.  */
diff --git a/opcodes/i386-gen.c b/opcodes/i386-gen.c
index 6402b669d37..7dab744134f 100644
--- a/opcodes/i386-gen.c
+++ b/opcodes/i386-gen.c
@@ -490,6 +490,7 @@  static bitfield opcode_modifiers[] =
   BITFIELD (IntelSyntax),
   BITFIELD (ISA64),
   BITFIELD (NoEgpr),
+  BITFIELD (NF),
 };
 
 #define CLASS(n) #n, n
@@ -1127,6 +1128,7 @@  process_i386_opcode_modifier (FILE *table, char *mod, unsigned int space,
     SPACE(0F),
     SPACE(0F38),
     SPACE(0F3A),
+    SPACE(EVEXMAP4),
     SPACE(EVEXMAP5),
     SPACE(EVEXMAP6),
     SPACE(VEXMAP7),
diff --git a/opcodes/i386-opc.h b/opcodes/i386-opc.h
index d28a4cedf0f..88717fd7575 100644
--- a/opcodes/i386-opc.h
+++ b/opcodes/i386-opc.h
@@ -753,6 +753,9 @@  enum
      whether the instruction supports pseudo-prefix {rex2}.  */
   NoEgpr,
 
+  /* No CSPAZO flags update indication.  */
+  NF,
+
   /* The last bitfield in i386_opcode_modifier.  */
   Opcode_Modifier_Num
 };
@@ -801,6 +804,7 @@  typedef struct i386_opcode_modifier
   unsigned int intelsyntax:1;
   unsigned int isa64:2;
   unsigned int noegpr:1;
+  unsigned int nf:1;
 } i386_opcode_modifier;
 
 /* Operand classes.  */
@@ -976,6 +980,7 @@  typedef struct insn_template
      1: 0F opcode prefix / space.
      2: 0F38 opcode prefix / space.
      3: 0F3A opcode prefix / space.
+     4: EVEXMAP4 opcode prefix / space.
      5: EVEXMAP5 opcode prefix / space.
      6: EVEXMAP6 opcode prefix / space.
      7: VEXMAP7 opcode prefix / space.
@@ -987,6 +992,7 @@  typedef struct insn_template
 #define SPACE_0F	1
 #define SPACE_0F38	2
 #define SPACE_0F3A	3
+#define SPACE_EVEXMAP4	4
 #define SPACE_EVEXMAP5	5
 #define SPACE_EVEXMAP6	6
 #define SPACE_VEXMAP7	7
@@ -1054,3 +1060,7 @@  typedef struct
 #define Dw2Inval (-1)
 }
 reg_entry;
+
+#define APX_F(cpuid) (maybe_cpu (t, CpuAPX_F) && maybe_cpu (t, cpuid))
+#define AVX512F(cpuid) (maybe_cpu (t, CpuAVX512F) && maybe_cpu (t, cpuid))
+#define AVX512VL(cpuid) (maybe_cpu (t, CpuAVX512VL) && maybe_cpu (t, cpuid))
diff --git a/opcodes/i386-opc.tbl b/opcodes/i386-opc.tbl
index cbf9d968fba..b27131ef185 100644
--- a/opcodes/i386-opc.tbl
+++ b/opcodes/i386-opc.tbl
@@ -109,6 +109,7 @@ 
 #define SpaceXOP09 OpcodeSpace=SPACE_XOP09
 #define SpaceXOP0A OpcodeSpace=SPACE_XOP0A
 
+#define EVexMap4 OpcodeSpace=SPACE_EVEXMAP4
 #define EVexMap5 OpcodeSpace=SPACE_EVEXMAP5
 #define EVexMap6 OpcodeSpace=SPACE_EVEXMAP6
 
@@ -138,7 +139,6 @@ 
 #define Vsz256 Vsz=VSZ256
 #define Vsz512 Vsz=VSZ512
 
-
 // The EVEX purpose of StaticRounding appears only together with SAE. Re-use
 // the bit to mark commutative VEX encodings where swapping the source
 // operands may allow to switch from 3-byte to 2-byte VEX encoding.
@@ -194,6 +194,7 @@  mov, 0xf24, i386&No64, D|RegMem|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|No_qSuf, { Te
 
 // Move after swapping the bytes
 movbe, 0x0f38f0, Movbe, D|Modrm|CheckOperandSize|No_bSuf|No_sSuf, { Word|Dword|Qword|Unspecified|BaseIndex, Reg16|Reg32|Reg64 }
+movbe, 0x60, Movbe&APX_F, D|Modrm|CheckOperandSize|No_bSuf|No_sSuf|EVex128|EVexMap4, { Reg16|Reg32|Reg64|Unspecified|BaseIndex, Reg16|Reg32|Reg64 }
 
 // Move with sign extend.
 movsb, 0xfbe, i386, Modrm|No_bSuf|No_sSuf, { Reg8|Unspecified|BaseIndex, Reg16|Reg32|Reg64 }
@@ -896,7 +897,7 @@  rex.wrxb, 0x4f, x64, NoSuf|IsPrefix, {}
 <pseudopfx:ident:cpu, disp8:Disp8:0, disp16:Disp16:0, disp32:Disp32:0, +
                       load:Load:0, store:Store:0, +
                       vex:VEX:0, vex2:VEX:0, vex3:VEX3:0, evex:EVEX:0, +
-                      rex:REX:x64, rex2:REX2:x64, nooptimize:NoOptimize:0>
+                      rex:REX:x64, rex2:REX2:APX_F, nooptimize:NoOptimize:0>
 
 {<pseudopfx>}, PSEUDO_PREFIX/Prefix_<pseudopfx:ident>, <pseudopfx:cpu>, NoSuf|IsPrefix, {}
 
@@ -1319,13 +1320,16 @@  getsec, 0xf37, SMX, NoSuf, {}
 
 invept, 0x660f3880, EPT&No64, Modrm|IgnoreSize|NoSuf, { Oword|Unspecified|BaseIndex, Reg32 }
 invept, 0x660f3880, EPT&x64, Modrm|NoSuf|NoRex64, { Oword|Unspecified|BaseIndex, Reg64 }
+invept, 0xf3f0, EPT&APX_F, Modrm|NoSuf|EVex128|EVexMap4, { Oword|Unspecified|BaseIndex, Reg64 }
 invvpid, 0x660f3881, EPT&No64, Modrm|IgnoreSize|NoSuf, { Oword|Unspecified|BaseIndex, Reg32 }
 invvpid, 0x660f3881, EPT&x64, Modrm|NoSuf|NoRex64, { Oword|Unspecified|BaseIndex, Reg64 }
+invvpid, 0xf3f1, EPT&APX_F, Modrm|NoSuf|EVex128|EVexMap4, { Oword|Unspecified|BaseIndex, Reg64 }
 
 // INVPCID instruction
 
 invpcid, 0x660f3882, INVPCID&No64, Modrm|IgnoreSize|NoSuf, { Oword|Unspecified|BaseIndex, Reg32 }
 invpcid, 0x660f3882, INVPCID&x64, Modrm|NoSuf|NoRex64, { Oword|Unspecified|BaseIndex, Reg64 }
+invpcid, 0xf3f2, INVPCID&APX_F, Modrm|NoSuf|EVex128|EVexMap4, { Oword|Unspecified|BaseIndex, Reg64 }
 
 // SSSE3 instructions.
 
@@ -1426,6 +1430,8 @@  pcmpistri<sse42>, 0x660f3a63, <sse42:cpu>, Modrm|<sse42:attr>|NoSuf, { Imm8, Reg
 pcmpistrm<sse42>, 0x660f3a62, <sse42:cpu>, Modrm|<sse42:attr>|NoSuf, { Imm8, RegXMM|Unspecified|BaseIndex, RegXMM }
 crc32, 0xf20f38f0, SSE4_2, W|Modrm|No_sSuf|No_qSuf, { Reg8|Reg16|Reg32|Unspecified|BaseIndex, Reg32 }
 crc32, 0xf20f38f0, SSE4_2&x64, W|Modrm|No_wSuf|No_lSuf|No_sSuf, { Reg8|Reg64|Unspecified|BaseIndex, Reg64 }
+crc32, 0xf0, APX_F, W|Modrm|No_sSuf|No_qSuf|EVex128|EVexMap4, { Reg8|Reg16|Reg32|Unspecified|BaseIndex, Reg32 }
+crc32, 0xf0, APX_F, W|Modrm|No_wSuf|No_lSuf|No_sSuf|EVex128|EVexMap4, { Reg8|Reg64|Unspecified|BaseIndex, Reg64 }
 
 // xsave/xrstor New Instructions.
 
@@ -1437,7 +1443,6 @@  xgetbv, 0xf01d0, Xsave, NoSuf, {}
 xsetbv, 0xf01d1, Xsave, NoSuf, {}
 
 // xsaveopt
-
 xsaveopt, 0xfae/6, Xsaveopt, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|NoEgpr, { Unspecified|BaseIndex }
 xsaveopt64, 0xfae/6, Xsaveopt&x64, Modrm|NoSuf|Size64|NoEgpr, { Unspecified|BaseIndex }
 
@@ -1837,14 +1842,14 @@  xtest, 0xf01d6, HLE|RTM, NoSuf, {}
 
 // BMI2 instructions.
 
-bzhi, 0xf5, BMI2, Modrm|CheckOperandSize|Vex128|Space0F38|VexVVVV|SwapSources|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64, Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }
-mulx, 0xf2f6, BMI2, Modrm|CheckOperandSize|Vex128|Space0F38|VexVVVV|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64, Reg32|Reg64 }
-pdep, 0xf2f5, BMI2, Modrm|CheckOperandSize|Vex128|Space0F38|VexVVVV|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64, Reg32|Reg64 }
-pext, 0xf3f5, BMI2, Modrm|CheckOperandSize|Vex128|Space0F38|VexVVVV|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64, Reg32|Reg64 }
-rorx, 0xf2f0, BMI2, Modrm|CheckOperandSize|Vex128|Space0F3A|No_bSuf|No_wSuf|No_sSuf, { Imm8|Imm8S, Reg32|Reg64|Dword|Qword|Unspecified|BaseIndex, Reg32|Reg64 }
-sarx, 0xf3f7, BMI2, Modrm|CheckOperandSize|Vex128|Space0F38|VexVVVV|SwapSources|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64, Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }
-shlx, 0x66f7, BMI2, Modrm|CheckOperandSize|Vex128|Space0F38|VexVVVV|SwapSources|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64, Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }
-shrx, 0xf2f7, BMI2, Modrm|CheckOperandSize|Vex128|Space0F38|VexVVVV|SwapSources|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64, Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }
+bzhi, 0xf5, BMI2&(BMI2|APX_F), Modrm|CheckOperandSize|Vex128|EVex128|Space0F38|VexVVVV|SwapSources|No_bSuf|No_wSuf|No_sSuf|NF, { Reg32|Reg64, Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }
+mulx, 0xf2f6, BMI2&(BMI2|APX_F), Modrm|CheckOperandSize|Vex128|EVex128|Space0F38|VexVVVV|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64, Reg32|Reg64 }
+pdep, 0xf2f5, BMI2&(BMI2|APX_F), Modrm|CheckOperandSize|Vex128|EVex128|Space0F38|VexVVVV|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64, Reg32|Reg64 }
+pext, 0xf3f5, BMI2&(BMI2|APX_F), Modrm|CheckOperandSize|Vex128|EVex128|Space0F38|VexVVVV|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64, Reg32|Reg64 }
+rorx, 0xf2f0, BMI2&(BMI2|APX_F), Modrm|CheckOperandSize|Vex128|EVex128|Space0F3A|No_bSuf|No_wSuf|No_sSuf, { Imm8|Imm8S, Reg32|Reg64|Dword|Qword|Unspecified|BaseIndex, Reg32|Reg64 }
+sarx, 0xf3f7, BMI2&(BMI2|APX_F), Modrm|CheckOperandSize|Vex128|EVex128|Space0F38|VexVVVV|SwapSources|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64, Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }
+shlx, 0x66f7, BMI2&(BMI2|APX_F), Modrm|CheckOperandSize|Vex128|EVex128|Space0F38|VexVVVV|SwapSources|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64, Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }
+shrx, 0xf2f7, BMI2&(BMI2|APX_F), Modrm|CheckOperandSize|Vex128|EVex128|Space0F38|VexVVVV|SwapSources|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64, Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }
 
 // FMA4 instructions
 
@@ -1914,11 +1919,11 @@  lwpins, 0x12/0, LWP, Modrm|SpaceXOP0A|NoSuf|VexVVVV|Vex, { Imm32|Imm32S, Reg32|U
 
 // BMI instructions
 
-andn, 0xf2, BMI, Modrm|CheckOperandSize|Vex128|Space0F38|VexVVVV|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64, Reg32|Reg64 }
-bextr, 0xf7, BMI, Modrm|CheckOperandSize|Vex128|Space0F38|VexVVVV|SwapSources|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64, Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }
-blsi, 0xf3/3, BMI, Modrm|CheckOperandSize|Vex128|Space0F38|VexVVVV|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }
-blsmsk, 0xf3/2, BMI, Modrm|CheckOperandSize|Vex128|Space0F38|VexVVVV|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }
-blsr, 0xf3/1, BMI, Modrm|CheckOperandSize|Vex128|Space0F38|VexVVVV|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }
+andn, 0xf2, BMI&(BMI|APX_F), Modrm|CheckOperandSize|Vex128|EVex128|Space0F38|VexVVVV|No_bSuf|No_wSuf|No_sSuf|NF, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64, Reg32|Reg64 }
+bextr, 0xf7, BMI&(BMI|APX_F), Modrm|CheckOperandSize|Vex128|EVex128|Space0F38|VexVVVV|SwapSources|No_bSuf|No_wSuf|No_sSuf|NF, { Reg32|Reg64, Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }
+blsi, 0xf3/3, BMI&(BMI|APX_F), Modrm|CheckOperandSize|Vex128|EVex128|Space0F38|VexVVVV|No_bSuf|No_wSuf|No_sSuf|NF, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }
+blsmsk, 0xf3/2, BMI&(BMI|APX_F), Modrm|CheckOperandSize|Vex128|EVex128|Space0F38|VexVVVV|No_bSuf|No_wSuf|No_sSuf|NF, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }
+blsr, 0xf3/1, BMI&(BMI|APX_F), Modrm|CheckOperandSize|Vex128|EVex128|Space0F38|VexVVVV|No_bSuf|No_wSuf|No_sSuf|NF, { Reg32|Reg64|Unspecified|BaseIndex, Reg32|Reg64 }
 tzcnt, 0xf30fbc, BMI, Modrm|CheckOperandSize|No_bSuf|No_sSuf, { Reg16|Reg32|Reg64|Word|Dword|Qword|Unspecified|BaseIndex, Reg16|Reg32|Reg64 }
 
 // TBM instructions
@@ -2047,13 +2052,20 @@  bndldx, 0x0f1a, MPX, Modrm|Anysize|IgnoreSize|NoSuf, { BaseIndex, RegBND }
 
 // SHA instructions.
 sha1rnds4, 0xf3acc, SHA, Modrm|NoSuf, { Imm8|Imm8S, RegXMM|Unspecified|BaseIndex, RegXMM }
+sha1rnds4, 0xd4, SHA&APX_F, Modrm|NoSuf|EVex128|EVexMap4, { Imm8|Imm8S, RegXMM|Unspecified|BaseIndex, RegXMM }
 sha1nexte, 0xf38c8, SHA, Modrm|NoSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
+sha1nexte, 0xd8, SHA&APX_F, Modrm|NoSuf|EVex128|EVexMap4, { RegXMM|Unspecified|BaseIndex, RegXMM }
 sha1msg1, 0xf38c9, SHA, Modrm|NoSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
+sha1msg1, 0xd9, SHA&APX_F, Modrm|NoSuf|EVex128|EVexMap4, { RegXMM|Unspecified|BaseIndex, RegXMM }
 sha1msg2, 0xf38ca, SHA, Modrm|NoSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
+sha1msg2, 0xda, SHA&APX_F, Modrm|NoSuf|EVex128|EVexMap4, { RegXMM|Unspecified|BaseIndex, RegXMM }
 sha256rnds2, 0xf38cb, SHA, Modrm|NoSuf, { Acc|Xmmword, RegXMM|Unspecified|BaseIndex, RegXMM }
 sha256rnds2, 0xf38cb, SHA, Modrm|NoSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
+sha256rnds2, 0xdb, SHA&APX_F, Modrm|NoSuf|EVex128|EVexMap4, { RegXMM|Unspecified|BaseIndex, RegXMM }
 sha256msg1, 0xf38cc, SHA, Modrm|NoSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
+sha256msg1, 0xdc, SHA&APX_F, Modrm|NoSuf|EVex128|EVexMap4, { RegXMM|Unspecified|BaseIndex, RegXMM }
 sha256msg2, 0xf38cd, SHA, Modrm|NoSuf, { RegXMM|Unspecified|BaseIndex, RegXMM }
+sha256msg2, 0xdd, SHA&APX_F, Modrm|NoSuf|EVex128|EVexMap4, { RegXMM|Unspecified|BaseIndex, RegXMM }
 
 // SHA512 instructions.
 
@@ -2112,9 +2124,9 @@  kor<bw>, 0x<bw:kpfx>45, <bw:kcpu>, Modrm|Vex256|Space0F|VexVVVV|VexW0|NoSuf, { R
 kxnor<bw>, 0x<bw:kpfx>46, <bw:kcpu>, Modrm|Vex256|Space0F|VexVVVV|VexW0|NoSuf, { RegMask, RegMask, RegMask }
 kxor<bw>, 0x<bw:kpfx>47, <bw:kcpu>, Modrm|Vex256|Space0F|VexVVVV|VexW0|NoSuf, { RegMask, RegMask, RegMask }
 
-kmov<bw>, 0x<bw:kpfx>90, <bw:kcpu>, Modrm|Vex128|Space0F|VexW0|NoSuf, { RegMask|<bw:elem>|Unspecified|BaseIndex, RegMask }
-kmov<bw>, 0x<bw:kpfx>91, <bw:kcpu>, Modrm|Vex128|Space0F|VexW0|NoSuf, { RegMask, <bw:elem>|Unspecified|BaseIndex }
-kmov<bw>, 0x<bw:kpfx>92, <bw:kcpu>, D|Modrm|Vex128|Space0F|VexW0|NoSuf, { Reg32, RegMask }
+kmov<bw>, 0x<bw:kpfx>90, <bw:kcpu>&(<bw:kcpu>|APX_F), Modrm|Vex128|EVex128|Space0F|VexW0|NoSuf, { RegMask|<bw:elem>|Unspecified|BaseIndex, RegMask }
+kmov<bw>, 0x<bw:kpfx>91, <bw:kcpu>&(<bw:kcpu>|APX_F), Modrm|Vex128|EVex128|Space0F|VexW0|NoSuf, { RegMask, <bw:elem>|Unspecified|BaseIndex }
+kmov<bw>, 0x<bw:kpfx>92, <bw:kcpu>&(<bw:kcpu>|APX_F), D|Modrm|Vex128|EVex128|Space0F|VexW0|NoSuf, { Reg32, RegMask }
 
 knot<bw>, 0x<bw:kpfx>44, <bw:kcpu>, Modrm|Vex128|Space0F|VexW0|NoSuf, { RegMask, RegMask }
 kortest<bw>, 0x<bw:kpfx>98, <bw:kcpu>, Modrm|Vex128|Space0F|VexW0|NoSuf, { RegMask, RegMask }
@@ -2589,9 +2601,9 @@  vpmovzxdq, 0x6635, AVX512VL, Modrm|EVex=3|Masking|Space0F38|VexW=1|Disp8MemShift
 kadd<dq>, 0x<dq:kpfx>4a, AVX512BW, Modrm|Vex256|Space0F|VexVVVV|VexW1|<dq:kvsz>|NoSuf, { RegMask, RegMask, RegMask }
 kand<dq>, 0x<dq:kpfx>41, AVX512BW, Modrm|Vex256|Space0F|VexVVVV|VexW1|<dq:kvsz>|NoSuf, { RegMask, RegMask, RegMask }
 kandn<dq>, 0x<dq:kpfx>42, AVX512BW, Modrm|Vex256|Space0F|VexVVVV|VexW1|<dq:kvsz>|NoSuf|Optimize, { RegMask, RegMask, RegMask }
-kmov<dq>, 0x<dq:kpfx>90, AVX512BW, Modrm|Vex128|Space0F|VexW1|<dq:kvsz>|NoSuf, { RegMask|<dq:elem>|Unspecified|BaseIndex, RegMask }
-kmov<dq>, 0x<dq:kpfx>91, AVX512BW, Modrm|Vex128|Space0F|VexW1|<dq:kvsz>|NoSuf, { RegMask, <dq:elem>|Unspecified|BaseIndex }
-kmov<dq>, 0xf292, AVX512BW, D|Modrm|Vex128|Space0F|<dq:vexw64>|<dq:kvsz>|NoSuf, { <dq:gpr>, RegMask }
+kmov<dq>, 0x<dq:kpfx>90, AVX512BW&(AVX512BW|APX_F), Modrm|Vex128|EVex128|Space0F|VexW1|<dq:kvsz>|NoSuf, { RegMask|<dq:elem>|Unspecified|BaseIndex, RegMask }
+kmov<dq>, 0x<dq:kpfx>91, AVX512BW&(AVX512BW|APX_F), Modrm|Vex128|EVex128|Space0F|VexW1|<dq:kvsz>|NoSuf, { RegMask, <dq:elem>|Unspecified|BaseIndex }
+kmov<dq>, 0xf292, AVX512BW&(AVX512BW|APX_F), D|Modrm|Vex128|EVex128|Space0F|<dq:vexw64>|<dq:kvsz>|NoSuf, { <dq:gpr>, RegMask }
 knot<dq>, 0x<dq:kpfx>44, AVX512BW, Modrm|Vex128|Space0F|VexW1|<dq:kvsz>|NoSuf, { RegMask, RegMask }
 kor<dq>, 0x<dq:kpfx>45, AVX512BW, Modrm|Vex256|Space0F|VexVVVV|VexW1|<dq:kvsz>|NoSuf, { RegMask, RegMask, RegMask }
 kortest<dq>, 0x<dq:kpfx>98, AVX512BW, Modrm|Vex128|Space0F|VexW1|<dq:kvsz>|NoSuf, { RegMask, RegMask }
@@ -2990,9 +3002,13 @@  rdsspq, 0xf30f1e/1, SHSTK&x64, Modrm|NoSuf, { Reg64 }
 saveprevssp, 0xf30f01ea, SHSTK, NoSuf, {}
 rstorssp, 0xf30f01/5, SHSTK, Modrm|NoSuf, { Qword|Unspecified|BaseIndex }
 wrssd, 0x0f38f6, SHSTK, Modrm|IgnoreSize|NoSuf, { Reg32, Dword|Unspecified|BaseIndex }
+wrssd, 0x66, SHSTK&APX_F, Modrm|IgnoreSize|NoSuf|EVex128|EVexMap4, { Reg32, Dword|Unspecified|BaseIndex }
 wrssq, 0x0f38f6, SHSTK&x64, Modrm|NoSuf|Size64, { Reg64, Qword|Unspecified|BaseIndex }
+wrssq, 0x66, SHSTK&APX_F, Modrm|NoSuf|Size64|EVex128|EVexMap4, { Reg64, Qword|Unspecified|BaseIndex }
 wrussd, 0x660f38f5, SHSTK, Modrm|IgnoreSize|NoSuf, { Reg32, Dword|Unspecified|BaseIndex }
+wrussd, 0x6665, SHSTK&APX_F, Modrm|IgnoreSize|NoSuf|EVex128|EVexMap4, { Reg32, Dword|Unspecified|BaseIndex }
 wrussq, 0x660f38f5, SHSTK&x64, Modrm|NoSuf, { Reg64, Qword|Unspecified|BaseIndex }
+wrussq, 0x6665, SHSTK&APX_F, Modrm|NoSuf|EVex128|EVexMap4, { Reg64, Qword|Unspecified|BaseIndex }
 setssbsy, 0xf30f01e8, SHSTK, NoSuf, {}
 clrssbsy, 0xf30fae/6, SHSTK, Modrm|NoSuf, { Qword|Unspecified|BaseIndex }
 endbr64, 0xf30f1efa, IBT, NoSuf, {}
@@ -3040,7 +3056,9 @@  cldemote, 0x0f1c/0, CLDEMOTE, Modrm|Anysize|IgnoreSize|NoSuf, { BaseIndex }
 // MOVDIR[I,64B] instructions.
 
 movdiri, 0xf38f9, MOVDIRI, Modrm|CheckOperandSize|IgnoreSize|No_bSuf|No_wSuf|No_sSuf, { Reg32|Reg64, Dword|Qword|Unspecified|BaseIndex }
+movdiri, 0xf9, MOVDIRI&APX_F, Modrm|CheckOperandSize|IgnoreSize|No_bSuf|No_wSuf|No_sSuf|EVex128|EVexMap4, { Reg32|Reg64, Dword|Qword|Unspecified|BaseIndex }
 movdir64b, 0x660f38f8, MOVDIR64B, Modrm|AddrPrefixOpReg|NoSuf, { Unspecified|BaseIndex, Reg16|Reg32|Reg64 }
+movdir64b, 0x66f8, MOVDIR64B&APX_F, Modrm|AddrPrefixOpReg|NoSuf|EVex128|EVexMap4, { Unspecified|BaseIndex, Reg32|Reg64 }
 
 // MOVEDIR instructions end.
 
@@ -3069,7 +3087,9 @@  vcvtneps2bf16<Vxy>, 0xf372, AVX_NE_CONVERT, Modrm|<Vxy:vex>|Space0F38|VexW0|NoSu
 // ENQCMD instructions.
 
 enqcmd, 0xf20f38f8, ENQCMD, Modrm|AddrPrefixOpReg|NoSuf, { Unspecified|BaseIndex, Reg16|Reg32|Reg64 }
+enqcmd, 0xf2f8, ENQCMD&(ENQCMD|APX_F), Modrm|AddrPrefixOpReg|NoSuf|EVex128|EVexMap4, { Unspecified|BaseIndex, Reg32|Reg64 }
 enqcmds, 0xf30f38f8, ENQCMD, Modrm|AddrPrefixOpReg|NoSuf, { Unspecified|BaseIndex, Reg16|Reg32|Reg64 }
+enqcmds, 0xf3f8, ENQCMD&(ENQCMD|APX_F), Modrm|AddrPrefixOpReg|NoSuf|EVex128|EVexMap4, { Unspecified|BaseIndex, Reg32|Reg64 }
 
 // ENQCMD instructions end.
 
@@ -3130,8 +3150,8 @@  xresldtrk, 0xf20f01e9, TSXLDTRK, NoSuf, {}
 
 // AMX instructions.
 
-ldtilecfg, 0x49/0, AMX_TILE, Modrm|Vex128|Space0F38|VexW0|NoSuf, { Unspecified|BaseIndex }
-sttilecfg, 0x6649/0, AMX_TILE, Modrm|Vex128|Space0F38|VexW0|NoSuf, { Unspecified|BaseIndex }
+ldtilecfg, 0x49/0, AMX_TILE&(AMX_TILE|APX_F), Modrm|Vex128|EVex128|Space0F38|VexW0|NoSuf, { Unspecified|BaseIndex }
+sttilecfg, 0x6649/0, AMX_TILE&(AMX_TILE|APX_F), Modrm|Vex128|EVex128|Space0F38|VexW0|NoSuf, { Unspecified|BaseIndex }
 
 tcmmimfp16ps, 0x666c, AMX_COMPLEX, Modrm|Vex128|Space0F38|VexVVVV|VexW0|SwapSources|NoSuf, { RegTMM, RegTMM, RegTMM }
 tcmmrlfp16ps, 0x6c, AMX_COMPLEX, Modrm|Vex128|Space0F38|VexVVVV|VexW0|SwapSources|NoSuf, { RegTMM, RegTMM, RegTMM }
@@ -3143,9 +3163,9 @@  tdpbuud, 0x5e, AMX_INT8, Modrm|Vex128|Space0F38|VexVVVV|VexW0|SwapSources|NoSuf,
 tdpbusd, 0x665e, AMX_INT8, Modrm|Vex128|Space0F38|VexVVVV|VexW0|SwapSources|NoSuf, { RegTMM, RegTMM, RegTMM }
 tdpbsud, 0xf35e, AMX_INT8, Modrm|Vex128|Space0F38|VexVVVV|VexW0|SwapSources|NoSuf, { RegTMM, RegTMM, RegTMM }
 
-tileloadd, 0xf24b, AMX_TILE, Sibmem|Vex128|Space0F38|VexW0|NoSuf, { Unspecified|BaseIndex, RegTMM }
-tileloaddt1, 0x664b, AMX_TILE, Sibmem|Vex128|Space0F38|VexW0|NoSuf, { Unspecified|BaseIndex, RegTMM }
-tilestored, 0xf34b, AMX_TILE, Sibmem|Vex128|Space0F38|VexW0|NoSuf, { RegTMM, Unspecified|BaseIndex }
+tileloadd, 0xf24b, AMX_TILE&(AMX_TILE|APX_F), Sibmem|Vex128|EVex128|Space0F38|VexW0|NoSuf, { Unspecified|BaseIndex, RegTMM }
+tileloaddt1, 0x664b, AMX_TILE&(AMX_TILE|APX_F), Sibmem|Vex128|EVex128|Space0F38|VexW0|NoSuf, { Unspecified|BaseIndex, RegTMM }
+tilestored, 0xf34b, AMX_TILE&(AMX_TILE|APX_F), Sibmem|Vex128|EVex128|Space0F38|VexW0|NoSuf, { RegTMM, Unspecified|BaseIndex }
 
 tilerelease, 0x49c0, AMX_TILE, Vex128|Space0F38|VexW0|NoSuf, {}
 
@@ -3157,15 +3177,25 @@  tilezero, 0xf249, AMX_TILE, Modrm|Vex128|Space0F38|VexW0|NoSuf, { RegTMM }
 
 loadiwkey, 0xf30f38dc, KL, Load|Modrm|NoSuf, { RegXMM, RegXMM }
 encodekey128, 0xf30f38fa, KL, Modrm|NoSuf, { Reg32, Reg32 }
+encodekey128, 0xf3da, KL&APX_F, Modrm|NoSuf|EVex128|EVexMap4, { Reg32, Reg32 }
 encodekey256, 0xf30f38fb, KL, Modrm|NoSuf, { Reg32, Reg32 }
+encodekey256, 0xf3db, KL&APX_F, Modrm|NoSuf|EVex128|EVexMap4, { Reg32, Reg32 }
 aesenc128kl, 0xf30f38dc, KL, Modrm|NoSuf, { Unspecified|BaseIndex, RegXMM }
+aesenc128kl, 0xf3dc, KL&APX_F, Modrm|NoSuf|EVex128|EVexMap4, { Unspecified|BaseIndex, RegXMM }
 aesdec128kl, 0xf30f38dd, KL, Modrm|NoSuf, { Unspecified|BaseIndex, RegXMM }
+aesdec128kl, 0xf3dd, KL&APX_F, Modrm|NoSuf|EVex128|EVexMap4, { Unspecified|BaseIndex, RegXMM }
 aesenc256kl, 0xf30f38de, KL, Modrm|NoSuf, { Unspecified|BaseIndex, RegXMM }
+aesenc256kl, 0xf3de, KL&APX_F, Modrm|NoSuf|EVex128|EVexMap4, { Unspecified|BaseIndex, RegXMM }
 aesdec256kl, 0xf30f38df, KL, Modrm|NoSuf, { Unspecified|BaseIndex, RegXMM }
+aesdec256kl, 0xf3df, KL&APX_F, Modrm|NoSuf|EVex128|EVexMap4, { Unspecified|BaseIndex, RegXMM }
 aesencwide128kl, 0xf30f38d8/0, WideKL, Modrm|NoSuf, { Unspecified|BaseIndex }
+aesencwide128kl, 0xf3d8/0, WideKL&APX_F, Modrm|NoSuf|EVex128|EVexMap4, { Unspecified|BaseIndex }
 aesdecwide128kl, 0xf30f38d8/1, WideKL, Modrm|NoSuf, { Unspecified|BaseIndex }
+aesdecwide128kl, 0xf3d8/1, WideKL&APX_F, Modrm|NoSuf|EVex128|EVexMap4, { Unspecified|BaseIndex }
 aesencwide256kl, 0xf30f38d8/2, WideKL, Modrm|NoSuf, { Unspecified|BaseIndex }
+aesencwide256kl, 0xf3d8/2, WideKL&APX_F, Modrm|NoSuf|EVex128|EVexMap4, { Unspecified|BaseIndex }
 aesdecwide256kl, 0xf30f38d8/3, WideKL, Modrm|NoSuf, { Unspecified|BaseIndex }
+aesdecwide256kl, 0xf3d8/3, WideKL&APX_F, Modrm|NoSuf|EVex128|EVexMap4, { Unspecified|BaseIndex }
 
 // KEYLOCKER instructions end.
 
@@ -3313,7 +3343,7 @@  prefetchit1, 0xf18/6, PREFETCHI, Modrm|Anysize|IgnoreSize|NoSuf, { BaseIndex }
 
 // CMPCCXADD instructions.
 
-cmp<cc>xadd, 0x66e<cc:opc>, CMPCCXADD, Modrm|Vex|Space0F38|VexVVVV|SwapSources|CheckOperandSize|NoSuf, { Reg32|Reg64, Reg32|Reg64, Dword|Qword|Unspecified|BaseIndex }
+cmp<cc>xadd, 0x66e<cc:opc>, CMPCCXADD&(CMPCCXADD|APX_F), Modrm|Vex|EVex128|Space0F38|VexVVVV|SwapSources|CheckOperandSize|NoSuf, { Reg32|Reg64, Reg32|Reg64, Dword|Qword|Unspecified|BaseIndex }
 
 // CMPCCXADD instructions end.
 
@@ -3333,9 +3363,13 @@  wrmsrlist, 0xf30f01c6, MSRLIST, NoSuf, {}
 // RAO-INT instructions.
 
 aadd, 0xf38fc, RAO_INT, Modrm|IgnoreSize|CheckOperandSize|NoSuf, { Reg32|Reg64, Dword|Qword|Unspecified|BaseIndex }
+aadd, 0xfc, RAO_INT&APX_F, Modrm|IgnoreSize|CheckOperandSize|NoSuf|EVex128|EVexMap4, { Reg32|Reg64, Dword|Qword|Unspecified|BaseIndex }
 aand, 0x660f38fc, RAO_INT, Modrm|IgnoreSize|CheckOperandSize|NoSuf, { Reg32|Reg64, Dword|Qword|Unspecified|BaseIndex }
+aand, 0x66fc, RAO_INT&APX_F, Modrm|IgnoreSize|CheckOperandSize|NoSuf|EVex128|EVexMap4, { Reg32|Reg64, Dword|Qword|Unspecified|BaseIndex }
 aor, 0xf20f38fc, RAO_INT, Modrm|IgnoreSize|CheckOperandSize|NoSuf, { Reg32|Reg64, Dword|Qword|Unspecified|BaseIndex }
+aor, 0xf2fc, RAO_INT&APX_F, Modrm|IgnoreSize|CheckOperandSize|NoSuf|EVex128|EVexMap4, { Reg32|Reg64, Dword|Qword|Unspecified|BaseIndex }
 axor, 0xf30f38fc, RAO_INT, Modrm|IgnoreSize|CheckOperandSize|NoSuf, { Reg32|Reg64, Dword|Qword|Unspecified|BaseIndex }
+axor, 0xf3fc, RAO_INT&APX_F, Modrm|IgnoreSize|CheckOperandSize|NoSuf|EVex128|EVexMap4, { Reg32|Reg64, Dword|Qword|Unspecified|BaseIndex }
 
 // RAO-INT instructions end.