[v3,2/9] Support APX GPR32 with rex2 prefix

Message ID 20231124070213.3886483-2-lili.cui@intel.com
State Unresolved
Series [1/9] Make const_1_mode print $1 in AT&T syntax

Checks

Context: snail/binutils-gdb-check
Check: warning
Description: Git am fail log

Commit Message

Cui, Lili Nov. 24, 2023, 7:02 a.m. UTC
  APX uses the REX2 prefix to support EGPR for map0 and map1 of legacy
instructions. We added the NoEgpr flag in i386-gen.c for instructions
that do not support EGPR.

We print the pseudo prefix {rex2} for instructions that are ambiguous,
unlike REX.

gas/ChangeLog:

2023-11-21  Lingling Kong <lingling.kong@intel.com>
	    H.J. Lu  <hongjiu.lu@intel.com>
	    Lili Cui <lili.cui@intel.com>
	    Lin Hu   <lin1.hu@intel.com>

	* config/tc-i386.c
	(enum i386_error): Add register_type_of_address_mismatch
	and invalid_pseudo_prefix.
	(struct _i386_insn): Add rex2 rex-byte and rex2_encoding for
	gpr32 r16-r31.
	(is_cpu): Add apx_f.
	(register_number): Handle RegRex2 for gpr32.
	(is_apx_rex2_encoding): New func. Test rex2 prefix encoding.
	(build_rex2_prefix): New func. Build the rex2 prefix for legacy
	insns in opcode map 0/1 that use gpr32.
	(optimize_encoding): Handle r16-r31 registers.
	(md_assemble): Handle apx encoding.
	(parse_insn): Handle Prefix_REX2.
	(check_EgprOperands): New func. Check if EGPR operands
	are valid for the instruction.
	(match_template): Handle EGPR operands check.
	(set_rex_rex2): New func. Set i.rex and i.rex2.
	(build_modrm_byte): Ditto.
	(output_insn): Handle rex2 2-byte prefix output.
	(check_register): Treat EGPRs as illegal without target apx,
	outside 64-bit mode, or with rex_prefix.
	* doc/c-i386.texi: Document .apx.
	* testsuite/gas/i386/ilp32/x86-64-opcode-inval-intel.d: D5 valid
	in 64-bit mode.
	* testsuite/gas/i386/ilp32/x86-64-opcode-inval.d: Ditto.
	* testsuite/gas/i386/x86-64-inval-pseudo.l: Add rex2 invalid testcase.
	* testsuite/gas/i386/x86-64-inval-pseudo.s:  Ditto.
	* testsuite/gas/i386/x86-64-opcode-inval-intel.d: Ditto.
	* testsuite/gas/i386/x86-64-opcode-inval.d: Ditto.
	* testsuite/gas/i386/x86-64-pseudos-bad.l: Add illegal rex2 test.
	* testsuite/gas/i386/x86-64-pseudos-bad.s: Ditto.
	* testsuite/gas/i386/x86-64-pseudos.d: Add rex2 test.
	* testsuite/gas/i386/x86-64-pseudos.s: Ditto.
	* testsuite/gas/i386/x86-64.exp: Run APX tests.
	* testsuite/gas/i386/x86-64-apx-egpr-inval.l: New test.
	* testsuite/gas/i386/x86-64-apx-egpr-inval.s: New test.
	* testsuite/gas/i386/x86-64-apx-rex2.d: New test.
	* testsuite/gas/i386/x86-64-apx-rex2.s: New test.

include/ChangeLog:

	* opcode/i386.h (REX2_OPCODE): Add REX2_OPCODE.

opcodes/ChangeLog:

	* i386-dis.c (struct instr_info): Add erex for gpr32.
	Add last_erex_prefix for rex2 prefix.
	(USED_REX2): Extend for gpr32.
	(REX2_M): Ditto.
	(PREFIX_REX2): Ditto.
	(ILLEGAL_PREFIX_REX2): Ditto.
	(ckprefix): Ditto.
	(prefix_name): Ditto.
	(print_insn): Ditto.
	(print_register): Ditto.
	(OP_E_memory): Ditto.
	(OP_REG): Ditto.
	(OP_EX): Ditto.
	* i386-gen.c (rex2_disallowed): New func. Some instructions do not allow the rex2 prefix.
	(process_i386_opcode_modifier): Set NoEgpr for VEX and some special instructions.
	(output_i386_opcode): Handle if_entry_needs_special_handle.
	* i386-init.h : Regenerated.
	* i386-mnem.h : Regenerated.
	* i386-opc.h (enum i386_cpu): Add CpuAPX_F.
	(Prefix_NoOptimize): Ditto.
	(Prefix_REX2): Ditto.
	(RegRex2): Ditto.
	* i386-opc.tbl: Add rex2 prefix.
	* i386-reg.tbl: Add egprs (r16-r31).
	* i386-tbl.h: Regenerated.
---
 gas/config/tc-i386.c                          | 164 +++++++++--
 gas/doc/c-i386.texi                           |   6 +-
 .../i386/ilp32/x86-64-opcode-inval-intel.d    |  47 +---
 .../gas/i386/ilp32/x86-64-opcode-inval.d      |  47 +---
 .../gas/i386/x86-64-apx-egpr-inval.l          |  15 +
 .../gas/i386/x86-64-apx-egpr-inval.s          |  18 ++
 gas/testsuite/gas/i386/x86-64-apx-rex2.d      |  83 ++++++
 gas/testsuite/gas/i386/x86-64-apx-rex2.s      |  86 ++++++
 gas/testsuite/gas/i386/x86-64-inval-pseudo.l  |   6 +
 gas/testsuite/gas/i386/x86-64-inval-pseudo.s  |   4 +
 .../gas/i386/x86-64-opcode-inval-intel.d      |  26 +-
 gas/testsuite/gas/i386/x86-64-opcode-inval.d  |  26 +-
 gas/testsuite/gas/i386/x86-64-opcode-inval.s  |   4 -
 gas/testsuite/gas/i386/x86-64-pseudos-bad.l   |  59 +++-
 gas/testsuite/gas/i386/x86-64-pseudos-bad.s   |  58 ++++
 gas/testsuite/gas/i386/x86-64-pseudos.d       |  21 ++
 gas/testsuite/gas/i386/x86-64-pseudos.s       |  22 ++
 gas/testsuite/gas/i386/x86-64.exp             |   2 +
 include/opcode/i386.h                         |   2 +
 opcodes/i386-dis.c                            | 262 ++++++++++++------
 opcodes/i386-gen.c                            |  55 +++-
 opcodes/i386-opc.h                            |  13 +-
 opcodes/i386-opc.tbl                          |  28 +-
 opcodes/i386-reg.tbl                          |  64 +++++
 24 files changed, 856 insertions(+), 262 deletions(-)
 create mode 100644 gas/testsuite/gas/i386/x86-64-apx-egpr-inval.l
 create mode 100644 gas/testsuite/gas/i386/x86-64-apx-egpr-inval.s
 create mode 100644 gas/testsuite/gas/i386/x86-64-apx-rex2.d
 create mode 100644 gas/testsuite/gas/i386/x86-64-apx-rex2.s
  

Comments

Jan Beulich Dec. 4, 2023, 4:30 p.m. UTC | #1
On 24.11.2023 08:02, Cui, Lili wrote:
> @@ -3865,6 +3873,12 @@ is_any_vex_encoding (const insn_template *t)
>    return t->opcode_modifier.vex || t->opcode_modifier.evex;
>  }
>  
> +static INLINE bool
> +is_apx_rex2_encoding (void)
> +{
> +  return i.rex2 || i.rex2_encoding;
> +}

This function is used just once. Do we really need it? Or else why
don't you use it near the end of md_assemble()?

> @@ -4120,6 +4134,21 @@ build_evex_prefix (void)
>      i.vex.bytes[3] |= i.mask.reg->reg_num;
>  }
>  
> +/* Build (2 bytes) rex2 prefix.
> +   | D5h |
> +   | m | R4 X4 B4 | W R X B |
> +*/
> +static void
> +build_rex2_prefix (void)
> +{
> +  /* Rex2 reuses i.vex because they handle i.tm.opcode_space the same.  */

How do they handle it the same? (Also I don't think this is useful as
a code comment; it instead belongs in the description imo.)
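
As a minimal sketch of the two-byte layout described in the quoted comment (D5h followed by "m | R4 X4 B4 | W R X B"), the payload byte can be assembled as below. The function name rex2_prefix and the BIT_* constants are hypothetical names for this sketch, not gas identifiers; the sketch assumes the map value is 0 or 1 and that the high bits use the same B/X/R order as a classic REX byte.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bit names for this sketch; they follow the REX bit order.  */
enum { BIT_B = 1, BIT_X = 2, BIT_R = 4, BIT_W = 8 };

/* Fill out[0..1] with a REX2 prefix.
   map:  0 for legacy map 0, 1 for legacy map 1 (the M bit).
   high: R4/X4/B4, given as BIT_R/BIT_X/BIT_B (bit 4 of the register number).
   low:  W/R/X/B, exactly as in a classic REX byte.  */
static void
rex2_prefix (unsigned int map, unsigned int high, unsigned int low,
             uint8_t out[2])
{
  assert (map <= 1);
  assert ((high & ~(unsigned int) (BIT_R | BIT_X | BIT_B)) == 0);
  assert ((low & ~0xfu) == 0);

  out[0] = 0xd5;
  out[1] = (uint8_t) ((map << 7) | (high << 4) | low);
}
```

With these assumptions, a map 0 instruction whose ModR/M.reg operand is r16 (R4 set, nothing else) gets the payload 0x40, which corresponds to the (i.tm.opcode_space << 7) | (i.rex2 << 4) | i.rex expression in the patch.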

> +  i.vex.length = 2;
> +  i.vex.bytes[0] = 0xd5;
> +  /* For the W R X B bits, the variables of rex prefix will be reused.  */
> +  i.vex.bytes[1] = ((i.tm.opcode_space << 7)
> +		    | (i.rex2 << 4) | i.rex);
> +}
> +
>  static void
>  process_immext (void)
>  {
> @@ -4385,12 +4414,16 @@ optimize_encoding (void)
>  	  i.suffix = 0;
>  	  /* Convert to byte registers.  */
>  	  if (i.types[1].bitfield.word)
> -	    j = 16;
> -	  else if (i.types[1].bitfield.dword)
> +	    /* There are 40 8-bit registers.  */
>  	    j = 32;
> +	  else if (i.types[1].bitfield.dword)
> +	    /* 32 8-bit registers + 32 16-bit registers.  */
> +	    j = 64;
>  	  else
> -	    j = 48;
> -	  if (!(i.op[1].regs->reg_flags & RegRex) && base_regnum < 4)
> +	    /* 32 8-bit registers + 32 16-bit registers
> +	       + 32 32-bit registers.  */
> +	    j = 96;
> +	  if (!(i.op[1].regs->reg_flags & (RegRex | RegRex2)) && base_regnum < 4)
>  	    j += 8;
>  	  i.op[1].regs -= j;
>  	}

I did comment on, in particular, the 8-bit register counts before.
Afaict the comments above are nevertheless unchanged and hence
still not really correct.
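
To make the counting question concrete, here is a rough sketch of the index arithmetic. It assumes a flat register table laid out as 8 legacy byte registers (al..bh), 8 low-byte aliases (axl..dil), 24 extended byte registers (r8b..r31b), and then 32 registers for each wider size. That layout, the function name byte_reg_index and the enum names are assumptions of this sketch, not taken from the patch.

```c
#include <assert.h>

/* Assumed table layout (indices into one flat register array):
     0..7     al cl dl bl ah ch dh bh      (legacy byte registers)
     8..15    axl cxl dxl bxl spl bpl sil dil
     16..39   r8b .. r31b
     40..71   16-bit, 72..103 32-bit, 104..135 64-bit registers.  */
enum { N_BYTE = 40, N_PER_SIZE = 32 };
enum reg_size { SIZE_WORD, SIZE_DWORD, SIZE_QWORD };

/* Map the table index of a 16/32/64-bit register to the index of the
   byte register with the same number.  */
static int
byte_reg_index (int index, enum reg_size size)
{
  int first = N_BYTE + (int) size * N_PER_SIZE;
  int regnum = index - first;   /* full register number, 0..31 */
  int j;

  assert (regnum >= 0 && regnum < N_PER_SIZE);

  /* Step back 32, 64 or 96 entries; this lands in the 8..39 part of
     the byte block, i.e. it skips the 8 legacy entries.  */
  j = ((int) size + 1) * N_PER_SIZE;

  /* Registers 0..3 step back 8 more to reach al/cl/dl/bl.  In gas the
     stored register number holds only the low three bits, which is why
     the patch also tests RegRex/RegRex2; here regnum is the full
     number, so regnum < 4 is enough.  */
  if (regnum < 4)
    j += 8;

  return index - j;
}
```

Under this assumed layout the byte block holds 40 entries, while the offsets 32/64/96 count only the 32 entries that share a numbering with the wider registers, so a comment about the offsets has to distinguish the two numbers.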

> @@ -5576,6 +5615,13 @@ md_assemble (char *line)
>  	  return;
>  	}
>  
> +      /* Check for explicit REX2 prefix.  */
> +      if (i.rex2_encoding)
> +	{
> +	  as_bad (_("REX2 prefix invalid with `%s'"), insn_name (&i.tm));
> +	  return;
> +	}

Again I'm pretty sure I pointed out before that i.rex2_encoding reflects
use of {rex2}. Which then the error message should correctly refer to.

> @@ -5615,11 +5661,12 @@ md_assemble (char *line)
>  	  && (i.op[1].regs->reg_flags & RegRex64) != 0)
>        || (((i.types[0].bitfield.class == Reg && i.types[0].bitfield.byte)
>  	   || (i.types[1].bitfield.class == Reg && i.types[1].bitfield.byte))
> -	  && i.rex != 0))
> +	  && (i.rex != 0 || i.rex2 != 0)))
>      {
>        int x;
>  
> -      i.rex |= REX_OPCODE;
> +      if (!i.rex2)
> +	i.rex |= REX_OPCODE;
>        for (x = 0; x < 2; x++)
>  	{
>  	  /* Look for 8 bit operand that uses old registers.  */
> @@ -5630,7 +5677,7 @@ md_assemble (char *line)
>  	      /* In case it is "hi" register, give up.  */
>  	      if (i.op[x].regs->reg_num > 3)
>  		as_bad (_("can't encode register '%s%s' in an "
> -			  "instruction requiring REX prefix."),
> +			  "instruction requiring REX/REX2 prefix."),
>  			register_prefix, i.op[x].regs->reg_name);
>  
>  	      /* Otherwise it is equivalent to the extended register.
> @@ -5642,11 +5689,11 @@ md_assemble (char *line)
>  	}
>      }
>  
> -  if (i.rex == 0 && i.rex_encoding)
> +  if ((i.rex == 0 && i.rex_encoding) || (i.rex2 == 0 && i.rex2_encoding))

Doesn't this want to be

  if (i.rex == 0 && i.rex2 == 0 && (i.rex_encoding || i.rex2_encoding))

?
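
The difference between the posted and the suggested condition shows up when bits are already set for one prefix while the pseudo prefix of the other was given. A small sketch, with a hypothetical struct enc_state standing in for the relevant fields of the global i:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the fields of gas's global insn state.  */
struct enc_state
{
  unsigned int rex;       /* REX.WRXB bits collected so far */
  unsigned int rex2;      /* REX2 R4/X4/B4 bits collected so far */
  bool rex_encoding;      /* {rex} was given */
  bool rex2_encoding;     /* {rex2} was given */
};

/* Condition as posted in the patch.  */
static bool
cond_as_posted (const struct enc_state *s)
{
  assert (s != NULL);
  return (s->rex == 0 && s->rex_encoding)
         || (s->rex2 == 0 && s->rex2_encoding);
}

/* Condition as suggested in the review.  */
static bool
cond_as_suggested (const struct enc_state *s)
{
  assert (s != NULL);
  return s->rex == 0 && s->rex2 == 0
         && (s->rex_encoding || s->rex2_encoding);
}
```

For example, with rex = 4, rex2 = 0 and {rex2} given, the posted form is true while the suggested form is false; both agree when no bits are set at all.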

>      {
>        /* Check if we can add a REX_OPCODE byte.  Look for 8 bit operand
>  	 that uses legacy register.  If it is "hi" register, don't add
> -	 the REX_OPCODE byte.  */
> +	 rex and rex2 prefix.  */
>        int x;
>        for (x = 0; x < 2; x++)
>  	if (i.types[x].bitfield.class == Reg
> @@ -5656,6 +5703,7 @@ md_assemble (char *line)
>  	  {
>  	    gas_assert (!(i.op[x].regs->reg_flags & RegRex));
>  	    i.rex_encoding = false;
> +	    i.rex2_encoding = false;
>  	    break;
>  	  }
>  
> @@ -5663,7 +5711,13 @@ md_assemble (char *line)
>  	i.rex = REX_OPCODE;
>      }
>  
> -  if (i.rex != 0)
> +  if (i.rex2 != 0 || i.rex2_encoding)
> +    {
> +      build_rex2_prefix ();
> +      /* The individual REX.RXBW bits got consumed.  */
> +      i.rex &= REX_OPCODE;
> +    }
> +  else if (i.rex != 0)
>      add_prefix (REX_OPCODE | i.rex);
>  
>    insert_lfence_before ();
> @@ -5834,6 +5888,10 @@ parse_insn (const char *line, char *mnemonic, bool prefix_only)
>  		  /* {rex} */
>  		  i.rex_encoding = true;
>  		  break;
> +		case Prefix_REX2:
> +		  /* {rex2} */
> +		  i.rex2_encoding = true;
> +		  break;
>  		case Prefix_NoOptimize:
>  		  /* {nooptimize} */
>  		  i.no_optimize = true;
> @@ -6971,6 +7029,45 @@ VEX_check_encoding (const insn_template *t)
>    return 0;
>  }
>  
> +/* Check if Egprs operands are valid for the instruction.  */
> +
> +static int
> +check_EgprOperands (const insn_template *t)
> +{
> +  if (!t->opcode_modifier.noegpr)
> +    return 0;
> +
> +  for (unsigned int op = 0; op < i.operands; op++)
> +    {
> +      if (i.types[op].bitfield.class != Reg
> +	  /* Special case for (%dx) while doing input/output op */
> +	  || i.input_output_operand)

Didn't we agree that this extra condition isn't necessary, once the
producer site correctly updates all state (which was supposed to be
done in a small prereq patch)?

> @@ -7107,7 +7204,9 @@ match_template (char mnem_suffix)
>        /* Do not verify operands when there are none.  */
>        if (!t->operands)
>  	{
> -	  if (VEX_check_encoding (t))
> +	  /* When there are no operands, we still need to use the
> +	     check_EgprOperands function to check whether {rex2} is valid.  */
> +	  if (VEX_check_encoding (t) || check_EgprOperands (t))

As before imo either the function name wants changing (so it becomes
reasonable to use here, without the need for a comment explaining the
oddity), or you simply open-code the sole check that is needed here
(afaict: t->opcode_modifier.noegpr && i.rex2_encoding).

> @@ -7443,6 +7542,13 @@ match_template (char mnem_suffix)
>  	  continue;
>  	}
>  
> +      /* Check if EGRPS operands(r16-r31) are valid.  */

EGPR?

> --- a/gas/doc/c-i386.texi
> +++ b/gas/doc/c-i386.texi
> @@ -217,6 +217,7 @@ accept various extension mnemonics.  For example,
>  @code{avx10.1/256},
>  @code{avx10.1/128},
>  @code{user_msr},
> +@code{apx_f},
>  @code{amx_int8},
>  @code{amx_bf16},
>  @code{amx_fp16},
> @@ -983,6 +984,9 @@ Different encoding options can be specified via pseudo prefixes:
>  instructions (x86-64 only).  Note that this differs from the @samp{rex}
>  prefix which generates REX prefix unconditionally.
>  
> +@item
> +@samp{@{rex2@}} -- encode with REX2 prefix

This isn't in line with what's said for {rex}. Iirc we were in
agreement that we want both to behave consistently. In which case
documentation also needs to describe them consistently.

> --- /dev/null
> +++ b/gas/testsuite/gas/i386/x86-64-apx-rex2.s
> @@ -0,0 +1,86 @@
> +# Check 64bit instructions with rex2 prefix encoding
> +
> +	.allow_index_reg
> +	.text
> +_start:
> +         test	$0x7, %r24b
> +         test	$0x7, %r24d
> +         test	$0x7, %r24
> +         test	$0x7, %r24w
> +## REX2.M bit
> +         imull	%eax, %r15d
> +         imull	%eax, %r16d
> +         punpckldq (%r18), %mm2
> +## REX2.R4 bit
> +         leal	(%rax), %r16d
> +         leal	(%rax), %r17d
> +         leal	(%rax), %r18d
> +         leal	(%rax), %r19d
> +         leal	(%rax), %r20d
> +         leal	(%rax), %r21d
> +         leal	(%rax), %r22d
> +         leal	(%rax), %r23d
> +         leal	(%rax), %r24d
> +         leal	(%rax), %r25d
> +         leal	(%rax), %r26d
> +         leal	(%rax), %r27d
> +         leal	(%rax), %r28d
> +         leal	(%rax), %r29d
> +         leal	(%rax), %r30d
> +         leal	(%rax), %r31d
> +## REX2.X4 bit
> +         leal	(,%r16), %eax
> +         leal	(,%r17), %eax
> +         leal	(,%r18), %eax
> +         leal	(,%r19), %eax
> +         leal	(,%r20), %eax
> +         leal	(,%r21), %eax
> +         leal	(,%r22), %eax
> +         leal	(,%r23), %eax
> +         leal	(,%r24), %eax
> +         leal	(,%r25), %eax
> +         leal	(,%r26), %eax
> +         leal	(,%r27), %eax
> +         leal	(,%r28), %eax
> +         leal	(,%r29), %eax
> +         leal	(,%r30), %eax
> +         leal	(,%r31), %eax
> +## REX.B4 bit

Further up you properly say REX2. Here and below it's only REX?

> --- a/gas/testsuite/gas/i386/x86-64-pseudos-bad.s
> +++ b/gas/testsuite/gas/i386/x86-64-pseudos-bad.s
> @@ -5,3 +5,61 @@ pseudos:
>  	{rex} vmovaps %xmm7,%xmm2
>  	{rex} vmovaps %xmm17,%xmm2
>  	{rex} rorx $7,%eax,%ebx
> +	{rex2} vmovaps %xmm7,%xmm2
> +	{rex2} xsave (%rax)
> +	{rex2} xsaves (%ecx)
> +	{rex2} xsaves64 (%ecx)
> +	{rex2} xsavec (%ecx)
> +	{rex2} xrstors (%ecx)
> +	{rex2} xrstors64 (%ecx)
> +
> +	#All opcodes in the row 0xa* prefixed REX2 are illegal.
> +	#{rex2} test (0xa8) is a special case, it will remap to test (0xf6)
> +	{rex2} mov    0x90909090,%al
> +	{rex2} movabs 0x1,%al
> +	{rex2} cmpsb  %es:(%edi),%ds:(%esi)
> +	{rex2} lodsb
> +	{rex2} lods   %ds:(%esi),%al
> +	{rex2} lodsb   (%esi)
> +	{rex2} movs
> +	{rex2} movs   (%esi), (%edi)
> +	{rex2} scasl
> +	{rex2} scas   %es:(%edi),%eax
> +	{rex2} scasb   (%edi)
> +	{rex2} stosb
> +	{rex2} stosb   (%edi)
> +	{rex2} stos   %eax,%es:(%edi)
> +
> +	#All opcodes in the row 0x7* prefixed REX2 are illegal.

This also covers map 1 row 8, doesn't it?

> +	{rex2} jo     .+2-0x70
> +	{rex2} jno    .+2-0x70
> +	{rex2} jb     .+2-0x70
> +	{rex2} jae    .+2-0x70
> +	{rex2} je     .+2-0x70
> +	{rex2} jne    .+2-0x70
> +	{rex2} jbe    .+2-0x70
> +	{rex2} ja     .+2-0x70
> +	{rex2} js     .+2-0x70
> +	{rex2} jns    .+2-0x70
> +	{rex2} jp     .+2-0x70
> +	{rex2} jnp    .+2-0x70
> +	{rex2} jl     .+2-0x70
> +	{rex2} jge    .+2-0x70
> +	{rex2} jle    .+2-0x70
> +	{rex2} jg     .+2-0x70
> +
> +	#All opcodes in the row 0x7* prefixed REX2 are illegal.
> +	{rex2} in $0x90,%al
> +	{rex2} in $0x90
> +	{rex2} out $0x90,%al
> +	{rex2} out $0x90
> +	{rex2} jmp  *%eax
> +	{rex2} loop foo

Isn't this row 0xE?

> +	#All opcodes in the row 0xf3* prefixed REX2 are illegal.

This comment continues to be confusing: 0xf3 is a REP prefix. Perhaps
best to either say "map 1" and omit the "f" or at least write 0x0f3*
or slightly better 0x0f 0x3*.

> --- a/gas/testsuite/gas/i386/x86-64-pseudos.s
> +++ b/gas/testsuite/gas/i386/x86-64-pseudos.s
> @@ -360,6 +360,19 @@ _start:
>  	{rex} movaps (%r8),%xmm2
>  	{rex} phaddw (%rcx),%mm0
>  	{rex} phaddw (%r8),%mm0
> +	{rex2} mov %al,%ah
> +	{rex2} shl %cl, %eax
> +	{rex2} cmp %cl, %dl
> +	{rex2} mov $1, %bl
> +	{rex2} movl %eax,%ebx
> +	{rex2} movl %eax,%r14d
> +	{rex2} movl %eax,(%r8)
> +	{rex2} movaps %xmm7,%xmm2
> +	{rex2} movaps %xmm7,%xmm12
> +	{rex2} movaps (%rcx),%xmm2
> +	{rex2} movaps (%r8),%xmm2
> +	{rex2} pmullw %mm0,%mm6
> +
>  
>  	movb (%rbp),%al
>  	{disp8} movb (%rbp),%al

No double blank lines please.

> --- a/opcodes/i386-dis.c
> +++ b/opcodes/i386-dis.c

Disassembler comments (if any) in a separate (later) mail again.

> --- a/opcodes/i386-gen.c
> +++ b/opcodes/i386-gen.c
> @@ -275,6 +275,8 @@ static const dependency isa_dependencies[] =
>      "64" },
>    { "USER_MSR",
>      "64" },
> +  { "APX_F",
> +    "XSAVE|64" },
>  };
>  
>  /* This array is populated as process_i386_initializers() walks cpu_flags[].  */
> @@ -397,6 +399,7 @@ static bitfield cpu_flags[] =
>    BITFIELD (FRED),
>    BITFIELD (LKGS),
>    BITFIELD (USER_MSR),
> +  BITFIELD (APX_F),
>    BITFIELD (MWAITX),
>    BITFIELD (CLZERO),
>    BITFIELD (OSPKE),
> @@ -486,6 +489,7 @@ static bitfield opcode_modifiers[] =
>    BITFIELD (ATTSyntax),
>    BITFIELD (IntelSyntax),
>    BITFIELD (ISA64),
> +  BITFIELD (NoEgpr),
>  };
>  
>  #define CLASS(n) #n, n
> @@ -1072,10 +1076,48 @@ get_element_size (char **opnd, int lineno)
>    return elem_size;
>  }
>  
> +static bool
> +rex2_disallowed (const unsigned long long opcode, unsigned int space,
> +			       const char *cpu_flags)
> +{
> +  /* Prefixing XSAVE* and XRSTOR* instructions with REX2 triggers #UD.  */
> +  if (strcmp (cpu_flags, "XSAVES") >= 0
> +      || strcmp (cpu_flags, "XSAVEC") >= 0
> +      || strcmp (cpu_flags, "Xsave") >= 0
> +      || strcmp (cpu_flags, "Xsaveopt") >= 0)
> +    return true;

Wasn't this intended to be dropped, being redundant with the opcode table
attributes?

> +  /* All opcodes listed map0 0x4*, 0x7*, 0xa*, 0xe* and map1 0x3*, 0x8*
> +     are reserved under REX2 and triggers #UD when prefixed with REX2 */
> +  if (space == 0)
> +    switch (opcode >> 4)

Both here and ...

> +      {
> +      case 0x4:
> +      case 0x7:
> +      case 0xA:
> +      case 0xE:
> +	return true;
> +      default:
> +	return false;
> +    }
> +
> +  if (space == SPACE_0F)
> +    switch (opcode >> 4)

... here, don't you also need to mask off further bits? There are
quite a few opcodes which have a kind-of ModR/M byte encoded directly
in the opcode, for example.
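
One way to address this, sketched under the assumption that a multi-byte opcode value keeps its first opcode byte in the most significant position and that the opcode length in bytes is known, is to derive the row from the first byte only. The names opcode_row, rex2_reserved, MAP0 and MAP1 are hypothetical; the row lists are the ones from the quoted comment.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the two legacy opcode maps.  */
enum { MAP0 = 0, MAP1 = 1 };

/* High nibble of the first byte of a LENGTH-byte opcode value.  */
static unsigned int
opcode_row (unsigned long long opcode, unsigned int length)
{
  assert (length >= 1 && length <= 8);
  return (unsigned int) ((opcode >> (8 * (length - 1) + 4)) & 0xf);
}

/* True if the opcode sits in a row that is reserved under REX2:
   map 0 rows 0x4, 0x7, 0xa, 0xe and map 1 rows 0x3, 0x8.  */
static bool
rex2_reserved (unsigned int map, unsigned long long opcode,
               unsigned int length)
{
  unsigned int row = opcode_row (opcode, length);

  if (map == MAP0)
    return row == 0x4 || row == 0x7 || row == 0xa || row == 0xe;
  if (map == MAP1)
    return row == 0x3 || row == 0x8;
  return false;
}
```

With a plain "opcode >> 4", a two-byte value such as 0x01c8 yields 0x1c and can never match a row, whereas taking the first byte gives row 0x0.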

> +      {
> +      case 0x3:
> +      case 0x8:
> +	return true;
> +      default:
> +	return false;
> +      }
> +
> +  return false;
> +}
> +
>  static void
>  process_i386_opcode_modifier (FILE *table, char *mod, unsigned int space,
>  			      unsigned int prefix, const char *extension_opcode,
> -			      char **opnd, int lineno)
> +			      char **opnd, int lineno, bool rex2_disallowed)
>  {
>    char *str, *next, *last;
>    bitfield modifiers [ARRAY_SIZE (opcode_modifiers)];
> @@ -1202,6 +1244,12 @@ process_i386_opcode_modifier (FILE *table, char *mod, unsigned int space,
>  	  || modifiers[SAE].value))
>      modifiers[EVex].value = EVEXDYN;
>  
> +  /* Vex, legacy map2 and map3 and rex2_disallowed do not support EGPR.
> +     For template supports both Vex and EVex allowing EGPR.  */

"Templates supporting both Vex and EVex allow EGPR."

> +  if ((modifiers[Vex].value || space > SPACE_0F || rex2_disallowed)
> +      && !modifiers[EVex].value)
> +    modifiers[NoEgpr].value = 1;
> +
>    output_opcode_modifier (table, modifiers, ARRAY_SIZE (modifiers));
>  }
>  
> @@ -1425,8 +1473,11 @@ output_i386_opcode (FILE *table, const char *name, char *str,
>  	   ident, 2 * (int)length, opcode, end, i);
>    free (ident);
>  
> +  /* Add some specilal handle for current entry.  */
> +  bool  has_special_handle = rex2_disallowed (opcode, space, cpu_flags);

The local variable (if one is needed in the first place) wants naming as
usefully as the function now is named. Similarly the comment would want
improving along those lines.

>    process_i386_opcode_modifier (table, opcode_modifier, space, prefix,
> -				extension_opcode, operand_types, lineno);
> +				extension_opcode, operand_types, lineno,
> +				has_special_handle);
>  
>    process_i386_cpu_flag (table, cpu_flags, NULL, ",", "    ", lineno, CpuMax);
>  
> --- a/opcodes/i386-opc.tbl
> +++ b/opcodes/i386-opc.tbl
> @@ -138,6 +138,7 @@
>  #define Vsz256 Vsz=VSZ256
>  #define Vsz512 Vsz=VSZ512
>  
> +
>  // The EVEX purpose of StaticRounding appears only together with SAE. Re-use
>  // the bit to mark commutative VEX encodings where swapping the source
>  // operands may allow to switch from 3-byte to 2-byte VEX encoding.

Stray change (in general please avoid introducing double blank lines, as
those make patch context less useful).

Jan
  
Cui, Lili Dec. 5, 2023, 1:31 p.m. UTC | #2
> On 24.11.2023 08:02, Cui, Lili wrote:
> > @@ -3865,6 +3873,12 @@ is_any_vex_encoding (const insn_template *t)
> >    return t->opcode_modifier.vex || t->opcode_modifier.evex;
> >  }
> >
> > +static INLINE bool
> > +is_apx_rex2_encoding (void)
> > +{
> > +  return i.rex2 || i.rex2_encoding;
> > +}
> 
> This function is used just once. Do we really need it? Or else why don't you
> use it near the end of md_assemble()?
> 

Yes, I noticed this as well. The function is now also used in place of "(i.rex2 != 0 || i.rex2_encoding)" at the end of md_assemble().

> > @@ -4120,6 +4134,21 @@ build_evex_prefix (void)
> >      i.vex.bytes[3] |= i.mask.reg->reg_num;
> >  }
> >
> > +/* Build (2 bytes) rex2 prefix.
> > +   | D5h |
> > +   | m | R4 X4 B4 | W R X B |
> > +*/
> > +static void
> > +build_rex2_prefix (void)
> > +{
> > +  /* Rex2 reuses i.vex because they handle i.tm.opcode_space the same.  */
> 
> How do they handle it the same? (Also I don't think this is useful as a code
> comment; it instead belongs in the description imo.)
> 

Moved the comment to the function's description.

/* Build (2 bytes) rex2 prefix.
   | D5h |
   | m | R4 X4 B4 | W R X B |

   Rex2 reuses i.vex as they handle i.tm.opcode_space the same way.  */
static void
build_rex2_prefix (void)


In function output_insn(), the opcode space is handled like this:

      if (!i.vex.length)
        switch (i.tm.opcode_space)
          {
          case SPACE_BASE:
            break;
          case SPACE_0F:
            ++j;
            break;
          case SPACE_0F38:
          case SPACE_0F3A:
            j += 2;
            break;
          default:
            abort ();
          }
.....
         if (!i.vex.length
              && i.tm.opcode_space != SPACE_BASE)
            {
              *p++ = 0x0f;
              if (i.tm.opcode_space != SPACE_0F)
                *p++ = i.tm.opcode_space == SPACE_0F38
                       ? 0x38 : 0x3a;
            }
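
The effect of the quoted output_insn() logic can be sketched as follows: escape bytes are only emitted when no VEX-style prefix is in use, which is the case where the prefix itself carries the opcode map. The enum below is redefined locally for the sketch, and emit_escape and its prefix_encodes_space parameter are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Local copy of the space names used in the quoted code.  */
enum opcode_space { SPACE_BASE, SPACE_0F, SPACE_0F38, SPACE_0F3A };

/* Write the legacy escape bytes for SPACE into OUT (room for 2 bytes)
   and return how many were written.  Nothing is written when a prefix
   (VEX, EVEX, or REX2 via its M bit) already encodes the space.  */
static size_t
emit_escape (enum opcode_space space, bool prefix_encodes_space,
             uint8_t *out)
{
  size_t n = 0;

  assert (out != NULL);

  if (prefix_encodes_space || space == SPACE_BASE)
    return 0;

  out[n++] = 0x0f;
  if (space != SPACE_0F)
    out[n++] = space == SPACE_0F38 ? 0x38 : 0x3a;
  return n;
}
```

Since REX2 has a single M bit, this only works out for map 0 and map 1, which matches the patch marking legacy map 2 and map 3 templates as NoEgpr.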

> > +  i.vex.length = 2;
> > +  i.vex.bytes[0] = 0xd5;
> > +  /* For the W R X B bits, the variables of rex prefix will be reused.  */
> > +  i.vex.bytes[1] = ((i.tm.opcode_space << 7)
> > +		    | (i.rex2 << 4) | i.rex);
> > +}
> > +
> >  static void
> >  process_immext (void)
> >  {
> > @@ -4385,12 +4414,16 @@ optimize_encoding (void)
> >  	  i.suffix = 0;
> >  	  /* Convert to byte registers.  */
> >  	  if (i.types[1].bitfield.word)
> > -	    j = 16;
> > -	  else if (i.types[1].bitfield.dword)
> > +	    /* There are 40 8-bit registers.  */
> >  	    j = 32;
> > +	  else if (i.types[1].bitfield.dword)
> > +	    /* 32 8-bit registers + 32 16-bit registers.  */
> > +	    j = 64;
> >  	  else
> > -	    j = 48;
> > -	  if (!(i.op[1].regs->reg_flags & RegRex) && base_regnum < 4)
> > +	    /* 32 8-bit registers + 32 16-bit registers
> > +	       + 32 32-bit registers.  */
> > +	    j = 96;
> > +	  if (!(i.op[1].regs->reg_flags & (RegRex | RegRex2)) && base_regnum < 4)
> >  	    j += 8;
> >  	  i.op[1].regs -= j;
> >  	}
> 
> I did comment on, in particular, the 8-bit register counts before.
> Afaict the comments above are nevertheless unchanged and hence still not
> really correct.
> 

Changed to:

      if (flag_code == CODE_64BIT || base_regnum < 4)
        {
          i.types[1].bitfield.byte = 1;
          /* Ignore the suffix.  */
          i.suffix = 0;
          /* Convert to byte registers. 8-bit registers are special,
             RegRex64 and non-RegRex64 each have 8 registers.  */
          if (i.types[1].bitfield.word)
            /* 32 (or 40) 8-bit registers.  */
            j = 32;
          else if (i.types[1].bitfield.dword)
            /* 32 (or 40) 8-bit registers + 32 16-bit registers.  */
            j = 64;
          else
            /* 32 (or 40) 8-bit registers + 32 16-bit registers
               + 32 32-bit registers.  */
            j = 96;

          if (!(i.op[1].regs->reg_flags & (RegRex | RegRex2)) && base_regnum < 4)
            j += 8;
          i.op[1].regs -= j;
        }

> > @@ -5576,6 +5615,13 @@ md_assemble (char *line)
> >  	  return;
> >  	}
> >
> > +      /* Check for explicit REX2 prefix.  */
> > +      if (i.rex2_encoding)
> > +	{
> > +	  as_bad (_("REX2 prefix invalid with `%s'"), insn_name (&i.tm));
> > +	  return;
> > +	}
> 
> Again I'm pretty sure I pointed out before that i.rex2_encoding reflects use of
> {rex2}. Which then the error message should correctly refer to.
> 

Changed to 

      /* Check for explicit REX2 prefix.  */
      if (i.rex2_encoding)
        {
          as_bad (_("{rex2} prefix invalid with `%s'"), insn_name (&i.tm));
          return;
        }

> > @@ -5615,11 +5661,12 @@ md_assemble (char *line)
> >  	  && (i.op[1].regs->reg_flags & RegRex64) != 0)
> >        || (((i.types[0].bitfield.class == Reg && i.types[0].bitfield.byte)
> >  	   || (i.types[1].bitfield.class == Reg && i.types[1].bitfield.byte))
> > -	  && i.rex != 0))
> > +	  && (i.rex != 0 || i.rex2 != 0)))
> >      {
> >        int x;
> >
> > -      i.rex |= REX_OPCODE;
> > +      if (!i.rex2)
> > +	i.rex |= REX_OPCODE;
> >        for (x = 0; x < 2; x++)
> >  	{
> >  	  /* Look for 8 bit operand that uses old registers.  */
> > @@ -5630,7 +5677,7 @@ md_assemble (char *line)
> >  	      /* In case it is "hi" register, give up.  */
> >  	      if (i.op[x].regs->reg_num > 3)
> >  		as_bad (_("can't encode register '%s%s' in an "
> > -			  "instruction requiring REX prefix."),
> > +			  "instruction requiring REX/REX2 prefix."),
> >  			register_prefix, i.op[x].regs->reg_name);
> >
> >  	      /* Otherwise it is equivalent to the extended register.
> > @@ -5642,11 +5689,11 @@ md_assemble (char *line)
> >  	}
> >      }
> >
> > -  if (i.rex == 0 && i.rex_encoding)
> > +  if ((i.rex == 0 && i.rex_encoding) || (i.rex2 == 0 && i.rex2_encoding))
> 
> Doesn't this want to be
> 
>   if (i.rex == 0 && i.rex2 == 0 && (i.rex_encoding || i.rex2_encoding))
> 
> ?

Done.

> 
> >      {
> >        /* Check if we can add a REX_OPCODE byte.  Look for 8 bit operand
> >  	 that uses legacy register.  If it is "hi" register, don't add
> > -	 the REX_OPCODE byte.  */
> > +	 rex and rex2 prefix.  */
> >        int x;
> >        for (x = 0; x < 2; x++)
> >  	if (i.types[x].bitfield.class == Reg
> > @@ -5656,6 +5703,7 @@ md_assemble (char *line)
> >  	  {
> >  	    gas_assert (!(i.op[x].regs->reg_flags & RegRex));
> >  	    i.rex_encoding = false;
> > +	    i.rex2_encoding = false;
> >  	    break;
> >  	  }
> >
> > @@ -5663,7 +5711,13 @@ md_assemble (char *line)
> >  	i.rex = REX_OPCODE;
> >      }
> >
> > -  if (i.rex != 0)
> > +  if (i.rex2 != 0 || i.rex2_encoding)
> > +    {
> > +      build_rex2_prefix ();
> > +      /* The individual REX.RXBW bits got consumed.  */
> > +      i.rex &= REX_OPCODE;
> > +    }
> > +  else if (i.rex != 0)
> >      add_prefix (REX_OPCODE | i.rex);
> >
> >    insert_lfence_before ();
> > @@ -5834,6 +5888,10 @@ parse_insn (const char *line, char *mnemonic, bool prefix_only)
> >  		  /* {rex} */
> >  		  i.rex_encoding = true;
> >  		  break;
> > +		case Prefix_REX2:
> > +		  /* {rex2} */
> > +		  i.rex2_encoding = true;
> > +		  break;
> >  		case Prefix_NoOptimize:
> >  		  /* {nooptimize} */
> >  		  i.no_optimize = true;
> > @@ -6971,6 +7029,45 @@ VEX_check_encoding (const insn_template *t)
> >    return 0;
> >  }
> >
> > +/* Check if Egprs operands are valid for the instruction.  */
> > +
> > +static int
> > +check_EgprOperands (const insn_template *t)
> > +{
> > +  if (!t->opcode_modifier.noegpr)
> > +    return 0;
> > +
> > +  for (unsigned int op = 0; op < i.operands; op++)
> > +    {
> > +      if (i.types[op].bitfield.class != Reg
> > +	  /* Special case for (%dx) while doing input/output op */
> > +	  || i.input_output_operand)
> 
> Didn't we agree that this extra condition isn't necessary, once the producer
> site correctly updates all state (which was supposed to be done in a small
> prereq patch)?
> 

I tried adding "Unspecified | BaseIndex" to InOutPortReg, but then some related instructions ended up with two memory operands. This caused a lot of failures in the invalid test cases and would have needed more ugly code. In the end, I felt that the following simple modification might be better:

@@ -13137,6 +13137,7 @@ i386_att_operand (char *operand_string)
          && !operand_type_check (i.types[this_operand], disp))
        {
          i.types[this_operand] = i.base_reg->reg_type;
+         i.types[this_operand].bitfield.class = 0;
          i.input_output_operand = true;
          return 1;

> > @@ -7107,7 +7204,9 @@ match_template (char mnem_suffix)
> >        /* Do not verify operands when there are none.  */
> >        if (!t->operands)
> >  	{
> > -	  if (VEX_check_encoding (t))
> > +	  /* When there are no operands, we still need to use the
> > +	     check_EgprOperands function to check whether {rex2} is valid.  */
> > +	  if (VEX_check_encoding (t) || check_EgprOperands (t))
> 
> As before imo either the function name wants changing (so it becomes
> reasonable to use here, without the need for a comment explaining the
> oddity), or you simply open-code the sole check that is needed here
> (afaict: t->opcode_modifier.noegpr && i.rex2_encoding).
> 

Changed to 

          if (VEX_check_encoding (t))
            {
              specific_error = progress (i.error);
              continue;
            }

          /* Check if pseudo prefix {rex2} is valid.  */
          if (t->opcode_modifier.noegpr && i.rex2_encoding)
            {
              i.error = invalid_pseudo_prefix;
              specific_error = progress (i.error);
              continue;
            }

> > @@ -7443,6 +7542,13 @@ match_template (char mnem_suffix)
> >  	  continue;
> >  	}
> >
> > +      /* Check if EGRPS operands(r16-r31) are valid.  */
> 
> EGPR?
> 

Done.

> > --- a/gas/doc/c-i386.texi
> > +++ b/gas/doc/c-i386.texi
> > @@ -217,6 +217,7 @@ accept various extension mnemonics.  For example,
> >  @code{avx10.1/256},
> >  @code{avx10.1/128},
> >  @code{user_msr},
> > +@code{apx_f},
> >  @code{amx_int8},
> >  @code{amx_bf16},
> >  @code{amx_fp16},
> > @@ -983,6 +984,9 @@ Different encoding options can be specified via pseudo prefixes:
> >  instructions (x86-64 only).  Note that this differs from the @samp{rex}
> >  prefix which generates REX prefix unconditionally.
> >
> > +@item
> > +@samp{@{rex2@}} -- encode with REX2 prefix
> 
> This isn't in line with what's said for {rex}. Iirc we were in agreement that we
> want both to behave consistently. In which case documentation also needs to
> describe them consistently.
> 

Changed to 

@item
@samp{@{rex2@}} -- prefer REX2 prefix for integer and legacy vector
instructions (APX_F only).  Note that this differs from the @samp{rex2}
prefix which generates REX2 prefix unconditionally.

> > --- /dev/null
> > +++ b/gas/testsuite/gas/i386/x86-64-apx-rex2.s
> > @@ -0,0 +1,86 @@
> > +# Check 64bit instructions with rex2 prefix encoding
> > +         leal	(,%r16), %eax
> > +         leal	(,%r17), %eax
> > +         leal	(,%r18), %eax
> > +         leal	(,%r19), %eax
> > +         leal	(,%r20), %eax
> > +         leal	(,%r21), %eax
> > +         leal	(,%r22), %eax
> > +         leal	(,%r23), %eax
> > +         leal	(,%r24), %eax
> > +         leal	(,%r25), %eax
> > +         leal	(,%r26), %eax
> > +         leal	(,%r27), %eax
> > +         leal	(,%r28), %eax
> > +         leal	(,%r29), %eax
> > +         leal	(,%r30), %eax
> > +         leal	(,%r31), %eax
> > +## REX.B4 bit
> 
> Further up you properly say REX2. Here and below it's only REX?
> 

Done.

> > --- a/gas/testsuite/gas/i386/x86-64-pseudos-bad.s
> > +++ b/gas/testsuite/gas/i386/x86-64-pseudos-bad.s
> > @@ -5,3 +5,61 @@ pseudos:
> >  	{rex} vmovaps %xmm7,%xmm2
> >  	{rex} vmovaps %xmm17,%xmm2
> >  	{rex} rorx $7,%eax,%ebx
> > +	{rex2} vmovaps %xmm7,%xmm2
> > +	{rex2} xsave (%rax)
> > +	{rex2} xsaves (%ecx)
> > +	{rex2} xsaves64 (%ecx)
> > +	{rex2} xsavec (%ecx)
> > +	{rex2} xrstors (%ecx)
> > +	{rex2} xrstors64 (%ecx)
> > +
> > +	#All opcodes in the row 0xa* prefixed REX2 are illegal.
> > +	#{rex2} test (0xa8) is a special case, it will remap to test (0xf6)
> > +	{rex2} mov    0x90909090,%al
> > +	{rex2} movabs 0x1,%al
> > +	{rex2} cmpsb  %es:(%edi),%ds:(%esi)
> > +	{rex2} lodsb
> > +	{rex2} lods   %ds:(%esi),%al
> > +	{rex2} lodsb   (%esi)
> > +	{rex2} movs
> > +	{rex2} movs   (%esi), (%edi)
> > +	{rex2} scasl
> > +	{rex2} scas   %es:(%edi),%eax
> > +	{rex2} scasb   (%edi)
> > +	{rex2} stosb
> > +	{rex2} stosb   (%edi)
> > +	{rex2} stos   %eax,%es:(%edi)
> > +
> > +	#All opcodes in the row 0x7* prefixed REX2 are illegal.
> 
> This also covers map 1 row 8, doesn't it?
> 

No, I didn't find 0xf8* in opcode table.

> > +	{rex2} jo     .+2-0x70
> > +	{rex2} jno    .+2-0x70
> > +	{rex2} jb     .+2-0x70
> > +	{rex2} jae    .+2-0x70
> > +	{rex2} je     .+2-0x70
> > +	{rex2} jne    .+2-0x70
> > +	{rex2} jbe    .+2-0x70
> > +	{rex2} ja     .+2-0x70
> > +	{rex2} js     .+2-0x70
> > +	{rex2} jns    .+2-0x70
> > +	{rex2} jp     .+2-0x70
> > +	{rex2} jnp    .+2-0x70
> > +	{rex2} jl     .+2-0x70
> > +	{rex2} jge    .+2-0x70
> > +	{rex2} jle    .+2-0x70
> > +	{rex2} jg     .+2-0x70
> > +
> > +	#All opcodes in the row 0x7* prefixed REX2 are illegal.
> > +	{rex2} in $0x90,%al
> > +	{rex2} in $0x90
> > +	{rex2} out $0x90,%al
> > +	{rex2} out $0x90
> > +	{rex2} jmp  *%eax
> > +	{rex2} loop foo
> > +	#All opcodes in the row 0xf3* prefixed REX2 are illegal.
> 
> This comment continues to be confusing: 0xf3 is a REP prefix. Perhaps best to
> either say "map 1" and omit the "f" or at least write 0x0f3* or slightly better
> 0x0f 0x3*.
> 

Done.

> > --- a/gas/testsuite/gas/i386/x86-64-pseudos.s
> > +++ b/gas/testsuite/gas/i386/x86-64-pseudos.s
> > @@ -360,6 +360,19 @@ _start:
> >  	{rex} movaps (%r8),%xmm2
> >  	{rex} phaddw (%rcx),%mm0
> >  	{rex} phaddw (%r8),%mm0
> > +	{rex2} mov %al,%ah
> > +	{rex2} shl %cl, %eax
> > +	{rex2} cmp %cl, %dl
> > +	{rex2} mov $1, %bl
> > +	{rex2} movl %eax,%ebx
> > +	{rex2} movl %eax,%r14d
> > +	{rex2} movl %eax,(%r8)
> > +	{rex2} movaps %xmm7,%xmm2
> > +	{rex2} movaps %xmm7,%xmm12
> > +	{rex2} movaps (%rcx),%xmm2
> > +	{rex2} movaps (%r8),%xmm2
> > +	{rex2} pmullw %mm0,%mm6
> > +
> >
> >  	movb (%rbp),%al
> >  	{disp8} movb (%rbp),%al
> 
> No double blank lines please.
> 

Done.

> > --- a/opcodes/i386-dis.c
> > +++ b/opcodes/i386-dis.c
> 
> Disassembler comments (if any) in a separate (later) mail again.
> 

OK.

> > --- a/opcodes/i386-gen.c
> > +++ b/opcodes/i386-gen.c
> > @@ -275,6 +275,8 @@ static const dependency isa_dependencies[] =
> >      "64" },
> >    { "USER_MSR",
> >      "64" },
> > +  { "APX_F",
> > +    "XSAVE|64" },
> >  };
> >
> >  /* This array is populated as process_i386_initializers() walks cpu_flags[].  */
> > @@ -397,6 +399,7 @@ static bitfield cpu_flags[] =
> >    BITFIELD (FRED),
> >    BITFIELD (LKGS),
> >    BITFIELD (USER_MSR),
> > +  BITFIELD (APX_F),
> >    BITFIELD (MWAITX),
> >    BITFIELD (CLZERO),
> >    BITFIELD (OSPKE),
> > @@ -486,6 +489,7 @@ static bitfield opcode_modifiers[] =
> >    BITFIELD (ATTSyntax),
> >    BITFIELD (IntelSyntax),
> >    BITFIELD (ISA64),
> > +  BITFIELD (NoEgpr),
> >  };
> >
> >  #define CLASS(n) #n, n
> > @@ -1072,10 +1076,48 @@ get_element_size (char **opnd, int lineno)
> >    return elem_size;
> >  }
> >
> > +static bool
> > +rex2_disallowed (const unsigned long long opcode, unsigned int space,
> > +			       const char *cpu_flags)
> > +{
> > +  /* Prefixing XSAVE* and XRSTOR* instructions with REX2 triggers #UD.  */
> > +  if (strcmp (cpu_flags, "XSAVES") >= 0
> > +      || strcmp (cpu_flags, "XSAVEC") >= 0
> > +      || strcmp (cpu_flags, "Xsave") >= 0
> > +      || strcmp (cpu_flags, "Xsaveopt") >= 0)
> > +    return true;
> 
> Wasn't this intended to be dropped, being redundant with the opcode table
> attributes?
>

Yes, dropped.
 
> > +  /* All opcodes listed map0 0x4*, 0x7*, 0xa*, 0xe* and map1 0x3*, 0x8*
> > +     are reserved under REX2 and triggers #UD when prefixed with REX2 */
> > +  if (space == 0)
> > +    switch (opcode >> 4)
> 
> Both here and ...
>
> > +      {
> > +      case 0x4:
> > +      case 0x7:
> > +      case 0xA:
> > +      case 0xE:
> > +	return true;
> > +      default:
> > +	return false;
> > +    }
> > +
> > +  if (space == SPACE_0F)
> > +    switch (opcode >> 4)
> 
> ... here, don't you also need to mask off further bits? There are quite a few
> opcodes which have a kind-of ModR/M byte encoded directly in the opcode,
> for example.
> 

Thanks for the reminder. I added code like this:

  /* Some opcodes encode a ModR/M byte directly in the opcode.  */
  unsigned long long base_opcode
    = (length > 1) ? opcode >> (8 * length - 8) : opcode;

  /* All opcodes listed map0 0x4*, 0x7*, 0xa*, 0xe* and map1 0x3*, 0x8*
     are reserved under REX2 and trigger #UD when prefixed with REX2.  */
  if (space == 0)
    switch (base_opcode >> 4)
      {
      case 0x4:
      case 0x7:
      case 0xA:
      case 0xE:
        return true;
      default:
        return false;
      }

  if (space == SPACE_0F)
    switch (base_opcode >> 4)

> > +      {
> > +      case 0x3:
> > +      case 0x8:
> > +	return true;
> > +      default:
> > +	return false;
> > +      }
> > +
> > +  return false;
> > +}
> > +
> >  static void
> >  process_i386_opcode_modifier (FILE *table, char *mod, unsigned int space,
> >  			      unsigned int prefix, const char *extension_opcode,
> > -			      char **opnd, int lineno)
> > +			      char **opnd, int lineno, bool rex2_disallowed)
> >  {
> >    char *str, *next, *last;
> >    bitfield modifiers [ARRAY_SIZE (opcode_modifiers)];
> > @@ -1202,6 +1244,12 @@ process_i386_opcode_modifier (FILE *table, char *mod, unsigned int space,
> >  	  || modifiers[SAE].value))
> >      modifiers[EVex].value = EVEXDYN;
> >
> > +  /* Vex, legacy map2 and map3 and rex2_disallowed do not support EGPR.
> > +     For template supports both Vex and EVex allowing EGPR.  */
> 
> "Templates supporting both Vex and EVex allow EGPR."
> 

Done.

> > +  if ((modifiers[Vex].value || space > SPACE_0F || rex2_disallowed)
> > +      && !modifiers[EVex].value)
> > +    modifiers[NoEgpr].value = 1;
> > +
> >    output_opcode_modifier (table, modifiers, ARRAY_SIZE (modifiers));
> > }
> >
> > @@ -1425,8 +1473,11 @@ output_i386_opcode (FILE *table, const char *name, char *str,
> >  	   ident, 2 * (int)length, opcode, end, i);
> >    free (ident);
> >
> > +  /* Add some specilal handle for current entry.  */
> > +  bool has_special_handle = rex2_disallowed (opcode, space, cpu_flags);
> 
> The local variable (if one is needed in the first place) wants naming as usefully
> as the function now is named. Similarly the comment would want improving
> along those lines.
> 

Dropped the local variable. Changed to

  process_i386_opcode_modifier (table, opcode_modifier, space, prefix,
                                extension_opcode, operand_types, lineno,
                                rex2_disallowed (opcode, length, space,
                                                 cpu_flags));

> >    process_i386_opcode_modifier (table, opcode_modifier, space, prefix,
> > -				extension_opcode, operand_types, lineno);
> > +				extension_opcode, operand_types, lineno,
> > +				has_special_handle);
> >
> >    process_i386_cpu_flag (table, cpu_flags, NULL, ",", "    ", lineno, CpuMax);
> >
> > --- a/opcodes/i386-opc.tbl
> > +++ b/opcodes/i386-opc.tbl
> > @@ -138,6 +138,7 @@
> >  #define Vsz256 Vsz=VSZ256
> >  #define Vsz512 Vsz=VSZ512
> >
> > +
> >  // The EVEX purpose of StaticRounding appears only together with SAE. Re-use
> >  // the bit to mark commutative VEX encodings where swapping the source
> >  // operands may allow to switch from 3-byte to 2-byte VEX encoding.
> 
> Stray change (in general please avoid introducing double blank lines, as those
> make patch context less useful).
> 
Done.

Thanks,
Lili.
  
Jan Beulich Dec. 6, 2023, 7:52 a.m. UTC | #3
On 05.12.2023 14:31, Cui, Lili wrote:
>> On 24.11.2023 08:02, Cui, Lili wrote:
>>> @@ -4120,6 +4134,21 @@ build_evex_prefix (void)
>>>      i.vex.bytes[3] |= i.mask.reg->reg_num;
>>>  }
>>>
>>> +/* Build (2 bytes) rex2 prefix.
>>> +   | D5h |
>>> +   | m | R4 X4 B4 | W R X B |
>>> +*/
>>> +static void
>>> +build_rex2_prefix (void)
>>> +{
>>> +  /* Rex2 reuses i.vex because they handle i.tm.opcode_space the same.  */
>>
>> How do they handle it the same? (Also I don't think this is useful as a code
>> comment; it instead belongs in the description imo.)
>>
> 
> Moved the comment to the functions description.
> 
> /* Build (2 bytes) rex2 prefix.
>    | D5h |
>    | m | R4 X4 B4 | W R X B |
> 
>    Rex2 reuses i.vex as they handle i.tm.opcode_space the same way.  */
> static void
> build_rex2_prefix (void)
> 
> 
> In function "output_insn",  some handle like this.
> 
>       if (!i.vex.length)
>         switch (i.tm.opcode_space)
>           {
>           case SPACE_BASE:
>             break;
>           case SPACE_0F:
>             ++j;
>             break;
>           case SPACE_0F38:
>           case SPACE_0F3A:
>             j += 2;
>             break;
>           default:
>             abort ();
>           }
> .....
>          if (!i.vex.length
>               && i.tm.opcode_space != SPACE_BASE)
>             {
>               *p++ = 0x0f;
>               if (i.tm.opcode_space != SPACE_0F)
>                 *p++ = i.tm.opcode_space == SPACE_0F38
>                        ? 0x38 : 0x3a;
>             }

Oh, I see. That's pretty remote. How about replacing "the same way"? Perhaps
"Rex2 reuses i.vex as they both encode i.tm.opcode_space in their prefixes"?

While in that form it's fine to remain in a code comment, just a general
clarification: When I say something wants saying in the "description", it's
(almost) always that I mean the patch description, not anything else.
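
To make the layout in that comment concrete, here is a rough standalone
sketch. It is not the patch's code: sketch_rex2() and its parameters are
made up for illustration, and the bit positions are simply read off the
"| m | R4 X4 B4 | W R X B |" line, most significant bit first.

```c
#include <stdint.h>

/* Hypothetical helper, not part of gas: fill OUT with the two REX2
   prefix bytes.  MAP1 selects opcode map 1 (0x0f); every argument is
   treated as a single bit.  */
static void
sketch_rex2 (uint8_t out[2], unsigned int map1,
             unsigned int r4, unsigned int x4, unsigned int b4,
             unsigned int w, unsigned int r, unsigned int x,
             unsigned int b)
{
  /* First byte: the fixed D5h escape.  */
  out[0] = 0xd5;
  /* Second byte: | m | R4 X4 B4 | W R X B |.  */
  out[1] = (uint8_t) ((map1 & 1) << 7
                      | (r4 & 1) << 6 | (x4 & 1) << 5 | (b4 & 1) << 4
                      | (w & 1) << 3 | (r & 1) << 2 | (x & 1) << 1
                      | (b & 1));
}
```

With only B4 set (a base register in r16-r23 and no other extension
bits) this yields the bytes d5 10.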

>>> @@ -4385,12 +4414,16 @@ optimize_encoding (void)
>>>  	  i.suffix = 0;
>>>  	  /* Convert to byte registers.  */
>>>  	  if (i.types[1].bitfield.word)
>>> -	    j = 16;
>>> -	  else if (i.types[1].bitfield.dword)
>>> +	    /* There are 40 8-bit registers.  */
>>>  	    j = 32;
>>> +	  else if (i.types[1].bitfield.dword)
>>> +	    /* 32 8-bit registers + 32 16-bit registers.  */
>>> +	    j = 64;
>>>  	  else
>>> -	    j = 48;
>>> -	  if (!(i.op[1].regs->reg_flags & RegRex) && base_regnum < 4)
>>> +	    /* 32 8-bit registers + 32 16-bit registers
>>> +	       + 32 32-bit registers.  */
>>> +	    j = 96;
>>> +	  if (!(i.op[1].regs->reg_flags & (RegRex | RegRex2)) && base_regnum < 4)
>>>  	    j += 8;
>>>  	  i.op[1].regs -= j;
>>>  	}
>>
>> I did comment on, in particular, the 8-bit register counts before.
>> Afaict the comments above are nevertheless unchanged and hence still not
>> really correct.
>>
> 
> Changed to :
> 
>       if (flag_code == CODE_64BIT || base_regnum < 4)
>         {
>           i.types[1].bitfield.byte = 1;
>           /* Ignore the suffix.  */
>           i.suffix = 0;
>           /* Convert to byte registers. 8-bit registers are special,
>              RegRex64 and non-RegRex64 each have 8 registers.  */
>           if (i.types[1].bitfield.word)
>             /* 32 (or 40) 8-bit registers.  */
>             j = 32;
>           else if (i.types[1].bitfield.dword)
>             /* 32 (or 40)8-bit registers + 32 16-bit registers.  */

Nit: Missing blank.

>             j = 64;
>           else
>             /* 32 (or 40) 8-bit registers + 32 16-bit registers
>                + 32 32-bit registers.  */
>             j = 96;
> 
>           if (!(i.op[1].regs->reg_flags & (RegRex | RegRex2)) && base_regnum < 4)
>             j += 8;
>           i.op[1].regs -= j;
>         }

I won't insist on further changes, but imo as you're adding comments,
also adding a comment to this last if() (which finally takes care of
the 8-bit reg special case) would be advisable.

>>> @@ -5663,7 +5711,13 @@ md_assemble (char *line)
>>>  	i.rex = REX_OPCODE;
>>>      }
>>>
>>> -  if (i.rex != 0)
>>> +  if (i.rex2 != 0 || i.rex2_encoding)
>>> +    {
>>> +      build_rex2_prefix ();
>>> +      /* The individual REX.RXBW bits got consumed.  */
>>> +      i.rex &= REX_OPCODE;
>>> +    }
>>> +  else if (i.rex != 0)
>>>      add_prefix (REX_OPCODE | i.rex);
>>>
>>>    insert_lfence_before ();
>>> @@ -5834,6 +5888,10 @@ parse_insn (const char *line, char *mnemonic, bool prefix_only)
>>>  		  /* {rex} */
>>>  		  i.rex_encoding = true;
>>>  		  break;
>>> +		case Prefix_REX2:
>>> +		  /* {rex2} */
>>> +		  i.rex2_encoding = true;
>>> +		  break;
>>>  		case Prefix_NoOptimize:
>>>  		  /* {nooptimize} */
>>>  		  i.no_optimize = true;
>>> @@ -6971,6 +7029,45 @@ VEX_check_encoding (const insn_template *t)
>>>    return 0;
>>>  }
>>>
>>> +/* Check if Egprs operands are valid for the instruction.  */
>>> +
>>> +static int
>>> +check_EgprOperands (const insn_template *t)
>>> +{
>>> +  if (!t->opcode_modifier.noegpr)
>>> +    return 0;
>>> +
>>> +  for (unsigned int op = 0; op < i.operands; op++)
>>> +    {
>>> +      if (i.types[op].bitfield.class != Reg
>>> +	  /* Special case for (%dx) while doing input/output op */
>>> +	  || i.input_output_operand)
>>
>> Didn't we agree that this extra condition isn't necessary, once the producer
>> site correctly updates all state (which was supposed to be done in a small
>> prereq patch)?
>>
> 
> I tried adding "Unspecified | BaseIndex" to the InOutPortReg, then some related instructions had two memory operands, so it raised a lot of invalid test case fail, and more ugly code needed to be added. In the end, I felt that this simple modification might be better.

Changing InOutPortReg of course isn't going to be easy. But that also wasn't
what we had discussed. Instead (I thought) we agreed on ...

> @@ -13137,6 +13137,7 @@ i386_att_operand (char *operand_string)
>           && !operand_type_check (i.types[this_operand], disp))
>         {
>           i.types[this_operand] = i.base_reg->reg_type;
> +         i.types[this_operand].bitfield.class = 0;
>           i.input_output_operand = true;
>           return 1;

amending this code to also correctly set i.op[].regs. Perhaps it would also
be best to actually clear i.base_reg (for there not being any memory operand).
(FTAOD: All of this in a separate prereq patch, not here. The code creating
inconsistent state has been a [latent] bug for a long time.)

>>> --- a/gas/doc/c-i386.texi
>>> +++ b/gas/doc/c-i386.texi
>>> @@ -217,6 +217,7 @@ accept various extension mnemonics.  For example,
>>> @code{avx10.1/256},  @code{avx10.1/128},  @code{user_msr},
>>> +@code{apx_f},
>>>  @code{amx_int8},
>>>  @code{amx_bf16},
>>>  @code{amx_fp16},
>>> @@ -983,6 +984,9 @@ Different encoding options can be specified via pseudo prefixes:
>>>  instructions (x86-64 only).  Note that this differs from the
>>> @samp{rex}  prefix which generates REX prefix unconditionally.
>>>
>>> +@item
>>> +@samp{@{rex2@}} -- encode with REX2 prefix
>>
>> This isn't in line with what's said for {rex}. Iirc we were in agreement that we
>> want both to behave consistently. In which case documentation also needs to
>> describe them consistently.
>>
> 
> Changed to 
> 
> @item
> @samp{@{rex2@}} -- prefer REX2 prefix for integer and legacy vector
> instructions (APX_F only).  Note that this differs from the @samp{rex2}
> prefix which generates REX2 prefix unconditionally.

Except there's no "rex2" prefix according to the present implementation.

>>> --- a/gas/testsuite/gas/i386/x86-64-pseudos-bad.s
>>> +++ b/gas/testsuite/gas/i386/x86-64-pseudos-bad.s
>>> @@ -5,3 +5,61 @@ pseudos:
>>>  	{rex} vmovaps %xmm7,%xmm2
>>>  	{rex} vmovaps %xmm17,%xmm2
>>>  	{rex} rorx $7,%eax,%ebx
>>> +	{rex2} vmovaps %xmm7,%xmm2
>>> +	{rex2} xsave (%rax)
>>> +	{rex2} xsaves (%ecx)
>>> +	{rex2} xsaves64 (%ecx)
>>> +	{rex2} xsavec (%ecx)
>>> +	{rex2} xrstors (%ecx)
>>> +	{rex2} xrstors64 (%ecx)
>>> +
>>> +	#All opcodes in the row 0xa* prefixed REX2 are illegal.
>>> +	#{rex2} test (0xa8) is a special case, it will remap to test (0xf6)
>>> +	{rex2} mov    0x90909090,%al
>>> +	{rex2} movabs 0x1,%al
>>> +	{rex2} cmpsb  %es:(%edi),%ds:(%esi)
>>> +	{rex2} lodsb
>>> +	{rex2} lods   %ds:(%esi),%al
>>> +	{rex2} lodsb   (%esi)
>>> +	{rex2} movs
>>> +	{rex2} movs   (%esi), (%edi)
>>> +	{rex2} scasl
>>> +	{rex2} scas   %es:(%edi),%eax
>>> +	{rex2} scasb   (%edi)
>>> +	{rex2} stosb
>>> +	{rex2} stosb   (%edi)
>>> +	{rex2} stos   %eax,%es:(%edi)
>>> +
>>> +	#All opcodes in the row 0x7* prefixed REX2 are illegal.
>>
>> This also covers map 1 row 8, doesn't it?
>>
> 
> No, I didn't find 0xf8* in opcode table.

Assuming (again) you mean 0x0f 0x8*, how did you not find it? Or
wait, depends on what "opcode table" here means: The manual's or
opcodes/i386-opc.tbl? The latter of course doesn't have them, as
they're ...

>>> +	{rex2} jo     .+2-0x70
>>> +	{rex2} jno    .+2-0x70
>>> +	{rex2} jb     .+2-0x70
>>> +	{rex2} jae    .+2-0x70
>>> +	{rex2} je     .+2-0x70
>>> +	{rex2} jne    .+2-0x70
>>> +	{rex2} jbe    .+2-0x70
>>> +	{rex2} ja     .+2-0x70
>>> +	{rex2} js     .+2-0x70
>>> +	{rex2} jns    .+2-0x70
>>> +	{rex2} jp     .+2-0x70
>>> +	{rex2} jnp    .+2-0x70
>>> +	{rex2} jl     .+2-0x70
>>> +	{rex2} jge    .+2-0x70
>>> +	{rex2} jle    .+2-0x70
>>> +	{rex2} jg     .+2-0x70

... the disp32/disp16 forms of these branches, which are created only
during relaxation.

>>> +  /* All opcodes listed map0 0x4*, 0x7*, 0xa*, 0xe* and map1 0x3*, 0x8*
>>> +     are reserved under REX2 and triggers #UD when prefixed with REX2 */
>>> +  if (space == 0)
>>> +    switch (opcode >> 4)
>>
>> Both here and ...
>>
>>> +      {
>>> +      case 0x4:
>>> +      case 0x7:
>>> +      case 0xA:
>>> +      case 0xE:
>>> +	return true;
>>> +      default:
>>> +	return false;
>>> +    }
>>> +
>>> +  if (space == SPACE_0F)
>>> +    switch (opcode >> 4)
>>
>> ... here, don't you also need to mask off further bits? There are quite a few
>> opcodes which have a kind-of ModR/M byte encoded directly in the opcode,
>> for example.
>>
> 
> Thanks for reminding. Added the code like this.
> 
> /* Some opcodes encode a ModR/M byte directly in the opcode.  */
>   unsigned long long
>   base_opcode = (length > 1) ? opcode >> (8 * length - 8) : opcode;

Can length be 0? I didn't think so, and then

   base_opcode = opcode >> (8 * length - 8);

would be all you need.

Also in the comment, I think it would be slightly better to say "ModR/M-like
byte".
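
Putting both remarks together, a rough standalone sketch of what I have
in mind. The names sketch_rex2_disallowed() and SKETCH_SPACE_* are
placeholders, not the real i386-gen.c identifiers; the sketch assumes
length is always at least 1 and that the opcode's first byte sits in the
most significant position.

```c
#include <stdbool.h>

/* Placeholder values for this sketch only.  */
enum { SKETCH_SPACE_BASE = 0, SKETCH_SPACE_0F = 1 };

/* Return true if the opcode lies in a row that is reserved under REX2.
   OPCODE holds LENGTH opcode bytes (LENGTH >= 1).  */
static bool
sketch_rex2_disallowed (unsigned long long opcode, unsigned int length,
                        unsigned int space)
{
  /* Some opcodes carry a ModR/M-like byte after the opcode byte
     proper; keep only the first byte.  */
  unsigned long long base_opcode = opcode >> (8 * length - 8);

  if (space == SKETCH_SPACE_BASE)
    switch (base_opcode >> 4)
      {
      case 0x4:
      case 0x7:
      case 0xa:
      case 0xe:
        return true;
      default:
        return false;
      }

  if (space == SKETCH_SPACE_0F)
    switch (base_opcode >> 4)
      {
      case 0x3:
      case 0x8:
        return true;
      default:
        return false;
      }

  return false;
}
```

For length == 1 the shift count is 0, so single-byte opcodes pass
through unchanged and no conditional is needed.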

Jan
  
Cui, Lili Dec. 6, 2023, 12:43 p.m. UTC | #4
> On 05.12.2023 14:31, Cui, Lili wrote:
> >> On 24.11.2023 08:02, Cui, Lili wrote:
> >>> @@ -4120,6 +4134,21 @@ build_evex_prefix (void)
> >>>      i.vex.bytes[3] |= i.mask.reg->reg_num;  }
> >>>
> >>> +/* Build (2 bytes) rex2 prefix.
> >>> +   | D5h |
> >>> +   | m | R4 X4 B4 | W R X B |
> >>> +*/
> >>> +static void
> >>> +build_rex2_prefix (void)
> >>> +{
> >>> +  /* Rex2 reuses i.vex because they handle i.tm.opcode_space the same.  */
> >>
> >> How do they handle it the same? (Also I don't think this is useful as
> >> a code comment; it instead belongs in the description imo.)
> >>
> >
> > Moved the comment to the functions description.
> >
> > /* Build (2 bytes) rex2 prefix.
> >    | D5h |
> >    | m | R4 X4 B4 | W R X B |
> >
> >    Rex2 reuses i.vex as they handle i.tm.opcode_space the same way.  */
> > static void
> > build_rex2_prefix (void)
> >
> >
> > In function "output_insn",  some handle like this.
> >
> >       if (!i.vex.length)
> >         switch (i.tm.opcode_space)
> >           {
> >           case SPACE_BASE:
> >             break;
> >           case SPACE_0F:
> >             ++j;
> >             break;
> >           case SPACE_0F38:
> >           case SPACE_0F3A:
> >             j += 2;
> >             break;
> >           default:
> >             abort ();
> >           }
> > .....
> >          if (!i.vex.length
> >               && i.tm.opcode_space != SPACE_BASE)
> >             {
> >               *p++ = 0x0f;
> >               if (i.tm.opcode_space != SPACE_0F)
> >                 *p++ = i.tm.opcode_space == SPACE_0F38
> >                        ? 0x38 : 0x3a;
> >             }
> 
> Oh, I see. That's pretty remote. How about replacing "the same way"?
> Perhaps
> "Rex2 reuses i.vex as they both encode i.tm.opcode_space in their prefixes"?
> 

Done.

> While in that form it's fine to remain in a code comment, just a general
> clarification: When I say something wants saying in the "description", it's
> (almost) always that I mean the patch description, not anything else.
> 

I see.

> >> I did comment on, in particular, the 8-bit register counts before.
> >> Afaict the comments above are nevertheless unchanged and hence still
> >> not really correct.
> >>
> >
> > Changed to :
> >
> >       if (flag_code == CODE_64BIT || base_regnum < 4)
> >         {
> >           i.types[1].bitfield.byte = 1;
> >           /* Ignore the suffix.  */
> >           i.suffix = 0;
> >           /* Convert to byte registers. 8-bit registers are special,
> >              RegRex64 and non-RegRex64 each have 8 registers.  */
> >           if (i.types[1].bitfield.word)
> >             /* 32 (or 40) 8-bit registers.  */
> >             j = 32;
> >           else if (i.types[1].bitfield.dword)
> >             /* 32 (or 40)8-bit registers + 32 16-bit registers.  */
> 
> Nit: Missing blank.
> 

Done.

> >             j = 64;
> >           else
> >             /* 32 (or 40) 8-bit registers + 32 16-bit registers
> >                + 32 32-bit registers.  */
> >             j = 96;
> >
> >           if (!(i.op[1].regs->reg_flags & (RegRex | RegRex2)) && base_regnum < 4)
> >             j += 8;
> >           i.op[1].regs -= j;
> >         }
> 
> I won't insist on further changes, but imo as you're adding comments, also
> adding a comment to this last if() (which finally takes care of the 8-bit reg
> special case) would be advisable.
> 

Added.

          /* In 64-bit mode, the byte registers AH, BH, CH and DH cannot be
             accessed when a REX or REX2 prefix is used.  */
          if (!(i.op[1].regs->reg_flags & (RegRex | RegRex2)) && base_regnum < 4)
            j += 8;
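
To double check the counts, here is a small standalone model of the
index arithmetic. It is only a sketch: the table layout below (8 legacy
byte registers, then 32 REX/REX2 byte registers, then 32 registers per
wider size) is my assumption about the register table, the sketch_*
names do not exist in gas, and regnum is the full register number 0-31
rather than base_regnum plus flags.

```c
#include <stdbool.h>

/* Assumed table layout for this sketch:
     [0..7]     al cl dl bl ah ch dh bh        (legacy 8-bit)
     [8..39]    axl cxl dxl bxl spl ... r31b   (REX/REX2 8-bit)
     [40..71]   16-bit registers
     [72..103]  32-bit registers
     [104..135] 64-bit registers  */

/* Index of register REGNUM (0-31) of SIZE_BYTES (2, 4 or 8).  */
static unsigned int
sketch_index (unsigned int size_bytes, unsigned int regnum)
{
  if (size_bytes == 2)
    return 40 + regnum;
  if (size_bytes == 4)
    return 72 + regnum;
  return 104 + regnum;
}

/* Index of the byte register the wider register is converted to.  */
static unsigned int
sketch_to_byte_reg (unsigned int size_bytes, unsigned int regnum)
{
  unsigned int j;

  if (size_bytes == 2)
    j = 32;		/* 32 (or 40) 8-bit registers.  */
  else if (size_bytes == 4)
    j = 64;		/* ... + 32 16-bit registers.  */
  else
    j = 96;		/* ... + 32 32-bit registers.  */

  /* Registers 0-3 need no REX/REX2 prefix, so step back a further 8
     entries to reach al/cl/dl/bl in the legacy block.  */
  if (regnum < 4)
    j += 8;

  return sketch_index (size_bytes, regnum) - j;
}
```

In this model %ax maps to index 0 (al), %si maps to index 14 (sil) and
%r16 maps to index 24 (r16b).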

> >>> +/* Check if Egprs operands are valid for the instruction.  */
> >>> +
> >>> +static int
> >>> +check_EgprOperands (const insn_template *t) {
> >>> +  if (!t->opcode_modifier.noegpr)
> >>> +    return 0;
> >>> +
> >>> +  for (unsigned int op = 0; op < i.operands; op++)
> >>> +    {
> >>> +      if (i.types[op].bitfield.class != Reg
> >>> +	  /* Special case for (%dx) while doing input/output op */
> >>> +	  || i.input_output_operand)
> >>
> >> Didn't we agree that this extra condition isn't necessary, once the
> >> producer site correctly updates all state (which was supposed to be
> >> done in a small prereq patch)?
> >>
> >
> > I tried adding "Unspecified | BaseIndex" to the InOutPortReg, then some
> > related instructions had two memory operands, so it raised a lot of invalid
> > test case fail, and more ugly code needed to be added. In the end, I felt that
> > this simple modification might be better.
> 
> Changing InOutPortReg of course isn't going to be easy. But that also wasn't
> what we had discussed. Instead (I thought) we agreed on ...
> 
> > @@ -13137,6 +13137,7 @@ i386_att_operand (char *operand_string)
> >           && !operand_type_check (i.types[this_operand], disp))
> >         {
> >           i.types[this_operand] = i.base_reg->reg_type;
> > +         i.types[this_operand].bitfield.class = 0;
> >           i.input_output_operand = true;
> >           return 1;
> 
> amending this code to also correctly set i.op[].regs. Perhaps it would also be
> best to actually clear i.base_reg (for there not being any memory operand).
> (FTAOD: All of this in a separate prereq patch, not here. The code creating
> inconsistent state has been a [latent] bug for a long time.)
> 

Added i.base_reg = NULL. This is only for discussion here; I'll create a separate patch for it.

@@ -13016,6 +13016,8 @@ i386_att_operand (char *operand_string)
          && !operand_type_check (i.types[this_operand], disp))
        {
          i.types[this_operand] = i.base_reg->reg_type;
+         i.types[this_operand].bitfield.class = 0;
+         i.base_reg = NULL;
          i.input_output_operand = true;
          return 1;
        }

> >>> --- a/gas/doc/c-i386.texi
> >>> +++ b/gas/doc/c-i386.texi
> >>> @@ -217,6 +217,7 @@ accept various extension mnemonics.  For
> >>> example, @code{avx10.1/256},  @code{avx10.1/128},  @code{user_msr},
> >>> +@code{apx_f},
> >>>  @code{amx_int8},
> >>>  @code{amx_bf16},
> >>>  @code{amx_fp16},
> >>> @@ -983,6 +984,9 @@ Different encoding options can be specified via
> >> pseudo prefixes:
> >>>  instructions (x86-64 only).  Note that this differs from the
> >>> @samp{rex}  prefix which generates REX prefix unconditionally.
> >>>
> >>> +@item
> >>> +@samp{@{rex2@}} -- encode with REX2 prefix
> >>
> >> This isn't in line with what's said for {rex}. Iirc we were in
> >> agreement that we want both to behave consistently. In which case
> >> documentation also needs to describe them consistently.
> >>
> >
> > Changed to
> >
> > @item
> > @samp{@{rex2@}} -- prefer REX2 prefix for integer and legacy vector
> > instructions (APX_F only).  Note that this differs from the
> > @samp{rex2} prefix which generates REX2 prefix unconditionally.
> 
> Except there's no "rex2" prefix according to the present implementation.
>
 
Removed that note for the current implementation:

@item
@samp{@{rex2@}} -- prefer REX2 prefix for integer and legacy vector
instructions (APX_F only).

> >>> --- a/gas/testsuite/gas/i386/x86-64-pseudos-bad.s
> >>> +++ b/gas/testsuite/gas/i386/x86-64-pseudos-bad.s
> >>> @@ -5,3 +5,61 @@ pseudos:
> >>>  	{rex} vmovaps %xmm7,%xmm2
> >>>  	{rex} vmovaps %xmm17,%xmm2
> >>>  	{rex} rorx $7,%eax,%ebx
> >>> +	{rex2} vmovaps %xmm7,%xmm2
> >>> +	{rex2} xsave (%rax)
> >>> +	{rex2} xsaves (%ecx)
> >>> +	{rex2} xsaves64 (%ecx)
> >>> +	{rex2} xsavec (%ecx)
> >>> +	{rex2} xrstors (%ecx)
> >>> +	{rex2} xrstors64 (%ecx)
> >>> +
> >>> +	#All opcodes in the row 0xa* prefixed REX2 are illegal.
> >>> +	#{rex2} test (0xa8) is a special case, it will remap to test (0xf6)
> >>> +	{rex2} mov    0x90909090,%al
> >>> +	{rex2} movabs 0x1,%al
> >>> +	{rex2} cmpsb  %es:(%edi),%ds:(%esi)
> >>> +	{rex2} lodsb
> >>> +	{rex2} lods   %ds:(%esi),%al
> >>> +	{rex2} lodsb   (%esi)
> >>> +	{rex2} movs
> >>> +	{rex2} movs   (%esi), (%edi)
> >>> +	{rex2} scasl
> >>> +	{rex2} scas   %es:(%edi),%eax
> >>> +	{rex2} scasb   (%edi)
> >>> +	{rex2} stosb
> >>> +	{rex2} stosb   (%edi)
> >>> +	{rex2} stos   %eax,%es:(%edi)
> >>> +
> >>> +	#All opcodes in the row 0x7* prefixed REX2 are illegal.
> >>
> >> This also covers map 1 row 8, doesn't it?
> >>
> >
> > No, I didn't find 0xf8* in opcode table.
> 
> Assuming (again) you mean 0x0f 0x8*, how did you not find it? Or wait,
> depends on what "opcode table" here means: The manual's or
> opcodes/i386-opc.tbl? The latter of course doesn't have them, as they're ...
> 
> >>> +	{rex2} jo     .+2-0x70
> >>> +	{rex2} jno    .+2-0x70
> >>> +	{rex2} jb     .+2-0x70
> >>> +	{rex2} jae    .+2-0x70
> >>> +	{rex2} je     .+2-0x70
> >>> +	{rex2} jne    .+2-0x70
> >>> +	{rex2} jbe    .+2-0x70
> >>> +	{rex2} ja     .+2-0x70
> >>> +	{rex2} js     .+2-0x70
> >>> +	{rex2} jns    .+2-0x70
> >>> +	{rex2} jp     .+2-0x70
> >>> +	{rex2} jnp    .+2-0x70
> >>> +	{rex2} jl     .+2-0x70
> >>> +	{rex2} jge    .+2-0x70
> >>> +	{rex2} jle    .+2-0x70
> >>> +	{rex2} jg     .+2-0x70
> 
> ... the disp32/disp16 forms of these branches, which are created only during
> relaxation.
>

Oh, I see. I found them in the SDM and added test cases for them.

        #All opcodes in the row 0x8* (map1) prefixed REX2 are illegal.
        {rex2} jo     .+6+0x90909090
        {rex2} jno    .+6+0x90909090
        {rex2} jb     .+6+0x90909090
        {rex2} jae    .+6+0x90909090
        {rex2} je     .+6+0x90909090
        {rex2} jne    .+6+0x90909090
        {rex2} jbe    .+6+0x90909090
        {rex2} ja     .+6+0x90909090
        {rex2} js     .+6+0x90909090
        {rex2} jns    .+6+0x90909090
        {rex2} jp     .+6+0x90909090
        {rex2} jnp    .+6+0x90909090
        {rex2} jl     .+6+0x90909090
        {rex2} jge    .+6+0x90909090
        {rex2} jle    .+6+0x90909090
        {rex2} jg     .+6+0x90909090
 
> >>> +  /* All opcodes listed map0 0x4*, 0x7*, 0xa*, 0xe* and map1 0x3*, 0x8*
> >>> +     are reserved under REX2 and triggers #UD when prefixed with REX2 */
> >>> +  if (space == 0)
> >>> +    switch (opcode >> 4)
> >>
> >> Both here and ...
> >>
> >>> +      {
> >>> +      case 0x4:
> >>> +      case 0x7:
> >>> +      case 0xA:
> >>> +      case 0xE:
> >>> +	return true;
> >>> +      default:
> >>> +	return false;
> >>> +    }
> >>> +
> >>> +  if (space == SPACE_0F)
> >>> +    switch (opcode >> 4)
> >>
> >> ... here, don't you also need to mask off further bits? There are
> >> quite a few opcodes which have a kind-of ModR/M byte encoded directly
> >> in the opcode, for example.
> >>
> >
> > Thanks for reminding. Added the code like this.
> >
> > /* Some opcodes encode a ModR/M byte directly in the opcode.  */
> >   unsigned long long
> >   base_opcode = (length > 1) ? opcode >> (8 * length - 8) : opcode;
> 
> Can length be 0? I didn't think so, and then
> 
>    base_opcode = opcode >> (8 * length - 8);
> 
> would be all you need.
>

Yes, good idea.

> Also in the comment, I think it would be slightly better to say "ModR/M-like
> byte".
> 

Done.

Thanks,
Lili.
  
Jan Beulich Dec. 7, 2023, 9:01 a.m. UTC | #5
On 06.12.2023 13:43, Cui, Lili wrote:
>> On 05.12.2023 14:31, Cui, Lili wrote:
>>>> On 24.11.2023 08:02, Cui, Lili wrote:
>>>>> +	#All opcodes in the row 0x7* prefixed REX2 are illegal.
>>>>
>>>> This also covers map 1 row 8, doesn't it?
>>>>
>>>
>>> No, I didn't find 0xf8* in opcode table.
>>
>> Assuming (again) you mean 0x0f 0x8*, how did you not find it? Or wait,
>> depends on what "opcode table" here means: The manual's or
>> opcodes/i386-opc.tbl? The latter of course doesn't have them, as they're ...
>>
>>>>> +	{rex2} jo     .+2-0x70
>>>>> +	{rex2} jno    .+2-0x70
>>>>> +	{rex2} jb     .+2-0x70
>>>>> +	{rex2} jae    .+2-0x70
>>>>> +	{rex2} je     .+2-0x70
>>>>> +	{rex2} jne    .+2-0x70
>>>>> +	{rex2} jbe    .+2-0x70
>>>>> +	{rex2} ja     .+2-0x70
>>>>> +	{rex2} js     .+2-0x70
>>>>> +	{rex2} jns    .+2-0x70
>>>>> +	{rex2} jp     .+2-0x70
>>>>> +	{rex2} jnp    .+2-0x70
>>>>> +	{rex2} jl     .+2-0x70
>>>>> +	{rex2} jge    .+2-0x70
>>>>> +	{rex2} jle    .+2-0x70
>>>>> +	{rex2} jg     .+2-0x70
>>
>> ... the disp32/disp16 forms of these branches, which are created only during
>> relaxation.
>>
> 
> Oh,  I see,  I found them in sdm and added testcase for them.
> 
>         #All opcodes in the row 0x8* (map1) prefixed REX2 are illegal.
>         {rex2} jo     .+6+0x90909090
>         {rex2} jno    .+6+0x90909090
>         {rex2} jb     .+6+0x90909090
>         {rex2} jae    .+6+0x90909090
>         {rex2} je     .+6+0x90909090
>         {rex2} jne    .+6+0x90909090
>         {rex2} jbe    .+6+0x90909090
>         {rex2} ja     .+6+0x90909090
>         {rex2} js     .+6+0x90909090
>         {rex2} jns    .+6+0x90909090
>         {rex2} jp     .+6+0x90909090
>         {rex2} jnp    .+6+0x90909090
>         {rex2} jl     .+6+0x90909090
>         {rex2} jge    .+6+0x90909090
>         {rex2} jle    .+6+0x90909090
>         {rex2} jg     .+6+0x90909090

I don't mind the addition, but I don't think this actually tests anything that
the other block didn't already test. Hence why I suggested to merely update
the comment there.

Jan
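
As an aside on the two rows discussed above: the short conditional jumps occupy opcode row 0x7* in map 0 (0x70 + cc, followed by an 8-bit displacement), while the near forms that gas creates during relaxation occupy row 0x8* in map 1 (0x0f, 0x80 + cc, followed by a 32-bit displacement in 64-bit mode). The C sketch below encodes both forms without any prefix. The names encode_jcc_short and encode_jcc_near are hypothetical helpers for illustration only; they are not functions from gas.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Condition codes in the order used by the test block:
   jo, jno, jb, jae, je, jne, jbe, ja, js, jns, jp, jnp, jl, jge, jle, jg.  */
enum jcc_cond
{
  CC_O, CC_NO, CC_B, CC_AE, CC_E, CC_NE, CC_BE, CC_A,
  CC_S, CC_NS, CC_P, CC_NP, CC_L, CC_GE, CC_LE, CC_G
};

/* Hypothetical helper: short form, map 0 row 0x7*.
   Writes 2 bytes to OUT and returns the length.  */
static size_t
encode_jcc_short (unsigned int cc, int8_t rel8, uint8_t *out)
{
  assert (cc < 16);
  out[0] = (uint8_t) (0x70 | cc);
  out[1] = (uint8_t) rel8;
  return 2;
}

/* Hypothetical helper: near form, map 1 row 0x8* (0x0f escape byte).
   Writes 6 bytes to OUT (little-endian rel32) and returns the length.  */
static size_t
encode_jcc_near (unsigned int cc, int32_t rel32, uint8_t *out)
{
  uint32_t u = (uint32_t) rel32;
  unsigned int n;

  assert (cc < 16);
  out[0] = 0x0f;
  out[1] = (uint8_t) (0x80 | cc);
  for (n = 0; n < 4; n++)
    out[2 + n] = (uint8_t) (u >> (8 * n));
  return 6;
}
```

In the test source, `.+2-0x70` names a target 0x70 bytes before the end of an unprefixed 2-byte short jump, so the displacement byte would be 0x90; `.+6` plays the same role for the 6-byte near form.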
  
Cui, Lili Dec. 8, 2023, 3:10 a.m. UTC | #6
> On 06.12.2023 13:43, Cui, Lili wrote:
> >> On 05.12.2023 14:31, Cui, Lili wrote:
> >>>> On 24.11.2023 08:02, Cui, Lili wrote:
> >>>>> +	#All opcodes in the row 0x7* prefixed REX2 are illegal.
> >>>>
> >>>> This also covers map 1 row 8, doesn't it?
> >>>>
> >>>
> >>> No, I didn't find 0xf8* in opcode table.
> >>
> >> Assuming (again) you mean 0x0f 0x8*, how did you not find it? Or
> >> wait, depends on what "opcode table" here means: The manual's or
> >> opcodes/i386-opc.tbl? The latter of course doesn't have them, as they're ...
> >>
> >>>>> +	{rex2} jo     .+2-0x70
> >>>>> +	{rex2} jno    .+2-0x70
> >>>>> +	{rex2} jb     .+2-0x70
> >>>>> +	{rex2} jae    .+2-0x70
> >>>>> +	{rex2} je     .+2-0x70
> >>>>> +	{rex2} jne    .+2-0x70
> >>>>> +	{rex2} jbe    .+2-0x70
> >>>>> +	{rex2} ja     .+2-0x70
> >>>>> +	{rex2} js     .+2-0x70
> >>>>> +	{rex2} jns    .+2-0x70
> >>>>> +	{rex2} jp     .+2-0x70
> >>>>> +	{rex2} jnp    .+2-0x70
> >>>>> +	{rex2} jl     .+2-0x70
> >>>>> +	{rex2} jge    .+2-0x70
> >>>>> +	{rex2} jle    .+2-0x70
> >>>>> +	{rex2} jg     .+2-0x70
> >>
> >> ... the disp32/disp16 forms of these branches, which are created only
> >> during relaxation.
> >>
> >
> > Oh, I see. I found them in the SDM and added test cases for them.
> >
> >         #All opcodes in the row 0x8* (map1) prefixed REX2 are illegal.
> >         {rex2} jo     .+6+0x90909090
> >         {rex2} jno    .+6+0x90909090
> >         {rex2} jb     .+6+0x90909090
> >         {rex2} jae    .+6+0x90909090
> >         {rex2} je     .+6+0x90909090
> >         {rex2} jne    .+6+0x90909090
> >         {rex2} jbe    .+6+0x90909090
> >         {rex2} ja     .+6+0x90909090
> >         {rex2} js     .+6+0x90909090
> >         {rex2} jns    .+6+0x90909090
> >         {rex2} jp     .+6+0x90909090
> >         {rex2} jnp    .+6+0x90909090
> >         {rex2} jl     .+6+0x90909090
> >         {rex2} jge    .+6+0x90909090
> >         {rex2} jle    .+6+0x90909090
> >         {rex2} jg     .+6+0x90909090
> 
> I don't mind the addition, but I don't think this actually tests anything that the
> other block didn't already test. Hence why I suggested to merely update the
> comment there.
> 

Moved the new test cases together with the 0x7* (map0) ones.

        #All opcodes in the row 0x7* (map0) and 0x8* (map1) prefixed REX2 are illegal.
        {rex2} jo     .+2-0x70
        {rex2} jno    .+2-0x70
        {rex2} jb     .+2-0x70
        {rex2} jae    .+2-0x70
        {rex2} je     .+2-0x70
        {rex2} jne    .+2-0x70
        {rex2} jbe    .+2-0x70
        {rex2} ja     .+2-0x70
        {rex2} js     .+2-0x70
        {rex2} jns    .+2-0x70
        {rex2} jp     .+2-0x70
        {rex2} jnp    .+2-0x70
        {rex2} jl     .+2-0x70
        {rex2} jge    .+2-0x70
        {rex2} jle    .+2-0x70
        {rex2} jg     .+2-0x70
        {rex2} jo     .+6+0x90909090
        {rex2} jno    .+6+0x90909090
        {rex2} jb     .+6+0x90909090
        {rex2} jae    .+6+0x90909090
        {rex2} je     .+6+0x90909090
        {rex2} jne    .+6+0x90909090
        {rex2} jbe    .+6+0x90909090
        {rex2} ja     .+6+0x90909090
        {rex2} js     .+6+0x90909090
        {rex2} jns    .+6+0x90909090
        {rex2} jp     .+6+0x90909090
        {rex2} jnp    .+6+0x90909090
        {rex2} jl     .+6+0x90909090
        {rex2} jge    .+6+0x90909090
        {rex2} jle    .+6+0x90909090
        {rex2} jg     .+6+0x90909090

Lili.
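
For orientation before reading the patch: build_rex2_prefix emits the byte 0xd5 followed by a payload byte laid out as M | R4 X4 B4 | W R X B. The C sketch below mirrors that composition as a standalone function. The names rex2_payload and BIT_* are hypothetical; they only stand in for gas's i.tm.opcode_space, i.rex2 and i.rex, and the sketch assumes the R/X/B/W bits use the same positions as in a legacy REX prefix.

```c
#include <assert.h>
#include <stdint.h>

/* Bit positions shared by the low nibble (W R3 X3 B3) and, before
   shifting, by the R4 X4 B4 group.  Hypothetical names.  */
enum
{
  BIT_B = 1,
  BIT_X = 2,
  BIT_R = 4,
  BIT_W = 8
};

/* Hypothetical helper: compose the second byte of a REX2 prefix.
   MAP is 0 or 1 (opcode map), HI holds R4/X4/B4 (registers r16-r31),
   LO holds W/R3/X3/B3 exactly as a legacy REX prefix would.  */
static uint8_t
rex2_payload (unsigned int map, unsigned int hi, unsigned int lo)
{
  assert (map <= 1);
  assert (hi <= (BIT_R | BIT_X | BIT_B));
  assert (lo <= (BIT_W | BIT_R | BIT_X | BIT_B));
  return (uint8_t) ((map << 7) | (hi << 4) | lo);
}
```

Checked against the expected dump in x86-64-apx-rex2.d: rex2_payload (0, BIT_B, BIT_B) gives 0x11, matching `d5 11 f6 c0 07` for `test $0x7,%r24b`; rex2_payload (1, BIT_R, 0) gives 0xc0, matching `d5 c0 af c0` for `imul %eax,%r16d`.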
  

Patch

diff --git a/gas/config/tc-i386.c b/gas/config/tc-i386.c
index 235e41e7918..638d3aa07c8 100644
--- a/gas/config/tc-i386.c
+++ b/gas/config/tc-i386.c
@@ -239,6 +239,7 @@  enum i386_error
     bad_imm4,
     unsupported_with_intel_mnemonic,
     unsupported_syntax,
+    unsupported_EGPR_for_addressing,
     unsupported,
     unsupported_on_arch,
     unsupported_64bit,
@@ -247,6 +248,7 @@  enum i386_error
     invalid_vector_register_set,
     invalid_tmm_register_set,
     invalid_dest_and_src_register_set,
+    invalid_pseudo_prefix,
     unsupported_vector_index_register,
     unsupported_broadcast,
     broadcast_needed,
@@ -354,6 +356,7 @@  struct _i386_insn
     modrm_byte rm;
     rex_byte rex;
     rex_byte vrex;
+    rex_byte rex2;
     sib_byte sib;
     vex_prefix vex;
 
@@ -427,6 +430,9 @@  struct _i386_insn
     /* Prefer the REX byte in encoding.  */
     bool rex_encoding;
 
+    /* Prefer the REX2 prefix in encoding.  */
+    bool rex2_encoding;
+
     /* Disable instruction size optimization.  */
     bool no_optimize;
 
@@ -1161,6 +1167,7 @@  static const arch_entry cpu_arch[] =
   SUBARCH (pbndkb, PBNDKB, PBNDKB, false),
   VECARCH (avx10.1, AVX10_1, ANY_AVX512F, set),
   SUBARCH (user_msr, USER_MSR, USER_MSR, false),
+  SUBARCH (apx_f, APX_F, APX_F, false),
 };
 
 #undef SUBARCH
@@ -1667,6 +1674,7 @@  _is_cpu (const i386_cpu_attr *a, enum i386_cpu cpu)
     case CpuHLE:      return a->bitfield.cpuhle;
     case CpuAVX512F:  return a->bitfield.cpuavx512f;
     case CpuAVX512VL: return a->bitfield.cpuavx512vl;
+    case CpuAPX_F:    return a->bitfield.cpuapx_f;
     case Cpu64:       return a->bitfield.cpu64;
     case CpuNo64:     return a->bitfield.cpuno64;
     default:
@@ -2338,7 +2346,7 @@  register_number (const reg_entry *r)
   if (r->reg_flags & RegRex)
     nr += 8;
 
-  if (r->reg_flags & RegVRex)
+  if (r->reg_flags & (RegVRex | RegRex2))
     nr += 16;
 
   return nr;
@@ -3865,6 +3873,12 @@  is_any_vex_encoding (const insn_template *t)
   return t->opcode_modifier.vex || t->opcode_modifier.evex;
 }
 
+static INLINE bool
+is_apx_rex2_encoding (void)
+{
+  return i.rex2 || i.rex2_encoding;
+}
+
 static unsigned int
 get_broadcast_bytes (const insn_template *t, bool diag)
 {
@@ -4120,6 +4134,21 @@  build_evex_prefix (void)
     i.vex.bytes[3] |= i.mask.reg->reg_num;
 }
 
+/* Build (2 bytes) rex2 prefix.
+   | D5h |
+   | m | R4 X4 B4 | W R X B |
+*/
+static void
+build_rex2_prefix (void)
+{
+  /* Rex2 reuses i.vex because they handle i.tm.opcode_space the same.  */
+  i.vex.length = 2;
+  i.vex.bytes[0] = 0xd5;
+  /* For the W R X B bits, the variables of rex prefix will be reused.  */
+  i.vex.bytes[1] = ((i.tm.opcode_space << 7)
+		    | (i.rex2 << 4) | i.rex);
+}
+
 static void
 process_immext (void)
 {
@@ -4385,12 +4414,16 @@  optimize_encoding (void)
 	  i.suffix = 0;
 	  /* Convert to byte registers.  */
 	  if (i.types[1].bitfield.word)
-	    j = 16;
-	  else if (i.types[1].bitfield.dword)
+	    /* There are 40 8-bit registers.  */
 	    j = 32;
+	  else if (i.types[1].bitfield.dword)
+	    /* 32 8-bit registers + 32 16-bit registers.  */
+	    j = 64;
 	  else
-	    j = 48;
-	  if (!(i.op[1].regs->reg_flags & RegRex) && base_regnum < 4)
+	    /* 32 8-bit registers + 32 16-bit registers
+	       + 32 32-bit registers.  */
+	    j = 96;
+	  if (!(i.op[1].regs->reg_flags & (RegRex | RegRex2)) && base_regnum < 4)
 	    j += 8;
 	  i.op[1].regs -= j;
 	}
@@ -5275,6 +5308,9 @@  md_assemble (char *line)
 	case unsupported_syntax:
 	  err_msg = _("unsupported syntax");
 	  break;
+	case unsupported_EGPR_for_addressing:
+	  err_msg = _("unsupported extended GPR for addressing");
+	  break;
 	case unsupported:
 	  as_bad (_("unsupported instruction `%s'"),
 		  pass1_mnem ? pass1_mnem : insn_name (current_templates->start));
@@ -5322,6 +5358,9 @@  md_assemble (char *line)
 	case invalid_dest_and_src_register_set:
 	  err_msg = _("destination and source registers must be distinct");
 	  break;
+	case invalid_pseudo_prefix:
+	  err_msg = _("rex2 pseudo prefix cannot be used here");
+	  break;
 	case unsupported_vector_index_register:
 	  err_msg = _("unsupported vector index register");
 	  break;
@@ -5576,6 +5615,13 @@  md_assemble (char *line)
 	  return;
 	}
 
+      /* Check for explicit REX2 prefix.  */
+      if (i.rex2_encoding)
+	{
+	  as_bad (_("REX2 prefix invalid with `%s'"), insn_name (&i.tm));
+	  return;
+	}
+
       if (i.tm.opcode_modifier.vex)
 	build_vex_prefix (t);
       else
@@ -5615,11 +5661,12 @@  md_assemble (char *line)
 	  && (i.op[1].regs->reg_flags & RegRex64) != 0)
       || (((i.types[0].bitfield.class == Reg && i.types[0].bitfield.byte)
 	   || (i.types[1].bitfield.class == Reg && i.types[1].bitfield.byte))
-	  && i.rex != 0))
+	  && (i.rex != 0 || i.rex2 != 0)))
     {
       int x;
 
-      i.rex |= REX_OPCODE;
+      if (!i.rex2)
+	i.rex |= REX_OPCODE;
       for (x = 0; x < 2; x++)
 	{
 	  /* Look for 8 bit operand that uses old registers.  */
@@ -5630,7 +5677,7 @@  md_assemble (char *line)
 	      /* In case it is "hi" register, give up.  */
 	      if (i.op[x].regs->reg_num > 3)
 		as_bad (_("can't encode register '%s%s' in an "
-			  "instruction requiring REX prefix."),
+			  "instruction requiring REX/REX2 prefix."),
 			register_prefix, i.op[x].regs->reg_name);
 
 	      /* Otherwise it is equivalent to the extended register.
@@ -5642,11 +5689,11 @@  md_assemble (char *line)
 	}
     }
 
-  if (i.rex == 0 && i.rex_encoding)
+  if ((i.rex == 0 && i.rex_encoding) || (i.rex2 == 0 && i.rex2_encoding))
     {
       /* Check if we can add a REX_OPCODE byte.  Look for 8 bit operand
 	 that uses legacy register.  If it is "hi" register, don't add
-	 the REX_OPCODE byte.  */
+	 rex and rex2 prefix.  */
       int x;
       for (x = 0; x < 2; x++)
 	if (i.types[x].bitfield.class == Reg
@@ -5656,6 +5703,7 @@  md_assemble (char *line)
 	  {
 	    gas_assert (!(i.op[x].regs->reg_flags & RegRex));
 	    i.rex_encoding = false;
+	    i.rex2_encoding = false;
 	    break;
 	  }
 
@@ -5663,7 +5711,13 @@  md_assemble (char *line)
 	i.rex = REX_OPCODE;
     }
 
-  if (i.rex != 0)
+  if (i.rex2 != 0 || i.rex2_encoding)
+    {
+      build_rex2_prefix ();
+      /* The individual REX.RXBW bits got consumed.  */
+      i.rex &= REX_OPCODE;
+    }
+  else if (i.rex != 0)
     add_prefix (REX_OPCODE | i.rex);
 
   insert_lfence_before ();
@@ -5834,6 +5888,10 @@  parse_insn (const char *line, char *mnemonic, bool prefix_only)
 		  /* {rex} */
 		  i.rex_encoding = true;
 		  break;
+		case Prefix_REX2:
+		  /* {rex2} */
+		  i.rex2_encoding = true;
+		  break;
 		case Prefix_NoOptimize:
 		  /* {nooptimize} */
 		  i.no_optimize = true;
@@ -6971,6 +7029,45 @@  VEX_check_encoding (const insn_template *t)
   return 0;
 }
 
+/* Check if EGPR operands are valid for the instruction.  */
+
+static int
+check_EgprOperands (const insn_template *t)
+{
+  if (!t->opcode_modifier.noegpr)
+    return 0;
+
+  for (unsigned int op = 0; op < i.operands; op++)
+    {
+      if (i.types[op].bitfield.class != Reg
+	  /* Special case for (%dx) while doing input/output op */
+	  || i.input_output_operand)
+	continue;
+
+      if (i.op[op].regs->reg_flags & RegRex2)
+	{
+	  i.error = register_type_mismatch;
+	  return 1;
+	}
+    }
+
+  if ((i.index_reg && (i.index_reg->reg_flags & RegRex2))
+      || (i.base_reg && (i.base_reg->reg_flags & RegRex2)))
+    {
+      i.error = unsupported_EGPR_for_addressing;
+      return 1;
+    }
+
+  /* Check if pseudo prefix {rex2} is valid.  */
+  if (i.rex2_encoding)
+    {
+      i.error = invalid_pseudo_prefix;
+      return 1;
+    }
+
+  return 0;
+}
+
 /* Helper function for the progress() macro in match_template().  */
 static INLINE enum i386_error progress (enum i386_error new,
 					enum i386_error last,
@@ -7107,7 +7204,9 @@  match_template (char mnem_suffix)
       /* Do not verify operands when there are none.  */
       if (!t->operands)
 	{
-	  if (VEX_check_encoding (t))
+	  /* When there are no operands, we still need to use the
+	     check_EgprOperands function to check whether {rex2} is valid.  */
+	  if (VEX_check_encoding (t) || check_EgprOperands (t))
 	    {
 	      specific_error = progress (i.error);
 	      continue;
@@ -7443,6 +7542,13 @@  match_template (char mnem_suffix)
 	  continue;
 	}
 
+      /* Check if EGPR operands (r16-r31) are valid.  */
+      if (check_EgprOperands (t))
+	{
+	  specific_error = progress (i.error);
+	  continue;
+	}
+
       /* Check whether to use the shorter VEX encoding for certain insns where
 	 the EVEX enconding comes first in the table.  This requires the respective
 	 AVX-* feature to be explicitly enabled.  */
@@ -8340,6 +8446,18 @@  static INLINE void set_rex_vrex (const reg_entry *r, unsigned int rex_bit,
 
   if (r->reg_flags & RegVRex)
     i.vrex |= rex_bit;
+
+  if (r->reg_flags & RegRex2)
+    i.rex2 |= rex_bit;
+}
+
+static INLINE void
+set_rex_rex2 (const reg_entry *r, unsigned int rex_bit)
+{
+  if ((r->reg_flags & RegRex) != 0)
+    i.rex |= rex_bit;
+  if ((r->reg_flags & RegRex2) != 0)
+    i.rex2 |= rex_bit;
 }
 
 static int
@@ -8823,8 +8941,7 @@  build_modrm_byte (void)
 		  i.rm.regmem = ESCAPE_TO_TWO_BYTE_ADDRESSING;
 		  i.types[op] = operand_type_and_not (i.types[op], anydisp);
 		  i.types[op].bitfield.disp32 = 1;
-		  if ((i.index_reg->reg_flags & RegRex) != 0)
-		    i.rex |= REX_X;
+		  set_rex_rex2 (i.index_reg, REX_X);
 		}
 	    }
 	  /* RIP addressing for 64bit mode.  */
@@ -8895,8 +9012,7 @@  build_modrm_byte (void)
 
 	      if (!i.tm.opcode_modifier.sib)
 		i.rm.regmem = i.base_reg->reg_num;
-	      if ((i.base_reg->reg_flags & RegRex) != 0)
-		i.rex |= REX_B;
+	      set_rex_rex2 (i.base_reg, REX_B);
 	      i.sib.base = i.base_reg->reg_num;
 	      /* x86-64 ignores REX prefix bit here to avoid decoder
 		 complications.  */
@@ -8934,8 +9050,7 @@  build_modrm_byte (void)
 		  else
 		    i.sib.index = i.index_reg->reg_num;
 		  i.rm.regmem = ESCAPE_TO_TWO_BYTE_ADDRESSING;
-		  if ((i.index_reg->reg_flags & RegRex) != 0)
-		    i.rex |= REX_X;
+		  set_rex_rex2 (i.index_reg, REX_X);
 		}
 
 	      if (i.disp_operands
@@ -10080,6 +10195,12 @@  output_insn (void)
 	  for (j = ARRAY_SIZE (i.prefix), q = i.prefix; j > 0; j--, q++)
 	    if (*q)
 	      frag_opcode_byte (*q);
+
+	  if (is_apx_rex2_encoding ())
+	    {
+	      frag_opcode_byte (i.vex.bytes[0]);
+	      frag_opcode_byte (i.vex.bytes[1]);
+	    }
 	}
       else
 	{
@@ -14107,6 +14228,13 @@  static bool check_register (const reg_entry *r)
 	i.vec_encoding = vex_encoding_error;
     }
 
+  if (r->reg_flags & RegRex2)
+    {
+      if (!cpu_arch_flags.bitfield.cpuapx_f
+	  || flag_code != CODE_64BIT)
+	return false;
+    }
+
   if (((r->reg_flags & (RegRex64 | RegRex)) || r->reg_type.bitfield.qword)
       && (!cpu_arch_flags.bitfield.cpu64
 	  || r->reg_type.bitfield.class != RegCR
diff --git a/gas/doc/c-i386.texi b/gas/doc/c-i386.texi
index 03ee980bef7..53fc6fd6899 100644
--- a/gas/doc/c-i386.texi
+++ b/gas/doc/c-i386.texi
@@ -217,6 +217,7 @@  accept various extension mnemonics.  For example,
 @code{avx10.1/256},
 @code{avx10.1/128},
 @code{user_msr},
+@code{apx_f},
 @code{amx_int8},
 @code{amx_bf16},
 @code{amx_fp16},
@@ -983,6 +984,9 @@  Different encoding options can be specified via pseudo prefixes:
 instructions (x86-64 only).  Note that this differs from the @samp{rex}
 prefix which generates REX prefix unconditionally.
 
+@item
+@samp{@{rex2@}} -- encode with REX2 prefix
+
 @item
 @samp{@{nooptimize@}} -- disable instruction size optimization.
 @end itemize
@@ -1663,7 +1667,7 @@  supported on the CPU specified.  The choices for @var{cpu_type} are:
 @item @samp{.lwp} @tab @samp{.fma4} @tab @samp{.xop} @tab @samp{.cx16}
 @item @samp{.padlock} @tab @samp{.clzero} @tab @samp{.mwaitx} @tab @samp{.rdpru}
 @item @samp{.mcommit} @tab @samp{.sev_es} @tab @samp{.snp} @tab @samp{.invlpgb}
-@item @samp{.tlbsync}
+@item @samp{.tlbsync} @tab @samp{.apx_f}
 @end multitable
 
 Apart from the warning, there are only two other effects on
diff --git a/gas/testsuite/gas/i386/ilp32/x86-64-opcode-inval-intel.d b/gas/testsuite/gas/i386/ilp32/x86-64-opcode-inval-intel.d
index a2b09d2e74f..56834371133 100644
--- a/gas/testsuite/gas/i386/ilp32/x86-64-opcode-inval-intel.d
+++ b/gas/testsuite/gas/i386/ilp32/x86-64-opcode-inval-intel.d
@@ -2,49 +2,4 @@ 
 #as: --32
 #objdump: -dw -Mx86-64 -Mintel
 #name: x86-64 (ILP32) illegal opcodes (Intel mode)
-
-.*: +file format .*
-
-Disassembly of section .text:
-
-0+ <aaa>:
-[ 	]*[a-f0-9]+:	37                   	\(bad\)
-
-0+1 <aad0>:
-[ 	]*[a-f0-9]+:	d5                   	\(bad\)
-[ 	]*[a-f0-9]+:	0a                   	.byte 0xa
-
-0+3 <aad1>:
-[ 	]*[a-f0-9]+:	d5                   	\(bad\)
-[ 	]*[a-f0-9]+:	02                   	.byte 0x2
-
-0+5 <aam0>:
-[ 	]*[a-f0-9]+:	d4                   	\(bad\)
-[ 	]*[a-f0-9]+:	0a                   	.byte 0xa
-
-0+7 <aam1>:
-[ 	]*[a-f0-9]+:	d4                   	\(bad\)
-[ 	]*[a-f0-9]+:	02                   	.byte 0x2
-
-0+9 <aas>:
-[ 	]*[a-f0-9]+:	3f                   	\(bad\)
-
-0+a <bound>:
-[ 	]*[a-f0-9]+:	62                   	.byte 0x62
-[ 	]*[a-f0-9]+:	10                   	.byte 0x10
-
-0+c <daa>:
-[ 	]*[a-f0-9]+:	27                   	\(bad\)
-
-0+d <das>:
-[ 	]*[a-f0-9]+:	2f                   	\(bad\)
-
-0+e <into>:
-[ 	]*[a-f0-9]+:	ce                   	\(bad\)
-
-0+f <pusha>:
-[ 	]*[a-f0-9]+:	60                   	\(bad\)
-
-0+10 <popa>:
-[ 	]*[a-f0-9]+:	61                   	\(bad\)
-#pass
+#dump: ../x86-64-opcode-inval-intel.d
diff --git a/gas/testsuite/gas/i386/ilp32/x86-64-opcode-inval.d b/gas/testsuite/gas/i386/ilp32/x86-64-opcode-inval.d
index 5a17b0b412e..b5233a5cf93 100644
--- a/gas/testsuite/gas/i386/ilp32/x86-64-opcode-inval.d
+++ b/gas/testsuite/gas/i386/ilp32/x86-64-opcode-inval.d
@@ -2,49 +2,4 @@ 
 #as: --32
 #objdump: -dw -Mx86-64
 #name: x86-64 (ILP32) illegal opcodes
-
-.*: +file format .*
-
-Disassembly of section .text:
-
-0+ <aaa>:
-[ 	]*[a-f0-9]+:	37                   	\(bad\)
-
-0+1 <aad0>:
-[ 	]*[a-f0-9]+:	d5                   	\(bad\)
-[ 	]*[a-f0-9]+:	0a                   	.byte 0xa
-
-0+3 <aad1>:
-[ 	]*[a-f0-9]+:	d5                   	\(bad\)
-[ 	]*[a-f0-9]+:	02                   	.byte 0x2
-
-0+5 <aam0>:
-[ 	]*[a-f0-9]+:	d4                   	\(bad\)
-[ 	]*[a-f0-9]+:	0a                   	.byte 0xa
-
-0+7 <aam1>:
-[ 	]*[a-f0-9]+:	d4                   	\(bad\)
-[ 	]*[a-f0-9]+:	02                   	.byte 0x2
-
-0+9 <aas>:
-[ 	]*[a-f0-9]+:	3f                   	\(bad\)
-
-0+a <bound>:
-[ 	]*[a-f0-9]+:	62                   	.byte 0x62
-[ 	]*[a-f0-9]+:	10                   	.byte 0x10
-
-0+c <daa>:
-[ 	]*[a-f0-9]+:	27                   	\(bad\)
-
-0+d <das>:
-[ 	]*[a-f0-9]+:	2f                   	\(bad\)
-
-0+e <into>:
-[ 	]*[a-f0-9]+:	ce                   	\(bad\)
-
-0+f <pusha>:
-[ 	]*[a-f0-9]+:	60                   	\(bad\)
-
-0+10 <popa>:
-[ 	]*[a-f0-9]+:	61                   	\(bad\)
-#pass
+#dump: ../x86-64-opcode-inval.d
diff --git a/gas/testsuite/gas/i386/x86-64-apx-egpr-inval.l b/gas/testsuite/gas/i386/x86-64-apx-egpr-inval.l
new file mode 100644
index 00000000000..0aa079ca29c
--- /dev/null
+++ b/gas/testsuite/gas/i386/x86-64-apx-egpr-inval.l
@@ -0,0 +1,15 @@ 
+.*: Assembler messages:
+.*:4: Error: bad register name `%r17d'
+.*:7: Error: unsupported extended GPR for addressing for `xsave'
+.*:8: Error: unsupported extended GPR for addressing for `xsave64'
+.*:9: Error: unsupported extended GPR for addressing for `xrstor'
+.*:10: Error: unsupported extended GPR for addressing for `xrstor64'
+.*:11: Error: unsupported extended GPR for addressing for `xsaves'
+.*:12: Error: unsupported extended GPR for addressing for `xsaves64'
+.*:13: Error: unsupported extended GPR for addressing for `xrstors'
+.*:14: Error: unsupported extended GPR for addressing for `xrstors64'
+.*:15: Error: unsupported extended GPR for addressing for `xsaveopt'
+.*:16: Error: unsupported extended GPR for addressing for `xsaveopt64'
+.*:17: Error: unsupported extended GPR for addressing for `xsavec'
+.*:18: Error: unsupported extended GPR for addressing for `xsavec64'
+#pass
diff --git a/gas/testsuite/gas/i386/x86-64-apx-egpr-inval.s b/gas/testsuite/gas/i386/x86-64-apx-egpr-inval.s
new file mode 100644
index 00000000000..c4d2308a604
--- /dev/null
+++ b/gas/testsuite/gas/i386/x86-64-apx-egpr-inval.s
@@ -0,0 +1,18 @@ 
+# Check Illegal 64bit APX_F instructions
+	.text
+	.arch .noapx_f
+	test    $0x7, %r17d
+	.arch .apx_f
+	test    $0x7, %r17d
+	xsave (%r16, %rbx)
+	xsave64 (%r16, %r31)
+	xrstor (%r16, %rbx)
+	xrstor64 (%r16, %rbx)
+	xsaves (%rbx, %r16)
+	xsaves64 (%r16, %rbx)
+	xrstors (%rbx, %r31)
+	xrstors64 (%r16, %rbx)
+	xsaveopt (%r16, %rbx)
+	xsaveopt64 (%r16, %r31)
+	xsavec (%r16, %rbx)
+	xsavec64 (%r16, %r31)
diff --git a/gas/testsuite/gas/i386/x86-64-apx-rex2.d b/gas/testsuite/gas/i386/x86-64-apx-rex2.d
new file mode 100644
index 00000000000..e3cd534da11
--- /dev/null
+++ b/gas/testsuite/gas/i386/x86-64-apx-rex2.d
@@ -0,0 +1,83 @@ 
+#as:
+#objdump: -dw
+#name: x86-64 APX_F use gpr32 with rex2 prefix
+#source: x86-64-apx-rex2.s
+
+.*: +file format .*
+
+
+Disassembly of section .text:
+
+0+ <_start>:
+[	 ]*[a-f0-9]+:[	 ]*d5 11 f6 c0 07[	 ]+test   \$0x7,%r24b
+[	 ]*[a-f0-9]+:[	 ]*d5 11 f7 c0 07 00 00 00[	 ]+test   \$0x7,%r24d
+[	 ]*[a-f0-9]+:[	 ]*d5 19 f7 c0 07 00 00 00[	 ]+test   \$0x7,%r24
+[	 ]*[a-f0-9]+:[	 ]*66 d5 11 f7 c0 07 00[	 ]+test   \$0x7,%r24w
+[	 ]*[a-f0-9]+:[	 ]*44 0f af f8[	 ]+imul   %eax,%r15d
+[	 ]*[a-f0-9]+:[	 ]*d5 c0 af c0[	 ]+imul   %eax,%r16d
+[	 ]*[a-f0-9]+:[	 ]*d5 90 62 12[	 ]+punpckldq %mm2,\(%r18\)
+[	 ]*[a-f0-9]+:[	 ]*d5 40 8d 00[	 ]+lea    \(%rax\),%r16d
+[	 ]*[a-f0-9]+:[	 ]*d5 40 8d 08[	 ]+lea    \(%rax\),%r17d
+[	 ]*[a-f0-9]+:[	 ]*d5 40 8d 10[	 ]+lea    \(%rax\),%r18d
+[	 ]*[a-f0-9]+:[	 ]*d5 40 8d 18[	 ]+lea    \(%rax\),%r19d
+[	 ]*[a-f0-9]+:[	 ]*d5 40 8d 20[	 ]+lea    \(%rax\),%r20d
+[	 ]*[a-f0-9]+:[	 ]*d5 40 8d 28[	 ]+lea    \(%rax\),%r21d
+[	 ]*[a-f0-9]+:[	 ]*d5 40 8d 30[	 ]+lea    \(%rax\),%r22d
+[	 ]*[a-f0-9]+:[	 ]*d5 40 8d 38[	 ]+lea    \(%rax\),%r23d
+[	 ]*[a-f0-9]+:[	 ]*d5 44 8d 00[	 ]+lea    \(%rax\),%r24d
+[	 ]*[a-f0-9]+:[	 ]*d5 44 8d 08[	 ]+lea    \(%rax\),%r25d
+[	 ]*[a-f0-9]+:[	 ]*d5 44 8d 10[	 ]+lea    \(%rax\),%r26d
+[	 ]*[a-f0-9]+:[	 ]*d5 44 8d 18[	 ]+lea    \(%rax\),%r27d
+[	 ]*[a-f0-9]+:[	 ]*d5 44 8d 20[	 ]+lea    \(%rax\),%r28d
+[	 ]*[a-f0-9]+:[	 ]*d5 44 8d 28[	 ]+lea    \(%rax\),%r29d
+[	 ]*[a-f0-9]+:[	 ]*d5 44 8d 30[	 ]+lea    \(%rax\),%r30d
+[	 ]*[a-f0-9]+:[	 ]*d5 44 8d 38[	 ]+lea    \(%rax\),%r31d
+[	 ]*[a-f0-9]+:[	 ]*d5 20 8d 04 05 00 00 00 00[	 ]+lea    0x0\(,%r16,1\),%eax
+[	 ]*[a-f0-9]+:[	 ]*d5 20 8d 04 0d 00 00 00 00[	 ]+lea    0x0\(,%r17,1\),%eax
+[	 ]*[a-f0-9]+:[	 ]*d5 20 8d 04 15 00 00 00 00[	 ]+lea    0x0\(,%r18,1\),%eax
+[	 ]*[a-f0-9]+:[	 ]*d5 20 8d 04 1d 00 00 00 00[	 ]+lea    0x0\(,%r19,1\),%eax
+[	 ]*[a-f0-9]+:[	 ]*d5 20 8d 04 25 00 00 00 00[	 ]+lea    0x0\(,%r20,1\),%eax
+[	 ]*[a-f0-9]+:[	 ]*d5 20 8d 04 2d 00 00 00 00[	 ]+lea    0x0\(,%r21,1\),%eax
+[	 ]*[a-f0-9]+:[	 ]*d5 20 8d 04 35 00 00 00 00[	 ]+lea    0x0\(,%r22,1\),%eax
+[	 ]*[a-f0-9]+:[	 ]*d5 20 8d 04 3d 00 00 00 00[	 ]+lea    0x0\(,%r23,1\),%eax
+[	 ]*[a-f0-9]+:[	 ]*d5 22 8d 04 05 00 00 00 00[	 ]+lea    0x0\(,%r24,1\),%eax
+[	 ]*[a-f0-9]+:[	 ]*d5 22 8d 04 0d 00 00 00 00[	 ]+lea    0x0\(,%r25,1\),%eax
+[	 ]*[a-f0-9]+:[	 ]*d5 22 8d 04 15 00 00 00 00[	 ]+lea    0x0\(,%r26,1\),%eax
+[	 ]*[a-f0-9]+:[	 ]*d5 22 8d 04 1d 00 00 00 00[	 ]+lea    0x0\(,%r27,1\),%eax
+[	 ]*[a-f0-9]+:[	 ]*d5 22 8d 04 25 00 00 00 00[	 ]+lea    0x0\(,%r28,1\),%eax
+[	 ]*[a-f0-9]+:[	 ]*d5 22 8d 04 2d 00 00 00 00[	 ]+lea    0x0\(,%r29,1\),%eax
+[	 ]*[a-f0-9]+:[	 ]*d5 22 8d 04 35 00 00 00 00[	 ]+lea    0x0\(,%r30,1\),%eax
+[	 ]*[a-f0-9]+:[	 ]*d5 22 8d 04 3d 00 00 00 00[	 ]+lea    0x0\(,%r31,1\),%eax
+[	 ]*[a-f0-9]+:[	 ]*d5 10 8d 00[	 ]+lea    \(%r16\),%eax
+[	 ]*[a-f0-9]+:[	 ]*d5 10 8d 01[	 ]+lea    \(%r17\),%eax
+[	 ]*[a-f0-9]+:[	 ]*d5 10 8d 02[	 ]+lea    \(%r18\),%eax
+[	 ]*[a-f0-9]+:[	 ]*d5 10 8d 03[	 ]+lea    \(%r19\),%eax
+[	 ]*[a-f0-9]+:[	 ]*d5 10 8d 04 24       	lea    \(%r20\),%eax
+[	 ]*[a-f0-9]+:[	 ]*d5 10 8d 45 00       	lea    0x0\(%r21\),%eax
+[	 ]*[a-f0-9]+:[	 ]*d5 10 8d 06[	 ]+lea    \(%r22\),%eax
+[	 ]*[a-f0-9]+:[	 ]*d5 10 8d 07[	 ]+lea    \(%r23\),%eax
+[	 ]*[a-f0-9]+:[	 ]*d5 11 8d 00[	 ]+lea    \(%r24\),%eax
+[	 ]*[a-f0-9]+:[	 ]*d5 11 8d 01[	 ]+lea    \(%r25\),%eax
+[	 ]*[a-f0-9]+:[	 ]*d5 11 8d 02[	 ]+lea    \(%r26\),%eax
+[	 ]*[a-f0-9]+:[	 ]*d5 11 8d 03[	 ]+lea    \(%r27\),%eax
+[	 ]*[a-f0-9]+:[	 ]*d5 11 8d 04 24       	lea    \(%r28\),%eax
+[	 ]*[a-f0-9]+:[	 ]*d5 11 8d 45 00       	lea    0x0\(%r29\),%eax
+[	 ]*[a-f0-9]+:[	 ]*d5 11 8d 06          	lea    \(%r30\),%eax
+[	 ]*[a-f0-9]+:[	 ]*d5 11 8d 07          	lea    \(%r31\),%eax
+[	 ]*[a-f0-9]+:[	 ]*4c 8d 38             	lea    \(%rax\),%r15
+[	 ]*[a-f0-9]+:[	 ]*d5 48 8d 00          	lea    \(%rax\),%r16
+[	 ]*[a-f0-9]+:[	 ]*49 8d 07             	lea    \(%r15\),%rax
+[	 ]*[a-f0-9]+:[	 ]*d5 18 8d 00          	lea    \(%r16\),%rax
+[	 ]*[a-f0-9]+:[	 ]*4a 8d 04 3d 00 00 00 00 	lea    0x0\(,%r15,1\),%rax
+[	 ]*[a-f0-9]+:[	 ]*d5 28 8d 04 05 00 00 00 00 	lea    0x0\(,%r16,1\),%rax
+[	 ]*[a-f0-9]+:[	 ]*d5 1c 03 00          	add    \(%r16\),%r8
+[	 ]*[a-f0-9]+:[	 ]*d5 1c 03 38          	add    \(%r16\),%r15
+[	 ]*[a-f0-9]+:[	 ]*d5 4a 8b 04 0d 00 00 00 00 	mov    0x0\(,%r9,1\),%r16
+[	 ]*[a-f0-9]+:[	 ]*d5 4a 8b 04 35 00 00 00 00 	mov    0x0\(,%r14,1\),%r16
+[	 ]*[a-f0-9]+:[	 ]*d5 4d 2b 3a          	sub    \(%r10\),%r31
+[	 ]*[a-f0-9]+:[	 ]*d5 4d 2b 7d 00       	sub    0x0\(%r13\),%r31
+[	 ]*[a-f0-9]+:[	 ]*d5 30 8d 44 20 01    	lea    0x1\(%r16,%r20,1\),%eax
+[	 ]*[a-f0-9]+:[	 ]*d5 76 8d 7c 20 01    	lea    0x1\(%r16,%r28,1\),%r31d
+[	 ]*[a-f0-9]+:[	 ]*d5 12 8d 84 04 81 00 00 00 	lea    0x81\(%r20,%r8,1\),%eax
+[	 ]*[a-f0-9]+:[	 ]*d5 57 8d bc 04 81 00 00 00 	lea    0x81\(%r28,%r8,1\),%r31d
+#pass
diff --git a/gas/testsuite/gas/i386/x86-64-apx-rex2.s b/gas/testsuite/gas/i386/x86-64-apx-rex2.s
new file mode 100644
index 00000000000..543f0f573d4
--- /dev/null
+++ b/gas/testsuite/gas/i386/x86-64-apx-rex2.s
@@ -0,0 +1,86 @@ 
+# Check 64bit instructions with rex2 prefix encoding
+
+	.allow_index_reg
+	.text
+_start:
+         test	$0x7, %r24b
+         test	$0x7, %r24d
+         test	$0x7, %r24
+         test	$0x7, %r24w
+## REX2.M bit
+         imull	%eax, %r15d
+         imull	%eax, %r16d
+         punpckldq (%r18), %mm2
+## REX2.R4 bit
+         leal	(%rax), %r16d
+         leal	(%rax), %r17d
+         leal	(%rax), %r18d
+         leal	(%rax), %r19d
+         leal	(%rax), %r20d
+         leal	(%rax), %r21d
+         leal	(%rax), %r22d
+         leal	(%rax), %r23d
+         leal	(%rax), %r24d
+         leal	(%rax), %r25d
+         leal	(%rax), %r26d
+         leal	(%rax), %r27d
+         leal	(%rax), %r28d
+         leal	(%rax), %r29d
+         leal	(%rax), %r30d
+         leal	(%rax), %r31d
+## REX2.X4 bit
+         leal	(,%r16), %eax
+         leal	(,%r17), %eax
+         leal	(,%r18), %eax
+         leal	(,%r19), %eax
+         leal	(,%r20), %eax
+         leal	(,%r21), %eax
+         leal	(,%r22), %eax
+         leal	(,%r23), %eax
+         leal	(,%r24), %eax
+         leal	(,%r25), %eax
+         leal	(,%r26), %eax
+         leal	(,%r27), %eax
+         leal	(,%r28), %eax
+         leal	(,%r29), %eax
+         leal	(,%r30), %eax
+         leal	(,%r31), %eax
+## REX.B4 bit
+         leal	(%r16), %eax
+         leal	(%r17), %eax
+         leal	(%r18), %eax
+         leal	(%r19), %eax
+         leal	(%r20), %eax
+         leal	(%r21), %eax
+         leal	(%r22), %eax
+         leal	(%r23), %eax
+         leal	(%r24), %eax
+         leal	(%r25), %eax
+         leal	(%r26), %eax
+         leal	(%r27), %eax
+         leal	(%r28), %eax
+         leal	(%r29), %eax
+         leal	(%r30), %eax
+         leal	(%r31), %eax
+## REX.W bit
+         leaq	(%rax), %r15
+         leaq	(%rax), %r16
+         leaq	(%r15), %rax
+         leaq	(%r16), %rax
+         leaq	(,%r15), %rax
+         leaq	(,%r16), %rax
+## REX.R3 bit
+         add    (%r16), %r8
+         add    (%r16), %r15
+## REX.X3 bit
+         mov    (,%r9), %r16
+         mov    (,%r14), %r16
+## REX.B3 bit
+	 sub   (%r10), %r31
+	 sub   (%r13), %r31
+
+## SIB
+         leal	1(%r16, %r20), %eax
+         leal	1(%r16, %r28), %r31d
+         leal	129(%r20, %r8), %eax
+         leal	129(%r28, %r8), %r31d
diff --git a/gas/testsuite/gas/i386/x86-64-inval-pseudo.l b/gas/testsuite/gas/i386/x86-64-inval-pseudo.l
index 13ad0fb768f..256e1b9a370 100644
--- a/gas/testsuite/gas/i386/x86-64-inval-pseudo.l
+++ b/gas/testsuite/gas/i386/x86-64-inval-pseudo.l
@@ -1,10 +1,16 @@ 
 .*: Assembler messages:
 .*:2: Error: .*
 .*:3: Error: .*
+.*:6: Error: .*
+.*:7: Error: .*
 GAS LISTING .*
 
 
 [ 	]*1[ 	]+\.text
 [ 	]*2[ 	]+\{disp16\} movb \(%ebp\),%al
 [ 	]*3[ 	]+\{disp16\} movb \(%rbp\),%al
+[ 	]*4[ 	]+
+[ 	]*5[ 	]+.*
+[ 	]*6[ 	]+\{rex2\} xsave \(%r15, %rbx\)
+[ 	]*7[ 	]+\{rex2\} xsave64 \(%r15, %rbx\)
 #...
diff --git a/gas/testsuite/gas/i386/x86-64-inval-pseudo.s b/gas/testsuite/gas/i386/x86-64-inval-pseudo.s
index c10b14c2099..ae30476e500 100644
--- a/gas/testsuite/gas/i386/x86-64-inval-pseudo.s
+++ b/gas/testsuite/gas/i386/x86-64-inval-pseudo.s
@@ -1,4 +1,8 @@ 
 	.text
 	{disp16} movb (%ebp),%al
 	{disp16} movb (%rbp),%al
+
+	/* Instructions that do not support APX.  */
+	{rex2} xsave (%r15, %rbx)
+	{rex2} xsave64 (%r15, %rbx)
 	.p2align 4,0
diff --git a/gas/testsuite/gas/i386/x86-64-opcode-inval-intel.d b/gas/testsuite/gas/i386/x86-64-opcode-inval-intel.d
index 6ee5b2f95ce..66c4d2cddc0 100644
--- a/gas/testsuite/gas/i386/x86-64-opcode-inval-intel.d
+++ b/gas/testsuite/gas/i386/x86-64-opcode-inval-intel.d
@@ -10,41 +10,33 @@  Disassembly of section .text:
 0+ <aaa>:
 [ 	]*[a-f0-9]+:	37                   	\(bad\)
 
-0+1 <aad0>:
-[ 	]*[a-f0-9]+:	d5                   	\(bad\)
-[ 	]*[a-f0-9]+:	0a                   	.byte 0xa
-
-0+3 <aad1>:
-[ 	]*[a-f0-9]+:	d5                   	\(bad\)
-[ 	]*[a-f0-9]+:	02                   	.byte 0x2
-
-0+5 <aam0>:
+0+1 <aam0>:
 [ 	]*[a-f0-9]+:	d4                   	\(bad\)
 [ 	]*[a-f0-9]+:	0a                   	.byte 0xa
 
-0+7 <aam1>:
+0+3 <aam1>:
 [ 	]*[a-f0-9]+:	d4                   	\(bad\)
 [ 	]*[a-f0-9]+:	02                   	.byte 0x2
 
-0+9 <aas>:
+0+5 <aas>:
 [ 	]*[a-f0-9]+:	3f                   	\(bad\)
 
-0+a <bound>:
+0+6 <bound>:
 [ 	]*[a-f0-9]+:	62                   	.byte 0x62
 [ 	]*[a-f0-9]+:	10                   	.byte 0x10
 
-0+c <daa>:
+0+8 <daa>:
 [ 	]*[a-f0-9]+:	27                   	\(bad\)
 
-0+d <das>:
+0+9 <das>:
 [ 	]*[a-f0-9]+:	2f                   	\(bad\)
 
-0+e <into>:
+0+a <into>:
 [ 	]*[a-f0-9]+:	ce                   	\(bad\)
 
-0+f <pusha>:
+0+b <pusha>:
 [ 	]*[a-f0-9]+:	60                   	\(bad\)
 
-0+10 <popa>:
+0+c <popa>:
 [ 	]*[a-f0-9]+:	61                   	\(bad\)
 #pass
diff --git a/gas/testsuite/gas/i386/x86-64-opcode-inval.d b/gas/testsuite/gas/i386/x86-64-opcode-inval.d
index 12f02c1766c..fbb850b56da 100644
--- a/gas/testsuite/gas/i386/x86-64-opcode-inval.d
+++ b/gas/testsuite/gas/i386/x86-64-opcode-inval.d
@@ -9,41 +9,33 @@  Disassembly of section .text:
 0+ <aaa>:
 [ 	]*[a-f0-9]+:	37                   	\(bad\)
 
-0+1 <aad0>:
-[ 	]*[a-f0-9]+:	d5                   	\(bad\)
-[ 	]*[a-f0-9]+:	0a                   	.byte 0xa
-
-0+3 <aad1>:
-[ 	]*[a-f0-9]+:	d5                   	\(bad\)
-[ 	]*[a-f0-9]+:	02                   	.byte 0x2
-
-0+5 <aam0>:
+0+1 <aam0>:
 [ 	]*[a-f0-9]+:	d4                   	\(bad\)
 [ 	]*[a-f0-9]+:	0a                   	.byte 0xa
 
-0+7 <aam1>:
+0+3 <aam1>:
 [ 	]*[a-f0-9]+:	d4                   	\(bad\)
 [ 	]*[a-f0-9]+:	02                   	.byte 0x2
 
-0+9 <aas>:
+0+5 <aas>:
 [ 	]*[a-f0-9]+:	3f                   	\(bad\)
 
-0+a <bound>:
+0+6 <bound>:
 [ 	]*[a-f0-9]+:	62                   	.byte 0x62
 [ 	]*[a-f0-9]+:	10                   	.byte 0x10
 
-0+c <daa>:
+0+8 <daa>:
 [ 	]*[a-f0-9]+:	27                   	\(bad\)
 
-0+d <das>:
+0+9 <das>:
 [ 	]*[a-f0-9]+:	2f                   	\(bad\)
 
-0+e <into>:
+0+a <into>:
 [ 	]*[a-f0-9]+:	ce                   	\(bad\)
 
-0+f <pusha>:
+0+b <pusha>:
 [ 	]*[a-f0-9]+:	60                   	\(bad\)
 
-0+10 <popa>:
+0+c <popa>:
 [ 	]*[a-f0-9]+:	61                   	\(bad\)
 #pass
diff --git a/gas/testsuite/gas/i386/x86-64-opcode-inval.s b/gas/testsuite/gas/i386/x86-64-opcode-inval.s
index 6cbfe7705a8..fbcda3df773 100644
--- a/gas/testsuite/gas/i386/x86-64-opcode-inval.s
+++ b/gas/testsuite/gas/i386/x86-64-opcode-inval.s
@@ -2,10 +2,6 @@ 
 # All the followings are illegal opcodes for x86-64.
 aaa:
 	aaa
-aad0:
-	aad
-aad1:
-	aad $2
 aam0:
 	aam
 aam1:
diff --git a/gas/testsuite/gas/i386/x86-64-pseudos-bad.l b/gas/testsuite/gas/i386/x86-64-pseudos-bad.l
index 3f9f67fcf4b..7e8c04d970b 100644
--- a/gas/testsuite/gas/i386/x86-64-pseudos-bad.l
+++ b/gas/testsuite/gas/i386/x86-64-pseudos-bad.l
@@ -1,6 +1,55 @@ 
 .*: Assembler messages:
-.*:3: Error: .*`vmovaps'.*
-.*:4: Error: .*`vmovaps'.*
-.*:5: Error: .*`vmovaps'.*
-.*:6: Error: .*`vmovaps'.*
-.*:7: Error: .*`rorx'.*
+.*:[0-9]+: Error: .*`vmovaps'.*
+.*:[0-9]+: Error: .*`vmovaps'.*
+.*:[0-9]+: Error: .*`vmovaps'.*
+.*:[0-9]+: Error: .*`vmovaps'.*
+.*:[0-9]+: Error: .*`rorx'.*
+.*:[0-9]+: Error: .*`vmovaps'.*
+.*:[0-9]+: Error: .*`xsave'.*
+.*:[0-9]+: Error: .*`xsaves'.*
+.*:[0-9]+: Error: .*`xsaves64'.*
+.*:[0-9]+: Error: .*`xsavec'.*
+.*:[0-9]+: Error: .*`xrstors'.*
+.*:[0-9]+: Error: .*`xrstors64'.*
+.*:[0-9]+: Error: .*`mov'.*
+.*:[0-9]+: Error: .*`movabs'.*
+.*:[0-9]+: Error: .*`cmps'.*
+.*:[0-9]+: Error: .*`lods'.*
+.*:[0-9]+: Error: .*`lods'.*
+.*:[0-9]+: Error: .*`lods'.*
+.*:[0-9]+: Error: .*`movs'.*
+.*:[0-9]+: Error: .*`movs'.*
+.*:[0-9]+: Error: .*`scas'.*
+.*:[0-9]+: Error: .*`scas'.*
+.*:[0-9]+: Error: .*`scas'.*
+.*:[0-9]+: Error: .*`stos'.*
+.*:[0-9]+: Error: .*`stos'.*
+.*:[0-9]+: Error: .*`stos'.*
+.*:[0-9]+: Error: .*`jo'.*
+.*:[0-9]+: Error: .*`jno'.*
+.*:[0-9]+: Error: .*`jb'.*
+.*:[0-9]+: Error: .*`jae'.*
+.*:[0-9]+: Error: .*`je'.*
+.*:[0-9]+: Error: .*`jne'.*
+.*:[0-9]+: Error: .*`jbe'.*
+.*:[0-9]+: Error: .*`ja'.*
+.*:[0-9]+: Error: .*`js'.*
+.*:[0-9]+: Error: .*`jns'.*
+.*:[0-9]+: Error: .*`jp'.*
+.*:[0-9]+: Error: .*`jnp'.*
+.*:[0-9]+: Error: .*`jl'.*
+.*:[0-9]+: Error: .*`jge'.*
+.*:[0-9]+: Error: .*`jle'.*
+.*:[0-9]+: Error: .*`jg'.*
+.*:[0-9]+: Error: .*`in'.*
+.*:[0-9]+: Error: .*`in'.*
+.*:[0-9]+: Error: .*`out'.*
+.*:[0-9]+: Error: .*`out'.*
+.*:[0-9]+: Error: .*`jmp'.*
+.*:[0-9]+: Error: .*`loop'.*
+.*:[0-9]+: Error: .*`wrmsr'.*
+.*:[0-9]+: Error: .*`rdtsc'.*
+.*:[0-9]+: Error: .*`rdmsr'.*
+.*:[0-9]+: Error: .*`sysenter'.*
+.*:[0-9]+: Error: .*`sysexit'.*
+.*:[0-9]+: Error: .*`rdpmc'.*
diff --git a/gas/testsuite/gas/i386/x86-64-pseudos-bad.s b/gas/testsuite/gas/i386/x86-64-pseudos-bad.s
index 3b923593a6a..c65b2dc848d 100644
--- a/gas/testsuite/gas/i386/x86-64-pseudos-bad.s
+++ b/gas/testsuite/gas/i386/x86-64-pseudos-bad.s
@@ -5,3 +5,61 @@  pseudos:
 	{rex} vmovaps %xmm7,%xmm2
 	{rex} vmovaps %xmm17,%xmm2
 	{rex} rorx $7,%eax,%ebx
+	{rex2} vmovaps %xmm7,%xmm2
+	{rex2} xsave (%rax)
+	{rex2} xsaves (%ecx)
+	{rex2} xsaves64 (%ecx)
+	{rex2} xsavec (%ecx)
+	{rex2} xrstors (%ecx)
+	{rex2} xrstors64 (%ecx)
+
+	#All opcodes in the row 0xa* prefixed REX2 are illegal.
+	#{rex2} test (0xa8) is a special case: it is remapped to test (0xf6).
+	{rex2} mov    0x90909090,%al
+	{rex2} movabs 0x1,%al
+	{rex2} cmpsb  %es:(%edi),%ds:(%esi)
+	{rex2} lodsb
+	{rex2} lods   %ds:(%esi),%al
+	{rex2} lodsb   (%esi)
+	{rex2} movs
+	{rex2} movs   (%esi), (%edi)
+	{rex2} scasl
+	{rex2} scas   %es:(%edi),%eax
+	{rex2} scasb   (%edi)
+	{rex2} stosb
+	{rex2} stosb   (%edi)
+	{rex2} stos   %eax,%es:(%edi)
+
+	#All opcodes in the row 0x7* prefixed REX2 are illegal.
+	{rex2} jo     .+2-0x70
+	{rex2} jno    .+2-0x70
+	{rex2} jb     .+2-0x70
+	{rex2} jae    .+2-0x70
+	{rex2} je     .+2-0x70
+	{rex2} jne    .+2-0x70
+	{rex2} jbe    .+2-0x70
+	{rex2} ja     .+2-0x70
+	{rex2} js     .+2-0x70
+	{rex2} jns    .+2-0x70
+	{rex2} jp     .+2-0x70
+	{rex2} jnp    .+2-0x70
+	{rex2} jl     .+2-0x70
+	{rex2} jge    .+2-0x70
+	{rex2} jle    .+2-0x70
+	{rex2} jg     .+2-0x70
+
+	#All opcodes in the row 0xe* prefixed REX2 are illegal.
+	{rex2} in $0x90,%al
+	{rex2} in $0x90
+	{rex2} out $0x90,%al
+	{rex2} out $0x90
+	{rex2} jmp  *%eax
+	{rex2} loop foo
+
+	#All opcodes in the row 0x3* of map1 (0x0f) prefixed REX2 are illegal.
+	{rex2} wrmsr
+	{rex2} rdtsc
+	{rex2} rdmsr
+	{rex2} sysenter
+	{rex2} sysexitl
+	{rex2} rdpmc
diff --git a/gas/testsuite/gas/i386/x86-64-pseudos.d b/gas/testsuite/gas/i386/x86-64-pseudos.d
index 0cc75ef2457..708c22b5899 100644
--- a/gas/testsuite/gas/i386/x86-64-pseudos.d
+++ b/gas/testsuite/gas/i386/x86-64-pseudos.d
@@ -404,6 +404,18 @@  Disassembly of section .text:
  +[a-f0-9]+:	41 0f 28 10          	movaps \(%r8\),%xmm2
  +[a-f0-9]+:	40 0f 38 01 01       	rex phaddw \(%rcx\),%mm0
  +[a-f0-9]+:	41 0f 38 01 00       	phaddw \(%r8\),%mm0
+ +[a-f0-9]+:	88 c4                	mov    %al,%ah
+ +[a-f0-9]+:	d5 00 d3 e0          	{rex2} shl %cl,%eax
+ +[a-f0-9]+:	d5 00 38 ca          	{rex2} cmp %cl,%dl
+ +[a-f0-9]+:	d5 00 b3 01          	{rex2} mov \$(0x)?1,%bl
+ +[a-f0-9]+:	d5 00 89 c3          	{rex2} mov %eax,%ebx
+ +[a-f0-9]+:	d5 01 89 c6          	{rex2} mov %eax,%r14d
+ +[a-f0-9]+:	d5 01 89 00          	{rex2} mov %eax,\(%r8\)
+ +[a-f0-9]+:	d5 80 28 d7          	{rex2} movaps %xmm7,%xmm2
+ +[a-f0-9]+:	d5 84 28 e7          	{rex2} movaps %xmm7,%xmm12
+ +[a-f0-9]+:	d5 80 28 11          	{rex2} movaps \(%rcx\),%xmm2
+ +[a-f0-9]+:	d5 81 28 10          	{rex2} movaps \(%r8\),%xmm2
+ +[a-f0-9]+:	d5 80 d5 f0          	{rex2} pmullw %mm0,%mm6
  +[a-f0-9]+:	8a 45 00             	mov    0x0\(%rbp\),%al
  +[a-f0-9]+:	8a 45 00             	mov    0x0\(%rbp\),%al
  +[a-f0-9]+:	8a 85 00 00 00 00    	mov    0x0\(%rbp\),%al
@@ -458,6 +470,15 @@  Disassembly of section .text:
  +[a-f0-9]+:	41 0f 28 10          	movaps \(%r8\),%xmm2
  +[a-f0-9]+:	40 0f 38 01 01       	rex phaddw \(%rcx\),%mm0
  +[a-f0-9]+:	41 0f 38 01 00       	phaddw \(%r8\),%mm0
+ +[a-f0-9]+:	88 c4                	mov    %al,%ah
+ +[a-f0-9]+:	d5 00 89 c3          	{rex2} mov %eax,%ebx
+ +[a-f0-9]+:	d5 01 89 c6          	{rex2} mov %eax,%r14d
+ +[a-f0-9]+:	d5 01 89 00          	{rex2} mov %eax,\(%r8\)
+ +[a-f0-9]+:	d5 80 28 d7          	{rex2} movaps %xmm7,%xmm2
+ +[a-f0-9]+:	d5 84 28 e7          	{rex2} movaps %xmm7,%xmm12
+ +[a-f0-9]+:	d5 80 28 11          	{rex2} movaps \(%rcx\),%xmm2
+ +[a-f0-9]+:	d5 81 28 10          	{rex2} movaps \(%r8\),%xmm2
+ +[a-f0-9]+:	d5 80 d5 f0          	{rex2} pmullw %mm0,%mm6
  +[a-f0-9]+:	8a 45 00             	mov    0x0\(%rbp\),%al
  +[a-f0-9]+:	8a 45 00             	mov    0x0\(%rbp\),%al
  +[a-f0-9]+:	8a 85 00 00 00 00    	mov    0x0\(%rbp\),%al
diff --git a/gas/testsuite/gas/i386/x86-64-pseudos.s b/gas/testsuite/gas/i386/x86-64-pseudos.s
index 08fac8381c6..29a0c3368fc 100644
--- a/gas/testsuite/gas/i386/x86-64-pseudos.s
+++ b/gas/testsuite/gas/i386/x86-64-pseudos.s
@@ -360,6 +360,19 @@  _start:
 	{rex} movaps (%r8),%xmm2
 	{rex} phaddw (%rcx),%mm0
 	{rex} phaddw (%r8),%mm0
+	{rex2} mov %al,%ah
+	{rex2} shl %cl, %eax
+	{rex2} cmp %cl, %dl
+	{rex2} mov $1, %bl
+	{rex2} movl %eax,%ebx
+	{rex2} movl %eax,%r14d
+	{rex2} movl %eax,(%r8)
+	{rex2} movaps %xmm7,%xmm2
+	{rex2} movaps %xmm7,%xmm12
+	{rex2} movaps (%rcx),%xmm2
+	{rex2} movaps (%r8),%xmm2
+	{rex2} pmullw %mm0,%mm6
+
 
 	movb (%rbp),%al
 	{disp8} movb (%rbp),%al
@@ -422,6 +435,15 @@  _start:
 	{rex} movaps xmm2,XMMWORD PTR [r8]
 	{rex} phaddw mm0,QWORD PTR [rcx]
 	{rex} phaddw mm0,QWORD PTR [r8]
+	{rex2} mov ah,al
+	{rex2} mov ebx,eax
+	{rex2} mov r14d,eax
+	{rex2} mov DWORD PTR [r8],eax
+	{rex2} movaps xmm2,xmm7
+	{rex2} movaps xmm12,xmm7
+	{rex2} movaps xmm2,XMMWORD PTR [rcx]
+	{rex2} movaps xmm2,XMMWORD PTR [r8]
+	{rex2} pmullw mm6,mm0
 
 	mov al, BYTE PTR [rbp]
 	{disp8} mov al, BYTE PTR [rbp]
diff --git a/gas/testsuite/gas/i386/x86-64.exp b/gas/testsuite/gas/i386/x86-64.exp
index a7f5547017f..2be0df0e981 100644
--- a/gas/testsuite/gas/i386/x86-64.exp
+++ b/gas/testsuite/gas/i386/x86-64.exp
@@ -363,6 +363,8 @@  run_dump_test "x86-64-avx512f-rcigrne-intel"
 run_dump_test "x86-64-avx512f-rcigrne"
 run_dump_test "x86-64-avx512f-rcigru-intel"
 run_dump_test "x86-64-avx512f-rcigru"
+run_list_test "x86-64-apx-egpr-inval"
+run_dump_test "x86-64-apx-rex2"
 run_dump_test "x86-64-avx512f-rcigrz-intel"
 run_dump_test "x86-64-avx512f-rcigrz"
 run_dump_test "x86-64-clwb"
diff --git a/include/opcode/i386.h b/include/opcode/i386.h
index dec7652c1cc..a6af3d54da0 100644
--- a/include/opcode/i386.h
+++ b/include/opcode/i386.h
@@ -112,6 +112,8 @@ 
 /* x86-64 extension prefix.  */
 #define REX_OPCODE	0x40
 
+#define REX2_OPCODE	0xd5
+
 /* Non-zero if OPCODE is the rex prefix.  */
 #define REX_PREFIX_P(opcode) (((opcode) & 0xf0) == REX_OPCODE)
 
diff --git a/opcodes/i386-dis.c b/opcodes/i386-dis.c
index e432b61a6cd..d402d575a3a 100644
--- a/opcodes/i386-dis.c
+++ b/opcodes/i386-dis.c
@@ -144,6 +144,11 @@  struct instr_info
   /* Bits of REX we've already used.  */
   uint8_t rex_used;
 
+  /* REX2 bits for the current instruction, selecting gpr32 (r16-r31).  */
+  unsigned char rex2;
+  /* Bits of REX2 we've already used.  */
+  unsigned char rex2_used;
+
   bool need_modrm;
   unsigned char need_vex;
   bool has_sib;
@@ -169,6 +174,7 @@  struct instr_info
   signed char last_data_prefix;
   signed char last_addr_prefix;
   signed char last_rex_prefix;
+  signed char last_rex2_prefix;
   signed char last_seg_prefix;
   signed char fwait_prefix;
   /* The active segment register prefix.  */
@@ -272,10 +278,18 @@  struct dis_private {
       ins->rex_used |= REX_OPCODE;			\
   }
 
+#define USED_REX2(value)				\
+  {							\
+    if ((ins->rex2 & value))				\
+      ins->rex2_used |= value;				\
+  }
 
 #define EVEX_b_used 1
 #define EVEX_len_used 2
 
+/* The M0 bit in the REX2 prefix selects map0 or map1.  */
+#define REX2_M 0x8
+
 /* Flags stored in PREFIXES.  */
 #define PREFIX_REPZ 1
 #define PREFIX_REPNZ 2
@@ -289,6 +303,7 @@  struct dis_private {
 #define PREFIX_DATA 0x200
 #define PREFIX_ADDR 0x400
 #define PREFIX_FWAIT 0x800
+#define PREFIX_REX2 0x1000
 
 /* Make sure that bytes from INFO->PRIVATE_DATA->BUFFER (inclusive)
    to ADDR (exclusive) are valid.  Returns true for success, false
@@ -370,6 +385,7 @@  fetch_error (const instr_info *ins)
 #define PREFIX_IGNORED_DATA	(PREFIX_DATA << PREFIX_IGNORED_SHIFT)
 #define PREFIX_IGNORED_ADDR	(PREFIX_ADDR << PREFIX_IGNORED_SHIFT)
 #define PREFIX_IGNORED_LOCK	(PREFIX_LOCK << PREFIX_IGNORED_SHIFT)
+#define PREFIX_REX2_ILLEGAL	(PREFIX_REX2 << PREFIX_IGNORED_SHIFT)
 
 /* Opcode prefixes.  */
 #define PREFIX_OPCODE		(PREFIX_REPZ \
@@ -1888,23 +1904,23 @@  static const struct dis386 dis386[] = {
   { "outs{b|}",		{ indirDXr, Xb }, 0 },
   { X86_64_TABLE (X86_64_6F) },
   /* 70 */
-  { "joH",		{ Jb, BND, cond_jump_flag }, 0 },
-  { "jnoH",		{ Jb, BND, cond_jump_flag }, 0 },
-  { "jbH",		{ Jb, BND, cond_jump_flag }, 0 },
-  { "jaeH",		{ Jb, BND, cond_jump_flag }, 0 },
-  { "jeH",		{ Jb, BND, cond_jump_flag }, 0 },
-  { "jneH",		{ Jb, BND, cond_jump_flag }, 0 },
-  { "jbeH",		{ Jb, BND, cond_jump_flag }, 0 },
-  { "jaH",		{ Jb, BND, cond_jump_flag }, 0 },
+  { "joH",		{ Jb, BND, cond_jump_flag }, PREFIX_REX2_ILLEGAL },
+  { "jnoH",		{ Jb, BND, cond_jump_flag }, PREFIX_REX2_ILLEGAL },
+  { "jbH",		{ Jb, BND, cond_jump_flag }, PREFIX_REX2_ILLEGAL },
+  { "jaeH",		{ Jb, BND, cond_jump_flag }, PREFIX_REX2_ILLEGAL },
+  { "jeH",		{ Jb, BND, cond_jump_flag }, PREFIX_REX2_ILLEGAL },
+  { "jneH",		{ Jb, BND, cond_jump_flag }, PREFIX_REX2_ILLEGAL },
+  { "jbeH",		{ Jb, BND, cond_jump_flag }, PREFIX_REX2_ILLEGAL },
+  { "jaH",		{ Jb, BND, cond_jump_flag }, PREFIX_REX2_ILLEGAL },
   /* 78 */
-  { "jsH",		{ Jb, BND, cond_jump_flag }, 0 },
-  { "jnsH",		{ Jb, BND, cond_jump_flag }, 0 },
-  { "jpH",		{ Jb, BND, cond_jump_flag }, 0 },
-  { "jnpH",		{ Jb, BND, cond_jump_flag }, 0 },
-  { "jlH",		{ Jb, BND, cond_jump_flag }, 0 },
-  { "jgeH",		{ Jb, BND, cond_jump_flag }, 0 },
-  { "jleH",		{ Jb, BND, cond_jump_flag }, 0 },
-  { "jgH",		{ Jb, BND, cond_jump_flag }, 0 },
+  { "jsH",		{ Jb, BND, cond_jump_flag }, PREFIX_REX2_ILLEGAL },
+  { "jnsH",		{ Jb, BND, cond_jump_flag }, PREFIX_REX2_ILLEGAL },
+  { "jpH",		{ Jb, BND, cond_jump_flag }, PREFIX_REX2_ILLEGAL },
+  { "jnpH",		{ Jb, BND, cond_jump_flag }, PREFIX_REX2_ILLEGAL },
+  { "jlH",		{ Jb, BND, cond_jump_flag }, PREFIX_REX2_ILLEGAL },
+  { "jgeH",		{ Jb, BND, cond_jump_flag }, PREFIX_REX2_ILLEGAL },
+  { "jleH",		{ Jb, BND, cond_jump_flag }, PREFIX_REX2_ILLEGAL },
+  { "jgH",		{ Jb, BND, cond_jump_flag }, PREFIX_REX2_ILLEGAL },
   /* 80 */
   { REG_TABLE (REG_80) },
   { REG_TABLE (REG_81) },
@@ -1942,23 +1958,23 @@  static const struct dis386 dis386[] = {
   { "sahf",		{ XX }, 0 },
   { "lahf",		{ XX }, 0 },
   /* a0 */
-  { "mov%LB",		{ AL, Ob }, 0 },
-  { "mov%LS",		{ eAX, Ov }, 0 },
-  { "mov%LB",		{ Ob, AL }, 0 },
-  { "mov%LS",		{ Ov, eAX }, 0 },
-  { "movs{b|}",		{ Ybr, Xb }, 0 },
-  { "movs{R|}",		{ Yvr, Xv }, 0 },
-  { "cmps{b|}",		{ Xb, Yb }, 0 },
-  { "cmps{R|}",		{ Xv, Yv }, 0 },
+  { "mov%LB",		{ AL, Ob }, PREFIX_REX2_ILLEGAL },
+  { "mov%LS",		{ eAX, Ov }, PREFIX_REX2_ILLEGAL },
+  { "mov%LB",		{ Ob, AL }, PREFIX_REX2_ILLEGAL },
+  { "mov%LS",		{ Ov, eAX }, PREFIX_REX2_ILLEGAL },
+  { "movs{b|}",		{ Ybr, Xb }, PREFIX_REX2_ILLEGAL },
+  { "movs{R|}",		{ Yvr, Xv }, PREFIX_REX2_ILLEGAL },
+  { "cmps{b|}",		{ Xb, Yb }, PREFIX_REX2_ILLEGAL },
+  { "cmps{R|}",		{ Xv, Yv }, PREFIX_REX2_ILLEGAL },
   /* a8 */
-  { "testB",		{ AL, Ib }, 0 },
-  { "testS",		{ eAX, Iv }, 0 },
-  { "stosB",		{ Ybr, AL }, 0 },
-  { "stosS",		{ Yvr, eAX }, 0 },
-  { "lodsB",		{ ALr, Xb }, 0 },
-  { "lodsS",		{ eAXr, Xv }, 0 },
-  { "scasB",		{ AL, Yb }, 0 },
-  { "scasS",		{ eAX, Yv }, 0 },
+  { "testB",		{ AL, Ib }, PREFIX_REX2_ILLEGAL },
+  { "testS",		{ eAX, Iv }, PREFIX_REX2_ILLEGAL },
+  { "stosB",		{ Ybr, AL }, PREFIX_REX2_ILLEGAL },
+  { "stosS",		{ Yvr, eAX }, PREFIX_REX2_ILLEGAL },
+  { "lodsB",		{ ALr, Xb }, PREFIX_REX2_ILLEGAL },
+  { "lodsS",		{ eAXr, Xv }, PREFIX_REX2_ILLEGAL },
+  { "scasB",		{ AL, Yb }, PREFIX_REX2_ILLEGAL },
+  { "scasS",		{ eAX, Yv }, PREFIX_REX2_ILLEGAL },
   /* b0 */
   { "movB",		{ RMAL, Ib }, 0 },
   { "movB",		{ RMCL, Ib }, 0 },
@@ -2014,23 +2030,23 @@  static const struct dis386 dis386[] = {
   { FLOAT },
   { FLOAT },
   /* e0 */
-  { "loopneFH",		{ Jb, XX, loop_jcxz_flag }, 0 },
-  { "loopeFH",		{ Jb, XX, loop_jcxz_flag }, 0 },
-  { "loopFH",		{ Jb, XX, loop_jcxz_flag }, 0 },
-  { "jEcxzH",		{ Jb, XX, loop_jcxz_flag }, 0 },
-  { "inB",		{ AL, Ib }, 0 },
-  { "inG",		{ zAX, Ib }, 0 },
-  { "outB",		{ Ib, AL }, 0 },
-  { "outG",		{ Ib, zAX }, 0 },
+  { "loopneFH",		{ Jb, XX, loop_jcxz_flag }, PREFIX_REX2_ILLEGAL },
+  { "loopeFH",		{ Jb, XX, loop_jcxz_flag }, PREFIX_REX2_ILLEGAL },
+  { "loopFH",		{ Jb, XX, loop_jcxz_flag }, PREFIX_REX2_ILLEGAL },
+  { "jEcxzH",		{ Jb, XX, loop_jcxz_flag }, PREFIX_REX2_ILLEGAL },
+  { "inB",		{ AL, Ib }, PREFIX_REX2_ILLEGAL },
+  { "inG",		{ zAX, Ib }, PREFIX_REX2_ILLEGAL },
+  { "outB",		{ Ib, AL }, PREFIX_REX2_ILLEGAL },
+  { "outG",		{ Ib, zAX }, PREFIX_REX2_ILLEGAL },
   /* e8 */
   { X86_64_TABLE (X86_64_E8) },
   { X86_64_TABLE (X86_64_E9) },
   { X86_64_TABLE (X86_64_EA) },
-  { "jmp",		{ Jb, BND }, 0 },
-  { "inB",		{ AL, indirDX }, 0 },
-  { "inG",		{ zAX, indirDX }, 0 },
-  { "outB",		{ indirDX, AL }, 0 },
-  { "outG",		{ indirDX, zAX }, 0 },
+  { "jmp",		{ Jb, BND }, PREFIX_REX2_ILLEGAL },
+  { "inB",		{ AL, indirDX }, PREFIX_REX2_ILLEGAL },
+  { "inG",		{ zAX, indirDX }, PREFIX_REX2_ILLEGAL },
+  { "outB",		{ indirDX, AL }, PREFIX_REX2_ILLEGAL },
+  { "outG",		{ indirDX, zAX }, PREFIX_REX2_ILLEGAL },
   /* f0 */
   { Bad_Opcode },	/* lock prefix */
   { "int1",		{ XX }, 0 },
@@ -2107,12 +2123,12 @@  static const struct dis386 dis386_twobyte[] = {
   { PREFIX_TABLE (PREFIX_0F2E) },
   { PREFIX_TABLE (PREFIX_0F2F) },
   /* 30 */
-  { "wrmsr",		{ XX }, 0 },
-  { "rdtsc",		{ XX }, 0 },
-  { "rdmsr",		{ XX }, 0 },
-  { "rdpmc",		{ XX }, 0 },
-  { "sysenter",		{ SEP }, 0 },
-  { "sysexit%LQ",	{ SEP }, 0 },
+  { "wrmsr",		{ XX }, PREFIX_REX2_ILLEGAL },
+  { "rdtsc",		{ XX }, PREFIX_REX2_ILLEGAL },
+  { "rdmsr",		{ XX }, PREFIX_REX2_ILLEGAL },
+  { "rdpmc",		{ XX }, PREFIX_REX2_ILLEGAL },
+  { "sysenter",		{ SEP }, PREFIX_REX2_ILLEGAL },
+  { "sysexit%LQ",	{ SEP }, PREFIX_REX2_ILLEGAL },
   { Bad_Opcode },
   { "getsec",		{ XX }, 0 },
   /* 38 */
@@ -2197,23 +2213,23 @@  static const struct dis386 dis386_twobyte[] = {
   { PREFIX_TABLE (PREFIX_0F7E) },
   { PREFIX_TABLE (PREFIX_0F7F) },
   /* 80 */
-  { "joH",		{ Jv, BND, cond_jump_flag }, 0 },
-  { "jnoH",		{ Jv, BND, cond_jump_flag }, 0 },
-  { "jbH",		{ Jv, BND, cond_jump_flag }, 0 },
-  { "jaeH",		{ Jv, BND, cond_jump_flag }, 0 },
-  { "jeH",		{ Jv, BND, cond_jump_flag }, 0 },
-  { "jneH",		{ Jv, BND, cond_jump_flag }, 0 },
-  { "jbeH",		{ Jv, BND, cond_jump_flag }, 0 },
-  { "jaH",		{ Jv, BND, cond_jump_flag }, 0 },
+  { "joH",		{ Jv, BND, cond_jump_flag }, PREFIX_REX2_ILLEGAL },
+  { "jnoH",		{ Jv, BND, cond_jump_flag }, PREFIX_REX2_ILLEGAL },
+  { "jbH",		{ Jv, BND, cond_jump_flag }, PREFIX_REX2_ILLEGAL },
+  { "jaeH",		{ Jv, BND, cond_jump_flag }, PREFIX_REX2_ILLEGAL },
+  { "jeH",		{ Jv, BND, cond_jump_flag }, PREFIX_REX2_ILLEGAL },
+  { "jneH",		{ Jv, BND, cond_jump_flag }, PREFIX_REX2_ILLEGAL },
+  { "jbeH",		{ Jv, BND, cond_jump_flag }, PREFIX_REX2_ILLEGAL },
+  { "jaH",		{ Jv, BND, cond_jump_flag }, PREFIX_REX2_ILLEGAL },
   /* 88 */
-  { "jsH",		{ Jv, BND, cond_jump_flag }, 0 },
-  { "jnsH",		{ Jv, BND, cond_jump_flag }, 0 },
-  { "jpH",		{ Jv, BND, cond_jump_flag }, 0 },
-  { "jnpH",		{ Jv, BND, cond_jump_flag }, 0 },
-  { "jlH",		{ Jv, BND, cond_jump_flag }, 0 },
-  { "jgeH",		{ Jv, BND, cond_jump_flag }, 0 },
-  { "jleH",		{ Jv, BND, cond_jump_flag }, 0 },
-  { "jgH",		{ Jv, BND, cond_jump_flag }, 0 },
+  { "jsH",		{ Jv, BND, cond_jump_flag }, PREFIX_REX2_ILLEGAL },
+  { "jnsH",		{ Jv, BND, cond_jump_flag }, PREFIX_REX2_ILLEGAL },
+  { "jpH",		{ Jv, BND, cond_jump_flag }, PREFIX_REX2_ILLEGAL },
+  { "jnpH",		{ Jv, BND, cond_jump_flag }, PREFIX_REX2_ILLEGAL },
+  { "jlH",		{ Jv, BND, cond_jump_flag }, PREFIX_REX2_ILLEGAL },
+  { "jgeH",		{ Jv, BND, cond_jump_flag }, PREFIX_REX2_ILLEGAL },
+  { "jleH",		{ Jv, BND, cond_jump_flag }, PREFIX_REX2_ILLEGAL },
+  { "jgH",		{ Jv, BND, cond_jump_flag }, PREFIX_REX2_ILLEGAL },
   /* 90 */
   { "seto",		{ Eb }, 0 },
   { "setno",		{ Eb }, 0 },
@@ -2406,22 +2422,30 @@  static const char intel_index16[][6] = {
 
 static const char att_names64[][8] = {
   "%rax", "%rcx", "%rdx", "%rbx", "%rsp", "%rbp", "%rsi", "%rdi",
-  "%r8", "%r9", "%r10", "%r11", "%r12", "%r13", "%r14", "%r15"
+  "%r8", "%r9", "%r10", "%r11", "%r12", "%r13", "%r14", "%r15",
+  "%r16", "%r17", "%r18", "%r19", "%r20", "%r21", "%r22", "%r23",
+  "%r24", "%r25", "%r26", "%r27", "%r28", "%r29", "%r30", "%r31"
 };
 static const char att_names32[][8] = {
   "%eax", "%ecx", "%edx", "%ebx", "%esp", "%ebp", "%esi", "%edi",
-  "%r8d", "%r9d", "%r10d", "%r11d", "%r12d", "%r13d", "%r14d", "%r15d"
+  "%r8d", "%r9d", "%r10d", "%r11d", "%r12d", "%r13d", "%r14d", "%r15d",
+  "%r16d", "%r17d", "%r18d", "%r19d", "%r20d", "%r21d", "%r22d", "%r23d",
+  "%r24d", "%r25d", "%r26d", "%r27d", "%r28d", "%r29d", "%r30d", "%r31d"
 };
 static const char att_names16[][8] = {
   "%ax", "%cx", "%dx", "%bx", "%sp", "%bp", "%si", "%di",
-  "%r8w", "%r9w", "%r10w", "%r11w", "%r12w", "%r13w", "%r14w", "%r15w"
+  "%r8w", "%r9w", "%r10w", "%r11w", "%r12w", "%r13w", "%r14w", "%r15w",
+  "%r16w", "%r17w", "%r18w", "%r19w", "%r20w", "%r21w", "%r22w", "%r23w",
+  "%r24w", "%r25w", "%r26w", "%r27w", "%r28w", "%r29w", "%r30w", "%r31w"
 };
 static const char att_names8[][8] = {
   "%al", "%cl", "%dl", "%bl", "%ah", "%ch", "%dh", "%bh",
 };
 static const char att_names8rex[][8] = {
   "%al", "%cl", "%dl", "%bl", "%spl", "%bpl", "%sil", "%dil",
-  "%r8b", "%r9b", "%r10b", "%r11b", "%r12b", "%r13b", "%r14b", "%r15b"
+  "%r8b", "%r9b", "%r10b", "%r11b", "%r12b", "%r13b", "%r14b", "%r15b",
+  "%r16b", "%r17b", "%r18b", "%r19b", "%r20b", "%r21b", "%r22b", "%r23b",
+  "%r24b", "%r25b", "%r26b", "%r27b", "%r28b", "%r29b", "%r30b", "%r31b"
 };
 static const char att_names_seg[][4] = {
   "%es", "%cs", "%ss", "%ds", "%fs", "%gs", "%?", "%?",
@@ -2810,9 +2834,9 @@  static const struct dis386 reg_table[][8] = {
     { Bad_Opcode },
     { "cmpxchg8b", { { CMPXCHG8B_Fixup, q_mode } }, 0 },
     { Bad_Opcode },
-    { "xrstors", { FXSAVE }, 0 },
-    { "xsavec", { FXSAVE }, 0 },
-    { "xsaves", { FXSAVE }, 0 },
+    { "xrstors", { FXSAVE }, PREFIX_REX2_ILLEGAL },
+    { "xsavec", { FXSAVE }, PREFIX_REX2_ILLEGAL },
+    { "xsaves", { FXSAVE }, PREFIX_REX2_ILLEGAL },
     { MOD_TABLE (MOD_0FC7_REG_6) },
     { MOD_TABLE (MOD_0FC7_REG_7) },
   },
@@ -3384,7 +3408,7 @@  static const struct dis386 prefix_table[][4] = {
 
   /* PREFIX_0FAE_REG_4_MOD_0 */
   {
-    { "xsave",	{ FXSAVE }, 0 },
+    { "xsave",	{ FXSAVE }, PREFIX_REX2_ILLEGAL },
     { "ptwrite{%LQ|}", { Edq }, 0 },
   },
 
@@ -3402,7 +3426,7 @@  static const struct dis386 prefix_table[][4] = {
 
   /* PREFIX_0FAE_REG_6_MOD_0 */
   {
-    { "xsaveopt",	{ FXSAVE }, PREFIX_OPCODE },
+    { "xsaveopt",	{ FXSAVE }, PREFIX_OPCODE | PREFIX_REX2_ILLEGAL },
     { "clrssbsy",	{ Mq }, PREFIX_OPCODE },
     { "clwb",	{ Mb }, PREFIX_OPCODE },
   },
@@ -4196,19 +4220,19 @@  static const struct dis386 x86_64_table[][2] = {
 
   /* X86_64_E8 */
   {
-    { "callP",		{ Jv, BND }, 0 },
-    { "call@",		{ Jv, BND }, 0 }
+    { "callP",		{ Jv, BND }, PREFIX_REX2_ILLEGAL },
+    { "call@",		{ Jv, BND }, PREFIX_REX2_ILLEGAL }
   },
 
   /* X86_64_E9 */
   {
-    { "jmpP",		{ Jv, BND }, 0 },
-    { "jmp@",		{ Jv, BND }, 0 }
+    { "jmpP",		{ Jv, BND }, PREFIX_REX2_ILLEGAL },
+    { "jmp@",		{ Jv, BND }, PREFIX_REX2_ILLEGAL }
   },
 
   /* X86_64_EA */
   {
-    { "{l|}jmp{P|}", { Ap }, 0 },
+    { "{l|}jmp{P|}", { Ap }, PREFIX_REX2_ILLEGAL },
   },
 
   /* X86_64_0F00_REG_6 */
@@ -8184,7 +8208,7 @@  static const struct dis386 mod_table[][2] = {
   },
   {
     /* MOD_0FAE_REG_5 */
-    { "xrstor",		{ FXSAVE }, PREFIX_OPCODE },
+    { "xrstor",		{ FXSAVE }, PREFIX_OPCODE | PREFIX_REX2_ILLEGAL },
     { PREFIX_TABLE (PREFIX_0FAE_REG_5_MOD_3) },
   },
   {
@@ -8387,6 +8411,24 @@  ckprefix (instr_info *ins)
 	    return ckp_okay;
 	  ins->last_rex_prefix = i;
 	  break;
+	/* REX2 must be the last prefix. */
+	case 0xd5:
+	  if (ins->address_mode == mode_64bit)
+	    {
+	      if (ins->last_rex_prefix >= 0)
+		return ckp_bogus;
+
+	      ins->codep++;
+	      if (!fetch_code (ins->info, ins->codep + 1))
+		return ckp_fetch_error;
+	      unsigned char rex2_payload = *ins->codep;
+	      ins->rex2 = rex2_payload >> 4;
+	      ins->rex = (rex2_payload & 0xf) | REX_OPCODE;
+	      ins->codep++;
+	      ins->last_rex2_prefix = i;
+	      ins->all_prefixes[i] = REX2_OPCODE;
+	    }
+	  return ckp_okay;
 	case 0xf3:
 	  ins->prefixes |= PREFIX_REPZ;
 	  ins->last_repz_prefix = i;
@@ -8554,6 +8596,8 @@  prefix_name (enum address_mode mode, uint8_t pref, int sizeflag)
       return "bnd";
     case NOTRACK_PREFIX:
       return "notrack";
+    case REX2_OPCODE:
+      return "rex2";
     default:
       return NULL;
     }
@@ -9202,6 +9246,7 @@  print_insn (bfd_vma pc, disassemble_info *info, int intel_syntax)
     .last_data_prefix = -1,
     .last_addr_prefix = -1,
     .last_rex_prefix = -1,
+    .last_rex2_prefix = -1,
     .last_seg_prefix = -1,
     .fwait_prefix = -1,
   };
@@ -9366,13 +9411,18 @@  print_insn (bfd_vma pc, disassemble_info *info, int intel_syntax)
       goto out;
     }
 
-  if (*ins.codep == 0x0f)
+  /* REX2.M in the REX2 prefix selects map0 or map1.  */
+  if (ins.last_rex2_prefix < 0 ? *ins.codep == 0x0f : (ins.rex2 & REX2_M))
     {
       unsigned char threebyte;
 
-      ins.codep++;
-      if (!fetch_code (info, ins.codep + 1))
-	goto fetch_error_out;
+      if (!ins.rex2)
+	{
+	  ins.codep++;
+	  if (!fetch_code (info, ins.codep + 1))
+	    goto fetch_error_out;
+	}
+
       threebyte = *ins.codep;
       dp = &dis386_twobyte[threebyte];
       ins.need_modrm = twobyte_has_modrm[threebyte];
@@ -9528,7 +9578,15 @@  print_insn (bfd_vma pc, disassemble_info *info, int intel_syntax)
       goto out;
     }
 
-  switch (dp->prefix_requirement)
+  if ((dp->prefix_requirement & PREFIX_REX2_ILLEGAL)
+      && ins.last_rex2_prefix >= 0)
+    {
+      i386_dis_printf (info, dis_style_text, "(bad)");
+      ret = ins.end_codep - priv.the_buffer;
+      goto out;
+    }
+
+  switch (dp->prefix_requirement & ~PREFIX_REX2_ILLEGAL)
     {
     case PREFIX_DATA:
       /* If only the data prefix is marked as mandatory, its absence renders
@@ -9587,6 +9645,10 @@  print_insn (bfd_vma pc, disassemble_info *info, int intel_syntax)
       && !ins.need_vex && ins.last_rex_prefix >= 0)
     ins.all_prefixes[ins.last_rex_prefix] = 0;
 
+  /* Check if the REX2 prefix is used.  */
+  if (ins.last_rex2_prefix >= 0 && (ins.rex2 & 7))
+    ins.all_prefixes[ins.last_rex2_prefix] = 0;
+
   /* Check if the SEG prefix is used.  */
   if ((ins.prefixes & (PREFIX_CS | PREFIX_SS | PREFIX_DS | PREFIX_ES
 		       | PREFIX_FS | PREFIX_GS)) != 0
@@ -9615,7 +9677,10 @@  print_insn (bfd_vma pc, disassemble_info *info, int intel_syntax)
 	if (name == NULL)
 	  abort ();
 	prefix_length += strlen (name) + 1;
-	i386_dis_printf (info, dis_style_mnemonic, "%s ", name);
+	if (ins.all_prefixes[i] == REX2_OPCODE)
+	  i386_dis_printf (info, dis_style_mnemonic, "{%s} ", name);
+	else
+	  i386_dis_printf (info, dis_style_mnemonic, "%s ", name);
       }
 
   /* Check maximum code length.  */
@@ -11160,8 +11225,11 @@  print_register (instr_info *ins, unsigned int reg, unsigned int rexmask,
     ins->illegal_masking = true;
 
   USED_REX (rexmask);
+  USED_REX2 (rexmask);
   if (ins->rex & rexmask)
     reg += 8;
+  if (ins->rex2 & rexmask)
+    reg += 16;
 
   switch (bytemode)
     {
@@ -11169,7 +11237,7 @@  print_register (instr_info *ins, unsigned int reg, unsigned int rexmask,
     case b_swap_mode:
       if (reg & 4)
 	USED_REX (0);
-      if (ins->rex)
+      if (ins->rex || ins->rex2)
 	names = att_names8rex;
       else
 	names = att_names8;
@@ -11385,6 +11453,8 @@  OP_E_memory (instr_info *ins, int bytemode, int sizeflag)
   int riprel = 0;
   int shift;
 
+  add += (ins->rex2 & REX_B) ? 16 : 0;
+
   if (ins->vex.evex)
     {
 
@@ -11489,6 +11559,7 @@  OP_E_memory (instr_info *ins, int bytemode, int sizeflag)
     shift = 0;
 
   USED_REX (REX_B);
+  USED_REX2 (REX_B);
   if (ins->intel_syntax)
     intel_operand_size (ins, bytemode, sizeflag);
   append_seg (ins);
@@ -11519,8 +11590,11 @@  OP_E_memory (instr_info *ins, int bytemode, int sizeflag)
 	{
 	  vindex = ins->sib.index;
 	  USED_REX (REX_X);
+	  USED_REX2 (REX_X);
 	  if (ins->rex & REX_X)
 	    vindex += 8;
+	  if (ins->rex2 & REX_X)
+	    vindex += 16;
 	  switch (bytemode)
 	    {
 	    case vex_vsib_d_w_dq_mode:
@@ -11945,7 +12019,7 @@  static bool
 OP_REG (instr_info *ins, int code, int sizeflag)
 {
   const char *s;
-  int add;
+  int add = 0;
 
   switch (code)
     {
@@ -11956,10 +12030,11 @@  OP_REG (instr_info *ins, int code, int sizeflag)
     }
 
   USED_REX (REX_B);
+  USED_REX2 (REX_B);
   if (ins->rex & REX_B)
     add = 8;
-  else
-    add = 0;
+  if (ins->rex2 & REX_B)
+    add += 16;
 
   switch (code)
     {
@@ -12671,8 +12746,11 @@  OP_EX (instr_info *ins, int bytemode, int sizeflag)
 
   reg = ins->modrm.rm;
   USED_REX (REX_B);
+  USED_REX2 (REX_B);
   if (ins->rex & REX_B)
     reg += 8;
+  if (ins->rex2 & REX_B)
+    reg += 16;
   if (ins->vex.evex)
     {
       USED_REX (REX_X);
diff --git a/opcodes/i386-gen.c b/opcodes/i386-gen.c
index 53cb700d0aa..6402b669d37 100644
--- a/opcodes/i386-gen.c
+++ b/opcodes/i386-gen.c
@@ -275,6 +275,8 @@  static const dependency isa_dependencies[] =
     "64" },
   { "USER_MSR",
     "64" },
+  { "APX_F",
+    "XSAVE|64" },
 };
 
 /* This array is populated as process_i386_initializers() walks cpu_flags[].  */
@@ -397,6 +399,7 @@  static bitfield cpu_flags[] =
   BITFIELD (FRED),
   BITFIELD (LKGS),
   BITFIELD (USER_MSR),
+  BITFIELD (APX_F),
   BITFIELD (MWAITX),
   BITFIELD (CLZERO),
   BITFIELD (OSPKE),
@@ -486,6 +489,7 @@  static bitfield opcode_modifiers[] =
   BITFIELD (ATTSyntax),
   BITFIELD (IntelSyntax),
   BITFIELD (ISA64),
+  BITFIELD (NoEgpr),
 };
 
 #define CLASS(n) #n, n
@@ -1072,10 +1076,48 @@  get_element_size (char **opnd, int lineno)
   return elem_size;
 }
 
+static bool
+rex2_disallowed (const unsigned long long opcode, unsigned int space,
+			       const char *cpu_flags)
+{
+  /* Prefixing XSAVE* and XRSTOR* instructions with REX2 triggers #UD.  */
+  if (strstr (cpu_flags, "XSAVES") != NULL
+      || strstr (cpu_flags, "XSAVEC") != NULL
+      || strstr (cpu_flags, "Xsave") != NULL
+      || strstr (cpu_flags, "Xsaveopt") != NULL)
+    return true;
+
+  /* All opcodes in map0 rows 0x4*, 0x7*, 0xa*, 0xe* and map1 rows 0x3*, 0x8*
+     are reserved under REX2 and trigger #UD when prefixed with REX2.  */
+  if (space == 0)
+    switch (opcode >> 4)
+      {
+      case 0x4:
+      case 0x7:
+      case 0xA:
+      case 0xE:
+	return true;
+      default:
+	return false;
+    }
+
+  if (space == SPACE_0F)
+    switch (opcode >> 4)
+      {
+      case 0x3:
+      case 0x8:
+	return true;
+      default:
+	return false;
+      }
+
+  return false;
+}
+
 static void
 process_i386_opcode_modifier (FILE *table, char *mod, unsigned int space,
 			      unsigned int prefix, const char *extension_opcode,
-			      char **opnd, int lineno)
+			      char **opnd, int lineno, bool rex2_disallowed)
 {
   char *str, *next, *last;
   bitfield modifiers [ARRAY_SIZE (opcode_modifiers)];
@@ -1202,6 +1244,12 @@  process_i386_opcode_modifier (FILE *table, char *mod, unsigned int space,
 	  || modifiers[SAE].value))
     modifiers[EVex].value = EVEXDYN;
 
+  /* VEX templates, legacy map2 and map3, and rex2_disallowed entries do not
+     support EGPR.  Templates supporting both VEX and EVEX allow EGPR.  */
+  if ((modifiers[Vex].value || space > SPACE_0F || rex2_disallowed)
+      && !modifiers[EVex].value)
+    modifiers[NoEgpr].value = 1;
+
   output_opcode_modifier (table, modifiers, ARRAY_SIZE (modifiers));
 }
 
@@ -1425,8 +1473,11 @@  output_i386_opcode (FILE *table, const char *name, char *str,
 	   ident, 2 * (int)length, opcode, end, i);
   free (ident);
 
+  /* Add special handling for the current entry.  */
+  bool has_special_handle = rex2_disallowed (opcode, space, cpu_flags);
   process_i386_opcode_modifier (table, opcode_modifier, space, prefix,
-				extension_opcode, operand_types, lineno);
+				extension_opcode, operand_types, lineno,
+				has_special_handle);
 
   process_i386_cpu_flag (table, cpu_flags, NULL, ",", "    ", lineno, CpuMax);
 
diff --git a/opcodes/i386-opc.h b/opcodes/i386-opc.h
index 7bb8084b291..d28a4cedf0f 100644
--- a/opcodes/i386-opc.h
+++ b/opcodes/i386-opc.h
@@ -319,6 +319,8 @@  enum i386_cpu
   CpuAVX512F,
   /* Intel AVX-512 VL Instructions support required.  */
   CpuAVX512VL,
+  /* Intel APX_F Instructions support required.  */
+  CpuAPX_F,
   /* Not supported in the 64bit mode  */
   CpuNo64,
 
@@ -354,6 +356,7 @@  enum i386_cpu
 		   cpuhle:1, \
 		   cpuavx512f:1, \
 		   cpuavx512vl:1, \
+		   cpuapx_f:1, \
       /* NOTE: This field needs to remain last. */ \
 		   cpuno64:1
 
@@ -745,6 +748,11 @@  enum
 #define INTEL64		2
 #define INTEL64ONLY	3
   ISA64,
+
+  /* EGPRs (r16-r31) are illegal on this instruction.  We also use this to
+     decide whether the instruction supports the pseudo-prefix {rex2}.  */
+  NoEgpr,
+
   /* The last bitfield in i386_opcode_modifier.  */
   Opcode_Modifier_Num
 };
@@ -792,6 +800,7 @@  typedef struct i386_opcode_modifier
   unsigned int attsyntax:1;
   unsigned int intelsyntax:1;
   unsigned int isa64:2;
+  unsigned int noegpr:1;
 } i386_opcode_modifier;
 
 /* Operand classes.  */
@@ -1006,7 +1015,8 @@  typedef struct insn_template
 #define Prefix_VEX3		6	/* {vex3} */
 #define Prefix_EVEX		7	/* {evex} */
 #define Prefix_REX		8	/* {rex} */
-#define Prefix_NoOptimize	9	/* {nooptimize} */
+#define Prefix_REX2		9	/* {rex2} */
+#define Prefix_NoOptimize	10	/* {nooptimize} */
 
   /* the bits in opcode_modifier are used to generate the final opcode from
      the base_opcode.  These bits also are used to detect alternate forms of
@@ -1033,6 +1043,7 @@  typedef struct
 #define RegRex	    0x1  /* Extended register.  */
 #define RegRex64    0x2  /* Extended 8 bit register.  */
 #define RegVRex	    0x4  /* Extended vector register.  */
+#define RegRex2	    0x8  /* Extended GPRs R16-R31.  */
   unsigned char reg_num;
 #define RegIP	((unsigned char ) ~0)
 /* EIZ and RIZ are fake index registers.  */
diff --git a/opcodes/i386-opc.tbl b/opcodes/i386-opc.tbl
index c31bf20f2e6..cbf9d968fba 100644
--- a/opcodes/i386-opc.tbl
+++ b/opcodes/i386-opc.tbl
@@ -138,6 +138,7 @@ 
 #define Vsz256 Vsz=VSZ256
 #define Vsz512 Vsz=VSZ512
 
+
 // The EVEX purpose of StaticRounding appears only together with SAE. Re-use
 // the bit to mark commutative VEX encodings where swapping the source
 // operands may allow to switch from 3-byte to 2-byte VEX encoding.
@@ -895,7 +896,7 @@  rex.wrxb, 0x4f, x64, NoSuf|IsPrefix, {}
 <pseudopfx:ident:cpu, disp8:Disp8:0, disp16:Disp16:0, disp32:Disp32:0, +
                       load:Load:0, store:Store:0, +
                       vex:VEX:0, vex2:VEX:0, vex3:VEX3:0, evex:EVEX:0, +
-                      rex:REX:x64, nooptimize:NoOptimize:0>
+                      rex:REX:x64, rex2:REX2:x64, nooptimize:NoOptimize:0>
 
 {<pseudopfx>}, PSEUDO_PREFIX/Prefix_<pseudopfx:ident>, <pseudopfx:cpu>, NoSuf|IsPrefix, {}
 
@@ -1428,16 +1429,17 @@  crc32, 0xf20f38f0, SSE4_2&x64, W|Modrm|No_wSuf|No_lSuf|No_sSuf, { Reg8|Reg64|Uns
 
 // xsave/xrstor New Instructions.
 
-xsave, 0xfae/4, Xsave, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf, { Unspecified|BaseIndex }
-xsave64, 0xfae/4, Xsave&x64, Modrm|NoSuf|Size64, { Unspecified|BaseIndex }
-xrstor, 0xfae/5, Xsave, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf, { Unspecified|BaseIndex }
-xrstor64, 0xfae/5, Xsave&x64, Modrm|NoSuf|Size64, { Unspecified|BaseIndex }
+xsave, 0xfae/4, Xsave, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|NoEgpr, { Unspecified|BaseIndex }
+xsave64, 0xfae/4, Xsave&x64, Modrm|NoSuf|Size64|NoEgpr, { Unspecified|BaseIndex }
+xrstor, 0xfae/5, Xsave, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|NoEgpr, { Unspecified|BaseIndex }
+xrstor64, 0xfae/5, Xsave&x64, Modrm|NoSuf|Size64|NoEgpr, { Unspecified|BaseIndex }
 xgetbv, 0xf01d0, Xsave, NoSuf, {}
 xsetbv, 0xf01d1, Xsave, NoSuf, {}
 
 // xsaveopt
-xsaveopt, 0xfae/6, Xsaveopt, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf, { Unspecified|BaseIndex }
-xsaveopt64, 0xfae/6, Xsaveopt&x64, Modrm|NoSuf|Size64, { Unspecified|BaseIndex }
+
+xsaveopt, 0xfae/6, Xsaveopt, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|NoEgpr, { Unspecified|BaseIndex }
+xsaveopt64, 0xfae/6, Xsaveopt&x64, Modrm|NoSuf|Size64|NoEgpr, { Unspecified|BaseIndex }
 
 // AES instructions.
 
@@ -2474,17 +2476,17 @@  clflushopt, 0x660fae/7, ClflushOpt, Modrm|Anysize|IgnoreSize|NoSuf, { BaseIndex
 
 // XSAVES/XRSTORS instructions.
 
-xrstors, 0xfc7/3, XSAVES, Modrm|NoSuf, { Unspecified|BaseIndex }
-xrstors64, 0xfc7/3, XSAVES&x64, Modrm|NoSuf|Size64, { Unspecified|BaseIndex }
-xsaves, 0xfc7/5, XSAVES, Modrm|NoSuf, { Unspecified|BaseIndex }
-xsaves64, 0xfc7/5, XSAVES&x64, Modrm|NoSuf|Size64, { Unspecified|BaseIndex }
+xrstors, 0xfc7/3, XSAVES, Modrm|NoSuf|NoEgpr, { Unspecified|BaseIndex }
+xrstors64, 0xfc7/3, XSAVES&x64, Modrm|NoSuf|Size64|NoEgpr, { Unspecified|BaseIndex }
+xsaves, 0xfc7/5, XSAVES, Modrm|NoSuf|NoEgpr, { Unspecified|BaseIndex }
+xsaves64, 0xfc7/5, XSAVES&x64, Modrm|NoSuf|Size64|NoEgpr, { Unspecified|BaseIndex }
 
 // XSAVES instructions end.
 
 // XSAVEC instructions.
 
-xsavec, 0xfc7/4, XSAVEC, Modrm|NoSuf, { Unspecified|BaseIndex }
-xsavec64, 0xfc7/4, XSAVEC&x64, Modrm|NoSuf|Size64, { Unspecified|BaseIndex }
+xsavec, 0xfc7/4, XSAVEC, Modrm|NoSuf|NoEgpr, { Unspecified|BaseIndex }
+xsavec64, 0xfc7/4, XSAVEC&x64, Modrm|NoSuf|Size64|NoEgpr, { Unspecified|BaseIndex }
 
 // XSAVEC instructions end.
 
diff --git a/opcodes/i386-reg.tbl b/opcodes/i386-reg.tbl
index 2ac56e3fd0b..8fead35e320 100644
--- a/opcodes/i386-reg.tbl
+++ b/opcodes/i386-reg.tbl
@@ -43,6 +43,22 @@  r12b, Class=Reg|Byte, RegRex|RegRex64, 4, Dw2Inval, Dw2Inval
 r13b, Class=Reg|Byte, RegRex|RegRex64, 5, Dw2Inval, Dw2Inval
 r14b, Class=Reg|Byte, RegRex|RegRex64, 6, Dw2Inval, Dw2Inval
 r15b, Class=Reg|Byte, RegRex|RegRex64, 7, Dw2Inval, Dw2Inval
+r16b, Class=Reg|Byte, RegRex2|RegRex64, 0, Dw2Inval, Dw2Inval
+r17b, Class=Reg|Byte, RegRex2|RegRex64, 1, Dw2Inval, Dw2Inval
+r18b, Class=Reg|Byte, RegRex2|RegRex64, 2, Dw2Inval, Dw2Inval
+r19b, Class=Reg|Byte, RegRex2|RegRex64, 3, Dw2Inval, Dw2Inval
+r20b, Class=Reg|Byte, RegRex2|RegRex64, 4, Dw2Inval, Dw2Inval
+r21b, Class=Reg|Byte, RegRex2|RegRex64, 5, Dw2Inval, Dw2Inval
+r22b, Class=Reg|Byte, RegRex2|RegRex64, 6, Dw2Inval, Dw2Inval
+r23b, Class=Reg|Byte, RegRex2|RegRex64, 7, Dw2Inval, Dw2Inval
+r24b, Class=Reg|Byte, RegRex2|RegRex64|RegRex, 0, Dw2Inval, Dw2Inval
+r25b, Class=Reg|Byte, RegRex2|RegRex64|RegRex, 1, Dw2Inval, Dw2Inval
+r26b, Class=Reg|Byte, RegRex2|RegRex64|RegRex, 2, Dw2Inval, Dw2Inval
+r27b, Class=Reg|Byte, RegRex2|RegRex64|RegRex, 3, Dw2Inval, Dw2Inval
+r28b, Class=Reg|Byte, RegRex2|RegRex64|RegRex, 4, Dw2Inval, Dw2Inval
+r29b, Class=Reg|Byte, RegRex2|RegRex64|RegRex, 5, Dw2Inval, Dw2Inval
+r30b, Class=Reg|Byte, RegRex2|RegRex64|RegRex, 6, Dw2Inval, Dw2Inval
+r31b, Class=Reg|Byte, RegRex2|RegRex64|RegRex, 7, Dw2Inval, Dw2Inval
 // 16 bit regs
 ax, Class=Reg|Instance=Accum|Word, 0, 0, Dw2Inval, Dw2Inval
 cx, Class=Reg|Word, 0, 1, Dw2Inval, Dw2Inval
@@ -60,6 +76,22 @@  r12w, Class=Reg|Word, RegRex, 4, Dw2Inval, Dw2Inval
 r13w, Class=Reg|Word, RegRex, 5, Dw2Inval, Dw2Inval
 r14w, Class=Reg|Word, RegRex, 6, Dw2Inval, Dw2Inval
 r15w, Class=Reg|Word, RegRex, 7, Dw2Inval, Dw2Inval
+r16w, Class=Reg|Word, RegRex2, 0, Dw2Inval, Dw2Inval
+r17w, Class=Reg|Word, RegRex2, 1, Dw2Inval, Dw2Inval
+r18w, Class=Reg|Word, RegRex2, 2, Dw2Inval, Dw2Inval
+r19w, Class=Reg|Word, RegRex2, 3, Dw2Inval, Dw2Inval
+r20w, Class=Reg|Word, RegRex2, 4, Dw2Inval, Dw2Inval
+r21w, Class=Reg|Word, RegRex2, 5, Dw2Inval, Dw2Inval
+r22w, Class=Reg|Word, RegRex2, 6, Dw2Inval, Dw2Inval
+r23w, Class=Reg|Word, RegRex2, 7, Dw2Inval, Dw2Inval
+r24w, Class=Reg|Word, RegRex2|RegRex, 0, Dw2Inval, Dw2Inval
+r25w, Class=Reg|Word, RegRex2|RegRex, 1, Dw2Inval, Dw2Inval
+r26w, Class=Reg|Word, RegRex2|RegRex, 2, Dw2Inval, Dw2Inval
+r27w, Class=Reg|Word, RegRex2|RegRex, 3, Dw2Inval, Dw2Inval
+r28w, Class=Reg|Word, RegRex2|RegRex, 4, Dw2Inval, Dw2Inval
+r29w, Class=Reg|Word, RegRex2|RegRex, 5, Dw2Inval, Dw2Inval
+r30w, Class=Reg|Word, RegRex2|RegRex, 6, Dw2Inval, Dw2Inval
+r31w, Class=Reg|Word, RegRex2|RegRex, 7, Dw2Inval, Dw2Inval
 // 32 bit regs
 eax, Class=Reg|Instance=Accum|Dword|BaseIndex, 0, 0, 0, Dw2Inval
 ecx, Class=Reg|Instance=RegC|Dword|BaseIndex, 0, 1, 1, Dw2Inval
@@ -77,6 +109,22 @@  r12d, Class=Reg|Dword|BaseIndex, RegRex, 4, Dw2Inval, Dw2Inval
 r13d, Class=Reg|Dword|BaseIndex, RegRex, 5, Dw2Inval, Dw2Inval
 r14d, Class=Reg|Dword|BaseIndex, RegRex, 6, Dw2Inval, Dw2Inval
 r15d, Class=Reg|Dword|BaseIndex, RegRex, 7, Dw2Inval, Dw2Inval
+r16d, Class=Reg|Dword|BaseIndex, RegRex2, 0, Dw2Inval, Dw2Inval
+r17d, Class=Reg|Dword|BaseIndex, RegRex2, 1, Dw2Inval, Dw2Inval
+r18d, Class=Reg|Dword|BaseIndex, RegRex2, 2, Dw2Inval, Dw2Inval
+r19d, Class=Reg|Dword|BaseIndex, RegRex2, 3, Dw2Inval, Dw2Inval
+r20d, Class=Reg|Dword|BaseIndex, RegRex2, 4, Dw2Inval, Dw2Inval
+r21d, Class=Reg|Dword|BaseIndex, RegRex2, 5, Dw2Inval, Dw2Inval
+r22d, Class=Reg|Dword|BaseIndex, RegRex2, 6, Dw2Inval, Dw2Inval
+r23d, Class=Reg|Dword|BaseIndex, RegRex2, 7, Dw2Inval, Dw2Inval
+r24d, Class=Reg|Dword|BaseIndex, RegRex2|RegRex, 0, Dw2Inval, Dw2Inval
+r25d, Class=Reg|Dword|BaseIndex, RegRex2|RegRex, 1, Dw2Inval, Dw2Inval
+r26d, Class=Reg|Dword|BaseIndex, RegRex2|RegRex, 2, Dw2Inval, Dw2Inval
+r27d, Class=Reg|Dword|BaseIndex, RegRex2|RegRex, 3, Dw2Inval, Dw2Inval
+r28d, Class=Reg|Dword|BaseIndex, RegRex2|RegRex, 4, Dw2Inval, Dw2Inval
+r29d, Class=Reg|Dword|BaseIndex, RegRex2|RegRex, 5, Dw2Inval, Dw2Inval
+r30d, Class=Reg|Dword|BaseIndex, RegRex2|RegRex, 6, Dw2Inval, Dw2Inval
+r31d, Class=Reg|Dword|BaseIndex, RegRex2|RegRex, 7, Dw2Inval, Dw2Inval
 rax, Class=Reg|Instance=Accum|Qword|BaseIndex, 0, 0, Dw2Inval, 0
 rcx, Class=Reg|Instance=RegC|Qword|BaseIndex, 0, 1, Dw2Inval, 2
 rdx, Class=Reg|Instance=RegD|Qword|BaseIndex, 0, 2, Dw2Inval, 1
@@ -93,6 +141,22 @@  r12, Class=Reg|Qword|BaseIndex, RegRex, 4, Dw2Inval, 12
 r13, Class=Reg|Qword|BaseIndex, RegRex, 5, Dw2Inval, 13
 r14, Class=Reg|Qword|BaseIndex, RegRex, 6, Dw2Inval, 14
 r15, Class=Reg|Qword|BaseIndex, RegRex, 7, Dw2Inval, 15
+r16, Class=Reg|Qword|BaseIndex, RegRex2, 0, Dw2Inval, 130
+r17, Class=Reg|Qword|BaseIndex, RegRex2, 1, Dw2Inval, 131
+r18, Class=Reg|Qword|BaseIndex, RegRex2, 2, Dw2Inval, 132
+r19, Class=Reg|Qword|BaseIndex, RegRex2, 3, Dw2Inval, 133
+r20, Class=Reg|Qword|BaseIndex, RegRex2, 4, Dw2Inval, 134
+r21, Class=Reg|Qword|BaseIndex, RegRex2, 5, Dw2Inval, 135
+r22, Class=Reg|Qword|BaseIndex, RegRex2, 6, Dw2Inval, 136
+r23, Class=Reg|Qword|BaseIndex, RegRex2, 7, Dw2Inval, 137
+r24, Class=Reg|Qword|BaseIndex, RegRex2|RegRex, 0, Dw2Inval, 138
+r25, Class=Reg|Qword|BaseIndex, RegRex2|RegRex, 1, Dw2Inval, 139
+r26, Class=Reg|Qword|BaseIndex, RegRex2|RegRex, 2, Dw2Inval, 140
+r27, Class=Reg|Qword|BaseIndex, RegRex2|RegRex, 3, Dw2Inval, 141
+r28, Class=Reg|Qword|BaseIndex, RegRex2|RegRex, 4, Dw2Inval, 142
+r29, Class=Reg|Qword|BaseIndex, RegRex2|RegRex, 5, Dw2Inval, 143
+r30, Class=Reg|Qword|BaseIndex, RegRex2|RegRex, 6, Dw2Inval, 144
+r31, Class=Reg|Qword|BaseIndex, RegRex2|RegRex, 7, Dw2Inval, 145
 // Vector mask registers.
 k0, Class=RegMask, 0, 0, 93, 118
 k1, Class=RegMask, 0, 1, 94, 119