[v2] Support Intel AVX10.1

Message ID 20230814064535.3228154-1-haochen.jiang@intel.com
State Unresolved
Series [v2] Support Intel AVX10.1

Checks

Context                   Check    Description
snail/binutils-gdb-check  warning  Git am fail log

Commit Message

Jiang, Haochen Aug. 14, 2023, 6:45 a.m. UTC
Hi all,

Sorry for the delay with this patch; the heated AVX10 discussion in the GCC
community last week took up a lot of my time.

I have just finished the v2 patch for AVX10.1.

Changes in v2:

1. Added a new attribute, avx10_max_512bit, to indicate 512-bit usage. The name
is aligned with the attribute used in the GCC implementation. Since binutils
uses a default "on" mode for attributes, I added the check only where zmm
registers or 64-bit mask register instructions are used, rather than in the
opcode table.

I am open to changing the attribute name or the implementation method.

2. Removed the 32-bit invalid test; the 64-bit one is enough. Also removed
redundant tests in x86-64-avx10_1.s.

3. Added some comments and simplified the changes in gas/config/tc-i386.c.

This change is needed for the AVX512_VP2INTERSECT table entry.

@@ -6382,7 +6400,9 @@ check_VecOperands (const insn_template *t)
   cpu = cpu_flags_and (t->cpu_flags, avx512);
   if (!cpu_flags_all_zero (&cpu)
       && !t->cpu_flags.bitfield.cpuavx512vl
-      && !cpu_arch_flags.bitfield.cpuavx512vl)
+      && !cpu_arch_flags.bitfield.cpuavx512vl
+      && (!t->cpu_flags.bitfield.cpuavx10_1
+         || !cpu_arch_flags.bitfield.cpuavx10_1))
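Read on its own, the patched condition decides when an AVX512 template that lacks AVX512VL is limited to its 512-bit form: the restriction applies when neither the template nor the enabled architecture has AVX512VL and, with this hunk, they do not both have AVX10.1. The sketch below models only that predicate; `struct feat` and `restrict_to_512bit()` are hypothetical stand-ins for the `i386_cpu_flags` bits and the surrounding check_VecOperands() logic (which presumably then rejects xmm/ymm operands), not the real code.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the few i386_cpu_flags bits involved.  */
struct feat
{
  bool any_avx512;  /* some AVX512* bit set (meaningful for the template) */
  bool avx512vl;
  bool avx10_1;
};

/* Mirrors the patched condition: true when an AVX512 template may only
   be used in its 512-bit (zmm) form.  */
static bool
restrict_to_512bit (const struct feat *tmpl, const struct feat *arch)
{
  return tmpl->any_avx512
	 && !tmpl->avx512vl
	 && !arch->avx512vl
	 && (!tmpl->avx10_1 || !arch->avx10_1);
}
```

With both sides carrying AVX10.1 the restriction is lifted even though AVX512VL is absent; clearing AVX10.1 on the architecture side brings it back.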

I hope I did not miss anything from the v1 review that needed changing.
Thanks for your review.

Thx,
Haochen

gas/ChangeLog:

	* NEWS: Support Intel AVX10.1.
	* config/tc-i386.c
	(cpu_arch): Add avx10.1 and avx10_max_512bit.
	(cpu_flags_match): Handle AVX10.1 related instructions.
	(check_VecOperands): Ditto.
	(check_register): Allow zmm for avx10.1-512 and mask registers
	for avx10.1.
	* doc/c-i386.texi: Document .avx10.1 and .avx10_max_512bit.
	* testsuite/gas/i386/avx-ifma-inval.l: Add .noavx10.1.
	* testsuite/gas/i386/avx-ifma-inval.s: Ditto.
	* testsuite/gas/i386/avx-ifma.s: Ditto.
	* testsuite/gas/i386/avx-vnni-inval.l: Ditto.
	* testsuite/gas/i386/avx-vnni-inval.s: Ditto.
	* testsuite/gas/i386/avx-vnni.s: Ditto.
	* testsuite/gas/i386/noavx512-1.l: Ditto.
	* testsuite/gas/i386/noavx512-1.s: Ditto.
	* testsuite/gas/i386/noavx512-2.l: Ditto.
	* testsuite/gas/i386/noavx512-2.s: Ditto.
	* testsuite/gas/i386/x86-64-avx-ifma-inval.l: Ditto.
	* testsuite/gas/i386/x86-64-avx-ifma-inval.s: Ditto.
	* testsuite/gas/i386/x86-64-avx-vnni-inval.l: Ditto.
	* testsuite/gas/i386/x86-64-avx-vnni-inval.s: Ditto.
	* testsuite/gas/i386/xmmhi32.s: Ditto.
	* testsuite/gas/i386/x86-64.exp: Run AVX10.1 tests.
	* testsuite/gas/i386/x86-64-avx10_1-inval.l: New test.
	* testsuite/gas/i386/x86-64-avx10_1-inval.s: Ditto.
	* testsuite/gas/i386/x86-64-avx10_1.d: Ditto.
	* testsuite/gas/i386/x86-64-avx10_1.s: Ditto.

opcodes/ChangeLog:

	* i386-gen.c (isa_dependencies): Add AVX10_1 and
	AVX10_MAX_512BIT.
	(cpu_flags): Ditto.
	(output_i386_opcode): Add AVX10_1 in table for allowed
	instructions.
	* i386-opc.h (CpuAVX10_1, CpuAVX10_MAX_512BIT): New.
	(i386_cpu_flags): Add cpuavx10_1 and cpuavx10_max_512bit.
	* i386-init.h: Regenerated.
	* i386-tbl.h: Ditto.
---
 gas/NEWS                                      |     2 +
 gas/config/tc-i386.c                          |    43 +-
 gas/doc/c-i386.texi                           |     4 +-
 gas/testsuite/gas/i386/avx-ifma-inval.l       |     4 +-
 gas/testsuite/gas/i386/avx-ifma-inval.s       |     1 +
 gas/testsuite/gas/i386/avx-ifma.s             |     3 +
 gas/testsuite/gas/i386/avx-vnni-inval.l       |     4 +-
 gas/testsuite/gas/i386/avx-vnni-inval.s       |     1 +
 gas/testsuite/gas/i386/avx-vnni.s             |     3 +
 gas/testsuite/gas/i386/noavx512-1.l           |    39 +-
 gas/testsuite/gas/i386/noavx512-1.s           |     1 +
 gas/testsuite/gas/i386/noavx512-2.l           |   153 +-
 gas/testsuite/gas/i386/noavx512-2.s           |     1 +
 .../gas/i386/x86-64-avx-ifma-inval.l          |     4 +-
 .../gas/i386/x86-64-avx-ifma-inval.s          |     1 +
 .../gas/i386/x86-64-avx-vnni-inval.l          |     4 +-
 .../gas/i386/x86-64-avx-vnni-inval.s          |     1 +
 gas/testsuite/gas/i386/x86-64-avx10_1-inval.l |    20 +
 gas/testsuite/gas/i386/x86-64-avx10_1-inval.s |    27 +
 gas/testsuite/gas/i386/x86-64-avx10_1.d       |    54 +
 gas/testsuite/gas/i386/x86-64-avx10_1.s       |    50 +
 gas/testsuite/gas/i386/x86-64.exp             |     2 +
 gas/testsuite/gas/i386/xmmhi32.s              |     1 +
 opcodes/i386-gen.c                            |    25 +-
 opcodes/i386-init.h                           |   684 +-
 opcodes/i386-opc.h                            |     6 +
 opcodes/i386-tbl.h                            | 10436 ++++++++--------
 27 files changed, 5924 insertions(+), 5650 deletions(-)
 create mode 100644 gas/testsuite/gas/i386/x86-64-avx10_1-inval.l
 create mode 100644 gas/testsuite/gas/i386/x86-64-avx10_1-inval.s
 create mode 100644 gas/testsuite/gas/i386/x86-64-avx10_1.d
 create mode 100644 gas/testsuite/gas/i386/x86-64-avx10_1.s
  

Comments

Jan Beulich Aug. 14, 2023, 8:19 a.m. UTC | #1
On 14.08.2023 08:45, Haochen Jiang wrote:
> Changes in v2:
> 
> 1. Added a new attribute, avx10_max_512bit, to indicate 512-bit usage. The name
> is aligned with the attribute used in the GCC implementation. Since binutils
> uses a default "on" mode for attributes, I added the check only where zmm
> registers or 64-bit mask register instructions are used, rather than in the
> opcode table.
> 
> I am open to changing the attribute name or the implementation method.
> 
> 2. Removed the 32-bit invalid test; the 64-bit one is enough. Also removed
> redundant tests in x86-64-avx10_1.s.
> 
> 3. Added some comments and simplified the changes in gas/config/tc-i386.c.
> 
> This change is needed for the AVX512_VP2INTERSECT table entry.

Before I get into any details here, I'd like to understand why there still
is a new CpuAVX10_1 bit, when I had asked to drop it. I'm also concerned
of CpuAVX10_MAX_512BIT, when I did suggest a new attribute (i.e. a new
bitfield in struct i386_opcode_modifier), and then a more general purpose
one (so that by it being / becoming not just boolean it can later also be
used to deal with the - for now only theoretical - AVX10/128 case).

Jan
  
Jiang, Haochen via Binutils Aug. 14, 2023, 8:46 a.m. UTC | #2
> Before I get into any details here, I'd like to understand why there still
> is a new CpuAVX10_1 bit, when I had asked to drop it. I'm also concerned

The reason is that we would like to keep OR logic in the toolchain, which
means that enabling AVX10.1 while disabling AVX512F should not disable the
encoding.

But after thinking it over again, I see your point. GCC uses a default "off"
mode, so with OR logic no existing code or behavior changes and everything
stays natural and smooth. However, binutils uses a default "on" mode; if we
stick to OR logic as GCC does, it will eventually break the current behavior
of .noavx512xxx, which could be a problem. I am now somewhat persuaded by the
proposal of setting and clearing the AVX512 bits for AVX10 in binutils.

H.J., what is your opinion?
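The default-on problem with OR logic can be shown with a toy bitmask model. Everything below is hypothetical (three made-up feature bits instead of gas's `i386_cpu_flags`, and only an AVX512F-class insn is checked); it merely contrasts the two approaches discussed here after `.arch .noavx512f` on an otherwise default configuration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical feature bits; gas uses a much larger bitfield struct.  */
#define F_AVX512F  (UINT32_C (1) << 0)
#define F_AVX512VL (UINT32_C (1) << 1)
#define F_AVX10_1  (UINT32_C (1) << 2)

/* gas-style default: every feature enabled until a directive clears it.  */
static const uint32_t default_on = F_AVX512F | F_AVX512VL | F_AVX10_1;

/* "OR logic": an AVX512F-class insn is accepted if either bit is set.  */
static bool
accept_or_logic (uint32_t arch)
{
  return (arch & (F_AVX512F | F_AVX10_1)) != 0;
}

/* "Set/clear" alternative: enabling AVX10.1 sets the AVX512 bits it
   implies ...  */
static uint32_t
enable_avx10_1 (uint32_t arch)
{
  return arch | F_AVX10_1 | F_AVX512F | F_AVX512VL;
}

/* ... and the insn check then only ever consults AVX512F.  */
static bool
accept_set_clear (uint32_t arch)
{
  return (arch & F_AVX512F) != 0;
}
```

Starting from `default_on & ~F_AVX512F`, OR logic still accepts the insn because the default-on AVX10.1 bit keeps it alive, which matches the extra `.arch .noavx10.1` lines the v2 testsuite needed. With set/clear, clearing AVX512F is enough, and a later `.avx10.1` turns the AVX512 bits back on.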

> of CpuAVX10_MAX_512BIT, when I did suggest a new attribute (i.e. a new
> bitfield in struct i386_opcode_modifier), and then a more general purpose
> one (so that by it being / becoming not just boolean it can later also be
> used to deal with the - for now only theoretical - AVX10/128 case).

For question 2, I misunderstood what you meant by "attribute". But I suspect
AVX10/128 is too theoretical to ever materialize. I will make it a boolean for now.

Thx,
Haochen

> 
> Jan
  
Jan Beulich Aug. 14, 2023, 10:33 a.m. UTC | #3
On 14.08.2023 10:46, Jiang, Haochen wrote:
>> Before I get into any details here, I'd like to understand why there still
>> is a new CpuAVX10_1 bit, when I had asked to drop it. I'm also concerned
> 
> The reason is that we would like to keep OR logic in the toolchain, which
> means that enabling AVX10.1 while disabling AVX512F should not disable the
> encoding.
> 
> But after thinking it over again, I see your point. GCC uses a default "off"
> mode, so with OR logic no existing code or behavior changes and everything
> stays natural and smooth. However, binutils uses a default "on" mode; if we
> stick to OR logic as GCC does, it will eventually break the current behavior
> of .noavx512xxx, which could be a problem. I am now somewhat persuaded by the
> proposal of setting and clearing the AVX512 bits for AVX10 in binutils.

The primary indication of things being done the wrong way is the need to
add several ".arch .noavx10.1" in the testsuite. Whatever the final
solution, this should not be necessary (because it indicates people may
also need to change their code then, if they want a guarantee that no
512-bit insns are used).

>> of CpuAVX10_MAX_512BIT, when I did suggest a new attribute (i.e. a new
>> bitfield in struct i386_opcode_modifier), and then a more general purpose
>> one (so that by it being / becoming not just boolean it can later also be
>> used to deal with the - for now only theoretical - AVX10/128 case).
> 
> For question 2, I misunderstood what you meant by "attribute". But I suspect
> AVX10/128 is too theoretical to ever materialize. I will make it a boolean for now.

Right, a boolean is fine initially, but with the spec explicitly allowing
the 128-bits-only mode, I'm pretty sure we ought to support that rather
sooner than later. After all, more artificial environments (emulators,
virtualization) may expose feature combinations not ever seen on real
hardware.

Jan
  
Jan Beulich Aug. 14, 2023, 10:35 a.m. UTC | #4
On 14.08.2023 12:33, Jan Beulich via Binutils wrote:
> On 14.08.2023 10:46, Jiang, Haochen wrote:
>> For question 2, I misunderstood what you meant by "attribute". But I suspect
>> AVX10/128 is too theoretical to ever materialize. I will make it a boolean for now.
> 
> Right, a boolean is fine initially, but with the spec explicitly allowing
> the 128-bits-only mode, I'm pretty sure we ought to support that rather
> sooner than later. After all, more artificial environments (emulators,
> virtualization) may expose feature combinations not ever seen on real
> hardware.

Actually, making it a boolean isn't nice, because a boolean would be named
differently than a numeric field. So I think it wants to be numeric, but
with only 0 and one other value permitted for now.

Jan
  
Jiang, Haochen via Binutils Aug. 15, 2023, 8:32 a.m. UTC | #5
> -----Original Message-----
> From: Jan Beulich <jbeulich@suse.com>
> Sent: Monday, August 14, 2023 6:34 PM
> To: Jiang, Haochen <haochen.jiang@intel.com>
> Cc: binutils@sourceware.org; hjl.tools@gmail.com
> Subject: Re: [PATCH v2] Support Intel AVX10.1
> 
> On 14.08.2023 10:46, Jiang, Haochen wrote:
> >> Before I get into any details here, I'd like to understand why there still
> >> is a new CpuAVX10_1 bit, when I had asked to drop it. I'm also concerned
> >
> > The reason is that we would like to keep OR logic in the toolchain, which
> > means that enabling AVX10.1 while disabling AVX512F should not disable the
> > encoding.
> >
> > But after thinking it over again, I see your point. GCC uses a default "off"
> > mode, so with OR logic no existing code or behavior changes and everything
> > stays natural and smooth. However, binutils uses a default "on" mode; if we
> > stick to OR logic as GCC does, it will eventually break the current behavior
> > of .noavx512xxx, which could be a problem. I am now somewhat persuaded by the
> > proposal of setting and clearing the AVX512 bits for AVX10 in binutils.
> 
> The primary indication of things being done the wrong way is the need to add
> several ".arch .noavx10.1" in the testsuite. Whatever the final solution, this
> should not be necessary (because it indicates people may also need to change
> their code then, if they want a guarantee that no 512-bit insns are used).
> 

I have an open question after digging into corner cases of the .arch
directives when we choose to set/clear the AVX512 bits for AVX10.1.

Should directives like .noavx512f .avx10.1 enable zmm registers? For the
directives .noavx512fp16 .avx10.1, should we enable zmm registers for
AVX512FP16 insns?

> >> of CpuAVX10_MAX_512BIT, when I did suggest a new attribute (i.e. a new
> >> bitfield in struct i386_opcode_modifier), and then a more general purpose
> >> one (so that by it being / becoming not just boolean it can later also be
> >> used to deal with the - for now only theoretical - AVX10/128 case).
> >
> > For question 2, I misunderstood what you meant by "attribute". But I suspect
> > AVX10/128 is too theoretical to ever materialize. I will make it a boolean for now.
> 
> Right, a boolean is fine initially, but with the spec explicitly allowing
> the 128-bits-only mode, I'm pretty sure we ought to support that rather
> sooner than later. After all, more artificial environments (emulators,
> virtualization) may expose feature combinations not ever seen on real
> hardware.

Thinking about it again, I suspect it may not be appropriate to put it into
i386_opcode_modifier, since in AVX10 the vector width depends on the CPU. As I
understand it, i386_opcode_modifier describes instructions, not the CPU.

Thx,
Haochen

> 
> Jan
  
Jan Beulich Aug. 15, 2023, 2:10 p.m. UTC | #6
On 15.08.2023 10:32, Jiang, Haochen wrote:
>> -----Original Message-----
>> From: Jan Beulich <jbeulich@suse.com>
>> Sent: Monday, August 14, 2023 6:34 PM
>> To: Jiang, Haochen <haochen.jiang@intel.com>
>> Cc: binutils@sourceware.org; hjl.tools@gmail.com
>> Subject: Re: [PATCH v2] Support Intel AVX10.1
>>
>> On 14.08.2023 10:46, Jiang, Haochen wrote:
>>>> Before I get into any details here, I'd like to understand why there still
>>>> is a new CpuAVX10_1 bit, when I had asked to drop it. I'm also concerned
>>>
>>> The reason is that we would like to keep OR logic in the toolchain, which
>>> means that enabling AVX10.1 while disabling AVX512F should not disable the
>>> encoding.
>>>
>>> But after thinking it over again, I see your point. GCC uses a default "off"
>>> mode, so with OR logic no existing code or behavior changes and everything
>>> stays natural and smooth. However, binutils uses a default "on" mode; if we
>>> stick to OR logic as GCC does, it will eventually break the current behavior
>>> of .noavx512xxx, which could be a problem. I am now somewhat persuaded by the
>>> proposal of setting and clearing the AVX512 bits for AVX10 in binutils.
>>
>> The primary indication of things being done the wrong way is the need to add
>> several ".arch .noavx10.1" in the testsuite. Whatever the final solution, this
>> should not be necessary (because it indicates people may also need to change
>> their code then, if they want a guarantee that no 512-bit insns are used).
>>
> 
> I have an open question after digging into corner cases of the .arch
> directives when we choose to set/clear the AVX512 bits for AVX10.1.
> 
> Should directives like .noavx512f .avx10.1 enable zmm registers?

You mean the combination of the two, in that order? Yes, of course.

> For the directives .noavx512fp16 .avx10.1, should we enable zmm registers for
> AVX512FP16 insns?

And then yes here, too.

In both cases what remains to be determined is how vector size is to
be limited. I think that wants to be independent of the .avx10.<N>.
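The ordering semantics agreed on here can be sketched as directives applied one at a time to a feature mask, with the vector-length limit kept in a separate field that `.avx10.1` does not touch. The bit names, `struct arch_state`, `apply_directive()`, and the choice of which bits `.noavx512f` clears are assumptions for illustration; gas's real dependency tables are far larger.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical ISA bits.  */
#define F_AVX512F    (UINT32_C (1) << 0)
#define F_AVX512VL   (UINT32_C (1) << 1)
#define F_AVX512FP16 (UINT32_C (1) << 2)

/* Hypothetical assembler state: ISA bits plus, kept apart from them,
   the maximum vector length in bits.  */
struct arch_state
{
  uint32_t isa;
  unsigned int max_vl;
};

/* Apply one directive, in source order.  Only the three names from the
   discussion are handled; anything else is ignored.  */
static void
apply_directive (struct arch_state *s, const char *name)
{
  if (strcmp (name, ".noavx512f") == 0)
    /* Assumed: disabling AVX512F also disables features built on it.  */
    s->isa &= ~(F_AVX512F | F_AVX512VL | F_AVX512FP16);
  else if (strcmp (name, ".noavx512fp16") == 0)
    s->isa &= ~F_AVX512FP16;
  else if (strcmp (name, ".avx10.1") == 0)
    /* Set/clear approach: AVX10.1 turns on the AVX512 bits it covers.
       max_vl is deliberately left alone.  */
    s->isa |= F_AVX512F | F_AVX512VL | F_AVX512FP16;
}
```

After `.noavx512f .avx10.1` AVX512F is set again, whereas after `.avx10.1 .noavx512f` it is clear. In both cases `max_vl` is unchanged, reflecting the view that vector size is limited independently of `.avx10.<N>`.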

>>>> of CpuAVX10_MAX_512BIT, when I did suggest a new attribute (i.e. a new
>>>> bitfield in struct i386_opcode_modifier), and then a more general purpose
>>>> one (so that by it being / becoming not just boolean it can later also be
>>>> used to deal with the - for now only theoretical - AVX10/128 case).
>>>
>>> For question 2, I misunderstood what you meant by "attribute". But I suspect
>>> AVX10/128 is too theoretical to ever materialize. I will make it a boolean for now.
>>
>> Right, a boolean is fine initially, but with the spec explicitly allowing
>> the 128-bits-only mode, I'm pretty sure we ought to support that rather
>> sooner than later. After all, more artificial environments (emulators,
>> virtualization) may expose feature combinations not ever seen on real
>> hardware.
> 
> Thinking about it again, I suspect it may not be appropriate to put it into
> i386_opcode_modifier, since in AVX10 the vector width depends on the CPU. As I
> understand it, i386_opcode_modifier describes instructions, not the CPU.

I disagree. See the uses of EVex, for example. As said above, I think
maximum vector width and ISA extensions want dealing with separately,
and only the latter would generally qualify for Cpu* flags. Furthermore
recall that the attribute wants widening sooner or later, and Cpu*
flags are uniformly boolean. Only attributes may have numeric values.
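One way to picture the split described here: boolean Cpu*-style flags decide ISA availability, while a small numeric attribute on the template is compared against a separately established maximum vector length. The structs below are drastically trimmed, hypothetical analogues of `i386_cpu_flags` and `i386_opcode_modifier`; the `vlen` field and `enum vec_len` are made up for this sketch, which also assumes one fixed length per template for simplicity.

```c
#include <stdbool.h>

/* Hypothetical analogue of i386_cpu_flags: one bit per ISA extension.  */
struct cpu_flags
{
  unsigned int avx512f:1;
  unsigned int avx512fp16:1;
};

/* Hypothetical numeric attribute values (0 = no length constraint).  */
enum vec_len { VL_ANY = 0, VL_128 = 1, VL_256 = 2, VL_512 = 3 };

/* Hypothetical analogue of i386_opcode_modifier: attributes may hold
   small numbers, not just booleans.  */
struct opcode_modifier
{
  unsigned int vlen:2;  /* an enum vec_len value */
};

struct insn_template
{
  struct cpu_flags cpu;
  struct opcode_modifier mod;
};

/* ISA check: boolean flags only, no notion of vector length.  */
static bool
isa_ok (const struct cpu_flags *need, const struct cpu_flags *have)
{
  return (!need->avx512f || have->avx512f)
	 && (!need->avx512fp16 || have->avx512fp16);
}

/* Length check: numeric attribute versus the separately established
   maximum (from a directive or command-line option).  */
static bool
vlen_ok (const struct opcode_modifier *mod, enum vec_len max_vl)
{
  return mod->vlen == VL_ANY || mod->vlen <= (unsigned int) max_vl;
}

static bool
template_usable (const struct insn_template *t,
		 const struct cpu_flags *arch, enum vec_len max_vl)
{
  return isa_ok (&t->cpu, arch) && vlen_ok (&t->mod, max_vl);
}
```

Keeping the two checks apart means a 256-bit limit can be imposed without touching any ISA bit, and the attribute can later gain a 128-bit value without changing the Cpu* side.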

Jan
  
Jiang, Haochen via Binutils Aug. 16, 2023, 8:21 a.m. UTC | #7
> > I have an open question after digging into corner cases of the .arch
> > directives when we choose to set/clear the AVX512 bits for AVX10.1.
> >
> > Should directives like .noavx512f .avx10.1 enable zmm registers?
> 
> You mean the combination of the two, in that order? Yes, of course.
> 
> > For the directives .noavx512fp16 .avx10.1, should we enable zmm registers for
> > AVX512FP16 insns?
> 
> And then yes here, too.
> 
> In both cases what remains to be determined is how vector size is to
> be limited. I think that wants to be independent of the .avx10.<N>.
> 

That also matches my expectation, and it will make everything easy to
understand.

> >> Right, a boolean is fine initially, but with the spec explicitly allowing the 128-
> >> bits-only mode, I'm pretty sure we ought to support that rather sooner than
> >> later. After all, more artificial environments (emulators,
> >> virtualization) may expose feature combinations not ever seen on real
> >> hardware.
> >
> > Thinking about it again, I suspect it may not be appropriate to put it into
> > i386_opcode_modifier, since in AVX10 the vector width depends on the CPU. As I
> > understand it, i386_opcode_modifier describes instructions, not the CPU.
> 
> I disagree. See the uses of EVex, for example. As said above, I think
> maximum vector width and ISA extensions want dealing with separately,
> and only the latter would generally qualify for Cpu* flags. Furthermore
> recall that the attribute wants widening sooner or later, and Cpu*
> flags are uniformly boolean. Only attributes may have numeric values.

Having checked the code, I still don't see the point here.

My concern is how to actually disable the zmm registers for AVX10/256
and the ymm registers for the theoretical AVX10/128. I think
i386_opcode_modifier is more related to building up the whole encoding. But
each AVX10.X/256 is an actual arch.

Adding a feature in i386_opcode_modifier can indicate the maximum vector
length at which the instruction is allowed on all archs, but it has nothing to
do with disabling zmm registers on a 256-bit-only arch.

I might be misunderstanding what should be added to i386_opcode_modifier.
Please correct me if anything is wrong.

Thx,
Haochen

> 
> Jan
  
Jan Beulich Aug. 16, 2023, 8:59 a.m. UTC | #8
On 16.08.2023 10:21, Jiang, Haochen wrote:
>>> Thinking about it again, I suspect it may not be appropriate to put it into
>>> i386_opcode_modifier, since in AVX10 the vector width depends on the CPU. As I
>>> understand it, i386_opcode_modifier describes instructions, not the CPU.
>>
>> I disagree. See the uses of EVex, for example. As said above, I think
>> maximum vector width and ISA extensions want dealing with separately,
>> and only the latter would generally qualify for Cpu* flags. Furthermore
>> recall that the attribute wants widening sooner or later, and Cpu*
>> flags are uniformly boolean. Only attributes may have numeric values.
> 
> Having checked the code, I still don't see the point here.
> 
> My concern is how to actually disable the zmm registers for AVX10/256
> and the ymm registers for the theoretical AVX10/128.

That's the easy part: That'll want doing in check_register(). The issue
is with insns which do 512-bit operation despite not using zmm registers
(think of vfpclassp* with memory operand).
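To illustrate why a register-only check is not sufficient, here is a minimal sketch comparing a check that only looks for zmm registers with one that also considers the memory operand size. The operand model and both function names are hypothetical; the example operands correspond roughly to `vfpclasspsz $1, (%rax), %k1` with the immediate omitted, where the 512-bit operation involves no zmm register at all.

```c
#include <stdbool.h>

/* Hypothetical operand model.  */
enum opnd_kind { OPND_XMM, OPND_YMM, OPND_ZMM, OPND_MEM, OPND_MASK };

struct operand
{
  enum opnd_kind kind;
  unsigned int mem_bits;  /* operation size for OPND_MEM, else 0 */
};

/* Register-only check in the spirit of check_register(): with vector
   length capped at 256 bits, reject any zmm register.  */
static bool
regs_ok_256 (const struct operand *ops, int n)
{
  for (int i = 0; i < n; i++)
    if (ops[i].kind == OPND_ZMM)
      return false;
  return true;
}

/* Width check that also looks at memory operands, so a 512-bit memory
   access with no zmm register anywhere is caught as well.  */
static bool
width_ok_256 (const struct operand *ops, int n)
{
  for (int i = 0; i < n; i++)
    if (ops[i].kind == OPND_ZMM
	|| (ops[i].kind == OPND_MEM && ops[i].mem_bits > 256))
      return false;
  return true;
}
```

For the operands `{ OPND_MEM, 512 }, { OPND_MASK, 0 }` the register-only check passes while the width check fails; with a 256-bit memory operand both pass.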

> I think i386_opcode_modifier is more related to building up the whole
> encoding. But each AVX10.X/256 is an actual arch.

I wouldn't agree with the last sentence, but ...

> Adding a feature in i386_opcode_modifier can indicate the maximum vector
> length at which the instruction is allowed on all archs, but it has nothing to
> do with disabling zmm registers on a 256-bit-only arch.

... you still have a point here. Maybe it only wants to be a boolean,
indicating that an insn is vector-length sensitive. Yet re-using the
EVex attribute continues to be an option: With vector length
constrained to 256 (or 128) bits, EVEX512 (or EVEX256) simply become
unavailable for encoding, and EVEXDYN would be equally constrained.
And if re-using that attribute continues to be an option, adding a
new non-boolean attribute clearly is also possible.

So I guess there may have been a slight misunderstanding: I was
suggesting an attribute expressing permissible vector lengths (hence
the consideration of re-using EVex), which would then be checked
against the established (through whatever directive / command line
option) maximum vector length. I did not suggest a new "max vector
length" attribute.
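A sketch of the "permissible vector lengths" idea, re-using the shape of the EVex attribute: given the established maximum vector length, it computes which lengths remain encodable for a template. The enum names echo EVEX512/EVEX256/EVEXDYN, but the values and `encodable_lengths()` are local to this sketch, not the constants from opcodes/i386-opc.h, and only fixed-length and dynamic-length templates are modelled.

```c
/* Names echo the EVex attribute values; the numbers are local to this
   sketch.  */
enum evex_attr { EV_128, EV_256, EV_512, EV_DYN };

/* Bit set of vector lengths.  */
enum { LEN_128 = 1, LEN_256 = 2, LEN_512 = 4 };

/* Lengths a template could still be encoded with once the maximum
   vector length (128, 256 or 512 bits) has been established.  */
static unsigned int
encodable_lengths (enum evex_attr ev, unsigned int max_vl)
{
  unsigned int allowed = LEN_128;

  if (max_vl >= 256)
    allowed |= LEN_256;
  if (max_vl >= 512)
    allowed |= LEN_512;

  switch (ev)
    {
    case EV_128:
      return allowed & LEN_128;
    case EV_256:
      return allowed & LEN_256;
    case EV_512:
      return allowed & LEN_512;  /* empty once max_vl < 512 */
    case EV_DYN:
      return allowed;            /* any length up to the maximum */
    }
  return 0;
}
```

With a 256-bit maximum, an EVEX512-style template has no encodable length left, while an EVEXDYN-style one is narrowed to 128 and 256 bits.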

Jan
  
Jan Beulich Aug. 17, 2023, 9:08 a.m. UTC | #9
On 16.08.2023 10:59, Jan Beulich via Binutils wrote:
> On 16.08.2023 10:21, Jiang, Haochen wrote:
>>>> Thinking about it again, I suspect it may not be appropriate to put it into
>>>> i386_opcode_modifier, since in AVX10 the vector width depends on the CPU. As I
>>>> understand it, i386_opcode_modifier describes instructions, not the CPU.
>>>
>>> I disagree. See the uses of EVex, for example. As said above, I think
>>> maximum vector width and ISA extensions want dealing with separately,
>>> and only the latter would generally qualify for Cpu* flags. Furthermore
>>> recall that the attribute wants widening sooner or later, and Cpu*
>>> flags are uniformly boolean. Only attributes may have numeric values.
>>
>> Having checked the code, I still don't see the point here.
>>
>> My concern is how to actually disable the zmm registers for AVX10/256
>> and the ymm registers for the theoretical AVX10/128.
> 
> That's the easy part: That'll want doing in check_register(). The issue
> is with insns which do 512-bit operation despite not using zmm registers
> (think of vfpclassp* with memory operand).
> 
>> I think i386_opcode_modifier is more related to building up the whole
>> encoding. But each AVX10.X/256 is an actual arch.
> 
> I wouldn't agree with the last sentence, but ...
> 
>> Adding a feature in i386_opcode_modifier can indicate the maximum vector
>> length at which the instruction is allowed on all archs, but it has nothing to
>> do with disabling zmm registers on a 256-bit-only arch.
> 
> ... you still have a point here. Maybe it only wants to be a boolean,
> indicating that an insn is vector-length sensitive. Yet re-using the
> EVex attribute continues to be an option: With vector length
> constrained to 256 (or 128) bits, EVEX512 (or EVEX256) simply become
> unavailable for encoding, and EVEXDYN would be equally constrained.
> And if re-using that attribute continues to be an option, adding a
> new non-boolean attribute clearly is also possible.
> 
> So I guess there may have been a slight misunderstanding: I was
> suggesting an attribute expressing permissible vector lengths (hence
> the consideration of re-using EVex), which would then be checked
> against the established (through whatever directive / command line
> option) maximum vector length. I did not suggest a new "max vector
> length" attribute.

Just to mention it: I've meanwhile realized that re-using EVex here will
collide with APX introducing EVEX-encoded KMOV*. So it'll need to be a
very similar but distinct attribute. And if it turned out that the
attribute then is really only needed on the mask insns (using EVex
elsewhere), it could equally well be a "permitted vector lengths" or a
"maximum vector length" one, as both are then equal. Question is what
AVX10/128 would mean for VEX-encoded insns. It seems likely that 256-bit
forms wouldn't be permitted there either then, in which case applicable
VEX-encoded insns would then need to gain such attributes as well. In
that case it would of course be more logical to stick to "permitted
vector lengths".

Jan
  
Jan Beulich Aug. 18, 2023, 6:53 a.m. UTC | #10
On 17.08.2023 11:08, Jan Beulich via Binutils wrote:
> On 16.08.2023 10:59, Jan Beulich via Binutils wrote:
>> On 16.08.2023 10:21, Jiang, Haochen wrote:
>>>>> Thinking about it again, I suspect it may not be appropriate to put it into
>>>>> i386_opcode_modifier, since in AVX10 the vector width depends on the CPU. As I
>>>>> understand it, i386_opcode_modifier describes instructions, not the CPU.
>>>>
>>>> I disagree. See the uses of EVex, for example. As said above, I think
>>>> maximum vector width and ISA extensions want dealing with separately,
>>>> and only the latter would generally qualify for Cpu* flags. Furthermore
>>>> recall that the attribute wants widening sooner or later, and Cpu*
>>>> flags are uniformly boolean. Only attributes may have numeric values.
>>>
>>> Having checked the code, I still don't see the point here.
>>>
>>> My concern is how to actually disable the zmm registers for AVX10/256
>>> and the ymm registers for the theoretical AVX10/128.
>>
>> That's the easy part: That'll want doing in check_register(). The issue
>> is with insns which do 512-bit operation despite not using zmm registers
>> (think of vfpclassp* with memory operand).
>>
>>> I think i386_opcode_modifier is more related to building up the whole
>>> encoding. But each AVX10.X/256 is an actual arch.
>>
>> I wouldn't agree with the last sentence, but ...
>>
>>> Adding a feature in i386_opcode_modifier can indicate the maximum vector
>>> length at which the instruction is allowed on all archs, but it has nothing to
>>> do with disabling zmm registers on a 256-bit-only arch.
>>
>> ... you still have a point here. Maybe it only wants to be a boolean,
>> indicating that an insn is vector-length sensitive. Yet re-using the
>> EVex attribute continues to be an option: With vector length
>> constrained to 256 (or 128) bits, EVEX512 (or EVEX256) simply become
>> unavailable for encoding, and EVEXDYN would be equally constrained.
>> And if re-using that attribute continues to be an option, adding a
>> new non-boolean attribute clearly is also possible.
>>
>> So I guess there may have been a slight misunderstanding: I was
>> suggesting an attribute expressing permissible vector lengths (hence
>> the consideration of re-using EVex), which would then be checked
>> against the established (through whatever directive / command line
>> option) maximum vector length. I did not suggest a new "max vector
>> length" attribute.
> 
> Just to mention it: I've meanwhile realized that re-using EVex here will
> collide with APX introducing EVEX-encoded KMOV*. So it'll need to be a
> very similar but distinct attribute. And if it turned out that the
> attribute then is really only needed on the mask insns (using EVex
> elsewhere), it could equally well be a "permitted vector lengths" or a
> "maximum vector length" one, as both are then equal. Question is what
> AVX10/128 would mean for VEX-encoded insns. It seems likely that 256-bit
> forms wouldn't be permitted there either then, in which case applicable
> VEX-encoded insns would then need to gain such attributes as well. In
> that case it would of course be more logical to stick to "permitted
> vector lengths".

Sorry, yet another update. For one it can't be "maximum", but only
"minimum". And I meanwhile think "permitted" along the lines of EVex
won't catch it either. I guess I will want to take a stab myself ...
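One reading of the "minimum" remark, sketched for the mask insns: rather than capping an insn's width, the attribute would state the smallest vector length that must be enabled for the insn to be usable. For example, 64-bit mask operations only make sense when 512-bit vectors are available (compare the 64-bit mask register check in the v2 patch). The table, `struct mask_insn`, and `mask_insn_ok()` are hypothetical and cover only two mnemonics.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical per-insn attribute: the minimum vector length (in bits)
   that must be enabled for the insn to be usable; 0 means no constraint.  */
struct mask_insn
{
  const char *mnemonic;
  unsigned int min_vl;
};

/* Illustrative entries only: 64 mask bits correspond to one bit per
   byte of a 512-bit vector.  */
static const struct mask_insn mask_insns[] =
{
  { "kmovw", 0 },
  { "kmovq", 512 },
};

static bool
mask_insn_ok (const char *mnemonic, unsigned int max_vl)
{
  for (size_t i = 0; i < sizeof mask_insns / sizeof mask_insns[0]; i++)
    if (strcmp (mask_insns[i].mnemonic, mnemonic) == 0)
      return mask_insns[i].min_vl <= max_vl;
  return false;  /* not in this toy table */
}
```

Under a 256-bit maximum, kmovq is rejected while kmovw remains usable; the comparison runs in the opposite direction from a "maximum length" attribute.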

Jan
  
Jan Beulich Aug. 18, 2023, 1:03 p.m. UTC | #11
On 14.08.2023 08:45, Haochen Jiang wrote:
> @@ -1315,6 +1321,20 @@ output_i386_opcode (FILE *table, const char *name, char *str,
>    ident = mkident (name);
>    fprintf (table, "  { MN_%s, 0x%0*llx%s, %u,",
>  	   ident, 2 * (int)length, opcode, end, i);
> +
> +  j = strlen(ident);
> +  /* All AVX512F based instructions are usable for AVX10.1 except
> +     AVX512PF/ER/4FMAPS/4VNNIW/VP2INTERSECT.  */
> +  if (strstr (cpu_flags, "AVX512")
> +      && !strstr (cpu_flags, "AVX512PF")
> +      && !strstr (cpu_flags, "AVX512ER")
> +      && !strstr (cpu_flags, "4FMAPS")
> +      && !strstr (cpu_flags, "4VNNIW")
> +      && !strstr (cpu_flags, "VP2INTERSECT"))
> +    {
> +      cpu_flags = concat (cpu_flags, "|AVX10_1", NULL);
> +      k = 1;
> +    }
>    free (ident);

While making a patch myself along the lines of what I had outlined, I came
to realize that the above isn't enough. (I'm pretty sure I wouldn't have
spotted this by merely reviewing your patch.) This may be a result of the
spec being somewhat ambiguous when it comes to GFNI, VAES, and VPCLMULQDQ.
There's a note there saying something about the respective EVEX encodings.
But that still requires the VEX encodings connected to these three
features to also become suitably available. While this works fine for
GFNI, it doesn't for the other two: The 128-bit VEX encodings, which
surely are available when the 256-bit ones are, would become impossible
to use. The assembler would pick the (larger) EVEX forms instead. There
are two ways to solve this that I can see right away:
1) AES becomes a dependency of VAES (and PCLMULQDQ one of VPCLMULQDQ)
2) We put in place extra templates.
I'm wary of the first option as long as not at least informally supported
by you (Intel). Hence I went with option 2 for now.
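For reference, the quoted i386-gen.c filter can be lifted into a standalone predicate to check which cpu_flags strings it would tag with `|AVX10_1`. The function name `tags_avx10_1()` is made up, the flag strings used to exercise it are examples that need not match i386-opc.tbl exactly, and the sketch does not model the VEX-vs-EVEX template selection problem just described.

```c
#include <stdbool.h>
#include <string.h>

/* Standalone copy of the v2 filter from output_i386_opcode(): should a
   template with this cpu_flags string also get "|AVX10_1" appended?
   Uses the same plain substring matching as the quoted hunk.  */
static bool
tags_avx10_1 (const char *cpu_flags)
{
  return strstr (cpu_flags, "AVX512") != NULL
	 && strstr (cpu_flags, "AVX512PF") == NULL
	 && strstr (cpu_flags, "AVX512ER") == NULL
	 && strstr (cpu_flags, "4FMAPS") == NULL
	 && strstr (cpu_flags, "4VNNIW") == NULL
	 && strstr (cpu_flags, "VP2INTERSECT") == NULL;
}
```

Flag strings that do not mention AVX512 at all, such as a VEX-only "VAES" entry, are left untouched by this filter.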

I'm only done with the /512 patch, so I won't post right away. I'm still
debating with myself whether to control maximum vector length via a new
directive, or via a special form of .arch.

Jan
  
Jiang, Haochen via Binutils Aug. 23, 2023, 2:20 a.m. UTC | #12
> -----Original Message-----
> From: Jan Beulich <jbeulich@suse.com>
> Sent: Friday, August 18, 2023 9:03 PM
> To: Jiang, Haochen <haochen.jiang@intel.com>
> Cc: hjl.tools@gmail.com; binutils@sourceware.org
> Subject: Re: [PATCH v2] Support Intel AVX10.1
> 
> On 14.08.2023 08:45, Haochen Jiang wrote:
> > @@ -1315,6 +1321,20 @@ output_i386_opcode (FILE *table, const char *name, char *str,
> >    ident = mkident (name);
> >    fprintf (table, "  { MN_%s, 0x%0*llx%s, %u,",
> >  	   ident, 2 * (int)length, opcode, end, i);
> > +
> > +  j = strlen(ident);
> > +  /* All AVX512F based instructions are usable for AVX10.1 except
> > +     AVX512PF/ER/4FMAPS/4VNNIW/VP2INTERSECT.  */
> > +  if (strstr (cpu_flags, "AVX512")
> > +      && !strstr (cpu_flags, "AVX512PF")
> > +      && !strstr (cpu_flags, "AVX512ER")
> > +      && !strstr (cpu_flags, "4FMAPS")
> > +      && !strstr (cpu_flags, "4VNNIW")
> > +      && !strstr (cpu_flags, "VP2INTERSECT"))
> > +    {
> > +      cpu_flags = concat (cpu_flags, "|AVX10_1", NULL);
> > +      k = 1;
> > +    }
> >    free (ident);
> 
> While making a patch myself along the lines of what I had outlined, I came to
> realize that the above isn't enough. (I'm pretty sure I wouldn't have spotted
> this by merely reviewing your patch.) This may be a result of the spec being
> somewhat ambiguous when it comes to GFNI, VAES, and VPCLMULQDQ.
> There's a note there saying something about the respective EVEX encodings.
> But that still requires the VEX encodings connected to these three features to
> also become suitably available. While this works fine for GFNI, it doesn't for
> the other two: The 128-bit VEX encodings, which surely are available when the
> 256-bit ones are, would become impossible to use. The assembler would pick
> the (larger) EVEX forms instead. There are two ways to solve this that I can see
> right away:
> 1) AES becomes a dependency of VAES (and PCLMULQDQ one of VPCLMULQDQ)
> 2) We put in place extra templates.
> I'm wary of the first option as long as not at least informally supported by you
> (Intel). Hence I went with option 2 for now.
> 
> I'm only done with the /512 patch, so I won't post right away. I'm still
> debating with myself whether to control maximum vector length via a new
> directive, or via a special form of .arch.

Hi Jan,

Do you think a command line option like -mavx10maxvl=256/512 with default 512
is ok for this scenario? I am working to revise the AVX10.1 patch like that.

Thx,
Haochen

> 
> Jan
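For reference, the eligibility rule in the quoted i386-gen.c hunk can be restated as a standalone predicate. This is a minimal sketch, not the generator itself: the function name avx10_1_eligible and the sample flag strings are hypothetical, and only the substring tests are taken from the hunk.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical restatement of the quoted i386-gen.c test: a template whose
   CPU flag string mentions "AVX512" is also tagged AVX10_1, unless it
   belongs to one of the excluded extensions.  Like the hunk, this matches
   by substring via strstr().  */
static bool
avx10_1_eligible (const char *cpu_flags)
{
  static const char *const excluded[] = {
    "AVX512PF", "AVX512ER", "4FMAPS", "4VNNIW", "VP2INTERSECT",
  };

  if (strstr (cpu_flags, "AVX512") == NULL)
    return false;

  for (size_t i = 0; i < sizeof excluded / sizeof excluded[0]; i++)
    if (strstr (cpu_flags, excluded[i]) != NULL)
      return false;

  return true;
}
```

Because the test keys on the substring "AVX512", a flag string that lacks it (for instance an illustrative VEX-only "VAES" entry) is never tagged, which is the gap Jan's reply points at.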
  
Jan Beulich Aug. 23, 2023, 5:54 a.m. UTC | #13
On 23.08.2023 04:20, Jiang, Haochen wrote:
> Do you think a command line option like -mavx10maxvl=256/512 with default 512
> is ok for this scenario? I am working to revise the AVX10.1 patch like that.

That's certainly an option, but right now I have different plans.

Jan
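Jan's VAES/VPCLMULQDQ point earlier in the thread can be made concrete with a toy model of template selection. Everything here is hypothetical: the feature bits, the four vaesenc rows and the first-match rule are simplifications, not gas's opcode table or matching code, which also considers operands, pseudo-prefixes and more.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical feature bits; not gas's i386_cpu_flags.  */
enum feature {
  F_AVX      = 1u << 0,
  F_AES      = 1u << 1,
  F_VAES     = 1u << 2,
  F_AVX512VL = 1u << 3,
  F_AVX10_1  = 1u << 4,
};

enum encoding { ENC_NONE, ENC_VEX, ENC_EVEX };

struct tmpl {
  enum encoding enc;
  unsigned vec_bits;  /* vector length this row covers */
  unsigned all;       /* every one of these features must be enabled */
  unsigned any;       /* if non-zero, at least one of these must be enabled */
};

/* Simplified vaesenc rows, VEX forms first so they win when usable.  */
static const struct tmpl vaesenc_tmpls[] = {
  { ENC_VEX,  128, F_AVX | F_AES,  0 },
  { ENC_VEX,  256, F_AVX | F_VAES, 0 },
  { ENC_EVEX, 128, F_VAES, F_AVX512VL | F_AVX10_1 },
  { ENC_EVEX, 256, F_VAES, F_AVX512VL | F_AVX10_1 },
};

/* Return the encoding of the first row whose size and features match.  */
static enum encoding
pick_encoding (unsigned enabled, unsigned vec_bits)
{
  for (size_t i = 0; i < sizeof vaesenc_tmpls / sizeof vaesenc_tmpls[0]; i++)
    {
      const struct tmpl *t = &vaesenc_tmpls[i];

      if (t->vec_bits != vec_bits)
	continue;
      if ((enabled & t->all) != t->all)
	continue;
      if (t->any != 0 && (enabled & t->any) == 0)
	continue;
      return t->enc;
    }
  return ENC_NONE;
}
```

With AVX, AVX10.1 and VAES enabled but AES disabled, a 128-bit request falls through to ENC_EVEX while a 256-bit request still gets ENC_VEX. Also enabling F_AES (Jan's option 1) makes the 128-bit request pick ENC_VEX again; option 2 would presumably correspond to an extra VEX.128 row gated on VAES instead.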
  
Jiang, Haochen Aug. 23, 2023, 6:21 a.m. UTC | #14
> -----Original Message-----
> From: Jan Beulich <jbeulich@suse.com>
> Sent: Wednesday, August 23, 2023 1:54 PM
> To: Jiang, Haochen <haochen.jiang@intel.com>
> Cc: hjl.tools@gmail.com; binutils@sourceware.org
> Subject: Re: [PATCH v2] Support Intel AVX10.1
> 
> On 23.08.2023 04:20, Jiang, Haochen wrote:
> > Do you think a command line option like -mavx10maxvl=256/512 with
> > default 512 is ok for this scenario? I am working to revise the AVX10.1
> > patch like that.
> 
> That's certainly an option, but right now I have different plans.

Actually all the three options are ok for me, they should not be that complex
based on the current part of v3 patch setting/clearing AVX512 bit for AVX10.1.

Thx,
Haochen

> 
> Jan
  
Jan Beulich Aug. 23, 2023, 6:24 a.m. UTC | #15
On 23.08.2023 08:21, Jiang, Haochen wrote:
>> -----Original Message-----
>> From: Jan Beulich <jbeulich@suse.com>
>> Sent: Wednesday, August 23, 2023 1:54 PM
>> To: Jiang, Haochen <haochen.jiang@intel.com>
>> Cc: hjl.tools@gmail.com; binutils@sourceware.org
>> Subject: Re: [PATCH v2] Support Intel AVX10.1
>>
>> On 23.08.2023 04:20, Jiang, Haochen wrote:
>>> Do you think a command line option like -mavx10maxvl=256/512 with
>>> default 512 is ok for this scenario? I am working to revise the AVX10.1
>>> patch like that.
>>
>> That's certainly an option, but right now I have different plans.
> 
> Actually all the three options are ok for me, they should not be that complex
> based on the current part of v3 patch setting/clearing AVX512 bit for AVX10.1.

Mind me asking what "all the three options" you're referring to here?

Jan
  
Jiang, Haochen Aug. 23, 2023, 6:25 a.m. UTC | #16
> -----Original Message-----
> From: Jan Beulich <jbeulich@suse.com>
> Sent: Wednesday, August 23, 2023 2:24 PM
> To: Jiang, Haochen <haochen.jiang@intel.com>
> Cc: hjl.tools@gmail.com; binutils@sourceware.org
> Subject: Re: [PATCH v2] Support Intel AVX10.1
> 
> On 23.08.2023 08:21, Jiang, Haochen wrote:
> >> -----Original Message-----
> >> From: Jan Beulich <jbeulich@suse.com>
> >> Sent: Wednesday, August 23, 2023 1:54 PM
> >> To: Jiang, Haochen <haochen.jiang@intel.com>
> >> Cc: hjl.tools@gmail.com; binutils@sourceware.org
> >> Subject: Re: [PATCH v2] Support Intel AVX10.1
> >>
> >> On 23.08.2023 04:20, Jiang, Haochen wrote:
> >>> Do you think a command line option like -mavx10maxvl=256/512 with
> >>> default 512 is ok for this scenario? I am working to revise the
> >>> AVX10.1 patch like that.
> >>
> >> That's certainly an option, but right now I have different plans.
> >
> > Actually all the three options are ok for me, they should not be that
> > complex based on the current part of v3 patch setting/clearing AVX512 bit
> > for AVX10.1.
> 
> Mind me asking what "all the three options" you're referring to here?

A new directive, a special form of .arch or the command line option.

Thx,
Haochen

> 
> Jan
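The three options Haochen lists could share one validation routine, which also fits Jan's remark that command-line and directive forms should stay similar. This is a sketch under stated assumptions: only -mavx10maxvl=256/512 with a default of 512 comes from the thread; the directive operand form, the "avx10.1/N" .arch spelling and every function name are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical maximum AVX10 vector length in bits; the default of 512
   mirrors the proposed -mavx10maxvl default.  Not a gas variable.  */
static unsigned avx10_max_vl = 512;

/* Single validation point shared by every front end.  */
static bool
set_avx10_max_vl (const char *value)
{
  if (strcmp (value, "256") == 0)
    avx10_max_vl = 256;
  else if (strcmp (value, "512") == 0)
    avx10_max_vl = 512;
  else
    return false;
  return true;
}

/* Front end 1: the proposed command line option "-mavx10maxvl=N".  */
static bool
handle_option (const char *arg)
{
  static const char prefix[] = "-mavx10maxvl=";

  if (strncmp (arg, prefix, sizeof prefix - 1) != 0)
    return false;
  return set_avx10_max_vl (arg + sizeof prefix - 1);
}

/* Front end 2: a hypothetical dedicated directive taking "256" or "512".  */
static bool
handle_directive (const char *operand)
{
  return set_avx10_max_vl (operand);
}

/* Front end 3: a hypothetical special .arch form spelled "avx10.1/N".  */
static bool
handle_arch_extension (const char *ext)
{
  static const char prefix[] = "avx10.1/";

  if (strncmp (ext, prefix, sizeof prefix - 1) != 0)
    return false;
  return set_avx10_max_vl (ext + sizeof prefix - 1);
}
```

Keeping the 256/512 check in set_avx10_max_vl() means no front end can accept a value the others reject, and a rejected value leaves the current setting unchanged.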
  
Jan Beulich Aug. 23, 2023, 6:39 a.m. UTC | #17
On 23.08.2023 08:25, Jiang, Haochen wrote:
>> -----Original Message-----
>> From: Jan Beulich <jbeulich@suse.com>
>> Sent: Wednesday, August 23, 2023 2:24 PM
>> To: Jiang, Haochen <haochen.jiang@intel.com>
>> Cc: hjl.tools@gmail.com; binutils@sourceware.org
>> Subject: Re: [PATCH v2] Support Intel AVX10.1
>>
>> On 23.08.2023 08:21, Jiang, Haochen wrote:
>>>> -----Original Message-----
>>>> From: Jan Beulich <jbeulich@suse.com>
>>>> Sent: Wednesday, August 23, 2023 1:54 PM
>>>> To: Jiang, Haochen <haochen.jiang@intel.com>
>>>> Cc: hjl.tools@gmail.com; binutils@sourceware.org
>>>> Subject: Re: [PATCH v2] Support Intel AVX10.1
>>>>
>>>> On 23.08.2023 04:20, Jiang, Haochen wrote:
>>>>> Do you think a command line option like -mavx10maxvl=256/512 with
>>>>> default 512 is ok for this scenario? I am working to revise the
>>>>> AVX10.1 patch like that.
>>>>
>>>> That's certainly an option, but right now I have different plans.
>>>
>>> Actually all the three options are ok for me, they should not be that
>>> complex based on the current part of v3 patch setting/clearing AVX512 bit
>>> for AVX10.1.
>>
>> Mind me asking what "all the three options" you're referring to here?
> 
> A new directive, a special form of .arch or the command line option.

Oh, I see. Whatever we choose, it'll need to come in both command line
and directive form, I think. And then both want to be sufficiently
similar. As mentioned, I have a firm plan now, but of course I need to
see whether it ends up looking sensibly once actually carried out.

Jan
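The v2 patch's check_register() change (in the tc-i386.c diff below) can be read as the following register-gating predicate. This is a reduced model using hypothetical stand-in types (struct arch_flags, struct reg, reg_usable): it keeps only the ZMM, mask-register and upper-16 (RegVRex) conditions visible in the hunks and omits the rest of the function, such as the AVX/YMM checks.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the relevant cpu_arch_flags bits.  */
struct arch_flags {
  bool avx512f;
  bool avx10_1;
  bool avx10_max_512bit;
};

enum reg_kind { REG_XMM, REG_YMM, REG_ZMM, REG_MASK };

struct reg {
  enum reg_kind kind;
  bool vrex;  /* one of the upper 16 vector registers, e.g. %xmm16 */
};

/* Mirrors the patched conditions: ZMM needs AVX512F or avx10_max_512bit,
   mask registers additionally accept AVX10.1, and the upper 16 registers
   need AVX512F or AVX10.1 plus 64-bit mode.  */
static bool
reg_usable (const struct reg *r, const struct arch_flags *a, bool code64)
{
  if (!a->avx512f && !a->avx10_max_512bit)
    {
      if (r->kind == REG_ZMM)
	return false;
      if (!a->avx10_1 && r->kind == REG_MASK)
	return false;
    }

  if (r->vrex && ((!a->avx512f && !a->avx10_1) || !code64))
    return false;

  return true;
}
```

For example, with AVX10.1 enabled but neither AVX512F nor avx10_max_512bit, %k1 and (in 64-bit mode) %xmm16 are accepted while %zmm0 is rejected.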
  

Patch

diff --git a/gas/NEWS b/gas/NEWS
index 1ed043511eb..4f3cc01d66a 100644
--- a/gas/NEWS
+++ b/gas/NEWS
@@ -1,5 +1,7 @@ 
 -*- text -*-
 
+* Add support for Intel AVX10.1 instructions.
+
 * Add support for Intel PBNDKB instructions.
 
 * Add support for Intel SM4 instructions.
diff --git a/gas/config/tc-i386.c b/gas/config/tc-i386.c
index e35e2660ed5..aa0941b0428 100644
--- a/gas/config/tc-i386.c
+++ b/gas/config/tc-i386.c
@@ -1156,6 +1156,8 @@  static const arch_entry cpu_arch[] =
   SUBARCH (sm3, SM3, ANY_SM3, false),
   SUBARCH (sm4, SM4, ANY_SM4, false),
   SUBARCH (pbndkb, PBNDKB, PBNDKB, false),
+  SUBARCH (avx10.1, AVX10_1, ANY_AVX10_1, false),
+  SUBARCH (avx10_max_512bit, AVX10_MAX_512BIT, ANY_AVX10_MAX_512BIT, false),
 };
 
 #undef SUBARCH
@@ -1844,8 +1846,12 @@  cpu_flags_match (const insn_template *t)
       /* This instruction is available only on some archs.  */
       i386_cpu_flags cpu = cpu_arch_flags;
 
-      /* AVX512VL is no standalone feature - match it and then strip it.  */
-      if (x.bitfield.cpuavx512vl && !cpu.bitfield.cpuavx512vl)
+      /* AVX512VL is no standalone feature - match it and then strip it.
+	 AVX10.1 shares its encodings with AVX512VL, so it needs to be
+	 checked as well.  */
+      if (x.bitfield.cpuavx512vl
+	  && !cpu.bitfield.cpuavx512vl
+	  && !cpu.bitfield.cpuavx10_1)
 	return match;
       x.bitfield.cpuavx512vl = 0;
 
@@ -1871,13 +1877,25 @@  cpu_flags_match (const insn_template *t)
 	    }
 	  else if (x.bitfield.cpuavx512f)
 	    {
-	      /* We need to check a few extra flags with AVX512F.  */
-	      if (cpu.bitfield.cpuavx512f
+	      /* We need to check a few extra flags with AVX512F
+		 or AVX10.1.  */
+	      if ((cpu.bitfield.cpuavx512f || cpu.bitfield.cpuavx10_1)
 		  && (!x.bitfield.cpugfni || cpu.bitfield.cpugfni)
 		  && (!x.bitfield.cpuvaes || cpu.bitfield.cpuvaes)
 		  && (!x.bitfield.cpuvpclmulqdq || cpu.bitfield.cpuvpclmulqdq))
 		match |= CPU_FLAGS_ARCH_MATCH;
 	    }
+	  else if (x.bitfield.cpuavx512bw)
+	    {
+	      /* We need to eliminate 64 bit mask instructions when AVX512BW
+		 and AVX10.1-512 are both disabled.  */
+	      if (cpu.bitfield.cpuavx512bw
+		  || cpu_arch_flags.bitfield.cpuavx10_max_512bit
+		  || t->opcode_modifier.evex || t->opcode_modifier.vexw != 2
+		  || (t->opcode_modifier.opcodeprefix == 1
+		      && t->opcode_space != 3))
+		match |= CPU_FLAGS_ARCH_MATCH;
+	    }
 	  else
 	    match |= CPU_FLAGS_ARCH_MATCH;
 	}
@@ -6382,7 +6400,9 @@  check_VecOperands (const insn_template *t)
   cpu = cpu_flags_and (t->cpu_flags, avx512);
   if (!cpu_flags_all_zero (&cpu)
       && !t->cpu_flags.bitfield.cpuavx512vl
-      && !cpu_arch_flags.bitfield.cpuavx512vl)
+      && !cpu_arch_flags.bitfield.cpuavx512vl
+      && (!t->cpu_flags.bitfield.cpuavx10_1
+	  || !cpu_arch_flags.bitfield.cpuavx10_1))
     {
       for (op = 0; op < t->operands; ++op)
 	{
@@ -13794,10 +13814,14 @@  static bool check_register (const reg_entry *r)
   if (r->reg_type.bitfield.class == RegMMX && !cpu_arch_flags.bitfield.cpummx)
     return false;
 
-  if (!cpu_arch_flags.bitfield.cpuavx512f)
+  if (!cpu_arch_flags.bitfield.cpuavx512f
+      && !cpu_arch_flags.bitfield.cpuavx10_max_512bit)
     {
-      if (r->reg_type.bitfield.zmmword
-	  || r->reg_type.bitfield.class == RegMask)
+      if (r->reg_type.bitfield.zmmword)
+	return false;
+
+      if (!cpu_arch_flags.bitfield.cpuavx10_1
+	  && r->reg_type.bitfield.class == RegMask)
 	return false;
 
       if (!cpu_arch_flags.bitfield.cpuavx)
@@ -13826,7 +13850,8 @@  static bool check_register (const reg_entry *r)
      mode, and require EVEX encoding.  */
   if (r->reg_flags & RegVRex)
     {
-      if (!cpu_arch_flags.bitfield.cpuavx512f
+      if ((!cpu_arch_flags.bitfield.cpuavx512f
+	   && !cpu_arch_flags.bitfield.cpuavx10_1)
 	  || flag_code != CODE_64BIT)
 	return false;
 
diff --git a/gas/doc/c-i386.texi b/gas/doc/c-i386.texi
index dd06282a5a3..ddb6e4dec81 100644
--- a/gas/doc/c-i386.texi
+++ b/gas/doc/c-i386.texi
@@ -212,6 +212,8 @@  accept various extension mnemonics.  For example,
 @code{sm3},
 @code{sm4},
 @code{pbndkb},
+@code{avx10.1},
+@code{avx10_max_512bit},
 @code{amx_int8},
 @code{amx_bf16},
 @code{amx_fp16},
@@ -1642,7 +1644,7 @@  supported on the CPU specified.  The choices for @var{cpu_type} are:
 @item @samp{.cmpccxadd} @tab @samp{.wrmsrns} @tab @samp{.msrlist}
 @item @samp{.avx_ne_convert} @tab @samp{.rao_int} @tab @samp{.fred} @tab @samp{.lkgs}
 @item @samp{.avx_vnni_int16} @tab @samp{.sha512} @tab @samp{.sm3} @tab @samp{.sm4}
-@item @samp{.pbndkb}
+@item @samp{.pbndkb} @tab @samp{.avx10.1} @tab @samp{.avx10_max_512bit}
 @item @samp{.wbnoinvd} @tab @samp{.pconfig} @tab @samp{.waitpkg} @tab @samp{.cldemote}
 @item @samp{.shstk} @tab @samp{.gfni} @tab @samp{.vaes} @tab @samp{.vpclmulqdq}
 @item @samp{.movdiri} @tab @samp{.movdir64b} @tab @samp{.enqcmd} @tab @samp{.tsxldtrk}
diff --git a/gas/testsuite/gas/i386/avx-ifma-inval.l b/gas/testsuite/gas/i386/avx-ifma-inval.l
index 5294c2ca73d..d2f1cf1d544 100644
--- a/gas/testsuite/gas/i386/avx-ifma-inval.l
+++ b/gas/testsuite/gas/i386/avx-ifma-inval.l
@@ -1,3 +1,3 @@ 
 .* Assembler messages:
-.*:6: Error: unsupported .* `vpmadd52huq'
-.*:7: Error: operand .* `vpmadd52huq'
+.*:7: Error: unsupported .* `vpmadd52huq'
+.*:8: Error: operand .* `vpmadd52huq'
diff --git a/gas/testsuite/gas/i386/avx-ifma-inval.s b/gas/testsuite/gas/i386/avx-ifma-inval.s
index 4b763b6e450..a1a50dcacc7 100644
--- a/gas/testsuite/gas/i386/avx-ifma-inval.s
+++ b/gas/testsuite/gas/i386/avx-ifma-inval.s
@@ -2,6 +2,7 @@ 
 
 	.text
 	.arch .noavx512ifma
+	.arch .noavx10.1
 _start:
 	vpmadd52huq %xmm2, %xmm4, %xmm2{%k6}
 	vpmadd52huq %zmm2, %zmm4, %zmm2
diff --git a/gas/testsuite/gas/i386/avx-ifma.s b/gas/testsuite/gas/i386/avx-ifma.s
index 81046966d70..8c1b3133a19 100644
--- a/gas/testsuite/gas/i386/avx-ifma.s
+++ b/gas/testsuite/gas/i386/avx-ifma.s
@@ -17,6 +17,7 @@  _start:
        test_insn vpmadd52luq
 
        .arch .noavx512vl
+       .arch .noavx10.1
 
        vpmadd52huq	  %zmm0, %zmm0, %zmm0
        vpmadd52huq	  %ymm0, %ymm0, %ymm0
@@ -24,12 +25,14 @@  _start:
 
        .arch default
        .arch .noavx512ifma
+       .arch .noavx10.1
        
        vpmadd52huq	  %ymm0, %ymm0, %ymm0
        vpmadd52huq	  %xmm0, %xmm0, %xmm0
 
        .arch default
        .arch .noavx512f
+       .arch .noavx10.1
 
        vpmadd52huq	  %ymm0, %ymm0, %ymm0
        vpmadd52huq	  %xmm0, %xmm0, %xmm0
diff --git a/gas/testsuite/gas/i386/avx-vnni-inval.l b/gas/testsuite/gas/i386/avx-vnni-inval.l
index 58535cf8deb..5b9b1a514f4 100644
--- a/gas/testsuite/gas/i386/avx-vnni-inval.l
+++ b/gas/testsuite/gas/i386/avx-vnni-inval.l
@@ -1,3 +1,3 @@ 
 .* Assembler messages:
-.*:6: Error: unsupported .* `vpdpbusd'
-.*:7: Error: operand .* `vpdpbusd'
+.*:7: Error: unsupported .* `vpdpbusd'
+.*:8: Error: operand .* `vpdpbusd'
diff --git a/gas/testsuite/gas/i386/avx-vnni-inval.s b/gas/testsuite/gas/i386/avx-vnni-inval.s
index 28366f1e6d2..a2b07957e1e 100644
--- a/gas/testsuite/gas/i386/avx-vnni-inval.s
+++ b/gas/testsuite/gas/i386/avx-vnni-inval.s
@@ -2,6 +2,7 @@ 
 
 	.text
 	.arch .noavx512_vnni
+	.arch .noavx10.1
 _start:
 	vpdpbusd %xmm2, %xmm4, %xmm2{%k6}
 	vpdpbusd %zmm2, %zmm4, %zmm2
diff --git a/gas/testsuite/gas/i386/avx-vnni.s b/gas/testsuite/gas/i386/avx-vnni.s
index 6260330cca4..a31af4c4376 100644
--- a/gas/testsuite/gas/i386/avx-vnni.s
+++ b/gas/testsuite/gas/i386/avx-vnni.s
@@ -17,6 +17,7 @@  _start:
 	test_insn vpdpwssds
 
 	.arch .noavx512vl
+	.arch .noavx10.1
 
 	vpdpbusd	%zmm0, %zmm0, %zmm0
 	vpdpbusd	%ymm0, %ymm0, %ymm0
@@ -24,12 +25,14 @@  _start:
 
 	.arch default
 	.arch .noavx512_vnni
+	.arch .noavx10.1
 
 	vpdpbusd	%ymm0, %ymm0, %ymm0
 	vpdpbusd	%xmm0, %xmm0, %xmm0
 
 	.arch default
 	.arch .noavx512f
+	.arch .noavx10.1
 
 	vpdpbusd	%ymm0, %ymm0, %ymm0
 	vpdpbusd	%xmm0, %xmm0, %xmm0
diff --git a/gas/testsuite/gas/i386/noavx512-1.l b/gas/testsuite/gas/i386/noavx512-1.l
index 655a90de2ce..c636717086a 100644
--- a/gas/testsuite/gas/i386/noavx512-1.l
+++ b/gas/testsuite/gas/i386/noavx512-1.l
@@ -1,44 +1,44 @@ 
 .*: Assembler messages:
-.*:8: Error: .*operand size mismatch.*
-.*:9: Error: .*unsupported masking.*
+.*:9: Error: .*operand size mismatch.*
 .*:10: Error: .*unsupported masking.*
-.*:25: Error: .*not supported.*
+.*:11: Error: .*unsupported masking.*
 .*:26: Error: .*not supported.*
 .*:27: Error: .*not supported.*
-.*:11: Error: .*not supported.*
+.*:28: Error: .*not supported.*
 .*:12: Error: .*not supported.*
 .*:13: Error: .*not supported.*
 .*:14: Error: .*not supported.*
 .*:15: Error: .*not supported.*
 .*:16: Error: .*not supported.*
 .*:17: Error: .*not supported.*
-.*:21: Error: .*operand.*mismatch.*
-.*:22: Error: .*unsupported masking.*
+.*:18: Error: .*not supported.*
+.*:22: Error: .*operand.*mismatch.*
 .*:23: Error: .*unsupported masking.*
-.*:24: Error: .*not supported.*
+.*:24: Error: .*unsupported masking.*
 .*:25: Error: .*not supported.*
 .*:26: Error: .*not supported.*
 .*:27: Error: .*not supported.*
-.*:8: Error: .*bad register name.*
-.*:9: Error: .*unknown vector operation.*
+.*:28: Error: .*not supported.*
+.*:9: Error: .*bad register name.*
 .*:10: Error: .*unknown vector operation.*
-.*:11: Error: .*not supported.*
+.*:11: Error: .*unknown vector operation.*
 .*:12: Error: .*not supported.*
 .*:13: Error: .*not supported.*
 .*:14: Error: .*not supported.*
 .*:15: Error: .*not supported.*
 .*:16: Error: .*not supported.*
 .*:17: Error: .*not supported.*
-.*:18: Error: .*bad register name.*
-.*:19: Error: .*unknown vector operation.*
+.*:18: Error: .*not supported.*
+.*:19: Error: .*bad register name.*
 .*:20: Error: .*unknown vector operation.*
-.*:21: Error: .*bad register name.*
-.*:22: Error: .*unknown vector operation.*
+.*:21: Error: .*unknown vector operation.*
+.*:22: Error: .*bad register name.*
 .*:23: Error: .*unknown vector operation.*
-.*:24: Error: .*not supported.*
+.*:24: Error: .*unknown vector operation.*
 .*:25: Error: .*not supported.*
 .*:26: Error: .*not supported.*
 .*:27: Error: .*not supported.*
+.*:28: Error: .*not supported.*
 #...
 [ 	]*[0-9]+[ 	]+\# Test \.arch \.noavx512XX
 [ 	]*[0-9]+[ 	]+\.text
@@ -49,6 +49,7 @@ 
 #...
 [ 	]*[0-9]+[ 	]+>  \.arch default
 [ 	]*[0-9]+[ 	]+>  \.arch default
+[ 	]*[0-9]+[ 	]+>  \.arch \.noavx10.1
 [ 	]*[0-9]+[ 	]+\?\?\?\? 62F27D4F 	>  vpabsb %zmm5,%zmm6\{%k7\}
 [ 	]*[0-9]+[ 	]+1CF5
 [ 	]*[0-9]+[ 	]+\?\?\?\? 62F27D0F 	>  vpabsb %xmm5,%xmm6\{%k7\}
@@ -93,6 +94,7 @@ 
 #...
 [ 	]*[0-9]+[ 	]+>  \.arch default
 [ 	]*[0-9]+[ 	]+>  \.arch \.noavx512bw
+[ 	]*[0-9]+[ 	]+>  \.arch \.noavx10.1
 [ 	]*[0-9]+[ 	]+>  vpabsb %zmm5,%zmm6\{%k7\}
 [ 	]*[0-9]+[ 	]+>  vpabsb %xmm5,%xmm6\{%k7\}
 [ 	]*[0-9]+[ 	]+>  vpabsb %ymm5,%ymm6\{%k7\}
@@ -131,6 +133,7 @@ 
 #...
 [ 	]*[0-9]+[ 	]+>  \.arch default
 [ 	]*[0-9]+[ 	]+>  \.arch \.noavx512cd
+[ 	]*[0-9]+[ 	]+>  \.arch \.noavx10.1
 [ 	]*[0-9]+[ 	]+\?\?\?\? 62F27D4F 	>  vpabsb %zmm5,%zmm6\{%k7\}
 [ 	]*[0-9]+[ 	]+1CF5
 [ 	]*[0-9]+[ 	]+\?\?\?\? 62F27D0F 	>  vpabsb %xmm5,%xmm6\{%k7\}
@@ -172,6 +175,7 @@ 
 #...
 [ 	]*[0-9]+[ 	]+>  \.arch default
 [ 	]*[0-9]+[ 	]+>  \.arch \.noavx512dq
+[ 	]*[0-9]+[ 	]+>  \.arch \.noavx10.1
 [ 	]*[0-9]+[ 	]+\?\?\?\? 62F27D4F 	>  vpabsb %zmm5,%zmm6\{%k7\}
 [ 	]*[0-9]+[ 	]+1CF5
 [ 	]*[0-9]+[ 	]+\?\?\?\? 62F27D0F 	>  vpabsb %xmm5,%xmm6\{%k7\}
@@ -213,6 +217,7 @@ 
 #...
 [ 	]*[0-9]+[ 	]+>  \.arch default
 [ 	]*[0-9]+[ 	]+>  \.arch \.noavx512er
+[ 	]*[0-9]+[ 	]+>  \.arch \.noavx10.1
 [ 	]*[0-9]+[ 	]+\?\?\?\? 62F27D4F 	>  vpabsb %zmm5,%zmm6\{%k7\}
 [ 	]*[0-9]+[ 	]+1CF5
 [ 	]*[0-9]+[ 	]+\?\?\?\? 62F27D0F 	>  vpabsb %xmm5,%xmm6\{%k7\}
@@ -256,6 +261,7 @@ 
 #...
 [ 	]*[0-9]+[ 	]+>  \.arch default
 [ 	]*[0-9]+[ 	]+>  \.arch \.noavx512ifma
+[ 	]*[0-9]+[ 	]+>  \.arch \.noavx10.1
 [ 	]*[0-9]+[ 	]+\?\?\?\? 62F27D4F 	>  vpabsb %zmm5,%zmm6\{%k7\}
 [ 	]*[0-9]+[ 	]+1CF5
 [ 	]*[0-9]+[ 	]+\?\?\?\? 62F27D0F 	>  vpabsb %xmm5,%xmm6\{%k7\}
@@ -297,6 +303,7 @@ 
 #...
 [ 	]*[0-9]+[ 	]+>  \.arch default
 [ 	]*[0-9]+[ 	]+>  \.arch \.noavx512pf
+[ 	]*[0-9]+[ 	]+>  \.arch \.noavx10.1
 [ 	]*[0-9]+[ 	]+\?\?\?\? 62F27D4F 	>  vpabsb %zmm5,%zmm6\{%k7\}
 [ 	]*[0-9]+[ 	]+1CF5
 [ 	]*[0-9]+[ 	]+\?\?\?\? 62F27D0F 	>  vpabsb %xmm5,%xmm6\{%k7\}
@@ -339,6 +346,7 @@ 
 #...
 [ 	]*[0-9]+[ 	]+>  \.arch default
 [ 	]*[0-9]+[ 	]+>  \.arch \.noavx512vbmi
+[ 	]*[0-9]+[ 	]+>  \.arch \.noavx10.1
 [ 	]*[0-9]+[ 	]+\?\?\?\? 62F27D4F 	>  vpabsb %zmm5,%zmm6\{%k7\}
 [ 	]*[0-9]+[ 	]+1CF5
 [ 	]*[0-9]+[ 	]+\?\?\?\? 62F27D0F 	>  vpabsb %xmm5,%xmm6\{%k7\}
@@ -380,6 +388,7 @@ 
 #...
 [ 	]*[0-9]+[ 	]+>  \.arch default
 [ 	]*[0-9]+[ 	]+>  \.arch \.noavx512f
+[ 	]*[0-9]+[ 	]+>  \.arch \.noavx10.1
 [ 	]*[0-9]+[ 	]+>  vpabsb %zmm5,%zmm6\{%k7\}
 [ 	]*[0-9]+[ 	]+>  vpabsb %xmm5,%xmm6\{%k7\}
 [ 	]*[0-9]+[ 	]+>  vpabsb %ymm5,%ymm6\{%k7\}
diff --git a/gas/testsuite/gas/i386/noavx512-1.s b/gas/testsuite/gas/i386/noavx512-1.s
index ab3abdc5ceb..8f579474fdb 100644
--- a/gas/testsuite/gas/i386/noavx512-1.s
+++ b/gas/testsuite/gas/i386/noavx512-1.s
@@ -5,6 +5,7 @@ 
 
 	.arch default
 	.arch \isa
+	.arch .noavx10.1
 	vpabsb %zmm5, %zmm6{%k7}		# AVX512BW
 	vpabsb %xmm5, %xmm6{%k7}		# AVX512BW + AVX512VL
 	vpabsb %ymm5, %ymm6{%k7}		# AVX512BW + AVX512VL
diff --git a/gas/testsuite/gas/i386/noavx512-2.l b/gas/testsuite/gas/i386/noavx512-2.l
index 02c92e0d8db..1a73eb0613a 100644
--- a/gas/testsuite/gas/i386/noavx512-2.l
+++ b/gas/testsuite/gas/i386/noavx512-2.l
@@ -1,106 +1,107 @@ 
 .*: Assembler messages:
-.*:26: Error: .*unsupported masking.*
 .*:27: Error: .*unsupported masking.*
-.*:29: Error: .*unsupported instruction.*
+.*:28: Error: .*unsupported masking.*
 .*:30: Error: .*unsupported instruction.*
-.*:32: Error: .*unsupported instruction.*
+.*:31: Error: .*unsupported instruction.*
 .*:33: Error: .*unsupported instruction.*
-.*:36: Error: .*unsupported masking.*
+.*:34: Error: .*unsupported instruction.*
 .*:37: Error: .*unsupported masking.*
-.*:39: Error: .*unsupported instruction.*
+.*:38: Error: .*unsupported masking.*
 .*:40: Error: .*unsupported instruction.*
-.*:43: Error: .*unsupported instruction.*
+.*:41: Error: .*unsupported instruction.*
 .*:44: Error: .*unsupported instruction.*
+.*:45: Error: .*unsupported instruction.*
 GAS LISTING .*
 #...
 [ 	]*1[ 	]+\# Test \.arch \.noavx512vl
 [ 	]*2[ 	]+\.text
-[ 	]*3[ 	]+\?\?\?\? 62F27D4F 		vpabsb %zmm5, %zmm6\{%k7\}		\# AVX512BW
-[ 	]*3[ 	]+1CF5
-[ 	]*4[ 	]+\?\?\?\? 62F27D0F 		vpabsb %xmm5, %xmm6\{%k7\}		\# AVX512BW \+ AVX512VL
+[ 	]*3[ 	]+\.arch \.noavx10.1
+[ 	]*4[ 	]+\?\?\?\? 62F27D4F 		vpabsb %zmm5, %zmm6\{%k7\}		\# AVX512BW
 [ 	]*4[ 	]+1CF5
-[ 	]*5[ 	]+\?\?\?\? 62F27D2F 		vpabsb %ymm5, %ymm6\{%k7\}		\# AVX512BW \+ AVX512VL
+[ 	]*5[ 	]+\?\?\?\? 62F27D0F 		vpabsb %xmm5, %xmm6\{%k7\}		\# AVX512BW \+ AVX512VL
 [ 	]*5[ 	]+1CF5
-[ 	]*6[ 	]+\?\?\?\? 62F27D48 		vpconflictd %zmm5, %zmm6		\# AVX412CD
-[ 	]*6[ 	]+C4F5
-[ 	]*7[ 	]+\?\?\?\? 62F27D08 		vpconflictd %xmm5, %xmm6		\# AVX412CD \+ AVX512VL
+[ 	]*6[ 	]+\?\?\?\? 62F27D2F 		vpabsb %ymm5, %ymm6\{%k7\}		\# AVX512BW \+ AVX512VL
+[ 	]*6[ 	]+1CF5
+[ 	]*7[ 	]+\?\?\?\? 62F27D48 		vpconflictd %zmm5, %zmm6		\# AVX412CD
 [ 	]*7[ 	]+C4F5
-[ 	]*8[ 	]+\?\?\?\? 62F27D28 		vpconflictd %ymm5, %ymm6		\# AVX412CD \+ AVX512VL
+[ 	]*8[ 	]+\?\?\?\? 62F27D08 		vpconflictd %xmm5, %xmm6		\# AVX412CD \+ AVX512VL
 [ 	]*8[ 	]+C4F5
-[ 	]*9[ 	]+\?\?\?\? 62F1FD4F 		vcvtpd2qq \(%ecx\), %zmm6\{%k7\}		\# AVX512DQ
-[ 	]*9[ 	]+7B31
-[ 	]*10[ 	]+\?\?\?\? 62F1FD0F 		vcvtpd2qq \(%ecx\), %xmm6\{%k7\}		\# AVX512DQ \+ AVX512VL
+[ 	]*9[ 	]+\?\?\?\? 62F27D28 		vpconflictd %ymm5, %ymm6		\# AVX412CD \+ AVX512VL
+[ 	]*9[ 	]+C4F5
+[ 	]*10[ 	]+\?\?\?\? 62F1FD4F 		vcvtpd2qq \(%ecx\), %zmm6\{%k7\}		\# AVX512DQ
 [ 	]*10[ 	]+7B31
-[ 	]*11[ 	]+\?\?\?\? 62F1FD2F 		vcvtpd2qq \(%ecx\), %ymm6\{%k7\}		\# AVX512DQ \+ AVX512VL
+[ 	]*11[ 	]+\?\?\?\? 62F1FD0F 		vcvtpd2qq \(%ecx\), %xmm6\{%k7\}		\# AVX512DQ \+ AVX512VL
 [ 	]*11[ 	]+7B31
-[ 	]*12[ 	]+\?\?\?\? 62F27D4F 		vexp2ps %zmm5, %zmm6\{%k7\}		\# AVX512ER
-[ 	]*12[ 	]+C8F5
-[ 	]*13[ 	]+\?\?\?\? 62F1D54F 		vaddpd %zmm4, %zmm5, %zmm6\{%k7\}		\# AVX512F
-[ 	]*13[ 	]+58F4
-[ 	]*14[ 	]+\?\?\?\? 62F1D50F 		vaddpd %xmm4, %xmm5, %xmm6\{%k7\}		\# AVX512F \+ AVX512VL
+[ 	]*12[ 	]+\?\?\?\? 62F1FD2F 		vcvtpd2qq \(%ecx\), %ymm6\{%k7\}		\# AVX512DQ \+ AVX512VL
+[ 	]*12[ 	]+7B31
+[ 	]*13[ 	]+\?\?\?\? 62F27D4F 		vexp2ps %zmm5, %zmm6\{%k7\}		\# AVX512ER
+[ 	]*13[ 	]+C8F5
+[ 	]*14[ 	]+\?\?\?\? 62F1D54F 		vaddpd %zmm4, %zmm5, %zmm6\{%k7\}		\# AVX512F
 [ 	]*14[ 	]+58F4
-[ 	]*15[ 	]+\?\?\?\? 62F1D52F 		vaddpd %ymm4, %ymm5, %ymm6\{%k7\}		\# AVX512F \+ AVX512VL
+[ 	]*15[ 	]+\?\?\?\? 62F1D50F 		vaddpd %xmm4, %xmm5, %xmm6\{%k7\}		\# AVX512F \+ AVX512VL
 [ 	]*15[ 	]+58F4
-[ 	]*16[ 	]+\?\?\?\? 62F2D54F 		vpmadd52luq %zmm4, %zmm5, %zmm6\{%k7\}	\# AVX512IFMA
-[ 	]*16[ 	]+B4F4
-[ 	]*17[ 	]+\?\?\?\? 62F2D50F 		vpmadd52luq %xmm4, %xmm5, %xmm6\{%k7\}	\# AVX512IFMA \+ AVX512VL
+[ 	]*16[ 	]+\?\?\?\? 62F1D52F 		vaddpd %ymm4, %ymm5, %ymm6\{%k7\}		\# AVX512F \+ AVX512VL
+[ 	]*16[ 	]+58F4
+[ 	]*17[ 	]+\?\?\?\? 62F2D54F 		vpmadd52luq %zmm4, %zmm5, %zmm6\{%k7\}	\# AVX512IFMA
 [ 	]*17[ 	]+B4F4
-[ 	]*18[ 	]+\?\?\?\? 62F2D52F 		vpmadd52luq %ymm4, %ymm5, %ymm6\{%k7\}	\# AVX512IFMA \+ AVX512VL
+[ 	]*18[ 	]+\?\?\?\? 62F2D50F 		vpmadd52luq %xmm4, %xmm5, %xmm6\{%k7\}	\# AVX512IFMA \+ AVX512VL
 [ 	]*18[ 	]+B4F4
-[ 	]*19[ 	]+\?\?\?\? 62F2FD49 		vgatherpf0dpd 23\(%ebp,%ymm7,8\)\{%k1\}	\# AVX512PF
-[ 	]*19[ 	]+C68CFD17 
-[ 	]*19[ 	]+000000
-[ 	]*20[ 	]+\?\?\?\? 62F2554F 		vpermb %zmm4, %zmm5, %zmm6\{%k7\}		\# AVX512VBMI
-[ 	]*20[ 	]+8DF4
-[ 	]*21[ 	]+\?\?\?\? 62F2550F 		vpermb %xmm4, %xmm5, %xmm6\{%k7\}		\# AVX512VBMI \+ AVX512VL
+[ 	]*19[ 	]+\?\?\?\? 62F2D52F 		vpmadd52luq %ymm4, %ymm5, %ymm6\{%k7\}	\# AVX512IFMA \+ AVX512VL
+[ 	]*19[ 	]+B4F4
+[ 	]*20[ 	]+\?\?\?\? 62F2FD49 		vgatherpf0dpd 23\(%ebp,%ymm7,8\)\{%k1\}	\# AVX512PF
+[ 	]*20[ 	]+C68CFD17 
+[ 	]*20[ 	]+000000
+[ 	]*21[ 	]+\?\?\?\? 62F2554F 		vpermb %zmm4, %zmm5, %zmm6\{%k7\}		\# AVX512VBMI
 [ 	]*21[ 	]+8DF4
-[ 	]*22[ 	]+\?\?\?\? 62F2552F 		vpermb %ymm4, %ymm5, %ymm6\{%k7\}		\# AVX512VBMI \+ AVX512VL
+[ 	]*22[ 	]+\?\?\?\? 62F2550F 		vpermb %xmm4, %xmm5, %xmm6\{%k7\}		\# AVX512VBMI \+ AVX512VL
 [ 	]*22[ 	]+8DF4
-[ 	]*23[ 	]+
-[ 	]*24[ 	]+\.arch \.noavx512vl
-[ 	]*25[ 	]+\?\?\?\? 62F27D4F 		vpabsb %zmm5, %zmm6\{%k7\}		\# AVX512BW
-[ 	]*25[ 	]+1CF5
-[ 	]*26[ 	]+vpabsb %xmm5, %xmm6\{%k7\}		\# AVX512BW \+ AVX512VL
-[ 	]*27[ 	]+vpabsb %ymm5, %ymm6\{%k7\}		\# AVX512BW \+ AVX512VL
-[ 	]*28[ 	]+\?\?\?\? 62F27D48 		vpconflictd %zmm5, %zmm6		\# AVX412CD
-[ 	]*28[ 	]+C4F5
-[ 	]*29[ 	]+vpconflictd %xmm5, %xmm6		\# AVX412CD \+ AVX512VL
-[ 	]*30[ 	]+vpconflictd %ymm5, %ymm6		\# AVX412CD \+ AVX512VL
-[ 	]*31[ 	]+\?\?\?\? 62F1FD4F 		vcvtpd2qq \(%ecx\), %zmm6\{%k7\}		\# AVX512DQ
-[ 	]*31[ 	]+7B31
-[ 	]*32[ 	]+vcvtpd2qq \(%ecx\), %xmm6\{%k7\}		\# AVX512DQ \+ AVX512VL
-[ 	]*33[ 	]+vcvtpd2qq \(%ecx\), %ymm6\{%k7\}		\# AVX512DQ \+ AVX512VL
+[ 	]*23[ 	]+\?\?\?\? 62F2552F 		vpermb %ymm4, %ymm5, %ymm6\{%k7\}		\# AVX512VBMI \+ AVX512VL
+[ 	]*23[ 	]+8DF4
+[ 	]*24[ 	]+
+[ 	]*25[ 	]+\.arch \.noavx512vl
+[ 	]*26[ 	]+\?\?\?\? 62F27D4F 		vpabsb %zmm5, %zmm6\{%k7\}		\# AVX512BW
+[ 	]*26[ 	]+1CF5
+[ 	]*27[ 	]+vpabsb %xmm5, %xmm6\{%k7\}		\# AVX512BW \+ AVX512VL
+[ 	]*28[ 	]+vpabsb %ymm5, %ymm6\{%k7\}		\# AVX512BW \+ AVX512VL
+[ 	]*29[ 	]+\?\?\?\? 62F27D48 		vpconflictd %zmm5, %zmm6		\# AVX412CD
+[ 	]*29[ 	]+C4F5
+[ 	]*30[ 	]+vpconflictd %xmm5, %xmm6		\# AVX412CD \+ AVX512VL
+[ 	]*31[ 	]+vpconflictd %ymm5, %ymm6		\# AVX412CD \+ AVX512VL
+[ 	]*32[ 	]+\?\?\?\? 62F1FD4F 		vcvtpd2qq \(%ecx\), %zmm6\{%k7\}		\# AVX512DQ
+[ 	]*32[ 	]+7B31
+[ 	]*33[ 	]+vcvtpd2qq \(%ecx\), %xmm6\{%k7\}		\# AVX512DQ \+ AVX512VL
 GAS LISTING .*
 
 
-[ 	]*34[ 	]+\?\?\?\? 62F27D4F 		vexp2ps %zmm5, %zmm6\{%k7\}		\# AVX512ER
-[ 	]*34[ 	]+C8F5
-[ 	]*35[ 	]+\?\?\?\? 62F1D54F 		vaddpd %zmm4, %zmm5, %zmm6\{%k7\}		\# AVX512F
-[ 	]*35[ 	]+58F4
-[ 	]*36[ 	]+vaddpd %xmm4, %xmm5, %xmm6\{%k7\}		\# AVX512F \+ AVX512VL
-[ 	]*37[ 	]+vaddpd %ymm4, %ymm5, %ymm6\{%k7\}		\# AVX512F \+ AVX512VL
-[ 	]*38[ 	]+\?\?\?\? 62F2D54F 		vpmadd52luq %zmm4, %zmm5, %zmm6\{%k7\}	\# AVX512IFMA
-[ 	]*38[ 	]+B4F4
-[ 	]*39[ 	]+vpmadd52luq %xmm4, %xmm5, %xmm6\{%k7\}	\# AVX512IFMA \+ AVX512VL
-[ 	]*40[ 	]+vpmadd52luq %ymm4, %ymm5, %ymm6\{%k7\}	\# AVX512IFMA \+ AVX512VL
-[ 	]*41[ 	]+\?\?\?\? 62F2FD49 		vgatherpf0dpd 23\(%ebp,%ymm7,8\)\{%k1\}	\# AVX512PF
-[ 	]*41[ 	]+C68CFD17 
-[ 	]*41[ 	]+000000
-[ 	]*42[ 	]+\?\?\?\? 62F2554F 		vpermb %zmm4, %zmm5, %zmm6\{%k7\}		\# AVX512VBMI
-[ 	]*42[ 	]+8DF4
-[ 	]*43[ 	]+vpermb %xmm4, %xmm5, %xmm6\{%k7\}		\# AVX512VBMI \+ AVX512VL
-[ 	]*44[ 	]+vpermb %ymm4, %ymm5, %ymm6\{%k7\}		\# AVX512VBMI \+ AVX512VL
-[ 	]*45[ 	]+
-[ 	]*46[ 	]+\?\?\?\? C4E2791C 		vpabsb %xmm5, %xmm6
-[ 	]*46[ 	]+F5
-[ 	]*47[ 	]+\?\?\?\? C4E27D1C 		vpabsb %ymm5, %ymm6
+[ 	]*34[ 	]+vcvtpd2qq \(%ecx\), %ymm6\{%k7\}		\# AVX512DQ \+ AVX512VL
+[ 	]*35[ 	]+\?\?\?\? 62F27D4F 		vexp2ps %zmm5, %zmm6\{%k7\}		\# AVX512ER
+[ 	]*35[ 	]+C8F5
+[ 	]*36[ 	]+\?\?\?\? 62F1D54F 		vaddpd %zmm4, %zmm5, %zmm6\{%k7\}		\# AVX512F
+[ 	]*36[ 	]+58F4
+[ 	]*37[ 	]+vaddpd %xmm4, %xmm5, %xmm6\{%k7\}		\# AVX512F \+ AVX512VL
+[ 	]*38[ 	]+vaddpd %ymm4, %ymm5, %ymm6\{%k7\}		\# AVX512F \+ AVX512VL
+[ 	]*39[ 	]+\?\?\?\? 62F2D54F 		vpmadd52luq %zmm4, %zmm5, %zmm6\{%k7\}	\# AVX512IFMA
+[ 	]*39[ 	]+B4F4
+[ 	]*40[ 	]+vpmadd52luq %xmm4, %xmm5, %xmm6\{%k7\}	\# AVX512IFMA \+ AVX512VL
+[ 	]*41[ 	]+vpmadd52luq %ymm4, %ymm5, %ymm6\{%k7\}	\# AVX512IFMA \+ AVX512VL
+[ 	]*42[ 	]+\?\?\?\? 62F2FD49 		vgatherpf0dpd 23\(%ebp,%ymm7,8\)\{%k1\}	\# AVX512PF
+[ 	]*42[ 	]+C68CFD17 
+[ 	]*42[ 	]+000000
+[ 	]*43[ 	]+\?\?\?\? 62F2554F 		vpermb %zmm4, %zmm5, %zmm6\{%k7\}		\# AVX512VBMI
+[ 	]*43[ 	]+8DF4
+[ 	]*44[ 	]+vpermb %xmm4, %xmm5, %xmm6\{%k7\}		\# AVX512VBMI \+ AVX512VL
+[ 	]*45[ 	]+vpermb %ymm4, %ymm5, %ymm6\{%k7\}		\# AVX512VBMI \+ AVX512VL
+[ 	]*46[ 	]+
+[ 	]*47[ 	]+\?\?\?\? C4E2791C 		vpabsb %xmm5, %xmm6
 [ 	]*47[ 	]+F5
-[ 	]*48[ 	]+\?\?\?\? C5D158F4 		vaddpd %xmm4, %xmm5, %xmm6
-[ 	]*49[ 	]+\?\?\?\? C5D558F4 		vaddpd %ymm4, %ymm5, %ymm6
-[ 	]*50[ 	]+\?\?\?\? 660F381C 		pabsb %xmm5, %xmm6
-[ 	]*50[ 	]+F5
-[ 	]*51[ 	]+\?\?\?\? 660F58F4 		addpd %xmm4, %xmm6
-[ 	]*52[ 	]+
+[ 	]*48[ 	]+\?\?\?\? C4E27D1C 		vpabsb %ymm5, %ymm6
+[ 	]*48[ 	]+F5
+[ 	]*49[ 	]+\?\?\?\? C5D158F4 		vaddpd %xmm4, %xmm5, %xmm6
+[ 	]*50[ 	]+\?\?\?\? C5D558F4 		vaddpd %ymm4, %ymm5, %ymm6
+[ 	]*51[ 	]+\?\?\?\? 660F381C 		pabsb %xmm5, %xmm6
+[ 	]*51[ 	]+F5
+[ 	]*52[ 	]+\?\?\?\? 660F58F4 		addpd %xmm4, %xmm6
+[ 	]*53[ 	]+
 [ 	]*[1-9][0-9]*[ 	]+\.intel_syntax noprefix
 [ 	]*[1-9][0-9]*[ 	]+\?\?\?\? 62F3FD48 		vfpclasspd k0, \[eax], 0
 [ 	]*[1-9][0-9]*[ 	]+660000
diff --git a/gas/testsuite/gas/i386/noavx512-2.s b/gas/testsuite/gas/i386/noavx512-2.s
index d974bcf9df5..a63d0484c61 100644
--- a/gas/testsuite/gas/i386/noavx512-2.s
+++ b/gas/testsuite/gas/i386/noavx512-2.s
@@ -1,5 +1,6 @@ 
 # Test .arch .noavx512vl
 	.text
+	.arch .noavx10.1
 	vpabsb %zmm5, %zmm6{%k7}		# AVX512BW
 	vpabsb %xmm5, %xmm6{%k7}		# AVX512BW + AVX512VL
 	vpabsb %ymm5, %ymm6{%k7}		# AVX512BW + AVX512VL
diff --git a/gas/testsuite/gas/i386/x86-64-avx-ifma-inval.l b/gas/testsuite/gas/i386/x86-64-avx-ifma-inval.l
index fad43f6768c..0046cbcb5d1 100644
--- a/gas/testsuite/gas/i386/x86-64-avx-ifma-inval.l
+++ b/gas/testsuite/gas/i386/x86-64-avx-ifma-inval.l
@@ -1,4 +1,4 @@ 
 .* Assembler messages:
-.*:6: Error: unsupported .* `vpmadd52huq'
 .*:7: Error: unsupported .* `vpmadd52huq'
-.*:8: Error: operand .* `vpmadd52huq'
+.*:8: Error: unsupported .* `vpmadd52huq'
+.*:9: Error: operand .* `vpmadd52huq'
diff --git a/gas/testsuite/gas/i386/x86-64-avx-ifma-inval.s b/gas/testsuite/gas/i386/x86-64-avx-ifma-inval.s
index 76da0f1a37d..b2175e8d066 100644
--- a/gas/testsuite/gas/i386/x86-64-avx-ifma-inval.s
+++ b/gas/testsuite/gas/i386/x86-64-avx-ifma-inval.s
@@ -2,6 +2,7 @@ 
 
 	.text
 	.arch .noavx512ifma
+	.arch .noavx10.1
 _start:
 	vpmadd52huq %xmm2, %xmm4, %xmm2{%k6}
 	vpmadd52huq %xmm22, %xmm4, %xmm2{%k1}
diff --git a/gas/testsuite/gas/i386/x86-64-avx-vnni-inval.l b/gas/testsuite/gas/i386/x86-64-avx-vnni-inval.l
index 61808668a8d..81aedddf4e2 100644
--- a/gas/testsuite/gas/i386/x86-64-avx-vnni-inval.l
+++ b/gas/testsuite/gas/i386/x86-64-avx-vnni-inval.l
@@ -1,4 +1,4 @@ 
 .* Assembler messages:
-.*:6: Error: unsupported .* `vpdpbusds'
 .*:7: Error: unsupported .* `vpdpbusds'
-.*:8: Error: operand .* `vpdpbusds'
+.*:8: Error: unsupported .* `vpdpbusds'
+.*:9: Error: operand .* `vpdpbusds'
diff --git a/gas/testsuite/gas/i386/x86-64-avx-vnni-inval.s b/gas/testsuite/gas/i386/x86-64-avx-vnni-inval.s
index 8b1b80cac5d..78284546650 100644
--- a/gas/testsuite/gas/i386/x86-64-avx-vnni-inval.s
+++ b/gas/testsuite/gas/i386/x86-64-avx-vnni-inval.s
@@ -2,6 +2,7 @@ 
 
 	.text
 	.arch .noavx512_vnni
+	.arch .noavx10.1
 _start:
 	vpdpbusds %xmm2, %xmm4, %xmm2{%k6}
 	vpdpbusds %xmm22, %xmm4, %xmm2{%k1}
diff --git a/gas/testsuite/gas/i386/x86-64-avx10_1-inval.l b/gas/testsuite/gas/i386/x86-64-avx10_1-inval.l
new file mode 100644
index 00000000000..0e4b9269c62
--- /dev/null
+++ b/gas/testsuite/gas/i386/x86-64-avx10_1-inval.l
@@ -0,0 +1,20 @@ 
+.* Assembler messages:
+.*:6: Error: `vp2intersectq' is not supported on `x86_64.noavx512f'
+.*:7: Error: `vgatherpf0dpd' is not supported on `x86_64.noavx512f'
+.*:8: Error: `vrcp28ss' is not supported on `x86_64.noavx512f'
+.*:9: Error: `vp4dpwssd' is not supported on `x86_64.noavx512f'
+.*:10: Error: `v4fnmaddss' is not supported on `x86_64.noavx512f'
+.*:14: Error: `kaddq' is not supported on `x86_64.noavx512f.noavx10_max_512bit'
+.*:15: Error: `kandq' is not supported on `x86_64.noavx512f.noavx10_max_512bit'
+.*:16: Error: `kandnq' is not supported on `x86_64.noavx512f.noavx10_max_512bit'
+.*:17: Error: `kmovq' is not supported on `x86_64.noavx512f.noavx10_max_512bit'
+.*:18: Error: `knotq' is not supported on `x86_64.noavx512f.noavx10_max_512bit'
+.*:19: Error: `korq' is not supported on `x86_64.noavx512f.noavx10_max_512bit'
+.*:20: Error: `kortestq' is not supported on `x86_64.noavx512f.noavx10_max_512bit'
+.*:21: Error: `kshiftlq' is not supported on `x86_64.noavx512f.noavx10_max_512bit'
+.*:22: Error: `kshiftrq' is not supported on `x86_64.noavx512f.noavx10_max_512bit'
+.*:23: Error: `ktestq' is not supported on `x86_64.noavx512f.noavx10_max_512bit'
+.*:24: Error: `kunpckdq' is not supported on `x86_64.noavx512f.noavx10_max_512bit'
+.*:25: Error: `kxnorq' is not supported on `x86_64.noavx512f.noavx10_max_512bit'
+.*:26: Error: `kxorq' is not supported on `x86_64.noavx512f.noavx10_max_512bit'
+.*:27: Error: bad register name `%zmm4'
diff --git a/gas/testsuite/gas/i386/x86-64-avx10_1-inval.s b/gas/testsuite/gas/i386/x86-64-avx10_1-inval.s
new file mode 100644
index 00000000000..1d091b83ae4
--- /dev/null
+++ b/gas/testsuite/gas/i386/x86-64-avx10_1-inval.s
@@ -0,0 +1,27 @@ 
+# Check invalid AVX10.1 instructions
+
+	.text
+__start:
+	.arch .noavx512f
+	vp2intersectq	%xmm1, %xmm2, %k3
+	vgatherpf0dpd	123(%ebp,%ymm7,8){%k1}
+	vrcp28ss	%xmm4, %xmm5, %xmm6{%k7}
+	vp4dpwssd	(%ecx), %zmm4, %zmm1
+	v4fnmaddss	(%ecx), %xmm4, %xmm1
+
+	.arch .noavx512f
+	.arch .noavx10_max_512bit
+	kaddq	%k1, %k2, %k3
+	kandq	%k1, %k2, %k3
+	kandnq	%k1, %k2, %k3
+	kmovq	%k1, %k2
+	knotq	%k1, %k2
+	korq	%k1, %k2, %k3
+	kortestq	%k1, %k2
+	kshiftlq	$1, %k1, %k2
+	kshiftrq	$1, %k1, %k2
+	ktestq	%k1, %k2
+	kunpckdq	%k1, %k2, %k3
+	kxnorq	%k1, %k2, %k3
+	kxorq	%k1, %k2, %k3
+	vaddpd  %zmm4, %zmm5, %zmm6
diff --git a/gas/testsuite/gas/i386/x86-64-avx10_1.d b/gas/testsuite/gas/i386/x86-64-avx10_1.d
new file mode 100644
index 00000000000..4225c2e2c58
--- /dev/null
+++ b/gas/testsuite/gas/i386/x86-64-avx10_1.d
@@ -0,0 +1,54 @@ 
+#objdump: -dw
+#name: x86_64 AVX10.1 instructions
+#source: x86-64-avx10_1.s
+
+.*: +file format .*
+
+Disassembly of section \.text:
+
+0+ <_start>:
+\s*[a-f0-9]+:\s*c4 e1 ed 4a d9\s+kaddd  %k1,%k2,%k3
+\s*[a-f0-9]+:\s*c5 ed 4a d9\s+kaddb  %k1,%k2,%k3
+\s*[a-f0-9]+:\s*c5 ec 4a d9\s+kaddw  %k1,%k2,%k3
+\s*[a-f0-9]+:\s*c4 e1 ec 4a d9\s+kaddq  %k1,%k2,%k3
+\s*[a-f0-9]+:\s*67 c5 f9 90 29\s+kmovb  \(%ecx\),%k5
+\s*[a-f0-9]+:\s*67 c5 f9 91 ac f4 c0 1d fe ff\s+kmovb  %k5,-0x1e240\(%esp,%esi,8\)
+\s*[a-f0-9]+:\s*67 c4 e1 f9 90 ac f4 c0 1d fe ff\s+kmovd  -0x1e240\(%esp,%esi,8\),%k5
+\s*[a-f0-9]+:\s*c5 fb 92 ed\s+kmovd  %ebp,%k5
+\s*[a-f0-9]+:\s*67 c5 f8 91 29\s+kmovw  %k5,\(%ecx\)
+\s*[a-f0-9]+:\s*c5 f8 93 ed\s+kmovw  %k5,%ebp
+\s*[a-f0-9]+:\s*62 f1 d5 0f 58 f4\s+vaddpd %xmm4,%xmm5,%xmm6\{%k7\}
+\s*[a-f0-9]+:\s*67 62 f1 d5 0f 58 31\s+vaddpd \(%ecx\),%xmm5,%xmm6\{%k7\}
+\s*[a-f0-9]+:\s*67 62 f1 d5 1f 58 30\s+vaddpd \(%eax\)\{1to2\},%xmm5,%xmm6\{%k7\}
+\s*[a-f0-9]+:\s*67 62 f1 d5 0f 58 b2 00 08 00 00\s+vaddpd 0x800\(%edx\),%xmm5,%xmm6\{%k7\}
+\s*[a-f0-9]+:\s*67 62 f1 d5 0f 58 b2 f0 f7 ff ff\s+vaddpd -0x810\(%edx\),%xmm5,%xmm6\{%k7\}
+\s*[a-f0-9]+:\s*67 62 f1 d5 1f 58 b2 00 04 00 00\s+vaddpd 0x400\(%edx\)\{1to2\},%xmm5,%xmm6\{%k7\}
+\s*[a-f0-9]+:\s*67 62 f1 d5 1f 58 b2 f8 fb ff ff\s+vaddpd -0x408\(%edx\)\{1to2\},%xmm5,%xmm6\{%k7\}
+\s*[a-f0-9]+:\s*62 f1 d5 cf 58 f4\s+vaddpd %zmm4,%zmm5,%zmm6\{%k7\}\{z\}
+\s*[a-f0-9]+:\s*67 62 f1 d5 2f 58 b4 f4 c0 1d fe ff\s+vaddpd -0x1e240\(%esp,%esi,8\),%ymm5,%ymm6\{%k7\}
+\s*[a-f0-9]+:\s*67 62 f1 d5 4f 58 b2 00 20 00 00\s+vaddpd 0x2000\(%edx\),%zmm5,%zmm6\{%k7\}
+\s*[a-f0-9]+:\s*67 62 f1 d5 2f 58 72 80\s+vaddpd -0x1000\(%edx\),%ymm5,%ymm6\{%k7\}
+\s*[a-f0-9]+:\s*67 62 f1 d5 3f 58 72 7f\s+vaddpd 0x3f8\(%edx\)\{1to4\},%ymm5,%ymm6\{%k7\}
+\s*[a-f0-9]+:\s*67 62 f1 d5 5f 58 b2 00 f8 ff ff\s+vaddpd -0x800\(%edx\)\{1to8\},%zmm5,%zmm6\{%k7\}
+\s*[a-f0-9]+:\s*62 f3 d5 0f ce f4 ab\s+vgf2p8affineqb \$0xab,%xmm4,%xmm5,%xmm6\{%k7\}
+\s*[a-f0-9]+:\s*67 62 f3 d5 2f ce b4 f4 c0 1d fe ff 7b\s+vgf2p8affineqb \$0x7b,-0x1e240\(%esp,%esi,8\),%ymm5,%ymm6\{%k7\}
+\s*[a-f0-9]+:\s*67 62 f3 d5 3f ce 72 7f 7b\s+vgf2p8affineqb \$0x7b,0x3f8\(%edx\)\{1to4\},%ymm5,%ymm6\{%k7\}
+\s*[a-f0-9]+:\s*67 62 f3 d5 0f cf 72 7f 7b\s+vgf2p8affineinvqb \$0x7b,0x7f0\(%edx\),%xmm5,%xmm6\{%k7\}
+\s*[a-f0-9]+:\s*62 f3 d5 af cf f4 ab\s+vgf2p8affineinvqb \$0xab,%ymm4,%ymm5,%ymm6\{%k7\}\{z\}
+\s*[a-f0-9]+:\s*62 f2 55 4f cf f4\s+vgf2p8mulb %zmm4,%zmm5,%zmm6\{%k7\}
+\s*[a-f0-9]+:\s*67 62 f2 55 0f cf b4 f4 c0 1d fe ff\s+vgf2p8mulb -0x1e240\(%esp,%esi,8\),%xmm5,%xmm6\{%k7\}
+\s*[a-f0-9]+:\s*67 62 f2 55 4f cf b2 00 20 00 00\s+vgf2p8mulb 0x2000\(%edx\),%zmm5,%zmm6\{%k7\}
+\s*[a-f0-9]+:\s*62 82 2d 20 dc f0\s+vaesenc %ymm24,%ymm26,%ymm22
+\s*[a-f0-9]+:\s*67 62 e2 05 08 de 84 f4 c0 1d fe ff\s+vaesdec -0x1e240\(%esp,%esi,8\),%xmm15,%xmm16
+\s*[a-f0-9]+:\s*62 02 2d 00 dd d8\s+vaesenclast %xmm24,%xmm26,%xmm27
+\s*[a-f0-9]+:\s*67 62 62 35 20 df 52 7f\s+vaesdeclast 0xfe0\(%edx\),%ymm25,%ymm26
+\s*[a-f0-9]+:\s*62 82 2d 40 de f0\s+vaesdec %zmm24,%zmm26,%zmm22
+\s*[a-f0-9]+:\s*67 62 62 2d 40 df 19\s+vaesdeclast \(%ecx\),%zmm26,%zmm27
+\s*[a-f0-9]+:\s*62 a3 4d 00 44 fe ab\s+vpclmulqdq \$0xab,%xmm22,%xmm22,%xmm23
+\s*[a-f0-9]+:\s*67 62 e3 4d 00 44 7a 7f 7b\s+vpclmulqdq \$0x7b,0x7f0\(%edx\),%xmm22,%xmm23
+\s*[a-f0-9]+:\s*67 62 73 7d 20 44 b4 f4 c0 1d fe ff 7b\s+vpclmulqdq \$0x7b,-0x1e240\(%esp,%esi,8\),%ymm16,%ymm14
+\s*[a-f0-9]+:\s*62 23 45 00 44 c6 11\s+vpclmulhqhqdq %xmm22,%xmm23,%xmm24
+\s*[a-f0-9]+:\s*62 c3 05 08 44 c6 10\s+vpclmullqhqdq %xmm14,%xmm15,%xmm16
+\s*[a-f0-9]+:\s*62 23 45 20 44 c6 01\s+vpclmulhqlqdq %ymm22,%ymm23,%ymm24
+\s*[a-f0-9]+:\s*62 c3 05 48 44 c6 00\s+vpclmullqlqdq %zmm14,%zmm15,%zmm16
+#pass
diff --git a/gas/testsuite/gas/i386/x86-64-avx10_1.s b/gas/testsuite/gas/i386/x86-64-avx10_1.s
new file mode 100644
index 00000000000..5169d15ba6b
--- /dev/null
+++ b/gas/testsuite/gas/i386/x86-64-avx10_1.s
@@ -0,0 +1,50 @@ 
+# Check AVX10.1 instructions
+
+	.text
+_start:
+	.arch .noavx512f
+
+	kaddd	%k1, %k2, %k3
+	kaddb	%k1, %k2, %k3
+	kaddw	%k1, %k2, %k3
+	kaddq	%k1, %k2, %k3
+	kmovb   (%ecx), %k5
+	kmovb   %k5, -123456(%esp,%esi,8)
+	kmovd   -123456(%esp,%esi,8), %k5
+	kmovd   %ebp, %k5
+	kmovw   %k5, (%ecx)
+	kmovw   %k5, %ebp
+	vaddpd  %xmm4, %xmm5, %xmm6{%k7}
+	vaddpd  (%ecx), %xmm5, %xmm6{%k7}
+	vaddpd  (%eax){1to2}, %xmm5, %xmm6{%k7}
+	vaddpd  2048(%edx), %xmm5, %xmm6{%k7}
+	vaddpd  -2064(%edx), %xmm5, %xmm6{%k7}
+	vaddpd  1024(%edx){1to2}, %xmm5, %xmm6{%k7}
+	vaddpd  -1032(%edx){1to2}, %xmm5, %xmm6{%k7}
+	vaddpd  %zmm4, %zmm5, %zmm6{%k7}{z}
+	vaddpd  -123456(%esp,%esi,8), %ymm5, %ymm6{%k7}
+	vaddpd  8192(%edx), %zmm5, %zmm6{%k7}
+	vaddpd  -4096(%edx), %ymm5, %ymm6{%k7}
+	vaddpd  1016(%edx){1to4}, %ymm5, %ymm6{%k7}
+	vaddpd  -2048(%edx){1to8}, %zmm5, %zmm6{%k7}
+	vgf2p8affineqb	$0xab, %xmm4, %xmm5, %xmm6{%k7}
+	vgf2p8affineqb	$123, -123456(%esp,%esi,8), %ymm5, %ymm6{%k7}
+	vgf2p8affineqb	$123, 1016(%edx){1to4}, %ymm5, %ymm6{%k7}
+	vgf2p8affineinvqb	$123, 2032(%edx), %xmm5, %xmm6{%k7}
+	vgf2p8affineinvqb	$0xab, %ymm4, %ymm5, %ymm6{%k7}{z}
+	vgf2p8mulb	%zmm4, %zmm5, %zmm6{%k7}
+	vgf2p8mulb	-123456(%esp,%esi,8), %xmm5, %xmm6{%k7}
+	vgf2p8mulb	8192(%edx), %zmm5, %zmm6{%k7}
+	vaesenc	%ymm24, %ymm26, %ymm22
+	vaesdec	-123456(%esp,%esi,8), %xmm15, %xmm16
+	vaesenclast	%xmm24, %xmm26, %xmm27
+	vaesdeclast     4064(%edx), %ymm25, %ymm26
+	vaesdec		%zmm24, %zmm26, %zmm22
+	vaesdeclast	(%ecx), %zmm26, %zmm27
+	vpclmulqdq	$0xab, %xmm22, %xmm22, %xmm23
+	vpclmulqdq	$123, 2032(%edx), %xmm22, %xmm23
+	vpclmulqdq	$123, -123456(%esp,%esi,8), %ymm16, %ymm14
+	vpclmulhqhqdq	%xmm22, %xmm23, %xmm24
+	vpclmullqhqdq	%xmm14, %xmm15, %xmm16
+	vpclmulhqlqdq	%ymm22, %ymm23, %ymm24
+	vpclmullqlqdq	%zmm14, %zmm15, %zmm16
diff --git a/gas/testsuite/gas/i386/x86-64.exp b/gas/testsuite/gas/i386/x86-64.exp
index 52711cdcf6f..07e711df559 100644
--- a/gas/testsuite/gas/i386/x86-64.exp
+++ b/gas/testsuite/gas/i386/x86-64.exp
@@ -450,6 +450,8 @@  run_dump_test "x86-64-sm4"
 run_dump_test "x86-64-sm4-intel"
 run_dump_test "x86-64-pbndkb"
 run_dump_test "x86-64-pbndkb-intel"
+run_dump_test "x86-64-avx10_1"
+run_list_test "x86-64-avx10_1-inval"
 run_dump_test "x86-64-clzero"
 run_dump_test "x86-64-mwaitx-bdver4"
 run_list_test "x86-64-mwaitx-reg"
diff --git a/gas/testsuite/gas/i386/xmmhi32.s b/gas/testsuite/gas/i386/xmmhi32.s
index 8e8767ac37d..f562711714a 100644
--- a/gas/testsuite/gas/i386/xmmhi32.s
+++ b/gas/testsuite/gas/i386/xmmhi32.s
@@ -26,6 +26,7 @@  xmm:
 	vmovdqa	ymm24, ymm0
 
 	.arch .noavx512f
+	.arch .noavx10.1
 	vaddps	xmm0, xmm1, xmm8
 	vaddps	xmm0, xmm1, xmm16
 	vaddps	xmm0, xmm1, xmm24
diff --git a/opcodes/i386-gen.c b/opcodes/i386-gen.c
index 91c22c9e873..499149356b1 100644
--- a/opcodes/i386-gen.c
+++ b/opcodes/i386-gen.c
@@ -168,6 +168,10 @@  static const dependency isa_dependencies[] =
     "AVX2" },
   { "FRED",
     "LKGS" },
+  { "AVX10_1",
+    "AVX2" },
+  { "AVX10_MAX_512BIT",
+    "AVX10_1" },
   { "AVX512F",
     "AVX2" },
   { "AVX512CD",
@@ -378,6 +382,8 @@  static bitfield cpu_flags[] =
   BITFIELD (RAO_INT),
   BITFIELD (FRED),
   BITFIELD (LKGS),
+  BITFIELD (AVX10_1),
+  BITFIELD (AVX10_MAX_512BIT),
   BITFIELD (MWAITX),
   BITFIELD (CLZERO),
   BITFIELD (OSPKE),
@@ -1217,7 +1223,7 @@  static void
 output_i386_opcode (FILE *table, const char *name, char *str,
 		    char *last, int lineno)
 {
-  unsigned int i, length, prefix = 0, space = 0;
+  unsigned int i, length, prefix = 0, space = 0, k = 0;
   char *base_opcode, *extension_opcode, *end, *ident;
   char *cpu_flags, *opcode_modifier, *operand_types [MAX_OPERANDS];
   unsigned long long opcode;
@@ -1315,6 +1321,20 @@  output_i386_opcode (FILE *table, const char *name, char *str,
   ident = mkident (name);
   fprintf (table, "  { MN_%s, 0x%0*llx%s, %u,",
 	   ident, 2 * (int)length, opcode, end, i);
+
+  /* All AVX512F-based instructions are also usable with AVX10.1, except
+     those from AVX512PF/ER/4FMAPS/4VNNIW/VP2INTERSECT.  Append AVX10_1 to
+     the CPU flags of every other AVX512 template.  */
+  if (strstr (cpu_flags, "AVX512")
+      && !strstr (cpu_flags, "AVX512PF")
+      && !strstr (cpu_flags, "AVX512ER")
+      && !strstr (cpu_flags, "4FMAPS")
+      && !strstr (cpu_flags, "4VNNIW")
+      && !strstr (cpu_flags, "VP2INTERSECT"))
+    {
+      cpu_flags = concat (cpu_flags, "|AVX10_1", NULL);
+      k = 1;
+    }
   free (ident);
 
   process_i386_opcode_modifier (table, opcode_modifier, space, prefix,
@@ -1322,6 +1342,9 @@  output_i386_opcode (FILE *table, const char *name, char *str,
 
   process_i386_cpu_flag (table, cpu_flags, NULL, ",", "    ", lineno, CpuMax);
 
+  if (k)
+    free (cpu_flags);
+
   fprintf (table, "    { ");
 
   for (i = 0; i < ARRAY_SIZE (operand_types); i++)
diff --git a/opcodes/i386-opc.h b/opcodes/i386-opc.h
index 284475076a1..e34cc518834 100644
--- a/opcodes/i386-opc.h
+++ b/opcodes/i386-opc.h
@@ -241,6 +241,10 @@  enum
   CpuFRED,
   /* lkgs instruction required */
   CpuLKGS,
+  /* Intel AVX10.1 instruction support required.  */
+  CpuAVX10_1,
+  /* Intel AVX10 512-bit vector width support required.  */
+  CpuAVX10_MAX_512BIT,
   /* mwaitx instruction required */
   CpuMWAITX,
   /* Clzero instruction required */
@@ -444,6 +448,8 @@  typedef union i386_cpu_flags
       unsigned int cpurao_int:1;
       unsigned int cpufred:1;
       unsigned int cpulkgs:1;
+      unsigned int cpuavx10_1:1;
+      unsigned int cpuavx10_max_512bit:1;
       unsigned int cpumwaitx:1;
       unsigned int cpuclzero:1;
       unsigned int cpuospke:1;