[2/5] Support Intel SHA512

Message ID 20230713063303.205862-3-haochen.jiang@intel.com
State Unresolved
Series Support Intel Arrow Lake/Lunar Lake ISAs

Checks

Context                   Check    Description
snail/binutils-gdb-check  warning  Git am fail log

Commit Message

Jiang, Haochen July 13, 2023, 6:33 a.m. UTC
  Hi Jan,

In the SHA512 patch, I considered eliminating the ModR/M table pass
for vsha512msg1 and vsha512rnds2, since you just introduced OP_R with
Uxmm.

However, xmm_mode in OP_R requires VEX128 or less, and unfortunately
both instructions are VEX256. Therefore, I have kept the ModR/M table
pass in the patch.

BRs,
Haochen

gas/ChangeLog:

	* NEWS: Support Intel SHA512.
	* config/tc-i386.c: Add sha512.
	* doc/c-i386.texi: Document .sha512.
	* testsuite/gas/i386/i386.exp: Run SHA512 tests.
	* testsuite/gas/i386/x86-64.exp: Ditto.
	* testsuite/gas/i386/sha512-intel.d: New test.
	* testsuite/gas/i386/sha512-inval.l: Ditto.
	* testsuite/gas/i386/sha512-inval.s: Ditto.
	* testsuite/gas/i386/sha512.d: Ditto.
	* testsuite/gas/i386/sha512.s: Ditto.
	* testsuite/gas/i386/x86-64-sha512-intel.d: Ditto.
	* testsuite/gas/i386/x86-64-sha512-inval.l: Ditto.
	* testsuite/gas/i386/x86-64-sha512-inval.s: Ditto.
	* testsuite/gas/i386/x86-64-sha512.d: Ditto.
	* testsuite/gas/i386/x86-64-sha512.s: Ditto.

opcodes/ChangeLog:

	* i386-dis.c (Uymm): New.
	(MOD_VEX_0F38CB_P_3_W_0_L_1): Ditto.
	(MOD_VEX_0F38CC_P_3_W_0_L_1): Ditto.
	(PREFIX_VEX_0F38CB): Ditto.
	(PREFIX_VEX_0F38CC): Ditto.
	(PREFIX_VEX_0F38CD): Ditto.
	(VEX_LEN_0F38CB_P_3_W_0): Ditto.
	(VEX_LEN_0F38CC_P_3_W_0): Ditto.
	(VEX_LEN_0F38CD_P_3_W_0): Ditto.
	(VEX_W_0F38CB_P_3): Ditto.
	(VEX_W_0F38CC_P_3): Ditto.
	(VEX_W_0F38CD_P_3): Ditto.
	(mod_table): Add MOD_VEX_0F38CB_P_3_W_0_L_1, MOD_VEX_0F38CC_P_3_W_0_L_1.
	(prefix_table): Add PREFIX_VEX_0F38CB, PREFIX_VEX_0F38CC,
	PREFIX_VEX_0F38CD.
	(vex_len_table): Add VEX_LEN_0F38CB_P_3_W_0,
	VEX_LEN_0F38CC_P_3_W_0, VEX_LEN_0F38CD_P_3_W_0.
	(vex_w_table): Add VEX_W_0F38CB_P_3, VEX_W_0F38CC_P_3, VEX_W_0F38CD_P_3.
	* i386-gen.c (isa_dependencies): Add SHA512.
	(cpu_flags): Ditto.
	* i386-init.h: Regenerated.
	* i386-mnem.h: Ditto.
	* i386-opc.h (CpuSHA512): New.
	(i386_cpu_flags): Add cpusha512.
	* i386-opc.tbl: Add SHA512 instructions.
	* i386-tbl.h: Regenerated.
---
 gas/NEWS                                     |    2 +
 gas/config/tc-i386.c                         |    1 +
 gas/doc/c-i386.texi                          |    3 +-
 gas/testsuite/gas/i386/i386.exp              |    2 +
 gas/testsuite/gas/i386/sha512-intel.d        |   16 +
 gas/testsuite/gas/i386/sha512.d              |   16 +
 gas/testsuite/gas/i386/sha512.s              |   13 +
 gas/testsuite/gas/i386/x86-64-sha512-intel.d |   16 +
 gas/testsuite/gas/i386/x86-64-sha512.d       |   16 +
 gas/testsuite/gas/i386/x86-64-sha512.s       |   13 +
 gas/testsuite/gas/i386/x86-64.exp            |    2 +
 opcodes/i386-dis.c                           |   82 +-
 opcodes/i386-gen.c                           |    3 +
 opcodes/i386-init.h                          |  648 +-
 opcodes/i386-mnem.h                          | 3949 ++++----
 opcodes/i386-opc.h                           |    3 +
 opcodes/i386-opc.tbl                         |    8 +
 opcodes/i386-tbl.h                           | 8555 +++++++++---------
 18 files changed, 6806 insertions(+), 6542 deletions(-)
 create mode 100644 gas/testsuite/gas/i386/sha512-intel.d
 create mode 100644 gas/testsuite/gas/i386/sha512.d
 create mode 100644 gas/testsuite/gas/i386/sha512.s
 create mode 100644 gas/testsuite/gas/i386/x86-64-sha512-intel.d
 create mode 100644 gas/testsuite/gas/i386/x86-64-sha512.d
 create mode 100644 gas/testsuite/gas/i386/x86-64-sha512.s
  

Comments

Jan Beulich July 13, 2023, 10:02 a.m. UTC | #1
Up-front question on title and naming in the patch: Doc indeed says just
SHA512 (same for SM3 and SM4), but are you (including those who
assigned those names) sure that's going to stay this way by the time
this is merged into the SDM? Considering other ISA names, AVX-SHA512
would seem more consistent to me.

On 13.07.2023 08:33, Haochen Jiang wrote:
> In the SHA512 patch, I considered eliminating the ModR/M table pass
> for vsha512msg1 and vsha512rnds2, since you just introduced OP_R with
> Uxmm.
> 
> However, xmm_mode in OP_R requires VEX128 or less, and unfortunately
> both instructions are VEX256. Therefore, I have kept the ModR/M table
> pass in the patch.

I guess I don't (fully) understand. Uxmm and xmm_mode aren't well suited
here anyway. What's wrong with introducing

#define Rxmmq { OP_R, xmmq_mode }

(or Uxmmq) and using it there, rejecting VEX.L==0 just like VEX.L==1 is
rejected for xmm_mode?
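
The rule being proposed can be sketched in isolation. This is a simplified model and not the actual i386-dis.c code: the enum, its constants, and the function name `op_r_accepts` are hypothetical names made up for illustration. It only captures the idea that an xmm_mode-style register operand rejects VEX.L==1, while the suggested xmmq_mode-style operand would reject VEX.L==0.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the disassembler's operand modes.  */
enum model_reg_mode
{
  MODEL_XMM_MODE,   /* models xmm_mode: only valid with VEX.L == 0 */
  MODEL_XMMQ_MODE   /* models the proposed use: only valid with VEX.L == 1 */
};

/* Return true if a register-only operand of the given mode is
   acceptable for the given VEX.L value (0 = 128-bit, 1 = 256-bit).  */
bool
op_r_accepts (enum model_reg_mode mode, unsigned int vex_l)
{
  switch (mode)
    {
    case MODEL_XMM_MODE:
      return vex_l == 0;
    case MODEL_XMMQ_MODE:
      return vex_l == 1;
    }
  return false;
}
```

With such a check living in the operand handler, the separate VEX.L table level would presumably no longer be needed for vsha512msg1 and vsha512rnds2.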

> --- a/gas/testsuite/gas/i386/i386.exp
> +++ b/gas/testsuite/gas/i386/i386.exp
> @@ -498,6 +498,8 @@ if [gas_32_check] then {
>      run_list_test "amx-complex-inval"
>      run_dump_test "avx-vnni-int16"
>      run_dump_test "avx-vnni-int16-intel"
> +    run_dump_test "sha512"
> +    run_dump_test "sha512-intel"

Perhaps worth having further tests proving that both assembler and
disassembler correctly deal with (invalid) memory operands / encodings?
(The disassembler part may not need to be a separate test; I think we
already have one which could be extended: disassem.[sd] and its 64-bit
counterpart.)
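
To make the "memory operand" case concrete, here is a small sketch of how the ModR/M byte separates register forms from memory forms. The struct and function names are hypothetical; only the bit layout (mod in bits 7-6, reg in bits 5-3, r/m in bits 2-0) is standard x86. The byte 0xf5 is taken from the test expectations in this patch; 0x31 is an illustrative memory-form byte chosen here, not one from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* The three fields of an x86 ModR/M byte.  */
struct modrm_fields
{
  unsigned int mod;
  unsigned int reg;
  unsigned int rm;
};

/* Split a ModR/M byte into its mod, reg and r/m fields.  */
struct modrm_fields
split_modrm (uint8_t byte)
{
  struct modrm_fields m;

  m.mod = (byte >> 6) & 3;
  m.reg = (byte >> 3) & 7;
  m.rm = byte & 7;
  return m;
}

/* A ModR/M byte selects a register operand only when mod == 3;
   every other mod value describes a memory operand.  */
bool
is_register_form (uint8_t byte)
{
  return split_modrm (byte).mod == 3;
}
```

For the encoding `c4 e2 7f cc f5`, the final byte 0xf5 splits into mod=3, reg=6, rm=5, matching `vsha512msg1 %xmm5,%ymm6`. A byte such as 0x31 has mod=0 and would therefore denote a memory operand, which is the kind of encoding the additional tests would need to show being rejected.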

> --- a/opcodes/i386-gen.c
> +++ b/opcodes/i386-gen.c
> @@ -168,6 +168,8 @@ static const dependency isa_dependencies[] =
>      "LKGS" },
>    { "AVX_VNNI_INT16",
>      "AVX2" },
> +  { "SHA512",
> +    "AVX" },

Like for the earlier patch this wants to move up a little. I also
question that it's AVX that's the baseline feature here. While correct
for SM3, I expect it needs to be AVX2 both here and for SM4, for AVX
offering no real 256-bit integer operations. (Obviously this wants
taking care of in the doc as well.)
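
The practical effect of choosing the baseline can be sketched with a toy dependency walk. This is not the i386-gen.c implementation: the table below is a deliberately tiny, hypothetical one with a single dependency per entry, and `direct_dep` and `isa_implies` are made-up helper names. It assumes the AVX2 baseline suggested above and that AVX2 in turn depends on AVX.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Toy model: each ISA names at most one direct baseline.  */
struct toy_dependency
{
  const char *name;
  const char *base;
};

/* Hypothetical, heavily reduced dependency table.  */
static const struct toy_dependency toy_deps[] =
{
  { "SHA512", "AVX2" },
  { "AVX2", "AVX" },
};

/* Return the direct baseline of ISA, or NULL if it has none here.  */
static const char *
direct_dep (const char *isa)
{
  for (size_t i = 0; i < sizeof toy_deps / sizeof toy_deps[0]; i++)
    if (strcmp (toy_deps[i].name, isa) == 0)
      return toy_deps[i].base;
  return NULL;
}

/* Return true if enabling ISA also enables BASE, following the
   chain of direct baselines.  An ISA trivially implies itself.  */
bool
isa_implies (const char *isa, const char *base)
{
  for (const char *cur = isa; cur != NULL; cur = direct_dep (cur))
    if (strcmp (cur, base) == 0)
      return true;
  return false;
}
```

In this model, naming AVX2 as the baseline still leaves AVX implied transitively, so nothing is lost compared with naming AVX directly.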

> --- a/opcodes/i386-opc.tbl
> +++ b/opcodes/i386-opc.tbl
> @@ -3375,3 +3375,11 @@ vpdpwsud, 0xf3d2, AVX_VNNI_INT16, Modrm|Vex|Space0F38|VexVVVV|VexW0|CheckOperand
>  vpdpwsuds, 0xf3d3, AVX_VNNI_INT16, Modrm|Vex|Space0F38|VexVVVV|VexW0|CheckOperandSize|NoSuf, { RegXMM|RegYMM|Unspecified|BaseIndex, RegXMM|RegYMM, RegXMM|RegYMM }
>  
>  // AVX_VNNI_INT16 instructions end.
> +
> +// SHA512 instructions.
> +
> +vsha512rnds2, 0xf2cb, SHA512, Vex256|Space0F38|Modrm|VexVVVV|VexW0|NoSuf, { RegXMM, RegYMM, RegYMM }
> +vsha512msg1, 0xf2cc, SHA512, Vex256|Space0F38|Modrm|VexW0|NoSuf, { RegXMM, RegYMM }
> +vsha512msg2, 0xf2cd, SHA512, Vex256|Space0F38|Modrm|VexW0|NoSuf, { RegYMM, RegYMM }

Can we please stick to Modrm coming first?

Jan
  
Frager, Neal via Binutils July 14, 2023, 3:40 a.m. UTC | #2
> Up-front question on title and naming in the patch: Doc indeed says just
> SHA512 (same for SM3 and SM4), but are you (including those who assigned
> those names) sure that's going to stay this way by the time this is merged
> into the SDM? Considering other ISA names, AVX-SHA512 would seem more
> consistent to me.

SHA512 is not an ISA under the AVX set. So AVX-SHA512 is not used.

The actual meaning in the SDM/ISE is that we need to check both the AVX and
SHA512 feature bits to use the instructions.

I could drop the implication in the implementation and instead check that both
ISA bits are set. But since the instructions use xmm/ymm registers, in the
current implementation we chose to have SHA512 imply AVX for convenience.

Whether it should be AVX or AVX2 is discussed below.

> On 13.07.2023 08:33, Haochen Jiang wrote:
> > In the SHA512 patch, I considered eliminating the ModR/M table pass
> > for vsha512msg1 and vsha512rnds2, since you just introduced OP_R with
> > Uxmm.
> >
> > However, xmm_mode in OP_R requires VEX128 or less, and unfortunately
> > both instructions are VEX256. Therefore, I have kept the ModR/M table
> > pass in the patch.
> 
> I guess I don't (fully) understand. Uxmm and xmm_mode aren't well suited
> here anyway. What's wrong with introducing
> 
> #define Rxmmq { OP_R, xmmq_mode }
> 
> (or Uxmmq) and using it there, rejecting VEX.L==0 just like VEX.L==1 is
> rejected for xmm_mode?

Since xmm_mode and xmmq_mode behave the same under VEX.L==1, it could
be used here. I will change to that.
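
For reference, the VEX.L value under discussion can be read straight from the test encodings in this patch. The following is a minimal sketch of decoding the two payload bytes of a 3-byte VEX prefix (the bytes after 0xc4); the struct and function names are hypothetical, while the bit layout is the standard VEX one. The R/X/B bits of the first payload byte are ignored for brevity.

```c
#include <stdint.h>

/* Selected fields of a 3-byte VEX prefix.  */
struct vex3_fields
{
  unsigned int map;    /* opcode map: 1 = 0F, 2 = 0F38, 3 = 0F3A */
  unsigned int w;      /* VEX.W */
  unsigned int vvvv;   /* extra register operand, already un-inverted */
  unsigned int l;      /* VEX.L: 0 = 128-bit, 1 = 256-bit */
  unsigned int pp;     /* implied prefix: 0 none, 1 = 66, 2 = F3, 3 = F2 */
};

/* Decode the two bytes P1 and P2 that follow the 0xc4 escape byte.  */
struct vex3_fields
decode_vex3 (uint8_t p1, uint8_t p2)
{
  struct vex3_fields v;

  v.map = p1 & 0x1f;
  v.w = (p2 >> 7) & 1;
  /* vvvv is stored in one's complement form, so flip the 4 bits.  */
  v.vvvv = ((p2 >> 3) & 0xf) ^ 0xf;
  v.l = (p2 >> 2) & 1;
  v.pp = p2 & 3;
  return v;
}
```

Applied to `c4 e2 57 cb f4` (vsha512rnds2), the payload bytes 0xe2 and 0x57 give map=2 (0F38), W=0, vvvv=5 (ymm5), L=1 and pp=3 (F2). This is consistent with the `_P_3`, `_W_0` and `_L_1` suffixes used in the table names of the patch.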

> > --- a/gas/testsuite/gas/i386/i386.exp
> > +++ b/gas/testsuite/gas/i386/i386.exp
> > @@ -498,6 +498,8 @@ if [gas_32_check] then {
> >      run_list_test "amx-complex-inval"
> >      run_dump_test "avx-vnni-int16"
> >      run_dump_test "avx-vnni-int16-intel"
> > +    run_dump_test "sha512"
> > +    run_dump_test "sha512-intel"
> 
> Perhaps worth having further tests proving that both assembler and
> disassembler correctly deal with (invalid) memory operands / encodings?
> (The disassembler part may not need to be a separate test; I think we
> already have one which could be extended: disassem.[sd] and its 64-bit
> counterpart.)

I will try to add that in the next version.

> > --- a/opcodes/i386-gen.c
> > +++ b/opcodes/i386-gen.c
> > @@ -168,6 +168,8 @@ static const dependency isa_dependencies[] =
> >      "LKGS" },
> >    { "AVX_VNNI_INT16",
> >      "AVX2" },
> > +  { "SHA512",
> > +    "AVX" },
> 
> Like for the earlier patch this wants to move up a little. I also question that it's
> AVX that's the baseline feature here. While correct for SM3, I expect it needs
> to be AVX2 both here and for SM4, for AVX offering no real 256-bit integer
> operations. (Obviously this wants taking care of in the doc as well.)

You have a point here.

Since it is actually AVX2 that introduces the 256-bit integer operations, I will
check with the design and HW teams to see if this is a misuse.

One reason I can think of for using AVX only is that SHA512 and SM4 do not
actually need other integer operations to help them. They only need VMOV, which
is introduced by AVX. So when hardware checks XSTATE, AVX is enough.

Thx,
Haochen
  
Jan Beulich July 14, 2023, 7:12 a.m. UTC | #3
On 14.07.2023 05:40, Jiang, Haochen wrote:
>> Up-front question on title and naming in the patch: Doc indeed says just
>> SHA512 (same for SM3 and SM4), but are you (including those who assigned
>> those names) sure that's going to stay this way by the time this is merged
>> into the SDM? Considering other ISA names, AVX-SHA512 would seem more
>> consistent to me.
> 
> SHA512 is not an ISA under the AVX set.

I'm afraid I don't understand. How is it not? It uses YMM registers.
And conceivably there could be EVEX encodings of these (allowing the
full 32 register set to be used), which I'd then call AVX512-SHA512.

It's also not possible to potentially express the same thing in
legacy encodings (unlike e.g. GFNI). Even for SM3, where only 128-
bit operations are used, that's not possible, as the insns have 3
inputs (the destination is r/w).

> So AVX-SHA512 is not used.
> 
> The actual meaning in the SDM/ISE is that we need to check both the AVX and
> SHA512 feature bits to use the instructions.
> 
> I could drop the implication in the implementation and instead check that both
> ISA bits are set. But since the instructions use xmm/ymm registers, in the
> current implementation we chose to have SHA512 imply AVX for convenience.
> 
> Whether it should be AVX or AVX2 is discussed below.
>[...]
>>> --- a/opcodes/i386-gen.c
>>> +++ b/opcodes/i386-gen.c
>>> @@ -168,6 +168,8 @@ static const dependency isa_dependencies[] =
>>>      "LKGS" },
>>>    { "AVX_VNNI_INT16",
>>>      "AVX2" },
>>> +  { "SHA512",
>>> +    "AVX" },
>>
>> Like for the earlier patch this wants to move up a little. I also question that it's
>> AVX that's the baseline feature here. While correct for SM3, I expect it needs
>> to be AVX2 both here and for SM4, for AVX offering no real 256-bit integer
>> operations. (Obviously this wants taking care of in the doc as well.)
> 
> You have a point here.
> 
> Since it is actually AVX2 that introduces the 256-bit integer operations, I will
> check with the design and HW teams to see if this is a misuse.
> 
> One reason I can think of for using AVX only is that SHA512 and SM4 do not
> actually need other integer operations to help them. They only need VMOV, which
> is introduced by AVX. So when hardware checks XSTATE, AVX is enough.

So for a feature check requirement referencing just AVX may be okay. But
there's not going to be any SHA512 without AVX anyway, for there not
being any YMM registers without AVX; you wouldn't be able to fill the
register operands. Hence the extra feature check is redundant (and would
hence better be omitted).

As to implying baseline functionality, using AVX (rather than AVX2) makes
little sense, so even if the feature check remained (note that various
other extensions, including e.g. AVX-VNNI-INT<n>, don't have such a
secondary requirement), I'd still be fairly insistent on having the
base feature named here (and for SM4) be AVX2 (to be in line with other
similar baseline selections).

Jan
  
Frager, Neal via Binutils July 18, 2023, 7:20 a.m. UTC | #4
> I'm afraid I don't understand. How is it not? It uses YMM registers.
> And conceivably there could be EVEX encodings of these (allowing the
> full 32 register set to be used), which I'd then call AVX512-SHA512.
> 
> It's also not possible to potentially express the same thing in
> legacy encodings (unlike e.g. GFNI). Even for SM3, where only 128-
> bit operations are used, that's not possible, as the insns have 3
> inputs (the destination is r/w).

I actually mean to treat it the same way as GFNI, although it does not
have a legacy encoding.

Actually, we want to show the evolution from the previous SHA. I will
move their entries to just after SHA, since they are both crypto-related
ISAs.

> [...]
> So for a feature check requirement referencing just AVX may be okay. But
> there's not going to be any SHA512 without AVX anyway, for there not
> being any YMM registers without AVX; you wouldn't be able to fill the
> register operands. Hence the extra feature check is redundant (and would
> hence better be omitted).
> 
> As to implying baseline functionality, using AVX (rather than AVX2) makes
> little sense, so even if the feature check remained (note that various
> other extensions, including e.g. AVX-VNNI-INT<n>, don't have such a
> secondary requirement), I'd still be fairly insistent on having the
> base feature named here (and for SM4) be AVX2 (to be in line with other
> similar baseline selections).

I confirmed that AVX in the doc here means the state of the whole AVX ISA,
which should include AVX and AVX2.

I will change what SHA512 and SM4 imply to AVX2, since that looks much more
reasonable.

Should we also change what SM3 implies here?

Thx,
Haochen

> 
> Jan
  
Jan Beulich July 18, 2023, 8:11 a.m. UTC | #5
On 18.07.2023 09:20, Jiang, Haochen wrote:
>> As to implying baseline functionality, using AVX (rather than AVX2) makes
>> little sense, so even if the feature check remained (note that various
>> other extensions, including e.g. AVX-VNNI-INT<n>, don't have such a
>> secondary requirement), I'd still be fairly insistent on having the
>> base feature named here (and for SM4) be AVX2 (to be in line with other
>> similar baseline selections).
> 
> I confirmed that AVX in the doc here means the state of the whole AVX ISA,
> which should include AVX and AVX2.
> 
> I will change what SHA512 and SM4 imply to AVX2, since that looks much more
> reasonable.

Thanks.

> Should we also change what SM3 implies here?

AVX looks sufficient there, so I'd say only if you have a good justification.

Jan
  

Patch

diff --git a/gas/NEWS b/gas/NEWS
index 5e9ed5ab4bc..fe2c055fa7f 100644
--- a/gas/NEWS
+++ b/gas/NEWS
@@ -1,5 +1,7 @@ 
 -*- text -*-
 
+* Add support for Intel SHA512 instructions.
+
 * Add support for Intel AVX-VNNI-INT16 instructions.
 
 Changes in 2.41:
diff --git a/gas/config/tc-i386.c b/gas/config/tc-i386.c
index 0d3d7560efe..836640d9123 100644
--- a/gas/config/tc-i386.c
+++ b/gas/config/tc-i386.c
@@ -1152,6 +1152,7 @@  static const arch_entry cpu_arch[] =
   SUBARCH (fred, FRED, ANY_FRED, false),
   SUBARCH (lkgs, LKGS, ANY_LKGS, false),
   SUBARCH (avx_vnni_int16, AVX_VNNI_INT16, ANY_AVX_VNNI_INT16, false),
+  SUBARCH (sha512, SHA512, ANY_SHA512, false),
 };
 
 #undef SUBARCH
diff --git a/gas/doc/c-i386.texi b/gas/doc/c-i386.texi
index 40ba942d9cb..21fb71e54ab 100644
--- a/gas/doc/c-i386.texi
+++ b/gas/doc/c-i386.texi
@@ -208,6 +208,7 @@  accept various extension mnemonics.  For example,
 @code{fred},
 @code{lkgs},
 @code{avx_vnni_int16},
+@code{sha512},
 @code{amx_int8},
 @code{amx_bf16},
 @code{amx_fp16},
@@ -1637,7 +1638,7 @@  supported on the CPU specified.  The choices for @var{cpu_type} are:
 @item @samp{.prefetchi} @tab @samp{.avx_ifma} @tab @samp{.avx_vnni_int8}
 @item @samp{.cmpccxadd} @tab @samp{.wrmsrns} @tab @samp{.msrlist}
 @item @samp{.avx_ne_convert} @tab @samp{.rao_int} @tab @samp{.fred} @tab @samp{.lkgs}
-@item @samp{.avx_vnni_int16}
+@item @samp{.avx_vnni_int16} @tab @samp{.sha512}
 @item @samp{.wbnoinvd} @tab @samp{.pconfig} @tab @samp{.waitpkg} @tab @samp{.cldemote}
 @item @samp{.shstk} @tab @samp{.gfni} @tab @samp{.vaes} @tab @samp{.vpclmulqdq}
 @item @samp{.movdiri} @tab @samp{.movdir64b} @tab @samp{.enqcmd} @tab @samp{.tsxldtrk}
diff --git a/gas/testsuite/gas/i386/i386.exp b/gas/testsuite/gas/i386/i386.exp
index b69c692cd16..487811ad988 100644
--- a/gas/testsuite/gas/i386/i386.exp
+++ b/gas/testsuite/gas/i386/i386.exp
@@ -498,6 +498,8 @@  if [gas_32_check] then {
     run_list_test "amx-complex-inval"
     run_dump_test "avx-vnni-int16"
     run_dump_test "avx-vnni-int16-intel"
+    run_dump_test "sha512"
+    run_dump_test "sha512-intel"
     run_list_test "sg"
     run_dump_test "clzero"
     run_dump_test "invlpgb"
diff --git a/gas/testsuite/gas/i386/sha512-intel.d b/gas/testsuite/gas/i386/sha512-intel.d
new file mode 100644
index 00000000000..c1cc85b9f26
--- /dev/null
+++ b/gas/testsuite/gas/i386/sha512-intel.d
@@ -0,0 +1,16 @@ 
+#as:
+#objdump: -dw -Mintel
+#name: i386 SHA512 insns (Intel disassembly)
+#source: sha512.s
+
+.*: +file format .*
+
+Disassembly of section \.text:
+
+0+ <_start>:
+\s*[a-f0-9]+:\s*c4 e2 7f cc f5\s+vsha512msg1 ymm6,xmm5
+\s*[a-f0-9]+:\s*c4 e2 7f cd f5\s+vsha512msg2 ymm6,ymm5
+\s*[a-f0-9]+:\s*c4 e2 57 cb f4\s+vsha512rnds2 ymm6,ymm5,xmm4
+\s*[a-f0-9]+:\s*c4 e2 7f cc f5\s+vsha512msg1 ymm6,xmm5
+\s*[a-f0-9]+:\s*c4 e2 7f cd f5\s+vsha512msg2 ymm6,ymm5
+\s*[a-f0-9]+:\s*c4 e2 57 cb f4\s+vsha512rnds2 ymm6,ymm5,xmm4
diff --git a/gas/testsuite/gas/i386/sha512.d b/gas/testsuite/gas/i386/sha512.d
new file mode 100644
index 00000000000..b90019954ea
--- /dev/null
+++ b/gas/testsuite/gas/i386/sha512.d
@@ -0,0 +1,16 @@ 
+#as:
+#objdump: -dw
+#name: i386 SHA512 insns
+#source: sha512.s
+
+.*: +file format .*
+
+Disassembly of section \.text:
+
+0+ <_start>:
+\s*[a-f0-9]+:\s*c4 e2 7f cc f5\s+vsha512msg1 %xmm5,%ymm6
+\s*[a-f0-9]+:\s*c4 e2 7f cd f5\s+vsha512msg2 %ymm5,%ymm6
+\s*[a-f0-9]+:\s*c4 e2 57 cb f4\s+vsha512rnds2 %xmm4,%ymm5,%ymm6
+\s*[a-f0-9]+:\s*c4 e2 7f cc f5\s+vsha512msg1 %xmm5,%ymm6
+\s*[a-f0-9]+:\s*c4 e2 7f cd f5\s+vsha512msg2 %ymm5,%ymm6
+\s*[a-f0-9]+:\s*c4 e2 57 cb f4\s+vsha512rnds2 %xmm4,%ymm5,%ymm6
diff --git a/gas/testsuite/gas/i386/sha512.s b/gas/testsuite/gas/i386/sha512.s
new file mode 100644
index 00000000000..e238c272970
--- /dev/null
+++ b/gas/testsuite/gas/i386/sha512.s
@@ -0,0 +1,13 @@ 
+# Check 32bit SHA512 instructions
+
+	.allow_index_reg
+	.text
+_start:
+	vsha512msg1	%xmm5, %ymm6	 #SHA512
+	vsha512msg2	%ymm5, %ymm6	 #SHA512
+	vsha512rnds2	%xmm4, %ymm5, %ymm6	 #SHA512
+
+.intel_syntax noprefix
+	vsha512msg1	ymm6, xmm5	 #SHA512
+	vsha512msg2	ymm6, ymm5	 #SHA512
+	vsha512rnds2	ymm6, ymm5, xmm4	 #SHA512
diff --git a/gas/testsuite/gas/i386/x86-64-sha512-intel.d b/gas/testsuite/gas/i386/x86-64-sha512-intel.d
new file mode 100644
index 00000000000..e644168e311
--- /dev/null
+++ b/gas/testsuite/gas/i386/x86-64-sha512-intel.d
@@ -0,0 +1,16 @@ 
+#as:
+#objdump: -dw -Mintel
+#name: x86_64 SHA512 insns (Intel disassembly)
+#source: x86-64-sha512.s
+
+.*: +file format .*
+
+Disassembly of section \.text:
+
+0+ <_start>:
+\s*[a-f0-9]+:\s*c4 e2 7f cc f5\s+vsha512msg1 ymm6,xmm5
+\s*[a-f0-9]+:\s*c4 e2 7f cd f5\s+vsha512msg2 ymm6,ymm5
+\s*[a-f0-9]+:\s*c4 e2 57 cb f4\s+vsha512rnds2 ymm6,ymm5,xmm4
+\s*[a-f0-9]+:\s*c4 e2 7f cc f5\s+vsha512msg1 ymm6,xmm5
+\s*[a-f0-9]+:\s*c4 e2 7f cd f5\s+vsha512msg2 ymm6,ymm5
+\s*[a-f0-9]+:\s*c4 e2 57 cb f4\s+vsha512rnds2 ymm6,ymm5,xmm4
diff --git a/gas/testsuite/gas/i386/x86-64-sha512.d b/gas/testsuite/gas/i386/x86-64-sha512.d
new file mode 100644
index 00000000000..fcb8ae61fee
--- /dev/null
+++ b/gas/testsuite/gas/i386/x86-64-sha512.d
@@ -0,0 +1,16 @@ 
+#as:
+#objdump: -dw
+#name: x86_64 SHA512 insns
+#source: x86-64-sha512.s
+
+.*: +file format .*
+
+Disassembly of section \.text:
+
+0+ <_start>:
+\s*[a-f0-9]+:\s*c4 e2 7f cc f5\s+vsha512msg1 %xmm5,%ymm6
+\s*[a-f0-9]+:\s*c4 e2 7f cd f5\s+vsha512msg2 %ymm5,%ymm6
+\s*[a-f0-9]+:\s*c4 e2 57 cb f4\s+vsha512rnds2 %xmm4,%ymm5,%ymm6
+\s*[a-f0-9]+:\s*c4 e2 7f cc f5\s+vsha512msg1 %xmm5,%ymm6
+\s*[a-f0-9]+:\s*c4 e2 7f cd f5\s+vsha512msg2 %ymm5,%ymm6
+\s*[a-f0-9]+:\s*c4 e2 57 cb f4\s+vsha512rnds2 %xmm4,%ymm5,%ymm6
diff --git a/gas/testsuite/gas/i386/x86-64-sha512.s b/gas/testsuite/gas/i386/x86-64-sha512.s
new file mode 100644
index 00000000000..5eaadb3bade
--- /dev/null
+++ b/gas/testsuite/gas/i386/x86-64-sha512.s
@@ -0,0 +1,13 @@ 
+# Check 64bit SHA512 instructions
+
+	.allow_index_reg
+	.text
+_start:
+	vsha512msg1	%xmm5, %ymm6	 #SHA512
+	vsha512msg2	%ymm5, %ymm6	 #SHA512
+	vsha512rnds2	%xmm4, %ymm5, %ymm6	 #SHA512
+
+.intel_syntax noprefix
+	vsha512msg1	ymm6, xmm5	 #SHA512
+	vsha512msg2	ymm6, ymm5	 #SHA512
+	vsha512rnds2	ymm6, ymm5, xmm4	 #SHA512
diff --git a/gas/testsuite/gas/i386/x86-64.exp b/gas/testsuite/gas/i386/x86-64.exp
index 0f2903c6185..64d8c3726d4 100644
--- a/gas/testsuite/gas/i386/x86-64.exp
+++ b/gas/testsuite/gas/i386/x86-64.exp
@@ -440,6 +440,8 @@  run_dump_test "x86-64-lkgs"
 run_list_test "x86-64-lkgs-inval"
 run_dump_test "x86-64-avx-vnni-int16"
 run_dump_test "x86-64-avx-vnni-int16-intel"
+run_dump_test "x86-64-sha512"
+run_dump_test "x86-64-sha512-intel"
 run_dump_test "x86-64-clzero"
 run_dump_test "x86-64-mwaitx-bdver4"
 run_list_test "x86-64-mwaitx-reg"
diff --git a/opcodes/i386-dis.c b/opcodes/i386-dis.c
index 9311d832342..430238c3e4e 100644
--- a/opcodes/i386-dis.c
+++ b/opcodes/i386-dis.c
@@ -530,6 +530,7 @@  fetch_error (const instr_info *ins)
 #define Nq { OP_R, q_mode }
 #define Ux { OP_R, x_mode }
 #define Uxmm { OP_R, xmm_mode }
+#define Uymm { OP_R, ymm_mode }
 #define Rtmm { OP_R, tmm_mode }
 #define EMCq { OP_EMC, q_mode }
 #define MXC { OP_MXC, 0 }
@@ -895,6 +896,8 @@  enum
   MOD_0F38DC_PREFIX_1,
 
   MOD_VEX_0F3849_X86_64_L_0_W_0,
+  MOD_VEX_0F38CB_P_3_W_0_L_1,
+  MOD_VEX_0F38CC_P_3_W_0_L_1,
 };
 
 enum
@@ -1064,6 +1067,9 @@  enum
   PREFIX_VEX_0F38B1_W_0,
   PREFIX_VEX_0F38D2_W_0,
   PREFIX_VEX_0F38D3_W_0,
+  PREFIX_VEX_0F38CB,
+  PREFIX_VEX_0F38CC,
+  PREFIX_VEX_0F38CD,
   PREFIX_VEX_0F38F5_L_0,
   PREFIX_VEX_0F38F6_L_0,
   PREFIX_VEX_0F38F7_L_0,
@@ -1306,6 +1312,9 @@  enum
   VEX_LEN_0F385C_X86_64,
   VEX_LEN_0F385E_X86_64,
   VEX_LEN_0F386C_X86_64,
+  VEX_LEN_0F38CB_P_3_W_0,
+  VEX_LEN_0F38CC_P_3_W_0,
+  VEX_LEN_0F38CD_P_3_W_0,
   VEX_LEN_0F38DB,
   VEX_LEN_0F38F2,
   VEX_LEN_0F38F3,
@@ -1473,6 +1482,9 @@  enum
   VEX_W_0F38B1,
   VEX_W_0F38B4,
   VEX_W_0F38B5,
+  VEX_W_0F38CB_P_3,
+  VEX_W_0F38CC_P_3,
+  VEX_W_0F38CD_P_3,
   VEX_W_0F38CF,
   VEX_W_0F38D2,
   VEX_W_0F38D3,
@@ -3928,6 +3940,30 @@  static const struct dis386 prefix_table[][4] = {
     { "vpdpwusds",	{ XM, Vex, EXx }, 0 },
   },
 
+  /* PREFIX_VEX_0F38CB */
+  {
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { VEX_W_TABLE (VEX_W_0F38CB_P_3) },
+  },
+
+  /* PREFIX_VEX_0F38CC */
+  {
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { VEX_W_TABLE (VEX_W_0F38CC_P_3) },
+  },
+
+  /* PREFIX_VEX_0F38CD */
+  {
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { Bad_Opcode },
+    { VEX_W_TABLE (VEX_W_0F38CD_P_3) },
+  },
+
   /* PREFIX_VEX_0F38F5_L_0 */
   {
     { "bzhiS",		{ Gdq, Edq, VexGdq }, 0 },
@@ -6380,9 +6416,9 @@  static const struct dis386 vex_table[][256] = {
     { Bad_Opcode },
     { Bad_Opcode },
     { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
+    { PREFIX_TABLE (PREFIX_VEX_0F38CB) },
+    { PREFIX_TABLE (PREFIX_VEX_0F38CC) },
+    { PREFIX_TABLE (PREFIX_VEX_0F38CD) },
     { Bad_Opcode },
     { VEX_W_TABLE (VEX_W_0F38CF) },
     /* d0 */
@@ -6944,6 +6980,24 @@  static const struct dis386 vex_len_table[][2] = {
     { VEX_W_TABLE (VEX_W_0F386C_X86_64_L_0) },
   },
 
+  /* VEX_LEN_0F38CB_P_3_W_0 */
+  {
+    { Bad_Opcode },
+    { MOD_TABLE (MOD_VEX_0F38CB_P_3_W_0_L_1) },
+  },
+
+  /* VEX_LEN_0F38CC_P_3_W_0 */
+  {
+    { Bad_Opcode },
+    { MOD_TABLE (MOD_VEX_0F38CC_P_3_W_0_L_1) },
+  },
+
+  /* VEX_LEN_0F38CD_P_3_W_0 */
+  {
+    { Bad_Opcode },
+    { "vsha512msg2", { XM, Uymm }, 0 },
+  },
+
   /* VEX_LEN_0F38DB */
   {
     { "vaesimc",	{ XM, EXx }, PREFIX_DATA },
@@ -7614,6 +7668,18 @@  static const struct dis386 vex_w_table[][2] = {
     { Bad_Opcode },
     { "%XVvpmadd52huq",	{ XM, Vex, EXx }, PREFIX_DATA },
   },
+  {
+    /* VEX_W_0F38CB_P_3 */
+    { VEX_LEN_TABLE (VEX_LEN_0F38CB_P_3_W_0) },
+  },
+  {
+    /* VEX_W_0F38CC_P_3 */
+    { VEX_LEN_TABLE (VEX_LEN_0F38CC_P_3_W_0) },
+  },
+  {
+    /* VEX_W_0F38CD_P_3 */
+    { VEX_LEN_TABLE (VEX_LEN_0F38CD_P_3_W_0) },
+  },
   {
     /* VEX_W_0F38CF */
     { "%XEvgf2p8mulb", { XM, Vex, EXx }, PREFIX_DATA },
@@ -8055,6 +8121,16 @@  static const struct dis386 mod_table[][2] = {
     { PREFIX_TABLE (PREFIX_VEX_0F3849_X86_64_L_0_W_0_M_0) },
     { PREFIX_TABLE (PREFIX_VEX_0F3849_X86_64_L_0_W_0_M_1) },
   },
+  {
+    /* MOD_VEX_0F38CB_P_3_W_0_L_1 */
+    { Bad_Opcode },
+    { "vsha512rnds2", { XM, Vex, EXxmm }, 0 },
+  },
+  {
+    /* MOD_VEX_0F38CC_P_3_W_0_L_1 */
+    { Bad_Opcode },
+    { "vsha512msg1", { XM, EXxmm }, 0 },
+  },
 
 #include "i386-dis-evex-mod.h"
 };
diff --git a/opcodes/i386-gen.c b/opcodes/i386-gen.c
index 9796977a2aa..8a163533eeb 100644
--- a/opcodes/i386-gen.c
+++ b/opcodes/i386-gen.c
@@ -168,6 +168,8 @@  static const dependency isa_dependencies[] =
     "LKGS" },
   { "AVX_VNNI_INT16",
     "AVX2" },
+  { "SHA512",
+    "AVX" },
   { "AVX512F",
     "AVX2" },
   { "AVX512CD",
@@ -369,6 +371,7 @@  static bitfield cpu_flags[] =
   BITFIELD (FRED),
   BITFIELD (LKGS),
   BITFIELD (AVX_VNNI_INT16),
+  BITFIELD (SHA512),
   BITFIELD (MWAITX),
   BITFIELD (CLZERO),
   BITFIELD (OSPKE),
diff --git a/opcodes/i386-opc.h b/opcodes/i386-opc.h
index 4a225202e64..224ca04661e 100644
--- a/opcodes/i386-opc.h
+++ b/opcodes/i386-opc.h
@@ -235,6 +235,8 @@  enum
   CpuLKGS,
   /* Intel AVX VNNI-INT16 Instructions support required.  */
   CpuAVX_VNNI_INT16,
+  /* Intel SHA512 Instructions support required.  */
+  CpuSHA512,
   /* mwaitx instruction required */
   CpuMWAITX,
   /* Clzero instruction required */
@@ -433,6 +435,7 @@  typedef union i386_cpu_flags
       unsigned int cpufred:1;
       unsigned int cpulkgs:1;
       unsigned int cpuavx_vnni_int16:1;
+      unsigned int cpusha512:1;
       unsigned int cpumwaitx:1;
       unsigned int cpuclzero:1;
       unsigned int cpuospke:1;
diff --git a/opcodes/i386-opc.tbl b/opcodes/i386-opc.tbl
index 4903d3b2361..18ea2f1500e 100644
--- a/opcodes/i386-opc.tbl
+++ b/opcodes/i386-opc.tbl
@@ -3375,3 +3375,11 @@  vpdpwsud, 0xf3d2, AVX_VNNI_INT16, Modrm|Vex|Space0F38|VexVVVV|VexW0|CheckOperand
 vpdpwsuds, 0xf3d3, AVX_VNNI_INT16, Modrm|Vex|Space0F38|VexVVVV|VexW0|CheckOperandSize|NoSuf, { RegXMM|RegYMM|Unspecified|BaseIndex, RegXMM|RegYMM, RegXMM|RegYMM }
 
 // AVX_VNNI_INT16 instructions end.
+
+// SHA512 instructions.
+
+vsha512rnds2, 0xf2cb, SHA512, Vex256|Space0F38|Modrm|VexVVVV|VexW0|NoSuf, { RegXMM, RegYMM, RegYMM }
+vsha512msg1, 0xf2cc, SHA512, Vex256|Space0F38|Modrm|VexW0|NoSuf, { RegXMM, RegYMM }
+vsha512msg2, 0xf2cd, SHA512, Vex256|Space0F38|Modrm|VexW0|NoSuf, { RegYMM, RegYMM }
+
+// SHA512 instructions end.