Repost [PATCH 4/6] PowerPC: Make MMA insns support DMR registers.

Message ID ZZiTS0adUUPx7wjY@cowardly-lion.the-meissners.org
State Unresolved
Headers
Series Repost [PATCH 4/6] PowerPC: Make MMA insns support DMR registers. |

Checks

Context Check Description
snail/gcc-patch-check warning Git am fail log

Commit Message

Michael Meissner Jan. 5, 2024, 11:39 p.m. UTC
  This patch changes the MMA instructions to use either FPR registers
(-mcpu=power10) or DMRs (-mcpu=future).  In this patch, the existing MMA
instruction names are used.

A macro (__PPC_DMR__) is defined if the MMA instructions use the DMRs.

The patches have been tested on both little and big endian systems.  Can I check
it into the master branch?

2024-01-05   Michael Meissner  <meissner@linux.ibm.com>

gcc/

	* config/rs6000/mma.md (mma_<acc>): New define_expand to handle
	mma_<acc> for dense math and non dense math.
	(mma_<acc> insn): Restrict to non dense math.
	(mma_xxsetaccz): Convert to define_expand to handle non dense math and
	dense math.
	(mma_xxsetaccz_vsx): Rename from mma_xxsetaccz and restrict usage to non
	dense math.
	(mma_xxsetaccz_dm): Dense math version of mma_xxsetaccz.
	(mma_<vv>): Add support for dense math.
	(mma_<avv>): Likewise.
	(mma_<pv>): Likewise.
	(mma_<apv>): Likewise.
	(mma_<vvi4i4i8>): Likewise.
	(mma_<avvi4i4i8>): Likewise.
	(mma_<vvi4i4i2>): Likewise.
	(mma_<avvi4i4i2>): Likewise.
	(mma_<vvi4i4>): Likewise.
	(mma_<avvi4i4>): Likewise.
	(mma_<pvi4i2>): Likewise.
	(mma_<apvi4i2>): Likewise.
	(mma_<vvi4i4i4>): Likewise.
	(mma_<avvi4i4i4>): Likewise.
	* config/rs6000/rs6000-c.cc (rs6000_target_modify_macros): Define
	__PPC_DMR__ if we have dense math instructions.
	* config/rs6000/rs6000.cc (print_operand): Make %A handle only DMRs if
	dense math and only FPRs if not dense math.
	(rs6000_split_multireg_move): Do not generate the xxmtacc instruction to
	prime the DMR registers or the xxmfacc instruction to de-prime
	instructions if we have dense math register support.
---
 gcc/config/rs6000/mma.md      | 247 +++++++++++++++++++++-------------
 gcc/config/rs6000/rs6000-c.cc |   3 +
 gcc/config/rs6000/rs6000.cc   |  35 ++---
 3 files changed, 176 insertions(+), 109 deletions(-)
  

Comments

Michael Meissner Jan. 19, 2024, 6:47 p.m. UTC | #1
Ping

| Date: Fri, 5 Jan 2024 18:39:55 -0500
| From: Michael Meissner <meissner@linux.ibm.com>
| Subject: Repost [PATCH 4/6] PowerPC: Make MMA insns support DMR registers.
| Message-ID: <ZZiTS0adUUPx7wjY@cowardly-lion.the-meissners.org>

https://gcc.gnu.org/pipermail/gcc-patches/2024-January/641964.html
  
Kewen.Lin Feb. 4, 2024, 3:21 a.m. UTC | #2
Hi Mike,

on 2024/1/6 07:39, Michael Meissner wrote:
> This patch changes the MMA instructions to use either FPR registers
> (-mcpu=power10) or DMRs (-mcpu=future).  In this patch, the existing MMA
> instruction names are used.
> 
> A macro (__PPC_DMR__) is defined if the MMA instructions use the DMRs.
> 
> The patches have been tested on both little and big endian systems.  Can I check
> it into the master branch?
> 
> 2024-01-05   Michael Meissner  <meissner@linux.ibm.com>
> 
> gcc/
> 
> 	* config/rs6000/mma.md (mma_<acc>): New define_expand to handle
> 	mma_<acc> for dense math and non dense math.
> 	(mma_<acc> insn): Restrict to non dense math.
> 	(mma_xxsetaccz): Convert to define_expand to handle non dense math and
> 	dense math.
> 	(mma_xxsetaccz_vsx): Rename from mma_xxsetaccz and restrict usage to non
> 	dense math.
> 	(mma_xxsetaccz_dm): Dense math version of mma_xxsetaccz.
> 	(mma_<vv>): Add support for dense math.
> 	(mma_<avv>): Likewise.
> 	(mma_<pv>): Likewise.
> 	(mma_<apv>): Likewise.
> 	(mma_<vvi4i4i8>): Likewise.
> 	(mma_<avvi4i4i8>): Likewise.
> 	(mma_<vvi4i4i2>): Likewise.
> 	(mma_<avvi4i4i2>): Likewise.
> 	(mma_<vvi4i4>): Likewise.
> 	(mma_<avvi4i4>): Likewise.
> 	(mma_<pvi4i2>): Likewise.
> 	(mma_<apvi4i2>): Likewise.
> 	(mma_<vvi4i4i4>): Likewise.
> 	(mma_<avvi4i4i4>): Likewise.
> 	* config/rs6000/rs6000-c.cc (rs6000_target_modify_macros): Define
> 	__PPC_DMR__ if we have dense math instructions.
> 	* config/rs6000/rs6000.cc (print_operand): Make %A handle only DMRs if
> 	dense math and only FPRs if not dense math.
> 	(rs6000_split_multireg_move): Do not generate the xxmtacc instruction to
> 	prime the DMR registers or the xxmfacc instruction to de-prime
> 	instructions if we have dense math register support.
> ---
>  gcc/config/rs6000/mma.md      | 247 +++++++++++++++++++++-------------
>  gcc/config/rs6000/rs6000-c.cc |   3 +
>  gcc/config/rs6000/rs6000.cc   |  35 ++---
>  3 files changed, 176 insertions(+), 109 deletions(-)
> 
> diff --git a/gcc/config/rs6000/mma.md b/gcc/config/rs6000/mma.md
> index bb898919ab5..525a85146ff 100644
> --- a/gcc/config/rs6000/mma.md
> +++ b/gcc/config/rs6000/mma.md
> @@ -559,190 +559,249 @@ (define_insn "*mma_disassemble_acc_dm"
>    "dmxxextfdmr256 %0,%1,2"
>    [(set_attr "type" "mma")])
>  
> -(define_insn "mma_<acc>"
> +;; MMA instructions that do not use their accumulators as an input, still must
> +;; not allow their vector operands to overlap the registers used by the
> +;; accumulator.  We enforce this by marking the output as early clobber.  If we
> +;; have dense math, we don't need the whole prime/de-prime action, so just make
> +;; thse instructions be NOPs.

typo: thse.

> +
> +(define_expand "mma_<acc>"
> +  [(set (match_operand:XO 0 "register_operand")
> +	(unspec:XO [(match_operand:XO 1 "register_operand")]

s/register_operand/accumulator_operand/?

> +		   MMA_ACC))]
> +  "TARGET_MMA"
> +{
> +  if (TARGET_DENSE_MATH)
> +    {
> +      if (!rtx_equal_p (operands[0], operands[1]))
> +	emit_move_insn (operands[0], operands[1]);
> +      DONE;
> +    }
> +
> +  /* Generate the prime/de-prime code.  */
> +})
> +
> +(define_insn "*mma_<acc>"

May be better to name with "*mma_<acc>_nodm"?

>    [(set (match_operand:XO 0 "fpr_reg_operand" "=&d")
>  	(unspec:XO [(match_operand:XO 1 "fpr_reg_operand" "0")]
>  		    MMA_ACC))]
> -  "TARGET_MMA"
> +  "TARGET_MMA && !TARGET_DENSE_MATH"

I found that "TARGET_MMA && !TARGET_DENSE_MATH" is used much (like changes in function
rs6000_split_multireg_move in this patch and some places in previous patches), maybe we
can introduce a macro named as TARGET_MMA_NODM short for it?

>    "<acc> %A0"
>    [(set_attr "type" "mma")])
>  
>  ;; We can't have integer constants in XOmode so we wrap this in an
> -;; UNSPEC_VOLATILE.
> +;; UNSPEC_VOLATILE for the non-dense math case.  For dense math, we don't need
> +;; to disable optimization and we can do a normal UNSPEC.
>  
> -(define_insn "mma_xxsetaccz"
> -  [(set (match_operand:XO 0 "fpr_reg_operand" "=d")
> +(define_expand "mma_xxsetaccz"
> +  [(set (match_operand:XO 0 "register_operand")

s/register_operand/accumulator_operand/?

>  	(unspec_volatile:XO [(const_int 0)]
>  			    UNSPECV_MMA_XXSETACCZ))]
>    "TARGET_MMA"
> +{
> +  if (TARGET_DENSE_MATH)
> +    {
> +      emit_insn (gen_mma_xxsetaccz_dm (operands[0]));
> +      DONE;
> +    }
> +})
> +
> +(define_insn "*mma_xxsetaccz_vsx"

s/vsx/nodm/

> +  [(set (match_operand:XO 0 "fpr_reg_operand" "=d")
> +	(unspec_volatile:XO [(const_int 0)]
> +			    UNSPECV_MMA_XXSETACCZ))]
> +  "TARGET_MMA && !TARGET_DENSE_MATH"
>    "xxsetaccz %A0"
>    [(set_attr "type" "mma")])
>  
> +
> +(define_insn "mma_xxsetaccz_dm"
> +  [(set (match_operand:XO 0 "dmr_operand" "=wD")
> +	(unspec:XO [(const_int 0)]
> +		   UNSPECV_MMA_XXSETACCZ))]
> +  "TARGET_DENSE_MATH"
> +  "dmsetdmrz %0"
> +  [(set_attr "type" "mma")])
> +
>  (define_insn "mma_<vv>"
> -  [(set (match_operand:XO 0 "fpr_reg_operand" "=&d,&d")
> -	(unspec:XO [(match_operand:V16QI 1 "vsx_register_operand" "v,?wa")
> -		    (match_operand:V16QI 2 "vsx_register_operand" "v,?wa")]
> +  [(set (match_operand:XO 0 "accumulator_operand" "=wD,&d,&d")
> +	(unspec:XO [(match_operand:V16QI 1 "vsx_register_operand" "wa,v,?wa")
> +		    (match_operand:V16QI 2 "vsx_register_operand" "wa,v,?wa")]
>  		    MMA_VV))]
>    "TARGET_MMA"
>    "<vv> %A0,%x1,%x2"
> -  [(set_attr "type" "mma")])
> +  [(set_attr "type" "mma")
> +   (set_attr "isa" "dm,not_dm,not_dm")])

Like what's suggested in previous patches, s/not_dm/nodm/

The others look good to me, thanks!

BR,
Kewen

>  
>  (define_insn "mma_<avv>"
> -  [(set (match_operand:XO 0 "fpr_reg_operand" "=&d,&d")
> -	(unspec:XO [(match_operand:XO 1 "fpr_reg_operand" "0,0")
> -		    (match_operand:V16QI 2 "vsx_register_operand" "v,?wa")
> -		    (match_operand:V16QI 3 "vsx_register_operand" "v,?wa")]
> +  [(set (match_operand:XO 0 "accumulator_operand" "=wD,&d,&d")
> +	(unspec:XO [(match_operand:XO 1 "accumulator_operand" "0,0,0")
> +		    (match_operand:V16QI 2 "vsx_register_operand" "wa,v,?wa")
> +		    (match_operand:V16QI 3 "vsx_register_operand" "wa,v,?wa")]
>  		    MMA_AVV))]
>    "TARGET_MMA"
>    "<avv> %A0,%x2,%x3"
> -  [(set_attr "type" "mma")])
> +  [(set_attr "type" "mma")
> +   (set_attr "isa" "dm,not_dm,not_dm")])
>  
>  (define_insn "mma_<pv>"
> -  [(set (match_operand:XO 0 "fpr_reg_operand" "=&d,&d")
> -	(unspec:XO [(match_operand:OO 1 "vsx_register_operand" "v,?wa")
> -		    (match_operand:V16QI 2 "vsx_register_operand" "v,?wa")]
> +  [(set (match_operand:XO 0 "accumulator_operand" "=wD,&d,&d")
> +	(unspec:XO [(match_operand:OO 1 "vsx_register_operand" "wa,v,?wa")
> +		    (match_operand:V16QI 2 "vsx_register_operand" "wa,v,?wa")]
>  		    MMA_PV))]
>    "TARGET_MMA"
>    "<pv> %A0,%x1,%x2"
> -  [(set_attr "type" "mma")])
> +  [(set_attr "type" "mma")
> +   (set_attr "isa" "dm,not_dm,not_dm")])
>  
>  (define_insn "mma_<apv>"
> -  [(set (match_operand:XO 0 "fpr_reg_operand" "=&d,&d")
> -	(unspec:XO [(match_operand:XO 1 "fpr_reg_operand" "0,0")
> -		    (match_operand:OO 2 "vsx_register_operand" "v,?wa")
> -		    (match_operand:V16QI 3 "vsx_register_operand" "v,?wa")]
> +  [(set (match_operand:XO 0 "accumulator_operand" "=wD,&d,&d")
> +	(unspec:XO [(match_operand:XO 1 "accumulator_operand" "0,0,0")
> +		    (match_operand:OO 2 "vsx_register_operand" "wa,v,?wa")
> +		    (match_operand:V16QI 3 "vsx_register_operand" "wa,v,?wa")]
>  		    MMA_APV))]
>    "TARGET_MMA"
>    "<apv> %A0,%x2,%x3"
> -  [(set_attr "type" "mma")])
> +  [(set_attr "type" "mma")
> +   (set_attr "isa" "dm,not_dm,not_dm")])
>  
>  (define_insn "mma_<vvi4i4i8>"
> -  [(set (match_operand:XO 0 "fpr_reg_operand" "=&d,&d")
> -	(unspec:XO [(match_operand:V16QI 1 "vsx_register_operand" "v,?wa")
> -		    (match_operand:V16QI 2 "vsx_register_operand" "v,?wa")
> -		    (match_operand:SI 3 "const_0_to_15_operand" "n,n")
> -		    (match_operand:SI 4 "const_0_to_15_operand" "n,n")
> -		    (match_operand:SI 5 "u8bit_cint_operand" "n,n")]
> +  [(set (match_operand:XO 0 "accumulator_operand" "=wD,&d,&d")
> +	(unspec:XO [(match_operand:V16QI 1 "vsx_register_operand" "wa,v,?wa")
> +		    (match_operand:V16QI 2 "vsx_register_operand" "wa,v,?wa")
> +		    (match_operand:SI 3 "const_0_to_15_operand" "n,n,n")
> +		    (match_operand:SI 4 "const_0_to_15_operand" "n,n,n")
> +		    (match_operand:SI 5 "u8bit_cint_operand" "n,n,n")]
>  		    MMA_VVI4I4I8))]
>    "TARGET_MMA"
>    "<vvi4i4i8> %A0,%x1,%x2,%3,%4,%5"
>    [(set_attr "type" "mma")
> -   (set_attr "prefixed" "yes")])
> +   (set_attr "prefixed" "yes")
> +   (set_attr "isa" "dm,not_dm,not_dm")])
>  
>  (define_insn "mma_<avvi4i4i8>"
> -  [(set (match_operand:XO 0 "fpr_reg_operand" "=&d,&d")
> -	(unspec:XO [(match_operand:XO 1 "fpr_reg_operand" "0,0")
> -		    (match_operand:V16QI 2 "vsx_register_operand" "v,?wa")
> -		    (match_operand:V16QI 3 "vsx_register_operand" "v,?wa")
> -		    (match_operand:SI 4 "const_0_to_15_operand" "n,n")
> -		    (match_operand:SI 5 "const_0_to_15_operand" "n,n")
> -		    (match_operand:SI 6 "u8bit_cint_operand" "n,n")]
> +  [(set (match_operand:XO 0 "accumulator_operand" "=wD,&d,&d")
> +	(unspec:XO [(match_operand:XO 1 "accumulator_operand" "0,0,0")
> +		    (match_operand:V16QI 2 "vsx_register_operand" "wa,v,?wa")
> +		    (match_operand:V16QI 3 "vsx_register_operand" "wa,v,?wa")
> +		    (match_operand:SI 4 "const_0_to_15_operand" "n,n,n")
> +		    (match_operand:SI 5 "const_0_to_15_operand" "n,n,n")
> +		    (match_operand:SI 6 "u8bit_cint_operand" "n,n,n")]
>  		    MMA_AVVI4I4I8))]
>    "TARGET_MMA"
>    "<avvi4i4i8> %A0,%x2,%x3,%4,%5,%6"
>    [(set_attr "type" "mma")
> -   (set_attr "prefixed" "yes")])
> +   (set_attr "prefixed" "yes")
> +   (set_attr "isa" "dm,not_dm,not_dm")])
>  
>  (define_insn "mma_<vvi4i4i2>"
> -  [(set (match_operand:XO 0 "fpr_reg_operand" "=&d,&d")
> -	(unspec:XO [(match_operand:V16QI 1 "vsx_register_operand" "v,?wa")
> -		    (match_operand:V16QI 2 "vsx_register_operand" "v,?wa")
> -		    (match_operand:SI 3 "const_0_to_15_operand" "n,n")
> -		    (match_operand:SI 4 "const_0_to_15_operand" "n,n")
> -		    (match_operand:SI 5 "const_0_to_3_operand" "n,n")]
> +  [(set (match_operand:XO 0 "accumulator_operand" "=wD,&d,&d")
> +	(unspec:XO [(match_operand:V16QI 1 "vsx_register_operand" "wa,v,?wa")
> +		    (match_operand:V16QI 2 "vsx_register_operand" "wa,v,?wa")
> +		    (match_operand:SI 3 "const_0_to_15_operand" "n,n,n")
> +		    (match_operand:SI 4 "const_0_to_15_operand" "n,n,n")
> +		    (match_operand:SI 5 "const_0_to_3_operand" "n,n,n")]
>  		    MMA_VVI4I4I2))]
>    "TARGET_MMA"
>    "<vvi4i4i2> %A0,%x1,%x2,%3,%4,%5"
>    [(set_attr "type" "mma")
> -   (set_attr "prefixed" "yes")])
> +   (set_attr "prefixed" "yes")
> +   (set_attr "isa" "dm,not_dm,not_dm")])
>  
>  (define_insn "mma_<avvi4i4i2>"
> -  [(set (match_operand:XO 0 "fpr_reg_operand" "=&d,&d")
> -	(unspec:XO [(match_operand:XO 1 "fpr_reg_operand" "0,0")
> -		    (match_operand:V16QI 2 "vsx_register_operand" "v,?wa")
> -		    (match_operand:V16QI 3 "vsx_register_operand" "v,?wa")
> -		    (match_operand:SI 4 "const_0_to_15_operand" "n,n")
> -		    (match_operand:SI 5 "const_0_to_15_operand" "n,n")
> -		    (match_operand:SI 6 "const_0_to_3_operand" "n,n")]
> +  [(set (match_operand:XO 0 "accumulator_operand" "=wD,&d,&d")
> +	(unspec:XO [(match_operand:XO 1 "accumulator_operand" "0,0,0")
> +		    (match_operand:V16QI 2 "vsx_register_operand" "wa,v,?wa")
> +		    (match_operand:V16QI 3 "vsx_register_operand" "wa,v,?wa")
> +		    (match_operand:SI 4 "const_0_to_15_operand" "n,n,n")
> +		    (match_operand:SI 5 "const_0_to_15_operand" "n,n,n")
> +		    (match_operand:SI 6 "const_0_to_3_operand" "n,n,n")]
>  		    MMA_AVVI4I4I2))]
>    "TARGET_MMA"
>    "<avvi4i4i2> %A0,%x2,%x3,%4,%5,%6"
>    [(set_attr "type" "mma")
> -   (set_attr "prefixed" "yes")])
> +   (set_attr "prefixed" "yes")
> +   (set_attr "isa" "dm,not_dm,not_dm")])
>  
>  (define_insn "mma_<vvi4i4>"
> -  [(set (match_operand:XO 0 "fpr_reg_operand" "=&d,&d")
> -	(unspec:XO [(match_operand:V16QI 1 "vsx_register_operand" "v,?wa")
> -		    (match_operand:V16QI 2 "vsx_register_operand" "v,?wa")
> -		    (match_operand:SI 3 "const_0_to_15_operand" "n,n")
> -		    (match_operand:SI 4 "const_0_to_15_operand" "n,n")]
> +  [(set (match_operand:XO 0 "accumulator_operand" "=wD,&d,&d")
> +	(unspec:XO [(match_operand:V16QI 1 "vsx_register_operand" "wa,v,?wa")
> +		    (match_operand:V16QI 2 "vsx_register_operand" "wa,v,?wa")
> +		    (match_operand:SI 3 "const_0_to_15_operand" "n,n,n")
> +		    (match_operand:SI 4 "const_0_to_15_operand" "n,n,n")]
>  		    MMA_VVI4I4))]
>    "TARGET_MMA"
>    "<vvi4i4> %A0,%x1,%x2,%3,%4"
>    [(set_attr "type" "mma")
> -   (set_attr "prefixed" "yes")])
> +   (set_attr "prefixed" "yes")
> +   (set_attr "isa" "dm,not_dm,not_dm")])
>  
>  (define_insn "mma_<avvi4i4>"
> -  [(set (match_operand:XO 0 "fpr_reg_operand" "=&d,&d")
> -	(unspec:XO [(match_operand:XO 1 "fpr_reg_operand" "0,0")
> -		    (match_operand:V16QI 2 "vsx_register_operand" "v,?wa")
> -		    (match_operand:V16QI 3 "vsx_register_operand" "v,?wa")
> -		    (match_operand:SI 4 "const_0_to_15_operand" "n,n")
> -		    (match_operand:SI 5 "const_0_to_15_operand" "n,n")]
> +  [(set (match_operand:XO 0 "accumulator_operand" "=wD,&d,&d")
> +	(unspec:XO [(match_operand:XO 1 "accumulator_operand" "0,0,0")
> +		    (match_operand:V16QI 2 "vsx_register_operand" "wa,v,?wa")
> +		    (match_operand:V16QI 3 "vsx_register_operand" "wa,v,?wa")
> +		    (match_operand:SI 4 "const_0_to_15_operand" "n,n,n")
> +		    (match_operand:SI 5 "const_0_to_15_operand" "n,n,n")]
>  		    MMA_AVVI4I4))]
>    "TARGET_MMA"
>    "<avvi4i4> %A0,%x2,%x3,%4,%5"
>    [(set_attr "type" "mma")
> -   (set_attr "prefixed" "yes")])
> +   (set_attr "prefixed" "yes")
> +   (set_attr "isa" "dm,not_dm,not_dm")])
>  
>  (define_insn "mma_<pvi4i2>"
> -  [(set (match_operand:XO 0 "fpr_reg_operand" "=&d,&d")
> -	(unspec:XO [(match_operand:OO 1 "vsx_register_operand" "v,?wa")
> -		    (match_operand:V16QI 2 "vsx_register_operand" "v,?wa")
> -		    (match_operand:SI 3 "const_0_to_15_operand" "n,n")
> -		    (match_operand:SI 4 "const_0_to_3_operand" "n,n")]
> +  [(set (match_operand:XO 0 "accumulator_operand" "=wD,&d,&d")
> +	(unspec:XO [(match_operand:OO 1 "vsx_register_operand" "wa,v,?wa")
> +		    (match_operand:V16QI 2 "vsx_register_operand" "wa,v,?wa")
> +		    (match_operand:SI 3 "const_0_to_15_operand" "n,n,n")
> +		    (match_operand:SI 4 "const_0_to_3_operand" "n,n,n")]
>  		    MMA_PVI4I2))]
>    "TARGET_MMA"
>    "<pvi4i2> %A0,%x1,%x2,%3,%4"
>    [(set_attr "type" "mma")
> -   (set_attr "prefixed" "yes")])
> +   (set_attr "prefixed" "yes")
> +   (set_attr "isa" "dm,not_dm,not_dm")])
>  
>  (define_insn "mma_<apvi4i2>"
> -  [(set (match_operand:XO 0 "fpr_reg_operand" "=&d,&d")
> -	(unspec:XO [(match_operand:XO 1 "fpr_reg_operand" "0,0")
> -		    (match_operand:OO 2 "vsx_register_operand" "v,?wa")
> -		    (match_operand:V16QI 3 "vsx_register_operand" "v,?wa")
> -		    (match_operand:SI 4 "const_0_to_15_operand" "n,n")
> -		    (match_operand:SI 5 "const_0_to_3_operand" "n,n")]
> +  [(set (match_operand:XO 0 "accumulator_operand" "=wD,&d,&d")
> +	(unspec:XO [(match_operand:XO 1 "accumulator_operand" "0,0,0")
> +		    (match_operand:OO 2 "vsx_register_operand" "wa,v,?wa")
> +		    (match_operand:V16QI 3 "vsx_register_operand" "wa,v,?wa")
> +		    (match_operand:SI 4 "const_0_to_15_operand" "n,n,n")
> +		    (match_operand:SI 5 "const_0_to_3_operand" "n,n,n")]
>  		    MMA_APVI4I2))]
>    "TARGET_MMA"
>    "<apvi4i2> %A0,%x2,%x3,%4,%5"
>    [(set_attr "type" "mma")
> -   (set_attr "prefixed" "yes")])
> +   (set_attr "prefixed" "yes")
> +   (set_attr "isa" "dm,not_dm,not_dm")])
>  
>  (define_insn "mma_<vvi4i4i4>"
> -  [(set (match_operand:XO 0 "fpr_reg_operand" "=&d,&d")
> -	(unspec:XO [(match_operand:V16QI 1 "vsx_register_operand" "v,?wa")
> -		    (match_operand:V16QI 2 "vsx_register_operand" "v,?wa")
> -		    (match_operand:SI 3 "const_0_to_15_operand" "n,n")
> -		    (match_operand:SI 4 "const_0_to_15_operand" "n,n")
> -		    (match_operand:SI 5 "const_0_to_15_operand" "n,n")]
> +  [(set (match_operand:XO 0 "accumulator_operand" "=wD,&d,&d")
> +	(unspec:XO [(match_operand:V16QI 1 "vsx_register_operand" "wa,v,?wa")
> +		    (match_operand:V16QI 2 "vsx_register_operand" "wa,v,?wa")
> +		    (match_operand:SI 3 "const_0_to_15_operand" "n,n,n")
> +		    (match_operand:SI 4 "const_0_to_15_operand" "n,n,n")
> +		    (match_operand:SI 5 "const_0_to_15_operand" "n,n,n")]
>  		    MMA_VVI4I4I4))]
>    "TARGET_MMA"
>    "<vvi4i4i4> %A0,%x1,%x2,%3,%4,%5"
>    [(set_attr "type" "mma")
> -   (set_attr "prefixed" "yes")])
> +   (set_attr "prefixed" "yes")
> +   (set_attr "isa" "dm,not_dm,not_dm")])
>  
>  (define_insn "mma_<avvi4i4i4>"
> -  [(set (match_operand:XO 0 "fpr_reg_operand" "=&d,&d")
> -	(unspec:XO [(match_operand:XO 1 "fpr_reg_operand" "0,0")
> -		    (match_operand:V16QI 2 "vsx_register_operand" "v,?wa")
> -		    (match_operand:V16QI 3 "vsx_register_operand" "v,?wa")
> -		    (match_operand:SI 4 "const_0_to_15_operand" "n,n")
> -		    (match_operand:SI 5 "const_0_to_15_operand" "n,n")
> -		    (match_operand:SI 6 "const_0_to_15_operand" "n,n")]
> +  [(set (match_operand:XO 0 "accumulator_operand" "=wD,&d,&d")
> +	(unspec:XO [(match_operand:XO 1 "accumulator_operand" "0,0,0")
> +		    (match_operand:V16QI 2 "vsx_register_operand" "wa,v,?wa")
> +		    (match_operand:V16QI 3 "vsx_register_operand" "wa,v,?wa")
> +		    (match_operand:SI 4 "const_0_to_15_operand" "n,n,n")
> +		    (match_operand:SI 5 "const_0_to_15_operand" "n,n,n")
> +		    (match_operand:SI 6 "const_0_to_15_operand" "n,n,n")]
>  		    MMA_AVVI4I4I4))]
>    "TARGET_MMA"
>    "<avvi4i4i4> %A0,%x2,%x3,%4,%5,%6"
>    [(set_attr "type" "mma")
> -   (set_attr "prefixed" "yes")])
> +   (set_attr "prefixed" "yes")
> +   (set_attr "isa" "dm,not_dm,not_dm")])
> diff --git a/gcc/config/rs6000/rs6000-c.cc b/gcc/config/rs6000/rs6000-c.cc
> index f2fb5bef678..4342620f87f 100644
> --- a/gcc/config/rs6000/rs6000-c.cc
> +++ b/gcc/config/rs6000/rs6000-c.cc
> @@ -600,6 +600,9 @@ rs6000_target_modify_macros (bool define_p, HOST_WIDE_INT flags)
>    /* Tell the user if we support the MMA instructions.  */
>    if ((flags & OPTION_MASK_MMA) != 0)
>      rs6000_define_or_undefine_macro (define_p, "__MMA__");
> +  /* Tell the user if we support the dense math instructions.  */
> +  if ((flags & OPTION_MASK_DENSE_MATH) != 0)
> +    rs6000_define_or_undefine_macro (define_p, "__PPC_DMR__");
>    /* Whether pc-relative code is being generated.  */
>    if ((flags & OPTION_MASK_PCREL) != 0)
>      rs6000_define_or_undefine_macro (define_p, "__PCREL__");
> diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
> index 83e32f7a43a..59517c8608d 100644
> --- a/gcc/config/rs6000/rs6000.cc
> +++ b/gcc/config/rs6000/rs6000.cc
> @@ -14264,8 +14264,13 @@ print_operand (FILE *file, rtx x, int code)
>  	 overlapping with the FPR registers.  */
>        if (!REG_P (x))
>  	output_operand_lossage ("invalid %%A value");
> -      else if (TARGET_DENSE_MATH && DMR_REGNO_P (REGNO (x)))
> -	fprintf (file, "%d", REGNO (x) - FIRST_DMR_REGNO);
> +      else if (TARGET_DENSE_MATH)
> +	{
> +	  if (DMR_REGNO_P (REGNO (x)))
> +	    fprintf (file, "%d", REGNO (x) - FIRST_DMR_REGNO);
> +	  else
> +	    output_operand_lossage ("%%A operand is not a DMR");
> +	}
>        else if (!FP_REGNO_P (REGNO (x)) || (REGNO (x) % 4) != 0)
>  	output_operand_lossage ("invalid %%A value");
>        else
> @@ -27719,7 +27724,7 @@ rs6000_split_multireg_move (rtx dst, rtx src)
>  
>  	  /* If we are reading an accumulator register, we have to
>  	     deprime it before we can access it.  */
> -	  if (TARGET_MMA
> +	  if (TARGET_MMA && !TARGET_DENSE_MATH
>  	      && GET_MODE (src) == XOmode && FP_REGNO_P (REGNO (src)))
>  	    emit_insn (gen_mma_xxmfacc (src, src));
>  
> @@ -27751,9 +27756,9 @@ rs6000_split_multireg_move (rtx dst, rtx src)
>  	      emit_insn (gen_rtx_SET (dst2, src2));
>  	    }
>  
> -	  /* If we are writing an accumulator register, we have to
> -	     prime it after we've written it.  */
> -	  if (TARGET_MMA
> +	  /* If we are writing an accumulator register that overlaps with the
> +	     FPR registers, we have to prime it after we've written it.  */
> +	  if (TARGET_MMA && !TARGET_DENSE_MATH
>  	      && GET_MODE (dst) == XOmode && FP_REGNO_P (REGNO (dst)))
>  	    emit_insn (gen_mma_xxmtacc (dst, dst));
>  
> @@ -27822,9 +27827,9 @@ rs6000_split_multireg_move (rtx dst, rtx src)
>  	      emit_insn (gen_rtx_SET (dst_i, op));
>  	    }
>  
> -	  /* We are writing an accumulator register, so we have to
> -	     prime it after we've written it.  */
> -	  if (GET_MODE (src) == XOmode)
> +	  /* On systems without dense math where accumulators overlap with the
> +	     vector registers, we have to prime it after we've written it.  */
> +	  if (GET_MODE (src) == XOmode && !TARGET_DENSE_MATH)
>  	    emit_insn (gen_mma_xxmtacc (dst, dst));
>  
>  	  return;
> @@ -27835,9 +27840,9 @@ rs6000_split_multireg_move (rtx dst, rtx src)
>  
>    if (REG_P (src) && REG_P (dst) && (REGNO (src) < REGNO (dst)))
>      {
> -      /* If we are reading an accumulator register, we have to
> -	 deprime it before we can access it.  */
> -      if (TARGET_MMA
> +      /* If we are reading an accumulator register and we don't have dense
> +	 math, we have to deprime it before we can access it.  */
> +      if (TARGET_MMA && !TARGET_DENSE_MATH
>  	  && GET_MODE (src) == XOmode && FP_REGNO_P (REGNO (src)))
>  	emit_insn (gen_mma_xxmfacc (src, src));
>  
> @@ -27865,7 +27870,7 @@ rs6000_split_multireg_move (rtx dst, rtx src)
>  
>        /* If we are writing an accumulator register, we have to
>  	 prime it after we've written it.  */
> -      if (TARGET_MMA
> +      if (TARGET_MMA && !TARGET_DENSE_MATH
>  	  && GET_MODE (dst) == XOmode && FP_REGNO_P (REGNO (dst)))
>  	emit_insn (gen_mma_xxmtacc (dst, dst));
>      }
> @@ -28002,7 +28007,7 @@ rs6000_split_multireg_move (rtx dst, rtx src)
>  
>        /* If we are reading an accumulator register, we have to
>  	 deprime it before we can access it.  */
> -      if (TARGET_MMA && REG_P (src)
> +      if (TARGET_MMA && !TARGET_DENSE_MATH && REG_P (src)
>  	  && GET_MODE (src) == XOmode && FP_REGNO_P (REGNO (src)))
>  	emit_insn (gen_mma_xxmfacc (src, src));
>  
> @@ -28034,7 +28039,7 @@ rs6000_split_multireg_move (rtx dst, rtx src)
>  
>        /* If we are writing an accumulator register, we have to
>  	 prime it after we've written it.  */
> -      if (TARGET_MMA && REG_P (dst)
> +      if (TARGET_MMA && !TARGET_DENSE_MATH && REG_P (dst)
>  	  && GET_MODE (dst) == XOmode && FP_REGNO_P (REGNO (dst)))
>  	emit_insn (gen_mma_xxmtacc (dst, dst));
>  
Michael Meissner Feb. 7, 2024, 3:31 a.m. UTC | #3
On Sun, Feb 04, 2024 at 11:21:49AM +0800, Kewen.Lin wrote:
> Hi Mike,
> 
> > --- a/gcc/config/rs6000/mma.md
> > +++ b/gcc/config/rs6000/mma.md
> > @@ -559,190 +559,249 @@ (define_insn "*mma_disassemble_acc_dm"
> >    "dmxxextfdmr256 %0,%1,2"
> >    [(set_attr "type" "mma")])
> >  
> > -(define_insn "mma_<acc>"
> > +;; MMA instructions that do not use their accumulators as an input, still must
> > +;; not allow their vector operands to overlap the registers used by the
> > +;; accumulator.  We enforce this by marking the output as early clobber.  If we
> > +;; have dense math, we don't need the whole prime/de-prime action, so just make
> > +;; thse instructions be NOPs.
> 
> typo: thse.

Ok.

> > +
> > +(define_expand "mma_<acc>"
> > +  [(set (match_operand:XO 0 "register_operand")
> > +	(unspec:XO [(match_operand:XO 1 "register_operand")]
> 
> s/register_operand/accumulator_operand/?

Ok.

> > +		   MMA_ACC))]
> > +  "TARGET_MMA"
> > +{
> > +  if (TARGET_DENSE_MATH)
> > +    {
> > +      if (!rtx_equal_p (operands[0], operands[1]))
> > +	emit_move_insn (operands[0], operands[1]);
> > +      DONE;
> > +    }
> > +
> > +  /* Generate the prime/de-prime code.  */
> > +})
> > +
> > +(define_insn "*mma_<acc>"
> 
> May be better to name with "*mma_<acc>_nodm"?

Ok.

> >    [(set (match_operand:XO 0 "fpr_reg_operand" "=&d")
> >  	(unspec:XO [(match_operand:XO 1 "fpr_reg_operand" "0")]
> >  		    MMA_ACC))]
> > -  "TARGET_MMA"
> > +  "TARGET_MMA && !TARGET_DENSE_MATH"
> 
> I found that "TARGET_MMA && !TARGET_DENSE_MATH" is used much (like changes in function
> rs6000_split_multireg_move in this patch and some places in previous patches), maybe we
> can introduce a macro named as TARGET_MMA_NODM short for it?

As I said in the message about the last patch, I added
TARGET_MMA_NO_DENSE_MATH.

> >    "<acc> %A0"
> >    [(set_attr "type" "mma")])
> >  
> >  ;; We can't have integer constants in XOmode so we wrap this in an
> > -;; UNSPEC_VOLATILE.
> > +;; UNSPEC_VOLATILE for the non-dense math case.  For dense math, we don't need
> > +;; to disable optimization and we can do a normal UNSPEC.
> >  
> > -(define_insn "mma_xxsetaccz"
> > -  [(set (match_operand:XO 0 "fpr_reg_operand" "=d")
> > +(define_expand "mma_xxsetaccz"
> > +  [(set (match_operand:XO 0 "register_operand")
> 
> s/register_operand/accumulator_operand/?

Ok.

> >  	(unspec_volatile:XO [(const_int 0)]
> >  			    UNSPECV_MMA_XXSETACCZ))]
> >    "TARGET_MMA"
> > +{
> > +  if (TARGET_DENSE_MATH)
> > +    {
> > +      emit_insn (gen_mma_xxsetaccz_dm (operands[0]));
> > +      DONE;
> > +    }
> > +})
> > +
> > +(define_insn "*mma_xxsetaccz_vsx"
> 
> s/vsx/nodm/

Ok.

> > +  [(set (match_operand:XO 0 "fpr_reg_operand" "=d")
> > +	(unspec_volatile:XO [(const_int 0)]
> > +			    UNSPECV_MMA_XXSETACCZ))]
> > +  "TARGET_MMA && !TARGET_DENSE_MATH"
> >    "xxsetaccz %A0"
> >    [(set_attr "type" "mma")])
> >  
> > +
> > +(define_insn "mma_xxsetaccz_dm"
> > +  [(set (match_operand:XO 0 "dmr_operand" "=wD")
> > +	(unspec:XO [(const_int 0)]
> > +		   UNSPECV_MMA_XXSETACCZ))]
> > +  "TARGET_DENSE_MATH"
> > +  "dmsetdmrz %0"
> > +  [(set_attr "type" "mma")])
> > +
> >  (define_insn "mma_<vv>"
> > -  [(set (match_operand:XO 0 "fpr_reg_operand" "=&d,&d")
> > -	(unspec:XO [(match_operand:V16QI 1 "vsx_register_operand" "v,?wa")
> > -		    (match_operand:V16QI 2 "vsx_register_operand" "v,?wa")]
> > +  [(set (match_operand:XO 0 "accumulator_operand" "=wD,&d,&d")
> > +	(unspec:XO [(match_operand:V16QI 1 "vsx_register_operand" "wa,v,?wa")
> > +		    (match_operand:V16QI 2 "vsx_register_operand" "wa,v,?wa")]
> >  		    MMA_VV))]
> >    "TARGET_MMA"
> >    "<vv> %A0,%x1,%x2"
> > -  [(set_attr "type" "mma")])
> > +  [(set_attr "type" "mma")
> > +   (set_attr "isa" "dm,not_dm,not_dm")])
> 
> Like what's suggested in previous patches, s/not_dm/nodm/

Ok.
  

Patch

diff --git a/gcc/config/rs6000/mma.md b/gcc/config/rs6000/mma.md
index bb898919ab5..525a85146ff 100644
--- a/gcc/config/rs6000/mma.md
+++ b/gcc/config/rs6000/mma.md
@@ -559,190 +559,249 @@  (define_insn "*mma_disassemble_acc_dm"
   "dmxxextfdmr256 %0,%1,2"
   [(set_attr "type" "mma")])
 
-(define_insn "mma_<acc>"
+;; MMA instructions that do not use their accumulators as an input, still must
+;; not allow their vector operands to overlap the registers used by the
+;; accumulator.  We enforce this by marking the output as early clobber.  If we
+;; have dense math, we don't need the whole prime/de-prime action, so just make
+;; thse instructions be NOPs.
+
+(define_expand "mma_<acc>"
+  [(set (match_operand:XO 0 "register_operand")
+	(unspec:XO [(match_operand:XO 1 "register_operand")]
+		   MMA_ACC))]
+  "TARGET_MMA"
+{
+  if (TARGET_DENSE_MATH)
+    {
+      if (!rtx_equal_p (operands[0], operands[1]))
+	emit_move_insn (operands[0], operands[1]);
+      DONE;
+    }
+
+  /* Generate the prime/de-prime code.  */
+})
+
+(define_insn "*mma_<acc>"
   [(set (match_operand:XO 0 "fpr_reg_operand" "=&d")
 	(unspec:XO [(match_operand:XO 1 "fpr_reg_operand" "0")]
 		    MMA_ACC))]
-  "TARGET_MMA"
+  "TARGET_MMA && !TARGET_DENSE_MATH"
   "<acc> %A0"
   [(set_attr "type" "mma")])
 
 ;; We can't have integer constants in XOmode so we wrap this in an
-;; UNSPEC_VOLATILE.
+;; UNSPEC_VOLATILE for the non-dense math case.  For dense math, we don't need
+;; to disable optimization and we can do a normal UNSPEC.
 
-(define_insn "mma_xxsetaccz"
-  [(set (match_operand:XO 0 "fpr_reg_operand" "=d")
+(define_expand "mma_xxsetaccz"
+  [(set (match_operand:XO 0 "register_operand")
 	(unspec_volatile:XO [(const_int 0)]
 			    UNSPECV_MMA_XXSETACCZ))]
   "TARGET_MMA"
+{
+  if (TARGET_DENSE_MATH)
+    {
+      emit_insn (gen_mma_xxsetaccz_dm (operands[0]));
+      DONE;
+    }
+})
+
+(define_insn "*mma_xxsetaccz_vsx"
+  [(set (match_operand:XO 0 "fpr_reg_operand" "=d")
+	(unspec_volatile:XO [(const_int 0)]
+			    UNSPECV_MMA_XXSETACCZ))]
+  "TARGET_MMA && !TARGET_DENSE_MATH"
   "xxsetaccz %A0"
   [(set_attr "type" "mma")])
 
+
+(define_insn "mma_xxsetaccz_dm"
+  [(set (match_operand:XO 0 "dmr_operand" "=wD")
+	(unspec:XO [(const_int 0)]
+		   UNSPECV_MMA_XXSETACCZ))]
+  "TARGET_DENSE_MATH"
+  "dmsetdmrz %0"
+  [(set_attr "type" "mma")])
+
 (define_insn "mma_<vv>"
-  [(set (match_operand:XO 0 "fpr_reg_operand" "=&d,&d")
-	(unspec:XO [(match_operand:V16QI 1 "vsx_register_operand" "v,?wa")
-		    (match_operand:V16QI 2 "vsx_register_operand" "v,?wa")]
+  [(set (match_operand:XO 0 "accumulator_operand" "=wD,&d,&d")
+	(unspec:XO [(match_operand:V16QI 1 "vsx_register_operand" "wa,v,?wa")
+		    (match_operand:V16QI 2 "vsx_register_operand" "wa,v,?wa")]
 		    MMA_VV))]
   "TARGET_MMA"
   "<vv> %A0,%x1,%x2"
-  [(set_attr "type" "mma")])
+  [(set_attr "type" "mma")
+   (set_attr "isa" "dm,not_dm,not_dm")])
 
 (define_insn "mma_<avv>"
-  [(set (match_operand:XO 0 "fpr_reg_operand" "=&d,&d")
-	(unspec:XO [(match_operand:XO 1 "fpr_reg_operand" "0,0")
-		    (match_operand:V16QI 2 "vsx_register_operand" "v,?wa")
-		    (match_operand:V16QI 3 "vsx_register_operand" "v,?wa")]
+  [(set (match_operand:XO 0 "accumulator_operand" "=wD,&d,&d")
+	(unspec:XO [(match_operand:XO 1 "accumulator_operand" "0,0,0")
+		    (match_operand:V16QI 2 "vsx_register_operand" "wa,v,?wa")
+		    (match_operand:V16QI 3 "vsx_register_operand" "wa,v,?wa")]
 		    MMA_AVV))]
   "TARGET_MMA"
   "<avv> %A0,%x2,%x3"
-  [(set_attr "type" "mma")])
+  [(set_attr "type" "mma")
+   (set_attr "isa" "dm,not_dm,not_dm")])
 
 (define_insn "mma_<pv>"
-  [(set (match_operand:XO 0 "fpr_reg_operand" "=&d,&d")
-	(unspec:XO [(match_operand:OO 1 "vsx_register_operand" "v,?wa")
-		    (match_operand:V16QI 2 "vsx_register_operand" "v,?wa")]
+  [(set (match_operand:XO 0 "accumulator_operand" "=wD,&d,&d")
+	(unspec:XO [(match_operand:OO 1 "vsx_register_operand" "wa,v,?wa")
+		    (match_operand:V16QI 2 "vsx_register_operand" "wa,v,?wa")]
 		    MMA_PV))]
   "TARGET_MMA"
   "<pv> %A0,%x1,%x2"
-  [(set_attr "type" "mma")])
+  [(set_attr "type" "mma")
+   (set_attr "isa" "dm,not_dm,not_dm")])
 
 (define_insn "mma_<apv>"
-  [(set (match_operand:XO 0 "fpr_reg_operand" "=&d,&d")
-	(unspec:XO [(match_operand:XO 1 "fpr_reg_operand" "0,0")
-		    (match_operand:OO 2 "vsx_register_operand" "v,?wa")
-		    (match_operand:V16QI 3 "vsx_register_operand" "v,?wa")]
+  [(set (match_operand:XO 0 "accumulator_operand" "=wD,&d,&d")
+	(unspec:XO [(match_operand:XO 1 "accumulator_operand" "0,0,0")
+		    (match_operand:OO 2 "vsx_register_operand" "wa,v,?wa")
+		    (match_operand:V16QI 3 "vsx_register_operand" "wa,v,?wa")]
 		    MMA_APV))]
   "TARGET_MMA"
   "<apv> %A0,%x2,%x3"
-  [(set_attr "type" "mma")])
+  [(set_attr "type" "mma")
+   (set_attr "isa" "dm,not_dm,not_dm")])
 
 (define_insn "mma_<vvi4i4i8>"
-  [(set (match_operand:XO 0 "fpr_reg_operand" "=&d,&d")
-	(unspec:XO [(match_operand:V16QI 1 "vsx_register_operand" "v,?wa")
-		    (match_operand:V16QI 2 "vsx_register_operand" "v,?wa")
-		    (match_operand:SI 3 "const_0_to_15_operand" "n,n")
-		    (match_operand:SI 4 "const_0_to_15_operand" "n,n")
-		    (match_operand:SI 5 "u8bit_cint_operand" "n,n")]
+  [(set (match_operand:XO 0 "accumulator_operand" "=wD,&d,&d")
+	(unspec:XO [(match_operand:V16QI 1 "vsx_register_operand" "wa,v,?wa")
+		    (match_operand:V16QI 2 "vsx_register_operand" "wa,v,?wa")
+		    (match_operand:SI 3 "const_0_to_15_operand" "n,n,n")
+		    (match_operand:SI 4 "const_0_to_15_operand" "n,n,n")
+		    (match_operand:SI 5 "u8bit_cint_operand" "n,n,n")]
 		    MMA_VVI4I4I8))]
   "TARGET_MMA"
   "<vvi4i4i8> %A0,%x1,%x2,%3,%4,%5"
   [(set_attr "type" "mma")
-   (set_attr "prefixed" "yes")])
+   (set_attr "prefixed" "yes")
+   (set_attr "isa" "dm,not_dm,not_dm")])
 
 (define_insn "mma_<avvi4i4i8>"
-  [(set (match_operand:XO 0 "fpr_reg_operand" "=&d,&d")
-	(unspec:XO [(match_operand:XO 1 "fpr_reg_operand" "0,0")
-		    (match_operand:V16QI 2 "vsx_register_operand" "v,?wa")
-		    (match_operand:V16QI 3 "vsx_register_operand" "v,?wa")
-		    (match_operand:SI 4 "const_0_to_15_operand" "n,n")
-		    (match_operand:SI 5 "const_0_to_15_operand" "n,n")
-		    (match_operand:SI 6 "u8bit_cint_operand" "n,n")]
+  [(set (match_operand:XO 0 "accumulator_operand" "=wD,&d,&d")
+	(unspec:XO [(match_operand:XO 1 "accumulator_operand" "0,0,0")
+		    (match_operand:V16QI 2 "vsx_register_operand" "wa,v,?wa")
+		    (match_operand:V16QI 3 "vsx_register_operand" "wa,v,?wa")
+		    (match_operand:SI 4 "const_0_to_15_operand" "n,n,n")
+		    (match_operand:SI 5 "const_0_to_15_operand" "n,n,n")
+		    (match_operand:SI 6 "u8bit_cint_operand" "n,n,n")]
 		    MMA_AVVI4I4I8))]
   "TARGET_MMA"
   "<avvi4i4i8> %A0,%x2,%x3,%4,%5,%6"
   [(set_attr "type" "mma")
-   (set_attr "prefixed" "yes")])
+   (set_attr "prefixed" "yes")
+   (set_attr "isa" "dm,not_dm,not_dm")])
 
 (define_insn "mma_<vvi4i4i2>"
-  [(set (match_operand:XO 0 "fpr_reg_operand" "=&d,&d")
-	(unspec:XO [(match_operand:V16QI 1 "vsx_register_operand" "v,?wa")
-		    (match_operand:V16QI 2 "vsx_register_operand" "v,?wa")
-		    (match_operand:SI 3 "const_0_to_15_operand" "n,n")
-		    (match_operand:SI 4 "const_0_to_15_operand" "n,n")
-		    (match_operand:SI 5 "const_0_to_3_operand" "n,n")]
+  [(set (match_operand:XO 0 "accumulator_operand" "=wD,&d,&d")
+	(unspec:XO [(match_operand:V16QI 1 "vsx_register_operand" "wa,v,?wa")
+		    (match_operand:V16QI 2 "vsx_register_operand" "wa,v,?wa")
+		    (match_operand:SI 3 "const_0_to_15_operand" "n,n,n")
+		    (match_operand:SI 4 "const_0_to_15_operand" "n,n,n")
+		    (match_operand:SI 5 "const_0_to_3_operand" "n,n,n")]
 		    MMA_VVI4I4I2))]
   "TARGET_MMA"
   "<vvi4i4i2> %A0,%x1,%x2,%3,%4,%5"
   [(set_attr "type" "mma")
-   (set_attr "prefixed" "yes")])
+   (set_attr "prefixed" "yes")
+   (set_attr "isa" "dm,not_dm,not_dm")])
 
 (define_insn "mma_<avvi4i4i2>"
-  [(set (match_operand:XO 0 "fpr_reg_operand" "=&d,&d")
-	(unspec:XO [(match_operand:XO 1 "fpr_reg_operand" "0,0")
-		    (match_operand:V16QI 2 "vsx_register_operand" "v,?wa")
-		    (match_operand:V16QI 3 "vsx_register_operand" "v,?wa")
-		    (match_operand:SI 4 "const_0_to_15_operand" "n,n")
-		    (match_operand:SI 5 "const_0_to_15_operand" "n,n")
-		    (match_operand:SI 6 "const_0_to_3_operand" "n,n")]
+  [(set (match_operand:XO 0 "accumulator_operand" "=wD,&d,&d")
+	(unspec:XO [(match_operand:XO 1 "accumulator_operand" "0,0,0")
+		    (match_operand:V16QI 2 "vsx_register_operand" "wa,v,?wa")
+		    (match_operand:V16QI 3 "vsx_register_operand" "wa,v,?wa")
+		    (match_operand:SI 4 "const_0_to_15_operand" "n,n,n")
+		    (match_operand:SI 5 "const_0_to_15_operand" "n,n,n")
+		    (match_operand:SI 6 "const_0_to_3_operand" "n,n,n")]
 		    MMA_AVVI4I4I2))]
   "TARGET_MMA"
   "<avvi4i4i2> %A0,%x2,%x3,%4,%5,%6"
   [(set_attr "type" "mma")
-   (set_attr "prefixed" "yes")])
+   (set_attr "prefixed" "yes")
+   (set_attr "isa" "dm,not_dm,not_dm")])
 
 (define_insn "mma_<vvi4i4>"
-  [(set (match_operand:XO 0 "fpr_reg_operand" "=&d,&d")
-	(unspec:XO [(match_operand:V16QI 1 "vsx_register_operand" "v,?wa")
-		    (match_operand:V16QI 2 "vsx_register_operand" "v,?wa")
-		    (match_operand:SI 3 "const_0_to_15_operand" "n,n")
-		    (match_operand:SI 4 "const_0_to_15_operand" "n,n")]
+  [(set (match_operand:XO 0 "accumulator_operand" "=wD,&d,&d")
+	(unspec:XO [(match_operand:V16QI 1 "vsx_register_operand" "wa,v,?wa")
+		    (match_operand:V16QI 2 "vsx_register_operand" "wa,v,?wa")
+		    (match_operand:SI 3 "const_0_to_15_operand" "n,n,n")
+		    (match_operand:SI 4 "const_0_to_15_operand" "n,n,n")]
 		    MMA_VVI4I4))]
   "TARGET_MMA"
   "<vvi4i4> %A0,%x1,%x2,%3,%4"
   [(set_attr "type" "mma")
-   (set_attr "prefixed" "yes")])
+   (set_attr "prefixed" "yes")
+   (set_attr "isa" "dm,not_dm,not_dm")])
 
 (define_insn "mma_<avvi4i4>"
-  [(set (match_operand:XO 0 "fpr_reg_operand" "=&d,&d")
-	(unspec:XO [(match_operand:XO 1 "fpr_reg_operand" "0,0")
-		    (match_operand:V16QI 2 "vsx_register_operand" "v,?wa")
-		    (match_operand:V16QI 3 "vsx_register_operand" "v,?wa")
-		    (match_operand:SI 4 "const_0_to_15_operand" "n,n")
-		    (match_operand:SI 5 "const_0_to_15_operand" "n,n")]
+  [(set (match_operand:XO 0 "accumulator_operand" "=wD,&d,&d")
+	(unspec:XO [(match_operand:XO 1 "accumulator_operand" "0,0,0")
+		    (match_operand:V16QI 2 "vsx_register_operand" "wa,v,?wa")
+		    (match_operand:V16QI 3 "vsx_register_operand" "wa,v,?wa")
+		    (match_operand:SI 4 "const_0_to_15_operand" "n,n,n")
+		    (match_operand:SI 5 "const_0_to_15_operand" "n,n,n")]
 		    MMA_AVVI4I4))]
   "TARGET_MMA"
   "<avvi4i4> %A0,%x2,%x3,%4,%5"
   [(set_attr "type" "mma")
-   (set_attr "prefixed" "yes")])
+   (set_attr "prefixed" "yes")
+   (set_attr "isa" "dm,not_dm,not_dm")])
 
 (define_insn "mma_<pvi4i2>"
-  [(set (match_operand:XO 0 "fpr_reg_operand" "=&d,&d")
-	(unspec:XO [(match_operand:OO 1 "vsx_register_operand" "v,?wa")
-		    (match_operand:V16QI 2 "vsx_register_operand" "v,?wa")
-		    (match_operand:SI 3 "const_0_to_15_operand" "n,n")
-		    (match_operand:SI 4 "const_0_to_3_operand" "n,n")]
+  [(set (match_operand:XO 0 "accumulator_operand" "=wD,&d,&d")
+	(unspec:XO [(match_operand:OO 1 "vsx_register_operand" "wa,v,?wa")
+		    (match_operand:V16QI 2 "vsx_register_operand" "wa,v,?wa")
+		    (match_operand:SI 3 "const_0_to_15_operand" "n,n,n")
+		    (match_operand:SI 4 "const_0_to_3_operand" "n,n,n")]
 		    MMA_PVI4I2))]
   "TARGET_MMA"
   "<pvi4i2> %A0,%x1,%x2,%3,%4"
   [(set_attr "type" "mma")
-   (set_attr "prefixed" "yes")])
+   (set_attr "prefixed" "yes")
+   (set_attr "isa" "dm,not_dm,not_dm")])
 
 (define_insn "mma_<apvi4i2>"
-  [(set (match_operand:XO 0 "fpr_reg_operand" "=&d,&d")
-	(unspec:XO [(match_operand:XO 1 "fpr_reg_operand" "0,0")
-		    (match_operand:OO 2 "vsx_register_operand" "v,?wa")
-		    (match_operand:V16QI 3 "vsx_register_operand" "v,?wa")
-		    (match_operand:SI 4 "const_0_to_15_operand" "n,n")
-		    (match_operand:SI 5 "const_0_to_3_operand" "n,n")]
+  [(set (match_operand:XO 0 "accumulator_operand" "=wD,&d,&d")
+	(unspec:XO [(match_operand:XO 1 "accumulator_operand" "0,0,0")
+		    (match_operand:OO 2 "vsx_register_operand" "wa,v,?wa")
+		    (match_operand:V16QI 3 "vsx_register_operand" "wa,v,?wa")
+		    (match_operand:SI 4 "const_0_to_15_operand" "n,n,n")
+		    (match_operand:SI 5 "const_0_to_3_operand" "n,n,n")]
 		    MMA_APVI4I2))]
   "TARGET_MMA"
   "<apvi4i2> %A0,%x2,%x3,%4,%5"
   [(set_attr "type" "mma")
-   (set_attr "prefixed" "yes")])
+   (set_attr "prefixed" "yes")
+   (set_attr "isa" "dm,not_dm,not_dm")])
 
 (define_insn "mma_<vvi4i4i4>"
-  [(set (match_operand:XO 0 "fpr_reg_operand" "=&d,&d")
-	(unspec:XO [(match_operand:V16QI 1 "vsx_register_operand" "v,?wa")
-		    (match_operand:V16QI 2 "vsx_register_operand" "v,?wa")
-		    (match_operand:SI 3 "const_0_to_15_operand" "n,n")
-		    (match_operand:SI 4 "const_0_to_15_operand" "n,n")
-		    (match_operand:SI 5 "const_0_to_15_operand" "n,n")]
+  [(set (match_operand:XO 0 "accumulator_operand" "=wD,&d,&d")
+	(unspec:XO [(match_operand:V16QI 1 "vsx_register_operand" "wa,v,?wa")
+		    (match_operand:V16QI 2 "vsx_register_operand" "wa,v,?wa")
+		    (match_operand:SI 3 "const_0_to_15_operand" "n,n,n")
+		    (match_operand:SI 4 "const_0_to_15_operand" "n,n,n")
+		    (match_operand:SI 5 "const_0_to_15_operand" "n,n,n")]
 		    MMA_VVI4I4I4))]
   "TARGET_MMA"
   "<vvi4i4i4> %A0,%x1,%x2,%3,%4,%5"
   [(set_attr "type" "mma")
-   (set_attr "prefixed" "yes")])
+   (set_attr "prefixed" "yes")
+   (set_attr "isa" "dm,not_dm,not_dm")])
 
 (define_insn "mma_<avvi4i4i4>"
-  [(set (match_operand:XO 0 "fpr_reg_operand" "=&d,&d")
-	(unspec:XO [(match_operand:XO 1 "fpr_reg_operand" "0,0")
-		    (match_operand:V16QI 2 "vsx_register_operand" "v,?wa")
-		    (match_operand:V16QI 3 "vsx_register_operand" "v,?wa")
-		    (match_operand:SI 4 "const_0_to_15_operand" "n,n")
-		    (match_operand:SI 5 "const_0_to_15_operand" "n,n")
-		    (match_operand:SI 6 "const_0_to_15_operand" "n,n")]
+  [(set (match_operand:XO 0 "accumulator_operand" "=wD,&d,&d")
+	(unspec:XO [(match_operand:XO 1 "accumulator_operand" "0,0,0")
+		    (match_operand:V16QI 2 "vsx_register_operand" "wa,v,?wa")
+		    (match_operand:V16QI 3 "vsx_register_operand" "wa,v,?wa")
+		    (match_operand:SI 4 "const_0_to_15_operand" "n,n,n")
+		    (match_operand:SI 5 "const_0_to_15_operand" "n,n,n")
+		    (match_operand:SI 6 "const_0_to_15_operand" "n,n,n")]
 		    MMA_AVVI4I4I4))]
   "TARGET_MMA"
   "<avvi4i4i4> %A0,%x2,%x3,%4,%5,%6"
   [(set_attr "type" "mma")
-   (set_attr "prefixed" "yes")])
+   (set_attr "prefixed" "yes")
+   (set_attr "isa" "dm,not_dm,not_dm")])
diff --git a/gcc/config/rs6000/rs6000-c.cc b/gcc/config/rs6000/rs6000-c.cc
index f2fb5bef678..4342620f87f 100644
--- a/gcc/config/rs6000/rs6000-c.cc
+++ b/gcc/config/rs6000/rs6000-c.cc
@@ -600,6 +600,9 @@  rs6000_target_modify_macros (bool define_p, HOST_WIDE_INT flags)
   /* Tell the user if we support the MMA instructions.  */
   if ((flags & OPTION_MASK_MMA) != 0)
     rs6000_define_or_undefine_macro (define_p, "__MMA__");
+  /* Tell the user if we support the dense math instructions.  */
+  if ((flags & OPTION_MASK_DENSE_MATH) != 0)
+    rs6000_define_or_undefine_macro (define_p, "__PPC_DMR__");
   /* Whether pc-relative code is being generated.  */
   if ((flags & OPTION_MASK_PCREL) != 0)
     rs6000_define_or_undefine_macro (define_p, "__PCREL__");
diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index 83e32f7a43a..59517c8608d 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -14264,8 +14264,13 @@  print_operand (FILE *file, rtx x, int code)
 	 overlapping with the FPR registers.  */
       if (!REG_P (x))
 	output_operand_lossage ("invalid %%A value");
-      else if (TARGET_DENSE_MATH && DMR_REGNO_P (REGNO (x)))
-	fprintf (file, "%d", REGNO (x) - FIRST_DMR_REGNO);
+      else if (TARGET_DENSE_MATH)
+	{
+	  if (DMR_REGNO_P (REGNO (x)))
+	    fprintf (file, "%d", REGNO (x) - FIRST_DMR_REGNO);
+	  else
+	    output_operand_lossage ("%%A operand is not a DMR");
+	}
       else if (!FP_REGNO_P (REGNO (x)) || (REGNO (x) % 4) != 0)
 	output_operand_lossage ("invalid %%A value");
       else
@@ -27719,7 +27724,7 @@  rs6000_split_multireg_move (rtx dst, rtx src)
 
 	  /* If we are reading an accumulator register, we have to
 	     deprime it before we can access it.  */
-	  if (TARGET_MMA
+	  if (TARGET_MMA && !TARGET_DENSE_MATH
 	      && GET_MODE (src) == XOmode && FP_REGNO_P (REGNO (src)))
 	    emit_insn (gen_mma_xxmfacc (src, src));
 
@@ -27751,9 +27756,9 @@  rs6000_split_multireg_move (rtx dst, rtx src)
 	      emit_insn (gen_rtx_SET (dst2, src2));
 	    }
 
-	  /* If we are writing an accumulator register, we have to
-	     prime it after we've written it.  */
-	  if (TARGET_MMA
+	  /* If we are writing an accumulator register that overlaps with the
+	     FPR registers, we have to prime it after we've written it.  */
+	  if (TARGET_MMA && !TARGET_DENSE_MATH
 	      && GET_MODE (dst) == XOmode && FP_REGNO_P (REGNO (dst)))
 	    emit_insn (gen_mma_xxmtacc (dst, dst));
 
@@ -27822,9 +27827,9 @@  rs6000_split_multireg_move (rtx dst, rtx src)
 	      emit_insn (gen_rtx_SET (dst_i, op));
 	    }
 
-	  /* We are writing an accumulator register, so we have to
-	     prime it after we've written it.  */
-	  if (GET_MODE (src) == XOmode)
+	  /* On systems without dense math where accumulators overlap with the
+	     vector registers, we have to prime it after we've written it.  */
+	  if (GET_MODE (src) == XOmode && !TARGET_DENSE_MATH)
 	    emit_insn (gen_mma_xxmtacc (dst, dst));
 
 	  return;
@@ -27835,9 +27840,9 @@  rs6000_split_multireg_move (rtx dst, rtx src)
 
   if (REG_P (src) && REG_P (dst) && (REGNO (src) < REGNO (dst)))
     {
-      /* If we are reading an accumulator register, we have to
-	 deprime it before we can access it.  */
-      if (TARGET_MMA
+      /* If we are reading an accumulator register and we don't have dense
+	 math, we have to deprime it before we can access it.  */
+      if (TARGET_MMA && !TARGET_DENSE_MATH
 	  && GET_MODE (src) == XOmode && FP_REGNO_P (REGNO (src)))
 	emit_insn (gen_mma_xxmfacc (src, src));
 
@@ -27865,7 +27870,7 @@  rs6000_split_multireg_move (rtx dst, rtx src)
 
       /* If we are writing an accumulator register, we have to
 	 prime it after we've written it.  */
-      if (TARGET_MMA
+      if (TARGET_MMA && !TARGET_DENSE_MATH
 	  && GET_MODE (dst) == XOmode && FP_REGNO_P (REGNO (dst)))
 	emit_insn (gen_mma_xxmtacc (dst, dst));
     }
@@ -28002,7 +28007,7 @@  rs6000_split_multireg_move (rtx dst, rtx src)
 
       /* If we are reading an accumulator register, we have to
 	 deprime it before we can access it.  */
-      if (TARGET_MMA && REG_P (src)
+      if (TARGET_MMA && !TARGET_DENSE_MATH && REG_P (src)
 	  && GET_MODE (src) == XOmode && FP_REGNO_P (REGNO (src)))
 	emit_insn (gen_mma_xxmfacc (src, src));
 
@@ -28034,7 +28039,7 @@  rs6000_split_multireg_move (rtx dst, rtx src)
 
       /* If we are writing an accumulator register, we have to
 	 prime it after we've written it.  */
-      if (TARGET_MMA && REG_P (dst)
+      if (TARGET_MMA && !TARGET_DENSE_MATH && REG_P (dst)
 	  && GET_MODE (dst) == XOmode && FP_REGNO_P (REGNO (dst)))
 	emit_insn (gen_mma_xxmtacc (dst, dst));