[0/6] PowerPC Future patches

Message ID	ZTBwes3Kzi/k2/WR@cowardly-lion.the-meissners.org
Headers	Received-SPF: pass (google.com: domain of gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org designates 2620:52:3:1:0:246e:9693:128c as permitted sender) client-ip=2620:52:3:1:0:246e:9693:128c; DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org AD8DF3858D33 Date: Wed, 18 Oct 2023 19:55:38 -0400 From: Michael Meissner <meissner@linux.ibm.com> To: gcc-patches@gcc.gnu.org, Michael Meissner <meissner@linux.ibm.com>, Segher Boessenkool <segher@kernel.crashing.org>, "Kewen.Lin" <linkw@linux.ibm.com>, David Edelsohn <dje.gcc@gmail.com>, Peter Bergner <bergner@linux.ibm.com> Subject: [PATCH 0/6] PowerPC Future patches Message-ID: <ZTBwes3Kzi/k2/WR@cowardly-lion.the-meissners.org> Mail-Followup-To: Michael Meissner <meissner@linux.ibm.com>, gcc-patches@gcc.gnu.org, Segher Boessenkool <segher@kernel.crashing.org>, "Kewen.Lin" <linkw@linux.ibm.com>, David Edelsohn <dje.gcc@gmail.com>, Peter Bergner <bergner@linux.ibm.com> Content-Type: text/plain; charset=us-ascii Content-Disposition: inline MIME-Version: 1.0 Precedence: list Errors-To: gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org X-getmail-retrieved-from-mailbox: INBOX
Series	PowerPC Future patches \| [0/6] PowerPC Future patches [2/6] PowerPC: Make -mcpu=future enable -mblock-ops-vector-pair. [3/6] PowerPC: Add support for accumulators in DMR registers. [4/6] PowerPC: Make MMA insns support DMR registers. [5/6] PowerPC: Switch to dense math names for all MMA operations. [6/6] PowerPC: Add support for 1,024 bit DMR registers.

Message ID

ZTBwes3Kzi/k2/WR@cowardly-lion.the-meissners.org

Headers

Received-SPF: pass (google.com: domain of
 gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org designates
 2620:52:3:1:0:246e:9693:128c as permitted sender)
 client-ip=2620:52:3:1:0:246e:9693:128c;
DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org AD8DF3858D33
Date: Wed, 18 Oct 2023 19:55:38 -0400
From: Michael Meissner <meissner@linux.ibm.com>
To: gcc-patches@gcc.gnu.org, Michael Meissner <meissner@linux.ibm.com>,
 Segher Boessenkool <segher@kernel.crashing.org>,
 "Kewen.Lin" <linkw@linux.ibm.com>, David Edelsohn <dje.gcc@gmail.com>,
 Peter Bergner <bergner@linux.ibm.com>
Subject: [PATCH 0/6] PowerPC Future patches
Message-ID: <ZTBwes3Kzi/k2/WR@cowardly-lion.the-meissners.org>
Mail-Followup-To: Michael Meissner <meissner@linux.ibm.com>,
 gcc-patches@gcc.gnu.org,
 Segher Boessenkool <segher@kernel.crashing.org>,
 "Kewen.Lin" <linkw@linux.ibm.com>,
 David Edelsohn <dje.gcc@gmail.com>,
 Peter Bergner <bergner@linux.ibm.com>
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
MIME-Version: 1.0
Precedence: list
Errors-To: gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org

Series

PowerPC Future patches |

Message

Michael Meissner Oct. 18, 2023, 11:55 p.m. UTC

  This patch is very preliminary support for a potential new feature to the
PowerPC that extends the current power10 MMA architecture.  This feature may or
may not be present in any specific future PowerPC processor.

In the current MMA subsystem for Power10, there are 8 512-bit accumulator
registers.  These accumulators are each tied to sets of 4 FPR registers.  When
you issue a prime instruction, it makes sure the accumulator is a copy of the 4
FPR registers the accumulator is tied to.  When you issue a deprime
instruction, it makes sure that the accumulator data content is logically
copied to the matching FPR register.

In the potential dense math system, the accumulators are moved to separate
registers called dense math registers (DM registers or DMR).  The DMRs are then
extended to 1,024 bits and new instructions will be added to deal with all
1,024 bits of the DMRs.

If you take existing MMA code, it will work as long as you don't do anything
with accumulators, and you follow the rules in the ISA 3.1 documentation for
using the MMA subsystem.

These patches add support for the 512-bit accumulators within the dense math
system, and for allocation of the 1,024-bit DMRs.  At this time, no additional
built-in functions will be done to support any dense math features other than
doing data movement between the DMRs and the VSX registers.  Before we can look
at adding any new dense math support other than data movement, we need the GCC
compiler to be able to allocate and use these DMRs.

There are 6 patches in this patch set:

1) The first patch just adds -mcpu=future as an option to add new support.
This is similar to the -mcpu=future that we did before power10 was announced.

2) The second patch enables GCC to use the load and store vector pair
instructions to optimize memory copy operations in the compiler.  For power10,
we needed to just stay with normal vector load/stores for memory copy
operations.

3) The third patch enables 512-bit accumulators that are located within in DMRs
instead of the FPRs.  This patch enables the register allocation, but it does
not move the existing MMA to use these registers.

4) The fourth patch switches the MMA subsystem to use 512-bit accumulators
within DMRs if you use -mcpu=future.

5) The fifth patch switches the names of the MMA instructions to use the dense
math equivalent name if -mcpu=future.

6) The sixth patch enables using the full 1,024-bit DMRs.  Right now, all you
can do with DMRs is move a VSX register to a DMR register, and to move a DMR
register to a VSX register.

In terms of changes, these patch now use the wD constraint for accumulators.
If you compile with -mcpu=power10, the wD constraint will match the equivalent
FPR register that overlaps with the accumulator.  If you compile with
-mcpu=future, the wD constraint will match the DMR register and not the FPR
register.

These patches also modifies the print_operand %A output modifier to print out
DMR register numbers if -mcpu=future, and continue to print out the FPR
register number divided by 4 for -mcpu=power10.

In general, if you only use the built-in functions, things work between the two
systems.  If you use extended asm, you will likely need to modify the code.
Going forward, hopefully if you modify your code to use the wD constraint and
%A output modifier, you can write code that switches more easily between the
two systems.

Again, these are preliminary patches for a potential future machine.  Things
will likely change in terms of implementation and usage over time.

Originally these patches were submitted in November 2022:
https://gcc.gnu.org/pipermail/gcc-patches/2022-November/605581.html

Comments

Michael Meissner Oct. 18, 2023, 11:58 p.m. UTC | #1

This patch implements support for a potential future PowerPC cpu.  Features
added with -mcpu=future, may or may not be added to new PowerPC processors.

This patch adds support for the -mcpu=future option.  If you use -mcpu=future,
the macro __ARCH_PWR_FUTURE__ is defined, and the assembler .machine directive
"future" is used.  Future patches in this series will add support for new
instructions that may be present in future PowerPC processors.

This particular patch does not any new features.  It exists as a ground work
for future patches to support for a possible PowerPC processor in the future.

This patch does not implement any differences in tuning when -mcpu=future is
used compared to -mcpu=power10.  If -mcpu=future is used, GCC will use power10
tuning.  If you explicitly use -mtune=future, you will get a warning that
-mtune=future is not supported, and default tuning will be set for power10.

The patches have been tested on both little and big endian systems.  Can I check
it into the master branch?

2023-10-18   Michael Meissner  <meissner@linux.ibm.com>

gcc/

	* config/rs6000/rs6000-c.cc (rs6000_target_modify_macros): Define
	__ARCH_PWR_FUTURE__ if -mcpu=future.
	* config/rs6000/rs6000-cpus.def (ISA_FUTURE_MASKS): New macro.
	(POWERPC_MASKS): Add -mcpu=future support.
	* config/rs6000/rs6000-opts.h (enum processor_type): Add
	PROCESSOR_FUTURE.
	* config/rs6000/rs6000-tables.opt: Regenerate.
	* config/rs6000/rs6000.cc (rs600_cpu_index_lookup): New helper
	function.
	(rs6000_option_override_internal): Make -mcpu=future set
	-mtune=power10.  If the user explicitly uses -mtune=future, give a
	warning and reset the tuning to power10.
	(rs6000_option_override_internal): Use power10 costs for future
	machine.
	(rs6000_machine_from_flags): Add support for -mcpu=future.
	(rs6000_opt_masks): Likewise.
	* config/rs6000/rs6000.h (ASM_CPU_SUPPORT): Likewise.
	* config/rs6000/rs6000.md (cpu attribute): Likewise.
	* config/rs6000/rs6000.opt (-mfuture): New undocumented debug switch.
	* doc/invoke.texi (IBM RS/6000 and PowerPC Options): Document -mcpu=future.
---
 gcc/config/rs6000/rs6000-c.cc       |  2 +
 gcc/config/rs6000/rs6000-cpus.def   |  6 +++
 gcc/config/rs6000/rs6000-opts.h     |  4 +-
 gcc/config/rs6000/rs6000-tables.opt |  3 ++
 gcc/config/rs6000/rs6000.cc         | 58 ++++++++++++++++++++++++-----
 gcc/config/rs6000/rs6000.h          |  1 +
 gcc/config/rs6000/rs6000.md         |  2 +-
 gcc/config/rs6000/rs6000.opt        |  4 ++
 gcc/doc/invoke.texi                 |  2 +-
 9 files changed, 69 insertions(+), 13 deletions(-)

diff --git a/gcc/config/rs6000/rs6000-c.cc b/gcc/config/rs6000/rs6000-c.cc
index 65be0ac43e2..e276c20cccd 100644
--- a/gcc/config/rs6000/rs6000-c.cc
+++ b/gcc/config/rs6000/rs6000-c.cc
@@ -447,6 +447,8 @@ rs6000_target_modify_macros (bool define_p, HOST_WIDE_INT flags)
     rs6000_define_or_undefine_macro (define_p, "_ARCH_PWR9");
   if ((flags & OPTION_MASK_POWER10) != 0)
     rs6000_define_or_undefine_macro (define_p, "_ARCH_PWR10");
+  if ((flags & OPTION_MASK_FUTURE) != 0)
+    rs6000_define_or_undefine_macro (define_p, "_ARCH_PWR_FUTURE");
   if ((flags & OPTION_MASK_SOFT_FLOAT) != 0)
     rs6000_define_or_undefine_macro (define_p, "_SOFT_FLOAT");
   if ((flags & OPTION_MASK_RECIP_PRECISION) != 0)
diff --git a/gcc/config/rs6000/rs6000-cpus.def b/gcc/config/rs6000/rs6000-cpus.def
index 8c530a22da8..a6d9d7bf9a8 100644
--- a/gcc/config/rs6000/rs6000-cpus.def
+++ b/gcc/config/rs6000/rs6000-cpus.def
@@ -88,6 +88,10 @@
 				 | OPTION_MASK_POWER10			\
 				 | OTHER_POWER10_MASKS)
 
+/* Flags for a potential future processor that may or may not be delivered.  */
+#define ISA_FUTURE_MASKS	(ISA_3_1_MASKS_SERVER			\
+				 | OPTION_MASK_FUTURE)
+
 /* Flags that need to be turned off if -mno-power9-vector.  */
 #define OTHER_P9_VECTOR_MASKS	(OPTION_MASK_FLOAT128_HW		\
 				 | OPTION_MASK_P9_MINMAX)
@@ -134,6 +138,7 @@
 				 | OPTION_MASK_FPRND			\
 				 | OPTION_MASK_POWER10			\
 				 | OPTION_MASK_P10_FUSION		\
+				 | OPTION_MASK_FUTURE			\
 				 | OPTION_MASK_HTM			\
 				 | OPTION_MASK_ISEL			\
 				 | OPTION_MASK_LOAD_VECTOR_PAIR		\
@@ -267,3 +272,4 @@ RS6000_CPU ("powerpc64", PROCESSOR_POWERPC64, OPTION_MASK_PPC_GFXOPT
 RS6000_CPU ("powerpc64le", PROCESSOR_POWER8, MASK_POWERPC64
 	    | ISA_2_7_MASKS_SERVER | OPTION_MASK_HTM)
 RS6000_CPU ("rs64", PROCESSOR_RS64A, OPTION_MASK_PPC_GFXOPT | MASK_POWERPC64)
+RS6000_CPU ("future", PROCESSOR_FUTURE, MASK_POWERPC64 | ISA_FUTURE_MASKS)
diff --git a/gcc/config/rs6000/rs6000-opts.h b/gcc/config/rs6000/rs6000-opts.h
index 8040cfdc06e..f56f01d6fa5 100644
--- a/gcc/config/rs6000/rs6000-opts.h
+++ b/gcc/config/rs6000/rs6000-opts.h
@@ -67,7 +67,9 @@ enum processor_type
    PROCESSOR_MPCCORE,
    PROCESSOR_CELL,
    PROCESSOR_PPCA2,
-   PROCESSOR_TITAN
+   PROCESSOR_TITAN,
+
+   PROCESSOR_FUTURE
 };
 
 
diff --git a/gcc/config/rs6000/rs6000-tables.opt b/gcc/config/rs6000/rs6000-tables.opt
index b82f8205fa1..3ff28e39f6c 100644
--- a/gcc/config/rs6000/rs6000-tables.opt
+++ b/gcc/config/rs6000/rs6000-tables.opt
@@ -197,3 +197,6 @@ Enum(rs6000_cpu_opt_value) String(powerpc64le) Value(55)
 EnumValue
 Enum(rs6000_cpu_opt_value) String(rs64) Value(56)
 
+EnumValue
+Enum(rs6000_cpu_opt_value) String(future) Value(57)
+
diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index 8f06b37171a..a48a161eb55 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -1800,6 +1800,18 @@ rs6000_cpu_name_lookup (const char *name)
   return -1;
 }
 
+/* Look up the index for a specific processor.  */
+
+static int
+rs600_cpu_index_lookup (enum processor_type processor)
+{
+  for (size_t i = 0; i < ARRAY_SIZE (processor_target_table); i++)
+    if (processor_target_table[i].processor == processor)
+      return i;
+
+  return -1;
+}
+
 
 /* Return number of consecutive hard regs needed starting at reg REGNO
    to hold something of mode MODE.
@@ -3746,23 +3758,45 @@ rs6000_option_override_internal (bool global_init_p)
     rs6000_isa_flags &= ~OPTION_MASK_POWERPC64;
 #endif
 
+  /* At the moment, we don't have explict -mtune=future support.  If the user
+     explicitly tried to use -mtune=future, give a warning.  If not, use the
+     power10 tuning until future tuning is added.  */
   if (rs6000_tune_index >= 0)
-    tune_index = rs6000_tune_index;
+    {
+      enum processor_type cur_proc
+	= processor_target_table[rs6000_tune_index].processor;
+
+      if (cur_proc == PROCESSOR_FUTURE)
+	{
+	  static bool issued_future_tune_warning = false;
+	  if (!issued_future_tune_warning)
+	    {
+	      issued_future_tune_warning = true;
+	      warning (0, "%qs is not currently supported", "-mtune=future");
+	    }
+
+	  rs6000_tune_index = rs600_cpu_index_lookup (PROCESSOR_POWER10);
+	}
+      tune_index = rs6000_tune_index;
+    }
   else if (cpu_index >= 0)
-    rs6000_tune_index = tune_index = cpu_index;
+    {
+      enum processor_type cur_cpu
+	= processor_target_table[cpu_index].processor;
+
+      rs6000_tune_index = tune_index
+	= (cur_cpu == PROCESSOR_FUTURE
+	   ? rs600_cpu_index_lookup (PROCESSOR_POWER10)
+	   : cpu_index);
+    }
   else
     {
-      size_t i;
       enum processor_type tune_proc
 	= (TARGET_POWERPC64 ? PROCESSOR_DEFAULT64 : PROCESSOR_DEFAULT);
 
-      tune_index = -1;
-      for (i = 0; i < ARRAY_SIZE (processor_target_table); i++)
-	if (processor_target_table[i].processor == tune_proc)
-	  {
-	    tune_index = i;
-	    break;
-	  }
+      tune_index = rs600_cpu_index_lookup (tune_proc == PROCESSOR_FUTURE
+					   ? PROCESSOR_POWER10
+					   : tune_proc);
     }
 
   if (cpu_index >= 0)
@@ -4775,6 +4809,7 @@ rs6000_option_override_internal (bool global_init_p)
 	break;
 
       case PROCESSOR_POWER10:
+      case PROCESSOR_FUTURE:
 	rs6000_cost = &power10_cost;
 	break;
 
@@ -5934,6 +5969,8 @@ rs6000_machine_from_flags (void)
   /* Disable the flags that should never influence the .machine selection.  */
   flags &= ~(OPTION_MASK_PPC_GFXOPT | OPTION_MASK_PPC_GPOPT | OPTION_MASK_ISEL);
 
+  if ((flags & (ISA_FUTURE_MASKS & ~ISA_3_1_MASKS_SERVER)) != 0)
+    return "future";
   if ((flags & (ISA_3_1_MASKS_SERVER & ~ISA_3_0_MASKS_SERVER)) != 0)
     return "power10";
   if ((flags & (ISA_3_0_MASKS_SERVER & ~ISA_2_7_MASKS_SERVER)) != 0)
@@ -24458,6 +24495,7 @@ static struct rs6000_opt_mask const rs6000_opt_masks[] =
   { "float128-hardware",	OPTION_MASK_FLOAT128_HW,	false, true  },
   { "fprnd",			OPTION_MASK_FPRND,		false, true  },
   { "power10",			OPTION_MASK_POWER10,		false, true  },
+  { "future",			OPTION_MASK_FUTURE,		false, true  },
   { "hard-dfp",			OPTION_MASK_DFP,		false, true  },
   { "htm",			OPTION_MASK_HTM,		false, true  },
   { "isel",			OPTION_MASK_ISEL,		false, true  },
diff --git a/gcc/config/rs6000/rs6000.h b/gcc/config/rs6000/rs6000.h
index 22595f6ebd7..a9c9a11765c 100644
--- a/gcc/config/rs6000/rs6000.h
+++ b/gcc/config/rs6000/rs6000.h
@@ -163,6 +163,7 @@
   mcpu=e5500: -me5500; \
   mcpu=e6500: -me6500; \
   mcpu=titan: -mtitan; \
+  mcpu=future: -mfuture; \
   !mcpu*: %{mpower9-vector: -mpower9; \
 	    mpower8-vector|mcrypto|mdirect-move|mhtm: -mpower8; \
 	    mvsx: -mpower7; \
diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index 2a1b5ecfaee..bc530948aff 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -351,7 +351,7 @@ (define_attr "cpu"
    ppc403,ppc405,ppc440,ppc476,
    ppc8540,ppc8548,ppce300c2,ppce300c3,ppce500mc,ppce500mc64,ppce5500,ppce6500,
    power4,power5,power6,power7,power8,power9,power10,
-   rs64a,mpccore,cell,ppca2,titan"
+   rs64a,mpccore,cell,ppca2,titan,future"
   (const (symbol_ref "(enum attr_cpu) rs6000_tune")))
 
 ;; The ISA we implement.
diff --git a/gcc/config/rs6000/rs6000.opt b/gcc/config/rs6000/rs6000.opt
index 369095df9ed..6c8ca106b30 100644
--- a/gcc/config/rs6000/rs6000.opt
+++ b/gcc/config/rs6000/rs6000.opt
@@ -628,6 +628,10 @@ mieee128-constant
 Target Var(TARGET_IEEE128_CONSTANT) Init(1) Save
 Generate (do not generate) code that uses the LXVKQ instruction.
 
+mfuture
+Target Undocumented Mask(FUTURE) Var(rs6000_isa_flags)
+Generate (do not generate) future instructions.
+
 ; Documented parameters
 
 -param=rs6000-vect-unroll-limit=
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index eb714d18511..3558e7effe7 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -29747,7 +29747,7 @@ Supported values for @var{cpu_type} are @samp{401}, @samp{403},
 @samp{titan}, @samp{power3}, @samp{power4}, @samp{power5}, @samp{power5+},
 @samp{power6}, @samp{power6x}, @samp{power7}, @samp{power8},
 @samp{power9}, @samp{power10}, @samp{powerpc}, @samp{powerpc64},
-@samp{powerpc64le}, @samp{rs64}, and @samp{native}.
+@samp{powerpc64le}, @samp{rs64}, @samp{future}, and @samp{native}.
 
 @option{-mcpu=powerpc}, @option{-mcpu=powerpc64}, and
 @option{-mcpu=powerpc64le} specify pure 32-bit PowerPC (either

Michael Meissner Oct. 25, 2023, 8:15 p.m. UTC | #2

Ping patch.

| Date: Wed, 18 Oct 2023 19:58:56 -0400
| From: Michael Meissner <meissner@linux.ibm.com>
| Subject: Re: [PATCH 1/6] PowerPC: Add -mcpu=future option
| Message-ID: <ZTBxQKlWTmHaKNeO@cowardly-lion.the-meissners.org>

https://gcc.gnu.org/pipermail/gcc-patches/2023-October/633511.html

Michael Meissner Nov. 3, 2023, 7:49 a.m. UTC | #3

Ping #2

| Date: Wed, 18 Oct 2023 19:58:56 -0400
| From: Michael Meissner <meissner@linux.ibm.com>
| Subject: Re: [PATCH 1/6] PowerPC: Add -mcpu=future option
| Message-ID: <ZTBxQKlWTmHaKNeO@cowardly-lion.the-meissners.org>

https://gcc.gnu.org/pipermail/gcc-patches/2023-October/633511.html