[v3] aarch64: Implement the ACLE instruction/data prefetch functions.

Message ID 20231205150946.3542939-1-victor.donascimento@arm.com
State Accepted
Series [v3] aarch64: Implement the ACLE instruction/data prefetch functions.

Checks

Context Check Description
snail/gcc-patch-check success Github commit url

Commit Message

Victor Do Nascimento Dec. 5, 2023, 3:09 p.m. UTC
  Key changes in v3:
  * Implement the `require_const_argument' function to ensure that the
  nth argument in EXP is an integer constant in the valid range
  [minval, maxval), forgoing expansion altogether when an invalid
  argument is detected.
  * In the previous iteration, out-of-bounds function parameters
  produced a warning and fell back to sensible defaults, akin to the
  `__builtin_prefetch' implementation.  Parameters outside the valid
  ranges now result in an error, which more faithfully reflects the
  ACLE specification.
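
  The difference between the two policies can be modelled outside the
  compiler.  The following is a minimal standalone sketch, not code from
  the patch: `in_range', `lenient_argument' and `strict_argument' are
  hypothetical names, and the lenient variant is reconstructed only from
  the description above, since the earlier iteration is not shown here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* True if VALUE lies in the half-open range [MINVAL, MAXVAL).  */
bool
in_range (long long value, long long minval, long long maxval)
{
  return value >= minval && value < maxval;
}

/* Lenient handling, as described for the previous iteration: an
   out-of-range VALUE is replaced by FALLBACK.  A real implementation
   would also issue a warning at this point.  */
long long
lenient_argument (long long value, long long minval, long long maxval,
                  long long fallback)
{
  return in_range (value, minval, maxval) ? value : fallback;
}

/* Strict handling, as in v3: an out-of-range VALUE is rejected and
   *RESULT is left untouched, so the caller can skip expansion.  */
bool
strict_argument (long long value, long long minval, long long maxval,
                 long long *result)
{
  assert (result != NULL);
  if (!in_range (value, minval, maxval))
    return false;
  *result = value;
  return true;
}
```

  With the cache-level range [0, 4), the value 4 is rejected by
  `strict_argument' but silently mapped to the fallback by
  `lenient_argument'.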

 ---

Implement the ACLE data and instruction prefetch functions[1] with the
following signatures:

  1. Data prefetch intrinsics:
  ----------------------------
  void __pldx (/*constant*/ unsigned int /*access_kind*/,
               /*constant*/ unsigned int /*cache_level*/,
               /*constant*/ unsigned int /*retention_policy*/,
               void const volatile *addr);

  void __pld (void const volatile *addr);

  2. Instruction prefetch intrinsics:
  -----------------------------------
  void __plix (/*constant*/ unsigned int /*cache_level*/,
               /*constant*/ unsigned int /*retention_policy*/,
               void const volatile *addr);

  void __pli (void const volatile *addr);

`__pldx' affords the programmer more fine-grained control over the
data prefetch behavior than the analogous GCC builtin
`__builtin_prefetch', and allows access to the "SLC" cache level.

While `__builtin_prefetch' chooses both cache level and retention
policy automatically via the optional `locality' parameter, `__pldx'
takes two mandatory arguments that explicitly select the desired
cache level and retention policy.

`__plix', on the other hand, generates a code prefetch instruction and
so extends functionality on aarch64 targets beyond that which is
exposed by `__builtin_prefetch'.

`__pld' and `__pli' prefetch data and instructions, respectively,
using default values for both cache level and retention policy.
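
How the arguments of the four intrinsics map onto a PRFM operation name
can be sketched in portable C.  This is an illustrative model only: the
three name tables are copied from the patch, while `prfop_name' and the
`model_*' functions are hypothetical helpers that produce the operand
string instead of emitting an instruction.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Name tables as they appear in the patch.  The array index is the
   value passed to the intrinsic.  "PLI" is never chosen by an
   argument; only __pli and __plix select it.  */
static const char *const kind_s[] = { "PLD", "PST", "PLI" };
static const char *const level_s[] = { "L1", "L2", "L3", "SLC" };
static const char *const rettn_s[] = { "KEEP", "STRM" };

/* Hypothetical helper: write the PRFM operation name selected by KIND,
   LEVEL and RETTN into BUF.  Return 0 on success, or -1 if an index is
   out of range or BUF is too small.  */
int
prfop_name (char *buf, size_t size, unsigned kind, unsigned level,
            unsigned rettn)
{
  assert (buf != NULL);
  if (kind > 2 || level > 3 || rettn > 1)
    return -1;
  if (strlen (kind_s[kind]) + strlen (level_s[level])
      + strlen (rettn_s[rettn]) + 1 > size)
    return -1;
  snprintf (buf, size, "%s%s%s", kind_s[kind], level_s[level],
            rettn_s[rettn]);
  return 0;
}

/* __pldx: the access kind is user-supplied and limited to PLD or PST.  */
int
model_pldx (char *buf, size_t size, unsigned access, unsigned level,
            unsigned rettn)
{
  return access > 1 ? -1 : prfop_name (buf, size, access, level, rettn);
}

/* __plix: the kind is fixed to PLI.  */
int
model_plix (char *buf, size_t size, unsigned level, unsigned rettn)
{
  return prfop_name (buf, size, 2, level, rettn);
}

/* __pld: every field takes its default (PLD, L1, KEEP).  */
int
model_pld (char *buf, size_t size)
{
  return prfop_name (buf, size, 0, 0, 0);
}

/* __pli: as __pld, but with the kind fixed to PLI.  */
int
model_pli (char *buf, size_t size)
{
  return prfop_name (buf, size, 2, 0, 0);
}
```

An 11-byte buffer, the size of `prfop' in the patch, is enough for the
longest names such as "PSTSLCSTRM" or "PLISLCKEEP".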

Bootstrapped and tested on aarch64-none-linux-gnu.

[1] https://arm-software.github.io/acle/main/acle.html#memory-prefetch-intrinsics

gcc/ChangeLog:

	* config/aarch64/aarch64-builtins.cc:
	(AARCH64_PLD): New enum aarch64_builtins entry.
	(AARCH64_PLDX): Likewise.
	(AARCH64_PLI): Likewise.
	(AARCH64_PLIX): Likewise.
	(aarch64_init_prefetch_builtin): New.
	(aarch64_general_init_builtins): Call prefetch init function.
	(aarch64_expand_prefetch_builtin): New.
	(aarch64_general_expand_builtin):  Add prefetch expansion.
	(require_const_argument): New.
	* config/aarch64/aarch64.md (UNSPEC_PLDX): New.
	(aarch64_pldx): New.

gcc/testsuite/ChangeLog:

	* gcc.target/aarch64/builtin_pld_pli.c: New.
	* gcc.target/aarch64/builtin_pld_pli_illegal.c: New.
---
 gcc/config/aarch64/aarch64-builtins.cc        | 136 ++++++++++++++++++
 gcc/config/aarch64/aarch64.md                 |  12 ++
 gcc/config/aarch64/arm_acle.h                 |  30 ++++
 .../gcc.target/aarch64/builtin_pld_pli.c      |  90 ++++++++++++
 .../aarch64/builtin_pld_pli_illegal.c         |  33 +++++
 5 files changed, 301 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/builtin_pld_pli.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/builtin_pld_pli_illegal.c
  

Comments

Richard Sandiford Dec. 11, 2023, 12:04 p.m. UTC | #1
Victor Do Nascimento <victor.donascimento@arm.com> writes:
> diff --git a/gcc/config/aarch64/aarch64-builtins.cc b/gcc/config/aarch64/aarch64-builtins.cc
> index 04f59fd9a54..d092654b6fb 100644
> --- a/gcc/config/aarch64/aarch64-builtins.cc
> +++ b/gcc/config/aarch64/aarch64-builtins.cc
> @@ -808,6 +808,10 @@ enum aarch64_builtins
>    AARCH64_RBIT,
>    AARCH64_RBITL,
>    AARCH64_RBITLL,
> +  AARCH64_PLD,
> +  AARCH64_PLDX,
> +  AARCH64_PLI,
> +  AARCH64_PLIX,
>    AARCH64_BUILTIN_MAX
>  };
>  
> @@ -1798,6 +1802,34 @@ aarch64_init_rng_builtins (void)
>  				   AARCH64_BUILTIN_RNG_RNDRRS);
>  }
>  
> +/* Add builtins for data and instruction prefetch.  */
> +static void
> +aarch64_init_prefetch_builtin (void)
> +{
> +#define AARCH64_INIT_PREFETCH_BUILTIN(INDEX, N)				\
> +  aarch64_builtin_decls[INDEX] =					\
> +    aarch64_general_add_builtin ("__builtin_aarch64_" N, ftype, INDEX)
> +
> +  tree ftype;
> +  tree cv_argtype;
> +  cv_argtype = build_qualified_type (void_type_node, TYPE_QUAL_CONST
> +						     | TYPE_QUAL_VOLATILE);
> +  cv_argtype = build_pointer_type (cv_argtype);
> +
> +  ftype = build_function_type_list (void_type_node, cv_argtype, NULL);
> +  AARCH64_INIT_PREFETCH_BUILTIN (AARCH64_PLD, "pld");
> +  AARCH64_INIT_PREFETCH_BUILTIN (AARCH64_PLI, "pli");
> +
> +  ftype = build_function_type_list (void_type_node, unsigned_type_node,
> +				    unsigned_type_node, unsigned_type_node,
> +				    cv_argtype, NULL);
> +  AARCH64_INIT_PREFETCH_BUILTIN (AARCH64_PLDX, "pldx");
> +
> +  ftype = build_function_type_list (void_type_node, unsigned_type_node,
> +				    unsigned_type_node, cv_argtype, NULL);
> +  AARCH64_INIT_PREFETCH_BUILTIN (AARCH64_PLIX, "plix");
> +}
> +
>  /* Initialize the memory tagging extension (MTE) builtins.  */
>  struct
>  {
> @@ -2019,6 +2051,8 @@ aarch64_general_init_builtins (void)
>    aarch64_init_rng_builtins ();
>    aarch64_init_data_intrinsics ();
>  
> +  aarch64_init_prefetch_builtin ();
> +
>    tree ftype_jcvt
>      = build_function_type_list (intSI_type_node, double_type_node, NULL);
>    aarch64_builtin_decls[AARCH64_JSCVT]
> @@ -2599,6 +2633,102 @@ aarch64_expand_rng_builtin (tree exp, rtx target, int fcode, int ignore)
>    return target;
>  }
>  
> +/* Ensure ARGNO argument in EXP represents a const-type argument in the range

s/ARGNO argument/argument ARGNO/

> +   [MINVAL, MAXVAL).  */
> +static HOST_WIDE_INT
> +require_const_argument (tree exp, unsigned int argno, HOST_WIDE_INT minval,
> +			HOST_WIDE_INT maxval)
> +{
> +  maxval--;
> +  tree arg = CALL_EXPR_ARG (exp, argno);
> +  if (TREE_CODE (arg) != INTEGER_CST)
> +      error_at (EXPR_LOCATION (exp), "Constant-type argument expected");
> +
> +  wi::tree_to_widest_ref argval = wi::to_widest (arg);

Better to use auto here.

> +
> +  if (argval < minval || argval > maxval)
> +    error_at (EXPR_LOCATION (exp),
> +	      "argument %d must be a constant immediate "
> +	      "in range [%ld,%ld]", argno, minval, maxval);

The format for HOST_WIDE_INT is %wd.  (%ld isn't portable across hosts.)

GCC's error messages use 1-based argument numbers rather than 0-based argument
numbers, so this should be argno + 1.

> +
> +  HOST_WIDE_INT retval = *argval.get_val ();

Better to use argval.to_shwi ().

OK with those changes, thanks.

Richard
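
The two diagnostic-related suggestions above (one-based argument
numbers and a portable format) can be illustrated with a minimal
standalone sketch.  It is not the committed code: `format_range_error'
is a hypothetical helper, and `long long' with `%lld' stands in for
HOST_WIDE_INT with `%wd', which only GCC's own diagnostic routines
understand.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper.  ARGNO is zero-based, as passed to
   CALL_EXPR_ARG; MAXVAL is exclusive, as in the patch.  The message
   reports a one-based argument number and an inclusive upper bound.
   Returns the value returned by snprintf.  */
int
format_range_error (char *buf, size_t size, unsigned argno,
                    long long minval, long long maxval)
{
  assert (buf != NULL);
  return snprintf (buf, size,
                   "argument %u must be a constant immediate "
                   "in range [%lld,%lld]",
                   argno + 1, minval, maxval - 1);
}
```

For the first argument of `__builtin_aarch64_pldx', whose half-open
range is [0, 2), this produces "argument 1 must be a constant immediate
in range [0,1]".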

> +  return retval;
> +}
> +
> +
> +/* Expand a prefetch builtin EXP.  */
> +void
> +aarch64_expand_prefetch_builtin (tree exp, int fcode)
> +{
> +  unsigned narg;
> +
> +  tree args[4];
> +  int kind_id = -1;
> +  int level_id = -1;
> +  int rettn_id = -1;
> +  char prfop[11];
> +  class expand_operand ops[2];
> +
> +  static const char *kind_s[] = {"PLD", "PST", "PLI"};
> +  static const char *level_s[] = {"L1", "L2", "L3", "SLC"};
> +  static const char *rettn_s[] = {"KEEP", "STRM"};
> +
> +  /* Each of the four prefetch builtins takes a different number of
> +     arguments, but proceeds to call the PRFM insn which requires 4
> +     pieces of information to be fully defined.
> +
> +     Specify the total number of arguments for each builtin and, where
> +     one of these takes less than 4 arguments, set sensible defaults.  */
> +  switch (fcode)
> +    {
> +    case AARCH64_PLDX:
> +      narg = 4;
> +      break;
> +    case AARCH64_PLIX:
> +      kind_id = 2;
> +      narg = 3;
> +      break;
> +    case AARCH64_PLI:
> +    case AARCH64_PLD:
> +      kind_id  = (fcode == AARCH64_PLD) ? 0 : 2;
> +      level_id = 0;
> +      rettn_id = 0;
> +      narg = 1;
> +      break;
> +    default:
> +      gcc_unreachable ();
> +    }
> +
> +  /* Any -1 id variable is to be user-supplied.  Here we fill these in and
> +     run bounds checks on them.
> +     "PLI" is used only implicitly by AARCH64_PLI & AARCH64_PLIX, never
> +     explicitly.  */
> +  int argno = 0;
> +  if (kind_id < 0)
> +    kind_id = require_const_argument (exp, argno++, 0, ARRAY_SIZE (kind_s) - 1);
> +  if (level_id < 0)
> +    level_id = require_const_argument (exp, argno++, 0, ARRAY_SIZE (level_s));
> +  if (rettn_id < 0)
> +    rettn_id = require_const_argument (exp, argno++, 0, ARRAY_SIZE (rettn_s));
> +  rtx address = expand_expr (CALL_EXPR_ARG (exp, argno), NULL_RTX, Pmode,
> +			     EXPAND_NORMAL);
> +
> +  if (seen_error ())
> +    return;
> +
> +  sprintf (prfop, "%s%s%s", kind_s[kind_id],
> +			    level_s[level_id],
> +			    rettn_s[rettn_id]);
> +
> +  rtx const_str = rtx_alloc (CONST_STRING);
> +  PUT_CODE (const_str, CONST_STRING);
> +  XSTR (const_str, 0) = ggc_strdup (prfop);
> +
> +  create_fixed_operand (&ops[0], const_str);
> +  create_address_operand (&ops[1], address);
> +  maybe_expand_insn (CODE_FOR_aarch64_pldx, 2, ops);
> +}
> +
>  /* Expand an expression EXP that calls a MEMTAG built-in FCODE
>     with result going to TARGET.  */
>  static rtx
> @@ -2832,6 +2962,12 @@ aarch64_general_expand_builtin (unsigned int fcode, tree exp, rtx target,
>      case AARCH64_BUILTIN_RNG_RNDR:
>      case AARCH64_BUILTIN_RNG_RNDRRS:
>        return aarch64_expand_rng_builtin (exp, target, fcode, ignore);
> +    case AARCH64_PLD:
> +    case AARCH64_PLDX:
> +    case AARCH64_PLI:
> +    case AARCH64_PLIX:
> +      aarch64_expand_prefetch_builtin (exp, fcode);
> +      return target;
>      }
>  
>    if (fcode >= AARCH64_SIMD_BUILTIN_BASE && fcode <= AARCH64_SIMD_BUILTIN_MAX)
> diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
> index b8e12fc1d4b..68713c58f07 100644
> --- a/gcc/config/aarch64/aarch64.md
> +++ b/gcc/config/aarch64/aarch64.md
> @@ -338,6 +338,7 @@
>      UNSPEC_UPDATE_FFRT
>      UNSPEC_RDFFR
>      UNSPEC_WRFFR
> +    UNSPEC_PLDX
>      ;; Represents an SVE-style lane index, in which the indexing applies
>      ;; within the containing 128-bit block.
>      UNSPEC_SVE_LANE_SELECT
> @@ -916,6 +917,17 @@
>    [(set_attr "type" "load_4")]
>  )
>  
> +(define_insn "aarch64_pldx"
> +  [(unspec [(match_operand 0 "" "")
> +	    (match_operand:DI 1 "aarch64_prefetch_operand" "Dp")] UNSPEC_PLDX)]
> +  ""
> +  {
> +    operands[1] = gen_rtx_MEM (DImode, operands[1]);
> +    return "prfm\\t%0, %1";
> +  }
> +  [(set_attr "type" "load_4")]
> +)
> +
>  (define_insn "trap"
>    [(trap_if (const_int 1) (const_int 8))]
>    ""
> diff --git a/gcc/config/aarch64/arm_acle.h b/gcc/config/aarch64/arm_acle.h
> index 7599a32301d..40e5aa61be2 100644
> --- a/gcc/config/aarch64/arm_acle.h
> +++ b/gcc/config/aarch64/arm_acle.h
> @@ -78,6 +78,36 @@ _GCC_ARM_ACLE_DATA_FN (revll, bswap64, uint64_t, uint64_t)
>  
>  #undef _GCC_ARM_ACLE_DATA_FN
>  
> +__extension__ extern __inline void
> +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> +__pld (void const volatile *__addr)
> +{
> +  return __builtin_aarch64_pld (__addr);
> +}
> +
> +__extension__ extern __inline void
> +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> +__pli (void const volatile *__addr)
> +{
> +  return __builtin_aarch64_pli (__addr);
> +}
> +
> +__extension__ extern __inline void
> +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> +__plix (unsigned int __cache, unsigned int __rettn,
> +	void const volatile *__addr)
> +{
> +  return __builtin_aarch64_plix (__cache, __rettn, __addr);
> +}
> +
> +__extension__ extern __inline void
> +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> +__pldx (unsigned int __access, unsigned int __cache, unsigned int __rettn,
> +	void const volatile *__addr)
> +{
> +  return __builtin_aarch64_pldx (__access, __cache, __rettn, __addr);
> +}
> +
>  __extension__ extern __inline unsigned long
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>  __revl (unsigned long __value)
> diff --git a/gcc/testsuite/gcc.target/aarch64/builtin_pld_pli.c b/gcc/testsuite/gcc.target/aarch64/builtin_pld_pli.c
> new file mode 100644
> index 00000000000..8cbaa97c00c
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/builtin_pld_pli.c
> @@ -0,0 +1,90 @@
> +/* { dg-do compile } */
> +/* { dg-options "-march=armv8-a -O2" } */
> +
> +#include <arm_acle.h>
> +
> +/* Check that we can generate the immediate-offset addressing
> +   mode for PRFM.  */
> +
> +/* Access kind specifiers.  */
> +#define PLD 0
> +#define PST 1
> +/* Cache levels.  */
> +#define L1  0
> +#define L2  1
> +#define L3  2
> +#define SLC 3
> +/* Retention policies.  */
> +#define KEEP 0
> +#define STRM 1
> +
> +void
> +prefetch_for_read_write (void *a)
> +{
> +  __pldx (PLD, L1, KEEP, a);
> +  __pldx (PLD, L1, STRM, a);
> +  __pldx (PLD, L2, KEEP, a);
> +  __pldx (PLD, L2, STRM, a);
> +  __pldx (PLD, L3, KEEP, a);
> +  __pldx (PLD, L3, STRM, a);
> +  __pldx (PLD, SLC, KEEP, a);
> +  __pldx (PLD, SLC, STRM, a);
> +  __pldx (PST, L1, KEEP, a);
> +  __pldx (PST, L1, STRM, a);
> +  __pldx (PST, L2, KEEP, a);
> +  __pldx (PST, L2, STRM, a);
> +  __pldx (PST, L3, KEEP, a);
> +  __pldx (PST, L3, STRM, a);
> +  __pldx (PST, SLC, KEEP, a);
> +  __pldx (PST, SLC, STRM, a);
> +}
> +
> +/* { dg-final { scan-assembler "prfm\tPLDL1KEEP, \\\[x\[0-9\]+\\\]" } } */
> +/* { dg-final { scan-assembler "prfm\tPLDL1STRM, \\\[x\[0-9\]+\\\]" } } */
> +/* { dg-final { scan-assembler "prfm\tPLDL2KEEP, \\\[x\[0-9\]+\\\]" } } */
> +/* { dg-final { scan-assembler "prfm\tPLDL2STRM, \\\[x\[0-9\]+\\\]" } } */
> +/* { dg-final { scan-assembler "prfm\tPLDL3KEEP, \\\[x\[0-9\]+\\\]" } } */
> +/* { dg-final { scan-assembler "prfm\tPLDL3STRM, \\\[x\[0-9\]+\\\]" } } */
> +/* { dg-final { scan-assembler "prfm\tPLDSLCKEEP, \\\[x\[0-9\]+\\\]" } } */
> +/* { dg-final { scan-assembler "prfm\tPLDSLCSTRM, \\\[x\[0-9\]+\\\]" } } */
> +/* { dg-final { scan-assembler "prfm\tPSTL1KEEP, \\\[x\[0-9\]+\\\]" } } */
> +/* { dg-final { scan-assembler "prfm\tPSTL1STRM, \\\[x\[0-9\]+\\\]" } } */
> +/* { dg-final { scan-assembler "prfm\tPSTL2KEEP, \\\[x\[0-9\]+\\\]" } } */
> +/* { dg-final { scan-assembler "prfm\tPSTL2STRM, \\\[x\[0-9\]+\\\]" } } */
> +/* { dg-final { scan-assembler "prfm\tPSTL3KEEP, \\\[x\[0-9\]+\\\]" } } */
> +/* { dg-final { scan-assembler "prfm\tPSTL3STRM, \\\[x\[0-9\]+\\\]" } } */
> +/* { dg-final { scan-assembler "prfm\tPSTSLCKEEP, \\\[x\[0-9\]+\\\]" } } */
> +/* { dg-final { scan-assembler "prfm\tPSTSLCSTRM, \\\[x\[0-9\]+\\\]" } } */
> +
> +void
> +prefetch_simple (void *a)
> +{
> +  __pld (a);
> +  __pli (a);
> +}
> +
> +/* { dg-final { scan-assembler "prfm\tPLDL1KEEP, \\\[x\[0-9\]+\\\]" } } */
> +/* { dg-final { scan-assembler "prfm\tPLIL1KEEP, \\\[x\[0-9\]+\\\]" } } */
> +
> +void
> +prefetch_instructions (void *a)
> +{
> +  __plix (L1, KEEP, a);
> +  __plix (L1, STRM, a);
> +  __plix (L2, KEEP, a);
> +  __plix (L2, STRM, a);
> +  __plix (L3, KEEP, a);
> +  __plix (L3, STRM, a);
> +  __plix (SLC, KEEP, a);
> +  __plix (SLC, STRM, a);
> +}
> +
> +/* { dg-final { scan-assembler "prfm\tPLIL1KEEP, \\\[x\[0-9\]+\\\]" } } */
> +/* { dg-final { scan-assembler "prfm\tPLIL1STRM, \\\[x\[0-9\]+\\\]" } } */
> +/* { dg-final { scan-assembler "prfm\tPLIL2KEEP, \\\[x\[0-9\]+\\\]" } } */
> +/* { dg-final { scan-assembler "prfm\tPLIL2STRM, \\\[x\[0-9\]+\\\]" } } */
> +/* { dg-final { scan-assembler "prfm\tPLIL3KEEP, \\\[x\[0-9\]+\\\]" } } */
> +/* { dg-final { scan-assembler "prfm\tPLIL3STRM, \\\[x\[0-9\]+\\\]" } } */
> +/* { dg-final { scan-assembler "prfm\tPLISLCKEEP, \\\[x\[0-9\]+\\\]" } } */
> +/* { dg-final { scan-assembler "prfm\tPLISLCSTRM, \\\[x\[0-9\]+\\\]" } } */
> +
> diff --git a/gcc/testsuite/gcc.target/aarch64/builtin_pld_pli_illegal.c b/gcc/testsuite/gcc.target/aarch64/builtin_pld_pli_illegal.c
> new file mode 100644
> index 00000000000..fa9faf71ad1
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/builtin_pld_pli_illegal.c
> @@ -0,0 +1,33 @@
> +/* Check that PRFM-related bounds checks are applied correctly.  */
> +/* { dg-do compile } */
> +#include <arm_acle.h>
> +
> +/* Access kind specifiers.  */
> +#define KIND_LOW -1
> +#define KIND_HIGH 2
> +/* Cache levels.  */
> +#define LEVEL_LOW  -1
> +#define LEVEL_HIGH  4
> +/* Retention policies.  */
> +#define POLICY_LOW -1
> +#define POLICY_HIGH 2
> +
> +void
> +data_rw_prefetch_bad_bounds (void *a)
> +{
> +  __builtin_aarch64_pldx (KIND_LOW, 0, 0, a);  /* { dg-error {argument 0 must be a constant immediate in range \[0,1\]} } */
> +  __builtin_aarch64_pldx (KIND_HIGH, 0, 0, a); /* { dg-error {argument 0 must be a constant immediate in range \[0,1\]} } */
> +  __builtin_aarch64_pldx (0, LEVEL_LOW, 0, a);  /* { dg-error {argument 1 must be a constant immediate in range \[0,3\]} } */
> +  __builtin_aarch64_pldx (0, LEVEL_HIGH, 0, a); /* { dg-error {argument 1 must be a constant immediate in range \[0,3\]} } */
> +  __builtin_aarch64_pldx (0, 0, POLICY_LOW, a);  /* { dg-error {argument 2 must be a constant immediate in range \[0,1\]} } */
> +  __builtin_aarch64_pldx (0, 0, POLICY_HIGH, a); /* { dg-error {argument 2 must be a constant immediate in range \[0,1\]} } */
> +}
> +
> +void
> +insn_prefetch_bad_bounds (void *a)
> +{
> +  __builtin_aarch64_plix (LEVEL_LOW, 0, a);  /* { dg-error {argument 0 must be a constant immediate in range \[0,3\]} } */
> +  __builtin_aarch64_plix (LEVEL_HIGH, 0, a); /* { dg-error {argument 0 must be a constant immediate in range \[0,3\]} } */
> +  __builtin_aarch64_plix (0, POLICY_LOW, a);  /* { dg-error {argument 1 must be a constant immediate in range \[0,1\]} } */
> +  __builtin_aarch64_plix (0, POLICY_HIGH, a); /* { dg-error {argument 1 must be a constant immediate in range \[0,1\]} } */
> +}
  

Patch

diff --git a/gcc/config/aarch64/aarch64-builtins.cc b/gcc/config/aarch64/aarch64-builtins.cc
index 04f59fd9a54..d092654b6fb 100644
--- a/gcc/config/aarch64/aarch64-builtins.cc
+++ b/gcc/config/aarch64/aarch64-builtins.cc
@@ -808,6 +808,10 @@  enum aarch64_builtins
   AARCH64_RBIT,
   AARCH64_RBITL,
   AARCH64_RBITLL,
+  AARCH64_PLD,
+  AARCH64_PLDX,
+  AARCH64_PLI,
+  AARCH64_PLIX,
   AARCH64_BUILTIN_MAX
 };
 
@@ -1798,6 +1802,34 @@  aarch64_init_rng_builtins (void)
 				   AARCH64_BUILTIN_RNG_RNDRRS);
 }
 
+/* Add builtins for data and instruction prefetch.  */
+static void
+aarch64_init_prefetch_builtin (void)
+{
+#define AARCH64_INIT_PREFETCH_BUILTIN(INDEX, N)				\
+  aarch64_builtin_decls[INDEX] =					\
+    aarch64_general_add_builtin ("__builtin_aarch64_" N, ftype, INDEX)
+
+  tree ftype;
+  tree cv_argtype;
+  cv_argtype = build_qualified_type (void_type_node, TYPE_QUAL_CONST
+						     | TYPE_QUAL_VOLATILE);
+  cv_argtype = build_pointer_type (cv_argtype);
+
+  ftype = build_function_type_list (void_type_node, cv_argtype, NULL);
+  AARCH64_INIT_PREFETCH_BUILTIN (AARCH64_PLD, "pld");
+  AARCH64_INIT_PREFETCH_BUILTIN (AARCH64_PLI, "pli");
+
+  ftype = build_function_type_list (void_type_node, unsigned_type_node,
+				    unsigned_type_node, unsigned_type_node,
+				    cv_argtype, NULL);
+  AARCH64_INIT_PREFETCH_BUILTIN (AARCH64_PLDX, "pldx");
+
+  ftype = build_function_type_list (void_type_node, unsigned_type_node,
+				    unsigned_type_node, cv_argtype, NULL);
+  AARCH64_INIT_PREFETCH_BUILTIN (AARCH64_PLIX, "plix");
+}
+
 /* Initialize the memory tagging extension (MTE) builtins.  */
 struct
 {
@@ -2019,6 +2051,8 @@  aarch64_general_init_builtins (void)
   aarch64_init_rng_builtins ();
   aarch64_init_data_intrinsics ();
 
+  aarch64_init_prefetch_builtin ();
+
   tree ftype_jcvt
     = build_function_type_list (intSI_type_node, double_type_node, NULL);
   aarch64_builtin_decls[AARCH64_JSCVT]
@@ -2599,6 +2633,102 @@  aarch64_expand_rng_builtin (tree exp, rtx target, int fcode, int ignore)
   return target;
 }
 
+/* Ensure ARGNO argument in EXP represents a const-type argument in the range
+   [MINVAL, MAXVAL).  */
+static HOST_WIDE_INT
+require_const_argument (tree exp, unsigned int argno, HOST_WIDE_INT minval,
+			HOST_WIDE_INT maxval)
+{
+  maxval--;
+  tree arg = CALL_EXPR_ARG (exp, argno);
+  if (TREE_CODE (arg) != INTEGER_CST)
+      error_at (EXPR_LOCATION (exp), "Constant-type argument expected");
+
+  wi::tree_to_widest_ref argval = wi::to_widest (arg);
+
+  if (argval < minval || argval > maxval)
+    error_at (EXPR_LOCATION (exp),
+	      "argument %d must be a constant immediate "
+	      "in range [%ld,%ld]", argno, minval, maxval);
+
+  HOST_WIDE_INT retval = *argval.get_val ();
+  return retval;
+}
+
+
+/* Expand a prefetch builtin EXP.  */
+void
+aarch64_expand_prefetch_builtin (tree exp, int fcode)
+{
+  unsigned narg;
+
+  tree args[4];
+  int kind_id = -1;
+  int level_id = -1;
+  int rettn_id = -1;
+  char prfop[11];
+  class expand_operand ops[2];
+
+  static const char *kind_s[] = {"PLD", "PST", "PLI"};
+  static const char *level_s[] = {"L1", "L2", "L3", "SLC"};
+  static const char *rettn_s[] = {"KEEP", "STRM"};
+
+  /* Each of the four prefetch builtins takes a different number of
+     arguments, but proceeds to call the PRFM insn which requires 4
+     pieces of information to be fully defined.
+
+     Specify the total number of arguments for each builtin and, where
+     one of these takes less than 4 arguments, set sensible defaults.  */
+  switch (fcode)
+    {
+    case AARCH64_PLDX:
+      narg = 4;
+      break;
+    case AARCH64_PLIX:
+      kind_id = 2;
+      narg = 3;
+      break;
+    case AARCH64_PLI:
+    case AARCH64_PLD:
+      kind_id  = (fcode == AARCH64_PLD) ? 0 : 2;
+      level_id = 0;
+      rettn_id = 0;
+      narg = 1;
+      break;
+    default:
+      gcc_unreachable ();
+    }
+
+  /* Any -1 id variable is to be user-supplied.  Here we fill these in and
+     run bounds checks on them.
+     "PLI" is used only implicitly by AARCH64_PLI & AARCH64_PLIX, never
+     explicitly.  */
+  int argno = 0;
+  if (kind_id < 0)
+    kind_id = require_const_argument (exp, argno++, 0, ARRAY_SIZE (kind_s) - 1);
+  if (level_id < 0)
+    level_id = require_const_argument (exp, argno++, 0, ARRAY_SIZE (level_s));
+  if (rettn_id < 0)
+    rettn_id = require_const_argument (exp, argno++, 0, ARRAY_SIZE (rettn_s));
+  rtx address = expand_expr (CALL_EXPR_ARG (exp, argno), NULL_RTX, Pmode,
+			     EXPAND_NORMAL);
+
+  if (seen_error ())
+    return;
+
+  sprintf (prfop, "%s%s%s", kind_s[kind_id],
+			    level_s[level_id],
+			    rettn_s[rettn_id]);
+
+  rtx const_str = rtx_alloc (CONST_STRING);
+  PUT_CODE (const_str, CONST_STRING);
+  XSTR (const_str, 0) = ggc_strdup (prfop);
+
+  create_fixed_operand (&ops[0], const_str);
+  create_address_operand (&ops[1], address);
+  maybe_expand_insn (CODE_FOR_aarch64_pldx, 2, ops);
+}
+
 /* Expand an expression EXP that calls a MEMTAG built-in FCODE
    with result going to TARGET.  */
 static rtx
@@ -2832,6 +2962,12 @@  aarch64_general_expand_builtin (unsigned int fcode, tree exp, rtx target,
     case AARCH64_BUILTIN_RNG_RNDR:
     case AARCH64_BUILTIN_RNG_RNDRRS:
       return aarch64_expand_rng_builtin (exp, target, fcode, ignore);
+    case AARCH64_PLD:
+    case AARCH64_PLDX:
+    case AARCH64_PLI:
+    case AARCH64_PLIX:
+      aarch64_expand_prefetch_builtin (exp, fcode);
+      return target;
     }
 
   if (fcode >= AARCH64_SIMD_BUILTIN_BASE && fcode <= AARCH64_SIMD_BUILTIN_MAX)
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index b8e12fc1d4b..68713c58f07 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -338,6 +338,7 @@ 
     UNSPEC_UPDATE_FFRT
     UNSPEC_RDFFR
     UNSPEC_WRFFR
+    UNSPEC_PLDX
     ;; Represents an SVE-style lane index, in which the indexing applies
     ;; within the containing 128-bit block.
     UNSPEC_SVE_LANE_SELECT
@@ -916,6 +917,17 @@ 
   [(set_attr "type" "load_4")]
 )
 
+(define_insn "aarch64_pldx"
+  [(unspec [(match_operand 0 "" "")
+	    (match_operand:DI 1 "aarch64_prefetch_operand" "Dp")] UNSPEC_PLDX)]
+  ""
+  {
+    operands[1] = gen_rtx_MEM (DImode, operands[1]);
+    return "prfm\\t%0, %1";
+  }
+  [(set_attr "type" "load_4")]
+)
+
 (define_insn "trap"
   [(trap_if (const_int 1) (const_int 8))]
   ""
diff --git a/gcc/config/aarch64/arm_acle.h b/gcc/config/aarch64/arm_acle.h
index 7599a32301d..40e5aa61be2 100644
--- a/gcc/config/aarch64/arm_acle.h
+++ b/gcc/config/aarch64/arm_acle.h
@@ -78,6 +78,36 @@  _GCC_ARM_ACLE_DATA_FN (revll, bswap64, uint64_t, uint64_t)
 
 #undef _GCC_ARM_ACLE_DATA_FN
 
+__extension__ extern __inline void
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__pld (void const volatile *__addr)
+{
+  return __builtin_aarch64_pld (__addr);
+}
+
+__extension__ extern __inline void
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__pli (void const volatile *__addr)
+{
+  return __builtin_aarch64_pli (__addr);
+}
+
+__extension__ extern __inline void
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__plix (unsigned int __cache, unsigned int __rettn,
+	void const volatile *__addr)
+{
+  return __builtin_aarch64_plix (__cache, __rettn, __addr);
+}
+
+__extension__ extern __inline void
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+__pldx (unsigned int __access, unsigned int __cache, unsigned int __rettn,
+	void const volatile *__addr)
+{
+  return __builtin_aarch64_pldx (__access, __cache, __rettn, __addr);
+}
+
 __extension__ extern __inline unsigned long
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __revl (unsigned long __value)
diff --git a/gcc/testsuite/gcc.target/aarch64/builtin_pld_pli.c b/gcc/testsuite/gcc.target/aarch64/builtin_pld_pli.c
new file mode 100644
index 00000000000..8cbaa97c00c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/builtin_pld_pli.c
@@ -0,0 +1,90 @@ 
+/* { dg-do compile } */
+/* { dg-options "-march=armv8-a -O2" } */
+
+#include <arm_acle.h>
+
+/* Check that we can generate the immediate-offset addressing
+   mode for PRFM.  */
+
+/* Access kind specifiers.  */
+#define PLD 0
+#define PST 1
+/* Cache levels.  */
+#define L1  0
+#define L2  1
+#define L3  2
+#define SLC 3
+/* Retention policies.  */
+#define KEEP 0
+#define STRM 1
+
+void
+prefetch_for_read_write (void *a)
+{
+  __pldx (PLD, L1, KEEP, a);
+  __pldx (PLD, L1, STRM, a);
+  __pldx (PLD, L2, KEEP, a);
+  __pldx (PLD, L2, STRM, a);
+  __pldx (PLD, L3, KEEP, a);
+  __pldx (PLD, L3, STRM, a);
+  __pldx (PLD, SLC, KEEP, a);
+  __pldx (PLD, SLC, STRM, a);
+  __pldx (PST, L1, KEEP, a);
+  __pldx (PST, L1, STRM, a);
+  __pldx (PST, L2, KEEP, a);
+  __pldx (PST, L2, STRM, a);
+  __pldx (PST, L3, KEEP, a);
+  __pldx (PST, L3, STRM, a);
+  __pldx (PST, SLC, KEEP, a);
+  __pldx (PST, SLC, STRM, a);
+}
+
+/* { dg-final { scan-assembler "prfm\tPLDL1KEEP, \\\[x\[0-9\]+\\\]" } } */
+/* { dg-final { scan-assembler "prfm\tPLDL1STRM, \\\[x\[0-9\]+\\\]" } } */
+/* { dg-final { scan-assembler "prfm\tPLDL2KEEP, \\\[x\[0-9\]+\\\]" } } */
+/* { dg-final { scan-assembler "prfm\tPLDL2STRM, \\\[x\[0-9\]+\\\]" } } */
+/* { dg-final { scan-assembler "prfm\tPLDL3KEEP, \\\[x\[0-9\]+\\\]" } } */
+/* { dg-final { scan-assembler "prfm\tPLDL3STRM, \\\[x\[0-9\]+\\\]" } } */
+/* { dg-final { scan-assembler "prfm\tPLDSLCKEEP, \\\[x\[0-9\]+\\\]" } } */
+/* { dg-final { scan-assembler "prfm\tPLDSLCSTRM, \\\[x\[0-9\]+\\\]" } } */
+/* { dg-final { scan-assembler "prfm\tPSTL1KEEP, \\\[x\[0-9\]+\\\]" } } */
+/* { dg-final { scan-assembler "prfm\tPSTL1STRM, \\\[x\[0-9\]+\\\]" } } */
+/* { dg-final { scan-assembler "prfm\tPSTL2KEEP, \\\[x\[0-9\]+\\\]" } } */
+/* { dg-final { scan-assembler "prfm\tPSTL2STRM, \\\[x\[0-9\]+\\\]" } } */
+/* { dg-final { scan-assembler "prfm\tPSTL3KEEP, \\\[x\[0-9\]+\\\]" } } */
+/* { dg-final { scan-assembler "prfm\tPSTL3STRM, \\\[x\[0-9\]+\\\]" } } */
+/* { dg-final { scan-assembler "prfm\tPSTSLCKEEP, \\\[x\[0-9\]+\\\]" } } */
+/* { dg-final { scan-assembler "prfm\tPSTSLCSTRM, \\\[x\[0-9\]+\\\]" } } */
+
+void
+prefetch_simple (void *a)
+{
+  __pld (a);
+  __pli (a);
+}
+
+/* { dg-final { scan-assembler "prfm\tPLDL1KEEP, \\\[x\[0-9\]+\\\]" } } */
+/* { dg-final { scan-assembler "prfm\tPLIL1KEEP, \\\[x\[0-9\]+\\\]" } } */
+
+void
+prefetch_instructions (void *a)
+{
+  __plix (L1, KEEP, a);
+  __plix (L1, STRM, a);
+  __plix (L2, KEEP, a);
+  __plix (L2, STRM, a);
+  __plix (L3, KEEP, a);
+  __plix (L3, STRM, a);
+  __plix (SLC, KEEP, a);
+  __plix (SLC, STRM, a);
+}
+
+/* { dg-final { scan-assembler "prfm\tPLIL1KEEP, \\\[x\[0-9\]+\\\]" } } */
+/* { dg-final { scan-assembler "prfm\tPLIL1STRM, \\\[x\[0-9\]+\\\]" } } */
+/* { dg-final { scan-assembler "prfm\tPLIL2KEEP, \\\[x\[0-9\]+\\\]" } } */
+/* { dg-final { scan-assembler "prfm\tPLIL2STRM, \\\[x\[0-9\]+\\\]" } } */
+/* { dg-final { scan-assembler "prfm\tPLIL3KEEP, \\\[x\[0-9\]+\\\]" } } */
+/* { dg-final { scan-assembler "prfm\tPLIL3STRM, \\\[x\[0-9\]+\\\]" } } */
+/* { dg-final { scan-assembler "prfm\tPLISLCKEEP, \\\[x\[0-9\]+\\\]" } } */
+/* { dg-final { scan-assembler "prfm\tPLISLCSTRM, \\\[x\[0-9\]+\\\]" } } */
+
diff --git a/gcc/testsuite/gcc.target/aarch64/builtin_pld_pli_illegal.c b/gcc/testsuite/gcc.target/aarch64/builtin_pld_pli_illegal.c
new file mode 100644
index 00000000000..fa9faf71ad1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/builtin_pld_pli_illegal.c
@@ -0,0 +1,33 @@ 
+/* Check that PRFM-related bounds checks are applied correctly.  */
+/* { dg-do compile } */
+#include <arm_acle.h>
+
+/* Access kind specifiers.  */
+#define KIND_LOW -1
+#define KIND_HIGH 2
+/* Cache levels.  */
+#define LEVEL_LOW  -1
+#define LEVEL_HIGH  4
+/* Retention policies.  */
+#define POLICY_LOW -1
+#define POLICY_HIGH 2
+
+void
+data_rw_prefetch_bad_bounds (void *a)
+{
+  __builtin_aarch64_pldx (KIND_LOW, 0, 0, a);  /* { dg-error {argument 0 must be a constant immediate in range \[0,1\]} } */
+  __builtin_aarch64_pldx (KIND_HIGH, 0, 0, a); /* { dg-error {argument 0 must be a constant immediate in range \[0,1\]} } */
+  __builtin_aarch64_pldx (0, LEVEL_LOW, 0, a);  /* { dg-error {argument 1 must be a constant immediate in range \[0,3\]} } */
+  __builtin_aarch64_pldx (0, LEVEL_HIGH, 0, a); /* { dg-error {argument 1 must be a constant immediate in range \[0,3\]} } */
+  __builtin_aarch64_pldx (0, 0, POLICY_LOW, a);  /* { dg-error {argument 2 must be a constant immediate in range \[0,1\]} } */
+  __builtin_aarch64_pldx (0, 0, POLICY_HIGH, a); /* { dg-error {argument 2 must be a constant immediate in range \[0,1\]} } */
+}
+
+void
+insn_prefetch_bad_bounds (void *a)
+{
+  __builtin_aarch64_plix (LEVEL_LOW, 0, a);  /* { dg-error {argument 0 must be a constant immediate in range \[0,3\]} } */
+  __builtin_aarch64_plix (LEVEL_HIGH, 0, a); /* { dg-error {argument 0 must be a constant immediate in range \[0,3\]} } */
+  __builtin_aarch64_plix (0, POLICY_LOW, a);  /* { dg-error {argument 1 must be a constant immediate in range \[0,1\]} } */
+  __builtin_aarch64_plix (0, POLICY_HIGH, a); /* { dg-error {argument 1 must be a constant immediate in range \[0,1\]} } */
+}