[v3] aarch64: Implement the ACLE instruction/data prefetch functions.
Commit Message
Key changes in v3:
* Implement the `require_const_argument' function to ensure the nth
argument in EXP represents a const-type argument in the valid range
given by [minval, maxval), forgoing expansion altogether when an
invalid argument is detected early on.
* Whereas in the previous iteration out-of-bounds function
arguments led to warnings and sensible defaults being chosen, akin
to the `__builtin_prefetch' implementation, arguments outside the
valid ranges now result in an error, more faithfully reflecting the
ACLE specification.
---
Implement the ACLE data and instruction prefetch functions[1] with the
following signatures:
1. Data prefetch intrinsics:
----------------------------
void __pldx (/*constant*/ unsigned int /*access_kind*/,
/*constant*/ unsigned int /*cache_level*/,
/*constant*/ unsigned int /*retention_policy*/,
void const volatile *addr);
void __pld (void const volatile *addr);
2. Instruction prefetch intrinsics:
-----------------------------------
void __plix (/*constant*/ unsigned int /*cache_level*/,
/*constant*/ unsigned int /*retention_policy*/,
void const volatile *addr);
void __pli (void const volatile *addr);
`__pldx' affords the programmer finer-grained control over data
prefetch behavior than the analogous GCC builtin
`__builtin_prefetch', and allows access to the "SLC" cache level.
While `__builtin_prefetch' chooses both the cache level and retention
policy automatically via its optional `locality' parameter, `__pldx'
takes two mandatory arguments that explicitly specify the desired
cache level and retention policy.
`__plix', on the other hand, generates a code prefetch instruction
and so extends functionality on aarch64 targets beyond that exposed
by `__builtin_prefetch'.
`__pld' and `__pli' prefetch data and instructions, respectively,
using default values for both the cache-level and retention policies.
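To illustrate the relationship between the two interfaces, here is a
short usage sketch.  The numeric constants mirror the access-kind,
cache-level and retention-policy encodings exercised by the new
tests, and the function name `use_prefetch' is purely hypothetical:

#include <arm_acle.h>

void
use_prefetch (void *p)
{
  /* A read prefetch with high temporal locality...  */
  __builtin_prefetch (p, /*rw=*/0, /*locality=*/3);
  /* ...corresponds roughly to a PLD/L1/KEEP data prefetch.  */
  __pldx (/*access_kind=*/0, /*cache_level=*/0, /*retention_policy=*/0, p);

  /* A streaming store prefetch targeting the SLC, which
     __builtin_prefetch cannot express.  */
  __pldx (/*access_kind=*/1, /*cache_level=*/3, /*retention_policy=*/1, p);

  /* An instruction prefetch, likewise unavailable through
     __builtin_prefetch.  */
  __pli (p);
}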
Bootstrapped and tested on aarch64-none-linux-gnu.
[1] https://arm-software.github.io/acle/main/acle.html#memory-prefetch-intrinsics
gcc/ChangeLog:
* config/aarch64/aarch64-builtins.cc:
(AARCH64_PLD): New enum aarch64_builtins entry.
(AARCH64_PLDX): Likewise.
(AARCH64_PLI): Likewise.
(AARCH64_PLIX): Likewise.
(aarch64_init_prefetch_builtin): New.
(aarch64_general_init_builtins): Call prefetch init function.
(aarch64_expand_prefetch_builtin): New.
(aarch64_general_expand_builtin): Add prefetch expansion.
(require_const_argument): New.
* config/aarch64/aarch64.md (UNSPEC_PLDX): New.
(aarch64_pldx): New.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/builtin_pld_pli.c: New.
* gcc.target/aarch64/builtin_pld_pli_illegal.c: New.
---
gcc/config/aarch64/aarch64-builtins.cc | 136 ++++++++++++++++++
gcc/config/aarch64/aarch64.md | 12 ++
gcc/config/aarch64/arm_acle.h | 30 ++++
.../gcc.target/aarch64/builtin_pld_pli.c | 90 ++++++++++++
.../aarch64/builtin_pld_pli_illegal.c | 33 +++++
5 files changed, 301 insertions(+)
create mode 100644 gcc/testsuite/gcc.target/aarch64/builtin_pld_pli.c
create mode 100644 gcc/testsuite/gcc.target/aarch64/builtin_pld_pli_illegal.c
Comments
Victor Do Nascimento <victor.donascimento@arm.com> writes:
> Key changes in v3:
> * Implement the `require_const_argument' function to ensure the nth
> argument in EXP represents a const-type argument in the valid range
> given by [minval, maxval), forgoing expansion altogether when an
> invalid argument is detected early on.
> * Whereas in the previous iteration, out-of-bound function
> parameters led to warnings and sensible defaults set, akin to the
> `__builtin_prefetch' implementation, parameters outside valid ranges
> now result in an error, more faithfully reflecting ACLE
> specifications.
>
> ---
>
> Implement the ACLE data and instruction prefetch functions[1] with the
> following signatures:
>
> 1. Data prefetch intrinsics:
> ----------------------------
> void __pldx (/*constant*/ unsigned int /*access_kind*/,
> /*constant*/ unsigned int /*cache_level*/,
> /*constant*/ unsigned int /*retention_policy*/,
> void const volatile *addr);
>
> void __pld (void const volatile *addr);
>
> 2. Instruction prefetch intrinsics:
> -----------------------------------
> void __plix (/*constant*/ unsigned int /*cache_level*/,
> /*constant*/ unsigned int /*retention_policy*/,
> void const volatile *addr);
>
> void __pli (void const volatile *addr);
>
> `__pldx' affords the programmer more fine-grained control over the
> data prefetch behavior than the analogous GCC builtin
> `__builtin_prefetch', and allows access to the "SLC" cache level.
>
> While `__builtin_prefetch' chooses both cache-level and retention
> policy automatically via the optional `locality' parameter, `__pldx'
> expects 2 (mandatory) arguments to explicitly define the desired
> cache-level and retention policies.
>
> `__plix' on the other hand, generates a code prefetch instruction and
> so extends functionality on aarch64 targets beyond that which is
> exposed by `builtin_prefetch'.
>
> `__pld' and `__pli' do prefetch of data and instructions,
> respectively, using default values for both cache-level and retention
> policies.
>
> Bootstrapped and tested on aarch64-none-linux-gnu.
>
> [1] https://arm-software.github.io/acle/main/acle.html#memory-prefetch-intrinsics
>
> gcc/ChangeLog:
>
> * config/aarch64/aarch64-builtins.cc:
> (AARCH64_PLD): New enum aarch64_builtins entry.
> (AARCH64_PLDX): Likewise.
> (AARCH64_PLI): Likewise.
> (AARCH64_PLIX): Likewise.
> (aarch64_init_prefetch_builtin): New.
> (aarch64_general_init_builtins): Call prefetch init function.
> (aarch64_expand_prefetch_builtin): New.
> (aarch64_general_expand_builtin): Add prefetch expansion.
> (require_const_argument): New.
> * config/aarch64/aarch64.md (UNSPEC_PLDX): New.
> (aarch64_pldx): New.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/aarch64/builtin_pld_pli.c: New.
> * gcc.target/aarch64/builtin_pld_pli_illegal.c: New.
> ---
> gcc/config/aarch64/aarch64-builtins.cc | 136 ++++++++++++++++++
> gcc/config/aarch64/aarch64.md | 12 ++
> gcc/config/aarch64/arm_acle.h | 30 ++++
> .../gcc.target/aarch64/builtin_pld_pli.c | 90 ++++++++++++
> .../aarch64/builtin_pld_pli_illegal.c | 33 +++++
> 5 files changed, 301 insertions(+)
> create mode 100644 gcc/testsuite/gcc.target/aarch64/builtin_pld_pli.c
> create mode 100644 gcc/testsuite/gcc.target/aarch64/builtin_pld_pli_illegal.c
>
> diff --git a/gcc/config/aarch64/aarch64-builtins.cc b/gcc/config/aarch64/aarch64-builtins.cc
> index 04f59fd9a54..d092654b6fb 100644
> --- a/gcc/config/aarch64/aarch64-builtins.cc
> +++ b/gcc/config/aarch64/aarch64-builtins.cc
> @@ -808,6 +808,10 @@ enum aarch64_builtins
> AARCH64_RBIT,
> AARCH64_RBITL,
> AARCH64_RBITLL,
> + AARCH64_PLD,
> + AARCH64_PLDX,
> + AARCH64_PLI,
> + AARCH64_PLIX,
> AARCH64_BUILTIN_MAX
> };
>
> @@ -1798,6 +1802,34 @@ aarch64_init_rng_builtins (void)
> AARCH64_BUILTIN_RNG_RNDRRS);
> }
>
> +/* Add builtins for data and instruction prefetch. */
> +static void
> +aarch64_init_prefetch_builtin (void)
> +{
> +#define AARCH64_INIT_PREFETCH_BUILTIN(INDEX, N) \
> + aarch64_builtin_decls[INDEX] = \
> + aarch64_general_add_builtin ("__builtin_aarch64_" N, ftype, INDEX)
> +
> + tree ftype;
> + tree cv_argtype;
> + cv_argtype = build_qualified_type (void_type_node, TYPE_QUAL_CONST
> + | TYPE_QUAL_VOLATILE);
> + cv_argtype = build_pointer_type (cv_argtype);
> +
> + ftype = build_function_type_list (void_type_node, cv_argtype, NULL);
> + AARCH64_INIT_PREFETCH_BUILTIN (AARCH64_PLD, "pld");
> + AARCH64_INIT_PREFETCH_BUILTIN (AARCH64_PLI, "pli");
> +
> + ftype = build_function_type_list (void_type_node, unsigned_type_node,
> + unsigned_type_node, unsigned_type_node,
> + cv_argtype, NULL);
> + AARCH64_INIT_PREFETCH_BUILTIN (AARCH64_PLDX, "pldx");
> +
> + ftype = build_function_type_list (void_type_node, unsigned_type_node,
> + unsigned_type_node, cv_argtype, NULL);
> + AARCH64_INIT_PREFETCH_BUILTIN (AARCH64_PLIX, "plix");
> +}
> +
> /* Initialize the memory tagging extension (MTE) builtins. */
> struct
> {
> @@ -2019,6 +2051,8 @@ aarch64_general_init_builtins (void)
> aarch64_init_rng_builtins ();
> aarch64_init_data_intrinsics ();
>
> + aarch64_init_prefetch_builtin ();
> +
> tree ftype_jcvt
> = build_function_type_list (intSI_type_node, double_type_node, NULL);
> aarch64_builtin_decls[AARCH64_JSCVT]
> @@ -2599,6 +2633,102 @@ aarch64_expand_rng_builtin (tree exp, rtx target, int fcode, int ignore)
> return target;
> }
>
> +/* Ensure ARGNO argument in EXP represents a const-type argument in the range
s/ARGNO argument/argument ARGNO/
> + [MINVAL, MAXVAL). */
> +static HOST_WIDE_INT
> +require_const_argument (tree exp, unsigned int argno, HOST_WIDE_INT minval,
> + HOST_WIDE_INT maxval)
> +{
> + maxval--;
> + tree arg = CALL_EXPR_ARG (exp, argno);
> + if (TREE_CODE (arg) != INTEGER_CST)
> + error_at (EXPR_LOCATION (exp), "Constant-type argument expected");
> +
> + wi::tree_to_widest_ref argval = wi::to_widest (arg);
Better to use auto here.
> +
> + if (argval < minval || argval > maxval)
> + error_at (EXPR_LOCATION (exp),
> + "argument %d must be a constant immediate "
> + "in range [%ld,%ld]", argno, minval, maxval);
The format for HOST_WIDE_INT is %wd. (%ld isn't portable across hosts.)
GCC's error messages use 1-based argument numbers rather than 0-based argument
numbers, so this should be argno + 1.
> +
> + HOST_WIDE_INT retval = *argval.get_val ();
Better to use argval.to_shwi ().
OK with those changes, thanks.
Richard
> + return retval;
> +}
> +
> +
> +/* Expand a prefetch builtin EXP. */
> +void
> +aarch64_expand_prefetch_builtin (tree exp, int fcode)
> +{
> + unsigned narg;
> +
> + tree args[4];
> + int kind_id = -1;
> + int level_id = -1;
> + int rettn_id = -1;
> + char prfop[11];
> + class expand_operand ops[2];
> +
> + static const char *kind_s[] = {"PLD", "PST", "PLI"};
> + static const char *level_s[] = {"L1", "L2", "L3", "SLC"};
> + static const char *rettn_s[] = {"KEEP", "STRM"};
> +
> + /* Each of the four prefetch builtins takes a different number of
> + arguments, but proceeds to call the PRFM insn which requires 4
> + pieces of information to be fully defined.
> +
> + Specify the total number of arguments for each builtin and, where
> + one of these takes less than 4 arguments, set sensible defaults. */
> + switch (fcode)
> + {
> + case AARCH64_PLDX:
> + narg = 4;
> + break;
> + case AARCH64_PLIX:
> + kind_id = 2;
> + narg = 3;
> + break;
> + case AARCH64_PLI:
> + case AARCH64_PLD:
> + kind_id = (fcode == AARCH64_PLD) ? 0 : 2;
> + level_id = 0;
> + rettn_id = 0;
> + narg = 1;
> + break;
> + default:
> + gcc_unreachable ();
> + }
> +
> + /* Any -1 id variable is to be user-supplied. Here we fill these in and
> + run bounds checks on them.
> + "PLI" is used only implicitly by AARCH64_PLI & AARCH64_PLIX, never
> + explicitly. */
> + int argno = 0;
> + if (kind_id < 0)
> + kind_id = require_const_argument (exp, argno++, 0, ARRAY_SIZE (kind_s) - 1);
> + if (level_id < 0)
> + level_id = require_const_argument (exp, argno++, 0, ARRAY_SIZE (level_s));
> + if (rettn_id < 0)
> + rettn_id = require_const_argument (exp, argno++, 0, ARRAY_SIZE (rettn_s));
> + rtx address = expand_expr (CALL_EXPR_ARG (exp, argno), NULL_RTX, Pmode,
> + EXPAND_NORMAL);
> +
> + if (seen_error ())
> + return;
> +
> + sprintf (prfop, "%s%s%s", kind_s[kind_id],
> + level_s[level_id],
> + rettn_s[rettn_id]);
> +
> + rtx const_str = rtx_alloc (CONST_STRING);
> + PUT_CODE (const_str, CONST_STRING);
> + XSTR (const_str, 0) = ggc_strdup (prfop);
> +
> + create_fixed_operand (&ops[0], const_str);
> + create_address_operand (&ops[1], address);
> + maybe_expand_insn (CODE_FOR_aarch64_pldx, 2, ops);
> +}
> +
> /* Expand an expression EXP that calls a MEMTAG built-in FCODE
> with result going to TARGET. */
> static rtx
> @@ -2832,6 +2962,12 @@ aarch64_general_expand_builtin (unsigned int fcode, tree exp, rtx target,
> case AARCH64_BUILTIN_RNG_RNDR:
> case AARCH64_BUILTIN_RNG_RNDRRS:
> return aarch64_expand_rng_builtin (exp, target, fcode, ignore);
> + case AARCH64_PLD:
> + case AARCH64_PLDX:
> + case AARCH64_PLI:
> + case AARCH64_PLIX:
> + aarch64_expand_prefetch_builtin (exp, fcode);
> + return target;
> }
>
> if (fcode >= AARCH64_SIMD_BUILTIN_BASE && fcode <= AARCH64_SIMD_BUILTIN_MAX)
> diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
> index b8e12fc1d4b..68713c58f07 100644
> --- a/gcc/config/aarch64/aarch64.md
> +++ b/gcc/config/aarch64/aarch64.md
> @@ -338,6 +338,7 @@
> UNSPEC_UPDATE_FFRT
> UNSPEC_RDFFR
> UNSPEC_WRFFR
> + UNSPEC_PLDX
> ;; Represents an SVE-style lane index, in which the indexing applies
> ;; within the containing 128-bit block.
> UNSPEC_SVE_LANE_SELECT
> @@ -916,6 +917,17 @@
> [(set_attr "type" "load_4")]
> )
>
> +(define_insn "aarch64_pldx"
> + [(unspec [(match_operand 0 "" "")
> + (match_operand:DI 1 "aarch64_prefetch_operand" "Dp")] UNSPEC_PLDX)]
> + ""
> + {
> + operands[1] = gen_rtx_MEM (DImode, operands[1]);
> + return "prfm\\t%0, %1";
> + }
> + [(set_attr "type" "load_4")]
> +)
> +
> (define_insn "trap"
> [(trap_if (const_int 1) (const_int 8))]
> ""
> diff --git a/gcc/config/aarch64/arm_acle.h b/gcc/config/aarch64/arm_acle.h
> index 7599a32301d..40e5aa61be2 100644
> --- a/gcc/config/aarch64/arm_acle.h
> +++ b/gcc/config/aarch64/arm_acle.h
> @@ -78,6 +78,36 @@ _GCC_ARM_ACLE_DATA_FN (revll, bswap64, uint64_t, uint64_t)
>
> #undef _GCC_ARM_ACLE_DATA_FN
>
> +__extension__ extern __inline void
> +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> +__pld (void const volatile *__addr)
> +{
> + return __builtin_aarch64_pld (__addr);
> +}
> +
> +__extension__ extern __inline void
> +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> +__pli (void const volatile *__addr)
> +{
> + return __builtin_aarch64_pli (__addr);
> +}
> +
> +__extension__ extern __inline void
> +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> +__plix (unsigned int __cache, unsigned int __rettn,
> + void const volatile *__addr)
> +{
> + return __builtin_aarch64_plix (__cache, __rettn, __addr);
> +}
> +
> +__extension__ extern __inline void
> +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> +__pldx (unsigned int __access, unsigned int __cache, unsigned int __rettn,
> + void const volatile *__addr)
> +{
> + return __builtin_aarch64_pldx (__access, __cache, __rettn, __addr);
> +}
> +
> __extension__ extern __inline unsigned long
> __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> __revl (unsigned long __value)
> diff --git a/gcc/testsuite/gcc.target/aarch64/builtin_pld_pli.c b/gcc/testsuite/gcc.target/aarch64/builtin_pld_pli.c
> new file mode 100644
> index 00000000000..8cbaa97c00c
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/builtin_pld_pli.c
> @@ -0,0 +1,90 @@
> +/* { dg-do compile } */
> +/* { dg-options "-march=armv8-a -O2" } */
> +
> +#include <arm_acle.h>
> +
> +/* Check that we can generate the immediate-offset addressing
> + mode for PRFM. */
> +
> +/* Access kind specifiers. */
> +#define PLD 0
> +#define PST 1
> +/* Cache levels. */
> +#define L1 0
> +#define L2 1
> +#define L3 2
> +#define SLC 3
> +/* Retention policies. */
> +#define KEEP 0
> +#define STRM 1
> +
> +void
> +prefetch_for_read_write (void *a)
> +{
> + __pldx (PLD, L1, KEEP, a);
> + __pldx (PLD, L1, STRM, a);
> + __pldx (PLD, L2, KEEP, a);
> + __pldx (PLD, L2, STRM, a);
> + __pldx (PLD, L3, KEEP, a);
> + __pldx (PLD, L3, STRM, a);
> + __pldx (PLD, SLC, KEEP, a);
> + __pldx (PLD, SLC, STRM, a);
> + __pldx (PST, L1, KEEP, a);
> + __pldx (PST, L1, STRM, a);
> + __pldx (PST, L2, KEEP, a);
> + __pldx (PST, L2, STRM, a);
> + __pldx (PST, L3, KEEP, a);
> + __pldx (PST, L3, STRM, a);
> + __pldx (PST, SLC, KEEP, a);
> + __pldx (PST, SLC, STRM, a);
> +}
> +
> +/* { dg-final { scan-assembler "prfm\tPLDL1KEEP, \\\[x\[0-9\]+\\\]" } } */
> +/* { dg-final { scan-assembler "prfm\tPLDL1STRM, \\\[x\[0-9\]+\\\]" } } */
> +/* { dg-final { scan-assembler "prfm\tPLDL2KEEP, \\\[x\[0-9\]+\\\]" } } */
> +/* { dg-final { scan-assembler "prfm\tPLDL2STRM, \\\[x\[0-9\]+\\\]" } } */
> +/* { dg-final { scan-assembler "prfm\tPLDL3KEEP, \\\[x\[0-9\]+\\\]" } } */
> +/* { dg-final { scan-assembler "prfm\tPLDL3STRM, \\\[x\[0-9\]+\\\]" } } */
> +/* { dg-final { scan-assembler "prfm\tPLDSLCKEEP, \\\[x\[0-9\]+\\\]" } } */
> +/* { dg-final { scan-assembler "prfm\tPLDSLCSTRM, \\\[x\[0-9\]+\\\]" } } */
> +/* { dg-final { scan-assembler "prfm\tPSTL1KEEP, \\\[x\[0-9\]+\\\]" } } */
> +/* { dg-final { scan-assembler "prfm\tPSTL1STRM, \\\[x\[0-9\]+\\\]" } } */
> +/* { dg-final { scan-assembler "prfm\tPSTL2KEEP, \\\[x\[0-9\]+\\\]" } } */
> +/* { dg-final { scan-assembler "prfm\tPSTL2STRM, \\\[x\[0-9\]+\\\]" } } */
> +/* { dg-final { scan-assembler "prfm\tPSTL3KEEP, \\\[x\[0-9\]+\\\]" } } */
> +/* { dg-final { scan-assembler "prfm\tPSTL3STRM, \\\[x\[0-9\]+\\\]" } } */
> +/* { dg-final { scan-assembler "prfm\tPSTSLCKEEP, \\\[x\[0-9\]+\\\]" } } */
> +/* { dg-final { scan-assembler "prfm\tPSTSLCSTRM, \\\[x\[0-9\]+\\\]" } } */
> +
> +void
> +prefetch_simple (void *a)
> +{
> + __pld (a);
> + __pli (a);
> +}
> +
> +/* { dg-final { scan-assembler "prfm\tPLDL1KEEP, \\\[x\[0-9\]+\\\]" } } */
> +/* { dg-final { scan-assembler "prfm\tPLIL1KEEP, \\\[x\[0-9\]+\\\]" } } */
> +
> +void
> +prefetch_instructions (void *a)
> +{
> + __plix (L1, KEEP, a);
> + __plix (L1, STRM, a);
> + __plix (L2, KEEP, a);
> + __plix (L2, STRM, a);
> + __plix (L3, KEEP, a);
> + __plix (L3, STRM, a);
> + __plix (SLC, KEEP, a);
> + __plix (SLC, STRM, a);
> +}
> +
> +/* { dg-final { scan-assembler "prfm\tPLIL1KEEP, \\\[x\[0-9\]+\\\]" } } */
> +/* { dg-final { scan-assembler "prfm\tPLIL1STRM, \\\[x\[0-9\]+\\\]" } } */
> +/* { dg-final { scan-assembler "prfm\tPLIL2KEEP, \\\[x\[0-9\]+\\\]" } } */
> +/* { dg-final { scan-assembler "prfm\tPLIL2STRM, \\\[x\[0-9\]+\\\]" } } */
> +/* { dg-final { scan-assembler "prfm\tPLIL3KEEP, \\\[x\[0-9\]+\\\]" } } */
> +/* { dg-final { scan-assembler "prfm\tPLIL3STRM, \\\[x\[0-9\]+\\\]" } } */
> +/* { dg-final { scan-assembler "prfm\tPLISLCKEEP, \\\[x\[0-9\]+\\\]" } } */
> +/* { dg-final { scan-assembler "prfm\tPLISLCSTRM, \\\[x\[0-9\]+\\\]" } } */
> +
> diff --git a/gcc/testsuite/gcc.target/aarch64/builtin_pld_pli_illegal.c b/gcc/testsuite/gcc.target/aarch64/builtin_pld_pli_illegal.c
> new file mode 100644
> index 00000000000..fa9faf71ad1
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/builtin_pld_pli_illegal.c
> @@ -0,0 +1,33 @@
> +/* Check that PRFM-related bounds checks are applied correctly. */
> +/* { dg-do compile } */
> +#include <arm_acle.h>
> +
> +/* Access kind specifiers. */
> +#define KIND_LOW -1
> +#define KIND_HIGH 2
> +/* Cache levels. */
> +#define LEVEL_LOW -1
> +#define LEVEL_HIGH 4
> +/* Retention policies. */
> +#define POLICY_LOW -1
> +#define POLICY_HIGH 2
> +
> +void
> +data_rw_prefetch_bad_bounds (void *a)
> +{
> + __builtin_aarch64_pldx (KIND_LOW, 0, 0, a); /* { dg-error {argument 0 must be a constant immediate in range \[0,1\]} } */
> + __builtin_aarch64_pldx (KIND_HIGH, 0, 0, a); /* { dg-error {argument 0 must be a constant immediate in range \[0,1\]} } */
> + __builtin_aarch64_pldx (0, LEVEL_LOW, 0, a); /* { dg-error {argument 1 must be a constant immediate in range \[0,3\]} } */
> + __builtin_aarch64_pldx (0, LEVEL_HIGH, 0, a); /* { dg-error {argument 1 must be a constant immediate in range \[0,3\]} } */
> + __builtin_aarch64_pldx (0, 0, POLICY_LOW, a); /* { dg-error {argument 2 must be a constant immediate in range \[0,1\]} } */
> + __builtin_aarch64_pldx (0, 0, POLICY_HIGH, a); /* { dg-error {argument 2 must be a constant immediate in range \[0,1\]} } */
> +}
> +
> +void
> +insn_prefetch_bad_bounds (void *a)
> +{
> + __builtin_aarch64_plix (LEVEL_LOW, 0, a); /* { dg-error {argument 0 must be a constant immediate in range \[0,3\]} } */
> + __builtin_aarch64_plix (LEVEL_HIGH, 0, a); /* { dg-error {argument 0 must be a constant immediate in range \[0,3\]} } */
> + __builtin_aarch64_plix (0, POLICY_LOW, a); /* { dg-error {argument 1 must be a constant immediate in range \[0,1\]} } */
> + __builtin_aarch64_plix (0, POLICY_HIGH, a); /* { dg-error {argument 1 must be a constant immediate in range \[0,1\]} } */
> +}
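For reference, a sketch of what `require_const_argument' could look
like with the review suggestions above applied (auto for the wide-int
value, %wd formats with 1-based argument numbering, and to_shwi for
the return value); this reflects the requested changes rather than a
final, committed version:

/* Ensure that argument ARGNO in EXP represents a const-type argument
   in the range [MINVAL, MAXVAL).  */
static HOST_WIDE_INT
require_const_argument (tree exp, unsigned int argno, HOST_WIDE_INT minval,
                        HOST_WIDE_INT maxval)
{
  maxval--;
  tree arg = CALL_EXPR_ARG (exp, argno);
  if (TREE_CODE (arg) != INTEGER_CST)
    error_at (EXPR_LOCATION (exp), "Constant-type argument expected");

  auto argval = wi::to_widest (arg);

  if (argval < minval || argval > maxval)
    error_at (EXPR_LOCATION (exp),
              "argument %d must be a constant immediate "
              "in range [%wd,%wd]", argno + 1, minval, maxval);

  return argval.to_shwi ();
}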