aarch64: Add -mveclibabi=sleefgnu

Message ID ff0c733d75a54aa59a3c29aa9085b09e@ex13mbxc01n01.ikhex.ikoula.com
State Accepted
Series aarch64: Add -mveclibabi=sleefgnu

Commit Message

Lou Knauer April 14, 2023, 7:02 a.m. UTC
  This adds support for the -mveclibabi option to the AArch64 backend of GCC by
implementing the builtin_vectorized_function target hook for AArch64.
The SLEEF Vectorized Math Library's GNUABI interface is used, and
NEON/Advanced SIMD as well as SVE are supported.

This was tested on the gcc testsuite and the llvm-test-suite on an AArch64
host for NEON and SVE, as well as on hand-written benchmarks. Where the
vectorization of builtins was applied successfully in loops dominated by
calls to those builtins, significant (>2x) performance gains were observed.
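
For illustration, a typical invocation could look like the following
(the exact SLEEF GNUABI library name is an assumption here; it depends
on how SLEEF was built and installed):

    gcc -O3 -march=armv8-a+sve -ftree-vectorize \
        -funsafe-math-optimizations -mveclibabi=sleefgnu \
        loop.c -lsleefgnuabi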

gcc/ChangeLog:

        * config/aarch64/aarch64.opt: Add -mveclibabi option.
        * config/aarch64/aarch64-opts.h: Add aarch64_veclibabi enum.
        * config/aarch64/aarch64-protos.h: Add
        aarch64_builtin_vectorized_function declaration.
        * config/aarch64/aarch64.cc: Handle -mveclibabi option and pure
        scalable type info for scalable vectors without "SVE type" attributes.
        * config/aarch64/aarch64-builtins.cc: Add
        aarch64_builtin_vectorized_function definition.
        * doc/invoke.texi: Document -mveclibabi for AArch64 targets.

gcc/testsuite/ChangeLog:

        * gcc.target/aarch64/vect-vecabi-sleefgnu-neon.c: New testcase.
        * gcc.target/aarch64/vect-vecabi-sleefgnu-sve.c: New testcase.
---
 gcc/config/aarch64/aarch64-builtins.cc        | 113 ++++++++++++++++++
 gcc/config/aarch64/aarch64-opts.h             |   5 +
 gcc/config/aarch64/aarch64-protos.h           |   3 +
 gcc/config/aarch64/aarch64.cc                 |  66 ++++++++++
 gcc/config/aarch64/aarch64.opt                |  15 +++
 gcc/doc/invoke.texi                           |  15 +++
 .../aarch64/vect-vecabi-sleefgnu-neon.c       |  16 +++
 .../aarch64/vect-vecabi-sleefgnu-sve.c        |  16 +++
 8 files changed, 249 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/vect-vecabi-sleefgnu-neon.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/vect-vecabi-sleefgnu-sve.c

--
2.25.1
  

Comments

Andrew Pinski April 14, 2023, 7:07 a.m. UTC | #1
On Fri, Apr 14, 2023 at 12:03 AM Lou Knauer via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
> This adds support for the -mveclibabi option to the AArch64 backend of GCC by
> implementing the builtin_vectorized_function target hook for AArch64.
> The SLEEF Vectorized Math Library's GNUABI interface is used, and
> NEON/Advanced SIMD as well as SVE are supported.
>
> This was tested on the gcc testsuite and the llvm-test-suite on an AArch64
> host for NEON and SVE, as well as on hand-written benchmarks. Where the
> vectorization of builtins was applied successfully in loops dominated by
> calls to those builtins, significant (>2x) performance gains were observed.

This is so wrong, and it would be better if you actually just used a header
file instead, specifically the OpenMP vect pragmas.

Thanks,
Andrew Pinski


>
> gcc/ChangeLog:
>
>         * config/aarch64/aarch64.opt: Add -mveclibabi option.
>         * config/aarch64/aarch64-opts.h: Add aarch64_veclibabi enum.
>         * config/aarch64/aarch64-protos.h: Add
>         aarch64_builtin_vectorized_function declaration.
>         * config/aarch64/aarch64.cc: Handle -mveclibabi option and pure
>         scalable type info for scalable vectors without "SVE type" attributes.
>         * config/aarch64/aarch64-builtins.cc: Add
>         aarch64_builtin_vectorized_function definition.
>         * doc/invoke.texi: Document -mveclibabi for AArch64 targets.
>
> gcc/testsuite/ChangeLog:
>
>         * gcc.target/aarch64/vect-vecabi-sleefgnu-neon.c: New testcase.
>         * gcc.target/aarch64/vect-vecabi-sleefgnu-sve.c: New testcase.
> ---
>  gcc/config/aarch64/aarch64-builtins.cc        | 113 ++++++++++++++++++
>  gcc/config/aarch64/aarch64-opts.h             |   5 +
>  gcc/config/aarch64/aarch64-protos.h           |   3 +
>  gcc/config/aarch64/aarch64.cc                 |  66 ++++++++++
>  gcc/config/aarch64/aarch64.opt                |  15 +++
>  gcc/doc/invoke.texi                           |  15 +++
>  .../aarch64/vect-vecabi-sleefgnu-neon.c       |  16 +++
>  .../aarch64/vect-vecabi-sleefgnu-sve.c        |  16 +++
>  8 files changed, 249 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/vect-vecabi-sleefgnu-neon.c
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/vect-vecabi-sleefgnu-sve.c
>
> diff --git a/gcc/config/aarch64/aarch64-builtins.cc b/gcc/config/aarch64/aarch64-builtins.cc
> index cc6b7c01fd1..f53fa91b8d0 100644
> --- a/gcc/config/aarch64/aarch64-builtins.cc
> +++ b/gcc/config/aarch64/aarch64-builtins.cc
> @@ -47,6 +47,7 @@
>  #include "stringpool.h"
>  #include "attribs.h"
>  #include "gimple-fold.h"
> +#include "builtins.h"
>
>  #define v8qi_UP  E_V8QImode
>  #define v8di_UP  E_V8DImode
> @@ -3450,6 +3451,118 @@ aarch64_resolve_overloaded_builtin_general (location_t loc, tree function,
>    return NULL_TREE;
>  }
>
> +/* The vector library ABI to use, if any.  */
> +extern enum aarch64_veclibabi aarch64_selected_veclibabi;
> +
> +/* Return a function declaration for a vectorized version of the combined
> +   function with combined_fn code FN_CODE, result vector type TYPE_OUT and
> +   argument vector type TYPE_IN.  Return NULL_TREE if none is available.  */
> +tree
> +aarch64_builtin_vectorized_function (unsigned int fn_code,
> +                                    tree type_out, tree type_in)
> +{
> +  if (TREE_CODE (type_out) != VECTOR_TYPE
> +      || TREE_CODE (type_in) != VECTOR_TYPE
> +      || aarch64_selected_veclibabi != aarch64_veclibabi_type_sleefgnu
> +      || !flag_unsafe_math_optimizations)
> +    return NULL_TREE;
> +
> +  machine_mode mode = TYPE_MODE (TREE_TYPE (type_out));
> +  poly_uint64 n = TYPE_VECTOR_SUBPARTS (type_out);
> +  if (mode != TYPE_MODE (TREE_TYPE (type_in))
> +      || !known_eq (n, TYPE_VECTOR_SUBPARTS (type_in)))
> +    return NULL_TREE;
> +
> +  bool is_scalable = !n.is_constant ();
> +  if (is_scalable)
> +    {
> +      /* SVE is needed for scalable vectors; an SVE register's size is
> +        always a multiple of 128 bits.  */
> +      if (!TARGET_SVE
> +         || (mode == DFmode && !known_eq (n, poly_uint64 (2, 2)))
> +         || (mode == SFmode && !known_eq (n, poly_uint64 (4, 4))))
> +       return NULL_TREE;
> +    }
> +  else
> +    {
> +      /* A NEON register can hold two doubles or four floats.  */
> +      if (!TARGET_SIMD
> +         || (mode == DFmode && n.to_constant () != 2)
> +         || (mode == SFmode && n.to_constant () != 4))
> +       return NULL_TREE;
> +    }
> +
> +  tree fntype;
> +  combined_fn fn = combined_fn (fn_code);
> +  const char *argencoding;
> +  switch (fn)
> +    {
> +      CASE_CFN_EXP:
> +      CASE_CFN_LOG:
> +      CASE_CFN_LOG10:
> +      CASE_CFN_TANH:
> +      CASE_CFN_TAN:
> +      CASE_CFN_ATAN:
> +      CASE_CFN_ATANH:
> +      CASE_CFN_CBRT:
> +      CASE_CFN_SINH:
> +      CASE_CFN_SIN:
> +      CASE_CFN_ASINH:
> +      CASE_CFN_ASIN:
> +      CASE_CFN_COSH:
> +      CASE_CFN_COS:
> +      CASE_CFN_ACOSH:
> +      CASE_CFN_ACOS:
> +       fntype = build_function_type_list (type_out, type_in, NULL);
> +       argencoding = "v";
> +       break;
> +
> +      CASE_CFN_POW:
> +      CASE_CFN_ATAN2:
> +       fntype = build_function_type_list (type_out, type_in, type_in, NULL);
> +       argencoding = "vv";
> +       break;
> +
> +      default:
> +       return NULL_TREE;
> +    }
> +
> +  tree fndecl = mathfn_built_in (mode == DFmode
> +                                ? double_type_node : float_type_node, fn);
> +  const char *scalar_name = IDENTIFIER_POINTER (DECL_NAME (fndecl));
> +  /* Builtins will always be prefixed with '__builtin_'.  */
> +  gcc_assert (strncmp (scalar_name, "__builtin_", 10) == 0);
> +  scalar_name += 10;
> +
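> +  /* Build the mangled name following the AArch64 vector function ABI,
> +     which SLEEF's GNUABI implements: "_ZGV", the ISA ('n' for Advanced
> +     SIMD, 's' for SVE), 'N' for an unmasked variant, the element count
> +     ('x' when scalable), one letter per parameter ('v' for a plain
> +     vector argument), then '_' and the scalar function name.  */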
> +  char vectorized_name[32];
> +  if (is_scalable)
> +    {
> +      /* SVE ISA */
> +      int n = snprintf (vectorized_name, sizeof (vectorized_name),
> +                       "_ZGVsNx%s_%s", argencoding, scalar_name);
> +      if (n < 0 || (size_t) n >= sizeof (vectorized_name))
> +       return NULL_TREE;
> +    }
> +  else
> +    {
> +      /* NEON ISA */
> +      int n = snprintf (vectorized_name, sizeof (vectorized_name),
> +                       "_ZGVnN%d%s_%s", mode == SFmode ? 4 : 2,
> +                       argencoding, scalar_name);
> +      if (n < 0 || (size_t) n >= sizeof (vectorized_name))
> +       return NULL_TREE;
> +    }
> +
> +  tree new_fndecl = build_decl (BUILTINS_LOCATION, FUNCTION_DECL,
> +                               get_identifier (vectorized_name), fntype);
> +  TREE_PUBLIC (new_fndecl) = 1;
> +  TREE_READONLY (new_fndecl) = 1;
> +  DECL_EXTERNAL (new_fndecl) = 1;
> +  DECL_IS_NOVOPS (new_fndecl) = 1;
> +
> +  return new_fndecl;
> +}
> +
>  #undef AARCH64_CHECK_BUILTIN_MODE
>  #undef AARCH64_FIND_FRINT_VARIANT
>  #undef CF0
> diff --git a/gcc/config/aarch64/aarch64-opts.h b/gcc/config/aarch64/aarch64-opts.h
> index a9f3e2715ca..d12871b893c 100644
> --- a/gcc/config/aarch64/aarch64-opts.h
> +++ b/gcc/config/aarch64/aarch64-opts.h
> @@ -98,4 +98,9 @@ enum aarch64_key_type {
>    AARCH64_KEY_B
>  };
>
> +enum aarch64_veclibabi {
> +  aarch64_veclibabi_type_none,
> +  aarch64_veclibabi_type_sleefgnu
> +};
> +
>  #endif
> diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
> index 63339fa47df..53c6e455da8 100644
> --- a/gcc/config/aarch64/aarch64-protos.h
> +++ b/gcc/config/aarch64/aarch64-protos.h
> @@ -1066,4 +1066,7 @@ extern bool aarch64_harden_sls_blr_p (void);
>
>  extern void aarch64_output_patchable_area (unsigned int, bool);
>
> +extern tree aarch64_builtin_vectorized_function (unsigned int fn,
> +                                                tree type_out, tree type_in);
> +
>  #endif /* GCC_AARCH64_PROTOS_H */
> diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
> index 42617ced73a..50ac37ff01e 100644
> --- a/gcc/config/aarch64/aarch64.cc
> +++ b/gcc/config/aarch64/aarch64.cc
> @@ -84,6 +84,7 @@
>  #include "aarch64-feature-deps.h"
>  #include "config/arm/aarch-common.h"
>  #include "config/arm/aarch-common-protos.h"
> +#include "print-tree.h"
>
>  /* This file should be included last.  */
>  #include "target-def.h"
> @@ -2951,6 +2952,62 @@ pure_scalable_type_info::analyze (const_tree type)
>        return IS_PST;
>      }
>
> +  /* Only functions and types that are part of the Arm C Language
> +     Extensions (arm_sve.h) carry the SVE type attributes.
> +     The auto-vectorizer does not annotate the vector types it creates
> +     with those attributes, so with support for vectorized libm function
> +     builtins for SVE, scalable vectors without these attributes have
> +     to be handled as well.  */
> +  if (TREE_CODE (type) == VECTOR_TYPE
> +      && !TYPE_VECTOR_SUBPARTS (type).is_constant ())
> +    {
> +      /* Boolean vectors are special because they are used by
> +        the vectorizer as masks that must go into the
> +        predicate registers.  */
> +      if (TREE_CODE (TREE_TYPE (type)) == BOOLEAN_TYPE)
> +       {
> +         p.num_zr = 0;
> +         p.num_pr = 1;
> +         p.mode = p.orig_mode = TYPE_MODE (type);
> +         add_piece (p);
> +         return IS_PST;
> +       }
> +
> +      static const struct {
> +       machine_mode mode;
> +       unsigned int element_size;
> +       poly_uint64 vector_size;
> +      } valid_vectors[] = {
> +       { VNx8BFmode, 16, poly_uint64 (8, 8) }, /* svbfloat16_t */
> +       { VNx8HFmode, 16, poly_uint64 (8, 8) }, /* svfloat16_t */
> +       { VNx4SFmode, 32, poly_uint64 (4, 4) }, /* svfloat32_t */
> +       { VNx2DFmode, 64, poly_uint64 (2, 2) }, /* svfloat64_t */
> +       { VNx16QImode, 8, poly_uint64 (16, 16) }, /* sv[u]int8_t */
> +       { VNx8HImode, 16, poly_uint64 (8, 8) }, /* sv[u]int16_t */
> +       { VNx4SImode, 32, poly_uint64 (4, 4) }, /* sv[u]int32_t */
> +       { VNx2DImode, 64, poly_uint64 (2, 2) }, /* sv[u]int64_t */
> +      };
> +
> +      machine_mode elm_mode = TYPE_MODE (TREE_TYPE (type));
> +      unsigned int elm_size = GET_MODE_BITSIZE (elm_mode).to_constant ();
> +      for (unsigned i = 0;
> +          i < sizeof (valid_vectors) / sizeof (valid_vectors[0]); i++)
> +       if (valid_vectors[i].element_size == elm_size
> +           && valid_vectors[i].mode == TYPE_MODE (type)
> +           && known_eq (valid_vectors[i].vector_size,
> +                        TYPE_VECTOR_SUBPARTS (type)))
> +         {
> +           p.num_zr = 1;
> +           p.num_pr = 0;
> +           p.mode = p.orig_mode = valid_vectors[i].mode;
> +           add_piece (p);
> +           return IS_PST;
> +         }
> +
> +      fatal_error (input_location, "unsupported vector type %qT"
> +                  " as function parameter without SVE attributes", type);
> +    }
> +
>    /* Check for user-defined PSTs.  */
>    if (TREE_CODE (type) == ARRAY_TYPE)
>      return analyze_array (type);
> @@ -17851,6 +17908,8 @@ aarch64_override_options_after_change_1 (struct gcc_options *opts)
>      flag_mrecip_low_precision_sqrt = true;
>  }
>
> +enum aarch64_veclibabi aarch64_selected_veclibabi = aarch64_veclibabi_type_none;
> +
>  /* 'Unpack' up the internal tuning structs and update the options
>      in OPTS.  The caller must have set up selected_tune and selected_arch
>      as all the other target-specific codegen decisions are
> @@ -18031,6 +18090,9 @@ aarch64_override_options_internal (struct gcc_options *opts)
>        && opts->x_optimize >= aarch64_tune_params.prefetch->default_opt_level)
>      opts->x_flag_prefetch_loop_arrays = 1;
>
> +  if (opts->x_aarch64_veclibabi_type == aarch64_veclibabi_type_sleefgnu)
> +    aarch64_selected_veclibabi = aarch64_veclibabi_type_sleefgnu;
> +
>    aarch64_override_options_after_change_1 (opts);
>  }
>
> @@ -28085,6 +28147,10 @@ aarch64_libgcc_floating_mode_supported_p
>  #undef TARGET_CONST_ANCHOR
>  #define TARGET_CONST_ANCHOR 0x1000000
>
> +#undef TARGET_VECTORIZE_BUILTIN_VECTORIZED_FUNCTION
> +#define TARGET_VECTORIZE_BUILTIN_VECTORIZED_FUNCTION \
> +  aarch64_builtin_vectorized_function
> +
>  struct gcc_target targetm = TARGET_INITIALIZER;
>
>  #include "gt-aarch64.h"
> diff --git a/gcc/config/aarch64/aarch64.opt b/gcc/config/aarch64/aarch64.opt
> index 1d7967db9c0..76013dacdea 100644
> --- a/gcc/config/aarch64/aarch64.opt
> +++ b/gcc/config/aarch64/aarch64.opt
> @@ -302,3 +302,18 @@ Constant memset size in bytes from which to start using MOPS sequence.
>  -param=aarch64-vect-unroll-limit=
>  Target Joined UInteger Var(aarch64_vect_unroll_limit) Init(4) Param
>  Limit how much the autovectorizer may unroll a loop.
> +
> +;; -mveclibabi=
> +TargetVariable
> +enum aarch64_veclibabi aarch64_veclibabi_type = aarch64_veclibabi_type_none
> +
> +mveclibabi=
> +Target RejectNegative Joined Var(aarch64_veclibabi_type) Enum(aarch64_veclibabi) Init(aarch64_veclibabi_type_none)
> +Vector library ABI to use.
> +
> +Enum
> +Name(aarch64_veclibabi) Type(enum aarch64_veclibabi)
> +Known vectorization library ABIs (for use with the -mveclibabi= option):
> +
> +EnumValue
> +Enum(aarch64_veclibabi) String(sleefgnu) Value(aarch64_veclibabi_type_sleefgnu)
> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index a38547f53e5..71fbbf27522 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -20383,6 +20383,21 @@ across releases.
>
>  This option is only intended to be useful when developing GCC.
>
> +@opindex mveclibabi
> +@item -mveclibabi=@var{type}
> +Specifies the ABI type to use for vectorizing intrinsics using an
> +external library.  The only type supported at present is @samp{sleefgnu},
> +which selects the GNU ABI variant of the SLEEF Vectorized Math Library.
> +This flag can be used for both Advanced SIMD (NEON) and SVE.
> +
> +GCC currently emits vectorized calls to @code{exp}, @code{log}, @code{log10},
> +@code{tanh}, @code{tan}, @code{atan}, @code{atanh}, @code{cbrt}, @code{sinh},
> +@code{sin}, @code{asinh}, @code{asin}, @code{cosh}, @code{cos}, @code{acosh},
> +@code{acos}, @code{pow} and @code{atan2} when possible and profitable
> +on AArch64.
> +
> +Both @option{-ftree-vectorize} and @option{-funsafe-math-optimizations}
> +must also be enabled, and the SLEEF GNUABI library must be specified at
> +link time.
> +
>  @opindex mverbose-cost-dump
>  @item -mverbose-cost-dump
>  Enable verbose cost model dumping in the debug dump files.  This option is
> diff --git a/gcc/testsuite/gcc.target/aarch64/vect-vecabi-sleefgnu-neon.c b/gcc/testsuite/gcc.target/aarch64/vect-vecabi-sleefgnu-neon.c
> new file mode 100644
> index 00000000000..e9f6078cd12
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/vect-vecabi-sleefgnu-neon.c
> @@ -0,0 +1,16 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O3 -march=armv8-a+simd -ftree-vectorize -mveclibabi=sleefgnu -ffast-math" } */
> +
> +extern float sinf(float);
> +
> +float x[256];
> +
> +void foo(void)
> +{
> +  int i;
> +
> +  for (i=0; i<256; ++i)
> +    x[i] = sinf(x[i]);
> +}
> +
> +/* { dg-final { scan-assembler "_ZGVnN4v_sinf" } } */
> diff --git a/gcc/testsuite/gcc.target/aarch64/vect-vecabi-sleefgnu-sve.c b/gcc/testsuite/gcc.target/aarch64/vect-vecabi-sleefgnu-sve.c
> new file mode 100644
> index 00000000000..8319ae420e1
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/vect-vecabi-sleefgnu-sve.c
> @@ -0,0 +1,16 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O3 -march=armv8-a+sve -ftree-vectorize -mveclibabi=sleefgnu -ffast-math" } */
> +
> +extern float sinf(float);
> +
> +float x[256];
> +
> +void foo(void)
> +{
> +  int i;
> +
> +  for (i=0; i<256; ++i)
> +    x[i] = sinf(x[i]);
> +}
> +
> +/* { dg-final { scan-assembler "_ZGVsNxv_sinf" } } */
> --
> 2.25.1
>
  
Lou Knauer April 14, 2023, 9:34 a.m. UTC | #2
> -----Original Message-----
> From: Andrew Pinski <pinskia@gmail.com>
> Sent: Friday, April 14, 2023 09:08
> To: Lou Knauer <lou.knauer@sipearl.com>
> Cc: gcc-patches@gcc.gnu.org; Etienne Renault <etienne.renault@sipearl.com>
> Subject: Re: [PATCH] aarch64: Add -mveclibabi=sleefgnu
> 
> On Fri, Apr 14, 2023 at 12:03 AM Lou Knauer via Gcc-patches
> <gcc-patches@gcc.gnu.org> wrote:
> >
> > This adds support for the -mveclibabi option to the AArch64 backend of GCC by
> > implementing the builtin_vectorized_function target hook for AArch64.
> > The SLEEF Vectorized Math Library's GNUABI interface is used, and
> > NEON/Advanced SIMD as well as SVE are supported.
> >
> > This was tested on the gcc testsuite and the llvm-test-suite on an AArch64
> > host for NEON and SVE, as well as on hand-written benchmarks. Where the
> > vectorization of builtins was applied successfully in loops dominated by
> > calls to those builtins, significant (>2x) performance gains were observed.
> 
> This is so wrong, and it would be better if you actually just used a header
> file instead, specifically the OpenMP vect pragmas.
> 
> Thanks,
> Andrew Pinski
> 

Thank you for your quick response. I do not fully understand your point:
the OpenMP Declare SIMD pragmas are not yet implemented for SVE (here [0]
someone started working on that, but it does not work in its current state).
The `-mveclibabi` flag seems to be the only solution for SVE vectorization of
libm functions from our point of view.

Indeed, a custom header that redirects regular libm function calls to their
Sleef equivalents would be a solution for NEON, since OpenMP Declare SIMD
pragmas are implemented for NEON in GCC. Nonetheless, as far as I can tell,
libmvec is not yet supported for AArch64, so Sleef is unavoidable. I
therefore opted for a solution similar to the one for x86 and SVML, where
only an additional flag during compilation is needed (instead of having to
modify source code to add includes). From a vectorization legality
perspective, this strategy also seems more reliable than a redirecting
header, since Sleef functions (even the scalar ones) never set errno, and
GCC already verifies such details when transforming libm calls to builtins.
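
For reference, the redirecting header described above could look roughly
like this for NEON (a sketch only; it assumes GCC's simd-clone mangling
for this declaration resolves to the _ZGVnN4v_sinf symbol that Sleef's
GNUABI build provides):

  /* sleef-redirect.h - a hypothetical redirection header.  */
  #pragma omp declare simd simdlen(4) notinbranch
  extern float sinf (float);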

Alternatively, would you prefer a patch that adds SVE support for
#pragma omp declare simd declarations, thus enabling the same header-based
strategy for SVE as for NEON?

Thank you and kind regards,
Lou Knauer

[0]: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96342

> > [... full quoted patch trimmed ...]
  
Andre Vieira (lists) April 14, 2023, 10:30 a.m. UTC | #3
I have (outdated) RFCs here:
https://gcc.gnu.org/pipermail/gcc-patches/2023-March/613593.html

I am working on this patch series for stage 1. The features I am
working on are:
* SVE support for #pragma omp declare simd
* Support for simdclone usage in autovec from #pragma omp declare variant
   This offers us a more fine-tuned approach to define what is and 
what's not available per function
* Support for use of simdclones in SLP

I am also planning to enable the use of mixed types, which is currently
disabled for AArch64. It's not a feature I suspect we need for our
use case, but it will enable better testing, as we can then enable AArch64
as a simdclone target in the testsuite.

I could try to post some updates to the RFCs; I have been rebasing them
on top of Andrew Stubbs' latest patch to enable inbranch codegen. Let me
know if you'd like to see these updates sooner rather than later so you
can try them out for your use case.

Kind regards,
Andre

On 14/04/2023 10:34, Lou Knauer via Gcc-patches wrote:
>> [... quoted message from Andrew Pinski trimmed ...]
> 
> Thank you for your quick response. I do not fully understand your point:
> the OpenMP Declare SIMD pragmas are not yet implemented for SVE (here [0]
> someone started working on that, but it does not work in its current state).
> The `-mveclibabi` flag seems to be the only solution for SVE vectorization of
> libm functions from our point of view.
> 
> Indeed, a custom header that redirects regular libm function calls to their
> Sleef equivalents would be a solution for NEON, since OpenMP Declare SIMD
> pragmas are implemented for NEON in GCC. Nonetheless, as far as I can tell,
> libmvec is not yet supported for AArch64, so Sleef is unavoidable. I
> therefore opted for a solution similar to the one for x86 and SVML, where
> only an additional flag during compilation is needed (instead of having to
> modify source code to add includes). From a vectorization legality
> perspective, this strategy also seems more reliable than a redirecting
> header, since Sleef functions (even the scalar ones) never set errno, and
> GCC already verifies such details when transforming libm calls to builtins.
> 
> Alternatively, would you prefer a patch that adds SVE support for
> #pragma omp declare simd declarations, thus enabling the same header-based
> strategy for SVE as for NEON?
> 
> Thank you and kind regards,
> Lou Knauer
> 
> [0]: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96342
> 
>>> [... full quoted patch trimmed ...]
  
Lou Knauer April 14, 2023, 3:07 p.m. UTC | #4
Hello Andre,

Thank you for your helpful reply. I can see that your patches,
combined with upcoming AArch64 support in libmvec,
will obviate the need for my patch.

I will definitely test your patches and look forward to seeing them
evolve; I will rebase and apply them myself.

Thank you very much!
Kind regards,
Lou

> -----Original Message-----
> From: Andre Vieira (lists) <andre.simoesdiasvieira@arm.com>
> Sent: Friday, April 14, 2023 12:30
> To: Lou Knauer <lou.knauer@sipearl.com>; Andrew Pinski <pinskia@gmail.com>
> Cc: gcc-patches@gcc.gnu.org; Etienne Renault <etienne.renault@sipearl.com>
> Subject: Re: [PATCH] aarch64: Add -mveclibabi=sleefgnu
> 
> I have (outdated) RFCs here:
> https://gcc.gnu.org/pipermail/gcc-patches/2023-March/613593.html
> 
> I am working on this patch series for stage 1. The features I am
> working on are:
> * SVE support for #pragma omp declare simd
> * Support for simdclone usage in autovec from #pragma omp declare variant
>    This offers us a more fine-tuned approach to define what is and
> what's not available per function
> * Support for use of simdclones in SLP
> 
> I am also planning to enable the use of mixed types, which is currently
> disabled for AArch64. It's not a feature I suspect we need for our
> use case, but it will enable better testing, as we can then enable AArch64
> as a simdclone target in the testsuite.
> 
> I could try to post some updates to the RFCs; I have been rebasing them
> on top of Andrew Stubbs' latest patch to enable inbranch codegen. Let me
> know if you'd like to see these updates sooner rather than later so you
> can try them out for your use case.
> 
> Kind regards,
> Andre
> 
> On 14/04/2023 10:34, Lou Knauer via Gcc-patches wrote:
> >> -----Original Message-----
> >> From: Andrew Pinski <pinskia@gmail.com>
> >> Sent: Friday, April 14, 2023 09:08
> >> To: Lou Knauer <lou.knauer@sipearl.com>
> >> Cc: gcc-patches@gcc.gnu.org; Etienne Renault <etienne.renault@sipearl.com>
> >> Subject: Re: [PATCH] aarch64: Add -mveclibabi=sleefgnu
> >>
> >> On Fri, Apr 14, 2023 at 12:03 AM Lou Knauer via Gcc-patches
> >> <gcc-patches@gcc.gnu.org> wrote:
> >>>
> >>> This adds support for the -mveclibabi option to the AArch64 backend of GCC by
> >>> implementing the builtin_vectorized_function target hook for AArch64.
> >>> The SLEEF Vectorized Math Library's GNUABI interface is used, and
> >>> NEON/Advanced SIMD as well as SVE are supported.
> >>>
> >>> This was tested on the gcc testsuite and the llvm-test-suite on a AArch64
> >>> host for NEON and SVE as well as on hand-written benchmarks. Where the
> >>> vectorization of builtins was applied successfully in loops bound by the
> >>> calls to those, significant (>2) performance gains can be observed.
> >>
> >> This is so wrong and it is better if you actually just used a header
> >> file instead.  Specifically the openmp vect pragmas.
> >>
> >> Thanks,
> >> Andrew Pinski
> >>
> >
> > Thank you for your quick response. I do not fully understand your point:
> > the OpenMP Declare SIMD pragmas are not yet implemented for SVE (here [0]
> > someone started working on that, but it does not work in its current state).
> > The `-mveclibabi` flag seems to be the only solution for SVE vectorization of
> > libm functions from our point of view.
> >
> > Indeed, a custom header that redirects regular libm function calls to their
> > Sleef equivalent would be a solution for NEON since OpenMP Declare SIMD
> > pragmas are implemented for NEON in GCC. Nonetheless, as far as I can tell,
> > libmvec is not yet supported for AArch64, so SLEEF is unavoidable. I
> > therefore opted for a solution similar to the one for x86 and the SVML, where
> > only an additional flag during compilation is needed (instead of having to
> > modify source code to add includes). From a vectorization legality perspective,
> > this strategy also seems more reliable than a redirecting header, since
> > SLEEF functions (even the scalar ones) never set errno and GCC already
> > verifies such details when transforming libm calls to builtins.
> >
> > Alternatively, do you prefer a patch that adds SVE support for
> > #pragma omp declare simd declarations, thus enabling the same header-based
> > strategy for SVE as for NEON?
> >
> > Thank you and kind regards,
> > Lou Knauer
> >
> > [0]: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96342
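
For illustration, a redirecting header of the kind discussed above might
look as follows for NEON. This is a minimal sketch, not part of the
patch: it assumes GCC's #pragma omp declare simd support (enabled with
-fopenmp-simd) and that libsleefgnu exports the vector-function-ABI
symbol _ZGVnN4v_sinf.

/* sleef-redirect.h: hypothetical redirecting header (sketch only).
   Declaring sinf with 'omp declare simd' tells the vectorizer that a
   4-lane unmasked Advanced SIMD variant exists; the mangled name GCC
   then calls, _ZGVnN4v_sinf, is the symbol libsleefgnu provides.  */
#pragma omp declare simd simdlen(4) notinbranch
extern float sinf (float);

Compiling a loop over sinf with -O3 -ftree-vectorize -fopenmp-simd
-ffast-math should then emit calls to _ZGVnN4v_sinf without any new
target hook; this is the NEON-only alternative discussed above.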
  
Andre Vieira (lists) April 14, 2023, 3:23 p.m. UTC | #5
No problem, Lou, and testing will be appreciated.

I strongly recommend against rebasing the version that is on the mailing
list now; the conflicts with Andrew's patches aren't simple to resolve ;)

I'll do my best to get you revised versions next week :)

  

Patch

diff --git a/gcc/config/aarch64/aarch64-builtins.cc b/gcc/config/aarch64/aarch64-builtins.cc
index cc6b7c01fd1..f53fa91b8d0 100644
--- a/gcc/config/aarch64/aarch64-builtins.cc
+++ b/gcc/config/aarch64/aarch64-builtins.cc
@@ -47,6 +47,7 @@ 
 #include "stringpool.h"
 #include "attribs.h"
 #include "gimple-fold.h"
+#include "builtins.h"

 #define v8qi_UP  E_V8QImode
 #define v8di_UP  E_V8DImode
@@ -3450,6 +3451,118 @@  aarch64_resolve_overloaded_builtin_general (location_t loc, tree function,
   return NULL_TREE;
 }

+/* The vector library ABI to use, if any.  */
+extern enum aarch64_veclibabi aarch64_selected_veclibabi;
+
+/* Return a function declaration for a vectorized version of the combined
+   function with combined_fn code FN_CODE, result vector type TYPE_OUT and
+   argument vector type TYPE_IN.  Return NULL_TREE if none is available.  */
+tree
+aarch64_builtin_vectorized_function (unsigned int fn_code,
+                                    tree type_out, tree type_in)
+{
+  if (TREE_CODE (type_out) != VECTOR_TYPE
+      || TREE_CODE (type_in) != VECTOR_TYPE
+      || aarch64_selected_veclibabi != aarch64_veclibabi_type_sleefgnu
+      || !flag_unsafe_math_optimizations)
+    return NULL_TREE;
+
+  machine_mode mode = TYPE_MODE (TREE_TYPE (type_out));
+  poly_uint64 n = TYPE_VECTOR_SUBPARTS (type_out);
+  if (mode != TYPE_MODE (TREE_TYPE (type_in))
+      || !known_eq (n, TYPE_VECTOR_SUBPARTS (type_in)))
+    return NULL_TREE;
+
+  bool is_scalable = !n.is_constant ();
+  if (is_scalable)
+    {
+      /* SVE is needed for scalable vectors; an SVE register's size is
+        always a multiple of 128 bits.  */
+      if (!TARGET_SVE
+         || (mode == DFmode && !known_eq (n, poly_uint64 (2, 2)))
+         || (mode == SFmode && !known_eq (n, poly_uint64 (4, 4))))
+       return NULL_TREE;
+    }
+  else
+    {
+      /* A NEON register can hold two doubles or four floats.  */
+      if (!TARGET_SIMD
+         || (mode == DFmode && n.to_constant () != 2)
+         || (mode == SFmode && n.to_constant () != 4))
+       return NULL_TREE;
+    }
+
+  tree fntype;
+  combined_fn fn = combined_fn (fn_code);
+  const char *argencoding;
+  switch (fn)
+    {
+      CASE_CFN_EXP:
+      CASE_CFN_LOG:
+      CASE_CFN_LOG10:
+      CASE_CFN_TANH:
+      CASE_CFN_TAN:
+      CASE_CFN_ATAN:
+      CASE_CFN_ATANH:
+      CASE_CFN_CBRT:
+      CASE_CFN_SINH:
+      CASE_CFN_SIN:
+      CASE_CFN_ASINH:
+      CASE_CFN_ASIN:
+      CASE_CFN_COSH:
+      CASE_CFN_COS:
+      CASE_CFN_ACOSH:
+      CASE_CFN_ACOS:
+       fntype = build_function_type_list (type_out, type_in, NULL);
+       argencoding = "v";
+       break;
+
+      CASE_CFN_POW:
+      CASE_CFN_ATAN2:
+       fntype = build_function_type_list (type_out, type_in, type_in, NULL);
+       argencoding = "vv";
+       break;
+
+      default:
+       return NULL_TREE;
+    }
+
+  tree fndecl = mathfn_built_in (mode == DFmode
+                                ? double_type_node : float_type_node, fn);
+  const char *scalar_name = IDENTIFIER_POINTER (DECL_NAME (fndecl));
+  /* Builtins will always be prefixed with '__builtin_'.  */
+  gcc_assert (strncmp (scalar_name, "__builtin_", 10) == 0);
+  scalar_name += 10;
+
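+  /* Build the mangled name defined by the AArch64 vector function ABI:
+     _ZGV<isa><mask><len><parameters>_<name>, where 'n' selects Advanced
+     SIMD and 's' selects SVE, 'N' means unmasked, <len> is the number of
+     lanes ('x' for scalable vectors), and each 'v' marks a plain vector
+     parameter.  */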
+  char vectorized_name[32];
+  if (is_scalable)
+    {
+      /* SVE ISA */
+      int n = snprintf (vectorized_name, sizeof (vectorized_name),
+                       "_ZGVsNx%s_%s", argencoding, scalar_name);
+      if (n < 0 || (size_t) n >= sizeof (vectorized_name))
+       return NULL_TREE;
+    }
+  else
+    {
+      /* NEON ISA */
+      int n = snprintf (vectorized_name, sizeof (vectorized_name),
+                       "_ZGVnN%d%s_%s", mode == SFmode ? 4 : 2,
+                       argencoding, scalar_name);
+      if (n < 0 || (size_t) n >= sizeof (vectorized_name))
+       return NULL_TREE;
+    }
+
+  tree new_fndecl = build_decl (BUILTINS_LOCATION, FUNCTION_DECL,
+                               get_identifier (vectorized_name), fntype);
+  TREE_PUBLIC (new_fndecl) = 1;
+  TREE_READONLY (new_fndecl) = 1;
+  DECL_EXTERNAL (new_fndecl) = 1;
+  DECL_IS_NOVOPS (new_fndecl) = 1;
+
+  return new_fndecl;
+}
+
 #undef AARCH64_CHECK_BUILTIN_MODE
 #undef AARCH64_FIND_FRINT_VARIANT
 #undef CF0
diff --git a/gcc/config/aarch64/aarch64-opts.h b/gcc/config/aarch64/aarch64-opts.h
index a9f3e2715ca..d12871b893c 100644
--- a/gcc/config/aarch64/aarch64-opts.h
+++ b/gcc/config/aarch64/aarch64-opts.h
@@ -98,4 +98,9 @@  enum aarch64_key_type {
   AARCH64_KEY_B
 };

+enum aarch64_veclibabi {
+  aarch64_veclibabi_type_none,
+  aarch64_veclibabi_type_sleefgnu
+};
+
 #endif
diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index 63339fa47df..53c6e455da8 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -1066,4 +1066,7 @@  extern bool aarch64_harden_sls_blr_p (void);

 extern void aarch64_output_patchable_area (unsigned int, bool);

+extern tree aarch64_builtin_vectorized_function (unsigned int fn,
+                                                tree type_out, tree type_in);
+
 #endif /* GCC_AARCH64_PROTOS_H */
diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 42617ced73a..50ac37ff01e 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -84,6 +84,7 @@ 
 #include "aarch64-feature-deps.h"
 #include "config/arm/aarch-common.h"
 #include "config/arm/aarch-common-protos.h"
+#include "print-tree.h"

 /* This file should be included last.  */
 #include "target-def.h"
@@ -2951,6 +2952,62 @@  pure_scalable_type_info::analyze (const_tree type)
       return IS_PST;
     }

+  /* Only functions and types that are part of the ARM C Language
+     Extensions (arm_sve.h) carry the SVE type attributes.  The
+     auto-vectorizer does not annotate the vector types it creates with
+     those attributes, so with support for vectorized libm function
+     builtins for SVE, scalable vectors without the attributes have to
+     be handled here as well.  */
+  if (TREE_CODE (type) == VECTOR_TYPE
+      && !TYPE_VECTOR_SUBPARTS (type).is_constant ())
+    {
+      /* Boolean vectors are special because they are used by
+        the vectorizer as masks that must go into the
+        predicate registers.  */
+      if (TREE_CODE (TREE_TYPE (type)) == BOOLEAN_TYPE)
+       {
+         p.num_zr = 0;
+         p.num_pr = 1;
+         p.mode = p.orig_mode = TYPE_MODE (type);
+         add_piece (p);
+         return IS_PST;
+       }
+
+      static const struct {
+       machine_mode mode;
+       unsigned int element_size;
+       poly_uint64 vector_size;
+      } valid_vectors[] = {
+       { VNx8BFmode, 16, poly_uint64 (8, 8) }, /* svbfloat16_t */
+       { VNx8HFmode, 16, poly_uint64 (8, 8) }, /* svfloat16_t */
+       { VNx4SFmode, 32, poly_uint64 (4, 4) }, /* svfloat32_t */
+       { VNx2DFmode, 64, poly_uint64 (2, 2) }, /* svfloat64_t */
+       { VNx16QImode, 8, poly_uint64 (16, 16) }, /* sv[u]int8_t */
+       { VNx8HImode, 16, poly_uint64 (8, 8) }, /* sv[u]int16_t */
+       { VNx4SImode, 32, poly_uint64 (4, 4) }, /* sv[u]int32_t */
+       { VNx2DImode, 64, poly_uint64 (2, 2) }, /* sv[u]int64_t */
+      };
+
+      machine_mode elm_mode = TYPE_MODE (TREE_TYPE (type));
+      unsigned int elm_size = GET_MODE_BITSIZE (elm_mode).to_constant ();
+      for (unsigned i = 0;
+          i < sizeof (valid_vectors) / sizeof (valid_vectors[0]); i++)
+       if (valid_vectors[i].element_size == elm_size
+           && valid_vectors[i].mode == TYPE_MODE (type)
+           && known_eq (valid_vectors[i].vector_size,
+                        TYPE_VECTOR_SUBPARTS (type)))
+         {
+           p.num_zr = 1;
+           p.num_pr = 0;
+           p.mode = p.orig_mode = valid_vectors[i].mode;
+           add_piece (p);
+           return IS_PST;
+         }
+
+      fatal_error (input_location, "unsupported vector type %qT"
+                  " as function parameter without SVE attributes", type);
+    }
+
   /* Check for user-defined PSTs.  */
   if (TREE_CODE (type) == ARRAY_TYPE)
     return analyze_array (type);
@@ -17851,6 +17908,8 @@  aarch64_override_options_after_change_1 (struct gcc_options *opts)
     flag_mrecip_low_precision_sqrt = true;
 }

+enum aarch64_veclibabi aarch64_selected_veclibabi = aarch64_veclibabi_type_none;
+
 /* 'Unpack' up the internal tuning structs and update the options
     in OPTS.  The caller must have set up selected_tune and selected_arch
     as all the other target-specific codegen decisions are
@@ -18031,6 +18090,9 @@  aarch64_override_options_internal (struct gcc_options *opts)
       && opts->x_optimize >= aarch64_tune_params.prefetch->default_opt_level)
     opts->x_flag_prefetch_loop_arrays = 1;

+  if (opts->x_aarch64_veclibabi_type == aarch64_veclibabi_type_sleefgnu)
+    aarch64_selected_veclibabi = aarch64_veclibabi_type_sleefgnu;
+
   aarch64_override_options_after_change_1 (opts);
 }

@@ -28085,6 +28147,10 @@  aarch64_libgcc_floating_mode_supported_p
 #undef TARGET_CONST_ANCHOR
 #define TARGET_CONST_ANCHOR 0x1000000

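+/* The auto-vectorizer queries this hook when it finds a call to a
+   supported math builtin in a loop it is vectorizing.  */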
+#undef TARGET_VECTORIZE_BUILTIN_VECTORIZED_FUNCTION
+#define TARGET_VECTORIZE_BUILTIN_VECTORIZED_FUNCTION \
+  aarch64_builtin_vectorized_function
+
 struct gcc_target targetm = TARGET_INITIALIZER;

 #include "gt-aarch64.h"
diff --git a/gcc/config/aarch64/aarch64.opt b/gcc/config/aarch64/aarch64.opt
index 1d7967db9c0..76013dacdea 100644
--- a/gcc/config/aarch64/aarch64.opt
+++ b/gcc/config/aarch64/aarch64.opt
@@ -302,3 +302,18 @@  Constant memset size in bytes from which to start using MOPS sequence.
 -param=aarch64-vect-unroll-limit=
 Target Joined UInteger Var(aarch64_vect_unroll_limit) Init(4) Param
 Limit how much the autovectorizer may unroll a loop.
+
+;; -mveclibabi=
+TargetVariable
+enum aarch64_veclibabi aarch64_veclibabi_type = aarch64_veclibabi_type_none
+
+mveclibabi=
+Target RejectNegative Joined Var(aarch64_veclibabi_type) Enum(aarch64_veclibabi) Init(aarch64_veclibabi_type_none)
+Vector library ABI to use.
+
+Enum
+Name(aarch64_veclibabi) Type(enum aarch64_veclibabi)
+Known vectorization library ABIs (for use with the -mveclibabi= option):
+
+EnumValue
+Enum(aarch64_veclibabi) String(sleefgnu) Value(aarch64_veclibabi_type_sleefgnu)
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index a38547f53e5..71fbbf27522 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -20383,6 +20383,21 @@  across releases.

 This option is only intended to be useful when developing GCC.

+@opindex mveclibabi
+@item -mveclibabi=@var{type}
+Specifies the ABI type to use for vectorizing intrinsics using an
+external library.  The only type supported at present is @samp{sleefgnu},
+which selects the GNU ABI variant of the SLEEF Vectorized
+Math Library.  This option can be used for both Advanced SIMD (NEON) and SVE.
+
+GCC currently emits vectorized calls to @code{exp}, @code{log}, @code{log10},
+@code{tanh}, @code{tan}, @code{atan}, @code{atan2}, @code{atanh}, @code{cbrt},
+@code{sinh}, @code{sin}, @code{asinh}, @code{asin}, @code{cosh}, @code{cos},
+@code{acosh}, @code{acos} and @code{pow} when possible and profitable
+on AArch64.
+
+Both @option{-ftree-vectorize} and @option{-funsafe-math-optimizations}
+must also be enabled.  The SLEEF GNUABI library (@code{libsleefgnu}) must
+be specified at link time.
+
 @opindex mverbose-cost-dump
 @item -mverbose-cost-dump
 Enable verbose cost model dumping in the debug dump files.  This option is
diff --git a/gcc/testsuite/gcc.target/aarch64/vect-vecabi-sleefgnu-neon.c b/gcc/testsuite/gcc.target/aarch64/vect-vecabi-sleefgnu-neon.c
new file mode 100644
index 00000000000..e9f6078cd12
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/vect-vecabi-sleefgnu-neon.c
@@ -0,0 +1,16 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O3 -march=armv8-a+simd -ftree-vectorize -mveclibabi=sleefgnu -ffast-math" } */
+
+extern float sinf(float);
+
+float x[256];
+
+void foo(void)
+{
+  int i;
+
+  for (i=0; i<256; ++i)
+    x[i] = sinf(x[i]);
+}
+
+/* { dg-final { scan-assembler "_ZGVnN4v_sinf" } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/vect-vecabi-sleefgnu-sve.c b/gcc/testsuite/gcc.target/aarch64/vect-vecabi-sleefgnu-sve.c
new file mode 100644
index 00000000000..8319ae420e1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/vect-vecabi-sleefgnu-sve.c
@@ -0,0 +1,16 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O3 -march=armv8-a+sve -ftree-vectorize -mveclibabi=sleefgnu -ffast-math" } */
+
+extern float sinf(float);
+
+float x[256];
+
+void foo(void)
+{
+  int i;
+
+  for (i=0; i<256; ++i)
+    x[i] = sinf(x[i]);
+}
+
+/* { dg-final { scan-assembler "_ZGVsNxv_sinf" } } */
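
For reference, a possible build invocation (a sketch only; the library
name and install location depend on how SLEEF was built and installed):
compiling the SVE testcase above with this patch applied would look
roughly like

  gcc -O3 -march=armv8-a+sve -ftree-vectorize -ffast-math \
      -mveclibabi=sleefgnu vect-vecabi-sleefgnu-sve.c -lsleefgnu

after which the generated code should contain calls to _ZGVsNxv_sinf
that the linker resolves from libsleefgnu.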