From patchwork Tue Feb 28 23:01:20 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Kwok Cheung Yeung <kcy@codesourcery.com>
X-Patchwork-Id: 62675
Return-Path: <gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org>
Delivered-To: ouuuleilei@gmail.com
Received: by 2002:a5d:5915:0:0:0:0:0 with SMTP id v21csp3299802wrd;
        Tue, 28 Feb 2023 15:02:11 -0800 (PST)
X-Google-Smtp-Source: 
 AK7set/UvI1N3lh8SXh16upXlnDvw54zGfw53+M0IB7j6MYER/lzltOXxfwScfHHNd4JjPum2cCv
X-Received: by 2002:a17:906:698d:b0:889:5686:486a with SMTP id
 i13-20020a170906698d00b008895686486amr4785428ejr.30.1677625331008;
        Tue, 28 Feb 2023 15:02:11 -0800 (PST)
ARC-Seal: i=1; a=rsa-sha256; t=1677625330; cv=none;
        d=google.com; s=arc-20160816;
        b=pJkhU+KbUcPbPbdPwIkB6wuFFlgvbNkmmW3s+lKaDis5ZX4LzXcxbXBEl7oig5J5iA
         7VzjF1SY0VEw8D3cZZGI8gjM7gKHbQjv6KuLWNw7iXVy4ELfftSQv0lmMIJhCmf8dkx8
         6g6qXXn6ThxcEyU4tYQjDInGADHP4BxNOvGjSeZTDCzSGjdQm4cjQh8XDudCgG4y2YyP
         3Hon71t5YK/7vm4B69Dn0QddSYy3WPmnmjR52CjJhJ4w/fs8EiRlHt//GgxEzXZnYE5e
         UC+13TvZUI4WQeWoVz6YQ22/4bDW9EjYqrwGJeZyUkE1zW3u4D2LP1zUQkmSRxB8BEJK
         Vo5w==
ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com;
 s=arc-20160816;
        h=sender:errors-to:list-subscribe:list-help:list-post:list-archive
         :list-unsubscribe:list-id:precedence:to:subject:from:user-agent
         :mime-version:date:message-id:ironport-sdr:dmarc-filter:delivered-to;
        bh=fK83lKbnRE8JbgsIxjLTwVk9xZLI0QWZTwD98pahBfE=;
        b=wqWaXTTUja3YiLp8Ou2KOmByGqjQknha5PoHwk+px6nEwz1BaW2UooifpUV38pPtlz
         MQhswNI2F2psITqp7FDn3COlTeQp87p3LxHHXwhXTE2d72QJwIkQT1mb5bOVNIJAfolt
         vIoK8/zeJQGKqgcFEZRqRF4f1lcs51DF6PY6rd4TP3DA5Q52bmezzywN4G3ehmHWFeBD
         6/GbzAScAeo80PQ95u3QbN3id2CWSHaAY9B93QomC0h1AmpWpkWI3z74QoBLkReQF9Nn
         ZYWBjiI+O7cHgT8eKp61zDJy5Xe9Gi9+Z3qPl79BU/0kgGWIpqrGWksBGg0dR7dIJGJZ
         bxuQ==
ARC-Authentication-Results: i=1; mx.google.com;
       spf=pass (google.com: domain of
 gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org designates 8.43.85.97 as
 permitted sender)
 smtp.mailfrom="gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org"
Received: from sourceware.org (server2.sourceware.org. [8.43.85.97])
        by mx.google.com with ESMTPS id
 i13-20020a1709063c4d00b008b13839a413si1213817ejg.50.2023.02.28.15.02.10
        for <ouuuleilei@gmail.com>
        (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256);
        Tue, 28 Feb 2023 15:02:10 -0800 (PST)
Received-SPF: pass (google.com: domain of
 gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org designates 8.43.85.97 as
 permitted sender) client-ip=8.43.85.97;
Authentication-Results: mx.google.com;
       spf=pass (google.com: domain of
 gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org designates 8.43.85.97 as
 permitted sender)
 smtp.mailfrom="gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org"
Received: from server2.sourceware.org (localhost [IPv6:::1])
	by sourceware.org (Postfix) with ESMTP id 46138385B50D
	for <ouuuleilei@gmail.com>; Tue, 28 Feb 2023 23:01:58 +0000 (GMT)
X-Original-To: gcc-patches@gcc.gnu.org
Delivered-To: gcc-patches@gcc.gnu.org
Received: from esa2.mentor.iphmx.com (esa2.mentor.iphmx.com [68.232.141.98])
 by sourceware.org (Postfix) with ESMTPS id 46B3C3858D33
 for <gcc-patches@gcc.gnu.org>; Tue, 28 Feb 2023 23:01:29 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 46B3C3858D33
Authentication-Results: sourceware.org; dmarc=none (p=none dis=none)
 header.from=codesourcery.com
Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=mentor.com
X-IronPort-AV: E=Sophos;i="5.98,223,1673942400"; d="scan'208,223";a="99858896"
Received: from orw-gwy-02-in.mentorg.com ([192.94.38.167])
 by esa2.mentor.iphmx.com with ESMTP; 28 Feb 2023 15:01:26 -0800
IronPort-SDR: 
 GOR4QyOhb9nG1umZ7o9qJkQC6D4/544M14F3216cd6f6etKt4lWICJqI64OmqIOmydAuGwTqQu
 n7MVADz+ZxO0ALU1DIXosiN8eiqBfjKTaTNJnym8HYymPbKP0K/x02h4exyhLy03sfeU+DZzit
 n9n47i8yorddqdzeJIZLWeqPLQe+c6bWvtVesYF26B+6CztzFIWLUsIS56ACLr3vE7gBJH49vu
 jYF7hd6hcdVtd79Ta5S72VMLAROXYorSt+jqsm8kUt95UOe8g9pVyBNn24hIHJ+UcugaRnVBEF
 5wQ=
Message-ID: <f85990b0-1a7a-bc17-50c1-ef176cbd0dae@codesourcery.com>
Date: Tue, 28 Feb 2023 23:01:20 +0000
MIME-Version: 1.0
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:102.0) Gecko/20100101
 Thunderbird/102.8.0
From: Kwok Cheung Yeung <kcy@codesourcery.com>
Subject: [PATCH] amdgcn: Enable SIMD vectorization of math functions
To: gcc-patches <gcc-patches@gcc.gnu.org>, <ams@codesourcery.com>
X-Originating-IP: [137.202.0.90]
X-ClientProxiedBy: svr-ies-mbx-13.mgc.mentorg.com (139.181.222.13) To
 svr-ies-mbx-12.mgc.mentorg.com (139.181.222.12)
X-Spam-Status: No, score=-11.8 required=5.0 tests=BAYES_00, GIT_PATCH_0,
 HEADER_FROM_DIFFERENT_DOMAINS, KAM_DMARC_STATUS, KAM_SHORT,
 RCVD_IN_MSPIKE_H2,
 SPF_HELO_PASS, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on
 server2.sourceware.org
X-BeenThere: gcc-patches@gcc.gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Gcc-patches mailing list <gcc-patches.gcc.gnu.org>
List-Unsubscribe: <https://gcc.gnu.org/mailman/options/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=unsubscribe>
List-Archive: <https://gcc.gnu.org/pipermail/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-request@gcc.gnu.org?subject=help>
List-Subscribe: <https://gcc.gnu.org/mailman/listinfo/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=subscribe>
Errors-To: gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org
Sender: "Gcc-patches" <gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org>
X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?=
X-GMAIL-THRID: =?utf-8?q?1759117658893739078?=
X-GMAIL-MSGID: =?utf-8?q?1759117658893739078?=

Hello

This patch implements the TARGET_VECTORIZE_BUILTIN_VECTORIZED_FUNCTION 
target hook for the AMD GCN architecture, such that when vectorized, 
calls to builtin standard math functions such as asinf, exp, pow etc. 
are converted to calls to the recently added vectorized math functions 
for GCN in Newlib. The -fno-math-errno flag is required in addition to 
the usual vectorization optimization flags for this to occur, and some 
of the math functions (the larger double-precision ones) require a large 
stack size to function properly.

This patch requires the GCN vector math functions in Newlib to function 
- these were included in the recent 4.3.0.20230120 snapshot. As this was 
a minimum requirement starting from the patch 'amdgcn, libgomp: Manually 
allocated stacks', this should not be a problem.

I have added new testcases in the testsuite that compare the output of 
the vectorized math functions against the scalar, passing if they are 
sufficiently close. With the testcase for standalone GCN (without 
libgomp) in gcc.target/gcn/, there is a problem since gcn-run currently 
cannot set the stack size correctly in DejaGnu testing, so I have made 
it a compile test for now - it is still useful to check that calls to 
the correct functions are being made. The runtime correctness is still 
covered by the libgomp test.

Okay for trunk?

Thanks

Kwok
From 69d13dc898ff7c70e80299a92dc895a89a9e679b Mon Sep 17 00:00:00 2001
From: Kwok Cheung Yeung <kcy@codesourcery.com>
Date: Tue, 28 Feb 2023 14:15:47 +0000
Subject: [PATCH] amdgcn: Enable SIMD vectorization of math functions

Calls to vectorized versions of routines in the math library will now
be inserted when vectorizing code containing supported math functions.

2023-02-28  Kwok Cheung Yeung  <kcy@codesourcery.com>
	    Paul-Antoine Arras  <pa@codesourcery.com>

	gcc/
	* builtins.cc (mathfn_built_in_explicit): New.
	* config/gcn/gcn.cc: Include case-cfn-macros.h.
	(mathfn_built_in_explicit): Add prototype.
	(gcn_vectorize_builtin_vectorized_function): New.
	(gcn_libc_has_function): New.
	(TARGET_VECTORIZE_BUILTIN_VECTORIZED_FUNCTION): Define.
	(TARGET_LIBC_HAS_FUNCTION): Define.

	gcc/testsuite/
	* gcc.target/gcn/simd-math-1.c: New testcase.

	libgomp/
	* testsuite/libgomp.c/simd-math-1.c: New testcase.
---
 gcc/builtins.cc                            |   8 +
 gcc/config/gcn/gcn.cc                      | 110 +++++++++++
 gcc/testsuite/gcc.target/gcn/simd-math-1.c | 210 ++++++++++++++++++++
 libgomp/testsuite/libgomp.c/simd-math-1.c  | 217 +++++++++++++++++++++
 4 files changed, 545 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/gcn/simd-math-1.c
 create mode 100644 libgomp/testsuite/libgomp.c/simd-math-1.c

diff --git a/gcc/builtins.cc b/gcc/builtins.cc
index 4d467c8c5c1..305c65c29be 100644
--- a/gcc/builtins.cc
+++ b/gcc/builtins.cc
@@ -2089,6 +2089,14 @@ mathfn_built_in (tree type, combined_fn fn)
   return mathfn_built_in_1 (type, fn, /*implicit=*/ 1);
 }
 
+/* Like mathfn_built_in_1, but always use the explicit array.  */
+
+tree
+mathfn_built_in_explicit (tree type, combined_fn fn)
+{
+  return mathfn_built_in_1 (type, fn, /*implicit=*/ 0);
+}
+
 /* Like mathfn_built_in_1, but take a built_in_function and
    always use the implicit array.  */
 
diff --git a/gcc/config/gcn/gcn.cc b/gcc/config/gcn/gcn.cc
index 23ab01e75d8..d99bb63d4c0 100644
--- a/gcc/config/gcn/gcn.cc
+++ b/gcc/config/gcn/gcn.cc
@@ -53,6 +53,7 @@
 #include "dwarf2.h"
 #include "gimple.h"
 #include "cgraph.h"
+#include "case-cfn-macros.h"
 
 /* This file should be included last.  */
 #include "target-def.h"
@@ -5240,6 +5241,110 @@ gcn_simd_clone_usable (struct cgraph_node *ARG_UNUSED (node))
   return 0;
 }
 
+tree mathfn_built_in_explicit (tree, combined_fn);
+
+/* Implement TARGET_VECTORIZE_BUILTIN_VECTORIZED_FUNCTION.
+   Return the function declaration of the vectorized version of the builtin
+   in the math library if available.  */
+
+tree
+gcn_vectorize_builtin_vectorized_function (unsigned int fn, tree type_out,
+					   tree type_in)
+{
+  if (TREE_CODE (type_out) != VECTOR_TYPE
+      || TREE_CODE (type_in) != VECTOR_TYPE)
+    return NULL_TREE;
+
+  machine_mode out_mode = TYPE_MODE (TREE_TYPE (type_out));
+  int out_n = TYPE_VECTOR_SUBPARTS (type_out);
+  machine_mode in_mode = TYPE_MODE (TREE_TYPE (type_in));
+  int in_n = TYPE_VECTOR_SUBPARTS (type_in);
+  combined_fn cfn = combined_fn (fn);
+
+  /* Keep this consistent with the list of vectorized math routines.  */
+  int implicit_p;
+  switch (fn)
+    {
+    CASE_CFN_ACOS:
+    CASE_CFN_ACOSH:
+    CASE_CFN_ASIN:
+    CASE_CFN_ASINH:
+    CASE_CFN_ATAN:
+    CASE_CFN_ATAN2:
+    CASE_CFN_ATANH:
+    CASE_CFN_COPYSIGN:
+    CASE_CFN_COS:
+    CASE_CFN_COSH:
+    CASE_CFN_ERF:
+    CASE_CFN_EXP:
+    CASE_CFN_EXP2:
+    CASE_CFN_FINITE:
+    CASE_CFN_FMOD:
+    CASE_CFN_GAMMA:
+    CASE_CFN_HYPOT:
+    CASE_CFN_ISNAN:
+    CASE_CFN_LGAMMA:
+    CASE_CFN_LOG:
+    CASE_CFN_LOG10:
+    CASE_CFN_LOG2:
+    CASE_CFN_POW:
+    CASE_CFN_REMAINDER:
+    CASE_CFN_RINT:
+    CASE_CFN_SIN:
+    CASE_CFN_SINH:
+    CASE_CFN_SQRT:
+    CASE_CFN_TAN:
+    CASE_CFN_TANH:
+    CASE_CFN_TGAMMA:
+      implicit_p = 1;
+      break;
+
+    CASE_CFN_SCALB:
+    CASE_CFN_SIGNIFICAND:
+      implicit_p = 0;
+      break;
+
+    default:
+      return NULL_TREE;
+    }
+
+  tree out_t_node = (out_mode == DFmode) ? double_type_node : float_type_node;
+  tree fndecl = implicit_p ? mathfn_built_in (out_t_node, cfn)
+			   : mathfn_built_in_explicit (out_t_node, cfn);
+
+  const char *bname = IDENTIFIER_POINTER (DECL_NAME (fndecl));
+  char name[20];
+  sprintf (name, out_mode == DFmode ? "v%ddf_%s" : "v%dsf_%s",
+	   out_n, bname + 10);
+
+  unsigned arity = 0;
+  for (tree args = DECL_ARGUMENTS (fndecl); args; args = TREE_CHAIN (args))
+    arity++;
+
+  tree fntype = (arity == 1)
+		? build_function_type_list (type_out, type_in, NULL)
+		: build_function_type_list (type_out, type_in, type_in, NULL);
+
+  /* Build a function declaration for the vectorized function.  */
+  tree new_fndecl = build_decl (BUILTINS_LOCATION,
+				FUNCTION_DECL, get_identifier (name), fntype);
+  TREE_PUBLIC (new_fndecl) = 1;
+  DECL_EXTERNAL (new_fndecl) = 1;
+  DECL_IS_NOVOPS (new_fndecl) = 1;
+  TREE_READONLY (new_fndecl) = 1;
+
+  return new_fndecl;
+}
+
+/* Implement TARGET_LIBC_HAS_FUNCTION.  */
+
+bool
+gcn_libc_has_function (enum function_class fn_class,
+		       tree type)
+{
+  return bsd_libc_has_function (fn_class, type);
+}
+
 /* }}}  */
 /* {{{ md_reorg pass.  */
 
@@ -7324,6 +7429,11 @@ gcn_dwarf_register_span (rtx rtl)
   gcn_simd_clone_compute_vecsize_and_simdlen
 #undef  TARGET_SIMD_CLONE_USABLE
 #define TARGET_SIMD_CLONE_USABLE gcn_simd_clone_usable
+#undef TARGET_VECTORIZE_BUILTIN_VECTORIZED_FUNCTION
+#define TARGET_VECTORIZE_BUILTIN_VECTORIZED_FUNCTION \
+  gcn_vectorize_builtin_vectorized_function
+#undef TARGET_LIBC_HAS_FUNCTION
+#define TARGET_LIBC_HAS_FUNCTION gcn_libc_has_function
 #undef  TARGET_SMALL_REGISTER_CLASSES_FOR_MODE_P
 #define TARGET_SMALL_REGISTER_CLASSES_FOR_MODE_P \
   gcn_small_register_classes_for_mode_p
diff --git a/gcc/testsuite/gcc.target/gcn/simd-math-1.c b/gcc/testsuite/gcc.target/gcn/simd-math-1.c
new file mode 100644
index 00000000000..54e8761f720
--- /dev/null
+++ b/gcc/testsuite/gcc.target/gcn/simd-math-1.c
@@ -0,0 +1,210 @@
+/* Check that the SIMD versions of math routines give the same (or
+   sufficiently close) results as their scalar equivalents, and that the
+   calls to the vectorized math functions are actually emitted.  */
+
+/* Ideally this test should be run, but the math routines require a large
+   stack and gcn-run currently does not respect the stack-size parameter.  */
+/* { dg-do compile } */
+/* { dg-options "-O2 -ftree-vectorize -fno-math-errno -mstack-size=3000000 -fdump-tree-vect" } */
+
+
+#undef PRINT_RESULT
+#define VERBOSE 0
+#define EARLY_EXIT 1
+
+#include <math.h>
+#include <stdlib.h>
+
+#ifdef PRINT_RESULT
+  #include <stdio.h>
+  #define PRINTF printf
+#else
+  static void null_printf (const char *f, ...) { }
+
+  #define PRINTF null_printf
+#endif
+
+#define N 512
+#define EPSILON_float 1e-5
+#define EPSILON_double 1e-10
+
+static int failed = 0;
+
+int deviation_float (float x, float y)
+{
+  union {
+    float f;
+    unsigned u;
+  } u, v;
+
+  u.f = x;
+  v.f = y;
+
+  unsigned mask = 0x80000000U; 
+  int i;
+
+  for (i = 32; i > 0; i--)
+    if ((u.u ^ v.u) & mask)
+      break;
+    else
+      mask >>= 1;
+
+  return i;
+}
+
+int deviation_double (double x, double y)
+{
+  union {
+    double d;
+    unsigned long long u;
+  } u, v;
+
+  u.d = x;
+  v.d = y;
+
+  unsigned long long mask = 0x8000000000000000ULL;
+  int i;
+
+  for (i = 64; i > 0; i--)
+    if ((u.u ^ v.u) & mask)
+      break;
+    else
+      mask >>= 1;
+
+  return i;
+}
+
+#define TEST_FUN(TFLOAT, LOW, HIGH, FUN) \
+__attribute__((optimize("no-tree-vectorize"))) \
+__attribute__((optimize("no-unsafe-math-optimizations"))) \
+void check_##FUN (TFLOAT res[N], TFLOAT a[N]) \
+{ \
+  int failed = 0; \
+  for (int i = 0; i < N; i++) { \
+    TFLOAT expected = FUN (a[i]); \
+    TFLOAT diff = __builtin_fabs (expected - res[i]); \
+    int deviation = deviation_##TFLOAT (expected, res[i]); \
+    int fail = isnan (res[i]) != isnan (expected) \
+               || isinf (res[i]) != isinf (expected) \
+               || (diff > EPSILON_##TFLOAT && deviation > 10); \
+    failed |= fail; \
+    if (VERBOSE || fail) \
+      PRINTF (#FUN "(%f) = %f, expected = %f, diff = %f, deviation = %d %s\n", \
+              a[i], res[i], expected, diff, deviation, fail ? "(!)" : ""); \
+    if (EARLY_EXIT && fail) \
+      exit (1); \
+  } \
+} \
+void test_##FUN (void) \
+{ \
+  TFLOAT res[N], a[N]; \
+  for (int i = 0; i < N; i++) \
+    a[i] = LOW + ((HIGH - LOW) / N) * i; \
+  for (int i = 0; i < N; i++) \
+    res[i] = FUN (a[i]); \
+  check_##FUN (res, a); \
+}\
+test_##FUN ();
+
+#define TEST_FUN2(TFLOAT, LOW1, HIGH1, LOW2, HIGH2, FUN) \
+__attribute__((optimize("no-tree-vectorize"))) \
+__attribute__((optimize("no-unsafe-math-optimizations"))) \
+void check_##FUN (TFLOAT res[N], TFLOAT a[N], TFLOAT b[N]) \
+{ \
+  int failed = 0; \
+  for (int i = 0; i < N; i++) { \
+    TFLOAT expected = FUN (a[i], b[i]); \
+    TFLOAT diff = __builtin_fabs (expected - res[i]); \
+    int deviation = deviation_##TFLOAT (expected, res[i]); \
+    int fail = isnan (res[i]) != isnan (expected) \
+               || isinf (res[i]) != isinf (expected) \
+               || (diff > EPSILON_##TFLOAT && deviation > 10); \
+    failed |= fail; \
+    if (VERBOSE || fail) \
+      PRINTF (#FUN "(%f,%f) = %f, expected = %f, diff = %f, deviation = %d %s\n", \
+              a[i], b[i], res[i], expected, diff, deviation, fail ? "(!)" : ""); \
+    if (EARLY_EXIT && fail) \
+      exit (1); \
+  } \
+} \
+void test_##FUN (void) \
+{ \
+  TFLOAT res[N], a[N], b[N]; \
+  for (int i = 0; i < N; i++) { \
+    a[i] = LOW1 + ((HIGH1 - LOW1) / N) * i; \
+    b[i] = LOW2 + ((HIGH2 - LOW2) / N) * i; \
+  } \
+  for (int i = 0; i < N; i++) \
+    res[i] = FUN (a[i], b[i]); \
+  check_##FUN (res, a, b); \
+}\
+test_##FUN ();
+
+int main (void)
+{
+  TEST_FUN (float, -1.1, 1.1, acosf); /* { dg-final { scan-tree-dump "v64sf_acosf" "vect" } }*/
+  TEST_FUN (float, -10, 10, acoshf); /* { dg-final { scan-tree-dump "v64sf_acoshf" "vect" } }*/
+  TEST_FUN (float, -1.1, 1.1, asinf); /* { dg-final { scan-tree-dump "v64sf_asinf" "vect" } }*/
+  TEST_FUN (float, -10, 10, asinhf); /* { dg-final { scan-tree-dump "v64sf_asinhf" "vect" } }*/
+  TEST_FUN (float, -1.1, 1.1, atanf); /* { dg-final { scan-tree-dump "v64sf_atanf" "vect" } }*/
+  TEST_FUN2 (float, -2.0, 2.0, 2.0, -2.0, atan2f); /* { dg-final { scan-tree-dump "v64sf_atan2f" "vect" } }*/
+  TEST_FUN (float, -2.0, 2.0, atanhf); /* { dg-final { scan-tree-dump "v64sf_atanhf" "vect" } }*/
+  TEST_FUN2 (float, -10.0, 10.0, 5.0, -15.0, copysignf); /* { dg-final { scan-tree-dump "v64sf_copysignf" "vect" } }*/
+  TEST_FUN (float, -3.14159265359, 3.14159265359, cosf); /* { dg-final { scan-tree-dump "v64sf_cosf" "vect" } }*/
+  TEST_FUN (float, -3.14159265359, 3.14159265359, coshf); /* { dg-final { scan-tree-dump "v64sf_coshf" "vect" } }*/
+  TEST_FUN (float, -10.0, 10.0, erff);  /* { dg-final { scan-tree-dump "v64sf_erff" "vect" } }*/
+  TEST_FUN (float, -10.0, 10.0, expf); /* { dg-final { scan-tree-dump "v64sf_expf" "vect" } }*/
+  TEST_FUN (float, -10.0, 10.0, exp2f); /* { dg-final { scan-tree-dump "v64sf_exp2f" "vect" } }*/
+  TEST_FUN2 (float, -10.0, 10.0, 100.0, -25.0, fmodf); /* { dg-final { scan-tree-dump "v64sf_fmodf" "vect" } }*/
+  TEST_FUN (float, -10.0, 10.0, gammaf); /* { dg-final { scan-tree-dump "v64sf_gammaf" "vect" { xfail *-*-*} } }*/
+  TEST_FUN2 (float, -10.0, 10.0, 15.0, -5.0,hypotf); /* { dg-final { scan-tree-dump "v64sf_hypotf" "vect" } }*/
+  TEST_FUN (float, -10.0, 10.0, lgammaf); /* { dg-final { scan-tree-dump "v64sf_lgammaf" "vect" { xfail *-*-*} } }*/
+  TEST_FUN (float, -1.0, 50.0, logf); /* { dg-final { scan-tree-dump "v64sf_logf" "vect" } }*/
+  TEST_FUN (float, -1.0, 500.0, log10f); /* { dg-final { scan-tree-dump "v64sf_log10f" "vect" } }*/
+  TEST_FUN (float, -1.0, 64.0, log2f); /* { dg-final { scan-tree-dump "v64sf_log2f" "vect" } }*/
+  TEST_FUN2 (float, -100.0, 100.0, 100.0, -100.0, powf); /* { dg-final { scan-tree-dump "v64sf_powf" "vect" } }*/
+  TEST_FUN2 (float, -50.0, 100.0, -2.0, 40.0, remainderf); /* { dg-final { scan-tree-dump "v64sf_remainderf" "vect" } }*/
+  TEST_FUN (float, -50.0, 50.0, rintf);  /* { dg-final { scan-tree-dump "v64sf_rintf" "vect" } }*/
+  TEST_FUN2 (float, -50.0, 50.0, -10.0, 32.0, __builtin_scalbf); /* { dg-final { scan-tree-dump "v64sf_scalbf" "vect" } }*/
+  TEST_FUN (float, -10.0, 10.0, __builtin_significandf); /* { dg-final { scan-tree-dump "v64sf_significandf" "vect" } }*/
+  TEST_FUN (float, -3.14159265359, 3.14159265359, sinf); /* { dg-final { scan-tree-dump "v64sf_sinf" "vect" } }*/
+  TEST_FUN (float, -3.14159265359, 3.14159265359, sinhf); /* { dg-final { scan-tree-dump "v64sf_sinhf" "vect" } }*/
+  TEST_FUN (float, -0.1, 10000.0, sqrtf); /* { dg-final { scan-tree-dump "v64sf_sqrtf" "vect" } }*/
+  TEST_FUN (float, -5.0, 5.0, tanf); /* { dg-final { scan-tree-dump "v64sf_tanf" "vect" } }*/
+  TEST_FUN (float, -3.14159265359, 3.14159265359, tanhf); /* { dg-final { scan-tree-dump "v64sf_tanhf" "vect" } }*/
+  TEST_FUN (float, -10.0, 10.0, tgammaf); /* { dg-final { scan-tree-dump "v64sf_tgammaf" "vect" } }*/
+
+  TEST_FUN (double, -1.1, 1.1, acos); /* { dg-final { scan-tree-dump "v64df_acos" "vect" } }*/
+  TEST_FUN (double, -10, 10, acosh); /* { dg-final { scan-tree-dump "v64df_acosh" "vect" } }*/
+  TEST_FUN (double, -1.1, 1.1, asin); /* { dg-final { scan-tree-dump "v64df_asin" "vect" } }*/
+  TEST_FUN (double, -10, 10, asinh); /* { dg-final { scan-tree-dump "v64df_asinh" "vect" } }*/
+  TEST_FUN (double, -1.1, 1.1, atan); /* { dg-final { scan-tree-dump "v64df_atan" "vect" } }*/
+  TEST_FUN2 (double, -2.0, 2.0, 2.0, -2.0, atan2); /* { dg-final { scan-tree-dump "v64df_atan2" "vect" } }*/
+  TEST_FUN (double, -2.0, 2.0, atanh); /* { dg-final { scan-tree-dump "v64df_atanh" "vect" } }*/
+  TEST_FUN2 (double, -10.0, 10.0, 5.0, -15.0, copysign); /* { dg-final { scan-tree-dump "v64df_copysign" "vect" } }*/
+  TEST_FUN (double, -3.14159265359, 3.14159265359, cos); /* { dg-final { scan-tree-dump "v64df_cos" "vect" } }*/
+  TEST_FUN (double, -3.14159265359, 3.14159265359, cosh); /* { dg-final { scan-tree-dump "v64df_cosh" "vect" } }*/
+  TEST_FUN (double, -10.0, 10.0, erf); /* { dg-final { scan-tree-dump "v64df_erf" "vect" } }*/
+  TEST_FUN (double, -10.0, 10.0, exp); /* { dg-final { scan-tree-dump "v64df_exp" "vect" } }*/
+  TEST_FUN (double, -10.0, 10.0, exp2); /* { dg-final { scan-tree-dump "v64df_exp2" "vect" } }*/
+  TEST_FUN2 (double, -10.0, 10.0, 100.0, -25.0, fmod); /* { dg-final { scan-tree-dump "v64df_fmod" "vect" } }*/
+  TEST_FUN (double, -10.0, 10.0, gamma); /* { dg-final { scan-tree-dump "v64df_gamma" "vect" { xfail *-*-*} } }*/
+  TEST_FUN2 (double, -10.0, 10.0, 15.0, -5.0, hypot); /* { dg-final { scan-tree-dump "v64df_hypot" "vect" } }*/
+  TEST_FUN (double, -10.0, 10.0, lgamma); /* { dg-final { scan-tree-dump "v64df_lgamma" "vect" { xfail *-*-*} } }*/
+  TEST_FUN (double, -1.0, 50.0, log); /* { dg-final { scan-tree-dump "v64df_log" "vect" } }*/
+  TEST_FUN (double, -1.0, 500.0, log10); /* { dg-final { scan-tree-dump "v64df_log10" "vect" } }*/
+  TEST_FUN (double, -1.0, 64.0, log2); /* { dg-final { scan-tree-dump "v64df_log2" "vect" { xfail *-*-*} } }*/
+  TEST_FUN2 (double, -100.0, 100.0, 100.0, -100.0, pow); /* { dg-final { scan-tree-dump "v64df_pow" "vect" } }*/
+  TEST_FUN2 (double, -50.0, 100.0, -2.0, 40.0, remainder); /* { dg-final { scan-tree-dump "v64df_remainder" "vect" } }*/
+  TEST_FUN (double, -50.0, 50.0, rint); /* { dg-final { scan-tree-dump "v64df_rint" "vect" } }*/
+  TEST_FUN2 (double, -50.0, 50.0, -10.0, 32.0, __builtin_scalb); /* { dg-final { scan-tree-dump "v64df_scalb" "vect" } }*/
+  TEST_FUN (double, -10.0, 10.0, __builtin_significand); /* { dg-final { scan-tree-dump "v64df_significand" "vect" } }*/
+  TEST_FUN (double, -3.14159265359, 3.14159265359, sin); /* { dg-final { scan-tree-dump "v64df_sin" "vect" } }*/
+  TEST_FUN (double, -3.14159265359, 3.14159265359, sinh); /* { dg-final { scan-tree-dump "v64df_sinh" "vect" } }*/
+  TEST_FUN (double, -0.1, 10000.0, sqrt); /* { dg-final { scan-tree-dump "v64df_sqrt" "vect" } }*/
+  TEST_FUN (double, -5.0, 5.0, tan); /* { dg-final { scan-tree-dump "v64df_tan" "vect" } }*/
+  TEST_FUN (double, -3.14159265359, 3.14159265359, tanh); /* { dg-final { scan-tree-dump "v64df_tanh" "vect" } }*/
+  TEST_FUN (double, -10.0, 10.0, tgamma); /* { dg-final { scan-tree-dump "v64df_tgamma" "vect" } }*/
+
+  return failed;
+}
diff --git a/libgomp/testsuite/libgomp.c/simd-math-1.c b/libgomp/testsuite/libgomp.c/simd-math-1.c
new file mode 100644
index 00000000000..947bf606e36
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c/simd-math-1.c
@@ -0,0 +1,217 @@
+/* Check that the SIMD versions of math routines give the same (or
+   sufficiently close) results as their scalar equivalents.  */
+
+/* { dg-do run } */
+/* { dg-options "-O2 -ftree-vectorize -fno-math-errno" } */
+/* { dg-additional-options -foffload-options=amdgcn-amdhsa=-mstack-size=3000000 { target offload_target_amdgcn } } */
+/* { dg-additional-options -foffload-options=-lm } */
+
+#undef PRINT_RESULT
+#define VERBOSE 0
+#define EARLY_EXIT 1
+
+#include <math.h>
+#include <stdlib.h>
+
+#ifdef PRINT_RESULT
+  #include <stdio.h>
+  #define PRINTF printf
+#else
+  static void null_printf (const char *f, ...) { }
+
+  #define PRINTF null_printf
+#endif
+
+#define N 512
+#define EPSILON_float 1e-5
+#define EPSILON_double 1e-10
+
+static int xfail = 0;
+static int failed = 0;
+
+int deviation_float (float x, float y)
+{
+  union {
+    float f;
+    unsigned u;
+  } u, v;
+
+  u.f = x;
+  v.f = y;
+
+  unsigned mask = 0x80000000U;
+  int i;
+
+  for (i = 32; i > 0; i--)
+    if ((u.u ^ v.u) & mask)
+      break;
+    else
+      mask >>= 1;
+
+  return i;
+}
+
+int deviation_double (double x, double y)
+{
+  union {
+    double d;
+    unsigned long long u;
+  } u, v;
+
+  u.d = x;
+  v.d = y;
+
+  unsigned long long mask = 0x8000000000000000ULL;
+  int i;
+
+  for (i = 64; i > 0; i--)
+    if ((u.u ^ v.u) & mask)
+      break;
+    else
+      mask >>= 1;
+
+  return i;
+}
+
+#define TEST_FUN_XFAIL(TFLOAT, LOW, HIGH, FUN) \
+  xfail = 1; \
+  TEST_FUN (TFLOAT, LOW, HIGH, FUN); \
+  xfail = 0;
+
+#define TEST_FUN(TFLOAT, LOW, HIGH, FUN) \
+__attribute__((optimize("no-tree-vectorize"))) \
+__attribute__((optimize("no-unsafe-math-optimizations"))) \
+void check_##FUN (TFLOAT res[N], TFLOAT a[N]) \
+{ \
+  for (int i = 0; i < N; i++) { \
+    TFLOAT expected = FUN (a[i]); \
+    TFLOAT diff = __builtin_fabs (expected - res[i]); \
+    int deviation = deviation_##TFLOAT (expected, res[i]); \
+    int fail = isnan (res[i]) != isnan (expected) \
+	       || isinf (res[i]) != isinf (expected) \
+	       || (diff > EPSILON_##TFLOAT && deviation > 10); \
+    if (VERBOSE || fail) \
+      PRINTF (#FUN "(%f) = %f, expected = %f, diff = %f, deviation = %d %s\n", \
+	      a[i], res[i], expected, diff, deviation, fail ? "(!)" : ""); \
+    failed |= (fail && !xfail); \
+    if (EARLY_EXIT && failed) \
+      exit (1); \
+  } \
+} \
+void test_##FUN (void) \
+{ \
+  TFLOAT res[N], a[N]; \
+  for (int i = 0; i < N; i++) \
+    a[i] = LOW + ((HIGH - LOW) / N) * i; \
+  _Pragma ("omp target parallel for simd map(to:a) map(from:res)") \
+    for (int i = 0; i < N; i++) \
+      res[i] = FUN (a[i]); \
+  check_##FUN (res, a); \
+}\
+test_##FUN ();
+
+#define TEST_FUN2(TFLOAT, LOW1, HIGH1, LOW2, HIGH2, FUN) \
+__attribute__((optimize("no-tree-vectorize"))) \
+__attribute__((optimize("no-unsafe-math-optimizations"))) \
+void check_##FUN (TFLOAT res[N], TFLOAT a[N], TFLOAT b[N]) \
+{ \
+  int failed = 0; \
+  for (int i = 0; i < N; i++) { \
+    TFLOAT expected = FUN (a[i], b[i]); \
+    TFLOAT diff = __builtin_fabs (expected - res[i]); \
+    int deviation = deviation_##TFLOAT (expected, res[i]); \
+    int fail = isnan (res[i]) != isnan (expected) \
+	       || isinf (res[i]) != isinf (expected) \
+	       || (diff > EPSILON_##TFLOAT && deviation > 10); \
+    failed |= fail; \
+    if (VERBOSE || fail) \
+      PRINTF (#FUN "(%f,%f) = %f, expected = %f, diff = %f, deviation = %d %s\n", \
+	      a[i], b[i], res[i], expected, diff, deviation, fail ? "(!)" : ""); \
+    if (EARLY_EXIT && fail) \
+      exit (1); \
+  } \
+} \
+void test_##FUN (void) \
+{ \
+  TFLOAT res[N], a[N], b[N]; \
+  for (int i = 0; i < N; i++) { \
+    a[i] = LOW1 + ((HIGH1 - LOW1) / N) * i; \
+    b[i] = LOW2 + ((HIGH2 - LOW2) / N) * i; \
+  } \
+  _Pragma ("omp target parallel for simd map(to:a) map(from:res)") \
+    for (int i = 0; i < N; i++) \
+      res[i] = FUN (a[i], b[i]); \
+  check_##FUN (res, a, b); \
+}\
+test_##FUN ();
+
+int main (void)
+{
+  TEST_FUN (float, -1.1, 1.1, acosf);
+  TEST_FUN (float, -10, 10, acoshf);
+  TEST_FUN (float, -1.1, 1.1, asinf);
+  TEST_FUN (float, -10, 10, asinhf);
+  TEST_FUN (float, -1.1, 1.1, atanf);
+  TEST_FUN2 (float, -2.0, 2.0, 2.0, -2.0, atan2f);
+  TEST_FUN (float, -2.0, 2.0, atanhf);
+  TEST_FUN2 (float, -10.0, 10.0, 5.0, -15.0, copysignf);
+  TEST_FUN (float, -3.14159265359, 3.14159265359, cosf);
+  TEST_FUN (float, -3.14159265359, 3.14159265359, coshf);
+  TEST_FUN (float, -10.0, 10.0, erff);
+  TEST_FUN (float, -10.0, 10.0, expf);
+  TEST_FUN (float, -10.0, 10.0, exp2f);
+  TEST_FUN2 (float, -10.0, 10.0, 100.0, -25.0, fmodf);
+  TEST_FUN (float, -10.0, 10.0, gammaf);
+  TEST_FUN2 (float, -10.0, 10.0, 15.0, -5.0,hypotf);
+  TEST_FUN (float, -10.0, 10.0, lgammaf);
+  TEST_FUN (float, -1.0, 50.0, logf);
+  TEST_FUN (float, -1.0, 500.0, log10f);
+  TEST_FUN (float, -1.0, 64.0, log2f);
+  TEST_FUN2 (float, -100.0, 100.0, 100.0, -100.0, powf);
+  TEST_FUN2 (float, -50.0, 100.0, -2.0, 40.0, remainderf);
+  TEST_FUN (float, -50.0, 50.0, rintf);
+  TEST_FUN2 (float, -50.0, 50.0, -10.0, 32.0, __builtin_scalbf);
+  TEST_FUN (float, -10.0, 10.0, __builtin_significandf);
+  TEST_FUN (float, -3.14159265359, 3.14159265359, sinf);
+  TEST_FUN (float, -3.14159265359, 3.14159265359, sinhf);
+  TEST_FUN (float, -0.1, 10000.0, sqrtf);
+  TEST_FUN (float, -5.0, 5.0, tanf);
+  TEST_FUN (float, -3.14159265359, 3.14159265359, tanhf);
+  /* Newlib's version of tgammaf is known to have poor accuracy.  */
+  TEST_FUN_XFAIL (float, -10.0, 10.0, tgammaf);
+
+  TEST_FUN (double, -1.1, 1.1, acos);
+  TEST_FUN (double, -10, 10, acosh);
+  TEST_FUN (double, -1.1, 1.1, asin);
+  TEST_FUN (double, -10, 10, asinh);
+  TEST_FUN (double, -1.1, 1.1, atan);
+  TEST_FUN2 (double, -2.0, 2.0, 2.0, -2.0, atan2);
+  TEST_FUN (double, -2.0, 2.0, atanh);
+  TEST_FUN2 (double, -10.0, 10.0, 5.0, -15.0, copysign);
+  TEST_FUN (double, -3.14159265359, 3.14159265359, cos);
+  TEST_FUN (double, -3.14159265359, 3.14159265359, cosh);
+  TEST_FUN (double, -10.0, 10.0, erf);
+  TEST_FUN (double, -10.0, 10.0, exp);
+  TEST_FUN (double, -10.0, 10.0, exp2);
+  TEST_FUN2 (double, -10.0, 10.0, 100.0, -25.0, fmod);
+  TEST_FUN (double, -10.0, 10.0, gamma);
+  TEST_FUN2 (double, -10.0, 10.0, 15.0, -5.0, hypot);
+  TEST_FUN (double, -10.0, 10.0, lgamma);
+  TEST_FUN (double, -1.0, 50.0, log);
+  TEST_FUN (double, -1.0, 500.0, log10);
+  TEST_FUN (double, -1.0, 64.0, log2);
+  TEST_FUN2 (double, -100.0, 100.0, 100.0, -100.0, pow);
+  TEST_FUN2 (double, -50.0, 100.0, -2.0, 40.0, remainder);
+  TEST_FUN (double, -50.0, 50.0, rint);
+  TEST_FUN2 (double, -50.0, 50.0, -10.0, 32.0, __builtin_scalb);
+  TEST_FUN (double, -10.0, 10.0, __builtin_significand);
+  TEST_FUN (double, -3.14159265359, 3.14159265359, sin);
+  TEST_FUN (double, -3.14159265359, 3.14159265359, sinh);
+  TEST_FUN (double, -0.1, 10000.0, sqrt);
+  TEST_FUN (double, -5.0, 5.0, tan);
+  TEST_FUN (double, -3.14159265359, 3.14159265359, tanh);
+  /* Newlib's version of tgamma is known to have poor accuracy.  */
+  TEST_FUN_XFAIL (double, -10.0, 10.0, tgamma);
+
+  return failed;
+}