From patchwork Tue Oct 17 05:13:27 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: liuhongt <hongtao.liu@intel.com>
X-Patchwork-Id: 153897
From: liuhongt <hongtao.liu@intel.com>
To: gcc-patches@gcc.gnu.org
Cc: crazylht@gmail.com,
	hjl.tools@gmail.com
Subject: [PATCH] Support 32/64-bit vectorization for _Float16 fma related
 operations.
Date: Tue, 17 Oct 2023 13:13:27 +0800
Message-Id: <20231017051327.110300-1-hongtao.liu@intel.com>
X-Mailer: git-send-email 2.31.1

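Add fma/fms/fnma/fnms expanders for the 32-bit and 64-bit _Float16
vector modes (V2HF/V4HF), plus vec_fmaddsub/vec_fmsubadd expanders for
V4HF.  Each expander moves its operands into V8HF registers, emits the
existing V8HF pattern, and takes the low part of the result.

Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
Ready to push to trunk.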

gcc/ChangeLog:

	* config/i386/mmx.md (fma<mode>4): New expander.
	(fms<mode>4): Ditto.
	(fnma<mode>4): Ditto.
	(fnms<mode>4): Ditto.
	(vec_fmaddsubv4hf4): Ditto.
	(vec_fmsubaddv4hf4): Ditto.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/part-vect-fmaddsubhf-1.c: New test.
	* gcc.target/i386/part-vect-fmahf-1.c: New test.
---
 gcc/config/i386/mmx.md                        | 152 +++++++++++++++++-
 .../gcc.target/i386/part-vect-fmaddsubhf-1.c  |  22 +++
 .../gcc.target/i386/part-vect-fmahf-1.c       |  58 +++++++
 3 files changed, 231 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/part-vect-fmaddsubhf-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/part-vect-fmahf-1.c
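
Note for reviewers: the sketch below is a plain C model of what the new
expanders do, not code from the patch.  It uses float as a stand-in for
_Float16 so it builds on any target, and the type and function names
(v4_model, v8_model, widen_to_v8, fma_v4, fmaddsub_v4) are hypothetical.
It assumes the mov*_to_sse step leaves the upper lanes zeroed; the real
expander may leave them unspecified in some configurations, which does
not matter here because only the low lanes are read back.

```c
#include <string.h>

/* Hypothetical model types: float stands in for _Float16.  */
typedef struct { float lane[4]; } v4_model;   /* models V4HF (64-bit)  */
typedef struct { float lane[8]; } v8_model;   /* models V8HF (128-bit) */

/* Models the mov*_to_sse step: copy the operand into the low lanes of
   a wider register.  Zeroing the upper lanes is an assumption.  */
static v8_model widen_to_v8 (v4_model x)
{
  v8_model r;
  memset (&r, 0, sizeof r);
  memcpy (r.lane, x.lane, sizeof x.lane);
  return r;
}

/* Models the lowpart_subreg step: keep only the low four lanes.  */
static v4_model low_part (v8_model x)
{
  v4_model r;
  memcpy (r.lane, x.lane, sizeof r.lane);
  return r;
}

/* Models fma<mode>4: widen, run the 8-lane a * b + c, narrow.  */
static v4_model fma_v4 (v4_model a, v4_model b, v4_model c)
{
  v8_model wa = widen_to_v8 (a), wb = widen_to_v8 (b), wc = widen_to_v8 (c);
  v8_model r;
  for (int i = 0; i < 8; i++)
    r.lane[i] = wa.lane[i] * wb.lane[i] + wc.lane[i];
  return low_part (r);
}

/* Models vec_fmaddsubv4hf4: even lanes subtract, odd lanes add, as in
   the vec_fmaddsub_fp16 test loop.  */
static v4_model fmaddsub_v4 (v4_model a, v4_model b, v4_model c)
{
  v8_model wa = widen_to_v8 (a), wb = widen_to_v8 (b), wc = widen_to_v8 (c);
  v8_model r;
  for (int i = 0; i < 8; i++)
    r.lane[i] = (i & 1) ? wa.lane[i] * wb.lane[i] + wc.lane[i]
			: wa.lane[i] * wb.lane[i] - wc.lane[i];
  return low_part (r);
}
```

The upper four lanes are computed and then discarded, which is the
trade-off of reusing the V8HF patterns rather than adding dedicated
partial-vector instructions.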

diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md
index 82ca49c207b..491a0a51272 100644
--- a/gcc/config/i386/mmx.md
+++ b/gcc/config/i386/mmx.md
@@ -2365,7 +2365,157 @@ (define_expand "signbit<mode>2"
 
 ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
 ;;
-;; Parallel single-precision floating point conversion operations
+;; Parallel half-precision FMA multiply/accumulate instructions.
+;;
+;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
+
+(define_expand "fma<mode>4"
+  [(set (match_operand:VHF_32_64 0 "register_operand")
+	(fma:VHF_32_64
+	  (match_operand:VHF_32_64 1 "nonimmediate_operand")
+	  (match_operand:VHF_32_64 2 "nonimmediate_operand")
+	  (match_operand:VHF_32_64 3 "nonimmediate_operand")))]
+  "TARGET_AVX512FP16 && TARGET_AVX512VL && ix86_partial_vec_fp_math"
+{
+  rtx op3 = gen_reg_rtx (V8HFmode);
+  rtx op2 = gen_reg_rtx (V8HFmode);
+  rtx op1 = gen_reg_rtx (V8HFmode);
+  rtx op0 = gen_reg_rtx (V8HFmode);
+
+  emit_insn (gen_mov<mov_to_sse_suffix>_<mode>_to_sse (op3, operands[3]));
+  emit_insn (gen_mov<mov_to_sse_suffix>_<mode>_to_sse (op2, operands[2]));
+  emit_insn (gen_mov<mov_to_sse_suffix>_<mode>_to_sse (op1, operands[1]));
+
+  emit_insn (gen_fmav8hf4 (op0, op1, op2, op3));
+
+  emit_move_insn (operands[0], lowpart_subreg (<MODE>mode, op0, V8HFmode));
+  DONE;
+})
+
+(define_expand "fms<mode>4"
+  [(set (match_operand:VHF_32_64 0 "register_operand")
+	(fma:VHF_32_64
+	  (match_operand:VHF_32_64   1 "nonimmediate_operand")
+	  (match_operand:VHF_32_64   2 "nonimmediate_operand")
+	  (neg:VHF_32_64
+	    (match_operand:VHF_32_64 3 "nonimmediate_operand"))))]
+  "TARGET_AVX512FP16 && TARGET_AVX512VL && ix86_partial_vec_fp_math"
+{
+  rtx op3 = gen_reg_rtx (V8HFmode);
+  rtx op2 = gen_reg_rtx (V8HFmode);
+  rtx op1 = gen_reg_rtx (V8HFmode);
+  rtx op0 = gen_reg_rtx (V8HFmode);
+
+  emit_insn (gen_mov<mov_to_sse_suffix>_<mode>_to_sse (op3, operands[3]));
+  emit_insn (gen_mov<mov_to_sse_suffix>_<mode>_to_sse (op2, operands[2]));
+  emit_insn (gen_mov<mov_to_sse_suffix>_<mode>_to_sse (op1, operands[1]));
+
+  emit_insn (gen_fmsv8hf4 (op0, op1, op2, op3));
+
+  emit_move_insn (operands[0], lowpart_subreg (<MODE>mode, op0, V8HFmode));
+  DONE;
+})
+
+(define_expand "fnma<mode>4"
+  [(set (match_operand:VHF_32_64 0 "register_operand")
+	(fma:VHF_32_64
+	  (neg:VHF_32_64
+	    (match_operand:VHF_32_64 1 "nonimmediate_operand"))
+	  (match_operand:VHF_32_64   2 "nonimmediate_operand")
+	  (match_operand:VHF_32_64   3 "nonimmediate_operand")))]
+  "TARGET_AVX512FP16 && TARGET_AVX512VL && ix86_partial_vec_fp_math"
+{
+  rtx op3 = gen_reg_rtx (V8HFmode);
+  rtx op2 = gen_reg_rtx (V8HFmode);
+  rtx op1 = gen_reg_rtx (V8HFmode);
+  rtx op0 = gen_reg_rtx (V8HFmode);
+
+  emit_insn (gen_mov<mov_to_sse_suffix>_<mode>_to_sse (op3, operands[3]));
+  emit_insn (gen_mov<mov_to_sse_suffix>_<mode>_to_sse (op2, operands[2]));
+  emit_insn (gen_mov<mov_to_sse_suffix>_<mode>_to_sse (op1, operands[1]));
+
+  emit_insn (gen_fnmav8hf4 (op0, op1, op2, op3));
+
+  emit_move_insn (operands[0], lowpart_subreg (<MODE>mode, op0, V8HFmode));
+  DONE;
+})
+
+(define_expand "fnms<mode>4"
+  [(set (match_operand:VHF_32_64 0 "register_operand")
+	(fma:VHF_32_64
+	  (neg:VHF_32_64
+	    (match_operand:VHF_32_64 1 "nonimmediate_operand"))
+	  (match_operand:VHF_32_64   2 "nonimmediate_operand")
+	  (neg:VHF_32_64
+	    (match_operand:VHF_32_64 3 "nonimmediate_operand"))))]
+  "TARGET_AVX512FP16 && TARGET_AVX512VL && ix86_partial_vec_fp_math"
+{
+  rtx op3 = gen_reg_rtx (V8HFmode);
+  rtx op2 = gen_reg_rtx (V8HFmode);
+  rtx op1 = gen_reg_rtx (V8HFmode);
+  rtx op0 = gen_reg_rtx (V8HFmode);
+
+  emit_insn (gen_mov<mov_to_sse_suffix>_<mode>_to_sse (op3, operands[3]));
+  emit_insn (gen_mov<mov_to_sse_suffix>_<mode>_to_sse (op2, operands[2]));
+  emit_insn (gen_mov<mov_to_sse_suffix>_<mode>_to_sse (op1, operands[1]));
+
+  emit_insn (gen_fnmsv8hf4 (op0, op1, op2, op3));
+
+  emit_move_insn (operands[0], lowpart_subreg (<MODE>mode, op0, V8HFmode));
+  DONE;
+})
+
+(define_expand "vec_fmaddsubv4hf4"
+  [(match_operand:V4HF 0 "register_operand")
+   (match_operand:V4HF 1 "nonimmediate_operand")
+   (match_operand:V4HF 2 "nonimmediate_operand")
+   (match_operand:V4HF 3 "nonimmediate_operand")]
+  "TARGET_AVX512FP16 && TARGET_AVX512VL
+   && TARGET_MMX_WITH_SSE
+   && ix86_partial_vec_fp_math"
+{
+  rtx op3 = gen_reg_rtx (V8HFmode);
+  rtx op2 = gen_reg_rtx (V8HFmode);
+  rtx op1 = gen_reg_rtx (V8HFmode);
+  rtx op0 = gen_reg_rtx (V8HFmode);
+
+  emit_insn (gen_movq_v4hf_to_sse (op3, operands[3]));
+  emit_insn (gen_movq_v4hf_to_sse (op2, operands[2]));
+  emit_insn (gen_movq_v4hf_to_sse (op1, operands[1]));
+
+  emit_insn (gen_vec_fmaddsubv8hf4 (op0, op1, op2, op3));
+
+  emit_move_insn (operands[0], lowpart_subreg (V4HFmode, op0, V8HFmode));
+  DONE;
+})
+
+(define_expand "vec_fmsubaddv4hf4"
+  [(match_operand:V4HF 0 "register_operand")
+   (match_operand:V4HF 1 "nonimmediate_operand")
+   (match_operand:V4HF 2 "nonimmediate_operand")
+   (match_operand:V4HF 3 "nonimmediate_operand")]
+  "TARGET_AVX512FP16 && TARGET_AVX512VL
+   && ix86_partial_vec_fp_math
+   && TARGET_MMX_WITH_SSE"
+{
+  rtx op3 = gen_reg_rtx (V8HFmode);
+  rtx op2 = gen_reg_rtx (V8HFmode);
+  rtx op1 = gen_reg_rtx (V8HFmode);
+  rtx op0 = gen_reg_rtx (V8HFmode);
+
+  emit_insn (gen_movq_v4hf_to_sse (op3, operands[3]));
+  emit_insn (gen_movq_v4hf_to_sse (op2, operands[2]));
+  emit_insn (gen_movq_v4hf_to_sse (op1, operands[1]));
+
+  emit_insn (gen_vec_fmsubaddv8hf4 (op0, op1, op2, op3));
+
+  emit_move_insn (operands[0], lowpart_subreg (V4HFmode, op0, V8HFmode));
+  DONE;
+})
+
+;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
+;;
+;; Parallel half-precision floating point conversion operations
 ;;
 ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
 
diff --git a/gcc/testsuite/gcc.target/i386/part-vect-fmaddsubhf-1.c b/gcc/testsuite/gcc.target/i386/part-vect-fmaddsubhf-1.c
new file mode 100644
index 00000000000..051f992f66e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/part-vect-fmaddsubhf-1.c
@@ -0,0 +1,22 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512fp16 -mavx512vl -O2" } */
+/* { dg-final { scan-assembler-times "vfmaddsub...ph\[ \t\]+\[^\n\]*%xmm\[0-9\]" 1 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times "vfmsubadd...ph\[ \t\]+\[^\n\]*%xmm\[0-9\]" 1 { target { ! ia32 } } } } */
+
+void vec_fmaddsub_fp16(int n, _Float16 da_r, _Float16 *x, _Float16* y, _Float16* __restrict z)
+{
+  for (int i = 0; i < 4; i += 2)
+    {
+      z[i] =  da_r * x[i] - y[i];
+      z[i+1]  =  da_r * x[i+1] + y[i+1];
+    }
+}
+
+void vec_fmasubadd_fp16(int n, _Float16 da_r, _Float16 *x, _Float16* y, _Float16* __restrict z)
+{
+  for (int i = 0; i < 4; i += 2)
+    {
+      z[i] =  da_r * x[i] + y[i];
+      z[i+1]  =  da_r * x[i+1] - y[i+1];
+    }
+}
diff --git a/gcc/testsuite/gcc.target/i386/part-vect-fmahf-1.c b/gcc/testsuite/gcc.target/i386/part-vect-fmahf-1.c
new file mode 100644
index 00000000000..46e3cd34103
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/part-vect-fmahf-1.c
@@ -0,0 +1,58 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mavx512fp16 -mavx512vl" } */
+/* { dg-final { scan-assembler-times "vfmadd132ph\[^\n\r\]*xmm\[0-9\]" 2 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times "vfnmadd132ph\[^\n\r\]*xmm\[0-9\]" 2 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times "vfmsub132ph\[^\n\r\]*xmm\[0-9\]" 2 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times "vfnmsub132ph\[^\n\r\]*xmm\[0-9\]" 2 { target { ! ia32 } } } } */
+
+typedef _Float16 v4hf __attribute__ ((__vector_size__ (8)));
+typedef _Float16 v2hf __attribute__ ((__vector_size__ (4)));
+
+v4hf
+fma_v4hf (v4hf a, v4hf b, v4hf c)
+{
+  return a * b + c;
+}
+
+v4hf
+fnma_v4hf (v4hf a, v4hf b, v4hf c)
+{
+  return -a * b + c;
+}
+
+v4hf
+fms_v4hf (v4hf a, v4hf b, v4hf c)
+{
+  return a * b - c;
+}
+
+v4hf
+fnms_v4hf (v4hf a, v4hf b, v4hf c)
+{
+  return -a * b - c;
+}
+
+v2hf
+fma_v2hf (v2hf a, v2hf b, v2hf c)
+{
+  return a * b + c;
+}
+
+v2hf
+fnma_v2hf (v2hf a, v2hf b, v2hf c)
+{
+  return -a * b + c;
+}
+
+v2hf
+fms_v2hf (v2hf a, v2hf b, v2hf c)
+{
+  return a * b - c;
+}
+
+v2hf
+fnms_v2hf (v2hf a, v2hf b, v2hf c)
+{
+  return -a * b - c;
+}
+