From patchwork Tue Oct 17 05:13:27 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: liuhongt
X-Patchwork-Id: 153897
From: liuhongt
To: gcc-patches@gcc.gnu.org
Cc: crazylht@gmail.com, hjl.tools@gmail.com
Subject: [PATCH] Support 32/64-bit vectorization for _Float16 fma related operations.
Date: Tue, 17 Oct 2023 13:13:27 +0800
Message-Id: <20231017051327.110300-1-hongtao.liu@intel.com>
X-Mailer: git-send-email 2.31.1

Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
Ready to push to trunk.

gcc/ChangeLog:

	* config/i386/mmx.md (fma<mode>4): New expander.
	(fms<mode>4): Ditto.
	(fnma<mode>4): Ditto.
	(fnms<mode>4): Ditto.
	(vec_fmaddsubv4hf4): Ditto.
	(vec_fmsubaddv4hf4): Ditto.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/part-vect-fmaddsubhf-1.c: New test.
	* gcc.target/i386/part-vect-fmahf-1.c: New test.
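Note on the approach: all six expanders follow the same strategy. Each 32/64-bit operand is moved into the low lanes of a V8HF register, the existing V8HF operation is run at full width, and the low part of the result is kept. The C below is a minimal scalar sketch of that idea, for illustration only; it is not GCC code. float stands in for _Float16, a * b + c stands in for the fused operation (so rounding may differ), zero-filling the upper lanes is an assumption of the sketch, and the names widen, fma_full and fma_partial are hypothetical.

```c
#include <string.h>

/* Illustrative scalar model only; float stands in for _Float16.  */
enum { PARTIAL_LANES = 4, FULL_LANES = 8 };

/* Hypothetical stand-in for the full-width V8HF operation.  A real FMA
   rounds once; a * b + c here may round twice.  */
static void
fma_full (float dst[FULL_LANES], const float a[FULL_LANES],
          const float b[FULL_LANES], const float c[FULL_LANES])
{
  for (int i = 0; i < FULL_LANES; i++)
    dst[i] = a[i] * b[i] + c[i];
}

/* Place a partial vector in the low lanes of a full-width one.
   Zeroing the upper lanes is an assumption of this sketch.  */
static void
widen (float dst[FULL_LANES], const float src[PARTIAL_LANES])
{
  memset (dst, 0, FULL_LANES * sizeof (float));
  memcpy (dst, src, PARTIAL_LANES * sizeof (float));
}

/* Partial-vector FMA: widen, operate at full width, keep the low lanes.  */
static void
fma_partial (float dst[PARTIAL_LANES], const float a[PARTIAL_LANES],
             const float b[PARTIAL_LANES], const float c[PARTIAL_LANES])
{
  float wa[FULL_LANES], wb[FULL_LANES], wc[FULL_LANES], wd[FULL_LANES];

  widen (wa, a);
  widen (wb, b);
  widen (wc, c);
  fma_full (wd, wa, wb, wc);
  memcpy (dst, wd, PARTIAL_LANES * sizeof (float));
}
```

A caller of fma_partial only ever observes the low four lanes, so whatever the upper lanes hold does not affect the visible result.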
---
 gcc/config/i386/mmx.md                        | 152 +++++++++++++++++-
 .../gcc.target/i386/part-vect-fmaddsubhf-1.c  |  22 +++
 .../gcc.target/i386/part-vect-fmahf-1.c       |  58 +++++++
 3 files changed, 231 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/part-vect-fmaddsubhf-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/part-vect-fmahf-1.c

diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md
index 82ca49c207b..491a0a51272 100644
--- a/gcc/config/i386/mmx.md
+++ b/gcc/config/i386/mmx.md
@@ -2365,7 +2365,157 @@ (define_expand "signbit<mode>2"
 
 ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
 ;;
-;; Parallel single-precision floating point conversion operations
+;; Parallel half-precision FMA multiply/accumulate instructions.
+;;
+;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
+
+(define_expand "fma<mode>4"
+  [(set (match_operand:VHF_32_64 0 "register_operand")
+        (fma:VHF_32_64
+          (match_operand:VHF_32_64 1 "nonimmediate_operand")
+          (match_operand:VHF_32_64 2 "nonimmediate_operand")
+          (match_operand:VHF_32_64 3 "nonimmediate_operand")))]
+  "TARGET_AVX512FP16 && TARGET_AVX512VL && ix86_partial_vec_fp_math"
+{
+  rtx op3 = gen_reg_rtx (V8HFmode);
+  rtx op2 = gen_reg_rtx (V8HFmode);
+  rtx op1 = gen_reg_rtx (V8HFmode);
+  rtx op0 = gen_reg_rtx (V8HFmode);
+
+  emit_insn (gen_mov<mov_to_sse_suffix>_<mode>_to_sse (op3, operands[3]));
+  emit_insn (gen_mov<mov_to_sse_suffix>_<mode>_to_sse (op2, operands[2]));
+  emit_insn (gen_mov<mov_to_sse_suffix>_<mode>_to_sse (op1, operands[1]));
+
+  emit_insn (gen_fmav8hf4 (op0, op1, op2, op3));
+
+  emit_move_insn (operands[0], lowpart_subreg (<MODE>mode, op0, V8HFmode));
+  DONE;
+})
+
+(define_expand "fms<mode>4"
+  [(set (match_operand:VHF_32_64 0 "register_operand")
+        (fma:VHF_32_64
+          (match_operand:VHF_32_64 1 "nonimmediate_operand")
+          (match_operand:VHF_32_64 2 "nonimmediate_operand")
+          (neg:VHF_32_64
+            (match_operand:VHF_32_64 3 "nonimmediate_operand"))))]
+  "TARGET_AVX512FP16 && TARGET_AVX512VL && ix86_partial_vec_fp_math"
+{
+  rtx op3 = gen_reg_rtx (V8HFmode);
+  rtx op2 = gen_reg_rtx (V8HFmode);
+  rtx op1 = gen_reg_rtx (V8HFmode);
+  rtx op0 = gen_reg_rtx (V8HFmode);
+
+  emit_insn (gen_mov<mov_to_sse_suffix>_<mode>_to_sse (op3, operands[3]));
+  emit_insn (gen_mov<mov_to_sse_suffix>_<mode>_to_sse (op2, operands[2]));
+  emit_insn (gen_mov<mov_to_sse_suffix>_<mode>_to_sse (op1, operands[1]));
+
+  emit_insn (gen_fmsv8hf4 (op0, op1, op2, op3));
+
+  emit_move_insn (operands[0], lowpart_subreg (<MODE>mode, op0, V8HFmode));
+  DONE;
+})
+
+(define_expand "fnma<mode>4"
+  [(set (match_operand:VHF_32_64 0 "register_operand")
+        (fma:VHF_32_64
+          (neg:VHF_32_64
+            (match_operand:VHF_32_64 1 "nonimmediate_operand"))
+          (match_operand:VHF_32_64 2 "nonimmediate_operand")
+          (match_operand:VHF_32_64 3 "nonimmediate_operand")))]
+  "TARGET_AVX512FP16 && TARGET_AVX512VL && ix86_partial_vec_fp_math"
+{
+  rtx op3 = gen_reg_rtx (V8HFmode);
+  rtx op2 = gen_reg_rtx (V8HFmode);
+  rtx op1 = gen_reg_rtx (V8HFmode);
+  rtx op0 = gen_reg_rtx (V8HFmode);
+
+  emit_insn (gen_mov<mov_to_sse_suffix>_<mode>_to_sse (op3, operands[3]));
+  emit_insn (gen_mov<mov_to_sse_suffix>_<mode>_to_sse (op2, operands[2]));
+  emit_insn (gen_mov<mov_to_sse_suffix>_<mode>_to_sse (op1, operands[1]));
+
+  emit_insn (gen_fnmav8hf4 (op0, op1, op2, op3));
+
+  emit_move_insn (operands[0], lowpart_subreg (<MODE>mode, op0, V8HFmode));
+  DONE;
+})
+
+(define_expand "fnms<mode>4"
+  [(set (match_operand:VHF_32_64 0 "register_operand" "=v,v,x")
+        (fma:VHF_32_64
+          (neg:VHF_32_64
+            (match_operand:VHF_32_64 1 "nonimmediate_operand"))
+          (match_operand:VHF_32_64 2 "nonimmediate_operand")
+          (neg:VHF_32_64
+            (match_operand:VHF_32_64 3 "nonimmediate_operand"))))]
+  "TARGET_AVX512FP16 && TARGET_AVX512VL && ix86_partial_vec_fp_math"
+{
+  rtx op3 = gen_reg_rtx (V8HFmode);
+  rtx op2 = gen_reg_rtx (V8HFmode);
+  rtx op1 = gen_reg_rtx (V8HFmode);
+  rtx op0 = gen_reg_rtx (V8HFmode);
+
+  emit_insn (gen_mov<mov_to_sse_suffix>_<mode>_to_sse (op3, operands[3]));
+  emit_insn (gen_mov<mov_to_sse_suffix>_<mode>_to_sse (op2, operands[2]));
+  emit_insn (gen_mov<mov_to_sse_suffix>_<mode>_to_sse (op1, operands[1]));
+
+  emit_insn (gen_fnmsv8hf4 (op0, op1, op2, op3));
+
+  emit_move_insn (operands[0], lowpart_subreg (<MODE>mode, op0, V8HFmode));
+  DONE;
+})
+
+(define_expand "vec_fmaddsubv4hf4"
+  [(match_operand:V4HF 0 "register_operand")
+   (match_operand:V4HF 1 "nonimmediate_operand")
+   (match_operand:V4HF 2 "nonimmediate_operand")
+   (match_operand:V4HF 3 "nonimmediate_operand")]
+  "TARGET_AVX512FP16 && TARGET_AVX512VL
+   && TARGET_MMX_WITH_SSE
+   && ix86_partial_vec_fp_math"
+{
+  rtx op3 = gen_reg_rtx (V8HFmode);
+  rtx op2 = gen_reg_rtx (V8HFmode);
+  rtx op1 = gen_reg_rtx (V8HFmode);
+  rtx op0 = gen_reg_rtx (V8HFmode);
+
+  emit_insn (gen_movq_v4hf_to_sse (op3, operands[3]));
+  emit_insn (gen_movq_v4hf_to_sse (op2, operands[2]));
+  emit_insn (gen_movq_v4hf_to_sse (op1, operands[1]));
+
+  emit_insn (gen_vec_fmaddsubv8hf4 (op0, op1, op2, op3));
+
+  emit_move_insn (operands[0], lowpart_subreg (V4HFmode, op0, V8HFmode));
+  DONE;
+})
+
+(define_expand "vec_fmsubaddv4hf4"
+  [(match_operand:V4HF 0 "register_operand")
+   (match_operand:V4HF 1 "nonimmediate_operand")
+   (match_operand:V4HF 2 "nonimmediate_operand")
+   (match_operand:V4HF 3 "nonimmediate_operand")]
+  "TARGET_AVX512FP16 && TARGET_AVX512VL
+   && ix86_partial_vec_fp_math
+   && TARGET_MMX_WITH_SSE"
+{
+  rtx op3 = gen_reg_rtx (V8HFmode);
+  rtx op2 = gen_reg_rtx (V8HFmode);
+  rtx op1 = gen_reg_rtx (V8HFmode);
+  rtx op0 = gen_reg_rtx (V8HFmode);
+
+  emit_insn (gen_movq_v4hf_to_sse (op3, operands[3]));
+  emit_insn (gen_movq_v4hf_to_sse (op2, operands[2]));
+  emit_insn (gen_movq_v4hf_to_sse (op1, operands[1]));
+
+  emit_insn (gen_vec_fmsubaddv8hf4 (op0, op1, op2, op3));
+
+  emit_move_insn (operands[0], lowpart_subreg (V4HFmode, op0, V8HFmode));
+  DONE;
+})
+
+;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
+;;
+;; Parallel half-precision floating point conversion operations
 ;;
 ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
 
diff --git a/gcc/testsuite/gcc.target/i386/part-vect-fmaddsubhf-1.c b/gcc/testsuite/gcc.target/i386/part-vect-fmaddsubhf-1.c
new file mode 100644
index 00000000000..051f992f66e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/part-vect-fmaddsubhf-1.c
@@ -0,0 +1,22 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512fp16 -mavx512vl -O2" } */
+/* { dg-final { scan-assembler-times "vfmaddsub...ph\[ \t\]+\[^\n\]*%xmm\[0-9\]" 1 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times "vfmsubadd...ph\[ \t\]+\[^\n\]*%xmm\[0-9\]" 1 { target { ! ia32 } } } } */
+
+void vec_fmaddsub_fp16(int n, _Float16 da_r, _Float16 *x, _Float16* y, _Float16* __restrict z)
+{
+  for (int i = 0; i < 4; i += 2)
+    {
+      z[i] = da_r * x[i] - y[i];
+      z[i+1] = da_r * x[i+1] + y[i+1];
+    }
+}
+
+void vec_fmasubadd_fp16(int n, _Float16 da_r, _Float16 *x, _Float16* y, _Float16* __restrict z)
+{
+  for (int i = 0; i < 4; i += 2)
+    {
+      z[i] = da_r * x[i] + y[i];
+      z[i+1] = da_r * x[i+1] - y[i+1];
+    }
+}
diff --git a/gcc/testsuite/gcc.target/i386/part-vect-fmahf-1.c b/gcc/testsuite/gcc.target/i386/part-vect-fmahf-1.c
new file mode 100644
index 00000000000..46e3cd34103
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/part-vect-fmahf-1.c
@@ -0,0 +1,58 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mavx512fp16 -mavx512vl" } */
+/* { dg-final { scan-assembler-times "vfmadd132ph\[^\n\r\]*xmm\[0-9\]" 2 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times "vfnmadd132ph\[^\n\r\]*xmm\[0-9\]" 2 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times "vfmsub132ph\[^\n\r\]*xmm\[0-9\]" 2 { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler-times "vfnmsub132ph\[^\n\r\]*xmm\[0-9\]" 2 { target { ! ia32 } } } } */
+
+typedef _Float16 v4hf __attribute__ ((__vector_size__ (8)));
+typedef _Float16 v2hf __attribute__ ((__vector_size__ (4)));
+
+v4hf
+fma_v4hf (v4hf a, v4hf b, v4hf c)
+{
+  return a * b + c;
+}
+
+v4hf
+fnma_v4hf (v4hf a, v4hf b, v4hf c)
+{
+  return -a * b + c;
+}
+
+v4hf
+fms_v4hf (v4hf a, v4hf b, v4hf c)
+{
+  return a * b - c;
+}
+
+v4hf
+fnms_v4hf (v4hf a, v4hf b, v4hf c)
+{
+  return -a * b - c;
+}
+
+v2hf
+fma_v2hf (v2hf a, v2hf b, v2hf c)
+{
+  return a * b + c;
+}
+
+v2hf
+fnma_v2hf (v2hf a, v2hf b, v2hf c)
+{
+  return -a * b + c;
+}
+
+v2hf
+fms_v2hf (v2hf a, v2hf b, v2hf c)
+{
+  return a * b - c;
+}
+
+v2hf
+fnms_v2hf (v2hf a, v2hf b, v2hf c)
+{
+  return -a * b - c;
+}
+