From patchwork Tue Sep 20 02:12:35 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: liuhongt <hongtao.liu@intel.com>
X-Patchwork-Id: 1308
Return-Path: <gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org>
Delivered-To: ouuuleilei@gmail.com
Received: by 2002:a5d:5044:0:0:0:0:0 with SMTP id h4csp1241781wrt;
        Mon, 19 Sep 2022 19:15:27 -0700 (PDT)
X-Google-Smtp-Source: 
 AMsMyM6z1qFdpTNGRTU7APIT2mM1UY/35DZ6y2eP8o5u6QL4fNq4SLO0YARzdjBaiyk2SMdrd2os
X-Received: by 2002:a17:907:3f24:b0:77b:1916:7967 with SMTP id
 hq36-20020a1709073f2400b0077b19167967mr15134506ejc.558.1663640127197;
        Mon, 19 Sep 2022 19:15:27 -0700 (PDT)
ARC-Seal: i=1; a=rsa-sha256; t=1663640127; cv=none;
        d=google.com; s=arc-20160816;
        b=jO1EP9WIESo9ZMQ/WKTNoiYUy6aT5hZMaMYLF9pR6XX8LyNYNjPyUMKfi+QcACFYnw
         jZSJzAq+t3pa8gRjELlK6hTpcstQ87hiCw22y2lUU82qJEpZ1XdwggoMdE85rqgqsKi4
         wcyBZ6i9GEkg7PKUPqcDUfFiT3mEZ/9uyGO0KPZlKqphRl9OFUPOv5P6GedjB1F+7rDV
         HnZm7eFy6971QvQ0CD7tzR3IiENDi576p4hcE+x+NYea+BU2nTrUB9aVq5QKIQ3rKMMd
         ekqorlVEF8RKILcd30vCT0OeoaI802ItRfBG2UoTLgUxo/P/h/GTRT8Xf388Yaprlc4a
         zs3Q==
ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com;
 s=arc-20160816;
        h=sender:errors-to:reply-to:from:list-subscribe:list-help:list-post
         :list-archive:list-unsubscribe:list-id:precedence
         :content-transfer-encoding:mime-version:message-id:date:subject:to
         :dmarc-filter:delivered-to:dkim-signature:dkim-filter;
        bh=sE4KsB8ZVnY8AX7U6f7nAYyHyPrWsT7mk1i2dVm/lbo=;
        b=Wx+l9FVG0gyIramed8XMjtjdKhgUJmG8W0TytBQs75F5ujJ2FKKFYWaCuNYD4PFMap
         GDzF3UubmiJkSt8ymzshzhPizdDzd7OpDFNY5w/tNbq4v9jCxuVC28jkUFZwrf8voZ+5
         /Wx50wxOszywsRagVPV40E3M3wjOUQ1OT3clHsfCJN13SySBrkR/dIXrC8i0th6bcvf5
         m08ddLDsHKocAwpcFvB0uJwclLKseShPQQ91G78MITICptJYlwaNl6+/F5sO8HnayG7v
         7h5ocr9+a5YkaEYiE8hlzcI0ZwQ4JggGAHzUXXFGRr2bJarUsNffQ+0/+ZHl4Hb4Kn1J
         UGAg==
ARC-Authentication-Results: i=1; mx.google.com;
       dkim=pass header.i=@gcc.gnu.org header.s=default header.b=sbVtfPwn;
       spf=pass (google.com: domain of
 gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org designates
 2620:52:3:1:0:246e:9693:128c as permitted sender)
 smtp.mailfrom="gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org";
       dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=gnu.org
Received: from sourceware.org (server2.sourceware.org.
 [2620:52:3:1:0:246e:9693:128c])
        by mx.google.com with ESMTPS id
 he6-20020a1709073d8600b00781b5f92182si109911ejc.38.2022.09.19.19.15.27
        for <ouuuleilei@gmail.com>
        (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256);
        Mon, 19 Sep 2022 19:15:27 -0700 (PDT)
Received-SPF: pass (google.com: domain of
 gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org designates
 2620:52:3:1:0:246e:9693:128c as permitted sender)
 client-ip=2620:52:3:1:0:246e:9693:128c;
Authentication-Results: mx.google.com;
       dkim=pass header.i=@gcc.gnu.org header.s=default header.b=sbVtfPwn;
       spf=pass (google.com: domain of
 gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org designates
 2620:52:3:1:0:246e:9693:128c as permitted sender)
 smtp.mailfrom="gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org";
       dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=gnu.org
Received: from server2.sourceware.org (localhost [IPv6:::1])
	by sourceware.org (Postfix) with ESMTP id 241BE38582B3
	for <ouuuleilei@gmail.com>; Tue, 20 Sep 2022 02:15:25 +0000 (GMT)
DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 241BE38582B3
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org;
	s=default; t=1663640125;
	bh=sE4KsB8ZVnY8AX7U6f7nAYyHyPrWsT7mk1i2dVm/lbo=;
	h=To:Subject:Date:List-Id:List-Unsubscribe:List-Archive:List-Post:
	 List-Help:List-Subscribe:From:Reply-To:From;
	b=sbVtfPwnE6NKyj6Jjgl5MC8ht2M/RK+2dCxDP7vGNrmIjtjD1/3+hMyTNAPRbsRx7
	 i68JFvpJZ+bm+zMJOTvbcbE6VGA2JTSu4MDeYijnA14KbMxulqt8botlZf338Be3jK
	 Q/HjCNE8efHVn/7sH5spJ4UHo0v6ogOM9kXA5bbM=
X-Original-To: gcc-patches@gcc.gnu.org
Delivered-To: gcc-patches@gcc.gnu.org
Received: from mga14.intel.com (mga14.intel.com [192.55.52.115])
 by sourceware.org (Postfix) with ESMTPS id 6B8A03858D28
 for <gcc-patches@gcc.gnu.org>; Tue, 20 Sep 2022 02:14:39 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 6B8A03858D28
X-IronPort-AV: E=McAfee;i="6500,9779,10475"; a="299562592"
X-IronPort-AV: E=Sophos;i="5.93,329,1654585200"; d="scan'208";a="299562592"
Received: from orsmga007.jf.intel.com ([10.7.209.58])
 by fmsmga103.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384;
 19 Sep 2022 19:14:38 -0700
X-ExtLoop1: 1
X-IronPort-AV: E=Sophos;i="5.93,329,1654585200"; d="scan'208";a="614219721"
Received: from shvmail03.sh.intel.com ([10.239.245.20])
 by orsmga007.jf.intel.com with ESMTP; 19 Sep 2022 19:14:36 -0700
Received: from shliclel4051.sh.intel.com (shliclel4051.sh.intel.com
 [10.239.240.51])
 by shvmail03.sh.intel.com (Postfix) with ESMTP id 4BE991005165;
 Tue, 20 Sep 2022 10:14:35 +0800 (CST)
To: gcc-patches@gcc.gnu.org
Subject: [PATCH] Support 64-bit vectorization for single-precision floating
 rounding operation.
Date: Tue, 20 Sep 2022 10:12:35 +0800
Message-Id: <20220920021235.2634748-1-hongtao.liu@intel.com>
X-Mailer: git-send-email 2.27.0
MIME-Version: 1.0
X-Spam-Status: No, score=-12.2 required=5.0 tests=BAYES_00, DKIMWL_WL_HIGH,
 DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0,
 KAM_SHORT,
 SPF_HELO_NONE, SPF_NONE, TXREP autolearn=ham autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on
 server2.sourceware.org
X-BeenThere: gcc-patches@gcc.gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Gcc-patches mailing list <gcc-patches.gcc.gnu.org>
List-Unsubscribe: <https://gcc.gnu.org/mailman/options/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=unsubscribe>
List-Archive: <https://gcc.gnu.org/pipermail/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-request@gcc.gnu.org?subject=help>
List-Subscribe: <https://gcc.gnu.org/mailman/listinfo/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=subscribe>
X-Patchwork-Original-From: liuhongt via Gcc-patches <gcc-patches@gcc.gnu.org>
From: liuhongt <hongtao.liu@intel.com>
Reply-To: liuhongt <hongtao.liu@intel.com>
Errors-To: gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org
Sender: "Gcc-patches" <gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org>
X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?=
X-GMAIL-THRID: =?utf-8?q?1744453109807782310?=
X-GMAIL-MSGID: =?utf-8?q?1744453109807782310?=

Here's list the patch supported.
rint/nearbyint/ceil/floor/trunc/lrint/lceil/lfloor/round/lround.


Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}
Ok for trunk?

gcc/ChangeLog:

	PR target/106910
	* config/i386/mmx.md (nearbyintv2sf2): New expander.
	(rintv2sf2): Ditto.
	(ceilv2sf2): Ditto.
	(lceilv2sfv2si2): Ditto.
	(floorv2sf2): Ditto.
	(lfloorv2sfv2si2): Ditto.
	(btruncv2sf2): Ditto.
	(lrintv2sfv2si2): Ditto.
	(roundv2sf2): Ditto.
	(lroundv2sfv2si2): Ditto.
	(*mmx_roundv2sf2): New define_insn.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/pr106910-1.c: New test.
---
 gcc/config/i386/mmx.md                     | 154 +++++++++++++++++++++
 gcc/testsuite/gcc.target/i386/pr106910-1.c |  77 +++++++++++
 2 files changed, 231 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr106910-1.c

diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md
index dda4b43f5c1..222a041de58 100644
--- a/gcc/config/i386/mmx.md
+++ b/gcc/config/i386/mmx.md
@@ -1627,6 +1627,160 @@ (define_expand "vec_initv2sfsf"
   DONE;
 })
 
+;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
+;;
+;; Parallel single-precision floating point rounding operations.
+;;
+;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
+
+(define_expand "nearbyintv2sf2"
+  [(set (match_operand:V2SF 0 "register_operand")
+	(unspec:V2SF
+	  [(match_operand:V2SF 1 "register_operand")
+	   (match_dup 2)]
+	  UNSPEC_ROUND))]
+  "TARGET_SSE4_1 && TARGET_MMX_WITH_SSE"
+  "operands[2] = GEN_INT (ROUND_MXCSR | ROUND_NO_EXC);")
+
+(define_expand "rintv2sf2"
+  [(set (match_operand:V2SF 0 "register_operand")
+	(unspec:V2SF
+	  [(match_operand:V2SF 1 "register_operand")
+	   (match_dup 2)]
+	  UNSPEC_ROUND))]
+  "TARGET_SSE4_1 && TARGET_MMX_WITH_SSE"
+  "operands[2] = GEN_INT (ROUND_MXCSR);")
+
+(define_expand "ceilv2sf2"
+  [(set (match_operand:V2SF 0 "register_operand")
+	(unspec:V2SF
+	  [(match_operand:V2SF 1 "register_operand")
+	   (match_dup 2)]
+	  UNSPEC_ROUND))]
+  "TARGET_SSE4_1 && !flag_trapping_math
+   && TARGET_MMX_WITH_SSE"
+  "operands[2] = GEN_INT (ROUND_CEIL | ROUND_NO_EXC);")
+
+(define_expand "lceilv2sfv2si2"
+  [(match_operand:V2SI 0 "register_operand")
+   (match_operand:V2SF 1 "register_operand")]
+ "TARGET_SSE4_1 && !flag_trapping_math
+  && TARGET_MMX_WITH_SSE"
+{
+  rtx tmp = gen_reg_rtx (V2SFmode);
+  emit_insn (gen_ceilv2sf2 (tmp, operands[1]));
+  emit_insn (gen_fix_truncv2sfv2si2 (operands[0], tmp));
+  DONE;
+})
+
+(define_expand "floorv2sf2"
+  [(set (match_operand:V2SF 0 "register_operand")
+	(unspec:V2SF
+	  [(match_operand:V2SF 1 "vector_operand")
+	   (match_dup 2)]
+	  UNSPEC_ROUND))]
+  "TARGET_SSE4_1 && !flag_trapping_math
+  && TARGET_MMX_WITH_SSE"
+  "operands[2] = GEN_INT (ROUND_FLOOR | ROUND_NO_EXC);")
+
+(define_expand "lfloorv2sfv2si2"
+  [(match_operand:V2SI 0 "register_operand")
+   (match_operand:V2SF 1 "register_operand")]
+ "TARGET_SSE4_1 && !flag_trapping_math
+  && TARGET_MMX_WITH_SSE"
+{
+  rtx tmp = gen_reg_rtx (V2SFmode);
+  emit_insn (gen_floorv2sf2 (tmp, operands[1]));
+  emit_insn (gen_fix_truncv2sfv2si2 (operands[0], tmp));
+  DONE;
+})
+
+(define_expand "btruncv2sf2"
+  [(set (match_operand:V2SF 0 "register_operand")
+	(unspec:V2SF
+	  [(match_operand:V2SF 1 "register_operand")
+	   (match_dup 2)]
+	  UNSPEC_ROUND))]
+  "TARGET_SSE4_1 && !flag_trapping_math"
+  "operands[2] = GEN_INT (ROUND_TRUNC | ROUND_NO_EXC);")
+
+(define_insn "*mmx_roundv2sf2"
+  [(set (match_operand:V2SF 0 "register_operand" "=Yr,*x,v")
+	(unspec:V2SF
+	  [(match_operand:V2SF 1 "register_operand" "Yr,x,v")
+	   (match_operand:SI 2 "const_0_to_15_operand")]
+	  UNSPEC_ROUND))]
+  "TARGET_SSE4_1 && TARGET_MMX_WITH_SSE"
+  "%vroundps\t{%2, %1, %0|%0, %1, %2}"
+  [(set_attr "isa" "noavx,noavx,avx")
+   (set_attr "type" "ssecvt")
+   (set_attr "prefix_data16" "1,1,*")
+   (set_attr "prefix_extra" "1")
+   (set_attr "length_immediate" "1")
+   (set_attr "prefix" "orig,orig,vex")
+   (set_attr "mode" "V4SF")])
+
+(define_insn "lrintv2sfv2si2"
+  [(set (match_operand:V2SI 0 "register_operand" "=v")
+	(unspec:V2SI
+	  [(match_operand:V2SF 1 "register_operand" "v")]
+	  UNSPEC_FIX_NOTRUNC))]
+  "TARGET_MMX_WITH_SSE"
+  "%vcvtps2dq\t{%1, %0|%0, %1}"
+  [(set_attr "type" "ssecvt")
+   (set (attr "prefix_data16")
+     (if_then_else
+       (match_test "TARGET_AVX")
+     (const_string "*")
+     (const_string "1")))
+   (set_attr "prefix" "maybe_vex")
+   (set_attr "mode" "TI")])
+
+(define_expand "roundv2sf2"
+  [(set (match_dup 3)
+	(plus:V2SF
+	  (match_operand:V2SF 1 "register_operand")
+	  (match_dup 2)))
+   (set (match_operand:V2SF 0 "register_operand")
+	(unspec:V2SF
+	  [(match_dup 3) (match_dup 4)]
+	  UNSPEC_ROUND))]
+  "TARGET_SSE4_1 && !flag_trapping_math
+   && TARGET_MMX_WITH_SSE"
+{
+  const struct real_format *fmt;
+  REAL_VALUE_TYPE pred_half, half_minus_pred_half;
+  rtx half, vec_half;
+
+  /* load nextafter (0.5, 0.0) */
+  fmt = REAL_MODE_FORMAT (SFmode);
+  real_2expN (&half_minus_pred_half, -(fmt->p) - 1, SFmode);
+  real_arithmetic (&pred_half, MINUS_EXPR, &dconsthalf, &half_minus_pred_half);
+  half = const_double_from_real_value (pred_half, SFmode);
+
+  vec_half = ix86_build_const_vector (V2SFmode, true, half);
+  vec_half = force_reg (V2SFmode, vec_half);
+
+  operands[2] = gen_reg_rtx (V2SFmode);
+  emit_insn (gen_copysignv2sf3 (operands[2], vec_half, operands[1]));
+
+  operands[3] = gen_reg_rtx (V2SFmode);
+  operands[4] = GEN_INT (ROUND_TRUNC);
+})
+
+(define_expand "lroundv2sfv2si2"
+  [(match_operand:V2SI 0 "register_operand")
+   (match_operand:V2SF 1 "register_operand")]
+ "TARGET_SSE4_1 && !flag_trapping_math
+  && TARGET_MMX_WITH_SSE"
+{
+  rtx tmp = gen_reg_rtx (V2SFmode);
+  emit_insn (gen_roundv2sf2 (tmp, operands[1]));
+  emit_insn (gen_fix_truncv2sfv2si2 (operands[0], tmp));
+  DONE;
+})
+
+
 ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
 ;;
 ;; Parallel half-precision floating point arithmetic
diff --git a/gcc/testsuite/gcc.target/i386/pr106910-1.c b/gcc/testsuite/gcc.target/i386/pr106910-1.c
new file mode 100644
index 00000000000..c7685a32183
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr106910-1.c
@@ -0,0 +1,77 @@
+/* { dg-do compile { target { ! ia32 } } } */
+/* { dg-options "-msse4.1 -O2 -Ofast" } */
+/* { dg-final { scan-assembler-times "roundps" 9 } } */
+/* { dg-final { scan-assembler-times "cvtps2dq" 1 } } */
+/* { dg-final { scan-assembler-times "cvttps2dq" 3 } } */
+
+#include<math.h>
+
+void
+foo (float* p, float* __restrict q)
+{
+  p[0] = truncf (q[0]);
+  p[1] = truncf (q[1]);
+}
+
+void
+foo1 (float* p, float* __restrict q)
+{
+  p[0] = floorf (q[0]);
+  p[1] = floorf (q[1]);
+}
+
+void
+foo1i (int* p, float* __restrict q)
+{
+  p[0] = (int) floorf (q[0]);
+  p[1] = (int) floorf (q[1]);
+}
+
+void
+foo2 (float* p, float* __restrict q)
+{
+  p[0] = ceilf (q[0]);
+  p[1] = ceilf (q[1]);
+}
+
+void
+foo2i (int* p, float* __restrict q)
+{
+  p[0] = (int) ceilf (q[0]);
+  p[1] = (int) ceilf (q[1]);
+}
+
+void
+foo3 (float* p, float* __restrict q)
+{
+  p[0] = rintf (q[0]);
+  p[1] = rintf (q[1]);
+}
+
+void
+foo3i (int* p, float* __restrict q)
+{
+  p[0] = (int) rintf (q[0]);
+  p[1] = (int) rintf (q[1]);
+}
+
+void
+foo4 (float* p, float* __restrict q)
+{
+  p[0] = nearbyintf (q[0]);
+  p[1] = nearbyintf (q[1]);
+}
+
+void
+foo5(float* p, float* __restrict q)
+{
+  p[0] = roundf (q[0]);
+  p[1] = roundf (q[1]);
+}
+
+void
+foo5i(int* p, float* __restrict q)
+{
+  p[0] = (int) roundf (q[0]);
+  p[1] = (int) roundf (q[1]);
+}