From patchwork Mon Nov 13 20:07:14 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Xi Ruoyao <xry111@xry111.site>
X-Patchwork-Id: 164637
From: Xi Ruoyao <xry111@xry111.site>
To: gcc-patches@gcc.gnu.org
Cc: chenglulu <chenglulu@loongson.cn>, i@xen0n.name, xuchenghua@loongson.cn,
 Xi Ruoyao <xry111@xry111.site>
Subject: [PATCH] LoongArch: Handle vectorized copysign (x,
 -1) expansion efficiently
Date: Tue, 14 Nov 2023 04:07:14 +0800
Message-ID: <20231113200840.339229-1-xry111@xry111.site>
X-Mailer: git-send-email 2.42.1
MIME-Version: 1.0

With LSX or LASX, copysign (x[i], c) with c being -1 (or any other
negative constant) can be vectorized using [x]vbitseti.{w/d}
instructions to set the sign bits directly.

Inspired by Tamar Christina's "AArch64: Handle copysign (x, -1) expansion
efficiently" (r14-5289).

gcc/ChangeLog:

	* config/loongarch/lsx.md (copysign<mode>3): Allow operands[2] to
	be a reg_or_vector_same_val_operand.  If it's a const vector
	whose elements are all the same negative value, expand the
	copysign with a bitset instruction.  Otherwise, force it into a
	register.
	* config/loongarch/lasx.md (copysign<mode>3): Likewise.

gcc/testsuite/ChangeLog:

	* g++.target/loongarch/vect-copysign-negconst.C: New test.
	* g++.target/loongarch/vect-copysign-negconst-run.C: New test.
---

Bootstrapped and regtested on loongarch64-linux-gnu.  Ok for trunk?
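
For reviewers, here is a minimal scalar sketch of why the bitset expansion
is valid.  It assumes float and double are IEEE 754 binary32 and binary64.
The helper names (sign_bit_index, set_sign_bit, same_bits, agrees) are
hypothetical and are not part of the patch; set_sign_bit only models what
one lane of [x]vbitseti.{w/d} does with immediate 31 or 63.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Index of the sign bit for element type T.  Mirrors the expression
// 8 * GET_MODE_SIZE (<UNITMODE>mode) - 1 used in the expander.
template <class T>
constexpr int
sign_bit_index ()
{
  return 8 * sizeof (T) - 1;
}

// Hypothetical scalar model of one vector lane: set the sign bit of x,
// using U as the same-sized unsigned integer type.
template <class T, class U>
T
set_sign_bit (T x)
{
  static_assert (sizeof (T) == sizeof (U), "element size mismatch");
  U bits;
  std::memcpy (&bits, &x, sizeof (bits));
  bits |= U (1) << sign_bit_index<T> ();
  std::memcpy (&x, &bits, sizeof (x));
  return x;
}

// Bitwise comparison, so that +0.0 and -0.0 are told apart.
template <class T>
bool
same_bits (T a, T b)
{
  return std::memcmp (&a, &b, sizeof (T)) == 0;
}

// True if copysign (x, neg) equals x with its sign bit set.
bool
agrees (float x, float neg)
{
  return same_bits (std::copysign (x, neg),
		    set_sign_bit<float, std::uint32_t> (x));
}

bool
agrees (double x, double neg)
{
  return same_bits (std::copysign (x, neg),
		    set_sign_bit<double, std::uint64_t> (x));
}
```

For any negative second argument, agrees () is expected to hold for every
x, including zeros, which is why the magnitude of the constant (-1, -2,
-3, ...) does not matter to the expansion.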

 gcc/config/loongarch/lasx.md                  | 22 ++++++++-
 gcc/config/loongarch/lsx.md                   | 22 ++++++++-
 .../loongarch/vect-copysign-negconst-run.C    | 47 +++++++++++++++++++
 .../loongarch/vect-copysign-negconst.C        | 27 +++++++++++
 4 files changed, 116 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/g++.target/loongarch/vect-copysign-negconst-run.C
 create mode 100644 gcc/testsuite/g++.target/loongarch/vect-copysign-negconst.C

diff --git a/gcc/config/loongarch/lasx.md b/gcc/config/loongarch/lasx.md
index f0f2dd08dd8..2e11f061202 100644
--- a/gcc/config/loongarch/lasx.md
+++ b/gcc/config/loongarch/lasx.md
@@ -3136,11 +3136,31 @@ (define_expand "copysign<mode>3"
 	  (match_operand:FLASX 1 "register_operand")))
    (set (match_dup 5)
 	(and:FLASX (match_dup 3)
-		   (match_operand:FLASX 2 "register_operand")))
+		   (match_operand:FLASX 2 "reg_or_vector_same_val_operand")))
    (set (match_operand:FLASX 0 "register_operand")
 	(ior:FLASX (match_dup 4) (match_dup 5)))]
   "ISA_HAS_LASX"
 {
+  /* copysign (x, -1) should instead be expanded as setting the sign
+     bit.  */
+  if (!REG_P (operands[2]))
+    {
+      rtx op2_elt = unwrap_const_vec_duplicate (operands[2]);
+      if (GET_CODE (op2_elt) == CONST_DOUBLE
+	  && real_isneg (CONST_DOUBLE_REAL_VALUE (op2_elt)))
+	{
+	  rtx n = GEN_INT (8 * GET_MODE_SIZE (<UNITMODE>mode) - 1);
+	  operands[0] = lowpart_subreg (<VIMODE256>mode, operands[0],
+					<MODE>mode);
+	  operands[1] = lowpart_subreg (<VIMODE256>mode, operands[1],
+					<MODE>mode);
+	  emit_insn (gen_lasx_xvbitseti_<lasxfmt> (operands[0],
+						   operands[1], n));
+	  DONE;
+	}
+    }
+
+  operands[2] = force_reg (<MODE>mode, operands[2]);
   operands[3] = loongarch_build_signbit_mask (<MODE>mode, 1, 0);
 
   operands[4] = gen_reg_rtx (<MODE>mode);
diff --git a/gcc/config/loongarch/lsx.md b/gcc/config/loongarch/lsx.md
index 55c7d79a030..8ea41c85b01 100644
--- a/gcc/config/loongarch/lsx.md
+++ b/gcc/config/loongarch/lsx.md
@@ -2873,11 +2873,31 @@ (define_expand "copysign<mode>3"
 	  (match_operand:FLSX 1 "register_operand")))
    (set (match_dup 5)
 	(and:FLSX (match_dup 3)
-		  (match_operand:FLSX 2 "register_operand")))
+		  (match_operand:FLSX 2 "reg_or_vector_same_val_operand")))
    (set (match_operand:FLSX 0 "register_operand")
 	(ior:FLSX (match_dup 4) (match_dup 5)))]
   "ISA_HAS_LSX"
 {
+  /* copysign (x, -1) should instead be expanded as setting the sign
+     bit.  */
+  if (!REG_P (operands[2]))
+    {
+      rtx op2_elt = unwrap_const_vec_duplicate (operands[2]);
+      if (GET_CODE (op2_elt) == CONST_DOUBLE
+	  && real_isneg (CONST_DOUBLE_REAL_VALUE (op2_elt)))
+	{
+	  rtx n = GEN_INT (8 * GET_MODE_SIZE (<UNITMODE>mode) - 1);
+	  operands[0] = lowpart_subreg (<VIMODE>mode, operands[0],
+					<MODE>mode);
+	  operands[1] = lowpart_subreg (<VIMODE>mode, operands[1],
+					<MODE>mode);
+	  emit_insn (gen_lsx_vbitseti_<lsxfmt> (operands[0], operands[1],
+						n));
+	  DONE;
+	}
+    }
+
+  operands[2] = force_reg (<MODE>mode, operands[2]);
   operands[3] = loongarch_build_signbit_mask (<MODE>mode, 1, 0);
 
   operands[4] = gen_reg_rtx (<MODE>mode);
diff --git a/gcc/testsuite/g++.target/loongarch/vect-copysign-negconst-run.C b/gcc/testsuite/g++.target/loongarch/vect-copysign-negconst-run.C
new file mode 100644
index 00000000000..d2d5d15c933
--- /dev/null
+++ b/gcc/testsuite/g++.target/loongarch/vect-copysign-negconst-run.C
@@ -0,0 +1,47 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -march=loongarch64 -mlasx -mno-strict-align" } */
+/* { dg-require-effective-target loongarch_asx_hw } */
+
+#include "vect-copysign-negconst.C"
+
+double d[] = {1.2, -3.4, -5.6, 7.8};
+float f[] = {1.2, -3.4, -5.6, 7.8, -9.0, -11.4, 51.4, 1919.810};
+
+double _abs(double x) { return __builtin_fabs (x); }
+float _abs(float x) { return __builtin_fabsf (x); }
+
+template <class T>
+void
+check (T *arr, T *orig, int len)
+{
+  for (int i = 0; i < len; i++)
+    {
+      if (arr[i] > 0)
+	__builtin_trap ();
+      if (_abs (arr[i]) != _abs (orig[i]))
+	__builtin_trap ();
+    }
+}
+
+int
+main()
+{
+  double test_d[4];
+  float test_f[8];
+
+  __builtin_memcpy (test_d, d, sizeof (test_d));
+  force_negative<2> (test_d);
+  check (test_d, d, 2);
+
+  __builtin_memcpy (test_d, d, sizeof (test_d));
+  force_negative<4> (test_d);
+  check (test_d, d, 4);
+
+  __builtin_memcpy (test_f, f, sizeof (test_f));
+  force_negative<4> (test_f);
+  check (test_f, f, 4);
+
+  __builtin_memcpy (test_f, f, sizeof (test_f));
+  force_negative<8> (test_f);
+  check (test_f, f, 8);
+}
diff --git a/gcc/testsuite/g++.target/loongarch/vect-copysign-negconst.C b/gcc/testsuite/g++.target/loongarch/vect-copysign-negconst.C
new file mode 100644
index 00000000000..5e8820d2bca
--- /dev/null
+++ b/gcc/testsuite/g++.target/loongarch/vect-copysign-negconst.C
@@ -0,0 +1,27 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=loongarch64 -mlasx -mno-strict-align" } */
+/* { dg-final { scan-assembler "\txvbitseti.*63" } } */
+/* { dg-final { scan-assembler "\txvbitseti.*31" } } */
+/* { dg-final { scan-assembler "\tvbitseti.*63" } } */
+/* { dg-final { scan-assembler "\tvbitseti.*31" } } */
+
+template <int N>
+__attribute__ ((noipa)) void
+force_negative (float *arr)
+{
+  for (int i = 0; i < N; i++)
+    arr[i] = __builtin_copysignf (arr[i], -2);
+}
+
+template <int N>
+__attribute__ ((noipa)) void
+force_negative (double *arr)
+{
+  for (int i = 0; i < N; i++)
+    arr[i] = __builtin_copysign (arr[i], -3);
+}
+
+template void force_negative<4>(float *);
+template void force_negative<8>(float *);
+template void force_negative<2>(double *);
+template void force_negative<4>(double *);