From patchwork Thu Jan 25 12:05:28 2024
X-Patchwork-Submitter: Richard Sandiford
X-Patchwork-Id: 192043
From: Richard Sandiford <richard.sandiford@arm.com>
To: gcc-patches@gcc.gnu.org
Mail-Followup-To: gcc-patches@gcc.gnu.org, richard.sandiford@arm.com
Subject: [pushed] aarch64: Avoid paradoxical subregs in UXTL split [PR113485]
Date: Thu, 25 Jan 2024 12:05:28 +0000

g:74e3e839ab2d36841320 handled the UXTL{,2}-ZIP[12] optimisation in
split1.  The UXTL input is a 64-bit vector of N-bit elements and the
result is a 128-bit vector of 2N-bit elements.  The corresponding ZIP1
operates on 128-bit vectors of N-bit elements.
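To illustrate the equivalence that the split relies on, here is a rough
intrinsics-level sketch (little-endian lane order assumed; the widen_*
wrappers are purely illustrative and are not taken from the patch):

/* Illustrative sketch only, not part of the patch.  Assumes
   little-endian lane order.  */
#include <arm_neon.h>

/* UXTL: zero-extend each 8-bit lane of a 64-bit vector to 16 bits.  */
uint16x8_t
widen_uxtl (uint8x8_t x)
{
  return vmovl_u8 (x);
}

/* The ZIP1-with-zero form of the same operation: interleaving with a
   zero vector places a zero byte above each data byte, so the 128-bit
   result holds the same 16-bit lanes.  ZIP1 only reads the low 64 bits
   of each input (the high half supplied by vcombine_u8 is ignored),
   which is the property the patch below relies on.  */
uint16x8_t
widen_zip1 (uint8x8_t x)
{
  uint8x16_t zipped = vzip1q_u8 (vcombine_u8 (x, vdup_n_u8 (0)),
                                 vdupq_n_u8 (0));
  return vreinterpretq_u16_u8 (zipped);
}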
This meant that the ZIP1 input had to be a 128-bit paradoxical subreg
of the 64-bit UXTL input.  In the PRs, it wasn't possible to generate
this subreg because the inputs were already subregs of an x[234]
structure of 64-bit vectors.

I don't think the same thing can happen for UXTL2->ZIP2, because the
UXTL2 input is a 128-bit vector rather than a 64-bit vector.

It isn't really necessary for ZIP1 to take 128-bit inputs, since the
upper 64 bits are ignored.  This patch therefore adds a pattern for
64-bit -> 128-bit ZIP1s.  In principle, we should probably use this
form for all ZIP1s.  But in practice, that creates an awkward special
case, and would be quite invasive for stage 4.

Tested on aarch64-linux-gnu & pushed.

Richard

gcc/
	PR target/113485
	* config/aarch64/aarch64-simd.md (aarch64_zip1<mode>_low): New
	pattern.
	(<optab><Vnarrowq><mode>2): Use it instead of generating a
	paradoxical subreg for the input.

gcc/testsuite/
	PR target/113485
	* gcc.target/aarch64/pr113485.c: New test.
	* gcc.target/aarch64/pr113573.c: Likewise.
---
 gcc/config/aarch64/aarch64-simd.md          | 17 +++++++--
 gcc/testsuite/gcc.target/aarch64/pr113485.c | 25 +++++++++++++
 gcc/testsuite/gcc.target/aarch64/pr113573.c | 40 +++++++++++++++++++++
 3 files changed, 79 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/pr113485.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/pr113573.c

diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index 556d0cf359f..48f0741e7d0 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -8505,6 +8505,18 @@ (define_insn "aarch64_<PERMUTE:perm_insn><mode>"
   [(set_attr "type" "neon_permute<q>")]
 )
 
+;; ZIP1 ignores the contents of the upper halves of the registers,
+;; so we can describe 128-bit operations in terms of 64-bit inputs.
+(define_insn "aarch64_zip1<mode>_low"
+  [(set (match_operand:VQ 0 "register_operand" "=w")
+	(unspec:VQ [(match_operand:<VHALF> 1 "register_operand" "w")
+		    (match_operand:<VHALF> 2 "register_operand" "w")]
+		   UNSPEC_ZIP1))]
+  "TARGET_SIMD"
+  "zip1\t%0.<Vtype>, %1.<Vtype>, %2.<Vtype>"
+  [(set_attr "type" "neon_permute_q")]
+)
+
 ;; This instruction's pattern is generated directly by
 ;; aarch64_expand_vec_perm_const, so any changes to the pattern would
 ;; need corresponding changes there.  Note that the immediate (third)
@@ -9685,9 +9697,8 @@ (define_insn_and_split "<optab><Vnarrowq><mode>2"
 	 not sufficient uses of the zero to make the split worthwhile.  */
     rtx res = simplify_gen_subreg (<VNARROWQ2>mode, operands[0],
				    <MODE>mode, 0);
-    rtx zero = aarch64_gen_shareable_zero (<VNARROWQ2>mode);
-    rtx op = lowpart_subreg (<VNARROWQ2>mode, operands[1], <VNARROWQ>mode);
-    emit_insn (gen_aarch64_zip1<Vnarrowq2> (res, op, zero));
+    rtx zero = aarch64_gen_shareable_zero (<VNARROWQ>mode);
+    emit_insn (gen_aarch64_zip1<Vnarrowq2>_low (res, operands[1], zero));
     DONE;
   }
   [(set_attr "type" "neon_shift_imm_long")]
diff --git a/gcc/testsuite/gcc.target/aarch64/pr113485.c b/gcc/testsuite/gcc.target/aarch64/pr113485.c
new file mode 100644
index 00000000000..c7028245b61
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/pr113485.c
@@ -0,0 +1,25 @@
+/* { dg-options "-O" } */
+
+#include <arm_neon.h>
+
+void test()
+{
+  while (1)
+    {
+      static const uint16_t jsimd_rgb_ycc_neon_consts[] = {19595, 0, 0, 0, 0, 0, 0, 0};
+      uint16x8_t consts = vld1q_u16(jsimd_rgb_ycc_neon_consts);
+
+      uint8_t tmp_buf[0];
+      uint8x8x3_t input_pixels = vld3_u8(tmp_buf);
+      uint16x8_t r = vmovl_u8(input_pixels.val[1]);
+      uint32x4_t y_l = vmull_laneq_u16(vget_low_u16(r), consts, 0);
+
+      uint32x4_t s = vdupq_n_u32(1);
+      uint16x4_t a = vrshrn_n_u32(s, 16);
+      uint16x4_t y = vrshrn_n_u32(y_l, 16);
+      uint16x8_t ay = vcombine_u16(a, y);
+
+      unsigned char ***out_buf;
+      vst1_u8(out_buf[1][0], vmovn_u16(ay));
+    }
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/pr113573.c b/gcc/testsuite/gcc.target/aarch64/pr113573.c
new file mode 100644
index 00000000000..a8e445c6e19
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/pr113573.c
@@ -0,0 +1,40 @@
+/* { dg-options "-O2" } */
+
+#pragma GCC aarch64 "arm_neon.h"
+typedef __Uint8x8_t uint8x8_t;
+typedef __Uint16x4_t uint16x4_t;
+typedef __Int16x8_t int16x8_t;
+typedef __Uint16x8_t uint16x8_t;
+int jsimd_extbgrx_ycc_convert_neon_image_width,
+    jsimd_extbgrx_ycc_convert_neon___trans_tmp_1;
+uint16x4_t jsimd_extbgrx_ycc_convert_neon___trans_tmp_2;
+uint16x8_t vcombine_u16();
+uint16x8_t vmovl_u8(uint8x8_t __a) {
+  return __builtin_aarch64_uxtlv8hi_uu(__a);
+}
+__inline int __attribute__((__gnu_inline__)) vmull_laneq_u16();
+uint8x8x4_t vld4_u8();
+void jsimd_extbgrx_ycc_convert_neon() {
+  int scaled_128_5 = jsimd_extbgrx_ycc_convert_neon___trans_tmp_1,
+      cols_remaining = jsimd_extbgrx_ycc_convert_neon_image_width;
+  for (;;)
+    if (cols_remaining) {
+      uint8x8x4_t input_pixels = vld4_u8();
+      uint16x8_t r = vmovl_u8(input_pixels.val[2]);
+      uint16x8_t g = vmovl_u8(input_pixels.val[1]);
+      uint16x8_t b = vmovl_u8(input_pixels.val[0]);
+      int y_l = vmull_laneq_u16(r);
+      uint16x8_t __a = g;
+      jsimd_extbgrx_ycc_convert_neon___trans_tmp_2 =
+          (uint16x4_t)__builtin_aarch64_get_lowv8hi((int16x8_t)__a);
+      __a = b;
+      int cb_l = scaled_128_5;
+      int cb_h = scaled_128_5;
+      int cr_l = scaled_128_5;
+      int cr_h = scaled_128_5;
+      uint16x8_t y_u16 = vcombine_u16(y_l);
+      uint16x8_t cb_u16 = vcombine_u16(cb_l, cb_h);
+      uint16x8_t cr_u16 = vcombine_u16(cr_l, cr_h);
+      __a = y_u16 = cb_u16 = cr_u16;
+    }
+}
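
As a footnote to the UXTL2->ZIP2 remark in the description above, the
128-bit-input case can be sketched the same illustrative way (again
little-endian only; the widen_high_* names are made up and not part of
the patch):

#include <arm_neon.h>

/* UXTL2: zero-extend the high eight 8-bit lanes of a 128-bit vector.  */
uint16x8_t
widen_high_uxtl2 (uint8x16_t x)
{
  return vmovl_high_u8 (x);
}

/* ZIP2 with zero interleaves the high halves, giving the same lanes;
   here both inputs are genuinely 128-bit, so no paradoxical subreg is
   ever needed for this case.  */
uint16x8_t
widen_high_zip2 (uint8x16_t x)
{
  return vreinterpretq_u16_u8 (vzip2q_u8 (x, vdupq_n_u8 (0)));
}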