From patchwork Thu Jan 25 12:05:28 2024
X-Patchwork-Submitter: Richard Sandiford
X-Patchwork-Id: 192043
From: Richard Sandiford <richard.sandiford@arm.com>
To: gcc-patches@gcc.gnu.org
Mail-Followup-To: gcc-patches@gcc.gnu.org, richard.sandiford@arm.com
Subject: [pushed] aarch64: Avoid paradoxical subregs in UXTL split [PR113485]
Date: Thu, 25 Jan 2024 12:05:28 +0000

g:74e3e839ab2d36841320 handled the UXTL{,2}-ZIP[12] optimisation in
split1.  The UXTL input is a 64-bit vector of N-bit elements and the
result is a 128-bit vector of 2N-bit elements.  The corresponding ZIP1
operates on 128-bit vectors of N-bit elements.
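To illustrate the equivalence that the split relies on, here is a rough
intrinsics-level sketch (little-endian lane order assumed; the widen_*
wrappers are purely illustrative and are not taken from the patch):

/* Illustrative sketch only, not part of the patch.  Assumes
   little-endian lane order.  */
#include <arm_neon.h>

/* UXTL: zero-extend each 8-bit lane of a 64-bit vector to 16 bits.  */
uint16x8_t
widen_uxtl (uint8x8_t x)
{
  return vmovl_u8 (x);
}

/* The ZIP1-with-zero form of the same operation: interleaving with a
   zero vector places a zero byte above each data byte, so the 128-bit
   result holds the same 16-bit lanes.  ZIP1 only reads the low 64 bits
   of each input (the high half supplied by vcombine_u8 is ignored),
   which is the property the patch below relies on.  */
uint16x8_t
widen_zip1 (uint8x8_t x)
{
  uint8x16_t zipped = vzip1q_u8 (vcombine_u8 (x, vdup_n_u8 (0)),
                                 vdupq_n_u8 (0));
  return vreinterpretq_u16_u8 (zipped);
}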
This meant that the ZIP1 input had to be a 128-bit paradoxical subreg
of the 64-bit UXTL input.  In the PRs, it wasn't possible to generate
this subreg because the inputs were already subregs of an x[234]
structure of 64-bit vectors.

I don't think the same thing can happen for UXTL2->ZIP2, because the
UXTL2 input is a 128-bit vector rather than a 64-bit vector.

It isn't really necessary for ZIP1 to take 128-bit inputs, since the
upper 64 bits are ignored.  This patch therefore adds a pattern for
64-bit -> 128-bit ZIP1s.  In principle, we should probably use this
form for all ZIP1s.  But in practice, that creates an awkward special
case, and would be quite invasive for stage 4.

Tested on aarch64-linux-gnu & pushed.

Richard

gcc/
	PR target/113485
	* config/aarch64/aarch64-simd.md (aarch64_zip1<mode>_low): New
	pattern.
	(<optab><Vnarrowq><mode>2): Use it instead of generating a
	paradoxical subreg for the input.

gcc/testsuite/
	PR target/113485
	* gcc.target/aarch64/pr113485.c: New test.
	* gcc.target/aarch64/pr113573.c: Likewise.
---
 gcc/config/aarch64/aarch64-simd.md          | 17 +++++++--
 gcc/testsuite/gcc.target/aarch64/pr113485.c | 25 +++++++++++++
 gcc/testsuite/gcc.target/aarch64/pr113573.c | 40 +++++++++++++++++++++
 3 files changed, 79 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/pr113485.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/pr113573.c

diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index 556d0cf359f..48f0741e7d0 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -8505,6 +8505,18 @@ (define_insn "aarch64_<PERMUTE:perm_insn><mode>"
   [(set_attr "type" "neon_permute<q>")]
 )
 
+;; ZIP1 ignores the contents of the upper halves of the registers,
+;; so we can describe 128-bit operations in terms of 64-bit inputs.
+(define_insn "aarch64_zip1<mode>_low"
+  [(set (match_operand:VQ 0 "register_operand" "=w")
+	(unspec:VQ [(match_operand:<VHALF> 1 "register_operand" "w")
+		    (match_operand:<VHALF> 2 "register_operand" "w")]
+		   UNSPEC_ZIP1))]
+  "TARGET_SIMD"
+  "zip1\t%0.<Vtype>, %1.<Vtype>, %2.<Vtype>"
+  [(set_attr "type" "neon_permute_q")]
+)
+
 ;; This instruction's pattern is generated directly by
 ;; aarch64_expand_vec_perm_const, so any changes to the pattern would
 ;; need corresponding changes there.  Note that the immediate (third)
@@ -9685,9 +9697,8 @@ (define_insn_and_split "<optab><Vnarrowq><mode>2"
 	 not sufficient uses of the zero to make the split worthwhile.  */
     rtx res = simplify_gen_subreg (<VNARROWQ2>mode, operands[0],
				    <MODE>mode, 0);
-    rtx zero = aarch64_gen_shareable_zero (<VNARROWQ2>mode);
-    rtx op = lowpart_subreg (<VNARROWQ2>mode, operands[1], <VNARROWQ>mode);
-    emit_insn (gen_aarch64_zip1<Vnarrowq2> (res, op, zero));
+    rtx zero = aarch64_gen_shareable_zero (<VNARROWQ>mode);
+    emit_insn (gen_aarch64_zip1<Vnarrowq2>_low (res, operands[1], zero));
     DONE;
   }
   [(set_attr "type" "neon_shift_imm_long")]
diff --git a/gcc/testsuite/gcc.target/aarch64/pr113485.c b/gcc/testsuite/gcc.target/aarch64/pr113485.c
new file mode 100644
index 00000000000..c7028245b61
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/pr113485.c
@@ -0,0 +1,25 @@
+/* { dg-options "-O" } */
+
+#include <arm_neon.h>
+
+void test()
+{
+  while (1)
+    {
+      static const uint16_t jsimd_rgb_ycc_neon_consts[] = {19595, 0, 0, 0, 0, 0, 0, 0};
+      uint16x8_t consts = vld1q_u16(jsimd_rgb_ycc_neon_consts);
+
+      uint8_t tmp_buf[0];
+      uint8x8x3_t input_pixels = vld3_u8(tmp_buf);
+      uint16x8_t r = vmovl_u8(input_pixels.val[1]);
+      uint32x4_t y_l = vmull_laneq_u16(vget_low_u16(r), consts, 0);
+
+      uint32x4_t s = vdupq_n_u32(1);
+      uint16x4_t a = vrshrn_n_u32(s, 16);
+      uint16x4_t y = vrshrn_n_u32(y_l, 16);
+      uint16x8_t ay = vcombine_u16(a, y);
+
+      unsigned char ***out_buf;
+      vst1_u8(out_buf[1][0], vmovn_u16(ay));
+    }
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/pr113573.c b/gcc/testsuite/gcc.target/aarch64/pr113573.c
new file mode 100644
index 00000000000..a8e445c6e19
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/pr113573.c
@@ -0,0 +1,40 @@
+/* { dg-options "-O2" } */
+
+#pragma GCC aarch64 "arm_neon.h"
+typedef __Uint8x8_t uint8x8_t;
+typedef __Uint16x4_t uint16x4_t;
+typedef __Int16x8_t int16x8_t;
+typedef __Uint16x8_t uint16x8_t;
+int jsimd_extbgrx_ycc_convert_neon_image_width,
+    jsimd_extbgrx_ycc_convert_neon___trans_tmp_1;
+uint16x4_t jsimd_extbgrx_ycc_convert_neon___trans_tmp_2;
+uint16x8_t vcombine_u16();
+uint16x8_t vmovl_u8(uint8x8_t __a) {
+  return __builtin_aarch64_uxtlv8hi_uu(__a);
+}
+__inline int __attribute__((__gnu_inline__)) vmull_laneq_u16();
+uint8x8x4_t vld4_u8();
+void jsimd_extbgrx_ycc_convert_neon() {
+  int scaled_128_5 = jsimd_extbgrx_ycc_convert_neon___trans_tmp_1,
+      cols_remaining = jsimd_extbgrx_ycc_convert_neon_image_width;
+  for (;;)
+    if (cols_remaining) {
+      uint8x8x4_t input_pixels = vld4_u8();
+      uint16x8_t r = vmovl_u8(input_pixels.val[2]);
+      uint16x8_t g = vmovl_u8(input_pixels.val[1]);
+      uint16x8_t b = vmovl_u8(input_pixels.val[0]);
+      int y_l = vmull_laneq_u16(r);
+      uint16x8_t __a = g;
+      jsimd_extbgrx_ycc_convert_neon___trans_tmp_2 =
+          (uint16x4_t)__builtin_aarch64_get_lowv8hi((int16x8_t)__a);
+      __a = b;
+      int cb_l = scaled_128_5;
+      int cb_h = scaled_128_5;
+      int cr_l = scaled_128_5;
+      int cr_h = scaled_128_5;
+      uint16x8_t y_u16 = vcombine_u16(y_l);
+      uint16x8_t cb_u16 = vcombine_u16(cb_l, cb_h);
+      uint16x8_t cr_u16 = vcombine_u16(cr_l, cr_h);
+      __a = y_u16 = cb_u16 = cr_u16;
+    }
+}
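
As a footnote to the UXTL2->ZIP2 remark in the description above, the
128-bit-input case can be sketched the same illustrative way (again
little-endian only; the widen_high_* names are made up and not part of
the patch):

#include <arm_neon.h>

/* UXTL2: zero-extend the high eight 8-bit lanes of a 128-bit vector.  */
uint16x8_t
widen_high_uxtl2 (uint8x16_t x)
{
  return vmovl_high_u8 (x);
}

/* ZIP2 with zero interleaves the high halves, giving the same lanes;
   here both inputs are genuinely 128-bit, so no paradoxical subreg is
   ever needed for this case.  */
uint16x8_t
widen_high_zip2 (uint8x16_t x)
{
  return vreinterpretq_u16_u8 (vzip2q_u8 (x, vdupq_n_u8 (0)));
}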