From patchwork Fri Jan 12 12:39:34 2024
From: Richard Sandiford <richard.sandiford@arm.com>
To: gcc-patches@gcc.gnu.org
Subject: [pushed] aarch64: Rework uxtl->zip optimisation [PR113196]
Date: Fri, 12 Jan 2024 12:39:34 +0000
g:f26f92b534f9 implemented unsigned extensions using ZIPs rather than
UXTL{,2}, since the former has a higher throughput than the latter on
many cores.  The optimisation worked by lowering directly to ZIP during
expand, so that the zero input could be hoisted and shared.

However, changing to ZIP means that zero extensions no longer benefit
from some existing combine patterns.  The patch included new patterns
for UADDW and USUBW, but the PR shows that other patterns were affected
as well.

This patch instead introduces the ZIPs during a pre-reload split and
forcibly hoists the zero move to the outermost scope.  This has the
disadvantage of executing the move even for a shrink-wrapped function,
which I suppose could be a problem if it causes a kernel to trap and
enable Advanced SIMD unnecessarily.  In other circumstances, an unused
move shouldn't affect things much.

Also, the RA should be able to rematerialise the move at an appropriate
point if necessary, such as if there is an intervening call.

In https://gcc.gnu.org/pipermail/gcc-patches/2024-January/641948.html
I'd then tried to allow a zero to be recombined back into a solitary
ZIP.  However, that relied on late-combine, which didn't make it into
GCC 14.  This version instead restricts the split to cases where the
UXTL executes more frequently than the entry block (which is where we
plan to put the zero).

Also, the original optimisation contained a big-endian correction that
I don't think is needed/correct.  Even on big-endian targets, we want
the ZIP to take the low half of an element from the input vector and
the high half from the zero vector.  And the patterns map directly to
the underlying Advanced SIMD instructions: the use of unspecs means
that there's no need to adjust for the difference between GCC and Arm
lane numbering.

Tested on aarch64-linux-gnu & pushed (after checking with Tamar
off-list).

Richard


gcc/
	PR target/113196
	* config/aarch64/aarch64.h (machine_function::advsimd_zero_insn):
	New member variable.
	* config/aarch64/aarch64-protos.h (aarch64_split_simd_shift_p):
	Declare.
	* config/aarch64/iterators.md (Vnarrowq2): New mode attribute.
	* config/aarch64/aarch64-simd.md
	(vec_unpacku_hi_<mode>, vec_unpacks_hi_<mode>): Recombine into...
	(vec_unpack<su>_hi_<mode>): ...this.  Move the generation of
	zip2 for zero-extends to...
	(aarch64_simd_vec_unpack<su>_hi_<mode>): ...a split of this
	instruction.  Fix big-endian handling.
	(vec_unpacku_lo_<mode>, vec_unpacks_lo_<mode>): Recombine into...
	(vec_unpack<su>_lo_<mode>): ...this.  Move the generation of
	zip1 for zero-extends to...
	(<optab><Vnarrowq><mode>2): ...a split of this instruction.
	Fix big-endian handling.
	(*aarch64_zip1_uxtl): New pattern.
	(aarch64_usubw<mode>_lo_zip, aarch64_uaddw<mode>_lo_zip): Delete.
	(aarch64_usubw<mode>_hi_zip, aarch64_uaddw<mode>_hi_zip): Likewise.
	* config/aarch64/aarch64.cc (aarch64_get_shareable_reg): New
	function.
	(aarch64_gen_shareable_zero): Use it.
	(aarch64_split_simd_shift_p): New function.

gcc/testsuite/
	PR target/113196
	* gcc.target/aarch64/pr113196.c: New test.
	* gcc.target/aarch64/simd/vmovl_high_1.c: Remove double include.
	Expect uxtl2 rather than zip2.
	* gcc.target/aarch64/vect_mixed_sizes_8.c: Expect zip1 rather
	than uxtl.
	* gcc.target/aarch64/vect_mixed_sizes_9.c: Likewise.
	* gcc.target/aarch64/vect_mixed_sizes_10.c: Likewise.
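To make the effect concrete, here is an illustrative example (not part
of the patch; the function and the exact code generation shown below
are invented for illustration).  A loop in the style of the
vect_mixed_sizes tests zero-extends one input on every iteration:

    #include <stdint.h>

    /* Hypothetical example: z[i] is zero-extended from 16 to 32 bits
       before the addition.  */
    void
    widen_add (int32_t *x, int32_t *y, uint16_t *z, int n)
    {
      for (int i = 0; i < n; i++)
        x[i] = y[i] + z[i];
    }

When the loop body executes more frequently than the entry block, the
split turns each UXTL in the vectorised loop into a ZIP1 whose zero
operand is set up once, roughly:

        movi    v2.4s, #0                  // hoisted, executed once
    .L2:
        ...
        zip1    v1.8h, v1.8h, v2.8h        // was: uxtl v1.4s, v1.4h
        add     v0.4s, v0.4s, v1.4s
        ...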
---
 gcc/config/aarch64/aarch64-protos.h           |   1 +
 gcc/config/aarch64/aarch64-simd.md            | 134 +++++-------
 gcc/config/aarch64/aarch64.cc                 |  53 ++++++-
 gcc/config/aarch64/aarch64.h                  |   6 +
 gcc/config/aarch64/iterators.md               |   2 +
 gcc/testsuite/gcc.target/aarch64/pr113196.c   |  23 +++
 .../gcc.target/aarch64/simd/vmovl_high_1.c    |   8 +-
 .../gcc.target/aarch64/vect_mixed_sizes_10.c  |   2 +-
 .../gcc.target/aarch64/vect_mixed_sizes_8.c   |   2 +-
 .../gcc.target/aarch64/vect_mixed_sizes_9.c   |   2 +-
 10 files changed, 123 insertions(+), 110 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/pr113196.c

diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index ce9bec79cec..4c70e8a4963 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -880,6 +880,7 @@ rtx aarch64_return_addr_rtx (void);
 rtx aarch64_return_addr (int, rtx);
 rtx aarch64_simd_gen_const_vector_dup (machine_mode, HOST_WIDE_INT);
 rtx aarch64_gen_shareable_zero (machine_mode);
+bool aarch64_split_simd_shift_p (rtx_insn *);
 bool aarch64_simd_mem_operand_p (rtx);
 bool aarch64_sve_ld1r_operand_p (rtx);
 bool aarch64_sve_ld1rq_operand_p (rtx);
diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index 3cd184f46fa..6f48b4d5f21 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -1958,7 +1958,7 @@ (define_insn "aarch64_simd_vec_unpack<su>_lo_<mode>"
   [(set_attr "type" "neon_shift_imm_long")]
 )
 
-(define_insn "aarch64_simd_vec_unpack<su>_hi_<mode>"
+(define_insn_and_split "aarch64_simd_vec_unpack<su>_hi_<mode>"
   [(set (match_operand:<VWIDE> 0 "register_operand" "=w")
 	(ANY_EXTEND:<VWIDE> (vec_select:<VHALF>
 			       (match_operand:VQW 1 "register_operand" "w")
@@ -1966,63 +1966,42 @@ (define_insn "aarch64_simd_vec_unpack<su>_hi_<mode>"
 			    )))]
   "TARGET_SIMD"
   "<su>xtl2\t%0.<Vwtype>, %1.<Vtype>"
-  [(set_attr "type" "neon_shift_imm_long")]
-)
-
-(define_expand "vec_unpacku_hi_<mode>"
-  [(match_operand:<VWIDE> 0 "register_operand")
-   (match_operand:VQW 1 "register_operand")]
-  "TARGET_SIMD"
+  "&& <CODE> == ZERO_EXTEND
+   && aarch64_split_simd_shift_p (insn)"
+  [(const_int 0)]
   {
-    rtx res = gen_reg_rtx (<MODE>mode);
-    rtx tmp = aarch64_gen_shareable_zero (<MODE>mode);
-    if (BYTES_BIG_ENDIAN)
-      emit_insn (gen_aarch64_zip2<mode> (res, tmp, operands[1]));
-    else
-      emit_insn (gen_aarch64_zip2<mode> (res, operands[1], tmp));
-    emit_move_insn (operands[0],
-		    simplify_gen_subreg (<VWIDE>mode, res, <MODE>mode, 0));
+    /* On many cores, it is cheaper to implement UXTL2 using a ZIP2 with zero,
+       provided that the cost of the zero can be amortized over several
+       operations.  We'll later recombine the zero and zip if there are
+       not sufficient uses of the zero to make the split worthwhile.  */
+    rtx res = simplify_gen_subreg (<MODE>mode, operands[0], <VWIDE>mode, 0);
+    rtx zero = aarch64_gen_shareable_zero (<MODE>mode);
+    emit_insn (gen_aarch64_zip2<mode> (res, operands[1], zero));
    DONE;
   }
+  [(set_attr "type" "neon_shift_imm_long")]
 )
 
-(define_expand "vec_unpacks_hi_<mode>"
+(define_expand "vec_unpack<su>_hi_<mode>"
   [(match_operand:<VWIDE> 0 "register_operand")
-   (match_operand:VQW 1 "register_operand")]
+   (ANY_EXTEND:<VWIDE> (match_operand:VQW 1 "register_operand"))]
   "TARGET_SIMD"
   {
     rtx p = aarch64_simd_vect_par_cnst_half (<MODE>mode, <nunits>, true);
-    emit_insn (gen_aarch64_simd_vec_unpacks_hi_<mode> (operands[0],
-						       operands[1], p));
+    emit_insn (gen_aarch64_simd_vec_unpack<su>_hi_<mode> (operands[0],
+							  operands[1], p));
    DONE;
   }
 )
 
-(define_expand "vec_unpacku_lo_<mode>"
+(define_expand "vec_unpack<su>_lo_<mode>"
   [(match_operand:<VWIDE> 0 "register_operand")
-   (match_operand:VQW 1 "register_operand")]
-  "TARGET_SIMD"
-  {
-    rtx res = gen_reg_rtx (<MODE>mode);
-    rtx tmp = aarch64_gen_shareable_zero (<MODE>mode);
-    if (BYTES_BIG_ENDIAN)
-      emit_insn (gen_aarch64_zip1<mode> (res, tmp, operands[1]));
-    else
-      emit_insn (gen_aarch64_zip1<mode> (res, operands[1], tmp));
-    emit_move_insn (operands[0],
-		    simplify_gen_subreg (<VWIDE>mode, res, <MODE>mode, 0));
-    DONE;
-  }
-)
-
-(define_expand "vec_unpacks_lo_<mode>"
-  [(match_operand:<VWIDE> 0 "register_operand")
-   (match_operand:VQW 1 "register_operand")]
+   (ANY_EXTEND:<VWIDE> (match_operand:VQW 1 "register_operand"))]
   "TARGET_SIMD"
   {
     rtx p = aarch64_simd_vect_par_cnst_half (<MODE>mode, <nunits>, false);
-    emit_insn (gen_aarch64_simd_vec_unpacks_lo_<mode> (operands[0],
-						       operands[1], p));
+    emit_insn (gen_aarch64_simd_vec_unpack<su>_lo_<mode> (operands[0],
+							  operands[1], p));
    DONE;
   }
 )
@@ -4792,62 +4771,6 @@ (define_insn "aarch64_<su>subw2<mode>_internal"
   [(set_attr "type" "neon_sub_widen")]
 )
 
-(define_insn "aarch64_usubw<mode>_lo_zip"
-  [(set (match_operand:<VWIDE> 0 "register_operand" "=w")
-	(minus:<VWIDE>
-	  (match_operand:<VWIDE> 1 "register_operand" "w")
-	  (subreg:<VWIDE>
-	    (unspec:<MODE> [
-		(match_operand:VQW 2 "register_operand" "w")
-		(match_operand:VQW 3 "aarch64_simd_imm_zero")
-	       ] UNSPEC_ZIP1) 0)))]
-  "TARGET_SIMD"
-  "usubw\\t%0.<Vwtype>, %1.<Vwtype>, %2.<Vhalftype>"
-  [(set_attr "type" "neon_sub_widen")]
-)
-
-(define_insn "aarch64_uaddw<mode>_lo_zip"
-  [(set (match_operand:<VWIDE> 0 "register_operand" "=w")
-	(plus:<VWIDE>
-	  (subreg:<VWIDE>
-	    (unspec:<MODE> [
-		(match_operand:VQW 2 "register_operand" "w")
-		(match_operand:VQW 3 "aarch64_simd_imm_zero")
-	       ] UNSPEC_ZIP1) 0)
-	  (match_operand:<VWIDE> 1 "register_operand" "w")))]
-  "TARGET_SIMD"
-  "uaddw\\t%0.<Vwtype>, %1.<Vwtype>, %2.<Vhalftype>"
-  [(set_attr "type" "neon_add_widen")]
-)
-
-(define_insn "aarch64_usubw<mode>_hi_zip"
-  [(set (match_operand:<VWIDE> 0 "register_operand" "=w")
-	(minus:<VWIDE>
-	  (match_operand:<VWIDE> 1 "register_operand" "w")
-	  (subreg:<VWIDE>
-	    (unspec:<MODE> [
-		(match_operand:VQW 2 "register_operand" "w")
-		(match_operand:VQW 3 "aarch64_simd_imm_zero")
-	       ] UNSPEC_ZIP2) 0)))]
-  "TARGET_SIMD"
-  "usubw2\\t%0.<Vwtype>, %1.<Vwtype>, %2.<Vtype>"
-  [(set_attr "type" "neon_sub_widen")]
-)
-
-(define_insn "aarch64_uaddw<mode>_hi_zip"
-  [(set (match_operand:<VWIDE> 0 "register_operand" "=w")
-	(plus:<VWIDE>
-	  (subreg:<VWIDE>
-	    (unspec:<MODE> [
-		(match_operand:VQW 2 "register_operand" "w")
-		(match_operand:VQW 3 "aarch64_simd_imm_zero")
-	       ] UNSPEC_ZIP2) 0)
-	  (match_operand:<VWIDE> 1 "register_operand" "w")))]
-  "TARGET_SIMD"
-  "uaddw2\\t%0.<Vwtype>, %1.<Vwtype>, %2.<Vtype>"
-  [(set_attr "type" "neon_add_widen")]
-)
-
 (define_insn "aarch64_<su>addw<mode>"
   [(set (match_operand:<VWIDE> 0 "register_operand" "=w")
 	(plus:<VWIDE>
@@ -9788,11 +9711,26 @@ (define_insn "aarch64_crypto_pmullv2di"
 )
 
 ;; Sign- or zero-extend a 64-bit integer vector to a 128-bit vector.
-(define_insn "<optab><Vnarrowq><mode>2"
+(define_insn_and_split "<optab><Vnarrowq><mode>2"
   [(set (match_operand:VQN 0 "register_operand" "=w")
 	(ANY_EXTEND:VQN (match_operand:<VNARROWQ> 1 "register_operand" "w")))]
   "TARGET_SIMD"
   "<su>xtl\t%0.<Vtype>, %1.<Vntype>"
+  "&& <CODE> == ZERO_EXTEND
+   && aarch64_split_simd_shift_p (insn)"
+  [(const_int 0)]
+  {
+    /* On many cores, it is cheaper to implement UXTL using a ZIP1 with zero,
+       provided that the cost of the zero can be amortized over several
+       operations.  We'll later recombine the zero and zip if there are
+       not sufficient uses of the zero to make the split worthwhile.  */
+    rtx res = simplify_gen_subreg (<VNARROWQ2>mode, operands[0],
+				   <MODE>mode, 0);
+    rtx zero = aarch64_gen_shareable_zero (<VNARROWQ2>mode);
+    rtx op = lowpart_subreg (<VNARROWQ2>mode, operands[1], <VNARROWQ>mode);
+    emit_insn (gen_aarch64_zip1<Vnarrowq2> (res, op, zero));
+    DONE;
+  }
   [(set_attr "type" "neon_shift_imm_long")]
 )
diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 32c7317f360..7d1f8c65ce4 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -22882,16 +22882,61 @@ aarch64_mov_operand_p (rtx x, machine_mode mode)
 	    == SYMBOL_TINY_ABSOLUTE;
 }
 
+/* Return a function-invariant register that contains VALUE.  *CACHED_INSN
+   caches instructions that set up such registers, so that they can be
+   reused by future calls.  */
+
+static rtx
+aarch64_get_shareable_reg (rtx_insn **cached_insn, rtx value)
+{
+  rtx_insn *insn = *cached_insn;
+  if (insn && INSN_P (insn) && !insn->deleted ())
+    {
+      rtx pat = PATTERN (insn);
+      if (GET_CODE (pat) == SET)
+	{
+	  rtx dest = SET_DEST (pat);
+	  if (REG_P (dest)
+	      && !HARD_REGISTER_P (dest)
+	      && rtx_equal_p (SET_SRC (pat), value))
+	    return dest;
+	}
+    }
+  rtx reg = gen_reg_rtx (GET_MODE (value));
+  *cached_insn = emit_insn_before (gen_rtx_SET (reg, value),
+				   function_beg_insn);
+  return reg;
+}
+
 /* Create a 0 constant that is based on V4SI to allow CSE to optimally share
    the constant creation.  */
 
 rtx
 aarch64_gen_shareable_zero (machine_mode mode)
 {
-  machine_mode zmode = V4SImode;
-  rtx tmp = gen_reg_rtx (zmode);
-  emit_move_insn (tmp, CONST0_RTX (zmode));
-  return lowpart_subreg (mode, tmp, zmode);
+  rtx reg = aarch64_get_shareable_reg (&cfun->machine->advsimd_zero_insn,
+				       CONST0_RTX (V4SImode));
+  return lowpart_subreg (mode, reg, GET_MODE (reg));
+}
+
+/* INSN is some form of extension or shift that can be split into a
+   permutation involving a shared zero.  Return true if we should
+   perform such a split.
+
+   ??? For now, make sure that the split instruction executes more
+   frequently than the zero that feeds it.  In future it would be good
+   to split without that restriction and instead recombine shared zeros
+   if they turn out not to be worthwhile.  This would allow splits in
+   single-block functions and would also cope more naturally with
+   rematerialization.  */
+
+bool
+aarch64_split_simd_shift_p (rtx_insn *insn)
+{
+  return (can_create_pseudo_p ()
+	  && optimize_bb_for_speed_p (BLOCK_FOR_INSN (insn))
+	  && (ENTRY_BLOCK_PTR_FOR_FN (cfun)->count
+	      < BLOCK_FOR_INSN (insn)->count));
 }
 
 /* Return a const_int vector of VAL.  */
diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
index 0a4e152c9bd..157a0b9dfa5 100644
--- a/gcc/config/aarch64/aarch64.h
+++ b/gcc/config/aarch64/aarch64.h
@@ -1056,6 +1056,12 @@ typedef struct GTY (()) machine_function
   /* A set of all decls that have been passed to a vld1 intrinsic in the
      current function.  This is used to help guide the vector cost model.  */
   hash_set<tree> *vector_load_decls;
+
+  /* An instruction that was emitted at the start of the function to
+     set an Advanced SIMD pseudo register to zero.  If the instruction
+     still exists and still fulfils its original purpose, the same register
+     can be reused by other code.  */
+  rtx_insn *advsimd_zero_insn;
 } machine_function;
 #endif
 #endif
diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md
index 89767eecdf8..942270e99d6 100644
--- a/gcc/config/aarch64/iterators.md
+++ b/gcc/config/aarch64/iterators.md
@@ -1656,6 +1656,8 @@ (define_mode_attr Vnarrowq [(V8HI "v8qi") (V4SI "v4hi")
 ;; Narrowed quad-modes for VQN (Used for XTN2).
 (define_mode_attr VNARROWQ2 [(V8HI "V16QI") (V4SI "V8HI")
 			     (V2DI "V4SI")])
+(define_mode_attr Vnarrowq2 [(V8HI "v16qi") (V4SI "v8hi")
+			     (V2DI "v4si")])
 
 ;; Narrowed modes of vector modes.
 (define_mode_attr VNARROW [(VNx8HI "VNx16QI")
diff --git a/gcc/testsuite/gcc.target/aarch64/pr113196.c b/gcc/testsuite/gcc.target/aarch64/pr113196.c
new file mode 100644
index 00000000000..8982cc50282
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/pr113196.c
@@ -0,0 +1,23 @@
+/* { dg-options "-O3" } */
+
+#pragma GCC target "+nosve"
+
+int test(unsigned array[4][4]);
+
+int foo(unsigned short *a, unsigned long n)
+{
+  unsigned array[4][4];
+
+  for (unsigned i = 0; i < 4; i++, a += 4)
+    {
+      array[i][0] = a[0] << 6;
+      array[i][1] = a[1] << 6;
+      array[i][2] = a[2] << 6;
+      array[i][3] = a[3] << 6;
+    }
+
+  return test(array);
+}
+
+/* { dg-final { scan-assembler-times {\tushll\t} 2 } } */
+/* { dg-final { scan-assembler-times {\tushll2\t} 2 } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/simd/vmovl_high_1.c b/gcc/testsuite/gcc.target/aarch64/simd/vmovl_high_1.c
index a2d09eaee0d..9519062e6d7 100644
--- a/gcc/testsuite/gcc.target/aarch64/simd/vmovl_high_1.c
+++ b/gcc/testsuite/gcc.target/aarch64/simd/vmovl_high_1.c
@@ -3,8 +3,6 @@
 
 #include <arm_neon.h>
 
-#include <arm_neon.h>
-
 #define FUNC(IT, OT, S)				\
 OT						\
 foo_##S (IT a)					\
@@ -22,11 +20,11 @@ FUNC (int32x4_t, int64x2_t, s32)
 /* { dg-final { scan-assembler-times {sxtl2\tv0\.2d, v0\.4s} 1} } */
 
 FUNC (uint8x16_t, uint16x8_t, u8)
-/* { dg-final { scan-assembler-times {zip2\tv0\.16b, v0\.16b} 1} } */
+/* { dg-final { scan-assembler-times {uxtl2\tv0\.8h, v0\.16b} 1} } */
 
 FUNC (uint16x8_t, uint32x4_t, u16)
-/* { dg-final { scan-assembler-times {zip2\tv0\.8h, v0\.8h} 1} } */
+/* { dg-final { scan-assembler-times {uxtl2\tv0\.4s, v0\.8h} 1} } */
 
 FUNC (uint32x4_t, uint64x2_t, u32)
-/* { dg-final { scan-assembler-times {zip2\tv0\.4s, v0\.4s} 1} } */
+/* { dg-final { scan-assembler-times {uxtl2\tv0\.2d, v0\.4s} 1} } */
diff --git a/gcc/testsuite/gcc.target/aarch64/vect_mixed_sizes_10.c b/gcc/testsuite/gcc.target/aarch64/vect_mixed_sizes_10.c
index 81e77a8bb04..a741919b924 100644
--- a/gcc/testsuite/gcc.target/aarch64/vect_mixed_sizes_10.c
+++ b/gcc/testsuite/gcc.target/aarch64/vect_mixed_sizes_10.c
@@ -14,5 +14,5 @@ f (int16_t *x, int16_t *y, uint8_t *z, int n)
     }
 }
 
-/* { dg-final { scan-assembler-times {\tuxtl\tv[0-9]+\.8h, v[0-9]+\.8b\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tzip1\tv[0-9]+\.16b, v[0-9]+\.16b, v[0-9]+\.16b\n} 1 } } */
 /* { dg-final { scan-assembler-times {\tadd\tv[0-9]+\.8h,} 1 } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/vect_mixed_sizes_8.c b/gcc/testsuite/gcc.target/aarch64/vect_mixed_sizes_8.c
index 9531966c294..835eef32f50 100644
--- a/gcc/testsuite/gcc.target/aarch64/vect_mixed_sizes_8.c
+++ b/gcc/testsuite/gcc.target/aarch64/vect_mixed_sizes_8.c
@@ -14,5 +14,5 @@ f (int64_t *x, int64_t *y, uint32_t *z, int n)
     }
 }
 
-/* { dg-final { scan-assembler-times {\tuxtl\tv[0-9]+\.2d, v[0-9]+\.2s\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tzip1\tv[0-9]+\.4s, v[0-9]+\.4s, v[0-9]+\.4s\n} 1 } } */
 /* { dg-final { scan-assembler-times {\tadd\tv[0-9]+\.2d,} 1 } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/vect_mixed_sizes_9.c b/gcc/testsuite/gcc.target/aarch64/vect_mixed_sizes_9.c
index de8f6988685..77ff691da1c 100644
--- a/gcc/testsuite/gcc.target/aarch64/vect_mixed_sizes_9.c
+++ b/gcc/testsuite/gcc.target/aarch64/vect_mixed_sizes_9.c
@@ -14,5 +14,5 @@ f (int32_t *x, int32_t *y, uint16_t *z, int n)
     }
 }
 
-/* { dg-final { scan-assembler-times {\tuxtl\tv[0-9]+\.4s, v[0-9]+\.4h\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tzip1\tv[0-9]+\.8h, v[0-9]+\.8h, v[0-9]+\.8h\n} 1 } } */
 /* { dg-final { scan-assembler-times {\tadd\tv[0-9]+\.4s,} 1 } } */
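
P.S. The same combine interplay can be seen with widening additions.
Under the old expand-time approach, folding the extension into the
addition needed the dedicated aarch64_uaddw<mode>_*_zip patterns
deleted above; with the split-based approach the generic UADDW
patterns match again, because the extension is still an extend when
combine runs.  A sketch (illustrative only; the function is invented
and the exact code generation will vary with target and options):

    #include <stdint.h>

    /* Each uint16_t element is zero-extended and accumulated into a
       uint32_t element.  Expected to use uaddw/uaddw2 directly, with
       no separate extension or permute instruction.  */
    void
    acc (uint32_t *restrict x, const uint16_t *restrict y, int n)
    {
      for (int i = 0; i < n; i++)
        x[i] += y[i];
    }

The USHLL patterns exercised by pr113196.c benefit in the same way.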