From patchwork Tue Dec 5 10:13:11 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Richard Sandiford X-Patchwork-Id: 173907 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:bcd1:0:b0:403:3b70:6f57 with SMTP id r17csp3330118vqy; Tue, 5 Dec 2023 02:17:34 -0800 (PST) X-Google-Smtp-Source: AGHT+IH9bHl11QzCJTEbleTwdTiRqfoGs0QnKcDV9hQ0Uc2jqdy1KVawC8QEYQGSci41/9B/uSdv X-Received: by 2002:a05:6808:6506:b0:3b8:b063:895a with SMTP id fm6-20020a056808650600b003b8b063895amr3341546oib.104.1701771454277; Tue, 05 Dec 2023 02:17:34 -0800 (PST) ARC-Seal: i=2; a=rsa-sha256; t=1701771454; cv=pass; d=google.com; s=arc-20160816; b=jA99XgvEpGrK17iTsoj/HIk0ch7/ugtfpFhwVGMf9e9Yed0IXo5kgIeHI1CSleyXKU o/m/hCOirfivblJh+1SjUU5JcIQe4Y+a757B3MPfnhxxk+xB8CZNMUuR0a5AUdA5Y5of n+NSdtUnmsWZhAJy1/VdlieAL8PNpEdUzEphC8oFFrdIvSWWpLOAqon0EVIy9AASY9L/ nUZfDlNBs+r/yAImDUPTtiyb7xtewlKVjFp+HDFeD4sPldRlsIWZXF33NnGt56TldWXv nLGkCczjGk8B4Hhsn5/U6KgmbYmB86MKjj60L/ZvpsymZqu0Xv4J/9DX5vf2Zy0ivYao ONdQ== ARC-Message-Signature: i=2; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=errors-to:list-subscribe:list-help:list-post:list-archive :list-unsubscribe:list-id:precedence:content-transfer-encoding :mime-version:references:in-reply-to:message-id:date:subject:cc:to :from:arc-filter:dmarc-filter:delivered-to; bh=M1XNr4wemyde2lRYMEMzruk6cANxVqxLNZ66ez4Lyog=; fh=C4nEn4uRKApr1WsFtLyJD8L5BeRuRc+JFyqoopFjd9M=; b=kWkvNJWk6YA0s5PF3tROMbulgBXoqNQIbmvJZh6YI7Dr+29ZO1gZVpIvzOb7F/rdv5 /Y6I1I4QmpqkmYBGJCV+RVbvsKIT8zAXjdkVgOCSAKNZOOZ0wjSFf5KTEFE/MPopqUdu hd8X2uLD/tfrDJhAlYR7Ak0CVfIAaEZ7XcqItN0jcQHsVKliqXjQWXuGW381W+oa3XNf fdfHx/7SLZtCkOYbs5LbFM7UkSThwpGrhXBNOecF7hN8D1xSEa+gSwK0P9TMPOU0lCYg g+ygyIhUlWRNPTPPtnC0jeWTkGlxvDl2rRBGx79IEfa9uJWkI90V0uDVv0dQhmTardkY GrUQ== ARC-Authentication-Results: i=2; mx.google.com; arc=pass (i=1); spf=pass (google.com: domain of gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org designates 8.43.85.97 as permitted sender) smtp.mailfrom="gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org"; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=arm.com Received: from server2.sourceware.org (server2.sourceware.org. [8.43.85.97]) by mx.google.com with ESMTPS id k9-20020a05620a414900b0077f086b9db4si5272163qko.74.2023.12.05.02.17.34 for (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 05 Dec 2023 02:17:34 -0800 (PST) Received-SPF: pass (google.com: domain of gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org designates 8.43.85.97 as permitted sender) client-ip=8.43.85.97; Authentication-Results: mx.google.com; arc=pass (i=1); spf=pass (google.com: domain of gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org designates 8.43.85.97 as permitted sender) smtp.mailfrom="gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org"; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=arm.com Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 38AB43845BC4 for ; Tue, 5 Dec 2023 10:16:51 +0000 (GMT) X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by sourceware.org (Postfix) with ESMTP id DC199385C303 for ; Tue, 5 Dec 2023 10:13:42 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org DC199385C303 Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=arm.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=arm.com ARC-Filter: OpenARC Filter v1.0.0 sourceware.org DC199385C303 Authentication-Results: server2.sourceware.org; arc=none smtp.remote-ip=217.140.110.172 ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1701771230; cv=none; b=sEXrJwXnaxgO+7gN3bIn4wmnZ+hC/g3s7rOxTctT0TtY7cev3Jq/yk93WWBSH0kwh5AuaON6+Xpb18mG4sSYFwrmxCLk299cJtqvLYRgPCmDbE8iIyhU3HBdSL4xgU2wvMNe9lwvJPBJg/V7wxU7yUytsDFH6QTD+frhtKTLdyw= ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1701771230; c=relaxed/simple; bh=TlFoHv17krISx5EAHCt8/h0xfis8+VS6qWJwDIS34v8=; h=From:To:Subject:Date:Message-Id:MIME-Version; b=w7n+Zkr3r8/I9Zxkn+5fWMHCQZ6LtfVQQ5IK8l2RYhjpzRBrWt1VLOj07blzMW/MosuymEusNAptwanMjfQdJbYxngF3yjd4Kev+2Bh4f95UdTXxKSR8gM9GXM8LGDokKhohzCJ/o+7KmgjkysV1pnSsvjmABJlDIX6EZOrmirA= ARC-Authentication-Results: i=1; server2.sourceware.org Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 1EB0FFEC; Tue, 5 Dec 2023 02:14:29 -0800 (PST) Received: from e121540-lin.manchester.arm.com (e121540-lin.manchester.arm.com [10.32.110.72]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 0FF263F5A1; Tue, 5 Dec 2023 02:13:41 -0800 (PST) From: Richard Sandiford To: gcc-patches@gcc.gnu.org Cc: Richard Sandiford Subject: [pushed v2 13/25] aarch64: Distinguish streaming-compatible AdvSIMD insns Date: Tue, 5 Dec 2023 10:13:11 +0000 Message-Id: <20231205101323.1914247-14-richard.sandiford@arm.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20231205101323.1914247-1-richard.sandiford@arm.com> References: <20231205101323.1914247-1-richard.sandiford@arm.com> MIME-Version: 1.0 X-Spam-Status: No, score=-22.3 required=5.0 tests=BAYES_00, GIT_PATCH_0, KAM_DMARC_NONE, KAM_DMARC_STATUS, KAM_LAZY_DOMAIN_SECURITY, KAM_SHORT, SPF_HELO_NONE, SPF_NONE, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.30 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org X-getmail-retrieved-from-mailbox: INBOX X-GMAIL-THRID: 1784436704424324411 X-GMAIL-MSGID: 1784436704424324411 The vast majority of Advanced SIMD instructions are not available in streaming mode, but some of the load/store/move instructions are. This patch adds a new target feature macro called TARGET_BASE_SIMD for this streaming-compatible subset. The vector-to-vector move instructions are not streaming-compatible, so we need to use the SVE move instructions where enabled, or fall back to the nofp16 handling otherwise. I haven't found a good way of testing the SVE EXT alternative in aarch64_simd_mov_from_high, but I'd rather provide it than not. gcc/ * config/aarch64/aarch64.h (TARGET_BASE_SIMD): New macro. (TARGET_SIMD): Require PSTATE.SM to be 0. (AARCH64_ISA_SM_OFF): New macro. * config/aarch64/aarch64.cc (aarch64_array_mode_supported_p): Allow Advanced SIMD structure modes for TARGET_BASE_SIMD. (aarch64_print_operand): Support '%Z'. (aarch64_secondary_reload): Expect SVE moves to be used for Advanced SIMD modes if SVE is enabled and non-streaming Advanced SIMD isn't. (aarch64_register_move_cost): Likewise. (aarch64_simd_container_mode): Extend Advanced SIMD mode handling to TARGET_BASE_SIMD. (aarch64_expand_cpymem): Expand commentary. * config/aarch64/aarch64.md (arches): Add base_simd and nobase_simd. (arch_enabled): Handle it. (*mov_aarch64): Extend UMOV alternative to TARGET_BASE_SIMD. (*movti_aarch64): Use an SVE move instruction if non-streaming SIMD isn't available. (*mov_aarch64): Likewise. (load_pair_dw_tftf): Extend to TARGET_BASE_SIMD. (store_pair_dw_tftf): Likewise. (loadwb_pair_): Likewise. (storewb_pair_): Likewise. * config/aarch64/aarch64-simd.md (*aarch64_simd_mov): Allow UMOV in streaming mode. (*aarch64_simd_mov): Use an SVE move instruction if non-streaming SIMD isn't available. (aarch64_store_lane0): Depend on TARGET_FLOAT rather than TARGET_SIMD. (aarch64_simd_mov_from_low): Likewise. Use fmov if Advanced SIMD is completely disabled. (aarch64_simd_mov_from_high): Use SVE EXT instructions if non-streaming SIMD isn't available. gcc/testsuite/ * gcc.target/aarch64/movdf_2.c: New test. * gcc.target/aarch64/movdi_3.c: Likewise. * gcc.target/aarch64/movhf_2.c: Likewise. * gcc.target/aarch64/movhi_2.c: Likewise. * gcc.target/aarch64/movqi_2.c: Likewise. * gcc.target/aarch64/movsf_2.c: Likewise. * gcc.target/aarch64/movsi_2.c: Likewise. * gcc.target/aarch64/movtf_3.c: Likewise. * gcc.target/aarch64/movtf_4.c: Likewise. * gcc.target/aarch64/movti_3.c: Likewise. * gcc.target/aarch64/movti_4.c: Likewise. * gcc.target/aarch64/movv16qi_4.c: Likewise. * gcc.target/aarch64/movv16qi_5.c: Likewise. * gcc.target/aarch64/movv8qi_4.c: Likewise. * gcc.target/aarch64/sme/arm_neon_1.c: Likewise. * gcc.target/aarch64/sme/arm_neon_2.c: Likewise. * gcc.target/aarch64/sme/arm_neon_3.c: Likewise. --- gcc/config/aarch64/aarch64-simd.md | 48 +++++------ gcc/config/aarch64/aarch64.cc | 16 ++-- gcc/config/aarch64/aarch64.h | 12 ++- gcc/config/aarch64/aarch64.md | 79 +++++++++-------- gcc/testsuite/gcc.target/aarch64/movdf_2.c | 51 +++++++++++ gcc/testsuite/gcc.target/aarch64/movdi_3.c | 59 +++++++++++++ gcc/testsuite/gcc.target/aarch64/movhf_2.c | 53 ++++++++++++ gcc/testsuite/gcc.target/aarch64/movhi_2.c | 61 +++++++++++++ gcc/testsuite/gcc.target/aarch64/movqi_2.c | 59 +++++++++++++ gcc/testsuite/gcc.target/aarch64/movsf_2.c | 51 +++++++++++ gcc/testsuite/gcc.target/aarch64/movsi_2.c | 59 +++++++++++++ gcc/testsuite/gcc.target/aarch64/movtf_3.c | 81 +++++++++++++++++ gcc/testsuite/gcc.target/aarch64/movtf_4.c | 78 +++++++++++++++++ gcc/testsuite/gcc.target/aarch64/movti_3.c | 86 +++++++++++++++++++ gcc/testsuite/gcc.target/aarch64/movti_4.c | 83 ++++++++++++++++++ gcc/testsuite/gcc.target/aarch64/movv16qi_4.c | 82 ++++++++++++++++++ gcc/testsuite/gcc.target/aarch64/movv16qi_5.c | 79 +++++++++++++++++ gcc/testsuite/gcc.target/aarch64/movv8qi_4.c | 55 ++++++++++++ .../gcc.target/aarch64/sme/arm_neon_1.c | 13 +++ .../gcc.target/aarch64/sme/arm_neon_2.c | 11 +++ .../gcc.target/aarch64/sme/arm_neon_3.c | 11 +++ 21 files changed, 1060 insertions(+), 67 deletions(-) create mode 100644 gcc/testsuite/gcc.target/aarch64/movdf_2.c create mode 100644 gcc/testsuite/gcc.target/aarch64/movdi_3.c create mode 100644 gcc/testsuite/gcc.target/aarch64/movhf_2.c create mode 100644 gcc/testsuite/gcc.target/aarch64/movhi_2.c create mode 100644 gcc/testsuite/gcc.target/aarch64/movqi_2.c create mode 100644 gcc/testsuite/gcc.target/aarch64/movsf_2.c create mode 100644 gcc/testsuite/gcc.target/aarch64/movsi_2.c create mode 100644 gcc/testsuite/gcc.target/aarch64/movtf_3.c create mode 100644 gcc/testsuite/gcc.target/aarch64/movtf_4.c create mode 100644 gcc/testsuite/gcc.target/aarch64/movti_3.c create mode 100644 gcc/testsuite/gcc.target/aarch64/movti_4.c create mode 100644 gcc/testsuite/gcc.target/aarch64/movv16qi_4.c create mode 100644 gcc/testsuite/gcc.target/aarch64/movv16qi_5.c create mode 100644 gcc/testsuite/gcc.target/aarch64/movv8qi_4.c create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/arm_neon_1.c create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/arm_neon_2.c create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/arm_neon_3.c diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md index ad79a8110a5..50b68552fe4 100644 --- a/gcc/config/aarch64/aarch64-simd.md +++ b/gcc/config/aarch64/aarch64-simd.md @@ -149,20 +149,20 @@ (define_insn_and_split "*aarch64_simd_mov" && (register_operand (operands[0], mode) || aarch64_simd_reg_or_zero (operands[1], mode))" {@ [cons: =0, 1; attrs: type, arch, length] - [w , m ; neon_load1_1reg , * , *] ldr\t%d0, %1 - [r , m ; load_8 , * , *] ldr\t%x0, %1 - [m , Dz; store_8 , * , *] str\txzr, %0 - [m , w ; neon_store1_1reg, * , *] str\t%d1, %0 - [m , r ; store_8 , * , *] str\t%x1, %0 - [w , w ; neon_logic , simd, *] mov\t%0., %1. - [w , w ; neon_logic , * , *] fmov\t%d0, %d1 - [?r, w ; neon_to_gp , simd, *] umov\t%0, %1.d[0] - [?r, w ; neon_to_gp , * , *] fmov\t%x0, %d1 - [?w, r ; f_mcr , * , *] fmov\t%d0, %1 - [?r, r ; mov_reg , * , *] mov\t%0, %1 - [w , Dn; neon_move , simd, *] << aarch64_output_simd_mov_immediate (operands[1], 64); - [w , Dz; f_mcr , * , *] fmov\t%d0, xzr - [w , Dx; neon_move , simd, 8] # + [w , m ; neon_load1_1reg , * , *] ldr\t%d0, %1 + [r , m ; load_8 , * , *] ldr\t%x0, %1 + [m , Dz; store_8 , * , *] str\txzr, %0 + [m , w ; neon_store1_1reg, * , *] str\t%d1, %0 + [m , r ; store_8 , * , *] str\t%x1, %0 + [w , w ; neon_logic , simd , *] mov\t%0., %1. + [w , w ; neon_logic , * , *] fmov\t%d0, %d1 + [?r, w ; neon_to_gp , base_simd, *] umov\t%0, %1.d[0] + [?r, w ; neon_to_gp , * , *] fmov\t%x0, %d1 + [?w, r ; f_mcr , * , *] fmov\t%d0, %1 + [?r, r ; mov_reg , * , *] mov\t%0, %1 + [w , Dn; neon_move , simd , *] << aarch64_output_simd_mov_immediate (operands[1], 64); + [w , Dz; f_mcr , * , *] fmov\t%d0, xzr + [w , Dx; neon_move , simd , 8] # } "CONST_INT_P (operands[1]) && aarch64_simd_special_constant_p (operands[1], mode) @@ -185,6 +185,7 @@ (define_insn_and_split "*aarch64_simd_mov" [Umn, Dz; store_16 , * , 4] stp\txzr, xzr, %0 [m , w ; neon_store1_1reg, * , 4] str\t%q1, %0 [w , w ; neon_logic , simd, 4] mov\t%0., %1. + [w , w ; * , sve , 4] mov\t%Z0.d, %Z1.d [?r , w ; multiple , * , 8] # [?w , r ; multiple , * , 8] # [?r , r ; multiple , * , 8] # @@ -225,7 +226,7 @@ (define_insn "aarch64_store_lane0" [(set (match_operand: 0 "memory_operand" "=m") (vec_select: (match_operand:VALL_F16 1 "register_operand" "w") (parallel [(match_operand 2 "const_int_operand" "n")])))] - "TARGET_SIMD + "TARGET_FLOAT && ENDIAN_LANE_N (, INTVAL (operands[2])) == 0" "str\\t%1, %0" [(set_attr "type" "neon_store1_1reg")] @@ -374,18 +375,18 @@ (define_insn_and_split "aarch64_simd_mov_from_low" (vec_select: (match_operand:VQMOV_NO2E 1 "register_operand") (match_operand:VQMOV_NO2E 2 "vect_par_cnst_lo_half")))] - "TARGET_SIMD" - {@ [ cons: =0 , 1 ; attrs: type ] - [ w , w ; mov_reg ] # - [ ?r , w ; neon_to_gp ] umov\t%0, %1.d[0] + "TARGET_FLOAT" + {@ [ cons: =0 , 1 ; attrs: type , arch ] + [ w , w ; mov_reg , simd ] # + [ ?r , w ; neon_to_gp , base_simd ] umov\t%0, %1.d[0] + [ ?r , w ; f_mrc , * ] fmov\t%0, %d1 } "&& reload_completed && aarch64_simd_register (operands[0], mode)" [(set (match_dup 0) (match_dup 1))] { operands[1] = aarch64_replace_reg_mode (operands[1], mode); } - [ - (set_attr "length" "4")] + [(set_attr "length" "4")] ) (define_insn "aarch64_simd_mov_from_high" @@ -396,12 +397,11 @@ (define_insn "aarch64_simd_mov_from_high" "TARGET_FLOAT" {@ [ cons: =0 , 1 ; attrs: type , arch ] [ w , w ; neon_dup , simd ] dup\t%d0, %1.d[1] + [ w , w ; * , sve ] ext\t%Z0.b, %Z0.b, %Z0.b, #8 [ ?r , w ; neon_to_gp , simd ] umov\t%0, %1.d[1] [ ?r , w ; f_mrc , * ] fmov\t%0, %1.d[1] } - [ - - (set_attr "length" "4")] + [(set_attr "length" "4")] ) (define_insn "orn3" diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc index 3792f1e99fd..ea00ec192ee 100644 --- a/gcc/config/aarch64/aarch64.cc +++ b/gcc/config/aarch64/aarch64.cc @@ -1400,7 +1400,7 @@ static bool aarch64_array_mode_supported_p (machine_mode mode, unsigned HOST_WIDE_INT nelems) { - if (TARGET_SIMD + if (TARGET_BASE_SIMD && (AARCH64_VALID_SIMD_QREG_MODE (mode) || AARCH64_VALID_SIMD_DREG_MODE (mode)) && (nelems >= 2 && nelems <= 4)) @@ -10762,8 +10762,8 @@ aarch64_secondary_reload (bool in_p ATTRIBUTE_UNUSED, rtx x, return NO_REGS; } - /* Without the TARGET_SIMD instructions we cannot move a Q register - to a Q register directly. We need a scratch. */ + /* Without the TARGET_SIMD or TARGET_SVE instructions we cannot move a + Q register to a Q register directly. We need a scratch. */ if (REG_P (x) && (mode == TFmode || mode == TImode @@ -13368,7 +13368,7 @@ aarch64_register_move_cost (machine_mode mode, secondary reload. A general register is used as a scratch to move the upper DI value and the lower DI value is moved directly, hence the cost is the sum of three moves. */ - if (! TARGET_SIMD) + if (!TARGET_SIMD && !TARGET_SVE) return regmove_cost->GP2FP + regmove_cost->FP2GP + regmove_cost->FP2FP; return regmove_cost->FP2FP; @@ -18996,7 +18996,7 @@ aarch64_simd_container_mode (scalar_mode mode, poly_int64 width) return aarch64_full_sve_mode (mode).else_mode (word_mode); gcc_assert (known_eq (width, 64) || known_eq (width, 128)); - if (TARGET_SIMD) + if (TARGET_BASE_SIMD) { if (known_eq (width, 128)) return aarch64_vq_mode (mode).else_mode (word_mode); @@ -23409,7 +23409,11 @@ aarch64_expand_cpymem (rtx *operands) int copy_bits = 256; /* Default to 256-bit LDP/STP on large copies, however small copies, no SIMD - support or slow 256-bit LDP/STP fall back to 128-bit chunks. */ + support or slow 256-bit LDP/STP fall back to 128-bit chunks. + + ??? Although it would be possible to use LDP/STP Qn in streaming mode + (so using TARGET_BASE_SIMD instead of TARGET_SIMD), it isn't clear + whether that would improve performance. */ if (size <= 24 || !TARGET_SIMD || (aarch64_tune_params.extra_tuning_flags diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h index aa908ced7cd..808e2044009 100644 --- a/gcc/config/aarch64/aarch64.h +++ b/gcc/config/aarch64/aarch64.h @@ -61,8 +61,15 @@ #define WORDS_BIG_ENDIAN (BYTES_BIG_ENDIAN) /* AdvSIMD is supported in the default configuration, unless disabled by - -mgeneral-regs-only or by the +nosimd extension. */ -#define TARGET_SIMD (AARCH64_ISA_SIMD) + -mgeneral-regs-only or by the +nosimd extension. The set of available + instructions is then subdivided into: + + - the "base" set, available both in SME streaming mode and in + non-streaming mode + + - the full set, available only in non-streaming mode. */ +#define TARGET_BASE_SIMD (AARCH64_ISA_SIMD) +#define TARGET_SIMD (AARCH64_ISA_SIMD && AARCH64_ISA_SM_OFF) #define TARGET_FLOAT (AARCH64_ISA_FP) #define UNITS_PER_WORD 8 @@ -199,6 +206,7 @@ constexpr auto AARCH64_FL_DEFAULT_ISA_MODE = AARCH64_FL_SM_OFF; /* Macros to test ISA flags. */ +#define AARCH64_ISA_SM_OFF (aarch64_isa_flags & AARCH64_FL_SM_OFF) #define AARCH64_ISA_MODE (aarch64_isa_flags & AARCH64_FL_ISA_MODES) #define AARCH64_ISA_CRC (aarch64_isa_flags & AARCH64_FL_CRC) #define AARCH64_ISA_CRYPTO (aarch64_isa_flags & AARCH64_FL_CRYPTO) diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md index e6b19b962b1..ddfd17bd2dd 100644 --- a/gcc/config/aarch64/aarch64.md +++ b/gcc/config/aarch64/aarch64.md @@ -366,7 +366,8 @@ (define_constants ;; As a convenience, "fp_q" means "fp" + the ability to move between ;; Q registers and is equivalent to "simd". -(define_enum "arches" [ any rcpc8_4 fp fp_q simd nosimd sve fp16]) +(define_enum "arches" [any rcpc8_4 fp fp_q base_simd nobase_simd + simd nosimd sve fp16]) (define_enum_attr "arch" "arches" (const_string "any")) @@ -394,6 +395,12 @@ (define_attr "arch_enabled" "no,yes" (and (eq_attr "arch" "fp") (match_test "TARGET_FLOAT")) + (and (eq_attr "arch" "base_simd") + (match_test "TARGET_BASE_SIMD")) + + (and (eq_attr "arch" "nobase_simd") + (match_test "!TARGET_BASE_SIMD")) + (and (eq_attr "arch" "fp_q, simd") (match_test "TARGET_SIMD")) @@ -1224,23 +1231,23 @@ (define_insn "*mov_aarch64" "(register_operand (operands[0], mode) || aarch64_reg_or_zero (operands[1], mode))" {@ [cons: =0, 1; attrs: type, arch] - [w, Z ; neon_move , simd ] movi\t%0., #0 - [r, r ; mov_reg , * ] mov\t%w0, %w1 - [r, M ; mov_imm , * ] mov\t%w0, %1 - [w, D; neon_move , simd ] << aarch64_output_scalar_simd_mov_immediate (operands[1], mode); + [w, Z ; neon_move , simd ] movi\t%0., #0 + [r, r ; mov_reg , * ] mov\t%w0, %w1 + [r, M ; mov_imm , * ] mov\t%w0, %1 + [w, D; neon_move , simd ] << aarch64_output_scalar_simd_mov_immediate (operands[1], mode); /* The "mov_imm" type for CNT is just a placeholder. */ - [r, Usv ; mov_imm , sve ] << aarch64_output_sve_cnt_immediate ("cnt", "%x0", operands[1]); - [r, Usr ; mov_imm , sve ] << aarch64_output_sve_rdvl (operands[1]); - [r, m ; load_4 , * ] ldr\t%w0, %1 - [w, m ; load_4 , * ] ldr\t%0, %1 - [m, r Z ; store_4 , * ] str\\t%w1, %0 - [m, w ; store_4 , * ] str\t%1, %0 - [r, w ; neon_to_gp , simd ] umov\t%w0, %1.[0] - [r, w ; neon_to_gp , nosimd] fmov\t%w0, %s1 - [w, r Z ; neon_from_gp, simd ] dup\t%0., %w1 - [w, r Z ; neon_from_gp, nosimd] fmov\t%s0, %w1 - [w, w ; neon_dup , simd ] dup\t%0, %1.[0] - [w, w ; neon_dup , nosimd] fmov\t%s0, %s1 + [r, Usv ; mov_imm , sve ] << aarch64_output_sve_cnt_immediate ("cnt", "%x0", operands[1]); + [r, Usr ; mov_imm , sve ] << aarch64_output_sve_rdvl (operands[1]); + [r, m ; load_4 , * ] ldr\t%w0, %1 + [w, m ; load_4 , * ] ldr\t%0, %1 + [m, r Z ; store_4 , * ] str\\t%w1, %0 + [m, w ; store_4 , * ] str\t%1, %0 + [r, w ; neon_to_gp , base_simd ] umov\t%w0, %1.[0] + [r, w ; neon_to_gp , nobase_simd] fmov\t%w0, %s1 + [w, r Z ; neon_from_gp, simd ] dup\t%0., %w1 + [w, r Z ; neon_from_gp, nosimd ] fmov\t%s0, %w1 + [w, w ; neon_dup , simd ] dup\t%0, %1.[0] + [w, w ; neon_dup , nosimd ] fmov\t%s0, %s1 } ) @@ -1405,9 +1412,9 @@ (define_expand "movti" (define_insn "*movti_aarch64" [(set (match_operand:TI 0 - "nonimmediate_operand" "= r,w,w,w, r,w,r,m,m,w,m") + "nonimmediate_operand" "= r,w,w,w, r,w,w,r,m,m,w,m") (match_operand:TI 1 - "aarch64_movti_operand" " rUti,Z,Z,r, w,w,m,r,Z,m,w"))] + "aarch64_movti_operand" " rUti,Z,Z,r, w,w,w,m,r,Z,m,w"))] "(register_operand (operands[0], TImode) || aarch64_reg_or_zero (operands[1], TImode))" "@ @@ -1417,16 +1424,17 @@ (define_insn "*movti_aarch64" # # mov\\t%0.16b, %1.16b + mov\\t%Z0.d, %Z1.d ldp\\t%0, %H0, %1 stp\\t%1, %H1, %0 stp\\txzr, xzr, %0 ldr\\t%q0, %1 str\\t%q1, %0" - [(set_attr "type" "multiple,neon_move,f_mcr,f_mcr,f_mrc,neon_logic_q, \ + [(set_attr "type" "multiple,neon_move,f_mcr,f_mcr,f_mrc,neon_logic_q,*,\ load_16,store_16,store_16,\ load_16,store_16") - (set_attr "length" "8,4,4,8,8,4,4,4,4,4,4") - (set_attr "arch" "*,simd,*,*,*,simd,*,*,*,fp,fp")] + (set_attr "length" "8,4,4,8,8,4,4,4,4,4,4,4") + (set_attr "arch" "*,simd,*,*,*,simd,sve,*,*,*,fp,fp")] ) ;; Split a TImode register-register or register-immediate move into @@ -1553,13 +1561,14 @@ (define_split (define_insn "*mov_aarch64" [(set (match_operand:TFD 0 - "nonimmediate_operand" "=w,?r ,w ,?r,w,?w,w,m,?r,m ,m") + "nonimmediate_operand" "=w,w,?r ,w ,?r,w,?w,w,m,?r,m ,m") (match_operand:TFD 1 - "general_operand" " w,?rY,?r,w ,Y,Y ,m,w,m ,?r,Y"))] + "general_operand" " w,w,?rY,?r,w ,Y,Y ,m,w,m ,?r,Y"))] "TARGET_FLOAT && (register_operand (operands[0], mode) || aarch64_reg_or_fp_zero (operands[1], mode))" "@ mov\\t%0.16b, %1.16b + mov\\t%Z0.d, %Z1.d # # # @@ -1570,10 +1579,10 @@ (define_insn "*mov_aarch64" ldp\\t%0, %H0, %1 stp\\t%1, %H1, %0 stp\\txzr, xzr, %0" - [(set_attr "type" "logic_reg,multiple,f_mcr,f_mrc,neon_move_q,f_mcr,\ + [(set_attr "type" "logic_reg,*,multiple,f_mcr,f_mrc,neon_move_q,f_mcr,\ f_loadd,f_stored,load_16,store_16,store_16") - (set_attr "length" "4,8,8,8,4,4,4,4,4,4,4") - (set_attr "arch" "simd,*,*,*,simd,*,*,*,*,*,*")] + (set_attr "length" "4,4,8,8,8,4,4,4,4,4,4,4") + (set_attr "arch" "simd,sve,*,*,*,simd,*,*,*,*,*,*")] ) (define_split @@ -1767,7 +1776,7 @@ (define_insn "load_pair_dw_" (match_operand:TX 1 "aarch64_mem_pair_operand" "Ump")) (set (match_operand:TX2 2 "register_operand" "=w") (match_operand:TX2 3 "memory_operand" "m"))] - "TARGET_SIMD + "TARGET_BASE_SIMD && rtx_equal_p (XEXP (operands[3], 0), plus_constant (Pmode, XEXP (operands[1], 0), @@ -1815,11 +1824,11 @@ (define_insn "store_pair_dw_" (match_operand:TX 1 "register_operand" "w")) (set (match_operand:TX2 2 "memory_operand" "=m") (match_operand:TX2 3 "register_operand" "w"))] - "TARGET_SIMD && - rtx_equal_p (XEXP (operands[2], 0), - plus_constant (Pmode, - XEXP (operands[0], 0), - GET_MODE_SIZE (TFmode)))" + "TARGET_BASE_SIMD + && rtx_equal_p (XEXP (operands[2], 0), + plus_constant (Pmode, + XEXP (operands[0], 0), + GET_MODE_SIZE (TFmode)))" "stp\\t%q1, %q3, %z0" [(set_attr "type" "neon_stp_q") (set_attr "fp" "yes")] @@ -1867,7 +1876,7 @@ (define_insn "loadwb_pair_" (set (match_operand:TX 3 "register_operand" "=w") (mem:TX (plus:P (match_dup 1) (match_operand:P 5 "const_int_operand" "n"))))])] - "TARGET_SIMD && INTVAL (operands[5]) == GET_MODE_SIZE (mode)" + "TARGET_BASE_SIMD && INTVAL (operands[5]) == GET_MODE_SIZE (mode)" "ldp\\t%q2, %q3, [%1], %4" [(set_attr "type" "neon_ldp_q")] ) @@ -1917,7 +1926,7 @@ (define_insn "storewb_pair_" (set (mem:TX (plus:P (match_dup 0) (match_operand:P 5 "const_int_operand" "n"))) (match_operand:TX 3 "register_operand" "w"))])] - "TARGET_SIMD + "TARGET_BASE_SIMD && INTVAL (operands[5]) == INTVAL (operands[4]) + GET_MODE_SIZE (mode)" "stp\\t%q2, %q3, [%0, %4]!" diff --git a/gcc/testsuite/gcc.target/aarch64/movdf_2.c b/gcc/testsuite/gcc.target/aarch64/movdf_2.c new file mode 100644 index 00000000000..0d459d31760 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/movdf_2.c @@ -0,0 +1,51 @@ +/* { dg-do assemble } */ +/* { dg-options "-O --save-temps" } */ +/* { dg-final { check-function-bodies "**" "" "" } } */ + +/* +** fpr_to_fpr: +** fmov d0, d1 +** ret +*/ +double +fpr_to_fpr (double q0, double q1) [[arm::streaming_compatible]] +{ + return q1; +} + +/* +** gpr_to_fpr: +** fmov d0, x0 +** ret +*/ +double +gpr_to_fpr () [[arm::streaming_compatible]] +{ + register double x0 asm ("x0"); + asm volatile ("" : "=r" (x0)); + return x0; +} + +/* +** zero_to_fpr: +** fmov d0, xzr +** ret +*/ +double +zero_to_fpr () [[arm::streaming_compatible]] +{ + return 0; +} + +/* +** fpr_to_gpr: +** fmov x0, d0 +** ret +*/ +void +fpr_to_gpr (double q0) [[arm::streaming_compatible]] +{ + register double x0 asm ("x0"); + x0 = q0; + asm volatile ("" :: "r" (x0)); +} diff --git a/gcc/testsuite/gcc.target/aarch64/movdi_3.c b/gcc/testsuite/gcc.target/aarch64/movdi_3.c new file mode 100644 index 00000000000..31b2cbbaeb0 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/movdi_3.c @@ -0,0 +1,59 @@ +/* { dg-do assemble } */ +/* { dg-options "-O --save-temps" } */ +/* { dg-final { check-function-bodies "**" "" "" } } */ + +#include + +/* +** fpr_to_fpr: +** fmov d0, d1 +** ret +*/ +void +fpr_to_fpr (void) [[arm::streaming_compatible]] +{ + register uint64_t q0 asm ("q0"); + register uint64_t q1 asm ("q1"); + asm volatile ("" : "=w" (q1)); + q0 = q1; + asm volatile ("" :: "w" (q0)); +} + +/* +** gpr_to_fpr: +** fmov d0, x0 +** ret +*/ +void +gpr_to_fpr (uint64_t x0) [[arm::streaming_compatible]] +{ + register uint64_t q0 asm ("q0"); + q0 = x0; + asm volatile ("" :: "w" (q0)); +} + +/* +** zero_to_fpr: +** fmov d0, xzr +** ret +*/ +void +zero_to_fpr () [[arm::streaming_compatible]] +{ + register uint64_t q0 asm ("q0"); + q0 = 0; + asm volatile ("" :: "w" (q0)); +} + +/* +** fpr_to_gpr: +** fmov x0, d0 +** ret +*/ +uint64_t +fpr_to_gpr () [[arm::streaming_compatible]] +{ + register uint64_t q0 asm ("q0"); + asm volatile ("" : "=w" (q0)); + return q0; +} diff --git a/gcc/testsuite/gcc.target/aarch64/movhf_2.c b/gcc/testsuite/gcc.target/aarch64/movhf_2.c new file mode 100644 index 00000000000..3292b0de8d1 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/movhf_2.c @@ -0,0 +1,53 @@ +/* { dg-do assemble } */ +/* { dg-options "-O --save-temps" } */ +/* { dg-final { check-function-bodies "**" "" "" } } */ + +#pragma GCC target "+nothing+simd" + +/* +** fpr_to_fpr: +** fmov s0, s1 +** ret +*/ +_Float16 +fpr_to_fpr (_Float16 q0, _Float16 q1) [[arm::streaming_compatible]] +{ + return q1; +} + +/* +** gpr_to_fpr: +** fmov s0, w0 +** ret +*/ +_Float16 +gpr_to_fpr () [[arm::streaming_compatible]] +{ + register _Float16 w0 asm ("w0"); + asm volatile ("" : "=r" (w0)); + return w0; +} + +/* +** zero_to_fpr: +** fmov s0, wzr +** ret +*/ +_Float16 +zero_to_fpr () [[arm::streaming_compatible]] +{ + return 0; +} + +/* +** fpr_to_gpr: +** fmov w0, s0 +** ret +*/ +void +fpr_to_gpr (_Float16 q0) [[arm::streaming_compatible]] +{ + register _Float16 w0 asm ("w0"); + w0 = q0; + asm volatile ("" :: "r" (w0)); +} diff --git a/gcc/testsuite/gcc.target/aarch64/movhi_2.c b/gcc/testsuite/gcc.target/aarch64/movhi_2.c new file mode 100644 index 00000000000..dbbf3486f58 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/movhi_2.c @@ -0,0 +1,61 @@ +/* { dg-do assemble } */ +/* { dg-options "-O --save-temps" } */ +/* { dg-final { check-function-bodies "**" "" "" } } */ + +#pragma GCC target "+nothing+simd" + +#include + +/* +** fpr_to_fpr: +** fmov s0, s1 +** ret +*/ +void +fpr_to_fpr (void) [[arm::streaming_compatible]] +{ + register uint16_t q0 asm ("q0"); + register uint16_t q1 asm ("q1"); + asm volatile ("" : "=w" (q1)); + q0 = q1; + asm volatile ("" :: "w" (q0)); +} + +/* +** gpr_to_fpr: +** fmov s0, w0 +** ret +*/ +void +gpr_to_fpr (uint16_t w0) [[arm::streaming_compatible]] +{ + register uint16_t q0 asm ("q0"); + q0 = w0; + asm volatile ("" :: "w" (q0)); +} + +/* +** zero_to_fpr: +** fmov s0, wzr +** ret +*/ +void +zero_to_fpr () [[arm::streaming_compatible]] +{ + register uint16_t q0 asm ("q0"); + q0 = 0; + asm volatile ("" :: "w" (q0)); +} + +/* +** fpr_to_gpr: +** umov w0, v0.h\[0\] +** ret +*/ +uint16_t +fpr_to_gpr () [[arm::streaming_compatible]] +{ + register uint16_t q0 asm ("q0"); + asm volatile ("" : "=w" (q0)); + return q0; +} diff --git a/gcc/testsuite/gcc.target/aarch64/movqi_2.c b/gcc/testsuite/gcc.target/aarch64/movqi_2.c new file mode 100644 index 00000000000..aec087e4e2c --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/movqi_2.c @@ -0,0 +1,59 @@ +/* { dg-do assemble } */ +/* { dg-options "-O --save-temps" } */ +/* { dg-final { check-function-bodies "**" "" "" } } */ + +#include + +/* +** fpr_to_fpr: +** fmov s0, s1 +** ret +*/ +void +fpr_to_fpr (void) [[arm::streaming_compatible]] +{ + register uint8_t q0 asm ("q0"); + register uint8_t q1 asm ("q1"); + asm volatile ("" : "=w" (q1)); + q0 = q1; + asm volatile ("" :: "w" (q0)); +} + +/* +** gpr_to_fpr: +** fmov s0, w0 +** ret +*/ +void +gpr_to_fpr (uint8_t w0) [[arm::streaming_compatible]] +{ + register uint8_t q0 asm ("q0"); + q0 = w0; + asm volatile ("" :: "w" (q0)); +} + +/* +** zero_to_fpr: +** fmov s0, wzr +** ret +*/ +void +zero_to_fpr () [[arm::streaming_compatible]] +{ + register uint8_t q0 asm ("q0"); + q0 = 0; + asm volatile ("" :: "w" (q0)); +} + +/* +** fpr_to_gpr: +** umov w0, v0.b\[0\] +** ret +*/ +uint8_t +fpr_to_gpr () [[arm::streaming_compatible]] +{ + register uint8_t q0 asm ("q0"); + asm volatile ("" : "=w" (q0)); + return q0; +} diff --git a/gcc/testsuite/gcc.target/aarch64/movsf_2.c b/gcc/testsuite/gcc.target/aarch64/movsf_2.c new file mode 100644 index 00000000000..7fed4b22f7a --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/movsf_2.c @@ -0,0 +1,51 @@ +/* { dg-do assemble } */ +/* { dg-options "-O --save-temps" } */ +/* { dg-final { check-function-bodies "**" "" "" } } */ + +/* +** fpr_to_fpr: +** fmov s0, s1 +** ret +*/ +float +fpr_to_fpr (float q0, float q1) [[arm::streaming_compatible]] +{ + return q1; +} + +/* +** gpr_to_fpr: +** fmov s0, w0 +** ret +*/ +float +gpr_to_fpr () [[arm::streaming_compatible]] +{ + register float w0 asm ("w0"); + asm volatile ("" : "=r" (w0)); + return w0; +} + +/* +** zero_to_fpr: +** fmov s0, wzr +** ret +*/ +float +zero_to_fpr () [[arm::streaming_compatible]] +{ + return 0; +} + +/* +** fpr_to_gpr: +** fmov w0, s0 +** ret +*/ +void +fpr_to_gpr (float q0) [[arm::streaming_compatible]] +{ + register float w0 asm ("w0"); + w0 = q0; + asm volatile ("" :: "r" (w0)); +} diff --git a/gcc/testsuite/gcc.target/aarch64/movsi_2.c b/gcc/testsuite/gcc.target/aarch64/movsi_2.c new file mode 100644 index 00000000000..c14d2468af3 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/movsi_2.c @@ -0,0 +1,59 @@ +/* { dg-do assemble } */ +/* { dg-options "-O --save-temps" } */ +/* { dg-final { check-function-bodies "**" "" "" } } */ + +#include + +/* +** fpr_to_fpr: +** fmov s0, s1 +** ret +*/ +void +fpr_to_fpr (void) [[arm::streaming_compatible]] +{ + register uint32_t q0 asm ("q0"); + register uint32_t q1 asm ("q1"); + asm volatile ("" : "=w" (q1)); + q0 = q1; + asm volatile ("" :: "w" (q0)); +} + +/* +** gpr_to_fpr: +** fmov s0, w0 +** ret +*/ +void +gpr_to_fpr (uint32_t w0) [[arm::streaming_compatible]] +{ + register uint32_t q0 asm ("q0"); + q0 = w0; + asm volatile ("" :: "w" (q0)); +} + +/* +** zero_to_fpr: +** fmov s0, wzr +** ret +*/ +void +zero_to_fpr () [[arm::streaming_compatible]] +{ + register uint32_t q0 asm ("q0"); + q0 = 0; + asm volatile ("" :: "w" (q0)); +} + +/* +** fpr_to_gpr: +** fmov w0, s0 +** ret +*/ +uint32_t +fpr_to_gpr () [[arm::streaming_compatible]] +{ + register uint32_t q0 asm ("q0"); + asm volatile ("" : "=w" (q0)); + return q0; +} diff --git a/gcc/testsuite/gcc.target/aarch64/movtf_3.c b/gcc/testsuite/gcc.target/aarch64/movtf_3.c new file mode 100644 index 00000000000..dd164a41855 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/movtf_3.c @@ -0,0 +1,81 @@ +/* { dg-do assemble } */ +/* { dg-require-effective-target large_long_double } */ +/* { dg-options "-O -mtune=neoverse-v1 --save-temps" } */ +/* { dg-final { check-function-bodies "**" "" "" } } */ + +#pragma GCC target "+nosve" + +/* +** fpr_to_fpr: +** sub sp, sp, #16 +** str q1, \[sp\] +** ldr q0, \[sp\] +** add sp, sp, #?16 +** ret +*/ +long double +fpr_to_fpr (long double q0, long double q1) [[arm::streaming_compatible]] +{ + return q1; +} + +/* +** gpr_to_fpr: { target aarch64_little_endian } +** fmov d0, x0 +** fmov v0.d\[1\], x1 +** ret +*/ +/* +** gpr_to_fpr: { target aarch64_big_endian } +** fmov d0, x1 +** fmov v0.d\[1\], x0 +** ret +*/ +long double +gpr_to_fpr () [[arm::streaming_compatible]] +{ + register long double x0 asm ("x0"); + asm volatile ("" : "=r" (x0)); + return x0; +} + +/* +** zero_to_fpr: +** fmov s0, wzr +** ret +*/ +long double +zero_to_fpr () [[arm::streaming_compatible]] +{ + return 0; +} + +/* +** fpr_to_gpr: { target aarch64_little_endian } +** ( +** fmov x0, d0 +** fmov x1, v0.d\[1\] +** | +** fmov x1, v0.d\[1\] +** fmov x0, d0 +** ) +** ret +*/ +/* +** fpr_to_gpr: { target aarch64_big_endian } +** ( +** fmov x1, d0 +** fmov x0, v0.d\[1\] +** | +** fmov x0, v0.d\[1\] +** fmov x1, d0 +** ) +** ret +*/ +void +fpr_to_gpr (long double q0) [[arm::streaming_compatible]] +{ + register long double x0 asm ("x0"); + x0 = q0; + asm volatile ("" :: "r" (x0)); +} diff --git a/gcc/testsuite/gcc.target/aarch64/movtf_4.c b/gcc/testsuite/gcc.target/aarch64/movtf_4.c new file mode 100644 index 00000000000..faf9703e2b6 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/movtf_4.c @@ -0,0 +1,78 @@ +/* { dg-do assemble } */ +/* { dg-require-effective-target large_long_double } */ +/* { dg-options "-O -mtune=neoverse-v1 --save-temps" } */ +/* { dg-final { check-function-bodies "**" "" "" } } */ + +#pragma GCC target "+sve" + +/* +** fpr_to_fpr: +** mov z0.d, z1.d +** ret +*/ +long double +fpr_to_fpr (long double q0, long double q1) [[arm::streaming_compatible]] +{ + return q1; +} + +/* +** gpr_to_fpr: { target aarch64_little_endian } +** fmov d0, x0 +** fmov v0.d\[1\], x1 +** ret +*/ +/* +** gpr_to_fpr: { target aarch64_big_endian } +** fmov d0, x1 +** fmov v0.d\[1\], x0 +** ret +*/ +long double +gpr_to_fpr () [[arm::streaming_compatible]] +{ + register long double x0 asm ("x0"); + asm volatile ("" : "=r" (x0)); + return x0; +} + +/* +** zero_to_fpr: +** fmov s0, wzr +** ret +*/ +long double +zero_to_fpr () [[arm::streaming_compatible]] +{ + return 0; +} + +/* +** fpr_to_gpr: { target aarch64_little_endian } +** ( +** fmov x0, d0 +** fmov x1, v0.d\[1\] +** | +** fmov x1, v0.d\[1\] +** fmov x0, d0 +** ) +** ret +*/ +/* +** fpr_to_gpr: { target aarch64_big_endian } +** ( +** fmov x1, d0 +** fmov x0, v0.d\[1\] +** | +** fmov x0, v0.d\[1\] +** fmov x1, d0 +** ) +** ret +*/ +void +fpr_to_gpr (long double q0) [[arm::streaming_compatible]] +{ + register long double x0 asm ("x0"); + x0 = q0; + asm volatile ("" :: "r" (x0)); +} diff --git a/gcc/testsuite/gcc.target/aarch64/movti_3.c b/gcc/testsuite/gcc.target/aarch64/movti_3.c new file mode 100644 index 00000000000..243109181d6 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/movti_3.c @@ -0,0 +1,86 @@ +/* { dg-do assemble } */ +/* { dg-options "-O -mtune=neoverse-v1 --save-temps" } */ +/* { dg-final { check-function-bodies "**" "" "" } } */ + +#pragma GCC target "+nosve" + +/* +** fpr_to_fpr: +** sub sp, sp, #16 +** str q1, \[sp\] +** ldr q0, \[sp\] +** add sp, sp, #?16 +** ret +*/ +void +fpr_to_fpr (void) [[arm::streaming_compatible]] +{ + register __int128_t q0 asm ("q0"); + register __int128_t q1 asm ("q1"); + asm volatile ("" : "=w" (q1)); + q0 = q1; + asm volatile ("" :: "w" (q0)); +} + +/* +** gpr_to_fpr: { target aarch64_little_endian } +** fmov d0, x0 +** fmov v0.d\[1\], x1 +** ret +*/ +/* +** gpr_to_fpr: { target aarch64_big_endian } +** fmov d0, x1 +** fmov v0.d\[1\], x0 +** ret +*/ +void +gpr_to_fpr (__int128_t x0) [[arm::streaming_compatible]] +{ + register __int128_t q0 asm ("q0"); + q0 = x0; + asm volatile ("" :: "w" (q0)); +} + +/* +** zero_to_fpr: +** fmov d0, xzr +** ret +*/ +void +zero_to_fpr () [[arm::streaming_compatible]] +{ + register __int128_t q0 asm ("q0"); + q0 = 0; + asm volatile ("" :: "w" (q0)); +} + +/* +** fpr_to_gpr: { target aarch64_little_endian } +** ( +** fmov x0, d0 +** fmov x1, v0.d\[1\] +** | +** fmov x1, v0.d\[1\] +** fmov x0, d0 +** ) +** ret +*/ +/* +** fpr_to_gpr: { target aarch64_big_endian } +** ( +** fmov x1, d0 +** fmov x0, v0.d\[1\] +** | +** fmov x0, v0.d\[1\] +** fmov x1, d0 +** ) +** ret +*/ +__int128_t +fpr_to_gpr () [[arm::streaming_compatible]] +{ + register __int128_t q0 asm ("q0"); + asm volatile ("" : "=w" (q0)); + return q0; +} diff --git a/gcc/testsuite/gcc.target/aarch64/movti_4.c b/gcc/testsuite/gcc.target/aarch64/movti_4.c new file mode 100644 index 00000000000..a70feccb0e3 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/movti_4.c @@ -0,0 +1,83 @@ +/* { dg-do assemble } */ +/* { dg-options "-O -mtune=neoverse-v1 --save-temps" } */ +/* { dg-final { check-function-bodies "**" "" "" } } */ + +#pragma GCC target "+sve" + +/* +** fpr_to_fpr: +** mov z0\.d, z1\.d +** ret +*/ +void +fpr_to_fpr (void) [[arm::streaming_compatible]] +{ + register __int128_t q0 asm ("q0"); + register __int128_t q1 asm ("q1"); + asm volatile ("" : "=w" (q1)); + q0 = q1; + asm volatile ("" :: "w" (q0)); +} + +/* +** gpr_to_fpr: { target aarch64_little_endian } +** fmov d0, x0 +** fmov v0.d\[1\], x1 +** ret +*/ +/* +** gpr_to_fpr: { target aarch64_big_endian } +** fmov d0, x1 +** fmov v0.d\[1\], x0 +** ret +*/ +void +gpr_to_fpr (__int128_t x0) [[arm::streaming_compatible]] +{ + register __int128_t q0 asm ("q0"); + q0 = x0; + asm volatile ("" :: "w" (q0)); +} + +/* +** zero_to_fpr: +** fmov d0, xzr +** ret +*/ +void +zero_to_fpr () [[arm::streaming_compatible]] +{ + register __int128_t q0 asm ("q0"); + q0 = 0; + asm volatile ("" :: "w" (q0)); +} + +/* +** fpr_to_gpr: { target aarch64_little_endian } +** ( +** fmov x0, d0 +** fmov x1, v0.d\[1\] +** | +** fmov x1, v0.d\[1\] +** fmov x0, d0 +** ) +** ret +*/ +/* +** fpr_to_gpr: { target aarch64_big_endian } +** ( +** fmov x1, d0 +** fmov x0, v0.d\[1\] +** | +** fmov x0, v0.d\[1\] +** fmov x1, d0 +** ) +** ret +*/ +__int128_t +fpr_to_gpr () [[arm::streaming_compatible]] +{ + register __int128_t q0 asm ("q0"); + asm volatile ("" : "=w" (q0)); + return q0; +} diff --git a/gcc/testsuite/gcc.target/aarch64/movv16qi_4.c b/gcc/testsuite/gcc.target/aarch64/movv16qi_4.c new file mode 100644 index 00000000000..7bec888b71d --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/movv16qi_4.c @@ -0,0 +1,82 @@ +/* { dg-do assemble } */ +/* { dg-options "-O -mtune=neoverse-v1 --save-temps" } */ +/* { dg-final { check-function-bodies "**" "" "" } } */ + +#pragma GCC target "+nosve" + +typedef unsigned char v16qi __attribute__((vector_size(16))); + +/* +** fpr_to_fpr: +** sub sp, sp, #16 +** str q1, \[sp\] +** ldr q0, \[sp\] +** add sp, sp, #?16 +** ret +*/ +v16qi +fpr_to_fpr (v16qi q0, v16qi q1) [[arm::streaming_compatible]] +{ + return q1; +} + +/* +** gpr_to_fpr: { target aarch64_little_endian } +** fmov d0, x0 +** fmov v0.d\[1\], x1 +** ret +*/ +/* +** gpr_to_fpr: { target aarch64_big_endian } +** fmov d0, x1 +** fmov v0.d\[1\], x0 +** ret +*/ +v16qi +gpr_to_fpr () [[arm::streaming_compatible]] +{ + register v16qi x0 asm ("x0"); + asm volatile ("" : "=r" (x0)); + return x0; +} + +/* +** zero_to_fpr: +** fmov d0, xzr +** ret +*/ +v16qi +zero_to_fpr () [[arm::streaming_compatible]] +{ + return (v16qi) {}; +} + +/* +** fpr_to_gpr: { target aarch64_little_endian } +** ( +** umov x0, v0.d\[0\] +** fmov x1, v0.d\[1\] +** | +** fmov x1, v0.d\[1\] +** umov x0, v0.d\[0\] +** ) +** ret +*/ +/* +** fpr_to_gpr: { target aarch64_big_endian } +** ( +** umov x1, v0.d\[0\] +** fmov x0, v0.d\[1\] +** | +** fmov x0, v0.d\[1\] +** umov x1, v0.d\[0\] +** ) +** ret +*/ +void +fpr_to_gpr (v16qi q0) [[arm::streaming_compatible]] +{ + register v16qi x0 asm ("x0"); + x0 = q0; + asm volatile ("" :: "r" (x0)); +} diff --git a/gcc/testsuite/gcc.target/aarch64/movv16qi_5.c b/gcc/testsuite/gcc.target/aarch64/movv16qi_5.c new file mode 100644 index 00000000000..2d36342b3f8 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/movv16qi_5.c @@ -0,0 +1,79 @@ +/* { dg-do assemble } */ +/* { dg-options "-O -mtune=neoverse-v1 --save-temps" } */ +/* { dg-final { check-function-bodies "**" "" "" } } */ + +#pragma GCC target "+sve" + +typedef unsigned char v16qi __attribute__((vector_size(16))); + +/* +** fpr_to_fpr: +** mov z0.d, z1.d +** ret +*/ +v16qi +fpr_to_fpr (v16qi q0, v16qi q1) [[arm::streaming_compatible]] +{ + return q1; +} + +/* +** gpr_to_fpr: { target aarch64_little_endian } +** fmov d0, x0 +** fmov v0.d\[1\], x1 +** ret +*/ +/* +** gpr_to_fpr: { target aarch64_big_endian } +** fmov d0, x1 +** fmov v0.d\[1\], x0 +** ret +*/ +v16qi +gpr_to_fpr () [[arm::streaming_compatible]] +{ + register v16qi x0 asm ("x0"); + asm volatile ("" : "=r" (x0)); + return x0; +} + +/* +** zero_to_fpr: +** fmov d0, xzr +** ret +*/ +v16qi +zero_to_fpr () [[arm::streaming_compatible]] +{ + return (v16qi) {}; +} + +/* +** fpr_to_gpr: { target aarch64_little_endian } +** ( +** umov x0, v0.d\[0\] +** fmov x1, v0.d\[1\] +** | +** fmov x1, v0.d\[1\] +** umov x0, v0.d\[0\] +** ) +** ret +*/ +/* +** fpr_to_gpr: { target aarch64_big_endian } +** ( +** umov x1, v0.d\[0\] +** fmov x0, v0.d\[1\] +** | +** fmov x0, v0.d\[1\] +** umov x1, v0.d\[0\] +** ) +** ret +*/ +void +fpr_to_gpr (v16qi q0) [[arm::streaming_compatible]] +{ + register v16qi x0 asm ("x0"); + x0 = q0; + asm volatile ("" :: "r" (x0)); +} diff --git a/gcc/testsuite/gcc.target/aarch64/movv8qi_4.c b/gcc/testsuite/gcc.target/aarch64/movv8qi_4.c new file mode 100644 index 00000000000..12ae25a3a4a --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/movv8qi_4.c @@ -0,0 +1,55 @@ +/* { dg-do assemble } */ +/* { dg-options "-O -mtune=neoverse-v1 --save-temps" } */ +/* { dg-final { check-function-bodies "**" "" "" } } */ + +#pragma GCC target "+nosve" + +typedef unsigned char v8qi __attribute__((vector_size(8))); + +/* +** fpr_to_fpr: +** fmov d0, d1 +** ret +*/ +v8qi +fpr_to_fpr (v8qi q0, v8qi q1) [[arm::streaming_compatible]] +{ + return q1; +} + +/* +** gpr_to_fpr: +** fmov d0, x0 +** ret +*/ +v8qi +gpr_to_fpr () [[arm::streaming_compatible]] +{ + register v8qi x0 asm ("x0"); + asm volatile ("" : "=r" (x0)); + return x0; +} + +/* +** zero_to_fpr: +** fmov d0, xzr +** ret +*/ +v8qi +zero_to_fpr () [[arm::streaming_compatible]] +{ + return (v8qi) {}; +} + +/* +** fpr_to_gpr: +** umov x0, v0\.d\[0\] +** ret +*/ +void +fpr_to_gpr (v8qi q0) [[arm::streaming_compatible]] +{ + register v8qi x0 asm ("x0"); + x0 = q0; + asm volatile ("" :: "r" (x0)); +} diff --git a/gcc/testsuite/gcc.target/aarch64/sme/arm_neon_1.c b/gcc/testsuite/gcc.target/aarch64/sme/arm_neon_1.c new file mode 100644 index 00000000000..5b5346cf435 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/sme/arm_neon_1.c @@ -0,0 +1,13 @@ +// { dg-options "" } + +#include + +#pragma GCC target "+nosme" + +// { dg-error {inlining failed.*'vhaddq_s32'} "" { target *-*-* } 0 } + +int32x4_t +foo (int32x4_t x, int32x4_t y) [[arm::streaming_compatible]] +{ + return vhaddq_s32 (x, y); +} diff --git a/gcc/testsuite/gcc.target/aarch64/sme/arm_neon_2.c b/gcc/testsuite/gcc.target/aarch64/sme/arm_neon_2.c new file mode 100644 index 00000000000..2092c4471f0 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/sme/arm_neon_2.c @@ -0,0 +1,11 @@ +// { dg-options "" } + +#include + +// { dg-error {inlining failed.*'vhaddq_s32'} "" { target *-*-* } 0 } + +int32x4_t +foo (int32x4_t x, int32x4_t y) [[arm::streaming_compatible]] +{ + return vhaddq_s32 (x, y); +} diff --git a/gcc/testsuite/gcc.target/aarch64/sme/arm_neon_3.c b/gcc/testsuite/gcc.target/aarch64/sme/arm_neon_3.c new file mode 100644 index 00000000000..36794e5b0df --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/sme/arm_neon_3.c @@ -0,0 +1,11 @@ +// { dg-options "" } + +#include + +// { dg-error {inlining failed.*'vhaddq_s32'} "" { target *-*-* } 0 } + +int32x4_t +foo (int32x4_t x, int32x4_t y) [[arm::streaming]] +{ + return vhaddq_s32 (x, y); +}