From patchwork Sun Nov 13 10:00:25 2022
From: Richard Sandiford
To: gcc-patches@gcc.gnu.org
Subject: [PATCH 03/16] aarch64: Distinguish streaming-compatible AdvSIMD insns
Date: Sun, 13 Nov 2022 10:00:25 +0000

The vast majority of Advanced SIMD instructions are not available in
streaming mode, but some of the load/store/move instructions are.
This patch adds a new target feature macro called TARGET_BASE_SIMD
for this streaming-compatible subset.
The vector-to-vector move instructions are not streaming-compatible,
so we need to use the SVE move instructions where enabled, or fall
back to the nofp16 handling otherwise.

I haven't found a good way of testing the SVE EXT alternative in
aarch64_simd_mov_from_high, but I'd rather provide it than not.

gcc/
	* config/aarch64/aarch64.h (TARGET_BASE_SIMD): New macro.
	(TARGET_SIMD): Require PSTATE.SM to be 0.
	(AARCH64_ISA_SM_OFF): New macro.
	* config/aarch64/aarch64.cc (aarch64_array_mode_supported_p):
	Allow Advanced SIMD structure modes for TARGET_BASE_SIMD.
	(aarch64_print_operand): Support '%Z'.
	(aarch64_secondary_reload): Expect SVE moves to be used for
	Advanced SIMD modes if SVE is enabled and non-streaming
	Advanced SIMD isn't.
	(aarch64_register_move_cost): Likewise.
	(aarch64_simd_container_mode): Extend Advanced SIMD mode
	handling to TARGET_BASE_SIMD.
	(aarch64_expand_cpymem): Expand commentary.
	* config/aarch64/aarch64.md (arches): Add base_simd.
	(arch_enabled): Handle it.
	(*mov_aarch64): Extend UMOV alternative to TARGET_BASE_SIMD.
	(*movti_aarch64): Use an SVE move instruction if non-streaming
	SIMD isn't available.
	(*mov_aarch64): Likewise.
	(load_pair_dw_tftf): Extend to TARGET_BASE_SIMD.
	(store_pair_dw_tftf): Likewise.
	(loadwb_pair_): Likewise.
	(storewb_pair_): Likewise.
	* config/aarch64/aarch64-simd.md (*aarch64_simd_mov):
	Allow UMOV in streaming mode.
	(*aarch64_simd_mov): Use an SVE move instruction if non-streaming
	SIMD isn't available.
	(aarch64_store_lane0): Depend on TARGET_FLOAT rather than
	TARGET_SIMD.
	(aarch64_simd_mov_from_low): Likewise.  Use fmov if Advanced SIMD
	is completely disabled.
	(aarch64_simd_mov_from_high): Use SVE EXT instructions if
	non-streaming SIMD isn't available.

gcc/testsuite/
	* gcc.target/aarch64/movdf_2.c: New test.
	* gcc.target/aarch64/movdi_3.c: Likewise.
	* gcc.target/aarch64/movhf_2.c: Likewise.
	* gcc.target/aarch64/movhi_2.c: Likewise.
	* gcc.target/aarch64/movqi_2.c: Likewise.
	* gcc.target/aarch64/movsf_2.c: Likewise.
	* gcc.target/aarch64/movsi_2.c: Likewise.
	* gcc.target/aarch64/movtf_3.c: Likewise.
	* gcc.target/aarch64/movtf_4.c: Likewise.
	* gcc.target/aarch64/movti_3.c: Likewise.
	* gcc.target/aarch64/movti_4.c: Likewise.
	* gcc.target/aarch64/movv16qi_4.c: Likewise.
	* gcc.target/aarch64/movv16qi_5.c: Likewise.
	* gcc.target/aarch64/movv8qi_4.c: Likewise.
	* gcc.target/aarch64/sme/arm_neon_1.c: Likewise.
	* gcc.target/aarch64/sme/arm_neon_2.c: Likewise.
	* gcc.target/aarch64/sme/arm_neon_3.c: Likewise.
---
 gcc/config/aarch64/aarch64-simd.md            | 43 ++++++----
 gcc/config/aarch64/aarch64.cc                 | 22 +++--
 gcc/config/aarch64/aarch64.h                  | 12 ++-
 gcc/config/aarch64/aarch64.md                 | 45 +++++-----
 gcc/testsuite/gcc.target/aarch64/movdf_2.c    | 51 +++++++++++
 gcc/testsuite/gcc.target/aarch64/movdi_3.c    | 59 +++++++++++++
 gcc/testsuite/gcc.target/aarch64/movhf_2.c    | 53 ++++++++++++
 gcc/testsuite/gcc.target/aarch64/movhi_2.c    | 61 +++++++++++++
 gcc/testsuite/gcc.target/aarch64/movqi_2.c    | 59 +++++++++++++
 gcc/testsuite/gcc.target/aarch64/movsf_2.c    | 51 +++++++++++
 gcc/testsuite/gcc.target/aarch64/movsi_2.c    | 59 +++++++++++++
 gcc/testsuite/gcc.target/aarch64/movtf_3.c    | 81 +++++++++++++++++
 gcc/testsuite/gcc.target/aarch64/movtf_4.c    | 78 +++++++++++++++++
 gcc/testsuite/gcc.target/aarch64/movti_3.c    | 86 +++++++++++++++++++
 gcc/testsuite/gcc.target/aarch64/movti_4.c    | 83 ++++++++++++++++++
 gcc/testsuite/gcc.target/aarch64/movv16qi_4.c | 82 ++++++++++++++++++
 gcc/testsuite/gcc.target/aarch64/movv16qi_5.c | 79 +++++++++++++++++
 gcc/testsuite/gcc.target/aarch64/movv8qi_4.c  | 55 ++++++++++++
 .../gcc.target/aarch64/sme/arm_neon_1.c       | 13 +++
 .../gcc.target/aarch64/sme/arm_neon_2.c       | 11 +++
 .../gcc.target/aarch64/sme/arm_neon_3.c       | 11 +++
 21 files changed, 1047 insertions(+), 47 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/movdf_2.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/movdi_3.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/movhf_2.c
 create mode 100644 
gcc/testsuite/gcc.target/aarch64/movhi_2.c create mode 100644 gcc/testsuite/gcc.target/aarch64/movqi_2.c create mode 100644 gcc/testsuite/gcc.target/aarch64/movsf_2.c create mode 100644 gcc/testsuite/gcc.target/aarch64/movsi_2.c create mode 100644 gcc/testsuite/gcc.target/aarch64/movtf_3.c create mode 100644 gcc/testsuite/gcc.target/aarch64/movtf_4.c create mode 100644 gcc/testsuite/gcc.target/aarch64/movti_3.c create mode 100644 gcc/testsuite/gcc.target/aarch64/movti_4.c create mode 100644 gcc/testsuite/gcc.target/aarch64/movv16qi_4.c create mode 100644 gcc/testsuite/gcc.target/aarch64/movv16qi_5.c create mode 100644 gcc/testsuite/gcc.target/aarch64/movv8qi_4.c create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/arm_neon_1.c create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/arm_neon_2.c create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/arm_neon_3.c diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md index 5386043739a..b6313cba172 100644 --- a/gcc/config/aarch64/aarch64-simd.md +++ b/gcc/config/aarch64/aarch64-simd.md @@ -133,7 +133,7 @@ (define_insn "*aarch64_simd_mov" return "mov\t%0., %1."; return "fmov\t%d0, %d1"; case 4: - if (TARGET_SIMD) + if (TARGET_BASE_SIMD) return "umov\t%0, %1.d[0]"; return "fmov\t%x0, %d1"; case 5: return "fmov\t%d0, %1"; @@ -152,9 +152,9 @@ (define_insn "*aarch64_simd_mov" (define_insn "*aarch64_simd_mov" [(set (match_operand:VQMOV 0 "nonimmediate_operand" - "=w, Umn, m, w, ?r, ?w, ?r, w, w") + "=w, Umn, m, w, w, ?r, ?w, ?r, w, w") (match_operand:VQMOV 1 "general_operand" - "m, Dz, w, w, w, r, r, Dn, Dz"))] + "m, Dz, w, w, w, w, r, r, Dn, Dz"))] "TARGET_FLOAT && (register_operand (operands[0], mode) || aarch64_simd_reg_or_zero (operands[1], mode))" @@ -170,22 +170,24 @@ (define_insn "*aarch64_simd_mov" case 3: return "mov\t%0., %1."; case 4: + return "mov\t%Z0.d, %Z1.d"; case 5: case 6: - return "#"; case 7: - return aarch64_output_simd_mov_immediate (operands[1], 128); + return "#"; 
case 8: + return aarch64_output_simd_mov_immediate (operands[1], 128); + case 9: return "fmov\t%d0, xzr"; default: gcc_unreachable (); } } [(set_attr "type" "neon_load1_1reg, store_16, neon_store1_1reg,\ - neon_logic, multiple, multiple,\ - multiple, neon_move, fmov") - (set_attr "length" "4,4,4,4,8,8,8,4,4") - (set_attr "arch" "*,*,*,simd,*,*,*,simd,*")] + neon_logic, *, multiple, multiple,\ + multiple, neon_move, f_mcr") + (set_attr "length" "4,4,4,4,4,8,8,8,4,4") + (set_attr "arch" "*,*,*,simd,sve,*,*,*,simd,*")] ) ;; When storing lane zero we can use the normal STR and its more permissive @@ -195,7 +197,7 @@ (define_insn "aarch64_store_lane0" [(set (match_operand: 0 "memory_operand" "=m") (vec_select: (match_operand:VALL_F16 1 "register_operand" "w") (parallel [(match_operand 2 "const_int_operand" "n")])))] - "TARGET_SIMD + "TARGET_FLOAT && ENDIAN_LANE_N (, INTVAL (operands[2])) == 0" "str\\t%1, %0" [(set_attr "type" "neon_store1_1reg")] @@ -353,35 +355,38 @@ (define_expand "aarch64_get_high" ) (define_insn_and_split "aarch64_simd_mov_from_low" - [(set (match_operand: 0 "register_operand" "=w,?r") + [(set (match_operand: 0 "register_operand" "=w,?r,?r") (vec_select: - (match_operand:VQMOV_NO2E 1 "register_operand" "w,w") + (match_operand:VQMOV_NO2E 1 "register_operand" "w,w,w") (match_operand:VQMOV_NO2E 2 "vect_par_cnst_lo_half" "")))] - "TARGET_SIMD" + "TARGET_FLOAT" "@ # - umov\t%0, %1.d[0]" + umov\t%0, %1.d[0] + fmov\t%0, %d1" "&& reload_completed && aarch64_simd_register (operands[0], mode)" [(set (match_dup 0) (match_dup 1))] { operands[1] = aarch64_replace_reg_mode (operands[1], mode); } - [(set_attr "type" "mov_reg,neon_to_gp") + [(set_attr "type" "mov_reg,neon_to_gp,f_mrc") + (set_attr "arch" "simd,base_simd,*") (set_attr "length" "4")] ) (define_insn "aarch64_simd_mov_from_high" - [(set (match_operand: 0 "register_operand" "=w,?r,?r") + [(set (match_operand: 0 "register_operand" "=w,w,?r,?r") (vec_select: - (match_operand:VQMOV_NO2E 1 
"register_operand" "w,w,w") + (match_operand:VQMOV_NO2E 1 "register_operand" "w,0,w,w") (match_operand:VQMOV_NO2E 2 "vect_par_cnst_hi_half" "")))] "TARGET_FLOAT" "@ dup\t%d0, %1.d[1] + ext\t%Z0.b, %Z0.b, %Z0.b, #8 umov\t%0, %1.d[1] fmov\t%0, %1.d[1]" - [(set_attr "type" "neon_dup,neon_to_gp,f_mrc") - (set_attr "arch" "simd,simd,*") + [(set_attr "type" "neon_dup,*,neon_to_gp,f_mrc") + (set_attr "arch" "simd,sve,simd,*") (set_attr "length" "4")] ) diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc index fc6f0bc208a..36ef0435b4e 100644 --- a/gcc/config/aarch64/aarch64.cc +++ b/gcc/config/aarch64/aarch64.cc @@ -3726,7 +3726,7 @@ static bool aarch64_array_mode_supported_p (machine_mode mode, unsigned HOST_WIDE_INT nelems) { - if (TARGET_SIMD + if (TARGET_BASE_SIMD && (AARCH64_VALID_SIMD_QREG_MODE (mode) || AARCH64_VALID_SIMD_DREG_MODE (mode)) && (nelems >= 2 && nelems <= 4)) @@ -11876,6 +11876,10 @@ sizetochar (int size) 'N': Take the duplicated element in a vector constant and print the negative of it in decimal. 'b/h/s/d/q': Print a scalar FP/SIMD register name. + 'Z': Same for SVE registers. ('z' was already taken.) + Note that it is not necessary to use %Z for operands + that have SVE modes. The convention is to use %Z + only for non-SVE (or potentially non-SVE) modes. 'S/T/U/V': Print a FP/SIMD register name for a register list. The register printed is the FP/SIMD register name of X + 0/1/2/3 for S/T/U/V. @@ -12048,6 +12052,8 @@ aarch64_print_operand (FILE *f, rtx x, int code) case 's': case 'd': case 'q': + case 'Z': + code = TOLOWER (code); if (!REG_P (x) || !FP_REGNUM_P (REGNO (x))) { output_operand_lossage ("incompatible floating point / vector register operand for '%%%c'", code); @@ -12702,8 +12708,8 @@ aarch64_secondary_reload (bool in_p ATTRIBUTE_UNUSED, rtx x, return NO_REGS; } - /* Without the TARGET_SIMD instructions we cannot move a Q register - to a Q register directly. We need a scratch. 
*/ + /* Without the TARGET_SIMD or TARGET_SVE instructions we cannot move a + Q register to a Q register directly. We need a scratch. */ if (REG_P (x) && (mode == TFmode || mode == TImode @@ -15273,7 +15279,7 @@ aarch64_register_move_cost (machine_mode mode, secondary reload. A general register is used as a scratch to move the upper DI value and the lower DI value is moved directly, hence the cost is the sum of three moves. */ - if (! TARGET_SIMD) + if (!TARGET_SIMD && !TARGET_SVE) return regmove_cost->GP2FP + regmove_cost->FP2GP + regmove_cost->FP2FP; return regmove_cost->FP2FP; @@ -20773,7 +20779,7 @@ aarch64_simd_container_mode (scalar_mode mode, poly_int64 width) return aarch64_full_sve_mode (mode).else_mode (word_mode); gcc_assert (known_eq (width, 64) || known_eq (width, 128)); - if (TARGET_SIMD) + if (TARGET_BASE_SIMD) { if (known_eq (width, 128)) return aarch64_vq_mode (mode).else_mode (word_mode); @@ -24908,7 +24914,11 @@ aarch64_expand_cpymem (rtx *operands) int copy_bits = 256; /* Default to 256-bit LDP/STP on large copies, however small copies, no SIMD - support or slow 256-bit LDP/STP fall back to 128-bit chunks. */ + support or slow 256-bit LDP/STP fall back to 128-bit chunks. + + ??? Although it would be possible to use LDP/STP Qn in streaming mode + (so using TARGET_BASE_SIMD instead of TARGET_SIMD), it isn't clear + whether that would improve performance. */ if (size <= 24 || !TARGET_SIMD || (aarch64_tune_params.extra_tuning_flags diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h index c47f27eefec..398cc03fd1f 100644 --- a/gcc/config/aarch64/aarch64.h +++ b/gcc/config/aarch64/aarch64.h @@ -61,8 +61,15 @@ #define WORDS_BIG_ENDIAN (BYTES_BIG_ENDIAN) /* AdvSIMD is supported in the default configuration, unless disabled by - -mgeneral-regs-only or by the +nosimd extension. */ -#define TARGET_SIMD (AARCH64_ISA_SIMD) + -mgeneral-regs-only or by the +nosimd extension. 
The set of available + instructions is then subdivided into: + + - the "base" set, available both in SME streaming mode and in + non-streaming mode + + - the full set, available only in non-streaming mode. */ +#define TARGET_BASE_SIMD (AARCH64_ISA_SIMD) +#define TARGET_SIMD (AARCH64_ISA_SIMD && AARCH64_ISA_SM_OFF) #define TARGET_FLOAT (AARCH64_ISA_FP) #define UNITS_PER_WORD 8 @@ -199,6 +206,7 @@ constexpr auto AARCH64_FL_DEFAULT_ISA_MODE = AARCH64_FL_SM_OFF; /* Macros to test ISA flags. */ +#define AARCH64_ISA_SM_OFF (aarch64_isa_flags & AARCH64_FL_SM_OFF) #define AARCH64_ISA_MODE (aarch64_isa_flags & AARCH64_FL_ISA_MODES) #define AARCH64_ISA_CRC (aarch64_isa_flags & AARCH64_FL_CRC) #define AARCH64_ISA_CRYPTO (aarch64_isa_flags & AARCH64_FL_CRYPTO) diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md index cd6d5e5000c..3dc877ba9fe 100644 --- a/gcc/config/aarch64/aarch64.md +++ b/gcc/config/aarch64/aarch64.md @@ -374,7 +374,7 @@ (define_constants ;; As a convenience, "fp_q" means "fp" + the ability to move between ;; Q registers and is equivalent to "simd". -(define_enum "arches" [ any rcpc8_4 fp fp_q simd sve fp16]) +(define_enum "arches" [any rcpc8_4 fp fp_q base_simd simd sve fp16]) (define_enum_attr "arch" "arches" (const_string "any")) @@ -402,6 +402,9 @@ (define_attr "arch_enabled" "no,yes" (and (eq_attr "arch" "fp") (match_test "TARGET_FLOAT")) + (and (eq_attr "arch" "base_simd") + (match_test "TARGET_BASE_SIMD")) + (and (eq_attr "arch" "fp_q, simd") (match_test "TARGET_SIMD")) @@ -1215,7 +1218,7 @@ (define_insn "*mov_aarch64" case 8: return "str\t%1, %0"; case 9: - return TARGET_SIMD ? "umov\t%w0, %1.[0]" : "fmov\t%w0, %s1"; + return TARGET_BASE_SIMD ? "umov\t%w0, %1.[0]" : "fmov\t%w0, %s1"; case 10: return TARGET_SIMD ? 
"dup\t%0., %w1" : "fmov\t%s0, %w1"; case 11: @@ -1395,9 +1398,9 @@ (define_expand "movti" (define_insn "*movti_aarch64" [(set (match_operand:TI 0 - "nonimmediate_operand" "= r,w,w,w, r,w,r,m,m,w,m") + "nonimmediate_operand" "= r,w,w,w, r,w,w,r,m,m,w,m") (match_operand:TI 1 - "aarch64_movti_operand" " rUti,Z,Z,r, w,w,m,r,Z,m,w"))] + "aarch64_movti_operand" " rUti,Z,Z,r, w,w,w,m,r,Z,m,w"))] "(register_operand (operands[0], TImode) || aarch64_reg_or_zero (operands[1], TImode))" "@ @@ -1407,16 +1410,17 @@ (define_insn "*movti_aarch64" # # mov\\t%0.16b, %1.16b + mov\\t%Z0.d, %Z1.d ldp\\t%0, %H0, %1 stp\\t%1, %H1, %0 stp\\txzr, xzr, %0 ldr\\t%q0, %1 str\\t%q1, %0" - [(set_attr "type" "multiple,neon_move,f_mcr,f_mcr,f_mrc,neon_logic_q, \ + [(set_attr "type" "multiple,neon_move,f_mcr,f_mcr,f_mrc,neon_logic_q,*,\ load_16,store_16,store_16,\ load_16,store_16") - (set_attr "length" "8,4,4,8,8,4,4,4,4,4,4") - (set_attr "arch" "*,simd,*,*,*,simd,*,*,*,fp,fp")] + (set_attr "length" "8,4,4,8,8,4,4,4,4,4,4,4") + (set_attr "arch" "*,simd,*,*,*,simd,sve,*,*,*,fp,fp")] ) ;; Split a TImode register-register or register-immediate move into @@ -1552,13 +1556,14 @@ (define_split (define_insn "*mov_aarch64" [(set (match_operand:TFD 0 - "nonimmediate_operand" "=w,?r ,w ,?r,w,?w,w,m,?r,m ,m") + "nonimmediate_operand" "=w,w,?r ,w ,?r,w,?w,w,m,?r,m ,m") (match_operand:TFD 1 - "general_operand" " w,?rY,?r,w ,Y,Y ,m,w,m ,?r,Y"))] + "general_operand" " w,w,?rY,?r,w ,Y,Y ,m,w,m ,?r,Y"))] "TARGET_FLOAT && (register_operand (operands[0], mode) || aarch64_reg_or_fp_zero (operands[1], mode))" "@ mov\\t%0.16b, %1.16b + mov\\t%Z0.d, %Z1.d # # # @@ -1569,10 +1574,10 @@ (define_insn "*mov_aarch64" ldp\\t%0, %H0, %1 stp\\t%1, %H1, %0 stp\\txzr, xzr, %0" - [(set_attr "type" "logic_reg,multiple,f_mcr,f_mrc,neon_move_q,f_mcr,\ + [(set_attr "type" "logic_reg,*,multiple,f_mcr,f_mrc,neon_move_q,f_mcr,\ f_loadd,f_stored,load_16,store_16,store_16") - (set_attr "length" "4,8,8,8,4,4,4,4,4,4,4") - (set_attr "arch" 
"simd,*,*,*,simd,*,*,*,*,*,*")] + (set_attr "length" "4,4,8,8,8,4,4,4,4,4,4,4") + (set_attr "arch" "simd,sve,*,*,*,simd,*,*,*,*,*,*")] ) (define_split @@ -1756,7 +1761,7 @@ (define_insn "load_pair_dw_tftf" (match_operand:TF 1 "aarch64_mem_pair_operand" "Ump")) (set (match_operand:TF 2 "register_operand" "=w") (match_operand:TF 3 "memory_operand" "m"))] - "TARGET_SIMD + "TARGET_BASE_SIMD && rtx_equal_p (XEXP (operands[3], 0), plus_constant (Pmode, XEXP (operands[1], 0), @@ -1806,11 +1811,11 @@ (define_insn "store_pair_dw_tftf" (match_operand:TF 1 "register_operand" "w")) (set (match_operand:TF 2 "memory_operand" "=m") (match_operand:TF 3 "register_operand" "w"))] - "TARGET_SIMD && - rtx_equal_p (XEXP (operands[2], 0), - plus_constant (Pmode, - XEXP (operands[0], 0), - GET_MODE_SIZE (TFmode)))" + "TARGET_BASE_SIMD + && rtx_equal_p (XEXP (operands[2], 0), + plus_constant (Pmode, + XEXP (operands[0], 0), + GET_MODE_SIZE (TFmode)))" "stp\\t%q1, %q3, %z0" [(set_attr "type" "neon_stp_q") (set_attr "fp" "yes")] @@ -1858,7 +1863,7 @@ (define_insn "loadwb_pair_" (set (match_operand:TX 3 "register_operand" "=w") (mem:TX (plus:P (match_dup 1) (match_operand:P 5 "const_int_operand" "n"))))])] - "TARGET_SIMD && INTVAL (operands[5]) == GET_MODE_SIZE (mode)" + "TARGET_BASE_SIMD && INTVAL (operands[5]) == GET_MODE_SIZE (mode)" "ldp\\t%q2, %q3, [%1], %4" [(set_attr "type" "neon_ldp_q")] ) @@ -1908,7 +1913,7 @@ (define_insn "storewb_pair_" (set (mem:TX (plus:P (match_dup 0) (match_operand:P 5 "const_int_operand" "n"))) (match_operand:TX 3 "register_operand" "w"))])] - "TARGET_SIMD + "TARGET_BASE_SIMD && INTVAL (operands[5]) == INTVAL (operands[4]) + GET_MODE_SIZE (mode)" "stp\\t%q2, %q3, [%0, %4]!" 
diff --git a/gcc/testsuite/gcc.target/aarch64/movdf_2.c b/gcc/testsuite/gcc.target/aarch64/movdf_2.c new file mode 100644 index 00000000000..c2454d2c83e --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/movdf_2.c @@ -0,0 +1,51 @@ +/* { dg-do assemble } */ +/* { dg-options "-O --save-temps" } */ +/* { dg-final { check-function-bodies "**" "" "" } } */ + +/* +** fpr_to_fpr: +** fmov d0, d1 +** ret +*/ +double __attribute__((arm_streaming_compatible)) +fpr_to_fpr (double q0, double q1) +{ + return q1; +} + +/* +** gpr_to_fpr: +** fmov d0, x0 +** ret +*/ +double __attribute__((arm_streaming_compatible)) +gpr_to_fpr () +{ + register double x0 asm ("x0"); + asm volatile ("" : "=r" (x0)); + return x0; +} + +/* +** zero_to_fpr: +** fmov d0, xzr +** ret +*/ +double __attribute__((arm_streaming_compatible)) +zero_to_fpr () +{ + return 0; +} + +/* +** fpr_to_gpr: +** fmov x0, d0 +** ret +*/ +void __attribute__((arm_streaming_compatible)) +fpr_to_gpr (double q0) +{ + register double x0 asm ("x0"); + x0 = q0; + asm volatile ("" :: "r" (x0)); +} diff --git a/gcc/testsuite/gcc.target/aarch64/movdi_3.c b/gcc/testsuite/gcc.target/aarch64/movdi_3.c new file mode 100644 index 00000000000..5d369b27356 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/movdi_3.c @@ -0,0 +1,59 @@ +/* { dg-do assemble } */ +/* { dg-options "-O --save-temps" } */ +/* { dg-final { check-function-bodies "**" "" "" } } */ + +#include + +/* +** fpr_to_fpr: +** fmov d0, d1 +** ret +*/ +void __attribute__((arm_streaming_compatible)) +fpr_to_fpr (void) +{ + register uint64_t q0 asm ("q0"); + register uint64_t q1 asm ("q1"); + asm volatile ("" : "=w" (q1)); + q0 = q1; + asm volatile ("" :: "w" (q0)); +} + +/* +** gpr_to_fpr: +** fmov d0, x0 +** ret +*/ +void __attribute__((arm_streaming_compatible)) +gpr_to_fpr (uint64_t x0) +{ + register uint64_t q0 asm ("q0"); + q0 = x0; + asm volatile ("" :: "w" (q0)); +} + +/* +** zero_to_fpr: +** fmov d0, xzr +** ret +*/ +void __attribute__((arm_streaming_compatible)) 
+zero_to_fpr () +{ + register uint64_t q0 asm ("q0"); + q0 = 0; + asm volatile ("" :: "w" (q0)); +} + +/* +** fpr_to_gpr: +** fmov x0, d0 +** ret +*/ +uint64_t __attribute__((arm_streaming_compatible)) +fpr_to_gpr () +{ + register uint64_t q0 asm ("q0"); + asm volatile ("" : "=w" (q0)); + return q0; +} diff --git a/gcc/testsuite/gcc.target/aarch64/movhf_2.c b/gcc/testsuite/gcc.target/aarch64/movhf_2.c new file mode 100644 index 00000000000..cf3af357b84 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/movhf_2.c @@ -0,0 +1,53 @@ +/* { dg-do assemble } */ +/* { dg-options "-O --save-temps" } */ +/* { dg-final { check-function-bodies "**" "" "" } } */ + +#pragma GCC target "+nothing+simd" + +/* +** fpr_to_fpr: +** fmov s0, s1 +** ret +*/ +_Float16 __attribute__((arm_streaming_compatible)) +fpr_to_fpr (_Float16 q0, _Float16 q1) +{ + return q1; +} + +/* +** gpr_to_fpr: +** fmov s0, w0 +** ret +*/ +_Float16 __attribute__((arm_streaming_compatible)) +gpr_to_fpr () +{ + register _Float16 w0 asm ("w0"); + asm volatile ("" : "=r" (w0)); + return w0; +} + +/* +** zero_to_fpr: +** fmov s0, wzr +** ret +*/ +_Float16 __attribute__((arm_streaming_compatible)) +zero_to_fpr () +{ + return 0; +} + +/* +** fpr_to_gpr: +** fmov w0, s0 +** ret +*/ +void __attribute__((arm_streaming_compatible)) +fpr_to_gpr (_Float16 q0) +{ + register _Float16 w0 asm ("w0"); + w0 = q0; + asm volatile ("" :: "r" (w0)); +} diff --git a/gcc/testsuite/gcc.target/aarch64/movhi_2.c b/gcc/testsuite/gcc.target/aarch64/movhi_2.c new file mode 100644 index 00000000000..108923449b9 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/movhi_2.c @@ -0,0 +1,61 @@ +/* { dg-do assemble } */ +/* { dg-options "-O --save-temps" } */ +/* { dg-final { check-function-bodies "**" "" "" } } */ + +#pragma GCC target "+nothing+simd" + +#include + +/* +** fpr_to_fpr: +** fmov s0, s1 +** ret +*/ +void __attribute__((arm_streaming_compatible)) +fpr_to_fpr (void) +{ + register uint16_t q0 asm ("q0"); + register uint16_t q1 asm 
("q1"); + asm volatile ("" : "=w" (q1)); + q0 = q1; + asm volatile ("" :: "w" (q0)); +} + +/* +** gpr_to_fpr: +** fmov s0, w0 +** ret +*/ +void __attribute__((arm_streaming_compatible)) +gpr_to_fpr (uint16_t w0) +{ + register uint16_t q0 asm ("q0"); + q0 = w0; + asm volatile ("" :: "w" (q0)); +} + +/* +** zero_to_fpr: +** fmov s0, wzr +** ret +*/ +void __attribute__((arm_streaming_compatible)) +zero_to_fpr () +{ + register uint16_t q0 asm ("q0"); + q0 = 0; + asm volatile ("" :: "w" (q0)); +} + +/* +** fpr_to_gpr: +** umov w0, v0.h\[0\] +** ret +*/ +uint16_t __attribute__((arm_streaming_compatible)) +fpr_to_gpr () +{ + register uint16_t q0 asm ("q0"); + asm volatile ("" : "=w" (q0)); + return q0; +} diff --git a/gcc/testsuite/gcc.target/aarch64/movqi_2.c b/gcc/testsuite/gcc.target/aarch64/movqi_2.c new file mode 100644 index 00000000000..a28547d2ba3 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/movqi_2.c @@ -0,0 +1,59 @@ +/* { dg-do assemble } */ +/* { dg-options "-O --save-temps" } */ +/* { dg-final { check-function-bodies "**" "" "" } } */ + +#include + +/* +** fpr_to_fpr: +** fmov s0, s1 +** ret +*/ +void __attribute__((arm_streaming_compatible)) +fpr_to_fpr (void) +{ + register uint8_t q0 asm ("q0"); + register uint8_t q1 asm ("q1"); + asm volatile ("" : "=w" (q1)); + q0 = q1; + asm volatile ("" :: "w" (q0)); +} + +/* +** gpr_to_fpr: +** fmov s0, w0 +** ret +*/ +void __attribute__((arm_streaming_compatible)) +gpr_to_fpr (uint8_t w0) +{ + register uint8_t q0 asm ("q0"); + q0 = w0; + asm volatile ("" :: "w" (q0)); +} + +/* +** zero_to_fpr: +** fmov s0, wzr +** ret +*/ +void __attribute__((arm_streaming_compatible)) +zero_to_fpr () +{ + register uint8_t q0 asm ("q0"); + q0 = 0; + asm volatile ("" :: "w" (q0)); +} + +/* +** fpr_to_gpr: +** umov w0, v0.b\[0\] +** ret +*/ +uint8_t __attribute__((arm_streaming_compatible)) +fpr_to_gpr () +{ + register uint8_t q0 asm ("q0"); + asm volatile ("" : "=w" (q0)); + return q0; +} diff --git 
a/gcc/testsuite/gcc.target/aarch64/movsf_2.c b/gcc/testsuite/gcc.target/aarch64/movsf_2.c
new file mode 100644
index 00000000000..53abd380510
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/movsf_2.c
@@ -0,0 +1,51 @@
+/* { dg-do assemble } */
+/* { dg-options "-O --save-temps" } */
+/* { dg-final { check-function-bodies "**" "" "" } } */
+
+/*
+** fpr_to_fpr:
+**	fmov	s0, s1
+**	ret
+*/
+float __attribute__((arm_streaming_compatible))
+fpr_to_fpr (float q0, float q1)
+{
+  return q1;
+}
+
+/*
+** gpr_to_fpr:
+**	fmov	s0, w0
+**	ret
+*/
+float __attribute__((arm_streaming_compatible))
+gpr_to_fpr ()
+{
+  register float w0 asm ("w0");
+  asm volatile ("" : "=r" (w0));
+  return w0;
+}
+
+/*
+** zero_to_fpr:
+**	fmov	s0, wzr
+**	ret
+*/
+float __attribute__((arm_streaming_compatible))
+zero_to_fpr ()
+{
+  return 0;
+}
+
+/*
+** fpr_to_gpr:
+**	fmov	w0, s0
+**	ret
+*/
+void __attribute__((arm_streaming_compatible))
+fpr_to_gpr (float q0)
+{
+  register float w0 asm ("w0");
+  w0 = q0;
+  asm volatile ("" :: "r" (w0));
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/movsi_2.c b/gcc/testsuite/gcc.target/aarch64/movsi_2.c
new file mode 100644
index 00000000000..a0159d3fc1e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/movsi_2.c
@@ -0,0 +1,59 @@
+/* { dg-do assemble } */
+/* { dg-options "-O --save-temps" } */
+/* { dg-final { check-function-bodies "**" "" "" } } */
+
+#include <stdint.h>
+
+/*
+** fpr_to_fpr:
+**	fmov	s0, s1
+**	ret
+*/
+void __attribute__((arm_streaming_compatible))
+fpr_to_fpr (void)
+{
+  register uint32_t q0 asm ("q0");
+  register uint32_t q1 asm ("q1");
+  asm volatile ("" : "=w" (q1));
+  q0 = q1;
+  asm volatile ("" :: "w" (q0));
+}
+
+/*
+** gpr_to_fpr:
+**	fmov	s0, w0
+**	ret
+*/
+void __attribute__((arm_streaming_compatible))
+gpr_to_fpr (uint32_t w0)
+{
+  register uint32_t q0 asm ("q0");
+  q0 = w0;
+  asm volatile ("" :: "w" (q0));
+}
+
+/*
+** zero_to_fpr:
+**	fmov	s0, wzr
+**	ret
+*/
+void __attribute__((arm_streaming_compatible))
+zero_to_fpr ()
+{
+  register uint32_t q0 asm ("q0");
+  q0 = 0;
+  asm volatile ("" :: "w" (q0));
+}
+
+/*
+** fpr_to_gpr:
+**	fmov	w0, s0
+**	ret
+*/
+uint32_t __attribute__((arm_streaming_compatible))
+fpr_to_gpr ()
+{
+  register uint32_t q0 asm ("q0");
+  asm volatile ("" : "=w" (q0));
+  return q0;
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/movtf_3.c b/gcc/testsuite/gcc.target/aarch64/movtf_3.c
new file mode 100644
index 00000000000..d38f59e2a1f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/movtf_3.c
@@ -0,0 +1,81 @@
+/* { dg-do assemble } */
+/* { dg-require-effective-target large_long_double } */
+/* { dg-options "-O -mtune=neoverse-v1 --save-temps" } */
+/* { dg-final { check-function-bodies "**" "" "" } } */
+
+#pragma GCC target "+nosve"
+
+/*
+** fpr_to_fpr:
+**	sub	sp, sp, #16
+**	str	q1, \[sp\]
+**	ldr	q0, \[sp\]
+**	add	sp, sp, #?16
+**	ret
+*/
+long double __attribute__((arm_streaming_compatible))
+fpr_to_fpr (long double q0, long double q1)
+{
+  return q1;
+}
+
+/*
+** gpr_to_fpr: { target aarch64_little_endian }
+**	fmov	d0, x0
+**	fmov	v0.d\[1\], x1
+**	ret
+*/
+/*
+** gpr_to_fpr: { target aarch64_big_endian }
+**	fmov	d0, x1
+**	fmov	v0.d\[1\], x0
+**	ret
+*/
+long double __attribute__((arm_streaming_compatible))
+gpr_to_fpr ()
+{
+  register long double x0 asm ("x0");
+  asm volatile ("" : "=r" (x0));
+  return x0;
+}
+
+/*
+** zero_to_fpr:
+**	fmov	s0, wzr
+**	ret
+*/
+long double __attribute__((arm_streaming_compatible))
+zero_to_fpr ()
+{
+  return 0;
+}
+
+/*
+** fpr_to_gpr: { target aarch64_little_endian }
+** (
+**	fmov	x0, d0
+**	fmov	x1, v0.d\[1\]
+** |
+**	fmov	x1, v0.d\[1\]
+**	fmov	x0, d0
+** )
+**	ret
+*/
+/*
+** fpr_to_gpr: { target aarch64_big_endian }
+** (
+**	fmov	x1, d0
+**	fmov	x0, v0.d\[1\]
+** |
+**	fmov	x0, v0.d\[1\]
+**	fmov	x1, d0
+** )
+**	ret
+*/
+void __attribute__((arm_streaming_compatible))
+fpr_to_gpr (long double q0)
+{
+  register long double x0 asm ("x0");
+  x0 = q0;
+  asm volatile ("" :: "r" (x0));
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/movtf_4.c b/gcc/testsuite/gcc.target/aarch64/movtf_4.c
new file mode 100644
index 00000000000..5b7486c7887
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/movtf_4.c
@@ -0,0 +1,78 @@
+/* { dg-do assemble } */
+/* { dg-require-effective-target large_long_double } */
+/* { dg-options "-O -mtune=neoverse-v1 --save-temps" } */
+/* { dg-final { check-function-bodies "**" "" "" } } */
+
+#pragma GCC target "+sve"
+
+/*
+** fpr_to_fpr:
+**	mov	z0.d, z1.d
+**	ret
+*/
+long double __attribute__((arm_streaming_compatible))
+fpr_to_fpr (long double q0, long double q1)
+{
+  return q1;
+}
+
+/*
+** gpr_to_fpr: { target aarch64_little_endian }
+**	fmov	d0, x0
+**	fmov	v0.d\[1\], x1
+**	ret
+*/
+/*
+** gpr_to_fpr: { target aarch64_big_endian }
+**	fmov	d0, x1
+**	fmov	v0.d\[1\], x0
+**	ret
+*/
+long double __attribute__((arm_streaming_compatible))
+gpr_to_fpr ()
+{
+  register long double x0 asm ("x0");
+  asm volatile ("" : "=r" (x0));
+  return x0;
+}
+
+/*
+** zero_to_fpr:
+**	fmov	s0, wzr
+**	ret
+*/
+long double __attribute__((arm_streaming_compatible))
+zero_to_fpr ()
+{
+  return 0;
+}
+
+/*
+** fpr_to_gpr: { target aarch64_little_endian }
+** (
+**	fmov	x0, d0
+**	fmov	x1, v0.d\[1\]
+** |
+**	fmov	x1, v0.d\[1\]
+**	fmov	x0, d0
+** )
+**	ret
+*/
+/*
+** fpr_to_gpr: { target aarch64_big_endian }
+** (
+**	fmov	x1, d0
+**	fmov	x0, v0.d\[1\]
+** |
+**	fmov	x0, v0.d\[1\]
+**	fmov	x1, d0
+** )
+**	ret
+*/
+void __attribute__((arm_streaming_compatible))
+fpr_to_gpr (long double q0)
+{
+  register long double x0 asm ("x0");
+  x0 = q0;
+  asm volatile ("" :: "r" (x0));
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/movti_3.c b/gcc/testsuite/gcc.target/aarch64/movti_3.c
new file mode 100644
index 00000000000..d846b09497e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/movti_3.c
@@ -0,0 +1,86 @@
+/* { dg-do assemble } */
+/* { dg-options "-O -mtune=neoverse-v1 --save-temps" } */
+/* { dg-final { check-function-bodies "**" "" "" } } */
+
+#pragma GCC target "+nosve"
+
+/*
+** fpr_to_fpr:
+**	sub	sp, sp, #16
+**	str	q1, \[sp\]
+**	ldr	q0, \[sp\]
+**	add	sp, sp, #?16
+**	ret
+*/
+void __attribute__((arm_streaming_compatible))
+fpr_to_fpr (void)
+{
+  register __int128_t q0 asm ("q0");
+  register __int128_t q1 asm ("q1");
+  asm volatile ("" : "=w" (q1));
+  q0 = q1;
+  asm volatile ("" :: "w" (q0));
+}
+
+/*
+** gpr_to_fpr: { target aarch64_little_endian }
+**	fmov	d0, x0
+**	fmov	v0.d\[1\], x1
+**	ret
+*/
+/*
+** gpr_to_fpr: { target aarch64_big_endian }
+**	fmov	d0, x1
+**	fmov	v0.d\[1\], x0
+**	ret
+*/
+void __attribute__((arm_streaming_compatible))
+gpr_to_fpr (__int128_t x0)
+{
+  register __int128_t q0 asm ("q0");
+  q0 = x0;
+  asm volatile ("" :: "w" (q0));
+}
+
+/*
+** zero_to_fpr:
+**	fmov	d0, xzr
+**	ret
+*/
+void __attribute__((arm_streaming_compatible))
+zero_to_fpr ()
+{
+  register __int128_t q0 asm ("q0");
+  q0 = 0;
+  asm volatile ("" :: "w" (q0));
+}
+
+/*
+** fpr_to_gpr: { target aarch64_little_endian }
+** (
+**	fmov	x0, d0
+**	fmov	x1, v0.d\[1\]
+** |
+**	fmov	x1, v0.d\[1\]
+**	fmov	x0, d0
+** )
+**	ret
+*/
+/*
+** fpr_to_gpr: { target aarch64_big_endian }
+** (
+**	fmov	x1, d0
+**	fmov	x0, v0.d\[1\]
+** |
+**	fmov	x0, v0.d\[1\]
+**	fmov	x1, d0
+** )
+**	ret
+*/
+__int128_t __attribute__((arm_streaming_compatible))
+fpr_to_gpr ()
+{
+  register __int128_t q0 asm ("q0");
+  asm volatile ("" : "=w" (q0));
+  return q0;
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/movti_4.c b/gcc/testsuite/gcc.target/aarch64/movti_4.c
new file mode 100644
index 00000000000..01e5537e88f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/movti_4.c
@@ -0,0 +1,83 @@
+/* { dg-do assemble } */
+/* { dg-options "-O -mtune=neoverse-v1 --save-temps" } */
+/* { dg-final { check-function-bodies "**" "" "" } } */
+
+#pragma GCC target "+sve"
+
+/*
+** fpr_to_fpr:
+**	mov	z0\.d, z1\.d
+**	ret
+*/
+void __attribute__((arm_streaming_compatible))
+fpr_to_fpr (void)
+{
+  register __int128_t q0 asm ("q0");
+  register __int128_t q1 asm ("q1");
+  asm volatile ("" : "=w" (q1));
+  q0 = q1;
+  asm volatile ("" :: "w" (q0));
+}
+
+/*
+** gpr_to_fpr: { target aarch64_little_endian }
+**	fmov	d0, x0
+**	fmov	v0.d\[1\], x1
+**	ret
+*/
+/*
+** gpr_to_fpr: { target aarch64_big_endian }
+**	fmov	d0, x1
+**	fmov	v0.d\[1\], x0
+**	ret
+*/
+void __attribute__((arm_streaming_compatible))
+gpr_to_fpr (__int128_t x0)
+{
+  register __int128_t q0 asm ("q0");
+  q0 = x0;
+  asm volatile ("" :: "w" (q0));
+}
+
+/*
+** zero_to_fpr:
+**	fmov	d0, xzr
+**	ret
+*/
+void __attribute__((arm_streaming_compatible))
+zero_to_fpr ()
+{
+  register __int128_t q0 asm ("q0");
+  q0 = 0;
+  asm volatile ("" :: "w" (q0));
+}
+
+/*
+** fpr_to_gpr: { target aarch64_little_endian }
+** (
+**	fmov	x0, d0
+**	fmov	x1, v0.d\[1\]
+** |
+**	fmov	x1, v0.d\[1\]
+**	fmov	x0, d0
+** )
+**	ret
+*/
+/*
+** fpr_to_gpr: { target aarch64_big_endian }
+** (
+**	fmov	x1, d0
+**	fmov	x0, v0.d\[1\]
+** |
+**	fmov	x0, v0.d\[1\]
+**	fmov	x1, d0
+** )
+**	ret
+*/
+__int128_t __attribute__((arm_streaming_compatible))
+fpr_to_gpr ()
+{
+  register __int128_t q0 asm ("q0");
+  asm volatile ("" : "=w" (q0));
+  return q0;
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/movv16qi_4.c b/gcc/testsuite/gcc.target/aarch64/movv16qi_4.c
new file mode 100644
index 00000000000..f0f8cb95750
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/movv16qi_4.c
@@ -0,0 +1,82 @@
+/* { dg-do assemble } */
+/* { dg-options "-O -mtune=neoverse-v1 --save-temps" } */
+/* { dg-final { check-function-bodies "**" "" "" } } */
+
+#pragma GCC target "+nosve"
+
+typedef unsigned char v16qi __attribute__((vector_size(16)));
+
+/*
+** fpr_to_fpr:
+**	sub	sp, sp, #16
+**	str	q1, \[sp\]
+**	ldr	q0, \[sp\]
+**	add	sp, sp, #?16
+**	ret
+*/
+v16qi __attribute__((arm_streaming_compatible))
+fpr_to_fpr (v16qi q0, v16qi q1)
+{
+  return q1;
+}
+
+/*
+** gpr_to_fpr: { target aarch64_little_endian }
+**	fmov	d0, x0
+**	fmov	v0.d\[1\], x1
+**	ret
+*/
+/*
+** gpr_to_fpr: { target aarch64_big_endian }
+**	fmov	d0, x1
+**	fmov	v0.d\[1\], x0
+**	ret
+*/
+v16qi __attribute__((arm_streaming_compatible))
+gpr_to_fpr ()
+{
+  register v16qi x0 asm ("x0");
+  asm volatile ("" : "=r" (x0));
+  return x0;
+}
+
+/*
+** zero_to_fpr:
+**	fmov	d0, xzr
+**	ret
+*/
+v16qi __attribute__((arm_streaming_compatible))
+zero_to_fpr ()
+{
+  return (v16qi) {};
+}
+
+/*
+** fpr_to_gpr: { target aarch64_little_endian }
+** (
+**	umov	x0, v0.d\[0\]
+**	fmov	x1, v0.d\[1\]
+** |
+**	fmov	x1, v0.d\[1\]
+**	umov	x0, v0.d\[0\]
+** )
+**	ret
+*/
+/*
+** fpr_to_gpr: { target aarch64_big_endian }
+** (
+**	umov	x1, v0.d\[0\]
+**	fmov	x0, v0.d\[1\]
+** |
+**	fmov	x0, v0.d\[1\]
+**	umov	x1, v0.d\[0\]
+** )
+**	ret
+*/
+void __attribute__((arm_streaming_compatible))
+fpr_to_gpr (v16qi q0)
+{
+  register v16qi x0 asm ("x0");
+  x0 = q0;
+  asm volatile ("" :: "r" (x0));
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/movv16qi_5.c b/gcc/testsuite/gcc.target/aarch64/movv16qi_5.c
new file mode 100644
index 00000000000..db59f01376e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/movv16qi_5.c
@@ -0,0 +1,79 @@
+/* { dg-do assemble } */
+/* { dg-options "-O -mtune=neoverse-v1 --save-temps" } */
+/* { dg-final { check-function-bodies "**" "" "" } } */
+
+#pragma GCC target "+sve"
+
+typedef unsigned char v16qi __attribute__((vector_size(16)));
+
+/*
+** fpr_to_fpr:
+**	mov	z0.d, z1.d
+**	ret
+*/
+v16qi __attribute__((arm_streaming_compatible))
+fpr_to_fpr (v16qi q0, v16qi q1)
+{
+  return q1;
+}
+
+/*
+** gpr_to_fpr: { target aarch64_little_endian }
+**	fmov	d0, x0
+**	fmov	v0.d\[1\], x1
+**	ret
+*/
+/*
+** gpr_to_fpr: { target aarch64_big_endian }
+**	fmov	d0, x1
+**	fmov	v0.d\[1\], x0
+**	ret
+*/
+v16qi __attribute__((arm_streaming_compatible))
+gpr_to_fpr ()
+{
+  register v16qi x0 asm ("x0");
+  asm volatile ("" : "=r" (x0));
+  return x0;
+}
+
+/*
+** zero_to_fpr:
+**	fmov	d0, xzr
+**	ret
+*/
+v16qi __attribute__((arm_streaming_compatible))
+zero_to_fpr ()
+{
+  return (v16qi) {};
+}
+
+/*
+** fpr_to_gpr: { target aarch64_little_endian }
+** (
+**	umov	x0, v0.d\[0\]
+**	fmov	x1, v0.d\[1\]
+** |
+**	fmov	x1, v0.d\[1\]
+**	umov	x0, v0.d\[0\]
+** )
+**	ret
+*/
+/*
+** fpr_to_gpr: { target aarch64_big_endian }
+** (
+**	umov	x1, v0.d\[0\]
+**	fmov	x0, v0.d\[1\]
+** |
+**	fmov	x0, v0.d\[1\]
+**	umov	x1, v0.d\[0\]
+** )
+**	ret
+*/
+void __attribute__((arm_streaming_compatible))
+fpr_to_gpr (v16qi q0)
+{
+  register v16qi x0 asm ("x0");
+  x0 = q0;
+  asm volatile ("" :: "r" (x0));
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/movv8qi_4.c b/gcc/testsuite/gcc.target/aarch64/movv8qi_4.c
new file mode 100644
index 00000000000..49eb2d31910
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/movv8qi_4.c
@@ -0,0 +1,55 @@
+/* { dg-do assemble } */
+/* { dg-options "-O -mtune=neoverse-v1 --save-temps" } */
+/* { dg-final { check-function-bodies "**" "" "" } } */
+
+#pragma GCC target "+nosve"
+
+typedef unsigned char v8qi __attribute__((vector_size(8)));
+
+/*
+** fpr_to_fpr:
+**	fmov	d0, d1
+**	ret
+*/
+v8qi __attribute__((arm_streaming_compatible))
+fpr_to_fpr (v8qi q0, v8qi q1)
+{
+  return q1;
+}
+
+/*
+** gpr_to_fpr:
+**	fmov	d0, x0
+**	ret
+*/
+v8qi __attribute__((arm_streaming_compatible))
+gpr_to_fpr ()
+{
+  register v8qi x0 asm ("x0");
+  asm volatile ("" : "=r" (x0));
+  return x0;
+}
+
+/*
+** zero_to_fpr:
+**	fmov	d0, xzr
+**	ret
+*/
+v8qi __attribute__((arm_streaming_compatible))
+zero_to_fpr ()
+{
+  return (v8qi) {};
+}
+
+/*
+** fpr_to_gpr:
+**	umov	x0, v0\.d\[0\]
+**	ret
+*/
+void __attribute__((arm_streaming_compatible))
+fpr_to_gpr (v8qi q0)
+{
+  register v8qi x0 asm ("x0");
+  x0 = q0;
+  asm volatile ("" :: "r" (x0));
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/sme/arm_neon_1.c b/gcc/testsuite/gcc.target/aarch64/sme/arm_neon_1.c
new file mode 100644
index 00000000000..4a526e7d125
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sme/arm_neon_1.c
@@ -0,0 +1,13 @@
+// { dg-options "" }
+
+#include <arm_neon.h>
+
+#pragma GCC target "+nosme"
+
+// { dg-error {inlining failed.*'vaddq_s32'} "" { target *-*-* } 0 }
+
+int32x4_t __attribute__((arm_streaming_compatible))
+foo (int32x4_t x, int32x4_t y)
+{
+  return vaddq_s32 (x, y);
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/sme/arm_neon_2.c b/gcc/testsuite/gcc.target/aarch64/sme/arm_neon_2.c
new file mode 100644
index 00000000000..e7183caa6f8
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sme/arm_neon_2.c
@@ -0,0 +1,11 @@
+// { dg-options "" }
+
+#include <arm_neon.h>
+
+// { dg-error {inlining failed.*'vaddq_s32'} "" { target *-*-* } 0 }
+
+int32x4_t __attribute__((arm_streaming_compatible))
+foo (int32x4_t x, int32x4_t y)
+{
+  return vaddq_s32 (x, y);
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/sme/arm_neon_3.c b/gcc/testsuite/gcc.target/aarch64/sme/arm_neon_3.c
new file mode 100644
index 00000000000..e11570e41d1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sme/arm_neon_3.c
@@ -0,0 +1,11 @@
+// { dg-options "" }
+
+#include <arm_neon.h>
+
+// { dg-error {inlining failed.*'vaddq_s32'} "" { target *-*-* } 0 }
+
+int32x4_t __attribute__((arm_streaming))
+foo (int32x4_t x, int32x4_t y)
+{
+  return vaddq_s32 (x, y);
+}