From patchwork Tue Oct 11 11:02:03 2022
X-Patchwork-Submitter: Andrew Stubbs
X-Patchwork-Id: 1913
From: Andrew Stubbs
To: gcc-patches@gcc.gnu.org
Subject: [committed 1/6] amdgcn: add multiple vector sizes
Date: Tue, 11 Oct 2022 12:02:03 +0100
Message-ID: <45381d6f9f4e7b5c7b062f5ad8cc9788091c2d07.1665485382.git.ams@codesourcery.com>

The new vector sizes are simulated using implicit masking, but they make
life easier for the autovectorizer and SLP passes.

gcc/ChangeLog:

	* config/gcn/gcn-modes.def (VECTOR_MODE): Add new modes V32QI,
	V32HI, V32SI, V32DI, V32TI, V32HF, V32SF, V32DF, V16QI, V16HI,
	V16SI, V16DI, V16TI, V16HF, V16SF, V16DF, V8QI, V8HI, V8SI, V8DI,
	V8TI, V8HF, V8SF, V8DF, V4QI, V4HI, V4SI, V4DI, V4TI, V4HF, V4SF,
	V4DF, V2QI, V2HI, V2SI, V2DI, V2TI, V2HF, V2SF, V2DF.
	(ADJUST_ALIGNMENT): Likewise.
	* config/gcn/gcn-protos.h (gcn_full_exec): Delete.
	(gcn_full_exec_reg): Delete.
	(gcn_scalar_exec): Delete.
	(gcn_scalar_exec_reg): Delete.
	(vgpr_1reg_mode_p): Use inner mode to identify vector registers.
	(vgpr_2reg_mode_p): Likewise.
	(vgpr_vector_mode_p): Use VECTOR_MODE_P.
	* config/gcn/gcn-valu.md (V_QI, V_HI, V_HF, V_SI, V_SF, V_DI,
	V_DF, V_QIHI, V_1REG, V_INT_1REG, V_INT_1REG_ALT, V_FP_1REG,
	V_2REG, V_noQI, V_noHI, V_INT_noQI, V_INT_noHI, V_ALL, V_ALL_ALT,
	V_INT, V_FP): Add additional vector modes.
	(V64_SI, V64_DI, V64_ALL, V64_FP): New iterators.
	(scalar_mode, SCALAR_MODE, vnsi, VnSI, vndi, VnDI, sdwa): Add
	additional vector mode mappings.
	(mov): Implement vector length conversions.
	(ldexp3): Use VnSI.
	(frexp_exp2): Likewise.
	(VCVT_MODE, VCVT_FMODE, VCVT_IMODE): Add additional vector modes.
	(reduc__scal_): Use V64_ALL.
	(fold_left_plus_): Use V64_FP.
	(*_dpp_shr_): Use V64_1REG.
	(*_dpp_shr_): Use V64_DI.
	(*plus_carry_dpp_shr_): Use V64_INT_1REG.
	(*plus_carry_in_dpp_shr_): Use V64_SI.
	(*plus_carry_dpp_shr_): Use V64_DI.
	(mov_from_lane63_): Use V64_2REG.
	* config/gcn/gcn.cc (VnMODE): New function.
	(gcn_can_change_mode_class): Support multiple vector sizes.
	(gcn_modes_tieable_p): Likewise.
	(gcn_operand_part): Likewise.
	(gcn_scalar_exec): Delete function.
	(gcn_scalar_exec_reg): Delete function.
	(gcn_full_exec): Delete function.
	(gcn_full_exec_reg): Delete function.
	(gcn_inline_fp_constant_p): Support multiple vector sizes.
	(gcn_fp_constant_p): Likewise.
	(A): New macro.
	(GEN_VN_NOEXEC): New macro.
	(GEN_VNM_NOEXEC): New macro.
	(GEN_VN): New macro.
	(GEN_VNM): New macro.
	(GET_VN_FN): New macro.
	(CODE_FOR): New macro.
	(CODE_FOR_OP): New macro.
	(gen_mov_with_exec): Delete function.
	(gen_duplicate_load): Delete function.
	(gcn_expand_vector_init): Support multiple vector sizes.
	(strided_constant): Likewise.
	(gcn_addr_space_legitimize_address): Likewise.
	(gcn_expand_scalar_to_vector_address): Likewise.
	(gcn_expand_scaled_offsets): Likewise.
	(gcn_secondary_reload): Likewise.
	(gcn_valid_cvt_p): Likewise.
	(gcn_expand_builtin_1): Likewise.
	(gcn_make_vec_perm_address): Likewise.
	(gcn_vectorize_vec_perm_const): Likewise.
	(gcn_vector_mode_supported_p): Likewise.
	(gcn_autovectorize_vector_modes): New hook.
	(gcn_related_vector_mode): Support multiple vector sizes.
	(gcn_expand_dpp_shr_insn): Add FIXME comment.
	(gcn_md_reorg): Support multiple vector sizes.
	(print_reg): Likewise.
	(print_operand): Likewise.
	(TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_MODES): New hook.
---
 gcc/config/gcn/gcn-modes.def |  82 ++++
 gcc/config/gcn/gcn-protos.h  |  22 +-
 gcc/config/gcn/gcn-valu.md   | 332 ++++++++++---
 gcc/config/gcn/gcn.cc        | 927 ++++++++++++++++++++++-------------
 4 files changed, 938 insertions(+), 425 deletions(-)

diff --git a/gcc/config/gcn/gcn-modes.def b/gcc/config/gcn/gcn-modes.def index 82585de798b..1b8a3203463 100644 --- a/gcc/config/gcn/gcn-modes.def +++ b/gcc/config/gcn/gcn-modes.def @@ -29,6 +29,48 @@ VECTOR_MODE (FLOAT, HF, 64); /* V64HF */ VECTOR_MODE (FLOAT, SF, 64); /* V64SF */ VECTOR_MODE (FLOAT, DF, 64); /* V64DF */ +/* Artificial vector modes, for when vector masking doesn't work (yet).
*/ +VECTOR_MODE (INT, QI, 32); /* V32QI */ +VECTOR_MODE (INT, HI, 32); /* V32HI */ +VECTOR_MODE (INT, SI, 32); /* V32SI */ +VECTOR_MODE (INT, DI, 32); /* V32DI */ +VECTOR_MODE (INT, TI, 32); /* V32TI */ +VECTOR_MODE (FLOAT, HF, 32); /* V32HF */ +VECTOR_MODE (FLOAT, SF, 32); /* V32SF */ +VECTOR_MODE (FLOAT, DF, 32); /* V32DF */ +VECTOR_MODE (INT, QI, 16); /* V16QI */ +VECTOR_MODE (INT, HI, 16); /* V16HI */ +VECTOR_MODE (INT, SI, 16); /* V16SI */ +VECTOR_MODE (INT, DI, 16); /* V16DI */ +VECTOR_MODE (INT, TI, 16); /* V16TI */ +VECTOR_MODE (FLOAT, HF, 16); /* V16HF */ +VECTOR_MODE (FLOAT, SF, 16); /* V16SF */ +VECTOR_MODE (FLOAT, DF, 16); /* V16DF */ +VECTOR_MODE (INT, QI, 8); /* V8QI */ +VECTOR_MODE (INT, HI, 8); /* V8HI */ +VECTOR_MODE (INT, SI, 8); /* V8SI */ +VECTOR_MODE (INT, DI, 8); /* V8DI */ +VECTOR_MODE (INT, TI, 8); /* V8TI */ +VECTOR_MODE (FLOAT, HF, 8); /* V8HF */ +VECTOR_MODE (FLOAT, SF, 8); /* V8SF */ +VECTOR_MODE (FLOAT, DF, 8); /* V8DF */ +VECTOR_MODE (INT, QI, 4); /* V4QI */ +VECTOR_MODE (INT, HI, 4); /* V4HI */ +VECTOR_MODE (INT, SI, 4); /* V4SI */ +VECTOR_MODE (INT, DI, 4); /* V4DI */ +VECTOR_MODE (INT, TI, 4); /* V4TI */ +VECTOR_MODE (FLOAT, HF, 4); /* V4HF */ +VECTOR_MODE (FLOAT, SF, 4); /* V4SF */ +VECTOR_MODE (FLOAT, DF, 4); /* V4DF */ +VECTOR_MODE (INT, QI, 2); /* V2QI */ +VECTOR_MODE (INT, HI, 2); /* V2HI */ +VECTOR_MODE (INT, SI, 2); /* V2SI */ +VECTOR_MODE (INT, DI, 2); /* V2DI */ +VECTOR_MODE (INT, TI, 2); /* V2TI */ +VECTOR_MODE (FLOAT, HF, 2); /* V2HF */ +VECTOR_MODE (FLOAT, SF, 2); /* V2SF */ +VECTOR_MODE (FLOAT, DF, 2); /* V2DF */ + /* Vector units handle reads independently and thus no large alignment needed. */ ADJUST_ALIGNMENT (V64QI, 1); @@ -39,3 +81,43 @@ ADJUST_ALIGNMENT (V64TI, 16); ADJUST_ALIGNMENT (V64HF, 2); ADJUST_ALIGNMENT (V64SF, 4); ADJUST_ALIGNMENT (V64DF, 8); +ADJUST_ALIGNMENT (V32QI, 1); +ADJUST_ALIGNMENT (V32HI, 2); +ADJUST_ALIGNMENT (V32SI, 4); +ADJUST_ALIGNMENT (V32DI, 8); +ADJUST_ALIGNMENT (V32TI, 16); +ADJUST_ALIGNMENT (V32HF, 2); +ADJUST_ALIGNMENT (V32SF, 4); +ADJUST_ALIGNMENT (V32DF, 8); +ADJUST_ALIGNMENT (V16QI, 1); +ADJUST_ALIGNMENT (V16HI, 2); +ADJUST_ALIGNMENT (V16SI, 4); +ADJUST_ALIGNMENT (V16DI, 8); +ADJUST_ALIGNMENT (V16TI, 16); +ADJUST_ALIGNMENT (V16HF, 2); +ADJUST_ALIGNMENT (V16SF, 4); +ADJUST_ALIGNMENT (V16DF, 8); +ADJUST_ALIGNMENT (V8QI, 1); +ADJUST_ALIGNMENT (V8HI, 2); +ADJUST_ALIGNMENT (V8SI, 4); +ADJUST_ALIGNMENT (V8DI, 8); +ADJUST_ALIGNMENT (V8TI, 16); +ADJUST_ALIGNMENT (V8HF, 2); +ADJUST_ALIGNMENT (V8SF, 4); +ADJUST_ALIGNMENT (V8DF, 8); +ADJUST_ALIGNMENT (V4QI, 1); +ADJUST_ALIGNMENT (V4HI, 2); +ADJUST_ALIGNMENT (V4SI, 4); +ADJUST_ALIGNMENT (V4DI, 8); +ADJUST_ALIGNMENT (V4TI, 16); +ADJUST_ALIGNMENT (V4HF, 2); +ADJUST_ALIGNMENT (V4SF, 4); +ADJUST_ALIGNMENT (V4DF, 8); +ADJUST_ALIGNMENT (V2QI, 1); +ADJUST_ALIGNMENT (V2HI, 2); +ADJUST_ALIGNMENT (V2SI, 4); +ADJUST_ALIGNMENT (V2DI, 8); +ADJUST_ALIGNMENT (V2TI, 16); +ADJUST_ALIGNMENT (V2HF, 2); +ADJUST_ALIGNMENT (V2SF, 4); +ADJUST_ALIGNMENT (V2DF, 8); diff --git a/gcc/config/gcn/gcn-protos.h b/gcc/config/gcn/gcn-protos.h index ca804609c09..6300c1cbd36 100644 --- a/gcc/config/gcn/gcn-protos.h +++ b/gcc/config/gcn/gcn-protos.h @@ -34,8 +34,6 @@ extern rtx gcn_expand_scalar_to_vector_address (machine_mode, rtx, rtx, rtx); extern void gcn_expand_vector_init (rtx, rtx); extern bool gcn_flat_address_p (rtx, machine_mode); extern bool gcn_fp_constant_p (rtx, bool); -extern rtx gcn_full_exec (); -extern rtx gcn_full_exec_reg (); extern rtx gcn_gen_undef (machine_mode); extern bool 
gcn_global_address_p (rtx); extern tree gcn_goacc_adjust_private_decl (location_t, tree var, int level); @@ -67,8 +65,6 @@ extern rtx gcn_operand_part (machine_mode, rtx, int); extern bool gcn_regno_mode_code_ok_for_base_p (int, machine_mode, addr_space_t, int, int); extern reg_class gcn_regno_reg_class (int regno); -extern rtx gcn_scalar_exec (); -extern rtx gcn_scalar_exec_reg (); extern bool gcn_scalar_flat_address_p (rtx); extern bool gcn_scalar_flat_mem_p (rtx); extern bool gcn_sgpr_move_p (rtx, rtx); @@ -105,9 +101,11 @@ extern gimple_opt_pass *make_pass_omp_gcn (gcc::context *ctxt); inline bool vgpr_1reg_mode_p (machine_mode mode) { - return (mode == SImode || mode == SFmode || mode == HImode || mode == QImode - || mode == V64QImode || mode == V64HImode || mode == V64SImode - || mode == V64HFmode || mode == V64SFmode || mode == BImode); + if (VECTOR_MODE_P (mode)) + mode = GET_MODE_INNER (mode); + + return (mode == SImode || mode == SFmode || mode == HImode || mode == HFmode + || mode == QImode || mode == BImode); } /* Return true if MODE is valid for 1 SGPR register. */ @@ -124,8 +122,10 @@ sgpr_1reg_mode_p (machine_mode mode) inline bool vgpr_2reg_mode_p (machine_mode mode) { - return (mode == DImode || mode == DFmode - || mode == V64DImode || mode == V64DFmode); + if (VECTOR_MODE_P (mode)) + mode = GET_MODE_INNER (mode); + + return (mode == DImode || mode == DFmode); } /* Return true if MODE can be handled directly by VGPR operations. */ @@ -133,9 +133,7 @@ vgpr_2reg_mode_p (machine_mode mode) inline bool vgpr_vector_mode_p (machine_mode mode) { - return (mode == V64QImode || mode == V64HImode - || mode == V64SImode || mode == V64DImode - || mode == V64HFmode || mode == V64SFmode || mode == V64DFmode); + return VECTOR_MODE_P (mode); } diff --git a/gcc/config/gcn/gcn-valu.md b/gcc/config/gcn/gcn-valu.md index dec81e863f7..52d2fcb880a 100644 --- a/gcc/config/gcn/gcn-valu.md +++ b/gcc/config/gcn/gcn-valu.md @@ -17,88 +17,243 @@ ;; {{{ Vector iterators ; Vector modes for specific types -; (This will make more sense when there are multiple vector sizes) (define_mode_iterator V_QI - [V64QI]) + [V2QI V4QI V8QI V16QI V32QI V64QI]) (define_mode_iterator V_HI - [V64HI]) + [V2HI V4HI V8HI V16HI V32HI V64HI]) (define_mode_iterator V_HF - [V64HF]) + [V2HF V4HF V8HF V16HF V32HF V64HF]) (define_mode_iterator V_SI - [V64SI]) + [V2SI V4SI V8SI V16SI V32SI V64SI]) (define_mode_iterator V_SF - [V64SF]) + [V2SF V4SF V8SF V16SF V32SF V64SF]) (define_mode_iterator V_DI - [V64DI]) + [V2DI V4DI V8DI V16DI V32DI V64DI]) (define_mode_iterator V_DF - [V64DF]) + [V2DF V4DF V8DF V16DF V32DF V64DF]) + +(define_mode_iterator V64_SI + [V64SI]) +(define_mode_iterator V64_DI + [V64DI]) ; Vector modes for sub-dword modes (define_mode_iterator V_QIHI - [V64QI V64HI]) + [V2QI V2HI + V4QI V4HI + V8QI V8HI + V16QI V16HI + V32QI V32HI + V64QI V64HI]) ; Vector modes for one vector register (define_mode_iterator V_1REG - [V64QI V64HI V64SI V64HF V64SF]) + [V2QI V2HI V2SI V2HF V2SF + V4QI V4HI V4SI V4HF V4SF + V8QI V8HI V8SI V8HF V8SF + V16QI V16HI V16SI V16HF V16SF + V32QI V32HI V32SI V32HF V32SF + V64QI V64HI V64SI V64HF V64SF]) (define_mode_iterator V_INT_1REG - [V64QI V64HI V64SI]) + [V2QI V2HI V2SI + V4QI V4HI V4SI + V8QI V8HI V8SI + V16QI V16HI V16SI + V32QI V32HI V32SI + V64QI V64HI V64SI]) (define_mode_iterator V_INT_1REG_ALT - [V64QI V64HI V64SI]) + [V2QI V2HI V2SI + V4QI V4HI V4SI + V8QI V8HI V8SI + V16QI V16HI V16SI + V32QI V32HI V32SI + V64QI V64HI V64SI]) (define_mode_iterator V_FP_1REG - [V64HF V64SF]) + 
[V2HF V2SF + V4HF V4SF + V8HF V8SF + V16HF V16SF + V32HF V32SF + V64HF V64SF]) + +; V64_* modes are for where more general support is unimplemented +; (e.g. reductions) +(define_mode_iterator V64_1REG + [V64QI V64HI V64SI V64HF V64SF]) +(define_mode_iterator V64_INT_1REG + [V64QI V64HI V64SI]) ; Vector modes for two vector registers (define_mode_iterator V_2REG + [V2DI V2DF + V4DI V4DF + V8DI V8DF + V16DI V16DF + V32DI V32DF + V64DI V64DF]) + +(define_mode_iterator V64_2REG [V64DI V64DF]) ; Vector modes with native support (define_mode_iterator V_noQI - [V64HI V64HF V64SI V64SF V64DI V64DF]) + [V2HI V2HF V2SI V2SF V2DI V2DF + V4HI V4HF V4SI V4SF V4DI V4DF + V8HI V8HF V8SI V8SF V8DI V8DF + V16HI V16HF V16SI V16SF V16DI V16DF + V32HI V32HF V32SI V32SF V32DI V32DF + V64HI V64HF V64SI V64SF V64DI V64DF]) (define_mode_iterator V_noHI - [V64HF V64SI V64SF V64DI V64DF]) + [V2HF V2SI V2SF V2DI V2DF + V4HF V4SI V4SF V4DI V4DF + V8HF V8SI V8SF V8DI V8DF + V16HF V16SI V16SF V16DI V16DF + V32HF V32SI V32SF V32DI V32DF + V64HF V64SI V64SF V64DI V64DF]) (define_mode_iterator V_INT_noQI - [V64HI V64SI V64DI]) + [V2HI V2SI V2DI + V4HI V4SI V4DI + V8HI V8SI V8DI + V16HI V16SI V16DI + V32HI V32SI V32DI + V64HI V64SI V64DI]) (define_mode_iterator V_INT_noHI - [V64SI V64DI]) + [V2SI V2DI + V4SI V4DI + V8SI V8DI + V16SI V16DI + V32SI V32DI + V64SI V64DI]) ; All of above (define_mode_iterator V_ALL - [V64QI V64HI V64HF V64SI V64SF V64DI V64DF]) + [V2QI V2HI V2HF V2SI V2SF V2DI V2DF + V4QI V4HI V4HF V4SI V4SF V4DI V4DF + V8QI V8HI V8HF V8SI V8SF V8DI V8DF + V16QI V16HI V16HF V16SI V16SF V16DI V16DF + V32QI V32HI V32HF V32SI V32SF V32DI V32DF + V64QI V64HI V64HF V64SI V64SF V64DI V64DF]) (define_mode_iterator V_ALL_ALT - [V64QI V64HI V64HF V64SI V64SF V64DI V64DF]) + [V2QI V2HI V2HF V2SI V2SF V2DI V2DF + V4QI V4HI V4HF V4SI V4SF V4DI V4DF + V8QI V8HI V8HF V8SI V8SF V8DI V8DF + V16QI V16HI V16HF V16SI V16SF V16DI V16DF + V32QI V32HI V32HF V32SI V32SF V32DI V32DF + V64QI V64HI V64HF V64SI V64SF V64DI V64DF]) (define_mode_iterator V_INT - [V64QI V64HI V64SI V64DI]) + [V2QI V2HI V2SI V2DI + V4QI V4HI V4SI V4DI + V8QI V8HI V8SI V8DI + V16QI V16HI V16SI V16DI + V32QI V32HI V32SI V32DI + V64QI V64HI V64SI V64DI]) (define_mode_iterator V_FP + [V2HF V2SF V2DF + V4HF V4SF V4DF + V8HF V8SF V8DF + V16HF V16SF V16DF + V32HF V32SF V32DF + V64HF V64SF V64DF]) + +(define_mode_iterator V64_ALL + [V64QI V64HI V64HF V64SI V64SF V64DI V64DF]) +(define_mode_iterator V64_FP [V64HF V64SF V64DF]) (define_mode_attr scalar_mode - [(V64QI "qi") (V64HI "hi") (V64SI "si") + [(V2QI "qi") (V2HI "hi") (V2SI "si") + (V2HF "hf") (V2SF "sf") (V2DI "di") (V2DF "df") + (V4QI "qi") (V4HI "hi") (V4SI "si") + (V4HF "hf") (V4SF "sf") (V4DI "di") (V4DF "df") + (V8QI "qi") (V8HI "hi") (V8SI "si") + (V8HF "hf") (V8SF "sf") (V8DI "di") (V8DF "df") + (V16QI "qi") (V16HI "hi") (V16SI "si") + (V16HF "hf") (V16SF "sf") (V16DI "di") (V16DF "df") + (V32QI "qi") (V32HI "hi") (V32SI "si") + (V32HF "hf") (V32SF "sf") (V32DI "di") (V32DF "df") + (V64QI "qi") (V64HI "hi") (V64SI "si") (V64HF "hf") (V64SF "sf") (V64DI "di") (V64DF "df")]) (define_mode_attr SCALAR_MODE - [(V64QI "QI") (V64HI "HI") (V64SI "SI") + [(V2QI "QI") (V2HI "HI") (V2SI "SI") + (V2HF "HF") (V2SF "SF") (V2DI "DI") (V2DF "DF") + (V4QI "QI") (V4HI "HI") (V4SI "SI") + (V4HF "HF") (V4SF "SF") (V4DI "DI") (V4DF "DF") + (V8QI "QI") (V8HI "HI") (V8SI "SI") + (V8HF "HF") (V8SF "SF") (V8DI "DI") (V8DF "DF") + (V16QI "QI") (V16HI "HI") (V16SI "SI") + (V16HF "HF") (V16SF "SF") (V16DI "DI") (V16DF "DF") + 
(V32QI "QI") (V32HI "HI") (V32SI "SI") + (V32HF "HF") (V32SF "SF") (V32DI "DI") (V32DF "DF") + (V64QI "QI") (V64HI "HI") (V64SI "SI") (V64HF "HF") (V64SF "SF") (V64DI "DI") (V64DF "DF")]) (define_mode_attr vnsi - [(V64QI "v64si") (V64HI "v64si") (V64HF "v64si") (V64SI "v64si") + [(V2QI "v2si") (V2HI "v2si") (V2HF "v2si") (V2SI "v2si") + (V2SF "v2si") (V2DI "v2si") (V2DF "v2si") + (V4QI "v4si") (V4HI "v4si") (V4HF "v4si") (V4SI "v4si") + (V4SF "v4si") (V4DI "v4si") (V4DF "v4si") + (V8QI "v8si") (V8HI "v8si") (V8HF "v8si") (V8SI "v8si") + (V8SF "v8si") (V8DI "v8si") (V8DF "v8si") + (V16QI "v16si") (V16HI "v16si") (V16HF "v16si") (V16SI "v16si") + (V16SF "v16si") (V16DI "v16si") (V16DF "v16si") + (V32QI "v32si") (V32HI "v32si") (V32HF "v32si") (V32SI "v32si") + (V32SF "v32si") (V32DI "v32si") (V32DF "v32si") + (V64QI "v64si") (V64HI "v64si") (V64HF "v64si") (V64SI "v64si") (V64SF "v64si") (V64DI "v64si") (V64DF "v64si")]) (define_mode_attr VnSI - [(V64QI "V64SI") (V64HI "V64SI") (V64HF "V64SI") (V64SI "V64SI") + [(V2QI "V2SI") (V2HI "V2SI") (V2HF "V2SI") (V2SI "V2SI") + (V2SF "V2SI") (V2DI "V2SI") (V2DF "V2SI") + (V4QI "V4SI") (V4HI "V4SI") (V4HF "V4SI") (V4SI "V4SI") + (V4SF "V4SI") (V4DI "V4SI") (V4DF "V4SI") + (V8QI "V8SI") (V8HI "V8SI") (V8HF "V8SI") (V8SI "V8SI") + (V8SF "V8SI") (V8DI "V8SI") (V8DF "V8SI") + (V16QI "V16SI") (V16HI "V16SI") (V16HF "V16SI") (V16SI "V16SI") + (V16SF "V16SI") (V16DI "V16SI") (V16DF "V16SI") + (V32QI "V32SI") (V32HI "V32SI") (V32HF "V32SI") (V32SI "V32SI") + (V32SF "V32SI") (V32DI "V32SI") (V32DF "V32SI") + (V64QI "V64SI") (V64HI "V64SI") (V64HF "V64SI") (V64SI "V64SI") (V64SF "V64SI") (V64DI "V64SI") (V64DF "V64SI")]) (define_mode_attr vndi - [(V64QI "v64di") (V64HI "v64di") (V64HF "v64di") (V64SI "v64di") + [(V2QI "v2di") (V2HI "v2di") (V2HF "v2di") (V2SI "v2di") + (V2SF "v2di") (V2DI "v2di") (V2DF "v2di") + (V4QI "v4di") (V4HI "v4di") (V4HF "v4di") (V4SI "v4di") + (V4SF "v4di") (V4DI "v4di") (V4DF "v4di") + (V8QI "v8di") (V8HI "v8di") (V8HF "v8di") (V8SI "v8di") + (V8SF "v8di") (V8DI "v8di") (V8DF "v8di") + (V16QI "v16di") (V16HI "v16di") (V16HF "v16di") (V16SI "v16di") + (V16SF "v16di") (V16DI "v16di") (V16DF "v16di") + (V32QI "v32di") (V32HI "v32di") (V32HF "v32di") (V32SI "v32di") + (V32SF "v32di") (V32DI "v32di") (V32DF "v32di") + (V64QI "v64di") (V64HI "v64di") (V64HF "v64di") (V64SI "v64di") (V64SF "v64di") (V64DI "v64di") (V64DF "v64di")]) (define_mode_attr VnDI - [(V64QI "V64DI") (V64HI "V64DI") (V64HF "V64DI") (V64SI "V64DI") + [(V2QI "V2DI") (V2HI "V2DI") (V2HF "V2DI") (V2SI "V2DI") + (V2SF "V2DI") (V2DI "V2DI") (V2DF "V2DI") + (V4QI "V4DI") (V4HI "V4DI") (V4HF "V4DI") (V4SI "V4DI") + (V4SF "V4DI") (V4DI "V4DI") (V4DF "V4DI") + (V8QI "V8DI") (V8HI "V8DI") (V8HF "V8DI") (V8SI "V8DI") + (V8SF "V8DI") (V8DI "V8DI") (V8DF "V8DI") + (V16QI "V16DI") (V16HI "V16DI") (V16HF "V16DI") (V16SI "V16DI") + (V16SF "V16DI") (V16DI "V16DI") (V16DF "V16DI") + (V32QI "V32DI") (V32HI "V32DI") (V32HF "V32DI") (V32SI "V32DI") + (V32SF "V32DI") (V32DI "V32DI") (V32DF "V32DI") + (V64QI "V64DI") (V64HI "V64DI") (V64HF "V64DI") (V64SI "V64DI") (V64SF "V64DI") (V64DI "V64DI") (V64DF "V64DI")]) -(define_mode_attr sdwa [(V64QI "BYTE_0") (V64HI "WORD_0") (V64SI "DWORD")]) +(define_mode_attr sdwa + [(V2QI "BYTE_0") (V2HI "WORD_0") (V2SI "DWORD") + (V4QI "BYTE_0") (V4HI "WORD_0") (V4SI "DWORD") + (V8QI "BYTE_0") (V8HI "WORD_0") (V8SI "DWORD") + (V16QI "BYTE_0") (V16HI "WORD_0") (V16SI "DWORD") + (V32QI "BYTE_0") (V32HI "WORD_0") (V32SI "DWORD") + (V64QI "BYTE_0") (V64HI 
"WORD_0") (V64SI "DWORD")]) ;; }}} ;; {{{ Substitutions @@ -180,6 +335,37 @@ (define_expand "mov" (match_operand:V_ALL 1 "general_operand"))] "" { + /* Bitwise reinterpret casts via SUBREG don't work with GCN vector + registers, but we can convert the MEM to a mode that does work. */ + if (MEM_P (operands[0]) && !SUBREG_P (operands[0]) + && SUBREG_P (operands[1]) + && GET_MODE_SIZE (GET_MODE (operands[1])) + == GET_MODE_SIZE (GET_MODE (SUBREG_REG (operands[1])))) + { + rtx src = SUBREG_REG (operands[1]); + rtx mem = copy_rtx (operands[0]); + PUT_MODE_RAW (mem, GET_MODE (src)); + emit_move_insn (mem, src); + DONE; + } + if (MEM_P (operands[1]) && !SUBREG_P (operands[1]) + && SUBREG_P (operands[0]) + && GET_MODE_SIZE (GET_MODE (operands[0])) + == GET_MODE_SIZE (GET_MODE (SUBREG_REG (operands[0])))) + { + rtx dest = SUBREG_REG (operands[0]); + rtx mem = copy_rtx (operands[1]); + PUT_MODE_RAW (mem, GET_MODE (dest)); + emit_move_insn (dest, mem); + DONE; + } + + /* SUBREG of MEM is not supported. */ + gcc_assert ((!SUBREG_P (operands[0]) + || !MEM_P (SUBREG_REG (operands[0]))) + && (!SUBREG_P (operands[1]) + || !MEM_P (SUBREG_REG (operands[1])))); + if (MEM_P (operands[0]) && !lra_in_progress && !reload_completed) { operands[1] = force_reg (mode, operands[1]); @@ -2419,10 +2605,10 @@ (define_insn "ldexp3" (set_attr "length" "8")]) (define_insn "ldexp3" - [(set (match_operand:V_FP 0 "register_operand" "=v") + [(set (match_operand:V_FP 0 "register_operand" "= v") (unspec:V_FP - [(match_operand:V_FP 1 "gcn_alu_operand" "vB") - (match_operand:V64SI 2 "gcn_alu_operand" "vSvA")] + [(match_operand:V_FP 1 "gcn_alu_operand" " vB") + (match_operand: 2 "gcn_alu_operand" "vSvA")] UNSPEC_LDEXP))] "" "v_ldexp%i0\t%0, %1, %2" @@ -2452,8 +2638,8 @@ (define_insn "frexp_mant2" (set_attr "length" "8")]) (define_insn "frexp_exp2" - [(set (match_operand:V64SI 0 "register_operand" "=v") - (unspec:V64SI + [(set (match_operand: 0 "register_operand" "=v") + (unspec: [(match_operand:V_FP 1 "gcn_alu_operand" "vB")] UNSPEC_FREXP_EXP))] "" @@ -2640,9 +2826,27 @@ (define_expand "div3" (define_mode_iterator CVT_FROM_MODE [HI SI HF SF DF]) (define_mode_iterator CVT_TO_MODE [HI SI HF SF DF]) -(define_mode_iterator VCVT_MODE [V64HI V64SI V64HF V64SF V64DF]) -(define_mode_iterator VCVT_FMODE [V64HF V64SF V64DF]) -(define_mode_iterator VCVT_IMODE [V64HI V64SI]) +(define_mode_iterator VCVT_MODE + [V2HI V2SI V2HF V2SF V2DF + V4HI V4SI V4HF V4SF V4DF + V8HI V8SI V8HF V8SF V8DF + V16HI V16SI V16HF V16SF V16DF + V32HI V32SI V32HF V32SF V32DF + V64HI V64SI V64HF V64SF V64DF]) +(define_mode_iterator VCVT_FMODE + [V2HF V2SF V2DF + V4HF V4SF V4DF + V8HF V8SF V8DF + V16HF V16SF V16DF + V32HF V32SF V32DF + V64HF V64SF V64DF]) +(define_mode_iterator VCVT_IMODE + [V2HI V2SI + V4HI V4SI + V8HI V8SI + V16HI V16SI + V32HI V32SI + V64HI V64SI]) (define_code_iterator cvt_op [fix unsigned_fix float unsigned_float @@ -3265,7 +3469,7 @@ (define_int_attr reduc_insn [(UNSPEC_SMIN_DPP_SHR "v_min%i0") (define_expand "reduc__scal_" [(set (match_operand: 0 "register_operand") (unspec: - [(match_operand:V_ALL 1 "register_operand")] + [(match_operand:V64_ALL 1 "register_operand")] REDUC_UNSPEC))] "" { @@ -3284,7 +3488,7 @@ (define_expand "reduc__scal_" (define_expand "fold_left_plus_" [(match_operand: 0 "register_operand") (match_operand: 1 "gcn_alu_operand") - (match_operand:V_FP 2 "gcn_alu_operand")] + (match_operand:V64_FP 2 "gcn_alu_operand")] "can_create_pseudo_p () && (flag_openacc || flag_openmp || flag_associative_math)" @@ -3300,11 +3504,11 @@ 
(define_expand "fold_left_plus_" }) (define_insn "*_dpp_shr_" - [(set (match_operand:V_1REG 0 "register_operand" "=v") - (unspec:V_1REG - [(match_operand:V_1REG 1 "register_operand" "v") - (match_operand:V_1REG 2 "register_operand" "v") - (match_operand:SI 3 "const_int_operand" "n")] + [(set (match_operand:V64_1REG 0 "register_operand" "=v") + (unspec:V64_1REG + [(match_operand:V64_1REG 1 "register_operand" "v") + (match_operand:V64_1REG 2 "register_operand" "v") + (match_operand:SI 3 "const_int_operand" "n")] REDUC_UNSPEC))] ; GCN3 requires a carry out, GCN5 not "!(TARGET_GCN3 && SCALAR_INT_MODE_P (mode) @@ -3317,11 +3521,11 @@ (define_insn "*_dpp_shr_" (set_attr "length" "8")]) (define_insn_and_split "*_dpp_shr_" - [(set (match_operand:V_DI 0 "register_operand" "=v") - (unspec:V_DI - [(match_operand:V_DI 1 "register_operand" "v") - (match_operand:V_DI 2 "register_operand" "v") - (match_operand:SI 3 "const_int_operand" "n")] + [(set (match_operand:V64_DI 0 "register_operand" "=v") + (unspec:V64_DI + [(match_operand:V64_DI 1 "register_operand" "v") + (match_operand:V64_DI 2 "register_operand" "v") + (match_operand:SI 3 "const_int_operand" "n")] REDUC_2REG_UNSPEC))] "" "#" @@ -3346,10 +3550,10 @@ (define_insn_and_split "*_dpp_shr_" ; Special cases for addition. (define_insn "*plus_carry_dpp_shr_" - [(set (match_operand:V_INT_1REG 0 "register_operand" "=v") - (unspec:V_INT_1REG - [(match_operand:V_INT_1REG 1 "register_operand" "v") - (match_operand:V_INT_1REG 2 "register_operand" "v") + [(set (match_operand:V64_INT_1REG 0 "register_operand" "=v") + (unspec:V64_INT_1REG + [(match_operand:V64_INT_1REG 1 "register_operand" "v") + (match_operand:V64_INT_1REG 2 "register_operand" "v") (match_operand:SI 3 "const_int_operand" "n")] UNSPEC_PLUS_CARRY_DPP_SHR)) (clobber (reg:DI VCC_REG))] @@ -3363,12 +3567,12 @@ (define_insn "*plus_carry_dpp_shr_" (set_attr "length" "8")]) (define_insn "*plus_carry_in_dpp_shr_" - [(set (match_operand:V_SI 0 "register_operand" "=v") - (unspec:V_SI - [(match_operand:V_SI 1 "register_operand" "v") - (match_operand:V_SI 2 "register_operand" "v") - (match_operand:SI 3 "const_int_operand" "n") - (match_operand:DI 4 "register_operand" "cV")] + [(set (match_operand:V64_SI 0 "register_operand" "=v") + (unspec:V64_SI + [(match_operand:V64_SI 1 "register_operand" "v") + (match_operand:V64_SI 2 "register_operand" "v") + (match_operand:SI 3 "const_int_operand" "n") + (match_operand:DI 4 "register_operand" "cV")] UNSPEC_PLUS_CARRY_IN_DPP_SHR)) (clobber (reg:DI VCC_REG))] "" @@ -3381,11 +3585,11 @@ (define_insn "*plus_carry_in_dpp_shr_" (set_attr "length" "8")]) (define_insn_and_split "*plus_carry_dpp_shr_" - [(set (match_operand:V_DI 0 "register_operand" "=v") - (unspec:V_DI - [(match_operand:V_DI 1 "register_operand" "v") - (match_operand:V_DI 2 "register_operand" "v") - (match_operand:SI 3 "const_int_operand" "n")] + [(set (match_operand:V64_DI 0 "register_operand" "=v") + (unspec:V64_DI + [(match_operand:V64_DI 1 "register_operand" "v") + (match_operand:V64_DI 2 "register_operand" "v") + (match_operand:SI 3 "const_int_operand" "n")] UNSPEC_PLUS_CARRY_DPP_SHR)) (clobber (reg:DI VCC_REG))] "" @@ -3416,7 +3620,7 @@ (define_insn_and_split "*plus_carry_dpp_shr_" (define_insn "mov_from_lane63_" [(set (match_operand: 0 "register_operand" "=Sg,v") (unspec: - [(match_operand:V_1REG 1 "register_operand" " v,v")] + [(match_operand:V64_1REG 1 "register_operand" " v,v")] UNSPEC_MOV_FROM_LANE63))] "" "@ @@ -3429,7 +3633,7 @@ (define_insn "mov_from_lane63_" (define_insn "mov_from_lane63_" 
[(set (match_operand: 0 "register_operand" "=Sg,v") (unspec: - [(match_operand:V_2REG 1 "register_operand" " v,v")] + [(match_operand:V64_2REG 1 "register_operand" " v,v")] UNSPEC_MOV_FROM_LANE63))] "" "@ diff --git a/gcc/config/gcn/gcn.cc b/gcc/config/gcn/gcn.cc index c27ee91210e..e1636f6ddd6 100644 --- a/gcc/config/gcn/gcn.cc +++ b/gcc/config/gcn/gcn.cc @@ -395,6 +395,97 @@ gcn_scalar_mode_supported_p (scalar_mode mode) || mode == TImode); } +/* Return a vector mode with N lanes of MODE. */ + +static machine_mode +VnMODE (int n, machine_mode mode) +{ + switch (mode) + { + case QImode: + switch (n) + { + case 2: return V2QImode; + case 4: return V4QImode; + case 8: return V8QImode; + case 16: return V16QImode; + case 32: return V32QImode; + case 64: return V64QImode; + } + break; + case HImode: + switch (n) + { + case 2: return V2HImode; + case 4: return V4HImode; + case 8: return V8HImode; + case 16: return V16HImode; + case 32: return V32HImode; + case 64: return V64HImode; + } + break; + case HFmode: + switch (n) + { + case 2: return V2HFmode; + case 4: return V4HFmode; + case 8: return V8HFmode; + case 16: return V16HFmode; + case 32: return V32HFmode; + case 64: return V64HFmode; + } + break; + case SImode: + switch (n) + { + case 2: return V2SImode; + case 4: return V4SImode; + case 8: return V8SImode; + case 16: return V16SImode; + case 32: return V32SImode; + case 64: return V64SImode; + } + break; + case SFmode: + switch (n) + { + case 2: return V2SFmode; + case 4: return V4SFmode; + case 8: return V8SFmode; + case 16: return V16SFmode; + case 32: return V32SFmode; + case 64: return V64SFmode; + } + break; + case DImode: + switch (n) + { + case 2: return V2DImode; + case 4: return V4DImode; + case 8: return V8DImode; + case 16: return V16DImode; + case 32: return V32DImode; + case 64: return V64DImode; + } + break; + case DFmode: + switch (n) + { + case 2: return V2DFmode; + case 4: return V4DFmode; + case 8: return V8DFmode; + case 16: return V16DFmode; + case 32: return V32DFmode; + case 64: return V64DFmode; + } + break; + default: + break; + } + + return VOIDmode; +} + /* Implement TARGET_CLASS_MAX_NREGS. Return the number of hard registers needed to hold a value of MODE in @@ -556,6 +647,23 @@ gcn_can_change_mode_class (machine_mode from, machine_mode to, { if (!vgpr_vector_mode_p (from) && !vgpr_vector_mode_p (to)) return true; + + /* Vector conversions are only valid when changing mode with a fixed number + of lanes, or changing number of lanes with a fixed mode. Anything else + would require actual data movement. */ + if (VECTOR_MODE_P (from) && VECTOR_MODE_P (to) + && GET_MODE_NUNITS (from) != GET_MODE_NUNITS (to) + && GET_MODE_INNER (from) != GET_MODE_INNER (to)) + return false; + + /* Vector/scalar conversions are only permitted when the scalar mode + is the same or smaller than the inner vector mode. */ + if ((VECTOR_MODE_P (from) && !VECTOR_MODE_P (to) + && GET_MODE_SIZE (to) >= GET_MODE_SIZE (GET_MODE_INNER (from))) + || (VECTOR_MODE_P (to) && !VECTOR_MODE_P (from) + && GET_MODE_SIZE (from) >= GET_MODE_SIZE (GET_MODE_INNER (to)))) + return false; + return (gcn_class_max_nregs (regclass, from) == gcn_class_max_nregs (regclass, to)); } @@ -595,6 +703,16 @@ gcn_class_likely_spilled_p (reg_class_t rclass) bool gcn_modes_tieable_p (machine_mode mode1, machine_mode mode2) { + if (VECTOR_MODE_P (mode1) || VECTOR_MODE_P (mode2)) + { + int vf1 = (VECTOR_MODE_P (mode1) ? GET_MODE_NUNITS (mode1) : 1); + int vf2 = (VECTOR_MODE_P (mode2) ? 
GET_MODE_NUNITS (mode2) : 1); + machine_mode inner1 = (vf1 > 1 ? GET_MODE_INNER (mode1) : mode1); + machine_mode inner2 = (vf2 > 1 ? GET_MODE_INNER (mode2) : mode2); + + return (vf1 == vf2 || (inner1 == inner2 && vf2 <= vf1)); + } + return (GET_MODE_BITSIZE (mode1) <= MAX_FIXED_MODE_SIZE && GET_MODE_BITSIZE (mode2) <= MAX_FIXED_MODE_SIZE); } @@ -616,14 +734,16 @@ gcn_truly_noop_truncation (poly_uint64 outprec, poly_uint64 inprec) rtx gcn_operand_part (machine_mode mode, rtx op, int n) { - if (GET_MODE_SIZE (mode) >= 256) + int vf = VECTOR_MODE_P (mode) ? GET_MODE_NUNITS (mode) : 1; + + if (vf > 1) { - /*gcc_assert (GET_MODE_SIZE (mode) == 256 || n == 0); */ + machine_mode vsimode = VnMODE (vf, SImode); if (REG_P (op)) { gcc_assert (REGNO (op) + n < FIRST_PSEUDO_REGISTER); - return gen_rtx_REG (V64SImode, REGNO (op) + n); + return gen_rtx_REG (vsimode, REGNO (op) + n); } if (GET_CODE (op) == CONST_VECTOR) { @@ -634,10 +754,10 @@ gcn_operand_part (machine_mode mode, rtx op, int n) RTVEC_ELT (v, i) = gcn_operand_part (GET_MODE_INNER (mode), CONST_VECTOR_ELT (op, i), n); - return gen_rtx_CONST_VECTOR (V64SImode, v); + return gen_rtx_CONST_VECTOR (vsimode, v); } if (GET_CODE (op) == UNSPEC && XINT (op, 1) == UNSPEC_VECTOR) - return gcn_gen_undef (V64SImode); + return gcn_gen_undef (vsimode); gcc_unreachable (); } else if (GET_MODE_SIZE (mode) == 8 && REG_P (op)) @@ -734,38 +854,6 @@ get_exec (int64_t val) return reg; } -/* Return value of scalar exec register. */ - -rtx -gcn_scalar_exec () -{ - return const1_rtx; -} - -/* Return pseudo holding scalar exec register. */ - -rtx -gcn_scalar_exec_reg () -{ - return get_exec (1); -} - -/* Return value of full exec register. */ - -rtx -gcn_full_exec () -{ - return constm1_rtx; -} - -/* Return pseudo holding full exec register. */ - -rtx -gcn_full_exec_reg () -{ - return get_exec (-1); -} - /* }}} */ /* {{{ Immediate constants. */ @@ -802,8 +890,13 @@ int gcn_inline_fp_constant_p (rtx x, bool allow_vector) { machine_mode mode = GET_MODE (x); + int vf = VECTOR_MODE_P (mode) ? GET_MODE_NUNITS (mode) : 1; - if ((mode == V64HFmode || mode == V64SFmode || mode == V64DFmode) + if (vf > 1) + mode = GET_MODE_INNER (mode); + + if (vf > 1 + && (mode == HFmode || mode == SFmode || mode == DFmode) && allow_vector) { int n; @@ -812,7 +905,7 @@ gcn_inline_fp_constant_p (rtx x, bool allow_vector) n = gcn_inline_fp_constant_p (CONST_VECTOR_ELT (x, 0), false); if (!n) return 0; - for (int i = 1; i < 64; i++) + for (int i = 1; i < vf; i++) if (CONST_VECTOR_ELT (x, i) != CONST_VECTOR_ELT (x, 0)) return 0; return 1; @@ -867,8 +960,13 @@ bool gcn_fp_constant_p (rtx x, bool allow_vector) { machine_mode mode = GET_MODE (x); + int vf = VECTOR_MODE_P (mode) ? GET_MODE_NUNITS (mode) : 1; - if ((mode == V64HFmode || mode == V64SFmode || mode == V64DFmode) + if (vf > 1) + mode = GET_MODE_INNER (mode); + + if (vf > 1 + && (mode == HFmode || mode == SFmode || mode == DFmode) && allow_vector) { int n; @@ -877,7 +975,7 @@ gcn_fp_constant_p (rtx x, bool allow_vector) n = gcn_fp_constant_p (CONST_VECTOR_ELT (x, 0), false); if (!n) return false; - for (int i = 1; i < 64; i++) + for (int i = 1; i < vf; i++) if (CONST_VECTOR_ELT (x, i) != CONST_VECTOR_ELT (x, 0)) return false; return true; @@ -1090,6 +1188,244 @@ gcn_gen_undef (machine_mode mode) return gen_rtx_UNSPEC (mode, gen_rtvec (1, const0_rtx), UNSPEC_VECTOR); } +/* }}} */ +/* {{{ Utility functions. */ + +/* Generalised accessor functions for instruction patterns. 
+ The machine desription '@' prefix does something similar, but as of + GCC 10 is incompatible with define_subst, and anyway it doesn't + auto-handle the exec feature. + + Four macros are provided; each function only needs one: + + GEN_VN - create accessor functions for all sizes of one mode + GEN_VNM - create accessor functions for all sizes of all modes + GEN_VN_NOEXEC - for insns without "_exec" variants + GEN_VNM_NOEXEC - likewise + + E.g. add3 + GEN_VNM (add, 3, A(rtx dest, rtx s1, rtx s2), A(dest, s1, s2) + + gen_addvNsi3 (dst, a, b) + -> calls gen_addv64si3, or gen_addv32si3, etc. + + gen_addvNm3 (dst, a, b) + -> calls gen_addv64qi3, or gen_addv2di3, etc. + + The mode is determined from the first parameter, which must be called + "dest" (or else the macro doesn't work). + + Each function has two optional parameters at the end: merge_src and exec. + If exec is non-null, the function will call the "_exec" variant of the + insn. If exec is non-null but merge_src is null then an undef unspec + will be created. + + E.g. cont. + gen_addvNsi3 (v64sidst, a, b, oldval, exec) + -> calls gen_addv64si3_exec (v64sidst, a, b, oldval, exec) + + gen_addvNm3 (v2qidst, a, b, NULL, exec) + -> calls gen_addv2qi3_exec (v2qidst, a, b, + gcn_gen_undef (V2QImode), exec) + */ + +#define A(...) __VA_ARGS__ +#define GEN_VN_NOEXEC(PREFIX, SUFFIX, PARAMS, ARGS) \ +static rtx \ +gen_##PREFIX##vN##SUFFIX (PARAMS) \ +{ \ + machine_mode mode = GET_MODE (dest); \ + int n = GET_MODE_NUNITS (mode); \ + \ + switch (n) \ + { \ + case 2: return gen_##PREFIX##v2##SUFFIX (ARGS); \ + case 4: return gen_##PREFIX##v4##SUFFIX (ARGS); \ + case 8: return gen_##PREFIX##v8##SUFFIX (ARGS); \ + case 16: return gen_##PREFIX##v16##SUFFIX (ARGS); \ + case 32: return gen_##PREFIX##v32##SUFFIX (ARGS); \ + case 64: return gen_##PREFIX##v64##SUFFIX (ARGS); \ + } \ + \ + gcc_unreachable (); \ + return NULL_RTX; \ +} + +#define GEN_VNM_NOEXEC(PREFIX, SUFFIX, PARAMS, ARGS) \ +GEN_VN_NOEXEC (PREFIX, qi##SUFFIX, A(PARAMS), A(ARGS)) \ +GEN_VN_NOEXEC (PREFIX, hi##SUFFIX, A(PARAMS), A(ARGS)) \ +GEN_VN_NOEXEC (PREFIX, hf##SUFFIX, A(PARAMS), A(ARGS)) \ +GEN_VN_NOEXEC (PREFIX, si##SUFFIX, A(PARAMS), A(ARGS)) \ +GEN_VN_NOEXEC (PREFIX, sf##SUFFIX, A(PARAMS), A(ARGS)) \ +GEN_VN_NOEXEC (PREFIX, di##SUFFIX, A(PARAMS), A(ARGS)) \ +GEN_VN_NOEXEC (PREFIX, df##SUFFIX, A(PARAMS), A(ARGS)) \ +static rtx \ +gen_##PREFIX##vNm##SUFFIX (PARAMS) \ +{ \ + machine_mode mode = GET_MODE_INNER (GET_MODE (dest)); \ + \ + switch (mode) \ + { \ + case E_QImode: return gen_##PREFIX##vNqi##SUFFIX (ARGS); \ + case E_HImode: return gen_##PREFIX##vNhi##SUFFIX (ARGS); \ + case E_HFmode: return gen_##PREFIX##vNhf##SUFFIX (ARGS); \ + case E_SImode: return gen_##PREFIX##vNsi##SUFFIX (ARGS); \ + case E_SFmode: return gen_##PREFIX##vNsf##SUFFIX (ARGS); \ + case E_DImode: return gen_##PREFIX##vNdi##SUFFIX (ARGS); \ + case E_DFmode: return gen_##PREFIX##vNdf##SUFFIX (ARGS); \ + default: \ + break; \ + } \ + \ + gcc_unreachable (); \ + return NULL_RTX; \ +} + +#define GEN_VN(PREFIX, SUFFIX, PARAMS, ARGS) \ +static rtx \ +gen_##PREFIX##vN##SUFFIX (PARAMS, rtx merge_src=NULL, rtx exec=NULL) \ +{ \ + machine_mode mode = GET_MODE (dest); \ + int n = GET_MODE_NUNITS (mode); \ + \ + if (exec && !merge_src) \ + merge_src = gcn_gen_undef (mode); \ + \ + if (exec) \ + switch (n) \ + { \ + case 2: return gen_##PREFIX##v2##SUFFIX##_exec (ARGS, merge_src, exec); \ + case 4: return gen_##PREFIX##v4##SUFFIX##_exec (ARGS, merge_src, exec); \ + case 8: return gen_##PREFIX##v8##SUFFIX##_exec (ARGS, 
merge_src, exec); \ + case 16: return gen_##PREFIX##v16##SUFFIX##_exec (ARGS, merge_src, exec); \ + case 32: return gen_##PREFIX##v32##SUFFIX##_exec (ARGS, merge_src, exec); \ + case 64: return gen_##PREFIX##v64##SUFFIX##_exec (ARGS, merge_src, exec); \ + } \ + else \ + switch (n) \ + { \ + case 2: return gen_##PREFIX##v2##SUFFIX (ARGS); \ + case 4: return gen_##PREFIX##v4##SUFFIX (ARGS); \ + case 8: return gen_##PREFIX##v8##SUFFIX (ARGS); \ + case 16: return gen_##PREFIX##v16##SUFFIX (ARGS); \ + case 32: return gen_##PREFIX##v32##SUFFIX (ARGS); \ + case 64: return gen_##PREFIX##v64##SUFFIX (ARGS); \ + } \ + \ + gcc_unreachable (); \ + return NULL_RTX; \ +} + +#define GEN_VNM(PREFIX, SUFFIX, PARAMS, ARGS) \ +GEN_VN (PREFIX, qi##SUFFIX, A(PARAMS), A(ARGS)) \ +GEN_VN (PREFIX, hi##SUFFIX, A(PARAMS), A(ARGS)) \ +GEN_VN (PREFIX, hf##SUFFIX, A(PARAMS), A(ARGS)) \ +GEN_VN (PREFIX, si##SUFFIX, A(PARAMS), A(ARGS)) \ +GEN_VN (PREFIX, sf##SUFFIX, A(PARAMS), A(ARGS)) \ +GEN_VN (PREFIX, di##SUFFIX, A(PARAMS), A(ARGS)) \ +GEN_VN (PREFIX, df##SUFFIX, A(PARAMS), A(ARGS)) \ +static rtx \ +gen_##PREFIX##vNm##SUFFIX (PARAMS, rtx merge_src=NULL, rtx exec=NULL) \ +{ \ + machine_mode mode = GET_MODE_INNER (GET_MODE (dest)); \ + \ + switch (mode) \ + { \ + case E_QImode: return gen_##PREFIX##vNqi##SUFFIX (ARGS, merge_src, exec); \ + case E_HImode: return gen_##PREFIX##vNhi##SUFFIX (ARGS, merge_src, exec); \ + case E_HFmode: return gen_##PREFIX##vNhf##SUFFIX (ARGS, merge_src, exec); \ + case E_SImode: return gen_##PREFIX##vNsi##SUFFIX (ARGS, merge_src, exec); \ + case E_SFmode: return gen_##PREFIX##vNsf##SUFFIX (ARGS, merge_src, exec); \ + case E_DImode: return gen_##PREFIX##vNdi##SUFFIX (ARGS, merge_src, exec); \ + case E_DFmode: return gen_##PREFIX##vNdf##SUFFIX (ARGS, merge_src, exec); \ + default: \ + break; \ + } \ + \ + gcc_unreachable (); \ + return NULL_RTX; \ +} + +GEN_VNM (add,3, A(rtx dest, rtx src1, rtx src2), A(dest, src1, src2)) +GEN_VN (add,si3_dup, A(rtx dest, rtx src1, rtx src2), A(dest, src1, src2)) +GEN_VN (add,si3_vcc_dup, A(rtx dest, rtx src1, rtx src2, rtx vcc), + A(dest, src1, src2, vcc)) +GEN_VN (add,di3_sext_dup2, A(rtx dest, rtx src1, rtx src2), A(dest, src1, src2)) +GEN_VN (add,di3_vcc_zext_dup, A(rtx dest, rtx src1, rtx src2, rtx vcc), + A(dest, src1, src2, vcc)) +GEN_VN (add,di3_zext_dup2, A(rtx dest, rtx src1, rtx src2), A(dest, src1, src2)) +GEN_VN (add,di3_vcc_zext_dup2, A(rtx dest, rtx src1, rtx src2, rtx vcc), + A(dest, src1, src2, vcc)) +GEN_VN (addc,si3, A(rtx dest, rtx src1, rtx src2, rtx vccout, rtx vccin), + A(dest, src1, src2, vccout, vccin)) +GEN_VN (ashl,si3, A(rtx dest, rtx src, rtx shift), A(dest, src, shift)) +GEN_VNM_NOEXEC (ds_bpermute,, A(rtx dest, rtx addr, rtx src, rtx exec), + A(dest, addr, src, exec)) +GEN_VNM (mov,, A(rtx dest, rtx src), A(dest, src)) +GEN_VN (mul,si3_dup, A(rtx dest, rtx src1, rtx src2), A(dest, src1, src2)) +GEN_VNM (vec_duplicate,, A(rtx dest, rtx src), A(dest, src)) + +#undef GEN_VNM +#undef GEN_VN +#undef GET_VN_FN +#undef A + +/* Get icode for vector instructions without an optab. 
*/ + +#define CODE_FOR(PREFIX, SUFFIX) \ +static int \ +get_code_for_##PREFIX##vN##SUFFIX (int nunits) \ +{ \ + switch (nunits) \ + { \ + case 2: return CODE_FOR_##PREFIX##v2##SUFFIX; \ + case 4: return CODE_FOR_##PREFIX##v4##SUFFIX; \ + case 8: return CODE_FOR_##PREFIX##v8##SUFFIX; \ + case 16: return CODE_FOR_##PREFIX##v16##SUFFIX; \ + case 32: return CODE_FOR_##PREFIX##v32##SUFFIX; \ + case 64: return CODE_FOR_##PREFIX##v64##SUFFIX; \ + } \ + \ + gcc_unreachable (); \ + return CODE_FOR_nothing; \ +} + +#define CODE_FOR_OP(PREFIX) \ + CODE_FOR (PREFIX, qi) \ + CODE_FOR (PREFIX, hi) \ + CODE_FOR (PREFIX, hf) \ + CODE_FOR (PREFIX, si) \ + CODE_FOR (PREFIX, sf) \ + CODE_FOR (PREFIX, di) \ + CODE_FOR (PREFIX, df) \ +static int \ +get_code_for_##PREFIX (machine_mode mode) \ +{ \ + int vf = GET_MODE_NUNITS (mode); \ + machine_mode smode = GET_MODE_INNER (mode); \ + \ + switch (smode) \ + { \ + case E_QImode: return get_code_for_##PREFIX##vNqi (vf); \ + case E_HImode: return get_code_for_##PREFIX##vNhi (vf); \ + case E_HFmode: return get_code_for_##PREFIX##vNhf (vf); \ + case E_SImode: return get_code_for_##PREFIX##vNsi (vf); \ + case E_SFmode: return get_code_for_##PREFIX##vNsf (vf); \ + case E_DImode: return get_code_for_##PREFIX##vNdi (vf); \ + case E_DFmode: return get_code_for_##PREFIX##vNdf (vf); \ + default: break; \ + } \ + \ + gcc_unreachable (); \ + return CODE_FOR_nothing; \ +} + +CODE_FOR_OP (reload_in) +CODE_FOR_OP (reload_out) + +#undef CODE_FOR_OP +#undef CODE_FOR + /* }}} */ /* {{{ Addresses, pointers and moves. */ @@ -1644,60 +1980,6 @@ regno_ok_for_index_p (int regno) return regno == M0_REG || VGPR_REGNO_P (regno); } -/* Generate move which uses the exec flags. If EXEC is NULL, then it is - assumed that all lanes normally relevant to the mode of the move are - affected. If PREV is NULL, then a sensible default is supplied for - the inactive lanes. */ - -static rtx -gen_mov_with_exec (rtx op0, rtx op1, rtx exec = NULL, rtx prev = NULL) -{ - machine_mode mode = GET_MODE (op0); - - if (vgpr_vector_mode_p (mode)) - { - if (exec && exec != CONSTM1_RTX (DImode)) - { - if (!prev) - prev = op0; - } - else - { - if (!prev) - prev = gcn_gen_undef (mode); - exec = gcn_full_exec_reg (); - } - - rtx set = gen_rtx_SET (op0, gen_rtx_VEC_MERGE (mode, op1, prev, exec)); - - return gen_rtx_PARALLEL (VOIDmode, - gen_rtvec (2, set, - gen_rtx_CLOBBER (VOIDmode, - gen_rtx_SCRATCH (V64DImode)))); - } - - return (gen_rtx_PARALLEL - (VOIDmode, - gen_rtvec (2, gen_rtx_SET (op0, op1), - gen_rtx_USE (VOIDmode, - exec ? exec : gcn_scalar_exec ())))); -} - -/* Generate masked move. */ - -static rtx -gen_duplicate_load (rtx op0, rtx op1, rtx op2 = NULL, rtx exec = NULL) -{ - if (exec) - return (gen_rtx_SET (op0, - gen_rtx_VEC_MERGE (GET_MODE (op0), - gen_rtx_VEC_DUPLICATE (GET_MODE - (op0), op1), - op2, exec))); - else - return (gen_rtx_SET (op0, gen_rtx_VEC_DUPLICATE (GET_MODE (op0), op1))); -} - /* Expand vector init of OP0 by VEC. Implements vec_init instruction pattern. 
*/ @@ -1707,10 +1989,11 @@ gcn_expand_vector_init (rtx op0, rtx vec) int64_t initialized_mask = 0; int64_t curr_mask = 1; machine_mode mode = GET_MODE (op0); + int vf = GET_MODE_NUNITS (mode); rtx val = XVECEXP (vec, 0, 0); - for (int i = 1; i < 64; i++) + for (int i = 1; i < vf; i++) if (rtx_equal_p (val, XVECEXP (vec, 0, i))) curr_mask |= (int64_t) 1 << i; @@ -1719,26 +2002,26 @@ gcn_expand_vector_init (rtx op0, rtx vec) else { val = force_reg (GET_MODE_INNER (mode), val); - emit_insn (gen_duplicate_load (op0, val)); + emit_insn (gen_vec_duplicatevNm (op0, val)); } initialized_mask |= curr_mask; - for (int i = 1; i < 64; i++) + for (int i = 1; i < vf; i++) if (!(initialized_mask & ((int64_t) 1 << i))) { curr_mask = (int64_t) 1 << i; rtx val = XVECEXP (vec, 0, i); - for (int j = i + 1; j < 64; j++) + for (int j = i + 1; j < vf; j++) if (rtx_equal_p (val, XVECEXP (vec, 0, j))) curr_mask |= (int64_t) 1 << j; if (gcn_constant_p (val)) - emit_insn (gen_mov_with_exec (op0, gcn_vec_constant (mode, val), - get_exec (curr_mask))); + emit_insn (gen_movvNm (op0, gcn_vec_constant (mode, val), op0, + get_exec (curr_mask))); else { val = force_reg (GET_MODE_INNER (mode), val); - emit_insn (gen_duplicate_load (op0, val, op0, - get_exec (curr_mask))); + emit_insn (gen_vec_duplicatevNm (op0, val, op0, + get_exec (curr_mask))); } initialized_mask |= curr_mask; } @@ -1751,18 +2034,18 @@ strided_constant (machine_mode mode, int base, int val) { rtx x = gen_reg_rtx (mode); emit_move_insn (x, gcn_vec_constant (mode, base)); - emit_insn (gen_addv64si3_exec (x, x, gcn_vec_constant (mode, val * 32), - x, get_exec (0xffffffff00000000))); - emit_insn (gen_addv64si3_exec (x, x, gcn_vec_constant (mode, val * 16), - x, get_exec (0xffff0000ffff0000))); - emit_insn (gen_addv64si3_exec (x, x, gcn_vec_constant (mode, val * 8), - x, get_exec (0xff00ff00ff00ff00))); - emit_insn (gen_addv64si3_exec (x, x, gcn_vec_constant (mode, val * 4), - x, get_exec (0xf0f0f0f0f0f0f0f0))); - emit_insn (gen_addv64si3_exec (x, x, gcn_vec_constant (mode, val * 2), - x, get_exec (0xcccccccccccccccc))); - emit_insn (gen_addv64si3_exec (x, x, gcn_vec_constant (mode, val * 1), - x, get_exec (0xaaaaaaaaaaaaaaaa))); + emit_insn (gen_addvNm3 (x, x, gcn_vec_constant (mode, val * 32), + x, get_exec (0xffffffff00000000))); + emit_insn (gen_addvNm3 (x, x, gcn_vec_constant (mode, val * 16), + x, get_exec (0xffff0000ffff0000))); + emit_insn (gen_addvNm3 (x, x, gcn_vec_constant (mode, val * 8), + x, get_exec (0xff00ff00ff00ff00))); + emit_insn (gen_addvNm3 (x, x, gcn_vec_constant (mode, val * 4), + x, get_exec (0xf0f0f0f0f0f0f0f0))); + emit_insn (gen_addvNm3 (x, x, gcn_vec_constant (mode, val * 2), + x, get_exec (0xcccccccccccccccc))); + emit_insn (gen_addvNm3 (x, x, gcn_vec_constant (mode, val * 1), + x, get_exec (0xaaaaaaaaaaaaaaaa))); return x; } @@ -1792,15 +2075,17 @@ gcn_addr_space_legitimize_address (rtx x, rtx old, machine_mode mode, case ADDR_SPACE_LDS: case ADDR_SPACE_GDS: /* FIXME: LDS support offsets, handle them!. 
*/ - if (vgpr_vector_mode_p (mode) && GET_MODE (x) != V64SImode) + if (vgpr_vector_mode_p (mode) + && GET_MODE_INNER (GET_MODE (x)) != SImode) { - rtx addrs = gen_reg_rtx (V64SImode); + machine_mode simode = VnMODE (GET_MODE_NUNITS (mode), SImode); + rtx addrs = gen_reg_rtx (simode); rtx base = force_reg (SImode, x); - rtx offsets = strided_constant (V64SImode, 0, + rtx offsets = strided_constant (simode, 0, GET_MODE_UNIT_SIZE (mode)); - emit_insn (gen_vec_duplicatev64si (addrs, base)); - emit_insn (gen_addv64si3 (addrs, offsets, addrs)); + emit_insn (gen_vec_duplicatevNsi (addrs, base)); + emit_insn (gen_addvNsi3 (addrs, offsets, addrs)); return addrs; } return x; @@ -1808,16 +2093,18 @@ gcn_addr_space_legitimize_address (rtx x, rtx old, machine_mode mode, gcc_unreachable (); } -/* Convert a (mem: (reg:DI)) to (mem: (reg:V64DI)) with the +/* Convert a (mem: (reg:DI)) to (mem: (reg:VnDI)) with the proper vector of stepped addresses. MEM will be a DImode address of a vector in an SGPR. - TMP will be a V64DImode VGPR pair or (scratch:V64DI). */ + TMP will be a VnDImode VGPR pair or (scratch:VnDI). */ rtx gcn_expand_scalar_to_vector_address (machine_mode mode, rtx exec, rtx mem, rtx tmp) { + machine_mode pmode = VnMODE (GET_MODE_NUNITS (mode), DImode); + machine_mode offmode = VnMODE (GET_MODE_NUNITS (mode), SImode); gcc_assert (MEM_P (mem)); rtx mem_base = XEXP (mem, 0); rtx mem_index = NULL_RTX; @@ -1841,22 +2128,18 @@ gcn_expand_scalar_to_vector_address (machine_mode mode, rtx exec, rtx mem, machine_mode inner = GET_MODE_INNER (mode); int shift = exact_log2 (GET_MODE_SIZE (inner)); - rtx ramp = gen_rtx_REG (V64SImode, VGPR_REGNO (1)); - rtx undef_v64si = gcn_gen_undef (V64SImode); + rtx ramp = gen_rtx_REG (offmode, VGPR_REGNO (1)); rtx new_base = NULL_RTX; addr_space_t as = MEM_ADDR_SPACE (mem); rtx tmplo = (REG_P (tmp) - ? gcn_operand_part (V64DImode, tmp, 0) - : gen_reg_rtx (V64SImode)); + ? 
gcn_operand_part (pmode, tmp, 0) + : gen_reg_rtx (offmode)); /* tmplo[:] = ramp[:] << shift */ - if (exec) - emit_insn (gen_ashlv64si3_exec (tmplo, ramp, - gen_int_mode (shift, SImode), - undef_v64si, exec)); - else - emit_insn (gen_ashlv64si3 (tmplo, ramp, gen_int_mode (shift, SImode))); + emit_insn (gen_ashlvNsi3 (tmplo, ramp, + gen_int_mode (shift, SImode), + NULL, exec)); if (AS_FLAT_P (as)) { @@ -1866,53 +2149,41 @@ gcn_expand_scalar_to_vector_address (machine_mode mode, rtx exec, rtx mem, { rtx mem_base_lo = gcn_operand_part (DImode, mem_base, 0); rtx mem_base_hi = gcn_operand_part (DImode, mem_base, 1); - rtx tmphi = gcn_operand_part (V64DImode, tmp, 1); + rtx tmphi = gcn_operand_part (pmode, tmp, 1); /* tmphi[:] = mem_base_hi */ - if (exec) - emit_insn (gen_vec_duplicatev64si_exec (tmphi, mem_base_hi, - undef_v64si, exec)); - else - emit_insn (gen_vec_duplicatev64si (tmphi, mem_base_hi)); + emit_insn (gen_vec_duplicatevNsi (tmphi, mem_base_hi, NULL, exec)); /* tmp[:] += zext (mem_base) */ if (exec) { - emit_insn (gen_addv64si3_vcc_dup_exec (tmplo, mem_base_lo, tmplo, - vcc, undef_v64si, exec)); - emit_insn (gen_addcv64si3_exec (tmphi, tmphi, const0_rtx, - vcc, vcc, undef_v64si, exec)); + emit_insn (gen_addvNsi3_vcc_dup (tmplo, mem_base_lo, tmplo, + vcc, NULL, exec)); + emit_insn (gen_addcvNsi3 (tmphi, tmphi, const0_rtx, + vcc, vcc, NULL, exec)); } else - emit_insn (gen_addv64di3_vcc_zext_dup (tmp, mem_base_lo, tmp, vcc)); + emit_insn (gen_addvNdi3_vcc_zext_dup (tmp, mem_base_lo, tmp, vcc)); } else { - tmp = gen_reg_rtx (V64DImode); - if (exec) - emit_insn (gen_addv64di3_vcc_zext_dup2_exec - (tmp, tmplo, mem_base, vcc, gcn_gen_undef (V64DImode), - exec)); - else - emit_insn (gen_addv64di3_vcc_zext_dup2 (tmp, tmplo, mem_base, vcc)); + tmp = gen_reg_rtx (pmode); + emit_insn (gen_addvNdi3_vcc_zext_dup2 (tmp, tmplo, mem_base, vcc, + NULL, exec)); } new_base = tmp; } else if (AS_ANY_DS_P (as)) { - if (!exec) - emit_insn (gen_addv64si3_dup (tmplo, tmplo, mem_base)); - else - emit_insn (gen_addv64si3_dup_exec (tmplo, tmplo, mem_base, - gcn_gen_undef (V64SImode), exec)); + emit_insn (gen_addvNsi3_dup (tmplo, tmplo, mem_base, NULL, exec)); new_base = tmplo; } else { - mem_base = gen_rtx_VEC_DUPLICATE (V64DImode, mem_base); - new_base = gen_rtx_PLUS (V64DImode, mem_base, - gen_rtx_SIGN_EXTEND (V64DImode, tmplo)); + mem_base = gen_rtx_VEC_DUPLICATE (pmode, mem_base); + new_base = gen_rtx_PLUS (pmode, mem_base, + gen_rtx_SIGN_EXTEND (pmode, tmplo)); } return gen_rtx_PLUS (GET_MODE (new_base), new_base, @@ -1929,42 +2200,33 @@ gcn_expand_scalar_to_vector_address (machine_mode mode, rtx exec, rtx mem, If EXEC is set then _exec patterns will be used, otherwise plain. Return values. - ADDR_SPACE_FLAT - return V64DImode vector of absolute addresses. - ADDR_SPACE_GLOBAL - return V64SImode vector of offsets. */ + ADDR_SPACE_FLAT - return VnDImode vector of absolute addresses. + ADDR_SPACE_GLOBAL - return VnSImode vector of offsets. */ rtx gcn_expand_scaled_offsets (addr_space_t as, rtx base, rtx offsets, rtx scale, bool unsigned_p, rtx exec) { - rtx tmpsi = gen_reg_rtx (V64SImode); - rtx tmpdi = gen_reg_rtx (V64DImode); - rtx undefsi = exec ? gcn_gen_undef (V64SImode) : NULL; - rtx undefdi = exec ? 
gcn_gen_undef (V64DImode) : NULL; + int vf = GET_MODE_NUNITS (GET_MODE (offsets)); + rtx tmpsi = gen_reg_rtx (VnMODE (vf, SImode)); + rtx tmpdi = gen_reg_rtx (VnMODE (vf, DImode)); if (CONST_INT_P (scale) && INTVAL (scale) > 0 && exact_log2 (INTVAL (scale)) >= 0) - emit_insn (gen_ashlv64si3 (tmpsi, offsets, - GEN_INT (exact_log2 (INTVAL (scale))))); + emit_insn (gen_ashlvNsi3 (tmpsi, offsets, + GEN_INT (exact_log2 (INTVAL (scale))), + NULL, exec)); else - (exec - ? emit_insn (gen_mulv64si3_dup_exec (tmpsi, offsets, scale, undefsi, - exec)) - : emit_insn (gen_mulv64si3_dup (tmpsi, offsets, scale))); + emit_insn (gen_mulvNsi3_dup (tmpsi, offsets, scale, NULL, exec)); /* "Global" instructions do not support negative register offsets. */ if (as == ADDR_SPACE_FLAT || !unsigned_p) { if (unsigned_p) - (exec - ? emit_insn (gen_addv64di3_zext_dup2_exec (tmpdi, tmpsi, base, - undefdi, exec)) - : emit_insn (gen_addv64di3_zext_dup2 (tmpdi, tmpsi, base))); + emit_insn (gen_addvNdi3_zext_dup2 (tmpdi, tmpsi, base, NULL, exec)); else - (exec - ? emit_insn (gen_addv64di3_sext_dup2_exec (tmpdi, tmpsi, base, - undefdi, exec)) - : emit_insn (gen_addv64di3_sext_dup2 (tmpdi, tmpsi, base))); + emit_insn (gen_addvNdi3_sext_dup2 (tmpdi, tmpsi, base, NULL, exec)); return tmpdi; } else if (as == ADDR_SPACE_GLOBAL) @@ -2065,59 +2327,9 @@ gcn_secondary_reload (bool in_p, rtx x, reg_class_t rclass, || GET_MODE_CLASS (reload_mode) == MODE_VECTOR_FLOAT) { if (in_p) - switch (reload_mode) - { - case E_V64SImode: - sri->icode = CODE_FOR_reload_inv64si; - break; - case E_V64SFmode: - sri->icode = CODE_FOR_reload_inv64sf; - break; - case E_V64HImode: - sri->icode = CODE_FOR_reload_inv64hi; - break; - case E_V64HFmode: - sri->icode = CODE_FOR_reload_inv64hf; - break; - case E_V64QImode: - sri->icode = CODE_FOR_reload_inv64qi; - break; - case E_V64DImode: - sri->icode = CODE_FOR_reload_inv64di; - break; - case E_V64DFmode: - sri->icode = CODE_FOR_reload_inv64df; - break; - default: - gcc_unreachable (); - } + sri->icode = get_code_for_reload_in (reload_mode); else - switch (reload_mode) - { - case E_V64SImode: - sri->icode = CODE_FOR_reload_outv64si; - break; - case E_V64SFmode: - sri->icode = CODE_FOR_reload_outv64sf; - break; - case E_V64HImode: - sri->icode = CODE_FOR_reload_outv64hi; - break; - case E_V64HFmode: - sri->icode = CODE_FOR_reload_outv64hf; - break; - case E_V64QImode: - sri->icode = CODE_FOR_reload_outv64qi; - break; - case E_V64DImode: - sri->icode = CODE_FOR_reload_outv64di; - break; - case E_V64DFmode: - sri->icode = CODE_FOR_reload_outv64df; - break; - default: - gcc_unreachable (); - } + sri->icode = get_code_for_reload_out (reload_mode); break; } /* Fallthrough. */ @@ -3428,6 +3640,9 @@ gcn_valid_cvt_p (machine_mode from, machine_mode to, enum gcn_cvt_t op) if (VECTOR_MODE_P (from)) { + if (GET_MODE_NUNITS (from) != GET_MODE_NUNITS (to)) + return false; + from = GET_MODE_INNER (from); to = GET_MODE_INNER (to); } @@ -3926,7 +4141,7 @@ gcn_expand_builtin_1 (tree exp, rtx target, rtx /*subtarget */ , rtx mem = gen_rtx_MEM (GET_MODE (target), addrs); /*set_mem_addr_space (mem, ADDR_SPACE_FLAT); */ /* FIXME: set attributes. */ - emit_insn (gen_mov_with_exec (target, mem, exec)); + emit_insn (gen_movvNm (target, mem, NULL, exec)); return target; } case GCN_BUILTIN_FLAT_STORE_PTR_INT32: @@ -3961,20 +4176,18 @@ gcn_expand_builtin_1 (tree exp, rtx target, rtx /*subtarget */ , rtx mem = gen_rtx_MEM (vmode, addrs); /*set_mem_addr_space (mem, ADDR_SPACE_FLAT); */ /* FIXME: set attributes. 
*/ - emit_insn (gen_mov_with_exec (mem, val, exec)); + emit_insn (gen_movvNm (mem, val, NULL, exec)); return target; } case GCN_BUILTIN_SQRTVF: { if (ignore) return target; - rtx exec = gcn_full_exec_reg (); rtx arg = force_reg (V64SFmode, expand_expr (CALL_EXPR_ARG (exp, 0), NULL_RTX, V64SFmode, EXPAND_NORMAL)); - emit_insn (gen_sqrtv64sf2_exec - (target, arg, gcn_gen_undef (V64SFmode), exec)); + emit_insn (gen_sqrtv64sf2 (target, arg)); return target; } case GCN_BUILTIN_SQRTF: @@ -3992,20 +4205,17 @@ gcn_expand_builtin_1 (tree exp, rtx target, rtx /*subtarget */ , { if (ignore) return target; - rtx exec = gcn_full_exec_reg (); rtx arg = force_reg (V64SFmode, expand_expr (CALL_EXPR_ARG (exp, 0), NULL_RTX, V64SFmode, EXPAND_NORMAL)); - emit_insn (gen_absv64sf2_exec - (target, arg, gcn_gen_undef (V64SFmode), exec)); + emit_insn (gen_absv64sf2 (target, arg)); return target; } case GCN_BUILTIN_LDEXPVF: { if (ignore) return target; - rtx exec = gcn_full_exec_reg (); rtx arg1 = force_reg (V64SFmode, expand_expr (CALL_EXPR_ARG (exp, 0), NULL_RTX, V64SFmode, @@ -4014,15 +4224,13 @@ gcn_expand_builtin_1 (tree exp, rtx target, rtx /*subtarget */ , expand_expr (CALL_EXPR_ARG (exp, 1), NULL_RTX, V64SImode, EXPAND_NORMAL)); - emit_insn (gen_ldexpv64sf3_exec - (target, arg1, arg2, gcn_gen_undef (V64SFmode), exec)); + emit_insn (gen_ldexpv64sf3 (target, arg1, arg2)); return target; } case GCN_BUILTIN_LDEXPV: { if (ignore) return target; - rtx exec = gcn_full_exec_reg (); rtx arg1 = force_reg (V64DFmode, expand_expr (CALL_EXPR_ARG (exp, 0), NULL_RTX, V64SFmode, @@ -4031,60 +4239,51 @@ gcn_expand_builtin_1 (tree exp, rtx target, rtx /*subtarget */ , expand_expr (CALL_EXPR_ARG (exp, 1), NULL_RTX, V64SImode, EXPAND_NORMAL)); - emit_insn (gen_ldexpv64df3_exec - (target, arg1, arg2, gcn_gen_undef (V64DFmode), exec)); + emit_insn (gen_ldexpv64df3 (target, arg1, arg2)); return target; } case GCN_BUILTIN_FREXPVF_EXP: { if (ignore) return target; - rtx exec = gcn_full_exec_reg (); rtx arg = force_reg (V64SFmode, expand_expr (CALL_EXPR_ARG (exp, 0), NULL_RTX, V64SFmode, EXPAND_NORMAL)); - emit_insn (gen_frexpv64sf_exp2_exec - (target, arg, gcn_gen_undef (V64SImode), exec)); + emit_insn (gen_frexpv64sf_exp2 (target, arg)); return target; } case GCN_BUILTIN_FREXPVF_MANT: { if (ignore) return target; - rtx exec = gcn_full_exec_reg (); rtx arg = force_reg (V64SFmode, expand_expr (CALL_EXPR_ARG (exp, 0), NULL_RTX, V64SFmode, EXPAND_NORMAL)); - emit_insn (gen_frexpv64sf_mant2_exec - (target, arg, gcn_gen_undef (V64SFmode), exec)); + emit_insn (gen_frexpv64sf_mant2 (target, arg)); return target; } case GCN_BUILTIN_FREXPV_EXP: { if (ignore) return target; - rtx exec = gcn_full_exec_reg (); rtx arg = force_reg (V64DFmode, expand_expr (CALL_EXPR_ARG (exp, 0), NULL_RTX, V64DFmode, EXPAND_NORMAL)); - emit_insn (gen_frexpv64df_exp2_exec - (target, arg, gcn_gen_undef (V64SImode), exec)); + emit_insn (gen_frexpv64df_exp2 (target, arg)); return target; } case GCN_BUILTIN_FREXPV_MANT: { if (ignore) return target; - rtx exec = gcn_full_exec_reg (); rtx arg = force_reg (V64DFmode, expand_expr (CALL_EXPR_ARG (exp, 0), NULL_RTX, V64DFmode, EXPAND_NORMAL)); - emit_insn (gen_frexpv64df_mant2_exec - (target, arg, gcn_gen_undef (V64DFmode), exec)); + emit_insn (gen_frexpv64df_mant2 (target, arg)); return target; } case GCN_BUILTIN_OMP_DIM_SIZE: @@ -4239,10 +4438,11 @@ gcn_vectorize_get_mask_mode (machine_mode) Helper function for gcn_vectorize_vec_perm_const. 
*/ static rtx -gcn_make_vec_perm_address (unsigned int *perm) +gcn_make_vec_perm_address (unsigned int *perm, int nelt) { - rtx x = gen_reg_rtx (V64SImode); - emit_move_insn (x, gcn_vec_constant (V64SImode, 0)); + machine_mode mode = VnMODE (nelt, SImode); + rtx x = gen_reg_rtx (mode); + emit_move_insn (x, gcn_vec_constant (mode, 0)); /* Permutation addresses use byte addressing. With each vector lane being 4 bytes wide, and with 64 lanes in total, only bits 2..7 are significant, @@ -4258,15 +4458,13 @@ gcn_make_vec_perm_address (unsigned int *perm) { uint64_t exec_mask = 0; uint64_t lane_mask = 1; - for (int j = 0; j < 64; j++, lane_mask <<= 1) - if ((perm[j] * 4) & bit_mask) + for (int j = 0; j < nelt; j++, lane_mask <<= 1) + if (((perm[j] % nelt) * 4) & bit_mask) exec_mask |= lane_mask; if (exec_mask) - emit_insn (gen_addv64si3_exec (x, x, - gcn_vec_constant (V64SImode, - bit_mask), - x, get_exec (exec_mask))); + emit_insn (gen_addvNsi3 (x, x, gcn_vec_constant (mode, bit_mask), + x, get_exec (exec_mask))); } return x; @@ -4336,39 +4534,11 @@ gcn_vectorize_vec_perm_const (machine_mode vmode, machine_mode op_mode, src1_lanes |= lane_bit; } - rtx addr = gcn_make_vec_perm_address (perm); - rtx (*ds_bpermute) (rtx, rtx, rtx, rtx); - - switch (vmode) - { - case E_V64QImode: - ds_bpermute = gen_ds_bpermutev64qi; - break; - case E_V64HImode: - ds_bpermute = gen_ds_bpermutev64hi; - break; - case E_V64SImode: - ds_bpermute = gen_ds_bpermutev64si; - break; - case E_V64HFmode: - ds_bpermute = gen_ds_bpermutev64hf; - break; - case E_V64SFmode: - ds_bpermute = gen_ds_bpermutev64sf; - break; - case E_V64DImode: - ds_bpermute = gen_ds_bpermutev64di; - break; - case E_V64DFmode: - ds_bpermute = gen_ds_bpermutev64df; - break; - default: - gcc_assert (false); - } + rtx addr = gcn_make_vec_perm_address (perm, nelt); /* Load elements from src0 to dst. */ - gcc_assert (~src1_lanes); - emit_insn (ds_bpermute (dst, addr, src0, gcn_full_exec_reg ())); + gcc_assert ((~src1_lanes) & (0xffffffffffffffffUL > (64-nelt))); + emit_insn (gen_ds_bpermutevNm (dst, addr, src0, get_exec (vmode))); /* Load elements from src1 to dst. */ if (src1_lanes) @@ -4379,8 +4549,8 @@ gcn_vectorize_vec_perm_const (machine_mode vmode, machine_mode op_mode, the two source vectors together. */ rtx tmp = gen_reg_rtx (vmode); - emit_insn (ds_bpermute (tmp, addr, src1, gcn_full_exec_reg ())); - emit_insn (gen_mov_with_exec (dst, tmp, get_exec (src1_lanes))); + emit_insn (gen_ds_bpermutevNm (tmp, addr, src1, get_exec (vmode))); + emit_insn (gen_movvNm (dst, tmp, dst, get_exec (src1_lanes))); } return true; @@ -4396,7 +4566,22 @@ gcn_vector_mode_supported_p (machine_mode mode) { return (mode == V64QImode || mode == V64HImode || mode == V64SImode || mode == V64DImode - || mode == V64SFmode || mode == V64DFmode); + || mode == V64SFmode || mode == V64DFmode + || mode == V32QImode || mode == V32HImode + || mode == V32SImode || mode == V32DImode + || mode == V32SFmode || mode == V32DFmode + || mode == V16QImode || mode == V16HImode + || mode == V16SImode || mode == V16DImode + || mode == V16SFmode || mode == V16DFmode + || mode == V8QImode || mode == V8HImode + || mode == V8SImode || mode == V8DImode + || mode == V8SFmode || mode == V8DFmode + || mode == V4QImode || mode == V4HImode + || mode == V4SImode || mode == V4DImode + || mode == V4SFmode || mode == V4DFmode + || mode == V2QImode || mode == V2HImode + || mode == V2SImode || mode == V2DImode + || mode == V2SFmode || mode == V2DFmode); } /* Implement TARGET_VECTORIZE_PREFERRED_SIMD_MODE. 
@@ -4425,23 +4610,74 @@ gcn_vectorize_preferred_simd_mode (scalar_mode mode) } } +/* Implement TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_MODES. + + Try all the vector modes. */ + +unsigned int gcn_autovectorize_vector_modes (vector_modes *modes, + bool ARG_UNUSED (all)) +{ + modes->safe_push (V64QImode); + modes->safe_push (V64HImode); + modes->safe_push (V64SImode); + modes->safe_push (V64SFmode); + modes->safe_push (V64DImode); + modes->safe_push (V64DFmode); + + modes->safe_push (V32QImode); + modes->safe_push (V32HImode); + modes->safe_push (V32SImode); + modes->safe_push (V32SFmode); + modes->safe_push (V32DImode); + modes->safe_push (V32DFmode); + + modes->safe_push (V16QImode); + modes->safe_push (V16HImode); + modes->safe_push (V16SImode); + modes->safe_push (V16SFmode); + modes->safe_push (V16DImode); + modes->safe_push (V16DFmode); + + modes->safe_push (V8QImode); + modes->safe_push (V8HImode); + modes->safe_push (V8SImode); + modes->safe_push (V8SFmode); + modes->safe_push (V8DImode); + modes->safe_push (V8DFmode); + + modes->safe_push (V4QImode); + modes->safe_push (V4HImode); + modes->safe_push (V4SImode); + modes->safe_push (V4SFmode); + modes->safe_push (V4DImode); + modes->safe_push (V4DFmode); + + modes->safe_push (V2QImode); + modes->safe_push (V2HImode); + modes->safe_push (V2SImode); + modes->safe_push (V2SFmode); + modes->safe_push (V2DImode); + modes->safe_push (V2DFmode); + + /* We shouldn't need VECT_COMPARE_COSTS as they should all cost the same. */ + return 0; +} + /* Implement TARGET_VECTORIZE_RELATED_MODE. All GCN vectors are 64-lane, so this is simpler than other architectures. In particular, we do *not* want to match vector bit-size. */ static opt_machine_mode -gcn_related_vector_mode (machine_mode ARG_UNUSED (vector_mode), +gcn_related_vector_mode (machine_mode vector_mode, scalar_mode element_mode, poly_uint64 nunits) { - if (known_ne (nunits, 0U) && known_ne (nunits, 64U)) - return VOIDmode; + int n = nunits.to_constant (); - machine_mode pref_mode = gcn_vectorize_preferred_simd_mode (element_mode); - if (!VECTOR_MODE_P (pref_mode)) - return VOIDmode; + if (n == 0) + n = GET_MODE_NUNITS (vector_mode); - return pref_mode; + return VnMODE (n, element_mode); } /* Implement TARGET_VECTORIZE_PREFERRED_VECTOR_ALIGNMENT. @@ -4566,6 +4802,8 @@ gcn_expand_dpp_shr_insn (machine_mode mode, const char *insn, The vector register SRC of mode MODE is reduced using the operation given by UNSPEC, and the scalar result is returned in lane 63 of a vector register. */ +/* FIXME: Implement reductions for sizes other than V64. + (They're currently disabled in the machine description.) 
*/ rtx gcn_expand_reduc_scalar (machine_mode mode, rtx src, int unspec) @@ -4975,10 +5213,11 @@ gcn_md_reorg (void) { if (VECTOR_MODE_P (GET_MODE (x))) { - new_exec = -1; - break; + int vf = GET_MODE_NUNITS (GET_MODE (x)); + new_exec = MAX ((uint64_t)new_exec, + 0xffffffffffffffffUL >> (64-vf)); } - else + else if (new_exec == 0) new_exec = 1; } } @@ -5693,13 +5932,12 @@ static void print_reg (FILE *file, rtx x) { machine_mode mode = GET_MODE (x); + if (VECTOR_MODE_P (mode)) + mode = GET_MODE_INNER (mode); if (mode == BImode || mode == QImode || mode == HImode || mode == SImode - || mode == HFmode || mode == SFmode - || mode == V64SFmode || mode == V64SImode - || mode == V64QImode || mode == V64HImode) + || mode == HFmode || mode == SFmode) fprintf (file, "%s", reg_names[REGNO (x)]); - else if (mode == DImode || mode == V64DImode - || mode == DFmode || mode == V64DFmode) + else if (mode == DImode || mode == DFmode) { if (SGPR_REGNO_P (REGNO (x))) fprintf (file, "s[%i:%i]", REGNO (x) - FIRST_SGPR_REG, @@ -6146,20 +6384,20 @@ print_operand (FILE *file, rtx x, int code) case 'o': { const char *s = 0; - switch (GET_MODE_SIZE (GET_MODE (x))) + machine_mode mode = GET_MODE (x); + if (VECTOR_MODE_P (mode)) + mode = GET_MODE_INNER (mode); + + switch (mode) { - case 1: + case E_QImode: s = "_ubyte"; break; - case 2: + case E_HImode: + case E_HFmode: s = "_ushort"; break; - /* The following are full-vector variants. */ - case 64: - s = "_ubyte"; - break; - case 128: - s = "_ushort"; + default: break; } @@ -6174,43 +6412,31 @@ print_operand (FILE *file, rtx x, int code) } case 's': { - const char *s = ""; - switch (GET_MODE_SIZE (GET_MODE (x))) + const char *s; + machine_mode mode = GET_MODE (x); + if (VECTOR_MODE_P (mode)) + mode = GET_MODE_INNER (mode); + + switch (mode) { - case 1: + case E_QImode: s = "_byte"; break; - case 2: + case E_HImode: + case E_HFmode: s = "_short"; break; - case 4: + case E_SImode: + case E_SFmode: s = "_dword"; break; - case 8: + case E_DImode: + case E_DFmode: s = "_dwordx2"; break; - case 12: - s = "_dwordx3"; - break; - case 16: + case E_TImode: s = "_dwordx4"; break; - case 32: - s = "_dwordx8"; - break; - case 64: - s = VECTOR_MODE_P (GET_MODE (x)) ? "_byte" : "_dwordx16"; - break; - /* The following are full-vector variants. 
*/ - case 128: - s = "_short"; - break; - case 256: - s = "_dword"; - break; - case 512: - s = "_dwordx2"; - break; default: output_operand_lossage ("invalid operand %%xn code"); return; @@ -6714,6 +6940,9 @@ gcn_dwarf_register_span (rtx rtl) #define TARGET_ASM_TRAMPOLINE_TEMPLATE gcn_asm_trampoline_template #undef TARGET_ATTRIBUTE_TABLE #define TARGET_ATTRIBUTE_TABLE gcn_attribute_table +#undef TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_MODES +#define TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_MODES \ + gcn_autovectorize_vector_modes #undef TARGET_BUILTIN_DECL #define TARGET_BUILTIN_DECL gcn_builtin_decl #undef TARGET_CAN_CHANGE_MODE_CLASS From patchwork Tue Oct 11 11:02:04 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Andrew Stubbs X-Patchwork-Id: 1910 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a5d:4ac7:0:0:0:0:0 with SMTP id y7csp2030532wrs; Tue, 11 Oct 2022 04:04:03 -0700 (PDT) X-Google-Smtp-Source: AMsMyM4PiBik6CiZ7T8lhmQZVM1vRXhYUTgQMeygPmTdRUBn41DNg2gGPyEC850VxMQPw7SmGLMU X-Received: by 2002:a17:907:7d8d:b0:78d:d467:dd3 with SMTP id oz13-20020a1709077d8d00b0078dd4670dd3mr2637899ejc.547.1665486242746; Tue, 11 Oct 2022 04:04:02 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1665486242; cv=none; d=google.com; s=arc-20160816; b=k5rgdrJHQeedpzcdPYSnEuq+85IojAOZSjz8BUdLD2ChYtkOtmxqesoxcJIGwtaDoS MBGgEoU1RPngvTEVA38oPGIODpPSBMZ4fkYZN+dqKDklZN2r8B62pZHt98d8j70Ss6h+ laiToYrzuB4WczUxbQd02wN8eX0fkQk3xunIeAv2hQ08AZpo9mVf59DbgC0ex7hGRQHl HQjBVjrS5v9DRTBOpz0K6oOfPeQVwKrKJuQlrNBlHhHJ9TA48I/Zn+1ti3/QREtDK8NP WgkYCnL/tBIsJeySMzIXje/bUpLVAIE9LYRW7Vnb75U9FAd8CX9JchrZcSdakeVrq8AO 0psw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=sender:errors-to:list-subscribe:list-help:list-post:list-archive :list-unsubscribe:list-id:precedence:content-transfer-encoding :mime-version:references:in-reply-to:message-id:date:subject:to:from :ironport-sdr:dmarc-filter:delivered-to; bh=WhhVSAcbeQ2SXM2dz3LWHb6ugOwdfYHmTJfuTflq2js=; b=pOBspn/JIOocDyAVKmf3E2oNLHzt8dGNKC1yqXBa4X9W7tLLl/g8kUi1zwAvuCRJoL /x+7O8/kf52ScGg1cGmkiyadXKQrBn/vqofipiC9mvSLyLgKzo3nVwP72BYmZrFxkYRI wleDtVQnfi+xprfIRzsEIMBm0csvDqndskRPHYKQ8sU50jbjk9rVfWrgRW/YMpFvUKXJ a5ZBwsxuqvTubd+fkwV6jz4UfFH6ceRAVjepIjSNkO70zqvti/F9p3/zodLEEhiqmb2Q i1jj3Xbe94CeOttn62JSDPsMOM8luaAcSsjHmd5pTvpg3+k9AW1nFJYuu+i1xY7qYkeK n/Zw== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org designates 8.43.85.97 as permitted sender) smtp.mailfrom="gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org" Received: from sourceware.org (ip-8-43-85-97.sourceware.org. 
[8.43.85.97]) by mx.google.com with ESMTPS id hv12-20020a17090760cc00b0078de51e658esi1104525ejc.208.2022.10.11.04.04.02 for (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 11 Oct 2022 04:04:02 -0700 (PDT) Received-SPF: pass (google.com: domain of gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org designates 8.43.85.97 as permitted sender) client-ip=8.43.85.97; Authentication-Results: mx.google.com; spf=pass (google.com: domain of gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org designates 8.43.85.97 as permitted sender) smtp.mailfrom="gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org" Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id CCD403851C0F for ; Tue, 11 Oct 2022 11:03:01 +0000 (GMT) X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from esa1.mentor.iphmx.com (esa1.mentor.iphmx.com [68.232.129.153]) by sourceware.org (Postfix) with ESMTPS id 6F8383858401 for ; Tue, 11 Oct 2022 11:02:36 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 6F8383858401 Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=codesourcery.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=mentor.com X-IronPort-AV: E=Sophos;i="5.95,176,1661846400"; d="scan'208";a="87280551" Received: from orw-gwy-01-in.mentorg.com ([192.94.38.165]) by esa1.mentor.iphmx.com with ESMTP; 11 Oct 2022 03:02:34 -0800 IronPort-SDR: BRlmm/wJC06AiMQ0I6X8+S6AxLJqdEGUSg145+YEmimS+ff70rcPuKTKeDTRIcIls92j+fEbLL JtNABAvYTBrY0KZUHmTOuZl1q1bqHp0XjW/LDOaDWsc/4FFxpKbVrxP9ZcgiuoH1EVtrFW0uRp o0RtRhOhPgHTDqFnGSDL2ncBtXuGf60TgtAY1TY0jj3xVD+Pbsy52r425nWLAMtN2ExP478LUh 7Ov8bhKdlKniGvhPc7BynWfHHbVZKsczKlzoh7VnswdeZy/gr+Z8KrT2Tch9SM1kl6a5lI3q5g o2g= From: Andrew Stubbs To: Subject: [committed 2/6] amdgcn: Resolve insn conditions at compile time Date: Tue, 11 Oct 2022 12:02:04 +0100 Message-ID: <0d8753cf30486c4e7fb07455b7cae49aa812c6a4.1665485382.git.ams@codesourcery.com> X-Mailer: git-send-email 2.37.0 In-Reply-To: References: MIME-Version: 1.0 X-Originating-IP: [137.202.0.90] X-ClientProxiedBy: svr-ies-mbx-14.mgc.mentorg.com (139.181.222.14) To svr-ies-mbx-11.mgc.mentorg.com (139.181.222.11) X-Spam-Status: No, score=-12.8 required=5.0 tests=BAYES_00, GIT_PATCH_0, HEADER_FROM_DIFFERENT_DOMAINS, KAM_DMARC_STATUS, SPF_HELO_PASS, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org Sender: "Gcc-patches" X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1746388902207716563?= X-GMAIL-MSGID: =?utf-8?q?1746388902207716563?= GET_MODE_NUNITS isn't a compile time constant, so we end up with many impossible insns in the machine description. Adding MODE_VF allows the insns to be eliminated completely. gcc/ChangeLog: * config/gcn/gcn-valu.md (2): Use MODE_VF. (2): Likewise. * config/gcn/gcn.h (MODE_VF): New macro. 
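To see why this helps: an insn condition written in terms of MODE_VF is an integer constant expression, so genconditions can evaluate it while building the compiler and discard the impossible patterns outright. A minimal standalone sketch of the idea follows (illustration only, with made-up enumerator names; the real macro is in the gcn.h hunk below):

/* Illustration: a chained-conditional macro over mode enumerators is a
   compile-time constant, so a pattern condition built from it folds to
   0 or 1 before the compiler is even run.  */
enum fake_mode { FAKE_V64SImode, FAKE_V32SImode, FAKE_V2SFmode };

#define FAKE_MODE_VF(M) \
  ((M) == FAKE_V64SImode ? 64 : (M) == FAKE_V32SImode ? 32 : 2)

/* Both checks are decided when this file is compiled; nothing survives
   to run time, which is exactly what genconditions needs.  */
static_assert (FAKE_MODE_VF (FAKE_V64SImode) == 64, "folds to true");
static_assert (FAKE_MODE_VF (FAKE_V64SImode) != FAKE_MODE_VF (FAKE_V32SImode),
               "a 64-lane/32-lane conversion pattern would be eliminated");

int main () { return 0; }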
--- gcc/config/gcn/gcn-valu.md | 10 ++++++---- gcc/config/gcn/gcn.h | 24 ++++++++++++++++++++++++ 2 files changed, 30 insertions(+), 4 deletions(-) diff --git a/gcc/config/gcn/gcn-valu.md b/gcc/config/gcn/gcn-valu.md index 52d2fcb880a..c7be2361164 100644 --- a/gcc/config/gcn/gcn-valu.md +++ b/gcc/config/gcn/gcn-valu.md @@ -2873,8 +2873,9 @@ (define_insn "2" [(set (match_operand:VCVT_FMODE 0 "register_operand" "= v") (cvt_op:VCVT_FMODE (match_operand:VCVT_MODE 1 "gcn_alu_operand" "vSvB")))] - "gcn_valid_cvt_p (mode, mode, - _cvt)" + "MODE_VF (mode) == MODE_VF (mode) + && gcn_valid_cvt_p (mode, mode, + _cvt)" "v_cvt\t%0, %1" [(set_attr "type" "vop1") (set_attr "length" "8")]) @@ -2883,8 +2884,9 @@ (define_insn "2" [(set (match_operand:VCVT_IMODE 0 "register_operand" "= v") (cvt_op:VCVT_IMODE (match_operand:VCVT_FMODE 1 "gcn_alu_operand" "vSvB")))] - "gcn_valid_cvt_p (mode, mode, - _cvt)" + "MODE_VF (mode) == MODE_VF (mode) + && gcn_valid_cvt_p (mode, mode, + _cvt)" "v_cvt\t%0, %1" [(set_attr "type" "vop1") (set_attr "length" "8")]) diff --git a/gcc/config/gcn/gcn.h b/gcc/config/gcn/gcn.h index 318256c4a7a..38f7212db59 100644 --- a/gcc/config/gcn/gcn.h +++ b/gcc/config/gcn/gcn.h @@ -678,3 +678,27 @@ enum gcn_builtin_codes /* Trampolines */ #define TRAMPOLINE_SIZE 36 #define TRAMPOLINE_ALIGNMENT 64 + +/* MD Optimization. + The following are intended to be obviously constant at compile time to + allow genconditions to eliminate bad patterns at compile time. */ +#define MODE_VF(M) \ + ((M == V64QImode || M == V64HImode || M == V64HFmode || M == V64SImode \ + || M == V64SFmode || M == V64DImode || M == V64DFmode) \ + ? 64 \ + : (M == V32QImode || M == V32HImode || M == V32HFmode || M == V32SImode \ + || M == V32SFmode || M == V32DImode || M == V32DFmode) \ + ? 32 \ + : (M == V16QImode || M == V16HImode || M == V16HFmode || M == V16SImode \ + || M == V16SFmode || M == V16DImode || M == V16DFmode) \ + ? 16 \ + : (M == V8QImode || M == V8HImode || M == V8HFmode || M == V8SImode \ + || M == V8SFmode || M == V8DImode || M == V8DFmode) \ + ? 8 \ + : (M == V4QImode || M == V4HImode || M == V4HFmode || M == V4SImode \ + || M == V4SFmode || M == V4DImode || M == V4DFmode) \ + ? 4 \ + : (M == V2QImode || M == V2HImode || M == V2HFmode || M == V2SImode \ + || M == V2SFmode || M == V2DImode || M == V2DFmode) \ + ? 
2 \ + : 1) From patchwork Tue Oct 11 11:02:05 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Andrew Stubbs X-Patchwork-Id: 1911 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a5d:4ac7:0:0:0:0:0 with SMTP id y7csp2030622wrs; Tue, 11 Oct 2022 04:04:14 -0700 (PDT) X-Google-Smtp-Source: AMsMyM6mNdbAMx+3aVCjFlLdyREugSH9838AQgvDB8sNC2QfSFEmFwsNRsj2rNZLn/jOt/fUQWtb X-Received: by 2002:a17:907:3f94:b0:78d:9d2f:3002 with SMTP id hr20-20020a1709073f9400b0078d9d2f3002mr11456462ejc.40.1665486254779; Tue, 11 Oct 2022 04:04:14 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1665486254; cv=none; d=google.com; s=arc-20160816; b=nviDB3Q2IgCorkiz2f/KM4R+hRXGuURI+4kaEreiHGjMxdV5uAfrEHwTKfBPoHcFVm mqK4mCO9PLRZtZ5U5+bcHCBBOfCOlENyqIm3y6P6DIP2OTUsrFEgTdsFcd+s8HKwI2gm ztAGdjgsnTSr2mPMTdYTlluCKyfm4Tt5IwcIF/EytKBfcU9YE5dZZYTgqYGYp9AcyPy7 GzeF/kUwg4ZPMAIRgcZ1ZdgaYWHmbeO8FPvWWqqwXVWZ3dz6nzzUjMC7vHPQFJkT1adv vWeAKBzWAYt2mPvZwM3tHQk5Gf7xe6Q5vCMRE2Kdur10DKt2ZWY5BXJF0H7e3cuIV1rw pIbA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=sender:errors-to:list-subscribe:list-help:list-post:list-archive :list-unsubscribe:list-id:precedence:content-transfer-encoding :mime-version:references:in-reply-to:message-id:date:subject:to:from :ironport-sdr:dmarc-filter:delivered-to; bh=HsuySx3lvcg99QRyhJxqin5UrO4A7tt1CH3AxwccbdI=; b=MQxnejQyLAcRh688D058erHMadpHrf9ZBjCxmDbU/K6FMvcOhcyH83hiEv+KSYS2+b 54RqP7ayZlNvWS9zBs6EOZbqnIRRStxqqxL91k3dfqN7MdjVw+BuK206XX0VdBJIa0BQ 1KYhFrtAmI3YBd9IedsCbUDj9rnDfoo3ASVGUinz5O7qY1VKuHejG+gBw2WMImZhFV6P wGGqaRR6dm47AFrVaXTOlrybKbaLmuni/JNhhMDo2mEf9J2UiQoNg6K5mtVhvWnXOPIG SEP1k3meRXiIftJd5gkGa/1OGwLaeA+PqKecRn1813uSjWXFgXSXcwIFIorVWQLHQdkV S6ww== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org designates 8.43.85.97 as permitted sender) smtp.mailfrom="gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org" Received: from sourceware.org (server2.sourceware.org. 
[8.43.85.97]) by mx.google.com with ESMTPS id h17-20020a056402281100b0045945eed10asi16603145ede.5.2022.10.11.04.04.14 for (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 11 Oct 2022 04:04:14 -0700 (PDT) Received-SPF: pass (google.com: domain of gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org designates 8.43.85.97 as permitted sender) client-ip=8.43.85.97; Authentication-Results: mx.google.com; spf=pass (google.com: domain of gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org designates 8.43.85.97 as permitted sender) smtp.mailfrom="gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org" Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 26330385803F for ; Tue, 11 Oct 2022 11:03:04 +0000 (GMT) X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from esa1.mentor.iphmx.com (esa1.mentor.iphmx.com [68.232.129.153]) by sourceware.org (Postfix) with ESMTPS id 6D1FD3857826 for ; Tue, 11 Oct 2022 11:02:38 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 6D1FD3857826 Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=codesourcery.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=mentor.com X-IronPort-AV: E=Sophos;i="5.95,176,1661846400"; d="scan'208";a="87280554" Received: from orw-gwy-01-in.mentorg.com ([192.94.38.165]) by esa1.mentor.iphmx.com with ESMTP; 11 Oct 2022 03:02:36 -0800 IronPort-SDR: 6dBJNlvrg1F6bIF7VkHDphnB8lY2ry55oJZH1Ho3jNj6zjnxxdGBYnS6OhWmx0J1GA+eEOjwPn NkSIw7UN4H89dYzNIZRhx6DVpEVn3BBFWCE4BnZF1d3oAHplQmq50EdmvqzzPVMKFzRUlLOLzw QzCVyXtVJElgNnl/dEY/HBWoI4Zf0dkq47pRYoXtzPj/Z8NYZOCw5kYwvTKVTiTigdmM2JmgCd AW7crqGzaXj36ktZtanHnDBL1eA1Cr46jq9GeWirQsm5kEoBNqrar3gV2SKESuSTyvoVVVkwEJ 2rY= From: Andrew Stubbs To: Subject: [committed 3/6] amdgcn: Add vec_extract for partial vectors Date: Tue, 11 Oct 2022 12:02:05 +0100 Message-ID: <5cfe08555034b29f301dcfb99a3691c81b2e2def.1665485382.git.ams@codesourcery.com> X-Mailer: git-send-email 2.37.0 In-Reply-To: References: MIME-Version: 1.0 X-Originating-IP: [137.202.0.90] X-ClientProxiedBy: svr-ies-mbx-14.mgc.mentorg.com (139.181.222.14) To svr-ies-mbx-11.mgc.mentorg.com (139.181.222.11) X-Spam-Status: No, score=-12.8 required=5.0 tests=BAYES_00, GIT_PATCH_0, HEADER_FROM_DIFFERENT_DOMAINS, KAM_DMARC_STATUS, SPF_HELO_PASS, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org Sender: "Gcc-patches" X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1746388915247966618?= X-GMAIL-MSGID: =?utf-8?q?1746388915247966618?= Add vec_extract expanders for all valid pairs of vector types. gcc/ChangeLog: * config/gcn/gcn-protos.h (get_exec): Add prototypes for two variants. * config/gcn/gcn-valu.md (vec_extract): New define_expand. * config/gcn/gcn.cc (get_exec): Export the existing function. Add a new overload variant. 
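As context for the new get_exec (machine_mode) overload used by these expanders: it simply turns the vector length into an EXEC mask with the low VF bits set. A tiny standalone sketch of that arithmetic (plain C++, hypothetical helper name, not the GCC function itself):

#include <cassert>
#include <cstdint>

/* Sketch of the lane-mask computation: a vector with VF lanes enables
   the low VF bits of the 64-bit EXEC register.  */
static uint64_t
exec_mask_for_lanes (int vf)
{
  return 0xffffffffffffffffULL >> (64 - vf);  /* valid for vf = 1..64 */
}

int main ()
{
  assert (exec_mask_for_lanes (64) == 0xffffffffffffffffULL);  /* full vector */
  assert (exec_mask_for_lanes (4) == 0xf);                     /* e.g. 4 lanes */
  assert (exec_mask_for_lanes (2) == 0x3);
  return 0;
}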
--- gcc/config/gcn/gcn-protos.h | 2 ++ gcc/config/gcn/gcn-valu.md | 34 ++++++++++++++++++++++++++++++++++ gcc/config/gcn/gcn.cc | 9 ++++++++- 3 files changed, 44 insertions(+), 1 deletion(-) diff --git a/gcc/config/gcn/gcn-protos.h b/gcc/config/gcn/gcn-protos.h index 6300c1cbd36..f9a1fc00b4f 100644 --- a/gcc/config/gcn/gcn-protos.h +++ b/gcc/config/gcn/gcn-protos.h @@ -24,6 +24,8 @@ extern bool gcn_constant64_p (rtx); extern bool gcn_constant_p (rtx); extern rtx gcn_convert_mask_mode (rtx reg); extern unsigned int gcn_dwarf_register_number (unsigned int regno); +extern rtx get_exec (int64_t); +extern rtx get_exec (machine_mode mode); extern char * gcn_expand_dpp_shr_insn (machine_mode, const char *, int, int); extern void gcn_expand_epilogue (); extern rtx gcn_expand_scaled_offsets (addr_space_t as, rtx base, rtx offsets, diff --git a/gcc/config/gcn/gcn-valu.md b/gcc/config/gcn/gcn-valu.md index c7be2361164..9ea60e1174f 100644 --- a/gcc/config/gcn/gcn-valu.md +++ b/gcc/config/gcn/gcn-valu.md @@ -808,6 +808,40 @@ (define_insn "vec_extract" (set_attr "exec" "none") (set_attr "laneselect" "yes")]) +(define_expand "vec_extract" + [(set (match_operand:V_ALL_ALT 0 "register_operand") + (vec_select:V_ALL_ALT + (match_operand:V_ALL 1 "register_operand") + (parallel [(match_operand 2 "immediate_operand")])))] + "MODE_VF (mode) < MODE_VF (mode) + && mode == mode" + { + int numlanes = GET_MODE_NUNITS (mode); + int firstlane = INTVAL (operands[2]) * numlanes; + rtx tmp; + + if (firstlane == 0) + { + /* A plain move will do. */ + tmp = operands[1]; + } else { + /* FIXME: optimize this by using DPP where available. */ + + rtx permutation = gen_reg_rtx (mode); + emit_insn (gen_vec_series (permutation, + GEN_INT (firstlane*4), + GEN_INT (4))); + + tmp = gen_reg_rtx (mode); + emit_insn (gen_ds_bpermute (tmp, permutation, operands[1], + get_exec (mode))); + } + + emit_move_insn (operands[0], + gen_rtx_SUBREG (mode, tmp, 0)); + DONE; + }) + (define_expand "extract_last_" [(match_operand: 0 "register_operand") (match_operand:DI 1 "gcn_alu_operand") diff --git a/gcc/config/gcn/gcn.cc b/gcc/config/gcn/gcn.cc index e1636f6ddd6..fdcf290ef8b 100644 --- a/gcc/config/gcn/gcn.cc +++ b/gcc/config/gcn/gcn.cc @@ -846,7 +846,7 @@ gcn_ira_change_pseudo_allocno_class (int regno, reg_class_t cl, /* Create a new DImode pseudo reg and emit an instruction to initialize it to VAL. */ -static rtx +rtx get_exec (int64_t val) { rtx reg = gen_reg_rtx (DImode); @@ -854,6 +854,13 @@ get_exec (int64_t val) return reg; } +rtx +get_exec (machine_mode mode) +{ + int vf = (VECTOR_MODE_P (mode) ? GET_MODE_NUNITS (mode) : 1); + return get_exec (0xffffffffffffffffUL >> (64-vf)); +} + /* }}} */ /* {{{ Immediate constants. 
*/ From patchwork Tue Oct 11 11:02:06 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Andrew Stubbs X-Patchwork-Id: 1912 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a5d:4ac7:0:0:0:0:0 with SMTP id y7csp2031079wrs; Tue, 11 Oct 2022 04:05:08 -0700 (PDT) X-Google-Smtp-Source: AMsMyM7boyEygChiXduJhDZXNXgopfFoamfPestWQuDl4af9MmmzrIyK5kvvHk7tTkey3cvobcUC X-Received: by 2002:aa7:dc10:0:b0:440:b446:c0cc with SMTP id b16-20020aa7dc10000000b00440b446c0ccmr22223451edu.34.1665486308237; Tue, 11 Oct 2022 04:05:08 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1665486308; cv=none; d=google.com; s=arc-20160816; b=R05AlgJdrvZWx6U47EA9xK14PF9nyfP3QEMUr/xmfVqYX6ys+/0IdjeuPMq407wIuc ts5O0IuBLwD0j++WSo1011jd2Cp24yUZp3rzoIxjnaCKbd9NNd/5QDHxRwtv3LBfDP3y Bk/0W37SEkkq2uCLzv16peSsO23BPLmnBbhpKSXLtVlMsbbI9WjAxPT5OBCWKsNddIZ4 Z8MVIpWrxSuFJOs93A9bVnWRSi9uwkmbiTX5MiIX72JnXDEK/8GSUYMGNPvEejnzrcnf SjSlXlH1prDBcZk32BGD7E+4KNZcpZ2Aigwb9bUXq9ZPvTORPGiFr7FcqaWFQVC/EoiD 2Sdg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=sender:errors-to:list-subscribe:list-help:list-post:list-archive :list-unsubscribe:list-id:precedence:content-transfer-encoding :mime-version:references:in-reply-to:message-id:date:subject:to:from :ironport-sdr:dmarc-filter:delivered-to; bh=W6bQxyYhn4yNoWSwkC3zOvG6KQqU4zKar6MsWQQpHoA=; b=Ih1+fBvjQZbZui4E7TrPyoOD5j3IXpDYyAD7X5+1kQJ4/Ugprm3za3tRl54TNWdpyW ZNSZx5mfnP7CGnUzBydnZ3RxJtzaXtgvx0CGsiP/xutmUW/EIBVWPRRIA8xlp7ubwl1o 5ftjepQ6IKOKr3SZPca6e7NEO11fTGydCFppO/2iU4WMwPeROcI1hfyxIhERtd5uRRg9 Qw0tQbb8Dp5Z6Lnx2hvlaUhl6zyY4dRQs4KFqaFOBFq+4xa9oF9mt9tX+BOpc7PHs88I W1YUA9YM4ChvpHnsYjJDVd2OZwDbHziqBsV8JLGOhoKtLBuAvv+VBmAF6GWTGSH1SJk5 TYgw== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org designates 2620:52:3:1:0:246e:9693:128c as permitted sender) smtp.mailfrom="gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org" Received: from sourceware.org (server2.sourceware.org. 
[2620:52:3:1:0:246e:9693:128c]) by mx.google.com with ESMTPS id bx17-20020a0564020b5100b00458ff0764casi11207998edb.95.2022.10.11.04.05.08 for (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 11 Oct 2022 04:05:08 -0700 (PDT) Received-SPF: pass (google.com: domain of gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org designates 2620:52:3:1:0:246e:9693:128c as permitted sender) client-ip=2620:52:3:1:0:246e:9693:128c; Authentication-Results: mx.google.com; spf=pass (google.com: domain of gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org designates 2620:52:3:1:0:246e:9693:128c as permitted sender) smtp.mailfrom="gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org" Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 17F47385354F for ; Tue, 11 Oct 2022 11:03:20 +0000 (GMT) X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from esa1.mentor.iphmx.com (esa1.mentor.iphmx.com [68.232.129.153]) by sourceware.org (Postfix) with ESMTPS id 06BBC3857025 for ; Tue, 11 Oct 2022 11:02:43 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 06BBC3857025 Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=codesourcery.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=mentor.com X-IronPort-AV: E=Sophos;i="5.95,176,1661846400"; d="scan'208";a="87280555" Received: from orw-gwy-01-in.mentorg.com ([192.94.38.165]) by esa1.mentor.iphmx.com with ESMTP; 11 Oct 2022 03:02:42 -0800 IronPort-SDR: fYNvSZGTOuHHzOPjWfwo2wPSAD0yvFB6IsQ3nSQfXlkgaWXjvexd2gPLb14fuRB2XYohQIUOAZ emlF9mWhqK6quyEp5q4zaL3XO7wyTeG4OVRlEqxw9FMlXXqkjSliwWnXKUeXIdoSpNEZUPeWsz jegS7FGfdRRdUNwV105Q++P+dG8HBMGvldxfz++RQp5SKDruPXWAtlBCXn97stb1QGQVLkLzOc +yH7I7MRQWoQJ5TWi6UDzZ63If/R6SssgAZnOsU44K0aCdxKEtcWRvX5P9nDGIuAGA+x8RHf14 FGU= From: Andrew Stubbs To: Subject: [committed 4/6] amdgcn: vec_init for multiple vector sizes Date: Tue, 11 Oct 2022 12:02:06 +0100 Message-ID: <769a10d0fc45e4923d7eb631170a117529ad5e39.1665485382.git.ams@codesourcery.com> X-Mailer: git-send-email 2.37.0 In-Reply-To: References: MIME-Version: 1.0 X-Originating-IP: [137.202.0.90] X-ClientProxiedBy: svr-ies-mbx-14.mgc.mentorg.com (139.181.222.14) To svr-ies-mbx-11.mgc.mentorg.com (139.181.222.11) X-Spam-Status: No, score=-12.3 required=5.0 tests=BAYES_00, GIT_PATCH_0, HEADER_FROM_DIFFERENT_DOMAINS, KAM_DMARC_STATUS, SCC_5_SHORT_WORD_LINES, SPF_HELO_PASS, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org Sender: "Gcc-patches" X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1746388970937624959?= X-GMAIL-MSGID: =?utf-8?q?1746388970937624959?= Implements vec_init when the input is a vector of smaller vectors, or of vector MEM types, or a smaller vector duplicated several times. gcc/ChangeLog: * config/gcn/gcn-valu.md (vec_init): New. * config/gcn/gcn.cc (GEN_VN): Add andvNsi3, subvNsi3. (GEN_VNM): Add gathervNm_expr. (GEN_VN_NOEXEC): Add vec_seriesvNsi. (gcn_expand_vector_init): Add initialization of vectors from smaller vectors. 
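One detail worth spelling out: in the {A, B, A, B, ...} special case, the expansion builds a lane permutation for ds_bpermute in which destination lane j reads source lane (j & (input_vf - 1)), shifted left by 2 because the bpermute selector is byte addressed. A small sketch of that index calculation (plain C++, illustration only, not the RTL expansion itself):

#include <cstdio>

int main ()
{
  const int vf = 8;        /* lanes in the vector being initialized */
  const int input_vf = 2;  /* lanes in the repeated source vector */

  /* Mirror of the vec_series/and/shift sequence: each destination lane j
     fetches source lane (j & (input_vf - 1)), expressed as a byte offset.  */
  for (int j = 0; j < vf; j++)
    {
      int byte_offset = (j & (input_vf - 1)) << 2;
      printf ("lane %d <- source lane %d (byte offset %d)\n",
              j, byte_offset >> 2, byte_offset);
    }
  return 0;
}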
--- gcc/config/gcn/gcn-valu.md | 10 +++ gcc/config/gcn/gcn.cc | 159 +++++++++++++++++++++++++++++++------ 2 files changed, 143 insertions(+), 26 deletions(-) diff --git a/gcc/config/gcn/gcn-valu.md b/gcc/config/gcn/gcn-valu.md index 9ea60e1174f..f708e587f38 100644 --- a/gcc/config/gcn/gcn-valu.md +++ b/gcc/config/gcn/gcn-valu.md @@ -893,6 +893,16 @@ (define_expand "vec_init" DONE; }) +(define_expand "vec_init" + [(match_operand:V_ALL 0 "register_operand") + (match_operand:V_ALL_ALT 1)] + "mode == mode + && MODE_VF (mode) < MODE_VF (mode)" + { + gcn_expand_vector_init (operands[0], operands[1]); + DONE; + }) + ;; }}} ;; {{{ Scatter / Gather diff --git a/gcc/config/gcn/gcn.cc b/gcc/config/gcn/gcn.cc index fdcf290ef8b..3dc294c2d2f 100644 --- a/gcc/config/gcn/gcn.cc +++ b/gcc/config/gcn/gcn.cc @@ -1365,12 +1365,17 @@ GEN_VN (add,di3_vcc_zext_dup2, A(rtx dest, rtx src1, rtx src2, rtx vcc), A(dest, src1, src2, vcc)) GEN_VN (addc,si3, A(rtx dest, rtx src1, rtx src2, rtx vccout, rtx vccin), A(dest, src1, src2, vccout, vccin)) +GEN_VN (and,si3, A(rtx dest, rtx src1, rtx src2), A(dest, src1, src2)) GEN_VN (ashl,si3, A(rtx dest, rtx src, rtx shift), A(dest, src, shift)) GEN_VNM_NOEXEC (ds_bpermute,, A(rtx dest, rtx addr, rtx src, rtx exec), A(dest, addr, src, exec)) +GEN_VNM (gather,_expr, A(rtx dest, rtx addr, rtx as, rtx vol), + A(dest, addr, as, vol)) GEN_VNM (mov,, A(rtx dest, rtx src), A(dest, src)) GEN_VN (mul,si3_dup, A(rtx dest, rtx src1, rtx src2), A(dest, src1, src2)) +GEN_VN (sub,si3, A(rtx dest, rtx src1, rtx src2), A(dest, src1, src2)) GEN_VNM (vec_duplicate,, A(rtx dest, rtx src), A(dest, src)) +GEN_VN_NOEXEC (vec_series,si, A(rtx dest, rtx x, rtx c), A(dest, x, c)) #undef GEN_VNM #undef GEN_VN @@ -1993,44 +1998,146 @@ regno_ok_for_index_p (int regno) void gcn_expand_vector_init (rtx op0, rtx vec) { - int64_t initialized_mask = 0; - int64_t curr_mask = 1; + rtx val[64]; machine_mode mode = GET_MODE (op0); int vf = GET_MODE_NUNITS (mode); + machine_mode addrmode = VnMODE (vf, DImode); + machine_mode offsetmode = VnMODE (vf, SImode); - rtx val = XVECEXP (vec, 0, 0); + int64_t mem_mask = 0; + int64_t item_mask[64]; + rtx ramp = gen_reg_rtx (offsetmode); + rtx addr = gen_reg_rtx (addrmode); - for (int i = 1; i < vf; i++) - if (rtx_equal_p (val, XVECEXP (vec, 0, i))) - curr_mask |= (int64_t) 1 << i; + int unit_size = GET_MODE_SIZE (GET_MODE_INNER (GET_MODE (op0))); + emit_insn (gen_mulvNsi3_dup (ramp, gen_rtx_REG (offsetmode, VGPR_REGNO (1)), + GEN_INT (unit_size))); - if (gcn_constant_p (val)) - emit_move_insn (op0, gcn_vec_constant (mode, val)); - else + bool simple_repeat = true; + + /* Expand nested vectors into one vector. */ + int item_count = XVECLEN (vec, 0); + for (int i = 0, j = 0; i < item_count; i++) + { + rtx item = XVECEXP (vec, 0, i); + machine_mode mode = GET_MODE (item); + int units = VECTOR_MODE_P (mode) ? GET_MODE_NUNITS (mode) : 1; + item_mask[j] = (((uint64_t)-1)>>(64-units)) << j; + + if (simple_repeat && i != 0) + simple_repeat = item == XVECEXP (vec, 0, i-1); + + /* If its a vector of values then copy them into the final location. */ + if (GET_CODE (item) == CONST_VECTOR) + { + for (int k = 0; k < units; k++) + val[j++] = XVECEXP (item, 0, k); + continue; + } + /* Otherwise, we have a scalar or an expression that expands... */ + + if (MEM_P (item)) + { + rtx base = XEXP (item, 0); + if (MEM_ADDR_SPACE (item) == DEFAULT_ADDR_SPACE + && REG_P (base)) + { + /* We have a simple vector load. 
We can put the addresses in + the vector, combine it with any other such MEMs, and load it + all with a single gather at the end. */ + int64_t mask = ((0xffffffffffffffffUL + >> (64-GET_MODE_NUNITS (mode))) + << j); + rtx exec = get_exec (mask); + emit_insn (gen_subvNsi3 + (ramp, ramp, + gcn_vec_constant (offsetmode, j*unit_size), + ramp, exec)); + emit_insn (gen_addvNdi3_zext_dup2 + (addr, ramp, base, + (mem_mask ? addr : gcn_gen_undef (addrmode)), + exec)); + mem_mask |= mask; + } + else + /* The MEM is non-trivial, so let's load it independently. */ + item = force_reg (mode, item); + } + else if (!CONST_INT_P (item) && !CONST_DOUBLE_P (item)) + /* The item may be a symbol_ref, or something else non-trivial. */ + item = force_reg (mode, item); + + /* Duplicate the vector across each item. + It is either a smaller vector register that needs shifting, + or a MEM that needs loading. */ + val[j] = item; + j += units; + } + + int64_t initialized_mask = 0; + rtx prev = NULL; + + if (mem_mask) { - val = force_reg (GET_MODE_INNER (mode), val); - emit_insn (gen_vec_duplicatevNm (op0, val)); + emit_insn (gen_gathervNm_expr + (op0, gen_rtx_PLUS (addrmode, addr, + gen_rtx_VEC_DUPLICATE (addrmode, + const0_rtx)), + GEN_INT (DEFAULT_ADDR_SPACE), GEN_INT (0), + NULL, get_exec (mem_mask))); + prev = op0; + initialized_mask = mem_mask; } - initialized_mask |= curr_mask; - for (int i = 1; i < vf; i++) + + if (simple_repeat && item_count > 1 && !prev) + { + /* Special case for instances of {A, B, A, B, A, B, ....}, etc. */ + rtx src = gen_rtx_SUBREG (mode, val[0], 0); + rtx input_vf_mask = GEN_INT (GET_MODE_NUNITS (GET_MODE (val[0]))-1); + + rtx permutation = gen_reg_rtx (VnMODE (vf, SImode)); + emit_insn (gen_vec_seriesvNsi (permutation, GEN_INT (0), GEN_INT (1))); + rtx mask_dup = gen_reg_rtx (VnMODE (vf, SImode)); + emit_insn (gen_vec_duplicatevNsi (mask_dup, input_vf_mask)); + emit_insn (gen_andvNsi3 (permutation, permutation, mask_dup)); + emit_insn (gen_ashlvNsi3 (permutation, permutation, GEN_INT (2))); + emit_insn (gen_ds_bpermutevNm (op0, permutation, src, get_exec (mode))); + return; + } + + /* Write each value, elementwise, but coalesce matching values into one + instruction, where possible. 
*/ + for (int i = 0; i < vf; i++) if (!(initialized_mask & ((int64_t) 1 << i))) { - curr_mask = (int64_t) 1 << i; - rtx val = XVECEXP (vec, 0, i); - - for (int j = i + 1; j < vf; j++) - if (rtx_equal_p (val, XVECEXP (vec, 0, j))) - curr_mask |= (int64_t) 1 << j; - if (gcn_constant_p (val)) - emit_insn (gen_movvNm (op0, gcn_vec_constant (mode, val), op0, - get_exec (curr_mask))); + if (gcn_constant_p (val[i])) + emit_insn (gen_movvNm (op0, gcn_vec_constant (mode, val[i]), prev, + get_exec (item_mask[i]))); + else if (VECTOR_MODE_P (GET_MODE (val[i])) + && (GET_MODE_NUNITS (GET_MODE (val[i])) == vf + || i == 0)) + emit_insn (gen_movvNm (op0, gen_rtx_SUBREG (mode, val[i], 0), prev, + get_exec (item_mask[i]))); + else if (VECTOR_MODE_P (GET_MODE (val[i]))) + { + rtx permutation = gen_reg_rtx (VnMODE (vf, SImode)); + emit_insn (gen_vec_seriesvNsi (permutation, GEN_INT (-i*4), + GEN_INT (4))); + rtx tmp = gen_reg_rtx (mode); + emit_insn (gen_ds_bpermutevNm (tmp, permutation, + gen_rtx_SUBREG (mode, val[i], 0), + get_exec (-1))); + emit_insn (gen_movvNm (op0, tmp, prev, get_exec (item_mask[i]))); + } else { - val = force_reg (GET_MODE_INNER (mode), val); - emit_insn (gen_vec_duplicatevNm (op0, val, op0, - get_exec (curr_mask))); + rtx reg = force_reg (GET_MODE_INNER (mode), val[i]); + emit_insn (gen_vec_duplicatevNm (op0, reg, prev, + get_exec (item_mask[i]))); } - initialized_mask |= curr_mask; + + initialized_mask |= item_mask[i]; + prev = op0; } } From patchwork Tue Oct 11 11:02:07 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Andrew Stubbs X-Patchwork-Id: 1914 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a5d:4ac7:0:0:0:0:0 with SMTP id y7csp2031763wrs; Tue, 11 Oct 2022 04:06:25 -0700 (PDT) X-Google-Smtp-Source: AMsMyM6QkcbSAs0ylY4JTs3/JCaKBdrSruOrqq/O3hbhXChDtrDkYSwg0YXSA9kOe4M+Wnys1x6y X-Received: by 2002:a17:906:730d:b0:782:a4e0:bb54 with SMTP id di13-20020a170906730d00b00782a4e0bb54mr18720014ejc.659.1665486385272; Tue, 11 Oct 2022 04:06:25 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1665486385; cv=none; d=google.com; s=arc-20160816; b=iV8ytkOuQKW5IWpIay86hbfbciIzGItRKG+dMDTTGO/HGe2XLYDc9qnfxjutg0Zm5X 9c+a4AkRD3srx4vHG3NM2f+D61NLFOtq3yUkQU7Nk81a5rMBRoFj9/ajNxiGMHEneOcq DRxcGoqLctdl4qK1yM3bTiRwQTelTxC9vtTBPMKsIOqPgRJALq8olx+p2dpHGbvyhEo5 n54Wr+Po3w6FxPMrofDm+ik0kWVM/JiK4qTZyymx6SbkxTWW92JDyu0C2k+iO75apRCx xwh7dwmHcs5TILebvsZmtgCJcpeARSAemztGZrWi33mWF8X3Anq9DWa7BS50fJ6DY+aV GC/g== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=sender:errors-to:list-subscribe:list-help:list-post:list-archive :list-unsubscribe:list-id:precedence:content-transfer-encoding :mime-version:references:in-reply-to:message-id:date:subject:to:from :ironport-sdr:dmarc-filter:delivered-to; bh=E4LC1ghjon5yRI/fXgNI5cvOWHQLgIval+NB4cR4cEk=; b=Sa0rFIiNNKWO/A8FAjHr7TW7jEm6k0yrqZFWpRpDAbtLbsDIxxp3ju59NNPoh9YbNV jdc2yo0sX1Uy6i0NvIjdPR35jNq29i5IouQjRolWR4YdPHx+iC8dvKH9HF8r9GakOT4K oZiavkSAPPFXkaZEL95E6OSEIfDX5kJy6VawwqU227fmXYiLR+czOUavx7iGHCxe+6Ya 3DmXhflJP/S0uh/A3sBClK/00R+lWnhkNCPh8F1j/mpfvHTe8EFVtie0A6BfyKLGBDz9 SUuaPEfJ64QJjj2ef8KKqILqQyPuDvtoPkccTcAhOyF3UhJZlas9eq4Sy9gDNUebJj4v m/zA== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org designates 8.43.85.97 as permitted sender) smtp.mailfrom="gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org" Received: from sourceware.org (server2.sourceware.org. 
[8.43.85.97]) by mx.google.com with ESMTPS id i9-20020aa7c709000000b00458814a8852si276878edq.591.2022.10.11.04.06.25 for (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 11 Oct 2022 04:06:25 -0700 (PDT) Received-SPF: pass (google.com: domain of gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org designates 8.43.85.97 as permitted sender) client-ip=8.43.85.97; Authentication-Results: mx.google.com; spf=pass (google.com: domain of gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org designates 8.43.85.97 as permitted sender) smtp.mailfrom="gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org" Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id A07F138515F9 for ; Tue, 11 Oct 2022 11:05:03 +0000 (GMT) X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from esa4.mentor.iphmx.com (esa4.mentor.iphmx.com [68.232.137.252]) by sourceware.org (Postfix) with ESMTPS id 00237385AE4F for ; Tue, 11 Oct 2022 11:04:22 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 00237385AE4F Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=codesourcery.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=mentor.com X-IronPort-AV: E=Sophos;i="5.95,176,1661846400"; d="scan'208";a="84478907" Received: from orw-gwy-01-in.mentorg.com ([192.94.38.165]) by esa4.mentor.iphmx.com with ESMTP; 11 Oct 2022 03:04:18 -0800 IronPort-SDR: qdISgpG0UxyzF1bAtP56kI0xroB7F3X5URtRPrM/sZDZJ4ujuAe4nQ8NqTBXIJ0Rsqjqdq8O2G lCRtJQ1mvoKbPiM0CPGEeojc/ymDEEKYfSyG7WQu5/N0AL1qa3LS3U+fmBeU+nbDIp+cswYpXc KHWCyw1wWj79ZYevOpr8UQ26N4fIOnfGzS29OtnlBzhxp0vMOpsIWEBylXFc1uDJd0T+o8ydHa WuOKJ3l1L/S3h8T/VlC7WoPceudpW7oonWRTcye9t+jroVHWzRNInHHQM4OrCQKW2DNp3eAond FfU= From: Andrew Stubbs To: Subject: [committed 5/6] amdgcn: Add vector integer negate insn Date: Tue, 11 Oct 2022 12:02:07 +0100 Message-ID: X-Mailer: git-send-email 2.37.0 In-Reply-To: References: MIME-Version: 1.0 X-Originating-IP: [137.202.0.90] X-ClientProxiedBy: SVR-IES-MBX-07.mgc.mentorg.com (139.181.222.7) To svr-ies-mbx-11.mgc.mentorg.com (139.181.222.11) X-Spam-Status: No, score=-12.8 required=5.0 tests=BAYES_00, GIT_PATCH_0, HEADER_FROM_DIFFERENT_DOMAINS, KAM_DMARC_STATUS, SPF_HELO_PASS, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org Sender: "Gcc-patches" X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1746389051848696377?= X-GMAIL-MSGID: =?utf-8?q?1746389051848696377?= Another example of the vectorizer needing explicit insns where the scalar expander just works. gcc/ChangeLog: * config/gcn/gcn-valu.md (neg2): New define_expand. 
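The expander itself is a one-liner: vector negation is emitted as a subtraction from a zero vector constant. In terms of GCC's generic vector extensions the equivalence looks roughly like this (hand-written sketch, not generated code):

/* Sketch only: for integer lanes, -x and (0 - x) are the same operation,
   which is all the new expander relies on.  */
typedef int v4si __attribute__ ((vector_size (16)));

v4si
neg_via_sub (v4si x)
{
  v4si zero = {0, 0, 0, 0};
  return zero - x;
}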
--- gcc/config/gcn/gcn-valu.md | 13 +++++++++++++ 1 file changed, 13 insertions(+) diff --git a/gcc/config/gcn/gcn-valu.md b/gcc/config/gcn/gcn-valu.md index f708e587f38..00c0e3be1ea 100644 --- a/gcc/config/gcn/gcn-valu.md +++ b/gcc/config/gcn/gcn-valu.md @@ -2390,6 +2390,19 @@ (define_insn "3" [(set_attr "type" "vop2,ds") (set_attr "length" "8,8")]) +;; }}} +;; {{{ Int unops + +(define_expand "neg2" + [(match_operand:V_INT 0 "register_operand") + (match_operand:V_INT 1 "register_operand")] + "" + { + emit_insn (gen_sub3 (operands[0], gcn_vec_constant (mode, 0), + operands[1])); + DONE; + }) + ;; }}} ;; {{{ FP binops - special cases From patchwork Tue Oct 11 11:02:08 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Andrew Stubbs X-Patchwork-Id: 1915 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a5d:4ac7:0:0:0:0:0 with SMTP id y7csp2032039wrs; Tue, 11 Oct 2022 04:07:03 -0700 (PDT) X-Google-Smtp-Source: AMsMyM6sGW7UTwhqVHFMscS7VluN0YB5GBK9T8Hz1wYvLTQjFIoXCvVYY5BUe1CNn6Qw0PuN0J4D X-Received: by 2002:a17:907:2672:b0:781:dc01:6c5a with SMTP id ci18-20020a170907267200b00781dc016c5amr19151638ejc.191.1665486422623; Tue, 11 Oct 2022 04:07:02 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1665486422; cv=none; d=google.com; s=arc-20160816; b=pDjhoNxk+FHEZc6VELUE9jkXI0pn4OU85vtcocaiP6U7od1yj1e2rhDojEOAY7mGW0 Gs+0UmQOB3sYb78wK1d8fwCKe2G5xX4QYOeA1xBl71VS7AuO0fxoMLG9esUyWQleqmSv hIAnyG0gM8I2u39SKE0vMVQ4EvyHcuLerbhmGNmuwWSbcMrhYyITeAkZPdia0lVDH+rq jiweeQoHXFYGeD6i7ioe7l4QHrs4FYjs8sVJBwSDiFWFXpXX0mPR95N6aAyCBrbwLtmH G5tcyCveDRYv0MGqVHxYijojkg6tgCFJlNhfNgOrUA+NvFDrQ1049W2L8EhfH+RmAs8M HbaA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=sender:errors-to:list-subscribe:list-help:list-post:list-archive :list-unsubscribe:list-id:precedence:content-transfer-encoding :mime-version:references:in-reply-to:message-id:date:subject:to:from :ironport-sdr:dmarc-filter:delivered-to; bh=2g1dzPR1p9JCjY3YB+H6zf8LTUaNjiuI+ANlu2jru3I=; b=Ba1UQ11AvSympzGkD2Cdup/UAzn5G4RoEF3ACBxtqwIhS3hFpVtK5rVpQfqM6vuaK1 RzKCC0MW8Evi68wn8hertRrSBQoEFsLhUuRA1ewIXwUQ0DdXrt4TfibajnIhVzpdJKck 1p+69VSJvFXpTC4S1eOGhBcjeolLYEhhIDKru40tCWRnQ2Ld+qAHpAuZPTa903SvMpkS uZp6LegYOJjjRpsWmaA7+q6AoiF4NTOQfrXeKdKZifNkXehIXtSaVq7wPVnGr/veQedy 1/xIpCczg/YwaqcQ/sbcfSR5bYdpJDYOfeejaGhIqnkkPhpl89SWm+muVFyGDMPKUJRY rohQ== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org designates 8.43.85.97 as permitted sender) smtp.mailfrom="gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org" Received: from sourceware.org (ip-8-43-85-97.sourceware.org. 
[8.43.85.97]) by mx.google.com with ESMTPS id sc19-20020a1709078a1300b0078c4a772ea7si12259868ejc.11.2022.10.11.04.07.02 for (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 11 Oct 2022 04:07:02 -0700 (PDT) Received-SPF: pass (google.com: domain of gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org designates 8.43.85.97 as permitted sender) client-ip=8.43.85.97; Authentication-Results: mx.google.com; spf=pass (google.com: domain of gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org designates 8.43.85.97 as permitted sender) smtp.mailfrom="gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org" Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 00876385AC26 for ; Tue, 11 Oct 2022 11:05:55 +0000 (GMT) X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from esa4.mentor.iphmx.com (esa4.mentor.iphmx.com [68.232.137.252]) by sourceware.org (Postfix) with ESMTPS id 767E23857C59 for ; Tue, 11 Oct 2022 11:04:39 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 767E23857C59 Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=codesourcery.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=mentor.com X-IronPort-AV: E=Sophos;i="5.95,176,1661846400"; d="scan'208";a="84478908" Received: from orw-gwy-01-in.mentorg.com ([192.94.38.165]) by esa4.mentor.iphmx.com with ESMTP; 11 Oct 2022 03:04:18 -0800 IronPort-SDR: usGuS1/pgkofIdG4HrZqUqN8uO/w/Lq6y0/lSBXGY0KzXREhPg3Pxq/87M6jkcYpav1P94oAFD gmmcvKJag/sC6TH4OKqC6dT1b/0opn/veDHWJvkTg+nyDzUBAm0bUof8s2eMMpqFEDpkPidEbr bZUZ5ECv/1zJ4fUJZalmBKZ3WyRGdbU4ZoKbDiDUaJPQ6jSbNO7DSG0L5Hc61m9nNh5TUof3ey kZiOGi1QgdMp3Pt29NjzCUmxqzeilnvDgN9nvtF/GmTxozoo5hBdXjO9MAKILxcA0lHswI3FRD ua4= From: Andrew Stubbs To: Subject: [committed 6/6] amdgcn: vector testsuite tweaks Date: Tue, 11 Oct 2022 12:02:08 +0100 Message-ID: X-Mailer: git-send-email 2.37.0 In-Reply-To: References: MIME-Version: 1.0 X-Originating-IP: [137.202.0.90] X-ClientProxiedBy: SVR-IES-MBX-07.mgc.mentorg.com (139.181.222.7) To svr-ies-mbx-11.mgc.mentorg.com (139.181.222.11) X-Spam-Status: No, score=-12.8 required=5.0 tests=BAYES_00, GIT_PATCH_0, HEADER_FROM_DIFFERENT_DOMAINS, KAM_DMARC_STATUS, SPF_HELO_PASS, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org Sender: "Gcc-patches" X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1746389091011196834?= X-GMAIL-MSGID: =?utf-8?q?1746389091011196834?= The testsuite needs a few tweaks following my patches to add multiple vector sizes for amdgcn. gcc/testsuite/ChangeLog: * gcc.dg/pr104464.c: Xfail on amdgcn. * gcc.dg/signbit-2.c: Likewise. * gcc.dg/signbit-5.c: Likewise. * gcc.dg/vect/bb-slp-68.c: Likewise. * gcc.dg/vect/bb-slp-cond-1.c: Change expectations on amdgcn. * gcc.dg/vect/bb-slp-subgroups-3.c: Likewise. * gcc.dg/vect/no-vfa-vect-depend-2.c: Change expectations for multiple vector sizes. * gcc.dg/vect/pr33953.c: Likewise. * gcc.dg/vect/pr65947-12.c: Likewise. * gcc.dg/vect/pr65947-13.c: Likewise. * gcc.dg/vect/pr80631-2.c: Likewise. * gcc.dg/vect/slp-reduc-4.c: Likewise. * gcc.dg/vect/trapv-vect-reduc-4.c: Likewise. 
* lib/target-supports.exp (available_vector_sizes): Add more sizes for amdgcn. --- gcc/testsuite/gcc.dg/pr104464.c | 2 ++ gcc/testsuite/gcc.dg/signbit-2.c | 5 +++-- gcc/testsuite/gcc.dg/signbit-5.c | 1 + gcc/testsuite/gcc.dg/vect/bb-slp-68.c | 5 +++-- gcc/testsuite/gcc.dg/vect/bb-slp-cond-1.c | 3 ++- gcc/testsuite/gcc.dg/vect/bb-slp-subgroups-3.c | 5 ++++- gcc/testsuite/gcc.dg/vect/no-vfa-vect-depend-2.c | 3 ++- gcc/testsuite/gcc.dg/vect/pr33953.c | 3 ++- gcc/testsuite/gcc.dg/vect/pr65947-12.c | 3 ++- gcc/testsuite/gcc.dg/vect/pr65947-13.c | 3 ++- gcc/testsuite/gcc.dg/vect/pr80631-2.c | 3 ++- gcc/testsuite/gcc.dg/vect/slp-reduc-4.c | 3 ++- gcc/testsuite/gcc.dg/vect/trapv-vect-reduc-4.c | 3 ++- gcc/testsuite/lib/target-supports.exp | 3 ++- 14 files changed, 31 insertions(+), 14 deletions(-) diff --git a/gcc/testsuite/gcc.dg/pr104464.c b/gcc/testsuite/gcc.dg/pr104464.c index ed6a22c39d5..d36a28678cb 100644 --- a/gcc/testsuite/gcc.dg/pr104464.c +++ b/gcc/testsuite/gcc.dg/pr104464.c @@ -9,3 +9,5 @@ foo(void) { f += (F)(f != (F){}[0]); } + +/* { dg-xfail-if "-fnon-call-exceptions unsupported" { amdgcn-*-* } } */ diff --git a/gcc/testsuite/gcc.dg/signbit-2.c b/gcc/testsuite/gcc.dg/signbit-2.c index 2f2dc448286..99a455bc7d7 100644 --- a/gcc/testsuite/gcc.dg/signbit-2.c +++ b/gcc/testsuite/gcc.dg/signbit-2.c @@ -20,6 +20,7 @@ void fun2(int32_t *x, int n) x[i] = (-x[i]) >> 30; } -/* { dg-final { scan-tree-dump {\s+>\s+\{ 0(, 0)+ \}} optimized { target vect_int } } } */ +/* Xfail amdgcn where vector truth type is not integer type. */ +/* { dg-final { scan-tree-dump {\s+>\s+\{ 0(, 0)+ \}} optimized { target vect_int xfail amdgcn-*-* } } } */ /* { dg-final { scan-tree-dump {\s+>\s+0} optimized { target { ! vect_int } } } } */ -/* { dg-final { scan-tree-dump-not {\s+>>\s+31} optimized } } */ +/* { dg-final { scan-tree-dump-not {\s+>>\s+31} optimized { xfail amdgcn-*-* } } } */ diff --git a/gcc/testsuite/gcc.dg/signbit-5.c b/gcc/testsuite/gcc.dg/signbit-5.c index 2b119cdfda7..0fad56c0ea8 100644 --- a/gcc/testsuite/gcc.dg/signbit-5.c +++ b/gcc/testsuite/gcc.dg/signbit-5.c @@ -4,6 +4,7 @@ /* This test does not work when the truth type does not match vector type. */ /* { dg-additional-options "-mno-avx512f" { target { i?86-*-* x86_64-*-* } } } */ /* { dg-additional-options "-march=armv8-a" { target aarch64_sve } } */ +/* { dg-xfail-run-if "truth type does not match vector type" { amdgcn-*-* } } */ #include diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-68.c b/gcc/testsuite/gcc.dg/vect/bb-slp-68.c index 8718031cc71..e7573a14933 100644 --- a/gcc/testsuite/gcc.dg/vect/bb-slp-68.c +++ b/gcc/testsuite/gcc.dg/vect/bb-slp-68.c @@ -18,5 +18,6 @@ void foo () x[9] = z[3] + 1.; } -/* We want to have the store group split into 4, 2, 4 when using 32byte vectors. */ -/* { dg-final { scan-tree-dump-not "from scalars" "slp2" } } */ +/* We want to have the store group split into 4, 2, 4 when using 32byte vectors. + Unfortunately it does not work when 64-byte vectors are available. 
*/ +/* { dg-final { scan-tree-dump-not "from scalars" "slp2" { xfail amdgcn-*-* } } } */ diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-cond-1.c b/gcc/testsuite/gcc.dg/vect/bb-slp-cond-1.c index 4bd286bf08c..1f5c621e5fd 100644 --- a/gcc/testsuite/gcc.dg/vect/bb-slp-cond-1.c +++ b/gcc/testsuite/gcc.dg/vect/bb-slp-cond-1.c @@ -46,5 +46,6 @@ int main () } /* { dg-final { scan-tree-dump {(no need for alias check [^\n]* when VF is 1|no alias between [^\n]* when [^\n]* is outside \(-16, 16\))} "vect" { target vect_element_align } } } */ -/* { dg-final { scan-tree-dump-times "loop vectorized" 1 "vect" { target vect_element_align } } } */ +/* { dg-final { scan-tree-dump-times "loop vectorized" 1 "vect" { target { vect_element_align && !amdgcn-*-* } } } } */ +/* { dg-final { scan-tree-dump-times "loop vectorized" 2 "vect" { target amdgcn-*-* } } } */ diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-subgroups-3.c b/gcc/testsuite/gcc.dg/vect/bb-slp-subgroups-3.c index 03c062ae6cf..fb719915db7 100644 --- a/gcc/testsuite/gcc.dg/vect/bb-slp-subgroups-3.c +++ b/gcc/testsuite/gcc.dg/vect/bb-slp-subgroups-3.c @@ -42,4 +42,7 @@ main (int argc, char **argv) /* Because we disable the cost model, targets with variable-length vectors can end up vectorizing the store to a[0..7] on its own. With the cost model we do something sensible. */ -/* { dg-final { scan-tree-dump-times "optimized: basic block" 2 "slp2" { xfail vect_variable_length } } } */ +/* { dg-final { scan-tree-dump-times "optimized: basic block" 2 "slp2" { target { ! amdgcn-*-* } xfail vect_variable_length } } } */ + +/* amdgcn can do this in one vector. */ +/* { dg-final { scan-tree-dump-times "optimized: basic block" 1 "slp2" { target amdgcn-*-* } } } */ diff --git a/gcc/testsuite/gcc.dg/vect/no-vfa-vect-depend-2.c b/gcc/testsuite/gcc.dg/vect/no-vfa-vect-depend-2.c index 1880d1edb32..89958378fca 100644 --- a/gcc/testsuite/gcc.dg/vect/no-vfa-vect-depend-2.c +++ b/gcc/testsuite/gcc.dg/vect/no-vfa-vect-depend-2.c @@ -51,4 +51,5 @@ int main (void) } /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" {xfail { vect_no_align && { ! vect_hw_misalign } } } } } */ -/* { dg-final { scan-tree-dump-times "dependence distance negative" 1 "vect" } } */ +/* { dg-final { scan-tree-dump-times "dependence distance negative" 1 "vect" { target { ! vect_multiple_sizes } } } } */ +/* { dg-final { scan-tree-dump "dependence distance negative" "vect" { target vect_multiple_sizes } } } */ diff --git a/gcc/testsuite/gcc.dg/vect/pr33953.c b/gcc/testsuite/gcc.dg/vect/pr33953.c index 4dd54cd57f3..d376cf904b7 100644 --- a/gcc/testsuite/gcc.dg/vect/pr33953.c +++ b/gcc/testsuite/gcc.dg/vect/pr33953.c @@ -29,6 +29,7 @@ void blockmove_NtoN_blend_noremap32 (const UINT32 *srcdata, int srcwidth, } /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { xfail { vect_no_align && { ! vect_hw_misalign } } } } } */ -/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" { xfail { vect_no_align && { ! vect_hw_misalign } } } } } */ +/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" { target { ! vect_multiple_sizes } xfail { vect_no_align && { ! vect_hw_misalign } } } } } */ +/* { dg-final { scan-tree-dump "vectorizing stmts using SLP" "vect" { target vect_multiple_sizes xfail { vect_no_align && { ! 
+/* { dg-final { scan-tree-dump "vectorizing stmts using SLP" "vect" { target vect_multiple_sizes xfail { vect_no_align && { ! vect_hw_misalign } } } } } */
diff --git a/gcc/testsuite/gcc.dg/vect/pr65947-12.c b/gcc/testsuite/gcc.dg/vect/pr65947-12.c
index a47f4146a29..9788eea0f54 100644
--- a/gcc/testsuite/gcc.dg/vect/pr65947-12.c
+++ b/gcc/testsuite/gcc.dg/vect/pr65947-12.c
@@ -42,5 +42,6 @@ main (void)
 }
 
 /* { dg-final { scan-tree-dump-times "LOOP VECTORIZED" 2 "vect" } } */
-/* { dg-final { scan-tree-dump-times "optimizing condition reduction with FOLD_EXTRACT_LAST" 2 "vect" { target vect_fold_extract_last } } } */
+/* { dg-final { scan-tree-dump-times "optimizing condition reduction with FOLD_EXTRACT_LAST" 2 "vect" { target { vect_fold_extract_last && { ! vect_multiple_sizes } } } } } */
+/* { dg-final { scan-tree-dump "optimizing condition reduction with FOLD_EXTRACT_LAST" "vect" { target { vect_fold_extract_last && vect_multiple_sizes } } } } */
 /* { dg-final { scan-tree-dump-not "condition expression based on integer induction." "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/pr65947-13.c b/gcc/testsuite/gcc.dg/vect/pr65947-13.c
index a703923151d..079b5f91ced 100644
--- a/gcc/testsuite/gcc.dg/vect/pr65947-13.c
+++ b/gcc/testsuite/gcc.dg/vect/pr65947-13.c
@@ -44,4 +44,5 @@ main (void)
 
 /* { dg-final { scan-tree-dump-times "LOOP VECTORIZED" 2 "vect" } } */
 /* { dg-final { scan-tree-dump-times "condition expression based on integer induction." 2 "vect" { xfail vect_fold_extract_last } } } */
-/* { dg-final { scan-tree-dump-times "optimizing condition reduction with FOLD_EXTRACT_LAST" 2 "vect" { target vect_fold_extract_last } } } */
+/* { dg-final { scan-tree-dump-times "optimizing condition reduction with FOLD_EXTRACT_LAST" 2 "vect" { target { vect_fold_extract_last && { ! vect_multiple_sizes } } } } } */
+/* { dg-final { scan-tree-dump "optimizing condition reduction with FOLD_EXTRACT_LAST" "vect" { target { vect_fold_extract_last && vect_multiple_sizes } } } } */
diff --git a/gcc/testsuite/gcc.dg/vect/pr80631-2.c b/gcc/testsuite/gcc.dg/vect/pr80631-2.c
index 61e11316af2..4e586275176 100644
--- a/gcc/testsuite/gcc.dg/vect/pr80631-2.c
+++ b/gcc/testsuite/gcc.dg/vect/pr80631-2.c
@@ -75,4 +75,5 @@ main ()
 
 /* { dg-final { scan-tree-dump-times "LOOP VECTORIZED" 5 "vect" { target vect_condition } } } */
 /* { dg-final { scan-tree-dump-times "condition expression based on integer induction." 5 "vect" { target vect_condition xfail vect_fold_extract_last } } } */
-/* { dg-final { scan-tree-dump-times "optimizing condition reduction with FOLD_EXTRACT_LAST" 5 "vect" { target vect_fold_extract_last } } } */
+/* { dg-final { scan-tree-dump-times "optimizing condition reduction with FOLD_EXTRACT_LAST" 5 "vect" { target { { ! vect_multiple_sizes } && vect_fold_extract_last } } } } */
+/* { dg-final { scan-tree-dump "optimizing condition reduction with FOLD_EXTRACT_LAST" "vect" { target { vect_multiple_sizes && vect_fold_extract_last } } } } */
diff --git a/gcc/testsuite/gcc.dg/vect/slp-reduc-4.c b/gcc/testsuite/gcc.dg/vect/slp-reduc-4.c
index cffb0114bcb..15f5c259e98 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-reduc-4.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-reduc-4.c
@@ -59,6 +59,7 @@ int main (void)
 
 /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { xfail vect_no_int_min_max } } } */
 /* For variable-length SVE, the number of scalar statements in the
    reduction exceeds the number of elements in a 128-bit granule. */
-/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" { xfail { vect_no_int_min_max || { aarch64_sve && vect_variable_length } } } } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" { target { ! vect_multiple_sizes } xfail { vect_no_int_min_max || { aarch64_sve && vect_variable_length } } } } } */
+/* { dg-final { scan-tree-dump "vectorizing stmts using SLP" "vect" { target { vect_multiple_sizes } } } } */
 /* { dg-final { scan-tree-dump-times "VEC_PERM_EXPR" 0 "vect" { xfail { aarch64_sve && vect_variable_length } } } } */
diff --git a/gcc/testsuite/gcc.dg/vect/trapv-vect-reduc-4.c b/gcc/testsuite/gcc.dg/vect/trapv-vect-reduc-4.c
index f09c964fdc1..24cf1f793c7 100644
--- a/gcc/testsuite/gcc.dg/vect/trapv-vect-reduc-4.c
+++ b/gcc/testsuite/gcc.dg/vect/trapv-vect-reduc-4.c
@@ -50,6 +50,7 @@ int main (void)
 
 /* We can't handle the first loop with variable-length vectors and so
    fall back to the fixed-length mininum instead. */
-/* { dg-final { scan-tree-dump-times "Detected reduction\\." 3 "vect" { xfail vect_variable_length } } } */
+/* { dg-final { scan-tree-dump-times "Detected reduction\\." 3 "vect" { target { ! vect_multiple_sizes } xfail vect_variable_length } } } */
+/* { dg-final { scan-tree-dump "Detected reduction\\." "vect" { target vect_multiple_sizes } } } */
 /* { dg-final { scan-tree-dump-times "vectorized 3 loops" 1 "vect" { target { ! vect_no_int_min_max } } } } */
 /* { dg-final { scan-tree-dump-times {using an in-order \(fold-left\) reduction} 1 "vect" } } */
diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index 7c9dd45f2a7..fdd88e6a516 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -8400,7 +8400,8 @@ proc available_vector_sizes { } {
     } elseif { [istarget sparc*-*-*] } {
 	lappend result 64
     } elseif { [istarget amdgcn*-*-*] } {
-	lappend result 4096
+	# 6 different lane counts, and 4 element sizes
+	lappend result 4096 2048 1024 512 256 128 64 32 16 8 4 2
     } else {
 	# The traditional default asumption.
 	lappend result 128