From patchwork Tue Nov 14 03:28:37 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "juzhe.zhong@rivai.ai"
X-Patchwork-Id: 164711
From: Juzhe-Zhong
To: gcc-patches@gcc.gnu.org
Cc: Juzhe-Zhong
Subject: [Commit QUEUE V3] RISC-V: Support strided load/store
Date: Tue, 14 Nov 2023 11:28:37 +0800
Message-Id: <20231114032837.1687779-1-juzhe.zhong@rivai.ai>
X-Mailer: git-send-email 2.36.3

Strided load/store has been approved.

Rebased on V3 and adapted to the middle-end IR change.

Will commit after the middle-end patch is approved.

gcc/ChangeLog:

	* config/riscv/autovec.md (mask_len_strided_load_): New pattern.
	(mask_len_strided_store_): Ditto.
	* config/riscv/riscv-protos.h (expand_strided_load_store): New function.
	* config/riscv/riscv-v.cc (expand_strided_load_store): Ditto.

gcc/testsuite/ChangeLog:

	* gcc.target/riscv/rvv/autovec/gather-scatter/strided_load-1.c: Adapt test.
	* gcc.target/riscv/rvv/autovec/gather-scatter/strided_load-2.c: Ditto.
	* gcc.target/riscv/rvv/autovec/gather-scatter/strided_store-1.c: Ditto.
	* gcc.target/riscv/rvv/autovec/gather-scatter/strided_store-2.c: Ditto.
	* gcc.target/riscv/rvv/autovec/gather-scatter/mask_strided_load-1.c: New test.
	* gcc.target/riscv/rvv/autovec/gather-scatter/mask_strided_load_run-1.c: New test.
	* gcc.target/riscv/rvv/autovec/gather-scatter/mask_strided_store-1.c: New test.
	* gcc.target/riscv/rvv/autovec/gather-scatter/mask_strided_store_run-1.c: New test.
	* gcc.target/riscv/rvv/autovec/gather-scatter/strided_load-3.c: New test.
	* gcc.target/riscv/rvv/autovec/gather-scatter/strided_load_run-3.c: New test.
	* gcc.target/riscv/rvv/autovec/gather-scatter/strided_store-3.c: New test.
	* gcc.target/riscv/rvv/autovec/gather-scatter/strided_store_run-3.c: New test.

---
 gcc/config/riscv/autovec.md                   | 30 ++++++
 gcc/config/riscv/riscv-protos.h               |  1 +
 gcc/config/riscv/riscv-v.cc                   | 56 +++++++++++
 .../gather-scatter/mask_strided_load-1.c      | 47 +++++++++
 .../gather-scatter/mask_strided_load_run-1.c  | 97 +++++++++++++++++++
 .../gather-scatter/mask_strided_store-1.c     | 48 +++++++++
 .../gather-scatter/mask_strided_store_run-1.c | 89 +++++++++++++++++
 .../autovec/gather-scatter/strided_load-1.c   |  2 +-
 .../autovec/gather-scatter/strided_load-2.c   |  2 +-
 .../autovec/gather-scatter/strided_load-3.c   | 45 +++++++++
 .../gather-scatter/strided_load_run-3.c       | 84 ++++++++++++++++
 .../autovec/gather-scatter/strided_store-1.c  |  2 +-
 .../autovec/gather-scatter/strided_store-2.c  |  2 +-
 .../autovec/gather-scatter/strided_store-3.c  | 45 +++++++++
 .../gather-scatter/strided_store_run-3.c      | 82 ++++++++++++++++
 15 files changed, 628 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_strided_load-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_strided_load_run-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_strided_store-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_strided_store_run-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/strided_load-3.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/strided_load_run-3.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/strided_store-3.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/strided_store_run-3.c

diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index 80e41af6334..e0c294ffd10 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -272,6 +272,36 @@
   DONE;
 })
 
+;; =========================================================================
+;; == Strided Load/Store
+;; =========================================================================
+
+(define_expand "mask_len_strided_load_"
+  [(match_operand:V 0 "register_operand")
+   (match_operand 1 "pmode_reg_or_0_operand")
+   (match_operand 2 "pmode_reg_or_0_operand")
+   (match_operand: 3 "vector_mask_operand")
+   (match_operand 4 "autovec_length_operand")
+   (match_operand 5 "const_0_operand")]
+  "TARGET_VECTOR"
+{
+  riscv_vector::expand_strided_load_store (mode, operands, true);
+  DONE;
+})
+
+(define_expand "mask_len_strided_store_"
+  [(match_operand 0 "pmode_reg_or_0_operand")
+   (match_operand 1 "pmode_reg_or_0_operand")
+   (match_operand:V 2 "register_operand")
+   (match_operand: 3 "vector_mask_operand")
+   (match_operand 4 "autovec_length_operand")
+   (match_operand 5 "const_0_operand")]
+  "TARGET_VECTOR"
+{
+  riscv_vector::expand_strided_load_store (mode, operands, false);
+  DONE;
+})
+
 ;; =========================================================================
 ;; == Array Load/Store
 ;; =========================================================================
diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index 8cdfadbcf10..3ef5740cf5b 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -546,6 +546,7 @@ void expand_vec_perm (rtx, rtx, rtx, rtx);
 void expand_select_vl (rtx *);
 void expand_load_store (rtx *, bool);
 void expand_gather_scatter (rtx *, bool);
+void expand_strided_load_store (machine_mode, rtx *, bool);
 void expand_cond_len_ternop (unsigned, rtx *);
 void prepare_ternary_operands (rtx *);
 void expand_lanes_load_store (rtx *, bool);
diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index 265a298f447..6e9bb08aee7 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -3608,6 +3608,62 @@ expand_gather_scatter (rtx *ops, bool is_load)
     }
 }
 
+/* Expand MASK_LEN_STRIDED_{LOAD,STORE}.  */
+void
+expand_strided_load_store (machine_mode mode, rtx *ops, bool is_load)
+{
+  rtx ptr, stride, vec_reg;
+  rtx mask = ops[3];
+  rtx len = ops[4];
+  poly_int64 value;
+  if (is_load)
+    {
+      vec_reg = ops[0];
+      ptr = ops[1];
+      stride = ops[2];
+    }
+  else
+    {
+      vec_reg = ops[2];
+      ptr = ops[0];
+      stride = ops[1];
+    }
+
+  if (poly_int_rtx_p (len, &value) && known_eq (value, GET_MODE_NUNITS (mode)))
+    {
+      /* If the length operand is equal to VF, it is VLMAX load/store.  */
+      if (is_load)
+        {
+          rtx m_ops[] = {vec_reg, mask, gen_rtx_MEM (mode, ptr), stride};
+          emit_vlmax_insn (code_for_pred_strided_load (mode), BINARY_OP_TAMA,
+                           m_ops);
+        }
+      else
+        {
+          len = gen_reg_rtx (Pmode);
+          emit_vlmax_vsetvl (mode, len);
+          emit_insn (gen_pred_strided_store (mode, gen_rtx_MEM (mode, ptr),
+                                             mask, stride, vec_reg, len,
+                                             get_avl_type_rtx (VLMAX)));
+        }
+    }
+  else
+    {
+      if (!satisfies_constraint_K (len))
+        len = force_reg (Pmode, len);
+      if (is_load)
+        {
+          rtx m_ops[] = {vec_reg, mask, gen_rtx_MEM (mode, ptr), stride};
+          emit_nonvlmax_insn (code_for_pred_strided_load (mode), BINARY_OP_TAMA,
+                              m_ops, len);
+        }
+      else
+        emit_insn (gen_pred_strided_store (mode, gen_rtx_MEM (mode, ptr), mask,
+                                           stride, vec_reg, len,
+                                           get_avl_type_rtx (NONVLMAX)));
+    }
+}
+
 /* Expand COND_LEN_*.  */
 void
 expand_cond_len_ternop (unsigned icode, rtx *ops)
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_strided_load-1.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_strided_load-1.c
new file mode 100644
index 00000000000..b2b6a03189d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_strided_load-1.c
@@ -0,0 +1,47 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv_zvfh -mabi=lp64d -O3 --param riscv-autovec-preference=scalable -fno-vect-cost-model -ffast-math -fdump-tree-optimized-details" } */
+
+#include
+
+#ifndef INDEX8
+#define INDEX8 int8_t
+#define INDEX16 int16_t
+#define INDEX32 int32_t
+#define INDEX64 int64_t
+#endif
+
+#define TEST_LOOP(DATA_TYPE, BITS) \
+  void __attribute__ ((noinline, noclone)) \
+  f_##DATA_TYPE##_##BITS (DATA_TYPE *restrict dest, DATA_TYPE *restrict src, \
+                          INDEX##BITS stride, DATA_TYPE *restrict cond, \
+                          INDEX##BITS n) \
+  { \
+    for (INDEX##BITS i = 0; i < n; ++i) \
+      if (cond[i * stride]) \
+        dest[i] += src[i * stride]; \
+  }
+
+#define TEST_TYPE(T, DATA_TYPE) \
+  T (DATA_TYPE, 8) \
+  T (DATA_TYPE, 16) \
+  T (DATA_TYPE, 32) \
+  T (DATA_TYPE, 64)
+
+#define TEST_ALL(T) \
+  TEST_TYPE (T, int8_t) \
+  TEST_TYPE (T, uint8_t) \
+  TEST_TYPE (T, int16_t) \
+  TEST_TYPE (T, uint16_t) \
+  TEST_TYPE (T, _Float16) \
+  TEST_TYPE (T, int32_t) \
+  TEST_TYPE (T, uint32_t) \
+  TEST_TYPE (T, float) \
+  TEST_TYPE (T, int64_t) \
+  TEST_TYPE (T, uint64_t) \
+  TEST_TYPE (T, double)
+
+TEST_ALL (TEST_LOOP)
+
+/* { dg-final { scan-tree-dump-times " \.MASK_LEN_STRIDED_LOAD" 132 "optimized" } } */
+/* { dg-final { scan-tree-dump-not " \.GATHER_LOAD" "optimized" } } */
+/* { dg-final { scan-tree-dump-not " \.MASK_GATHER_LOAD" "optimized" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_strided_load_run-1.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_strided_load_run-1.c
new file mode 100644
index 00000000000..08e70ad4e44
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_strided_load_run-1.c
@@ -0,0 +1,97 @@
+/* { dg-do run { target { riscv_v } } } */
+/* { dg-additional-options "-mcmodel=medany" } */
+
+#include "mask_strided_load-1.c"
+#include
+
+int
+main (void)
+{
+  /* FIXME: The purpose of this assembly is to ensure that the vtype register is
+     initialized before instructions such as vmv1r.v are executed. Otherwise you
+     will get illegal instruction errors when running with spike+pk. This is an
+     interim solution to reduce unnecessary failures and a unified solution
+     will come later. */
+  asm volatile("vsetivli x0, 0, e8, m1, ta, ma");
+#define RUN_LOOP(DATA_TYPE, BITS) \
+  DATA_TYPE dest_##DATA_TYPE##_##BITS[(BITS - 3) * (BITS + 13)]; \
+  DATA_TYPE dest2_##DATA_TYPE##_##BITS[(BITS - 3) * (BITS + 13)]; \
+  DATA_TYPE src_##DATA_TYPE##_##BITS[(BITS - 3) * (BITS + 13)]; \
+  DATA_TYPE cond_##DATA_TYPE##_##BITS[(BITS - 3) * (BITS + 13)]; \
+  INDEX##BITS stride_##DATA_TYPE##_##BITS = (BITS - 3); \
+  INDEX##BITS n_##DATA_TYPE##_##BITS = (BITS + 13); \
+  for (INDEX##BITS i = 0; \
+       i < stride_##DATA_TYPE##_##BITS * n_##DATA_TYPE##_##BITS; i++) \
+    { \
+      dest_##DATA_TYPE##_##BITS[i] \
+        = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1)); \
+      dest2_##DATA_TYPE##_##BITS[i] \
+        = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1)); \
+      src_##DATA_TYPE##_##BITS[i] \
+        = (DATA_TYPE) ((i * 13 + 9107) & (BITS - 1)); \
+      cond_##DATA_TYPE##_##BITS[i] = (DATA_TYPE) (i & 1); \
+    } \
+  f_##DATA_TYPE##_##BITS (dest_##DATA_TYPE##_##BITS, src_##DATA_TYPE##_##BITS, \
+                          stride_##DATA_TYPE##_##BITS, \
+                          cond_##DATA_TYPE##_##BITS, n_##DATA_TYPE##_##BITS); \
+  for (int i = 0; i < n_##DATA_TYPE##_##BITS; i++) \
+    { \
+      if (cond_##DATA_TYPE##_##BITS[i * stride_##DATA_TYPE##_##BITS]) \
+        assert ( \
+          dest_##DATA_TYPE##_##BITS[i] \
+          == (dest2_##DATA_TYPE##_##BITS[i] \
+              + src_##DATA_TYPE##_##BITS[i * stride_##DATA_TYPE##_##BITS])); \
+      else \
+        assert (dest_##DATA_TYPE##_##BITS[i] \
+                == dest2_##DATA_TYPE##_##BITS[i]); \
+    }
+
+  RUN_LOOP (int8_t, 8)
+  RUN_LOOP (uint8_t, 8)
+  RUN_LOOP (int16_t, 8)
+  RUN_LOOP (uint16_t, 8)
+  RUN_LOOP (_Float16, 8)
+  RUN_LOOP (int32_t, 8)
+  RUN_LOOP (uint32_t, 8)
+  RUN_LOOP (float, 8)
+  RUN_LOOP (int64_t, 8)
+  RUN_LOOP (uint64_t, 8)
+  RUN_LOOP (double, 8)
+
+  RUN_LOOP (int8_t, 16)
+  RUN_LOOP (uint8_t, 16)
+  RUN_LOOP (int16_t, 16)
+  RUN_LOOP (uint16_t, 16)
+  RUN_LOOP (_Float16, 16)
+  RUN_LOOP (int32_t, 16)
+  RUN_LOOP (uint32_t, 16)
+  RUN_LOOP (float, 16)
+  RUN_LOOP (int64_t, 16)
+  RUN_LOOP (uint64_t, 16)
+  RUN_LOOP (double, 16)
+
+  RUN_LOOP (int8_t, 32)
+  RUN_LOOP (uint8_t, 32)
+  RUN_LOOP (int16_t, 32)
+  RUN_LOOP (uint16_t, 32)
+  RUN_LOOP (_Float16, 32)
+  RUN_LOOP (int32_t, 32)
+  RUN_LOOP (uint32_t, 32)
+  RUN_LOOP (float, 32)
+  RUN_LOOP (int64_t, 32)
+  RUN_LOOP (uint64_t, 32)
+  RUN_LOOP (double, 32)
+
+  RUN_LOOP (int8_t, 64)
+  RUN_LOOP (uint8_t, 64)
+  RUN_LOOP (int16_t, 64)
+  RUN_LOOP (uint16_t, 64)
+  RUN_LOOP (_Float16, 64)
+  RUN_LOOP (int32_t, 64)
+  RUN_LOOP (uint32_t, 64)
+  RUN_LOOP (float, 64)
+  RUN_LOOP (int64_t, 64)
+  RUN_LOOP (uint64_t, 64)
+  RUN_LOOP (double, 64)
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_strided_store-1.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_strided_store-1.c
new file mode 100644
index 00000000000..a832af2ba57
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_strided_store-1.c
@@ -0,0 +1,48 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv_zvfh -mabi=lp64d -O3 --param riscv-autovec-preference=scalable -fno-vect-cost-model -ffast-math -fdump-tree-optimized-details" } */
+
+#include
+
+#ifndef INDEX8
+#define INDEX8 int8_t
+#define INDEX16 int16_t
+#define INDEX32 int32_t
+#define INDEX64 int64_t
+#endif
+
+#define TEST_LOOP(DATA_TYPE, BITS) \
+  void __attribute__ ((noinline, noclone)) \
+  f_##DATA_TYPE##_##BITS (DATA_TYPE *restrict dest, DATA_TYPE *restrict src, \
+                          INDEX##BITS stride, DATA_TYPE *restrict cond, \
+                          INDEX##BITS n) \
+  { \
+    for (INDEX##BITS i = 0; i < n; ++i) \
+      if (cond[i * stride]) \
+        dest[i * stride] = src[i] + BITS; \
+  }
+
+#define TEST_TYPE(T, DATA_TYPE) \
+  T (DATA_TYPE, 8) \
+  T (DATA_TYPE, 16) \
+  T (DATA_TYPE, 32) \
+  T (DATA_TYPE, 64)
+
+#define TEST_ALL(T) \
+  TEST_TYPE (T, int8_t) \
+  TEST_TYPE (T, uint8_t) \
+  TEST_TYPE (T, int16_t) \
+  TEST_TYPE (T, uint16_t) \
+  TEST_TYPE (T, _Float16) \
+  TEST_TYPE (T, int32_t) \
+  TEST_TYPE (T, uint32_t) \
+  TEST_TYPE (T, float) \
+  TEST_TYPE (T, int64_t) \
+  TEST_TYPE (T, uint64_t) \
+  TEST_TYPE (T, double)
+
+TEST_ALL (TEST_LOOP)
+
+/* { dg-final { scan-tree-dump-times " \.MASK_LEN_STRIDED_STORE" 66 "optimized" } } */
+/* { dg-final { scan-tree-dump-times " \.MASK_LEN_STRIDED_LOAD" 66 "optimized" } } */
+/* { dg-final { scan-tree-dump-not " \.SCATTER_STORE" "optimized" } } */
+/* { dg-final { scan-tree-dump-not " \.MASK_SCATTER_STORE" "optimized" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_strided_store_run-1.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_strided_store_run-1.c
new file mode 100644
index 00000000000..58956bd6925
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/mask_strided_store_run-1.c
@@ -0,0 +1,89 @@
+/* { dg-do run { target { riscv_v } } } */
+
+#include "mask_strided_store-1.c"
+#include
+
+int
+main (void)
+{
+#define RUN_LOOP(DATA_TYPE, BITS) \
+  DATA_TYPE dest_##DATA_TYPE##_##BITS[(BITS - 3) * (BITS + 13)]; \
+  DATA_TYPE dest2_##DATA_TYPE##_##BITS[(BITS - 3) * (BITS + 13)]; \
+  DATA_TYPE src_##DATA_TYPE##_##BITS[(BITS - 3) * (BITS + 13)]; \
+  DATA_TYPE cond_##DATA_TYPE##_##BITS[(BITS - 3) * (BITS + 13)]; \
+  INDEX##BITS stride_##DATA_TYPE##_##BITS = (BITS - 3); \
+  INDEX##BITS n_##DATA_TYPE##_##BITS = (BITS + 13); \
+  for (INDEX##BITS i = 0; \
+       i < stride_##DATA_TYPE##_##BITS * n_##DATA_TYPE##_##BITS; i++) \
+    { \
+      dest_##DATA_TYPE##_##BITS[i] \
+        = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1)); \
+      dest2_##DATA_TYPE##_##BITS[i] \
+        = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1)); \
+      src_##DATA_TYPE##_##BITS[i] \
+        = (DATA_TYPE) ((i * 13 + 9107) & (BITS - 1)); \
+      cond_##DATA_TYPE##_##BITS[i] = (DATA_TYPE) (i & 1); \
+    } \
+  f_##DATA_TYPE##_##BITS (dest_##DATA_TYPE##_##BITS, src_##DATA_TYPE##_##BITS, \
+                          stride_##DATA_TYPE##_##BITS, \
+                          cond_##DATA_TYPE##_##BITS, n_##DATA_TYPE##_##BITS); \
+  for (int i = 0; i < n_##DATA_TYPE##_##BITS; i++) \
+    { \
+      if (cond_##DATA_TYPE##_##BITS[i * stride_##DATA_TYPE##_##BITS]) \
+        assert (dest_##DATA_TYPE##_##BITS[i * stride_##DATA_TYPE##_##BITS] \
+                == (src_##DATA_TYPE##_##BITS[i] + BITS)); \
+      else \
+        assert ( \
+          dest_##DATA_TYPE##_##BITS[i * stride_##DATA_TYPE##_##BITS] \
+          == dest2_##DATA_TYPE##_##BITS[i * stride_##DATA_TYPE##_##BITS]); \
+    }
+
+  RUN_LOOP (int8_t, 8)
+  RUN_LOOP (uint8_t, 8)
+  RUN_LOOP (int16_t, 8)
+  RUN_LOOP (uint16_t, 8)
+  RUN_LOOP (_Float16, 8)
+  RUN_LOOP (int32_t, 8)
+  RUN_LOOP (uint32_t, 8)
+  RUN_LOOP (float, 8)
+  RUN_LOOP (int64_t, 8)
+  RUN_LOOP (uint64_t, 8)
+  RUN_LOOP (double, 8)
+
+  RUN_LOOP (int8_t, 16)
+  RUN_LOOP (uint8_t, 16)
+  RUN_LOOP (int16_t, 16)
+  RUN_LOOP (uint16_t, 16)
+  RUN_LOOP (_Float16, 16)
+  RUN_LOOP (int32_t, 16)
+  RUN_LOOP (uint32_t, 16)
+  RUN_LOOP (float, 16)
+  RUN_LOOP (int64_t, 16)
+  RUN_LOOP (uint64_t, 16)
+  RUN_LOOP (double, 16)
+
+  RUN_LOOP (int8_t, 32)
+  RUN_LOOP (uint8_t, 32)
+  RUN_LOOP (int16_t, 32)
+  RUN_LOOP (uint16_t, 32)
+  RUN_LOOP (_Float16, 32)
+  RUN_LOOP (int32_t, 32)
+  RUN_LOOP (uint32_t, 32)
+  RUN_LOOP (float, 32)
+  RUN_LOOP (int64_t, 32)
+  RUN_LOOP (uint64_t, 32)
+  RUN_LOOP (double, 32)
+
+  RUN_LOOP (int8_t, 64)
+  RUN_LOOP (uint8_t, 64)
+  RUN_LOOP (int16_t, 64)
+  RUN_LOOP (uint16_t, 64)
+  RUN_LOOP (_Float16, 64)
+  RUN_LOOP (int32_t, 64)
+  RUN_LOOP (uint32_t, 64)
+  RUN_LOOP (float, 64)
+  RUN_LOOP (int64_t, 64)
+  RUN_LOOP (uint64_t, 64)
+  RUN_LOOP (double, 64)
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/strided_load-1.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/strided_load-1.c
index b1e6a17543f..8f383069b0e 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/strided_load-1.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/strided_load-1.c
@@ -40,6 +40,6 @@
 
 TEST_ALL (TEST_LOOP)
 
-/* { dg-final { scan-tree-dump-times " \.MASK_LEN_GATHER_LOAD" 66 "optimized" } } */
+/* { dg-final { scan-tree-dump-times " \.MASK_LEN_STRIDED_LOAD" 66 "optimized" } } */
 /* { dg-final { scan-tree-dump-not " \.GATHER_LOAD" "optimized" } } */
 /* { dg-final { scan-tree-dump-not " \.MASK_GATHER_LOAD" "optimized" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/strided_load-2.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/strided_load-2.c
index 2c9e7dd14a8..3b497e69088 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/strided_load-2.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/strided_load-2.c
@@ -40,6 +40,6 @@
 
 TEST_ALL (TEST_LOOP)
 
-/* { dg-final { scan-tree-dump-times " \.MASK_LEN_GATHER_LOAD" 33 "optimized" } } */
+/* { dg-final { scan-tree-dump-times " \.MASK_LEN_STRIDED_LOAD" 33 "optimized" } } */
 /* { dg-final { scan-tree-dump-not " \.GATHER_LOAD" "optimized" } } */
 /* { dg-final { scan-tree-dump-not " \.MASK_GATHER_LOAD" "optimized" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/strided_load-3.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/strided_load-3.c
new file mode 100644
index 00000000000..77e34a85575
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/strided_load-3.c
@@ -0,0 +1,45 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv_zvfh -mabi=lp64d -O3 --param riscv-autovec-preference=scalable -fno-vect-cost-model -ffast-math -fdump-tree-optimized-details" } */
+
+#include
+
+#ifndef INDEX8
+#define INDEX8 uint8_t
+#define INDEX16 uint16_t
+#define INDEX32 uint32_t
+#define INDEX64 uint64_t
+#endif
+
+#define TEST_LOOP(DATA_TYPE, BITS) \
+  void __attribute__ ((noinline, noclone)) \
+  f_##DATA_TYPE##_##BITS (DATA_TYPE *restrict dest, DATA_TYPE *restrict src, \
+                          INDEX##BITS stride, INDEX##BITS n) \
+  { \
+    for (INDEX##BITS i = 0; i < n; ++i) \
+      dest[i] += src[i * stride]; \
+  }
+
+#define TEST_TYPE(T, DATA_TYPE) \
+  T (DATA_TYPE, 8) \
+  T (DATA_TYPE, 16) \
+  T (DATA_TYPE, 32) \
+  T (DATA_TYPE, 64)
+
+#define TEST_ALL(T) \
+  TEST_TYPE (T, int8_t) \
+  TEST_TYPE (T, uint8_t) \
+  TEST_TYPE (T, int16_t) \
+  TEST_TYPE (T, uint16_t) \
+  TEST_TYPE (T, _Float16) \
+  TEST_TYPE (T, int32_t) \
+  TEST_TYPE (T, uint32_t) \
+  TEST_TYPE (T, float) \
+  TEST_TYPE (T, int64_t) \
+  TEST_TYPE (T, uint64_t) \
+  TEST_TYPE (T, double)
+
+TEST_ALL (TEST_LOOP)
+
+/* { dg-final { scan-tree-dump-times " \.MASK_LEN_STRIDED_LOAD" 55 "optimized" } } */
+/* { dg-final { scan-tree-dump-not " \.GATHER_LOAD" "optimized" } } */
+/* { dg-final { scan-tree-dump-not " \.MASK_GATHER_LOAD" "optimized" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/strided_load_run-3.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/strided_load_run-3.c
new file mode 100644
index 00000000000..2835e502cfa
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/strided_load_run-3.c
@@ -0,0 +1,84 @@
+/* { dg-do run { target { riscv_v } } } */
+
+#include "strided_load-3.c"
+#include
+
+int
+main (void)
+{
+#define RUN_LOOP(DATA_TYPE, BITS) \
+  DATA_TYPE dest_##DATA_TYPE##_##BITS[(BITS - 3) * (BITS + 13)]; \
+  DATA_TYPE dest2_##DATA_TYPE##_##BITS[(BITS - 3) * (BITS + 13)]; \
+  DATA_TYPE src_##DATA_TYPE##_##BITS[(BITS - 3) * (BITS + 13)]; \
+  INDEX##BITS stride_##DATA_TYPE##_##BITS = (BITS - 3); \
+  INDEX##BITS n_##DATA_TYPE##_##BITS = (BITS + 13); \
+  for (INDEX##BITS i = 0; \
+       i < stride_##DATA_TYPE##_##BITS * n_##DATA_TYPE##_##BITS; i++) \
+    { \
+      dest_##DATA_TYPE##_##BITS[i] \
+        = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1)); \
+      dest2_##DATA_TYPE##_##BITS[i] \
+        = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1)); \
+      src_##DATA_TYPE##_##BITS[i] \
+        = (DATA_TYPE) ((i * 13 + 9107) & (BITS - 1)); \
+    } \
+  f_##DATA_TYPE##_##BITS (dest_##DATA_TYPE##_##BITS, src_##DATA_TYPE##_##BITS, \
+                          stride_##DATA_TYPE##_##BITS, \
+                          n_##DATA_TYPE##_##BITS); \
+  for (int i = 0; i < n_##DATA_TYPE##_##BITS; i++) \
+    { \
+      assert ( \
+        dest_##DATA_TYPE##_##BITS[i] \
+        == (dest2_##DATA_TYPE##_##BITS[i] \
+            + src_##DATA_TYPE##_##BITS[i * stride_##DATA_TYPE##_##BITS])); \
+    }
+
+  RUN_LOOP (int8_t, 8)
+  RUN_LOOP (uint8_t, 8)
+  RUN_LOOP (int16_t, 8)
+  RUN_LOOP (uint16_t, 8)
+  RUN_LOOP (_Float16, 8)
+  RUN_LOOP (int32_t, 8)
+  RUN_LOOP (uint32_t, 8)
+  RUN_LOOP (float, 8)
+  RUN_LOOP (int64_t, 8)
+  RUN_LOOP (uint64_t, 8)
+  RUN_LOOP (double, 8)
+
+  RUN_LOOP (int8_t, 16)
+  RUN_LOOP (uint8_t, 16)
+  RUN_LOOP (int16_t, 16)
+  RUN_LOOP (uint16_t, 16)
+  RUN_LOOP (_Float16, 16)
+  RUN_LOOP (int32_t, 16)
+  RUN_LOOP (uint32_t, 16)
+  RUN_LOOP (float, 16)
+  RUN_LOOP (int64_t, 16)
+  RUN_LOOP (uint64_t, 16)
+  RUN_LOOP (double, 16)
+
+  RUN_LOOP (int8_t, 32)
+  RUN_LOOP (uint8_t, 32)
+  RUN_LOOP (int16_t, 32)
+  RUN_LOOP (uint16_t, 32)
+  RUN_LOOP (_Float16, 32)
+  RUN_LOOP (int32_t, 32)
+  RUN_LOOP (uint32_t, 32)
+  RUN_LOOP (float, 32)
+  RUN_LOOP (int64_t, 32)
+  RUN_LOOP (uint64_t, 32)
+  RUN_LOOP (double, 32)
+
+  RUN_LOOP (int8_t, 64)
+  RUN_LOOP (uint8_t, 64)
+  RUN_LOOP (int16_t, 64)
+  RUN_LOOP (uint16_t, 64)
+  RUN_LOOP (_Float16, 64)
+  RUN_LOOP (int32_t, 64)
+  RUN_LOOP (uint32_t, 64)
+  RUN_LOOP (float, 64)
+  RUN_LOOP (int64_t, 64)
+  RUN_LOOP (uint64_t, 64)
+  RUN_LOOP (double, 64)
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/strided_store-1.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/strided_store-1.c
index 3e6a34029b3..d8da51f3ac5 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/strided_store-1.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/strided_store-1.c
@@ -40,6 +40,6 @@
 
 TEST_ALL (TEST_LOOP)
 
-/* { dg-final { scan-tree-dump-times " \.MASK_LEN_SCATTER_STORE" 66 "optimized" } } */
+/* { dg-final { scan-tree-dump-times " \.MASK_LEN_STRIDED_STORE" 66 "optimized" } } */
 /* { dg-final { scan-tree-dump-not " \.SCATTER_STORE" "optimized" } } */
 /* { dg-final { scan-tree-dump-not " \.MASK_SCATTER_STORE" "optimized" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/strided_store-2.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/strided_store-2.c
index 6906af17d84..9ffea4e0d99 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/strided_store-2.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/strided_store-2.c
@@ -40,6 +40,6 @@
 
 TEST_ALL (TEST_LOOP)
 
-/* { dg-final { scan-tree-dump-times " \.MASK_LEN_SCATTER_STORE" 44 "optimized" } } */
+/* { dg-final { scan-tree-dump-times " \.MASK_LEN_STRIDED_STORE" 44 "optimized" } } */
 /* { dg-final { scan-tree-dump-not " \.SCATTER_STORE" "optimized" } } */
 /* { dg-final { scan-tree-dump-not " \.MASK_SCATTER_STORE" "optimized" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/strided_store-3.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/strided_store-3.c
new file mode 100644
index 00000000000..eb0bcddaade
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/strided_store-3.c
@@ -0,0 +1,45 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv_zvfh -mabi=lp64d -O3 --param riscv-autovec-preference=scalable -fno-vect-cost-model -ffast-math -fdump-tree-optimized-details" } */
+
+#include
+
+#ifndef INDEX8
+#define INDEX8 uint8_t
+#define INDEX16 uint16_t
+#define INDEX32 uint32_t
+#define INDEX64 uint64_t
+#endif
+
+#define TEST_LOOP(DATA_TYPE, BITS) \
+  void __attribute__ ((noinline, noclone)) \
+  f_##DATA_TYPE##_##BITS (DATA_TYPE *restrict dest, DATA_TYPE *restrict src, \
+                          INDEX##BITS stride, INDEX##BITS n) \
+  { \
+    for (INDEX##BITS i = 0; i < n; ++i) \
+      dest[i * stride] = src[i] + BITS; \
+  }
+
+#define TEST_TYPE(T, DATA_TYPE) \
+  T (DATA_TYPE, 8) \
+  T (DATA_TYPE, 16) \
+  T (DATA_TYPE, 32) \
+  T (DATA_TYPE, 64)
+
+#define TEST_ALL(T) \
+  TEST_TYPE (T, int8_t) \
+  TEST_TYPE (T, uint8_t) \
+  TEST_TYPE (T, int16_t) \
+  TEST_TYPE (T, uint16_t) \
+  TEST_TYPE (T, _Float16) \
+  TEST_TYPE (T, int32_t) \
+  TEST_TYPE (T, uint32_t) \
+  TEST_TYPE (T, float) \
+  TEST_TYPE (T, int64_t) \
+  TEST_TYPE (T, uint64_t) \
+  TEST_TYPE (T, double)
+
+TEST_ALL (TEST_LOOP)
+
+/* { dg-final { scan-tree-dump-times " \.MASK_LEN_STRIDED_STORE" 55 "optimized" } } */
+/* { dg-final { scan-tree-dump-not " \.SCATTER_STORE" "optimized" } } */
+/* { dg-final { scan-tree-dump-not " \.MASK_SCATTER_STORE" "optimized" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/strided_store_run-3.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/strided_store_run-3.c
new file mode 100644
index 00000000000..079336737bb
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/gather-scatter/strided_store_run-3.c
@@ -0,0 +1,82 @@
+/* { dg-do run { target { riscv_v } } } */
+
+#include "strided_store-3.c"
+#include
+
+int
+main (void)
+{
+#define RUN_LOOP(DATA_TYPE, BITS) \
+  DATA_TYPE dest_##DATA_TYPE##_##BITS[(BITS - 3) * (BITS + 13)]; \
+  DATA_TYPE dest2_##DATA_TYPE##_##BITS[(BITS - 3) * (BITS + 13)]; \
+  DATA_TYPE src_##DATA_TYPE##_##BITS[(BITS - 3) * (BITS + 13)]; \
+  INDEX##BITS stride_##DATA_TYPE##_##BITS = (BITS - 3); \
+  INDEX##BITS n_##DATA_TYPE##_##BITS = (BITS + 13); \
+  for (INDEX##BITS i = 0; \
+       i < stride_##DATA_TYPE##_##BITS * n_##DATA_TYPE##_##BITS; i++) \
+    { \
+      dest_##DATA_TYPE##_##BITS[i] \
+        = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1)); \
+      dest2_##DATA_TYPE##_##BITS[i] \
+        = (DATA_TYPE) ((i * 81 + 735) & (BITS - 1)); \
+      src_##DATA_TYPE##_##BITS[i] \
+        = (DATA_TYPE) ((i * 13 + 9107) & (BITS - 1)); \
+    } \
+  f_##DATA_TYPE##_##BITS (dest_##DATA_TYPE##_##BITS, src_##DATA_TYPE##_##BITS, \
+                          stride_##DATA_TYPE##_##BITS, \
+                          n_##DATA_TYPE##_##BITS); \
+  for (int i = 0; i < n_##DATA_TYPE##_##BITS; i++) \
+    { \
+      assert (dest_##DATA_TYPE##_##BITS[i * stride_##DATA_TYPE##_##BITS] \
+              == (src_##DATA_TYPE##_##BITS[i] + BITS)); \
+    }
+
+  RUN_LOOP (int8_t, 8)
+  RUN_LOOP (uint8_t, 8)
+  RUN_LOOP (int16_t, 8)
+  RUN_LOOP (uint16_t, 8)
+  RUN_LOOP (_Float16, 8)
+  RUN_LOOP (int32_t, 8)
+  RUN_LOOP (uint32_t, 8)
+  RUN_LOOP (float, 8)
+  RUN_LOOP (int64_t, 8)
+  RUN_LOOP (uint64_t, 8)
+  RUN_LOOP (double, 8)
+
+  RUN_LOOP (int8_t, 16)
+  RUN_LOOP (uint8_t, 16)
+  RUN_LOOP (int16_t, 16)
+  RUN_LOOP (uint16_t, 16)
+  RUN_LOOP (_Float16, 16)
+  RUN_LOOP (int32_t, 16)
+  RUN_LOOP (uint32_t, 16)
+  RUN_LOOP (float, 16)
+  RUN_LOOP (int64_t, 16)
+  RUN_LOOP (uint64_t, 16)
+  RUN_LOOP (double, 16)
+
+  RUN_LOOP (int8_t, 32)
+  RUN_LOOP (uint8_t, 32)
+  RUN_LOOP (int16_t, 32)
+  RUN_LOOP (uint16_t, 32)
+  RUN_LOOP (_Float16, 32)
+  RUN_LOOP (int32_t, 32)
+  RUN_LOOP (uint32_t, 32)
+  RUN_LOOP (float, 32)
+  RUN_LOOP (int64_t, 32)
+  RUN_LOOP (uint64_t, 32)
+  RUN_LOOP (double, 32)
+
+  RUN_LOOP (int8_t, 64)
+  RUN_LOOP (uint8_t, 64)
+  RUN_LOOP (int16_t, 64)
+  RUN_LOOP (uint16_t, 64)
+  RUN_LOOP (_Float16, 64)
+  RUN_LOOP (int32_t, 64)
+  RUN_LOOP (uint32_t, 64)
+  RUN_LOOP (float, 64)
+  RUN_LOOP (int64_t, 64)
+  RUN_LOOP (uint64_t, 64)
+  RUN_LOOP (double, 64)
+  return 0;
+}
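
Reviewer note (not part of the patch): for readers unfamiliar with the operation that expand_strided_load_store lowers, here is a minimal scalar sketch of what a masked, length-limited strided load and store compute for one element type. It is an illustration only, under stated assumptions: the names ref_strided_load_i32 and ref_strided_store_i32 are hypothetical, the stride is assumed to be a byte distance (as it is for the RVV vlse/vsse instructions), and the mask is modelled as one byte per element. Because the expander uses BINARY_OP_TAMA, inactive and tail elements of a real vector load are unspecified; the sketch simply leaves them untouched so that it stays deterministic.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar model, not GCC code.  For each i < len with
   mask[i] set, load one int32_t from base + i * stride_bytes.  */
static void
ref_strided_load_i32 (int32_t *dest, const void *base,
                      ptrdiff_t stride_bytes, const uint8_t *mask,
                      size_t len)
{
  for (size_t i = 0; i < len; i++)
    if (mask[i])
      memcpy (&dest[i], (const char *) base + (ptrdiff_t) i * stride_bytes,
              sizeof (int32_t));
}

/* Hypothetical scalar model of the store direction: for each i < len
   with mask[i] set, store src[i] to base + i * stride_bytes.  */
static void
ref_strided_store_i32 (void *base, ptrdiff_t stride_bytes,
                       const int32_t *src, const uint8_t *mask,
                       size_t len)
{
  for (size_t i = 0; i < len; i++)
    if (mask[i])
      memcpy ((char *) base + (ptrdiff_t) i * stride_bytes, &src[i],
              sizeof (int32_t));
}
```

In these terms, the loop `dest[i] += src[i * stride]` from strided_load-3.c corresponds to a load with stride_bytes = stride * sizeof (element) and an all-ones mask. The len argument plays the role of operand 4 in the new patterns.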