From patchwork Wed Oct 18 04:32:59 2023
X-Patchwork-Submitter: "juzhe.zhong@rivai.ai"
X-Patchwork-Id: 154626
From: Juzhe-Zhong
To: gcc-patches@gcc.gnu.org
Cc: kito.cheng@gmail.com, kito.cheng@sifive.com, jeffreyalaw@gmail.com, rdapp.gcc@gmail.com, Juzhe-Zhong
Subject: [PATCH] RISC-V: Optimize consecutive permutation index pattern by vrgather.vi/vx
Date: Wed, 18 Oct 2023 12:32:59 +0800
Message-Id: <20231018043259.1873023-1-juzhe.zhong@rivai.ai>
List-Id: Gcc-patches mailing list

This patch optimizes the following permutation, whose index vector is a
repeated run of consecutive indices:

typedef char vnx16i __attribute__ ((vector_size (16)));

#define MASK_16 12, 13, 14, 15, 12, 13, 14, 15, 12, 13, 14, 15, 12, 13, 14, 15

vnx16i __attribute__ ((noinline, noclone))
test_1 (vnx16i x, vnx16i y)
{
  return __builtin_shufflevector (x, y, MASK_16);
}

Before this patch:

	lui	a5,%hi(.LC0)
	addi	a5,a5,%lo(.LC0)
	vsetivli	zero,16,e8,m1,ta,ma
	vle8.v	v3,0(a5)
	vle8.v	v2,0(a1)
	vrgather.vv	v1,v2,v3
	vse8.v	v1,0(a0)
	ret

After this patch:

	vsetivli	zero,16,e8,mf8,ta,ma
	vle8.v	v2,0(a1)
	vsetivli	zero,4,e32,mf2,ta,ma
	vrgather.vi	v1,v2,3
	vsetivli	zero,16,e8,mf8,ta,ma
	vse8.v	v1,0(a0)
	ret

Overall this saves one instruction, and the instruction removed is a vector
load, which is much more expensive than VL toggling.  In addition, with this
patch we use vrgather.vi instead of vrgather.vv, which reduces vector
register consumption by one.

gcc/ChangeLog:

	* config/riscv/riscv-v.cc (shuffle_consecutive_patterns): New function.
	(expand_vec_perm_const_1): Add consecutive pattern recognition.

gcc/testsuite/ChangeLog:

	* gcc.target/riscv/rvv/autovec/vls/def.h: Add new test.
	* gcc.target/riscv/rvv/autovec/vls-vlmax/consecutive-1.c: New test.
	* gcc.target/riscv/rvv/autovec/vls-vlmax/consecutive-2.c: New test.
	* gcc.target/riscv/rvv/autovec/vls-vlmax/consecutive_run-1.c: New test.
	* gcc.target/riscv/rvv/autovec/vls-vlmax/consecutive_run-2.c: New test.
	* gcc.target/riscv/rvv/autovec/vls/consecutive-1.c: New test.
	* gcc.target/riscv/rvv/autovec/vls/consecutive-2.c: New test.
	* gcc.target/riscv/rvv/autovec/vls/consecutive-3.c: New test.

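As a rough illustration of why the two instruction sequences above compute the
same result, here is a scalar sketch in C.  It is not part of the patch, and
the helper names (shuffle_bytes, broadcast_word, mask16_matches_broadcast) are
made up for this note.  It assumes the usual __builtin_shufflevector
convention (indices 0..15 select from x, 16..31 from y) and models
"vrgather.vi vd,vs,3" at SEW = 32 as copying 32-bit lane 3 into every lane.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Scalar model of a two-input byte shuffle on 16-byte vectors:
   indices 0..15 select from x, indices 16..31 select from y.  */
static void
shuffle_bytes (uint8_t out[16], const uint8_t x[16], const uint8_t y[16],
               const uint8_t mask[16])
{
  for (int i = 0; i < 16; i++)
    {
      assert (mask[i] < 32);
      out[i] = mask[i] < 16 ? x[mask[i]] : y[mask[i] - 16];
    }
}

/* Scalar model of "vrgather.vi vd, vs, idx" with SEW = 32 and 4 elements:
   every 32-bit lane of the result is a copy of lane idx of the source.  */
static void
broadcast_word (uint8_t out[16], const uint8_t src[16], int idx)
{
  assert (idx >= 0 && idx < 4);
  for (int lane = 0; lane < 4; lane++)
    memcpy (out + 4 * lane, src + 4 * idx, 4);
}

/* Return 1 if the byte shuffle with MASK_16 and the 32-bit broadcast of
   lane 3 of x produce identical bytes, 0 otherwise.  */
static int
mask16_matches_broadcast (const uint8_t x[16], const uint8_t y[16])
{
  static const uint8_t mask[16]
    = { 12, 13, 14, 15, 12, 13, 14, 15, 12, 13, 14, 15, 12, 13, 14, 15 };
  uint8_t by_shuffle[16];
  uint8_t by_broadcast[16];

  shuffle_bytes (by_shuffle, x, y, mask);
  broadcast_word (by_broadcast, x, 3);
  return memcmp (by_shuffle, by_broadcast, 16) == 0;
}
```

The index 3 is simply 12 / 4: the first byte index of the run divided by the
run length, which is the immediate seen in the "After this patch" assembly.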
---
 gcc/config/riscv/riscv-v.cc                   | 85 +++++++++++++++++
 .../rvv/autovec/vls-vlmax/consecutive-1.c     | 21 +++++
 .../rvv/autovec/vls-vlmax/consecutive-2.c     | 45 +++++++++
 .../rvv/autovec/vls-vlmax/consecutive_run-1.c | 27 ++++++
 .../rvv/autovec/vls-vlmax/consecutive_run-2.c | 51 ++++++++++
 .../riscv/rvv/autovec/vls/consecutive-1.c     | 94 +++++++++++++++++++
 .../riscv/rvv/autovec/vls/consecutive-2.c     | 68 ++++++++++++++
 .../riscv/rvv/autovec/vls/consecutive-3.c     | 68 ++++++++++++++
 .../gcc.target/riscv/rvv/autovec/vls/def.h    |  6 ++
 9 files changed, 465 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/consecutive-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/consecutive-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/consecutive_run-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/consecutive_run-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/consecutive-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/consecutive-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/consecutive-3.c

diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index 21d86c3f917..895c11d13fc 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -2822,6 +2822,89 @@ shuffle_merge_patterns (struct expand_vec_perm_d *d)
   return true;
 }
 
+/* Recognize the consecutive index that we can use a single
+   vrgather.v[x|i] to shuffle the vectors.
+
+   e.g. short[8] = VEC_PERM_EXPR
+   Use SEW = 32, index = 1 vrgather.vi to get the result.  */
+static bool
+shuffle_consecutive_patterns (struct expand_vec_perm_d *d)
+{
+  machine_mode vmode = d->vmode;
+  scalar_mode smode = GET_MODE_INNER (vmode);
+  poly_int64 vec_len = d->perm.length ();
+  HOST_WIDE_INT elt;
+
+  if (!vec_len.is_constant () || !d->perm[0].is_constant (&elt))
+    return false;
+  int vlen = vec_len.to_constant ();
+
+  /* Compute the last element index of consecutive pattern from the leading
+     consecutive elements.  */
+  int last_consecutive_idx = -1;
+  int consecutive_num = -1;
+  for (int i = 1; i < vlen; i++)
+    {
+      if (maybe_ne (d->perm[i], d->perm[i - 1] + 1))
+        break;
+      last_consecutive_idx = i;
+      consecutive_num = last_consecutive_idx + 1;
+    }
+
+  int new_vlen = vlen / consecutive_num;
+  if (last_consecutive_idx < 0 || consecutive_num == vlen
+      || !pow2p_hwi (consecutive_num) || !pow2p_hwi (new_vlen))
+    return false;
+  /* VEC_PERM <..., (index, index + 1, ... index + consecutive_num - 1)>.
+     All elements of index, index + 1, ... index + consecutive_num - 1 should
+     locate at the same vector.  */
+  if (maybe_ge (d->perm[0], vec_len)
+      != maybe_ge (d->perm[last_consecutive_idx], vec_len))
+    return false;
+  /* If a vector has 8 elements.  We allow optimizations on consecutive
+     patterns e.g. <0, 1, 2, 3, 0, 1, 2, 3> or <4, 5, 6, 7, 4, 5, 6, 7>.
+     Other patterns like <2, 3, 4, 5, 2, 3, 4, 5> are not feasible patterns
+     to be optimized.  */
+  if (d->perm[0].to_constant () % consecutive_num != 0)
+    return false;
+  unsigned int container_bits = consecutive_num * GET_MODE_BITSIZE (smode);
+  if (container_bits > 64)
+    return false;
+  else if (container_bits == 64)
+    {
+      if (!TARGET_VECTOR_ELEN_64)
+        return false;
+      else if (FLOAT_MODE_P (smode) && !TARGET_VECTOR_ELEN_FP_64)
+        return false;
+    }
+
+  /* Check the rest of elements are the same consecutive pattern.  */
+  for (int i = consecutive_num; i < vlen; i++)
+    if (maybe_ne (d->perm[i], d->perm[i % consecutive_num]))
+      return false;
+
+  if (FLOAT_MODE_P (smode))
+    smode = float_mode_for_size (container_bits).require ();
+  else
+    smode = int_mode_for_size (container_bits, 0).require ();
+  if (!get_vector_mode (smode, new_vlen).exists (&vmode))
+    return false;
+  machine_mode sel_mode = related_int_vector_mode (vmode).require ();
+
+  /* Success!  */
+  if (d->testing_p)
+    return true;
+
+  int index = elt / consecutive_num;
+  if (index >= new_vlen)
+    index = index - new_vlen;
+  rtx sel = gen_const_vector_dup (sel_mode, index);
+  rtx op = elt >= vlen ? d->op0 : d->op1;
+  emit_vlmax_gather_insn (gen_lowpart (vmode, d->target),
+                          gen_lowpart (vmode, op), sel);
+  return true;
+}
+
 /* Recognize the patterns that we can use compress operation to shuffle the
    vectors. The perm selector of compress pattern is divided into 2 part:
    The first part is the random index number < NUNITS.
@@ -3174,6 +3257,8 @@ expand_vec_perm_const_1 (struct expand_vec_perm_d *d)
         {
           if (shuffle_merge_patterns (d))
             return true;
+          if (shuffle_consecutive_patterns (d))
+            return true;
           if (shuffle_compress_patterns (d))
             return true;
           if (shuffle_decompress_patterns (d))
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/consecutive-1.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/consecutive-1.c
new file mode 100644
index 00000000000..7dc2b99f007
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/consecutive-1.c
@@ -0,0 +1,21 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64d -O3 --param riscv-autovec-preference=fixed-vlmax -Wno-psabi" } */
+
+#include
+
+typedef int8_t vnx4i __attribute__ ((vector_size (4)));
+typedef uint8_t vnx4ui __attribute__ ((vector_size (4)));
+
+#define MASK_4 0, 1, 0, 1
+
+vnx4i __attribute__ ((noinline, noclone)) test_1 (vnx4i x, vnx4i y)
+{
+  return __builtin_shufflevector (x, y, MASK_4);
+}
+
+vnx4ui __attribute__ ((noinline, noclone)) test_2 (vnx4ui x, vnx4ui y)
+{
+  return __builtin_shufflevector (x, y, MASK_4);
+}
+
+/* { dg-final { scan-assembler-times {\tvrgather\.vi} 2 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/consecutive-2.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/consecutive-2.c
new file mode 100644
index 00000000000..9aa91008016
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/consecutive-2.c
@@ -0,0 +1,45 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv_zvfh -mabi=lp64d -O3 --param riscv-autovec-preference=fixed-vlmax -Wno-psabi" } */
+
+#include
+
+typedef int8_t vnx8i __attribute__ ((vector_size (8)));
+typedef int16_t vnx4i __attribute__ ((vector_size (8)));
+typedef uint8_t vnx8ui __attribute__ ((vector_size (8)));
+typedef uint16_t vnx4ui __attribute__ ((vector_size (8)));
+typedef _Float16 vnx4f __attribute__ ((vector_size (8)));
+
+#define MASK_4 4, 5, 4, 5
+#define MASK_8 12, 13, 14, 15, 12, 13, 14, 15
+
+vnx8i __attribute__ ((noinline, noclone))
+test_1 (vnx8i x, vnx8i y)
+{
+  return __builtin_shufflevector (x, y, MASK_8);
+}
+
+vnx4i __attribute__ ((noinline, noclone))
+test_2 (vnx4i x, vnx4i y)
+{
+  return __builtin_shufflevector (x, y, MASK_4);
+}
+
+vnx8ui __attribute__ ((noinline, noclone))
+test_3 (vnx8ui x, vnx8ui y)
+{
+  return __builtin_shufflevector (x, y, MASK_8);
+}
+
+vnx4ui __attribute__ ((noinline, noclone))
+test_4 (vnx4ui x, vnx4ui y)
+{
+  return __builtin_shufflevector (x, y, MASK_4);
+}
+
+vnx4f __attribute__ ((noinline, noclone))
+test_5 (vnx4f x, vnx4f y)
+{
+  return __builtin_shufflevector (x, y, MASK_4);
+}
+
+/* { dg-final { scan-assembler-times {\tvrgather\.vi} 5 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/consecutive_run-1.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/consecutive_run-1.c
new file mode 100644
index 00000000000..d12424ea20a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/consecutive_run-1.c
@@ -0,0 +1,27 @@
+/* { dg-do run { target { riscv_v } } } */
+/* { dg-options "-O3 --param riscv-autovec-preference=fixed-vlmax -Wno-psabi" } */
+
+#include
+#include "consecutive-1.c"
+
+int
+main (void)
+{
+  vnx4i test_1_x = {99, 111, 2, 4};
+  vnx4i test_1_y = {4, 5, 7, 8};
+  vnx4i test_1_except = {99, 111, 99, 111};
+  vnx4i test_1_real;
+  test_1_real = test_1 (test_1_x, test_1_y);
+  for (int i = 0; i < 4; i++)
+    assert (test_1_real[i] == test_1_except[i]);
+
+  vnx4ui test_2_x = {99, 111, 2, 4};
+  vnx4ui test_2_y = {4, 5, 6, 8};
+  vnx4ui test_2_except = {99, 111, 99, 111};
+  vnx4ui test_2_real;
+  test_2_real = test_2 (test_2_x, test_2_y);
+  for (int i = 0; i < 4; i++)
+    assert (test_2_real[i] == test_2_except[i]);
+
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/consecutive_run-2.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/consecutive_run-2.c
new file mode 100644
index 00000000000..8362e9fe87f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/consecutive_run-2.c
@@ -0,0 +1,51 @@
+/* { dg-do run { target { riscv_v } } } */
+/* { dg-options "-O3 --param riscv-autovec-preference=fixed-vlmax -Wno-psabi" } */
+
+#include
+#include "consecutive-2.c"
+
+int
+main (void)
+{
+  vnx8i test_1_x = {0, 1, 2, 3, 5, 6, 7, 8};
+  vnx8i test_1_y = {8, 9, 10, 11, 13, 14, 15, 16};
+  vnx8i test_1_except = {13, 14, 15, 16, 13, 14, 15, 16};
+  vnx8i test_1_real;
+  test_1_real = test_1 (test_1_x, test_1_y);
+  for (int i = 0; i < 8; i++)
+    assert (test_1_real[i] == test_1_except[i]);
+
+  vnx4i test_2_x = {1, 2, 3, 4};
+  vnx4i test_2_y = {5, 6, 7, 8};
+  vnx4i test_2_except = {5, 6, 5, 6};
+  vnx4i test_2_real;
+  test_2_real = test_2 (test_2_x, test_2_y);
+  for (int i = 0; i < 4; i++)
+    assert (test_2_real[i] == test_2_except[i]);
+
+  vnx8ui test_3_x = {0, 1, 2, 3, 4, 5, 6, 8};
+  vnx8ui test_3_y = {8, 9, 10, 11, 12, 13, 15, 16};
+  vnx8ui test_3_except = {12, 13, 15, 16, 12, 13, 15, 16};
+  vnx8ui test_3_real;
+  test_3_real = test_3 (test_3_x, test_3_y);
+  for (int i = 0; i < 8; i++)
+    assert (test_3_real[i] == test_3_except[i]);
+
+  vnx4ui test_4_x = {1, 2, 3, 4};
+  vnx4ui test_4_y = {4, 5, 6, 8};
+  vnx4ui test_4_except = {4, 5, 4, 5};
+  vnx4ui test_4_real;
+  test_4_real = test_4 (test_4_x, test_4_y);
+  for (int i = 0; i < 4; i++)
+    assert (test_4_real[i] == test_4_except[i]);
+
+  vnx4f test_5_x = {0, 1, 3, 4};
+  vnx4f test_5_y = {4, 5, 6, 7};
+  vnx4f test_5_except = {4, 5, 4, 5};
+  vnx4f test_5_real;
+  test_5_real = test_5 (test_5_x, test_5_y);
+  for (int i = 0; i < 4; i++)
+    assert (test_5_real[i] == test_5_except[i]);
+
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/consecutive-1.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/consecutive-1.c
new file mode 100644
index 00000000000..c010c883065
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/consecutive-1.c
@@ -0,0 +1,94 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv_zvfh_zvl4096b -mabi=lp64d -O3 -fdump-tree-optimized" } */
+
+#include "def.h"
+
+#define MASK_8 0, 1, 0, 1, 0, 1, 0, 1
+#define MASK_16 MASK_8, MASK_8
+#define MASK_32 MASK_16, MASK_16
+#define MASK_64 MASK_32, MASK_32
+#define MASK_64 MASK_32, MASK_32
+#define MASK_128 MASK_64, MASK_64
+#define MASK_256 MASK_128, MASK_128
+#define MASK_512 MASK_256, MASK_256
+#define MASK_1024 MASK_512, MASK_512
+#define MASK_2048 MASK_1024, MASK_1024
+#define MASK_4096 MASK_2048, MASK_2048
+
+DEF_CONSECUTIVE (v8qi, 8)
+DEF_CONSECUTIVE (v16qi, 16)
+DEF_CONSECUTIVE (v32qi, 32)
+DEF_CONSECUTIVE (v64qi, 64)
+DEF_CONSECUTIVE (v128qi, 128)
+DEF_CONSECUTIVE (v256qi, 256)
+DEF_CONSECUTIVE (v512qi, 512)
+DEF_CONSECUTIVE (v1024qi, 1024)
+DEF_CONSECUTIVE (v2048qi, 2048)
+DEF_CONSECUTIVE (v4096qi, 4096)
+DEF_CONSECUTIVE (v8uqi, 8)
+DEF_CONSECUTIVE (v16uqi, 16)
+DEF_CONSECUTIVE (v32uqi, 32)
+DEF_CONSECUTIVE (v64uqi, 64)
+DEF_CONSECUTIVE (v128uqi, 128)
+DEF_CONSECUTIVE (v256uqi, 256)
+DEF_CONSECUTIVE (v512uqi, 512)
+DEF_CONSECUTIVE (v1024uqi, 1024)
+DEF_CONSECUTIVE (v2048uqi, 2048)
+DEF_CONSECUTIVE (v4096uqi, 4096)
+
+DEF_CONSECUTIVE (v8hi, 8)
+DEF_CONSECUTIVE (v16hi, 16)
+DEF_CONSECUTIVE (v32hi, 32)
+DEF_CONSECUTIVE (v64hi, 64)
+DEF_CONSECUTIVE (v128hi, 128)
+DEF_CONSECUTIVE (v256hi, 256)
+DEF_CONSECUTIVE (v512hi, 512)
+DEF_CONSECUTIVE (v1024hi, 1024)
+DEF_CONSECUTIVE (v2048hi, 2048)
+DEF_CONSECUTIVE (v8uhi, 8)
+DEF_CONSECUTIVE (v16uhi, 16)
+DEF_CONSECUTIVE (v32uhi, 32)
+DEF_CONSECUTIVE (v64uhi, 64)
+DEF_CONSECUTIVE (v128uhi, 128)
+DEF_CONSECUTIVE (v256uhi, 256)
+DEF_CONSECUTIVE (v512uhi, 512)
+DEF_CONSECUTIVE (v1024uhi, 1024)
+DEF_CONSECUTIVE (v2048uhi, 2048)
+
+DEF_CONSECUTIVE (v8si, 8)
+DEF_CONSECUTIVE (v16si, 16)
+DEF_CONSECUTIVE (v32si, 32)
+DEF_CONSECUTIVE (v64si, 64)
+DEF_CONSECUTIVE (v128si, 128)
+DEF_CONSECUTIVE (v256si, 256)
+DEF_CONSECUTIVE (v512si, 512)
+DEF_CONSECUTIVE (v1024si, 1024)
+DEF_CONSECUTIVE (v8usi, 8)
+DEF_CONSECUTIVE (v16usi, 16)
+DEF_CONSECUTIVE (v32usi, 32)
+DEF_CONSECUTIVE (v64usi, 64)
+DEF_CONSECUTIVE (v128usi, 128)
+DEF_CONSECUTIVE (v256usi, 256)
+DEF_CONSECUTIVE (v512usi, 512)
+DEF_CONSECUTIVE (v1024usi, 1024)
+
+DEF_CONSECUTIVE (v8hf, 8)
+DEF_CONSECUTIVE (v16hf, 16)
+DEF_CONSECUTIVE (v32hf, 32)
+DEF_CONSECUTIVE (v64hf, 64)
+DEF_CONSECUTIVE (v128hf, 128)
+DEF_CONSECUTIVE (v256hf, 256)
+DEF_CONSECUTIVE (v512hf, 512)
+DEF_CONSECUTIVE (v1024hf, 1024)
+DEF_CONSECUTIVE (v2048hf, 2048)
+
+DEF_CONSECUTIVE (v8sf, 8)
+DEF_CONSECUTIVE (v16sf, 16)
+DEF_CONSECUTIVE (v32sf, 32)
+DEF_CONSECUTIVE (v64sf, 64)
+DEF_CONSECUTIVE (v128sf, 128)
+DEF_CONSECUTIVE (v256sf, 256)
+DEF_CONSECUTIVE (v512sf, 512)
+DEF_CONSECUTIVE (v1024sf, 1024)
+
+/* { dg-final { scan-assembler-times {vrgather\.vi\s+v[0-9]+,\s*v[0-9]+,\s*0} 71 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/consecutive-2.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/consecutive-2.c
new file mode 100644
index 00000000000..ccbbb24ad5d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/consecutive-2.c
@@ -0,0 +1,68 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv_zvfh_zvl4096b -mabi=lp64d -O3 -fdump-tree-optimized" } */
+
+#include "def.h"
+
+#define MASK_8 4, 5, 6, 7, 4, 5, 6, 7
+#define MASK_16 MASK_8, MASK_8
+#define MASK_32 MASK_16, MASK_16
+#define MASK_64 MASK_32, MASK_32
+#define MASK_64 MASK_32, MASK_32
+#define MASK_128 MASK_64, MASK_64
+#define MASK_256 MASK_128, MASK_128
+#define MASK_512 MASK_256, MASK_256
+#define MASK_1024 MASK_512, MASK_512
+#define MASK_2048 MASK_1024, MASK_1024
+#define MASK_4096 MASK_2048, MASK_2048
+
+DEF_CONSECUTIVE (v8qi, 8)
+DEF_CONSECUTIVE (v16qi, 16)
+DEF_CONSECUTIVE (v32qi, 32)
+DEF_CONSECUTIVE (v64qi, 64)
+DEF_CONSECUTIVE (v128qi, 128)
+DEF_CONSECUTIVE (v256qi, 256)
+DEF_CONSECUTIVE (v512qi, 512)
+DEF_CONSECUTIVE (v1024qi, 1024)
+DEF_CONSECUTIVE (v2048qi, 2048)
+DEF_CONSECUTIVE (v4096qi, 4096)
+DEF_CONSECUTIVE (v8uqi, 8)
+DEF_CONSECUTIVE (v16uqi, 16)
+DEF_CONSECUTIVE (v32uqi, 32)
+DEF_CONSECUTIVE (v64uqi, 64)
+DEF_CONSECUTIVE (v128uqi, 128)
+DEF_CONSECUTIVE (v256uqi, 256)
+DEF_CONSECUTIVE (v512uqi, 512)
+DEF_CONSECUTIVE (v1024uqi, 1024)
+DEF_CONSECUTIVE (v2048uqi, 2048)
+DEF_CONSECUTIVE (v4096uqi, 4096)
+
+DEF_CONSECUTIVE (v8hi, 8)
+DEF_CONSECUTIVE (v16hi, 16)
+DEF_CONSECUTIVE (v32hi, 32)
+DEF_CONSECUTIVE (v64hi, 64)
+DEF_CONSECUTIVE (v128hi, 128)
+DEF_CONSECUTIVE (v256hi, 256)
+DEF_CONSECUTIVE (v512hi, 512)
+DEF_CONSECUTIVE (v1024hi, 1024)
+DEF_CONSECUTIVE (v2048hi, 2048)
+DEF_CONSECUTIVE (v8uhi, 8)
+DEF_CONSECUTIVE (v16uhi, 16)
+DEF_CONSECUTIVE (v32uhi, 32)
+DEF_CONSECUTIVE (v64uhi, 64)
+DEF_CONSECUTIVE (v128uhi, 128)
+DEF_CONSECUTIVE (v256uhi, 256)
+DEF_CONSECUTIVE (v512uhi, 512)
+DEF_CONSECUTIVE (v1024uhi, 1024)
+DEF_CONSECUTIVE (v2048uhi, 2048)
+
+DEF_CONSECUTIVE (v8hf, 8)
+DEF_CONSECUTIVE (v16hf, 16)
+DEF_CONSECUTIVE (v32hf, 32)
+DEF_CONSECUTIVE (v64hf, 64)
+DEF_CONSECUTIVE (v128hf, 128)
+DEF_CONSECUTIVE (v256hf, 256)
+DEF_CONSECUTIVE (v512hf, 512)
+DEF_CONSECUTIVE (v1024hf, 1024)
+DEF_CONSECUTIVE (v2048hf, 2048)
+
+/* { dg-final { scan-assembler-times {vrgather\.vi\s+v[0-9]+,\s*v[0-9]+,\s*1} 47 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/consecutive-3.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/consecutive-3.c
new file mode 100644
index 00000000000..7de3c7df99e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/consecutive-3.c
@@ -0,0 +1,68 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv_zvfh_zvl4096b -mabi=lp64d -O3 -fdump-tree-optimized" } */
+
+#include "def.h"
+
+#define MASK_8 2, 3, 4, 5, 2, 3, 4, 5
+#define MASK_16 MASK_8, MASK_8
+#define MASK_32 MASK_16, MASK_16
+#define MASK_64 MASK_32, MASK_32
+#define MASK_64 MASK_32, MASK_32
+#define MASK_128 MASK_64, MASK_64
+#define MASK_256 MASK_128, MASK_128
+#define MASK_512 MASK_256, MASK_256
+#define MASK_1024 MASK_512, MASK_512
+#define MASK_2048 MASK_1024, MASK_1024
+#define MASK_4096 MASK_2048, MASK_2048
+
+DEF_CONSECUTIVE (v8qi, 8)
+DEF_CONSECUTIVE (v16qi, 16)
+DEF_CONSECUTIVE (v32qi, 32)
+DEF_CONSECUTIVE (v64qi, 64)
+DEF_CONSECUTIVE (v128qi, 128)
+DEF_CONSECUTIVE (v256qi, 256)
+DEF_CONSECUTIVE (v512qi, 512)
+DEF_CONSECUTIVE (v1024qi, 1024)
+DEF_CONSECUTIVE (v2048qi, 2048)
+DEF_CONSECUTIVE (v4096qi, 4096)
+DEF_CONSECUTIVE (v8uqi, 8)
+DEF_CONSECUTIVE (v16uqi, 16)
+DEF_CONSECUTIVE (v32uqi, 32)
+DEF_CONSECUTIVE (v64uqi, 64)
+DEF_CONSECUTIVE (v128uqi, 128)
+DEF_CONSECUTIVE (v256uqi, 256)
+DEF_CONSECUTIVE (v512uqi, 512)
+DEF_CONSECUTIVE (v1024uqi, 1024)
+DEF_CONSECUTIVE (v2048uqi, 2048)
+DEF_CONSECUTIVE (v4096uqi, 4096)
+
+DEF_CONSECUTIVE (v8hi, 8)
+DEF_CONSECUTIVE (v16hi, 16)
+DEF_CONSECUTIVE (v32hi, 32)
+DEF_CONSECUTIVE (v64hi, 64)
+DEF_CONSECUTIVE (v128hi, 128)
+DEF_CONSECUTIVE (v256hi, 256)
+DEF_CONSECUTIVE (v512hi, 512)
+DEF_CONSECUTIVE (v1024hi, 1024)
+DEF_CONSECUTIVE (v2048hi, 2048)
+DEF_CONSECUTIVE (v8uhi, 8)
+DEF_CONSECUTIVE (v16uhi, 16)
+DEF_CONSECUTIVE (v32uhi, 32)
+DEF_CONSECUTIVE (v64uhi, 64)
+DEF_CONSECUTIVE (v128uhi, 128)
+DEF_CONSECUTIVE (v256uhi, 256)
+DEF_CONSECUTIVE (v512uhi, 512)
+DEF_CONSECUTIVE (v1024uhi, 1024)
+DEF_CONSECUTIVE (v2048uhi, 2048)
+
+DEF_CONSECUTIVE (v8hf, 8)
+DEF_CONSECUTIVE (v16hf, 16)
+DEF_CONSECUTIVE (v32hf, 32)
+DEF_CONSECUTIVE (v64hf, 64)
+DEF_CONSECUTIVE (v128hf, 128)
+DEF_CONSECUTIVE (v256hf, 256)
+DEF_CONSECUTIVE (v512hf, 512)
+DEF_CONSECUTIVE (v1024hf, 1024)
+DEF_CONSECUTIVE (v2048hf, 2048)
+
+/* { dg-final { scan-assembler-not {vrgather\.vi\s+v[0-9]+,\s*v[0-9]+,\s*1} } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/def.h b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/def.h
index b4148f29d8a..8dd5bcf617d 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/def.h
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/def.h
@@ -833,3 +833,9 @@ typedef double v512df __attribute__ ((vector_size (4096)));
       a[i] = cond[i] ? (TYPE3) (b[i] >> shift) : a[i]; \
     return a; \
   }
+
+#define DEF_CONSECUTIVE(TYPE, NUM) \
+  TYPE f##TYPE (TYPE a, TYPE b) \
+  { \
+    return __builtin_shufflevector (a, b, MASK_##NUM); \
+  }
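
For readers who want to try the recognition rules on concrete masks, the
checks in shuffle_consecutive_patterns can be approximated on plain integer
arrays.  The following is only a sketch under stated assumptions, not the GCC
implementation: match_consecutive and is_pow2 are hypothetical names, the mask
is assumed constant with non-negative indices and a power-of-two length, and
the TARGET_VECTOR_ELEN_64 / TARGET_VECTOR_ELEN_FP_64 and vector-mode
availability checks are left out.  The from_second output follows the
__builtin_shufflevector convention that indices >= vlen refer to the second
input; how that maps onto d->op0 and d->op1 is not modelled here.

```c
#include <assert.h>
#include <stdbool.h>

static bool
is_pow2 (int v)
{
  return v > 0 && (v & (v - 1)) == 0;
}

/* Hypothetical scalar model of the pattern checks.  On success, *run_len is
   the number of consecutive indices in one repetition, *wide_index is the
   element index to gather once run_len elements are viewed as one wider
   element, and *from_second says whether the run lies in the second input.  */
static bool
match_consecutive (const int *perm, int vlen, int elt_bits,
                   int *run_len, int *wide_index, bool *from_second)
{
  assert (is_pow2 (vlen) && perm[0] >= 0);

  /* Length of the leading run perm[0], perm[0] + 1, ...  */
  int run = 1;
  while (run < vlen && perm[run] == perm[run - 1] + 1)
    run++;

  /* Need a real run that is shorter than the vector, with power-of-two
     run length and repetition count.  */
  if (run == 1 || run == vlen || !is_pow2 (run) || !is_pow2 (vlen / run))
    return false;

  /* The whole run must come from a single input vector.  */
  if ((perm[0] >= vlen) != (perm[run - 1] >= vlen))
    return false;

  /* The run must start on a multiple of its length, so that it lines up
     with one wider element.  */
  if (perm[0] % run != 0)
    return false;

  /* The wider element must fit in 64 bits.  */
  if (run * elt_bits > 64)
    return false;

  /* The rest of the mask must repeat the leading run.  */
  for (int i = run; i < vlen; i++)
    if (perm[i] != perm[i % run])
      return false;

  int new_vlen = vlen / run;
  int index = perm[0] / run;
  *from_second = perm[0] >= vlen;
  if (index >= new_vlen)
    index -= new_vlen;
  *run_len = run;
  *wide_index = index;
  return true;
}
```

With MASK_16 from the commit message (16 byte elements, run 12..15) this
yields run_len = 4 and wide_index = 3, matching "vrgather.vi v1,v2,3" at
e32, while <2, 3, 4, 5, 2, 3, 4, 5> is rejected because 2 is not a multiple
of 4.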