From patchwork Sat Dec 9 04:06:29 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "juzhe.zhong@rivai.ai" X-Patchwork-Id: 176117 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:bcd1:0:b0:403:3b70:6f57 with SMTP id r17csp5869388vqy; Fri, 8 Dec 2023 20:07:08 -0800 (PST) X-Google-Smtp-Source: AGHT+IHJWBmv2bHtULOVzJVlb31Jhr9/YstbkuViy2kDjuq19yQMMdFOQx815T8K5MK6aclL1Pdy X-Received: by 2002:a05:620a:2910:b0:77e:fba3:58ec with SMTP id m16-20020a05620a291000b0077efba358ecmr1380806qkp.125.1702094828752; Fri, 08 Dec 2023 20:07:08 -0800 (PST) ARC-Seal: i=2; a=rsa-sha256; t=1702094828; cv=pass; d=google.com; s=arc-20160816; b=xbquUCd/jWdsivJ8TMd9dfQKKjcGCbRqZ7B0mBHW5S9QD1BW1cEG4qy5H/oGTrP6u+ 1b/SK3OpvGnm907umj+VhciO1EEOXwJWRU7Mi5Y3gI+OU/QkBwksje7fOyraIyMl2VTr 6XJqnrCsgm1No+bMlPYmAvqSG0xvwY7gSP3z4jRCKLJ1qhVSFRwkD2DrW80JIpp/PCJJ bHkO5kjc6X7PJUBlGR7K9bdiSQumLH6qo+KQqJwx6FyehO9A3Ysqz7BwrVVlYxpffevF mS8NDp8rwezMwZVgo6oi4Y9a+8qhHogpMmKhZbIjsjqudeTN1697bYB8591MU/5ASNs0 gOdw== ARC-Message-Signature: i=2; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=errors-to:list-subscribe:list-help:list-post:list-archive :list-unsubscribe:list-id:precedence:feedback-id :content-transfer-encoding:mime-version:message-id:date:subject:cc :to:from:arc-filter:dmarc-filter:delivered-to; bh=fXGYui21a2YOHDgUS7Fq1OQHgodsdOG21Hmq7bj3irc=; fh=12MRPJmZ1mgDpHqWoogMKqnaGRGM2b7lcuJroqfjJiw=; b=o6tI1yqf9qPoPW2lAVRf7Ma56sKcVu94EgqigyGI7Ux5ucnJnJXteBLJpYe5zEBTqO OVOkW+dNYxo6fMcl2cU/vPdH9rClA5OHnRX/pcXhpqAtzlF6/1jQPf0DUdLEpQ+66otH +q7aGzeD7Ntj9PooVKxOhaSIQcjvW+vH9SA6TWu/YUGpBWBCcoEzoPO/CmlKng4fwzSi lWWtDaBYpw8XfLE0XSC9ZBcAuVRba5txkUd9VIlSvNcEd7yczJNVUDmwWJ6VyKujeRXd i1+Jjgrt3smZbznAAhtxvIrvT8wnAt2eeghvbrVcfbgiAsQWLF54y/z0IKBZSWd0tJLj 33HQ== ARC-Authentication-Results: i=2; mx.google.com; arc=pass (i=1); spf=pass (google.com: domain of gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org designates 8.43.85.97 as permitted sender) smtp.mailfrom="gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org" Received: from server2.sourceware.org (server2.sourceware.org. [8.43.85.97]) by mx.google.com with ESMTPS id ov11-20020a05620a628b00b0077d59d88b35si3704052qkn.666.2023.12.08.20.07.07 for (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 08 Dec 2023 20:07:08 -0800 (PST) Received-SPF: pass (google.com: domain of gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org designates 8.43.85.97 as permitted sender) client-ip=8.43.85.97; Authentication-Results: mx.google.com; arc=pass (i=1); spf=pass (google.com: domain of gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org designates 8.43.85.97 as permitted sender) smtp.mailfrom="gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org" Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id CB8F03858420 for ; Sat, 9 Dec 2023 04:07:07 +0000 (GMT) X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from smtpbgsg1.qq.com (smtpbgsg1.qq.com [54.254.200.92]) by sourceware.org (Postfix) with ESMTPS id 5F0293858CD1 for ; Sat, 9 Dec 2023 04:06:38 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 5F0293858CD1 Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=rivai.ai Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=rivai.ai ARC-Filter: OpenARC Filter v1.0.0 sourceware.org 5F0293858CD1 Authentication-Results: server2.sourceware.org; arc=none smtp.remote-ip=54.254.200.92 ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1702094804; cv=none; b=EuiezM7zCzVbYTtjQ6n9KaYLNivcP9TQggg1Xvi9zGUKibzyWm6KjH5RsO938mBLpqNq3Kf6n0kxAMM0Y3hNfiSrv7e8nBVobmYszbjDVcbmRyc0Iv9QRjlfM9rDqs9cf0Yi7c6wLcNI3hdCxm7pXlCWLX5GjS+ef8uEwQV/sK0= ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1702094804; c=relaxed/simple; bh=3qutwyyJFaBTKmMJayZD8+aKbqYl6F032k70lezZ1bs=; h=From:To:Subject:Date:Message-Id:MIME-Version; b=mgrlrqdzWCj6WlbhqZfDlJppVCV0DfY9VmZoxhg4Y6hBhxpnid0p8jXCYY41Pd0jE/BaYbvGqa2TP6KBFTKTDYPkm3tKcaMdRPh9mIkBcNt3uQ9iMfGjuqvie8RZe72L884fOikKS6KoOlPQDj7E+eELsF+MDLRg9BTYeTrbrrM= ARC-Authentication-Results: i=1; server2.sourceware.org X-QQ-mid: bizesmtp74t1702094791tanqof3g Received: from rios-cad121.hadoop.rioslab.org ( [58.60.1.9]) by bizesmtp.qq.com (ESMTP) with id ; Sat, 09 Dec 2023 12:06:30 +0800 (CST) X-QQ-SSF: 01400000000000G0V000000A0000000 X-QQ-FEAT: 6/K5pWSRdGr+H0djXnzwcL4toarn60pX/CUnhS+9Vk5Vhj7WU1baeEXhyvdjt xPM9KKMsc/Uz604eXKiNXHolxbAvYkQ55TXPUP3FNMqCuG4638Wj6dFuJER6IjzlDQxZGFB vCCUTbD9pRhaWa7PPPWodjja2HDlxnQwJVJHdze0EhS/cvMsdZ0ae0Ijrr9iXhX+jx4jytu 9KwG5wnMgRr4CqrYC0QB1+APPi90coItRcFGnzlq/96iVueoTLYhbBuoY9wsT8vGrnT4Vr8 VOTlW5bb7bvWd77xfQjjy7Py81UosrtY3YbQlBMBL6ZMtWEB6b8GIKlhk+2k6N9UQkGHxIQ 0nSCLS3IpRVSWyfsIV8TwZVatKFsK1laAb1QSwtGMwLABbZ905vorEp6DA/h3+3QZ1ctuac X-QQ-GoodBg: 2 X-BIZMAIL-ID: 10538205646874917401 From: Juzhe-Zhong To: gcc-patches@gcc.gnu.org Cc: kito.cheng@gmail.com, kito.cheng@sifive.com, jeffreyalaw@gmail.com, rdapp.gcc@gmail.com, Juzhe-Zhong Subject: [PATCH] RISC-V: Support highest overlap for wv instructions Date: Sat, 9 Dec 2023 12:06:29 +0800 Message-Id: <20231209040629.1104489-1-juzhe.zhong@rivai.ai> X-Mailer: git-send-email 2.36.3 MIME-Version: 1.0 X-QQ-SENDSIZE: 520 Feedback-ID: bizesmtp:rivai.ai:qybglogicsvrgz:qybglogicsvrgz7a-one-0 X-Spam-Status: No, score=-10.2 required=5.0 tests=BAYES_00, GIT_PATCH_0, KAM_DMARC_STATUS, KAM_SHORT, RCVD_IN_DNSWL_NONE, RCVD_IN_MSPIKE_H2, SCC_10_SHORT_WORD_LINES, SCC_20_SHORT_WORD_LINES, SCC_35_SHORT_WORD_LINES, SCC_5_SHORT_WORD_LINES, SPF_HELO_PASS, SPF_PASS, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.30 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org X-getmail-retrieved-from-mailbox: INBOX X-GMAIL-THRID: 1784775787050954756 X-GMAIL-MSGID: 1784775787050954756 According to RVV ISA, we can allow vwadd.wv v2, v2, v3 overlap. Before this patch: nop vsetivli zero,4,e8,m4,tu,ma vle16.v v8,0(a0) vmv8r.v v0,v8 vwsub.wv v0,v8,v12 nop addi a4,a0,100 vle16.v v8,0(a4) vmv8r.v v24,v8 vwsub.wv v24,v8,v12 nop addi a4,a0,200 vle16.v v8,0(a4) vmv8r.v v16,v8 vwsub.wv v16,v8,v12 nop After this patch: nop vsetivli zero,4,e8,m4,tu,ma vle16.v v0,0(a0) vwsub.wv v0,v0,v4 nop addi a4,a0,100 vle16.v v24,0(a4) vwsub.wv v24,v24,v28 nop addi a4,a0,200 vle16.v v16,0(a4) vwsub.wv v16,v16,v20 PR target/112431 gcc/ChangeLog: * config/riscv/vector.md: Support highest overlap for wv instructions. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/base/pr112431-39.c: New test. * gcc.target/riscv/rvv/base/pr112431-40.c: New test. * gcc.target/riscv/rvv/base/pr112431-41.c: New test. --- gcc/config/riscv/vector.md | 88 +++++----- .../gcc.target/riscv/rvv/base/pr112431-39.c | 158 ++++++++++++++++++ .../gcc.target/riscv/rvv/base/pr112431-40.c | 94 +++++++++++ .../gcc.target/riscv/rvv/base/pr112431-41.c | 62 +++++++ 4 files changed, 360 insertions(+), 42 deletions(-) create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr112431-39.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr112431-40.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr112431-41.c diff --git a/gcc/config/riscv/vector.md b/gcc/config/riscv/vector.md index ba0714a9971..31c13a6dcca 100644 --- a/gcc/config/riscv/vector.md +++ b/gcc/config/riscv/vector.md @@ -3795,46 +3795,48 @@ (set_attr "group_overlap" "W21,W21,W21,W21,W42,W42,W42,W42,W84,W84,W84,W84,none,none")]) (define_insn "@pred_single_widen_sub" - [(set (match_operand:VWEXTI 0 "register_operand" "=&vr,&vr") + [(set (match_operand:VWEXTI 0 "register_operand" "=vd, vr, vd, vr, vd, vr, vd, vr, vd, vr, vd, vr, ?&vr, ?&vr") (if_then_else:VWEXTI (unspec: - [(match_operand: 1 "vector_mask_operand" "vmWc1,vmWc1") - (match_operand 5 "vector_length_operand" " rK, rK") - (match_operand 6 "const_int_operand" " i, i") - (match_operand 7 "const_int_operand" " i, i") - (match_operand 8 "const_int_operand" " i, i") + [(match_operand: 1 "vector_mask_operand" " vm,Wc1, vm,Wc1, vm,Wc1, vm,Wc1, vm,Wc1, vm,Wc1,vmWc1,vmWc1") + (match_operand 5 "vector_length_operand" " rK, rK, rK, rK, rK, rK, rK, rK, rK, rK, rK, rK, rK, rK") + (match_operand 6 "const_int_operand" " i, i, i, i, i, i, i, i, i, i, i, i, i, i") + (match_operand 7 "const_int_operand" " i, i, i, i, i, i, i, i, i, i, i, i, i, i") + (match_operand 8 "const_int_operand" " i, i, i, i, i, i, i, i, i, i, i, i, i, i") (reg:SI VL_REGNUM) (reg:SI VTYPE_REGNUM)] UNSPEC_VPREDICATE) (minus:VWEXTI - (match_operand:VWEXTI 3 "register_operand" " vr, vr") + (match_operand:VWEXTI 3 "register_operand" " vr, vr, vr, vr, vr, vr, vr, vr, vr, vr, vr, vr, vr, vr") (any_extend:VWEXTI - (match_operand: 4 "register_operand" " vr, vr"))) - (match_operand:VWEXTI 2 "vector_merge_operand" " vu, 0")))] + (match_operand: 4 "register_operand" "W21,W21,W21,W21,W42,W42,W42,W42,W84,W84,W84,W84, vr, vr"))) + (match_operand:VWEXTI 2 "vector_merge_operand" " vu, vu, 0, 0, vu, vu, 0, 0, vu, vu, 0, 0, vu, 0")))] "TARGET_VECTOR" "vwsub.wv\t%0,%3,%4%p1" [(set_attr "type" "viwalu") - (set_attr "mode" "")]) + (set_attr "mode" "") + (set_attr "group_overlap" "W21,W21,W21,W21,W42,W42,W42,W42,W84,W84,W84,W84,none,none")]) (define_insn "@pred_single_widen_add" - [(set (match_operand:VWEXTI 0 "register_operand" "=&vr,&vr") + [(set (match_operand:VWEXTI 0 "register_operand" "=vd, vr, vd, vr, vd, vr, vd, vr, vd, vr, vd, vr, ?&vr, ?&vr") (if_then_else:VWEXTI (unspec: - [(match_operand: 1 "vector_mask_operand" "vmWc1,vmWc1") - (match_operand 5 "vector_length_operand" " rK, rK") - (match_operand 6 "const_int_operand" " i, i") - (match_operand 7 "const_int_operand" " i, i") - (match_operand 8 "const_int_operand" " i, i") + [(match_operand: 1 "vector_mask_operand" " vm,Wc1, vm,Wc1, vm,Wc1, vm,Wc1, vm,Wc1, vm,Wc1,vmWc1,vmWc1") + (match_operand 5 "vector_length_operand" " rK, rK, rK, rK, rK, rK, rK, rK, rK, rK, rK, rK, rK, rK") + (match_operand 6 "const_int_operand" " i, i, i, i, i, i, i, i, i, i, i, i, i, i") + (match_operand 7 "const_int_operand" " i, i, i, i, i, i, i, i, i, i, i, i, i, i") + (match_operand 8 "const_int_operand" " i, i, i, i, i, i, i, i, i, i, i, i, i, i") (reg:SI VL_REGNUM) (reg:SI VTYPE_REGNUM)] UNSPEC_VPREDICATE) (plus:VWEXTI (any_extend:VWEXTI - (match_operand: 4 "register_operand" " vr, vr")) - (match_operand:VWEXTI 3 "register_operand" " vr, vr")) - (match_operand:VWEXTI 2 "vector_merge_operand" " vu, 0")))] + (match_operand: 4 "register_operand" "W21,W21,W21,W21,W42,W42,W42,W42,W84,W84,W84,W84, vr, vr")) + (match_operand:VWEXTI 3 "register_operand" " vr, vr, vr, vr, vr, vr, vr, vr, vr, vr, vr, vr, vr, vr")) + (match_operand:VWEXTI 2 "vector_merge_operand" " vu, vu, 0, 0, vu, vu, 0, 0, vu, vu, 0, 0, vu, 0")))] "TARGET_VECTOR" "vwadd.wv\t%0,%3,%4%p1" [(set_attr "type" "viwalu") - (set_attr "mode" "")]) + (set_attr "mode" "") + (set_attr "group_overlap" "W21,W21,W21,W21,W42,W42,W42,W42,W84,W84,W84,W84,none,none")]) (define_insn "@pred_single_widen__scalar" [(set (match_operand:VWEXTI 0 "register_operand" "=vr, vr") @@ -7073,54 +7075,56 @@ (set_attr "group_overlap" "W21,W21,W21,W21,W42,W42,W42,W42,W84,W84,W84,W84,none,none")]) (define_insn "@pred_single_widen_add" - [(set (match_operand:VWEXTF 0 "register_operand" "=&vr, &vr") + [(set (match_operand:VWEXTF 0 "register_operand" "=vd, vr, vd, vr, vd, vr, vd, vr, vd, vr, vd, vr, ?&vr, ?&vr") (if_then_else:VWEXTF (unspec: - [(match_operand: 1 "vector_mask_operand" "vmWc1,vmWc1") - (match_operand 5 "vector_length_operand" " rK, rK") - (match_operand 6 "const_int_operand" " i, i") - (match_operand 7 "const_int_operand" " i, i") - (match_operand 8 "const_int_operand" " i, i") - (match_operand 9 "const_int_operand" " i, i") + [(match_operand: 1 "vector_mask_operand" " vm,Wc1, vm,Wc1, vm,Wc1, vm,Wc1, vm,Wc1, vm,Wc1,vmWc1,vmWc1") + (match_operand 5 "vector_length_operand" " rK, rK, rK, rK, rK, rK, rK, rK, rK, rK, rK, rK, rK, rK") + (match_operand 6 "const_int_operand" " i, i, i, i, i, i, i, i, i, i, i, i, i, i") + (match_operand 7 "const_int_operand" " i, i, i, i, i, i, i, i, i, i, i, i, i, i") + (match_operand 8 "const_int_operand" " i, i, i, i, i, i, i, i, i, i, i, i, i, i") + (match_operand 9 "const_int_operand" " i, i, i, i, i, i, i, i, i, i, i, i, i, i") (reg:SI VL_REGNUM) (reg:SI VTYPE_REGNUM) (reg:SI FRM_REGNUM)] UNSPEC_VPREDICATE) (plus:VWEXTF (float_extend:VWEXTF - (match_operand: 4 "register_operand" " vr, vr")) - (match_operand:VWEXTF 3 "register_operand" " vr, vr")) - (match_operand:VWEXTF 2 "vector_merge_operand" " vu, 0")))] + (match_operand: 4 "register_operand" "W21,W21,W21,W21,W42,W42,W42,W42,W84,W84,W84,W84, vr, vr")) + (match_operand:VWEXTF 3 "register_operand" " vr, vr, vr, vr, vr, vr, vr, vr, vr, vr, vr, vr, vr, vr")) + (match_operand:VWEXTF 2 "vector_merge_operand" " vu, vu, 0, 0, vu, vu, 0, 0, vu, vu, 0, 0, vu, 0")))] "TARGET_VECTOR" "vfwadd.wv\t%0,%3,%4%p1" [(set_attr "type" "vfwalu") (set_attr "mode" "") (set (attr "frm_mode") - (symbol_ref "riscv_vector::get_frm_mode (operands[9])"))]) + (symbol_ref "riscv_vector::get_frm_mode (operands[9])")) + (set_attr "group_overlap" "W21,W21,W21,W21,W42,W42,W42,W42,W84,W84,W84,W84,none,none")]) (define_insn "@pred_single_widen_sub" - [(set (match_operand:VWEXTF 0 "register_operand" "=&vr, &vr") + [(set (match_operand:VWEXTF 0 "register_operand" "=vd, vr, vd, vr, vd, vr, vd, vr, vd, vr, vd, vr, ?&vr, ?&vr") (if_then_else:VWEXTF (unspec: - [(match_operand: 1 "vector_mask_operand" "vmWc1,vmWc1") - (match_operand 5 "vector_length_operand" " rK, rK") - (match_operand 6 "const_int_operand" " i, i") - (match_operand 7 "const_int_operand" " i, i") - (match_operand 8 "const_int_operand" " i, i") - (match_operand 9 "const_int_operand" " i, i") + [(match_operand: 1 "vector_mask_operand" " vm,Wc1, vm,Wc1, vm,Wc1, vm,Wc1, vm,Wc1, vm,Wc1,vmWc1,vmWc1") + (match_operand 5 "vector_length_operand" " rK, rK, rK, rK, rK, rK, rK, rK, rK, rK, rK, rK, rK, rK") + (match_operand 6 "const_int_operand" " i, i, i, i, i, i, i, i, i, i, i, i, i, i") + (match_operand 7 "const_int_operand" " i, i, i, i, i, i, i, i, i, i, i, i, i, i") + (match_operand 8 "const_int_operand" " i, i, i, i, i, i, i, i, i, i, i, i, i, i") + (match_operand 9 "const_int_operand" " i, i, i, i, i, i, i, i, i, i, i, i, i, i") (reg:SI VL_REGNUM) (reg:SI VTYPE_REGNUM) (reg:SI FRM_REGNUM)] UNSPEC_VPREDICATE) (minus:VWEXTF - (match_operand:VWEXTF 3 "register_operand" " vr, vr") + (match_operand:VWEXTF 3 "register_operand" " vr, vr, vr, vr, vr, vr, vr, vr, vr, vr, vr, vr, vr, vr") (float_extend:VWEXTF - (match_operand: 4 "register_operand" " vr, vr"))) - (match_operand:VWEXTF 2 "vector_merge_operand" " vu, 0")))] + (match_operand: 4 "register_operand" "W21,W21,W21,W21,W42,W42,W42,W42,W84,W84,W84,W84, vr, vr"))) + (match_operand:VWEXTF 2 "vector_merge_operand" " vu, vu, 0, 0, vu, vu, 0, 0, vu, vu, 0, 0, vu, 0")))] "TARGET_VECTOR" "vfwsub.wv\t%0,%3,%4%p1" [(set_attr "type" "vfwalu") (set_attr "mode" "") (set (attr "frm_mode") - (symbol_ref "riscv_vector::get_frm_mode (operands[9])"))]) + (symbol_ref "riscv_vector::get_frm_mode (operands[9])")) + (set_attr "group_overlap" "W21,W21,W21,W21,W42,W42,W42,W42,W84,W84,W84,W84,none,none")]) (define_insn "@pred_single_widen__scalar" [(set (match_operand:VWEXTF 0 "register_operand" "=vr, vr") diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/pr112431-39.c b/gcc/testsuite/gcc.target/riscv/rvv/base/pr112431-39.c new file mode 100644 index 00000000000..47820dd29f0 --- /dev/null +++ b/gcc/testsuite/gcc.target/riscv/rvv/base/pr112431-39.c @@ -0,0 +1,158 @@ +/* { dg-do compile } */ +/* { dg-options "-march=rv64gcv -mabi=lp64d -O3" } */ + +#include "riscv_vector.h" + +void +foo (void *in, void *out, int n) +{ + for (int i = 0; i < n; i++) + { + asm volatile("nop" ::: "memory"); + vint16m2_t v0 = __riscv_vle16_v_i16m2 (in, 4);in+=100; + v0 = __riscv_vwsub_wv_i16m2_tu (v0, v0, __riscv_vreinterpret_v_i16m1_i8m1 (__riscv_vget_v_i16m2_i16m1 (v0, 1)), 4); + asm volatile("nop" ::: "memory"); + vint16m2_t v1 = __riscv_vle16_v_i16m2 (in, 4);in+=100; + v1 = __riscv_vwsub_wv_i16m2_tu (v1, v1, __riscv_vreinterpret_v_i16m1_i8m1 (__riscv_vget_v_i16m2_i16m1 (v1, 1)), 4); + asm volatile("nop" ::: "memory"); + vint16m2_t v2 = __riscv_vle16_v_i16m2 (in, 4);in+=100; + v2 = __riscv_vwsub_wv_i16m2_tu (v2, v2, __riscv_vreinterpret_v_i16m1_i8m1 (__riscv_vget_v_i16m2_i16m1 (v2, 1)), 4); + asm volatile("nop" ::: "memory"); + vint16m2_t v3 = __riscv_vle16_v_i16m2 (in, 4);in+=100; + v3 = __riscv_vwsub_wv_i16m2_tu (v3, v3, __riscv_vreinterpret_v_i16m1_i8m1 (__riscv_vget_v_i16m2_i16m1 (v3, 1)), 4); + asm volatile("nop" ::: "memory"); + vint16m2_t v4 = __riscv_vle16_v_i16m2 (in, 4);in+=100; + v4 = __riscv_vwsub_wv_i16m2_tu (v4, v4, __riscv_vreinterpret_v_i16m1_i8m1 (__riscv_vget_v_i16m2_i16m1 (v4, 1)), 4); + asm volatile("nop" ::: "memory"); + vint16m2_t v5 = __riscv_vle16_v_i16m2 (in, 4);in+=100; + v5 = __riscv_vwsub_wv_i16m2_tu (v5, v5, __riscv_vreinterpret_v_i16m1_i8m1 (__riscv_vget_v_i16m2_i16m1 (v5, 1)), 4); + asm volatile("nop" ::: "memory"); + vint16m2_t v6 = __riscv_vle16_v_i16m2 (in, 4);in+=100; + v6 = __riscv_vwsub_wv_i16m2_tu (v6, v6, __riscv_vreinterpret_v_i16m1_i8m1 (__riscv_vget_v_i16m2_i16m1 (v6, 1)), 4); + asm volatile("nop" ::: "memory"); + vint16m2_t v7 = __riscv_vle16_v_i16m2 (in, 4);in+=100; + v7 = __riscv_vwsub_wv_i16m2_tu (v7, v7, __riscv_vreinterpret_v_i16m1_i8m1 (__riscv_vget_v_i16m2_i16m1 (v7, 1)), 4); + asm volatile("nop" ::: "memory"); + vint16m2_t v8 = __riscv_vle16_v_i16m2 (in, 4);in+=100; + v8 = __riscv_vwsub_wv_i16m2_tu (v8, v8, __riscv_vreinterpret_v_i16m1_i8m1 (__riscv_vget_v_i16m2_i16m1 (v8, 1)), 4); + asm volatile("nop" ::: "memory"); + vint16m2_t v9 = __riscv_vle16_v_i16m2 (in, 4);in+=100; + v9 = __riscv_vwsub_wv_i16m2_tu (v9, v9, __riscv_vreinterpret_v_i16m1_i8m1 (__riscv_vget_v_i16m2_i16m1 (v9, 1)), 4); + asm volatile("nop" ::: "memory"); + vint16m2_t v10 = __riscv_vle16_v_i16m2 (in, 4);in+=100; + v10 = __riscv_vwsub_wv_i16m2_tu (v10, v10, __riscv_vreinterpret_v_i16m1_i8m1 (__riscv_vget_v_i16m2_i16m1 (v10, 1)), 4); + asm volatile("nop" ::: "memory"); + vint16m2_t v11 = __riscv_vle16_v_i16m2 (in, 4);in+=100; + v11 = __riscv_vwsub_wv_i16m2_tu (v11, v11, __riscv_vreinterpret_v_i16m1_i8m1 (__riscv_vget_v_i16m2_i16m1 (v11, 1)), 4); + asm volatile("nop" ::: "memory"); + vint16m2_t v12 = __riscv_vle16_v_i16m2 (in, 4);in+=100; + v12 = __riscv_vwsub_wv_i16m2_tu (v12, v12, __riscv_vreinterpret_v_i16m1_i8m1 (__riscv_vget_v_i16m2_i16m1 (v12, 1)), 4); + asm volatile("nop" ::: "memory"); + vint16m2_t v13 = __riscv_vle16_v_i16m2 (in, 4);in+=100; + v13 = __riscv_vwsub_wv_i16m2_tu (v13, v13, __riscv_vreinterpret_v_i16m1_i8m1 (__riscv_vget_v_i16m2_i16m1 (v13, 1)), 4); + asm volatile("nop" ::: "memory"); + vint16m2_t v14 = __riscv_vle16_v_i16m2 (in, 4);in+=100; + v14 = __riscv_vwsub_wv_i16m2_tu (v14, v14, __riscv_vreinterpret_v_i16m1_i8m1 (__riscv_vget_v_i16m2_i16m1 (v14, 1)), 4); + asm volatile("nop" ::: "memory"); + vint8m1_t v15_n = __riscv_vle8_v_i8m1 (in, 4);in+=100; + vint16m2_t v15 = __riscv_vwcvt_x_x_v_i16m2 (v15_n, 4); + + asm volatile("nop" ::: "memory"); + __riscv_vsse16_v_i16m2 (out, 4, v0, 4);out+=100; + __riscv_vsse16_v_i16m2 (out, 4, v1, 4);out+=100; + __riscv_vsse16_v_i16m2 (out, 4, v2, 4);out+=100; + __riscv_vsse16_v_i16m2 (out, 4, v3, 4);out+=100; + __riscv_vsse16_v_i16m2 (out, 4, v4, 4);out+=100; + __riscv_vsse16_v_i16m2 (out, 4, v5, 4);out+=100; + __riscv_vsse16_v_i16m2 (out, 4, v6, 4);out+=100; + __riscv_vsse16_v_i16m2 (out, 4, v7, 4);out+=100; + __riscv_vsse16_v_i16m2 (out, 4, v8, 4);out+=100; + __riscv_vsse16_v_i16m2 (out, 4, v9, 4);out+=100; + __riscv_vsse16_v_i16m2 (out, 4, v10, 4);out+=100; + __riscv_vsse16_v_i16m2 (out, 4, v11, 4);out+=100; + __riscv_vsse16_v_i16m2 (out, 4, v12, 4);out+=100; + __riscv_vsse16_v_i16m2 (out, 4, v13, 4);out+=100; + __riscv_vsse16_v_i16m2 (out, 4, v14, 4);out+=100; + __riscv_vsse16_v_i16m2 (out, 4, v15, 4);out+=100; + } +} + +void +foo2 (void *in, void *out, int n) +{ + for (int i = 0; i < n; i++) + { + asm volatile("nop" ::: "memory"); + vint16m2_t v0 = __riscv_vle16_v_i16m2 (in, 4);in+=100; + v0 = __riscv_vwadd_wv_i16m2_tu (v0, v0, __riscv_vreinterpret_v_i16m1_i8m1 (__riscv_vget_v_i16m2_i16m1 (v0, 1)), 4); + asm volatile("nop" ::: "memory"); + vint16m2_t v1 = __riscv_vle16_v_i16m2 (in, 4);in+=100; + v1 = __riscv_vwadd_wv_i16m2_tu (v1, v1, __riscv_vreinterpret_v_i16m1_i8m1 (__riscv_vget_v_i16m2_i16m1 (v1, 1)), 4); + asm volatile("nop" ::: "memory"); + vint16m2_t v2 = __riscv_vle16_v_i16m2 (in, 4);in+=100; + v2 = __riscv_vwadd_wv_i16m2_tu (v2, v2, __riscv_vreinterpret_v_i16m1_i8m1 (__riscv_vget_v_i16m2_i16m1 (v2, 1)), 4); + asm volatile("nop" ::: "memory"); + vint16m2_t v3 = __riscv_vle16_v_i16m2 (in, 4);in+=100; + v3 = __riscv_vwadd_wv_i16m2_tu (v3, v3, __riscv_vreinterpret_v_i16m1_i8m1 (__riscv_vget_v_i16m2_i16m1 (v3, 1)), 4); + asm volatile("nop" ::: "memory"); + vint16m2_t v4 = __riscv_vle16_v_i16m2 (in, 4);in+=100; + v4 = __riscv_vwadd_wv_i16m2_tu (v4, v4, __riscv_vreinterpret_v_i16m1_i8m1 (__riscv_vget_v_i16m2_i16m1 (v4, 1)), 4); + asm volatile("nop" ::: "memory"); + vint16m2_t v5 = __riscv_vle16_v_i16m2 (in, 4);in+=100; + v5 = __riscv_vwadd_wv_i16m2_tu (v5, v5, __riscv_vreinterpret_v_i16m1_i8m1 (__riscv_vget_v_i16m2_i16m1 (v5, 1)), 4); + asm volatile("nop" ::: "memory"); + vint16m2_t v6 = __riscv_vle16_v_i16m2 (in, 4);in+=100; + v6 = __riscv_vwadd_wv_i16m2_tu (v6, v6, __riscv_vreinterpret_v_i16m1_i8m1 (__riscv_vget_v_i16m2_i16m1 (v6, 1)), 4); + asm volatile("nop" ::: "memory"); + vint16m2_t v7 = __riscv_vle16_v_i16m2 (in, 4);in+=100; + v7 = __riscv_vwadd_wv_i16m2_tu (v7, v7, __riscv_vreinterpret_v_i16m1_i8m1 (__riscv_vget_v_i16m2_i16m1 (v7, 1)), 4); + asm volatile("nop" ::: "memory"); + vint16m2_t v8 = __riscv_vle16_v_i16m2 (in, 4);in+=100; + v8 = __riscv_vwadd_wv_i16m2_tu (v8, v8, __riscv_vreinterpret_v_i16m1_i8m1 (__riscv_vget_v_i16m2_i16m1 (v8, 1)), 4); + asm volatile("nop" ::: "memory"); + vint16m2_t v9 = __riscv_vle16_v_i16m2 (in, 4);in+=100; + v9 = __riscv_vwadd_wv_i16m2_tu (v9, v9, __riscv_vreinterpret_v_i16m1_i8m1 (__riscv_vget_v_i16m2_i16m1 (v9, 1)), 4); + asm volatile("nop" ::: "memory"); + vint16m2_t v10 = __riscv_vle16_v_i16m2 (in, 4);in+=100; + v10 = __riscv_vwadd_wv_i16m2_tu (v10, v10, __riscv_vreinterpret_v_i16m1_i8m1 (__riscv_vget_v_i16m2_i16m1 (v10, 1)), 4); + asm volatile("nop" ::: "memory"); + vint16m2_t v11 = __riscv_vle16_v_i16m2 (in, 4);in+=100; + v11 = __riscv_vwadd_wv_i16m2_tu (v11, v11, __riscv_vreinterpret_v_i16m1_i8m1 (__riscv_vget_v_i16m2_i16m1 (v11, 1)), 4); + asm volatile("nop" ::: "memory"); + vint16m2_t v12 = __riscv_vle16_v_i16m2 (in, 4);in+=100; + v12 = __riscv_vwadd_wv_i16m2_tu (v12, v12, __riscv_vreinterpret_v_i16m1_i8m1 (__riscv_vget_v_i16m2_i16m1 (v12, 1)), 4); + asm volatile("nop" ::: "memory"); + vint16m2_t v13 = __riscv_vle16_v_i16m2 (in, 4);in+=100; + v13 = __riscv_vwadd_wv_i16m2_tu (v13, v13, __riscv_vreinterpret_v_i16m1_i8m1 (__riscv_vget_v_i16m2_i16m1 (v13, 1)), 4); + asm volatile("nop" ::: "memory"); + vint16m2_t v14 = __riscv_vle16_v_i16m2 (in, 4);in+=100; + v14 = __riscv_vwadd_wv_i16m2_tu (v14, v14, __riscv_vreinterpret_v_i16m1_i8m1 (__riscv_vget_v_i16m2_i16m1 (v14, 1)), 4); + asm volatile("nop" ::: "memory"); + vint8m1_t v15_n = __riscv_vle8_v_i8m1 (in, 4);in+=100; + vint16m2_t v15 = __riscv_vwcvt_x_x_v_i16m2 (v15_n, 4); + + asm volatile("nop" ::: "memory"); + __riscv_vsse16_v_i16m2 (out, 4, v0, 4);out+=100; + __riscv_vsse16_v_i16m2 (out, 4, v1, 4);out+=100; + __riscv_vsse16_v_i16m2 (out, 4, v2, 4);out+=100; + __riscv_vsse16_v_i16m2 (out, 4, v3, 4);out+=100; + __riscv_vsse16_v_i16m2 (out, 4, v4, 4);out+=100; + __riscv_vsse16_v_i16m2 (out, 4, v5, 4);out+=100; + __riscv_vsse16_v_i16m2 (out, 4, v6, 4);out+=100; + __riscv_vsse16_v_i16m2 (out, 4, v7, 4);out+=100; + __riscv_vsse16_v_i16m2 (out, 4, v8, 4);out+=100; + __riscv_vsse16_v_i16m2 (out, 4, v9, 4);out+=100; + __riscv_vsse16_v_i16m2 (out, 4, v10, 4);out+=100; + __riscv_vsse16_v_i16m2 (out, 4, v11, 4);out+=100; + __riscv_vsse16_v_i16m2 (out, 4, v12, 4);out+=100; + __riscv_vsse16_v_i16m2 (out, 4, v13, 4);out+=100; + __riscv_vsse16_v_i16m2 (out, 4, v14, 4);out+=100; + __riscv_vsse16_v_i16m2 (out, 4, v15, 4);out+=100; + } +} + +/* { dg-final { scan-assembler-not {vmv1r} } } */ +/* { dg-final { scan-assembler-not {vmv2r} } } */ +/* { dg-final { scan-assembler-not {vmv4r} } } */ +/* { dg-final { scan-assembler-not {vmv8r} } } */ +/* { dg-final { scan-assembler-not {csrr} } } */ diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/pr112431-40.c b/gcc/testsuite/gcc.target/riscv/rvv/base/pr112431-40.c new file mode 100644 index 00000000000..e44b8010579 --- /dev/null +++ b/gcc/testsuite/gcc.target/riscv/rvv/base/pr112431-40.c @@ -0,0 +1,94 @@ +/* { dg-do compile } */ +/* { dg-options "-march=rv64gcv -mabi=lp64d -O3" } */ + +#include "riscv_vector.h" + +void +foo (void *in, void *out, int n) +{ + for (int i = 0; i < n; i++) + { + asm volatile("nop" ::: "memory"); + vint16m4_t v0 = __riscv_vle16_v_i16m4 (in, 4);in+=100; + v0 = __riscv_vwsub_wv_i16m4_tu (v0, v0, __riscv_vreinterpret_v_i16m2_i8m2 (__riscv_vget_v_i16m4_i16m2 (v0, 1)), 4); + asm volatile("nop" ::: "memory"); + vint16m4_t v1 = __riscv_vle16_v_i16m4 (in, 4);in+=100; + v1 = __riscv_vwsub_wv_i16m4_tu (v1, v1, __riscv_vreinterpret_v_i16m2_i8m2 (__riscv_vget_v_i16m4_i16m2 (v1, 1)), 4); + asm volatile("nop" ::: "memory"); + vint16m4_t v2 = __riscv_vle16_v_i16m4 (in, 4);in+=100; + v2 = __riscv_vwsub_wv_i16m4_tu (v2, v2, __riscv_vreinterpret_v_i16m2_i8m2 (__riscv_vget_v_i16m4_i16m2 (v2, 1)), 4); + asm volatile("nop" ::: "memory"); + vint16m4_t v3 = __riscv_vle16_v_i16m4 (in, 4);in+=100; + v3 = __riscv_vwsub_wv_i16m4_tu (v3, v3, __riscv_vreinterpret_v_i16m2_i8m2 (__riscv_vget_v_i16m4_i16m2 (v3, 1)), 4); + asm volatile("nop" ::: "memory"); + vint16m4_t v4 = __riscv_vle16_v_i16m4 (in, 4);in+=100; + v4 = __riscv_vwsub_wv_i16m4_tu (v4, v4, __riscv_vreinterpret_v_i16m2_i8m2 (__riscv_vget_v_i16m4_i16m2 (v4, 1)), 4); + asm volatile("nop" ::: "memory"); + vint16m4_t v5 = __riscv_vle16_v_i16m4 (in, 4);in+=100; + v5 = __riscv_vwsub_wv_i16m4_tu (v5, v5, __riscv_vreinterpret_v_i16m2_i8m2 (__riscv_vget_v_i16m4_i16m2 (v5, 1)), 4); + asm volatile("nop" ::: "memory"); + vint16m4_t v6 = __riscv_vle16_v_i16m4 (in, 4);in+=100; + v6 = __riscv_vwsub_wv_i16m4_tu (v6, v6, __riscv_vreinterpret_v_i16m2_i8m2 (__riscv_vget_v_i16m4_i16m2 (v6, 1)), 4); + asm volatile("nop" ::: "memory"); + vint8m2_t v7_n = __riscv_vle8_v_i8m2 (in, 4);in+=100; + vint16m4_t v7 = __riscv_vwcvt_x_x_v_i16m4 (v7_n, 4); + + asm volatile("nop" ::: "memory"); + __riscv_vsse16_v_i16m4 (out, 4, v0, 4);out+=100; + __riscv_vsse16_v_i16m4 (out, 4, v1, 4);out+=100; + __riscv_vsse16_v_i16m4 (out, 4, v2, 4);out+=100; + __riscv_vsse16_v_i16m4 (out, 4, v3, 4);out+=100; + __riscv_vsse16_v_i16m4 (out, 4, v4, 4);out+=100; + __riscv_vsse16_v_i16m4 (out, 4, v5, 4);out+=100; + __riscv_vsse16_v_i16m4 (out, 4, v6, 4);out+=100; + __riscv_vsse16_v_i16m4 (out, 4, v7, 4);out+=100; + } +} + +void +foo2 (void *in, void *out, int n) +{ + for (int i = 0; i < n; i++) + { + asm volatile("nop" ::: "memory"); + vint16m4_t v0 = __riscv_vle16_v_i16m4 (in, 4);in+=100; + v0 = __riscv_vwadd_wv_i16m4_tu (v0, v0, __riscv_vreinterpret_v_i16m2_i8m2 (__riscv_vget_v_i16m4_i16m2 (v0, 1)), 4); + asm volatile("nop" ::: "memory"); + vint16m4_t v1 = __riscv_vle16_v_i16m4 (in, 4);in+=100; + v1 = __riscv_vwadd_wv_i16m4_tu (v1, v1, __riscv_vreinterpret_v_i16m2_i8m2 (__riscv_vget_v_i16m4_i16m2 (v1, 1)), 4); + asm volatile("nop" ::: "memory"); + vint16m4_t v2 = __riscv_vle16_v_i16m4 (in, 4);in+=100; + v2 = __riscv_vwadd_wv_i16m4_tu (v2, v2, __riscv_vreinterpret_v_i16m2_i8m2 (__riscv_vget_v_i16m4_i16m2 (v2, 1)), 4); + asm volatile("nop" ::: "memory"); + vint16m4_t v3 = __riscv_vle16_v_i16m4 (in, 4);in+=100; + v3 = __riscv_vwadd_wv_i16m4_tu (v3, v3, __riscv_vreinterpret_v_i16m2_i8m2 (__riscv_vget_v_i16m4_i16m2 (v3, 1)), 4); + asm volatile("nop" ::: "memory"); + vint16m4_t v4 = __riscv_vle16_v_i16m4 (in, 4);in+=100; + v4 = __riscv_vwadd_wv_i16m4_tu (v4, v4, __riscv_vreinterpret_v_i16m2_i8m2 (__riscv_vget_v_i16m4_i16m2 (v4, 1)), 4); + asm volatile("nop" ::: "memory"); + vint16m4_t v5 = __riscv_vle16_v_i16m4 (in, 4);in+=100; + v5 = __riscv_vwadd_wv_i16m4_tu (v5, v5, __riscv_vreinterpret_v_i16m2_i8m2 (__riscv_vget_v_i16m4_i16m2 (v5, 1)), 4); + asm volatile("nop" ::: "memory"); + vint16m4_t v6 = __riscv_vle16_v_i16m4 (in, 4);in+=100; + v6 = __riscv_vwadd_wv_i16m4_tu (v6, v6, __riscv_vreinterpret_v_i16m2_i8m2 (__riscv_vget_v_i16m4_i16m2 (v6, 1)), 4); + asm volatile("nop" ::: "memory"); + vint8m2_t v7_n = __riscv_vle8_v_i8m2 (in, 4);in+=100; + vint16m4_t v7 = __riscv_vwcvt_x_x_v_i16m4 (v7_n, 4); + + asm volatile("nop" ::: "memory"); + __riscv_vsse16_v_i16m4 (out, 4, v0, 4);out+=100; + __riscv_vsse16_v_i16m4 (out, 4, v1, 4);out+=100; + __riscv_vsse16_v_i16m4 (out, 4, v2, 4);out+=100; + __riscv_vsse16_v_i16m4 (out, 4, v3, 4);out+=100; + __riscv_vsse16_v_i16m4 (out, 4, v4, 4);out+=100; + __riscv_vsse16_v_i16m4 (out, 4, v5, 4);out+=100; + __riscv_vsse16_v_i16m4 (out, 4, v6, 4);out+=100; + __riscv_vsse16_v_i16m4 (out, 4, v7, 4);out+=100; + } +} + +/* { dg-final { scan-assembler-not {vmv1r} } } */ +/* { dg-final { scan-assembler-not {vmv2r} } } */ +/* { dg-final { scan-assembler-not {vmv4r} } } */ +/* { dg-final { scan-assembler-not {vmv8r} } } */ +/* { dg-final { scan-assembler-not {csrr} } } */ diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/pr112431-41.c b/gcc/testsuite/gcc.target/riscv/rvv/base/pr112431-41.c new file mode 100644 index 00000000000..dc27006f6f9 --- /dev/null +++ b/gcc/testsuite/gcc.target/riscv/rvv/base/pr112431-41.c @@ -0,0 +1,62 @@ +/* { dg-do compile } */ +/* { dg-options "-march=rv64gcv -mabi=lp64d -O3" } */ + +#include "riscv_vector.h" + +void +foo (void *in, void *out, int n) +{ + for (int i = 0; i < n; i++) + { + asm volatile("nop" ::: "memory"); + vint16m8_t v0 = __riscv_vle16_v_i16m8 (in, 4);in+=100; + v0 = __riscv_vwsub_wv_i16m8_tu (v0, v0, __riscv_vreinterpret_v_i16m4_i8m4 (__riscv_vget_v_i16m8_i16m4 (v0, 1)), 4); + asm volatile("nop" ::: "memory"); + vint16m8_t v1 = __riscv_vle16_v_i16m8 (in, 4);in+=100; + v1 = __riscv_vwsub_wv_i16m8_tu (v1, v1, __riscv_vreinterpret_v_i16m4_i8m4 (__riscv_vget_v_i16m8_i16m4 (v1, 1)), 4); + asm volatile("nop" ::: "memory"); + vint16m8_t v2 = __riscv_vle16_v_i16m8 (in, 4);in+=100; + v2 = __riscv_vwsub_wv_i16m8_tu (v2, v2, __riscv_vreinterpret_v_i16m4_i8m4 (__riscv_vget_v_i16m8_i16m4 (v2, 1)), 4); + asm volatile("nop" ::: "memory"); + vint8m4_t v3_n = __riscv_vle8_v_i8m4 (in, 4);in+=100; + vint16m8_t v3 = __riscv_vwcvt_x_x_v_i16m8 (v3_n, 4); + + asm volatile("nop" ::: "memory"); + __riscv_vsse16_v_i16m8 (out, 4, v0, 4);out+=100; + __riscv_vsse16_v_i16m8 (out, 4, v1, 4);out+=100; + __riscv_vsse16_v_i16m8 (out, 4, v2, 4);out+=100; + __riscv_vsse16_v_i16m8 (out, 4, v3, 4);out+=100; + } +} + +void +foo2 (void *in, void *out, int n) +{ + for (int i = 0; i < n; i++) + { + asm volatile("nop" ::: "memory"); + vint16m8_t v0 = __riscv_vle16_v_i16m8 (in, 4);in+=100; + v0 = __riscv_vwadd_wv_i16m8_tu (v0, v0, __riscv_vreinterpret_v_i16m4_i8m4 (__riscv_vget_v_i16m8_i16m4 (v0, 1)), 4); + asm volatile("nop" ::: "memory"); + vint16m8_t v1 = __riscv_vle16_v_i16m8 (in, 4);in+=100; + v1 = __riscv_vwadd_wv_i16m8_tu (v1, v1, __riscv_vreinterpret_v_i16m4_i8m4 (__riscv_vget_v_i16m8_i16m4 (v1, 1)), 4); + asm volatile("nop" ::: "memory"); + vint16m8_t v2 = __riscv_vle16_v_i16m8 (in, 4);in+=100; + v2 = __riscv_vwadd_wv_i16m8_tu (v2, v2, __riscv_vreinterpret_v_i16m4_i8m4 (__riscv_vget_v_i16m8_i16m4 (v2, 1)), 4); + asm volatile("nop" ::: "memory"); + vint8m4_t v3_n = __riscv_vle8_v_i8m4 (in, 4);in+=100; + vint16m8_t v3 = __riscv_vwcvt_x_x_v_i16m8 (v3_n, 4); + + asm volatile("nop" ::: "memory"); + __riscv_vsse16_v_i16m8 (out, 4, v0, 4);out+=100; + __riscv_vsse16_v_i16m8 (out, 4, v1, 4);out+=100; + __riscv_vsse16_v_i16m8 (out, 4, v2, 4);out+=100; + __riscv_vsse16_v_i16m8 (out, 4, v3, 4);out+=100; + } +} + +/* { dg-final { scan-assembler-not {vmv1r} } } */ +/* { dg-final { scan-assembler-not {vmv2r} } } */ +/* { dg-final { scan-assembler-not {vmv4r} } } */ +/* { dg-final { scan-assembler-not {vmv8r} } } */ +/* { dg-final { scan-assembler-not {csrr} } } */