From patchwork Mon Dec 4 08:51:06 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "juzhe.zhong@rivai.ai" X-Patchwork-Id: 173140 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:bcd1:0:b0:403:3b70:6f57 with SMTP id r17csp2629555vqy; Mon, 4 Dec 2023 00:51:52 -0800 (PST) X-Google-Smtp-Source: AGHT+IFrhwKfxSvef9gUXZ9FTSwkADWOQr80ocmHGxaZHhj9oM4v69BR9nB7YLS4bF6cyhidFqf9 X-Received: by 2002:a05:622a:1805:b0:421:b997:968c with SMTP id t5-20020a05622a180500b00421b997968cmr5458596qtc.24.1701679912389; Mon, 04 Dec 2023 00:51:52 -0800 (PST) ARC-Seal: i=2; a=rsa-sha256; t=1701679912; cv=pass; d=google.com; s=arc-20160816; b=dnJkjKltDGqdANbcI+tdCp1LeXUuZnucn2QAzz42tBx1Jq3NowYmjqZTgFt9wxsCEh ukxETTUKQX7uzJuImXA3PH3f9WEx/wViYs2ND8DKcVJi1JyGquLAUtoeAjhHHI/ihJEf gs8vzp81tMyWL6woK8qcvHCt72bYIZ4n1Sn+8s0Xrzu0szjp6dVi+dGkGH2+zXp0xMR6 QhfZ5Zi7xW/PCDZJ7QbHstqaJ0tTqYdumiIgzHoJmkYl/qTuBG2ZQS9aJYxMiDqP3txi NLA2o1v31cx6hGYaFro/JbPagLXDQYP8UhXaFFUYOUcM5sS+WrrFZAmoirv1C7d0xZBT nOTQ== ARC-Message-Signature: i=2; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=errors-to:list-subscribe:list-help:list-post:list-archive :list-unsubscribe:list-id:precedence:feedback-id :content-transfer-encoding:mime-version:message-id:date:subject:cc :to:from:arc-filter:dmarc-filter:delivered-to; bh=WEo+qrGU2hos71hQx5GAR0f8z9eaxOQuaf87fkGjPbY=; fh=12MRPJmZ1mgDpHqWoogMKqnaGRGM2b7lcuJroqfjJiw=; b=PT4Ev94TFtGOP2kMFu4iu6KTjyoUiXkwpP2KOIx4k3lP0qnEluKlSiim+JfSWStnER lfw8kfGTeVdtxeVAG5wOyCCJ/S6LJnicXIqxR8ie+PhqRZteLGGcrrRZpbsyAzGRlenW BaVcKWqBSwgD3QcDBoVmBZ6OAM82iEKSNXOECcJg30znRdxJWKVtDcVwSo7y+fbQ7hIt XVZZRItYqAIMajx0q0zNieWux4AavhQIDHsQbEHi7MvQYN6mSEYkOCEVlpfYXM+2aNOG 9Q1qLLmOAgKBjFTUBLqvIsk9cUIUQUN5XaaP3P3J4AoNGw72GD3G31ls0QLxXLFWcatp hDjQ== ARC-Authentication-Results: i=2; mx.google.com; arc=pass (i=1); spf=pass (google.com: domain of gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org designates 2620:52:3:1:0:246e:9693:128c as permitted sender) smtp.mailfrom="gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org" Received: from server2.sourceware.org (server2.sourceware.org. [2620:52:3:1:0:246e:9693:128c]) by mx.google.com with ESMTPS id m16-20020ac86890000000b004214ba70700si8120715qtq.408.2023.12.04.00.51.52 for (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 04 Dec 2023 00:51:52 -0800 (PST) Received-SPF: pass (google.com: domain of gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org designates 2620:52:3:1:0:246e:9693:128c as permitted sender) client-ip=2620:52:3:1:0:246e:9693:128c; Authentication-Results: mx.google.com; arc=pass (i=1); spf=pass (google.com: domain of gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org designates 2620:52:3:1:0:246e:9693:128c as permitted sender) smtp.mailfrom="gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org" Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 27F2A3857710 for ; Mon, 4 Dec 2023 08:51:52 +0000 (GMT) X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from smtpbg153.qq.com (smtpbg153.qq.com [13.245.218.24]) by sourceware.org (Postfix) with ESMTPS id 477793858C5E for ; Mon, 4 Dec 2023 08:51:18 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 477793858C5E Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=rivai.ai Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=rivai.ai ARC-Filter: OpenARC Filter v1.0.0 sourceware.org 477793858C5E Authentication-Results: server2.sourceware.org; arc=none smtp.remote-ip=13.245.218.24 ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1701679883; cv=none; b=bv0YJf3EWLafMnCFjl5o7R79lij+V2GIq+Usgn+zZgBl1B/pPFemJbGlbDL0Z0IvjDc6E6pF362/qEhzcKOiHaXmLn4AFjSHnpMHF6LCrME9LJeHNeWWT1TfzrmWecCY4iYMRDAQkvkN+wgFW2bWpbus+2g32oOkBuJmldFdnMg= ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1701679883; c=relaxed/simple; bh=HsuGNSTCijrYRGjDMu+wNKNle9j2xVbMcibKDMJ1oqo=; h=From:To:Subject:Date:Message-Id:MIME-Version; b=Il3goIS58s6CmunbOsLpFLOS3BoDPJJgs+gOC3iKj3ktyC+1cGet53wktMZ39s0d1XMmNgF+XAxKPBaizZp+9DY65IzNF0XaJPoN0z/y3j9nHV41PUdrhAHfch+Pg8INYyh95h0xsZOQpNxM7xy6kOehEg3XacMqFuVuRK0Dt/M= ARC-Authentication-Results: i=1; server2.sourceware.org X-QQ-mid: bizesmtp70t1701679868tdvhn25u Received: from rios-cad121.hadoop.rioslab.org ( [58.60.1.9]) by bizesmtp.qq.com (ESMTP) with id ; Mon, 04 Dec 2023 16:51:07 +0800 (CST) X-QQ-SSF: 01400000000000G0V000000A0000000 X-QQ-FEAT: 3M0okmaRx3j6TIkapngsWlM5DtZm1Obm9sx1m1KE0nLGYgup62pkpLK41f0TA CS4jYN/RTWjRxL7dp+0aDmKGaui3qNUFMSZYIvoDVfC5Fst7+MO8i+6DheeUm4fWF+7O9lC axh9atWssN21JNuP9jQ4qO/9pP1aJUPo/HkcR8S9sEK8AaiT4pfE9H2e44U20ZPbRBTW9Ob yGbshqfYXkogSWaYbc4jEJLRxh4RPPWMsDBwhfki7sSvd5/lrtv3vecKO+XVPXFczM3erdz NkLSvGigW8xrnR21hWvJ1Ene20551Umvuf0gQsZnYSZpXQhPw4tg3Kup3ea7BvKYp2FIHAT Uu2pSfWbfqdMDi2asU8bq5DcI9feOTr9JdcPT1bYt1S+1zmBg8F6y0qWFsBhrI2qJZfXgfz FQXFTfJ34u9ZCj2+SLDI6g== X-QQ-GoodBg: 2 X-BIZMAIL-ID: 2712120611328746950 From: Juzhe-Zhong To: gcc-patches@gcc.gnu.org Cc: kito.cheng@gmail.com, kito.cheng@sifive.com, jeffreyalaw@gmail.com, rdapp.gcc@gmail.com, Juzhe-Zhong Subject: [PATCH] RISC-V: Remove earlyclobber from widen reduction Date: Mon, 4 Dec 2023 16:51:06 +0800 Message-Id: <20231204085106.400729-1-juzhe.zhong@rivai.ai> X-Mailer: git-send-email 2.36.3 MIME-Version: 1.0 X-QQ-SENDSIZE: 520 Feedback-ID: bizesmtp:rivai.ai:qybglogicsvrgz:qybglogicsvrgz7a-one-0 X-Spam-Status: No, score=-12.2 required=5.0 tests=BAYES_00, GIT_PATCH_0, KAM_DMARC_STATUS, KAM_SHORT, RCVD_IN_DNSWL_NONE, RCVD_IN_MSPIKE_H2, SPF_HELO_PASS, SPF_PASS, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.30 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org X-getmail-retrieved-from-mailbox: INBOX X-GMAIL-THRID: 1784340715513521048 X-GMAIL-MSGID: 1784340715513521048 Since the destination of reduction is not a vector register group, there is no need to apply overlap constraint. Also confirm Clang: The mir in LLVM has early clobber: early-clobber %49:vrm2 = PseudoVWADD_VX_M1 $noreg(tied-def 0), killed %17:vr, %48:gpr, %0:gprnox0, 3, 0; example.c:59:24 The mir in LLVM doesn't have early clobber: %48:vr = PseudoVWREDSUM_VS_M2_E8 $noreg(tied-def 0), %17:vrm2, killed %33:vr, %0:gprnox0, 3, 1; example.c:60:26 And also confirm both: vwredsum.vs v24, v8, v24 and vwredsum.vs v8, v8, v24 all legal on LLVM. Align with LLVM and honor RISC-V V spec, remove earlyclobber. Before this patch: vwredsum.vs v8,v24,v8 vwredsum.vs v7,v22,v7 vwredsum.vs v6,v20,v6 vwredsum.vs v5,v18,v5 vwredsum.vs v4,v16,v4 vwredsum.vs v3,v14,v3 vwredsum.vs v2,v12,v2 vwredsum.vs v1,v10,v1 vmv1r.v v9,v8 vwredsum.vs v9,v24,v9 vmv1r.v v24,v7 vwredsum.vs v24,v22,v24 vmv1r.v v22,v6 vwredsum.vs v22,v20,v22 vmv1r.v v20,v5 vwredsum.vs v20,v18,v20 vmv1r.v v18,v4 vwredsum.vs v18,v16,v18 vmv1r.v v16,v3 vwredsum.vs v16,v14,v16 vmv1r.v v14,v2 vwredsum.vs v14,v12,v14 vmv1r.v v12,v1 vwredsum.vs v12,v10,v12 After this patch: vfwredusum.vs v17,v12,v17 vfwredusum.vs v18,v10,v18 vfwredusum.vs v15,v26,v15 vfwredusum.vs v16,v24,v16 vfwredusum.vs v12,v12,v17 vfwredusum.vs v10,v10,v18 vfwredusum.vs v13,v6,v20 vfwredusum.vs v11,v8,v19 vfwredusum.vs v6,v6,v13 vfwredusum.vs v8,v8,v11 vfwredusum.vs v7,v4,v21 vfwredusum.vs v9,v2,v22 vfwredusum.vs v14,v26,v15 vfwredusum.vs v1,v24,v16 vfwredusum.vs v4,v4,v7 vfwredusum.vs v2,v2,v9 Same behavior as LLVM, and honor RISC-V V spec. PR 112431 gcc/ChangeLog: * config/riscv/vector.md: Remove earlyclobber from widen reduction. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/base/pr112431-35.c: New test. * gcc.target/riscv/rvv/base/pr112431-36.c: New test. --- gcc/config/riscv/vector.md | 8 +- .../gcc.target/riscv/rvv/base/pr112431-35.c | 107 ++++++++++++++++++ .../gcc.target/riscv/rvv/base/pr112431-36.c | 107 ++++++++++++++++++ 3 files changed, 218 insertions(+), 4 deletions(-) create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr112431-35.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr112431-36.c diff --git a/gcc/config/riscv/vector.md b/gcc/config/riscv/vector.md index 731057416cd..72cf3553e45 100644 --- a/gcc/config/riscv/vector.md +++ b/gcc/config/riscv/vector.md @@ -7861,7 +7861,7 @@ ;; Integer Widen Reduction Sum (vwredsum[u].vs) (define_insn "@pred_" - [(set (match_operand: 0 "register_operand" "=&vr,&vr") + [(set (match_operand: 0 "register_operand" "=vr, vr") (unspec: [(unspec: [(match_operand: 1 "vector_mask_operand" "vmWc1,vmWc1") @@ -7872,7 +7872,7 @@ (reg:SI VTYPE_REGNUM)] UNSPEC_VPREDICATE) (unspec: [ (match_operand:VI_QHS 3 "register_operand" " vr, vr") - (match_operand: 4 "register_operand" " vr0, vr0") + (match_operand: 4 "register_operand" " vr, vr") ] ANY_WREDUC) (match_operand: 2 "vector_merge_operand" " vu, 0")] UNSPEC_REDUC))] "TARGET_VECTOR" @@ -7928,7 +7928,7 @@ ;; Float Widen Reduction Sum (vfwred[ou]sum.vs) (define_insn "@pred_" - [(set (match_operand: 0 "register_operand" "=&vr, &vr") + [(set (match_operand: 0 "register_operand" "=vr, vr") (unspec: [(unspec: [(match_operand: 1 "vector_mask_operand" "vmWc1,vmWc1") @@ -7941,7 +7941,7 @@ (reg:SI FRM_REGNUM)] UNSPEC_VPREDICATE) (unspec: [ (match_operand:VF_HS 3 "register_operand" " vr, vr") - (match_operand: 4 "register_operand" " vr0, vr0") + (match_operand: 4 "register_operand" " vr, vr") ] ANY_FWREDUC_SUM) (match_operand: 2 "vector_merge_operand" " vu, 0")] UNSPEC_REDUC))] "TARGET_VECTOR" diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/pr112431-35.c b/gcc/testsuite/gcc.target/riscv/rvv/base/pr112431-35.c new file mode 100644 index 00000000000..6f72e93aa38 --- /dev/null +++ b/gcc/testsuite/gcc.target/riscv/rvv/base/pr112431-35.c @@ -0,0 +1,107 @@ +/* { dg-do compile } */ +/* { dg-options "-march=rv64gcv -mabi=lp64d -O3" } */ + +#include "riscv_vector.h" + +size_t __attribute__ ((noinline)) +sumation (size_t sum0, size_t sum1, size_t sum2, size_t sum3, size_t sum4, + size_t sum5, size_t sum6, size_t sum7, + size_t sum0_2, size_t sum1_2, size_t sum2_2, size_t sum3_2, size_t sum4_2, + size_t sum5_2, size_t sum6_2, size_t sum7_2) +{ + return sum0 + sum1 + sum2 + sum3 + sum4 + sum5 + sum6 + sum7 + + sum0_2 + sum1_2 + sum2_2 + sum3_2 + sum4_2 + sum5_2 + sum6_2 + sum7_2; +} + +size_t +foo (char const *buf, size_t len) +{ + size_t sum = 0; + size_t vl = __riscv_vsetvlmax_e8m8 (); + size_t step = vl * 4; + const char *it = buf, *end = buf + len; + for (; it + step <= end;) + { + vint8m2_t v0 = __riscv_vle8_v_i8m2 ((void *) it, vl); + it += vl; + vint8m2_t v1 = __riscv_vle8_v_i8m2 ((void *) it, vl); + it += vl; + vint8m2_t v2 = __riscv_vle8_v_i8m2 ((void *) it, vl); + it += vl; + vint8m2_t v3 = __riscv_vle8_v_i8m2 ((void *) it, vl); + it += vl; + vint8m2_t v4 = __riscv_vle8_v_i8m2 ((void *) it, vl); + it += vl; + vint8m2_t v5 = __riscv_vle8_v_i8m2 ((void *) it, vl); + it += vl; + vint8m2_t v6 = __riscv_vle8_v_i8m2 ((void *) it, vl); + it += vl; + vint8m2_t v7 = __riscv_vle8_v_i8m2 ((void *) it, vl); + it += vl; + + vint16m1_t vw0 = __riscv_vle16_v_i16m1 ((void *) it, vl); + it += vl; + vint16m1_t vw1 = __riscv_vle16_v_i16m1 ((void *) it, vl); + it += vl; + vint16m1_t vw2 = __riscv_vle16_v_i16m1 ((void *) it, vl); + it += vl; + vint16m1_t vw3 = __riscv_vle16_v_i16m1 ((void *) it, vl); + it += vl; + vint16m1_t vw4 = __riscv_vle16_v_i16m1 ((void *) it, vl); + it += vl; + vint16m1_t vw5 = __riscv_vle16_v_i16m1 ((void *) it, vl); + it += vl; + vint16m1_t vw6 = __riscv_vle16_v_i16m1 ((void *) it, vl); + it += vl; + vint16m1_t vw7 = __riscv_vle16_v_i16m1 ((void *) it, vl); + it += vl; + + asm volatile("nop" ::: "memory"); + vint16m1_t vw0_2 = __riscv_vwredsum_vs_i8m2_i16m1 (v0, vw0, vl); + vint16m1_t vw1_2 = __riscv_vwredsum_vs_i8m2_i16m1 (v1, vw1, vl); + vint16m1_t vw2_2 = __riscv_vwredsum_vs_i8m2_i16m1 (v2, vw2, vl); + vint16m1_t vw3_2 = __riscv_vwredsum_vs_i8m2_i16m1 (v3, vw3, vl); + vint16m1_t vw4_2 = __riscv_vwredsum_vs_i8m2_i16m1 (v4, vw4, vl); + vint16m1_t vw5_2 = __riscv_vwredsum_vs_i8m2_i16m1 (v5, vw5, vl); + vint16m1_t vw6_2 = __riscv_vwredsum_vs_i8m2_i16m1 (v6, vw6, vl); + vint16m1_t vw7_2 = __riscv_vwredsum_vs_i8m2_i16m1 (v7, vw7, vl); + + vw0 = __riscv_vwredsum_vs_i8m2_i16m1 (v0, vw0_2, vl); + vw1 = __riscv_vwredsum_vs_i8m2_i16m1 (v1, vw1_2, vl); + vw2 = __riscv_vwredsum_vs_i8m2_i16m1 (v2, vw2_2, vl); + vw3 = __riscv_vwredsum_vs_i8m2_i16m1 (v3, vw3_2, vl); + vw4 = __riscv_vwredsum_vs_i8m2_i16m1 (v4, vw4_2, vl); + vw5 = __riscv_vwredsum_vs_i8m2_i16m1 (v5, vw5_2, vl); + vw6 = __riscv_vwredsum_vs_i8m2_i16m1 (v6, vw6_2, vl); + vw7 = __riscv_vwredsum_vs_i8m2_i16m1 (v7, vw7_2, vl); + + asm volatile("nop" ::: "memory"); + size_t sum0 = __riscv_vmv_x_s_i16m1_i16 (vw0); + size_t sum1 = __riscv_vmv_x_s_i16m1_i16 (vw1); + size_t sum2 = __riscv_vmv_x_s_i16m1_i16 (vw2); + size_t sum3 = __riscv_vmv_x_s_i16m1_i16 (vw3); + size_t sum4 = __riscv_vmv_x_s_i16m1_i16 (vw4); + size_t sum5 = __riscv_vmv_x_s_i16m1_i16 (vw5); + size_t sum6 = __riscv_vmv_x_s_i16m1_i16 (vw6); + size_t sum7 = __riscv_vmv_x_s_i16m1_i16 (vw7); + + size_t sum0_2 = __riscv_vmv_x_s_i16m1_i16 (vw0_2); + size_t sum1_2 = __riscv_vmv_x_s_i16m1_i16 (vw1_2); + size_t sum2_2 = __riscv_vmv_x_s_i16m1_i16 (vw2_2); + size_t sum3_2 = __riscv_vmv_x_s_i16m1_i16 (vw3_2); + size_t sum4_2 = __riscv_vmv_x_s_i16m1_i16 (vw4_2); + size_t sum5_2 = __riscv_vmv_x_s_i16m1_i16 (vw5_2); + size_t sum6_2 = __riscv_vmv_x_s_i16m1_i16 (vw6_2); + size_t sum7_2 = __riscv_vmv_x_s_i16m1_i16 (vw7_2); + + sum += sumation (sum0, sum1, sum2, sum3, sum4, sum5, sum6, sum7, + sum0_2, sum1_2, sum2_2, sum3_2, sum4_2, sum5_2, sum6_2, sum7_2); + } + return sum; +} + +/* { dg-final { scan-assembler-not {vmv1r} } } */ +/* { dg-final { scan-assembler-not {vmv2r} } } */ +/* { dg-final { scan-assembler-not {vmv4r} } } */ +/* { dg-final { scan-assembler-not {vmv8r} } } */ +/* { dg-final { scan-assembler-not {csrr} } } */ diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/pr112431-36.c b/gcc/testsuite/gcc.target/riscv/rvv/base/pr112431-36.c new file mode 100644 index 00000000000..7756bdbad46 --- /dev/null +++ b/gcc/testsuite/gcc.target/riscv/rvv/base/pr112431-36.c @@ -0,0 +1,107 @@ +/* { dg-do compile } */ +/* { dg-options "-march=rv64gcv -mabi=lp64d -O3" } */ + +#include "riscv_vector.h" + +size_t __attribute__ ((noinline)) +sumation (size_t sum0, size_t sum1, size_t sum2, size_t sum3, size_t sum4, + size_t sum5, size_t sum6, size_t sum7, + size_t sum0_2, size_t sum1_2, size_t sum2_2, size_t sum3_2, size_t sum4_2, + size_t sum5_2, size_t sum6_2, size_t sum7_2) +{ + return sum0 + sum1 + sum2 + sum3 + sum4 + sum5 + sum6 + sum7 + + sum0_2 + sum1_2 + sum2_2 + sum3_2 + sum4_2 + sum5_2 + sum6_2 + sum7_2; +} + +size_t +foo (char const *buf, size_t len) +{ + size_t sum = 0; + size_t vl = __riscv_vsetvlmax_e8m8 (); + size_t step = vl * 4; + const char *it = buf, *end = buf + len; + for (; it + step <= end;) + { + vfloat32m2_t v0 = __riscv_vle32_v_f32m2 ((void *) it, vl); + it += vl; + vfloat32m2_t v1 = __riscv_vle32_v_f32m2 ((void *) it, vl); + it += vl; + vfloat32m2_t v2 = __riscv_vle32_v_f32m2 ((void *) it, vl); + it += vl; + vfloat32m2_t v3 = __riscv_vle32_v_f32m2 ((void *) it, vl); + it += vl; + vfloat32m2_t v4 = __riscv_vle32_v_f32m2 ((void *) it, vl); + it += vl; + vfloat32m2_t v5 = __riscv_vle32_v_f32m2 ((void *) it, vl); + it += vl; + vfloat32m2_t v6 = __riscv_vle32_v_f32m2 ((void *) it, vl); + it += vl; + vfloat32m2_t v7 = __riscv_vle32_v_f32m2 ((void *) it, vl); + it += vl; + + vfloat64m1_t vw0 = __riscv_vle64_v_f64m1 ((void *) it, vl); + it += vl; + vfloat64m1_t vw1 = __riscv_vle64_v_f64m1 ((void *) it, vl); + it += vl; + vfloat64m1_t vw2 = __riscv_vle64_v_f64m1 ((void *) it, vl); + it += vl; + vfloat64m1_t vw3 = __riscv_vle64_v_f64m1 ((void *) it, vl); + it += vl; + vfloat64m1_t vw4 = __riscv_vle64_v_f64m1 ((void *) it, vl); + it += vl; + vfloat64m1_t vw5 = __riscv_vle64_v_f64m1 ((void *) it, vl); + it += vl; + vfloat64m1_t vw6 = __riscv_vle64_v_f64m1 ((void *) it, vl); + it += vl; + vfloat64m1_t vw7 = __riscv_vle64_v_f64m1 ((void *) it, vl); + it += vl; + + asm volatile("nop" ::: "memory"); + vfloat64m1_t vw0_2 = __riscv_vfwredusum_vs_f32m2_f64m1 (v0, vw0, vl); + vfloat64m1_t vw1_2 = __riscv_vfwredusum_vs_f32m2_f64m1 (v1, vw1, vl); + vfloat64m1_t vw2_2 = __riscv_vfwredusum_vs_f32m2_f64m1 (v2, vw2, vl); + vfloat64m1_t vw3_2 = __riscv_vfwredusum_vs_f32m2_f64m1 (v3, vw3, vl); + vfloat64m1_t vw4_2 = __riscv_vfwredusum_vs_f32m2_f64m1 (v4, vw4, vl); + vfloat64m1_t vw5_2 = __riscv_vfwredusum_vs_f32m2_f64m1 (v5, vw5, vl); + vfloat64m1_t vw6_2 = __riscv_vfwredusum_vs_f32m2_f64m1 (v6, vw6, vl); + vfloat64m1_t vw7_2 = __riscv_vfwredusum_vs_f32m2_f64m1 (v7, vw7, vl); + + vw0 = __riscv_vfwredusum_vs_f32m2_f64m1 (v0, vw0_2, vl); + vw1 = __riscv_vfwredusum_vs_f32m2_f64m1 (v1, vw1_2, vl); + vw2 = __riscv_vfwredusum_vs_f32m2_f64m1 (v2, vw2_2, vl); + vw3 = __riscv_vfwredusum_vs_f32m2_f64m1 (v3, vw3_2, vl); + vw4 = __riscv_vfwredusum_vs_f32m2_f64m1 (v4, vw4_2, vl); + vw5 = __riscv_vfwredusum_vs_f32m2_f64m1 (v5, vw5_2, vl); + vw6 = __riscv_vfwredusum_vs_f32m2_f64m1 (v6, vw6_2, vl); + vw7 = __riscv_vfwredusum_vs_f32m2_f64m1 (v7, vw7_2, vl); + + asm volatile("nop" ::: "memory"); + size_t sum0 = __riscv_vfmv_f_s_f64m1_f64 (vw0); + size_t sum1 = __riscv_vfmv_f_s_f64m1_f64 (vw1); + size_t sum2 = __riscv_vfmv_f_s_f64m1_f64 (vw2); + size_t sum3 = __riscv_vfmv_f_s_f64m1_f64 (vw3); + size_t sum4 = __riscv_vfmv_f_s_f64m1_f64 (vw4); + size_t sum5 = __riscv_vfmv_f_s_f64m1_f64 (vw5); + size_t sum6 = __riscv_vfmv_f_s_f64m1_f64 (vw6); + size_t sum7 = __riscv_vfmv_f_s_f64m1_f64 (vw7); + + size_t sum0_2 = __riscv_vfmv_f_s_f64m1_f64 (vw0_2); + size_t sum1_2 = __riscv_vfmv_f_s_f64m1_f64 (vw1_2); + size_t sum2_2 = __riscv_vfmv_f_s_f64m1_f64 (vw2_2); + size_t sum3_2 = __riscv_vfmv_f_s_f64m1_f64 (vw3_2); + size_t sum4_2 = __riscv_vfmv_f_s_f64m1_f64 (vw4_2); + size_t sum5_2 = __riscv_vfmv_f_s_f64m1_f64 (vw5_2); + size_t sum6_2 = __riscv_vfmv_f_s_f64m1_f64 (vw6_2); + size_t sum7_2 = __riscv_vfmv_f_s_f64m1_f64 (vw7_2); + + sum += sumation (sum0, sum1, sum2, sum3, sum4, sum5, sum6, sum7, + sum0_2, sum1_2, sum2_2, sum3_2, sum4_2, sum5_2, sum6_2, sum7_2); + } + return sum; +} + +/* { dg-final { scan-assembler-not {vmv1r} } } */ +/* { dg-final { scan-assembler-not {vmv2r} } } */ +/* { dg-final { scan-assembler-not {vmv4r} } } */ +/* { dg-final { scan-assembler-not {vmv8r} } } */ +/* { dg-final { scan-assembler-not {csrr} } } */