From patchwork Mon Oct 23 09:04:01 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "juzhe.zhong@rivai.ai"
X-Patchwork-Id: 156761
From: Juzhe-Zhong
To: gcc-patches@gcc.gnu.org
Cc: kito.cheng@gmail.com, kito.cheng@sifive.com, jeffreyalaw@gmail.com, rdapp.gcc@gmail.com, Juzhe-Zhong
Subject: [PATCH] RISC-V: Fix ICE for the fusion case from vsetvl to scalar move[PR111927]
Date: Mon, 23 Oct 2023 17:04:01 +0800
Message-Id: <20231023090401.1724890-1-juzhe.zhong@rivai.ai>
X-Mailer: git-send-email 2.36.3

ICE:

during RTL pass: vsetvl
: In function 'riscv_lms_f32':
:240:1: internal compiler error: in merge, at config/riscv/riscv-vsetvl.cc:1997
  240 | }

In general compatible_p (avl_equal_p) has:

    if (next.has_vl () && next.vl_used_by_non_rvv_insn_p ())
      return false;

Don't fuse the AVL of a vsetvl if its VL operand is used by non-RVV
instructions.

It is reasonable to add this check to 'can_use_next_avl_p' since we don't
want to fuse the AVL of a vsetvl into a scalar move instruction, which
doesn't demand AVL.  After the fusion, we always use compatible_p to check
whether the demand is correct.

        PR target/111927

gcc/ChangeLog:

        * config/riscv/riscv-vsetvl.cc: Fix ICE.

gcc/testsuite/ChangeLog:

        * gcc.target/riscv/rvv/vsetvl/pr111927.c: New test.

---
 gcc/config/riscv/riscv-vsetvl.cc              |  23 ++
 .../gcc.target/riscv/rvv/vsetvl/pr111927.c    | 243 ++++++++++++++++++
 2 files changed, 266 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr111927.c

diff --git a/gcc/config/riscv/riscv-vsetvl.cc b/gcc/config/riscv/riscv-vsetvl.cc
index 47b459fddd4..42295732ed7 100644
--- a/gcc/config/riscv/riscv-vsetvl.cc
+++ b/gcc/config/riscv/riscv-vsetvl.cc
@@ -1541,6 +1541,29 @@ private:
   inline bool can_use_next_avl_p (const vsetvl_info &prev,
                                   const vsetvl_info &next)
   {
+    /* Forbid the AVL/VL propagation if VL of NEXT is used
+       by non-RVV instructions.  This is because:
+
+         bb 2:
+           scalar move (no AVL)
+         bb 3:
+           vsetvl a5(VL), a4(AVL) ...
+           branch a5,zero
+
+       Since a user vsetvl instruction has no side effects and
+       should already be placed in the correct and optimal location
+       of the program by the previous pass, it is unreasonable for
+       the VSETVL pass to move it to another place if it is used by
+       non-RVV instructions.
+
+       Note: We only forbid the cases where VL is used by the following
+       non-RVV instructions, which would cause issues.  We don't forbid
+       other cases since they won't cause correctness issues and we still
+       want more demand info to be fused backward.  The later LCM algorithm
+       should know the optimal location of the vsetvl.  */
+    if (next.has_vl () && next.vl_used_by_non_rvv_insn_p ())
+      return false;
+
     if (!next.has_nonvlmax_reg_avl () && !next.has_vl ())
       return true;
 
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr111927.c b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr111927.c
new file mode 100644
index 00000000000..62f395fee33
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr111927.c
@@ -0,0 +1,243 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64d -O3" } */
+
+#include "riscv_vector.h"
+#include
+
+#define RISCV_MATH_LOOPUNROLL
+#define RISCV_MATH_VECTOR
+typedef float float32_t;
+
+  typedef struct
+  {
+          uint16_t numTaps;    /**< number of coefficients in the filter. */
+          float32_t *pState;   /**< points to the state variable array. The array is of length numTaps+blockSize-1. */
+          float32_t *pCoeffs;  /**< points to the coefficient array. The array is of length numTaps. */
+          float32_t mu;        /**< step size that controls filter coefficient updates. */
+  } riscv_lms_instance_f32;
+
+
+void riscv_lms_f32(
+  const riscv_lms_instance_f32 * S,
+  const float32_t * pSrc,
+        float32_t * pRef,
+        float32_t * pOut,
+        float32_t * pErr,
+        uint32_t blockSize)
+{
+        float32_t *pState = S->pState;                 /* State pointer */
+        float32_t *pCoeffs = S->pCoeffs;               /* Coefficient pointer */
+        float32_t *pStateCurnt;                        /* Points to the current sample of the state */
+        float32_t *px, *pb;                            /* Temporary pointers for state and coefficient buffers */
+        float32_t mu = S->mu;                          /* Adaptive factor */
+        float32_t acc, e;                              /* Accumulator, error */
+        float32_t w;                                   /* Weight factor */
+        uint32_t numTaps = S->numTaps;                 /* Number of filter coefficients in the filter */
+        uint32_t tapCnt, blkCnt;                       /* Loop counters */
+
+  /* Initializations of error, difference, Coefficient update */
+  e = 0.0f;
+  w = 0.0f;
+
+  /* S->pState points to state array which contains previous frame (numTaps - 1) samples */
+  /* pStateCurnt points to the location where the new input data should be written */
+  pStateCurnt = &(S->pState[(numTaps - 1U)]);
+
+  /* initialise loop count */
+  blkCnt = blockSize;
+
+  while (blkCnt > 0U)
+  {
+    /* Copy the new input sample into the state buffer */
+    *pStateCurnt++ = *pSrc++;
+
+    /* Initialize pState pointer */
+    px = pState;
+
+    /* Initialize coefficient pointer */
+    pb = pCoeffs;
+
+    /* Set the accumulator to zero */
+    acc = 0.0f;
+#if defined (RISCV_MATH_VECTOR)
+    uint32_t vblkCnt = numTaps;                               /* Loop counter */
+    size_t l;
+    vfloat32m8_t vx, vy;
+    vfloat32m1_t temp00m1;
+    l = __riscv_vsetvl_e32m1(1);
+    temp00m1 = __riscv_vfmv_v_f_f32m1(0, l);
+    for (; (l = __riscv_vsetvl_e32m8(vblkCnt)) > 0; vblkCnt -= l) {
+      vx = __riscv_vle32_v_f32m8(px, l);
+      px += l;
+      vy = __riscv_vle32_v_f32m8(pb, l);
+      pb += l;
+      temp00m1 = __riscv_vfredusum_vs_f32m8_f32m1(__riscv_vfmul_vv_f32m8(vx, vy, l), temp00m1, l);
+    }
+    acc += __riscv_vfmv_f_s_f32m1_f32(temp00m1);
+#else
+#if defined (RISCV_MATH_LOOPUNROLL)
+
+    /* Loop unrolling: Compute 4 taps at a time. */
+    tapCnt = numTaps >> 2U;
+
+    while (tapCnt > 0U)
+    {
+      /* Perform the multiply-accumulate */
+      acc += (*px++) * (*pb++);
+
+      acc += (*px++) * (*pb++);
+
+      acc += (*px++) * (*pb++);
+
+      acc += (*px++) * (*pb++);
+
+      /* Decrement loop counter */
+      tapCnt--;
+    }
+
+    /* Loop unrolling: Compute remaining taps */
+    tapCnt = numTaps & 0x3U;
+
+#else
+
+    /* Initialize tapCnt with number of samples */
+    tapCnt = numTaps;
+
+#endif /* #if defined (RISCV_MATH_LOOPUNROLL) */
+
+    while (tapCnt > 0U)
+    {
+      /* Perform the multiply-accumulate */
+      acc += (*px++) * (*pb++);
+
+      /* Decrement the loop counter */
+      tapCnt--;
+    }
+#endif /* defined (RISCV_MATH_VECTOR) */
+    /* Store the result from accumulator into the destination buffer. */
+    *pOut++ = acc;
+
+    /* Compute and store error */
+    e = (float32_t) *pRef++ - acc;
+    *pErr++ = e;
+
+    /* Calculation of Weighting factor for updating filter coefficients */
+    w = e * mu;
+
+    /* Initialize pState pointer */
+    /* Advance state pointer by 1 for the next sample */
+    px = pState++;
+
+    /* Initialize coefficient pointer */
+    pb = pCoeffs;
+
+#if defined (RISCV_MATH_VECTOR)
+    vblkCnt = numTaps;
+    for (; (l = __riscv_vsetvl_e32m8(vblkCnt)) > 0; vblkCnt -= l) {
+      vx = __riscv_vle32_v_f32m8(px, l);
+      px += l;
+      __riscv_vse32_v_f32m8(pb, __riscv_vfadd_vv_f32m8(__riscv_vfmul_vf_f32m8(vx, w, l), __riscv_vle32_v_f32m8(pb, l), l) , l);
+      pb += l;
+    }
+#else
+#if defined (RISCV_MATH_LOOPUNROLL)
+
+    /* Loop unrolling: Compute 4 taps at a time. */
+    tapCnt = numTaps >> 2U;
+
+    /* Update filter coefficients */
+    while (tapCnt > 0U)
+    {
+      /* Perform the multiply-accumulate */
+      *pb += w * (*px++);
+      pb++;
+
+      *pb += w * (*px++);
+      pb++;
+
+      *pb += w * (*px++);
+      pb++;
+
+      *pb += w * (*px++);
+      pb++;
+
+      /* Decrement loop counter */
+      tapCnt--;
+    }
+
+    /* Loop unrolling: Compute remaining taps */
+    tapCnt = numTaps & 0x3U;
+
+#else
+
+    /* Initialize tapCnt with number of samples */
+    tapCnt = numTaps;
+
+#endif /* #if defined (RISCV_MATH_LOOPUNROLL) */
+
+    while (tapCnt > 0U)
+    {
+      /* Perform the multiply-accumulate */
+      *pb += w * (*px++);
+      pb++;
+
+      /* Decrement loop counter */
+      tapCnt--;
+    }
+#endif /* defined (RISCV_MATH_VECTOR) */
+    /* Decrement loop counter */
+    blkCnt--;
+  }
+
+  /* Processing is complete.
+     Now copy the last numTaps - 1 samples to the start of the state buffer.
+     This prepares the state buffer for the next function call. */
+
+  /* Points to the start of the pState buffer */
+  pStateCurnt = S->pState;
+
+  /* copy data */
+#if defined (RISCV_MATH_VECTOR)
+  uint32_t vblkCnt = (numTaps - 1U);                               /* Loop counter */
+  size_t l;
+  for (; (l = __riscv_vsetvl_e32m8(vblkCnt)) > 0; vblkCnt -= l) {
+    __riscv_vse32_v_f32m8(pStateCurnt, __riscv_vle32_v_f32m8(pState, l) , l);
+    pState += l;
+    pStateCurnt += l;
+  }
+#else
+#if defined (RISCV_MATH_LOOPUNROLL)
+
+  /* Loop unrolling: Compute 4 taps at a time. */
+  tapCnt = (numTaps - 1U) >> 2U;
+
+  while (tapCnt > 0U)
+  {
+    *pStateCurnt++ = *pState++;
+    *pStateCurnt++ = *pState++;
+    *pStateCurnt++ = *pState++;
+    *pStateCurnt++ = *pState++;
+
+    /* Decrement loop counter */
+    tapCnt--;
+  }
+
+  /* Loop unrolling: Compute remaining taps */
+  tapCnt = (numTaps - 1U) & 0x3U;
+
+#else
+
+  /* Initialize tapCnt with number of samples */
+  tapCnt = (numTaps - 1U);
+
+#endif /* #if defined (RISCV_MATH_LOOPUNROLL) */
+
+  while (tapCnt > 0U)
+  {
+    *pStateCurnt++ = *pState++;
+
+    /* Decrement loop counter */
+    tapCnt--;
+  }
+#endif /* defined (RISCV_MATH_VECTOR) */
+}
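
Illustration only, not part of the patch: a minimal standalone sketch of the early-return guard that this patch adds to can_use_next_avl_p. It does not use GCC's real classes. toy_vsetvl_info, its three flag fields, toy_can_use_next_avl_p and toy_self_check are hypothetical stand-ins introduced here, and everything after the two checks visible in the hunk is replaced by a conservative "return false", because the rest of the real function is not shown in this patch.

```cpp
#include <cassert>

// Hypothetical stand-in for GCC's vsetvl_info.  Only the three queries
// that appear in the patched hunk are modelled, as plain flags.
struct toy_vsetvl_info {
  bool vl = false;                  // the info defines a VL register
  bool vl_used_by_non_rvv = false;  // that VL is read by a scalar insn
  bool nonvlmax_reg_avl = false;    // AVL is a register and not VLMAX

  bool has_vl() const { return vl; }
  bool vl_used_by_non_rvv_insn_p() const { return vl_used_by_non_rvv; }
  bool has_nonvlmax_reg_avl() const { return nonvlmax_reg_avl; }
};

// Returns true when this sketch would let PREV take over NEXT's AVL.
inline bool toy_can_use_next_avl_p(const toy_vsetvl_info &prev,
                                   const toy_vsetvl_info &next) {
  (void)prev;  // PREV only matters to the checks omitted from this sketch.

  // Guard added by the patch: if NEXT's VL result feeds a non-RVV
  // instruction (for example a branch on a5), never propagate.
  if (next.has_vl() && next.vl_used_by_non_rvv_insn_p())
    return false;

  // First pre-existing check, visible as context in the hunk.
  if (!next.has_nonvlmax_reg_avl() && !next.has_vl())
    return true;

  // Assumption: the remaining checks of the real function are not shown
  // in the patch, so the sketch refuses conservatively here.
  return false;
}

// Small self-check covering the two paths the sketch models.
inline void toy_self_check() {
  toy_vsetvl_info prev, next;
  next.vl = true;
  next.vl_used_by_non_rvv = true;
  assert(!toy_can_use_next_avl_p(prev, next));  // blocked by the new guard

  toy_vsetvl_info plain;  // no VL, no register AVL
  assert(toy_can_use_next_avl_p(prev, plain));  // early accept still works
}
```

The point of placing the guard first is that it wins over the early "return true" path, which mirrors the ordering in the hunk above.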