From patchwork Mon Oct 23 09:40:34 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "juzhe.zhong@rivai.ai" X-Patchwork-Id: 156784 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:ce89:0:b0:403:3b70:6f57 with SMTP id p9csp1177129vqx; Mon, 23 Oct 2023 02:41:14 -0700 (PDT) X-Google-Smtp-Source: AGHT+IEezKILej6MvTD8Cut5HCEF/Izf1oNsiPSoEh+N80T2fKYv4R6X0hGWKdHLUf/GdCcSD7O1 X-Received: by 2002:ac8:5e0b:0:b0:417:b269:4689 with SMTP id h11-20020ac85e0b000000b00417b2694689mr8394226qtx.53.1698054073774; Mon, 23 Oct 2023 02:41:13 -0700 (PDT) ARC-Seal: i=2; a=rsa-sha256; t=1698054073; cv=pass; d=google.com; s=arc-20160816; b=bUa2UOlUr+UMGgv6sGk65eKo+vrzaUeSGvVFUTUmG8xmz1PBuwS7Oo+xjYgKnVhZSD BaN/Zxr341fFBeLP0qjC0jS+hrdfknvj8Z6Rm6YjaY7JSjZekAogPTCy9bVl1h1HXnkb 7d29NhxN7ltwpzzMGjmk3TFI5LvldGprh3UCTGKdAtx7ZFRGaHwBlzFD6GACoWuvJrwy F4J4mC2mkqIuvFWyyIWkG1sE5thF7sxpF0HW7u+yAn2awO3O4MEn1xmrvpuTTzRJTsFr xOED2Hpi5+SR4zR4/kth5CkcwkVXMQLnaVSnaUuNYXghVnu2oPHC9wI3hjWk9rQEQ1ag W9WA== ARC-Message-Signature: i=2; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=errors-to:list-subscribe:list-help:list-post:list-archive :list-unsubscribe:list-id:precedence:feedback-id :content-transfer-encoding:mime-version:message-id:date:subject:cc :to:from:arc-filter:dmarc-filter:delivered-to; bh=bBx4v0PA7QFZ6FiGPn73XsJ9FT2oew609SuLR/tzoXE=; fh=12MRPJmZ1mgDpHqWoogMKqnaGRGM2b7lcuJroqfjJiw=; b=PU2kIH8ANSqx1CSdrCL1G8A9FSXjCMNyL5T/R8p7Kf857xm0KlgYGYw/DsRJCtcKmP DO/KGfqEiyvrF3FpCyNX5yzn1h0aLfZt9FM5CRdTGFhFwGhtTf98FrlFxvnwM0gy15Nf 6ItAWsDkZ5X/YxiuICLQSovF0DYqUZwMnnIczrDjoLzkEGmUcR5GNTDN6elELN0MQ2ng si/qwPAtwTYh5P0/1wm06gt9HC4HgApwdVexegWm/g++Ws7VuOmevT/Y2amq6963/xkE G7JXGTvyvbfufFfKANYwUz8+JSdczfFPS7ZC7Tm4uRNL71zZxOfCRUCHahPAjeGdouOk +3cg== ARC-Authentication-Results: i=2; mx.google.com; arc=pass (i=1); spf=pass (google.com: domain of gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org designates 8.43.85.97 as permitted sender) smtp.mailfrom="gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org" Received: from server2.sourceware.org (server2.sourceware.org. [8.43.85.97]) by mx.google.com with ESMTPS id s13-20020a05622a178d00b0041799b6f25fsi5272498qtk.17.2023.10.23.02.41.13 for (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 23 Oct 2023 02:41:13 -0700 (PDT) Received-SPF: pass (google.com: domain of gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org designates 8.43.85.97 as permitted sender) client-ip=8.43.85.97; Authentication-Results: mx.google.com; arc=pass (i=1); spf=pass (google.com: domain of gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org designates 8.43.85.97 as permitted sender) smtp.mailfrom="gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org" Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 884EB3858C60 for ; Mon, 23 Oct 2023 09:41:13 +0000 (GMT) X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from smtpbgbr1.qq.com (smtpbgbr1.qq.com [54.207.19.206]) by sourceware.org (Postfix) with ESMTPS id 63C393858C35 for ; Mon, 23 Oct 2023 09:40:42 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 63C393858C35 Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=rivai.ai Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=rivai.ai ARC-Filter: OpenARC Filter v1.0.0 sourceware.org 63C393858C35 Authentication-Results: server2.sourceware.org; arc=none smtp.remote-ip=54.207.19.206 ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1698054047; cv=none; b=O7p8IZbyM82IoR06SFdjdipZv8At1Z9IE2EaC5f5une2CYycXJrradJzgJZhRgW47EHKc8cP0MJMdBUOJslLqRB2baDEAo20z9capo42g30fFyCA7LOgpIKhEocSv0MZpW5oJw/XmfHMnu+/SgyQm4IjKzTUrGcVl3pYiztcJj8= ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1698054047; c=relaxed/simple; bh=9I2/e2HeOQV13GQgpBm3PFeT2SMLNQEPW+DrVNjkScA=; h=From:To:Subject:Date:Message-Id:MIME-Version; b=EWET+KkK23x//FlyaFW1XiJtN6gWX9A6y6fk0QvEHlStSBI00QhLsJ02YMHefHtGwviqWL/ZTh3L00dhcII2XRHcFSpKoqQKmu+0Mi4oc+5Ov+2UUKzVh0fqpKu4evTERAPbosXs4fE8NdZ+818fm6YmXjisi1Iuo1pLeTAyHQI= ARC-Authentication-Results: i=1; server2.sourceware.org X-QQ-mid: bizesmtp85t1698054036ts0pg2qa Received: from rios-cad121.hadoop.rioslab.org ( [58.60.1.9]) by bizesmtp.qq.com (ESMTP) with id ; Mon, 23 Oct 2023 17:40:35 +0800 (CST) X-QQ-SSF: 01400000000000G0V000000A0000000 X-QQ-FEAT: rGm7xzoh3hlfkBXPcRCGKbsm5MGYqZVynt/X/X7GE1Ftjt5qT/1wZ5Y93eahN 8QQkl0UnT9jxjP91+mxCp8dq7nOAL0yOXDAoLC48YDLRA7u0MzG9EHBVPmT0oNcpVq+c3Nc xQF6skLX4GXYMlLpKc256WH34tH3qVqdiJYrUgKaZIrdYuuzqPoza/eSUupZiZR/HAkqXu9 3GI57YuDjLBJ3yS3PmxWvjrB81vPMhHK8QkJOljznzgaLY4JoF/5DNl5G+RFp43gVVveLkR iYGKfE+eE7S1BOD8ZcqlJxPh5iyG11i3TB+yDjAQNdiRVMyDBhxI+RiyNUOZ7u9vxCuXP+o 1w18oHBk7/44mx1pLAxQ/5/Q6ST6TUqTn/iNtzF8ppvPaKFThnwIFJ0ruiilD583YnmkOoU P2vlo+daB5G3Q0jaNfxyHQ== X-QQ-GoodBg: 2 X-BIZMAIL-ID: 3676545305446315686 From: Juzhe-Zhong To: gcc-patches@gcc.gnu.org Cc: kito.cheng@gmail.com, kito.cheng@sifive.com, jeffreyalaw@gmail.com, rdapp.gcc@gmail.com, Juzhe-Zhong Subject: [PATCH V2] RISC-V: Fix ICE for the fusion case from vsetvl to scalar move[PR111927] Date: Mon, 23 Oct 2023 17:40:34 +0800 Message-Id: <20231023094034.1728130-1-juzhe.zhong@rivai.ai> X-Mailer: git-send-email 2.36.3 MIME-Version: 1.0 X-QQ-SENDSIZE: 520 Feedback-ID: bizesmtp:rivai.ai:qybglogicsvrgz:qybglogicsvrgz7a-one-0 X-Spam-Status: No, score=-12.2 required=5.0 tests=BAYES_00, GIT_PATCH_0, KAM_DMARC_STATUS, KAM_SHORT, RCVD_IN_DNSWL_NONE, RCVD_IN_MSPIKE_H3, RCVD_IN_MSPIKE_WL, SPF_HELO_PASS, SPF_PASS, TXREP, WEIRD_PORT autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.30 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org X-getmail-retrieved-from-mailbox: INBOX X-GMAIL-THRID: 1780538748154517437 X-GMAIL-MSGID: 1780538748154517437 ICE: during RTL pass: vsetvl : In function 'riscv_lms_f32': :240:1: internal compiler error: in merge, at config/riscv/riscv-vsetvl.cc:1997 240 | } In general compatible_p (avl_equal_p) has: if (next.has_vl () && next.vl_used_by_non_rvv_insn_p ()) return false; Don't fuse AVL of vsetvl if the VL operand is used by non-RVV instructions. It is reasonable to add it into 'can_use_next_avl_p' since we don't want to fuse AVL of vsetvl into a scalar move instruction which doesn't demand AVL. And after the fusion, we will alway use compatible_p to check whether the demand is correct or not. PR target/111927 gcc/ChangeLog: * config/riscv/riscv-vsetvl.cc: Fix bug. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/vsetvl/pr111927.c: New test. --- gcc/config/riscv/riscv-vsetvl.cc | 23 +++ .../gcc.target/riscv/rvv/vsetvl/pr111927.c | 170 ++++++++++++++++++ 2 files changed, 193 insertions(+) create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr111927.c diff --git a/gcc/config/riscv/riscv-vsetvl.cc b/gcc/config/riscv/riscv-vsetvl.cc index 47b459fddd4..f3922a051c5 100644 --- a/gcc/config/riscv/riscv-vsetvl.cc +++ b/gcc/config/riscv/riscv-vsetvl.cc @@ -1541,6 +1541,29 @@ private: inline bool can_use_next_avl_p (const vsetvl_info &prev, const vsetvl_info &next) { + /* Forbid the AVL/VL propagation if VL of NEXT is used + by non-RVV instructions. This is because: + + bb 2: + PREV: scalar move (no AVL) + bb 3: + NEXT: vsetvl a5(VL), a4(AVL) ... + branch a5,zero + + Since user vsetvl instruction is no side effect instruction + which should be placed in the correct and optimal location + of the program by the previous PASS, it is unreasonable that + VSETVL PASS tries to move it to another places if it used by + non-RVV instructions. + + Note: We only forbid the cases that VL is used by the following + non-RVV instructions which will cause issues. We don't forbid + other cases since it won't cause correctness issues and we still + more demand info are fused backward. The later LCM algorithm + should know the optimal location of the vsetvl. */ + if (next.has_vl () && next.vl_used_by_non_rvv_insn_p ()) + return false; + if (!next.has_nonvlmax_reg_avl () && !next.has_vl ()) return true; diff --git a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr111927.c b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr111927.c new file mode 100644 index 00000000000..ab599add57f --- /dev/null +++ b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr111927.c @@ -0,0 +1,170 @@ +/* { dg-do compile } */ +/* { dg-options "-march=rv64gcv -mabi=lp64d -O3" } */ + +#include "riscv_vector.h" + +#define RISCV_MATH_LOOPUNROLL +#define RISCV_MATH_VECTOR +typedef float float32_t; + + typedef struct + { + uint16_t numTaps; /**< number of coefficients in the filter. */ + float32_t *pState; /**< points to the state variable array. The array is of length numTaps+blockSize-1. */ + float32_t *pCoeffs; /**< points to the coefficient array. The array is of length numTaps. */ + float32_t mu; /**< step size that controls filter coefficient updates. */ + } riscv_lms_instance_f32; + + +void riscv_lms_f32( + const riscv_lms_instance_f32 * S, + const float32_t * pSrc, + float32_t * pRef, + float32_t * pOut, + float32_t * pErr, + uint32_t blockSize) +{ + float32_t *pState = S->pState; /* State pointer */ + float32_t *pCoeffs = S->pCoeffs; /* Coefficient pointer */ + float32_t *pStateCurnt; /* Points to the current sample of the state */ + float32_t *px, *pb; /* Temporary pointers for state and coefficient buffers */ + float32_t mu = S->mu; /* Adaptive factor */ + float32_t acc, e; /* Accumulator, error */ + float32_t w; /* Weight factor */ + uint32_t numTaps = S->numTaps; /* Number of filter coefficients in the filter */ + uint32_t tapCnt, blkCnt; /* Loop counters */ + + /* Initializations of error, difference, Coefficient update */ + e = 0.0f; + w = 0.0f; + + /* S->pState points to state array which contains previous frame (numTaps - 1) samples */ + /* pStateCurnt points to the location where the new input data should be written */ + pStateCurnt = &(S->pState[(numTaps - 1U)]); + + /* initialise loop count */ + blkCnt = blockSize; + + while (blkCnt > 0U) + { + /* Copy the new input sample into the state buffer */ + *pStateCurnt++ = *pSrc++; + + /* Initialize pState pointer */ + px = pState; + + /* Initialize coefficient pointer */ + pb = pCoeffs; + + /* Set the accumulator to zero */ + acc = 0.0f; + uint32_t vblkCnt = numTaps; /* Loop counter */ + size_t l; + vfloat32m8_t vx, vy; + vfloat32m1_t temp00m1; + l = __riscv_vsetvl_e32m1(1); + temp00m1 = __riscv_vfmv_v_f_f32m1(0, l); + for (; (l = __riscv_vsetvl_e32m8(vblkCnt)) > 0; vblkCnt -= l) { + vx = __riscv_vle32_v_f32m8(px, l); + px += l; + vy = __riscv_vle32_v_f32m8(pb, l); + pb += l; + temp00m1 = __riscv_vfredusum_vs_f32m8_f32m1(__riscv_vfmul_vv_f32m8(vx, vy, l), temp00m1, l); + } + acc += __riscv_vfmv_f_s_f32m1_f32(temp00m1); + + while (tapCnt > 0U) + { + /* Perform the multiply-accumulate */ + acc += (*px++) * (*pb++); + + /* Decrement the loop counter */ + tapCnt--; + } + /* Store the result from accumulator into the destination buffer. */ + *pOut++ = acc; + + /* Compute and store error */ + e = (float32_t) *pRef++ - acc; + *pErr++ = e; + + /* Calculation of Weighting factor for updating filter coefficients */ + w = e * mu; + + /* Initialize pState pointer */ + /* Advance state pointer by 1 for the next sample */ + px = pState++; + + /* Initialize coefficient pointer */ + pb = pCoeffs; + + vblkCnt = numTaps; + for (; (l = __riscv_vsetvl_e32m8(vblkCnt)) > 0; vblkCnt -= l) { + vx = __riscv_vle32_v_f32m8(px, l); + px += l; + __riscv_vse32_v_f32m8(pb, __riscv_vfadd_vv_f32m8(__riscv_vfmul_vf_f32m8(vx, w, l), __riscv_vle32_v_f32m8(pb, l), l) , l); + pb += l; + } + while (tapCnt > 0U) + { + /* Perform the multiply-accumulate */ + *pb += w * (*px++); + pb++; + + /* Decrement loop counter */ + tapCnt--; + } + /* Decrement loop counter */ + blkCnt--; + } + + /* Processing is complete. + Now copy the last numTaps - 1 samples to the start of the state buffer. + This prepares the state buffer for the next function call. */ + + /* Points to the start of the pState buffer */ + pStateCurnt = S->pState; + + /* copy data */ + + uint32_t vblkCnt = (numTaps - 1U); /* Loop counter */ + size_t l; + for (; (l = __riscv_vsetvl_e32m8(vblkCnt)) > 0; vblkCnt -= l) { + __riscv_vse32_v_f32m8(pStateCurnt, __riscv_vle32_v_f32m8(pState, l) , l); + pState += l; + pStateCurnt += l; + } + + + /* Loop unrolling: Compute 4 taps at a time. */ + tapCnt = (numTaps - 1U) >> 2U; + + while (tapCnt > 0U) + { + *pStateCurnt++ = *pState++; + *pStateCurnt++ = *pState++; + *pStateCurnt++ = *pState++; + *pStateCurnt++ = *pState++; + + /* Decrement loop counter */ + tapCnt--; + } + + /* Loop unrolling: Compute remaining taps */ + tapCnt = (numTaps - 1U) & 0x3U; + + + + /* Initialize tapCnt with number of samples */ + tapCnt = (numTaps - 1U); + + + + while (tapCnt > 0U) + { + *pStateCurnt++ = *pState++; + + /* Decrement loop counter */ + tapCnt--; + } +}