From patchwork Mon May 8 14:40:16 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "juzhe.zhong@rivai.ai" X-Patchwork-Id: 91187 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:b0ea:0:b0:3b6:4342:cba0 with SMTP id b10csp2207617vqo; Mon, 8 May 2023 07:41:17 -0700 (PDT) X-Google-Smtp-Source: ACHHUZ4mIcK+ZQYbAXEPeUMhL1mFrZ0IpPi7oMViVGPaS/Ey5RD0vqE/W1fKLlmO1UfHHFCBflvo X-Received: by 2002:a50:ed99:0:b0:50b:c41b:25d with SMTP id h25-20020a50ed99000000b0050bc41b025dmr7914272edr.7.1683556877421; Mon, 08 May 2023 07:41:17 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1683556877; cv=none; d=google.com; s=arc-20160816; b=Re1fFg3eBFgYjDfDxV8WNRSsgmeWwKBN7Metz/aOAkR4MZl36uiV/837sggb7MPrsq zEtNWiDMR+QgNu5Jpp0RG1kd1LN0P+uzkG5eLhaHiVKBjI0Jehg12MqrXtJCEgiBvGJj q9PGyiGcHgHvoLabxu9uuNE/bI1KuaQ6czRICI9dxW03AYIFEeblvExcuryZqJbfkTZn ichOHMCIBJZ1jl8LpBZWcxQfR29DUXrmJYONtov0mibxT2l7PlumV5ExxyGAipIp0ceF 5VWAtodphMt3iV4uKO00UIu4oPHoM2JusExv7xOXiTwUTHYmlphHRfghCwOMhMU/768e lc3A== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=sender:errors-to:list-subscribe:list-help:list-post:list-archive :list-unsubscribe:list-id:precedence:feedback-id :content-transfer-encoding:mime-version:message-id:date:subject:cc :to:from:dmarc-filter:delivered-to; bh=5bReoMFYIaFmh9a745sOJM5g8qWGktmmQhamEL5gPdE=; b=MCkgfvQhzmOcWU9hiK7Dbo4caxS+QDDOdr+6R4SH/LmklcqMIOwXC2H5l/kTfwfYLk 9OkbBXTXOycIZAg6aN5nYpPUvsL26bFcUYc7/aTBfTlTasQAvQFL79fLezvX6Lp++ktj 9OqQ1P/rp1D7f+x0N8kryMKehwKNNlIkNaBWwHb3hnR8Hn4NgibwPwuzSCBPzJy1GIGT b9/Yv9TqYiQRUAvci/eKAHytMlQOOEOKfUhqbUPHcPtAC7Vj7MbV71fzLFFNgQ43uUrD A98k4697gU+AcWioWkOTcQ0C4zlV4j+dWxqw9Jrjc0D+FVB6MqKyHJaDneOOJIthx9iZ oHhg== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org designates 2620:52:3:1:0:246e:9693:128c as permitted sender) smtp.mailfrom="gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org" Received: from sourceware.org (server2.sourceware.org. [2620:52:3:1:0:246e:9693:128c]) by mx.google.com with ESMTPS id a23-20020a50ff17000000b005067d449614si8932217edu.218.2023.05.08.07.41.17 for (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 08 May 2023 07:41:17 -0700 (PDT) Received-SPF: pass (google.com: domain of gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org designates 2620:52:3:1:0:246e:9693:128c as permitted sender) client-ip=2620:52:3:1:0:246e:9693:128c; Authentication-Results: mx.google.com; spf=pass (google.com: domain of gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org designates 2620:52:3:1:0:246e:9693:128c as permitted sender) smtp.mailfrom="gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org" Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 23802385416A for ; Mon, 8 May 2023 14:40:57 +0000 (GMT) X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from smtpbgeu2.qq.com (smtpbgeu2.qq.com [18.194.254.142]) by sourceware.org (Postfix) with ESMTPS id 2C4893858D32 for ; Mon, 8 May 2023 14:40:25 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 2C4893858D32 Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=rivai.ai Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=rivai.ai X-QQ-mid: bizesmtp69t1683556819tzyr24sr Received: from rios-cad5.localdomain ( [58.60.1.11]) by bizesmtp.qq.com (ESMTP) with id ; Mon, 08 May 2023 22:40:18 +0800 (CST) X-QQ-SSF: 01400000000000F0Q000000A0000000 X-QQ-FEAT: LrCnY+iDm+OPohfHQ8Q7IJcu6oialQ3Ld2bRlOQJ0sHKa8Gc5Uc6uCGIw/HRI bRWnNKVw6G2Rkk7hi5RBneCxiJ6ryd3qgluJslwbQlweK/MjrhA6e5E/chTiJveo41dIP8k RKVxledikulUFP2aBeCNpdh7/qktmPyU2/WiZxEfxDu+IND7hOZK9SnPc9ldX3bWIokSFdW yMRMc2HQkRdZv7vExgaJDKgxtczaZYAcUJOH42jYUf7APg/w8Uwu6qpe2nr9M4E2kyfnQYs 6Nuj/F/vHXSLjzLT8JvzyOWLy8yqIK4rYnli3Dww7gMAl82z1RoTl1Yyt4BXbB820TXYtB1 eSDiOwst4hT2GAUErV1VYMKis1GVWZURezxa3N2wVdUGKRvMOm8QAy4upauhu83eWhnJQxO nch7MWemFLjZyOeiK9WB1w== X-QQ-GoodBg: 2 X-BIZMAIL-ID: 4566743459965663667 From: juzhe.zhong@rivai.ai To: gcc-patches@gcc.gnu.org Cc: kito.cheng@gmail.com, Juzhe-Zhong Subject: [PATCH V2] RISC-V: Optimize vsetvli of LCM INSERTED edge for user vsetvli [PR 109743] Date: Mon, 8 May 2023 22:40:16 +0800 Message-Id: <20230508144016.649694-1-juzhe.zhong@rivai.ai> X-Mailer: git-send-email 2.36.3 MIME-Version: 1.0 X-QQ-SENDSIZE: 520 Feedback-ID: bizesmtp:rivai.ai:qybglogicsvrgz:qybglogicsvrgz7a-one-0 X-Spam-Status: No, score=-10.5 required=5.0 tests=BAYES_00, GIT_PATCH_0, KAM_DMARC_STATUS, KAM_SHORT, LIKELY_SPAM_BODY, RCVD_IN_DNSWL_NONE, RCVD_IN_MSPIKE_H2, SCC_10_SHORT_WORD_LINES, SCC_5_SHORT_WORD_LINES, SPF_PASS, TXREP, T_SCC_BODY_TEXT_LINE, T_SPF_HELO_TEMPERROR autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org Sender: "Gcc-patches" X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1765337336384698737?= X-GMAIL-MSGID: =?utf-8?q?1765337336384698737?= From: Juzhe-Zhong This patch is fixing: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109743. This issue happens is because we are currently very conservative in optimization of user vsetvli. Consider this following case: bb 1: vsetvli a5,a4... (demand AVL = a4). bb 2: RVV insn use a5 (demand AVL = a5). LCM will hoist vsetvl of bb 2 into bb 1. We don't do AVL propagation for this situation since it's complicated that we should analyze the code sequence between vsetvli in bb 1 and RVV insn in bb 2. They are not necessary the consecutive blocks. This patch is doing the optimizations after LCM, we will check and eliminate the vsetvli in LCM inserted edge if such vsetvli is redundant. Such approach is much simplier and safe. code: void foo2 (int32_t *a, int32_t *b, int n) { if (n <= 0) return; int i = n; size_t vl = __riscv_vsetvl_e32m1 (i); for (; i >= 0; i--) { vint32m1_t v = __riscv_vle32_v_i32m1 (a, vl); __riscv_vse32_v_i32m1 (b, v, vl); if (i >= vl) continue; if (i == 0) return; vl = __riscv_vsetvl_e32m1 (i); } } Before this patch: foo2: .LFB2: .cfi_startproc ble a2,zero,.L1 mv a4,a2 li a3,-1 vsetvli a5,a2,e32,m1,ta,mu vsetvli zero,a5,e32,m1,ta,ma <- can be eliminated. .L5: vle32.v v1,0(a0) vse32.v v1,0(a1) bgeu a4,a5,.L3 .L10: beq a2,zero,.L1 vsetvli a5,a4,e32,m1,ta,mu addi a4,a4,-1 vsetvli zero,a5,e32,m1,ta,ma <- can be eliminated. vle32.v v1,0(a0) vse32.v v1,0(a1) addiw a2,a2,-1 bltu a4,a5,.L10 .L3: addiw a2,a2,-1 addi a4,a4,-1 bne a2,a3,.L5 .L1: ret After this patch: f: ble a2,zero,.L1 mv a4,a2 li a3,-1 vsetvli a5,a2,e32,m1,ta,ma .L5: vle32.v v1,0(a0) vse32.v v1,0(a1) bgeu a4,a5,.L3 .L10: beq a2,zero,.L1 vsetvli a5,a4,e32,m1,ta,ma addi a4,a4,-1 vle32.v v1,0(a0) vse32.v v1,0(a1) addiw a2,a2,-1 bltu a4,a5,.L10 .L3: addiw a2,a2,-1 addi a4,a4,-1 bne a2,a3,.L5 .L1: ret PR target/109743 gcc/ChangeLog: * config/riscv/riscv-vsetvl.cc (pass_vsetvl::local_eliminate_vsetvl_insn): Enhance local optimizations for LCM. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/vsetvl/pr109743-1.c: New test. * gcc.target/riscv/rvv/vsetvl/pr109743-2.c: New test. * gcc.target/riscv/rvv/vsetvl/pr109743-3.c: New test. * gcc.target/riscv/rvv/vsetvl/pr109743-4.c: New test. --- gcc/config/riscv/riscv-vsetvl.cc | 49 ++++++++++++++++++- .../gcc.target/riscv/rvv/vsetvl/pr109743-1.c | 26 ++++++++++ .../gcc.target/riscv/rvv/vsetvl/pr109743-2.c | 27 ++++++++++ .../gcc.target/riscv/rvv/vsetvl/pr109743-3.c | 28 +++++++++++ .../gcc.target/riscv/rvv/vsetvl/pr109743-4.c | 28 +++++++++++ 5 files changed, 157 insertions(+), 1 deletion(-) create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr109743-1.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr109743-2.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr109743-3.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr109743-4.c diff --git a/gcc/config/riscv/riscv-vsetvl.cc b/gcc/config/riscv/riscv-vsetvl.cc index f55907a410e..090a4737c17 100644 --- a/gcc/config/riscv/riscv-vsetvl.cc +++ b/gcc/config/riscv/riscv-vsetvl.cc @@ -3978,7 +3978,8 @@ pass_vsetvl::local_eliminate_vsetvl_insn (const vector_insn_info &dem) const { if (i->is_call () || i->is_asm () || find_access (i->defs (), VL_REGNUM) - || find_access (i->defs (), VTYPE_REGNUM)) + || find_access (i->defs (), VTYPE_REGNUM) + || find_access (i->defs (), REGNO (vl))) return; if (has_vtype_op (i->rtl ())) @@ -4004,6 +4005,52 @@ pass_vsetvl::local_eliminate_vsetvl_insn (const vector_insn_info &dem) const return; } } + + /* Here we optimize the VSETVL is hoisted by LCM: + + Before LCM: + bb 1: + vsetvli a5,a2,e32,m1,ta,mu + bb 2: + vsetvli zero,a5,e32,m1,ta,mu + ... + + After LCM: + bb 1: + vsetvli a5,a2,e32,m1,ta,mu + LCM INSERTED: vsetvli zero,a5,e32,m1,ta,mu --> eliminate + bb 2: + ... + Such instruction can not be accessed in RTL_SSA when we don't re-init + the new RTL_SSA framework but it is definetely at the END of the block. */ + rtx_insn *end_vsetvl = BB_END (bb->cfg_bb ()); + if (!vsetvl_discard_result_insn_p (end_vsetvl)) + { + if (JUMP_P (end_vsetvl) + && vsetvl_discard_result_insn_p (PREV_INSN (end_vsetvl))) + end_vsetvl = PREV_INSN (end_vsetvl); + else + return; + } + + if (single_succ_p (bb->cfg_bb ())) + { + edge e = single_succ_edge (bb->cfg_bb ()); + auto require + = m_vector_manager->vector_block_infos[e->dest->index].local_dem; + const auto reaching_out + = m_vector_manager->vector_block_infos[bb->index ()].reaching_out; + if (require.get_avl_source () + && require.skip_avl_compatible_p (reaching_out) + && reaching_out.get_insn () == insn + && get_vl (insn->rtl ()) == get_avl (end_vsetvl)) + { + require.set_avl_info (reaching_out.get_avl_info ()); + require = reaching_out.merge (require, LOCAL_MERGE); + change_vsetvl_insn (insn, require); + eliminate_insn (end_vsetvl); + } + } } } diff --git a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr109743-1.c b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr109743-1.c new file mode 100644 index 00000000000..f30275c8280 --- /dev/null +++ b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr109743-1.c @@ -0,0 +1,26 @@ +/* { dg-do compile } */ +/* { dg-options "-march=rv32gcv -mabi=ilp32 -fno-tree-vectorize -fno-schedule-insns -fno-schedule-insns2" } */ + +#include "riscv_vector.h" + +void f (int32_t * a, int32_t * b, int n) +{ + if (n <= 0) + return; + int i = n; + size_t vl = __riscv_vsetvl_e32m1 (i); + for (; i >= 0; i--) + { + vint32m1_t v = __riscv_vle32_v_i32m1 (a, vl); + __riscv_vse32_v_i32m1 (b, v, vl); + + if (i >= vl) + continue; + if (i == 0) + return; + vl = __riscv_vsetvl_e32m1 (i); + } +} + +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x0-9]+,\s*[a-x0-9]+,\s*e32,\s*m1,\s*t[au],\s*m[au]} 2 { target { no-opts "-O0" no-opts "-O1" no-opts "-Os" no-opts "-Oz" no-opts "-g" no-opts "-funroll-loops" } } } } */ +/* { dg-final { scan-assembler-times {vsetvli} 2 { target { no-opts "-O0" no-opts "-O1" no-opts "-Os" no-opts "-Oz" no-opts "-g" no-opts "-funroll-loops" } } } } */ diff --git a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr109743-2.c b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr109743-2.c new file mode 100644 index 00000000000..5f6647bb916 --- /dev/null +++ b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr109743-2.c @@ -0,0 +1,27 @@ +/* { dg-do compile } */ +/* { dg-options "-march=rv32gcv -mabi=ilp32 -fno-tree-vectorize -fno-schedule-insns -fno-schedule-insns2" } */ + +#include "riscv_vector.h" + +void f (int32_t * a, int32_t * b, int n) +{ + if (n <= 0) + return; + int i = n; + size_t vl = __riscv_vsetvl_e8mf4 (i); + for (; i >= 0; i--) + { + vint32m1_t v = __riscv_vle32_v_i32m1 (a, vl); + __riscv_vse32_v_i32m1 (b, v, vl); + + if (i >= vl) + continue; + if (i == 0) + return; + vl = __riscv_vsetvl_e32m1 (i); + } +} + + +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x0-9]+,\s*[a-x0-9]+,\s*e32,\s*m1,\s*t[au],\s*m[au]} 2 { target { no-opts "-O0" no-opts "-O1" no-opts "-Os" no-opts "-Oz" no-opts "-g" no-opts "-funroll-loops" } } } } */ +/* { dg-final { scan-assembler-times {vsetvli} 2 { target { no-opts "-O0" no-opts "-O1" no-opts "-Os" no-opts "-Oz" no-opts "-g" no-opts "-funroll-loops" } } } } */ diff --git a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr109743-3.c b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr109743-3.c new file mode 100644 index 00000000000..5dbc871ed12 --- /dev/null +++ b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr109743-3.c @@ -0,0 +1,28 @@ +/* { dg-do compile } */ +/* { dg-options "-march=rv32gcv -mabi=ilp32 -fno-tree-vectorize -fno-schedule-insns -fno-schedule-insns2" } */ + +#include "riscv_vector.h" + +void f (int32_t * a, int32_t * b, int n) +{ + if (n <= 0) + return; + int i = n; + size_t vl = __riscv_vsetvl_e8mf2 (i); + for (; i >= 0; i--) + { + vint32m1_t v = __riscv_vle32_v_i32m1 (a, vl); + __riscv_vse32_v_i32m1 (b, v, vl); + + if (i >= vl) + continue; + if (i == 0) + return; + vl = __riscv_vsetvl_e32m1 (i); + } +} + +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x0-9]+,\s*[a-x0-9]+,\s*e8,\s*mf2,\s*t[au],\s*m[au]} 1 { target { no-opts "-O0" no-opts "-O1" no-opts "-Os" no-opts "-Oz" no-opts "-g" no-opts "-funroll-loops" } } } } */ +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x0-9]+,\s*[a-x0-9]+,\s*e32,\s*m1,\s*t[au],\s*m[au]} 1 { target { no-opts "-O0" no-opts "-O1" no-opts "-Os" no-opts "-Oz" no-opts "-g" no-opts "-funroll-loops" } } } } */ +/* { dg-final { scan-assembler-times {vsetvli\s+zero,\s*[a-x0-9]+,\s*e32,\s*m1,\s*t[au],\s*m[au]} 1 { target { no-opts "-O0" no-opts "-O1" no-opts "-Os" no-opts "-Oz" no-opts "-g" no-opts "-funroll-loops" } } } } */ +/* { dg-final { scan-assembler-times {vsetvli} 3 { target { no-opts "-O0" no-opts "-O1" no-opts "-Os" no-opts "-Oz" no-opts "-g" no-opts "-funroll-loops" } } } } */ diff --git a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr109743-4.c b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr109743-4.c new file mode 100644 index 00000000000..edd12855f58 --- /dev/null +++ b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr109743-4.c @@ -0,0 +1,28 @@ +/* { dg-do compile } */ +/* { dg-options "-march=rv32gcv -mabi=ilp32 -fno-tree-vectorize -fno-schedule-insns -fno-schedule-insns2" } */ + +#include "riscv_vector.h" + +void +f (int32_t *a, int32_t *b, int n) +{ + if (n <= 0) + return; + int i = n; + size_t vl = __riscv_vsetvl_e8mf4 (i); + for (; i >= 0; i--) + { + vint32m1_t v = __riscv_vle32_v_i32m1 (a + i, vl); + v = __riscv_vle32_v_i32m1_tu (v, a + i + 100, vl); + __riscv_vse32_v_i32m1 (b + i, v, vl); + + if (i >= vl) + continue; + if (i == 0) + return; + vl = __riscv_vsetvl_e8mf4 (i); + } +} + +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x0-9]+,\s*[a-x0-9]+,\s*e32,\s*m1,\s*tu,\s*m[au]} 2 { target { no-opts "-O0" no-opts "-O1" no-opts "-Os" no-opts "-Oz" no-opts "-g" no-opts "-funroll-loops" } } } } */ +/* { dg-final { scan-assembler-times {vsetvli} 2 { target { no-opts "-O0" no-opts "-O1" no-opts "-Os" no-opts "-Oz" no-opts "-g" no-opts "-funroll-loops" } } } } */