From patchwork Mon May 8 23:17:26 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "juzhe.zhong@rivai.ai" X-Patchwork-Id: 91314 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:b0ea:0:b0:3b6:4342:cba0 with SMTP id b10csp2493005vqo; Mon, 8 May 2023 16:18:23 -0700 (PDT) X-Google-Smtp-Source: ACHHUZ4ywJuja7kpb8YmPwx9ABBDOVx4V3OA3CTY8Sjkx8NTgNJ30IlhJA35Uqw2g6l7FVo4hXza X-Received: by 2002:a17:907:72d6:b0:94e:43ce:95f6 with SMTP id du22-20020a17090772d600b0094e43ce95f6mr11064203ejc.47.1683587903594; Mon, 08 May 2023 16:18:23 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1683587903; cv=none; d=google.com; s=arc-20160816; b=b0M7qoKnsAmfmN2/PB8jnN94B0vLCNql1koXebiOePdqgAJ77jYFdDgNTT1lfTJsMi Q1t1Wdgq6BuwXggzi9ppSiuKPnhzznGXF8oUEvodRjMdje3EQ/5e7nBrRubs1rBx3u+/ 1I2NSacfCqtnElLgx3/yMDvkpJrxCVJ12efasR2/spuJzRgdGfcW8aDtXA9k7B/plpcH jglWgH+K1AIM0zuzfOhikvKsnn09Lall27rw2yVo7aufplD9OTCOX9ZJNeS+4x+Niwy9 acaKn8TAIp1JMuMkRtbxweQeBn9vGHtGUTA95aubkSdhuk8+qOlqgij6JObGan/P5qqW E2Gw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=sender:errors-to:list-subscribe:list-help:list-post:list-archive :list-unsubscribe:list-id:precedence:feedback-id :content-transfer-encoding:mime-version:message-id:date:subject:cc :to:from:dmarc-filter:delivered-to; bh=F1O7MCyiUPKCV4FiQZQoikc/FXt4XFNQMSGLT4lxYas=; b=0CFUPLOxosov/OptI1rW6VgycbMmsZEcSWowUQkqf6l9ho00FIlvMODwHxtyD5oL8p K8lO1dDGxY7GsVAI7tK+x84C1DH7rSXDewuvaJfeTqwsjeTrmXTytvI0H0+j7yunBLDr 9F+UEClLAUPK1eApIAESlU795CxPEFqD6DeOhN6laZdlvomMpF2Og/rj9Vyptzn1SDd5 PNtqMZeda3j5MqSrM/oNrWqhIbKEPO77ZYzHtqxKm4u7UQv2P7giYb/QALG2InHLUCMR Phvxixj8hU8lKqRv3at4l83ISa7OxGg4irgo5SOdvV6ZBDn9TEu+EP68Y/rdJfQrxuY9 TCeg== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org designates 2620:52:3:1:0:246e:9693:128c as permitted sender) smtp.mailfrom="gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org" Received: from sourceware.org (server2.sourceware.org. [2620:52:3:1:0:246e:9693:128c]) by mx.google.com with ESMTPS id i2-20020a17090685c200b0094f6cda60bfsi619792ejy.748.2023.05.08.16.18.23 for (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 08 May 2023 16:18:23 -0700 (PDT) Received-SPF: pass (google.com: domain of gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org designates 2620:52:3:1:0:246e:9693:128c as permitted sender) client-ip=2620:52:3:1:0:246e:9693:128c; Authentication-Results: mx.google.com; spf=pass (google.com: domain of gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org designates 2620:52:3:1:0:246e:9693:128c as permitted sender) smtp.mailfrom="gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org" Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 623943856961 for ; Mon, 8 May 2023 23:18:11 +0000 (GMT) X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from smtpbgsg2.qq.com (smtpbgsg2.qq.com [54.254.200.128]) by sourceware.org (Postfix) with ESMTPS id AF1D13858D28 for ; Mon, 8 May 2023 23:17:37 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org AF1D13858D28 Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=rivai.ai Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=rivai.ai X-QQ-mid: bizesmtp70t1683587849tqu7l9co Received: from rios-cad5.localdomain ( [58.60.1.11]) by bizesmtp.qq.com (ESMTP) with id ; Tue, 09 May 2023 07:17:28 +0800 (CST) X-QQ-SSF: 01400000000000F0Q000000A0000000 X-QQ-FEAT: XBN7tc9DADLcRAmVL2nEaSy5dgZM92SzvJjGj61CrJAe4hlYqhapCmhMkjY4I jr3dzX4hpT129AIIGfLcOTdvxdpEzflFaD9WhtAdTRkePAmKp1Z537RIrjTchT2/GHoewvt NrXPHlxaLBdcLWwL73iDmSUirNSVwOvDYkw012uME1IdqXqb48302yo/wuh2YircndLsKgh CIHqbNQ4CFOr5PZOV2+tlQXGmZLtsAvmujUqmL8cNyFQhRj/RWTkOKXsAyEFHzYKnNzXUBh JwqCjzWWqws+x2mZ1erXUBI1DuRNMhMlWww+SWKpFO1J/EomNG5lKGsIgztz5IoACnsKcuu 6uEYnK0tfeVZVxdQ7TGC5mTUY1+2i5qnetfH7cWDyjWKyG1l3I= X-QQ-GoodBg: 2 X-BIZMAIL-ID: 4650577338721857732 From: juzhe.zhong@rivai.ai To: gcc-patches@gcc.gnu.org Cc: kito.cheng@gmail.com, Juzhe-Zhong Subject: [PATCH V3] RISC-V: Optimize vsetvli of LCM INSERTED edge for user vsetvli [PR 109743] Date: Tue, 9 May 2023 07:17:26 +0800 Message-Id: <20230508231726.801047-1-juzhe.zhong@rivai.ai> X-Mailer: git-send-email 2.36.3 MIME-Version: 1.0 X-QQ-SENDSIZE: 520 Feedback-ID: bizesmtp:rivai.ai:qybglogicsvrgz:qybglogicsvrgz7a-one-0 X-Spam-Status: No, score=-10.5 required=5.0 tests=BAYES_00, GIT_PATCH_0, KAM_DMARC_STATUS, KAM_SHORT, LIKELY_SPAM_BODY, RCVD_IN_DNSWL_NONE, RCVD_IN_MSPIKE_H2, SCC_10_SHORT_WORD_LINES, SCC_5_SHORT_WORD_LINES, SPF_HELO_PASS, SPF_PASS, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org Sender: "Gcc-patches" X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1765143202506912894?= X-GMAIL-MSGID: =?utf-8?q?1765369869761365157?= From: Juzhe-Zhong Rebase to trunk and send V3 patch for: https://gcc.gnu.org/pipermail/gcc-patches/2023-May/617821.html This patch is fixing: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109743. This issue happens is because we are currently very conservative in optimization of user vsetvli. Consider this following case: bb 1: vsetvli a5,a4... (demand AVL = a4). bb 2: RVV insn use a5 (demand AVL = a5). LCM will hoist vsetvl of bb 2 into bb 1. We don't do AVL propagation for this situation since it's complicated that we should analyze the code sequence between vsetvli in bb 1 and RVV insn in bb 2. They are not necessary the consecutive blocks. This patch is doing the optimizations after LCM, we will check and eliminate the vsetvli in LCM inserted edge if such vsetvli is redundant. Such approach is much simplier and safe. code: void foo2 (int32_t *a, int32_t *b, int n) { if (n <= 0) return; int i = n; size_t vl = __riscv_vsetvl_e32m1 (i); for (; i >= 0; i--) { vint32m1_t v = __riscv_vle32_v_i32m1 (a, vl); __riscv_vse32_v_i32m1 (b, v, vl); if (i >= vl) continue; if (i == 0) return; vl = __riscv_vsetvl_e32m1 (i); } } Before this patch: foo2: .LFB2: .cfi_startproc ble a2,zero,.L1 mv a4,a2 li a3,-1 vsetvli a5,a2,e32,m1,ta,mu vsetvli zero,a5,e32,m1,ta,ma <- can be eliminated. .L5: vle32.v v1,0(a0) vse32.v v1,0(a1) bgeu a4,a5,.L3 .L10: beq a2,zero,.L1 vsetvli a5,a4,e32,m1,ta,mu addi a4,a4,-1 vsetvli zero,a5,e32,m1,ta,ma <- can be eliminated. vle32.v v1,0(a0) vse32.v v1,0(a1) addiw a2,a2,-1 bltu a4,a5,.L10 .L3: addiw a2,a2,-1 addi a4,a4,-1 bne a2,a3,.L5 .L1: ret After this patch: f: ble a2,zero,.L1 mv a4,a2 li a3,-1 vsetvli a5,a2,e32,m1,ta,ma .L5: vle32.v v1,0(a0) vse32.v v1,0(a1) bgeu a4,a5,.L3 .L10: beq a2,zero,.L1 vsetvli a5,a4,e32,m1,ta,ma addi a4,a4,-1 vle32.v v1,0(a0) vse32.v v1,0(a1) addiw a2,a2,-1 bltu a4,a5,.L10 .L3: addiw a2,a2,-1 addi a4,a4,-1 bne a2,a3,.L5 .L1: ret PR target/109743 gcc/ChangeLog: * config/riscv/riscv-vsetvl.cc (pass_vsetvl::local_eliminate_vsetvl_insn): Enhance local optimizations for LCM. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/vsetvl/pr109743-1.c: New test. * gcc.target/riscv/rvv/vsetvl/pr109743-2.c: New test. * gcc.target/riscv/rvv/vsetvl/pr109743-3.c: New test. * gcc.target/riscv/rvv/vsetvl/pr109743-4.c: New test. --- gcc/config/riscv/riscv-vsetvl.cc | 47 ++++++++++++++++++- .../gcc.target/riscv/rvv/vsetvl/pr109743-1.c | 26 ++++++++++ .../gcc.target/riscv/rvv/vsetvl/pr109743-2.c | 27 +++++++++++ .../gcc.target/riscv/rvv/vsetvl/pr109743-3.c | 28 +++++++++++ .../gcc.target/riscv/rvv/vsetvl/pr109743-4.c | 28 +++++++++++ 5 files changed, 155 insertions(+), 1 deletion(-) create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr109743-1.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr109743-2.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr109743-3.c create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr109743-4.c diff --git a/gcc/config/riscv/riscv-vsetvl.cc b/gcc/config/riscv/riscv-vsetvl.cc index d4d6f336ef9..72aa2bfcf6f 100644 --- a/gcc/config/riscv/riscv-vsetvl.cc +++ b/gcc/config/riscv/riscv-vsetvl.cc @@ -4026,7 +4026,8 @@ pass_vsetvl::local_eliminate_vsetvl_insn (const vector_insn_info &dem) const { if (i->is_call () || i->is_asm () || find_access (i->defs (), VL_REGNUM) - || find_access (i->defs (), VTYPE_REGNUM)) + || find_access (i->defs (), VTYPE_REGNUM) + || find_access (i->defs (), REGNO (vl))) return; if (has_vtype_op (i->rtl ())) @@ -4051,6 +4052,50 @@ pass_vsetvl::local_eliminate_vsetvl_insn (const vector_insn_info &dem) const return; } } + + /* Here we optimize the VSETVL is hoisted by LCM: + + Before LCM: + bb 1: + vsetvli a5,a2,e32,m1,ta,mu + bb 2: + vsetvli zero,a5,e32,m1,ta,mu + ... + + After LCM: + bb 1: + vsetvli a5,a2,e32,m1,ta,mu + LCM INSERTED: vsetvli zero,a5,e32,m1,ta,mu --> eliminate + bb 2: + ... + Such instruction can not be accessed in RTL_SSA when we don't re-init + the new RTL_SSA framework but it is definetely at the END of the block. */ + rtx_insn *end_vsetvl = BB_END (bb->cfg_bb ()); + if (!vsetvl_discard_result_insn_p (end_vsetvl)) + { + if (JUMP_P (end_vsetvl) + && vsetvl_discard_result_insn_p (PREV_INSN (end_vsetvl))) + end_vsetvl = PREV_INSN (end_vsetvl); + else + return; + } + + if (single_succ_p (bb->cfg_bb ())) + { + edge e = single_succ_edge (bb->cfg_bb ()); + auto require = get_block_info (e->dest).local_dem; + const auto reaching_out = get_block_info (bb->cfg_bb ()).reaching_out; + if (require.get_avl_source () + && require.skip_avl_compatible_p (reaching_out) + && reaching_out.get_insn () == insn + && get_vl (insn->rtl ()) == get_avl (end_vsetvl)) + { + require.set_avl_info (reaching_out.get_avl_info ()); + require = reaching_out.merge (require, LOCAL_MERGE); + change_vsetvl_insn (insn, require); + eliminate_insn (end_vsetvl); + } + } } } diff --git a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr109743-1.c b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr109743-1.c new file mode 100644 index 00000000000..f30275c8280 --- /dev/null +++ b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr109743-1.c @@ -0,0 +1,26 @@ +/* { dg-do compile } */ +/* { dg-options "-march=rv32gcv -mabi=ilp32 -fno-tree-vectorize -fno-schedule-insns -fno-schedule-insns2" } */ + +#include "riscv_vector.h" + +void f (int32_t * a, int32_t * b, int n) +{ + if (n <= 0) + return; + int i = n; + size_t vl = __riscv_vsetvl_e32m1 (i); + for (; i >= 0; i--) + { + vint32m1_t v = __riscv_vle32_v_i32m1 (a, vl); + __riscv_vse32_v_i32m1 (b, v, vl); + + if (i >= vl) + continue; + if (i == 0) + return; + vl = __riscv_vsetvl_e32m1 (i); + } +} + +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x0-9]+,\s*[a-x0-9]+,\s*e32,\s*m1,\s*t[au],\s*m[au]} 2 { target { no-opts "-O0" no-opts "-O1" no-opts "-Os" no-opts "-Oz" no-opts "-g" no-opts "-funroll-loops" } } } } */ +/* { dg-final { scan-assembler-times {vsetvli} 2 { target { no-opts "-O0" no-opts "-O1" no-opts "-Os" no-opts "-Oz" no-opts "-g" no-opts "-funroll-loops" } } } } */ diff --git a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr109743-2.c b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr109743-2.c new file mode 100644 index 00000000000..5f6647bb916 --- /dev/null +++ b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr109743-2.c @@ -0,0 +1,27 @@ +/* { dg-do compile } */ +/* { dg-options "-march=rv32gcv -mabi=ilp32 -fno-tree-vectorize -fno-schedule-insns -fno-schedule-insns2" } */ + +#include "riscv_vector.h" + +void f (int32_t * a, int32_t * b, int n) +{ + if (n <= 0) + return; + int i = n; + size_t vl = __riscv_vsetvl_e8mf4 (i); + for (; i >= 0; i--) + { + vint32m1_t v = __riscv_vle32_v_i32m1 (a, vl); + __riscv_vse32_v_i32m1 (b, v, vl); + + if (i >= vl) + continue; + if (i == 0) + return; + vl = __riscv_vsetvl_e32m1 (i); + } +} + + +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x0-9]+,\s*[a-x0-9]+,\s*e32,\s*m1,\s*t[au],\s*m[au]} 2 { target { no-opts "-O0" no-opts "-O1" no-opts "-Os" no-opts "-Oz" no-opts "-g" no-opts "-funroll-loops" } } } } */ +/* { dg-final { scan-assembler-times {vsetvli} 2 { target { no-opts "-O0" no-opts "-O1" no-opts "-Os" no-opts "-Oz" no-opts "-g" no-opts "-funroll-loops" } } } } */ diff --git a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr109743-3.c b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr109743-3.c new file mode 100644 index 00000000000..5dbc871ed12 --- /dev/null +++ b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr109743-3.c @@ -0,0 +1,28 @@ +/* { dg-do compile } */ +/* { dg-options "-march=rv32gcv -mabi=ilp32 -fno-tree-vectorize -fno-schedule-insns -fno-schedule-insns2" } */ + +#include "riscv_vector.h" + +void f (int32_t * a, int32_t * b, int n) +{ + if (n <= 0) + return; + int i = n; + size_t vl = __riscv_vsetvl_e8mf2 (i); + for (; i >= 0; i--) + { + vint32m1_t v = __riscv_vle32_v_i32m1 (a, vl); + __riscv_vse32_v_i32m1 (b, v, vl); + + if (i >= vl) + continue; + if (i == 0) + return; + vl = __riscv_vsetvl_e32m1 (i); + } +} + +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x0-9]+,\s*[a-x0-9]+,\s*e8,\s*mf2,\s*t[au],\s*m[au]} 1 { target { no-opts "-O0" no-opts "-O1" no-opts "-Os" no-opts "-Oz" no-opts "-g" no-opts "-funroll-loops" } } } } */ +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x0-9]+,\s*[a-x0-9]+,\s*e32,\s*m1,\s*t[au],\s*m[au]} 1 { target { no-opts "-O0" no-opts "-O1" no-opts "-Os" no-opts "-Oz" no-opts "-g" no-opts "-funroll-loops" } } } } */ +/* { dg-final { scan-assembler-times {vsetvli\s+zero,\s*[a-x0-9]+,\s*e32,\s*m1,\s*t[au],\s*m[au]} 1 { target { no-opts "-O0" no-opts "-O1" no-opts "-Os" no-opts "-Oz" no-opts "-g" no-opts "-funroll-loops" } } } } */ +/* { dg-final { scan-assembler-times {vsetvli} 3 { target { no-opts "-O0" no-opts "-O1" no-opts "-Os" no-opts "-Oz" no-opts "-g" no-opts "-funroll-loops" } } } } */ diff --git a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr109743-4.c b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr109743-4.c new file mode 100644 index 00000000000..edd12855f58 --- /dev/null +++ b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr109743-4.c @@ -0,0 +1,28 @@ +/* { dg-do compile } */ +/* { dg-options "-march=rv32gcv -mabi=ilp32 -fno-tree-vectorize -fno-schedule-insns -fno-schedule-insns2" } */ + +#include "riscv_vector.h" + +void +f (int32_t *a, int32_t *b, int n) +{ + if (n <= 0) + return; + int i = n; + size_t vl = __riscv_vsetvl_e8mf4 (i); + for (; i >= 0; i--) + { + vint32m1_t v = __riscv_vle32_v_i32m1 (a + i, vl); + v = __riscv_vle32_v_i32m1_tu (v, a + i + 100, vl); + __riscv_vse32_v_i32m1 (b + i, v, vl); + + if (i >= vl) + continue; + if (i == 0) + return; + vl = __riscv_vsetvl_e8mf4 (i); + } +} + +/* { dg-final { scan-assembler-times {vsetvli\s+[a-x0-9]+,\s*[a-x0-9]+,\s*e32,\s*m1,\s*tu,\s*m[au]} 2 { target { no-opts "-O0" no-opts "-O1" no-opts "-Os" no-opts "-Oz" no-opts "-g" no-opts "-funroll-loops" } } } } */ +/* { dg-final { scan-assembler-times {vsetvli} 2 { target { no-opts "-O0" no-opts "-O1" no-opts "-Os" no-opts "-Oz" no-opts "-g" no-opts "-funroll-loops" } } } } */