From patchwork Fri Apr 21 09:19:12 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "juzhe.zhong@rivai.ai" X-Patchwork-Id: 86193 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:b0ea:0:b0:3b6:4342:cba0 with SMTP id b10csp931730vqo; Fri, 21 Apr 2023 02:19:57 -0700 (PDT) X-Google-Smtp-Source: AKy350bUOMlVlmIJ4XegIGXPeQpZQVH/GRgW+oenFO9r2AqNjL7eb/j/i7usBvpacaa8SnQ7NXVB X-Received: by 2002:a17:906:b6c2:b0:94f:1d54:95d2 with SMTP id ec2-20020a170906b6c200b0094f1d5495d2mr1744660ejb.15.1682068797706; Fri, 21 Apr 2023 02:19:57 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1682068797; cv=none; d=google.com; s=arc-20160816; b=XfmT36ZNPvNtB8b1T9ads6yB5ic/lFaC8P1jc84ZpZHAjppnuhgWsQHjfUDZ2iNvjP /8qeFU47aQRxnBctsSijyFcoAguSarkFGtLgbZiqPdkIhkPafNi59kiPjVRylOgZpERq D8lgvA7Scmaqgksg4VXi1yQb37ErA681x9n8QBiT84amtFIjYfC3uESjeJlMjlx9q3o+ mzvU/BmEehSIUyd/QEYK6YosWY5QcS94eONdtmuuK+BVMa6yVrBv4UMkUe3CZgJPeTeZ lidyVb/9MUJZ8zUaTq7g4Jb/n/1CSz/NH5WOBLIFSawiymsoHH2K5gfC6jLPYOajInwC s2lA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=sender:errors-to:list-subscribe:list-help:list-post:list-archive :list-unsubscribe:list-id:precedence:feedback-id :content-transfer-encoding:mime-version:message-id:date:subject:cc :to:from:dmarc-filter:delivered-to; bh=HDaSGHSDYSBodx9j5tqUZRvLvCLPuukGr/S3XVai59I=; b=M70kJcuAP4CqK1zvLy1Lwbmy9MddAJ56OP+BDfvQz2JRADhe53fzrl/LoEm+oPQoL8 +oJULjQhxvnMzU/d2P62kWx5w8BX408D1u4jVzvTyP1knbgO24c9ggEQn1b35pQ4opoo wFDiFdn1kW30dUYUFbioqSHEu3LtPJY2XCNtfmsgowj3AbMml/Ew1E5elGjDxNeVr1Si 4o0jWb0O6v9Z+SnTH+WXOoFGqjTvc5gpBQ6rg6XPwyZ6h6y4pxhvqD1znO5xmt30gnDw c+argh1GvgjDDPMeLmVLwQUzFl7RjqrR93mYERdwwJT7JwRc0aWR9AdBNfpdt8EQ0Exn OWgQ== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org designates 2620:52:3:1:0:246e:9693:128c as permitted sender) smtp.mailfrom="gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org" Received: from sourceware.org (server2.sourceware.org. [2620:52:3:1:0:246e:9693:128c]) by mx.google.com with ESMTPS id p14-20020a17090653ce00b0094f442c8a6dsi3548004ejo.291.2023.04.21.02.19.57 for (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 21 Apr 2023 02:19:57 -0700 (PDT) Received-SPF: pass (google.com: domain of gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org designates 2620:52:3:1:0:246e:9693:128c as permitted sender) client-ip=2620:52:3:1:0:246e:9693:128c; Authentication-Results: mx.google.com; spf=pass (google.com: domain of gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org designates 2620:52:3:1:0:246e:9693:128c as permitted sender) smtp.mailfrom="gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org" Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id E842B3856DE6 for ; Fri, 21 Apr 2023 09:19:47 +0000 (GMT) X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from smtpbgsg2.qq.com (smtpbgsg2.qq.com [54.254.200.128]) by sourceware.org (Postfix) with ESMTPS id 835023858D37 for ; Fri, 21 Apr 2023 09:19:20 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 835023858D37 Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=rivai.ai Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=rivai.ai X-QQ-mid: bizesmtp62t1682068754toymp28m Received: from server1.localdomain ( [58.60.1.22]) by bizesmtp.qq.com (ESMTP) with id ; Fri, 21 Apr 2023 17:19:13 +0800 (CST) X-QQ-SSF: 01400000000000F0P000000A0000000 X-QQ-FEAT: +Odi5FkUgBmsQmMFlOu/AOBkWMOTirXEZXaNXMyjsYA2JsM4fLfH+uc78xb3c 38LveWxcQyblG66qVlG22dCzzxGZlI4IL7Ii7MR0ibxf+pdAbIOdttkb5YhlBYGq6nKS5Hf kSip8/0SqThNI3rjgQjlsDLprRsk8gLo0itF4MqxBGepOdTuARr4hHkEQQEr056cYtWjUo3 vQqiC+V8XQRbEldF2oelqCSVTh+/XQdlkHl/Tf36pviDe4lP+V7CKoExBMy1kAD0AhN/tx2 jveewbc5sbFjK/K34PIKYq0Rit5tXWSQNaZg8nl+VJImGV+CNpMNbcV3cDDpxXg8UQV8Khc Q3hYumrjjQTSzNTPJEWGqWjcDrGR5Lru04H5E60pqj/Wzl1yZu+nI7sK5s5DlJDYTGBivF4 BiJrOYuJXYE= X-QQ-GoodBg: 2 X-BIZMAIL-ID: 14330691024462621225 From: juzhe.zhong@rivai.ai To: gcc-patches@gcc.gnu.org Cc: kito.cheng@gmail.com, Juzhe-Zhong Subject: [PATCH V4] RISC-V: Defer vsetvli insertion to later if possible [PR108270] Date: Fri, 21 Apr 2023 17:19:12 +0800 Message-Id: <20230421091912.169622-1-juzhe.zhong@rivai.ai> X-Mailer: git-send-email 2.36.1 MIME-Version: 1.0 X-QQ-SENDSIZE: 520 Feedback-ID: bizesmtp:rivai.ai:qybglogicsvr:qybglogicsvr7 X-Spam-Status: No, score=-10.7 required=5.0 tests=BAYES_00, GIT_PATCH_0, KAM_DMARC_STATUS, KAM_SHORT, RCVD_IN_BARRACUDACENTRAL, RCVD_IN_DNSWL_NONE, RCVD_IN_MSPIKE_H2, SPF_HELO_PASS, SPF_PASS, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org Sender: "Gcc-patches" X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1763776443857537498?= X-GMAIL-MSGID: =?utf-8?q?1763776971564151227?= From: Juzhe-Zhong Fix issue: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108270. Consider the following testcase: void f (void * restrict in, void * restrict out, int l, int n, int m) { for (int i = 0; i < l; i++){ for (int j = 0; j < m; j++){ for (int k = 0; k < n; k++) { vint8mf8_t v = __riscv_vle8_v_i8mf8 (in + i + j, 17); __riscv_vse8_v_i8mf8 (out + i + j, v, 17); } } } } Compile option: -O3 Before this patch: mv a7,a2 mv a6,a0 mv t1,a1 mv a2,a3 vsetivli zero,17,e8,mf8,ta,ma ble a7,zero,.L1 ble a4,zero,.L1 ble a3,zero,.L1 ... After this patch: mv a7,a2 mv a6,a0 mv t1,a1 mv a2,a3 ble a7,zero,.L1 ble a4,zero,.L1 ble a3,zero,.L1 add a1,a0,a4 li a0,0 vsetivli zero,17,e8,mf8,ta,ma ... This issue is a missed optmization produced by Phase 3 global backward demand fusion instead of LCM. This patch is fixing poor placement of the vsetvl. This point is seletected not because LCM but by Phase 3 (VL/VTYPE demand info backward fusion and propogation) which is I introduced into VSETVL PASS to enhance LCM && improve vsetvl instruction performance. This patch is to supress the Phase 3 too aggressive backward fusion and propagation to the top of the function program when there is no define instruction of AVL (AVL is 0 ~ 31 imm since vsetivli instruction allows imm value instead of reg). You may want to ask why we need Phase 3 to the job. Well, we have so many situations that pure LCM fails to optimize, here I can show you a simple case to demonstrate it: void f (void * restrict in, void * restrict out, int n, int m, int cond) { size_t vl = 101; for (size_t j = 0; j < m; j++){ if (cond) { for (size_t i = 0; i < n; i++) { vint8mf8_t v = __riscv_vle8_v_i8mf8 (in + i + j, vl); __riscv_vse8_v_i8mf8 (out + i, v, vl); } } else { for (size_t i = 0; i < n; i++) { vint32mf2_t v = __riscv_vle32_v_i32mf2 (in + i + j, vl); v = __riscv_vadd_vv_i32mf2 (v,v,vl); __riscv_vse32_v_i32mf2 (out + i, v, vl); } } } } You can see: The first inner loop needs vsetvli e8 mf8 for vle+vse. The second inner loop need vsetvli e32 mf2 for vle+vadd+vse. If we don't have Phase 3 (Only handled by LCM (Phase 4)), we will end up with : outerloop: ... vsetvli e8mf8 inner loop 1: .... vsetvli e32mf2 inner loop 2: .... However, if we have Phase 3, Phase 3 is going to fuse the vsetvli e32 mf2 of inner loop 2 into vsetvli e8 mf8, then we will end up with this result after phase 3: outerloop: ... inner loop 1: vsetvli e32mf2 .... inner loop 2: vsetvli e32mf2 .... Then, this demand information after phase 3 will be well optimized after phase 4 (LCM), after Phase 4 result is: vsetvli e32mf2 outerloop: ... inner loop 1: .... inner loop 2: .... You can see this is the optimal codegen after current VSETVL PASS (Phase 3: Demand backward fusion and propagation + Phase 4: LCM ). This is a known issue when I start to implement VSETVL PASS. PR 108270 gcc/ChangeLog: * config/riscv/riscv-vsetvl.cc (vector_infos_manager::all_empty_predecessor_p): New function. (pass_vsetvl::backward_demand_fusion): Ditto. * config/riscv/riscv-vsetvl.h: Ditto. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/vsetvl/imm_bb_prop-1.c: Adapt testcase. * gcc.target/riscv/rvv/vsetvl/imm_conflict-3.c: Ditto. * gcc.target/riscv/rvv/vsetvl/pr108270.c: New test. --- gcc/config/riscv/riscv-vsetvl.cc | 23 +++++++++++++++++++ gcc/config/riscv/riscv-vsetvl.h | 2 ++ .../riscv/rvv/vsetvl/imm_bb_prop-1.c | 2 +- .../riscv/rvv/vsetvl/imm_conflict-3.c | 4 ++-- .../gcc.target/riscv/rvv/vsetvl/pr108270.c | 19 +++++++++++++++ 5 files changed, 47 insertions(+), 3 deletions(-) create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr108270.c diff --git a/gcc/config/riscv/riscv-vsetvl.cc b/gcc/config/riscv/riscv-vsetvl.cc index 5f424221659..167e3c6145c 100644 --- a/gcc/config/riscv/riscv-vsetvl.cc +++ b/gcc/config/riscv/riscv-vsetvl.cc @@ -2355,6 +2355,21 @@ vector_infos_manager::get_all_available_exprs ( return available_list; } +bool +vector_infos_manager::all_empty_predecessor_p (const basic_block cfg_bb) const +{ + hash_set pred_cfg_bbs = get_all_predecessors (cfg_bb); + for (const basic_block pred_cfg_bb : pred_cfg_bbs) + { + const auto &pred_block_info = vector_block_infos[pred_cfg_bb->index]; + if (!pred_block_info.local_dem.valid_or_dirty_p () + && !pred_block_info.reaching_out.valid_or_dirty_p ()) + continue; + return false; + } + return true; +} + bool vector_infos_manager::all_same_ratio_p (sbitmap bitdata) const { @@ -3138,6 +3153,14 @@ pass_vsetvl::backward_demand_fusion (void) if (!backward_propagate_worthwhile_p (cfg_bb, curr_block_info)) continue; + /* Fix PR108270: + + bb 0 -> bb 1 + We don't need to backward fuse VL/VTYPE info from bb 1 to bb 0 + if bb 1 is not inside a loop and all predecessors of bb 0 are empty. */ + if (m_vector_manager->all_empty_predecessor_p (cfg_bb)) + continue; + edge e; edge_iterator ei; /* Backward propagate to each predecessor. */ diff --git a/gcc/config/riscv/riscv-vsetvl.h b/gcc/config/riscv/riscv-vsetvl.h index 237381f7026..eec03d35071 100644 --- a/gcc/config/riscv/riscv-vsetvl.h +++ b/gcc/config/riscv/riscv-vsetvl.h @@ -450,6 +450,8 @@ public: /* Return true if all expression set in bitmap are same ratio. */ bool all_same_ratio_p (sbitmap) const; + bool all_empty_predecessor_p (const basic_block) const; + void release (void); void create_bitmap_vectors (void); void free_bitmap_vectors (void); diff --git a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/imm_bb_prop-1.c b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/imm_bb_prop-1.c index cd4ee7dd0d3..ed32a40f5e7 100644 --- a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/imm_bb_prop-1.c +++ b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/imm_bb_prop-1.c @@ -29,4 +29,4 @@ void f (int8_t * restrict in, int8_t * restrict out, int n, int cond) } } -/* { dg-final { scan-assembler-times {vsetivli\s+zero,\s*5,\s*e8,\s*mf8,\s*tu,\s*m[au]} 1 { target { no-opts "-O0" no-opts "-g" no-opts "-funroll-loops" } } } } */ +/* { dg-final { scan-assembler-times {vsetivli\s+zero,\s*5,\s*e8,\s*mf8,\s*tu,\s*m[au]} 2 { target { no-opts "-O0" no-opts "-g" no-opts "-funroll-loops" } } } } */ diff --git a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/imm_conflict-3.c b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/imm_conflict-3.c index 1f7c0f036a2..2fa29c01dbc 100644 --- a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/imm_conflict-3.c +++ b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/imm_conflict-3.c @@ -20,7 +20,7 @@ void f (int8_t * restrict in, int8_t * restrict out, int n, int cond) } } -/* { dg-final { scan-assembler-times {vsetivli\s+zero,\s*5,\s*e8,\s*mf8,\s*tu,\s*m[au]} 1 { target { no-opts "-O0" no-opts "-g" no-opts "-funroll-loops" } } } } */ +/* { dg-final { scan-assembler-times {vsetivli\s+zero,\s*5,\s*e8,\s*mf8,\s*t[au],\s*m[au]} 2 { target { no-opts "-O0" no-opts "-g" no-opts "-funroll-loops" } } } } */ /* { dg-final { scan-assembler-times {vsetvli\s+[a-x0-9]+,\s*zero,\s*e8,\s*mf8,\s*t[au],\s*m[au]} 1 { target { no-opts "-O0" no-opts "-funroll-loops" no-opts "-g" } } } } */ -/* { dg-final { scan-assembler-times {vsetivli} 1 { target { no-opts "-O0" no-opts "-funroll-loops" no-opts "-g" } } } } */ +/* { dg-final { scan-assembler-times {vsetivli} 2 { target { no-opts "-O0" no-opts "-funroll-loops" no-opts "-g" } } } } */ /* { dg-final { scan-assembler-times {vsetvli} 1 { target { no-opts "-O0" no-opts "-funroll-loops" no-opts "-g" } } } } */ diff --git a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr108270.c b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr108270.c new file mode 100644 index 00000000000..d2ae43bf263 --- /dev/null +++ b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr108270.c @@ -0,0 +1,19 @@ +/* { dg-do compile } */ +/* { dg-options "-march=rv32gcv -mabi=ilp32 -fno-tree-vectorize -fno-schedule-insns -fno-schedule-insns2" } */ + +#include "riscv_vector.h" + +void f (void * restrict in, void * restrict out, int l, int n, int m) +{ + for (int i = 0; i < l; i++){ + for (int j = 0; j < m; j++){ + for (int k = 0; k < n; k++) + { + vint8mf8_t v = __riscv_vle8_v_i8mf8 (in + i + j, 17); + __riscv_vse8_v_i8mf8 (out + i + j, v, 17); + } + } + } +} + +/* { dg-final { scan-assembler-not {mv\s+[a-x0-9]+,[a-x0-9]+\s+mv\s+[a-x0-9]+,[a-x0-9]+\s+mv\s+[a-x0-9]+,[a-x0-9]+\s+mv\s+[a-x0-9]+,[a-x0-9]+\s+mv\s+[a-x0-9]+,[a-x0-9]+\s+vsetivli} } } */