[V2] RISC-V: Rework Phase 5 && Phase 6 of VSETVL PASS

From: Juzhe-Zhong <juzhe.zhong@rivai.ai>

This patch reworks Phase 5 && Phase 6 of the VSETVL pass. Both phases are
quite messy and cause some bugs that were discovered by my downstream
auto-vectorization test generator.

Before this patch:

Phase 5 is cleanup_insns, the function that removes the AVL operand dependency from each RVV instruction.
E.g. vadd.vv (use a5), after Phase 5, ====> vadd.vv (use const_int 0). "a5" is consumed by the "vsetvl" instructions, so
once the correct "vsetvl" instructions have been inserted, the RVV instructions no longer need the AVL operand "a5".
Removing this operand dependency helps the following scheduling pass.
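
As an illustration of the Phase 5 cleanup described above, here is a minimal sketch on a toy instruction model. The types and the function toy_cleanup_insns are hypothetical and are not the GCC data structures; the sketch only assumes that every RVV instruction whose AVL is a register gets that operand replaced by the constant 0.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy operand: either a register number or a constant. */
enum opnd_kind { OPND_REG, OPND_CONST };

struct avl_operand {
  enum opnd_kind kind;
  int value; /* register number, or the constant value */
};

/* Hypothetical toy instruction, not an rtx_insn. */
struct toy_insn {
  bool is_rvv; /* true for vadd.vv, vle8.v, ... */
  struct avl_operand avl;
};

/* Replace every register AVL operand of an RVV instruction with const 0.
   Returns how many instructions were rewritten.  */
size_t toy_cleanup_insns(struct toy_insn *insns, size_t n) {
  assert(insns != NULL || n == 0);
  size_t rewritten = 0;
  for (size_t i = 0; i < n; i++) {
    if (insns[i].is_rvv && insns[i].avl.kind == OPND_REG) {
      insns[i].avl.kind = OPND_CONST;
      insns[i].avl.value = 0;
      rewritten++;
    }
  }
  return rewritten;
}
```

After the rewrite, the RVV instruction no longer reads the AVL register, so a scheduler working on this model would see no dependency between it and the instruction that defines that register.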

Phase 6 is propagate_avl, which does the following 2 things:
1. Local && Global user vsetvl instructions optimization.
   E.g.
      vsetvli a2, a2, e8, mf8   ======> Change it into vsetvli a2, a2, e32, mf2
      vsetvli zero,a2, e32, mf2  ======> eliminate
2. Optimize a user vsetvl from "vsetvl a2,a2" into "vsetvl zero,a2" if "a2" is not used by any instruction.
Since Phase 1 ~ Phase 4 insert "vsetvli" instructions based on LCM, which changes the CFG, I rebuild the
RTL_SSA framework (which is more expensive than just using DF) for Phase 6 and optimize user vsetvli based on the new RTL_SSA.
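
The commit message does not say why the two vsetvli instructions in the example for item 1 can be merged. A plausible reading, stated here as an assumption, is that both settings have the same SEW/LMUL ratio: in the RVV specification VLMAX = VLEN * LMUL / SEW, so equal ratios give the same vl for the same AVL. The helper names below are hypothetical; the sketch only does the arithmetic.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy vtype: LMUL is kept as a fraction, e.g. mf8 is 1/8 and m2 is 2/1. */
struct toy_vtype {
  unsigned sew;      /* element width in bits */
  unsigned lmul_num; /* LMUL numerator */
  unsigned lmul_den; /* LMUL denominator */
};

/* SEW/LMUL ratio, e.g. e8,mf8 -> 8 / (1/8) = 64.  */
unsigned toy_sew_lmul_ratio(struct toy_vtype t) {
  assert(t.sew != 0 && t.lmul_num != 0 && t.lmul_den != 0);
  return t.sew * t.lmul_den / t.lmul_num;
}

/* VLMAX = VLEN * LMUL / SEW = VLEN / ratio.  */
unsigned toy_vlmax(unsigned vlen, struct toy_vtype t) {
  return vlen / toy_sew_lmul_ratio(t);
}

/* Two settings yield the same vl for the same AVL when their ratios match.  */
bool toy_same_vlmax(struct toy_vtype a, struct toy_vtype b) {
  return toy_sew_lmul_ratio(a) == toy_sew_lmul_ratio(b);
}
```

With these definitions, e8,mf8 and e32,mf2 both have ratio 64, which matches the example: the first vsetvli can take the second's vtype without changing the value written to a2.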

There are 2 issues in Phase 5 && Phase 6:
1. local_eliminate_vsetvl_insn, introduced by @kito, does local user vsetvl optimization better than
   Phase 6 does, and it doesn't need to rebuild the RTL_SSA framework. So the local user vsetvli optimization
   in Phase 6 is redundant and should be removed.
2. A bug was discovered by my downstream auto-vectorization test generator. (I can't put the test in this patch since we are missing the autovec
   patterns for it, so upstream GCC can't reproduce the issue directly, but I will add the test once I support the
   necessary autovec patterns.) The bug is caused by the rebuilt RTL_SSA framework. The issue is as follows:

Before Phase 6:
   ...
   insn1: vsetvli a3, 17 <========== generated by SELECT_VL auto-vec pattern.
   slli a4,a3,3
   ...
   insn2: vsetvli zero, a3, ... 
   load (use const_int 0; before Phase 5 it used a3, but the use of "a3" is removed in Phase 5)
   ...

In Phase 6, we iterate to insn2, then get the def of "a3", which is insn1.
insn2 is the vsetvli instruction inserted in Phase 4, and it is not included in the RTL_SSA framework
even though we rebuild it (I didn't look into why, and I don't think we need to now).
In this situation, the def_info reached from insn2 reports that "set->single_nondebug_insn_use ()"
returns true. This information is obviously incorrect, since insn1 has at least 2 uses:
1). slli a4,a3,3 2). insn2: vsetvli zero, a3, ... As a result, the test generated by my downstream test generator
failed at execution.
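
The failure mode above can be pictured with a toy model in which use information is computed once and an instruction is added afterwards. This is only an analogy under stated assumptions: toy_insn, toy_def_summary and both functions are hypothetical, and the sketch does not reproduce how RTL_SSA actually tracks uses.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_NO_REG (-1)

/* Hypothetical toy instruction: one written and one read register.  */
struct toy_insn {
  int dest; /* register written, or TOY_NO_REG */
  int src;  /* register read, or TOY_NO_REG */
};

/* Use summary captured at analysis time (a stand-in for cached def info).  */
struct toy_def_summary {
  int reg;
  size_t num_uses;
};

/* Count the instructions that currently read `reg`.  */
size_t toy_count_uses(const struct toy_insn *insns, size_t n, int reg) {
  assert(insns != NULL || n == 0);
  size_t uses = 0;
  for (size_t i = 0; i < n; i++)
    if (insns[i].src == reg)
      uses++;
  return uses;
}

/* Snapshot the use count; later insertions do not update the snapshot.  */
struct toy_def_summary toy_analyze(const struct toy_insn *insns, size_t n,
                                   int reg) {
  struct toy_def_summary s = {reg, toy_count_uses(insns, n, reg)};
  return s;
}
```

If the snapshot is taken over insn1 and the slli only, it records a single use of a3. Once insn2 (vsetvli zero, a3) is part of the sequence, a fresh count gives 2, so any decision based on the cached "single use" answer is made on stale data.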

Conclusion about the RTL_SSA framework:
Before this patch, we initialize RTL_SSA 2 times. The first, at the beginning of the VSETVL pass, is correct. The second,
rebuilt after Phase 4 (LCM), has incorrect information that causes bugs.

Besides, initializing RTL_SSA a second time is wasteful, since we only need it for a small optimization.

Based on the circumstances described above, I rework and reorganize Phase 5 && Phase 6 as follows:
1. Phase 5 is called ssa_post_optimization. It optimizes based on the RTL_SSA information (RTL_SSA is initialized
   at the beginning of the VSETVL pass, so there is no need to rebuild it). This phase includes 3 optimizations:
   1). local_eliminate_vsetvl_insn, which we already have (no change).
   2). global_eliminate_vsetvl_insn ---> a new optimization split from the original Phase 6, with a more powerful and reliable implementation.
      E.g. 
      void f(int8_t *base, int8_t *out, size_t vl, size_t m, size_t k) {
        size_t avl;
        if (m > 100)
          avl = __riscv_vsetvl_e16mf4(vl << 4);
        else
          avl = __riscv_vsetvl_e32mf2(vl >> 8);
        for (size_t i = 0; i < m; i++) {
          vint8mf8_t v0 = __riscv_vle8_v_i8mf8(base + i, avl);
          v0 = __riscv_vadd_vv_i8mf8 (v0, v0, avl);
          __riscv_vse8_v_i8mf8(out + i, v0, avl);
        }
      }

      Before this patch, the global user vsetvl optimization failed on this example:
      f:
              li      a5,100
              bleu    a3,a5,.L2
              slli    a2,a2,4
              vsetvli a4,a2,e16,mf4,ta,mu
      .L3:
              li      a5,0
              vsetvli zero,a4,e8,mf8,ta,ma
      .L5:
              add     a6,a0,a5
              add     a2,a1,a5
              vle8.v  v1,0(a6)
              addi    a5,a5,1
              vadd.vv v1,v1,v1
              vse8.v  v1,0(a2)
              bgtu    a3,a5,.L5
      .L10:
              ret
      .L2:
              beq     a3,zero,.L10
              srli    a2,a2,8
              vsetvli a4,a2,e32,mf2,ta,mu
              j       .L3
      With this patch:
      f:
              li      a5,100
              bleu    a3,a5,.L2
              slli    a2,a2,4
              vsetvli zero,a2,e8,mf8,ta,ma
      .L3:
              li      a5,0
      .L5:
              add     a6,a0,a5
              add     a2,a1,a5
              vle8.v  v1,0(a6)
              addi    a5,a5,1
              vadd.vv v1,v1,v1
              vse8.v  v1,0(a2)
              bgtu    a3,a5,.L5
      .L10:
              ret
      .L2:
              beq     a3,zero,.L10
              srli    a2,a2,8
              vsetvli zero,a2,e8,mf8,ta,ma
              j       .L3

   3). Remove the AVL operand dependency of each RVV instruction.

2. Phase 6 is called df_post_optimization: it optimizes "vsetvl a3,a2...." into "vsetvl zero,a2...." based on
   dataflow analysis of the new CFG (the new CFG is created by LCM). The reason it must use the new CFG and run after Phase 5:
   ...
   vsetvl a3, a2...
   vadd.vv (use a3)
   Without Phase 5, which removes the "a3" use in vadd.vv, we would fail to optimize vsetvl a3,a2 into vsetvl zero,a2.

   This patch passes all tests in rvv.exp with ONLY performance && codegen improvements (no performance decline and no bugs, including in my
   downstream tests).
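
The ordering constraint between Phase 5 and Phase 6 can be sketched on a straight-line toy model. Everything here is hypothetical (toy_insn, toy_used_later, toy_df_post_optimization); the real pass works on dataflow information over the CFG, while this sketch assumes a single basic block and that no register is live after its last instruction.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_NO_REG (-1)
#define TOY_ZERO_REG 0 /* x0, i.e. "zero" */

/* Hypothetical toy instruction: one written and one read register.  */
struct toy_insn {
  bool is_vsetvl;
  int dest; /* register written, or TOY_NO_REG */
  int src;  /* register read, or TOY_NO_REG */
};

/* True if `reg` is read after position `from` before being overwritten.
   Assumes nothing is live past the end of the sequence.  */
static bool toy_used_later(const struct toy_insn *insns, size_t n,
                           size_t from, int reg) {
  for (size_t i = from + 1; i < n; i++) {
    if (insns[i].src == reg)
      return true;
    if (insns[i].dest == reg)
      return false;
  }
  return false;
}

/* Turn "vsetvl rd,rs" into "vsetvl zero,rs" when rd has no later use.
   Returns how many vsetvl instructions were changed.  */
size_t toy_df_post_optimization(struct toy_insn *insns, size_t n) {
  assert(insns != NULL || n == 0);
  size_t changed = 0;
  for (size_t i = 0; i < n; i++) {
    if (!insns[i].is_vsetvl || insns[i].dest == TOY_NO_REG ||
        insns[i].dest == TOY_ZERO_REG)
      continue;
    if (!toy_used_later(insns, n, i, insns[i].dest)) {
      insns[i].dest = TOY_ZERO_REG;
      changed++;
    }
  }
  return changed;
}
```

In this model, a vsetvl writing a3 followed by a vadd.vv that still reads a3 is left alone. Once the a3 use is dropped from the vadd.vv, as Phase 5 does, the same call rewrites the destination to zero.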

gcc/ChangeLog:

        * config/riscv/riscv-vsetvl.cc (available_occurrence_p): Enhance user vsetvl optimization.
        (vector_insn_info::parse_insn): Add rtx_insn parse.
        (pass_vsetvl::local_eliminate_vsetvl_insn): Enhance user vsetvl optimization.
        (get_first_vsetvl): New function.
        (pass_vsetvl::global_eliminate_vsetvl_insn): Ditto.
        (pass_vsetvl::cleanup_insns): Remove it.
        (pass_vsetvl::ssa_post_optimization): New function.
        (has_no_uses): Ditto.
        (pass_vsetvl::propagate_avl): Remove it.
        (pass_vsetvl::df_post_optimization): New function.
        (pass_vsetvl::lazy_vsetvl): Rework Phase 5 && Phase 6.
        * config/riscv/riscv-vsetvl.h: Adapt declaration.

gcc/testsuite/ChangeLog:

        * gcc.target/riscv/rvv/vsetvl/vsetvl-16.c: Adapt test.
        * gcc.target/riscv/rvv/vsetvl/vsetvl-2.c: Ditto.
        * gcc.target/riscv/rvv/vsetvl/vsetvl-3.c: Ditto.
        * gcc.target/riscv/rvv/vsetvl/vsetvl-21.c: New test.
        * gcc.target/riscv/rvv/vsetvl/vsetvl-22.c: New test.
        * gcc.target/riscv/rvv/vsetvl/vsetvl-23.c: New test.

---
 gcc/config/riscv/riscv-vsetvl.cc              | 400 +++++++++++-------
 gcc/config/riscv/riscv-vsetvl.h               |  34 +-
 .../gcc.target/riscv/rvv/vsetvl/vsetvl-16.c   |   2 +-
 .../gcc.target/riscv/rvv/vsetvl/vsetvl-2.c    |   2 +-
 .../gcc.target/riscv/rvv/vsetvl/vsetvl-21.c   |  21 +
 .../gcc.target/riscv/rvv/vsetvl/vsetvl-22.c   |  21 +
 .../gcc.target/riscv/rvv/vsetvl/vsetvl-23.c   |  37 ++
 .../gcc.target/riscv/rvv/vsetvl/vsetvl-3.c    |   2 +-
 8 files changed, 366 insertions(+), 153 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vsetvl-21.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vsetvl-22.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vsetvl-23.c

Message ID	20230609104105.9100-1-juzhe.zhong@rivai.ai
State	Unresolved
Headers	From: juzhe.zhong@rivai.ai
	To: gcc-patches@gcc.gnu.org
	Cc: kito.cheng@gmail.com, kito.cheng@sifive.com, palmer@dabbelt.com, palmer@rivosinc.com, jeffreyalaw@gmail.com, rdapp.gcc@gmail.com, pan2.li@intel.com
	Subject: [PATCH V2] RISC-V: Rework Phase 5 && Phase 6 of VSETVL PASS
	Date: Fri, 9 Jun 2023 18:41:05 +0800
Series	[V2] RISC-V: Rework Phase 5 && Phase 6 of VSETVL PASS
