From patchwork Fri Apr 21 08:02:05 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "juzhe.zhong@rivai.ai" <juzhe.zhong@rivai.ai>
X-Patchwork-Id: 86165
Return-Path: <gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org>
Delivered-To: ouuuleilei@gmail.com
Received: by 2002:a59:b0ea:0:b0:3b6:4342:cba0 with SMTP id b10csp896757vqo;
        Fri, 21 Apr 2023 01:02:56 -0700 (PDT)
X-Google-Smtp-Source: 
 AKy350YDSSGLksw9/KMJj9/2b5w3KrVHNGlhpnEBqi0WpsGNQAR+fWfRZENBEFisvj40UoALMgPW
X-Received: by 2002:a17:907:904e:b0:949:55fd:34fa with SMTP id
 az14-20020a170907904e00b0094955fd34famr1218532ejc.39.1682064176389;
        Fri, 21 Apr 2023 01:02:56 -0700 (PDT)
ARC-Seal: i=1; a=rsa-sha256; t=1682064176; cv=none;
        d=google.com; s=arc-20160816;
        b=dTtjeM8apiqXt6bGfyGfKgNHxqycJHDXD8EICvgBSL6AJ62/WuQXyoia5ODe8730YA
         wuFMCZFamJ0S+//63JMYeFwcRnf7mpQzviL5YbTc3wRZBcEVH5nwT2FdhDFA8tzFhRM9
         u9m289mTULRPdRgQYLitHUcejfAd+vz3mPLikfatMr03zLf0QzE/lrV7Mx2+agD8oLXh
         RvElxG87be9BKWp9fuWhzbAEl+WIRF1BFfYBFUfn4mtquw5S52+wJUFAFopiAOoLC0CC
         SITVGV1iGAZi9hC5Cj05Td+0ieu29ospxiNFnoeFJ5o/rSe00Vh65x1vabUnsQyG3q91
         UxyQ==
ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com;
 s=arc-20160816;
        h=sender:errors-to:list-subscribe:list-help:list-post:list-archive
         :list-unsubscribe:list-id:precedence:feedback-id
         :content-transfer-encoding:mime-version:message-id:date:subject:cc
         :to:from:dmarc-filter:delivered-to;
        bh=LSxCydAlOaj6FqPjzwAOgsCFDi0SDuDRQFIsHrFy/rk=;
        b=Nwj+px/nhKyQZ0uBxGIGVfgA9QP6MSNY/3170HJQUBP3gjdOFF325+hH3c49QONmlA
         vwIu6kjzjXWDa0HmS6NwdQN+SA8sWnK9nZz60vP86uVZABHYUDb7ATP2lhbbIvp4XmSv
         cvz/wh5tjGg8pFRBnADdeuI3C5gYyLEUHsbsR+Uz6yZ1VI2NPo6mQqWt0RTRE4Ti7EVe
         Un72BUBmthQGJIK2GHSFLQXYqQuYmKZZXVVv6YBWAAhKwj2JVSgn0NtAwrqqMKJWjogc
         r08YlfT++R79wQdnIJudOecTY9WNtbK6JSrZgpn+GGhPCKKpx0aDoAoA3gee/Y8RUj4W
         vAYg==
ARC-Authentication-Results: i=1; mx.google.com;
       spf=pass (google.com: domain of
 gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org designates 8.43.85.97 as
 permitted sender)
 smtp.mailfrom="gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org"
Received: from sourceware.org (ip-8-43-85-97.sourceware.org. [8.43.85.97])
        by mx.google.com with ESMTPS id
 p13-20020aa7cc8d000000b00504a1d4547esi2888148edt.465.2023.04.21.01.02.56
        for <ouuuleilei@gmail.com>
        (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256);
        Fri, 21 Apr 2023 01:02:56 -0700 (PDT)
Received-SPF: pass (google.com: domain of
 gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org designates 8.43.85.97 as
 permitted sender) client-ip=8.43.85.97;
Authentication-Results: mx.google.com;
       spf=pass (google.com: domain of
 gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org designates 8.43.85.97 as
 permitted sender)
 smtp.mailfrom="gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org"
Received: from server2.sourceware.org (localhost [IPv6:::1])
	by sourceware.org (Postfix) with ESMTP id 15383385701C
	for <ouuuleilei@gmail.com>; Fri, 21 Apr 2023 08:02:53 +0000 (GMT)
X-Original-To: gcc-patches@gcc.gnu.org
Delivered-To: gcc-patches@gcc.gnu.org
Received: from smtpbg150.qq.com (smtpbg150.qq.com [18.132.163.193])
 by sourceware.org (Postfix) with ESMTPS id A0B8B3858D37
 for <gcc-patches@gcc.gnu.org>; Fri, 21 Apr 2023 08:02:23 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org A0B8B3858D37
Authentication-Results: sourceware.org;
 dmarc=none (p=none dis=none) header.from=rivai.ai
Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=rivai.ai
X-QQ-mid: bizesmtp62t1682064128tnfgcxw5
Received: from server1.localdomain ( [58.60.1.22])
 by bizesmtp.qq.com (ESMTP) with
 id ; Fri, 21 Apr 2023 16:02:07 +0800 (CST)
X-QQ-SSF: 01400000000000F0P000000A0000000
X-QQ-FEAT: 0laiA9+vjADzMkS/hD+2jtzFqNdYuR0XyPpv/gMkfJp98jmb2OcmgIXF1tBk4
 P/PnEDOzLt0RoPxM8ucGfYRKGQ3zae142I17/LKQ1WIu65sUuFxQRhCtLyjxrXT8HSiksQP
 rg63tJtbSPkrmbITMxY7Irfr74l3xs5yZE5No9BjfHGkCRfpLUp6FotFcMCFEOQcjWGQyUC
 2dNaunJVL/n1hdwYZrxoZO2LJt/IR3FUOlwuf1POhAHWAieJvW0vDzPh8TV2+8udWMe6A4w
 xXjh5S0SoQDH+WRKPrF5dCsWWfNG8KQ0U9Y7FOWptuTuKfjkwYXOaBajHyMuIA6+OVFcd9d
 4Vnc1VZBByOmnP7sHM10cW5tuyMhv8njEJ4hzd1GFKMdovMty+Wsf0KlL6d6yhqO7O91plW
X-QQ-GoodBg: 2
X-BIZMAIL-ID: 15657108904516119927
From: juzhe.zhong@rivai.ai
To: gcc-patches@gcc.gnu.org
Cc: kito.cheng@gmail.com,
	Juzhe-Zhong <juzhe.zhong@rivai.ai>
Subject: [PATCH V2] RISC-V: Fix PR108279
Date: Fri, 21 Apr 2023 16:02:05 +0800
Message-Id: <20230421080205.47633-1-juzhe.zhong@rivai.ai>
X-Mailer: git-send-email 2.36.1
MIME-Version: 1.0
X-QQ-SENDSIZE: 520
Feedback-ID: bizesmtp:rivai.ai:qybglogicsvr:qybglogicsvr7
X-Spam-Status: No, score=-10.5 required=5.0 tests=BAYES_00, GIT_PATCH_0,
 KAM_DMARC_STATUS, KAM_NUMSUBJECT, KAM_SHORT, RCVD_IN_BARRACUDACENTRAL,
 RCVD_IN_DNSWL_NONE, RCVD_IN_MSPIKE_H2, SPF_HELO_PASS, SPF_PASS, TXREP,
 T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on
 server2.sourceware.org
X-BeenThere: gcc-patches@gcc.gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Gcc-patches mailing list <gcc-patches.gcc.gnu.org>
List-Unsubscribe: <https://gcc.gnu.org/mailman/options/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=unsubscribe>
List-Archive: <https://gcc.gnu.org/pipermail/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-request@gcc.gnu.org?subject=help>
List-Subscribe: <https://gcc.gnu.org/mailman/listinfo/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=subscribe>
Errors-To: gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org
Sender: "Gcc-patches" <gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org>
X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?=
X-GMAIL-THRID: =?utf-8?q?1763771945153859728?=
X-GMAIL-MSGID: =?utf-8?q?1763772125893457307?=

From: Juzhe-Zhong <juzhe.zhong@rivai.ai>

        PR 108270


Fix issue: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108270.

Consider the following testcase:
void f (void * restrict in, void * restrict out, int l, int n, int m)
{
  for (int i = 0; i < l; i++){
    for (int j = 0; j < m; j++){
      for (int k = 0; k < n; k++)
        {
          vint8mf8_t v = __riscv_vle8_v_i8mf8 (in + i + j, 17);
          __riscv_vse8_v_i8mf8 (out + i + j, v, 17);
        }
    }
  }
}

Compile option: -O3

Before this patch:
	mv	a7,a2
	mv	a6,a0	
        mv	t1,a1
	mv	a2,a3
	vsetivli	zero,17,e8,mf8,ta,ma
...

After this patch:
        mv      a7,a2
        mv      a6,a0
        mv      t1,a1
        mv      a2,a3
        ble     a7,zero,.L1
        ble     a4,zero,.L1
        ble     a3,zero,.L1
        add     a1,a0,a4
        li      a0,0
        vsetivli        zero,17,e8,mf8,ta,ma
...

This issue is a missed optmization produced by Phase 3 global backward demand fusion instead of
LCM.

This patch is fixing poor placement of the vsetvl.

This point is seletected not because LCM but by Phase 3 (VL/VTYPE demand info backward fusion and propogation) which
is I introduced into VSETVL PASS to enhance LCM && improve vsetvl instruction performance.

This patch is to supress the Phase 3 too aggressive backward fusion and propagation to the top of the function program
when there is no define instruction of AVL (AVL is 0 ~ 31 imm since vsetivli instruction allows imm value instead of reg).

You may want to ask why we need Phase 3 to the job. 
Well, we have so many situations that pure LCM fails to optimize, here I can show you a simple case to demonstrate it:
void f (void * restrict in, void * restrict out, int n, int m, int cond)
{
  size_t vl = 101;
  for (size_t j = 0; j < m; j++){
    if (cond) {
      for (size_t i = 0; i < n; i++)
        {
          vint8mf8_t v = __riscv_vle8_v_i8mf8 (in + i + j, vl);
          __riscv_vse8_v_i8mf8 (out + i, v, vl);
        }
    } else {
      for (size_t i = 0; i < n; i++)
        {
          vint32mf2_t v = __riscv_vle32_v_i32mf2 (in + i + j, vl);
          v = __riscv_vadd_vv_i32mf2 (v,v,vl);
          __riscv_vse32_v_i32mf2 (out + i, v, vl);
        }
    }
  }
}

You can see:
The first inner loop needs vsetvli e8 mf8 for vle+vse.
The second inner loop need vsetvli e32 mf2 for vle+vadd+vse.

If we don't have Phase 3 (Only handled by LCM (Phase 4)), we will end up with :

outerloop:
...
vsetvli e8mf8
inner loop 1:
....

vsetvli e32mf2
inner loop 2:
....

However, if we have Phase 3, Phase 3 is going to fuse the vsetvli e32 mf2 of inner loop 2 into vsetvli e8 mf8, then we will end up with this result after phase 3:

outerloop:
...
inner loop 1:
vsetvli e32mf2
....

inner loop 2:
vsetvli e32mf2
....

Then, this demand information after phase 3 will be well optimized after phase 4 (LCM), after Phase 4 result is:

vsetvli e32mf2
outerloop:
...
inner loop 1:
....

inner loop 2:
....

You can see this is the optimal codegen after current VSETVL PASS (Phase 3: Demand backward fusion and propagation + Phase 4: LCM ). This is a known issue when I start to implement VSETVL PASS.

gcc/ChangeLog:

        * config/riscv/riscv-vsetvl.cc (vector_infos_manager::all_empty_predecessor_p): New function.
        (pass_vsetvl::backward_demand_fusion): Ditto.
        * config/riscv/riscv-vsetvl.h: Ditto.

gcc/testsuite/ChangeLog:

        * gcc.target/riscv/rvv/vsetvl/imm_bb_prop-1.c: Adapt testcase.
        * gcc.target/riscv/rvv/vsetvl/imm_conflict-3.c: Ditto.
        * gcc.target/riscv/rvv/vsetvl/pr108270.c: New test.
---
 gcc/config/riscv/riscv-vsetvl.cc              | 23 +++++++++++++++++++
 gcc/config/riscv/riscv-vsetvl.h               |  2 ++
 .../riscv/rvv/vsetvl/imm_bb_prop-1.c          |  2 +-
 .../riscv/rvv/vsetvl/imm_conflict-3.c         |  4 ++--
 .../gcc.target/riscv/rvv/vsetvl/pr108270.c    | 19 +++++++++++++++
 5 files changed, 47 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr108270.c

diff --git a/gcc/config/riscv/riscv-vsetvl.cc b/gcc/config/riscv/riscv-vsetvl.cc
index 5f424221659..167e3c6145c 100644
--- a/gcc/config/riscv/riscv-vsetvl.cc
+++ b/gcc/config/riscv/riscv-vsetvl.cc
@@ -2355,6 +2355,21 @@ vector_infos_manager::get_all_available_exprs (
   return available_list;
 }
 
+bool
+vector_infos_manager::all_empty_predecessor_p (const basic_block cfg_bb) const
+{
+  hash_set<basic_block> pred_cfg_bbs = get_all_predecessors (cfg_bb);
+  for (const basic_block pred_cfg_bb : pred_cfg_bbs)
+    {
+      const auto &pred_block_info = vector_block_infos[pred_cfg_bb->index];
+      if (!pred_block_info.local_dem.valid_or_dirty_p ()
+	  && !pred_block_info.reaching_out.valid_or_dirty_p ())
+	continue;
+      return false;
+    }
+  return true;
+}
+
 bool
 vector_infos_manager::all_same_ratio_p (sbitmap bitdata) const
 {
@@ -3138,6 +3153,14 @@ pass_vsetvl::backward_demand_fusion (void)
       if (!backward_propagate_worthwhile_p (cfg_bb, curr_block_info))
 	continue;
 
+      /* Fix PR108270:
+
+		bb 0 -> bb 1
+	 We don't need to backward fuse VL/VTYPE info from bb 1 to bb 0
+	 if bb 1 is not inside a loop and all predecessors of bb 0 are empty. */
+      if (m_vector_manager->all_empty_predecessor_p (cfg_bb))
+	continue;
+
       edge e;
       edge_iterator ei;
       /* Backward propagate to each predecessor.  */
diff --git a/gcc/config/riscv/riscv-vsetvl.h b/gcc/config/riscv/riscv-vsetvl.h
index 237381f7026..eec03d35071 100644
--- a/gcc/config/riscv/riscv-vsetvl.h
+++ b/gcc/config/riscv/riscv-vsetvl.h
@@ -450,6 +450,8 @@ public:
   /* Return true if all expression set in bitmap are same ratio.  */
   bool all_same_ratio_p (sbitmap) const;
 
+  bool all_empty_predecessor_p (const basic_block) const;
+
   void release (void);
   void create_bitmap_vectors (void);
   void free_bitmap_vectors (void);
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/imm_bb_prop-1.c b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/imm_bb_prop-1.c
index cd4ee7dd0d3..ed32a40f5e7 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/imm_bb_prop-1.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/imm_bb_prop-1.c
@@ -29,4 +29,4 @@ void f (int8_t * restrict in, int8_t * restrict out, int n, int cond)
   }
 }
 
-/* { dg-final { scan-assembler-times {vsetivli\s+zero,\s*5,\s*e8,\s*mf8,\s*tu,\s*m[au]} 1 { target { no-opts "-O0" no-opts "-g" no-opts "-funroll-loops" } } } } */
+/* { dg-final { scan-assembler-times {vsetivli\s+zero,\s*5,\s*e8,\s*mf8,\s*tu,\s*m[au]} 2 { target { no-opts "-O0" no-opts "-g" no-opts "-funroll-loops" } } } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/imm_conflict-3.c b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/imm_conflict-3.c
index 1f7c0f036a2..2fa29c01dbc 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/imm_conflict-3.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/imm_conflict-3.c
@@ -20,7 +20,7 @@ void f (int8_t * restrict in, int8_t * restrict out, int n, int cond)
   }
 }
 
-/* { dg-final { scan-assembler-times {vsetivli\s+zero,\s*5,\s*e8,\s*mf8,\s*tu,\s*m[au]} 1 { target { no-opts "-O0" no-opts "-g" no-opts "-funroll-loops" } } } } */
+/* { dg-final { scan-assembler-times {vsetivli\s+zero,\s*5,\s*e8,\s*mf8,\s*t[au],\s*m[au]} 2 { target { no-opts "-O0" no-opts "-g" no-opts "-funroll-loops" } } } } */
 /* { dg-final { scan-assembler-times {vsetvli\s+[a-x0-9]+,\s*zero,\s*e8,\s*mf8,\s*t[au],\s*m[au]} 1 { target { no-opts "-O0"  no-opts "-funroll-loops" no-opts "-g" } } } } */
-/* { dg-final { scan-assembler-times {vsetivli} 1 { target { no-opts "-O0"  no-opts "-funroll-loops" no-opts "-g" } } } } */
+/* { dg-final { scan-assembler-times {vsetivli} 2 { target { no-opts "-O0"  no-opts "-funroll-loops" no-opts "-g" } } } } */
 /* { dg-final { scan-assembler-times {vsetvli} 1 { target { no-opts "-O0"  no-opts "-funroll-loops" no-opts "-g" } } } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr108270.c b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr108270.c
new file mode 100644
index 00000000000..d2ae43bf263
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr108270.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv32gcv -mabi=ilp32 -fno-tree-vectorize -fno-schedule-insns -fno-schedule-insns2" } */
+
+#include "riscv_vector.h"
+
+void f (void * restrict in, void * restrict out, int l, int n, int m)
+{
+  for (int i = 0; i < l; i++){
+    for (int j = 0; j < m; j++){
+      for (int k = 0; k < n; k++)
+        {
+          vint8mf8_t v = __riscv_vle8_v_i8mf8 (in + i + j, 17);
+          __riscv_vse8_v_i8mf8 (out + i + j, v, 17);
+        }
+    }
+  }
+}
+
+/* { dg-final { scan-assembler-not {mv\s+[a-x0-9]+,[a-x0-9]+\s+mv\s+[a-x0-9]+,[a-x0-9]+\s+mv\s+[a-x0-9]+,[a-x0-9]+\s+mv\s+[a-x0-9]+,[a-x0-9]+\s+mv\s+[a-x0-9]+,[a-x0-9]+\s+vsetivli} } } */