From patchwork Mon Oct 16 23:20:38 2023
From: Juzhe-Zhong
To: gcc-patches@gcc.gnu.org
Cc: kito.cheng@gmail.com, kito.cheng@sifive.com, jeffreyalaw@gmail.com, rdapp.gcc@gmail.com, Juzhe-Zhong
Subject: [PATCH V4] RISC-V: Fix unexpected big LMUL choosing in dynamic LMUL model for non-adjacent load/store
Date: Tue, 17 Oct 2023 07:20:38 +0800
Message-Id: <20231016232038.353641-1-juzhe.zhong@rivai.ai>

Consider the following case:

int
bar (int *x, int a, int b, int n)
{
  x = __builtin_assume_aligned (x, __BIGGEST_ALIGNMENT__);
  int sum1 = 0;
  int sum2 = 0;
  for (int i = 0; i < n; ++i)
    {
      sum1 += x[2*i] - a;
      sum1 += x[2*i+1] * b;
      sum2 += x[2*i] - b;
      sum2 += x[2*i+1] * a;
    }
  return sum1 + sum2;
}

Before this patch:

bar:
        ble a3,zero,.L5
        csrr t0,vlenb
        csrr a6,vlenb
        slli t1,t0,3
        vsetvli a5,zero,e32,m4,ta,ma
        sub sp,sp,t1
        vid.v v20
        vmv.v.x v12,a1
        vand.vi v4,v20,1
        vmv.v.x v16,a2
        vmseq.vi v4,v4,1
        slli t3,a6,2
        vsetvli zero,a5,e32,m4,ta,ma
        vmv1r.v v0,v4
        viota.m v8,v4
        add a7,t3,sp
        vsetvli a5,zero,e32,m4,ta,mu
        vand.vi v28,v20,-2
        vadd.vi v4,v28,1
        vs4r.v v20,0(a7)        ----- spill
        vrgather.vv v24,v12,v8
        vrgather.vv v20,v16,v8
        vrgather.vv v24,v16,v8,v0.t
        vrgather.vv v20,v12,v8,v0.t
        vs4r.v v4,0(sp)         ----- spill
        slli a3,a3,1
        addi t4,a6,-1
        neg t1,a6
        vmv4r.v v0,v20
        vmv.v.i v4,0
        j .L4
.L13:
        vsetvli a5,zero,e32,m4,ta,ma
.L4:
        mv a7,a3
        mv a4,a3
        bleu a3,a6,.L3
        csrr a4,vlenb
.L3:
        vmv.v.x v8,t4
        vl4re32.v v12,0(sp)     ---- spill
        vand.vv v20,v28,v8
        vand.vv v8,v12,v8
        vsetvli zero,a4,e32,m4,ta,ma
        vle32.v v16,0(a0)
        vsetvli a5,zero,e32,m4,ta,ma
        add a3,a3,t1
        vrgather.vv v12,v16,v20
        add a0,a0,t3
        vrgather.vv v20,v16,v8
        vsub.vv v12,v12,v0
        vsetvli zero,a4,e32,m4,tu,ma
        vadd.vv v4,v4,v12
        vmacc.vv v4,v24,v20
        bgtu a7,a6,.L13
        csrr a1,vlenb
        slli a1,a1,2
        add a1,a1,sp
        li a4,-1
        csrr t0,vlenb
        vsetvli a5,zero,e32,m4,ta,ma
        vl4re32.v v12,0(a1)     ---- spill
        vmv.v.i v8,0
        vmul.vx v0,v12,a4
        li a2,0
        slli t1,t0,3
        vadd.vi v0,v0,-1
        vand.vi v0,v0,1
        vmseq.vv v0,v0,v8
        vand.vi v12,v12,1
        vmerge.vvm v16,v8,v4,v0
        vmseq.vv v12,v12,v8
        vmv.s.x v1,a2
        vmv1r.v v0,v12
        vredsum.vs v16,v16,v1
        vmerge.vvm v8,v8,v4,v0
        vmv.x.s a0,v16
        vredsum.vs v8,v8,v1
        vmv.x.s a5,v8
        add sp,sp,t1
        addw a0,a0,a5
        jr ra
.L5:
        li a0,0
        ret

We can see there are multiple horrible register spills.
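The spills come from register pressure: at LMUL = 4 every live vector value occupies a group of four of the 32 RVV vector registers. As a rough, hypothetical model of the decision the dynamic LMUL cost model has to make (not GCC's code; the helper names and live counts below are illustrative, and it ignores that v0 is usually needed for masks), the choice can be sketched as picking the largest LMUL for which all live register groups still fit:

```c
#include <assert.h>

/* RVV defines 32 architectural vector registers (v0-v31).  */
#define NUM_VECTOR_REGS 32u

/* Hypothetical helper: nonzero if MAX_LIVE simultaneously live vector
   values, each occupying a group of LMUL registers, fit without spilling.  */
static int
lmul_fits_p (unsigned max_live, unsigned lmul)
{
  assert (lmul == 1 || lmul == 2 || lmul == 4 || lmul == 8);
  return max_live * lmul <= NUM_VECTOR_REGS;
}

/* Hypothetical helper: try LMUL = 8, 4, 2 in turn and return the first
   (largest) one whose register groups all fit; otherwise use LMUL = 1.  */
static unsigned
pick_lmul (unsigned max_live)
{
  for (unsigned lmul = 8; lmul > 1; lmul /= 2)
    if (lmul_fits_p (max_live, lmul))
      return lmul;
  return 1;
}
```

For example, with a hypothetical peak of 8 live values, pick_lmul (8) returns 4 (8 * 4 = 32 registers). If one more, previously uncounted, vector is live, pick_lmul (9) returns 2, because 9 * 4 = 36 exceeds the register file while 9 * 2 = 18 does not. Undercounting a single live value is therefore enough to flip the choice.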
The root cause of this issue is that, for a scalar IR load such as:

  _5 = *_4;

we did not check whether it is a contiguous load/store or a gather/scatter load/store. A non-contiguous access will be translated into one of:

  1. MASK_LEN_GATHER_LOAD (..., perm indices).
  2. Contiguous load/store + VEC_PERM (..., perm indices).

In either case, we end up consuming one extra vector register group (for the permutation indices) that we did not count before. As a result, for this case we picked LMUL = 4, which is an incorrect choice for the dynamic LMUL cost model.

The key change in this patch is:

  if ((type == load_vec_info_type || type == store_vec_info_type)
      && !adjacent_dr_p (STMT_VINFO_DATA_REF (stmt_info)))
    {
      ...
    }

That is, we count one more live vector register group when the access is not an adjacent load/store.

After this patch, it picks LMUL = 2, which is optimal:

bar:
        ble a3,zero,.L4
        csrr a6,vlenb
        vsetvli a5,zero,e32,m2,ta,ma
        vmv.v.x v6,a2
        srli a2,a6,1
        vmv.v.x v4,a1
        vid.v v12
        slli a3,a3,1
        vand.vi v0,v12,1
        addi t1,a2,-1
        vmseq.vi v0,v0,1
        slli a6,a6,1
        vsetvli zero,a5,e32,m2,ta,ma
        neg a7,a2
        viota.m v2,v0
        vsetvli a5,zero,e32,m2,ta,mu
        vrgather.vv v16,v4,v2
        vrgather.vv v14,v6,v2
        vrgather.vv v16,v6,v2,v0.t
        vrgather.vv v14,v4,v2,v0.t
        vand.vi v18,v12,-2
        vmv.v.i v2,0
        vadd.vi v20,v18,1
.L3:
        minu a4,a3,a2
        vsetvli zero,a4,e32,m2,ta,ma
        vle32.v v8,0(a0)
        vsetvli a5,zero,e32,m2,ta,ma
        vmv.v.x v4,t1
        vand.vv v10,v18,v4
        vrgather.vv v6,v8,v10
        vsub.vv v6,v6,v14
        vsetvli zero,a4,e32,m2,tu,ma
        vadd.vv v2,v2,v6
        vsetvli a1,zero,e32,m2,ta,ma
        vand.vv v4,v20,v4
        vrgather.vv v6,v8,v4
        vsetvli zero,a4,e32,m2,tu,ma
        mv a4,a3
        add a0,a0,a6
        add a3,a3,a7
        vmacc.vv v2,v16,v6
        bgtu a4,a2,.L3
        vsetvli a1,zero,e32,m2,ta,ma
        vand.vi v0,v12,1
        vmv.v.i v4,0
        li a3,-1
        vmseq.vv v0,v0,v4
        vmv.s.x v1,zero
        vmerge.vvm v6,v4,v2,v0
        vredsum.vs v6,v6,v1
        vmul.vx v0,v12,a3
        vadd.vi v0,v0,-1
        vand.vi v0,v0,1
        vmv.x.s a4,v6
        vmseq.vv v0,v0,v4
        vmv.s.x v1,zero
        vmerge.vvm v4,v4,v2,v0
        vredsum.vs v4,v4,v1
        vmv.x.s a0,v4
        addw a0,a0,a4
        ret
.L4:
        li a0,0
        ret

No spills.
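The adjacency test and the extra register it implies can be modelled in plain C. This sketch assumes that adjacent_dr_p essentially checks that the data reference has a constant step whose absolute value equals the size of the accessed element; that is a simplification, since the real function works on GCC trees. The struct and function names are hypothetical. A stride-2 access such as x[2*i] on a 4-byte int has an 8-byte step, so it is not adjacent and, following the idea of this patch, adds one live vector for its permutation indices:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-in for a vectorized data reference.  */
struct simple_dr
{
  bool is_load_or_store; /* Statement is vectorized as a load or store.  */
  bool step_is_const;    /* Step between iterations is a known constant.  */
  long step_bytes;       /* Byte distance between consecutive iterations.  */
  long elem_bytes;       /* Size in bytes of one accessed element.  */
};

/* Simplified adjacency test: contiguous when |step| == element size,
   which covers both forward and backward unit-stride accesses.  */
static bool
simple_adjacent_p (const struct simple_dr *dr)
{
  if (!dr->step_is_const)
    return false;
  return labs (dr->step_bytes) == dr->elem_bytes;
}

/* Each non-contiguous load/store adds one live vector (its permutation
   indices) on top of BASE_LIVE, mirroring the accounting in the patch.  */
static unsigned
live_with_perm_indices (unsigned base_live,
                        const struct simple_dr *drs, size_t n)
{
  unsigned live = base_live;
  for (size_t i = 0; i < n; i++)
    if (drs[i].is_load_or_store && !simple_adjacent_p (&drs[i]))
      live++;
  return live;
}
```

For the loop in bar, x[2*i] and x[2*i+1] are both stride-2 int accesses (step 8, element size 4), so under this model each such load statement is counted as needing an index vector, while a plain x[i] or x[n-i] access is not.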
gcc/ChangeLog:

        * config/riscv/riscv-vector-costs.cc (max_number_of_live_regs): Fix big LMUL issue.
        (get_store_value): New function.

gcc/testsuite/ChangeLog:

        * gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul2-7.c: New test.

---
 gcc/config/riscv/riscv-vector-costs.cc        | 93 +++++++++++++++++--
 .../costmodel/riscv/rvv/dynamic-lmul2-7.c     | 25 +++++
 2 files changed, 109 insertions(+), 9 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul2-7.c

diff --git a/gcc/config/riscv/riscv-vector-costs.cc b/gcc/config/riscv/riscv-vector-costs.cc
index 0b890396535..33061efb1d0 100644
--- a/gcc/config/riscv/riscv-vector-costs.cc
+++ b/gcc/config/riscv/riscv-vector-costs.cc
@@ -40,6 +40,7 @@ along with GCC; see the file COPYING3. If not see
 #include "bitmap.h"
 #include "ssa.h"
 #include "backend.h"
+#include "tree-data-ref.h"
 
 /* This file should be included last. */
 #include "riscv-vector-costs.h"
@@ -135,8 +136,9 @@ compute_local_program_points (
                 || is_gimple_call (gsi_stmt (si))))
             continue;
           stmt_vec_info stmt_info = vinfo->lookup_stmt (gsi_stmt (si));
-          if (STMT_VINFO_TYPE (vect_stmt_to_vectorize (stmt_info))
-              != undef_vec_info_type)
+          enum stmt_vec_info_type type
+            = STMT_VINFO_TYPE (vect_stmt_to_vectorize (stmt_info));
+          if (type != undef_vec_info_type)
             {
               stmt_point info = {point, gsi_stmt (si)};
               program_points.safe_push (info);
@@ -289,9 +291,7 @@ max_number_of_live_regs (const basic_block bb,
   unsigned int i;
   unsigned int live_point = 0;
   auto_vec live_vars_vec;
-  live_vars_vec.safe_grow (max_point + 1, true);
-  for (i = 0; i < live_vars_vec.length (); ++i)
-    live_vars_vec[i] = 0;
+  live_vars_vec.safe_grow_cleared (max_point + 1, true);
   for (hash_map::iterator iter = live_ranges.begin ();
        iter != live_ranges.end (); ++iter)
     {
@@ -360,6 +360,31 @@ get_current_lmul (class loop *loop)
   return loop_autovec_infos.get (loop)->current_lmul;
 }
 
+/* Get STORE value. */
+static tree
+get_store_value (gimple *stmt)
+{
+  if (is_gimple_call (stmt) && gimple_call_internal_p (stmt))
+    {
+      if (gimple_call_internal_fn (stmt) == IFN_MASK_STORE)
+        return gimple_call_arg (stmt, 3);
+      else
+        gcc_unreachable ();
+    }
+  else
+    return gimple_assign_rhs1 (stmt);
+}
+
+/* Return true if it is non-contiguous load/store. */
+static bool
+non_contiguous_memory_access_p (stmt_vec_info stmt_info)
+{
+  enum stmt_vec_info_type type
+    = STMT_VINFO_TYPE (vect_stmt_to_vectorize (stmt_info));
+  return ((type == load_vec_info_type || type == store_vec_info_type)
+          && !adjacent_dr_p (STMT_VINFO_DATA_REF (stmt_info)));
+}
+
 /* Update the live ranges according PHI.
 
    Loop:
@@ -395,13 +420,15 @@ update_local_live_ranges (
   unsigned int nbbs = loop->num_nodes;
   unsigned int i, j;
   gphi_iterator psi;
+  gimple_stmt_iterator si;
   for (i = 0; i < nbbs; i++)
     {
       basic_block bb = bbs[i];
       if (dump_enabled_p ())
         dump_printf_loc (MSG_NOTE, vect_location,
-                         "Update local program points for bb %d:\n", bb->index);
-      for (psi = gsi_start_phis (bbs[i]); !gsi_end_p (psi); gsi_next (&psi))
+                         "Update local program points for bb %d:\n",
+                         bbs[i]->index);
+      for (psi = gsi_start_phis (bb); !gsi_end_p (psi); gsi_next (&psi))
         {
           gphi *phi = psi.phi ();
           stmt_vec_info stmt_info = vinfo->lookup_stmt (phi);
@@ -413,12 +440,23 @@ update_local_live_ranges (
             {
               edge e = gimple_phi_arg_edge (phi, j);
               tree def = gimple_phi_arg_def (phi, j);
-              auto *live_ranges = live_ranges_per_bb.get (e->src);
+              auto *live_ranges = live_ranges_per_bb.get (bb);
+              auto *live_range = live_ranges->get (def);
+              if (live_range && flow_bb_inside_loop_p (loop, e->src))
+                {
+                  unsigned int start = (*live_range).first;
+                  (*live_range).first = 0;
+                  if (dump_enabled_p ())
+                    dump_printf_loc (MSG_NOTE, vect_location,
+                                     "Update %T start point from %d to %d:\n",
+                                     def, start, (*live_range).first);
+                }
+              live_ranges = live_ranges_per_bb.get (e->src);
               if (!program_points_per_bb.get (e->src))
                 continue;
               unsigned int max_point
                 = (*program_points_per_bb.get (e->src)).length () - 1;
-              auto *live_range = live_ranges->get (def);
+              live_range = live_ranges->get (def);
               if (!live_range)
                 continue;
 
@@ -430,6 +468,43 @@ update_local_live_ranges (
                                end, (*live_range).second);
             }
         }
+      for (si = gsi_start_bb (bb); !gsi_end_p (si); gsi_next (&si))
+        {
+          if (!(is_gimple_assign (gsi_stmt (si))
+                || is_gimple_call (gsi_stmt (si))))
+            continue;
+          stmt_vec_info stmt_info = vinfo->lookup_stmt (gsi_stmt (si));
+          enum stmt_vec_info_type type
+            = STMT_VINFO_TYPE (vect_stmt_to_vectorize (stmt_info));
+          if (non_contiguous_memory_access_p (stmt_info))
+            {
+              /* For non-adjacent load/store STMT, we will potentially
+                 convert it into:
+
+                   1. MASK_LEN_GATHER_LOAD (..., perm indice).
+                   2. Continguous load/store + VEC_PERM (..., perm indice)
+
+                 We will be likely using one more vector variable. */
+              unsigned int max_point
+                = (*program_points_per_bb.get (bb)).length () - 1;
+              auto *live_ranges = live_ranges_per_bb.get (bb);
+              bool existed_p = false;
+              tree var = type == load_vec_info_type
+                           ? gimple_get_lhs (gsi_stmt (si))
+                           : get_store_value (gsi_stmt (si));
+              tree sel_type = build_nonstandard_integer_type (
+                TYPE_PRECISION (TREE_TYPE (var)), 1);
+              tree sel = build_decl (UNKNOWN_LOCATION, VAR_DECL,
+                                     get_identifier ("vect_perm"), sel_type);
+              pair &live_range = live_ranges->get_or_insert (sel, &existed_p);
+              gcc_assert (!existed_p);
+              live_range = pair (0, max_point);
+              if (dump_enabled_p ())
+                dump_printf_loc (MSG_NOTE, vect_location,
+                                 "Add perm indice %T, start = 0, end = %d\n",
+                                 sel, max_point);
+            }
+        }
     }
 }
 
diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul2-7.c b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul2-7.c
new file mode 100644
index 00000000000..3dfc6f16a25
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul2-7.c
@@ -0,0 +1,25 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64d -fdump-tree-vect-details" } */
+
+int
+bar (int *x, int a, int b, int n)
+{
+  x = __builtin_assume_aligned (x, __BIGGEST_ALIGNMENT__);
+  int sum1 = 0;
+  int sum2 = 0;
+  for (int i = 0; i < n; ++i)
+    {
+      sum1 += x[2*i] - a;
+      sum1 += x[2*i+1] * b;
+      sum2 += x[2*i] - b;
+      sum2 += x[2*i+1] * a;
+    }
+  return sum1 + sum2;
+}
+
+/* { dg-final { scan-assembler {e32,m2} } } */
+/* { dg-final { scan-assembler-times {csrr} 1 } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 8" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 4" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 2" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-not "Maximum lmul = 1" "vect" } } */
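The live-range bookkeeping touched by this diff can be pictured as a simple sweep. Every variable has an inclusive [start, end] range of program points, per-point counts start at zero (the role played by safe_grow_cleared in max_number_of_live_regs), each range increments the points it covers, and the peak count is the register pressure. The following is a hypothetical plain-C model, not GCC's implementation; struct live_range and max_live are illustrative names. It shows why giving the "vect_perm" variable the range [0, max_point] raises the peak by exactly one:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical inclusive live range over program points.  */
struct live_range
{
  unsigned start;
  unsigned end;
};

/* Return the maximum number of simultaneously live variables over
   program points 0..MAX_POINT, or 0 if allocation fails.  */
static unsigned
max_live (const struct live_range *ranges, size_t n, unsigned max_point)
{
  /* calloc zero-initializes the counters, like safe_grow_cleared.  */
  unsigned *live = calloc ((size_t) max_point + 1, sizeof *live);
  if (!live)
    return 0;
  for (size_t i = 0; i < n; i++)
    {
      assert (ranges[i].start <= ranges[i].end && ranges[i].end <= max_point);
      for (unsigned p = ranges[i].start; p <= ranges[i].end; p++)
        live[p]++;
    }
  unsigned peak = 0;
  for (unsigned p = 0; p <= max_point; p++)
    if (live[p] > peak)
      peak = live[p];
  free (live);
  return peak;
}
```

With ranges {0, 2}, {1, 3} and {2, 4} over points 0..4, the peak is 3 (all three overlap at point 2). Appending a {0, 4} range that spans the whole block, as the patch does for each permutation-index variable, raises the peak to 4; once the peak crosses a register-budget threshold, the dynamic LMUL model has to choose a smaller LMUL.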