From patchwork Mon Jan 29 11:32:02 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "juzhe.zhong@rivai.ai"
X-Patchwork-Id: 193406
From: Juzhe-Zhong
To: gcc-patches@gcc.gnu.org
Cc: kito.cheng@gmail.com, kito.cheng@sifive.com, jeffreyalaw@gmail.com, rdapp.gcc@gmail.com, Juzhe-Zhong
Subject: [PATCH] RISC-V: Fix VSETLV PASS compile-time issue
Date: Mon, 29 Jan 2024 19:32:02 +0800
Message-Id: <20240129113202.277104-1-juzhe.zhong@rivai.ai>
X-Mailer: git-send-email 2.36.1
List-Id: Gcc-patches mailing list

The compile-time issue was discovered in SPEC 2017 wrf.

Use time and -ftime-report to analyze the profile data of the SPEC 2017 wrf
compilation.

Before this patch (Lazy vsetvl):

scheduling                         : 121.89 ( 15%)   0.53 ( 11%) 122.72 ( 15%)    13M (  1%)
machine dep reorg                  : 424.61 ( 53%)   1.84 ( 37%) 427.44 ( 53%)  5290k (  0%)
real    13m27.074s
user    13m19.539s
sys     0m5.180s

Simple vsetvl:

machine dep reorg                  :   0.10 (  0%)   0.00 (  0%)   0.11 (  0%)  4138k (  0%)
real    6m5.780s
user    6m2.396s
sys     0m2.373s

The machine dep reorg entry is the compile time of the VSETVL pass
(424 seconds). It accounts for 53% of the compilation time, far more than
scheduling.

After investigation, the critical path of the VSETVL pass is
compute_lcm_local_properties, which is called in every iteration of phase 2
(earliest fusion) and phase 3 (global lcm).

This patch optimizes compute_lcm_local_properties to reduce the compilation
time.

After this patch:

scheduling                         : 117.51 ( 27%)   0.21 (  6%) 118.04 ( 27%)    13M (  1%)
machine dep reorg                  :  80.13 ( 18%)   0.91 ( 26%)  81.26 ( 18%)  5290k (  0%)
real    7m25.374s
user    7m20.116s
sys     0m3.795s

The improvement is clear: the lazy VSETVL pass drops from 424s (53%) to
80s (18%), which is now less time than scheduling.

Tested on both RV32 and RV64 with no regressions.

Ok for trunk?

	PR target/113495

gcc/ChangeLog:

	* config/riscv/riscv-vsetvl.cc (extract_single_source): Remove.
	(pre_vsetvl::compute_vsetvl_def_data): Fix compile time issue.
	(pre_vsetvl::compute_transparent): New function.
	(pre_vsetvl::compute_lcm_local_properties): Fix compile time issue.

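For readers less familiar with the bitmap idiom, the main change can be
sketched roughly as follows. This is a standalone illustration and not code
from the patch: std::vector<bool> stands in for GCC's sbitmap, and the names
expr, killed_by_scan, killed_by_bit_test and shapes_agree are hypothetical.
The old code walked every set bit of the per-block register-definition bitmap
to look for one register number; the new code tests that one bit directly.

```cpp
// Sketch only: std::vector<bool> stands in for GCC's sbitmap, and all
// names below are hypothetical, not taken from riscv-vsetvl.cc.
#include <cstddef>
#include <vector>

struct expr
{
  bool has_reg_avl;   // Stand-in for has_nonvlmax_reg_avl ().
  unsigned avl_regno; // Stand-in for REGNO (get_avl ()).
};

// Old shape: visit every register defined in the block and compare it
// against the AVL register of the expression.
static bool
killed_by_scan (const std::vector<bool> &reg_def_loc, const expr &e)
{
  if (!e.has_reg_avl)
    return false;
  for (std::size_t regno = 0; regno < reg_def_loc.size (); ++regno)
    if (reg_def_loc[regno] && regno == e.avl_regno)
      return true;
  return false;
}

// New shape: test the single bit of interest, with no iteration.
static bool
killed_by_bit_test (const std::vector<bool> &reg_def_loc, const expr &e)
{
  return e.has_reg_avl
         && e.avl_regno < reg_def_loc.size ()
         && reg_def_loc[e.avl_regno];
}

// Check that both shapes give the same answer on a small example.
static bool
shapes_agree ()
{
  std::vector<bool> reg_def_loc (64, false);
  reg_def_loc[10] = true;
  reg_def_loc[15] = true;

  const expr exprs[] = {{true, 10}, {true, 11}, {false, 15}};
  for (const expr &e : exprs)
    if (killed_by_scan (reg_def_loc, e) != killed_by_bit_test (reg_def_loc, e))
      return false;
  return true;
}
```

Both functions return the same result for any input; the second simply avoids
a walk over the bitmap for every (block, expression) pair, which is where the
cost adds up on functions with a very large number of blocks.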
---
 gcc/config/riscv/riscv-vsetvl.cc | 184 ++++++++++---------------------
 1 file changed, 60 insertions(+), 124 deletions(-)

diff --git a/gcc/config/riscv/riscv-vsetvl.cc b/gcc/config/riscv/riscv-vsetvl.cc
index d7b40a5c813..cec862329c5 100644
--- a/gcc/config/riscv/riscv-vsetvl.cc
+++ b/gcc/config/riscv/riscv-vsetvl.cc
@@ -599,14 +599,6 @@ extract_single_source (set_info *set)
   return first_insn;
 }
 
-static insn_info *
-extract_single_source (def_info *def)
-{
-  if (!def)
-    return nullptr;
-  return extract_single_source (dyn_cast<set_info *> (def));
-}
-
 static bool
 same_equiv_note_p (set_info *set1, set_info *set2)
 {
@@ -2374,6 +2366,7 @@ public:
   }
 
   void compute_vsetvl_def_data ();
+  void compute_transparent (const bb_info *);
   void compute_lcm_local_properties ();
 
   void fuse_local_vsetvl_info ();
@@ -2452,20 +2445,16 @@ pre_vsetvl::compute_vsetvl_def_data ()
         {
           for (unsigned i = 0; i < m_vsetvl_def_exprs.length (); i += 1)
             {
-              const vsetvl_info &info = *m_vsetvl_def_exprs[i];
-              if (!info.has_nonvlmax_reg_avl ())
-                continue;
-              unsigned int regno;
-              sbitmap_iterator sbi;
-              EXECUTE_IF_SET_IN_BITMAP (m_reg_def_loc[bb->index ()], 0, regno,
-                                        sbi)
-                if (regno == REGNO (info.get_avl ()))
-                  {
-                    bitmap_set_bit (m_kill[bb->index ()], i);
-                    bitmap_set_bit (def_loc[bb->index ()],
-                                    get_expr_index (m_vsetvl_def_exprs,
-                                                    m_unknow_info));
-                  }
+              auto *info = m_vsetvl_def_exprs[i];
+              if (info->has_nonvlmax_reg_avl ()
+                  && bitmap_bit_p (m_reg_def_loc[bb->index ()],
+                                   REGNO (info->get_avl ())))
+                {
+                  bitmap_set_bit (m_kill[bb->index ()], i);
+                  bitmap_set_bit (def_loc[bb->index ()],
+                                  get_expr_index (m_vsetvl_def_exprs,
+                                                  m_unknow_info));
+                }
             }
           continue;
         }
@@ -2516,6 +2505,36 @@ pre_vsetvl::compute_vsetvl_def_data ()
   sbitmap_vector_free (m_kill);
 }
 
+/* Subroutine of compute_lcm_local_properties which Compute local transparent
+   BB. Note that the compile time is very sensitive to compute_transparent and
+   compute_lcm_local_properties, any change of these 2 functions should be
+   aware of the compile time changing of the program which has a large number of
+   blocks, e.g SPEC 2017 wrf.
+
+   Current compile time profile of SPEC 2017 wrf:
+
+     1. scheduling - 27%
+     2. machine dep reorg (VSETVL PASS) - 18%
+
+   VSETVL pass should not spend more time than scheduling in compilation.  */
+void
+pre_vsetvl::compute_transparent (const bb_info *bb)
+{
+  int num_exprs = m_exprs.length ();
+  unsigned bb_index = bb->index ();
+  for (int i = 0; i < num_exprs; i++)
+    {
+      auto *info = m_exprs[i];
+      if (info->has_nonvlmax_reg_avl ()
+          && bitmap_bit_p (m_reg_def_loc[bb_index], REGNO (info->get_avl ())))
+        bitmap_clear_bit (m_transp[bb_index], i);
+      else if (info->has_vl ()
+               && bitmap_bit_p (m_reg_def_loc[bb_index],
+                                REGNO (info->get_vl ())))
+        bitmap_clear_bit (m_transp[bb_index], i);
+    }
+}
+
 /* Compute the local properties of each recorded expression.
 
    Local properties are those that are defined by the block, irrespective of
@@ -2572,7 +2591,7 @@ pre_vsetvl::compute_lcm_local_properties ()
 
   bitmap_vector_clear (m_avloc, last_basic_block_for_fn (cfun));
   bitmap_vector_clear (m_antloc, last_basic_block_for_fn (cfun));
-  bitmap_vector_clear (m_transp, last_basic_block_for_fn (cfun));
+  bitmap_vector_ones (m_transp, last_basic_block_for_fn (cfun));
 
   /* - If T is locally available at the end of a block, then T' must be
        available at the end of the same block.  Since some optimization has
@@ -2598,117 +2617,34 @@ pre_vsetvl::compute_lcm_local_properties ()
 
       /* Compute m_transp  */
       if (block_info.empty_p ())
+        compute_transparent (bb);
+      else
         {
-          bitmap_ones (m_transp[bb_index]);
-          for (int i = 0; i < num_exprs; i += 1)
-            {
-              const vsetvl_info &info = *m_exprs[i];
-              if (!info.has_nonvlmax_reg_avl () && !info.has_vl ())
-                continue;
-
-              if (info.has_nonvlmax_reg_avl ())
-                {
-                  unsigned int regno;
-                  sbitmap_iterator sbi;
-                  EXECUTE_IF_SET_IN_BITMAP (m_reg_def_loc[bb->index ()], 0,
-                                            regno, sbi)
-                    {
-                      if (regno == REGNO (info.get_avl ()))
-                        bitmap_clear_bit (m_transp[bb->index ()], i);
-                    }
-                }
-
-              for (insn_info *insn : bb->real_nondebug_insns ())
-                {
-                  if (info.has_nonvlmax_reg_avl ()
-                      && find_access (insn->defs (), REGNO (info.get_avl ())))
-                    {
-                      bitmap_clear_bit (m_transp[bb_index], i);
-                      break;
-                    }
-
-                  if (info.has_vl ()
-                      && reg_mentioned_p (info.get_vl (), insn->rtl ()))
-                    {
-                      if (find_access (insn->defs (), REGNO (info.get_vl ())))
-                        /* We can't fuse vsetvl into the blocks that modify the
-                           VL operand since successors of such blocks will need
-                           the value of those blocks are defining.
-
-                                    bb 4: def a5
-                                    /   \
-                            bb 5:use a5  bb 6:vsetvl a5, 5
-
-                           The example above shows that we can't fuse vsetvl
-                           from bb 6 into bb 4 since the successor bb 5 is using
-                           the value defined in bb 4.  */
-                        ;
-                      else
-                        {
-                          /* We can't fuse vsetvl into the blocks that use the
-                             VL operand which has different value from the
-                             vsetvl info.
-
-                                bb 4: def a5
-                                  |
-                                bb 5: use a5
-                                  |
-                                bb 6: def a5
-                                  |
-                                bb 7: use a5
-
-                             The example above shows that we can't fuse vsetvl
-                             from bb 6 into bb 5 since their value is different.
-                           */
-                          resource_info resource
-                            = full_register (REGNO (info.get_vl ()));
-                          def_lookup dl = crtl->ssa->find_def (resource, insn);
-                          def_info *def
-                            = dl.matching_set_or_last_def_of_prev_group ();
-                          insn_info *def_insn = extract_single_source (def);
-                          if (def_insn && vsetvl_insn_p (def_insn->rtl ()))
-                            {
-                              vsetvl_info def_info = vsetvl_info (def_insn);
-                              if (m_dem.compatible_p (def_info, info))
-                                continue;
-                            }
-                        }
+          bitmap_clear (m_transp[bb_index]);
+          vsetvl_info &header_info = block_info.get_entry_info ();
+          vsetvl_info &footer_info = block_info.get_exit_info ();
 
-
-                      bitmap_clear_bit (m_transp[bb_index], i);
-                      break;
-                    }
-                }
-            }
+          if (header_info.valid_p () && anticipated_exp_p (header_info))
+            bitmap_set_bit (m_antloc[bb_index],
+                            get_expr_index (m_exprs, header_info));
 
-          continue;
+          if (footer_info.valid_p ())
+            for (int i = 0; i < num_exprs; i += 1)
+              {
+                const vsetvl_info &info = *m_exprs[i];
+                if (!info.valid_p ())
+                  continue;
+                if (available_exp_p (footer_info, info))
+                  bitmap_set_bit (m_avloc[bb_index], i);
+              }
         }
 
-      vsetvl_info &header_info = block_info.get_entry_info ();
-      vsetvl_info &footer_info = block_info.get_exit_info ();
-
-      if (header_info.valid_p () && anticipated_exp_p (header_info))
-        bitmap_set_bit (m_antloc[bb_index],
-                        get_expr_index (m_exprs, header_info));
-
-      if (footer_info.valid_p ())
-        for (int i = 0; i < num_exprs; i += 1)
-          {
-            const vsetvl_info &info = *m_exprs[i];
-            if (!info.valid_p ())
-              continue;
-            if (available_exp_p (footer_info, info))
-              bitmap_set_bit (m_avloc[bb_index], i);
-          }
-    }
-
-  for (const bb_info *bb : crtl->ssa->bbs ())
-    {
-      unsigned bb_index = bb->index ();
       if (invalid_opt_bb_p (bb->cfg_bb ()))
         {
           bitmap_clear (m_antloc[bb_index]);
           bitmap_clear (m_transp[bb_index]);
         }
+
       /* Compute ae_kill for each basic block using:
 
          ~(TRANSP | COMP)