From patchwork Mon Nov 6 03:34:26 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "juzhe.zhong@rivai.ai"
X-Patchwork-Id: 161820
From: Juzhe-Zhong
To: gcc-patches@gcc.gnu.org
Cc: kito.cheng@gmail.com, kito.cheng@sifive.com, jeffreyalaw@gmail.com, rdapp.gcc@gmail.com, Juzhe-Zhong
Subject: [PATCH] RISC-V: Enhance AVL propagation for complicate reduction auto-vectorization
Date: Mon, 6 Nov 2023 11:34:26 +0800
Message-Id: <20231106033426.45920-1-juzhe.zhong@rivai.ai>
X-Mailer: git-send-email 2.36.3
Precedence: list
List-Id:
Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org X-getmail-retrieved-from-mailbox: INBOX X-GMAIL-THRID: 1781784070236050852 X-GMAIL-MSGID: 1781784070236050852

I noticed that we fail to propagate AVL for reductions in more
complicated situations:

double
foo (double *__restrict a, double *__restrict b, double *__restrict c, int n)
{
  double result = 0;
  for (int i = 0; i < n; i++)
    result += a[i] * b[i] * c[i];
  return result;
}

        vsetvli a5,a3,e8,mf8,ta,ma      -> should be fused into e64m1,TU
        slli    a4,a5,3
        vle64.v v3,0(a0)
        vle64.v v1,0(a1)
        vsetvli a6,zero,e64,m1,ta,ma    -> redundant
        vfmul.vv        v1,v1,v3
        vsetvli zero,a5,e64,m1,tu,ma    -> redundant
        vle64.v v3,0(a2)
        vfmacc.vv       v2,v1,v3
        add     a0,a0,a4
        add     a1,a1,a4
        add     a2,a2,a4
        sub     a3,a3,a5
        bne     a3,zero,.L3

The failed AVL propagation causes redundant AVL/VL toggling.

The root cause is as follows:

        vsetvl a5, zero
        vadd.vv def r136
        vsetvl zero, a3, ... TU
        vsub.vv (use r136)

We propagate AVL (r136) from 'vsub.vv' into 'vadd.vv' only when
'vsub.vv' has TA policy.  However, that is too restrictive, so we
miss the optimization here.

We enhance AVL propagation for TU policy in the following situation:

        vsetvl a5, zero
        vadd.vv def r136
        vsetvl zero, a3, ... TU
        vsub.vv (use r136, merge != r136)

Note that we should only propagate AVL when merge != r136, because
then 'vsub.vv' does not depend on the tail elements of r136.

After this patch:

        vsetvli a5,a3,e64,m1,tu,ma
        slli    a4,a5,3
        vle64.v v3,0(a0)
        vle64.v v1,0(a1)
        vfmul.vv        v1,v1,v3
        vle64.v v3,0(a2)
        vfmacc.vv       v2,v3,v1
        add     a0,a0,a4
        add     a1,a1,a4
        add     a2,a2,a4
        sub     a3,a3,a5
        bne     a3,zero,.L3

        PR target/112399

gcc/ChangeLog:

        * config/riscv/riscv-avlprop.cc
        (pass_avlprop::get_vlmax_ta_preferred_avl): Enhance AVL propagation.
        * config/riscv/t-riscv: Add new include.

gcc/testsuite/ChangeLog:

        * gcc.target/riscv/rvv/vsetvl/imm_switch-2.c: Adapt test.
        * gcc.target/riscv/rvv/autovec/pr112399.c: New test.

---
 gcc/config/riscv/riscv-avlprop.cc             | 17 ++++++++--
 gcc/config/riscv/t-riscv                      |  3 +-
 .../gcc.target/riscv/rvv/autovec/pr112399.c   | 31 +++++++++++++++++++
 .../riscv/rvv/vsetvl/imm_switch-2.c           |  3 +-
 4 files changed, 49 insertions(+), 5 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/pr112399.c

diff --git a/gcc/config/riscv/riscv-avlprop.cc b/gcc/config/riscv/riscv-avlprop.cc
index 1dfaa8742da..1f6ba405342 100644
--- a/gcc/config/riscv/riscv-avlprop.cc
+++ b/gcc/config/riscv/riscv-avlprop.cc
@@ -78,6 +78,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "rtl-ssa.h"
 #include "cfgcleanup.h"
 #include "insn-attr.h"
+#include "tm-constrs.h"
 
 using namespace rtl_ssa;
 using namespace riscv_vector;
@@ -285,8 +286,20 @@ pass_avlprop::get_vlmax_ta_preferred_avl (insn_info *insn) const
           if (!use_insn->can_be_optimized () || use_insn->is_asm ()
               || use_insn->is_call () || use_insn->has_volatile_refs ()
               || use_insn->has_pre_post_modify ()
-              || !has_vl_op (use_insn->rtl ())
-              || !tail_agnostic_p (use_insn->rtl ()))
+              || !has_vl_op (use_insn->rtl ()))
+            return NULL_RTX;
+
+          /* We should only propagate non-VLMAX AVL into VLMAX insn when
+             such insn potential tail elements (after propagation) are
+             not used.  So, we should make sure the outcome of VLMAX insn
+             is not depend on.  */
+          extract_insn_cached (use_insn->rtl ());
+          int merge_op_idx = get_attr_merge_op_idx (use_insn->rtl ());
+          if (merge_op_idx != INVALID_ATTRIBUTE
+              && !satisfies_constraint_vu (recog_data.operand[merge_op_idx])
+              && refers_to_regno_p (set->regno (),
+                                    recog_data.operand[merge_op_idx])
+              && !tail_agnostic_p (use_insn->rtl ()))
             return NULL_RTX;
 
           int new_sew = get_sew (use_insn->rtl ());
diff --git a/gcc/config/riscv/t-riscv b/gcc/config/riscv/t-riscv
index f8ca3f4ac57..95becfc819b 100644
--- a/gcc/config/riscv/t-riscv
+++ b/gcc/config/riscv/t-riscv
@@ -80,7 +80,8 @@ riscv-vector-costs.o: $(srcdir)/config/riscv/riscv-vector-costs.cc \
 
 riscv-avlprop.o: $(srcdir)/config/riscv/riscv-avlprop.cc \
   $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) $(RTL_H) $(REGS_H) \
-  $(TARGET_H) tree-pass.h df.h rtl-ssa.h cfgcleanup.h insn-attr.h
+  $(TARGET_H) tree-pass.h df.h rtl-ssa.h cfgcleanup.h insn-attr.h \
+  tm-constrs.h
 	$(COMPILER) -c $(ALL_COMPILERFLAGS) $(ALL_CPPFLAGS) $(INCLUDES) \
 		$(srcdir)/config/riscv/riscv-avlprop.cc
 
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr112399.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr112399.c
new file mode 100644
index 00000000000..948e12b8474
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr112399.c
@@ -0,0 +1,31 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64d -O3 -fno-vect-cost-model -ffast-math -fno-schedule-insns -fno-schedule-insns2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+
+/*
+** foo:
+** ...
+** vsetvli\s*[a-x0-9]+,\s*[a-x0-9]+,\s*e64,\s*m1,\s*tu,\s*m[au]
+** slli\s*[a-x0-9]+,\s*[a-x0-9]+,\s*3
+** vle64\.v\s*v[0-9]+,\s*0\([a-x0-9]+\)
+** vle64\.v\s*v[0-9]+,\s*0\([a-x0-9]+\)
+** vfmul\.vv\s*v[0-9]+,\s*v[0-9]+,\s*v[0-9]+
+** vle64\.v\s*\s*v[0-9]+,\s*0\([a-x0-9]+\)
+** vfmacc\.vv\s*\s*v[0-9]+,\s*v[0-9]+,\s*v[0-9]+
+** add\s*[a-x0-9]+,\s*[a-x0-9]+,\s*[a-x0-9]+
+** add\s*[a-x0-9]+,\s*[a-x0-9]+,\s*[a-x0-9]+
+** add\s*[a-x0-9]+,\s*[a-x0-9]+,\s*[a-x0-9]+
+** sub\s*[a-x0-9]+,\s*[a-x0-9]+,\s*[a-x0-9]+
+** ...
+*/
+
+double
+foo (double *__restrict a, double *__restrict b, double *__restrict c, int n)
+{
+  double result = 0;
+  for (int i = 0; i < n; i++)
+    result += a[i] * b[i] * c[i];
+  return result;
+}
+
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/imm_switch-2.c b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/imm_switch-2.c
index 2e58f088d6b..c55faa5fa47 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/imm_switch-2.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/imm_switch-2.c
@@ -23,6 +23,5 @@ void f (void * restrict in, void * restrict out, void * restrict mask_in, int n)
 /* { dg-final { scan-assembler-times {vsetivli\s+zero,\s*19,\s*e32,\s*mf2,\s*t[au],\s*m[au]} 2 { target { no-opts "-O0" no-opts "-g" no-opts "-funroll-loops" } } } } */
 /* { dg-final { scan-assembler-times {vsetivli\s+zero,\s*19,\s*e16,\s*mf2,\s*t[au],\s*m[au]} 1 { target { no-opts "-O0" no-opts "-g" no-opts "-funroll-loops" } } } } */
 /* { dg-final { scan-assembler-times {vsetvli\s+[a-x0-9]+,\s*zero,\s*e32,\s*mf2,\s*t[au],\s*m[au]} 1 { target { no-opts "-O0" no-opts "-g" no-opts "-funroll-loops" } } } } */
-/* { dg-final { scan-assembler-times {vsetvli\s+[a-x0-9]+,\s*zero,\s*e8,\s*mf8,\s*t[au],\s*m[au]} 1 { target { no-opts "-O0" no-opts "-g" no-opts "-funroll-loops" } } } } */
-/* { dg-final { scan-assembler-times {vsetvli} 2 { target { no-opts "-O0" no-opts "-g" no-opts "-funroll-loops" } } } } */
+/* { dg-final { scan-assembler-times {vsetvli} 1 { target { no-opts "-O0" no-opts "-g" no-opts "-funroll-loops" } } } } */
 /* { dg-final { scan-assembler-times {vsetivli} 4 { target { no-opts "-O0" no-opts "-g" no-opts "-funroll-loops" } } } } */
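
Note for readers (not part of the patch): the change to the bail-out
condition in get_vlmax_ta_preferred_avl can be read as a small truth
table.  The sketch below is a standalone model, not GCC code.  The
struct, its fields and both function names are hypothetical; each
field only stands in for the GCC query named in its comment, and the
reading of satisfies_constraint_vu as "the merge operand is the
undefined vector value" is an assumption.

```cpp
// Standalone model of the bail-out condition; not GCC code.
// All names here are hypothetical and exist only for illustration.
struct UseInsn
{
  bool tail_agnostic;      // stands in for tail_agnostic_p (TA vs. TU)
  bool has_merge_operand;  // stands in for merge_op_idx != INVALID_ATTRIBUTE
  bool merge_is_undefined; // stands in for satisfies_constraint_vu (merge)
  bool merge_reads_def;    // stands in for refers_to_regno_p (def regno, merge)
};

// Before the patch: any tail-undisturbed (TU) user blocks propagation.
constexpr bool
blocks_propagation_old (const UseInsn &u)
{
  return !u.tail_agnostic;
}

// After the patch: a TU user blocks propagation only when its merge
// operand is a real register that reads the value defined by the
// VLMAX insn, since only then do that value's tail elements matter.
constexpr bool
blocks_propagation_new (const UseInsn &u)
{
  return u.has_merge_operand && !u.merge_is_undefined
	 && u.merge_reads_def && !u.tail_agnostic;
}

// TU user whose merge operand is a different register (modeled on the
// vfmacc.vv accumulator in the reduction above).
constexpr UseInsn tu_other_merge = {false, true, false, false};
// TU user whose merge operand is the VLMAX insn's own result.
constexpr UseInsn tu_same_merge = {false, true, false, true};
// TA user: never blocked, before or after.
constexpr UseInsn ta_user = {true, true, false, true};

static_assert (blocks_propagation_old (tu_other_merge), "old rule rejects");
static_assert (!blocks_propagation_new (tu_other_merge), "new rule accepts");
static_assert (blocks_propagation_new (tu_same_merge), "still rejected");
static_assert (!blocks_propagation_new (ta_user), "TA is unaffected");
```

The only row that changes between the two rules is the TU user with
merge != def, which is the case the commit message describes.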