From patchwork Wed Oct 18 12:36:42 2023
From: Juzhe-Zhong
To: gcc-patches@gcc.gnu.org
Cc: richard.sandiford@arm.com, rguenther@suse.de, Juzhe-Zhong
Subject: [PATCH V5] VECT: Enhance SLP of MASK_LEN_GATHER_LOAD[PR111721]
Date: Wed, 18 Oct 2023 20:36:42 +0800
Message-Id: <20231018123642.427403-1-juzhe.zhong@rivai.ai>

This patch fixes the following FAILs in the RISC-V regression:

FAIL: gcc.dg/vect/vect-gather-1.c -flto -ffat-lto-objects  scan-tree-dump vect "Loop contains only SLP stmts"
FAIL: gcc.dg/vect/vect-gather-1.c scan-tree-dump vect "Loop contains only SLP stmts"
FAIL: gcc.dg/vect/vect-gather-3.c -flto -ffat-lto-objects  scan-tree-dump vect "Loop contains only SLP stmts"
FAIL: gcc.dg/vect/vect-gather-3.c scan-tree-dump vect "Loop contains only SLP stmts"

The root cause of these FAILs is that GCC fails to SLP MASK_LEN_GATHER_LOAD.

A scalar statement is recognized as MASK_LEN_GATHER_LOAD in two situations:

1. Conditional gather load: MASK_LEN_GATHER_LOAD (base, offset, scale, zero, conditional mask).

   In this case we only need to reuse the existing MASK_GATHER_LOAD handling to SLP MASK_LEN_GATHER_LOAD.

2. Unconditional gather load: MASK_LEN_GATHER_LOAD (base, offset, scale, zero, -1).

   The current SLP check rejects the dummy mask -1. Before this patch, vect_get_and_check_slp_defs rejected every boolean-typed constant or external def when the vector length is not constant. This patch relaxes that check in tree-vect-slp.cc so the constant mask can be materialized.
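To make the two cases concrete, here is a minimal C sketch of what a masked, length-limited gather computes per lane.

It is a hypothetical scalar model, not how GCC implements the internal function. The name mask_len_gather_load_model is invented. The sketch also assumes:

- 32-bit elements;
- SCALE is a byte multiplier;
- inactive lanes are zero-filled, standing in for the "zero" operand.

The argument lists above elide the length operands, and the sketch models only a plain LEN limit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar model of a masked, length-limited gather of 32-bit
   ints, for illustration only (not GCC code).  Lane I is loaded from
   BASE + OFFSET[I] * SCALE when I < LEN and MASK[I] is true; every other
   lane is set to zero, modelling the "zero" operand.  */
static void
mask_len_gather_load_model (int32_t *dest, const char *base,
                            const int64_t *offset, int64_t scale,
                            const _Bool *mask, size_t len, size_t nlanes)
{
  assert (len <= nlanes);
  for (size_t i = 0; i < nlanes; ++i)
    {
      if (i < len && mask[i])
        {
          /* memcpy avoids alignment and aliasing issues with the byte
             address computed from BASE.  */
          memcpy (&dest[i], base + offset[i] * scale, sizeof (int32_t));
        }
      else
        dest[i] = 0;
    }
}
```

In case 1 the mask differs from lane to lane. In case 2 the mask is the all-true constant -1, so it carries no information and only LEN decides which lanes are loaded.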

Consider the following case:

void __attribute__((noipa))
f (int *restrict y, int *restrict x, int *restrict indices, int n)
{
  for (int i = 0; i < n; ++i)
    {
      y[i * 2] = x[indices[i * 2]] + 1;
      y[i * 2 + 1] = x[indices[i * 2 + 1]] + 2;
    }
}

https://godbolt.org/z/WG3M3n7Mo

Before this patch, GCC is unable to SLP the loop and uses VEC_LOAD_LANES/VEC_STORE_LANES (vlseg2e32.v/vsseg2e32.v) instead:

f:
        ble     a3,zero,.L5
.L3:
        vsetvli a5,a3,e8,mf4,ta,ma
        vsetvli zero,a5,e32,m1,ta,ma
        vlseg2e32.v     v6,(a2)
        vsetvli a4,zero,e64,m2,ta,ma
        vsext.vf2       v2,v6
        vsll.vi v2,v2,2
        vsetvli zero,a5,e32,m1,ta,ma
        vluxei64.v      v1,(a1),v2
        vsetvli a4,zero,e64,m2,ta,ma
        vsext.vf2       v2,v7
        vsetvli zero,zero,e32,m1,ta,ma
        vadd.vi v4,v1,1
        vsetvli zero,zero,e64,m2,ta,ma
        vsll.vi v2,v2,2
        vsetvli zero,a5,e32,m1,ta,ma
        vluxei64.v      v2,(a1),v2
        vsetvli a4,zero,e32,m1,ta,ma
        slli    a6,a5,3
        vadd.vi v5,v2,2
        sub     a3,a3,a5
        vsetvli zero,a5,e32,m1,ta,ma
        vsseg2e32.v     v4,(a0)
        add     a2,a2,a6
        add     a0,a0,a6
        bne     a3,zero,.L3
.L5:
        ret

After this patch, the loop is SLP vectorized and needs only one indexed load (vluxei64.v) per iteration:

f:
        ble     a3,zero,.L5
        li      a5,1
        csrr    t1,vlenb
        slli    a5,a5,33
        srli    a7,t1,2
        addi    a5,a5,1
        slli    a3,a3,1
        neg     t3,a7
        vsetvli a4,zero,e64,m1,ta,ma
        vmv.v.x v4,a5
.L3:
        minu    a5,a3,a7
        vsetvli zero,a5,e32,m1,ta,ma
        vle32.v v1,0(a2)
        vsetvli a4,zero,e64,m2,ta,ma
        vsext.vf2       v2,v1
        vsll.vi v2,v2,2
        vsetvli zero,a5,e32,m1,ta,ma
        vluxei64.v      v2,(a1),v2
        vsetvli a4,zero,e32,m1,ta,ma
        mv      a6,a3
        vadd.vv v2,v2,v4
        vsetvli zero,a5,e32,m1,ta,ma
        vse32.v v2,0(a0)
        add     a2,a2,t1
        add     a0,a0,t1
        add     a3,a3,t3
        bgtu    a6,a7,.L3
.L5:
        ret

Note that there was no SLP test for the conditional (masked) gather load, so this patch adds one.

Tested on RISC-V; bootstrap and regression testing on x86 passed.

Ok for trunk?

gcc/ChangeLog:

        * tree-vect-slp.cc (vect_get_operand_map): Add MASK_LEN_GATHER_LOAD.
        (vect_get_and_check_slp_defs): Ditto.
        (vect_build_slp_tree_1): Ditto.
        (vect_build_slp_tree_2): Ditto.
        * tree-vect-stmts.cc (vectorizable_load): Ditto.

gcc/testsuite/ChangeLog:

        * gcc.dg/vect/vect-gather-6.c: New test.
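In the after-patch loop, a single vadd.vv adds 1 and 2 to alternating lanes. Our reading of the preamble is an interpretation, not something the patch states:

- li/slli/addi build the 64-bit value 0x200000001;
- vmv.v.x splats that value at e64;
- the e32 add then sees the repeating pair {1, 2}.

This matches the duplicate-and-interleave scheme that can_duplicate_and_interleave_p checks for variable-length vectors. The minimal C sketch below reproduces only the arithmetic. The helper names are hypothetical, and the sketch assumes a little-endian host, as RISC-V is.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Rebuild the 64-bit scalar computed by
   li a5,1; slli a5,a5,33; addi a5,a5,1.  */
static uint64_t
pair_splat_scalar (void)
{
  uint64_t a5 = 1;  /* li   a5,1     */
  a5 <<= 33;        /* slli a5,a5,33 */
  a5 += 1;          /* addi a5,a5,1  */
  return a5;        /* 0x200000001: low half 1, high half 2 */
}

/* Hypothetical model of vmv.v.x at e64 followed by reading the register
   at e32: write NPAIRS copies of VALUE and view them as int32 lanes.
   Assumes a little-endian host.  */
static void
splat_as_int32_lanes (int32_t *lanes, size_t npairs, uint64_t value)
{
  for (size_t i = 0; i < npairs; ++i)
    memcpy (&lanes[2 * i], &value, sizeof value);
}
```

With four pairs, the lanes read {1, 2, 1, 2, 1, 2, 1, 2}. These are exactly the addends of y[i * 2] and y[i * 2 + 1] in the source loop.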
---
 gcc/testsuite/gcc.dg/vect/vect-gather-6.c | 15 +++++++++++++++
 gcc/tree-vect-slp.cc                      | 22 ++++++++++++++++++----
 gcc/tree-vect-stmts.cc                    | 12 +++++++++++-
 3 files changed, 44 insertions(+), 5 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-gather-6.c

diff --git a/gcc/testsuite/gcc.dg/vect/vect-gather-6.c b/gcc/testsuite/gcc.dg/vect/vect-gather-6.c
new file mode 100644
index 00000000000..ff55f321854
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-gather-6.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+
+void
+f (int *restrict y, int *restrict x, int *restrict indices, int *restrict cond, int n)
+{
+  for (int i = 0; i < n; ++i)
+    {
+      if (cond[i * 2])
+        y[i * 2] = x[indices[i * 2]] + 1;
+      if (cond[i * 2 + 1])
+        y[i * 2 + 1] = x[indices[i * 2 + 1]] + 2;
+    }
+}
+
+/* { dg-final { scan-tree-dump "Loop contains only SLP stmts" vect { target vect_gather_load_ifn } } } */
diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index d081999a763..146dba658a2 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -552,6 +552,7 @@ vect_get_operand_map (const gimple *stmt, unsigned char swap = 0)
             return arg1_map;
 
           case IFN_MASK_GATHER_LOAD:
+          case IFN_MASK_LEN_GATHER_LOAD:
             return arg1_arg4_map;
 
           case IFN_MASK_STORE:
@@ -719,8 +720,7 @@ vect_get_and_check_slp_defs (vec_info *vinfo, unsigned char swap,
         {
           tree type = TREE_TYPE (oprnd);
           dt = dts[i];
-          if ((dt == vect_constant_def
-               || dt == vect_external_def)
+          if (dt == vect_external_def
               && !GET_MODE_SIZE (vinfo->vector_mode).is_constant ()
               && (TREE_CODE (type) == BOOLEAN_TYPE
                   || !can_duplicate_and_interleave_p (vinfo, stmts.length (),
@@ -732,6 +732,17 @@ vect_get_and_check_slp_defs (vec_info *vinfo, unsigned char swap,
                                  "for variable-length SLP %T\n", oprnd);
               return -1;
             }
+          if (dt == vect_constant_def
+              && !GET_MODE_SIZE (vinfo->vector_mode).is_constant ()
+              && !can_duplicate_and_interleave_p (vinfo, stmts.length (), type))
+            {
+              if (dump_enabled_p ())
+                dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+                                 "Build SLP failed: invalid type of def "
+                                 "for variable-length SLP %T\n",
+                                 oprnd);
+              return -1;
+            }
 
           /* For the swapping logic below force vect_reduction_def
              for the reduction op in a SLP reduction group.  */
@@ -1096,7 +1107,8 @@ vect_build_slp_tree_1 (vec_info *vinfo, unsigned char *swap,
 
           if (cfn == CFN_MASK_LOAD
               || cfn == CFN_GATHER_LOAD
-              || cfn == CFN_MASK_GATHER_LOAD)
+              || cfn == CFN_MASK_GATHER_LOAD
+              || cfn == CFN_MASK_LEN_GATHER_LOAD)
             ldst_p = true;
           else if (cfn == CFN_MASK_STORE)
             {
@@ -1357,6 +1369,7 @@ vect_build_slp_tree_1 (vec_info *vinfo, unsigned char *swap,
           if (DR_IS_READ (STMT_VINFO_DATA_REF (stmt_info))
               && rhs_code != CFN_GATHER_LOAD
               && rhs_code != CFN_MASK_GATHER_LOAD
+              && rhs_code != CFN_MASK_LEN_GATHER_LOAD
               /* Not grouped loads are handled as externals for BB
                  vectorization.  For loop vectorization we can handle
                  splats the same we handle single element interleaving.  */
@@ -1857,7 +1870,8 @@ vect_build_slp_tree_2 (vec_info *vinfo, slp_tree node,
           if (gcall *stmt = dyn_cast <gcall *> (stmt_info->stmt))
             gcc_assert (gimple_call_internal_p (stmt, IFN_MASK_LOAD)
                         || gimple_call_internal_p (stmt, IFN_GATHER_LOAD)
-                        || gimple_call_internal_p (stmt, IFN_MASK_GATHER_LOAD));
+                        || gimple_call_internal_p (stmt, IFN_MASK_GATHER_LOAD)
+                        || gimple_call_internal_p (stmt, IFN_MASK_LEN_GATHER_LOAD));
           else
             {
               *max_nunits = this_max_nunits;
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index e5ff44c25f1..0bc7beb1342 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -9878,12 +9878,22 @@ vectorizable_load (vec_info *vinfo,
             return false;
 
           mask_index = internal_fn_mask_index (ifn);
+          slp_tree slp_op = NULL;
           if (mask_index >= 0 && slp_node)
             mask_index = vect_slp_child_index_for_operand (call, mask_index);
           if (mask_index >= 0
               && !vect_check_scalar_mask (vinfo, stmt_info, slp_node, mask_index,
-                                          &mask, NULL, &mask_dt, &mask_vectype))
+                                          &mask, &slp_op, &mask_dt, &mask_vectype))
             return false;
+          if (mask_index >= 0 && slp_node
+              && !vect_maybe_update_slp_op_vectype (slp_op, mask_vectype))
+            {
+              /* We don't vectorize the boolean type external SLP mask.  */
+              if (dump_enabled_p ())
+                dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+                                 "incompatible vector types for invariants\n");
+              return false;
+            }
         }
 
       tree vectype = STMT_VINFO_VECTYPE (stmt_info);
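The behavioural change in vect_get_and_check_slp_defs can be summarised as a small decision function. The C sketch below is a simplified, hypothetical model, not GCC code. The enum and function names are invented, and the vla flag stands for "vector mode size is not constant".

- Before the patch, a boolean constant such as the dummy mask -1 was rejected for variable-length SLP.
- After the patch, a constant only needs can_duplicate_and_interleave_p to succeed.
- Boolean external defs are still rejected in both versions.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for vect_constant_def, vect_external_def and
   other (internal) defs.  */
enum def_kind { CONSTANT_DEF, EXTERNAL_DEF, INTERNAL_DEF };

/* Simplified pre-patch rule: returns true if the operand is rejected.  */
static bool
rejected_before (enum def_kind dt, bool vla, bool is_bool, bool can_dup)
{
  return (dt == CONSTANT_DEF || dt == EXTERNAL_DEF)
         && vla && (is_bool || !can_dup);
}

/* Simplified post-patch rule: external defs keep the old test, while
   constant defs (e.g. the all-ones mask) only need CAN_DUP.  */
static bool
rejected_after (enum def_kind dt, bool vla, bool is_bool, bool can_dup)
{
  if (dt == EXTERNAL_DEF && vla && (is_bool || !can_dup))
    return true;
  if (dt == CONSTANT_DEF && vla && !can_dup)
    return true;
  return false;
}
```

The extra guard that the tree-vect-stmts.cc hunk adds in vectorizable_load is not modelled here. That guard calls vect_maybe_update_slp_op_vectype on the mask's SLP operand, so boolean external masks are still refused there.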