From patchwork Tue Oct 17 06:43:24 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "juzhe.zhong@rivai.ai"
X-Patchwork-Id: 153929
From: Juzhe-Zhong
To: gcc-patches@gcc.gnu.org
Cc: richard.sandiford@arm.com, rguenther@suse.de, Juzhe-Zhong
Subject: [PATCH V4] VECT: Enhance SLP of MASK_LEN_GATHER_LOAD[PR111721]
Date: Tue, 17 Oct 2023 14:43:24 +0800
Message-Id: <20231017064324.1023901-1-juzhe.zhong@rivai.ai>
X-Mailer: git-send-email 2.36.3
List-Id: Gcc-patches mailing list

This patch fixes the following FAILs in the RISC-V regression tests:

FAIL: gcc.dg/vect/vect-gather-1.c -flto -ffat-lto-objects scan-tree-dump vect "Loop contains only SLP stmts"
FAIL: gcc.dg/vect/vect-gather-1.c scan-tree-dump vect "Loop contains only SLP stmts"
FAIL: gcc.dg/vect/vect-gather-3.c -flto -ffat-lto-objects scan-tree-dump vect "Loop contains only SLP stmts"
FAIL: gcc.dg/vect/vect-gather-3.c scan-tree-dump vect "Loop contains only SLP stmts"

The root cause of these FAILs is that GCC fails to SLP MASK_LEN_GATHER_LOAD.

There are two situations in which a scalar statement is recognized as
MASK_LEN_GATHER_LOAD:

1. Conditional gather load:
   MASK_LEN_GATHER_LOAD (base, offset, scale, zero, conditional mask).

   In this situation we only need to reuse the existing MASK_GATHER_LOAD
   handling, which is enough to SLP MASK_LEN_GATHER_LOAD.

2. Unconditional gather load:
   MASK_LEN_GATHER_LOAD (base, offset, scale, zero, -1).

   The current SLP check fails on the dummy mask -1, so we relax the check
   in tree-vect-slp.cc and allow the dummy mask to be materialized.
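
To make the two situations above concrete, here is a minimal scalar sketch of what a masked, length-limited gather load computes. It is only an illustration, not GCC's definition of the internal function: the name model_mask_len_gather_load, the explicit len and nlanes parameters, and the choice to leave inactive lanes untouched are all assumptions made for this sketch. Situation 1 corresponds to passing a real condition as the mask; situation 2 corresponds to an all-ones mask (the dummy -1).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical scalar model of a masked, length-limited gather load.
   Only lanes i < LEN with MASK[i] != 0 are loaded.  All other lanes of
   OUT are left untouched; this model does not define their value.
   SCALE is a byte multiplier applied to each offset.  */
static void
model_mask_len_gather_load (int *out, const int *base, const long *offset,
                            int scale, const unsigned char *mask,
                            size_t nlanes, size_t len)
{
  assert (len <= nlanes);
  for (size_t i = 0; i < len; ++i)
    if (mask[i])
      out[i] = *(const int *) ((const char *) base + offset[i] * scale);
}
```

With an all-ones mask the inner test is always true, so the unconditional case is just the conditional case with a constant mask. That is why relaxing the check on the constant mask operand is enough to let both forms share one SLP path.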

Consider the following case:

void __attribute__((noipa))
f (int *restrict y, int *restrict x, int *restrict indices, int n)
{
  for (int i = 0; i < n; ++i)
    {
      y[i * 2] = x[indices[i * 2]] + 1;
      y[i * 2 + 1] = x[indices[i * 2 + 1]] + 2;
    }
}

https://godbolt.org/z/WG3M3n7Mo

Before this patch, GCC is unable to SLP this loop and uses
VEC_LOAD_LANES/VEC_STORE_LANES instead:

f:
	ble	a3,zero,.L5
.L3:
	vsetvli	a5,a3,e8,mf4,ta,ma
	vsetvli	zero,a5,e32,m1,ta,ma
	vlseg2e32.v	v6,(a2)
	vsetvli	a4,zero,e64,m2,ta,ma
	vsext.vf2	v2,v6
	vsll.vi	v2,v2,2
	vsetvli	zero,a5,e32,m1,ta,ma
	vluxei64.v	v1,(a1),v2
	vsetvli	a4,zero,e64,m2,ta,ma
	vsext.vf2	v2,v7
	vsetvli	zero,zero,e32,m1,ta,ma
	vadd.vi	v4,v1,1
	vsetvli	zero,zero,e64,m2,ta,ma
	vsll.vi	v2,v2,2
	vsetvli	zero,a5,e32,m1,ta,ma
	vluxei64.v	v2,(a1),v2
	vsetvli	a4,zero,e32,m1,ta,ma
	slli	a6,a5,3
	vadd.vi	v5,v2,2
	sub	a3,a3,a5
	vsetvli	zero,a5,e32,m1,ta,ma
	vsseg2e32.v	v4,(a0)
	add	a2,a2,a6
	add	a0,a0,a6
	bne	a3,zero,.L3
.L5:
	ret

After this patch:

f:
	ble	a3,zero,.L5
	li	a5,1
	csrr	t1,vlenb
	slli	a5,a5,33
	srli	a7,t1,2
	addi	a5,a5,1
	slli	a3,a3,1
	neg	t3,a7
	vsetvli	a4,zero,e64,m1,ta,ma
	vmv.v.x	v4,a5
.L3:
	minu	a5,a3,a7
	vsetvli	zero,a5,e32,m1,ta,ma
	vle32.v	v1,0(a2)
	vsetvli	a4,zero,e64,m2,ta,ma
	vsext.vf2	v2,v1
	vsll.vi	v2,v2,2
	vsetvli	zero,a5,e32,m1,ta,ma
	vluxei64.v	v2,(a1),v2
	vsetvli	a4,zero,e32,m1,ta,ma
	mv	a6,a3
	vadd.vv	v2,v2,v4
	vsetvli	zero,a5,e32,m1,ta,ma
	vse32.v	v2,0(a0)
	add	a2,a2,t1
	add	a0,a0,t1
	add	a3,a3,t3
	bgtu	a6,a7,.L3
.L5:
	ret

While working on this I found that we are missing an SLP test for
conditionally masked gather loads, so this patch adds one.

Tested on RISC-V. Bootstrap and regression testing on x86 passed.

OK for trunk?

gcc/ChangeLog:

	* tree-vect-slp.cc (vect_get_operand_map): Add MASK_LEN_GATHER_LOAD.
	(vect_get_and_check_slp_defs): Ditto.
	(vect_build_slp_tree_1): Ditto.
	(vect_build_slp_tree_2): Ditto.
	* tree-vect-stmts.cc (vectorizable_load): Ditto.

gcc/testsuite/ChangeLog:

	* gcc.dg/vect/vect-gather-6.c: New test.

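
As a reading aid for the "after" assembly, the SLP form of f can be sketched in scalar C as a single statement over 2 * n contiguous elements: one contiguous load of the indices, one gather from x, one add of a repeating {1, 2} pattern, and one contiguous store. This is a hand-written illustration, not compiler output; f_slp_equiv and addend are hypothetical names introduced here.

```c
/* The loop from the example: two statements per iteration.  */
void
f (int *restrict y, int *restrict x, int *restrict indices, int n)
{
  for (int i = 0; i < n; ++i)
    {
      y[i * 2] = x[indices[i * 2]] + 1;
      y[i * 2 + 1] = x[indices[i * 2 + 1]] + 2;
    }
}

/* Hypothetical hand-written equivalent: one statement per element, with
   the two addends folded into a repeating {1, 2} pattern.  This mirrors
   the single gather plus vadd.vv in the "after" code.  */
void
f_slp_equiv (int *restrict y, int *restrict x, int *restrict indices, int n)
{
  static const int addend[2] = {1, 2};
  for (int j = 0; j < n * 2; ++j)
    y[j] = x[indices[j]] + addend[j & 1];
}
```

In the "after" assembly the {1, 2} pattern appears to be the 64-bit constant built by li/slli/addi and broadcast with vmv.v.x, which reads as alternating 1 and 2 when the register is viewed as 32-bit elements.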
---
 gcc/testsuite/gcc.dg/vect/vect-gather-6.c | 15 +++++++++++++++
 gcc/tree-vect-slp.cc                      | 22 ++++++++++++++++++----
 gcc/tree-vect-stmts.cc                    |  9 ++++++++-
 3 files changed, 41 insertions(+), 5 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-gather-6.c

diff --git a/gcc/testsuite/gcc.dg/vect/vect-gather-6.c b/gcc/testsuite/gcc.dg/vect/vect-gather-6.c
new file mode 100644
index 00000000000..ff55f321854
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-gather-6.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+
+void
+f (int *restrict y, int *restrict x, int *restrict indices, int *restrict cond, int n)
+{
+  for (int i = 0; i < n; ++i)
+    {
+      if (cond[i * 2])
+        y[i * 2] = x[indices[i * 2]] + 1;
+      if (cond[i * 2 + 1])
+        y[i * 2 + 1] = x[indices[i * 2 + 1]] + 2;
+    }
+}
+
+/* { dg-final { scan-tree-dump "Loop contains only SLP stmts" vect { target vect_gather_load_ifn } } } */
diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index af8f5031bd2..b379278446b 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -550,6 +550,7 @@ vect_get_operand_map (const gimple *stmt, unsigned char swap = 0)
             return arg1_map;
 
           case IFN_MASK_GATHER_LOAD:
+          case IFN_MASK_LEN_GATHER_LOAD:
             return arg1_arg4_map;
 
           case IFN_MASK_STORE:
@@ -717,8 +718,7 @@ vect_get_and_check_slp_defs (vec_info *vinfo, unsigned char swap,
         {
           tree type = TREE_TYPE (oprnd);
           dt = dts[i];
-          if ((dt == vect_constant_def
-               || dt == vect_external_def)
+          if (dt == vect_external_def
               && !GET_MODE_SIZE (vinfo->vector_mode).is_constant ()
               && (TREE_CODE (type) == BOOLEAN_TYPE
                   || !can_duplicate_and_interleave_p (vinfo, stmts.length (),
@@ -730,6 +730,17 @@ vect_get_and_check_slp_defs (vec_info *vinfo, unsigned char swap,
                                  "for variable-length SLP %T\n", oprnd);
               return -1;
             }
+          if (dt == vect_constant_def
+              && !GET_MODE_SIZE (vinfo->vector_mode).is_constant ()
+              && !can_duplicate_and_interleave_p (vinfo, stmts.length (), type))
+            {
+              if (dump_enabled_p ())
+                dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+                                 "Build SLP failed: invalid type of def "
+                                 "for variable-length SLP %T\n",
+                                 oprnd);
+              return -1;
+            }
 
           /* For the swapping logic below force vect_reduction_def
              for the reduction op in a SLP reduction group.  */
@@ -1094,7 +1105,8 @@ vect_build_slp_tree_1 (vec_info *vinfo, unsigned char *swap,
 
           if (cfn == CFN_MASK_LOAD
               || cfn == CFN_GATHER_LOAD
-              || cfn == CFN_MASK_GATHER_LOAD)
+              || cfn == CFN_MASK_GATHER_LOAD
+              || cfn == CFN_MASK_LEN_GATHER_LOAD)
             ldst_p = true;
           else if (cfn == CFN_MASK_STORE)
             {
@@ -1355,6 +1367,7 @@ vect_build_slp_tree_1 (vec_info *vinfo, unsigned char *swap,
               if (DR_IS_READ (STMT_VINFO_DATA_REF (stmt_info))
                   && rhs_code != CFN_GATHER_LOAD
                   && rhs_code != CFN_MASK_GATHER_LOAD
+                  && rhs_code != CFN_MASK_LEN_GATHER_LOAD
                   /* Not grouped loads are handled as externals for BB
                      vectorization.  For loop vectorization we can handle
                      splats the same we handle single element interleaving.  */
@@ -1855,7 +1868,8 @@ vect_build_slp_tree_2 (vec_info *vinfo, slp_tree node,
       if (gcall *stmt = dyn_cast <gcall *> (stmt_info->stmt))
         gcc_assert (gimple_call_internal_p (stmt, IFN_MASK_LOAD)
                     || gimple_call_internal_p (stmt, IFN_GATHER_LOAD)
-                    || gimple_call_internal_p (stmt, IFN_MASK_GATHER_LOAD));
+                    || gimple_call_internal_p (stmt, IFN_MASK_GATHER_LOAD)
+                    || gimple_call_internal_p (stmt, IFN_MASK_LEN_GATHER_LOAD));
       else
         {
           *max_nunits = this_max_nunits;
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index b3a56498595..d50eb48c0be 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -9874,12 +9874,19 @@ vectorizable_load (vec_info *vinfo,
         return false;
 
       mask_index = internal_fn_mask_index (ifn);
+      slp_tree slp_op = NULL;
       if (mask_index >= 0 && slp_node)
         mask_index = vect_slp_child_index_for_operand (call, mask_index);
       if (mask_index >= 0
           && !vect_check_scalar_mask (vinfo, stmt_info, slp_node, mask_index,
-                                      &mask, NULL, &mask_dt, &mask_vectype))
+                                      &mask, &slp_op, &mask_dt, &mask_vectype))
         return false;
+      if (mask_index >= 0 && slp_node)
+        {
+          bool match_p
+            = vect_maybe_update_slp_op_vectype (slp_op, mask_vectype);
+          gcc_assert (match_p);
+        }
     }
 
   tree vectype = STMT_VINFO_VECTYPE (stmt_info);