From: Juzhe-Zhong <juzhe.zhong@rivai.ai>
To: gcc-patches@gcc.gnu.org
Cc: richard.sandiford@arm.com, rguenther@suse.de, Juzhe-Zhong
Subject: [PATCH V2] Middle-end: Fix bug of induction variable vectorization for RVV
Date: Fri, 10 Nov 2023 20:20:11 +0800
Message-Id: <20231110122011.3626658-1-juzhe.zhong@rivai.ai>

PR: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112438

1. The result of SELECT_VL is not necessarily VF in non-final
   iterations, so the current GIMPLE IR is wrong:

     # vect_vec_iv_.8_22 = PHI <_21(4), { 0, 1, 2, ... }(3)>
     ...
     _35 = .SELECT_VL (ivtmp_33, VF);
     _21 = vect_vec_iv_.8_22 + { VF, ... };

   E.g. consider a total iteration count N = 6 and VF = 4.  Whether
   SELECT_VL returns VF in a non-final iteration depends on the hardware
   implementation.  Suppose an RVV CPU core whose vsetvl distributes the
   workload evenly: it may process 3 elements in the first iteration and
   3 elements in the last iteration.  Then the induction variable update

     _21 = vect_vec_iv_.8_22 + { POLY_INT_CST [4, 4], ... };

   is wrong: it adds VF (4 here), although only 3 elements were actually
   processed.  It should add 3, the value returned by SELECT_VL.  The
   correct IR is therefore:

     _36 = .SELECT_VL (ivtmp_34, VF);
     _22 = (int) _36;
     vect_cst__21 = [vec_duplicate_expr] _22;

2. This issue only happens for non-SLP vectorization with a single
   rgroup, because LOOP_VINFO_USING_SELECT_VL_P is only set under the
   following conditions:

     if (LOOP_VINFO_USING_DECREMENTING_IV_P (loop_vinfo))
       {
         tree iv_type = LOOP_VINFO_RGROUP_IV_TYPE (loop_vinfo);
         if (direct_internal_fn_supported_p (IFN_SELECT_VL, iv_type,
                                             OPTIMIZE_FOR_SPEED)
             && LOOP_VINFO_LENS (loop_vinfo).length () == 1
             && LOOP_VINFO_LENS (loop_vinfo)[0].factor == 1
             && !slp
             && (!LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
                 || !LOOP_VINFO_VECT_FACTOR (loop_vinfo).is_constant ()))
           LOOP_VINFO_USING_SELECT_VL_P (loop_vinfo) = true;
       }

3. This issue does not appear for nested loops, whether
   LOOP_VINFO_USING_SELECT_VL_P is true or false, since:

     # vect_vec_iv_.6_5 = PHI <_19(3), { 0, ... }(5)>
     # vect_diff_15.7_20 = PHI
     _19 = vect_vec_iv_.6_5 + { 1, ... };
     vect_diff_9.8_22 = .COND_LEN_ADD ({ -1, ... }, vect_vec_iv_.6_5, vect_diff_15.7_20, vect_diff_15.7_20, _28, 0);
     ivtmp_1 = ivtmp_4 + 4294967295;
     ....
     [local count: 6549826]:
     # vect_diff_18.5_11 = PHI
     # ivtmp_26 = PHI
     _28 = .SELECT_VL (ivtmp_26, POLY_INT_CST [4, 4]);
     goto ; [100.00%]

   Note that the induction variable update in the nested case,

     _19 = vect_vec_iv_.6_5 + { 1, ... };

   is independent of VF: it does not depend on how many elements are
   processed in an iteration.  The step is loop invariant, so the update
   is correct even if LOOP_VINFO_USING_SELECT_VL_P is true.

Testing passed.  OK for trunk?

        PR tree-optimization/112438

gcc/ChangeLog:

        * tree-vect-loop.cc (vectorizable_induction):

gcc/testsuite/ChangeLog:

        * gcc.target/riscv/rvv/autovec/pr112438.c: New test.

---
 .../gcc.target/riscv/rvv/autovec/pr112438.c   | 33 +++++++++++++++++++
 gcc/tree-vect-loop.cc                         | 30 ++++++++++++++++-
 2 files changed, 62 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/pr112438.c

diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr112438.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr112438.c
new file mode 100644
index 00000000000..51f90df38a0
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr112438.c
@@ -0,0 +1,33 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64d -O3 -fno-vect-cost-model -ffast-math -fdump-tree-optimized-details" } */
+
+void
+foo (int n, int *__restrict in, int *__restrict out)
+{
+  for (int i = 0; i < n; i += 1)
+    {
+      out[i] = in[i] + i;
+    }
+}
+
+void
+foo2 (int n, float * __restrict in,
+float * __restrict out)
+{
+  for (int i = 0; i < n; i += 1)
+    {
+      out[i] = in[i] + i;
+    }
+}
+
+void
+foo3 (int n, float * __restrict in,
+float * __restrict out, float x)
+{
+  for (int i = 0; i < n; i += 1)
+    {
+      out[i] = in[i] + i* i;
+    }
+}
+
+/* We don't want to see vect_vec_iv_.21_25 + { POLY_INT_CST [4, 4], ... }. */
diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index 8abc1937d74..b152072c969 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -10306,10 +10306,36 @@ vectorizable_induction (loop_vec_info loop_vinfo,
 
 
   /* Create the vector that holds the step of the induction. */
+  gimple_stmt_iterator *step_iv_si = NULL;
   if (nested_in_vect_loop)
     /* iv_loop is nested in the loop to be vectorized. Generate:
        vec_step = [S, S, S, S] */
     new_name = step_expr;
+  else if (LOOP_VINFO_USING_SELECT_VL_P (loop_vinfo))
+    {
+      /* When we're using loop_len produced by SELEC_VL, the non-final
+         iterations are not always processing VF elements. So vectorize
+         induction variable instead of
+
+           _21 = vect_vec_iv_.6_22 + { VF, ... };
+
+         We should generate:
+
+           _35 = .SELECT_VL (ivtmp_33, VF);
+           vect_cst__22 = [vec_duplicate_expr] _35;
+           _21 = vect_vec_iv_.6_22 + vect_cst__22; */
+      gcc_assert (!slp_node);
+      gimple_seq seq = NULL;
+      vec_loop_lens *lens = &LOOP_VINFO_LENS (loop_vinfo);
+      tree len = vect_get_loop_len (loop_vinfo, NULL, lens, 1, vectype, 0, 0);
+      expr = force_gimple_operand (fold_convert (TREE_TYPE (step_expr),
+                                                 unshare_expr (len)),
+                                   &seq, true, NULL_TREE);
+      new_name = gimple_build (&seq, MULT_EXPR, TREE_TYPE (step_expr), expr,
+                               step_expr);
+      gsi_insert_seq_before (&si, seq, GSI_SAME_STMT);
+      step_iv_si = &si;
+    }
   else
     {
       /* iv_loop is the loop to be vectorized. Generate:
@@ -10336,7 +10362,7 @@ vectorizable_induction (loop_vec_info loop_vinfo,
               || TREE_CODE (new_name) == SSA_NAME);
   new_vec = build_vector_from_val (step_vectype, t);
   vec_step = vect_init_vector (loop_vinfo, stmt_info,
-                               new_vec, step_vectype, NULL);
+                               new_vec, step_vectype, step_iv_si);
 
 
   /* Create the following def-use cycle:
@@ -10382,6 +10408,8 @@ vectorizable_induction (loop_vec_info loop_vinfo,
       gimple_seq seq = NULL;
       /* FORNOW. This restriction should be relaxed. */
       gcc_assert (!nested_in_vect_loop);
+      /* We expect LOOP_VINFO_USING_SELECT_VL_P to be false if ncopies > 1. */
+      gcc_assert (!LOOP_VINFO_USING_SELECT_VL_P (loop_vinfo));
 
       /* Create the vector that holds the step of the induction. */
       if (SCALAR_FLOAT_TYPE_P (TREE_TYPE (step_expr)))
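As a rough illustration of point 1 of the description (this is not GCC code), the C sketch below models the vector induction variable as a plain array and compares the VF-based step with the length-based step that the patch generates. `select_vl_model`, `run_strip_mined` and `MODEL_MAX_VF` are hypothetical names, and VF is treated as a fixed integer rather than a POLY_INT_CST. The even-split rule in `select_vl_model` is an assumption, chosen because the RVV 1.0 vsetvl rules allow any vl between ceil(AVL/2) and VLMAX when AVL < 2 * VLMAX, so a core may return 3 rather than 4 for AVL = 6, VLMAX = 4.

```c
#include <assert.h>

/* Upper bound on VF for this model only (hypothetical).  */
#define MODEL_MAX_VF 16

/* Hypothetical SELECT_VL model, not GCC's or any core's definition.
   It makes one choice the RVV 1.0 vsetvl rules allow: take everything
   if it fits, split the last two iterations evenly when
   VF < remaining < 2 * VF, and take VF otherwise.  */
static int
select_vl_model (int remaining, int vf)
{
  if (remaining <= vf)
    return remaining;
  if (remaining < 2 * vf)
    return (remaining + 1) / 2; /* ceil (remaining / 2) */
  return vf;
}

/* Hypothetical scalar model of the vectorized loop
     for (i = 0; i < n; i++)
       out[i] = in[i] + i * s;
   The vector IV starts as { 0, s, 2*s, ... }.  If step_by_len is nonzero
   it is advanced by len * s after each iteration (the update the patch
   generates); otherwise by VF * s (the update reported in PR112438).  */
static void
run_strip_mined (const int *in, int *out, int n, int vf, int s,
                 int step_by_len)
{
  int iv[MODEL_MAX_VF];

  assert (vf > 0 && vf <= MODEL_MAX_VF);
  for (int lane = 0; lane < vf; lane++)
    iv[lane] = lane * s;

  for (int done = 0; done < n;)
    {
      int len = select_vl_model (n - done, vf);

      /* Only the first len lanes are active in this iteration.  */
      for (int lane = 0; lane < len; lane++)
        out[done + lane] = in[done + lane] + iv[lane];

      /* Broadcast step added to every lane of the IV.  */
      int step = (step_by_len ? len : vf) * s;
      for (int lane = 0; lane < vf; lane++)
        iv[lane] += step;

      done += len;
    }
}
```

For example, with n = 6, vf = 4, s = 1 and in[] all zero, the length-based update fills out with {0, 1, 2, 3, 4, 5}, while the VF-based update gives {0, 1, 2, 4, 5, 6}: the second iteration starts from lane values {4, 5, 6} instead of {3, 4, 5}.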