From patchwork Wed May 10 13:05:43 2023
X-Patchwork-Submitter: "juzhe.zhong@rivai.ai"
X-Patchwork-Id: 92080
From: juzhe.zhong@rivai.ai
To: gcc-patches@gcc.gnu.org
Cc: kito.cheng@gmail.com, palmer@dabbelt.com, jeffreyalaw@gmail.com,
 rdapp.gcc@gmail.com, Juzhe-Zhong
Subject: [PATCH] RISC-V: Add basic vec_init support for RVV auto-vectorization
Date: Wed, 10 May 2023 21:05:43 +0800
Message-Id: <20230510130543.4026214-1-juzhe.zhong@rivai.ai>
X-Mailer: git-send-email 2.36.3
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: 7bit

From: Juzhe-Zhong

This patch adds basic vec_init support for RVV auto-vectorization.
Testing is ongoing.

The patch implements a general vec_init path that inserts elements with
vslide1down.  It can handle any vector initialization, but it is not
optimal for every case.

This patch also supports the Case 1 optimization, where a repeating
sequence such as { a, b, a, b } is merged into one wider element and
broadcast:
https://godbolt.org/z/GzYsTEfqx

#include <stdint.h>

typedef int8_t vnx16qi __attribute__((vector_size (16)));

__attribute__((noipa)) void
foo(int8_t a, int8_t b, int8_t c, int8_t *out)
{
  vnx16qi v = { a, b, a, b, a, b, a, b, a, b, a, b, a, b, a, b };
  *(vnx16qi*) out = v;
}

LLVM codegen:
foo:                                    # @foo
        lui     a3, 5
        addiw   a3, a3, 1365
        vsetivli        zero, 1, e16, mf4, ta, ma
        vmv.s.x v0, a3
        vsetivli        zero, 16, e8, m1, ta, ma
        vmv.v.x v8, a1
        vmerge.vxm      v8, v8, a0, v0
        vse8.v  v8, (a2)
        ret

This patch codegen:
foo:
        slli    a0,a0,8
        or      a0,a0,a1
        vsetvli a5,zero,e16,m1,ta,ma
        vmv.v.x v1,a0
        vs1r.v  v1,0(a2)
        ret

More optimization cases will be supported in the future, but they are not
included in this patch.

gcc/ChangeLog:

	* config/riscv/autovec.md (vec_init): New pattern.
	* config/riscv/riscv-protos.h (expand_vec_init): New function.
	* config/riscv/riscv-v.cc (class rvv_builder): New class.
	(rvv_builder::can_duplicate_repeating_sequence_p): New function.
	(rvv_builder::get_merged_repeating_sequence): Ditto.
	(expand_vector_init_insert_elems): Ditto.
	(expand_vec_init): Ditto.
	* config/riscv/vector-iterators.md: New attribute.
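
For illustration only, here is a scalar sketch of how Case 1 merges one
repeat of the sequence into a single wide integer.  It follows the shift
amounts used by get_merged_repeating_sequence in this patch, but the
helper name merge_repeating_bytes is hypothetical and the sketch assumes
unsigned 8-bit elements so that no sign extension is involved.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not part of the patch: merge NPATTERNS 8-bit
   elements into one integer.  elts[0] lands in the most significant
   used byte and elts[npatterns - 1] in the least significant byte,
   mirroring the shifts in get_merged_repeating_sequence.  */
uint64_t
merge_repeating_bytes (const uint8_t *elts, unsigned int npatterns)
{
  /* The merged element must fit in 64 bits (MAX_DUP_SIZE in the patch).  */
  assert (npatterns >= 1 && npatterns <= 8);

  /* Start from the last element of the sequence.  */
  uint64_t target = elts[npatterns - 1];

  /* OR in the earlier elements, each shifted above the ones after it.  */
  for (unsigned int i = 0; i + 1 < npatterns; i++)
    {
      unsigned int loc = 8 * (npatterns - 1 - i);
      target |= (uint64_t) elts[i] << loc;
    }
  return target;
}
```

With the sequence { a, b } this computes (a << 8) | b, which is the value
built by the slli/or pair in the codegen above.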
---
 gcc/config/riscv/autovec.md          |  16 ++++
 gcc/config/riscv/riscv-protos.h      |   1 +
 gcc/config/riscv/riscv-v.cc          | 127 +++++++++++++++++++++++++++
 gcc/config/riscv/vector-iterators.md |   9 ++
 4 files changed, 153 insertions(+)

diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index 99dc4f046b0..fb57a52a4b6 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -82,3 +82,19 @@
     DONE;
   }
 )
+
+;; -------------------------------------------------------------------------
+;; ---- [INT,FP] Initialize from individual elements
+;; -------------------------------------------------------------------------
+;; This is the pattern to initialize the vector
+;; -------------------------------------------------------------------------
+
+(define_expand "vec_init"
+  [(match_operand:V 0 "register_operand")
+   (match_operand 1 "")]
+  "TARGET_VECTOR"
+  {
+    riscv_vector::expand_vec_init (operands[0], operands[1]);
+    DONE;
+  }
+)
diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index e8a728ae226..7196e34e335 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -220,6 +220,7 @@ void expand_tuple_move (machine_mode, rtx *);
 machine_mode preferred_simd_mode (scalar_mode);
 opt_machine_mode get_mask_mode (machine_mode);
 void expand_vec_series (rtx, rtx, rtx);
+void expand_vec_init (rtx, rtx);
 }
 
 /* We classify builtin types into two classes:
diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index 0c3b1b4c40b..9ab6d7d5f41 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -1091,4 +1091,131 @@ preferred_simd_mode (scalar_mode mode)
   return word_mode;
 }
 
+class rvv_builder : public rtx_vector_builder
+{
+public:
+  static const uint8_t MAX_DUP_SIZE = 64;
+
+  rvv_builder () : rtx_vector_builder () {}
+  rvv_builder (machine_mode mode, unsigned int npatterns,
+               unsigned int nelts_per_pattern)
+    : rtx_vector_builder (mode, npatterns, nelts_per_pattern)
+  {
+    m_inner_mode = GET_MODE_INNER (mode);
+    m_inner_size = GET_MODE_BITSIZE (m_inner_mode).to_constant ();
+  }
+
+  bool can_duplicate_repeating_sequence_p ();
+  rtx get_merged_repeating_sequence ();
+
+  machine_mode new_mode () const { return m_new_mode; }
+
+private:
+  machine_mode m_inner_mode;
+  machine_mode m_new_mode;
+  scalar_int_mode m_new_inner_mode;
+  unsigned int m_inner_size;
+};
+
+/* Return true if the vector can be duplicated by a super element which is the
+   fusion of consecutive elements.
+
+     v = { a, b, a, b } super element = ab, v = { ab, ab }  */
+bool
+rvv_builder::can_duplicate_repeating_sequence_p ()
+{
+  poly_uint64 new_size = exact_div (full_nelts (), npatterns ());
+  unsigned int new_inner_size = m_inner_size * npatterns ();
+  if (!int_mode_for_size (new_inner_size, 0).exists (&m_new_inner_mode)
+      || GET_MODE_BITSIZE (m_new_inner_mode) > MAX_DUP_SIZE
+      || !get_vector_mode (m_new_inner_mode, new_size).exists (&m_new_mode))
+    return false;
+  return repeating_sequence_p (0, encoded_nelts (), npatterns ());
+}
+
+/* Merge the repeating sequence into a single element and return the RTX.  */
+rtx
+rvv_builder::get_merged_repeating_sequence ()
+{
+  scalar_int_mode mode = m_new_inner_mode;
+  if (GET_MODE_SIZE (m_new_inner_mode) < UNITS_PER_WORD)
+    mode = Pmode;
+  rtx target = gen_reg_rtx (mode);
+  emit_move_insn (gen_lowpart (m_inner_mode, target), elt (npatterns () - 1));
+  /* { a, b, a, b }: Generate duplicate element = a << bits | b.  */
+  for (unsigned int i = 0; i < npatterns () - 1; i++)
+    {
+      unsigned int loc = m_inner_size * (npatterns () - 1 - i);
+      rtx shift = GEN_INT (loc);
+      rtx tmp = expand_simple_binop (mode, ASHIFT, gen_lowpart (mode, elt (i)),
+                                     shift, NULL_RTX, false, OPTAB_DIRECT);
+      rtx tmp2 = expand_simple_binop (mode, IOR, tmp, target, NULL_RTX, false,
+                                      OPTAB_DIRECT);
+      emit_move_insn (target, tmp2);
+    }
+  if (GET_MODE_SIZE (m_new_inner_mode) < UNITS_PER_WORD)
+    return gen_lowpart (m_new_inner_mode, target);
+  return target;
+}
+
+/* Subroutine of expand_vec_init.
+   Works as follows:
+   (a) Initialize TARGET by broadcasting element 0 of BUILDER.
+   (b) Skip leading elements from BUILDER, which are the same as
+       element 0.
+   (c) Insert the remaining elements in order in TARGET using vslide1down.  */
+
+static void
+expand_vector_init_insert_elems (rtx target, const rvv_builder &builder,
+                                 int nelts_reqd)
+{
+  machine_mode mode = GET_MODE (target);
+  scalar_mode elem_mode = GET_MODE_INNER (mode);
+  machine_mode mask_mode;
+  gcc_assert (get_mask_mode (mode).exists (&mask_mode));
+  rtx dup = expand_vector_broadcast (mode, builder.elt (0));
+  emit_move_insn (target, dup);
+  int ndups = builder.count_dups (0, nelts_reqd - 1, 1);
+  for (int i = ndups; i < nelts_reqd; i++)
+    {
+      unsigned int unspec
+        = FLOAT_MODE_P (mode) ? UNSPEC_VFSLIDE1DOWN : UNSPEC_VSLIDE1DOWN;
+      insn_code icode = code_for_pred_slide (unspec, mode);
+      rtx ops[3] = {target, target, builder.elt (i)};
+      emit_binop (icode, ops, mask_mode, elem_mode);
+    }
+}
+
+/* Initialize register TARGET from the elements in PARALLEL rtx VALS.  */
+
+void
+expand_vec_init (rtx target, rtx vals)
+{
+  machine_mode mode = GET_MODE (target);
+  int nelts = XVECLEN (vals, 0);
+
+  rvv_builder v (mode, nelts, 1);
+  for (int i = 0; i < nelts; i++)
+    v.quick_push (XVECEXP (vals, 0, i));
+  v.finalize ();
+
+  if (nelts > 3)
+    {
+      /* Case 1: Convert v = { a, b, a, b } into v = { ab, ab }.  */
+      if (v.can_duplicate_repeating_sequence_p ())
+        {
+          rtx ele = v.get_merged_repeating_sequence ();
+          rtx dup = expand_vector_broadcast (v.new_mode (), ele);
+          emit_move_insn (target, gen_lowpart (mode, dup));
+          return;
+        }
+      /* TODO: Support more vector initialization cases in the future.  */
+    }
+
+  /* Handle the general case with vslide1down.  This path can handle any
+     vec_init; only the cases that are not optimized above fall through
+     to it.  */
+  expand_vector_init_insert_elems (target, v, nelts);
+}
+
 } // namespace riscv_vector
diff --git a/gcc/config/riscv/vector-iterators.md b/gcc/config/riscv/vector-iterators.md
index 29c9d77674b..19ea016cb6d 100644
--- a/gcc/config/riscv/vector-iterators.md
+++ b/gcc/config/riscv/vector-iterators.md
@@ -996,6 +996,15 @@
   (VNx1DF "DF") (VNx2DF "DF") (VNx4DF "DF") (VNx8DF "DF") (VNx16DF "DF")
 ])
 
+(define_mode_attr vel [
+  (VNx1QI "qi") (VNx2QI "qi") (VNx4QI "qi") (VNx8QI "qi") (VNx16QI "qi") (VNx32QI "qi") (VNx64QI "qi") (VNx128QI "qi")
+  (VNx1HI "hi") (VNx2HI "hi") (VNx4HI "hi") (VNx8HI "hi") (VNx16HI "hi") (VNx32HI "hi") (VNx64HI "hi")
+  (VNx1SI "si") (VNx2SI "si") (VNx4SI "si") (VNx8SI "si") (VNx16SI "si") (VNx32SI "si")
+  (VNx1DI "di") (VNx2DI "di") (VNx4DI "di") (VNx8DI "di") (VNx16DI "di")
+  (VNx1SF "sf") (VNx2SF "sf") (VNx4SF "sf") (VNx8SF "sf") (VNx16SF "sf") (VNx32SF "sf")
+  (VNx1DF "df") (VNx2DF "df") (VNx4DF "df") (VNx8DF "df") (VNx16DF "df")
+])
+
 (define_mode_attr VSUBEL [
   (VNx1HI "QI") (VNx2HI "QI") (VNx4HI "QI") (VNx8HI "QI") (VNx16HI "QI") (VNx32HI "QI") (VNx64HI "QI")
   (VNx1SI "HI") (VNx2SI "HI") (VNx4SI "HI") (VNx8SI "HI") (VNx16SI "HI") (VNx32SI "HI")
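
For readers who want to see why the vslide1down path in
expand_vector_init_insert_elems yields the requested vector, the
following scalar model may help.  It is only a sketch under stated
assumptions: slide1down and init_by_slide1down are hypothetical names,
plain int arrays stand in for vector registers, and the duplicate count
is written to mirror how the patch calls count_dups (0, nelts_reqd - 1, 1),
that is, without ever inspecting the last element.

```c
#include <assert.h>
#include <stddef.h>

/* Scalar model of vslide1down: every element moves one position toward
   index 0 and X becomes the last element.  */
void
slide1down (int *v, size_t n, int x)
{
  assert (n > 0);
  for (size_t i = 0; i + 1 < n; i++)
    v[i] = v[i + 1];
  v[n - 1] = x;
}

/* Hypothetical model of the fallback: broadcast ELTS[0], skip the leading
   elements equal to it, then slide in the remaining elements in order.
   Returns the number of slides performed.  */
size_t
init_by_slide1down (int *target, const int *elts, size_t n)
{
  assert (n > 0);

  /* (a) Broadcast element 0.  */
  for (size_t i = 0; i < n; i++)
    target[i] = elts[0];

  /* (b) Count leading duplicates of element 0, never looking at the
     last element.  */
  size_t ndups = 1;
  while (ndups + 1 < n && elts[ndups] == elts[0])
    ndups++;

  /* (c) Slide in the remaining elements; the broadcast copies that are
     not pushed out end up at the front.  */
  size_t slides = 0;
  for (size_t i = ndups; i < n; i++, slides++)
    slide1down (target, n, elts[i]);
  return slides;
}
```

For example, with elts = { 7, 7, 3, 9 } the model broadcasts 7, skips one
duplicate, and performs two slides (3, then 9), leaving target equal to
elts.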