From patchwork Fri Jun 30 07:06:24 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "juzhe.zhong@rivai.ai"
X-Patchwork-Id: 114595
From: juzhe.zhong@rivai.ai
To: gcc-patches@gcc.gnu.org
Cc: richard.sandiford@arm.com, rguenther@suse.de, rdapp.gcc@gmail.com, Ju-Zhe Zhong
Subject: [PATCH V4] Machine Description: Add LEN_MASK_{GATHER_LOAD, SCATTER_STORE} pattern
Date: Fri, 30 Jun 2023 15:06:24 +0800
Message-Id: <20230630070624.3953527-1-juzhe.zhong@rivai.ai>
X-Mailer: git-send-email 2.36.3

From: Ju-Zhe Zhong

Hi, Richi and Richard.

This patch adds LEN_MASK_{GATHER_LOAD,SCATTER_STORE} to allow targets to
handle flow control by mask and loop control by length on gather/scatter
memory operations.  Consider the following case:

#include <stdint.h>
void
f (uint8_t *restrict a, uint8_t *restrict b, int n,
   int base, int step, int *restrict cond)
{
  for (int i = 0; i < n; ++i)
    {
      if (cond[i])
        a[i * step + base] = b[i * step + base];
    }
}

We hope RVV can vectorize such a case into the following IR:

  loop_len = SELECT_VL
  control_mask = comparison
  v = LEN_MASK_GATHER_LOAD (.., loop_len, control_mask, bias)
  LEN_MASK_SCATTER_STORE (...
    v, ..., loop_len, control_mask, bias)

This patch doesn't apply these patterns in the vectorizer; it only adds the
patterns and updates the documentation.  A follow-up patch that applies
these patterns in the vectorizer will be sent soon after this patch is
approved.

Thanks.

gcc/ChangeLog:

	* doc/md.texi: Add len_mask_gather/scatter.
	* internal-fn.cc (expand_scatter_store_optab_fn): Ditto.
	(expand_gather_load_optab_fn): Ditto.
	(internal_load_fn_p): Ditto.
	(internal_store_fn_p): Ditto.
	(internal_gather_scatter_fn_p): Ditto.
	(internal_fn_len_index): Ditto.
	(internal_fn_stored_value_index): Ditto.
	* internal-fn.def (LEN_MASK_GATHER_LOAD): Ditto.
	(LEN_MASK_SCATTER_STORE): Ditto.
	* internal-fn.h (internal_fn_len_index): Ditto.
	* optabs.def (OPTAB_CD): Ditto.

---
 gcc/doc/md.texi     | 17 ++++++++++++
 gcc/internal-fn.cc  | 67 +++++++++++++++++++++++++++++++++++++++++++--
 gcc/internal-fn.def |  8 ++++--
 gcc/internal-fn.h   |  1 +
 gcc/optabs.def      |  2 ++
 5 files changed, 90 insertions(+), 5 deletions(-)

diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index 9648fdc846a..df41b5251d4 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -5040,6 +5040,15 @@ operand 5.  Bit @var{i} of the mask is set if element @var{i}
 of the result should be loaded from memory and clear if element @var{i}
 of the result should be set to zero.
 
+@cindex @code{len_mask_gather_load@var{m}@var{n}} instruction pattern
+@item @samp{len_mask_gather_load@var{m}@var{n}}
+Like @samp{gather_load@var{m}@var{n}}, but takes an extra length operand (operand 5),
+a mask operand (operand 6) as well as a bias operand (operand 7).  Similar to len_maskload,
+the instruction loads at most (operand 5 + operand 7) elements from memory.
+Bit @var{i} of the mask is set if element @var{i} of the result should
+be loaded from memory and clear if element @var{i} of the result should be undefined.
+Mask elements @var{i} with i > (operand 5) are ignored.
+
 @cindex @code{scatter_store@var{m}@var{n}} instruction pattern
 @item @samp{scatter_store@var{m}@var{n}}
 Store a vector of mode @var{m} into several distinct memory locations.
@@ -5069,6 +5078,14 @@ Like @samp{scatter_store@var{m}@var{n}}, but takes an extra mask operand as
 operand 5.  Bit @var{i} of the mask is set if element @var{i}
 of the result should be stored to memory.
 
+@cindex @code{len_mask_scatter_store@var{m}@var{n}} instruction pattern
+@item @samp{len_mask_scatter_store@var{m}@var{n}}
+Like @samp{scatter_store@var{m}@var{n}}, but takes an extra length operand (operand 5),
+a mask operand (operand 6) as well as a bias operand (operand 7).  The instruction stores
+at most (operand 5 + operand 7) elements of (operand 4) to memory.
+Bit @var{i} of the mask is set if element @var{i} of (operand 4) should be stored.
+Mask elements @var{i} with i > (operand 5) are ignored.
+
 @cindex @code{vec_set@var{m}} instruction pattern
 @item @samp{vec_set@var{m}}
 Set given field in the vector value.  Operand 0 is the vector to modify,
diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
index 9017176dc7a..6401eeeccb9 100644
--- a/gcc/internal-fn.cc
+++ b/gcc/internal-fn.cc
@@ -3537,7 +3537,7 @@ expand_scatter_store_optab_fn (internal_fn, gcall *stmt, direct_optab optab)
   HOST_WIDE_INT scale_int = tree_to_shwi (scale);
   rtx rhs_rtx = expand_normal (rhs);
 
-  class expand_operand ops[6];
+  class expand_operand ops[7];
   int i = 0;
   create_address_operand (&ops[i++], base_rtx);
   create_input_operand (&ops[i++], offset_rtx, TYPE_MODE (TREE_TYPE (offset)));
@@ -3546,9 +3546,23 @@ expand_scatter_store_optab_fn (internal_fn, gcall *stmt, direct_optab optab)
   create_input_operand (&ops[i++], rhs_rtx, TYPE_MODE (TREE_TYPE (rhs)));
   if (mask_index >= 0)
     {
+      if (optab == len_mask_scatter_store_optab)
+	{
+	  tree len = gimple_call_arg (stmt, internal_fn_len_index (ifn));
+	  rtx len_rtx = expand_normal (len);
+	  create_convert_operand_from (&ops[i++], len_rtx,
+				       TYPE_MODE (TREE_TYPE (len)),
+				       TYPE_UNSIGNED (TREE_TYPE (len)));
+	}
       tree mask = gimple_call_arg (stmt, mask_index);
       rtx mask_rtx = expand_normal (mask);
       create_input_operand (&ops[i++], mask_rtx, TYPE_MODE (TREE_TYPE (mask)));
+      if (optab == len_mask_scatter_store_optab)
+	{
+	  tree biast = gimple_call_arg (stmt, gimple_call_num_args (stmt) - 1);
+	  rtx bias = expand_normal (biast);
+	  create_input_operand (&ops[i++], bias, QImode);
+	}
     }
 
   insn_code icode = convert_optab_handler (optab, TYPE_MODE (TREE_TYPE (rhs)),
@@ -3559,7 +3573,7 @@ expand_scatter_store_optab_fn (internal_fn, gcall *stmt, direct_optab optab)
 
 /* Expand {MASK_,}GATHER_LOAD call CALL using optab OPTAB.  */
 static void
-expand_gather_load_optab_fn (internal_fn, gcall *stmt, direct_optab optab)
+expand_gather_load_optab_fn (internal_fn ifn, gcall *stmt, direct_optab optab)
 {
   tree lhs = gimple_call_lhs (stmt);
   tree base = gimple_call_arg (stmt, 0);
@@ -3572,7 +3586,7 @@ expand_gather_load_optab_fn (internal_fn, gcall *stmt, direct_optab optab)
   HOST_WIDE_INT scale_int = tree_to_shwi (scale);
 
   int i = 0;
-  class expand_operand ops[6];
+  class expand_operand ops[7];
   create_output_operand (&ops[i++], lhs_rtx, TYPE_MODE (TREE_TYPE (lhs)));
   create_address_operand (&ops[i++], base_rtx);
   create_input_operand (&ops[i++], offset_rtx, TYPE_MODE (TREE_TYPE (offset)));
@@ -3584,6 +3598,20 @@ expand_gather_load_optab_fn (internal_fn, gcall *stmt, direct_optab optab)
       rtx mask_rtx = expand_normal (mask);
       create_input_operand (&ops[i++], mask_rtx, TYPE_MODE (TREE_TYPE (mask)));
     }
+  else if (optab == len_mask_gather_load_optab)
+    {
+      tree len = gimple_call_arg (stmt, internal_fn_len_index (ifn));
+      rtx len_rtx = expand_normal (len);
+      create_convert_operand_from (&ops[i++], len_rtx,
+				   TYPE_MODE (TREE_TYPE (len)),
+				   TYPE_UNSIGNED (TREE_TYPE (len)));
+      tree mask = gimple_call_arg (stmt, internal_fn_mask_index (ifn));
+      rtx mask_rtx = expand_normal (mask);
+      create_input_operand (&ops[i++], mask_rtx, TYPE_MODE (TREE_TYPE (mask)));
+      tree biast = gimple_call_arg (stmt, gimple_call_num_args (stmt) - 1);
+      rtx bias = expand_normal (biast);
+      create_input_operand (&ops[i++], bias, QImode);
+    }
   insn_code icode = convert_optab_handler (optab, TYPE_MODE (TREE_TYPE (lhs)),
 					   TYPE_MODE (TREE_TYPE (offset)));
   expand_insn (icode, i, ops);
@@ -4434,6 +4462,7 @@ internal_load_fn_p (internal_fn fn)
     case IFN_MASK_LOAD_LANES:
     case IFN_GATHER_LOAD:
     case IFN_MASK_GATHER_LOAD:
+    case IFN_LEN_MASK_GATHER_LOAD:
     case IFN_LEN_LOAD:
     case IFN_LEN_MASK_LOAD:
       return true;
@@ -4455,6 +4484,7 @@ internal_store_fn_p (internal_fn fn)
     case IFN_MASK_STORE_LANES:
     case IFN_SCATTER_STORE:
     case IFN_MASK_SCATTER_STORE:
+    case IFN_LEN_MASK_SCATTER_STORE:
     case IFN_LEN_STORE:
     case IFN_LEN_MASK_STORE:
       return true;
@@ -4473,8 +4503,10 @@ internal_gather_scatter_fn_p (internal_fn fn)
     {
     case IFN_GATHER_LOAD:
     case IFN_MASK_GATHER_LOAD:
+    case IFN_LEN_MASK_GATHER_LOAD:
     case IFN_SCATTER_STORE:
     case IFN_MASK_SCATTER_STORE:
+    case IFN_LEN_MASK_SCATTER_STORE:
       return true;
 
     default:
@@ -4504,6 +4536,34 @@ internal_fn_mask_index (internal_fn fn)
     case IFN_LEN_MASK_STORE:
       return 3;
 
+    case IFN_LEN_MASK_GATHER_LOAD:
+    case IFN_LEN_MASK_SCATTER_STORE:
+      return 5;
+
+    default:
+      return (conditional_internal_fn_code (fn) != ERROR_MARK
+	      || get_unconditional_internal_fn (fn) != IFN_LAST ? 0 : -1);
+    }
+}
+
+/* If FN takes a vector len argument, return the index of that argument,
+   otherwise return -1.  */
+
+int
+internal_fn_len_index (internal_fn fn)
+{
+  switch (fn)
+    {
+    case IFN_LEN_LOAD:
+    case IFN_LEN_STORE:
+    case IFN_LEN_MASK_LOAD:
+    case IFN_LEN_MASK_STORE:
+      return 2;
+
+    case IFN_LEN_MASK_GATHER_LOAD:
+    case IFN_LEN_MASK_SCATTER_STORE:
+      return 4;
+
     default:
       return (conditional_internal_fn_code (fn) != ERROR_MARK
 	      || get_unconditional_internal_fn (fn) != IFN_LAST ? 0 : -1);
@@ -4522,6 +4582,7 @@ internal_fn_stored_value_index (internal_fn fn)
     case IFN_MASK_STORE_LANES:
     case IFN_SCATTER_STORE:
     case IFN_MASK_SCATTER_STORE:
+    case IFN_LEN_MASK_SCATTER_STORE:
     case IFN_LEN_STORE:
       return 3;
 
diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
index bc947c0fde7..5be24decf88 100644
--- a/gcc/internal-fn.def
+++ b/gcc/internal-fn.def
@@ -48,14 +48,14 @@ along with GCC; see the file COPYING3.  If not see
    - mask_load: currently just maskload
    - load_lanes: currently just vec_load_lanes
    - mask_load_lanes: currently just vec_mask_load_lanes
-   - gather_load: used for {mask_,}gather_load
+   - gather_load: used for {mask_,len_mask_,}gather_load
    - len_load: currently just len_load
    - len_maskload: currently just len_maskload
 
    - mask_store: currently just maskstore
    - store_lanes: currently just vec_store_lanes
    - mask_store_lanes: currently just vec_mask_store_lanes
-   - scatter_store: used for {mask_,}scatter_store
+   - scatter_store: used for {mask_,len_mask_,}scatter_store
    - len_store: currently just len_store
    - len_maskstore: currently just len_maskstore
 
@@ -157,6 +157,8 @@ DEF_INTERNAL_OPTAB_FN (MASK_LOAD_LANES, ECF_PURE,
 DEF_INTERNAL_OPTAB_FN (GATHER_LOAD, ECF_PURE, gather_load, gather_load)
 DEF_INTERNAL_OPTAB_FN (MASK_GATHER_LOAD, ECF_PURE,
 		       mask_gather_load, gather_load)
+DEF_INTERNAL_OPTAB_FN (LEN_MASK_GATHER_LOAD, ECF_PURE,
+		       len_mask_gather_load, gather_load)
 
 DEF_INTERNAL_OPTAB_FN (LEN_LOAD, ECF_PURE, len_load, len_load)
 DEF_INTERNAL_OPTAB_FN (LEN_MASK_LOAD, ECF_PURE, len_maskload, len_maskload)
@@ -164,6 +166,8 @@ DEF_INTERNAL_OPTAB_FN (LEN_MASK_LOAD, ECF_PURE, len_maskload, len_maskload)
 DEF_INTERNAL_OPTAB_FN (SCATTER_STORE, 0, scatter_store, scatter_store)
 DEF_INTERNAL_OPTAB_FN (MASK_SCATTER_STORE, 0,
 		       mask_scatter_store, scatter_store)
+DEF_INTERNAL_OPTAB_FN (LEN_MASK_SCATTER_STORE, 0,
+		       len_mask_scatter_store, scatter_store)
 
 DEF_INTERNAL_OPTAB_FN (MASK_STORE, 0, maskstore, mask_store)
 DEF_INTERNAL_OPTAB_FN (STORE_LANES, ECF_CONST, vec_store_lanes, store_lanes)
diff --git a/gcc/internal-fn.h b/gcc/internal-fn.h
index 8f21068e300..4234bbfed87 100644
--- a/gcc/internal-fn.h
+++ b/gcc/internal-fn.h
@@ -234,6 +234,7 @@ extern bool internal_load_fn_p (internal_fn);
 extern bool internal_store_fn_p (internal_fn);
 extern bool internal_gather_scatter_fn_p (internal_fn);
 extern int internal_fn_mask_index (internal_fn);
+extern int internal_fn_len_index (internal_fn);
 extern int internal_fn_stored_value_index (internal_fn);
 extern bool internal_gather_scatter_fn_supported_p (internal_fn, tree,
 						    tree, tree, int);
diff --git a/gcc/optabs.def b/gcc/optabs.def
index 9533eb11565..58933e61817 100644
--- a/gcc/optabs.def
+++ b/gcc/optabs.def
@@ -95,8 +95,10 @@ OPTAB_CD(len_maskload_optab, "len_maskload$a$b")
 OPTAB_CD(len_maskstore_optab, "len_maskstore$a$b")
 OPTAB_CD(gather_load_optab, "gather_load$a$b")
 OPTAB_CD(mask_gather_load_optab, "mask_gather_load$a$b")
+OPTAB_CD(len_mask_gather_load_optab, "len_mask_gather_load$a$b")
 OPTAB_CD(scatter_store_optab, "scatter_store$a$b")
 OPTAB_CD(mask_scatter_store_optab, "mask_scatter_store$a$b")
+OPTAB_CD(len_mask_scatter_store_optab, "len_mask_scatter_store$a$b")
 OPTAB_CD(vec_extract_optab, "vec_extract$a$b")
 OPTAB_CD(vec_init_optab, "vec_init$a$b")
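
To illustrate the operand semantics documented in md.texi above, here is a
minimal scalar sketch in C of what len_mask_gather_load and
len_mask_scatter_store are described as doing, restricted to byte elements.
It is only a reading of the documentation, not code from the patch: the names
model_len_mask_gather_load and model_len_mask_scatter_store, the
one-byte-per-lane mask array, and the choice to leave undefined lanes
untouched are all assumptions made for the sketch.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar model of len_mask_gather_load for uint8_t elements.
   Lane i is loaded only if i < len + bias and mask[i] is nonzero.  All
   other lanes are left untouched here (the documentation leaves them
   undefined).  With 1-byte elements, offset * scale is both a byte
   distance and an element distance.  */
static void
model_len_mask_gather_load (uint8_t *dest, const uint8_t *base,
                            const int32_t *offset, int scale,
                            size_t len, const uint8_t *mask, int bias)
{
  /* Signed arithmetic so that a negative bias cannot wrap around.  */
  ptrdiff_t active = (ptrdiff_t) len + bias;
  for (ptrdiff_t i = 0; i < active; ++i)
    if (mask[i])
      dest[i] = base[(ptrdiff_t) offset[i] * scale];
}

/* Hypothetical scalar model of len_mask_scatter_store: lane i of VALUE is
   stored only if i < len + bias and mask[i] is nonzero.  */
static void
model_len_mask_scatter_store (uint8_t *base, const int32_t *offset,
                              int scale, const uint8_t *value,
                              size_t len, const uint8_t *mask, int bias)
{
  ptrdiff_t active = (ptrdiff_t) len + bias;
  for (ptrdiff_t i = 0; i < active; ++i)
    if (mask[i])
      base[(ptrdiff_t) offset[i] * scale] = value[i];
}
```

For the loop in the commit message, lane i would use the offset
i * step + base with a scale of 1, the mask would be derived from cond[i],
and len would be the loop_len produced by SELECT_VL.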