From patchwork Fri Jun 30 02:36:18 2023
X-Patchwork-Submitter: "juzhe.zhong@rivai.ai"
X-Patchwork-Id: 114537
From: juzhe.zhong@rivai.ai
To: gcc-patches@gcc.gnu.org
Cc: richard.sandiford@arm.com, rguenther@suse.de, Ju-Zhe Zhong
Subject: [PATCH V2] Machine Description: Add LEN_MASK_{GATHER_LOAD, SCATTER_STORE} pattern
Date: Fri, 30 Jun 2023 10:36:18 +0800
Message-Id: <20230630023618.3898001-1-juzhe.zhong@rivai.ai>
X-Mailer: git-send-email 2.36.3

From: Ju-Zhe Zhong

Hi, Richi and Richard.

This patch adds LEN_MASK_{GATHER_LOAD,SCATTER_STORE} to allow targets to
handle flow control by mask and loop control by length on gather/scatter
memory operations.

Consider the following case:

#include <stdint.h>
void
f (uint8_t *restrict a,
   uint8_t *restrict b, int n,
   int base, int step,
   int *restrict cond)
{
  for (int i = 0; i < n; ++i)
    {
      if (cond[i])
        a[i * step + base] = b[i * step + base];
    }
}

We hope RVV can vectorize such a case into the following IR:

loop_len = SELECT_VL
control_mask = comparison
v = LEN_MASK_GATHER_LOAD (.., loop_len, control_mask)
LEN_MASK_SCATTER_STORE (... v, ..., loop_len, control_mask)

This patch does not use these patterns in the vectorizer yet; it only adds
the patterns and updates the documentation.  A follow-up patch that uses
them in the vectorizer will be sent soon after this patch is approved.

Thanks.

gcc/ChangeLog:

	* doc/md.texi: Add LEN_MASK_{GATHER_LOAD,SCATTER_STORE}.
	* internal-fn.cc (expand_scatter_store_optab_fn): Ditto.
	(expand_gather_load_optab_fn): Ditto.
	(internal_load_fn_p): Ditto.
	(internal_store_fn_p): Ditto.
	(internal_gather_scatter_fn_p): Ditto.
	(internal_fn_mask_index): Ditto.
	(internal_fn_stored_value_index): Ditto.
	* internal-fn.def (LEN_MASK_GATHER_LOAD): Ditto.
	(LEN_MASK_SCATTER_STORE): Ditto.
	* optabs.def (OPTAB_CD): Ditto.

---
 gcc/doc/md.texi     | 17 +++++++++++++++++
 gcc/internal-fn.cc  | 32 ++++++++++++++++++++++++++++++--
 gcc/internal-fn.def |  8 ++++++--
 gcc/optabs.def      |  2 ++
 4 files changed, 55 insertions(+), 4 deletions(-)

diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index 9648fdc846a..b84aaab7075 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -5040,6 +5040,15 @@ operand 5. Bit @var{i} of the mask is set if element @var{i}
 of the result should be loaded from memory and clear if element @var{i}
 of the result should be set to zero.
 
+@cindex @code{len_mask_gather_load@var{m}@var{n}} instruction pattern
+@item @samp{len_mask_gather_load@var{m}@var{n}}
+Like @samp{gather_load@var{m}@var{n}}, but takes an extra length operand (operand 5)
+as well as a mask operand (operand 6). Similar to len_maskload, the instruction loads
+at most (operand 5) elements from memory.
+Bit @var{i} of the mask is set if element @var{i} of the result should
+be loaded from memory and clear if element @var{i} of the result should be undefined.
+Mask elements @var{i} with i > (operand 5) are ignored.
+
 @cindex @code{scatter_store@var{m}@var{n}} instruction pattern
 @item @samp{scatter_store@var{m}@var{n}}
 Store a vector of mode @var{m} into several distinct memory locations.
@@ -5069,6 +5078,14 @@ Like @samp{scatter_store@var{m}@var{n}}, but takes an extra mask operand as
 operand 5. Bit @var{i} of the mask is set if element @var{i}
 of the result should be stored to memory.
 
+@cindex @code{len_mask_scatter_store@var{m}@var{n}} instruction pattern
+@item @samp{len_mask_scatter_store@var{m}@var{n}}
+Like @samp{scatter_store@var{m}@var{n}}, but takes an extra length operand (operand 5)
+as well as a mask operand (operand 6). The instruction stores at most (operand 5) elements
+of (operand 4) to memory.
+Bit @var{i} of the mask is set if element @var{i} of (operand 4) should be stored.
+Mask elements @var{i} with i > (operand 5) are ignored.
+
 @cindex @code{vec_set@var{m}} instruction pattern
 @item @samp{vec_set@var{m}}
 Set given field in the vector value. Operand 0 is the vector to modify,
diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
index 9017176dc7a..e4b558e33d8 100644
--- a/gcc/internal-fn.cc
+++ b/gcc/internal-fn.cc
@@ -3537,7 +3537,7 @@ expand_scatter_store_optab_fn (internal_fn, gcall *stmt, direct_optab optab)
   HOST_WIDE_INT scale_int = tree_to_shwi (scale);
   rtx rhs_rtx = expand_normal (rhs);
 
-  class expand_operand ops[6];
+  class expand_operand ops[7];
   int i = 0;
   create_address_operand (&ops[i++], base_rtx);
   create_input_operand (&ops[i++], offset_rtx, TYPE_MODE (TREE_TYPE (offset)));
@@ -3546,6 +3546,14 @@ expand_scatter_store_optab_fn (internal_fn, gcall *stmt, direct_optab optab)
   create_input_operand (&ops[i++], rhs_rtx, TYPE_MODE (TREE_TYPE (rhs)));
   if (mask_index >= 0)
     {
+      if (optab == len_mask_scatter_store_optab)
+        {
+          tree len = gimple_call_arg (stmt, mask_index - 1);
+          rtx len_rtx = expand_normal (len);
+          create_convert_operand_from (&ops[i++], len_rtx,
+                                       TYPE_MODE (TREE_TYPE (len)),
+                                       TYPE_UNSIGNED (TREE_TYPE (len)));
+        }
       tree mask = gimple_call_arg (stmt, mask_index);
       rtx mask_rtx = expand_normal (mask);
       create_input_operand (&ops[i++], mask_rtx, TYPE_MODE (TREE_TYPE (mask)));
@@ -3572,7 +3580,7 @@ expand_gather_load_optab_fn (internal_fn, gcall *stmt, direct_optab optab)
   HOST_WIDE_INT scale_int = tree_to_shwi (scale);
 
   int i = 0;
-  class expand_operand ops[6];
+  class expand_operand ops[7];
   create_output_operand (&ops[i++], lhs_rtx, TYPE_MODE (TREE_TYPE (lhs)));
   create_address_operand (&ops[i++], base_rtx);
   create_input_operand (&ops[i++], offset_rtx, TYPE_MODE (TREE_TYPE (offset)));
@@ -3584,6 +3592,17 @@ expand_gather_load_optab_fn (internal_fn, gcall *stmt, direct_optab optab)
       rtx mask_rtx = expand_normal (mask);
       create_input_operand (&ops[i++], mask_rtx, TYPE_MODE (TREE_TYPE (mask)));
     }
+  else if (optab == len_mask_gather_load_optab)
+    {
+      tree len = gimple_call_arg (stmt, 4);
+      rtx len_rtx = expand_normal (len);
+      create_convert_operand_from (&ops[i++], len_rtx,
+                                   TYPE_MODE (TREE_TYPE (len)),
+                                   TYPE_UNSIGNED (TREE_TYPE (len)));
+      tree mask = gimple_call_arg (stmt, 5);
+      rtx mask_rtx = expand_normal (mask);
+      create_input_operand (&ops[i++], mask_rtx, TYPE_MODE (TREE_TYPE (mask)));
+    }
   insn_code icode = convert_optab_handler (optab, TYPE_MODE (TREE_TYPE (lhs)),
                                            TYPE_MODE (TREE_TYPE (offset)));
   expand_insn (icode, i, ops);
@@ -4434,6 +4453,7 @@ internal_load_fn_p (internal_fn fn)
     case IFN_MASK_LOAD_LANES:
     case IFN_GATHER_LOAD:
     case IFN_MASK_GATHER_LOAD:
+    case IFN_LEN_MASK_GATHER_LOAD:
     case IFN_LEN_LOAD:
     case IFN_LEN_MASK_LOAD:
       return true;
@@ -4455,6 +4475,7 @@ internal_store_fn_p (internal_fn fn)
     case IFN_MASK_STORE_LANES:
     case IFN_SCATTER_STORE:
     case IFN_MASK_SCATTER_STORE:
+    case IFN_LEN_MASK_SCATTER_STORE:
     case IFN_LEN_STORE:
     case IFN_LEN_MASK_STORE:
       return true;
@@ -4473,8 +4494,10 @@ internal_gather_scatter_fn_p (internal_fn fn)
     {
     case IFN_GATHER_LOAD:
     case IFN_MASK_GATHER_LOAD:
+    case IFN_LEN_MASK_GATHER_LOAD:
     case IFN_SCATTER_STORE:
     case IFN_MASK_SCATTER_STORE:
+    case IFN_LEN_MASK_SCATTER_STORE:
       return true;
 
     default:
@@ -4504,6 +4527,10 @@ internal_fn_mask_index (internal_fn fn)
     case IFN_LEN_MASK_STORE:
       return 3;
 
+    case IFN_LEN_MASK_GATHER_LOAD:
+    case IFN_LEN_MASK_SCATTER_STORE:
+      return 5;
+
     default:
       return (conditional_internal_fn_code (fn) != ERROR_MARK
               || get_unconditional_internal_fn (fn) != IFN_LAST ? 0 : -1);
@@ -4522,6 +4549,7 @@ internal_fn_stored_value_index (internal_fn fn)
     case IFN_MASK_STORE_LANES:
     case IFN_SCATTER_STORE:
     case IFN_MASK_SCATTER_STORE:
+    case IFN_LEN_MASK_SCATTER_STORE:
     case IFN_LEN_STORE:
       return 3;
 
diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
index bc947c0fde7..5be24decf88 100644
--- a/gcc/internal-fn.def
+++ b/gcc/internal-fn.def
@@ -48,14 +48,14 @@ along with GCC; see the file COPYING3. If not see
    - mask_load: currently just maskload
    - load_lanes: currently just vec_load_lanes
    - mask_load_lanes: currently just vec_mask_load_lanes
-   - gather_load: used for {mask_,}gather_load
+   - gather_load: used for {mask_,len_mask_,}gather_load
    - len_load: currently just len_load
    - len_maskload: currently just len_maskload
 
    - mask_store: currently just maskstore
    - store_lanes: currently just vec_store_lanes
    - mask_store_lanes: currently just vec_mask_store_lanes
-   - scatter_store: used for {mask_,}scatter_store
+   - scatter_store: used for {mask_,len_mask_,}scatter_store
    - len_store: currently just len_store
    - len_maskstore: currently just len_maskstore
 
@@ -157,6 +157,8 @@ DEF_INTERNAL_OPTAB_FN (MASK_LOAD_LANES, ECF_PURE,
 DEF_INTERNAL_OPTAB_FN (GATHER_LOAD, ECF_PURE, gather_load, gather_load)
 DEF_INTERNAL_OPTAB_FN (MASK_GATHER_LOAD, ECF_PURE,
                        mask_gather_load, gather_load)
+DEF_INTERNAL_OPTAB_FN (LEN_MASK_GATHER_LOAD, ECF_PURE,
+                       len_mask_gather_load, gather_load)
 
 DEF_INTERNAL_OPTAB_FN (LEN_LOAD, ECF_PURE, len_load, len_load)
 DEF_INTERNAL_OPTAB_FN (LEN_MASK_LOAD, ECF_PURE, len_maskload, len_maskload)
@@ -164,6 +166,8 @@ DEF_INTERNAL_OPTAB_FN (LEN_MASK_LOAD, ECF_PURE, len_maskload, len_maskload)
 DEF_INTERNAL_OPTAB_FN (SCATTER_STORE, 0, scatter_store, scatter_store)
 DEF_INTERNAL_OPTAB_FN (MASK_SCATTER_STORE, 0,
                        mask_scatter_store, scatter_store)
+DEF_INTERNAL_OPTAB_FN (LEN_MASK_SCATTER_STORE, 0,
+                       len_mask_scatter_store, scatter_store)
 
 DEF_INTERNAL_OPTAB_FN (MASK_STORE, 0, maskstore, mask_store)
 DEF_INTERNAL_OPTAB_FN (STORE_LANES, ECF_CONST, vec_store_lanes, store_lanes)
diff --git a/gcc/optabs.def b/gcc/optabs.def
index 9533eb11565..58933e61817 100644
--- a/gcc/optabs.def
+++ b/gcc/optabs.def
@@ -95,8 +95,10 @@ OPTAB_CD(len_maskload_optab, "len_maskload$a$b")
 OPTAB_CD(len_maskstore_optab, "len_maskstore$a$b")
 OPTAB_CD(gather_load_optab, "gather_load$a$b")
 OPTAB_CD(mask_gather_load_optab, "mask_gather_load$a$b")
+OPTAB_CD(len_mask_gather_load_optab, "len_mask_gather_load$a$b")
 OPTAB_CD(scatter_store_optab, "scatter_store$a$b")
 OPTAB_CD(mask_scatter_store_optab, "mask_scatter_store$a$b")
+OPTAB_CD(len_mask_scatter_store_optab, "len_mask_scatter_store$a$b")
 OPTAB_CD(vec_extract_optab, "vec_extract$a$b")
 OPTAB_CD(vec_init_optab, "vec_init$a$b")
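
For reference, the semantics documented in md.texi for the two new patterns
can be sketched as a scalar model.  This is only an illustration and is not
part of the patch.  The function names and signatures are hypothetical,
elements are assumed to be bytes as in the example loop above, and the mask is
modelled as one byte per lane.  The sketch treats lanes with index >= len as
inactive, which is one reading of "at most (operand 5) elements"; inactive
lanes of the load result are documented as undefined, and the model simply
leaves them untouched.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar model of LEN_MASK_GATHER_LOAD for byte elements.
   Lane I is loaded only if I < LEN and MASK[I] is nonzero.  Inactive
   lanes of DEST are left as they were (the pattern leaves them
   undefined).  */
static void
model_len_mask_gather_load (uint8_t *dest, const uint8_t *base,
                            const int *offsets, int scale,
                            size_t len, const uint8_t *mask)
{
  for (size_t i = 0; i < len; ++i)
    if (mask[i])
      dest[i] = base[(ptrdiff_t) offsets[i] * scale];
}

/* Hypothetical scalar model of LEN_MASK_SCATTER_STORE for byte elements.
   Lane I of SRC is stored only if I < LEN and MASK[I] is nonzero;
   memory addressed by inactive lanes is not written.  */
static void
model_len_mask_scatter_store (uint8_t *base, const int *offsets, int scale,
                              const uint8_t *src, size_t len,
                              const uint8_t *mask)
{
  for (size_t i = 0; i < len; ++i)
    if (mask[i])
      base[(ptrdiff_t) offsets[i] * scale] = src[i];
}
```

In terms of the example loop, len plays the role of loop_len from SELECT_VL,
mask[i] corresponds to cond[i] != 0, and offsets[i] corresponds to
i * step + base for the lanes of the current iteration.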