From patchwork Tue Oct 31 09:59:20 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "juzhe.zhong@rivai.ai"
X-Patchwork-Id: 160053
From: Juzhe-Zhong
To: gcc-patches@gcc.gnu.org
Cc: rguenther@suse.de, jeffreyalaw@gmail.com, richard.sandiford@arm.com, rdapp.gcc@gmail.com, Juzhe-Zhong
Subject: [PATCH V2] OPTABS/IFN: Add mask_len_strided_load/mask_len_strided_store OPTABS/IFN
Date: Tue, 31 Oct 2023 17:59:20 +0800
Message-Id: <20231031095920.3210489-1-juzhe.zhong@rivai.ai>
X-Mailer: git-send-email 2.36.3

As Richard suggested previously, we should support strided load/store in
the loop vectorizer instead of hacking the RISC-V backend.

This patch adds the MASK_LEN_STRIDED LOAD/STORE OPTABS/IFN.

The GIMPLE IR is the same as mask_len_gather_load/mask_len_scatter_store,
but with the vector offset changed into a scalar stride.

We don't add strided_load/strided_store and
mask_strided_load/mask_strided_store, since it's unlikely that RVV will
have such optabs and we can't add patterns that we can't test.

gcc/ChangeLog:

	* doc/md.texi: Add mask_len_strided_load/mask_len_strided_store.
	* internal-fn.cc (internal_load_fn_p): Ditto.
	(internal_strided_fn_p): Ditto.
	(internal_fn_len_index): Ditto.
	(internal_fn_mask_index): Ditto.
	(internal_fn_stored_value_index): Ditto.
	(internal_strided_fn_supported_p): Ditto.
	* internal-fn.def (MASK_LEN_STRIDED_LOAD): Ditto.
	(MASK_LEN_STRIDED_STORE): Ditto.
	* internal-fn.h (internal_strided_fn_p): Ditto.
	(internal_strided_fn_supported_p): Ditto.
	* optabs.def (OPTAB_CD): Ditto.

---
 gcc/doc/md.texi     | 51 +++++++++++++++++++++++++++++++++++++++++++++
 gcc/internal-fn.cc  | 44 ++++++++++++++++++++++++++++++++++++++
 gcc/internal-fn.def |  4 ++++
 gcc/internal-fn.h   |  2 ++
 gcc/optabs.def      |  2 ++
 5 files changed, 103 insertions(+)

diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index fab2513105a..5bac713a0dd 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -5094,6 +5094,32 @@ Bit @var{i} of the mask is set if element @var{i} of the result should
 be loaded from memory and clear if element @var{i} of the result should be undefined.
 Mask elements @var{i} with @var{i} > (operand 6 + operand 7) are ignored.
 
+@cindex @code{mask_len_strided_load@var{m}@var{n}} instruction pattern
+@item @samp{mask_len_strided_load@var{m}@var{n}}
+Load several separate memory locations into a destination vector of mode @var{m}.
+Operand 0 is a destination vector of mode @var{m}.
+Operand 1 is a scalar base address and operand 2 is a scalar stride of mode @var{n}.
+The instruction can be seen as a special case of @code{mask_len_gather_load@var{m}@var{n}}
+with an offset vector that is a @code{vec_series} with operand 1 as base and operand 2 as step.
+For each element index i:
+
+@itemize @bullet
+@item
+extend the stride to address width, using zero
+extension if operand 3 is 1 and sign extension if operand 3 is zero;
+@item
+multiply the extended stride by operand 4;
+@item
+add the result to the base; and
+@item
+load the value at that address (operand 1 + @var{i} * multiplied and extended stride) into element @var{i} of operand 0.
+@end itemize
+
+Similar to mask_len_load, the instruction loads at most (operand 6 + operand 7) elements from memory.
+Bit @var{i} of the mask is set if element @var{i} of the result should
+be loaded from memory and clear if element @var{i} of the result should be undefined.
+Mask elements @var{i} with @var{i} > (operand 6 + operand 7) are ignored.
+
 @cindex @code{scatter_store@var{m}@var{n}} instruction pattern
 @item @samp{scatter_store@var{m}@var{n}}
 Store a vector of mode @var{m} into several distinct memory locations.
@@ -5131,6 +5157,31 @@ at most (operand 6 + operand 7) elements of (operand 4) to memory.
 Bit @var{i} of the mask is set if element @var{i} of (operand 4) should be stored.
 Mask elements @var{i} with @var{i} > (operand 6 + operand 7) are ignored.
 
+@cindex @code{mask_len_strided_store@var{m}@var{n}} instruction pattern
+@item @samp{mask_len_strided_store@var{m}@var{n}}
+Store a vector of mode m into several distinct memory locations.
+Operand 0 is a scalar base address and operand 1 is scalar stride of mode @var{n}.
+Operand 2 is the vector of values that should be stored, which is of mode @var{m}.
+The instruction can be seen as a special case of @code{mask_len_scatter_store@var{m}@var{n}}
+with an offset vector that is a @code{vec_series} with operand 1 as base and operand 2 as step.
+For each element index i:
+
+@itemize @bullet
+@item
+extend the stride to address width, using zero
+extension if operand 2 is 1 and sign extension if operand 2 is zero;
+@item
+multiply the extended stride by operand 3;
+@item
+add the result to the base; and
+@item
+store element @var{i} of operand 4 to that address (operand 1 + @var{i} * multiplied and extended stride).
+@end itemize
+
+Similar to mask_len_store, the instruction stores at most (operand 6 + operand 7) elements of (operand 4) to memory.
+Bit @var{i} of the mask is set if element @var{i} of (operand 4) should be stored.
+Mask elements @var{i} with @var{i} > (operand 6 + operand 7) are ignored.
+
 @cindex @code{vec_set@var{m}} instruction pattern
 @item @samp{vec_set@var{m}}
 Set given field in the vector value. Operand 0 is the vector to modify,
diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
index e7451b96353..f7f85aa7dde 100644
--- a/gcc/internal-fn.cc
+++ b/gcc/internal-fn.cc
@@ -4596,6 +4596,7 @@ internal_load_fn_p (internal_fn fn)
     case IFN_GATHER_LOAD:
     case IFN_MASK_GATHER_LOAD:
     case IFN_MASK_LEN_GATHER_LOAD:
+    case IFN_MASK_LEN_STRIDED_LOAD:
     case IFN_LEN_LOAD:
     case IFN_MASK_LEN_LOAD:
       return true;
@@ -4648,6 +4649,22 @@ internal_gather_scatter_fn_p (internal_fn fn)
     }
 }
 
+/* Return true if IFN is some form of strided load or strided store.  */
+
+bool
+internal_strided_fn_p (internal_fn fn)
+{
+  switch (fn)
+    {
+    case IFN_MASK_LEN_STRIDED_LOAD:
+    case IFN_MASK_LEN_STRIDED_STORE:
+      return true;
+
+    default:
+      return false;
+    }
+}
+
 /* If FN takes a vector len argument, return the index of that argument,
    otherwise return -1.  */
 
@@ -4662,6 +4679,8 @@ internal_fn_len_index (internal_fn fn)
 
     case IFN_MASK_LEN_GATHER_LOAD:
     case IFN_MASK_LEN_SCATTER_STORE:
+    case IFN_MASK_LEN_STRIDED_LOAD:
+    case IFN_MASK_LEN_STRIDED_STORE:
     case IFN_COND_LEN_FMA:
     case IFN_COND_LEN_FMS:
     case IFN_COND_LEN_FNMA:
@@ -4719,6 +4738,8 @@ internal_fn_mask_index (internal_fn fn)
     case IFN_MASK_SCATTER_STORE:
     case IFN_MASK_LEN_GATHER_LOAD:
     case IFN_MASK_LEN_SCATTER_STORE:
+    case IFN_MASK_LEN_STRIDED_LOAD:
+    case IFN_MASK_LEN_STRIDED_STORE:
       return 4;
 
     default:
@@ -4740,6 +4761,7 @@ internal_fn_stored_value_index (internal_fn fn)
     case IFN_SCATTER_STORE:
     case IFN_MASK_SCATTER_STORE:
     case IFN_MASK_LEN_SCATTER_STORE:
+    case IFN_MASK_LEN_STRIDED_STORE:
       return 3;
 
     case IFN_LEN_STORE:
@@ -4784,6 +4806,28 @@ internal_gather_scatter_fn_supported_p (internal_fn ifn, tree vector_type,
           && insn_operand_matches (icode, 3 + output_ops, GEN_INT (scale)));
 }
 
+/* Return true if the target supports strided load or strided store function
+   IFN.  For loads, VECTOR_TYPE is the vector type of the load result,
+   while for stores it is the vector type of the stored data argument.
+   STRIDE_TYPE is the type that holds the stride from the previous element
+   memory address of each loaded or stored element.
+   SCALE is the amount by which these stride should be multiplied
+   *after* they have been extended to address width.  */
+
+bool
+internal_strided_fn_supported_p (internal_fn ifn, tree vector_type,
+                                 tree offset_type, int scale)
+{
+  optab optab = direct_internal_fn_optab (ifn);
+  insn_code icode = convert_optab_handler (optab, TYPE_MODE (vector_type),
+                                           TYPE_MODE (offset_type));
+  int output_ops = internal_load_fn_p (ifn) ? 1 : 0;
+  bool unsigned_p = TYPE_UNSIGNED (offset_type);
+  return (icode != CODE_FOR_nothing
+          && insn_operand_matches (icode, 2 + output_ops, GEN_INT (unsigned_p))
+          && insn_operand_matches (icode, 3 + output_ops, GEN_INT (scale)));
+}
+
 /* Return true if the target supports IFN_CHECK_{RAW,WAR}_PTRS function IFN
    for pointers of type TYPE when the accesses have LENGTH bytes and their
    common byte alignment is ALIGN.  */
diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
index a2023ab9c3d..0fa532e8f6b 100644
--- a/gcc/internal-fn.def
+++ b/gcc/internal-fn.def
@@ -199,6 +199,8 @@ DEF_INTERNAL_OPTAB_FN (MASK_GATHER_LOAD, ECF_PURE,
                        mask_gather_load, gather_load)
 DEF_INTERNAL_OPTAB_FN (MASK_LEN_GATHER_LOAD, ECF_PURE,
                        mask_len_gather_load, gather_load)
+DEF_INTERNAL_OPTAB_FN (MASK_LEN_STRIDED_LOAD, ECF_PURE,
+                       mask_len_strided_load, gather_load)
 
 DEF_INTERNAL_OPTAB_FN (LEN_LOAD, ECF_PURE, len_load, len_load)
 DEF_INTERNAL_OPTAB_FN (MASK_LEN_LOAD, ECF_PURE, mask_len_load, mask_len_load)
@@ -208,6 +210,8 @@ DEF_INTERNAL_OPTAB_FN (MASK_SCATTER_STORE, 0,
                        mask_scatter_store, scatter_store)
 DEF_INTERNAL_OPTAB_FN (MASK_LEN_SCATTER_STORE, 0,
                        mask_len_scatter_store, scatter_store)
+DEF_INTERNAL_OPTAB_FN (MASK_LEN_STRIDED_STORE, 0,
+                       mask_len_strided_store, scatter_store)
 
 DEF_INTERNAL_OPTAB_FN (MASK_STORE, 0, maskstore, mask_store)
 DEF_INTERNAL_OPTAB_FN (STORE_LANES, ECF_CONST, vec_store_lanes, store_lanes)
diff --git a/gcc/internal-fn.h b/gcc/internal-fn.h
index 99de13a0199..8379c61dff7 100644
--- a/gcc/internal-fn.h
+++ b/gcc/internal-fn.h
@@ -235,11 +235,13 @@ extern bool can_interpret_as_conditional_op_p (gimple *, tree *,
 extern bool internal_load_fn_p (internal_fn);
 extern bool internal_store_fn_p (internal_fn);
 extern bool internal_gather_scatter_fn_p (internal_fn);
+extern bool internal_strided_fn_p (internal_fn);
 extern int internal_fn_mask_index (internal_fn);
 extern int internal_fn_len_index (internal_fn);
 extern int internal_fn_stored_value_index (internal_fn);
 extern bool internal_gather_scatter_fn_supported_p (internal_fn, tree,
                                                     tree, tree, int);
+extern bool internal_strided_fn_supported_p (internal_fn, tree, tree, int);
 extern bool internal_check_ptrs_fn_supported_p (internal_fn, tree,
                                                 poly_uint64, unsigned int);
 #define VECT_PARTIAL_BIAS_UNSUPPORTED 127
diff --git a/gcc/optabs.def b/gcc/optabs.def
index 2ccbe4197b7..3d85ac5f678 100644
--- a/gcc/optabs.def
+++ b/gcc/optabs.def
@@ -98,9 +98,11 @@ OPTAB_CD(mask_len_store_optab, "mask_len_store$a$b")
 OPTAB_CD(gather_load_optab, "gather_load$a$b")
 OPTAB_CD(mask_gather_load_optab, "mask_gather_load$a$b")
 OPTAB_CD(mask_len_gather_load_optab, "mask_len_gather_load$a$b")
+OPTAB_CD(mask_len_strided_load_optab, "mask_len_strided_load$a$b")
 OPTAB_CD(scatter_store_optab, "scatter_store$a$b")
 OPTAB_CD(mask_scatter_store_optab, "mask_scatter_store$a$b")
 OPTAB_CD(mask_len_scatter_store_optab, "mask_len_scatter_store$a$b")
+OPTAB_CD(mask_len_strided_store_optab, "mask_len_strided_store$a$b")
 OPTAB_CD(vec_extract_optab, "vec_extract$a$b")
 OPTAB_CD(vec_init_optab, "vec_init$a$b")
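
Editorial note, not part of the patch: the per-element behaviour that the
md.texi hunk describes for mask_len_strided_load can be sketched as a scalar
C model. This is only one reading of the documentation text in this patch,
not code from GCC. The function name ref_mask_len_strided_load, its parameter
layout, and the fixed 32-bit element and stride types are assumptions made
for the sketch. Inactive elements are left untouched here, whereas the
documentation says they are undefined.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar reference model of the documented semantics.
   Nothing here is a GCC API; all names are illustrative assumptions.

   DEST        destination "vector" of NUNITS 32-bit elements (operand 0)
   BASE        scalar base address (operand 1)
   STRIDE      scalar stride before extension (operand 2)
   ZERO_EXTEND true for zero extension, false for sign extension (operand 3)
   SCALE       multiplier applied after extension (operand 4)
   MASK        per-element mask (operand 5)
   LEN, BIAS   at most LEN + BIAS elements are loaded (operands 6 and 7)  */
void
ref_mask_len_strided_load (int32_t *dest, size_t nunits, const char *base,
                           int32_t stride, bool zero_extend, int64_t scale,
                           const bool *mask, size_t len, int bias)
{
  assert (dest != NULL && base != NULL && mask != NULL);

  /* Extend the stride to address width, then multiply by the scale.  */
  int64_t extended = zero_extend ? (int64_t) (uint32_t) stride
                                 : (int64_t) stride;
  int64_t step = extended * scale;

  /* Elements at or beyond LEN + BIAS are never accessed.  */
  int64_t limit = (int64_t) len + bias;

  for (size_t i = 0; i < nunits; i++)
    {
      if ((int64_t) i >= limit || !mask[i])
        continue; /* Inactive element: left untouched in this model.  */

      /* Address of element I is BASE + I * STEP (in bytes).  */
      memcpy (&dest[i], base + (int64_t) i * step, sizeof dest[i]);
    }
}
```

Under these assumptions, a call with stride 2, scale 4, len 4 and bias 0 over
an int32_t array reads every second array element into the lanes whose mask
bit is set, and leaves lanes 4 and above alone.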