From patchwork Fri Aug 11 06:49:10 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "juzhe.zhong@rivai.ai"
X-Patchwork-Id: 134338
From: Juzhe-Zhong
To: gcc-patches@gcc.gnu.org
Cc: richard.sandiford@arm.com, rguenther@suse.de, Juzhe-Zhong
Subject: [PATCH] VECT: Add vec_mask_len_{load_lanes,store_lanes} patterns
Date: Fri, 11 Aug 2023 14:49:10 +0800
Message-Id: <20230811064910.525721-1-juzhe.zhong@rivai.ai>
X-Mailer: git-send-email 2.36.3
List-Id: Gcc-patches mailing list

This patch adds the vec_mask_len_{load_lanes,store_lanes} autovectorization
patterns.

We want to support the following autovectorization:

#include <stdint.h>

void
foo (int8_t *__restrict a, int8_t *__restrict b, int8_t *__restrict cond, int n)
{
  for (intptr_t i = 0; i < n; ++i)
    {
      if (cond[i])
        a[i] = b[i * 2] + b[i * 2 + 1];
    }
}

ARM SVE IR:

https://godbolt.org/z/cro1Eqc6a

  # loop_mask_60 = PHI ...
  mask__39.12_63 = vect__3.11_61 != { 0, ... };
  vec_mask_and_66 = loop_mask_60 & mask__39.12_63;
  ...
  vect_array.15 = .MASK_LOAD_LANES (_57, 8B, vec_mask_and_66);
  ...

For RVV, we would like to see IR:

  loop_len = SELECT_VL;
  ...
  mask__39.12_63 = vect__3.11_61 != { 0, ... };
  ...
  vect_array.15 = .MASK_LEN_LOAD_LANES (_57, 8B, mask__39.12_63, loop_len, bias);
  ...

Bootstrap and regression tests on x86 passed.

OK for trunk?

gcc/ChangeLog:

	* doc/md.texi: Add vec_mask_len_{load_lanes,store_lanes} patterns.
	* internal-fn.cc (expand_partial_load_optab_fn): Ditto.
	(expand_partial_store_optab_fn): Ditto.
	* internal-fn.def (MASK_LEN_LOAD_LANES): Ditto.
	(MASK_LEN_STORE_LANES): Ditto.
	* optabs.def (OPTAB_CD): Ditto.

---
 gcc/doc/md.texi     | 34 ++++++++++++++++++++++++++++++++++
 gcc/internal-fn.cc  |  6 ++++--
 gcc/internal-fn.def |  6 ++++++
 gcc/optabs.def      |  2 ++
 4 files changed, 46 insertions(+), 2 deletions(-)

diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index 9693b6bfe79..70590e68ffe 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -4978,6 +4978,23 @@ for (j = 0; j < GET_MODE_NUNITS (@var{n}); j++)
 
 This pattern is not allowed to @code{FAIL}.
 
+@cindex @code{vec_mask_len_load_lanes@var{m}@var{n}} instruction pattern
+@item @samp{vec_mask_len_load_lanes@var{m}@var{n}}
+Like @samp{vec_load_lanes@var{m}@var{n}}, but takes an additional
+mask operand (operand 2), length operand (operand 3) as well as bias operand (operand 4)
+that specifies which elements of the destination vectors should be loaded.
+Other elements of the destination vectors are undefined. The operation is equivalent to:
+
+@smallexample
+int c = GET_MODE_SIZE (@var{m}) / GET_MODE_SIZE (@var{n});
+for (j = 0; j < operand3 + operand4; j++)
+  if (operand2[j])
+    for (i = 0; i < c; i++)
+      operand0[i][j] = operand1[j * c + i];
+@end smallexample
+
+This pattern is not allowed to @code{FAIL}.
+
 @cindex @code{vec_store_lanes@var{m}@var{n}} instruction pattern
 @item @samp{vec_store_lanes@var{m}@var{n}}
 Equivalent to @samp{vec_load_lanes@var{m}@var{n}}, with the memory
@@ -5011,6 +5028,23 @@ for (j = 0; j < GET_MODE_NUNITS (@var{n}); j++)
 
 This pattern is not allowed to @code{FAIL}.
 
+@cindex @code{vec_mask_len_store_lanes@var{m}@var{n}} instruction pattern
+@item @samp{vec_mask_len_store_lanes@var{m}@var{n}}
+Like @samp{vec_store_lanes@var{m}@var{n}}, but takes an additional
+mask operand (operand 2), length operand (operand 3) as well as bias operand (operand 4)
+that specifies which elements of the source vectors should be stored.
+The operation is equivalent to:
+
+@smallexample
+int c = GET_MODE_SIZE (@var{m}) / GET_MODE_SIZE (@var{n});
+for (j = 0; j < operand3 + operand4; j++)
+  if (operand2[j])
+    for (i = 0; i < c; i++)
+      operand0[j * c + i] = operand1[i][j];
+@end smallexample
+
+This pattern is not allowed to @code{FAIL}.
+
 @cindex @code{gather_load@var{m}@var{n}} instruction pattern
 @item @samp{gather_load@var{m}@var{n}}
 Load several separate memory locations into a vector of mode @var{m}.
diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
index 7f5ede00c02..4f2b20a79e5 100644
--- a/gcc/internal-fn.cc
+++ b/gcc/internal-fn.cc
@@ -2931,7 +2931,8 @@ expand_partial_load_optab_fn (internal_fn ifn, gcall *stmt, convert_optab optab)
   type = TREE_TYPE (lhs);
   rhs = expand_call_mem_ref (type, stmt, 0);
 
-  if (optab == vec_mask_load_lanes_optab)
+  if (optab == vec_mask_load_lanes_optab
+      || optab == vec_mask_len_load_lanes_optab)
     icode = get_multi_vector_move (type, optab);
   else if (optab == len_load_optab)
     icode = direct_optab_handler (optab, TYPE_MODE (type));
@@ -2973,7 +2974,8 @@ expand_partial_store_optab_fn (internal_fn ifn, gcall *stmt, convert_optab optab
   type = TREE_TYPE (rhs);
   lhs = expand_call_mem_ref (type, stmt, 0);
 
-  if (optab == vec_mask_store_lanes_optab)
+  if (optab == vec_mask_store_lanes_optab
+      || optab == vec_mask_len_store_lanes_optab)
     icode = get_multi_vector_move (type, optab);
   else if (optab == len_store_optab)
     icode = direct_optab_handler (optab, TYPE_MODE (type));
diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
index b3c410f4b6a..a04d2b36319 100644
--- a/gcc/internal-fn.def
+++ b/gcc/internal-fn.def
@@ -50,6 +50,7 @@ along with GCC; see the file COPYING3. If not see
    - mask_load: currently just maskload
    - load_lanes: currently just vec_load_lanes
    - mask_load_lanes: currently just vec_mask_load_lanes
+   - mask_len_load_lanes: currently just vec_mask_len_load_lanes
    - gather_load: used for {mask_,mask_len_,}gather_load
    - len_load: currently just len_load
    - mask_len_load: currently just mask_len_load
@@ -57,6 +58,7 @@ along with GCC; see the file COPYING3. If not see
    - mask_store: currently just maskstore
    - store_lanes: currently just vec_store_lanes
    - mask_store_lanes: currently just vec_mask_store_lanes
+   - mask_len_store_lanes: currently just vec_mask_len_store_lanes
    - scatter_store: used for {mask_,mask_len_,}scatter_store
    - len_store: currently just len_store
    - mask_len_store: currently just mask_len_store
@@ -188,6 +190,8 @@ DEF_INTERNAL_OPTAB_FN (MASK_LOAD, ECF_PURE, maskload, mask_load)
 DEF_INTERNAL_OPTAB_FN (LOAD_LANES, ECF_CONST, vec_load_lanes, load_lanes)
 DEF_INTERNAL_OPTAB_FN (MASK_LOAD_LANES, ECF_PURE,
 		       vec_mask_load_lanes, mask_load_lanes)
+DEF_INTERNAL_OPTAB_FN (MASK_LEN_LOAD_LANES, ECF_PURE,
+		       vec_mask_len_load_lanes, mask_load_lanes)
 
 DEF_INTERNAL_OPTAB_FN (GATHER_LOAD, ECF_PURE, gather_load, gather_load)
 DEF_INTERNAL_OPTAB_FN (MASK_GATHER_LOAD, ECF_PURE,
@@ -208,6 +212,8 @@ DEF_INTERNAL_OPTAB_FN (MASK_STORE, 0, maskstore, mask_store)
 DEF_INTERNAL_OPTAB_FN (STORE_LANES, ECF_CONST, vec_store_lanes, store_lanes)
 DEF_INTERNAL_OPTAB_FN (MASK_STORE_LANES, 0,
 		       vec_mask_store_lanes, mask_store_lanes)
+DEF_INTERNAL_OPTAB_FN (MASK_LEN_STORE_LANES, 0,
+		       vec_mask_len_store_lanes, mask_store_lanes)
 
 DEF_INTERNAL_OPTAB_FN (VCOND, ECF_CONST | ECF_NOTHROW, vcond, vec_cond)
 DEF_INTERNAL_OPTAB_FN (VCONDU, ECF_CONST | ECF_NOTHROW, vcondu, vec_cond)
diff --git a/gcc/optabs.def b/gcc/optabs.def
index 1ea1947b3b5..d4d7d6c53d4 100644
--- a/gcc/optabs.def
+++ b/gcc/optabs.def
@@ -82,6 +82,8 @@ OPTAB_CD(vec_load_lanes_optab, "vec_load_lanes$a$b")
 OPTAB_CD(vec_store_lanes_optab, "vec_store_lanes$a$b")
 OPTAB_CD(vec_mask_load_lanes_optab, "vec_mask_load_lanes$a$b")
 OPTAB_CD(vec_mask_store_lanes_optab, "vec_mask_store_lanes$a$b")
+OPTAB_CD(vec_mask_len_load_lanes_optab, "vec_mask_len_load_lanes$a$b")
+OPTAB_CD(vec_mask_len_store_lanes_optab, "vec_mask_len_store_lanes$a$b")
 OPTAB_CD(vcond_optab, "vcond$a$b")
 OPTAB_CD(vcondu_optab, "vcondu$a$b")
 OPTAB_CD(vcondeq_optab, "vcondeq$a$b")
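
For reference, the semantics documented in the two md.texi hunks can be modelled in plain scalar C. This is only a sketch under stated assumptions and is not part of the patch: the function names model_mask_len_load_lanes and model_mask_len_store_lanes are hypothetical, elements are fixed to int8_t, the vector array is flattened so that vector i, lane j lives at index i * nunits + j, and the caller is assumed to guarantee len + bias <= nunits. Lanes that the pattern leaves undefined are simply left untouched here.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar model of vec_mask_len_load_lanes (not from the patch).
   dst:  c vectors of nunits lanes each, flattened as dst[i * nunits + j].
   src:  interleaved memory, element j * c + i belongs to vector i, lane j.
   Only lanes j < len + bias with mask[j] != 0 are loaded.
   Assumption: len + bias <= nunits.  */
void
model_mask_len_load_lanes (int8_t *dst, const int8_t *src,
                           const uint8_t *mask, size_t c, size_t nunits,
                           ptrdiff_t len, ptrdiff_t bias)
{
  for (ptrdiff_t j = 0; j < len + bias; j++)
    if (mask[j])
      for (size_t i = 0; i < c; i++)
        dst[i * nunits + (size_t) j] = src[(size_t) j * c + i];
}

/* Hypothetical scalar model of vec_mask_len_store_lanes (not from the patch).
   The mirror image of the load: active lanes of the c source vectors are
   interleaved back into memory; inactive memory elements are not written.  */
void
model_mask_len_store_lanes (int8_t *dst, const int8_t *src,
                            const uint8_t *mask, size_t c, size_t nunits,
                            ptrdiff_t len, ptrdiff_t bias)
{
  for (ptrdiff_t j = 0; j < len + bias; j++)
    if (mask[j])
      for (size_t i = 0; i < c; i++)
        dst[(size_t) j * c + i] = src[i * nunits + (size_t) j];
}
```

For the foo loop in the commit message, c would be 2: vector 0 receives the b[i * 2] elements and vector 1 the b[i * 2 + 1] elements, for those i where cond[i] is nonzero and i is below the active length.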