From patchwork Wed Nov 2 07:59:06 2022
X-Patchwork-Submitter: "Kewen.Lin"
X-Patchwork-Id: 14068
Message-ID: <94ac390b-a770-c868-051b-75319eb7f81d@linux.ibm.com>
Date: Wed, 2 Nov 2022 15:59:06 +0800
From: "Kewen.Lin"
To: GCC Patches
Cc: Richard Sandiford, Peter Bergner, Segher Boessenkool
Subject: [PATCH] vect: Fold LEN_{LOAD,STORE} if it's for the whole vector [PR107412]

Hi,

As the test case in PR107412 shows, we
can fold IFN .LEN_{LOAD,STORE} into a normal vector load/store
if the given length is known to be equal to the length of the
whole vector.  This should reduce overall cycles, since a vector
access with length in bytes normally has a higher latency than a
normal vector access, and it also saves the preparation of the
length when a constant length cannot be encoded into the
instruction (such as on Power).

Bootstrapped and regtested on x86_64-redhat-linux,
aarch64-linux-gnu and powerpc64{,le}-linux-gnu.

Is it ok for trunk?

BR,
Kewen
-----
	PR tree-optimization/107412

gcc/ChangeLog:

	* gimple-fold.cc (gimple_fold_mask_load_store_mem_ref): Rename to ...
	(gimple_fold_partial_load_store_mem_ref): ... this, add one parameter
	mask_p indicating whether it's for mask or length, and add handling
	for IFN LEN_{LOAD,STORE}.
	(gimple_fold_mask_load): Rename to ...
	(gimple_fold_partial_load): ... this, add one parameter mask_p.
	(gimple_fold_mask_store): Rename to ...
	(gimple_fold_partial_store): ... this, add one parameter mask_p.
	(gimple_fold_call): Add handling for IFN LEN_{LOAD,STORE},
	and adjust calls of gimple_fold_mask_load_store_mem_ref to
	gimple_fold_partial_load_store_mem_ref.

gcc/testsuite/ChangeLog:

	* gcc.target/powerpc/pr107412.c: New test.
	* gcc.target/powerpc/p9-vec-length-epil-8.c: Adjust scan times for
	folded LEN_LOAD.
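
For illustration only, the fold condition described above can be
sketched in plain C outside of GCC.  The function name
can_fold_len_access and its parameters are hypothetical, not GCC APIs.
The sketch assumes lengths are counted in bytes and that the effective
length is the length operand minus the bias operand, as in this patch's
int_const_binop (MINUS_EXPR, basic_len, bias).

```c
#include <stdbool.h>

/* Hypothetical stand-in for the length check added by the patch:
   a length-controlled access may become a normal vector access only
   when its effective length equals the size of the whole vector.
   All values are in bytes.  */
static bool
can_fold_len_access (unsigned int basic_len, unsigned int bias,
                     unsigned int vector_bytes)
{
  /* Assumption: mirrors the patch's MINUS_EXPR on length and bias.  */
  unsigned int len = basic_len - bias;
  return len == vector_bytes;
}
```

With a 16-byte vector and zero bias, a length of 16 satisfies the
check, while a length of 12 does not and the access keeps its
length-controlled form.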
---
 gcc/gimple-fold.cc                            | 57 ++++++++++++++-----
 .../gcc.target/powerpc/p9-vec-length-epil-8.c |  2 +-
 gcc/testsuite/gcc.target/powerpc/pr107412.c   | 19 +++++++
 3 files changed, 64 insertions(+), 14 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr107412.c

--
2.27.0

diff --git a/gcc/gimple-fold.cc b/gcc/gimple-fold.cc
index a1704784bc9..e3a087defa6 100644
--- a/gcc/gimple-fold.cc
+++ b/gcc/gimple-fold.cc
@@ -5370,19 +5370,39 @@ arith_overflowed_p (enum tree_code code, const_tree type,
   return wi::min_precision (wres, sign) > TYPE_PRECISION (type);
 }
 
-/* If IFN_MASK_LOAD/STORE call CALL is unconditional, return a MEM_REF
+/* If IFN_{MASK,LEN}_LOAD/STORE call CALL is unconditional, return a MEM_REF
    for the memory it references, otherwise return null.  VECTYPE is the
-   type of the memory vector.  */
+   type of the memory vector.  MASK_P indicates it's for MASK if true,
+   otherwise it's for LEN.  */
 
 static tree
-gimple_fold_mask_load_store_mem_ref (gcall *call, tree vectype)
+gimple_fold_partial_load_store_mem_ref (gcall *call, tree vectype, bool mask_p)
 {
   tree ptr = gimple_call_arg (call, 0);
   tree alias_align = gimple_call_arg (call, 1);
-  tree mask = gimple_call_arg (call, 2);
-  if (!tree_fits_uhwi_p (alias_align) || !integer_all_onesp (mask))
+  if (!tree_fits_uhwi_p (alias_align))
     return NULL_TREE;
 
+  if (mask_p)
+    {
+      tree mask = gimple_call_arg (call, 2);
+      if (!integer_all_onesp (mask))
+        return NULL_TREE;
+    } else {
+      tree basic_len = gimple_call_arg (call, 2);
+      if (!tree_fits_uhwi_p (basic_len))
+        return NULL_TREE;
+      unsigned int nargs = gimple_call_num_args (call);
+      tree bias = gimple_call_arg (call, nargs - 1);
+      gcc_assert (tree_fits_uhwi_p (bias));
+      tree biased_len = int_const_binop (MINUS_EXPR, basic_len, bias);
+      unsigned int len = tree_to_uhwi (biased_len);
+      unsigned int vect_len
+        = GET_MODE_SIZE (TYPE_MODE (vectype)).to_constant ();
+      if (vect_len != len)
+        return NULL_TREE;
+    }
+
   unsigned HOST_WIDE_INT align = tree_to_uhwi (alias_align);
   if (TYPE_ALIGN (vectype) != align)
     vectype = build_aligned_type (vectype, align);
@@ -5390,16 +5410,18 @@ gimple_fold_mask_load_store_mem_ref (gcall *call, tree vectype)
   return fold_build2 (MEM_REF, vectype, ptr, offset);
 }
 
-/* Try to fold IFN_MASK_LOAD call CALL.  Return true on success.  */
+/* Try to fold IFN_{MASK,LEN}_LOAD call CALL.  Return true on success.
+   MASK_P indicates it's for MASK if true, otherwise it's for LEN.  */
 
 static bool
-gimple_fold_mask_load (gimple_stmt_iterator *gsi, gcall *call)
+gimple_fold_partial_load (gimple_stmt_iterator *gsi, gcall *call, bool mask_p)
 {
   tree lhs = gimple_call_lhs (call);
   if (!lhs)
     return false;
 
-  if (tree rhs = gimple_fold_mask_load_store_mem_ref (call, TREE_TYPE (lhs)))
+  if (tree rhs
+      = gimple_fold_partial_load_store_mem_ref (call, TREE_TYPE (lhs), mask_p))
     {
       gassign *new_stmt = gimple_build_assign (lhs, rhs);
       gimple_set_location (new_stmt, gimple_location (call));
@@ -5410,13 +5432,16 @@ gimple_fold_mask_load (gimple_stmt_iterator *gsi, gcall *call)
   return false;
 }
 
-/* Try to fold IFN_MASK_STORE call CALL.  Return true on success.  */
+/* Try to fold IFN_{MASK,LEN}_STORE call CALL.  Return true on success.
+   MASK_P indicates it's for MASK if true, otherwise it's for LEN.  */
 
 static bool
-gimple_fold_mask_store (gimple_stmt_iterator *gsi, gcall *call)
+gimple_fold_partial_store (gimple_stmt_iterator *gsi, gcall *call,
+                           bool mask_p)
 {
   tree rhs = gimple_call_arg (call, 3);
-  if (tree lhs = gimple_fold_mask_load_store_mem_ref (call, TREE_TYPE (rhs)))
+  if (tree lhs
+      = gimple_fold_partial_load_store_mem_ref (call, TREE_TYPE (rhs), mask_p))
     {
       gassign *new_stmt = gimple_build_assign (lhs, rhs);
       gimple_set_location (new_stmt, gimple_location (call));
@@ -5634,10 +5659,16 @@ gimple_fold_call (gimple_stmt_iterator *gsi, bool inplace)
           cplx_result = true;
           break;
         case IFN_MASK_LOAD:
-          changed |= gimple_fold_mask_load (gsi, stmt);
+          changed |= gimple_fold_partial_load (gsi, stmt, true);
           break;
         case IFN_MASK_STORE:
-          changed |= gimple_fold_mask_store (gsi, stmt);
+          changed |= gimple_fold_partial_store (gsi, stmt, true);
+          break;
+        case IFN_LEN_LOAD:
+          changed |= gimple_fold_partial_load (gsi, stmt, false);
+          break;
+        case IFN_LEN_STORE:
+          changed |= gimple_fold_partial_store (gsi, stmt, false);
           break;
         default:
           break;
diff --git a/gcc/testsuite/gcc.target/powerpc/p9-vec-length-epil-8.c b/gcc/testsuite/gcc.target/powerpc/p9-vec-length-epil-8.c
index 961df0d5646..8b9c9107814 100644
--- a/gcc/testsuite/gcc.target/powerpc/p9-vec-length-epil-8.c
+++ b/gcc/testsuite/gcc.target/powerpc/p9-vec-length-epil-8.c
@@ -8,5 +8,5 @@
 
 #include "p9-vec-length-8.h"
 
-/* { dg-final { scan-assembler-times {\mlxvl\M} 21 } } */
+/* { dg-final { scan-assembler-times {\mlxvl\M} 16 } } */
 /* { dg-final { scan-assembler-times {\mstxvl\M} 7 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/pr107412.c b/gcc/testsuite/gcc.target/powerpc/pr107412.c
new file mode 100644
index 00000000000..4526ea8639d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr107412.c
@@ -0,0 +1,19 @@
+/* { dg-require-effective-target powerpc_p9vector_ok } */
+/* { dg-require-effective-target lp64 } */
+/* { dg-options "-mdejagnu-cpu=power9 -O2 -ftree-vectorize -fno-vect-cost-model -funroll-loops -fno-tree-loop-distribute-patterns --param vect-partial-vector-usage=2 -fdump-tree-optimized" } */
+
+/* Verify there is only one IFN call LEN_LOAD and LEN_STORE separately.  */
+
+#define N 16
+int src[N];
+int dest[N];
+
+void
+foo ()
+{
+  for (int i = 0; i < (N - 1); i++)
+    dest[i] = src[i];
+}
+
+/* { dg-final { scan-tree-dump-times {\mLEN_LOAD\M} 1 "optimized" } } */
+/* { dg-final { scan-tree-dump-times {\mLEN_STORE\M} 1 "optimized" } } */
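
As a rough check of why the new test expects exactly one LEN_LOAD and
one LEN_STORE: the loop copies N - 1 = 15 ints, which is 60 bytes with
a 4-byte int.  Assuming 16-byte vectors (as on Power9) and a fully
unrolled loop, that gives three whole-vector accesses plus one 12-byte
tail, and only the tail must keep its length.  The helper below is a
hypothetical illustration of that arithmetic, not part of the patch or
of GCC.

```c
#include <stddef.h>

/* Hypothetical helper: when TOTAL_BYTES are accessed in chunks of at
   most VECTOR_BYTES, count the chunks whose length differs from the
   full vector size, i.e. the accesses that cannot be folded.  */
static unsigned int
count_partial_accesses (size_t total_bytes, size_t vector_bytes)
{
  unsigned int partial = 0;
  for (size_t done = 0; done < total_bytes; done += vector_bytes)
    {
      size_t remaining = total_bytes - done;
      size_t len = remaining < vector_bytes ? remaining : vector_bytes;
      /* Only a chunk shorter than the whole vector stays partial.  */
      if (len != vector_bytes)
        partial++;
    }
  return partial;
}
```

Under these assumptions, 60 bytes with 16-byte vectors leaves one
partial access, which matches the scan counts in the test; 64 bytes
would leave none.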