From patchwork Wed Aug 30 11:54:46 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Richard Biener
X-Patchwork-Id: 137171
Date: Wed, 30 Aug 2023 13:54:46 +0200 (CEST)
To: gcc-patches@gcc.gnu.org
Subject: [PATCH] tree-optimization/111228 - combine two VEC_PERM_EXPRs
Message-Id: <20230830115447.3E5611353E@imap2.suse-dmz.suse.de>
List-Id: Gcc-patches mailing list
From: Richard Biener
Reply-To: Richard Biener

The following adds simplification of two VEC_PERM_EXPRs where the later one
replaces all elements from either the first or the second input of the
earlier permute.  This allows a three input permute to be simplified
to a two input one.

I'm following the existing two input simplification case and only
allow non-VLA permutes.  The now existing three cases and the
single case in tree-ssa-forwprop.cc somehow ask for merging;
I'm not doing this as part of this change though.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

	PR tree-optimization/111228
	* match.pd ((vec_perm (vec_perm ..) @5 ..) -> (vec_perm @x @5 ..)):
	New simplifications.

	* gcc.dg/tree-ssa/forwprop-42.c: New testcase.
---
 gcc/match.pd                                | 141 +++++++++++++++++++-
 gcc/testsuite/gcc.dg/tree-ssa/forwprop-42.c |  17 +++
 2 files changed, 155 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/forwprop-42.c

diff --git a/gcc/match.pd b/gcc/match.pd
index 47d2733211a..6a7edde5736 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -8993,10 +8993,10 @@ and,
 
 
 /* Merge
-   c = VEC_PERM_EXPR <a, b, VCST0>;
-   d = VEC_PERM_EXPR <c, c, VCST1>;
+     c = VEC_PERM_EXPR <a, b, VCST0>;
+     d = VEC_PERM_EXPR <c, c, VCST1>;
    to
-   d = VEC_PERM_EXPR <a, b, NEW_VCST>;  */
+     d = VEC_PERM_EXPR <a, b, NEW_VCST>;  */
 
 (simplify
  (vec_perm (vec_perm@0 @1 @2 VECTOR_CST@3) @0 VECTOR_CST@4)
@@ -9038,6 +9038,141 @@ and,
      (if (op0)
       (vec_perm @1 @2 { op0; })))))))
 
+/* Merge
+     c = VEC_PERM_EXPR <a, b, VCST0>;
+     d = VEC_PERM_EXPR <x, c, VCST1>;
+   to
+     d = VEC_PERM_EXPR <x, {a,b}, NEW_VCST>;
+   when all elements from a or b are replaced by the later
+   permutation.  */
+
+(simplify
+ (vec_perm @5 (vec_perm@0 @1 @2 VECTOR_CST@3) VECTOR_CST@4)
+ (if (TYPE_VECTOR_SUBPARTS (type).is_constant ())
+  (with
+   {
+     machine_mode result_mode = TYPE_MODE (type);
+     machine_mode op_mode = TYPE_MODE (TREE_TYPE (@1));
+     int nelts = TYPE_VECTOR_SUBPARTS (type).to_constant ();
+     vec_perm_builder builder0;
+     vec_perm_builder builder1;
+     vec_perm_builder builder2 (nelts, nelts, 2);
+   }
+   (if (tree_to_vec_perm_builder (&builder0, @3)
+        && tree_to_vec_perm_builder (&builder1, @4))
+    (with
+     {
+       vec_perm_indices sel0 (builder0, 2, nelts);
+       vec_perm_indices sel1 (builder1, 2, nelts);
+       bool use_1 = false, use_2 = false;
+
+       for (int i = 0; i < nelts; i++)
+         {
+           if (known_lt ((poly_uint64)sel1[i], sel1.nelts_per_input ()))
+             builder2.quick_push (sel1[i]);
+           else
+             {
+               poly_uint64 j = sel0[(sel1[i] - sel1.nelts_per_input ())
+                                    .to_constant ()];
+               if (known_lt (j, sel0.nelts_per_input ()))
+                 use_1 = true;
+               else
+                 {
+                   use_2 = true;
+                   j -= sel0.nelts_per_input ();
+                 }
+               builder2.quick_push (j + sel1.nelts_per_input ());
+             }
+         }
+     }
+     (if (use_1 ^ use_2)
+      (with
+       {
+         vec_perm_indices sel2 (builder2, 2, nelts);
+         tree op0 = NULL_TREE;
+         /* If the new VEC_PERM_EXPR can't be handled but both
+            original VEC_PERM_EXPRs can, punt.
+            If one or both of the original VEC_PERM_EXPRs can't be
+            handled and the new one can't be either, don't increase
+            number of VEC_PERM_EXPRs that can't be handled.  */
+         if (can_vec_perm_const_p (result_mode, op_mode, sel2, false)
+             || (single_use (@0)
+                 ? (!can_vec_perm_const_p (result_mode, op_mode, sel0, false)
+                    || !can_vec_perm_const_p (result_mode, op_mode, sel1, false))
+                 : !can_vec_perm_const_p (result_mode, op_mode, sel1, false)))
+           op0 = vec_perm_indices_to_tree (TREE_TYPE (@4), sel2);
+       }
+       (if (op0)
+        (switch
+         (if (use_1)
+          (vec_perm @5 @1 { op0; }))
+         (if (use_2)
+          (vec_perm @5 @2 { op0; })))))))))))
+
+/* And the case with swapped outer permute sources.  */
+
+(simplify
+ (vec_perm (vec_perm@0 @1 @2 VECTOR_CST@3) @5 VECTOR_CST@4)
+ (if (TYPE_VECTOR_SUBPARTS (type).is_constant ())
+  (with
+   {
+     machine_mode result_mode = TYPE_MODE (type);
+     machine_mode op_mode = TYPE_MODE (TREE_TYPE (@1));
+     int nelts = TYPE_VECTOR_SUBPARTS (type).to_constant ();
+     vec_perm_builder builder0;
+     vec_perm_builder builder1;
+     vec_perm_builder builder2 (nelts, nelts, 2);
+   }
+   (if (tree_to_vec_perm_builder (&builder0, @3)
+        && tree_to_vec_perm_builder (&builder1, @4))
+    (with
+     {
+       vec_perm_indices sel0 (builder0, 2, nelts);
+       vec_perm_indices sel1 (builder1, 2, nelts);
+       bool use_1 = false, use_2 = false;
+
+       for (int i = 0; i < nelts; i++)
+         {
+           if (known_ge ((poly_uint64)sel1[i], sel1.nelts_per_input ()))
+             builder2.quick_push (sel1[i]);
+           else
+             {
+               poly_uint64 j = sel0[sel1[i].to_constant ()];
+               if (known_lt (j, sel0.nelts_per_input ()))
+                 use_1 = true;
+               else
+                 {
+                   use_2 = true;
+                   j -= sel0.nelts_per_input ();
+                 }
+               builder2.quick_push (j);
+             }
+         }
+     }
+     (if (use_1 ^ use_2)
+      (with
+       {
+         vec_perm_indices sel2 (builder2, 2, nelts);
+         tree op0 = NULL_TREE;
+         /* If the new VEC_PERM_EXPR can't be handled but both
+            original VEC_PERM_EXPRs can, punt.
+            If one or both of the original VEC_PERM_EXPRs can't be
+            handled and the new one can't be either, don't increase
+            number of VEC_PERM_EXPRs that can't be handled.  */
+         if (can_vec_perm_const_p (result_mode, op_mode, sel2, false)
+             || (single_use (@0)
+                 ? (!can_vec_perm_const_p (result_mode, op_mode, sel0, false)
+                    || !can_vec_perm_const_p (result_mode, op_mode, sel1, false))
+                 : !can_vec_perm_const_p (result_mode, op_mode, sel1, false)))
+           op0 = vec_perm_indices_to_tree (TREE_TYPE (@4), sel2);
+       }
+       (if (op0)
+        (switch
+         (if (use_1)
+          (vec_perm @1 @5 { op0; }))
+         (if (use_2)
+          (vec_perm @2 @5 { op0; })))))))))))
+
 
 /* Match count trailing zeroes for simplify_count_trailing_zeroes in fwprop.
    The canonical form is array[((x & -x) * C) >> SHIFT] where C is a magic
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/forwprop-42.c b/gcc/testsuite/gcc.dg/tree-ssa/forwprop-42.c
new file mode 100644
index 00000000000..f3dbc3e9394
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/forwprop-42.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-options "-O -fdump-tree-cddce1" } */
+
+typedef unsigned long v2di __attribute__((vector_size(16)));
+
+v2di g;
+void test (v2di *v)
+{
+  v2di lo = v[0];
+  v2di hi = v[1];
+  v2di res;
+  res[1] = hi[1];
+  res[0] = lo[0];
+  g = res;
+}
+
+/* { dg-final { scan-tree-dump-times "VEC_PERM_EXPR <\[^>\]*, { 0, 3 }>" 1 "cddce1" } } */
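
For readers who want to see the selector arithmetic outside of match.pd,
here is a minimal standalone sketch in plain C of the case where the
earlier permute is the second input of the later one.  It is not GCC code:
the function name fold_perm_second_input and its parameters are
hypothetical, the selectors are plain unsigned arrays rather than
vec_perm_indices, and the can_vec_perm_const_p / single_use profitability
checks are left out entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper (illustration only, not part of GCC).  Models
     c = VEC_PERM <a, b, sel0>;
     d = VEC_PERM <x, c, sel1>;
   Selector values 0..n-1 pick from the first input, n..2n-1 from the
   second.  On success OUT holds the selector for
     d = VEC_PERM <x, a-or-b, out>;
   and *USE_B says whether the second operand is b (true) or a (false).
   Returns false when both a and b, or neither, are still referenced.  */
bool
fold_perm_second_input (const unsigned *sel0, const unsigned *sel1,
                        size_t n, unsigned *out, bool *use_b)
{
  bool use_1 = false, use_2 = false;

  for (size_t i = 0; i < n; i++)
    {
      assert (sel1[i] < 2 * n);
      if (sel1[i] < n)
        /* Element comes from x; the index is unchanged.  */
        out[i] = sel1[i];
      else
        {
          /* Element comes from c; look through the earlier permute.  */
          unsigned j = sel0[sel1[i] - n];
          assert (j < 2 * n);
          if (j < n)
            use_1 = true;
          else
            {
              use_2 = true;
              j -= n;
            }
          /* The surviving input becomes the second operand.  */
          out[i] = j + n;
        }
    }

  if (use_1 == use_2)
    return false;
  *use_b = use_2;
  return true;
}
```

With n = 2 and the illustrative inputs sel0 = { 0, 3 } and sel1 = { 0, 3 }
(chosen for the example, not taken from a dump), the sketch keeps lane 0
from x and forwards lane 1 to element 1 of b, giving the selector { 0, 3 }.
The swapped variant differs only in which half of sel1 is looked through
and in not adding n to the forwarded index.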