From patchwork Thu Sep 1 10:29:51 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Richard Sandiford X-Patchwork-Id: 892 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:adf:ecc5:0:0:0:0:0 with SMTP id s5csp182437wro; Thu, 1 Sep 2022 03:30:41 -0700 (PDT) X-Google-Smtp-Source: AA6agR7WsH2zCBHBTaBZF2vy8+O0kfK3cO42VZysPxil4KL+pTIU91cENdozFaMs3+Fb4s1MiTTH X-Received: by 2002:a17:907:2cea:b0:741:6251:3a22 with SMTP id hz10-20020a1709072cea00b0074162513a22mr14806614ejc.6.1662028240901; Thu, 01 Sep 2022 03:30:40 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1662028240; cv=none; d=google.com; s=arc-20160816; b=ashiJWVlZYLwZDNspQGy7ImrrBuUrXYv9IOxb2TysEKDxCV2lBBdnhTtgXYsBkJlYb 0WEV5M230SSJ6x4c9v9S/lx/GW0CumpNDSpT7MCb6BUkCaClZh5pAo1I2OXL4M13TjhK LQh473Tw6HSBx0f+U0OeESw3q608m5kcpAnjb+DAAXCjrOhMhB47Lfo0hKBlmJchgw7n MaOeZNESOnX9xv++rsEBgpIQSMJTLJ/LH1q8piMfjivrKJfpprCDelrUw1MZSm4+p0Pp GFTNNdZFoBZthhR1Mg4qtIGMpu58Gvi4cflXMNxd/HUhBrBbopKn/QezdXVwJVtI8KHT xeWg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=sender:errors-to:cc:reply-to:from:list-subscribe:list-help :list-post:list-archive:list-unsubscribe:list-id:precedence :mime-version:user-agent:message-id:date:subject:mail-followup-to:to :dmarc-filter:delivered-to:dkim-signature:dkim-filter; bh=zrngq7Gv2SlWvGrEhUAtBovUZCCpwDWlZvtWBm1Yua8=; b=LiQfIKdkO2uYmI0nlEWVjSm7htWhQDjNLG/7jdWO6Lwf+6OQPTsz2gO3794lV0gJ4r IvFVBJmS235jOLV+PfmRnaooO+kCeU5yabeFvx6PiS0eGyejJSlN6dynQySLVl7bA6fQ nQtZA/BDuri5V8EX/9zUpB9qRB50MrZrnvdxyOnt3PAHfva5ohNzFSxHAdNroEOokrSA oS7GRESeJDZFIspgjPkYE2k31JqQIkACDzwDuffV7By0lDnrdPq+fD9hlpo3KHZF/+Za lHBNWk1K2B6nLmBEAHN0e9jsl+6Uz0bnpWk+wV03QzBaDntvw+K2SFW66sz97NgkLzjp nlpA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@gcc.gnu.org header.s=default header.b=FDzSnIoU; spf=pass (google.com: domain of gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org designates 8.43.85.97 as permitted sender) smtp.mailfrom="gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=gnu.org Received: from sourceware.org (ip-8-43-85-97.sourceware.org. [8.43.85.97]) by mx.google.com with ESMTPS id p24-20020a056402075800b00447d6f244c6si1277408edy.248.2022.09.01.03.30.40 for (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 01 Sep 2022 03:30:40 -0700 (PDT) Received-SPF: pass (google.com: domain of gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org designates 8.43.85.97 as permitted sender) client-ip=8.43.85.97; Authentication-Results: mx.google.com; dkim=pass header.i=@gcc.gnu.org header.s=default header.b=FDzSnIoU; spf=pass (google.com: domain of gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org designates 8.43.85.97 as permitted sender) smtp.mailfrom="gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=gnu.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id B4835382DB1D for ; Thu, 1 Sep 2022 10:30:39 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org B4835382DB1D DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org; s=default; t=1662028239; bh=zrngq7Gv2SlWvGrEhUAtBovUZCCpwDWlZvtWBm1Yua8=; h=To:Subject:Date:List-Id:List-Unsubscribe:List-Archive:List-Post: List-Help:List-Subscribe:From:Reply-To:Cc:From; b=FDzSnIoUzQbb31QE7K3tCbxqNqmSsYTPBFfkVCsa4PI5kCwV6cxP0Iq+CjX7QAd6N r19Chv2zMLJwDjAN417sjypympxQ/AD1nf6l3oxSWPXPBOyp95PnqMlDK76sVab81g 1OmDvsvRZhsuyHAY2Tgod1KpFq4p1nTeYKiufxys= X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by sourceware.org (Postfix) with ESMTP id 8A37D3858C00 for ; Thu, 1 Sep 2022 10:29:53 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 8A37D3858C00 Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 43AC2ED1; Thu, 1 Sep 2022 03:29:59 -0700 (PDT) Received: from localhost (e121540-lin.manchester.arm.com [10.32.98.62]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 9A5B33F7B4; Thu, 1 Sep 2022 03:29:52 -0700 (PDT) To: gcc-patches@gcc.gnu.org Mail-Followup-To: gcc-patches@gcc.gnu.org, rguenther@suse.de, richard.sandiford@arm.com Subject: [PATCH] vect: Try to remove single-vector permutes from SLP graph Date: Thu, 01 Sep 2022 11:29:51 +0100 Message-ID: User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/26.3 (gnu/linux) MIME-Version: 1.0 X-Spam-Status: No, score=-49.7 required=5.0 tests=BAYES_00, GIT_PATCH_0, KAM_DMARC_NONE, KAM_DMARC_STATUS, KAM_LAZY_DOMAIN_SECURITY, SPF_HELO_NONE, SPF_NONE, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-Patchwork-Original-From: Richard Sandiford via Gcc-patches From: Richard Sandiford Reply-To: Richard Sandiford Cc: rguenther@suse.de Errors-To: gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org Sender: "Gcc-patches" X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1742762924311599011?= X-GMAIL-MSGID: =?utf-8?q?1742762924311599011?= This patch extends the SLP layout optimisation pass so that it tries to remove layout changes that are brought about by permutes of existing vectors. This fixes the bb-slp-pr54400.c regression on x86_64 and also means that we can remove the permutes in cases like: typedef float v4sf __attribute__((vector_size(sizeof(float)*4))); float __attribute__((noipa)) f(v4sf v0, v4sf v1) { return v0[0]*v1[0]+v0[1]*v1[1]+v0[2]*v1[2]+v0[3]*v1[3]; } The new test is a simple adaption of bb-slp-pr54400.c, with the same style of markup. Tested on aarch64-linux-gnu and x86_64-linux-gnu. OK to install? Richard PS. Sorry for missing the regression during testing. The initial version of the new bb-slp-layout* tests had markup that led to a lot of FAILs on x86_64. I fixed the markup to avoid those, but didn't notice this extra (unrelated) failure at the end. gcc/ * tree-vect-slp.cc (vect_build_slp_tree_2): When building a VEC_PERM_EXPR of an existing vector, set the SLP_TREE_LANES to the number of vector elements, if that's a known constant. (vect_optimize_slp_pass::is_compatible_layout): Remove associated comment about zero SLP_TREE_LANES. (vect_optimize_slp_pass::start_choosing_layouts): Iterate over all partition members when looking for potential layouts. Handle existing permutes of fixed-length vectors. gcc/testsuite/ * gcc.dg/vect/bb-slp-pr54400.c: Extend to aarch64. * gcc.dg/vect/bb-slp-layout-18.c: New test. --- gcc/testsuite/gcc.dg/vect/bb-slp-layout-18.c | 15 +++++ gcc/testsuite/gcc.dg/vect/bb-slp-pr54400.c | 4 +- gcc/tree-vect-slp.cc | 67 +++++++++++++------- 3 files changed, 61 insertions(+), 25 deletions(-) create mode 100644 gcc/testsuite/gcc.dg/vect/bb-slp-layout-18.c diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-layout-18.c b/gcc/testsuite/gcc.dg/vect/bb-slp-layout-18.c new file mode 100644 index 00000000000..ff462722507 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/bb-slp-layout-18.c @@ -0,0 +1,15 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target vect_float} */ +/* { dg-additional-options "-w -Wno-psabi -ffast-math" } */ + +typedef float v4sf __attribute__((vector_size(sizeof(float)*4))); + +float __attribute__((noipa)) +f(v4sf v0, v4sf v1) +{ + return v0[0]*v1[0]+v0[1]*v1[1]+v0[2]*v1[2]+v0[3]*v1[3]; +} + +/* We are lacking an effective target for .REDUC_PLUS support. */ +/* { dg-final { scan-tree-dump-times "basic block part vectorized" 1 "slp2" { target x86_64-*-* aarch64*-*-* } } } */ +/* { dg-final { scan-tree-dump-not " = VEC_PERM_EXPR" "slp2" { target x86_64-*-* aarch64*-*-* } } } */ diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-pr54400.c b/gcc/testsuite/gcc.dg/vect/bb-slp-pr54400.c index 6b427aac774..8aec2092f4d 100644 --- a/gcc/testsuite/gcc.dg/vect/bb-slp-pr54400.c +++ b/gcc/testsuite/gcc.dg/vect/bb-slp-pr54400.c @@ -39,5 +39,5 @@ main () } /* We are lacking an effective target for .REDUC_PLUS support. */ -/* { dg-final { scan-tree-dump-times "basic block part vectorized" 3 "slp2" { target x86_64-*-* } } } */ -/* { dg-final { scan-tree-dump-not " = VEC_PERM_EXPR" "slp2" { target x86_64-*-* } } } */ +/* { dg-final { scan-tree-dump-times "basic block part vectorized" 3 "slp2" { target x86_64-*-* aarch64*-*-* } } } */ +/* { dg-final { scan-tree-dump-not " = VEC_PERM_EXPR" "slp2" { target x86_64-*-* aarch64*-*-* } } } */ diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc index 226550635cc..cc04ef350a6 100644 --- a/gcc/tree-vect-slp.cc +++ b/gcc/tree-vect-slp.cc @@ -1840,6 +1840,10 @@ vect_build_slp_tree_2 (vec_info *vinfo, slp_tree node, TREE_TYPE (TREE_TYPE (vec)))); SLP_TREE_VECTYPE (vnode) = TREE_TYPE (vec); } + auto nunits = TYPE_VECTOR_SUBPARTS (SLP_TREE_VECTYPE (vnode)); + unsigned HOST_WIDE_INT const_nunits; + if (nunits.is_constant (&const_nunits)) + SLP_TREE_LANES (vnode) = const_nunits; SLP_TREE_VEC_DEFS (vnode).safe_push (vec); /* We are always building a permutation node even if it is an identity permute to shield the rest of the vectorizer from the odd node @@ -4325,8 +4329,6 @@ vect_optimize_slp_pass::is_compatible_layout (slp_tree node, if (layout_i == 0) return true; - /* SLP_TREE_LANES is zero for existing vectors, but those only support - layout 0 anyway. */ if (SLP_TREE_LANES (node) != m_perms[layout_i].length ()) return false; @@ -4521,38 +4523,57 @@ vect_optimize_slp_pass::start_choosing_layouts () m_perms.safe_push (vNULL); /* Create layouts from existing permutations. */ - for (unsigned int node_i : m_leafs) + auto_load_permutation_t tmp_perm; + for (unsigned int node_i : m_partitioned_nodes) { - auto &vertex = m_vertices[node_i]; - if (vertex.partition < 0) - continue; - /* Leafs also double as entries to the reverse graph. Allow the layout of those to be changed. */ + auto &vertex = m_vertices[node_i]; auto &partition = m_partitions[vertex.partition]; if (!m_slpg->vertices[node_i].succ) partition.layout = 0; - /* Loads are the only thing generating permutes. */ + /* Loads and VEC_PERM_EXPRs are the only things generating permutes. */ slp_tree node = vertex.node; - if (!SLP_TREE_LOAD_PERMUTATION (node).exists ()) - continue; - - /* If splitting out a SLP_TREE_LANE_PERMUTATION can make the node - unpermuted, record a layout that reverses this permutation. */ - gcc_assert (partition.layout == 0); stmt_vec_info dr_stmt = SLP_TREE_REPRESENTATIVE (node); - if (!STMT_VINFO_GROUPED_ACCESS (dr_stmt)) - continue; - dr_stmt = DR_GROUP_FIRST_ELEMENT (dr_stmt); - unsigned imin = DR_GROUP_SIZE (dr_stmt) + 1, imax = 0; + slp_tree child; + unsigned HOST_WIDE_INT imin, imax = 0; bool any_permute = false; + tmp_perm.truncate (0); + if (SLP_TREE_LOAD_PERMUTATION (node).exists ()) + { + /* If splitting out a SLP_TREE_LANE_PERMUTATION can make the node + unpermuted, record a layout that reverses this permutation. */ + gcc_assert (partition.layout == 0); + if (!STMT_VINFO_GROUPED_ACCESS (dr_stmt)) + continue; + dr_stmt = DR_GROUP_FIRST_ELEMENT (dr_stmt); + imin = DR_GROUP_SIZE (dr_stmt) + 1; + tmp_perm.safe_splice (SLP_TREE_LOAD_PERMUTATION (node)); + } + else if (SLP_TREE_CODE (node) == VEC_PERM_EXPR + && SLP_TREE_CHILDREN (node).length () == 1 + && (child = SLP_TREE_CHILDREN (node)[0]) + && (TYPE_VECTOR_SUBPARTS (SLP_TREE_VECTYPE (child)) + .is_constant (&imin))) + { + /* If the child has the same vector size as this node, + reversing the permutation can make the permutation a no-op. + In other cases it can change a true permutation into a + full-vector extract. */ + tmp_perm.reserve (SLP_TREE_LANES (node)); + for (unsigned j = 0; j < SLP_TREE_LANES (node); ++j) + tmp_perm.quick_push (SLP_TREE_LANE_PERMUTATION (node)[j].second); + } + else + continue; + for (unsigned j = 0; j < SLP_TREE_LANES (node); ++j) { - unsigned idx = SLP_TREE_LOAD_PERMUTATION (node)[j]; + unsigned idx = tmp_perm[j]; imin = MIN (imin, idx); imax = MAX (imax, idx); - if (idx - SLP_TREE_LOAD_PERMUTATION (node)[0] != j) + if (idx - tmp_perm[0] != j) any_permute = true; } /* If the span doesn't match we'd disrupt VF computation, avoid @@ -4560,7 +4581,7 @@ vect_optimize_slp_pass::start_choosing_layouts () if (imax - imin + 1 != SLP_TREE_LANES (node)) continue; /* If there's no permute no need to split one out. In this case - we can consider turning the load into a permuted load, if that + we can consider turning a load into a permuted load, if that turns out to be cheaper than alternatives. */ if (!any_permute) { @@ -4575,7 +4596,7 @@ vect_optimize_slp_pass::start_choosing_layouts () auto_sbitmap load_index (SLP_TREE_LANES (node)); bitmap_clear (load_index); for (unsigned j = 0; j < SLP_TREE_LANES (node); ++j) - bitmap_set_bit (load_index, SLP_TREE_LOAD_PERMUTATION (node)[j] - imin); + bitmap_set_bit (load_index, tmp_perm[j] - imin); unsigned j; for (j = 0; j < SLP_TREE_LANES (node); ++j) if (!bitmap_bit_p (load_index, j)) @@ -4586,7 +4607,7 @@ vect_optimize_slp_pass::start_choosing_layouts () vec perm = vNULL; perm.safe_grow (SLP_TREE_LANES (node), true); for (unsigned j = 0; j < SLP_TREE_LANES (node); ++j) - perm[j] = SLP_TREE_LOAD_PERMUTATION (node)[j] - imin; + perm[j] = tmp_perm[j] - imin; if (int (m_perms.length ()) >= param_vect_max_layout_candidates) {