From patchwork Mon Aug  7 13:30:50 2023
X-Patchwork-Submitter: Richard Biener
X-Patchwork-Id: 131984
Date: Mon, 7 Aug 2023 13:30:50 +0000 (UTC)
From: Richard Biener
To: gcc-patches@gcc.gnu.org
cc: richard.sandiford@arm.com
Subject: [PATCH] tree-optimization/49955 - BB reduction with odd number of lanes
Message-Id: <20230807133135.04FB838582A1@sourceware.org>

The following enhances BB reduction vectorization to support
vectorizing only a subset of the lanes, keeping the rest as
scalar ops.  For now we try to make the number of lanes even
by leaving alone the "last" lane.  That's because SLP discovery
with all lanes will fail too soon to get us any hint on which
lane to strip and likewise we don't know what vector modes the
target supports so restricting ourselves to power-of-two or
other cases isn't easy.
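
For readers who don't have the vectorizer internals paged in, here is a
minimal standalone sketch of the idea, outside of GCC.  All names in it
(SplitLanes, strip_to_even, reduce_plus) are made up for illustration and
are not part of the patch; a plain sum over the even group stands in for
the vector reduction, and addition is assumed as the reduction operation.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <utility>
#include <vector>

// Hypothetical model of one BB reduction chain: the operands that are
// handed to SLP discovery and the ones that stay scalar.
struct SplitLanes {
  std::vector<double> vectorized;  // even-sized group to vectorize
  std::vector<double> remain;      // lanes left as scalar ops
};

// Make the number of lanes even by leaving alone the "last" lane.
SplitLanes strip_to_even(std::vector<double> lanes) {
  SplitLanes out;
  if (lanes.size() & 1) {
    out.remain.push_back(lanes.back());
    lanes.pop_back();
  }
  out.vectorized = std::move(lanes);
  assert(out.vectorized.size() % 2 == 0);
  return out;
}

// Reduce the even group, then fold the remaining scalar lanes in.
double reduce_plus(const SplitLanes &s) {
  // Stand-in for the vector reduction of the vectorized lanes.
  double scalar_def =
      std::accumulate(s.vectorized.begin(), s.vectorized.end(), 0.0);
  if (!s.remain.empty()) {
    // Combine the leftover lanes among themselves first ...
    double rem_def = s.remain[0];
    for (std::size_t i = 1; i < s.remain.size(); ++i)
      rem_def += s.remain[i];
    // ... then merge that with the reduced vector result.
    scalar_def += rem_def;
  }
  return scalar_def;
}
```

With five lanes, four go to the vectorized group and the fifth is added
back as a scalar afterwards.  Note that this changes the order of the
additions compared to a left-to-right scalar chain, which is one reason
the testcase is compiled with -ffast-math.
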
This is enough to get at the vectorization opportunity for the
testcase in the PR - albeit with the chosen lanes not optimal
but at least vectorizable.

Bootstrap and regtest running on x86_64-unknown-linux-gnu.

I failed to write a small testcase because of that "optimal"
lane selection and PR110935.

	PR tree-optimization/49955
	* tree-vectorizer.h (_slp_instance::remain_stmts): New.
	(SLP_INSTANCE_REMAIN_STMTS): Likewise.
	* tree-vect-slp.cc (vect_free_slp_instance): Release
	SLP_INSTANCE_REMAIN_STMTS.
	(vect_build_slp_instance): Make the number of lanes of
	a BB reduction even.
	(vectorize_slp_instance_root_stmt): Handle unvectorized
	defs of a BB reduction.

	* gfortran.dg/vect/pr49955.f: New testcase.
---
 gcc/testsuite/gfortran.dg/vect/pr49955.f | 38 ++++++++++++++++++++++++
 gcc/tree-vect-slp.cc                     | 30 ++++++++++++++++++-
 gcc/tree-vectorizer.h                    |  5 ++++
 3 files changed, 72 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gfortran.dg/vect/pr49955.f

diff --git a/gcc/testsuite/gfortran.dg/vect/pr49955.f b/gcc/testsuite/gfortran.dg/vect/pr49955.f
new file mode 100644
index 00000000000..a73cd5ada03
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/vect/pr49955.f
@@ -0,0 +1,38 @@
+! { dg-do compile }
+! { dg-additional-options "-ffast-math -fdump-tree-slp1" }
+
+      subroutine shell(nx,ny,nz,q,dt,cfl,dx,dy,dz,cfll,gm,Pr,Re)
+      implicit none
+      integer nx,ny,nz,i,j,k
+      real*8 cfl,dx,dy,dz,dt
+      real*8 gm,Re,Pr,cfll,t1,t2,t3,t4,t5,t6,t7,t8,mu
+      real*8 q(5,nx,ny,nz)
+
+      if (cfll.ge.cfl) cfll=cfl
+      t8=0.0d0
+
+      do k=1,nz
+         do j=1,ny
+            do i=1,nx
+               t1=q(1,i,j,k)
+               t2=q(2,i,j,k)/t1
+               t3=q(3,i,j,k)/t1
+               t4=q(4,i,j,k)/t1
+               t5=(gm-1.0d0)*(q(5,i,j,k)-0.5d0*t1*(t2*t2+t3*t3+t4*t4))
+               t6=dSQRT(gm*t5/t1)
+               mu=gm*Pr*(gm*t5/t1)**0.75d0*2.0d0/Re/t1
+               t7=((dabs(t2)+t6)/dx+mu/dx**2)**2 +
+     1            ((dabs(t3)+t6)/dy+mu/dy**2)**2 +
+     2            ((dabs(t4)+t6)/dz+mu/dz**2)**2
+               t7=DSQRT(t7)
+               t8=max(t8,t7)
+            enddo
+         enddo
+      enddo
+      dt=cfll / t8
+
+      return
+      end
+
+! We don't have an effective target for reduc_plus_scal optab support
+! { dg-final { scan-tree-dump ".REDUC_PLUS" "slp1" { target x86_64-*-* } } }
diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index eab3dcd40ec..070ab3ff7ae 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -209,6 +209,7 @@ vect_free_slp_instance (slp_instance instance)
   vect_free_slp_tree (SLP_INSTANCE_TREE (instance));
   SLP_INSTANCE_LOADS (instance).release ();
   SLP_INSTANCE_ROOT_STMTS (instance).release ();
+  SLP_INSTANCE_REMAIN_STMTS (instance).release ();
   instance->subgraph_entries.release ();
   instance->cost_vec.release ();
   free (instance);
@@ -3128,6 +3129,16 @@ vect_build_slp_instance (vec_info *vinfo,
                          " %G", scalar_stmts[i]->stmt);
     }
 
+  /* When a BB reduction doesn't have an even number of lanes
+     strip it down, treating the remaining lane as scalar.
+     ???  Selecting the optimal set of lanes to vectorize would be nice
+     but SLP build for all lanes will fail quickly because we think
+     we're going to need unrolling.  */
+  auto_vec<stmt_vec_info> remain;
+  if (kind == slp_inst_kind_bb_reduc
+      && (scalar_stmts.length () & 1))
+    remain.safe_push (scalar_stmts.pop ());
+
   /* Build the tree for the SLP instance.  */
   unsigned int group_size = scalar_stmts.length ();
   bool *matches = XALLOCAVEC (bool, group_size);
@@ -3175,6 +3186,10 @@ vect_build_slp_instance (vec_info *vinfo,
           SLP_INSTANCE_UNROLLING_FACTOR (new_instance) = unrolling_factor;
           SLP_INSTANCE_LOADS (new_instance) = vNULL;
           SLP_INSTANCE_ROOT_STMTS (new_instance) = root_stmt_infos;
+          if (!remain.is_empty ())
+            SLP_INSTANCE_REMAIN_STMTS (new_instance) = remain.copy ();
+          else
+            SLP_INSTANCE_REMAIN_STMTS (new_instance) = vNULL;
           SLP_INSTANCE_KIND (new_instance) = kind;
           new_instance->reduc_phis = NULL;
           new_instance->cost_vec = vNULL;
@@ -9138,7 +9153,20 @@ vectorize_slp_instance_root_stmt (slp_tree node, slp_instance instance)
         gcc_unreachable ();
       tree scalar_def = gimple_build (&epilogue, as_combined_fn (reduc_fn),
                                       TREE_TYPE (TREE_TYPE (vec_def)), vec_def);
-
+      if (!SLP_INSTANCE_REMAIN_STMTS (instance).is_empty ())
+        {
+          tree rem_def = NULL_TREE;
+          for (auto rem : SLP_INSTANCE_REMAIN_STMTS (instance))
+            if (!rem_def)
+              rem_def = gimple_get_lhs (rem->stmt);
+            else
+              rem_def = gimple_build (&epilogue, reduc_code,
+                                      TREE_TYPE (scalar_def),
+                                      rem_def, gimple_get_lhs (rem->stmt));
+          scalar_def = gimple_build (&epilogue, reduc_code,
+                                     TREE_TYPE (scalar_def),
+                                     scalar_def, rem_def);
+        }
       gimple_stmt_iterator rgsi = gsi_for_stmt (instance->root_stmts[0]->stmt);
       gsi_insert_seq_before (&rgsi, epilogue, GSI_SAME_STMT);
       gimple_assign_set_rhs_from_tree (&rgsi, scalar_def);
diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
index a65161499ea..dea29a74ebb 100644
--- a/gcc/tree-vectorizer.h
+++ b/gcc/tree-vectorizer.h
@@ -257,6 +257,10 @@ public:
      from, NULL otherwise.  */
   vec<stmt_vec_info> root_stmts;
 
+  /* For slp_inst_kind_bb_reduc the defs that were not vectorized, NULL
+     otherwise.  */
+  vec<stmt_vec_info> remain_stmts;
+
   /* The unrolling factor required to vectorized this SLP instance.  */
   poly_uint64 unrolling_factor;
 
@@ -285,6 +289,7 @@ public:
 #define SLP_INSTANCE_UNROLLING_FACTOR(S)         (S)->unrolling_factor
 #define SLP_INSTANCE_LOADS(S)                    (S)->loads
 #define SLP_INSTANCE_ROOT_STMTS(S)               (S)->root_stmts
+#define SLP_INSTANCE_REMAIN_STMTS(S)             (S)->remain_stmts
 #define SLP_INSTANCE_KIND(S)                     (S)->kind
 
 #define SLP_TREE_CHILDREN(S)                     (S)->children