From patchwork Fri Sep 16 08:05:09 2022
X-Patchwork-Submitter: Richard Sandiford
X-Patchwork-Id: 1249
To: gcc-patches@gcc.gnu.org
Mail-Followup-To: gcc-patches@gcc.gnu.org, rguenther@suse.de,
	richard.sandiford@arm.com
Subject: [PATCH] vect: Fix SLP layout handling of masked loads [PR106794]
Date: Fri, 16 Sep 2022 09:05:09 +0100
From: Richard Sandiford <richard.sandiford@arm.com>
Cc: rguenther@suse.de

PR106794 shows that I'd forgotten about masked loads when doing the
SLP layout changes.  These loads can't currently be permuted
independently of their mask input, so during construction they never
get a load permutation.
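(For context, here is a hand-written sketch of the kind of loop this is
about; it is illustrative only, with made-up names, and is not taken
from the PR.  After if-conversion, each guarded read of z[] becomes an
IFN_MASK_LOAD whose mask operand comes from the y[] comparisons, so the
SLP node built for those loads has a mask input but no load
permutation.)

/* Illustrative sketch only.  On a target with masked vector loads,
   the guarded reads of z[] are if-converted to IFN_MASK_LOADs.  */
void
masked_load_example (int *restrict x, int *restrict y,
		     int *restrict z, int n)
{
  for (int i = 0; i < n; i += 2)
    {
      x[i] = y[i] ? z[i] : 1;
      x[i + 1] = y[i + 1] ? z[i + 1] : 2;
    }
}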
(If we did support permuting masked loads in future, the mask would
need to be in the right order for the load, rather than in the order
implied by the result of the permutation.  Since masked loads can't be
partly or fully scalarised in the way that normal permuted loads can
be, there's probably no benefit to fusing the permutation and the
load.  Permutation after the fact is probably good enough.)

Tested on aarch64-linux-gnu & x86_64-linux-gnu.  OK to install?

Richard


gcc/
	PR tree-optimization/106794
	PR tree-optimization/106914
	* tree-vect-slp.cc (vect_optimize_slp_pass::internal_node_cost):
	Only consider loads that already have a permutation.
	(vect_optimize_slp_pass::start_choosing_layouts): Assert that
	loads with permutations are leaf nodes.  Prevent any kind of
	grouped access from changing layout if it doesn't have a load
	permutation.

gcc/testsuite/
	* gcc.dg/vect/pr106914.c: New test.
	* g++.dg/vect/pr106794.cc: Likewise.
---
 gcc/testsuite/g++.dg/vect/pr106794.cc | 40 +++++++++++++++++++++++++++
 gcc/testsuite/gcc.dg/vect/pr106914.c  | 15 ++++++++++
 gcc/tree-vect-slp.cc                  | 30 ++++++++++++++------
 3 files changed, 77 insertions(+), 8 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/vect/pr106794.cc
 create mode 100644 gcc/testsuite/gcc.dg/vect/pr106914.c

diff --git a/gcc/testsuite/g++.dg/vect/pr106794.cc b/gcc/testsuite/g++.dg/vect/pr106794.cc
new file mode 100644
index 00000000000..f056563c4e1
--- /dev/null
+++ b/gcc/testsuite/g++.dg/vect/pr106794.cc
@@ -0,0 +1,40 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-Ofast" } */
+/* { dg-additional-options "-march=bdver2" { target x86_64-*-* i?86-*-* } } */
+
+template <typename T> struct Vector3 {
+  Vector3();
+  Vector3(T, T, T);
+  T length() const;
+  T x, y, z;
+};
+template <typename T>
+Vector3<T>::Vector3(T _x, T _y, T _z) : x(_x), y(_y), z(_z) {}
+Vector3<float> cross(Vector3<float> a, Vector3<float> b) {
+  return Vector3<float>(a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z,
+                        a.x * b.y - a.y * b.x);
+}
+template <typename T> T Vector3<T>::length() const { return z; }
+int generateNormals_i;
+float generateNormals_p2_0, generateNormals_p0_0;
+struct SphereMesh {
+  void generateNormals();
+  float vertices;
+};
+void SphereMesh::generateNormals() {
+  Vector3<float> *faceNormals = new Vector3<float>;
+  for (int j; j; j++) {
+    float *p0 = &vertices + 3, *p1 = &vertices + j * 3, *p2 = &vertices + 3,
+          *p3 = &vertices + generateNormals_i + j * 3;
+    Vector3<float> v0(p1[0] - generateNormals_p0_0, p1[1] - 1, p1[2] - 2),
+        v1(0, 1, 2);
+    if (v0.length())
+      v1 = Vector3<float>(p3[0] - generateNormals_p2_0, p3[1] - p2[1],
+                          p3[2] - p2[2]);
+    else
+      v1 = Vector3<float>(generateNormals_p0_0 - p3[0], p0[1] - p3[1],
+                          p0[2] - p3[2]);
+    Vector3<float> faceNormal = cross(v0, v1);
+    faceNormals[j] = faceNormal;
+  }
+}
diff --git a/gcc/testsuite/gcc.dg/vect/pr106914.c b/gcc/testsuite/gcc.dg/vect/pr106914.c
new file mode 100644
index 00000000000..9d9b3e30081
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/pr106914.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-fprofile-generate" } */
+/* { dg-additional-options "-mavx512vl" { target x86_64-*-* i?86-*-* } } */
+
+int *mask_slp_int64_t_8_2_x, *mask_slp_int64_t_8_2_y, *mask_slp_int64_t_8_2_z;
+
+void
+__attribute__mask_slp_int64_t_8_2() {
+  for (int i; i; i += 8) {
+    mask_slp_int64_t_8_2_x[i + 6] =
+        mask_slp_int64_t_8_2_y[i + 6] ? mask_slp_int64_t_8_2_z[i] : 1;
+    mask_slp_int64_t_8_2_x[i + 7] =
+        mask_slp_int64_t_8_2_y[i + 7] ? mask_slp_int64_t_8_2_z[i + 7] : 2;
+  }
+}
diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index ca3422c2a1e..229f2663ebc 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -4494,7 +4494,8 @@ vect_optimize_slp_pass::internal_node_cost (slp_tree node, int in_layout_i,
   stmt_vec_info rep = SLP_TREE_REPRESENTATIVE (node);
   if (rep
       && STMT_VINFO_DATA_REF (rep)
-      && DR_IS_READ (STMT_VINFO_DATA_REF (rep)))
+      && DR_IS_READ (STMT_VINFO_DATA_REF (rep))
+      && SLP_TREE_LOAD_PERMUTATION (node).exists ())
     {
       auto_load_permutation_t tmp_perm;
       tmp_perm.safe_splice (SLP_TREE_LOAD_PERMUTATION (node));
@@ -4569,8 +4570,12 @@ vect_optimize_slp_pass::start_choosing_layouts ()
 	if (SLP_TREE_LOAD_PERMUTATION (node).exists ())
 	  {
 	    /* If splitting out a SLP_TREE_LANE_PERMUTATION can make the node
-	       unpermuted, record a layout that reverses this permutation.  */
-	    gcc_assert (partition.layout == 0);
+	       unpermuted, record a layout that reverses this permutation.
+
+	       We would need more work to cope with loads that are internally
+	       permuted and also have inputs (such as masks for
+	       IFN_MASK_LOADs).  */
+	    gcc_assert (partition.layout == 0 && !m_slpg->vertices[node_i].succ);
 	    if (!STMT_VINFO_GROUPED_ACCESS (dr_stmt))
 	      continue;
 	    dr_stmt = DR_GROUP_FIRST_ELEMENT (dr_stmt);
@@ -4684,12 +4689,21 @@ vect_optimize_slp_pass::start_choosing_layouts ()
 	vertex.weight = vect_slp_node_weight (node);
 
 	/* We do not handle stores with a permutation, so all
-	   incoming permutations must have been materialized.  */
+	   incoming permutations must have been materialized.
+
+	   We also don't handle masked grouped loads, which lack a
+	   permutation vector.  In this case the memory locations
+	   form an implicit second input to the loads, on top of the
+	   explicit mask input, and the memory input's layout cannot
+	   be changed.
+
+	   On the other hand, we do support permuting gather loads and
+	   masked gather loads, where each scalar load is independent
+	   of the others.  This can be useful if the address/index input
+	   benefits from permutation.  */
 	if (STMT_VINFO_DATA_REF (rep)
-	    && DR_IS_WRITE (STMT_VINFO_DATA_REF (rep)))
-	  /* ??? We're forcing materialization in place
-	     of the child here, we'd need special handling
-	     in materialization to leave layout -1 here.  */
+	    && STMT_VINFO_GROUPED_ACCESS (rep)
+	    && !SLP_TREE_LOAD_PERMUTATION (node).exists ())
 	  partition.layout = 0;
 
 	/* We cannot change the layout of an operation that is
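(To illustrate the distinction drawn in the new comment, here is a
hand-written sketch with made-up names, not code from the patch or the
testsuite.  The first loop forms a contiguous masked group, whose
memory layout is now pinned; the second uses indexed accesses that
become masked gather loads, whose lanes remain independently
permutable.)

/* Hand-written illustration, not part of the patch.  */

/* Contiguous masked group: the conditional loads of a[i] and a[i + 1]
   get no load permutation, so the pass keeps their layout
   (partition.layout == 0).  */
void
masked_group (double *restrict out, double *restrict a,
	      int *restrict cond, int n)
{
  for (int i = 0; i < n; i += 2)
    {
      out[i] = cond[i] ? a[i] : 0.0;
      out[i + 1] = cond[i + 1] ? a[i + 1] : 0.0;
    }
}

/* Masked gather: each scalar load a[idx[...]] is independent of the
   others, so the lanes can still be permuted if the idx[] input
   benefits from a different layout.  */
void
masked_gather (double *restrict out, double *restrict a,
	       int *restrict cond, int *restrict idx, int n)
{
  for (int i = 0; i < n; i += 2)
    {
      out[i] = cond[i] ? a[idx[i]] : 0.0;
      out[i + 1] = cond[i + 1] ? a[idx[i + 1]] : 0.0;
    }
}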