Date: Mon, 18 Jul 2022 11:19:12 +0000 (UTC)
From: Richard Biener via Gcc-patches
Reply-To: Richard Biener
To: gcc-patches@gcc.gnu.org
Subject: [PATCH] Improve common reduction vs builtin code generation in loop distribution

loop distribution currently cannot handle the situation when the last
partition is a builtin but there's a common reduction in all partitions
(like the final IV value).  The following lifts this restriction by
making the last non-builtin partition provide the definitions for the
loop-closed PHI nodes.
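The situation can be pictured at source level. The sketch below is a hand-written C++ illustration, not GCC output. `original_loop`, `distributed_loop` and `check_equivalent` are hypothetical names, and the pass itself operates on GIMPLE. It assumes that `a[i] = 0` is recognized as a memset partition and that the final value of `i` is the common reduction present in all partitions. It shows a normal partition emitted first that supplies that value, so the last partition can remain a builtin call instead of being forced back into a loop. Whether GCC picks this exact partition order for this input is not claimed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Original loop: one statement is a memset candidate, one is not, and the
// final IV value `i` is live after the loop (a common reduction that needs
// a loop-closed PHI).
std::size_t original_loop(int *a, int *b, const int *c, std::size_t n) {
  std::size_t i;
  for (i = 0; i < n; ++i) {
    a[i] = 0;         // memset candidate (builtin partition)
    b[i] = c[i] * 2;  // stays a normal (loop) partition
  }
  return i;
}

// Hand-distributed equivalent: the normal partition runs first as a loop
// and provides the final IV value, so the last partition can be a plain
// memset call.
std::size_t distributed_loop(int *a, int *b, const int *c, std::size_t n) {
  std::size_t i;
  for (i = 0; i < n; ++i)
    b[i] = c[i] * 2;
  std::memset(a, 0, n * sizeof(int));
  return i;  // value the LC PHI now takes from the first loop
}

// Runs both versions on identical inputs and checks that they agree on the
// returned IV value and on the final contents of both arrays.
void check_equivalent(const int *c, std::size_t n) {
  assert(n <= 16);
  int a1[16], b1[16], a2[16], b2[16];
  for (std::size_t k = 0; k < 16; ++k)
    a1[k] = a2[k] = b1[k] = b2[k] = 7;
  std::size_t r1 = original_loop(a1, b1, c, n);
  std::size_t r2 = distributed_loop(a2, b2, c, n);
  assert(r1 == r2 && r1 == n);
  assert(std::memcmp(a1, a2, sizeof a1) == 0);
  assert(std::memcmp(b1, b2, sizeof b1) == 0);
}
```

Before this patch, such a last builtin partition was reclassified as a normal loop (or distribution gave up when only patterns were requested), because nothing else could supply the live-out value of `i`.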
Since we have heuristics in place to avoid code generating builtins
last, writing a testcase is difficult (but I ran into a case with other
pending patches that made the heuristic ineffective).  What's remaining
is the inability to preserve common reductions when all partitions
could be builtins (in some cases final value replacement could come to
the rescue here).

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

	* tree-loop-distribution.cc (copy_loop_before): Add
	the ability to replace the original LC PHI defs.
	(generate_loops_for_partition): Pass through a flag
	whether to redirect original LC PHI defs.
	(generate_code_for_partition): Likewise.
	(loop_distribution::distribute_loop): Compute the partition
	that should provide the LC PHI defs for common reductions
	and pass that down.
---
 gcc/tree-loop-distribution.cc | 64 ++++++++++++++++++++++++-----------
 1 file changed, 45 insertions(+), 19 deletions(-)

diff --git a/gcc/tree-loop-distribution.cc b/gcc/tree-loop-distribution.cc
index ed7f432f322..0714bc41a43 100644
--- a/gcc/tree-loop-distribution.cc
+++ b/gcc/tree-loop-distribution.cc
@@ -942,7 +942,7 @@ stmt_has_scalar_dependences_outside_loop (loop_p loop, gimple *stmt)
 /* Return a copy of LOOP placed before LOOP.  */
 
 static class loop *
-copy_loop_before (class loop *loop)
+copy_loop_before (class loop *loop, bool redirect_lc_phi_defs)
 {
   class loop *res;
   edge preheader = loop_preheader_edge (loop);
@@ -950,6 +950,24 @@ copy_loop_before (class loop *loop)
   initialize_original_copy_tables ();
   res = slpeel_tree_duplicate_loop_to_edge_cfg (loop, NULL, preheader);
   gcc_assert (res != NULL);
+
+  /* When a not last partition is supposed to keep the LC PHIs computed
+     adjust their definitions.  */
+  if (redirect_lc_phi_defs)
+    {
+      edge exit = single_exit (loop);
+      for (gphi_iterator si = gsi_start_phis (exit->dest); !gsi_end_p (si);
+           gsi_next (&si))
+        {
+          gphi *phi = si.phi ();
+          if (virtual_operand_p (gimple_phi_result (phi)))
+            continue;
+          use_operand_p use_p = PHI_ARG_DEF_PTR_FROM_EDGE (phi, exit);
+          tree new_def = get_current_def (USE_FROM_PTR (use_p));
+          SET_USE (use_p, new_def);
+        }
+    }
+
   free_original_copy_tables ();
   delete_update_ssa ();
 
@@ -977,7 +995,7 @@ create_bb_after_loop (class loop *loop)
 
 static void
 generate_loops_for_partition (class loop *loop, partition *partition,
-                              bool copy_p)
+                              bool copy_p, bool keep_lc_phis_p)
 {
   unsigned i;
   basic_block *bbs;
@@ -985,7 +1003,7 @@ generate_loops_for_partition (class loop *loop, partition *partition,
   if (copy_p)
     {
       int orig_loop_num = loop->orig_loop_num;
-      loop = copy_loop_before (loop);
+      loop = copy_loop_before (loop, keep_lc_phis_p);
       gcc_assert (loop != NULL);
       loop->orig_loop_num = orig_loop_num;
       create_preheader (loop, CP_SIMPLE_PREHEADERS);
@@ -1336,7 +1354,8 @@ destroy_loop (class loop *loop)
 
 static bool
 generate_code_for_partition (class loop *loop,
-                             partition *partition, bool copy_p)
+                             partition *partition, bool copy_p,
+                             bool keep_lc_phis_p)
 {
   switch (partition->kind)
     {
@@ -1345,7 +1364,8 @@ generate_code_for_partition (class loop *loop,
       /* Reductions all have to be in the last partition.  */
       gcc_assert (!partition_reduction_p (partition)
                   || !copy_p);
-      generate_loops_for_partition (loop, partition, copy_p);
+      generate_loops_for_partition (loop, partition, copy_p,
+                                    keep_lc_phis_p);
       return false;
 
     case PKIND_MEMSET:
@@ -3013,6 +3033,7 @@ loop_distribution::distribute_loop (class loop *loop,
 
   bool any_builtin = false;
   bool reduction_in_all = false;
+  int reduction_partition_num = -1;
   FOR_EACH_VEC_ELT (partitions, i, partition)
     {
       reduction_in_all
@@ -3092,10 +3113,13 @@ loop_distribution::distribute_loop (class loop *loop,
     }
 
   /* Put a non-builtin partition last if we need to preserve a reduction.
-     ??? This is a workaround that makes sort_partitions_by_post_order do
-     the correct thing while in reality it should sort each component
-     separately and then put the component with a reduction or a non-builtin
-     last.  */
+     In most cases this helps to keep a normal partition last avoiding to
+     spill a reduction result across builtin calls.
+     ??? The proper way would be to use dependences to see whether we
+     can move builtin partitions earlier during merge_dep_scc_partitions
+     and its sort_partitions_by_post_order.  Especially when the
+     dependence graph is composed of multiple independent subgraphs the
+     heuristic does not work reliably.  */
   if (reduction_in_all
       && partition_builtin_p (partitions.last()))
     FOR_EACH_VEC_ELT (partitions, i, partition)
@@ -3126,19 +3150,20 @@ loop_distribution::distribute_loop (class loop *loop,
 
   finalize_partitions (loop, &partitions, &alias_ddrs);
 
-  /* If there is a reduction in all partitions make sure the last one
-     is not classified for builtin code generation.  */
+  /* If there is a reduction in all partitions make sure the last
+     non-builtin partition provides the LC PHI defs.  */
   if (reduction_in_all)
     {
-      partition = partitions.last ();
-      if (only_patterns_p
-          && partition_builtin_p (partition)
-          && !partition_builtin_p (partitions[0]))
+      FOR_EACH_VEC_ELT (partitions, i, partition)
+        if (!partition_builtin_p (partition))
+          reduction_partition_num = i;
+      if (reduction_partition_num == -1)
         {
-          nbp = 0;
-          goto ldist_done;
+          /* If all partitions are builtin, force the last one to
+             be code generated as normal partition.  */
+          partition = partitions.last ();
+          partition->kind = PKIND_NORMAL;
         }
-      partition->kind = PKIND_NORMAL;
     }
 
   nbp = partitions.length ();
@@ -3164,7 +3189,8 @@ loop_distribution::distribute_loop (class loop *loop,
     {
       if (partition_builtin_p (partition))
         (*nb_calls)++;
-      *destroy_p |= generate_code_for_partition (loop, partition, i < nbp - 1);
+      *destroy_p |= generate_code_for_partition (loop, partition, i < nbp - 1,
+                                                 i == reduction_partition_num);
     }
 
  ldist_done:
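The partition selection added to loop_distribution::distribute_loop can be modeled outside GCC. This is a simplified sketch under stated assumptions: `Kind`, `Partition`, `is_builtin`, `select_reduction_partition`, `CodegenFlags` and `flags_for` are hypothetical stand-ins for GCC's partition kinds, partition_builtin_p, and the copy_p/keep_lc_phis_p arguments passed to generate_code_for_partition. Only the control flow is meant to mirror the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for GCC's partition kinds (PKIND_NORMAL,
// PKIND_MEMSET, ...); only the normal/builtin distinction matters here.
enum class Kind { Normal, Memset, Memcpy };

struct Partition {
  Kind kind;
};

// Stand-in for partition_builtin_p: every non-normal kind counts as builtin.
static bool is_builtin(const Partition &p) { return p.kind != Kind::Normal; }

// Mirrors the new block in distribute_loop: with a reduction in all
// partitions, remember the index of the last non-builtin partition.  If
// there is none, force the last partition to be generated as a normal
// loop and leave the index at -1.
int select_reduction_partition(std::vector<Partition> &parts,
                               bool reduction_in_all) {
  int reduction_partition_num = -1;
  if (!reduction_in_all || parts.empty())
    return reduction_partition_num;
  for (std::size_t i = 0; i < parts.size(); ++i)
    if (!is_builtin(parts[i]))
      reduction_partition_num = static_cast<int>(i);
  if (reduction_partition_num == -1)
    parts.back().kind = Kind::Normal;
  return reduction_partition_num;
}

// The two flags passed per partition: every partition but the last is a
// copy placed before the loop (copy_p), and only the selected partition
// redirects the LC PHI defs (keep_lc_phis_p).  The redirection happens in
// copy_loop_before, so keep_lc_phis_p only matters when copy_p is true.
struct CodegenFlags {
  bool copy_p;
  bool keep_lc_phis_p;
};

CodegenFlags flags_for(std::size_t i, std::size_t nbp,
                       int reduction_partition_num) {
  assert(nbp > 0 && i < nbp);
  return {i < nbp - 1, static_cast<int>(i) == reduction_partition_num};
}
```

Leaving the index at -1 when every partition is a builtin matches the patch: the forced-normal last partition is generated from the original loop itself, whose exit PHIs already refer to its own definitions, so no redirection is needed.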