From patchwork Fri Nov 25 02:13:38 2022
X-Patchwork-Submitter: Sandra Loosemore
X-Patchwork-Id: 25781
Date: Thu, 24 Nov 2022 19:13:38 -0700
To: "gcc-patches@gcc.gnu.org"
CC: Jakub Jelinek
From: Sandra Loosemore
Subject: [PATCH] [OpenMP] GC unused SIMD clones

This patch is a followup to my not-yet-reviewed patch

[PATCH v4] OpenMP: Generate SIMD clones for functions with "declare target"
https://gcc.gnu.org/pipermail/gcc-patches/2022-November/606218.html

In comments on a previous iteration of that patch, I was asked to
delete unused SIMD clones to avoid code bloat; this patch does that.

I've implemented something like a simple mark-and-sweep algorithm.
Local clones start out flagged as GC candidates, and a clone is marked
as used (the flag is cleared) at the point where the vectorizer
generates a call to it.  The loop in expand_all_functions that iterates
over functions to apply the passes after IPA is modified to defer
processing of unmarked clones, and anything still unmarked once a pass
makes no further progress is deleted.

OK to commit this along with the above-linked patch?
-Sandra

From bfffcea926d4dfb6275346237c61922a95c9e715 Mon Sep 17 00:00:00 2001
From: Sandra Loosemore
Date: Wed, 23 Nov 2022 23:14:31 +0000
Subject: [PATCH] [OpenMP] GC unused SIMD clones

SIMD clones are created during the IPA phase, when it is not yet known
whether the vectorizer can use them.  Clones for functions with external
linkage are part of the ABI, but local clones can be GC'ed if no calls to
them are found in the compilation unit after vectorization.

gcc/ChangeLog
	* cgraph.h (struct cgraph_node): Add gc_candidate bit, modify
	default constructor to initialize it.
	* cgraphunit.cc (expand_all_functions): Save gc_candidate functions
	for last and iterate to handle recursive calls.  Delete leftover
	candidates at the end.
	* omp-simd-clone.cc (simd_clone_create): Set gc_candidate bit
	on local clones.
	* tree-vect-stmts.cc (vectorizable_simd_clone_call): Clear
	gc_candidate bit when a clone is used.

gcc/testsuite/ChangeLog
	* g++.dg/gomp/target-simd-clone-1.C: Tweak to test that the unused
	clone is GC'ed.
	* gcc.dg/gomp/target-simd-clone-1.c: Likewise.
---
 gcc/cgraph.h                                  |  7 ++-
 gcc/cgraphunit.cc                             | 49 ++++++++++++++++---
 gcc/omp-simd-clone.cc                         |  5 ++
 .../g++.dg/gomp/target-simd-clone-1.C         |  7 ++-
 .../gcc.dg/gomp/target-simd-clone-1.c         |  6 ++-
 gcc/tree-vect-stmts.cc                        |  3 ++
 6 files changed, 66 insertions(+), 11 deletions(-)

diff --git a/gcc/cgraph.h b/gcc/cgraph.h
index 4be67e3cea9..b065677a8d0 100644
--- a/gcc/cgraph.h
+++ b/gcc/cgraph.h
@@ -891,7 +891,8 @@ struct GTY((tag ("SYMTAB_FUNCTION"))) cgraph_node : public symtab_node
     versionable (false), can_change_signature (false),
     redefined_extern_inline (false), tm_may_enter_irr (false),
     ipcp_clone (false), declare_variant_alt (false),
-    calls_declare_variant_alt (false), m_uid (uid), m_summary_id (-1)
+    calls_declare_variant_alt (false), gc_candidate (false),
+    m_uid (uid), m_summary_id (-1)
   {}
 
   /* Remove the node from cgraph and all inline clones inlined into it.
@@ -1490,6 +1491,10 @@ struct GTY((tag ("SYMTAB_FUNCTION"))) cgraph_node : public symtab_node
   unsigned declare_variant_alt : 1;
   /* True if the function calls declare_variant_alt functions.  */
   unsigned calls_declare_variant_alt : 1;
+  /* True if the function should only be emitted if it is used.  This flag
+     is set for local SIMD clones when they are created and cleared if the
+     vectorizer uses them.  */
+  unsigned gc_candidate : 1;
 
  private:
   /* Unique id of the node.  */
diff --git a/gcc/cgraphunit.cc b/gcc/cgraphunit.cc
index b05d790bf8d..587daf5674e 100644
--- a/gcc/cgraphunit.cc
+++ b/gcc/cgraphunit.cc
@@ -1996,19 +1996,52 @@ expand_all_functions (void)
 
   /* Output functions in RPO so callees get optimized before callers.  This
      makes ipa-ra and other propagators to work.
-     FIXME: This is far from optimal code layout.  */
-  for (i = new_order_pos - 1; i >= 0; i--)
-    {
-      node = order[i];
+     FIXME: This is far from optimal code layout.
+     Make multiple passes over the list to defer processing of gc
+     candidates until all potential uses are seen.  */
+  int gc_candidates = 0;
+  int prev_gc_candidates = 0;
 
-      if (node->process)
+  while (1)
+    {
+      for (i = new_order_pos - 1; i >= 0; i--)
         {
-          expanded_func_count++;
-          node->process = 0;
-          node->expand ();
+          node = order[i];
+
+          if (node->gc_candidate)
+            gc_candidates++;
+          else if (node->process)
+            {
+              expanded_func_count++;
+              node->process = 0;
+              node->expand ();
+            }
         }
+      if (!gc_candidates || gc_candidates == prev_gc_candidates)
+        break;
+      prev_gc_candidates = gc_candidates;
+      gc_candidates = 0;
     }
 
+  /* Free any unused gc_candidate functions.  */
+  if (gc_candidates)
+    for (i = new_order_pos - 1; i >= 0; i--)
+      {
+        node = order[i];
+        if (node->gc_candidate)
+          {
+            struct function *fn = DECL_STRUCT_FUNCTION (node->decl);
+            if (symtab->dump_file)
+              fprintf (symtab->dump_file,
+                       "Deleting unused function %s\n",
+                       IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (node->decl)));
+            node->process = false;
+            free_dominance_info (fn, CDI_DOMINATORS);
+            free_dominance_info (fn, CDI_POST_DOMINATORS);
+            node->release_body (false);
+          }
+      }
+
   if (dump_file)
     fprintf (dump_file, "Expanded functions with time profile (%s):%u/%u\n",
              main_input_filename, profiled_func_count, expanded_func_count);
diff --git a/gcc/omp-simd-clone.cc b/gcc/omp-simd-clone.cc
index 21d69aa8747..ea2d68a0def 100644
--- a/gcc/omp-simd-clone.cc
+++ b/gcc/omp-simd-clone.cc
@@ -702,6 +702,11 @@ simd_clone_create (struct cgraph_node *old_node, bool force_local)
         = old_node->calls_declare_variant_alt;
     }
 
+  /* Mark clones with internal linkage as gc'able, so they will not be
+     emitted unless the vectorizer can actually use them.  */
+  if (!TREE_PUBLIC (new_node->decl))
+    new_node->gc_candidate = true;
+
   return new_node;
 }
 
diff --git a/gcc/testsuite/g++.dg/gomp/target-simd-clone-1.C b/gcc/testsuite/g++.dg/gomp/target-simd-clone-1.C
index 10b5ac38812..b96473642bb 100644
--- a/gcc/testsuite/g++.dg/gomp/target-simd-clone-1.C
+++ b/gcc/testsuite/g++.dg/gomp/target-simd-clone-1.C
@@ -1,5 +1,5 @@
 /* { dg-options "-fopenmp -O2" } */
-/* { dg-additional-options "-fopenmp-target-simd-clone=any -fdump-ipa-simdclone-details" } */
+/* { dg-additional-options "-fopenmp-target-simd-clone=any -fdump-ipa-simdclone-details -fdump-ipa-cgraph" } */
 
 /* Test that simd clones are generated for functions with "declare target".  */
 
@@ -23,3 +23,8 @@ void callit (int *a, int *b, int *c)
 
 /* { dg-final { scan-ipa-dump "Generated local clone _ZGV.*N.*__Z5additii" "simdclone" { target x86_64-*-* } } } */
 /* { dg-final { scan-ipa-dump "Generated local clone _ZGV.*M.*__Z5additii" "simdclone" { target x86_64-*-* } } } */
+
+/* Only the "N" clone is used.  The other one should be GC'ed.  */
+
+/* { dg-final { scan-ipa-dump "Deleting unused function _ZGV.*M.*__Z5additii" "cgraph" { target x86_64-*-* } } } */
+
diff --git a/gcc/testsuite/gcc.dg/gomp/target-simd-clone-1.c b/gcc/testsuite/gcc.dg/gomp/target-simd-clone-1.c
index 388dc2a956c..0d74aa971f9 100644
--- a/gcc/testsuite/gcc.dg/gomp/target-simd-clone-1.c
+++ b/gcc/testsuite/gcc.dg/gomp/target-simd-clone-1.c
@@ -1,5 +1,5 @@
 /* { dg-options "-fopenmp -O2" } */
-/* { dg-additional-options "-fopenmp-target-simd-clone=any -fdump-ipa-simdclone-details" } */
+/* { dg-additional-options "-fopenmp-target-simd-clone=any -fdump-ipa-simdclone-details -fdump-ipa-cgraph" } */
 
 /* Test that simd clones are generated for functions with "declare target".  */
 
@@ -23,3 +23,7 @@ void callit (int *a, int *b, int *c)
 
 /* { dg-final { scan-ipa-dump "Generated local clone _ZGV.*N.*_addit" "simdclone" { target x86_64-*-* } } } */
 /* { dg-final { scan-ipa-dump "Generated local clone _ZGV.*M.*_addit" "simdclone" { target x86_64-*-* } } } */
+
+/* Only the "N" clone is used.  The other one should be GC'ed.  */
+
+/* { dg-final { scan-ipa-dump "Deleting unused function _ZGV.*M.*_addit" "cgraph" { target x86_64-*-* } } } */
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index b35b986889d..a5015468d9f 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -4620,6 +4620,9 @@ vectorizable_simd_clone_call (vec_info *vinfo, stmt_vec_info stmt_info,
     }
   vargs.release ();
 
+  /* Mark the clone as no longer being a candidate for GC.  */
+  bestn->gc_candidate = false;
+
   /* The call in STMT might prevent it from being removed in dce.
      We however cannot remove it here, due to the way the ssa name
      it defines is mapped to the new definition.  So just replace
-- 
2.31.1
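To see the deferral loop added to expand_all_functions in isolation, here is a minimal, hypothetical C model of it. This is not GCC code: struct node, expand (), expand_all () and the clone names are made-up stand-ins for cgraph_node, cgraph_node::expand and the patched loop. The uses array is an assumption that models the calls the vectorizer emits (and therefore the gc_candidate bits it clears) while a caller is being expanded, and "deleting" a node only prints a message.

```c
#include <stdbool.h>
#include <stdio.h>

#define MAX_USES 2

/* Hypothetical, simplified stand-in for cgraph_node, keeping only the
   bits the deferral loop reads or writes.  */
struct node
{
  const char *name;
  bool process;        /* Still waiting to be expanded.  */
  bool gc_candidate;   /* Local SIMD clone with no use seen yet.  */
  bool expanded;       /* Set once expand () has run on the node.  */
  int uses[MAX_USES];  /* Indices of clones it calls; -1 means none.  */
};

/* Stand-in for node->expand ().  In GCC, expanding a function runs the
   vectorizer, and vectorizable_simd_clone_call clears gc_candidate on a
   clone it emits a call to; here that effect is modelled by USES.  */
static void
expand (struct node *nodes, int i)
{
  printf ("expand %s\n", nodes[i].name);
  nodes[i].process = false;
  nodes[i].expanded = true;
  for (int k = 0; k < MAX_USES; k++)
    if (nodes[i].uses[k] >= 0)
      nodes[nodes[i].uses[k]].gc_candidate = false;
}

/* Mirrors the structure of the patched loop: walk the array from the end,
   skip and count gc candidates, and repeat passes until no candidates
   remain or a pass counts as many candidates as the previous one.  Nodes
   still flagged afterwards are "deleted".  Returns the number deleted.  */
static int
expand_all (struct node *nodes, int n)
{
  int gc_candidates = 0;
  int prev_gc_candidates = 0;

  while (1)
    {
      for (int i = n - 1; i >= 0; i--)
        {
          if (nodes[i].gc_candidate)
            gc_candidates++;
          else if (nodes[i].process)
            expand (nodes, i);
        }
      if (!gc_candidates || gc_candidates == prev_gc_candidates)
        break;
      prev_gc_candidates = gc_candidates;
      gc_candidates = 0;
    }

  int deleted = 0;
  if (gc_candidates)
    for (int i = n - 1; i >= 0; i--)
      if (nodes[i].gc_candidate)
        {
          printf ("Deleting unused function %s\n", nodes[i].name);
          nodes[i].process = false;
          deleted++;
        }
  return deleted;
}
```

With the array { callit (uses clone_N_addit), clone_N_addit, clone_M_addit, addit }, where both clones start as candidates, the model expands addit and callit in the first pass, clone_N_addit in the second, and then deletes clone_M_addit, which loosely mirrors the target-simd-clone-1 tests where only the "N" clone is used. Because a node can only lose its gc_candidate bit, a pass that counts as many candidates as the previous pass cannot have expanded anything, so stopping there does not drop a clone that a later pass would have marked as used.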