[OpenMP] GC unused SIMD clones
Checks
Commit Message
This patch is a followup to my not-yet-reviewed patch
[PATCH v4] OpenMP: Generate SIMD clones for functions with "declare target"
https://gcc.gnu.org/pipermail/gcc-patches/2022-November/606218.html
In comments on a previous iteration of that patch, I was asked to do
something to delete unused SIMD clones to avoid code bloat; this is it.
I've implemented something like a simple mark-and-sweep algorithm.
Clones that are used are marked at the point where the call is generated
in the vectorizer. The loop that iterates over functions to apply the
passes after IPA is modified to defer processing of unmarked clones, and
anything left over is deleted.
OK to commit this along with the above-linked patch?
-Sandra
Comments
On Thu, Nov 24, 2022 at 07:13:38PM -0700, Sandra Loosemore wrote:
> This patch is a followup to my not-yet-reviewed patch
>
> [PATCH v4] OpenMP: Generate SIMD clones for functions with "declare target"
>
> https://gcc.gnu.org/pipermail/gcc-patches/2022-November/606218.html
>
> In comments on a previous iteration of that patch, I was asked to do
> something to delete unused SIMD clones to avoid code bloat; this is it.
>
> I've implemented something like a simple mark-and-sweep algorithm. Clones
> that are used are marked at the point where the call is generated in the
> vectorizer. The loop that iterates over functions to apply the passes after
> IPA is modified to defer processing of unmarked clones, and anything left
> over is deleted.
>
> OK to commit this along with the above-linked patch?
I think you want Honza to review this.
Jakub
On 25.11.22 03:13, Sandra Loosemore wrote:
> This patch is a followup to my not-yet-reviewed patch
> [PATCH v4] OpenMP: Generate SIMD clones for functions with "declare
> target"
That patch got reviewed and went into mainline on Nov 15, 2022 as
https://gcc.gnu.org/r13-4309-g309e2d95e3b930c6f15c8a5346b913158404c76d
> In comments on a previous iteration of that patch, I was asked to do
> something to delete unused SIMD clones to avoid code bloat; this is it.
>
> I've implemented something like a simple mark-and-sweep algorithm.
> Clones that are used are marked at the point where the call is
> generated in the vectorizer. The loop that iterates over functions to
> apply the passes after IPA is modified to defer processing of unmarked
> clones, and anything left over is deleted.
Jakub referred to Honza for the review, who wrote yesterday off list (to
me and Sandra):
> I am really sorry for taking so long time. It was busy month for me
> and I was not very keen about the idea, since we had such logic
> implemented many years ago but removed it to be able to determine
> functions to be output early and optimize code layout.
>
> I see that this is not possible with current organization where
> vectorization is run late, so I guess it does make sense to do what you
> are doing.
>
> Patch is OK,
> Honza
Thanks for the review! (And to Sandra: thanks for the patch.)
I leave it to Sandra to commit her patch and only want to update the
gcc-patches@ email. However. I think we can expect a commit tomorrow.
(Today is a holiday at her place - as new year's day fell on a Sunday.)
Thanks and happy new year!
Tobias
-----------------
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955
On 1/2/23 03:20, Tobias Burnus wrote:
> On 25.11.22 03:13, Sandra Loosemore wrote:
>> This patch is a followup to my not-yet-reviewed patch
>> [PATCH v4] OpenMP: Generate SIMD clones for functions with "declare
>> target"
>
> That patch got reviewed and went into mainline on Nov 15, 2022 as
> https://gcc.gnu.org/r13-4309-g309e2d95e3b930c6f15c8a5346b913158404c76d>>
>
>> In comments on a previous iteration of that patch, I was asked to do
>> something to delete unused SIMD clones to avoid code bloat; this is it.
>>
>> I've implemented something like a simple mark-and-sweep algorithm.
>> Clones that are used are marked at the point where the call is
>> generated in the vectorizer. The loop that iterates over functions to
>> apply the passes after IPA is modified to defer processing of unmarked
>> clones, and anything left over is deleted.
>
>
> Jakub referred to Honza for the review, who wrote yesterday off list (to
> me and Sandra):
>
>> I am really sorry for taking so long time. It was busy month for me
>> and I was not very keen about the idea, since we had such logic
>> implemented many years ago but removed it to be able to determine
>> functions to be output early and optimize code layout.
>>
>> I see that this is not possible with current organization where
>> vectorization is run late, so I guess it does make sense to do what you
>> are doing.
>>
>> Patch is OK,
>> Honza
>
> Thanks for the review! (And to Sandra: thanks for the patch.)
>
> I leave it to Sandra to commit her patch and only want to update the
> gcc-patches@ email. However. I think we can expect a commit tomorrow.
> (Today is a holiday at her place - as new year's day fell on a Sunday.)
Yes, the patch is committed now, and also backported to og12. Thanks,
Honza, for the review, and Tobias, for your assistance. I'm really glad
to get this project done, finally. :-)
-Sandra
From bfffcea926d4dfb6275346237c61922a95c9e715 Mon Sep 17 00:00:00 2001
From: Sandra Loosemore <sandra@codesourcery.com>
Date: Wed, 23 Nov 2022 23:14:31 +0000
Subject: [PATCH] [OpenMP] GC unused SIMD clones
SIMD clones are created during the IPA phase when it is not known whether
or not the vectorizer can use them. Clones for functions with external
linkage are part of the ABI, but local clones can be GC'ed if no calls are
found in the compilation unit after vectorization.
gcc/ChangeLog
* cgraph.h (struct cgraph_node): Add gc_candidate bit, modify
default constructor to initialize it.
* cgraphunit.cc (expand_all_functions): Save gc_candidate functions
for last and iterate to handle recursive calls. Delete leftover
candidates at the end.
* omp-simd-clone.cc (simd_clone_create): Set gc_candidate bit
on local clones.
* tree-vect-stmts.cc (vectorizable_simd_clone_call): Clear
gc_candidate bit when a clone is used.
gcc/testsuite/ChangeLog
* testsuite/g++.dg/gomp/target-simd-clone-1.C: Tweak to test
that the unused clone is GC'ed.
* testsuite/gcc.dg/gomp/target-simd-clone-1.c: Likewise.
---
gcc/cgraph.h | 7 ++-
gcc/cgraphunit.cc | 49 ++++++++++++++++---
gcc/omp-simd-clone.cc | 5 ++
.../g++.dg/gomp/target-simd-clone-1.C | 7 ++-
.../gcc.dg/gomp/target-simd-clone-1.c | 6 ++-
gcc/tree-vect-stmts.cc | 3 ++
6 files changed, 66 insertions(+), 11 deletions(-)
@@ -891,7 +891,8 @@ struct GTY((tag ("SYMTAB_FUNCTION"))) cgraph_node : public symtab_node
versionable (false), can_change_signature (false),
redefined_extern_inline (false), tm_may_enter_irr (false),
ipcp_clone (false), declare_variant_alt (false),
- calls_declare_variant_alt (false), m_uid (uid), m_summary_id (-1)
+ calls_declare_variant_alt (false), gc_candidate (false),
+ m_uid (uid), m_summary_id (-1)
{}
/* Remove the node from cgraph and all inline clones inlined into it.
@@ -1490,6 +1491,10 @@ struct GTY((tag ("SYMTAB_FUNCTION"))) cgraph_node : public symtab_node
unsigned declare_variant_alt : 1;
/* True if the function calls declare_variant_alt functions. */
unsigned calls_declare_variant_alt : 1;
+ /* True if the function should only be emitted if it is used. This flag
+ is set for local SIMD clones when they are created and cleared if the
+ vectorizer uses them. */
+ unsigned gc_candidate : 1;
private:
/* Unique id of the node. */
@@ -1996,19 +1996,52 @@ expand_all_functions (void)
/* Output functions in RPO so callees get optimized before callers. This
makes ipa-ra and other propagators to work.
- FIXME: This is far from optimal code layout. */
- for (i = new_order_pos - 1; i >= 0; i--)
- {
- node = order[i];
+ FIXME: This is far from optimal code layout.
+ Make multiple passes over the list to defer processing of gc
+ candidates until all potential uses are seen. */
+ int gc_candidates = 0;
+ int prev_gc_candidates = 0;
- if (node->process)
+ while (1)
+ {
+ for (i = new_order_pos - 1; i >= 0; i--)
{
- expanded_func_count++;
- node->process = 0;
- node->expand ();
+ node = order[i];
+
+ if (node->gc_candidate)
+ gc_candidates++;
+ else if (node->process)
+ {
+ expanded_func_count++;
+ node->process = 0;
+ node->expand ();
+ }
}
+ if (!gc_candidates || gc_candidates == prev_gc_candidates)
+ break;
+ prev_gc_candidates = gc_candidates;
+ gc_candidates = 0;
}
+ /* Free any unused gc_candidate functions. */
+ if (gc_candidates)
+ for (i = new_order_pos - 1; i >= 0; i--)
+ {
+ node = order[i];
+ if (node->gc_candidate)
+ {
+ struct function *fn = DECL_STRUCT_FUNCTION (node->decl);
+ if (symtab->dump_file)
+ fprintf (symtab->dump_file,
+ "Deleting unused function %s\n",
+ IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (node->decl)));
+ node->process = false;
+ free_dominance_info (fn, CDI_DOMINATORS);
+ free_dominance_info (fn, CDI_POST_DOMINATORS);
+ node->release_body (false);
+ }
+ }
+
if (dump_file)
fprintf (dump_file, "Expanded functions with time profile (%s):%u/%u\n",
main_input_filename, profiled_func_count, expanded_func_count);
@@ -702,6 +702,11 @@ simd_clone_create (struct cgraph_node *old_node, bool force_local)
= old_node->calls_declare_variant_alt;
}
+ /* Mark clones with internal linkage as gc'able, so they will not be
+ emitted unless the vectorizer can actually use them. */
+ if (!TREE_PUBLIC (new_node->decl))
+ new_node->gc_candidate = true;
+
return new_node;
}
@@ -1,5 +1,5 @@
/* { dg-options "-fopenmp -O2" } */
-/* { dg-additional-options "-fopenmp-target-simd-clone=any -fdump-ipa-simdclone-details" } */
+/* { dg-additional-options "-fopenmp-target-simd-clone=any -fdump-ipa-simdclone-details -fdump-ipa-cgraph" } */
/* Test that simd clones are generated for functions with "declare target". */
@@ -23,3 +23,8 @@ void callit (int *a, int *b, int *c)
/* { dg-final { scan-ipa-dump "Generated local clone _ZGV.*N.*__Z5additii" "simdclone" { target x86_64-*-* } } } */
/* { dg-final { scan-ipa-dump "Generated local clone _ZGV.*M.*__Z5additii" "simdclone" { target x86_64-*-* } } } */
+
+/* Only the "N" clone is used. The other one should be GC'ed. */
+
+/* { dg-final { scan-ipa-dump "Deleting unused function _ZGV.*M.*__Z5additii" "cgraph" { target x86_64-*-* } } } */
+
@@ -1,5 +1,5 @@
/* { dg-options "-fopenmp -O2" } */
-/* { dg-additional-options "-fopenmp-target-simd-clone=any -fdump-ipa-simdclone-details" } */
+/* { dg-additional-options "-fopenmp-target-simd-clone=any -fdump-ipa-simdclone-details -fdump-ipa-cgraph" } */
/* Test that simd clones are generated for functions with "declare target". */
@@ -23,3 +23,7 @@ void callit (int *a, int *b, int *c)
/* { dg-final { scan-ipa-dump "Generated local clone _ZGV.*N.*_addit" "simdclone" { target x86_64-*-* } } } */
/* { dg-final { scan-ipa-dump "Generated local clone _ZGV.*M.*_addit" "simdclone" { target x86_64-*-* } } } */
+
+/* Only the "N" clone is used. The other one should be GC'ed. */
+
+/* { dg-final { scan-ipa-dump "Deleting unused function _ZGV.*M.*_addit" "cgraph" { target x86_64-*-* } } } */
@@ -4620,6 +4620,9 @@ vectorizable_simd_clone_call (vec_info *vinfo, stmt_vec_info stmt_info,
}
vargs.release ();
+ /* Mark the clone as no longer being a candidate for GC. */
+ bestn->gc_candidate = false;
+
/* The call in STMT might prevent it from being removed in dce.
We however cannot remove it here, due to the way the ssa name
it defines is mapped to the new definition. So just replace
--
2.31.1