cgraph: Handle simd clones in cgraph_node::set_{const,pure}_flag [PR106433]
Commit Message
Hi!
The following testcase ICEs because we only determine in the late pure const
pass that bar is const (the function body loses a store to a global variable
during dse3 and the read from it during cddce2) and local-pure-const2 then
marks the function const.  The cgraph ordering is that post IPA (simd clones
are created in late IPA) bar is processed first, then foo as its caller, then
foo.simdclone* and finally bar.simdclone*.  Conceptually I think that is the
right ordering, as it allows static simd clones to be removed.

The reason for the ICE is that, because bar was marked const, the call to
it lost its vops before vectorization, and when we then try to vectorize the
call to bar in foo.simdclone*, we replace it with a call to bar.simdclone*,
which has not been marked const and so needs vops, which we do not add.

Now, because the simd clones are created from the same IL, just wrapped in a
loop with different argument/return value passing, I think that if the base
function is determined to be const or pure, the simd clones generally should
be too, unless e.g. the vectorization causes different optimization
decisions; but even then any global memory reads should not affect what the
function does and any global memory stores should not be reachable at
runtime.

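As a minimal illustration (scale and sum below are made-up examples, not
code from the PR or the patch), a const function with the simd attribute
computes each lane purely from its arguments, so its vector clones are
const for the same reason as the scalar version:

/* Illustration only: scale and sum are made up, not taken from the PR.
   The simd attribute makes GCC emit vector clones of scale; because the
   scalar scale is const (its result depends only on its argument and it
   touches no global memory), the clones are const too.  */
__attribute__ ((simd, const)) int
scale (int x)
{
  return 3 * x + 1;
}

int
sum (const int *a, int n)
{
  int s = 0;
  for (int i = 0; i < n; i++)
    s += scale (a[i]);  /* May become a call to a simd clone of scale.  */
  return s;
}

If, as in the testcase, the constness is only discovered by the late
local-pure-const pass, the clones that already exist need to pick up the
flag as well.
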
So, the following patch changes set_{const,pure}_flag to mark the simd
clones as well.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
2023-02-07  Jakub Jelinek  <jakub@redhat.com>

	PR tree-optimization/106433
	* cgraph.cc (set_const_flag_1): Recurse on simd clones too.
	(cgraph_node::set_pure_flag): Call set_pure_flag_1 on simd clones too.

	* gcc.c-torture/compile/pr106433.c: New test.
Jakub
Comments
> On 07.02.2023, at 09:37, Jakub Jelinek <jakub@redhat.com> wrote:
> [...]
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
Ok,
Thanks,
Richard
> [...]
> Now, because the simd clones are created from the same IL, just in a loop
> with different argument/return value passing, I think generally if the base
> function is determined to be const or pure, the simd clones should be too,
> unless e.g. the vectorization causes different optimization decisions, but
> then still the global memory reads if any shouldn't affect what the function
> does and global memory stores shouldn't be reachable at runtime.
My understanding of simd clones is a bit limited, but I think you are
right that they should have the same semantics as the function they are
cloned from.

I think const may be the one that makes the compiler ICE, but there are
many other places where the function body is analyzed and all of its
aliases/thunks and other variants should be updated too.  For example
set_pure_flag, nothrow, noreturn and the analysis done by modref,
ipa-reference etc.

I wonder if we want to update them all and hide that behind some
abstraction?  Next stage 1 I can work on inventing iterators for this kind
of thing, as the current approach of combining direct walks and function
wrappers has become a bit hard to maintain in cases like this.
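As a rough standalone sketch of such an abstraction (all names below are
hypothetical, this is not existing cgraph API; it merely models collecting
a node together with its aliases, thunks and simd clones so that a flag
change can be applied in one loop):

#include <vector>

/* Hypothetical model, not GCC code: a node with the variants that have to
   be kept in sync when a flag such as const/pure changes.  */
struct node_model
{
  std::vector<node_model *> aliases;
  std::vector<node_model *> thunks;
  std::vector<node_model *> simd_clones;
  bool is_const = false;
};

/* Collect the node and every variant reachable from it.  */
static void
collect_variants (node_model *n, std::vector<node_model *> &out)
{
  out.push_back (n);
  for (node_model *a : n->aliases)
    collect_variants (a, out);
  for (node_model *t : n->thunks)
    collect_variants (t, out);
  for (node_model *c : n->simd_clones)
    collect_variants (c, out);
}

/* A setter then becomes a single loop instead of several hand-written
   walks scattered over cgraph.cc.  */
static void
set_const_everywhere (node_model *node)
{
  std::vector<node_model *> all;
  collect_variants (node, all);
  for (node_model *v : all)
    v->is_const = true;
}
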
Honza
On Wed, Feb 08, 2023 at 06:10:08PM +0100, Jan Hubicka wrote:
> [...]
I think it depends on whether we do that analysis, or update its results,
post IPA or not.  Because simd clones are created very late during IPA, if,
say, the nothrow, noreturn, modref etc. analysis is done only during IPA or
before it, we don't need to walk the simd clones.
This only matters for late GIMPLE analyses that change flags which could
later be used in callers of those functions.
The pure/const flags are the ones I know can change this late; what else?
Jakub
> [...]
> The pure/const flags are the ones I know can change this late; what else?
We have late pure/const (handling pure, const, nothrow, noreturn), modref
(which also discovers pure/const attributes and produces its own summaries),
and except.c, which at the very end of compilation can set the nothrow
flag...
This is all I can think of.
Honza
Patch
--- gcc/cgraph.cc.jj	2023-02-02 10:54:44.327473492 +0100
+++ gcc/cgraph.cc	2023-02-06 12:28:22.040593063 +0100
@@ -2764,6 +2764,9 @@ set_const_flag_1 (cgraph_node *node, boo
       if (!set_const || alias->get_availability () > AVAIL_INTERPOSABLE)
         set_const_flag_1 (alias, set_const, looping, changed);
     }
+  for (struct cgraph_node *n = node->simd_clones; n != NULL;
+       n = n->simdclone->next_clone)
+    set_const_flag_1 (n, set_const, looping, changed);
   for (cgraph_edge *e = node->callers; e; e = e->next_caller)
     if (e->caller->thunk
         && (!set_const || e->caller->get_availability () > AVAIL_INTERPOSABLE))
@@ -2876,6 +2879,9 @@ cgraph_node::set_pure_flag (bool pure, b
 {
   struct set_pure_flag_info info = {pure, looping, false};
   call_for_symbol_thunks_and_aliases (set_pure_flag_1, &info, !pure, true);
+  for (struct cgraph_node *n = simd_clones; n != NULL;
+       n = n->simdclone->next_clone)
+    set_pure_flag_1 (n, &info);
   return info.changed;
 }
--- gcc/testsuite/gcc.c-torture/compile/pr106433.c.jj	2023-02-06 12:37:26.963748811 +0100
+++ gcc/testsuite/gcc.c-torture/compile/pr106433.c	2023-02-06 12:37:06.631041918 +0100
@@ -0,0 +1,24 @@
+/* PR tree-optimization/106433 */
+
+int m, *p;
+
+__attribute__ ((simd)) int
+bar (int x)
+{
+  if (x)
+    {
+      if (m < 1)
+        for (m = 0; m < 1; ++m)
+          ++x;
+      p = &x;
+      for (;;)
+        ++m;
+    }
+  return 0;
+}
+
+__attribute__ ((simd)) int
+foo (int x)
+{
+  return bar (x);
+}