tree-optimization/106912 - IPA profile and pure/const
Checks
Commit Message
IPA profile instrumentation tries to clear the pure and const
flags of functions but that's quite hopeless in particular for
const since that attribute prevails on the type and thus on each
call to the function leading to inconsistencies in the IL and
eventual checking ICEs. There's no good reason to do this and
it wouldn't fixup any indirect calls so just don't. No other
instrumentation GCC does bothers about this.
Bootstrap and regtest pending on x86_64-unknown-linux-gnu, OK?
Thanks,
Richard.
PR tree-optimization/106912
* tree-profile.cc (tree_profiling): Do not clear pure/const
flags.
* gcc.dg/pr106912.c: New testcase.
---
gcc/testsuite/gcc.dg/pr106912.c | 16 ++++++++++++++++
gcc/tree-profile.cc | 3 ---
2 files changed, 16 insertions(+), 3 deletions(-)
create mode 100644 gcc/testsuite/gcc.dg/pr106912.c
Comments
> IPA profile instrumentation tries to clear the pure and const
> flags of functions but that's quite hopeless in particular for
> const since that attribute prevails on the type and thus on each
> call to the function leading to inconsistencies in the IL and
> eventual checking ICEs. There's no good reason to do this and
> it wouldn't fixup any indirect calls so just don't. No other
> instrumentation GCC does bothers about this.
This was mostly meant to deadl with situation where we auto-detect
function to be const and then partially inline it to a loop.
Then both caller and callee accesses same counters in the memory and if
you move load/stores out of the loop in caller you lose counters and get
inconsistencies at profile read-in time.
Honza
>
> Bootstrap and regtest pending on x86_64-unknown-linux-gnu, OK?
>
> Thanks,
> Richard.
>
> PR tree-optimization/106912
> * tree-profile.cc (tree_profiling): Do not clear pure/const
> flags.
>
> * gcc.dg/pr106912.c: New testcase.
> ---
> gcc/testsuite/gcc.dg/pr106912.c | 16 ++++++++++++++++
> gcc/tree-profile.cc | 3 ---
> 2 files changed, 16 insertions(+), 3 deletions(-)
> create mode 100644 gcc/testsuite/gcc.dg/pr106912.c
>
> diff --git a/gcc/testsuite/gcc.dg/pr106912.c b/gcc/testsuite/gcc.dg/pr106912.c
> new file mode 100644
> index 00000000000..8faa877d8b3
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/pr106912.c
> @@ -0,0 +1,16 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fPIC -ftree-vectorize -fprofile-generate" } */
> +
> +__attribute__ ((__simd__))
> +__attribute__ ((__nothrow__ , __leaf__ , __const__))
> +double foo (double x);
> +void bar(double *f, int n)
> +{
> + int i;
> + for (i = 0; i < n; i++)
> + f[i] = foo(f[i]);
> +}
> +double foo(double x)
> +{
> + return x * x / 3.0;
> +}
> diff --git a/gcc/tree-profile.cc b/gcc/tree-profile.cc
> index 2beb49241f2..5491b398870 100644
> --- a/gcc/tree-profile.cc
> +++ b/gcc/tree-profile.cc
> @@ -814,9 +814,6 @@ tree_profiling (void)
> /* Don't profile functions produced for builtin stuff. */
> if (DECL_SOURCE_LOCATION (node->decl) == BUILTINS_LOCATION)
> continue;
> -
> - node->set_const_flag (false, false);
> - node->set_pure_flag (false, false);
> }
>
> /* Update call statements and rebuild the cgraph. */
> --
> 2.35.3
> Am 25.11.2022 um 11:05 schrieb Jan Hubicka via Gcc-patches <gcc-patches@gcc.gnu.org>:
>
>
>>
>> IPA profile instrumentation tries to clear the pure and const
>> flags of functions but that's quite hopeless in particular for
>> const since that attribute prevails on the type and thus on each
>> call to the function leading to inconsistencies in the IL and
>> eventual checking ICEs. There's no good reason to do this and
>> it wouldn't fixup any indirect calls so just don't. No other
>> instrumentation GCC does bothers about this.
>
> This was mostly meant to deadl with situation where we auto-detect
> function to be const and then partially inline it to a loop.
> Then both caller and callee accesses same counters in the memory and if
> you move load/stores out of the loop in caller you lose counters and get
> inconsistencies at profile read-in time.
Don’t we Instrument after partial inlining now? As said, since we have the fntype on the call this doesn’t work anymore for const functions via attributes.
Richard
> Honza
>>
>> Bootstrap and regtest pending on x86_64-unknown-linux-gnu, OK?
>>
>> Thanks,
>> Richard.
>>
>> PR tree-optimization/106912
>> * tree-profile.cc (tree_profiling): Do not clear pure/const
>> flags.
>>
>> * gcc.dg/pr106912.c: New testcase.
>> ---
>> gcc/testsuite/gcc.dg/pr106912.c | 16 ++++++++++++++++
>> gcc/tree-profile.cc | 3 ---
>> 2 files changed, 16 insertions(+), 3 deletions(-)
>> create mode 100644 gcc/testsuite/gcc.dg/pr106912.c
>>
>> diff --git a/gcc/testsuite/gcc.dg/pr106912.c b/gcc/testsuite/gcc.dg/pr106912.c
>> new file mode 100644
>> index 00000000000..8faa877d8b3
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.dg/pr106912.c
>> @@ -0,0 +1,16 @@
>> +/* { dg-do compile } */
>> +/* { dg-options "-O2 -fPIC -ftree-vectorize -fprofile-generate" } */
>> +
>> +__attribute__ ((__simd__))
>> +__attribute__ ((__nothrow__ , __leaf__ , __const__))
>> +double foo (double x);
>> +void bar(double *f, int n)
>> +{
>> + int i;
>> + for (i = 0; i < n; i++)
>> + f[i] = foo(f[i]);
>> +}
>> +double foo(double x)
>> +{
>> + return x * x / 3.0;
>> +}
>> diff --git a/gcc/tree-profile.cc b/gcc/tree-profile.cc
>> index 2beb49241f2..5491b398870 100644
>> --- a/gcc/tree-profile.cc
>> +++ b/gcc/tree-profile.cc
>> @@ -814,9 +814,6 @@ tree_profiling (void)
>> /* Don't profile functions produced for builtin stuff. */
>> if (DECL_SOURCE_LOCATION (node->decl) == BUILTINS_LOCATION)
>> continue;
>> -
>> - node->set_const_flag (false, false);
>> - node->set_pure_flag (false, false);
>> }
>>
>> /* Update call statements and rebuild the cgraph. */
>> --
>> 2.35.3
>
>
> > Am 25.11.2022 um 11:05 schrieb Jan Hubicka via Gcc-patches <gcc-patches@gcc.gnu.org>:
> >
> >
> >>
> >> IPA profile instrumentation tries to clear the pure and const
> >> flags of functions but that's quite hopeless in particular for
> >> const since that attribute prevails on the type and thus on each
> >> call to the function leading to inconsistencies in the IL and
> >> eventual checking ICEs. There's no good reason to do this and
> >> it wouldn't fixup any indirect calls so just don't. No other
> >> instrumentation GCC does bothers about this.
> >
> > This was mostly meant to deadl with situation where we auto-detect
> > function to be const and then partially inline it to a loop.
> > Then both caller and callee accesses same counters in the memory and if
> > you move load/stores out of the loop in caller you lose counters and get
> > inconsistencies at profile read-in time.
>
> Don’t we Instrument after partial inlining now? As said, since we have the fntype on the call this doesn’t work anymore for const functions via attributes.
Full inlining can cause problem already. So for code like
do
{
if (__builtin_expect (test,1))
a+=foo (a);
else
a+=foo (b);
} while (....);
we may end up inlining one of the two invocation. Then caller and callee
will modify the same counter. If we handle the remaining call as const,
we can hoist the counter modifications out of the loop and mix them up.
I remember I run into an actual example of this problem during GCC
bootstrap. There the function was auto-detected to be const by
early pure-const pass so type was not an problem. You are right we ought
to do something about types since the scenario above can happen with foo
being declared with an attribute as well.
Honza
>
> Richard
> > Honza
> >>
> >> Bootstrap and regtest pending on x86_64-unknown-linux-gnu, OK?
> >>
> >> Thanks,
> >> Richard.
> >>
> >> PR tree-optimization/106912
> >> * tree-profile.cc (tree_profiling): Do not clear pure/const
> >> flags.
> >>
> >> * gcc.dg/pr106912.c: New testcase.
> >> ---
> >> gcc/testsuite/gcc.dg/pr106912.c | 16 ++++++++++++++++
> >> gcc/tree-profile.cc | 3 ---
> >> 2 files changed, 16 insertions(+), 3 deletions(-)
> >> create mode 100644 gcc/testsuite/gcc.dg/pr106912.c
> >>
> >> diff --git a/gcc/testsuite/gcc.dg/pr106912.c b/gcc/testsuite/gcc.dg/pr106912.c
> >> new file mode 100644
> >> index 00000000000..8faa877d8b3
> >> --- /dev/null
> >> +++ b/gcc/testsuite/gcc.dg/pr106912.c
> >> @@ -0,0 +1,16 @@
> >> +/* { dg-do compile } */
> >> +/* { dg-options "-O2 -fPIC -ftree-vectorize -fprofile-generate" } */
> >> +
> >> +__attribute__ ((__simd__))
> >> +__attribute__ ((__nothrow__ , __leaf__ , __const__))
> >> +double foo (double x);
> >> +void bar(double *f, int n)
> >> +{
> >> + int i;
> >> + for (i = 0; i < n; i++)
> >> + f[i] = foo(f[i]);
> >> +}
> >> +double foo(double x)
> >> +{
> >> + return x * x / 3.0;
> >> +}
> >> diff --git a/gcc/tree-profile.cc b/gcc/tree-profile.cc
> >> index 2beb49241f2..5491b398870 100644
> >> --- a/gcc/tree-profile.cc
> >> +++ b/gcc/tree-profile.cc
> >> @@ -814,9 +814,6 @@ tree_profiling (void)
> >> /* Don't profile functions produced for builtin stuff. */
> >> if (DECL_SOURCE_LOCATION (node->decl) == BUILTINS_LOCATION)
> >> continue;
> >> -
> >> - node->set_const_flag (false, false);
> >> - node->set_pure_flag (false, false);
> >> }
> >>
> >> /* Update call statements and rebuild the cgraph. */
> >> --
> >> 2.35.3
On Fri, 25 Nov 2022, Jan Hubicka wrote:
> >
> >
> > > Am 25.11.2022 um 11:05 schrieb Jan Hubicka via Gcc-patches <gcc-patches@gcc.gnu.org>:
> > >
> > >
> > >>
> > >> IPA profile instrumentation tries to clear the pure and const
> > >> flags of functions but that's quite hopeless in particular for
> > >> const since that attribute prevails on the type and thus on each
> > >> call to the function leading to inconsistencies in the IL and
> > >> eventual checking ICEs. There's no good reason to do this and
> > >> it wouldn't fixup any indirect calls so just don't. No other
> > >> instrumentation GCC does bothers about this.
> > >
> > > This was mostly meant to deadl with situation where we auto-detect
> > > function to be const and then partially inline it to a loop.
> > > Then both caller and callee accesses same counters in the memory and if
> > > you move load/stores out of the loop in caller you lose counters and get
> > > inconsistencies at profile read-in time.
> >
> > Don?t we Instrument after partial inlining now? As said, since we have the fntype on the call this doesn?t work anymore for const functions via attributes.
>
> Full inlining can cause problem already. So for code like
>
> do
> {
> if (__builtin_expect (test,1))
> a+=foo (a);
> else
> a+=foo (b);
> } while (....);
> we may end up inlining one of the two invocation. Then caller and callee
> will modify the same counter. If we handle the remaining call as const,
> we can hoist the counter modifications out of the loop and mix them up.
>
> I remember I run into an actual example of this problem during GCC
> bootstrap. There the function was auto-detected to be const by
> early pure-const pass so type was not an problem. You are right we ought
> to do something about types since the scenario above can happen with foo
> being declared with an attribute as well.
Or by doing the first call as
volatile int __attribute__((const)) (*foop)(int) = foo;
a += (*foop) (a);
you'd need to get all calls that might possibly call an instrumented
function adjusted.
I think if we're taking this issue serious we'd have to simply
ignore const/pure attributes at parsing time and refrain from
auto-detecting as well, at least for address-taken functions?
That said, this adjustment in the "wrong" direction causes problems
downstream, which is what the fixed PR is about (simd cloning picks
up the wrong things, or rather possibly fails to clone the attributes?).
Richard.
> Honza
> >
> > Richard
> > > Honza
> > >>
> > >> Bootstrap and regtest pending on x86_64-unknown-linux-gnu, OK?
> > >>
> > >> Thanks,
> > >> Richard.
> > >>
> > >> PR tree-optimization/106912
> > >> * tree-profile.cc (tree_profiling): Do not clear pure/const
> > >> flags.
> > >>
> > >> * gcc.dg/pr106912.c: New testcase.
> > >> ---
> > >> gcc/testsuite/gcc.dg/pr106912.c | 16 ++++++++++++++++
> > >> gcc/tree-profile.cc | 3 ---
> > >> 2 files changed, 16 insertions(+), 3 deletions(-)
> > >> create mode 100644 gcc/testsuite/gcc.dg/pr106912.c
> > >>
> > >> diff --git a/gcc/testsuite/gcc.dg/pr106912.c b/gcc/testsuite/gcc.dg/pr106912.c
> > >> new file mode 100644
> > >> index 00000000000..8faa877d8b3
> > >> --- /dev/null
> > >> +++ b/gcc/testsuite/gcc.dg/pr106912.c
> > >> @@ -0,0 +1,16 @@
> > >> +/* { dg-do compile } */
> > >> +/* { dg-options "-O2 -fPIC -ftree-vectorize -fprofile-generate" } */
> > >> +
> > >> +__attribute__ ((__simd__))
> > >> +__attribute__ ((__nothrow__ , __leaf__ , __const__))
> > >> +double foo (double x);
> > >> +void bar(double *f, int n)
> > >> +{
> > >> + int i;
> > >> + for (i = 0; i < n; i++)
> > >> + f[i] = foo(f[i]);
> > >> +}
> > >> +double foo(double x)
> > >> +{
> > >> + return x * x / 3.0;
> > >> +}
> > >> diff --git a/gcc/tree-profile.cc b/gcc/tree-profile.cc
> > >> index 2beb49241f2..5491b398870 100644
> > >> --- a/gcc/tree-profile.cc
> > >> +++ b/gcc/tree-profile.cc
> > >> @@ -814,9 +814,6 @@ tree_profiling (void)
> > >> /* Don't profile functions produced for builtin stuff. */
> > >> if (DECL_SOURCE_LOCATION (node->decl) == BUILTINS_LOCATION)
> > >> continue;
> > >> -
> > >> - node->set_const_flag (false, false);
> > >> - node->set_pure_flag (false, false);
> > >> }
> > >>
> > >> /* Update call statements and rebuild the cgraph. */
> > >> --
> > >> 2.35.3
>
> On Fri, 25 Nov 2022, Jan Hubicka wrote:
>
> > >
> > >
> > > > Am 25.11.2022 um 11:05 schrieb Jan Hubicka via Gcc-patches <gcc-patches@gcc.gnu.org>:
> > > >
> > > >
> > > >>
> > > >> IPA profile instrumentation tries to clear the pure and const
> > > >> flags of functions but that's quite hopeless in particular for
> > > >> const since that attribute prevails on the type and thus on each
> > > >> call to the function leading to inconsistencies in the IL and
> > > >> eventual checking ICEs. There's no good reason to do this and
> > > >> it wouldn't fixup any indirect calls so just don't. No other
> > > >> instrumentation GCC does bothers about this.
> > > >
> > > > This was mostly meant to deadl with situation where we auto-detect
> > > > function to be const and then partially inline it to a loop.
> > > > Then both caller and callee accesses same counters in the memory and if
> > > > you move load/stores out of the loop in caller you lose counters and get
> > > > inconsistencies at profile read-in time.
> > >
> > > Don?t we Instrument after partial inlining now? As said, since we have the fntype on the call this doesn?t work anymore for const functions via attributes.
> >
> > Full inlining can cause problem already. So for code like
> >
> > do
> > {
> > if (__builtin_expect (test,1))
> > a+=foo (a);
> > else
> > a+=foo (b);
> > } while (....);
> > we may end up inlining one of the two invocation. Then caller and callee
> > will modify the same counter. If we handle the remaining call as const,
> > we can hoist the counter modifications out of the loop and mix them up.
> >
> > I remember I run into an actual example of this problem during GCC
> > bootstrap. There the function was auto-detected to be const by
> > early pure-const pass so type was not an problem. You are right we ought
> > to do something about types since the scenario above can happen with foo
> > being declared with an attribute as well.
>
> Or by doing the first call as
>
> volatile int __attribute__((const)) (*foop)(int) = foo;
>
> a += (*foop) (a);
>
> you'd need to get all calls that might possibly call an instrumented
> function adjusted.
>
> I think if we're taking this issue serious we'd have to simply
> ignore const/pure attributes at parsing time and refrain from
> auto-detecting as well, at least for address-taken functions?
I think that is also not a good idea, since we would have to do that
with -fprofile-use, too (so the CFG at the instrumentation time
matches) and it would lead to poor perofrmance with FDO.
The idea was to honor pure/const during early opt and undo the
attributes when profiling is inserted.
We have fixup_cfg to compensate for attribute changes. We could
probably keep tract if any instrumented code was ever inlined into a
given function and perhaps just start ignoring attributes set on types?
Honza
>
> That said, this adjustment in the "wrong" direction causes problems
> downstream, which is what the fixed PR is about (simd cloning picks
> up the wrong things, or rather possibly fails to clone the attributes?).
>
> Richard.
>
> > Honza
> > >
> > > Richard
> > > > Honza
> > > >>
> > > >> Bootstrap and regtest pending on x86_64-unknown-linux-gnu, OK?
> > > >>
> > > >> Thanks,
> > > >> Richard.
> > > >>
> > > >> PR tree-optimization/106912
> > > >> * tree-profile.cc (tree_profiling): Do not clear pure/const
> > > >> flags.
> > > >>
> > > >> * gcc.dg/pr106912.c: New testcase.
> > > >> ---
> > > >> gcc/testsuite/gcc.dg/pr106912.c | 16 ++++++++++++++++
> > > >> gcc/tree-profile.cc | 3 ---
> > > >> 2 files changed, 16 insertions(+), 3 deletions(-)
> > > >> create mode 100644 gcc/testsuite/gcc.dg/pr106912.c
> > > >>
> > > >> diff --git a/gcc/testsuite/gcc.dg/pr106912.c b/gcc/testsuite/gcc.dg/pr106912.c
> > > >> new file mode 100644
> > > >> index 00000000000..8faa877d8b3
> > > >> --- /dev/null
> > > >> +++ b/gcc/testsuite/gcc.dg/pr106912.c
> > > >> @@ -0,0 +1,16 @@
> > > >> +/* { dg-do compile } */
> > > >> +/* { dg-options "-O2 -fPIC -ftree-vectorize -fprofile-generate" } */
> > > >> +
> > > >> +__attribute__ ((__simd__))
> > > >> +__attribute__ ((__nothrow__ , __leaf__ , __const__))
> > > >> +double foo (double x);
> > > >> +void bar(double *f, int n)
> > > >> +{
> > > >> + int i;
> > > >> + for (i = 0; i < n; i++)
> > > >> + f[i] = foo(f[i]);
> > > >> +}
> > > >> +double foo(double x)
> > > >> +{
> > > >> + return x * x / 3.0;
> > > >> +}
> > > >> diff --git a/gcc/tree-profile.cc b/gcc/tree-profile.cc
> > > >> index 2beb49241f2..5491b398870 100644
> > > >> --- a/gcc/tree-profile.cc
> > > >> +++ b/gcc/tree-profile.cc
> > > >> @@ -814,9 +814,6 @@ tree_profiling (void)
> > > >> /* Don't profile functions produced for builtin stuff. */
> > > >> if (DECL_SOURCE_LOCATION (node->decl) == BUILTINS_LOCATION)
> > > >> continue;
> > > >> -
> > > >> - node->set_const_flag (false, false);
> > > >> - node->set_pure_flag (false, false);
> > > >> }
> > > >>
> > > >> /* Update call statements and rebuild the cgraph. */
> > > >> --
> > > >> 2.35.3
> >
>
> --
> Richard Biener <rguenther@suse.de>
> SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg,
> Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman;
> HRB 36809 (AG Nuernberg)
On Fri, 25 Nov 2022, Jan Hubicka wrote:
>> On Fri, 25 Nov 2022, Jan Hubicka wrote:
>>
>>>>
>>>>
>>>>> Am 25.11.2022 um 11:05 schrieb Jan Hubicka via Gcc-patches <gcc-patches@gcc.gnu.org>:
>>>>>
>>>>>
>>>>>>
>>>>>> IPA profile instrumentation tries to clear the pure and const
>>>>>> flags of functions but that's quite hopeless in particular for
>>>>>> const since that attribute prevails on the type and thus on each
>>>>>> call to the function leading to inconsistencies in the IL and
>>>>>> eventual checking ICEs. There's no good reason to do this and
>>>>>> it wouldn't fixup any indirect calls so just don't. No other
>>>>>> instrumentation GCC does bothers about this.
>>>>>
>>>>> This was mostly meant to deadl with situation where we auto-detect
>>>>> function to be const and then partially inline it to a loop.
>>>>> Then both caller and callee accesses same counters in the memory and if
>>>>> you move load/stores out of the loop in caller you lose counters and get
>>>>> inconsistencies at profile read-in time.
>>>>
>>>> Don?t we Instrument after partial inlining now? As said, since we have the fntype on the call this doesn?t work anymore for const functions via attributes.
>>>
>>> Full inlining can cause problem already. So for code like
>>>
>>> do
>>> {
>>> if (__builtin_expect (test,1))
>>> a+=foo (a);
>>> else
>>> a+=foo (b);
>>> } while (....);
>>> we may end up inlining one of the two invocation. Then caller and callee
>>> will modify the same counter. If we handle the remaining call as const,
>>> we can hoist the counter modifications out of the loop and mix them up.
>>>
>>> I remember I run into an actual example of this problem during GCC
>>> bootstrap. There the function was auto-detected to be const by
>>> early pure-const pass so type was not an problem. You are right we ought
>>> to do something about types since the scenario above can happen with foo
>>> being declared with an attribute as well.
>>
>> Or by doing the first call as
>>
>> volatile int __attribute__((const)) (*foop)(int) = foo;
>>
>> a += (*foop) (a);
>>
>> you'd need to get all calls that might possibly call an instrumented
>> function adjusted.
>>
>> I think if we're taking this issue serious we'd have to simply
>> ignore const/pure attributes at parsing time and refrain from
>> auto-detecting as well, at least for address-taken functions?
>
> I think that is also not a good idea, since we would have to do that
> with -fprofile-use, too (so the CFG at the instrumentation time
> matches) and it would lead to poor perofrmance with FDO.
Hmm, possibly, yes.
> The idea was to honor pure/const during early opt and undo the
> attributes when profiling is inserted.
> We have fixup_cfg to compensate for attribute changes.
Which works for pure (it's not an attribute on types) but fails for
const (which is). That's a bug btw.
> We could
> probably keep tract if any instrumented code was ever inlined into a
> given function and perhaps just start ignoring attributes set on types?
But ignoring attributes on types makes all indirect calls not
const/pure annotatable (OK, they are not pure annotatable because of
the above bug). I really don't see how to conservatively solve this
issue? Maybe we can ignore all pure/const when the cgraph state is
in profile-instrumented state? Of course we have multiple "APIs"
to query that.
Richard.
> Honza
>>
>> That said, this adjustment in the "wrong" direction causes problems
>> downstream, which is what the fixed PR is about (simd cloning picks
>> up the wrong things, or rather possibly fails to clone the attributes?).
>>
>> Richard.
>>
>>> Honza
>>>>
>>>> Richard
>>>>> Honza
>>>>>>
>>>>>> Bootstrap and regtest pending on x86_64-unknown-linux-gnu, OK?
>>>>>>
>>>>>> Thanks,
>>>>>> Richard.
>>>>>>
>>>>>> PR tree-optimization/106912
>>>>>> * tree-profile.cc (tree_profiling): Do not clear pure/const
>>>>>> flags.
>>>>>>
>>>>>> * gcc.dg/pr106912.c: New testcase.
>>>>>> ---
>>>>>> gcc/testsuite/gcc.dg/pr106912.c | 16 ++++++++++++++++
>>>>>> gcc/tree-profile.cc | 3 ---
>>>>>> 2 files changed, 16 insertions(+), 3 deletions(-)
>>>>>> create mode 100644 gcc/testsuite/gcc.dg/pr106912.c
>>>>>>
>>>>>> diff --git a/gcc/testsuite/gcc.dg/pr106912.c b/gcc/testsuite/gcc.dg/pr106912.c
>>>>>> new file mode 100644
>>>>>> index 00000000000..8faa877d8b3
>>>>>> --- /dev/null
>>>>>> +++ b/gcc/testsuite/gcc.dg/pr106912.c
>>>>>> @@ -0,0 +1,16 @@
>>>>>> +/* { dg-do compile } */
>>>>>> +/* { dg-options "-O2 -fPIC -ftree-vectorize -fprofile-generate" } */
>>>>>> +
>>>>>> +__attribute__ ((__simd__))
>>>>>> +__attribute__ ((__nothrow__ , __leaf__ , __const__))
>>>>>> +double foo (double x);
>>>>>> +void bar(double *f, int n)
>>>>>> +{
>>>>>> + int i;
>>>>>> + for (i = 0; i < n; i++)
>>>>>> + f[i] = foo(f[i]);
>>>>>> +}
>>>>>> +double foo(double x)
>>>>>> +{
>>>>>> + return x * x / 3.0;
>>>>>> +}
>>>>>> diff --git a/gcc/tree-profile.cc b/gcc/tree-profile.cc
>>>>>> index 2beb49241f2..5491b398870 100644
>>>>>> --- a/gcc/tree-profile.cc
>>>>>> +++ b/gcc/tree-profile.cc
>>>>>> @@ -814,9 +814,6 @@ tree_profiling (void)
>>>>>> /* Don't profile functions produced for builtin stuff. */
>>>>>> if (DECL_SOURCE_LOCATION (node->decl) == BUILTINS_LOCATION)
>>>>>> continue;
>>>>>> -
>>>>>> - node->set_const_flag (false, false);
>>>>>> - node->set_pure_flag (false, false);
>>>>>> }
>>>>>>
>>>>>> /* Update call statements and rebuild the cgraph. */
>>>>>> --
>>>>>> 2.35.3
>>>
>>
>> --
>> Richard Biener <rguenther@suse.de>
>> SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg,
>> Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman;
>> HRB 36809 (AG Nuernberg)
>
>
On Fri, Nov 25, 2022 at 09:26:34PM +0100, Richard Biener via Gcc-patches wrote:
> > We could
> > probably keep tract if any instrumented code was ever inlined into a
> > given function and perhaps just start ignoring attributes set on types?
>
> But ignoring attributes on types makes all indirect calls not
> const/pure annotatable (OK, they are not pure annotatable because of
> the above bug). I really don't see how to conservatively solve this
> issue?
> Maybe we can ignore all pure/const when the cgraph state is
> in profile-instrumented state? Of course we have multiple "APIs"
> to query that.
I think that's the way to go. But we'd need to arrange somewhere during IPA
to add vops to all those pure/const calls if -fprofile-generate (direct or
indirect; not sure what exact flags) and after that make sure all our APIs
don't return ECF_CONST, ECF_PURE or ECF_LOOPING_CONST_OR_PURE.
Could be e.g.
if (whatever)
flags &= ~(ECF_CONST | ECF_PURE | ECF_LOOPING_CONST_OR_PURE);
at the end of flags_from_decl_or_type, internal_fn_flags, what else?
Although, perhaps internal_fn_flags don't need to change, because internal
calls don't really have callees.
Or is just ECF_CONST enough given that ECF_PURE is on fndecls and not
types?
Or perhaps just start ignoring TYPE_READONLY -> ECF_CONST in
flags_from_decl_or_type if that flag is on?
Is that rewriting currently what tree_profiling does in the
/* Update call statements and rebuild the cgraph. */
FOR_EACH_DEFINED_FUNCTION (node)
spot where it calls update_stmt on all call statements?
If so, could we just set that global? flag instead or in addition to
doing those node->set_const_flag (false, false); calls and
change flags_from_decl_or_type, plus I guess lto1 should set that
flag if it is global on start as well if
!flag_auto_profile
&& (flag_branch_probabilities || flag_test_coverage || profile_arc_flag)
?
I think just ignoring ECF_CONST from TYPE_READONLY might be best, if we
have external functions like builtins from libc/libm and don't have their
bodies, we can still treat them as const, the only problem is with the
indirect calls where we don't know if we do or don't have a body for
the callees and whether we've instrumented those or not.
Jakub
On Thu, 16 Mar 2023, Jakub Jelinek wrote:
> On Fri, Nov 25, 2022 at 09:26:34PM +0100, Richard Biener via Gcc-patches wrote:
> > > We could
> > > probably keep tract if any instrumented code was ever inlined into a
> > > given function and perhaps just start ignoring attributes set on types?
> >
> > But ignoring attributes on types makes all indirect calls not
> > const/pure annotatable (OK, they are not pure annotatable because of
> > the above bug). I really don't see how to conservatively solve this
> > issue?
>
> > Maybe we can ignore all pure/const when the cgraph state is
> > in profile-instrumented state? Of course we have multiple "APIs"
> > to query that.
>
> I think that's the way to go. But we'd need to arrange somewhere during IPA
> to add vops to all those pure/const calls if -fprofile-generate (direct or
> indirect; not sure what exact flags) and after that make sure all our APIs
> don't return ECF_CONST, ECF_PURE or ECF_LOOPING_CONST_OR_PURE.
> Could be e.g.
> if (whatever)
> flags &= ~(ECF_CONST | ECF_PURE | ECF_LOOPING_CONST_OR_PURE);
> at the end of flags_from_decl_or_type, internal_fn_flags, what else?
> Although, perhaps internal_fn_flags don't need to change, because internal
> calls don't really have callees.
>
> Or is just ECF_CONST enough given that ECF_PURE is on fndecls and not
> types?
>
> Or perhaps just start ignoring TYPE_READONLY -> ECF_CONST in
> flags_from_decl_or_type if that flag is on?
>
> Is that rewriting currently what tree_profiling does in the
> /* Update call statements and rebuild the cgraph. */
> FOR_EACH_DEFINED_FUNCTION (node)
> spot where it calls update_stmt on all call statements?
>
> If so, could we just set that global? flag instead or in addition to
> doing those node->set_const_flag (false, false); calls and
> change flags_from_decl_or_type, plus I guess lto1 should set that
> flag if it is global on start as well if
> !flag_auto_profile
> && (flag_branch_probabilities || flag_test_coverage || profile_arc_flag)
> ?
>
> I think just ignoring ECF_CONST from TYPE_READONLY might be best, if we
> have external functions like builtins from libc/libm and don't have their
> bodies, we can still treat them as const, the only problem is with the
> indirect calls where we don't know if we do or don't have a body for
> the callees and whether we've instrumented those or not.
I think we want something reflected on the IL. Because of LTO I think
we cannot ignore externs (or we have to do massaging at stream-in).
The following brute-force works for the testcase. I suppose since
we leave const/pure set on functions without body in the compile-time
TU (and ignore LTO there) we could do the same here.
I also wonder if that whole function walk is necessary if not
profile_arc_flag || flag_test_coverage ...
Honza - does this look OK to you? I'll test guarding the whole
outer loop with profile_arc_flag || flag_test_coverage separately.
diff --git a/gcc/tree-profile.cc b/gcc/tree-profile.cc
index ea9d7a23443..c8789618f96 100644
--- a/gcc/tree-profile.cc
+++ b/gcc/tree-profile.cc
@@ -840,9 +840,29 @@ tree_profiling (void)
gimple_stmt_iterator gsi;
for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
{
- gimple *stmt = gsi_stmt (gsi);
- if (is_gimple_call (stmt))
- update_stmt (stmt);
+ gcall *call = dyn_cast <gcall *> (gsi_stmt (gsi));
+ if (!call)
+ continue;
+
+ if (profile_arc_flag || flag_test_coverage)
+ {
+ /* We do not clear pure/const on decls without body. */
+ tree fndecl = gimple_call_fndecl (call);
+ if (fndecl && !gimple_has_body_p (fndecl))
+ continue;
+
+ /* Drop the const attribute from the call type (the pure
+ attribute is not available on types). */
+ tree fntype = gimple_call_fntype (call);
+ if (fntype && TYPE_READONLY (fntype))
+ gimple_call_set_fntype (call, build_qualified_type
+ (fntype, (TYPE_QUALS
(fntype)
+ &
~TYPE_QUAL_CONST)));
+ }
+
+ /* Update virtual operands of calls to no longer const/pure
+ functions. */
+ update_stmt (call);
}
}
On Thu, Mar 16, 2023 at 12:05:56PM +0000, Richard Biener wrote:
> On Thu, 16 Mar 2023, Jakub Jelinek wrote:
>
> > On Fri, Nov 25, 2022 at 09:26:34PM +0100, Richard Biener via Gcc-patches wrote:
> > > > We could
> > > > probably keep tract if any instrumented code was ever inlined into a
> > > > given function and perhaps just start ignoring attributes set on types?
> > >
> > > But ignoring attributes on types makes all indirect calls not
> > > const/pure annotatable (OK, they are not pure annotatable because of
> > > the above bug). I really don't see how to conservatively solve this
> > > issue?
> >
> > > Maybe we can ignore all pure/const when the cgraph state is
> > > in profile-instrumented state? Of course we have multiple "APIs"
> > > to query that.
> >
> > I think that's the way to go. But we'd need to arrange somewhere during IPA
> > to add vops to all those pure/const calls if -fprofile-generate (direct or
> > indirect; not sure what exact flags) and after that make sure all our APIs
> > don't return ECF_CONST, ECF_PURE or ECF_LOOPING_CONST_OR_PURE.
> > Could be e.g.
> > if (whatever)
> > flags &= ~(ECF_CONST | ECF_PURE | ECF_LOOPING_CONST_OR_PURE);
> > at the end of flags_from_decl_or_type, internal_fn_flags, what else?
> > Although, perhaps internal_fn_flags don't need to change, because internal
> > calls don't really have callees.
> >
> > Or is just ECF_CONST enough given that ECF_PURE is on fndecls and not
> > types?
> >
> > Or perhaps just start ignoring TYPE_READONLY -> ECF_CONST in
> > flags_from_decl_or_type if that flag is on?
> >
> > Is that rewriting currently what tree_profiling does in the
> > /* Update call statements and rebuild the cgraph. */
> > FOR_EACH_DEFINED_FUNCTION (node)
> > spot where it calls update_stmt on all call statements?
> >
> > If so, could we just set that global? flag instead or in addition to
> > doing those node->set_const_flag (false, false); calls and
> > change flags_from_decl_or_type, plus I guess lto1 should set that
> > flag if it is global on start as well if
> > !flag_auto_profile
> > && (flag_branch_probabilities || flag_test_coverage || profile_arc_flag)
> > ?
> >
> > I think just ignoring ECF_CONST from TYPE_READONLY might be best, if we
> > have external functions like builtins from libc/libm and don't have their
> > bodies, we can still treat them as const, the only problem is with the
> > indirect calls where we don't know if we do or don't have a body for
> > the callees and whether we've instrumented those or not.
>
> I think we want something reflected on the IL. Because of LTO I think
> we cannot ignore externs (or we have to do massaging at stream-in).
>
> The following brute-force works for the testcase. I suppose since
> we leave const/pure set on functions without body in the compile-time
> TU (and ignore LTO there) we could do the same here.
>
> I also wonder if that whole function walk is necessary if not
> profile_arc_flag || flag_test_coverage ...
>
> Honza - does this look OK to you? I'll test guarding the whole
> outer loop with profile_arc_flag || flag_test_coverage separately.
>
> diff --git a/gcc/tree-profile.cc b/gcc/tree-profile.cc
> index ea9d7a23443..c8789618f96 100644
> --- a/gcc/tree-profile.cc
> +++ b/gcc/tree-profile.cc
> @@ -840,9 +840,29 @@ tree_profiling (void)
> gimple_stmt_iterator gsi;
> for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
> {
> - gimple *stmt = gsi_stmt (gsi);
> - if (is_gimple_call (stmt))
> - update_stmt (stmt);
> + gcall *call = dyn_cast <gcall *> (gsi_stmt (gsi));
> + if (!call)
> + continue;
> +
> + if (profile_arc_flag || flag_test_coverage)
> + {
> + /* We do not clear pure/const on decls without body. */
> + tree fndecl = gimple_call_fndecl (call);
> + if (fndecl && !gimple_has_body_p (fndecl))
> + continue;
> +
> + /* Drop the const attribute from the call type (the pure
> + attribute is not available on types). */
> + tree fntype = gimple_call_fntype (call);
> + if (fntype && TYPE_READONLY (fntype))
> + gimple_call_set_fntype (call, build_qualified_type
> + (fntype, (TYPE_QUALS
> (fntype)
> + &
> ~TYPE_QUAL_CONST)));
> + }
> +
> + /* Update virtual operands of calls to no longer const/pure
> + functions. */
> + update_stmt (call);
> }
> }
I have in the meantime briefly tested following.
But if you want to the above way, then at least the testcase could be
useful. Though, not sure if the above is all that is needed. Shouldn't
set_const_flag_1 upon TREE_READONLY (node->decl) = 0; also adjust
TREE_TYPE on the function to
tree fntype = TREE_TYPE (node->decl);
TREE_TYPE (node->decl)
= build_qualified_type (fntype,
TYPE_QUALS (fntype) & ~TYPE_QUAL_CONST);
too (perhaps guarded on TREE_READONLY (fntype))?
2023-03-16 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/106912
gcc/
* calls.h (ignore_const_fntype): Declare.
* calls.cc (flags_from_decl_or_type): Ignore TYPE_READONLY
on types if ignore_const_fntype.
* tree-profile.cc: Include calls.h.
(tree_profiling): Set ignore_const_fntype if profile_arc_flag
or flag_test_coverage.
gcc/lto/
* lto-lang.cc (lto_post_options): Set ignore_const_fntype if
profile_arc_flag or flag_test_coverage and not flag_auto_profile.
gcc/testsuite/
* gcc.dg/pr106912.c: New test.
--- gcc/calls.h.jj 2023-01-02 09:32:51.252868185 +0100
+++ gcc/calls.h 2023-03-16 12:23:51.632460586 +0100
@@ -134,5 +134,6 @@ extern void maybe_complain_about_tail_ca
extern rtx rtx_for_static_chain (const_tree, bool);
extern bool cxx17_empty_base_field_p (const_tree);
+extern bool ignore_const_fntype;
#endif // GCC_CALLS_H
--- gcc/calls.cc.jj 2023-02-21 11:44:48.460464845 +0100
+++ gcc/calls.cc 2023-03-16 12:27:45.427032110 +0100
@@ -800,6 +800,13 @@ is_tm_builtin (const_tree fndecl)
return false;
}
+/* Set if flags_from_decl_or_type should ignore TYPE_READONLY of function
+ types. This is used when tree-profile.cc instruments const calls,
+ clears TREE_READONLY on FUNCTION_DECLs which have been instrumented, but
+ for function types or indirect calls we don't know if the callees have been
+ instrumented or not. */
+bool ignore_const_fntype;
+
/* Detect flags (function attributes) from the function decl or type node. */
int
@@ -849,7 +856,7 @@ flags_from_decl_or_type (const_tree exp)
}
else if (TYPE_P (exp))
{
- if (TYPE_READONLY (exp))
+ if (TYPE_READONLY (exp) && !ignore_const_fntype)
flags |= ECF_CONST;
if (flag_tm
--- gcc/tree-profile.cc.jj 2023-01-02 09:32:27.923205268 +0100
+++ gcc/tree-profile.cc 2023-03-16 12:57:29.130873893 +0100
@@ -58,6 +58,7 @@ along with GCC; see the file COPYING3.
#include "alloc-pool.h"
#include "symbol-summary.h"
#include "symtab-thunks.h"
+#include "calls.h"
static GTY(()) tree gcov_type_node;
static GTY(()) tree tree_interval_profiler_fn;
@@ -819,6 +820,10 @@ tree_profiling (void)
node->set_pure_flag (false, false);
}
+ /* From this point on, ignore TYPE_READONLY on function types. */
+ if (profile_arc_flag || flag_test_coverage)
+ ignore_const_fntype = true;
+
/* Update call statements and rebuild the cgraph. */
FOR_EACH_DEFINED_FUNCTION (node)
{
--- gcc/lto/lto-lang.cc.jj 2023-01-16 11:52:16.165732856 +0100
+++ gcc/lto/lto-lang.cc 2023-03-16 12:58:00.420414840 +0100
@@ -37,6 +37,7 @@ along with GCC; see the file COPYING3.
#include "lto-common.h"
#include "stringpool.h"
#include "attribs.h"
+#include "calls.h"
/* LTO specific dumps. */
int lto_link_dump_id, decl_merge_dump_id, partition_dump_id;
@@ -938,6 +939,11 @@ lto_post_options (const char **pfilename
if (!flag_merge_constants)
flag_merge_constants = 1;
+ /* If const functions were or might have been instrumented by
+ tree-profiler, ignore TYPE_READONLY on function types. */
+ if (!flag_auto_profile && (profile_arc_flag || flag_test_coverage))
+ ignore_const_fntype = true;
+
/* Initialize the compiler back end. */
return false;
}
--- gcc/testsuite/gcc.dg/pr106912.c.jj 2023-03-16 13:02:17.720639903 +0100
+++ gcc/testsuite/gcc.dg/pr106912.c 2023-03-16 13:03:47.472323118 +0100
@@ -0,0 +1,22 @@
+/* PR tree-optimization/106912 */
+/* { dg-do compile { target vect_simd_clones } } */
+/* { dg-require-profiling "-fprofile-generate" } */
+/* { dg-options "-O2 -ftree-vectorize -fprofile-generate" } */
+/* { dg-additional-options "-fPIC" { target fpic } } */
+
+__attribute__ ((__simd__, __nothrow__, __leaf__, __const__))
+double foo (double x);
+
+void
+bar (double *f, int n)
+{
+ int i;
+ for (i = 0; i < n; i++)
+ f[i] = foo (f[i]);
+}
+
+double
+foo (double x)
+{
+ return x * x / 3.0;
+}
Jakub
On Thu, 16 Mar 2023, Jakub Jelinek wrote:
> On Thu, Mar 16, 2023 at 12:05:56PM +0000, Richard Biener wrote:
> > On Thu, 16 Mar 2023, Jakub Jelinek wrote:
> >
> > > On Fri, Nov 25, 2022 at 09:26:34PM +0100, Richard Biener via Gcc-patches wrote:
> > > > > We could
> > > > > probably keep tract if any instrumented code was ever inlined into a
> > > > > given function and perhaps just start ignoring attributes set on types?
> > > >
> > > > But ignoring attributes on types makes all indirect calls not
> > > > const/pure annotatable (OK, they are not pure annotatable because of
> > > > the above bug). I really don't see how to conservatively solve this
> > > > issue?
> > >
> > > > Maybe we can ignore all pure/const when the cgraph state is
> > > > in profile-instrumented state? Of course we have multiple "APIs"
> > > > to query that.
> > >
> > > I think that's the way to go. But we'd need to arrange somewhere during IPA
> > > to add vops to all those pure/const calls if -fprofile-generate (direct or
> > > indirect; not sure what exact flags) and after that make sure all our APIs
> > > don't return ECF_CONST, ECF_PURE or ECF_LOOPING_CONST_OR_PURE.
> > > Could be e.g.
> > > if (whatever)
> > > flags &= ~(ECF_CONST | ECF_PURE | ECF_LOOPING_CONST_OR_PURE);
> > > at the end of flags_from_decl_or_type, internal_fn_flags, what else?
> > > Although, perhaps internal_fn_flags don't need to change, because internal
> > > calls don't really have callees.
> > >
> > > Or is just ECF_CONST enough given that ECF_PURE is on fndecls and not
> > > types?
> > >
> > > Or perhaps just start ignoring TYPE_READONLY -> ECF_CONST in
> > > flags_from_decl_or_type if that flag is on?
> > >
> > > Is that rewriting currently what tree_profiling does in the
> > > /* Update call statements and rebuild the cgraph. */
> > > FOR_EACH_DEFINED_FUNCTION (node)
> > > spot where it calls update_stmt on all call statements?
> > >
> > > If so, could we just set that global? flag instead or in addition to
> > > doing those node->set_const_flag (false, false); calls and
> > > change flags_from_decl_or_type, plus I guess lto1 should set that
> > > flag if it is global on start as well if
> > > !flag_auto_profile
> > > && (flag_branch_probabilities || flag_test_coverage || profile_arc_flag)
> > > ?
> > >
> > > I think just ignoring ECF_CONST from TYPE_READONLY might be best, if we
> > > have external functions like builtins from libc/libm and don't have their
> > > bodies, we can still treat them as const, the only problem is with the
> > > indirect calls where we don't know if we do or don't have a body for
> > > the callees and whether we've instrumented those or not.
> >
> > I think we want something reflected on the IL. Because of LTO I think
> > we cannot ignore externs (or we have to do massaging at stream-in).
> >
> > The following brute-force works for the testcase. I suppose since
> > we leave const/pure set on functions without body in the compile-time
> > TU (and ignore LTO there) we could do the same here.
> >
> > I also wonder if that whole function walk is necessary if not
> > profile_arc_flag || flag_test_coverage ...
> >
> > Honza - does this look OK to you? I'll test guarding the whole
> > outer loop with profile_arc_flag || flag_test_coverage separately.
> >
> > diff --git a/gcc/tree-profile.cc b/gcc/tree-profile.cc
> > index ea9d7a23443..c8789618f96 100644
> > --- a/gcc/tree-profile.cc
> > +++ b/gcc/tree-profile.cc
> > @@ -840,9 +840,29 @@ tree_profiling (void)
> > gimple_stmt_iterator gsi;
> > for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
> > {
> > - gimple *stmt = gsi_stmt (gsi);
> > - if (is_gimple_call (stmt))
> > - update_stmt (stmt);
> > + gcall *call = dyn_cast <gcall *> (gsi_stmt (gsi));
> > + if (!call)
> > + continue;
> > +
> > + if (profile_arc_flag || flag_test_coverage)
> > + {
> > + /* We do not clear pure/const on decls without body. */
> > + tree fndecl = gimple_call_fndecl (call);
> > + if (fndecl && !gimple_has_body_p (fndecl))
> > + continue;
> > +
> > + /* Drop the const attribute from the call type (the pure
> > + attribute is not available on types). */
> > + tree fntype = gimple_call_fntype (call);
> > + if (fntype && TYPE_READONLY (fntype))
> > + gimple_call_set_fntype (call, build_qualified_type
> > + (fntype, (TYPE_QUALS
> > (fntype)
> > + &
> > ~TYPE_QUAL_CONST)));
> > + }
> > +
> > + /* Update virtual operands of calls to no longer const/pure
> > + functions. */
> > + update_stmt (call);
> > }
> > }
>
> I have in the meantime briefly tested following.
>
> But if you want to the above way, then at least the testcase could be
> useful. Though, not sure if the above is all that is needed. Shouldn't
> set_const_flag_1 upon TREE_READONLY (node->decl) = 0; also adjust
> TREE_TYPE on the function to
> tree fntype = TREE_TYPE (node->decl);
> TREE_TYPE (node->decl)
> = build_qualified_type (fntype,
> TYPE_QUALS (fntype) & ~TYPE_QUAL_CONST);
> too (perhaps guarded on TREE_READONLY (fntype))?
That would be unnecessary for the problem at hand I think. Nothing
should look at this type.
Let's wait for Honzas opinion.
Richard.
>
> 2023-03-16 Jakub Jelinek <jakub@redhat.com>
>
> PR tree-optimization/106912
> gcc/
> * calls.h (ignore_const_fntype): Declare.
> * calls.cc (flags_from_decl_or_type): Ignore TYPE_READONLY
> on types if ignore_const_fntype.
> * tree-profile.cc: Include calls.h.
> (tree_profiling): Set ignore_const_fntype if profile_arc_flag
> or flag_test_coverage.
> gcc/lto/
> * lto-lang.cc (lto_post_options): Set ignore_const_fntype if
> profile_arc_flag or flag_test_coverage and not flag_auto_profile.
> gcc/testsuite/
> * gcc.dg/pr106912.c: New test.
>
> --- gcc/calls.h.jj 2023-01-02 09:32:51.252868185 +0100
> +++ gcc/calls.h 2023-03-16 12:23:51.632460586 +0100
> @@ -134,5 +134,6 @@ extern void maybe_complain_about_tail_ca
>
> extern rtx rtx_for_static_chain (const_tree, bool);
> extern bool cxx17_empty_base_field_p (const_tree);
> +extern bool ignore_const_fntype;
>
> #endif // GCC_CALLS_H
> --- gcc/calls.cc.jj 2023-02-21 11:44:48.460464845 +0100
> +++ gcc/calls.cc 2023-03-16 12:27:45.427032110 +0100
> @@ -800,6 +800,13 @@ is_tm_builtin (const_tree fndecl)
> return false;
> }
>
> +/* Set if flags_from_decl_or_type should ignore TYPE_READONLY of function
> + types. This is used when tree-profile.cc instruments const calls,
> + clears TREE_READONLY on FUNCTION_DECLs which have been instrumented, but
> + for function types or indirect calls we don't know if the callees have been
> + instrumented or not. */
> +bool ignore_const_fntype;
> +
> /* Detect flags (function attributes) from the function decl or type node. */
>
> int
> @@ -849,7 +856,7 @@ flags_from_decl_or_type (const_tree exp)
> }
> else if (TYPE_P (exp))
> {
> - if (TYPE_READONLY (exp))
> + if (TYPE_READONLY (exp) && !ignore_const_fntype)
> flags |= ECF_CONST;
>
> if (flag_tm
> --- gcc/tree-profile.cc.jj 2023-01-02 09:32:27.923205268 +0100
> +++ gcc/tree-profile.cc 2023-03-16 12:57:29.130873893 +0100
> @@ -58,6 +58,7 @@ along with GCC; see the file COPYING3.
> #include "alloc-pool.h"
> #include "symbol-summary.h"
> #include "symtab-thunks.h"
> +#include "calls.h"
>
> static GTY(()) tree gcov_type_node;
> static GTY(()) tree tree_interval_profiler_fn;
> @@ -819,6 +820,10 @@ tree_profiling (void)
> node->set_pure_flag (false, false);
> }
>
> + /* From this point on, ignore TYPE_READONLY on function types. */
> + if (profile_arc_flag || flag_test_coverage)
> + ignore_const_fntype = true;
> +
> /* Update call statements and rebuild the cgraph. */
> FOR_EACH_DEFINED_FUNCTION (node)
> {
> --- gcc/lto/lto-lang.cc.jj 2023-01-16 11:52:16.165732856 +0100
> +++ gcc/lto/lto-lang.cc 2023-03-16 12:58:00.420414840 +0100
> @@ -37,6 +37,7 @@ along with GCC; see the file COPYING3.
> #include "lto-common.h"
> #include "stringpool.h"
> #include "attribs.h"
> +#include "calls.h"
>
> /* LTO specific dumps. */
> int lto_link_dump_id, decl_merge_dump_id, partition_dump_id;
> @@ -938,6 +939,11 @@ lto_post_options (const char **pfilename
> if (!flag_merge_constants)
> flag_merge_constants = 1;
>
> + /* If const functions were or might have been instrumented by
> + tree-profiler, ignore TYPE_READONLY on function types. */
> + if (!flag_auto_profile && (profile_arc_flag || flag_test_coverage))
> + ignore_const_fntype = true;
> +
> /* Initialize the compiler back end. */
> return false;
> }
> --- gcc/testsuite/gcc.dg/pr106912.c.jj 2023-03-16 13:02:17.720639903 +0100
> +++ gcc/testsuite/gcc.dg/pr106912.c 2023-03-16 13:03:47.472323118 +0100
> @@ -0,0 +1,22 @@
> +/* PR tree-optimization/106912 */
> +/* { dg-do compile { target vect_simd_clones } } */
> +/* { dg-require-profiling "-fprofile-generate" } */
> +/* { dg-options "-O2 -ftree-vectorize -fprofile-generate" } */
> +/* { dg-additional-options "-fPIC" { target fpic } } */
> +
> +__attribute__ ((__simd__, __nothrow__, __leaf__, __const__))
> +double foo (double x);
> +
> +void
> +bar (double *f, int n)
> +{
> + int i;
> + for (i = 0; i < n; i++)
> + f[i] = foo (f[i]);
> +}
> +
> +double
> +foo (double x)
> +{
> + return x * x / 3.0;
> +}
>
>
> Jakub
>
>
On Thu, 16 Mar 2023, Richard Biener wrote:
> On Thu, 16 Mar 2023, Jakub Jelinek wrote:
>
> > On Thu, Mar 16, 2023 at 12:05:56PM +0000, Richard Biener wrote:
> > > On Thu, 16 Mar 2023, Jakub Jelinek wrote:
> > >
> > > > On Fri, Nov 25, 2022 at 09:26:34PM +0100, Richard Biener via Gcc-patches wrote:
> > > > > > We could
> > > > > > probably keep tract if any instrumented code was ever inlined into a
> > > > > > given function and perhaps just start ignoring attributes set on types?
> > > > >
> > > > > But ignoring attributes on types makes all indirect calls not
> > > > > const/pure annotatable (OK, they are not pure annotatable because of
> > > > > the above bug). I really don't see how to conservatively solve this
> > > > > issue?
> > > >
> > > > > Maybe we can ignore all pure/const when the cgraph state is
> > > > > in profile-instrumented state? Of course we have multiple "APIs"
> > > > > to query that.
> > > >
> > > > I think that's the way to go. But we'd need to arrange somewhere during IPA
> > > > to add vops to all those pure/const calls if -fprofile-generate (direct or
> > > > indirect; not sure what exact flags) and after that make sure all our APIs
> > > > don't return ECF_CONST, ECF_PURE or ECF_LOOPING_CONST_OR_PURE.
> > > > Could be e.g.
> > > > if (whatever)
> > > > flags &= ~(ECF_CONST | ECF_PURE | ECF_LOOPING_CONST_OR_PURE);
> > > > at the end of flags_from_decl_or_type, internal_fn_flags, what else?
> > > > Although, perhaps internal_fn_flags don't need to change, because internal
> > > > calls don't really have callees.
> > > >
> > > > Or is just ECF_CONST enough given that ECF_PURE is on fndecls and not
> > > > types?
> > > >
> > > > Or perhaps just start ignoring TYPE_READONLY -> ECF_CONST in
> > > > flags_from_decl_or_type if that flag is on?
> > > >
> > > > Is that rewriting currently what tree_profiling does in the
> > > > /* Update call statements and rebuild the cgraph. */
> > > > FOR_EACH_DEFINED_FUNCTION (node)
> > > > spot where it calls update_stmt on all call statements?
> > > >
> > > > If so, could we just set that global? flag instead or in addition to
> > > > doing those node->set_const_flag (false, false); calls and
> > > > change flags_from_decl_or_type, plus I guess lto1 should set that
> > > > flag if it is global on start as well if
> > > > !flag_auto_profile
> > > > && (flag_branch_probabilities || flag_test_coverage || profile_arc_flag)
> > > > ?
> > > >
> > > > I think just ignoring ECF_CONST from TYPE_READONLY might be best, if we
> > > > have external functions like builtins from libc/libm and don't have their
> > > > bodies, we can still treat them as const, the only problem is with the
> > > > indirect calls where we don't know if we do or don't have a body for
> > > > the callees and whether we've instrumented those or not.
> > >
> > > I think we want something reflected on the IL. Because of LTO I think
> > > we cannot ignore externs (or we have to do massaging at stream-in).
> > >
> > > The following brute-force works for the testcase. I suppose since
> > > we leave const/pure set on functions without body in the compile-time
> > > TU (and ignore LTO there) we could do the same here.
> > >
> > > I also wonder if that whole function walk is necessary if not
> > > profile_arc_flag || flag_test_coverage ...
> > >
> > > Honza - does this look OK to you? I'll test guarding the whole
> > > outer loop with profile_arc_flag || flag_test_coverage separately.
> > >
> > > diff --git a/gcc/tree-profile.cc b/gcc/tree-profile.cc
> > > index ea9d7a23443..c8789618f96 100644
> > > --- a/gcc/tree-profile.cc
> > > +++ b/gcc/tree-profile.cc
> > > @@ -840,9 +840,29 @@ tree_profiling (void)
> > > gimple_stmt_iterator gsi;
> > > for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
> > > {
> > > - gimple *stmt = gsi_stmt (gsi);
> > > - if (is_gimple_call (stmt))
> > > - update_stmt (stmt);
> > > + gcall *call = dyn_cast <gcall *> (gsi_stmt (gsi));
> > > + if (!call)
> > > + continue;
> > > +
> > > + if (profile_arc_flag || flag_test_coverage)
> > > + {
> > > + /* We do not clear pure/const on decls without body. */
> > > + tree fndecl = gimple_call_fndecl (call);
> > > + if (fndecl && !gimple_has_body_p (fndecl))
> > > + continue;
> > > +
> > > + /* Drop the const attribute from the call type (the pure
> > > + attribute is not available on types). */
> > > + tree fntype = gimple_call_fntype (call);
> > > + if (fntype && TYPE_READONLY (fntype))
> > > + gimple_call_set_fntype (call, build_qualified_type
> > > + (fntype, (TYPE_QUALS
> > > (fntype)
> > > + &
> > > ~TYPE_QUAL_CONST)));
> > > + }
> > > +
> > > + /* Update virtual operands of calls to no longer const/pure
> > > + functions. */
> > > + update_stmt (call);
> > > }
> > > }
> >
> > I have in the meantime briefly tested following.
> >
> > But if you want to the above way, then at least the testcase could be
> > useful. Though, not sure if the above is all that is needed. Shouldn't
> > set_const_flag_1 upon TREE_READONLY (node->decl) = 0; also adjust
> > TREE_TYPE on the function to
> > tree fntype = TREE_TYPE (node->decl);
> > TREE_TYPE (node->decl)
> > = build_qualified_type (fntype,
> > TYPE_QUALS (fntype) & ~TYPE_QUAL_CONST);
> > too (perhaps guarded on TREE_READONLY (fntype))?
>
> That would be unnecessary for the problem at hand I think. Nothing
> should look at this type.
>
> Let's wait for Honzas opinion.
The following is what I profile-bootstrapped and tested on
x86_64-unknown-linux-gnu.
Richard.
From d438a0d84cafced85c90204cba81de0f60ad0073 Mon Sep 17 00:00:00 2001
From: Richard Biener <rguenther@suse.de>
Date: Thu, 16 Mar 2023 13:51:19 +0100
Subject: [PATCH] tree-optimization/106912 - clear const attribute from fntype
To: gcc-patches@gcc.gnu.org
The following makes sure that after clearing pure/const from
instrumented function declarations we are adjusting call statements
fntype as well to handle indirect calls and because gimple_call_flags
looks at both decl and fntype.
Like the pure/const flag clearing on decls we refrain from touching
calls to known functions that do not have a body in the current TU.
PR tree-optimization/106912
* tree-profile.cc (tree_profiling): Update stmts only when
profiling or testing coverage. Make sure to update calls
fntype, stripping 'const' there.
* gcc.dg/profile-generate-4.c: New testcase.
---
gcc/testsuite/gcc.dg/profile-generate-4.c | 19 ++++++++++++
gcc/tree-profile.cc | 38 +++++++++++++++++------
2 files changed, 47 insertions(+), 10 deletions(-)
create mode 100644 gcc/testsuite/gcc.dg/profile-generate-4.c
diff --git a/gcc/testsuite/gcc.dg/profile-generate-4.c b/gcc/testsuite/gcc.dg/profile-generate-4.c
new file mode 100644
index 00000000000..c2b999fe4cb
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/profile-generate-4.c
@@ -0,0 +1,19 @@
+/* PR106912 */
+/* { dg-require-profiling "-fprofile-generate" } */
+/* { dg-options "-O2 -fprofile-generate -ftree-vectorize" } */
+
+__attribute__ ((__simd__))
+__attribute__ ((__nothrow__ , __leaf__ , __const__, __noinline__))
+double foo (double x);
+
+void bar(double *f, int n)
+{
+ int i;
+ for (i = 0; i < n; i++)
+ f[i] = foo(f[i]);
+}
+
+double foo(double x)
+{
+ return x * x / 3.0;
+}
diff --git a/gcc/tree-profile.cc b/gcc/tree-profile.cc
index ea9d7a23443..7854cd4bc31 100644
--- a/gcc/tree-profile.cc
+++ b/gcc/tree-profile.cc
@@ -835,16 +835,34 @@ tree_profiling (void)
push_cfun (DECL_STRUCT_FUNCTION (node->decl));
- FOR_EACH_BB_FN (bb, cfun)
- {
- gimple_stmt_iterator gsi;
- for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
- {
- gimple *stmt = gsi_stmt (gsi);
- if (is_gimple_call (stmt))
- update_stmt (stmt);
- }
- }
+ if (profile_arc_flag || flag_test_coverage)
+ FOR_EACH_BB_FN (bb, cfun)
+ {
+ gimple_stmt_iterator gsi;
+ for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
+ {
+ gcall *call = dyn_cast <gcall *> (gsi_stmt (gsi));
+ if (!call)
+ continue;
+
+ /* We do not clear pure/const on decls without body. */
+ tree fndecl = gimple_call_fndecl (call);
+ if (fndecl && !gimple_has_body_p (fndecl))
+ continue;
+
+ /* Drop the const attribute from the call type (the pure
+ attribute is not available on types). */
+ tree fntype = gimple_call_fntype (call);
+ if (fntype && TYPE_READONLY (fntype))
+ gimple_call_set_fntype
+ (call, build_qualified_type (fntype, (TYPE_QUALS (fntype)
+ & ~TYPE_QUAL_CONST)));
+
+ /* Update virtual operands of calls to no longer const/pure
+ functions. */
+ update_stmt (call);
+ }
+ }
/* re-merge split blocks. */
cleanup_tree_cfg ();
On Thu, Mar 16, 2023 at 02:11:01PM +0000, Richard Biener wrote:
> > Let's wait for Honzas opinion.
>
> The following is what I profile-bootstrapped and tested on
> x86_64-unknown-linux-gnu.
>
> Richard.
>
> >From d438a0d84cafced85c90204cba81de0f60ad0073 Mon Sep 17 00:00:00 2001
> From: Richard Biener <rguenther@suse.de>
> Date: Thu, 16 Mar 2023 13:51:19 +0100
> Subject: [PATCH] tree-optimization/106912 - clear const attribute from fntype
> To: gcc-patches@gcc.gnu.org
>
> The following makes sure that after clearing pure/const from
> instrumented function declarations we are adjusting call statements
> fntype as well to handle indirect calls and because gimple_call_flags
> looks at both decl and fntype.
>
> Like the pure/const flag clearing on decls we refrain from touching
> calls to known functions that do not have a body in the current TU.
>
> PR tree-optimization/106912
> * tree-profile.cc (tree_profiling): Update stmts only when
> profiling or testing coverage. Make sure to update calls
> fntype, stripping 'const' there.
>
> * gcc.dg/profile-generate-4.c: New testcase.
> + if (fntype && TYPE_READONLY (fntype))
> + gimple_call_set_fntype
> + (call, build_qualified_type (fntype, (TYPE_QUALS (fntype)
> + & ~TYPE_QUAL_CONST)));
I think
if (fntype && TYPE_READONLY (fntype))
{
int quals = TYPE_QUALS (fntype) & ~TYPE_QUAL_CONST;
fntype = build_qualified_type (fntype, quals);
gimple_call_set_fntype (call, fntype);
}
would be nicer formatting for it.
Anyway, let's wait for Honza, LGTM.
> +
> + /* Update virtual operands of calls to no longer const/pure
> + functions. */
> + update_stmt (call);
> + }
> + }
>
> /* re-merge split blocks. */
> cleanup_tree_cfg ();
Jakub
>
> I have in the meantime briefly tested following.
>
> But if you want to the above way, then at least the testcase could be
> useful. Though, not sure if the above is all that is needed. Shouldn't
> set_const_flag_1 upon TREE_READONLY (node->decl) = 0; also adjust
> TREE_TYPE on the function to
> tree fntype = TREE_TYPE (node->decl);
> TREE_TYPE (node->decl)
> = build_qualified_type (fntype,
> TYPE_QUALS (fntype) & ~TYPE_QUAL_CONST);
> too (perhaps guarded on TREE_READONLY (fntype))?
>
> 2023-03-16 Jakub Jelinek <jakub@redhat.com>
>
> PR tree-optimization/106912
> gcc/
> * calls.h (ignore_const_fntype): Declare.
> * calls.cc (flags_from_decl_or_type): Ignore TYPE_READONLY
> on types if ignore_const_fntype.
> * tree-profile.cc: Include calls.h.
> (tree_profiling): Set ignore_const_fntype if profile_arc_flag
> or flag_test_coverage.
> gcc/lto/
> * lto-lang.cc (lto_post_options): Set ignore_const_fntype if
> profile_arc_flag or flag_test_coverage and not flag_auto_profile.
> gcc/testsuite/
> * gcc.dg/pr106912.c: New test.
>
> --- gcc/calls.h.jj 2023-01-02 09:32:51.252868185 +0100
> +++ gcc/calls.h 2023-03-16 12:23:51.632460586 +0100
> @@ -134,5 +134,6 @@ extern void maybe_complain_about_tail_ca
>
> extern rtx rtx_for_static_chain (const_tree, bool);
> extern bool cxx17_empty_base_field_p (const_tree);
> +extern bool ignore_const_fntype;
>
> #endif // GCC_CALLS_H
> --- gcc/calls.cc.jj 2023-02-21 11:44:48.460464845 +0100
> +++ gcc/calls.cc 2023-03-16 12:27:45.427032110 +0100
> @@ -800,6 +800,13 @@ is_tm_builtin (const_tree fndecl)
> return false;
> }
>
> +/* Set if flags_from_decl_or_type should ignore TYPE_READONLY of function
> + types. This is used when tree-profile.cc instruments const calls,
> + clears TREE_READONLY on FUNCTION_DECLs which have been instrumented, but
> + for function types or indirect calls we don't know if the callees have been
> + instrumented or not. */
> +bool ignore_const_fntype;
> +
> /* Detect flags (function attributes) from the function decl or type node. */
>
> int
> @@ -849,7 +856,7 @@ flags_from_decl_or_type (const_tree exp)
> }
> else if (TYPE_P (exp))
> {
> - if (TYPE_READONLY (exp))
> + if (TYPE_READONLY (exp) && !ignore_const_fntype)
I think we want to ignore PURE flag too: if callee modifies counters
seen by caller, we need to take that into account when optimizing it.
It is not very pretty to have global flag for prifled state, but I can't
think of scenario where this would break, so perhaps it is good way to
go?
Honza
On Fri, Mar 17, 2023 at 08:09:17PM +0100, Jan Hubicka wrote:
> >
> > I have in the meantime briefly tested following.
> >
> > But if you want to the above way, then at least the testcase could be
> > useful. Though, not sure if the above is all that is needed. Shouldn't
> > set_const_flag_1 upon TREE_READONLY (node->decl) = 0; also adjust
> > TREE_TYPE on the function to
> > tree fntype = TREE_TYPE (node->decl);
> > TREE_TYPE (node->decl)
> > = build_qualified_type (fntype,
> > TYPE_QUALS (fntype) & ~TYPE_QUAL_CONST);
> > too (perhaps guarded on TREE_READONLY (fntype))?
> >
> > 2023-03-16 Jakub Jelinek <jakub@redhat.com>
> >
> > PR tree-optimization/106912
> > gcc/
> > * calls.h (ignore_const_fntype): Declare.
> > * calls.cc (flags_from_decl_or_type): Ignore TYPE_READONLY
> > on types if ignore_const_fntype.
> > * tree-profile.cc: Include calls.h.
> > (tree_profiling): Set ignore_const_fntype if profile_arc_flag
> > or flag_test_coverage.
> > gcc/lto/
> > * lto-lang.cc (lto_post_options): Set ignore_const_fntype if
> > profile_arc_flag or flag_test_coverage and not flag_auto_profile.
> > gcc/testsuite/
> > * gcc.dg/pr106912.c: New test.
> >
> > --- gcc/calls.h.jj 2023-01-02 09:32:51.252868185 +0100
> > +++ gcc/calls.h 2023-03-16 12:23:51.632460586 +0100
> > @@ -134,5 +134,6 @@ extern void maybe_complain_about_tail_ca
> >
> > extern rtx rtx_for_static_chain (const_tree, bool);
> > extern bool cxx17_empty_base_field_p (const_tree);
> > +extern bool ignore_const_fntype;
> >
> > #endif // GCC_CALLS_H
> > --- gcc/calls.cc.jj 2023-02-21 11:44:48.460464845 +0100
> > +++ gcc/calls.cc 2023-03-16 12:27:45.427032110 +0100
> > @@ -800,6 +800,13 @@ is_tm_builtin (const_tree fndecl)
> > return false;
> > }
> >
> > +/* Set if flags_from_decl_or_type should ignore TYPE_READONLY of function
> > + types. This is used when tree-profile.cc instruments const calls,
> > + clears TREE_READONLY on FUNCTION_DECLs which have been instrumented, but
> > + for function types or indirect calls we don't know if the callees have been
> > + instrumented or not. */
> > +bool ignore_const_fntype;
> > +
> > /* Detect flags (function attributes) from the function decl or type node. */
> >
> > int
> > @@ -849,7 +856,7 @@ flags_from_decl_or_type (const_tree exp)
> > }
> > else if (TYPE_P (exp))
> > {
> > - if (TYPE_READONLY (exp))
> > + if (TYPE_READONLY (exp) && !ignore_const_fntype)
>
> I think we want to ignore PURE flag too: if callee modifies counters
> seen by caller, we need to take that into account when optimizing it.
>
> It is not very pretty to have global flag for prifled state, but I can't
> think of scenario where this would break, so perhaps it is good way to
> go?
Note, Richi posted what I think is much cleaner patch in
https://gcc.gnu.org/pipermail/gcc-patches/2023-March/614074.html
Jakub
>
> The following is what I profile-bootstrapped and tested on
> x86_64-unknown-linux-gnu.
>
> Richard.
>
> From d438a0d84cafced85c90204cba81de0f60ad0073 Mon Sep 17 00:00:00 2001
> From: Richard Biener <rguenther@suse.de>
> Date: Thu, 16 Mar 2023 13:51:19 +0100
> Subject: [PATCH] tree-optimization/106912 - clear const attribute from fntype
> To: gcc-patches@gcc.gnu.org
>
> The following makes sure that after clearing pure/const from
> instrumented function declarations we are adjusting call statements
> fntype as well to handle indirect calls and because gimple_call_flags
> looks at both decl and fntype.
>
> Like the pure/const flag clearing on decls we refrain from touching
> calls to known functions that do not have a body in the current TU.
>
> PR tree-optimization/106912
> * tree-profile.cc (tree_profiling): Update stmts only when
> profiling or testing coverage. Make sure to update calls
> fntype, stripping 'const' there.
>
> * gcc.dg/profile-generate-4.c: New testcase.
> ---
> gcc/testsuite/gcc.dg/profile-generate-4.c | 19 ++++++++++++
> gcc/tree-profile.cc | 38 +++++++++++++++++------
> 2 files changed, 47 insertions(+), 10 deletions(-)
> create mode 100644 gcc/testsuite/gcc.dg/profile-generate-4.c
>
> diff --git a/gcc/testsuite/gcc.dg/profile-generate-4.c b/gcc/testsuite/gcc.dg/profile-generate-4.c
> new file mode 100644
> index 00000000000..c2b999fe4cb
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/profile-generate-4.c
> @@ -0,0 +1,19 @@
> +/* PR106912 */
> +/* { dg-require-profiling "-fprofile-generate" } */
> +/* { dg-options "-O2 -fprofile-generate -ftree-vectorize" } */
> +
> +__attribute__ ((__simd__))
> +__attribute__ ((__nothrow__ , __leaf__ , __const__, __noinline__))
> +double foo (double x);
> +
> +void bar(double *f, int n)
> +{
> + int i;
> + for (i = 0; i < n; i++)
> + f[i] = foo(f[i]);
> +}
> +
> +double foo(double x)
> +{
> + return x * x / 3.0;
> +}
> diff --git a/gcc/tree-profile.cc b/gcc/tree-profile.cc
> index ea9d7a23443..7854cd4bc31 100644
> --- a/gcc/tree-profile.cc
> +++ b/gcc/tree-profile.cc
> @@ -835,16 +835,34 @@ tree_profiling (void)
>
> push_cfun (DECL_STRUCT_FUNCTION (node->decl));
>
> - FOR_EACH_BB_FN (bb, cfun)
> - {
> - gimple_stmt_iterator gsi;
> - for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
> - {
> - gimple *stmt = gsi_stmt (gsi);
> - if (is_gimple_call (stmt))
> - update_stmt (stmt);
> - }
> - }
> + if (profile_arc_flag || flag_test_coverage)
> + FOR_EACH_BB_FN (bb, cfun)
> + {
> + gimple_stmt_iterator gsi;
> + for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
> + {
> + gcall *call = dyn_cast <gcall *> (gsi_stmt (gsi));
> + if (!call)
> + continue;
> +
> + /* We do not clear pure/const on decls without body. */
> + tree fndecl = gimple_call_fndecl (call);
> + if (fndecl && !gimple_has_body_p (fndecl))
> + continue;
> +
> + /* Drop the const attribute from the call type (the pure
> + attribute is not available on types). */
> + tree fntype = gimple_call_fntype (call);
> + if (fntype && TYPE_READONLY (fntype))
> + gimple_call_set_fntype
> + (call, build_qualified_type (fntype, (TYPE_QUALS (fntype)
> + & ~TYPE_QUAL_CONST)));
Sorry, now I am bit confused on why Jakub's fix did not need similar
fixup. The flag is only set during the profiling stage and thus I would
expect it to still run into the problem that VOPs are missing.
Is it only becuase we do not sanity check?
Here is a testcase that shows the problem:
include <math.h>
float c;
__attribute__ ((const))
float
instrument(float v)
{
return sin (v);
}
__attribute__ ((no_profile_instrument_function,const,noinline))
float noinstrument (float v)
{
return instrument (v);
}
m()
{
c+=instrument(c);
if (!__builtin_expect (c,1))
{
c+=noinstrument (c);
}
c+=instrument(c);
}
main()
{
m();
}
Compiling
gcc -O0 t.c -fprofile-arcs -fno-early-inlining --coverage -lm -ftest-coverage -S ; gcc t.s -ftest-coverage -lm -fprofile-arcs
makes gcov to report 3 executions on instrument while with -O3 it is 2.
This is because the noinstrument has const flag that is not cleared and
it becoames essentially non-const because it calls instrument that gets
intrumented and partially inlined.
Honza
> +
> + /* Update virtual operands of calls to no longer const/pure
> + functions. */
> + update_stmt (call);
> + }
> + }
>
> /* re-merge split blocks. */
> cleanup_tree_cfg ();
> --
> 2.35.3
>
On Fri, Mar 17, 2023 at 08:40:34PM +0100, Jan Hubicka wrote:
> > + /* Drop the const attribute from the call type (the pure
> > + attribute is not available on types). */
> > + tree fntype = gimple_call_fntype (call);
> > + if (fntype && TYPE_READONLY (fntype))
> > + gimple_call_set_fntype
> > + (call, build_qualified_type (fntype, (TYPE_QUALS (fntype)
> > + & ~TYPE_QUAL_CONST)));
>
> Sorry, now I am bit confused on why Jakub's fix did not need similar
> fixup. The flag is only set during the profiling stage and thus I would
> expect it to still run into the problem that VOPs are missing.
> Is it only becuase we do not sanity check?
My patch started from this point ignoring all TYPE_READONLY bits on
all FUNCTION_TYPE/METHOD_TYPEs, while Richi's patch instead makes it
explicit in the IL, TYPE_READONLY is honored as before but it shouldn't
show up in any of the gimple_call_fntype types unless it is a direct
call to a const function for which we don't have a body.
In either case, vops are added on the update_stmt a few lines later.
> Here is a testcase that shows the problem:
>
> include <math.h>
> float c;
>
> __attribute__ ((const))
> float
> instrument(float v)
> {
> return sin (v);
> }
> __attribute__ ((no_profile_instrument_function,const,noinline))
> float noinstrument (float v)
> {
> return instrument (v);
> }
>
> m()
> {
> c+=instrument(c);
> if (!__builtin_expect (c,1))
> {
> c+=noinstrument (c);
> }
> c+=instrument(c);
> }
> main()
> {
> m();
> }
>
> Compiling
> gcc -O0 t.c -fprofile-arcs -fno-early-inlining --coverage -lm -ftest-coverage -S ; gcc t.s -ftest-coverage -lm -fprofile-arcs
> makes gcov to report 3 executions on instrument while with -O3 it is 2.
>
> This is because the noinstrument has const flag that is not cleared and
> it becoames essentially non-const because it calls instrument that gets
> intrumented and partially inlined.
I thought tree-profile.cc right now clears const on all const functions
with body, or is that
!(!node->clone_of || node->decl != node->clone_of->decl)
restricting it to instrumented functions only?
Jakub
On Fri, 17 Mar 2023, Jakub Jelinek wrote:
> On Fri, Mar 17, 2023 at 08:40:34PM +0100, Jan Hubicka wrote:
> > > + /* Drop the const attribute from the call type (the pure
> > > + attribute is not available on types). */
> > > + tree fntype = gimple_call_fntype (call);
> > > + if (fntype && TYPE_READONLY (fntype))
> > > + gimple_call_set_fntype
> > > + (call, build_qualified_type (fntype, (TYPE_QUALS (fntype)
> > > + & ~TYPE_QUAL_CONST)));
> >
> > Sorry, now I am bit confused on why Jakub's fix did not need similar
> > fixup. The flag is only set during the profiling stage and thus I would
> > expect it to still run into the problem that VOPs are missing.
> > Is it only becuase we do not sanity check?
>
> My patch started from this point ignoring all TYPE_READONLY bits on
> all FUNCTION_TYPE/METHOD_TYPEs, while Richi's patch instead makes it
> explicit in the IL, TYPE_READONLY is honored as before but it shouldn't
> show up in any of the gimple_call_fntype types unless it is a direct
> call to a const function for which we don't have a body.
>
> In either case, vops are added on the update_stmt a few lines later.
>
> > Here is a testcase that shows the problem:
> >
> > include <math.h>
> > float c;
> >
> > __attribute__ ((const))
> > float
> > instrument(float v)
> > {
> > return sin (v);
> > }
> > __attribute__ ((no_profile_instrument_function,const,noinline))
> > float noinstrument (float v)
> > {
> > return instrument (v);
> > }
> >
> > m()
> > {
> > c+=instrument(c);
> > if (!__builtin_expect (c,1))
> > {
> > c+=noinstrument (c);
> > }
> > c+=instrument(c);
> > }
> > main()
> > {
> > m();
> > }
> >
> > Compiling
> > gcc -O0 t.c -fprofile-arcs -fno-early-inlining --coverage -lm -ftest-coverage -S ; gcc t.s -ftest-coverage -lm -fprofile-arcs
> > makes gcov to report 3 executions on instrument while with -O3 it is 2.
With my proposed patch it works fine and reports 3 executions on
'instrument' with both -O0 and -O3. I checked it indeed reports only
2 executions with GCC 12.
So it seems the patch is a progression in general?
Thus, OK?
Thanks,
Richard.
On Mon, 20 Mar 2023, Richard Biener wrote:
> On Fri, 17 Mar 2023, Jakub Jelinek wrote:
>
> > On Fri, Mar 17, 2023 at 08:40:34PM +0100, Jan Hubicka wrote:
> > > > + /* Drop the const attribute from the call type (the pure
> > > > + attribute is not available on types). */
> > > > + tree fntype = gimple_call_fntype (call);
> > > > + if (fntype && TYPE_READONLY (fntype))
> > > > + gimple_call_set_fntype
> > > > + (call, build_qualified_type (fntype, (TYPE_QUALS (fntype)
> > > > + & ~TYPE_QUAL_CONST)));
> > >
> > > Sorry, now I am bit confused on why Jakub's fix did not need similar
> > > fixup. The flag is only set during the profiling stage and thus I would
> > > expect it to still run into the problem that VOPs are missing.
> > > Is it only becuase we do not sanity check?
> >
> > My patch started from this point ignoring all TYPE_READONLY bits on
> > all FUNCTION_TYPE/METHOD_TYPEs, while Richi's patch instead makes it
> > explicit in the IL, TYPE_READONLY is honored as before but it shouldn't
> > show up in any of the gimple_call_fntype types unless it is a direct
> > call to a const function for which we don't have a body.
> >
> > In either case, vops are added on the update_stmt a few lines later.
> >
> > > Here is a testcase that shows the problem:
> > >
> > > include <math.h>
> > > float c;
> > >
> > > __attribute__ ((const))
> > > float
> > > instrument(float v)
> > > {
> > > return sin (v);
> > > }
> > > __attribute__ ((no_profile_instrument_function,const,noinline))
> > > float noinstrument (float v)
> > > {
> > > return instrument (v);
> > > }
> > >
> > > m()
> > > {
> > > c+=instrument(c);
> > > if (!__builtin_expect (c,1))
> > > {
> > > c+=noinstrument (c);
> > > }
> > > c+=instrument(c);
> > > }
> > > main()
> > > {
> > > m();
> > > }
> > >
> > > Compiling
> > > gcc -O0 t.c -fprofile-arcs -fno-early-inlining --coverage -lm -ftest-coverage -S ; gcc t.s -ftest-coverage -lm -fprofile-arcs
> > > makes gcov to report 3 executions on instrument while with -O3 it is 2.
>
> With my proposed patch it works fine and reports 3 executions on
> 'instrument' with both -O0 and -O3. I checked it indeed reports only
> 2 executions with GCC 12.
>
> So it seems the patch is a progression in general?
>
> Thus, OK?
Honza - ping, are you fine with the patch in
https://gcc.gnu.org/pipermail/gcc-patches/2023-March/614074.html?
Thanks,
Richard.
> On Fri, 17 Mar 2023, Jakub Jelinek wrote:
>
> > On Fri, Mar 17, 2023 at 08:40:34PM +0100, Jan Hubicka wrote:
> > > > + /* Drop the const attribute from the call type (the pure
> > > > + attribute is not available on types). */
> > > > + tree fntype = gimple_call_fntype (call);
> > > > + if (fntype && TYPE_READONLY (fntype))
> > > > + gimple_call_set_fntype
> > > > + (call, build_qualified_type (fntype, (TYPE_QUALS (fntype)
> > > > + & ~TYPE_QUAL_CONST)));
> > >
> > > Sorry, now I am bit confused on why Jakub's fix did not need similar
> > > fixup. The flag is only set during the profiling stage and thus I would
> > > expect it to still run into the problem that VOPs are missing.
> > > Is it only becuase we do not sanity check?
> >
> > My patch started from this point ignoring all TYPE_READONLY bits on
> > all FUNCTION_TYPE/METHOD_TYPEs, while Richi's patch instead makes it
> > explicit in the IL, TYPE_READONLY is honored as before but it shouldn't
> > show up in any of the gimple_call_fntype types unless it is a direct
> > call to a const function for which we don't have a body.
> >
> > In either case, vops are added on the update_stmt a few lines later.
> >
> > > Here is a testcase that shows the problem:
> > >
> > > include <math.h>
> > > float c;
> > >
> > > __attribute__ ((const))
> > > float
> > > instrument(float v)
> > > {
> > > return sin (v);
> > > }
> > > __attribute__ ((no_profile_instrument_function,const,noinline))
> > > float noinstrument (float v)
> > > {
> > > return instrument (v);
> > > }
> > >
> > > m()
> > > {
> > > c+=instrument(c);
> > > if (!__builtin_expect (c,1))
> > > {
> > > c+=noinstrument (c);
> > > }
> > > c+=instrument(c);
> > > }
> > > main()
> > > {
> > > m();
> > > }
> > >
> > > Compiling
> > > gcc -O0 t.c -fprofile-arcs -fno-early-inlining --coverage -lm -ftest-coverage -S ; gcc t.s -ftest-coverage -lm -fprofile-arcs
> > > makes gcov to report 3 executions on instrument while with -O3 it is 2.
>
> With my proposed patch it works fine and reports 3 executions on
> 'instrument' with both -O0 and -O3. I checked it indeed reports only
> 2 executions with GCC 12.
>
> So it seems the patch is a progression in general?
>
> Thus, OK?
OK,
thanks!
Honza
>
> Thanks,
> Richard.
> From d438a0d84cafced85c90204cba81de0f60ad0073 Mon Sep 17 00:00:00 2001
> From: Richard Biener <rguenther@suse.de>
> Date: Thu, 16 Mar 2023 13:51:19 +0100
> Subject: [PATCH] tree-optimization/106912 - clear const attribute from fntype
> To: gcc-patches@gcc.gnu.org
>
> The following makes sure that after clearing pure/const from
> instrumented function declarations we are adjusting call statements
> fntype as well to handle indirect calls and because gimple_call_flags
> looks at both decl and fntype.
>
> Like the pure/const flag clearing on decls we refrain from touching
> calls to known functions that do not have a body in the current TU.
>
> PR tree-optimization/106912
> * tree-profile.cc (tree_profiling): Update stmts only when
> profiling or testing coverage. Make sure to update calls
> fntype, stripping 'const' there.
>
> * gcc.dg/profile-generate-4.c: New testcase.
> ---
> gcc/testsuite/gcc.dg/profile-generate-4.c | 19 ++++++++++++
> gcc/tree-profile.cc | 38 +++++++++++++++++------
> 2 files changed, 47 insertions(+), 10 deletions(-)
> create mode 100644 gcc/testsuite/gcc.dg/profile-generate-4.c
>
> diff --git a/gcc/testsuite/gcc.dg/profile-generate-4.c b/gcc/testsuite/gcc.dg/profile-generate-4.c
> new file mode 100644
> index 00000000000..c2b999fe4cb
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/profile-generate-4.c
> @@ -0,0 +1,19 @@
> +/* PR106912 */
> +/* { dg-require-profiling "-fprofile-generate" } */
> +/* { dg-options "-O2 -fprofile-generate -ftree-vectorize" } */
> +
> +__attribute__ ((__simd__))
> +__attribute__ ((__nothrow__ , __leaf__ , __const__, __noinline__))
> +double foo (double x);
> +
> +void bar(double *f, int n)
> +{
> + int i;
> + for (i = 0; i < n; i++)
> + f[i] = foo(f[i]);
> +}
> +
> +double foo(double x)
> +{
> + return x * x / 3.0;
> +}
> diff --git a/gcc/tree-profile.cc b/gcc/tree-profile.cc
> index ea9d7a23443..7854cd4bc31 100644
> --- a/gcc/tree-profile.cc
> +++ b/gcc/tree-profile.cc
> @@ -835,16 +835,34 @@ tree_profiling (void)
>
> push_cfun (DECL_STRUCT_FUNCTION (node->decl));
>
> - FOR_EACH_BB_FN (bb, cfun)
> - {
> - gimple_stmt_iterator gsi;
> - for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
> - {
> - gimple *stmt = gsi_stmt (gsi);
> - if (is_gimple_call (stmt))
> - update_stmt (stmt);
> - }
> - }
> + if (profile_arc_flag || flag_test_coverage)
> + FOR_EACH_BB_FN (bb, cfun)
> + {
> + gimple_stmt_iterator gsi;
> + for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
> + {
> + gcall *call = dyn_cast <gcall *> (gsi_stmt (gsi));
> + if (!call)
> + continue;
> +
> + /* We do not clear pure/const on decls without body. */
> + tree fndecl = gimple_call_fndecl (call);
> + if (fndecl && !gimple_has_body_p (fndecl))
> + continue;
Actually on second thought, I think I can break this either by making
the wraping function to be thunk or alias or by moving it to different
compilation unit.
Also with LTO we will get body later.
So I think we need to drop this optimization.
Honza
> +
> + /* Drop the const attribute from the call type (the pure
> + attribute is not available on types). */
> + tree fntype = gimple_call_fntype (call);
> + if (fntype && TYPE_READONLY (fntype))
> + gimple_call_set_fntype
> + (call, build_qualified_type (fntype, (TYPE_QUALS (fntype)
> + & ~TYPE_QUAL_CONST)));
> +
> + /* Update virtual operands of calls to no longer const/pure
> + functions. */
> + update_stmt (call);
> + }
> + }
>
> /* re-merge split blocks. */
> cleanup_tree_cfg ();
> --
> 2.35.3
>
On Fri, 24 Mar 2023, Jan Hubicka wrote:
> > From d438a0d84cafced85c90204cba81de0f60ad0073 Mon Sep 17 00:00:00 2001
> > From: Richard Biener <rguenther@suse.de>
> > Date: Thu, 16 Mar 2023 13:51:19 +0100
> > Subject: [PATCH] tree-optimization/106912 - clear const attribute from fntype
> > To: gcc-patches@gcc.gnu.org
> >
> > The following makes sure that after clearing pure/const from
> > instrumented function declarations we are adjusting call statements
> > fntype as well to handle indirect calls and because gimple_call_flags
> > looks at both decl and fntype.
> >
> > Like the pure/const flag clearing on decls we refrain from touching
> > calls to known functions that do not have a body in the current TU.
> >
> > PR tree-optimization/106912
> > * tree-profile.cc (tree_profiling): Update stmts only when
> > profiling or testing coverage. Make sure to update calls
> > fntype, stripping 'const' there.
> >
> > * gcc.dg/profile-generate-4.c: New testcase.
> > ---
> > gcc/testsuite/gcc.dg/profile-generate-4.c | 19 ++++++++++++
> > gcc/tree-profile.cc | 38 +++++++++++++++++------
> > 2 files changed, 47 insertions(+), 10 deletions(-)
> > create mode 100644 gcc/testsuite/gcc.dg/profile-generate-4.c
> >
> > diff --git a/gcc/testsuite/gcc.dg/profile-generate-4.c b/gcc/testsuite/gcc.dg/profile-generate-4.c
> > new file mode 100644
> > index 00000000000..c2b999fe4cb
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/profile-generate-4.c
> > @@ -0,0 +1,19 @@
> > +/* PR106912 */
> > +/* { dg-require-profiling "-fprofile-generate" } */
> > +/* { dg-options "-O2 -fprofile-generate -ftree-vectorize" } */
> > +
> > +__attribute__ ((__simd__))
> > +__attribute__ ((__nothrow__ , __leaf__ , __const__, __noinline__))
> > +double foo (double x);
> > +
> > +void bar(double *f, int n)
> > +{
> > + int i;
> > + for (i = 0; i < n; i++)
> > + f[i] = foo(f[i]);
> > +}
> > +
> > +double foo(double x)
> > +{
> > + return x * x / 3.0;
> > +}
> > diff --git a/gcc/tree-profile.cc b/gcc/tree-profile.cc
> > index ea9d7a23443..7854cd4bc31 100644
> > --- a/gcc/tree-profile.cc
> > +++ b/gcc/tree-profile.cc
> > @@ -835,16 +835,34 @@ tree_profiling (void)
> >
> > push_cfun (DECL_STRUCT_FUNCTION (node->decl));
> >
> > - FOR_EACH_BB_FN (bb, cfun)
> > - {
> > - gimple_stmt_iterator gsi;
> > - for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
> > - {
> > - gimple *stmt = gsi_stmt (gsi);
> > - if (is_gimple_call (stmt))
> > - update_stmt (stmt);
> > - }
> > - }
> > + if (profile_arc_flag || flag_test_coverage)
> > + FOR_EACH_BB_FN (bb, cfun)
> > + {
> > + gimple_stmt_iterator gsi;
> > + for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
> > + {
> > + gcall *call = dyn_cast <gcall *> (gsi_stmt (gsi));
> > + if (!call)
> > + continue;
> > +
> > + /* We do not clear pure/const on decls without body. */
> > + tree fndecl = gimple_call_fndecl (call);
> > + if (fndecl && !gimple_has_body_p (fndecl))
> > + continue;
>
> Actually on second thought, I think I can break this either by making
> the wraping function to be thunk or alias or by moving it to different
> compilation unit.
> Also with LTO we will get body later.
>
> So I think we need to drop this optimization.
It's the same condition guarding the set_{const,pure}_flag call earlier
(but yes, I agree). So it isn't covered by the regression so we
should address this for next stage1.
Richard.
> >
> > Actually on second thought, I think I can break this either by making
> > the wraping function to be thunk or alias or by moving it to different
> > compilation unit.
> > Also with LTO we will get body later.
> >
> > So I think we need to drop this optimization.
>
> It's the same condition guarding the set_{const,pure}_flag call earlier
> (but yes, I agree). So it isn't covered by the regression so we
> should address this for next stage1.
I will do that next stage1 then. This bug is mine and it there for a
long time (it doesn't even need LTO to reproduce).
So it may be regression relative to pre-tree-ssa gccs but fixing such bug can
wait for 13.2.
Honza
new file mode 100644
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fPIC -ftree-vectorize -fprofile-generate" } */
+
+__attribute__ ((__simd__))
+__attribute__ ((__nothrow__ , __leaf__ , __const__))
+double foo (double x);
+void bar(double *f, int n)
+{
+ int i;
+ for (i = 0; i < n; i++)
+ f[i] = foo(f[i]);
+}
+double foo(double x)
+{
+ return x * x / 3.0;
+}
@@ -814,9 +814,6 @@ tree_profiling (void)
/* Don't profile functions produced for builtin stuff. */
if (DECL_SOURCE_LOCATION (node->decl) == BUILTINS_LOCATION)
continue;
-
- node->set_const_flag (false, false);
- node->set_pure_flag (false, false);
}
/* Update call statements and rebuild the cgraph. */