rs6000: Don't use optimize_function_for_speed_p too early [PR108184]
Checks
Commit Message
Hi,
As Honza pointed out in [1], the current uses of function
optimize_function_for_speed_p in rs6000_option_override_internal
are too early, since the query results from the functions
optimize_function_for_{speed,size}_p could be changed later due
to profile feedback and some function attributes handlings etc.
This patch is to move optimize_function_for_speed_p to all the
use places of the corresponding flags, which follows the existing
practices. Maybe we can cache it somewhere at an appropriate
timing, but that's another thing.
[1] https://gcc.gnu.org/pipermail/gcc-patches/2022-November/607527.html
Bootstrapped and regtested on powerpc64-linux-gnu P8 and
powerpc64le-linux-gnu P9 and P10.
I'm going to push this soon if no objections.
BR,
Kewen
-----
PR target/108184
gcc/ChangeLog:
* config/rs6000/rs6000.cc (rs6000_option_override_internal): Remove
all optimize_function_for_speed_p uses.
* config/rs6000/predicates.md (fusion_gpr_mem_load): Call
optimize_function_for_speed_p along with TARGET_P8_FUSION_SIGN.
(fusion_gpr_load_p): Likewise.
(expand_fusion_gpr_load): Likewise.
(rs6000_call_aix): Call optimize_function_for_speed_p along with
TARGET_SAVE_TOC_INDIRECT.
gcc/testsuite/ChangeLog:
* gcc.target/powerpc/pr108184.c: New test.
---
gcc/config/rs6000/predicates.md | 4 +++-
gcc/config/rs6000/rs6000.cc | 16 ++++++++++------
gcc/testsuite/gcc.target/powerpc/pr108184.c | 15 +++++++++++++++
3 files changed, 28 insertions(+), 7 deletions(-)
create mode 100644 gcc/testsuite/gcc.target/powerpc/pr108184.c
--
2.27.0
Comments
On Wed, Jan 04, 2023 at 05:20:14PM +0800, Kewen.Lin wrote:
> As Honza pointed out in [1], the current uses of function
> optimize_function_for_speed_p in rs6000_option_override_internal
> are too early, since the query results from the functions
> optimize_function_for_{speed,size}_p could be changed later due
> to profile feedback and some function attributes handlings etc.
>
> This patch is to move optimize_function_for_speed_p to all the
> use places of the corresponding flags, which follows the existing
> practices. Maybe we can cache it somewhere at an appropriate
> timing, but that's another thing.
> @@ -25604,7 +25602,9 @@ rs6000_call_aix (rtx value, rtx func_desc, rtx tlsarg, rtx cookie)
>
> /* Can we optimize saving the TOC in the prologue or
> do we need to do it at every call? */
> - if (TARGET_SAVE_TOC_INDIRECT && !cfun->calls_alloca)
> + if (TARGET_SAVE_TOC_INDIRECT
> + && !cfun->calls_alloca
> + && optimize_function_for_speed_p (cfun))
> cfun->machine->save_toc_in_prologue = true;
Is this correct? If so, it really needs a separate testcase.
The rest looks good. Thanks!
Segher
Hi Segher,
Thanks for the comments.
on 2023/1/4 18:46, Segher Boessenkool wrote:
> On Wed, Jan 04, 2023 at 05:20:14PM +0800, Kewen.Lin wrote:
>> As Honza pointed out in [1], the current uses of function
>> optimize_function_for_speed_p in rs6000_option_override_internal
>> are too early, since the query results from the functions
>> optimize_function_for_{speed,size}_p could be changed later due
>> to profile feedback and some function attributes handlings etc.
>>
>> This patch is to move optimize_function_for_speed_p to all the
>> use places of the corresponding flags, which follows the existing
>> practices. Maybe we can cache it somewhere at an appropriate
>> timing, but that's another thing.
>
>> @@ -25604,7 +25602,9 @@ rs6000_call_aix (rtx value, rtx func_desc, rtx tlsarg, rtx cookie)
>>
>> /* Can we optimize saving the TOC in the prologue or
>> do we need to do it at every call? */
>> - if (TARGET_SAVE_TOC_INDIRECT && !cfun->calls_alloca)
>> + if (TARGET_SAVE_TOC_INDIRECT
>> + && !cfun->calls_alloca
>> + && optimize_function_for_speed_p (cfun))
>> cfun->machine->save_toc_in_prologue = true;
>
> Is this correct? If so, it really needs a separate testcase.
>
Yes, it just moves the condition from:
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -3978,8 +3978,7 @@ rs6000_option_override_internal (bool global_init_p)
/* If we can shrink-wrap the TOC register save separately, then use
-msave-toc-indirect unless explicitly disabled. */
if ((rs6000_isa_flags_explicit & OPTION_MASK_SAVE_TOC_INDIRECT) == 0
- && flag_shrink_wrap_separate
- && optimize_function_for_speed_p (cfun))
+ && flag_shrink_wrap_separate)
rs6000_isa_flags |= OPTION_MASK_SAVE_TOC_INDIRECT;
here.
I tried to find one test case before, but failed to find one which is not fragile
to test. And I thought the associated test case has demonstrated why the use of
optimize_function_for_{speed,size}_p is too early in function
rs6000_option_override_internal, so I gave up then. Do you worry about that we
could revert it unexpectedly in future and no sensitive test case is on it?
BR,
Kewen
Hi!
On Wed, Jan 04, 2023 at 08:15:03PM +0800, Kewen.Lin wrote:
> on 2023/1/4 18:46, Segher Boessenkool wrote:
> >> @@ -25604,7 +25602,9 @@ rs6000_call_aix (rtx value, rtx func_desc, rtx tlsarg, rtx cookie)
> >>
> >> /* Can we optimize saving the TOC in the prologue or
> >> do we need to do it at every call? */
> >> - if (TARGET_SAVE_TOC_INDIRECT && !cfun->calls_alloca)
> >> + if (TARGET_SAVE_TOC_INDIRECT
> >> + && !cfun->calls_alloca
> >> + && optimize_function_for_speed_p (cfun))
> >> cfun->machine->save_toc_in_prologue = true;
> >
> > Is this correct? If so, it really needs a separate testcase.
>
> Yes, it just moves the condition from:
>
> --- a/gcc/config/rs6000/rs6000.cc
> +++ b/gcc/config/rs6000/rs6000.cc
> @@ -3978,8 +3978,7 @@ rs6000_option_override_internal (bool global_init_p)
> /* If we can shrink-wrap the TOC register save separately, then use
> -msave-toc-indirect unless explicitly disabled. */
> if ((rs6000_isa_flags_explicit & OPTION_MASK_SAVE_TOC_INDIRECT) == 0
> - && flag_shrink_wrap_separate
> - && optimize_function_for_speed_p (cfun))
> + && flag_shrink_wrap_separate)
> rs6000_isa_flags |= OPTION_MASK_SAVE_TOC_INDIRECT;
>
> here.
That "just" reinforces that this really needs a testcase! It is all
action at a distance, none of this is trivial (if it was there would
not be a bug here in the first place, of course).
> I tried to find one test case before, but failed to find one which is not fragile
> to test. And I thought the associated test case has demonstrated why the use of
> optimize_function_for_{speed,size}_p is too early in function
> rs6000_option_override_internal, so I gave up then. Do you worry about that we
> could revert it unexpectedly in future and no sensitive test case is on it?
I worry that it might contradict what some other code does. I also
worry that it just is not a sensible thing to do.
I do not worry that your patch is not an improvement. But the resulting
code more clearly (than the original) is problematic. Where is r2 saved
to the frame if save_toc_in_prologue is false?
Segher
on 2023/1/4 22:02, Segher Boessenkool wrote:
> Hi!
>
> On Wed, Jan 04, 2023 at 08:15:03PM +0800, Kewen.Lin wrote:
>> on 2023/1/4 18:46, Segher Boessenkool wrote:
>>>> @@ -25604,7 +25602,9 @@ rs6000_call_aix (rtx value, rtx func_desc, rtx tlsarg, rtx cookie)
>>>>
>>>> /* Can we optimize saving the TOC in the prologue or
>>>> do we need to do it at every call? */
>>>> - if (TARGET_SAVE_TOC_INDIRECT && !cfun->calls_alloca)
>>>> + if (TARGET_SAVE_TOC_INDIRECT
>>>> + && !cfun->calls_alloca
>>>> + && optimize_function_for_speed_p (cfun))
>>>> cfun->machine->save_toc_in_prologue = true;
>>>
>>> Is this correct? If so, it really needs a separate testcase.
>>
>> Yes, it just moves the condition from:
>>
>> --- a/gcc/config/rs6000/rs6000.cc
>> +++ b/gcc/config/rs6000/rs6000.cc
>> @@ -3978,8 +3978,7 @@ rs6000_option_override_internal (bool global_init_p)
>> /* If we can shrink-wrap the TOC register save separately, then use
>> -msave-toc-indirect unless explicitly disabled. */
>> if ((rs6000_isa_flags_explicit & OPTION_MASK_SAVE_TOC_INDIRECT) == 0
>> - && flag_shrink_wrap_separate
>> - && optimize_function_for_speed_p (cfun))
>> + && flag_shrink_wrap_separate)
>> rs6000_isa_flags |= OPTION_MASK_SAVE_TOC_INDIRECT;
>>
>> here.
>
> That "just" reinforces that this really needs a testcase! It is all
> action at a distance, none of this is trivial (if it was there would
> not be a bug here in the first place, of course).
OK, I'll make a test case for it. :)
>
>> I tried to find one test case before, but failed to find one which is not fragile
>> to test. And I thought the associated test case has demonstrated why the use of
>> optimize_function_for_{speed,size}_p is too early in function
>> rs6000_option_override_internal, so I gave up then. Do you worry about that we
>> could revert it unexpectedly in future and no sensitive test case is on it?
>
> I worry that it might contradict what some other code does. I also
> worry that it just is not a sensible thing to do.
>
> I do not worry that your patch is not an improvement. But the resulting
> code more clearly (than the original) is problematic. Where is r2 saved
> to the frame if save_toc_in_prologue is false?
If save_toc_in_prologue is false, the r2 saving to frame would occur at each
indirect call. Currently separate shrink-wrapping will check save_toc_in_prologue
to decide whether to consider saving toc as one component, I think that's why
we enable save-toc-indirect implicitly (going to set save_toc_in_prologue)
if it's not specified explicitly and doing separate shrink-wrapping.
BR,
Kewen
@@ -1878,7 +1878,9 @@ (define_predicate "fusion_gpr_mem_load"
/* Handle sign/zero extend. */
if (GET_CODE (op) == ZERO_EXTEND
- || (TARGET_P8_FUSION_SIGN && GET_CODE (op) == SIGN_EXTEND))
+ || (TARGET_P8_FUSION_SIGN
+ && GET_CODE (op) == SIGN_EXTEND
+ && optimize_function_for_speed_p (cfun)))
{
op = XEXP (op, 0);
mode = GET_MODE (op);
@@ -3978,8 +3978,7 @@ rs6000_option_override_internal (bool global_init_p)
/* If we can shrink-wrap the TOC register save separately, then use
-msave-toc-indirect unless explicitly disabled. */
if ((rs6000_isa_flags_explicit & OPTION_MASK_SAVE_TOC_INDIRECT) == 0
- && flag_shrink_wrap_separate
- && optimize_function_for_speed_p (cfun))
+ && flag_shrink_wrap_separate)
rs6000_isa_flags |= OPTION_MASK_SAVE_TOC_INDIRECT;
/* Enable power8 fusion if we are tuning for power8, even if we aren't
@@ -4013,7 +4012,6 @@ rs6000_option_override_internal (bool global_init_p)
zero extending load, and an explicit sign extension. */
if (TARGET_P8_FUSION
&& !(rs6000_isa_flags_explicit & OPTION_MASK_P8_FUSION_SIGN)
- && optimize_function_for_speed_p (cfun)
&& optimize >= 3)
rs6000_isa_flags |= OPTION_MASK_P8_FUSION_SIGN;
@@ -25604,7 +25602,9 @@ rs6000_call_aix (rtx value, rtx func_desc, rtx tlsarg, rtx cookie)
/* Can we optimize saving the TOC in the prologue or
do we need to do it at every call? */
- if (TARGET_SAVE_TOC_INDIRECT && !cfun->calls_alloca)
+ if (TARGET_SAVE_TOC_INDIRECT
+ && !cfun->calls_alloca
+ && optimize_function_for_speed_p (cfun))
cfun->machine->save_toc_in_prologue = true;
else
{
@@ -27471,7 +27471,9 @@ fusion_gpr_load_p (rtx addis_reg, /* register set via addis. */
/* Allow sign/zero extension. */
if (GET_CODE (mem) == ZERO_EXTEND
- || (GET_CODE (mem) == SIGN_EXTEND && TARGET_P8_FUSION_SIGN))
+ || (GET_CODE (mem) == SIGN_EXTEND
+ && TARGET_P8_FUSION_SIGN
+ && optimize_function_for_speed_p (cfun)))
mem = XEXP (mem, 0);
if (!MEM_P (mem))
@@ -27535,7 +27537,9 @@ expand_fusion_gpr_load (rtx *operands)
enum rtx_code extend = UNKNOWN;
if (GET_CODE (orig_mem) == ZERO_EXTEND
- || (TARGET_P8_FUSION_SIGN && GET_CODE (orig_mem) == SIGN_EXTEND))
+ || (TARGET_P8_FUSION_SIGN
+ && GET_CODE (orig_mem) == SIGN_EXTEND
+ && optimize_function_for_speed_p (cfun)))
{
extend = GET_CODE (orig_mem);
orig_mem = XEXP (orig_mem, 0);
new file mode 100644
@@ -0,0 +1,15 @@
+/* Only possible to fuse sign extended loads with the addis when
+ optimize >= 3 and Power8 fusion takes effects. */
+/* { dg-options "-mdejagnu-tune=power8 -O3" } */
+
+/* Verify it doesn't optimize this function for speed as it's cold,
+ by checking it doesn't try to fuse sign extended loads with addis,
+ which results in a zero extended load and a sign extension. */
+
+__attribute__ ((cold)) int
+fusion_short (short *p)
+{
+ return p[0x12345];
+}
+
+/* { dg-final { scan-assembler-not {\mextsh\M} } } */