PR rtl-optimization/106594: Preserve zero_extend in combine when cheap.
Checks
Commit Message
This patch addresses PR rtl-optimization/106594, a P1 performance
regression affecting aarch64.
This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
and make -k check, both with and without --target_board=unix{-m32},
with no new failures.
If someone (who can regression test this on aarch64) could take this
from here that would be much appreciated. Thanks in advance.
2023-03-04 Roger Sayle <roger@nextmovesoftware.com>
gcc/ChangeLog
PR rtl-optimization/106594
* combine.cc (expand_compound_operation): Don't expand/transform
ZERO_EXTEND or SIGN_EXTEND on targets where rtx_cost claims they are
cheap.
Roger
--
Comments
On Sat, Mar 04, 2023 at 06:32:15PM -0000, Roger Sayle wrote:
> This patch addresses PR rtl-optimization/106594, a P1 performance
> regression affecting aarch64.
>
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check, both with and without --target_board=unix{-m32},
> with no new failures.
It should be tested for performance everywhere else, too.
It can very easily result in worse code on some targets. This kind of
thing really should be done in stage 1, not stage 4.
> PR rtl-optimization/106594
> * combine.cc (expand_compound_operation): Don't expand/transform
> ZERO_EXTEND or SIGN_EXTEND on targets where rtx_cost claims they are
> cheap.
That is not how combine works. If the old code is more expensive than
what combine comes up with., it *should* transform it. Magic cost
cutoffs are not okay anywhere in combine, either.
If expand_compound_operation and friends misbehave (not really an "if",
unfortunately), then please fix that, instead of randomly disabling
parts of combine?
Segher
The regression was reported during stage-1. A patch was provided during stage 1 and the discussions around combine stalled.
The regression for AArch64 needs to be fixed in GCC 13. The hit is too big just to "take".
So we need a way forward, even if it's stage-4.
Thanks,
Tamar
On 3/5/23 12:28, Tamar Christina via Gcc-patches wrote:
>
> The regression was reported during stage-1. A patch was provided during stage 1 and the discussions around combine stalled.
>
> The regression for AArch64 needs to be fixed in GCC 13. The hit is too big just to "take".
>
> So we need a way forward, even if it's stage-4.
Then it needs to be in a way that works within the design constraints of
combine.
As Segher has indicated, using a magic constant to say "this is always
cheap enough" isn't acceptable. Furthermore, what this patch changes is
combine's internal canonicalization of extensions into shift pairs.
So I think another path forward needs to be found. I don't see hacking
up expand_compound_operation is viable.
Jeff
>
> On 3/5/23 12:28, Tamar Christina via Gcc-patches wrote:
> >
> > The regression was reported during stage-1. A patch was provided during
> stage 1 and the discussions around combine stalled.
> >
> > The regression for AArch64 needs to be fixed in GCC 13. The hit is too big just
> to "take".
> >
> > So we need a way forward, even if it's stage-4.
> Then it needs to be in a way that works within the design constraints of
> combine.
>
> As Segher has indicated, using a magic constant to say "this is always cheap
> enough" isn't acceptable. Furthermore, what this patch changes is combine's
> internal canonicalization of extensions into shift pairs.
>
> So I think another path forward needs to be found. I don't see hacking up
> expand_compound_operation is viable.
I'm not arguing at all about the merits of the patch. My argument was about Segher saying he doesn't think this is a P1 regression or one that should be addressed in stage-4.
We noticed and reported the regression early on during stage-1. So I'm unsure what else we should have done and it's not right to waive off fixing it now, otherwise what's the point in us filing bug reports.
Tamar.
>
> Jeff
Hi!
On Sun, Mar 05, 2023 at 08:43:20PM +0000, Tamar Christina wrote:
> > On 3/5/23 12:28, Tamar Christina via Gcc-patches wrote:
> > > The regression was reported during stage-1. A patch was provided during
> > stage 1 and the discussions around combine stalled.
> > >
> > > The regression for AArch64 needs to be fixed in GCC 13. The hit is too big just
> > to "take".
> > >
> > > So we need a way forward, even if it's stage-4.
> > Then it needs to be in a way that works within the design constraints of
> > combine.
> >
> > As Segher has indicated, using a magic constant to say "this is always cheap
> > enough" isn't acceptable. Furthermore, what this patch changes is combine's
> > internal canonicalization of extensions into shift pairs.
> >
> > So I think another path forward needs to be found. I don't see hacking up
> > expand_compound_operation is viable.
>
> I'm not arguing at all about the merits of the patch. My argument was about Segher saying he doesn't think this is a P1 regression or one that should be addressed in stage-4.
That is not what I said (in the PR). I said:
Either this should not be P1, or the proposed patch is taking
completely the wrong direction. P1 means there is a regression.
There is no regression in combine, in fact the proposed patch would
*cause* regressions on many targets!
(<https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106594#c13>)
> We noticed and reported the regression early on during stage-1. So I'm unsure what else we should have done and it's not right to waive off fixing it now, otherwise what's the point in us filing bug reports.
Something that fixes the regression is of course welcome. But something
that *causes* many regressions is not. Something that makes
compound_operation stuff better on all targets is more than welcome as
well, but *better* on *all* targets, not regressing most. This really
is stage 1 material most likely. I have been chipping away at this for
years, I don't expect any trivial patch can help, and it certainly won't
solve most problems here.
Maybe a target hook for this is best. But not a completely ad-hoc one,
something usable and maintainable please. So, it should say we do not
want certain kinds of code (or what kind of code we want instead), and
it should not add magic to the bowels of basic passes, magic that just
happens to make the code of particular testcases look better on a
particular target.
Yes, *look* better: I have seen no proof or indication that this would
actually generate better code, not even on just aarch, let alone on the
majority of targets. As I said I have a test running, you may be lucky
even :-) It has to run for about six hours more and after that it needs
analysis still (a few more hours if it isn't obviously always better or
worse), so expect results tomorrow night at the earliest.
Segher
Hi!
On Sun, Mar 05, 2023 at 03:33:40PM -0600, Segher Boessenkool wrote:
> On Sun, Mar 05, 2023 at 08:43:20PM +0000, Tamar Christina wrote:
> Yes, *look* better: I have seen no proof or indication that this would
("looks", I cannot type, sorry)
> actually generate better code, not even on just aarch, let alone on the
> majority of targets. As I said I have a test running, you may be lucky
> even :-) It has to run for about six hours more and after that it needs
> analysis still (a few more hours if it isn't obviously always better or
> worse), so expect results tomorrow night at the earliest.
The results are in:
$ perl sizes.pl --percent C[12]
C1 C2
alpha 7082243 100.066%
arc 4207975 100.015%
arm 11518624 100.008%
arm64 24514565 100.067%
armhf 16661684 100.098%
csky 4031841 100.002%
i386 0 0
ia64 20354295 100.029%
m68k 4394084 100.023%
microblaze 6549965 100.014%
mips 10684680 100.024%
mips64 8171850 100.002%
nios2 4356713 100.012%
openrisc 5010570 100.003%
parisc 8406294 100.002%
parisc64 0 0
powerpc 11104901 99.992%
powerpc64 24532358 100.057%
powerpc64le 21293219 100.062%
riscv32 2028474 100.131%
riscv64 9515453 100.120%
s390 20519612 100.279%
sh 0 0
shnommu 1840960 100.012%
sparc 5314422 100.004%
sparc64 7964129 99.992%
x86_64 0 0
xtensa 2925723 100.070%
C1 is the original, C2 with your patch. These numbers are the code
sizes of a Linux kernel, some defconfig for every arch. This is a good
measure of how effective combine was.
The patch is a tiny win for sparc64 and classic powerpc32 only, but bad
everywhere else. Look at that s390 number! Or riscv, or most of the
arm variants (including aarch64).
Do you want me to look in detail what causes this regression on some
particular target, i.e. why we really still need the expand_compound
functionality there?
(Btw. "0" means the target did not build. For the x86 targets this is
just more -Werror madness that seeped in it seems. For parisc64 and sh
it is the choice of config. Will fix.)
Segher
> Hi!
>
> On Sun, Mar 05, 2023 at 03:33:40PM -0600, Segher Boessenkool wrote:
> > On Sun, Mar 05, 2023 at 08:43:20PM +0000, Tamar Christina wrote:
> > Yes, *look* better: I have seen no proof or indication that this would
>
> ("looks", I cannot type, sorry)
>
> > actually generate better code, not even on just aarch, let alone on
> > the majority of targets. As I said I have a test running, you may be
> > lucky even :-) It has to run for about six hours more and after that
> > it needs analysis still (a few more hours if it isn't obviously always
> > better or worse), so expect results tomorrow night at the earliest.
>
> The results are in:
>
> $ perl sizes.pl --percent C[12]
> C1 C2
> alpha 7082243 100.066%
> arc 4207975 100.015%
> arm 11518624 100.008%
> arm64 24514565 100.067%
> armhf 16661684 100.098%
> csky 4031841 100.002%
> i386 0 0
> ia64 20354295 100.029%
> m68k 4394084 100.023%
> microblaze 6549965 100.014%
> mips 10684680 100.024%
> mips64 8171850 100.002%
> nios2 4356713 100.012%
> openrisc 5010570 100.003%
> parisc 8406294 100.002%
> parisc64 0 0
> powerpc 11104901 99.992%
> powerpc64 24532358 100.057%
> powerpc64le 21293219 100.062%
> riscv32 2028474 100.131%
> riscv64 9515453 100.120%
> s390 20519612 100.279%
> sh 0 0
> shnommu 1840960 100.012%
> sparc 5314422 100.004%
> sparc64 7964129 99.992%
> x86_64 0 0
> xtensa 2925723 100.070%
>
>
> C1 is the original, C2 with your patch. These numbers are the code sizes of a
> Linux kernel, some defconfig for every arch. This is a good measure of how
> effective combine was.
>
> The patch is a tiny win for sparc64 and classic powerpc32 only, but bad
> everywhere else. Look at that s390 number! Or riscv, or most of the arm
> variants (including aarch64).
>
> Do you want me to look in detail what causes this regression on some
> particular target, i.e. why we really still need the expand_compound
> functionality there?
>
Hi,
Thanks for having a look! I think the Richards are exploring a different solution on the PR
so I don't think it's worth looking at now (maybe in stage-1?). Thanks for checking though!
I Appreciate you all helping to get this fixed!
Kind Regards,
Tamar
> (Btw. "0" means the target did not build. For the x86 targets this is just more
> -Werror madness that seeped in it seems. For parisc64 and sh it is the choice
> of config. Will fix.)
>
>
> Segher
@@ -7288,7 +7288,17 @@ expand_compound_operation (rtx x)
&& (STORE_FLAG_VALUE & ~GET_MODE_MASK (inner_mode)) == 0)
return SUBREG_REG (XEXP (x, 0));
+ /* If ZERO_EXTEND is cheap on this target, do nothing,
+ i.e. don't attempt to convert it to a pair of shifts. */
+ if (set_src_cost (x, mode, optimize_this_for_speed_p)
+ <= COSTS_N_INSNS (1))
+ return x;
}
+ /* Likewise, if SIGN_EXTEND is cheap, do nothing. */
+ else if (GET_CODE (x) == SIGN_EXTEND
+ && set_src_cost (x, mode, optimize_this_for_speed_p)
+ <= COSTS_N_INSNS (1))
+ return x;
/* If we reach here, we want to return a pair of shifts. The inner
shift is a left shift of BITSIZE - POS - LEN bits. The outer