Always enable LRA

Message ID d9063ef11e8eff2f1aa24d949235e687de4ce968.1665699882.git.segher@kernel.crashing.org
State Accepted
Headers
Series Always enable LRA |

Checks

Context Check Description
snail/gcc-patch-check success Github commit url

Commit Message

Segher Boessenkool Oct. 13, 2022, 11:56 p.m. UTC
  This small patch changes everything that checks targetm.lra_p behave as
if it returned true.

It has no effect on any primary or secondary target.  It also is fine
for nds32 and for nios2, and it works fine for microblaze (which used
old reload before), resulting in smaller code.

I have patches to completely rip out old reload, and more stuff after
that, but of course not everything is nice yet:

It makes a few targets no longer build though.  In my testing (of all
linux targets that built before) these are alpha, c6x, h8300, m68k,
32-bit parisc, sh, and xtensa.

alpha builds the compiler, but it then crashes with
/home/segher/src/kernel/drivers/tty/serial/serial_core.c:1029:1: internal compiler error: maximum number of generated reload insns per insn achieved (90)
(and in three more files) which can mean anything unfortunately.

c6x is more exciting:
/home/segher/src/kernel/fs/char_dev.c:598:1: internal compiler error: in priority, at haifa-sched.cc:1597
which is
  /* We should not be interested in priority of an already scheduled insn.  */
  gcc_assert (QUEUE_INDEX (insn) != QUEUE_SCHEDULED);

h8300 fails during GCC build:
/home/segher/src/gcc/libgcc/unwind.inc: In function '_Unwind_SjLj_RaiseException':
/home/segher/src/gcc/libgcc/unwind.inc:141:1: error: could not split insn
  141 | }
      | ^
(insn 69 256 327 (set (mem/f:SI (pre_dec:SI (reg/f:SI 7 sp)) [12  S4 A32])
        (reg/f:SI 7 sp)) "/home/segher/src/gcc/libgcc/unwind.inc":118:12 19 {*movsi}
     (expr_list:REG_ARGS_SIZE (const_int 4 [0x4])
        (nil)))
during RTL pass: final
which looks like a backend bug, I don't see a pattern that could split
this (without needing an extra clobber)?

m68k builds the compiler fine, but then has
/home/segher/src/kernel/fs/squashfs/namei.c: In function 'squashfs_lookup':
/home/segher/src/kernel/fs/squashfs/namei.c:232:1: error: insn does not satisfy its constraints:
  232 | }
      | ^
(insn 270 470 271 30 (set (mem:SI (pre_dec:SI (reg/f:SI 15 %sp)) [1  S4 A16])
        (plus:SI (sign_extend:SI (reg:HI 8 %a0 [175]))
            (reg:SI 2 %d2 [orig:173 val ] [173]))) "/home/segher/src/kernel/fs/squashfs/namei.c":212:13 77 {pushasi}
     (expr_list:REG_ARGS_SIZE (const_int 4 [0x4])
        (nil)))
during RTL pass: postreload
/home/segher/src/kernel/fs/squashfs/namei.c:232:1: internal compiler error: in extract_constrain_insn, at recog.cc:2692
(and similar in two more files).

32-bit parisc now runs into the 90 reloads thing (parisc64 already did
without the patch).

sh doesn't build GCC:
/home/segher/src/gcc/libgcc/libgcc2.c: In function '__divdc3':
/home/segher/src/gcc/libgcc/libgcc2.c:2182:1: error: unable to generate reloads for:
 2182 | }
      | ^
(call_insn/u 132 131 1855 7 (parallel [
            (set (reg:SI 0 r0)
                (call (mem:SI (symbol_ref:SI ("__ltdf2") [flags 0x41] <function_decl 0x7fff8a796500 __ltdf2>) [0  S4 A32])
                    (const_int 0 [0])))
            (use (reg:SI 154 fpscr0))
            (use (reg:SI 12 r12))
            (clobber (reg:SI 146 pr))
            (clobber (reg:SI 758))
        ]) "/home/segher/src/gcc/libgcc/libgcc2.c":2082:7 233 {call_value_pcrel}
     (expr_list:REG_DEAD (reg:DF 6 r6)
        (expr_list:REG_DEAD (reg:DF 4 r4)
            (expr_list:REG_CALL_DECL (symbol_ref:SI ("__ltdf2") [flags 0x41] <function_decl 0x7fff8a796500 __ltdf2>)
                (expr_list:REG_EH_REGION (const_int -2147483648 [0xffffffff80000000])
                    (nil)))))
    (expr_list (use (reg:DF 6 r6))
        (expr_list (use (reg:DF 4 r4))
            (nil))))
during RTL pass: reload

And finally, xtensa does
/home/segher/src/gcc/libgcc/libgcc2.c:840:1: error: insn does not satisfy its constraints:
  840 | }
      | ^
(insn 8 7 9 2 (set (reg:SI 9 a9 [57])
        (const_int 1431655765 [0x55555555])) "/home/segher/src/gcc/libgcc/libgcc2.c":828:21 37 {movsi_internal}
     (expr_list:REG_EQUIV (const_int 1431655765 [0x55555555])
        (nil)))
during RTL pass: postreload
/home/segher/src/gcc/libgcc/libgcc2.c:840:1: internal compiler error: in extract_constrain_insn, at recog.cc:2692

 - ~ - ~ -

All in all, more worked than expected.  I think it is time to finally
make this switch.

I did not test targets without a linux port.  I build the linux kernel
as well, to see if the resulting compiler actually works :-)
Suggestions what to use for other targets are welcome.

Is there any way to get the final targets updated for LRA?

Other rants?  :-)


Segher

---
 gcc/auto-inc-dec.cc | 2 +-
 gcc/combine.cc      | 2 +-
 gcc/ira-lives.cc    | 5 -----
 gcc/ira.cc          | 2 +-
 gcc/recog.cc        | 2 +-
 gcc/targhooks.cc    | 4 ----
 6 files changed, 4 insertions(+), 13 deletions(-)
  

Comments

Li, Pan2 via Gcc-patches Oct. 14, 2022, 12:36 a.m. UTC | #1
> On Oct 13, 2022, at 7:56 PM, Segher Boessenkool <segher@kernel.crashing.org> wrote:
> 
> This small patch changes everything that checks targetm.lra_p behave as
> if it returned true.
> 
> It has no effect on any primary or secondary target.  It also is fine
> for nds32 and for nios2, and it works fine for microblaze (which used
> old reload before), resulting in smaller code.
> 
> I have patches to completely rip out old reload, and more stuff after
> that, but of course not everything is nice yet:

I guess I'll have to look harder to see if it's possible to make LRA handle CISC addressing modes like memory indirect, autoincrement, autodecrement, and others that the old reload handles at least somewhat.  Ideally LRA should do a better job; right now I believe it doesn't really do these things at all.  Targets like pdp11 and vax would like these.

	paul
  
Jeff Law Oct. 14, 2022, 1:07 a.m. UTC | #2
On 10/13/22 17:56, Segher Boessenkool wrote:
>
> h8300 fails during GCC build:
> /home/segher/src/gcc/libgcc/unwind.inc: In function '_Unwind_SjLj_RaiseException':
> /home/segher/src/gcc/libgcc/unwind.inc:141:1: error: could not split insn
>    141 | }
>        | ^
> (insn 69 256 327 (set (mem/f:SI (pre_dec:SI (reg/f:SI 7 sp)) [12  S4 A32])
>          (reg/f:SI 7 sp)) "/home/segher/src/gcc/libgcc/unwind.inc":118:12 19 {*movsi}
>       (expr_list:REG_ARGS_SIZE (const_int 4 [0x4])
>          (nil)))
> during RTL pass: final
> which looks like a backend bug, I don't see a pattern that could split
> this (without needing an extra clobber)?

I'm aware of this -- its invalid RTL:

Uses of the register outside of an address are not permitted within the
same insn as a use in an embedded side effect expression because such
insns behave differently on different machines and hence must be treated
as ambiguous and disallowed.


I'd actually tried to turn on LRA for the H8 port a little while ago and 
stumbled over it.

I'm aware of a similar situation involving a general register on the H8, 
but using reload instead of LRA.  I looked at it a while back and my 
recollection was that the insn was actually fine until reload got its 
grubby hands on it.  And when I wandered reload to hunt for anything 
which handled the restriction noted above, I didn't find anything.  If 
you were to search for H8 bugs in bugzilla, it should be discoverable.

While we could potentially work around this in the backend, it'd be a 
hack at best.  It hasn't risen to the top of my priority list yet.  I 
considered suggesting we change this from "invalid" to "target defined" 
behavior, but that felt like a cop-out.


jeff
  
Jeff Law Oct. 14, 2022, 4:47 a.m. UTC | #3
On 10/13/22 17:56, Segher Boessenkool wrote:
>
> h8300 fails during GCC build:
> /home/segher/src/gcc/libgcc/unwind.inc: In function '_Unwind_SjLj_RaiseException':
> /home/segher/src/gcc/libgcc/unwind.inc:141:1: error: could not split insn
>    141 | }
>        | ^
> (insn 69 256 327 (set (mem/f:SI (pre_dec:SI (reg/f:SI 7 sp)) [12  S4 A32])
>          (reg/f:SI 7 sp)) "/home/segher/src/gcc/libgcc/unwind.inc":118:12 19 {*movsi}
>       (expr_list:REG_ARGS_SIZE (const_int 4 [0x4])
>          (nil)))
> during RTL pass: final
> which looks like a backend bug, I don't see a pattern that could split
> this (without needing an extra clobber)?

Really smells like an LRA bug to me.


We have this as we leave IRA:


(insn 10 9 11 2 (set (reg/f:SI 30)
        (plus:SI (reg/f:SI 11 fp)
            (const_int -4 [0xfffffffffffffffc]))) "j.c":31:7 264 {*addsi}
     (expr_list:REG_EQUIV (plus:SI (reg/f:SI 11 fp)
            (const_int -4 [0xfffffffffffffffc]))
(nil)))
(insn 11 10 12 2 (set (mem/f:SI (pre_dec:SI (reg/f:SI 7 sp)) [3  S4 A32])
        (reg/f:SI 30)) "j.c":31:7 19 {*movsi}
     (expr_list:REG_DEAD (reg/f:SI 30)
        (expr_list:REG_ARGS_SIZE (const_int 4 [0x4])
            (nil))))


And as we leave LRA:

(note 10 9 11 2 NOTE_INSN_DELETED)
(insn 11 10 13 2 (set (mem/f:SI (pre_dec:SI (reg/f:SI 7 sp)) [3 S4 A32])
         (reg/f:SI 7 sp)) "j.c":31:7 19 {*movsi}
      (expr_list:REG_ARGS_SIZE (const_int 4 [0x4])
         (nil)))


Register elimination ultimately discovered that (reg 30) was the same as 
the stack pointer and did the natural substitution.    The natural 
substitution results in invalid RTL and there's really no way to back 
out and do something different AFAICT in lra-eliminations.cc.

The only reason we fault is because the H8 backend knows this is invalid 
RTL and refuses to split it.  If we were to try and re-recognize the 
insn in question it would fail to recognize after the substitution 
because the auto-inc'd operand overlaps with the other operand.

Jeff
  
Takayuki 'January June' Suwa Oct. 14, 2022, 6:20 a.m. UTC | #4
On 2022/10/14 8:56, Segher Boessenkool wrote:
> And finally, xtensa does
> /home/segher/src/gcc/libgcc/libgcc2.c:840:1: error: insn does not satisfy its constraints:
>   840 | }
>       | ^
> (insn 8 7 9 2 (set (reg:SI 9 a9 [57])
>         (const_int 1431655765 [0x55555555])) "/home/segher/src/gcc/libgcc/libgcc2.c":828:21 37 {movsi_internal}
>      (expr_list:REG_EQUIV (const_int 1431655765 [0x55555555])
>         (nil)))
> during RTL pass: postreload
> /home/segher/src/gcc/libgcc/libgcc2.c:840:1: internal compiler error: in extract_constrain_insn, at recog.cc:2692

This is a result of knowing that Reload is tolerant of out-of-constraint constants.
LRA support needs to be taken care of before that, ie. in the "split1" pass.
Excuse me in haste.
  
Li, Pan2 via Gcc-patches Oct. 14, 2022, 12:37 p.m. UTC | #5
> On Oct 13, 2022, at 9:07 PM, Jeff Law via Gcc-patches <gcc-patches@gcc.gnu.org> wrote:
> 
> 
> On 10/13/22 17:56, Segher Boessenkool wrote:
>> 
>> h8300 fails during GCC build:
>> /home/segher/src/gcc/libgcc/unwind.inc: In function '_Unwind_SjLj_RaiseException':
>> /home/segher/src/gcc/libgcc/unwind.inc:141:1: error: could not split insn
>>   141 | }
>>       | ^
>> (insn 69 256 327 (set (mem/f:SI (pre_dec:SI (reg/f:SI 7 sp)) [12  S4 A32])
>>         (reg/f:SI 7 sp)) "/home/segher/src/gcc/libgcc/unwind.inc":118:12 19 {*movsi}
>>      (expr_list:REG_ARGS_SIZE (const_int 4 [0x4])
>>         (nil)))
>> during RTL pass: final
>> which looks like a backend bug, I don't see a pattern that could split
>> this (without needing an extra clobber)?
> 
> I'm aware of this -- its invalid RTL:
> 
> Uses of the register outside of an address are not permitted within the
> same insn as a use in an embedded side effect expression because such
> insns behave differently on different machines and hence must be treated
> as ambiguous and disallowed.

I had a bit of a fight with this sort of thing in pdp11, where in fact such operations are executed differently on different machine models.  The solution I picked is to create two sets of machine-specific constraint codes, one for "register N" and the other for "autoinc/dec of any register other than N" and pairing those.  (You can see this in pdp11.md, the mov<mode> definition.)

But the pdp11 case is actually not as restrictive as the rule you mentioned.  The problem case is register N source, autoinc/dec rN destination.  The opposite case, which we see here -- autoinc/dec Rn source, Rn destination -- is just fine.  Perhaps not all that important, but the ISA definition does not object to it.  So I'm not sure why there would be a general rule that says it's considered ambiguous when the target machine architecture says it is not.

	paul
  
Jeff Law Oct. 14, 2022, 2:38 p.m. UTC | #6
On 10/14/22 06:37, Koning, Paul wrote:
>
>> On Oct 13, 2022, at 9:07 PM, Jeff Law via Gcc-patches <gcc-patches@gcc.gnu.org> wrote:
>>
>>
>> On 10/13/22 17:56, Segher Boessenkool wrote:
>>> h8300 fails during GCC build:
>>> /home/segher/src/gcc/libgcc/unwind.inc: In function '_Unwind_SjLj_RaiseException':
>>> /home/segher/src/gcc/libgcc/unwind.inc:141:1: error: could not split insn
>>>    141 | }
>>>        | ^
>>> (insn 69 256 327 (set (mem/f:SI (pre_dec:SI (reg/f:SI 7 sp)) [12  S4 A32])
>>>          (reg/f:SI 7 sp)) "/home/segher/src/gcc/libgcc/unwind.inc":118:12 19 {*movsi}
>>>       (expr_list:REG_ARGS_SIZE (const_int 4 [0x4])
>>>          (nil)))
>>> during RTL pass: final
>>> which looks like a backend bug, I don't see a pattern that could split
>>> this (without needing an extra clobber)?
>> I'm aware of this -- its invalid RTL:
>>
>> Uses of the register outside of an address are not permitted within the
>> same insn as a use in an embedded side effect expression because such
>> insns behave differently on different machines and hence must be treated
>> as ambiguous and disallowed.
> I had a bit of a fight with this sort of thing in pdp11, where in fact such operations are executed differently on different machine models.  The solution I picked is to create two sets of machine-specific constraint codes, one for "register N" and the other for "autoinc/dec of any register other than N" and pairing those.  (You can see this in pdp11.md, the mov<mode> definition.)

I've long suspected the pdp11 was the inspiration for this restriction 
(I have memories of noting it before I relocated to Utah, so circa 
1992).  The key problem is the generic parts of the compiler don't know 
what the semantics ought to be -- so it's not obvious when they do a 
substitution whether or not the substitution of one reg for another is 
actually valid.  It's important to remember that sometimes when we 
substitute one register for another, we don't have any contextual 
information about source vs dest -- it's a long standing wart that 
causes problems in other cases as well.

That punts the problem to the backends and the H8 actually tries to deal 
with this restriction.  Basically in the movxx pattern conditions, when 
the destination uses an autoinc addressing mode, the pattern's condition 
will check that the source register is different.  I would expect other 
ports likely to do something similar.

But that approach falls down with reload/lra doing substitutions without 
validating the result.  I guess it might be possible to cobble together 
something with secondary reloads, but it's way way way down on my todo list.

And yes, this case where the autoinc is on the destination works 
consistently on the H8 as well.  We could consider loosening the 
restrictions and let this through.  It's certainly simpler as it's a doc 
change and removing a bit of code on the H8.  It sounds like the pdp11 
already assumes that case is valid.


Jeff
  
Segher Boessenkool Oct. 14, 2022, 4:18 p.m. UTC | #7
On Fri, Oct 14, 2022 at 12:36:47AM +0000, Koning, Paul wrote:
> I guess I'll have to look harder to see if it's possible to make LRA handle CISC addressing modes like memory indirect, autoincrement, autodecrement, and others that the old reload handles at least somewhat.  Ideally LRA should do a better job; right now I believe it doesn't really do these things at all.  Targets like pdp11 and vax would like these.

So what does it do now?  Break every more complex addressing mode apart
again?  Or ICE?  Or something in between?


Segher
  
Segher Boessenkool Oct. 14, 2022, 4:25 p.m. UTC | #8
On Fri, Oct 14, 2022 at 03:20:40PM +0900, Takayuki 'January June' Suwa wrote:
> On 2022/10/14 8:56, Segher Boessenkool wrote:
> > And finally, xtensa does
> > /home/segher/src/gcc/libgcc/libgcc2.c:840:1: error: insn does not satisfy its constraints:
> >   840 | }
> >       | ^
> > (insn 8 7 9 2 (set (reg:SI 9 a9 [57])
> >         (const_int 1431655765 [0x55555555])) "/home/segher/src/gcc/libgcc/libgcc2.c":828:21 37 {movsi_internal}
> >      (expr_list:REG_EQUIV (const_int 1431655765 [0x55555555])
> >         (nil)))
> > during RTL pass: postreload
> > /home/segher/src/gcc/libgcc/libgcc2.c:840:1: internal compiler error: in extract_constrain_insn, at recog.cc:2692
> 
> This is a result of knowing that Reload is tolerant of out-of-constraint constants.
> LRA support needs to be taken care of before that, ie. in the "split1" pass.
> Excuse me in haste.

So old reload did a split itself?  Or it left it to say the split2 pass?

Thanks for looking into it!


Segher
  
Segher Boessenkool Oct. 14, 2022, 4:37 p.m. UTC | #9
Hi!

On Thu, Oct 13, 2022 at 10:47:20PM -0600, Jeff Law wrote:
> On 10/13/22 17:56, Segher Boessenkool wrote:
> >h8300 fails during GCC build:
> >/home/segher/src/gcc/libgcc/unwind.inc: In function 
> >'_Unwind_SjLj_RaiseException':
> >/home/segher/src/gcc/libgcc/unwind.inc:141:1: error: could not split insn
> >   141 | }
> >       | ^
> >(insn 69 256 327 (set (mem/f:SI (pre_dec:SI (reg/f:SI 7 sp)) [12  S4 A32])
> >         (reg/f:SI 7 sp)) "/home/segher/src/gcc/libgcc/unwind.inc":118:12 
> >         19 {*movsi}
> >      (expr_list:REG_ARGS_SIZE (const_int 4 [0x4])
> >         (nil)))
> >during RTL pass: final
> >which looks like a backend bug, I don't see a pattern that could split
> >this (without needing an extra clobber)?
> 
> Really smells like an LRA bug to me.
> 
> 
> We have this as we leave IRA:
> 
> 
> (insn 10 9 11 2 (set (reg/f:SI 30)
>        (plus:SI (reg/f:SI 11 fp)
>            (const_int -4 [0xfffffffffffffffc]))) "j.c":31:7 264 
> {*addsi}
>     (expr_list:REG_EQUIV (plus:SI (reg/f:SI 11 fp)
>            (const_int -4 [0xfffffffffffffffc]))
> (nil)))
> (insn 11 10 12 2 (set (mem/f:SI (pre_dec:SI (reg/f:SI 7 sp)) [3  S4 A32])
>        (reg/f:SI 30)) "j.c":31:7 19 {*movsi}
>     (expr_list:REG_DEAD (reg/f:SI 30)
>        (expr_list:REG_ARGS_SIZE (const_int 4 [0x4])
>            (nil))))
> 
> 
> And as we leave LRA:
> 
> (note 10 9 11 2 NOTE_INSN_DELETED)
> (insn 11 10 13 2 (set (mem/f:SI (pre_dec:SI (reg/f:SI 7 sp)) [3 S4 A32])
>         (reg/f:SI 7 sp)) "j.c":31:7 19 {*movsi}
>      (expr_list:REG_ARGS_SIZE (const_int 4 [0x4])
>         (nil)))

LRA only ever generates insns that pass recog.  The backend allows this
define_insn, requiring it to be split (it returns template "#"), but
then somehow it doesn't match in any split pass?

> Register elimination ultimately discovered that (reg 30) was the same as 
> the stack pointer and did the natural substitution.    The natural 
> substitution results in invalid RTL and there's really no way to back 
> out and do something different AFAICT in lra-eliminations.cc.
> 
> The only reason we fault is because the H8 backend knows this is invalid 
> RTL and refuses to split it.  If we were to try and re-recognize the 
> insn in question it would fail to recognize after the substitution 
> because the auto-inc'd operand overlaps with the other operand.

But it *did* recog?  Or does LRA somehow not always recog() everything?
I always thought that was one of the huge improvements over old reload
(it does everything very locally instead of very globally)!


Segher
  
Li, Pan2 via Gcc-patches Oct. 14, 2022, 4:37 p.m. UTC | #10
> On Oct 14, 2022, at 10:38 AM, Jeff Law via Gcc-patches <gcc-patches@gcc.gnu.org> wrote:
> 
> 
> On 10/14/22 06:37, Koning, Paul wrote:
>> 
>>> On Oct 13, 2022, at 9:07 PM, Jeff Law via Gcc-patches <gcc-patches@gcc.gnu.org> wrote:
>>> 
>>> 
>>> On 10/13/22 17:56, Segher Boessenkool wrote:
>>>> h8300 fails during GCC build:
>>>> /home/segher/src/gcc/libgcc/unwind.inc: In function '_Unwind_SjLj_RaiseException':
>>>> /home/segher/src/gcc/libgcc/unwind.inc:141:1: error: could not split insn
>>>>   141 | }
>>>>       | ^
>>>> (insn 69 256 327 (set (mem/f:SI (pre_dec:SI (reg/f:SI 7 sp)) [12  S4 A32])
>>>>         (reg/f:SI 7 sp)) "/home/segher/src/gcc/libgcc/unwind.inc":118:12 19 {*movsi}
>>>>      (expr_list:REG_ARGS_SIZE (const_int 4 [0x4])
>>>>         (nil)))
>>>> during RTL pass: final
>>>> which looks like a backend bug, I don't see a pattern that could split
>>>> this (without needing an extra clobber)?
>>> I'm aware of this -- its invalid RTL:
>>> 
>>> Uses of the register outside of an address are not permitted within the
>>> same insn as a use in an embedded side effect expression because such
>>> insns behave differently on different machines and hence must be treated
>>> as ambiguous and disallowed.
>> I had a bit of a fight with this sort of thing in pdp11, where in fact such operations are executed differently on different machine models.  The solution I picked is to create two sets of machine-specific constraint codes, one for "register N" and the other for "autoinc/dec of any register other than N" and pairing those.  (You can see this in pdp11.md, the mov<mode> definition.)
> 
> I've long suspected the pdp11 was the inspiration for this restriction (I have memories of noting it before I relocated to Utah, so circa 1992).  The key problem is the generic parts of the compiler don't know what the semantics ought to be -- so it's not obvious when they do a substitution whether or not the substitution of one reg for another is actually valid.  It's important to remember that sometimes when we substitute one register for another, we don't have any contextual information about source vs dest -- it's a long standing wart that causes problems in other cases as well.
> 
> That punts the problem to the backends and the H8 actually tries to deal with this restriction.  Basically in the movxx pattern conditions, when the destination uses an autoinc addressing mode, the pattern's condition will check that the source register is different.  I would expect other ports likely to do something similar.
> 
> But that approach falls down with reload/lra doing substitutions without validating the result.  I guess it might be possible to cobble together something with secondary reloads, but it's way way way down on my todo list.

Aren't the constraints enforced?  My experience is that I was getting these bad addressing modes in some test programs, and that the constraints I created to make the requirement explicit cured that.  Maybe I'm expecting too much from constraints, but my (admittedly inexperienced) understanding of them is that they inform reload what sort of things it can construct, and what it cannot.

If reload obeys the constraints in the patterns then the back end machine definition can be written to avoid the problematic cases, and it is no longer necessary to have a general (and as I pointed out, overly broad) rule in generic code.

	paul
  
Richard Biener Oct. 14, 2022, 4:39 p.m. UTC | #11
> Am 14.10.2022 um 16:40 schrieb Jeff Law via Gcc-patches <gcc-patches@gcc.gnu.org>:
> 
> 
>> On 10/14/22 06:37, Koning, Paul wrote:
>> 
>>>> On Oct 13, 2022, at 9:07 PM, Jeff Law via Gcc-patches <gcc-patches@gcc.gnu.org> wrote:
>>> 
>>> 
>>> On 10/13/22 17:56, Segher Boessenkool wrote:
>>>> h8300 fails during GCC build:
>>>> /home/segher/src/gcc/libgcc/unwind.inc: In function '_Unwind_SjLj_RaiseException':
>>>> /home/segher/src/gcc/libgcc/unwind.inc:141:1: error: could not split insn
>>>>   141 | }
>>>>       | ^
>>>> (insn 69 256 327 (set (mem/f:SI (pre_dec:SI (reg/f:SI 7 sp)) [12  S4 A32])
>>>>         (reg/f:SI 7 sp)) "/home/segher/src/gcc/libgcc/unwind.inc":118:12 19 {*movsi}
>>>>      (expr_list:REG_ARGS_SIZE (const_int 4 [0x4])
>>>>         (nil)))
>>>> during RTL pass: final
>>>> which looks like a backend bug, I don't see a pattern that could split
>>>> this (without needing an extra clobber)?
>>> I'm aware of this -- its invalid RTL:
>>> 
>>> Uses of the register outside of an address are not permitted within the
>>> same insn as a use in an embedded side effect expression because such
>>> insns behave differently on different machines and hence must be treated
>>> as ambiguous and disallowed.
>> I had a bit of a fight with this sort of thing in pdp11, where in fact such operations are executed differently on different machine models.  The solution I picked is to create two sets of machine-specific constraint codes, one for "register N" and the other for "autoinc/dec of any register other than N" and pairing those.  (You can see this in pdp11.md, the mov<mode> definition.)
> 
> I've long suspected the pdp11 was the inspiration for this restriction (I have memories of noting it before I relocated to Utah, so circa 1992).  The key problem is the generic parts of the compiler don't know what the semantics ought to be -- so it's not obvious when they do a substitution whether or not the substitution of one reg for another is actually valid.  It's important to remember that sometimes when we substitute one register for another, we don't have any contextual information about source vs dest -- it's a long standing wart that causes problems in other cases as well.
> 
> That punts the problem to the backends and the H8 actually tries to deal with this restriction.  Basically in the movxx pattern conditions, when the destination uses an autoinc addressing mode, the pattern's condition will check that the source register is different.  I would expect other ports likely to do something similar.
> 
> But that approach falls down with reload/lra doing substitutions without validating the result.  I guess it might be possible to cobble together something with secondary reloads, but it's way way way down on my todo list.
> 
> And yes, this case where the autoinc is on the destination works consistently on the H8 as well.  We could consider loosening the restrictions and let this through.  It's certainly simpler as it's a doc change and removing a bit of code on the H8.  It sounds like the pdp11 already assumes that case is valid.

But what is the semantics of the RTL IL?
That ought to be documented.


> 
> Jeff
>
  
Li, Pan2 via Gcc-patches Oct. 14, 2022, 4:48 p.m. UTC | #12
> On Oct 14, 2022, at 12:18 PM, Segher Boessenkool <segher@kernel.crashing.org> wrote:
> 
> On Fri, Oct 14, 2022 at 12:36:47AM +0000, Koning, Paul wrote:
>> I guess I'll have to look harder to see if it's possible to make LRA handle CISC addressing modes like memory indirect, autoincrement, autodecrement, and others that the old reload handles at least somewhat.  Ideally LRA should do a better job; right now I believe it doesn't really do these things at all.  Targets like pdp11 and vax would like these.
> 
> So what does it do now?  Break every more complex addressing mode apart
> again?  Or ICE?  Or something in between?

The former.  LRA does handle some cases but not all that the target permits and not as many as the old reload.

Example:

unsigned int cksum (unsigned int *buf, unsigned int len)
{
    unsigned int ret = 0;
    do {
        ret += *buf++;
    } while (--len != 0);
    return ret;
}

The loop looks like this:

L_2:
	add	(r2)+,r0
	sob	r1,L_2

which is what I would expect.  Now throw in an indirection:

Old reload produces this loop:

L_2:
	add	@(r2)+,r0
	sob	r1,L_2

while LRA doesn't understand it can use the autoincrement indirect mode:

L_2:
	mov	(r2)+,r3
	add	(r3),r0
	sob	r1,L_2

This is from a GCC 13.0 test build of last June, with -O2 -m45, with and without -mlra.

    paul
  
Jeff Law Oct. 14, 2022, 5:07 p.m. UTC | #13
On 10/14/22 10:37, Segher Boessenkool wrote:
> Hi!
>
> On Thu, Oct 13, 2022 at 10:47:20PM -0600, Jeff Law wrote:
>> On 10/13/22 17:56, Segher Boessenkool wrote:
>>> h8300 fails during GCC build:
>>> /home/segher/src/gcc/libgcc/unwind.inc: In function
>>> '_Unwind_SjLj_RaiseException':
>>> /home/segher/src/gcc/libgcc/unwind.inc:141:1: error: could not split insn
>>>    141 | }
>>>        | ^
>>> (insn 69 256 327 (set (mem/f:SI (pre_dec:SI (reg/f:SI 7 sp)) [12  S4 A32])
>>>          (reg/f:SI 7 sp)) "/home/segher/src/gcc/libgcc/unwind.inc":118:12
>>>          19 {*movsi}
>>>       (expr_list:REG_ARGS_SIZE (const_int 4 [0x4])
>>>          (nil)))
>>> during RTL pass: final
>>> which looks like a backend bug, I don't see a pattern that could split
>>> this (without needing an extra clobber)?
>> Really smells like an LRA bug to me.
>>
>>
>> We have this as we leave IRA:
>>
>>
>> (insn 10 9 11 2 (set (reg/f:SI 30)
>>         (plus:SI (reg/f:SI 11 fp)
>>             (const_int -4 [0xfffffffffffffffc]))) "j.c":31:7 264
>> {*addsi}
>>      (expr_list:REG_EQUIV (plus:SI (reg/f:SI 11 fp)
>>             (const_int -4 [0xfffffffffffffffc]))
>> (nil)))
>> (insn 11 10 12 2 (set (mem/f:SI (pre_dec:SI (reg/f:SI 7 sp)) [3  S4 A32])
>>         (reg/f:SI 30)) "j.c":31:7 19 {*movsi}
>>      (expr_list:REG_DEAD (reg/f:SI 30)
>>         (expr_list:REG_ARGS_SIZE (const_int 4 [0x4])
>>             (nil))))
>>
>>
>> And as we leave LRA:
>>
>> (note 10 9 11 2 NOTE_INSN_DELETED)
>> (insn 11 10 13 2 (set (mem/f:SI (pre_dec:SI (reg/f:SI 7 sp)) [3 S4 A32])
>>          (reg/f:SI 7 sp)) "j.c":31:7 19 {*movsi}
>>       (expr_list:REG_ARGS_SIZE (const_int 4 [0x4])
>>          (nil)))
> LRA only ever generates insns that pass recog.  The backend allows this
> define_insn, requiring it to be split (it returns template "#"), but
> then somehow it doesn't match in any split pass?

Nope.  The elimination code will just change one register without 
re-recognizing.  That's precisely what happens here.


>
>> Register elimination ultimately discovered that (reg 30) was the same as
>> the stack pointer and did the natural substitution.    The natural
>> substitution results in invalid RTL and there's really no way to back
>> out and do something different AFAICT in lra-eliminations.cc.
>>
>> The only reason we fault is because the H8 backend knows this is invalid
>> RTL and refuses to split it.  If we were to try and re-recognize the
>> insn in question it would fail to recognize after the substitution
>> because the auto-inc'd operand overlaps with the other operand.
> But it *did* recog?  Or does LRA somehow not always recog() everything?
> I always thought that was one of the huge improvements over old reload
> (it does everything very locally instead of very globally)!

No, LRA does not force re-recognition in some cases, particularly for 
register eliminations.


jeff
  
Jeff Law Oct. 14, 2022, 5:10 p.m. UTC | #14
On 10/14/22 10:37, Koning, Paul wrote:
>
>> On Oct 14, 2022, at 10:38 AM, Jeff Law via Gcc-patches <gcc-patches@gcc.gnu.org> wrote:
>>
>>
>> On 10/14/22 06:37, Koning, Paul wrote:
>>>> On Oct 13, 2022, at 9:07 PM, Jeff Law via Gcc-patches <gcc-patches@gcc.gnu.org> wrote:
>>>>
>>>>
>>>> On 10/13/22 17:56, Segher Boessenkool wrote:
>>>>> h8300 fails during GCC build:
>>>>> /home/segher/src/gcc/libgcc/unwind.inc: In function '_Unwind_SjLj_RaiseException':
>>>>> /home/segher/src/gcc/libgcc/unwind.inc:141:1: error: could not split insn
>>>>>    141 | }
>>>>>        | ^
>>>>> (insn 69 256 327 (set (mem/f:SI (pre_dec:SI (reg/f:SI 7 sp)) [12  S4 A32])
>>>>>          (reg/f:SI 7 sp)) "/home/segher/src/gcc/libgcc/unwind.inc":118:12 19 {*movsi}
>>>>>       (expr_list:REG_ARGS_SIZE (const_int 4 [0x4])
>>>>>          (nil)))
>>>>> during RTL pass: final
>>>>> which looks like a backend bug, I don't see a pattern that could split
>>>>> this (without needing an extra clobber)?
>>>> I'm aware of this -- its invalid RTL:
>>>>
>>>> Uses of the register outside of an address are not permitted within the
>>>> same insn as a use in an embedded side effect expression because such
>>>> insns behave differently on different machines and hence must be treated
>>>> as ambiguous and disallowed.
>>> I had a bit of a fight with this sort of thing in pdp11, where in fact such operations are executed differently on different machine models.  The solution I picked is to create two sets of machine-specific constraint codes, one for "register N" and the other for "autoinc/dec of any register other than N" and pairing those.  (You can see this in pdp11.md, the mov<mode> definition.)
>> I've long suspected the pdp11 was the inspiration for this restriction (I have memories of noting it before I relocated to Utah, so circa 1992).  The key problem is the generic parts of the compiler don't know what the semantics ought to be -- so it's not obvious when they do a substitution whether or not the substitution of one reg for another is actually valid.  It's important to remember that sometimes when we substitute one register for another, we don't have any contextual information about source vs dest -- it's a long standing wart that causes problems in other cases as well.
>>
>> That punts the problem to the backends and the H8 actually tries to deal with this restriction.  Basically in the movxx pattern conditions, when the destination uses an autoinc addressing mode, the pattern's condition will check that the source register is different.  I would expect other ports likely to do something similar.
>>
>> But that approach falls down with reload/lra doing substitutions without validating the result.  I guess it might be possible to cobble together something with secondary reloads, but it's way way way down on my todo list.
> Aren't the constraints enforced?  My experience is that I was getting these bad addressing modes in some test programs, and that the constraints I created to make the requirement explicit cured that.  Maybe I'm expecting too much from constraints, but my (admittedly inexperienced) understanding of them is that they inform reload what sort of things it can construct, and what it cannot.

It's not really a constraint issue -- the pattern's condition would 
cause this not to recognize, but LRA doesn't re-recognize the insn.  We 
might be able to hack something in the constraints to force a reload of 
the source operand in this case.   Ugly, but a possibility.


Jeff
  
Jeff Law Oct. 14, 2022, 5:11 p.m. UTC | #15
On 10/14/22 10:39, Richard Biener wrote:
>
>> Am 14.10.2022 um 16:40 schrieb Jeff Law via Gcc-patches <gcc-patches@gcc.gnu.org>:
>>
>> 
>>> On 10/14/22 06:37, Koning, Paul wrote:
>>>
>>>>> On Oct 13, 2022, at 9:07 PM, Jeff Law via Gcc-patches <gcc-patches@gcc.gnu.org> wrote:
>>>>
>>>> On 10/13/22 17:56, Segher Boessenkool wrote:
>>>>> h8300 fails during GCC build:
>>>>> /home/segher/src/gcc/libgcc/unwind.inc: In function '_Unwind_SjLj_RaiseException':
>>>>> /home/segher/src/gcc/libgcc/unwind.inc:141:1: error: could not split insn
>>>>>    141 | }
>>>>>        | ^
>>>>> (insn 69 256 327 (set (mem/f:SI (pre_dec:SI (reg/f:SI 7 sp)) [12  S4 A32])
>>>>>          (reg/f:SI 7 sp)) "/home/segher/src/gcc/libgcc/unwind.inc":118:12 19 {*movsi}
>>>>>       (expr_list:REG_ARGS_SIZE (const_int 4 [0x4])
>>>>>          (nil)))
>>>>> during RTL pass: final
>>>>> which looks like a backend bug, I don't see a pattern that could split
>>>>> this (without needing an extra clobber)?
>>>> I'm aware of this -- its invalid RTL:
>>>>
>>>> Uses of the register outside of an address are not permitted within the
>>>> same insn as a use in an embedded side effect expression because such
>>>> insns behave differently on different machines and hence must be treated
>>>> as ambiguous and disallowed.
>>> I had a bit of a fight with this sort of thing in pdp11, where in fact such operations are executed differently on different machine models.  The solution I picked is to create two sets of machine-specific constraint codes, one for "register N" and the other for "autoinc/dec of any register other than N" and pairing those.  (You can see this in pdp11.md, the mov<mode> definition.)
>> I've long suspected the pdp11 was the inspiration for this restriction (I have memories of noting it before I relocated to Utah, so circa 1992).  The key problem is the generic parts of the compiler don't know what the semantics ought to be -- so it's not obvious when they do a substitution whether or not the substitution of one reg for another is actually valid.  It's important to remember that sometimes when we substitute one register for another, we don't have any contextual information about source vs dest -- it's a long standing wart that causes problems in other cases as well.
>>
>> That punts the problem to the backends and the H8 actually tries to deal with this restriction.  Basically in the movxx pattern conditions, when the destination uses an autoinc addressing mode, the pattern's condition will check that the source register is different.  I would expect other ports likely to do something similar.
>>
>> But that approach falls down with reload/lra doing substitutions without validating the result.  I guess it might be possible to cobble together something with secondary reloads, but it's way way way down on my todo list.
>>
>> And yes, this case where the autoinc is on the destination works consistently on the H8 as well.  We could consider loosening the restrictions and let this through.  It's certainly simpler as it's a doc change and removing a bit of code on the H8.  It sounds like the pdp11 already assumes that case is valid.
> But what is the semantics of the RTL IL?
> That ought to be documented.

*If* we went a route to relax the restriction (and I'm still not sure 
that's a good idea), we absolutely would have to document the semantics.


jeff
  
Segher Boessenkool Oct. 14, 2022, 5:35 p.m. UTC | #16
On Fri, Oct 14, 2022 at 11:07:43AM -0600, Jeff Law wrote:
> >LRA only ever generates insns that pass recog.  The backend allows this
> >define_insn, requiring it to be split (it returns template "#"), but
> >then somehow it doesn't match in any split pass?
> 
> Nope.  The elimination code will just change one register without 
> re-recognizing.  That's precisely what happens here.

That is a big oversight then.  Please file a PR?

> >>Register elimination ultimately discovered that (reg 30) was the same as
> >>the stack pointer and did the natural substitution.    The natural
> >>substitution results in invalid RTL and there's really no way to back
> >>out and do something different AFAICT in lra-eliminations.cc.
> >>
> >>The only reason we fault is because the H8 backend knows this is invalid
> >>RTL and refuses to split it.  If we were to try and re-recognize the
> >>insn in question it would fail to recognize after the substitution
> >>because the auto-inc'd operand overlaps with the other operand.
> >But it *did* recog?  Or does LRA somehow not always recog() everything?
> >I always thought that was one of the huge improvements over old reload
> >(it does everything very locally instead of very globally)!
> 
> No, LRA does not force re-recognition in some cases, particularly for 
> register eliminations.

It is the only way it can know if it needs to reload more.  Even if it
somehow can assume it doesn't have to check this in some cases, an
assert (inside a CHECKING_P) would be nice?


Segher
  
Li, Pan2 via Gcc-patches Oct. 14, 2022, 5:36 p.m. UTC | #17
> On Oct 14, 2022, at 1:10 PM, Jeff Law <jeffreyalaw@gmail.com> wrote:
> 
> On 10/14/22 10:37, Koning, Paul wrote:
>> 
>>> ...
>>> But that approach falls down with reload/lra doing substitutions without validating the result.  I guess it might be possible to cobble together something with secondary reloads, but it's way way way down on my todo list.
>> Aren't the constraints enforced?  My experience is that I was getting these bad addressing modes in some test programs, and that the constraints I created to make the requirement explicit cured that.  Maybe I'm expecting too much from constraints, but my (admittedly inexperienced) understanding of them is that they inform reload what sort of things it can construct, and what it cannot.
> 
> It's not really a constraint issue -- the pattern's condition would cause this not to recognize, but LRA doesn't re-recognize the insn.  We might be able to hack something in the constraints to force a reload of the source operand in this case.   Ugly, but a possibility.

I find it hard to cope with constraints that don't constrain.  Minimally it should be clearly documented exactly what cases fail to obey the constraints and what a target writer can do to deal with those failures.

As it stands, I find myself working hard to write MD code that accurately describes the rules of the machine, and for the core machinery to disregard those instructions is painful.

Is there a compelling argument for every case where LRA fails to obey the constraints?  If not, can they just be called bugs and added to the to-be-fixed queue?

	paul
  
Jeff Law Oct. 14, 2022, 6:03 p.m. UTC | #18
On 10/14/22 11:35, Segher Boessenkool wrote:
> On Fri, Oct 14, 2022 at 11:07:43AM -0600, Jeff Law wrote:
>>> LRA only ever generates insns that pass recog.  The backend allows this
>>> define_insn, requiring it to be split (it returns template "#"), but
>>> then somehow it doesn't match in any split pass?
>> Nope.  The elimination code will just change one register without
>> re-recognizing.  That's precisely what happens here.
> That is a big oversight then.  Please file a PR?

Sure.  But just recognizing (for this particular case) will just move 
the fault from a failure to split to a failure to recognize. From my 
wanderings in the elimination code, I don't see that it has a path that 
would allow it to reasonably handle this case -- ie, if the insn does 
not recognize, what then?   Conceptually we need to generate an 
input-reload but I don't see a way to do that in the elimination code.  
Maybe Vlad knows how it ought to be handled.



> It is the only way it can know if it needs to reload more.  Even if it
> somehow can assume it doesn't have to check this in some cases, an
> assert (inside a CHECKING_P) would be nice?

Agreed.


jeff
  
Li, Pan2 via Gcc-patches Oct. 14, 2022, 7:58 p.m. UTC | #19
> On Oct 14, 2022, at 2:03 PM, Jeff Law via Gcc-patches <gcc-patches@gcc.gnu.org> wrote:
> 
> 
> On 10/14/22 11:35, Segher Boessenkool wrote:
>> On Fri, Oct 14, 2022 at 11:07:43AM -0600, Jeff Law wrote:
>>>> LRA only ever generates insns that pass recog.  The backend allows this
>>>> define_insn, requiring it to be split (it returns template "#"), but
>>>> then somehow it doesn't match in any split pass?
>>> Nope.  The elimination code will just change one register without
>>> re-recognizing.  That's precisely what happens here.
>> That is a big oversight then.  Please file a PR?
> 
> Sure.  But just recognizing (for this particular case) will just move the fault from a failure to split to a failure to recognize. From my wanderings in the elimination code, I don't see that it has a path that would allow it to reasonably handle this case -- ie, if the insn does not recognize, what then?   Conceptually we need to generate an input-reload but I don't see a way to do that in the elimination code.  Maybe Vlad knows how it ought to be handled.

I probably have too simplistic a view of this, but the way I think of it is that LRA (and reload) make decisions subject to constraints, and among those constraints are the ones specified in the MD file patterns.  That to me means that a substitution proposed to be made by the LRA code is subject to those invariants: it can't do that if the constraints say "no" and must then consider some other alternative.

	paul
  
Segher Boessenkool Oct. 14, 2022, 8:12 p.m. UTC | #20
On Fri, Oct 14, 2022 at 07:58:39PM +0000, Koning, Paul wrote:
> > On Oct 14, 2022, at 2:03 PM, Jeff Law via Gcc-patches <gcc-patches@gcc.gnu.org> wrote:
> > On 10/14/22 11:35, Segher Boessenkool wrote:
> >> On Fri, Oct 14, 2022 at 11:07:43AM -0600, Jeff Law wrote:
> >>>> LRA only ever generates insns that pass recog.  The backend allows this
> >>>> define_insn, requiring it to be split (it returns template "#"), but
> >>>> then somehow it doesn't match in any split pass?
> >>> Nope.  The elimination code will just change one register without
> >>> re-recognizing.  That's precisely what happens here.
> >> That is a big oversight then.  Please file a PR?
> > 
> > Sure.  But just recognizing (for this particular case) will just move the fault from a failure to split to a failure to recognize. From my wanderings in the elimination code, I don't see that it has a path that would allow it to reasonably handle this case -- ie, if the insn does not recognize, what then?   Conceptually we need to generate an input-reload but I don't see a way to do that in the elimination code.  Maybe Vlad knows how it ought to be handled.
> 
> I probably have too simplistic a view of this, but the way I think of it is that LRA (and reload) make decisions subject to constraints, and among those constraints are the ones specified in the MD file patterns.  That to me means that a substitution proposed to be made by the LRA code is subject to those invariants: it can't do that if the constraints say "no" and must then consider some other alternative.

I think that is exactly right for LRA.

Old reload conceptually changed the whole function all at once, starting
with valid RTL, and ending with strictly valid RTL.  LRA works locally,
one instruction at a time essentially, and makes the changes
immediately.  If when it has finished work on the function offsets have
changed, it walks over the whole function again, repeat until done.

"Strictly valid" means that the constraints are considered, and the insn
is only valid if some enabled alternative satisfies all constraints.

I hope I got that all right, I'm not an expert!  :-)


Segher
  
Li, Pan2 via Gcc-patches Oct. 14, 2022, 8:40 p.m. UTC | #21
> On Oct 14, 2022, at 4:12 PM, Segher Boessenkool <segher@kernel.crashing.org> wrote:
> 
> On Fri, Oct 14, 2022 at 07:58:39PM +0000, Koning, Paul wrote:
>>> On Oct 14, 2022, at 2:03 PM, Jeff Law via Gcc-patches <gcc-patches@gcc.gnu.org> wrote:
>>> On 10/14/22 11:35, Segher Boessenkool wrote:
>>>> On Fri, Oct 14, 2022 at 11:07:43AM -0600, Jeff Law wrote:
>>>>>> LRA only ever generates insns that pass recog.  The backend allows this
>>>>>> define_insn, requiring it to be split (it returns template "#"), but
>>>>>> then somehow it doesn't match in any split pass?
>>>>> Nope.  The elimination code will just change one register without
>>>>> re-recognizing.  That's precisely what happens here.
>>>> That is a big oversight then.  Please file a PR?
>>> 
>>> Sure.  But just recognizing (for this particular case) will just move the fault from a failure to split to a failure to recognize. From my wanderings in the elimination code, I don't see that it has a path that would allow it to reasonably handle this case -- ie, if the insn does not recognize, what then?   Conceptually we need to generate an input-reload but I don't see a way to do that in the elimination code.  Maybe Vlad knows how it ought to be handled.
>> 
>> I probably have too simplistic a view of this, but the way I think of it is that LRA (and reload) make decisions subject to constraints, and among those constraints are the ones specified in the MD file patterns.  That to me means that a substitution proposed to be made by the LRA code is subject to those invariants: it can't do that if the constraints say "no" and must then consider some other alternative.
> 
> I think that is exactly right for LRA.
> 
> Old reload conceptually changed the whole function all at once, starting
> with valid RTL, and ending with strictly valid RTL.  LRA works locally,
> one instruction at a time essentially, and makes the changes
> immediately.  If when it has finished work on the function offsets have
> changed, it walks over the whole function again, repeat until done.
> 
> "Strictly valid" means that the constraints are considered, and the insn
> is only valid if some enabled alternative satisfies all constraints.
> 
> I hope I got that all right, I'm not an expert!  :-)

Thanks Segher.

As I said earlier, if for some reason this straightforward understanding is not completely accurate, that can be handled provided it is documented when and why the exceptions arise, and what methods the target author should use to deal with these things when they happen.

As a target maintainer not deeply skilled in the GCC common internals, I tend to trip over these things.  With the old reload, and secondary reload in particular, it always felt to  me  like the answer was "keep tweaking the target definition files until the test cases stop breaking".  That isn't how it should be.  

Perhaps some of these issues come from out of the ordinary target restrictions.  The autoinc/autodec case we're discussing may be an example of that.  The one I remember in particular was the pdp11 float instructions, where I have 6 registers but only 4 of these can be loaded from or stored to memory.  Putting the other two to work while having spill to memory work right took quite a lot of iteration.

It may be LRA is better in these areas.  I haven't spent much time with that, other than to create a way to enable its use and observing that (a) I got about the same test suite numbers either way and (b) the LRA code was not as good in some of the cases.

	paul
  
Jeff Law Oct. 14, 2022, 9:15 p.m. UTC | #22
On 10/14/22 11:36, Koning, Paul wrote:
>
>> On Oct 14, 2022, at 1:10 PM, Jeff Law <jeffreyalaw@gmail.com> wrote:
>>
>> On 10/14/22 10:37, Koning, Paul wrote:
>>>> ...
>>>> But that approach falls down with reload/lra doing substitutions without validating the result.  I guess it might be possible to cobble together something with secondary reloads, but it's way way way down on my todo list.
>>> Aren't the constraints enforced?  My experience is that I was getting these bad addressing modes in some test programs, and that the constraints I created to make the requirement explicit cured that.  Maybe I'm expecting too much from constraints, but my (admittedly inexperienced) understanding of them is that they inform reload what sort of things it can construct, and what it cannot.
>> It's not really a constraint issue -- the pattern's condition would cause this not to recognize, but LRA doesn't re-recognize the insn.  We might be able to hack something in the constraints to force a reload of the source operand in this case.   Ugly, but a possibility.
> I find it hard to cope with constraints that don't constrain.  Minimally it should be clearly documented exactly what cases fail to obey the constraints and what a target writer can do to deal with those failures.

Constraints have a purpose, but as I've noted, they really don't come 
into play here.   Had LRA tried to see if what it created as a valid 
move insn, the backend would have said "nope, that's not valid".  That's 
a stronger test than checking the constraints.  If the insn is not valid 
according to its condition, then the constraints simply don't matter.

I'm not aware of a case where constraints are failing to be obeyed and 
constraints simply aren't a viable solution here other than to paper 
over the problem and hope it doesn't show up elsewhere.

Right now operand 0's constraint is "<" meaning pre-inc operand, operand 
1 is "r".  How would you define a new constraint for operand 1 that 
disallows overlap with operand 0 given that the H8 allows autoinc on any 
register operand?   You can't look at operand 0 while processing the 
constraint for operand 1. Similarly if you try to define a new 
constraint for operand0 without looking at operand1.

Hence the h8300_move_ok test in the insn's condition where we can look 
at both operands to assess if it's a legitimate insn.


>
> As it stands, I find myself working hard to write MD code that accurately describes the rules of the machine, and for the core machinery to disregard those instructions is painful.

No doubt.


>
> Is there a compelling argument for every case where LRA fails to obey the constraints?  If not, can they just be called bugs and added to the to-be-fixed queue?

There was in the reload days, though I honestly don't remember what it 
was, I'm much less familiar with LRA in this regard, but I trust Vlad's 
engineering skills and strongly believe that failing to recognize was 
done for a good reason.


It's also worth repeating, we can get the same fundamental failure on 
the H8 with reload.  The testcase is different, but the core issue is 
the same.  We have a move with an autoinc destination and the same 
register is also used as a source operand incorerctly created by reload.


What's a bit interesting here is the m68k doesn't do any kind of 
checking for these scenarios. It just accepts them and generates the 
obvious code.  I'm more tempted by the minute to do the same on the H8 :-)


Jeff
  
Li, Pan2 via Gcc-patches Oct. 14, 2022, 9:21 p.m. UTC | #23
> On Oct 14, 2022, at 5:15 PM, Jeff Law via Gcc-patches <gcc-patches@gcc.gnu.org> wrote:
> 
> 
> On 10/14/22 11:36, Koning, Paul wrote:
>> 
>>> On Oct 14, 2022, at 1:10 PM, Jeff Law <jeffreyalaw@gmail.com> wrote:
>>> 
>>> On 10/14/22 10:37, Koning, Paul wrote:
>>>>> ...
>>>>> But that approach falls down with reload/lra doing substitutions without validating the result.  I guess it might be possible to cobble together something with secondary reloads, but it's way way way down on my todo list.
>>>> Aren't the constraints enforced?  My experience is that I was getting these bad addressing modes in some test programs, and that the constraints I created to make the requirement explicit cured that.  Maybe I'm expecting too much from constraints, but my (admittedly inexperienced) understanding of them is that they inform reload what sort of things it can construct, and what it cannot.
>>> It's not really a constraint issue -- the pattern's condition would cause this not to recognize, but LRA doesn't re-recognize the insn.  We might be able to hack something in the constraints to force a reload of the source operand in this case.   Ugly, but a possibility.
>> I find it hard to cope with constraints that don't constrain.  Minimally it should be clearly documented exactly what cases fail to obey the constraints and what a target writer can do to deal with those failures.
> 
> Constraints have a purpose, but as I've noted, they really don't come into play here.   Had LRA tried to see if what it created as a valid move insn, the backend would have said "nope, that's not valid".  That's a stronger test than checking the constraints.  If the insn is not valid according to its condition, then the constraints simply don't matter.
> 
> I'm not aware of a case where constraints are failing to be obeyed and constraints simply aren't a viable solution here other than to paper over the problem and hope it doesn't show up elsewhere.
> 
> Right now operand 0's constraint is "<" meaning pre-inc operand, operand 1 is "r".  How would you define a new constraint for operand 1 that disallows overlap with operand 0 given that the H8 allows autoinc on any register operand?   You can't look at operand 0 while processing the constraint for operand 1. Similarly if you try to define a new constraint for operand0 without looking at operand1.

Easy but cumbersome: define constraints for "register N" (for each N) and another set for "autoinc on any register other than N".  In pdp11, I called these Z0, Z1... and Za, Zb... respectively.  Then the insn gets constraints that look like "Z0,Z1,Z2..." and "Za, Zb, Zc..." for the two operands.  As I said, see pdp11.md, the mov insn.

	paul
  
Jeff Law Oct. 14, 2022, 9:30 p.m. UTC | #24
On 10/14/22 15:21, Koning, Paul wrote:
>
>> On Oct 14, 2022, at 5:15 PM, Jeff Law via Gcc-patches <gcc-patches@gcc.gnu.org> wrote:
>>
>>
>> On 10/14/22 11:36, Koning, Paul wrote:
>>>> On Oct 14, 2022, at 1:10 PM, Jeff Law <jeffreyalaw@gmail.com> wrote:
>>>>
>>>> On 10/14/22 10:37, Koning, Paul wrote:
>>>>>> ...
>>>>>> But that approach falls down with reload/lra doing substitutions without validating the result.  I guess it might be possible to cobble together something with secondary reloads, but it's way way way down on my todo list.
>>>>> Aren't the constraints enforced?  My experience is that I was getting these bad addressing modes in some test programs, and that the constraints I created to make the requirement explicit cured that.  Maybe I'm expecting too much from constraints, but my (admittedly inexperienced) understanding of them is that they inform reload what sort of things it can construct, and what it cannot.
>>>> It's not really a constraint issue -- the pattern's condition would cause this not to recognize, but LRA doesn't re-recognize the insn.  We might be able to hack something in the constraints to force a reload of the source operand in this case.   Ugly, but a possibility.
>>> I find it hard to cope with constraints that don't constrain.  Minimally it should be clearly documented exactly what cases fail to obey the constraints and what a target writer can do to deal with those failures.
>> Constraints have a purpose, but as I've noted, they really don't come into play here.   Had LRA tried to see if what it created as a valid move insn, the backend would have said "nope, that's not valid".  That's a stronger test than checking the constraints.  If the insn is not valid according to its condition, then the constraints simply don't matter.
>>
>> I'm not aware of a case where constraints are failing to be obeyed and constraints simply aren't a viable solution here other than to paper over the problem and hope it doesn't show up elsewhere.
>>
>> Right now operand 0's constraint is "<" meaning pre-inc operand, operand 1 is "r".  How would you define a new constraint for operand 1 that disallows overlap with operand 0 given that the H8 allows autoinc on any register operand?   You can't look at operand 0 while processing the constraint for operand 1. Similarly if you try to define a new constraint for operand0 without looking at operand1.
> Easy but cumbersome: define constraints for "register N" (for each N) and another set for "autoinc on any register other than N".  In pdp11, I called these Z0, Z1... and Za, Zb... respectively.  Then the insn gets constraints that look like "Z0,Z1,Z2..." and "Za, Zb, Zc..." for the two operands.  As I said, see pdp11.md, the mov insn.

Yea, you're right.  It's definitely possible.  Painful, but possible.

jeff
  
Jeff Law Oct. 15, 2022, 12:19 a.m. UTC | #25
On 10/14/22 15:21, Koning, Paul wrote:
>
>> On Oct 14, 2022, at 5:15 PM, Jeff Law via Gcc-patches <gcc-patches@gcc.gnu.org> wrote:
>>
>>
>> On 10/14/22 11:36, Koning, Paul wrote:
>>>> On Oct 14, 2022, at 1:10 PM, Jeff Law <jeffreyalaw@gmail.com> wrote:
>>>>
>>>> On 10/14/22 10:37, Koning, Paul wrote:
>>>>>> ...
>>>>>> But that approach falls down with reload/lra doing substitutions without validating the result.  I guess it might be possible to cobble together something with secondary reloads, but it's way way way down on my todo list.
>>>>> Aren't the constraints enforced?  My experience is that I was getting these bad addressing modes in some test programs, and that the constraints I created to make the requirement explicit cured that.  Maybe I'm expecting too much from constraints, but my (admittedly inexperienced) understanding of them is that they inform reload what sort of things it can construct, and what it cannot.
>>>> It's not really a constraint issue -- the pattern's condition would cause this not to recognize, but LRA doesn't re-recognize the insn.  We might be able to hack something in the constraints to force a reload of the source operand in this case.   Ugly, but a possibility.
>>> I find it hard to cope with constraints that don't constrain.  Minimally it should be clearly documented exactly what cases fail to obey the constraints and what a target writer can do to deal with those failures.
>> Constraints have a purpose, but as I've noted, they really don't come into play here.   Had LRA tried to see if what it created as a valid move insn, the backend would have said "nope, that's not valid".  That's a stronger test than checking the constraints.  If the insn is not valid according to its condition, then the constraints simply don't matter.
>>
>> I'm not aware of a case where constraints are failing to be obeyed and constraints simply aren't a viable solution here other than to paper over the problem and hope it doesn't show up elsewhere.
>>
>> Right now operand 0's constraint is "<" meaning pre-inc operand, operand 1 is "r".  How would you define a new constraint for operand 1 that disallows overlap with operand 0 given that the H8 allows autoinc on any register operand?   You can't look at operand 0 while processing the constraint for operand 1. Similarly if you try to define a new constraint for operand0 without looking at operand1.
> Easy but cumbersome: define constraints for "register N" (for each N) and another set for "autoinc on any register other than N".  In pdp11, I called these Z0, Z1... and Za, Zb... respectively.  Then the insn gets constraints that look like "Z0,Z1,Z2..." and "Za, Zb, Zc..." for the two operands.  As I said, see pdp11.md, the mov insn.

It generally looks sound, but golly gee, this runs into the "reload 
doesn't validate insns problem"  if it's done in a reload tree rather 
than an lra tree.  We've got an insn with a pre-inc destination and a 
reg source.  The source pseudo doesn't get a hard reg, reload replaces 
the pseudo with a mem as expected. Reload finishes with something like this:

(insn 100 98 101 15 (set (mem/f:SI (pre_dec:SI (reg/f:SI 7 sp)) [8  S4 A32])
         (mem/c:SI (plus:SI (reg/f:SI 7 sp)
                 (const_int 8 [0x8])) [9 %sfp+-24 S4 A32])) "j.c":62:11 
19 {*movsix}
      (expr_list:REG_ARGS_SIZE (const_int 4 [0x4])
         (nil)))

Which, isn't a valid instruction on the H8.  The insn's condition 
verifies that one of the two operands must be a REG.  But reload never 
bothered to re-recognize the insn after makng the substitution, then 
naturally it blows up in reload_cse with a constraint failure because 
the pre-inc destination constraints require a reg for the source 
operand.  But the real culprit here is reload making the substitution 
and not validing that the result is valid.

Arggh!

Which brings me back to pondering just removing the autoinc magic 
checking in the H8 port :-)  I've actually got that spinning in the 
tester just to see if there's any obvious fallout.  I've already spent 
more time on this than I can reasonably justify.


Jeff
  
Takayuki 'January June' Suwa Oct. 15, 2022, 3:18 a.m. UTC | #26
On 2022/10/15 1:25, Segher Boessenkool wrote:
> On Fri, Oct 14, 2022 at 03:20:40PM +0900, Takayuki 'January June' Suwa wrote:
>> On 2022/10/14 8:56, Segher Boessenkool wrote:
>>> And finally, xtensa does
>>> /home/segher/src/gcc/libgcc/libgcc2.c:840:1: error: insn does not satisfy its constraints:
>>>   840 | }
>>>       | ^
>>> (insn 8 7 9 2 (set (reg:SI 9 a9 [57])
>>>         (const_int 1431655765 [0x55555555])) "/home/segher/src/gcc/libgcc/libgcc2.c":828:21 37 {movsi_internal}
>>>      (expr_list:REG_EQUIV (const_int 1431655765 [0x55555555])
>>>         (nil)))
>>> during RTL pass: postreload
>>> /home/segher/src/gcc/libgcc/libgcc2.c:840:1: internal compiler error: in extract_constrain_insn, at recog.cc:2692
>>
>> This is a result of knowing that Reload is tolerant of out-of-constraint constants.
>> LRA support needs to be taken care of before that, ie. in the "split1" pass.
>> Excuse me in haste.
> 
> So old reload did a split itself?  Or it left it to say the split2 pass?

Reload did.


[testpiece]

unsigned int test(void) {
  return 0x55555555;
}


[282r.ira]

(insn 9 5 10 2 (set (reg/i:SI 2 a2)
        (const_int 1431655765 [0x55555555])) "test.c":3:1 27 {movsi_internal}
     (expr_list:REG_EQUAL (const_int 1431655765 [0x55555555])
        (nil)))


[283r.reload]

;; Function test (test, funcdef_no=0, decl_uid=1386, cgraph_uid=1, symbol_order=0)

insn=9, live_throughout: 1, dead_or_set: 2
insn=10, live_throughout: 1, 2, dead_or_set: 
Spilling for insn 9.

Reloads for insn # 9
Reload 0: reload_in (SI) = (const_int 1431655765 [0x55555555])
	RL_REGS, RELOAD_FOR_INPUT (opnum = 1)
	reload_in_reg: (const_int 1431655765 [0x55555555])
	reload_reg_rtx: (reg/i:SI 2 a2)
deleting insn with uid = 9.

(insn 14 5 10 2 (set (reg/i:SI 2 a2)
        (mem/u/c:SI (symbol_ref/u:SI ("*.LC0") [flags 0x2]) [0  S4 A32])) "test.c":3:1 27 {movsi_internal}
     (nil))
  

Patch

diff --git a/gcc/auto-inc-dec.cc b/gcc/auto-inc-dec.cc
index 481e7af..0186c17 100644
--- a/gcc/auto-inc-dec.cc
+++ b/gcc/auto-inc-dec.cc
@@ -1443,7 +1443,7 @@  merge_in_block (int max_reg, basic_block bb)
 
       /* Reload should handle auto-inc within a jump correctly, while LRA
 	 is known to have issues with autoinc.  */
-      if (JUMP_P (insn) && targetm.lra_p ())
+      if (JUMP_P (insn))
 	continue;
 
       if (dump_file)
diff --git a/gcc/combine.cc b/gcc/combine.cc
index a5fabf3..64a96c6 100644
--- a/gcc/combine.cc
+++ b/gcc/combine.cc
@@ -2026,7 +2026,7 @@  can_combine_p (rtx_insn *insn, rtx_insn *i3, rtx_insn *pred ATTRIBUTE_UNUSED,
   if (AUTO_INC_DEC)
     for (link = REG_NOTES (insn); link; link = XEXP (link, 1))
       if (REG_NOTE_KIND (link) == REG_INC
-	  && ((JUMP_P (i3) && targetm.lra_p ())
+	  && (JUMP_P (i3)
 	      || reg_used_between_p (XEXP (link, 0), insn, i3)
 	      || (pred != NULL_RTX
 		  && reg_overlap_mentioned_p (XEXP (link, 0), PATTERN (pred)))
diff --git a/gcc/ira-lives.cc b/gcc/ira-lives.cc
index 7a92c44..87b2d52 100644
--- a/gcc/ira-lives.cc
+++ b/gcc/ira-lives.cc
@@ -1133,11 +1133,6 @@  find_call_crossed_cheap_reg (rtx_insn *insn)
 rtx
 non_conflicting_reg_copy_p (rtx_insn *insn)
 {
-  /* Reload has issues with overlapping pseudos being assigned to the
-     same hard register, so don't allow it.  See PR87600 for details.  */
-  if (!targetm.lra_p ())
-    return NULL_RTX;
-
   rtx set = single_set (insn);
 
   /* Disallow anything other than a simple register to register copy
diff --git a/gcc/ira.cc b/gcc/ira.cc
index 42c9cea..02a2f7b 100644
--- a/gcc/ira.cc
+++ b/gcc/ira.cc
@@ -1664,7 +1664,7 @@  ira_init_once (void)
   ira_init_costs_once ();
   lra_init_once ();
 
-  ira_use_lra_p = targetm.lra_p ();
+  ira_use_lra_p = 1;
 }
 
 /* Free ira_max_register_move_cost, ira_may_move_in_cost and
diff --git a/gcc/recog.cc b/gcc/recog.cc
index dac172b..6f41022 100644
--- a/gcc/recog.cc
+++ b/gcc/recog.cc
@@ -3242,7 +3242,7 @@  constrain_operands (int strict, alternative_mask alternatives)
 			       || (strict < 0 && CONSTANT_P (op))
 			       /* Before reload, accept a pseudo or hard register,
 				  since LRA can turn it into a mem.  */
-			       || (strict < 0 && targetm.lra_p () && REG_P (op))
+			       || (strict < 0 && REG_P (op))
 			       /* During reload, accept a pseudo  */
 			       || (reload_in_progress && REG_P (op)
 				   && REGNO (op) >= FIRST_PSEUDO_REGISTER)))
diff --git a/gcc/targhooks.cc b/gcc/targhooks.cc
index d17d393..c7f92c4 100644
--- a/gcc/targhooks.cc
+++ b/gcc/targhooks.cc
@@ -1380,10 +1380,6 @@  default_secondary_reload (bool in_p ATTRIBUTE_UNUSED, rtx x ATTRIBUTE_UNUSED,
 machine_mode
 default_secondary_memory_needed_mode (machine_mode mode)
 {
-  if (!targetm.lra_p ()
-      && known_lt (GET_MODE_BITSIZE (mode), BITS_PER_WORD)
-      && INTEGRAL_MODE_P (mode))
-    return mode_for_size (BITS_PER_WORD, GET_MODE_CLASS (mode), 0).require ();
   return mode;
 }