[v2] xtensa: Prepare the transition from Reload to LRA
Commit Message
On 2022/10/16 14:03, Max Filippov wrote:
> Hi Suwa-san,
Hi!
> This change results in a few new regressions in the following tests caused by ICE even when running without -mlra option:
>
> +FAIL: gcc.c-torture/execute/pr92904.c -O1 (internal compiler error: in extract_insn, at recog.cc:2791)
>
> The backtraces look like this in all of them:
>
> gcc/gcc/testsuite/gcc.c-torture/execute/pr92904.c:395:1: error:
> unrecognizable insn:
> (insn 10501 7 10502 2 (set (reg:SI 5913)
> (const_int 1431655765 [0x55555555]))
> "gcc/gcc/testsuite/gcc.c-torture/execute/pr92904.c":239:9 -1
> (nil))
> during RTL pass: subreg3
> gcc/gcc/testsuite/gcc.c-torture/execute/pr92904.c:395:1: internal compiler error: in extract_insn, at recog.cc:2791
"expand" pass generates the below from referencing to the struct:
;; MEM <long long int> [(union Y *)&u] = 6148914691236517205;
(set (reg:DI X) (mem:DI (symbol_ref:SI ("*.LC_u"))))
and then "fwprop1" folds the memory reference into the constant it loads:
(set (reg:DI X) (const_int 0x5555555555555555))
Finally, "subreg3" (not "split1") splits it into two SImode moves whose constants do not satisfy the constraint:
(set (reg:SI X0) (const_int 0x55555555))
(set (reg:SI X1) (const_int 0x55555555))
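To make the failure concrete, here is a small standalone C sketch (not GCC code) of the arithmetic involved. It assumes the constraint in question is the signed 12-bit immediate range that xtensa_simm12b tests for; split_di_const, fits_simm12b and struct si_pair are hypothetical names used only for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed to mirror xtensa_simm12b: a signed 12-bit immediate. */
bool fits_simm12b(int64_t v)
{
    return v >= -2048 && v <= 2047;
}

/* The two 32-bit words of a 64-bit constant. */
struct si_pair {
    int32_t low;
    int32_t high;
};

/* Hypothetical helper: split a DImode-sized constant into two
   SImode-sized halves, as the subreg3 pass does with the RTL above. */
struct si_pair split_di_const(uint64_t v)
{
    struct si_pair p;
    p.low = (int32_t)(uint32_t)(v & 0xffffffffu);
    p.high = (int32_t)(uint32_t)(v >> 32);
    return p;
}
```

For 0x5555555555555555 both halves are 0x55555555, which is far outside the 12-bit range, so neither resulting SImode move can use an immediate operand.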
> There's also the following runtime failures, but only on call0 configuration:
>
> +FAIL: gcc.c-torture/execute/20010122-1.c -O1 execution test
> +FAIL: gcc.c-torture/execute/20010122-1.c -O2 execution test
> +FAIL: gcc.c-torture/execute/20010122-1.c -O3 -g execution test
> +FAIL: gcc.c-torture/execute/20010122-1.c -Os execution test
> +FAIL: gcc.c-torture/execute/20010122-1.c -O2 -flto -fno-use-linker-plugin -flto-partition=none execution test
Both assembler outputs with and without this patch are identical on my side, but the patch may break runtime init and/or libraries because of my silly mistake:
-+ if (HARD_REGISTER_P (x)
++ if (! HARD_REGISTER_P (x)
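The effect of the inverted test can be sketched with a simplified standalone model of the plain-REG, non-LRA path of xt_true_regnum. This is an illustration only: true_regnum_model and FIRST_PSEUDO are hypothetical names, the value 36 is simply the number of entries in the register table in xtensa.h, and the renumber array stands in for reg_renumber.

```c
#include <stddef.h>

/* Assumption: 36 hard registers, counted from the table in xtensa.h. */
enum { FIRST_PSEUDO = 36 };

/* Model of the corrected check: only a pseudo register (one that is
   NOT a hard register) may be looked up in the renumbering table.
   With the test inverted, hard registers would be looked up instead
   and pseudos would never be renumbered. */
int true_regnum_model(unsigned regno, const short *renumber)
{
    if (regno >= FIRST_PSEUDO      /* i.e. ! HARD_REGISTER_P */
        && renumber != NULL
        && renumber[regno] >= 0)
        return renumber[regno];
    return (int)regno;
}
```

A pseudo that has been assigned a hard register maps to that register; a hard register always maps to itself, whatever the table contains at its index.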
===
This patch provides the first step in the transition from Reload to LRA
in Xtensa.
gcc/ChangeLog:
* config/xtensa/xtensa-protos.h
(xtensa_split1_finished_p, xtensa_split_DI_reg_imm): New prototypes.
* config/xtensa/xtensa.cc
(xtensa_split1_finished_p, xtensa_split_DI_reg_imm, xtensa_lra_p):
New functions.
(TARGET_LRA_P): Replace the dummy hook with xtensa_lra_p.
(xt_true_regnum): Rework.
* config/xtensa/xtensa.h (CALL_REALLY_USED_REGISTERS):
Rename from CALL_USED_REGISTERS, and remove the entries that
correspond to FIXED_REGISTERS.
* config/xtensa/constraints.md (Y):
Use !xtensa_split1_finished_p() instead of can_create_pseudo_p().
* config/xtensa/predicates.md (move_operand): Ditto.
* config/xtensa/xtensa.md: Add two new split patterns:
- splits DImode immediate load into two SImode ones
- puts out-of-constraint SImode constants into the constant pool
* config/xtensa/xtensa.opt (-mlra): New target-specific option
for testing purposes.
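The gating that replaces can_create_pseudo_p() can be pictured with the following standalone C model. It is a sketch under stated assumptions, not the GCC implementation: struct function_model, MODEL_PROP_SPLIT_INSNS and const_int_move_ok are hypothetical names, and the bit value chosen here is arbitrary (the real PROP_rtl_split_insns is defined in tree-pass.h).

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the PROP_rtl_split_insns property bit. */
#define MODEL_PROP_SPLIT_INSNS (1u << 0)

/* Hypothetical stand-in for the part of cfun that matters here. */
struct function_model {
    unsigned curr_properties;
};

/* Models xtensa_split1_finished_p: true once the property bit is set. */
bool split1_finished(const struct function_model *fn)
{
    return fn != NULL && (fn->curr_properties & MODEL_PROP_SPLIT_INSNS) != 0;
}

/* Models the const_int arm of move_operand for integer modes: any
   constant is accepted until split1 has run; afterwards only signed
   12-bit immediates are. */
bool const_int_move_ok(long v, const struct function_model *fn)
{
    bool simm12b = v >= -2048 && v <= 2047;
    return simm12b || !split1_finished(fn);
}
```

Under this model, a constant such as 0x55555555 is acceptable while split1 is still pending, which gives the new split patterns the chance to move it into the constant pool, and is rejected afterwards.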
---
gcc/config/xtensa/constraints.md | 2 +-
gcc/config/xtensa/predicates.md | 2 +-
gcc/config/xtensa/xtensa-protos.h | 2 +
gcc/config/xtensa/xtensa.cc | 69 ++++++++++++++++++++++++++-----
gcc/config/xtensa/xtensa.h | 6 +--
gcc/config/xtensa/xtensa.md | 36 ++++++++++++----
gcc/config/xtensa/xtensa.opt | 4 ++
7 files changed, 98 insertions(+), 23 deletions(-)
Comments
On Mon, Oct 17, 2022 at 7:57 PM Takayuki 'January June' Suwa
<jjsuwa_sys3175@yahoo.co.jp> wrote:
> On 2022/10/16 14:03, Max Filippov wrote:
> > There's also the following runtime failures, but only on call0 configuration:
> >
> > +FAIL: gcc.c-torture/execute/20010122-1.c -O1 execution test
> > +FAIL: gcc.c-torture/execute/20010122-1.c -O2 execution test
> > +FAIL: gcc.c-torture/execute/20010122-1.c -O3 -g execution test
> > +FAIL: gcc.c-torture/execute/20010122-1.c -Os execution test
> > +FAIL: gcc.c-torture/execute/20010122-1.c -O2 -flto -fno-use-linker-plugin -flto-partition=none execution test
>
> both assembler outputs with and without this patch are identical on my side
Interesting. In -O1 test I see the following difference that is going to affect
the return value of the corresponding functions:
--- gcc-13-3308-gb4a4c6382b14-call0-le/20010122-1.s 2022-10-17 20:07:32.390363204 -0700
+++ gcc-13-3309-g851636ecd015-call0-le/20010122-1.s 2022-10-17 20:06:36.613785546 -0700
@@ -143,13 +143,10 @@
test2:
addi sp, sp, -16
s32i.n a0, sp, 12
- s32i.n a12, sp, 8
- mov.n a12, a0
l32r a2, .LC6
callx0 a2
- mov.n a2, a12
+ mov.n a2, a0
l32i.n a0, sp, 12
- l32i.n a12, sp, 8
addi sp, sp, 16
ret.n
.size test2, .-test2
@@ -161,13 +158,10 @@
test3:
addi sp, sp, -16
s32i.n a0, sp, 12
- s32i.n a12, sp, 8
- mov.n a12, a0
l32r a2, .LC7
callx0 a2
- mov.n a2, a12
+ mov.n a2, a0
l32i.n a0, sp, 12
- l32i.n a12, sp, 8
addi sp, sp, 16
ret.n
.size test3, .-test3
@@ -258,14 +252,11 @@
test8:
addi sp, sp, -16
s32i.n a0, sp, 12
- s32i.n a12, sp, 8
- mov.n a12, a0
l32r a2, .LC12
callx0 a2
l32r a2, .LC13
- s32i.n a12, a2, 0
+ s32i.n a0, a2, 0
l32i.n a0, sp, 12
- l32i.n a12, sp, 8
addi sp, sp, 16
ret.n
.size test8, .-test8
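The diff suggests that these functions hand back their own return address after making a call: in the call0 ABI a0 holds the return address, and callx0 overwrites it, so a0 must be copied (here into a12) before the call. A host-independent sketch of that kind of check follows. It is modelled loosely on the failing test rather than copied from it, and every name in it is hypothetical.

```c
#include <stddef.h>

typedef void *(*ra_fn)(void);

static volatile int side_effect;        /* keeps dummy() from being elided */
static ra_fn volatile current_target;   /* read at run time: call stays indirect */
static void *volatile last_result;

__attribute__((noinline)) static void dummy(void)
{
    side_effect++;
}

/* Returns its return address without making any call. */
__attribute__((noinline)) static void *ra_plain(void)
{
    return __builtin_return_address(0);
}

/* Returns its return address after a call; on call0 that call
   overwrites a0, so the compiler must have preserved the old value. */
__attribute__((noinline)) static void *ra_after_call(void)
{
    dummy();
    return __builtin_return_address(0);
}

/* Single call site shared by both targets, so both see the same
   return address when the compiler is correct. */
__attribute__((noinline)) void *call_current(void)
{
    void *r = current_target();
    last_result = r;    /* work after the call prevents a tail call */
    return r;
}

int return_addresses_match(void)
{
    current_target = ra_plain;
    void *a = call_current();
    current_target = ra_after_call;
    void *b = call_current();
    return a == b;
}
```

With correct code generation return_addresses_match() returns 1; code like the "+" lines above would make ra_after_call report the address following its own inner call instead.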
Hi Suwa-san,
v2 fixes the regressions caused by ICEs, but not the runtime failures.
On Mon, Oct 17, 2022 at 8:14 PM Max Filippov <jcmvbkbc@gmail.com> wrote:
> On Mon, Oct 17, 2022 at 7:57 PM Takayuki 'January June' Suwa
> <jjsuwa_sys3175@yahoo.co.jp> wrote:
> > On 2022/10/16 14:03, Max Filippov wrote:
> > > There's also the following runtime failures, but only on call0 configuration:
> > >
> > > +FAIL: gcc.c-torture/execute/20010122-1.c -O1 execution test
> > > +FAIL: gcc.c-torture/execute/20010122-1.c -O2 execution test
> > > +FAIL: gcc.c-torture/execute/20010122-1.c -O3 -g execution test
> > > +FAIL: gcc.c-torture/execute/20010122-1.c -Os execution test
> > > +FAIL: gcc.c-torture/execute/20010122-1.c -O2 -flto -fno-use-linker-plugin -flto-partition=none execution test
> >
> > both assembler outputs with and without this patch are identical on my side
>
> Interesting. In -O1 test I see the following difference that is going to affect
> the return value of the corresponding functions:
>
> --- gcc-13-3308-gb4a4c6382b14-call0-le/20010122-1.s 2022-10-17 20:07:32.390363204 -0700
> +++ gcc-13-3309-g851636ecd015-call0-le/20010122-1.s 2022-10-17 20:06:36.613785546 -0700
> @@ -143,13 +143,10 @@
> test2:
> addi sp, sp, -16
> s32i.n a0, sp, 12
> - s32i.n a12, sp, 8
> - mov.n a12, a0
> l32r a2, .LC6
> callx0 a2
> - mov.n a2, a12
> + mov.n a2, a0
> l32i.n a0, sp, 12
> - l32i.n a12, sp, 8
> addi sp, sp, 16
> ret.n
> .size test2, .-test2
> @@ -161,13 +158,10 @@
> test3:
> addi sp, sp, -16
> s32i.n a0, sp, 12
> - s32i.n a12, sp, 8
> - mov.n a12, a0
> l32r a2, .LC7
> callx0 a2
> - mov.n a2, a12
> + mov.n a2, a0
> l32i.n a0, sp, 12
> - l32i.n a12, sp, 8
> addi sp, sp, 16
> ret.n
> .size test3, .-test3
> @@ -258,14 +252,11 @@
> test8:
> addi sp, sp, -16
> s32i.n a0, sp, 12
> - s32i.n a12, sp, 8
> - mov.n a12, a0
> l32r a2, .LC12
> callx0 a2
> l32r a2, .LC13
> - s32i.n a12, a2, 0
> + s32i.n a0, a2, 0
> l32i.n a0, sp, 12
> - l32i.n a12, sp, 8
> addi sp, sp, 16
> ret.n
> .size test8, .-test8
I've noticed that this is related to the following hunk:
-#define CALL_USED_REGISTERS \
+#define CALL_REALLY_USED_REGISTERS \
{ \
- 1, 1, 4, 4, 4, 4, 4, 4, 1, 1, 1, 1, 2, 2, 2, 2, \
- 1, 1, 1, \
+ 0, 0, 4, 4, 4, 4, 4, 4, 1, 1, 1, 1, 2, 2, 2, 2, \
+ 0, 0, 1, \
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, \
1, \
}
And the following change on top of v2 fixes this regression for me:
diff --git a/gcc/config/xtensa/xtensa.h b/gcc/config/xtensa/xtensa.h
index 6b60e5960625..897f87f735da 100644
--- a/gcc/config/xtensa/xtensa.h
+++ b/gcc/config/xtensa/xtensa.h
@@ -244,7 +244,7 @@ along with GCC; see the file COPYING3. If not see
#define CALL_REALLY_USED_REGISTERS \
{ \
- 0, 0, 4, 4, 4, 4, 4, 4, 1, 1, 1, 1, 2, 2, 2, 2, \
+ 1, 0, 4, 4, 4, 4, 4, 4, 1, 1, 1, 1, 2, 2, 2, 2, \
0, 0, 1, \
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, \
1, \
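To illustrate why the first entry matters, here is a standalone C model of how such a table might be interpreted. It rests on two assumptions that are not confirmed by this thread: that entries greater than 1 are ABI-conditional masks resolved in TARGET_CONDITIONAL_REGISTER_USAGE, and that the call0 ABI selects the value 4 while the windowed ABI selects 2. All names below are hypothetical.

```c
#include <stdbool.h>

enum { NUM_REGS = 36 };
/* Assumed mask values for the two ABIs. */
enum { MASK_WINDOWED = 1 << 1, MASK_CALL0 = 1 << 2 };

/* Table as posted in v2: a0 (index 0) is marked as surviving calls. */
const unsigned char v2_table[NUM_REGS] = {
    0, 0, 4, 4, 4, 4, 4, 4, 1, 1, 1, 1, 2, 2, 2, 2,
    0, 0, 1,
    1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
    1,
};

/* Table with the one-entry fix applied: a0 is clobbered by calls. */
const unsigned char fixed_table[NUM_REGS] = {
    1, 0, 4, 4, 4, 4, 4, 4, 1, 1, 1, 1, 2, 2, 2, 2,
    0, 0, 1,
    1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
    1,
};

/* Resolve one entry for a given ABI mask (assumed semantics). */
bool call_clobbered(const unsigned char *table, int regno, unsigned abi_mask)
{
    unsigned v = table[regno];
    if (v > 1)
        return (v & abi_mask) != 0;
    return v != 0;
}
```

Under these assumptions the v2 table tells the register allocator that a0 survives a call, which matches the generated code reading a0 after callx0, while a12 stays callee-saved on call0 in both tables.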
--- a/gcc/config/xtensa/constraints.md
+++ b/gcc/config/xtensa/constraints.md
@@ -121,7 +121,7 @@
(ior (and (match_code "const_int,const_double,const,symbol_ref,label_ref")
(match_test "TARGET_AUTO_LITPOOLS"))
(and (match_code "const_int")
- (match_test "can_create_pseudo_p ()"))))
+ (match_test "! xtensa_split1_finished_p ()"))))
;; Memory constraints. Do not use define_memory_constraint here. Doing so
;; causes reload to force some constants into the constant pool, but since
--- a/gcc/config/xtensa/predicates.md
+++ b/gcc/config/xtensa/predicates.md
@@ -149,7 +149,7 @@
(ior (and (match_code "const_int")
(match_test "(GET_MODE_CLASS (mode) == MODE_INT
&& xtensa_simm12b (INTVAL (op)))
- || can_create_pseudo_p ()"))
+ || ! xtensa_split1_finished_p ()"))
(and (match_code "const_int,const_double,const,symbol_ref,label_ref")
(match_test "(TARGET_CONST16 || TARGET_AUTO_LITPOOLS)
&& CONSTANT_P (op)
--- a/gcc/config/xtensa/xtensa-protos.h
+++ b/gcc/config/xtensa/xtensa-protos.h
@@ -58,6 +58,8 @@ extern char *xtensa_emit_call (int, rtx *);
extern char *xtensa_emit_sibcall (int, rtx *);
extern bool xtensa_tls_referenced_p (rtx);
extern enum rtx_code xtensa_shlrd_which_direction (rtx, rtx);
+extern bool xtensa_split1_finished_p (void);
+extern void xtensa_split_DI_reg_imm (rtx *);
#ifdef TREE_CODE
extern void init_cumulative_args (CUMULATIVE_ARGS *, int);
--- a/gcc/config/xtensa/xtensa.cc
+++ b/gcc/config/xtensa/xtensa.cc
@@ -56,6 +56,7 @@ along with GCC; see the file COPYING3. If not see
#include "hw-doloop.h"
#include "rtl-iter.h"
#include "insn-attr.h"
+#include "tree-pass.h"
/* This file should be included last. */
#include "target-def.h"
@@ -199,6 +200,7 @@ static void xtensa_output_mi_thunk (FILE *file, tree thunk ATTRIBUTE_UNUSED,
HOST_WIDE_INT delta,
HOST_WIDE_INT vcall_offset,
tree function);
+static bool xtensa_lra_p (void);
static rtx xtensa_delegitimize_address (rtx);
@@ -295,7 +297,7 @@ static rtx xtensa_delegitimize_address (rtx);
#define TARGET_CANNOT_FORCE_CONST_MEM xtensa_cannot_force_const_mem
#undef TARGET_LRA_P
-#define TARGET_LRA_P hook_bool_void_false
+#define TARGET_LRA_P xtensa_lra_p
#undef TARGET_LEGITIMATE_ADDRESS_P
#define TARGET_LEGITIMATE_ADDRESS_P xtensa_legitimate_address_p
@@ -492,21 +494,30 @@ xtensa_mask_immediate (HOST_WIDE_INT v)
int
xt_true_regnum (rtx x)
{
- if (GET_CODE (x) == REG)
+ if (REG_P (x))
{
- if (reg_renumber
- && REGNO (x) >= FIRST_PSEUDO_REGISTER
- && reg_renumber[REGNO (x)] >= 0)
+ if (! HARD_REGISTER_P (x)
+ && reg_renumber
+ && (lra_in_progress || reg_renumber[REGNO (x)] >= 0))
return reg_renumber[REGNO (x)];
return REGNO (x);
}
- if (GET_CODE (x) == SUBREG)
+ if (SUBREG_P (x))
{
int base = xt_true_regnum (SUBREG_REG (x));
- if (base >= 0 && base < FIRST_PSEUDO_REGISTER)
- return base + subreg_regno_offset (REGNO (SUBREG_REG (x)),
- GET_MODE (SUBREG_REG (x)),
- SUBREG_BYTE (x), GET_MODE (x));
+
+ if (base >= 0
+ && HARD_REGISTER_NUM_P (base))
+ {
+ struct subreg_info info;
+
+ subreg_get_info (lra_in_progress
+ ? (unsigned) base : REGNO (SUBREG_REG (x)),
+ GET_MODE (SUBREG_REG (x)),
+ SUBREG_BYTE (x), GET_MODE (x), &info);
+ if (info.representable_p)
+ return base + info.offset;
+ }
}
return -1;
}
@@ -2477,6 +2488,36 @@ xtensa_shlrd_which_direction (rtx op0, rtx op1)
}
+/* Return true once the "split1" pass has finished. */
+
+bool
+xtensa_split1_finished_p (void)
+{
+ return cfun && (cfun->curr_properties & PROP_rtl_split_insns);
+}
+
+
+/* Split a DImode pair of reg (operands[0]) and const_int (operands[1]) into
+ two SImode pairs, the low-part (operands[0] and [1]) and the high-part
+ (operands[2] and [3]). */
+
+void
+xtensa_split_DI_reg_imm (rtx *operands)
+{
+ rtx lowpart, highpart;
+
+ if (WORDS_BIG_ENDIAN)
+ split_double (operands[1], &highpart, &lowpart);
+ else
+ split_double (operands[1], &lowpart, &highpart);
+
+ operands[3] = highpart;
+ operands[2] = gen_highpart (SImode, operands[0]);
+ operands[1] = lowpart;
+ operands[0] = gen_lowpart (SImode, operands[0]);
+}
+
+
/* Implement TARGET_CANNOT_FORCE_CONST_MEM. */
static bool
@@ -5119,4 +5160,12 @@ xtensa_delegitimize_address (rtx op)
return op;
}
+/* Implement TARGET_LRA_P. */
+
+static bool
+xtensa_lra_p (void)
+{
+ return TARGET_LRA;
+}
+
#include "gt-xtensa.h"
--- a/gcc/config/xtensa/xtensa.h
+++ b/gcc/config/xtensa/xtensa.h
@@ -242,10 +242,10 @@ along with GCC; see the file COPYING3. If not see
Proper values are computed in TARGET_CONDITIONAL_REGISTER_USAGE. */
-#define CALL_USED_REGISTERS \
+#define CALL_REALLY_USED_REGISTERS \
{ \
- 1, 1, 4, 4, 4, 4, 4, 4, 1, 1, 1, 1, 2, 2, 2, 2, \
- 1, 1, 1, \
+ 0, 0, 4, 4, 4, 4, 4, 4, 1, 1, 1, 1, 2, 2, 2, 2, \
+ 0, 0, 1, \
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, \
1, \
}
--- a/gcc/config/xtensa/xtensa.md
+++ b/gcc/config/xtensa/xtensa.md
@@ -940,14 +940,9 @@
because of offering further optimization opportunities. */
if (register_operand (operands[0], DImode))
{
- rtx lowpart, highpart;
-
- if (TARGET_BIG_ENDIAN)
- split_double (operands[1], &highpart, &lowpart);
- else
- split_double (operands[1], &lowpart, &highpart);
- emit_insn (gen_movsi (gen_lowpart (SImode, operands[0]), lowpart));
- emit_insn (gen_movsi (gen_highpart (SImode, operands[0]), highpart));
+ xtensa_split_DI_reg_imm (operands);
+ emit_move_insn (operands[0], operands[1]);
+ emit_move_insn (operands[2], operands[3]);
DONE;
}
@@ -981,6 +976,19 @@
}
})
+(define_split
+ [(set (match_operand:DI 0 "register_operand")
+ (match_operand:DI 1 "const_int_operand"))]
+ "!TARGET_CONST16 && !TARGET_AUTO_LITPOOLS
+ && ! xtensa_split1_finished_p ()"
+ [(set (match_dup 0)
+ (match_dup 1))
+ (set (match_dup 2)
+ (match_dup 3))]
+{
+ xtensa_split_DI_reg_imm (operands);
+})
+
;; 32-bit Integer moves
(define_expand "movsi"
@@ -1017,6 +1025,18 @@
(set_attr "mode" "SI")
(set_attr "length" "2,2,2,2,2,2,3,3,3,3,6,3,3,3,3,3")])
+(define_split
+ [(set (match_operand:SI 0 "register_operand")
+ (match_operand:SI 1 "const_int_operand"))]
+ "!TARGET_CONST16 && !TARGET_AUTO_LITPOOLS
+ && ! xtensa_split1_finished_p ()
+ && ! xtensa_simm12b (INTVAL (operands[1]))"
+ [(set (match_dup 0)
+ (match_dup 1))]
+{
+ operands[1] = force_const_mem (SImode, operands[1]);
+})
+
(define_split
[(set (match_operand:SI 0 "register_operand")
(match_operand:SI 1 "constantpool_operand"))]
--- a/gcc/config/xtensa/xtensa.opt
+++ b/gcc/config/xtensa/xtensa.opt
@@ -34,6 +34,10 @@ mextra-l32r-costs=
Target RejectNegative Joined UInteger Var(xtensa_extra_l32r_costs) Init(0)
Set extra memory access cost for L32R instruction, in clock-cycle units.
+mlra
+Target Mask(LRA)
+Use LRA instead of reload (transitional).
+
mtarget-align
Target
Automatically align branch targets to reduce branch penalties.