Hi,
on 2023/11/22 17:30, Kewen.Lin wrote:
> on 2023/11/17 20:55, Alexander Monakov wrote:
>>
>> On Fri, 17 Nov 2023, Kewen.Lin wrote:
>>>> I don't think you can run cleanup_cfg after sched_init. I would suggest
>>>> to put it early in schedule_insns.
>>>
>>> Thanks for the suggestion, I placed it at the beginning of haifa_sched_init
>>> instead, since schedule_insns invokes haifa_sched_init. Although the
>>> calls rgn_setup_common_sched_info and rgn_setup_sched_infos are executed
>>> ahead of it, they are all "setup" functions and shouldn't affect or be
>>> affected by this placement.
>>
>> I was worried because sched_init invokes df_analyze, and I'm not sure if
>> cfg_cleanup can invalidate it.
>
> Thanks for further explaining! By scanning cleanup_cfg, it seems that it
> does consider df: compact_blocks checks df, try_optimize_cfg invokes
> df_analyze, etc. But I agree that moving cleanup_cfg before sched_init
> makes more sense.
>
>>
>>>> I suspect this may be caused by invoking cleanup_cfg too late.
>>>
>>> By looking into some failures, I found that although cleanup_cfg is executed,
>>> there would still be some empty blocks left; analyzing a few failures shows
>>> at least these cases:
>>> 1. empty function body
>>> 2. block holding a label for return.
>>> 3. block without any successor.
>>> 4. block which becomes empty after scheduling some other block.
>>> 5. block which looks mergeable with its single successor but is left as is.
>>> ...
>>>
>>> For 1 and 2, there is a single successor EXIT block, so I think they don't affect
>>> state transition; for 3, it's the same. For 4, it depends on whether we can
>>> assume this kind of empty block has no chance of holding a debug insn
>>> (any associated debug insn should be moved along with it), I'm not sure. For 5,
>>> a reduced test case is:
>>
>> Oh, I should have thought of cases like these, really sorry about the slip
>> of attention, and thanks for showing a testcase for item 5. As Richard was
>> saying in his response, cfg_cleanup cannot be a fix here. The thing to check
>> would be changing no_real_insns_p to always return false, and see if the
>> situation looks recoverable (if it breaks bootstrap, regtest statistics of
>> a non-bootstrapped compiler are still informative).
>
> As you suggested, I forced no_real_insns_p to return false all the time, and some
> issues got exposed; almost all of them are assertions that a NOTE_P insn shouldn't
> be encountered in those places, so the adjustments for most of them are just to
> take NOTE_P or this kind of special block into account. A draft patch is
> attached; it can be bootstrapped and regress-tested on ppc64{,le} and x86.
> Btw, it's without the previous cleanup_cfg adjustment (in the hope that it leaves
> more empty blocks and exposes more issues). The draft isn't qualified for code
> review, but I hope it gives some information on what kinds of changes
> are needed for the proposal. If this is the direction we all agree on,
> I'll refine it further and post a formal patch. One thing I want to note is
> that this patch disables the assertion below:
>
> diff --git a/gcc/sched-rgn.cc b/gcc/sched-rgn.cc
> index e5964f54ead..abd334864fb 100644
> --- a/gcc/sched-rgn.cc
> +++ b/gcc/sched-rgn.cc
> @@ -3219,7 +3219,7 @@ schedule_region (int rgn)
> }
>
> /* Sanity check: verify that all region insns were scheduled. */
> - gcc_assert (sched_rgn_n_insns == rgn_n_insns);
> + // gcc_assert (sched_rgn_n_insns == rgn_n_insns);
>
> sched_finish_ready_list ();
>
> Some cases can cause this assertion to fail due to a mismatch between the
> to-be-scheduled and actually scheduled insn counts. It happens when a block
> initially has only one INSN_P insn, but while scheduling some other blocks that
> insn gets moved away as well, so we end up with an empty block whose only
> NOTE_P insn then gets counted as scheduled; since the block isn't empty initially
> and NOTE_P gets skipped in a normal block, the to-be-scheduled count doesn't
> include it. It can be fixed by special-casing this kind of block for counting,
> e.g. initially recording which blocks are empty and, for a block not recorded
> as such, fixing up the count accordingly. Someone may argue that all this
> complication makes the proposal worse than the previous approach of
> special-casing debug insns; looking forward to more comments.
>
The attached one is the improved draft patch v2 for not skipping empty blocks in
scheduling. Compared with the previous draft, it:
1) uses NONDEBUG_INSN_P in place of !DEBUG_INSN_P && !NOTE_P where appropriate;
2) merges the NOTE_P special handling into the existing DEBUG_INSN_P handling;
3) fixes an issue exposed by broad testing on EBB scheduling;
4) introduces rgn_init_empty_bb for the scheduled insn count mismatch issue
   (see the sketch after this list);
5) adds/refines some comments.
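For 4), the bookkeeping boils down to recording which blocks start out empty and
bumping the expected insn count for a block that only became empty during
scheduling; in outline (the actual hunks are in the sched-rgn.cc part of the
patch below):

  /* At region setup time, remember which blocks start out empty
     (head == tail and it is a NOTE).  */
  rgn_init_empty_bb = BITMAP_ALLOC (NULL);
  for (bb = 0; bb < current_nr_blocks; bb++)
    {
      rtx_insn *head, *tail;
      get_ebb_head_tail (EBB_FIRST_BB (bb), EBB_LAST_BB (bb), &head, &tail);
      if (head == tail && NOTE_P (head))
	bitmap_set_bit (rgn_init_empty_bb, bb);
    }

  /* After scheduling a block in schedule_region: if it is empty now but was
     not recorded as initially empty, its only NOTE was scheduled without
     having been counted in rgn_n_insns, so fix up the count.  */
  if (head == tail && NOTE_P (head) && !bitmap_bit_p (rgn_init_empty_bb, bb))
    rgn_n_insns++;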
It's bootstrapped and regress-tested on x86_64-redhat-linux and
powerpc64{,le}-linux-gnu. I also tested with EBB scheduling turned on by default:
one issue in schedule_ebb got exposed and fixed; with that it bootstrapped on
x86_64-redhat-linux and powerpc64{,le}-linux-gnu and was regress-tested on
powerpc64{,le}-linux-gnu, with one guality failure on x86_64-redhat-linux. With
selective scheduling turned on by default, it bootstrapped on x86_64-redhat-linux
with a few guality failures and some failures with extra warnings (like with
-fvar-tracking-assignments); those failures are trivial. Note that I'm unable to
test forced selective scheduling on Power since it's broken even without this
patch; there seem to be a few issues, and I'll file some PRs separately.
Any comments are highly appreciated. I'll go ahead and make a formal patch if
there is no objection this week. I think we can still keep no_real_insns_p,
since in some places like free_block_dependencies it's still good for an early
return; we just need to remove its uses in the other places.
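For reference, such a caller keeps working with just the cheap early return;
roughly the current shape of free_block_dependencies in sched-rgn.cc (shown only
for illustration, not part of this patch):

static void
free_block_dependencies (int bb)
{
  rtx_insn *head;
  rtx_insn *tail;

  get_ebb_head_tail (EBB_FIRST_BB (bb), EBB_LAST_BB (bb), &head, &tail);

  /* Nothing to free when the block contains no real insns.  */
  if (no_real_insns_p (head, tail))
    return;

  sched_free_deps (head, tail, true);
}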
BR,
Kewen
----
Subject: [PATCH draft v2] sched: Don't skip empty block in scheduling
---
gcc/haifa-sched.cc | 172 ++++++++++++++++++++++++++-------------------
gcc/rtl.h | 4 +-
gcc/sched-ebb.cc | 7 +-
gcc/sched-rgn.cc | 23 +++++-
4 files changed, 126 insertions(+), 80 deletions(-)
--- a/gcc/haifa-sched.cc
+++ b/gcc/haifa-sched.cc
@@ -1207,6 +1207,11 @@ recompute_todo_spec (rtx_insn *next, bool for_backtrack)
int n_replace = 0;
bool first_p = true;
+ /* Since we don't skip no_real_insns_p blocks any more, it's
+ possible to meet a NOTE insn now; return early if so. */
+ if (NOTE_P (next))
+ return 0;
+
if (sd_lists_empty_p (next, SD_LIST_BACK))
/* NEXT has all its dependencies resolved. */
return 0;
@@ -1726,6 +1731,11 @@ setup_insn_reg_pressure_info (rtx_insn *insn)
int *max_reg_pressure;
static int death[N_REG_CLASSES];
+ /* Since we don't skip no_real_insns_p blocks any more, it's possible to
+ schedule a NOTE insn now; check for it first. */
+ if (NOTE_P (insn))
+ return;
+
gcc_checking_assert (!DEBUG_INSN_P (insn));
excess_cost_change = 0;
@@ -4017,10 +4027,10 @@ schedule_insn (rtx_insn *insn)
/* Scheduling instruction should have all its dependencies resolved and
should have been removed from the ready list. */
- gcc_assert (sd_lists_empty_p (insn, SD_LIST_HARD_BACK));
+ gcc_assert (NOTE_P (insn) || sd_lists_empty_p (insn, SD_LIST_HARD_BACK));
/* Reset debug insns invalidated by moving this insn. */
- if (MAY_HAVE_DEBUG_BIND_INSNS && !DEBUG_INSN_P (insn))
+ if (MAY_HAVE_DEBUG_BIND_INSNS && NONDEBUG_INSN_P (insn))
for (sd_it = sd_iterator_start (insn, SD_LIST_BACK);
sd_iterator_cond (&sd_it, &dep);)
{
@@ -4106,61 +4116,66 @@ schedule_insn (rtx_insn *insn)
check_clobbered_conditions (insn);
- /* Update dependent instructions. First, see if by scheduling this insn
- now we broke a dependence in a way that requires us to change another
- insn. */
- for (sd_it = sd_iterator_start (insn, SD_LIST_SPEC_BACK);
- sd_iterator_cond (&sd_it, &dep); sd_iterator_next (&sd_it))
+ /* Since we don't skip no_real_insns_p blocks any more, it's possible to
+ schedule a NOTE insn now; check for it first. */
+ if (!NOTE_P (insn))
{
- struct dep_replacement *desc = DEP_REPLACE (dep);
- rtx_insn *pro = DEP_PRO (dep);
- if (QUEUE_INDEX (pro) != QUEUE_SCHEDULED
- && desc != NULL && desc->insn == pro)
- apply_replacement (dep, false);
- }
+ /* Update dependent instructions. First, see if by scheduling this insn
+ now we broke a dependence in a way that requires us to change another
+ insn. */
+ for (sd_it = sd_iterator_start (insn, SD_LIST_SPEC_BACK);
+ sd_iterator_cond (&sd_it, &dep); sd_iterator_next (&sd_it))
+ {
+ struct dep_replacement *desc = DEP_REPLACE (dep);
+ rtx_insn *pro = DEP_PRO (dep);
+ if (QUEUE_INDEX (pro) != QUEUE_SCHEDULED && desc != NULL
+ && desc->insn == pro)
+ apply_replacement (dep, false);
+ }
- /* Go through and resolve forward dependencies. */
- for (sd_it = sd_iterator_start (insn, SD_LIST_FORW);
- sd_iterator_cond (&sd_it, &dep);)
- {
- rtx_insn *next = DEP_CON (dep);
- bool cancelled = (DEP_STATUS (dep) & DEP_CANCELLED) != 0;
+ /* Go through and resolve forward dependencies. */
+ for (sd_it = sd_iterator_start (insn, SD_LIST_FORW);
+ sd_iterator_cond (&sd_it, &dep);)
+ {
+ rtx_insn *next = DEP_CON (dep);
+ bool cancelled = (DEP_STATUS (dep) & DEP_CANCELLED) != 0;
- /* Resolve the dependence between INSN and NEXT.
- sd_resolve_dep () moves current dep to another list thus
- advancing the iterator. */
- sd_resolve_dep (sd_it);
+ /* Resolve the dependence between INSN and NEXT.
+ sd_resolve_dep () moves current dep to another list thus
+ advancing the iterator. */
+ sd_resolve_dep (sd_it);
- if (cancelled)
- {
- if (must_restore_pattern_p (next, dep))
- restore_pattern (dep, false);
- continue;
- }
+ if (cancelled)
+ {
+ if (must_restore_pattern_p (next, dep))
+ restore_pattern (dep, false);
+ continue;
+ }
- /* Don't bother trying to mark next as ready if insn is a debug
- insn. If insn is the last hard dependency, it will have
- already been discounted. */
- if (DEBUG_INSN_P (insn) && !DEBUG_INSN_P (next))
- continue;
+ /* Don't bother trying to mark next as ready if insn is a debug
+ insn. If insn is the last hard dependency, it will have
+ already been discounted. */
+ if (DEBUG_INSN_P (insn) && !DEBUG_INSN_P (next))
+ continue;
- if (!IS_SPECULATION_BRANCHY_CHECK_P (insn))
- {
- int effective_cost;
+ if (!IS_SPECULATION_BRANCHY_CHECK_P (insn))
+ {
+ int effective_cost;
- effective_cost = try_ready (next);
+ effective_cost = try_ready (next);
- if (effective_cost >= 0
- && SCHED_GROUP_P (next)
- && advance < effective_cost)
- advance = effective_cost;
- }
- else
- /* Check always has only one forward dependence (to the first insn in
- the recovery block), therefore, this will be executed only once. */
- {
- gcc_assert (sd_lists_empty_p (insn, SD_LIST_FORW));
- fix_recovery_deps (RECOVERY_BLOCK (insn));
+ if (effective_cost >= 0 && SCHED_GROUP_P (next)
+ && advance < effective_cost)
+ advance = effective_cost;
+ }
+ else
+ /* Check always has only one forward dependence (to the first insn
+ in the recovery block), therefore, this will be executed only
+ once. */
+ {
+ gcc_assert (sd_lists_empty_p (insn, SD_LIST_FORW));
+ fix_recovery_deps (RECOVERY_BLOCK (insn));
+ }
}
}
@@ -4170,9 +4185,9 @@ schedule_insn (rtx_insn *insn)
may use this information to decide how the instruction should
be aligned. */
if (issue_rate > 1
+ && NONDEBUG_INSN_P (insn)
&& GET_CODE (PATTERN (insn)) != USE
- && GET_CODE (PATTERN (insn)) != CLOBBER
- && !DEBUG_INSN_P (insn))
+ && GET_CODE (PATTERN (insn)) != CLOBBER)
{
if (reload_completed)
PUT_MODE (insn, clock_var > last_clock_var ? TImode : VOIDmode);
@@ -5036,8 +5051,11 @@ get_ebb_head_tail (basic_block beg, basic_block end,
/* Return true if there are no real insns in the range [ HEAD, TAIL ]. */
bool
-no_real_insns_p (const rtx_insn *head, const rtx_insn *tail)
+no_real_insns_p (const rtx_insn *head ATTRIBUTE_UNUSED,
+ const rtx_insn *tail ATTRIBUTE_UNUSED)
{
+ return false;
+#if 0
while (head != NEXT_INSN (tail))
{
if (!NOTE_P (head) && !LABEL_P (head))
@@ -5045,6 +5063,7 @@ no_real_insns_p (const rtx_insn *head, const rtx_insn *tail)
head = NEXT_INSN (head);
}
return true;
+#endif
}
/* Restore-other-notes: NOTE_LIST is the end of a chain of notes
@@ -6224,8 +6243,12 @@ commit_schedule (rtx_insn *prev_head, rtx_insn *tail, basic_block *target_bb)
scheduled_insns.iterate (i, &insn);
i++)
{
- if (control_flow_insn_p (last_scheduled_insn)
- || current_sched_info->advance_target_bb (*target_bb, insn))
+ /* Since we don't skip no_real_insns_p blocks any more, it's possible
+ to schedule a NOTE insn now; check for it here to avoid an
+ unexpected target bb advance. */
+ if ((control_flow_insn_p (last_scheduled_insn)
+ || current_sched_info->advance_target_bb (*target_bb, insn))
+ && !NOTE_P (insn))
{
*target_bb = current_sched_info->advance_target_bb (*target_bb, 0);
@@ -6245,7 +6268,7 @@ commit_schedule (rtx_insn *prev_head, rtx_insn *tail, basic_block *target_bb)
(*current_sched_info->begin_move_insn) (insn, last_scheduled_insn);
move_insn (insn, last_scheduled_insn,
current_sched_info->next_tail);
- if (!DEBUG_INSN_P (insn))
+ if (NONDEBUG_INSN_P (insn))
reemit_notes (insn);
last_scheduled_insn = insn;
}
@@ -6296,7 +6319,7 @@ prune_ready_list (state_t temp_state, bool first_cycle_insn_p,
int cost = 0;
const char *reason = "resource conflict";
- if (DEBUG_INSN_P (insn))
+ if (DEBUG_INSN_P (insn) || NOTE_P (insn))
continue;
if (sched_group_found && !SCHED_GROUP_P (insn)
@@ -6504,7 +6527,7 @@ schedule_block (basic_block *target_bb, state_t init_state)
and caused problems because schedule_block and compute_forward_dependences
had different notions of what the "head" insn was. */
- gcc_assert (head != tail || INSN_P (head));
+ gcc_assert (head != tail || INSN_P (head) || NOTE_P (head));
haifa_recovery_bb_recently_added_p = false;
@@ -6539,15 +6562,15 @@ schedule_block (basic_block *target_bb, state_t init_state)
if (targetm.sched.init)
targetm.sched.init (sched_dump, sched_verbose, ready.veclen);
+ gcc_assert (((NOTE_P (prev_head) || DEBUG_INSN_P (prev_head))
+ && BLOCK_FOR_INSN (prev_head) == *target_bb)
+ || (head == tail && NOTE_P (head)));
+
/* We start inserting insns after PREV_HEAD. */
last_scheduled_insn = prev_head;
last_nondebug_scheduled_insn = NULL;
nonscheduled_insns_begin = NULL;
- gcc_assert ((NOTE_P (last_scheduled_insn)
- || DEBUG_INSN_P (last_scheduled_insn))
- && BLOCK_FOR_INSN (last_scheduled_insn) == *target_bb);
-
/* Initialize INSN_QUEUE. Q_SIZE is the total number of insns in the
queue. */
q_ptr = 0;
@@ -6725,15 +6748,16 @@ schedule_block (basic_block *target_bb, state_t init_state)
}
}
- /* We don't want md sched reorder to even see debug isns, so put
- them out right away. */
- if (ready.n_ready && DEBUG_INSN_P (ready_element (&ready, 0))
+ /* We don't want md sched reorder to even see debug and note insns,
+ so put them out right away. */
+ if (ready.n_ready
+ && !NONDEBUG_INSN_P (ready_element (&ready, 0))
&& (*current_sched_info->schedule_more_p) ())
{
- while (ready.n_ready && DEBUG_INSN_P (ready_element (&ready, 0)))
+ while (ready.n_ready && !NONDEBUG_INSN_P (ready_element (&ready, 0)))
{
rtx_insn *insn = ready_remove_first (&ready);
- gcc_assert (DEBUG_INSN_P (insn));
+ gcc_assert (DEBUG_INSN_P (insn) || NOTE_P (insn));
(*current_sched_info->begin_schedule_ready) (insn);
scheduled_insns.safe_push (insn);
last_scheduled_insn = insn;
@@ -7145,17 +7169,18 @@ schedule_block (basic_block *target_bb, state_t init_state)
int
set_priorities (rtx_insn *head, rtx_insn *tail)
{
+ /* Since we don't skip no_real_insns_p blocks any more, it's
+ possible to meet a NOTE insn now; we don't need to compute
+ priority for such a block, so return early. */
+ if (head == tail && !INSN_P (head))
+ return 1;
+
rtx_insn *insn;
- int n_insn;
+ int n_insn = 0;
int sched_max_insns_priority =
current_sched_info->sched_max_insns_priority;
rtx_insn *prev_head;
- if (head == tail && ! INSN_P (head))
- gcc_unreachable ();
-
- n_insn = 0;
-
prev_head = PREV_INSN (head);
for (insn = tail; insn != prev_head; insn = PREV_INSN (insn))
{
@@ -7688,7 +7713,8 @@ fix_tick_ready (rtx_insn *next)
{
int tick, delay;
- if (!DEBUG_INSN_P (next) && !sd_lists_empty_p (next, SD_LIST_RES_BACK))
+ if (NONDEBUG_INSN_P (next)
+ && !sd_lists_empty_p (next, SD_LIST_RES_BACK))
{
int full_p;
sd_iterator_def sd_it;
--- a/gcc/rtl.h
+++ b/gcc/rtl.h
@@ -2695,8 +2695,8 @@ do { \
/* During sched, 1 if RTX is an insn that must be scheduled together
with the preceding insn. */
#define SCHED_GROUP_P(RTX) \
- (RTL_FLAG_CHECK4 ("SCHED_GROUP_P", (RTX), DEBUG_INSN, INSN, \
- JUMP_INSN, CALL_INSN)->in_struct)
+ (RTL_FLAG_CHECK5 ("SCHED_GROUP_P", (RTX), DEBUG_INSN, INSN, \
+ JUMP_INSN, CALL_INSN, NOTE)->in_struct)
/* For a SET rtx, SET_DEST is the place that is set
and SET_SRC is the value it is set to. */
--- a/gcc/sched-ebb.cc
+++ b/gcc/sched-ebb.cc
@@ -478,12 +478,10 @@ schedule_ebb (rtx_insn *head, rtx_insn *tail, bool modulo_scheduling)
a note or two. */
while (head != tail)
{
- if (NOTE_P (head) || DEBUG_INSN_P (head))
+ if (LABEL_P (head) || NOTE_P (head) || DEBUG_INSN_P (head))
head = NEXT_INSN (head);
else if (NOTE_P (tail) || DEBUG_INSN_P (tail))
tail = PREV_INSN (tail);
- else if (LABEL_P (head))
- head = NEXT_INSN (head);
else
break;
}
@@ -494,7 +492,8 @@ schedule_ebb (rtx_insn *head, rtx_insn *tail, bool modulo_scheduling)
if (no_real_insns_p (head, tail))
return BLOCK_FOR_INSN (tail);
- gcc_assert (INSN_P (head) && INSN_P (tail));
+ gcc_assert ((NOTE_P (head) && head == tail)
+ || (INSN_P (head) && INSN_P (tail)));
if (!bitmap_bit_p (&dont_calc_deps, first_bb->index))
{
--- a/gcc/sched-rgn.cc
+++ b/gcc/sched-rgn.cc
@@ -228,6 +228,9 @@ static edgeset *pot_split;
/* For every bb, a set of its ancestor edges. */
static edgeset *ancestor_edges;
+/* Bitmap indicating which bbs are initially empty. */
+static bitmap rgn_init_empty_bb;
+
#define INSN_PROBABILITY(INSN) (SRC_PROB (BLOCK_TO_BB (BLOCK_NUM (INSN))))
/* Speculative scheduling functions. */
@@ -3216,6 +3219,14 @@ schedule_region (int rgn)
/* Clean up. */
if (current_nr_blocks > 1)
free_trg_info ();
+
+ /* If this now-empty block wasn't empty initially, the only NOTE
+ left inside was scheduled but not counted when computing
+ rgn_n_insns, so fix up the count now. */
+ if (head == tail
+ && NOTE_P (head)
+ && !bitmap_bit_p (rgn_init_empty_bb, bb))
+ rgn_n_insns++;
}
/* Sanity check: verify that all region insns were scheduled. */
@@ -3448,7 +3459,16 @@ sched_rgn_local_init (int rgn)
continue;
FOR_EACH_EDGE (e, ei, block->succs)
e->aux = NULL;
- }
+ }
+ }
+
+ rgn_init_empty_bb = BITMAP_ALLOC (NULL);
+ for (bb = 0; bb < current_nr_blocks; bb++)
+ {
+ rtx_insn *head, *tail;
+ get_ebb_head_tail (EBB_FIRST_BB (bb), EBB_LAST_BB (bb), &head, &tail);
+ if (head == tail && NOTE_P (head))
+ bitmap_set_bit (rgn_init_empty_bb, bb);
}
}
@@ -3461,6 +3481,7 @@ sched_rgn_local_free (void)
sbitmap_vector_free (pot_split);
sbitmap_vector_free (ancestor_edges);
free (rgn_edges);
+ BITMAP_FREE (rgn_init_empty_bb);
}
/* Free data computed for the finished region. */