[-tip] sched/fair: gracefully handle EEVDF scheduling failures
Commit Message
The EEVDF scheduling might fail due to unforeseen issues. Previously,
it handled such situations gracefully, which was helpful in identifying
problems, but it no longer does so. Therefore, it would be better to
restore its previous capability.
Signed-off-by: Tiwei Bie <tiwei.btw@antgroup.com>
---
kernel/sched/fair.c | 12 ++++++++----
1 file changed, 8 insertions(+), 4 deletions(-)
Comments
On Fri, Dec 08, 2023 at 07:20:59PM +0800, Tiwei Bie wrote:
> The EEVDF scheduling might fail due to unforeseen issues. Previously,
I might also fly if I jump up. But is there any actual reason to believe
something like that will happen?
On 12/8/23 10:32 PM, Peter Zijlstra wrote:
> On Fri, Dec 08, 2023 at 07:20:59PM +0800, Tiwei Bie wrote:
>> The EEVDF scheduling might fail due to unforeseen issues. Previously,
>
> I might also fly if I jump up. But is there any actual reason to believe
> something like that will happen?
Thanks for the quick reply! Sorry, after re-reading the commit log,
it looks confusing to me as well. I didn't mean something like that
will happen. I just thought it might be worthwhile to have a sanity
check on 'best', because 'best' is initialized to NULL and is only
conditionally updated. The added 'WARN_ONCE' on '!best' is more like
a 'default' case that catches a supposedly unreachable case in a
'switch' block.
There was a similar check in the past that was helpful. And there
seems to be no harm in doing it. If this is reasonable, I'd like to
submit a v2 patch.
PS. I just noticed that the subject line should start with an uppercase
letter according to the rules in the tip tree handbook [1]. The subject
line should be something like: "sched/fair: Sanity check best in pick_eevdf()".
[1] https://www.kernel.org/doc/html/next/process/maintainer-tip.html#patch-subject
Regards,
Tiwei
@@ -878,7 +878,7 @@ struct sched_entity *__pick_first_entity(struct cfs_rq *cfs_rq)
 static struct sched_entity *pick_eevdf(struct cfs_rq *cfs_rq)
 {
 	struct rb_node *node = cfs_rq->tasks_timeline.rb_root.rb_node;
-	struct sched_entity *se = __pick_first_entity(cfs_rq);
+	struct sched_entity *first = __pick_first_entity(cfs_rq);
 	struct sched_entity *curr = cfs_rq->curr;
 	struct sched_entity *best = NULL;
@@ -887,7 +887,7 @@ static struct sched_entity *pick_eevdf(struct cfs_rq *cfs_rq)
 	 * in this cfs_rq, saving some cycles.
 	 */
 	if (cfs_rq->nr_running == 1)
-		return curr && curr->on_rq ? curr : se;
+		return curr && curr->on_rq ? curr : first;
 	if (curr && (!curr->on_rq || !entity_eligible(cfs_rq, curr)))
 		curr = NULL;
@@ -900,14 +900,15 @@ static struct sched_entity *pick_eevdf(struct cfs_rq *cfs_rq)
 		return curr;
 	/* Pick the leftmost entity if it's eligible */
-	if (se && entity_eligible(cfs_rq, se)) {
-		best = se;
+	if (first && entity_eligible(cfs_rq, first)) {
+		best = first;
 		goto found;
 	}
 	/* Heap search for the EEVD entity */
 	while (node) {
 		struct rb_node *left = node->rb_left;
+		struct sched_entity *se;
 		/*
 		 * Eligible entities in left subtree are always better
@@ -937,6 +938,9 @@ static struct sched_entity *pick_eevdf(struct cfs_rq *cfs_rq)
 	if (!best || (curr && entity_before(curr, best)))
 		best = curr;
+	if (WARN_ONCE(!best, "EEVDF scheduling failed, picking leftmost\n"))
+		best = first;
+
 	return best;
 }
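
For readers unfamiliar with the pattern discussed in the thread, here is a
minimal userspace sketch of a "NULL-initialized result plus warn-once fallback"
check. It is not kernel code and not part of the patch: struct toy_entity,
toy_warn_once(), toy_warn_count and toy_pick() are hypothetical names made up
for illustration, index 0 of the array stands in for the leftmost entity, and
the linear scan replaces the real rb-tree search.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for struct sched_entity; not the kernel type. */
struct toy_entity {
	bool eligible;
	unsigned long deadline;
};

/* Counts how many times the warning was actually printed. */
static int toy_warn_count;

/*
 * Userspace imitation of the WARN_ONCE() idea: print the message only the
 * first time the condition is true, but always return the condition so the
 * caller can use it in an if ().
 */
static bool toy_warn_once(bool cond, const char *msg)
{
	static bool warned;

	if (cond && !warned) {
		warned = true;
		toy_warn_count++;
		fprintf(stderr, "WARNING: %s\n", msg);
	}
	return cond;
}

/*
 * Pick the eligible entity with the earliest deadline. 'best' starts as NULL
 * and is only conditionally updated, so a final check catches the case where
 * nothing was selected and falls back to the first entry.
 */
static struct toy_entity *toy_pick(struct toy_entity *ents, size_t n)
{
	struct toy_entity *first = n ? &ents[0] : NULL;
	struct toy_entity *best = NULL;
	size_t i;

	for (i = 0; i < n; i++) {
		if (!ents[i].eligible)
			continue;
		if (!best || ents[i].deadline < best->deadline)
			best = &ents[i];
	}

	/* Assumed-unreachable case, analogous to a 'default' in a switch. */
	if (toy_warn_once(!best, "pick failed, picking first"))
		best = first;

	return best;
}
```

In this sketch the fallback costs one extra branch on the return path, and the
warning fires at most once per process no matter how often the fallback is
taken. Whether such a check is worth carrying in pick_eevdf() is exactly the
question raised in the review above.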