Message ID | 20230801152648._y603AS_@linutronix.de |
---|---|
State | New |
Headers | From: Sebastian Andrzej Siewior <bigeasy@linutronix.de> To: linux-kernel@vger.kernel.org Cc: Ben Segall, Daniel Bristot de Oliveira, Dietmar Eggemann, Ingo Molnar, Juri Lelli, Mel Gorman, Peter Zijlstra, Steven Rostedt, Valentin Schneider, Vincent Guittot, Thomas Gleixner Date: Tue, 1 Aug 2023 17:26:48 +0200 Subject: [PATCH] sched/rt: Don't try push tasks if there are none. Message-ID: <20230801152648._y603AS_@linutronix.de> |
Series | sched/rt: Don't try push tasks if there are none. |
Commit Message
Sebastian Andrzej Siewior
Aug. 1, 2023, 3:26 p.m. UTC
I have an RT task X at a high priority and a cyclictest thread on each CPU
at a lower priority than X's. If X is active and each CPU wakes its own
cyclictest thread, the result is a prolonged rto_push storm.
A random CPU determines via balance_rt() that the CPU on which X is
running needs to push tasks. X has the highest priority and cyclictest is
next in line, so there is nothing that can be done since the task with
the higher priority is not touched.
tell_cpu_to_push() increments rto_loop_next and schedules
rto_push_irq_work_func() on X's CPU. The other CPUs also increment the
loop counter and do the same. Once rto_push_irq_work_func() is active, it
does nothing because it has _no_ pushable tasks on its runqueue. It then
checks rto_next_cpu() and decides to queue the irq_work on the local CPU
again, because another CPU requested a push by incrementing the counter.
I have traces where ~30 CPUs request this ~3 times each before it
finally ends. This greatly increases X's runtime while X isn't making
much progress.
Teach rto_next_cpu() to only return CPUs which also have tasks on their
runqueue which can be pushed away. This does not reduce the number of
tell_cpu_to_push() invocations (rto_loop_next counter increments) but
reduces the number of rto_push_irq_work_func() invocations issued when
nothing can be done. As a result, the overloaded CPU is blocked less often.
There are still cases where the "same job" is repeated several times
(for instance, the current CPU needs to resched but hasn't yet because
the irq_work is repeated a few times and so the old task remains on the
CPU), but the majority of requests now end in tell_cpu_to_push() before
an IPI is issued.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
kernel/sched/rt.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
Comments
On Tue, 1 Aug 2023 17:26:48 +0200 Sebastian Andrzej Siewior <bigeasy@linutronix.de> wrote:

> diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
> index 00e0e50741153..338cd150973ff 100644
> --- a/kernel/sched/rt.c
> +++ b/kernel/sched/rt.c
> @@ -2247,8 +2247,11 @@ static int rto_next_cpu(struct root_domain *rd)
>
>  	rd->rto_cpu = cpu;
>
> -	if (cpu < nr_cpu_ids)
> +	if (cpu < nr_cpu_ids) {
> +		if (!has_pushable_tasks(cpu_rq(cpu)))
> +			continue;

Hmm, I had to read this again to make sure we can't get into an infinite loop.
But it looks like we are safe, as if none of the CPUs had pushable tasks, we
would still hit the cpu == nr_cpu_ids case and fall into the loop check, and
break out there.

Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>

-- Steve

> +		return cpu;
> +	}
>
>  	rd->rto_cpu = -1;
>
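Steven's termination argument can be exercised with a small user-space model of the loop. The following is a simplified sketch, not the kernel code: the `sim_*` names and flat arrays are hypothetical stand-ins for `rd->rto_mask`, the per-CPU pushable-task lists, and the atomic `rto_loop`/`rto_loop_next` counters.

```c
/* Simplified model of rto_next_cpu() with the proposed check applied.
 * sim_* names are illustrative; the kernel scans rd->rto_mask with
 * cpumask_next() and uses atomic rto_loop / rto_loop_next counters. */
#include <assert.h>
#include <stdbool.h>

#define SIM_NR_CPUS 4

struct sim_root_domain {
	int rto_cpu;                    /* cursor: last CPU handed out, -1 = restart */
	bool rto_mask[SIM_NR_CPUS];     /* CPUs flagged as overloaded */
	bool has_pushable[SIM_NR_CPUS]; /* per-CPU pushable list non-empty? */
	int rto_loop, rto_loop_next;    /* new-request detection */
};

static int sim_rto_next_cpu(struct sim_root_domain *rd)
{
	for (;;) {
		int cpu = rd->rto_cpu + 1;

		/* advance to the next overloaded CPU */
		while (cpu < SIM_NR_CPUS && !rd->rto_mask[cpu])
			cpu++;
		rd->rto_cpu = cpu;

		if (cpu < SIM_NR_CPUS) {
			/* the fix: skip CPUs with nothing to push */
			if (!rd->has_pushable[cpu])
				continue;
			return cpu;
		}

		rd->rto_cpu = -1;
		/* stop unless another CPU requested a pass meanwhile;
		 * consuming the request here bounds the number of passes,
		 * which is why the loop cannot spin forever */
		if (rd->rto_loop == rd->rto_loop_next)
			return -1;
		rd->rto_loop = rd->rto_loop_next;
	}
}
```

Even when pending requests keep `rto_loop_next` ahead of `rto_loop`, each full scan consumes one request, so a scan over CPUs with no pushable tasks still falls through to the `cpu == nr_cpu_ids` path and terminates.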
On 01/08/23 17:26, Sebastian Andrzej Siewior wrote:
> I have a RT task X at a high priority and cyclictest on each CPU with
> lower priority than X's. If X is active and each CPU wakes their own
> cylictest thread then it ends in a longer rto_push storm.
> A random CPU determines via balance_rt() that the CPU on which X is
> running needs to push tasks. X has the highest priority, cyclictest is
> next in line so there is nothing that can be done since the task with
> the higher priority is not touched.
>
> tell_cpu_to_push() increments rto_loop_next and schedules
> rto_push_irq_work_func() on X's CPU. The other CPUs also increment the
> loop counter and do the same. Once rto_push_irq_work_func() is active it
> does nothing because it has _no_ pushable tasks on its runqueue. Then
> checks rto_next_cpu() and decides to queue irq_work on the local CPU
> because another CPU requested a push by incrementing the counter.
>

For a CPU to be in the rto_mask, it needs:

  rt_rq->rt_nr_migratory && rt_rq->rt_nr_total > 1

But if that CPU has no pushable tasks, then that means only the current
task has p->nr_cpus_allowed > 1.

Should we change it so a CPU is only in the rto_mask iff it has pushable
tasks? AFAICT that should not break the case where we push the current task
away due to migration_disabled, as that still relies on the
migration_disabled task to be in the pushable list.
On 2023-08-09 18:02:32 [+0100], Valentin Schneider wrote:
> On 01/08/23 17:26, Sebastian Andrzej Siewior wrote:
> > [...]
>
> For a CPU to be in the rto_mask, it needs:
>
>   rt_rq->rt_nr_migratory && rt_rq->rt_nr_total > 1
>
> But if that CPU has no pushable tasks, then that means only the current
> task has p->nr_cpus_allowed > 1.
>
> Should we change it so a CPU is only in the rto_mask iff it has pushable
> tasks? AFAICT that should not break the case where we push the current task
> away due to migration_disabled, as that still relies on the
> migration_disabled task to be in the pushable list.

Sounds good. The task with the highest priority becomes pushable if it
gets preempted (by a task with higher priority). This gets considered,
right?

Sebastian
On 10/08/23 10:07, Sebastian Andrzej Siewior wrote:
> On 2023-08-09 18:02:32 [+0100], Valentin Schneider wrote:
>> [...]
>> Should we change it so a CPU is only in the rto_mask iff it has pushable
>> tasks? AFAICT that should not break the case where we push the current task
>> away due to migration_disabled, as that still relies on the
>> migration_disabled task to be in the pushable list.
>
> Sounds good. The task with the highest priority becomes pushable if it
> gets preempted (by a task with higher priority). This gets considered,
> right?
>

Yeah, when we switch the current task in put_prev_task_rt() we add the
previously-running (IOW preempted) task to the pushable list (if it isn't
affined to just one CPU).

Now it looks like if we change update_rt_migration() to look at the
pushable list instead of rt_nr_migratory, we could get rid of
rt_nr_migratory itself - let me poke at this a little; I might just be
about to figure out why that isn't already the case...

> Sebastian
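The rule discussed in this exchange - a preempted RT task becomes pushable only if its affinity allows more than one CPU - can be condensed into a small sketch. This is a hypothetical model, not the kernel's code: the real gate lives in put_prev_task_rt()/enqueue_pushable_task() and additionally covers the migration_disabled case, which this sketch omits.

```c
/* Hypothetical sketch: would a preempted task land on the pushable
 * list? Models only the affinity/priority rule from the discussion;
 * sim_* names are illustrative, not the kernel's. */
#include <assert.h>
#include <stdbool.h>

struct sim_task {
	int prio;            /* lower value = higher RT priority */
	int nr_cpus_allowed; /* size of the task's affinity mask */
};

/* Would @prev be enqueued as pushable when preempted by @next? */
static bool sim_becomes_pushable(const struct sim_task *prev,
				 const struct sim_task *next)
{
	/* preemption implies next has strictly higher priority;
	 * a task pinned to a single CPU can never be pushed away */
	return next->prio < prev->prio && prev->nr_cpus_allowed > 1;
}
```

Under this model, the high-priority task X from the commit message becomes pushable the moment something with even higher priority preempts it, which is the case Sebastian asks about above.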
diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index 00e0e50741153..338cd150973ff 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -2247,8 +2247,11 @@ static int rto_next_cpu(struct root_domain *rd)
 
 		rd->rto_cpu = cpu;
 
-		if (cpu < nr_cpu_ids)
+		if (cpu < nr_cpu_ids) {
+			if (!has_pushable_tasks(cpu_rq(cpu)))
+				continue;
 			return cpu;
+		}
 
 		rd->rto_cpu = -1;