Message ID | 20231122140119.472110-1-vincent.guittot@linaro.org |
---|---|
State | New |
Series | sched/pelt: avoid underestimate of task utilization |
Commit Message
Vincent Guittot
Nov. 22, 2023, 2:01 p.m. UTC
It has been reported that a thread's util_est can significantly decrease as
a result of sharing the CPU with other threads. The use case can be easily
reproduced with a periodic task TA that runs for 1ms and sleeps for 100us.
When the task is alone on the CPU, its max utilization and its util_est are
around 888. If another similar task starts to run on the same CPU, TA will
have to share the CPU runtime and its maximum utilization will decrease to
around half the CPU capacity (512); TA's util_est will then follow this new
maximum trend, which is only the result of sharing the CPU with other
tasks. Such a situation can be detected with runnable_avg, which is close or
equal to util_avg when TA is alone but rises above util_avg when TA
shares the CPU with other threads and waits on the runqueue.
Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
---
This patch implements what I mentioned in [1]. I have been able to
reproduce such a pattern with rt-app.
[1] https://lore.kernel.org/lkml/CAKfTPtDd-HhF-YiNTtL9i5k0PfJbF819Yxu4YquzfXgwi7voyw@mail.gmail.com/#t
kernel/sched/fair.c | 13 +++++++++++++
1 file changed, 13 insertions(+)
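
For readers who want to reproduce the pattern without rt-app, a minimal hand-rolled C approximation of the described workload (run ~1ms, sleep 100us) could look like the sketch below. This is only an illustration of the reproducer pattern, not the rt-app configuration actually used for the report.

/*
 * Hypothetical stand-in for the rt-app workload described above:
 * a periodic task that busy-runs for ~1 ms and then sleeps 100 us.
 */
#include <stdint.h>
#include <time.h>

static uint64_t now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000000000ull + ts.tv_nsec;
}

int main(void)
{
	const struct timespec pause = { .tv_sec = 0, .tv_nsec = 100000 }; /* 100 us */

	for (;;) {
		uint64_t start = now_ns();

		/* burn CPU for roughly 1 ms */
		while (now_ns() - start < 1000000ull)
			;

		/* short sleep so util_est_update() runs at dequeue */
		nanosleep(&pause, NULL);
	}
}

Running two instances of this pinned to the same CPU (e.g. with taskset) should show each task's util_est drifting down toward ~512 without the patch, per the changelog.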
Comments
On 11/22/23 15:01, Vincent Guittot wrote:
> It has been reported that thread's util_est can significantly decrease as
> a result of sharing the CPU with other threads. The use case can be easily
> reproduced with a periodic task TA that runs 1ms and sleeps 100us.
> When the task is alone on the CPU, its max utilization and its util_est is
> around 888. If another similar task starts to run on the same CPU, TA will
> have to share the CPU runtime and its maximum utilization will decrease
> around half the CPU capacity (512) then TA's util_est will follow this new
> maximum trend which is only the result of sharing the CPU with others
> tasks. Such situation can be detected with runnable_avg wich is close or
> equal to util_avg when TA is alone but increases above util_avg when TA
> shares the CPU with other threads and wait on the runqueue.
>
> Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
> ---

So effectively, if we have two always-running tasks on the same CPU, their
util_est will converge to 1024 instead of 512 now, right?

I guess this is more accurate, yes. And I think this will hit the same
limitation we can hit with uclamp_max. If for example there are two tasks
that are 650 if they run alone, they would appear as 1024 now (instead of
512) if they share the CPU, because combined they leave no idle time on the
CPU and look like always-running tasks, I think.

If I got it right, I think this should not be a problem in practice because
the only reason these two tasks would be stuck on the same CPU is that the
load balancer can't do anything to spread them, which indicates the system
must be busy anyway. Once there's more idle time elsewhere, they should be
spread out and converge to 650 again.

Not my forte, but LGTM anyway.

Cheers

--
Qais Yousef
Hi Vincent,

On 22/11/2023 14:01, Vincent Guittot wrote:
> It has been reported that thread's util_est can significantly decrease as
> a result of sharing the CPU with other threads. The use case can be easily
> reproduced with a periodic task TA that runs 1ms and sleeps 100us.
> When the task is alone on the CPU, its max utilization and its util_est is
> around 888. If another similar task starts to run on the same CPU, TA will
> have to share the CPU runtime and its maximum utilization will decrease
> around half the CPU capacity (512) then TA's util_est will follow this new
> maximum trend which is only the result of sharing the CPU with others
> tasks. Such situation can be detected with runnable_avg wich is close or
> equal to util_avg when TA is alone but increases above util_avg when TA
> shares the CPU with other threads and wait on the runqueue.

Thanks for bringing this case up. I'm a bit nervous skipping util_est
updates this way. While it is true that this avoids dropping util_est
when the task is still busy doing stuff, it also avoids dropping
util_est when the task really is becoming less busy. If a task has a
legitimate reason to drop its utilization, it looks weird to me that its
util_est dropping can be stopped by a new task joining this rq which
pushes up runnable_avg.

Also, something about rt-app. Is there an easy way to ask an rt-app
thread to achieve a certain amount of throughput (like loops per
second)? I think 'runs 1ms and sleeps 100us' may not entirely simulate a
task that really wants to preserve a util_est of 888. If its utilization
really is that high, its sleep time will become less and less when
sharing the rq with another task, or even has no idle time and become
1024 which will trigger overutilization and migration.

>
> Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
> ---
>
> This patch implements what I mentioned in [1]. I have been able to
> reproduce such pattern with rt-app.
>
> [1] https://lore.kernel.org/lkml/CAKfTPtDd-HhF-YiNTtL9i5k0PfJbF819Yxu4YquzfXgwi7voyw@mail.gmail.com/#t
>
> kernel/sched/fair.c | 13 +++++++++++++
> 1 file changed, 13 insertions(+)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 07f555857698..eeb505d28905 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -4774,6 +4774,11 @@ static inline unsigned long task_util(struct task_struct *p)
>  	return READ_ONCE(p->se.avg.util_avg);
>  }
>
> +static inline unsigned long task_runnable(struct task_struct *p)
> +{
> +	return READ_ONCE(p->se.avg.runnable_avg);
> +}
> +
>  static inline unsigned long _task_util_est(struct task_struct *p)
>  {
>  	struct util_est ue = READ_ONCE(p->se.avg.util_est);
> @@ -4892,6 +4897,14 @@ static inline void util_est_update(struct cfs_rq *cfs_rq,
>  	if (task_util(p) > arch_scale_cpu_capacity(cpu_of(rq_of(cfs_rq))))
>  		return;
>
> +	/*
> +	 * To avoid underestimate of task utilization, skip updates of ewma if
> +	 * we cannot grant that thread got all CPU time it wanted.
> +	 */
> +	if ((ue.enqueued + UTIL_EST_MARGIN) < task_runnable(p))
> +		goto done;
> +
> +
>  	/*
>  	 * Update Task's estimated utilization
>  	 *
The same but with plain text instead of html ...

On Wed, 22 Nov 2023 at 17:40, Hongyan Xia <hongyan.xia2@arm.com> wrote:
>
> Hi Vincent,
>
> On 22/11/2023 14:01, Vincent Guittot wrote:
> > It has been reported that thread's util_est can significantly decrease as
> > a result of sharing the CPU with other threads. The use case can be easily
> > reproduced with a periodic task TA that runs 1ms and sleeps 100us.
> > When the task is alone on the CPU, its max utilization and its util_est is
> > around 888. If another similar task starts to run on the same CPU, TA will
> > have to share the CPU runtime and its maximum utilization will decrease
> > around half the CPU capacity (512) then TA's util_est will follow this new
> > maximum trend which is only the result of sharing the CPU with others
> > tasks. Such situation can be detected with runnable_avg wich is close or
> > equal to util_avg when TA is alone but increases above util_avg when TA
> > shares the CPU with other threads and wait on the runqueue.
>
> Thanks for bringing this case up. I'm a bit nervous skipping util_est
> updates this way. While it is true that this avoids dropping util_est
> when the task is still busy doing stuff, it also avoids dropping
> util_est when the task really is becoming less busy. If a task has a
> legitimate reason to drop its utilization, it looks weird to me that its
> util_est dropping can be stopped by a new task joining this rq which
> pushes up runnable_avg.

We prefer a util_est that overestimates rather than underestimates, because
in the first case you will not provide enough performance to the task, which
will remain under-provisioned, whereas in the other case you will create some
idle time which will reduce contention and, as a result, reduce util_est. So
the overestimate will be transient whereas the underestimate would remain.

> Also, something about rt-app. Is there an easy way to ask an rt-app
> thread to achieve a certain amount of throughput (like loops per
> second)? I think 'runs 1ms and sleeps 100us' may not entirely simulate a
> task that really wants to preserve a util_est of 888. If its utilization

We can do this in rt-app with a timer instead of a sleep, but in that case
there is no sleep and as a result no update of util_est. In the case raised
in [1] by Lukasz, and according to the shared charts, there are some sleep
phases even when the task must share the CPU. This can typically happen when
you have a pipe of threads: a task prepares some data, wakes up the next step
and waits for the result. Then the 1st task is woken up when that is done, to
prepare the next data, and so on ... In this case, the task will slow down
because of time sharing but there are still sleep phases.

> >
> really is that high, its sleep time will become less and less when
> sharing the rq with another task, or even has no idle time and become
> 1024 which will trigger overutilization and migration.
>
> >
> > Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
> > ---
> >
> > This patch implements what I mentioned in [1]. I have been able to
> > reproduce such pattern with rt-app.
> >
> > [1] https://lore.kernel.org/lkml/CAKfTPtDd-HhF-YiNTtL9i5k0PfJbF819Yxu4YquzfXgwi7voyw@mail.gmail.com/#t
> >
On 11/22/23 14:01, Vincent Guittot wrote:
> It has been reported that thread's util_est can significantly decrease as
> a result of sharing the CPU with other threads. The use case can be easily
> reproduced with a periodic task TA that runs 1ms and sleeps 100us.
> When the task is alone on the CPU, its max utilization and its util_est is
> around 888. If another similar task starts to run on the same CPU, TA will
> have to share the CPU runtime and its maximum utilization will decrease
> around half the CPU capacity (512) then TA's util_est will follow this new
> maximum trend which is only the result of sharing the CPU with others
> tasks. Such situation can be detected with runnable_avg wich is close or
> equal to util_avg when TA is alone but increases above util_avg when TA
> shares the CPU with other threads and wait on the runqueue.
>
> Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
> ---
>
> This patch implements what I mentioned in [1]. I have been able to
> reproduce such pattern with rt-app.
>
> [1] https://lore.kernel.org/lkml/CAKfTPtDd-HhF-YiNTtL9i5k0PfJbF819Yxu4YquzfXgwi7voyw@mail.gmail.com/#t

Thanks Vincent for looking at it! I didn't have time to come back to this
issue.

>
> kernel/sched/fair.c | 13 +++++++++++++
> 1 file changed, 13 insertions(+)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 07f555857698..eeb505d28905 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -4774,6 +4774,11 @@ static inline unsigned long task_util(struct task_struct *p)
>  	return READ_ONCE(p->se.avg.util_avg);
>  }
>
> +static inline unsigned long task_runnable(struct task_struct *p)
> +{
> +	return READ_ONCE(p->se.avg.runnable_avg);
> +}
> +
>  static inline unsigned long _task_util_est(struct task_struct *p)
>  {
>  	struct util_est ue = READ_ONCE(p->se.avg.util_est);
> @@ -4892,6 +4897,14 @@ static inline void util_est_update(struct cfs_rq *cfs_rq,
>  	if (task_util(p) > arch_scale_cpu_capacity(cpu_of(rq_of(cfs_rq))))
>  		return;
>
> +	/*
> +	 * To avoid underestimate of task utilization, skip updates of ewma if
> +	 * we cannot grant that thread got all CPU time it wanted.
> +	 */
> +	if ((ue.enqueued + UTIL_EST_MARGIN) < task_runnable(p))
> +		goto done;
> +
> +
>  	/*
>  	 * Update Task's estimated utilization
>  	 *

That looks reasonable to do. I cannot test it right now on my Pixel 6. It
should do the trick for the task util that we need in bigger apps than
rt-app as well.

Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
On 11/21/23 23:01, Qais Yousef wrote:
> On 11/22/23 15:01, Vincent Guittot wrote:
>> It has been reported that thread's util_est can significantly decrease as
>> a result of sharing the CPU with other threads. The use case can be easily
>> reproduced with a periodic task TA that runs 1ms and sleeps 100us.
>> When the task is alone on the CPU, its max utilization and its util_est is
>> around 888. If another similar task starts to run on the same CPU, TA will
>> have to share the CPU runtime and its maximum utilization will decrease
>> around half the CPU capacity (512) then TA's util_est will follow this new
>> maximum trend which is only the result of sharing the CPU with others
>> tasks. Such situation can be detected with runnable_avg wich is close or
>> equal to util_avg when TA is alone but increases above util_avg when TA
>> shares the CPU with other threads and wait on the runqueue.
>>
>> Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
>> ---
>
> So effectively if have two always running tasks on the same CPU their util_est
> will converge to 1024 instead of 512 now, right?
>
> I guess this is more accurate, yes. And I think this will hit the same
> limitation we can hit with uclamp_max. If for example there are two tasks that
> are 650 if they run alone, they would appear as 1024 now (instead of 512) if
> they share the CPU because combined running there will be no idle time on the
> CPU and appear like always running tasks, I think.

Well, they might not converge to 1024. It will just prevent them from
dropping the highest util_est seen on them before this contention happened.

>
> If I got it right, I think this should not be a problem in practice because the
> only reason these two tasks will be stuck on the same CPU is because the load
> balancer can't do anything about it to spread them; which indicates the system
> must be busy anyway. Once there's more idle time elsewhere, they should be
> spread and converge to 650 again.

It can be applicable to real apps. That Chrome thread that I reported
(which is ~950 util) drops its util and util_est in some scenarios when
there are some other tasks in the runqueue, because of some short sleeps.
Then this situation attracts other tasks to migrate, but the next time the
big thread wakes up it has to share the CPU and loses its util_est (which
was the last information that there was such a big util on it).

Those update moments, when we go via the util_est_update() code path, can
happen quite often in a short time and don't fit into the PELT world, IMO.
It's like an asynchronous force-update to the util_est signal, not at all
like the slow way util is built. I think Peter had a similar impression
when he asked me how often and how fast this update could be triggered
such that we lose the value...

I would even dare to call this patch a fix and a candidate for the stable
tree.
On 11/23/23 14:27, Lukasz Luba wrote:
>
>
> On 11/21/23 23:01, Qais Yousef wrote:
> > On 11/22/23 15:01, Vincent Guittot wrote:
> > > It has been reported that thread's util_est can significantly decrease as
> > > a result of sharing the CPU with other threads. The use case can be easily
> > > reproduced with a periodic task TA that runs 1ms and sleeps 100us.
> > > When the task is alone on the CPU, its max utilization and its util_est is
> > > around 888. If another similar task starts to run on the same CPU, TA will
> > > have to share the CPU runtime and its maximum utilization will decrease
> > > around half the CPU capacity (512) then TA's util_est will follow this new
> > > maximum trend which is only the result of sharing the CPU with others
> > > tasks. Such situation can be detected with runnable_avg wich is close or
> > > equal to util_avg when TA is alone but increases above util_avg when TA
> > > shares the CPU with other threads and wait on the runqueue.
> > >
> > > Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
> > > ---
> >
> > So effectively if have two always running tasks on the same CPU their util_est
> > will converge to 1024 instead of 512 now, right?
> >
> > I guess this is more accurate, yes. And I think this will hit the same
> > limitation we can hit with uclamp_max. If for example there are two tasks that
> > are 650 if they run alone, they would appear as 1024 now (instead of 512) if
> > they share the CPU because combined running there will be no idle time on the
> > CPU and appear like always running tasks, I think.
>
> Well they might not converge to 1024. It will just prevent them to not
> drop the highest seen util_est on them before this contention happen.

Right, we just ignore the decay and hold on to the last seen value.

>
> >
> > If I got it right, I think this should not be a problem in practice because the
> > only reason these two tasks will be stuck on the same CPU is because the load
> > balancer can't do anything about it to spread them; which indicates the system
> > must be busy anyway. Once there's more idle time elsewhere, they should be
> > spread and converge to 650 again.
>
> It can be applicable for the real app. That chrome thread that I
> reported (which is ~950 util) drops it's util and util_est in some
> scenarios when there are some other tasks in the runqueue, because
> of some short sleeps. Than this situation attracts other tasks to
> migrate but next time when the big thread wakes-up it has to share
> the CPU and looses it's util_est (which was the last information
> that there was such big util on it).
>
> Those update moments when we go via util_est_update() code path
> are quite often in short time and don't fit into the PELT world,
> IMO. It's like asynchronous force-update to the util_est signal,
> not the same way wrt how slowly util is built. I think Peter
> had something like this impression, when he asked me of often
> and fast this update could be triggered that we loose the value...
>
> I would even dare to call this patch a fix and a candidate to
> stable-tree.

It seems a genuine fix, yes. I am generally worried about the power impact
of util_est. But I tend to agree this is something worth considering for a
backport.

Cheers

--
Qais Yousef
On 22/11/2023 17:37, Vincent Guittot wrote:
> The same but with plain text instead of html ...
>
> On Wed, 22 Nov 2023 at 17:40, Hongyan Xia <hongyan.xia2@arm.com> wrote:
>>
>> Hi Vincent,
>>
>> On 22/11/2023 14:01, Vincent Guittot wrote:
>>> It has been reported that thread's util_est can significantly decrease as
>>> a result of sharing the CPU with other threads. The use case can be easily
>>> reproduced with a periodic task TA that runs 1ms and sleeps 100us.
>>> When the task is alone on the CPU, its max utilization and its util_est is
>>> around 888. If another similar task starts to run on the same CPU, TA will
>>> have to share the CPU runtime and its maximum utilization will decrease
>>> around half the CPU capacity (512) then TA's util_est will follow this new
>>> maximum trend which is only the result of sharing the CPU with others
>>> tasks. Such situation can be detected with runnable_avg wich is close or
>>> equal to util_avg when TA is alone but increases above util_avg when TA
>>> shares the CPU with other threads and wait on the runqueue.
>>
>> Thanks for bringing this case up. I'm a bit nervous skipping util_est
>> updates this way. While it is true that this avoids dropping util_est
>> when the task is still busy doing stuff, it also avoids dropping
>> util_est when the task really is becoming less busy. If a task has a
>> legitimate reason to drop its utilization, it looks weird to me that its
>> util_est dropping can be stopped by a new task joining this rq which
>> pushes up runnable_avg.
>
> We prefer an util_est that overestimate rather than under estimate
> because in 1st case you will not provide enough performance to the
> task which will remain under provisioned whereas in the other case you
> will create some idle time which will enable to reduce contention and
> as a result reduce the util_est so the overestimate will be transient
> whereas the underestimate will be remain

My concern is mostly about energy efficiency, although I have no concrete
evidence on energy impact so I'm not firmly against this patch.

>
>> Also, something about rt-app. Is there an easy way to ask an rt-app
>> thread to achieve a certain amount of throughput (like loops per
>> second)? I think 'runs 1ms and sleeps 100us' may not entirely simulate a
>> task that really wants to preserve a util_est of 888. If its utilization
>
> We can do this in rt-app with timer...

Thanks. Looking at the rt-app doc, I think a timer with absolute time stamps
does what I want.
On 22/11/2023 14:01, Vincent Guittot wrote:
> [...]
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 07f555857698..eeb505d28905 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -4774,6 +4774,11 @@ static inline unsigned long task_util(struct task_struct *p)
>  	return READ_ONCE(p->se.avg.util_avg);
>  }
>
> +static inline unsigned long task_runnable(struct task_struct *p)
> +{
> +	return READ_ONCE(p->se.avg.runnable_avg);
> +}
> +
>  static inline unsigned long _task_util_est(struct task_struct *p)
>  {
>  	struct util_est ue = READ_ONCE(p->se.avg.util_est);
> @@ -4892,6 +4897,14 @@ static inline void util_est_update(struct cfs_rq *cfs_rq,
>  	if (task_util(p) > arch_scale_cpu_capacity(cpu_of(rq_of(cfs_rq))))
>  		return;
>
> +	/*
> +	 * To avoid underestimate of task utilization, skip updates of ewma if
> +	 * we cannot grant that thread got all CPU time it wanted.
> +	 */
> +	if ((ue.enqueued + UTIL_EST_MARGIN) < task_runnable(p))
> +		goto done;
> +
> +

Actually, does this also skip util_est increases as well, assuming no
FASTUP? When a task is ramping up, another task could join and then block
this task from ramping up its util_est.

Or do we think this is intended behavior for !FASTUP?

>  	/*
>  	 * Update Task's estimated utilization
>  	 *
On Fri, 24 Nov 2023 at 11:44, Hongyan Xia <hongyan.xia2@arm.com> wrote:
>
> On 22/11/2023 14:01, Vincent Guittot wrote:
> > [...]
> >
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index 07f555857698..eeb505d28905 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -4774,6 +4774,11 @@ static inline unsigned long task_util(struct task_struct *p)
> >  	return READ_ONCE(p->se.avg.util_avg);
> >  }
> >
> > +static inline unsigned long task_runnable(struct task_struct *p)
> > +{
> > +	return READ_ONCE(p->se.avg.runnable_avg);
> > +}
> > +
> >  static inline unsigned long _task_util_est(struct task_struct *p)
> >  {
> >  	struct util_est ue = READ_ONCE(p->se.avg.util_est);
> > @@ -4892,6 +4897,14 @@ static inline void util_est_update(struct cfs_rq *cfs_rq,
> >  	if (task_util(p) > arch_scale_cpu_capacity(cpu_of(rq_of(cfs_rq))))
> >  		return;
> >
> > +	/*
> > +	 * To avoid underestimate of task utilization, skip updates of ewma if
> > +	 * we cannot grant that thread got all CPU time it wanted.
> > +	 */
> > +	if ((ue.enqueued + UTIL_EST_MARGIN) < task_runnable(p))
> > +		goto done;
> > +
> > +
>
> Actually, does this also skip util_est increases as well, assuming no
> FASTUP? When a task is ramping up, another task could join and then
> blocks this task from ramping up its util_est.
>
> Or do we think this is intended behavior for !FASTUP?

sched_feat(UTIL_EST_FASTUP) has been there to allow disabling the faster
ramp-up in case of regression, but I'm not aware of anybody having had to
disable it during the last 3 years, so we should just delete the
sched_feat() and make the faster ramp-up permanent.

>
> >  	/*
> >  	 * Update Task's estimated utilization
> >  	 *
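
For context on the FASTUP discussion above, here is a rough, paraphrased sketch of the util_est_update() flow, not a verbatim copy of kernel/sched/fair.c: with UTIL_EST_FASTUP the ewma is snapped straight up to the newly observed utilization, so only the downward (decay) direction goes through the slow ewma, and that is the direction the new runnable_avg guard gates. UTIL_EST_MARGIN and the 3/4 decay weight are assumptions matching fair.c at this point in time.

/* Paraphrased sketch, not kernel source. */
#include <stdio.h>

#define SCHED_CAPACITY_SCALE	1024
#define UTIL_EST_MARGIN		(SCHED_CAPACITY_SCALE / 100)	/* assumed, as in fair.c */

struct util_est_sketch {
	unsigned long enqueued;		/* last observed task_util() at dequeue */
	unsigned long ewma;		/* slow-moving estimate */
};

static void util_est_update_sketch(struct util_est_sketch *ue,
				   unsigned long task_util,
				   unsigned long task_runnable,
				   int fastup)
{
	ue->enqueued = task_util;

	/* UTIL_EST_FASTUP: increases are taken immediately, no ewma */
	if (fastup && ue->ewma < ue->enqueued) {
		ue->ewma = ue->enqueued;
		return;
	}

	/*
	 * New with this patch: if the task spent time waiting on the
	 * runqueue (runnable_avg well above what it actually ran), skip
	 * the ewma update - the lower util is contention, not a smaller task.
	 */
	if (ue->enqueued + UTIL_EST_MARGIN < task_runnable)
		return;

	/* otherwise decay slowly toward the newly observed utilization */
	ue->ewma = (3 * ue->ewma + ue->enqueued) / 4;
}

int main(void)
{
	/* task ramping up: util now 700, but waiting behind others (runnable ~950) */
	struct util_est_sketch a = { .ewma = 400 }, b = { .ewma = 400 };

	util_est_update_sketch(&a, 700, 950, 0);	/* !FASTUP: guard skips the increase */
	util_est_update_sketch(&b, 700, 950, 1);	/* FASTUP: ewma snaps to 700 */
	printf("!FASTUP ewma=%lu, FASTUP ewma=%lu\n", a.ewma, b.ewma);
}

In this sketch, without the fast ramp-up an increase would also be blocked by the new guard, which is the corner case Hongyan raises; making the fast ramp-up unconditional, as Vincent suggests, removes it.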
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 07f555857698..eeb505d28905 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4774,6 +4774,11 @@ static inline unsigned long task_util(struct task_struct *p)
 	return READ_ONCE(p->se.avg.util_avg);
 }
 
+static inline unsigned long task_runnable(struct task_struct *p)
+{
+	return READ_ONCE(p->se.avg.runnable_avg);
+}
+
 static inline unsigned long _task_util_est(struct task_struct *p)
 {
 	struct util_est ue = READ_ONCE(p->se.avg.util_est);
@@ -4892,6 +4897,14 @@ static inline void util_est_update(struct cfs_rq *cfs_rq,
 	if (task_util(p) > arch_scale_cpu_capacity(cpu_of(rq_of(cfs_rq))))
 		return;
 
+	/*
+	 * To avoid underestimate of task utilization, skip updates of ewma if
+	 * we cannot grant that thread got all CPU time it wanted.
+	 */
+	if ((ue.enqueued + UTIL_EST_MARGIN) < task_runnable(p))
+		goto done;
+
+
 	/*
 	 * Update Task's estimated utilization
 	 *
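
As a purely illustrative worked example of when the new check fires, using the numbers from the changelog and assuming UTIL_EST_MARGIN is SCHED_CAPACITY_SCALE / 100 = 10 as defined in fair.c:

#include <stdbool.h>
#include <stdio.h>

#define SCHED_CAPACITY_SCALE	1024
#define UTIL_EST_MARGIN		(SCHED_CAPACITY_SCALE / 100)	/* assumed value */

/* true when the ewma part of util_est_update() would be skipped */
static bool skip_ewma(unsigned long ue_enqueued, unsigned long runnable_avg)
{
	return ue_enqueued + UTIL_EST_MARGIN < runnable_avg;
}

int main(void)
{
	/* TA alone on the CPU: runnable_avg tracks util_avg closely */
	printf("alone:   skip=%d\n", skip_ewma(888, 890));	/* 0: ewma may decay */

	/* TA sharing the CPU: util drops toward 512, runnable_avg climbs toward 1024 */
	printf("sharing: skip=%d\n", skip_ewma(512, 1000));	/* 1: ewma held */

	return 0;
}

The estimate therefore holds at the last value seen before contention (around 888 here) rather than decaying toward 512, matching Lukasz's point above that the tasks do not necessarily converge to 1024: the ewma simply stops decaying while the task is contended.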