Message ID | 20230706135144.324311-1-vincent.guittot@linaro.org |
---|---|
State | New |
Headers |
From: Vincent Guittot <vincent.guittot@linaro.org>
To: mingo@redhat.com, peterz@infradead.org, juri.lelli@redhat.com, dietmar.eggemann@arm.com, rostedt@goodmis.org, bsegall@google.com, mgorman@suse.de, bristot@redhat.com, vschneid@redhat.com, linux-kernel@vger.kernel.org
Cc: qyousef@layalina.io, Vincent Guittot <vincent.guittot@linaro.org>
Subject: [PATCH] sched/fair: remove util_est boosting
Date: Thu, 6 Jul 2023 15:51:44 +0200
Message-Id: <20230706135144.324311-1-vincent.guittot@linaro.org> |
Series |
sched/fair: remove util_est boosting
|
|
Commit Message
Vincent Guittot
July 6, 2023, 1:51 p.m. UTC
There is no need to use runnable_avg when estimating util_est, and doing so
even generates wrong behavior because runnable_avg includes blocked tasks
whereas util_est doesn't. This can lead to accounting the waking task p
twice: once through its blocked contribution in runnable_avg and again when
adding its util_est.
The cpu's runnable_avg is already used when computing util_avg, which is
then compared with util_est.
In some situations, feec() will not select prev_cpu but another CPU in the
same performance domain because of a higher max_util.
Fixes: 7d0583cf9ec7 ("sched/fair, cpufreq: Introduce 'runnable boosting'")
Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
---
kernel/sched/fair.c | 3 ---
1 file changed, 3 deletions(-)
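
The double accounting described in the commit message can be reproduced with a
small userspace sketch. This is plain C, not kernel code: sketch_cpu_util() and
the MAX() helper are made up for the example and only loosely mirror the
wake-up path of cpu_util() in kernel/sched/fair.c; the numbers (task p with
util_est = 300, previously running on a CPU whose util_avg and runnable_avg are
both 200) match the CPU0/CPU1 example discussed further down the thread.

/*
 * Standalone sketch (not kernel code) of the dst_cpu == cpu wake-up case.
 * With boost_util_est, the blocked contribution of p (already present in
 * runnable_avg) and p's util_est are both added, counting p twice.
 */
#include <stdio.h>

#define MAX(a, b) ((a) > (b) ? (a) : (b))

static unsigned long sketch_cpu_util(unsigned long util_avg,
                                     unsigned long util_est,
                                     unsigned long runnable_avg,
                                     unsigned long task_util_est,
                                     int boost_util_est)
{
        /* util_avg keeps its runnable boost even after this patch */
        unsigned long util = MAX(util_avg, runnable_avg);

        if (boost_util_est)
                util_est = MAX(util_est, runnable_avg); /* the removed hunk */

        /* @p isn't enqueued yet; "simulate" its enqueue on dst_cpu */
        util_est += task_util_est;

        return MAX(util, util_est);
}

int main(void)
{
        unsigned long p_util_est = 300;

        /* prev_cpu: p is blocked there, so runnable_avg still contains it */
        printf("prev_cpu, util_est boosted: %lu\n",
               sketch_cpu_util(200, 0, 200, p_util_est, 1));   /* 500 */
        printf("prev_cpu, boost removed:    %lu\n",
               sketch_cpu_util(200, 0, 200, p_util_est, 0));   /* 300 */
        printf("idle CPU, same perf domain: %lu\n",
               sketch_cpu_util(0, 0, 0, p_util_est, 1));       /* 300 */
        return 0;
}

With the boost, prev_cpu is evaluated at 500 while an idle CPU of the same
performance domain is evaluated at 300; with the boost removed both are
evaluated at 300, so feec() no longer sees an artificial difference when
computing max_util.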
Comments
On 07/06/23 15:51, Vincent Guittot wrote:
> There is no need to use runnable_avg when estimating util_est and that
> even generates wrong behavior because one includes blocked tasks whereas
> the other one doesn't. This can lead to accounting twice the waking task p,
> once with the blocked runnable_avg and another one when adding its
> util_est.
>
> cpu's runnable_avg is already used when computing util_avg which is then
> compared with util_est.
>
> In some situation, feec will not select prev_cpu but another one on the
> same performance domain because of higher max_util
>
> Fixes: 7d0583cf9ec7 ("sched/fair, cpufreq: Introduce 'runnable boosting'")
> Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
> ---

Can we verify the numbers that introduced this magic boost are still valid
please?

Otherwise LGTM.


Thanks!

--

Qais Yousef

> kernel/sched/fair.c | 3 ---
> 1 file changed, 3 deletions(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index a80a73909dc2..77c9f5816c31 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -7289,9 +7289,6 @@ cpu_util(int cpu, struct task_struct *p, int dst_cpu, int boost)
>
>  	util_est = READ_ONCE(cfs_rq->avg.util_est.enqueued);
>
> -	if (boost)
> -		util_est = max(util_est, runnable);
> -
>  	/*
>  	 * During wake-up @p isn't enqueued yet and doesn't contribute
>  	 * to any cpu_rq(cpu)->cfs.avg.util_est.enqueued.
> --
> 2.34.1
>
On Tue, 11 Jul 2023 at 17:47, Qais Yousef <qyousef@layalina.io> wrote:
>
> On 07/06/23 15:51, Vincent Guittot wrote:
> > There is no need to use runnable_avg when estimating util_est and that
> > even generates wrong behavior because one includes blocked tasks whereas
> > the other one doesn't. This can lead to accounting twice the waking task p,
> > once with the blocked runnable_avg and another one when adding its
> > util_est.
> >
> > cpu's runnable_avg is already used when computing util_avg which is then
> > compared with util_est.
> >
> > In some situation, feec will not select prev_cpu but another one on the
> > same performance domain because of higher max_util
> >
> > Fixes: 7d0583cf9ec7 ("sched/fair, cpufreq: Introduce 'runnable boosting'")
> > Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
> > ---
>
> Can we verify the numbers that introduced this magic boost are still valid
> please?

TBH I don't expect it but I agree it's worth checking. Dietmar could
you rerun your tests with this change ?

>
> Otherwise LGTM.
>
>
> Thanks!
>
> --
>
> Qais Yousef
>
> > kernel/sched/fair.c | 3 ---
> > 1 file changed, 3 deletions(-)
> >
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index a80a73909dc2..77c9f5816c31 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -7289,9 +7289,6 @@ cpu_util(int cpu, struct task_struct *p, int dst_cpu, int boost)
> >
> >  	util_est = READ_ONCE(cfs_rq->avg.util_est.enqueued);
> >
> > -	if (boost)
> > -		util_est = max(util_est, runnable);
> > -
> >  	/*
> >  	 * During wake-up @p isn't enqueued yet and doesn't contribute
> >  	 * to any cpu_rq(cpu)->cfs.avg.util_est.enqueued.
> > --
> > 2.34.1
> >
On 12/07/2023 17:30, Vincent Guittot wrote:
> On Tue, 11 Jul 2023 at 17:47, Qais Yousef <qyousef@layalina.io> wrote:
>>
>> On 07/06/23 15:51, Vincent Guittot wrote:
>>> There is no need to use runnable_avg when estimating util_est and that
>>> even generates wrong behavior because one includes blocked tasks whereas
>>> the other one doesn't. This can lead to accounting twice the waking task p,
>>> once with the blocked runnable_avg and another one when adding its
>>> util_est.

... and we don't have this issue for the util_avg case since we have:

7317    } else if (p && task_cpu(p) != cpu && dst_cpu == cpu)
                        ^^^^^^^^^^^^^^^^^^
7318            util += task_util(p);

>>> cpu's runnable_avg is already used when computing util_avg which is then
>>> compared with util_est.

We discussed why I have to use max(X, runnable) for X=util and
X=util_est in v2:

https://lkml.kernel.org/r/251b524a-2c44-3892-1bae-03f879d6a64b@arm.com

-->

I need the util_est = max(util_est, runnable) further down as well. Just
want to fetch runnable only once.

util = 50, task_util = 5, util_est = 60, task_util_est = 10, runnable = 70

max(70 + 5, 60 + 10) != max (70 + 5, 70 + 10) when dst_cpu == cpu

<--

But I assume your point is that:

7327    if (boost)
7328            util_est = max(util_est, runnable);

7356    if (dst_cpu == cpu)                             <-- (1)
7357            util_est += _task_util_est(p);
7358    else if (p && unlikely(task_on_rq_queued(p) || current == p))
7359            lsub_positive(&util_est, _task_util_est(p));
7360
7361    util = max(util, util_est);

--> (1) doesn't work anymore in case `util_est == runnable`.

It will break the assumption for the if condition depicted in
cpu_util()'s comment:

7331     * During wake-up (2) @p isn't enqueued yet and doesn't contribute
7332     * to any cpu_rq(cpu)->cfs.avg.util_est.enqueued.
7333     * If @dst_cpu == @cpu add it to "simulate" cpu_util after @p
7334     * has been enqueued.

(2) eenv_pd_max_util() and find_energy_efficient_cpu() call-site.

<---

Rerunning Jankbench tests on Pix6 will tell if boosting util_avg instead
of both will still show the anticipated results. Likelihood is high that
it will since we do `util = max(util, util_est)` at the end of cpu_util().

>>> In some situation, feec will not select prev_cpu but another one on the
>>> same performance domain because of higher max_util
>>>
>>> Fixes: 7d0583cf9ec7 ("sched/fair, cpufreq: Introduce 'runnable boosting'")
>>> Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
>>> ---
>>
>> Can we verify the numbers that introduced this magic boost are still valid
>> please?
>
> TBH I don't expect it but I agree it's worth checking. Dietmar could
> you rerun your tests with this change ?

Could do. But first lets understand the issue properly.

>> Otherwise LGTM.
>>
>>
>> Thanks!
>>
>> --
>>
>> Qais Yousef
>>
>>> kernel/sched/fair.c | 3 ---
>>> 1 file changed, 3 deletions(-)
>>>
>>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>>> index a80a73909dc2..77c9f5816c31 100644
>>> --- a/kernel/sched/fair.c
>>> +++ b/kernel/sched/fair.c
>>> @@ -7289,9 +7289,6 @@ cpu_util(int cpu, struct task_struct *p, int dst_cpu, int boost)
>>>
>>>  	util_est = READ_ONCE(cfs_rq->avg.util_est.enqueued);
>>>
>>> -	if (boost)
>>> -		util_est = max(util_est, runnable);
>>> -
>>>  	/*
>>>  	 * During wake-up @p isn't enqueued yet and doesn't contribute
>>>  	 * to any cpu_rq(cpu)->cfs.avg.util_est.enqueued.
>>> --
>>> 2.34.1
>>>
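
The inequality in the quoted v2 discussion can be checked with a few lines of
plain C; the values are the ones from the email above, and the two expressions
stand for boosting util_avg only versus boosting both util_avg and util_est
(this is just the arithmetic, not kernel code):

#include <stdio.h>

#define MAX(a, b) ((a) > (b) ? (a) : (b))

int main(void)
{
        unsigned long util = 50, task_util = 5;
        unsigned long util_est = 60, task_util_est = 10;
        unsigned long runnable = 70;

        /* boost util_avg only */
        unsigned long a = MAX(MAX(util, runnable) + task_util,
                              util_est + task_util_est);        /* max(75, 70) = 75 */

        /* boost both util_avg and util_est */
        unsigned long b = MAX(MAX(util, runnable) + task_util,
                              MAX(util_est, runnable) + task_util_est); /* max(75, 80) = 80 */

        printf("%lu != %lu\n", a, b);
        return 0;
}

Boosting only util_avg yields 75 while boosting both yields 80, which is the
difference Dietmar is pointing at.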
On Fri, 21 Jul 2023 at 18:09, Dietmar Eggemann <dietmar.eggemann@arm.com> wrote:
>
> On 12/07/2023 17:30, Vincent Guittot wrote:
> > On Tue, 11 Jul 2023 at 17:47, Qais Yousef <qyousef@layalina.io> wrote:
> >>
> >> On 07/06/23 15:51, Vincent Guittot wrote:
> >>> There is no need to use runnable_avg when estimating util_est and that
> >>> even generates wrong behavior because one includes blocked tasks whereas
> >>> the other one doesn't. This can lead to accounting twice the waking task p,
> >>> once with the blocked runnable_avg and another one when adding its
> >>> util_est.
>
> ... and we don't have this issue for the util_avg case since we have:
>
> 7317    } else if (p && task_cpu(p) != cpu && dst_cpu == cpu)
>                         ^^^^^^^^^^^^^^^^^^
> 7318            util += task_util(p);
>
> >>> cpu's runnable_avg is already used when computing util_avg which is then
> >>> compared with util_est.
>
> We discussed why I have to use max(X, runnable) for X=util and
> X=util_est in v2:
>
> https://lkml.kernel.org/r/251b524a-2c44-3892-1bae-03f879d6a64b@arm.com
>
> -->
>
> I need the util_est = max(util_est, runnable) further down as well. Just
> want to fetch runnable only once.
>
> util = 50, task_util = 5, util_est = 60, task_util_est = 10, runnable = 70
>
> max(70 + 5, 60 + 10) != max (70 + 5, 70 + 10) when dst_cpu == cpu
>

Hmm, I don't get your point here. Why should they be equal?

Below is an example to describe my problem:

task A with util_avg=200 util_est=300 runnable=200
task A is attached to CPU0 so it contributes to CPU0's util_avg and
runnable_avg.

In eenv_pd_max_util() we call cpu_util(cpu, p, dst_cpu, 1) to get the
max utilization and the OPP to use to compute energy.

Let's say that there is nothing else running on CPU0 and CPU1 and they
both belong to the same performance domain, so
CPU0 util_avg=200 util_est=0 runnable_avg=200
CPU1 util_avg=0 util_est=0 runnable_avg=0

For CPU0, cpu_util(cpu, p, dst_cpu, 1) will return (200 + 300) = 500
For CPU1, cpu_util(cpu, p, dst_cpu, 1) will return (0 + 300) = 300

If there is an OPP with a capacity between these 2 values, CPU1 will
use a lower OPP than CPU0 and its computed energy will be lower.

The condition if (max_spare_cap_cpu >= 0 && max_spare_cap >
prev_spare_cap) filters some cases when CPU0 and CPU1 have the exact
same spare capacity. But we often see a smaller spare capacity for
CPU0 because of small side activities like cpufreq, timer, irq, rcu
... The difference is often only 1 but enough to bypass the condition
above. task A will migrate to CPU1 whereas there is no need. Then it
will move back to CPU0 once CPU1 has a smaller spare capacity.

I ran a test on snapdragon RB5 with the latest tip/sched/core. I start
3 tasks: 1 large enough to be on medium CPUs and 2 small enough to
stay on little CPUs, for 30 seconds.
With tip/sched/core, the 3 tasks migrate around 3665 times.
With the patch, there are only 8 migrations at the beginning of the test.

> <--
>
> But I assume your point is that:
>
> 7327    if (boost)
> 7328            util_est = max(util_est, runnable);
>
> 7356    if (dst_cpu == cpu)                             <-- (1)
> 7357            util_est += _task_util_est(p);
> 7358    else if (p && unlikely(task_on_rq_queued(p) || current == p))
> 7359            lsub_positive(&util_est, _task_util_est(p));
> 7360
> 7361    util = max(util, util_est);
>
> --> (1) doesn't work anymore in case `util_est == runnable`.
>
> It will break the assumption for the if condition depicted in
> cpu_util()'s comment:

exactly

>
> 7331     * During wake-up (2) @p isn't enqueued yet and doesn't contribute
> 7332     * to any cpu_rq(cpu)->cfs.avg.util_est.enqueued.
> 7333     * If @dst_cpu == @cpu add it to "simulate" cpu_util after @p
> 7334     * has been enqueued.
>
> (2) eenv_pd_max_util() and find_energy_efficient_cpu() call-site.
>
> <---
>
> Rerunning Jankbench tests on Pix6 will tell if boosting util_avg instead
> of both will still show the anticipated results. Likelihood is high that
> it will since we do `util = max(util, util_est)` at the end of cpu_util().

I think the same

>
> >>> In some situation, feec will not select prev_cpu but another one on the
> >>> same performance domain because of higher max_util
> >>>
> >>> Fixes: 7d0583cf9ec7 ("sched/fair, cpufreq: Introduce 'runnable boosting'")
> >>> Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
> >>> ---
> >>
> >> Can we verify the numbers that introduced this magic boost are still valid
> >> please?
> >
> > TBH I don't expect it but I agree it's worth checking. Dietmar could
> > you rerun your tests with this change ?
>
> Could do. But first lets understand the issue properly.
>
> >> Otherwise LGTM.
> >>
> >>
> >> Thanks!
> >>
> >> --
> >>
> >> Qais Yousef
> >>
> >>> kernel/sched/fair.c | 3 ---
> >>> 1 file changed, 3 deletions(-)
> >>>
> >>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> >>> index a80a73909dc2..77c9f5816c31 100644
> >>> --- a/kernel/sched/fair.c
> >>> +++ b/kernel/sched/fair.c
> >>> @@ -7289,9 +7289,6 @@ cpu_util(int cpu, struct task_struct *p, int dst_cpu, int boost)
> >>>
> >>>  	util_est = READ_ONCE(cfs_rq->avg.util_est.enqueued);
> >>>
> >>> -	if (boost)
> >>> -		util_est = max(util_est, runnable);
> >>> -
> >>>  	/*
> >>>  	 * During wake-up @p isn't enqueued yet and doesn't contribute
> >>>  	 * to any cpu_rq(cpu)->cfs.avg.util_est.enqueued.
> >>> --
> >>> 2.34.1
> >>>
>
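
The OPP effect in Vincent's example can be sketched the same way. The opp_cap[]
table and pick_opp() below are invented purely for illustration; only the two
max_util values (500 for CPU0 with the util_est boost, 300 for CPU1) come from
the example above:

#include <stdio.h>

/* hypothetical OPP capacities of the performance domain */
static const unsigned long opp_cap[] = { 256, 384, 512, 768, 1024 };
#define NR_OPP (sizeof(opp_cap) / sizeof(opp_cap[0]))

/* pick the smallest OPP whose capacity covers max_util */
static unsigned long pick_opp(unsigned long max_util)
{
        for (unsigned int i = 0; i < NR_OPP; i++)
                if (opp_cap[i] >= max_util)
                        return opp_cap[i];
        return opp_cap[NR_OPP - 1];
}

int main(void)
{
        unsigned long cpu0_max_util = 500; /* runnable boost + task A's util_est */
        unsigned long cpu1_max_util = 300; /* only task A's util_est */

        printf("CPU0 OPP capacity: %lu\n", pick_opp(cpu0_max_util)); /* 512 */
        printf("CPU1 OPP capacity: %lu\n", pick_opp(cpu1_max_util)); /* 384 */
        return 0;
}

Any OPP capacity that lands between 300 and 500 (384 here) makes CPU1's
computed energy lower than CPU0's, which together with the one-unit
spare-capacity differences mentioned above is enough to migrate task A away
from prev_cpu for no good reason.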
On 24/07/2023 15:06, Vincent Guittot wrote:
> On Fri, 21 Jul 2023 at 18:09, Dietmar Eggemann <dietmar.eggemann@arm.com> wrote:
>>
>> On 12/07/2023 17:30, Vincent Guittot wrote:
>>> On Tue, 11 Jul 2023 at 17:47, Qais Yousef <qyousef@layalina.io> wrote:
>>>>
>>>> On 07/06/23 15:51, Vincent Guittot wrote:

[...]

>> -->
>>
>> I need the util_est = max(util_est, runnable) further down as well. Just
>> want to fetch runnable only once.
>>
>> util = 50, task_util = 5, util_est = 60, task_util_est = 10, runnable = 70
>>
>> max(70 + 5, 60 + 10) != max (70 + 5, 70 + 10) when dst_cpu == cpu
>>
>
> Hmm, I don't get your point here. Why should they be equal ?
>
> Below is a example to describe my problem:
>
> task A with util_avg=200 util_est=300 runnable=200
> task A is attached to CPU0 so it contributes to CPU0's util_avg and
> runnable_avg.
>
> In eenv_pd_max_util() we call cpu_util(cpu, p, dst_cpu, 1) to get the
> max utilization and the OPP to use to compute energy.
>
> Let say that there is nothing else running on CPU0 and CPU1 and the
> both belong to the same performance domain so
> CPU0 util_avg= 200 util_est=0 runnable_avg=200
> CPU1 util_avg=0 util_est=0 runnable_avg=0
>
> For CPU0, cpu_util(cpu, p, dst_cpu, 1) will return (200 + 300) = 500
> For CPU1, cpu_util(cpu, p, dst_cpu, 1) will return (0 + 300) = 300
>
> If there is an OPP with a capacity between these 2 values, CPU1 will
> use a lower OPP than CPU0 and its computed energy will be lower.
>
> The condition if (max_spare_cap_cpu >= 0 && max_spare_cap >
> prev_spare_cap) filters some cases when CPU0 and CPU1 have the exact
> same spare capacity. But we often see a smaller spare capacity for
> CPU0 because of small side activities like cpufreq, timer, irq, rcu
> ... The difference is often only 1 but enough to bypass the condition
> above. task A will migrate to CPU1 whereas there is no need. Then it
> will move back to CPU0 once CPU1 will have a smaller spare capacity
>
> I ran a test on snapdragon RB5 with the latest tip/sched/core. I start
> 3 tasks: 1 large enough to be on medium CPUs and 2 small enough to
> stay on little CPUs during 30 seconds
> With tip/sched/core, the 3 tasks are migrating around 3665
> With the patch, there is only 8 migration at the beginning of the test

I agree with this. The fact that cfs_rq->avg.runnable_avg contains
blocked contributions from task A makes it unsuitable for the util_est
(no blocked contributions) if condition (dst_cpu == cpu) since we don't
want to add A's util_est to util_est to simulate during wakeup that A
is enqueued.

>> <--
>>
>> But I assume your point is that:
>>
>> 7327    if (boost)
>> 7328            util_est = max(util_est, runnable);
>>
>> 7356    if (dst_cpu == cpu)                             <-- (1)
>> 7357            util_est += _task_util_est(p);
>> 7358    else if (p && unlikely(task_on_rq_queued(p) || current == p))
>> 7359            lsub_positive(&util_est, _task_util_est(p));
>> 7360
>> 7361    util = max(util, util_est);
>>
>> --> (1) doesn't work anymore in case `util_est == runnable`.
>>
>> It will break the assumption for the if condition depicted in
>> cpu_util()'s comment:
>
> exactly

OK.

>>
>> 7331     * During wake-up (2) @p isn't enqueued yet and doesn't contribute
>> 7332     * to any cpu_rq(cpu)->cfs.avg.util_est.enqueued.
>> 7333     * If @dst_cpu == @cpu add it to "simulate" cpu_util after @p
>> 7334     * has been enqueued.
>>
>> (2) eenv_pd_max_util() and find_energy_efficient_cpu() call-site.
>>
>> <---
>>
>> Rerunning Jankbench tests on Pix6 will tell if boosting util_avg instead
>> of both will still show the anticipated results. Likelihood is high that
>> it will since we do `util = max(util, util_est)` at the end of cpu_util().
>
> I think the same

Reran the Jankbench test with the patch (fix) on exactly the same
platform (Pixel6, Android 12) I used for v3 (base, runnable):

https://lkml.kernel.org/r/20230515115735.296329-1-dietmar.eggemann@arm.com

Max_frame_duration:
+-----------------+------------+
| kernel          | value [ms] |
+-----------------+------------+
| base            |      163.1 |
| runnable        |      162.0 |
| fix             |      157.1 |
+-----------------+------------+

Mean_frame_duration:
+-----------------+------------+----------+
| kernel          | value [ms] | diff [%] |
+-----------------+------------+----------+
| base            |       18.0 |      0.0 |
| runnable        |       12.7 |   -29.43 |
| fix             |       13.0 |   -27.78 |
+-----------------+------------+----------+

Jank percentage (Jank deadline 16ms):
+-----------------+------------+----------+
| kernel          | value [%]  | diff [%] |
+-----------------+------------+----------+
| base            |        3.6 |      0.0 |
| runnable        |        1.0 |   -68.86 |
| fix             |        1.0 |   -68.86 |
+-----------------+------------+----------+

Power usage [mW] (total - all CPUs):
+-----------------+------------+----------+
| kernel          | value [mW] | diff [%] |
+-----------------+------------+----------+
| base            |      129.5 |      0.0 |
| runnable        |      134.3 |     3.71 |
| fix             |      129.9 |     0.31 |
+-----------------+------------+----------+

Test results look good to me.

Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Tested-by: Dietmar Eggemann <dietmar.eggemann@arm.com>

[...]
On 07/24/23 23:11, Dietmar Eggemann wrote:
> Reran the Jankbench test with the patch (fix) on exactly the same
> platform (Pixel6, Android 12) I used for v3 (base, runnable):
>
> https://lkml.kernel.org/r/20230515115735.296329-1-dietmar.eggemann@arm.com
>
> Max_frame_duration:
> +-----------------+------------+
> | kernel          | value [ms] |
> +-----------------+------------+
> | base            |      163.1 |
> | runnable        |      162.0 |
> | fix             |      157.1 |
> +-----------------+------------+
>
> Mean_frame_duration:
> +-----------------+------------+----------+
> | kernel          | value [ms] | diff [%] |
> +-----------------+------------+----------+
> | base            |       18.0 |      0.0 |
> | runnable        |       12.7 |   -29.43 |
> | fix             |       13.0 |   -27.78 |
> +-----------------+------------+----------+
>
> Jank percentage (Jank deadline 16ms):
> +-----------------+------------+----------+
> | kernel          | value [%]  | diff [%] |
> +-----------------+------------+----------+
> | base            |        3.6 |      0.0 |
> | runnable        |        1.0 |   -68.86 |
> | fix             |        1.0 |   -68.86 |
> +-----------------+------------+----------+
>
> Power usage [mW] (total - all CPUs):
> +-----------------+------------+----------+
> | kernel          | value [mW] | diff [%] |
> +-----------------+------------+----------+
> | base            |      129.5 |      0.0 |
> | runnable        |      134.3 |     3.71 |
> | fix             |      129.9 |     0.31 |
> +-----------------+------------+----------+
>
> Test results look good to me.
>
> Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
> Tested-by: Dietmar Eggemann <dietmar.eggemann@arm.com>

Thanks for re-running the test!


Cheers

--
Qais Yousef
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index a80a73909dc2..77c9f5816c31 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -7289,9 +7289,6 @@ cpu_util(int cpu, struct task_struct *p, int dst_cpu, int boost)
 
 	util_est = READ_ONCE(cfs_rq->avg.util_est.enqueued);
 
-	if (boost)
-		util_est = max(util_est, runnable);
-
 	/*
 	 * During wake-up @p isn't enqueued yet and doesn't contribute
 	 * to any cpu_rq(cpu)->cfs.avg.util_est.enqueued.