Message ID: 20230505152440.142265-1-hongyan.xia2@arm.com
State: New
From: Hongyan Xia <hongyan.xia2@arm.com>
To: Jonathan Corbet <corbet@lwn.net>
Cc: Hongyan Xia <hongyan.xia2@arm.com>, Qais Yousef <qyousef@layalina.io>, Vincent Guittot <vincent.guittot@linaro.org>, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: [PATCH] sched/documentation: elaborate on uclamp limitations
Date: Fri, 5 May 2023 16:24:39 +0100
Series: sched/documentation: elaborate on uclamp limitations
Commit Message
Hongyan Xia
May 5, 2023, 3:24 p.m. UTC
The story in 5.2 about util_avg abruptly jumping from 300 when
Fmax/Fmin == 3 to 1024 when Fmax/Fmin == 4 hides some details about how
clock_pelt works behind the scenes. Explicitly mention it to make it
easier for readers to follow.
Signed-off-by: Hongyan Xia <hongyan.xia2@arm.com>
Cc: Qais Yousef <qyousef@layalina.io>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
---
Documentation/scheduler/sched-util-clamp.rst | 17 +++++++++++++++++
1 file changed, 17 insertions(+)
Comments
Please CC sched maintainers (Ingo + Peter) next time as they should pick this
up ultimately and they won't see it from the list only.

On 05/05/23 16:24, Hongyan Xia wrote:
> The story in 5.2 about util_avg abruptly jumping from 300 when
> Fmax/Fmin == 3 to 1024 when Fmax/Fmin == 4 hides some details about how
> clock_pelt works behind the scenes. Explicitly mention it to make it
> easier for readers to follow.
>
> Signed-off-by: Hongyan Xia <hongyan.xia2@arm.com>
> Cc: Qais Yousef <qyousef@layalina.io>
> Cc: Vincent Guittot <vincent.guittot@linaro.org>
> ---
>  Documentation/scheduler/sched-util-clamp.rst | 17 +++++++++++++++++
>  1 file changed, 17 insertions(+)
>
> diff --git a/Documentation/scheduler/sched-util-clamp.rst b/Documentation/scheduler/sched-util-clamp.rst
> index 74d5b7c6431d..524df07bceba 100644
> --- a/Documentation/scheduler/sched-util-clamp.rst
> +++ b/Documentation/scheduler/sched-util-clamp.rst
> @@ -669,6 +669,19 @@ but not proportional to Fmax/Fmin.
>
>      p0->util_avg = 300 + small_error
>
> +The reason why util_avg is around 300 even though it runs for 900 at Fmin is:
> +Although running at Fmin reduces the rate of rq_clock_pelt() to 1/3 thus
> +accumulates util_sum at 1/3 of the rate at Fmax, the clock period
> +(rq_clock_pelt() now minus previous rq_clock_pelt()) in:
> +
> +::
> +
> +    util_sum / clock period = util_avg
> +
> +does not shrink to 1/3, since rq->clock_pelt is periodically synchronized with
> +rq->clock_task as long as there's idle time. As a result, we get util_avg of
> +about 300, not 900.
> +

I feel neutral about these changes. It does answer some questions, but poses
more questions like what is clock_pelt. So we might end up in recursive
regression of explaining the explanation.

I don't think we have a doc about clock_pelt. Worth adding one and just add
a reference to it from here for those interested in understanding more details
on why we need to go to idle to correct util_avg? I think our code has
explanation, a reference to update_rq_clock_pelt() might suffice too.

Vincent, do you have an opinion here?

Thanks!

--
Qais Yousef

> Now if the ratio of Fmax/Fmin is 4, the maximum value becomes:
>
> ::
>
> @@ -682,6 +695,10 @@ this happens, then the _actual_ util_avg will become:
>
>      p0->util_avg = 1024
>
> +This is because rq->clock_pelt is no longer synchronized with the task clock.
> +The clock period therefore is proportionally shrunk by the same ratio of
> +(Fmax/Fmin), giving us a maximal util_avg of 1024.
> +
> If task p1 wakes up on this CPU, which have:
>
> ::
> --
> 2.34.1
>
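The pointer to update_rq_clock_pelt() above can be made concrete with a toy model. The sketch below is illustrative Python, not kernel code; the function name, signature, and microsecond bookkeeping are this example's own. It shows the two behaviours the patch relies on: while the CPU is busy at reduced frequency the pelt clock advances more slowly than clock_task, and on entering idle clock_pelt is re-synced with clock_task, so the accumulated lag is handed back as extra decay time.

```python
SCHED_CAPACITY = 1024  # frequency/capacity scale factors are out of 1024

def advance_clock_pelt(clock_pelt, lag, delta_us, freq_scale, is_idle):
    """Advance a toy rq->clock_pelt by delta_us of clock_task time.

    While busy, the pelt clock runs at freq_scale/1024 of wall speed and
    the difference accumulates as `lag`. On going idle, clock_pelt is
    re-synced with clock_task, consuming the lag as decay time.
    """
    if is_idle:
        return clock_pelt + delta_us + lag, 0
    scaled = delta_us * freq_scale // SCHED_CAPACITY
    return clock_pelt + scaled, lag + (delta_us - scaled)

# Run 9 ms at Fmin = Fmax/3 (freq_scale ~ 341), then go idle for 1 ms:
cp, lag = advance_clock_pelt(0, 0, 9000, 341, is_idle=False)
# cp advanced only ~3 ms; ~6 ms of lag accumulated.
cp, lag = advance_clock_pelt(cp, lag, 1000, 341, is_idle=True)
# After the idle sync, clock_pelt has caught up with clock_task (10 ms).
```

In this model the 9 ms of wall-clock running contributes only ~3 ms of pelt time, which is why util_sum accrues at 1/3 of the Fmax rate, while the idle sync keeps the long-term pelt clock aligned with wall time.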
Hi Qais,

On 2023-05-18 12:30, Qais Yousef wrote:
> Please CC sched maintainers (Ingo + Peter) next time as they should pick this
> up ultimately and they won't see it from the list only.

Will do. I was using the get_maintainers script and I thought that gave
me all the CCs.

> On 05/05/23 16:24, Hongyan Xia wrote:
>> The story in 5.2 about util_avg abruptly jumping from 300 when
>> Fmax/Fmin == 3 to 1024 when Fmax/Fmin == 4 hides some details about how
>> clock_pelt works behind the scenes. Explicitly mention it to make it
>> easier for readers to follow.
>>
>> Signed-off-by: Hongyan Xia <hongyan.xia2@arm.com>
>> Cc: Qais Yousef <qyousef@layalina.io>
>> Cc: Vincent Guittot <vincent.guittot@linaro.org>
>> ---
>>  Documentation/scheduler/sched-util-clamp.rst | 17 +++++++++++++++++
>>  1 file changed, 17 insertions(+)
>>
>> diff --git a/Documentation/scheduler/sched-util-clamp.rst b/Documentation/scheduler/sched-util-clamp.rst
>> index 74d5b7c6431d..524df07bceba 100644
>> --- a/Documentation/scheduler/sched-util-clamp.rst
>> +++ b/Documentation/scheduler/sched-util-clamp.rst
>> @@ -669,6 +669,19 @@ but not proportional to Fmax/Fmin.
>>
>>      p0->util_avg = 300 + small_error
>>
>> +The reason why util_avg is around 300 even though it runs for 900 at Fmin is:
>> +Although running at Fmin reduces the rate of rq_clock_pelt() to 1/3 thus
>> +accumulates util_sum at 1/3 of the rate at Fmax, the clock period
>> +(rq_clock_pelt() now minus previous rq_clock_pelt()) in:
>> +
>> +::
>> +
>> +    util_sum / clock period = util_avg
>> +
>> +does not shrink to 1/3, since rq->clock_pelt is periodically synchronized with
>> +rq->clock_task as long as there's idle time. As a result, we get util_avg of
>> +about 300, not 900.
>> +
>
> I feel neutral about these changes. It does answer some questions, but poses
> more questions like what is clock_pelt. So we might end up in recursive
> regression of explaining the explanation.
>
> I don't think we have a doc about clock_pelt. Worth adding one and just add
> a reference to it from here for those interested in understanding more details
> on why we need to go to idle to correct util_avg? I think our code has
> explanation, a reference to update_rq_clock_pelt() might suffice too.
>
> Vincent, do you have an opinion here?

Sounds reasonable. I don't mind drafting a doc or just a couple of
paragraphs for clock_pelt (or all the different clocks like clock,
clock_task, clock_idle_*), if that's what we can agree on.

Hongyan
On Thu, 18 May 2023 at 14:42, Hongyan Xia <hongyan.xia2@arm.com> wrote:
>
> Hi Qais,
>
> On 2023-05-18 12:30, Qais Yousef wrote:
> > Please CC sched maintainers (Ingo + Peter) next time as they should pick this
> > up ultimately and they won't see it from the list only.
>
> Will do. I was using the get_maintainers script and I thought that gave
> me all the CCs.
>
> > On 05/05/23 16:24, Hongyan Xia wrote:
> >> The story in 5.2 about util_avg abruptly jumping from 300 when
> >> Fmax/Fmin == 3 to 1024 when Fmax/Fmin == 4 hides some details about how
> >> clock_pelt works behind the scenes. Explicitly mention it to make it
> >> easier for readers to follow.
> >>
> >> Signed-off-by: Hongyan Xia <hongyan.xia2@arm.com>
> >> Cc: Qais Yousef <qyousef@layalina.io>
> >> Cc: Vincent Guittot <vincent.guittot@linaro.org>
> >> ---
> >>  Documentation/scheduler/sched-util-clamp.rst | 17 +++++++++++++++++
> >>  1 file changed, 17 insertions(+)
> >>
> >> diff --git a/Documentation/scheduler/sched-util-clamp.rst b/Documentation/scheduler/sched-util-clamp.rst
> >> index 74d5b7c6431d..524df07bceba 100644
> >> --- a/Documentation/scheduler/sched-util-clamp.rst
> >> +++ b/Documentation/scheduler/sched-util-clamp.rst
> >> @@ -669,6 +669,19 @@ but not proportional to Fmax/Fmin.
> >>
> >>      p0->util_avg = 300 + small_error
> >>
> >> +The reason why util_avg is around 300 even though it runs for 900 at Fmin is:

What does it mean running for 900 at Fmin ? util_avg is a ratio in the
range [0:1024] without time unit

> >> +Although running at Fmin reduces the rate of rq_clock_pelt() to 1/3 thus
> >> +accumulates util_sum at 1/3 of the rate at Fmax, the clock period
> >> +(rq_clock_pelt() now minus previous rq_clock_pelt()) in:
> >> +
> >> +::
> >> +
> >> +    util_sum / clock period = util_avg

I don't get the meaning of the formula above ? There is no "clock
period" (although I'm not sure what it means here) involved when
computing util_avg

Also, there is no linear relation between util_avg and Fmin/Fmax
ratio. Fmin/Fmax ratio is meaningful in regards to the ratio between
running time and period time of a periodic task. I understand the
reference of pelt in this document as a quite simplified description
of PELT so I'm not sure that adding a partial explanation will help.
It will probably cause more confusion to people. The only thing that
is sure, is that PELT expects some idle time to stay fully invariant
for periodic task

> >> +
> >> +does not shrink to 1/3, since rq->clock_pelt is periodically synchronized with
> >> +rq->clock_task as long as there's idle time. As a result, we get util_avg of
> >> +about 300, not 900.
> >> +
> >
> > I feel neutral about these changes. It does answer some questions, but poses
> > more questions like what is clock_pelt. So we might end up in recursive
> > regression of explaining the explanation.
> >
> > I don't think we have a doc about clock_pelt. Worth adding one and just add
> > a reference to it from here for those interested in understanding more details
> > on why we need to go to idle to correct util_avg? I think our code has
> > explanation, a reference to update_rq_clock_pelt() might suffice too.
> >
> > Vincent, do you have an opinion here?
>
> Sounds reasonable. I don't mind drafting a doc or just a couple of
> paragraphs for clock_pelt (or all the different clocks like clock,
> clock_task, clock_idle_*), if that's what we can agree on.

I don't have a strong opinion on adding a doc on PELT.

>
> Hongyan
On 23/05/2023 11:23, Vincent Guittot wrote:
> On Thu, 18 May 2023 at 14:42, Hongyan Xia <hongyan.xia2@arm.com> wrote:
>>
>> Hi Qais,
>>
>> On 2023-05-18 12:30, Qais Yousef wrote:
>>> Please CC sched maintainers (Ingo + Peter) next time as they should pick this
>>> up ultimately and they won't see it from the list only.
>>
>> Will do. I was using the get_maintainers script and I thought that gave
>> me all the CCs.
>>
>>> On 05/05/23 16:24, Hongyan Xia wrote:

[...]

>>>> diff --git a/Documentation/scheduler/sched-util-clamp.rst b/Documentation/scheduler/sched-util-clamp.rst
>>>> index 74d5b7c6431d..524df07bceba 100644
>>>> --- a/Documentation/scheduler/sched-util-clamp.rst
>>>> +++ b/Documentation/scheduler/sched-util-clamp.rst
>>>> @@ -669,6 +669,19 @@ but not proportional to Fmax/Fmin.
>>>>
>>>>      p0->util_avg = 300 + small_error
>>>>
>>>> +The reason why util_avg is around 300 even though it runs for 900 at Fmin is:
>
> What does it mean running for 900 at Fmin ? util_avg is a ratio in the
> range [0:1024] without time unit
>
>>>> +Although running at Fmin reduces the rate of rq_clock_pelt() to 1/3 thus
>>>> +accumulates util_sum at 1/3 of the rate at Fmax, the clock period
>>>> +(rq_clock_pelt() now minus previous rq_clock_pelt()) in:
>>>> +
>>>> +::
>>>> +
>>>> +    util_sum / clock period = util_avg
>
> I don't get the meaning of the formula above ? There is no "clock
> period" (although I'm not sure what it means here) involved when
> computing util_avg

I also didn't get this one. IMHO, the relation between util_avg and
util_sum is `divider = LOAD_AVG_MAX - 1024 + avg->period_contrib`. But I
can't see how this matters here.

The crucial point here is IMHO that as long as we have idle time
(p->util_avg < CPU (current) capacity) the util_avg will not rise to
1024, since at wakeup util_avg will only be decayed (since the task was
sleeping, i.e. !!se->on_rq = 0). And we are scale invariant thanks to
the functionality in update_rq_clock_pelt() (which is executed when p is
running).

The pelt clock update at this moment (wakeup) is setting clock_pelt to
clock_task since rq->curr is the idle task, but IMHO that is not the
reason why p->util_avg behaves like this.

The moment `p->util_avg >= CPU (current) capacity` there is no idle time
left, i.e. no such `only decay` updates happen for p anymore (only
`accrue/decay` updates in tick), and the result is that p->util_avg goes
to 1024.

> Also, there is no linear relation between util_avg and Fmin/Fmax
> ratio. Fmin/Fmax ratio is meaningful in regards to the ratio between
> running time and period time of a periodic task. I understand the
> reference of pelt in this document as a quite simplified description
> of PELT so I'm not sure that adding a partial explanation will help.
> It will probably cause more confusion to people. The only thing that
> is sure, is that PELT expects some idle time to stay fully invariant
> for periodic task

+1 ... we have to be able to understand the code.

BTW, schedutil.rst also has paragraphs about PELT and `Frequency / CPU
Invariance`, and also refers to kernel/sched/pelt.h:update_rq_clock_pelt()
for details.

[...]
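The divider relation quoted above can be written out numerically. This is a hedged sketch in Python: the LOAD_AVG_MAX constant and the divider expression follow kernel/sched/pelt.c, but the helper function itself is this example's own, not a kernel API.

```python
LOAD_AVG_MAX = 47742  # max of the PELT geometric series (y^32 = 0.5)

def to_util_avg(util_sum, period_contrib):
    """Map util_sum back to util_avg using the quoted relation:
    divider = LOAD_AVG_MAX - 1024 + avg->period_contrib."""
    divider = LOAD_AVG_MAX - 1024 + period_contrib
    return util_sum // divider

# A continuously running task saturates util_sum near LOAD_AVG_MAX * 1024,
# which maps back to the maximal util_avg of 1024:
print(to_util_avg(LOAD_AVG_MAX * 1024, 1024))  # -> 1024
```

So the "clock period" wording in the patch has no direct counterpart in this computation, which is the objection raised in the thread: util_avg is derived from util_sum via a fixed divider, not from a per-update clock delta.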
diff --git a/Documentation/scheduler/sched-util-clamp.rst b/Documentation/scheduler/sched-util-clamp.rst
index 74d5b7c6431d..524df07bceba 100644
--- a/Documentation/scheduler/sched-util-clamp.rst
+++ b/Documentation/scheduler/sched-util-clamp.rst
@@ -669,6 +669,19 @@ but not proportional to Fmax/Fmin.
 
     p0->util_avg = 300 + small_error
 
+The reason why util_avg is around 300 even though it runs for 900 at Fmin is:
+Although running at Fmin reduces the rate of rq_clock_pelt() to 1/3 thus
+accumulates util_sum at 1/3 of the rate at Fmax, the clock period
+(rq_clock_pelt() now minus previous rq_clock_pelt()) in:
+
+::
+
+    util_sum / clock period = util_avg
+
+does not shrink to 1/3, since rq->clock_pelt is periodically synchronized with
+rq->clock_task as long as there's idle time. As a result, we get util_avg of
+about 300, not 900.
+
 Now if the ratio of Fmax/Fmin is 4, the maximum value becomes:
 
 ::
@@ -682,6 +695,10 @@ this happens, then the _actual_ util_avg will become:
 
     p0->util_avg = 1024
 
+This is because rq->clock_pelt is no longer synchronized with the task clock.
+The clock period therefore is proportionally shrunk by the same ratio of
+(Fmax/Fmin), giving us a maximal util_avg of 1024.
+
 If task p1 wakes up on this CPU, which have:
 
 ::
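The two outcomes the patch describes — util_avg settling around 300 when Fmax/Fmin == 3, and saturating at 1024 when Fmax/Fmin == 4 — can be reproduced with a toy PELT model. This is a simplification under stated assumptions, not kernel code: millisecond granularity, y^32 = 0.5, and the idle-time re-sync reduced to "busy stretches count at their Fmax-equivalent length in pelt time, the rest of the window decays".

```python
HALFLIFE_MS = 32                 # PELT halflife: y**32 == 0.5
Y = 0.5 ** (1.0 / HALFLIFE_MS)

def steady_util_avg(work_ms, window_ms, fmax_over_fmin, cycles=200):
    """Time-averaged steady-state util_avg of a periodic task needing
    `work_ms` of Fmax-time every `window_ms`, while pinned to Fmin."""
    wall_run = work_ms * fmax_over_fmin       # wall-clock running time
    if wall_run >= window_ms:
        # No idle time left: clock_pelt is never re-synced, the task
        # looks always-running and util_avg converges to 1024.
        busy, idle = window_ms, 0
    else:
        # Idle time exists: in pelt time the busy stretch shrinks back
        # to work_ms and the rest of the window decays.
        busy, idle = work_ms, window_ms - work_ms
    u, last_cycle = 0.0, []
    for c in range(cycles):
        for _ in range(busy):
            u = u * Y + 1024 * (1 - Y)        # accrue while running
            if c == cycles - 1:
                last_cycle.append(u)
        for _ in range(idle):
            u = u * Y                         # decay while idle
            if c == cycles - 1:
                last_cycle.append(u)
    return sum(last_cycle) / len(last_cycle)

print(round(steady_util_avg(300, 1024, 3)))   # ~300: idle time remains
print(round(steady_util_avg(300, 1024, 4)))   # 1024: no idle time left
```

This also illustrates the point made in the review: the relation to the Fmax/Fmin ratio is not linear. The result stays pinned near 300 for any ratio that leaves idle time in the window, then jumps discontinuously to 1024 once the required running time exceeds the window.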