Message ID | 4506480.LvFx2qVVIh@kreacher
---|---
State | New
Headers |
From: "Rafael J. Wysocki" <rjw@rjwysocki.net>
To: Linux PM <linux-pm@vger.kernel.org>
Cc: LKML <linux-kernel@vger.kernel.org>, Peter Zijlstra <peterz@infradead.org>, Anna-Maria Behnsen <anna-maria@linutronix.de>, Frederic Weisbecker <frederic@kernel.org>, Kajetan Puchalski <kajetan.puchalski@arm.com>
Subject: [PATCH v1] cpuidle: teo: Update idle duration estimate when choosing shallower state
Date: Thu, 27 Jul 2023 22:05:33 +0200
Message-ID: <4506480.LvFx2qVVIh@kreacher>
Series | [v1] cpuidle: teo: Update idle duration estimate when choosing shallower state
Commit Message
Rafael J. Wysocki
July 27, 2023, 8:05 p.m. UTC
From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

The TEO governor takes CPU utilization into account by refining idle state
selection when the utilization is above a certain threshold.  The idle state
selection is then refined by choosing an idle state shallower than the
previously selected one.

However, when this is done, the idle duration estimate needs to be updated
so as to prevent the scheduler tick from being stopped while the candidate
idle state is shallow, which may lead to excessive energy usage if the CPU
is not interrupted quickly enough going forward.  Moreover, in case the
scheduler tick has been stopped already and the new idle duration estimate
is too small, the replacement candidate state cannot be used.

Modify the relevant code to take the above observations into account.

Fixes: 9ce0f7c4bc64 ("cpuidle: teo: Introduce util-awareness")
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
---

@Peter: This doesn't attempt to fix the tick stopping problem, it just makes
the current behavior consistent.

@Anna-Maria: This is likely to basically prevent the tick from being stopped
at all if the CPU utilization is above a certain threshold.  I'm wondering if
your results will be affected by it and in what way.

---
drivers/cpuidle/governors/teo.c | 33 ++++++++++++++++++++++++++-------
1 file changed, 26 insertions(+), 7 deletions(-)
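The interaction between the tick state and the idle duration estimate that the changelog describes can be sketched as a small standalone model. This is not the kernel code itself: `TICK_NSEC` is set to an assumed 250 Hz tick, `time_ok()` stands in for `teo_time_ok()`, and `pick_state()` condenses the patched decision of whether the shallower candidate may replace the current selection.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TICK_NSEC 4000000LL  /* assumed 250 Hz tick period for this sketch */

typedef int64_t s64;

/*
 * Stand-in for teo_time_ok(): once the scheduler tick has been stopped,
 * an idle duration estimate is only acceptable if it spans at least a
 * full tick period; with the tick still running, anything goes.
 */
static bool time_ok(s64 duration_ns, bool tick_stopped)
{
	return !tick_stopped || duration_ns >= TICK_NSEC;
}

/*
 * Model of the fixed logic: switch to the shallower candidate state
 * (and adopt its idle duration estimate) only when that estimate is
 * acceptable; otherwise keep the original selection unchanged.
 */
static int pick_state(int idx, s64 *duration_ns, int shallower_idx,
		      s64 shallower_span_ns, bool tick_stopped)
{
	if (time_ok(shallower_span_ns, tick_stopped)) {
		*duration_ns = shallower_span_ns;
		return shallower_idx;
	}
	return idx;
}
```

With the tick already stopped, a sub-tick estimate for the shallower state is rejected and the deeper selection stands; with the tick running, the shallower state is taken and the duration estimate is updated along with it, which is precisely the consistency the patch adds.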
Comments
On Thu, Jul 27, 2023 at 10:05 PM Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
>
> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>
> The TEO governor takes CPU utilization into account by refining idle state
> selection when the utilization is above a certain threshold.
[...]
> @@ -397,13 +397,22 @@ static int teo_select(struct cpuidle_dri
>  	 * the shallowest non-polling state and exit.
>  	 */
>  	if (drv->state_count < 3 && cpu_data->utilized) {
> -		for (i = 0; i < drv->state_count; ++i) {
> -			if (!dev->states_usage[i].disable &&
> -			    !(drv->states[i].flags & CPUIDLE_FLAG_POLLING)) {
> -				idx = i;
> +		/*
> +		 * If state 0 is enabled and it is not a polling one, select it
> +		 * right away and update the idle duration estimate accordingly,
> +		 * unless the scheduler tick has been stopped.
> +		 */
> +		if (!idx && !(drv->states[0].flags & CPUIDLE_FLAG_POLLING)) {
> +			s64 span_ns = teo_middle_of_bin(0, drv);
> +
> +			if (teo_time_ok(span_ns)) {
> +				duration_ns = span_ns;
>  				goto end;
>  			}
>  		}
> +		/* Assume that state 1 is not a polling one and select it. */

Well, I should also check if it is not disabled.  Will send a v2 tomorrow.

> +		idx = 1;
> +		goto end;
>  	}
[...]
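The check Rafael says is missing from the fallback path can be sketched in isolation. The struct layouts below are simplified stand-ins for the cpuidle structures, and folding the "not disabled" test into the fallback is an assumption about what the v2 will do, based only on his remark above.

```c
#include <assert.h>
#include <stdbool.h>

#define CPUIDLE_FLAG_POLLING 0x1

/* Simplified stand-ins for the cpuidle per-state structures. */
struct state { unsigned int flags; };
struct state_usage { bool disable; };

/*
 * A state is usable as the fallback selection only if it has not been
 * disabled and is not a polling state; the v1 fallback to state 1
 * checked neither, which is what the promised v2 would tighten up.
 */
static bool state_usable(const struct state *s, const struct state_usage *u)
{
	return !u->disable && !(s->flags & CPUIDLE_FLAG_POLLING);
}
```

This mirrors the condition the original loop applied to every state, which the v1 rewrite dropped for the state-1 fallback.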
On Thu, Jul 27, 2023 at 10:12:56PM +0200, Rafael J. Wysocki wrote:
> On Thu, Jul 27, 2023 at 10:05 PM Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> >
> > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> >
> > The TEO governor takes CPU utilization into account by refining idle state
> > selection when the utilization is above a certain threshold.
[...]
> > @@ -539,10 +548,20 @@ static int teo_select(struct cpuidle_dri
> >
> >  	/*
> >  	 * If the CPU is being utilized over the threshold, choose a shallower
> > -	 * non-polling state to improve latency
> > +	 * non-polling state to improve latency, unless the scheduler tick has
> > +	 * been stopped already and the shallower state's target residency is
> > +	 * not sufficiently large.
> >  	 */
> > -	if (cpu_data->utilized)
> > -		idx = teo_find_shallower_state(drv, dev, idx, duration_ns, true);
> > +	if (cpu_data->utilized) {
> > +		s64 span_ns;
> > +
> > +		i = teo_find_shallower_state(drv, dev, idx, duration_ns, true);
> > +		span_ns = teo_middle_of_bin(i, drv);
> > +		if (teo_time_ok(span_ns)) {
> > +			idx = i;
> > +			duration_ns = span_ns;
> > +		}
> > +	}

So I'm not a huge fan of that utilized thing to begin with.. that feels
like a hack. I think my patch 3 would achieve much the same, because if
busy, you'll have short idles, which will drive the hit+intercept to
favour low states, and voila.

I didn't take it out -- yet -- because I haven't had much time to
evaluate it.

Simply lowering one state at a random busy threshold is duct-tape if
ever I saw some.
On Sat, Jul 29, 2023 at 11:02:55AM +0200, Peter Zijlstra wrote:
> On Thu, Jul 27, 2023 at 10:12:56PM +0200, Rafael J. Wysocki wrote:
> > On Thu, Jul 27, 2023 at 10:05 PM Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> > >
> > > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
[...]
> So I'm not a huge fan of that utilized thing to begin with.. that feels
> like a hack. I think my patch 3 would achieve much the same, because if
> busy, you'll have short idles, which will drive the hit+intercept to
> favour low states, and voila.

Not exactly. Simply relying on the hit/intercept metrics, while functional, still just amounts to pretty much guessing, as it does not take any information about what the CPU might be doing into account (beyond the timer events, but that's the case for both approaches).

Apart from the approach of "extrapolating future results from past mistakes" being slightly questionable to begin with, it's in my view made even worse by the fact that the metrics are per-CPU, meaning they get essentially invalidated when tasks get migrated between cores. Using just the hit/intercept metrics approach you end up bumping into the two scenarios below:

1) Workload with short idle times -> governor selects too deep states, then adjusts to shallower idle -> workload changes to longer idle times -> governor selects too shallow, then adjusts to deeper -> workload changes back to shorter idle -> governor keeps selecting too deep states before adjusting.

From looking at many traces I had, this happens pretty often and we end up with the governor selecting deep idle while the avg util on the CPU is still massive; by looking at the util we could clearly tell that deep idle here would be a mistake. The metrics cannot avoid making that mistake, they need to make several of them in order to adjust. You can just get stuck ping-ponging between being wrong both ways.

2) A reasonably large task gets migrated onto a different CPU. The metrics on the target CPU still favour deeper idle as it wasn't doing anything up until the migration; the metrics on the previous CPU favour shallower states because of the workload having just run there. Now you have the target CPU selecting too deep states before it can adjust, and the previous one selecting too shallow.

With the util approach on the other hand, the change in util will be reflected right away, so we can avoid making said mistakes on both the cores.

> I didn't take it out -- yet -- because I haven't had much time to
> evaluate it.
>
> Simply lowering one state at a random busy threshold is duct-tape if
> ever I saw some.

There might be a platform difference here. I do think it probably makes more sense on Arm and similar platforms where we only have 2 states to choose from, so you use the threshold to distinguish between 'deep idle desirable' and 'deep idle not desirable'. It does feel slightly more hacky on Intel and other platforms with however many states those have, as instead of "change scenario A to B" it ends up more like "lower scenario A". That doesn't make it a bad idea though; it can still be beneficial and bring improvements, just like on Arm, I think. My initial suggestion was to make this a separate governor for the platforms and use cases where this makes sense.

Besides, the threshold isn't random - it's just empirically the level that worked best for the approach. As I wrote in the other thread, it might benefit from being tweaked depending on the platform. That's not unique to this patchset in any case; the kernel is full of these arbitrary numbers that come from "worked on the developer's machine" and not much else, after all.

I put the numbers from testing this in the original thread for the patchset; the util approach was consistently getting far fewer too-deep sleeps than the metrics approach in all the workloads I tested, to the point of being noticeable on both the performance and power usage plots for our use cases (Android mobile phone). I never advocated for this to be made the default, but it is useful for our side of the industry, so at the very least we should have it as an option. In my view, given that x86 and Arm do cpuidle very differently, we probably should have separate governors instead of trying to make a one-size-fits-all approach, but that's a different story.
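The lag that Kajetan attributes to the hit/intercept metrics can be illustrated with a toy model. Everything here is illustrative rather than the actual TEO bookkeeping: the decay-and-bump update loosely mirrors how TEO ages its bins, and `UTIL_THRESHOLD` is a hypothetical busy threshold, not the value used by the patchset.

```c
#include <assert.h>
#include <stdbool.h>

#define UTIL_THRESHOLD 60  /* hypothetical busy threshold, percent */

struct metric {
	int shallow_score;  /* decayed count of short ("intercepted") idles */
	int deep_score;     /* decayed count of long idles */
};

/* Age both scores, then credit the bin matching the observed idle. */
static void metric_update(struct metric *m, bool idle_was_short)
{
	m->shallow_score = m->shallow_score / 2 + (idle_was_short ? 4 : 0);
	m->deep_score = m->deep_score / 2 + (idle_was_short ? 0 : 4);
}

static bool metric_prefers_shallow(const struct metric *m)
{
	return m->shallow_score > m->deep_score;
}

/* The util signal needs no history: it flips as soon as util does. */
static bool util_prefers_shallow(int util_pct)
{
	return util_pct > UTIL_THRESHOLD;
}
```

Starting from a history that favours deep idle, the metric still prefers deep after the first short idle and only flips after repeated short idles, i.e. after making the mistake more than once, whereas the util-based check reacts to a single reading. This is the "several mistakes to adjust" behaviour described in scenarios 1) and 2) above.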
Index: linux-pm/drivers/cpuidle/governors/teo.c
===================================================================
--- linux-pm.orig/drivers/cpuidle/governors/teo.c
+++ linux-pm/drivers/cpuidle/governors/teo.c
@@ -397,13 +397,22 @@ static int teo_select(struct cpuidle_dri
 	 * the shallowest non-polling state and exit.
 	 */
 	if (drv->state_count < 3 && cpu_data->utilized) {
-		for (i = 0; i < drv->state_count; ++i) {
-			if (!dev->states_usage[i].disable &&
-			    !(drv->states[i].flags & CPUIDLE_FLAG_POLLING)) {
-				idx = i;
+		/*
+		 * If state 0 is enabled and it is not a polling one, select it
+		 * right away and update the idle duration estimate accordingly,
+		 * unless the scheduler tick has been stopped.
+		 */
+		if (!idx && !(drv->states[0].flags & CPUIDLE_FLAG_POLLING)) {
+			s64 span_ns = teo_middle_of_bin(0, drv);
+
+			if (teo_time_ok(span_ns)) {
+				duration_ns = span_ns;
 				goto end;
 			}
 		}
+		/* Assume that state 1 is not a polling one and select it. */
+		idx = 1;
+		goto end;
 	}
 
 	/*
@@ -539,10 +548,20 @@ static int teo_select(struct cpuidle_dri
 
 	/*
 	 * If the CPU is being utilized over the threshold, choose a shallower
-	 * non-polling state to improve latency
+	 * non-polling state to improve latency, unless the scheduler tick has
+	 * been stopped already and the shallower state's target residency is
+	 * not sufficiently large.
 	 */
-	if (cpu_data->utilized)
-		idx = teo_find_shallower_state(drv, dev, idx, duration_ns, true);
+	if (cpu_data->utilized) {
+		s64 span_ns;
+
+		i = teo_find_shallower_state(drv, dev, idx, duration_ns, true);
+		span_ns = teo_middle_of_bin(i, drv);
+		if (teo_time_ok(span_ns)) {
+			idx = i;
+			duration_ns = span_ns;
+		}
+	}
 
  end:
 	/*