Message ID | 20221130153204.2085591-1-kajetan.puchalski@arm.com |
---|---|
Headers |
From: Kajetan Puchalski <kajetan.puchalski@arm.com>
To: rafael@kernel.org
Cc: daniel.lezcano@linaro.org, lukasz.luba@arm.com, Dietmar.Eggemann@arm.com, dsmythies@telus.net, yu.chen.surf@gmail.com, kajetan.puchalski@arm.com, linux-pm@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: [RFC PATCH v5 0/2] cpuidle: teo: Introduce util-awareness
Date: Wed, 30 Nov 2022 15:32:02 +0000
Message-Id: <20221130153204.2085591-1-kajetan.puchalski@arm.com> |
Series |
cpuidle: teo: Introduce util-awareness
|
|
Message
Kajetan Puchalski
Nov. 30, 2022, 3:32 p.m. UTC
Hi,

At the moment, all the available idle governors operate mainly based on their own past correctness metrics along with timer events, without taking into account any scheduling information. Especially on interactive systems, this results in them frequently selecting a deeper idle state and then waking up before its target residency is hit, leading to increased wakeup latency and lower performance with no power saving. For 'menu' while web browsing on Android, for instance, those types of wakeups ('too deep') account for over 24% of all wakeups.

At the same time, on some platforms C0 can be power efficient enough to warrant preferring it over C1. This is because the power usage of the two states can be so close that enough too deep C1 sleeps can completely offset the C1 power saving, to the point where it would have been more power efficient to just use C0 instead. Sleeps that happened in C0 while they could have used C1 ('too shallow') merely save less power than they otherwise could have. Too deep sleeps, on the other hand, harm performance and nullify the potential power saving from using C1 in the first place. Taking this into account, it is clear that on balance it is preferable for an idle governor to have more too shallow sleeps than too deep sleeps on those kinds of platforms.

Currently the best available governor under this metric is TEO, which on average results in less than half the percentage of too deep sleeps compared to 'menu', achieving much better wakeup latencies and increased performance in the process.

This proposed optional extension to TEO would specifically tune it for minimising too deep sleeps and minimising latency to achieve better performance. To this end, before selecting the next idle state it uses the avg_util signal of a CPU's runqueue to determine to what extent the CPU is being utilized. This util value is then compared to a threshold defined as a percentage of the CPU's capacity (capacity >> 6, i.e. ~1.5% in the current implementation). If the util is above the threshold, the idle state selected by the TEO metrics is reduced by 1, thus selecting a shallower state. If the util is below the threshold, the governor defaults to the TEO metrics mechanism to try to select the deepest available idle state based on the closest timer event and its own correctness.

The main goal of this is to reduce latency and increase performance for some workloads. Under some workloads it will result in an increase in power usage (Geekbench 5), while for other workloads it will also result in a decrease in power usage compared to TEO (PCMark Web, Jankbench, Speedometer).

As of v2 the patch includes a 'fast exit' path for arm-based and similar systems where only 2 idle states are present. If there are just 2 idle states and the CPU is utilized, we can directly select the shallowest state and save cycles by skipping the entire metrics mechanism.

Under the current implementation, the state will not be reduced by 1 if the change would lead to selecting a polling state instead of a non-polling state.

This approach can outperform all the other currently available governors, at least on mobile device workloads, which is why I think it is worth keeping as an option. There is no particular attachment or reliance on TEO for this mechanism; I simply chose to base it on TEO because it performs the best out of all the available options and I didn't think there was any point in reinventing the wheel on the side of computing governor metrics. If a better approach comes along at some point, there's no reason why the same util-aware mechanism couldn't be used with any other metrics algorithm. That would, however, require implementing it as a separate governor rather than a TEO add-on.
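The selection flow described in this cover letter (util threshold, reduce-by-one, polling guard, 2-state fast exit) can be sketched in user-space C roughly as follows. This is not the patch itself: struct idle_state, util_threshold(), util_aware_select(), the teo_metrics_fn callback and the example platforms are all hypothetical stand-ins for the cpuidle structures and the regular TEO metrics. The sketch assumes states are ordered from shallowest to deepest and that disabled states are simply skipped.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-in for a cpuidle state (shallowest first). */
struct idle_state {
	bool polling;	/* a polling "state" rather than a real idle state */
	bool disabled;
};

/* Hypothetical callback standing in for the regular TEO metrics. */
typedef int (*teo_metrics_fn)(void);

/* Threshold from the description: capacity >> 6, i.e. 1/64 (~1.56%). */
static unsigned long util_threshold(unsigned long capacity)
{
	return capacity >> 6;
}

static int util_aware_select(const struct idle_state *states, int count,
			     unsigned long util, unsigned long capacity,
			     teo_metrics_fn teo_metrics)
{
	bool utilized = util > util_threshold(capacity);
	int idx, i;

	/* Fast exit: only 2 states and utilized, so skip the metrics entirely. */
	if (utilized && count < 3) {
		for (i = 0; i < count; i++)
			if (!states[i].disabled && !states[i].polling)
				return i;
	}

	idx = teo_metrics();
	if (!utilized)
		return idx;	/* below the threshold: keep the TEO choice */

	/* Above the threshold: step down to the next enabled shallower state... */
	for (i = idx - 1; i >= 0; i--) {
		if (states[i].disabled)
			continue;
		/* ...but not if that means polling instead of a real idle state. */
		if (states[i].polling && !states[idx].polling)
			return idx;
		return i;
	}
	return idx;
}

/* Made-up example platforms and metrics results for the usage below. */
static const struct idle_state two_states[] = { { false, false }, { false, false } };
static const struct idle_state poll_plus_two[] = {
	{ true, false },	/* polling state at index 0 */
	{ false, false },
	{ false, false },
};
static int metrics_pick_1(void) { return 1; }
static int metrics_pick_2(void) { return 2; }
```

With a capacity of 1024 the threshold is 16, so util_aware_select(poll_plus_two, 3, 100, 1024, metrics_pick_2) steps down from 2 to 1, while the same call with metrics_pick_1 stays at 1 because stepping down would land on the polling state. On two_states the utilized case returns 0 without ever calling the metrics callback.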
As for how the extension performs in practice, below are some benchmark results I got while testing this patchset. All the benchmarks were run after holding the phone in the fridge for exactly an hour each time to minimise the impact of thermal issues.

Pixel 6 (Android 12, mainline kernel 5.18, with newer mainline CFS patches):

1. Geekbench 5 (latency-sensitive, heavy load test)

The values below are gmean values across 3 back to back iterations of Geekbench 5. As GB5 is a heavy benchmark, after more than 3 iterations intense throttling kicks in on mobile devices, resulting in skewed benchmark scores, which makes it difficult to collect reliable results. The actual values for all of the governors can change between runs, as the benchmark might be affected by factors other than just latency. Nevertheless, on the runs I've seen, util-aware TEO frequently achieved better scores than all the other governors.

Benchmark scores

+-----------------+-------------+---------+-------------+
| metric          | kernel      | value   | perc_diff   |
|-----------------+-------------+---------+-------------|
| multicore_score | menu        | 2826.5  | 0.0%        |
| multicore_score | teo         | 2764.8  | -2.18%      |
| multicore_score | teo_util_v3 | 2849    | 0.8%        |
| multicore_score | teo_util_v4 | 2865    | 1.36%       |
| score           | menu        | 1053    | 0.0%        |
| score           | teo         | 1050.7  | -0.22%      |
| score           | teo_util_v3 | 1059.6  | 0.63%       |
| score           | teo_util_v4 | 1057.6  | 0.44%       |
+-----------------+-------------+---------+-------------+

Idle misses

The numbers are percentages of too deep and too shallow sleeps computed using the new trace event, cpu_idle_miss. The percentage is obtained by counting the two types of misses over the course of a run and then dividing them by the total number of wakeups in that run.
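A minimal sketch of how the count_perc and perc_diff columns in these tables can be derived. It assumes a simplified miss classification in the spirit of the cpu_idle_miss trace event: a wakeup counts as too deep when the CPU slept for less than the target residency of the state it entered, and as too shallow when it slept at least as long as the next deeper state's target residency. The kernel's own accounting differs in details (for example, it skips disabled states), and all names and residency values here are hypothetical.

```c
enum idle_miss { MISS_NONE, MISS_TOO_DEEP, MISS_TOO_SHALLOW };

/* Made-up target residencies in microseconds, shallowest state first. */
static const unsigned int example_residency_us[] = { 1, 500 };

/*
 * Simplified classification (assumption): entered is the state actually used
 * and slept_us is the measured idle duration.
 */
static enum idle_miss classify_wakeup(const unsigned int *residency_us,
				      int count, int entered,
				      unsigned int slept_us)
{
	if (slept_us < residency_us[entered])
		return MISS_TOO_DEEP;		/* woke before the target residency */
	if (entered + 1 < count && slept_us >= residency_us[entered + 1])
		return MISS_TOO_SHALLOW;	/* the next deeper state would have fit */
	return MISS_NONE;
}

/* count_perc: misses of one type over all wakeups in the run, in percent. */
static double count_perc(unsigned long misses, unsigned long wakeups)
{
	return wakeups ? 100.0 * (double)misses / (double)wakeups : 0.0;
}

/* perc_diff: relative change versus the baseline (menu) row, in percent. */
static double perc_diff(double value, double baseline)
{
	return 100.0 * (value - baseline) / baseline;
}
```

For example, perc_diff(2865, 2826.5) evaluates to about 1.36, which matches the teo_util_v4 multicore_score row of the Geekbench 5 table.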
+-------------+-------------+--------------+
| wa_path     | type        | count_perc   |
|-------------+-------------+--------------|
| menu        | too deep    | 14.994%      |
| teo         | too deep    | 9.649%       |
| teo_util_v3 | too deep    | 4.298%       |
| teo_util_v4 | too deep    | 4.02%        |
| menu        | too shallow | 2.497%       |
| teo         | too shallow | 5.963%       |
| teo_util_v3 | too shallow | 13.773%      |
| teo_util_v4 | too shallow | 14.598%      |
+-------------+-------------+--------------+

Power usage [mW]

+--------------+----------+-------------+---------+-------------+
| chan_name    | metric   | kernel      | value   | perc_diff   |
|--------------+----------+-------------+---------+-------------|
| total_power  | gmean    | menu        | 2551.4  | 0.0%        |
| total_power  | gmean    | teo         | 2606.8  | 2.17%       |
| total_power  | gmean    | teo_util_v3 | 2670.1  | 4.65%       |
| total_power  | gmean    | teo_util_v4 | 2722.3  | 6.7%        |
+--------------+----------+-------------+---------+-------------+

Task wakeup latency

+-----------------+----------+-------------+-------------+-------------+
| comm            | metric   | kernel      | value       | perc_diff   |
|-----------------+----------+-------------+-------------+-------------|
| AsyncTask #1    | gmean    | menu        | 78.16μs     | 0.0%        |
| AsyncTask #1    | gmean    | teo         | 61.60μs     | -21.19%     |
| AsyncTask #1    | gmean    | teo_util_v3 | 74.34μs     | -4.89%      |
| AsyncTask #1    | gmean    | teo_util_v4 | 54.45μs     | -30.34%     |
| labs.geekbench5 | gmean    | menu        | 88.55μs     | 0.0%        |
| labs.geekbench5 | gmean    | teo         | 100.97μs    | 14.02%      |
| labs.geekbench5 | gmean    | teo_util_v3 | 53.57μs     | -39.5%      |
| labs.geekbench5 | gmean    | teo_util_v4 | 59.60μs     | -32.7%      |
+-----------------+----------+-------------+-------------+-------------+

In the case of this benchmark, the difference in latency does seem to translate into better scores.

2. PCMark Web Browsing (non latency-sensitive, normal usage web browsing test)

The table below contains gmean values across 20 back to back iterations of PCMark 2 Web Browsing.
Benchmark scores

+----------------+-------------+---------+-------------+
| metric         | kernel      | value   | perc_diff   |
|----------------+-------------+---------+-------------|
| PcmaWebV2Score | menu        | 5232    | 0.0%        |
| PcmaWebV2Score | teo         | 5219.8  | -0.23%      |
| PcmaWebV2Score | teo_util_v3 | 5273.5  | 0.79%       |
| PcmaWebV2Score | teo_util_v4 | 5239.9  | 0.15%       |
+----------------+-------------+---------+-------------+

Idle misses

+-------------+-------------+--------------+
| wa_path     | type        | count_perc   |
|-------------+-------------+--------------|
| menu        | too deep    | 24.814%      |
| teo         | too deep    | 11.65%       |
| teo_util_v3 | too deep    | 3.481%       |
| teo_util_v4 | too deep    | 3.662%       |
| menu        | too shallow | 3.101%       |
| teo         | too shallow | 8.578%       |
| teo_util_v3 | too shallow | 18.326%      |
| teo_util_v4 | too shallow | 18.692%      |
+-------------+-------------+--------------+

Power usage [mW]

+--------------+----------+-------------+---------+-------------+
| chan_name    | metric   | kernel      | value   | perc_diff   |
|--------------+----------+-------------+---------+-------------|
| total_power  | gmean    | menu        | 179.2   | 0.0%        |
| total_power  | gmean    | teo         | 184.8   | 3.1%        |
| total_power  | gmean    | teo_util_v3 | 177.4   | -1.02%      |
| total_power  | gmean    | teo_util_v4 | 184.1   | 2.71%       |
+--------------+----------+-------------+---------+-------------+

Task wakeup latency

+-----------------+----------+-------------+-------------+-------------+
| comm            | metric   | kernel      | value       | perc_diff   |
|-----------------+----------+-------------+-------------+-------------|
| CrRendererMain  | gmean    | menu        | 236.63μs    | 0.0%        |
| CrRendererMain  | gmean    | teo         | 201.85μs    | -14.7%      |
| CrRendererMain  | gmean    | teo_util_v3 | 106.46μs    | -55.01%     |
| CrRendererMain  | gmean    | teo_util_v4 | 106.72μs    | -54.9%      |
| chmark:workload | gmean    | menu        | 100.30μs    | 0.0%        |
| chmark:workload | gmean    | teo         | 80.20μs     | -20.04%     |
| chmark:workload | gmean    | teo_util_v3 | 65.88μs     | -34.32%     |
| chmark:workload | gmean    | teo_util_v4 | 57.90μs     | -42.28%     |
| surfaceflinger  | gmean    | menu        | 97.57μs     | 0.0%        |
| surfaceflinger  | gmean    | teo         | 98.86μs     | 1.31%       |
| surfaceflinger  | gmean    | teo_util_v3 | 56.49μs     | -42.1%      |
| surfaceflinger  | gmean    | teo_util_v4 | 72.68μs     | -25.52%     |
+-----------------+----------+-------------+-------------+-------------+

In this case the large latency improvement does not translate into a notable increase in benchmark score, as this particular benchmark mainly responds to changes in operating frequency.

3. Jankbench (locked 60hz screen) (normal usage UI test)

Frame durations

+---------------+------------------+---------+-------------+
| variable      | kernel           | value   | perc_diff   |
|---------------+------------------+---------+-------------|
| mean_duration | menu_60hz        | 13.9    | 0.0%        |
| mean_duration | teo_60hz         | 14.7    | 6.0%        |
| mean_duration | teo_util_v3_60hz | 13.8    | -0.87%      |
| mean_duration | teo_util_v4_60hz | 12.6    | -9.0%       |
+---------------+------------------+---------+-------------+

Jank percentage

+------------+------------------+---------+-------------+
| variable   | kernel           | value   | perc_diff   |
|------------+------------------+---------+-------------|
| jank_perc  | menu_60hz        | 1.5     | 0.0%        |
| jank_perc  | teo_60hz         | 2.1     | 36.99%      |
| jank_perc  | teo_util_v3_60hz | 1.3     | -13.95%     |
| jank_perc  | teo_util_v4_60hz | 1.3     | -17.37%     |
+------------+------------------+---------+-------------+

Idle misses

+------------------+-------------+--------------+
| wa_path          | type        | count_perc   |
|------------------+-------------+--------------|
| menu_60hz        | too deep    | 26.00%       |
| teo_60hz         | too deep    | 11.00%       |
| teo_util_v3_60hz | too deep    | 2.33%        |
| teo_util_v4_60hz | too deep    | 2.54%        |
| menu_60hz        | too shallow | 4.74%        |
| teo_60hz         | too shallow | 11.89%       |
| teo_util_v3_60hz | too shallow | 21.78%       |
| teo_util_v4_60hz | too shallow | 21.93%       |
+------------------+-------------+--------------+

Power usage [mW]

+--------------+------------------+---------+-------------+
| chan_name    | kernel           | value   | perc_diff   |
|--------------+------------------+---------+-------------|
| total_power  | menu_60hz        | 144.6   | 0.0%        |
| total_power  | teo_60hz         | 136.9   | -5.27%      |
| total_power  | teo_util_v3_60hz | 134.2   | -7.19%      |
| total_power  | teo_util_v4_60hz | 121.3   | -16.08%     |
+--------------+------------------+---------+-------------+

Task wakeup latency

+-----------------+------------------+-------------+-------------+
| comm            | kernel           | value       | perc_diff   |
|-----------------+------------------+-------------+-------------|
| RenderThread    | menu_60hz        | 139.52μs    | 0.0%        |
| RenderThread    | teo_60hz         | 116.51μs    | -16.49%     |
| RenderThread    | teo_util_v3_60hz | 86.76μs     | -37.82%     |
| RenderThread    | teo_util_v4_60hz | 91.11μs     | -34.7%      |
| droid.benchmark | menu_60hz        | 135.88μs    | 0.0%        |
| droid.benchmark | teo_60hz         | 105.21μs    | -22.57%     |
| droid.benchmark | teo_util_v3_60hz | 83.92μs     | -38.24%     |
| droid.benchmark | teo_util_v4_60hz | 83.18μs     | -38.79%     |
| surfaceflinger  | menu_60hz        | 124.03μs    | 0.0%        |
| surfaceflinger  | teo_60hz         | 151.90μs    | 22.47%      |
| surfaceflinger  | teo_util_v3_60hz | 100.19μs    | -19.22%     |
| surfaceflinger  | teo_util_v4_60hz | 87.65μs     | -29.33%     |
+-----------------+------------------+-------------+-------------+

4. Speedometer 2 (heavy load web browsing test)

Benchmark scores

+-------------------+-------------+---------+-------------+
| metric            | kernel      | value   | perc_diff   |
|-------------------+-------------+---------+-------------|
| Speedometer Score | menu        | 102     | 0.0%        |
| Speedometer Score | teo         | 104.9   | 2.88%       |
| Speedometer Score | teo_util_v3 | 102.1   | 0.16%       |
| Speedometer Score | teo_util_v4 | 103.8   | 1.83%       |
+-------------------+-------------+---------+-------------+

Idle misses

+-------------+-------------+--------------+
| wa_path     | type        | count_perc   |
|-------------+-------------+--------------|
| menu        | too deep    | 17.95%       |
| teo         | too deep    | 6.46%        |
| teo_util_v3 | too deep    | 0.63%        |
| teo_util_v4 | too deep    | 0.64%        |
| menu        | too shallow | 3.86%        |
| teo         | too shallow | 8.21%        |
| teo_util_v3 | too shallow | 14.72%       |
| teo_util_v4 | too shallow | 14.43%       |
+-------------+-------------+--------------+

Power usage [mW]

+--------------+----------+-------------+---------+-------------+
| chan_name    | metric   | kernel      | value   | perc_diff   |
|--------------+----------+-------------+---------+-------------|
| total_power  | gmean    | menu        | 2059    | 0.0%        |
| total_power  | gmean    | teo         | 2187.8  | 6.26%       |
| total_power  | gmean    | teo_util_v3 | 2212.9  | 7.47%       |
| total_power  | gmean    | teo_util_v4 | 2121.8  | 3.05%       |
+--------------+----------+-------------+---------+-------------+

Task wakeup latency

+-----------------+----------+-------------+-------------+-------------+
| comm            | metric   | kernel      | value       | perc_diff   |
|-----------------+----------+-------------+-------------+-------------|
| CrRendererMain  | gmean    | menu        | 17.18μs     | 0.0%        |
| CrRendererMain  | gmean    | teo         | 16.18μs     | -5.82%      |
| CrRendererMain  | gmean    | teo_util_v3 | 18.04μs     | 5.05%       |
| CrRendererMain  | gmean    | teo_util_v4 | 18.25μs     | 6.27%       |
| RenderThread    | gmean    | menu        | 68.60μs     | 0.0%        |
| RenderThread    | gmean    | teo         | 48.44μs     | -29.39%     |
| RenderThread    | gmean    | teo_util_v3 | 48.01μs     | -30.02%     |
| RenderThread    | gmean    | teo_util_v4 | 51.24μs     | -25.3%      |
| surfaceflinger  | gmean    | menu        | 42.23μs     | 0.0%        |
| surfaceflinger  | gmean    | teo         | 29.84μs     | -29.33%     |
| surfaceflinger  | gmean    | teo_util_v3 | 24.51μs     | -41.95%     |
| surfaceflinger  | gmean    | teo_util_v4 | 29.64μs     | -29.8%      |
+-----------------+----------+-------------+-------------+-------------+

At the very least this approach seems promising, so I wanted to discuss it in RFC form first.

Thank you for taking your time to read this!

--
Kajetan

v4 -> v5:
- remove the restriction to only apply the mechanism for C1 candidate state
- clarify some code comments, fix comment style
- refactor the fast-exit path loop implementation
- move some cover letter information into the commit description

v3 -> v4:
- remove the chunk of code skipping metrics updates when the CPU was utilized
- include new test results and more benchmarks in the cover letter

v2 -> v3:
- add a patch adding an option to skip polling states in teo_find_shallower_state()
- only reduce the state if the candidate state is C1 and C0 is not a polling state
- add a check for polling states in the 2-states fast-exit path
- remove the ifdefs and Kconfig option

v1 -> v2:
- rework the mechanism to reduce selected state by 1 instead of directly selecting C0 (suggested by Doug Smythies)
- add a fast-exit path for systems with 2 idle states to not waste cycles on metrics when utilized
- fix typos in comments
- include a missing header

Kajetan Puchalski (2):
  cpuidle: teo: Optionally skip polling states in teo_find_shallower_state()
  cpuidle: teo: Introduce util-awareness

 drivers/cpuidle/governors/teo.c | 93 +++++++++++++++++++++++++++++++--
 1 file changed, 89 insertions(+), 4 deletions(-)
Comments
On Wed, Nov 30, 2022 at 03:32:02PM +0000, Kajetan Puchalski wrote:

Hi Rafael,

As it's been a while since the last email I wanted to bump this thread
and ask what you think about the last changes.

Additionally, I got some emails from the kernel test robot and noticed
that sched_cpu_util is contingent on CONFIG_SMP so in the current form
there's build errors on !SMP machines.

The following change should fix the problem, do you think it's all right to add?

@@ -207,10 +207,17 @@ static DEFINE_PER_CPU(struct teo_cpu, teo_cpus);
  * @dev: Target CPU
  * @cpu_data: Governor CPU data for the target CPU
  */
+#ifdef CONFIG_SMP
 static void teo_get_util(struct cpuidle_device *dev, struct teo_cpu *cpu_data)
 {
 	cpu_data->utilized = sched_cpu_util(dev->cpu) > cpu_data->util_threshold;
 }
+#else
+static void teo_get_util(struct cpuidle_device *dev, struct teo_cpu *cpu_data)
+{
+	cpu_data->utilized = false;
+}
+#endif

Thanks in advance for your time,
Kajetan

> v4 -> v5:
> - remove the restriction to only apply the mechanism for C1 candidate state
> - clarify some code comments, fix comment style
> - refactor the fast-exit path loop implementation
> - move some cover letter information into the commit description
>
> v3 -> v4:
> - remove the chunk of code skipping metrics updates when the CPU was utilized
> - include new test results and more benchmarks in the cover letter
>
> v2 -> v3:
> - add a patch adding an option to skip polling states in teo_find_shallower_state()
> - only reduce the state if the candidate state is C1 and C0 is not a polling state
> - add a check for polling states in the 2-states fast-exit path
> - remove the ifdefs and Kconfig option
>
> v1 -> v2:
> - rework the mechanism to reduce selected state by 1 instead of directly selecting C0 (suggested by Doug Smythies)
> - add a fast-exit path for systems with 2 idle states to not waste cycles on metrics when utilized
> - fix typos in comments
> - include a missing header
>
> Kajetan Puchalski (2):
>   cpuidle: teo: Optionally skip polling states in teo_find_shallower_state()
>   cpuidle: teo: Introduce util-awareness
>
>  drivers/cpuidle/governors/teo.c | 93 +++++++++++++++++++++++++++++++--
>  1 file changed, 89 insertions(+), 4 deletions(-)
>
> --
> 2.37.1
>
On Tue, Jan 3, 2023 at 3:22 PM Kajetan Puchalski <kajetan.puchalski@arm.com> wrote:
>
> On Wed, Nov 30, 2022 at 03:32:02PM +0000, Kajetan Puchalski wrote:
>
> Hi Rafael,
>
> As it's been a while since the last email I wanted to bump this thread
> and ask what you think about the last changes.

Right, I'll send my comments on the last version of the patch separately.

> Additionally, I got some emails from the kernel test robot and noticed
> that sched_cpu_util is contingent on CONFIG_SMP so in the current form
> there's build errors on !SMP machines.
>
> The following change should fix the problem, do you think it's all right to add?
>
> @@ -207,10 +207,17 @@ static DEFINE_PER_CPU(struct teo_cpu, teo_cpus);
>   * @dev: Target CPU
>   * @cpu_data: Governor CPU data for the target CPU
>   */
> +#ifdef CONFIG_SMP
>  static void teo_get_util(struct cpuidle_device *dev, struct teo_cpu *cpu_data)
>  {
>  	cpu_data->utilized = sched_cpu_util(dev->cpu) > cpu_data->util_threshold;
>  }
> +#else
> +static void teo_get_util(struct cpuidle_device *dev, struct teo_cpu *cpu_data)
> +{
> +	cpu_data->utilized = false;
> +}
> +#endif
>

IMV it would be better to use teo_cpu_is_utilized() that would be called to update cpu_data->utilized this way

cpu_data->utilized = teo_cpu_is_utilized(dev->cpu, cpu_data->util_threshold);

and define it as an empty stub in the !SMP case.
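A user-space sketch of the shape suggested in the review, assuming the helper takes the threshold as in the quoted call. CONFIG_SMP and sched_cpu_util() are kernel symbols; here CONFIG_SMP is an ordinary macro defined at the top (remove the #define to build the !SMP stub), sched_cpu_util() is a hypothetical fake returning canned values, and teo_update_utilized() is a hypothetical caller.

```c
#include <stdbool.h>

/* Stand-in for the kernel config option; delete this line to build !SMP. */
#define CONFIG_SMP 1

/* Simplified stand-in for the governor's per-CPU data. */
struct teo_cpu {
	unsigned long util_threshold;
	bool utilized;
};

#ifdef CONFIG_SMP
/* Hypothetical fake of the scheduler helper: canned per-CPU utilization. */
static const unsigned long fake_util[] = { 5, 40 };

static unsigned long sched_cpu_util(int cpu)
{
	return fake_util[cpu];
}

static bool teo_cpu_is_utilized(int cpu, unsigned long util_threshold)
{
	return sched_cpu_util(cpu) > util_threshold;
}
#else
/* !SMP: stub that never reports the CPU as utilized. */
static bool teo_cpu_is_utilized(int cpu, unsigned long util_threshold)
{
	(void)cpu;
	(void)util_threshold;
	return false;
}
#endif

/* Hypothetical caller: needs no #ifdef of its own. */
static void teo_update_utilized(int cpu, struct teo_cpu *cpu_data)
{
	cpu_data->utilized = teo_cpu_is_utilized(cpu, cpu_data->util_threshold);
}
```

The point of the split is that only the small predicate differs between configurations, so the code that updates cpu_data->utilized stays identical in both builds.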