Message ID | 006901d9be8c$f4439930$dccacb90$@telus.net |
---|---|
State | New |
Headers |
Return-Path: <linux-kernel-owner@vger.kernel.org> Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:9010:0:b0:3e4:2afc:c1 with SMTP id l16csp2149969vqg; Mon, 24 Jul 2023 17:54:23 -0700 (PDT) X-Google-Smtp-Source: APBJJlEzGnI4/HjyI/g5kJnJFSGewsV5m6flmnCsZ8tViUeOA/pGb+fcsZVqgSGmHNNN7SM6lF9Z X-Received: by 2002:a05:6a20:1058:b0:133:d1b:a880 with SMTP id gt24-20020a056a20105800b001330d1ba880mr11894458pzc.23.1690246463175; Mon, 24 Jul 2023 17:54:23 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1690246463; cv=none; d=google.com; s=arc-20160816; b=D4GAH6Qn3O2nNiQjxPCVYmmCo54X8BKqVIrhY+kY5nnG5hoa7qQesmoFV5ETvMyeRl kbVoMwknerjksZwr06sXoURBf2lVqslYndACwmerALpu0uqTk5cayc+Mtghz3Piw4/My NvHs1RidIXeckZWoqdoQMFjoMoJQafVBsmcZVzMNHXhxbGS7M+OUZtueCgvnJ+8rJJHC RBOhpjJY2MINRsg/vuIrID5kZVD8nrx1V2hR9zOzOSYxE6RY9hWRmbFaWn50iK4eixaT RLOMtxdmbJ8r3YehQkPjLBvVkAW2fARjAhZaWuYypBXuxqyoH32t0bZ8uZAmfoW8rbp7 wGAQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:thread-index:content-language :content-transfer-encoding:mime-version:message-id:date:subject:cc :to:from:dkim-signature; bh=HXCdkgf7SAEt55PoPjve0NuVJWNgzHzNEzNEpVtcDy8=; fh=Q1eOWrzpB352HgEGy9kXY7de9YJ1qebqDJRgjSaiFic=; b=j5zQljnAP3vgiXJmTFGjoPgw1+45Q2g7e6bAwdMR88Ouc+FEFfsVlDwijy9AXHpMgy OlRnpr1/rO+0+TtKHeo4MFuESjYJOTInEbp+JqNhStDTCoes8jIRq4cJlTbGsB6+F39L RbSoWRyKYTsAlqAdWgSLoURwj1KL+gMI18rLYG7Na8N/MAK2yr59S4KTI7F6Ebm+sc5w Bs3o+tapu1wn8yrLlsMnt3ZlPxx0xE7PN3pRMxCW3+x+zaG+7STMwN7B2XwDY1oV01DY xLZh48krFBL5JvRqGsUHcFL0NhUzmSHV/VvNEQbteduZhNtS/G1vX+bL8Xr0ZeLi/HYn AYDg== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@telus.net header.s=google header.b=Xkk3+Dit; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=telus.net Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id dc11-20020a056a0035cb00b00682399fa4f9si10131209pfb.300.2023.07.24.17.54.10; Mon, 24 Jul 2023 17:54:23 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@telus.net header.s=google header.b=Xkk3+Dit; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=telus.net Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230216AbjGYAOT (ORCPT <rfc822;kautuk.consul.80@gmail.com> + 99 others); Mon, 24 Jul 2023 20:14:19 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:55182 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229459AbjGYAOR (ORCPT <rfc822;linux-kernel@vger.kernel.org>); Mon, 24 Jul 2023 20:14:17 -0400 Received: from mail-pg1-x535.google.com (mail-pg1-x535.google.com [IPv6:2607:f8b0:4864:20::535]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 5C46A115 for <linux-kernel@vger.kernel.org>; Mon, 24 Jul 2023 17:14:16 -0700 (PDT) Received: by mail-pg1-x535.google.com with SMTP id 41be03b00d2f7-53fbf2c42bfso3517323a12.3 for <linux-kernel@vger.kernel.org>; Mon, 24 Jul 2023 17:14:16 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=telus.net; s=google; t=1690244056; x=1690848856; h=thread-index:content-language:content-transfer-encoding :mime-version:message-id:date:subject:cc:to:from:from:to:cc:subject :date:message-id:reply-to; bh=HXCdkgf7SAEt55PoPjve0NuVJWNgzHzNEzNEpVtcDy8=; b=Xkk3+DitHCobWhmL7FgLnVUVf+4ri2nDizSxVAnxlAGu/beEIriveW3rHd1jvntGbY J6qSmU7ajAB0k+tAIjHg+ted4HDVD9PELirtHsGs+6VATp0NIitVzDGbxiIlyXUR32rA pbSnND1wq6RHJ95U11bcGlGqLBT9CX5SUtPJ9vHoJPVqXSv+CW0MCDSKUHclVBsrjHxZ XrbovZpLF2RNQ0yiG2pNZ2R3XhsoAQXIQpVP2Sk5diL1F0Kw21urZK9sB8/W6vD06J6j ZbB+Zw+40yK9GW+d0c8MszAb5ofiHShklduaEB9eKBnLY4MDeTSn8ZFChYz0gK4K0UR3 PU7w== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1690244056; x=1690848856; h=thread-index:content-language:content-transfer-encoding :mime-version:message-id:date:subject:cc:to:from:x-gm-message-state :from:to:cc:subject:date:message-id:reply-to; bh=HXCdkgf7SAEt55PoPjve0NuVJWNgzHzNEzNEpVtcDy8=; b=eSyTc/aRM5Tej5LdVnzHRbYB1SaPX60nJckmmT5VoF/ojlD6ilt4OqHRruzR6j1HrU 2YKrzKaynAj2uq2k2CX8I6EbE+QLv7g6IiWS3kbg06mGK+4Z5V5YwJhlfJVgQHJw0wob NfAu+Pif6vsLDVBrtNX6q+Llq5bPJY7H/g3SBNo+sG98sIc2PgRKmEWUYMidxZwnnjIz lGinB4//kPGIw6fCWmGoJHc9QyPE1kT7QKsLP3SiL5Sa3FIV/kQSVrCfEUtZzHf2dje5 9p4Y0QABNI51Dm2Knx4i8c+ljCcsyKTPag8IYFcME3fKUKCHjkA8i+kF1BEA5N5r/vic WDJQ== X-Gm-Message-State: ABy/qLbmyffkmCw+C8/IPKGqHqQnVpjR8VWQH3Ttyq05cyiXDQbeX2u8 duB9v1oXfGtv9PlJlr9FWsBBZA== X-Received: by 2002:a17:90a:2a4b:b0:267:f2b4:8ac4 with SMTP id d11-20020a17090a2a4b00b00267f2b48ac4mr9094176pjg.31.1690244055778; Mon, 24 Jul 2023 17:14:15 -0700 (PDT) Received: from DougS18 (s66-183-142-209.bc.hsia.telus.net. [66.183.142.209]) by smtp.gmail.com with ESMTPSA id g1-20020a17090a828100b0025c2c398d33sm6955023pjn.39.2023.07.24.17.14.14 (version=TLS1_2 cipher=ECDHE-ECDSA-AES128-GCM-SHA256 bits=128/128); Mon, 24 Jul 2023 17:14:15 -0700 (PDT) From: "Doug Smythies" <dsmythies@telus.net> To: "'Thomas Gleixner'" <tglx@linutronix.de>, <linux-pm@vger.kernel.org>, "'the arch/x86 maintainers'" <x86@kernel.org>, "'Rafael J. Wysocki'" <rafael@kernel.org> Cc: "Doug Smythies" <dsmythies@telus.net>, <yang.jie@linux.intel.com>, <linux-kernel@vger.kernel.org>, "'Eric Dumazet'" <edumazet@google.com>, "'Paul E. McKenney'" <paulmck@kernel.org>, "'Viresh Kumar'" <viresh.kumar@linaro.org> Subject: [PATCH] x86/aperfmperf: Make stale CPU frequency response within limits. Date: Mon, 24 Jul 2023 17:14:18 -0700 Message-ID: <006901d9be8c$f4439930$dccacb90$@telus.net> MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit X-Mailer: Microsoft Outlook 16.0 Content-Language: en-ca Thread-Index: Adm+i9OXa49HoOhhSw21FXVBGCqHEA== X-Spam-Status: No, score=-2.1 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, SPF_HELO_NONE,SPF_PASS,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: <linux-kernel.vger.kernel.org> X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: INBOX X-GMAIL-THRID: 1772351875674641055 X-GMAIL-MSGID: 1772351875674641055 |
Series |
x86/aperfmperf: Make stale CPU frequency response within limits.
|
|
Commit Message
Doug Smythies
July 25, 2023, 12:14 a.m. UTC
Currently, when the CPU frequency is stale the nominal clock frequency
is returned by calls to arch_freq_get_on_cpu(). Some users are
confused by the high reported frequency when their system is idle
and/or it is above a reduced maximum they set.
This patch will return the policy minimum as the stale frequency reply
from arch_freq_get_on_cpu().
Reported-by: Yang Jie <yang.jie@linux.intel.com>
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=217597
Signed-off-by: Doug Smythies <dsmythies@telus.net>
---
arch/x86/kernel/cpu/aperfmperf.c | 13 +++++--------
drivers/cpufreq/cpufreq.c | 18 ++++++++++++++++++
include/linux/cpufreq.h | 5 +++++
3 files changed, 28 insertions(+), 8 deletions(-)
Comments
On Tue, Jul 25, 2023 at 2:14 AM Doug Smythies <dsmythies@telus.net> wrote: > > Currently, when the CPU frequency is stale the nominal clock frequency > is returned by calls to arch_freq_get_on_cpu(). Some users are > confused by the high reported frequency when their system is idle > and/or it is above a reduced maximum they set. > > This patch will return the policy minimum as the stale frequency reply > from arch_freq_get_on_cpu(). > > Reported-by: Yang Jie <yang.jie@linux.intel.com> > Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=217597 > Signed-off-by: Doug Smythies <dsmythies@telus.net> > --- > arch/x86/kernel/cpu/aperfmperf.c | 13 +++++-------- > drivers/cpufreq/cpufreq.c | 18 ++++++++++++++++++ > include/linux/cpufreq.h | 5 +++++ > 3 files changed, 28 insertions(+), 8 deletions(-) > > diff --git a/arch/x86/kernel/cpu/aperfmperf.c b/arch/x86/kernel/cpu/aperfmperf.c > index fdbb5f07448f..44cc96864d94 100644 > --- a/arch/x86/kernel/cpu/aperfmperf.c > +++ b/arch/x86/kernel/cpu/aperfmperf.c > @@ -418,9 +418,10 @@ unsigned int arch_freq_get_on_cpu(int cpu) > unsigned long last; > u64 acnt, mcnt; > > - if (!cpu_feature_enabled(X86_FEATURE_APERFMPERF)) > - goto fallback; > - > + if (!cpu_feature_enabled(X86_FEATURE_APERFMPERF)){ > + freq = cpufreq_quick_get(cpu); > + return freq ? freq : cpufreq_quick_get_min(cpu); > + } > do { > seq = raw_read_seqcount_begin(&s->seq); > last = s->last_update; > @@ -433,13 +434,9 @@ unsigned int arch_freq_get_on_cpu(int cpu) > * which covers idle and NOHZ full CPUs. > */ > if (!mcnt || (jiffies - last) > MAX_SAMPLE_AGE) > - goto fallback; > + return cpufreq_quick_get_min(cpu); > > return div64_u64((cpu_khz * acnt), mcnt); > - > -fallback: > - freq = cpufreq_quick_get(cpu); > - return freq ? freq : cpu_khz; It looks to me like modifying cpufreq_quick_get) to return policy->min if policy->cur is 0 would work in a similar way, wouldn't it? > } > > static int __init bp_init_aperfmperf(void) > diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c > index 50bbc969ffe5..a0b24f61a5b0 100644 > --- a/drivers/cpufreq/cpufreq.c > +++ b/drivers/cpufreq/cpufreq.c > @@ -1796,6 +1796,24 @@ unsigned int cpufreq_quick_get_max(unsigned int cpu) > } > EXPORT_SYMBOL(cpufreq_quick_get_max); > > +/** > + * cpufreq_quick_get_min - return the min frequency for a given CPU > + * @cpu: CPU number > + */ > +unsigned int cpufreq_quick_get_min(unsigned int cpu) > +{ > + struct cpufreq_policy *policy = cpufreq_cpu_get(cpu); > + unsigned int ret_freq = 0; > + > + if (policy) { > + ret_freq = policy->min; > + cpufreq_cpu_put(policy); > + } > + > + return ret_freq; > +} > +EXPORT_SYMBOL(cpufreq_quick_get_min); > + > /** > * cpufreq_get_hw_max_freq - get the max hardware frequency of the CPU > * @cpu: CPU number > diff --git a/include/linux/cpufreq.h b/include/linux/cpufreq.h > index 172ff51c1b2a..c74dcb5f9263 100644 > --- a/include/linux/cpufreq.h > +++ b/include/linux/cpufreq.h > @@ -220,6 +220,7 @@ static inline bool policy_is_shared(struct cpufreq_policy *policy) > unsigned int cpufreq_get(unsigned int cpu); > unsigned int cpufreq_quick_get(unsigned int cpu); > unsigned int cpufreq_quick_get_max(unsigned int cpu); > +unsigned int cpufreq_quick_get_min(unsigned int cpu); > unsigned int cpufreq_get_hw_max_freq(unsigned int cpu); > void disable_cpufreq(void); > > @@ -250,6 +251,10 @@ static inline unsigned int cpufreq_quick_get_max(unsigned int cpu) > { > return 0; > } > +static inline unsigned int cpufreq_quick_get_min(unsigned int cpu) > +{ > + return 0; > +} > static inline unsigned int cpufreq_get_hw_max_freq(unsigned int cpu) > { > return 0; > -- > 2.25.1 > >
Hi Rafael, On Tue, Jul 25, 2023 at 11:31 AM Rafael J. Wysocki <rafael@kernel.org> wrote: > On Tue, Jul 25, 2023 at 2:14 AM Doug Smythies <dsmythies@telus.net> wrote: > > > > Currently, when the CPU frequency is stale the nominal clock frequency > > is returned by calls to arch_freq_get_on_cpu(). Some users are > > confused by the high reported frequency when their system is idle > > and/or it is above a reduced maximum they set. > > > > This patch will return the policy minimum as the stale frequency reply > > from arch_freq_get_on_cpu(). > > > > Reported-by: Yang Jie <yang.jie@linux.intel.com> > > Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=217597 > > Signed-off-by: Doug Smythies <dsmythies@telus.net> > > --- > > arch/x86/kernel/cpu/aperfmperf.c | 13 +++++-------- > > drivers/cpufreq/cpufreq.c | 18 ++++++++++++++++++ > > include/linux/cpufreq.h | 5 +++++ > > 3 files changed, 28 insertions(+), 8 deletions(-) > > > > diff --git a/arch/x86/kernel/cpu/aperfmperf.c b/arch/x86/kernel/cpu/aperfmperf.c > > index fdbb5f07448f..44cc96864d94 100644 > > --- a/arch/x86/kernel/cpu/aperfmperf.c > > +++ b/arch/x86/kernel/cpu/aperfmperf.c > > @@ -418,9 +418,10 @@ unsigned int arch_freq_get_on_cpu(int cpu) > > unsigned long last; > > u64 acnt, mcnt; > > > > - if (!cpu_feature_enabled(X86_FEATURE_APERFMPERF)) > > - goto fallback; > > - > > + if (!cpu_feature_enabled(X86_FEATURE_APERFMPERF)){ > > + freq = cpufreq_quick_get(cpu); > > + return freq ? freq : cpufreq_quick_get_min(cpu); > > + } > > do { > > seq = raw_read_seqcount_begin(&s->seq); > > last = s->last_update; > > @@ -433,13 +434,9 @@ unsigned int arch_freq_get_on_cpu(int cpu) > > * which covers idle and NOHZ full CPUs. > > */ > > if (!mcnt || (jiffies - last) > MAX_SAMPLE_AGE) > > - goto fallback; > > + return cpufreq_quick_get_min(cpu); > > > > return div64_u64((cpu_khz * acnt), mcnt); > > - > > -fallback: > > - freq = cpufreq_quick_get(cpu); > > - return freq ? freq : cpu_khz; > > It looks to me like modifying cpufreq_quick_get) to return policy->min > if policy->cur is 0 would work in a similar way, wouldn't it? For the configuration of intel_cpufreq driver (intel_pstate in passive mode), schedutil governor, HWP enabled, for a stale frequency policy->cur is not 0 and will always be whatever the min value was when the driver was initialized, regardless of what has been set since. The patch I submitted deals with that situation also. A complete list of driver/governor/HWP stale frequency replies can be found on the bugzilla report at: https://bugzilla.kernel.org/attachment.cgi?id=304694 There might be push back on some of the performance governor stale frequency replies. I could not figure out a performance governor dependant reply. Also there are other callers to cpufreq_quick_get and I was not sure I could mess with the function response for them. For example drivers/devfreq/tegra30-devfreq.c and drivers/thermal/cpufreq_cooling.c and drivers/powercap/dtpm_cpu.c > > > } > > > > static int __init bp_init_aperfmperf(void) > > diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c > > index 50bbc969ffe5..a0b24f61a5b0 100644 > > --- a/drivers/cpufreq/cpufreq.c > > +++ b/drivers/cpufreq/cpufreq.c > > @@ -1796,6 +1796,24 @@ unsigned int cpufreq_quick_get_max(unsigned int cpu) > > } > > EXPORT_SYMBOL(cpufreq_quick_get_max); > > > > +/** > > + * cpufreq_quick_get_min - return the min frequency for a given CPU > > + * @cpu: CPU number > > + */ > > +unsigned int cpufreq_quick_get_min(unsigned int cpu) > > +{ > > + struct cpufreq_policy *policy = cpufreq_cpu_get(cpu); > > + unsigned int ret_freq = 0; > > + > > + if (policy) { > > + ret_freq = policy->min; > > + cpufreq_cpu_put(policy); > > + } > > + > > + return ret_freq; > > +} > > +EXPORT_SYMBOL(cpufreq_quick_get_min); > > + > > /** > > * cpufreq_get_hw_max_freq - get the max hardware frequency of the CPU > > * @cpu: CPU number > > diff --git a/include/linux/cpufreq.h b/include/linux/cpufreq.h > > index 172ff51c1b2a..c74dcb5f9263 100644 > > --- a/include/linux/cpufreq.h > > +++ b/include/linux/cpufreq.h > > @@ -220,6 +220,7 @@ static inline bool policy_is_shared(struct cpufreq_policy *policy) > > unsigned int cpufreq_get(unsigned int cpu); > > unsigned int cpufreq_quick_get(unsigned int cpu); > > unsigned int cpufreq_quick_get_max(unsigned int cpu); > > +unsigned int cpufreq_quick_get_min(unsigned int cpu); > > unsigned int cpufreq_get_hw_max_freq(unsigned int cpu); > > void disable_cpufreq(void); > > > > @@ -250,6 +251,10 @@ static inline unsigned int cpufreq_quick_get_max(unsigned int cpu) > > { > > return 0; > > } > > +static inline unsigned int cpufreq_quick_get_min(unsigned int cpu) > > +{ > > + return 0; > > +} > > static inline unsigned int cpufreq_get_hw_max_freq(unsigned int cpu) > > { > > return 0; > > -- > > 2.25.1 > > > >
Hi Doug, On Tue, Jul 25, 2023 at 9:12 PM Doug Smythies <dsmythies@telus.net> wrote: > > Hi Rafael, > > On Tue, Jul 25, 2023 at 11:31 AM Rafael J. Wysocki <rafael@kernel.org> wrote: > > On Tue, Jul 25, 2023 at 2:14 AM Doug Smythies <dsmythies@telus.net> wrote: > > > > > > Currently, when the CPU frequency is stale the nominal clock frequency > > > is returned by calls to arch_freq_get_on_cpu(). Some users are > > > confused by the high reported frequency when their system is idle > > > and/or it is above a reduced maximum they set. > > > > > > This patch will return the policy minimum as the stale frequency reply > > > from arch_freq_get_on_cpu(). > > > > > > Reported-by: Yang Jie <yang.jie@linux.intel.com> > > > Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=217597 > > > Signed-off-by: Doug Smythies <dsmythies@telus.net> > > > --- > > > arch/x86/kernel/cpu/aperfmperf.c | 13 +++++-------- > > > drivers/cpufreq/cpufreq.c | 18 ++++++++++++++++++ > > > include/linux/cpufreq.h | 5 +++++ > > > 3 files changed, 28 insertions(+), 8 deletions(-) > > > > > > diff --git a/arch/x86/kernel/cpu/aperfmperf.c b/arch/x86/kernel/cpu/aperfmperf.c > > > index fdbb5f07448f..44cc96864d94 100644 > > > --- a/arch/x86/kernel/cpu/aperfmperf.c > > > +++ b/arch/x86/kernel/cpu/aperfmperf.c > > > @@ -418,9 +418,10 @@ unsigned int arch_freq_get_on_cpu(int cpu) > > > unsigned long last; > > > u64 acnt, mcnt; > > > > > > - if (!cpu_feature_enabled(X86_FEATURE_APERFMPERF)) > > > - goto fallback; > > > - > > > + if (!cpu_feature_enabled(X86_FEATURE_APERFMPERF)){ > > > + freq = cpufreq_quick_get(cpu); > > > + return freq ? freq : cpufreq_quick_get_min(cpu); > > > + } > > > do { > > > seq = raw_read_seqcount_begin(&s->seq); > > > last = s->last_update; > > > @@ -433,13 +434,9 @@ unsigned int arch_freq_get_on_cpu(int cpu) > > > * which covers idle and NOHZ full CPUs. > > > */ > > > if (!mcnt || (jiffies - last) > MAX_SAMPLE_AGE) > > > - goto fallback; > > > + return cpufreq_quick_get_min(cpu); > > > > > > return div64_u64((cpu_khz * acnt), mcnt); > > > - > > > -fallback: > > > - freq = cpufreq_quick_get(cpu); > > > - return freq ? freq : cpu_khz; > > > > It looks to me like modifying cpufreq_quick_get) to return policy->min > > if policy->cur is 0 would work in a similar way, wouldn't it? > > For the configuration of intel_cpufreq driver (intel_pstate in > passive mode), schedutil governor, HWP enabled, for > a stale frequency policy->cur is not 0 and will always > be whatever the min value was when the driver was initialized, > regardless of what has been set since. So I would prefer to address this in the intel_pstate driver than to work around it in the core. > The patch I submitted deals with that situation also. > > A complete list of driver/governor/HWP stale frequency > replies can be found on the bugzilla report at: > > https://bugzilla.kernel.org/attachment.cgi?id=304694 > > There might be push back on some of the performance > governor stale frequency replies. I could not figure out > a performance governor dependant reply. > > Also there are other callers to cpufreq_quick_get > and I was not sure I could mess with the function > response for them. For example > drivers/devfreq/tegra30-devfreq.c > and drivers/thermal/cpufreq_cooling.c > and drivers/powercap/dtpm_cpu.c IIUC, all of the above rely on policy->cur being nonzero. There are other users doing questionable things when cpufreq_quick_get() returns 0 that I think would be better off if the min is returned instead.
Hi Rafael, On Wed, Jul 26, 2023 at 7:43 AM Rafael J. Wysocki <rafael@kernel.org> wrote: > On Tue, Jul 25, 2023 at 9:12 PM Doug Smythies <dsmythies@telus.net> wrote: > > On Tue, Jul 25, 2023 at 11:31 AM Rafael J. Wysocki <rafael@kernel.org> wrote: > > > On Tue, Jul 25, 2023 at 2:14 AM Doug Smythies <dsmythies@telus.net> wrote: > > > > > > > > Currently, when the CPU frequency is stale the nominal clock frequency > > > > is returned by calls to arch_freq_get_on_cpu(). Some users are > > > > confused by the high reported frequency when their system is idle > > > > and/or it is above a reduced maximum they set. > > > > > > > > This patch will return the policy minimum as the stale frequency reply > > > > from arch_freq_get_on_cpu(). > > > > > > > > Reported-by: Yang Jie <yang.jie@linux.intel.com> > > > > Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=217597 > > > > Signed-off-by: Doug Smythies <dsmythies@telus.net> > > > > --- > > > > arch/x86/kernel/cpu/aperfmperf.c | 13 +++++-------- > > > > drivers/cpufreq/cpufreq.c | 18 ++++++++++++++++++ > > > > include/linux/cpufreq.h | 5 +++++ > > > > 3 files changed, 28 insertions(+), 8 deletions(-) > > > > > > > > diff --git a/arch/x86/kernel/cpu/aperfmperf.c b/arch/x86/kernel/cpu/aperfmperf.c > > > > index fdbb5f07448f..44cc96864d94 100644 > > > > --- a/arch/x86/kernel/cpu/aperfmperf.c > > > > +++ b/arch/x86/kernel/cpu/aperfmperf.c > > > > @@ -418,9 +418,10 @@ unsigned int arch_freq_get_on_cpu(int cpu) > > > > unsigned long last; > > > > u64 acnt, mcnt; > > > > > > > > - if (!cpu_feature_enabled(X86_FEATURE_APERFMPERF)) > > > > - goto fallback; > > > > - > > > > + if (!cpu_feature_enabled(X86_FEATURE_APERFMPERF)){ > > > > + freq = cpufreq_quick_get(cpu); > > > > + return freq ? freq : cpufreq_quick_get_min(cpu); > > > > + } > > > > do { > > > > seq = raw_read_seqcount_begin(&s->seq); > > > > last = s->last_update; > > > > @@ -433,13 +434,9 @@ unsigned int arch_freq_get_on_cpu(int cpu) > > > > * which covers idle and NOHZ full CPUs. > > > > */ > > > > if (!mcnt || (jiffies - last) > MAX_SAMPLE_AGE) > > > > - goto fallback; > > > > + return cpufreq_quick_get_min(cpu); > > > > > > > > return div64_u64((cpu_khz * acnt), mcnt); > > > > - > > > > -fallback: > > > > - freq = cpufreq_quick_get(cpu); > > > > - return freq ? freq : cpu_khz; > > > > > > It looks to me like modifying cpufreq_quick_get) to return policy->min > > > if policy->cur is 0 would work in a similar way, wouldn't it? > > > > For the configuration of intel_cpufreq driver (intel_pstate in > > passive mode), schedutil governor, HWP enabled, for > > a stale frequency policy->cur is not 0 and will always > > be whatever the min value was when the driver was initialized, > > regardless of what has been set since. > > So I would prefer to address this in the intel_pstate driver than to > work around it in the core. Okay, but I would need some help with it. I already tried to figure out a fix before starting this thread, and have tried again since your comment. I haven't been able to figure it out. An example of the issue: Use the ondemand governor and set some minimum and also put a load on CPU 5 such that the governor asks for a non-min and non-max pstate. Then switch to the schedutil governor, and terminate the load on CPU 5, and look at CPU frequencies: $ grep . /sys/devices/system/cpu/cpufreq/policy*/scaling_cur_freq /sys/devices/system/cpu/cpufreq/policy0/scaling_cur_freq:1000000 /sys/devices/system/cpu/cpufreq/policy10/scaling_cur_freq:1000000 /sys/devices/system/cpu/cpufreq/policy11/scaling_cur_freq:1000000 /sys/devices/system/cpu/cpufreq/policy1/scaling_cur_freq:4799871 /sys/devices/system/cpu/cpufreq/policy2/scaling_cur_freq:4800027 /sys/devices/system/cpu/cpufreq/policy3/scaling_cur_freq:1000000 /sys/devices/system/cpu/cpufreq/policy4/scaling_cur_freq:1000000 /sys/devices/system/cpu/cpufreq/policy5/scaling_cur_freq:1300000 /sys/devices/system/cpu/cpufreq/policy6/scaling_cur_freq:1000000 /sys/devices/system/cpu/cpufreq/policy7/scaling_cur_freq:4800736 /sys/devices/system/cpu/cpufreq/policy8/scaling_cur_freq:1000000 /sys/devices/system/cpu/cpufreq/policy9/scaling_cur_freq:1000000 non stale frequencies are identified by non round numbers. observe that CPU 5 still indicates pstate 13. observe the other stale frequencies are the pstate 10 min that I set when the governor was ondemand. Now change the minimum to 1.1 GHz and check it: $ grep . /sys/devices/system/cpu/cpufreq/policy*/scaling_min_freq /sys/devices/system/cpu/cpufreq/policy0/scaling_min_freq:1100000 /sys/devices/system/cpu/cpufreq/policy10/scaling_min_freq:1100000 /sys/devices/system/cpu/cpufreq/policy11/scaling_min_freq:1100000 /sys/devices/system/cpu/cpufreq/policy1/scaling_min_freq:1100000 /sys/devices/system/cpu/cpufreq/policy2/scaling_min_freq:1100000 /sys/devices/system/cpu/cpufreq/policy3/scaling_min_freq:1100000 /sys/devices/system/cpu/cpufreq/policy4/scaling_min_freq:1100000 /sys/devices/system/cpu/cpufreq/policy5/scaling_min_freq:1100000 /sys/devices/system/cpu/cpufreq/policy6/scaling_min_freq:1100000 /sys/devices/system/cpu/cpufreq/policy7/scaling_min_freq:1100000 /sys/devices/system/cpu/cpufreq/policy8/scaling_min_freq:1100000 /sys/devices/system/cpu/cpufreq/policy9/scaling_min_freq:1100000 and look at current again: $ grep . /sys/devices/system/cpu/cpufreq/policy*/scaling_cur_freq /sys/devices/system/cpu/cpufreq/policy0/scaling_cur_freq:1000000 /sys/devices/system/cpu/cpufreq/policy10/scaling_cur_freq:1000000 /sys/devices/system/cpu/cpufreq/policy11/scaling_cur_freq:1000000 /sys/devices/system/cpu/cpufreq/policy1/scaling_cur_freq:1000000 /sys/devices/system/cpu/cpufreq/policy2/scaling_cur_freq:1000000 /sys/devices/system/cpu/cpufreq/policy3/scaling_cur_freq:1000000 /sys/devices/system/cpu/cpufreq/policy4/scaling_cur_freq:1000000 /sys/devices/system/cpu/cpufreq/policy5/scaling_cur_freq:1300000 /sys/devices/system/cpu/cpufreq/policy6/scaling_cur_freq:4800585 /sys/devices/system/cpu/cpufreq/policy7/scaling_cur_freq:4800177 /sys/devices/system/cpu/cpufreq/policy8/scaling_cur_freq:4799992 /sys/devices/system/cpu/cpufreq/policy9/scaling_cur_freq:4800015 Observe the stale frequencies are unchanged and outside of the range limits. > > The patch I submitted deals with that situation also. > > > > A complete list of driver/governor/HWP stale frequency > > replies can be found on the bugzilla report at: > > > > https://bugzilla.kernel.org/attachment.cgi?id=304694 > > > > There might be push back on some of the performance > > governor stale frequency replies. I could not figure out > > a performance governor dependant reply. > > > > Also there are other callers to cpufreq_quick_get > > and I was not sure I could mess with the function > > response for them. For example > > drivers/devfreq/tegra30-devfreq.c > > and drivers/thermal/cpufreq_cooling.c > > and drivers/powercap/dtpm_cpu.c > > IIUC, all of the above rely on policy->cur being nonzero. > > There are other users doing questionable things when > cpufreq_quick_get() returns 0 that I think would be better off if the > min is returned instead. Okay, I'll submit a new patch shortly, with this: diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c index 50bbc969ffe5..4e91169a83f5 100644 --- a/drivers/cpufreq/cpufreq.c +++ b/drivers/cpufreq/cpufreq.c @@ -1768,7 +1768,7 @@ unsigned int cpufreq_quick_get(unsigned int cpu) policy = cpufreq_cpu_get(cpu); if (policy) { - ret_freq = policy->cur; + ret_freq = policy->cur ? policy->cur : policy->min; cpufreq_cpu_put(policy); } The testing results are in the bugzilla report here: https://bugzilla.kernel.org/attachment.cgi?id=304734 ... Doug
diff --git a/arch/x86/kernel/cpu/aperfmperf.c b/arch/x86/kernel/cpu/aperfmperf.c index fdbb5f07448f..44cc96864d94 100644 --- a/arch/x86/kernel/cpu/aperfmperf.c +++ b/arch/x86/kernel/cpu/aperfmperf.c @@ -418,9 +418,10 @@ unsigned int arch_freq_get_on_cpu(int cpu) unsigned long last; u64 acnt, mcnt; - if (!cpu_feature_enabled(X86_FEATURE_APERFMPERF)) - goto fallback; - + if (!cpu_feature_enabled(X86_FEATURE_APERFMPERF)){ + freq = cpufreq_quick_get(cpu); + return freq ? freq : cpufreq_quick_get_min(cpu); + } do { seq = raw_read_seqcount_begin(&s->seq); last = s->last_update; @@ -433,13 +434,9 @@ unsigned int arch_freq_get_on_cpu(int cpu) * which covers idle and NOHZ full CPUs. */ if (!mcnt || (jiffies - last) > MAX_SAMPLE_AGE) - goto fallback; + return cpufreq_quick_get_min(cpu); return div64_u64((cpu_khz * acnt), mcnt); - -fallback: - freq = cpufreq_quick_get(cpu); - return freq ? freq : cpu_khz; } static int __init bp_init_aperfmperf(void) diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c index 50bbc969ffe5..a0b24f61a5b0 100644 --- a/drivers/cpufreq/cpufreq.c +++ b/drivers/cpufreq/cpufreq.c @@ -1796,6 +1796,24 @@ unsigned int cpufreq_quick_get_max(unsigned int cpu) } EXPORT_SYMBOL(cpufreq_quick_get_max); +/** + * cpufreq_quick_get_min - return the min frequency for a given CPU + * @cpu: CPU number + */ +unsigned int cpufreq_quick_get_min(unsigned int cpu) +{ + struct cpufreq_policy *policy = cpufreq_cpu_get(cpu); + unsigned int ret_freq = 0; + + if (policy) { + ret_freq = policy->min; + cpufreq_cpu_put(policy); + } + + return ret_freq; +} +EXPORT_SYMBOL(cpufreq_quick_get_min); + /** * cpufreq_get_hw_max_freq - get the max hardware frequency of the CPU * @cpu: CPU number diff --git a/include/linux/cpufreq.h b/include/linux/cpufreq.h index 172ff51c1b2a..c74dcb5f9263 100644 --- a/include/linux/cpufreq.h +++ b/include/linux/cpufreq.h @@ -220,6 +220,7 @@ static inline bool policy_is_shared(struct cpufreq_policy *policy) unsigned int cpufreq_get(unsigned int cpu); unsigned int cpufreq_quick_get(unsigned int cpu); unsigned int cpufreq_quick_get_max(unsigned int cpu); +unsigned int cpufreq_quick_get_min(unsigned int cpu); unsigned int cpufreq_get_hw_max_freq(unsigned int cpu); void disable_cpufreq(void); @@ -250,6 +251,10 @@ static inline unsigned int cpufreq_quick_get_max(unsigned int cpu) { return 0; } +static inline unsigned int cpufreq_quick_get_min(unsigned int cpu) +{ + return 0; +} static inline unsigned int cpufreq_get_hw_max_freq(unsigned int cpu) { return 0;