[v3,1/5] cpufreq: Add a cpufreq pressure feedback for the scheduler
Commit Message
Provide the scheduler with feedback about the temporary max available
capacity. Unlike arch_update_thermal_pressure, this doesn't need to be
filtered, as the pressure will last for dozens of ms or more.
Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
---
drivers/cpufreq/cpufreq.c | 36 ++++++++++++++++++++++++++++++++++++
include/linux/cpufreq.h | 10 ++++++++++
2 files changed, 46 insertions(+)
Comments
On 08/01/2024 14:48, Vincent Guittot wrote:
> Provide to the scheduler a feedback about the temporary max available
> capacity. Unlike arch_update_thermal_pressure, this doesn't need to be
> filtered as the pressure will happen for dozens ms or more.
Is this then related to the 'medium pace system pressure' you mentioned
in your OSPM '23 talk?
>
> Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
> ---
> drivers/cpufreq/cpufreq.c | 36 ++++++++++++++++++++++++++++++++++++
> include/linux/cpufreq.h | 10 ++++++++++
> 2 files changed, 46 insertions(+)
>
> diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
> index 44db4f59c4cc..fa2e2ea26f7f 100644
> --- a/drivers/cpufreq/cpufreq.c
> +++ b/drivers/cpufreq/cpufreq.c
> @@ -2563,6 +2563,40 @@ int cpufreq_get_policy(struct cpufreq_policy *policy, unsigned int cpu)
> }
> EXPORT_SYMBOL(cpufreq_get_policy);
>
> +DEFINE_PER_CPU(unsigned long, cpufreq_pressure);
> +
> +/**
> + * cpufreq_update_pressure() - Update cpufreq pressure for CPUs
> + * @policy: cpufreq policy of the CPUs.
> + *
> + * Update the value of cpufreq pressure for all @cpus in the policy.
> + */
> +static void cpufreq_update_pressure(struct cpufreq_policy *policy)
> +{
> + unsigned long max_capacity, capped_freq, pressure;
> + u32 max_freq;
> + int cpu;
> +
> + /*
> + * Handle properly the boost frequencies, which should simply clean
> + * the thermal pressure value.
^^^^^^^
IMHO, this is a copy & paste error from topology_update_thermal_pressure()?
> + */
> + if (max_freq <= capped_freq) {
max_freq seems to be uninitialized.
> + pressure = 0;
Is this x86 (turbo boost) specific? IMHO, on Arm we follow the convention
that the max freq (including boost) corresponds to 1024 in capacity? Or
haven't we had this discussion yet?
> + } else {
> + cpu = cpumask_first(policy->related_cpus);
> + max_capacity = arch_scale_cpu_capacity(cpu);
> + capped_freq = policy->max;
> + max_freq = arch_scale_freq_ref(cpu);
> +
> + pressure = max_capacity -
> + mult_frac(max_capacity, capped_freq, max_freq);
> + }
> +
> + for_each_cpu(cpu, policy->related_cpus)
> + WRITE_ONCE(per_cpu(cpufreq_pressure, cpu), pressure);
> +}
> +
[...]
On Mon, 8 Jan 2024 at 17:35, Dietmar Eggemann <dietmar.eggemann@arm.com> wrote:
>
> On 08/01/2024 14:48, Vincent Guittot wrote:
> > Provide to the scheduler a feedback about the temporary max available
> > capacity. Unlike arch_update_thermal_pressure, this doesn't need to be
> > filtered as the pressure will happen for dozens ms or more.
>
> Is this then related to the 'medium pace system pressure' you mentioned
> in your OSPM '23 talk?
>
> >
> > Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
> > ---
> > drivers/cpufreq/cpufreq.c | 36 ++++++++++++++++++++++++++++++++++++
> > include/linux/cpufreq.h | 10 ++++++++++
> > 2 files changed, 46 insertions(+)
> >
> > diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
> > index 44db4f59c4cc..fa2e2ea26f7f 100644
> > --- a/drivers/cpufreq/cpufreq.c
> > +++ b/drivers/cpufreq/cpufreq.c
> > @@ -2563,6 +2563,40 @@ int cpufreq_get_policy(struct cpufreq_policy *policy, unsigned int cpu)
> > }
> > EXPORT_SYMBOL(cpufreq_get_policy);
> >
> > +DEFINE_PER_CPU(unsigned long, cpufreq_pressure);
> > +
> > +/**
> > + * cpufreq_update_pressure() - Update cpufreq pressure for CPUs
> > + * @policy: cpufreq policy of the CPUs.
> > + *
> > + * Update the value of cpufreq pressure for all @cpus in the policy.
> > + */
> > +static void cpufreq_update_pressure(struct cpufreq_policy *policy)
> > +{
> > + unsigned long max_capacity, capped_freq, pressure;
> > + u32 max_freq;
> > + int cpu;
> > +
> > + /*
> > + * Handle properly the boost frequencies, which should simply clean
> > + * the thermal pressure value.
> ^^^^^^^
> IMHO, this is a copy & paste error from topology_update_thermal_pressure()?
>
> > + */
> > + if (max_freq <= capped_freq) {
>
> max_freq seems to be uninitialized.
Argh, yes, I broke this while cleaning up:
both max_freq and capped_freq are uninitialized.
>
> > + pressure = 0;
>
> Is this x86 (turbo boost) specific? IMHO at arm we follow this max freq
> (including boost) relates to 1024 in capacity? Or haven't we made this
> discussion yet?
This is not x86 specific. We can have capped_freq > max_freq on Arm too.
Also, this bypasses all the calculation below when max_freq == capped_freq,
which is the most common case.
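
To make the two points above concrete, here is a minimal userspace sketch of the intended arithmetic, with both frequencies read before the comparison. It is not the kernel code and not the follow-up patch: pressure_sketch() and mult_frac_ul() are hypothetical names, the latter a stand-in for the kernel's mult_frac(), and the three inputs stand for the values that arch_scale_cpu_capacity(), policy->max and arch_scale_freq_ref() would provide.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical userspace stand-in for the kernel's mult_frac():
 * computes x * n / d while keeping the intermediate product small.
 */
static unsigned long mult_frac_ul(unsigned long x, unsigned long n,
				  unsigned long d)
{
	unsigned long q = x / d;
	unsigned long r = x % d;

	return q * n + r * n / d;
}

/*
 * Hypothetical helper mirroring the calculation in the patch.
 * max_capacity: capacity of the CPU (1024 for the biggest CPU)
 * capped_freq:  current max frequency allowed by the policy, in kHz
 * max_freq:     reference max frequency of the CPU, in kHz
 */
static unsigned long pressure_sketch(unsigned long max_capacity,
				     unsigned long capped_freq,
				     uint32_t max_freq)
{
	/* Assumption of this sketch: a valid reference frequency. */
	assert(max_freq > 0);

	/*
	 * Common case (no capping) and boost case (cap above the
	 * reference frequency): no pressure, skip the division.
	 */
	if (max_freq <= capped_freq)
		return 0;

	return max_capacity - mult_frac_ul(max_capacity, capped_freq, max_freq);
}
```

With a capacity of 1024 and a reference frequency of 2000000 kHz, a cap at 1000000 kHz gives a pressure of 512, while a cap at or above 2000000 kHz gives 0.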
>
> > + } else {
> > + cpu = cpumask_first(policy->related_cpus);
> > + max_capacity = arch_scale_cpu_capacity(cpu);
> > + capped_freq = policy->max;
> > + max_freq = arch_scale_freq_ref(cpu);
> > +
> > + pressure = max_capacity -
> > + mult_frac(max_capacity, capped_freq, max_freq);
> > + }
> > +
> > + for_each_cpu(cpu, policy->related_cpus)
> > + WRITE_ONCE(per_cpu(cpufreq_pressure, cpu), pressure);
> > +}
> > +
>
> [...]
>
On Mon, 8 Jan 2024 at 17:35, Dietmar Eggemann <dietmar.eggemann@arm.com> wrote:
>
> On 08/01/2024 14:48, Vincent Guittot wrote:
> > Provide to the scheduler a feedback about the temporary max available
> > capacity. Unlike arch_update_thermal_pressure, this doesn't need to be
> > filtered as the pressure will happen for dozens ms or more.
>
> Is this then related to the 'medium pace system pressure' you mentioned
> in your OSPM '23 talk?
Sorry, I forgot to answer this question. Yes, this is the medium pace
system pressure that I mentioned at OSPM'23.
>
> >
@@ -2563,6 +2563,40 @@ int cpufreq_get_policy(struct cpufreq_policy *policy, unsigned int cpu)
}
EXPORT_SYMBOL(cpufreq_get_policy);
+DEFINE_PER_CPU(unsigned long, cpufreq_pressure);
+
+/**
+ * cpufreq_update_pressure() - Update cpufreq pressure for CPUs
+ * @policy: cpufreq policy of the CPUs.
+ *
+ * Update the value of cpufreq pressure for all @cpus in the policy.
+ */
+static void cpufreq_update_pressure(struct cpufreq_policy *policy)
+{
+ unsigned long max_capacity, capped_freq, pressure;
+ u32 max_freq;
+ int cpu;
+
+ /*
+ * Handle properly the boost frequencies, which should simply clean
+ * the thermal pressure value.
+ */
+ if (max_freq <= capped_freq) {
+ pressure = 0;
+ } else {
+ cpu = cpumask_first(policy->related_cpus);
+ max_capacity = arch_scale_cpu_capacity(cpu);
+ capped_freq = policy->max;
+ max_freq = arch_scale_freq_ref(cpu);
+
+ pressure = max_capacity -
+ mult_frac(max_capacity, capped_freq, max_freq);
+ }
+
+ for_each_cpu(cpu, policy->related_cpus)
+ WRITE_ONCE(per_cpu(cpufreq_pressure, cpu), pressure);
+}
+
/**
* cpufreq_set_policy - Modify cpufreq policy parameters.
* @policy: Policy object to modify.
@@ -2618,6 +2652,8 @@ static int cpufreq_set_policy(struct cpufreq_policy *policy,
policy->max = __resolve_freq(policy, policy->max, CPUFREQ_RELATION_H);
trace_cpu_frequency_limits(policy);
+ cpufreq_update_pressure(policy);
+
policy->cached_target_freq = UINT_MAX;
pr_debug("new min and max freqs are %u - %u kHz\n",
@@ -241,6 +241,12 @@ struct kobject *get_governor_parent_kobj(struct cpufreq_policy *policy);
void cpufreq_enable_fast_switch(struct cpufreq_policy *policy);
void cpufreq_disable_fast_switch(struct cpufreq_policy *policy);
bool has_target_index(void);
+
+DECLARE_PER_CPU(unsigned long, cpufreq_pressure);
+static inline unsigned long cpufreq_get_pressure(int cpu)
+{
+ return per_cpu(cpufreq_pressure, cpu);
+}
#else
static inline unsigned int cpufreq_get(unsigned int cpu)
{
@@ -263,6 +269,10 @@ static inline bool cpufreq_supports_freq_invariance(void)
return false;
}
static inline void disable_cpufreq(void) { }
+static inline unsigned long cpufreq_get_pressure(int cpu)
+{
+ return 0;
+}
#endif
#ifdef CONFIG_CPU_FREQ_STAT