[1/2] sched: introduce helper function to calculate distribution over sched class

Message ID 20240220061542.489922-1-zhaoyang.huang@unisoc.com
State New
Headers
Series [1/2] sched: introduce helper function to calculate distribution over sched class |

Commit Message

zhaoyang.huang Feb. 20, 2024, 6:15 a.m. UTC
  From: Zhaoyang Huang <zhaoyang.huang@unisoc.com>

As RT, DL, IRQ time could be deemed as lost time of CFS's task, some
timing value want to know the distribution of how these spread
approximately by using utilization account value (nivcsw is not enough
sometimes). This commit would like to introduce a helper function to
achieve this goal.

eg.
Effective part of A = Total_time * cpu_util_cfs / cpu_util

Timing value A
(should be a process last for several TICKs or statistics of a repeadted
process)

Timing start
|
|
preempted by RT, DL or IRQ
|\
| This period time is nonvoluntary CPU give up, need to know how long
|/
sched in again
|
|
|
Timing end

Signed-off-by: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
---
 include/linux/sched.h |  1 +
 kernel/sched/core.c   | 20 ++++++++++++++++++++
 2 files changed, 21 insertions(+)
  

Comments

Vincent Guittot Feb. 21, 2024, 5:51 p.m. UTC | #1
On Tue, 20 Feb 2024 at 07:16, zhaoyang.huang <zhaoyang.huang@unisoc.com> wrote:
>
> From: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
>
> As RT, DL, IRQ time could be deemed as lost time of CFS's task, some

It's lost only if cfs has been actually preempted

> timing value want to know the distribution of how these spread
> approximately by using utilization account value (nivcsw is not enough
> sometimes). This commit would like to introduce a helper function to
> achieve this goal.
>
> eg.
> Effective part of A = Total_time * cpu_util_cfs / cpu_util
>
> Timing value A
> (should be a process last for several TICKs or statistics of a repeadted
> process)
>
> Timing start
> |
> |
> preempted by RT, DL or IRQ
> |\
> | This period time is nonvoluntary CPU give up, need to know how long
> |/

preempted means that a cfs task stops running on the cpu and lets
another rt/dl task or an irq run on the cpu instead. We can't know
that. We know an average ratio of time spent in rt/dl and irq contexts
but not if the cpu was idle or running cfs task

> sched in again
> |
> |
> |
> Timing end
>
> Signed-off-by: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
> ---
>  include/linux/sched.h |  1 +
>  kernel/sched/core.c   | 20 ++++++++++++++++++++
>  2 files changed, 21 insertions(+)
>
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index 77f01ac385f7..99cf09c47f72 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -2318,6 +2318,7 @@ static inline bool owner_on_cpu(struct task_struct *owner)
>
>  /* Returns effective CPU energy utilization, as seen by the scheduler */
>  unsigned long sched_cpu_util(int cpu);
> +unsigned long cfs_prop_by_util(struct task_struct *tsk, unsigned long val);
>  #endif /* CONFIG_SMP */
>
>  #ifdef CONFIG_RSEQ
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 802551e0009b..217e2220fdc1 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -7494,6 +7494,26 @@ unsigned long sched_cpu_util(int cpu)
>  {
>         return effective_cpu_util(cpu, cpu_util_cfs(cpu), ENERGY_UTIL, NULL);
>  }
> +
> +/*
> + * Calculate the approximate proportion of timing value consumed in cfs.
> + * The user must be aware of this is done by avg_util which is tracked by
> + * the geometric series as decaying the load by y^32 = 0.5 (unit is 1ms).
> + * That is, only the period last for at least several TICKs or the statistics
> + * of repeated timing value are suitable for this helper function.
> + */
> +unsigned long cfs_prop_by_util(struct task_struct *tsk, unsigned long val)
> +{
> +       unsigned int cpu = task_cpu(tsk);
> +       struct rq *rq = cpu_rq(cpu);
> +       unsigned long util;
> +
> +       if (tsk->sched_class != &fair_sched_class)
> +               return val;
> +       util = cpu_util_rt(rq) + cpu_util_cfs(cpu) + cpu_util_irq(rq) + cpu_util_dl(rq);

This is not correct as irq is not on the same clock domain: look at
effective_cpu_util()

You don't care about idle time ?

> +       return min(val, cpu_util_cfs(cpu) * val / util);
> +}
> +
>  #endif /* CONFIG_SMP */
>
>  /**
> --
> 2.25.1
>
  
Zhaoyang Huang Feb. 22, 2024, 2:58 a.m. UTC | #2
On Thu, Feb 22, 2024 at 1:51 AM Vincent Guittot
<vincent.guittot@linaro.org> wrote:
>
> On Tue, 20 Feb 2024 at 07:16, zhaoyang.huang <zhaoyang.huang@unisoc.com> wrote:
> >
> > From: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
> >
> > As RT, DL, IRQ time could be deemed as lost time of CFS's task, some
>
> It's lost only if cfs has been actually preempted
Yes. Actually, I just want to get the approximate proportion of how
CFS tasks(whole runq) is preempted. The preemption among CFS is not
considered.
>
> > timing value want to know the distribution of how these spread
> > approximately by using utilization account value (nivcsw is not enough
> > sometimes). This commit would like to introduce a helper function to
> > achieve this goal.
> >
> > eg.
> > Effective part of A = Total_time * cpu_util_cfs / cpu_util
> >
> > Timing value A
> > (should be a process last for several TICKs or statistics of a repeadted
> > process)
> >
> > Timing start
> > |
> > |
> > preempted by RT, DL or IRQ
> > |\
> > | This period time is nonvoluntary CPU give up, need to know how long
> > |/
>
> preempted means that a cfs task stops running on the cpu and lets
> another rt/dl task or an irq run on the cpu instead. We can't know
> that. We know an average ratio of time spent in rt/dl and irq contexts
> but not if the cpu was idle or running cfs task
ok, will take idle into consideration and as explained above,
preemption among cfs tasks is not considered on purpose
>
> > sched in again
> > |
> > |
> > |
> > Timing end
> >
> > Signed-off-by: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
> > ---
> >  include/linux/sched.h |  1 +
> >  kernel/sched/core.c   | 20 ++++++++++++++++++++
> >  2 files changed, 21 insertions(+)
> >
> > diff --git a/include/linux/sched.h b/include/linux/sched.h
> > index 77f01ac385f7..99cf09c47f72 100644
> > --- a/include/linux/sched.h
> > +++ b/include/linux/sched.h
> > @@ -2318,6 +2318,7 @@ static inline bool owner_on_cpu(struct task_struct *owner)
> >
> >  /* Returns effective CPU energy utilization, as seen by the scheduler */
> >  unsigned long sched_cpu_util(int cpu);
> > +unsigned long cfs_prop_by_util(struct task_struct *tsk, unsigned long val);
> >  #endif /* CONFIG_SMP */
> >
> >  #ifdef CONFIG_RSEQ
> > diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> > index 802551e0009b..217e2220fdc1 100644
> > --- a/kernel/sched/core.c
> > +++ b/kernel/sched/core.c
> > @@ -7494,6 +7494,26 @@ unsigned long sched_cpu_util(int cpu)
> >  {
> >         return effective_cpu_util(cpu, cpu_util_cfs(cpu), ENERGY_UTIL, NULL);
> >  }
> > +
> > +/*
> > + * Calculate the approximate proportion of timing value consumed in cfs.
> > + * The user must be aware of this is done by avg_util which is tracked by
> > + * the geometric series as decaying the load by y^32 = 0.5 (unit is 1ms).
> > + * That is, only the period last for at least several TICKs or the statistics
> > + * of repeated timing value are suitable for this helper function.
> > + */
> > +unsigned long cfs_prop_by_util(struct task_struct *tsk, unsigned long val)
> > +{
> > +       unsigned int cpu = task_cpu(tsk);
> > +       struct rq *rq = cpu_rq(cpu);
> > +       unsigned long util;
> > +
> > +       if (tsk->sched_class != &fair_sched_class)
> > +               return val;
> > +       util = cpu_util_rt(rq) + cpu_util_cfs(cpu) + cpu_util_irq(rq) + cpu_util_dl(rq);
>
> This is not correct as irq is not on the same clock domain: look at
> effective_cpu_util()
>
> You don't care about idle time ?
ok, will check. thanks
>
> > +       return min(val, cpu_util_cfs(cpu) * val / util);
> > +}
> > +
> >  #endif /* CONFIG_SMP */
> >
> >  /**
> > --
> > 2.25.1
> >
  

Patch

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 77f01ac385f7..99cf09c47f72 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -2318,6 +2318,7 @@  static inline bool owner_on_cpu(struct task_struct *owner)
 
 /* Returns effective CPU energy utilization, as seen by the scheduler */
 unsigned long sched_cpu_util(int cpu);
+unsigned long cfs_prop_by_util(struct task_struct *tsk, unsigned long val);
 #endif /* CONFIG_SMP */
 
 #ifdef CONFIG_RSEQ
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 802551e0009b..217e2220fdc1 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -7494,6 +7494,26 @@  unsigned long sched_cpu_util(int cpu)
 {
 	return effective_cpu_util(cpu, cpu_util_cfs(cpu), ENERGY_UTIL, NULL);
 }
+
+/*
+ * Calculate the approximate proportion of timing value consumed in cfs.
+ * The user must be aware of this is done by avg_util which is tracked by
+ * the geometric series as decaying the load by y^32 = 0.5 (unit is 1ms).
+ * That is, only the period last for at least several TICKs or the statistics
+ * of repeated timing value are suitable for this helper function.
+ */
+unsigned long cfs_prop_by_util(struct task_struct *tsk, unsigned long val)
+{
+	unsigned int cpu = task_cpu(tsk);
+	struct rq *rq = cpu_rq(cpu);
+	unsigned long util;
+
+	if (tsk->sched_class != &fair_sched_class)
+		return val;
+	util = cpu_util_rt(rq) + cpu_util_cfs(cpu) + cpu_util_irq(rq) + cpu_util_dl(rq);
+	return min(val, cpu_util_cfs(cpu) * val / util);
+}
+
 #endif /* CONFIG_SMP */
 
 /**