sched/core: Reduce cost of sched_move_task when config autogroup

Message ID 20230319075643.28312-1-wuchi.zero@gmail.com
State New
Series sched/core: Reduce cost of sched_move_task when config autogroup

Commit Message

wuchi March 19, 2023, 7:56 a.m. UTC
  Some sched_move_task() calls triggered by autogroup are useless because
task_struct->sched_task_group does not change: the task_group of the
cpu cgroup takes precedence over (overlays) the task_group of the
autogroup. The key code paths of that overlay are:

sched_cgroup_fork->autogroup_task_group->task_wants_autogroup
sched_change_group->autogroup_task_group->task_wants_autogroup

sched_move_task() example:
task A belongs to cpu_cgroup0 and autogroup0; it will always belong
to cpu_cgroup0 when it exits, so there is no need to do the {de|en}queue.
The call graph is as follows:

do_exit
  sched_autogroup_exit_task
    sched_move_task
      dequeue_task
        sched_change_group
          A.sched_task_group = sched_get_task_group
      enqueue_task

So do some checks before dequeuing the task in sched_move_task().
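
For context, the overlay happens in autogroup_task_group(): the autogroup's
task_group is only substituted when the cgroup-derived group is the root
task group; for a task in any other cpu cgroup the cgroup's task_group wins,
so the group computed at exit time equals the one already set at fork time.
A simplified paraphrase of the upstream helper (based on
kernel/sched/autogroup.h, not part of this patch; exact guards may differ):

/*
 * Paraphrased sketch: return the autogroup's task_group only when
 * autogrouping is enabled and the task would otherwise land in the
 * root task group; task_wants_autogroup() also rejects exiting tasks.
 */
static inline struct task_group *
autogroup_task_group(struct task_struct *p, struct task_group *tg)
{
	if (READ_ONCE(sysctl_sched_autogroup_enabled) &&
	    task_wants_autogroup(p, tg))
		return p->signal->autogroup->tg;

	return tg;
}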

Signed-off-by: wuchi <wuchi.zero@gmail.com>
---
 kernel/sched/core.c | 29 +++++++++++++++++++++++++++--
 1 file changed, 27 insertions(+), 2 deletions(-)
  

Comments

Peter Zijlstra March 20, 2023, 8:28 a.m. UTC | #1
On Sun, Mar 19, 2023 at 03:56:43PM +0800, wuchi wrote:
> Some sched_move_task() calls triggered by autogroup are useless because
> task_struct->sched_task_group does not change: the task_group of the
> cpu cgroup takes precedence over (overlays) the task_group of the
> autogroup. The key code paths of that overlay are:
> 
> sched_cgroup_fork->autogroup_task_group->task_wants_autogroup
> sched_change_group->autogroup_task_group->task_wants_autogroup
> 
> sched_move_task() example:
> task A belongs to cpu_cgroup0 and autogroup0; it will always belong
> to cpu_cgroup0 when it exits, so there is no need to do the {de|en}queue.
> The call graph is as follows:
> 
> do_exit
>   sched_autogroup_exit_task
>     sched_move_task
>       dequeue_task
>         sched_change_group
> 	  A.sched_task_group = sched_get_task_group
>       enqueue_task
> 
> So do some checks before dequeuing the task in sched_move_task().

No immediate objection; but the $subject seems to suggest you did this
for performance reasons, yet there are no performance results in this
Changelog. How much does this save under what load?
  
wuchi March 20, 2023, 12:40 p.m. UTC | #2
Peter Zijlstra <peterz@infradead.org> wrote on Mon, Mar 20, 2023 at 16:28:
>
> On Sun, Mar 19, 2023 at 03:56:43PM +0800, wuchi wrote:
> > Some sched_move_task() calls triggered by autogroup are useless because
> > task_struct->sched_task_group does not change: the task_group of the
> > cpu cgroup takes precedence over (overlays) the task_group of the
> > autogroup. The key code paths of that overlay are:
> >
> > sched_cgroup_fork->autogroup_task_group->task_wants_autogroup
> > sched_change_group->autogroup_task_group->task_wants_autogroup
> >
> > sched_move_task() example:
> > task A belongs to cpu_cgroup0 and autogroup0; it will always belong
> > to cpu_cgroup0 when it exits, so there is no need to do the {de|en}queue.
> > The call graph is as follows:
> >
> > do_exit
> >   sched_autogroup_exit_task
> >     sched_move_task
> >       dequeue_task
> >         sched_change_group
> >         A.sched_task_group = sched_get_task_group
> >       enqueue_task
> >
> > So do some checks before dequeuing the task in sched_move_task().
>
> No immediate objection; but the $subject seems to suggest you did this
> for performance reasons, yet there are no performance results in this
> Changelog. How much does this save under what load?

cpu:bogomips=4600.00
kernel:6.3.0-rc3
cpu cgroup: 6:cpu,cpuacct:/user.slice

run cmds:

#!/bin/bash
for i in {0..10000}; do
sleep 0 &
done
wait

bpftrace -e 'k:sched_move_task { @ts[tid] = nsecs; }
kr:sched_move_task /@ts[tid]/ { @ns += nsecs - @ts[tid];
delete(@ts[tid]); }'

cost time (ns):
  without patch: 43528033
  with patch:    18934496
  diff:         -24593537 (-56.5%)

Regarding the change: shall I move the sched_task_group_changed() check
before update_rq_clock() as follows, and add the performance results to
the Changelog?

@@ -10369,6 +10390,9 @@ void sched_move_task(struct task_struct *tsk)
        rq = task_rq_lock(tsk, &rf);

+       if (!sched_task_group_changed(tsk))
+               goto unlock;
+
        update_rq_clock(rq);
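
With that relocation, the entry of sched_move_task() would presumably look
like the sketch below (sketch only, untested; the declarations and the
existing dequeue/enqueue block are elided):

void sched_move_task(struct task_struct *tsk)
{
	struct rq_flags rf;
	struct rq *rq;

	rq = task_rq_lock(tsk, &rf);

	/*
	 * Bail out before update_rq_clock() and the dequeue/enqueue cycle
	 * when the effective task_group cannot change.
	 */
	if (!sched_task_group_changed(tsk))
		goto unlock;

	update_rq_clock(rq);

	/* ... existing dequeue/sched_change_group/enqueue logic ... */

unlock:
	task_rq_unlock(rq, tsk, &rf);
}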
  
Peter Zijlstra March 20, 2023, 3:13 p.m. UTC | #3
On Mon, Mar 20, 2023 at 08:40:51PM +0800, chi wu wrote:
> Peter Zijlstra <peterz@infradead.org> wrote on Mon, Mar 20, 2023 at 16:28:
> >
> > On Sun, Mar 19, 2023 at 03:56:43PM +0800, wuchi wrote:
> > > Some sched_move_task() calls triggered by autogroup are useless because
> > > task_struct->sched_task_group does not change: the task_group of the
> > > cpu cgroup takes precedence over (overlays) the task_group of the
> > > autogroup. The key code paths of that overlay are:
> > >
> > > sched_cgroup_fork->autogroup_task_group->task_wants_autogroup
> > > sched_change_group->autogroup_task_group->task_wants_autogroup
> > >
> > > sched_move_task() example:
> > > task A belongs to cpu_cgroup0 and autogroup0; it will always belong
> > > to cpu_cgroup0 when it exits, so there is no need to do the {de|en}queue.
> > > The call graph is as follows:
> > >
> > > do_exit
> > >   sched_autogroup_exit_task
> > >     sched_move_task
> > >       dequeue_task
> > >         sched_change_group
> > >         A.sched_task_group = sched_get_task_group
> > >       enqueue_task
> > >
> > > So do some checks before dequeuing the task in sched_move_task().
> >
> > No immediate objection; but the $subject seems to suggest you did this
> > for performance reasons, yet there are no performance results in this
> > Changelog. How much does this save under what load?
> 
> cpu:bogomips=4600.00
> kernel:6.3.0-rc3
> cpu cgroup: 6:cpu,cpuacct:/user.slice
> 
> run cmds:
> 
> #!/bin/bash
> for i in {0..10000}; do
> sleep 0 &
> done
> wait
> 
> bpftrace -e 'k:sched_move_task { @ts[tid] = nsecs; }
> kr:sched_move_task /@ts[tid]/ { @ns += nsecs - @ts[tid];
> delete(@ts[tid]); }'
> 
> cost time (ns):
>   without patch: 43528033
>   with patch:    18934496
>   diff:         -24593537 (-56.5%)
> 
> Regarding the change: shall I move the sched_task_group_changed() check
> before update_rq_clock() as follows, and add the performance results to
> the Changelog?

Yes please, that gives good motivation for the change.

Thanks!
  

Patch

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index a380f34789a2..acc9a0e391f4 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -10330,7 +10330,7 @@ void sched_release_group(struct task_group *tg)
 	spin_unlock_irqrestore(&task_group_lock, flags);
 }
 
-static void sched_change_group(struct task_struct *tsk)
+static struct task_group *sched_get_task_group(struct task_struct *tsk)
 {
 	struct task_group *tg;
 
@@ -10342,7 +10342,28 @@ static void sched_change_group(struct task_struct *tsk)
 	tg = container_of(task_css_check(tsk, cpu_cgrp_id, true),
 			  struct task_group, css);
 	tg = autogroup_task_group(tsk, tg);
-	tsk->sched_task_group = tg;
+
+	return tg;
+}
+
+static bool sched_task_group_changed(struct task_struct *tsk)
+{
+	/*
+	 * Some sched_move_task() calls triggered by autogroup are useless
+	 * because task_struct->sched_task_group does not change: the
+	 * task_group of the cpu cgroup takes precedence over the task_group
+	 * of the autogroup. So check this before dequeuing the task.
+	 */
+#ifdef CONFIG_SCHED_AUTOGROUP
+	return sched_get_task_group(tsk) != tsk->sched_task_group;
+#else
+	return true;
+#endif /* CONFIG_SCHED_AUTOGROUP */
+}
+
+static void sched_change_group(struct task_struct *tsk)
+{
+	tsk->sched_task_group = sched_get_task_group(tsk);
 
 #ifdef CONFIG_FAIR_GROUP_SCHED
 	if (tsk->sched_class->task_change_group)
@@ -10369,6 +10390,9 @@ void sched_move_task(struct task_struct *tsk)
 	rq = task_rq_lock(tsk, &rf);
 	update_rq_clock(rq);
 
+	if (!sched_task_group_changed(tsk))
+		goto unlock;
+
 	running = task_current(rq, tsk);
 	queued = task_on_rq_queued(tsk);
 
@@ -10391,6 +10415,7 @@ void sched_move_task(struct task_struct *tsk)
 		resched_curr(rq);
 	}
 
+unlock:
 	task_rq_unlock(rq, tsk, &rf);
 }