[RFC,0/3] sched/deadline: cpuset: Rework DEADLINE bandwidth restoration

Message ID 20230315121812.206079-1-juri.lelli@redhat.com

Message

Juri Lelli March 15, 2023, 12:18 p.m. UTC
  Qais reported [1] that iterating over all tasks when rebuilding root
domains for finding out which ones are DEADLINE and need their bandwidth
correctly restored on such root domains can be a costly operation (10+
ms delays on suspend-resume). He proposed we skip rebuilding root
domains for certain operations, but that approach seemed arch specific
and possibly prone to errors, as paths that ultimately trigger a rebuild
might be quite convoluted (thanks Qais for spending time on this!).

To fix the problem I would instead propose that we

 1 - Bring back cpuset_mutex (so that we have write access to cpusets
     from scheduler operations - and we also fix some problems
     associated with percpu_cpuset_rwsem)
 2 - Keep track of the number of DEADLINE tasks belonging to each cpuset
 3 - Use this information to only perform the costly iteration if
     DEADLINE tasks are actually present in the cpuset for which a
     corresponding root domain is being rebuilt
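
Steps 2 and 3 can be illustrated with a small userspace model. This is
only a sketch under stated assumptions, not code from the series: the
struct layout and the names cpuset_model, cs_set_task_deadline() and
cs_restore_dl_bw() are hypothetical, and locking (the job of
cpuset_mutex in step 1) is left out entirely. It shows a per-cpuset
counter of DEADLINE tasks being kept up to date on policy changes, and
the per-task walk being skipped when that counter is zero.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a task; not the kernel's task_struct. */
struct task {
	bool is_deadline;	/* true while the task runs as DEADLINE */
	unsigned long dl_bw;	/* bandwidth the task contributes */
};

/* Hypothetical model of a cpuset; not the kernel's struct cpuset. */
struct cpuset_model {
	struct task *tasks;
	size_t nr_tasks;
	size_t nr_deadline_tasks;	/* step 2: DEADLINE tasks in this cpuset */
	size_t tasks_visited;		/* instrumentation: tasks walked so far */
};

/* Step 2: keep the counter in sync whenever a task's policy changes. */
static void cs_set_task_deadline(struct cpuset_model *cs, size_t idx,
				 bool on, unsigned long bw)
{
	struct task *t = &cs->tasks[idx];

	if (on && !t->is_deadline) {
		cs->nr_deadline_tasks++;
	} else if (!on && t->is_deadline) {
		assert(cs->nr_deadline_tasks > 0);
		cs->nr_deadline_tasks--;
	}
	t->is_deadline = on;
	t->dl_bw = on ? bw : 0;
}

/*
 * Step 3: walk the tasks only if the counter says there is something
 * to restore. Returns the total DEADLINE bandwidth found.
 */
static unsigned long cs_restore_dl_bw(struct cpuset_model *cs)
{
	unsigned long total = 0;

	if (cs->nr_deadline_tasks == 0)
		return 0;	/* fast path: no iteration at all */

	for (size_t i = 0; i < cs->nr_tasks; i++) {
		cs->tasks_visited++;
		if (cs->tasks[i].is_deadline)
			total += cs->tasks[i].dl_bw;
	}
	return total;
}
```

In this model a cpuset with no DEADLINE tasks costs a single comparison
per rebuild instead of a walk over every task, which is the saving the
list above is after.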

This set is also available from

https://github.com/jlelli/linux.git deadline/rework-cpusets

Feedback is more than welcome.

Best,
Juri

1 - https://lore.kernel.org/lkml/20230206221428.2125324-1-qyousef@layalina.io/

Juri Lelli (3):
  sched/cpuset: Bring back cpuset_mutex
  sched/cpuset: Keep track of SCHED_DEADLINE task in cpusets
  cgroup/cpuset: Iterate only if DEADLINE tasks are present

 include/linux/cpuset.h |  12 ++-
 kernel/cgroup/cgroup.c |   4 +
 kernel/cgroup/cpuset.c | 175 +++++++++++++++++++++++------------------
 kernel/sched/core.c    |  32 ++++++--
 4 files changed, 137 insertions(+), 86 deletions(-)
  

Comments

Qais Yousef March 15, 2023, 2:55 p.m. UTC | #1
On 03/15/23 12:18, Juri Lelli wrote:
> Qais reported [1] that iterating over all tasks when rebuilding root
> domains for finding out which ones are DEADLINE and need their bandwidth
> correctly restored on such root domains can be a costly operation (10+
> ms delays on suspend-resume). He proposed we skip rebuilding root
> domains for certain operations, but that approach seemed arch specific
> and possibly prone to errors, as paths that ultimately trigger a rebuild
> might be quite convoluted (thanks Qais for spending time on this!).

Thanks a lot for this! And sorry I couldn't provide something better.

> 
> To fix the problem I would instead propose that we
> 
>  1 - Bring back cpuset_mutex (so that we have write access to cpusets
>      from scheduler operations - and we also fix some problems
>      associated with percpu_cpuset_rwsem)
>  2 - Keep track of the number of DEADLINE tasks belonging to each cpuset
>  3 - Use this information to only perform the costly iteration if
>      DEADLINE tasks are actually present in the cpuset for which a
>      corresponding root domain is being rebuilt

nit:

Would you consider adding another patch to rename the functions?
rebuild_root_domains() and update_tasks_root_domain() are deadline accounting
specific functions and don't actually rebuild root domains.


Thanks!

--
Qais Yousef

  
Juri Lelli March 15, 2023, 5:10 p.m. UTC | #2
On 15/03/23 14:55, Qais Yousef wrote:
> On 03/15/23 12:18, Juri Lelli wrote:
> > Qais reported [1] that iterating over all tasks when rebuilding root
> > domains for finding out which ones are DEADLINE and need their bandwidth
> > correctly restored on such root domains can be a costly operation (10+
> > ms delays on suspend-resume). He proposed we skip rebuilding root
> > domains for certain operations, but that approach seemed arch specific
> > and possibly prone to errors, as paths that ultimately trigger a rebuild
> > might be quite convoluted (thanks Qais for spending time on this!).
> 
> Thanks a lot for this! And sorry I couldn't provide something better.

Ah, no worries. I actually still have to convince myself that what I
have is better. :)

> > 
> > To fix the problem I would instead propose that we
> > 
> >  1 - Bring back cpuset_mutex (so that we have write access to cpusets
> >      from scheduler operations - and we also fix some problems
> >      associated with percpu_cpuset_rwsem)
> >  2 - Keep track of the number of DEADLINE tasks belonging to each cpuset
> >  3 - Use this information to only perform the costly iteration if
> >      DEADLINE tasks are actually present in the cpuset for which a
> >      corresponding root domain is being rebuilt
> 
> nit:
> 
> Would you consider adding another patch to rename the functions?
> rebuild_root_domains() and update_tasks_root_domain() are deadline accounting
> specific functions and don't actually rebuild root domains.

Yep, can do.

Thanks,
Juri