[0/6] sched/deadline: cpuset: Rework DEADLINE bandwidth restoration

Message ID 20230329125558.255239-1-juri.lelli@redhat.com
Headers
Series sched/deadline: cpuset: Rework DEADLINE bandwidth restoration |

Message

Juri Lelli March 29, 2023, 12:55 p.m. UTC
  Qais reported [1] that iterating over all tasks when rebuilding root
domains for finding out which ones are DEADLINE and need their bandwidth
correctly restored on such root domains can be a costly operation (10+
ms delays on suspend-resume). He proposed we skip rebuilding root
domains for certain operations, but that approach seemed arch specific
and possibly prone to errors, as paths that ultimately trigger a rebuild
might be quite convoluted (thanks Qais for spending time on this!).

To fix the problem

 01/06 - Rename functions deadline with DEADLINE accounting (cleanup
         suggested by Qais) - no functional change
 02/06 - Bring back cpuset_mutex (so that we have write access to cpusets
         from scheduler operations - and we also fix some problems
         associated to percpu_cpuset_rwsem)
 03/06 - Keep track of the number of DEADLINE tasks belonging to each cpuset
 04/06 - Create DL BW alloc, free & check overflow interface for bulk
         bandwidth allocation/removal - no functional change 
 05/06 - Fix bandwidth allocation handling for cgroup operation
         involving multiple tasks
 06/06 - Use this information to only perform the costly iteration if
         DEADLINE tasks are actually present in the cpuset for which a
         corresponding root domain is being rebuilt

With respect to the RFC posting [2]

 1 - rename DEADLINE bandwidth accounting functions - Qais
 2 - call inc/dec_dl_tasks_cs from switched_{to,from}_dl - Qais
 3 - fix DEADLINE bandwidth allocation with multiple tasks - Waiman,
     contributed by Dietmar

This set is also available from

https://github.com/jlelli/linux.git deadline/rework-cpusets

Best,
Juri

1 - https://lore.kernel.org/lkml/20230206221428.2125324-1-qyousef@layalina.io/
2 - https://lore.kernel.org/lkml/20230315121812.206079-1-juri.lelli@redhat.com/

Dietmar Eggemann (2):
  sched/deadline: Create DL BW alloc, free & check overflow interface
  cgroup/cpuset: Free DL BW in case can_attach() fails

Juri Lelli (4):
  cgroup/cpuset: Rename functions dealing with DEADLINE accounting
  sched/cpuset: Bring back cpuset_mutex
  sched/cpuset: Keep track of SCHED_DEADLINE task in cpusets
  cgroup/cpuset: Iterate only if DEADLINE tasks are present

 include/linux/cpuset.h  |  12 ++-
 include/linux/sched.h   |   4 +-
 kernel/cgroup/cgroup.c  |   4 +
 kernel/cgroup/cpuset.c  | 232 ++++++++++++++++++++++++++--------------
 kernel/sched/core.c     |  41 ++++---
 kernel/sched/deadline.c |  67 +++++++++---
 kernel/sched/sched.h    |   2 +-
 7 files changed, 240 insertions(+), 122 deletions(-)
  

Comments

Waiman Long March 29, 2023, 2:34 p.m. UTC | #1
On 3/29/23 08:55, Juri Lelli wrote:
> Qais reported [1] that iterating over all tasks when rebuilding root
> domains for finding out which ones are DEADLINE and need their bandwidth
> correctly restored on such root domains can be a costly operation (10+
> ms delays on suspend-resume). He proposed we skip rebuilding root
> domains for certain operations, but that approach seemed arch specific
> and possibly prone to errors, as paths that ultimately trigger a rebuild
> might be quite convoluted (thanks Qais for spending time on this!).
>
> To fix the problem
>
>   01/06 - Rename functions deadline with DEADLINE accounting (cleanup
>           suggested by Qais) - no functional change
>   02/06 - Bring back cpuset_mutex (so that we have write access to cpusets
>           from scheduler operations - and we also fix some problems
>           associated to percpu_cpuset_rwsem)
>   03/06 - Keep track of the number of DEADLINE tasks belonging to each cpuset
>   04/06 - Create DL BW alloc, free & check overflow interface for bulk
>           bandwidth allocation/removal - no functional change
>   05/06 - Fix bandwidth allocation handling for cgroup operation
>           involving multiple tasks
>   06/06 - Use this information to only perform the costly iteration if
>           DEADLINE tasks are actually present in the cpuset for which a
>           corresponding root domain is being rebuilt
>
> With respect to the RFC posting [2]
>
>   1 - rename DEADLINE bandwidth accounting functions - Qais
>   2 - call inc/dec_dl_tasks_cs from switched_{to,from}_dl - Qais
>   3 - fix DEADLINE bandwidth allocation with multiple tasks - Waiman,
>       contributed by Dietmar
>
> This set is also available from
>
> https://github.com/jlelli/linux.git deadline/rework-cpusets
>
> Best,
> Juri
>
> 1 - https://lore.kernel.org/lkml/20230206221428.2125324-1-qyousef@layalina.io/
> 2 - https://lore.kernel.org/lkml/20230315121812.206079-1-juri.lelli@redhat.com/
>
> Dietmar Eggemann (2):
>    sched/deadline: Create DL BW alloc, free & check overflow interface
>    cgroup/cpuset: Free DL BW in case can_attach() fails
>
> Juri Lelli (4):
>    cgroup/cpuset: Rename functions dealing with DEADLINE accounting
>    sched/cpuset: Bring back cpuset_mutex
>    sched/cpuset: Keep track of SCHED_DEADLINE task in cpusets
>    cgroup/cpuset: Iterate only if DEADLINE tasks are present
>
>   include/linux/cpuset.h  |  12 ++-
>   include/linux/sched.h   |   4 +-
>   kernel/cgroup/cgroup.c  |   4 +
>   kernel/cgroup/cpuset.c  | 232 ++++++++++++++++++++++++++--------------
>   kernel/sched/core.c     |  41 ++++---
>   kernel/sched/deadline.c |  67 +++++++++---
>   kernel/sched/sched.h    |   2 +-
>   7 files changed, 240 insertions(+), 122 deletions(-)

Other than some minor issues that I have talked in earlier emails, the 
patch series looks good to me.

You can add my ack once the issues are fixed.

Acked-by: Waiman Long <longman@redhat.com>
  
Waiman Long March 29, 2023, 4:05 p.m. UTC | #2
On 3/29/23 12:02, Waiman Long wrote:
> It is possible to have parallel attach operations to the same cpuset in
> progress. To avoid possible corruption of single set of DL BW data in
> the cpuset structure, we have to disallow parallel attach operations if
> DL tasks are present. Attach operations can still proceed in parallel
> as long as no DL tasks are involved.
>
> This patch also stores the CPU where DL BW is allocated and free that BW
> back to the same CPU in case cpuset_can_attach() is called.
>
> Signed-off-by: Waiman Long <longman@redhat.com>

Juri, this patch is an addendum to your series to address the issues 
that I found from the cpuset point of view.

Cheers,
Longman
  
Qais Yousef April 4, 2023, 8:09 p.m. UTC | #3
On 03/29/23 14:55, Juri Lelli wrote:
> Qais reported [1] that iterating over all tasks when rebuilding root
> domains for finding out which ones are DEADLINE and need their bandwidth
> correctly restored on such root domains can be a costly operation (10+
> ms delays on suspend-resume). He proposed we skip rebuilding root
> domains for certain operations, but that approach seemed arch specific
> and possibly prone to errors, as paths that ultimately trigger a rebuild
> might be quite convoluted (thanks Qais for spending time on this!).
> 
> To fix the problem
> 
>  01/06 - Rename functions deadline with DEADLINE accounting (cleanup
>          suggested by Qais) - no functional change
>  02/06 - Bring back cpuset_mutex (so that we have write access to cpusets
>          from scheduler operations - and we also fix some problems
>          associated to percpu_cpuset_rwsem)
>  03/06 - Keep track of the number of DEADLINE tasks belonging to each cpuset
>  04/06 - Create DL BW alloc, free & check overflow interface for bulk
>          bandwidth allocation/removal - no functional change 
>  05/06 - Fix bandwidth allocation handling for cgroup operation
>          involving multiple tasks
>  06/06 - Use this information to only perform the costly iteration if
>          DEADLINE tasks are actually present in the cpuset for which a
>          corresponding root domain is being rebuilt
> 
> With respect to the RFC posting [2]
> 
>  1 - rename DEADLINE bandwidth accounting functions - Qais
>  2 - call inc/dec_dl_tasks_cs from switched_{to,from}_dl - Qais
>  3 - fix DEADLINE bandwidth allocation with multiple tasks - Waiman,
>      contributed by Dietmar
> 
> This set is also available from
> 
> https://github.com/jlelli/linux.git deadline/rework-cpusets

Thanks a lot Juri!

I picked up the updated series and applied them to a 5.10 kernel and tested the
issue is fixed. Replied with my reviewed-and-tested-bys to some of the patches
already.

I haven't looked much at Dietmar's patches and while they were part of the
test, but there are no dl tasks on the system so I felt hesitant to say
I tested that part.


Cheers

--
Qais Yousef
  
Qais Yousef April 18, 2023, 2:11 p.m. UTC | #4
On 03/29/23 14:55, Juri Lelli wrote:
> Qais reported [1] that iterating over all tasks when rebuilding root
> domains for finding out which ones are DEADLINE and need their bandwidth
> correctly restored on such root domains can be a costly operation (10+
> ms delays on suspend-resume). He proposed we skip rebuilding root
> domains for certain operations, but that approach seemed arch specific
> and possibly prone to errors, as paths that ultimately trigger a rebuild
> might be quite convoluted (thanks Qais for spending time on this!).
> 
> To fix the problem
> 
>  01/06 - Rename functions deadline with DEADLINE accounting (cleanup
>          suggested by Qais) - no functional change
>  02/06 - Bring back cpuset_mutex (so that we have write access to cpusets
>          from scheduler operations - and we also fix some problems
>          associated to percpu_cpuset_rwsem)
>  03/06 - Keep track of the number of DEADLINE tasks belonging to each cpuset
>  04/06 - Create DL BW alloc, free & check overflow interface for bulk
>          bandwidth allocation/removal - no functional change 
>  05/06 - Fix bandwidth allocation handling for cgroup operation
>          involving multiple tasks
>  06/06 - Use this information to only perform the costly iteration if
>          DEADLINE tasks are actually present in the cpuset for which a
>          corresponding root domain is being rebuilt
> 
> With respect to the RFC posting [2]
> 
>  1 - rename DEADLINE bandwidth accounting functions - Qais
>  2 - call inc/dec_dl_tasks_cs from switched_{to,from}_dl - Qais
>  3 - fix DEADLINE bandwidth allocation with multiple tasks - Waiman,
>      contributed by Dietmar
> 
> This set is also available from
> 
> https://github.com/jlelli/linux.git deadline/rework-cpusets

Is this just waiting to be picked up or still there's something to be addressed
still?


Thanks!

--
Qais Yousef
  
Waiman Long April 18, 2023, 2:31 p.m. UTC | #5
On 4/18/23 10:11, Qais Yousef wrote:
> On 03/29/23 14:55, Juri Lelli wrote:
>> Qais reported [1] that iterating over all tasks when rebuilding root
>> domains for finding out which ones are DEADLINE and need their bandwidth
>> correctly restored on such root domains can be a costly operation (10+
>> ms delays on suspend-resume). He proposed we skip rebuilding root
>> domains for certain operations, but that approach seemed arch specific
>> and possibly prone to errors, as paths that ultimately trigger a rebuild
>> might be quite convoluted (thanks Qais for spending time on this!).
>>
>> To fix the problem
>>
>>   01/06 - Rename functions deadline with DEADLINE accounting (cleanup
>>           suggested by Qais) - no functional change
>>   02/06 - Bring back cpuset_mutex (so that we have write access to cpusets
>>           from scheduler operations - and we also fix some problems
>>           associated to percpu_cpuset_rwsem)
>>   03/06 - Keep track of the number of DEADLINE tasks belonging to each cpuset
>>   04/06 - Create DL BW alloc, free & check overflow interface for bulk
>>           bandwidth allocation/removal - no functional change
>>   05/06 - Fix bandwidth allocation handling for cgroup operation
>>           involving multiple tasks
>>   06/06 - Use this information to only perform the costly iteration if
>>           DEADLINE tasks are actually present in the cpuset for which a
>>           corresponding root domain is being rebuilt
>>
>> With respect to the RFC posting [2]
>>
>>   1 - rename DEADLINE bandwidth accounting functions - Qais
>>   2 - call inc/dec_dl_tasks_cs from switched_{to,from}_dl - Qais
>>   3 - fix DEADLINE bandwidth allocation with multiple tasks - Waiman,
>>       contributed by Dietmar
>>
>> This set is also available from
>>
>> https://github.com/jlelli/linux.git deadline/rework-cpusets
> Is this just waiting to be picked up or still there's something to be addressed
> still?

There are some changes to cpuset code recently and so I believe that 
this patch series may need to be refreshed to reconcile the changes.

Thanks,
Longman
  
Juri Lelli April 20, 2023, 2:15 p.m. UTC | #6
On 18/04/23 10:31, Waiman Long wrote:
> On 4/18/23 10:11, Qais Yousef wrote:
> > On 03/29/23 14:55, Juri Lelli wrote:
> > > Qais reported [1] that iterating over all tasks when rebuilding root
> > > domains for finding out which ones are DEADLINE and need their bandwidth
> > > correctly restored on such root domains can be a costly operation (10+
> > > ms delays on suspend-resume). He proposed we skip rebuilding root
> > > domains for certain operations, but that approach seemed arch specific
> > > and possibly prone to errors, as paths that ultimately trigger a rebuild
> > > might be quite convoluted (thanks Qais for spending time on this!).
> > > 
> > > To fix the problem
> > > 
> > >   01/06 - Rename functions deadline with DEADLINE accounting (cleanup
> > >           suggested by Qais) - no functional change
> > >   02/06 - Bring back cpuset_mutex (so that we have write access to cpusets
> > >           from scheduler operations - and we also fix some problems
> > >           associated to percpu_cpuset_rwsem)
> > >   03/06 - Keep track of the number of DEADLINE tasks belonging to each cpuset
> > >   04/06 - Create DL BW alloc, free & check overflow interface for bulk
> > >           bandwidth allocation/removal - no functional change
> > >   05/06 - Fix bandwidth allocation handling for cgroup operation
> > >           involving multiple tasks
> > >   06/06 - Use this information to only perform the costly iteration if
> > >           DEADLINE tasks are actually present in the cpuset for which a
> > >           corresponding root domain is being rebuilt
> > > 
> > > With respect to the RFC posting [2]
> > > 
> > >   1 - rename DEADLINE bandwidth accounting functions - Qais
> > >   2 - call inc/dec_dl_tasks_cs from switched_{to,from}_dl - Qais
> > >   3 - fix DEADLINE bandwidth allocation with multiple tasks - Waiman,
> > >       contributed by Dietmar
> > > 
> > > This set is also available from
> > > 
> > > https://github.com/jlelli/linux.git deadline/rework-cpusets
> > Is this just waiting to be picked up or still there's something to be addressed
> > still?
> 
> There are some changes to cpuset code recently and so I believe that this
> patch series may need to be refreshed to reconcile the changes.

Yeah, will soon take a look.

Thanks!
Juri