[0/4] sched/psi: Allow unprivileged PSI polling

Message ID 20230309170756.52927-1-cerasuolodomenico@gmail.com
Series sched/psi: Allow unprivileged PSI polling

Message

Domenico Cerasuolo March 9, 2023, 5:07 p.m. UTC
PSI offers two mechanisms for getting information about pressure on a
specific resource. One is reading from /proc/pressure/<resource>, which
gives average pressures aggregated every 2s. The other is creating a
pollable fd for a specific resource and cgroup.
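
To illustrate the first mechanism, here is a minimal Python sketch that
reads and parses a pressure file. The helper names parse_psi_line and
read_pressure are hypothetical, and the sample line format follows the
kernel's PSI documentation (avg10/avg60/avg300 are percentages, total is
cumulative stall time in microseconds). It is an illustration, not part
of this series.

```python
import os


def parse_psi_line(line):
    """Parse one PSI line, e.g. 'some avg10=0.12 avg60=0.05 avg300=0.01 total=12345'.

    Returns (kind, values) where kind is 'some' or 'full'.
    """
    kind, *fields = line.split()
    values = {}
    for field in fields:
        key, _, raw = field.partition("=")
        # 'total' is an integer microsecond counter; the averages are floats.
        values[key] = int(raw) if key == "total" else float(raw)
    return kind, values


def read_pressure(resource, root="/proc/pressure"):
    """Return {'some': {...}, 'full': {...}} or None if PSI is not available.

    'root' is a hypothetical parameter so a cgroup directory could be
    used instead (the per-cgroup files are named <resource>.pressure).
    """
    path = os.path.join(root, resource)
    if not os.path.exists(path):
        return None
    with open(path) as f:
        return dict(parse_psi_line(line) for line in f if line.strip())


if __name__ == "__main__":
    print(read_pressure("memory"))
```

Reading the file needs no special privilege, which is why the 2s
averages are already usable from inside an unprivileged container.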

Trigger creation requires CAP_SYS_RESOURCE and lets the caller pick a
specific time window and threshold; it spawns an RT thread to aggregate
the data.
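
For readers unfamiliar with the trigger interface, the following Python
sketch shows how a trigger is typically registered and polled, based on
the interface described in Documentation/accounting/psi.rst. The helper
names are hypothetical. The 500ms..10s window limits come from that
document, and the "multiple of 2s" check reflects only what the title of
patch 4 suggests for unprivileged callers, so treat it as an assumption
rather than the final rule.

```python
import os
import select

MIN_WINDOW_US = 500_000      # 500 ms, lower window limit per psi.rst
MAX_WINDOW_US = 10_000_000   # 10 s, upper window limit per psi.rst
AVG_PERIOD_US = 2_000_000    # 2 s averaging period mentioned above


def format_trigger(kind, stall_us, window_us):
    """Build the '<some|full> <stall us> <window us>' trigger string."""
    if kind not in ("some", "full"):
        raise ValueError("kind must be 'some' or 'full'")
    if not MIN_WINDOW_US <= window_us <= MAX_WINDOW_US:
        raise ValueError("window out of range")
    if not 0 < stall_us <= window_us:
        raise ValueError("stall threshold must fit inside the window")
    # The psi.rst example writes the terminating NUL as well.
    return f"{kind} {stall_us} {window_us}\0".encode()


def window_ok_for_unprivileged(window_us):
    """Assumption drawn from the patch 4 title: windows must be N*2s."""
    return window_us > 0 and window_us % AVG_PERIOD_US == 0


def wait_for_pressure(path, trigger, timeout_ms):
    """Register a trigger on a pressure file and wait for one event.

    Returns True if the threshold was crossed, False on timeout.
    """
    fd = os.open(path, os.O_RDWR | os.O_NONBLOCK)
    try:
        os.write(fd, trigger)
        poller = select.poll()
        poller.register(fd, select.POLLPRI)
        for _, revents in poller.poll(timeout_ms):
            if revents & select.POLLERR:
                raise OSError("PSI event source is gone")
            if revents & select.POLLPRI:
                return True
        return False
    finally:
        os.close(fd)


if __name__ == "__main__":
    # 150 ms of partial stall within any 1 s window (example from psi.rst).
    trig = format_trigger("some", 150_000, 1_000_000)
    print(wait_for_pressure("/proc/pressure/memory", trig, 5000))
```

Without this series, the os.write() call is the step that fails for a
caller lacking CAP_SYS_RESOURCE.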

Systemd would like to provide containers the option to monitor pressure
on their own cgroup and sub-cgroups. For example, if systemd launches a
container that itself then launches services, the container should have
the ability to poll() for pressure in individual services. But neither
the container nor the services are privileged.

The series is implemented in 4 steps in order to reduce the noise of
the change.

Domenico Cerasuolo (4):
  sched/psi: rearrange polling code in preparation
  sched/psi: rename existing poll members in preparation
  sched/psi: extract update_triggers side effect
  sched/psi: allow unprivileged polling of N*2s period

 Documentation/accounting/psi.rst |   4 +
 include/linux/psi.h              |   2 +-
 include/linux/psi_types.h        |  43 ++--
 kernel/cgroup/cgroup.c           |   2 +-
 kernel/sched/psi.c               | 412 ++++++++++++++++---------------
 5 files changed, 250 insertions(+), 213 deletions(-)
  

Comments

Suren Baghdasaryan March 13, 2023, 3:29 p.m. UTC | #1
On Thu, Mar 9, 2023 at 9:08 AM Domenico Cerasuolo
<cerasuolodomenico@gmail.com> wrote:
>
> PSI offers 2 mechanisms to get information about a specific resource
> pressure. One is reading from /proc/pressure/<resource>, which gives
> average pressures aggregated every 2s. The other is creating a pollable
> fd for a specific resource and cgroup.
>
> The trigger creation requires CAP_SYS_RESOURCE, and gives the
> possibility to pick specific time window and threshold, spawning an RT
> thread to aggregate the data.
>
> Systemd would like to provide containers the option to monitor pressure
> on their own cgroup and sub-cgroups. For example, if systemd launches a
> container that itself then launches services, the container should have
> the ability to poll() for pressure in individual services. But neither
> the container nor the services are privileged.

This sounds like an interesting use case. I'll need to take a closer
look once I'm back from vacation later this week.
Thanks!

>
> The series is implemented in 4 steps in order to reduce the noise of
> the change.
>
> Domenico Cerasuolo (4):
>   sched/psi: rearrange polling code in preparation
>   sched/psi: rename existing poll members in preparation
>   sched/psi: extract update_triggers side effect
>   sched/psi: allow unprivileged polling of N*2s period
>
>  Documentation/accounting/psi.rst |   4 +
>  include/linux/psi.h              |   2 +-
>  include/linux/psi_types.h        |  43 ++--
>  kernel/cgroup/cgroup.c           |   2 +-
>  kernel/sched/psi.c               | 412 ++++++++++++++++---------------
>  5 files changed, 250 insertions(+), 213 deletions(-)
>
> --
> 2.34.1
>
  
Johannes Weiner March 14, 2023, 4:10 p.m. UTC | #2
On Mon, Mar 13, 2023 at 08:29:37AM -0700, Suren Baghdasaryan wrote:
> On Thu, Mar 9, 2023 at 9:08 AM Domenico Cerasuolo
> <cerasuolodomenico@gmail.com> wrote:
> >
> > PSI offers 2 mechanisms to get information about a specific resource
> > pressure. One is reading from /proc/pressure/<resource>, which gives
> > average pressures aggregated every 2s. The other is creating a pollable
> > fd for a specific resource and cgroup.
> >
> > The trigger creation requires CAP_SYS_RESOURCE, and gives the
> > possibility to pick specific time window and threshold, spawning an RT
> > thread to aggregate the data.
> >
> > Systemd would like to provide containers the option to monitor pressure
> > on their own cgroup and sub-cgroups. For example, if systemd launches a
> > container that itself then launches services, the container should have
> > the ability to poll() for pressure in individual services. But neither
> > the container nor the services are privileged.
> 
> This sounds like an interesting usecase. I'll need to take a closer
> look once I'm back from vacation later this week.
> Thanks!

Thanks, Suren!

There is also the desktop monitoring use case that Chris Down
inquired about a while back:

https://lore.kernel.org/all/CAJuCfpGnJBEvQTUeJ_U6+rHmPcMjw_pPL+QFj7Sec5fHZPH67w@mail.gmail.com/T/

The patches should help with that as well.