[bpf-next,v2,0/6] Add Open-coded process and css iters

Message ID 20230912070149.969939-1-zhouchuyi@bytedance.com
Headers
Series Add Open-coded process and css iters |

Message

Chuyi Zhou Sept. 12, 2023, 7:01 a.m. UTC
  Hi,

This is version 2 of process and css iters support. All the changes were
suggested by Alexei.

Thanks for your review!

--- Changelog ---
Changes from v1:
- Add a pre-patch to make some preparations before supporting css_task
  iters.
- Add an allowlist for css_task iters
- Let bpf progs do explicit bpf_rcu_read_lock() when using process iters
and css_descendant iters.
---------------------

In some BPF usage scenarios, it will be useful to iterate the process and
css directly in the BPF program. One of the expected scenarios is
customizable OOM victim selection via BPF[1].

Inspired by Dave's task_vma iter[2], this patchset adds three types of
open-coded iterator kfuncs:

1. bpf_for_each(process, p). Just like for_each_process(p) in kernel to
itearing all tasks in the system.

2. bpf_for_each(css_task, task, css). It works like
css_task_iter_{start, next, end} and would be used to iterating
tasks/threads under a css.

3. bpf_for_each(css_{post, pre}, pos, root_css). It works like
css_next_descendant_{pre, post} to iterating all descendant css.

BPF programs can use these kfuncs directly or through bpf_for_each macro.

link[1]: https://lore.kernel.org/lkml/20230810081319.65668-1-zhouchuyi@bytedance.com/
link[2]: https://lore.kernel.org/all/20230810183513.684836-1-davemarchevsky@fb.com/

Chuyi Zhou (6):
  cgroup: Prepare for using css_task_iter_*() in BPF
  bpf: Introduce css_task open-coded iterator kfuncs
  bpf: Introduce process open coded iterator kfuncs
  bpf: Introduce css_descendant open-coded iterator kfuncs
  bpf: teach the verifier to enforce css_iter and process_iter in RCU CS
  selftests/bpf: Add tests for open-coded task and css iter

 include/linux/cgroup.h                        |  12 +-
 include/uapi/linux/bpf.h                      |  16 ++
 kernel/bpf/helpers.c                          |  12 ++
 kernel/bpf/task_iter.c                        | 130 +++++++++++++++++
 kernel/bpf/verifier.c                         |  53 ++++++-
 kernel/cgroup/cgroup.c                        |  18 ++-
 tools/include/uapi/linux/bpf.h                |  16 ++
 tools/lib/bpf/bpf_helpers.h                   |  24 +++
 .../testing/selftests/bpf/prog_tests/iters.c  | 138 ++++++++++++++++++
 .../testing/selftests/bpf/progs/iters_task.c  | 104 +++++++++++++
 10 files changed, 508 insertions(+), 15 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/progs/iters_task.c
  

Comments

Chuyi Zhou Sept. 12, 2023, 7:11 a.m. UTC | #1
在 2023/9/12 15:01, Chuyi Zhou 写道:
> Hi,
> 
> This is version 2 of process and css iters support. All the changes were
> suggested by Alexei.
> 
> Thanks for your review!
> 
> --- Changelog ---
> Changes from v1:
> - Add a pre-patch to make some preparations before supporting css_task
>    iters.
> - Add an allowlist for css_task iters
> - Let bpf progs do explicit bpf_rcu_read_lock() when using process iters
> and css_descendant iters.

Sorry for missing the link to v1 
(https://lore.kernel.org/lkml/20230827072057.1591929-1-zhouchuyi@bytedance.com/).


> ---------------------
> 
> In some BPF usage scenarios, it will be useful to iterate the process and
> css directly in the BPF program. One of the expected scenarios is
> customizable OOM victim selection via BPF[1].
> 
> Inspired by Dave's task_vma iter[2], this patchset adds three types of
> open-coded iterator kfuncs:
> 
> 1. bpf_for_each(process, p). Just like for_each_process(p) in kernel to
> itearing all tasks in the system.
> 
> 2. bpf_for_each(css_task, task, css). It works like
> css_task_iter_{start, next, end} and would be used to iterating
> tasks/threads under a css.
> 
> 3. bpf_for_each(css_{post, pre}, pos, root_css). It works like
> css_next_descendant_{pre, post} to iterating all descendant css.
> 
> BPF programs can use these kfuncs directly or through bpf_for_each macro.
> 
> link[1]: https://lore.kernel.org/lkml/20230810081319.65668-1-zhouchuyi@bytedance.com/
> link[2]: https://lore.kernel.org/all/20230810183513.684836-1-davemarchevsky@fb.com/
> 
> Chuyi Zhou (6):
>    cgroup: Prepare for using css_task_iter_*() in BPF
>    bpf: Introduce css_task open-coded iterator kfuncs
>    bpf: Introduce process open coded iterator kfuncs
>    bpf: Introduce css_descendant open-coded iterator kfuncs
>    bpf: teach the verifier to enforce css_iter and process_iter in RCU CS
>    selftests/bpf: Add tests for open-coded task and css iter
> 
>   include/linux/cgroup.h                        |  12 +-
>   include/uapi/linux/bpf.h                      |  16 ++
>   kernel/bpf/helpers.c                          |  12 ++
>   kernel/bpf/task_iter.c                        | 130 +++++++++++++++++
>   kernel/bpf/verifier.c                         |  53 ++++++-
>   kernel/cgroup/cgroup.c                        |  18 ++-
>   tools/include/uapi/linux/bpf.h                |  16 ++
>   tools/lib/bpf/bpf_helpers.h                   |  24 +++
>   .../testing/selftests/bpf/prog_tests/iters.c  | 138 ++++++++++++++++++
>   .../testing/selftests/bpf/progs/iters_task.c  | 104 +++++++++++++
>   10 files changed, 508 insertions(+), 15 deletions(-)
>   create mode 100644 tools/testing/selftests/bpf/progs/iters_task.c
>
  
Andrii Nakryiko Sept. 14, 2023, 11:26 p.m. UTC | #2
On Tue, Sep 12, 2023 at 12:02 AM Chuyi Zhou <zhouchuyi@bytedance.com> wrote:
>
> This Patch adds kfuncs bpf_iter_css_{pre,post}_{new,next,destroy} which
> allow creation and manipulation of struct bpf_iter_css in open-coded
> iterator style. These kfuncs actually wrapps css_next_descendant_{pre,
> post}. BPF programs can use these kfuncs through bpf_for_each macro for
> iteration of all descendant css under a root css.
>
> Signed-off-by: Chuyi Zhou <zhouchuyi@bytedance.com>
> ---
>  include/uapi/linux/bpf.h       |  8 +++++
>  kernel/bpf/helpers.c           |  6 ++++
>  kernel/bpf/task_iter.c         | 53 ++++++++++++++++++++++++++++++++++
>  tools/include/uapi/linux/bpf.h |  8 +++++
>  tools/lib/bpf/bpf_helpers.h    | 12 ++++++++
>  5 files changed, 87 insertions(+)
>
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index befa55b52e29..57760afc13d0 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -7326,4 +7326,12 @@ struct bpf_iter_process {
>         __u64 __opaque[1];
>  } __attribute__((aligned(8)));
>
> +struct bpf_iter_css_pre {
> +       __u64 __opaque[2];
> +} __attribute__((aligned(8)));
> +
> +struct bpf_iter_css_post {
> +       __u64 __opaque[2];
> +} __attribute__((aligned(8)));
> +
>  #endif /* _UAPI__LINUX_BPF_H__ */
> diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
> index 9b7d2c6f99d1..ca1f6404af9e 100644
> --- a/kernel/bpf/helpers.c
> +++ b/kernel/bpf/helpers.c
> @@ -2510,6 +2510,12 @@ BTF_ID_FLAGS(func, bpf_iter_css_task_destroy, KF_ITER_DESTROY)
>  BTF_ID_FLAGS(func, bpf_iter_process_new, KF_ITER_NEW)
>  BTF_ID_FLAGS(func, bpf_iter_process_next, KF_ITER_NEXT | KF_RET_NULL)
>  BTF_ID_FLAGS(func, bpf_iter_process_destroy, KF_ITER_DESTROY)
> +BTF_ID_FLAGS(func, bpf_iter_css_pre_new, KF_ITER_NEW)
> +BTF_ID_FLAGS(func, bpf_iter_css_pre_next, KF_ITER_NEXT | KF_RET_NULL)
> +BTF_ID_FLAGS(func, bpf_iter_css_pre_destroy, KF_ITER_DESTROY)
> +BTF_ID_FLAGS(func, bpf_iter_css_post_new, KF_ITER_NEW)
> +BTF_ID_FLAGS(func, bpf_iter_css_post_next, KF_ITER_NEXT | KF_RET_NULL)
> +BTF_ID_FLAGS(func, bpf_iter_css_post_destroy, KF_ITER_DESTROY)
>  BTF_ID_FLAGS(func, bpf_dynptr_adjust)
>  BTF_ID_FLAGS(func, bpf_dynptr_is_null)
>  BTF_ID_FLAGS(func, bpf_dynptr_is_rdonly)
> diff --git a/kernel/bpf/task_iter.c b/kernel/bpf/task_iter.c
> index 9d1927dc3a06..8963fc779b87 100644
> --- a/kernel/bpf/task_iter.c
> +++ b/kernel/bpf/task_iter.c
> @@ -880,6 +880,59 @@ __bpf_kfunc void bpf_iter_process_destroy(struct bpf_iter_process *it)
>  {
>  }
>
> +struct bpf_iter_css_kern {
> +       struct cgroup_subsys_state *root;
> +       struct cgroup_subsys_state *pos;
> +} __attribute__((aligned(8)));
> +
> +__bpf_kfunc int bpf_iter_css_pre_new(struct bpf_iter_css_pre *it,
> +               struct cgroup_subsys_state *root)

similar to my comment on previous patches, please see
kernel/bpf/cgroup_iter.c for iter/cgroup iterator program. Let's stay
consistent. We have one iterator that accepts parameters defining
iteration order and starting cgroup. Unless there are some technical
reasons we can't follow similar approach with this open-coded iter,
let's use the same approach. We can even reuse
BPF_CGROUP_ITER_DESCENDANTS_PRE, BPF_CGROUP_ITER_DESCENDANTS_POST,
BPF_CGROUP_ITER_ANCESTORS_UP enums.


> +{
> +       struct bpf_iter_css_kern *kit = (void *)it;
> +
> +       BUILD_BUG_ON(sizeof(struct bpf_iter_css_kern) != sizeof(struct bpf_iter_css_pre));
> +       BUILD_BUG_ON(__alignof__(struct bpf_iter_css_kern) != __alignof__(struct bpf_iter_css_pre));
> +       kit->root = root;
> +       kit->pos = NULL;
> +       return 0;
> +}
> +
> +__bpf_kfunc struct cgroup_subsys_state *bpf_iter_css_pre_next(struct bpf_iter_css_pre *it)
> +{
> +       struct bpf_iter_css_kern *kit = (void *)it;
> +
> +       kit->pos = css_next_descendant_pre(kit->pos, kit->root);
> +       return kit->pos;
> +}
> +
> +__bpf_kfunc void bpf_iter_css_pre_destroy(struct bpf_iter_css_pre *it)
> +{
> +}
> +
> +__bpf_kfunc int bpf_iter_css_post_new(struct bpf_iter_css_post *it,
> +               struct cgroup_subsys_state *root)
> +{
> +       struct bpf_iter_css_kern *kit = (void *)it;
> +
> +       BUILD_BUG_ON(sizeof(struct bpf_iter_css_kern) != sizeof(struct bpf_iter_css_post));
> +       BUILD_BUG_ON(__alignof__(struct bpf_iter_css_kern) != __alignof__(struct bpf_iter_css_post));
> +       kit->root = root;
> +       kit->pos = NULL;
> +       return 0;
> +}
> +
> +__bpf_kfunc struct cgroup_subsys_state *bpf_iter_css_post_next(struct bpf_iter_css_post *it)
> +{
> +       struct bpf_iter_css_kern *kit = (void *)it;
> +
> +       kit->pos = css_next_descendant_post(kit->pos, kit->root);
> +       return kit->pos;
> +}
> +
> +__bpf_kfunc void bpf_iter_css_post_destroy(struct bpf_iter_css_post *it)
> +{
> +}
> +
>  DEFINE_PER_CPU(struct mmap_unlock_irq_work, mmap_unlock_work);
>
>  static void do_mmap_read_unlock(struct irq_work *entry)
> diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
> index befa55b52e29..57760afc13d0 100644
> --- a/tools/include/uapi/linux/bpf.h
> +++ b/tools/include/uapi/linux/bpf.h
> @@ -7326,4 +7326,12 @@ struct bpf_iter_process {
>         __u64 __opaque[1];
>  } __attribute__((aligned(8)));
>
> +struct bpf_iter_css_pre {
> +       __u64 __opaque[2];
> +} __attribute__((aligned(8)));
> +
> +struct bpf_iter_css_post {
> +       __u64 __opaque[2];
> +} __attribute__((aligned(8)));
> +
>  #endif /* _UAPI__LINUX_BPF_H__ */
> diff --git a/tools/lib/bpf/bpf_helpers.h b/tools/lib/bpf/bpf_helpers.h
> index 858252c2641c..6e5bd9ef14d6 100644
> --- a/tools/lib/bpf/bpf_helpers.h
> +++ b/tools/lib/bpf/bpf_helpers.h
> @@ -315,6 +315,18 @@ extern int bpf_iter_process_new(struct bpf_iter_process *it) __weak __ksym;
>  extern struct task_struct *bpf_iter_process_next(struct bpf_iter_process *it) __weak __ksym;
>  extern void bpf_iter_process_destroy(struct bpf_iter_process *it) __weak __ksym;
>
> +struct bpf_iter_css_pre;
> +extern int bpf_iter_css_pre_new(struct bpf_iter_css_pre *it,
> +               struct cgroup_subsys_state *root) __weak __ksym;
> +extern struct cgroup_subsys_state *bpf_iter_css_pre_next(struct bpf_iter_css_pre *it) __weak __ksym;
> +extern void bpf_iter_css_pre_destroy(struct bpf_iter_css_pre *it) __weak __ksym;
> +
> +struct bpf_iter_css_post;
> +extern int bpf_iter_css_post_new(struct bpf_iter_css_post *it,
> +               struct cgroup_subsys_state *root) __weak __ksym;
> +extern struct cgroup_subsys_state *bpf_iter_css_post_next(struct bpf_iter_css_post *it) __weak __ksym;
> +extern void bpf_iter_css_post_destroy(struct bpf_iter_css_post *it) __weak __ksym;
> +
>  #ifndef bpf_for_each
>  /* bpf_for_each(iter_type, cur_elem, args...) provides generic construct for
>   * using BPF open-coded iterators without having to write mundane explicit
> --
> 2.20.1
>
  
Chuyi Zhou Sept. 15, 2023, 11:57 a.m. UTC | #3
Hello.

在 2023/9/15 07:26, Andrii Nakryiko 写道:
> On Tue, Sep 12, 2023 at 12:02 AM Chuyi Zhou <zhouchuyi@bytedance.com> wrote:
>>
>> This Patch adds kfuncs bpf_iter_css_{pre,post}_{new,next,destroy} which
>> allow creation and manipulation of struct bpf_iter_css in open-coded
>> iterator style. These kfuncs actually wrapps css_next_descendant_{pre,
>> post}. BPF programs can use these kfuncs through bpf_for_each macro for
>> iteration of all descendant css under a root css.
>>
>> Signed-off-by: Chuyi Zhou <zhouchuyi@bytedance.com>
>> ---
>>   include/uapi/linux/bpf.h       |  8 +++++
>>   kernel/bpf/helpers.c           |  6 ++++
>>   kernel/bpf/task_iter.c         | 53 ++++++++++++++++++++++++++++++++++
>>   tools/include/uapi/linux/bpf.h |  8 +++++
>>   tools/lib/bpf/bpf_helpers.h    | 12 ++++++++
>>   5 files changed, 87 insertions(+)
>>
>> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
>> index befa55b52e29..57760afc13d0 100644
>> --- a/include/uapi/linux/bpf.h
>> +++ b/include/uapi/linux/bpf.h
>> @@ -7326,4 +7326,12 @@ struct bpf_iter_process {
>>          __u64 __opaque[1];
>>   } __attribute__((aligned(8)));
>>
>> +struct bpf_iter_css_pre {
>> +       __u64 __opaque[2];
>> +} __attribute__((aligned(8)));
>> +
>> +struct bpf_iter_css_post {
>> +       __u64 __opaque[2];
>> +} __attribute__((aligned(8)));
>> +
>>   #endif /* _UAPI__LINUX_BPF_H__ */
>> diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
>> index 9b7d2c6f99d1..ca1f6404af9e 100644
>> --- a/kernel/bpf/helpers.c
>> +++ b/kernel/bpf/helpers.c
>> @@ -2510,6 +2510,12 @@ BTF_ID_FLAGS(func, bpf_iter_css_task_destroy, KF_ITER_DESTROY)
>>   BTF_ID_FLAGS(func, bpf_iter_process_new, KF_ITER_NEW)
>>   BTF_ID_FLAGS(func, bpf_iter_process_next, KF_ITER_NEXT | KF_RET_NULL)
>>   BTF_ID_FLAGS(func, bpf_iter_process_destroy, KF_ITER_DESTROY)
>> +BTF_ID_FLAGS(func, bpf_iter_css_pre_new, KF_ITER_NEW)
>> +BTF_ID_FLAGS(func, bpf_iter_css_pre_next, KF_ITER_NEXT | KF_RET_NULL)
>> +BTF_ID_FLAGS(func, bpf_iter_css_pre_destroy, KF_ITER_DESTROY)
>> +BTF_ID_FLAGS(func, bpf_iter_css_post_new, KF_ITER_NEW)
>> +BTF_ID_FLAGS(func, bpf_iter_css_post_next, KF_ITER_NEXT | KF_RET_NULL)
>> +BTF_ID_FLAGS(func, bpf_iter_css_post_destroy, KF_ITER_DESTROY)
>>   BTF_ID_FLAGS(func, bpf_dynptr_adjust)
>>   BTF_ID_FLAGS(func, bpf_dynptr_is_null)
>>   BTF_ID_FLAGS(func, bpf_dynptr_is_rdonly)
>> diff --git a/kernel/bpf/task_iter.c b/kernel/bpf/task_iter.c
>> index 9d1927dc3a06..8963fc779b87 100644
>> --- a/kernel/bpf/task_iter.c
>> +++ b/kernel/bpf/task_iter.c
>> @@ -880,6 +880,59 @@ __bpf_kfunc void bpf_iter_process_destroy(struct bpf_iter_process *it)
>>   {
>>   }
>>
>> +struct bpf_iter_css_kern {
>> +       struct cgroup_subsys_state *root;
>> +       struct cgroup_subsys_state *pos;
>> +} __attribute__((aligned(8)));
>> +
>> +__bpf_kfunc int bpf_iter_css_pre_new(struct bpf_iter_css_pre *it,
>> +               struct cgroup_subsys_state *root)
> 
> similar to my comment on previous patches, please see
> kernel/bpf/cgroup_iter.c for iter/cgroup iterator program. Let's stay
> consistent. We have one iterator that accepts parameters defining
> iteration order and starting cgroup. Unless there are some technical
> reasons we can't follow similar approach with this open-coded iter,
> let's use the same approach. We can even reuse
> BPF_CGROUP_ITER_DESCENDANTS_PRE, BPF_CGROUP_ITER_DESCENDANTS_POST,
> BPF_CGROUP_ITER_ANCESTORS_UP enums.
> 

I know your concern. It would be nice if we keep consistent with 
kernel/bpf/cgroup_iter.c

But this patch actually want to support iterating css 
(cgroup_subsys_state) not cgroup (css is more low lever).
With css_iter we can do something like 
"for_each_mem_cgroup_tree/cpuset_for_each_descendant_pre"
in BPF Progs which is hard for cgroup_iter. In the future we can use 
this iterator to plug some customizable policy in other resource control 
system.

BTW, what I did in RFC actually very similar with the approach of 
cgroup_iter. 
(https://lore.kernel.org/all/20230827072057.1591929-4-zhouchuyi@bytedance.com/).

Thanks.
  
Andrii Nakryiko Sept. 15, 2023, 8:25 p.m. UTC | #4
On Fri, Sep 15, 2023 at 4:57 AM Chuyi Zhou <zhouchuyi@bytedance.com> wrote:
>
> Hello.
>
> 在 2023/9/15 07:26, Andrii Nakryiko 写道:
> > On Tue, Sep 12, 2023 at 12:02 AM Chuyi Zhou <zhouchuyi@bytedance.com> wrote:
> >>
> >> This Patch adds kfuncs bpf_iter_css_{pre,post}_{new,next,destroy} which
> >> allow creation and manipulation of struct bpf_iter_css in open-coded
> >> iterator style. These kfuncs actually wrapps css_next_descendant_{pre,
> >> post}. BPF programs can use these kfuncs through bpf_for_each macro for
> >> iteration of all descendant css under a root css.
> >>
> >> Signed-off-by: Chuyi Zhou <zhouchuyi@bytedance.com>
> >> ---
> >>   include/uapi/linux/bpf.h       |  8 +++++
> >>   kernel/bpf/helpers.c           |  6 ++++
> >>   kernel/bpf/task_iter.c         | 53 ++++++++++++++++++++++++++++++++++
> >>   tools/include/uapi/linux/bpf.h |  8 +++++
> >>   tools/lib/bpf/bpf_helpers.h    | 12 ++++++++
> >>   5 files changed, 87 insertions(+)
> >>
> >> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> >> index befa55b52e29..57760afc13d0 100644
> >> --- a/include/uapi/linux/bpf.h
> >> +++ b/include/uapi/linux/bpf.h
> >> @@ -7326,4 +7326,12 @@ struct bpf_iter_process {
> >>          __u64 __opaque[1];
> >>   } __attribute__((aligned(8)));
> >>
> >> +struct bpf_iter_css_pre {
> >> +       __u64 __opaque[2];
> >> +} __attribute__((aligned(8)));
> >> +
> >> +struct bpf_iter_css_post {
> >> +       __u64 __opaque[2];
> >> +} __attribute__((aligned(8)));
> >> +
> >>   #endif /* _UAPI__LINUX_BPF_H__ */
> >> diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
> >> index 9b7d2c6f99d1..ca1f6404af9e 100644
> >> --- a/kernel/bpf/helpers.c
> >> +++ b/kernel/bpf/helpers.c
> >> @@ -2510,6 +2510,12 @@ BTF_ID_FLAGS(func, bpf_iter_css_task_destroy, KF_ITER_DESTROY)
> >>   BTF_ID_FLAGS(func, bpf_iter_process_new, KF_ITER_NEW)
> >>   BTF_ID_FLAGS(func, bpf_iter_process_next, KF_ITER_NEXT | KF_RET_NULL)
> >>   BTF_ID_FLAGS(func, bpf_iter_process_destroy, KF_ITER_DESTROY)
> >> +BTF_ID_FLAGS(func, bpf_iter_css_pre_new, KF_ITER_NEW)
> >> +BTF_ID_FLAGS(func, bpf_iter_css_pre_next, KF_ITER_NEXT | KF_RET_NULL)
> >> +BTF_ID_FLAGS(func, bpf_iter_css_pre_destroy, KF_ITER_DESTROY)
> >> +BTF_ID_FLAGS(func, bpf_iter_css_post_new, KF_ITER_NEW)
> >> +BTF_ID_FLAGS(func, bpf_iter_css_post_next, KF_ITER_NEXT | KF_RET_NULL)
> >> +BTF_ID_FLAGS(func, bpf_iter_css_post_destroy, KF_ITER_DESTROY)
> >>   BTF_ID_FLAGS(func, bpf_dynptr_adjust)
> >>   BTF_ID_FLAGS(func, bpf_dynptr_is_null)
> >>   BTF_ID_FLAGS(func, bpf_dynptr_is_rdonly)
> >> diff --git a/kernel/bpf/task_iter.c b/kernel/bpf/task_iter.c
> >> index 9d1927dc3a06..8963fc779b87 100644
> >> --- a/kernel/bpf/task_iter.c
> >> +++ b/kernel/bpf/task_iter.c
> >> @@ -880,6 +880,59 @@ __bpf_kfunc void bpf_iter_process_destroy(struct bpf_iter_process *it)
> >>   {
> >>   }
> >>
> >> +struct bpf_iter_css_kern {
> >> +       struct cgroup_subsys_state *root;
> >> +       struct cgroup_subsys_state *pos;
> >> +} __attribute__((aligned(8)));
> >> +
> >> +__bpf_kfunc int bpf_iter_css_pre_new(struct bpf_iter_css_pre *it,
> >> +               struct cgroup_subsys_state *root)
> >
> > similar to my comment on previous patches, please see
> > kernel/bpf/cgroup_iter.c for iter/cgroup iterator program. Let's stay
> > consistent. We have one iterator that accepts parameters defining
> > iteration order and starting cgroup. Unless there are some technical
> > reasons we can't follow similar approach with this open-coded iter,
> > let's use the same approach. We can even reuse
> > BPF_CGROUP_ITER_DESCENDANTS_PRE, BPF_CGROUP_ITER_DESCENDANTS_POST,
> > BPF_CGROUP_ITER_ANCESTORS_UP enums.
> >
>
> I know your concern. It would be nice if we keep consistent with
> kernel/bpf/cgroup_iter.c
>
> But this patch actually want to support iterating css
> (cgroup_subsys_state) not cgroup (css is more low lever).
> With css_iter we can do something like
> "for_each_mem_cgroup_tree/cpuset_for_each_descendant_pre"
> in BPF Progs which is hard for cgroup_iter. In the future we can use
> this iterator to plug some customizable policy in other resource control
> system.

That's fine if it's not exactly cgroup iter and returns a different
kernel object. But let's at least consistently use
BPF_CGROUP_ITER_DESCENDANTS_PRE/BPF_CGROUP_ITER_DESCENDANTS_POST/BPF_CGROUP_ITER_ANCESTORS_UP
approach as a way to specify iteration order?

>
> BTW, what I did in RFC actually very similar with the approach of
> cgroup_iter.
> (https://lore.kernel.org/all/20230827072057.1591929-4-zhouchuyi@bytedance.com/).
>
> Thanks.