[bpf-next,v1,0/4] Add BPF htab map's used size for monitoring


Message ID

20221105025146.238209-1-horenchuang@bytedance.com

Headers

Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org
 designates 2620:137:e000::1:20 as permitted sender)
 client-ip=2620:137:e000::1:20;
From: "Ho-Ren (Jack) Chuang" <horenchuang@bytedance.com>
To: Alexei Starovoitov <ast@kernel.org>,
        Alexei Starovoitov <alexei.starovoitov@gmail.com>,
        Hao Luo <haoluo@google.com>, Jiri Olsa <jolsa@kernel.org>,
        Jiri Olsa <olsajiri@gmail.com>,
        Andrii Nakryiko <andrii@kernel.org>,
        Daniel Borkmann <daniel@iogearbox.net>,
        John Fastabend <john.fastabend@gmail.com>,
        Martin KaFai Lau <martin.lau@linux.dev>,
        Song Liu <song@kernel.org>, Yonghong Song <yhs@fb.com>,
        KP Singh <kpsingh@kernel.org>,
        Stanislav Fomichev <sdf@google.com>,
        Quentin Monnet <quentin@isovalent.com>,
        Mykola Lysenko <mykolal@fb.com>, Shuah Khan <shuah@kernel.org>,
        Nathan Chancellor <nathan@kernel.org>,
        Nick Desaulniers <ndesaulniers@google.com>,
        Tom Rix <trix@redhat.com>,
        Joanne Koong <joannelkoong@gmail.com>,
        Kui-Feng Lee <kuifeng@fb.com>,
        Lorenzo Bianconi <lorenzo@kernel.org>,
        Maxim Mikityanskiy <maximmi@nvidia.com>,
        Hao Xiang <hao.xiang@bytedance.com>,
        Punit Agrawal <punit.agrawal@bytedance.com>,
        Yifei Ma <yifeima@bytedance.com>,
        Xiaoning Ding <xiaoning.ding@bytedance.com>,
        bpf@vger.kernel.org
Cc: Ho-Ren Chuang <horenc@vt.edu>,
        Ho-Ren Chuang <horenchuang@bytedance.com>,
        linux-kernel@vger.kernel.org, linux-kselftest@vger.kernel.org,
        llvm@lists.linux.dev
Subject: [PATCH bpf-next v1 0/4] Add BPF htab map's used size for monitoring
Date: Sat,  5 Nov 2022 02:51:42 +0000
Message-Id: <20221105025146.238209-1-horenchuang@bytedance.com>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Precedence: bulk

Series

Add BPF htab map's used size for monitoring

Message

Ho-Ren (Jack) Chuang Nov. 5, 2022, 2:51 a.m. UTC

Hello everyone,

We have prepared patches to address an issue from a previous discussion.
The previous discussion email thread is here: https://lore.kernel.org/all/CAADnVQLBt0snxv4bKwg1WKQ9wDFbaDCtZ03v1-LjOTYtsKPckQ@mail.gmail.com/

This patch series adds a new field "used_entries" to struct bpf_map_info
and keeps tracking the "count" field in bpf_htab in both the preallocated
and non-preallocated cases.

bpftool is modified to report the newly added "used_entries" field in
struct bpf_map_info and to mark pre-allocated htab maps with "*".
Together, these changes make it easier to see how much of a hash map is
currently in use.

We have added a new interface function map_get_used_elem in bpf_map_ops
to provide an abstraction layer so that other map type implementations can
support the "used_entries" attribute in a future change.
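As a rough illustration of the abstraction described above, here is a
minimal userspace sketch of an optional per-map-type callback feeding a
"used_entries" field. Only the names map_get_used_elem and used_entries
come from this cover letter; the model_* types, the callback signature,
and the fallback value of 0 are assumptions made for the example and are
not the kernel's definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace model; NOT the kernel's struct definitions. */
struct model_map;

struct model_map_ops {
	/* Optional callback: number of elements currently in use. */
	uint32_t (*map_get_used_elem)(const struct model_map *map);
};

struct model_map {
	const struct model_map_ops *ops;
	uint32_t max_entries;
	uint32_t count; /* maintained by the hash-table model only */
};

struct model_map_info {
	uint32_t max_entries;
	uint32_t used_entries; /* assumed to be 0 when not reported */
};

static uint32_t htab_get_used_elem(const struct model_map *map)
{
	return map->count;
}

/* A map type that implements the callback. */
static const struct model_map_ops htab_ops = {
	.map_get_used_elem = htab_get_used_elem,
};

/* A map type that does not implement it. */
static const struct model_map_ops array_ops = {
	.map_get_used_elem = NULL,
};

/* Fill the info struct, calling the callback only if it is provided. */
static void model_fill_info(const struct model_map *map,
			    struct model_map_info *info)
{
	assert(map && map->ops && info);
	info->max_entries = map->max_entries;
	info->used_entries = map->ops->map_get_used_elem ?
			     map->ops->map_get_used_elem(map) : 0;
}
```

The point of the indirection is that the info-filling path does not need
to know how a given map type derives its count, or whether it can.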

A concurrency test for pre-allocated and dynamically allocated
htab maps is introduced to check the correctness and performance of
the htab map's used-size tracking.

Existing unit tests are extended to verify the correctness of the
htab map's used size.

Thank you,

Ho-Ren (Jack) Chuang (4):
  bpf: Support reporting BPF htab map's used size for monitoring
  bpftool: Add tools support to show BPF htab map's used size
  samples/bpf: Add concurrency testing for BPF htab map's used size
  selftests/bpf: Add unit tests for BPF htab map's used size

 include/linux/bpf.h                     |   1 +
 include/uapi/linux/bpf.h                |   1 +
 kernel/bpf/hashtab.c                    |  19 +++
 kernel/bpf/syscall.c                    |   2 +
 samples/bpf/Makefile                    |   4 +
 samples/bpf/test_map_used_kern.c        |  65 ++++++++
 samples/bpf/test_map_used_user.c        | 204 ++++++++++++++++++++++++
 tools/bpf/bpftool/map.c                 |   9 +-
 tools/include/uapi/linux/bpf.h          |   1 +
 tools/testing/selftests/bpf/test_maps.c |  74 ++++++++-
 10 files changed, 377 insertions(+), 3 deletions(-)
 create mode 100644 samples/bpf/test_map_used_kern.c
 create mode 100644 samples/bpf/test_map_used_user.c

Comments

Alexei Starovoitov Nov. 5, 2022, 4:20 p.m. UTC | #1

On Fri, Nov 4, 2022 at 7:52 PM Ho-Ren (Jack) Chuang
<horenchuang@bytedance.com> wrote:
>
> Hello everyone,
>
> We have prepared patches to address an issue from a previous discussion.
> The previous discussion email thread is here: https://lore.kernel.org/all/CAADnVQLBt0snxv4bKwg1WKQ9wDFbaDCtZ03v1-LjOTYtsKPckQ@mail.gmail.com/

Rephrasing what was said earlier.
We're not keeping the count of elements in a preallocated hash map
and we are not going to add one.
The bpf prog needs to do the accounting on its own if it needs
this kind of statistics.
Keeping the count for non-prealloc is already significant performance
overhead. We don't trade performance for stats.
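To make "do the accounting on its own" concrete, here is a minimal
userspace sketch in which the caller keeps its own live-element counter
next to the table. Everything in it (the counted_table type, the fixed
capacity, the linear scan) is hypothetical and stands in for a BPF
program's own bookkeeping; it is not code from the patch series or from
the kernel. Only the counter is atomic; the toy table itself is
single-threaded.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_CAPACITY 8

/* Hypothetical model: a tiny table plus a caller-maintained counter. */
struct counted_table {
	uint32_t keys[MODEL_CAPACITY];
	bool used[MODEL_CAPACITY];
	atomic_long live; /* number of keys currently stored */
};

static void counted_init(struct counted_table *t)
{
	for (int i = 0; i < MODEL_CAPACITY; i++)
		t->used[i] = false;
	atomic_init(&t->live, 0);
}

/* Return the slot holding key, or -1 if the key is absent. */
static int find_slot(const struct counted_table *t, uint32_t key)
{
	for (int i = 0; i < MODEL_CAPACITY; i++)
		if (t->used[i] && t->keys[i] == key)
			return i;
	return -1;
}

/* Insert key; the counter moves only when a new key is actually added. */
static bool counted_insert(struct counted_table *t, uint32_t key)
{
	if (find_slot(t, key) >= 0)
		return true; /* existing key: count unchanged */
	for (int i = 0; i < MODEL_CAPACITY; i++) {
		if (!t->used[i]) {
			t->used[i] = true;
			t->keys[i] = key;
			atomic_fetch_add(&t->live, 1);
			return true;
		}
	}
	return false; /* table full */
}

/* Delete key; the counter moves only when a key was actually removed. */
static bool counted_delete(struct counted_table *t, uint32_t key)
{
	int i = find_slot(t, key);

	if (i < 0)
		return false;
	t->used[i] = false;
	atomic_fetch_sub(&t->live, 1);
	return true;
}
```

The detail that matters is that the counter changes only on a real
transition (new key added, existing key removed), so repeated updates of
the same key or deletes of a missing key do not skew it.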

Hao Xiang Nov. 8, 2022, 12:30 a.m. UTC | #2

Hi Alexei,

We understand the concern about the added performance overhead. We had some
discussion about this while working on the patch and decided to give
it a try (my bad).

Adding some more context. We are leveraging the BPF_OBJ_GET_INFO_BY_FD
syscall to trace CPU usage per prog and memory usage per map. We would
like to use this patch to add an interface for map types to return their
internal "count". For instance, we are thinking of having the map types
below report their "count"; these would not add overhead to the
hot path.
1. ringbuf to return its "count" by calculating the distance between
producer_pos and consumer_pos
2. queue and stack to return their "count" from the head's position
3. dev map hash to return its "count" from items
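The first two items in the list above derive a count from positions the
map already maintains, so nothing extra is done on the hot path. A
minimal sketch of that arithmetic follows. The function names are
hypothetical, the ring buffer positions are assumed to be monotonically
increasing 64-bit byte offsets, and the queue is assumed to be a circular
buffer with head and tail indices; none of this is taken from the kernel
sources.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Bytes currently held in a ring buffer, assuming both positions only
 * ever grow. Unsigned subtraction wraps modulo 2^64, so the result stays
 * correct even after the positions themselves overflow.
 */
static uint64_t ringbuf_used_bytes(uint64_t producer_pos,
				   uint64_t consumer_pos)
{
	return producer_pos - consumer_pos;
}

/*
 * Elements currently held in a circular queue with `size` slots, where
 * head is the next slot to write and tail is the next slot to read.
 */
static uint32_t queue_used_elems(uint32_t head, uint32_t tail,
				 uint32_t size)
{
	assert(size > 0 && head < size && tail < size);
	return head >= tail ? head - tail : size - tail + head;
}
```

For example, ringbuf_used_bytes(4096, 1024) yields 3072, and
queue_used_elems(2, 6, 8) yields 4 because the occupied region wraps
around the end of the buffer.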

Other map types, similar to the hashtab pre-allocation case,
would introduce overhead in the hot path in order to count the stats. I
think we can find alternative solutions for those (e.g., iterate the map
and count, count only if the bpf_stats_enabled switch is on, etc.). There
are cases where this can't be done at the application level because
applications don't see the internal stats in order to do the right
counting.
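The two alternatives mentioned above (iterate and count, or count only
while a switch is on) can be compared in a small userspace model. The
model_* names and the plain boolean standing in for the bpf_stats_enabled
switch are assumptions for illustration; this is not how the kernel
implements either option.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_SLOTS 16

/* Hypothetical model of a table with a switch-gated counter. */
struct model_htab {
	bool occupied[MODEL_SLOTS];
	long gated_count; /* updated only while stats are enabled */
};

/* Stand-in for the bpf_stats_enabled switch (off by default). */
static bool model_stats_enabled;

/* Occupy a slot; pay for the counter update only when stats are on. */
static void model_insert(struct model_htab *h, size_t slot)
{
	assert(slot < MODEL_SLOTS);
	if (h->occupied[slot])
		return;
	h->occupied[slot] = true;
	if (model_stats_enabled)
		h->gated_count++;
}

/* Exact count obtained by walking every slot; no hot-path cost. */
static long model_count_by_iteration(const struct model_htab *h)
{
	long n = 0;

	for (size_t i = 0; i < MODEL_SLOTS; i++)
		n += h->occupied[i] ? 1 : 0;
	return n;
}
```

In this model the gated counter only reflects inserts made while the
switch was on, whereas the iteration count is exact at the cost of a full
walk each time it is requested.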

We can remove the counting for the pre-allocated case in this patch.
Please let us know what you think.

Thanks, Hao

Hao Xiang Nov. 28, 2022, 11:03 p.m. UTC | #3

Hi Alexei, we can guard the added overhead with the existing
bpf_stats_enabled switch. The switch is off by default, so I believe there
will be no extra overhead once we do that. Could you please reconsider
this?