| Message ID | 20221105025146.238209-1-horenchuang@bytedance.com |
| --- | --- |
| From | "Ho-Ren (Jack) Chuang" <horenchuang@bytedance.com> |
| To | Alexei Starovoitov <ast@kernel.org>, Alexei Starovoitov <alexei.starovoitov@gmail.com>, Hao Luo <haoluo@google.com>, Jiri Olsa <jolsa@kernel.org>, Jiri Olsa <olsajiri@gmail.com>, Andrii Nakryiko <andrii@kernel.org>, Daniel Borkmann <daniel@iogearbox.net>, John Fastabend <john.fastabend@gmail.com>, Martin KaFai Lau <martin.lau@linux.dev>, Song Liu <song@kernel.org>, Yonghong Song <yhs@fb.com>, KP Singh <kpsingh@kernel.org>, Stanislav Fomichev <sdf@google.com>, Quentin Monnet <quentin@isovalent.com>, Mykola Lysenko <mykolal@fb.com>, Shuah Khan <shuah@kernel.org>, Nathan Chancellor <nathan@kernel.org>, Nick Desaulniers <ndesaulniers@google.com>, Tom Rix <trix@redhat.com>, Joanne Koong <joannelkoong@gmail.com>, Kui-Feng Lee <kuifeng@fb.com>, Lorenzo Bianconi <lorenzo@kernel.org>, Maxim Mikityanskiy <maximmi@nvidia.com>, Hao Xiang <hao.xiang@bytedance.com>, Punit Agrawal <punit.agrawal@bytedance.com>, Yifei Ma <yifeima@bytedance.com>, Xiaoning Ding <xiaoning.ding@bytedance.com>, bpf@vger.kernel.org |
| Cc | Ho-Ren Chuang <horenc@vt.edu>, Ho-Ren Chuang <horenchuang@bytedance.com>, linux-kernel@vger.kernel.org, linux-kselftest@vger.kernel.org, llvm@lists.linux.dev |
| Subject | [PATCH bpf-next v1 0/4] Add BPF htab map's used size for monitoring |
| Date | Sat, 5 Nov 2022 02:51:42 +0000 |
| Series | Add BPF htab map's used size for monitoring |
Message
Ho-Ren (Jack) Chuang
Nov. 5, 2022, 2:51 a.m. UTC
Hello everyone,

We have prepared patches to address an issue from a previous discussion.
The previous discussion email thread is here:
https://lore.kernel.org/all/CAADnVQLBt0snxv4bKwg1WKQ9wDFbaDCtZ03v1-LjOTYtsKPckQ@mail.gmail.com/

This patch series adds a new field, "used_entries", to struct bpf_map_info
and tracks the "count" field in struct bpf_htab in both the preallocated
and non-preallocated cases. bpftool is modified to report the newly added
"used_entries" field in struct bpf_map_info and to mark preallocated htab
maps with "*". Together these make it easier to see a hashmap's current
memory usage.

We have added a new interface function, map_get_used_elem, to bpf_map_ops
as an abstraction layer so that other map type implementations can support
the "used_entries" attribute in a future change.

A concurrency test for preallocated and dynamically allocated htab maps is
introduced to verify the correctness and performance of the htab map's
used-size tracking. Existing unit tests are extended to verify the
correctness of the htab map's used size.

Thank you,

Ho-Ren (Jack) Chuang (4):
  bpf: Support reporting BPF htab map's used size for monitoring
  bpftool: Add tools support to show BPF htab map's used size
  samples/bpf: Add concurrency testing for BPF htab map's used size
  selftests/bpf: Add unit tests for BPF htab map's used size

     include/linux/bpf.h                     |   1 +
     include/uapi/linux/bpf.h                |   1 +
     kernel/bpf/hashtab.c                    |  19 +++
     kernel/bpf/syscall.c                    |   2 +
     samples/bpf/Makefile                    |   4 +
     samples/bpf/test_map_used_kern.c        |  65 ++++++++
     samples/bpf/test_map_used_user.c        | 204 ++++++++++++++++++++++++
     tools/bpf/bpftool/map.c                 |   9 +-
     tools/include/uapi/linux/bpf.h          |   1 +
     tools/testing/selftests/bpf/test_maps.c |  74 ++++++++-
     10 files changed, 377 insertions(+), 3 deletions(-)
     create mode 100644 samples/bpf/test_map_used_kern.c
     create mode 100644 samples/bpf/test_map_used_user.c
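The patch bodies are not reproduced on this page, so the following is only a minimal sketch of how the map_get_used_elem hook and "used_entries" field described in the cover letter could fit together; the struct layouts, function names, and call sites below are assumptions for illustration, not the series' actual code:

```c
/*
 * Hypothetical sketch only -- see the series for the real patches.
 * Assumes struct bpf_htab keeps an atomic_t "count" and that
 * struct bpf_map_info gained a "used_entries" field, as the cover
 * letter describes.
 */

/* include/linux/bpf.h: a new optional op per map type */
struct bpf_map_ops {
	/* ... existing ops ... */
	u32 (*map_get_used_elem)(struct bpf_map *map);
};

/* kernel/bpf/hashtab.c: report the tracked element count */
static u32 htab_map_get_used_elem(struct bpf_map *map)
{
	struct bpf_htab *htab = container_of(map, struct bpf_htab, map);

	return atomic_read(&htab->count);
}

/* kernel/bpf/syscall.c: surface it via BPF_OBJ_GET_INFO_BY_FD */
static void bpf_map_fill_used_entries(struct bpf_map *map,
				      struct bpf_map_info *info)
{
	if (map->ops->map_get_used_elem)
		info->used_entries = map->ops->map_get_used_elem(map);
}
```

The indirection through bpf_map_ops is what lets map types that already know their fill level answer for free, while types that don't simply leave the op NULL.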
Comments
On Fri, Nov 4, 2022 at 7:52 PM Ho-Ren (Jack) Chuang
<horenchuang@bytedance.com> wrote:
>
> Hello everyone,
>
> We have prepared patches to address an issue from a previous discussion.
> The previous discussion email thread is here:
> https://lore.kernel.org/all/CAADnVQLBt0snxv4bKwg1WKQ9wDFbaDCtZ03v1-LjOTYtsKPckQ@mail.gmail.com/

Rephrasing what was said earlier. We're not keeping the count of elements
in a preallocated hash map and we are not going to add one. The bpf prog
needs to do the accounting on its own if it needs this kind of statistics.
Keeping the count for non-prealloc is already significant performance
overhead. We don't trade performance for stats.
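The self-accounting suggested here is straightforward to do from the program side. A minimal sketch, assuming a libbpf-style BPF program; the map names and helper wrapper are illustrative, not from the series:

```c
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

/* The preallocated hash map whose fill level we want to track. */
struct {
	__uint(type, BPF_MAP_TYPE_HASH);
	__uint(max_entries, 4096);
	__type(key, __u32);
	__type(value, __u64);
} flows SEC(".maps");

/* Program-maintained element counter, one slot per CPU to avoid
 * cross-CPU contention; userspace sums the slots to read it. */
struct {
	__uint(type, BPF_MAP_TYPE_PERCPU_ARRAY);
	__uint(max_entries, 1);
	__type(key, __u32);
	__type(value, __s64);
} flow_count SEC(".maps");

static __always_inline void flows_insert_counted(__u32 *key, __u64 *val)
{
	__u32 zero = 0;
	__s64 *cnt;

	/* BPF_NOEXIST fails if the key is already present, so a
	 * success here is exactly one new element. */
	if (bpf_map_update_elem(&flows, key, val, BPF_NOEXIST))
		return;
	cnt = bpf_map_lookup_elem(&flow_count, &zero);
	if (cnt)
		__sync_fetch_and_add(cnt, 1);
}

char LICENSE[] SEC("license") = "GPL";
```

A matching decrement after a successful bpf_map_delete_elem() keeps the counter honest, and the cost is paid only by programs that actually want the statistic.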
Hi Alexei,

We understand the concern about the added performance overhead. We had some
discussion about this while working on the patch and decided to give it a
try (my bad).

Adding some more context: we are leveraging the BPF_OBJ_GET_INFO_BY_FD
syscall to trace CPU usage per prog and memory usage per map. We would like
to use this patch to add an interface for map types to return their internal
"count". For instance, we are thinking of having the map types below report
the "count", and those won't add overhead to the hot path:

1. ringbuf: return its "count" by calculating the distance between
   producer_pos and consumer_pos
2. queue and stack: return the "count" from the head's position
3. dev map hash: return the "count" from items

There are other map types that, like the hashtab preallocation case, would
introduce overhead in the hot path in order to count the stats. I think we
can find alternative solutions for those (e.g., iterate the map and count,
or count only if the bpf_stats_enabled switch is on). There are cases where
this can't be done at the application level because applications don't see
the internal stats needed to do the right counting.

We can remove the counting for the preallocated case in this patch. Please
let us know what you think.

Thanks,
Hao

On Sat, Nov 5, 2022 at 9:20 AM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Fri, Nov 4, 2022 at 7:52 PM Ho-Ren (Jack) Chuang
> <horenchuang@bytedance.com> wrote:
> >
> > Hello everyone,
> >
> > We have prepared patches to address an issue from a previous discussion.
> > The previous discussion email thread is here:
> > https://lore.kernel.org/all/CAADnVQLBt0snxv4bKwg1WKQ9wDFbaDCtZ03v1-LjOTYtsKPckQ@mail.gmail.com/
>
> Rephrasing what was said earlier. We're not keeping the count of elements
> in a preallocated hash map and we are not going to add one. The bpf prog
> needs to do the accounting on its own if it needs this kind of statistics.
> Keeping the count for non-prealloc is already significant performance
> overhead. We don't trade performance for stats.
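For the ringbuf case in point 1 above, the fill level really is free to read, since both positions already exist for the producer/consumer protocol. A sketch, assuming the field names in kernel/bpf/ringbuf.c; this is illustrative and not part of the series:

```c
/*
 * Hypothetical: derive a ring buffer's fill level from the positions
 * it already maintains, without touching the producer path at all.
 * struct bpf_ringbuf keeps consumer_pos and producer_pos as
 * monotonically growing byte offsets.
 */
static unsigned long ringbuf_map_get_used(struct bpf_ringbuf *rb)
{
	unsigned long cons = smp_load_acquire(&rb->consumer_pos);
	unsigned long prod = smp_load_acquire(&rb->producer_pos);

	/* Both positions only grow, so the difference is the number
	 * of bytes (records plus headers) currently outstanding. */
	return prod - cons;
}
```

Note this yields bytes rather than an element count, which is why the reply speaks of a "distance"; records are variable-sized, so bytes are the natural unit here.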
Hi Alexei, we can use the existing switch bpf_stats_enabled around the added
overhead. The switch is turned off by default, so I believe there will be no
extra overhead once we do that. Could you please give this a second thought?

On Mon, Nov 7, 2022 at 4:30 PM Hao Xiang . <hao.xiang@bytedance.com> wrote:
>
> Hi Alexei,
>
> We understand the concern about the added performance overhead. We had
> some discussion about this while working on the patch and decided to
> give it a try (my bad).
>
> Adding some more context: we are leveraging the BPF_OBJ_GET_INFO_BY_FD
> syscall to trace CPU usage per prog and memory usage per map. We would
> like to use this patch to add an interface for map types to return their
> internal "count". For instance, we are thinking of having the map types
> below report the "count", and those won't add overhead to the hot path:
>
> 1. ringbuf: return its "count" by calculating the distance between
>    producer_pos and consumer_pos
> 2. queue and stack: return the "count" from the head's position
> 3. dev map hash: return the "count" from items
>
> There are other map types that, like the hashtab preallocation case,
> would introduce overhead in the hot path in order to count the stats. I
> think we can find alternative solutions for those (e.g., iterate the map
> and count, or count only if the bpf_stats_enabled switch is on). There
> are cases where this can't be done at the application level because
> applications don't see the internal stats needed to do the right
> counting.
>
> We can remove the counting for the preallocated case in this patch.
> Please let us know what you think.
>
> Thanks,
> Hao
>
> On Sat, Nov 5, 2022 at 9:20 AM Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
> >
> > On Fri, Nov 4, 2022 at 7:52 PM Ho-Ren (Jack) Chuang
> > <horenchuang@bytedance.com> wrote:
> > >
> > > Hello everyone,
> > >
> > > We have prepared patches to address an issue from a previous
> > > discussion. The previous discussion email thread is here:
> > > https://lore.kernel.org/all/CAADnVQLBt0snxv4bKwg1WKQ9wDFbaDCtZ03v1-LjOTYtsKPckQ@mail.gmail.com/
> >
> > Rephrasing what was said earlier. We're not keeping the count of
> > elements in a preallocated hash map and we are not going to add one.
> > The bpf prog needs to do the accounting on its own if it needs this
> > kind of statistics. Keeping the count for non-prealloc is already
> > significant performance overhead. We don't trade performance for stats.
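The gating proposed here could reuse the kernel's existing bpf_stats_enabled static key (behind the kernel.bpf_stats_enabled sysctl), which is patched out of the instruction stream while the switch is off. A hypothetical sketch, not in the series:

```c
#include <linux/filter.h>	/* DECLARE_STATIC_KEY_FALSE(bpf_stats_enabled_key) */

/*
 * Hypothetical: count preallocated-htab insertions only while
 * kernel.bpf_stats_enabled is on. While it is off, the static
 * branch is patched out and the hot path pays nothing.
 */
static __always_inline void htab_used_inc(struct bpf_htab *htab)
{
	if (static_branch_unlikely(&bpf_stats_enabled_key))
		atomic_inc(&htab->count);
}
```

One caveat with gating a counter (as opposed to the timing stats the key was built for): toggling the sysctl at runtime leaves the count stale, since insertions that happened while it was off were never recorded, so the value is only trustworthy for maps created after the switch was enabled.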