Message ID | 20230228093206.821563-1-jolsa@kernel.org |
---|---|
Headers |
From: Jiri Olsa <jolsa@kernel.org>
To: Alexei Starovoitov <ast@kernel.org>, Andrii Nakryiko <andrii@kernel.org>, Hao Luo <haoluo@google.com>, Andrew Morton <akpm@linux-foundation.org>, Alexander Viro <viro@zeniv.linux.org.uk>, Peter Zijlstra <peterz@infradead.org>, Ingo Molnar <mingo@redhat.com>, Arnaldo Carvalho de Melo <acme@kernel.org>, Matthew Wilcox <willy@infradead.org>
Cc: bpf@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-perf-users@vger.kernel.org, Martin KaFai Lau <kafai@fb.com>, Song Liu <songliubraving@fb.com>, Yonghong Song <yhs@fb.com>, John Fastabend <john.fastabend@gmail.com>, KP Singh <kpsingh@chromium.org>, Stanislav Fomichev <sdf@google.com>, Daniel Borkmann <daniel@iogearbox.net>, Namhyung Kim <namhyung@gmail.com>
Subject: [RFC v2 bpf-next 0/9] mm/bpf/perf: Store build id in inode object
Date: Tue, 28 Feb 2023 10:31:57 +0100
Message-Id: <20230228093206.821563-1-jolsa@kernel.org>
X-Mailer: git-send-email 2.39.2
List-ID: <linux-kernel.vger.kernel.org> |
Series |
mm/bpf/perf: Store build id in inode object
|
|
Message
Jiri Olsa
Feb. 28, 2023, 9:31 a.m. UTC
hi,
this is an RFC patchset for adding the build id under the inode object.

The main change from the previous post [1] is to use the inode object instead of the file object for build id data.

However.. ;-) while using the inode as the build id storage place saves some memory by keeping just one copy of the build id for all file instances, there seems to be another problem.

The problem is that we read the build id when the file is mmap-ed.

Which is fine for our use case, because we only access build id data through vma->vm_file->f_inode. But there are possible scenarios/windows where the build id can be wrong when accessed in another way.

Like when the file is overwritten with another binary version with a different build id. This will result in having wrong build id data in the inode until the new file is mmap-ed.

  - file open  > inode->i_build_id == NULL
  - file mmap
    -> read build id  > inode->i_build_id == build_id_1

  [ file changed with same inode, inode keeps old i_build_id data ]

  - file open  > inode->i_build_id == build_id_1
  - file mmap
    -> read build id  > inode->i_build_id == build_id_2

I guess we could release i_build_id when the last vma of the file goes away? But I'm not sure how to solve this and still be able to access the build id easily just by reading the inode->i_build_id field without any lock.

I'm inclined to go back and store the build id under the file object, where the build id would be correct (or missing).

thoughts?
thanks,
jirka


v2 changes:
  - store build id under inode [Matthew Wilcox]
  - use urandom_read and liburandom_read.so for test [Andrii]
  - add libelf-based helper to fetch build ID from elf [Andrii]
  - store build id or error we got when reading it [Andrii]
  - use full name i_build_id [Andrii]
  - several tests fixes [Andrii]

[1] https://lore.kernel.org/bpf/20230201135737.800527-2-jolsa@kernel.org/
---
Jiri Olsa (9):
  mm: Store build id in inode object
  bpf: Use file's inode object build id in stackmap
  perf: Use file object build id in perf_event_mmap_event
  libbpf: Allow to resolve binary path in current directory
  selftests/bpf: Add read_buildid function
  selftests/bpf: Add err.h header
  selftests/bpf: Replace extract_build_id with read_build_id
  selftests/bpf: Add inode_build_id test
  selftests/bpf: Add iter_task_vma_buildid test

 fs/inode.c | 12 +++++++++++
 include/linux/buildid.h | 15 ++++++++++++++
 include/linux/fs.h | 7 +++++++
 kernel/bpf/stackmap.c | 24 +++++++++++++++++++++-
 kernel/events/core.c | 46 +++++++++++++++++++++++++++++++++++++----
 lib/buildid.c | 40 ++++++++++++++++++++++++++++++++++++
 mm/Kconfig | 8 ++++++++
 mm/mmap.c | 23 +++++++++++++++++++++
 tools/lib/bpf/libbpf.c | 4 +++-
 tools/testing/selftests/bpf/prog_tests/bpf_iter.c | 78 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 tools/testing/selftests/bpf/prog_tests/inode_build_id.c | 68 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 tools/testing/selftests/bpf/prog_tests/stacktrace_build_id.c | 19 +++++++----------
 tools/testing/selftests/bpf/prog_tests/stacktrace_build_id_nmi.c | 17 ++++++---------
 tools/testing/selftests/bpf/progs/bpf_iter_task_vma_buildid.c | 60 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 tools/testing/selftests/bpf/progs/err.h | 13 ++++++++++++
 tools/testing/selftests/bpf/progs/inode_build_id.c | 62 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 tools/testing/selftests/bpf/progs/profiler.inc.h | 3 +--
 tools/testing/selftests/bpf/test_progs.c | 25 ----------------------
 tools/testing/selftests/bpf/test_progs.h | 11 +++++++++-
 tools/testing/selftests/bpf/trace_helpers.c | 98 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 tools/testing/selftests/bpf/trace_helpers.h | 5 +++++
 21 files changed, 581 insertions(+), 57 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/inode_build_id.c
 create mode 100644 tools/testing/selftests/bpf/progs/bpf_iter_task_vma_buildid.c
 create mode 100644 tools/testing/selftests/bpf/progs/err.h
 create mode 100644 tools/testing/selftests/bpf/progs/inode_build_id.c
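The stale window described in the cover letter (open, mmap, overwrite in place, open, mmap) can be replayed with a small toy model. This is only a sketch in Python, not the kernel code from the series: the `Inode` class, its methods and the `trace()` helper are hypothetical names, and the only behaviour assumed is what the cover letter states, namely that the cached value is refreshed at mmap time and nowhere else.

```python
"""Toy model of the stale-build-id window; not kernel code."""


class Inode:
    def __init__(self, content_build_id):
        # Build id embedded in the bytes currently stored in the file.
        self.content_build_id = content_build_id
        # Cached copy, standing in for inode->i_build_id; None until first mmap.
        self.i_build_id = None

    def overwrite(self, new_build_id):
        # The file is rewritten in place: same inode, new contents.
        # Nothing refreshes the cached value here.
        self.content_build_id = new_build_id

    def mmap(self):
        # The build id is (re)read only when the file is mmap-ed.
        self.i_build_id = self.content_build_id


def trace():
    """Replay the sequence from the cover letter and record what a reader sees."""
    seen = []
    inode = Inode("build_id_1")
    seen.append(inode.i_build_id)  # file open: nothing cached yet
    inode.mmap()
    seen.append(inode.i_build_id)  # after mmap: build_id_1
    inode.overwrite("build_id_2")
    seen.append(inode.i_build_id)  # open after overwrite: still build_id_1 (stale)
    inode.mmap()
    seen.append(inode.i_build_id)  # after the next mmap: build_id_2
    return seen


if __name__ == "__main__":
    print(trace())
```

The third recorded value is the problematic one: a reader that reaches the inode without going through a fresh mmap still sees the build id of the old binary.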
Comments
On Wed, Mar 01, 2023 at 09:07:14AM +1100, Dave Chinner wrote:
> On Tue, Feb 28, 2023 at 10:31:57AM +0100, Jiri Olsa wrote:
> > this is RFC patchset for adding build id under inode's object.
> >
> > The main change to previous post [1] is to use inode object instead of file
> > object for build id data.
>
> Please explain what a "build id" is, the use case for it, why we
> need to store it in VFS objects, what threat model it is protecting
> the system against, etc.

[root@quaco ~]# file /bin/bash
/bin/bash: ELF 64-bit LSB pie executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, BuildID[sha1]=160df51238a38ca27d03290f3ad5f7df75560ae0, for GNU/Linux 3.2.0, stripped
[root@quaco ~]# file /lib64/libc.so.6
/lib64/libc.so.6: ELF 64-bit LSB shared object, x86-64, version 1 (GNU/Linux), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, BuildID[sha1]=8257ee907646e9b057197533d1e4ac8ede7a9c5c, for GNU/Linux 3.2.0, not stripped
[root@quaco ~]#

It's those BuildID[sha1]= bits, which have been present in all binaries in, I think, all distros for quite a while.

This page, from when this was initially designed, has a discussion about it, why it is needed, etc:

https://fedoraproject.org/wiki/RolandMcGrath/BuildID

'perf record' receives MMAP records. Initially these came without build-ids; now there is one that carries a build-id, but collecting it when the mmap is executed (and thus a PERF_RECORD_MMAP* record is emitted) may not work, hence this work from Jiri.

- Arnaldo

> >
> > However.. ;-) while using inode as build id storage place saves some memory
> > by keeping just one copy of the build id for all file instances, there seems
> > to be another problem.
>
> Yes, the problem being that we can cache hundreds of millions of
> inodes in memory, and only a very small subset of them are going to
> have open files associated with them. And an even smaller subset are
> going to be mmapped.
>
> So, in reality, this proposal won't save any memory at all - it
> costs memory for every inode that is not currently being used as
> a mmapped elf executable, right?
>
> > The problem is that we read the build id when the file is mmap-ed.
>
> Why? I'm completely clueless as to what this thing does or how it's
> used....
>
> > Which is fine for our use case,
>
> Which is?
>
> -Dave.
> --
> Dave Chinner
> david@fromorbit.com
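The BuildID value that `file` prints above is stored in the binary as an ELF note. As a rough illustration of what "reading the build id" involves, here is a minimal Python sketch that scans raw note data for that note. It is not the kernel's lib/buildid.c and not the libelf-based selftest helper from the series; `find_build_id` and `make_note` are hypothetical names, and the sketch assumes the common layout of three 32-bit header words with name and descriptor each padded to 4 bytes, in little-endian byte order unless told otherwise.

```python
import struct

NT_GNU_BUILD_ID = 3  # note type used for build IDs under the "GNU" owner name


def _align4(n):
    """Round n up to the next multiple of 4 (note fields are padded)."""
    return (n + 3) & ~3


def find_build_id(notes, byteorder="<"):
    """Return the build ID as a hex string, or None if no such note exists.

    `notes` is the raw content of an ELF note section or PT_NOTE segment.
    """
    offset = 0
    while offset + 12 <= len(notes):
        # Note header: name size, descriptor size, type (three 32-bit words).
        namesz, descsz, ntype = struct.unpack_from(byteorder + "III", notes, offset)
        offset += 12
        name = notes[offset:offset + namesz]
        offset += _align4(namesz)
        desc = notes[offset:offset + descsz]
        offset += _align4(descsz)
        # Accept only a complete descriptor owned by "GNU" with the right type.
        if ntype == NT_GNU_BUILD_ID and name == b"GNU\0" and len(desc) == descsz:
            return desc.hex()
    return None


def make_note(name, ntype, desc, byteorder="<"):
    """Build one note record; only used here to create sample input."""
    def pad(b):
        return b + b"\0" * (_align4(len(b)) - len(b))
    header = struct.pack(byteorder + "III", len(name), len(desc), ntype)
    return header + pad(name) + pad(desc)


if __name__ == "__main__":
    # Sample data: an unrelated GNU note followed by a build-id note that
    # carries the /bin/bash value quoted in the thread.
    bash_id = bytes.fromhex("160df51238a38ca27d03290f3ad5f7df75560ae0")
    blob = make_note(b"GNU\0", 1, b"\0" * 16) + make_note(b"GNU\0", NT_GNU_BUILD_ID, bash_id)
    print(find_build_id(blob))
```

Locating the note data inside a real file (via the program or section headers) is left out to keep the sketch short.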
On Wed, Mar 01, 2023 at 09:07:14AM +1100, Dave Chinner wrote:
> On Tue, Feb 28, 2023 at 10:31:57AM +0100, Jiri Olsa wrote:
> > hi,
> > this is RFC patchset for adding build id under inode's object.
> >
> > The main change to previous post [1] is to use inode object instead of file
> > object for build id data.
>
> Please explain what a "build id" is, the use case for it, why we
> need to store it in VFS objects, what threat model it is protecting
> the system against, etc.

hum, I still did not get your email from the mailing list, I just saw it in Arnaldo's reply and downloaded it from lore

our use case is for hubble/tetragon [1], and we are asked to report the buildid of the executed binary.. but the monitoring process is running in its own pod and can't access the binaries outside of it, so we need to be able to read it in the kernel

we want to read the build id from a BPF program attached to the sched_exec tracepoint, and from a BPF iterator

we considered adding a BPF helper and then a kfunc for that, but it turned out it'd be useful for other use cases (like retrieving build id from atomic context [2]) to have the build id stored in the file (or inode) object

[1] https://github.com/cilium/tetragon/
[2] https://lore.kernel.org/bpf/CA+khW7juLEcrTOd7iKG3C_WY8L265XKNo0iLzV1fE=o-cyeHcQ@mail.gmail.com/

> >
> > However.. ;-) while using inode as build id storage place saves some memory
> > by keeping just one copy of the build id for all file instances, there seems
> > to be another problem.
>
> Yes, the problem being that we can cache hundreds of millions of
> inodes in memory, and only a very small subset of them are going to
> have open files associated with them. And an even smaller subset are
> going to be mmapped.

ok, file seems like the better option now

>
> So, in reality, this proposal won't save any memory at all - it
> costs memory for every inode that is not currently being used as
> a mmapped elf executable, right?

right

>
> > The problem is that we read the build id when the file is mmap-ed.
>
> Why? I'm completely clueless as to what this thing does or how it's
> used....

we need the build id only when the file is mmap-ed, so reading it at mmap time seemed like the best way

>
> > Which is fine for our use case,
>
> Which is?

please see above

thanks,
jirka
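Dave's memory argument, which Jiri agrees with above, is easy to put into rough numbers. The Python sketch below is a back-of-the-envelope estimate only: it assumes that i_build_id is a single pointer-sized field (the cover letter compares it against NULL), that the kernel is 64-bit so the pointer takes 8 bytes, and it picks 200 million as an illustrative stand-in for "hundreds of millions" of cached inodes. None of these figures come from the patches themselves.

```python
POINTER_SIZE = 8             # bytes; assumes a 64-bit kernel
CACHED_INODES = 200_000_000  # illustrative stand-in for "hundreds of millions"


def pointer_overhead_bytes(objects, pointer_size=POINTER_SIZE):
    """Memory taken by one extra pointer-sized field in every object."""
    return objects * pointer_size


if __name__ == "__main__":
    overhead = pointer_overhead_bytes(CACHED_INODES)
    # Every cached inode pays for the field, whether or not it is ever mmap-ed.
    print(f"{overhead / 2**30:.2f} GiB")  # prints "1.49 GiB"
```

Under these assumptions the field costs memory on every cached inode, while only the small subset that is mmap-ed as an ELF executable ever uses it.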
On Wed, Mar 01, 2023 at 12:41:20PM -0300, Arnaldo Carvalho de Melo wrote:
> On Wed, Mar 01, 2023 at 09:07:14AM +1100, Dave Chinner wrote:
> > On Tue, Feb 28, 2023 at 10:31:57AM +0100, Jiri Olsa wrote:
> > > this is RFC patchset for adding build id under inode's object.
> > >
> > > The main change to previous post [1] is to use inode object instead of file
> > > object for build id data.
> >
> > Please explain what a "build id" is, the use case for it, why we
> > need to store it in VFS objects, what threat model it is protecting
> > the system against, etc.
>
> [root@quaco ~]# file /bin/bash
> /bin/bash: ELF 64-bit LSB pie executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, BuildID[sha1]=160df51238a38ca27d03290f3ad5f7df75560ae0, for GNU/Linux 3.2.0, stripped
> [root@quaco ~]# file /lib64/libc.so.6
> /lib64/libc.so.6: ELF 64-bit LSB shared object, x86-64, version 1 (GNU/Linux), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, BuildID[sha1]=8257ee907646e9b057197533d1e4ac8ede7a9c5c, for GNU/Linux 3.2.0, not stripped
> [root@quaco ~]#
>
> Those BuildID[sha1]= bits, that is present in all binaries I think in
> all distros for quite a while.
>
> This page, from when this was initially designed, has a discussion about
> it, why it is needed, etc:
>
> https://fedoraproject.org/wiki/RolandMcGrath/BuildID
>
> 'perf record' will receive MMAP records, initially without build-ids,
> now we have one that has, but collecting it when the mmap is executed
> (and thus a PERF_RECORD_MMAP* record is emitted) may not work, thus this
> work from Jiri.

thanks for the pointers

build id is a unique id for a binary, used to identify the correct binary version for related stuff.. like finding the binary's debuginfo in perf, or matching a binary with stack trace entries in the bpf stackmap

jirka

>
> - Arnaldo
>
> > >
> > > However.. ;-) while using inode as build id storage place saves some memory
> > > by keeping just one copy of the build id for all file instances, there seems
> > > to be another problem.
> >
> > Yes, the problem being that we can cache hundreds of millions of
> > inodes in memory, and only a very small subset of them are going to
> > have open files associated with them. And an even smaller subset are
> > going to be mmapped.
> >
> > So, in reality, this proposal won't save any memory at all - it
> > costs memory for every inode that is not currently being used as
> > a mmapped elf executable, right?
> >
> > > The problem is that we read the build id when the file is mmap-ed.
> >
> > Why? I'm completely clueless as to what this thing does or how it's
> > used....
> >
> > > Which is fine for our use case,
> >
> > Which is?
> >
> > -Dave.
> > --
> > Dave Chinner
> > david@fromorbit.com
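As an illustration of the debuginfo use Jiri mentions, here is a small Python sketch of how a build id is commonly turned into a debuginfo file name. It assumes the distribution convention of storing separate debug files under /usr/lib/debug/.build-id/, split into a two-character directory and the remaining hex digits plus a ".debug" suffix. The function name `debuginfo_path` is hypothetical, and perf's own build-id cache layout is not covered here.

```python
from pathlib import PurePosixPath

HEX_DIGITS = set("0123456789abcdef")


def debuginfo_path(build_id, root="/usr/lib/debug"):
    """Map a hex build id to its conventional separate-debuginfo path."""
    build_id = build_id.lower()
    if len(build_id) < 3 or not set(build_id) <= HEX_DIGITS:
        raise ValueError(f"not a usable hex build id: {build_id!r}")
    # First two hex digits name the directory, the rest names the file.
    return PurePosixPath(root) / ".build-id" / build_id[:2] / (build_id[2:] + ".debug")


if __name__ == "__main__":
    # Build id of /bin/bash as quoted earlier in the thread.
    print(debuginfo_path("160df51238a38ca27d03290f3ad5f7df75560ae0"))
```

Because the lookup key is the build id rather than the file path, a tool can match the exact binary version even after the file on disk has been replaced, which is the property the thread relies on.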