Message ID | 20230109094247.1464856-1-imagedong@tencent.com |
---|---|
State | New |
Headers |
Return-Path: <linux-kernel-owner@vger.kernel.org> Delivered-To: ouuuleilei@gmail.com Received: by 2002:a5d:4e01:0:0:0:0:0 with SMTP id p1csp2063171wrt; Mon, 9 Jan 2023 01:47:34 -0800 (PST) X-Google-Smtp-Source: AMrXdXtedpxWuvWtzTccMZNQoJ0hwUaZJ2FD/jO++dfIAhHxQbjtk4HviQlkcLRFc5UFVuF3jLNG X-Received: by 2002:a17:906:60cd:b0:845:bb21:f638 with SMTP id f13-20020a17090660cd00b00845bb21f638mr50567924ejk.75.1673257654078; Mon, 09 Jan 2023 01:47:34 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1673257654; cv=none; d=google.com; s=arc-20160816; b=PHV1qMgJfzhfs20kzbNpjXwEFwHXKfdoMmTQZpTRD+I39pj2rifzZH6Qcyahy4sAQN 8KRH5Df4IdqYIOfke+vgbYj77skPiwrMaRINC32vgR79J8y38ZyH0D6NohSdDKhABq9L 347cIaPwdHoj/ODPLWnvRBbA7ihSfRcD0M9T+zJHRlbduEqhjn3x8wvFCIkY9oHAw/A8 CBH8qsllGiPdbVaJf08VBpG90K63PRp+d9kqJXV3wlVwH8wVb+0zaTWPqZqE/EqJLsjo tYNuIOt9UzLemCqewnXesWD40LoVc/Av27Qey57qZ9sLY1m+qACZmJLYn/wRTmetfeDe IBbA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :message-id:date:subject:cc:to:from:dkim-signature; bh=001Ob8msDHYSJ4H6+BAMlx9g50lKceUKFFGpbLEZpGo=; b=J9zXM9sH/C9Usbvgb/H4shdn5ksBh7yHXyAQoLNyWMTKVN7Y3V4fzIVXYBwUBa9lW9 Hzh5JAqrmOpFRxJGYqTp/T+UmsMquXt8Uq18Y1dS18YTzXgAcj10dQYHFgqqrWNYAu91 CnMgz2VxOgEJxoS5VQc7dsUfLDs5UhyqH8a1P0NZ7JImjSt4cvGpOrEDnPANHS0krg/3 wuq1GSySBaDMHhyIdoF/bWYQr/2mZQzkjIcoAAQ3nfve3bKUp5hIs+N/k9nxC6xZRwj0 CdV9Y5AqrRyXtnbTGCo2DeqNFfHmfweYFq1tJdJYW55nnpXIKSCicLe58CD84YaSwLo4 8MTg== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@gmail.com header.s=20210112 header.b=eYDyOUaH; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id wv4-20020a170907080400b007c10bb5b4b3si9307658ejb.681.2023.01.09.01.47.10; Mon, 09 Jan 2023 01:47:34 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@gmail.com header.s=20210112 header.b=eYDyOUaH; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233885AbjAIJnn (ORCPT <rfc822;zhanglyra.2023@gmail.com> + 99 others); Mon, 9 Jan 2023 04:43:43 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:38954 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231126AbjAIJni (ORCPT <rfc822;linux-kernel@vger.kernel.org>); Mon, 9 Jan 2023 04:43:38 -0500 Received: from mail-pf1-x441.google.com (mail-pf1-x441.google.com [IPv6:2607:f8b0:4864:20::441]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 459AE1166; Mon, 9 Jan 2023 01:43:37 -0800 (PST) Received: by mail-pf1-x441.google.com with SMTP id c9so5674768pfj.5; Mon, 09 Jan 2023 01:43:37 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=content-transfer-encoding:mime-version:message-id:date:subject:cc :to:from:from:to:cc:subject:date:message-id:reply-to; bh=001Ob8msDHYSJ4H6+BAMlx9g50lKceUKFFGpbLEZpGo=; b=eYDyOUaHHTHcueKILaSCYIfEciTW1pbEIlEDRLZdc9Ld42Fi2TMY/zFOiUclIOYxKE G3/Vdt+ep3en4CD1mF1VzlSKuo+Wh9HxpkvfD9ZxyfG2pUc4J+5QBSTBaqsv0puA5093 Qu9WF+zTU+e/LwbJOmsbes/c0YUi3NyINImEReG2kovWz6Wvfh0NfJoHTvzal72OKPmz 61xy1Eh8diyfbUQbVjVt0GljfAvO+xDSAPx4ODJJMHsDZZYuir01obzwfCabdj6JAyF/ WE5/hm2vf1huqC9lEba3pNmbEHqaTcg3Xwj/ovuUgF8h08W+FGa52hgsIdz66aXOjA+L EozA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=content-transfer-encoding:mime-version:message-id:date:subject:cc :to:from:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=001Ob8msDHYSJ4H6+BAMlx9g50lKceUKFFGpbLEZpGo=; b=4ls3ZSJ89YHmWAiBGRBfdptqdECbkarkPa3NlYFCtG2ZZoK9v4YlSR9iShIrnrNq6I bG9bbKWiwnvSk4UkwK1gVR0E3sJDbc3a/3z/mmnzSbcGyUqPeu9D1BctoCxe3f0Fwi+C mvp+KHqRC1ox2K2QArXmboobn5qLbA0Kb+zzkua4KcgS+YtW5ms75bcvi4vw+dFPotzC VnoZQ0dKmfnrdHFDBnZ2KLhnIiOvR2sKKB7rUUmmqZ28DrcKXKeNXc8cQlD4NHJ+jq4g FMNS4xPwCl8EGa/KduPloI7Ti7ycPmA7DzNmlndKQ0WS7PgnB00oDnwgZiR4iW4omC1D SnDw== X-Gm-Message-State: AFqh2kpC8eJd7vRq+/rxBhO7ARCWWJG8AEbAXcSXuta/jxGfaAljlfoY kU3EaIL0HQ7HIZ04ys6Ukrw= X-Received: by 2002:a05:6a00:414c:b0:581:7c46:debd with SMTP id bv12-20020a056a00414c00b005817c46debdmr44941706pfb.24.1673257416708; Mon, 09 Jan 2023 01:43:36 -0800 (PST) Received: from localhost.localdomain ([203.205.141.20]) by smtp.gmail.com with ESMTPSA id z8-20020aa79e48000000b00580849b55a2sm5629527pfq.26.2023.01.09.01.43.33 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 09 Jan 2023 01:43:36 -0800 (PST) From: menglong8.dong@gmail.com X-Google-Original-From: imagedong@tencent.com To: daniel@iogearbox.net Cc: ast@kernel.org, andrii@kernel.org, martin.lau@linux.dev, song@kernel.org, yhs@fb.com, john.fastabend@gmail.com, kpsingh@kernel.org, sdf@google.com, haoluo@google.com, jolsa@kernel.org, bpf@vger.kernel.org, linux-kernel@vger.kernel.org, Menglong Dong <imagedong@tencent.com> Subject: [PATCH] libbpf: resolve kernel function name optimization for kprobe Date: Mon, 9 Jan 2023 17:42:47 +0800 Message-Id: <20230109094247.1464856-1-imagedong@tencent.com> X-Mailer: git-send-email 2.39.0 MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Spam-Status: No, score=-2.1 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,FREEMAIL_FROM, RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: <linux-kernel.vger.kernel.org> X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1754537817815539995?= X-GMAIL-MSGID: =?utf-8?q?1754537817815539995?= |
Series |
libbpf: resolve kernel function name optimization for kprobe
|
|
Commit Message
Menglong Dong
Jan. 9, 2023, 9:42 a.m. UTC
From: Menglong Dong <imagedong@tencent.com> The function name in kernel may be changed by the compiler. For example, the function 'ip_rcv_core' can be compiled to 'ip_rcv_core.isra.0'. This kind optimization can happen in any kernel function. Therefor, we should conside this case. If we failed to attach kprobe with a '-ENOENT', then we can lookup the kallsyms and check if there is a similar function end with '.xxx', and retry. Signed-off-by: Menglong Dong <imagedong@tencent.com> --- tools/lib/bpf/libbpf.c | 37 ++++++++++++++++++++++++++++++++++++- 1 file changed, 36 insertions(+), 1 deletion(-)
Comments
On 1/9/23 1:42 AM, menglong8.dong@gmail.com wrote: > From: Menglong Dong <imagedong@tencent.com> > > The function name in kernel may be changed by the compiler. For example, > the function 'ip_rcv_core' can be compiled to 'ip_rcv_core.isra.0'. > > This kind optimization can happen in any kernel function. Therefor, we > should conside this case. > > If we failed to attach kprobe with a '-ENOENT', then we can lookup the > kallsyms and check if there is a similar function end with '.xxx', and > retry. This might produce incorrect result, so this approach won't work for all .isra.0 cases. When a function name is changed from <func> to <func>.isra.<num>, it is possible that compiler may have make some changes to the arguments, e.g., removing one argument, chaning a semantics of argument, etc. if bpf program still uses the original function signature, the bpf program may produce unexpected result. > > Signed-off-by: Menglong Dong <imagedong@tencent.com> > --- > tools/lib/bpf/libbpf.c | 37 ++++++++++++++++++++++++++++++++++++- > 1 file changed, 36 insertions(+), 1 deletion(-) > > diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c > index a5c67a3c93c5..fdfb1ca34ced 100644 > --- a/tools/lib/bpf/libbpf.c > +++ b/tools/lib/bpf/libbpf.c > @@ -10375,12 +10375,30 @@ bpf_program__attach_kprobe_multi_opts(const struct bpf_program *prog, > return libbpf_err_ptr(err); > } > > +struct kprobe_resolve { > + char pattern[128]; > + char name[128]; > +}; > + > +static int kprobe_kallsyms_cb(unsigned long long sym_addr, char sym_type, > + const char *sym_name, void *ctx) > +{ > + struct kprobe_resolve *res = ctx; > + > + if (!glob_match(sym_name, res->pattern)) > + return 0; > + strcpy(res->name, sym_name); > + return 1; > +} > + > static int attach_kprobe(const struct bpf_program *prog, long cookie, struct bpf_link **link) > { > DECLARE_LIBBPF_OPTS(bpf_kprobe_opts, opts); > + struct kprobe_resolve res = {}; > unsigned long offset = 0; > const char *func_name; > char *func; > + int err; > int n; > > *link = NULL; > @@ -10408,8 +10426,25 @@ static int attach_kprobe(const struct bpf_program *prog, long cookie, struct bpf > > opts.offset = offset; > *link = bpf_program__attach_kprobe_opts(prog, func, &opts); > + err = libbpf_get_error(*link); > + > + if (!err || err != -ENOENT) > + goto out; > + > + sprintf(res.pattern, "%s.*", func); > + if (!libbpf_kallsyms_parse(kprobe_kallsyms_cb, &res)) > + goto out; > + > + pr_warn("prog '%s': trying to create %s '%s+0x%zx' perf event instead\n", > + prog->name, opts.retprobe ? "kretprobe" : "kprobe", > + res.name, offset); > + > + *link = bpf_program__attach_kprobe_opts(prog, res.name, &opts); > + err = libbpf_get_error(*link); > + > +out: > free(func); > - return libbpf_get_error(*link); > + return err; > } > > static int attach_ksyscall(const struct bpf_program *prog, long cookie, struct bpf_link **link)
On Tue, Jan 10, 2023 at 4:29 AM Yonghong Song <yhs@meta.com> wrote: > > > > On 1/9/23 1:42 AM, menglong8.dong@gmail.com wrote: > > From: Menglong Dong <imagedong@tencent.com> > > > > The function name in kernel may be changed by the compiler. For example, > > the function 'ip_rcv_core' can be compiled to 'ip_rcv_core.isra.0'. > > > > This kind optimization can happen in any kernel function. Therefor, we > > should conside this case. > > > > If we failed to attach kprobe with a '-ENOENT', then we can lookup the > > kallsyms and check if there is a similar function end with '.xxx', and > > retry. > > This might produce incorrect result, so this approach won't work > for all .isra.0 cases. When a function name is changed from > <func> to <func>.isra.<num>, it is possible that compiler may have > make some changes to the arguments, e.g., removing one argument, > chaning a semantics of argument, etc. if bpf program still > uses the original function signature, the bpf program may > produce unexpected result. Oops, I wasn't aware of this part. Can we make this function disabled by default and offer an option to users to enable it? Such as: bpf_object_adapt_sym(struct bpf_object *obj) In my case, kernel function rename is common, and I have to check all functions and do such adaptation before attaching my kprobe programs, which makes me can't use auto-attach. What's more, I haven't seen the arguments change so far, and maybe it's not a common case? (Please just ignore this reply if it doesn't work :/ ) Thanks! Menglong Dong > > > > > Signed-off-by: Menglong Dong <imagedong@tencent.com> > > --- > > tools/lib/bpf/libbpf.c | 37 ++++++++++++++++++++++++++++++++++++- > > 1 file changed, 36 insertions(+), 1 deletion(-) > > > > diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c > > index a5c67a3c93c5..fdfb1ca34ced 100644 > > --- a/tools/lib/bpf/libbpf.c > > +++ b/tools/lib/bpf/libbpf.c > > @@ -10375,12 +10375,30 @@ bpf_program__attach_kprobe_multi_opts(const struct bpf_program *prog, > > return libbpf_err_ptr(err); > > } > > > > +struct kprobe_resolve { > > + char pattern[128]; > > + char name[128]; > > +}; > > + > > +static int kprobe_kallsyms_cb(unsigned long long sym_addr, char sym_type, > > + const char *sym_name, void *ctx) > > +{ > > + struct kprobe_resolve *res = ctx; > > + > > + if (!glob_match(sym_name, res->pattern)) > > + return 0; > > + strcpy(res->name, sym_name); > > + return 1; > > +} > > + > > static int attach_kprobe(const struct bpf_program *prog, long cookie, struct bpf_link **link) > > { > > DECLARE_LIBBPF_OPTS(bpf_kprobe_opts, opts); > > + struct kprobe_resolve res = {}; > > unsigned long offset = 0; > > const char *func_name; > > char *func; > > + int err; > > int n; > > > > *link = NULL; > > @@ -10408,8 +10426,25 @@ static int attach_kprobe(const struct bpf_program *prog, long cookie, struct bpf > > > > opts.offset = offset; > > *link = bpf_program__attach_kprobe_opts(prog, func, &opts); > > + err = libbpf_get_error(*link); > > + > > + if (!err || err != -ENOENT) > > + goto out; > > + > > + sprintf(res.pattern, "%s.*", func); > > + if (!libbpf_kallsyms_parse(kprobe_kallsyms_cb, &res)) > > + goto out; > > + > > + pr_warn("prog '%s': trying to create %s '%s+0x%zx' perf event instead\n", > > + prog->name, opts.retprobe ? "kretprobe" : "kprobe", > > + res.name, offset); > > + > > + *link = bpf_program__attach_kprobe_opts(prog, res.name, &opts); > > + err = libbpf_get_error(*link); > > + > > +out: > > free(func); > > - return libbpf_get_error(*link); > > + return err; > > } > > > > static int attach_ksyscall(const struct bpf_program *prog, long cookie, struct bpf_link **link)
On Mon, Jan 9, 2023 at 7:11 PM Menglong Dong <menglong8.dong@gmail.com> wrote: > > On Tue, Jan 10, 2023 at 4:29 AM Yonghong Song <yhs@meta.com> wrote: > > > > > > > > On 1/9/23 1:42 AM, menglong8.dong@gmail.com wrote: > > > From: Menglong Dong <imagedong@tencent.com> > > > > > > The function name in kernel may be changed by the compiler. For example, > > > the function 'ip_rcv_core' can be compiled to 'ip_rcv_core.isra.0'. > > > > > > This kind optimization can happen in any kernel function. Therefor, we > > > should conside this case. > > > > > > If we failed to attach kprobe with a '-ENOENT', then we can lookup the > > > kallsyms and check if there is a similar function end with '.xxx', and > > > retry. > > > > This might produce incorrect result, so this approach won't work > > for all .isra.0 cases. When a function name is changed from > > <func> to <func>.isra.<num>, it is possible that compiler may have > > make some changes to the arguments, e.g., removing one argument, > > chaning a semantics of argument, etc. if bpf program still > > uses the original function signature, the bpf program may > > produce unexpected result. > > Oops, I wasn't aware of this part. Can we make this function disabled > by default and offer an option to users to enable it? Such as: > > bpf_object_adapt_sym(struct bpf_object *obj) > > In my case, kernel function rename is common, and I have to > check all functions and do such adaptation before attaching > my kprobe programs, which makes me can't use auto-attach. > > What's more, I haven't seen the arguments change so far, and > maybe it's not a common case? > > (Please just ignore this reply if it doesn't work :/ ) > libbpf can't just assume that in SEC("kprobe/abc") abc is meant to be a prefix. We'd need some more explicit marker to turn this search in kallsyms by prefix on. But also, at that point it becomes a multi-kprobe, really, because abc* pattern can match multiple functions. So it's not clear how to even integrate that into a singular kprobe program cleanly. Can you use kprobe.multi, which already supports what you want? > Thanks! > Menglong Dong > > > > > > > > Signed-off-by: Menglong Dong <imagedong@tencent.com> > > > --- > > > tools/lib/bpf/libbpf.c | 37 ++++++++++++++++++++++++++++++++++++- > > > 1 file changed, 36 insertions(+), 1 deletion(-) > > > > > > diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c > > > index a5c67a3c93c5..fdfb1ca34ced 100644 > > > --- a/tools/lib/bpf/libbpf.c > > > +++ b/tools/lib/bpf/libbpf.c > > > @@ -10375,12 +10375,30 @@ bpf_program__attach_kprobe_multi_opts(const struct bpf_program *prog, > > > return libbpf_err_ptr(err); > > > } > > > > > > +struct kprobe_resolve { > > > + char pattern[128]; > > > + char name[128]; > > > +}; > > > + > > > +static int kprobe_kallsyms_cb(unsigned long long sym_addr, char sym_type, > > > + const char *sym_name, void *ctx) > > > +{ > > > + struct kprobe_resolve *res = ctx; > > > + > > > + if (!glob_match(sym_name, res->pattern)) > > > + return 0; > > > + strcpy(res->name, sym_name); > > > + return 1; > > > +} > > > + > > > static int attach_kprobe(const struct bpf_program *prog, long cookie, struct bpf_link **link) > > > { > > > DECLARE_LIBBPF_OPTS(bpf_kprobe_opts, opts); > > > + struct kprobe_resolve res = {}; > > > unsigned long offset = 0; > > > const char *func_name; > > > char *func; > > > + int err; > > > int n; > > > > > > *link = NULL; > > > @@ -10408,8 +10426,25 @@ static int attach_kprobe(const struct bpf_program *prog, long cookie, struct bpf > > > > > > opts.offset = offset; > > > *link = bpf_program__attach_kprobe_opts(prog, func, &opts); > > > + err = libbpf_get_error(*link); > > > + > > > + if (!err || err != -ENOENT) > > > + goto out; > > > + > > > + sprintf(res.pattern, "%s.*", func); > > > + if (!libbpf_kallsyms_parse(kprobe_kallsyms_cb, &res)) > > > + goto out; > > > + > > > + pr_warn("prog '%s': trying to create %s '%s+0x%zx' perf event instead\n", > > > + prog->name, opts.retprobe ? "kretprobe" : "kprobe", > > > + res.name, offset); > > > + > > > + *link = bpf_program__attach_kprobe_opts(prog, res.name, &opts); > > > + err = libbpf_get_error(*link); > > > + > > > +out: > > > free(func); > > > - return libbpf_get_error(*link); > > > + return err; > > > } > > > > > > static int attach_ksyscall(const struct bpf_program *prog, long cookie, struct bpf_link **link)
On 1/9/23 7:11 PM, Menglong Dong wrote: > On Tue, Jan 10, 2023 at 4:29 AM Yonghong Song <yhs@meta.com> wrote: >> >> >> >> On 1/9/23 1:42 AM, menglong8.dong@gmail.com wrote: >>> From: Menglong Dong <imagedong@tencent.com> >>> >>> The function name in kernel may be changed by the compiler. For example, >>> the function 'ip_rcv_core' can be compiled to 'ip_rcv_core.isra.0'. >>> >>> This kind optimization can happen in any kernel function. Therefor, we >>> should conside this case. >>> >>> If we failed to attach kprobe with a '-ENOENT', then we can lookup the >>> kallsyms and check if there is a similar function end with '.xxx', and >>> retry. >> >> This might produce incorrect result, so this approach won't work >> for all .isra.0 cases. When a function name is changed from >> <func> to <func>.isra.<num>, it is possible that compiler may have >> make some changes to the arguments, e.g., removing one argument, >> chaning a semantics of argument, etc. if bpf program still >> uses the original function signature, the bpf program may >> produce unexpected result. > > Oops, I wasn't aware of this part. Can we make this function disabled > by default and offer an option to users to enable it? Such as: > > bpf_object_adapt_sym(struct bpf_object *obj) > > In my case, kernel function rename is common, and I have to > check all functions and do such adaptation before attaching > my kprobe programs, which makes me can't use auto-attach. > > What's more, I haven't seen the arguments change so far, and > maybe it's not a common case? I don't have statistics, but it happens. In general, if you want to attach to a function like <foo>, but it has a variant <foo>.isra.<num>, you probably should check assembly code to ensure the parameter semantics not changed, and then you can attach to kprobe function <foo>.isra.<num>, which I assume current libbpf infrastructure should support it. After you investigate all these <foo>.isra.<num> functions and confirm their argument semantics won't change, you could use kprobe multi to do attachment. > > (Please just ignore this reply if it doesn't work :/ ) > > Thanks! > Menglong Dong >> >>> >>> Signed-off-by: Menglong Dong <imagedong@tencent.com> >>> --- >>> tools/lib/bpf/libbpf.c | 37 ++++++++++++++++++++++++++++++++++++- >>> 1 file changed, 36 insertions(+), 1 deletion(-) >>> >>> diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c >>> index a5c67a3c93c5..fdfb1ca34ced 100644 >>> --- a/tools/lib/bpf/libbpf.c >>> +++ b/tools/lib/bpf/libbpf.c >>> @@ -10375,12 +10375,30 @@ bpf_program__attach_kprobe_multi_opts(const struct bpf_program *prog, >>> return libbpf_err_ptr(err); >>> } >>> >>> +struct kprobe_resolve { >>> + char pattern[128]; >>> + char name[128]; >>> +}; >>> + >>> +static int kprobe_kallsyms_cb(unsigned long long sym_addr, char sym_type, >>> + const char *sym_name, void *ctx) >>> +{ >>> + struct kprobe_resolve *res = ctx; >>> + >>> + if (!glob_match(sym_name, res->pattern)) >>> + return 0; >>> + strcpy(res->name, sym_name); >>> + return 1; >>> +} >>> + >>> static int attach_kprobe(const struct bpf_program *prog, long cookie, struct bpf_link **link) >>> { >>> DECLARE_LIBBPF_OPTS(bpf_kprobe_opts, opts); >>> + struct kprobe_resolve res = {}; >>> unsigned long offset = 0; >>> const char *func_name; >>> char *func; >>> + int err; >>> int n; >>> >>> *link = NULL; >>> @@ -10408,8 +10426,25 @@ static int attach_kprobe(const struct bpf_program *prog, long cookie, struct bpf >>> >>> opts.offset = offset; >>> *link = bpf_program__attach_kprobe_opts(prog, func, &opts); >>> + err = libbpf_get_error(*link); >>> + >>> + if (!err || err != -ENOENT) >>> + goto out; >>> + >>> + sprintf(res.pattern, "%s.*", func); >>> + if (!libbpf_kallsyms_parse(kprobe_kallsyms_cb, &res)) >>> + goto out; >>> + >>> + pr_warn("prog '%s': trying to create %s '%s+0x%zx' perf event instead\n", >>> + prog->name, opts.retprobe ? "kretprobe" : "kprobe", >>> + res.name, offset); >>> + >>> + *link = bpf_program__attach_kprobe_opts(prog, res.name, &opts); >>> + err = libbpf_get_error(*link); >>> + >>> +out: >>> free(func); >>> - return libbpf_get_error(*link); >>> + return err; >>> } >>> >>> static int attach_ksyscall(const struct bpf_program *prog, long cookie, struct bpf_link **link)
On 12/01/2023 07:23, Yonghong Song wrote: > > > On 1/9/23 7:11 PM, Menglong Dong wrote: >> On Tue, Jan 10, 2023 at 4:29 AM Yonghong Song <yhs@meta.com> wrote: >>> >>> >>> >>> On 1/9/23 1:42 AM, menglong8.dong@gmail.com wrote: >>>> From: Menglong Dong <imagedong@tencent.com> >>>> >>>> The function name in kernel may be changed by the compiler. For example, >>>> the function 'ip_rcv_core' can be compiled to 'ip_rcv_core.isra.0'. >>>> >>>> This kind optimization can happen in any kernel function. Therefor, we >>>> should conside this case. >>>> >>>> If we failed to attach kprobe with a '-ENOENT', then we can lookup the >>>> kallsyms and check if there is a similar function end with '.xxx', and >>>> retry. >>> >>> This might produce incorrect result, so this approach won't work >>> for all .isra.0 cases. When a function name is changed from >>> <func> to <func>.isra.<num>, it is possible that compiler may have >>> make some changes to the arguments, e.g., removing one argument, >>> chaning a semantics of argument, etc. if bpf program still >>> uses the original function signature, the bpf program may >>> produce unexpected result. >> >> Oops, I wasn't aware of this part. Can we make this function disabled >> by default and offer an option to users to enable it? Such as: >> >> bpf_object_adapt_sym(struct bpf_object *obj) >> >> In my case, kernel function rename is common, and I have to >> check all functions and do such adaptation before attaching >> my kprobe programs, which makes me can't use auto-attach. >> >> What's more, I haven't seen the arguments change so far, and >> maybe it's not a common case? > > I don't have statistics, but it happens. In general, if you > want to attach to a function like <foo>, but it has a variant > <foo>.isra.<num>, you probably should check assembly code > to ensure the parameter semantics not changed, and then > you can attach to kprobe function <foo>.isra.<num>, which > I assume current libbpf infrastructure should support it. > After you investigate all these <foo>.isra.<num> functions > and confirm their argument semantics won't change, you > could use kprobe multi to do attachment. > I crunched some numbers on this, and discovered out of ~1600 .isra/.constprop functions, 76 had a missing argument. The patch series at [1] is a rough attempt to get pahole to spot these, and add BTF entries for each, where the BTF representation reflects reality by skipping optimized-out arguments. So for a function like static int ip6_nh_lookup_table(struct net *net, struct fib6_config *cfg, const struct in6_addr *gw_addr, u32 tbid, int flags, struct fib6_result *res); Examining the BTF representation using pahole from [1], we see int ip6_nh_lookup_table.isra.0(struct net *net, struct fib6_config *cfg, struct in6_addr *gw_addr, u32 tbid, int flags); Comparing to the definition, we see the last parameter is missing, i.e. the "struct fib6_result *" argument is missing. The calling pattern - where the callers have a struct fib6_result on the stack and pass a pointer - is reflected in late DWARF info which shows the argument is not actually passed as a register, but can be expressed as an offset relative to the current function stack (DW_OP_fbreg). This approach howvever introduces the problem that currently the kernel doesn't allow a "." in a function name. We can fix that, but any BTF encoding that introduced optimized functions containing a "." would have to be opt-in via a pahole option, so we do not generate invalid vmlinux BTF for kernels without that change. An alternative approach would be to simply encode .isra functions in BTF without the .isra suffix (i.e. using "function_name" not "function_name.isra"), only doing the BTF encoding if no arguments were optimized out - i.e. if the function signature matches expectations. The 76 functions with optimized-out parameters could simply be skipped. To me that feels like the simpler approach - it avoids issues with function name BTF encoding, and with that sort of model a loose-matching kallsyms approach - like that described here - could be used for kprobes and fentry/fexit. It also fits with the DWARF representation - the .isra suffixes are not present in DWARF representations of the function, only in the symbol table and kallsyms, so perhaps BTF should follow suit and not add the suffixes. What do you think? Alan [1] https://github.com/acmel/dwarves/compare/master...alan-maguire:dwarves:optimized >> >> (Please just ignore this reply if it doesn't work :/ ) >> >> Thanks! >> Menglong Dong >>> >>>> >>>> Signed-off-by: Menglong Dong <imagedong@tencent.com> >>>> --- >>>> tools/lib/bpf/libbpf.c | 37 ++++++++++++++++++++++++++++++++++++- >>>> 1 file changed, 36 insertions(+), 1 deletion(-) >>>> >>>> diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c >>>> index a5c67a3c93c5..fdfb1ca34ced 100644 >>>> --- a/tools/lib/bpf/libbpf.c >>>> +++ b/tools/lib/bpf/libbpf.c >>>> @@ -10375,12 +10375,30 @@ bpf_program__attach_kprobe_multi_opts(const struct bpf_program *prog, >>>> return libbpf_err_ptr(err); >>>> } >>>> >>>> +struct kprobe_resolve { >>>> + char pattern[128]; >>>> + char name[128]; >>>> +}; >>>> + >>>> +static int kprobe_kallsyms_cb(unsigned long long sym_addr, char sym_type, >>>> + const char *sym_name, void *ctx) >>>> +{ >>>> + struct kprobe_resolve *res = ctx; >>>> + >>>> + if (!glob_match(sym_name, res->pattern)) >>>> + return 0; >>>> + strcpy(res->name, sym_name); >>>> + return 1; >>>> +} >>>> + >>>> static int attach_kprobe(const struct bpf_program *prog, long cookie, struct bpf_link **link) >>>> { >>>> DECLARE_LIBBPF_OPTS(bpf_kprobe_opts, opts); >>>> + struct kprobe_resolve res = {}; >>>> unsigned long offset = 0; >>>> const char *func_name; >>>> char *func; >>>> + int err; >>>> int n; >>>> >>>> *link = NULL; >>>> @@ -10408,8 +10426,25 @@ static int attach_kprobe(const struct bpf_program *prog, long cookie, struct bpf >>>> >>>> opts.offset = offset; >>>> *link = bpf_program__attach_kprobe_opts(prog, func, &opts); >>>> + err = libbpf_get_error(*link); >>>> + >>>> + if (!err || err != -ENOENT) >>>> + goto out; >>>> + >>>> + sprintf(res.pattern, "%s.*", func); >>>> + if (!libbpf_kallsyms_parse(kprobe_kallsyms_cb, &res)) >>>> + goto out; >>>> + >>>> + pr_warn("prog '%s': trying to create %s '%s+0x%zx' perf event instead\n", >>>> + prog->name, opts.retprobe ? "kretprobe" : "kprobe", >>>> + res.name, offset); >>>> + >>>> + *link = bpf_program__attach_kprobe_opts(prog, res.name, &opts); >>>> + err = libbpf_get_error(*link); >>>> + >>>> +out: >>>> free(func); >>>> - return libbpf_get_error(*link); >>>> + return err; >>>> } >>>> >>>> static int attach_ksyscall(const struct bpf_program *prog, long cookie, struct bpf_link **link)
On Thu, Jan 12, 2023 at 2:20 AM Alan Maguire <alan.maguire@oracle.com> wrote: > > On 12/01/2023 07:23, Yonghong Song wrote: > > > > > > On 1/9/23 7:11 PM, Menglong Dong wrote: > >> On Tue, Jan 10, 2023 at 4:29 AM Yonghong Song <yhs@meta.com> wrote: > >>> > >>> > >>> > >>> On 1/9/23 1:42 AM, menglong8.dong@gmail.com wrote: > >>>> From: Menglong Dong <imagedong@tencent.com> > >>>> > >>>> The function name in kernel may be changed by the compiler. For example, > >>>> the function 'ip_rcv_core' can be compiled to 'ip_rcv_core.isra.0'. > >>>> > >>>> This kind optimization can happen in any kernel function. Therefor, we > >>>> should conside this case. > >>>> > >>>> If we failed to attach kprobe with a '-ENOENT', then we can lookup the > >>>> kallsyms and check if there is a similar function end with '.xxx', and > >>>> retry. > >>> > >>> This might produce incorrect result, so this approach won't work > >>> for all .isra.0 cases. When a function name is changed from > >>> <func> to <func>.isra.<num>, it is possible that compiler may have > >>> make some changes to the arguments, e.g., removing one argument, > >>> chaning a semantics of argument, etc. if bpf program still > >>> uses the original function signature, the bpf program may > >>> produce unexpected result. > >> > >> Oops, I wasn't aware of this part. Can we make this function disabled > >> by default and offer an option to users to enable it? Such as: > >> > >> bpf_object_adapt_sym(struct bpf_object *obj) > >> > >> In my case, kernel function rename is common, and I have to > >> check all functions and do such adaptation before attaching > >> my kprobe programs, which makes me can't use auto-attach. > >> > >> What's more, I haven't seen the arguments change so far, and > >> maybe it's not a common case? > > > > I don't have statistics, but it happens. In general, if you > > want to attach to a function like <foo>, but it has a variant > > <foo>.isra.<num>, you probably should check assembly code > > to ensure the parameter semantics not changed, and then > > you can attach to kprobe function <foo>.isra.<num>, which > > I assume current libbpf infrastructure should support it. > > After you investigate all these <foo>.isra.<num> functions > > and confirm their argument semantics won't change, you > > could use kprobe multi to do attachment. > > > > I crunched some numbers on this, and discovered out of ~1600 > .isra/.constprop functions, 76 had a missing argument. The patch series > at [1] is a rough attempt to get pahole to spot these, and add > BTF entries for each, where the BTF representation reflects > reality by skipping optimized-out arguments. So for a function > like > > static int ip6_nh_lookup_table(struct net *net, struct fib6_config *cfg, > const struct in6_addr *gw_addr, u32 tbid, > int flags, struct fib6_result *res); > > Examining the BTF representation using pahole from [1], we see > > int ip6_nh_lookup_table.isra.0(struct net *net, struct fib6_config *cfg, struct in6_addr *gw_addr, u32 tbid, int flags); > > Comparing to the definition, we see the last parameter is missing, > i.e. the "struct fib6_result *" argument is missing. The calling pattern - > where the callers have a struct fib6_result on the stack and pass a pointer - > is reflected in late DWARF info which shows the argument is not actually > passed as a register, but can be expressed as an offset relative to the current > function stack (DW_OP_fbreg). > > This approach howvever introduces the problem that currently the kernel > doesn't allow a "." in a function name. We can fix that, but any BTF encoding > that introduced optimized functions containing a "." would have to be opt-in > via a pahole option, so we do not generate invalid vmlinux BTF for kernels > without that change. > > An alternative approach would be to simply encode .isra functions > in BTF without the .isra suffix (i.e. using "function_name" not > "function_name.isra"), only doing the BTF encoding if no arguments were > optimized out - i.e. if the function signature matches expectations. > The 76 functions with optimized-out parameters could simply be skipped. > To me that feels like the simpler approach - it avoids issues > with function name BTF encoding, and with that sort of model a > loose-matching kallsyms approach - like that described here - could be used > for kprobes and fentry/fexit. It also fits with the DWARF representation - > the .isra suffixes are not present in DWARF representations of the function, > only in the symbol table and kallsyms, so perhaps BTF should follow suit > and not add the suffixes. What do you think? Sounds like a great idea to me. Addresses this issue in a clean way.
On Thu, Jan 12, 2023 at 6:20 PM Alan Maguire <alan.maguire@oracle.com> wrote: > > On 12/01/2023 07:23, Yonghong Song wrote: > > > > > > On 1/9/23 7:11 PM, Menglong Dong wrote: > >> On Tue, Jan 10, 2023 at 4:29 AM Yonghong Song <yhs@meta.com> wrote: > >>> > >>> > >>> > >>> On 1/9/23 1:42 AM, menglong8.dong@gmail.com wrote: > >>>> From: Menglong Dong <imagedong@tencent.com> > >>>> > >>>> The function name in kernel may be changed by the compiler. For example, > >>>> the function 'ip_rcv_core' can be compiled to 'ip_rcv_core.isra.0'. > >>>> > >>>> This kind optimization can happen in any kernel function. Therefor, we > >>>> should conside this case. > >>>> > >>>> If we failed to attach kprobe with a '-ENOENT', then we can lookup the > >>>> kallsyms and check if there is a similar function end with '.xxx', and > >>>> retry. > >>> > >>> This might produce incorrect result, so this approach won't work > >>> for all .isra.0 cases. When a function name is changed from > >>> <func> to <func>.isra.<num>, it is possible that compiler may have > >>> make some changes to the arguments, e.g., removing one argument, > >>> chaning a semantics of argument, etc. if bpf program still > >>> uses the original function signature, the bpf program may > >>> produce unexpected result. > >> > >> Oops, I wasn't aware of this part. Can we make this function disabled > >> by default and offer an option to users to enable it? Such as: > >> > >> bpf_object_adapt_sym(struct bpf_object *obj) > >> > >> In my case, kernel function rename is common, and I have to > >> check all functions and do such adaptation before attaching > >> my kprobe programs, which makes me can't use auto-attach. > >> > >> What's more, I haven't seen the arguments change so far, and > >> maybe it's not a common case? > > > > I don't have statistics, but it happens. In general, if you > > want to attach to a function like <foo>, but it has a variant > > <foo>.isra.<num>, you probably should check assembly code > > to ensure the parameter semantics not changed, and then > > you can attach to kprobe function <foo>.isra.<num>, which > > I assume current libbpf infrastructure should support it. > > After you investigate all these <foo>.isra.<num> functions > > and confirm their argument semantics won't change, you > > could use kprobe multi to do attachment. > > > > I crunched some numbers on this, and discovered out of ~1600 > .isra/.constprop functions, 76 had a missing argument. The patch series > at [1] is a rough attempt to get pahole to spot these, and add > BTF entries for each, where the BTF representation reflects > reality by skipping optimized-out arguments. So for a function > like > > static int ip6_nh_lookup_table(struct net *net, struct fib6_config *cfg, > const struct in6_addr *gw_addr, u32 tbid, > int flags, struct fib6_result *res); > > Examining the BTF representation using pahole from [1], we see > > int ip6_nh_lookup_table.isra.0(struct net *net, struct fib6_config *cfg, struct in6_addr *gw_addr, u32 tbid, int flags); > > Comparing to the definition, we see the last parameter is missing, > i.e. the "struct fib6_result *" argument is missing. The calling pattern - > where the callers have a struct fib6_result on the stack and pass a pointer - > is reflected in late DWARF info which shows the argument is not actually > passed as a register, but can be expressed as an offset relative to the current > function stack (DW_OP_fbreg). > > This approach howvever introduces the problem that currently the kernel > doesn't allow a "." in a function name. We can fix that, but any BTF encoding > that introduced optimized functions containing a "." would have to be opt-in > via a pahole option, so we do not generate invalid vmlinux BTF for kernels > without that change. > > An alternative approach would be to simply encode .isra functions > in BTF without the .isra suffix (i.e. using "function_name" not > "function_name.isra"), only doing the BTF encoding if no arguments were > optimized out - i.e. if the function signature matches expectations. > The 76 functions with optimized-out parameters could simply be skipped. > To me that feels like the simpler approach - it avoids issues > with function name BTF encoding, and with that sort of model a > loose-matching kallsyms approach - like that described here - could be used > for kprobes and fentry/fexit. It also fits with the DWARF representation - > the .isra suffixes are not present in DWARF representations of the function, > only in the symbol table and kallsyms, so perhaps BTF should follow suit > and not add the suffixes. What do you think? This idea sounds great to me too, which makes the kernel function consistent to users. As for the functions with optimized-out parameters, I think that skipping them is acceptable, as they might produce incorrect results to users. Thanks! Menglong Dong > > Alan > > [1] https://github.com/acmel/dwarves/compare/master...alan-maguire:dwarves:optimized > >> > >> (Please just ignore this reply if it doesn't work :/ ) > >> > >> Thanks! > >> Menglong Dong > >>> > >>>> > >>>> Signed-off-by: Menglong Dong <imagedong@tencent.com> > >>>> --- > >>>> tools/lib/bpf/libbpf.c | 37 ++++++++++++++++++++++++++++++++++++- > >>>> 1 file changed, 36 insertions(+), 1 deletion(-) > >>>> > >>>> diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c > >>>> index a5c67a3c93c5..fdfb1ca34ced 100644 > >>>> --- a/tools/lib/bpf/libbpf.c > >>>> +++ b/tools/lib/bpf/libbpf.c > >>>> @@ -10375,12 +10375,30 @@ bpf_program__attach_kprobe_multi_opts(const struct bpf_program *prog, > >>>> return libbpf_err_ptr(err); > >>>> } > >>>> > >>>> +struct kprobe_resolve { > >>>> + char pattern[128]; > >>>> + char name[128]; > >>>> +}; > >>>> + > >>>> +static int kprobe_kallsyms_cb(unsigned long long sym_addr, char sym_type, > >>>> + const char *sym_name, void *ctx) > >>>> +{ > >>>> + struct kprobe_resolve *res = ctx; > >>>> + > >>>> + if (!glob_match(sym_name, res->pattern)) > >>>> + return 0; > >>>> + strcpy(res->name, sym_name); > >>>> + return 1; > >>>> +} > >>>> + > >>>> static int attach_kprobe(const struct bpf_program *prog, long cookie, struct bpf_link **link) > >>>> { > >>>> DECLARE_LIBBPF_OPTS(bpf_kprobe_opts, opts); > >>>> + struct kprobe_resolve res = {}; > >>>> unsigned long offset = 0; > >>>> const char *func_name; > >>>> char *func; > >>>> + int err; > >>>> int n; > >>>> > >>>> *link = NULL; > >>>> @@ -10408,8 +10426,25 @@ static int attach_kprobe(const struct bpf_program *prog, long cookie, struct bpf > >>>> > >>>> opts.offset = offset; > >>>> *link = bpf_program__attach_kprobe_opts(prog, func, &opts); > >>>> + err = libbpf_get_error(*link); > >>>> + > >>>> + if (!err || err != -ENOENT) > >>>> + goto out; > >>>> + > >>>> + sprintf(res.pattern, "%s.*", func); > >>>> + if (!libbpf_kallsyms_parse(kprobe_kallsyms_cb, &res)) > >>>> + goto out; > >>>> + > >>>> + pr_warn("prog '%s': trying to create %s '%s+0x%zx' perf event instead\n", > >>>> + prog->name, opts.retprobe ? "kretprobe" : "kprobe", > >>>> + res.name, offset); > >>>> + > >>>> + *link = bpf_program__attach_kprobe_opts(prog, res.name, &opts); > >>>> + err = libbpf_get_error(*link); > >>>> + > >>>> +out: > >>>> free(func); > >>>> - return libbpf_get_error(*link); > >>>> + return err; > >>>> } > >>>> > >>>> static int attach_ksyscall(const struct bpf_program *prog, long cookie, struct bpf_link **link)
On 1/12/23 1:07 PM, Alexei Starovoitov wrote: > On Thu, Jan 12, 2023 at 2:20 AM Alan Maguire <alan.maguire@oracle.com> wrote: >> >> On 12/01/2023 07:23, Yonghong Song wrote: >>> >>> >>> On 1/9/23 7:11 PM, Menglong Dong wrote: >>>> On Tue, Jan 10, 2023 at 4:29 AM Yonghong Song <yhs@meta.com> wrote: >>>>> >>>>> >>>>> >>>>> On 1/9/23 1:42 AM, menglong8.dong@gmail.com wrote: >>>>>> From: Menglong Dong <imagedong@tencent.com> >>>>>> >>>>>> The function name in kernel may be changed by the compiler. For example, >>>>>> the function 'ip_rcv_core' can be compiled to 'ip_rcv_core.isra.0'. >>>>>> >>>>>> This kind optimization can happen in any kernel function. Therefor, we >>>>>> should conside this case. >>>>>> >>>>>> If we failed to attach kprobe with a '-ENOENT', then we can lookup the >>>>>> kallsyms and check if there is a similar function end with '.xxx', and >>>>>> retry. >>>>> >>>>> This might produce incorrect result, so this approach won't work >>>>> for all .isra.0 cases. When a function name is changed from >>>>> <func> to <func>.isra.<num>, it is possible that compiler may have >>>>> make some changes to the arguments, e.g., removing one argument, >>>>> chaning a semantics of argument, etc. if bpf program still >>>>> uses the original function signature, the bpf program may >>>>> produce unexpected result. >>>> >>>> Oops, I wasn't aware of this part. Can we make this function disabled >>>> by default and offer an option to users to enable it? Such as: >>>> >>>> bpf_object_adapt_sym(struct bpf_object *obj) >>>> >>>> In my case, kernel function rename is common, and I have to >>>> check all functions and do such adaptation before attaching >>>> my kprobe programs, which makes me can't use auto-attach. >>>> >>>> What's more, I haven't seen the arguments change so far, and >>>> maybe it's not a common case? >>> >>> I don't have statistics, but it happens. In general, if you >>> want to attach to a function like <foo>, but it has a variant >>> <foo>.isra.<num>, you probably should check assembly code >>> to ensure the parameter semantics not changed, and then >>> you can attach to kprobe function <foo>.isra.<num>, which >>> I assume current libbpf infrastructure should support it. >>> After you investigate all these <foo>.isra.<num> functions >>> and confirm their argument semantics won't change, you >>> could use kprobe multi to do attachment. >>> >> >> I crunched some numbers on this, and discovered out of ~1600 >> .isra/.constprop functions, 76 had a missing argument. The patch series >> at [1] is a rough attempt to get pahole to spot these, and add >> BTF entries for each, where the BTF representation reflects >> reality by skipping optimized-out arguments. So for a function >> like >> >> static int ip6_nh_lookup_table(struct net *net, struct fib6_config *cfg, >> const struct in6_addr *gw_addr, u32 tbid, >> int flags, struct fib6_result *res); >> >> Examining the BTF representation using pahole from [1], we see >> >> int ip6_nh_lookup_table.isra.0(struct net *net, struct fib6_config *cfg, struct in6_addr *gw_addr, u32 tbid, int flags); >> >> Comparing to the definition, we see the last parameter is missing, >> i.e. the "struct fib6_result *" argument is missing. The calling pattern - >> where the callers have a struct fib6_result on the stack and pass a pointer - >> is reflected in late DWARF info which shows the argument is not actually >> passed as a register, but can be expressed as an offset relative to the current >> function stack (DW_OP_fbreg). >> >> This approach howvever introduces the problem that currently the kernel >> doesn't allow a "." in a function name. We can fix that, but any BTF encoding >> that introduced optimized functions containing a "." would have to be opt-in >> via a pahole option, so we do not generate invalid vmlinux BTF for kernels >> without that change. >> >> An alternative approach would be to simply encode .isra functions >> in BTF without the .isra suffix (i.e. using "function_name" not >> "function_name.isra"), only doing the BTF encoding if no arguments were >> optimized out - i.e. if the function signature matches expectations. >> The 76 functions with optimized-out parameters could simply be skipped. >> To me that feels like the simpler approach - it avoids issues >> with function name BTF encoding, and with that sort of model a >> loose-matching kallsyms approach - like that described here - could be used >> for kprobes and fentry/fexit. It also fits with the DWARF representation - >> the .isra suffixes are not present in DWARF representations of the function, >> only in the symbol table and kallsyms, so perhaps BTF should follow suit >> and not add the suffixes. What do you think? > > Sounds like a great idea to me. > Addresses this issue in a clean way. Yes, the second approach seems a reasonable approach. If the number of parameters for the *actual* functions equals to the number of parameters for the defined function (abstract_origin), we can roughly assume the actual function signature matches the prototype. Although it is theoretically possible that compiler might change parameter types, e.g., from a struct pointer (struct foo *p) to a int value (p->field1). But this should be extremely rare and we need compiler emitting additional dwarf data (might through btf_decl_tag) to discover such cases.
On 1/13/23 12:00 AM, Yonghong Song wrote: > > > On 1/12/23 1:07 PM, Alexei Starovoitov wrote: >> On Thu, Jan 12, 2023 at 2:20 AM Alan Maguire <alan.maguire@oracle.com> >> wrote: >>> >>> On 12/01/2023 07:23, Yonghong Song wrote: >>>> >>>> >>>> On 1/9/23 7:11 PM, Menglong Dong wrote: >>>>> On Tue, Jan 10, 2023 at 4:29 AM Yonghong Song <yhs@meta.com> wrote: >>>>>> >>>>>> >>>>>> >>>>>> On 1/9/23 1:42 AM, menglong8.dong@gmail.com wrote: >>>>>>> From: Menglong Dong <imagedong@tencent.com> >>>>>>> >>>>>>> The function name in kernel may be changed by the compiler. For >>>>>>> example, >>>>>>> the function 'ip_rcv_core' can be compiled to 'ip_rcv_core.isra.0'. >>>>>>> >>>>>>> This kind optimization can happen in any kernel function. >>>>>>> Therefor, we >>>>>>> should conside this case. >>>>>>> >>>>>>> If we failed to attach kprobe with a '-ENOENT', then we can >>>>>>> lookup the >>>>>>> kallsyms and check if there is a similar function end with >>>>>>> '.xxx', and >>>>>>> retry. >>>>>> >>>>>> This might produce incorrect result, so this approach won't work >>>>>> for all .isra.0 cases. When a function name is changed from >>>>>> <func> to <func>.isra.<num>, it is possible that compiler may have >>>>>> make some changes to the arguments, e.g., removing one argument, >>>>>> chaning a semantics of argument, etc. if bpf program still >>>>>> uses the original function signature, the bpf program may >>>>>> produce unexpected result. >>>>> >>>>> Oops, I wasn't aware of this part. Can we make this function disabled >>>>> by default and offer an option to users to enable it? Such as: >>>>> >>>>> bpf_object_adapt_sym(struct bpf_object *obj) >>>>> >>>>> In my case, kernel function rename is common, and I have to >>>>> check all functions and do such adaptation before attaching >>>>> my kprobe programs, which makes me can't use auto-attach. >>>>> >>>>> What's more, I haven't seen the arguments change so far, and >>>>> maybe it's not a common case? >>>> >>>> I don't have statistics, but it happens. In general, if you >>>> want to attach to a function like <foo>, but it has a variant >>>> <foo>.isra.<num>, you probably should check assembly code >>>> to ensure the parameter semantics not changed, and then >>>> you can attach to kprobe function <foo>.isra.<num>, which >>>> I assume current libbpf infrastructure should support it. >>>> After you investigate all these <foo>.isra.<num> functions >>>> and confirm their argument semantics won't change, you >>>> could use kprobe multi to do attachment. >>>> >>> >>> I crunched some numbers on this, and discovered out of ~1600 >>> .isra/.constprop functions, 76 had a missing argument. The patch series >>> at [1] is a rough attempt to get pahole to spot these, and add >>> BTF entries for each, where the BTF representation reflects >>> reality by skipping optimized-out arguments. So for a function >>> like >>> >>> static int ip6_nh_lookup_table(struct net *net, struct fib6_config *cfg, >>> const struct in6_addr *gw_addr, u32 >>> tbid, >>> int flags, struct fib6_result *res); >>> >>> Examining the BTF representation using pahole from [1], we see >>> >>> int ip6_nh_lookup_table.isra.0(struct net *net, struct fib6_config >>> *cfg, struct in6_addr *gw_addr, u32 tbid, int flags); >>> >>> Comparing to the definition, we see the last parameter is missing, >>> i.e. the "struct fib6_result *" argument is missing. The calling >>> pattern - >>> where the callers have a struct fib6_result on the stack and pass a >>> pointer - >>> is reflected in late DWARF info which shows the argument is not actually >>> passed as a register, but can be expressed as an offset relative to >>> the current >>> function stack (DW_OP_fbreg). >>> >>> This approach howvever introduces the problem that currently the kernel >>> doesn't allow a "." in a function name. We can fix that, but any BTF >>> encoding >>> that introduced optimized functions containing a "." would have to >>> be opt-in >>> via a pahole option, so we do not generate invalid vmlinux BTF for >>> kernels >>> without that change. >>> >>> An alternative approach would be to simply encode .isra functions >>> in BTF without the .isra suffix (i.e. using "function_name" not >>> "function_name.isra"), only doing the BTF encoding if no arguments were >>> optimized out - i.e. if the function signature matches expectations. >>> The 76 functions with optimized-out parameters could simply be skipped. >>> To me that feels like the simpler approach - it avoids issues >>> with function name BTF encoding, and with that sort of model a >>> loose-matching kallsyms approach - like that described here - could >>> be used >>> for kprobes and fentry/fexit. It also fits with the DWARF >>> representation - >>> the .isra suffixes are not present in DWARF representations of the >>> function, >>> only in the symbol table and kallsyms, so perhaps BTF should follow suit >>> and not add the suffixes. What do you think? >> >> Sounds like a great idea to me. >> Addresses this issue in a clean way. > > Yes, the second approach seems a reasonable approach. If the number > of parameters for the *actual* functions equals to the number > of parameters for the defined function (abstract_origin), > we can roughly assume the actual function signature matches > the prototype. Although it is theoretically possible that > compiler might change parameter types, e.g., from a > struct pointer (struct foo *p) to a int value (p->field1). > But this should be extremely rare and we need compiler emitting > additional dwarf data (might through btf_decl_tag) to discover > such cases. I checked with some compiler guys at Meta. They confirmed that clang might have the same optimization (eliminating some function parameters for static functions), but clang won't change function linkage name. So that means, clang won't do static function cloning and it will only remove function parameters if this can be applied to all call sites. I checked the clang (clang16) build kernel with latest bpf-next and didn't find a single instance that a static function's parameter is removed. Also, current libbpf kprobe API supports to pass a function address to kernel. So user space can always do their own /proc/kallsyms search and find func address for either regular function or function with .isa.<num>/.constprop.<num> suffixes.
On 18/01/2023 18:15, Yonghong Song wrote: > > > On 1/13/23 12:00 AM, Yonghong Song wrote: >> >> >> On 1/12/23 1:07 PM, Alexei Starovoitov wrote: >>> On Thu, Jan 12, 2023 at 2:20 AM Alan Maguire <alan.maguire@oracle.com> wrote: >>>> >>>> On 12/01/2023 07:23, Yonghong Song wrote: >>>>> >>>>> >>>>> On 1/9/23 7:11 PM, Menglong Dong wrote: >>>>>> On Tue, Jan 10, 2023 at 4:29 AM Yonghong Song <yhs@meta.com> wrote: >>>>>>> >>>>>>> >>>>>>> >>>>>>> On 1/9/23 1:42 AM, menglong8.dong@gmail.com wrote: >>>>>>>> From: Menglong Dong <imagedong@tencent.com> >>>>>>>> >>>>>>>> The function name in kernel may be changed by the compiler. For example, >>>>>>>> the function 'ip_rcv_core' can be compiled to 'ip_rcv_core.isra.0'. >>>>>>>> >>>>>>>> This kind optimization can happen in any kernel function. Therefor, we >>>>>>>> should conside this case. >>>>>>>> >>>>>>>> If we failed to attach kprobe with a '-ENOENT', then we can lookup the >>>>>>>> kallsyms and check if there is a similar function end with '.xxx', and >>>>>>>> retry. >>>>>>> >>>>>>> This might produce incorrect result, so this approach won't work >>>>>>> for all .isra.0 cases. When a function name is changed from >>>>>>> <func> to <func>.isra.<num>, it is possible that compiler may have >>>>>>> make some changes to the arguments, e.g., removing one argument, >>>>>>> chaning a semantics of argument, etc. if bpf program still >>>>>>> uses the original function signature, the bpf program may >>>>>>> produce unexpected result. >>>>>> >>>>>> Oops, I wasn't aware of this part. Can we make this function disabled >>>>>> by default and offer an option to users to enable it? Such as: >>>>>> >>>>>> bpf_object_adapt_sym(struct bpf_object *obj) >>>>>> >>>>>> In my case, kernel function rename is common, and I have to >>>>>> check all functions and do such adaptation before attaching >>>>>> my kprobe programs, which makes me can't use auto-attach. >>>>>> >>>>>> What's more, I haven't seen the arguments change so far, and >>>>>> maybe it's not a common case? >>>>> >>>>> I don't have statistics, but it happens. In general, if you >>>>> want to attach to a function like <foo>, but it has a variant >>>>> <foo>.isra.<num>, you probably should check assembly code >>>>> to ensure the parameter semantics not changed, and then >>>>> you can attach to kprobe function <foo>.isra.<num>, which >>>>> I assume current libbpf infrastructure should support it. >>>>> After you investigate all these <foo>.isra.<num> functions >>>>> and confirm their argument semantics won't change, you >>>>> could use kprobe multi to do attachment. >>>>> >>>> >>>> I crunched some numbers on this, and discovered out of ~1600 >>>> .isra/.constprop functions, 76 had a missing argument. The patch series >>>> at [1] is a rough attempt to get pahole to spot these, and add >>>> BTF entries for each, where the BTF representation reflects >>>> reality by skipping optimized-out arguments. So for a function >>>> like >>>> >>>> static int ip6_nh_lookup_table(struct net *net, struct fib6_config *cfg, >>>> const struct in6_addr *gw_addr, u32 tbid, >>>> int flags, struct fib6_result *res); >>>> >>>> Examining the BTF representation using pahole from [1], we see >>>> >>>> int ip6_nh_lookup_table.isra.0(struct net *net, struct fib6_config *cfg, struct in6_addr *gw_addr, u32 tbid, int flags); >>>> >>>> Comparing to the definition, we see the last parameter is missing, >>>> i.e. the "struct fib6_result *" argument is missing. The calling pattern - >>>> where the callers have a struct fib6_result on the stack and pass a pointer - >>>> is reflected in late DWARF info which shows the argument is not actually >>>> passed as a register, but can be expressed as an offset relative to the current >>>> function stack (DW_OP_fbreg). >>>> >>>> This approach howvever introduces the problem that currently the kernel >>>> doesn't allow a "." in a function name. We can fix that, but any BTF encoding >>>> that introduced optimized functions containing a "." would have to be opt-in >>>> via a pahole option, so we do not generate invalid vmlinux BTF for kernels >>>> without that change. >>>> >>>> An alternative approach would be to simply encode .isra functions >>>> in BTF without the .isra suffix (i.e. using "function_name" not >>>> "function_name.isra"), only doing the BTF encoding if no arguments were >>>> optimized out - i.e. if the function signature matches expectations. >>>> The 76 functions with optimized-out parameters could simply be skipped. >>>> To me that feels like the simpler approach - it avoids issues >>>> with function name BTF encoding, and with that sort of model a >>>> loose-matching kallsyms approach - like that described here - could be used >>>> for kprobes and fentry/fexit. It also fits with the DWARF representation - >>>> the .isra suffixes are not present in DWARF representations of the function, >>>> only in the symbol table and kallsyms, so perhaps BTF should follow suit >>>> and not add the suffixes. What do you think? >>> >>> Sounds like a great idea to me. >>> Addresses this issue in a clean way. >> >> Yes, the second approach seems a reasonable approach. If the number >> of parameters for the *actual* functions equals to the number >> of parameters for the defined function (abstract_origin), >> we can roughly assume the actual function signature matches >> the prototype. Although it is theoretically possible that >> compiler might change parameter types, e.g., from a >> struct pointer (struct foo *p) to a int value (p->field1). >> But this should be extremely rare and we need compiler emitting >> additional dwarf data (might through btf_decl_tag) to discover >> such cases. > Thanks! I've prototyped a solution at [1]. The key problem is pahole processing compilation units separately; the issue is that for some functions, they have optimized out parameters in some compilation units but not others (NF_HOOK() does this for example). It's a pain, especially as we want to preserve parallel BTF encoding as much as possible, so the best solution I could come up with was to save information on functions that had a suffix match in a global encoder binary tree. Then, when we are collecting threads, they can be safely added prior to BTF merging, since at that point we have recorded if they have optimized-out parameters in any compilation unit. There may be a better way to handle this, but I think we are stuck with comparing binary-wide to see if the parameters are consistent. The code is (I think) careful to limit this to cases where "."-suffixed functions are found. In testing this however, I think there is a wider issue with BTF encoding of static functions which may require a similar global comparison mechanism. More below... > I checked with some compiler guys at Meta. They confirmed that clang > might have the same optimization (eliminating some function parameters > for static functions), but clang won't change function linkage name. > So that means, clang won't do static function cloning and it will > only remove function parameters if this can be applied to all call sites. > Great, that simplifies things a lot. > I checked the clang (clang16) build kernel with latest bpf-next > and didn't find a single instance that a static function's parameter > is removed. > Excellent! > Also, current libbpf kprobe API supports to pass a function address > to kernel. So user space can always do their own /proc/kallsyms search > and find func address for either regular function or function > with .isa.<num>/.constprop.<num> suffixes. Right; one way I've done this is to have a special "okprobe" section name where we support matching with a "." suffix as well as with the exact name. Might be worth adding that support to libbpf itself if the above approach lands. The bigger concern I have is in testing this I hit a problem which at first looked like a parallel BTF encoding problem, but on deeper analysis looks like a conceptual issue in how we handle static functions. To demonstrate, generate vmlinux BTF twice, once with a single thread and once with 2 threads. We observe a different number of functions that end up in BTF for the exact same object: $ LLVM_OBJCOPY=objcopy pahole -J -j1 --btf_gen_floats --lang_exclude=rust .tmp_vmlinux.btf $ bpftool btf dump file .tmp_vmlinux.btf |grep "FUNC "|wc -l 57596 $ LLVM_OBJCOPY=objcopy pahole -J -j2 --btf_gen_floats --lang_exclude=rust .tmp_vmlinux.btf $ bpftool btf dump file .tmp_vmlinux.btf |grep "FUNC "|wc -l 57714 So we see 118 more functions in the latter case. Why would this differ? If we sort and strip out duplicates, we see that the cause is multiple copies of functions: $ bpftool btf dump file .tmp_vmlinux.btf |grep "FUNC "|awk '{print $3}'|sort|uniq|wc -l 57596 Once clue is that each encoder maintains a function table of ELF symbols, and marks a symbol as generated if it has been added to BTF. As the encoder traverses CUs, it adds to the encoder symbol table. In a single-threaded encoding, we will see only one instance of a function in the final BTF because the first match of the binary seach of the function list will be returned. With multiple BTF encoders however - each with its own internal representation of the symbol table - there will be multiple instances of functions added to each individual encoder's BTF fragment. This is supposed to be okay because BTF deduplication will handle merging these. However, this does not take into account the fact that the same function name may live in different CUs as a static function with a different function signature. Could this explain the extra functions? It seems to account for many of them. To take an example tcp_in_window() has two representations: 122435] FUNC_PROTO '(anon)' ret_type_id=122409 vlen=7 'ct' type_id=65074 'dir' type_id=2170 'index' type_id=6 'skb' type_id=2012 'dataoff' type_id=6 'tcph' type_id=61910 'hook_state' type_id=29004 [122436] FUNC 'tcp_in_window' type_id=122435 linkage=static [66683] FUNC_PROTO '(anon)' ret_type_id=372 vlen=4 'seq' type_id=23 'end_seq' type_id=23 's_win' type_id=23 'e_win' type_id=23 [66684] FUNC 'tcp_in_window' type_id=66683 linkage=static ...one from nf_conntrack_proto_tcp.c, the other from tcp_minisocks.c. This raises the conceptual problem - what do we do with such cases? From a code perspective, it's totally fine to have conflicting static functions in different CUs, and one approach would be to retain multiple conflicting function signatures; however this is not really useful as there is no mapping from BTF function signature to site. As a result we have no way of knowing which signature applies to which function site. So perhaps the best approach is to eliminate such inconsistent static function descriptions? The actual amount is small, ~100 functions. Assuming that makes sense, the next question is how. One approach would be at BTF deduplication time, but we have seen cases in the past where BTF did not fully deduplicate identical structures, and in such cases, multiple copies of a function are present which have multiple (but identical) parameter type descriptions. If this happens, the danger in eliminating both is we might eliminate critical kernel functions due to a type deduplication issue. One way to avoid this would be a comparison of number of parameters and (failing that) parameter names; such a comparison would not be subject to issues with identical types leading to identical function definitions. Not 100% foolproof, but would work in nearly all cases I suspect. Another approach would be to re-use the mechanics introduced to compare static functions to see if they have optimized-out parameters to compare function signatures. So static functions paradoxically have to be stored in a global tree and compared to weed out inconsistent duplicates. Luckily, these comparisons can be quite superficial; for the most part number of arguments suffices. What do you think is the best way forward to solve this problem? The optimization-handling code gives us a way to deal with this by postponing addition of such functions until we can guarantee consistency, and I've roughly prototyped a patch on top of [1] that works, but I think first we need to figure out what to do with such inconsistent static functions before determining how we do it. What do folks think? Alan [1] https://github.com/acmel/dwarves/compare/master...alan-maguire:dwarves:btf-optimized
On 1/20/23 5:20 AM, Alan Maguire wrote: > On 18/01/2023 18:15, Yonghong Song wrote: >> >> >> On 1/13/23 12:00 AM, Yonghong Song wrote: >>> >>> >>> On 1/12/23 1:07 PM, Alexei Starovoitov wrote: >>>> On Thu, Jan 12, 2023 at 2:20 AM Alan Maguire <alan.maguire@oracle.com> wrote: >>>>> >>>>> On 12/01/2023 07:23, Yonghong Song wrote: >>>>>> >>>>>> >>>>>> On 1/9/23 7:11 PM, Menglong Dong wrote: >>>>>>> On Tue, Jan 10, 2023 at 4:29 AM Yonghong Song <yhs@meta.com> wrote: >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> On 1/9/23 1:42 AM, menglong8.dong@gmail.com wrote: >>>>>>>>> From: Menglong Dong <imagedong@tencent.com> >>>>>>>>> >>>>>>>>> The function name in kernel may be changed by the compiler. For example, >>>>>>>>> the function 'ip_rcv_core' can be compiled to 'ip_rcv_core.isra.0'. >>>>>>>>> >>>>>>>>> This kind optimization can happen in any kernel function. Therefor, we >>>>>>>>> should conside this case. >>>>>>>>> >>>>>>>>> If we failed to attach kprobe with a '-ENOENT', then we can lookup the >>>>>>>>> kallsyms and check if there is a similar function end with '.xxx', and >>>>>>>>> retry. >>>>>>>> >>>>>>>> This might produce incorrect result, so this approach won't work >>>>>>>> for all .isra.0 cases. When a function name is changed from >>>>>>>> <func> to <func>.isra.<num>, it is possible that compiler may have >>>>>>>> make some changes to the arguments, e.g., removing one argument, >>>>>>>> chaning a semantics of argument, etc. if bpf program still >>>>>>>> uses the original function signature, the bpf program may >>>>>>>> produce unexpected result. >>>>>>> >>>>>>> Oops, I wasn't aware of this part. Can we make this function disabled >>>>>>> by default and offer an option to users to enable it? Such as: >>>>>>> >>>>>>> bpf_object_adapt_sym(struct bpf_object *obj) >>>>>>> >>>>>>> In my case, kernel function rename is common, and I have to >>>>>>> check all functions and do such adaptation before attaching >>>>>>> my kprobe programs, which makes me can't use auto-attach. >>>>>>> >>>>>>> What's more, I haven't seen the arguments change so far, and >>>>>>> maybe it's not a common case? >>>>>> >>>>>> I don't have statistics, but it happens. In general, if you >>>>>> want to attach to a function like <foo>, but it has a variant >>>>>> <foo>.isra.<num>, you probably should check assembly code >>>>>> to ensure the parameter semantics not changed, and then >>>>>> you can attach to kprobe function <foo>.isra.<num>, which >>>>>> I assume current libbpf infrastructure should support it. >>>>>> After you investigate all these <foo>.isra.<num> functions >>>>>> and confirm their argument semantics won't change, you >>>>>> could use kprobe multi to do attachment. >>>>>> >>>>> >>>>> I crunched some numbers on this, and discovered out of ~1600 >>>>> .isra/.constprop functions, 76 had a missing argument. The patch series >>>>> at [1] is a rough attempt to get pahole to spot these, and add >>>>> BTF entries for each, where the BTF representation reflects >>>>> reality by skipping optimized-out arguments. So for a function >>>>> like >>>>> >>>>> static int ip6_nh_lookup_table(struct net *net, struct fib6_config *cfg, >>>>> const struct in6_addr *gw_addr, u32 tbid, >>>>> int flags, struct fib6_result *res); >>>>> >>>>> Examining the BTF representation using pahole from [1], we see >>>>> >>>>> int ip6_nh_lookup_table.isra.0(struct net *net, struct fib6_config *cfg, struct in6_addr *gw_addr, u32 tbid, int flags); >>>>> >>>>> Comparing to the definition, we see the last parameter is missing, >>>>> i.e. the "struct fib6_result *" argument is missing. The calling pattern - >>>>> where the callers have a struct fib6_result on the stack and pass a pointer - >>>>> is reflected in late DWARF info which shows the argument is not actually >>>>> passed as a register, but can be expressed as an offset relative to the current >>>>> function stack (DW_OP_fbreg). >>>>> >>>>> This approach howvever introduces the problem that currently the kernel >>>>> doesn't allow a "." in a function name. We can fix that, but any BTF encoding >>>>> that introduced optimized functions containing a "." would have to be opt-in >>>>> via a pahole option, so we do not generate invalid vmlinux BTF for kernels >>>>> without that change. >>>>> >>>>> An alternative approach would be to simply encode .isra functions >>>>> in BTF without the .isra suffix (i.e. using "function_name" not >>>>> "function_name.isra"), only doing the BTF encoding if no arguments were >>>>> optimized out - i.e. if the function signature matches expectations. >>>>> The 76 functions with optimized-out parameters could simply be skipped. >>>>> To me that feels like the simpler approach - it avoids issues >>>>> with function name BTF encoding, and with that sort of model a >>>>> loose-matching kallsyms approach - like that described here - could be used >>>>> for kprobes and fentry/fexit. It also fits with the DWARF representation - >>>>> the .isra suffixes are not present in DWARF representations of the function, >>>>> only in the symbol table and kallsyms, so perhaps BTF should follow suit >>>>> and not add the suffixes. What do you think? >>>> >>>> Sounds like a great idea to me. >>>> Addresses this issue in a clean way. >>> >>> Yes, the second approach seems a reasonable approach. If the number >>> of parameters for the *actual* functions equals to the number >>> of parameters for the defined function (abstract_origin), >>> we can roughly assume the actual function signature matches >>> the prototype. Although it is theoretically possible that >>> compiler might change parameter types, e.g., from a >>> struct pointer (struct foo *p) to a int value (p->field1). >>> But this should be extremely rare and we need compiler emitting >>> additional dwarf data (might through btf_decl_tag) to discover >>> such cases. >> > > Thanks! I've prototyped a solution at [1]. > > The key problem is pahole processing compilation units separately; > the issue is that for some functions, they have optimized out > parameters in some compilation units but not others (NF_HOOK() > does this for example). It's a pain, especially as we want to > preserve parallel BTF encoding as much as possible, so the > best solution I could come up with was to save information on > functions that had a suffix match in a global encoder binary > tree. Then, when we are collecting threads, they can be safely > added prior to BTF merging, since at that point we have > recorded if they have optimized-out parameters in any compilation > unit. There may be a better way to handle this, but I think > we are stuck with comparing binary-wide to see if the > parameters are consistent. The code is (I think) careful > to limit this to cases where "."-suffixed functions are found. I agree that for this .isra.<num> issue and later static functions with different signature issue, global view is needed to make proper decision. > > In testing this however, I think there is a wider issue with > BTF encoding of static functions which may require a similar global > comparison mechanism. More below... > >> I checked with some compiler guys at Meta. They confirmed that clang >> might have the same optimization (eliminating some function parameters >> for static functions), but clang won't change function linkage name. >> So that means, clang won't do static function cloning and it will >> only remove function parameters if this can be applied to all call sites. >> > > Great, that simplifies things a lot. > >> I checked the clang (clang16) build kernel with latest bpf-next >> and didn't find a single instance that a static function's parameter >> is removed. >> > > Excellent! > >> Also, current libbpf kprobe API supports to pass a function address >> to kernel. So user space can always do their own /proc/kallsyms search >> and find func address for either regular function or function >> with .isa.<num>/.constprop.<num> suffixes. > > Right; one way I've done this is to have a special "okprobe" section > name where we support matching with a "." suffix as well as with > the exact name. Might be worth adding that support to libbpf itself > if the above approach lands. I am not sure about this. We should keep a good default to handle .isra functions as we discussed earlier. If user still wants to kprobe a .isra function which is filtered out by our default handling, user can use addr instead, in which case, user needs to check dwarf output and/or assembly to make sure they understand how many arguments are actually used and how they are used. > > The bigger concern I have is in testing this I hit a problem which > at first looked like a parallel BTF encoding problem, but on > deeper analysis looks like a conceptual issue in how we handle > static functions. > > To demonstrate, generate vmlinux BTF twice, once with a single thread > and once with 2 threads. We observe a different number of functions > that end up in BTF for the exact same object: > > $ LLVM_OBJCOPY=objcopy pahole -J -j1 --btf_gen_floats --lang_exclude=rust .tmp_vmlinux.btf > $ bpftool btf dump file .tmp_vmlinux.btf |grep "FUNC "|wc -l > 57596 > > $ LLVM_OBJCOPY=objcopy pahole -J -j2 --btf_gen_floats --lang_exclude=rust .tmp_vmlinux.btf > $ bpftool btf dump file .tmp_vmlinux.btf |grep "FUNC "|wc -l > 57714 > > So we see 118 more functions in the latter case. Why would this differ? If we sort > and strip out duplicates, we see that the cause is multiple copies of functions: > > $ bpftool btf dump file .tmp_vmlinux.btf |grep "FUNC "|awk '{print $3}'|sort|uniq|wc -l > 57596 > > Once clue is that each encoder maintains a function table of ELF symbols, and > marks a symbol as generated if it has been added to BTF. As the encoder > traverses CUs, it adds to the encoder symbol table. In a single-threaded encoding, > we will see only one instance of a function in the final BTF because the first > match of the binary seach of the function list will be returned. With multiple > BTF encoders however - each with its own internal representation of the symbol > table - there will be multiple instances of functions added to each individual > encoder's BTF fragment. This is supposed to be okay because BTF deduplication > will handle merging these. > > However, this does not take into account the fact that the same function name > may live in different CUs as a static function with a different function signature. > Could this explain the extra functions? Could be as you explained below. Also, the original behavior for static function with one thread seems wrong. User only find one static function definition in BTF and may assume it is the one user expected. But this may not be the case since other same-name static functions may be simply ignored. > > It seems to account for many of them. To take an example tcp_in_window() > has two representations: > > 122435] FUNC_PROTO '(anon)' ret_type_id=122409 vlen=7 > 'ct' type_id=65074 > 'dir' type_id=2170 > 'index' type_id=6 > 'skb' type_id=2012 > 'dataoff' type_id=6 > 'tcph' type_id=61910 > 'hook_state' type_id=29004 > [122436] FUNC 'tcp_in_window' type_id=122435 linkage=static > > [66683] FUNC_PROTO '(anon)' ret_type_id=372 vlen=4 > 'seq' type_id=23 > 'end_seq' type_id=23 > 's_win' type_id=23 > 'e_win' type_id=23 > [66684] FUNC 'tcp_in_window' type_id=66683 linkage=static > > ...one from nf_conntrack_proto_tcp.c, the other from tcp_minisocks.c. > > This raises the conceptual problem - what do we do with such cases? > From a code perspective, it's totally fine to have conflicting static > functions in different CUs, and one approach would be to retain multiple > conflicting function signatures; however this is not really useful as > there is no mapping from BTF function signature to site. As a result > we have no way of knowing which signature applies to which function site. > So perhaps the best approach is to eliminate such inconsistent static > function descriptions? The actual amount is small, ~100 functions. Removing these inconsistent static functions could be a simpler approach. To really resolve this issue, BTF needs to encode more information, e.g., DW_AT_low_pc, so this function can be related to a particular ksym. We could extend BTF somehow to encode this information. One possible is to utilize the btf_decl_tag. For same-name static functions, we could have btf_decl_tag("static_func:name:<low_pc_1>") -> foo (first static func in btf) btf_decl_tag("static_func:name:<low_pc_2>") -> foo (second static func in btf) We only need to do this for static functions which have same names. In verifier, we could build the above relationship and establish btf_id for first foo -> low_pc_1 btf_id for second foo -> low_pc_2 so verifier can then find the correct ksym addr for a particular btf_id for 'foo'. > > Assuming that makes sense, the next question is how. One approach > would be at BTF deduplication time, but we have seen cases in the > past where BTF did not fully deduplicate identical structures, and > in such cases, multiple copies of a function are present which > have multiple (but identical) parameter type descriptions. If this > happens, the danger in eliminating both is we might eliminate > critical kernel functions due to a type deduplication issue. One > way to avoid this would be a comparison of number of parameters > and (failing that) parameter names; such a comparison would not > be subject to issues with identical types leading to identical > function definitions. Not 100% foolproof, but would work in > nearly all cases I suspect. > > Another approach would be to re-use the mechanics introduced to > compare static functions to see if they have optimized-out > parameters to compare function signatures. So static functions > paradoxically have to be stored in a global tree and compared > to weed out inconsistent duplicates. Luckily, these comparisons > can be quite superficial; for the most part number of arguments > suffices. If I understand correctly, for same-name static functions, we would like compare # of parameters first. If they are the same, we then compare parameter names. <name>.isra.<num> functions will compare against <name> func to ensure the number of parameters the same. I am not 100% sure what is the difference between the above two approaches. Only implementation difference, right? > > What do you think is the best way forward to solve this problem? > The optimization-handling code gives us a way to deal with this > by postponing addition of such functions until we can guarantee > consistency, and I've roughly prototyped a patch on top of [1] > that works, but I think first we need to figure out what to do > with such inconsistent static functions before determining how we > do it. What do folks think? > > Alan > > [1] https://github.com/acmel/dwarves/compare/master...alan-maguire:dwarves:btf-optimized
On 21/01/2023 05:29, Yonghong Song wrote: > > > On 1/20/23 5:20 AM, Alan Maguire wrote: >> On 18/01/2023 18:15, Yonghong Song wrote: >>> >>> >>> On 1/13/23 12:00 AM, Yonghong Song wrote: >>>> >>>> >>>> On 1/12/23 1:07 PM, Alexei Starovoitov wrote: >>>>> On Thu, Jan 12, 2023 at 2:20 AM Alan Maguire <alan.maguire@oracle.com> wrote: >>>>>> >>>>>> On 12/01/2023 07:23, Yonghong Song wrote: >>>>>>> >>>>>>> >>>>>>> On 1/9/23 7:11 PM, Menglong Dong wrote: >>>>>>>> On Tue, Jan 10, 2023 at 4:29 AM Yonghong Song <yhs@meta.com> wrote: >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> On 1/9/23 1:42 AM, menglong8.dong@gmail.com wrote: >>>>>>>>>> From: Menglong Dong <imagedong@tencent.com> >>>>>>>>>> >>>>>>>>>> The function name in kernel may be changed by the compiler. For example, >>>>>>>>>> the function 'ip_rcv_core' can be compiled to 'ip_rcv_core.isra.0'. >>>>>>>>>> >>>>>>>>>> This kind optimization can happen in any kernel function. Therefor, we >>>>>>>>>> should conside this case. >>>>>>>>>> >>>>>>>>>> If we failed to attach kprobe with a '-ENOENT', then we can lookup the >>>>>>>>>> kallsyms and check if there is a similar function end with '.xxx', and >>>>>>>>>> retry. >>>>>>>>> >>>>>>>>> This might produce incorrect result, so this approach won't work >>>>>>>>> for all .isra.0 cases. When a function name is changed from >>>>>>>>> <func> to <func>.isra.<num>, it is possible that compiler may have >>>>>>>>> make some changes to the arguments, e.g., removing one argument, >>>>>>>>> chaning a semantics of argument, etc. if bpf program still >>>>>>>>> uses the original function signature, the bpf program may >>>>>>>>> produce unexpected result. >>>>>>>> >>>>>>>> Oops, I wasn't aware of this part. Can we make this function disabled >>>>>>>> by default and offer an option to users to enable it? Such as: >>>>>>>> >>>>>>>> bpf_object_adapt_sym(struct bpf_object *obj) >>>>>>>> >>>>>>>> In my case, kernel function rename is common, and I have to >>>>>>>> check all functions and do such adaptation before attaching >>>>>>>> my kprobe programs, which makes me can't use auto-attach. >>>>>>>> >>>>>>>> What's more, I haven't seen the arguments change so far, and >>>>>>>> maybe it's not a common case? >>>>>>> >>>>>>> I don't have statistics, but it happens. In general, if you >>>>>>> want to attach to a function like <foo>, but it has a variant >>>>>>> <foo>.isra.<num>, you probably should check assembly code >>>>>>> to ensure the parameter semantics not changed, and then >>>>>>> you can attach to kprobe function <foo>.isra.<num>, which >>>>>>> I assume current libbpf infrastructure should support it. >>>>>>> After you investigate all these <foo>.isra.<num> functions >>>>>>> and confirm their argument semantics won't change, you >>>>>>> could use kprobe multi to do attachment. >>>>>>> >>>>>> >>>>>> I crunched some numbers on this, and discovered out of ~1600 >>>>>> .isra/.constprop functions, 76 had a missing argument. The patch series >>>>>> at [1] is a rough attempt to get pahole to spot these, and add >>>>>> BTF entries for each, where the BTF representation reflects >>>>>> reality by skipping optimized-out arguments. So for a function >>>>>> like >>>>>> >>>>>> static int ip6_nh_lookup_table(struct net *net, struct fib6_config *cfg, >>>>>> const struct in6_addr *gw_addr, u32 tbid, >>>>>> int flags, struct fib6_result *res); >>>>>> >>>>>> Examining the BTF representation using pahole from [1], we see >>>>>> >>>>>> int ip6_nh_lookup_table.isra.0(struct net *net, struct fib6_config *cfg, struct in6_addr *gw_addr, u32 tbid, int flags); >>>>>> >>>>>> Comparing to the definition, we see the last parameter is missing, >>>>>> i.e. the "struct fib6_result *" argument is missing. The calling pattern - >>>>>> where the callers have a struct fib6_result on the stack and pass a pointer - >>>>>> is reflected in late DWARF info which shows the argument is not actually >>>>>> passed as a register, but can be expressed as an offset relative to the current >>>>>> function stack (DW_OP_fbreg). >>>>>> >>>>>> This approach howvever introduces the problem that currently the kernel >>>>>> doesn't allow a "." in a function name. We can fix that, but any BTF encoding >>>>>> that introduced optimized functions containing a "." would have to be opt-in >>>>>> via a pahole option, so we do not generate invalid vmlinux BTF for kernels >>>>>> without that change. >>>>>> >>>>>> An alternative approach would be to simply encode .isra functions >>>>>> in BTF without the .isra suffix (i.e. using "function_name" not >>>>>> "function_name.isra"), only doing the BTF encoding if no arguments were >>>>>> optimized out - i.e. if the function signature matches expectations. >>>>>> The 76 functions with optimized-out parameters could simply be skipped. >>>>>> To me that feels like the simpler approach - it avoids issues >>>>>> with function name BTF encoding, and with that sort of model a >>>>>> loose-matching kallsyms approach - like that described here - could be used >>>>>> for kprobes and fentry/fexit. It also fits with the DWARF representation - >>>>>> the .isra suffixes are not present in DWARF representations of the function, >>>>>> only in the symbol table and kallsyms, so perhaps BTF should follow suit >>>>>> and not add the suffixes. What do you think? >>>>> >>>>> Sounds like a great idea to me. >>>>> Addresses this issue in a clean way. >>>> >>>> Yes, the second approach seems a reasonable approach. If the number >>>> of parameters for the *actual* functions equals to the number >>>> of parameters for the defined function (abstract_origin), >>>> we can roughly assume the actual function signature matches >>>> the prototype. Although it is theoretically possible that >>>> compiler might change parameter types, e.g., from a >>>> struct pointer (struct foo *p) to a int value (p->field1). >>>> But this should be extremely rare and we need compiler emitting >>>> additional dwarf data (might through btf_decl_tag) to discover >>>> such cases. >>> >> >> Thanks! I've prototyped a solution at [1]. >> >> The key problem is pahole processing compilation units separately; >> the issue is that for some functions, they have optimized out >> parameters in some compilation units but not others (NF_HOOK() >> does this for example). It's a pain, especially as we want to >> preserve parallel BTF encoding as much as possible, so the >> best solution I could come up with was to save information on >> functions that had a suffix match in a global encoder binary >> tree. Then, when we are collecting threads, they can be safely >> added prior to BTF merging, since at that point we have >> recorded if they have optimized-out parameters in any compilation >> unit. There may be a better way to handle this, but I think >> we are stuck with comparing binary-wide to see if the >> parameters are consistent. The code is (I think) careful >> to limit this to cases where "."-suffixed functions are found. > > I agree that for this .isra.<num> issue and later static functions > with different signature issue, global view is needed to make > proper decision. > >> >> In testing this however, I think there is a wider issue with >> BTF encoding of static functions which may require a similar global >> comparison mechanism. More below... >> >>> I checked with some compiler guys at Meta. They confirmed that clang >>> might have the same optimization (eliminating some function parameters >>> for static functions), but clang won't change function linkage name. >>> So that means, clang won't do static function cloning and it will >>> only remove function parameters if this can be applied to all call sites. >>> >> >> Great, that simplifies things a lot. >> >>> I checked the clang (clang16) build kernel with latest bpf-next >>> and didn't find a single instance that a static function's parameter >>> is removed. >>> >> >> Excellent! >> >>> Also, current libbpf kprobe API supports to pass a function address >>> to kernel. So user space can always do their own /proc/kallsyms search >>> and find func address for either regular function or function >>> with .isa.<num>/.constprop.<num> suffixes. >> >> Right; one way I've done this is to have a special "okprobe" section >> name where we support matching with a "." suffix as well as with >> the exact name. Might be worth adding that support to libbpf itself >> if the above approach lands. > > I am not sure about this. We should keep a good default to handle > .isra functions as we discussed earlier. If user still wants > to kprobe a .isra function which is filtered out by our default > handling, user can use addr instead, in which case, user needs > to check dwarf output and/or assembly to make sure they understand > how many arguments are actually used and how they are used. But if we ensure BTF only encodes consistent representations of functions and spots optimized-out parameters, wouldn't the presence in BTF of an .isra function provide a guarantee parameters are not optimized out? Note I'm not suggesting changing "kprobe/" or "fentry/" semantics here, but rather having other dedicated SEC() prefixes to support "try for foo, but fall back to foo.isra if attach to foo fails, and we can find foo in BTF". > >> >> The bigger concern I have is in testing this I hit a problem which >> at first looked like a parallel BTF encoding problem, but on >> deeper analysis looks like a conceptual issue in how we handle >> static functions. >> >> To demonstrate, generate vmlinux BTF twice, once with a single thread >> and once with 2 threads. We observe a different number of functions >> that end up in BTF for the exact same object: >> >> $ LLVM_OBJCOPY=objcopy pahole -J -j1 --btf_gen_floats --lang_exclude=rust .tmp_vmlinux.btf >> $ bpftool btf dump file .tmp_vmlinux.btf |grep "FUNC "|wc -l >> 57596 >> >> $ LLVM_OBJCOPY=objcopy pahole -J -j2 --btf_gen_floats --lang_exclude=rust .tmp_vmlinux.btf >> $ bpftool btf dump file .tmp_vmlinux.btf |grep "FUNC "|wc -l >> 57714 >> >> So we see 118 more functions in the latter case. Why would this differ? If we sort >> and strip out duplicates, we see that the cause is multiple copies of functions: >> >> $ bpftool btf dump file .tmp_vmlinux.btf |grep "FUNC "|awk '{print $3}'|sort|uniq|wc -l >> 57596 >> >> Once clue is that each encoder maintains a function table of ELF symbols, and >> marks a symbol as generated if it has been added to BTF. As the encoder >> traverses CUs, it adds to the encoder symbol table. In a single-threaded encoding, >> we will see only one instance of a function in the final BTF because the first >> match of the binary seach of the function list will be returned. With multiple >> BTF encoders however - each with its own internal representation of the symbol >> table - there will be multiple instances of functions added to each individual >> encoder's BTF fragment. This is supposed to be okay because BTF deduplication >> will handle merging these. >> >> However, this does not take into account the fact that the same function name >> may live in different CUs as a static function with a different function signature. >> Could this explain the extra functions? > > Could be as you explained below. Also, the original behavior for static function with one thread seems wrong. User only find one static function > definition in BTF and may assume it is the one user expected. But this may not be the case since other same-name static functions may be > simply ignored. > Right, I should have been clearer on this; the single-threaded case isn't right either, it just picks the first instance of a function it finds. >> >> It seems to account for many of them. To take an example tcp_in_window() >> has two representations: >> >> 122435] FUNC_PROTO '(anon)' ret_type_id=122409 vlen=7 >> 'ct' type_id=65074 >> 'dir' type_id=2170 >> 'index' type_id=6 >> 'skb' type_id=2012 >> 'dataoff' type_id=6 >> 'tcph' type_id=61910 >> 'hook_state' type_id=29004 >> [122436] FUNC 'tcp_in_window' type_id=122435 linkage=static >> >> [66683] FUNC_PROTO '(anon)' ret_type_id=372 vlen=4 >> 'seq' type_id=23 >> 'end_seq' type_id=23 >> 's_win' type_id=23 >> 'e_win' type_id=23 >> [66684] FUNC 'tcp_in_window' type_id=66683 linkage=static >> >> ...one from nf_conntrack_proto_tcp.c, the other from tcp_minisocks.c. >> >> This raises the conceptual problem - what do we do with such cases? >> From a code perspective, it's totally fine to have conflicting static >> functions in different CUs, and one approach would be to retain multiple >> conflicting function signatures; however this is not really useful as >> there is no mapping from BTF function signature to site. As a result >> we have no way of knowing which signature applies to which function site. >> So perhaps the best approach is to eliminate such inconsistent static >> function descriptions? The actual amount is small, ~100 functions. > > Removing these inconsistent static functions could be a simpler > approach. I took that approach with https://github.com/acmel/dwarves/commit/80eaecdb00b3d79becc2133b854593277093b115 Static functions with inconsistent prototypes are left out of BTF encoding. Given that the numbers here are pretty low (around 100 or so, not including .isra functions which have inconsistent prototypes due to optimizations), that seems to be the simplest solution for now. > To really resolve this issue, BTF needs to encode more > information, e.g., DW_AT_low_pc, so this function can be related to a particular ksym. We could extend BTF somehow to encode this information. > One possible is to utilize the btf_decl_tag. For same-name static > functions, we could have > btf_decl_tag("static_func:name:<low_pc_1>") -> foo (first static func in btf) > btf_decl_tag("static_func:name:<low_pc_2>") -> foo (second static func in btf) > We only need to do this for static functions which have same names. > > In verifier, we could build the above relationship and establish > btf_id for first foo -> low_pc_1 > btf_id for second foo -> low_pc_2 > so verifier can then find the correct ksym addr for a particular > btf_id for 'foo'. > It seems workable technically, but I wonder how much benefit it provides? The absolute number of static functions with conflicting prototypes is small (around 100), and the usability of the above would be hard to get right. My suggestion would be that if we can ensure that BTF encoding will not encode inconsistent static function representations, we could potentially do something like the following on attach: - a BPF program specifies "fentry/foo" - we find a representation of foo in BTF - we notice that there are multiple instances of foo in kallsyms - because we know BTF would not have encoded foo if these had inconsistent prototypes, we can safely attach to all those instances This seems to provide the least surprise for the user; i.e. I attached to a function, and my program fired when it ran. If the function has inconsistent representations, it isn't present in BTF so we cannot attach safely. >> >> Assuming that makes sense, the next question is how. One approach >> would be at BTF deduplication time, but we have seen cases in the >> past where BTF did not fully deduplicate identical structures, and >> in such cases, multiple copies of a function are present which >> have multiple (but identical) parameter type descriptions. If this >> happens, the danger in eliminating both is we might eliminate >> critical kernel functions due to a type deduplication issue. One >> way to avoid this would be a comparison of number of parameters >> and (failing that) parameter names; such a comparison would not >> be subject to issues with identical types leading to identical >> function definitions. Not 100% foolproof, but would work in >> nearly all cases I suspect. >> >> Another approach would be to re-use the mechanics introduced to >> compare static functions to see if they have optimized-out >> parameters to compare function signatures. So static functions >> paradoxically have to be stored in a global tree and compared >> to weed out inconsistent duplicates. Luckily, these comparisons >> can be quite superficial; for the most part number of arguments >> suffices. > > If I understand correctly, for same-name static functions, > we would like compare # of parameters > first. If they are the same, we then compare parameter names. > <name>.isra.<num> functions will compare against <name> func > to ensure the number of parameters the same. I am not 100% > sure what is the difference between the above two approaches. > Only implementation difference, right? > Right, the only question is when we eliminate the inconsistencies (prior to BTF dedup or during it). We need the mechanics for comparing static functions to handle the .isra case, so I went with re-using that scheme to catch inconsistent static functions in https://github.com/acmel/dwarves/commit/80eaecdb00b3d79becc2133b854593277093b115 Thanks! Alan >> >> What do you think is the best way forward to solve this problem? >> The optimization-handling code gives us a way to deal with this >> by postponing addition of such functions until we can guarantee >> consistency, and I've roughly prototyped a patch on top of [1] >> that works, but I think first we need to figure out what to do >> with such inconsistent static functions before determining how we >> do it. What do folks think? >> >> Alan >> >> [1] https://github.com/acmel/dwarves/compare/master...alan-maguire:dwarves:btf-optimized
On 1/23/23 9:14 AM, Alan Maguire wrote: > On 21/01/2023 05:29, Yonghong Song wrote: >> >> >> On 1/20/23 5:20 AM, Alan Maguire wrote: >>> On 18/01/2023 18:15, Yonghong Song wrote: >>>> >>>> >>>> On 1/13/23 12:00 AM, Yonghong Song wrote: >>>>> >>>>> >>>>> On 1/12/23 1:07 PM, Alexei Starovoitov wrote: >>>>>> On Thu, Jan 12, 2023 at 2:20 AM Alan Maguire <alan.maguire@oracle.com> wrote: >>>>>>> >>>>>>> On 12/01/2023 07:23, Yonghong Song wrote: >>>>>>>> >>>>>>>> >>>>>>>> On 1/9/23 7:11 PM, Menglong Dong wrote: >>>>>>>>> On Tue, Jan 10, 2023 at 4:29 AM Yonghong Song <yhs@meta.com> wrote: >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On 1/9/23 1:42 AM, menglong8.dong@gmail.com wrote: >>>>>>>>>>> From: Menglong Dong <imagedong@tencent.com> >>>>>>>>>>> >>>>>>>>>>> The function name in kernel may be changed by the compiler. For example, >>>>>>>>>>> the function 'ip_rcv_core' can be compiled to 'ip_rcv_core.isra.0'. >>>>>>>>>>> >>>>>>>>>>> This kind optimization can happen in any kernel function. Therefor, we >>>>>>>>>>> should conside this case. >>>>>>>>>>> >>>>>>>>>>> If we failed to attach kprobe with a '-ENOENT', then we can lookup the >>>>>>>>>>> kallsyms and check if there is a similar function end with '.xxx', and >>>>>>>>>>> retry. >>>>>>>>>> >>>>>>>>>> This might produce incorrect result, so this approach won't work >>>>>>>>>> for all .isra.0 cases. When a function name is changed from >>>>>>>>>> <func> to <func>.isra.<num>, it is possible that compiler may have >>>>>>>>>> make some changes to the arguments, e.g., removing one argument, >>>>>>>>>> chaning a semantics of argument, etc. if bpf program still >>>>>>>>>> uses the original function signature, the bpf program may >>>>>>>>>> produce unexpected result. >>>>>>>>> >>>>>>>>> Oops, I wasn't aware of this part. Can we make this function disabled >>>>>>>>> by default and offer an option to users to enable it? Such as: >>>>>>>>> >>>>>>>>> bpf_object_adapt_sym(struct bpf_object *obj) >>>>>>>>> >>>>>>>>> In my case, kernel function rename is common, and I have to >>>>>>>>> check all functions and do such adaptation before attaching >>>>>>>>> my kprobe programs, which makes me can't use auto-attach. >>>>>>>>> >>>>>>>>> What's more, I haven't seen the arguments change so far, and >>>>>>>>> maybe it's not a common case? >>>>>>>> >>>>>>>> I don't have statistics, but it happens. In general, if you >>>>>>>> want to attach to a function like <foo>, but it has a variant >>>>>>>> <foo>.isra.<num>, you probably should check assembly code >>>>>>>> to ensure the parameter semantics not changed, and then >>>>>>>> you can attach to kprobe function <foo>.isra.<num>, which >>>>>>>> I assume current libbpf infrastructure should support it. >>>>>>>> After you investigate all these <foo>.isra.<num> functions >>>>>>>> and confirm their argument semantics won't change, you >>>>>>>> could use kprobe multi to do attachment. >>>>>>>> >>>>>>> >>>>>>> I crunched some numbers on this, and discovered out of ~1600 >>>>>>> .isra/.constprop functions, 76 had a missing argument. The patch series >>>>>>> at [1] is a rough attempt to get pahole to spot these, and add >>>>>>> BTF entries for each, where the BTF representation reflects >>>>>>> reality by skipping optimized-out arguments. So for a function >>>>>>> like >>>>>>> >>>>>>> static int ip6_nh_lookup_table(struct net *net, struct fib6_config *cfg, >>>>>>> const struct in6_addr *gw_addr, u32 tbid, >>>>>>> int flags, struct fib6_result *res); >>>>>>> >>>>>>> Examining the BTF representation using pahole from [1], we see >>>>>>> >>>>>>> int ip6_nh_lookup_table.isra.0(struct net *net, struct fib6_config *cfg, struct in6_addr *gw_addr, u32 tbid, int flags); >>>>>>> >>>>>>> Comparing to the definition, we see the last parameter is missing, >>>>>>> i.e. the "struct fib6_result *" argument is missing. The calling pattern - >>>>>>> where the callers have a struct fib6_result on the stack and pass a pointer - >>>>>>> is reflected in late DWARF info which shows the argument is not actually >>>>>>> passed as a register, but can be expressed as an offset relative to the current >>>>>>> function stack (DW_OP_fbreg). >>>>>>> >>>>>>> This approach howvever introduces the problem that currently the kernel >>>>>>> doesn't allow a "." in a function name. We can fix that, but any BTF encoding >>>>>>> that introduced optimized functions containing a "." would have to be opt-in >>>>>>> via a pahole option, so we do not generate invalid vmlinux BTF for kernels >>>>>>> without that change. >>>>>>> >>>>>>> An alternative approach would be to simply encode .isra functions >>>>>>> in BTF without the .isra suffix (i.e. using "function_name" not >>>>>>> "function_name.isra"), only doing the BTF encoding if no arguments were >>>>>>> optimized out - i.e. if the function signature matches expectations. >>>>>>> The 76 functions with optimized-out parameters could simply be skipped. >>>>>>> To me that feels like the simpler approach - it avoids issues >>>>>>> with function name BTF encoding, and with that sort of model a >>>>>>> loose-matching kallsyms approach - like that described here - could be used >>>>>>> for kprobes and fentry/fexit. It also fits with the DWARF representation - >>>>>>> the .isra suffixes are not present in DWARF representations of the function, >>>>>>> only in the symbol table and kallsyms, so perhaps BTF should follow suit >>>>>>> and not add the suffixes. What do you think? >>>>>> >>>>>> Sounds like a great idea to me. >>>>>> Addresses this issue in a clean way. >>>>> >>>>> Yes, the second approach seems a reasonable approach. If the number >>>>> of parameters for the *actual* functions equals to the number >>>>> of parameters for the defined function (abstract_origin), >>>>> we can roughly assume the actual function signature matches >>>>> the prototype. Although it is theoretically possible that >>>>> compiler might change parameter types, e.g., from a >>>>> struct pointer (struct foo *p) to a int value (p->field1). >>>>> But this should be extremely rare and we need compiler emitting >>>>> additional dwarf data (might through btf_decl_tag) to discover >>>>> such cases. >>>> >>> >>> Thanks! I've prototyped a solution at [1]. >>> >>> The key problem is pahole processing compilation units separately; >>> the issue is that for some functions, they have optimized out >>> parameters in some compilation units but not others (NF_HOOK() >>> does this for example). It's a pain, especially as we want to >>> preserve parallel BTF encoding as much as possible, so the >>> best solution I could come up with was to save information on >>> functions that had a suffix match in a global encoder binary >>> tree. Then, when we are collecting threads, they can be safely >>> added prior to BTF merging, since at that point we have >>> recorded if they have optimized-out parameters in any compilation >>> unit. There may be a better way to handle this, but I think >>> we are stuck with comparing binary-wide to see if the >>> parameters are consistent. The code is (I think) careful >>> to limit this to cases where "."-suffixed functions are found. >> >> I agree that for this .isra.<num> issue and later static functions >> with different signature issue, global view is needed to make >> proper decision. >> >>> >>> In testing this however, I think there is a wider issue with >>> BTF encoding of static functions which may require a similar global >>> comparison mechanism. More below... >>> >>>> I checked with some compiler guys at Meta. They confirmed that clang >>>> might have the same optimization (eliminating some function parameters >>>> for static functions), but clang won't change function linkage name. >>>> So that means, clang won't do static function cloning and it will >>>> only remove function parameters if this can be applied to all call sites. >>>> >>> >>> Great, that simplifies things a lot. >>> >>>> I checked the clang (clang16) build kernel with latest bpf-next >>>> and didn't find a single instance that a static function's parameter >>>> is removed. >>>> >>> >>> Excellent! >>> >>>> Also, current libbpf kprobe API supports to pass a function address >>>> to kernel. So user space can always do their own /proc/kallsyms search >>>> and find func address for either regular function or function >>>> with .isa.<num>/.constprop.<num> suffixes. >>> >>> Right; one way I've done this is to have a special "okprobe" section >>> name where we support matching with a "." suffix as well as with >>> the exact name. Might be worth adding that support to libbpf itself >>> if the above approach lands. >> >> I am not sure about this. We should keep a good default to handle >> .isra functions as we discussed earlier. If user still wants >> to kprobe a .isra function which is filtered out by our default >> handling, user can use addr instead, in which case, user needs >> to check dwarf output and/or assembly to make sure they understand >> how many arguments are actually used and how they are used. > > But if we ensure BTF only encodes consistent representations of > functions and spots optimized-out parameters, wouldn't the presence > in BTF of an .isra function provide a guarantee parameters are not > optimized out? Note I'm not suggesting changing "kprobe/" or "fentry/" > semantics here, but rather having other dedicated SEC() prefixes to > support "try for foo, but fall back to foo.isra if attach to foo fails, > and we can find foo in BTF". We could. I just feel this is a corner case and user have an alternative to use address directly. This is just my opinion though. > >> >>> >>> The bigger concern I have is in testing this I hit a problem which >>> at first looked like a parallel BTF encoding problem, but on >>> deeper analysis looks like a conceptual issue in how we handle >>> static functions. >>> >>> To demonstrate, generate vmlinux BTF twice, once with a single thread >>> and once with 2 threads. We observe a different number of functions >>> that end up in BTF for the exact same object: >>> >>> $ LLVM_OBJCOPY=objcopy pahole -J -j1 --btf_gen_floats --lang_exclude=rust .tmp_vmlinux.btf >>> $ bpftool btf dump file .tmp_vmlinux.btf |grep "FUNC "|wc -l >>> 57596 >>> >>> $ LLVM_OBJCOPY=objcopy pahole -J -j2 --btf_gen_floats --lang_exclude=rust .tmp_vmlinux.btf >>> $ bpftool btf dump file .tmp_vmlinux.btf |grep "FUNC "|wc -l >>> 57714 >>> >>> So we see 118 more functions in the latter case. Why would this differ? If we sort >>> and strip out duplicates, we see that the cause is multiple copies of functions: >>> >>> $ bpftool btf dump file .tmp_vmlinux.btf |grep "FUNC "|awk '{print $3}'|sort|uniq|wc -l >>> 57596 >>> >>> Once clue is that each encoder maintains a function table of ELF symbols, and >>> marks a symbol as generated if it has been added to BTF. As the encoder >>> traverses CUs, it adds to the encoder symbol table. In a single-threaded encoding, >>> we will see only one instance of a function in the final BTF because the first >>> match of the binary seach of the function list will be returned. With multiple >>> BTF encoders however - each with its own internal representation of the symbol >>> table - there will be multiple instances of functions added to each individual >>> encoder's BTF fragment. This is supposed to be okay because BTF deduplication >>> will handle merging these. >>> >>> However, this does not take into account the fact that the same function name >>> may live in different CUs as a static function with a different function signature. >>> Could this explain the extra functions? >> >> Could be as you explained below. Also, the original behavior for static function with one thread seems wrong. User only find one static function >> definition in BTF and may assume it is the one user expected. But this may not be the case since other same-name static functions may be >> simply ignored. >> > > Right, I should have been clearer on this; the single-threaded case isn't > right either, it just picks the first instance of a function it finds. > >>> >>> It seems to account for many of them. To take an example tcp_in_window() >>> has two representations: >>> >>> 122435] FUNC_PROTO '(anon)' ret_type_id=122409 vlen=7 >>> 'ct' type_id=65074 >>> 'dir' type_id=2170 >>> 'index' type_id=6 >>> 'skb' type_id=2012 >>> 'dataoff' type_id=6 >>> 'tcph' type_id=61910 >>> 'hook_state' type_id=29004 >>> [122436] FUNC 'tcp_in_window' type_id=122435 linkage=static >>> >>> [66683] FUNC_PROTO '(anon)' ret_type_id=372 vlen=4 >>> 'seq' type_id=23 >>> 'end_seq' type_id=23 >>> 's_win' type_id=23 >>> 'e_win' type_id=23 >>> [66684] FUNC 'tcp_in_window' type_id=66683 linkage=static >>> >>> ...one from nf_conntrack_proto_tcp.c, the other from tcp_minisocks.c. >>> >>> This raises the conceptual problem - what do we do with such cases? >>> From a code perspective, it's totally fine to have conflicting static >>> functions in different CUs, and one approach would be to retain multiple >>> conflicting function signatures; however this is not really useful as >>> there is no mapping from BTF function signature to site. As a result >>> we have no way of knowing which signature applies to which function site. >>> So perhaps the best approach is to eliminate such inconsistent static >>> function descriptions? The actual amount is small, ~100 functions. >> >> Removing these inconsistent static functions could be a simpler >> approach. > > I took that approach with > > https://github.com/acmel/dwarves/commit/80eaecdb00b3d79becc2133b854593277093b115 > > Static functions with inconsistent prototypes are left out of > BTF encoding. Given that the numbers here are pretty low (around > 100 or so, not including .isra functions which have inconsistent > prototypes due to optimizations), that seems to be the simplest > solution for now. Yes, I am okay with this approach. This will prevent user mistaking kprobe a wrong function and this is what we want for now. > >> To really resolve this issue, BTF needs to encode more >> information, e.g., DW_AT_low_pc, so this function can be related to a particular ksym. We could extend BTF somehow to encode this information. >> One possible is to utilize the btf_decl_tag. For same-name static >> functions, we could have >> btf_decl_tag("static_func:name:<low_pc_1>") -> foo (first static func in btf) >> btf_decl_tag("static_func:name:<low_pc_2>") -> foo (second static func in btf) >> We only need to do this for static functions which have same names. >> >> In verifier, we could build the above relationship and establish >> btf_id for first foo -> low_pc_1 >> btf_id for second foo -> low_pc_2 >> so verifier can then find the correct ksym addr for a particular >> btf_id for 'foo'. >> > > It seems workable technically, but I wonder how much benefit it > provides? The absolute number of static functions with conflicting As I mentioned in the above, we can do the minimum work in the beginning to prevent user errors. I agree that it may not be worthwhile to support those just ~100 static functions with more BTF types and kernel codes. Users already have alternatives. > prototypes is small (around 100), and the usability of the above > would be hard to get right. My suggestion would be that if we > can ensure that BTF encoding will not encode inconsistent static > function representations, we could potentially do something like I assume consistent static function representations means total match of function signature (return type, func name, parameter type, parameter name). > the following on attach: > > - a BPF program specifies "fentry/foo" > - we find a representation of foo in BTF > - we notice that there are multiple instances of foo in kallsyms > - because we know BTF would not have encoded foo if these had > inconsistent prototypes, we can safely attach to all those > instances This should work. But if we do find multiple instances of the same function. I think it would be good if we can spend some time find out why dedup didn't catch them. > > This seems to provide the least surprise for the user; i.e. > I attached to a function, and my program fired when it ran. If > the function has inconsistent representations, it isn't present > in BTF so we cannot attach safely. > >>> >>> Assuming that makes sense, the next question is how. One approach >>> would be at BTF deduplication time, but we have seen cases in the >>> past where BTF did not fully deduplicate identical structures, and >>> in such cases, multiple copies of a function are present which >>> have multiple (but identical) parameter type descriptions. If this >>> happens, the danger in eliminating both is we might eliminate >>> critical kernel functions due to a type deduplication issue. One >>> way to avoid this would be a comparison of number of parameters >>> and (failing that) parameter names; such a comparison would not >>> be subject to issues with identical types leading to identical >>> function definitions. Not 100% foolproof, but would work in >>> nearly all cases I suspect. >>> >>> Another approach would be to re-use the mechanics introduced to >>> compare static functions to see if they have optimized-out >>> parameters to compare function signatures. So static functions >>> paradoxically have to be stored in a global tree and compared >>> to weed out inconsistent duplicates. Luckily, these comparisons >>> can be quite superficial; for the most part number of arguments >>> suffices. >> >> If I understand correctly, for same-name static functions, >> we would like compare # of parameters >> first. If they are the same, we then compare parameter names. >> <name>.isra.<num> functions will compare against <name> func >> to ensure the number of parameters the same. I am not 100% >> sure what is the difference between the above two approaches. >> Only implementation difference, right? >> > > Right, the only question is when we eliminate the inconsistencies > (prior to BTF dedup or during it). > > We need the mechanics for comparing static functions to handle the .isra > case, so I went with re-using that scheme to catch inconsistent static > functions in > > https://github.com/acmel/dwarves/commit/80eaecdb00b3d79becc2133b854593277093b115 I am okay with this mechanism. > > Thanks! > > Alan > >>> >>> What do you think is the best way forward to solve this problem? >>> The optimization-handling code gives us a way to deal with this >>> by postponing addition of such functions until we can guarantee >>> consistency, and I've roughly prototyped a patch on top of [1] >>> that works, but I think first we need to figure out what to do >>> with such inconsistent static functions before determining how we >>> do it. What do folks think? >>> >>> Alan >>> >>> [1] https://github.com/acmel/dwarves/compare/master...alan-maguire:dwarves:btf-optimized
On Mon, Jan 23, 2023 at 11:18 PM Yonghong Song <yhs@meta.com> wrote: site. > >>> So perhaps the best approach is to eliminate such inconsistent static > >>> function descriptions? The actual amount is small, ~100 functions. > >> > >> Removing these inconsistent static functions could be a simpler > >> approach. > > > > I took that approach with > > > > https://github.com/acmel/dwarves/commit/80eaecdb00b3d79becc2133b854593277093b115 > > > > Static functions with inconsistent prototypes are left out of > > BTF encoding. Given that the numbers here are pretty low (around > > 100 or so, not including .isra functions which have inconsistent > > prototypes due to optimizations), that seems to be the simplest > > solution for now. > > Yes, I am okay with this approach. This will prevent user mistaking > kprobe a wrong function and this is what we want for now. Let's proceed with the fix first and then discuss improvements on top.
diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c index a5c67a3c93c5..fdfb1ca34ced 100644 --- a/tools/lib/bpf/libbpf.c +++ b/tools/lib/bpf/libbpf.c @@ -10375,12 +10375,30 @@ bpf_program__attach_kprobe_multi_opts(const struct bpf_program *prog, return libbpf_err_ptr(err); } +struct kprobe_resolve { + char pattern[128]; + char name[128]; +}; + +static int kprobe_kallsyms_cb(unsigned long long sym_addr, char sym_type, + const char *sym_name, void *ctx) +{ + struct kprobe_resolve *res = ctx; + + if (!glob_match(sym_name, res->pattern)) + return 0; + strcpy(res->name, sym_name); + return 1; +} + static int attach_kprobe(const struct bpf_program *prog, long cookie, struct bpf_link **link) { DECLARE_LIBBPF_OPTS(bpf_kprobe_opts, opts); + struct kprobe_resolve res = {}; unsigned long offset = 0; const char *func_name; char *func; + int err; int n; *link = NULL; @@ -10408,8 +10426,25 @@ static int attach_kprobe(const struct bpf_program *prog, long cookie, struct bpf opts.offset = offset; *link = bpf_program__attach_kprobe_opts(prog, func, &opts); + err = libbpf_get_error(*link); + + if (!err || err != -ENOENT) + goto out; + + sprintf(res.pattern, "%s.*", func); + if (!libbpf_kallsyms_parse(kprobe_kallsyms_cb, &res)) + goto out; + + pr_warn("prog '%s': trying to create %s '%s+0x%zx' perf event instead\n", + prog->name, opts.retprobe ? "kretprobe" : "kprobe", + res.name, offset); + + *link = bpf_program__attach_kprobe_opts(prog, res.name, &opts); + err = libbpf_get_error(*link); + +out: free(func); - return libbpf_get_error(*link); + return err; } static int attach_ksyscall(const struct bpf_program *prog, long cookie, struct bpf_link **link)