| Message ID | 20230417154737.12740-6-laoar.shao@gmail.com |
|---|---|
| State | New |
| Series | bpf: Tracing recursion prevention mechanism improvement |

Headers:

    From: Yafang Shao <laoar.shao@gmail.com>
    To: ast@kernel.org, daniel@iogearbox.net, andrii@kernel.org, kafai@fb.com, songliubraving@fb.com, yhs@fb.com, john.fastabend@gmail.com, kpsingh@kernel.org, sdf@google.com, haoluo@google.com, jolsa@kernel.org, rostedt@goodmis.org, mhiramat@kernel.org
    Cc: bpf@vger.kernel.org, linux-trace-kernel@vger.kernel.org, linux-kernel@vger.kernel.org, Yafang Shao <laoar.shao@gmail.com>
    Subject: [PATCH bpf-next 5/6] bpf: Improve tracing recursion prevention mechanism
    Date: Mon, 17 Apr 2023 15:47:36 +0000
    Message-Id: <20230417154737.12740-6-laoar.shao@gmail.com>
    In-Reply-To: <20230417154737.12740-1-laoar.shao@gmail.com>
    References: <20230417154737.12740-1-laoar.shao@gmail.com>
Commit Message
Yafang Shao
April 17, 2023, 3:47 p.m. UTC
Currently we use prog->active to prevent tracing recursion, but it has
some downsides:

- It can't identify different contexts.
  That is, if a process context is interrupted by an irq context and the
  irq context runs the same code path, the irq context will be considered
  recursion. For example,

    normal:
      this_cpu_inc_return(*(prog->active)) == 1 <- OK
        irq:
          this_cpu_inc_return(*(prog->active)) == 1 <- FAIL!
          [ Considered as recursion ]

- It has to maintain a percpu area.
  A percpu area is allocated for each prog when the prog is loaded and
  freed when the prog is destroyed.
Let's replace it with the generic tracing recursion prevention mechanism,
which distinguishes between the normal, softirq, irq and NMI contexts.
With it, the irq context in the above example is no longer treated as
recursion:

    normal:
      test_recursion_try_acquire() <- OK
        softirq:
          test_recursion_try_acquire() <- OK
            irq:
              test_recursion_try_acquire() <- OK
Note that currently one single recursion in process context is still
allowed due to the TRACE_CTX_TRANSITION workaround, which can be fixed
in the future. That is, the following behavior is expected for now:

    normal:
      test_recursion_try_acquire() <- OK
      [ recursion happens ]        <- one single recursion is allowed
      test_recursion_try_acquire() <- OK
      [ recursion happens ]
      test_recursion_try_acquire() <- RECURSION!
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
---
include/linux/bpf.h | 2 +-
kernel/bpf/core.c | 10 ----------
kernel/bpf/trampoline.c | 44 +++++++++++++++++++++++++++++++++-----------
kernel/trace/bpf_trace.c | 12 +++++++-----
4 files changed, 41 insertions(+), 27 deletions(-)
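As background for the review comments below, this is roughly how the acquire/release pair described in the commit message is meant to be used. It is a minimal sketch assuming the test_recursion_try_acquire()/test_recursion_release() helpers introduced earlier in this series; traced_entry() is a made-up function used only for illustration, not part of the patch.

    #include <linux/trace_recursion.h>

    /* Illustration only; not part of the posted patch. */
    static void notrace traced_entry(void)
    {
            int bit;

            /* One bit per context (normal/softirq/irq/NMI) in the current task. */
            bit = test_recursion_try_acquire(_THIS_IP_, _RET_IP_);
            if (bit < 0)
                    return; /* the same context is already inside this section */

            /* ... section that must not recurse within a single context ... */

            test_recursion_release(bit);
    }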
Comments
On Mon, Apr 17, 2023 at 03:47:36PM +0000, Yafang Shao wrote:
> diff --git a/kernel/bpf/trampoline.c b/kernel/bpf/trampoline.c
> index f61d513..3df39a5 100644
> --- a/kernel/bpf/trampoline.c
> +++ b/kernel/bpf/trampoline.c
> @@ -842,15 +842,21 @@ static __always_inline u64 notrace bpf_prog_start_time(void)
>  static u64 notrace __bpf_prog_enter_recur(struct bpf_prog *prog, struct bpf_tramp_run_ctx *run_ctx)
>  	__acquires(RCU)
>  {
> -	rcu_read_lock();
> -	migrate_disable();
> -
> -	run_ctx->saved_run_ctx = bpf_set_run_ctx(&run_ctx->run_ctx);
> +	int bit;
>
> -	if (unlikely(this_cpu_inc_return(*(prog->active)) != 1)) {
> +	rcu_read_lock();
> +	bit = test_recursion_try_acquire(_THIS_IP_, _RET_IP_);

and bpf will prevent ftrace to run and vice versa.
Not a good idea.

One bpf prog will prevent different bpf prog to run since they share current task.
Not a good idea either.
Hi Yafang,

kernel test robot noticed the following build errors:

[auto build test ERROR on bpf-next/master]

url:    https://github.com/intel-lab-lkp/linux/commits/Yafang-Shao/bpf-Add-__rcu_read_-lock-unlock-into-btf-id-deny-list/20230417-235009
base:   https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git master
patch link:    https://lore.kernel.org/r/20230417154737.12740-6-laoar.shao%40gmail.com
patch subject: [PATCH bpf-next 5/6] bpf: Improve tracing recursion prevention mechanism
config: loongarch-defconfig (https://download.01.org/0day-ci/archive/20230418/202304180736.cWjpwhs6-lkp@intel.com/config)
compiler: loongarch64-linux-gcc (GCC) 12.1.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/intel-lab-lkp/linux/commit/ac84d2623c0469a703245030f2b23612ab4505dd
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review Yafang-Shao/bpf-Add-__rcu_read_-lock-unlock-into-btf-id-deny-list/20230417-235009
        git checkout ac84d2623c0469a703245030f2b23612ab4505dd
        # save the config file
        mkdir build_dir && cp config build_dir/.config
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-12.1.0 make.cross W=1 O=build_dir ARCH=loongarch olddefconfig
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-12.1.0 make.cross W=1 O=build_dir ARCH=loongarch SHELL=/bin/bash

If you fix the issue, kindly add following tag where applicable
| Reported-by: kernel test robot <lkp@intel.com>
| Link: https://lore.kernel.org/oe-kbuild-all/202304180736.cWjpwhs6-lkp@intel.com/

All errors (new ones prefixed by >>):

   kernel/bpf/trampoline.c: In function '__bpf_prog_enter_recur':
>> kernel/bpf/trampoline.c:848:15: error: implicit declaration of function 'test_recursion_try_acquire' [-Werror=implicit-function-declaration]
     848 |         bit = test_recursion_try_acquire(_THIS_IP_, _RET_IP_);
         |               ^~~~~~~~~~~~~~~~~~~~~~~~~~
   kernel/bpf/trampoline.c: In function '__bpf_prog_exit_recur':
>> kernel/bpf/trampoline.c:896:9: error: implicit declaration of function 'test_recursion_release'; did you mean 'dev_recursion_level'? [-Werror=implicit-function-declaration]
     896 |         test_recursion_release(run_ctx->recursion_bit);
         |         ^~~~~~~~~~~~~~~~~~~~~~
         |         dev_recursion_level
   cc1: some warnings being treated as errors


vim +/test_recursion_try_acquire +848 kernel/bpf/trampoline.c

   828
   829	/* The logic is similar to bpf_prog_run(), but with an explicit
   830	 * rcu_read_lock() and migrate_disable() which are required
   831	 * for the trampoline. The macro is split into
   832	 * call __bpf_prog_enter
   833	 * call prog->bpf_func
   834	 * call __bpf_prog_exit
   835	 *
   836	 * __bpf_prog_enter returns:
   837	 * 0 - skip execution of the bpf prog
   838	 * 1 - execute bpf prog
   839	 * [2..MAX_U64] - execute bpf prog and record execution time.
   840	 * This is start time.
   841	 */
   842	static u64 notrace __bpf_prog_enter_recur(struct bpf_prog *prog, struct bpf_tramp_run_ctx *run_ctx)
   843		__acquires(RCU)
   844	{
   845		int bit;
   846
   847		rcu_read_lock();
 > 848		bit = test_recursion_try_acquire(_THIS_IP_, _RET_IP_);
   849		run_ctx->recursion_bit = bit;
   850		if (bit < 0) {
   851			preempt_disable_notrace();
   852			bpf_prog_inc_misses_counter(prog);
   853			preempt_enable_notrace();
   854			return 0;
   855		}
   856
   857		migrate_disable();
   858
   859		run_ctx->saved_run_ctx = bpf_set_run_ctx(&run_ctx->run_ctx);
   860		return bpf_prog_start_time();
   861	}
   862
   863	static void notrace update_prog_stats(struct bpf_prog *prog,
   864					      u64 start)
   865	{
   866		struct bpf_prog_stats *stats;
   867
   868		if (static_branch_unlikely(&bpf_stats_enabled_key) &&
   869		    /* static_key could be enabled in __bpf_prog_enter*
   870		     * and disabled in __bpf_prog_exit*.
   871		     * And vice versa.
   872		     * Hence check that 'start' is valid.
   873		     */
   874		    start > NO_START_TIME) {
   875			unsigned long flags;
   876
   877			stats = this_cpu_ptr(prog->stats);
   878			flags = u64_stats_update_begin_irqsave(&stats->syncp);
   879			u64_stats_inc(&stats->cnt);
   880			u64_stats_add(&stats->nsecs, sched_clock() - start);
   881			u64_stats_update_end_irqrestore(&stats->syncp, flags);
   882		}
   883	}
   884
   885	static void notrace __bpf_prog_exit_recur(struct bpf_prog *prog, u64 start,
   886						  struct bpf_tramp_run_ctx *run_ctx)
   887		__releases(RCU)
   888	{
   889		if (run_ctx->recursion_bit < 0)
   890			goto out;
   891
   892		bpf_reset_run_ctx(run_ctx->saved_run_ctx);
   893
   894		update_prog_stats(prog, start);
   895		migrate_enable();
 > 896		test_recursion_release(run_ctx->recursion_bit);
   897
   898	out:
   899		rcu_read_unlock();
   900	}
   901
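Both errors are the usual symptom of a declaration that is not visible in trampoline.c rather than a logic problem; presumably the fix is simply to include the header that declares the helpers, assumed here to be linux/trace_recursion.h from the tracing patch this series builds on:

    --- a/kernel/bpf/trampoline.c
    +++ b/kernel/bpf/trampoline.c
    @@
     #include <linux/bpf.h>
    +#include <linux/trace_recursion.h>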
On Tue, Apr 18, 2023 at 4:15 AM Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote: > > On Mon, Apr 17, 2023 at 03:47:36PM +0000, Yafang Shao wrote: > > diff --git a/kernel/bpf/trampoline.c b/kernel/bpf/trampoline.c > > index f61d513..3df39a5 100644 > > --- a/kernel/bpf/trampoline.c > > +++ b/kernel/bpf/trampoline.c > > @@ -842,15 +842,21 @@ static __always_inline u64 notrace bpf_prog_start_time(void) > > static u64 notrace __bpf_prog_enter_recur(struct bpf_prog *prog, struct bpf_tramp_run_ctx *run_ctx) > > __acquires(RCU) > > { > > - rcu_read_lock(); > > - migrate_disable(); > > - > > - run_ctx->saved_run_ctx = bpf_set_run_ctx(&run_ctx->run_ctx); > > + int bit; > > > > - if (unlikely(this_cpu_inc_return(*(prog->active)) != 1)) { > > + rcu_read_lock(); > > + bit = test_recursion_try_acquire(_THIS_IP_, _RET_IP_); > > and bpf will prevent ftrace to run and vice versa. > Not a good idea. > > One bpf prog will prevent different bpf prog to run since they share current task. > Not a good idea either. That shouldn't happen. test_recursion_try_acquire() uses a per-task_struct value. One single task_struct can't run in parallel, right? Note that the bpf program running in softirq or irq context won't be prevented by it. IIUC, the bpf program should run in serial in one single task, right? That said, one bpf program can only run after another bpf program finished in the same task?
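For reference, the per-task bookkeeping referred to here is bit-based rather than counter-based: each task carries a small bitmask and the bit is chosen from the current context, which is why an irq interrupting the same code path is not flagged. The snippet below is a simplified illustration of that idea, not the exact kernel code (the real logic lives in include/linux/trace_recursion.h):

    /* Simplified illustration; not the exact kernel implementation. */
    enum {
            CTX_NMI,        /* bit 0 */
            CTX_IRQ,        /* bit 1 */
            CTX_SOFTIRQ,    /* bit 2 */
            CTX_NORMAL,     /* bit 3 */
    };

    static int try_acquire_bit(unsigned long *mask)
    {
            /* interrupt_context_level(): 0 = task, 1 = softirq, 2 = irq, 3 = NMI */
            int bit = CTX_NORMAL - interrupt_context_level();

            if (test_and_set_bit(bit, mask))
                    return -1;      /* same context re-entered: recursion */
            return bit;             /* different context: allowed to proceed */
    }

    static void release_bit(unsigned long *mask, int bit)
    {
            clear_bit(bit, mask);
    }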
On Mon, Apr 17, 2023 at 6:49 PM Yafang Shao <laoar.shao@gmail.com> wrote: > > On Tue, Apr 18, 2023 at 4:15 AM Alexei Starovoitov > <alexei.starovoitov@gmail.com> wrote: > > > > On Mon, Apr 17, 2023 at 03:47:36PM +0000, Yafang Shao wrote: > > > diff --git a/kernel/bpf/trampoline.c b/kernel/bpf/trampoline.c > > > index f61d513..3df39a5 100644 > > > --- a/kernel/bpf/trampoline.c > > > +++ b/kernel/bpf/trampoline.c > > > @@ -842,15 +842,21 @@ static __always_inline u64 notrace bpf_prog_start_time(void) > > > static u64 notrace __bpf_prog_enter_recur(struct bpf_prog *prog, struct bpf_tramp_run_ctx *run_ctx) > > > __acquires(RCU) > > > { > > > - rcu_read_lock(); > > > - migrate_disable(); > > > - > > > - run_ctx->saved_run_ctx = bpf_set_run_ctx(&run_ctx->run_ctx); > > > + int bit; > > > > > > - if (unlikely(this_cpu_inc_return(*(prog->active)) != 1)) { > > > + rcu_read_lock(); > > > + bit = test_recursion_try_acquire(_THIS_IP_, _RET_IP_); > > > > and bpf will prevent ftrace to run and vice versa. > > Not a good idea. > > > > One bpf prog will prevent different bpf prog to run since they share current task. > > Not a good idea either. > > That shouldn't happen. test_recursion_try_acquire() uses a > per-task_struct value. One single task_struct can't run in parallel, > right? > Note that the bpf program running in softirq or irq context won't be > prevented by it. > IIUC, the bpf program should run in serial in one single task, right? > That said, one bpf program can only run after another bpf program > finished in the same task? bpf progs can nest in the same task.
On Tue, Apr 18, 2023 at 11:38 PM Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote: > > On Mon, Apr 17, 2023 at 6:49 PM Yafang Shao <laoar.shao@gmail.com> wrote: > > > > On Tue, Apr 18, 2023 at 4:15 AM Alexei Starovoitov > > <alexei.starovoitov@gmail.com> wrote: > > > > > > On Mon, Apr 17, 2023 at 03:47:36PM +0000, Yafang Shao wrote: > > > > diff --git a/kernel/bpf/trampoline.c b/kernel/bpf/trampoline.c > > > > index f61d513..3df39a5 100644 > > > > --- a/kernel/bpf/trampoline.c > > > > +++ b/kernel/bpf/trampoline.c > > > > @@ -842,15 +842,21 @@ static __always_inline u64 notrace bpf_prog_start_time(void) > > > > static u64 notrace __bpf_prog_enter_recur(struct bpf_prog *prog, struct bpf_tramp_run_ctx *run_ctx) > > > > __acquires(RCU) > > > > { > > > > - rcu_read_lock(); > > > > - migrate_disable(); > > > > - > > > > - run_ctx->saved_run_ctx = bpf_set_run_ctx(&run_ctx->run_ctx); > > > > + int bit; > > > > > > > > - if (unlikely(this_cpu_inc_return(*(prog->active)) != 1)) { > > > > + rcu_read_lock(); > > > > + bit = test_recursion_try_acquire(_THIS_IP_, _RET_IP_); > > > > > > and bpf will prevent ftrace to run and vice versa. > > > Not a good idea. > > > > > > One bpf prog will prevent different bpf prog to run since they share current task. > > > Not a good idea either. > > > > That shouldn't happen. test_recursion_try_acquire() uses a > > per-task_struct value. One single task_struct can't run in parallel, > > right? > > Note that the bpf program running in softirq or irq context won't be > > prevented by it. > > IIUC, the bpf program should run in serial in one single task, right? > > That said, one bpf program can only run after another bpf program > > finished in the same task? > > bpf progs can nest in the same task. Do you mean the tail_call ?
On Wed, 19 Apr 2023 15:46:34 -0700
Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:

> No. Just one prog at entry into any of the kernel functions
> and another prog at entry of funcs that 1st bpf prog called indirectly.
> Like one prog is tracing networking events while another
> is focusing on mm. They should not conflict.

You mean that you have:

  function start:
    __bpf_prog_enter_recur()
    bpf_program1()
      __bpf_prog_enter_recur()
      bpf_program2();
      __bpf_prog_exit_recur()
    __bpf_prog_exit_recur()

  rest of function

That is, a bpf program can be called within another bpf program between
the prog_enter and prog_exit(), that is in the same context (normal,
softirq, irq, etc)?

The protection is on the trampoline where the bpf program is called.
Not sure how ftrace can stop BPF or BPF stop ftrace, unless bpf is
tracing a ftrace callback, or ftrace is tracing a bpf function.

-- Steve
On Tue, Apr 25, 2023 at 5:40 AM Steven Rostedt <rostedt@goodmis.org> wrote:
>
> On Wed, 19 Apr 2023 15:46:34 -0700
> Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:
>
> > No. Just one prog at entry into any of the kernel functions
> > and another prog at entry of funcs that 1st bpf prog called indirectly.
> > Like one prog is tracing networking events while another
> > is focusing on mm. They should not conflict.
>
> You mean that you have:
>
>   function start:
>     __bpf_prog_enter_recur()
>     bpf_program1()
>       __bpf_prog_enter_recur()
>       bpf_program2();
>       __bpf_prog_exit_recur()
>     __bpf_prog_exit_recur()
>
>   rest of function
>
> That is, a bpf program can be called within another bpf program between
> the prog_enter and prog_exit(), that is in the same context (normal,
> softirq, irq, etc)?
>

Right, that can happen per my verification. Below is a simple bpf
program to verify it.

    struct {
            __uint(type, BPF_MAP_TYPE_LPM_TRIE);
            __type(key, __u64);
            __type(value, __u64);
            __uint(max_entries, 1024);
            __uint(map_flags, BPF_F_NO_PREALLOC);
    } write_map SEC(".maps");

    __u64 key;

    SEC("fentry/kernel_clone")
    int program1()
    {
            __u64 value = 1;

            bpf_printk("before update");
            // It will call trie_update_elem and thus trigger program2.
            bpf_map_update_elem(&write_map, &key, &value, BPF_ANY);
            __sync_fetch_and_add(&key, 1);
            bpf_printk("after update");
            return 0;
    }

    SEC("fentry/trie_update_elem")
    int program2()
    {
            bpf_printk("trie_update_elem");
            return 0;
    }

The result is as follows,

    kubelet-203203  [018] ....1  9579.862862: __bpf_prog_enter_recur: __bpf_prog_enter_recur
    kubelet-203203  [018] ...11  9579.862869: bpf_trace_printk: before update
    kubelet-203203  [018] ....2  9579.862869: __bpf_prog_enter_recur: __bpf_prog_enter_recur
    kubelet-203203  [018] ...12  9579.862870: bpf_trace_printk: trie_update_elem
    kubelet-203203  [018] ....2  9579.862870: __bpf_prog_exit_recur: __bpf_prog_exit_recur
    kubelet-203203  [018] ...11  9579.862870: bpf_trace_printk: after update
    kubelet-203203  [018] ....1  9579.862871: __bpf_prog_exit_recur: __bpf_prog_exit_recur

Note that we can't trace __bpf_prog_enter_recur and
__bpf_prog_exit_recur, so we have to modify the kernel to print them.

> The protection is on the trampoline where the bpf program is called.
> Not sure how ftrace can stop BPF or BPF stop ftrace, unless bpf is
> tracing a ftrace callback, or ftrace is tracing a bpf function.
>
> -- Steve
>
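For anyone who wants to reproduce the experiment, the two programs above can be loaded with a few lines of libbpf. The sketch below assumes they were compiled to recur_check.bpf.o; the object name and the use of system() to fork a child (and thereby hit kernel_clone) are illustrative additions, not part of the original report.

    // Hypothetical loader for the test programs above (names are illustrative).
    #include <stdio.h>
    #include <stdlib.h>
    #include <bpf/libbpf.h>

    int main(void)
    {
            struct bpf_object *obj;
            struct bpf_program *prog;

            obj = bpf_object__open_file("recur_check.bpf.o", NULL);
            if (!obj)
                    return 1;
            if (bpf_object__load(obj))
                    return 1;

            /* Attach every fentry program found in the object. */
            bpf_object__for_each_program(prog, obj) {
                    if (!bpf_program__attach(prog)) {
                            fprintf(stderr, "failed to attach %s\n",
                                    bpf_program__name(prog));
                            return 1;
                    }
            }

            /* system() forks a child (hitting kernel_clone), then dumps the trace buffer. */
            system("cat /sys/kernel/debug/tracing/trace_pipe");
            return 0;
    }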
On Thu, Apr 27, 2023 at 5:57 PM Yafang Shao <laoar.shao@gmail.com> wrote: > > On Tue, Apr 25, 2023 at 5:40 AM Steven Rostedt <rostedt@goodmis.org> wrote: > > > > On Wed, 19 Apr 2023 15:46:34 -0700 > > Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote: > > > > > No. Just one prog at entry into any of the kernel functions > > > and another prog at entry of funcs that 1st bpf prog called indirectly. > > > Like one prog is tracing networking events while another > > > is focusing on mm. They should not conflict. > > > > You mean that you have: > > > > function start: > > __bpf_prog_enter_recur() > > bpf_program1() > > __bpf_prog_enter_recur() > > bpf_program2(); > > __bpf_prog_exit_recur() > > __bpf_prog_exit_recur() > > > > rest of function > > > > That is, a bpf program can be called within another bpf pogram between > > the prog_enter and prog_exit(), that is in the same context (normal, > > softirq, irq, etc)? > > > > Right, that can happen per my verification. Below is a simple bpf > program to verify it. > > struct { > __uint(type, BPF_MAP_TYPE_LPM_TRIE); > __type(key, __u64); > __type(value, __u64); > __uint(max_entries, 1024); > __uint(map_flags, BPF_F_NO_PREALLOC); > } write_map SEC(".maps"); > > __u64 key; > > SEC("fentry/kernel_clone") > int program1() > { > __u64 value = 1; > > bpf_printk("before update"); > // It will call trie_update_elem and thus trigger program2. > bpf_map_update_elem(&write_map, &key, &value, BPF_ANY); > __sync_fetch_and_add(&key, 1); > bpf_printk("after update"); > return 0; > } > > SEC("fentry/trie_update_elem") > int program2() > { > bpf_printk("trie_update_elem"); > return 0; > } > > The result as follows, > > kubelet-203203 [018] ....1 9579.862862: > __bpf_prog_enter_recur: __bpf_prog_enter_recur > kubelet-203203 [018] ...11 9579.862869: bpf_trace_printk: > before update > kubelet-203203 [018] ....2 9579.862869: > __bpf_prog_enter_recur: __bpf_prog_enter_recur > kubelet-203203 [018] ...12 9579.862870: bpf_trace_printk: > trie_update_elem > kubelet-203203 [018] ....2 9579.862870: > __bpf_prog_exit_recur: __bpf_prog_exit_recur > kubelet-203203 [018] ...11 9579.862870: bpf_trace_printk: > after update > kubelet-203203 [018] ....1 9579.862871: > __bpf_prog_exit_recur: __bpf_prog_exit_recur > > Note that we can't trace __bpf_prog_enter_recur and > __bpf_prog_exit_recur, so we have to modify the kernel to print them. > ... However, surprisingly it still works even after this patchset is applied, because the hardirq/softirq flag is set when the program2 is running, see also the flags in the above trace_pipe output. Is that expected ?! I need some time to figure it out, but maybe you have a quick answer... > > The protection is on the trampoline where the bpf program is called. > > Not sure how ftrace can stop BPF or BPF stop ftrace, unless bpf is > > tracing a ftrace callback, or ftrace is tracing a bpf function. > > > > -- Steve >
On Thu, Apr 27, 2023 at 8:15 PM Yafang Shao <laoar.shao@gmail.com> wrote: > > On Thu, Apr 27, 2023 at 5:57 PM Yafang Shao <laoar.shao@gmail.com> wrote: > > > > On Tue, Apr 25, 2023 at 5:40 AM Steven Rostedt <rostedt@goodmis.org> wrote: > > > > > > On Wed, 19 Apr 2023 15:46:34 -0700 > > > Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote: > > > > > > > No. Just one prog at entry into any of the kernel functions > > > > and another prog at entry of funcs that 1st bpf prog called indirectly. > > > > Like one prog is tracing networking events while another > > > > is focusing on mm. They should not conflict. > > > > > > You mean that you have: > > > > > > function start: > > > __bpf_prog_enter_recur() > > > bpf_program1() > > > __bpf_prog_enter_recur() > > > bpf_program2(); > > > __bpf_prog_exit_recur() > > > __bpf_prog_exit_recur() > > > > > > rest of function > > > > > > That is, a bpf program can be called within another bpf pogram between > > > the prog_enter and prog_exit(), that is in the same context (normal, > > > softirq, irq, etc)? > > > > > > > Right, that can happen per my verification. Below is a simple bpf > > program to verify it. > > > > struct { > > __uint(type, BPF_MAP_TYPE_LPM_TRIE); > > __type(key, __u64); > > __type(value, __u64); > > __uint(max_entries, 1024); > > __uint(map_flags, BPF_F_NO_PREALLOC); > > } write_map SEC(".maps"); > > > > __u64 key; > > > > SEC("fentry/kernel_clone") > > int program1() > > { > > __u64 value = 1; > > > > bpf_printk("before update"); > > // It will call trie_update_elem and thus trigger program2. > > bpf_map_update_elem(&write_map, &key, &value, BPF_ANY); > > __sync_fetch_and_add(&key, 1); > > bpf_printk("after update"); > > return 0; > > } > > > > SEC("fentry/trie_update_elem") > > int program2() > > { > > bpf_printk("trie_update_elem"); > > return 0; > > } > > > > The result as follows, > > > > kubelet-203203 [018] ....1 9579.862862: > > __bpf_prog_enter_recur: __bpf_prog_enter_recur > > kubelet-203203 [018] ...11 9579.862869: bpf_trace_printk: > > before update > > kubelet-203203 [018] ....2 9579.862869: > > __bpf_prog_enter_recur: __bpf_prog_enter_recur > > kubelet-203203 [018] ...12 9579.862870: bpf_trace_printk: > > trie_update_elem > > kubelet-203203 [018] ....2 9579.862870: > > __bpf_prog_exit_recur: __bpf_prog_exit_recur > > kubelet-203203 [018] ...11 9579.862870: bpf_trace_printk: > > after update > > kubelet-203203 [018] ....1 9579.862871: > > __bpf_prog_exit_recur: __bpf_prog_exit_recur > > > > Note that we can't trace __bpf_prog_enter_recur and > > __bpf_prog_exit_recur, so we have to modify the kernel to print them. > > > > ... However, surprisingly it still works even after this patchset is > applied, because the hardirq/softirq flag is set when the program2 is > running, see also the flags in the above trace_pipe output. Is that > expected ?! > I need some time to figure it out, but maybe you have a quick answer... Answer it by myself, that is because of the allowing-one-single-recursion rule. I misread the trace flags before. Sorry about the noise.
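The "one single recursion is allowed" rule that resolves the confusion here comes from the TRACE_CTX_TRANSITION workaround mentioned in the commit message. Roughly, the check looks like the sketch below, paraphrased from include/linux/trace_recursion.h and simplified from memory rather than quoted:

    /* Paraphrase of the core of trace_test_and_set_recursion(); simplified,
     * not the exact kernel source.
     */
    static __always_inline int recursion_try_acquire(unsigned long ip,
                                                     unsigned long pip, int start)
    {
            unsigned int val = READ_ONCE(current->trace_recursion);
            int bit = trace_get_context_bit() + start;

            if (unlikely(val & (1 << bit))) {
                    /*
                     * An interrupt can trace again before preempt_count reflects
                     * the new context, which would look like a false recursion.
                     * One extra level is therefore let through via the transition
                     * bit; only a second repeat in the same context is rejected.
                     */
                    bit = TRACE_CTX_TRANSITION + start;
                    if (val & (1 << bit))
                            return -1;
            }

            current->trace_recursion = val | (1 << bit);
            barrier();
            return bit;
    }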
On Mon, 17 Apr 2023 15:47:36 +0000
Yafang Shao <laoar.shao@gmail.com> wrote:

> diff --git a/kernel/bpf/trampoline.c b/kernel/bpf/trampoline.c
> index f61d513..3df39a5 100644
> --- a/kernel/bpf/trampoline.c
> +++ b/kernel/bpf/trampoline.c
> @@ -842,15 +842,21 @@ static __always_inline u64 notrace bpf_prog_start_time(void)
>  static u64 notrace __bpf_prog_enter_recur(struct bpf_prog *prog, struct bpf_tramp_run_ctx *run_ctx)
>  	__acquires(RCU)

Because __bpf_prog_enter_recur() and __bpf_prog_exit_recur() can
legitimately nest (as you pointed out later in the thread), I think my
original plan is the way to go.

>  {
> -	rcu_read_lock();
> -	migrate_disable();
> -
> -	run_ctx->saved_run_ctx = bpf_set_run_ctx(&run_ctx->run_ctx);
> +	int bit;
>
> -	if (unlikely(this_cpu_inc_return(*(prog->active)) != 1)) {
> +	rcu_read_lock();
> +	bit = test_recursion_try_acquire(_THIS_IP_, _RET_IP_);
> +	run_ctx->recursion_bit = bit;
> +	if (bit < 0) {
> +		preempt_disable_notrace();
> 		bpf_prog_inc_misses_counter(prog);
> +		preempt_enable_notrace();
> 		return 0;
> 	}
> +
> +	migrate_disable();

Just encompass the migrate_disable/enable() with the recursion protection.

That is, here add:

	test_recursion_release(recursion_bit);

No need to save it in the run_ctx, as you can use a local variable.

As I mentioned, if it passes when checking migrate_disable() it will also
pass when checking around migrate_enable() so the two will still be paired
properly, even if only the migrate_enable() starts recursing.

	bit = test_recursion_try_acquire() // OK
	if (bit < 0)
		return;
	migrate_disable();
	test_recursion_release(bit);

	[..]

	bit = test_recursion_try_acquire() // OK
	migrate_enable() // traced and recurses...

	  bit = test_recursion_try_acquire() // fails
	  if (bit < 0)
		return; // returns here
	  migrate_disable() // does not get called.

The recursion around migrate_disable/enable() is needed because it's done
before other checks. You can't attach the test_recursion logic to the
__bpf_prog_enter/exit() routines, because those can legitimately recurse.

-- Steve

> +
> +	run_ctx->saved_run_ctx = bpf_set_run_ctx(&run_ctx->run_ctx);
> 	return bpf_prog_start_time();
> }
On Thu, Apr 27, 2023 at 9:26 PM Steven Rostedt <rostedt@goodmis.org> wrote: > > On Mon, 17 Apr 2023 15:47:36 +0000 > Yafang Shao <laoar.shao@gmail.com> wrote: > > > diff --git a/kernel/bpf/trampoline.c b/kernel/bpf/trampoline.c > > index f61d513..3df39a5 100644 > > --- a/kernel/bpf/trampoline.c > > +++ b/kernel/bpf/trampoline.c > > @@ -842,15 +842,21 @@ static __always_inline u64 notrace bpf_prog_start_time(void) > > static u64 notrace __bpf_prog_enter_recur(struct bpf_prog *prog, struct bpf_tramp_run_ctx *run_ctx) > > __acquires(RCU) > > Because __bpf_prog_enter_recur() and __bpf_prog_exit_recur() can > legitimately nest (as you pointed out later in the thread), I think my > original plan is the way to go. > > > > > { > > - rcu_read_lock(); > > - migrate_disable(); > > - > > - run_ctx->saved_run_ctx = bpf_set_run_ctx(&run_ctx->run_ctx); > > + int bit; > > > > - if (unlikely(this_cpu_inc_return(*(prog->active)) != 1)) { > > + rcu_read_lock(); > > + bit = test_recursion_try_acquire(_THIS_IP_, _RET_IP_); > > + run_ctx->recursion_bit = bit; > > + if (bit < 0) { > > + preempt_disable_notrace(); > > bpf_prog_inc_misses_counter(prog); > > + preempt_enable_notrace(); > > return 0; > > } > > + > > + migrate_disable(); > > Just encompass the migrate_disable/enable() with the recursion protection. > > That is, here add: > > test_recursion_release(recursion_bit); > > No need to save it in the run_ctx, as you can use a local variable. > > As I mentioned, if it passes when checking migrate_disable() it will also > pass when checking around migrate_enable() so the two will still be paired > properly, even if only the migrate_enable() starts recursing. > > > bit = test_recursion_try_acquire() // OK > if (bit < 0) > return; > migrate_disable(); > test_recursion_release(bit); > > [..] > > bit = test_recursion_try_acquire() // OK > migrate_enable() // traced and recurses... > > bit = test_recursion_try_acquire() // fails > if (bit < 0) > return; // returns here > migrate_disable() // does not get called. > > The recursion around migrate_disable/enable() is needed because it's done > before other checks. You can't attach the test_recursion logic to the > __bpf_prog_enter/exit() routines, because those can legitimately recurse. > IIUC, the acquire/release pair works as follows, test_recursion_try_acquire [ protection area ] test_recursion_release After release, there will be no protection, and thus it will fail the tools/testing/selftests/bpf/progs/recursion.c[1] test case, because the recursion occurs in the bpf_prog_run() itself, __bpf_prog_enter test_recursion_try_acquire [...] test_recursion_release // no protection after the release bpf_prog_run() bpf_prog_run() // the recursion can't be prevented. __bpf_prog_enter test_recursion_try_acquire [...] test_recursion_release bpf_prog_run() bpf_prog_run() __bpf_prog_enter test_recursion_try_acquire [...] test_recursion_release bpf_prog_run() [ And so on ... ] [1]. https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git/tree/tools/testing/selftests/bpf/progs/recursion.c#n38
On Thu, 27 Apr 2023 22:22:22 +0800 Yafang Shao <laoar.shao@gmail.com> wrote: > IIUC, the acquire/release pair works as follows, > > test_recursion_try_acquire > [ protection area ] > test_recursion_release > > After release, there will be no protection, and thus it will fail the > tools/testing/selftests/bpf/progs/recursion.c[1] test case, because > the recursion occurs in the bpf_prog_run() itself, But bpf programs are allowed to recurs. Hence, you need separate logic to detect that. The test_recursion_*() code is for cases that are not allowed to recurs. > > __bpf_prog_enter > test_recursion_try_acquire > [...] > test_recursion_release > // no protection after the release > bpf_prog_run() > bpf_prog_run() // the recursion can't be prevented. But I thought you can run a bpf_prog from another bpf_prog. So you don't want to prevent it. You need other logic to detect if it was not suppose to recurs. -- Steve > __bpf_prog_enter > test_recursion_try_acquire > [...] > test_recursion_release > bpf_prog_run() > bpf_prog_run() > __bpf_prog_enter > test_recursion_try_acquire > [...] > test_recursion_release > bpf_prog_run() > [ And so on ... ] > > [1]. https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git/tree/tools/testing/selftests/bpf/progs/recursion.c#n38
On Thu, Apr 27, 2023 at 11:18 PM Steven Rostedt <rostedt@goodmis.org> wrote: > > On Thu, 27 Apr 2023 22:22:22 +0800 > Yafang Shao <laoar.shao@gmail.com> wrote: > > > IIUC, the acquire/release pair works as follows, > > > > test_recursion_try_acquire > > [ protection area ] > > test_recursion_release > > > > After release, there will be no protection, and thus it will fail the > > tools/testing/selftests/bpf/progs/recursion.c[1] test case, because > > the recursion occurs in the bpf_prog_run() itself, > > But bpf programs are allowed to recurs. Hence, you need separate logic to > detect that. The test_recursion_*() code is for cases that are not allowed > to recurs. > Agreed. > > > > __bpf_prog_enter > > test_recursion_try_acquire > > [...] > > test_recursion_release > > // no protection after the release > > bpf_prog_run() > > bpf_prog_run() // the recursion can't be prevented. > > But I thought you can run a bpf_prog from another bpf_prog. So you don't > want to prevent it. You need other logic to detect if it was not suppose to > recurs. > If so, we have to keep the prog->active to prevent it, then I'm not sure if it is worth adding test_recursion_*().
On Thu, 27 Apr 2023 23:23:31 +0800 Yafang Shao <laoar.shao@gmail.com> wrote: > > But I thought you can run a bpf_prog from another bpf_prog. So you don't > > want to prevent it. You need other logic to detect if it was not suppose to > > recurs. > > > > If so, we have to keep the prog->active to prevent it, then I'm not > sure if it is worth adding test_recursion_*(). I thought that the whole point of this exercise was because the migrate_disable() itself could be traced (or call something that can), and that's outside of prog->active protection. Which the test_recursion_*() code was created for. -- Steve
On Thu, Apr 27, 2023 at 8:36 AM Steven Rostedt <rostedt@goodmis.org> wrote: > > On Thu, 27 Apr 2023 23:23:31 +0800 > Yafang Shao <laoar.shao@gmail.com> wrote: > > > > But I thought you can run a bpf_prog from another bpf_prog. So you don't > > > want to prevent it. You need other logic to detect if it was not suppose to > > > recurs. > > > > > > > If so, we have to keep the prog->active to prevent it, then I'm not > > sure if it is worth adding test_recursion_*(). > > I thought that the whole point of this exercise was because the > migrate_disable() itself could be traced (or call something that can), and > that's outside of prog->active protection. Which the test_recursion_*() > code was created for. Not sure where did this come from. migrate_enable/disable were added to deny list back in 2021.
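For context, the deny list being referred to is the BTF ID set the verifier consults before letting fentry/fexit programs attach. Abbreviated sketch of it from kernel/bpf/verifier.c, listed from memory; the exact contents vary by kernel version:

    /* check_attach_btf_id() rejects fentry/fexit attachment to anything
     * in this set.
     */
    BTF_SET_START(btf_id_deny)
    BTF_ID_UNUSED
    #ifdef CONFIG_SMP
    BTF_ID(func, migrate_disable)
    BTF_ID(func, migrate_enable)
    #endif
    #if !defined CONFIG_PREEMPT_RCU && !defined CONFIG_TINY_RCU
    BTF_ID(func, rcu_read_unlock_strict)
    #endif
    BTF_SET_END(btf_id_deny)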
On Thu, Apr 27, 2023 at 11:39 PM Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote: > > On Thu, Apr 27, 2023 at 8:36 AM Steven Rostedt <rostedt@goodmis.org> wrote: > > > > On Thu, 27 Apr 2023 23:23:31 +0800 > > Yafang Shao <laoar.shao@gmail.com> wrote: > > > > > > But I thought you can run a bpf_prog from another bpf_prog. So you don't > > > > want to prevent it. You need other logic to detect if it was not suppose to > > > > recurs. > > > > > > > > > > If so, we have to keep the prog->active to prevent it, then I'm not > > > sure if it is worth adding test_recursion_*(). > > > > I thought that the whole point of this exercise was because the > > migrate_disable() itself could be traced (or call something that can), and > > that's outside of prog->active protection. Which the test_recursion_*() > > code was created for. > > Not sure where did this come from. > migrate_enable/disable were added to deny list back in 2021. Hi Alexei, Don't be uneasy. It is not good to play word games. What Steven really meant is the preempt_count_{sub, add}. Anyway thanks Steven for the help with this exercise.
On Thu, 27 Apr 2023 23:43:35 +0800 Yafang Shao <laoar.shao@gmail.com> wrote: > > > I thought that the whole point of this exercise was because the > > > migrate_disable() itself could be traced (or call something that can), and > > > that's outside of prog->active protection. Which the test_recursion_*() > > > code was created for. > > > > Not sure where did this come from. > > migrate_enable/disable were added to deny list back in 2021. > > Hi Alexei, > > Don't be uneasy. It is not good to play word games. > What Steven really meant is the preempt_count_{sub, add}. > Anyway thanks Steven for the help with this exercise. Right, it was the "(or call something that can)" part that this came from. As Yafang said, migrate_disable() calls preempt_count_add() (on some configs) which is traced by ftrace, and thus traced by bpf. Or was that added to the deny list? I think that was one of the solutions as well. -- Steve
diff --git a/include/linux/bpf.h b/include/linux/bpf.h index 18b592f..c42ff90 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -1467,7 +1467,6 @@ struct bpf_prog { u32 jited_len; /* Size of jited insns in bytes */ u8 tag[BPF_TAG_SIZE]; struct bpf_prog_stats __percpu *stats; - int __percpu *active; unsigned int (*bpf_func)(const void *ctx, const struct bpf_insn *insn); struct bpf_prog_aux *aux; /* Auxiliary fields */ @@ -1813,6 +1812,7 @@ struct bpf_tramp_run_ctx { struct bpf_run_ctx run_ctx; u64 bpf_cookie; struct bpf_run_ctx *saved_run_ctx; + int recursion_bit; }; static inline struct bpf_run_ctx *bpf_set_run_ctx(struct bpf_run_ctx *new_ctx) diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c index 7421487..0942ab2 100644 --- a/kernel/bpf/core.c +++ b/kernel/bpf/core.c @@ -103,12 +103,6 @@ struct bpf_prog *bpf_prog_alloc_no_stats(unsigned int size, gfp_t gfp_extra_flag vfree(fp); return NULL; } - fp->active = alloc_percpu_gfp(int, bpf_memcg_flags(GFP_KERNEL | gfp_extra_flags)); - if (!fp->active) { - vfree(fp); - kfree(aux); - return NULL; - } fp->pages = size / PAGE_SIZE; fp->aux = aux; @@ -138,7 +132,6 @@ struct bpf_prog *bpf_prog_alloc(unsigned int size, gfp_t gfp_extra_flags) prog->stats = alloc_percpu_gfp(struct bpf_prog_stats, gfp_flags); if (!prog->stats) { - free_percpu(prog->active); kfree(prog->aux); vfree(prog); return NULL; @@ -256,7 +249,6 @@ struct bpf_prog *bpf_prog_realloc(struct bpf_prog *fp_old, unsigned int size, */ fp_old->aux = NULL; fp_old->stats = NULL; - fp_old->active = NULL; __bpf_prog_free(fp_old); } @@ -272,7 +264,6 @@ void __bpf_prog_free(struct bpf_prog *fp) kfree(fp->aux); } free_percpu(fp->stats); - free_percpu(fp->active); vfree(fp); } @@ -1385,7 +1376,6 @@ static void bpf_prog_clone_free(struct bpf_prog *fp) */ fp->aux = NULL; fp->stats = NULL; - fp->active = NULL; __bpf_prog_free(fp); } diff --git a/kernel/bpf/trampoline.c b/kernel/bpf/trampoline.c index f61d513..3df39a5 100644 --- a/kernel/bpf/trampoline.c +++ b/kernel/bpf/trampoline.c @@ -842,15 +842,21 @@ static __always_inline u64 notrace bpf_prog_start_time(void) static u64 notrace __bpf_prog_enter_recur(struct bpf_prog *prog, struct bpf_tramp_run_ctx *run_ctx) __acquires(RCU) { - rcu_read_lock(); - migrate_disable(); - - run_ctx->saved_run_ctx = bpf_set_run_ctx(&run_ctx->run_ctx); + int bit; - if (unlikely(this_cpu_inc_return(*(prog->active)) != 1)) { + rcu_read_lock(); + bit = test_recursion_try_acquire(_THIS_IP_, _RET_IP_); + run_ctx->recursion_bit = bit; + if (bit < 0) { + preempt_disable_notrace(); bpf_prog_inc_misses_counter(prog); + preempt_enable_notrace(); return 0; } + + migrate_disable(); + + run_ctx->saved_run_ctx = bpf_set_run_ctx(&run_ctx->run_ctx); return bpf_prog_start_time(); } @@ -880,11 +886,16 @@ static void notrace __bpf_prog_exit_recur(struct bpf_prog *prog, u64 start, struct bpf_tramp_run_ctx *run_ctx) __releases(RCU) { + if (run_ctx->recursion_bit < 0) + goto out; + bpf_reset_run_ctx(run_ctx->saved_run_ctx); update_prog_stats(prog, start); - this_cpu_dec(*(prog->active)); migrate_enable(); + test_recursion_release(run_ctx->recursion_bit); + +out: rcu_read_unlock(); } @@ -916,15 +927,21 @@ static void notrace __bpf_prog_exit_lsm_cgroup(struct bpf_prog *prog, u64 start, u64 notrace __bpf_prog_enter_sleepable_recur(struct bpf_prog *prog, struct bpf_tramp_run_ctx *run_ctx) { - rcu_read_lock_trace(); - migrate_disable(); - might_fault(); + int bit; - if (unlikely(this_cpu_inc_return(*(prog->active)) != 1)) { + rcu_read_lock_trace(); + bit = 
test_recursion_try_acquire(_THIS_IP_, _RET_IP_); + run_ctx->recursion_bit = bit; + if (bit < 0) { + preempt_disable_notrace(); bpf_prog_inc_misses_counter(prog); + preempt_enable_notrace(); return 0; } + migrate_disable(); + might_fault(); + run_ctx->saved_run_ctx = bpf_set_run_ctx(&run_ctx->run_ctx); return bpf_prog_start_time(); @@ -933,11 +950,16 @@ u64 notrace __bpf_prog_enter_sleepable_recur(struct bpf_prog *prog, void notrace __bpf_prog_exit_sleepable_recur(struct bpf_prog *prog, u64 start, struct bpf_tramp_run_ctx *run_ctx) { + if (run_ctx->recursion_bit < 0) + goto out; + bpf_reset_run_ctx(run_ctx->saved_run_ctx); update_prog_stats(prog, start); - this_cpu_dec(*(prog->active)); migrate_enable(); + test_recursion_release(run_ctx->recursion_bit); + +out: rcu_read_unlock_trace(); } diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c index bcf91bc..bb9a4c9 100644 --- a/kernel/trace/bpf_trace.c +++ b/kernel/trace/bpf_trace.c @@ -2250,16 +2250,18 @@ void bpf_put_raw_tracepoint(struct bpf_raw_event_map *btp) static __always_inline void __bpf_trace_run(struct bpf_prog *prog, u64 *args) { - cant_sleep(); - if (unlikely(this_cpu_inc_return(*(prog->active)) != 1)) { + int bit; + + bit = test_recursion_try_acquire(_THIS_IP_, _RET_IP_); + if (bit < 0) { bpf_prog_inc_misses_counter(prog); - goto out; + return; } + cant_sleep(); rcu_read_lock(); (void) bpf_prog_run(prog, args); rcu_read_unlock(); -out: - this_cpu_dec(*(prog->active)); + test_recursion_release(bit); } #define UNPACK(...) __VA_ARGS__