From patchwork Mon Apr 17 15:47:36 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yafang Shao X-Patchwork-Id: 84326 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:b0ea:0:b0:3b6:4342:cba0 with SMTP id b10csp2222593vqo; Mon, 17 Apr 2023 08:50:30 -0700 (PDT) X-Google-Smtp-Source: AKy350ZeDGJVsqTkAEjEQXXT137C3bQnY+suHyGbyEKwfx1wxPwZWXkyA/NBxBTG3Yl/amSl0oZ+ X-Received: by 2002:a05:6a00:1916:b0:63b:22e7:6ee6 with SMTP id y22-20020a056a00191600b0063b22e76ee6mr21228971pfi.31.1681746630065; Mon, 17 Apr 2023 08:50:30 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1681746630; cv=none; d=google.com; s=arc-20160816; b=SvVHyNiM8wIfXwhfJcMWQ7a/MOAt7WtwzxIRHkqFfSaOlomkJ3bOL0DJxnIif7tAsL jfdJ6jWGGUEDY74B/93oT7zg3iiNqxxOU3sFodQVIrW6cUS7VWvSFkp17HYHPIU5NBrb ITJqQBoInMqPW4bPp5t7y3a+ozEU+ofas4mciX7vI51i96gJZAliZdqBp7Q6WOtm5Owy ZEaGM6JpSdqsq012ENx8AT8aTwnUUIt7xCUX9ouu06GwGPyBiLcNPOu1xOMGMtObpqS0 d0ogKQh9NzW279b0hy5agK1ucfuSMMJ4edPtyeHoG/98MMVAOmjyFqmtG9+0vNTedvEY lJDw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=+j/CiSSG6ZrIbfi+5KIf4ADOVLoZxvO0xQE+FQMboro=; b=TGSy+sp21xnJcZ4fdxDYhZsGj3QYLdBvpLYdMKNrOQQqbasMKynwgPtAq/ZSmpuhwY d7//DqF6dD5odoboHhdRXx18zK+ur0Xy+++YpRnhya1eotl5doCwqIVNQw6owH0JZvJG fBzf2fEUSXJYFM9fxEGYZ2w5VZCiRyodhNyPFPZHntmn2q2UB+y4BefUfrcSmBex/Cxo T7d8jLd4zay7XxjrlvdahyYRymrvwc6Tbwqk4G3j7BtzHjCF+sxZ9wX7whSexh8G7cxA VS5tCChYGfBRp26t+AjPuJm/L3HXZxuFzLxGaPNBHPf0f6ncJbYKIEBp0QGk3AsfUr9f wLhA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@gmail.com header.s=20221208 header.b=L4ucy6fo; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id s15-20020aa78bcf000000b0063d2aa67a8esi1280561pfd.325.2023.04.17.08.50.13; Mon, 17 Apr 2023 08:50:30 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@gmail.com header.s=20221208 header.b=L4ucy6fo; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230043AbjDQPse (ORCPT + 99 others); Mon, 17 Apr 2023 11:48:34 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:43328 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230294AbjDQPs1 (ORCPT ); Mon, 17 Apr 2023 11:48:27 -0400 Received: from mail-pj1-x1034.google.com (mail-pj1-x1034.google.com [IPv6:2607:f8b0:4864:20::1034]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id E47A0E4A; Mon, 17 Apr 2023 08:48:14 -0700 (PDT) Received: by mail-pj1-x1034.google.com with SMTP id b2-20020a17090a6e0200b002470b249e59so15807982pjk.4; Mon, 17 Apr 2023 08:48:14 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20221208; t=1681746494; x=1684338494; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=+j/CiSSG6ZrIbfi+5KIf4ADOVLoZxvO0xQE+FQMboro=; b=L4ucy6foP1tgbPdfjmtcWLR9O6NvDjMOviPj9FRo9r4eC6YpO97ReM/IZcIQc0avJc IjRzxMtU/7kh0VByS1OveKJdDtKa7X2HuCYDyG3vAjNxjf2oIfQYzVT0zPGIB69vlcOp aokhYgtBOr6meHto8CrxzX9Nw7gqmofgx4yrAyUOHTqiSa6eJPNw2f8dQ8QqRCBaLmXI dHCbZFM/k99HM6NyUbJSNQgUTYTs3+tYRDC2il9otYn+4CHQ2qz3isxJeN3cW172YdsZ p62CaDwsza36iIVv972A48L9ld6dyXKjo6IrxwGdyWfwn8iUGgqBFVMNj8OxMCc9QJUS ANUg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1681746494; x=1684338494; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=+j/CiSSG6ZrIbfi+5KIf4ADOVLoZxvO0xQE+FQMboro=; b=LX3GVxtl5YmtEFj6wJz1fMrytgFbGLb9lvzsi7YEINF2cWh+v8BhUDbZt7F160vxdb MKvsiMtpil96XeoCUR/m8a+Ju+BdacULXN2GwNxAq9cWG27QAt5fdayQ9y3D7clQYYta duIpMHmqz+05sQQ+E8LargQuZc0hnkU/RMLgRUVejdQnr4QUexzvX7jKwa1sAr51B2C1 H9wemTcsKhlO7AOaoK2MPXkkvAkxzYNJmMntDLaqlmYzCTjbxY5ddoquWXvdtO8qIC50 xTSVj+FbxMaCK+WiqJy5+QRxT0SapbT3QYAoW+UCIdooQbc0NoarbiI15uZDk0epx8OA fLXA== X-Gm-Message-State: AAQBX9eeNdUXB8okM34nPpgAcobN2/fSAC3rO7dW5wrhZgIGeZE7fnb1 a0Q5FTIWSEe1fu7uo72OnEk= X-Received: by 2002:a17:903:645:b0:1a6:46f2:4365 with SMTP id kh5-20020a170903064500b001a646f24365mr10744242plb.30.1681746493962; Mon, 17 Apr 2023 08:48:13 -0700 (PDT) Received: from vultr.guest ([2401:c080:3800:263c:5400:4ff:fe66:d27f]) by smtp.gmail.com with ESMTPSA id jj17-20020a170903049100b001a6b308fcaesm4437513plb.153.2023.04.17.08.48.10 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 17 Apr 2023 08:48:13 -0700 (PDT) From: Yafang Shao To: ast@kernel.org, daniel@iogearbox.net, andrii@kernel.org, kafai@fb.com, songliubraving@fb.com, yhs@fb.com, john.fastabend@gmail.com, kpsingh@kernel.org, sdf@google.com, haoluo@google.com, jolsa@kernel.org, rostedt@goodmis.org, mhiramat@kernel.org Cc: bpf@vger.kernel.org, linux-trace-kernel@vger.kernel.org, linux-kernel@vger.kernel.org, Yafang Shao Subject: [PATCH bpf-next 5/6] bpf: Improve tracing recursion prevention mechanism Date: Mon, 17 Apr 2023 15:47:36 +0000 Message-Id: <20230417154737.12740-6-laoar.shao@gmail.com> X-Mailer: git-send-email 2.39.1 In-Reply-To: <20230417154737.12740-1-laoar.shao@gmail.com> References: <20230417154737.12740-1-laoar.shao@gmail.com> MIME-Version: 1.0 X-Spam-Status: No, score=-2.1 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,FREEMAIL_FROM, RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1763439154287565431?= X-GMAIL-MSGID: =?utf-8?q?1763439154287565431?= Currently we use prog->active to prevent tracing recursion, but it has some downsides, - It can't identify different contexts That said, if a process context is interrupted by a irq context and the irq context runs the same code path, it will be considered as recursion. For example, normal: this_cpu_inc_return(*(prog->active)) == 1 <- OK irq: this_cpu_inc_return(*(prog->active)) == 1 <- FAIL! [ Considered as recusion ] - It has to maintain a percpu area A percpu area will be allocated for each prog when the prog is loaded and be freed when the prog is destroyed. Let's replace it with the generic tracing recursion prevention mechanism, which can work fine with anything. In the above example, the irq context won't be considered as recursion again, normal: test_recursion_try_acquire() <- OK softirq: test_recursion_try_acquire() <- OK irq: test_recursion_try_acquire() <- OK Note that, currently one single recursion in process context is allowed due to the TRACE_CTX_TRANSITION workaround, which can be fixed in the future. That said, below behavior is expected currently, normal: test_recursion_try_acquire() <- OK [ recursion happens ] <- one single recursion is allowed test_recursion_try_acquire() <- OK [ recursion happens ] test_recursion_try_acquire() <- RECURSION! Signed-off-by: Yafang Shao --- include/linux/bpf.h | 2 +- kernel/bpf/core.c | 10 ---------- kernel/bpf/trampoline.c | 44 +++++++++++++++++++++++++++++++++----------- kernel/trace/bpf_trace.c | 12 +++++++----- 4 files changed, 41 insertions(+), 27 deletions(-) diff --git a/include/linux/bpf.h b/include/linux/bpf.h index 18b592f..c42ff90 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -1467,7 +1467,6 @@ struct bpf_prog { u32 jited_len; /* Size of jited insns in bytes */ u8 tag[BPF_TAG_SIZE]; struct bpf_prog_stats __percpu *stats; - int __percpu *active; unsigned int (*bpf_func)(const void *ctx, const struct bpf_insn *insn); struct bpf_prog_aux *aux; /* Auxiliary fields */ @@ -1813,6 +1812,7 @@ struct bpf_tramp_run_ctx { struct bpf_run_ctx run_ctx; u64 bpf_cookie; struct bpf_run_ctx *saved_run_ctx; + int recursion_bit; }; static inline struct bpf_run_ctx *bpf_set_run_ctx(struct bpf_run_ctx *new_ctx) diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c index 7421487..0942ab2 100644 --- a/kernel/bpf/core.c +++ b/kernel/bpf/core.c @@ -103,12 +103,6 @@ struct bpf_prog *bpf_prog_alloc_no_stats(unsigned int size, gfp_t gfp_extra_flag vfree(fp); return NULL; } - fp->active = alloc_percpu_gfp(int, bpf_memcg_flags(GFP_KERNEL | gfp_extra_flags)); - if (!fp->active) { - vfree(fp); - kfree(aux); - return NULL; - } fp->pages = size / PAGE_SIZE; fp->aux = aux; @@ -138,7 +132,6 @@ struct bpf_prog *bpf_prog_alloc(unsigned int size, gfp_t gfp_extra_flags) prog->stats = alloc_percpu_gfp(struct bpf_prog_stats, gfp_flags); if (!prog->stats) { - free_percpu(prog->active); kfree(prog->aux); vfree(prog); return NULL; @@ -256,7 +249,6 @@ struct bpf_prog *bpf_prog_realloc(struct bpf_prog *fp_old, unsigned int size, */ fp_old->aux = NULL; fp_old->stats = NULL; - fp_old->active = NULL; __bpf_prog_free(fp_old); } @@ -272,7 +264,6 @@ void __bpf_prog_free(struct bpf_prog *fp) kfree(fp->aux); } free_percpu(fp->stats); - free_percpu(fp->active); vfree(fp); } @@ -1385,7 +1376,6 @@ static void bpf_prog_clone_free(struct bpf_prog *fp) */ fp->aux = NULL; fp->stats = NULL; - fp->active = NULL; __bpf_prog_free(fp); } diff --git a/kernel/bpf/trampoline.c b/kernel/bpf/trampoline.c index f61d513..3df39a5 100644 --- a/kernel/bpf/trampoline.c +++ b/kernel/bpf/trampoline.c @@ -842,15 +842,21 @@ static __always_inline u64 notrace bpf_prog_start_time(void) static u64 notrace __bpf_prog_enter_recur(struct bpf_prog *prog, struct bpf_tramp_run_ctx *run_ctx) __acquires(RCU) { - rcu_read_lock(); - migrate_disable(); - - run_ctx->saved_run_ctx = bpf_set_run_ctx(&run_ctx->run_ctx); + int bit; - if (unlikely(this_cpu_inc_return(*(prog->active)) != 1)) { + rcu_read_lock(); + bit = test_recursion_try_acquire(_THIS_IP_, _RET_IP_); + run_ctx->recursion_bit = bit; + if (bit < 0) { + preempt_disable_notrace(); bpf_prog_inc_misses_counter(prog); + preempt_enable_notrace(); return 0; } + + migrate_disable(); + + run_ctx->saved_run_ctx = bpf_set_run_ctx(&run_ctx->run_ctx); return bpf_prog_start_time(); } @@ -880,11 +886,16 @@ static void notrace __bpf_prog_exit_recur(struct bpf_prog *prog, u64 start, struct bpf_tramp_run_ctx *run_ctx) __releases(RCU) { + if (run_ctx->recursion_bit < 0) + goto out; + bpf_reset_run_ctx(run_ctx->saved_run_ctx); update_prog_stats(prog, start); - this_cpu_dec(*(prog->active)); migrate_enable(); + test_recursion_release(run_ctx->recursion_bit); + +out: rcu_read_unlock(); } @@ -916,15 +927,21 @@ static void notrace __bpf_prog_exit_lsm_cgroup(struct bpf_prog *prog, u64 start, u64 notrace __bpf_prog_enter_sleepable_recur(struct bpf_prog *prog, struct bpf_tramp_run_ctx *run_ctx) { - rcu_read_lock_trace(); - migrate_disable(); - might_fault(); + int bit; - if (unlikely(this_cpu_inc_return(*(prog->active)) != 1)) { + rcu_read_lock_trace(); + bit = test_recursion_try_acquire(_THIS_IP_, _RET_IP_); + run_ctx->recursion_bit = bit; + if (bit < 0) { + preempt_disable_notrace(); bpf_prog_inc_misses_counter(prog); + preempt_enable_notrace(); return 0; } + migrate_disable(); + might_fault(); + run_ctx->saved_run_ctx = bpf_set_run_ctx(&run_ctx->run_ctx); return bpf_prog_start_time(); @@ -933,11 +950,16 @@ u64 notrace __bpf_prog_enter_sleepable_recur(struct bpf_prog *prog, void notrace __bpf_prog_exit_sleepable_recur(struct bpf_prog *prog, u64 start, struct bpf_tramp_run_ctx *run_ctx) { + if (run_ctx->recursion_bit < 0) + goto out; + bpf_reset_run_ctx(run_ctx->saved_run_ctx); update_prog_stats(prog, start); - this_cpu_dec(*(prog->active)); migrate_enable(); + test_recursion_release(run_ctx->recursion_bit); + +out: rcu_read_unlock_trace(); } diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c index bcf91bc..bb9a4c9 100644 --- a/kernel/trace/bpf_trace.c +++ b/kernel/trace/bpf_trace.c @@ -2250,16 +2250,18 @@ void bpf_put_raw_tracepoint(struct bpf_raw_event_map *btp) static __always_inline void __bpf_trace_run(struct bpf_prog *prog, u64 *args) { - cant_sleep(); - if (unlikely(this_cpu_inc_return(*(prog->active)) != 1)) { + int bit; + + bit = test_recursion_try_acquire(_THIS_IP_, _RET_IP_); + if (bit < 0) { bpf_prog_inc_misses_counter(prog); - goto out; + return; } + cant_sleep(); rcu_read_lock(); (void) bpf_prog_run(prog, args); rcu_read_unlock(); -out: - this_cpu_dec(*(prog->active)); + test_recursion_release(bit); } #define UNPACK(...) __VA_ARGS__