[bpf-next,v2,5/6] bpf: teach the verifier to enforce css_iter and process_iter in RCU CS
Message ID: 20230912070149.969939-6-zhouchuyi@bytedance.com
State: New
From: Chuyi Zhou <zhouchuyi@bytedance.com>
To: bpf@vger.kernel.org
Cc: ast@kernel.org, daniel@iogearbox.net, andrii@kernel.org, martin.lau@kernel.org, tj@kernel.org, linux-kernel@vger.kernel.org
Date: Tue, 12 Sep 2023 15:01:48 +0800
Series: Add Open-coded process and css iters
Commit Message
Chuyi Zhou
Sept. 12, 2023, 7:01 a.m. UTC
css_iter and process_iter should be used within an RCU critical section.
Specifically, in sleepable progs an explicit bpf_rcu_read_lock() is needed
before using these iters. In normal bpf progs, which run under an implicit
rcu_read_lock(), it is OK to use them directly.

This patch checks whether we are in an RCU critical section before invoking
bpf_iter_process_new and bpf_iter_css_{pre,post}_new in
mark_stack_slots_iter(). If RCU protection is guaranteed, we set
st->type = PTR_TO_STACK | MEM_RCU; otherwise the slot is marked
PTR_UNTRUSTED, and is_iter_reg_valid_init() will reject the iterator.
Signed-off-by: Chuyi Zhou <zhouchuyi@bytedance.com>
---
kernel/bpf/verifier.c | 30 ++++++++++++++++++++++++++++--
1 file changed, 28 insertions(+), 2 deletions(-)
Comments
Hello.

On 2023/9/12 15:01, Chuyi Zhou wrote:
> [commit message quoted above, snipped]

I used the following BPF prog to test this patch:

SEC("?fentry.s/" SYS_PREFIX "sys_getpgid")
int iter_task_for_each_sleep(void *ctx)
{
	struct task_struct *task;
	struct task_struct *cur_task = bpf_get_current_task_btf();

	if (cur_task->pid != target_pid)
		return 0;
	bpf_rcu_read_lock();
	bpf_for_each(process, task) {
		bpf_rcu_read_unlock();
		if (task->pid == target_pid)
			process_cnt += 1;
		bpf_rcu_read_lock();
	}
	bpf_rcu_read_unlock();
	return 0;
}

Unfortunately, it passes the verifier.
Then I added some printk messages before setting/clearing state to help debug:

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index d151e6b43a5f..35f3fa9471a9 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -1200,7 +1200,7 @@ static int mark_stack_slots_iter(struct bpf_verifier_env *env,
 		__mark_reg_known_zero(st);
 		st->type = PTR_TO_STACK; /* we don't have dedicated reg type */
 		if (is_iter_need_rcu(meta)) {
+			printk("mark reg_addr : %px", st);
 			if (in_rcu_cs(env))
 				st->type |= MEM_RCU;
 			else
@@ -11472,8 +11472,8 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
 			return -EINVAL;
 		} else if (rcu_unlock) {
 			bpf_for_each_reg_in_vstate(env->cur_state, state, reg, ({
+				printk("clear reg_addr : %px MEM_RCU : %d PTR_UNTRUSTED : %d\n", reg, reg->type & MEM_RCU, reg->type & PTR_UNTRUSTED);
 				if (reg->type & MEM_RCU) {
-					printk("clear reg addr : %lld", reg);
 					reg->type &= ~(MEM_RCU | PTR_MAYBE_NULL);
 					reg->type |= PTR_UNTRUSTED;
 				}

The dmesg log:

[  393.705324] mark reg_addr : ffff88814e40e200
[  393.706883] clear reg_addr : ffff88814d5f8000 MEM_RCU : 0 PTR_UNTRUSTED : 0
[  393.707353] clear reg_addr : ffff88814d5f8078 MEM_RCU : 0 PTR_UNTRUSTED : 0
[  393.708099] clear reg_addr : ffff88814d5f80f0 MEM_RCU : 0 PTR_UNTRUSTED : 0
....

I didn't see ffff88814e40e200 cleared as expected, because
bpf_for_each_reg_in_vstate didn't find it.

It seems that when we do bpf_rcu_read_unlock() in the middle of the
iteration and want to clear state through bpf_for_each_reg_in_vstate, we
cannot find the reg we previously marked MEM_RCU/PTR_UNTRUSTED in
mark_stack_slots_iter(). I thought maybe the correct answer here is to
operate on the *iter_reg* parameter in mark_stack_slots_iter() directly,
so we can find it in bpf_for_each_reg_in_vstate.
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 6a6827ba7a18..53330ddf2b3c 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -1218,6 +1218,12 @@ static int mark_stack_slots_iter(struct bpf_verifier_env *env,
 		mark_stack_slot_scratched(env, spi - i);
 	}

+	if (is_iter_need_rcu(meta)) {
+		if (in_rcu_cs(env))
+			reg->type |= MEM_RCU;
+		else
+			reg->type |= PTR_UNTRUSTED;
+	}
 	return 0;
 }

@@ -1307,7 +1315,8 @@ static bool is_iter_reg_valid_init(struct bpf_verifier_env *env, struct bpf_reg_
 		if (slot->slot_type[j] != STACK_ITER)
 			return false;
 	}
-
+	if (reg->type & PTR_UNTRUSTED)
+		return false;
 	return true;
 }

However, it did not work either. The reason is that the state of iter_reg
is cleared implicitly before is_iter_reg_valid_init() even if we don't
call bpf_rcu_read_unlock(). It would be appreciated if you could give
some suggestions. Maybe it is worth trying the solution proposed by
Kumar? [1]

[1] https://lore.kernel.org/lkml/CAP01T77cWxWNwq5HLr+Woiu7k4-P3QQfJWX1OeQJUkxW3=P4bA@mail.gmail.com/#t
On 2023/9/13 21:53, Chuyi Zhou wrote:
> Hello.
>
> [test program and printk debugging output quoted above, snipped]
>
> It seems that when we do bpf_rcu_read_unlock() in the middle of the
> iteration and want to clear state through bpf_for_each_reg_in_vstate,
> we cannot find the reg we previously marked MEM_RCU/PTR_UNTRUSTED in
> mark_stack_slots_iter().

bpf_get_spilled_reg will skip slots that are not STACK_SPILL, but
mark_stack_slots_iter() has marked these slots *STACK_ITER*.

With the following change, everything seems to work OK.
diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
index a3236651ec64..83c5ecccadb4 100644
--- a/include/linux/bpf_verifier.h
+++ b/include/linux/bpf_verifier.h
@@ -387,7 +387,7 @@ struct bpf_verifier_state {

 #define bpf_get_spilled_reg(slot, frame)				\
 	(((slot < frame->allocated_stack / BPF_REG_SIZE) &&		\
-	  (frame->stack[slot].slot_type[0] == STACK_SPILL))		\
+	  (frame->stack[slot].slot_type[0] == STACK_SPILL || frame->stack[slot].slot_type[0] == STACK_ITER)) \
 	 ? &frame->stack[slot].spilled_ptr : NULL)

I am not sure whether this would implicitly harm some logic elsewhere
that uses bpf_get_spilled_reg/bpf_for_each_spilled_reg. If so, maybe we
should add an extra parameter to control the picking behaviour:

#define bpf_get_spilled_reg(slot, frame, stack_type)			\
	(((slot < frame->allocated_stack / BPF_REG_SIZE) &&		\
	  (frame->stack[slot].slot_type[0] == stack_type))		\
	 ? &frame->stack[slot].spilled_ptr : NULL)

Thanks.
On Tue, Sep 12, 2023 at 12:02 AM Chuyi Zhou <zhouchuyi@bytedance.com> wrote:
>
> [commit message quoted above, snipped]

it would be nice to mention where this MEM_RCU is turned into UNTRUSTED
when we do rcu_read_unlock(). For someone unfamiliar with these parts of
the verifier (like me) this is completely unobvious.

> Signed-off-by: Chuyi Zhou <zhouchuyi@bytedance.com>
> ---
>  kernel/bpf/verifier.c | 30 ++++++++++++++++++++++++++++--
>  1 file changed, 28 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index 2367483bf4c2..6a6827ba7a18 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -1172,7 +1172,13 @@ static bool is_dynptr_type_expected(struct bpf_verifier_env *env, struct bpf_reg
>
>  static void __mark_reg_known_zero(struct bpf_reg_state *reg);
>
> +static bool in_rcu_cs(struct bpf_verifier_env *env);
> +
> +/* check whether we are using bpf_iter_process_*() or bpf_iter_css_*() */
> +static bool is_iter_need_rcu(struct bpf_kfunc_call_arg_meta *meta);
> +
>  static int mark_stack_slots_iter(struct bpf_verifier_env *env,
> +				 struct bpf_kfunc_call_arg_meta *meta,
>  				 struct bpf_reg_state *reg, int insn_idx,
>  				 struct btf *btf, u32 btf_id, int nr_slots)
>  {
> @@ -1193,6 +1199,12 @@ static int mark_stack_slots_iter(struct bpf_verifier_env *env,
>
>  		__mark_reg_known_zero(st);
>  		st->type = PTR_TO_STACK; /* we don't have dedicated reg type */
> +		if (is_iter_need_rcu(meta)) {
> +			if (in_rcu_cs(env))
> +				st->type |= MEM_RCU;
> +			else
> +				st->type |= PTR_UNTRUSTED;
> +		}
>  		st->live |= REG_LIVE_WRITTEN;
>  		st->ref_obj_id = i == 0 ? id : 0;
>  		st->iter.btf = btf;
> @@ -1281,6 +1293,8 @@ static bool is_iter_reg_valid_init(struct bpf_verifier_env *env, struct bpf_reg_
>  		struct bpf_stack_state *slot = &state->stack[spi - i];
>  		struct bpf_reg_state *st = &slot->spilled_ptr;
>
> +		if (st->type & PTR_UNTRUSTED)
> +			return false;
>  		/* only main (first) slot has ref_obj_id set */
>  		if (i == 0 && !st->ref_obj_id)
>  			return false;
> @@ -7503,13 +7517,13 @@ static int process_iter_arg(struct bpf_verifier_env *env, int regno, int insn_id
>  			return err;
>  	}
>
> -	err = mark_stack_slots_iter(env, reg, insn_idx, meta->btf, btf_id, nr_slots);
> +	err = mark_stack_slots_iter(env, meta, reg, insn_idx, meta->btf, btf_id, nr_slots);
>  	if (err)
>  		return err;
>  	} else {
>  		/* iter_next() or iter_destroy() expect initialized iter state*/
>  		if (!is_iter_reg_valid_init(env, reg, meta->btf, btf_id, nr_slots)) {
> -			verbose(env, "expected an initialized iter_%s as arg #%d\n",
> +			verbose(env, "expected an initialized iter_%s as arg #%d or without bpf_rcu_read_lock()\n",
>  				iter_type_str(meta->btf, btf_id), regno);

this message makes no sense, but even if reworded it would be confusing
for users. So maybe do the RCU check separately and report a clear
message that this iterator is expected to be within a single continuous
rcu_read_{lock+unlock} region.

I do think tracking RCU regions explicitly would make for much easier to
follow code, better messages, etc. Probably would be beneficial for some
other RCU-protected features. But that's a separate topic.

>  			return -EINVAL;
>  		}
> @@ -10382,6 +10396,18 @@ BTF_ID(func, bpf_percpu_obj_new_impl)
>  BTF_ID(func, bpf_percpu_obj_drop_impl)
>  BTF_ID(func, bpf_iter_css_task_new)
>
> +BTF_SET_START(rcu_protect_kfuns_set)
> +BTF_ID(func, bpf_iter_process_new)
> +BTF_ID(func, bpf_iter_css_pre_new)
> +BTF_ID(func, bpf_iter_css_post_new)
> +BTF_SET_END(rcu_protect_kfuns_set)
> +

instead of maintaining these extra special sets, why not add a KF flag,
like KF_RCU_PROTECTED?

> +static inline bool is_iter_need_rcu(struct bpf_kfunc_call_arg_meta *meta)
> +{
> +	return btf_id_set_contains(&rcu_protect_kfuns_set, meta->func_id);
> +}
> +
> +
>  static bool is_kfunc_ret_null(struct bpf_kfunc_call_arg_meta *meta)
>  {
>  	if (meta->func_id == special_kfunc_list[KF_bpf_refcount_acquire_impl] &&
> --
> 2.20.1
On Thu, Sep 14, 2023 at 1:56 AM Chuyi Zhou <zhouchuyi@bytedance.com> wrote:
>
> [earlier test program and debugging discussion quoted above, snipped]
>
> bpf_get_spilled_reg will skip slots that are not STACK_SPILL, but
> mark_stack_slots_iter() has marked these slots *STACK_ITER*.
>
> With the following change, everything seems to work OK.
>
> #define bpf_get_spilled_reg(slot, frame)				\
> 	(((slot < frame->allocated_stack / BPF_REG_SIZE) &&		\
> -	  (frame->stack[slot].slot_type[0] == STACK_SPILL))		\
> +	  (frame->stack[slot].slot_type[0] == STACK_SPILL || frame->stack[slot].slot_type[0] == STACK_ITER)) \
> 	 ? &frame->stack[slot].spilled_ptr : NULL)
>
> I am not sure whether this would implicitly harm some logic elsewhere
> that uses bpf_get_spilled_reg/bpf_for_each_spilled_reg. If so, maybe
> we should add an extra parameter to control the picking behaviour.

I don't think it's safe to just make bpf_get_spilled_reg, and
subsequently bpf_for_each_reg_in_vstate and bpf_for_each_spilled_reg,
suddenly start iterating iterator states and/or dynptrs. At least some
of the existing uses of those assume they are really working just with
registers.
Hello.

On 2023/9/15 07:26, Andrii Nakryiko wrote:
> On Thu, Sep 14, 2023 at 1:56 AM Chuyi Zhou <zhouchuyi@bytedance.com> wrote:
>>
>> [bpf_get_spilled_reg proposal quoted above, snipped]
>
> I don't think it's safe to just make bpf_get_spilled_reg, and
> subsequently bpf_for_each_reg_in_vstate and bpf_for_each_spilled_reg,
> suddenly start iterating iterator states and/or dynptrs. At least some
> of the existing uses of those assume they are really working just with
> registers.

IIUC, when we are doing bpf_rcu_read_unlock, we do need to clear the
state of regs including STACK_ITER slots.

Maybe here we only need to change the state-clearing logic in the
bpf_rcu_read_unlock path that uses bpf_for_each_reg_in_vstate, and keep
everything else unchanged?

Thanks.
On Thu, Sep 14, 2023 at 10:46 PM Chuyi Zhou <zhouchuyi@bytedance.com> wrote: > > Hello. > > 在 2023/9/15 07:26, Andrii Nakryiko 写道: > > On Thu, Sep 14, 2023 at 1:56 AM Chuyi Zhou <zhouchuyi@bytedance.com> wrote: > >> > >> > >> > >> 在 2023/9/13 21:53, Chuyi Zhou 写道: > >>> Hello. > >>> > >>> 在 2023/9/12 15:01, Chuyi Zhou 写道: > >>>> css_iter and process_iter should be used in rcu section. Specifically, in > >>>> sleepable progs explicit bpf_rcu_read_lock() is needed before use these > >>>> iters. In normal bpf progs that have implicit rcu_read_lock(), it's OK to > >>>> use them directly. > >>>> > >>>> This patch checks whether we are in rcu cs before we want to invoke > >>>> bpf_iter_process_new and bpf_iter_css_{pre, post}_new in > >>>> mark_stack_slots_iter(). If the rcu protection is guaranteed, we would > >>>> let st->type = PTR_TO_STACK | MEM_RCU. is_iter_reg_valid_init() will > >>>> reject if reg->type is UNTRUSTED. > >>> > >>> I use the following BPF Prog to test this patch: > >>> > >>> SEC("?fentry.s/" SYS_PREFIX "sys_getpgid") > >>> int iter_task_for_each_sleep(void *ctx) > >>> { > >>> struct task_struct *task; > >>> struct task_struct *cur_task = bpf_get_current_task_btf(); > >>> > >>> if (cur_task->pid != target_pid) > >>> return 0; > >>> bpf_rcu_read_lock(); > >>> bpf_for_each(process, task) { > >>> bpf_rcu_read_unlock(); > >>> if (task->pid == target_pid) > >>> process_cnt += 1; > >>> bpf_rcu_read_lock(); > >>> } > >>> bpf_rcu_read_unlock(); > >>> return 0; > >>> } > >>> > >>> Unfortunately, we can pass the verifier. 
> >>> > >>> Then I add some printk-messages before setting/clearing state to help > >>> debug: > >>> > >>> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c > >>> index d151e6b43a5f..35f3fa9471a9 100644 > >>> --- a/kernel/bpf/verifier.c > >>> +++ b/kernel/bpf/verifier.c > >>> @@ -1200,7 +1200,7 @@ static int mark_stack_slots_iter(struct > >>> bpf_verifier_env *env, > >>> __mark_reg_known_zero(st); > >>> st->type = PTR_TO_STACK; /* we don't have dedicated reg > >>> type */ > >>> if (is_iter_need_rcu(meta)) { > >>> + printk("mark reg_addr : %px", st); > >>> if (in_rcu_cs(env)) > >>> st->type |= MEM_RCU; > >>> else > >>> @@ -11472,8 +11472,8 @@ static int check_kfunc_call(struct > >>> bpf_verifier_env *env, struct bpf_insn *insn, > >>> return -EINVAL; > >>> } else if (rcu_unlock) { > >>> bpf_for_each_reg_in_vstate(env->cur_state, > >>> state, reg, ({ > >>> + printk("clear reg_addr : %px MEM_RCU : > >>> %d PTR_UNTRUSTED : %d\n ", reg, reg->type & MEM_RCU, reg->type & > >>> PTR_UNTRUSTED); > >>> if (reg->type & MEM_RCU) { > >>> - printk("clear reg addr : %lld", > >>> reg); > >>> reg->type &= ~(MEM_RCU | > >>> PTR_MAYBE_NULL); > >>> reg->type |= PTR_UNTRUSTED; > >>> } > >>> > >>> > >>> The demsg log: > >>> > >>> [ 393.705324] mark reg_addr : ffff88814e40e200 > >>> > >>> [ 393.706883] clear reg_addr : ffff88814d5f8000 MEM_RCU : 0 > >>> PTR_UNTRUSTED : 0 > >>> > >>> [ 393.707353] clear reg_addr : ffff88814d5f8078 MEM_RCU : 0 > >>> PTR_UNTRUSTED : 0 > >>> > >>> [ 393.708099] clear reg_addr : ffff88814d5f80f0 MEM_RCU : 0 > >>> PTR_UNTRUSTED : 0 > >>> .... > >>> .... > >>> > >>> I didn't see ffff88814e40e200 is cleared as expected because > >>> bpf_for_each_reg_in_vstate didn't find it. > >>> > >>> It seems when we are doing bpf_read_unlock() in the middle of iteration > >>> and want to clearing state through bpf_for_each_reg_in_vstate, we can > >>> not find the previous reg which we marked MEM_RCU/PTR_UNTRUSTED in > >>> mark_stack_slots_iter(). 
> >>>
> >>
> >> bpf_get_spilled_reg will skip slots if they are not STACK_SPILL, but in
> >> mark_stack_slots_iter() we have marked the slots *STACK_ITER*.
> >>
> >> With the following change, everything seems to work OK:
> >>
> >> diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
> >> index a3236651ec64..83c5ecccadb4 100644
> >> --- a/include/linux/bpf_verifier.h
> >> +++ b/include/linux/bpf_verifier.h
> >> @@ -387,7 +387,7 @@ struct bpf_verifier_state {
> >>
> >>  #define bpf_get_spilled_reg(slot, frame)                                \
> >>         (((slot < frame->allocated_stack / BPF_REG_SIZE) &&              \
> >> -        (frame->stack[slot].slot_type[0] == STACK_SPILL))               \
> >> +        (frame->stack[slot].slot_type[0] == STACK_SPILL || frame->stack[slot].slot_type[0] == STACK_ITER)) \
> >>         ? &frame->stack[slot].spilled_ptr : NULL)
> >>
> >> I am not sure whether this would implicitly harm some logic elsewhere that
> >> uses bpf_get_spilled_reg/bpf_for_each_spilled_reg. If so, maybe we should
> >> add an extra parameter to control the picking behaviour:
> >>
> >> #define bpf_get_spilled_reg(slot, frame, stack_type)                     \
> >>         (((slot < frame->allocated_stack / BPF_REG_SIZE) &&              \
> >>         (frame->stack[slot].slot_type[0] == stack_type))                 \
> >>         ? &frame->stack[slot].spilled_ptr : NULL)
> >>
> >> Thanks.
> >
> > I don't think it's safe to just make bpf_get_spilled_reg, and
> > subsequently bpf_for_each_reg_in_vstate and bpf_for_each_spilled_reg,
> > suddenly start iterating iterator states and/or dynptrs. At least
> > some of the existing uses of those assume they are really working just
> > with registers.
>
> IIUC, when we are doing bpf_rcu_read_unlock, we do need to clear the state
> of regs including STACK_ITER.
>
> Maybe here we only need to change the logic when using
> bpf_for_each_reg_in_vstate to clear state in bpf_rcu_read_unlock, and keep
> everything else unchanged?

Right, maybe. I see 10 uses of bpf_for_each_reg_in_vstate() in kernel/bpf/verifier.c.
Before we change the definition of bpf_for_each_reg_in_vstate() we should
validate that iterating dynptr and iter states doesn't break any of them,
that's all.

> > Thanks.
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 2367483bf4c2..6a6827ba7a18 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -1172,7 +1172,13 @@ static bool is_dynptr_type_expected(struct bpf_verifier_env *env, struct bpf_reg
 
 static void __mark_reg_known_zero(struct bpf_reg_state *reg);
 
+static bool in_rcu_cs(struct bpf_verifier_env *env);
+
+/* check whether we are using bpf_iter_process_*() or bpf_iter_css_*() */
+static bool is_iter_need_rcu(struct bpf_kfunc_call_arg_meta *meta);
+
 static int mark_stack_slots_iter(struct bpf_verifier_env *env,
+				 struct bpf_kfunc_call_arg_meta *meta,
 				 struct bpf_reg_state *reg, int insn_idx,
 				 struct btf *btf, u32 btf_id, int nr_slots)
 {
@@ -1193,6 +1199,12 @@ static int mark_stack_slots_iter(struct bpf_verifier_env *env,
 
 		__mark_reg_known_zero(st);
 		st->type = PTR_TO_STACK; /* we don't have dedicated reg type */
+		if (is_iter_need_rcu(meta)) {
+			if (in_rcu_cs(env))
+				st->type |= MEM_RCU;
+			else
+				st->type |= PTR_UNTRUSTED;
+		}
 		st->live |= REG_LIVE_WRITTEN;
 		st->ref_obj_id = i == 0 ? id : 0;
 		st->iter.btf = btf;
@@ -1281,6 +1293,8 @@ static bool is_iter_reg_valid_init(struct bpf_verifier_env *env, struct bpf_reg_
 		struct bpf_stack_state *slot = &state->stack[spi - i];
 		struct bpf_reg_state *st = &slot->spilled_ptr;
 
+		if (st->type & PTR_UNTRUSTED)
+			return false;
 		/* only main (first) slot has ref_obj_id set */
 		if (i == 0 && !st->ref_obj_id)
 			return false;
@@ -7503,13 +7517,13 @@ static int process_iter_arg(struct bpf_verifier_env *env, int regno, int insn_id
 			return err;
 		}
 
-		err = mark_stack_slots_iter(env, reg, insn_idx, meta->btf, btf_id, nr_slots);
+		err = mark_stack_slots_iter(env, meta, reg, insn_idx, meta->btf, btf_id, nr_slots);
 		if (err)
 			return err;
 	} else {
 		/* iter_next() or iter_destroy() expect initialized iter state */
 		if (!is_iter_reg_valid_init(env, reg, meta->btf, btf_id, nr_slots)) {
-			verbose(env, "expected an initialized iter_%s as arg #%d\n",
+			verbose(env, "expected an initialized iter_%s as arg #%d or without bpf_rcu_read_lock()\n",
 				iter_type_str(meta->btf, btf_id), regno);
 			return -EINVAL;
 		}
@@ -10382,6 +10396,18 @@ BTF_ID(func, bpf_percpu_obj_new_impl)
 BTF_ID(func, bpf_percpu_obj_drop_impl)
 BTF_ID(func, bpf_iter_css_task_new)
 
+BTF_SET_START(rcu_protect_kfuns_set)
+BTF_ID(func, bpf_iter_process_new)
+BTF_ID(func, bpf_iter_css_pre_new)
+BTF_ID(func, bpf_iter_css_post_new)
+BTF_SET_END(rcu_protect_kfuns_set)
+
+static inline bool is_iter_need_rcu(struct bpf_kfunc_call_arg_meta *meta)
+{
+	return btf_id_set_contains(&rcu_protect_kfuns_set, meta->func_id);
+}
+
 static bool is_kfunc_ret_null(struct bpf_kfunc_call_arg_meta *meta)
 {
 	if (meta->func_id == special_kfunc_list[KF_bpf_refcount_acquire_impl] &&