From patchwork Mon Sep 25 10:55:46 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Chuyi Zhou
X-Patchwork-Id: 144418
From: Chuyi Zhou
To: bpf@vger.kernel.org
Cc: ast@kernel.org, daniel@iogearbox.net, andrii@kernel.org,
 martin.lau@kernel.org, tj@kernel.org, linux-kernel@vger.kernel.org,
 Chuyi Zhou
Subject: [PATCH bpf-next v3 1/7] cgroup: Prepare for
 using css_task_iter_*() in BPF
Date: Mon, 25 Sep 2023 18:55:46 +0800
Message-Id: <20230925105552.817513-2-zhouchuyi@bytedance.com>
X-Mailer: git-send-email 2.20.1
In-Reply-To: <20230925105552.817513-1-zhouchuyi@bytedance.com>
References: <20230925105552.817513-1-zhouchuyi@bytedance.com>
X-Mailing-List: linux-kernel@vger.kernel.org

This patch makes some preparations for using css_task_iter_*() in BPF
programs.

1. The CSS_TASK_ITER_* flags are #define-s, which BPF programs cannot
   easily use. Convert them to an enum so that BPF programs can take
   them from vmlinux.h.

2. The next patch exposes css_task_iter_*() through common kfuncs. That
   is not safe as-is, because css_task_iter_*() calls spin_unlock_irq(),
   which unconditionally re-enables interrupts and may therefore corrupt
   the IRQ state of the context in which the BPF program runs. Switch to
   irqsave/irqrestore instead; the change is harmless for existing
   callers.
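The reason the _irq pair is unsafe for arbitrary callers can be seen in a
small userspace model. This is only a sketch: the model_* functions and
the single irqs_enabled flag are hypothetical stand-ins for the real
spinlock primitives and the CPU interrupt state, not kernel code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: one flag stands in for the CPU's interrupt-enable state. */
static bool irqs_enabled = true;

/* Models spin_lock_irq()/spin_unlock_irq(): disable, then unconditionally enable. */
static void model_lock_irq(void) { irqs_enabled = false; }
static void model_unlock_irq(void) { irqs_enabled = true; }

/* Models spin_lock_irqsave()/spin_unlock_irqrestore(): save, then restore. */
static void model_lock_irqsave(bool *saved)
{
	*saved = irqs_enabled;
	irqs_enabled = false;
}

static void model_unlock_irqrestore(bool saved) { irqs_enabled = saved; }

/* IRQ state the caller sees after a critical section using the _irq pair. */
static bool state_after_irq_pair(bool caller_state)
{
	irqs_enabled = caller_state;
	model_lock_irq();
	model_unlock_irq();
	return irqs_enabled;
}

/* IRQ state the caller sees after a critical section using irqsave/irqrestore. */
static bool state_after_irqsave_pair(bool caller_state)
{
	bool saved;

	irqs_enabled = caller_state;
	model_lock_irqsave(&saved);
	model_unlock_irqrestore(saved);
	return irqs_enabled;
}
```

In this model, a caller that enters with interrupts disabled gets them
back enabled from state_after_irq_pair(false), whereas
state_after_irqsave_pair(false) leaves them disabled as the caller
expects.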
Suggested-by: Alexei Starovoitov
Signed-off-by: Chuyi Zhou
Acked-by: Tejun Heo
---
 include/linux/cgroup.h | 12 +++++-------
 kernel/cgroup/cgroup.c | 18 ++++++++++++------
 2 files changed, 17 insertions(+), 13 deletions(-)

diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
index b307013b9c6c..0ef0af66080e 100644
--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@ -40,13 +40,11 @@ struct kernel_clone_args;
 #define CGROUP_WEIGHT_DFL 100
 #define CGROUP_WEIGHT_MAX 10000
 
-/* walk only threadgroup leaders */
-#define CSS_TASK_ITER_PROCS (1U << 0)
-/* walk all threaded css_sets in the domain */
-#define CSS_TASK_ITER_THREADED (1U << 1)
-
-/* internal flags */
-#define CSS_TASK_ITER_SKIPPED (1U << 16)
+enum {
+	CSS_TASK_ITER_PROCS = (1U << 0), /* walk only threadgroup leaders */
+	CSS_TASK_ITER_THREADED = (1U << 1), /* walk all threaded css_sets in the domain */
+	CSS_TASK_ITER_SKIPPED = (1U << 16), /* internal flags */
+};
 
 /* a css_task_iter should be treated as an opaque object */
 struct css_task_iter {
diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
index 1fb7f562289d..b6d64f3b8888 100644
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -4917,9 +4917,11 @@ static void css_task_iter_advance(struct css_task_iter *it)
 void css_task_iter_start(struct cgroup_subsys_state *css, unsigned int flags,
 			 struct css_task_iter *it)
 {
+	unsigned long irqflags;
+
 	memset(it, 0, sizeof(*it));
 
-	spin_lock_irq(&css_set_lock);
+	spin_lock_irqsave(&css_set_lock, irqflags);
 
 	it->ss = css->ss;
 	it->flags = flags;
@@ -4933,7 +4935,7 @@ void css_task_iter_start(struct cgroup_subsys_state *css, unsigned int flags,
 
 	css_task_iter_advance(it);
 
-	spin_unlock_irq(&css_set_lock);
+	spin_unlock_irqrestore(&css_set_lock, irqflags);
 }
 
 /**
@@ -4946,12 +4948,14 @@ void css_task_iter_start(struct cgroup_subsys_state *css, unsigned int flags,
  */
 struct task_struct *css_task_iter_next(struct css_task_iter *it)
 {
+	unsigned long irqflags;
+
 	if (it->cur_task) {
 		put_task_struct(it->cur_task);
 		it->cur_task = NULL;
 	}
 
-	spin_lock_irq(&css_set_lock);
+	spin_lock_irqsave(&css_set_lock, irqflags);
 
 	/* @it may be half-advanced by skips, finish advancing */
 	if (it->flags & CSS_TASK_ITER_SKIPPED)
@@ -4964,7 +4968,7 @@ struct task_struct *css_task_iter_next(struct css_task_iter *it)
 		css_task_iter_advance(it);
 	}
 
-	spin_unlock_irq(&css_set_lock);
+	spin_unlock_irqrestore(&css_set_lock, irqflags);
 
 	return it->cur_task;
 }
@@ -4977,11 +4981,13 @@ struct task_struct *css_task_iter_next(struct css_task_iter *it)
  */
 void css_task_iter_end(struct css_task_iter *it)
 {
+	unsigned long irqflags;
+
 	if (it->cur_cset) {
-		spin_lock_irq(&css_set_lock);
+		spin_lock_irqsave(&css_set_lock, irqflags);
 		list_del(&it->iters_node);
 		put_css_set_locked(it->cur_cset);
-		spin_unlock_irq(&css_set_lock);
+		spin_unlock_irqrestore(&css_set_lock, irqflags);
 	}
 
 	if (it->cur_dcset)

From patchwork Mon Sep 25 10:55:47 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Chuyi Zhou
X-Patchwork-Id: 144402
From: Chuyi Zhou
To: bpf@vger.kernel.org
Cc: ast@kernel.org, daniel@iogearbox.net, andrii@kernel.org,
 martin.lau@kernel.org, tj@kernel.org, linux-kernel@vger.kernel.org,
 Chuyi Zhou
Subject: [PATCH bpf-next v3 2/7] bpf: Introduce css_task open-coded iterator kfuncs
Date: Mon, 25 Sep 2023 18:55:47 +0800
Message-Id: <20230925105552.817513-3-zhouchuyi@bytedance.com>
X-Mailer:
 git-send-email 2.20.1
In-Reply-To: <20230925105552.817513-1-zhouchuyi@bytedance.com>
References: <20230925105552.817513-1-zhouchuyi@bytedance.com>
X-Mailing-List: linux-kernel@vger.kernel.org

This patch adds the kfuncs bpf_iter_css_task_{new,next,destroy}, which
allow creation and manipulation of struct bpf_iter_css_task in the
open-coded iterator style. The kfuncs wrap
css_task_iter_{start,next,end}. BPF programs can use them through the
bpf_for_each macro to iterate over all tasks under a css.

css_task_iter_*() takes the global spinlock css_set_lock, so the BPF
side has to be careful about where it allows this iterator to be used.
Currently it is only allowed in bpf_lsm and bpf iter-s.
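The new/next/destroy lifecycle described above can be illustrated with a
plain userspace model. This is a sketch under stated assumptions, not
the kernel implementation: every model_* name is hypothetical, an int
array stands in for the tasks of a css, and malloc() stands in for
bpf_mem_alloc(). Only the flag validation mirrors the switch in
bpf_iter_css_task_new().

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins; values mirror CSS_TASK_ITER_PROCS/THREADED from patch 1. */
enum {
	MODEL_ITER_PROCS = (1U << 0),
	MODEL_ITER_THREADED = (1U << 1),
};

/* Heap-allocated iterator state, playing the role of the css_task_iter. */
struct model_state {
	const int *items;
	size_t len;
	size_t pos;
};

/* Fixed-size handle kept by the caller, playing the role of bpf_iter_css_task. */
struct model_iter {
	struct model_state *st;
};

static int model_iter_new(struct model_iter *it, const int *items, size_t len,
			  unsigned int flags)
{
	it->st = NULL;	/* keeps next()/destroy() safe if new() fails */
	switch (flags) {
	case MODEL_ITER_PROCS | MODEL_ITER_THREADED:
	case MODEL_ITER_PROCS:
	case 0:
		break;
	default:
		return -EINVAL;
	}

	it->st = malloc(sizeof(*it->st));
	if (!it->st)
		return -ENOMEM;
	it->st->items = items;
	it->st->len = len;
	it->st->pos = 0;
	return 0;
}

static const int *model_iter_next(struct model_iter *it)
{
	if (!it->st || it->st->pos >= it->st->len)
		return NULL;
	return &it->st->items[it->st->pos++];
}

static void model_iter_destroy(struct model_iter *it)
{
	if (!it->st)
		return;
	free(it->st);
	it->st = NULL;
}

/* The new/next/destroy sequence that a for-each style loop boils down to. */
static int model_sum(const int *items, size_t len, unsigned int flags)
{
	struct model_iter it;
	const int *p;
	int sum = 0;

	if (model_iter_new(&it, items, len, flags) == 0) {
		while ((p = model_iter_next(&it)))
			sum += *p;
	}
	model_iter_destroy(&it);	/* runs even if new() failed */
	return sum;
}
```

Because new() clears the handle before validating the flags, a failed
new() leaves next() returning NULL and destroy() doing nothing, which is
the same defensive shape as the kit->css_it checks in the kfuncs.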
Signed-off-by: Chuyi Zhou
---
 kernel/bpf/helpers.c | 3 ++
 kernel/bpf/task_iter.c | 53 +++++++++++++++++++
 kernel/bpf/verifier.c | 23 ++++++++
 .../testing/selftests/bpf/bpf_experimental.h | 7 +++
 4 files changed, 86 insertions(+)

diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index b0a9834f1051..189d158c9b7f 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -2504,6 +2504,9 @@ BTF_ID_FLAGS(func, bpf_dynptr_slice_rdwr, KF_RET_NULL)
 BTF_ID_FLAGS(func, bpf_iter_num_new, KF_ITER_NEW)
 BTF_ID_FLAGS(func, bpf_iter_num_next, KF_ITER_NEXT | KF_RET_NULL)
 BTF_ID_FLAGS(func, bpf_iter_num_destroy, KF_ITER_DESTROY)
+BTF_ID_FLAGS(func, bpf_iter_css_task_new, KF_ITER_NEW | KF_TRUSTED_ARGS)
+BTF_ID_FLAGS(func, bpf_iter_css_task_next, KF_ITER_NEXT | KF_RET_NULL)
+BTF_ID_FLAGS(func, bpf_iter_css_task_destroy, KF_ITER_DESTROY)
 BTF_ID_FLAGS(func, bpf_dynptr_adjust)
 BTF_ID_FLAGS(func, bpf_dynptr_is_null)
 BTF_ID_FLAGS(func, bpf_dynptr_is_rdonly)
diff --git a/kernel/bpf/task_iter.c b/kernel/bpf/task_iter.c
index 7473068ed313..2cfcb4dd8a37 100644
--- a/kernel/bpf/task_iter.c
+++ b/kernel/bpf/task_iter.c
@@ -7,6 +7,7 @@
 #include
 #include
 #include
+#include
 #include
 #include "mmap_unlock_work.h"
 
@@ -803,6 +804,58 @@ const struct bpf_func_proto bpf_find_vma_proto = {
 	.arg5_type = ARG_ANYTHING,
 };
 
+struct bpf_iter_css_task {
+	__u64 __opaque[1];
+} __attribute__((aligned(8)));
+
+struct bpf_iter_css_task_kern {
+	struct css_task_iter *css_it;
+} __attribute__((aligned(8)));
+
+__bpf_kfunc int bpf_iter_css_task_new(struct bpf_iter_css_task *it,
+		struct cgroup_subsys_state *css, unsigned int flags)
+{
+	struct bpf_iter_css_task_kern *kit = (void *)it;
+
+	BUILD_BUG_ON(sizeof(struct bpf_iter_css_task_kern) != sizeof(struct bpf_iter_css_task));
+	BUILD_BUG_ON(__alignof__(struct bpf_iter_css_task_kern) !=
+					__alignof__(struct bpf_iter_css_task));
+	kit->css_it = NULL;
+	switch (flags) {
+	case CSS_TASK_ITER_PROCS | CSS_TASK_ITER_THREADED:
+	case CSS_TASK_ITER_PROCS:
+	case 0:
+		break;
+	default:
+		return -EINVAL;
+	}
+
+	kit->css_it = bpf_mem_alloc(&bpf_global_ma, sizeof(struct css_task_iter));
+	if (!kit->css_it)
+		return -ENOMEM;
+	css_task_iter_start(css, flags, kit->css_it);
+	return 0;
+}
+
+__bpf_kfunc struct task_struct *bpf_iter_css_task_next(struct bpf_iter_css_task *it)
+{
+	struct bpf_iter_css_task_kern *kit = (void *)it;
+
+	if (!kit->css_it)
+		return NULL;
+	return css_task_iter_next(kit->css_it);
+}
+
+__bpf_kfunc void bpf_iter_css_task_destroy(struct bpf_iter_css_task *it)
+{
+	struct bpf_iter_css_task_kern *kit = (void *)it;
+
+	if (!kit->css_it)
+		return;
+	css_task_iter_end(kit->css_it);
+	bpf_mem_free(&bpf_global_ma, kit->css_it);
+}
+
 DEFINE_PER_CPU(struct mmap_unlock_irq_work, mmap_unlock_work);
 
 static void do_mmap_read_unlock(struct irq_work *entry)
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index dbba2b806017..2367483bf4c2 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -10332,6 +10332,7 @@ enum special_kfunc_type {
 	KF_bpf_dynptr_clone,
 	KF_bpf_percpu_obj_new_impl,
 	KF_bpf_percpu_obj_drop_impl,
+	KF_bpf_iter_css_task_new,
 };
 
 BTF_SET_START(special_kfunc_set)
@@ -10354,6 +10355,7 @@ BTF_ID(func, bpf_dynptr_slice_rdwr)
 BTF_ID(func, bpf_dynptr_clone)
 BTF_ID(func, bpf_percpu_obj_new_impl)
 BTF_ID(func, bpf_percpu_obj_drop_impl)
+BTF_ID(func, bpf_iter_css_task_new)
 BTF_SET_END(special_kfunc_set)
 
 BTF_ID_LIST(special_kfunc_list)
@@ -10378,6 +10380,7 @@ BTF_ID(func, bpf_dynptr_slice_rdwr)
 BTF_ID(func, bpf_dynptr_clone)
 BTF_ID(func, bpf_percpu_obj_new_impl)
 BTF_ID(func, bpf_percpu_obj_drop_impl)
+BTF_ID(func, bpf_iter_css_task_new)
 
 static bool is_kfunc_ret_null(struct bpf_kfunc_call_arg_meta *meta)
 {
@@ -10902,6 +10905,20 @@ static int process_kf_arg_ptr_to_rbtree_node(struct bpf_verifier_env *env,
 						  &meta->arg_rbtree_root.field);
 }
 
+static bool check_css_task_iter_allowlist(struct bpf_verifier_env *env)
+{
+	enum bpf_prog_type prog_type = resolve_prog_type(env->prog);
+
+	switch (prog_type) {
+	case BPF_PROG_TYPE_LSM:
+		return true;
+	case BPF_TRACE_ITER:
+		return env->prog->aux->sleepable;
+	default:
+		return false;
+	}
+}
+
 static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_arg_meta *meta,
 			    int insn_idx)
 {
@@ -11152,6 +11169,12 @@ static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_
 			break;
 		}
 		case KF_ARG_PTR_TO_ITER:
+			if (meta->func_id == special_kfunc_list[KF_bpf_iter_css_task_new]) {
+				if (!check_css_task_iter_allowlist(env)) {
+					verbose(env, "css_task_iter is only allowed in bpf_lsm and bpf iter-s\n");
+					return -EINVAL;
+				}
+			}
 			ret = process_iter_arg(env, regno, insn_idx, meta);
 			if (ret < 0)
 				return ret;
diff --git a/tools/testing/selftests/bpf/bpf_experimental.h b/tools/testing/selftests/bpf/bpf_experimental.h
index 4494eaa9937e..d3ea90f0e142 100644
--- a/tools/testing/selftests/bpf/bpf_experimental.h
+++ b/tools/testing/selftests/bpf/bpf_experimental.h
@@ -162,4 +162,11 @@ extern void bpf_percpu_obj_drop_impl(void *kptr, void *meta) __ksym;
 /* Convenience macro to wrap over bpf_obj_drop_impl */
 #define bpf_percpu_obj_drop(kptr) bpf_percpu_obj_drop_impl(kptr, NULL)
 
+struct bpf_iter_css_task;
+struct cgroup_subsys_state;
+extern int bpf_iter_css_task_new(struct bpf_iter_css_task *it,
+		struct cgroup_subsys_state *css, unsigned int flags) __weak __ksym;
+extern struct task_struct *bpf_iter_css_task_next(struct bpf_iter_css_task *it) __weak __ksym;
+extern void bpf_iter_css_task_destroy(struct bpf_iter_css_task *it) __weak __ksym;
+
 #endif

From patchwork Mon Sep 25 10:55:48 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Chuyi Zhou
X-Patchwork-Id: 144667
From: Chuyi Zhou
To: bpf@vger.kernel.org
Cc: ast@kernel.org, daniel@iogearbox.net, andrii@kernel.org,
 martin.lau@kernel.org, tj@kernel.org, linux-kernel@vger.kernel.org,
 Chuyi Zhou
Subject: [PATCH bpf-next v3 3/7] bpf: Introduce task open coded iterator kfuncs
Date: Mon, 25 Sep 2023 18:55:48 +0800
Message-Id: <20230925105552.817513-4-zhouchuyi@bytedance.com>
X-Mailer: git-send-email 2.20.1
In-Reply-To:
 <20230925105552.817513-1-zhouchuyi@bytedance.com>
References: <20230925105552.817513-1-zhouchuyi@bytedance.com>
X-Mailing-List: linux-kernel@vger.kernel.org

This patch adds the kfuncs bpf_iter_task_{new,next,destroy}, which allow
creation and manipulation of struct bpf_iter_task in the open-coded
iterator style. BPF programs can call these kfuncs directly or through
the bpf_for_each macro to iterate over all processes in the system.

The API design is kept consistent with SEC("iter/task").
bpf_iter_task_new() accepts a specific task and an iteration type, which
allows:

1. iterating over all processes in the system
2. iterating over all threads in the system
3. iterating over all threads of a specific task

This patch also reuses enum bpf_iter_task_type, renaming
BPF_TASK_ITER_TID to BPF_TASK_ITER_THREAD and BPF_TASK_ITER_TGID to
BPF_TASK_ITER_PROC.

The newly added struct bpf_iter_task has a name collision with the BPF
skeleton of a selftest for the seq_file task iterator, so that
selftests/bpf/progs file is renamed to avoid the collision.
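The three iteration modes listed above can be pictured with a small
userspace model over a flat task table. This is a sketch only: the
model_* names are hypothetical, and the mapping of each enum value to a
mode (ALL = every thread, PROC = thread-group leaders, THREAD = threads
of one task) is an assumption made for illustration rather than
something this message states.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical mirror of enum bpf_iter_task_type after the rename. */
enum model_task_iter_type {
	MODEL_TASK_ITER_ALL = 0,	/* assumed: every thread in the system */
	MODEL_TASK_ITER_THREAD,		/* assumed: threads of one given task */
	MODEL_TASK_ITER_PROC,		/* assumed: thread-group leaders only */
};

struct model_task {
	int tgid;	/* process (thread group) id */
	int tid;	/* thread id; equals tgid for the group leader */
};

/* Returns 1 if the given mode would visit @t, 0 otherwise. */
static int model_visits(const struct model_task *t,
			enum model_task_iter_type type, int tgid)
{
	switch (type) {
	case MODEL_TASK_ITER_ALL:
		return 1;
	case MODEL_TASK_ITER_PROC:
		return t->tid == t->tgid;
	case MODEL_TASK_ITER_THREAD:
		return t->tgid == tgid;
	}
	return 0;
}

/* Counts visited entries; rejects unknown types with -1, like -EINVAL in new(). */
static int model_count(const struct model_task *tasks, size_t n,
		       enum model_task_iter_type type, int tgid)
{
	int visited = 0;
	size_t i;

	if (type != MODEL_TASK_ITER_ALL && type != MODEL_TASK_ITER_THREAD &&
	    type != MODEL_TASK_ITER_PROC)
		return -1;

	for (i = 0; i < n; i++)
		visited += model_visits(&tasks[i], type, tgid);
	return visited;
}
```

With a table holding one process of three threads and another of two
threads, the model visits five entries in the ALL mode, two in the PROC
mode, and three in the THREAD mode for the first process.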
Signed-off-by: Chuyi Zhou
---
 include/linux/bpf.h | 8 +-
 kernel/bpf/helpers.c | 3 +
 kernel/bpf/task_iter.c | 96 ++++++++++++++++---
 .../testing/selftests/bpf/bpf_experimental.h | 5 +
 .../selftests/bpf/prog_tests/bpf_iter.c | 18 ++--
 .../{bpf_iter_task.c => bpf_iter_tasks.c} | 0
 6 files changed, 106 insertions(+), 24 deletions(-)
 rename tools/testing/selftests/bpf/progs/{bpf_iter_task.c => bpf_iter_tasks.c} (100%)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 87eeb3a46a1d..0ef5b7a59d62 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -2194,16 +2194,16 @@ int bpf_obj_get_user(int path_fd, const char __user *pathname, int flags);
  * BPF_TASK_ITER_ALL (default)
  *	Iterate over resources of every task.
  *
- * BPF_TASK_ITER_TID
+ * BPF_TASK_ITER_THREAD
  *	Iterate over resources of a task/tid.
  *
- * BPF_TASK_ITER_TGID
+ * BPF_TASK_ITER_PROC
  *	Iterate over resources of every task of a process / task group.
  */
 enum bpf_iter_task_type {
 	BPF_TASK_ITER_ALL = 0,
-	BPF_TASK_ITER_TID,
-	BPF_TASK_ITER_TGID,
+	BPF_TASK_ITER_THREAD,
+	BPF_TASK_ITER_PROC,
 };
 
 struct bpf_iter_aux_info {
diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index 189d158c9b7f..556262c27a75 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -2507,6 +2507,9 @@ BTF_ID_FLAGS(func, bpf_iter_num_destroy, KF_ITER_DESTROY)
 BTF_ID_FLAGS(func, bpf_iter_css_task_new, KF_ITER_NEW | KF_TRUSTED_ARGS)
 BTF_ID_FLAGS(func, bpf_iter_css_task_next, KF_ITER_NEXT | KF_RET_NULL)
 BTF_ID_FLAGS(func, bpf_iter_css_task_destroy, KF_ITER_DESTROY)
+BTF_ID_FLAGS(func, bpf_iter_task_new, KF_ITER_NEW | KF_TRUSTED_ARGS)
+BTF_ID_FLAGS(func, bpf_iter_task_next, KF_ITER_NEXT | KF_RET_NULL)
+BTF_ID_FLAGS(func, bpf_iter_task_destroy, KF_ITER_DESTROY)
 BTF_ID_FLAGS(func, bpf_dynptr_adjust)
 BTF_ID_FLAGS(func, bpf_dynptr_is_null)
 BTF_ID_FLAGS(func, bpf_dynptr_is_rdonly)
diff --git a/kernel/bpf/task_iter.c b/kernel/bpf/task_iter.c
index 2cfcb4dd8a37..9bcd3f9922b1 100644
--- a/kernel/bpf/task_iter.c
+++ b/kernel/bpf/task_iter.c
@@ -94,7 +94,7 @@ static struct task_struct *task_seq_get_next(struct bpf_iter_seq_task_common *co
 	struct task_struct *task = NULL;
 	struct pid *pid;
 
-	if (common->type == BPF_TASK_ITER_TID) {
+	if (common->type == BPF_TASK_ITER_THREAD) {
 		if (*tid && *tid != common->pid)
 			return NULL;
 		rcu_read_lock();
@@ -108,7 +108,7 @@ static struct task_struct *task_seq_get_next(struct bpf_iter_seq_task_common *co
 		return task;
 	}
 
-	if (common->type == BPF_TASK_ITER_TGID) {
+	if (common->type == BPF_TASK_ITER_PROC) {
 		rcu_read_lock();
 		task = task_group_seq_get_next(common, tid, skip_if_dup_files);
 		rcu_read_unlock();
@@ -217,15 +217,15 @@ static int bpf_iter_attach_task(struct bpf_prog *prog,
 
 	aux->task.type = BPF_TASK_ITER_ALL;
 	if (linfo->task.tid != 0) {
-		aux->task.type = BPF_TASK_ITER_TID;
+		aux->task.type = BPF_TASK_ITER_THREAD;
 		aux->task.pid = linfo->task.tid;
 	}
 	if (linfo->task.pid != 0) {
-		aux->task.type = BPF_TASK_ITER_TGID;
+		aux->task.type = BPF_TASK_ITER_PROC;
 		aux->task.pid = linfo->task.pid;
 	}
 	if (linfo->task.pid_fd != 0) {
-		aux->task.type = BPF_TASK_ITER_TGID;
+		aux->task.type = BPF_TASK_ITER_PROC;
 
 		pid = pidfd_get_pid(linfo->task.pid_fd, &flags);
 		if (IS_ERR(pid))
@@ -305,7 +305,7 @@ task_file_seq_get_next(struct bpf_iter_seq_task_file_info *info)
 	rcu_read_unlock();
 	put_task_struct(curr_task);
 
-	if (info->common.type == BPF_TASK_ITER_TID) {
+	if (info->common.type == BPF_TASK_ITER_THREAD) {
 		info->task = NULL;
 		return NULL;
 	}
@@ -566,7 +566,7 @@ task_vma_seq_get_next(struct bpf_iter_seq_task_vma_info *info)
 	return curr_vma;
 
 next_task:
-	if (info->common.type == BPF_TASK_ITER_TID)
+	if (info->common.type == BPF_TASK_ITER_THREAD)
 		goto finish;
 
 	put_task_struct(curr_task);
@@ -677,10 +677,10 @@ static const struct bpf_iter_seq_info task_seq_info = {
 static int bpf_iter_fill_link_info(const struct bpf_iter_aux_info *aux, struct bpf_link_info *info)
 {
 	switch (aux->task.type) {
-	case BPF_TASK_ITER_TID:
+	case BPF_TASK_ITER_THREAD:
 		info->iter.task.tid = aux->task.pid;
 		break;
-	case BPF_TASK_ITER_TGID:
+	case BPF_TASK_ITER_PROC:
 		info->iter.task.pid = aux->task.pid;
 		break;
 	default:
@@ -692,9 +692,9 @@ static int bpf_iter_fill_link_info(const struct bpf_iter_aux_info *aux, struct b
 static void bpf_iter_task_show_fdinfo(const struct bpf_iter_aux_info *aux, struct seq_file *seq)
 {
 	seq_printf(seq, "task_type:\t%s\n", iter_task_type_names[aux->task.type]);
-	if (aux->task.type == BPF_TASK_ITER_TID)
+	if (aux->task.type == BPF_TASK_ITER_THREAD)
 		seq_printf(seq, "tid:\t%u\n", aux->task.pid);
-	else if (aux->task.type == BPF_TASK_ITER_TGID)
+	else if (aux->task.type == BPF_TASK_ITER_PROC)
 		seq_printf(seq, "pid:\t%u\n", aux->task.pid);
 }
 
@@ -856,6 +856,80 @@ __bpf_kfunc void bpf_iter_css_task_destroy(struct bpf_iter_css_task *it)
 	bpf_mem_free(&bpf_global_ma, kit->css_it);
 }
 
+struct bpf_iter_task {
+	__u64 __opaque[2];
+	__u32 __opaque_int[1];
+} __attribute__((aligned(8)));
+
+struct bpf_iter_task_kern {
+	struct task_struct *task;
+	struct task_struct *pos;
+	unsigned int type;
+} __attribute__((aligned(8)));
+
+__bpf_kfunc int bpf_iter_task_new(struct bpf_iter_task *it, struct task_struct *task, unsigned int type)
+{
+	struct bpf_iter_task_kern *kit = (void *)it;
+	BUILD_BUG_ON(sizeof(struct bpf_iter_task_kern) != sizeof(struct bpf_iter_task));
+	BUILD_BUG_ON(__alignof__(struct bpf_iter_task_kern) !=
+					__alignof__(struct bpf_iter_task));
+	kit->task = kit->pos = NULL;
+	switch (type) {
+	case BPF_TASK_ITER_ALL:
+	case BPF_TASK_ITER_PROC:
+	case BPF_TASK_ITER_THREAD:
+		break;
+	default:
+		return -EINVAL;
+	}
+
+	if (type == BPF_TASK_ITER_THREAD)
+		kit->task = task;
+	else
+		kit->task = &init_task;
+	kit->pos = kit->task;
+	kit->type = type;
+	return 0;
+}
+
+__bpf_kfunc struct task_struct *bpf_iter_task_next(struct bpf_iter_task *it)
+{
+	struct bpf_iter_task_kern *kit = (void *)it;
+	struct task_struct *pos;
+	unsigned int type;
+
+	type = kit->type;
+	pos = kit->pos;
+
+	if (!pos)
+		goto out;
+
+	if (type ==
BPF_TASK_ITER_PROC) + goto get_next_task; + + kit->pos = next_thread(kit->pos); + if (kit->pos == kit->task) { + if (type == BPF_TASK_ITER_THREAD) { + kit->pos = NULL; + goto out; + } + } else + goto out; + +get_next_task: + kit->pos = next_task(kit->pos); + kit->task = kit->pos; + if (kit->pos == &init_task) + kit->pos = NULL; + +out: + return pos; +} + +__bpf_kfunc void bpf_iter_task_destroy(struct bpf_iter_task *it) +{ +} + DEFINE_PER_CPU(struct mmap_unlock_irq_work, mmap_unlock_work); static void do_mmap_read_unlock(struct irq_work *entry) diff --git a/tools/testing/selftests/bpf/bpf_experimental.h b/tools/testing/selftests/bpf/bpf_experimental.h index d3ea90f0e142..d989775dbdb5 100644 --- a/tools/testing/selftests/bpf/bpf_experimental.h +++ b/tools/testing/selftests/bpf/bpf_experimental.h @@ -169,4 +169,9 @@ extern int bpf_iter_css_task_new(struct bpf_iter_css_task *it, extern struct task_struct *bpf_iter_css_task_next(struct bpf_iter_css_task *it) __weak __ksym; extern void bpf_iter_css_task_destroy(struct bpf_iter_css_task *it) __weak __ksym; +struct bpf_iter_task; +extern int bpf_iter_task_new(struct bpf_iter_task *it, struct task_struct *task, unsigned int type) __weak __ksym; +extern struct task_struct *bpf_iter_task_next(struct bpf_iter_task *it) __weak __ksym; +extern void bpf_iter_task_destroy(struct bpf_iter_task *it) __weak __ksym; + #endif diff --git a/tools/testing/selftests/bpf/prog_tests/bpf_iter.c b/tools/testing/selftests/bpf/prog_tests/bpf_iter.c index 1f02168103dd..dc60e8e125cd 100644 --- a/tools/testing/selftests/bpf/prog_tests/bpf_iter.c +++ b/tools/testing/selftests/bpf/prog_tests/bpf_iter.c @@ -7,7 +7,7 @@ #include "bpf_iter_ipv6_route.skel.h" #include "bpf_iter_netlink.skel.h" #include "bpf_iter_bpf_map.skel.h" -#include "bpf_iter_task.skel.h" +#include "bpf_iter_tasks.skel.h" #include "bpf_iter_task_stack.skel.h" #include "bpf_iter_task_file.skel.h" #include "bpf_iter_task_vma.skel.h" @@ -215,12 +215,12 @@ static void 
*do_nothing_wait(void *arg) static void test_task_common_nocheck(struct bpf_iter_attach_opts *opts, int *num_unknown, int *num_known) { - struct bpf_iter_task *skel; + struct bpf_iter_tasks *skel; pthread_t thread_id; void *ret; - skel = bpf_iter_task__open_and_load(); - if (!ASSERT_OK_PTR(skel, "bpf_iter_task__open_and_load")) + skel = bpf_iter_tasks__open_and_load(); + if (!ASSERT_OK_PTR(skel, "bpf_iter_tasks__open_and_load")) return; ASSERT_OK(pthread_mutex_lock(&do_nothing_mutex), "pthread_mutex_lock"); @@ -239,7 +239,7 @@ static void test_task_common_nocheck(struct bpf_iter_attach_opts *opts, ASSERT_FALSE(pthread_join(thread_id, &ret) || ret != NULL, "pthread_join"); - bpf_iter_task__destroy(skel); + bpf_iter_tasks__destroy(skel); } static void test_task_common(struct bpf_iter_attach_opts *opts, int num_unknown, int num_known) @@ -307,10 +307,10 @@ static void test_task_pidfd(void) static void test_task_sleepable(void) { - struct bpf_iter_task *skel; + struct bpf_iter_tasks *skel; - skel = bpf_iter_task__open_and_load(); - if (!ASSERT_OK_PTR(skel, "bpf_iter_task__open_and_load")) + skel = bpf_iter_tasks__open_and_load(); + if (!ASSERT_OK_PTR(skel, "bpf_iter_tasks__open_and_load")) return; do_dummy_read(skel->progs.dump_task_sleepable); @@ -320,7 +320,7 @@ static void test_task_sleepable(void) ASSERT_GT(skel->bss->num_success_copy_from_user_task, 0, "num_success_copy_from_user_task"); - bpf_iter_task__destroy(skel); + bpf_iter_tasks__destroy(skel); } static void test_task_stack(void) diff --git a/tools/testing/selftests/bpf/progs/bpf_iter_task.c b/tools/testing/selftests/bpf/progs/bpf_iter_tasks.c similarity index 100% rename from tools/testing/selftests/bpf/progs/bpf_iter_task.c rename to tools/testing/selftests/bpf/progs/bpf_iter_tasks.c From patchwork Mon Sep 25 10:55:49 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Chuyi Zhou X-Patchwork-Id: 144471 Return-Path: Delivered-To: 
ouuuleilei@gmail.com Received: by 2002:a59:cae8:0:b0:403:3b70:6f57 with SMTP id r8csp1295003vqu; Mon, 25 Sep 2023 08:28:07 -0700 (PDT) X-Google-Smtp-Source: AGHT+IEr7cKXnChG3Gdlf1VD7l00XAxuU1Cg4tuXyzStwlMoxSRD+Q911joO1EI9o/kIm+QCUCRl X-Received: by 2002:a17:90b:4cc2:b0:277:2d7c:1be4 with SMTP id nd2-20020a17090b4cc200b002772d7c1be4mr5260145pjb.1.1695655687095; Mon, 25 Sep 2023 08:28:07 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1695655687; cv=none; d=google.com; s=arc-20160816; b=mVsZMqG6ir4OcW4sgVNesvV18KlhIUgUBjG0n+nwnaT2UtCoW1N9K9OP4NGIbH5M5z 2/Q6z42Cle8or7T9tdEEstI4Q+HmWrph9dRLLua3eSdkM8owFwukL7YLFIU5k8XkhTEo Jbrkh1AA3cr4g6BIaqfOQd+dXjfl0jbNRum9ovkvwxL/+N95C8KOLB0RGtb3R+0ORdro RzAF6gSYIaAN7aHkCiO7MjaGzXZ6yP5JdYEHiP9ptycJrSI1po4uVuKf0w1BAFdGTou0 Sw/0gvr3YHUruGt4wza8XskdgPBUK+xMLzkQRbMHjC3apFnOqQzJWVE4mI4U3bRVxLac HnPQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=hrdiacOpaESaqEXHiOrMTWSWkSGBNArwX1oK/BX79n4=; fh=DR7g1EcWKOXTEoooUPBSJXUaklSrDEYzv6YDdhz1CwE=; b=Lk8EtlA/aSXnbweRKCRvOoFPsBDboMcfafPW+xcS++DP2xbOIBU2r8jZDQ+7rl8oYK gFWBnH4I1JzK3k3oRsJ/jeg3o0q7R7ExSWCW48H/arU8itZSnECV4fMBbaK+2iHQ4zzE OI62Z+XudF1g8kF0Nio8kJnVjwbkuCr5+KRsgvaW5J43n4ztXvQoaWXJUBrjizgfgF4a +MJ03aFnvyyDtNW7y41a9B5T9PgAqqxD0aHbA2e2tw3qvSEMFrDqbYzKIsZ3XIiYHuLn Nu8BVLSpHZEi0W6wabJojdMuljehdtR8MZvEbNS4JoWNktIsl4RwqvqJW8bVduhtq5vJ EOeg== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@bytedance.com header.s=google header.b=KRTF0ghb; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.35 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=QUARANTINE sp=QUARANTINE dis=NONE) header.from=bytedance.com Received: from groat.vger.email (groat.vger.email. 
[23.128.96.35]) by mx.google.com with ESMTPS id ce14-20020a17090aff0e00b00263dcd605a9si12008152pjb.25.2023.09.25.08.28.06 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 25 Sep 2023 08:28:07 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.35 as permitted sender) client-ip=23.128.96.35; Authentication-Results: mx.google.com; dkim=pass header.i=@bytedance.com header.s=google header.b=KRTF0ghb; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.35 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=QUARANTINE sp=QUARANTINE dis=NONE) header.from=bytedance.com Received: from out1.vger.email (depot.vger.email [IPv6:2620:137:e000::3:0]) by groat.vger.email (Postfix) with ESMTP id 317EC80A4995; Mon, 25 Sep 2023 03:56:49 -0700 (PDT) X-Virus-Status: Clean X-Virus-Scanned: clamav-milter 0.103.10 at groat.vger.email Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230021AbjIYK4T (ORCPT + 29 others); Mon, 25 Sep 2023 06:56:19 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:38060 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229449AbjIYK4S (ORCPT ); Mon, 25 Sep 2023 06:56:18 -0400 Received: from mail-pg1-x533.google.com (mail-pg1-x533.google.com [IPv6:2607:f8b0:4864:20::533]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 71958AB for ; Mon, 25 Sep 2023 03:56:12 -0700 (PDT) Received: by mail-pg1-x533.google.com with SMTP id 41be03b00d2f7-578d791dd91so4501403a12.0 for ; Mon, 25 Sep 2023 03:56:12 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bytedance.com; s=google; t=1695639372; x=1696244172; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=hrdiacOpaESaqEXHiOrMTWSWkSGBNArwX1oK/BX79n4=; 
b=KRTF0ghbHQmLgnAF6+iQYPHAJa2puc+eAnjl1P4XiCfqDpVjoE0ci+7On5ehlDFPIw R6xnvs9V6NwgXhxUGV7G4OH5JOm+tlIn/cLGibVJw51zLJgOsNpP26XzY4TP6I1e6DgK uk28NftmHFNXunvxqUsTKRJvPh2YRdKcRm22RkbwpJ7neifTE6ldJyki/89hwlxk7Wi9 ibVyURD5xoGxUTCvKts2rUxG5L9gT42YtRl+zj9WtqR8XkpBjdKrH47s72i18eBJgdO8 iy5gcZq3HAXoWz3S2E79SEK7Pcc/v+w7xZCsqZd31AbI0S5oGu92eH9Xr0yfdNPUTqSj foug== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1695639372; x=1696244172; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=hrdiacOpaESaqEXHiOrMTWSWkSGBNArwX1oK/BX79n4=; b=BJW9gn2kYTjT9ZO8g7IeJp1ueaLByJfFkbqESf16WeH4H341zfgZJFRFQY3rE2FO70 Wcq4NH9Y+vDTGpy/KOOElf2K1qcbO5529jtkMDefi9S+nzUIMAUvg5ix4RA3dEp1hvH5 L30m1SfJsdHISqqFJxL/2MtetAOtQTLHal9qQM9dNEn9GYsOlU46koNaAvRKoNvt8pCQ F83+0prleJ2cjpqi0Uf8DIGvUa/kRbye0mYuZxKmVxOknySZhI5ImI25lDrXjuGvSZOi FtkHiNTdYEI4VpWlrk39Fe65+RMvCEIyCPeY5lb8SprsOJqTNp107bNzWnPVtU7S/VfJ BWxw== X-Gm-Message-State: AOJu0Yw0Qy4B2iyJugebmmM+ncDybxLe8OxqlsY3tY64AvUJRFJi9JUM IGwf0cJ/REKQwMq3hvlDuiof4A== X-Received: by 2002:a17:90b:e07:b0:268:798:a28b with SMTP id ge7-20020a17090b0e0700b002680798a28bmr14579618pjb.23.1695639371959; Mon, 25 Sep 2023 03:56:11 -0700 (PDT) Received: from n37-019-243.byted.org ([180.184.51.134]) by smtp.gmail.com with ESMTPSA id y9-20020a17090a16c900b002772faee740sm2297842pje.5.2023.09.25.03.56.09 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 25 Sep 2023 03:56:11 -0700 (PDT) From: Chuyi Zhou To: bpf@vger.kernel.org Cc: ast@kernel.org, daniel@iogearbox.net, andrii@kernel.org, martin.lau@kernel.org, tj@kernel.org, linux-kernel@vger.kernel.org, Chuyi Zhou Subject: [PATCH bpf-next v3 4/7] bpf: Introduce css open-coded iterator kfuncs Date: Mon, 25 Sep 2023 18:55:49 +0800 Message-Id: <20230925105552.817513-5-zhouchuyi@bytedance.com> X-Mailer: git-send-email 2.20.1 In-Reply-To: 
<20230925105552.817513-1-zhouchuyi@bytedance.com> References: <20230925105552.817513-1-zhouchuyi@bytedance.com> MIME-Version: 1.0 X-Spam-Status: No, score=-0.9 required=5.0 tests=DKIM_SIGNED,DKIM_VALID, DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI, SPF_HELO_NONE,SPF_PASS autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on groat.vger.email Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-Greylist: Sender passed SPF test, not delayed by milter-greylist-4.6.4 (groat.vger.email [0.0.0.0]); Mon, 25 Sep 2023 03:56:49 -0700 (PDT) X-getmail-retrieved-from-mailbox: INBOX X-GMAIL-THRID: 1778023857327200132 X-GMAIL-MSGID: 1778023857327200132

This patch adds the kfuncs bpf_iter_css_{new,next,destroy}, which allow creation and manipulation of struct bpf_iter_css in open-coded iterator style. These kfuncs wrap css_next_descendant_{pre,post}. css_iter can be used for:

1) iterating a specific cgroup tree in pre/post/up order
2) iterating a cgroup_subsystem in a BPF prog, like for_each_mem_cgroup_tree/cpuset_for_each_descendant_pre in the kernel.

The API design is consistent with cgroup_iter. bpf_iter_css_new() accepts parameters defining the iteration order and the starting css. Here we also reuse the BPF_CGROUP_ITER_DESCENDANTS_PRE, BPF_CGROUP_ITER_DESCENDANTS_POST and BPF_CGROUP_ITER_ANCESTORS_UP enums.
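The three orders can be illustrated with a small userspace model. This is a sketch only: struct mock_css, the MOCK_* order values and the helper names are hypothetical, and mock_next_pre()/mock_next_post() merely imitate what css_next_descendant_{pre,post} are assumed to do on a plain parent/child/sibling tree, without any locking or RCU.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical order values; the kfunc takes enum bpf_cgroup_iter_order. */
enum mock_order { MOCK_DESC_PRE = 1, MOCK_DESC_POST, MOCK_ANC_UP };

struct mock_css {
	int id;
	struct mock_css *parent, *first_child, *next_sibling;
};

struct mock_css_iter {
	struct mock_css *start, *pos;
	int order;
};

struct mock_css *mock_leftmost(struct mock_css *n)
{
	while (n->first_child)
		n = n->first_child;
	return n;
}

/* Pre-order successor of pos inside the subtree rooted at root. */
struct mock_css *mock_next_pre(struct mock_css *pos, struct mock_css *root)
{
	if (!pos)
		return root;
	if (pos->first_child)
		return pos->first_child;
	while (pos != root) {
		if (pos->next_sibling)
			return pos->next_sibling;
		pos = pos->parent;
	}
	return NULL;
}

/* Post-order successor of pos inside the subtree rooted at root. */
struct mock_css *mock_next_post(struct mock_css *pos, struct mock_css *root)
{
	if (!pos)
		return mock_leftmost(root);
	if (pos == root)
		return NULL;
	if (pos->next_sibling)
		return mock_leftmost(pos->next_sibling);
	return pos->parent;
}

int mock_css_iter_new(struct mock_css_iter *it, struct mock_css *start, int order)
{
	it->start = NULL;
	if (order < MOCK_DESC_PRE || order > MOCK_ANC_UP)
		return -1; /* the patch returns -EINVAL here */
	it->start = start;
	it->pos = NULL;
	it->order = order;
	return 0;
}

struct mock_css *mock_css_iter_next(struct mock_css_iter *it)
{
	if (!it->start)
		return NULL;
	switch (it->order) {
	case MOCK_DESC_PRE:
		it->pos = mock_next_pre(it->pos, it->start);
		break;
	case MOCK_DESC_POST:
		it->pos = mock_next_post(it->pos, it->start);
		break;
	default: /* ancestors-up: start first, then walk the parent chain */
		it->pos = it->pos ? it->pos->parent : it->start;
	}
	return it->pos;
}

/* Fixture: node 1 has children 2 and 3; node 2 has child 4. n[i].id == i + 1. */
void mock_build_tree(struct mock_css n[4])
{
	memset(n, 0, 4 * sizeof(*n));
	for (int i = 0; i < 4; i++)
		n[i].id = i + 1;
	n[0].first_child = &n[1];
	n[1].parent = &n[0];
	n[1].next_sibling = &n[2];
	n[1].first_child = &n[3];
	n[2].parent = &n[0];
	n[3].parent = &n[1];
}

/* Walk once and return the visited ids as decimal digits; -1 on a bad order. */
int mock_css_walk(struct mock_css *start, int order)
{
	struct mock_css_iter it;
	struct mock_css *pos;
	int trace = 0;

	if (mock_css_iter_new(&it, start, order))
		return -1;
	while ((pos = mock_css_iter_next(&it)))
		trace = trace * 10 + pos->id;
	return trace;
}
```

On this fixture a pre-order walk from node 1 visits 1, 2, 4, 3; a post-order walk visits 4, 2, 3, 1; and an ancestors-up walk from node 4 visits 4, 2, 1.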
Signed-off-by: Chuyi Zhou
---
 kernel/bpf/cgroup_iter.c                      | 57 +++++++++++++++++++
 kernel/bpf/helpers.c                          |  3 +
 .../testing/selftests/bpf/bpf_experimental.h  |  6 ++
 3 files changed, 66 insertions(+)

diff --git a/kernel/bpf/cgroup_iter.c b/kernel/bpf/cgroup_iter.c
index 810378f04fbc..ebc3d9471f52 100644
--- a/kernel/bpf/cgroup_iter.c
+++ b/kernel/bpf/cgroup_iter.c
@@ -294,3 +294,60 @@ static int __init bpf_cgroup_iter_init(void)
 }
 
 late_initcall(bpf_cgroup_iter_init);
+
+struct bpf_iter_css {
+	__u64 __opaque[2];
+	__u32 __opaque_int[1];
+} __attribute__((aligned(8)));
+
+struct bpf_iter_css_kern {
+	struct cgroup_subsys_state *start;
+	struct cgroup_subsys_state *pos;
+	int order;
+} __attribute__((aligned(8)));
+
+__bpf_kfunc int bpf_iter_css_new(struct bpf_iter_css *it,
+		struct cgroup_subsys_state *start, enum bpf_cgroup_iter_order order)
+{
+	struct bpf_iter_css_kern *kit = (void *)it;
+	kit->start = NULL;
+	BUILD_BUG_ON(sizeof(struct bpf_iter_css_kern) != sizeof(struct bpf_iter_css));
+	BUILD_BUG_ON(__alignof__(struct bpf_iter_css_kern) != __alignof__(struct bpf_iter_css));
+	switch (order) {
+	case BPF_CGROUP_ITER_DESCENDANTS_PRE:
+	case BPF_CGROUP_ITER_DESCENDANTS_POST:
+	case BPF_CGROUP_ITER_ANCESTORS_UP:
+		break;
+	default:
+		return -EINVAL;
+	}
+
+	kit->start = start;
+	kit->pos = NULL;
+	kit->order = order;
+	return 0;
+}
+
+__bpf_kfunc struct cgroup_subsys_state *bpf_iter_css_next(struct bpf_iter_css *it)
+{
+	struct bpf_iter_css_kern *kit = (void *)it;
+	if (!kit->start)
+		return NULL;
+
+	switch (kit->order) {
+	case BPF_CGROUP_ITER_DESCENDANTS_PRE:
+		kit->pos = css_next_descendant_pre(kit->pos, kit->start);
+		break;
+	case BPF_CGROUP_ITER_DESCENDANTS_POST:
+		kit->pos = css_next_descendant_post(kit->pos, kit->start);
+		break;
+	default:
+		kit->pos = kit->pos ? kit->pos->parent : kit->start;
+	}
+
+	return kit->pos;
+}
+
+__bpf_kfunc void bpf_iter_css_destroy(struct bpf_iter_css *it)
+{
+}
diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index 556262c27a75..9c3af36249a2 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -2510,6 +2510,9 @@ BTF_ID_FLAGS(func, bpf_iter_css_task_destroy, KF_ITER_DESTROY)
 BTF_ID_FLAGS(func, bpf_iter_task_new, KF_ITER_NEW | KF_TRUSTED_ARGS)
 BTF_ID_FLAGS(func, bpf_iter_task_next, KF_ITER_NEXT | KF_RET_NULL)
 BTF_ID_FLAGS(func, bpf_iter_task_destroy, KF_ITER_DESTROY)
+BTF_ID_FLAGS(func, bpf_iter_css_new, KF_ITER_NEW | KF_TRUSTED_ARGS)
+BTF_ID_FLAGS(func, bpf_iter_css_next, KF_ITER_NEXT | KF_RET_NULL)
+BTF_ID_FLAGS(func, bpf_iter_css_destroy, KF_ITER_DESTROY)
 BTF_ID_FLAGS(func, bpf_dynptr_adjust)
 BTF_ID_FLAGS(func, bpf_dynptr_is_null)
 BTF_ID_FLAGS(func, bpf_dynptr_is_rdonly)
diff --git a/tools/testing/selftests/bpf/bpf_experimental.h b/tools/testing/selftests/bpf/bpf_experimental.h
index d989775dbdb5..aa247d1d81d1 100644
--- a/tools/testing/selftests/bpf/bpf_experimental.h
+++ b/tools/testing/selftests/bpf/bpf_experimental.h
@@ -174,4 +174,10 @@ extern int bpf_iter_task_new(struct bpf_iter_task *it, struct task_struct *task,
 extern struct task_struct *bpf_iter_task_next(struct bpf_iter_task *it) __weak __ksym;
 extern void bpf_iter_task_destroy(struct bpf_iter_task *it) __weak __ksym;
 
+struct bpf_iter_css;
+extern int bpf_iter_css_new(struct bpf_iter_css *it,
+		struct cgroup_subsys_state *start, enum bpf_cgroup_iter_order order) __weak __ksym;
+extern struct cgroup_subsys_state *bpf_iter_css_next(struct bpf_iter_css *it) __weak __ksym;
+extern void bpf_iter_css_destroy(struct bpf_iter_css *it) __weak __ksym;
+
 #endif

From patchwork Mon Sep 25 10:55:50 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Chuyi Zhou X-Patchwork-Id: 144405 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by
2002:a59:cae8:0:b0:403:3b70:6f57 with SMTP id r8csp1173364vqu; Mon, 25 Sep 2023 05:30:43 -0700 (PDT) X-Google-Smtp-Source: AGHT+IH5lBOeG8NVNcaDPBZPF2hfj+4U2v8+3YfYhRdfBBry2vMt2SZO60BcK3US/hhSvpu4Hf2K X-Received: by 2002:a05:6a20:9151:b0:15e:b8a1:57b9 with SMTP id x17-20020a056a20915100b0015eb8a157b9mr2570439pzc.24.1695645043197; Mon, 25 Sep 2023 05:30:43 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1695645043; cv=none; d=google.com; s=arc-20160816; b=aFW7a0sUyupQRoGvJVTAuvmTjXkeZ6KKU6F4HmjQGXi5DzUWGtC0dnQ/lPPEFMDP/B /RVD2oQ1ZvcCLrGx7dsV9QUSIZSF18BJjvbgxoGXICtDfP3EwjScbFiJvrKNLjhNGrXp PaX8Nrfsr3WWeU/r6XaJkihXXjEoWvPAdTrSugU8mv9fWWlzXSybOaLSGZQdToP2BAgw +NjSIGxzTef8UL7VhplALbekGTF0P/EaVa4h69XSWctEdRtA4fF7roCAqnbbMtGFjbN4 AEU/J3vG8RDRhguNCgCMAz4ckCFpkfRHgkkgcGOngbUn6XseOVflT5HZ7Nn2nek4UXA0 tR9w== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=gLE+SrQZXFdwPr+sEB0pr0StWC+p/+XlG2q7XgaVDLQ=; fh=DR7g1EcWKOXTEoooUPBSJXUaklSrDEYzv6YDdhz1CwE=; b=pRYAVs1moAUuZFF6z2QRo4RS6chqCgsL5HfOoOs5GX9rBTcyED2JswT/yA3dBJeDS8 D/E7IJ9LSDePMd9Ztk3cRZrDmI3QeGXCW25P5tSQTai+LmuErYF7QaZublsvOAHKwR1N 5MxqdkbSj6WdHVtNIcr9sNJU00hNZwuSzkjMl+g2Bqft62Wxtc00VkImJOj8iqD3QDXL djuIIRHYxDWHZusgunyd3D8Na8BMwi9e6y0y70xCwnyTyD1aRhA/ernpwdP+hOh+1FNG DN6wMGTD/z9RJrpYeQXKBtQfLl+0mpubkra+TzB78sRsiMyV0yTdSyJssTI3sGbFcVcH DBTQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@bytedance.com header.s=google header.b=NvJckRph; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::3:4 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=QUARANTINE sp=QUARANTINE dis=NONE) header.from=bytedance.com Received: from howler.vger.email (howler.vger.email. 
[2620:137:e000::3:4]) by mx.google.com with ESMTPS id z11-20020a6552cb000000b00578b8016c40si9771162pgp.93.2023.09.25.05.30.42 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 25 Sep 2023 05:30:43 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::3:4 as permitted sender) client-ip=2620:137:e000::3:4; Authentication-Results: mx.google.com; dkim=pass header.i=@bytedance.com header.s=google header.b=NvJckRph; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::3:4 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=QUARANTINE sp=QUARANTINE dis=NONE) header.from=bytedance.com Received: from out1.vger.email (depot.vger.email [IPv6:2620:137:e000::3:0]) by howler.vger.email (Postfix) with ESMTP id B0AE3801B93A; Mon, 25 Sep 2023 03:56:47 -0700 (PDT) X-Virus-Status: Clean X-Virus-Scanned: clamav-milter 0.103.10 at howler.vger.email Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230382AbjIYK4q (ORCPT + 29 others); Mon, 25 Sep 2023 06:56:46 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:60092 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230020AbjIYK4m (ORCPT ); Mon, 25 Sep 2023 06:56:42 -0400 Received: from mail-pj1-x102c.google.com (mail-pj1-x102c.google.com [IPv6:2607:f8b0:4864:20::102c]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 7D2F4BE for ; Mon, 25 Sep 2023 03:56:15 -0700 (PDT) Received: by mail-pj1-x102c.google.com with SMTP id 98e67ed59e1d1-27758c8f579so945173a91.0 for ; Mon, 25 Sep 2023 03:56:15 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bytedance.com; s=google; t=1695639375; x=1696244175; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; 
bh=gLE+SrQZXFdwPr+sEB0pr0StWC+p/+XlG2q7XgaVDLQ=; b=NvJckRphZNg9kep0IcwHIeLgk3VAh0dS6Jmu+pbR9sv0DzC8bpQe3UjM30WT6H7biq D+NiYYE4Fl4Nfv1M7MM8Ktu7Mn/Np7aucbzJwKM4BXO7Av69MwgoyoQ4ErX4HC4xbzGE I/Ibgkpibtmec0n2V5rsJp//Vd5ThTBMrT9uRvnpUwp9WjgiIZHCVro7l4eM3RCMiJ4Y 9phhaZl7qpDPiqOdToliJ5Zwq0/lPKA9yvYt7RR42fhB6oCkpQt2jWK2aizqYc2s4Aqo o+6RXVFxR/NMc8M3p0deJrTTI9AjIsYTAH13LWSAEzfuobWBw9Qkau6uuuOM7ZES8Os8 p/Kg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1695639375; x=1696244175; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=gLE+SrQZXFdwPr+sEB0pr0StWC+p/+XlG2q7XgaVDLQ=; b=WW/oQbcT+Ukg6PrRsxDxMXhGQ9DRLt3jLoooU1v8Eq/dlL212V0G6QQsyyH6sxTh33 lIg4pg0ETsTQdrWZxoY34OVIGUHXzu/zrlBGWrK5B5n4c7fCHh7kNaiL2hjp25Pk29Ka BSoPp+xGFyOYE0xazTeRog2yGgB4ukcFXrueLiBpqp8a48D8JDbkle0pyBhNBS/pl6Ih Tw9nj6BrUBk6xyjcgnICpvY5MTp75u05ydudWbNkqLHGPMngciKgcUwXs+bbg6Pnpz6z Ksatn0irIGvqe7WsKEGQs2IvyvwF7hsfqOtgMQUwsvxHXolNYZXlVWjpD6CHkgr8dLBf STyw== X-Gm-Message-State: AOJu0YwMWuVikuSQAzyXiUnROzDz4ibM6htjiTajGZs8pch+Pu+doBX0 tWQ2z+Llv680HPJb6KP368XPDg== X-Received: by 2002:a17:90b:3588:b0:274:b4ce:7049 with SMTP id mm8-20020a17090b358800b00274b4ce7049mr3776534pjb.34.1695639375020; Mon, 25 Sep 2023 03:56:15 -0700 (PDT) Received: from n37-019-243.byted.org ([180.184.51.134]) by smtp.gmail.com with ESMTPSA id y9-20020a17090a16c900b002772faee740sm2297842pje.5.2023.09.25.03.56.12 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 25 Sep 2023 03:56:14 -0700 (PDT) From: Chuyi Zhou To: bpf@vger.kernel.org Cc: ast@kernel.org, daniel@iogearbox.net, andrii@kernel.org, martin.lau@kernel.org, tj@kernel.org, linux-kernel@vger.kernel.org, Chuyi Zhou Subject: [PATCH bpf-next v3 5/7] bpf: teach the verifier to enforce css_iter and task_iter in RCU CS Date: Mon, 25 Sep 2023 18:55:50 +0800 Message-Id: <20230925105552.817513-6-zhouchuyi@bytedance.com> 
X-Mailer: git-send-email 2.20.1 In-Reply-To: <20230925105552.817513-1-zhouchuyi@bytedance.com> References: <20230925105552.817513-1-zhouchuyi@bytedance.com> MIME-Version: 1.0 X-Spam-Status: No, score=-2.1 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_BLOCKED, SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-Greylist: Sender passed SPF test, not delayed by milter-greylist-4.6.4 (howler.vger.email [0.0.0.0]); Mon, 25 Sep 2023 03:56:47 -0700 (PDT) X-getmail-retrieved-from-mailbox: INBOX X-GMAIL-THRID: 1778012696594957082 X-GMAIL-MSGID: 1778012696594957082

css_iter and task_iter should be used in an RCU critical section (CS). Specifically, sleepable progs need an explicit bpf_rcu_read_lock() before using these iters. In normal bpf progs, which have an implicit rcu_read_lock(), it is OK to use them directly.

This patch adds a new KF flag, KF_RCU_PROTECTED, for bpf_iter_task_new and bpf_iter_css_new. It means the kfunc should be used in an RCU CS. We check whether we are in an RCU CS before invoking such a kfunc. If RCU protection is guaranteed, we set st->type = PTR_TO_STACK | MEM_RCU. Once the user does rcu_unlock during the iteration, the MEM_RCU state of the regs is cleared and they are marked PTR_UNTRUSTED. is_iter_reg_valid_init() rejects the iter if reg->type is UNTRUSTED.

It is worth noting that currently bpf_rcu_read_unlock does not clear the state of the STACK_ITER reg, since bpf_for_each_spilled_reg only considers STACK_SPILL. This patch also lets bpf_for_each_spilled_reg search STACK_ITER slots.
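The state tracking described above can be sketched as a tiny userspace model. Everything here is hypothetical and simplified: the MOCK_* flag bits and slot-type numbers are invented, one struct mock_slot stands in for a whole iterator stack slot, and the functions only mirror the idea of mark_stack_slots_iter(), the rcu_unlock handling and is_iter_reg_valid_init(), not their real signatures.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical bit values; the real flags live in the kernel headers. */
#define MOCK_MEM_RCU		(1u << 0)
#define MOCK_PTR_UNTRUSTED	(1u << 1)

/* Hypothetical slot-type numbers, used only to build the bit mask. */
enum mock_slot_type { MOCK_STACK_SPILL = 1, MOCK_STACK_ITER = 2 };

struct mock_slot {
	enum mock_slot_type slot_type;
	unsigned int type; /* models the st->type flag bits */
};

/* iter_new(): tag the slot depending on whether we are inside an RCU CS. */
void mock_mark_iter(struct mock_slot *st, bool kfunc_rcu_protected, bool in_rcu_cs)
{
	st->slot_type = MOCK_STACK_ITER;
	st->type = 0;
	if (kfunc_rcu_protected)
		st->type |= in_rcu_cs ? MOCK_MEM_RCU : MOCK_PTR_UNTRUSTED;
}

/* rcu_unlock: downgrade MEM_RCU slots, but only those selected by mask. */
void mock_rcu_unlock(struct mock_slot *slots, int n, unsigned int mask)
{
	for (int i = 0; i < n; i++) {
		if (!((1u << slots[i].slot_type) & mask))
			continue; /* slot type not covered by the mask */
		if (slots[i].type & MOCK_MEM_RCU) {
			slots[i].type &= ~MOCK_MEM_RCU;
			slots[i].type |= MOCK_PTR_UNTRUSTED;
		}
	}
}

/* iter_next()/iter_destroy(): -EPERM if untrusted, -EINVAL if not an iter. */
int mock_iter_valid_init(const struct mock_slot *st)
{
	if (st->type & MOCK_PTR_UNTRUSTED)
		return -EPERM;
	if (st->slot_type != MOCK_STACK_ITER)
		return -EINVAL;
	return 0;
}
```

In this model, unlocking with a mask of only 1 << MOCK_STACK_SPILL leaves an iterator slot looking valid, which is the gap the commit message points out; adding 1 << MOCK_STACK_ITER to the mask makes the next validity check return -EPERM.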
Signed-off-by: Chuyi Zhou --- include/linux/bpf_verifier.h | 19 ++++++++------ include/linux/btf.h | 1 + kernel/bpf/helpers.c | 4 +-- kernel/bpf/verifier.c | 48 +++++++++++++++++++++++++++--------- 4 files changed, 50 insertions(+), 22 deletions(-) diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h index a3236651ec64..b5cdcc332b0a 100644 --- a/include/linux/bpf_verifier.h +++ b/include/linux/bpf_verifier.h @@ -385,19 +385,18 @@ struct bpf_verifier_state { u32 jmp_history_cnt; }; -#define bpf_get_spilled_reg(slot, frame) \ +#define bpf_get_spilled_reg(slot, frame, mask) \ (((slot < frame->allocated_stack / BPF_REG_SIZE) && \ - (frame->stack[slot].slot_type[0] == STACK_SPILL)) \ + ((1 << frame->stack[slot].slot_type[0]) & (mask))) \ ? &frame->stack[slot].spilled_ptr : NULL) /* Iterate over 'frame', setting 'reg' to either NULL or a spilled register. */ -#define bpf_for_each_spilled_reg(iter, frame, reg) \ - for (iter = 0, reg = bpf_get_spilled_reg(iter, frame); \ +#define bpf_for_each_spilled_reg(iter, frame, reg, mask) \ + for (iter = 0, reg = bpf_get_spilled_reg(iter, frame, mask); \ iter < frame->allocated_stack / BPF_REG_SIZE; \ - iter++, reg = bpf_get_spilled_reg(iter, frame)) + iter++, reg = bpf_get_spilled_reg(iter, frame, mask)) -/* Invoke __expr over regsiters in __vst, setting __state and __reg */ -#define bpf_for_each_reg_in_vstate(__vst, __state, __reg, __expr) \ +#define bpf_for_each_reg_in_vstate_mask(__vst, __state, __reg, __mask, __expr) \ ({ \ struct bpf_verifier_state *___vstate = __vst; \ int ___i, ___j; \ @@ -409,7 +408,7 @@ struct bpf_verifier_state { __reg = &___regs[___j]; \ (void)(__expr); \ } \ - bpf_for_each_spilled_reg(___j, __state, __reg) { \ + bpf_for_each_spilled_reg(___j, __state, __reg, __mask) { \ if (!__reg) \ continue; \ (void)(__expr); \ @@ -417,6 +416,10 @@ struct bpf_verifier_state { } \ }) +/* Invoke __expr over regsiters in __vst, setting __state and __reg */ +#define bpf_for_each_reg_in_vstate(__vst, 
__state, __reg, __expr) \ + bpf_for_each_reg_in_vstate_mask(__vst, __state, __reg, 1 << STACK_SPILL, __expr) + /* linked list of verifier states used to prune search */ struct bpf_verifier_state_list { struct bpf_verifier_state state; diff --git a/include/linux/btf.h b/include/linux/btf.h index 928113a80a95..c2231c64d60b 100644 --- a/include/linux/btf.h +++ b/include/linux/btf.h @@ -74,6 +74,7 @@ #define KF_ITER_NEW (1 << 8) /* kfunc implements BPF iter constructor */ #define KF_ITER_NEXT (1 << 9) /* kfunc implements BPF iter next method */ #define KF_ITER_DESTROY (1 << 10) /* kfunc implements BPF iter destructor */ +#define KF_RCU_PROTECTED (1 << 11) /* kfunc should be protected by rcu cs when they are invoked */ /* * Tag marking a kernel function as a kfunc. This is meant to minimize the diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c index 9c3af36249a2..aa9e03fbfe1a 100644 --- a/kernel/bpf/helpers.c +++ b/kernel/bpf/helpers.c @@ -2507,10 +2507,10 @@ BTF_ID_FLAGS(func, bpf_iter_num_destroy, KF_ITER_DESTROY) BTF_ID_FLAGS(func, bpf_iter_css_task_new, KF_ITER_NEW | KF_TRUSTED_ARGS) BTF_ID_FLAGS(func, bpf_iter_css_task_next, KF_ITER_NEXT | KF_RET_NULL) BTF_ID_FLAGS(func, bpf_iter_css_task_destroy, KF_ITER_DESTROY) -BTF_ID_FLAGS(func, bpf_iter_task_new, KF_ITER_NEW | KF_TRUSTED_ARGS) +BTF_ID_FLAGS(func, bpf_iter_task_new, KF_ITER_NEW | KF_TRUSTED_ARGS | KF_RCU_PROTECTED) BTF_ID_FLAGS(func, bpf_iter_task_next, KF_ITER_NEXT | KF_RET_NULL) BTF_ID_FLAGS(func, bpf_iter_task_destroy, KF_ITER_DESTROY) -BTF_ID_FLAGS(func, bpf_iter_css_new, KF_ITER_NEW | KF_TRUSTED_ARGS) +BTF_ID_FLAGS(func, bpf_iter_css_new, KF_ITER_NEW | KF_TRUSTED_ARGS | KF_RCU_PROTECTED) BTF_ID_FLAGS(func, bpf_iter_css_next, KF_ITER_NEXT | KF_RET_NULL) BTF_ID_FLAGS(func, bpf_iter_css_destroy, KF_ITER_DESTROY) BTF_ID_FLAGS(func, bpf_dynptr_adjust) diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c index 2367483bf4c2..a065e18a0b3a 100644 --- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c 
@@ -1172,7 +1172,12 @@ static bool is_dynptr_type_expected(struct bpf_verifier_env *env, struct bpf_reg static void __mark_reg_known_zero(struct bpf_reg_state *reg); +static bool in_rcu_cs(struct bpf_verifier_env *env); + +static bool is_kfunc_rcu_protected(struct bpf_kfunc_call_arg_meta *meta); + static int mark_stack_slots_iter(struct bpf_verifier_env *env, + struct bpf_kfunc_call_arg_meta *meta, struct bpf_reg_state *reg, int insn_idx, struct btf *btf, u32 btf_id, int nr_slots) { @@ -1193,6 +1198,12 @@ static int mark_stack_slots_iter(struct bpf_verifier_env *env, __mark_reg_known_zero(st); st->type = PTR_TO_STACK; /* we don't have dedicated reg type */ + if (is_kfunc_rcu_protected(meta)) { + if (in_rcu_cs(env)) + st->type |= MEM_RCU; + else + st->type |= PTR_UNTRUSTED; + } st->live |= REG_LIVE_WRITTEN; st->ref_obj_id = i == 0 ? id : 0; st->iter.btf = btf; @@ -1267,7 +1278,7 @@ static bool is_iter_reg_valid_uninit(struct bpf_verifier_env *env, return true; } -static bool is_iter_reg_valid_init(struct bpf_verifier_env *env, struct bpf_reg_state *reg, +static int is_iter_reg_valid_init(struct bpf_verifier_env *env, struct bpf_reg_state *reg, struct btf *btf, u32 btf_id, int nr_slots) { struct bpf_func_state *state = func(env, reg); @@ -1275,26 +1286,28 @@ static bool is_iter_reg_valid_init(struct bpf_verifier_env *env, struct bpf_reg_ spi = iter_get_spi(env, reg, nr_slots); if (spi < 0) - return false; + return -EINVAL; for (i = 0; i < nr_slots; i++) { struct bpf_stack_state *slot = &state->stack[spi - i]; struct bpf_reg_state *st = &slot->spilled_ptr; + if (st->type & PTR_UNTRUSTED) + return -EPERM; /* only main (first) slot has ref_obj_id set */ if (i == 0 && !st->ref_obj_id) - return false; + return -EINVAL; if (i != 0 && st->ref_obj_id) - return false; + return -EINVAL; if (st->iter.btf != btf || st->iter.btf_id != btf_id) - return false; + return -EINVAL; for (j = 0; j < BPF_REG_SIZE; j++) if (slot->slot_type[j] != STACK_ITER) - return false; + return 
-EINVAL; } - return true; + return 0; } /* Check if given stack slot is "special": @@ -7503,15 +7516,20 @@ static int process_iter_arg(struct bpf_verifier_env *env, int regno, int insn_id return err; } - err = mark_stack_slots_iter(env, reg, insn_idx, meta->btf, btf_id, nr_slots); + err = mark_stack_slots_iter(env, meta, reg, insn_idx, meta->btf, btf_id, nr_slots); if (err) return err; } else { /* iter_next() or iter_destroy() expect initialized iter state*/ - if (!is_iter_reg_valid_init(env, reg, meta->btf, btf_id, nr_slots)) { - verbose(env, "expected an initialized iter_%s as arg #%d\n", + err = is_iter_reg_valid_init(env, reg, meta->btf, btf_id, nr_slots); + switch (err) { + case -EINVAL: + verbose(env, "expected an initialized iter_%s as arg #%d or without bpf_rcu_read_lock()\n", iter_type_str(meta->btf, btf_id), regno); - return -EINVAL; + return err; + case -EPERM: + verbose(env, "expected an RCU CS when using %s\n", meta->func_name); + return err; } spi = iter_get_spi(env, reg, nr_slots); @@ -10092,6 +10110,11 @@ static bool is_kfunc_rcu(struct bpf_kfunc_call_arg_meta *meta) return meta->kfunc_flags & KF_RCU; } +static bool is_kfunc_rcu_protected(struct bpf_kfunc_call_arg_meta *meta) +{ + return meta->kfunc_flags & KF_RCU_PROTECTED; +} + static bool __kfunc_param_match_suffix(const struct btf *btf, const struct btf_param *arg, const char *suffix) @@ -11428,6 +11451,7 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn, if (env->cur_state->active_rcu_lock) { struct bpf_func_state *state; struct bpf_reg_state *reg; + u32 clear_mask = (1 << STACK_SPILL) | (1 << STACK_ITER); if (in_rbtree_lock_required_cb(env) && (rcu_lock || rcu_unlock)) { verbose(env, "Calling bpf_rcu_read_{lock,unlock} in unnecessary rbtree callback\n"); @@ -11438,7 +11462,7 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn, verbose(env, "nested rcu read lock (kernel function %s)\n", func_name); return -EINVAL; } else if 
(rcu_unlock) { - bpf_for_each_reg_in_vstate(env->cur_state, state, reg, ({ + bpf_for_each_reg_in_vstate_mask(env->cur_state, state, reg, clear_mask, ({ if (reg->type & MEM_RCU) { reg->type &= ~(MEM_RCU | PTR_MAYBE_NULL); reg->type |= PTR_UNTRUSTED;

From patchwork Mon Sep 25 10:55:51 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Chuyi Zhou
X-Patchwork-Id: 144532
From: Chuyi Zhou
To: bpf@vger.kernel.org
Cc: ast@kernel.org, daniel@iogearbox.net, andrii@kernel.org, martin.lau@kernel.org, tj@kernel.org, linux-kernel@vger.kernel.org, Chuyi Zhou
Subject: [PATCH bpf-next v3 6/7] bpf: Let bpf_iter_task_new accept null task ptr
Date: Mon, 25 Sep 2023 18:55:51 +0800
Message-Id: <20230925105552.817513-7-zhouchuyi@bytedance.com>
X-Mailer: git-send-email 2.20.1
In-Reply-To: <20230925105552.817513-1-zhouchuyi@bytedance.com>
References: <20230925105552.817513-1-zhouchuyi@bytedance.com>

When using task_iter to iterate over all threads of a specific task, we
require the user to pass a valid task pointer to ensure safety. However,
when iterating over all threads/processes in the system, the BPF verifier
still requires a valid pointer rather than a "nullable" one, even though
the pointer is not used in that case, which is surprising from a
usability standpoint. It would be nice to let the kfunc accept an
explicit null pointer when using BPF_TASK_ITER_ALL/BPF_TASK_ITER_PROC and
require a valid pointer only when using BPF_TASK_ITER_THREAD.

Given a trivial kfunc:
	__bpf_kfunc void FN(struct TYPE_A *obj)

the BPF verifier rejects a program that passes a null pointer for obj.
The error message is
"arg#x pointer type xx xx must point to scalar, or struct with scalar",
reported by get_kfunc_ptr_arg_type().
The reg->type is SCALAR_VALUE and the BTF type of ref_t is neither a
scalar nor a scalar struct, so get_kfunc_ptr_arg_type() rejects the
argument.

This patch reuses the __opt annotation, which was previously used to
indicate that the buffer associated with an __sz or __szk argument may be
null:
	__bpf_kfunc void FN(struct TYPE_A *obj__opt)

Here __opt indicates that obj is optional: the caller can pass either an
explicit null pointer or a normal TYPE_A pointer. In
get_kfunc_ptr_arg_type(), we detect whether the current argument is
optional and the register is null. If so, we return a new
kfunc_ptr_arg_type, KF_ARG_PTR_TO_NULL, and skip to the next argument in
check_kfunc_args().

Signed-off-by: Chuyi Zhou
---
 kernel/bpf/task_iter.c |  7 +++++--
 kernel/bpf/verifier.c  | 13 ++++++++++++-
 2 files changed, 17 insertions(+), 3 deletions(-)

diff --git a/kernel/bpf/task_iter.c b/kernel/bpf/task_iter.c index 9bcd3f9922b1..7ac007f161cc 100644 --- a/kernel/bpf/task_iter.c +++ b/kernel/bpf/task_iter.c @@ -867,7 +867,7 @@ struct bpf_iter_task_kern { unsigned int type; } __attribute__((aligned(8))); -__bpf_kfunc int bpf_iter_task_new(struct bpf_iter_task *it, struct task_struct *task, unsigned int type) +__bpf_kfunc int bpf_iter_task_new(struct bpf_iter_task *it, struct task_struct *task__opt, unsigned int type) { struct bpf_iter_task_kern *kit = (void *)it; BUILD_BUG_ON(sizeof(struct bpf_iter_task_kern) != sizeof(struct bpf_iter_task)); @@ -877,14 +877,17 @@ __bpf_kfunc int bpf_iter_task_new(struct bpf_iter_task *it, struct task_struct * switch (type) { case BPF_TASK_ITER_ALL: case BPF_TASK_ITER_PROC: + break; case BPF_TASK_ITER_THREAD: + if (!task__opt) + return -EINVAL; break; default: return -EINVAL; } if (type == BPF_TASK_ITER_THREAD) - kit->task = task; + kit->task = task__opt; else kit->task = &init_task; kit->pos = kit->task; diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c index a065e18a0b3a..a79204c75a90 100644 --- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -10331,6 +10331,7 @@ enum kfunc_ptr_arg_type
{ KF_ARG_PTR_TO_CALLBACK, KF_ARG_PTR_TO_RB_ROOT, KF_ARG_PTR_TO_RB_NODE, + KF_ARG_PTR_TO_NULL, }; enum special_kfunc_type { @@ -10425,6 +10426,12 @@ static bool is_kfunc_bpf_rcu_read_unlock(struct bpf_kfunc_call_arg_meta *meta) return meta->func_id == special_kfunc_list[KF_bpf_rcu_read_unlock]; } +static inline bool is_kfunc_arg_optional_null(struct bpf_reg_state *reg, + const struct btf *btf, const struct btf_param *arg) +{ + return register_is_null(reg) && is_kfunc_arg_optional(btf, arg); +} + static enum kfunc_ptr_arg_type get_kfunc_ptr_arg_type(struct bpf_verifier_env *env, struct bpf_kfunc_call_arg_meta *meta, @@ -10497,6 +10504,8 @@ get_kfunc_ptr_arg_type(struct bpf_verifier_env *env, */ if (!btf_type_is_scalar(ref_t) && !__btf_type_is_scalar_struct(env, meta->btf, ref_t, 0) && (arg_mem_size ? !btf_type_is_void(ref_t) : 1)) { + if (is_kfunc_arg_optional_null(reg, meta->btf, &args[argno])) + return KF_ARG_PTR_TO_NULL; verbose(env, "arg#%d pointer type %s %s must point to %sscalar, or struct with scalar\n", argno, btf_type_str(ref_t), ref_tname, arg_mem_size ? 
"void, " : ""); return -EINVAL; @@ -11028,7 +11037,7 @@ static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_ } if ((is_kfunc_trusted_args(meta) || is_kfunc_rcu(meta)) && - (register_is_null(reg) || type_may_be_null(reg->type))) { + (register_is_null(reg) || type_may_be_null(reg->type)) && !is_kfunc_arg_optional(meta->btf, &args[i])) { verbose(env, "Possibly NULL pointer passed to trusted arg%d\n", i); return -EACCES; } @@ -11053,6 +11062,8 @@ static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_ return kf_arg_type; switch (kf_arg_type) { + case KF_ARG_PTR_TO_NULL: + continue; case KF_ARG_PTR_TO_ALLOC_BTF_ID: case KF_ARG_PTR_TO_BTF_ID: if (!is_kfunc_trusted_args(meta) && !is_kfunc_rcu(meta)) From patchwork Mon Sep 25 10:55:52 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Chuyi Zhou X-Patchwork-Id: 144450 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a05:6359:6f87:b0:13f:353d:d1ed with SMTP id tl7csp1149777rwb; Mon, 25 Sep 2023 07:43:59 -0700 (PDT) X-Google-Smtp-Source: AGHT+IGWtHk1sRz1K1xfLhkihX3QZ6Mmbq1hNs74DPiMXgg0OWobB26HAy6WRLXgdoW7IrGk5Mms X-Received: by 2002:a05:6808:2385:b0:3a5:a4b4:f93e with SMTP id bp5-20020a056808238500b003a5a4b4f93emr8900984oib.7.1695653039328; Mon, 25 Sep 2023 07:43:59 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1695653039; cv=none; d=google.com; s=arc-20160816; b=U55i9NhrII8Lt+psV9ez+FSmW2XbxdYoBlj0OXvNX+GtNuTr+jL6SWLV2/JSzw8naL 3Fvoy6e85d8HTZrAXjAeeOJAU6KJffKaOLDuYIoM2IbgXFXc+wCEFeo+5XBO33+OFgCX bLaPg8PyhdCViVuxVKJzWCzMjzByqS/d2fLzyQK2y0AgYZulLxaKULflxSIcSDeKEmrQ ZBcExscSJkNL3CA12WnAY8Wd7ARZR3VhQ1ilcEugArPCJ49ExAtHMvickMYB0ywS5FbM j3fU2h6M8VaBJXnhKGiuEHCsCXNyEzmJG4vb87YS83Sb2SqSeAVRjihs+0NuF+JvQhdY j0FA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version 
:references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=PeexHTe4tyT9b8KjfC5Hux5b6TXgeqGsjdg9N5cei7A=; fh=DR7g1EcWKOXTEoooUPBSJXUaklSrDEYzv6YDdhz1CwE=; b=rvs8APO3LRC/cssFqrcqtWGnaf8FZF2BYAr2mC86tEsGAqOBMRkbu8yk7A0eEyozLn z5yirMQwGaYvH8bSbgbnM3GeuivZItYFB4WUww+3kWbC9a+3VJx1hgubRk1avigMYK3O L+ZQBLJCtXVi8T3ZRC7XGI6ziSc15r0bxCYy50/RiCqD27X62iqq6BHSlH+alskXbR/9 f10z6eOlh1XiQ51B5PFhhMZmipX67PmNuxX2tEqDNVluVAK44222/DHm1Sq2dwGMwIUY xN8tBl6jjOnZs5kiJY4JFy9XfHt2Rco6D6P+c/HDV5ttrH7t/L9XtEWgQhWwyFeNdmtu bL0A==
From: Chuyi Zhou
To: bpf@vger.kernel.org
Cc: ast@kernel.org, daniel@iogearbox.net, andrii@kernel.org, martin.lau@kernel.org, tj@kernel.org, linux-kernel@vger.kernel.org, Chuyi Zhou
Subject: [PATCH bpf-next v3 7/7] selftests/bpf: Add tests for open-coded task and css iter
Date: Mon, 25 Sep 2023 18:55:52 +0800
Message-Id: <20230925105552.817513-8-zhouchuyi@bytedance.com>
X-Mailer: git-send-email 2.20.1
In-Reply-To: <20230925105552.817513-1-zhouchuyi@bytedance.com>
References: <20230925105552.817513-1-zhouchuyi@bytedance.com>

This patch adds three subtests to demonstrate these patterns and validate
their correctness.

subtest1:
1) We use task_iter to iterate over all processes in the system and
search for the current process with a given pid.
2) We create some threads in the context of the current process and use
BPF_TASK_ITER_THREAD to iterate over all threads of the current process.
As expected, we find all the threads of the current process.
3) We create some threads and use BPF_TASK_ITER_ALL to iterate over all
threads in the system. As expected, we find all the threads that were
created.

subtest2:
We create a cgroup and add the current task to it. In the BPF program, we
use bpf_for_each(css_task, task, css) to iterate over all tasks in the
cgroup. As expected, we find the current process.

subtest3:
1) We create a cgroup tree. In the BPF program, we use
bpf_for_each(css, pos, root, XXX) to iterate over all descendants of the
root in pre-order and in post-order. As expected, we find all
descendants; the last cgroup visited in post-order is the root cgroup,
and the first cgroup visited in pre-order is the root cgroup.
2) We use BPF_CGROUP_ITER_ANCESTORS_UP to traverse the cgroup tree
starting from the leaf and from the root separately, and record the
height of each walk. The difference between the two heights should be the
total height of the tree minus 1.

Signed-off-by: Chuyi Zhou
---
 .../testing/selftests/bpf/prog_tests/iters.c  | 161 ++++++++++++++++++
 .../testing/selftests/bpf/progs/iters_task.c  | 132 ++++++++++++++
 .../selftests/bpf/progs/iters_task_failure.c  | 103 +++++++++++
 3 files changed, 396 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/progs/iters_task.c
 create mode 100644 tools/testing/selftests/bpf/progs/iters_task_failure.c

diff --git a/tools/testing/selftests/bpf/prog_tests/iters.c b/tools/testing/selftests/bpf/prog_tests/iters.c index 10804ae5ae97..f5bb3c5887db 100644 --- a/tools/testing/selftests/bpf/prog_tests/iters.c +++ b/tools/testing/selftests/bpf/prog_tests/iters.c @@ -1,13 +1,22 @@ // SPDX-License-Identifier: GPL-2.0 /* Copyright (c) 2023 Meta Platforms, Inc. and affiliates.
*/ +#include +#include +#include +#include +#include +#include #include +#include "cgroup_helpers.h" #include "iters.skel.h" #include "iters_state_safety.skel.h" #include "iters_looping.skel.h" #include "iters_num.skel.h" #include "iters_testmod_seq.skel.h" +#include "iters_task.skel.h" +#include "iters_task_failure.skel.h" static void subtest_num_iters(void) { @@ -90,6 +99,151 @@ static void subtest_testmod_seq_iters(void) iters_testmod_seq__destroy(skel); } +static pthread_mutex_t do_nothing_mutex; + +static void *do_nothing_wait(void *arg) +{ + pthread_mutex_lock(&do_nothing_mutex); + pthread_mutex_unlock(&do_nothing_mutex); + + pthread_exit(arg); +} + +#define thread_num 5 + +static void subtest_task_iters(void) +{ + struct iters_task *skel; + pthread_t thread_ids[thread_num]; + void *ret; + int err; + + skel = iters_task__open(); + if (!ASSERT_OK_PTR(skel, "skel_open")) + goto cleanup; + bpf_program__set_autoload(skel->progs.iter_task_for_each_sleep, true); + err = iters_task__load(skel); + if (!ASSERT_OK(err, "skel_load")) + goto cleanup; + skel->bss->target_pid = getpid(); + err = iters_task__attach(skel); + if (!ASSERT_OK(err, "iters_task__attach")) + goto cleanup; + pthread_mutex_lock(&do_nothing_mutex); + for (int i = 0; i < thread_num; i++) + ASSERT_OK(pthread_create(&thread_ids[i], NULL, &do_nothing_wait, NULL), "pthread_create"); + + syscall(SYS_getpgid); + iters_task__detach(skel); + ASSERT_EQ(skel->bss->process_cnt, 1, "process_cnt"); + ASSERT_EQ(skel->bss->thread_cnt, thread_num + 1, "thread_cnt"); + ASSERT_EQ(skel->bss->all_thread_cnt, thread_num + 1, "all_thread_cnt"); + pthread_mutex_unlock(&do_nothing_mutex); + for (int i = 0; i < thread_num; i++) + pthread_join(thread_ids[i], &ret); +cleanup: + iters_task__destroy(skel); +} + +extern int stack_mprotect(void); + +static void subtest_css_task_iters(void) +{ + struct iters_task *skel; + int err, cg_fd, cg_id; + const char *cgrp_path = "/cg1"; + + err = setup_cgroup_environment(); + if 
(!ASSERT_OK(err, "setup_cgroup_environment")) + goto cleanup; + cg_fd = create_and_get_cgroup(cgrp_path); + if (!ASSERT_GE(cg_fd, 0, "cg_create")) + goto cleanup; + cg_id = get_cgroup_id(cgrp_path); + err = join_cgroup(cgrp_path); + if (!ASSERT_OK(err, "setup_cgroup_environment")) + goto cleanup; + + skel = iters_task__open(); + if (!ASSERT_OK_PTR(skel, "skel_open")) + goto cleanup; + + bpf_program__set_autoload(skel->progs.iter_css_task_for_each, true); + err = iters_task__load(skel); + if (!ASSERT_OK(err, "skel_load")) + goto cleanup; + + skel->bss->target_pid = getpid(); + skel->bss->root_cg_id = cg_id; + err = iters_task__attach(skel); + + err = stack_mprotect(); + if (!ASSERT_OK(err, "iters_task__attach")) + goto cleanup; + + iters_task__detach(skel); + ASSERT_EQ(skel->bss->css_task_cnt, 1, "css_task_cnt"); + +cleanup: + cleanup_cgroup_environment(); + iters_task__destroy(skel); +} + +static void subtest_css_iters(void) +{ + struct iters_task *skel; + struct { + const char *path; + int fd; + } cgs[] = { + { "/cg1" }, + { "/cg1/cg2" }, + { "/cg1/cg2/cg3" }, + { "/cg1/cg2/cg3/cg4" }, + }; + int err, cg_nr = ARRAY_SIZE(cgs); + int i; + + err = setup_cgroup_environment(); + if (!ASSERT_OK(err, "setup_cgroup_environment")) + goto cleanup; + for (i = 0; i < cg_nr; i++) { + cgs[i].fd = create_and_get_cgroup(cgs[i].path); + if (!ASSERT_GE(cgs[i].fd, 0, "cg_create")) + goto cleanup; + } + + skel = iters_task__open(); + if (!ASSERT_OK_PTR(skel, "skel_open")) + goto cleanup; + bpf_program__set_autoload(skel->progs.iter_css_for_each, true); + err = iters_task__load(skel); + if (!ASSERT_OK(err, "skel_load")) + goto cleanup; + + skel->bss->target_pid = getpid(); + skel->bss->root_cg_id = get_cgroup_id(cgs[0].path); + skel->bss->leaf_cg_id = get_cgroup_id(cgs[cg_nr - 1].path); + err = iters_task__attach(skel); + + if (!ASSERT_OK(err, "iters_task__attach")) + goto cleanup; + + syscall(SYS_getpgid); + ASSERT_EQ(skel->bss->pre_css_dec_cnt, cg_nr, "pre order search dec count"); 
+ ASSERT_EQ(skel->bss->first_cg_id, get_cgroup_id(cgs[0].path), + "pre order search first cgroup id"); + + ASSERT_EQ(skel->bss->post_css_dec_cnt, cg_nr, "post order search dec count"); + ASSERT_EQ(skel->bss->last_cg_id, get_cgroup_id(cgs[0].path), + "post order search last cgroup id"); + ASSERT_EQ(skel->bss->tree_high, cg_nr - 1, "tree high"); + iters_task__detach(skel); +cleanup: + cleanup_cgroup_environment(); + iters_task__destroy(skel); +} + void test_iters(void) { RUN_TESTS(iters_state_safety); @@ -103,4 +257,11 @@ void test_iters(void) subtest_num_iters(); if (test__start_subtest("testmod_seq")) subtest_testmod_seq_iters(); + if (test__start_subtest("task")) + subtest_task_iters(); + if (test__start_subtest("css_task")) + subtest_css_task_iters(); + if (test__start_subtest("css")) + subtest_css_iters(); + RUN_TESTS(iters_task_failure); } diff --git a/tools/testing/selftests/bpf/progs/iters_task.c b/tools/testing/selftests/bpf/progs/iters_task.c new file mode 100644 index 000000000000..0bf922fc750f --- /dev/null +++ b/tools/testing/selftests/bpf/progs/iters_task.c @@ -0,0 +1,132 @@ +// SPDX-License-Identifier: GPL-2.0 +#include "vmlinux.h" +#include +#include +#include "bpf_misc.h" +#include "bpf_experimental.h" + +char _license[] SEC("license") = "GPL"; + +pid_t target_pid = 0; +int process_cnt = 0; +int thread_cnt = 0; +int all_thread_cnt = 0; +int css_task_cnt = 0; +int post_css_dec_cnt = 0; +int pre_css_dec_cnt = 0; +int tree_high = 0; + +u64 last_cg_id; +u64 first_cg_id; + +u64 root_cg_id; +u64 leaf_cg_id; + + +struct cgroup *bpf_cgroup_from_id(u64 cgid) __ksym; +struct cgroup *bpf_cgroup_acquire(struct cgroup *cgrp) __ksym; +void bpf_cgroup_release(struct cgroup *p) __ksym; +void bpf_rcu_read_lock(void) __ksym; +void bpf_rcu_read_unlock(void) __ksym; + +SEC("?fentry.s/" SYS_PREFIX "sys_getpgid") +int iter_task_for_each_sleep(void *ctx) +{ + struct task_struct *pos; + struct task_struct *cur_task = bpf_get_current_task_btf(); + + if (cur_task->pid != 
target_pid) + return 0; + bpf_rcu_read_lock(); + bpf_for_each(task, pos, NULL, BPF_TASK_ITER_PROC) { + if (pos->pid == target_pid) + process_cnt += 1; + } + bpf_for_each(task, pos, cur_task, BPF_TASK_ITER_THREAD) { + thread_cnt += 1; + } + bpf_for_each(task, pos, NULL, BPF_TASK_ITER_ALL) { + if (pos->tgid == target_pid) + all_thread_cnt += 1; + } + bpf_rcu_read_unlock(); + return 0; +} + +SEC("?lsm/file_mprotect") +int BPF_PROG(iter_css_task_for_each) +{ + struct task_struct *task; + struct task_struct *cur_task = bpf_get_current_task_btf(); + + if (cur_task->pid != target_pid) + return 0; + + struct cgroup *cgrp = bpf_cgroup_from_id(root_cg_id); + + if (cgrp == NULL) + return 0; + struct cgroup_subsys_state *css = &cgrp->self; + + bpf_for_each(css_task, task, css, CSS_TASK_ITER_PROCS) { + if (!task) + continue; + if (task->pid == target_pid) + css_task_cnt += 1; + } + bpf_cgroup_release(cgrp); + return 0; +} + +SEC("?fentry.s/" SYS_PREFIX "sys_getpgid") +int iter_css_for_each(const void *ctx) +{ + struct task_struct *cur_task = bpf_get_current_task_btf(); + + if (cur_task->pid != target_pid) + return 0; + + struct cgroup *root_cgrp = bpf_cgroup_from_id(root_cg_id); + + if (!root_cgrp) + return 0; + + struct cgroup *leaf_cgrp = bpf_cgroup_from_id(leaf_cg_id); + + if (!leaf_cgrp) { + bpf_cgroup_release(root_cgrp); + return 0; + } + struct cgroup_subsys_state *root_css = &root_cgrp->self; + struct cgroup_subsys_state *leaf_css = &leaf_cgrp->self; + struct cgroup_subsys_state *pos = NULL; + + bpf_rcu_read_lock(); + + bpf_for_each(css, pos, root_css, BPF_CGROUP_ITER_DESCENDANTS_POST) { + struct cgroup *cur_cgrp = pos->cgroup; + + post_css_dec_cnt += 1; + if (cur_cgrp) + last_cg_id = cur_cgrp->kn->id; + } + + bpf_for_each(css, pos, root_css, BPF_CGROUP_ITER_DESCENDANTS_PRE) { + struct cgroup *cur_cgrp = pos->cgroup; + + pre_css_dec_cnt += 1; + if (cur_cgrp && !first_cg_id) + first_cg_id = cur_cgrp->kn->id; + } + + bpf_for_each(css, pos, leaf_css, 
BPF_CGROUP_ITER_ANCESTORS_UP) + tree_high += 1; + + bpf_for_each(css, pos, root_css, BPF_CGROUP_ITER_ANCESTORS_UP) + tree_high -= 1; + + bpf_rcu_read_unlock(); + bpf_cgroup_release(root_cgrp); + bpf_cgroup_release(leaf_cgrp); + return 0; +} diff --git a/tools/testing/selftests/bpf/progs/iters_task_failure.c b/tools/testing/selftests/bpf/progs/iters_task_failure.c new file mode 100644 index 000000000000..40eb2704d94f --- /dev/null +++ b/tools/testing/selftests/bpf/progs/iters_task_failure.c @@ -0,0 +1,103 @@ +// SPDX-License-Identifier: GPL-2.0 +#include "vmlinux.h" +#include +#include +#include "bpf_misc.h" +#include "bpf_experimental.h" + +char _license[] SEC("license") = "GPL"; + +struct cgroup *bpf_cgroup_from_id(u64 cgid) __ksym; +struct cgroup *bpf_cgroup_acquire(struct cgroup *cgrp) __ksym; +void bpf_cgroup_release(struct cgroup *p) __ksym; +void bpf_rcu_read_lock(void) __ksym; +void bpf_rcu_read_unlock(void) __ksym; + +SEC("?fentry.s/" SYS_PREFIX "sys_getpgid") +__failure __msg("expected an RCU CS when using bpf_iter_task_next") +int BPF_PROG(iter_tasks_without_lock) +{ + struct task_struct *pos; + + bpf_for_each(task, pos, NULL, BPF_TASK_ITER_PROC) { + + } + return 0; +} + +SEC("?fentry.s/" SYS_PREFIX "sys_getpgid") +__failure __msg("expected an RCU CS when using bpf_iter_css_next") +int BPF_PROG(iter_css_without_lock) +{ + u64 cg_id = 0; + struct cgroup *cgrp = bpf_cgroup_from_id(cg_id); + + if (!cgrp) + return 0; + struct cgroup_subsys_state *root_css = &cgrp->self; + struct cgroup_subsys_state *pos; + + bpf_for_each(css, pos, root_css, BPF_CGROUP_ITER_DESCENDANTS_POST) { + + } + bpf_cgroup_release(cgrp); + return 0; +} + +SEC("?fentry.s/" SYS_PREFIX "sys_getpgid") +__failure __msg("expected an RCU CS when using bpf_iter_task_next") +int BPF_PROG(iter_tasks_lock_and_unlock) +{ + struct task_struct *pos; + + bpf_rcu_read_lock(); + bpf_for_each(task, pos, NULL, BPF_TASK_ITER_PROC) { + bpf_rcu_read_unlock(); + + bpf_rcu_read_lock(); + } + 
bpf_rcu_read_unlock(); + return 0; +} + +SEC("?fentry.s/" SYS_PREFIX "sys_getpgid") +__failure __msg("expected an RCU CS when using bpf_iter_css_next") +int BPF_PROG(iter_css_lock_and_unlock) +{ + u64 cg_id = 0; + struct cgroup *cgrp = bpf_cgroup_from_id(cg_id); + + if (!cgrp) + return 0; + struct cgroup_subsys_state *root_css = &cgrp->self; + struct cgroup_subsys_state *pos; + + bpf_rcu_read_lock(); + bpf_for_each(css, pos, root_css, BPF_CGROUP_ITER_DESCENDANTS_POST) { + bpf_rcu_read_unlock(); + + bpf_rcu_read_lock(); + } + bpf_rcu_read_unlock(); + bpf_cgroup_release(cgrp); + return 0; +} + +SEC("?fentry.s/" SYS_PREFIX "sys_getpgid") +__failure __msg("css_task_iter is only allowed in bpf_lsm and bpf iter-s") +int BPF_PROG(iter_css_task_for_each) +{ + struct task_struct *task; + int cg_id = bpf_get_current_cgroup_id(); + struct cgroup *cgrp = bpf_cgroup_from_id(cg_id); + + if (cgrp == NULL) + return 0; + struct cgroup_subsys_state *css = &cgrp->self; + + bpf_for_each(css_task, task, css, CSS_TASK_ITER_PROCS) { + + } + bpf_cgroup_release(cgrp); + return 0; +}
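
For reference, the height arithmetic that the css subtest relies on (the
tree_high counter in iters_task.c) can be checked outside the kernel. The
sketch below is a userspace model and not code from this series: struct
node, count_ancestors_up() and height_diff() are hypothetical names, and
it assumes that a BPF_CGROUP_ITER_ANCESTORS_UP walk visits the starting
css first and then each parent up to the top of the hierarchy.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for a cgroup_subsys_state: only the parent link
 * matters for an ancestors-up walk.
 */
struct node {
	const struct node *parent;
};

/*
 * Count the nodes visited when walking from start to the top of the
 * hierarchy, start included. This models (by assumption) what one
 * bpf_for_each(css, pos, start, BPF_CGROUP_ITER_ANCESTORS_UP) loop counts.
 */
static int count_ancestors_up(const struct node *start)
{
	int n = 0;

	for (const struct node *pos = start; pos; pos = pos->parent)
		n++;
	return n;
}

/*
 * Same arithmetic as tree_high in iters_task.c: +1 for every node seen
 * from the leaf, -1 for every node seen from the root. Ancestors shared
 * by both walks cancel out.
 */
static int height_diff(const struct node *root, const struct node *leaf)
{
	assert(root != NULL && leaf != NULL);
	return count_ancestors_up(leaf) - count_ancestors_up(root);
}
```

With the four-level chain /cg1, /cg1/cg2, /cg1/cg2/cg3, /cg1/cg2/cg3/cg4
used by subtest_css_iters(), the leaf walk sees three more nodes than the
root walk, however many ancestors sit above /cg1. Under the assumption
above, that is why the test expects tree_high == cg_nr - 1.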