From patchwork Thu Jul 27 07:36:28 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Chuyi Zhou X-Patchwork-Id: 126764 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:a985:0:b0:3e4:2afc:c1 with SMTP id t5csp929640vqo; Thu, 27 Jul 2023 00:58:14 -0700 (PDT) X-Google-Smtp-Source: APBJJlGl6BQUzx+T7M/SbdHu2+bHmmWEG2dxcazLSmz/0AfaSs7JyddPbbkAWyPdgp5qACg8SgUH X-Received: by 2002:a17:907:72d4:b0:99b:ce01:457 with SMTP id du20-20020a17090772d400b0099bce010457mr1795154ejc.34.1690444694288; Thu, 27 Jul 2023 00:58:14 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1690444694; cv=none; d=google.com; s=arc-20160816; b=s27RMX8g3DxQVXqd77iLSw1QC5tsG7XIX2xkEtolu/7fQ4WoEtLf3/0FWfrGV6tni+ nwVIRMmuheV5CQDgCkkUZpDXlT2y3ZuBU7FSrKKZjswqjaGH7TKeqdMmNORfbu/Er6aw IY4/G7Ti3OizlJyRC87hoUZkTWhV9wUPg4xqzARPOUsBZjou3d2kiqHUzkSw+Z6KAz7f FzqE6kn3/cD/EgtuYj7Sz7EeoS41VKy70qm9tmcxKADd4uYqZRqJmzC5WVOTkat+GLaE 1lbiuc+cEqYT2CAQd+zgMEgSpVwmOr6CBuRcMBvDyRahvxqMDgFneYgDwxPBDUsBYwIz Wulw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=GFgH3qRamcsfut4sZg8cGFh78nFJgYUtbeqMHlI/23M=; fh=hkVHq7oq/x6T47Z/W+8iW7aAYD01qDHXzBTbsOMDe4M=; b=ZMarYDwJWPaQB+z7dPQ41xU1zsFZ9Ma3fQRIvb3mTKLZdSnITO/H1saMUcYtsm6p6h ht1ui4dMHbzMxrsGL/Rm7O5uMNm0DHjRji8v5LfvfghQ+I52mRsekwQcvQoyajm/VBzS AJxVsFzQ9u5uv4NYknuN5soeHjwHmSq/2pUbkhoK5VvRIHkpjiCozxSb8V9NLfw4Bg9d rmhMFz/8qrxUO8ekymHADzGgU7ZJ4Tc7/3kOEIjqKGn0ckKGtgYsiW1QYODVLrkIJVIN EuDoo42j6Sk+PnskVQONAFRMyXXsJClN+l4toOxp9r50dQIW/l2MRXD/0asKdqjX+xnj YHEw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@bytedance.com header.s=google header.b=dHR9kmWm; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) 
smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=QUARANTINE sp=QUARANTINE dis=NONE) header.from=bytedance.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id j8-20020a170906050800b00982a6e33d3esi570569eja.1046.2023.07.27.00.57.50; Thu, 27 Jul 2023 00:58:14 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@bytedance.com header.s=google header.b=dHR9kmWm; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=QUARANTINE sp=QUARANTINE dis=NONE) header.from=bytedance.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233248AbjG0Hoc (ORCPT + 99 others); Thu, 27 Jul 2023 03:44:32 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:51566 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233573AbjG0HnL (ORCPT ); Thu, 27 Jul 2023 03:43:11 -0400 Received: from mail-ot1-x335.google.com (mail-ot1-x335.google.com [IPv6:2607:f8b0:4864:20::335]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 631008688 for ; Thu, 27 Jul 2023 00:37:13 -0700 (PDT) Received: by mail-ot1-x335.google.com with SMTP id 46e09a7af769-6bb29b9044dso579608a34.1 for ; Thu, 27 Jul 2023 00:37:13 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bytedance.com; s=google; t=1690443432; x=1691048232; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=GFgH3qRamcsfut4sZg8cGFh78nFJgYUtbeqMHlI/23M=; b=dHR9kmWmcQ9le6M7YmaPbGgzg8LU1j88y2o2q+oGRZ09Fjo75XD675sOOoB3fiL7S/ mvJy/2hkUTJRAHi6RATqiBH6JYJsnMtW6BK0jXCyn0bBEKPWYUF1jDM3izq7+igh1ObQ 
Y9kqSsKfoASH3X3PRcbfgEXdnXZOLsODsIA8MS/dvlAWCNQk5GD6lwZUKjQpttwZvN2o zmgFhVSWqHDWZCO9dNxMtAI2ybRcdpb8qFZsydXSWI3EN7l3qgaodQMhiNjXu62rcr4p YGPDZRKOb2HsGrtEJAIlVBHUd0ED8g4LSMQnivwL3n8AQr1iacPB/KkD20p7/XffTHFO 79Zg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1690443432; x=1691048232; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=GFgH3qRamcsfut4sZg8cGFh78nFJgYUtbeqMHlI/23M=; b=gwtzG8CsXqO8cRo2eD3yrKHKXs9fhnM69qWCr62EMfHf0u3FJcRaO4nGmcp5t6vD1G GC3oc4Zgmes4ttk9e+wfiZCZMVFzmVr3xDzM+mBIg7GroxtZ6wwyXn7wt20zsvkygjNN a0vO3oP1FeH3xixj91NT3k99e8lGrtDVUoI66i8n6tHDbtiZ1smPI3iuEJjiofhavsBz DN1YwwUAsNA71dnu3YgAAYOCT1yUWYgXnIPRuAYqQp5Jiik9+FB6sVzHaxS9BqDWChsT beVMs8y2rx1udo7NdDUh66V0p5g80nhlIgnFayYSSgbPQ2Bg3LE5BLWU4r+RXUQP11w/ hYvg== X-Gm-Message-State: ABy/qLbNZqwof/4CpRbsmoMA61tW9GCVQJrF9V1YDGpsCVpsUjS8m7RK iHPEhcsts8vZsLhbGsD3m+vXOg== X-Received: by 2002:a05:6358:9328:b0:135:4003:784c with SMTP id x40-20020a056358932800b001354003784cmr1695218rwa.17.1690443432576; Thu, 27 Jul 2023 00:37:12 -0700 (PDT) Received: from n37-019-243.byted.org ([180.184.51.134]) by smtp.gmail.com with ESMTPSA id s196-20020a6377cd000000b005638a70110bsm733919pgc.65.2023.07.27.00.37.09 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 27 Jul 2023 00:37:12 -0700 (PDT) From: Chuyi Zhou To: hannes@cmpxchg.org, mhocko@kernel.org, roman.gushchin@linux.dev, ast@kernel.org, daniel@iogearbox.net, andrii@kernel.org Cc: bpf@vger.kernel.org, linux-kernel@vger.kernel.org, wuyun.abel@bytedance.com, robin.lu@bytedance.com, Chuyi Zhou Subject: [RFC PATCH 1/5] bpf: Introduce BPF_PROG_TYPE_OOM_POLICY Date: Thu, 27 Jul 2023 15:36:28 +0800 Message-Id: <20230727073632.44983-2-zhouchuyi@bytedance.com> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20230727073632.44983-1-zhouchuyi@bytedance.com> References: 
<20230727073632.44983-1-zhouchuyi@bytedance.com> MIME-Version: 1.0 X-Spam-Status: No, score=-2.1 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, SPF_HELO_NONE,SPF_NONE,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: INBOX X-GMAIL-THRID: 1772559735540182329 X-GMAIL-MSGID: 1772559735540182329 This patch introduces a BPF_PROG_TYPE_OOM_POLICY program type. The program will be used to select a leaf memcg as the victim from the memcg tree when a global OOM is invoked. It takes the IDs of two sibling cgroups as parameters and returns a comparison result indicating which one should be chosen as the victim. Suggested-by: Abel Wu Signed-off-by: Chuyi Zhou --- include/linux/bpf_oom.h | 22 +++++ include/linux/bpf_types.h | 2 + include/uapi/linux/bpf.h | 14 ++++ kernel/bpf/syscall.c | 10 +++ mm/oom_kill.c | 168 ++++++++++++++++++++++++++++++++++++++ 5 files changed, 216 insertions(+) create mode 100644 include/linux/bpf_oom.h diff --git a/include/linux/bpf_oom.h b/include/linux/bpf_oom.h new file mode 100644 index 000000000000..f4235a83d3bb --- /dev/null +++ b/include/linux/bpf_oom.h @@ -0,0 +1,22 @@ +/* SPDX-License-Identifier: GPL-2.0 */ + +#ifndef _BPF_OOM_H +#define _BPF_OOM_H + +#include +#include +#include + +struct bpf_oom_policy { + struct bpf_prog_array __rcu *progs; +}; + +int oom_policy_prog_attach(const union bpf_attr *attr, struct bpf_prog *prog); +int oom_policy_prog_detach(const union bpf_attr *attr); +int oom_policy_prog_query(const union bpf_attr *attr, union bpf_attr __user *uattr); + +int __bpf_run_oom_policy(u64 cg_id_1, u64 cg_id_2); + +bool bpf_oom_policy_enabled(void); + +#endif diff --git a/include/linux/bpf_types.h b/include/linux/bpf_types.h index fc0d6f32c687..8ab6009b7dd9 100644 --- 
a/include/linux/bpf_types.h +++ b/include/linux/bpf_types.h @@ -83,6 +83,8 @@ BPF_PROG_TYPE(BPF_PROG_TYPE_SYSCALL, bpf_syscall, BPF_PROG_TYPE(BPF_PROG_TYPE_NETFILTER, netfilter, struct bpf_nf_ctx, struct bpf_nf_ctx) #endif +BPF_PROG_TYPE(BPF_PROG_TYPE_OOM_POLICY, oom_policy, + struct bpf_oom_ctx, struct bpf_oom_ctx) BPF_MAP_TYPE(BPF_MAP_TYPE_ARRAY, array_map_ops) BPF_MAP_TYPE(BPF_MAP_TYPE_PERCPU_ARRAY, percpu_array_map_ops) diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h index 60a9d59beeab..9da0d61cf703 100644 --- a/include/uapi/linux/bpf.h +++ b/include/uapi/linux/bpf.h @@ -987,6 +987,7 @@ enum bpf_prog_type { BPF_PROG_TYPE_SK_LOOKUP, BPF_PROG_TYPE_SYSCALL, /* a program that can execute syscalls */ BPF_PROG_TYPE_NETFILTER, + BPF_PROG_TYPE_OOM_POLICY, }; enum bpf_attach_type { @@ -1036,6 +1037,7 @@ enum bpf_attach_type { BPF_LSM_CGROUP, BPF_STRUCT_OPS, BPF_NETFILTER, + BPF_OOM_POLICY, __MAX_BPF_ATTACH_TYPE }; @@ -6825,6 +6827,18 @@ struct bpf_cgroup_dev_ctx { __u32 minor; }; +enum { + BPF_OOM_CMP_EQUAL = (1ULL << 0), + BPF_OOM_CMP_GREATER = (1ULL << 1), + BPF_OOM_CMP_LESS = (1ULL << 2), +}; + +struct bpf_oom_ctx { + __u64 cg_id_1; + __u64 cg_id_2; + __u8 cmp_ret; +}; + struct bpf_raw_tracepoint_args { __u64 args[0]; }; diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c index a2aef900519c..fb6fb6294eba 100644 --- a/kernel/bpf/syscall.c +++ b/kernel/bpf/syscall.c @@ -5,6 +5,7 @@ #include #include #include +#include #include #include #include @@ -3588,6 +3589,8 @@ attach_type_to_prog_type(enum bpf_attach_type attach_type) return BPF_PROG_TYPE_XDP; case BPF_LSM_CGROUP: return BPF_PROG_TYPE_LSM; + case BPF_OOM_POLICY: + return BPF_PROG_TYPE_OOM_POLICY; default: return BPF_PROG_TYPE_UNSPEC; } @@ -3634,6 +3637,9 @@ static int bpf_prog_attach(const union bpf_attr *attr) case BPF_PROG_TYPE_FLOW_DISSECTOR: ret = netns_bpf_prog_attach(attr, prog); break; + case BPF_PROG_TYPE_OOM_POLICY: + ret = oom_policy_prog_attach(attr, prog); + break; case 
BPF_PROG_TYPE_CGROUP_DEVICE: case BPF_PROG_TYPE_CGROUP_SKB: case BPF_PROG_TYPE_CGROUP_SOCK: @@ -3676,6 +3682,8 @@ static int bpf_prog_detach(const union bpf_attr *attr) return lirc_prog_detach(attr); case BPF_PROG_TYPE_FLOW_DISSECTOR: return netns_bpf_prog_detach(attr, ptype); + case BPF_PROG_TYPE_OOM_POLICY: + return oom_policy_prog_detach(attr); case BPF_PROG_TYPE_CGROUP_DEVICE: case BPF_PROG_TYPE_CGROUP_SKB: case BPF_PROG_TYPE_CGROUP_SOCK: @@ -3733,6 +3741,8 @@ static int bpf_prog_query(const union bpf_attr *attr, case BPF_FLOW_DISSECTOR: case BPF_SK_LOOKUP: return netns_bpf_prog_query(attr, uattr); + case BPF_OOM_POLICY: + return oom_policy_prog_query(attr, uattr); case BPF_SK_SKB_STREAM_PARSER: case BPF_SK_SKB_STREAM_VERDICT: case BPF_SK_MSG_VERDICT: diff --git a/mm/oom_kill.c b/mm/oom_kill.c index 612b5597d3af..01af8adaa16c 100644 --- a/mm/oom_kill.c +++ b/mm/oom_kill.c @@ -19,6 +19,7 @@ */ #include +#include #include #include #include @@ -73,6 +74,9 @@ static inline bool is_memcg_oom(struct oom_control *oc) return oc->memcg != NULL; } +DEFINE_MUTEX(oom_policy_lock); +static struct bpf_oom_policy global_oom_policy; + #ifdef CONFIG_NUMA /** * oom_cpuset_eligible() - check task eligibility for kill @@ -1258,3 +1262,167 @@ SYSCALL_DEFINE2(process_mrelease, int, pidfd, unsigned int, flags) return -ENOSYS; #endif /* CONFIG_MMU */ } + +const struct bpf_prog_ops oom_policy_prog_ops = { +}; + +static const struct bpf_func_proto * +oom_policy_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog) +{ + return bpf_base_func_proto(func_id); +} + +static bool oom_policy_is_valid_access(int off, int size, + enum bpf_access_type type, + const struct bpf_prog *prog, + struct bpf_insn_access_aux *info) +{ + if (off < 0 || off + size > sizeof(struct bpf_oom_ctx) || off % size) + return false; + + switch (off) { + case bpf_ctx_range(struct bpf_oom_ctx, cg_id_1): + case bpf_ctx_range(struct bpf_oom_ctx, cg_id_2): + if (type != BPF_READ) + return false; + 
bpf_ctx_record_field_size(info, sizeof(__u64)); + return bpf_ctx_narrow_access_ok(off, size, sizeof(__u64)); + case bpf_ctx_range(struct bpf_oom_ctx, cmp_ret): + if (type == BPF_READ) { + bpf_ctx_record_field_size(info, sizeof(__u8)); + return bpf_ctx_narrow_access_ok(off, size, sizeof(__u8)); + } else { + return size == sizeof(__u8); + } + default: + return false; + } +} + +const struct bpf_verifier_ops oom_policy_verifier_ops = { + .get_func_proto = oom_policy_func_proto, + .is_valid_access = oom_policy_is_valid_access, +}; + +#define BPF_MAX_PROGS 10 + +int oom_policy_prog_attach(const union bpf_attr *attr, struct bpf_prog *prog) +{ + struct bpf_prog_array *old_array; + struct bpf_prog_array *new_array; + int ret; + + mutex_lock(&oom_policy_lock); + old_array = rcu_dereference(global_oom_policy.progs); + if (old_array && bpf_prog_array_length(old_array) >= BPF_MAX_PROGS) { + ret = -E2BIG; + goto unlock; + } + ret = bpf_prog_array_copy(old_array, NULL, prog, 0, &new_array); + if (ret < 0) + goto unlock; + + rcu_assign_pointer(global_oom_policy.progs, new_array); + bpf_prog_array_free(old_array); + +unlock: + mutex_unlock(&oom_policy_lock); + return ret; +} + +static int detach_prog(struct bpf_prog *prog) +{ + struct bpf_prog_array *old_array; + struct bpf_prog_array *new_array; + int ret; + + mutex_lock(&oom_policy_lock); + old_array = rcu_dereference(global_oom_policy.progs); + ret = bpf_prog_array_copy(old_array, prog, NULL, 0, &new_array); + + if (ret) + goto unlock; + + rcu_assign_pointer(global_oom_policy.progs, new_array); + bpf_prog_array_free(old_array); + bpf_prog_put(prog); +unlock: + mutex_unlock(&oom_policy_lock); + return ret; +} + +int oom_policy_prog_detach(const union bpf_attr *attr) +{ + struct bpf_prog *prog; + int ret; + + if (attr->attach_flags) + return -EINVAL; + + prog = bpf_prog_get_type(attr->attach_bpf_fd, + BPF_PROG_TYPE_OOM_POLICY); + if (IS_ERR(prog)) + return PTR_ERR(prog); + + ret = detach_prog(prog); + bpf_prog_put(prog); + + 
return ret; +} + +int oom_policy_prog_query(const union bpf_attr *attr, union bpf_attr __user *uattr) +{ + __u32 __user *prog_ids = u64_to_user_ptr(attr->query.prog_ids); + struct bpf_prog_array *progs; + u32 cnt, flags = 0; + int ret = 0; + + if (attr->query.query_flags) + return -EINVAL; + + mutex_lock(&oom_policy_lock); + progs = rcu_dereference(global_oom_policy.progs); + cnt = progs ? bpf_prog_array_length(progs) : 0; + if (copy_to_user(&uattr->query.prog_cnt, &cnt, sizeof(cnt))) { + ret = -EFAULT; + goto unlock; + } + if (copy_to_user(&uattr->query.attach_flags, &flags, sizeof(flags))) { + ret = -EFAULT; + goto unlock; + } + if (attr->query.prog_cnt != 0 && prog_ids && cnt) + ret = bpf_prog_array_copy_to_user(progs, prog_ids, + attr->query.prog_cnt); + +unlock: + mutex_unlock(&oom_policy_lock); + return ret; +} + +int __bpf_run_oom_policy(u64 cg_id_1, u64 cg_id_2) +{ + struct bpf_oom_ctx ctx = { + .cg_id_1 = cg_id_1, + .cg_id_2 = cg_id_2, + .cmp_ret = BPF_OOM_CMP_EQUAL, + }; + rcu_read_lock(); + bpf_prog_run_array(rcu_dereference(global_oom_policy.progs), + &ctx, bpf_prog_run); + rcu_read_unlock(); + return ctx.cmp_ret; +} + +bool bpf_oom_policy_enabled(void) +{ + struct bpf_prog_array *prog_array; + bool empty = true; + + rcu_read_lock(); + prog_array = rcu_dereference(global_oom_policy.progs); + if (prog_array) + empty = bpf_prog_array_is_empty(prog_array); + rcu_read_unlock(); + return !empty; +} From patchwork Thu Jul 27 07:36:29 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Chuyi Zhou X-Patchwork-Id: 126771 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:a985:0:b0:3e4:2afc:c1 with SMTP id t5csp936818vqo; Thu, 27 Jul 2023 01:12:35 -0700 (PDT) X-Google-Smtp-Source: APBJJlF/Cvp507DMJJsSImmCx9N56uTpYvp4bhIb3X0vxl0EUDoLf/UvWPy9QSul17S5PT5KhajD X-Received: by 2002:a05:6358:9308:b0:134:e422:c500 with SMTP id 
x8-20020a056358930800b00134e422c500mr1941726rwa.27.1690445555199; Thu, 27 Jul 2023 01:12:35 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1690445555; cv=none; d=google.com; s=arc-20160816; b=P84fjorvZze7Ioa9gf56o40aokPV+b2rlJnZgJ2KiIPEs8Swd8qzOqTR6cN3AWxHi+ cgBU37qAgp6l7BRCHSDHH5NN2IyQ7Y/nHMVaAkyC4CcXnCvbczrP6l4QrK3/GlcSKil4 CCcQcM38kbsO+j6YZHj85LpfmBTM56e213Aqf+nkgchcPmvn9xaGwlPQg9xKxzgY58ep iQ/xBvFC06M0QWcPj1lpg6mABPefxzRYiGtrYt5TzvPQCWcIghVITgRBhqueHK+dXs6U 9SmwVk/z+eXzsuQblCf4aapRVZ8Agl8GoBxKcZwRPpGG/iK7zbguef4oD8YZRFKVwUcX TSBg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=s3P4jk/DjTpvx9PdAJj920ScGLlcJW6HGkHJHTkHl6A=; fh=hkVHq7oq/x6T47Z/W+8iW7aAYD01qDHXzBTbsOMDe4M=; b=utPML6qZKgu47QrQk7XYhoxftYKQdP58OccuBj6hQom5wM+oJKLGB6QaXxN2oRMONn Vd/nEIQ+rsjhERA1aeZ9pB+bcPrQrT+fMWIi/PHeDc1OOJpGMFZ/lpt4xY0nGkgIJV1R SUA3gEiSyGWWzwGq5SxG0hdCHaJb5885SdbrlMwOr8CSbMQRUrmDou7v5qBBrQ155tJq mIaqD/8LtgM1nZTYvi3s7aJL7M6gZppVAMiHdTzwkJDlaENfruE7jZO5VmYi3REX/wuZ q21EpPfiC8Ciw7Z0KMjfpzuPYhe8o/JDzS5Ru3B+IGVbnZLf2WvinQEXgNt8A7m0PvJ4 Oyng== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@bytedance.com header.s=google header.b=S7FhqPdC; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=QUARANTINE sp=QUARANTINE dis=NONE) header.from=bytedance.com Received: from out1.vger.email (out1.vger.email. 
[2620:137:e000::1:20]) by mx.google.com with ESMTP id s5-20020a656905000000b0054a291a0bf9si874106pgq.672.2023.07.27.01.12.22; Thu, 27 Jul 2023 01:12:35 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@bytedance.com header.s=google header.b=S7FhqPdC; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=QUARANTINE sp=QUARANTINE dis=NONE) header.from=bytedance.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231650AbjG0HoJ (ORCPT + 99 others); Thu, 27 Jul 2023 03:44:09 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:50494 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233699AbjG0Hnh (ORCPT ); Thu, 27 Jul 2023 03:43:37 -0400 Received: from mail-oo1-xc2b.google.com (mail-oo1-xc2b.google.com [IPv6:2607:f8b0:4864:20::c2b]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id AF98965B9 for ; Thu, 27 Jul 2023 00:37:19 -0700 (PDT) Received: by mail-oo1-xc2b.google.com with SMTP id 006d021491bc7-56c42bb70abso471106eaf.3 for ; Thu, 27 Jul 2023 00:37:19 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bytedance.com; s=google; t=1690443437; x=1691048237; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=s3P4jk/DjTpvx9PdAJj920ScGLlcJW6HGkHJHTkHl6A=; b=S7FhqPdC6B45lRXe055P5PaVB94tznGk9FrHIpoUyTL61ml7EZmcmoma4G1m8PTLNX qNy+xG/t8mZ8BQRoO2wrIto0n1SbVsyupw1IUQDbWg0ss/fGxYr2YQrHj2LrK0eZM3mB sijyrWeRuFgWJUOYkTfB86DrJRKy+rO0LluT1ezkA07OZB9v+0ToMXPEJMqFGmU0CxQG QgVQPZXhXjWdhJP+Fmn1qGbZuOKCLTw+7cj5vCjF2x7/ezfUNsHM6sP0vFj38UxZ0anZ 
j695g+6unGfMklS8VYfNLUkSH5FMlfOF7hSnLmjmbwri1KBrbOCAgTOH+vNyhE6URK9S DZgA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1690443437; x=1691048237; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=s3P4jk/DjTpvx9PdAJj920ScGLlcJW6HGkHJHTkHl6A=; b=QyKqZUTDbzdbKt2Q41hlDw/TFfI+3M52CvjZZRZSnUygmBH9/oITMkDpsEJnO+O4zz G89/QyNV4Kbfunsk8z3KM1lIkwSpL9/M19oVi8D3gr4q49VKlm9XeZqMs4g+13xMc3cP UnMV57uCuUrUF6sBIs4p84sKbjNX33yyjtQApOESDzTtwIeHcaFn7rOH/h4Om8TOLdbM Q1Lv7LsLrMJkWtstCvvuZYkS1LBoM+7rSkqCg0F9kWco4T1HepBlcjOi6myEfLAcW0BZ jN1ZbxBS4kkXWS4tqG+BKq+PqZRh1qV7svptUM0U0V/LhwwC2KM0xlGtUnQ0zbVD8Z9I KHrQ== X-Gm-Message-State: ABy/qLYiF1aAIEYZPAgP+7XluQZbsKHcjd/bGrIIg/mEcZVi1oKn3Q3Q NDfn9zv4QAEZdKQHNIgiNf0dfQ== X-Received: by 2002:a05:6358:5915:b0:135:4003:7851 with SMTP id g21-20020a056358591500b0013540037851mr2343798rwf.19.1690443437625; Thu, 27 Jul 2023 00:37:17 -0700 (PDT) Received: from n37-019-243.byted.org ([180.184.51.134]) by smtp.gmail.com with ESMTPSA id s196-20020a6377cd000000b005638a70110bsm733919pgc.65.2023.07.27.00.37.14 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 27 Jul 2023 00:37:17 -0700 (PDT) From: Chuyi Zhou To: hannes@cmpxchg.org, mhocko@kernel.org, roman.gushchin@linux.dev, ast@kernel.org, daniel@iogearbox.net, andrii@kernel.org Cc: bpf@vger.kernel.org, linux-kernel@vger.kernel.org, wuyun.abel@bytedance.com, robin.lu@bytedance.com, Chuyi Zhou Subject: [RFC PATCH 2/5] mm: Select victim memcg using bpf prog Date: Thu, 27 Jul 2023 15:36:29 +0800 Message-Id: <20230727073632.44983-3-zhouchuyi@bytedance.com> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20230727073632.44983-1-zhouchuyi@bytedance.com> References: <20230727073632.44983-1-zhouchuyi@bytedance.com> MIME-Version: 1.0 X-Spam-Status: No, score=-2.1 required=5.0 tests=BAYES_00,DKIM_SIGNED, 
DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, SPF_HELO_NONE,SPF_NONE,T_SCC_BODY_TEXT_LINE autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: INBOX X-GMAIL-THRID: 1772560638045817690 X-GMAIL-MSGID: 1772560638045817690 This patch uses a BPF prog to bypass the default select_bad_process method and select a victim memcg when a global OOM is invoked. Specifically, we iterate over root_mem_cgroup's children and select the root of the next iteration through __bpf_run_oom_policy(). This repeats until we find a leaf memcg in the last layer. Then we use oom_evaluate_task() to find a victim task in the selected memcg. If there is no suitable process to kill in that memcg, we fall back to the default method. Suggested-by: Abel Wu Signed-off-by: Chuyi Zhou --- include/linux/memcontrol.h | 6 +++++ mm/memcontrol.c | 50 ++++++++++++++++++++++++++++++++++++++ mm/oom_kill.c | 17 +++++++++++++ 3 files changed, 73 insertions(+) diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h index 5818af8eca5a..7fedc2521c8b 100644 --- a/include/linux/memcontrol.h +++ b/include/linux/memcontrol.h @@ -1155,6 +1155,7 @@ unsigned long mem_cgroup_soft_limit_reclaim(pg_data_t *pgdat, int order, gfp_t gfp_mask, unsigned long *total_scanned); +struct mem_cgroup *select_victim_memcg(void); #else /* CONFIG_MEMCG */ #define MEM_CGROUP_ID_SHIFT 0 @@ -1588,6 +1589,11 @@ unsigned long mem_cgroup_soft_limit_reclaim(pg_data_t *pgdat, int order, { return 0; } + +static inline struct mem_cgroup *select_victim_memcg(void) +{ + return NULL; +} #endif /* CONFIG_MEMCG */ static inline void __inc_lruvec_kmem_state(void *p, enum node_stat_item idx) diff --git a/mm/memcontrol.c b/mm/memcontrol.c index e8ca4bdcb03c..c6b42635f1af 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -64,6 +64,7 @@ #include #include 
#include +#include #include "internal.h" #include #include @@ -2638,6 +2639,55 @@ void mem_cgroup_handle_over_high(void) css_put(&memcg->css); } +struct mem_cgroup *select_victim_memcg(void) +{ + struct cgroup_subsys_state *pos, *parent, *victim; + struct mem_cgroup *victim_memcg; + + parent = &root_mem_cgroup->css; + victim_memcg = NULL; + + if (!cgroup_subsys_on_dfl(memory_cgrp_subsys)) + return NULL; + + rcu_read_lock(); + while (parent) { + struct cgroup_subsys_state *chosen = NULL; + struct mem_cgroup *pos_mem, *chosen_mem; + u64 chosen_id, pos_id; + int cmp_ret; + + victim = parent; + + list_for_each_entry_rcu(pos, &parent->children, sibling) { + pos_id = cgroup_id(pos->cgroup); + if (!chosen) + goto chose; + + cmp_ret = __bpf_run_oom_policy(chosen_id, pos_id); + if (cmp_ret == BPF_OOM_CMP_GREATER) + continue; + if (cmp_ret == BPF_OOM_CMP_EQUAL) { + pos_mem = mem_cgroup_from_css(pos); + chosen_mem = mem_cgroup_from_css(chosen); + if (page_counter_read(&pos_mem->memory) <= + page_counter_read(&chosen_mem->memory)) + continue; + } +chose: + chosen = pos; + chosen_id = pos_id; + } + parent = chosen; + } + + if (victim && css_tryget(victim)) + victim_memcg = mem_cgroup_from_css(victim); + rcu_read_unlock(); + + return victim_memcg; +} + static int try_charge_memcg(struct mem_cgroup *memcg, gfp_t gfp_mask, unsigned int nr_pages) { diff --git a/mm/oom_kill.c b/mm/oom_kill.c index 01af8adaa16c..b88c8c7d4ee4 100644 --- a/mm/oom_kill.c +++ b/mm/oom_kill.c @@ -361,6 +361,19 @@ static int oom_evaluate_task(struct task_struct *task, void *arg) return 1; } +static bool bpf_select_bad_process(struct oom_control *oc) +{ + struct mem_cgroup *victim_memcg; + + victim_memcg = select_victim_memcg(); + if (victim_memcg) { + mem_cgroup_scan_tasks(victim_memcg, oom_evaluate_task, oc); + css_put(&victim_memcg->css); + } + + return !!oc->chosen; +} + /* * Simple selection loop. We choose the process with the highest number of * 'points'. 
In case scan was aborted, oc->chosen is set to -1. @@ -372,6 +385,9 @@ static void select_bad_process(struct oom_control *oc) if (is_memcg_oom(oc)) mem_cgroup_scan_tasks(oc->memcg, oom_evaluate_task, oc); else { + if (bpf_oom_policy_enabled() && bpf_select_bad_process(oc)) + return; + struct task_struct *p; rcu_read_lock(); @@ -1426,3 +1442,4 @@ bool bpf_oom_policy_enabled(void) rcu_read_unlock(); return !empty; } + From patchwork Thu Jul 27 07:36:30 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Chuyi Zhou X-Patchwork-Id: 126763 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:a985:0:b0:3e4:2afc:c1 with SMTP id t5csp929616vqo; Thu, 27 Jul 2023 00:58:09 -0700 (PDT) X-Google-Smtp-Source: APBJJlE27GRb9Ln/hsx9Mzx/IPHN/cV53sJZCN1pa8tAouMqDfQ+a8v52qfDqXWmA6XGcHKuKwI4 X-Received: by 2002:a17:906:9f25:b0:99b:b2fd:4bfb with SMTP id fy37-20020a1709069f2500b0099bb2fd4bfbmr1162928ejc.32.1690444689469; Thu, 27 Jul 2023 00:58:09 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1690444689; cv=none; d=google.com; s=arc-20160816; b=D/SJ1lNYMAVHf+dmOxgS7ZTJiZ6zokN6X+PpAifbvxCq4sDSjxwAWi8DUCxx2uO6QB DlCbqhw0SkZrIm/GctubGQQD55g0ckEr6ZpdHBzbnRvzMkpn+og9KV9oJ3+NfUbG7nNY LidKxH5fNFBeChbyZ64538Q2eVy8IEKXdkXNePeVfrsw7zNpWTcSV1W6LvkVVDNNM622 EnJxAUMJD93pdCs7i8h+0PLDSEQM8CS8+erbqC+/ikAd5xexq0LLKCzKgqDcuboP3DaU LzOF3Yt2WAGzOenXgEUyIX9ilvq4ScFOQfmbMLZEpDCEhdiuqjVPaPHUK0NckLBPrgk5 vVhg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=r//0VfNIsSS7LjEKz2w3sxhtDAhJR7x5LJkBswgmGuQ=; fh=hkVHq7oq/x6T47Z/W+8iW7aAYD01qDHXzBTbsOMDe4M=; b=pSyz+37NfEneV8sUSMjCZdMQ1uAyVV3aQ6Drb8aS2qatDSCx+Fbf0TuVyd+P/c9BQD z6vIZFDdOh7lSeVPe1j2Ul3ciMTaLXSGzt4R3hPrsB4OJuCr5QfodPVigL3BDId8hqqE 
OssDuHRfSWXCYzw7Xefw1Ibr1kjC+mUSrhzwD2b7ldZ+EG1ufbJYjjNTobv8W6aYv16j YgU1o+JfnM4kZDb0zn4+wnM5MPG2lqYsUEYgDAQulmyS9pnzRpEj4CAdCuguIyGs1B3x jfocFnacUNZCF8/MflR9DULWikL12VJRyfAy1t1L0k/3yT8XzILqGgNd7wM+uYCQYOTv 4pcQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@bytedance.com header.s=google header.b=bvNQcm1t; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=QUARANTINE sp=QUARANTINE dis=NONE) header.from=bytedance.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id fi1-20020a170906da0100b0099bcfa13607si651366ejb.586.2023.07.27.00.57.45; Thu, 27 Jul 2023 00:58:09 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@bytedance.com header.s=google header.b=bvNQcm1t; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=QUARANTINE sp=QUARANTINE dis=NONE) header.from=bytedance.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233161AbjG0Ho0 (ORCPT + 99 others); Thu, 27 Jul 2023 03:44:26 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:52926 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233742AbjG0Hnn (ORCPT ); Thu, 27 Jul 2023 03:43:43 -0400 Received: from mail-pf1-x436.google.com (mail-pf1-x436.google.com [IPv6:2607:f8b0:4864:20::436]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 00A2F6A4D for ; Thu, 27 Jul 2023 00:37:24 -0700 (PDT) Received: by mail-pf1-x436.google.com with SMTP id d2e1a72fcca58-686f94328a4so119405b3a.0 for ; Thu, 27 Jul 2023 00:37:24 -0700 (PDT) 
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bytedance.com; s=google; t=1690443442; x=1691048242; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=r//0VfNIsSS7LjEKz2w3sxhtDAhJR7x5LJkBswgmGuQ=; b=bvNQcm1txH3rhOO7bm1O9db1gBgcBMCQs09BOjh+n6R3VBja2J59RqrtgNWO9awRAF EmhJsX7mQDb9SHJfhSrkICHn+WQAe+gtR3inPpRAzBY6qQm1agnybe8DuKumdO5jlFKe 3YHkH7nP76dnwLFRRldyNlJof4ZD7ZNlpVEshWiVntSoFuZJgffmlqMqIhPq7i/8LGKD cnP86b8NCN8urnuP2ojgUdnkOLytJMwGVrSs3Rp5ZqjCx+OJDr2iX6bR9VytoWWz39b+ J/IBNZIBta/QlN/Xir19mHZ05UZLuiWy9NoVByRL9UYOatX6ERlOMwp7Y8Ubo90+OnSE +4Sw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1690443442; x=1691048242; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=r//0VfNIsSS7LjEKz2w3sxhtDAhJR7x5LJkBswgmGuQ=; b=OmGrymshFZCAK5NDYY3SN9Yh0D89B1v7nwVMP6TEWxc5ymdCpgrPVMWRg1ld7V6Pjf hqi/3fCpWSXVw/M3twRbIC8bKf+QyK+Rn7EaIPjKX4Crya0zsNS7Rpj96oj5er8fd3MB FejwT8HyuYV2uWTQiTTyYVlZmAbOBOoH7wxGAJFQ/Kj73E5gSIfmruNVV6TXEDU0qMYL tlDGGtxtd6VnnvdBqOw3mGmkmlt2LSmY1ncUxghyxpSRJts+Py89eFLXTeHZcXBbdFoz BpN2cOIb36RB1Xthn/j+OKQvvlhWJ0v0UygrEQjFqt09zJyyVQ6VsfTo0afNhJUE75MX 5Wjw== X-Gm-Message-State: ABy/qLZ+pEGF4wdgBlGhGH1M6FK/k29PrWDe1hC3yjNaLzCExUfQDPV/ cyiFEhtguBzI/RlgZm+RkerSCw== X-Received: by 2002:a05:6a21:329d:b0:13a:cfdf:d7a1 with SMTP id yt29-20020a056a21329d00b0013acfdfd7a1mr2311681pzb.2.1690443442119; Thu, 27 Jul 2023 00:37:22 -0700 (PDT) Received: from n37-019-243.byted.org ([180.184.51.134]) by smtp.gmail.com with ESMTPSA id s196-20020a6377cd000000b005638a70110bsm733919pgc.65.2023.07.27.00.37.19 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 27 Jul 2023 00:37:21 -0700 (PDT) From: Chuyi Zhou To: hannes@cmpxchg.org, mhocko@kernel.org, roman.gushchin@linux.dev, ast@kernel.org, daniel@iogearbox.net, 
andrii@kernel.org Cc: bpf@vger.kernel.org, linux-kernel@vger.kernel.org, wuyun.abel@bytedance.com, robin.lu@bytedance.com, Chuyi Zhou Subject: [RFC PATCH 3/5] libbpf, bpftool: Support BPF_PROG_TYPE_OOM_POLICY Date: Thu, 27 Jul 2023 15:36:30 +0800 Message-Id: <20230727073632.44983-4-zhouchuyi@bytedance.com> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20230727073632.44983-1-zhouchuyi@bytedance.com> References: <20230727073632.44983-1-zhouchuyi@bytedance.com> MIME-Version: 1.0 X-Spam-Status: No, score=-2.1 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, SPF_HELO_NONE,SPF_NONE,T_SCC_BODY_TEXT_LINE autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: INBOX X-GMAIL-THRID: 1772559731164229932 X-GMAIL-MSGID: 1772559731164229932 Support BPF_PROG_TYPE_OOM_POLICY program in libbpf and bpftool, so that we can identify and use BPF_PROG_TYPE_OOM_POLICY in our application. 
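For illustration only, and not part of this series: a minimal userspace sketch in C of the comparison contract such a program would implement, as patch 2/5 appears to consume it (BPF_OOM_CMP_GREATER keeps the first cgroup as the candidate, BPF_OOM_CMP_LESS switches to the second). The enum and struct are local copies of the UAPI additions from patch 1/5; `prio_table`, `lookup_prio()`, `oom_policy_cmp()` and `pick_victim()` are hypothetical names, the table stands in for a BPF map, and the kernel's memory-usage tie-break on BPF_OOM_CMP_EQUAL is replaced here by simply keeping the current candidate.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local copies of the UAPI additions from patch 1/5, so that this
 * compiles without the patched kernel headers. */
enum {
	BPF_OOM_CMP_EQUAL   = (1ULL << 0),
	BPF_OOM_CMP_GREATER = (1ULL << 1),
	BPF_OOM_CMP_LESS    = (1ULL << 2),
};

struct bpf_oom_ctx {
	uint64_t cg_id_1;
	uint64_t cg_id_2;
	uint8_t  cmp_ret;
};

/* Hypothetical priority table; a real program would use a BPF map
 * keyed by cgroup id. A higher value means "kill this one first". */
struct prio_entry {
	uint64_t cg_id;
	int prio;
};

static const struct prio_entry prio_table[] = {
	{ 10, 1 }, { 11, 5 }, { 12, 5 },
};

static int lookup_prio(uint64_t cg_id)
{
	for (size_t i = 0; i < sizeof(prio_table) / sizeof(prio_table[0]); i++)
		if (prio_table[i].cg_id == cg_id)
			return prio_table[i].prio;
	return 0; /* unknown cgroups get the lowest priority */
}

/* Hypothetical policy body: compare two siblings, report via cmp_ret. */
int oom_policy_cmp(struct bpf_oom_ctx *ctx)
{
	int p1 = lookup_prio(ctx->cg_id_1);
	int p2 = lookup_prio(ctx->cg_id_2);

	if (p1 > p2)
		ctx->cmp_ret = BPF_OOM_CMP_GREATER;
	else if (p1 < p2)
		ctx->cmp_ret = BPF_OOM_CMP_LESS;
	else
		ctx->cmp_ret = BPF_OOM_CMP_EQUAL;
	return 0;
}

/* Simplified mirror of the sibling loop in select_victim_memcg():
 * GREATER and EQUAL keep the current candidate, LESS replaces it. */
uint64_t pick_victim(const uint64_t *ids, size_t n)
{
	assert(n > 0);
	uint64_t chosen = ids[0];

	for (size_t i = 1; i < n; i++) {
		struct bpf_oom_ctx ctx = {
			.cg_id_1 = chosen,
			.cg_id_2 = ids[i],
			.cmp_ret = BPF_OOM_CMP_EQUAL,
		};

		oom_policy_cmp(&ctx);
		if (ctx.cmp_ret == BPF_OOM_CMP_LESS)
			chosen = ids[i];
	}
	return chosen;
}
```

With siblings {10, 11, 12}, pick_victim() returns 11: cgroup 11 outranks 10, and the tie with 12 keeps the earlier candidate. A real program would presumably carry a body like oom_policy_cmp() under the "oom_policy" section name registered below.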
Signed-off-by: Chuyi Zhou
---
 tools/bpf/bpftool/common.c     |  1 +
 tools/include/uapi/linux/bpf.h | 14 ++++++++++++++
 tools/lib/bpf/libbpf.c         |  3 +++
 tools/lib/bpf/libbpf_probes.c  |  3 +++
 4 files changed, 21 insertions(+)

diff --git a/tools/bpf/bpftool/common.c b/tools/bpf/bpftool/common.c
index cc6e6aae2447..c5c311299c4a 100644
--- a/tools/bpf/bpftool/common.c
+++ b/tools/bpf/bpftool/common.c
@@ -1089,6 +1089,7 @@ const char *bpf_attach_type_input_str(enum bpf_attach_type t)
 	case BPF_TRACE_FENTRY:	return "fentry";
 	case BPF_TRACE_FEXIT:	return "fexit";
 	case BPF_MODIFY_RETURN:	return "mod_ret";
+	case BPF_OOM_POLICY:	return "oom_policy";
 	case BPF_SK_REUSEPORT_SELECT:	return "sk_skb_reuseport_select";
 	case BPF_SK_REUSEPORT_SELECT_OR_MIGRATE:	return "sk_skb_reuseport_select_or_migrate";
 	default:	return libbpf_bpf_attach_type_str(t);
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 60a9d59beeab..9da0d61cf703 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -987,6 +987,7 @@ enum bpf_prog_type {
 	BPF_PROG_TYPE_SK_LOOKUP,
 	BPF_PROG_TYPE_SYSCALL, /* a program that can execute syscalls */
 	BPF_PROG_TYPE_NETFILTER,
+	BPF_PROG_TYPE_OOM_POLICY,
 };

 enum bpf_attach_type {
@@ -1036,6 +1037,7 @@ enum bpf_attach_type {
 	BPF_LSM_CGROUP,
 	BPF_STRUCT_OPS,
 	BPF_NETFILTER,
+	BPF_OOM_POLICY,
 	__MAX_BPF_ATTACH_TYPE
 };

@@ -6825,6 +6827,18 @@ struct bpf_cgroup_dev_ctx {
 	__u32 minor;
 };

+enum {
+	BPF_OOM_CMP_EQUAL = (1ULL << 0),
+	BPF_OOM_CMP_GREATER = (1ULL << 1),
+	BPF_OOM_CMP_LESS = (1ULL << 2),
+};
+
+struct bpf_oom_ctx {
+	__u64 cg_id_1;
+	__u64 cg_id_2;
+	__u8 cmp_ret;
+};
+
 struct bpf_raw_tracepoint_args {
 	__u64 args[0];
 };
diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index 214f828ece6b..10496bb9b3bc 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -118,6 +118,7 @@ static const char * const attach_type_name[] = {
 	[BPF_TRACE_KPROBE_MULTI] = "trace_kprobe_multi",
 	[BPF_STRUCT_OPS] = "struct_ops",
 	[BPF_NETFILTER] = "netfilter",
+	[BPF_OOM_POLICY] = "oom_policy",
 };

 static const char * const link_type_name[] = {
@@ -204,6 +205,7 @@ static const char * const prog_type_name[] = {
 	[BPF_PROG_TYPE_SK_LOOKUP] = "sk_lookup",
 	[BPF_PROG_TYPE_SYSCALL] = "syscall",
 	[BPF_PROG_TYPE_NETFILTER] = "netfilter",
+	[BPF_PROG_TYPE_OOM_POLICY] = "oom_policy",
 };

 static int __base_pr(enum libbpf_print_level level, const char *format,
@@ -8738,6 +8740,7 @@ static const struct bpf_sec_def section_defs[] = {
 	SEC_DEF("struct_ops.s+", STRUCT_OPS, 0, SEC_SLEEPABLE),
 	SEC_DEF("sk_lookup", SK_LOOKUP, BPF_SK_LOOKUP, SEC_ATTACHABLE),
 	SEC_DEF("netfilter", NETFILTER, BPF_NETFILTER, SEC_NONE),
+	SEC_DEF("oom_policy", OOM_POLICY, BPF_OOM_POLICY, SEC_ATTACHABLE_OPT),
 };

 static size_t custom_sec_def_cnt;
diff --git a/tools/lib/bpf/libbpf_probes.c b/tools/lib/bpf/libbpf_probes.c
index 9c4db90b92b6..dbac3e98a2d7 100644
--- a/tools/lib/bpf/libbpf_probes.c
+++ b/tools/lib/bpf/libbpf_probes.c
@@ -129,6 +129,9 @@ static int probe_prog_load(enum bpf_prog_type prog_type,
 	case BPF_PROG_TYPE_LIRC_MODE2:
 		opts.expected_attach_type = BPF_LIRC_MODE2;
 		break;
+	case BPF_PROG_TYPE_OOM_POLICY:
+		opts.expected_attach_type = BPF_OOM_POLICY;
+		break;
 	case BPF_PROG_TYPE_TRACING:
 	case BPF_PROG_TYPE_LSM:
 		opts.log_buf = buf;

From patchwork Thu Jul 27 07:36:31 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Chuyi Zhou
X-Patchwork-Id: 126765
Return-Path:
Delivered-To: ouuuleilei@gmail.com
Received: by 2002:a59:a985:0:b0:3e4:2afc:c1 with SMTP id t5csp929710vqo; Thu, 27 Jul 2023 00:58:24 -0700 (PDT)
X-Google-Smtp-Source: APBJJlEZyGODOLIUNSwo4P2v6J+YTJr7wtv3bafMFCo+5PSp5lxkjz6wMzsE78CCpjfY09dIERva
X-Received: by 2002:a17:906:64c6:b0:99b:65fa:fc24 with SMTP id p6-20020a17090664c600b0099b65fafc24mr1425546ejn.36.1690444704588; Thu, 27 Jul 2023 00:58:24 -0700 (PDT)
ARC-Seal: i=1; a=rsa-sha256; t=1690444704; cv=none; d=google.com; s=arc-20160816;
b=ZG/K9RRfB++wXMR5wdl0vRl5CBapq70mVKXlzHoiUzH4gS/8EMVzUv7BiQbfjSoC1l IJCJPvFtmilJSuvs9qdy5sRrAr0/WNIUWG/irdafLiw1iWoIwco+2ayMTWGDaPgBMdek 2C5M42wPmPUeBABm3ZZdP5PhlV6fCYaQjHfWmnipXuSeC+6Dp0QTFnOkY3yVPWonRQ5n TLgHg9LtPSha1ldWOU87/e0ALXujwg/rRy5Rm6ksg3IlOqS0PIzeKo1PE1+UkQTYCAvl 1l3TKwsxCZIqu5sAsSJ2OlXc0MNK/+7I3uMHqq2XNhqCZPzxwbIo15+9YZ1ChlK1dl8H mlSA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=Uv3OYCgolDRnDe8sjWn08GTjavdgRys1P8gpfRFXymU=; fh=hkVHq7oq/x6T47Z/W+8iW7aAYD01qDHXzBTbsOMDe4M=; b=H5dOjZGN4XjES0mTWHSqvoSKacQiQqOfJ4ICWKiFZkvf15JjA9YXbeX9RIvMwM8Pdn lzYfe+qwvQE57mc7KFDQA5rqqm2np9KqeD43o/brM2jpXnjE/f5eWB6cnj3umesV9PHn KPetiAibFqgYDKBVsewlihNdpeDiG+WQQw16w5xRlnlwwfIgg9azuc2r2rn4vMObf6Gi TDi3snAN8/j2bfpm973BTNmp2kwk3WQGQKv27VP2ZHRlADDadJAnAaUkS1ZCRolMX31v xbblDNgFTdBLCwJpAesvVvzt098MLOMrt3jNNnlQ6daii5Ur4yD/h6EXK/ayQKxMx/Kj WNLg== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@bytedance.com header.s=google header.b=GPxKU1wL; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=QUARANTINE sp=QUARANTINE dis=NONE) header.from=bytedance.com Received: from out1.vger.email (out1.vger.email. 
[2620:137:e000::1:20]) by mx.google.com with ESMTP id e19-20020a170906249300b0099bc2f8e22esi627239ejb.599.2023.07.27.00.58.00; Thu, 27 Jul 2023 00:58:24 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@bytedance.com header.s=google header.b=GPxKU1wL; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=QUARANTINE sp=QUARANTINE dis=NONE) header.from=bytedance.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233437AbjG0Hos (ORCPT + 99 others); Thu, 27 Jul 2023 03:44:48 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:50680 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233650AbjG0HnU (ORCPT ); Thu, 27 Jul 2023 03:43:20 -0400 Received: from mail-oi1-x22e.google.com (mail-oi1-x22e.google.com [IPv6:2607:f8b0:4864:20::22e]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id ABFBC4C0A for ; Thu, 27 Jul 2023 00:37:29 -0700 (PDT) Received: by mail-oi1-x22e.google.com with SMTP id 5614622812f47-3a38953c928so601645b6e.1 for ; Thu, 27 Jul 2023 00:37:29 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bytedance.com; s=google; t=1690443447; x=1691048247; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=Uv3OYCgolDRnDe8sjWn08GTjavdgRys1P8gpfRFXymU=; b=GPxKU1wLA/ico6qT1nEEu7gHQfgASi8rquE1AHhcXYJT1YIgQWCl0gYkO+q7Q4Ajd8 ZGhT1ge1OlP7dnRkKC9NzY2sUvhgM/h6rPXqBu3xK1S9r98sHCrdmoTvcTfANYWZBoLd xfVIwZRmn2yWUEfGaYPS4U465p+g49PadC9fcYd0hx5ojquQUhWVeEKTqP+pZ9xvinLc bNKbsys0ZIGHmQCTT0vWBHPadAdI12u2RXt8ygcYMQkFpLj8uIdMWG4xFoQS93HgAPsT 
PwYp3/fy/qrx1r2C2MV0kCEoJMDJandD7xo85GeTJRyvBJxZslJGlHE1vTxBwPd27657 T+pw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1690443447; x=1691048247; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=Uv3OYCgolDRnDe8sjWn08GTjavdgRys1P8gpfRFXymU=; b=JHbQfvkrFEKjH0JaFaOeThVbgsnmlgFoggfq4LsnYZuLXcsBpVWyXCcJpndfk4FeWA m75WXzKZWLLTV2qV2NRaJxchqhcXSy2P2LTWB5fULMlODVG7PzY9rnOK0pZFj1jjg6t+ hnoggyealKlfmm9Q98FFlxo4mdZiOlbUVh+5RmYcx/07HdUzcKhLMLl8GduONm9YjRFB T3oOvKIiu0OGCXOr5jO5Yx2THT7kdClBuzjEyRu0FqipbUlDIO9Ah2Y/LI1sWv4fB8bQ vMdaYA5KYDdjwpoZoTzBu183/0kbwxDBopjeVNpvtd+eUbDHs2/ThTzHa/iZZy/hPFAu OoOw== X-Gm-Message-State: ABy/qLYtVz+cgjOVgqhypcxp+whiDyvH92GkheNsLhAl05pwtPe5cJsN 0OAe980VxKETBWan7HC7kXRuwSBJr2dZzA3CUqAsgQ== X-Received: by 2002:a05:6808:1393:b0:3a5:ca93:fb69 with SMTP id c19-20020a056808139300b003a5ca93fb69mr2618037oiw.55.1690443447035; Thu, 27 Jul 2023 00:37:27 -0700 (PDT) Received: from n37-019-243.byted.org ([180.184.51.134]) by smtp.gmail.com with ESMTPSA id s196-20020a6377cd000000b005638a70110bsm733919pgc.65.2023.07.27.00.37.23 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 27 Jul 2023 00:37:26 -0700 (PDT) From: Chuyi Zhou To: hannes@cmpxchg.org, mhocko@kernel.org, roman.gushchin@linux.dev, ast@kernel.org, daniel@iogearbox.net, andrii@kernel.org Cc: bpf@vger.kernel.org, linux-kernel@vger.kernel.org, wuyun.abel@bytedance.com, robin.lu@bytedance.com, Chuyi Zhou Subject: [RFC PATCH 4/5] bpf: Add a new bpf helper to get cgroup ino Date: Thu, 27 Jul 2023 15:36:31 +0800 Message-Id: <20230727073632.44983-5-zhouchuyi@bytedance.com> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20230727073632.44983-1-zhouchuyi@bytedance.com> References: <20230727073632.44983-1-zhouchuyi@bytedance.com> MIME-Version: 1.0 X-Spam-Status: No, score=-2.1 required=5.0 tests=BAYES_00,DKIM_SIGNED, 
DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, SPF_HELO_NONE,SPF_NONE,T_SCC_BODY_TEXT_LINE autolearn=unavailable autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net
Precedence: bulk
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
X-getmail-retrieved-from-mailbox: INBOX
X-GMAIL-THRID: 1772559746475424124
X-GMAIL-MSGID: 1772559746475424124

This patch adds a new BPF helper, bpf_get_ino_from_cgroup_id(), which
returns the inode number for a given cgroup id.

BPF programs identify a cgroup by its cgroup id, but we can't get the
cgroup id directly in userspace applications. In userspace, cgroups are
usually identified by their paths or their inode numbers, and the cgroup
id is not always equal to the inode number, depending on sizeof(ino_t).

For example, suppose we only care about events related to a given set of
cgroup paths. Without this helper, the only way to filter on them is to
store the paths in a map and do string comparison in the BPF program,
which is not very convenient. With this helper, userspace only needs to
record the inode numbers in a map, and the BPF program converts the
cgroup id to an inode number and looks it up.

Signed-off-by: Chuyi Zhou
---
 include/uapi/linux/bpf.h       |  7 +++++++
 kernel/bpf/core.c              |  1 +
 kernel/bpf/helpers.c           | 17 +++++++++++++++++
 tools/include/uapi/linux/bpf.h |  7 +++++++
 4 files changed, 32 insertions(+)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 9da0d61cf703..01efb289fa14 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -5575,6 +5575,12 @@ union bpf_attr {
  *		0 on success.
  *
  *		**-ENOENT** if the bpf_local_storage cannot be found.
+ *
+ * u64 bpf_get_ino_from_cgroup_id(u64 id)
+ *	Description
+ *		Get inode number from a *cgroup id*.
+ *	Return
+ *		Inode number.
  */
 #define ___BPF_FUNC_MAPPER(FN, ctx...)			\
 	FN(unspec, 0, ##ctx)				\
@@ -5789,6 +5795,7 @@ union bpf_attr {
 	FN(user_ringbuf_drain, 209, ##ctx)		\
 	FN(cgrp_storage_get, 210, ##ctx)		\
 	FN(cgrp_storage_delete, 211, ##ctx)		\
+	FN(get_ino_from_cgroup_id, 212, ##ctx)		\
 	/* */

 /* backwards-compatibility macros for users of __BPF_FUNC_MAPPER that don't
diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index dc85240a0134..49dfdb2dd336 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -2666,6 +2666,7 @@ const struct bpf_func_proto bpf_snprintf_btf_proto __weak;
 const struct bpf_func_proto bpf_seq_printf_btf_proto __weak;
 const struct bpf_func_proto bpf_set_retval_proto __weak;
 const struct bpf_func_proto bpf_get_retval_proto __weak;
+const struct bpf_func_proto bpf_get_ino_from_cgroup_id_proto __weak;

 const struct bpf_func_proto * __weak bpf_get_trace_printk_proto(void)
 {
diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index 9e80efa59a5d..e87328b008d3 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -433,6 +433,21 @@ const struct bpf_func_proto bpf_get_current_ancestor_cgroup_id_proto = {
 	.ret_type = RET_INTEGER,
 	.arg1_type = ARG_ANYTHING,
 };
+
+BPF_CALL_1(bpf_get_ino_from_cgroup_id, u64, id)
+{
+	u64 ino = kernfs_id_ino(id);
+
+	return ino;
+}
+
+const struct bpf_func_proto bpf_get_ino_from_cgroup_id_proto = {
+	.func = bpf_get_ino_from_cgroup_id,
+	.gpl_only = false,
+	.ret_type = RET_INTEGER,
+	.arg1_type = ARG_ANYTHING,
+};
+
 #endif /* CONFIG_CGROUPS */

 #define BPF_STRTOX_BASE_MASK 0x1F
@@ -1767,6 +1782,8 @@ bpf_base_func_proto(enum bpf_func_id func_id)
 		return &bpf_get_current_cgroup_id_proto;
 	case BPF_FUNC_get_current_ancestor_cgroup_id:
 		return &bpf_get_current_ancestor_cgroup_id_proto;
+	case BPF_FUNC_get_ino_from_cgroup_id:
+		return &bpf_get_ino_from_cgroup_id_proto;
 #endif
 	default:
 		break;
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 9da0d61cf703..661d97aacb85 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@
-5575,6 +5575,12 @@ union bpf_attr { * 0 on success. * * **-ENOENT** if the bpf_local_storage cannot be found. + * + * u64 bpf_get_ino_from_cgroup_id(u64 id) + * Description + * Get inode number from a *cgroup id*. + * Return + * Inode number. */ #define ___BPF_FUNC_MAPPER(FN, ctx...) \ FN(unspec, 0, ##ctx) \ @@ -5789,6 +5795,7 @@ union bpf_attr { FN(user_ringbuf_drain, 209, ##ctx) \ FN(cgrp_storage_get, 210, ##ctx) \ FN(cgrp_storage_delete, 211, ##ctx) \ + FN(get_ino_from_cgroup_id, 212, ##ctx) \ /* */ /* backwards-compatibility macros for users of __BPF_FUNC_MAPPER that don't From patchwork Thu Jul 27 07:36:32 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Chuyi Zhou X-Patchwork-Id: 126792 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:a985:0:b0:3e4:2afc:c1 with SMTP id t5csp941024vqo; Thu, 27 Jul 2023 01:21:44 -0700 (PDT) X-Google-Smtp-Source: APBJJlGPFZE5RziWe10y0y2vPMANPgWLdph9YRuQWBdrJJ2DbAEOwbWBFQAA6vibQxDG581t4q12 X-Received: by 2002:a1f:5ccd:0:b0:481:3f5d:2667 with SMTP id q196-20020a1f5ccd000000b004813f5d2667mr708150vkb.15.1690446104104; Thu, 27 Jul 2023 01:21:44 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1690446104; cv=none; d=google.com; s=arc-20160816; b=ifhSzenx8otw4JXye6bAMaxOMz+C426IdKB3tScCCoZKIzv+CHwY7cudhGKcqH3uKQ QFYlq9KahGsNDpQ8bAlwjm5Af+VlbFpqtpZXIZ4OUkKgDWgSO3CJ5xc1WhQqqoQS/ClJ HvgYZw09RK0beqHi2xsVYkTrS0/Mk3JBbH8f8oVzJSGx6CI9UL2czXpUs0UHMbMWUE9C 7LW0gi2waZh3RzGWILia2i+BeILZ70+mtZX0IIi/RghJG9z8Q1JtPd1CoOzP6Wt0fsUr V0RYaT+oYo0G13SvY76q3iCtJYDi8ZZ0SLEBNMnAMC7GBnhFC0irYXvjiP3fCWgmBubc u10Q== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=8GeXvqoGvKsq4eXzt40D65FoTpyg+BIztu+fO5t6OCk=; fh=hkVHq7oq/x6T47Z/W+8iW7aAYD01qDHXzBTbsOMDe4M=; 
b=fhQ8QME2BDQZJf/SYx4Apj010FdNDiCv7Yrn1ntjEObQiBgbbtYQMOBCVDTJggJ15D ntoR61KpXgeTX7SCHADjnh++GvTdfym0XVTLML+YEGAsw63upgwLJzA/s4aVf8CFEfRk gmZ1614Uzx9lUB1H+3G/ivagEzCkAzjkkgVpZ8BKE2arGC+V58kh55GtfCaj8idBCcIo 6AQUcJSCg/fbY8sPaMschpFTXo5cGXMOljkDubeBd07xJtcB5HASIu/y3TKNPUj7verr sgswMMwsQEoh/5kW7SHm/vtRd8T3h02m0P/pNWc3hwR6yLZfAK41Hdr8YJzX3JpI5HCE Ey1w== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@bytedance.com header.s=google header.b=dB2YLL1o; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=QUARANTINE sp=QUARANTINE dis=NONE) header.from=bytedance.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id m8-20020a170902db0800b001b85ab48092si1029888plx.499.2023.07.27.01.21.29; Thu, 27 Jul 2023 01:21:44 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@bytedance.com header.s=google header.b=dB2YLL1o; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=QUARANTINE sp=QUARANTINE dis=NONE) header.from=bytedance.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231477AbjG0HoF (ORCPT + 99 others); Thu, 27 Jul 2023 03:44:05 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:50384 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233743AbjG0Hnn (ORCPT ); Thu, 27 Jul 2023 03:43:43 -0400 Received: from mail-oi1-x230.google.com (mail-oi1-x230.google.com [IPv6:2607:f8b0:4864:20::230]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id C5C6E6A67 for ; Thu, 27 Jul 2023 00:37:32 -0700 (PDT) 
Received: by mail-oi1-x230.google.com with SMTP id 5614622812f47-3a3373211a1so583781b6e.0 for ; Thu, 27 Jul 2023 00:37:32 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bytedance.com; s=google; t=1690443451; x=1691048251; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=8GeXvqoGvKsq4eXzt40D65FoTpyg+BIztu+fO5t6OCk=; b=dB2YLL1oG+E1Gu38lWcd1QP/sEw5hL8k3plmGS+U5Al3NgCvB7aFDUnC/qLQqQvAyh kqM/U2FAMGOjzP7vlJ9n8P/SZJBYH4bfLW1EvEcQ8MCmw9uS6Hw3GeRrODYe0L60MU+f YwmG9B/u7gii5/6i9CQRybNJCrNZSbgvV6gFZSnnm/ALyvFAo1eRCOpOrG5Uf3Y0sXmh 2I3HOZMpRa4VcT9RhhSZBKD/SwgNZyOHYpzTl1svA95LuQvNwJAifna4peCdWQqysLF3 X77a4V5GbW5Mh0b4YJ/zyr4l1aEKHsoaGgzHv09uKleh4kLjbrbougScHwlnxloBfpmd EACw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1690443451; x=1691048251; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=8GeXvqoGvKsq4eXzt40D65FoTpyg+BIztu+fO5t6OCk=; b=AJ0+h33SL2TSPHTqxbcx8mfNGMOkXE8mu7conpJ7QkYwP7a/ZSV5pgg1nbvAwwtvw+ +a9OBUI5FOIYgr4bO2dXqEXw3lmoxCqW5NLDIAK8/7cFKG+PjY9KFMQS41z1dzuR63b2 PTbF+7OD9P17EfXy1DeeMuLjHNS5Fo6+RvU+u6hZQL6wBT1rqy9DcEBG6wAebIuLB0Pd SJZboSnXwArMWG26LdiK10IA/Pgz0+Hz9xHvTGndF1b/f/YbJXgbiBnJ+kAkb3S1Mwjc JfscHierso2GnSl8P+9IK5rUr+JHH4FEoCodGS1RDepXY4Fq+yXvYjLWvTskTh0ToRjO CaSA== X-Gm-Message-State: ABy/qLY5+dndp+AxpL3E1hEY/Z+BzoVMX6AlL5mT27JL+ParoY96a6az DT+haiiuq1U95vSOIgGg7NtPuQ== X-Received: by 2002:aca:1a16:0:b0:3a1:acef:7e2c with SMTP id a22-20020aca1a16000000b003a1acef7e2cmr1775792oia.58.1690443451584; Thu, 27 Jul 2023 00:37:31 -0700 (PDT) Received: from n37-019-243.byted.org ([180.184.51.134]) by smtp.gmail.com with ESMTPSA id s196-20020a6377cd000000b005638a70110bsm733919pgc.65.2023.07.27.00.37.28 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 27 Jul 2023 00:37:31 -0700 
(PDT)
From: Chuyi Zhou
To: hannes@cmpxchg.org, mhocko@kernel.org, roman.gushchin@linux.dev, ast@kernel.org, daniel@iogearbox.net, andrii@kernel.org
Cc: bpf@vger.kernel.org, linux-kernel@vger.kernel.org, wuyun.abel@bytedance.com, robin.lu@bytedance.com, Chuyi Zhou
Subject: [RFC PATCH 5/5] bpf: Sample BPF program to set oom policy
Date: Thu, 27 Jul 2023 15:36:32 +0800
Message-Id: <20230727073632.44983-6-zhouchuyi@bytedance.com>
X-Mailer: git-send-email 2.20.1
In-Reply-To: <20230727073632.44983-1-zhouchuyi@bytedance.com>
References: <20230727073632.44983-1-zhouchuyi@bytedance.com>
MIME-Version: 1.0
X-Spam-Status: No, score=-2.1 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, SPF_HELO_NONE,SPF_NONE,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net
Precedence: bulk
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
X-getmail-retrieved-from-mailbox: INBOX
X-GMAIL-THRID: 1772561213716203227
X-GMAIL-MSGID: 1772561213716203227

This patch adds a sample showing how to set an OOM victim selection
policy that protects certain cgroups.

The BPF program, oom_kern.c, compares the scores of two sibling memcgs
and selects the one with the larger score. The userspace program,
oom_user.c, maintains a score map that uses cgroup inode numbers as keys
and scores as values. Users can give a cgroup a lower score than its
siblings to keep it from being selected.

Suggested-by: Abel Wu
Signed-off-by: Chuyi Zhou
---
 samples/bpf/Makefile   |   3 +
 samples/bpf/oom_kern.c |  42 ++++++++++++++
 samples/bpf/oom_user.c | 128 +++++++++++++++++++++++++++++++++++++++++
 3 files changed, 173 insertions(+)
 create mode 100644 samples/bpf/oom_kern.c
 create mode 100644 samples/bpf/oom_user.c

diff --git a/samples/bpf/Makefile b/samples/bpf/Makefile
index 615f24ebc49c..09dbdec22dad 100644
--- a/samples/bpf/Makefile
+++ b/samples/bpf/Makefile
@@ -56,6 +56,7 @@ tprogs-y += xdp_redirect_map_multi
 tprogs-y += xdp_redirect_map
 tprogs-y += xdp_redirect
 tprogs-y += xdp_monitor
+tprogs-y += oom

 # Libbpf dependencies
 LIBBPF_SRC = $(TOOLS_PATH)/lib/bpf
@@ -118,6 +119,7 @@ xdp_redirect_map-objs := xdp_redirect_map_user.o $(XDP_SAMPLE)
 xdp_redirect-objs := xdp_redirect_user.o $(XDP_SAMPLE)
 xdp_monitor-objs := xdp_monitor_user.o $(XDP_SAMPLE)
 xdp_router_ipv4-objs := xdp_router_ipv4_user.o $(XDP_SAMPLE)
+oom-objs := oom_user.o

 # Tell kbuild to always build the programs
 always-y := $(tprogs-y)
@@ -173,6 +175,7 @@ always-y += xdp_sample_pkts_kern.o
 always-y += ibumad_kern.o
 always-y += hbm_out_kern.o
 always-y += hbm_edt_kern.o
+always-y += oom_kern.o

 ifeq ($(ARCH), arm)
 # Strip all except -D__LINUX_ARM_ARCH__ option needed to handle linux
diff --git a/samples/bpf/oom_kern.c b/samples/bpf/oom_kern.c
new file mode 100644
index 000000000000..1e0e2de1e06e
--- /dev/null
+++ b/samples/bpf/oom_kern.c
@@ -0,0 +1,42 @@
+// SPDX-License-Identifier: GPL-2.0
+#include
+#include
+#include
+#include
+
+struct {
+	__uint(type, BPF_MAP_TYPE_HASH);
+	__uint(max_entries, 1024);
+	__type(key, u64);
+	__type(value, u32);
+} sc_map SEC(".maps");
+
+SEC("oom_policy")
+int bpf_prog1(struct bpf_oom_ctx *ctx)
+{
+	u64 cg_ino_1, cg_ino_2;
+	u32 sc_1, sc_2;
+	u32 *value;
+
+	sc_1 = sc_2 = 250;
+	cg_ino_1 = bpf_get_ino_from_cgroup_id(ctx->cg_id_1);
+	cg_ino_2 = bpf_get_ino_from_cgroup_id(ctx->cg_id_2);
+
+	value = bpf_map_lookup_elem(&sc_map, &cg_ino_1);
+	if (value)
+		sc_1 = *value;
+
+	value = bpf_map_lookup_elem(&sc_map, &cg_ino_2);
+	if (value)
+		sc_2 = *value;
+
+	if (sc_1 > sc_2)
+		ctx->cmp_ret = BPF_OOM_CMP_GREATER;
+	else if (sc_1 < sc_2)
+		ctx->cmp_ret = BPF_OOM_CMP_LESS;
+	else
+		ctx->cmp_ret = BPF_OOM_CMP_EQUAL;
+	return 0;
+}
+
+char _license[] SEC("license") = "GPL";
diff --git a/samples/bpf/oom_user.c b/samples/bpf/oom_user.c
new file mode 100644
index 000000000000..7bd2d56ba910
--- /dev/null
+++ b/samples/bpf/oom_user.c
@@ -0,0 +1,128 @@
+// SPDX-License-Identifier: GPL-2.0
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include "trace_helpers.h"
+
+static int map_fd, prog_fd;
+
+static unsigned long long get_cgroup_inode(const char *path)
+{
+	unsigned long long inode;
+	struct stat file_stat;
+	int fd, ret;
+
+	fd = open(path, O_RDONLY);
+	if (fd < 0)
+		return 0;
+
+	ret = fstat(fd, &file_stat);
+	close(fd);
+	if (ret < 0)
+		return 0;
+
+	inode = file_stat.st_ino;
+	return inode;
+}
+
+static int set_cgroup_oom_score(const char *cg_path, int score)
+{
+	unsigned long long ino = get_cgroup_inode(cg_path);
+
+	if (!ino) {
+		fprintf(stderr, "ERROR: get inode for %s failed\n", cg_path);
+		return 1;
+	}
+	if (bpf_map_update_elem(map_fd, &ino, &score, BPF_ANY)) {
+		fprintf(stderr, "ERROR: update map failed\n");
+		return 1;
+	}
+
+	return 0;
+}
+
+/**
+ * A simple sample that prefers /root/blue/instance_1 as the victim memcg
+ * and protects /root/blue/instance_2
+ *           root
+ *       /         \
+ *     user ...    blue
+ *     /  \        /     \
+ *    ..     instance_1  instance_2
+ */
+
+int main(int argc, char **argv)
+{
+	struct bpf_object *obj = NULL;
+	struct bpf_program *prog;
+	int target_fd = 0;
+	unsigned int prog_cnt;
+
+	obj = bpf_object__open_file("oom_kern.o", NULL);
+	if (libbpf_get_error(obj)) {
+		fprintf(stderr, "ERROR: opening BPF object file failed\n");
+		obj = NULL;
+		goto cleanup;
+	}
+
+	prog = bpf_object__next_program(obj, NULL);
+	bpf_program__set_type(prog, BPF_PROG_TYPE_OOM_POLICY);
+	/* load BPF program */
+	if (bpf_object__load(obj)) {
+		fprintf(stderr, "ERROR: loading BPF object file failed\n");
+		goto cleanup;
+	}
+
+	map_fd = bpf_object__find_map_fd_by_name(obj, "sc_map");
+
+	if (map_fd < 0) {
+		fprintf(stderr, "ERROR: finding a map in obj file failed\n");
+		goto cleanup;
+	}
+
+	/*
+	 * In this sample, the default score is 250 (see oom_kern.c).
+	 * Set a high score for /blue and /blue/instance_1,
+	 * so that when a global OOM happens, /blue/instance_1 is
+	 * chosen as the victim memcg.
+	 */
+	if (set_cgroup_oom_score("/sys/fs/cgroup/blue/", 500)) {
+		fprintf(stderr, "ERROR: set score for /blue failed\n");
+		goto cleanup;
+	}
+	if (set_cgroup_oom_score("/sys/fs/cgroup/blue/instance_1", 500)) {
+		fprintf(stderr, "ERROR: set score for /blue/instance_1 failed\n");
+		goto cleanup;
+	}
+
+	/* set low score to protect /blue/instance_2 */
+	if (set_cgroup_oom_score("/sys/fs/cgroup/blue/instance_2", 100)) {
+		fprintf(stderr, "ERROR: set score for /blue/instance_2 failed\n");
+		goto cleanup;
+	}
+
+	prog_fd = bpf_program__fd(prog);
+
+	/* Attach bpf program */
+	if (bpf_prog_attach(prog_fd, target_fd, BPF_OOM_POLICY, 0)) {
+		fprintf(stderr, "Failed to attach BPF_OOM_POLICY program\n");
+		goto cleanup;
+	}
+	if (bpf_prog_query(target_fd, BPF_OOM_POLICY, 0, NULL, NULL, &prog_cnt)) {
+		fprintf(stderr, "Failed to query attached programs\n");
+		goto cleanup;
+	}
+	printf("prog_cnt: %u\n", prog_cnt);
+
+cleanup:
+	bpf_object__close(obj);
+	return 0;
+}
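
As a rough illustration of the policy described in the commit message, the
comparison done by oom_kern.c can be modelled in plain userspace C. This is
only a sketch under stated assumptions: the table type, the function names
and the inode numbers below are made up for the example, the array stands in
for sc_map, and how the kernel consumes cmp_ret is defined by the earlier
patches in this series, not by this code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same bit values as the BPF_OOM_CMP_* enum added in patch 3/5. */
enum {
	OOM_CMP_EQUAL   = 1 << 0,
	OOM_CMP_GREATER = 1 << 1,
	OOM_CMP_LESS    = 1 << 2,
};

#define DEFAULT_SCORE 250u	/* default score used by oom_kern.c */

/* Hypothetical stand-in for sc_map: cgroup inode number -> score. */
struct score_entry {
	uint64_t ino;
	uint32_t score;
};

/* Return the score recorded for ino, or DEFAULT_SCORE if there is none. */
static uint32_t lookup_score(const struct score_entry *tbl, size_t n,
			     uint64_t ino)
{
	assert(tbl != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		if (tbl[i].ino == ino)
			return tbl[i].score;
	return DEFAULT_SCORE;
}

/* Mirrors the comparison in bpf_prog1(): compare two cgroups by score. */
static uint8_t compare_cgroups(const struct score_entry *tbl, size_t n,
			       uint64_t ino_1, uint64_t ino_2)
{
	uint32_t sc_1 = lookup_score(tbl, n, ino_1);
	uint32_t sc_2 = lookup_score(tbl, n, ino_2);

	if (sc_1 > sc_2)
		return OOM_CMP_GREATER;
	if (sc_1 < sc_2)
		return OOM_CMP_LESS;
	return OOM_CMP_EQUAL;
}

/* Made-up inode numbers for the cgroups in the sample's tree. */
enum { INO_USER = 10, INO_BLUE = 20, INO_INSTANCE_1 = 21, INO_INSTANCE_2 = 22 };

/* The scores that oom_user.c writes; /user is left at the default. */
static const struct score_entry sample_scores[] = {
	{ INO_BLUE,       500 },
	{ INO_INSTANCE_1, 500 },
	{ INO_INSTANCE_2, 100 },
};
#define SAMPLE_SCORES_LEN (sizeof(sample_scores) / sizeof(sample_scores[0]))
```

With the scores set by oom_user.c, compare_cgroups() reports /blue as greater
than an unscored sibling (500 against the default 250) and instance_2 as less
than instance_1 (100 against 500), which matches the intent stated in the
sample's comments.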