From patchwork Fri Aug 4 09:38:03 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Chuyi Zhou
X-Patchwork-Id: 131077
From: Chuyi Zhou
To: hannes@cmpxchg.org, mhocko@kernel.org, roman.gushchin@linux.dev,
 ast@kernel.org, daniel@iogearbox.net, andrii@kernel.org,
 muchun.song@linux.dev
Cc: bpf@vger.kernel.org, linux-kernel@vger.kernel.org,
 wuyun.abel@bytedance.com, robin.lu@bytedance.com, Chuyi Zhou
Subject: [RFC PATCH 1/2] mm, oom: Introduce bpf_select_task
Date: Fri, 4 Aug 2023 17:38:03 +0800
Message-Id: <20230804093804.47039-2-zhouchuyi@bytedance.com>
X-Mailer: git-send-email 2.20.1
In-Reply-To: <20230804093804.47039-1-zhouchuyi@bytedance.com>
References: <20230804093804.47039-1-zhouchuyi@bytedance.com>

This patch adds a new hook, bpf_select_task, to oom_evaluate_task. It
takes oc, the task currently being iterated over, and that task's
badness points as parameters, and returns a result indicating which of
the two (oc->chosen or the current task) the bpf program selects.

Although bpf_select_task is used to bypass the default method, some
existing rules should still be obeyed. Specifically, we skip
"unkillable" tasks (e.g., kthread, MMF_OOM_SKIP, in_vfork()). So we do
not consider tasks with the lowest score returned by oom_badness unless
it was caused by OOM_SCORE_ADJ_MIN.

If a prog is attached to the hook, the interface is enabled only once
at least one valid candidate has been chosen in a previous iteration.
This avoids ending up with no victim if the bpf program rejects all
tasks.

Signed-off-by: Chuyi Zhou
---
 mm/oom_kill.c | 57 ++++++++++++++++++++++++++++++++++++++++++++-------
 1 file changed, 50 insertions(+), 7 deletions(-)

diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 612b5597d3af..aec4c55ed49a 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -18,6 +18,7 @@
  * kernel subsystems and hints as to where to find out what things do.
  */
 
+#include
 #include
 #include
 #include
@@ -210,6 +211,16 @@ long oom_badness(struct task_struct *p, unsigned long totalpages)
 	if (!p)
 		return LONG_MIN;
 
+	/*
+	 * If task is allocating a lot of memory and has been marked to be
+	 * killed first if it triggers an oom, then set points to LONG_MAX.
+	 * It will be selected unless we keep oc->chosen through bpf interface.
+	 */
+	if (oom_task_origin(p)) {
+		task_unlock(p);
+		return LONG_MAX;
+	}
+
 	/*
 	 * Do not even consider tasks which are explicitly marked oom
 	 * unkillable or have been already oom reaped or the are in
@@ -305,8 +316,30 @@ static enum oom_constraint constrained_alloc(struct oom_control *oc)
 	return CONSTRAINT_NONE;
 }
 
+enum bpf_select_ret {
+	BPF_SELECT_DISABLE,
+	BPF_SELECT_TASK,
+	BPF_SELECT_CHOSEN,
+};
+
+__weak noinline int bpf_select_task(struct oom_control *oc,
+		struct task_struct *task, long badness_points)
+{
+	return BPF_SELECT_DISABLE;
+}
+
+BTF_SET8_START(oom_bpf_fmodret_ids)
+BTF_ID_FLAGS(func, bpf_select_task)
+BTF_SET8_END(oom_bpf_fmodret_ids)
+
+static const struct btf_kfunc_id_set oom_bpf_fmodret_set = {
+	.owner = THIS_MODULE,
+	.set = &oom_bpf_fmodret_ids,
+};
+
 static int oom_evaluate_task(struct task_struct *task, void *arg)
 {
+	enum bpf_select_ret bpf_ret = BPF_SELECT_DISABLE;
 	struct oom_control *oc = arg;
 	long points;
 
@@ -329,17 +362,23 @@ static int oom_evaluate_task(struct task_struct *task, void *arg)
 		goto abort;
 	}
 
+	points = oom_badness(task, oc->totalpages);
+
 	/*
-	 * If task is allocating a lot of memory and has been marked to be
-	 * killed first if it triggers an oom, then select it.
+	 * Do not consider tasks with the lowest score unless it was caused by
+	 * OOM_SCORE_ADJ_MIN. Give these tasks a chance to be selected by the
+	 * bpf interface.
 	 */
-	if (oom_task_origin(task)) {
-		points = LONG_MAX;
+	if (points == LONG_MIN && task->signal->oom_score_adj != OOM_SCORE_ADJ_MIN)
+		goto next;
+
+	if (oc->chosen)
+		bpf_ret = bpf_select_task(oc, task, points);
+
+	if (bpf_ret == BPF_SELECT_TASK)
 		goto select;
-	}
 
-	points = oom_badness(task, oc->totalpages);
-	if (points == LONG_MIN || points < oc->chosen_points)
+	if (bpf_ret == BPF_SELECT_CHOSEN || points == LONG_MIN || points < oc->chosen_points)
 		goto next;
 
 select:
@@ -732,10 +771,14 @@ static struct ctl_table vm_oom_kill_table[] = {
 
 static int __init oom_init(void)
 {
+	int err;
 	oom_reaper_th = kthread_run(oom_reaper, NULL, "oom_reaper");
 #ifdef CONFIG_SYSCTL
 	register_sysctl_init("vm", vm_oom_kill_table);
 #endif
+	err = register_btf_fmodret_id_set(&oom_bpf_fmodret_set);
+	if (err)
+		pr_warn("error while registering oom fmodret entrypoints: %d\n", err);
 	return 0;
 }
 subsys_initcall(oom_init)

From patchwork Fri Aug 4 09:38:04 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Chuyi Zhou
X-Patchwork-Id: 131085
From: Chuyi Zhou
To: hannes@cmpxchg.org, mhocko@kernel.org, roman.gushchin@linux.dev,
 ast@kernel.org, daniel@iogearbox.net, andrii@kernel.org,
 muchun.song@linux.dev
Cc: bpf@vger.kernel.org, linux-kernel@vger.kernel.org,
 wuyun.abel@bytedance.com, robin.lu@bytedance.com, Chuyi Zhou
Subject: [RFC PATCH 2/2] bpf: Add OOM policy test
Date: Fri, 4 Aug 2023 17:38:04 +0800
Message-Id: <20230804093804.47039-3-zhouchuyi@bytedance.com>
X-Mailer: git-send-email 2.20.1
In-Reply-To: <20230804093804.47039-1-zhouchuyi@bytedance.com>
References: <20230804093804.47039-1-zhouchuyi@bytedance.com>

This patch adds a test which implements a priority-based policy through
bpf_select_task.

The BPF program, oom_policy.c, compares the cgroup priorities of two
tasks and selects the task with the lower priority. The userspace
program, test_oom_policy.c, maintains a priority map that uses cgroup
ids as keys and priorities as values. We can protect certain cgroups
from the oom-killer by giving them a higher priority.

Signed-off-by: Chuyi Zhou
---
 .../bpf/prog_tests/test_oom_policy.c          | 140 ++++++++++++++++++
 .../testing/selftests/bpf/progs/oom_policy.c  |  77 ++++++++++
 2 files changed, 217 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/test_oom_policy.c
 create mode 100644 tools/testing/selftests/bpf/progs/oom_policy.c

diff --git a/tools/testing/selftests/bpf/prog_tests/test_oom_policy.c b/tools/testing/selftests/bpf/prog_tests/test_oom_policy.c
new file mode 100644
index 000000000000..2400cc48ba83
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/test_oom_policy.c
@@ -0,0 +1,140 @@
+// SPDX-License-Identifier: GPL-2.0-only
+#define _GNU_SOURCE
+
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+
+#include "cgroup_helpers.h"
+#include "oom_policy.skel.h"
+
+static int map_fd;
+static int cg_nr;
+struct {
+	const char *path;
+	int fd;
+	unsigned long long id;
+} cgs[] = {
+	{ "/cg1" },
+	{ "/cg2" },
+};
+
+
+static struct oom_policy *open_load_oom_policy_skel(void)
+{
+	struct oom_policy *skel;
+	int err;
+
+	skel = oom_policy__open();
+	if (!ASSERT_OK_PTR(skel, "skel_open"))
+		return NULL;
+
+	err = oom_policy__load(skel);
+	if (!ASSERT_OK(err, "skel_load"))
+		goto cleanup;
+
+	return skel;
+
+cleanup:
+	oom_policy__destroy(skel);
+	return NULL;
+}
+
+static void run_memory_consume(unsigned long long consume_size, int idx)
+{
+	char *buf;
+
+	join_parent_cgroup(cgs[idx].path);
+	buf = malloc(consume_size);
+	memset(buf, 0, consume_size);
+	sleep(2);
+	exit(0);
+}
+
+static int set_cgroup_prio(unsigned long long cg_id, int prio)
+{
+	int err;
+
+	err = bpf_map_update_elem(map_fd, &cg_id, &prio, BPF_ANY);
+	ASSERT_EQ(err, 0, "update_map");
+	return err;
+}
+
+static int prepare_cgroup_environment(void)
+{
+	int err;
+
+	err = setup_cgroup_environment();
+	if (err)
+		goto clean_cg_env;
+	for (int i = 0; i < cg_nr; i++) {
+		err = cgs[i].fd = create_and_get_cgroup(cgs[i].path);
+		if (!ASSERT_GE(cgs[i].fd, 0, "cg_create"))
+			goto clean_cg_env;
+		cgs[i].id = get_cgroup_id(cgs[i].path);
+	}
+	return 0;
+clean_cg_env:
+	cleanup_cgroup_environment();
+	return err;
+}
+
+void test_oom_policy(void)
+{
+	struct oom_policy *skel;
+	struct bpf_link *link;
+	int err;
+	int victim_pid;
+	unsigned long long victim_cg_id;
+
+	link = NULL;
+	cg_nr = ARRAY_SIZE(cgs);
+
+	skel = open_load_oom_policy_skel();
+	err = oom_policy__attach(skel);
+	if (!ASSERT_OK(err, "oom_policy__attach"))
+		goto cleanup;
+
+	map_fd = bpf_object__find_map_fd_by_name(skel->obj, "cg_map");
+	if (!ASSERT_GE(map_fd, 0, "find map"))
+		goto cleanup;
+
+	err = prepare_cgroup_environment();
+	if (!ASSERT_EQ(err, 0, "prepare cgroup env"))
+		goto cleanup;
+
+	write_cgroup_file("/", "memory.max", "10M");
+
+	/*
+	 * Give cg2 a higher priority than cg1, so that a task in cg1 is
+	 * selected as the victim (see oom_policy.c).
+	 */
+	set_cgroup_prio(cgs[0].id, 10);
+	set_cgroup_prio(cgs[1].id, 50);
+
+	victim_cg_id = cgs[0].id;
+	victim_pid = fork();
+
+	if (victim_pid == 0)
+		run_memory_consume(1024 * 1024 * 4, 0);
+
+	if (fork() == 0)
+		run_memory_consume(1024 * 1024 * 8, 1);
+
+	while (wait(NULL) > 0)
+		;
+
+	ASSERT_EQ(skel->bss->victim_pid, victim_pid, "victim_pid");
+	ASSERT_EQ(skel->bss->victim_cg_id, victim_cg_id, "victim_cgid");
+
+cleanup:
+	bpf_link__destroy(link);
+	oom_policy__destroy(skel);
+	cleanup_cgroup_environment();
+}
diff --git a/tools/testing/selftests/bpf/progs/oom_policy.c b/tools/testing/selftests/bpf/progs/oom_policy.c
new file mode 100644
index 000000000000..d269ea52bcb2
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/oom_policy.c
@@ -0,0 +1,77 @@
+// SPDX-License-Identifier: GPL-2.0-only
+#include
+#include
+#include
+
+char _license[] SEC("license") = "GPL";
+struct {
+	__uint(type, BPF_MAP_TYPE_HASH);
+	__type(key, u64);
+	__type(value, int);
+	__uint(max_entries, 24);
+} cg_map SEC(".maps");
+
+unsigned int victim_pid;
+u64 victim_cg_id;
+
+enum bpf_select_ret {
+	BPF_SELECT_DISABLE,
+	BPF_SELECT_TASK,
+	BPF_SELECT_CHOSEN,
+};
+
+static __always_inline u64 task_cgroup_id(struct task_struct *task)
+{
+	struct kernfs_node *node;
+	struct task_group *tg;
+
+	if (!task)
+		return 0;
+
+	tg = task->sched_task_group;
+	node = tg->css.cgroup->kn;
+
+	return node->id;
+}
+
+SEC("fentry/oom_kill_process")
+int BPF_PROG(oom_kill_process_k, struct oom_control *oc, const char *message)
+{
+	struct task_struct *victim = oc->chosen;
+
+	if (!victim)
+		return 0;
+
+	victim_pid = victim->pid;
+	victim_cg_id = task_cgroup_id(victim);
+	return 0;
+}
+
+SEC("fmod_ret/bpf_select_task")
+int BPF_PROG(select_task_test, struct oom_control *oc, struct task_struct *task, long points)
+{
+	u64 chosen_cg_id, task_cg_id;
+	int chosen_cg_prio, task_cg_prio;
+	struct task_struct *chosen;
+	int *val;
+
+	chosen = oc->chosen;
+	chosen_cg_id = task_cgroup_id(chosen);
+	task_cg_id = task_cgroup_id(task);
+	chosen_cg_prio = task_cg_prio = 0;
+	val = bpf_map_lookup_elem(&cg_map, &chosen_cg_id);
+	if (val)
+		chosen_cg_prio = *val;
+	val = bpf_map_lookup_elem(&cg_map, &task_cg_id);
+	if (val)
+		task_cg_prio = *val;
+
+	if (chosen_cg_prio > task_cg_prio)
+		return BPF_SELECT_TASK;
+	if (chosen_cg_prio < task_cg_prio)
+		return BPF_SELECT_CHOSEN;
+
+	return BPF_SELECT_DISABLE;
+
+}
+
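
The selection rules described in patch 1/2 can be modelled in plain
userspace C, which may help when reasoning about the three return values
without booting a test kernel. This is only a rough sketch and not part of
the series: struct model_task, prio_policy() and pick_victim() are
hypothetical names invented here, the prio field stands in for the cgroup
priority looked up by the selftest, and the earlier checks in
oom_evaluate_task (oom_unkillable_task and friends) are left out.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Mirrors enum bpf_select_ret from the patch (names shortened). */
enum select_ret { SELECT_DISABLE, SELECT_TASK, SELECT_CHOSEN };

/* Hypothetical stand-in for a task: only what the model needs. */
struct model_task {
	int pid;
	long points;	/* what oom_badness() would have returned */
	int prio;	/* assumed cgroup priority, as in the selftest map */
	int adj_min;	/* nonzero if oom_score_adj == OOM_SCORE_ADJ_MIN */
};

typedef enum select_ret (*policy_fn)(const struct model_task *chosen,
				     const struct model_task *task);

/* Same comparison as select_task_test: prefer the lower priority. */
static enum select_ret prio_policy(const struct model_task *chosen,
				   const struct model_task *task)
{
	if (chosen->prio > task->prio)
		return SELECT_TASK;
	if (chosen->prio < task->prio)
		return SELECT_CHOSEN;
	return SELECT_DISABLE;
}

/* Walk the tasks the way the patched loop does; return victim pid or -1. */
static int pick_victim(const struct model_task *tasks, size_t n,
		       policy_fn policy)
{
	const struct model_task *chosen = NULL;
	long chosen_points = LONG_MIN;

	assert(tasks || n == 0);

	for (size_t i = 0; i < n; i++) {
		const struct model_task *t = &tasks[i];
		enum select_ret ret = SELECT_DISABLE;

		/* Unkillable tasks never reach the policy. */
		if (t->points == LONG_MIN && !t->adj_min)
			continue;

		/* The policy is consulted only once a candidate exists. */
		if (chosen && policy)
			ret = policy(chosen, t);

		/* Fall back to the default score comparison on DISABLE. */
		if (ret != SELECT_TASK &&
		    (ret == SELECT_CHOSEN || t->points == LONG_MIN ||
		     t->points < chosen_points))
			continue;

		chosen = t;
		chosen_points = t->points;
	}

	return chosen ? chosen->pid : -1;
}
```

Passing NULL as the policy gives the default behaviour (highest score
wins). With prio_policy, a task in a lower-priority group replaces the
current choice even if its score is smaller, and a task whose LONG_MIN
score comes from OOM_SCORE_ADJ_MIN can be picked only through the policy,
never through the score comparison.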