From patchwork Thu Aug 10 08:13:15 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Chuyi Zhou
X-Patchwork-Id: 133770
From: Chuyi Zhou
To: hannes@cmpxchg.org, mhocko@kernel.org, roman.gushchin@linux.dev,
 ast@kernel.org, daniel@iogearbox.net, andrii@kernel.org,
 muchun.song@linux.dev
Cc: bpf@vger.kernel.org, linux-kernel@vger.kernel.org,
 wuyun.abel@bytedance.com, robin.lu@bytedance.com, Chuyi Zhou,
 Michal Hocko
Subject: [RFC PATCH v2 1/5] mm, oom: Introduce bpf_oom_evaluate_task
Date: Thu, 10 Aug 2023 16:13:15 +0800
Message-Id: <20230810081319.65668-2-zhouchuyi@bytedance.com>
X-Mailer: git-send-email 2.20.1
In-Reply-To: <20230810081319.65668-1-zhouchuyi@bytedance.com>
References: <20230810081319.65668-1-zhouchuyi@bytedance.com>
X-Mailing-List: linux-kernel@vger.kernel.org

This patch adds a new hook, bpf_oom_evaluate_task, to oom_evaluate_task.
It takes oc and the task currently being iterated as parameters and
returns a verdict: abort the search, skip the task, or select the task.
Without an attached BPF program the stub returns NO_BPF_POLICY and the
existing logic runs. Attached BPF programs can use the hook to bypass
the current logic of oom_evaluate_task and implement customized OOM
policies.

Suggested-by: Michal Hocko
Signed-off-by: Chuyi Zhou
---
 mm/oom_kill.c | 59 +++++++++++++++++++++++++++++++++++++++++++--------
 1 file changed, 50 insertions(+), 9 deletions(-)

diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 612b5597d3af..255c9ef1d808 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -18,6 +18,7 @@
  * kernel subsystems and hints as to where to find out what things do.
  */
 
+#include
 #include
 #include
 #include
@@ -305,6 +306,27 @@ static enum oom_constraint constrained_alloc(struct oom_control *oc)
 	return CONSTRAINT_NONE;
 }
 
+enum {
+	NO_BPF_POLICY,
+	BPF_EVAL_ABORT,
+	BPF_EVAL_NEXT,
+	BPF_EVAL_SELECT,
+};
+
+__weak noinline int bpf_oom_evaluate_task(struct task_struct *task, struct oom_control *oc)
+{
+	return NO_BPF_POLICY;
+}
+
+BTF_SET8_START(oom_bpf_fmodret_ids)
+BTF_ID_FLAGS(func, bpf_oom_evaluate_task)
+BTF_SET8_END(oom_bpf_fmodret_ids)
+
+static const struct btf_kfunc_id_set oom_bpf_fmodret_set = {
+	.owner = THIS_MODULE,
+	.set = &oom_bpf_fmodret_ids,
+};
+
 static int oom_evaluate_task(struct task_struct *task, void *arg)
 {
 	struct oom_control *oc = arg;
@@ -317,6 +339,26 @@ static int oom_evaluate_task(struct task_struct *task, void *arg)
 	if (!is_memcg_oom(oc) && !oom_cpuset_eligible(task, oc))
 		goto next;
 
+	/*
+	 * If task is allocating a lot of memory and has been marked to be
+	 * killed first if it triggers an oom, then select it.
+	 */
+	if (oom_task_origin(task)) {
+		points = LONG_MAX;
+		goto select;
+	}
+
+	switch (bpf_oom_evaluate_task(task, oc)) {
+	case BPF_EVAL_ABORT:
+		goto abort; /* abort search process */
+	case BPF_EVAL_NEXT:
+		goto next; /* ignore the task */
+	case BPF_EVAL_SELECT:
+		goto select; /* select the task */
+	default:
+		break; /* No BPF policy */
+	}
+
 	/*
 	 * This task already has access to memory reserves and is being killed.
 	 * Don't allow any other task to have access to the reserves unless
@@ -329,15 +371,6 @@ static int oom_evaluate_task(struct task_struct *task, void *arg)
 		goto abort;
 	}
 
-	/*
-	 * If task is allocating a lot of memory and has been marked to be
-	 * killed first if it triggers an oom, then select it.
-	 */
-	if (oom_task_origin(task)) {
-		points = LONG_MAX;
-		goto select;
-	}
-
 	points = oom_badness(task, oc->totalpages);
 	if (points == LONG_MIN || points < oc->chosen_points)
 		goto next;
@@ -732,10 +765,18 @@ static struct ctl_table vm_oom_kill_table[] = {
 
 static int __init oom_init(void)
 {
+	int err;
 	oom_reaper_th = kthread_run(oom_reaper, NULL, "oom_reaper");
 #ifdef CONFIG_SYSCTL
 	register_sysctl_init("vm", vm_oom_kill_table);
 #endif
+
+#ifdef CONFIG_BPF_SYSCALL
+	err = register_btf_fmodret_id_set(&oom_bpf_fmodret_set);
+	if (err)
+		pr_warn("error while registering oom fmodret entrypoints: %d", err);
+#endif
+
 	return 0;
 }
 subsys_initcall(oom_init)

From patchwork Thu Aug 10 08:13:16 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Chuyi Zhou
X-Patchwork-Id: 133771
From: Chuyi Zhou
To: hannes@cmpxchg.org, mhocko@kernel.org, roman.gushchin@linux.dev,
 ast@kernel.org, daniel@iogearbox.net, andrii@kernel.org,
 muchun.song@linux.dev
Cc: bpf@vger.kernel.org, linux-kernel@vger.kernel.org,
 wuyun.abel@bytedance.com, robin.lu@bytedance.com, Chuyi Zhou
Subject: [RFC PATCH v2 2/5] mm: Add policy_name to identify OOM policies
Date: Thu, 10 Aug 2023 16:13:16 +0800
Message-Id: <20230810081319.65668-3-zhouchuyi@bytedance.com>
X-Mailer: git-send-email 2.20.1
In-Reply-To: <20230810081319.65668-1-zhouchuyi@bytedance.com>
References: <20230810081319.65668-1-zhouchuyi@bytedance.com>
X-Mailing-List: linux-kernel@vger.kernel.org

This patch adds a new metadata field, policy_name, to oom_control and
reports it in dump_header(), so the log shows which selection policy was
in effect. A BPF program can call the kfunc set_oom_policy_name to set
the name of the current user-defined policy. The in-kernel policy_name
is "default".

Signed-off-by: Chuyi Zhou
---
 include/linux/oom.h |  7 +++++++
 mm/oom_kill.c       | 42 +++++++++++++++++++++++++++++++++++++++---
 2 files changed, 46 insertions(+), 3 deletions(-)

diff --git a/include/linux/oom.h b/include/linux/oom.h
index 7d0c9c48a0c5..69d0f2ec6ea6 100644
--- a/include/linux/oom.h
+++ b/include/linux/oom.h
@@ -22,6 +22,10 @@ enum oom_constraint {
 	CONSTRAINT_MEMCG,
 };
 
+enum {
+	POLICY_NAME_LEN = 16,
+};
+
 /*
  * Details of the page allocation that triggered the oom killer that are used to
  * determine what should be killed.
@@ -52,6 +56,9 @@ struct oom_control {
 
 	/* Used to print the constraint info. */
 	enum oom_constraint constraint;
+
+	/* Used to report the policy info. */
+	char policy_name[POLICY_NAME_LEN];
 };
 
 extern struct mutex oom_lock;
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 255c9ef1d808..3239dcdba4d7 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -443,6 +443,35 @@ static int dump_task(struct task_struct *p, void *arg)
 	return 0;
 }
 
+__bpf_kfunc void set_oom_policy_name(struct oom_control *oc, const char *src, size_t sz)
+{
+	memset(oc->policy_name, 0, sizeof(oc->policy_name));
+
+	if (sz > POLICY_NAME_LEN)
+		sz = POLICY_NAME_LEN;
+
+	memcpy(oc->policy_name, src, sz);
+}
+
+__diag_push();
+__diag_ignore_all("-Wmissing-prototypes",
+		  "kfuncs which will be used in BPF programs");
+
+__weak noinline void bpf_set_policy_name(struct oom_control *oc)
+{
+}
+
+__diag_pop();
+
+BTF_SET8_START(bpf_oom_policy_kfunc_ids)
+BTF_ID_FLAGS(func, set_oom_policy_name)
+BTF_SET8_END(bpf_oom_policy_kfunc_ids)
+
+static const struct btf_kfunc_id_set bpf_oom_policy_kfunc_set = {
+	.owner = THIS_MODULE,
+	.set = &bpf_oom_policy_kfunc_ids,
+};
+
 /**
  * dump_tasks - dump current memory state of all system tasks
  * @oc: pointer to struct oom_control
@@ -484,8 +513,8 @@ static void dump_oom_summary(struct oom_control *oc, struct task_struct *victim)
 
 static void dump_header(struct oom_control *oc, struct task_struct *p)
 {
-	pr_warn("%s invoked oom-killer: gfp_mask=%#x(%pGg), order=%d, oom_score_adj=%hd\n",
-		current->comm, oc->gfp_mask, &oc->gfp_mask, oc->order,
+	pr_warn("%s invoked oom-killer: gfp_mask=%#x(%pGg), order=%d, policy_name=%s, oom_score_adj=%hd\n",
+		current->comm, oc->gfp_mask, &oc->gfp_mask, oc->order, oc->policy_name,
 			current->signal->oom_score_adj);
 	if (!IS_ENABLED(CONFIG_COMPACTION) && oc->order)
 		pr_warn("COMPACTION is disabled!!!\n");
@@ -775,8 +804,11 @@ static int __init oom_init(void)
 	err = register_btf_fmodret_id_set(&oom_bpf_fmodret_set);
 	if (err)
 		pr_warn("error while registering oom fmodret entrypoints: %d", err);
+	err = register_btf_kfunc_id_set(BPF_PROG_TYPE_TRACING,
+				&bpf_oom_policy_kfunc_set);
+	if (err)
+		pr_warn("error while registering oom kfunc entrypoints: %d", err);
 #endif
-
 	return 0;
 }
 subsys_initcall(oom_init)
@@ -1196,6 +1228,10 @@ bool out_of_memory(struct oom_control *oc)
 		return true;
 	}
 
+	set_oom_policy_name(oc, "default", sizeof("default"));
+
+	bpf_set_policy_name(oc);
+
 	select_bad_process(oc);
 	/* Found nothing?!?! */
 	if (!oc->chosen) {

From patchwork Thu Aug 10 08:13:17 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Chuyi Zhou
X-Patchwork-Id: 133875
From: Chuyi Zhou
To: hannes@cmpxchg.org, mhocko@kernel.org, roman.gushchin@linux.dev,
 ast@kernel.org, daniel@iogearbox.net, andrii@kernel.org,
 muchun.song@linux.dev
Cc: bpf@vger.kernel.org, linux-kernel@vger.kernel.org,
 wuyun.abel@bytedance.com, robin.lu@bytedance.com, Chuyi Zhou,
 Alan Maguire
Subject: [RFC PATCH v2 3/5] mm: Add a tracepoint when OOM victim selection is failed
Date: Thu, 10 Aug 2023 16:13:17 +0800
Message-Id: <20230810081319.65668-4-zhouchuyi@bytedance.com>
X-Mailer: git-send-email 2.20.1
In-Reply-To: <20230810081319.65668-1-zhouchuyi@bytedance.com>
References: <20230810081319.65668-1-zhouchuyi@bytedance.com>
X-Mailing-List: linux-kernel@vger.kernel.org

This patch adds a tracepoint that marks the case where the OOM killer
found nothing to kill. This allows BPF programs to detect that the BPF
OOM policy did not work well.

Suggested-by: Alan Maguire Signed-off-by: Chuyi Zhou --- include/trace/events/oom.h | 18 ++++++++++++++++++ mm/oom_kill.c | 1 + 2 files changed, 19 insertions(+) diff --git a/include/trace/events/oom.h b/include/trace/events/oom.h index 26a11e4a2c36..b6ae1134229c 100644 --- a/include/trace/events/oom.h +++ b/include/trace/events/oom.h @@ -6,6 +6,7 @@ #define _TRACE_OOM_H #include #include +#include TRACE_EVENT(oom_score_adj_update, @@ -151,6 +152,23 @@ TRACE_EVENT(skip_task_reaping, TP_printk("pid=%d", __entry->pid) ); +TRACE_EVENT(select_bad_process_end, + + TP_PROTO(struct oom_control *oc), + + TP_ARGS(oc), + + TP_STRUCT__entry( + __array(char, policy_name, POLICY_NAME_LEN) + ), + + TP_fast_assign( + memcpy(__entry->policy_name, oc->policy_name, POLICY_NAME_LEN); + ), + + TP_printk("policy_name=%s", __entry->policy_name) +); + #ifdef CONFIG_COMPACTION TRACE_EVENT(compact_retry, diff --git a/mm/oom_kill.c b/mm/oom_kill.c index 3239dcdba4d7..af40a1b750fa 100644 --- a/mm/oom_kill.c +++ b/mm/oom_kill.c @@ -1235,6 +1235,7 @@ bool out_of_memory(struct oom_control *oc) select_bad_process(oc); /* Found nothing?!?! 
*/ if (!oc->chosen) { + trace_select_bad_process_end(oc); dump_header(oc, NULL); pr_warn("Out of memory and no killable processes...\n"); /* From patchwork Thu Aug 10 08:13:18 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Chuyi Zhou X-Patchwork-Id: 133775 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:b824:0:b0:3f2:4152:657d with SMTP id z4csp284389vqi; Thu, 10 Aug 2023 02:07:00 -0700 (PDT) X-Google-Smtp-Source: AGHT+IGAUKUaJ4bmLaXSPEdhTlbnXY9fRAgoh7o4JpV3xm30mY8+5HCkOy8/L3qE6qfCnm6TvV1P X-Received: by 2002:a05:6808:14c8:b0:3a7:56a1:9bbe with SMTP id f8-20020a05680814c800b003a756a19bbemr2527907oiw.45.1691658419884; Thu, 10 Aug 2023 02:06:59 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1691658419; cv=none; d=google.com; s=arc-20160816; b=H3k6WheGQPHb83jmG1VawNkEiLbChaTIAH25SnLMYblcP3Dy1CfGZnhCc8gNkuYFXv DCsNQTzjAP8k47jlranuQn8UeEM7XPEWJk9RBY6R5qk+w7Q96SzXLlZ/ywyF+MgfFB/F lSosbFMvfqG6mxPt1ez5CNM2MFeC/ZqkL3RUc8pWH7U5zq0Q3P0c1tNoUL53eujm5D4R ewEU+ECNdSZ+hHGtQIzJeJsxdvRgWXtQ8H+FwvMWSsuFdysTzQul5KwLgep8loL3kJVp x2meuTmVXzdy0GyLyycHceDSt8zkLb3d10ho1Vxgb8xVqx7f9lgS5Hws/ppa1ptitdUG y7cQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=LEsujYzeHDJksabxHXYH7r+I+WQ8F1nGq024d8cm1KU=; fh=5K/5O+uaWy/TXWHq+XYNoCsNLzz39Nw0/H1YBlI/NpI=; b=nzwfW0C5syi64K/iraqp3tbsbto/SDoMX23PivFZcBt3ETg7dPOY3Uu9lacmDPMyfa XRzgBke0OKE6438UoHdBtwR+jbFw2uQkIclACkGpORrsOe4E3gZDFBURgnmpIKbwQ917 iCA6pKjS8uA7KD+7IACd5hZ2l1pUx2UpesMvfH0R/3V7awIKnxh+eiCJeULvgz3IPU0F wNeN448s6AaEC9vvmvRD4vqezR9A1SRQ7ji1gf55vFCgzbJjAmtdxmCxVW+3BCg7Bgj/ yeLHd6S6Quma67SoDSDma1V3SdmNL3HTdwe58S52nAekDLIp1g5BKQFJONJSf04pzS4Y Wahw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@bytedance.com header.s=google header.b=VuTFtoYE; spf=pass 
(google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=QUARANTINE sp=QUARANTINE dis=NONE) header.from=bytedance.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id i22-20020a17090adc1600b00268414272d1si1171962pjv.156.2023.08.10.02.06.47; Thu, 10 Aug 2023 02:06:59 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@bytedance.com header.s=google header.b=VuTFtoYE; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=QUARANTINE sp=QUARANTINE dis=NONE) header.from=bytedance.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234114AbjHJIOQ (ORCPT + 99 others); Thu, 10 Aug 2023 04:14:16 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:41528 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234176AbjHJIOL (ORCPT ); Thu, 10 Aug 2023 04:14:11 -0400 Received: from mail-pl1-x634.google.com (mail-pl1-x634.google.com [IPv6:2607:f8b0:4864:20::634]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id CA7A32136 for ; Thu, 10 Aug 2023 01:13:45 -0700 (PDT) Received: by mail-pl1-x634.google.com with SMTP id d9443c01a7336-1bc8045e09dso5147745ad.0 for ; Thu, 10 Aug 2023 01:13:45 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bytedance.com; s=google; t=1691655225; x=1692260025; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=LEsujYzeHDJksabxHXYH7r+I+WQ8F1nGq024d8cm1KU=; 
b=VuTFtoYEBUXlmZ/xn9ftnduPgMunFsfNJN8FNbD7NPo+XLvQDZC82CdMZileZgf7Ju J6N3h2CD+mAIE6hKRtUvuxSpadHUagPDYdUrTa4ay0ncTKS2MyjfNBe6eVPT14fr1HD5 kq6marMYBOiAeqDY6CTWVjuC3XM5YtbqNlCNH/aIP8g8SlkuknQocQyNqJtsiB6YzuhB uIJNNN8uyjXlvs3eP+8s4hLCB1yjUKioQZcsW+p7dsPom0psuNhy8Tf73mLptX1KoVFy 9AQav14P/kk92N8V/4fpdXgdt5oHI3z4UP1dFui92Em4UReElSPkLCYpVCWza+wM+d8m Fu1Q== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1691655225; x=1692260025; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=LEsujYzeHDJksabxHXYH7r+I+WQ8F1nGq024d8cm1KU=; b=ZDKJpUds0Y+FmLJz1AVlhEkLw90l0C6w4rt07VGYlXf2ZlxkRe0q/uqg7hSWGsEnNA /qtA7xmPVK1KbfBB08VqDYI7KKIueEmH7z4/cirLycGKi9B/8Sj7zaEVuFlDL5n+3FKa maVrXTC19DohVO3+L8TYOlamleNeT8MA2CuH4oH2BHC50wjAHwGvhR6YdpU9RFJcGOmm YwuXXCGcrqJuRO3kvbglt0XCRalQc9NEagE8dUGA/f6rpbuOM6CuLaV+q2ShtOvvBVhM ju/IwZM1lVMFDSZc/7N8U94sBcZWK1PWt20jH52kiUpzbwT73CP5ziZh+TgIu0woeAgP c8Yw== X-Gm-Message-State: AOJu0YwG1+t0gwqV8xAtJTbZSE/UQpVrpQ0maOOylj7c+IEBpwg6m8y6 IcEfMFRhICsXiMgGxjc6dk1JHg== X-Received: by 2002:a17:903:11c8:b0:1b6:4bbd:c3a7 with SMTP id q8-20020a17090311c800b001b64bbdc3a7mr1431227plh.66.1691655225352; Thu, 10 Aug 2023 01:13:45 -0700 (PDT) Received: from n37-019-243.byted.org ([180.184.51.40]) by smtp.gmail.com with ESMTPSA id x12-20020a170902ec8c00b001b1a2c14a4asm1019036plg.38.2023.08.10.01.13.42 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 10 Aug 2023 01:13:45 -0700 (PDT) From: Chuyi Zhou To: hannes@cmpxchg.org, mhocko@kernel.org, roman.gushchin@linux.dev, ast@kernel.org, daniel@iogearbox.net, andrii@kernel.org, muchun.song@linux.dev Cc: bpf@vger.kernel.org, linux-kernel@vger.kernel.org, wuyun.abel@bytedance.com, robin.lu@bytedance.com, Chuyi Zhou Subject: [RFC PATCH v2 4/5] bpf: Add a OOM policy test Date: Thu, 10 Aug 2023 16:13:18 +0800 Message-Id: 
<20230810081319.65668-5-zhouchuyi@bytedance.com> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20230810081319.65668-1-zhouchuyi@bytedance.com> References: <20230810081319.65668-1-zhouchuyi@bytedance.com> MIME-Version: 1.0 X-Spam-Status: No, score=-2.1 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_BLOCKED, SPF_HELO_NONE,SPF_NONE autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: INBOX X-GMAIL-THRID: 1773832419581567697 X-GMAIL-MSGID: 1773832419581567697 This patch adds a test which implements a priority-based policy through bpf_oom_evaluate_task. The BPF program, oom_policy.c, compares the cgroup priorities of two tasks and selects the task whose cgroup has the lower priority. The userspace program, test_oom_policy.c, maintains a priority map with cgroup IDs as keys and priorities as values. Certain cgroups can be protected from the OOM killer by giving them a higher priority. 
Signed-off-by: Chuyi Zhou --- .../bpf/prog_tests/test_oom_policy.c | 140 ++++++++++++++++++ .../testing/selftests/bpf/progs/oom_policy.c | 104 +++++++++++++ 2 files changed, 244 insertions(+) create mode 100644 tools/testing/selftests/bpf/prog_tests/test_oom_policy.c create mode 100644 tools/testing/selftests/bpf/progs/oom_policy.c diff --git a/tools/testing/selftests/bpf/prog_tests/test_oom_policy.c b/tools/testing/selftests/bpf/prog_tests/test_oom_policy.c new file mode 100644 index 000000000000..bea61ff22603 --- /dev/null +++ b/tools/testing/selftests/bpf/prog_tests/test_oom_policy.c @@ -0,0 +1,140 @@ +// SPDX-License-Identifier: GPL-2.0-only +#define _GNU_SOURCE + +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include "cgroup_helpers.h" +#include "oom_policy.skel.h" + +static int map_fd; +static int cg_nr; +struct { + const char *path; + int fd; + unsigned long long id; +} cgs[] = { + { "/cg1" }, + { "/cg2" }, +}; + + +static struct oom_policy *open_load_oom_policy_skel(void) +{ + struct oom_policy *skel; + int err; + + skel = oom_policy__open(); + if (!ASSERT_OK_PTR(skel, "skel_open")) + return NULL; + + err = oom_policy__load(skel); + if (!ASSERT_OK(err, "skel_load")) + goto cleanup; + + return skel; + +cleanup: + oom_policy__destroy(skel); + return NULL; +} + +static void run_memory_consume(unsigned long long consume_size, int idx) +{ + char *buf; + + join_parent_cgroup(cgs[idx].path); + buf = malloc(consume_size); + memset(buf, 0, consume_size); + sleep(2); + exit(0); +} + +static int set_cgroup_prio(unsigned long long cg_id, int prio) +{ + int err; + + err = bpf_map_update_elem(map_fd, &cg_id, &prio, BPF_ANY); + ASSERT_EQ(err, 0, "update_map"); + return err; +} + +static int prepare_cgroup_environment(void) +{ + int err; + + err = setup_cgroup_environment(); + if (err) + goto clean_cg_env; + for (int i = 0; i < cg_nr; i++) { + err = cgs[i].fd = create_and_get_cgroup(cgs[i].path); + if (!ASSERT_GE(cgs[i].fd, 0, 
"cg_create")) + goto clean_cg_env; + cgs[i].id = get_cgroup_id(cgs[i].path); + } + return 0; +clean_cg_env: + cleanup_cgroup_environment(); + return err; +} + +void test_oom_policy(void) +{ + struct oom_policy *skel; + struct bpf_link *link; + int err; + int victim_pid; + unsigned long long victim_cg_id; + + link = NULL; + cg_nr = ARRAY_SIZE(cgs); + + skel = open_load_oom_policy_skel(); + err = oom_policy__attach(skel); + if (!ASSERT_OK(err, "oom_policy__attach")) + goto cleanup; + + map_fd = bpf_object__find_map_fd_by_name(skel->obj, "cg_map"); + if (!ASSERT_GE(map_fd, 0, "find map")) + goto cleanup; + + err = prepare_cgroup_environment(); + if (!ASSERT_EQ(err, 0, "prepare cgroup env")) + goto cleanup; + + write_cgroup_file("/", "memory.max", "10M"); + + /* + * Set higher priority to cg2 and lower to cg1, so we would select + * task under cg1 as victim.(see oom_policy.c) + */ + set_cgroup_prio(cgs[0].id, 10); + set_cgroup_prio(cgs[1].id, 50); + + victim_cg_id = cgs[0].id; + victim_pid = fork(); + + if (victim_pid == 0) + run_memory_consume(1024 * 1024 * 4, 0); + + if (fork() == 0) + run_memory_consume(1024 * 1024 * 8, 1); + + while (wait(NULL) > 0) + ; + + ASSERT_EQ(skel->bss->victim_pid, victim_pid, "victim_pid"); + ASSERT_EQ(skel->bss->victim_cg_id, victim_cg_id, "victim_cgid"); + ASSERT_EQ(skel->bss->failed_cnt, 1, "failed_cnt"); +cleanup: + bpf_link__destroy(link); + oom_policy__destroy(skel); + cleanup_cgroup_environment(); +} diff --git a/tools/testing/selftests/bpf/progs/oom_policy.c b/tools/testing/selftests/bpf/progs/oom_policy.c new file mode 100644 index 000000000000..fc9efc93914e --- /dev/null +++ b/tools/testing/selftests/bpf/progs/oom_policy.c @@ -0,0 +1,104 @@ +// SPDX-License-Identifier: GPL-2.0-only +#include +#include +#include + +char _license[] SEC("license") = "GPL"; + +struct { + __uint(type, BPF_MAP_TYPE_HASH); + __type(key, int); + __type(value, int); + __uint(max_entries, 24); +} cg_map SEC(".maps"); + +unsigned int victim_pid; +u64 
victim_cg_id; +int failed_cnt; + +#define EOPNOTSUPP 95 + +enum { + NO_BPF_POLICY, + BPF_EVAL_ABORT, + BPF_EVAL_NEXT, + BPF_EVAL_SELECT, +}; + +extern void set_oom_policy_name(struct oom_control *oc, const char *buf, size_t sz) __ksym; + +static __always_inline u64 task_cgroup_id(struct task_struct *task) +{ + struct kernfs_node *node; + struct task_group *tg; + + if (!task) + return 0; + + tg = task->sched_task_group; + node = tg->css.cgroup->kn; + + return node->id; +} + +SEC("fentry/oom_kill_process") +int BPF_PROG(oom_kill_process_k, struct oom_control *oc, const char *message) +{ + struct task_struct *victim = oc->chosen; + + if (victim) { + victim_cg_id = task_cgroup_id(victim); + victim_pid = victim->pid; + } + + return 0; +} + +SEC("fentry/bpf_set_policy_name") +int BPF_PROG(set_policy_name_k, struct oom_control *oc) +{ + char name[] = "cg_prio"; + set_oom_policy_name(oc, name, sizeof(name)); + return 0; +} + +SEC("tp_btf/select_bad_process_end") +int BPF_PROG(record_failed, struct oom_control *oc) +{ + failed_cnt += 1; + return 0; +} + +SEC("fmod_ret/bpf_oom_evaluate_task") +int BPF_PROG(bpf_oom_evaluate_task, struct task_struct *task, struct oom_control *oc) +{ + int chosen_cg_prio, task_cg_prio; + u64 chosen_cg_id, task_cg_id; + struct task_struct *chosen; + int *val; + + if (!failed_cnt) + return BPF_EVAL_NEXT; + + chosen = oc->chosen; + if (!chosen) + return BPF_EVAL_SELECT; + + chosen_cg_id = task_cgroup_id(chosen); + task_cg_id = task_cgroup_id(task); + chosen_cg_prio = task_cg_prio = 0; + val = bpf_map_lookup_elem(&cg_map, &chosen_cg_id); + if (val) + chosen_cg_prio = *val; + val = bpf_map_lookup_elem(&cg_map, &task_cg_id); + if (val) + task_cg_prio = *val; + + if (chosen_cg_prio > task_cg_prio) + return BPF_EVAL_SELECT; + if (chosen_cg_prio < task_cg_prio) + return BPF_EVAL_NEXT; + + return NO_BPF_POLICY; +} + From patchwork Thu Aug 10 08:13:19 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit 
X-Patchwork-Submitter: Chuyi Zhou X-Patchwork-Id: 133763 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:b824:0:b0:3f2:4152:657d with SMTP id z4csp278616vqi; Thu, 10 Aug 2023 01:55:01 -0700 (PDT) X-Google-Smtp-Source: AGHT+IE38Lwk+nuG1HB00vh5d+rexn/7BDgD/lgqDq/xAbM/yG6Dgcnb0mJF4Jedh4ScmwKGpVIA X-Received: by 2002:a05:6a20:9143:b0:13a:ccb9:d5b7 with SMTP id x3-20020a056a20914300b0013accb9d5b7mr1922965pzc.41.1691657701621; Thu, 10 Aug 2023 01:55:01 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1691657701; cv=none; d=google.com; s=arc-20160816; b=unVJbMvVWlUjHDEZK2y7In213sq94l3rTFaRrcCWrY7ALkSJuBGbsCymSo9AaZ5D9+ bfGFlGxQT6CEOxD0Yl50l0dxqICHc1bFCVvKNBAgQvKOYtMBfxJDUtQzpoY6Pk4xFDvn IRDnUykCDeG9yYwRaVyP20sTCTz9NrYsryp5Kz1XlDrHTYYKMvO2/IcGEqHBcPKTcn93 x5bqmKZbeGIFo3v3zKP5PCILrFBXkiUiie9vKBkrE7BEZ4X4NouxEc9l0B2KVttOi0r5 eD3D4U6dgZrbkHCJZYQ5zrq7RAHfXRbNbWfBEci7YP98/H0foOQbFXkyID6JYdFIbIEj Dp/g== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=YS6ue8t2dd5iDW6jb431VKpPx5tAdDiMNn4sU9SZ4eM=; fh=5K/5O+uaWy/TXWHq+XYNoCsNLzz39Nw0/H1YBlI/NpI=; b=H4f0nSZM7Yks/lDdn6VtflfFbNuNllkhC6YR5UhXScjgmyvIhamsB2zIawg36Prs2B 7QOKxZLlkah3Qq+mQbeljnhkTDKJkNLmSTqixxP708H4PDnaEc8igJi1MSumwVa8zG/O Dhsf/sQpL8VsadL4ZEko2X2FIkYJh+uIk/q4lluKBC81Jv7KQmC72pCofjFTGsOqgepV oF1scImJ2rbAoFpJkNuutp/2KAyGzVjBHqN47XHUwbzc73m6vVKh5WSl0R0yvnpWgpTu 9OcopvGwKQGa0o2J2eSnnzfyzEe8lgWLK9GC5f7AUAwG+yO3NwOahTE4vREpg6p3fGpe HoNw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@bytedance.com header.s=google header.b=KNJbzwTx; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=QUARANTINE sp=QUARANTINE dis=NONE) header.from=bytedance.com Received: from out1.vger.email 
(out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id k9-20020a170902c40900b001adc5bc4d8asi1155609plk.572.2023.08.10.01.54.48; Thu, 10 Aug 2023 01:55:01 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@bytedance.com header.s=google header.b=KNJbzwTx; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=QUARANTINE sp=QUARANTINE dis=NONE) header.from=bytedance.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234092AbjHJIOS (ORCPT + 99 others); Thu, 10 Aug 2023 04:14:18 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:48316 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234178AbjHJIOL (ORCPT ); Thu, 10 Aug 2023 04:14:11 -0400 Received: from mail-pg1-x52e.google.com (mail-pg1-x52e.google.com [IPv6:2607:f8b0:4864:20::52e]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 5F3F2E7E for ; Thu, 10 Aug 2023 01:13:51 -0700 (PDT) Received: by mail-pg1-x52e.google.com with SMTP id 41be03b00d2f7-564b8e60ce9so432088a12.2 for ; Thu, 10 Aug 2023 01:13:51 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bytedance.com; s=google; t=1691655231; x=1692260031; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=YS6ue8t2dd5iDW6jb431VKpPx5tAdDiMNn4sU9SZ4eM=; b=KNJbzwTxxK3KzgAfGpr2FeHEj85XtDGujL07/8Dd3z3V3jSF8Rh+LYfOCIp5iXK0w7 YcNv7ogCQvXKM3pEP24qCmrjpg/JDP9h1bPZJUTufYhvRvW+AfvjaCJjSAIiDVPy4YlB FvMnZgpH//K7t4jlj1KtZkT8Tjr1dVTBmA31ymCxgtTRo28F/xNhPrvKRxrHZK39ygQZ TPSiOnZgnkDk1zDo7yeScGo2f+KIT1WWFlda15vKkcDADn6i8rF0kiPkSV8N9pjikKK2 
NsnxwPD8zv745JOrycqHn98sjc4x3oYJA0Stmu2pcb6Xa6lTRk6/qVJaIsczS2Rfi1mM 78hA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1691655231; x=1692260031; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=YS6ue8t2dd5iDW6jb431VKpPx5tAdDiMNn4sU9SZ4eM=; b=jem38Q03kgOhbuCGqpGyvcnQhoAaP0O132pPuxQJYOFajkt3450WoFFtNqsjve4nR7 kwVDzjMkSCYqq8DbZQy5paEZIGOMrvsilU4sHsWSHAAS3j1PINgZGLe8Y5WQDd42Xdf4 zD/l+UUlQbgQSmAFdX/lst2U66GuPwpsvmf+2hOefn1KJdYqd1YxVyltDjkzoT26Rhqh cjRVF/0CtSpa86pGBVgWAXRmGchKLLsunnRf6N6wwQbyZD/In4B/jdCSSRpjfoh1jLrC 6Ht3vts4AEP0ENvD9aNvo9hAviMH205bmvHeqvcaxeXJtVzXuS5vL8bqsH3VPHXHAMX+ Fngg== X-Gm-Message-State: AOJu0YxweEDp5vnDYOTids1PN30QiTA35aJWZJnb+wEK7q3+C/fVEbuU BfPCCpBZKCVeRP4OBzD8A0wmpA== X-Received: by 2002:a17:902:e548:b0:1ac:63ac:10a7 with SMTP id n8-20020a170902e54800b001ac63ac10a7mr1519133plf.68.1691655230885; Thu, 10 Aug 2023 01:13:50 -0700 (PDT) Received: from n37-019-243.byted.org ([180.184.51.40]) by smtp.gmail.com with ESMTPSA id x12-20020a170902ec8c00b001b1a2c14a4asm1019036plg.38.2023.08.10.01.13.46 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 10 Aug 2023 01:13:50 -0700 (PDT) From: Chuyi Zhou To: hannes@cmpxchg.org, mhocko@kernel.org, roman.gushchin@linux.dev, ast@kernel.org, daniel@iogearbox.net, andrii@kernel.org, muchun.song@linux.dev Cc: bpf@vger.kernel.org, linux-kernel@vger.kernel.org, wuyun.abel@bytedance.com, robin.lu@bytedance.com, Chuyi Zhou Subject: [RFC PATCH v2 5/5] bpf: Add a BPF OOM policy Doc Date: Thu, 10 Aug 2023 16:13:19 +0800 Message-Id: <20230810081319.65668-6-zhouchuyi@bytedance.com> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20230810081319.65668-1-zhouchuyi@bytedance.com> References: <20230810081319.65668-1-zhouchuyi@bytedance.com> MIME-Version: 1.0 X-Spam-Status: No, score=-2.1 required=5.0 tests=BAYES_00,DKIM_SIGNED, 
DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_BLOCKED, SPF_HELO_NONE,SPF_NONE autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: INBOX X-GMAIL-THRID: 1773831666141944816 X-GMAIL-MSGID: 1773831666141944816 This patch adds a new doc Documentation/bpf/oom.rst to describe how BPF OOM policy is supposed to work. Signed-off-by: Chuyi Zhou --- Documentation/bpf/oom.rst | 70 +++++++++++++++++++++++++++++++++++++++ 1 file changed, 70 insertions(+) create mode 100644 Documentation/bpf/oom.rst diff --git a/Documentation/bpf/oom.rst b/Documentation/bpf/oom.rst new file mode 100644 index 000000000000..9bad1fd30d4a --- /dev/null +++ b/Documentation/bpf/oom.rst @@ -0,0 +1,70 @@ +============== +BPF OOM Policy +============== + +The Out Of Memory Killer (aka OOM Killer) is invoked when the system is +critically low on memory. The in-kernel implementation iterates over +all tasks in the relevant OOM domain (all tasks for a global OOM, all members +of the memcg tree for a hard-limit OOM) and selects a victim to kill based on +a heuristic policy. + +Specifically: + +1. Iterate over the tasks using ``oom_evaluate_task()``; the first valid (killable) + task found, in iteration N, is selected. + +2. In iterations N + 1, N + 2, ..., the current task is compared with the + previously selected task and replaces it if it is more suitable. + +3. Finally, the selected task is the victim to kill. + +However, this does not meet the needs of users in some special scenarios. Using +eBPF, we can implement customized OOM policies to meet those needs. + +Developer API: +================== + +bpf_oom_evaluate_task +---------------------- + +``bpf_oom_evaluate_task`` is a new interface hooked into ``oom_evaluate_task()`` +which is used to bypass the in-kernel selection logic. 
Users can customize their +victim selection policy through BPF programs attached to it. +:: + + int bpf_oom_evaluate_task(struct task_struct *task, + struct oom_control *oc); + +Return values:: + + NO_BPF_POLICY no BPF policy; fall back to the in-kernel selection + BPF_EVAL_ABORT abort the selection (exit from the current selection loop) + BPF_EVAL_NEXT ignore the task + BPF_EVAL_SELECT select the current task + +To select a victim by a specific pid when the OOM killer is +invoked, we can use the following BPF program:: + + SEC("fmod_ret/bpf_oom_evaluate_task") + int BPF_PROG(bpf_oom_evaluate_task, struct task_struct *task, struct oom_control *oc) + { + if (task->pid == target_pid) + return BPF_EVAL_SELECT; + return BPF_EVAL_NEXT; + } + +bpf_set_policy_name +--------------------- + +``bpf_set_policy_name`` is a hook that runs before victim selection starts. The attached +program can set the policy's name, so that dump_header() can identify the policy +when reporting messages. The name is set through the kfunc ``set_oom_policy_name`` +:: + + SEC("fentry/bpf_set_policy_name") + int BPF_PROG(set_policy_name_k, struct oom_control *oc) + { + char name[] = "my_policy"; + set_oom_policy_name(oc, name, sizeof(name)); + return 0; + } \ No newline at end of file