From patchwork Thu Aug 10 08:13:19 2023
X-Patchwork-Submitter: Chuyi Zhou
X-Patchwork-Id: 133763
From: Chuyi Zhou
To: hannes@cmpxchg.org, mhocko@kernel.org, roman.gushchin@linux.dev,
    ast@kernel.org, daniel@iogearbox.net, andrii@kernel.org,
    muchun.song@linux.dev
Cc: bpf@vger.kernel.org, linux-kernel@vger.kernel.org,
    wuyun.abel@bytedance.com, robin.lu@bytedance.com, Chuyi Zhou
Subject: [RFC PATCH v2 5/5] bpf: Add a BPF OOM policy Doc
Date: Thu, 10 Aug 2023 16:13:19 +0800
Message-Id: <20230810081319.65668-6-zhouchuyi@bytedance.com>
In-Reply-To: <20230810081319.65668-1-zhouchuyi@bytedance.com>
References:
<20230810081319.65668-1-zhouchuyi@bytedance.com>

This patch adds a new document, Documentation/bpf/oom.rst, describing how
the BPF OOM policy is supposed to work.

Signed-off-by: Chuyi Zhou
---
 Documentation/bpf/oom.rst | 70 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 70 insertions(+)
 create mode 100644 Documentation/bpf/oom.rst

diff --git a/Documentation/bpf/oom.rst b/Documentation/bpf/oom.rst
new file mode 100644
index 000000000000..9bad1fd30d4a
--- /dev/null
+++ b/Documentation/bpf/oom.rst
@@ -0,0 +1,70 @@
+==============
+BPF OOM Policy
+==============
+
+The Out Of Memory Killer (aka OOM Killer) is invoked when the system is
+critically low on memory. The in-kernel implementation iterates over all
+tasks in the relevant OOM domain (all tasks for a global OOM, or all members
+of the memcg tree for a hard-limit OOM) and selects a victim to kill based
+on a heuristic policy.
+
+Specifically:
+
+1. Start iterating over tasks with ``oom_evaluate_task()``. When a valid
+   (killable) victim is found in iteration N, select it.
+
+2. In iterations N + 1, N + 2, ..., compare the current task with the
+   previously selected task, and select the current task if it is more suitable.
+
+3. Finally, we get a victim to kill.
+
+However, this does not meet the needs of users in some special scenarios. Using
+eBPF, we can implement customized OOM policies to meet those needs.
+
+Developer API:
+==================
+
+bpf_oom_evaluate_task
+----------------------
+
+``bpf_oom_evaluate_task`` is a new interface hooked into ``oom_evaluate_task()``
+that can be used to bypass the in-kernel selection logic. Users can customize
+their victim selection policy through BPF programs attached to it.
+::
+
+    int bpf_oom_evaluate_task(struct task_struct *task,
+                              struct oom_control *oc);
+
+Return values::
+
+    NO_BPF_POLICY     no BPF policy; fall back to the in-kernel selection
+    BPF_EVAL_ABORT    abort the selection (exit from the current selection loop)
+    BPF_EVAL_NEXT     ignore the task
+    BPF_EAVL_SELECT   select the current task
+
+Suppose we want to select the victim with a specified pid when OOM is
+invoked. We can use the following BPF program::
+
+    SEC("fmod_ret/bpf_oom_evaluate_task")
+    int BPF_PROG(bpf_oom_evaluate_task, struct task_struct *task, struct oom_control *oc)
+    {
+        if (task->pid == target_pid)
+            return BPF_EAVL_SELECT;
+        return BPF_EVAL_NEXT;
+    }
+
+bpf_set_policy_name
+---------------------
+
+``bpf_set_policy_name`` is an interface hooked before the start of victim selection.
+We can set the policy's name in the attached program so that ``dump_header()`` can
+identify different policies when reporting messages. The name is set through the
+kfunc ``set_oom_policy_name``::
+
+    SEC("fentry/bpf_set_policy_name")
+    int BPF_PROG(set_police_name_k, struct oom_control *oc)
+    {
+        char name[] = "my_policy";
+        set_oom_policy_name(oc, name, sizeof(name));
+        return 0;
+    }
\ No newline at end of file
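
The return values documented above can be illustrated with a small userspace
model of the selection loop. This is only a sketch, not kernel or BPF code:
`struct mock_task`, `policy_fn`, `select_victim()`, `pid_policy()` and the
`badness` field are hypothetical names introduced for illustration, and the
numeric enum values are assumed. The handling of `BPF_EVAL_ABORT` (discarding
any earlier choice) is also an assumption, modeled on the existing abort path
in `oom_evaluate_task()`. `BPF_EAVL_SELECT` is spelled as in the document.

```c
#include <stddef.h>

/* Verdicts named as in the document; the numeric values are assumed. */
enum bpf_oom_verdict {
    NO_BPF_POLICY,
    BPF_EVAL_ABORT,
    BPF_EVAL_NEXT,
    BPF_EAVL_SELECT,
};

/* Hypothetical stand-in for struct task_struct. */
struct mock_task {
    int pid;
    long badness;   /* stand-in for the in-kernel heuristic score */
};

/* Hypothetical callback playing the role of the attached BPF program. */
typedef enum bpf_oom_verdict (*policy_fn)(const struct mock_task *task,
                                          void *ctx);

/* Model of the selection loop: returns the chosen victim, or NULL. */
const struct mock_task *select_victim(const struct mock_task *tasks, size_t n,
                                      policy_fn policy, void *ctx)
{
    const struct mock_task *chosen = NULL;

    for (size_t i = 0; i < n; i++) {
        const struct mock_task *t = &tasks[i];
        enum bpf_oom_verdict v = policy ? policy(t, ctx) : NO_BPF_POLICY;

        switch (v) {
        case BPF_EVAL_ABORT:
            return NULL;        /* assumed: abort drops any earlier choice */
        case BPF_EVAL_NEXT:
            continue;           /* ignore this task */
        case BPF_EAVL_SELECT:
            chosen = t;         /* policy picked this task; keep iterating */
            continue;
        case NO_BPF_POLICY:
        default:
            /* Fallback heuristic: keep the task with the highest score. */
            if (!chosen || t->badness > chosen->badness)
                chosen = t;
            break;
        }
    }
    return chosen;
}

/* Mirrors the pid-based example: select only the task whose pid matches. */
enum bpf_oom_verdict pid_policy(const struct mock_task *task, void *ctx)
{
    int target_pid = *(const int *)ctx;

    return task->pid == target_pid ? BPF_EAVL_SELECT : BPF_EVAL_NEXT;
}
```

With no policy (`policy == NULL`) the model falls back to the highest
`badness`, which stands in for the in-kernel heuristic. With `pid_policy`,
only the task matching the target pid is ever selected, and no victim is
chosen if no task matches.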