Message ID | 20230314091136.264878-1-haifeng.xu@shopee.com |
---|---|
State | New |
Series | [RESEND] mm/oom_kill: don't kill exiting tasks in oom_kill_memcg_member |
Commit Message
Haifeng Xu
March 14, 2023, 9:11 a.m. UTC
If oom_group is set, oom_kill_process() invokes oom_kill_memcg_member()
to kill all processes in the memcg. When scanning tasks in the memcg, the
provided task may already be marked as an oom victim, and some other tasks
may be about to release their address space. There is no need to kill such
exiting tasks. To handle tasks that will free memory on their own, add a
helper function reap_task_will_free_mem() that marks a task as an oom
victim and queues it for the oom reaper.
Signed-off-by: Haifeng Xu <haifeng.xu@shopee.com>
---
mm/oom_kill.c | 35 +++++++++++++++++++++++++++++------
1 file changed, 29 insertions(+), 6 deletions(-)
Comments
On Tue 14-03-23 09:11:36, Haifeng Xu wrote:
> If oom_group is set, oom_kill_process() invokes oom_kill_memcg_member()
> to kill all processes in the memcg. When scanning tasks in memcg, maybe
> the provided task is marked as oom victim. Also, some tasks are likely
> to release their address space. There is no need to kill the exiting tasks.

This doesn't state any actual problem. Could you be more specific? Is
this a bug fix, a behavior change or an optimization?
On 2023/3/14 17:19, Michal Hocko wrote:
> On Tue 14-03-23 09:11:36, Haifeng Xu wrote:
>> If oom_group is set, oom_kill_process() invokes oom_kill_memcg_member()
>> to kill all processes in the memcg. When scanning tasks in memcg, maybe
>> the provided task is marked as oom victim. Also, some tasks are likely
>> to release their address space. There is no need to kill the exiting tasks.
>
> This doesn't state any actual problem. Could you be more specific? Is
> this a bug fix, a behavior change or an optimization?

1) oom_kill_process() has invoked __oom_kill_process() to kill the selected
victim, but the victim will be scanned again in mem_cgroup_scan_tasks().
It's pointless to kill the victim twice.

2) For those exiting processes, reaping them directly is also a faster way
to free memory compared with invoking __oom_kill_process().
On Tue 14-03-23 18:07:42, Haifeng Xu wrote:
> On 2023/3/14 17:19, Michal Hocko wrote:
>> On Tue 14-03-23 09:11:36, Haifeng Xu wrote:
>>> If oom_group is set, oom_kill_process() invokes oom_kill_memcg_member()
>>> to kill all processes in the memcg. When scanning tasks in memcg, maybe
>>> the provided task is marked as oom victim. Also, some tasks are likely
>>> to release their address space. There is no need to kill the exiting tasks.
>>
>> This doesn't state any actual problem. Could you be more specific? Is
>> this a bug fix, a behavior change or an optimization?
>
> 1) oom_kill_process() has invoked __oom_kill_process() to kill the selected
> victim, but the victim will be scanned again in mem_cgroup_scan_tasks().
> It's pointless to kill the victim twice.

Why does that matter though? The purpose of task_will_free_mem in
oom_kill_process is different. It would bail out from a potentially
noisy OOM report when the selected oom victim is expected to terminate
soon. __oom_kill_process called for the whole memcg doesn't aim at
avoiding any oom victims. It merely sends a kill signal to all of them.

> 2) For those exiting processes, reaping them directly is also a faster way
> to free memory compared with invoking __oom_kill_process().

Is it? What if the terminating task is blocked on a lock? Async oom
reaping might release those resources in that case.
On 2023/3/14 18:16, Michal Hocko wrote:
> On Tue 14-03-23 18:07:42, Haifeng Xu wrote:
>> On 2023/3/14 17:19, Michal Hocko wrote:
>>> On Tue 14-03-23 09:11:36, Haifeng Xu wrote:
>>>> If oom_group is set, oom_kill_process() invokes oom_kill_memcg_member()
>>>> to kill all processes in the memcg. When scanning tasks in memcg, maybe
>>>> the provided task is marked as oom victim. Also, some tasks are likely
>>>> to release their address space. There is no need to kill the exiting tasks.
>>>
>>> This doesn't state any actual problem. Could you be more specific? Is
>>> this a bug fix, a behavior change or an optimization?
>>
>> 1) oom_kill_process() has invoked __oom_kill_process() to kill the selected
>> victim, but the victim will be scanned again in mem_cgroup_scan_tasks().
>> It's pointless to kill the victim twice.
>
> Why does that matter though? The purpose of task_will_free_mem in
> oom_kill_process is different. It would bail out from a potentially
> noisy OOM report when the selected oom victim is expected to terminate
> soon. __oom_kill_process called for the whole memcg doesn't aim at
> avoiding any oom victims. It merely sends a kill signal to all of them.

Besides sending kill signals, __oom_kill_process() does some other work,
such as printing messages and traversing all user processes sharing the mm,
which holds an RCU section, and so on. So if we skip the victim, we don't
need to do that work again, and it won't affect the original mechanism.
All oom victims still get killed.

>> 2) For those exiting processes, reaping them directly is also a faster way
>> to free memory compared with invoking __oom_kill_process().
>
> Is it? What if the terminating task is blocked on a lock? Async oom
> reaping might release those resources in that case.

Yes, the reaping process is asynchronous. I mean we don't need the work
mentioned above any more. "Reaping them directly" here means queuing the
task for the oom reaper.
On Tue 14-03-23 19:07:27, Haifeng Xu wrote:
> On 2023/3/14 18:16, Michal Hocko wrote:
>> On Tue 14-03-23 18:07:42, Haifeng Xu wrote:
>>> On 2023/3/14 17:19, Michal Hocko wrote:
>>>> On Tue 14-03-23 09:11:36, Haifeng Xu wrote:
>>>>> If oom_group is set, oom_kill_process() invokes oom_kill_memcg_member()
>>>>> to kill all processes in the memcg. When scanning tasks in memcg, maybe
>>>>> the provided task is marked as oom victim. Also, some tasks are likely
>>>>> to release their address space. There is no need to kill the exiting tasks.
>>>>
>>>> This doesn't state any actual problem. Could you be more specific? Is
>>>> this a bug fix, a behavior change or an optimization?
>>>
>>> 1) oom_kill_process() has invoked __oom_kill_process() to kill the selected
>>> victim, but the victim will be scanned again in mem_cgroup_scan_tasks().
>>> It's pointless to kill the victim twice.
>>
>> Why does that matter though? The purpose of task_will_free_mem in
>> oom_kill_process is different. It would bail out from a potentially
>> noisy OOM report when the selected oom victim is expected to terminate
>> soon. __oom_kill_process called for the whole memcg doesn't aim at
>> avoiding any oom victims. It merely sends a kill signal to all of them.
>
> Besides sending kill signals, __oom_kill_process() does some other work,
> such as printing messages and traversing all user processes sharing the mm,
> which holds an RCU section, and so on. So if we skip the victim, we don't
> need to do that work again, and it won't affect the original mechanism.
> All oom victims still get killed.

mm sharing among processes is a very rare thing, but do not forget that
task_will_free_mem needs to do the same thing for the same reason.

>>> 2) For those exiting processes, reaping them directly is also a faster way
>>> to free memory compared with invoking __oom_kill_process().
>>
>> Is it? What if the terminating task is blocked on a lock? Async oom
>> reaping might release those resources in that case.
>
> Yes, the reaping process is asynchronous. I mean we don't need the work
> mentioned above any more. "Reaping them directly" here means queuing the
> task for the oom reaper.

I do not follow.

In any case I still do not see any actual justification for the change
other than "we can do it and it might turn out less expensive". This
alone is not sufficient, just to be explicit, because oom is hardly a fast
path to optimize every single cpu cycle for. So unless you see an actual
real life problem that would behave much better or even be fixed, I am
not convinced this is a worthwhile change to have.
On 2023/3/14 20:00, Michal Hocko wrote:
> On Tue 14-03-23 19:07:27, Haifeng Xu wrote:
>> On 2023/3/14 18:16, Michal Hocko wrote:
>>> On Tue 14-03-23 18:07:42, Haifeng Xu wrote:
>>>> On 2023/3/14 17:19, Michal Hocko wrote:
>>>>> On Tue 14-03-23 09:11:36, Haifeng Xu wrote:
>>>>>> If oom_group is set, oom_kill_process() invokes oom_kill_memcg_member()
>>>>>> to kill all processes in the memcg. When scanning tasks in memcg, maybe
>>>>>> the provided task is marked as oom victim. Also, some tasks are likely
>>>>>> to release their address space. There is no need to kill the exiting tasks.
>>>>>
>>>>> This doesn't state any actual problem. Could you be more specific? Is
>>>>> this a bug fix, a behavior change or an optimization?
>>>>
>>>> 1) oom_kill_process() has invoked __oom_kill_process() to kill the selected
>>>> victim, but the victim will be scanned again in mem_cgroup_scan_tasks().
>>>> It's pointless to kill the victim twice.
>>>
>>> Why does that matter though? The purpose of task_will_free_mem in
>>> oom_kill_process is different. It would bail out from a potentially
>>> noisy OOM report when the selected oom victim is expected to terminate
>>> soon. __oom_kill_process called for the whole memcg doesn't aim at
>>> avoiding any oom victims. It merely sends a kill signal to all of them.
>>
>> Besides sending kill signals, __oom_kill_process() does some other work,
>> such as printing messages and traversing all user processes sharing the mm,
>> which holds an RCU section, and so on. So if we skip the victim, we don't
>> need to do that work again, and it won't affect the original mechanism.
>> All oom victims still get killed.
>
> mm sharing among processes is a very rare thing, but do not forget that
> task_will_free_mem needs to do the same thing for the same reason.

For the victim, __oom_kill_process() traverses all processes in the system
whether or not there are other tasks sharing the mm. If we skip the victim,
this work can be dropped.

>>>> 2) For those exiting processes, reaping them directly is also a faster way
>>>> to free memory compared with invoking __oom_kill_process().
>>>
>>> Is it? What if the terminating task is blocked on a lock? Async oom
>>> reaping might release those resources in that case.
>>
>> Yes, the reaping process is asynchronous. I mean we don't need the work
>> mentioned above any more. "Reaping them directly" here means queuing the
>> task for the oom reaper.
>
> I do not follow.
>
> In any case I still do not see any actual justification for the change
> other than "we can do it and it might turn out less expensive". This
> alone is not sufficient, just to be explicit, because oom is hardly a fast
> path to optimize every single cpu cycle for. So unless you see an actual
> real life problem that would behave much better or even be fixed, I am
> not convinced this is a worthwhile change to have.

We can also see two identical messages ("Memory cgroup out of memory:
Killed process ***") about the victim, which seems a little confusing. If
we skip the victim, only one message is printed.
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 044e1eed720e..f16bca506dc2 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -996,15 +996,43 @@ static void __oom_kill_process(struct task_struct *victim, const char *message)
 }
 #undef K
 
+static bool reap_task_will_free_mem(struct task_struct *victim)
+{
+	bool ret = false;
+
+	task_lock(victim);
+	if (task_will_free_mem(victim)) {
+		mark_oom_victim(victim);
+		queue_oom_reaper(victim);
+		ret = true;
+	}
+	task_unlock(victim);
+
+	return ret;
+}
+
 /*
  * Kill provided task unless it's secured by setting
  * oom_score_adj to OOM_SCORE_ADJ_MIN.
+ * If the task is marked as oom victim or will free
+ * memory, there is no need to kill it again.
  */
 static int oom_kill_memcg_member(struct task_struct *task, void *message)
 {
 	if (task->signal->oom_score_adj != OOM_SCORE_ADJ_MIN &&
 	    !is_global_init(task)) {
 		get_task_struct(task);
+
+		/*
+		 * If the task is already exiting, don't alarm the sysadmin or kill
+		 * its children or threads, just give it access to memory reserves
+		 * so it can die quickly
+		 */
+		if (tsk_is_oom_victim(task) || reap_task_will_free_mem(task)) {
+			put_task_struct(task);
+			return 0;
+		}
+
 		__oom_kill_process(task, message);
 	}
 	return 0;
@@ -1022,15 +1050,10 @@ static void oom_kill_process(struct oom_control *oc, const char *message)
 	 * its children or threads, just give it access to memory reserves
 	 * so it can die quickly
 	 */
-	task_lock(victim);
-	if (task_will_free_mem(victim)) {
-		mark_oom_victim(victim);
-		queue_oom_reaper(victim);
-		task_unlock(victim);
+	if (reap_task_will_free_mem(victim)) {
 		put_task_struct(victim);
 		return;
 	}
-	task_unlock(victim);
 
 	if (__ratelimit(&oom_rs))
 		dump_header(oc, victim);
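For context, the code path this patch touches only runs when group OOM killing is enabled on a cgroup v2 memory controller. A minimal admin session might look like the following (a config sketch: the cgroup path and limit are illustrative, and it assumes a cgroup v2 mount at /sys/fs/cgroup with root privileges):

```shell
# Delegate the memory controller and create a leaf cgroup for the workload.
echo "+memory" > /sys/fs/cgroup/cgroup.subtree_control
mkdir /sys/fs/cgroup/workload

# Cap the group's memory, and ask the kernel to kill all tasks in the
# group together on OOM -- this is what makes oom_kill_memcg_member()
# run for every member task.
echo 100M > /sys/fs/cgroup/workload/memory.max
echo 1 > /sys/fs/cgroup/workload/memory.oom.group
```

With memory.oom.group left at 0 (the default), only the single selected victim is killed and the code changed here is not exercised.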