From patchwork Thu Feb 9 07:14:57 2023
X-Patchwork-Submitter: huyd12@chinatelecom.cn
X-Patchwork-Id: 54782
Sender: huyd12@chinatelecom.cn
From: 
To: , , , "'Christian Brauner'" , "'Michal Hocko'" , "'Andrew Morton'" 
Cc: , , 
References: <20230208094905.373-1-liuq131@chinatelecom.cn>
In-Reply-To: 
<20230208094905.373-1-liuq131@chinatelecom.cn>
Subject: Re: [PATCH] pid: add handling of too many zombie processes
Date: Thu, 9 Feb 2023 15:14:57 +0800
Message-ID: <000e01d93c56$3a4bcb00$aee36100$@chinatelecom.cn>
X-Mailing-List: linux-kernel@vger.kernel.org

Any comments will be appreciated.

-----Original Message-----
From: liuq131@chinatelecom.cn
Sent: February 8, 2023, 17:49
To: akpm@linux-foundation.org
Cc: agruenba@redhat.com; linux-mm@kvack.org; linux-kernel@vger.kernel.org; huyd12@chinatelecom.cn; liuq
Subject: [PATCH] pid: add handling of too many zombie processes

A common situation is that a parent process forks many child processes to execute tasks, but never calls wait()/waitpid() when those children exit, so a large number of them become zombie processes.
At this point, if the number of processes in the system has reached kernel.pid_max, new fork() calls fail, and the system cannot execute any command (unless an old process exits), e.g.:

[root@lq-workstation ~]# ls
-bash: fork: retry: Resource temporarily unavailable
-bash: fork: retry: Resource temporarily unavailable
-bash: fork: retry: Resource temporarily unavailable
-bash: fork: retry: Resource temporarily unavailable
-bash: fork: Resource temporarily unavailable
[root@lq-workstation ~]# reboot
-bash: fork: retry: Resource temporarily unavailable
-bash: fork: retry: Resource temporarily unavailable
-bash: fork: retry: Resource temporarily unavailable
-bash: fork: retry: Resource temporarily unavailable
-bash: fork: Resource temporarily unavailable

This patch handles the situation in alloc_pid(): it finds the process with the most zombie child processes, and if that process has more than 10 of them (or some other reasonable value?), it tries to kill that process to release its pid resources.

Signed-off-by: liuq
---
 include/linux/mm.h |  2 ++
 kernel/pid.c       |  6 +++-
 mm/oom_kill.c      | 70 ++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 77 insertions(+), 1 deletion(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 8f857163ac89..afcff08a3878 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1940,6 +1940,8 @@ static inline void clear_page_pfmemalloc(struct page *page)
  * Can be called by the pagefault handler when it gets a VM_FAULT_OOM.
  */
 extern void pagefault_out_of_memory(void);
 
+extern void pid_max_oom_check(struct pid_namespace *ns);
+
 #define offset_in_page(p)	((unsigned long)(p) & ~PAGE_MASK)
 #define offset_in_thp(page, p)	((unsigned long)(p) & (thp_size(page) - 1))

diff --git a/kernel/pid.c b/kernel/pid.c
index 3fbc5e46b721..1a9a60e19ab6 100644
--- a/kernel/pid.c
+++ b/kernel/pid.c
@@ -237,7 +237,11 @@ struct pid *alloc_pid(struct pid_namespace *ns, pid_t *set_tid,
 	idr_preload_end();
 
 	if (nr < 0) {
-		retval = (nr == -ENOSPC) ? -EAGAIN : nr;
+		retval = nr;
+		if (nr == -ENOSPC) {
+			retval = -EAGAIN;
+			pid_max_oom_check(tmp);
+		}
 		goto out_free;
 	}

diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 1276e49b31b0..18d05d706f48 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -1260,3 +1260,73 @@ SYSCALL_DEFINE2(process_mrelease, int, pidfd, unsigned int, flags)
 	return -ENOSYS;
 #endif /* CONFIG_MMU */
 }
+
+static void oom_pid_evaluate_task(struct task_struct *p,
+		struct task_struct **max_zombie_task, int *max_zombie_num)
+{
+	struct task_struct *child;
+	int zombie_num = 0;
+
+	list_for_each_entry(child, &p->children, sibling) {
+		if (child->exit_state == EXIT_ZOMBIE)
+			zombie_num++;
+	}
+	if (zombie_num > *max_zombie_num) {
+		*max_zombie_num = zombie_num;
+		*max_zombie_task = p;
+	}
+}
+
+#define MAX_ZOMBIE_NUM 10
+struct task_struct *pid_max_bad_process(struct pid_namespace *ns)
+{
+	int max_zombie_num = 0;
+	struct task_struct *max_zombie_task = &init_task;
+	struct task_struct *p;
+
+	rcu_read_lock();
+	for_each_process(p)
+		oom_pid_evaluate_task(p, &max_zombie_task, &max_zombie_num);
+	rcu_read_unlock();
+
+	if (max_zombie_num > MAX_ZOMBIE_NUM) {
+		pr_info("process %d has %d zombie child\n",
+			task_pid_nr_ns(max_zombie_task, ns), max_zombie_num);
+		return max_zombie_task;
+	}
+
+	return NULL;
+}
+
+void pid_max_oom_kill_process(struct task_struct *task)
+{
+	struct oom_control oc = {
+		.zonelist = NULL,
+		.nodemask = NULL,
+		.memcg = NULL,
+		.gfp_mask = 0,
+		.order = 0,
+	};
+
+	get_task_struct(task);
+	oc.chosen = task;
+
+	if (mem_cgroup_oom_synchronize(true))
+		return;
+
+	if (!mutex_trylock(&oom_lock))
+		return;
+
+	oom_kill_process(&oc, "Out of pid max(oom_kill_allocating_task)");
+	mutex_unlock(&oom_lock);
+}
+
+void pid_max_oom_check(struct pid_namespace *ns)
+{
+	struct task_struct *p;
+
+	p = pid_max_bad_process(ns);
+	if (p) {
+		pr_info("oom_kill process %d\n", task_pid_nr_ns(p, ns));
+		pid_max_oom_kill_process(p);
+	}
+}
-- 
2.27.0