Message ID: <20240109091511.8299-1-jianfeng.w.wang@oracle.com>
State: New
From: Jianfeng Wang <jianfeng.w.wang@oracle.com>
To: akpm@linux-foundation.org
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Jianfeng Wang <jianfeng.w.wang@oracle.com>
Subject: [PATCH] mm, oom: Add lru_add_drain() in __oom_reap_task_mm()
Date: Tue, 9 Jan 2024 01:15:11 -0800
Series: mm, oom: Add lru_add_drain() in __oom_reap_task_mm()
Commit Message
Jianfeng Wang
Jan. 9, 2024, 9:15 a.m. UTC
The oom_reaper tries to reclaim additional memory owned by the OOM
victim. In __oom_reap_task_mm(), it uses mmu_gather for batched page
freeing. After oom_reaper was added, the mmu_gather feature introduced
CONFIG_MMU_GATHER_NO_GATHER (in commit 952a31c9e6fa ("asm-generic/tlb:
Introduce CONFIG_HAVE_MMU_GATHER_NO_GATHER=y")), an option to skip
batched page freeing. If it is set, tlb_batch_pages_flush(), which is
responsible for calling lru_add_drain(), is skipped during
tlb_finish_mmu(). Without that call, pages could still be held by
per-CPU fbatches rather than being freed.

This fix adds lru_add_drain() prior to the mmu_gather setup. This makes
the code consistent with other cases where mmu_gather is used for
freeing pages.
Signed-off-by: Jianfeng Wang <jianfeng.w.wang@oracle.com>
---
mm/oom_kill.c | 1 +
1 file changed, 1 insertion(+)
Comments
On Tue 09-01-24 01:15:11, Jianfeng Wang wrote:
[...]
> This fix adds lru_add_drain() prior to mmu_gather. This makes the code
> consistent with other cases where mmu_gather is used for freeing pages.

Does this fix any actual problem, or is it purely a code-consistency change? I am asking because it doesn't make much sense to me, TBH. LRU cache draining is usually important when we want to ensure that cached pages are put on the LRU to be dealt with, because otherwise the MM code wouldn't be able to deal with them. The OOM reaper doesn't necessarily run on the same CPU as the OOM victim, so draining on the local CPU doesn't necessarily do anything for the victim's pages.

While this patch is not harmful, I really do not see much point in adding the local draining here. Could you clarify, please?
On 1/10/24 12:46 AM, Michal Hocko wrote:
[...]
> While this patch is not harmful I really do not see much point in adding
> the local draining here. Could you clarify please?

It targets the case described in the patch's commit message: the OOM killer thinks that it 'reclaims' pages while the pages are still held, with a reference count, by per-CPU fbatches.

I admit that the pages may sit on a different core (or cores). Given that doing remote calls to all CPUs with lru_add_drain_all() is expensive, this line of code can be helpful if it happens to give back a few pages to the system right away without that overhead, especially when OOM is involved. Plus, it also makes the code consistent with other places that use the mmu_gather feature to free pages in batches.

--JW
On Wed 10-01-24 11:02:03, Jianfeng Wang wrote:
[...]
> Given that doing remote calls to all CPUs with lru_add_drain_all() is
> expensive, this line of code can be helpful if it happens to give back
> a few pages to the system right away without the overhead, especially
> when oom is involved. Plus, it also makes the code consistent with
> other places using mmu_gather feature to free pages in batch.

I would argue that consistency is the biggest problem of this patch. It tries to follow a pattern that is just not really correct. First, it operates on a random CPU from the OOM victim's perspective, and second, it doesn't really unblock any unmapping operation, which is the main purpose of the reaper. Sure, the reaper frees a lot of unmapped memory, but if there are a couple of pages that cannot be freed immediately because they are sitting in per-CPU LRU caches, that is not a deal breaker. As you have noted, those pages might be sitting in any CPU's cache.

So I do not really see that as a good justification. People will follow that pattern even more and spread lru_add_drain to other random places.

Unless you can show an actual runtime effect of this patch, I think it shouldn't be merged.
On 1/11/24 12:46 AM, Michal Hocko wrote:
[...]
> Unless you can show any actual runtime effect of this patch then I think
> it shouldn't be merged.

Thanks for raising your concern. I'd call it a trade-off rather than "not really correct". Look at unmap_region() / free_pages_and_swap_cache() written by Linus. These are in favor of this pattern, which indicates that the trade-off (i.e., draining the local CPU vs. draining all CPUs vs. no draining at all) has been made the same way in the past. I don't have a specific runtime effect to provide, except that it will free tens of kilobytes of pages immediately during OOM.
On Thu, 11 Jan 2024 10:54:45 -0800 Jianfeng Wang <jianfeng.w.wang@oracle.com> wrote:
[...]
> I don't have a specific runtime effect to provide, except that it will
> free 10s kB pages immediately during OOM.

I don't think it's necessary to run lru_add_drain() for each vma. Once we've done it once, it can be skipped for additional vmas.

That's pretty minor, because the second and successive calls will be cheap. But it becomes much more significant if we switch to lru_add_drain_all(), which sounds like what we should be doing here. Is it possible?
On 1/11/24 1:54 PM, Andrew Morton wrote:
[...]
> I don't think it's necessary to run lru_add_drain() for each vma. Once
> we've done it once, it can be skipped for additional vmas.

Agreed.

> That's pretty minor because the second and successive calls will be
> cheap. But it becomes much more significant if we switch to
> lru_add_drain_all(), which sounds like what we should be doing here.
> Is it possible?

What do you both think of adding lru_add_drain_all() prior to the for loop?
On Thu 11-01-24 16:08:57, Jianfeng Wang wrote:
[...]
> I'd call it a trade-off rather than "not really correct". Look at
> unmap_region() / free_pages_and_swap_cache() written by Linus.

You are missing an important point. Those two calls are quite different. The oom_reaper unmaps memory only after all the reclaim attempts have failed, and those attempts include draining all sorts of caches on the way, including the per-CPU LRU caches (look for lru_add_drain_all in the reclaim path).

> What do you both think of adding lru_add_drain_all() prior to the for loop?

lru_add_drain_all relies on workqueues, and we absolutely do not want to get the oom_reaper stuck just because the workqueues are jammed. So no, that would actually be actively harmful!

All that being said, I stand by my previous statement that this patch is not doing anything measurably useful. Prove me wrong; otherwise I am against merging a "just for consistency" patch. Really, we should go and re-evaluate the existing local LRU draining callers. I wouldn't be surprised if we removed some of them.
On Fri, Jan 12, 2024 at 09:49:08AM +0100, Michal Hocko wrote:
[...]
> lru_add_drain_all relies on WQs. And we absolutely do not want to get
> oom_reaper stuck just because all the WQ is jammed. So no, this is
> actually actively harmful!

I completely agree.

The oom_reap_task_mm function is also used for process_mrelease, which is a critical path for releasing memory on Android and is typically used under system pressure (not only memory pressure but also CPU pressure at the same time). The lru_add_drain_all function can take a long time to finish because Android is susceptible to priority inversion among processes. A better idea may be to enable remote draining with lru_add_drain_all, analogous to the recent PCP modifications.
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 9e6071fde34a..e2fcf4f062ea 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -537,6 +537,7 @@ static bool __oom_reap_task_mm(struct mm_struct *mm)
 		struct mmu_notifier_range range;
 		struct mmu_gather tlb;
 
+		lru_add_drain();
 		mmu_notifier_range_init(&range, MMU_NOTIFY_UNMAP, 0,
 					mm, vma->vm_start,
 					vma->vm_end);