| Message ID | 20231214222717.50277-1-jianfeng.w.wang@oracle.com |
|---|---|
| State | New |
| Series | [v2] mm: remove redundant lru_add_drain() prior to unmapping pages |

Headers:

From: Jianfeng Wang <jianfeng.w.wang@oracle.com>
To: akpm@linux-foundation.org
Cc: tim.c.chen@linux.intel.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org, Jianfeng Wang <jianfeng.w.wang@oracle.com>
Subject: [PATCH v2] mm: remove redundant lru_add_drain() prior to unmapping pages
Date: Thu, 14 Dec 2023 14:27:17 -0800
Message-ID: <20231214222717.50277-1-jianfeng.w.wang@oracle.com>
X-Mailer: git-send-email 2.42.1
Commit Message
Jianfeng Wang
Dec. 14, 2023, 10:27 p.m. UTC
When unmapping VMA pages, pages will be gathered in batch and released by
tlb_finish_mmu() if CONFIG_MMU_GATHER_NO_GATHER is not set. The function
tlb_finish_mmu() is responsible for calling free_pages_and_swap_cache(),
which calls lru_add_drain() to drain cached pages in folio_batch before
releasing gathered pages. Thus, it is redundant to call lru_add_drain()
before gathering pages, if CONFIG_MMU_GATHER_NO_GATHER is not set.
Remove lru_add_drain() prior to gathering and unmapping pages in
exit_mmap() and unmap_region() if CONFIG_MMU_GATHER_NO_GATHER is not set.
Note that the page unmapping process in oom_killer (e.g., in
__oom_reap_task_mm()) also uses tlb_finish_mmu() and does not have
redundant lru_add_drain(). So, this commit makes the code more consistent.
Signed-off-by: Jianfeng Wang <jianfeng.w.wang@oracle.com>
---
mm/mmap.c | 6 ++++++
1 file changed, 6 insertions(+)
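To make the redundancy argument concrete, here is a simplified sketch of the call path the commit message describes. The function names appear in the kernel source; the layout and comments below are illustrative only, not verbatim code:

/*
 * With CONFIG_MMU_GATHER_NO_GATHER unset, unmap_region()/exit_mmap()
 * behave roughly like this:
 *
 *   lru_add_drain();              <- the call this patch removes
 *   tlb_gather_mmu(&tlb, mm);
 *   unmap_vmas(...);              <- pages accumulated in the gather batch
 *   tlb_finish_mmu(&tlb);
 *     -> free_pages_and_swap_cache()
 *          -> lru_add_drain();    <- drains the folio_batch anyway
 *          -> release_pages(...);
 *
 * The early drain is therefore redundant whenever the gather path runs;
 * it is only needed when CONFIG_MMU_GATHER_NO_GATHER is set and pages
 * are freed without going through free_pages_and_swap_cache().
 */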
Comments
On Thu, Dec 14, 2023 at 02:27:17PM -0800, Jianfeng Wang wrote:
> When unmapping VMA pages, pages will be gathered in batch and released by
> tlb_finish_mmu() if CONFIG_MMU_GATHER_NO_GATHER is not set. The function
> tlb_finish_mmu() is responsible for calling free_pages_and_swap_cache(),
> which calls lru_add_drain() to drain cached pages in folio_batch before
> releasing gathered pages. Thus, it is redundant to call lru_add_drain()
> before gathering pages, if CONFIG_MMU_GATHER_NO_GATHER is not set.
>
> Remove lru_add_drain() prior to gathering and unmapping pages in
> exit_mmap() and unmap_region() if CONFIG_MMU_GATHER_NO_GATHER is not set.
>
> Note that the page unmapping process in oom_killer (e.g., in
> __oom_reap_task_mm()) also uses tlb_finish_mmu() and does not have
> redundant lru_add_drain(). So, this commit makes the code more consistent.

Shouldn't we put this in __tlb_gather_mmu() which already has the
CONFIG_MMU_GATHER_NO_GATHER ifdefs? That would presumably help with, e.g.,
zap_page_range_single() too.

> Signed-off-by: Jianfeng Wang <jianfeng.w.wang@oracle.com>
> ---
>  mm/mmap.c | 6 ++++++
>  1 file changed, 6 insertions(+)
>
> diff --git a/mm/mmap.c b/mm/mmap.c
> index 1971bfffcc03..da0308eef435 100644
> --- a/mm/mmap.c
> +++ b/mm/mmap.c
> @@ -2330,7 +2330,10 @@ static void unmap_region(struct mm_struct *mm, struct ma_state *mas,
>  	struct mmu_gather tlb;
>  	unsigned long mt_start = mas->index;
>
> +	/* Defer lru_add_drain() to tlb_finish_mmu() for the ifndef case. */
> +#ifdef CONFIG_MMU_GATHER_NO_GATHER
>  	lru_add_drain();
> +#endif
>  	tlb_gather_mmu(&tlb, mm);
>  	update_hiwater_rss(mm);
>  	unmap_vmas(&tlb, mas, vma, start, end, tree_end, mm_wr_locked);
> @@ -3300,7 +3303,10 @@ void exit_mmap(struct mm_struct *mm)
>  		return;
>  	}
>
> +	/* Defer lru_add_drain() to tlb_finish_mmu() for the ifndef case. */
> +#ifdef CONFIG_MMU_GATHER_NO_GATHER
>  	lru_add_drain();
> +#endif
>  	flush_cache_mm(mm);
>  	tlb_gather_mmu_fullmm(&tlb, mm);
>  	/* update_hiwater_rss(mm) here? but nobody should be looking */
> --
> 2.42.1
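A rough sketch of what Matthew's suggestion might look like, assuming the drain is hoisted into __tlb_gather_mmu() in mm/mmu_gather.c. The placement is hypothetical and the surrounding initialization is paraphrased, not a merged change:

static void __tlb_gather_mmu(struct mmu_gather *tlb, struct mm_struct *mm,
			     bool fullmm)
{
	tlb->mm = mm;
	tlb->fullmm = fullmm;

#ifdef CONFIG_MMU_GATHER_NO_GATHER
	/*
	 * No page batching: tlb_finish_mmu() will not reach
	 * free_pages_and_swap_cache(), so drain the per-CPU LRU
	 * caches up front instead of in every caller.
	 */
	lru_add_drain();
#endif

	/* remaining gather initialization unchanged */
}

With the drain here, every mmu_gather user (including zap_page_range_single()) would get the same behavior without per-call-site #ifdefs.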
On 12/14/23 3:00 PM, Matthew Wilcox wrote:
> On Thu, Dec 14, 2023 at 02:27:17PM -0800, Jianfeng Wang wrote:
>> When unmapping VMA pages, pages will be gathered in batch and released by
>> tlb_finish_mmu() if CONFIG_MMU_GATHER_NO_GATHER is not set. The function
>> tlb_finish_mmu() is responsible for calling free_pages_and_swap_cache(),
>> which calls lru_add_drain() to drain cached pages in folio_batch before
>> releasing gathered pages. Thus, it is redundant to call lru_add_drain()
>> before gathering pages, if CONFIG_MMU_GATHER_NO_GATHER is not set.
>>
>> Remove lru_add_drain() prior to gathering and unmapping pages in
>> exit_mmap() and unmap_region() if CONFIG_MMU_GATHER_NO_GATHER is not set.
>>
>> Note that the page unmapping process in oom_killer (e.g., in
>> __oom_reap_task_mm()) also uses tlb_finish_mmu() and does not have
>> redundant lru_add_drain(). So, this commit makes the code more consistent.
>
> Shouldn't we put this in __tlb_gather_mmu() which already has the
> CONFIG_MMU_GATHER_NO_GATHER ifdefs? That would presumably help with, e.g.,
> zap_page_range_single() too.

Thanks. It makes sense to me.
This commit is motivated by a workload that uses mmap/munmap heavily. Since
the mmu_gather feature is also used by hugetlb, madvise, mprotect, etc., I
prefer to have another standalone commit (following this one) that moves
lru_add_drain() into __tlb_gather_mmu(), unifying these cases so that no
redundant lru_add_drain() calls are made when using mmu_gather.

>> Signed-off-by: Jianfeng Wang <jianfeng.w.wang@oracle.com>
>> ---
>>  mm/mmap.c | 6 ++++++
>>  1 file changed, 6 insertions(+)
>>
>> diff --git a/mm/mmap.c b/mm/mmap.c
>> index 1971bfffcc03..da0308eef435 100644
>> --- a/mm/mmap.c
>> +++ b/mm/mmap.c
>> @@ -2330,7 +2330,10 @@ static void unmap_region(struct mm_struct *mm, struct ma_state *mas,
>>  	struct mmu_gather tlb;
>>  	unsigned long mt_start = mas->index;
>>
>> +	/* Defer lru_add_drain() to tlb_finish_mmu() for the ifndef case. */
>> +#ifdef CONFIG_MMU_GATHER_NO_GATHER
>>  	lru_add_drain();
>> +#endif
>>  	tlb_gather_mmu(&tlb, mm);
>>  	update_hiwater_rss(mm);
>>  	unmap_vmas(&tlb, mas, vma, start, end, tree_end, mm_wr_locked);
>> @@ -3300,7 +3303,10 @@ void exit_mmap(struct mm_struct *mm)
>>  		return;
>>  	}
>>
>> +	/* Defer lru_add_drain() to tlb_finish_mmu() for the ifndef case. */
>> +#ifdef CONFIG_MMU_GATHER_NO_GATHER
>>  	lru_add_drain();
>> +#endif
>>  	flush_cache_mm(mm);
>>  	tlb_gather_mmu_fullmm(&tlb, mm);
>>  	/* update_hiwater_rss(mm) here? but nobody should be looking */
>> --
>> 2.42.1
On Thu, Dec 14, 2023 at 03:59:00PM -0800, Jianfeng Wang wrote:
> On 12/14/23 3:00 PM, Matthew Wilcox wrote:
> > On Thu, Dec 14, 2023 at 02:27:17PM -0800, Jianfeng Wang wrote:
> >> When unmapping VMA pages, pages will be gathered in batch and released by
> >> tlb_finish_mmu() if CONFIG_MMU_GATHER_NO_GATHER is not set. The function
> >> tlb_finish_mmu() is responsible for calling free_pages_and_swap_cache(),
> >> which calls lru_add_drain() to drain cached pages in folio_batch before
> >> releasing gathered pages. Thus, it is redundant to call lru_add_drain()
> >> before gathering pages, if CONFIG_MMU_GATHER_NO_GATHER is not set.
> >>
> >> Remove lru_add_drain() prior to gathering and unmapping pages in
> >> exit_mmap() and unmap_region() if CONFIG_MMU_GATHER_NO_GATHER is not set.
> >>
> >> Note that the page unmapping process in oom_killer (e.g., in
> >> __oom_reap_task_mm()) also uses tlb_finish_mmu() and does not have
> >> redundant lru_add_drain(). So, this commit makes the code more consistent.
> >
> > Shouldn't we put this in __tlb_gather_mmu() which already has the
> > CONFIG_MMU_GATHER_NO_GATHER ifdefs? That would presumably help with, e.g.,
> > zap_page_range_single() too.
>
> Thanks. It makes sense to me.
> This commit is motivated by a workload that uses mmap/munmap heavily. Since
> the mmu_gather feature is also used by hugetlb, madvise, mprotect, etc., I
> prefer to have another standalone commit (following this one) that moves
> lru_add_drain() into __tlb_gather_mmu(), unifying these cases so that no
> redundant lru_add_drain() calls are made when using mmu_gather.

That's not normally the approach we take.
On 12/14/23 7:06 PM, Matthew Wilcox wrote:
> On Thu, Dec 14, 2023 at 03:59:00PM -0800, Jianfeng Wang wrote:
>> On 12/14/23 3:00 PM, Matthew Wilcox wrote:
>>> On Thu, Dec 14, 2023 at 02:27:17PM -0800, Jianfeng Wang wrote:
>>>> When unmapping VMA pages, pages will be gathered in batch and released by
>>>> tlb_finish_mmu() if CONFIG_MMU_GATHER_NO_GATHER is not set. The function
>>>> tlb_finish_mmu() is responsible for calling free_pages_and_swap_cache(),
>>>> which calls lru_add_drain() to drain cached pages in folio_batch before
>>>> releasing gathered pages. Thus, it is redundant to call lru_add_drain()
>>>> before gathering pages, if CONFIG_MMU_GATHER_NO_GATHER is not set.
>>>>
>>>> Remove lru_add_drain() prior to gathering and unmapping pages in
>>>> exit_mmap() and unmap_region() if CONFIG_MMU_GATHER_NO_GATHER is not set.
>>>>
>>>> Note that the page unmapping process in oom_killer (e.g., in
>>>> __oom_reap_task_mm()) also uses tlb_finish_mmu() and does not have
>>>> redundant lru_add_drain(). So, this commit makes the code more consistent.
>>>
>>> Shouldn't we put this in __tlb_gather_mmu() which already has the
>>> CONFIG_MMU_GATHER_NO_GATHER ifdefs? That would presumably help with, e.g.,
>>> zap_page_range_single() too.
>>
>> Thanks. It makes sense to me.
>> This commit is motivated by a workload that uses mmap/munmap heavily. Since
>> the mmu_gather feature is also used by hugetlb, madvise, mprotect, etc., I
>> prefer to have another standalone commit (following this one) that moves
>> lru_add_drain() into __tlb_gather_mmu(), unifying these cases so that no
>> redundant lru_add_drain() calls are made when using mmu_gather.
>
> That's not normally the approach we take.

Okay, understood. Thanks for pointing it out. Let me send a new patch that
aligns with your suggestion.
On 12/14/23 3:00 PM, Matthew Wilcox wrote:
> On Thu, Dec 14, 2023 at 02:27:17PM -0800, Jianfeng Wang wrote:
>> When unmapping VMA pages, pages will be gathered in batch and released by
>> tlb_finish_mmu() if CONFIG_MMU_GATHER_NO_GATHER is not set. The function
>> tlb_finish_mmu() is responsible for calling free_pages_and_swap_cache(),
>> which calls lru_add_drain() to drain cached pages in folio_batch before
>> releasing gathered pages. Thus, it is redundant to call lru_add_drain()
>> before gathering pages, if CONFIG_MMU_GATHER_NO_GATHER is not set.
>>
>> Remove lru_add_drain() prior to gathering and unmapping pages in
>> exit_mmap() and unmap_region() if CONFIG_MMU_GATHER_NO_GATHER is not set.
>>
>> Note that the page unmapping process in oom_killer (e.g., in
>> __oom_reap_task_mm()) also uses tlb_finish_mmu() and does not have
>> redundant lru_add_drain(). So, this commit makes the code more consistent.
>
> Shouldn't we put this in __tlb_gather_mmu() which already has the
> CONFIG_MMU_GATHER_NO_GATHER ifdefs? That would presumably help with, e.g.,
> zap_page_range_single() too.

After looking at different use cases of tlb_gather_mmu(), I feel it is
questionable to move lru_add_drain() into __tlb_gather_mmu(). There are two
use cases of tlb_gather_mmu(): one is for unmapping and releasing pages
(e.g., the two cases in mmap.c); the other is for updating page table
entries and flushing the TLB without releasing pages (e.g., together with
mprotect_fixup()). For the latter use case, it is reasonable not to call
lru_add_drain() prior to or within tlb_gather_mmu().

Of course, we may update tlb_gather_mmu()'s API to take this into account.
For example, we can have tlb_gather_mmu_for_release() for the first case and
tlb_gather_mmu() for the latter. I'd like to have your opinion on this.
Thanks!

>> Signed-off-by: Jianfeng Wang <jianfeng.w.wang@oracle.com>
>> ---
>>  mm/mmap.c | 6 ++++++
>>  1 file changed, 6 insertions(+)
>>
>> diff --git a/mm/mmap.c b/mm/mmap.c
>> index 1971bfffcc03..da0308eef435 100644
>> --- a/mm/mmap.c
>> +++ b/mm/mmap.c
>> @@ -2330,7 +2330,10 @@ static void unmap_region(struct mm_struct *mm, struct ma_state *mas,
>>  	struct mmu_gather tlb;
>>  	unsigned long mt_start = mas->index;
>>
>> +	/* Defer lru_add_drain() to tlb_finish_mmu() for the ifndef case. */
>> +#ifdef CONFIG_MMU_GATHER_NO_GATHER
>>  	lru_add_drain();
>> +#endif
>>  	tlb_gather_mmu(&tlb, mm);
>>  	update_hiwater_rss(mm);
>>  	unmap_vmas(&tlb, mas, vma, start, end, tree_end, mm_wr_locked);
>> @@ -3300,7 +3303,10 @@ void exit_mmap(struct mm_struct *mm)
>>  		return;
>>  	}
>>
>> +	/* Defer lru_add_drain() to tlb_finish_mmu() for the ifndef case. */
>> +#ifdef CONFIG_MMU_GATHER_NO_GATHER
>>  	lru_add_drain();
>> +#endif
>>  	flush_cache_mm(mm);
>>  	tlb_gather_mmu_fullmm(&tlb, mm);
>>  	/* update_hiwater_rss(mm) here? but nobody should be looking */
>> --
>> 2.42.1
On Fri, Dec 15, 2023 at 12:48:10AM -0800, Jianfeng Wang wrote:
> On 12/14/23 3:00 PM, Matthew Wilcox wrote:
> > Shouldn't we put this in __tlb_gather_mmu() which already has the
> > CONFIG_MMU_GATHER_NO_GATHER ifdefs? That would presumably help with, e.g.,
> > zap_page_range_single() too.
>
> After looking at different use cases of tlb_gather_mmu(), I feel it is
> questionable to move lru_add_drain() into __tlb_gather_mmu(). There are two
> use cases of tlb_gather_mmu(): one is for unmapping and releasing pages
> (e.g., the two cases in mmap.c); the other is for updating page table
> entries and flushing the TLB without releasing pages (e.g., together with
> mprotect_fixup()). For the latter use case, it is reasonable not to call
> lru_add_drain() prior to or within tlb_gather_mmu().
>
> Of course, we may update tlb_gather_mmu()'s API to take this into account.
> For example, we can have tlb_gather_mmu_for_release() for the first case
> and tlb_gather_mmu() for the latter. I'd like to have your opinion on this.

Yes, I like this idea. You're right that there's no need to drain the lru
lists for the other use case, so it makes sense to have two APIs.
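A sketch of the two-API split discussed above, using the name proposed in the thread. The signatures mirror the existing tlb_gather_mmu(), but the _for_release() variant is hypothetical and was never part of this patch:

/* For callers that only rewrite PTEs (e.g. mprotect): no drain needed. */
void tlb_gather_mmu(struct mmu_gather *tlb, struct mm_struct *mm)
{
	__tlb_gather_mmu(tlb, mm, false);
}

/* For callers that will actually free pages (e.g. munmap, exit_mmap). */
void tlb_gather_mmu_for_release(struct mmu_gather *tlb, struct mm_struct *mm)
{
#ifdef CONFIG_MMU_GATHER_NO_GATHER
	/* No batching: drain the per-CPU LRU caches before pages are freed. */
	lru_add_drain();
#endif
	__tlb_gather_mmu(tlb, mm, false);
}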
Hello,

kernel test robot noticed "kernel_BUG_at_mm/page_alloc.c" on:

commit: 5e3b1fa97b04faab8760b62ddab2cd2d62110400 ("[PATCH v2] mm: remove redundant lru_add_drain() prior to unmapping pages")
url: https://github.com/intel-lab-lkp/linux/commits/Jianfeng-Wang/mm-remove-redundant-lru_add_drain-prior-to-unmapping-pages/20231215-062828
base: v6.7-rc5
patch link: https://lore.kernel.org/all/20231214222717.50277-1-jianfeng.w.wang@oracle.com/
patch subject: [PATCH v2] mm: remove redundant lru_add_drain() prior to unmapping pages

in testcase: trinity
version: trinity-i386-abe9de86-1_20230429
with following parameters:

	runtime: 600s

test-description: Trinity is a linux system call fuzz tester.
test-url: http://codemonkey.org.uk/projects/trinity/

compiler: clang-16
test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 16G

(please refer to attached dmesg/kmsg for entire log/backtrace)

+-------------------------------+----------+------------+
|                               | v6.7-rc5 | 5e3b1fa97b |
+-------------------------------+----------+------------+
| kernel_BUG_at_mm/page_alloc.c | 0        | 10         |
| invalid_opcode:#[##]          | 0        | 10         |
| EIP:free_unref_page_prepare   | 0        | 10         |
+-------------------------------+----------+------------+

If you fix the issue in a separate patch/commit (i.e. not just a new version
of the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <oliver.sang@intel.com>
| Closes: https://lore.kernel.org/oe-lkp/202312221154.85f3c4a3-oliver.sang@intel.com

[  127.901286][ T2652] ------------[ cut here ]------------
[  127.901579][ T2652] kernel BUG at mm/page_alloc.c:1084!
[  127.901870][ T2652] invalid opcode: 0000 [#1] PREEMPT SMP
[  127.902168][ T2652] CPU: 1 PID: 2652 Comm: trinity-c3 Tainted: G W N 6.7.0-rc5-00001-g5e3b1fa97b04 #1 e678fecaf3798641ec130c7b4b31f90b64a8d53d
[  127.902921][ T2652] EIP: free_unref_page_prepare (mm/page_alloc.c:1084)
[  127.903243][ T2652] Code: 8b 55 e4 b9 07 00 00 00 e8 a3 de ff ff 89 47 10 b0 01 83 c4 10 5e 5f 5b 5d 31 c9 31 d2 c3 89 f8 ba 4c 7c 68 c2 e8 86 76 fd ff <0f> 0b 68 c8 21 b2 c2 e8 8a f4 1f 00 0f 0b 0f a3 05 1c e4 c5 c2 0f

All code
========
   0:	8b 55 e4             	mov    -0x1c(%rbp),%edx
   3:	b9 07 00 00 00       	mov    $0x7,%ecx
   8:	e8 a3 de ff ff       	call   0xffffffffffffdeb0
   d:	89 47 10             	mov    %eax,0x10(%rdi)
  10:	b0 01                	mov    $0x1,%al
  12:	83 c4 10             	add    $0x10,%esp
  15:	5e                   	pop    %rsi
  16:	5f                   	pop    %rdi
  17:	5b                   	pop    %rbx
  18:	5d                   	pop    %rbp
  19:	31 c9                	xor    %ecx,%ecx
  1b:	31 d2                	xor    %edx,%edx
  1d:	c3                   	ret
  1e:	89 f8                	mov    %edi,%eax
  20:	ba 4c 7c 68 c2       	mov    $0xc2687c4c,%edx
  25:	e8 86 76 fd ff       	call   0xfffffffffffd76b0
  2a:*	0f 0b                	ud2		<-- trapping instruction
  2c:	68 c8 21 b2 c2       	push   $0xffffffffc2b221c8
  31:	e8 8a f4 1f 00       	call   0x1ff4c0
  36:	0f 0b                	ud2
  38:	0f a3 05 1c e4 c5 c2 	bt     %eax,-0x3d3a1be4(%rip)        # 0xffffffffc2c5e45b
  3f:	0f                   	.byte 0xf

Code starting with the faulting instruction
===========================================
   0:	0f 0b                	ud2
   2:	68 c8 21 b2 c2       	push   $0xffffffffc2b221c8
   7:	e8 8a f4 1f 00       	call   0x1ff496
   c:	0f 0b                	ud2
   e:	0f a3 05 1c e4 c5 c2 	bt     %eax,-0x3d3a1be4(%rip)        # 0xffffffffc2c5e431
  15:	0f                   	.byte 0xf

[  127.904296][ T2652] EAX: 00000000 EBX: e869ad40 ECX: 00000000 EDX: 00000000
[  127.904681][ T2652] ESI: 00000001 EDI: e869ad40 EBP: ed779b3b ESP: ed779b1f
[  127.905066][ T2652] DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068 EFLAGS: 00010246
[  127.905480][ T2652] CR0: 80050033 CR2: b69e1000 CR3: 2cd56000 CR4: 00040690
[  127.905867][ T2652] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
[  127.906253][ T2652] DR6: fffe0ff0 DR7: 00000400
[  127.906508][ T2652] Call Trace:
[  127.906711][ T2652]  ? fault_in_iov_iter_readable (lib/iov_iter.c:109)
[  127.907028][ T2652]  ? generic_perform_write (mm/filemap.c:?)
[  127.907339][ T2652]  ? shmem_file_write_iter (mm/shmem.c:?)
[  127.907637][ T2652]  ? do_iter_write (fs/read_write.c:736)
[  127.907899][ T2652]  ? vfs_writev (fs/read_write.c:933)
[  127.908144][ T2652]  ? do_writev (fs/read_write.c:976)
[  127.908376][ T2652]  ? __ia32_sys_writev (fs/read_write.c:1049 fs/read_write.c:1046 fs/read_write.c:1046)
[  127.908644][ T2652]  ? __do_fast_syscall_32 (arch/x86/entry/common.c:165)
[  127.908927][ T2652]  ? irqentry_exit_to_user_mode (kernel/entry/common.c:312)
[  127.909235][ T2652]  ? irqentry_exit (kernel/entry/common.c:445)
[  127.909485][ T2652]  ? do_fast_syscall_32 (arch/x86/entry/common.c:346)
[  127.909757][ T2652]  ? do_SYSENTER_32 (arch/x86/entry/common.c:384)
[  127.910012][ T2652]  ? entry_SYSENTER_32 (arch/x86/entry/entry_32.S:840)
[  127.910282][ T2652] Modules linked in:
[  127.910512][ T2652] ---[ end trace 0000000000000000 ]---
[  127.910806][ T2652] EIP: free_unref_page_prepare (mm/page_alloc.c:1084)

The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20231222/202312221154.85f3c4a3-oliver.sang@intel.com
diff --git a/mm/mmap.c b/mm/mmap.c
index 1971bfffcc03..da0308eef435 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -2330,7 +2330,10 @@ static void unmap_region(struct mm_struct *mm, struct ma_state *mas,
 	struct mmu_gather tlb;
 	unsigned long mt_start = mas->index;
 
+	/* Defer lru_add_drain() to tlb_finish_mmu() for the ifndef case. */
+#ifdef CONFIG_MMU_GATHER_NO_GATHER
 	lru_add_drain();
+#endif
 	tlb_gather_mmu(&tlb, mm);
 	update_hiwater_rss(mm);
 	unmap_vmas(&tlb, mas, vma, start, end, tree_end, mm_wr_locked);
@@ -3300,7 +3303,10 @@ void exit_mmap(struct mm_struct *mm)
 		return;
 	}
 
+	/* Defer lru_add_drain() to tlb_finish_mmu() for the ifndef case. */
+#ifdef CONFIG_MMU_GATHER_NO_GATHER
 	lru_add_drain();
+#endif
 	flush_cache_mm(mm);
 	tlb_gather_mmu_fullmm(&tlb, mm);
 	/* update_hiwater_rss(mm) here? but nobody should be looking */