Message ID | 202311142036302357580@zte.com.cn |
---|---|
State | New |
Headers |
Return-Path: <linux-kernel-owner@vger.kernel.org> Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:b909:0:b0:403:3b70:6f57 with SMTP id t9csp1826722vqg; Tue, 14 Nov 2023 04:37:37 -0800 (PST) X-Google-Smtp-Source: AGHT+IEj8XaICMBqytqR9pXVl9AFUhvUUhLJ75ORQgPbinG9Hb48sFEgVraHobC4WUo8C7ATXham X-Received: by 2002:a17:902:ccd2:b0:1cc:5d06:b38 with SMTP id z18-20020a170902ccd200b001cc5d060b38mr2149379ple.64.1699965457278; Tue, 14 Nov 2023 04:37:37 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1699965457; cv=none; d=google.com; s=arc-20160816; b=Rqd5gk+2Oi/5byZUScKOx11IO7R5r7GVQt0mKzvV6w9kpxcDKP2RlzUERt3KTFVQsE iV0m3/8FXEbEB2/uRe1oVtQPhOPkxvzBQ+d5k92Kgo7dJCfBrVKHgsfuujWLQ+NSvCPl nH5re5pHetCKq2u2XrLkFD36O+0NwovnQocsqM+SIWPjiqyx4JcfNEjQqmYh9P35ePNa +/xnp6piXZ16prD1LQY4d+TjKbJ9Hf10Ni/T4xzq68NO7hpZSlzaFHUNrAW17nU2Dvf8 DfD+ZjarCGM4oxNhAyPpDQ9DS2eYskpetgOMapmWaFT8yGLEPswG++fQ7Gw0AKIUA/4q HqiA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:subject:cc:to:from:mime-version:message-id:date; bh=r3pZbpvx+eMrnwi3b7DCyVz9oZbOtGVK8k7Jiw5A3wg=; fh=RQFR+2pHd96oBQk1Nu7crncwPMkY9aXhAX1+oSC0Ouo=; b=lZjprfEPQa0yVF0HykYvk5pAQ98/hgDvsO578bNnE7Na3nL5NGi6kDF6jNthWEkNBu k3Zd2Lle6sY5PMm6Yv38akUFxiWMncfbz5rxoBL4u09MPTlO3SYiKYDo5Y8zT9y1M3gd 1W2lkyYNoJN9doeXksor9EGWqQgFD2PQWaaioKYpUZChVWLWv6exHuFL1LKfxc8iIKbn hFQe1SaqtvqNI28lIt2VSuiQJp2NHjuqb2aPT0kH5NpuXHEcjltzF/5W4a0LTN8j5nox /351nmdrBFD1c+KLySgCq941WzSjO2FmOcNvif/dNxoJmRb701pPo9wywkb2htbtldTo bOuA== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.32 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=zte.com.cn Received: from agentk.vger.email (agentk.vger.email. [23.128.96.32]) by mx.google.com with ESMTPS id i5-20020a170902cf0500b001c72b13a1cbsi8729826plg.352.2023.11.14.04.37.36 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 14 Nov 2023 04:37:37 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.32 as permitted sender) client-ip=23.128.96.32; Authentication-Results: mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.32 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=zte.com.cn Received: from out1.vger.email (depot.vger.email [IPv6:2620:137:e000::3:0]) by agentk.vger.email (Postfix) with ESMTP id D9F0880B7C2A; Tue, 14 Nov 2023 04:37:34 -0800 (PST) X-Virus-Status: Clean X-Virus-Scanned: clamav-milter 0.103.11 at agentk.vger.email Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232992AbjKNMg4 (ORCPT <rfc822;lhua1029@gmail.com> + 29 others); Tue, 14 Nov 2023 07:36:56 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:34668 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231877AbjKNMgz (ORCPT <rfc822;linux-kernel@vger.kernel.org>); Tue, 14 Nov 2023 07:36:55 -0500 Received: from mxct.zte.com.cn (mxct.zte.com.cn [58.251.27.85]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 6E5BDF0 for <linux-kernel@vger.kernel.org>; Tue, 14 Nov 2023 04:36:50 -0800 (PST) Received: from mxde.zte.com.cn (unknown [10.35.20.165]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mxct.zte.com.cn (FangMail) with ESMTPS id 4SV5QM6HcCz1Fhp for <linux-kernel@vger.kernel.org>; Tue, 14 Nov 2023 20:36:43 +0800 (CST) Received: from mxhk.zte.com.cn (unknown [192.168.250.138]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mxde.zte.com.cn (FangMail) with ESMTPS id 4SV5QG4Qqkz67Ckc for <linux-kernel@vger.kernel.org>; Tue, 14 Nov 2023 20:36:38 +0800 (CST) Received: from mse-fl2.zte.com.cn (unknown [10.5.228.133]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mxhk.zte.com.cn (FangMail) with ESMTPS id 4SV5Q82TyLz4xPYj; Tue, 14 Nov 2023 20:36:32 +0800 (CST) Received: from szxlzmapp04.zte.com.cn ([10.5.231.166]) by mse-fl2.zte.com.cn with SMTP id 3AECaR48053112; Tue, 14 Nov 2023 20:36:27 +0800 (+08) (envelope-from yang.yang29@zte.com.cn) Received: from mapi (szxlzmapp01[null]) by mapi (Zmail) with MAPI id mid14; Tue, 14 Nov 2023 20:36:30 +0800 (CST) Date: Tue, 14 Nov 2023 20:36:30 +0800 (CST) X-Zmail-TransId: 2b03655369ceffffffff932-85ee3 X-Mailer: Zmail v1.0 Message-ID: <202311142036302357580@zte.com.cn> Mime-Version: 1.0 From: <yang.yang29@zte.com.cn> To: <akpm@linux-foundation.org>, <david@redhat.com> Cc: <imbrenda@linux.ibm.com>, <linux-kernel@vger.kernel.org>, <linux-mm@kvack.org>, <ran.xiaokai@zte.com.cn>, <xu.xin.sc@gmail.com>, <xu.xin16@zte.com.cn>, <yang.yang29@zte.com.cn>, <jiang.xuexin@zte.com.cn>, <wang.yong12@zte.com.cn> Subject: =?utf-8?q?=5BPATCH=5D_ksm=3A_delay_the_check_of_splitting_compound_?= =?utf-8?q?pages?= Content-Type: text/plain; charset="UTF-8" X-MAIL: mse-fl2.zte.com.cn 3AECaR48053112 X-Fangmail-Gw-Spam-Type: 0 X-Fangmail-Anti-Spam-Filtered: true X-Fangmail-MID-QID: 655369DA.001/4SV5QM6HcCz1Fhp X-Spam-Status: No, score=-0.8 required=5.0 tests=HEADER_FROM_DIFFERENT_DOMAINS, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,T_SCC_BODY_TEXT_LINE, UNPARSEABLE_RELAY autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on agentk.vger.email Precedence: bulk List-ID: <linux-kernel.vger.kernel.org> X-Mailing-List: linux-kernel@vger.kernel.org X-Greylist: Sender passed SPF test, not delayed by milter-greylist-4.6.4 (agentk.vger.email [0.0.0.0]); Tue, 14 Nov 2023 04:37:35 -0800 (PST) X-getmail-retrieved-from-mailbox: INBOX X-GMAIL-THRID: 1782542978947322944 X-GMAIL-MSGID: 1782542978947322944 |
Series |
ksm: delay the check of splitting compound pages
|
|
Commit Message
Yang Yang
Nov. 14, 2023, 12:36 p.m. UTC
From: xu xin <xu.xin16@zte.com.cn> Background ========== When trying to merge two pages, it may fail because the two pages belongs to the same compound page and split_huge_page fails due to the incorrect reference to the page. To solve the problem, the commit 77da2ba0648a4 ("mm/ksm: fix interaction with THP") tries to split the compound page after try_to_merge_two_pages() fails and put_page in that case. However it is too early to calculate of the variable 'split' which indicates whether the two pages belongs to the same compound page. What to do ========== If try_to_merge_two_pages() succeeds, there is no need to check whether to splitting compound pages. So we delay the check of splitting compound pages until try_to_merge_two_pages() fails, which can improve the processing efficiency of cmp_and_merge_page() a little. Signed-off-by: xu xin <xu.xin16@zte.com.cn> Reviewed-by: Yang Yang <yang.yang29@zte.com.cn> --- mm/ksm.c | 36 ++++++++++++++++++++---------------- 1 file changed, 20 insertions(+), 16 deletions(-)
Comments
On 14.11.23 13:36, yang.yang29@zte.com.cn wrote: > From: xu xin <xu.xin16@zte.com.cn> > > Background > ========== > When trying to merge two pages, it may fail because the two pages > belongs to the same compound page and split_huge_page fails due to > the incorrect reference to the page. To solve the problem, the commit > 77da2ba0648a4 ("mm/ksm: fix interaction with THP") tries to split the > compound page after try_to_merge_two_pages() fails and put_page in > that case. However it is too early to calculate of the variable 'split' which > indicates whether the two pages belongs to the same compound page. > > What to do > ========== > If try_to_merge_two_pages() succeeds, there is no need to check whether > to splitting compound pages. So we delay the check of splitting compound > pages until try_to_merge_two_pages() fails, which can improve the > processing efficiency of cmp_and_merge_page() a little. > > Signed-off-by: xu xin <xu.xin16@zte.com.cn> > Reviewed-by: Yang Yang <yang.yang29@zte.com.cn> Can we please add a unit test to ksm_functional_tests.c so we actually get it right this time?
On 14.11.23 13:36, yang.yang29@zte.com.cn wrote: > From: xu xin <xu.xin16@zte.com.cn> > > Background > ========== > When trying to merge two pages, it may fail because the two pages > belongs to the same compound page and split_huge_page fails due to > the incorrect reference to the page. To solve the problem, the commit > 77da2ba0648a4 ("mm/ksm: fix interaction with THP") tries to split the > compound page after try_to_merge_two_pages() fails and put_page in > that case. However it is too early to calculate of the variable 'split' which > indicates whether the two pages belongs to the same compound page. > > What to do > ========== > If try_to_merge_two_pages() succeeds, there is no need to check whether > to splitting compound pages. So we delay the check of splitting compound > pages until try_to_merge_two_pages() fails, which can improve the > processing efficiency of cmp_and_merge_page() a little. > > Signed-off-by: xu xin <xu.xin16@zte.com.cn> > Reviewed-by: Yang Yang <yang.yang29@zte.com.cn> > --- > mm/ksm.c | 36 ++++++++++++++++++++---------------- > 1 file changed, 20 insertions(+), 16 deletions(-) > > diff --git a/mm/ksm.c b/mm/ksm.c > index 7efcc68ccc6e..c952fe5d9e43 100644 > --- a/mm/ksm.c > +++ b/mm/ksm.c > @@ -2229,24 +2229,10 @@ static void cmp_and_merge_page(struct page *page, struct ksm_rmap_item *rmap_ite > tree_rmap_item = > unstable_tree_search_insert(rmap_item, page, &tree_page); > if (tree_rmap_item) { > - bool split; > - > kpage = try_to_merge_two_pages(rmap_item, page, > tree_rmap_item, tree_page); > - /* > - * If both pages we tried to merge belong to the same compound > - * page, then we actually ended up increasing the reference > - * count of the same compound page twice, and split_huge_page > - * failed. > - * Here we set a flag if that happened, and we use it later to > - * try split_huge_page again. Since we call put_page right > - * afterwards, the reference count will be correct and > - * split_huge_page should succeed. > - */ I'm curious, why can't we detect that ahead of time and keep only a single reference? Why do we need the backup code? Anything I am missing? > - split = PageTransCompound(page) > - && compound_head(page) == compound_head(tree_page); > - put_page(tree_page); > if (kpage) { > + put_page(tree_page); > /* > * The pages were successfully merged: insert new > * node in the stable tree and add both rmap_items. > @@ -2271,7 +2257,25 @@ static void cmp_and_merge_page(struct page *page, struct ksm_rmap_item *rmap_ite > break_cow(tree_rmap_item); > break_cow(rmap_item); > } > - } else if (split) { > + } else { > + bool split; > + /* > + * If both pages we tried to merge belong to the same compound > + * page, then we actually ended up increasing the reference > + * count of the same compound page twice, and split_huge_page > + * failed. > + * Here we set a flag if that happened, and we use it later to > + * try split_huge_page again. Since we call put_page right > + * afterwards, the reference count will be correct and > + * split_huge_page should succeed. > + */ > + > + split = PageTransCompound(page) > + && compound_head(page) == compound_head(tree_page); Would split = page_folio(page) == page_folio(tree_page); do the trick? No need to mess with compound pages.
>> diff --git a/mm/ksm.c b/mm/ksm.c >> index 7efcc68ccc6e..c952fe5d9e43 100644 >> --- a/mm/ksm.c >> +++ b/mm/ksm.c >> @@ -2229,24 +2229,10 @@ static void cmp_and_merge_page(struct page *page, struct ksm_rmap_item *rmap_ite >> tree_rmap_item = >> unstable_tree_search_insert(rmap_item, page, &tree_page); >> if (tree_rmap_item) { >> - bool split; >> - >> kpage = try_to_merge_two_pages(rmap_item, page, >> tree_rmap_item, tree_page); >> - /* >> - * If both pages we tried to merge belong to the same compound >> - * page, then we actually ended up increasing the reference >> - * count of the same compound page twice, and split_huge_page >> - * failed. >> - * Here we set a flag if that happened, and we use it later to >> - * try split_huge_page again. Since we call put_page right >> - * afterwards, the reference count will be correct and >> - * split_huge_page should succeed. >> - */ > >I'm curious, why can't we detect that ahead of time and keep only a >single reference? Why do we need the backup code? Anything I am missing? I don't know the original reason, better ask Claudio Imbrenda <imbrenda@linux.vnet.ibm.com>. Maybe because doing detection that ahead of time will break several funtions' semantic, such as try_to_merge_two_pages(), try_to_merge_with_ksm_page() and try_to_merge_one_page() Adding the backup code don't change the old code and fixing the old problem, it's good. > >> - split = PageTransCompound(page) >> - && compound_head(page) == compound_head(tree_page); >> - put_page(tree_page); >> if (kpage) { >> + put_page(tree_page); >> /* >> * The pages were successfully merged: insert new >> * node in the stable tree and add both rmap_items. >> @@ -2271,7 +2257,25 @@ static void cmp_and_merge_page(struct page *page, struct ksm_rmap_item *rmap_ite >> break_cow(tree_rmap_item); >> break_cow(rmap_item); >> } >> - } else if (split) { >> + } else { >> + bool split; >> + /* >> + * If both pages we tried to merge belong to the same compound >> + * page, then we actually ended up increasing the reference >> + * count of the same compound page twice, and split_huge_page >> + * failed. >> + * Here we set a flag if that happened, and we use it later to >> + * try split_huge_page again. Since we call put_page right >> + * afterwards, the reference count will be correct and >> + * split_huge_page should succeed. >> + */ >> + >> + split = PageTransCompound(page) >> + && compound_head(page) == compound_head(tree_page); > >Would > >split = page_folio(page) == page_folio(tree_page); > >do the trick? No need to mess with compound pages. In terms of function correctness, it should work correctly because here 'page' and 'tree_page' are never the same page, which is guaranteed by unstable_tree_search_insert(). But it's not very intuitive, maybe ww need to add some comment.
>> From: xu xin <xu.xin16@zte.com.cn> >> >> Background >> ========== >> When trying to merge two pages, it may fail because the two pages >> belongs to the same compound page and split_huge_page fails due to >> the incorrect reference to the page. To solve the problem, the commit >> 77da2ba0648a4 ("mm/ksm: fix interaction with THP") tries to split the >> compound page after try_to_merge_two_pages() fails and put_page in >> that case. However it is too early to calculate of the variable 'split' which >> indicates whether the two pages belongs to the same compound page. >> >> What to do >> ========== >> If try_to_merge_two_pages() succeeds, there is no need to check whether >> to splitting compound pages. So we delay the check of splitting compound >> pages until try_to_merge_two_pages() fails, which can improve the >> processing efficiency of cmp_and_merge_page() a little. >> >> Signed-off-by: xu xin <xu.xin16@zte.com.cn> >> Reviewed-by: Yang Yang <yang.yang29@zte.com.cn> > >Can we please add a unit test to ksm_functional_tests.c so we actually >get it right this time? Sure. Maybe we can simply refer to the reproducing way Claudio proposes in 77da2ba0648a4 ("mm/ksm: fix interaction with THP"). >-- >Cheers, > >David / dhildenb
On 15.11.23 04:11, xu wrote: >>> diff --git a/mm/ksm.c b/mm/ksm.c >>> index 7efcc68ccc6e..c952fe5d9e43 100644 >>> --- a/mm/ksm.c >>> +++ b/mm/ksm.c >>> @@ -2229,24 +2229,10 @@ static void cmp_and_merge_page(struct page *page, struct ksm_rmap_item *rmap_ite >>> tree_rmap_item = >>> unstable_tree_search_insert(rmap_item, page, &tree_page); >>> if (tree_rmap_item) { >>> - bool split; >>> - >>> kpage = try_to_merge_two_pages(rmap_item, page, >>> tree_rmap_item, tree_page); >>> - /* >>> - * If both pages we tried to merge belong to the same compound >>> - * page, then we actually ended up increasing the reference >>> - * count of the same compound page twice, and split_huge_page >>> - * failed. >>> - * Here we set a flag if that happened, and we use it later to >>> - * try split_huge_page again. Since we call put_page right >>> - * afterwards, the reference count will be correct and >>> - * split_huge_page should succeed. >>> - */ >> >> I'm curious, why can't we detect that ahead of time and keep only a >> single reference? Why do we need the backup code? Anything I am missing? > > I don't know the original reason, better ask Claudio Imbrenda <imbrenda@linux.vnet.ibm.com>. > Maybe because doing detection that ahead of time will break several funtions' semantic, > such as try_to_merge_two_pages(), try_to_merge_with_ksm_page() and try_to_merge_one_page() > > Adding the backup code don't change the old code and fixing the old problem, it's good. It's absolutely counter-intuitive to check for something that cannot possibly work after the effects. This better has a good reason to make that code more complicated.
On 15.11.23 04:15, xu wrote: >>> From: xu xin <xu.xin16@zte.com.cn> >>> >>> Background >>> ========== >>> When trying to merge two pages, it may fail because the two pages >>> belongs to the same compound page and split_huge_page fails due to >>> the incorrect reference to the page. To solve the problem, the commit >>> 77da2ba0648a4 ("mm/ksm: fix interaction with THP") tries to split the >>> compound page after try_to_merge_two_pages() fails and put_page in >>> that case. However it is too early to calculate of the variable 'split' which >>> indicates whether the two pages belongs to the same compound page. >>> >>> What to do >>> ========== >>> If try_to_merge_two_pages() succeeds, there is no need to check whether >>> to splitting compound pages. So we delay the check of splitting compound >>> pages until try_to_merge_two_pages() fails, which can improve the >>> processing efficiency of cmp_and_merge_page() a little. >>> >>> Signed-off-by: xu xin <xu.xin16@zte.com.cn> >>> Reviewed-by: Yang Yang <yang.yang29@zte.com.cn> >> >> Can we please add a unit test to ksm_functional_tests.c so we actually >> get it right this time? > > Sure. Maybe we can simply refer to the reproducing way Claudio proposes in > 77da2ba0648a4 ("mm/ksm: fix interaction with THP"). So, was Claudio able to verify that his fix was correct, and how come we figure out 5 years later that that fix is insufficient? Could it not have possibly worked, has something else changed in the meantime, or what's the deal here? Further elaborating on that in the patch description and adding a proper Fixes: tag will be appreciated ;)
>>>> @@ -2229,24 +2229,10 @@ static void cmp_and_merge_page(struct page *page, struct ksm_rmap_item *rmap_ite >>>> tree_rmap_item = >>>> unstable_tree_search_insert(rmap_item, page, &tree_page); >>>> if (tree_rmap_item) { >>>> - bool split; >>>> - >>>> kpage = try_to_merge_two_pages(rmap_item, page, >>>> tree_rmap_item, tree_page); >>>> - /* >>>> - * If both pages we tried to merge belong to the same compound >>>> - * page, then we actually ended up increasing the reference >>>> - * count of the same compound page twice, and split_huge_page >>>> - * failed. >>>> - * Here we set a flag if that happened, and we use it later to >>>> - * try split_huge_page again. Since we call put_page right >>>> - * afterwards, the reference count will be correct and >>>> - * split_huge_page should succeed. >>>> - */ >>> >>> I'm curious, why can't we detect that ahead of time and keep only a >>> single reference? Why do we need the backup code? Anything I am missing? Do you mean like this? --- a/mm/ksm.c +++ b/mm/ksm.c @@ -2229,23 +2229,21 @@ static void cmp_and_merge_page(struct page *page, struct ksm_rmap_item *rmap_ite tree_rmap_item = unstable_tree_search_insert(rmap_item, page, &tree_page); if (tree_rmap_item) { - bool split; + bool SameCompound; + /* + * If they belongs to the same compound page, its' reference + * get twice, so need to put_page once to avoid that + * split_huge_page fails in try_to_merge_two_pages(). + */ + if (SameCompound = Is_SameCompound(page, tree_page)) + put_page(tree_page); kpage = try_to_merge_two_pages(rmap_item, page, tree_rmap_item, tree_page); - /* - * If both pages we tried to merge belong to the same compound - * page, then we actually ended up increasing the reference - * count of the same compound page twice, and split_huge_page - * failed. - * Here we set a flag if that happened, and we use it later to - * try split_huge_page again. Since we call put_page right - * afterwards, the reference count will be correct and - * split_huge_page should succeed. - */ - split = PageTransCompound(page) - && compound_head(page) == compound_head(tree_page); - put_page(tree_page); + + if (!SameCompound) + put_page(tree_page); + if (kpage) { /* * The pages were successfully merged: insert new @@ -2271,20 +2269,6 @@ static void cmp_and_merge_page(struct page *page, struct ksm_rmap_item *rmap_ite break_cow(tree_rmap_item); break_cow(rmap_item); } - } else if (split) { - /* - * We are here if we tried to merge two pages and - * failed because they both belonged to the same - * compound page. We will split the page now, but no - * merging will take place. - * We do not want to add the cost of a full lock; if - * the page is locked, it is better to skip it and - * perhaps try again later. - */ - if (!trylock_page(page)) - return; - split_huge_page(page); - unlock_page(page); } } } >> >> I don't know the original reason, better ask Claudio Imbrenda <imbrenda@linux.vnet.ibm.com>. >> Maybe because doing detection that ahead of time will break several funtions' semantic, >> such as try_to_merge_two_pages(), try_to_merge_with_ksm_page() and try_to_merge_one_page() >> >> Adding the backup code don't change the old code and fixing the old problem, it's good. > >It's absolutely counter-intuitive to check for something that cannot >possibly work after the effects. This better has a good reason to make >that code more complicated. >--
On 16.11.23 13:17, xu wrote: >>>>> @@ -2229,24 +2229,10 @@ static void cmp_and_merge_page(struct page *page, struct ksm_rmap_item *rmap_ite >>>>> tree_rmap_item = >>>>> unstable_tree_search_insert(rmap_item, page, &tree_page); >>>>> if (tree_rmap_item) { >>>>> - bool split; >>>>> - >>>>> kpage = try_to_merge_two_pages(rmap_item, page, >>>>> tree_rmap_item, tree_page); >>>>> - /* >>>>> - * If both pages we tried to merge belong to the same compound >>>>> - * page, then we actually ended up increasing the reference >>>>> - * count of the same compound page twice, and split_huge_page >>>>> - * failed. >>>>> - * Here we set a flag if that happened, and we use it later to >>>>> - * try split_huge_page again. Since we call put_page right >>>>> - * afterwards, the reference count will be correct and >>>>> - * split_huge_page should succeed. >>>>> - */ >>>> >>>> I'm curious, why can't we detect that ahead of time and keep only a >>>> single reference? Why do we need the backup code? Anything I am missing? > > Do you mean like this? Let me see if the refcounting here is sane: (a) The caller has a reference on "page" that it will put just after the cmp_and_merge_page() call. (b) unstable_tree_search_insert() obtained a reference to the "tree_page" using get_mergeable_page()->follow_page(). We will put that reference. So indeed, if both references are to the same folio, *we* have two references to the folio and can safely drop one of both. > > --- a/mm/ksm.c > +++ b/mm/ksm.c > @@ -2229,23 +2229,21 @@ static void cmp_and_merge_page(struct page *page, struct ksm_rmap_item *rmap_ite > tree_rmap_item = > unstable_tree_search_insert(rmap_item, page, &tree_page); > if (tree_rmap_item) { > - bool split; > + bool SameCompound; > + /* > + * If they belongs to the same compound page, its' reference > + * get twice, so need to put_page once to avoid that > + * split_huge_page fails in try_to_merge_two_pages(). > + */ > + if (SameCompound = Is_SameCompound(page, tree_page)) > + put_page(tree_page); > bool same_folio = page_folio(page) == page_folio(tree_page); /* * If both pages belong to the same folio, we are holding two references * to the same large folio: splitting the folio in * try_to_merge_one_page() will fail for that reason. So let's just drop * one reference early. Note that this is only possible if tree_page is * not a KSM page yet. */ if (same_folio) put_page(tree_page); [we could use more folio operations here, but lets KIS]
On 16.11.23 18:39, David Hildenbrand wrote: > On 16.11.23 13:17, xu wrote: >>>>>> @@ -2229,24 +2229,10 @@ static void cmp_and_merge_page(struct page *page, struct ksm_rmap_item *rmap_ite >>>>>> tree_rmap_item = >>>>>> unstable_tree_search_insert(rmap_item, page, &tree_page); >>>>>> if (tree_rmap_item) { >>>>>> - bool split; >>>>>> - >>>>>> kpage = try_to_merge_two_pages(rmap_item, page, >>>>>> tree_rmap_item, tree_page); >>>>>> - /* >>>>>> - * If both pages we tried to merge belong to the same compound >>>>>> - * page, then we actually ended up increasing the reference >>>>>> - * count of the same compound page twice, and split_huge_page >>>>>> - * failed. >>>>>> - * Here we set a flag if that happened, and we use it later to >>>>>> - * try split_huge_page again. Since we call put_page right >>>>>> - * afterwards, the reference count will be correct and >>>>>> - * split_huge_page should succeed. >>>>>> - */ >>>>> >>>>> I'm curious, why can't we detect that ahead of time and keep only a >>>>> single reference? Why do we need the backup code? Anything I am missing? >> >> Do you mean like this? > > Let me see if the refcounting here is sane: > > (a) The caller has a reference on "page" that it will put just after the > cmp_and_merge_page() call. > (b) unstable_tree_search_insert() obtained a reference to the > "tree_page" using get_mergeable_page()->follow_page(). We will put > that reference. > > So indeed, if both references are to the same folio, *we* have two > references to the folio and can safely drop one of both. > >> >> --- a/mm/ksm.c >> +++ b/mm/ksm.c >> @@ -2229,23 +2229,21 @@ static void cmp_and_merge_page(struct page *page, struct ksm_rmap_item *rmap_ite >> tree_rmap_item = >> unstable_tree_search_insert(rmap_item, page, &tree_page); >> if (tree_rmap_item) { >> - bool split; >> + bool SameCompound; >> + /* >> + * If they belongs to the same compound page, its' reference >> + * get twice, so need to put_page once to avoid that >> + * split_huge_page fails in try_to_merge_two_pages(). >> + */ >> + if (SameCompound = Is_SameCompound(page, tree_page)) >> + put_page(tree_page); >> > > bool same_folio = page_folio(page) == page_folio(tree_page); > > /* > * If both pages belong to the same folio, we are holding two references > * to the same large folio: splitting the folio in > * try_to_merge_one_page() will fail for that reason. So let's just drop > * one reference early. Note that this is only possible if tree_page is > * not a KSM page yet. > */ > if (same_folio) > put_page(tree_page); > > [we could use more folio operations here, but lets KIS] > Looking into the details, that doesn't work unfortunately. try_to_merge_one_page() will call split_huge_page(), but that will only hold a reference to "page", not to "tree page" after the split. Long story short, after split_huge_page() tree_page could get freed because we don't hold a reference to that one anymore. So we most probably cannot do better than the following (untested): diff --git a/mm/ksm.c b/mm/ksm.c index 7efcc68ccc6e..afb079524585 100644 --- a/mm/ksm.c +++ b/mm/ksm.c @@ -2226,25 +2226,30 @@ static void cmp_and_merge_page(struct page *page, struct ksm_rmap_item *rmap_ite if (!err) return; } +retry: tree_rmap_item = unstable_tree_search_insert(rmap_item, page, &tree_page); if (tree_rmap_item) { - bool split; + /* + * If both pages belong to the same folio, we are holding two + * references to the same large folio: splitting the folio in + * try_to_merge_one_page() will fail for that reason. + * + * We have to split manually, and lookup the tree_page + * again. Note that we'll only hold a reference to "page" after + * the split. + */ + if (page_folio(page) == page_folio(tree_page)) { + put_page(tree_page); + + if (!trylock_page(page)) + return; + split_huge_page(page); + unlock_page(page); + goto retry; + } kpage = try_to_merge_two_pages(rmap_item, page, tree_rmap_item, tree_page); - /* - * If both pages we tried to merge belong to the same compound - * page, then we actually ended up increasing the reference - * count of the same compound page twice, and split_huge_page - * failed. - * Here we set a flag if that happened, and we use it later to - * try split_huge_page again. Since we call put_page right - * afterwards, the reference count will be correct and - * split_huge_page should succeed. - */ - split = PageTransCompound(page) - && compound_head(page) == compound_head(tree_page); put_page(tree_page); if (kpage) { /* @@ -2271,20 +2276,6 @@ static void cmp_and_merge_page(struct page *page, struct ksm_rmap_item *rmap_ite break_cow(tree_rmap_item); break_cow(rmap_item); } - } else if (split) { - /* - * We are here if we tried to merge two pages and - * failed because they both belonged to the same - * compound page. We will split the page now, but no - * merging will take place. - * We do not want to add the cost of a full lock; if - * the page is locked, it is better to skip it and - * perhaps try again later. - */ - if (!trylock_page(page)) - return; - split_huge_page(page); - unlock_page(page); } } }
diff --git a/mm/ksm.c b/mm/ksm.c index 7efcc68ccc6e..c952fe5d9e43 100644 --- a/mm/ksm.c +++ b/mm/ksm.c @@ -2229,24 +2229,10 @@ static void cmp_and_merge_page(struct page *page, struct ksm_rmap_item *rmap_ite tree_rmap_item = unstable_tree_search_insert(rmap_item, page, &tree_page); if (tree_rmap_item) { - bool split; - kpage = try_to_merge_two_pages(rmap_item, page, tree_rmap_item, tree_page); - /* - * If both pages we tried to merge belong to the same compound - * page, then we actually ended up increasing the reference - * count of the same compound page twice, and split_huge_page - * failed. - * Here we set a flag if that happened, and we use it later to - * try split_huge_page again. Since we call put_page right - * afterwards, the reference count will be correct and - * split_huge_page should succeed. - */ - split = PageTransCompound(page) - && compound_head(page) == compound_head(tree_page); - put_page(tree_page); if (kpage) { + put_page(tree_page); /* * The pages were successfully merged: insert new * node in the stable tree and add both rmap_items. @@ -2271,7 +2257,25 @@ static void cmp_and_merge_page(struct page *page, struct ksm_rmap_item *rmap_ite break_cow(tree_rmap_item); break_cow(rmap_item); } - } else if (split) { + } else { + bool split; + /* + * If both pages we tried to merge belong to the same compound + * page, then we actually ended up increasing the reference + * count of the same compound page twice, and split_huge_page + * failed. + * Here we set a flag if that happened, and we use it later to + * try split_huge_page again. Since we call put_page right + * afterwards, the reference count will be correct and + * split_huge_page should succeed. + */ + + split = PageTransCompound(page) + && compound_head(page) == compound_head(tree_page); + put_page(tree_page); + + if (!split) + return; /* * We are here if we tried to merge two pages and * failed because they both belonged to the same