Message ID | 20230414194832.973194-1-tsahu@linux.ibm.com |
---|---|
State | New |
Headers |
From: Tarun Sahu <tsahu@linux.ibm.com>
To: linux-mm@kvack.org
Cc: akpm@linux-foundation.org, muchun.song@linux.dev, mike.kravetz@oracle.com, aneesh.kumar@linux.ibm.com, willy@infradead.org, sidhartha.kumar@oracle.com, gerald.schaefer@linux.ibm.com, linux-kernel@vger.kernel.org, jaypatel@linux.ibm.com, tsahu@linux.ibm.com
Subject: [PATCH] mm/folio: Avoid special handling for order value 0 in folio_set_order
Date: Sat, 15 Apr 2023 01:18:32 +0530
Message-Id: <20230414194832.973194-1-tsahu@linux.ibm.com>
X-Mailer: git-send-email 2.31.1 |
Series | mm/folio: Avoid special handling for order value 0 in folio_set_order |
Commit Message
Tarun Sahu
April 14, 2023, 7:48 p.m. UTC
folio_set_order(folio, 0) is an abuse of folio_set_order, as an order-0
folio does not have any tail page in which to store the order.
folio_set_order sets folio->_folio_nr_pages to 0 for order 0. This was
required because _folio_nr_pages overlapped with page->mapping, and
leaving it non-zero caused a "bad page" error while freeing gigantic
hugepages. That was fixed by commit ba9c1201beaa ("mm/hugetlb: clear
compound_nr before freeing gigantic pages"). In addition, commit
a01f43901cfb ("hugetlb: be sure to free demoted CMA pages to CMA") now
explicitly clears page->mapping, so the bad page error can no longer
occur even if _folio_nr_pages is left set. Since order-0 folios are not
supposed to call folio_set_order at all, the folio_set_order(folio, 0)
calls can now be removed from the hugetlb code path to clear up the
confusion.
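For reference, this is how folio_set_order() reads with this patch
applied (taken from the mm/internal.h hunk at the end of this page); an
order of 0 is now rejected with a warning instead of being treated as a
request to clear the tail-page fields:

static inline void folio_set_order(struct folio *folio, unsigned int order)
{
        if (WARN_ON_ONCE(!order || !folio_test_large(folio)))
                return;

        folio->_folio_order = order;
#ifdef CONFIG_64BIT
        folio->_folio_nr_pages = 1U << order;
#endif
}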
The patch also moves the _folio_set_head and folio_set_order calls in
__prep_compound_gigantic_folio() after the tail-page setup loop, so that
they no longer have to be undone in the error path.
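A simplified sketch of the resulting flow in
__prep_compound_gigantic_folio() (the demote handling and the out_error
unwinding are trimmed; see the full diff at the end of this page):

        __folio_clear_reserved(folio);
        for (i = 0; i < nr_pages; i++) {
                p = folio_page(folio, i);
                if (i != 0)                     /* head page cleared above */
                        __ClearPageReserved(p);
                if (!page_ref_freeze(p, 1))
                        goto out_error;         /* nothing set on the head yet, so nothing to undo */
                if (i != 0)
                        set_compound_head(p, &folio->page);
        }
        /* only now mark the folio as a compound head of the given order */
        __folio_set_head(folio);
        folio_set_order(folio, order);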
Testing: I have run the LTP tests, which all pass. I have also written
an LTP test that exercises the bug caused by the overlap of compound_nr
and page->mapping:
https://lore.kernel.org/all/20230413090753.883953-1-tsahu@linux.ibm.com/
Running on an older kernel (< 5.10-rc7) that still has the above bug,
the test fails; on newer kernels, and also with this patch applied, it
passes.
Signed-off-by: Tarun Sahu <tsahu@linux.ibm.com>
---
mm/hugetlb.c | 9 +++------
mm/internal.h | 8 ++------
2 files changed, 5 insertions(+), 12 deletions(-)
Comments
On Sat, Apr 15, 2023 at 01:18:32AM +0530, Tarun Sahu wrote: > folio_set_order(folio, 0); which is an abuse of folio_set_order as 0-order > folio does not have any tail page to set order. I think you're missing the point of how folio_set_order() is used. When splitting a large folio, we need to zero out the folio_nr_pages in the tail, so it does have a tail page, and that tail page needs to be zeroed. We even assert that there is a tail page: if (WARN_ON_ONCE(!folio_test_large(folio))) return; Or maybe you need to explain yourself better. > folio->_folio_nr_pages is > set to 0 for order 0 in folio_set_order. It is required because > _folio_nr_pages overlapped with page->mapping and leaving it non zero > caused "bad page" error while freeing gigantic hugepages. This was fixed in > Commit ba9c1201beaa ("mm/hugetlb: clear compound_nr before freeing gigantic > pages"). Also commit a01f43901cfb ("hugetlb: be sure to free demoted CMA > pages to CMA") now explicitly clear page->mapping and hence we won't see > the bad page error even if _folio_nr_pages remains unset. Also the order 0 > folios are not supposed to call folio_set_order, So now we can get rid of > folio_set_order(folio, 0) from hugetlb code path to clear the confusion. ... this is all very confusing. > The patch also moves _folio_set_head and folio_set_order calls in > __prep_compound_gigantic_folio() such that we avoid clearing them in the > error path. But don't we need those bits set while we operate on the folio to set it up? It makes me nervous if we don't have those bits set because we can end up with speculative references that point to a head page while that page is not marked as a head page. It may not be a problem, but I want to see some air-tight analysis of that. > Testing: I have run LTP tests, which all passes. and also I have written > the test in LTP which tests the bug caused by compound_nr and page->mapping > overlapping. > > https://lore.kernel.org/all/20230413090753.883953-1-tsahu@linux.ibm.com/ > > Running on older kernel ( < 5.10-rc7) with the above bug this fails while > on newer kernel and, also with this patch it passes. 
> > Signed-off-by: Tarun Sahu <tsahu@linux.ibm.com> > --- > mm/hugetlb.c | 9 +++------ > mm/internal.h | 8 ++------ > 2 files changed, 5 insertions(+), 12 deletions(-) > > diff --git a/mm/hugetlb.c b/mm/hugetlb.c > index f16b25b1a6b9..e2540269c1dc 100644 > --- a/mm/hugetlb.c > +++ b/mm/hugetlb.c > @@ -1489,7 +1489,6 @@ static void __destroy_compound_gigantic_folio(struct folio *folio, > set_page_refcounted(p); > } > > - folio_set_order(folio, 0); > __folio_clear_head(folio); > } > > @@ -1951,9 +1950,6 @@ static bool __prep_compound_gigantic_folio(struct folio *folio, > struct page *p; > > __folio_clear_reserved(folio); > - __folio_set_head(folio); > - /* we rely on prep_new_hugetlb_folio to set the destructor */ > - folio_set_order(folio, order); > for (i = 0; i < nr_pages; i++) { > p = folio_page(folio, i); > > @@ -1999,6 +1995,9 @@ static bool __prep_compound_gigantic_folio(struct folio *folio, > if (i != 0) > set_compound_head(p, &folio->page); > } > + __folio_set_head(folio); > + /* we rely on prep_new_hugetlb_folio to set the destructor */ > + folio_set_order(folio, order); > atomic_set(&folio->_entire_mapcount, -1); > atomic_set(&folio->_nr_pages_mapped, 0); > atomic_set(&folio->_pincount, 0); > @@ -2017,8 +2016,6 @@ static bool __prep_compound_gigantic_folio(struct folio *folio, > p = folio_page(folio, j); > __ClearPageReserved(p); > } > - folio_set_order(folio, 0); > - __folio_clear_head(folio); > return false; > } > > diff --git a/mm/internal.h b/mm/internal.h > index 18cda26b8a92..0d96a3bc1d58 100644 > --- a/mm/internal.h > +++ b/mm/internal.h > @@ -425,16 +425,12 @@ int split_free_page(struct page *free_page, > */ > static inline void folio_set_order(struct folio *folio, unsigned int order) > { > - if (WARN_ON_ONCE(!folio_test_large(folio))) > + if (WARN_ON_ONCE(!order || !folio_test_large(folio))) > return; > > folio->_folio_order = order; > #ifdef CONFIG_64BIT > - /* > - * When hugetlb dissolves a folio, we need to clear the tail > - * page, rather than setting nr_pages to 1. > - */ > - folio->_folio_nr_pages = order ? 1U << order : 0; > + folio->_folio_nr_pages = 1U << order; > #endif > } > > -- > 2.31.1 >
On 4/14/23 12:48 PM, Tarun Sahu wrote: > folio_set_order(folio, 0); which is an abuse of folio_set_order as 0-order > folio does not have any tail page to set order. folio->_folio_nr_pages is > set to 0 for order 0 in folio_set_order. It is required because In the previous discussion of this function, Mike mentioned having folio_set_order() be used for non-zero orders and adding a folio_clear_order() that is used to set order to 0. This could be done to reduce confusion. > _folio_nr_pages overlapped with page->mapping and leaving it non zero > caused "bad page" error while freeing gigantic hugepages. This was fixed in > Commit ba9c1201beaa ("mm/hugetlb: clear compound_nr before freeing gigantic > pages"). Also commit a01f43901cfb ("hugetlb: be sure to free demoted CMA > pages to CMA") now explicitly clear page->mapping and hence we won't see > the bad page error even if _folio_nr_pages remains unset. Also the order 0 > folios are not supposed to call folio_set_order, So now we can get rid of > folio_set_order(folio, 0) from hugetlb code path to clear the confusion. > > The patch also moves _folio_set_head and folio_set_order calls in > __prep_compound_gigantic_folio() such that we avoid clearing them in the > error path. > > Testing: I have run LTP tests, which all passes. and also I have written > the test in LTP which tests the bug caused by compound_nr and page->mapping > overlapping. > > https://lore.kernel.org/all/20230413090753.883953-1-tsahu@linux.ibm.com/ > > Running on older kernel ( < 5.10-rc7) with the above bug this fails while > on newer kernel and, also with this patch it passes. > > Signed-off-by: Tarun Sahu <tsahu@linux.ibm.com> > --- > mm/hugetlb.c | 9 +++------ > mm/internal.h | 8 ++------ > 2 files changed, 5 insertions(+), 12 deletions(-) > > diff --git a/mm/hugetlb.c b/mm/hugetlb.c > index f16b25b1a6b9..e2540269c1dc 100644 > --- a/mm/hugetlb.c > +++ b/mm/hugetlb.c > @@ -1489,7 +1489,6 @@ static void __destroy_compound_gigantic_folio(struct folio *folio, > set_page_refcounted(p); > } > > - folio_set_order(folio, 0); > __folio_clear_head(folio); > } > > @@ -1951,9 +1950,6 @@ static bool __prep_compound_gigantic_folio(struct folio *folio, > struct page *p; > > __folio_clear_reserved(folio); > - __folio_set_head(folio); > - /* we rely on prep_new_hugetlb_folio to set the destructor */ > - folio_set_order(folio, order); > for (i = 0; i < nr_pages; i++) { > p = folio_page(folio, i); > > @@ -1999,6 +1995,9 @@ static bool __prep_compound_gigantic_folio(struct folio *folio, > if (i != 0) > set_compound_head(p, &folio->page); > } calling set_compound_head() for the tail page before the folio has the head flag set could seem misleading. At this point order is not set as well so it is not clear that the folio is a compound page folio. 
> + __folio_set_head(folio); > + /* we rely on prep_new_hugetlb_folio to set the destructor */ > + folio_set_order(folio, order); > atomic_set(&folio->_entire_mapcount, -1); > atomic_set(&folio->_nr_pages_mapped, 0); > atomic_set(&folio->_pincount, 0); > @@ -2017,8 +2016,6 @@ static bool __prep_compound_gigantic_folio(struct folio *folio, > p = folio_page(folio, j); > __ClearPageReserved(p); > } > - folio_set_order(folio, 0); > - __folio_clear_head(folio); > return false; > } > > diff --git a/mm/internal.h b/mm/internal.h > index 18cda26b8a92..0d96a3bc1d58 100644 > --- a/mm/internal.h > +++ b/mm/internal.h > @@ -425,16 +425,12 @@ int split_free_page(struct page *free_page, > */ > static inline void folio_set_order(struct folio *folio, unsigned int order) > { > - if (WARN_ON_ONCE(!folio_test_large(folio))) > + if (WARN_ON_ONCE(!order || !folio_test_large(folio))) > return; > > folio->_folio_order = order; > #ifdef CONFIG_64BIT > - /* > - * When hugetlb dissolves a folio, we need to clear the tail > - * page, rather than setting nr_pages to 1. > - */ > - folio->_folio_nr_pages = order ? 1U << order : 0; > + folio->_folio_nr_pages = 1U << order; > #endif > } >
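[For context on the point above: set_compound_head() only records the
head-page pointer, plus the tail-page marker bit, in the tail page's
compound_head word; it does not itself depend on PG_head already being
set on the head page. Roughly, simplified from include/linux/page-flags.h:]

static __always_inline void set_compound_head(struct page *page, struct page *head)
{
        /* the low bit marks this page as a tail; the rest encodes the head pointer */
        WRITE_ONCE(page->compound_head, (unsigned long)head + 1);
}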
Matthew Wilcox <willy@infradead.org> writes: Hi Mathew, Thanks for reviewing. please find my comments inline. > On Sat, Apr 15, 2023 at 01:18:32AM +0530, Tarun Sahu wrote: >> folio_set_order(folio, 0); which is an abuse of folio_set_order as 0-order >> folio does not have any tail page to set order. > > I think you're missing the point of how folio_set_order() is used. > When splitting a large folio, we need to zero out the folio_nr_pages > in the tail, so it does have a tail page, and that tail page needs to > be zeroed. We even assert that there is a tail page: > > if (WARN_ON_ONCE(!folio_test_large(folio))) > return; > > Or maybe you need to explain yourself better. > Yes, I understand, folio_set_order(order, 0) is called to clear out tail pages folio_order/folio_nr_pages. With this patch, I am trying to convey two things:- 1. It is not necessary to clear out these field if page->mapping is being explicitly updated. I explain this below [EXP]. 2. folio_set_order(order, 0) now currently being used to clear folio_order and folio_nr_pages which is ok. But looking at folio_set_order(folio, 0) is confusing as setting order 0 implies that only 1 page in folio. and folio_order and folio_nr_pages are part of first tail page. IIRC, there was a discussion to use folio_clear_order to avoid such confusion. But if above point 1 deemed to be correct, there will not be any need of this too. **[EXP]** IIUC, during splitting, page->mapping is updated explicitly for tail pages. There is no code path I see, where folio_set_order(order, 0) or set_compound_order(head, 0) is called except below two places. 1. __destroy_compound_gigantic_folio Here, in past there was a problem when struct page used to have compound_nr field which used to overlap with page->mapping. So while freeing, it was necessary to explicitly clear out compound_nr. Which was taken care by Commit ba9c1201beaa ("mm/hugetlb: clear compound_nr before freeing gigantic pages"). But after, Commit a01f43901cfb ("hugetlb: be sure to free demoted CMA pages to CMA"), page->mapping has explicitly been cleared out for all tail pages. for (i = 1; i < nr_pages; i++) { p = folio_page(folio, i); p->mapping = NULL; <======== (Here) clear_compound_head(p); if (!demote) set_page_refcounted(p); } folio_set_order(folio, 0); <== this line can be removed. 2. __prep_compound_gigantic_folio Here, folio_set_order(folio, 0) is called in error path only. which can be avoided if we call folio_set_order(folio, order) after the for loop. I am new to memory allocators. But as far as I could understood by looking at past discussion around this function [1][2], During RCU grace period there could be a race condition causing ref count inflation. But IIUC, that doesn't have any dependency on newly allocated gigantic page except that the ref count might be taken by folio_ref_try_add_rcu for the same page/s which will cause prep_compound_gigantic_folio to fail. So IMHO, it will be ok to move __folio_set_head and folio_set_order after the for loop. Here, Just for reference, below I copy pasted the *for loop*, from before, I am moving these two calls to after this. 
        for (i = 0; i < nr_pages; i++) {
                p = folio_page(folio, i);
                if (i != 0)     /* head page cleared above */
                        __ClearPageReserved(p);
                if (!demote) {
                        if (!page_ref_freeze(p, 1)) {
                                pr_warn("HugeTLB page can not be used due to unexpected inflated ref count\n");
                                goto out_error;
                        }
                } else {
                        VM_BUG_ON_PAGE(page_count(p), p);
                }
                if (i != 0)
                        set_compound_head(p, &folio->page);
        }

I also tested it by triggering demotion of gigantic hugepages to PMD
hugepages:

$ echo 5 > /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages
$ cat /sys/kernel/mm/hugepages/hugepages-1048576kB/free_hugepages
5
$ cat /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
0
$ echo 1 > /sys/kernel/mm/hugepages/hugepages-1048576kB/demote
$ cat /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
512

I am quite new to the field. Please correct me if I have understood it
differently than it is, or if I did not consider some other code path.

[1] https://lore.kernel.org/linux-mm/CAG48ez23q0Jy9cuVnwAe7t_fdhMk2S7N5Hdi-GLcCeq5bsfLxw@mail.gmail.com/
[2] https://lore.kernel.org/all/20210622021423.154662-3-mike.kravetz@oracle.com/T/#u

>> folio->_folio_nr_pages is
>> set to 0 for order 0 in folio_set_order. It is required because
>> _folio_nr_pages overlapped with page->mapping and leaving it non zero
>> caused "bad page" error while freeing gigantic hugepages. This was fixed in
>> Commit ba9c1201beaa ("mm/hugetlb: clear compound_nr before freeing gigantic
>> pages"). Also commit a01f43901cfb ("hugetlb: be sure to free demoted CMA
>> pages to CMA") now explicitly clear page->mapping and hence we won't see
>> the bad page error even if _folio_nr_pages remains unset. Also the order 0
>> folios are not supposed to call folio_set_order, So now we can get rid of
>> folio_set_order(folio, 0) from hugetlb code path to clear the confusion.
>
> ... this is all very confusing.
>
Sorry for this. Lemme know if the above explanation [EXP] is clear.

>> The patch also moves _folio_set_head and folio_set_order calls in
>> __prep_compound_gigantic_folio() such that we avoid clearing them in the
>> error path.
>
> But don't we need those bits set while we operate on the folio to set it
> up? It makes me nervous if we don't have those bits set because we can
> end up with speculative references that point to a head page while that
> page is not marked as a head page. It may not be a problem, but I want
> to see some air-tight analysis of that.
>
>> Testing: I have run LTP tests, which all passes. and also I have written
>> the test in LTP which tests the bug caused by compound_nr and page->mapping
>> overlapping.
>>
>> https://lore.kernel.org/all/20230413090753.883953-1-tsahu@linux.ibm.com/
>>
>> Running on older kernel ( < 5.10-rc7) with the above bug this fails while
>> on newer kernel and, also with this patch it passes.
>> >> Signed-off-by: Tarun Sahu <tsahu@linux.ibm.com> >> --- >> mm/hugetlb.c | 9 +++------ >> mm/internal.h | 8 ++------ >> 2 files changed, 5 insertions(+), 12 deletions(-) >> >> diff --git a/mm/hugetlb.c b/mm/hugetlb.c >> index f16b25b1a6b9..e2540269c1dc 100644 >> --- a/mm/hugetlb.c >> +++ b/mm/hugetlb.c >> @@ -1489,7 +1489,6 @@ static void __destroy_compound_gigantic_folio(struct folio *folio, >> set_page_refcounted(p); >> } >> >> - folio_set_order(folio, 0); >> __folio_clear_head(folio); >> } >> >> @@ -1951,9 +1950,6 @@ static bool __prep_compound_gigantic_folio(struct folio *folio, >> struct page *p; >> >> __folio_clear_reserved(folio); >> - __folio_set_head(folio); >> - /* we rely on prep_new_hugetlb_folio to set the destructor */ >> - folio_set_order(folio, order); >> for (i = 0; i < nr_pages; i++) { >> p = folio_page(folio, i); >> >> @@ -1999,6 +1995,9 @@ static bool __prep_compound_gigantic_folio(struct folio *folio, >> if (i != 0) >> set_compound_head(p, &folio->page); >> } >> + __folio_set_head(folio); >> + /* we rely on prep_new_hugetlb_folio to set the destructor */ >> + folio_set_order(folio, order); >> atomic_set(&folio->_entire_mapcount, -1); >> atomic_set(&folio->_nr_pages_mapped, 0); >> atomic_set(&folio->_pincount, 0); >> @@ -2017,8 +2016,6 @@ static bool __prep_compound_gigantic_folio(struct folio *folio, >> p = folio_page(folio, j); >> __ClearPageReserved(p); >> } >> - folio_set_order(folio, 0); >> - __folio_clear_head(folio); >> return false; >> } >> >> diff --git a/mm/internal.h b/mm/internal.h >> index 18cda26b8a92..0d96a3bc1d58 100644 >> --- a/mm/internal.h >> +++ b/mm/internal.h >> @@ -425,16 +425,12 @@ int split_free_page(struct page *free_page, >> */ >> static inline void folio_set_order(struct folio *folio, unsigned int order) >> { >> - if (WARN_ON_ONCE(!folio_test_large(folio))) >> + if (WARN_ON_ONCE(!order || !folio_test_large(folio))) >> return; >> >> folio->_folio_order = order; >> #ifdef CONFIG_64BIT >> - /* >> - * When hugetlb dissolves a folio, we need to clear the tail >> - * page, rather than setting nr_pages to 1. >> - */ >> - folio->_folio_nr_pages = order ? 1U << order : 0; >> + folio->_folio_nr_pages = 1U << order; >> #endif >> } >> >> -- >> 2.31.1 >>
Hi Sidhartha, Thanks for your inputs, please find my comments inline > On 4/14/23 12:48 PM, Tarun Sahu wrote: >> folio_set_order(folio, 0); which is an abuse of folio_set_order as 0-order >> folio does not have any tail page to set order. folio->_folio_nr_pages is >> set to 0 for order 0 in folio_set_order. It is required because > > In the previous discussion of this function, Mike mentioned having > folio_set_order() be used for non-zero orders and adding a > folio_clear_order() that is used to set order to 0. This could be done > to reduce confusion. > Yes, I agree, I replied to Mathew reply to this thread, Lemme know your thought on this. In this patch, I proposed that there won't be need of folio_clear_order if folio_set_order(folio, 0) is not needed with minor changes in code path. >> _folio_nr_pages overlapped with page->mapping and leaving it non zero >> caused "bad page" error while freeing gigantic hugepages. This was fixed in >> Commit ba9c1201beaa ("mm/hugetlb: clear compound_nr before freeing gigantic >> pages"). Also commit a01f43901cfb ("hugetlb: be sure to free demoted CMA >> pages to CMA") now explicitly clear page->mapping and hence we won't see >> the bad page error even if _folio_nr_pages remains unset. Also the order 0 >> folios are not supposed to call folio_set_order, So now we can get rid of >> folio_set_order(folio, 0) from hugetlb code path to clear the confusion. >> >> The patch also moves _folio_set_head and folio_set_order calls in >> __prep_compound_gigantic_folio() such that we avoid clearing them in the >> error path. >> >> Testing: I have run LTP tests, which all passes. and also I have written >> the test in LTP which tests the bug caused by compound_nr and page->mapping >> overlapping. >> >> https://lore.kernel.org/all/20230413090753.883953-1-tsahu@linux.ibm.com/ >> >> Running on older kernel ( < 5.10-rc7) with the above bug this fails while >> on newer kernel and, also with this patch it passes. >> >> Signed-off-by: Tarun Sahu <tsahu@linux.ibm.com> >> --- >> mm/hugetlb.c | 9 +++------ >> mm/internal.h | 8 ++------ >> 2 files changed, 5 insertions(+), 12 deletions(-) >> >> diff --git a/mm/hugetlb.c b/mm/hugetlb.c >> index f16b25b1a6b9..e2540269c1dc 100644 >> --- a/mm/hugetlb.c >> +++ b/mm/hugetlb.c >> @@ -1489,7 +1489,6 @@ static void __destroy_compound_gigantic_folio(struct folio *folio, >> set_page_refcounted(p); >> } >> >> - folio_set_order(folio, 0); >> __folio_clear_head(folio); >> } >> >> @@ -1951,9 +1950,6 @@ static bool __prep_compound_gigantic_folio(struct folio *folio, >> struct page *p; >> >> __folio_clear_reserved(folio); >> - __folio_set_head(folio); >> - /* we rely on prep_new_hugetlb_folio to set the destructor */ >> - folio_set_order(folio, order); >> for (i = 0; i < nr_pages; i++) { >> p = folio_page(folio, i); >> >> @@ -1999,6 +1995,9 @@ static bool __prep_compound_gigantic_folio(struct folio *folio, >> if (i != 0) >> set_compound_head(p, &folio->page); >> } > > calling set_compound_head() for the tail page before the folio has the > head flag set could seem misleading. At this point order is not set as > well so it is not clear that the folio is a compound page folio. > Yeah, I agree, But they are part of same call. I can avoid moving __folio_set_head. And I think, It wont mislead if I avoid moving __folio_set_head. Below function has the similar path. 
void prep_compound_page(struct page *page, unsigned int order) { int i; int nr_pages = 1 << order; __SetPageHead(page); for (i = 1; i < nr_pages; i++) prep_compound_tail(page, i); prep_compound_head(page, order); } Lemme know you thoughts. ~Tarun >> + __folio_set_head(folio); >> + /* we rely on prep_new_hugetlb_folio to set the destructor */ >> + folio_set_order(folio, order); >> atomic_set(&folio->_entire_mapcount, -1); >> atomic_set(&folio->_nr_pages_mapped, 0); >> atomic_set(&folio->_pincount, 0); >> @@ -2017,8 +2016,6 @@ static bool __prep_compound_gigantic_folio(struct folio *folio, >> p = folio_page(folio, j); >> __ClearPageReserved(p); >> } >> - folio_set_order(folio, 0); >> - __folio_clear_head(folio); >> return false; >> } >> >> diff --git a/mm/internal.h b/mm/internal.h >> index 18cda26b8a92..0d96a3bc1d58 100644 >> --- a/mm/internal.h >> +++ b/mm/internal.h >> @@ -425,16 +425,12 @@ int split_free_page(struct page *free_page, >> */ >> static inline void folio_set_order(struct folio *folio, unsigned int order) >> { >> - if (WARN_ON_ONCE(!folio_test_large(folio))) >> + if (WARN_ON_ONCE(!order || !folio_test_large(folio))) >> return; >> >> folio->_folio_order = order; >> #ifdef CONFIG_64BIT >> - /* >> - * When hugetlb dissolves a folio, we need to clear the tail >> - * page, rather than setting nr_pages to 1. >> - */ >> - folio->_folio_nr_pages = order ? 1U << order : 0; >> + folio->_folio_nr_pages = 1U << order; >> #endif >> } >>
Tarun Sahu <tsahu@linux.ibm.com> writes: > Hi Sidhartha, > > Thanks for your inputs, please find my comments inline > >> On 4/14/23 12:48 PM, Tarun Sahu wrote: >>> folio_set_order(folio, 0); which is an abuse of folio_set_order as 0-order >>> folio does not have any tail page to set order. folio->_folio_nr_pages is >>> set to 0 for order 0 in folio_set_order. It is required because >> >> In the previous discussion of this function, Mike mentioned having >> folio_set_order() be used for non-zero orders and adding a >> folio_clear_order() that is used to set order to 0. This could be done >> to reduce confusion. >> > Yes, I agree, I replied to Mathew reply to this thread, Lemme know your > thought on this. In this patch, I proposed that there won't be need of > folio_clear_order if folio_set_order(folio, 0) is not needed with minor > changes in code path. > >>> _folio_nr_pages overlapped with page->mapping and leaving it non zero >>> caused "bad page" error while freeing gigantic hugepages. This was fixed in >>> Commit ba9c1201beaa ("mm/hugetlb: clear compound_nr before freeing gigantic >>> pages"). Also commit a01f43901cfb ("hugetlb: be sure to free demoted CMA >>> pages to CMA") now explicitly clear page->mapping and hence we won't see >>> the bad page error even if _folio_nr_pages remains unset. Also the order 0 >>> folios are not supposed to call folio_set_order, So now we can get rid of >>> folio_set_order(folio, 0) from hugetlb code path to clear the confusion. >>> >>> The patch also moves _folio_set_head and folio_set_order calls in >>> __prep_compound_gigantic_folio() such that we avoid clearing them in the >>> error path. >>> >>> Testing: I have run LTP tests, which all passes. and also I have written >>> the test in LTP which tests the bug caused by compound_nr and page->mapping >>> overlapping. >>> >>> https://lore.kernel.org/all/20230413090753.883953-1-tsahu@linux.ibm.com/ >>> >>> Running on older kernel ( < 5.10-rc7) with the above bug this fails while >>> on newer kernel and, also with this patch it passes. >>> >>> Signed-off-by: Tarun Sahu <tsahu@linux.ibm.com> >>> --- >>> mm/hugetlb.c | 9 +++------ >>> mm/internal.h | 8 ++------ >>> 2 files changed, 5 insertions(+), 12 deletions(-) >>> >>> diff --git a/mm/hugetlb.c b/mm/hugetlb.c >>> index f16b25b1a6b9..e2540269c1dc 100644 >>> --- a/mm/hugetlb.c >>> +++ b/mm/hugetlb.c >>> @@ -1489,7 +1489,6 @@ static void __destroy_compound_gigantic_folio(struct folio *folio, >>> set_page_refcounted(p); >>> } >>> >>> - folio_set_order(folio, 0); >>> __folio_clear_head(folio); >>> } >>> >>> @@ -1951,9 +1950,6 @@ static bool __prep_compound_gigantic_folio(struct folio *folio, >>> struct page *p; >>> >>> __folio_clear_reserved(folio); >>> - __folio_set_head(folio); >>> - /* we rely on prep_new_hugetlb_folio to set the destructor */ >>> - folio_set_order(folio, order); >>> for (i = 0; i < nr_pages; i++) { >>> p = folio_page(folio, i); >>> >>> @@ -1999,6 +1995,9 @@ static bool __prep_compound_gigantic_folio(struct folio *folio, >>> if (i != 0) >>> set_compound_head(p, &folio->page); >>> } >> >> calling set_compound_head() for the tail page before the folio has the >> head flag set could seem misleading. At this point order is not set as >> well so it is not clear that the folio is a compound page folio. >> > Yeah, I agree, But they are part of same call. I can avoid moving > __folio_set_head. And I think, It wont mislead if I avoid moving > __folio_set_head. Below function has the similar path. 
Apologies, Here, I mixed the sentences, I want to say It won't mislead if we avoid moving __folio_set_head, but move only folio_set_order. Below function does the same. > > void prep_compound_page(struct page *page, unsigned int order) > { > int i; > int nr_pages = 1 << order; > > __SetPageHead(page); > for (i = 1; i < nr_pages; i++) > prep_compound_tail(page, i); > > prep_compound_head(page, order); > } > > Lemme know you thoughts. > > > ~Tarun > >>> + __folio_set_head(folio); >>> + /* we rely on prep_new_hugetlb_folio to set the destructor */ >>> + folio_set_order(folio, order); >>> atomic_set(&folio->_entire_mapcount, -1); >>> atomic_set(&folio->_nr_pages_mapped, 0); >>> atomic_set(&folio->_pincount, 0); >>> @@ -2017,8 +2016,6 @@ static bool __prep_compound_gigantic_folio(struct folio *folio, >>> p = folio_page(folio, j); >>> __ClearPageReserved(p); >>> } >>> - folio_set_order(folio, 0); >>> - __folio_clear_head(folio); >>> return false; >>> } >>> >>> diff --git a/mm/internal.h b/mm/internal.h >>> index 18cda26b8a92..0d96a3bc1d58 100644 >>> --- a/mm/internal.h >>> +++ b/mm/internal.h >>> @@ -425,16 +425,12 @@ int split_free_page(struct page *free_page, >>> */ >>> static inline void folio_set_order(struct folio *folio, unsigned int order) >>> { >>> - if (WARN_ON_ONCE(!folio_test_large(folio))) >>> + if (WARN_ON_ONCE(!order || !folio_test_large(folio))) >>> return; >>> >>> folio->_folio_order = order; >>> #ifdef CONFIG_64BIT >>> - /* >>> - * When hugetlb dissolves a folio, we need to clear the tail >>> - * page, rather than setting nr_pages to 1. >>> - */ >>> - folio->_folio_nr_pages = order ? 1U << order : 0; >>> + folio->_folio_nr_pages = 1U << order; >>> #endif >>> } >>>
On 04/14/23 21:12, Matthew Wilcox wrote: > On Sat, Apr 15, 2023 at 01:18:32AM +0530, Tarun Sahu wrote: > > folio_set_order(folio, 0); which is an abuse of folio_set_order as 0-order > > folio does not have any tail page to set order. > > I think you're missing the point of how folio_set_order() is used. > When splitting a large folio, we need to zero out the folio_nr_pages > in the tail, so it does have a tail page, and that tail page needs to > be zeroed. We even assert that there is a tail page: > > if (WARN_ON_ONCE(!folio_test_large(folio))) > return; > > Or maybe you need to explain yourself better. > > > folio->_folio_nr_pages is > > set to 0 for order 0 in folio_set_order. It is required because > > _folio_nr_pages overlapped with page->mapping and leaving it non zero > > caused "bad page" error while freeing gigantic hugepages. This was fixed in > > Commit ba9c1201beaa ("mm/hugetlb: clear compound_nr before freeing gigantic > > pages"). Also commit a01f43901cfb ("hugetlb: be sure to free demoted CMA > > pages to CMA") now explicitly clear page->mapping and hence we won't see > > the bad page error even if _folio_nr_pages remains unset. Also the order 0 > > folios are not supposed to call folio_set_order, So now we can get rid of > > folio_set_order(folio, 0) from hugetlb code path to clear the confusion. > > ... this is all very confusing. > > > The patch also moves _folio_set_head and folio_set_order calls in > > __prep_compound_gigantic_folio() such that we avoid clearing them in the > > error path. > > But don't we need those bits set while we operate on the folio to set it > up? It makes me nervous if we don't have those bits set because we can > end up with speculative references that point to a head page while that > page is not marked as a head page. It may not be a problem, but I want > to see some air-tight analysis of that. I am fairly certain we are 'safe'. Here is code before setting up the pointer to the head page. * In the case of demote, the ref count will be zero. */ if (!demote) { if (!page_ref_freeze(p, 1)) { pr_warn("HugeTLB page can not be used due to unexpected inflated ref count\n"); goto out_error; } } else { VM_BUG_ON_PAGE(page_count(p), p); } if (i != 0) set_compound_head(p, &folio->page); So, before setting the pointer to head page ref count will be zero. I 'think' it would actually be better to move the calls to _folio_set_head and folio_set_order in __prep_compound_gigantic_folio() as suggested here. Why? In the current code, the ref count on the 'head page' is still 1 (or more) while those calls are made. So, someone could take a speculative ref on the page BEFORE the tail pages are set up. TBH, I do not have much of an opinion about potential confusion surrounding folio_set_compound_order(folio, 0). IIUC, hugetlb gigantic page setup is the only place outside the page allocation code that sets up compound pages/large folios. So, it is going to be a bit 'special'. As mentioned, when this was originally discussed I suggested folio_clear_order(). I would be happy with either.
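[For context: the page_ref_freeze() in the loop quoted above is what
guards against speculative references; speculative lookups take their
reference through get_page_unless_zero()-style helpers, which refuse to
raise a refcount that has been frozen to zero. Roughly, simplified from
the kernel's page refcount helpers (treat the exact definitions as an
approximation):]

static inline int page_ref_freeze(struct page *page, int count)
{
        /* succeeds only if the refcount is exactly 'count', dropping it to 0 */
        return likely(atomic_cmpxchg(&page->_refcount, count, 0) == count);
}

static inline bool get_page_unless_zero(struct page *page)
{
        /* speculative getters will not resurrect a zero (frozen) refcount */
        return page_ref_add_unless(page, 1, 0);
}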
Hi, Mathew, I am not sure If I was clear about my intention behind the patch. Here, I attempt to answer it again. Please lemme know if you agree. Matthew Wilcox <willy@infradead.org> writes: > On Sat, Apr 15, 2023 at 01:18:32AM +0530, Tarun Sahu wrote: >> folio_set_order(folio, 0); which is an abuse of folio_set_order as 0-order >> folio does not have any tail page to set order. > > I think you're missing the point of how folio_set_order() is used. > When splitting a large folio, we need to zero out the folio_nr_pages > in the tail, so it does have a tail page, and that tail page needs to > be zeroed. We even assert that there is a tail page: > > if (WARN_ON_ONCE(!folio_test_large(folio))) > return; > > Or maybe you need to explain yourself better. > In the upstream, I don't see folio_set_order(folio, 0) being called in splitting path. IIUC, we had to zero out _folio_nr_pages during freeing gigantic folio as described by Commit ba9c1201beaa ("mm/hugetlb: clear compound_nr before freeing gigantic pages"). I agree that folio_set_order(folio, 0) is called with folio having tail pages. But I meant only that, in general, it is just confusing to have setting the folio order to 0. With this patch, I would like to draw attention to the point that there is no need to call folio_set_order(folio, 0) anymore to zero out _folio_order and _folio_nr_pages. In past, it was needed because page->mapping used to overlap with _folio_nr_pages and _folio_order. So if these fields were left uncleared during freeing gigantic hugepages, they were causing "BUG: bad page state" due to non-zero page->mapping. Now, After Commit a01f43901cfb ("hugetlb: be sure to free demoted CMA pages to CMA") page->mapping has explicitly been cleared out for tail pages. Also, _folio_order and _folio_nr_pages no longer overlaps with page->mapping. struct page { ... struct address_space * mapping; /* 24 8 */ ... } struct folio { ... union { struct { long unsigned int _flags_1; /* 64 8 */ long unsigned int _head_1; /* 72 8 */ unsigned char _folio_dtor; /* 80 1 */ unsigned char _folio_order; /* 81 1 */ /* XXX 2 bytes hole, try to pack */ atomic_t _entire_mapcount; /* 84 4 */ atomic_t _nr_pages_mapped; /* 88 4 */ atomic_t _pincount; /* 92 4 */ unsigned int _folio_nr_pages; /* 96 4 */ }; /* 64 40 */ struct page __page_1 __attribute__((__aligned__(8))); /* 64 64 */ } ... } So, folio_set_order(folio, 0) can be removed from freeing gigantic folio path (__destroy_compound_gigantic_folio). Another place, where folio_set_order(folio, 0) is called inside __prep_compound_gigantic_folio during error path. Here, folio_set_order(folio, 0) can also be removed if we move folio_set_order(folio, order) after for loop. Also, Mike confirmed that it is safe to move the call. ~Tarun >> folio->_folio_nr_pages is >> set to 0 for order 0 in folio_set_order. It is required because >> _folio_nr_pages overlapped with page->mapping and leaving it non zero >> caused "bad page" error while freeing gigantic hugepages. This was fixed in >> Commit ba9c1201beaa ("mm/hugetlb: clear compound_nr before freeing gigantic >> pages"). Also commit a01f43901cfb ("hugetlb: be sure to free demoted CMA >> pages to CMA") now explicitly clear page->mapping and hence we won't see >> the bad page error even if _folio_nr_pages remains unset. Also the order 0 >> folios are not supposed to call folio_set_order, So now we can get rid of >> folio_set_order(folio, 0) from hugetlb code path to clear the confusion. > > ... this is all very confusing. 
> >> The patch also moves _folio_set_head and folio_set_order calls in >> __prep_compound_gigantic_folio() such that we avoid clearing them in the >> error path. > > But don't we need those bits set while we operate on the folio to set it > up? It makes me nervous if we don't have those bits set because we can > end up with speculative references that point to a head page while that > page is not marked as a head page. It may not be a problem, but I want > to see some air-tight analysis of that. > >> Testing: I have run LTP tests, which all passes. and also I have written >> the test in LTP which tests the bug caused by compound_nr and page->mapping >> overlapping. >> >> https://lore.kernel.org/all/20230413090753.883953-1-tsahu@linux.ibm.com/ >> >> Running on older kernel ( < 5.10-rc7) with the above bug this fails while >> on newer kernel and, also with this patch it passes. >> >> Signed-off-by: Tarun Sahu <tsahu@linux.ibm.com> >> --- >> mm/hugetlb.c | 9 +++------ >> mm/internal.h | 8 ++------ >> 2 files changed, 5 insertions(+), 12 deletions(-) >> >> diff --git a/mm/hugetlb.c b/mm/hugetlb.c >> index f16b25b1a6b9..e2540269c1dc 100644 >> --- a/mm/hugetlb.c >> +++ b/mm/hugetlb.c >> @@ -1489,7 +1489,6 @@ static void __destroy_compound_gigantic_folio(struct folio *folio, >> set_page_refcounted(p); >> } >> >> - folio_set_order(folio, 0); >> __folio_clear_head(folio); >> } >> >> @@ -1951,9 +1950,6 @@ static bool __prep_compound_gigantic_folio(struct folio *folio, >> struct page *p; >> >> __folio_clear_reserved(folio); >> - __folio_set_head(folio); >> - /* we rely on prep_new_hugetlb_folio to set the destructor */ >> - folio_set_order(folio, order); >> for (i = 0; i < nr_pages; i++) { >> p = folio_page(folio, i); >> >> @@ -1999,6 +1995,9 @@ static bool __prep_compound_gigantic_folio(struct folio *folio, >> if (i != 0) >> set_compound_head(p, &folio->page); >> } >> + __folio_set_head(folio); >> + /* we rely on prep_new_hugetlb_folio to set the destructor */ >> + folio_set_order(folio, order); >> atomic_set(&folio->_entire_mapcount, -1); >> atomic_set(&folio->_nr_pages_mapped, 0); >> atomic_set(&folio->_pincount, 0); >> @@ -2017,8 +2016,6 @@ static bool __prep_compound_gigantic_folio(struct folio *folio, >> p = folio_page(folio, j); >> __ClearPageReserved(p); >> } >> - folio_set_order(folio, 0); >> - __folio_clear_head(folio); >> return false; >> } >> >> diff --git a/mm/internal.h b/mm/internal.h >> index 18cda26b8a92..0d96a3bc1d58 100644 >> --- a/mm/internal.h >> +++ b/mm/internal.h >> @@ -425,16 +425,12 @@ int split_free_page(struct page *free_page, >> */ >> static inline void folio_set_order(struct folio *folio, unsigned int order) >> { >> - if (WARN_ON_ONCE(!folio_test_large(folio))) >> + if (WARN_ON_ONCE(!order || !folio_test_large(folio))) >> return; >> >> folio->_folio_order = order; >> #ifdef CONFIG_64BIT >> - /* >> - * When hugetlb dissolves a folio, we need to clear the tail >> - * page, rather than setting nr_pages to 1. >> - */ >> - folio->_folio_nr_pages = order ? 1U << order : 0; >> + folio->_folio_nr_pages = 1U << order; >> #endif >> } >> >> -- >> 2.31.1 >>
Hi Mike, Mike Kravetz <mike.kravetz@oracle.com> writes: > On 04/14/23 21:12, Matthew Wilcox wrote: >> On Sat, Apr 15, 2023 at 01:18:32AM +0530, Tarun Sahu wrote: >> > folio_set_order(folio, 0); which is an abuse of folio_set_order as 0-order >> > folio does not have any tail page to set order. >> >> I think you're missing the point of how folio_set_order() is used. >> When splitting a large folio, we need to zero out the folio_nr_pages >> in the tail, so it does have a tail page, and that tail page needs to >> be zeroed. We even assert that there is a tail page: >> >> if (WARN_ON_ONCE(!folio_test_large(folio))) >> return; >> >> Or maybe you need to explain yourself better. >> >> > folio->_folio_nr_pages is >> > set to 0 for order 0 in folio_set_order. It is required because >> > _folio_nr_pages overlapped with page->mapping and leaving it non zero >> > caused "bad page" error while freeing gigantic hugepages. This was fixed in >> > Commit ba9c1201beaa ("mm/hugetlb: clear compound_nr before freeing gigantic >> > pages"). Also commit a01f43901cfb ("hugetlb: be sure to free demoted CMA >> > pages to CMA") now explicitly clear page->mapping and hence we won't see >> > the bad page error even if _folio_nr_pages remains unset. Also the order 0 >> > folios are not supposed to call folio_set_order, So now we can get rid of >> > folio_set_order(folio, 0) from hugetlb code path to clear the confusion. >> >> ... this is all very confusing. >> >> > The patch also moves _folio_set_head and folio_set_order calls in >> > __prep_compound_gigantic_folio() such that we avoid clearing them in the >> > error path. >> >> But don't we need those bits set while we operate on the folio to set it >> up? It makes me nervous if we don't have those bits set because we can >> end up with speculative references that point to a head page while that >> page is not marked as a head page. It may not be a problem, but I want >> to see some air-tight analysis of that. > > I am fairly certain we are 'safe'. Here is code before setting up the > pointer to the head page. > > * In the case of demote, the ref count will be zero. > */ > if (!demote) { > if (!page_ref_freeze(p, 1)) { > pr_warn("HugeTLB page can not be used due to unexpected inflated ref count\n"); > goto out_error; > } > } else { > VM_BUG_ON_PAGE(page_count(p), p); > } > if (i != 0) > set_compound_head(p, &folio->page); > > So, before setting the pointer to head page ref count will be zero. > > I 'think' it would actually be better to move the calls to _folio_set_head and > folio_set_order in __prep_compound_gigantic_folio() as suggested here. Why? > In the current code, the ref count on the 'head page' is still 1 (or more) > while those calls are made. So, someone could take a speculative ref on the > page BEFORE the tail pages are set up. > Thanks, for confirming the correctness of moving these calls. Also I didn't look at it this way while moving them. Thanks for the comment. I will update the commit msg and send the v2. ~Tarun > TBH, I do not have much of an opinion about potential confusion surrounding > folio_set_compound_order(folio, 0). IIUC, hugetlb gigantic page setup is the > only place outside the page allocation code that sets up compound pages/large > folios. So, it is going to be a bit 'special'. As mentioned, when this was > originally discussed I suggested folio_clear_order(). I would be happy with > either. > -- > Mike Kravetz
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index f16b25b1a6b9..e2540269c1dc 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1489,7 +1489,6 @@ static void __destroy_compound_gigantic_folio(struct folio *folio,
 		set_page_refcounted(p);
 	}
 
-	folio_set_order(folio, 0);
 	__folio_clear_head(folio);
 }
 
@@ -1951,9 +1950,6 @@ static bool __prep_compound_gigantic_folio(struct folio *folio,
 	struct page *p;
 
 	__folio_clear_reserved(folio);
-	__folio_set_head(folio);
-	/* we rely on prep_new_hugetlb_folio to set the destructor */
-	folio_set_order(folio, order);
 	for (i = 0; i < nr_pages; i++) {
 		p = folio_page(folio, i);
 
@@ -1999,6 +1995,9 @@ static bool __prep_compound_gigantic_folio(struct folio *folio,
 		if (i != 0)
 			set_compound_head(p, &folio->page);
 	}
+	__folio_set_head(folio);
+	/* we rely on prep_new_hugetlb_folio to set the destructor */
+	folio_set_order(folio, order);
 	atomic_set(&folio->_entire_mapcount, -1);
 	atomic_set(&folio->_nr_pages_mapped, 0);
 	atomic_set(&folio->_pincount, 0);
@@ -2017,8 +2016,6 @@ static bool __prep_compound_gigantic_folio(struct folio *folio,
 		p = folio_page(folio, j);
 		__ClearPageReserved(p);
 	}
-	folio_set_order(folio, 0);
-	__folio_clear_head(folio);
 	return false;
 }
 
diff --git a/mm/internal.h b/mm/internal.h
index 18cda26b8a92..0d96a3bc1d58 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -425,16 +425,12 @@ int split_free_page(struct page *free_page,
  */
 static inline void folio_set_order(struct folio *folio, unsigned int order)
 {
-	if (WARN_ON_ONCE(!folio_test_large(folio)))
+	if (WARN_ON_ONCE(!order || !folio_test_large(folio)))
 		return;
 
 	folio->_folio_order = order;
 #ifdef CONFIG_64BIT
-	/*
-	 * When hugetlb dissolves a folio, we need to clear the tail
-	 * page, rather than setting nr_pages to 1.
-	 */
-	folio->_folio_nr_pages = order ? 1U << order : 0;
+	folio->_folio_nr_pages = 1U << order;
 #endif
 }