Message ID | 20231010142801.3780917-1-naoya.horiguchi@linux.dev |
---|---|
Headers |
From: Naoya Horiguchi <naoya.horiguchi@linux.dev>
To: linux-mm@kvack.org
Cc: Andrew Morton <akpm@linux-foundation.org>, Matthew Wilcox <willy@infradead.org>, David Hildenbrand <david@redhat.com>, "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>, Mike Kravetz <mike.kravetz@oracle.com>, Miaohe Lin <linmiaohe@huawei.com>, Vlastimil Babka <vbabka@suse.cz>, Muchun Song <songmuchun@bytedance.com>, Naoya Horiguchi <naoya.horiguchi@nec.com>, linux-kernel@vger.kernel.org
Subject: [PATCH v1 0/5] mm, kpageflags: support folio and fix output for compound pages
Date: Tue, 10 Oct 2023 23:27:56 +0900
Message-Id: <20231010142801.3780917-1-naoya.horiguchi@linux.dev>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Precedence: bulk
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org |
Series |
mm, kpageflags: support folio and fix output for compound pages
|
|
Message
Naoya Horiguchi
Oct. 10, 2023, 2:27 p.m. UTC
Hi everyone,

This patchset addresses 2 issues in /proc/kpageflags.

  1. We can't easily tell folio from thp, because currently both pages are
     judged as thp, and
  2. we see some garbage data in records of compound tail pages because
     we use tail pages to store some internal data.

These issues require userspace programs to do additional work to understand
the page status, which makes situation more complicated.

This patchset tries to solve these by defining KPF_FOLIO for issue 1., and
by hiding part of page flag info on tail pages of compound pages for issue 2.

I think that technically some compound pages like thp/hugetlb/slab could be
considered as folio, but in this version KPF_FOLIO is set only on folios
in pagecache (so "folios in narrower meaning"). I'm not confident about
this choice, so if you have any idea about this, please let me know.

How we can see using tools/mm/page-types.c will change like below
(only focusing on compound pages).

Before patchset:

    // anonymous thp
    voffset    offset  len  flags
    ...
    700000000  156c00  1    ___U_l_____Ma_bH______t_____________f_d_____1
    700000001  156c01  1    L__U_______Ma___T_____t_____________f_______1
    700000002  156c02  1fe  ___________Ma___T_____t_____________f_______1

    // file thp
    700000000  15d600  1    __RUDl_____M__bH______t_____________f__I____1
    700000001  15d601  1    L__U_______M____T_____t_____________f_______1
    700000002  15d602  1fe  ___________M____T_____t_____________f_______1

    // large folio
    700000000  154f84  1    __RU_l_____M___H______t________P____f_____F_1
    700000001  154f85  1    ________W__M____T_____t_____________f_____F_1
    700000002  154f86  2    ___________M____T_____t_____________f_____F_1
    700000004  14d0a4  1    __RU_l_____M___H______t________P____f_____F_1
    700000005  14d0a5  1    ________W__M____T_____t_____________f_____F_1
    700000006  14d0a6  2    ___________M____T_____t_____________f_____F_1
    ...

    // free hugetlb (HVO disabled)
    offset  len  flags
    ...
    106a00  1    _______________H_G___________________________
    106a01  1    L__U__A_________TG___________________________
    106a02  1fe  ________________TG___________________________

    // anonymous hugetlb (HVO disabled)
    700000000  157200  1    ___U_______Ma__H_G__________________f_d_____1
    700000001  157201  1    L__U__A____Ma___TG__________________f_______1
    700000002  157202  1fe  ___________Ma___TG__________________f_______1

    // free hugetlb (HVO enabled)
    12a600  1    _______________H_G___________________________
    12a601  1    L__U__A_________TG___________________________
    12a602  3f   ________________TG___________________________
    12a641  1    L__U__A_________TG___________________________
    12a642  3f   ________________TG___________________________
    ...

    // anonymous hugetlb (HVO enabled)
    700000000  15e600  1    ___U_______Ma__H_G__________________f_d_____1
    700000001  15e601  1    L__U__A____Ma___TG__________________f_______1
    700000002  15e602  3e   ___________Ma___TG__________________f_______1
    700000040  15e640  1    ___U_______Ma___TG__________________f_d_____1
    700000041  15e641  1    L__U__A____Ma___TG__________________f_______1
    700000042  15e642  3e   ___________Ma___TG__________________f_______1
    ...

    // slab
    flags               page-count  MB  symbolic-flags                                 long-symbolic-flags
    0x0000000000000080  5304        20  _______S_____________________________________  slab
    0x0000000000008080  1488        5   _______S_______H_____________________________  slab,compound_head
    0x0000000000010081  365         1   L______S________T____________________________  locked,slab,compound_tail
    0x0000000000010080  4142        16  _______S________T____________________________  slab,compound_tail
    0x0000000000010180  649         2   _______SW_______T____________________________  slab,writeback,compound_tail
    0x0000000000010181  474         1   L______SW_______T____________________________  locked,slab,writeback,compound_tail
    0x0000000000201080  192         0   _______S____a________x_______________________  slab,anonymous,ksm
    0x0000000000001080  427         1   _______S____a________________________________  slab,anonymous
    0x0000000000409080  237         0   _______S____a__H______t______________________  slab,anonymous,compound_head,thp
    0x0000000000411081  78          0   L______S____a___T_____t______________________  locked,slab,anonymous,compound_tail,thp
    0x0000000000609080  77          0   _______S____a__H_____xt______________________  slab,anonymous,compound_head,ksm,thp
    0x0000000000611081  32          0   L______S____a___T____xt______________________  locked,slab,anonymous,compound_tail,ksm,thp
    0x0000000000411080  698         2   _______S____a___T_____t______________________  slab,anonymous,compound_tail,thp
    0x0000000000611080  142         0   _______S____a___T____xt______________________  slab,anonymous,compound_tail,ksm,thp
    0x0000000000611180  32          0   _______SW___a___T____xt______________________  slab,writeback,anonymous,compound_tail,ksm,thp
    0x0000000000411181  95          0   L______SW___a___T_____t______________________  locked,slab,writeback,anonymous,compound_tail,thp
    0x0000000000411180  64          0   _______SW___a___T_____t______________________  slab,writeback,anonymous,compound_tail,thp
    0x0000000000611181  13          0   L______SW___a___T____xt______________________  locked,slab,writeback,anonymous,compound_tail,ksm,thp

After patchset:

    // anonymous thp
    700000000  117000  1    ___U_l_____Ma_bH______t_____________f_d_____1
    700000001  117001  1ff  ________________T_____t_____________f_______1

    // file thp
    700000000  118400  1    __RUDl_____M__bH______t_____________f__I____1
    700000001  118401  1ff  ________________T_____t_____________f_______1

    // large folio
    700000000  148da4  1    __RU_l_____M___H___________f___P____f_____F_1
    700000001  148da5  3    ________________T__________f________f_____F_1
    700000004  148da8  1    __RU_l_____M___H___________f___P____f_____F_1
    700000005  148da9  3    ________________T__________f________f_____F_1

    // free hugetlb (HVO disabled)
    116000  1    _______________H_G___________________________
    116001  1ff  ________________TG___________________________

    // anonymous hugetlb (HVO disabled)
    700000000  116000  1    ___U_______Ma__H_G__________________f_d_____1
    700000001  116001  1ff  ________________TG__________________f_______1

    // free hugetlb (HVO enabled)
    116000  1    _______________H_G___________________________
    116001  1ff  ________________TG___________________________

    // anonymous hugetlb (HVO enabled)
    700000000  116000  1    ___U_______Ma__H_G__________________f_d_____1
    700000001  116001  1ff  ________________TG__________________f_______1

    // slab
    0x0000000000000080  5659  22  _______S_____________________________________  slab
    0x0000000000008080  1644  6   _______S_______H_____________________________  slab,compound_head
    0x0000000000010080  6196  24  _______S________T____________________________  slab,compound_tail

Thanks,
Naoya Horiguchi
---
Summary:

Naoya Horiguchi (5):
      include/uapi/linux/kernel-page-flags.h: define KPF_FOLIO
      mm: kpageflags: distinguish thp and folio
      mm, kpageflags: separate code path for hugetlb pages
      mm, kpageflags: fix invalid output for PageSlab
      tools/mm/page-types.c: hide compound pages in non-raw mode

 fs/proc/page.c                         | 90 +++++++++++++++++++---------------
 include/uapi/linux/kernel-page-flags.h |  1 +
 tools/mm/page-types.c                  |  3 +-
 3 files changed, 53 insertions(+), 41 deletions(-)
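
For readers who want to check the hex values in the slab listings against their long symbolic names, here is a minimal sketch in C. It is not code from the patchset or from tools/mm/page-types.c. The helper name kpf_describe() is hypothetical, and the table covers only the bits that appear in the slab listings; the bit numbers are the ones defined in include/uapi/linux/kernel-page-flags.h.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Subset of the bit numbers in include/uapi/linux/kernel-page-flags.h. */
struct kpf_name {
    unsigned int bit;
    const char *name;
};

static const struct kpf_name kpf_names[] = {
    { 0,  "locked" },
    { 7,  "slab" },
    { 8,  "writeback" },
    { 12, "anonymous" },
    { 15, "compound_head" },
    { 16, "compound_tail" },
    { 21, "ksm" },
    { 22, "thp" },
};

/*
 * Hypothetical helper: write the names of the set bits into buf,
 * lowest bit first and comma separated. Names that no longer fit
 * are dropped. Returns buf.
 */
static char *kpf_describe(uint64_t flags, char *buf, size_t len)
{
    size_t used = 0;
    size_t i;

    assert(buf != NULL && len > 0);
    buf[0] = '\0';
    for (i = 0; i < sizeof(kpf_names) / sizeof(kpf_names[0]); i++) {
        size_t nlen, need;

        if (!(flags & (UINT64_C(1) << kpf_names[i].bit)))
            continue;
        nlen = strlen(kpf_names[i].name);
        need = nlen + (used ? 1 : 0);   /* name plus separator */
        if (need >= len - used)
            break;                      /* no room left for name and NUL */
        if (used)
            buf[used++] = ',';
        memcpy(buf + used, kpf_names[i].name, nlen + 1);
        used += nlen;
    }
    return buf;
}
```

Feeding it 0x0000000000008080 yields "slab,compound_head", the same pairing the slab listing shows for that value.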
Comments
On 10.10.23 16:27, Naoya Horiguchi wrote:
> Hi everyone,
>
> This patchset addresses 2 issues in /proc/kpageflags.
>
>   1. We can't easily tell folio from thp, because currently both pages are
>      judged as thp, and
>   2. we see some garbage data in records of compound tail pages because
>      we use tail pages to store some internal data.
>
> These issues require userspace programs to do additional work to understand
> the page status, which makes situation more complicated.
>
> This patchset tries to solve these by defining KPF_FOLIO for issue 1., and
> by hiding part of page flag info on tail pages of compound pages for issue 2.
>
> I think that technically some compound pages like thp/hugetlb/slab could be
> considered as folio, but in this version KPF_FOLIO is set only on folios

At least thp+hugetlb are most certainly folios. Regarding slab, I suspect we
no longer call them folios (cannot be mapped to user space). But I'm not sure
about the type hierarchy.

> in pagecache (so "folios in narrower meaning"). I'm not confident about
> this choice, so if you have any idea about this, please let me know.

It does sound inconsistent. What exactly do you want to tell user space with
the new flag?
On Thu, Oct 12, 2023 at 10:33:04AM +0200, David Hildenbrand wrote:
> On 10.10.23 16:27, Naoya Horiguchi wrote:
> > Hi everyone,
> >
> > This patchset addresses 2 issues in /proc/kpageflags.
> >
> >   1. We can't easily tell folio from thp, because currently both pages are
> >      judged as thp, and
> >   2. we see some garbage data in records of compound tail pages because
> >      we use tail pages to store some internal data.
> >
> > These issues require userspace programs to do additional work to understand
> > the page status, which makes situation more complicated.
> >
> > This patchset tries to solve these by defining KPF_FOLIO for issue 1., and
> > by hiding part of page flag info on tail pages of compound pages for issue 2.
> >
> > I think that technically some compound pages like thp/hugetlb/slab could be
> > considered as folio, but in this version KPF_FOLIO is set only on folios
>
> At least thp+hugetlb are most certainly folios. Regarding slab, I suspect we
> no longer call them folios (cannot be mapped to user space). But I'm not sure
> about the type hierarchy.

I'm not sure about the exact definition of "folio", and I think it's better
to make KPF_FOLIO set based on the definition.
"being mapped to userspace" can be one possible criteria for the definition.
But reading source code, folio_slab() and slab_folio() convert between
struct slab and struct folio, so I feel that someone might think a slab is
a kind of folio.

>
> > in pagecache (so "folios in narrower meaning"). I'm not confident about
> > this choice, so if you have any idea about this, please let me know.
>
> It does sound inconsistent. What exactly do you want to tell user space with
> the new flag?

The current most problematic behavior is to report folio as thp (order-2
pagecache page is definitely a folio but not a thp), and this is what the
new flag is intended to tell.

Thanks,
Naoya Horiguchi
On 12.10.23 17:02, Naoya Horiguchi wrote:
> On Thu, Oct 12, 2023 at 10:33:04AM +0200, David Hildenbrand wrote:
>> On 10.10.23 16:27, Naoya Horiguchi wrote:
>>> Hi everyone,
>>>
>>> This patchset addresses 2 issues in /proc/kpageflags.
>>>
>>>   1. We can't easily tell folio from thp, because currently both pages are
>>>      judged as thp, and
>>>   2. we see some garbage data in records of compound tail pages because
>>>      we use tail pages to store some internal data.
>>>
>>> These issues require userspace programs to do additional work to understand
>>> the page status, which makes situation more complicated.
>>>
>>> This patchset tries to solve these by defining KPF_FOLIO for issue 1., and
>>> by hiding part of page flag info on tail pages of compound pages for issue 2.
>>>
>>> I think that technically some compound pages like thp/hugetlb/slab could be
>>> considered as folio, but in this version KPF_FOLIO is set only on folios
>>
>> At least thp+hugetlb are most certainly folios. Regarding slab, I suspect we
>> no longer call them folios (cannot be mapped to user space). But I'm not sure
>> about the type hierarchy.
>
> I'm not sure about the exact definition of "folio", and I think it's better
> to make KPF_FOLIO set based on the definition.

Me neither. But in any case a THP *is* a folio. So you'd have to set that
flag in any case.

And any order-0 page (i.e., anon, pagecache) is also a folio. What you seem
to imply with folio is "large folio". So KPF_FOLIO is really wrong as far as
I can tell.

> "being mapped to userspace" can be one possible criteria for the definition.
> But reading source code, folio_slab() and slab_folio() convert between
> struct slab and struct folio, so I feel that someone might think a slab is
> a kind of folio.

I keep forgetting if "folio" is just the generic term for any order-0 or
compound page, or only for some of them. I usually live in the "anon" world,
so I don't get reminded that often :)

>>> in pagecache (so "folios in narrower meaning"). I'm not confident about
>>> this choice, so if you have any idea about this, please let me know.
>>
>> It does sound inconsistent. What exactly do you want to tell user space with
>> the new flag?
>
> The current most problematic behavior is to report folio as thp (order-2
> pagecache page is definitely a folio but not a thp), and this is what the
> new flag is intended to tell.

We are currently considering calling these sub-PMD sized THPs "small-sized
THP". [1] Arguably, we're starting with the anon part where we won't get
around exposing them to the user in sysfs.

So I wouldn't immediately say that these things are not THPs. They are not
PMD-sized THP. A slab/hugetlb is certainly not a thp but a folio. Whereby
slabs can also be order-0 folios, but hugetlb can't.

Looking at other interfaces, we do expose:

include/uapi/linux/kernel-page-flags.h:#define KPF_COMPOUND_HEAD 15
include/uapi/linux/kernel-page-flags.h:#define KPF_COMPOUND_TAIL 16

So maybe we should just continue talking about compound pages or do we have
to use both terms here in this interface?

[1] https://lkml.kernel.org/r/20230929114421.3761121-1-ryan.roberts@arm.com
On Thu, Oct 12, 2023 at 05:30:34PM +0200, David Hildenbrand wrote:
> On 12.10.23 17:02, Naoya Horiguchi wrote:
> > On Thu, Oct 12, 2023 at 10:33:04AM +0200, David Hildenbrand wrote:
> > > On 10.10.23 16:27, Naoya Horiguchi wrote:
> > > > Hi everyone,
> > > >
> > > > This patchset addresses 2 issues in /proc/kpageflags.
> > > >
> > > >   1. We can't easily tell folio from thp, because currently both pages are
> > > >      judged as thp, and
> > > >   2. we see some garbage data in records of compound tail pages because
> > > >      we use tail pages to store some internal data.
> > > >
> > > > These issues require userspace programs to do additional work to understand
> > > > the page status, which makes situation more complicated.
> > > >
> > > > This patchset tries to solve these by defining KPF_FOLIO for issue 1., and
> > > > by hiding part of page flag info on tail pages of compound pages for issue 2.
> > > >
> > > > I think that technically some compound pages like thp/hugetlb/slab could be
> > > > considered as folio, but in this version KPF_FOLIO is set only on folios
> > >
> > > At least thp+hugetlb are most certainly folios. Regarding slab, I suspect we
> > > no longer call them folios (cannot be mapped to user space). But I'm not sure
> > > about the type hierarchy.
> >
> > I'm not sure about the exact definition of "folio", and I think it's better
> > to make KPF_FOLIO set based on the definition.
>
> Me neither. But in any case a THP *is* a folio. So you'd have to set that
> flag in any case.

OK.

>
> And any order-0 page (i.e., anon, pagecache) is also a folio. What you seem
> to imply with folio is "large folio". So KPF_FOLIO is really wrong as far as
> I can tell.

Ah, I meant "large folio" for the new flag, so it might have been better to
name it KPF_LARGE_FOLIO.

>
> > "being mapped to userspace" can be one possible criteria for the definition.
> > But reading source code, folio_slab() and slab_folio() convert between
> > struct slab and struct folio, so I feel that someone might think a slab is
> > a kind of folio.
>
> I keep forgetting if "folio" is just the generic term for any order-0 or
> compound page, or only for some of them. I usually live in the "anon" world,
> so I don't get reminded that often :)

I didn't notice that an order-0 page is also a folio.

>
>
> > > > in pagecache (so "folios in narrower meaning"). I'm not confident about
> > > > this choice, so if you have any idea about this, please let me know.
> > >
> > > It does sound inconsistent. What exactly do you want to tell user space with
> > > the new flag?
> >
> > The current most problematic behavior is to report folio as thp (order-2
> > pagecache page is definitely a folio but not a thp), and this is what the
> > new flag is intended to tell.
>
> We are currently considering calling these sub-PMD sized THPs "small-sized
> THP". [1] Arguably, we're starting with the anon part where we won't get
> around exposing them to the user in sysfs.
>
> So I wouldn't immediately say that these things are not THPs. They are not
> PMD-sized THP. A slab/hugetlb is certainly not a thp but a folio. Whereby
> slabs can also be order-0 folios, but hugetlb can't.
>
>
> Looking at other interfaces, we do expose:
>
> include/uapi/linux/kernel-page-flags.h:#define KPF_COMPOUND_HEAD 15
> include/uapi/linux/kernel-page-flags.h:#define KPF_COMPOUND_TAIL 16
>
> So maybe we should just continue talking about compound pages or do we have
> to use both terms here in this interface?

Extending the concept of thp to arbitrary size of thp sounds good to me.
If patchset [1] will be merged, then setting KPF_THP on large folios is totally
fine and one of my problem in this patchset will be automatically resolved.
So I'm thinking of not adding new flag and just focusing on garbage data issue.

Thank you very much for sharing ideas.

Naoya Horiguchi

>
> [1] https://lkml.kernel.org/r/20230929114421.3761121-1-ryan.roberts@arm.com
>
> --
> Cheers,
>
> David / dhildenb
>
On 13.10.23 02:54, Naoya Horiguchi wrote:
> On Thu, Oct 12, 2023 at 05:30:34PM +0200, David Hildenbrand wrote:
>> On 12.10.23 17:02, Naoya Horiguchi wrote:
>>> On Thu, Oct 12, 2023 at 10:33:04AM +0200, David Hildenbrand wrote:
>>>> On 10.10.23 16:27, Naoya Horiguchi wrote:
>>>>> Hi everyone,
>>>>>
>>>>> This patchset addresses 2 issues in /proc/kpageflags.
>>>>>
>>>>>   1. We can't easily tell folio from thp, because currently both pages are
>>>>>      judged as thp, and
>>>>>   2. we see some garbage data in records of compound tail pages because
>>>>>      we use tail pages to store some internal data.
>>>>>
>>>>> These issues require userspace programs to do additional work to understand
>>>>> the page status, which makes situation more complicated.
>>>>>
>>>>> This patchset tries to solve these by defining KPF_FOLIO for issue 1., and
>>>>> by hiding part of page flag info on tail pages of compound pages for issue 2.
>>>>>
>>>>> I think that technically some compound pages like thp/hugetlb/slab could be
>>>>> considered as folio, but in this version KPF_FOLIO is set only on folios
>>>>
>>>> At least thp+hugetlb are most certainly folios. Regarding slab, I suspect we
>>>> no longer call them folios (cannot be mapped to user space). But I'm not sure
>>>> about the type hierarchy.
>>>
>>> I'm not sure about the exact definition of "folio", and I think it's better
>>> to make KPF_FOLIO set based on the definition.
>>
>> Me neither. But in any case a THP *is* a folio. So you'd have to set that
>> flag in any case.
>
> OK.
>
>>
>> And any order-0 page (i.e., anon, pagecache) is also a folio. What you seem
>> to imply with folio is "large folio". So KPF_FOLIO is really wrong as far as
>> I can tell.
>
> Ah, I meant "large folio" for the new flag, so it might have been better to
> name it KPF_LARGE_FOLIO.
>
>>
>>> "being mapped to userspace" can be one possible criteria for the definition.
>>> But reading source code, folio_slab() and slab_folio() convert between
>>> struct slab and struct folio, so I feel that someone might think a slab is
>>> a kind of folio.
>>
>> I keep forgetting if "folio" is just the generic term for any order-0 or
>> compound page, or only for some of them. I usually live in the "anon" world,
>> so I don't get reminded that often :)
>
> I didn't notice that an order-0 page is also a folio.
>
>>
>>
>>>>> in pagecache (so "folios in narrower meaning"). I'm not confident about
>>>>> this choice, so if you have any idea about this, please let me know.
>>>>
>>>> It does sound inconsistent. What exactly do you want to tell user space with
>>>> the new flag?
>>>
>>> The current most problematic behavior is to report folio as thp (order-2
>>> pagecache page is definitely a folio but not a thp), and this is what the
>>> new flag is intended to tell.
>>
>> We are currently considering calling these sub-PMD sized THPs "small-sized
>> THP". [1] Arguably, we're starting with the anon part where we won't get
>> around exposing them to the user in sysfs.
>>
>> So I wouldn't immediately say that these things are not THPs. They are not
>> PMD-sized THP. A slab/hugetlb is certainly not a thp but a folio. Whereby
>> slabs can also be order-0 folios, but hugetlb can't.
>>
>>
>> Looking at other interfaces, we do expose:
>>
>> include/uapi/linux/kernel-page-flags.h:#define KPF_COMPOUND_HEAD 15
>> include/uapi/linux/kernel-page-flags.h:#define KPF_COMPOUND_TAIL 16
>>
>> So maybe we should just continue talking about compound pages or do we have
>> to use both terms here in this interface?
>
> Extending the concept of thp to arbitrary size of thp sounds good to me.
> If patchset [1] will be merged, then setting KPF_THP on large folios is totally
> fine and one of my problem in this patchset will be automatically resolved.

CCing Ryan.

> So I'm thinking of not adding new flag and just focusing on garbage data issue.

That sounds minimal and reasonable! Flags/values that logically belong to
the head (although are stored in the tail) should probably be exposed along
with the head. Flags that apply to the actual tail pages should stay with
the tail pages.

>
> Thank you very much for sharing ideas.

Thank you!
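
The "garbage data" part can be illustrated with a small userspace filter. This is only a sketch inferred from the before/after page-types listings in the cover letter, not the logic of the actual patches to fs/proc/page.c. The function name kpf_hide_tail_garbage() and the two keep-masks are assumptions: on slab pages only slab/compound_head/compound_tail survive in the "after" listing, and on other tail pages only compound_tail, huge and thp survive. The new 'f' bit shown for large folios is not modelled.

```c
#include <assert.h>
#include <stdint.h>

/* Bit numbers from include/uapi/linux/kernel-page-flags.h. */
#define KPF_SLAB            7
#define KPF_COMPOUND_HEAD   15
#define KPF_COMPOUND_TAIL   16
#define KPF_HUGE            17
#define KPF_THP             22

#define KPF_BIT(nr)         (UINT64_C(1) << (nr))

/* Assumed keep-masks, read off the "After patchset" listings. */
#define KPF_SLAB_KEEP  (KPF_BIT(KPF_SLAB) | KPF_BIT(KPF_COMPOUND_HEAD) | \
                        KPF_BIT(KPF_COMPOUND_TAIL))
#define KPF_TAIL_KEEP  (KPF_BIT(KPF_COMPOUND_TAIL) | KPF_BIT(KPF_HUGE) | \
                        KPF_BIT(KPF_THP))

/*
 * Hypothetical filter: drop flag bits that only reflect internal data
 * stored in slab pages or in compound tail pages. Head pages and
 * order-0 pages that are not slab are returned unchanged.
 */
static uint64_t kpf_hide_tail_garbage(uint64_t flags)
{
    /* A record should never claim to be both head and tail. */
    assert((flags & (KPF_BIT(KPF_COMPOUND_HEAD) | KPF_BIT(KPF_COMPOUND_TAIL)))
           != (KPF_BIT(KPF_COMPOUND_HEAD) | KPF_BIT(KPF_COMPOUND_TAIL)));

    if (flags & KPF_BIT(KPF_SLAB))
        return flags & KPF_SLAB_KEEP;
    if (flags & KPF_BIT(KPF_COMPOUND_TAIL))
        return flags & KPF_TAIL_KEEP;
    return flags;
}
```

With these masks, the "before" slab value 0x0000000000010181 (locked,slab,writeback,compound_tail) collapses to 0x0000000000010080 (slab,compound_tail), which is one of the three slab rows left in the "after" listing.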
On Thu, Oct 12, 2023 at 05:30:34PM +0200, David Hildenbrand wrote:
> On 12.10.23 17:02, Naoya Horiguchi wrote:
> > On Thu, Oct 12, 2023 at 10:33:04AM +0200, David Hildenbrand wrote:
> > > On 10.10.23 16:27, Naoya Horiguchi wrote:
> > > > Hi everyone,
> > > >
> > > > This patchset addresses 2 issues in /proc/kpageflags.
> > > >
> > > >   1. We can't easily tell folio from thp, because currently both pages are
> > > >      judged as thp, and
> > > >   2. we see some garbage data in records of compound tail pages because
> > > >      we use tail pages to store some internal data.
> > > >
> > > > These issues require userspace programs to do additional work to understand
> > > > the page status, which makes situation more complicated.
> > > >
> > > > This patchset tries to solve these by defining KPF_FOLIO for issue 1., and
> > > > by hiding part of page flag info on tail pages of compound pages for issue 2.
> > > >
> > > > I think that technically some compound pages like thp/hugetlb/slab could be
> > > > considered as folio, but in this version KPF_FOLIO is set only on folios
> > >
> > > At least thp+hugetlb are most certainly folios. Regarding slab, I suspect we
> > > no longer call them folios (cannot be mapped to user space). But I'm not sure
> > > about the type hierarchy.
> >
> > I'm not sure about the exact definition of "folio", and I think it's better
> > to make KPF_FOLIO set based on the definition.
>
> Me neither. But in any case a THP *is* a folio. So you'd have to set that
> flag in any case.
>
> And any order-0 page (i.e., anon, pagecache) is also a folio. What you seem
> to imply with folio is "large folio". So KPF_FOLIO is really wrong as far as
> I can tell.

Our type hierarchy is degenerate ... in both the neutral and negative
sense of the word.

A folio is simply not-a-tail-page. So, as you said, all head pages and
all order-0 pages are folios.

But we're still struggling against the legacy of our "struct page is
everything" mistake, and trying to fix that too. The general term I've
chosen for this is "memdesc", but we aren't very far down the route of
disentangling the various types from either page or folio. I'd imagined
that we'd convert everything to folio, then get into splitting them out,
but at least for ptdesc and slab we've gone for the direct conversion
approach.

At some point we probably want to disentangle anon folios from file
folios, but that's a fair ways down the list, after turning folios into
a separate allocation from struct page. At least on my list ... if
someone wants to do that as a matter of urgency, I'm sure they can be
accommodated. It's not an easy task, for sure.

Our needs are better expressed as (in Java terms) Interfaces rather than
subclasses. Or Traits/Generics if you've started learning Rust.

We definitely have the concept of "mappable to userspace" which applies
to anon, file, netmem, some device driver allocations, some vmalloc
allocations, but not slab, page tables, or free memory. Those memdescs
need refcount, mapcount, dirty flag, lock flag, maybe mapping?

Then we have "managed by the LRU" which applies to anon & file only.
Those memdescs need refcount, lru, and a pile of flags.

There's definitely scope for reordering and shrinking the various
memdescs. Once they're fully separated from struct page.

What we _call_ them is a separate struggle. Try to imagine how
shrink_folio_list() works if filemem & anonmem have different types ...

> > > It does sound inconsistent. What exactly do you want to tell user space with
> > > the new flag?
> >
> > The current most problematic behavior is to report folio as thp (order-2
> > pagecache page is definitely a folio but not a thp), and this is what the
> > new flag is intended to tell.
>
> We are currently considering calling these sub-PMD sized THPs "small-sized
> THP". [1] Arguably, we're starting with the anon part where we won't get
> around exposing them to the user in sysfs.
>
> So I wouldn't immediately say that these things are not THPs. They are not
> PMD-sized THP. A slab/hugetlb is certainly not a thp but a folio. Whereby
> slabs can also be order-0 folios, but hugetlb can't.

I think this is a mistake. Users expect THPs to be PMD sized. We already
have the term "large folio" in use for file-backed memory; why do we
need to invent a new term for anon large folios?

> Looking at other interfaces, we do expose:
>
> include/uapi/linux/kernel-page-flags.h:#define KPF_COMPOUND_HEAD 15
> include/uapi/linux/kernel-page-flags.h:#define KPF_COMPOUND_TAIL 16
>
> So maybe we should just continue talking about compound pages or do we have
> to use both terms here in this interface?

I don't know how easy it's going to be to distinguish between a head and
tail page in the Glorious Future once pages and folios are separated.
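
Expressed against the existing /proc/kpageflags bits, the "not-a-tail-page" rule could look like the sketch below. The mapping from the kernel-internal notion of a folio to KPF_* bits is an assumption made for illustration; the helper names are hypothetical, and, as the thread notes, slab pages and free memory would also pass this crude test even though their place in the type hierarchy is unsettled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bit numbers from include/uapi/linux/kernel-page-flags.h. */
#define KPF_COMPOUND_HEAD   15
#define KPF_COMPOUND_TAIL   16
#define KPF_NOPAGE          20

#define KPF_BIT(nr)         (UINT64_C(1) << (nr))

/* "A folio is simply not-a-tail-page": head pages and order-0 pages. */
static int kpf_is_folio(uint64_t flags)
{
    if (flags & KPF_BIT(KPF_NOPAGE))
        return 0;       /* no page frame behind this record */
    return !(flags & KPF_BIT(KPF_COMPOUND_TAIL));
}

/* A large folio is a folio whose record is a compound head. */
static int kpf_is_large_folio(uint64_t flags)
{
    return kpf_is_folio(flags) && (flags & KPF_BIT(KPF_COMPOUND_HEAD)) != 0;
}

/* Count folios in an array of kpageflags records (one 64-bit word per PFN). */
static size_t kpf_count_folios(const uint64_t *records, size_t n)
{
    size_t i, count = 0;

    assert(records != NULL || n == 0);
    for (i = 0; i < n; i++)
        if (kpf_is_folio(records[i]))
            count++;
    return count;
}
```

Under this reading an order-2 pagecache folio contributes one head record and three tail records, so it counts once.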
>>>> It does sound inconsistent. What exactly do you want to tell user space with >>>> the new flag? >>> >>> The current most problematic behavior is to report folio as thp (order-2 >>> pagecache page is definitely a folio but not a thp), and this is what the >>> new flag is intended to tell. >> >> We are currently considering calling these sub-PMD sized THPs "small-sized >> THP". [1] Arguably, we're starting with the anon part where we won't get >> around exposing them to the user in sysfs. >> >> So I wouldn't immediately say that these things are not THPs. They are not >> PMD-sized THP. A slab/hugetlb is certainly not a thp but a folio. Whereby >> slabs can also be order-0 folios, but hugetlb can't. > > I think this is a mistake. Users expect THPs to be PMD sized. We already > have the term "large folio" in use for file-backed memory; why do we > need to invent a new term for anon large folios? I changed my opinion two times, but I stabilized at "these are just huge pages of different size" when it comes to user-visible features. Handling/calling them folios internally -- especially to abstract the page vs. compound page and how we manage/handle the metadata -- is a reasonable thing to do, because that's what we decided to pass around. For future reference, here is a writeup about my findings and the reason for my opinion: (1) OS-independent concept Ignoring how the OS manages metadata (e.g., "struct page", "struct folio", compound head/tail, memdesc, ...), the common term to describe a "the smallest fixed-length contiguous block of physical memory into which memory pages are mapped by the operating system.["[1] is a page frame -- people usually simplify by dropping the "frame" part, so do I. 

Larger pages (which we call "huge pages", FreeBSD "superpages", Windows "large
pages") can come in different sizes and were traditionally based on architecture
support, whereby architectures can support multiple ones [1]; I think what we
see is that the OS might use intermediate sizes to manage memory more
efficiently, abstracting/evolving that concept from the actual hardware page
table mapping granularity.

But the foundation is that we are dealing with "blocks of physical memory" in a
unit that is larger than the smallest page sizes. Larger pages.

[the comment about SGI IRIX on [1] is an interesting read; so are "scattered
superpages"[3]]

Users learned the difference between a "page" and a "huge page". I'm confident
that they can learn the difference between a "traditional huge page" and a
"small-sized huge page", just like they did with hugetlb (below).

We just have to be careful with memory statistics and to default to the
traditional huge pages for now. Slowly, the term "THP" will become more generic.
Apart from that, I fail to see the big source of confusion.

Note: FreeBSD currently similarly calls these things on arm64 "medium-sized
superpages", and did not invent new terms for that so far [2].

(2) hugetlb

Traditional huge pages started out to be PMD-sized. Before 2008, we only
supported a single huge page size. Ever since, we added support for sizes larger
(gigantic) and smaller than that (cont-pte / cont-pmd).

So (a) users did not panic because we also supported huge pages that were not
PMD-sized; (b) we managed to integrate it into the existing environment,
defaulting to the old PMD-sized huge pages towards the user but still providing
configuration knobs and (c) it is natural today to have multiple huge page sizes
supported in hugetlb.

Nowadays, when somebody says that they are using hugetlb huge pages, the first
question frequently is "which huge page size?". The same will happen with
transparent huge pages I believe.

(3) THP preparation for multiple sizes

With
	/sys/kernel/mm/transparent_hugepage/hpage_pmd_size
added in 2016, we already provided a way for users to query the PMD size for
THP, implying that there might be multiple sizes in the future.

Therefore, in commit 49920d28781d, Hugh already envisioned " some transparent
support for pud and pgd pages" and ended up calling it "_pmd_size". Turns out,
we want smaller THPs first, not larger ones.

(4) Metadata management

How the OS manages metadata for its memory -- and how it calls the involved
datastructures -- is IMHO an implementation detail (an important one regarding
performance, robustness and metadata overhead as we learned, though ;) ).

We were able to introduce folios without user-visible changes. We should be able
to implement memdesc (or memory type hierarchies) without user-visible changes
-- except for some interfaces that provide access to bare "struct page"
information (classifies as debugging interfaces IMHO).

Last but not least, we ended up consistently calling these "larger than a page"
things that we map into user space "(transparent) huge page" towards the user in
toggles, stats and documentation. Fortunately we didn't use the term "compound
page" back then; it would have been a mistake.

Regarding the pagecache, we managed to not expose any toggles towards the user,
because memory waste can be better controlled. So the term "folio" does not pop
up as a toggle in /sys and /proc.

t14s: ~ $ find /sys -name "*folio*" 2> /dev/null
t14s: ~ $ find /proc -name "*folio*" 2> /dev/null

Once we want to remove the (sub)page mapcount, we'll likely have to remove
_nr_pages_mapped. To make some workloads that are sensitive to memory
consumption [4] play along when not accounting only the actually mapped parts,
we might have to introduce other ways to control that, when
"/sys/kernel/debug/fault_around_bytes" no longer does the trick. I'm hoping we
can still find ways to avoid exposing any toggles for that; we'll see.

[1] https://en.wikipedia.org/wiki/Page_(computer_memory)
[2] https://www.freebsd.org/status/report-2022-04-2022-06/superpages/
[3] https://ieeexplore.ieee.org/document/6657040/similar#similar
[4] https://www.suse.com/support/kb/doc/?id=000019017

>
>> Looking at other interfaces, we do expose:
>>
>> include/uapi/linux/kernel-page-flags.h:#define KPF_COMPOUND_HEAD 15
>> include/uapi/linux/kernel-page-flags.h:#define KPF_COMPOUND_TAIL 16
>>
>> So maybe we should just continue talking about compound pages or do we have
>> to use both terms here in this interface?
>
> I don't know how easy it's going to be to distinguish between a head
> and tail page in the Glorious Future once pages and folios are separated.

Probably a page-based interface would be the wrong interface for that;
fortunately, this interface has a "debugging" smell to it, so we might be able
to replace it.
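
As a rough illustration of point (3) in the writeup above, the following sketch reads the PMD size exported for THP and expresses it as a page order. The sysfs path is the one named in the message; the helper names `pmd_order` and `read_pmd_size` are made up for this example, and the file is assumed to contain a single decimal byte count.

```python
import os

# Path quoted in the message above; only present on kernels built with THP.
HPAGE_PMD_SIZE = "/sys/kernel/mm/transparent_hugepage/hpage_pmd_size"


def pmd_order(pmd_size, page_size):
    """Return n such that pmd_size == page_size * 2**n (hypothetical helper)."""
    ratio, remainder = divmod(pmd_size, page_size)
    if remainder or ratio < 1 or ratio & (ratio - 1):
        raise ValueError("PMD size is not a power-of-2 multiple of the page size")
    return ratio.bit_length() - 1


def read_pmd_size(path=HPAGE_PMD_SIZE):
    """Read the PMD size in bytes (assumes the file holds one decimal number)."""
    with open(path) as f:
        return int(f.read().strip())


if __name__ == "__main__":
    page_size = os.sysconf("SC_PAGE_SIZE")
    try:
        pmd_size = read_pmd_size()
    except OSError:
        print("hpage_pmd_size is not available on this system")
    else:
        print(f"PMD-sized THP: {pmd_size} bytes, order {pmd_order(pmd_size, page_size)}")
```

With 4 KiB base pages and a 2 MiB PMD this gives order 9; any order between 1 and that value would be a "small-sized THP" in the terminology discussed here.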
On 16/10/2023 11:13, David Hildenbrand wrote:
> [...]
> Once we want to remove the (sub)page mapcount, we'll likely have to remove
> _nr_pages_mapped.
> To make some workloads that are sensitive to memory
> consumption [4] play along when not accounting only the actually mapped parts,
> we might have to introduce other ways to control that, when
> "/sys/kernel/debug/fault_around_bytes" no longer does the trick. I'm hoping we
> can still find ways to avoid exposing any toggles for that; we'll see.
>
> [1] https://en.wikipedia.org/wiki/Page_(computer_memory)
> [2] https://www.freebsd.org/status/report-2022-04-2022-06/superpages/
> [3] https://ieeexplore.ieee.org/document/6657040/similar#similar
> [4] https://www.suse.com/support/kb/doc/?id=000019017

+1 for David's reasoning.

FWIW, the way I see it, everything is a folio; a folio is an implementation
detail that neatly abstracts a physically contiguous, power-of-2 number of pages
(including the single page case). So I'm not sure how useful it is to add the
proposed KPF_FOLIO flag. The only real thing I can imagine user space using it
for would be to tell if some extent of virtual memory is physically contiguous,
and you can already do that from the PFN.

Bigger picture interface-wise, I think it is simpler and more understandable to
the user to extend an existing concept (THP) rather than invent a new one
(folios) that substantially overlaps with the existing (PMD-sized) THP concept.

That said, if you have plans in the folio roadmap that I'm not aware of, then
perhaps those would change my mind. There is a thread here [1] where we are
discussing the best way to expose "small-sized THP" (anon large folios) to user
space - Matthew, if you have strong feelings, please do reply!

[1] https://lore.kernel.org/linux-mm/6d89fdc9-ef55-d44e-bf12-fafff318aef8@redhat.com/

Thanks,
Ryan

>
>>
>>> Looking at other interfaces, we do expose:
>>>
>>> include/uapi/linux/kernel-page-flags.h:#define KPF_COMPOUND_HEAD 15
>>> include/uapi/linux/kernel-page-flags.h:#define KPF_COMPOUND_TAIL 16
>>>
>>> So maybe we should just continue talking about compound pages or do we have
>>> to use both terms here in this interface?
>>
>> I don't know how easy it's going to be to distinguish between a head
>> and tail page in the Glorious Future once pages and folios are separated.
>
> Probably a page-based interface would be the wrong interface for that;
> fortunately, this interface has a "debugging" smell to it, so we might be able
> to replace it.
>
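
To make the "you can already do that from the PFN" remark concrete, here is a minimal sketch of checking physical contiguity through /proc/self/pagemap. It assumes the documented pagemap layout (one 64-bit entry per virtual page, PFN in bits 0-54, "present" in bit 63); the function names are invented for the example. Note that the kernel reports the PFN field as zero to callers without CAP_SYS_ADMIN, in which case the check simply returns False.

```python
import struct

PAGEMAP_ENTRY_SIZE = 8           # one 64-bit entry per virtual page
PM_PRESENT = 1 << 63             # page is present in RAM
PM_PFN_MASK = (1 << 55) - 1      # bits 0-54 hold the PFN when present


def pagemap_offset(vaddr, page_size):
    """File offset of the pagemap entry covering virtual address vaddr."""
    return (vaddr // page_size) * PAGEMAP_ENTRY_SIZE


def entry_to_pfn(entry):
    """Extract the PFN from a pagemap entry, or None if the page is not present."""
    if not entry & PM_PRESENT:
        return None
    return entry & PM_PFN_MASK


def is_physically_contiguous(pfns):
    """True if every page is present and the PFNs increase by exactly one."""
    if not pfns or any(p is None for p in pfns):
        return False
    return all(b == a + 1 for a, b in zip(pfns, pfns[1:]))


def read_pfns(vaddr, npages, page_size, path="/proc/self/pagemap"):
    """Read npages entries starting at vaddr (useful PFNs need CAP_SYS_ADMIN)."""
    with open(path, "rb") as f:
        f.seek(pagemap_offset(vaddr, page_size))
        data = f.read(npages * PAGEMAP_ENTRY_SIZE)
    entries = struct.unpack(f"={len(data) // PAGEMAP_ENTRY_SIZE}Q", data)
    return [entry_to_pfn(e) for e in entries]
```

A caller would pass the start address and page count of the extent it cares about to `read_pfns` and feed the result to `is_physically_contiguous`.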
On Mon, Oct 16, 2023 at 12:36:22PM +0100, Ryan Roberts wrote:
> On 16/10/2023 11:13, David Hildenbrand wrote:
> > [...]
> > Regarding the pagecache, we managed to not expose any toggles towards the user,
> > because memory waste can be better controlled. So the term "folio" does not pop
> > up as a toggle in /sys and /proc.
> >
> > t14s: ~ $ find /sys -name "*folio*" 2> /dev/null
> > t14s: ~ $ find /proc -name "*folio*" 2> /dev/null
> >
> > Once we want to remove the (sub)page mapcount, we'll likely have to remove
> > _nr_pages_mapped. To make some workloads that are sensitive to memory
> > consumption [4] play along when not accounting only the actually mapped parts,
> > we might have to introduce other ways to control that, when
> > "/sys/kernel/debug/fault_around_bytes" no longer does the trick. I'm hoping we
> > can still find ways to avoid exposing any toggles for that; we'll see.
> >
> > [1] https://en.wikipedia.org/wiki/Page_(computer_memory)
> > [2] https://www.freebsd.org/status/report-2022-04-2022-06/superpages/
> > [3] https://ieeexplore.ieee.org/document/6657040/similar#similar
> > [4] https://www.suse.com/support/kb/doc/?id=000019017
>
> +1 for David's reasoning.
>
> FWIW, the way I see it, everything is a folio; a folio is an implementation
> detail that neatly abstracts a physically contiguous, power-of-2 number of pages
> (including the single page case). So I'm not sure how useful it is to add the
> proposed KPF_FOLIO flag. The only real thing I can imagine user space using it
> for would be to tell if some extent of virtual memory is physically contiguous,
> and you can already do that from the PFN.
>
> Bigger picture interface-wise, I think it is simpler and more understandable to
> the user to extend an existing concept (THP) rather than invent a new one
> (folios) that substantially overlaps with the existing (PMD-sized) THP concept.
>
> That said, if you have plans in the folio roadmap that I'm not aware of, then
> perhaps those would change my mind. There is a thread here [1] where we are
> discussing the best way to expose "small-sized THP" (anon large folios) to user
> space - Matthew, if you have strong feelings, please do reply!
>
> [1]
> https://lore.kernel.org/linux-mm/6d89fdc9-ef55-d44e-bf12-fafff318aef8@redhat.com/
>
> Thanks,
> Ryan
>
> >>> Looking at other interfaces, we do expose:
> >>>
> >>> include/uapi/linux/kernel-page-flags.h:#define KPF_COMPOUND_HEAD 15
> >>> include/uapi/linux/kernel-page-flags.h:#define KPF_COMPOUND_TAIL 16
> >>>
> >>> So maybe we should just continue talking about compound pages or do we have
> >>> to use both terms here in this interface?
> >>
> >> I don't know how easy it's going to be to distinguish between a head
> >> and tail page in the Glorious Future once pages and folios are separated.
> >
> > Probably a page-based interface would be the wrong interface for that;
> > fortunately, this interface has a "debugging" smell to it, so we might be able
> > to replace it.

This interface exposes per-pfn (not per-page) data records, with the pfn
specified by file offset. It does not care about the distinction between head
and tail pages. So I don't think that we can avoid referring to tail pages even
after the page-to-folio conversion is complete.

But I agree that this interface is for debugging or testing. To make that
clear, we might consider relocating it to a more suitable location within
debugfs, making it effectively invisible to non-debugging processes. The same
could apply to the other similar interfaces under /proc/kpage*, so all these
files can be handled together to address this problem.

Thanks,
Naoya Horiguchi
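
For readers unfamiliar with the interface under discussion, the sketch below shows what "per-pfn records, specifying pfn by file offset" looks like from user space. It assumes the usual /proc/kpageflags layout of one 64-bit flags word per PFN; the KPF_COMPOUND_HEAD and KPF_COMPOUND_TAIL bit numbers are the ones quoted in this thread, KPF_HUGE and KPF_THP are taken from the same uapi header, and the helper names are hypothetical. Reading the file normally requires root.

```python
import struct

# Bit positions from include/uapi/linux/kernel-page-flags.h.
# COMPOUND_HEAD/COMPOUND_TAIL are quoted in the thread; HUGE and THP
# come from the same header.
KPF_BITS = {
    "COMPOUND_HEAD": 15,
    "COMPOUND_TAIL": 16,
    "HUGE": 17,
    "THP": 22,
}

RECORD_SIZE = 8  # one 64-bit flags word per PFN


def record_offset(pfn):
    """File offset of the flags word for the given PFN."""
    return pfn * RECORD_SIZE


def decode_flags(word):
    """Return the names of the known flags set in a flags word."""
    return {name for name, bit in KPF_BITS.items() if word & (1 << bit)}


def read_flags(pfn, path="/proc/kpageflags"):
    """Read and decode the flags for one PFN (typically needs root)."""
    with open(path, "rb") as f:
        f.seek(record_offset(pfn))
        data = f.read(RECORD_SIZE)
    if len(data) != RECORD_SIZE:
        raise ValueError(f"no record for pfn {pfn}")
    (word,) = struct.unpack("=Q", data)
    return decode_flags(word)
```

Because the lookup key is a PFN rather than a folio, a tail page of a compound page has its own record, which is why the head/tail distinction stays visible in this interface.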