| Message ID | 20230403201839.4097845-7-zi.yan@sent.com |
|---|---|
| State | New |
Headers:
From: Zi Yan <zi.yan@sent.com>
To: "Matthew Wilcox (Oracle)" <willy@infradead.org>, Yang Shi <shy828301@gmail.com>, Yu Zhao <yuzhao@google.com>, linux-mm@kvack.org
Cc: Zi Yan <ziy@nvidia.com>, "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>, Ryan Roberts <ryan.roberts@arm.com>, Michal Koutný <mkoutny@suse.com>, Roman Gushchin <roman.gushchin@linux.dev>, "Zach O'Keefe" <zokeefe@google.com>, Andrew Morton <akpm@linux-foundation.org>, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-kselftest@vger.kernel.org
Subject: [PATCH v3 6/7] mm: truncate: split huge page cache page to a non-zero order if possible.
Date: Mon, 3 Apr 2023 16:18:38 -0400
Message-Id: <20230403201839.4097845-7-zi.yan@sent.com>
In-Reply-To: <20230403201839.4097845-1-zi.yan@sent.com>
References: <20230403201839.4097845-1-zi.yan@sent.com>
Reply-To: Zi Yan <ziy@nvidia.com>
| Series | Split a folio to any lower order folios |
Commit Message
Zi Yan
April 3, 2023, 8:18 p.m. UTC
From: Zi Yan <ziy@nvidia.com>

To minimize the number of pages after a huge page truncation, we do not
need to split it all the way down to order-0. The huge page has at most
three parts: the part before the offset, the part to be truncated, and
the part remaining at the end. Find the greatest common divisor of the
three to calculate the new page order, so we can split the huge page to
this order and keep the remaining pages as large and as few as possible.

Signed-off-by: Zi Yan <ziy@nvidia.com>
---
 mm/truncate.c | 21 +++++++++++++++++++--
 1 file changed, 19 insertions(+), 2 deletions(-)
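The gcd-based order computation described above can be sketched in userspace C. This is an illustrative model only: `gcd_ul()`, `ilog2_ul()` and `new_split_order()` are made-up stand-ins for the kernel's `gcd()`, `ilog2()` and the patch's inline arithmetic, and PAGE_SIZE is assumed to be 4K.

```c
#include <assert.h>

#define PAGE_SIZE 4096UL

/* Euclid's algorithm: a stand-in for the kernel's gcd(). */
static unsigned long gcd_ul(unsigned long a, unsigned long b)
{
	while (b) {
		unsigned long t = a % b;
		a = b;
		b = t;
	}
	return a;
}

/* floor(log2(v)): a stand-in for the kernel's ilog2(). */
static unsigned int ilog2_ul(unsigned long v)
{
	unsigned int r = 0;

	while (v >>= 1)
		r++;
	return r;
}

/*
 * The order the patch picks: gcd of the three parts of the folio
 * (the part before the hole, the hole itself, the part after it),
 * rounded up to at least PAGE_SIZE so ilog2 never sees 0, expressed
 * as a page order.  Order-1 is downgraded to order-0 since order-1
 * THPs are not supported.
 */
static unsigned int new_split_order(unsigned long offset,
				    unsigned long length,
				    unsigned long remaining)
{
	unsigned long g = gcd_ul(gcd_ul(offset, length), remaining);
	unsigned long pages = (g + PAGE_SIZE - 1) / PAGE_SIZE; /* round_up */
	unsigned int order = ilog2_ul(pages);

	return order == 1 ? 0 : order;
}
```

With a 2M folio, offset = length = 1M and remaining = 0 gives order 8 (two 1M pieces), while an unaligned offset of 1M+1 collapses the gcd to a single byte and forces order 0, which is the behaviour discussed in the review below.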
Comments
On Mon, 3 Apr 2023, Zi Yan wrote:

> From: Zi Yan <ziy@nvidia.com>
>
> To minimize the number of pages after a huge page truncation, we do not
> need to split it all the way down to order-0. The huge page has at most
> three parts, the part before offset, the part to be truncated, the part
> remaining at the end. Find the greatest common divisor of them to
> calculate the new page order from it, so we can split the huge
> page to this order and keep the remaining pages as large and as few as
> possible.
>
> Signed-off-by: Zi Yan <ziy@nvidia.com>
> ---
>  mm/truncate.c | 21 +++++++++++++++++++--
>  1 file changed, 19 insertions(+), 2 deletions(-)
>
> diff --git a/mm/truncate.c b/mm/truncate.c
> index 86de31ed4d32..817efd5e94b4 100644
> --- a/mm/truncate.c
> +++ b/mm/truncate.c
> @@ -22,6 +22,7 @@
>  #include <linux/buffer_head.h> /* grr. try_to_release_page */
>  #include <linux/shmem_fs.h>
>  #include <linux/rmap.h>
> +#include <linux/gcd.h>

Really?

>  #include "internal.h"
>
>  /*
> @@ -211,7 +212,8 @@ int truncate_inode_folio(struct address_space *mapping, struct folio *folio)
>  bool truncate_inode_partial_folio(struct folio *folio, loff_t start, loff_t end)
>  {
>  	loff_t pos = folio_pos(folio);
> -	unsigned int offset, length;
> +	unsigned int offset, length, remaining;
> +	unsigned int new_order = folio_order(folio);
>
>  	if (pos < start)
>  		offset = start - pos;
> @@ -222,6 +224,7 @@ bool truncate_inode_partial_folio(struct folio *folio, loff_t start, loff_t end)
>  		length = length - offset;
>  	else
>  		length = end + 1 - pos - offset;
> +	remaining = folio_size(folio) - offset - length;
>
>  	folio_wait_writeback(folio);
>  	if (length == folio_size(folio)) {
> @@ -236,11 +239,25 @@ bool truncate_inode_partial_folio(struct folio *folio, loff_t start, loff_t end)
>  	 */
>  	folio_zero_range(folio, offset, length);
>
> +	/*
> +	 * Use the greatest common divisor of offset, length, and remaining
> +	 * as the smallest page size and compute the new order from it. So we
> +	 * can truncate a subpage as large as possible. Round up gcd to
> +	 * PAGE_SIZE, otherwise ilog2 can give -1 when gcd/PAGE_SIZE is 0.
> +	 */
> +	new_order = ilog2(round_up(gcd(gcd(offset, length), remaining),
> +				   PAGE_SIZE) / PAGE_SIZE);

Gosh. In mm/readahead.c I can see "order = __ffs(index)",
and I think something along those lines would be more appropriate here.

But, if there's any value at all to choosing intermediate orders here in
truncation, I don't think choosing a single order is the right approach -
more easily implemented, yes, but is it worth doing?

What you'd actually want (if anything) is to choose the largest orders
possible, with smaller and smaller orders filling in the rest (I expect
there's a technical name for this, but I don't remember - bin packing
is something else, I think).

As this code stands, truncate a 2M huge page at 1M and you get two 1M
pieces (one then discarded) - nice; but truncate it at 1M+1 and you get
lots of order 2 (forced up from 1) pieces. Seems weird, and not worth
the effort.

Hugh

> +
> +	/* order-1 THP not supported, downgrade to order-0 */
> +	if (new_order == 1)
> +		new_order = 0;
> +
> +
>  	if (folio_has_private(folio))
>  		folio_invalidate(folio, offset, length);
>  	if (!folio_test_large(folio))
>  		return true;
> -	if (split_folio(folio) == 0)
> +	if (split_huge_page_to_list_to_order(&folio->page, NULL, new_order) == 0)
>  		return true;
>  	if (folio_test_dirty(folio))
>  		return false;
> --
> 2.39.2
> As this code stands, truncate a 2M huge page at 1M and you get two 1M
> pieces (one then discarded) - nice; but truncate it at 1M+1 and you get
> lots of order 2 (forced up from 1) pieces. Seems weird, and not worth
> the effort.

I've probably said that wrong: truncate at 1M+1 and you'd get lots of
order 0 pieces.

Hugh
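Hugh's corrected numbers can be checked with a tiny sketch: a uniform split of an order-9 (2M, 512-page) folio to order k yields 512 >> k folios, with k computed by the gcd rule from the patch. All names here are illustrative, not kernel API; `remaining` is 0 because the truncation runs to the end of the folio.

```c
#include <assert.h>

#define PAGE_SIZE 4096UL
#define FOLIO_PAGES 512UL	/* order-9, i.e. a 2M THP with 4K pages */

/* Euclid's algorithm: a stand-in for the kernel's gcd(). */
static unsigned long gcd_ul(unsigned long a, unsigned long b)
{
	while (b) {
		unsigned long t = a % b;
		a = b;
		b = t;
	}
	return a;
}

/* Folios produced when truncating a 2M folio from byte truncate_at on. */
static unsigned long split_pieces(unsigned long truncate_at)
{
	unsigned long offset = truncate_at;
	unsigned long length = FOLIO_PAGES * PAGE_SIZE - offset;
	/* remaining is 0 here, and gcd(x, 0) == x */
	unsigned long g = gcd_ul(offset, length);
	unsigned long pages = (g + PAGE_SIZE - 1) / PAGE_SIZE; /* round_up */
	unsigned int order = 0;

	while (pages >>= 1)		/* ilog2 */
		order++;
	if (order == 1)
		order = 0;		/* order-1 THPs are not supported */
	return FOLIO_PAGES >> order;
}
```

Truncating at 1M gives 2 pieces (order 8); truncating at 1M+1 gives 512 order-0 pieces, exactly the degenerate case described above.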
On 16 Apr 2023, at 15:44, Hugh Dickins wrote:

> On Mon, 3 Apr 2023, Zi Yan wrote:
>
>> From: Zi Yan <ziy@nvidia.com>
>>
>> To minimize the number of pages after a huge page truncation, we do not
>> need to split it all the way down to order-0. The huge page has at most
>> three parts, the part before offset, the part to be truncated, the part
>> remaining at the end. Find the greatest common divisor of them to
>> calculate the new page order from it, so we can split the huge
>> page to this order and keep the remaining pages as large and as few as
>> possible.
>>
>> Signed-off-by: Zi Yan <ziy@nvidia.com>
>> ---
>>  mm/truncate.c | 21 +++++++++++++++++++--
>>  1 file changed, 19 insertions(+), 2 deletions(-)
>>
>> diff --git a/mm/truncate.c b/mm/truncate.c
>> index 86de31ed4d32..817efd5e94b4 100644
>> --- a/mm/truncate.c
>> +++ b/mm/truncate.c
>> @@ -22,6 +22,7 @@
>>  #include <linux/buffer_head.h> /* grr. try_to_release_page */
>>  #include <linux/shmem_fs.h>
>>  #include <linux/rmap.h>
>> +#include <linux/gcd.h>
>
> Really?
>
>>  #include "internal.h"
>>
>>  /*
>> @@ -211,7 +212,8 @@ int truncate_inode_folio(struct address_space *mapping, struct folio *folio)
>>  bool truncate_inode_partial_folio(struct folio *folio, loff_t start, loff_t end)
>>  {
>>  	loff_t pos = folio_pos(folio);
>> -	unsigned int offset, length;
>> +	unsigned int offset, length, remaining;
>> +	unsigned int new_order = folio_order(folio);
>>
>>  	if (pos < start)
>>  		offset = start - pos;
>> @@ -222,6 +224,7 @@ bool truncate_inode_partial_folio(struct folio *folio, loff_t start, loff_t end)
>>  		length = length - offset;
>>  	else
>>  		length = end + 1 - pos - offset;
>> +	remaining = folio_size(folio) - offset - length;
>>
>>  	folio_wait_writeback(folio);
>>  	if (length == folio_size(folio)) {
>> @@ -236,11 +239,25 @@ bool truncate_inode_partial_folio(struct folio *folio, loff_t start, loff_t end)
>>  	 */
>>  	folio_zero_range(folio, offset, length);
>>
>> +	/*
>> +	 * Use the greatest common divisor of offset, length, and remaining
>> +	 * as the smallest page size and compute the new order from it. So we
>> +	 * can truncate a subpage as large as possible. Round up gcd to
>> +	 * PAGE_SIZE, otherwise ilog2 can give -1 when gcd/PAGE_SIZE is 0.
>> +	 */
>> +	new_order = ilog2(round_up(gcd(gcd(offset, length), remaining),
>> +				   PAGE_SIZE) / PAGE_SIZE);
>
> Gosh. In mm/readahead.c I can see "order = __ffs(index)",
> and I think something along those lines would be more appropriate here.
>
> But, if there's any value at all to choosing intermediate orders here in
> truncation, I don't think choosing a single order is the right approach -
> more easily implemented, yes, but is it worth doing?
>
> What you'd actually want (if anything) is to choose the largest orders
> possible, with smaller and smaller orders filling in the rest (I expect
> there's a technical name for this, but I don't remember - bin packing
> is something else, I think).
>
> As this code stands, truncate a 2M huge page at 1M and you get two 1M
> pieces (one then discarded) - nice; but truncate it at 1M+1 and you get
> lots of order 2 (forced up from 1) pieces. Seems weird, and not worth
> the effort.

The approach I am adding here is the simplest way of splitting a folio
and trying to get >0 order folios after the split. Yes, I agree that
using "__ffs(index)" can create more >0 order folios, but it comes with
either more runtime overheads or more code changes.

Like your "1MB + 1" page split example, using "__ffs(index)", ideally,
you will split 2MB into 2 1MBs, then 1MB into 2 512KBs, ..., 8KB into
2 4KBs, and at the end of the day we will have one each of 1MB, 512KB,
..., 8KB and two 4KBs, maximizing the number of >0 order folios. But
what is the cost?

1. To minimize code changes, we then need to call
split_huge_page_to_list_to_order() 9 times from new_order=8 down to
new_order=0. Since each split needs to unmap and remap the target folio,
we shall see 9 TLB shootdowns. I am not sure it is worth the cost.

2. To get rid of the unmap and remap overheads, we probably can unmap
the folio, then do all 9 splits, then remap all the split pages. But
this would make split_huge_page() a lot more complicated, and I am not
sure of a good way of passing the split order information and the
corresponding to-be-split subpages. Do we need a dynamic list to store
them, making new memory allocations a prerequisite of
split_huge_page()? Maybe we can encode the split information in two
ints? In the first one, each bit tells which order to split the page
(like order=__ffs(index)), and in the second one, each bit tells which
subpage to split next (0 means the left subpage, 1 means the right
subpage). So your "1MB + 1" page split will be encoded as 0b111111111
(first int), 0b1000000 (second int, which has 1 fewer bit, since the
first split does not need to know which subpage to split).

What do you think? If you have a better idea, I am all ears. And if you
are willing to help me review the more complicated code changes, I am
more than happy to implement it in the next version. :)

Thank you for your comments. They are very helpful!

> Hugh
>
>> +
>> +	/* order-1 THP not supported, downgrade to order-0 */
>> +	if (new_order == 1)
>> +		new_order = 0;
>> +
>> +
>>  	if (folio_has_private(folio))
>>  		folio_invalidate(folio, offset, length);
>>  	if (!folio_test_large(folio))
>>  		return true;
>> -	if (split_folio(folio) == 0)
>> +	if (split_huge_page_to_list_to_order(&folio->page, NULL, new_order) == 0)
>>  		return true;
>>  	if (folio_test_dirty(folio))
>>  		return false;
>> --
>> 2.39.2

--
Best Regards,
Yan, Zi
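The repeated-halving scheme described above can be sketched in plain C: keep splitting the folio that contains the truncation boundary in half until the boundary falls on a folio edge, recording the order of each folio that peels off whole. This is purely illustrative, not kernel code; the function name is made up, and it deliberately ignores the "no order-1 THP" restriction discussed in the thread.

```c
#include <assert.h>

/*
 * Simulate the repeated-halving split: `order` is the starting folio
 * order, `boundary` the page index (within the folio) where the
 * truncation point lies.  The orders of the resulting folios are
 * written to `orders`; the count is returned.
 */
static int halve_to_boundary(unsigned int order, unsigned long boundary,
			     unsigned int *orders, int max)
{
	unsigned long base = 0;	/* start of the folio we keep splitting */
	int n = 0;

	while (order > 0 && n < max - 1) {
		unsigned long half = 1UL << (order - 1);

		/* stop once the boundary lies on an edge of this folio */
		if (boundary == base || boundary == base + (1UL << order))
			break;
		order--;
		if (boundary >= base + half) {
			orders[n++] = order;	/* left half peels off whole */
			base += half;
		} else {
			orders[n++] = order;	/* right half peels off whole */
		}
	}
	orders[n++] = order;	/* the folio still containing the boundary */
	return n;
}
```

For the "1MB + 1" case (an order-9 folio with the boundary at page 257) this yields one folio of each order from 8 down to 1 plus two order-0 folios, i.e. the 1MB, 512KB, ..., 8KB and two 4KB pieces described above; for an aligned 1M truncation it stops after a single split into two order-8 halves.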
On Mon, 17 Apr 2023, Zi Yan wrote:

> What do you think? If you have a better idea, I am all ears. And if you
> are willing to help me review the more complicated code changes, I am
> more than happy to implement it in the next version. :)

Sorry, no, not me. You'll have to persuade someone else that "optimizing"
truncation is a worthwhile path to pursue (and to pursue now) - I was
just trying to illustrate that the current patchset didn't seem very
useful.

But don't throw your work away. I expect some of it can become useful
later, e.g. once most of the main filesystems support large folios, and
the complication of CONFIG_READ_ONLY_THP_FOR_FS can be deleted.

I doubt my "minimizing the number of folios" approach would be worth the
effort; I think it more likely that we shall settle on an intermediate
folio size (between 4K and THP: maybe 64K, but not necessarily the same
on all machines or all workloads) to aim for, and then maybe truncation
of THP would split to those units. But it's not a job I shall get into -
I'll just continue to report and try to fix what bugs I see.

Hugh
diff --git a/mm/truncate.c b/mm/truncate.c
index 86de31ed4d32..817efd5e94b4 100644
--- a/mm/truncate.c
+++ b/mm/truncate.c
@@ -22,6 +22,7 @@
 #include <linux/buffer_head.h>	/* grr. try_to_release_page */
 #include <linux/shmem_fs.h>
 #include <linux/rmap.h>
+#include <linux/gcd.h>
 #include "internal.h"
 
 /*
@@ -211,7 +212,8 @@ int truncate_inode_folio(struct address_space *mapping, struct folio *folio)
 bool truncate_inode_partial_folio(struct folio *folio, loff_t start, loff_t end)
 {
 	loff_t pos = folio_pos(folio);
-	unsigned int offset, length;
+	unsigned int offset, length, remaining;
+	unsigned int new_order = folio_order(folio);
 
 	if (pos < start)
 		offset = start - pos;
@@ -222,6 +224,7 @@ bool truncate_inode_partial_folio(struct folio *folio, loff_t start, loff_t end)
 		length = length - offset;
 	else
 		length = end + 1 - pos - offset;
+	remaining = folio_size(folio) - offset - length;
 
 	folio_wait_writeback(folio);
 	if (length == folio_size(folio)) {
@@ -236,11 +239,25 @@ bool truncate_inode_partial_folio(struct folio *folio, loff_t start, loff_t end)
 	 */
 	folio_zero_range(folio, offset, length);
 
+	/*
+	 * Use the greatest common divisor of offset, length, and remaining
+	 * as the smallest page size and compute the new order from it. So we
+	 * can truncate a subpage as large as possible. Round up gcd to
+	 * PAGE_SIZE, otherwise ilog2 can give -1 when gcd/PAGE_SIZE is 0.
+	 */
+	new_order = ilog2(round_up(gcd(gcd(offset, length), remaining),
+				   PAGE_SIZE) / PAGE_SIZE);
+
+	/* order-1 THP not supported, downgrade to order-0 */
+	if (new_order == 1)
+		new_order = 0;
+
+
 	if (folio_has_private(folio))
 		folio_invalidate(folio, offset, length);
 	if (!folio_test_large(folio))
 		return true;
-	if (split_folio(folio) == 0)
+	if (split_huge_page_to_list_to_order(&folio->page, NULL, new_order) == 0)
 		return true;
 	if (folio_test_dirty(folio))
 		return false;