Message ID | 20230915095042.1320180-7-da.gomez@samsung.com |
---|---|
State | New |
Series | shmem: high order folios support in write path |
Commit Message
Daniel Gomez
Sept. 15, 2023, 9:51 a.m. UTC
Add large folio support to the shmem write path, matching the high
order preference mechanism used by the iomap buffered IO path, as in
__filemap_get_folio().

Use __folio_get_max_order() to get a hint for the folio order based on
the file size; this takes care of the mapping requirements.

Swap does not support high order folios for now, so fall back to
order 0 when swap is enabled.
Signed-off-by: Daniel Gomez <da.gomez@samsung.com>
---
mm/shmem.c | 16 +++++++++++++---
1 file changed, 13 insertions(+), 3 deletions(-)
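Note: the helper the patch calls to compute the hint,
mapping_size_order(), is not defined in this patch; it is presumably
introduced earlier in the series (see the diff at the end of this
page). As a rough, hypothetical illustration of the kind of size-based
hint the commit message describes — the use of get_order() and the
clamp to MAX_PAGECACHE_ORDER below are assumptions, not the series'
actual implementation:

/*
 * Hypothetical sketch of a size-based folio order hint. The real helper
 * presumably also accounts for the mapping and index alignment; this
 * version only derives an order from the length of the write.
 */
static unsigned int mapping_size_order(struct address_space *mapping,
				       pgoff_t index, size_t len)
{
	unsigned int order;

	if (len <= PAGE_SIZE)
		return 0;

	/* Round the byte count up to a power-of-two number of pages. */
	order = get_order(len);

	/* Never exceed what the page cache supports for a mapping. */
	return min_t(unsigned int, order, MAX_PAGECACHE_ORDER);
}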
Comments
On Fri, Sep 15, 2023 at 2:51 AM Daniel Gomez <da.gomez@samsung.com> wrote:
>
> Add large folio support to the shmem write path, matching the high
> order preference mechanism used by the iomap buffered IO path, as in
> __filemap_get_folio().
>
> Use __folio_get_max_order() to get a hint for the folio order based on
> the file size; this takes care of the mapping requirements.
>
> Swap does not support high order folios for now, so fall back to
> order 0 when swap is enabled.

I didn't take a close look at the series, but I am not sure I
understand the rationale here. Reclaim will split high order shmem
folios anyway, right?

It seems like we only enable high order folios if the "noswap" mount
option is used, which is fairly recent. I doubt it is widely used.

[...]
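(For reference, the splitting Yosry refers to happens on the swap-out
path: shmem_writepage() splits a large folio before it is written to
swap, so under memory pressure large shmem folios reach swap as order-0
pages. The sketch below is a condensed approximation of that logic in
kernels of this era, not a verbatim copy; error handling and the swap
cache steps are elided.)

/*
 * Condensed approximation of the split-on-swap-out step in
 * shmem_writepage(): reclaim hands shmem a large folio, and it is
 * split into order-0 pages before being swapped out.
 */
static int shmem_writepage(struct page *page, struct writeback_control *wbc)
{
	struct folio *folio = page_folio(page);

	if (folio_test_large(folio)) {
		/* Keep the subpage dirty across the split. */
		folio_test_set_dirty(folio);
		if (split_huge_page(page) < 0)
			goto redirty;	/* split failed; retry later */
		folio = page_folio(page);
		folio_clear_dirty(folio);
	}

	/* ... add the (now order-0) folio to swap cache and write it ... */
	return 0;

redirty:
	folio_redirty_for_writepage(wbc, folio);
	return 0;
}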
On Fri, Sep 15, 2023 at 11:26:37AM -0700, Yosry Ahmed wrote:
> I didn't take a close look at the series, but I am not sure I
> understand the rationale here. Reclaim will split high order shmem
> folios anyway, right?

For context, this is part of the enablement of large block sizes (LBS)
effort [1][2][3], so the assumption here is that the kernel will
reclaim memory with the same (large) block sizes that were written to
the device.

I'll add more context in the V2.

[1] https://kernelnewbies.org/KernelProjects/large-block-size
[2] https://docs.google.com/spreadsheets/d/e/2PACX-1vS7sQfw90S00l2rfOKm83Jlg0px8KxMQE4HHp_DKRGbAGcAV-xu6LITHBEc4xzVh9wLH6WM2lR0cZS8/pubhtml#
[3] https://lore.kernel.org/all/ZQfbHloBUpDh+zCg@dread.disaster.area/

> It seems like we only enable high order folios if the "noswap" mount
> option is used, which is fairly recent. I doubt it is widely used.

For now, I skipped the swap path as it currently lacks support for
high order folios. But I'm currently looking into it as part of the LBS
effort (please check the spreadsheet at [2] for that).

[...]
On Mon, Sep 18, 2023 at 1:00 AM Daniel Gomez <da.gomez@samsung.com> wrote:
> For context, this is part of the enablement of large block sizes (LBS)
> effort [1][2][3], so the assumption here is that the kernel will
> reclaim memory with the same (large) block sizes that were written to
> the device.
> [...]
> For now, I skipped the swap path as it currently lacks support for
> high order folios. But I'm currently looking into it as part of the LBS
> effort (please check the spreadsheet at [2] for that).

Thanks for the context, but I am not sure I understand.

IIUC we are skipping allocating large folios in shmem if swap is
enabled in this patch. Swap does not support swapping out large folios
as a whole (except THPs), but page reclaim will split those large
folios and swap them out as order-0 pages anyway. So I am not sure I
understand why we need to skip allocating large folios if swap is
enabled.

[...]
On Mon, Sep 18, 2023 at 11:55:34AM -0700, Yosry Ahmed wrote:
> IIUC we are skipping allocating large folios in shmem if swap is
> enabled in this patch. Swap does not support swapping out large folios
> as a whole (except THPs), but page reclaim will split those large
> folios and swap them out as order-0 pages anyway. So I am not sure I
> understand why we need to skip allocating large folios if swap is
> enabled.

I lifted the noswap condition and retested on top of 230918, and there
is some regression. So, based on the results, I guess the initial
requirement may be the way to go. But what do you think?

Here are the logs:
* shmem-large-folios-swap: https://gitlab.com/-/snippets/3600360
* shmem-baseline-swap    : https://gitlab.com/-/snippets/3600362

-Failures: generic/080 generic/126 generic/193 generic/633 generic/689
-Failed 5 of 730 tests
+Failures: generic/080 generic/103 generic/126 generic/193 generic/285 generic/436 generic/619 generic/633 generic/689
+Failed 9 of 730 tests

[...]
On Tue, Sep 19, 2023 at 6:27 AM Daniel Gomez <da.gomez@samsung.com> wrote:
> I lifted the noswap condition and retested on top of 230918, and there
> is some regression. So, based on the results, I guess the initial
> requirement may be the way to go. But what do you think?
>
> Here are the logs:
> * shmem-large-folios-swap: https://gitlab.com/-/snippets/3600360
> * shmem-baseline-swap    : https://gitlab.com/-/snippets/3600362
>
> -Failures: generic/080 generic/126 generic/193 generic/633 generic/689
> -Failed 5 of 730 tests
> +Failures: generic/080 generic/103 generic/126 generic/193 generic/285 generic/436 generic/619 generic/633 generic/689
> +Failed 9 of 730 tests

I am not really familiar with these tests so I cannot really tell
what's going on. I can see "swapfiles are not supported" in the logs
though, so it seems like we are seeing extra failures by just lifting
"noswap" even without actually swapping. I am curious if this is just
hiding a different issue; I would at least try to understand what's
happening.

Anyway, I don't have enough context here to be useful. I was just
making an observation about reclaim splitting shmem folios to swap
them out as order-0 pages, and asking why this is needed based on
that. I will leave it up to you and the reviewers to decide if there's
anything interesting here.
On Tue, Sep 19, 2023 at 09:00:16AM -0700, Yosry Ahmed wrote:
> I am not really familiar with these tests so I cannot really tell
> what's going on. I can see "swapfiles are not supported" in the logs
> though, so it seems like we are seeing extra failures by just lifting
> "noswap" even without actually swapping. I am curious if this is just
> hiding a different issue; I would at least try to understand what's
> happening.

The tests which are failing seem to be related to permissions; I could
not immediately decipher why, because, as you suggest, we'd just be
doing the silly thing of splitting large folios on writepage.

I'd prefer we don't require swap until those regressions are fixed.

Note that part of the rationale for enabling this work is to eventually
also extend the swap code to support large order folios, so it is not
like this would be left as-is. It is just that it may take time to
resolve the kinks with swap.

So I'd stick to noswap for now.

The above tests also don't stress swap, and if we did that I would
imagine we might see some other undesirable failures.

Luis
On Tue, Sep 19, 2023 at 2:47 PM Luis Chamberlain <mcgrof@kernel.org> wrote:
> [...]
> So I'd stick to noswap for now.
>
> The above tests also don't stress swap, and if we did that I would
> imagine we might see some other undesirable failures.

I thought we already have some notion of exercising swap with large
shmem folios from THPs, so this shouldn't be new, but perhaps I am
missing something.
diff --git a/mm/shmem.c b/mm/shmem.c
index adff74751065..26ca555b1669 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -1683,13 +1683,19 @@ static struct folio *shmem_alloc_folio(gfp_t gfp,
 }
 
 static struct folio *shmem_alloc_and_acct_folio(gfp_t gfp, struct inode *inode,
-		pgoff_t index, bool huge, unsigned int *order)
+		pgoff_t index, bool huge, unsigned int *order,
+		struct shmem_sb_info *sbinfo)
 {
 	struct shmem_inode_info *info = SHMEM_I(inode);
 	struct folio *folio;
 	int nr;
 	int err;
 
+	if (!sbinfo->noswap)
+		*order = 0;
+	else
+		*order = (*order == 1) ? 0 : *order;
+
 	if (!IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE))
 		huge = false;
 	nr = huge ? HPAGE_PMD_NR : 1U << *order;
@@ -2032,6 +2038,8 @@ static int shmem_get_folio_gfp(struct inode *inode, pgoff_t index,
 		return 0;
 	}
 
+	order = mapping_size_order(inode->i_mapping, index, len);
+
 	if (!shmem_is_huge(inode, index, false,
 			   vma ? vma->vm_mm : NULL, vma ? vma->vm_flags : 0))
 		goto alloc_nohuge;
@@ -2039,11 +2047,11 @@ static int shmem_get_folio_gfp(struct inode *inode, pgoff_t index,
 	huge_gfp = vma_thp_gfp_mask(vma);
 	huge_gfp = limit_gfp_mask(huge_gfp, gfp);
 	folio = shmem_alloc_and_acct_folio(huge_gfp, inode, index, true,
-					   &order);
+					   &order, sbinfo);
 	if (IS_ERR(folio)) {
 alloc_nohuge:
 		folio = shmem_alloc_and_acct_folio(gfp, inode, index, false,
-						   &order);
+						   &order, sbinfo);
 	}
 	if (IS_ERR(folio)) {
 		int retry = 5;
@@ -2147,6 +2155,8 @@ static int shmem_get_folio_gfp(struct inode *inode, pgoff_t index,
 	if (folio_test_large(folio)) {
 		folio_unlock(folio);
 		folio_put(folio);
+		if (order > 0)
+			order--;
 		goto alloc_nohuge;
 	}
 unlock:
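Taken together, the hunks implement a "try the hinted order, then step
down" policy: the first hunk forces order 0 whenever swap is enabled
(and also collapses an order-1 hint to 0), and the last hunk decrements
the order each time the large-folio path has to be retried. In
isolation, detached from the shmem accounting and with hypothetical
names, that fallback pattern would look roughly like this:

/*
 * Standalone sketch of the fallback pattern above: try a folio of the
 * hinted order and step the order down on failure until an order-0
 * allocation succeeds. Hypothetical helper, not part of the patch.
 */
static struct folio *alloc_folio_with_fallback(gfp_t gfp, unsigned int order)
{
	struct folio *folio;

	for (;;) {
		folio = folio_alloc(gfp, order);
		if (folio || !order)
			return folio;
		order--;	/* mirrors "order--; goto alloc_nohuge" */
	}
}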