[0/6] shmem: high order folios support in write path

Message ID 20230915095042.1320180-1-da.gomez@samsung.com

Daniel Gomez Sept. 15, 2023, 9:51 a.m. UTC
This series adds support for high order folios in the shmem write
path.

This is a continuation of the shmem work from Luis here [1]
following Matthew Wilcox's suggestion [2] regarding the path to take
for the folio allocation order calculation.

[1] RFC v2 add support for blocksize > PAGE_SIZE
https://lore.kernel.org/all/ZHBowMEDfyrAAOWH@bombadil.infradead.org/T/#md3e93ab46ce2ad9254e1eb54ffe71211988b5632
[2] https://lore.kernel.org/all/ZHD9zmIeNXICDaRJ@casper.infradead.org/
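For illustration, the conservative order calculation suggested in [2] can be sketched as follows (a standalone userspace sketch, not the kernel code; `write_folio_order`, its parameters, and the page-granular interface are made up for this example): the order only grows while a naturally aligned folio of the next order still fits entirely inside the write.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical standalone sketch (not the kernel implementation):
 * choose the largest folio order such that a folio of that order,
 * naturally aligned at 'index', lies entirely within the write
 * ending at 'last' (both in pages, 'last' exclusive). */
static unsigned int write_folio_order(size_t index, size_t last,
				      unsigned int max_order)
{
	unsigned int order = 0;

	while (order < max_order) {
		unsigned int next = order + 1;
		size_t n = (size_t)1 << next;

		/* folio of the next order must be naturally aligned
		 * at 'index' and fully covered by the write */
		if (index % n != 0 || index + n > last)
			break;
		order = next;
	}
	return order;
}
```

For example, a 16-page write starting at index 0 allows order 4, while the same write starting at index 1 forces order 0 because no larger folio would be naturally aligned.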

The patches have been tested on and sent from next-230911. They also
apply cleanly to the latest next-230914.

fsx and fstests have been run on tmpfs with noswap, with the
following results:
- fsx: 2-day test, 21.5B
- fstests: Same result as baseline for next-230911 [3][4][5]

[3] Baseline next-230911 failures are: generic/080 generic/126
generic/193 generic/633 generic/689
[4] fstests logs baseline: https://gitlab.com/-/snippets/3598621
[5] fstests logs patches: https://gitlab.com/-/snippets/3598628

There are at least 2 cases/topics to handle that I'd appreciate
feedback on.
1. With the new strategy, you might end up with a folio order matching
HPAGE_PMD_ORDER. However, we won't respect the 'huge' flag anymore if
THP is enabled.
2. When the above (1.) occurs, the code skips the huge path, so
xa_find with hindex is skipped.

Daniel

Daniel Gomez (5):
  filemap: make the folio order calculation shareable
  shmem: drop BLOCKS_PER_PAGE macro
  shmem: add order parameter support to shmem_alloc_folio
  shmem: add file length in shmem_get_folio path
  shmem: add large folios support to the write path

Luis Chamberlain (1):
  shmem: account for large order folios

 fs/iomap/buffered-io.c   |  6 ++-
 include/linux/pagemap.h  | 42 ++++++++++++++++---
 include/linux/shmem_fs.h |  2 +-
 mm/filemap.c             |  8 ----
 mm/khugepaged.c          |  2 +-
 mm/shmem.c               | 91 +++++++++++++++++++++++++---------------
 6 files changed, 100 insertions(+), 51 deletions(-)

--
2.39.2
  

Comments

David Hildenbrand Sept. 15, 2023, 3:36 p.m. UTC | #1
On 15.09.23 17:34, Matthew Wilcox wrote:
> On Fri, Sep 15, 2023 at 05:29:51PM +0200, David Hildenbrand wrote:
>> On 15.09.23 11:51, Daniel Gomez wrote:
>>> This series add support for high order folios in shmem write
>>> path.
>>> There are at least 2 cases/topics to handle that I'd appreciate
>>> feedback.
>>> 1. With the new strategy, you might end up with a folio order matching
>>> HPAGE_PMD_ORDER. However, we won't respect the 'huge' flag anymore if
>>> THP is enabled.
>>> 2. When the above (1.) occurs, the code skips the huge path, so
>>> xa_find with hindex is skipped.
>>
>> Similar to large anon folios (but different to large non-shmem folios in the
>> pagecache), this can result in memory waste.
> 
> No, it can't.  This patchset triggers only on write, not on read or page
> fault, and it's conservative, so it will only allocate folios which are
> entirely covered by the write.  IOW this is memory we must allocate in
> order to satisfy the write; we're just allocating it in larger chunks
> when we can.

Oh, good! I was assuming you would eventually over-allocate on the write 
path.
  
Matthew Wilcox Sept. 15, 2023, 3:40 p.m. UTC | #2
On Fri, Sep 15, 2023 at 05:36:27PM +0200, David Hildenbrand wrote:
> On 15.09.23 17:34, Matthew Wilcox wrote:
> > No, it can't.  This patchset triggers only on write, not on read or page
> > fault, and it's conservative, so it will only allocate folios which are
> > entirely covered by the write.  IOW this is memory we must allocate in
> > order to satisfy the write; we're just allocating it in larger chunks
> > when we can.
> 
> Oh, good! I was assuming you would eventually over-allocate on the write
> path.

We might!  But that would be a different patchset, and it would be
subject to its own discussion.

Something else I've been wondering about is possibly reallocating the
pages on a write.  This would apply to both normal files and shmem.
If you read in a file one byte at a time, then overwrite a big chunk of
it with a large single write, that seems like a good signal that maybe
we should manage that part of the file as a single large chunk instead
of individual pages.  Maybe.

Lots of things for people who are obsessed with performance to play
with ;-)
  
David Hildenbrand Sept. 15, 2023, 3:43 p.m. UTC | #3
On 15.09.23 17:40, Matthew Wilcox wrote:
> On Fri, Sep 15, 2023 at 05:36:27PM +0200, David Hildenbrand wrote:
>> On 15.09.23 17:34, Matthew Wilcox wrote:
>>> No, it can't.  This patchset triggers only on write, not on read or page
>>> fault, and it's conservative, so it will only allocate folios which are
>>> entirely covered by the write.  IOW this is memory we must allocate in
>>> order to satisfy the write; we're just allocating it in larger chunks
>>> when we can.
>>
>> Oh, good! I was assuming you would eventually over-allocate on the write
>> path.
> 
> We might!  But that would be a different patchset, and it would be
> subject to its own discussion.
> 
> Something else I've been wondering about is possibly reallocating the
> pages on a write.  This would apply to both normal files and shmem.
> If you read in a file one byte at a time, then overwrite a big chunk of
> it with a large single write, that seems like a good signal that maybe
> we should manage that part of the file as a single large chunk instead
> of individual pages.  Maybe.
> 
> Lots of things for people who are obsessed with performance to play
> with ;-)

:) Absolutely. ... because if nobody will be consuming that written 
memory any time soon, it might also be the wrong place for a large/huge 
folio.
  
Daniel Gomez Sept. 18, 2023, 7:32 a.m. UTC | #4
On Fri, Sep 15, 2023 at 05:29:51PM +0200, David Hildenbrand wrote:
> On 15.09.23 11:51, Daniel Gomez wrote:
> > This series add support for high order folios in shmem write
> > path.
> >
> > This is a continuation of the shmem work from Luis here [1]
> > following Matthew Wilcox's suggestion [2] regarding the path to take
> > for the folio allocation order calculation.
> >
> > [1] RFC v2 add support for blocksize > PAGE_SIZE
> > https://lore.kernel.org/all/ZHBowMEDfyrAAOWH@bombadil.infradead.org/T/#md3e93ab46ce2ad9254e1eb54ffe71211988b5632
> > [2] https://lore.kernel.org/all/ZHD9zmIeNXICDaRJ@casper.infradead.org/
> >
> > Patches have been tested and sent from next-230911. They do apply
> > cleanly to the latest next-230914.
> >
> > fsx and fstests has been performed on tmpfs with noswap with the
> > following results:
> > - fsx: 2d test, 21,5B
> > - fstests: Same result as baseline for next-230911 [3][4][5]
> >
> > [3] Baseline next-230911 failures are: generic/080 generic/126
> > generic/193 generic/633 generic/689
> > [4] fstests logs baseline: https://gitlab.com/-/snippets/3598621
> > [5] fstests logs patches: https://gitlab.com/-/snippets/3598628
> >
> > There are at least 2 cases/topics to handle that I'd appreciate
> > feedback.
> > 1. With the new strategy, you might end up with a folio order matching
> > HPAGE_PMD_ORDER. However, we won't respect the 'huge' flag anymore if
> > THP is enabled.
> > 2. When the above (1.) occurs, the code skips the huge path, so
> > xa_find with hindex is skipped.
>
> Similar to large anon folios (but different to large non-shmem folios in the
> pagecache), this can result in memory waste.
>
> We discussed that topic in the last bi-weekly mm meeting, and also how to
> eventually configure that for shmem.
>
> Refer to [1] for a summary.
>
> [1] https://lkml.kernel.org/r/4966f496-9f71-460c-b2ab-8661384ce626@arm.com

Thanks for the summary, David (I was missing linux-mm from kvack in lei).

I think capping the max at PMD_ORDER-1 would suffice here to
honor/respect the huge flag, although we would then end up with a
different max value than pagecache/readahead.
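A minimal sketch of that cap (hypothetical helper, not the series code; the HPAGE_PMD_ORDER value is assumed for x86-64 with 4 KiB pages): clamping the computed write-path order below the PMD order would leave the existing huge path as the only source of PMD-sized folios.

```c
#include <assert.h>

#define HPAGE_PMD_ORDER 9 /* assumption: x86-64 with 4 KiB pages */

/* Hypothetical helper: cap a computed write-path folio order to
 * PMD_ORDER - 1 so the 'huge' flag keeps sole control over
 * PMD-sized allocations. */
static unsigned int cap_write_order(unsigned int order)
{
	if (order >= HPAGE_PMD_ORDER)
		order = HPAGE_PMD_ORDER - 1;
	return order;
}
```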
>
> --
> Cheers,
>
> David / dhildenb
>