[v2,0/1] change ->index to PAGE_SIZE for hugetlb pages

Message ID 20230710230450.110064-1-sidhartha.kumar@oracle.com
Headers
Series change ->index to PAGE_SIZE for hugetlb pages |

Message

Sidhartha Kumar July 10, 2023, 11:04 p.m. UTC
  ========================== OVERVIEW ========================================
This patchset attempts to implement a listed filemap TODO which is
changing hugetlb folios to have ->index in PAGE_SIZE. This simplifies many
functions within filemap.c as they have to special case hugetlb pages.
From the RFC v1[1], Mike pointed out that hugetlb will still have to maintain
a huge page sized index as it is used for the reservation map and the hash
function for the hugetlb mutex table.

This patchset adds new wrappers for hugetlb code to to interact with the
page cache. These wrappers calculate a linear page index as this is now
what the page cache expects for hugetlb pages.

From the discussion on HGM for hugetlb[3], there is a want to remove hugetlb
special casing throughout the core mm code. This series accomplishes
a part of this by shifting complexity from filemap.c to hugetlb.c. There
are still checks for hugetlb within the filemap code as cgroup accounting
and hugetlb accounting are special cased as well. 

=========================== PERFORMANCE =====================================
6.4.0-rc5
@hugetlb_add_to_page_cache:                                                                                                                                                                                                     
[512, 1K)           7518 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|                                                                                                                                   
[1K, 2K)             158 |@                                                   |                                                                                                                                   
[2K, 4K)              30 |                                                    |                                                                                                                                   
[4K, 8K)               6 |                                                    |                                                                                                                                   
[8K, 16K)              9 |                                                    

6.5.0-rc1 + this patch series
@hugetlb_add_to_page_cache:                                                                                                                                                                                                     
[512, 1K)           6345 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|                                                                                                                                   
[1K, 2K)            1308 |@@@@@@@@@@                                          |                                                                                                                                   
[2K, 4K)              39 |                                                    |                                                                                                                                   
[4K, 8K)              20 |                                                    |                                                                                                                                   
[8K, 16K)              8 |                                                    |                                                                                                                                   
[16K, 32K)             1 |                                                    |  

6.4.0-rc5
@__filemap_get_folio:                                                                                                                                                                                                    
[256, 512)         11292 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|                                                                                                                                   
[512, 1K)           4615 |@@@@@@@@@@@@@@@@@@@@@                               |                                                                                                                                   
[1K, 2K)             960 |@@@@                                                |                                                                                                                                   
[2K, 4K)             188 |                                                    |                                                                                                                                   
[4K, 8K)              68 |                                                    |                                                                                                                                   
[8K, 16K)             14 |                                                    |                                                                                                                                   
[16K, 32K)             4 |                                                    |                                                                                                                                                                                                                                                                
[2G, 4G)               4 |                                                    |

6.5.0-rc1 + this patch series
@__filemap_get_folio:    
@get_folio_ns:                                                                                                                                                                                                    
[128, 256)            13 |                                                    |                                                                                                                                   
[256, 512)         11131 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[512, 1K)           5544 |@@@@@@@@@@@@@@@@@@@@@@@@@                           |
[1K, 2K)             996 |@@@@                                                |
[2K, 4K)             317 |@                                                   |
[4K, 8K)              76 |                                                    |
[8K, 16K)             23 |                                                    |
[16K, 32K)             6 |                                                    |
[32K, 64K)             1 |                                                    |
[64K, 128K)            0 |                                                    |
[128K, 256K)           0 |                                                    |
[256K, 512K)           0 |                                                    |
[512K, 1M)             0 |                                                    |
[1M, 2M)               0 |                                                    |
[2M, 4M)               0 |                                                    |
[4M, 8M)               0 |                                                    |
[8M, 16M)              0 |                                                    |
[16M, 32M)             0 |                                                    |
[32M, 64M)             0 |                                                    |
[64M, 128M)            0 |                                                    |
[128M, 256M)           0 |                                                    |
[256M, 512M)           0 |                                                    |
[512M, 1G)             0 |                                                    |
[1G, 2G)               0 |                                                    |
[2G, 4G)               3 |                                                    

=========================== TESTING ==========================================
This series passes the LTP hugetlb test cases as well as the libhugetlbfs tests.

********** TEST SUMMARY
*                      2M            
*                      32-bit 64-bit 
*     Total testcases:   110    113   
*             Skipped:     0      0   
*                PASS:   107    113   
*                FAIL:     0      0   
*    Killed by signal:     3      0   
*   Bad configuration:     0      0   
*       Expected FAIL:     0      0   
*     Unexpected PASS:     0      0   
*    Test not present:     0      0   
* Strange test result:     0      0   
**********



RFC v2[2]-> v1:
  -cleanup code style

RFC v1 -> v2
  -change direction of series to maintain both huge and base page size index
   rather than try to get rid of all references to a huge page sized index.

v1 -> v2
  - squash seperate filemap and hugetlb patches into one to allow for bisection
    per Matthew
  - get rid of page_to_index()
  - fix errors in hugetlb_fallocate() and remove_inode_hugepages()


rebased on 07/10/2023 mm-unstable


Sidhartha Kumar (1):
  mm/filemap: remove hugetlb special casing in filemap.c

 fs/hugetlbfs/inode.c    | 10 +++++-----
 include/linux/hugetlb.h | 12 ++++++++++++
 include/linux/pagemap.h | 26 ++------------------------
 mm/filemap.c            | 36 +++++++++++-------------------------
 mm/hugetlb.c            | 11 ++++++-----
 5 files changed, 36 insertions(+), 59 deletions(-)
  

Comments

Mike Kravetz July 20, 2023, midnight UTC | #1
On 07/10/23 16:04, Sidhartha Kumar wrote:
> ========================== OVERVIEW ========================================
> This patchset attempts to implement a listed filemap TODO which is
> changing hugetlb folios to have ->index in PAGE_SIZE. This simplifies many
> functions within filemap.c as they have to special case hugetlb pages.
> From the RFC v1[1], Mike pointed out that hugetlb will still have to maintain
> a huge page sized index as it is used for the reservation map and the hash
> function for the hugetlb mutex table.
> 
> This patchset adds new wrappers for hugetlb code to to interact with the
> page cache. These wrappers calculate a linear page index as this is now
> what the page cache expects for hugetlb pages.
> 
> From the discussion on HGM for hugetlb[3], there is a want to remove hugetlb
> special casing throughout the core mm code. This series accomplishes
> a part of this by shifting complexity from filemap.c to hugetlb.c. There
> are still checks for hugetlb within the filemap code as cgroup accounting
> and hugetlb accounting are special cased as well. 
> 
> =========================== PERFORMANCE =====================================

Hi Sid,

Sorry for being dense but can you tell me what the below performance
information means.  My concern with such a change would be any noticeable
difference in populating a large (up to TB) hugetlb file.  My guess is
that it is going to take longer unless xarray is optimized for this.

We do have users that create and pre-populate hugetlb files this big.
Just want to make sure there are no surprises for them.
  
Matthew Wilcox July 22, 2023, 4:05 a.m. UTC | #2
On Wed, Jul 19, 2023 at 05:00:11PM -0700, Mike Kravetz wrote:
> On 07/10/23 16:04, Sidhartha Kumar wrote:
> > ========================== OVERVIEW ========================================
> > This patchset attempts to implement a listed filemap TODO which is
> > changing hugetlb folios to have ->index in PAGE_SIZE. This simplifies many
> > functions within filemap.c as they have to special case hugetlb pages.
> > From the RFC v1[1], Mike pointed out that hugetlb will still have to maintain
> > a huge page sized index as it is used for the reservation map and the hash
> > function for the hugetlb mutex table.
> > 
> > This patchset adds new wrappers for hugetlb code to to interact with the
> > page cache. These wrappers calculate a linear page index as this is now
> > what the page cache expects for hugetlb pages.
> > 
> > From the discussion on HGM for hugetlb[3], there is a want to remove hugetlb
> > special casing throughout the core mm code. This series accomplishes
> > a part of this by shifting complexity from filemap.c to hugetlb.c. There
> > are still checks for hugetlb within the filemap code as cgroup accounting
> > and hugetlb accounting are special cased as well. 
> > 
> > =========================== PERFORMANCE =====================================
> 
> Hi Sid,
> 
> Sorry for being dense but can you tell me what the below performance
> information means.  My concern with such a change would be any noticeable
> difference in populating a large (up to TB) hugetlb file.  My guess is
> that it is going to take longer unless xarray is optimized for this.
> 
> We do have users that create and pre-populate hugetlb files this big.
> Just want to make sure there are no surprises for them.

It's Going To Depend.  Annoyingly.

Let's say you're using 1GB pages on a 4kB PAGE_SIZE machine.  That's an
order-18 folio, so we end up skipping three layers of the tree, and if
you're going up to 1TB, it's structured:

root -> node (shift 30) -> node (shift 24) -> entry
                                           -> entry (...)
			-> node (shift 24) -> entry
					   (...)
			(...)

This is essentially no different from before where each 1GB page would
occupy a single entry.  It's just that it now occupies 2^18 entries,
and everything in the tree has a different label.

Where you will (may?) see a difference is with the 2MB entries.
An order-9 page doesn't quite fit with the order-6 nodes in the tree,
so it looks like this:

root -> node (s30) -> node (s24) -> node (s18) -> node (s12) -> entry 0
							     -> sibling
							     -> sibling
							     (...)
							     -> entry 8
							     -> sibling
							     -> sibling
							     (...)

so all of a sudden the tree is 8x as big as it used to be.  The upside
is that we lose all the calculations from filemap.c/pagemap.h.  It's a
lot better than it was perhaps five years ago when each 2MB page would
occupy 512 entries, but 8 entries is still worse than 1.

Could we do better?  Undoubtedly.  We could have variable shifts & node
sizes in the tree so that we perhaps had an s18 node that was 8x as large
(4160 bytes), and then each order-9 entry in the tree would occupy one
entry in that special large node.  I've been reluctant to introduce such
a beast without strong evidence it would help.  Or we could introduce a
small s12 node which could only store 8 entries (again an order-9 entry
would occupy one entry in such a special node).

These are things which would only benefit hugetlbfs, so there's a bit
of a chicken-and-egg problem; no demand for the feature until the work
is done, and the work maybe performs badly until the feature exists.

And then some architectures have other orders for their huge pages.
Order 11 is probably the worst possibility to exist (or in general 6n -
1), but I haven't done a detailed survey to figure out if anyone supports
such a thing.