Message ID: <20231123133036.68540-1-gang.li@linux.dev>
From: Gang Li <gang.li@linux.dev>
To: Mike Kravetz <mike.kravetz@oracle.com>, Muchun Song <muchun.song@linux.dev>, Andrew Morton <akpm@linux-foundation.org>
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Gang Li <ligang.bdlg@bytedance.com>
Subject: [RFC PATCH v1 0/4] hugetlb: parallelize hugetlb page allocation on boot
Date: Thu, 23 Nov 2023 21:30:32 +0800
Series: hugetlb: parallelize hugetlb page allocation on boot
Message
Gang Li
Nov. 23, 2023, 1:30 p.m. UTC
From: Gang Li <ligang.bdlg@bytedance.com>
Inspired by these patches [1][2], this series aims to speed up the
initialization of hugetlb during the boot process through
parallelization.
It is particularly effective in large systems. On a machine equipped
with 1TB of memory and two NUMA nodes, the time for hugetlb
initialization was reduced from 2 seconds to 1 second.
As memory capacities continue to grow, the time saved by this
parallelization will only increase.
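
To make the idea concrete: today the requested boot-time huge pages are
allocated serially from a single CPU, and the series spreads that work
across workers. The sketch below only illustrates that splitting scheme;
the struct and function names (hugetlb_alloc_work, hugetlb_alloc_worker,
hugetlb_alloc_parallel) are hypothetical and this is not the code in
these patches:

#include <linux/kthread.h>
#include <linux/completion.h>
#include <linux/nodemask.h>

struct hugetlb_alloc_work {             /* hypothetical per-node job */
        int nid;                        /* NUMA node to allocate from */
        unsigned long nr_pages;         /* this worker's share of the target */
        struct completion done;
};

static struct hugetlb_alloc_work works[MAX_NUMNODES];

static int hugetlb_alloc_worker(void *arg)
{
        struct hugetlb_alloc_work *w = arg;
        unsigned long i;

        for (i = 0; i < w->nr_pages; i++) {
                /* allocate one 2MB huge page on node w->nid (details omitted) */
        }
        complete(&w->done);
        return 0;
}

static void __init hugetlb_alloc_parallel(unsigned long total)
{
        int nid, nodes = num_online_nodes();

        /* Split the target evenly per node; remainder and error handling omitted. */
        for_each_online_node(nid) {
                works[nid].nid = nid;
                works[nid].nr_pages = total / nodes;
                init_completion(&works[nid].done);
                kthread_run(hugetlb_alloc_worker, &works[nid],
                            "hugetlb-alloc/%d", nid);
        }
        for_each_online_node(nid)
                wait_for_completion(&works[nid].done);
}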
This series currently focuses on optimizing 2MB hugetlb. Since
gigantic pages are few in number, their optimization effects
are not as pronounced. We may explore optimizations for
gigantic pages in the future.
Thanks,
Gang Li
Gang Li (4):
hugetlb: code clean for hugetlb_hstate_alloc_pages
hugetlb: split hugetlb_hstate_alloc_pages
hugetlb: add timing to hugetlb allocations on boot
hugetlb: parallelize hugetlb page allocation
mm/hugetlb.c | 191 ++++++++++++++++++++++++++++++++++++---------------
1 file changed, 134 insertions(+), 57 deletions(-)
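
Patch 3 in the list above adds timing around the boot-time allocations.
Its exact output is not shown in this thread; a minimal sketch of how such
boot-time timing is typically done in the kernel, with a hypothetical
wrapper name, looks like the following (the real patch instruments the
existing paths rather than adding a wrapper):

#include <linux/hugetlb.h>
#include <linux/ktime.h>
#include <linux/printk.h>

static void __init hugetlb_alloc_pages_timed(struct hstate *h)
{
        ktime_t start = ktime_get();

        /* ... perform the boot-time allocations for this hstate ... */

        pr_info("HugeTLB: allocating %lu pages of order %u took %lld ms\n",
                h->max_huge_pages, h->order,
                ktime_ms_delta(ktime_get(), start));
}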
Comments
On 23.11.23 14:30, Gang Li wrote:
> From: Gang Li <ligang.bdlg@bytedance.com>
>
> Inspired by these patches [1][2], this series aims to speed up the
> initialization of hugetlb during the boot process through
> parallelization.
>
> It is particularly effective in large systems. On a machine equipped
> with 1TB of memory and two NUMA nodes, the time for hugetlb
> initialization was reduced from 2 seconds to 1 second.

Sorry to say, but why is that a scenario worth adding complexity for /
optimizing for? You don't cover that, so there is a clear lack in the
motivation.

2 vs. 1 second on a 1 TiB system is usually really just noise.
On Thu, 23 Nov 2023, David Hildenbrand wrote:

> On 23.11.23 14:30, Gang Li wrote:
> > From: Gang Li <ligang.bdlg@bytedance.com>
> >
> > Inspired by these patches [1][2], this series aims to speed up the
> > initialization of hugetlb during the boot process through
> > parallelization.
> >
> > It is particularly effective in large systems. On a machine equipped
> > with 1TB of memory and two NUMA nodes, the time for hugetlb
> > initialization was reduced from 2 seconds to 1 second.
>
> Sorry to say, but why is that a scenario worth adding complexity for /
> optimizing for? You don't cover that, so there is a clear lack in the
> motivation.
>
> 2 vs. 1 second on a 1 TiB system is usually really just noise.
>

The cost will continue to grow over time, so I presume that Gang is trying
to get out in front of the issue even though it may not be a large savings
today.

Running single boot tests, with the latest upstream kernel, allocating
1,440 1GB hugetlb pages on a 1.5TB AMD host appears to take 1.47s.

But allocating 11,776 1GB hugetlb pages on a 12TB Intel host takes 65.2s
today with the current implementation. So it's likely something worth
optimizing.

Gang, I'm curious about this in the cover letter:

"""
This series currently focuses on optimizing 2MB hugetlb. Since
gigantic pages are few in number, their optimization effects
are not as pronounced. We may explore optimizations for
gigantic pages in the future.
"""

For >1TB hosts, why the emphasis on 2MB hugetlb? :) I would have expected
1GB pages. Are you really allocating ~500k 2MB hugetlb pages?

So if the patchset optimizes for the more likely scenario on these large
hosts, which would be 1GB pages, that would be great.
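
For reference, boot-time reservations like those measured above are
requested on the kernel command line. The exact parameters used in these
tests are not given in the thread, but a typical line for reserving
11,776 1GB pages would look something like:

        default_hugepagesz=1G hugepagesz=1G hugepages=11776

The series itself targets the analogous 2MB form, e.g.
hugepagesz=2M hugepages=<count>.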
On 24.11.23 20:44, David Rientjes wrote:
> On Thu, 23 Nov 2023, David Hildenbrand wrote:
>
>> On 23.11.23 14:30, Gang Li wrote:
>>> From: Gang Li <ligang.bdlg@bytedance.com>
>>>
>>> Inspired by these patches [1][2], this series aims to speed up the
>>> initialization of hugetlb during the boot process through
>>> parallelization.
>>>
>>> It is particularly effective in large systems. On a machine equipped
>>> with 1TB of memory and two NUMA nodes, the time for hugetlb
>>> initialization was reduced from 2 seconds to 1 second.
>>
>> Sorry to say, but why is that a scenario worth adding complexity for /
>> optimizing for? You don't cover that, so there is a clear lack in the
>> motivation.
>>
>> 2 vs. 1 second on a 1 TiB system is usually really just noise.
>>
>
> The cost will continue to grow over time, so I presume that Gang is trying
> to get out in front of the issue even though it may not be a large savings
> today.
>
> Running single boot tests, with the latest upstream kernel, allocating
> 1,440 1GB hugetlb pages on a 1.5TB AMD host appears to take 1.47s.
>
> But allocating 11,776 1GB hugetlb pages on a 12TB Intel host takes 65.2s
> today with the current implementation.

And there, the 65.2s won't be noise because that 12TB system is up by a
snap of a finger? :)
On Fri, 24 Nov 2023, David Hildenbrand wrote:

> On 24.11.23 20:44, David Rientjes wrote:
> > On Thu, 23 Nov 2023, David Hildenbrand wrote:
> >
> > > On 23.11.23 14:30, Gang Li wrote:
> > > > From: Gang Li <ligang.bdlg@bytedance.com>
> > > >
> > > > Inspired by these patches [1][2], this series aims to speed up the
> > > > initialization of hugetlb during the boot process through
> > > > parallelization.
> > > >
> > > > It is particularly effective in large systems. On a machine equipped
> > > > with 1TB of memory and two NUMA nodes, the time for hugetlb
> > > > initialization was reduced from 2 seconds to 1 second.
> > >
> > > Sorry to say, but why is that a scenario worth adding complexity for /
> > > optimizing for? You don't cover that, so there is a clear lack in the
> > > motivation.
> > >
> > > 2 vs. 1 second on a 1 TiB system is usually really just noise.
> > >
> >
> > The cost will continue to grow over time, so I presume that Gang is trying
> > to get out in front of the issue even though it may not be a large savings
> > today.
> >
> > Running single boot tests, with the latest upstream kernel, allocating
> > 1,440 1GB hugetlb pages on a 1.5TB AMD host appears to take 1.47s.
> >
> > But allocating 11,776 1GB hugetlb pages on a 12TB Intel host takes 65.2s
> > today with the current implementation.
>
> And there, the 65.2s won't be noise because that 12TB system is up by a
> snap of a finger? :)
>

In this single boot test, total boot time was 373.78s, so 1GB hugetlb
allocation is 17.4% of that.

Would love to see what the numbers would look like if 1GB pages were
supported.
On 2023/11/25 04:00, David Rientjes wrote:
> On Fri, 24 Nov 2023, David Hildenbrand wrote:
>
>> And there, the 65.2s won't be noise because that 12TB system is up by a
>> snap of a finger? :)
>>
>
> In this single boot test, total boot time was 373.78s, so 1GB hugetlb
> allocation is 17.4% of that.

Thank you for sharing these data. Currently, I don't have access to a
machine of such large capacity, so the benefits in my tests are not as
pronounced.

I believe testing on a system of this scale would yield significant
benefits.

>
> Would love to see what the numbers would look like if 1GB pages were
> supported.
>

Support for 1GB hugetlb is not yet perfect, so it wasn't included in v1.
But I'm happy to refine and introduce 1GB hugetlb support in future
versions.
Hi David Hildenbrand :),

On 2023/11/23 22:10, David Hildenbrand wrote:
> Sorry to say, but why is that a scenario worth adding complexity for /
> optimizing for? You don't cover that, so there is a clear lack in the
> motivation.

Regarding your concern about complexity, this is indeed something to
consider. There is a precedent of parallelization in pgdata[1] which
might be reused (or other methods) to reduce the complexity of this
series.

[1] https://lore.kernel.org/all/20200527173608.2885243-1-daniel.m.jordan@oracle.com/
On 28.11.23 07:52, Gang Li wrote:
> Hi David Hildenbrand :),
>
> On 2023/11/23 22:10, David Hildenbrand wrote:
>> Sorry to say, but why is that a scenario worth adding complexity for /
>> optimizing for? You don't cover that, so there is a clear lack in the
>> motivation.
>
> Regarding your concern about complexity, this is indeed something to
> consider. There is a precedent of parallelization in pgdata[1] which
> might be reused (or other methods) to reduce the complexity of this
> series.

Yes, please!
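
For context on the precedent mentioned above: the deferred page-init work
referenced in [1] parallelizes boot-time work through the padata
multithreaded-job interface, padata_do_multithreaded(). A rough sketch of
how that interface is driven is shown below; the helper names are
hypothetical, the field names follow the kernels around that series, and
this is illustrative rather than the hugetlb patch itself:

#include <linux/padata.h>
#include <linux/cpumask.h>

/* Each worker thread is handed a [start, end) slice of the job's range. */
static void __init alloc_chunk(unsigned long start, unsigned long end,
                               void *arg)
{
        /* allocate pages start..end-1 for the hstate passed via arg */
}

static void __init alloc_multithreaded(unsigned long nr_pages, void *hstate)
{
        struct padata_mt_job job = {
                .thread_fn      = alloc_chunk,
                .fn_arg         = hstate,
                .start          = 0,
                .size           = nr_pages,
                .align          = 1,
                .min_chunk      = 1024,         /* avoid splitting tiny jobs */
                .max_threads    = num_online_cpus(),
        };

        padata_do_multithreaded(&job);
}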
On Tue, 28 Nov 2023, Gang Li wrote:

> > > And there, the 65.2s won't be noise because that 12TB system is up by a
> > > snap of a finger? :)
> > >
> >
> > In this single boot test, total boot time was 373.78s, so 1GB hugetlb
> > allocation is 17.4% of that.
>
> Thank you for sharing these data. Currently, I don't have access to a
> machine of such large capacity, so the benefits in my tests are not as
> pronounced.
>
> I believe testing on a system of this scale would yield significant
> benefits.
>
> >
> > Would love to see what the numbers would look like if 1GB pages were
> > supported.
> >
>
> Support for 1GB hugetlb is not yet perfect, so it wasn't included in v1.
> But I'm happy to refine and introduce 1GB hugetlb support in future
> versions.
>

That would be very appreciated, thank you! I'm happy to test and collect
data for any proposed patch series on 12TB systems booted with a lot of
1GB hugetlb pages on the kernel command line.