Message ID | 20230307082811.120774-1-chenjun102@huawei.com |
---|---|
State | New |
Headers | show |
Series | [RFC] mm/slub: Reduce memory consumption in extreme scenarios |
Commit Message
Chen Jun
March 7, 2023, 8:28 a.m. UTC
If kmalloc_node() is called without __GFP_THISNODE and node A has no
memory, SLUB will allocate a slab page that does not belong to A and
put it on kmem_cache_node[page_to_nid(page)]. That page cannot be
reused by the next call, because get_partial() will return NULL.
This makes kmalloc_node() consume more memory.
On qemu with 4 NUMA nodes, each with 1G of memory, a test module was
written to call kmalloc_node(196, 0xd20c0, 3) 5 * 1024 * 1024 times.

cat /proc/slabinfo shows:
kmalloc-256 4302317 15151808 256 32 2 : tunables..

The total number of objects is much larger than the number of active
objects.

After this patch, cat /proc/slabinfo shows:
kmalloc-256 5244950 5245088 256 32 2 : tunables..
Signed-off-by: Chen Jun <chenjun102@huawei.com>
---
mm/slub.c | 17 ++++++++++++++---
1 file changed, 14 insertions(+), 3 deletions(-)
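[Editor's note: for readers who want to reproduce the numbers above, here is a minimal sketch of the kind of test module the commit message describes. The module and function names are illustrative; the kmalloc_node() arguments (196-byte objects served from kmalloc-256, GFP mask 0xd20c0 with no __GFP_THISNODE, target node 3) and the iteration count are taken from the commit message.]

/* Hypothetical reproducer module; assumes node 3 is memoryless, as in
 * the 4-node qemu setup described above.
 */
#include <linux/module.h>
#include <linux/slab.h>

static int __init kmalloc_node_test_init(void)
{
	unsigned long i;

	/*
	 * Same call pattern as in the commit message. The objects are
	 * deliberately not freed, so /proc/slabinfo shows the slab
	 * pages that accumulate while the target node has no memory.
	 */
	for (i = 0; i < 5UL * 1024 * 1024; i++)
		kmalloc_node(196, 0xd20c0, 3);

	return 0;
}
module_init(kmalloc_node_test_init);

MODULE_LICENSE("GPL");

[After loading such a module, grep kmalloc-256 /proc/slabinfo should show active/total object counts of the shape quoted above.]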
Comments
On Tue, Mar 07, 2023 at 08:28:11AM +0000, Chen Jun wrote:
> If call kmalloc_node with NO __GFP_THISNODE and node[A] with no memory.
> Slub will alloc a slub page which is not belong to A, and put the page
> to kmem_cache_node[page_to_nid(page)]. The page can not be reused
> at next calling, because NULL will be get from get_partical().
> That make kmalloc_node consume more memory.

Hello, elaborating a little bit:

"When kmalloc_node() is called without __GFP_THISNODE and the target node
lacks sufficient memory, SLUB allocates a folio from a node other than
the requested node, instead of taking a partial slab from it.

However, since the allocated folio does not belong to the requested node,
it is deactivated and added to the partial slab list of the node it
belongs to.

This behavior can result in excessive memory usage when the requested
node has insufficient memory, as SLUB will repeatedly allocate folios
from other nodes without reusing the previously allocated ones.

To prevent memory wastage, take a partial slab from a different node when
the requested node has no partial slab and __GFP_THISNODE is not
explicitly specified."

> On qemu with 4 numas and each numa has 1G memory, Write a test ko
> to call kmalloc_node(196, 0xd20c0, 3) for 5 * 1024 * 1024 times.
>
> cat /proc/slabinfo shows:
> kmalloc-256 4302317 15151808 256 32 2 : tunables..
>
> the total objects is much more then active objects.
>
> After this patch, cat /prac/slubinfo shows:
> kmalloc-256 5244950 5245088 256 32 2 : tunables..
>
> Signed-off-by: Chen Jun <chenjun102@huawei.com>
> ---
>  mm/slub.c | 17 ++++++++++++++---
>  1 file changed, 14 insertions(+), 3 deletions(-)
>
> diff --git a/mm/slub.c b/mm/slub.c
> index 39327e98fce3..c0090a5de54e 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -2384,7 +2384,7 @@ static void *get_partial(struct kmem_cache *s, int node, struct partial_context
>  		searchnode = numa_mem_id();
>
>  	object = get_partial_node(s, get_node(s, searchnode), pc);
> -	if (object || node != NUMA_NO_NODE)
> +	if (object || (node != NUMA_NO_NODE && (pc->flags & __GFP_THISNODE)))
>  		return object;

I think the problem here is to avoid taking a partial slab from a
different node than the requested node even if __GFP_THISNODE is not set
(and then allocating a new slab instead).

Thus this hunk makes sense to me, even if SLUB currently does not
implement __GFP_THISNODE semantics.

>  	return get_any_partial(s, pc);
> @@ -3069,6 +3069,7 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
>  	struct slab *slab;
>  	unsigned long flags;
>  	struct partial_context pc;
> +	int try_thisndoe = 0;
>
>
>  	stat(s, ALLOC_SLOWPATH);
>
> @@ -3181,8 +3182,12 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
>  	}
>
>  new_objects:
> -
>  	pc.flags = gfpflags;
> +
> +	/* Try to get page from specific node even if __GFP_THISNODE is not set */
> +	if (node != NUMA_NO_NODE && !(gfpflags & __GFP_THISNODE) && try_thisnode)
> +		pc.flags |= __GFP_THISNODE;
> +
>  	pc.slab = &slab;
>  	pc.orig_size = orig_size;
>  	freelist = get_partial(s, node, &pc);
> @@ -3190,10 +3195,16 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
>  		goto check_new_slab;
>
>  	slub_put_cpu_ptr(s->cpu_slab);
> -	slab = new_slab(s, gfpflags, node);
> +	slab = new_slab(s, pc.flags, node);
>  	c = slub_get_cpu_ptr(s->cpu_slab);
>
>  	if (unlikely(!slab)) {
> +		/* Try to get page from any other node */
> +		if (node != NUMA_NO_NODE && !(gfpflags & __GFP_THISNODE) && try_thisnode) {
> +			try_thisnode = 0;
> +			goto new_objects;
> +		}
> +
>  		slab_out_of_memory(s, gfpflags, node);
>  		return NULL;

But these hunks do not make sense to me.
Why force __GFP_THISNODE even when the caller did not specify it?

(Apart from the fact that try_thisnode is defined as try_thisndoe,
and try_thisnode is never set to a nonzero value.)

IMHO the first hunk is enough to solve the problem.

Thanks,
Hyeonggon

> }
> --
> 2.17.1
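[Editor's note: to spell out what the first hunk changes, here is a condensed, self-contained sketch of the fallback predicate before and after the patch. The helper names and the flag value are mine, for illustration only; in SLUB the check lives in get_partial() and decides whether get_any_partial() is ever reached.]

#include <stdbool.h>
#include <stdio.h>

#define NUMA_NO_NODE  (-1)
#define GFP_THISNODE  0x1u	/* stand-in for __GFP_THISNODE */
typedef unsigned int gfp_t;

/* Before: any explicit node request skips the cross-node fallback,
 * even when the caller would accept memory from another node. */
static bool skips_fallback_before(gfp_t flags, int node)
{
	(void)flags;
	return node != NUMA_NO_NODE;
}

/* After: only a hard __GFP_THISNODE request skips the fallback, so a
 * request to a memoryless node can reuse other nodes' partial slabs. */
static bool skips_fallback_after(gfp_t flags, int node)
{
	return node != NUMA_NO_NODE && (flags & GFP_THISNODE);
}

int main(void)
{
	/* A kmalloc_node(196, 0xd20c0, 3) style request: explicit node,
	 * no __GFP_THISNODE in the mask. */
	printf("before: skip=%d  after: skip=%d\n",
	       skips_fallback_before(0, 3),
	       skips_fallback_after(0, 3));
	return 0;
}

[For the commit-message workload this prints "before: skip=1  after: skip=0": the unpatched code never falls back for an explicit node, which is exactly why partial slabs deactivated to other nodes were never reused.]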
On Wed, Mar 08, 2023 at 07:16:49AM +0000, chenjun (AM) wrote:
> Hi,
>
> Thanks for the reply.
>
> On 2023/3/7 22:20, Hyeonggon Yoo wrote:
> > On Tue, Mar 07, 2023 at 08:28:11AM +0000, Chen Jun wrote:
> >> If call kmalloc_node with NO __GFP_THISNODE and node[A] with no memory.
> >> Slub will alloc a slub page which is not belong to A, and put the page
> >> to kmem_cache_node[page_to_nid(page)]. The page can not be reused
> >> at next calling, because NULL will be get from get_partical().
> >> That make kmalloc_node consume more memory.
> >
> > Hello,
> >
> > elaborating a little bit:
> >
> > "When kmalloc_node() is called without __GFP_THISNODE and the target node
> > lacks sufficient memory, SLUB allocates a folio from a node other than
> > the requested node, instead of taking a partial slab from it.
> >
> > However, since the allocated folio does not belong to the requested node,
> > it is deactivated and added to the partial slab list of the node it
> > belongs to.
> >
> > This behavior can result in excessive memory usage when the requested
> > node has insufficient memory, as SLUB will repeatedly allocate folios
> > from other nodes without reusing the previously allocated ones.
> >
> > To prevent memory wastage, take a partial slab from a different node when
> > the requested node has no partial slab and __GFP_THISNODE is not
> > explicitly specified."
>
> Thanks, this is clearer than what I described.
>
> >> On qemu with 4 numas and each numa has 1G memory, Write a test ko
> >> to call kmalloc_node(196, 0xd20c0, 3) for 5 * 1024 * 1024 times.
> >>
> >> cat /proc/slabinfo shows:
> >> kmalloc-256 4302317 15151808 256 32 2 : tunables..
> >>
> >> the total objects is much more then active objects.
> >>
> >> After this patch, cat /prac/slubinfo shows:
> >> kmalloc-256 5244950 5245088 256 32 2 : tunables..
> >>
> >> Signed-off-by: Chen Jun <chenjun102@huawei.com>
> >> ---
> >>  mm/slub.c | 17 ++++++++++++++---
> >>  1 file changed, 14 insertions(+), 3 deletions(-)
> >>
> >> diff --git a/mm/slub.c b/mm/slub.c
> >> index 39327e98fce3..c0090a5de54e 100644
> >> --- a/mm/slub.c
> >> +++ b/mm/slub.c
> >> @@ -2384,7 +2384,7 @@ static void *get_partial(struct kmem_cache *s, int node, struct partial_context
> >>  		searchnode = numa_mem_id();
> >>
> >>  	object = get_partial_node(s, get_node(s, searchnode), pc);
> >> -	if (object || node != NUMA_NO_NODE)
> >> +	if (object || (node != NUMA_NO_NODE && (pc->flags & __GFP_THISNODE)))
> >>  		return object;
> >
> > I think the problem here is to avoid taking a partial slab from a
> > different node than the requested node even if __GFP_THISNODE is not set
> > (and then allocating a new slab instead).
> >
> > Thus this hunk makes sense to me,
> > even if SLUB currently does not implement __GFP_THISNODE semantics.
> >
> >>  	return get_any_partial(s, pc);
> >> @@ -3069,6 +3069,7 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
> >>  	struct slab *slab;
> >>  	unsigned long flags;
> >>  	struct partial_context pc;
> >> +	int try_thisndoe = 0;
> >>
> >>
> >>  	stat(s, ALLOC_SLOWPATH);
> >>
> >> @@ -3181,8 +3182,12 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
> >>  	}
> >>
> >>  new_objects:
> >> -
> >>  	pc.flags = gfpflags;
> >> +
> >> +	/* Try to get page from specific node even if __GFP_THISNODE is not set */
> >> +	if (node != NUMA_NO_NODE && !(gfpflags & __GFP_THISNODE) && try_thisnode)
> >> +		pc.flags |= __GFP_THISNODE;
> >> +
> >>  	pc.slab = &slab;
> >>  	pc.orig_size = orig_size;
> >>  	freelist = get_partial(s, node, &pc);
> >> @@ -3190,10 +3195,16 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
> >>  		goto check_new_slab;
> >>
> >>  	slub_put_cpu_ptr(s->cpu_slab);
> >> -	slab = new_slab(s, gfpflags, node);
> >> +	slab = new_slab(s, pc.flags, node);
> >>  	c = slub_get_cpu_ptr(s->cpu_slab);
> >>
> >>  	if (unlikely(!slab)) {
> >> +		/* Try to get page from any other node */
> >> +		if (node != NUMA_NO_NODE && !(gfpflags & __GFP_THISNODE) && try_thisnode) {
> >> +			try_thisnode = 0;
> >> +			goto new_objects;
> >> +		}
> >> +
> >>  		slab_out_of_memory(s, gfpflags, node);
> >>  		return NULL;
> >
> > But these hunks do not make sense to me.
> > Why force __GFP_THISNODE even when the caller did not specify it?
> >
> > (Apart from the fact that try_thisnode is defined as try_thisndoe,
> > and try_thisnode is never set to a nonzero value.)
>
> My mistake, it should be:
> int try_thisnode = 0;

I think it should be try_thisnode = 1?
Otherwise it won't be executed at all.
Also a bool type will be more readable than int.

> > IMHO the first hunk is enough to solve the problem.
>
> I think we should try to alloc a page on the target node before getting
> one from other nodes' partial lists.

You are right. Hmm, then the new behavior when (node != NUMA_NO_NODE) &&
!(gfpflags & __GFP_THISNODE) is:

	1) try to get a partial slab from the target node with __GFP_THISNODE
	2) if 1) failed, try to allocate a new slab from the target node
	   with __GFP_THISNODE
	3) if 2) failed, retry 1) and 2) without the __GFP_THISNODE constraint

When node == NUMA_NO_NODE || (gfpflags & __GFP_THISNODE), the behavior
remains unchanged.

It does not look that crazy to me, although it complicates the code a
little bit. (Vlastimil may have some opinions?)

Now that I understand your intention, I think this behavior change also
needs to be added to the commit log.

Thanks,
Hyeonggon

> If the caller does not specify __GFP_THISNODE, we add __GFP_THISNODE to
> try to get the slab only on the target node. If it fails, use the
> original GFP flags to allow fallback.
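[Editor's note: a self-contained simulation of the three-step retry flow summarized above, with try_thisnode starting as true per the review feedback. Everything here is a simplified stand-in for SLUB internals — the stub allocators merely pretend node 3 is memoryless — so it illustrates the control flow only, not the kernel code.]

#include <stdbool.h>
#include <stdio.h>

#define NUMA_NO_NODE  (-1)
#define GFP_THISNODE  0x1u	/* stand-in for __GFP_THISNODE */
typedef unsigned int gfp_t;

/* Stub allocators: node-constrained attempts against node 3 fail. */
static void *try_partial(gfp_t flags, int node)
{
	return ((flags & GFP_THISNODE) && node == 3) ? NULL : "partial slab";
}

static void *try_new_slab(gfp_t flags, int node)
{
	return ((flags & GFP_THISNODE) && node == 3) ? NULL : "new slab";
}

static void *slab_alloc_sketch(gfp_t gfpflags, int node)
{
	/* Per the review: start constrained (true), not 0. */
	bool try_thisnode = (node != NUMA_NO_NODE) &&
			    !(gfpflags & GFP_THISNODE);
	gfp_t flags;
	void *obj;

new_objects:
	flags = gfpflags;
	if (try_thisnode)
		flags |= GFP_THISNODE;

	/* 1) try a partial slab from the target node */
	obj = try_partial(flags, node);
	if (obj)
		return obj;

	/* 2) try to allocate a fresh slab from the target node */
	obj = try_new_slab(flags, node);
	if (obj)
		return obj;

	/* 3) both failed: drop the constraint once and retry 1) and 2) */
	if (try_thisnode) {
		try_thisnode = false;
		goto new_objects;
	}
	return NULL;	/* genuinely out of memory */
}

int main(void)
{
	/* Memoryless node 3 falls back in step 3; node 0 succeeds in step 1. */
	printf("node 3: %s\n", (char *)slab_alloc_sketch(0, 3));
	printf("node 0: %s\n", (char *)slab_alloc_sketch(0, 0));
	return 0;
}

[The design point of the single-shot retry: a hard __GFP_THISNODE caller never reaches step 3, and a NUMA_NO_NODE caller never enters the constrained pass, so only the ambiguous middle case changes behavior.]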
diff --git a/mm/slub.c b/mm/slub.c
index 39327e98fce3..c0090a5de54e 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -2384,7 +2384,7 @@ static void *get_partial(struct kmem_cache *s, int node, struct partial_context
 		searchnode = numa_mem_id();
 
 	object = get_partial_node(s, get_node(s, searchnode), pc);
-	if (object || node != NUMA_NO_NODE)
+	if (object || (node != NUMA_NO_NODE && (pc->flags & __GFP_THISNODE)))
 		return object;
 
 	return get_any_partial(s, pc);
@@ -3069,6 +3069,7 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
 	struct slab *slab;
 	unsigned long flags;
 	struct partial_context pc;
+	int try_thisndoe = 0;
 
 
 	stat(s, ALLOC_SLOWPATH);
 
@@ -3181,8 +3182,12 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
 	}
 
 new_objects:
-
 	pc.flags = gfpflags;
+
+	/* Try to get page from specific node even if __GFP_THISNODE is not set */
+	if (node != NUMA_NO_NODE && !(gfpflags & __GFP_THISNODE) && try_thisnode)
+		pc.flags |= __GFP_THISNODE;
+
 	pc.slab = &slab;
 	pc.orig_size = orig_size;
 	freelist = get_partial(s, node, &pc);
@@ -3190,10 +3195,16 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
 		goto check_new_slab;
 
 	slub_put_cpu_ptr(s->cpu_slab);
-	slab = new_slab(s, gfpflags, node);
+	slab = new_slab(s, pc.flags, node);
 	c = slub_get_cpu_ptr(s->cpu_slab);
 
 	if (unlikely(!slab)) {
+		/* Try to get page from any other node */
+		if (node != NUMA_NO_NODE && !(gfpflags & __GFP_THISNODE) && try_thisnode) {
+			try_thisnode = 0;
+			goto new_objects;
+		}
+
 		slab_out_of_memory(s, gfpflags, node);
 		return NULL;
 	}
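[Editor's note: the practical impact can be read straight off the two slabinfo lines quoted in the commit message (columns: name, active_objs, num_objs, objsize, objperslab, pagesperslab). A small sketch, using only those published numbers, that computes the memory pinned by unused objects:]

#include <stdio.h>

int main(void)
{
	/* The two kmalloc-256 lines quoted in the commit message:
	 * active_objs, num_objs, objsize (bytes). */
	struct { const char *when; long long active, total, size; } rows[] = {
		{ "before patch", 4302317, 15151808, 256 },
		{ "after patch",  5244950,  5245088, 256 },
	};

	for (int i = 0; i < 2; i++) {
		long long unused = rows[i].total - rows[i].active;

		/* Unused objects sit on other nodes' partial lists,
		 * pinning slab memory that is never reused. */
		printf("%s: %lld of %lld objects unused (~%lld MiB)\n",
		       rows[i].when, unused, rows[i].total,
		       unused * rows[i].size >> 20);
	}
	return 0;
}

[By this arithmetic, roughly 2.6 GiB of the 4 GiB qemu machine was pinned in unused kmalloc-256 objects before the patch, and effectively none after.]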