Message ID | 20230408142517.800549-1-qiang1.zhang@intel.com |
---|---|
State | New |
Headers |
From: Zqiang <qiang1.zhang@intel.com>
To: urezki@gmail.com, paulmck@kernel.org, frederic@kernel.org, joel@joelfernandes.org, qiang1.zhang@intel.com
Cc: qiang.zhang1211@gmail.com, rcu@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: [PATCH v3] rcu/kvfree: Prevents cache growing when the backoff_page_cache_fill is set
Date: Sat, 8 Apr 2023 22:25:17 +0800
Message-Id: <20230408142517.800549-1-qiang1.zhang@intel.com>
X-Mailer: git-send-email 2.32.0
List-ID: <linux-kernel.vger.kernel.org>
Series |
[v3] rcu/kvfree: Prevents cache growing when the backoff_page_cache_fill is set
Commit Message
Zqiang
April 8, 2023, 2:25 p.m. UTC
Currently, kfree_rcu_shrink_scan() runs drain_page_cache() before
kfree_rcu_monitor() to drain the page cache. However, if a bnode's
->gp_snap grace period has already elapsed, kvfree_rcu_bulk() refills
the page cache from within kfree_rcu_monitor(), undoing the drain.
This commit adds a check of the krcp's ->backoff_page_cache_fill flag
in put_cached_bnode(): while the flag is set, the page cache is
prevented from growing, and fill_page_cache_func() allocates no pages.
Signed-off-by: Zqiang <qiang1.zhang@intel.com>
---
kernel/rcu/tree.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
Comments
On Sat, Apr 08, 2023 at 10:25:17PM +0800, Zqiang wrote:
> Currently, kfree_rcu_shrink_scan() runs drain_page_cache() before
> kfree_rcu_monitor() to drain the page cache. [...]
>
> Signed-off-by: Zqiang <qiang1.zhang@intel.com>

Much improved!  But still some questions below...

							Thanx, Paul

> ---
>  kernel/rcu/tree.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index cc34d13be181..9d9d3772cc45 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -2908,6 +2908,8 @@ static inline bool
>  put_cached_bnode(struct kfree_rcu_cpu *krcp,
>  		 struct kvfree_rcu_bulk_data *bnode)
>  {
> +	if (atomic_read(&krcp->backoff_page_cache_fill))
> +		return false;

This will mean that under low-memory conditions, we will keep zero
pages in ->bkvcache.  All attempts to put something there will fail.

This is probably not an issue for structures containing an rcu_head
that are passed to kfree_rcu(p, field), but doesn't this mean that
kfree_rcu_mightsleep() unconditionally invokes synchronize_rcu()?
This could seriously slow up freeing under low-memory conditions,
which might exacerbate the low-memory conditions.

Is this really what we want?  Zero cached rather than just fewer cached?

> 	// Check the limit.
> 	if (krcp->nr_bkv_objs >= rcu_min_cached_objs)
> 		return false;
> @@ -3221,7 +3223,7 @@ static void fill_page_cache_func(struct work_struct *work)
> 	int i;
>
> 	nr_pages = atomic_read(&krcp->backoff_page_cache_fill) ?
> -		1 : rcu_min_cached_objs;
> +		0 : rcu_min_cached_objs;
>
> 	for (i = 0; i < nr_pages; i++) {

I am still confused as to why we start "i" at zero rather than at
->nr_bkv_objs.  What am I missing here?

> 		bnode = (struct kvfree_rcu_bulk_data *)
> --
> 2.32.0
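To make Paul's concern concrete, here is a minimal user-space model (not kernel code; MIN_CACHED, nr_cached, and backoff are invented stand-ins for rcu_min_cached_objs, krcp->nr_bkv_objs, and krcp->backoff_page_cache_fill): once the backoff flag is set, every put into the cache fails, so a drained cache stays at zero pages.

	#include <stdbool.h>
	#include <stdio.h>

	#define MIN_CACHED 5		/* stand-in for rcu_min_cached_objs */

	static int nr_cached;		/* stand-in for krcp->nr_bkv_objs */
	static bool backoff;		/* stand-in for backoff_page_cache_fill */

	/* Models the v3 put_cached_bnode(): refuse every put during backoff. */
	static bool put_cached(void)
	{
		if (backoff)
			return false;	/* v3 change: the cache can never regrow */
		if (nr_cached >= MIN_CACHED)
			return false;	/* the normal limit check */
		nr_cached++;
		return true;
	}

	int main(void)
	{
		backoff = true;		/* the shrinker signalled low memory */
		nr_cached = 0;		/* drain_page_cache() emptied the cache */

		printf("put succeeded: %d\n", put_cached());	/* prints 0 */
		printf("cached pages:  %d\n", nr_cached);	/* stays 0 */
		return 0;
	}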
> Currently, kfree_rcu_shrink_scan() runs drain_page_cache() before
> kfree_rcu_monitor() to drain the page cache. [...]
>
> Much improved!  But still some questions below...
>
> This will mean that under low-memory conditions, we will keep zero
> pages in ->bkvcache.  All attempts to put something there will fail.
>
> This is probably not an issue for structures containing an rcu_head
> that are passed to kfree_rcu(p, field), but doesn't this mean that
> kfree_rcu_mightsleep() unconditionally invokes synchronize_rcu()?
> This could seriously slow up freeing under low-memory conditions,
> which might exacerbate the low-memory conditions.

Thanks for mentioning this, I didn't think of this before😊.

> Is this really what we want?  Zero cached rather than just fewer cached?
>
> I am still confused as to why we start "i" at zero rather than at
> ->nr_bkv_objs.  What am I missing here?

No, you are right, I missed this place.

--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -2908,6 +2908,8 @@ static inline bool
 put_cached_bnode(struct kfree_rcu_cpu *krcp,
 		 struct kvfree_rcu_bulk_data *bnode)
 {
+	if (atomic_read(&krcp->backoff_page_cache_fill))
+		return false;
 	// Check the limit.
 	if (krcp->nr_bkv_objs >= rcu_min_cached_objs)
 		return false;
@@ -3223,7 +3225,7 @@ static void fill_page_cache_func(struct work_struct *work)
 	nr_pages = atomic_read(&krcp->backoff_page_cache_fill) ?
 		1 : rcu_min_cached_objs;

-	for (i = 0; i < nr_pages; i++) {
+	for (i = krcp->nr_bkv_objs; i < nr_pages; i++) {
 		bnode = (struct kvfree_rcu_bulk_data *)
 			__get_free_page(GFP_KERNEL | __GFP_NORETRY | __GFP_NOMEMALLOC | __GFP_NOWARN);

Thoughts?

Thanks
Zqiang
On Tue, Apr 11, 2023 at 04:04:45AM +0000, Zhang, Qiang1 wrote:
> > I am still confused as to why we start "i" at zero rather than at
> > ->nr_bkv_objs.  What am I missing here?
>
> No, you are right, I missed this place.
>
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -2908,6 +2908,8 @@ static inline bool
>  put_cached_bnode(struct kfree_rcu_cpu *krcp,
>  		 struct kvfree_rcu_bulk_data *bnode)
>  {
> +	if (atomic_read(&krcp->backoff_page_cache_fill))
> +		return false;

This is broken, unfortunately. Under a low-memory condition we fill
the cache with at least one page anyway, because we do not want to
hit a slow path.

>  	// Check the limit.
>  	if (krcp->nr_bkv_objs >= rcu_min_cached_objs)
>  		return false;
> @@ -3223,7 +3225,7 @@ static void fill_page_cache_func(struct work_struct *work)
>  	nr_pages = atomic_read(&krcp->backoff_page_cache_fill) ?
>  		1 : rcu_min_cached_objs;
>
> -	for (i = 0; i < nr_pages; i++) {
> +	for (i = krcp->nr_bkv_objs; i < nr_pages; i++) {
>  		bnode = (struct kvfree_rcu_bulk_data *)
>  			__get_free_page(GFP_KERNEL | __GFP_NORETRY | __GFP_NOMEMALLOC | __GFP_NOWARN);

IMHO, it should be sent as a separate patch explaining why it is needed.

--
Uladzislau Rezki
> > +	if (atomic_read(&krcp->backoff_page_cache_fill))
> > +		return false;
>
> This is broken, unfortunately. Under a low-memory condition we fill
> the cache with at least one page anyway, because we do not want to
> hit a slow path.

Thanks for the reminder, please ignore my v4 patch. How about the following?

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 41daae3239b5..e2e8412e687f 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -3238,6 +3238,9 @@ static void fill_page_cache_func(struct work_struct *work)
 			free_page((unsigned long) bnode);
 			break;
 		}
+
+		if (atomic_read(&krcp->backoff_page_cache_fill))
+			break;
 	}

 	atomic_set(&krcp->work_in_progress, 0);

> IMHO, it should be sent as a separate patch explaining why it is needed.

Agree.

Thanks
Zqiang

> --
> Uladzislau Rezki
On Tue, Apr 11, 2023 at 02:42:27PM +0000, Zhang, Qiang1 wrote:
> Thanks for the reminder, please ignore my v4 patch. How about the following?
>
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index 41daae3239b5..e2e8412e687f 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -3238,6 +3238,9 @@ static void fill_page_cache_func(struct work_struct *work)
>  			free_page((unsigned long) bnode);
>  			break;
>  		}
> +
> +		if (atomic_read(&krcp->backoff_page_cache_fill))
> +			break;
>  	}

It does not fix the "issue" you are reporting: the kvfree_rcu_bulk()
function can still fill the cache back. IMHO, the solution here is to
disable the cache under a low-memory condition and enable it back
later on.

The cache size is controlled by the rcu_min_cached_objs variable. We
can set it to 1 and restore it back to the original value to make the
cache operate as before.

--
Uladzislau Rezki
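A minimal sketch of the approach Uladzislau describes, as a user-space model (invented names; the kernel would manipulate rcu_min_cached_objs or an equivalent limit): temporarily lower the effective cache limit to 1 during a low-memory condition, then restore it, so the cache shrinks without being disabled entirely.

	#include <stdio.h>

	#define DEFAULT_MIN_CACHED 5	/* stand-in for rcu_min_cached_objs */

	static int cache_limit = DEFAULT_MIN_CACHED;	/* effective limit */
	static int nr_cached;

	static int put_cached(void)
	{
		if (nr_cached >= cache_limit)
			return 0;
		nr_cached++;
		return 1;
	}

	static void enter_low_memory(void) { cache_limit = 1; }
	static void exit_low_memory(void)  { cache_limit = DEFAULT_MIN_CACHED; }

	int main(void)
	{
		enter_low_memory();
		put_cached();
		put_cached();
		printf("during backoff: %d cached (capped at 1)\n", nr_cached);

		exit_low_memory();
		while (put_cached())
			;
		printf("after restore:  %d cached\n", nr_cached);
		return 0;
	}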
> It does not fix the "issue" you are reporting: the kvfree_rcu_bulk()
> function can still fill the cache back. IMHO, the solution here is to
> disable the cache under a low-memory condition and enable it back
> later on.
>
> The cache size is controlled by the rcu_min_cached_objs variable. We
> can set it to 1 and restore it back to the original value to make the
> cache operate as before.

A good suggestion. One question: when do we need to restore
rcu_min_cached_objs? After the execution of kfree_rcu_shrink_scan()
ends?

Thanks
Zqiang

> --
> Uladzislau Rezki
On Tue, Apr 11, 2023 at 03:09:13PM +0000, Zhang, Qiang1 wrote:
> A good suggestion. One question: when do we need to restore
> rcu_min_cached_objs? After the execution of kfree_rcu_shrink_scan()
> ends?

We do not know when a low memory condition ends :) Therefore we defer
a refill work for a certain time. In fill_page_cache_func() we allow
the cache to operate as normal again:

	...
	atomic_set(&krcp->backoff_page_cache_fill, 0);
	...

--
Uladzislau Rezki
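A rough user-space model of that lifecycle (invented names; the kernel uses delayed work and per-CPU atomics): the shrinker sets the backoff flag and defers a refill, and the refill worker re-enables normal cache operation by clearing the flag, mirroring the atomic_set() shown above.

	#include <stdbool.h>
	#include <stdio.h>

	static bool backoff;	/* stand-in for backoff_page_cache_fill */

	/* Models the shrinker: flag low memory and (conceptually) defer a refill. */
	static void shrinker_scan(void)
	{
		backoff = true;
		/* the kernel schedules the delayed page-cache work here */
	}

	/* Models fill_page_cache_func(): refill, then re-enable normal caching. */
	static void fill_page_cache(void)
	{
		/* ... allocate pages up to the reduced target ... */
		backoff = false;	/* atomic_set(&krcp->backoff_page_cache_fill, 0) */
	}

	int main(void)
	{
		shrinker_scan();
		printf("backoff during low memory: %d\n", backoff);	/* 1 */
		fill_page_cache();
		printf("backoff after refill:      %d\n", backoff);	/* 0 */
		return 0;
	}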
On Tue, Apr 11, 2023 at 04:58:22PM +0200, Uladzislau Rezki wrote:
> It does not fix the "issue" you are reporting: the kvfree_rcu_bulk()
> function can still fill the cache back. IMHO, the solution here is to
> disable the cache under a low-memory condition and enable it back
> later on.
>
> The cache size is controlled by the rcu_min_cached_objs variable. We
> can set it to 1 and restore it back to the original value to make the
> cache operate as before.

It would be best to use a second variable for this.  Users might get
annoyed if their changes to rcu_min_cached_objs got overwritten once
things got set back to normal operation.

							Thanx, Paul
On Tue, Apr 11, 2023 at 09:42:37AM -0700, Paul E. McKenney wrote:
> It would be best to use a second variable for this.  Users might get
> annoyed if their changes to rcu_min_cached_objs got overwritten once
> things got set back to normal operation.

Agree. So we do not make it visible over the sysfs interface, so the
user cannot see that we manipulate it.

--
Uladzislau Rezki
> It would be best to use a second variable for this.  Users might get
> annoyed if their changes to rcu_min_cached_objs got overwritten once
> things got set back to normal operation.
>
> Agree. So we do not make it visible over the sysfs interface, so the
> user cannot see that we manipulate it.

rcu_min_cached_objs is read-only; users cannot set it dynamically:

	-r--r--r-- 1 root root 4.0K Apr 12 01:08 rcu_min_cached_objs

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 41daae3239b5..0e9f83562823 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -2909,7 +2909,8 @@ put_cached_bnode(struct kfree_rcu_cpu *krcp,
 	struct kvfree_rcu_bulk_data *bnode)
 {
 	// Check the limit.
-	if (krcp->nr_bkv_objs >= rcu_min_cached_objs)
+	if ((atomic_read(&krcp->backoff_page_cache_fill) && krcp->nr_bkv_objs) ||
+			krcp->nr_bkv_objs >= rcu_min_cached_objs)
 		return false;

 	llist_add((struct llist_node *) bnode, &krcp->bkvcache);

Thoughts?

Thanks
Zqiang

> --
> Uladzislau Rezki
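A quick user-space check of the predicate Zqiang proposes (the condition is transcribed from the diff above; MIN_CACHED stands in for rcu_min_cached_objs): during backoff it refuses any put once a single page is cached, so the cache is capped at one page rather than zero.

	#include <stdbool.h>
	#include <stdio.h>

	#define MIN_CACHED 5	/* stand-in for rcu_min_cached_objs */

	/* The limit check from the diff above, transcribed as a predicate. */
	static bool put_refused(bool backoff, int nr_cached)
	{
		return (backoff && nr_cached) || nr_cached >= MIN_CACHED;
	}

	int main(void)
	{
		/* Backoff set: the first put is allowed, the second is refused. */
		printf("%d %d\n", put_refused(true, 0), put_refused(true, 1));   /* 0 1 */
		/* Backoff clear: puts are allowed up to MIN_CACHED. */
		printf("%d %d\n", put_refused(false, 4), put_refused(false, 5)); /* 0 1 */
		return 0;
	}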
On Wed, Apr 12, 2023 at 09:14:15AM +0000, Zhang, Qiang1 wrote:
> rcu_min_cached_objs is read-only; users cannot set it dynamically:
>
> 	-r--r--r-- 1 root root 4.0K Apr 12 01:08 rcu_min_cached_objs

You can set it as a boot parameter: rcutree.rcu_min_cached_objs=XXX

> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index 41daae3239b5..0e9f83562823 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -2909,7 +2909,8 @@ put_cached_bnode(struct kfree_rcu_cpu *krcp,
>  	struct kvfree_rcu_bulk_data *bnode)
>  {
>  	// Check the limit.
> -	if (krcp->nr_bkv_objs >= rcu_min_cached_objs)
> +	if ((atomic_read(&krcp->backoff_page_cache_fill) && krcp->nr_bkv_objs) ||
> +			krcp->nr_bkv_objs >= rcu_min_cached_objs)
>  		return false;

We can eliminate the backoff_page_cache_fill per-CPU atomic variable
and instead adjust a new one, say min_cached_objs, under a low-memory
condition, then restore it to its default, which is rcu_min_cached_objs.

I can post an example here if it helps to make it more clear.

--
Uladzislau Rezki
> You can set it as a boot parameter: rcutree.rcu_min_cached_objs=XXX
>
> We can eliminate the backoff_page_cache_fill per-CPU atomic variable
> and instead adjust a new one, say min_cached_objs, under a low-memory
> condition, then restore it to its default, which is rcu_min_cached_objs.

Thanks for the suggestion, I will modify it this way.

> I can post an example here if it helps to make it more clear.
>
> --
> Uladzislau Rezki
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index cc34d13be181..9d9d3772cc45 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -2908,6 +2908,8 @@ static inline bool
 put_cached_bnode(struct kfree_rcu_cpu *krcp,
 	struct kvfree_rcu_bulk_data *bnode)
 {
+	if (atomic_read(&krcp->backoff_page_cache_fill))
+		return false;
 	// Check the limit.
 	if (krcp->nr_bkv_objs >= rcu_min_cached_objs)
 		return false;
@@ -3221,7 +3223,7 @@ static void fill_page_cache_func(struct work_struct *work)
 	int i;

 	nr_pages = atomic_read(&krcp->backoff_page_cache_fill) ?
-		1 : rcu_min_cached_objs;
+		0 : rcu_min_cached_objs;

 	for (i = 0; i < nr_pages; i++) {
 		bnode = (struct kvfree_rcu_bulk_data *)
 			__get_free_page(GFP_KERNEL | __GFP_NORETRY | __GFP_NOMEMALLOC | __GFP_NOWARN);
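For reference, a small user-space model of the fill-loop fix discussed in the thread (invented names; fill_cache() stands in for fill_page_cache_func(), and the increment stands in for __get_free_page() plus llist_add()): starting the loop index at the current cache occupancy, as Paul suggested, allocates only the shortfall instead of over-filling.

	#include <stdio.h>

	#define MIN_CACHED 5	/* stand-in for rcu_min_cached_objs */

	static int nr_cached;	/* stand-in for krcp->nr_bkv_objs */

	static void fill_cache(int target)
	{
		int i;

		/* Start at the current occupancy so only the shortfall is allocated. */
		for (i = nr_cached; i < target; i++)
			nr_cached++;	/* stands in for __get_free_page() + llist_add() */
	}

	int main(void)
	{
		nr_cached = 3;			/* the cache is already partly full */
		fill_cache(MIN_CACHED);
		printf("cached: %d\n", nr_cached);	/* 5, not 8 */
		return 0;
	}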