Message ID: 20231123193937.11628-3-ddrokosov@salutedevices.com
State: New
Headers:
From: Dmitry Rokosov <ddrokosov@salutedevices.com>
To: rostedt@goodmis.org, mhiramat@kernel.org, hannes@cmpxchg.org, mhocko@kernel.org, roman.gushchin@linux.dev, shakeelb@google.com, muchun.song@linux.dev, mhocko@suse.com, akpm@linux-foundation.org
Cc: kernel@sberdevices.ru, rockosov@gmail.com, cgroups@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, bpf@vger.kernel.org
Subject: [PATCH v3 2/2] mm: memcg: introduce new event to trace shrink_memcg
Date: Thu, 23 Nov 2023 22:39:37 +0300
Message-ID: <20231123193937.11628-3-ddrokosov@salutedevices.com>
In-Reply-To: <20231123193937.11628-1-ddrokosov@salutedevices.com>
References: <20231123193937.11628-1-ddrokosov@salutedevices.com>
Series: mm: memcg: improve vmscan tracepoints
Commit Message
Dmitry Rokosov
Nov. 23, 2023, 7:39 p.m. UTC
The shrink_memcg flow plays a crucial role in memcg reclamation.
Currently, it is not possible to trace this point from non-direct
reclaim paths. Direct reclaim, however, already has its own tracepoint,
so there is no issue there. When debugging memcg pressure, developers
may need to identify all potential requests for memcg reclamation,
including those issued by kswapd(). This patchset introduces the
tracepoints mm_vmscan_memcg_shrink_{begin|end}() to address this
problem.
Example of output in the kswapd context (non-direct reclaim):
kswapd0-39 [001] ..... 240.356378: mm_vmscan_memcg_shrink_begin: order=0 gfp_flags=GFP_KERNEL memcg=16
kswapd0-39 [001] ..... 240.356396: mm_vmscan_memcg_shrink_end: nr_reclaimed=0 memcg=16
kswapd0-39 [001] ..... 240.356420: mm_vmscan_memcg_shrink_begin: order=0 gfp_flags=GFP_KERNEL memcg=16
kswapd0-39 [001] ..... 240.356454: mm_vmscan_memcg_shrink_end: nr_reclaimed=1 memcg=16
kswapd0-39 [001] ..... 240.356479: mm_vmscan_memcg_shrink_begin: order=0 gfp_flags=GFP_KERNEL memcg=16
kswapd0-39 [001] ..... 240.356506: mm_vmscan_memcg_shrink_end: nr_reclaimed=4 memcg=16
kswapd0-39 [001] ..... 240.356525: mm_vmscan_memcg_shrink_begin: order=0 gfp_flags=GFP_KERNEL memcg=16
kswapd0-39 [001] ..... 240.356593: mm_vmscan_memcg_shrink_end: nr_reclaimed=11 memcg=16
kswapd0-39 [001] ..... 240.356614: mm_vmscan_memcg_shrink_begin: order=0 gfp_flags=GFP_KERNEL memcg=16
kswapd0-39 [001] ..... 240.356738: mm_vmscan_memcg_shrink_end: nr_reclaimed=25 memcg=16
kswapd0-39 [001] ..... 240.356790: mm_vmscan_memcg_shrink_begin: order=0 gfp_flags=GFP_KERNEL memcg=16
kswapd0-39 [001] ..... 240.357125: mm_vmscan_memcg_shrink_end: nr_reclaimed=53 memcg=16
Signed-off-by: Dmitry Rokosov <ddrokosov@salutedevices.com>
---
include/trace/events/vmscan.h | 22 ++++++++++++++++++++++
mm/vmscan.c | 7 +++++++
2 files changed, 29 insertions(+)
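As a quick illustration of consuming the new events, here is a small post-processing sketch. It is not part of the patch; the helper name and the sample lines are illustrative, taken from the example output format above. It sums nr_reclaimed per memcg id across all shrink_end events:

```python
import re

# Matches the mm_vmscan_memcg_shrink_end lines from the example output above.
END_RE = re.compile(r"mm_vmscan_memcg_shrink_end: nr_reclaimed=(\d+) memcg=(\d+)")

def total_reclaimed(trace_lines):
    """Sum nr_reclaimed per memcg id across all shrink_end events."""
    totals = {}
    for line in trace_lines:
        m = END_RE.search(line)
        if m:
            nr, memcg = int(m.group(1)), int(m.group(2))
            totals[memcg] = totals.get(memcg, 0) + nr
    return totals

sample = [
    "kswapd0-39 [001] ..... 240.356396: mm_vmscan_memcg_shrink_end: nr_reclaimed=0 memcg=16",
    "kswapd0-39 [001] ..... 240.356454: mm_vmscan_memcg_shrink_end: nr_reclaimed=1 memcg=16",
    "kswapd0-39 [001] ..... 240.356506: mm_vmscan_memcg_shrink_end: nr_reclaimed=4 memcg=16",
    "kswapd0-39 [001] ..... 240.356593: mm_vmscan_memcg_shrink_end: nr_reclaimed=11 memcg=16",
    "kswapd0-39 [001] ..... 240.356738: mm_vmscan_memcg_shrink_end: nr_reclaimed=25 memcg=16",
    "kswapd0-39 [001] ..... 240.357125: mm_vmscan_memcg_shrink_end: nr_reclaimed=53 memcg=16",
]
print(total_reclaimed(sample))  # {16: 94}
```

In practice the lines would come from trace_pipe with the two vmscan events enabled; only the final fields of each line matter to the regex.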
Comments
On Thu, Nov 23, 2023 at 10:39:37PM +0300, Dmitry Rokosov wrote:
> The shrink_memcg flow plays a crucial role in memcg reclamation.
> Currently, it is not possible to trace this point from non-direct
> reclaim paths. However, direct reclaim has its own tracepoint, so there
> is no issue there.
[...]
> Signed-off-by: Dmitry Rokosov <ddrokosov@salutedevices.com>
> ---
>  include/trace/events/vmscan.h | 22 ++++++++++++++++++++++
>  mm/vmscan.c                   |  7 +++++++
>  2 files changed, 29 insertions(+)
>
> diff --git a/include/trace/events/vmscan.h b/include/trace/events/vmscan.h
> index e9093fa1c924..a4686afe571d 100644
> --- a/include/trace/events/vmscan.h
> +++ b/include/trace/events/vmscan.h
> @@ -180,6 +180,17 @@ DEFINE_EVENT(mm_vmscan_memcg_reclaim_begin_template, mm_vmscan_memcg_softlimit_r
>  	TP_ARGS(order, gfp_flags, memcg)
>  );
>
> +DEFINE_EVENT(mm_vmscan_memcg_reclaim_begin_template, mm_vmscan_memcg_shrink_begin,
> +
> +	TP_PROTO(int order, gfp_t gfp_flags, const struct mem_cgroup *memcg),
> +
> +	TP_ARGS(order, gfp_flags, memcg)
> +);
> +
> +#else
> +
> +#define trace_mm_vmscan_memcg_shrink_begin(...)
> +
>  #endif /* CONFIG_MEMCG */
>
>  DECLARE_EVENT_CLASS(mm_vmscan_direct_reclaim_end_template,
> @@ -243,6 +254,17 @@ DEFINE_EVENT(mm_vmscan_memcg_reclaim_end_template, mm_vmscan_memcg_softlimit_rec
>  	TP_ARGS(nr_reclaimed, memcg)
>  );
>
> +DEFINE_EVENT(mm_vmscan_memcg_reclaim_end_template, mm_vmscan_memcg_shrink_end,
> +
> +	TP_PROTO(unsigned long nr_reclaimed, const struct mem_cgroup *memcg),
> +
> +	TP_ARGS(nr_reclaimed, memcg)
> +);
> +
> +#else
> +
> +#define trace_mm_vmscan_memcg_shrink_end(...)
> +
>  #endif /* CONFIG_MEMCG */
>
>  TRACE_EVENT(mm_shrink_slab_start,
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 45780952f4b5..f7e3ddc5a7ad 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -6461,6 +6461,10 @@ static void shrink_node_memcgs(pg_data_t *pgdat, struct scan_control *sc)
>  		 */
>  		cond_resched();
>
> +		trace_mm_vmscan_memcg_shrink_begin(sc->order,
> +						   sc->gfp_mask,
> +						   memcg);
> +

If you place the start of the trace here, you may have only the begin
trace for memcgs whose usage are below their min or low limits. Is that
fine? Otherwise you can put it just before shrink_lruvec() call.

>  		mem_cgroup_calculate_protection(target_memcg, memcg);
>
>  		if (mem_cgroup_below_min(target_memcg, memcg)) {
> @@ -6491,6 +6495,9 @@ static void shrink_node_memcgs(pg_data_t *pgdat, struct scan_control *sc)
>  		shrink_slab(sc->gfp_mask, pgdat->node_id, memcg,
>  			    sc->priority);
>
> +		trace_mm_vmscan_memcg_shrink_end(sc->nr_reclaimed - reclaimed,
> +						 memcg);
> +
>  		/* Record the group's reclaim efficiency */
>  		if (!sc->proactive)
>  			vmpressure(sc->gfp_mask, memcg, false,
> --
> 2.36.0
On Sat, Nov 25, 2023 at 06:36:16AM +0000, Shakeel Butt wrote:
> On Thu, Nov 23, 2023 at 10:39:37PM +0300, Dmitry Rokosov wrote:
[...]
> > +		trace_mm_vmscan_memcg_shrink_begin(sc->order,
> > +						   sc->gfp_mask,
> > +						   memcg);
> > +
>
> If you place the start of the trace here, you may have only the begin
> trace for memcgs whose usage are below their min or low limits. Is that
> fine? Otherwise you can put it just before shrink_lruvec() call.

From my point of view, it's fine. For situations like the one you
described, when we only see the begin() tracepoint raised without the
end(), we understand that reclaim requests are being made but cannot be
satisfied due to certain conditions within memcg (such as limits).

There may be some spam tracepoints in the trace pipe, which is a
disadvantage of this approach.

How important do you think it is to understand such situations? Or do
you suggest moving the begin() tracepoint after the memcg limits checks
and don't care about it?
On Sat, Nov 25, 2023 at 11:01:37AM +0300, Dmitry Rokosov wrote:
[...]
> > If you place the start of the trace here, you may have only the begin
> > trace for memcgs whose usage are below their min or low limits. Is that
> > fine? Otherwise you can put it just before shrink_lruvec() call.
>
> From my point of view, it's fine. For situations like the one you
> described, when we only see the begin() tracepoint raised without the
> end(), we understand that reclaim requests are being made but cannot be
> satisfied due to certain conditions within memcg (such as limits).
>
> There may be some spam tracepoints in the trace pipe, which is a
> disadvantage of this approach.
>
> How important do you think it is to understand such situations? Or do
> you suggest moving the begin() tracepoint after the memcg limits checks
> and don't care about it?

I was mainly wondering if that is intentional. It seems like you as
first user of this trace has a need to know that a reclaim for a given
memcg was triggered but due to min/low limits no reclaim was done. This
is a totally reasonable use-case.
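The begin-without-end pattern discussed above can be spotted offline from a captured trace. The following is a hypothetical post-processing sketch, not part of the patch; the helper name and the sample lines are assumptions based on the event format shown in the commit message:

```python
import re

# Matches both shrink_begin and shrink_end lines from the trace output.
EVENT_RE = re.compile(
    r"(?P<task>\S+)\s+\[\d+\].*?mm_vmscan_memcg_shrink_(?P<kind>begin|end):"
    r".*?memcg=(?P<memcg>\d+)"
)

def unmatched_begins(trace_lines):
    """Return memcg ids that saw a shrink_begin with no matching shrink_end,
    i.e. candidates skipped (e.g. due to min/low protection)."""
    pending = {}  # (task, memcg) -> count of unmatched begin events
    for line in trace_lines:
        m = EVENT_RE.search(line)
        if not m:
            continue
        key = (m.group("task"), int(m.group("memcg")))
        if m.group("kind") == "begin":
            pending[key] = pending.get(key, 0) + 1
        elif pending.get(key, 0) > 0:
            pending[key] -= 1
    return sorted({memcg for (_, memcg), n in pending.items() if n > 0})

sample = [
    "kswapd0-39 [001] ..... 240.1: mm_vmscan_memcg_shrink_begin: order=0 gfp_flags=GFP_KERNEL memcg=16",
    "kswapd0-39 [001] ..... 240.2: mm_vmscan_memcg_shrink_end: nr_reclaimed=4 memcg=16",
    "kswapd0-39 [001] ..... 240.3: mm_vmscan_memcg_shrink_begin: order=0 gfp_flags=GFP_KERNEL memcg=23",
]
print(unmatched_begins(sample))  # [23]
```

Pairing per task avoids mixing events from concurrent reclaimers; a leftover begin count for a memcg indicates reclaim was requested but never completed for it.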
On Thu, Nov 23, 2023 at 10:39:37PM +0300, Dmitry Rokosov wrote:
> The shrink_memcg flow plays a crucial role in memcg reclamation.
[...]
> Signed-off-by: Dmitry Rokosov <ddrokosov@salutedevices.com>

Acked-by: Shakeel Butt <shakeelb@google.com>
On Thu 23-11-23 22:39:37, Dmitry Rokosov wrote:
> The shrink_memcg flow plays a crucial role in memcg reclamation.
[...]
> kswapd0-39 [001] ..... 240.357125: mm_vmscan_memcg_shrink_end: nr_reclaimed=53 memcg=16

In the previous version I have asked why do we need this specific
tracepoint when we already do have trace_mm_vmscan_lru_shrink_{in}active
which already give you a very good insight. That includes the number of
reclaimed pages but also more. I do see that we do not include memcg id
of the reclaimed LRU, but that shouldn't be a big problem to add, no?
On Mon, Nov 27, 2023 at 10:33:49AM +0100, Michal Hocko wrote:
> On Thu 23-11-23 22:39:37, Dmitry Rokosov wrote:
[...]
> In the previous version I have asked why do we need this specific
> tracepoint when we already do have trace_mm_vmscan_lru_shrink_{in}active
> which already give you a very good insight. That includes the number of
> reclaimed pages but also more. I do see that we do not include memcg id
> of the reclaimed LRU, but that shouldn't be a big problem to add, no?

From my point of view, memcg reclaim includes two points: LRU shrink and
slab shrink, as mentioned in the vmscan.c file.

static void shrink_node_memcgs(pg_data_t *pgdat, struct scan_control *sc)
...
	reclaimed = sc->nr_reclaimed;
	scanned = sc->nr_scanned;

	shrink_lruvec(lruvec, sc);

	shrink_slab(sc->gfp_mask, pgdat->node_id, memcg,
		    sc->priority);
...

So, both of these operations are important for understanding whether
memcg reclaiming was successful or not, as well as its effectiveness. I
believe it would be beneficial to summarize them, which is why I have
created new tracepoints.
On Wed, Nov 29, 2023 at 06:20:57PM +0300, Dmitry Rokosov wrote: > On Tue, Nov 28, 2023 at 10:32:50AM +0100, Michal Hocko wrote: > > On Mon 27-11-23 19:16:37, Dmitry Rokosov wrote: > > > On Mon, Nov 27, 2023 at 01:50:22PM +0100, Michal Hocko wrote: > > > > On Mon 27-11-23 14:36:44, Dmitry Rokosov wrote: > > > > > On Mon, Nov 27, 2023 at 10:33:49AM +0100, Michal Hocko wrote: > > > > > > On Thu 23-11-23 22:39:37, Dmitry Rokosov wrote: > > > > > > > The shrink_memcg flow plays a crucial role in memcg reclamation. > > > > > > > Currently, it is not possible to trace this point from non-direct > > > > > > > reclaim paths. However, direct reclaim has its own tracepoint, so there > > > > > > > is no issue there. In certain cases, when debugging memcg pressure, > > > > > > > developers may need to identify all potential requests for memcg > > > > > > > reclamation including kswapd(). The patchset introduces the tracepoints > > > > > > > mm_vmscan_memcg_shrink_{begin|end}() to address this problem. > > > > > > > > > > > > > > Example of output in the kswapd context (non-direct reclaim): > > > > > > > kswapd0-39 [001] ..... 240.356378: mm_vmscan_memcg_shrink_begin: order=0 gfp_flags=GFP_KERNEL memcg=16 > > > > > > > kswapd0-39 [001] ..... 240.356396: mm_vmscan_memcg_shrink_end: nr_reclaimed=0 memcg=16 > > > > > > > kswapd0-39 [001] ..... 240.356420: mm_vmscan_memcg_shrink_begin: order=0 gfp_flags=GFP_KERNEL memcg=16 > > > > > > > kswapd0-39 [001] ..... 240.356454: mm_vmscan_memcg_shrink_end: nr_reclaimed=1 memcg=16 > > > > > > > kswapd0-39 [001] ..... 240.356479: mm_vmscan_memcg_shrink_begin: order=0 gfp_flags=GFP_KERNEL memcg=16 > > > > > > > kswapd0-39 [001] ..... 240.356506: mm_vmscan_memcg_shrink_end: nr_reclaimed=4 memcg=16 > > > > > > > kswapd0-39 [001] ..... 240.356525: mm_vmscan_memcg_shrink_begin: order=0 gfp_flags=GFP_KERNEL memcg=16 > > > > > > > kswapd0-39 [001] ..... 
240.356593: mm_vmscan_memcg_shrink_end: nr_reclaimed=11 memcg=16 > > > > > > > kswapd0-39 [001] ..... 240.356614: mm_vmscan_memcg_shrink_begin: order=0 gfp_flags=GFP_KERNEL memcg=16 > > > > > > > kswapd0-39 [001] ..... 240.356738: mm_vmscan_memcg_shrink_end: nr_reclaimed=25 memcg=16 > > > > > > > kswapd0-39 [001] ..... 240.356790: mm_vmscan_memcg_shrink_begin: order=0 gfp_flags=GFP_KERNEL memcg=16 > > > > > > > kswapd0-39 [001] ..... 240.357125: mm_vmscan_memcg_shrink_end: nr_reclaimed=53 memcg=16 > > > > > > > > > > > > In the previous version I have asked why do we need this specific > > > > > > tracepoint when we already do have trace_mm_vmscan_lru_shrink_{in}active > > > > > > which already give you a very good insight. That includes the number of > > > > > > reclaimed pages but also more. I do see that we do not include memcg id > > > > > > of the reclaimed LRU, but that shouldn't be a big problem to add, no? > > > > > > > > > > >From my point of view, memcg reclaim includes two points: LRU shrink and > > > > > slab shrink, as mentioned in the vmscan.c file. > > > > > > > > > > > > > > > static void shrink_node_memcgs(pg_data_t *pgdat, struct scan_control *sc) > > > > > ... > > > > > reclaimed = sc->nr_reclaimed; > > > > > scanned = sc->nr_scanned; > > > > > > > > > > shrink_lruvec(lruvec, sc); > > > > > > > > > > shrink_slab(sc->gfp_mask, pgdat->node_id, memcg, > > > > > sc->priority); > > > > > ... > > > > > > > > > > So, both of these operations are important for understanding whether > > > > > memcg reclaiming was successful or not, as well as its effectiveness. I > > > > > believe it would be beneficial to summarize them, which is why I have > > > > > created new tracepoints. > > > > > > > > This sounds like nice to have rather than must. Put it differently. If > > > > you make existing reclaim trace points memcg aware (print memcg id) then > > > > what prevents you from making analysis you need? 
> > > > > > You are right, nothing prevents me from making this analysis... but... > > > > > > This approach does have some disadvantages: > > > 1) It requires more changes to vmscan. At the very least, the memcg > > > object should be forwarded to all subfunctions for LRU and SLAB > > > shrinkers. > > > > We should have lruvec or memcg available. lruvec_memcg() could be used > > to get memcg from the lruvec. It might be more places to add the id but > > arguably this would improve them to identify where the memory has been > > scanned/reclaimed from. > > > > Oh, thank you, didn't see this conversion function before... > > > > 2) With this approach, we will not have the ability to trace a situation > > > where the kernel is requesting reclaim for a specific memcg, but due to > > > limits issues, we are unable to run it. > > > > I do not follow. Could you be more specific please? > > > > I'm referring to a situation where kswapd() or another kernel mm code > requests some reclaim pages from memcg, but memcg rejects it due to > limits checkers. This occurs in the shrink_node_memcgs() function. > > === > mem_cgroup_calculate_protection(target_memcg, memcg); > > if (mem_cgroup_below_min(target_memcg, memcg)) { > /* > * Hard protection. > * If there is no reclaimable memory, OOM. > */ > continue; > } else if (mem_cgroup_below_low(target_memcg, memcg)) { > /* > * Soft protection. > * Respect the protection only as long as > * there is an unprotected supply > * of reclaimable memory from other cgroups. > */ > if (!sc->memcg_low_reclaim) { > sc->memcg_low_skipped = 1; > continue; > } > memcg_memory_event(memcg, MEMCG_LOW); > } > === > > With separate shrink begin()/end() tracepoints we can detect such > problem. > > > > > 3) LRU and SLAB shrinkers are too common places to handle memcg-related > > > tasks. Additionally, memcg can be disabled in the kernel configuration. > > > > Right. This could be all hidden in the tracing code. 
> > You simply do not
> > print the memcg id when the controller is disabled. Or just simply
> > print 0. I do not really see any major problems with that.
> >
> > I would really prefer to focus on that direction rather than adding
> > another begin/end tracepoint which overlaps with the existing begin/end
> > traces and provides much more limited information, because I would bet we
> > will have somebody complaining that a mere nr_reclaimed is not sufficient.
>
> Okay, I will try to prepare a new patch version with memcg printing from
> the lruvec and slab tracepoints.
>
> Then Andrew should drop the previous patchsets, I suppose. Please advise
> on the correct workflow steps here.

Actually, it has already been merged into linux-next... I just checked.

Maybe it would be better to prepare lruvec and slab memcg printing as a
separate patch series?

https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?id=0e7f0c52a76cb22c8633f21bff6e48fabff6016e
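The protection check quoted earlier (mem_cgroup_below_min()/mem_cgroup_below_low() in shrink_node_memcgs()) boils down to a three-way decision. A condensed sketch, with hypothetical enum and parameter names of my own rather than the kernel API: a memcg below its min protection is always skipped, one below low is skipped unless low reclaim was explicitly requested, and everything else proceeds to reclaim.

```c
#include <assert.h>
#include <stdbool.h>

/* Sketch of the three outcomes in shrink_node_memcgs() (names are mine). */
enum reclaim_verdict { RECLAIM_SKIP_MIN, RECLAIM_SKIP_LOW, RECLAIM_PROCEED };

/* below_min/below_low mimic mem_cgroup_below_min()/_low() results;
 * memcg_low_reclaim mimics sc->memcg_low_reclaim. */
static enum reclaim_verdict protection_verdict(bool below_min, bool below_low,
					       bool memcg_low_reclaim)
{
	if (below_min)
		return RECLAIM_SKIP_MIN;	/* hard protection: never reclaim */
	if (below_low && !memcg_low_reclaim)
		return RECLAIM_SKIP_LOW;	/* soft protection: skip for now */
	return RECLAIM_PROCEED;			/* reclaim (may fire MEMCG_LOW) */
}
```

The two skip branches are exactly the paths Dmitry says the existing LRU/slab tracepoints cannot see, because reclaim for that memcg never starts.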
On Wed, Nov 29, 2023 at 05:06:37PM +0100, Michal Hocko wrote:
> On Wed 29-11-23 18:20:57, Dmitry Rokosov wrote:
> > On Tue, Nov 28, 2023 at 10:32:50AM +0100, Michal Hocko wrote:
> > > On Mon 27-11-23 19:16:37, Dmitry Rokosov wrote:
> [...]
> > > > 2) With this approach, we will not have the ability to trace a situation
> > > > where the kernel is requesting reclaim for a specific memcg, but due to
> > > > limits issues, we are unable to run it.
> > >
> > > I do not follow. Could you be more specific please?
> > >
> >
> > I'm referring to a situation where kswapd() or other kernel mm code
> > requests reclaim of some pages from a memcg, but the memcg rejects it due
> > to limit checks. This occurs in the shrink_node_memcgs() function.
>
> Ohh, you mean reclaim protection
>
> > ===
> > mem_cgroup_calculate_protection(target_memcg, memcg);
> >
> > if (mem_cgroup_below_min(target_memcg, memcg)) {
> > 	/*
> > 	 * Hard protection.
> > 	 * If there is no reclaimable memory, OOM.
> > 	 */
> > 	continue;
> > } else if (mem_cgroup_below_low(target_memcg, memcg)) {
> > 	/*
> > 	 * Soft protection.
> > 	 * Respect the protection only as long as
> > 	 * there is an unprotected supply
> > 	 * of reclaimable memory from other cgroups.
> > 	 */
> > 	if (!sc->memcg_low_reclaim) {
> > 		sc->memcg_low_skipped = 1;
> > 		continue;
> > 	}
> > 	memcg_memory_event(memcg, MEMCG_LOW);
> > }
> > ===
> >
> > With separate shrink begin()/end() tracepoints we can detect such a
> > problem.
>
> How? You are only reporting the number of reclaimed pages, and no
> reclaimed pages could be not just because of low/min limits but
> generally because of other reasons. You would need to report also the
> number of scanned/isolated pages.
>

From my perspective, if memory control group (memcg) protection
restrictions occur, we can identify them by the absence of the end()
pair for a begin(). Other reasons will have both tracepoints raised.

> > > > 3) LRU and slab shrinkers are too common places to handle memcg-related
> > > > tasks.
> > > > Additionally, memcg can be disabled in the kernel configuration.
> > >
> > > Right. This could be all hidden in the tracing code. You simply do not
> > > print the memcg id when the controller is disabled. Or just simply
> > > print 0. I do not really see any major problems with that.
> > >
> > > I would really prefer to focus on that direction rather than adding
> > > another begin/end tracepoint which overlaps with the existing begin/end
> > > traces and provides much more limited information, because I would bet we
> > > will have somebody complaining that a mere nr_reclaimed is not sufficient.
> >
> > Okay, I will try to prepare a new patch version with memcg printing from
> > the lruvec and slab tracepoints.
> >
> > Then Andrew should drop the previous patchsets, I suppose. Please advise
> > on the correct workflow steps here.
>
> Andrew usually just drops the patch from his tree and it will disappear
> from linux-next as well.

Okay, I understand, thank you!

Andrew, could you please take a look? I am planning to prepare a new
patch version based on Michal's suggestion, so the previous one should be
dropped.
On Wed 29-11-23 19:57:52, Dmitry Rokosov wrote:
> On Wed, Nov 29, 2023 at 05:06:37PM +0100, Michal Hocko wrote:
> > On Wed 29-11-23 18:20:57, Dmitry Rokosov wrote:
> > > On Tue, Nov 28, 2023 at 10:32:50AM +0100, Michal Hocko wrote:
> > > > On Mon 27-11-23 19:16:37, Dmitry Rokosov wrote:
> > [...]
> > > > > 2) With this approach, we will not have the ability to trace a situation
> > > > > where the kernel is requesting reclaim for a specific memcg, but due to
> > > > > limits issues, we are unable to run it.
> > > >
> > > > I do not follow. Could you be more specific please?
> > > >
> > >
> > > I'm referring to a situation where kswapd() or other kernel mm code
> > > requests reclaim of some pages from a memcg, but the memcg rejects it due
> > > to limit checks. This occurs in the shrink_node_memcgs() function.
> >
> > Ohh, you mean reclaim protection
> >
> > > ===
> > > mem_cgroup_calculate_protection(target_memcg, memcg);
> > >
> > > if (mem_cgroup_below_min(target_memcg, memcg)) {
> > > 	/*
> > > 	 * Hard protection.
> > > 	 * If there is no reclaimable memory, OOM.
> > > 	 */
> > > 	continue;
> > > } else if (mem_cgroup_below_low(target_memcg, memcg)) {
> > > 	/*
> > > 	 * Soft protection.
> > > 	 * Respect the protection only as long as
> > > 	 * there is an unprotected supply
> > > 	 * of reclaimable memory from other cgroups.
> > > 	 */
> > > 	if (!sc->memcg_low_reclaim) {
> > > 		sc->memcg_low_skipped = 1;
> > > 		continue;
> > > 	}
> > > 	memcg_memory_event(memcg, MEMCG_LOW);
> > > }
> > > ===
> > >
> > > With separate shrink begin()/end() tracepoints we can detect such a
> > > problem.
> >
> > How? You are only reporting the number of reclaimed pages, and no
> > reclaimed pages could be not just because of low/min limits but
> > generally because of other reasons. You would need to report also the
> > number of scanned/isolated pages.
> >
>
> From my perspective, if memory control group (memcg) protection
> restrictions occur, we can identify them by the absence of the end()
> pair for a begin(). Other reasons will have both tracepoints raised.

That is not really a great way to detect that, TBH. Trace events could be
lost, and then you simply do not know what has happened.
On Wed, Nov 29, 2023 at 06:10:33PM +0100, Michal Hocko wrote:
> On Wed 29-11-23 19:57:52, Dmitry Rokosov wrote:
> > On Wed, Nov 29, 2023 at 05:06:37PM +0100, Michal Hocko wrote:
> > > On Wed 29-11-23 18:20:57, Dmitry Rokosov wrote:
> > > > On Tue, Nov 28, 2023 at 10:32:50AM +0100, Michal Hocko wrote:
> > > > > On Mon 27-11-23 19:16:37, Dmitry Rokosov wrote:
> > > [...]
> > > > > > 2) With this approach, we will not have the ability to trace a situation
> > > > > > where the kernel is requesting reclaim for a specific memcg, but due to
> > > > > > limits issues, we are unable to run it.
> > > > >
> > > > > I do not follow. Could you be more specific please?
> > > > >
> > > >
> > > > I'm referring to a situation where kswapd() or other kernel mm code
> > > > requests reclaim of some pages from a memcg, but the memcg rejects it
> > > > due to limit checks. This occurs in the shrink_node_memcgs() function.
> > >
> > > Ohh, you mean reclaim protection
> > >
> > > > ===
> > > > mem_cgroup_calculate_protection(target_memcg, memcg);
> > > >
> > > > if (mem_cgroup_below_min(target_memcg, memcg)) {
> > > > 	/*
> > > > 	 * Hard protection.
> > > > 	 * If there is no reclaimable memory, OOM.
> > > > 	 */
> > > > 	continue;
> > > > } else if (mem_cgroup_below_low(target_memcg, memcg)) {
> > > > 	/*
> > > > 	 * Soft protection.
> > > > 	 * Respect the protection only as long as
> > > > 	 * there is an unprotected supply
> > > > 	 * of reclaimable memory from other cgroups.
> > > > 	 */
> > > > 	if (!sc->memcg_low_reclaim) {
> > > > 		sc->memcg_low_skipped = 1;
> > > > 		continue;
> > > > 	}
> > > > 	memcg_memory_event(memcg, MEMCG_LOW);
> > > > }
> > > > ===
> > > >
> > > > With separate shrink begin()/end() tracepoints we can detect such a
> > > > problem.
> > >
> > > How? You are only reporting the number of reclaimed pages and no
> > > reclaimed pages could be not just because of low/min limits but
> > > generally because of other reasons.
> > > You would need to report also the
> > > number of scanned/isolated pages.
> > >
> >
> > From my perspective, if memory control group (memcg) protection
> > restrictions occur, we can identify them by the absence of the end()
> > pair for a begin(). Other reasons will have both tracepoints raised.
>
> That is not really a great way to detect that, TBH. Trace events could be
> lost, and then you simply do not know what has happened.

I see, thank you very much for the detailed review! I will prepare a new
patchset with memcg names in the lruvec and slab paths, and will be back
soon.
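Michal's objection can be made concrete with a hypothetical post-processing sketch over the proposed events (struct ev and unmatched_begins() are my names, not kernel code): pairing begin/end per memcg id does flag groups that were skipped by protection, but a ring-buffer overrun that drops an end event is indistinguishable from a genuine skip.

```c
#include <assert.h>

/* Minimal trace record: a begin or end event for one memcg id (sketch). */
struct ev {
	int memcg_id;
	int is_end;	/* 0 = shrink_begin, 1 = shrink_end */
};

/* Count begin events that never saw a matching end for the same memcg.
 * Under the proposed scheme these would indicate protection skips -- but
 * a lost end event in the trace buffer looks exactly the same. */
static int unmatched_begins(const struct ev *evs, int n)
{
	int open[64] = { 0 };	/* open begin count per memcg id (ids < 64) */
	int unmatched = 0, i;

	for (i = 0; i < n; i++) {
		if (evs[i].is_end) {
			if (open[evs[i].memcg_id] > 0)
				open[evs[i].memcg_id]--;
		} else {
			open[evs[i].memcg_id]++;
		}
	}
	for (i = 0; i < 64; i++)
		unmatched += open[i];
	return unmatched;
}
```

This is why the thread settles on printing the memcg id from the existing lruvec/slab tracepoints instead: each event then stands on its own rather than depending on its pair arriving.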
diff --git a/include/trace/events/vmscan.h b/include/trace/events/vmscan.h
index e9093fa1c924..a4686afe571d 100644
--- a/include/trace/events/vmscan.h
+++ b/include/trace/events/vmscan.h
@@ -180,6 +180,17 @@ DEFINE_EVENT(mm_vmscan_memcg_reclaim_begin_template, mm_vmscan_memcg_softlimit_r
 	TP_ARGS(order, gfp_flags, memcg)
 );
 
+DEFINE_EVENT(mm_vmscan_memcg_reclaim_begin_template, mm_vmscan_memcg_shrink_begin,
+
+	TP_PROTO(int order, gfp_t gfp_flags, const struct mem_cgroup *memcg),
+
+	TP_ARGS(order, gfp_flags, memcg)
+);
+
+#else
+
+#define trace_mm_vmscan_memcg_shrink_begin(...)
+
 #endif /* CONFIG_MEMCG */
 
 DECLARE_EVENT_CLASS(mm_vmscan_direct_reclaim_end_template,
@@ -243,6 +254,17 @@ DEFINE_EVENT(mm_vmscan_memcg_reclaim_end_template, mm_vmscan_memcg_softlimit_rec
 	TP_ARGS(nr_reclaimed, memcg)
 );
 
+DEFINE_EVENT(mm_vmscan_memcg_reclaim_end_template, mm_vmscan_memcg_shrink_end,
+
+	TP_PROTO(unsigned long nr_reclaimed, const struct mem_cgroup *memcg),
+
+	TP_ARGS(nr_reclaimed, memcg)
+);
+
+#else
+
+#define trace_mm_vmscan_memcg_shrink_end(...)
+
 #endif /* CONFIG_MEMCG */
 
 TRACE_EVENT(mm_shrink_slab_start,
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 45780952f4b5..f7e3ddc5a7ad 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -6461,6 +6461,10 @@ static void shrink_node_memcgs(pg_data_t *pgdat, struct scan_control *sc)
 		 */
 		cond_resched();
 
+		trace_mm_vmscan_memcg_shrink_begin(sc->order,
+						   sc->gfp_mask,
+						   memcg);
+
 		mem_cgroup_calculate_protection(target_memcg, memcg);
 
 		if (mem_cgroup_below_min(target_memcg, memcg)) {
@@ -6491,6 +6495,9 @@ static void shrink_node_memcgs(pg_data_t *pgdat, struct scan_control *sc)
 		shrink_slab(sc->gfp_mask, pgdat->node_id, memcg,
 			    sc->priority);
 
+		trace_mm_vmscan_memcg_shrink_end(sc->nr_reclaimed - reclaimed,
+						 memcg);
+
 		/* Record the group's reclaim efficiency */
 		if (!sc->proactive)
 			vmpressure(sc->gfp_mask, memcg, false,
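The `#else` branches the patch adds to vmscan.h follow the usual kernel pattern of compiling tracepoint calls away when the subsystem (here CONFIG_MEMCG) is disabled, so call sites need no #ifdef guards. A minimal standalone illustration of that variadic stub-macro idiom (the names and counter here are mine, not the kernel's):

```c
#include <assert.h>

/* Stand-in for a Kconfig option such as CONFIG_MEMCG. */
#define FEATURE_ENABLED 1

static int trace_hits;	/* records calls in this sketch */

#if FEATURE_ENABLED
/* Enabled: the trace macro does real work (here, count the call). */
#define trace_shrink_end_sketch(nr, id)	(trace_hits++)
#else
/* Disabled: a variadic stub that expands to nothing, matching any args. */
#define trace_shrink_end_sketch(...)
#endif

static void shrink_path_sketch(void)
{
	/* The call site is identical regardless of configuration. */
	trace_shrink_end_sketch(25, 16);
}
```

Because the disabled variant swallows any argument list via `...`, callers compile cleanly even when types like struct mem_cgroup do not exist in that configuration.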