Message ID | 20240212213922.783301-32-surenb@google.com |
---|---|
State | New |
Subject | [PATCH v3 31/35] lib: add memory allocations report in show_mem() |
From | Suren Baghdasaryan <surenb@google.com> |
Date | Mon, 12 Feb 2024 13:39:17 -0800 |
Series | Memory allocation profiling |
Commit Message
Suren Baghdasaryan
Feb. 12, 2024, 9:39 p.m. UTC
Include allocations in show_mem reports.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
---
 include/linux/alloc_tag.h |  2 ++
 lib/alloc_tag.c           | 38 ++++++++++++++++++++++++++++++++++++++
 mm/show_mem.c             | 15 +++++++++++++++
 3 files changed, 55 insertions(+)
Comments
On Mon, Feb 12, 2024 at 01:39:17PM -0800, Suren Baghdasaryan wrote:
> Include allocations in show_mem reports.
>
> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
> Signed-off-by: Suren Baghdasaryan <surenb@google.com>
> ---
>  include/linux/alloc_tag.h |  2 ++
>  lib/alloc_tag.c           | 38 ++++++++++++++++++++++++++++++++++++++
>  mm/show_mem.c             | 15 +++++++++++++++
>  3 files changed, 55 insertions(+)
[...]
> diff --git a/mm/show_mem.c b/mm/show_mem.c
> index 8dcfafbd283c..d514c15ca076 100644
> --- a/mm/show_mem.c
> +++ b/mm/show_mem.c
[...]
> +#ifdef CONFIG_MEM_ALLOC_PROFILING
> +	{
> +		struct seq_buf s;
> +		char *buf = kmalloc(4096, GFP_ATOMIC);

Why 4096? Maybe use PAGE_SIZE instead?

> +
> +		if (buf) {
> +			printk("Memory allocations:\n");

This needs a printk prefix, or better yet, just use pr_info() or
similar.

> +			seq_buf_init(&s, buf, 4096);
> +			alloc_tags_show_mem_report(&s);
> +			printk("%s", buf);

Once a seq_buf "consumes" a char *, please don't use any directly any
more. This should be:

	pr_info("%s", seq_buf_str(s));

Otherwise %NUL termination isn't certain. Very likely, given the use
case here, but let's use good hygiene. :)

> +			kfree(buf);
> +		}
> +	}
> +#endif
> }
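For reference, a minimal sketch (not the posted patch) of the pattern Kees asks for, assuming the alloc_tags_show_mem_report() helper this patch adds; the function name is illustrative:

    #include <linux/alloc_tag.h>
    #include <linux/printk.h>
    #include <linux/seq_buf.h>
    #include <linux/slab.h>

    static void show_mem_report_sketch(void)
    {
        struct seq_buf s;
        char *buf = kmalloc(4096, GFP_ATOMIC);

        if (!buf)
            return;

        seq_buf_init(&s, buf, 4096);
        alloc_tags_show_mem_report(&s);
        pr_info("Memory allocations:\n");
        /* seq_buf_str() NUL-terminates the buffer before it is read */
        pr_info("%s", seq_buf_str(&s));
        kfree(buf);
    }

seq_buf_str() guarantees termination regardless of how much the report wrote, which is what makes printing the raw char * unnecessary.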
On Mon, 12 Feb 2024 16:10:02 -0800
Kees Cook <keescook@chromium.org> wrote:

> >  #endif
> > +#ifdef CONFIG_MEM_ALLOC_PROFILING
> > +	{
> > +		struct seq_buf s;
> > +		char *buf = kmalloc(4096, GFP_ATOMIC);
>
> Why 4096? Maybe use PAGE_SIZE instead?

Will it make a difference for architectures that don't have 4096
PAGE_SIZE? Like PowerPC which has PAGE_SIZE of anywhere between 4K to
256K!

-- Steve
On Mon, Feb 12, 2024 at 07:22:42PM -0500, Steven Rostedt wrote:
> On Mon, 12 Feb 2024 16:10:02 -0800
> Kees Cook <keescook@chromium.org> wrote:
>
> > >  #endif
> > > +#ifdef CONFIG_MEM_ALLOC_PROFILING
> > > +	{
> > > +		struct seq_buf s;
> > > +		char *buf = kmalloc(4096, GFP_ATOMIC);
> >
> > Why 4096? Maybe use PAGE_SIZE instead?
>
> Will it make a difference for architectures that don't have 4096 PAGE_SIZE?
> Like PowerPC which has PAGE_SIZE of anywhere between 4K to 256K!

it's just a string buffer
On Mon, Feb 12, 2024 at 8:33 PM Kent Overstreet
<kent.overstreet@linux.dev> wrote:
>
> On Mon, Feb 12, 2024 at 07:22:42PM -0500, Steven Rostedt wrote:
> > On Mon, 12 Feb 2024 16:10:02 -0800
> > Kees Cook <keescook@chromium.org> wrote:
> > [...]
> > > Why 4096? Maybe use PAGE_SIZE instead?
> >
> > Will it make a difference for architectures that don't have 4096 PAGE_SIZE?
> > Like PowerPC which has PAGE_SIZE of anywhere between 4K to 256K!
>
> it's just a string buffer

We should document that __show_mem() prints only the top 10 largest
allocations, therefore as long as this buffer is large enough to hold
10 records we should be good. Technically we could simply print one
record at a time and then the buffer can be smaller.
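A hedged sketch of that one-record-at-a-time idea; alloc_tag_to_text() is the helper from this series, while the function name and the 256-byte line size are assumptions:

    /* Emit a single tag's report line through a small on-stack buffer. */
    static void print_one_tag(struct codetag *ct)
    {
        char line[256];	/* assumed big enough for one report line */
        struct seq_buf s;

        seq_buf_init(&s, line, sizeof(line));
        alloc_tag_to_text(&s, ct);
        pr_notice("%s", seq_buf_str(&s));
    }

The top-10 selection loop in the patch could then call print_one_tag(tags[i].tag) for each entry instead of appending everything into one 4K buffer.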
On Mon 12-02-24 13:39:17, Suren Baghdasaryan wrote:
[...]
> @@ -423,4 +424,18 @@ void __show_mem(unsigned int filter, nodemask_t *nodemask, int max_zone_idx)
>  #ifdef CONFIG_MEMORY_FAILURE
>  	printk("%lu pages hwpoisoned\n", atomic_long_read(&num_poisoned_pages));
>  #endif
> +#ifdef CONFIG_MEM_ALLOC_PROFILING
> +	{
> +		struct seq_buf s;
> +		char *buf = kmalloc(4096, GFP_ATOMIC);
> +
> +		if (buf) {
> +			printk("Memory allocations:\n");
> +			seq_buf_init(&s, buf, 4096);
> +			alloc_tags_show_mem_report(&s);
> +			printk("%s", buf);
> +			kfree(buf);
> +		}
> +	}
> +#endif

I am pretty sure I have already objected to this. Memory allocations in
the oom path are simply no go unless there is absolutely no other way
around that. In this case the buffer could be preallocated.
On Thu, Feb 15, 2024 at 1:22 AM Michal Hocko <mhocko@suse.com> wrote:
>
> On Mon 12-02-24 13:39:17, Suren Baghdasaryan wrote:
> [...]
>
> I am pretty sure I have already objected to this. Memory allocations in
> the oom path are simply no go unless there is absolutely no other way
> around that. In this case the buffer could be preallocated.

Good point. We will change this to a smaller buffer allocated on the
stack and will print records one-by-one. Thanks!

> --
> Michal Hocko
> SUSE Labs
On Thu 15-02-24 06:58:42, Suren Baghdasaryan wrote:
> On Thu, Feb 15, 2024 at 1:22 AM Michal Hocko <mhocko@suse.com> wrote:
> [...]
> > I am pretty sure I have already objected to this. Memory allocations in
> > the oom path are simply no go unless there is absolutely no other way
> > around that. In this case the buffer could be preallocated.
>
> Good point. We will change this to a smaller buffer allocated on the
> stack and will print records one-by-one. Thanks!

__show_mem could be called with a very deep call chains. A single
pre-allocated buffer should just do ok.
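A sketch of the single pre-allocated buffer Michal suggests; the names are illustrative, and a real version would also need to serialize concurrent OOM reporters:

    /* One boot-time buffer; the OOM path then allocates nothing. */
    static char alloc_tags_oom_buf[4096];

    static void show_mem_report_prealloc(void)
    {
        struct seq_buf s;

        seq_buf_init(&s, alloc_tags_oom_buf, sizeof(alloc_tags_oom_buf));
        alloc_tags_show_mem_report(&s);
        pr_notice("Memory allocations:\n%s", seq_buf_str(&s));
    }

The trade-off being argued in this thread: the 4K is permanently reserved, but the report can never fail for lack of memory.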
On Thu, Feb 15, 2024 at 8:45 AM Michal Hocko <mhocko@suse.com> wrote:
>
> On Thu 15-02-24 06:58:42, Suren Baghdasaryan wrote:
> [...]
> > Good point. We will change this to a smaller buffer allocated on the
> > stack and will print records one-by-one. Thanks!
>
> __show_mem could be called with a very deep call chains. A single
> pre-allocated buffer should just do ok.

Ack. Will do.

> --
> Michal Hocko
> SUSE Labs
On Thu, Feb 15, 2024 at 10:33:53AM -0800, Suren Baghdasaryan wrote:
> On Thu, Feb 15, 2024 at 10:29 AM Kent Overstreet
> <kent.overstreet@linux.dev> wrote:
> >
> > On Thu, Feb 15, 2024 at 08:47:59AM -0800, Suren Baghdasaryan wrote:
> > > On Thu, Feb 15, 2024 at 8:45 AM Michal Hocko <mhocko@suse.com> wrote:
> > > [...]
> > > > __show_mem could be called with a very deep call chains. A single
> > > > pre-allocated buffer should just do ok.
> > >
> > > Ack. Will do.
> >
> > No, we're not going to permanently burn 4k here.
>
> We don't need 4K here. Just enough to store one line and then print
> these 10 highest allocations one line at a time. This way we can also
> change that 10 to any higher number we like without any side effects.

There's no reason to make the change at all. If Michal thinks there's
something "dangerous" about allocating a buffer here, he needs to able
to explain what it is.
On Thu 15-02-24 13:29:40, Kent Overstreet wrote:
> On Thu, Feb 15, 2024 at 08:47:59AM -0800, Suren Baghdasaryan wrote:
> > On Thu, Feb 15, 2024 at 8:45 AM Michal Hocko <mhocko@suse.com> wrote:
> > [...]
> > > __show_mem could be called with a very deep call chains. A single
> > > pre-allocated buffer should just do ok.
> >
> > Ack. Will do.
>
> No, we're not going to permanently burn 4k here.
>
> It's completely fine if the allocation fails, there's nothing "unsafe"
> about doing a GFP_ATOMIC allocation here.

Nobody is talking about safety. This is just a wrong thing to do when
you are likely under OOM situation. This is a situation when you
GFP_ATOMIC allocation is _likely_ to fail. Yes, yes you will get some
additional memory reservers head room, but you shouldn't rely on that
because that will make the output unreliable. Not something you want in
situation when you really want to know that information.

More over you do not need to preallocate a full page.
On Thu 15-02-24 15:33:30, Kent Overstreet wrote:
> On Thu, Feb 15, 2024 at 09:22:07PM +0100, Vlastimil Babka wrote:
> > On 2/15/24 19:29, Kent Overstreet wrote:
> > [...]
> > > It's completely fine if the allocation fails, there's nothing "unsafe"
> > > about doing a GFP_ATOMIC allocation here.
> >
> > Well, I think without __GFP_NOWARN it will cause a warning and thus
> > recursion into __show_mem(), potentially infinite? Which is of course
> > trivial to fix, but I'd myself rather sacrifice a bit of memory to get
> > this potentially very useful output, if I enabled the profiling. The
> > necessary memory overhead of page_ext and slabobj_ext makes the
> > printing buffer overhead negligible in comparison?
>
> __GFP_NOWARN is a good point, we should have that.
>
> But - and correct me if I'm wrong here - doesn't an OOM kick in well
> before GFP_ATOMIC 4k allocations are failing?

Not really, GFP_ATOMIC users can compete with reclaimers and consume
those reserves.

> I'd expect the system to
> be well and truly hosed at that point.

It is OOMed...

> If we want this report to be 100% reliable, then yes the preallocated
> buffer makes sense - but I don't think 100% makes sense here; I think we
> can accept ~99% and give back that 4k.

Think about that from the memory reserves consumers. The atomic reserve
is a scarse resource and now you want to use it for debugging purposes
for which you could have preallocated.
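For context, the recursion Vlastimil points out comes from the allocation-failure warning itself calling __show_mem(); a sketch of the one-flag fix (helper name illustrative):

    static char *alloc_report_buf(void)
    {
        /*
         * __GFP_NOWARN matters here: a failed allocation would otherwise
         * print an allocation-failure report, which itself calls
         * __show_mem() -- potentially recursing forever.
         */
        return kmalloc(4096, GFP_ATOMIC | __GFP_NOWARN);
    }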
On Thu, Feb 15, 2024 at 10:54:53PM +0100, Michal Hocko wrote:
> On Thu 15-02-24 15:33:30, Kent Overstreet wrote:
> > If we want this report to be 100% reliable, then yes the preallocated
> > buffer makes sense - but I don't think 100% makes sense here; I think we
> > can accept ~99% and give back that 4k.
>
> Think about that from the memory reserves consumers. The atomic reserve
> is a scarse resource and now you want to use it for debugging purposes
> for which you could have preallocated.

_Memory_ is a finite resource that we shouldn't be using unnecessarily.
We don't need this for the entire time we're under memory pressure;
just the short duration it takes to generate the report, then it's back
available for other users.

You would have us dedicate 4k, from system bootup, that can never be
used by other users.

Again: this makes no sense. The whole point of having watermarks and
shared reserves is so that every codepath doesn't have to have its own
dedicated, private reserve, so that we can make better use of a shared
finite resource.
On 2/15/24 15:07, Steven Rostedt wrote:
> Just adding the patches increases the size by 5k. But the rest shows an
> increase of 259k, and you are worried about 4k (and possibly less?)???

Doesn't the new page_ext thingy add a pointer per 'struct page', or
~0.2% of RAM, or ~32MB on a 16GB laptop?

I, too, am confused why 4k is even remotely an issue.
On Mon 19-02-24 09:17:36, Suren Baghdasaryan wrote:
[...]
> For now I think with Vlastimil's __GFP_NOWARN suggestion the code
> becomes safe and the only risk is to lose this report. If we get cases
> with reports missing this data, we can easily change to reserved
> memory.

This is not just about missing part of the oom report. This is annoying
but not earth shattering. Eating into very small reserves (that might
be the only usable memory while the system is struggling in OOM
situation) could cause functional problems that would be non trivial to
test for. All that for debugging purposes is just lame. If you want to
reuse the code for a different purpose then abstract it and allocate
the buffer when you can afford that and use preallocated on when in OOM
situation.

We have always went extra mile to avoid potentially disruptive
operations from the oom handling code and I do not see any good reason
to diverge from that principle.
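A sketch of the split Michal describes: callers that can afford an allocation pass their own buffer, while the OOM path passes one reserved at boot. All names here are illustrative, not from the posted series:

    static char oom_report_buf[4096];	/* reserved once at boot */

    /* Shared report logic, independent of where the buffer came from. */
    static void alloc_tags_report(char *buf, size_t size)
    {
        struct seq_buf s;

        seq_buf_init(&s, buf, size);
        alloc_tags_show_mem_report(&s);
        pr_notice("%s", seq_buf_str(&s));
    }

    /* OOM path: must not allocate, so use the boot-time buffer. */
    static void oom_path_report(void)
    {
        alloc_tags_report(oom_report_buf, sizeof(oom_report_buf));
    }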
On Tue 20-02-24 12:18:49, Kent Overstreet wrote:
> On Tue, Feb 20, 2024 at 05:23:29PM +0100, Michal Hocko wrote:
> [...]
> > We have always went extra mile to avoid potentially disruptive
> > operations from the oom handling code and I do not see any good reason
> > to diverge from that principle.
>
> Michal, I gave you the logic between dedicated reserves and system
> reserves. Please stop repeating these vague what-ifs.

Your argument makes little sense and it seems that it is impossible to
explain that to you. I gave up on discussing this further with you.

Consider NAK to any additional allocation from oom path unless you can
give very _solid_ arguments this is absolutely necessary. "It's gona be
fine and work most of the time" is not a solid argument.
diff --git a/include/linux/alloc_tag.h b/include/linux/alloc_tag.h
index 3fe51e67e231..0a5973c4ad77 100644
--- a/include/linux/alloc_tag.h
+++ b/include/linux/alloc_tag.h
@@ -30,6 +30,8 @@ struct alloc_tag {
 
 #ifdef CONFIG_MEM_ALLOC_PROFILING
 
+void alloc_tags_show_mem_report(struct seq_buf *s);
+
 static inline struct alloc_tag *ct_to_alloc_tag(struct codetag *ct)
 {
 	return container_of(ct, struct alloc_tag, ct);
diff --git a/lib/alloc_tag.c b/lib/alloc_tag.c
index 2d5226d9262d..54312c213860 100644
--- a/lib/alloc_tag.c
+++ b/lib/alloc_tag.c
@@ -96,6 +96,44 @@ static const struct seq_operations allocinfo_seq_op = {
 	.show = allocinfo_show,
 };
 
+void alloc_tags_show_mem_report(struct seq_buf *s)
+{
+	struct codetag_iterator iter;
+	struct codetag *ct;
+	struct {
+		struct codetag *tag;
+		size_t bytes;
+	} tags[10], n;
+	unsigned int i, nr = 0;
+
+	codetag_lock_module_list(alloc_tag_cttype, true);
+	iter = codetag_get_ct_iter(alloc_tag_cttype);
+	while ((ct = codetag_next_ct(&iter))) {
+		struct alloc_tag_counters counter = alloc_tag_read(ct_to_alloc_tag(ct));
+
+		n.tag = ct;
+		n.bytes = counter.bytes;
+
+		for (i = 0; i < nr; i++)
+			if (n.bytes > tags[i].bytes)
+				break;
+
+		if (i < ARRAY_SIZE(tags)) {
+			nr -= nr == ARRAY_SIZE(tags);
+			memmove(&tags[i + 1],
+				&tags[i],
+				sizeof(tags[0]) * (nr - i));
+			nr++;
+			tags[i] = n;
+		}
+	}
+
+	for (i = 0; i < nr; i++)
+		alloc_tag_to_text(s, tags[i].tag);
+
+	codetag_lock_module_list(alloc_tag_cttype, false);
+}
+
 static void __init procfs_init(void)
 {
 	proc_create_seq("allocinfo", 0444, NULL, &allocinfo_seq_op);
diff --git a/mm/show_mem.c b/mm/show_mem.c
index 8dcfafbd283c..d514c15ca076 100644
--- a/mm/show_mem.c
+++ b/mm/show_mem.c
@@ -12,6 +12,7 @@
 #include <linux/hugetlb.h>
 #include <linux/mm.h>
 #include <linux/mmzone.h>
+#include <linux/seq_buf.h>
 #include <linux/swap.h>
 #include <linux/vmstat.h>
 
@@ -423,4 +424,18 @@ void __show_mem(unsigned int filter, nodemask_t *nodemask, int max_zone_idx)
 #ifdef CONFIG_MEMORY_FAILURE
 	printk("%lu pages hwpoisoned\n", atomic_long_read(&num_poisoned_pages));
 #endif
+#ifdef CONFIG_MEM_ALLOC_PROFILING
+	{
+		struct seq_buf s;
+		char *buf = kmalloc(4096, GFP_ATOMIC);
+
+		if (buf) {
+			printk("Memory allocations:\n");
+			seq_buf_init(&s, buf, 4096);
+			alloc_tags_show_mem_report(&s);
+			printk("%s", buf);
+			kfree(buf);
+		}
+	}
+#endif
 }