Message ID | 20240208234539.19113-4-osalvador@suse.de |
---|---|
State | New |
Headers |
Return-Path: <linux-kernel+bounces-58880-ouuuleilei=gmail.com@vger.kernel.org> Delivered-To: ouuuleilei@gmail.com Received: by 2002:a05:7300:50ea:b0:106:860b:bbdd with SMTP id r10csp533137dyd; Thu, 8 Feb 2024 16:10:24 -0800 (PST) X-Google-Smtp-Source: AGHT+IG7nmESTjfMJX+TC5WFWVASxut92/iVGSon3fl9hdy9rX5vOUFY4xTCi9OAuY29aAb43HOm X-Received: by 2002:a05:6a20:e608:b0:19e:3659:c23a with SMTP id my8-20020a056a20e60800b0019e3659c23amr194310pzb.43.1707437424664; Thu, 08 Feb 2024 16:10:24 -0800 (PST) ARC-Seal: i=2; a=rsa-sha256; t=1707437424; cv=pass; d=google.com; s=arc-20160816; b=eO1ZsTx1l8PX+i8JZ1C1h1+A4Xpww5G5LapHg4bl8hq4aI3J4WvBUGj3M9zreHUdgv o2/NnY28HP3Au+0kiE1K0ZA/a3WOn0UfzidS+AJGxbS8qlmJTF2KpPaDadXYUtuk2CRr 0OTP6dKxWHgTrE80T0G/u7ouofllfnMQowIXnzLPFi7REhrzYHjBnweomJh6ut3ILs02 Kgcx9mqZcbXcQ38hf1vPHOQhX5Bn+FiAn9RF4uvUyjfA5J9usyZZn1KQAJYS2WmogQ2y FF00FRLDv43zCWwELfCWWQYA53vdoia5tYdkJ2b8OwbI1931ffQPS6hrlm8spx5S3RgN x9lA== ARC-Message-Signature: i=2; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=content-transfer-encoding:mime-version:list-unsubscribe :list-subscribe:list-id:precedence:references:in-reply-to:message-id :date:subject:cc:to:from:dkim-signature:dkim-signature :dkim-signature:dkim-signature; bh=1RIuerYMWRH7Xo5xR9Dr7KugCllJ88k2PizonkIPXIU=; fh=AECHGR58kb7Z2UqvdgXv9rX9cOrtxXsYpAlsTh7RP1s=; b=KlQKuE+6bCGrs/J5KmWE/8U1rCpEnj9euJcL8hoCjTSRBh9j7s2phwFwafD0wGpCfE cXjXSoq/CnV036TiBwJnSsdzm1qJovSMVvvRhzWRwd0Ek6Xmx9P57UwTTO2S8qfnR6v7 h9V36omXJmbDTnQsnI6mOP/3S7lWjpjWbynWThHKtruJ9uKkZRjxYtKDCOSZuEDuM761 0iN/UKpKiK4O5dyWWZY4HYjyvr/OmWaxT1mktZ4J7CpFhqxBqm+9Hv7hAk8vAdjsoB6u 1xdk3R5+xXpgqRIF3pafyyiACFGk5WMxq2W23t4WSiHeREVobAkugkh+0QWazF8OgW45 qlkQ==; dara=google.com ARC-Authentication-Results: i=2; mx.google.com; dkim=pass header.i=@suse.de header.s=susede2_rsa header.b=hY9JCYvA; dkim=neutral (no key) header.i=@suse.de header.s=susede2_ed25519; dkim=pass header.i=@suse.de header.s=susede2_rsa header.b=hY9JCYvA; dkim=neutral (no key) header.i=@suse.de header.s=susede2_ed25519; arc=pass (i=1 spf=pass spfdomain=suse.de dkim=pass dkdomain=suse.de dkim=pass dkdomain=suse.de dmarc=pass fromdomain=suse.de); spf=pass (google.com: domain of linux-kernel+bounces-58880-ouuuleilei=gmail.com@vger.kernel.org designates 2604:1380:40f1:3f00::1 as permitted sender) smtp.mailfrom="linux-kernel+bounces-58880-ouuuleilei=gmail.com@vger.kernel.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=suse.de X-Forwarded-Encrypted: i=2; AJvYcCU/hDL/11TLU+rGs9cgFSg9xTgrKrUEEBmB4b7WNh8kh96NkA+wwRQvN5IK9JjaGYUhg2Iv4dcQU+CnHMkS0OI9jRixTA== Received: from sy.mirrors.kernel.org (sy.mirrors.kernel.org. [2604:1380:40f1:3f00::1]) by mx.google.com with ESMTPS id e11-20020a65688b000000b005dbd0facb50si665287pgt.384.2024.02.08.16.10.24 for <ouuuleilei@gmail.com> (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 08 Feb 2024 16:10:24 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel+bounces-58880-ouuuleilei=gmail.com@vger.kernel.org designates 2604:1380:40f1:3f00::1 as permitted sender) client-ip=2604:1380:40f1:3f00::1; Authentication-Results: mx.google.com; dkim=pass header.i=@suse.de header.s=susede2_rsa header.b=hY9JCYvA; dkim=neutral (no key) header.i=@suse.de header.s=susede2_ed25519; dkim=pass header.i=@suse.de header.s=susede2_rsa header.b=hY9JCYvA; dkim=neutral (no key) header.i=@suse.de header.s=susede2_ed25519; arc=pass (i=1 spf=pass spfdomain=suse.de dkim=pass dkdomain=suse.de dkim=pass dkdomain=suse.de dmarc=pass fromdomain=suse.de); spf=pass (google.com: domain of linux-kernel+bounces-58880-ouuuleilei=gmail.com@vger.kernel.org designates 2604:1380:40f1:3f00::1 as permitted sender) smtp.mailfrom="linux-kernel+bounces-58880-ouuuleilei=gmail.com@vger.kernel.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=suse.de Received: from smtp.subspace.kernel.org (wormhole.subspace.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by sy.mirrors.kernel.org (Postfix) with ESMTPS id E0CFBB2933B for <ouuuleilei@gmail.com>; Thu, 8 Feb 2024 23:46:20 +0000 (UTC) Received: from localhost.localdomain (localhost.localdomain [127.0.0.1]) by smtp.subspace.kernel.org (Postfix) with ESMTP id 4764B52F6B; Thu, 8 Feb 2024 23:45:23 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=suse.de header.i=@suse.de header.b="hY9JCYvA"; dkim=permerror (0-bit key) header.d=suse.de header.i=@suse.de header.b="p5HZ5uBZ"; dkim=pass (1024-bit key) header.d=suse.de header.i=@suse.de header.b="hY9JCYvA"; dkim=permerror (0-bit key) header.d=suse.de header.i=@suse.de header.b="p5HZ5uBZ" Received: from smtp-out1.suse.de (smtp-out1.suse.de [195.135.223.130]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 683EB5101E for <linux-kernel@vger.kernel.org>; Thu, 8 Feb 2024 23:45:17 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=195.135.223.130 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1707435919; cv=none; b=TK5JoUjnSZ1zjdxo+HIfyQl3wkDCcUB5NJWXi8a8mpBCiBR8XTbXwT31HUt9GwsZFORFlvrvXb0x3HeuJH5HiCq4IeXM1mtAJABFZzuN1fLjVyT2rg0yUvI1/hL0M7oOZkYLYIBucBhaxHSSp/BvO82V/lofJ8OWRXMOREMnS0M= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1707435919; c=relaxed/simple; bh=vD+uDbcZd1nUQSYd7TnT05L2X2yMg5F/ZdZDImKbnS0=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=h7NUL4oEwN7ssndHREhCFuQnUcryDuWIPfmn+HlG83NBo62fiSQ/1bvG6bMMnT+0rtuxpbryrP26ayg56o9laPEMeZ2j0qU1NQwL3DN6wqUE7qnfXYDShqqXzuWqQdSk7CcwYuZCocn3BwpsMPQ5oxFbGDixI9UNXWYtwy+MJ6I= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=suse.de; spf=pass smtp.mailfrom=suse.de; dkim=pass (1024-bit key) header.d=suse.de header.i=@suse.de header.b=hY9JCYvA; dkim=permerror (0-bit key) header.d=suse.de header.i=@suse.de header.b=p5HZ5uBZ; dkim=pass (1024-bit key) header.d=suse.de header.i=@suse.de header.b=hY9JCYvA; dkim=permerror (0-bit key) header.d=suse.de header.i=@suse.de header.b=p5HZ5uBZ; arc=none smtp.client-ip=195.135.223.130 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=suse.de Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=suse.de Received: from imap1.dmz-prg2.suse.org (imap1.dmz-prg2.suse.org [10.150.64.97]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by smtp-out1.suse.de (Postfix) with ESMTPS id A783422317; Thu, 8 Feb 2024 23:45:15 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_rsa; t=1707435915; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=1RIuerYMWRH7Xo5xR9Dr7KugCllJ88k2PizonkIPXIU=; b=hY9JCYvAncAGepUu81hefJSrPIqWxz/6UK81x78gggd7DefAzQkSaFtX/UeMHzsUrsprWg UnkgoUNyQrDRszamEUWOuLhaRRJBjVfAf5YiWMHqvwRf62A7ru9jv/bv/vvDNy+oUMNH+u QAu0UK+6FYmPjn6QLgfthVizcCiWcww= DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_ed25519; t=1707435915; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=1RIuerYMWRH7Xo5xR9Dr7KugCllJ88k2PizonkIPXIU=; b=p5HZ5uBZO2fi/Vl014Q4Qrb0lax8dsjC8lQ2L5wn/MAUf37MHid3+N7LFaSZ6Sd95BS7TW b1gGtfcxLmlURUDQ== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_rsa; t=1707435915; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=1RIuerYMWRH7Xo5xR9Dr7KugCllJ88k2PizonkIPXIU=; b=hY9JCYvAncAGepUu81hefJSrPIqWxz/6UK81x78gggd7DefAzQkSaFtX/UeMHzsUrsprWg UnkgoUNyQrDRszamEUWOuLhaRRJBjVfAf5YiWMHqvwRf62A7ru9jv/bv/vvDNy+oUMNH+u QAu0UK+6FYmPjn6QLgfthVizcCiWcww= DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_ed25519; t=1707435915; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=1RIuerYMWRH7Xo5xR9Dr7KugCllJ88k2PizonkIPXIU=; b=p5HZ5uBZO2fi/Vl014Q4Qrb0lax8dsjC8lQ2L5wn/MAUf37MHid3+N7LFaSZ6Sd95BS7TW b1gGtfcxLmlURUDQ== Received: from imap1.dmz-prg2.suse.org (localhost [127.0.0.1]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by imap1.dmz-prg2.suse.org (Postfix) with ESMTPS id 1BB6413984; Thu, 8 Feb 2024 23:45:15 +0000 (UTC) Received: from dovecot-director2.suse.de ([2a07:de40:b281:106:10:150:64:167]) by imap1.dmz-prg2.suse.org with ESMTPSA id WI3VA4tnxWWUfAAAD6G6ig (envelope-from <osalvador@suse.de>); Thu, 08 Feb 2024 23:45:15 +0000 From: Oscar Salvador <osalvador@suse.de> To: Andrew Morton <akpm@linux-foundation.org> Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org, Michal Hocko <mhocko@suse.com>, Vlastimil Babka <vbabka@suse.cz>, Marco Elver <elver@google.com>, Andrey Konovalov <andreyknvl@gmail.com>, Alexander Potapenko <glider@google.com>, Oscar Salvador <osalvador@suse.de> Subject: [PATCH v7 3/4] mm,page_owner: Display all stacks and their count Date: Fri, 9 Feb 2024 00:45:38 +0100 Message-ID: <20240208234539.19113-4-osalvador@suse.de> X-Mailer: git-send-email 2.43.0 In-Reply-To: <20240208234539.19113-1-osalvador@suse.de> References: <20240208234539.19113-1-osalvador@suse.de> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: <linux-kernel.vger.kernel.org> List-Subscribe: <mailto:linux-kernel+subscribe@vger.kernel.org> List-Unsubscribe: <mailto:linux-kernel+unsubscribe@vger.kernel.org> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Authentication-Results: smtp-out1.suse.de; none X-Spamd-Result: default: False [1.90 / 50.00]; ARC_NA(0.00)[]; RCVD_VIA_SMTP_AUTH(0.00)[]; FROM_HAS_DN(0.00)[]; TO_DN_SOME(0.00)[]; FREEMAIL_ENVRCPT(0.00)[gmail.com]; R_MISSING_CHARSET(2.50)[]; MIME_GOOD(-0.10)[text/plain]; TO_MATCH_ENVRCPT_ALL(0.00)[]; BROKEN_CONTENT_TYPE(1.50)[]; RCVD_COUNT_THREE(0.00)[3]; DKIM_SIGNED(0.00)[suse.de:s=susede2_rsa,suse.de:s=susede2_ed25519]; RCPT_COUNT_SEVEN(0.00)[9]; MID_CONTAINS_FROM(1.00)[]; DBL_BLOCKED_OPENRESOLVER(0.00)[suse.de:email]; FUZZY_BLOCKED(0.00)[rspamd.com]; FROM_EQ_ENVFROM(0.00)[]; MIME_TRACE(0.00)[0:+]; FREEMAIL_CC(0.00)[vger.kernel.org,kvack.org,suse.com,suse.cz,google.com,gmail.com,suse.de]; RCVD_TLS_ALL(0.00)[]; BAYES_HAM(-3.00)[100.00%] X-Spam-Level: * X-Spam-Score: 1.90 X-Spam-Flag: NO X-getmail-retrieved-from-mailbox: INBOX X-GMAIL-THRID: 1790377905213618678 X-GMAIL-MSGID: 1790377905213618678 |
Series |
page_owner: print stacks and their outstanding allocations
|
|
Commit Message
Oscar Salvador
Feb. 8, 2024, 11:45 p.m. UTC
This patch adds a new file called 'page_owner_stacks', which
will show all stacks that were added by page_owner followed by
their counting, giving us a clear overview of stack <-> count
relationship.
E.g:
prep_new_page+0xa9/0x120
get_page_from_freelist+0x801/0x2210
__alloc_pages+0x18b/0x350
alloc_pages_mpol+0x91/0x1f0
folio_alloc+0x14/0x50
filemap_alloc_folio+0xb2/0x100
__filemap_get_folio+0x14a/0x490
ext4_write_begin+0xbd/0x4b0 [ext4]
generic_perform_write+0xc1/0x1e0
ext4_buffered_write_iter+0x68/0xe0 [ext4]
ext4_file_write_iter+0x70/0x740 [ext4]
vfs_write+0x33d/0x420
ksys_write+0xa5/0xe0
do_syscall_64+0x80/0x160
entry_SYSCALL_64_after_hwframe+0x6e/0x76
stack_count: 4578
In order to show all the stacks, we implement stack_depot_get_next_stack(),
which walks all buckets while retrieving the stacks stored in them.
stack_depot_get_next_stack() will return all stacks, one at a time,
by first finding a non-empty bucket, and then retrieving all the stacks
stored in that bucket.
Once we have completely gone through it, we get the next non-empty bucket
and repeat the same steps, and so on until we have completely checked all
buckets.
Signed-off-by: Oscar Salvador <osalvador@suse.de>
---
include/linux/stackdepot.h | 20 +++++++++
lib/stackdepot.c | 46 +++++++++++++++++++++
mm/page_owner.c | 85 ++++++++++++++++++++++++++++++++++++++
3 files changed, 151 insertions(+)
Comments
On Fri, 9 Feb 2024 at 00:45, Oscar Salvador <osalvador@suse.de> wrote: > > This patch adds a new file called 'page_owner_stacks', which > will show all stacks that were added by page_owner followed by > their counting, giving us a clear overview of stack <-> count > relationship. > > E.g: > > prep_new_page+0xa9/0x120 > get_page_from_freelist+0x801/0x2210 > __alloc_pages+0x18b/0x350 > alloc_pages_mpol+0x91/0x1f0 > folio_alloc+0x14/0x50 > filemap_alloc_folio+0xb2/0x100 > __filemap_get_folio+0x14a/0x490 > ext4_write_begin+0xbd/0x4b0 [ext4] > generic_perform_write+0xc1/0x1e0 > ext4_buffered_write_iter+0x68/0xe0 [ext4] > ext4_file_write_iter+0x70/0x740 [ext4] > vfs_write+0x33d/0x420 > ksys_write+0xa5/0xe0 > do_syscall_64+0x80/0x160 > entry_SYSCALL_64_after_hwframe+0x6e/0x76 > stack_count: 4578 > > In order to show all the stacks, we implement stack_depot_get_next_stack(), > which walks all buckets while retrieving the stacks stored in them. > stack_depot_get_next_stack() will return all stacks, one at a time, > by first finding a non-empty bucket, and then retrieving all the stacks > stored in that bucket. > Once we have completely gone through it, we get the next non-empty bucket > and repeat the same steps, and so on until we have completely checked all > buckets. > > Signed-off-by: Oscar Salvador <osalvador@suse.de> > --- > include/linux/stackdepot.h | 20 +++++++++ > lib/stackdepot.c | 46 +++++++++++++++++++++ > mm/page_owner.c | 85 ++++++++++++++++++++++++++++++++++++++ > 3 files changed, 151 insertions(+) > > diff --git a/include/linux/stackdepot.h b/include/linux/stackdepot.h > index ac62de4d4999..d851ec821e6f 100644 > --- a/include/linux/stackdepot.h > +++ b/include/linux/stackdepot.h > @@ -183,6 +183,26 @@ depot_stack_handle_t stack_depot_save(unsigned long *entries, > */ > struct stack_record *stack_depot_get_stack(depot_stack_handle_t handle); > > +/** > + * stack_depot_get_next_stack - Returns all stacks, one at a time "Returns all stack_records" to be clear that this is returning the struct. > + * > + * @table: Current table we are checking > + * @bucket: Current bucket we are checking > + * @last_found: Last stack that was found > + * > + * This function finds first a non-empty bucket and returns the first stack > + * stored in it. On consequent calls, it walks the bucket to see whether > + * it contains more stacks. > + * Once we have walked all the stacks in a bucket, we check > + * the next one, and we repeat the same steps until we have checked all of them I think for this function it's important to say that no entry returned from this function can be evicted. I.e. the easiest way to ensure this is that the caller makes sure the entries returned are never passed to stack_depot_put() - which is certainly the case for your usecase because you do not use stack_depot_put(). > + * Return: A pointer a to stack_record struct, or NULL when we have walked all > + * buckets. > + */ > +struct stack_record *stack_depot_get_next_stack(unsigned long *table, To keep consistent, I'd also call this __stack_depot_get_next_stack_record(), so that we're clear this is more of an internal function not for general usage. > + struct list_head **bucket, > + struct stack_record **last_found); > + > /** > * stack_depot_fetch - Fetch a stack trace from stack depot > * > diff --git a/lib/stackdepot.c b/lib/stackdepot.c > index 197c355601f9..107bd0174cd6 100644 > --- a/lib/stackdepot.c > +++ b/lib/stackdepot.c > @@ -782,6 +782,52 @@ unsigned int stack_depot_get_extra_bits(depot_stack_handle_t handle) > } > EXPORT_SYMBOL(stack_depot_get_extra_bits); > > +struct stack_record *stack_depot_get_next_stack(unsigned long *table, > + struct list_head **curr_bucket, > + struct stack_record **last_found) > +{ > + struct list_head *bucket = *curr_bucket; > + unsigned long nr_table = *table; > + struct stack_record *found = NULL; > + unsigned long stack_table_entries = stack_hash_mask + 1; > + > + rcu_read_lock_sched_notrace(); We are returning pointers to stack_records out of the RCU-read critical section, which are then later used to continue the iteration. list_for_each_entry_continue_rcu() says this is fine if "... you held some sort of non-RCU reference (such as a reference count) ...". Updating the function's documentation to say none of these entries can be evicted via a stack_depot_put() is required. > + if (!bucket) { > + /* > + * Find a non-empty bucket. Once we have found it, > + * we will use list_for_each_entry_continue_rcu() on the next > + * call to keep walking the bucket. > + */ > +new_table: > + bucket = &stack_table[nr_table]; > + list_for_each_entry_rcu(found, bucket, hash_list) { > + goto out; > + } > + } else { > + /* Check whether we have more stacks in this bucket */ > + found = *last_found; > + list_for_each_entry_continue_rcu(found, bucket, hash_list) { > + goto out; > + } > + } > + > + /* No more stacks in this bucket, check the next one */ > + nr_table++; > + if (nr_table < stack_table_entries) > + goto new_table; > + > + /* We are done walking all buckets */ > + found = NULL; > + > +out: > + *table = nr_table; > + *curr_bucket = bucket; > + *last_found = found; > + rcu_read_unlock_sched_notrace(); > + > + return found; > +} > + > static int stats_show(struct seq_file *seq, void *v) > { > /* > diff --git a/mm/page_owner.c b/mm/page_owner.c > index 0adf41702b9d..aea212734557 100644 > --- a/mm/page_owner.c > +++ b/mm/page_owner.c > @@ -749,6 +749,89 @@ static const struct file_operations proc_page_owner_operations = { > .llseek = lseek_page_owner, > }; > > +struct stack_iterator { > + unsigned long nr_table; > + struct list_head *bucket; > + struct stack_record *last_stack; > +}; > + > +static void *stack_start(struct seq_file *m, loff_t *ppos) > +{ > + struct stack_iterator *iter = m->private; > + > + if (*ppos == -1UL) > + return NULL; > + > + return stack_depot_get_next_stack(&iter->nr_table, > + &iter->bucket, > + &iter->last_stack); > +} > + > +static void *stack_next(struct seq_file *m, void *v, loff_t *ppos) > +{ > + struct stack_iterator *iter = m->private; > + struct stack_record *stack; > + > + stack = stack_depot_get_next_stack(&iter->nr_table, > + &iter->bucket, > + &iter->last_stack); > + *ppos = stack ? *ppos + 1 : -1UL; > + > + return stack; > +} > + > +static int stack_print(struct seq_file *m, void *v) > +{ > + char *buf; > + int ret = 0; > + struct stack_iterator *iter = m->private; > + struct stack_record *stack = iter->last_stack; > + > + if (!stack->size || stack->size < 0 || refcount_read(&stack->count) < 2) > + return 0; > + > + buf = kzalloc(PAGE_SIZE, GFP_KERNEL); > + > + ret += stack_trace_snprint(buf, PAGE_SIZE, stack->entries, stack->size, > + 0); > + if (!ret) > + goto out; > + > + scnprintf(buf + ret, PAGE_SIZE - ret, "stack_count: %d\n\n", > + refcount_read(&stack->count)); > + > + seq_printf(m, buf); > + seq_puts(m, "\n\n"); > +out: > + kfree(buf); > + > + return 0; > +} > + > +static void stack_stop(struct seq_file *m, void *v) > +{ > +} > + > +static const struct seq_operations page_owner_stack_op = { > + .start = stack_start, > + .next = stack_next, > + .stop = stack_stop, > + .show = stack_print > +}; > + > +static int page_owner_stack_open(struct inode *inode, struct file *file) > +{ > + return seq_open_private(file, &page_owner_stack_op, > + sizeof(struct stack_iterator)); > +} > + > +const struct file_operations page_owner_stack_operations = { > + .open = page_owner_stack_open, > + .read = seq_read, > + .llseek = seq_lseek, > + .release = seq_release, > +}; > + > static int __init pageowner_init(void) > { > if (!static_branch_unlikely(&page_owner_inited)) { > @@ -758,6 +841,8 @@ static int __init pageowner_init(void) > > debugfs_create_file("page_owner", 0400, NULL, NULL, > &proc_page_owner_operations); > + debugfs_create_file("page_owner_stacks", 0400, NULL, NULL, > + &page_owner_stack_operations); > > return 0; > } > -- > 2.43.0 >
On Fri, Feb 09, 2024 at 09:00:40AM +0100, Marco Elver wrote: > > +/** > > + * stack_depot_get_next_stack - Returns all stacks, one at a time > > "Returns all stack_records" to be clear that this is returning the struct. Fixed. > > > + * > > + * @table: Current table we are checking > > + * @bucket: Current bucket we are checking > > + * @last_found: Last stack that was found > > + * > > + * This function finds first a non-empty bucket and returns the first stack > > + * stored in it. On consequent calls, it walks the bucket to see whether > > + * it contains more stacks. > > + * Once we have walked all the stacks in a bucket, we check > > + * the next one, and we repeat the same steps until we have checked all of them > > I think for this function it's important to say that no entry returned > from this function can be evicted. > > I.e. the easiest way to ensure this is that the caller makes sure the > entries returned are never passed to stack_depot_put() - which is > certainly the case for your usecase because you do not use > stack_depot_put(). > > > + * Return: A pointer a to stack_record struct, or NULL when we have walked all > > + * buckets. > > + */ > > +struct stack_record *stack_depot_get_next_stack(unsigned long *table, > > To keep consistent, I'd also call this > __stack_depot_get_next_stack_record(), so that we're clear this is > more of an internal function not for general usage. > > > + struct list_head **bucket, > > + struct stack_record **last_found); > > + > > /** > > * stack_depot_fetch - Fetch a stack trace from stack depot > > * > > diff --git a/lib/stackdepot.c b/lib/stackdepot.c > > index 197c355601f9..107bd0174cd6 100644 > > --- a/lib/stackdepot.c > > +++ b/lib/stackdepot.c > > @@ -782,6 +782,52 @@ unsigned int stack_depot_get_extra_bits(depot_stack_handle_t handle) > > } > > EXPORT_SYMBOL(stack_depot_get_extra_bits); > > > > +struct stack_record *stack_depot_get_next_stack(unsigned long *table, > > + struct list_head **curr_bucket, > > + struct stack_record **last_found) > > +{ > > + struct list_head *bucket = *curr_bucket; > > + unsigned long nr_table = *table; > > + struct stack_record *found = NULL; > > + unsigned long stack_table_entries = stack_hash_mask + 1; > > + > > + rcu_read_lock_sched_notrace(); > > We are returning pointers to stack_records out of the RCU-read > critical section, which are then later used to continue the iteration. > list_for_each_entry_continue_rcu() says this is fine if "... you held > some sort of non-RCU reference (such as a reference count) ...". > Updating the function's documentation to say none of these entries can > be evicted via a stack_depot_put() is required. Thinking about it some more, I think I made a mistake: I am walking all buckets, and within those buckets there are not only page_owner stack_records, which means that I could return a stack_record from e.g: KASAN (which I think can evict stack_records) and then everything goes off the rails. Which means I cannot walk the buckets like that. Actually, I think that having something like the following struct list_stack_records { struct stack_record *stack; struct list_stack_records *next; } in page_owner would make sense. Then the only thing I would have to do is to add a new record on every new stack_record, and then I could just walk the list like a linked list. Which means that the function stack_depot_get_next_stack() could be killed because everything would happen in page_owner code. e.g: static void inc_stack_record_count(depot_stack_handle_t handle) { struct stack_record *stack = __stack_depot_get_stack_record(handle); if (stack) { /* * New stack_record's that do not use STACK_DEPOT_FLAG_GET start * with REFCOUNT_SATURATED to catch spurious increments of their * refcount. * Since we do not use STACK_DEPOT_FLAG_{GET,PUT} API, let us * set a refcount of 1 ourselves. */ if (refcount_read(&stack->count) == REFCOUNT_SATURATED) { refcount_set(&stack->count, 1); add_new_stack_record_into_the_list(stack) } refcount_inc(&stack->count); } } and then just walk the list_stack_records list whenever we want to show the stacktraces and their counting. I think that overall this approach is cleaner and safer.
On Fri, Feb 09, 2024 at 10:52:48PM +0100, Oscar Salvador wrote: > Thinking about it some more, I think I made a mistake: > > I am walking all buckets, and within those buckets there are not only > page_owner stack_records, which means that I could return a stack_record > from e.g: KASAN (which I think can evict stack_records) and then > everything goes off the rails. > Which means I cannot walk the buckets like that. > > Actually, I think that having something like the following > > struct list_stack_records { > struct stack_record *stack; > struct list_stack_records *next; > } Or, I could use the extra_bits field from handle_parts to flag that when a depot_stack_handle_t is used by page_owner. Then __stack_depot_get_next_stack_record() would check whether a stack_record->handle.extra_bits has the page_owner bit, and only return those stacks that have such bit. This would solve the problem of returning a potentially evictable stack , only by returning page_owner's stack_records, and I would not have to maintain my own list. I yet have to see how that would look like, but sounds promising. Do you think that is feasible Marco? Thanks
Hi Oscar, kernel test robot noticed the following build warnings: [auto build test WARNING on akpm-mm/mm-nonmm-unstable] [also build test WARNING on linus/master v6.8-rc3] [cannot apply to akpm-mm/mm-everything next-20240209] [If your patch is applied to the wrong git tree, kindly drop us a note. And when submitting patch, we suggest to use '--base' as documented in https://git-scm.com/docs/git-format-patch#_base_tree_information] url: https://github.com/intel-lab-lkp/linux/commits/Oscar-Salvador/lib-stackdepot-Move-stack_record-struct-definition-into-the-header/20240209-074611 base: https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-nonmm-unstable patch link: https://lore.kernel.org/r/20240208234539.19113-4-osalvador%40suse.de patch subject: [PATCH v7 3/4] mm,page_owner: Display all stacks and their count config: x86_64-randconfig-121-20240209 (https://download.01.org/0day-ci/archive/20240210/202402100636.5lFPAlbB-lkp@intel.com/config) compiler: clang version 17.0.6 (https://github.com/llvm/llvm-project 6009708b4367171ccdbf4b5905cb6a803753fe18) reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240210/202402100636.5lFPAlbB-lkp@intel.com/reproduce) If you fix the issue in a separate patch/commit (i.e. not just a new version of the same patch/commit), kindly add following tags | Reported-by: kernel test robot <lkp@intel.com> | Closes: https://lore.kernel.org/oe-kbuild-all/202402100636.5lFPAlbB-lkp@intel.com/ sparse warnings: (new ones prefixed by >>) >> mm/page_owner.c:828:30: sparse: sparse: symbol 'page_owner_stack_operations' was not declared. Should it be static? mm/page_owner.c: note: in included file (through include/linux/mmzone.h, include/linux/gfp.h, include/linux/xarray.h, ...): include/linux/page-flags.h:242:46: sparse: sparse: self-comparison always evaluates to false vim +/page_owner_stack_operations +828 mm/page_owner.c 827 > 828 const struct file_operations page_owner_stack_operations = { 829 .open = page_owner_stack_open, 830 .read = seq_read, 831 .llseek = seq_lseek, 832 .release = seq_release, 833 }; 834
On Sat, 10 Feb 2024 at 00:13, Oscar Salvador <osalvador@suse.de> wrote: > > On Fri, Feb 09, 2024 at 10:52:48PM +0100, Oscar Salvador wrote: > > Thinking about it some more, I think I made a mistake: > > > > I am walking all buckets, and within those buckets there are not only > > page_owner stack_records, which means that I could return a stack_record > > from e.g: KASAN (which I think can evict stack_records) and then > > everything goes off the rails. > > Which means I cannot walk the buckets like that. > > > > Actually, I think that having something like the following > > > > struct list_stack_records { > > struct stack_record *stack; > > struct list_stack_records *next; > > } > > Or, I could use the extra_bits field from handle_parts to flag that > when a depot_stack_handle_t is used by page_owner. > > Then __stack_depot_get_next_stack_record() would check whether > a stack_record->handle.extra_bits has the page_owner bit, and only > return those stacks that have such bit. > This would solve the problem of returning a potentially evictable stack > , only by returning page_owner's stack_records, and I would not have > to maintain my own list. > > I yet have to see how that would look like, but sounds promising. > Do you think that is feasible Marco? The extra bits are used by KMSAN, and might conflict if enabled at the same time. I think the safest option is to keep your own list. I think that will also be more performant if there are other stackdepot users because you do not have to traverse any of the other entries.
On Sat, Feb 10, 2024 at 08:52:25AM +0100, Marco Elver wrote: > The extra bits are used by KMSAN, and might conflict if enabled at the > same time. I think the safest option is to keep your own list. I think > that will also be more performant if there are other stackdepot users > because you do not have to traverse any of the other entries. Ok, I thought we had spare bits for other users. But thinking about it some more, yes, it makes sense for page_owner to maintain its own list, so traversing it is faster and we do not have to place code to traverse the buckets in stackdepot.
On 2/11/24 21:39, Oscar Salvador wrote: > On Sat, Feb 10, 2024 at 08:52:25AM +0100, Marco Elver wrote: >> The extra bits are used by KMSAN, and might conflict if enabled at the >> same time. I think the safest option is to keep your own list. I think >> that will also be more performant if there are other stackdepot users >> because you do not have to traverse any of the other entries. > > Ok, I thought we had spare bits for other users. > But thinking about it some more, yes, it makes sense for page_owner to > maintain its own list, so traversing it is faster and we do not have > to place code to traverse the buckets in stackdepot. Would it make sense to introduce per-user stack depot instances? ("user" being a subsystem i.e. kasan or page_owner). I'd expect each to have a distinct set of stacks, so there's no benefits of using the same hash table, only downsides of longer collision lists? I can imagine this would be easier for users that don't need the early init kind of stackdepot, but maybe even there it could be feasible to have a small fixed size array of hash table roots and every user would get a separate index?
diff --git a/include/linux/stackdepot.h b/include/linux/stackdepot.h index ac62de4d4999..d851ec821e6f 100644 --- a/include/linux/stackdepot.h +++ b/include/linux/stackdepot.h @@ -183,6 +183,26 @@ depot_stack_handle_t stack_depot_save(unsigned long *entries, */ struct stack_record *stack_depot_get_stack(depot_stack_handle_t handle); +/** + * stack_depot_get_next_stack - Returns all stacks, one at a time + * + * @table: Current table we are checking + * @bucket: Current bucket we are checking + * @last_found: Last stack that was found + * + * This function finds first a non-empty bucket and returns the first stack + * stored in it. On consequent calls, it walks the bucket to see whether + * it contains more stacks. + * Once we have walked all the stacks in a bucket, we check + * the next one, and we repeat the same steps until we have checked all of them + * + * Return: A pointer a to stack_record struct, or NULL when we have walked all + * buckets. + */ +struct stack_record *stack_depot_get_next_stack(unsigned long *table, + struct list_head **bucket, + struct stack_record **last_found); + /** * stack_depot_fetch - Fetch a stack trace from stack depot * diff --git a/lib/stackdepot.c b/lib/stackdepot.c index 197c355601f9..107bd0174cd6 100644 --- a/lib/stackdepot.c +++ b/lib/stackdepot.c @@ -782,6 +782,52 @@ unsigned int stack_depot_get_extra_bits(depot_stack_handle_t handle) } EXPORT_SYMBOL(stack_depot_get_extra_bits); +struct stack_record *stack_depot_get_next_stack(unsigned long *table, + struct list_head **curr_bucket, + struct stack_record **last_found) +{ + struct list_head *bucket = *curr_bucket; + unsigned long nr_table = *table; + struct stack_record *found = NULL; + unsigned long stack_table_entries = stack_hash_mask + 1; + + rcu_read_lock_sched_notrace(); + if (!bucket) { + /* + * Find a non-empty bucket. Once we have found it, + * we will use list_for_each_entry_continue_rcu() on the next + * call to keep walking the bucket. + */ +new_table: + bucket = &stack_table[nr_table]; + list_for_each_entry_rcu(found, bucket, hash_list) { + goto out; + } + } else { + /* Check whether we have more stacks in this bucket */ + found = *last_found; + list_for_each_entry_continue_rcu(found, bucket, hash_list) { + goto out; + } + } + + /* No more stacks in this bucket, check the next one */ + nr_table++; + if (nr_table < stack_table_entries) + goto new_table; + + /* We are done walking all buckets */ + found = NULL; + +out: + *table = nr_table; + *curr_bucket = bucket; + *last_found = found; + rcu_read_unlock_sched_notrace(); + + return found; +} + static int stats_show(struct seq_file *seq, void *v) { /* diff --git a/mm/page_owner.c b/mm/page_owner.c index 0adf41702b9d..aea212734557 100644 --- a/mm/page_owner.c +++ b/mm/page_owner.c @@ -749,6 +749,89 @@ static const struct file_operations proc_page_owner_operations = { .llseek = lseek_page_owner, }; +struct stack_iterator { + unsigned long nr_table; + struct list_head *bucket; + struct stack_record *last_stack; +}; + +static void *stack_start(struct seq_file *m, loff_t *ppos) +{ + struct stack_iterator *iter = m->private; + + if (*ppos == -1UL) + return NULL; + + return stack_depot_get_next_stack(&iter->nr_table, + &iter->bucket, + &iter->last_stack); +} + +static void *stack_next(struct seq_file *m, void *v, loff_t *ppos) +{ + struct stack_iterator *iter = m->private; + struct stack_record *stack; + + stack = stack_depot_get_next_stack(&iter->nr_table, + &iter->bucket, + &iter->last_stack); + *ppos = stack ? *ppos + 1 : -1UL; + + return stack; +} + +static int stack_print(struct seq_file *m, void *v) +{ + char *buf; + int ret = 0; + struct stack_iterator *iter = m->private; + struct stack_record *stack = iter->last_stack; + + if (!stack->size || stack->size < 0 || refcount_read(&stack->count) < 2) + return 0; + + buf = kzalloc(PAGE_SIZE, GFP_KERNEL); + + ret += stack_trace_snprint(buf, PAGE_SIZE, stack->entries, stack->size, + 0); + if (!ret) + goto out; + + scnprintf(buf + ret, PAGE_SIZE - ret, "stack_count: %d\n\n", + refcount_read(&stack->count)); + + seq_printf(m, buf); + seq_puts(m, "\n\n"); +out: + kfree(buf); + + return 0; +} + +static void stack_stop(struct seq_file *m, void *v) +{ +} + +static const struct seq_operations page_owner_stack_op = { + .start = stack_start, + .next = stack_next, + .stop = stack_stop, + .show = stack_print +}; + +static int page_owner_stack_open(struct inode *inode, struct file *file) +{ + return seq_open_private(file, &page_owner_stack_op, + sizeof(struct stack_iterator)); +} + +const struct file_operations page_owner_stack_operations = { + .open = page_owner_stack_open, + .read = seq_read, + .llseek = seq_lseek, + .release = seq_release, +}; + static int __init pageowner_init(void) { if (!static_branch_unlikely(&page_owner_inited)) { @@ -758,6 +841,8 @@ static int __init pageowner_init(void) debugfs_create_file("page_owner", 0400, NULL, NULL, &proc_page_owner_operations); + debugfs_create_file("page_owner_stacks", 0400, NULL, NULL, + &page_owner_stack_operations); return 0; }