Message ID | 170786028128.11135.4581426129369576567.stgit@91.116.238.104.host.secureserver.net
State | New
Headers |
Subject: [PATCH RFC 6/7] libfs: Convert simple directory offsets to use a Maple Tree
From: Chuck Lever <cel@kernel.org>
To: viro@zeniv.linux.org.uk, brauner@kernel.org, jack@suse.cz, hughd@google.com, akpm@linux-foundation.org, Liam.Howlett@oracle.com, oliver.sang@intel.com, feng.tang@intel.com
Cc: linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org, maple-tree@lists.infradead.org, linux-mm@kvack.org, lkp@intel.com
Date: Tue, 13 Feb 2024 16:38:01 -0500
Message-ID: <170786028128.11135.4581426129369576567.stgit@91.116.238.104.host.secureserver.net>
In-Reply-To: <170785993027.11135.8830043889278631735.stgit@91.116.238.104.host.secureserver.net>
References: <170785993027.11135.8830043889278631735.stgit@91.116.238.104.host.secureserver.net>
Series | Use Maple Trees for simple_offset utilities
Commit Message
Chuck Lever
Feb. 13, 2024, 9:38 p.m. UTC
From: Chuck Lever <chuck.lever@oracle.com>

Test robot reports:
> kernel test robot noticed a -19.0% regression of aim9.disk_src.ops_per_sec on:
>
> commit: a2e459555c5f9da3e619b7e47a63f98574dc75f1 ("shmem: stable directory offsets")
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master

Feng Tang further clarifies that:
> ... the new simple_offset_add()
> called by shmem_mknod() brings extra cost related with slab,
> specifically the 'radix_tree_node', which cause the regression.

Willy's analysis is that, over time, the test workload causes
xa_alloc_cyclic() to fragment the underlying SLAB cache. This patch
replaces the offset_ctx's xarray with a Maple Tree in the hope that
Maple Tree's dense node mode will handle this scenario more scalably.

In addition, we can widen the directory offset to an unsigned long
everywhere.

Suggested-by: Matthew Wilcox <willy@infradead.org>
Reported-by: kernel test robot <oliver.sang@intel.com>
Closes: https://lore.kernel.org/oe-lkp/202309081306.3ecb3734-oliver.sang@intel.com
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 fs/libfs.c         | 53 ++++++++++++++++++++++++++--------------------------
 include/linux/fs.h |  5 +++--
 2 files changed, 29 insertions(+), 29 deletions(-)
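The patch hunks themselves are not reproduced on this page, but the shape of
the conversion is small. Below is a minimal sketch — not the applied patch —
assuming the mtree_alloc_cyclic() helper added earlier in this series, and
assuming offset_ctx carries a struct maple_tree mt and an unsigned long
next_offset in place of the xarray; the offset is stashed in
dentry->d_fsdata as the simple_offset helpers do:

/*
 * Sketch only: approximates the xarray -> Maple Tree conversion
 * described above, not the applied hunks.
 */
#include <linux/maple_tree.h>
#include <linux/dcache.h>

static int sketch_offset_add(struct offset_ctx *octx, struct dentry *dentry)
{
	unsigned long offset;
	int ret;

	/*
	 * Cyclic allocation avoids immediately reusing a just-freed
	 * offset, so concurrent readdir cursors stay stable; the
	 * next_offset hint is where the cycle resumes.
	 */
	ret = mtree_alloc_cyclic(&octx->mt, &offset, dentry, DIR_OFFSET_MIN,
				 ULONG_MAX, &octx->next_offset, GFP_KERNEL);
	if (ret < 0)
		return ret;

	dentry->d_fsdata = (void *)offset;
	return 0;
}

static void sketch_offset_remove(struct offset_ctx *octx, struct dentry *dentry)
{
	mtree_erase(&octx->mt, (unsigned long)dentry->d_fsdata);
	dentry->d_fsdata = NULL;
}

The upper bound on the offset space is exactly the detail the review below
questions.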
Comments
On Tue 13-02-24 16:38:01, Chuck Lever wrote:
> From: Chuck Lever <chuck.lever@oracle.com>
>
> [... full patch description quoted ...]
>
> Suggested-by: Matthew Wilcox <willy@infradead.org>
> Reported-by: kernel test robot <oliver.sang@intel.com>
> Closes: https://lore.kernel.org/oe-lkp/202309081306.3ecb3734-oliver.sang@intel.com
> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

OK, but this will need the performance numbers. Otherwise we have no
idea whether this is worth it or not. Maybe you can ask Oliver Sang?
Usually 0-day guys are quite helpful.

> @@ -330,9 +329,9 @@ int simple_offset_empty(struct dentry *dentry)
>  	if (!inode || !S_ISDIR(inode->i_mode))
>  		return ret;
>
> -	index = 2;
> +	index = DIR_OFFSET_MIN;

This bit should go into the simple_offset_empty() patch...

> @@ -434,15 +433,15 @@ static loff_t offset_dir_llseek(struct file *file, loff_t offset, int whence)
>
>  	/* In this case, ->private_data is protected by f_pos_lock */
>  	file->private_data = NULL;
> -	return vfs_setpos(file, offset, U32_MAX);
> +	return vfs_setpos(file, offset, MAX_LFS_FILESIZE);
                                        ^^^
Why this? It is ULONG_MAX << PAGE_SHIFT on 32-bit so that doesn't seem
quite right? Why not use ULONG_MAX here directly?

Otherwise the patch looks good to me.

								Honza
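For reference on Honza's objection: MAX_LFS_FILESIZE is defined in
include/linux/fs.h (mainline, around v6.8) as:

#if BITS_PER_LONG == 32
#define MAX_LFS_FILESIZE	((loff_t)ULONG_MAX << PAGE_SHIFT)
#elif BITS_PER_LONG == 64
#define MAX_LFS_FILESIZE	((loff_t)LLONG_MAX)
#endif

On 32-bit it is a page-shifted value far larger than ULONG_MAX, so it is not
a faithful cap for an offset that is now an unsigned long — which is the
point being raised.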
On Thu, Feb 15, 2024 at 02:06:01PM +0100, Jan Kara wrote:
> On Tue 13-02-24 16:38:01, Chuck Lever wrote:
> > [... full patch description quoted ...]
>
> OK, but this will need the performance numbers.

Yes, I totally concur. The point of this posting was to get some
early review and start the ball rolling.

Actually we expect roughly the same performance numbers now. "Dense
node" support in Maple Tree is supposed to be the real win, but
I'm not sure it's ready yet.

> Otherwise we have no idea
> whether this is worth it or not. Maybe you can ask Oliver Sang? Usually
> 0-day guys are quite helpful.

Oliver and Feng were copied on this series.

> > @@ -330,9 +329,9 @@ int simple_offset_empty(struct dentry *dentry)
> > [...]
> > -	index = 2;
> > +	index = DIR_OFFSET_MIN;
>
> This bit should go into the simple_offset_empty() patch...
>
> > @@ -434,15 +433,15 @@ static loff_t offset_dir_llseek(struct file *file, loff_t offset, int whence)
> > [...]
> > -	return vfs_setpos(file, offset, U32_MAX);
> > +	return vfs_setpos(file, offset, MAX_LFS_FILESIZE);
> ^^^
> Why this? It is ULONG_MAX << PAGE_SHIFT on 32-bit so that doesn't seem
> quite right? Why not use ULONG_MAX here directly?

I initially changed U32_MAX to ULONG_MAX, but for some reason, the
length checking in vfs_setpos() fails. There is probably a sign
extension thing happening here that I don't understand.

> Otherwise the patch looks good to me.

As always, thank you for your review.
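The sign-extension suspicion is right, and it is an ordinary integer
conversion rather than anything exotic: vfs_setpos() takes its maxsize
argument as a signed loff_t, so ULONG_MAX converts to -1 on a 64-bit build,
and every non-negative offset then fails the upper-bound check. A small
userspace sketch of the effect (the comparison paraphrases the
offset > maxsize test in fs/read_write.c):

#include <stdio.h>
#include <limits.h>

int main(void)
{
	/* long long stands in for the kernel's signed 64-bit loff_t */
	long long maxsize = ULONG_MAX;	/* converts to -1 on LP64 */
	long long offset = 42;

	printf("maxsize as loff_t: %lld\n", maxsize);
	/* vfs_setpos() returns -EINVAL when offset > maxsize, so with
	 * maxsize == -1 every non-negative offset is "too large" */
	printf("42 > maxsize? %s\n", offset > maxsize ? "yes -> -EINVAL" : "no");
	return 0;
}

Jan's reply below confirms this reading.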
On Thu 15-02-24 08:45:33, Chuck Lever wrote:
> On Thu, Feb 15, 2024 at 02:06:01PM +0100, Jan Kara wrote:
> > [... earlier discussion trimmed ...]
> >
> > > -	return vfs_setpos(file, offset, U32_MAX);
> > > +	return vfs_setpos(file, offset, MAX_LFS_FILESIZE);
> > ^^^
> > Why this? It is ULONG_MAX << PAGE_SHIFT on 32-bit so that doesn't seem
> > quite right? Why not use ULONG_MAX here directly?
>
> I initially changed U32_MAX to ULONG_MAX, but for some reason, the
> length checking in vfs_setpos() fails. There is probably a sign
> extension thing happening here that I don't understand.

Right. loff_t is signed (long long). So I think you should make the
'offset' be long instead of unsigned long and allow values 0..LONG_MAX?
Then you can pass LONG_MAX here. You potentially lose half of the
usable offsets on 32-bit userspace with 64-bit file offsets but who
cares I guess?

								Honza
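Rendered against the hunk quoted earlier, Jan's suggestion would look
roughly like this (a hypothetical rendering, not a posted patch; the switch
body paraphrases offset_dir_llseek() in fs/libfs.c):

static loff_t offset_dir_llseek(struct file *file, loff_t offset, int whence)
{
	switch (whence) {
	case SEEK_CUR:
		offset += file->f_pos;
		fallthrough;
	case SEEK_SET:
		if (offset >= 0)
			break;
		fallthrough;
	default:
		return -EINVAL;
	}

	/* In this case, ->private_data is protected by f_pos_lock */
	file->private_data = NULL;

	/*
	 * Keep directory offsets in 0..LONG_MAX so they always fit in a
	 * non-negative loff_t; LONG_MAX is then a sound bound (unlike
	 * ULONG_MAX, which converts to -1, or MAX_LFS_FILESIZE, which
	 * overshoots ULONG_MAX on 32-bit).
	 */
	return vfs_setpos(file, offset, LONG_MAX);
}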
On Thu, Feb 15, 2024 at 08:45:33AM -0500, Chuck Lever wrote:
> On Thu, Feb 15, 2024 at 02:06:01PM +0100, Jan Kara wrote:
> > [... patch description and review trimmed ...]
> >
> > OK, but this will need the performance numbers.
>
> Yes, I totally concur. The point of this posting was to get some
> early review and start the ball rolling.
>
> Actually we expect roughly the same performance numbers now. "Dense
> node" support in Maple Tree is supposed to be the real win, but
> I'm not sure it's ready yet.

I keep repeating this, but we need a better way to request performance
tests for specific series/branches. Maybe I can add a vfs.perf branch
where we can put patches that we suspect have positive/negative perf
impact and that the perf bot can pull in. I know that's a fuzzy
boundary, but for stuff like this, where we already know that there's a
perf impact that's important for us, it would really help. Because it is
royally annoying to get a perf regression report when a patch has
already been in -next for a long time and the merge window is coming up,
or we have already merged the stuff.
hi, Chuck Lever,

On Thu, Feb 15, 2024 at 08:45:33AM -0500, Chuck Lever wrote:
> On Thu, Feb 15, 2024 at 02:06:01PM +0100, Jan Kara wrote:
> > [... earlier discussion trimmed ...]
> >
> > Otherwise we have no idea
> > whether this is worth it or not. Maybe you can ask Oliver Sang? Usually
> > 0-day guys are quite helpful.
>
> Oliver and Feng were copied on this series.

We were on holiday last week; now we are back.

I noticed there is a v2 of this patch set:
https://lore.kernel.org/all/170820145616.6328.12620992971699079156.stgit@91.116.238.104.host.secureserver.net/

and that you also put it in a branch:
https://git.kernel.org/pub/scm/linux/kernel/git/cel/linux.git
"simple-offset-maple" branch

We will test aim9 performance based on this branch. Thanks!
On Sun, Feb 18, 2024 at 10:02:37AM +0800, Oliver Sang wrote:
> hi, Chuck Lever,
>
> [... earlier discussion trimmed ...]
>
> We will test aim9 performance based on this branch. Thanks!

Very much appreciated!
hi, Chuck Lever,

On Sun, Feb 18, 2024 at 10:57:07AM -0500, Chuck Lever wrote:
> On Sun, Feb 18, 2024 at 10:02:37AM +0800, Oliver Sang wrote:
> > [... earlier discussion trimmed ...]
> >
> > We will test aim9 performance based on this branch. Thanks!
>
> Very much appreciated!

Always our pleasure! We've already sent out a report [1] to you for
commit a616bc6667 in the new branch. We saw an 11.8% improvement of
aim9.disk_src.ops_per_sec on it compared to its parent, so the
regression we saw on a2e459555c is 'half' recovered.

Since I noticed that the performance for a2e459555c, v6.8-rc4, and
f3f24869a1 (the parent of a616bc6667) is very similar, I left the
results for v6.8-rc4 and f3f24869a1 out of the tables below for
brevity. If you want a full table, please let me know. Thanks!
summary for aim9.disk_src.ops_per_sec: ========================================================================================= compiler/cpufreq_governor/kconfig/rootfs/tbox_group/test/testcase/testtime: gcc-12/performance/x86_64-rhel-8.3/debian-11.1-x86_64-20220510.cgz/lkp-ivb-2ep1/disk_src/aim9/300s commit: 23a31d8764 ("shmem: Refactor shmem_symlink()") a2e459555c ("shmem: stable directory offsets") a616bc6667 ("libfs: Convert simple directory offsets to use a Maple Tree") 23a31d87645c6527 a2e459555c5f9da3e619b7e47a6 a616bc666748063733c62e15ea4 ---------------- --------------------------- --------------------------- %stddev %change %stddev %change %stddev \ | \ | \ 202424 -19.0% 163868 -9.3% 183678 aim9.disk_src.ops_per_sec full data: ========================================================================================= compiler/cpufreq_governor/kconfig/rootfs/tbox_group/test/testcase/testtime: gcc-12/performance/x86_64-rhel-8.3/debian-11.1-x86_64-20220510.cgz/lkp-ivb-2ep1/disk_src/aim9/300s commit: 23a31d8764 ("shmem: Refactor shmem_symlink()") a2e459555c ("shmem: stable directory offsets") a616bc6667 ("libfs: Convert simple directory offsets to use a Maple Tree") 23a31d87645c6527 a2e459555c5f9da3e619b7e47a6 a616bc666748063733c62e15ea4 ---------------- --------------------------- --------------------------- %stddev %change %stddev %change %stddev \ | \ | \ 1404 +0.6% 1412 +3.3% 1450 boot-time.idle 0.26 ± 9% +0.1 0.36 ± 2% -0.1 0.20 ± 4% mpstat.cpu.all.soft% 0.61 -0.1 0.52 -0.0 0.58 mpstat.cpu.all.usr% 1.00 +0.0% 1.00 +19.5% 1.19 ± 2% vmstat.procs.r 1525 ± 6% -5.6% 1440 +9.4% 1668 ± 4% vmstat.system.cs 52670 +5.0% 55323 ± 3% +12.7% 59381 ± 3% turbostat.C1 0.02 +0.0 0.02 +0.0 0.03 ± 10% turbostat.C1% 12949 ± 9% -1.2% 12795 ± 6% -19.6% 10412 ± 6% turbostat.POLL 115468 ± 24% +11.9% 129258 ± 7% +16.5% 134545 ± 5% numa-meminfo.node0.AnonPages.max 2015 ± 12% -7.5% 1864 ± 5% +30.3% 2624 ± 10% numa-meminfo.node0.PageTables 4795 ± 30% +1.1% 4846 ± 37% +117.0% 10405 ± 21% numa-meminfo.node0.Shmem 6442 ± 5% -0.6% 6401 ± 4% -19.6% 5180 ± 7% numa-meminfo.node1.KernelStack 13731 ± 92% -88.7% 1546 ± 6% +161.6% 35915 ± 23% time.involuntary_context_switches 94.83 -4.2% 90.83 +1.2% 96.00 time.percent_of_cpu_this_job_got 211.64 +0.5% 212.70 +4.0% 220.06 time.system_time 73.62 -17.6% 60.69 -6.4% 68.94 time.user_time 202424 -19.0% 163868 -9.3% 183678 aim9.disk_src.ops_per_sec 13731 ± 92% -88.7% 1546 ± 6% +161.6% 35915 ± 23% aim9.time.involuntary_context_switches 94.83 -4.2% 90.83 +1.2% 96.00 aim9.time.percent_of_cpu_this_job_got 211.64 +0.5% 212.70 +4.0% 220.06 aim9.time.system_time 73.62 -17.6% 60.69 -6.4% 68.94 aim9.time.user_time 174558 ± 4% -1.0% 172852 ± 7% -19.7% 140084 ± 7% meminfo.DirectMap4k 94166 +6.6% 100388 -14.4% 80579 meminfo.KReclaimable 12941 +0.4% 12989 -14.4% 11078 meminfo.KernelStack 3769 +0.2% 3775 +31.5% 4955 meminfo.PageTables 79298 +0.0% 79298 -70.6% 23298 meminfo.Percpu 94166 +6.6% 100388 -14.4% 80579 meminfo.SReclaimable 204209 +2.9% 210111 -9.6% 184661 meminfo.Slab 503.33 ± 12% -7.5% 465.67 ± 5% +30.4% 656.39 ± 10% numa-vmstat.node0.nr_page_table_pages 1198 ± 30% +1.1% 1211 ± 37% +117.0% 2601 ± 21% numa-vmstat.node0.nr_shmem 220.33 +0.2% 220.83 ± 2% -97.3% 6.00 ±100% numa-vmstat.node1.nr_active_file 167.00 +0.2% 167.33 ± 2% -87.7% 20.48 ±100% numa-vmstat.node1.nr_inactive_file 6443 ± 5% -0.6% 6405 ± 4% -19.5% 5184 ± 7% numa-vmstat.node1.nr_kernel_stack 220.33 +0.2% 220.83 ± 2% -97.3% 6.00 ±100% numa-vmstat.node1.nr_zone_active_file 167.00 +0.2% 167.33 ± 2% -87.7% 20.48 ±100% 
numa-vmstat.node1.nr_zone_inactive_file 0.04 ± 25% +4.1% 0.05 ± 33% -40.1% 0.03 ± 34% perf-sched.sch_delay.avg.ms.syslog_print.do_syslog.kmsg_read.vfs_read 0.01 ± 4% +1.6% 0.01 ± 8% +136.9% 0.02 ± 29% perf-sched.sch_delay.avg.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm 0.16 ± 10% -18.9% 0.13 ± 12% -16.4% 0.13 ± 15% perf-sched.sch_delay.max.ms.pipe_read.vfs_read.ksys_read.do_syscall_64 0.14 ± 15% -6.0% 0.14 ± 20% -30.4% 0.10 ± 26% perf-sched.sch_delay.max.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone 0.04 ± 7% +1802.4% 0.78 ±115% +7258.6% 3.00 ± 56% perf-sched.sch_delay.max.ms.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm 0.05 ± 40% -14.5% 0.04 ± 31% +6155.4% 3.09 perf-sched.sch_delay.max.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm 0.01 ± 8% +25.5% 0.01 ± 21% +117.0% 0.02 ± 30% perf-sched.total_sch_delay.average.ms 0.16 ± 11% +1204.9% 2.12 ± 90% +2225.8% 3.77 ± 10% perf-sched.total_sch_delay.max.ms 58.50 ± 28% -4.8% 55.67 ± 14% -23.9% 44.50 ± 4% perf-sched.wait_and_delay.count.schedule_hrtimeout_range_clock.usleep_range_state.ipmi_thread.kthread 277.83 ± 3% +10.3% 306.50 ± 5% +17.9% 327.62 ± 4% perf-sched.wait_and_delay.count.worker_thread.kthread.ret_from_fork.ret_from_fork_asm 0.10 ± 75% -47.1% 0.05 ± 86% -100.0% 0.00 perf-sched.wait_time.avg.ms.__cond_resched.dput.path_put.vfs_statx.vfs_fstatat 0.10 ± 72% -10.8% 0.09 ± 67% -100.0% 0.00 perf-sched.wait_time.avg.ms.exit_to_user_mode_loop.exit_to_user_mode_prepare.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt 2.00 ± 27% -4.8% 1.90 ± 14% -23.7% 1.53 ± 4% perf-sched.wait_time.avg.ms.schedule_hrtimeout_range_clock.do_select.core_sys_select.kern_select 0.16 ± 81% +20.1% 0.19 ±104% -100.0% 0.00 perf-sched.wait_time.max.ms.__cond_resched.dput.path_put.vfs_statx.vfs_fstatat 0.22 ± 77% +14.2% 0.25 ± 54% -100.0% 0.00 perf-sched.wait_time.max.ms.exit_to_user_mode_loop.exit_to_user_mode_prepare.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt 7.32 ± 43% -10.3% 6.57 ± 20% -33.2% 4.89 ± 6% perf-sched.wait_time.max.ms.schedule_hrtimeout_range_clock.do_select.core_sys_select.kern_select 220.33 +0.2% 220.83 ± 2% -94.6% 12.00 proc-vmstat.nr_active_file 676766 -0.0% 676706 +4.1% 704377 proc-vmstat.nr_file_pages 167.00 +0.2% 167.33 ± 2% -75.5% 40.98 proc-vmstat.nr_inactive_file 12941 +0.4% 12988 -14.4% 11083 proc-vmstat.nr_kernel_stack 942.50 +0.1% 943.50 +31.4% 1238 proc-vmstat.nr_page_table_pages 7915 ± 3% -0.8% 7855 +8.6% 8599 proc-vmstat.nr_shmem 23541 +6.5% 25074 -14.4% 20144 proc-vmstat.nr_slab_reclaimable 27499 -0.3% 27422 -5.4% 26016 proc-vmstat.nr_slab_unreclaimable 668461 +0.0% 668461 +4.2% 696493 proc-vmstat.nr_unevictable 220.33 +0.2% 220.83 ± 2% -94.6% 12.00 proc-vmstat.nr_zone_active_file 167.00 +0.2% 167.33 ± 2% -75.5% 40.98 proc-vmstat.nr_zone_inactive_file 668461 +0.0% 668461 +4.2% 696493 proc-vmstat.nr_zone_unevictable 722.17 ± 71% -34.7% 471.67 ±136% -98.3% 12.62 ±129% proc-vmstat.numa_hint_faults_local 1437319 ± 24% +377.6% 6864201 -47.8% 750673 ± 7% proc-vmstat.numa_hit 1387016 ± 25% +391.4% 6815486 -49.5% 700947 ± 7% proc-vmstat.numa_local 50329 -0.0% 50324 -1.2% 49704 proc-vmstat.numa_other 4864362 ± 34% +453.6% 26931180 -66.1% 1648373 ± 17% proc-vmstat.pgalloc_normal 4835960 ± 34% +455.4% 26856610 -66.3% 1628178 ± 18% proc-vmstat.pgfree 11.21 +23.7% 13.87 -81.8% 2.04 perf-stat.i.MPKI 7.223e+08 -4.4% 6.907e+08 -3.9% 6.94e+08 perf-stat.i.branch-instructions 2.67 +0.2 2.88 +0.0 2.70 perf-stat.i.branch-miss-rate% 19988363 +2.8% 20539702 -3.1% 
19363031 perf-stat.i.branch-misses 17.36 -2.8 14.59 +0.4 17.77 perf-stat.i.cache-miss-rate% 40733859 +19.5% 48659982 -1.9% 39962840 perf-stat.i.cache-references 1482 ± 7% -5.7% 1398 +9.6% 1623 ± 5% perf-stat.i.context-switches 1.76 +3.5% 1.82 +5.1% 1.85 perf-stat.i.cpi 55.21 +5.4% 58.21 ± 2% +0.4% 55.45 perf-stat.i.cpu-migrations 16524721 ± 8% -4.8% 15726404 ± 4% -13.1% 14367627 ± 4% perf-stat.i.dTLB-load-misses 1.01e+09 -3.8% 9.719e+08 -4.4% 9.659e+08 perf-stat.i.dTLB-loads 0.26 ± 4% -0.0 0.23 ± 3% -0.0 0.25 ± 3% perf-stat.i.dTLB-store-miss-rate% 2166022 ± 4% -6.9% 2015917 ± 3% -7.0% 2014037 ± 3% perf-stat.i.dTLB-store-misses 8.503e+08 +5.5% 8.968e+08 -3.5% 8.205e+08 perf-stat.i.dTLB-stores 69.22 ± 4% +6.4 75.60 +14.4 83.60 ± 3% perf-stat.i.iTLB-load-miss-rate% 709457 ± 5% -5.4% 670950 ± 2% +133.6% 1657233 perf-stat.i.iTLB-load-misses 316455 ± 12% -31.6% 216531 ± 3% +3.5% 327592 ± 19% perf-stat.i.iTLB-loads 3.722e+09 -3.1% 3.608e+09 -4.6% 3.553e+09 perf-stat.i.instructions 5243 ± 5% +2.2% 5357 ± 2% -59.1% 2142 perf-stat.i.instructions-per-iTLB-miss 0.57 -3.3% 0.55 -4.8% 0.54 perf-stat.i.ipc 865.04 -10.4% 775.02 ± 3% -2.5% 843.12 perf-stat.i.metric.K/sec 53.84 -0.4% 53.61 -4.0% 51.71 perf-stat.i.metric.M/sec 47.51 -2.1 45.37 +0.7 48.17 perf-stat.i.node-load-miss-rate% 88195 ± 3% +5.2% 92745 ± 4% +16.4% 102647 ± 6% perf-stat.i.node-load-misses 106705 ± 3% +14.8% 122490 ± 5% +13.2% 120774 ± 5% perf-stat.i.node-loads 107169 ± 4% +29.0% 138208 ± 7% +7.5% 115217 ± 5% perf-stat.i.node-stores 10.94 +23.3% 13.49 -81.8% 1.99 perf-stat.overall.MPKI 2.77 +0.2 2.97 +0.0 2.79 perf-stat.overall.branch-miss-rate% 17.28 -2.7 14.56 +0.4 17.67 perf-stat.overall.cache-miss-rate% 1.73 +3.4% 1.79 +5.0% 1.82 perf-stat.overall.cpi 0.25 ± 4% -0.0 0.22 ± 3% -0.0 0.24 ± 3% perf-stat.overall.dTLB-store-miss-rate% 69.20 ± 4% +6.4 75.60 +14.4 83.58 ± 3% perf-stat.overall.iTLB-load-miss-rate% 5260 ± 5% +2.3% 5380 ± 2% -59.2% 2144 perf-stat.overall.instructions-per-iTLB-miss 0.58 -3.2% 0.56 -4.7% 0.55 perf-stat.overall.ipc 45.25 -2.2 43.10 +0.7 45.93 perf-stat.overall.node-load-miss-rate% 7.199e+08 -4.4% 6.883e+08 -3.9% 6.917e+08 perf-stat.ps.branch-instructions 19919808 +2.8% 20469001 -3.1% 19299968 perf-stat.ps.branch-misses 40597326 +19.5% 48497201 -1.9% 39829580 perf-stat.ps.cache-references 1477 ± 7% -5.7% 1393 +9.6% 1618 ± 5% perf-stat.ps.context-switches 55.06 +5.4% 58.03 ± 2% +0.5% 55.32 perf-stat.ps.cpu-migrations 16469488 ± 8% -4.8% 15673772 ± 4% -13.1% 14319828 ± 4% perf-stat.ps.dTLB-load-misses 1.007e+09 -3.8% 9.686e+08 -4.4% 9.627e+08 perf-stat.ps.dTLB-loads 2158768 ± 4% -6.9% 2009174 ± 3% -7.0% 2007326 ± 3% perf-stat.ps.dTLB-store-misses 8.475e+08 +5.5% 8.937e+08 -3.5% 8.178e+08 perf-stat.ps.dTLB-stores 707081 ± 5% -5.4% 668703 ± 2% +133.6% 1651678 perf-stat.ps.iTLB-load-misses 315394 ± 12% -31.6% 215816 ± 3% +3.5% 326463 ± 19% perf-stat.ps.iTLB-loads 3.71e+09 -3.1% 3.595e+09 -4.5% 3.541e+09 perf-stat.ps.instructions 87895 ± 3% +5.2% 92424 ± 4% +16.4% 102341 ± 6% perf-stat.ps.node-load-misses 106351 ± 3% +14.8% 122083 ± 5% +13.2% 120439 ± 5% perf-stat.ps.node-loads 106728 ± 4% +29.1% 137740 ± 7% +7.6% 114824 ± 5% perf-stat.ps.node-stores 1.117e+12 -3.0% 1.084e+12 -4.5% 1.067e+12 perf-stat.total.instructions 10.55 ± 95% -100.0% 0.00 -100.0% 0.00 sched_debug.cfs_rq:/.MIN_vruntime.avg 506.30 ± 95% -100.0% 0.00 -100.0% 0.00 sched_debug.cfs_rq:/.MIN_vruntime.max 0.00 +0.0% 0.00 -100.0% 0.00 sched_debug.cfs_rq:/.MIN_vruntime.min 1.08 ± 7% -7.7% 1.00 -46.2% 0.58 ± 27% sched_debug.cfs_rq:/.h_nr_running.max 538959 
± 24% -23.2% 414090 -58.3% 224671 ± 31% sched_debug.cfs_rq:/.load.max 130191 ± 14% -13.3% 112846 ± 6% -56.6% 56505 ± 39% sched_debug.cfs_rq:/.load.stddev 10.56 ± 95% -100.0% 0.00 -100.0% 0.00 sched_debug.cfs_rq:/.max_vruntime.avg 506.64 ± 95% -100.0% 0.00 -100.0% 0.00 sched_debug.cfs_rq:/.max_vruntime.max 0.00 +0.0% 0.00 -100.0% 0.00 sched_debug.cfs_rq:/.max_vruntime.min 116849 ± 27% -51.2% 56995 ± 20% -66.7% 38966 ± 39% sched_debug.cfs_rq:/.min_vruntime.max 1248 ± 14% +9.8% 1370 ± 15% -29.8% 876.86 ± 16% sched_debug.cfs_rq:/.min_vruntime.min 20484 ± 22% -37.0% 12910 ± 15% -60.7% 8059 ± 32% sched_debug.cfs_rq:/.min_vruntime.stddev 1.08 ± 7% -7.7% 1.00 -46.2% 0.58 ± 27% sched_debug.cfs_rq:/.nr_running.max 1223 ±191% -897.4% -9754 -100.0% 0.00 sched_debug.cfs_rq:/.spread0.avg 107969 ± 29% -65.3% 37448 ± 39% -100.0% 0.00 sched_debug.cfs_rq:/.spread0.max -7628 +138.2% -18173 -100.0% 0.00 sched_debug.cfs_rq:/.spread0.min 20484 ± 22% -37.0% 12910 ± 15% -100.0% 0.00 sched_debug.cfs_rq:/.spread0.stddev 29.84 ± 19% -1.1% 29.52 ± 14% -100.0% 0.00 sched_debug.cfs_rq:/.util_est_enqueued.avg 569.91 ± 11% +4.5% 595.58 -100.0% 0.00 sched_debug.cfs_rq:/.util_est_enqueued.max 109.06 ± 13% +3.0% 112.32 ± 5% -100.0% 0.00 sched_debug.cfs_rq:/.util_est_enqueued.stddev 910320 +0.0% 910388 -44.2% 507736 ± 28% sched_debug.cpu.avg_idle.avg 1052715 ± 5% -3.8% 1012910 -45.8% 570219 ± 28% sched_debug.cpu.avg_idle.max 175426 ± 8% +2.8% 180422 ± 6% -42.5% 100839 ± 20% sched_debug.cpu.clock.avg 175428 ± 8% +2.8% 180424 ± 6% -42.5% 100840 ± 20% sched_debug.cpu.clock.max 175424 ± 8% +2.8% 180421 ± 6% -42.5% 100838 ± 20% sched_debug.cpu.clock.min 170979 ± 8% +2.8% 175682 ± 6% -42.5% 98373 ± 20% sched_debug.cpu.clock_task.avg 173398 ± 8% +2.6% 177937 ± 6% -42.6% 99568 ± 20% sched_debug.cpu.clock_task.max 165789 ± 8% +3.2% 171014 ± 6% -42.4% 95513 ± 20% sched_debug.cpu.clock_task.min 2178 ± 8% -13.1% 1893 ± 18% -49.7% 1094 ± 33% sched_debug.cpu.clock_task.stddev 7443 ± 4% +1.7% 7566 ± 3% -43.0% 4246 ± 24% sched_debug.cpu.curr->pid.max 1409 ± 2% +1.1% 1426 ± 2% -42.0% 818.46 ± 25% sched_debug.cpu.curr->pid.stddev 502082 +0.0% 502206 -43.9% 281834 ± 29% sched_debug.cpu.max_idle_balance_cost.avg 567767 ± 5% -3.3% 548996 ± 2% -46.6% 303439 ± 27% sched_debug.cpu.max_idle_balance_cost.max 4294 +0.0% 4294 -43.7% 2415 ± 29% sched_debug.cpu.next_balance.avg 4294 +0.0% 4294 -43.7% 2415 ± 29% sched_debug.cpu.next_balance.max 4294 +0.0% 4294 -43.7% 2415 ± 29% sched_debug.cpu.next_balance.min 0.29 ± 3% -1.2% 0.28 -43.0% 0.16 ± 29% sched_debug.cpu.nr_running.stddev 8212 ± 7% -2.2% 8032 ± 4% -40.7% 4867 ± 20% sched_debug.cpu.nr_switches.avg 55209 ± 14% -21.8% 43154 ± 14% -57.0% 23755 ± 20% sched_debug.cpu.nr_switches.max 1272 ± 23% +10.2% 1402 ± 8% -39.1% 775.51 ± 29% sched_debug.cpu.nr_switches.min 9805 ± 13% -13.7% 8459 ± 8% -50.8% 4825 ± 20% sched_debug.cpu.nr_switches.stddev -15.73 -14.1% -13.52 -64.8% -5.53 sched_debug.cpu.nr_uninterruptible.min 6.08 ± 27% -1.0% 6.02 ± 13% -53.7% 2.82 ± 24% sched_debug.cpu.nr_uninterruptible.stddev 175425 ± 8% +2.8% 180421 ± 6% -42.5% 100838 ± 20% sched_debug.cpu_clk 4.295e+09 +0.0% 4.295e+09 -43.7% 2.416e+09 ± 29% sched_debug.jiffies 174815 ± 8% +2.9% 179811 ± 6% -42.5% 100505 ± 20% sched_debug.ktime 175972 ± 8% +2.8% 180955 ± 6% -42.5% 101150 ± 20% sched_debug.sched_clk 58611259 +0.0% 58611259 -94.0% 3508734 ± 29% sched_debug.sysctl_sched.sysctl_sched_features 0.75 +0.0% 0.75 -100.0% 0.00 sched_debug.sysctl_sched.sysctl_sched_idle_min_granularity 24.00 +0.0% 24.00 -100.0% 0.00 
sched_debug.sysctl_sched.sysctl_sched_latency 3.00 +0.0% 3.00 -100.0% 0.00 sched_debug.sysctl_sched.sysctl_sched_min_granularity 4.00 +0.0% 4.00 -100.0% 0.00 sched_debug.sysctl_sched.sysctl_sched_wakeup_granularity 0.26 ±100% -0.3 0.00 +1.2 1.44 ± 9% perf-profile.calltrace.cycles-pp.__x64_sys_close.do_syscall_64.entry_SYSCALL_64_after_hwframe.__close 2.08 ± 26% -0.2 1.87 ± 12% -1.4 0.63 ± 11% perf-profile.calltrace.cycles-pp.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.__close 0.00 +0.0 0.00 +0.7 0.67 ± 11% perf-profile.calltrace.cycles-pp.rcu_sched_clock_irq.update_process_times.tick_sched_handle.tick_nohz_highres_handler.__hrtimer_run_queues 0.00 +0.0 0.00 +0.7 0.71 ± 18% perf-profile.calltrace.cycles-pp.rebalance_domains.__do_softirq.irq_exit_rcu.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt 0.00 +0.0 0.00 +0.8 0.80 ± 14% perf-profile.calltrace.cycles-pp.getname_flags.__do_sys_newstat.do_syscall_64.entry_SYSCALL_64_after_hwframe.__xstat64 0.00 +0.0 0.00 +0.8 0.84 ± 12% perf-profile.calltrace.cycles-pp.__fput.__x64_sys_close.do_syscall_64.entry_SYSCALL_64_after_hwframe.__close 0.00 +0.0 0.00 +1.0 0.98 ± 13% perf-profile.calltrace.cycles-pp.mas_alloc_cyclic.mtree_alloc_cyclic.simple_offset_add.shmem_mknod.lookup_open 0.00 +0.0 0.00 +1.1 1.10 ± 14% perf-profile.calltrace.cycles-pp.mtree_alloc_cyclic.simple_offset_add.shmem_mknod.lookup_open.open_last_lookups 0.00 +0.0 0.00 +1.2 1.20 ± 13% perf-profile.calltrace.cycles-pp.mas_erase.mtree_erase.simple_offset_remove.shmem_unlink.vfs_unlink 0.00 +0.0 0.00 +1.3 1.34 ± 15% perf-profile.calltrace.cycles-pp.link_path_walk.path_lookupat.filename_lookup.vfs_statx.__do_sys_newstat 0.00 +0.0 0.00 +1.4 1.35 ± 12% perf-profile.calltrace.cycles-pp.mtree_erase.simple_offset_remove.shmem_unlink.vfs_unlink.do_unlinkat 0.00 +0.0 0.00 +1.6 1.56 ± 18% perf-profile.calltrace.cycles-pp.__do_softirq.irq_exit_rcu.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt.cpuidle_enter_state 0.00 +0.0 0.00 +1.7 1.73 ± 6% perf-profile.calltrace.cycles-pp.scheduler_tick.update_process_times.tick_sched_handle.tick_nohz_highres_handler.__hrtimer_run_queues 0.00 +0.0 0.00 +1.8 1.80 ± 16% perf-profile.calltrace.cycles-pp.irq_exit_rcu.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt.cpuidle_enter_state.cpuidle_enter 0.00 +0.0 0.00 +2.0 2.03 ± 15% perf-profile.calltrace.cycles-pp.path_lookupat.filename_lookup.vfs_statx.__do_sys_newstat.do_syscall_64 0.00 +0.0 0.00 +2.2 2.16 ± 15% perf-profile.calltrace.cycles-pp.filename_lookup.vfs_statx.__do_sys_newstat.do_syscall_64.entry_SYSCALL_64_after_hwframe 0.00 +0.0 0.00 +2.9 2.94 ± 14% perf-profile.calltrace.cycles-pp.vfs_statx.__do_sys_newstat.do_syscall_64.entry_SYSCALL_64_after_hwframe.__xstat64 0.00 +0.0 0.00 +3.2 3.19 ± 8% perf-profile.calltrace.cycles-pp.update_process_times.tick_sched_handle.tick_nohz_highres_handler.__hrtimer_run_queues.hrtimer_interrupt 0.00 +0.0 0.00 +3.4 3.35 ± 8% perf-profile.calltrace.cycles-pp.tick_sched_handle.tick_nohz_highres_handler.__hrtimer_run_queues.hrtimer_interrupt.__sysvec_apic_timer_interrupt 0.00 +0.0 0.00 +3.9 3.88 ± 8% perf-profile.calltrace.cycles-pp.tick_nohz_highres_handler.__hrtimer_run_queues.hrtimer_interrupt.__sysvec_apic_timer_interrupt.sysvec_apic_timer_interrupt 0.00 +0.8 0.75 ± 12% +0.0 0.00 perf-profile.calltrace.cycles-pp.__call_rcu_common.xas_store.__xa_erase.xa_erase.simple_offset_remove 0.00 +0.8 0.78 ± 34% +0.0 0.00 perf-profile.calltrace.cycles-pp.___slab_alloc.kmem_cache_alloc_lru.xas_alloc.xas_create.xas_store 
0.00 +0.8 0.83 ± 29% +0.0 0.00 perf-profile.calltrace.cycles-pp.allocate_slab.___slab_alloc.kmem_cache_alloc_lru.xas_alloc.xas_expand 0.00 +0.9 0.92 ± 26% +0.0 0.00 perf-profile.calltrace.cycles-pp.___slab_alloc.kmem_cache_alloc_lru.xas_alloc.xas_expand.xas_create 0.00 +1.0 0.99 ± 27% +0.0 0.00 perf-profile.calltrace.cycles-pp.shuffle_freelist.allocate_slab.___slab_alloc.kmem_cache_alloc_lru.xas_alloc 0.00 +1.0 1.04 ± 28% +0.0 0.00 perf-profile.calltrace.cycles-pp.kmem_cache_alloc_lru.xas_alloc.xas_create.xas_store.__xa_alloc 0.00 +1.1 1.11 ± 26% +0.0 0.00 perf-profile.calltrace.cycles-pp.xas_alloc.xas_create.xas_store.__xa_alloc.__xa_alloc_cyclic 1.51 ± 24% +1.2 2.73 ± 10% +1.2 2.75 ± 10% perf-profile.calltrace.cycles-pp.vfs_unlink.do_unlinkat.__x64_sys_unlink.do_syscall_64.entry_SYSCALL_64_after_hwframe 0.00 +1.2 1.24 ± 20% +0.0 0.00 perf-profile.calltrace.cycles-pp.kmem_cache_alloc_lru.xas_alloc.xas_expand.xas_create.xas_store 0.00 +1.3 1.27 ± 10% +0.0 0.00 perf-profile.calltrace.cycles-pp.xas_store.__xa_erase.xa_erase.simple_offset_remove.shmem_unlink 0.00 +1.3 1.30 ± 10% +0.0 0.00 perf-profile.calltrace.cycles-pp.__xa_erase.xa_erase.simple_offset_remove.shmem_unlink.vfs_unlink 0.00 +1.3 1.33 ± 19% +0.0 0.00 perf-profile.calltrace.cycles-pp.xas_alloc.xas_expand.xas_create.xas_store.__xa_alloc 0.00 +1.4 1.36 ± 10% +0.0 0.00 perf-profile.calltrace.cycles-pp.xa_erase.simple_offset_remove.shmem_unlink.vfs_unlink.do_unlinkat 0.00 +1.4 1.37 ± 10% +1.4 1.37 ± 12% perf-profile.calltrace.cycles-pp.simple_offset_remove.shmem_unlink.vfs_unlink.do_unlinkat.__x64_sys_unlink 0.00 +1.5 1.51 ± 17% +0.0 0.00 perf-profile.calltrace.cycles-pp.xas_expand.xas_create.xas_store.__xa_alloc.__xa_alloc_cyclic 0.00 +1.6 1.62 ± 12% +1.6 1.65 ± 12% perf-profile.calltrace.cycles-pp.shmem_unlink.vfs_unlink.do_unlinkat.__x64_sys_unlink.do_syscall_64 0.00 +2.8 2.80 ± 13% +0.0 0.00 perf-profile.calltrace.cycles-pp.xas_create.xas_store.__xa_alloc.__xa_alloc_cyclic.simple_offset_add 0.00 +2.9 2.94 ± 13% +0.0 0.00 perf-profile.calltrace.cycles-pp.xas_store.__xa_alloc.__xa_alloc_cyclic.simple_offset_add.shmem_mknod 5.38 ± 24% +3.1 8.51 ± 11% +0.6 5.95 ± 11% perf-profile.calltrace.cycles-pp.lookup_open.open_last_lookups.path_openat.do_filp_open.do_sys_openat2 6.08 ± 24% +3.2 9.24 ± 12% +0.5 6.59 ± 10% perf-profile.calltrace.cycles-pp.open_last_lookups.path_openat.do_filp_open.do_sys_openat2.__x64_sys_creat 0.00 +3.2 3.20 ± 13% +0.0 0.00 perf-profile.calltrace.cycles-pp.__xa_alloc.__xa_alloc_cyclic.simple_offset_add.shmem_mknod.lookup_open 0.00 +3.2 3.24 ± 13% +0.0 0.00 perf-profile.calltrace.cycles-pp.__xa_alloc_cyclic.simple_offset_add.shmem_mknod.lookup_open.open_last_lookups 0.00 +3.4 3.36 ± 14% +1.2 1.16 ± 13% perf-profile.calltrace.cycles-pp.simple_offset_add.shmem_mknod.lookup_open.open_last_lookups.path_openat 2.78 ± 25% +3.4 6.17 ± 12% +0.9 3.69 ± 12% perf-profile.calltrace.cycles-pp.shmem_mknod.lookup_open.open_last_lookups.path_openat.do_filp_open 0.16 ± 30% -0.1 0.08 ± 20% -0.0 0.13 ± 42% perf-profile.children.cycles-pp.map_id_up 0.22 ± 18% -0.0 0.16 ± 17% -0.1 0.12 ± 8% perf-profile.children.cycles-pp._raw_spin_lock_irq 0.47 ± 17% -0.0 0.43 ± 16% +1.0 1.49 ± 10% perf-profile.children.cycles-pp.__x64_sys_close 0.02 ±142% -0.0 0.00 +0.1 0.08 ± 22% perf-profile.children.cycles-pp._find_next_zero_bit 0.01 ±223% -0.0 0.00 +2.0 1.97 ± 15% perf-profile.children.cycles-pp.irq_exit_rcu 0.00 +0.0 0.00 +0.1 0.08 ± 30% perf-profile.children.cycles-pp.__wake_up 0.00 +0.0 0.00 +0.1 0.08 ± 21% 
                                                                          perf-profile.children.cycles-pp.should_we_balance
      0.00            +0.0        0.00            +0.1        0.09 ± 34%  perf-profile.children.cycles-pp.amd_clear_divider
      0.00            +0.0        0.00            +0.1        0.10 ± 36%  perf-profile.children.cycles-pp.apparmor_current_getsecid_subj
      0.00            +0.0        0.00            +0.1        0.10 ± 32%  perf-profile.children.cycles-pp.filp_flush
      0.00            +0.0        0.00            +0.1        0.10 ± 27%  perf-profile.children.cycles-pp.mas_wr_end_piv
      0.00            +0.0        0.00            +0.1        0.12 ± 27%  perf-profile.children.cycles-pp.mnt_get_write_access
      0.00            +0.0        0.00            +0.1        0.12 ± 23%  perf-profile.children.cycles-pp.file_close_fd
      0.00            +0.0        0.00            +0.1        0.13 ± 30%  perf-profile.children.cycles-pp.security_current_getsecid_subj
      0.00            +0.0        0.00            +0.1        0.13 ± 18%  perf-profile.children.cycles-pp.native_apic_mem_eoi
      0.00            +0.0        0.00            +0.2        0.17 ± 22%  perf-profile.children.cycles-pp.mas_leaf_max_gap
      0.00            +0.0        0.00            +0.2        0.18 ± 24%  perf-profile.children.cycles-pp.mtree_range_walk
      0.00            +0.0        0.00            +0.2        0.20 ± 19%  perf-profile.children.cycles-pp.inode_set_ctime_current
      0.00            +0.0        0.00            +0.2        0.24 ± 14%  perf-profile.children.cycles-pp.ima_file_check
      0.00            +0.0        0.00            +0.2        0.24 ± 22%  perf-profile.children.cycles-pp.mas_anode_descend
      0.00            +0.0        0.00            +0.3        0.26 ± 18%  perf-profile.children.cycles-pp.lockref_put_return
      0.00            +0.0        0.00            +0.3        0.29 ± 16%  perf-profile.children.cycles-pp.mas_wr_walk
      0.00            +0.0        0.00            +0.3        0.31 ± 23%  perf-profile.children.cycles-pp.mas_update_gap
      0.00            +0.0        0.00            +0.3        0.32 ± 17%  perf-profile.children.cycles-pp.mas_wr_append
      0.00            +0.0        0.00            +0.4        0.37 ± 15%  perf-profile.children.cycles-pp.mas_empty_area
      0.00            +0.0        0.00            +0.5        0.47 ± 18%  perf-profile.children.cycles-pp.mas_wr_node_store
      0.00            +0.0        0.00            +0.5        0.53 ± 18%  perf-profile.children.cycles-pp.__memcg_slab_post_alloc_hook
      0.00            +0.0        0.00            +0.7        0.65 ± 12%  perf-profile.children.cycles-pp.__memcg_slab_pre_alloc_hook
      0.00            +0.0        0.00            +0.7        0.65 ± 28%  perf-profile.children.cycles-pp.__memcg_slab_free_hook
      0.00            +0.0        0.00            +1.0        0.99 ± 13%  perf-profile.children.cycles-pp.mas_alloc_cyclic
      0.00            +0.0        0.00            +1.1        1.11 ± 14%  perf-profile.children.cycles-pp.mtree_alloc_cyclic
      0.00            +0.0        0.00            +1.2        1.21 ± 14%  perf-profile.children.cycles-pp.mas_erase
      0.00            +0.0        0.00            +1.4        1.35 ± 12%  perf-profile.children.cycles-pp.mtree_erase
      0.00            +0.0        0.00            +1.8        1.78 ± 9%   perf-profile.children.cycles-pp.entry_SYSCALL_64
      0.00            +0.0        0.00            +4.1        4.06 ± 8%   perf-profile.children.cycles-pp.tick_nohz_highres_handler
      0.02 ±146%      +0.1        0.08 ± 13%      +0.0        0.06 ± 81%  perf-profile.children.cycles-pp.shmem_is_huge
      0.02 ±141%      +0.1        0.09 ± 16%      -0.0        0.00        perf-profile.children.cycles-pp.__list_del_entry_valid
      0.00            +0.1        0.08 ± 11%      +0.0        0.00        perf-profile.children.cycles-pp.free_unref_page
      0.00            +0.1        0.08 ± 13%      +0.1        0.08 ± 45%  perf-profile.children.cycles-pp.shmem_destroy_inode
      0.04 ±101%      +0.1        0.14 ± 25%      +0.0        0.05 ± 65%  perf-profile.children.cycles-pp.rcu_nocb_try_bypass
      0.00            +0.1        0.12 ± 27%      +0.0        0.00        perf-profile.children.cycles-pp.xas_find_marked
      0.02 ±144%      +0.1        0.16 ± 14%      -0.0        0.00        perf-profile.children.cycles-pp.__unfreeze_partials
      0.03 ±106%      +0.2        0.19 ± 26%      -0.0        0.03 ±136%  perf-profile.children.cycles-pp.xas_descend
      0.01 ±223%      +0.2        0.17 ± 15%      -0.0        0.00        perf-profile.children.cycles-pp.get_page_from_freelist
      0.11 ± 22%      +0.2        0.29 ± 16%      -0.0        0.08 ± 30%  perf-profile.children.cycles-pp.rcu_segcblist_enqueue
      0.02 ±146%      +0.2        0.24 ± 13%      -0.0        0.01 ±174%  perf-profile.children.cycles-pp.__alloc_pages
      0.36 ± 79%      +0.6        0.98 ± 15%      -0.0        0.31 ± 43%  perf-profile.children.cycles-pp.__slab_free
      0.50 ± 26%      +0.7        1.23 ± 14%      -0.2        0.31 ± 19%  perf-profile.children.cycles-pp.__call_rcu_common
      0.00            +0.8        0.82 ± 13%      +0.0        0.00        perf-profile.children.cycles-pp.radix_tree_node_rcu_free
      0.00            +1.1        1.14 ± 17%      +0.0        0.00        perf-profile.children.cycles-pp.radix_tree_node_ctor
      0.16 ± 86%      +1.2        1.38 ± 16%      -0.1        0.02 ±174%  perf-profile.children.cycles-pp.setup_object
      1.52 ± 25%      +1.2        2.75 ± 10%      +1.2        2.76 ± 11%  perf-profile.children.cycles-pp.vfs_unlink
      0.36 ± 22%      +1.3        1.63 ± 12%      +1.3        1.65 ± 12%  perf-profile.children.cycles-pp.shmem_unlink
      0.00            +1.3        1.30 ± 10%      +0.0        0.00        perf-profile.children.cycles-pp.__xa_erase
      0.20 ± 79%      +1.3        1.53 ± 15%      -0.2        0.02 ±173%  perf-profile.children.cycles-pp.shuffle_freelist
      0.00            +1.4        1.36 ± 10%      +0.0        0.00        perf-profile.children.cycles-pp.xa_erase
      0.00            +1.4        1.38 ± 10%      +1.4        1.37 ± 12%  perf-profile.children.cycles-pp.simple_offset_remove
      0.00            +1.5        1.51 ± 17%      +0.0        0.00        perf-profile.children.cycles-pp.xas_expand
      0.26 ± 78%      +1.6        1.87 ± 13%      -0.2        0.05 ± 68%  perf-profile.children.cycles-pp.allocate_slab
      0.40 ± 49%      +1.7        2.10 ± 13%      -0.3        0.14 ± 28%  perf-profile.children.cycles-pp.___slab_alloc
      1.30 ± 85%      +2.1        3.42 ± 12%      -0.1        1.15 ± 41%  perf-profile.children.cycles-pp.rcu_do_batch
      1.56 ± 27%      +2.4        3.93 ± 11%      -0.2        1.41 ± 12%  perf-profile.children.cycles-pp.kmem_cache_alloc_lru
      0.00            +2.4        2.44 ± 12%      +0.0        0.00        perf-profile.children.cycles-pp.xas_alloc
      2.66 ± 13%      +2.5        5.14 ± 5%       -2.7        0.00        perf-profile.children.cycles-pp.__irq_exit_rcu
     11.16 ± 10%      +2.7       13.88 ± 8%       +0.6       11.72 ± 8%   perf-profile.children.cycles-pp.sysvec_apic_timer_interrupt
     11.77 ± 10%      +2.7       14.49 ± 8%       +0.6       12.40 ± 8%   perf-profile.children.cycles-pp.asm_sysvec_apic_timer_interrupt
      0.00            +2.8        2.82 ± 13%      +0.0        0.00        perf-profile.children.cycles-pp.xas_create
      5.40 ± 24%      +3.1        8.52 ± 11%      +0.6        5.97 ± 11%  perf-profile.children.cycles-pp.lookup_open
      6.12 ± 24%      +3.1        9.27 ± 12%      +0.5        6.62 ± 10%  perf-profile.children.cycles-pp.open_last_lookups
      0.00            +3.2        3.22 ± 13%      +0.0        0.00        perf-profile.children.cycles-pp.__xa_alloc
      0.00            +3.2        3.24 ± 13%      +0.0        0.00        perf-profile.children.cycles-pp.__xa_alloc_cyclic
      0.00            +3.4        3.36 ± 14%      +1.2        1.16 ± 13%  perf-profile.children.cycles-pp.simple_offset_add
      2.78 ± 25%      +3.4        6.18 ± 12%      +0.9        3.70 ± 12%  perf-profile.children.cycles-pp.shmem_mknod
      0.00            +4.2        4.24 ± 12%      +0.0        0.00        perf-profile.children.cycles-pp.xas_store
      0.14 ± 27%      -0.1        0.08 ± 21%      -0.0        0.12 ± 42%  perf-profile.self.cycles-pp.map_id_up
      0.09 ± 18%      -0.0        0.06 ± 52%      +0.1        0.14 ± 12%  perf-profile.self.cycles-pp.obj_cgroup_charge
      0.18 ± 22%      -0.0        0.15 ± 17%      -0.1        0.10 ± 9%   perf-profile.self.cycles-pp._raw_spin_lock_irq
      0.02 ±141%      -0.0        0.00            +0.1        0.08 ± 22%  perf-profile.self.cycles-pp._find_next_zero_bit
      0.16 ± 24%      -0.0        0.16 ± 24%      -0.1        0.07 ± 64%  perf-profile.self.cycles-pp.__sysvec_apic_timer_interrupt
      0.02 ±146%      +0.0        0.02 ±146%      +0.1        0.08 ± 18%  perf-profile.self.cycles-pp.shmem_mknod
      0.00            +0.0        0.00            +0.1        0.09 ± 36%  perf-profile.self.cycles-pp.irq_exit_rcu
      0.00            +0.0        0.00            +0.1        0.09 ± 28%  perf-profile.self.cycles-pp.tick_nohz_highres_handler
      0.00            +0.0        0.00            +0.1        0.09 ± 36%  perf-profile.self.cycles-pp.apparmor_current_getsecid_subj
      0.00            +0.0        0.00            +0.1        0.09 ± 30%  perf-profile.self.cycles-pp.mtree_erase
      0.00            +0.0        0.00            +0.1        0.10 ± 26%  perf-profile.self.cycles-pp.mtree_alloc_cyclic
      0.00            +0.0        0.00            +0.1        0.10 ± 27%  perf-profile.self.cycles-pp.mas_wr_end_piv
      0.00            +0.0        0.00            +0.1        0.12 ± 28%  perf-profile.self.cycles-pp.mnt_get_write_access
      0.00            +0.0        0.00            +0.1        0.12 ± 29%  perf-profile.self.cycles-pp.inode_set_ctime_current
      0.00            +0.0        0.00            +0.1        0.12 ± 38%  perf-profile.self.cycles-pp.mas_empty_area
      0.00            +0.0        0.00            +0.1        0.13 ± 18%  perf-profile.self.cycles-pp.native_apic_mem_eoi
      0.00            +0.0        0.00            +0.1        0.14 ± 38%  perf-profile.self.cycles-pp.mas_update_gap
      0.00            +0.0        0.00            +0.1        0.14 ± 20%  perf-profile.self.cycles-pp.mas_wr_append
      0.00            +0.0        0.00            +0.2        0.16 ± 23%  perf-profile.self.cycles-pp.mas_leaf_max_gap
      0.00            +0.0        0.00            +0.2        0.18 ± 24%  perf-profile.self.cycles-pp.mtree_range_walk
      0.00            +0.0        0.00            +0.2        0.18 ± 29%  perf-profile.self.cycles-pp.mas_alloc_cyclic
      0.00            +0.0        0.00            +0.2        0.20 ± 14%  perf-profile.self.cycles-pp.__memcg_slab_pre_alloc_hook
      0.00            +0.0        0.00            +0.2        0.22 ± 32%  perf-profile.self.cycles-pp.mas_erase
      0.00            +0.0        0.00            +0.2        0.24 ± 35%  perf-profile.self.cycles-pp.__memcg_slab_free_hook
      0.00            +0.0        0.00            +0.2        0.24 ± 22%  perf-profile.self.cycles-pp.mas_anode_descend
      0.00            +0.0        0.00            +0.3        0.26 ± 17%  perf-profile.self.cycles-pp.lockref_put_return
      0.00            +0.0        0.00            +0.3        0.27 ± 16%  perf-profile.self.cycles-pp.mas_wr_walk
      0.00            +0.0        0.00            +0.3        0.34 ± 20%  perf-profile.self.cycles-pp.mas_wr_node_store
      0.00            +0.0        0.00            +0.4        0.35 ± 20%  perf-profile.self.cycles-pp.__memcg_slab_post_alloc_hook
      0.00            +0.0        0.00            +1.6        1.59 ± 8%   perf-profile.self.cycles-pp.entry_SYSCALL_64
      0.05 ± 85%      +0.0        0.06 ± 47%      +0.1        0.15 ± 54%  perf-profile.self.cycles-pp.check_heap_object
      0.05 ± 72%      +0.0        0.06 ± 81%      +0.0        0.10 ± 15%  perf-profile.self.cycles-pp.call_cpuidle
      0.04 ±105%      +0.0        0.04 ± 75%      +0.1        0.10 ± 33%  perf-profile.self.cycles-pp.putname
      0.00            +0.1        0.06 ± 24%      +0.0        0.05 ± 66%  perf-profile.self.cycles-pp.shmem_destroy_inode
      0.00            +0.1        0.07 ± 8%       +0.0        0.00        perf-profile.self.cycles-pp.__xa_alloc
      0.02 ±146%      +0.1        0.11 ± 28%      +0.0        0.02 ±133%  perf-profile.self.cycles-pp.rcu_nocb_try_bypass
      0.01 ±223%      +0.1        0.10 ± 28%      -0.0        0.00        perf-profile.self.cycles-pp.shuffle_freelist
      0.00            +0.1        0.11 ± 40%      +0.0        0.00        perf-profile.self.cycles-pp.xas_create
      0.00            +0.1        0.12 ± 27%      +0.0        0.00        perf-profile.self.cycles-pp.xas_find_marked
      0.00            +0.1        0.14 ± 18%      +0.0        0.00        perf-profile.self.cycles-pp.xas_alloc
      0.03 ±103%      +0.1        0.17 ± 29%      -0.0        0.03 ±136%  perf-profile.self.cycles-pp.xas_descend
      0.00            +0.2        0.16 ± 23%      +0.0        0.00        perf-profile.self.cycles-pp.xas_expand
      0.10 ± 22%      +0.2        0.27 ± 16%      -0.0        0.06 ± 65%  perf-profile.self.cycles-pp.rcu_segcblist_enqueue
      0.92 ± 35%      +0.3        1.22 ± 11%      -0.3        0.59 ± 15%  perf-profile.self.cycles-pp.kmem_cache_free
      0.00            +0.4        0.36 ± 16%      +0.0        0.00        perf-profile.self.cycles-pp.xas_store
      0.32 ± 30%      +0.4        0.71 ± 12%      -0.1        0.18 ± 23%  perf-profile.self.cycles-pp.__call_rcu_common
      0.18 ± 27%      +0.5        0.65 ± 8%       +0.1        0.26 ± 21%  perf-profile.self.cycles-pp.kmem_cache_alloc_lru
      0.36 ± 79%      +0.6        0.96 ± 15%      -0.0        0.31 ± 42%  perf-profile.self.cycles-pp.__slab_free
      0.00            +0.8        0.80 ± 14%      +0.0        0.00        perf-profile.self.cycles-pp.radix_tree_node_rcu_free
      0.00            +1.0        1.01 ± 16%      +0.0        0.00        perf-profile.self.cycles-pp.radix_tree_node_ctor

[1] https://lore.kernel.org/all/202402191308.8e7ee8c7-oliver.sang@intel.com/

> > > > > @@ -330,9 +329,9 @@ int simple_offset_empty(struct dentry *dentry)
> > > > >  	if (!inode || !S_ISDIR(inode->i_mode))
> > > > >  		return ret;
> > > > >
> > > > > -	index = 2;
> > > > > +	index = DIR_OFFSET_MIN;
> > > >
> > > > This bit should go into the simple_offset_empty() patch...
> > > >
> > > > > @@ -434,15 +433,15 @@ static loff_t offset_dir_llseek(struct file *file, loff_t offset, int whence)
> > > > >
> > > > >  	/* In this case, ->private_data is protected by f_pos_lock */
> > > > >  	file->private_data = NULL;
> > > > > -	return vfs_setpos(file, offset, U32_MAX);
> > > > > +	return vfs_setpos(file, offset, MAX_LFS_FILESIZE);
> > > >                                         ^^^
> > > > Why this? It is ULONG_MAX << PAGE_SHIFT on 32-bit so that doesn't seem
> > > > quite right? Why not use ULONG_MAX here directly?
> > >
> > > I initially changed U32_MAX to ULONG_MAX, but for some reason, the
> > > length checking in vfs_setpos() fails. There is probably a sign
> > > extension thing happening here that I don't understand.
> > > >
> > > > Otherwise the patch looks good to me.
> > >
> > > As always, thank you for your review.
> > >
> > >
> > > --
> > > Chuck Lever
> > --
> Chuck Lever
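The failure described above looks consistent with an unsigned-to-signed
conversion rather than anything exotic: vfs_setpos() takes its maxsize
argument as a signed loff_t, and on an LP64 system ULONG_MAX converted to
loff_t becomes -1, so the "offset > maxsize" bound check rejects every
non-negative offset. A minimal userspace sketch illustrates the effect;
my_loff_t and setpos_check() are invented stand-ins for loff_t and the
check inside vfs_setpos(), not kernel code:

#include <stdio.h>
#include <limits.h>

/* Stand-in for the kernel's loff_t, a signed 64-bit type. */
typedef long long my_loff_t;

/* Simplified stand-in for the bound check in vfs_setpos(). */
static int setpos_check(my_loff_t offset, my_loff_t maxsize)
{
	if (offset < 0 || offset > maxsize)
		return -22;	/* -EINVAL */
	return 0;
}

int main(void)
{
	/* On LP64, ULONG_MAX converted to a signed 64-bit type is -1. */
	my_loff_t ulong_limit = (my_loff_t)ULONG_MAX;
	/* MAX_LFS_FILESIZE is (loff_t)LLONG_MAX on 64-bit: still positive. */
	my_loff_t lfs_limit = LLONG_MAX;

	printf("limit from ULONG_MAX: %lld\n", ulong_limit);	/* -1 */
	printf("offset 2 vs ULONG_MAX limit: %d\n",
	       setpos_check(2, ulong_limit));			/* -22 */
	printf("offset 2 vs LLONG_MAX limit: %d\n",
	       setpos_check(2, lfs_limit));			/* 0 */
	return 0;
}

On 32-bit, ULONG_MAX fits in loff_t and stays positive, which lines up
with the reviewer's note that MAX_LFS_FILESIZE expands differently
(ULONG_MAX << PAGE_SHIFT) on that configuration.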
diff --git a/fs/libfs.c b/fs/libfs.c
index 3cf773950f93..f073e9aeb2bf 100644
--- a/fs/libfs.c
+++ b/fs/libfs.c
@@ -245,17 +245,17 @@ enum {
 	DIR_OFFSET_MIN	= 2,
 };
 
-static void offset_set(struct dentry *dentry, u32 offset)
+static void offset_set(struct dentry *dentry, unsigned long offset)
 {
-	dentry->d_fsdata = (void *)((uintptr_t)(offset));
+	dentry->d_fsdata = (void *)offset;
 }
 
-static u32 dentry2offset(struct dentry *dentry)
+static unsigned long dentry2offset(struct dentry *dentry)
 {
-	return (u32)((uintptr_t)(dentry->d_fsdata));
+	return (unsigned long)dentry->d_fsdata;
 }
 
-static struct lock_class_key simple_offset_xa_lock;
+static struct lock_class_key simple_offset_lock_class;
 
 /**
  * simple_offset_init - initialize an offset_ctx
@@ -264,8 +264,8 @@ static struct lock_class_key simple_offset_xa_lock;
  */
 void simple_offset_init(struct offset_ctx *octx)
 {
-	xa_init_flags(&octx->xa, XA_FLAGS_ALLOC1);
-	lockdep_set_class(&octx->xa.xa_lock, &simple_offset_xa_lock);
+	mt_init_flags(&octx->mt, MT_FLAGS_ALLOC_RANGE);
+	lockdep_set_class(&octx->mt.ma_lock, &simple_offset_lock_class);
 
 	octx->next_offset = DIR_OFFSET_MIN;
 }
@@ -279,15 +279,14 @@ void simple_offset_init(struct offset_ctx *octx)
  */
 int simple_offset_add(struct offset_ctx *octx, struct dentry *dentry)
 {
-	static const struct xa_limit limit = XA_LIMIT(DIR_OFFSET_MIN, U32_MAX);
-	u32 offset;
+	unsigned long offset;
 	int ret;
 
 	if (dentry2offset(dentry) != 0)
 		return -EBUSY;
 
-	ret = xa_alloc_cyclic(&octx->xa, &offset, dentry, limit,
-			      &octx->next_offset, GFP_KERNEL);
+	ret = mtree_alloc_cyclic(&octx->mt, &offset, dentry, DIR_OFFSET_MIN,
+				 ULONG_MAX, &octx->next_offset, GFP_KERNEL);
 	if (ret < 0)
 		return ret;
 
@@ -303,13 +302,13 @@ int simple_offset_add(struct offset_ctx *octx, struct dentry *dentry)
  */
 void simple_offset_remove(struct offset_ctx *octx, struct dentry *dentry)
 {
-	u32 offset;
+	unsigned long offset;
 
 	offset = dentry2offset(dentry);
 	if (offset == 0)
 		return;
 
-	xa_erase(&octx->xa, offset);
+	mtree_erase(&octx->mt, offset);
 	offset_set(dentry, 0);
 }
 
@@ -330,9 +329,9 @@ int simple_offset_empty(struct dentry *dentry)
 	if (!inode || !S_ISDIR(inode->i_mode))
 		return ret;
 
-	index = 2;
+	index = DIR_OFFSET_MIN;
 	octx = inode->i_op->get_offset_ctx(inode);
-	xa_for_each(&octx->xa, index, child) {
+	mt_for_each(&octx->mt, child, index, ULONG_MAX) {
 		spin_lock(&child->d_lock);
 		if (simple_positive(child)) {
 			spin_unlock(&child->d_lock);
@@ -362,8 +361,8 @@ int simple_offset_rename_exchange(struct inode *old_dir,
 {
 	struct offset_ctx *old_ctx = old_dir->i_op->get_offset_ctx(old_dir);
 	struct offset_ctx *new_ctx = new_dir->i_op->get_offset_ctx(new_dir);
-	u32 old_index = dentry2offset(old_dentry);
-	u32 new_index = dentry2offset(new_dentry);
+	unsigned long old_index = dentry2offset(old_dentry);
+	unsigned long new_index = dentry2offset(new_dentry);
 	int ret;
 
 	simple_offset_remove(old_ctx, old_dentry);
@@ -389,9 +388,9 @@ int simple_offset_rename_exchange(struct inode *old_dir,
 
 out_restore:
 	offset_set(old_dentry, old_index);
-	xa_store(&old_ctx->xa, old_index, old_dentry, GFP_KERNEL);
+	mtree_store(&old_ctx->mt, old_index, old_dentry, GFP_KERNEL);
 	offset_set(new_dentry, new_index);
-	xa_store(&new_ctx->xa, new_index, new_dentry, GFP_KERNEL);
+	mtree_store(&new_ctx->mt, new_index, new_dentry, GFP_KERNEL);
 
 	return ret;
 }
@@ -404,7 +403,7 @@ int simple_offset_rename_exchange(struct inode *old_dir,
  */
 void simple_offset_destroy(struct offset_ctx *octx)
 {
-	xa_destroy(&octx->xa);
+	mtree_destroy(&octx->mt);
 }
 
 /**
@@ -434,15 +433,15 @@ static loff_t offset_dir_llseek(struct file *file, loff_t offset, int whence)
 
 	/* In this case, ->private_data is protected by f_pos_lock */
 	file->private_data = NULL;
-	return vfs_setpos(file, offset, U32_MAX);
+	return vfs_setpos(file, offset, MAX_LFS_FILESIZE);
 }
 
-static struct dentry *offset_find_next(struct xa_state *xas)
+static struct dentry *offset_find_next(struct ma_state *mas)
 {
 	struct dentry *child, *found = NULL;
 
 	rcu_read_lock();
-	child = xas_next_entry(xas, U32_MAX);
+	child = mas_find(mas, ULONG_MAX);
 	if (!child)
 		goto out;
 	spin_lock(&child->d_lock);
@@ -456,7 +455,7 @@ static struct dentry *offset_find_next(struct xa_state *xas)
 
 static bool offset_dir_emit(struct dir_context *ctx, struct dentry *dentry)
 {
-	u32 offset = dentry2offset(dentry);
+	unsigned long offset = dentry2offset(dentry);
 	struct inode *inode = d_inode(dentry);
 
 	return ctx->actor(ctx, dentry->d_name.name, dentry->d_name.len, offset,
@@ -466,11 +465,11 @@ static bool offset_dir_emit(struct dir_context *ctx, struct dentry *dentry)
 static void *offset_iterate_dir(struct inode *inode, struct dir_context *ctx)
 {
 	struct offset_ctx *octx = inode->i_op->get_offset_ctx(inode);
-	XA_STATE(xas, &octx->xa, ctx->pos);
+	MA_STATE(mas, &octx->mt, ctx->pos, ctx->pos);
 	struct dentry *dentry;
 
 	while (true) {
-		dentry = offset_find_next(&xas);
+		dentry = offset_find_next(&mas);
 		if (!dentry)
 			return ERR_PTR(-ENOENT);
 
@@ -480,7 +479,7 @@ static void *offset_iterate_dir(struct inode *inode, struct dir_context *ctx)
 		}
 
 		dput(dentry);
-		ctx->pos = xas.xa_index + 1;
+		ctx->pos = mas.index + 1;
 	}
 	return NULL;
 }
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 03d141809a2c..55144c12ee0f 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -43,6 +43,7 @@
 #include <linux/cred.h>
 #include <linux/mnt_idmapping.h>
 #include <linux/slab.h>
+#include <linux/maple_tree.h>
 #include <asm/byteorder.h>
 #include <uapi/linux/fs.h>
 
@@ -3260,8 +3261,8 @@ extern ssize_t simple_write_to_buffer(void *to, size_t available, loff_t *ppos,
 		const void __user *from, size_t count);
 
 struct offset_ctx {
-	struct xarray		xa;
-	u32			next_offset;
+	struct maple_tree	mt;
+	unsigned long		next_offset;
 };
 
 void simple_offset_init(struct offset_ctx *octx);
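
To summarize the allocation pattern the patch adopts, here is a minimal
kernel-style sketch. struct toy_octx and the toy_* helpers are invented
for illustration and this is not buildable as-is, but the
mt_init_flags(), mtree_alloc_cyclic(), and mtree_erase() calls mirror
the ones used in simple_offset_init(), simple_offset_add(), and
simple_offset_remove() above:

#include <linux/maple_tree.h>
#include <linux/slab.h>

/* Toy mirror of struct offset_ctx: a tree of offset -> entry mappings
 * plus a hint that makes allocation cyclic rather than lowest-free. */
struct toy_octx {
	struct maple_tree	mt;
	unsigned long		next;	/* cyclic allocation hint */
};

static void toy_init(struct toy_octx *octx)
{
	/* MT_FLAGS_ALLOC_RANGE enables the gap tracking required by
	 * the allocating maple tree API. */
	mt_init_flags(&octx->mt, MT_FLAGS_ALLOC_RANGE);
	octx->next = 2;		/* offsets 0 and 1 are reserved for . and .. */
}

/* Store @entry at the next free offset at or after the hint, wrapping
 * back to the range minimum when the hint passes ULONG_MAX; the chosen
 * offset is returned via @offset and the hint is advanced. */
static int toy_add(struct toy_octx *octx, void *entry, unsigned long *offset)
{
	return mtree_alloc_cyclic(&octx->mt, offset, entry, 2, ULONG_MAX,
				  &octx->next, GFP_KERNEL);
}

static void toy_remove(struct toy_octx *octx, unsigned long offset)
{
	mtree_erase(&octx->mt, offset);
}

Two details worth noting. The offset range is now DIR_OFFSET_MIN..ULONG_MAX
rather than the xarray allocator's 32-bit range, so an offset is the same
width as a pointer, which is why offset_set() can stash it in
dentry->d_fsdata with a plain cast. And because allocation is cyclic, a
just-freed offset is not immediately handed out again, which reduces the
chance that a concurrent readdir observes a newly created entry at a stale
cookie.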