Message ID | 20221024173140.30673-1-ivan@cloudflare.com |
---|---|
State | New |
Headers |
Return-Path: <linux-kernel-owner@vger.kernel.org> Delivered-To: ouuuleilei@gmail.com Received: by 2002:a5d:6687:0:0:0:0:0 with SMTP id l7csp606227wru; Mon, 24 Oct 2022 12:04:20 -0700 (PDT) X-Google-Smtp-Source: AMsMyM644+HcE89BOzFriwb/wLKuu1NcPBhcqWZRhEwJqG5oBsOxIpQsWDxV3/eZzOzilN3bnEVU X-Received: by 2002:a17:90a:4216:b0:20d:2891:157 with SMTP id o22-20020a17090a421600b0020d28910157mr76944167pjg.47.1666638260592; Mon, 24 Oct 2022 12:04:20 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1666638260; cv=none; d=google.com; s=arc-20160816; b=IDwGxPrZAyIb/+6qpK/RaevsSWTzsUzvcm8yBlVBtSwvZSH0sozUj+66PSrjE/I3fn zo1xpiB74/IdqRFttu3HO2+s5JazA7jy7EJpcoSbZvhIMNDIy+/Z+4oTU2lqwEKpIJgp Uf9wDFK1bvybNzLSCmU0IYxlVFvC4+/64chd45mBe5wi2Tm3s3w15/lrAI+H8pWudjEj n6ECFdE1ABSCy5d9fguB6/5hrLaQ/Z0WsGWZsMpKQMuE5U/JcAbjvxTp1LJzzlhxoDUW AVFfWX/qlRpi03ALpB+3oDADqY5Qf7CtCCZoYL9NJsZAK2Dm0W4ihG9wwge86bzSP3pg a6FA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :message-id:date:subject:cc:to:from:dkim-signature; bh=uXzCZ5rm6SI2wrD2b5taRfSp+IHoN3hQXiAouJQX7ls=; b=NXmAJ5deIh3Nll2z9aYkOzYMgN58SdZn2b0Hu5ZF3XTA4CNjMzHmK128iz7bZHcted uAEh7GXdIP9TneXFPs/yIKzSWsWZNw5ET43SljQ9tePYvDuRyNEHbHL7S7ZfRbyhXdd5 rev5JBN1+vGHLKXe8yB+KUuF76XhL1T0+C2cf1dY8idvQVCe2uBxBm/3QqrlF8dK4UJE xcYwSD59HttE3oKLimXsUhPShDG8r6uYRqvYOyK+SoVI1gtDRAuKk5n/6/qwWWDGWO9n mkuiYPz0mfLdXAuAMO5K1rXzmEFRoRL3D853LxesOGjyQZaaIz4Yn5lWXoPwyyhJRffE X/GQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@cloudflare.com header.s=google header.b=kLEQVCJX; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=cloudflare.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id x6-20020a654146000000b0046e9da9ac40si279204pgp.556.2022.10.24.12.04.06; Mon, 24 Oct 2022 12:04:20 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@cloudflare.com header.s=google header.b=kLEQVCJX; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=cloudflare.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232861AbiJXSwk (ORCPT <rfc822;pwkd43@gmail.com> + 99 others); Mon, 24 Oct 2022 14:52:40 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:34306 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232850AbiJXSwL (ORCPT <rfc822;linux-kernel@vger.kernel.org>); Mon, 24 Oct 2022 14:52:11 -0400 Received: from mail-pf1-x432.google.com (mail-pf1-x432.google.com [IPv6:2607:f8b0:4864:20::432]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 15E4F153E1C for <linux-kernel@vger.kernel.org>; Mon, 24 Oct 2022 10:33:33 -0700 (PDT) Received: by mail-pf1-x432.google.com with SMTP id e4so5624799pfl.2 for <linux-kernel@vger.kernel.org>; Mon, 24 Oct 2022 10:33:33 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=cloudflare.com; s=google; h=content-transfer-encoding:mime-version:message-id:date:subject:cc :to:from:from:to:cc:subject:date:message-id:reply-to; bh=uXzCZ5rm6SI2wrD2b5taRfSp+IHoN3hQXiAouJQX7ls=; b=kLEQVCJXkj3W9Vy4Ri0brcVSrlyZha6CYi1LpldrkhaEQsW/bjDxnvxHPTR3qOEKJ2 uvinpGHK8cgooBRIChPLJkHNxi+M2BQkHewI0a0ohsmaLr/w9/92Gw4nwPkr0uxXvxRN nYoWKpxXNHDKEb6GRU0+nWm17+mbOrAfQiMgI= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=content-transfer-encoding:mime-version:message-id:date:subject:cc :to:from:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=uXzCZ5rm6SI2wrD2b5taRfSp+IHoN3hQXiAouJQX7ls=; b=VB2IoCKLLu6ZHiCbYPKZ9dGSMf8/jWbbo7vDmEpCKsudj1iHkzGL6GTDlDTr4IIM+r GJEzvyT0CEmlDLHZzi8eUlCvKl6WQr49sb9ysTtxRr139NpX+XCa3eWc8biKf0YQn1Mw HqUYpq+8Ke6delCBtSlj6TIQl7N4zRD2yZsOA7rqxmlyPrVCs7sw/cNMKlySqSMr1MsX NAS3c9xOzdN06TYGcCrrsrqIz2ltm0gplOm4K0S/bnZ6pOysfnnUh7iXvjEoi1AiNIZO L22o2qpHSrcu4DbnjNu4Q0dKwhQQNWhRNOrrdpOVG1k0Gv1ZzV5jFQDMchr5pSFdQQE1 oZkA== X-Gm-Message-State: ACrzQf3j8NQ6epBsh/m8Gu+Dz7Jz5OzY3bq/N1ei4OEovP46moevgwHN RUSE0Uh/WvqXawavHQkUxWlTMQ== X-Received: by 2002:aa7:9212:0:b0:562:b5f6:f7d7 with SMTP id 18-20020aa79212000000b00562b5f6f7d7mr35480509pfo.70.1666632744744; Mon, 24 Oct 2022 10:32:24 -0700 (PDT) Received: from localhost ([2601:644:200:2b2:c54f:5b4:9653:ba01]) by smtp.gmail.com with ESMTPSA id x7-20020a170902ec8700b001755e4278a6sm9905plg.261.2022.10.24.10.32.24 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 24 Oct 2022 10:32:24 -0700 (PDT) From: Ivan Babrou <ivan@cloudflare.com> To: linux-fsdevel@vger.kernel.org Cc: linux-kernel@vger.kernel.org, kernel-team@cloudflare.com, Alexey Dobriyan <adobriyan@gmail.com>, Al Viro <viro@zeniv.linux.org.uk>, Theodore Ts'o <tytso@mit.edu>, David Laight <David.Laight@aculab.com>, Jonathan Corbet <corbet@lwn.net>, Andrew Morton <akpm@linux-foundation.org>, David Hildenbrand <david@redhat.com>, Johannes Weiner <hannes@cmpxchg.org>, Christoph Anton Mitterer <mail@christoph.anton.mitterer.name>, Mike Rapoport <rppt@kernel.org>, Paul Gortmaker <paul.gortmaker@windriver.com>, Kalesh Singh <kaleshsingh@google.com>, Brian Foster <bfoster@redhat.com>, Ivan Babrou <ivan@cloudflare.com> Subject: [PATCH v4] proc: report open files as size in stat() for /proc/pid/fd Date: Mon, 24 Oct 2022 10:31:40 -0700 Message-Id: <20221024173140.30673-1-ivan@cloudflare.com> X-Mailer: git-send-email 2.37.3 MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Spam-Status: No, score=-2.1 required=5.0 tests=BAYES_00,DKIMWL_WL_MED, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, SPF_HELO_NONE,SPF_NONE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: <linux-kernel.vger.kernel.org> X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1747596880586385029?= X-GMAIL-MSGID: =?utf-8?q?1747596880586385029?= |
Series |
[v4] proc: report open files as size in stat() for /proc/pid/fd
|
|
Commit Message
Ivan Babrou
Oct. 24, 2022, 5:31 p.m. UTC
Many monitoring tools include open file count as a metric. Currently
the only way to get this number is to enumerate the files in /proc/pid/fd.
The problem with the current approach is that it does many things people
generally don't care about when they need one number for a metric.
In our tests for cadvisor, which reports open file counts per cgroup,
we observed that reading the number of open files is slow. Out of 35.23%
of CPU time spent in `proc_readfd_common`, we see 29.43% spent in
`proc_fill_cache`, which is responsible for filling dentry info.
Some of this extra time is spinlock contention, but it's a contention
for the lock we don't want to take to begin with.
We considered putting the number of open files in /proc/pid/status.
Unfortunately, counting the number of fds involves iterating the open_files
bitmap, which has a linear complexity in proportion with the number
of open files (bitmap slots really, but it's close). We don't want
to make /proc/pid/status any slower, so instead we put this info
in /proc/pid/fd as a size member of the stat syscall result.
Previously the reported number was zero, so there's very little
risk of breaking anything, while still providing a somewhat logical
way to count the open files with a fallback if it's zero.
RFC for this patch included iterating open fds under RCU. Thanks
to Frank Hofmann for the suggestion to use the bitmap instead.
Previously:
```
$ sudo stat /proc/1/fd | head -n2
File: /proc/1/fd
Size: 0 Blocks: 0 IO Block: 1024 directory
```
With this patch:
```
$ sudo stat /proc/1/fd | head -n2
File: /proc/1/fd
Size: 65 Blocks: 0 IO Block: 1024 directory
```
Correctness check:
```
$ sudo ls /proc/1/fd | wc -l
65
```
I added the docs for /proc/<pid>/fd while I'm at it.
Signed-off-by: Ivan Babrou <ivan@cloudflare.com>
---
v4: Return errno from proc_fd_getattr() instead of setting negative size.
Added an explicit include for linux/bitmap.h.
v3: Made use of bitmap_weight() to count the bits.
v2: Added missing rcu_read_lock() / rcu_read_unlock(),
task_lock() / task_unlock() and put_task_struct().
---
Documentation/filesystems/proc.rst | 17 +++++++++++
fs/proc/fd.c | 45 ++++++++++++++++++++++++++++++
2 files changed, 62 insertions(+)
Comments
On Mon, Oct 24, 2022 at 10:31:40AM -0700, Ivan Babrou wrote: > Many monitoring tools include open file count as a metric. Currently > the only way to get this number is to enumerate the files in /proc/pid/fd. > > The problem with the current approach is that it does many things people > generally don't care about when they need one number for a metric. > In our tests for cadvisor, which reports open file counts per cgroup, > we observed that reading the number of open files is slow. Out of 35.23% > of CPU time spent in `proc_readfd_common`, we see 29.43% spent in > `proc_fill_cache`, which is responsible for filling dentry info. > Some of this extra time is spinlock contention, but it's a contention > for the lock we don't want to take to begin with. > > We considered putting the number of open files in /proc/pid/status. > Unfortunately, counting the number of fds involves iterating the open_files > bitmap, which has a linear complexity in proportion with the number > of open files (bitmap slots really, but it's close). We don't want > to make /proc/pid/status any slower, so instead we put this info > in /proc/pid/fd as a size member of the stat syscall result. > Previously the reported number was zero, so there's very little > risk of breaking anything, while still providing a somewhat logical > way to count the open files with a fallback if it's zero. > > RFC for this patch included iterating open fds under RCU. Thanks > to Frank Hofmann for the suggestion to use the bitmap instead. > > Previously: > > ``` > $ sudo stat /proc/1/fd | head -n2 > File: /proc/1/fd > Size: 0 Blocks: 0 IO Block: 1024 directory > ``` > > With this patch: > > ``` > $ sudo stat /proc/1/fd | head -n2 > File: /proc/1/fd > Size: 65 Blocks: 0 IO Block: 1024 directory > ``` > > Correctness check: > > ``` > $ sudo ls /proc/1/fd | wc -l > 65 > ``` > > I added the docs for /proc/<pid>/fd while I'm at it. > > Signed-off-by: Ivan Babrou <ivan@cloudflare.com> > > --- > v4: Return errno from proc_fd_getattr() instead of setting negative size. > Added an explicit include for linux/bitmap.h. > v3: Made use of bitmap_weight() to count the bits. > v2: Added missing rcu_read_lock() / rcu_read_unlock(), > task_lock() / task_unlock() and put_task_struct(). > --- > Documentation/filesystems/proc.rst | 17 +++++++++++ > fs/proc/fd.c | 45 ++++++++++++++++++++++++++++++ > 2 files changed, 62 insertions(+) > > diff --git a/Documentation/filesystems/proc.rst b/Documentation/filesystems/proc.rst > index 898c99eae8e4..ec6cfdf1796a 100644 > --- a/Documentation/filesystems/proc.rst > +++ b/Documentation/filesystems/proc.rst > @@ -47,6 +47,7 @@ fixes/update part 1.1 Stefani Seibold <stefani@seibold.net> June 9 2009 > 3.10 /proc/<pid>/timerslack_ns - Task timerslack value > 3.11 /proc/<pid>/patch_state - Livepatch patch operation state > 3.12 /proc/<pid>/arch_status - Task architecture specific information > + 3.13 /proc/<pid>/fd - List of symlinks to open files > > 4 Configuring procfs > 4.1 Mount options > @@ -2149,6 +2150,22 @@ AVX512_elapsed_ms > the task is unlikely an AVX512 user, but depends on the workload and the > scheduling scenario, it also could be a false negative mentioned above. > > +3.13 /proc/<pid>/fd - List of symlinks to open files > +------------------------------------------------------- > +This directory contains symbolic links which represent open files > +the process is maintaining. Example output:: > + > + lr-x------ 1 root root 64 Sep 20 17:53 0 -> /dev/null > + l-wx------ 1 root root 64 Sep 20 17:53 1 -> /dev/null > + lrwx------ 1 root root 64 Sep 20 17:53 10 -> 'socket:[12539]' > + lrwx------ 1 root root 64 Sep 20 17:53 11 -> 'socket:[12540]' > + lrwx------ 1 root root 64 Sep 20 17:53 12 -> 'socket:[12542]' > + > +The number of open files for the process is stored in 'size' member > +of stat() output for /proc/<pid>/fd for fast access. > +------------------------------------------------------- > + > + > Chapter 4: Configuring procfs > ============================= > > diff --git a/fs/proc/fd.c b/fs/proc/fd.c > index 913bef0d2a36..fc46d6fe080c 100644 > --- a/fs/proc/fd.c > +++ b/fs/proc/fd.c > @@ -7,6 +7,7 @@ > #include <linux/namei.h> > #include <linux/pid.h> > #include <linux/ptrace.h> > +#include <linux/bitmap.h> > #include <linux/security.h> > #include <linux/file.h> > #include <linux/seq_file.h> > @@ -279,6 +280,30 @@ static int proc_readfd_common(struct file *file, struct dir_context *ctx, > return 0; > } > > +static int proc_readfd_count(struct inode *inode, loff_t *count) > +{ > + struct task_struct *p = get_proc_task(inode); > + struct fdtable *fdt; > + > + if (!p) > + return -ENOENT; > + > + task_lock(p); > + if (p->files) { > + rcu_read_lock(); > + > + fdt = files_fdtable(p->files); > + *count = bitmap_weight(fdt->open_fds, fdt->max_fds); > + > + rcu_read_unlock(); > + } > + task_unlock(p); > + > + put_task_struct(p); > + > + return 0; > +} > + > static int proc_readfd(struct file *file, struct dir_context *ctx) > { > return proc_readfd_common(file, ctx, proc_fd_instantiate); > @@ -319,9 +344,29 @@ int proc_fd_permission(struct user_namespace *mnt_userns, > return rv; > } > > +static int proc_fd_getattr(struct user_namespace *mnt_userns, > + const struct path *path, struct kstat *stat, > + u32 request_mask, unsigned int query_flags) > +{ > + struct inode *inode = d_inode(path->dentry); > + int rv = 0; > + > + generic_fillattr(&init_user_ns, inode, stat); > + Sorry I missed this on v3, but shouldn't this pass through the mnt_userns parameter? > + /* If it's a directory, put the number of open fds there */ > + if (S_ISDIR(inode->i_mode)) { > + rv = proc_readfd_count(inode, &stat->size); > + if (rv < 0) > + return rv; > + } Also I suppose this could just do: if (S_ISDIR(inode->i_mode)) rv = proc_readfd_count(inode, &stat->size); return rv; But that's a nit. Otherwise seems reasonable to me. Brian > + > + return rv; > +} > + > const struct inode_operations proc_fd_inode_operations = { > .lookup = proc_lookupfd, > .permission = proc_fd_permission, > + .getattr = proc_fd_getattr, > .setattr = proc_setattr, > }; > > -- > 2.37.3 >
On Fri, Nov 18, 2022 at 11:10 AM Brian Foster <bfoster@redhat.com> wrote: > > +static int proc_fd_getattr(struct user_namespace *mnt_userns, > > + const struct path *path, struct kstat *stat, > > + u32 request_mask, unsigned int query_flags) > > +{ > > + struct inode *inode = d_inode(path->dentry); > > + int rv = 0; > > + > > + generic_fillattr(&init_user_ns, inode, stat); > > + > > Sorry I missed this on v3, but shouldn't this pass through the > mnt_userns parameter? The mnt_userns parameter was added in 549c729 (fs: make helpers idmap mount aware), and it's not passed anywhere in fs/proc. Looking at other uses of generic_fillattr, all of them use "init_user_ns": $ rg generic_fillattr fs/proc fs/proc/proc_net.c 301: generic_fillattr(&init_user_ns, inode, stat); fs/proc/base.c 1970: generic_fillattr(&init_user_ns, inode, stat); 3856: generic_fillattr(&init_user_ns, inode, stat); fs/proc/root.c 315: generic_fillattr(&init_user_ns, d_inode(path->dentry), stat); fs/proc/generic.c 150: generic_fillattr(&init_user_ns, inode, stat); fs/proc/proc_sysctl.c 841: generic_fillattr(&init_user_ns, inode, stat);
On Fri, Nov 18, 2022 at 11:18:36AM -0800, Ivan Babrou wrote: > On Fri, Nov 18, 2022 at 11:10 AM Brian Foster <bfoster@redhat.com> wrote: > > > +static int proc_fd_getattr(struct user_namespace *mnt_userns, > > > + const struct path *path, struct kstat *stat, > > > + u32 request_mask, unsigned int query_flags) > > > +{ > > > + struct inode *inode = d_inode(path->dentry); > > > + int rv = 0; > > > + > > > + generic_fillattr(&init_user_ns, inode, stat); > > > + > > > > Sorry I missed this on v3, but shouldn't this pass through the > > mnt_userns parameter? > > The mnt_userns parameter was added in 549c729 (fs: make helpers idmap > mount aware), and it's not passed anywhere in fs/proc. > > Looking at other uses of generic_fillattr, all of them use "init_user_ns": > Interesting. It looks like this would have used mnt_userns from vfs_getattr_nosec() before proc_fd_getattr() is wired up, right? I'm not familiar enough with that change to say whether /proc should use one value or the other, or perhaps it just doesn't matter.? Christian? Brian > $ rg generic_fillattr fs/proc > fs/proc/proc_net.c > 301: generic_fillattr(&init_user_ns, inode, stat); > > fs/proc/base.c > 1970: generic_fillattr(&init_user_ns, inode, stat); > 3856: generic_fillattr(&init_user_ns, inode, stat); > > fs/proc/root.c > 315: generic_fillattr(&init_user_ns, d_inode(path->dentry), stat); > > fs/proc/generic.c > 150: generic_fillattr(&init_user_ns, inode, stat); > > fs/proc/proc_sysctl.c > 841: generic_fillattr(&init_user_ns, inode, stat); >
On Fri, Nov 18, 2022 at 02:33:27PM -0500, Brian Foster wrote: > On Fri, Nov 18, 2022 at 11:18:36AM -0800, Ivan Babrou wrote: > > On Fri, Nov 18, 2022 at 11:10 AM Brian Foster <bfoster@redhat.com> wrote: > > > > +static int proc_fd_getattr(struct user_namespace *mnt_userns, > > > > + const struct path *path, struct kstat *stat, > > > > + u32 request_mask, unsigned int query_flags) > > > > +{ > > > > + struct inode *inode = d_inode(path->dentry); > > > > + int rv = 0; > > > > + > > > > + generic_fillattr(&init_user_ns, inode, stat); > > > > + > > > > > > Sorry I missed this on v3, but shouldn't this pass through the > > > mnt_userns parameter? > > > > The mnt_userns parameter was added in 549c729 (fs: make helpers idmap > > mount aware), and it's not passed anywhere in fs/proc. > > > > Looking at other uses of generic_fillattr, all of them use "init_user_ns": > > > > Interesting. It looks like this would have used mnt_userns from > vfs_getattr_nosec() before proc_fd_getattr() is wired up, right? I'm not > familiar enough with that change to say whether /proc should use one > value or the other, or perhaps it just doesn't matter.? > > Christian? Hey Brian, This should pass init_user_ns. So it is correct the way it is done now. The init_user_ns is used to indicate that no idmappings are used and since procfs doesn't support the creation of idmapped mounts and doesn't need to, passing it here makes the most sense. Technically passing down mnt_userns would work too but that would make it look like procfs could support idmapped mounts which isn't the case and so we don't do it this way. Starting soon this will be a lot clearer too since we're about to introduce struct mnt_idmap and replace passing around userns here. That'll make things also safer as the helpers that currently could be passed a mnt_userns - which could be any userns - will now only be able to take mnt_idmap which is a different type. Long story short, the way your patch does it is correct. Thanks! Christian
On Sat, Nov 19, 2022 at 01:01:11PM +0100, Christian Brauner wrote: > On Fri, Nov 18, 2022 at 02:33:27PM -0500, Brian Foster wrote: > > On Fri, Nov 18, 2022 at 11:18:36AM -0800, Ivan Babrou wrote: > > > On Fri, Nov 18, 2022 at 11:10 AM Brian Foster <bfoster@redhat.com> wrote: > > > > > +static int proc_fd_getattr(struct user_namespace *mnt_userns, > > > > > + const struct path *path, struct kstat *stat, > > > > > + u32 request_mask, unsigned int query_flags) > > > > > +{ > > > > > + struct inode *inode = d_inode(path->dentry); > > > > > + int rv = 0; > > > > > + > > > > > + generic_fillattr(&init_user_ns, inode, stat); > > > > > + > > > > > > > > Sorry I missed this on v3, but shouldn't this pass through the > > > > mnt_userns parameter? > > > > > > The mnt_userns parameter was added in 549c729 (fs: make helpers idmap > > > mount aware), and it's not passed anywhere in fs/proc. > > > > > > Looking at other uses of generic_fillattr, all of them use "init_user_ns": > > > > > > > Interesting. It looks like this would have used mnt_userns from > > vfs_getattr_nosec() before proc_fd_getattr() is wired up, right? I'm not > > familiar enough with that change to say whether /proc should use one > > value or the other, or perhaps it just doesn't matter.? > > > > Christian? > > Hey Brian, > > This should pass init_user_ns. So it is correct the way it is done now. > The init_user_ns is used to indicate that no idmappings are used and > since procfs doesn't support the creation of idmapped mounts and doesn't > need to, passing it here makes the most sense. Technically passing down > mnt_userns would work too but that would make it look like procfs could > support idmapped mounts which isn't the case and so we don't do it this > way. > Got it, thanks for the context. Ivan, Sorry for the noise. FWIW, for this version of the patch: Reviewed-by: Brian Foster <bfoster@redhat.com> > Starting soon this will be a lot clearer too since we're about to > introduce struct mnt_idmap and replace passing around userns here. > That'll make things also safer as the helpers that currently could be > passed a mnt_userns - which could be any userns - will now only be able > to take mnt_idmap which is a different type. > > Long story short, the way your patch does it is correct. > > Thanks! > Christian >
diff --git a/Documentation/filesystems/proc.rst b/Documentation/filesystems/proc.rst index 898c99eae8e4..ec6cfdf1796a 100644 --- a/Documentation/filesystems/proc.rst +++ b/Documentation/filesystems/proc.rst @@ -47,6 +47,7 @@ fixes/update part 1.1 Stefani Seibold <stefani@seibold.net> June 9 2009 3.10 /proc/<pid>/timerslack_ns - Task timerslack value 3.11 /proc/<pid>/patch_state - Livepatch patch operation state 3.12 /proc/<pid>/arch_status - Task architecture specific information + 3.13 /proc/<pid>/fd - List of symlinks to open files 4 Configuring procfs 4.1 Mount options @@ -2149,6 +2150,22 @@ AVX512_elapsed_ms the task is unlikely an AVX512 user, but depends on the workload and the scheduling scenario, it also could be a false negative mentioned above. +3.13 /proc/<pid>/fd - List of symlinks to open files +------------------------------------------------------- +This directory contains symbolic links which represent open files +the process is maintaining. Example output:: + + lr-x------ 1 root root 64 Sep 20 17:53 0 -> /dev/null + l-wx------ 1 root root 64 Sep 20 17:53 1 -> /dev/null + lrwx------ 1 root root 64 Sep 20 17:53 10 -> 'socket:[12539]' + lrwx------ 1 root root 64 Sep 20 17:53 11 -> 'socket:[12540]' + lrwx------ 1 root root 64 Sep 20 17:53 12 -> 'socket:[12542]' + +The number of open files for the process is stored in 'size' member +of stat() output for /proc/<pid>/fd for fast access. +------------------------------------------------------- + + Chapter 4: Configuring procfs ============================= diff --git a/fs/proc/fd.c b/fs/proc/fd.c index 913bef0d2a36..fc46d6fe080c 100644 --- a/fs/proc/fd.c +++ b/fs/proc/fd.c @@ -7,6 +7,7 @@ #include <linux/namei.h> #include <linux/pid.h> #include <linux/ptrace.h> +#include <linux/bitmap.h> #include <linux/security.h> #include <linux/file.h> #include <linux/seq_file.h> @@ -279,6 +280,30 @@ static int proc_readfd_common(struct file *file, struct dir_context *ctx, return 0; } +static int proc_readfd_count(struct inode *inode, loff_t *count) +{ + struct task_struct *p = get_proc_task(inode); + struct fdtable *fdt; + + if (!p) + return -ENOENT; + + task_lock(p); + if (p->files) { + rcu_read_lock(); + + fdt = files_fdtable(p->files); + *count = bitmap_weight(fdt->open_fds, fdt->max_fds); + + rcu_read_unlock(); + } + task_unlock(p); + + put_task_struct(p); + + return 0; +} + static int proc_readfd(struct file *file, struct dir_context *ctx) { return proc_readfd_common(file, ctx, proc_fd_instantiate); @@ -319,9 +344,29 @@ int proc_fd_permission(struct user_namespace *mnt_userns, return rv; } +static int proc_fd_getattr(struct user_namespace *mnt_userns, + const struct path *path, struct kstat *stat, + u32 request_mask, unsigned int query_flags) +{ + struct inode *inode = d_inode(path->dentry); + int rv = 0; + + generic_fillattr(&init_user_ns, inode, stat); + + /* If it's a directory, put the number of open fds there */ + if (S_ISDIR(inode->i_mode)) { + rv = proc_readfd_count(inode, &stat->size); + if (rv < 0) + return rv; + } + + return rv; +} + const struct inode_operations proc_fd_inode_operations = { .lookup = proc_lookupfd, .permission = proc_fd_permission, + .getattr = proc_fd_getattr, .setattr = proc_setattr, };