Message ID | 20231211193048.580691-1-avagin@google.com |
---|---|
State | New |
Headers |
Return-Path: <linux-kernel-owner@vger.kernel.org> Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:bcd1:0:b0:403:3b70:6f57 with SMTP id r17csp7290401vqy; Mon, 11 Dec 2023 11:31:03 -0800 (PST) X-Google-Smtp-Source: AGHT+IF/KuKA9BtfXJgUq0jlnZScCFLCTDC24R6nRSM20RzpJC46Z1bp8CbSsKCTYgrp8KJK4spn X-Received: by 2002:a05:6a20:12c8:b0:190:a0ed:8058 with SMTP id v8-20020a056a2012c800b00190a0ed8058mr5091065pzg.103.1702323063655; Mon, 11 Dec 2023 11:31:03 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1702323063; cv=none; d=google.com; s=arc-20160816; b=Fqm4XoXWiN0IgfFxLjQkrZU7IvsVwr/mszxGLWfw9GWwKazcDpUr+m7oFq+ZpFS1zv XYZ2GvEKg4BqxL2CMOGkgPqkhu4HWOAOuQjHTt/v2mKbpHwjqL/8/chxj9dIQ+RPVPk2 iDSb8lk48IUxR/F4UExsaDr/0HjvahgQa5JOFHWpY1EnC3du2uReyRYY+s5gp8skQFNR SSxJYBbG9FfriKtQEGYMBWxQFS1uf5PZBXPX8etid9hJA1KqFnfdM7/XGrB64mGTLWkR 85ji6/pAI2jNNbTb2njGY7175E2d8xOLpnlaKY4UwoACHnRrWV3XtxMMSu9UOv6gWZhp o5GQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:cc:to:from:subject:message-id:mime-version:date :dkim-signature; bh=Zg8AR05UHVq8cdvaCLIRoC/0NFp4lFWlQJeIXd72AHI=; fh=k9VKxD0Re/jkv49R7QbtIZFLcKbwxv6/4UblSDbzkbw=; b=SI7mI10IPJ4nDtWkBZ4BeJBqvabRTfl/qNR7VtRp48RZQbi7oMpe4n0wVFRADeDqX0 Vfc+BartIUbdJGBPHCtJrsOXzfutMBx9YT34pq2Xmat6jQaWhBi51Js4fBEGclj93N8U R9HPawz3dPc0XNQfRXaEfqPsgjE7OiHmPfeYR3ngOuP6mbpHddH2M69whmCA0Cok9aYH 3lDlOj+OOcH9Hl8x/S51oPbjb+mk0FZtzBI7FuF59OhMkJFEvHVuHwbbVvOsQuaskwVK e0tL+k9RL798Lip3/t30tYTXop+5aHH5zh//YNipic5OUKES53IUyY1XKQcdmb+ZhsQD F4fg== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@google.com header.s=20230601 header.b=dDsf8FA0; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.36 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Received: from pete.vger.email (pete.vger.email. [23.128.96.36]) by mx.google.com with ESMTPS id t11-20020a056a0021cb00b006cef51e9ea9si4205037pfj.218.2023.12.11.11.31.03 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 11 Dec 2023 11:31:03 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.36 as permitted sender) client-ip=23.128.96.36; Authentication-Results: mx.google.com; dkim=pass header.i=@google.com header.s=20230601 header.b=dDsf8FA0; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.36 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Received: from out1.vger.email (depot.vger.email [IPv6:2620:137:e000::3:0]) by pete.vger.email (Postfix) with ESMTP id CE95580411AF; Mon, 11 Dec 2023 11:30:58 -0800 (PST) X-Virus-Status: Clean X-Virus-Scanned: clamav-milter 0.103.11 at pete.vger.email Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1344776AbjLKTar (ORCPT <rfc822;dexuan.linux@gmail.com> + 99 others); Mon, 11 Dec 2023 14:30:47 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:54288 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229539AbjLKTaq (ORCPT <rfc822;linux-kernel@vger.kernel.org>); Mon, 11 Dec 2023 14:30:46 -0500 Received: from mail-pl1-x64a.google.com (mail-pl1-x64a.google.com [IPv6:2607:f8b0:4864:20::64a]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 9C5D4D5 for <linux-kernel@vger.kernel.org>; Mon, 11 Dec 2023 11:30:52 -0800 (PST) Received: by mail-pl1-x64a.google.com with SMTP id d9443c01a7336-1d0544c07c3so22869015ad.3 for <linux-kernel@vger.kernel.org>; Mon, 11 Dec 2023 11:30:52 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1702323052; x=1702927852; darn=vger.kernel.org; h=cc:to:from:subject:message-id:mime-version:date:from:to:cc:subject :date:message-id:reply-to; bh=Zg8AR05UHVq8cdvaCLIRoC/0NFp4lFWlQJeIXd72AHI=; b=dDsf8FA0wFzn77vPL6A4/RdEy1AZpMSWVA/FQL3czPbjtuRKRarsadrEci9Nxx1Bzt C5esFW1gRGWiCLAa9EK4m64MwTFk9MS4UykMvI67NnJBcFlbGM/TWXlQ58D1GuRZVEXV riBDCY3Y0I2v0m6zL+rEtvUQFzW10PfIv4+8aKvig3KG1A+GBafwsqaaefx2J93Wj65v Pqypumeu0nSrTvzTdmYcKd889cg7eC4fuRWD0AXHsC4tZ7rqP0N4lg6ucdt+8h+Uhz5v HGljVxjkFJsUzd3RGV03VZ9CRCifKKOwyXhlZvC3AFE9wPVMe2fr4pjf83/Fh3BBR79p 4IJQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1702323052; x=1702927852; h=cc:to:from:subject:message-id:mime-version:date:x-gm-message-state :from:to:cc:subject:date:message-id:reply-to; bh=Zg8AR05UHVq8cdvaCLIRoC/0NFp4lFWlQJeIXd72AHI=; b=GLuHvUaszBG22CwJUe03cj98Gg+1Lu5S5LTXOJDGyrpsi0WUg4bN0O2a++EfnZRDky WbCYPXW20I0Ffxx6IZP/G5lJtCwn077i+gCv9IZZJVChhlZOl2CCXdSrrhmwsVUFvDD6 QICy+ulMdnR3KL2J5i3gJyCp9FspqRbjARTtFqXSryX0wKNGdXEHbYW3tJfhVCvYzp4d mgyQhIEs2c8SzWoLbTBxw7DUdF2rIOIMrOn0X8mUUlGAb4GITt3A1xXR3r4FmtHVNPpw gXjgiVW7mdfdYASGa/oSfIRKuGhbTKa9/ANpQoQDGxs6TIQdIehA1B7xARsBRdxMYQ0Q rl6Q== X-Gm-Message-State: AOJu0YzIVwhB72TPL01UvkhqYovMqf016b9V7HQNr/tYPc0lXLc/GZpc vKyNnmFuIjw45s6I/Qje06XixUnt1qY= X-Received: from avagin.kir.corp.google.com ([2620:0:1008:10:dc8d:a8c5:498f:a21c]) (user=avagin job=sendgmr) by 2002:a17:902:e804:b0:1d0:c2be:3d9d with SMTP id u4-20020a170902e80400b001d0c2be3d9dmr39536plg.7.1702323051863; Mon, 11 Dec 2023 11:30:51 -0800 (PST) Date: Mon, 11 Dec 2023 11:30:47 -0800 Mime-Version: 1.0 X-Mailer: git-send-email 2.43.0.472.g3155946c3a-goog Message-ID: <20231211193048.580691-1-avagin@google.com> Subject: [PATCH 1/2] fs/proc: show correct device and inode numbers in /proc/pid/maps From: Andrei Vagin <avagin@google.com> To: Andrew Morton <akpm@linux-foundation.org> Cc: linux-kernel@vger.kernel.org, Andrei Vagin <avagin@google.com>, Amir Goldstein <amir73il@gmail.com>, Alexander Mikhalitsyn <alexander@mihalicyn.com> Content-Type: text/plain; charset="UTF-8" X-Spam-Status: No, score=-8.4 required=5.0 tests=DKIMWL_WL_MED,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,T_SCC_BODY_TEXT_LINE, USER_IN_DEF_DKIM_WL autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on pete.vger.email Precedence: bulk List-ID: <linux-kernel.vger.kernel.org> X-Mailing-List: linux-kernel@vger.kernel.org X-Greylist: Sender passed SPF test, not delayed by milter-greylist-4.6.4 (pete.vger.email [0.0.0.0]); Mon, 11 Dec 2023 11:30:59 -0800 (PST) X-getmail-retrieved-from-mailbox: INBOX X-GMAIL-THRID: 1785015108918397993 X-GMAIL-MSGID: 1785015108918397993 |
Series |
[1/2] fs/proc: show correct device and inode numbers in /proc/pid/maps
|
|
Commit Message
Andrei Vagin
Dec. 11, 2023, 7:30 p.m. UTC
Device and inode numbers in /proc/pid/maps have to match numbers returned by
statx for the same files.
/proc/pid/maps shows device and inode numbers of vma->vm_file-s. Here is
an issue. If a mapped file is on a stackable file system (e.g.,
overlayfs), vma->vm_file is a backing file whose f_inode is on the
underlying filesystem. To show correct numbers, we need to get a user
file and shows its numbers. The same trick is used to show file paths in
/proc/pid/maps.
But it isn't the end of this story. A file system can manipulate inode numbers
within the getattr callback (e.g., ovl_getattr), so vfs_getattr must be used to
get correct numbers.
Cc: Amir Goldstein <amir73il@gmail.com>
Cc: Alexander Mikhalitsyn <alexander@mihalicyn.com>
Signed-off-by: Andrei Vagin <avagin@google.com>
---
fs/proc/task_mmu.c | 20 +++++++++++++++++---
1 file changed, 17 insertions(+), 3 deletions(-)
Comments
On Mon, 11 Dec 2023 11:30:47 -0800 Andrei Vagin <avagin@google.com> wrote: > Device and inode numbers in /proc/pid/maps have to match numbers returned by > statx for the same files. > > /proc/pid/maps shows device and inode numbers of vma->vm_file-s. Here is > an issue. If a mapped file is on a stackable file system (e.g., > overlayfs), vma->vm_file is a backing file whose f_inode is on the > underlying filesystem. To show correct numbers, we need to get a user > file and shows its numbers. The same trick is used to show file paths in > /proc/pid/maps. > > But it isn't the end of this story. A file system can manipulate inode numbers > within the getattr callback (e.g., ovl_getattr), so vfs_getattr must be used to > get correct numbers. Al, could you please comment on this? Thanks. > --- a/fs/proc/task_mmu.c > +++ b/fs/proc/task_mmu.c > @@ -273,9 +273,23 @@ show_map_vma(struct seq_file *m, struct vm_area_struct *vma) > const char *name = NULL; > > if (file) { > - struct inode *inode = file_inode(vma->vm_file); > - dev = inode->i_sb->s_dev; > - ino = inode->i_ino; > + const struct path *path; > + struct kstat stat; > + > + path = file_user_path(file); > + /* > + * A file system can manipulate inode numbers within the > + * getattr callback (e.g. ovl_getattr). > + */ > + if (!vfs_getattr_nosec(path, &stat, STATX_INO, AT_STATX_DONT_SYNC)) { > + dev = stat.dev; > + ino = stat.ino; > + } else { > + struct inode *inode = d_backing_inode(path->dentry); > + > + dev = inode->i_sb->s_dev; > + ino = inode->i_ino; > + } > pgoff = ((loff_t)vma->vm_pgoff) << PAGE_SHIFT; > } > > -- > 2.43.0.472.g3155946c3a-goog
On Mon, 11 Dec 2023 11:30:47 -0800 Andrei Vagin <avagin@google.com> wrote: > Device and inode numbers in /proc/pid/maps have to match numbers returned by > statx for the same files. > > /proc/pid/maps shows device and inode numbers of vma->vm_file-s. Here is > an issue. If a mapped file is on a stackable file system (e.g., > overlayfs), vma->vm_file is a backing file whose f_inode is on the > underlying filesystem. To show correct numbers, we need to get a user > file and shows its numbers. The same trick is used to show file paths in > /proc/pid/maps. > > But it isn't the end of this story. A file system can manipulate inode numbers > within the getattr callback (e.g., ovl_getattr), so vfs_getattr must be used to > get correct numbers. > > Cc: Amir Goldstein <amir73il@gmail.com> > Cc: Alexander Mikhalitsyn <alexander@mihalicyn.com> We have discussed this with Andrei offlist, so LGTM. Reviewed-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com> +cc Christian > Signed-off-by: Andrei Vagin <avagin@google.com> > --- > fs/proc/task_mmu.c | 20 +++++++++++++++++--- > 1 file changed, 17 insertions(+), 3 deletions(-) > > diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c > index 435b61054b5b..abbf96c091ad 100644 > --- a/fs/proc/task_mmu.c > +++ b/fs/proc/task_mmu.c > @@ -273,9 +273,23 @@ show_map_vma(struct seq_file *m, struct vm_area_struct *vma) > const char *name = NULL; > > if (file) { > - struct inode *inode = file_inode(vma->vm_file); > - dev = inode->i_sb->s_dev; > - ino = inode->i_ino; > + const struct path *path; > + struct kstat stat; > + > + path = file_user_path(file); > + /* > + * A file system can manipulate inode numbers within the > + * getattr callback (e.g. ovl_getattr). > + */ > + if (!vfs_getattr_nosec(path, &stat, STATX_INO, AT_STATX_DONT_SYNC)) { > + dev = stat.dev; > + ino = stat.ino; > + } else { > + struct inode *inode = d_backing_inode(path->dentry); > + > + dev = inode->i_sb->s_dev; > + ino = inode->i_ino; > + } > pgoff = ((loff_t)vma->vm_pgoff) << PAGE_SHIFT; > } > > -- > 2.43.0.472.g3155946c3a-goog >
+fsdevel, +overlayfs, +brauner, +miklos On Mon, Dec 11, 2023 at 9:30 PM Andrei Vagin <avagin@google.com> wrote: > > Device and inode numbers in /proc/pid/maps have to match numbers returned by > statx for the same files. That statement may be true for regular files. It is not true for block/char as far as I know. I think that your fix will break that by displaying the ino/dev of the block/char reference inode and not their backing rdev inode. > > /proc/pid/maps shows device and inode numbers of vma->vm_file-s. Here is > an issue. If a mapped file is on a stackable file system (e.g., > overlayfs), vma->vm_file is a backing file whose f_inode is on the > underlying filesystem. To show correct numbers, we need to get a user > file and shows its numbers. The same trick is used to show file paths in > /proc/pid/maps. For the *same* trick, see my patch below. > > But it isn't the end of this story. A file system can manipulate inode numbers > within the getattr callback (e.g., ovl_getattr), so vfs_getattr must be used to > get correct numbers. This explanation is inaccurate, because it mixes two different overlayfs traits which are unrelated. It is true that a filesystem *can* manipulate st_dev in a way that will not match i_ino and it is true that overlayfs may do that in some non-default configurations (see [1]), but this is not the reason that you are seeing mismatches ino/dev in /proc/<pid>/maps. [1] https://docs.kernel.org/filesystems/overlayfs.html#inode-properties The reason is that the vma->vm_file is a special internal backing file which is not otherwise exposed to userspace. Please see my suggested fix below. > > Cc: Amir Goldstein <amir73il@gmail.com> > Cc: Alexander Mikhalitsyn <alexander@mihalicyn.com> > Signed-off-by: Andrei Vagin <avagin@google.com> > --- > fs/proc/task_mmu.c | 20 +++++++++++++++++--- > 1 file changed, 17 insertions(+), 3 deletions(-) > > diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c > index 435b61054b5b..abbf96c091ad 100644 > --- a/fs/proc/task_mmu.c > +++ b/fs/proc/task_mmu.c > @@ -273,9 +273,23 @@ show_map_vma(struct seq_file *m, struct vm_area_struct *vma) > const char *name = NULL; > > if (file) { > - struct inode *inode = file_inode(vma->vm_file); > - dev = inode->i_sb->s_dev; > - ino = inode->i_ino; > + const struct path *path; > + struct kstat stat; > + > + path = file_user_path(file); > + /* > + * A file system can manipulate inode numbers within the > + * getattr callback (e.g. ovl_getattr). > + */ > + if (!vfs_getattr_nosec(path, &stat, STATX_INO, AT_STATX_DONT_SYNC)) { Should you prefer to keep this solution it should be constrained to regular files. > + dev = stat.dev; > + ino = stat.ino; > + } else { > + struct inode *inode = d_backing_inode(path->dentry); d_inode() please. d_backing_inode()/d_backing_dentry() are relics of an era that never existed (i.e. union mounts). > + > + dev = inode->i_sb->s_dev; > + ino = inode->i_ino; > + } > pgoff = ((loff_t)vma->vm_pgoff) << PAGE_SHIFT; > } > Would you mind trying this alternative (untested) patch? I think it is preferred, because it is simpler. Thanks, Amir. diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c index ef2eb12906da..5328266be6b5 100644 --- a/fs/proc/task_mmu.c +++ b/fs/proc/task_mmu.c @@ -273,7 +273,8 @@ show_map_vma(struct seq_file *m, struct vm_area_struct *vma) const char *name = NULL; if (file) { - struct inode *inode = file_inode(vma->vm_file); + struct inode *inode = file_user_inode(vma->vm_file); + dev = inode->i_sb->s_dev; ino = inode->i_ino; pgoff = ((loff_t)vma->vm_pgoff) << PAGE_SHIFT; diff --git a/include/linux/fs.h b/include/linux/fs.h index 900d0cd55b50..d78412c6fd47 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -2581,20 +2581,28 @@ struct file *backing_file_open(const struct path *user_path, int flags, struct path *backing_file_user_path(struct file *f); /* - * file_user_path - get the path to display for memory mapped file - * * When mmapping a file on a stackable filesystem (e.g., overlayfs), the file * stored in ->vm_file is a backing file whose f_inode is on the underlying - * filesystem. When the mapped file path is displayed to user (e.g. via - * /proc/<pid>/maps), this helper should be used to get the path to display - * to the user, which is the path of the fd that user has requested to map. + * filesystem. When the mapped file path and inode number are displayed to + * user (e.g. via /proc/<pid>/maps), these helper should be used to get the + * path and inode number to display to the user, which is the path of the fd + * that user has requested to map and the inode number that would be returned + * by fstat() on that same fd. */ +/* Get the path to display in /proc/<pid>/maps */ static inline const struct path *file_user_path(struct file *f) { if (unlikely(f->f_mode & FMODE_BACKING)) return backing_file_user_path(f); return &f->f_path; } +/* Get the inode whose inode number to display in /proc/<pid>/maps */ +static inline const struct path *file_user_inode(struct file *f) +{ + if (unlikely(f->f_mode & FMODE_BACKING)) + return d_inode(backing_file_user_path(f)->dentry); + return file_inode(f); +}
On Tue, Dec 12, 2023 at 07:51:31AM +0200, Amir Goldstein wrote: > +fsdevel, +overlayfs, +brauner, +miklos > > On Mon, Dec 11, 2023 at 9:30 PM Andrei Vagin <avagin@google.com> wrote: > > > > Device and inode numbers in /proc/pid/maps have to match numbers returned by > > statx for the same files. > > That statement may be true for regular files. > It is not true for block/char as far as I know. > > I think that your fix will break that by displaying the ino/dev > of the block/char reference inode and not their backing rdev inode. > > > > > /proc/pid/maps shows device and inode numbers of vma->vm_file-s. Here is > > an issue. If a mapped file is on a stackable file system (e.g., > > overlayfs), vma->vm_file is a backing file whose f_inode is on the > > underlying filesystem. To show correct numbers, we need to get a user > > file and shows its numbers. The same trick is used to show file paths in > > /proc/pid/maps. > > For the *same* trick, see my patch below. > > > > > But it isn't the end of this story. A file system can manipulate inode numbers > > within the getattr callback (e.g., ovl_getattr), so vfs_getattr must be used to > > get correct numbers. > > This explanation is inaccurate, because it mixes two different overlayfs > traits which are unrelated. > It is true that a filesystem *can* manipulate st_dev in a way that will not > match i_ino and it is true that overlayfs may do that in some non-default > configurations (see [1]), but this is not the reason that you are seeing > mismatches ino/dev in /proc/<pid>/maps. > > [1] https://docs.kernel.org/filesystems/overlayfs.html#inode-properties > > The reason is that the vma->vm_file is a special internal backing file > which is not otherwise exposed to userspace. > Please see my suggested fix below. > > > > > Cc: Amir Goldstein <amir73il@gmail.com> > > Cc: Alexander Mikhalitsyn <alexander@mihalicyn.com> > > Signed-off-by: Andrei Vagin <avagin@google.com> > > --- > > fs/proc/task_mmu.c | 20 +++++++++++++++++--- > > 1 file changed, 17 insertions(+), 3 deletions(-) > > > > diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c > > index 435b61054b5b..abbf96c091ad 100644 > > --- a/fs/proc/task_mmu.c > > +++ b/fs/proc/task_mmu.c > > @@ -273,9 +273,23 @@ show_map_vma(struct seq_file *m, struct vm_area_struct *vma) > > const char *name = NULL; > > > > if (file) { > > - struct inode *inode = file_inode(vma->vm_file); > > - dev = inode->i_sb->s_dev; > > - ino = inode->i_ino; > > + const struct path *path; > > + struct kstat stat; > > + > > + path = file_user_path(file); > > + /* > > + * A file system can manipulate inode numbers within the > > + * getattr callback (e.g. ovl_getattr). > > + */ > > + if (!vfs_getattr_nosec(path, &stat, STATX_INO, AT_STATX_DONT_SYNC)) { > > Should you prefer to keep this solution it should be constrained to > regular files. It's also very dicy calling into the filesystem from procfs. You might hang the system if you end up talking to a hung NFS server or something. What locks does show_map_vma() hold? And is it safe to call helpers that might generate io? > > > + dev = stat.dev; > > + ino = stat.ino; > > + } else { > > + struct inode *inode = d_backing_inode(path->dentry); > > d_inode() please. > d_backing_inode()/d_backing_dentry() are relics of an era that never existed > (i.e. union mounts). > > > + > > + dev = inode->i_sb->s_dev; > > + ino = inode->i_ino; > > + } > > pgoff = ((loff_t)vma->vm_pgoff) << PAGE_SHIFT; > > } > > > > Would you mind trying this alternative (untested) patch? > I think it is preferred, because it is simpler. > > Thanks, > Amir. > > diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c > index ef2eb12906da..5328266be6b5 100644 > --- a/fs/proc/task_mmu.c > +++ b/fs/proc/task_mmu.c > @@ -273,7 +273,8 @@ show_map_vma(struct seq_file *m, struct vm_area_struct *vma) > const char *name = NULL; > > if (file) { > - struct inode *inode = file_inode(vma->vm_file); > + struct inode *inode = file_user_inode(vma->vm_file); > + > dev = inode->i_sb->s_dev; > ino = inode->i_ino; > pgoff = ((loff_t)vma->vm_pgoff) << PAGE_SHIFT; > diff --git a/include/linux/fs.h b/include/linux/fs.h > index 900d0cd55b50..d78412c6fd47 100644 > --- a/include/linux/fs.h > +++ b/include/linux/fs.h > @@ -2581,20 +2581,28 @@ struct file *backing_file_open(const struct > path *user_path, int flags, > struct path *backing_file_user_path(struct file *f); > > /* > - * file_user_path - get the path to display for memory mapped file > - * > * When mmapping a file on a stackable filesystem (e.g., overlayfs), the file > * stored in ->vm_file is a backing file whose f_inode is on the underlying > - * filesystem. When the mapped file path is displayed to user (e.g. via > - * /proc/<pid>/maps), this helper should be used to get the path to display > - * to the user, which is the path of the fd that user has requested to map. > + * filesystem. When the mapped file path and inode number are displayed to > + * user (e.g. via /proc/<pid>/maps), these helper should be used to get the > + * path and inode number to display to the user, which is the path of the fd > + * that user has requested to map and the inode number that would be returned > + * by fstat() on that same fd. > */ > +/* Get the path to display in /proc/<pid>/maps */ > static inline const struct path *file_user_path(struct file *f) > { > if (unlikely(f->f_mode & FMODE_BACKING)) > return backing_file_user_path(f); > return &f->f_path; > } > +/* Get the inode whose inode number to display in /proc/<pid>/maps */ > +static inline const struct path *file_user_inode(struct file *f) > +{ > + if (unlikely(f->f_mode & FMODE_BACKING)) > + return d_inode(backing_file_user_path(f)->dentry); > + return file_inode(f); > +} Way better imho.
Hi Amir, On Mon, Dec 11, 2023 at 9:51 PM Amir Goldstein <amir73il@gmail.com> wrote: > > +fsdevel, +overlayfs, +brauner, +miklos > > On Mon, Dec 11, 2023 at 9:30 PM Andrei Vagin <avagin@google.com> wrote: > > > > Device and inode numbers in /proc/pid/maps have to match numbers returned by > > statx for the same files. > > That statement may be true for regular files. > It is not true for block/char as far as I know. > > I think that your fix will break that by displaying the ino/dev > of the block/char reference inode and not their backing rdev inode. I think it doesn't break anything here. /proc/pid/maps shows dev of a filesystem where the device file resides. 7f336b6c3000-7f336b6c4000 rw-p 00000000 00:05 7 /dev/zero $ stat /dev/zero Device: 0,5 Inode: 7 Links: 1 Device type: 1,5 I checked that it works with and without my patch. It doesn't matter, look at the following comments. > > > > > /proc/pid/maps shows device and inode numbers of vma->vm_file-s. Here is > > an issue. If a mapped file is on a stackable file system (e.g., > > overlayfs), vma->vm_file is a backing file whose f_inode is on the > > underlying filesystem. To show correct numbers, we need to get a user > > file and shows its numbers. The same trick is used to show file paths in > > /proc/pid/maps. > > For the *same* trick, see my patch below. The patch looks good to me. Thanks! Will you send it? > > > > > But it isn't the end of this story. A file system can manipulate inode numbers > > within the getattr callback (e.g., ovl_getattr), so vfs_getattr must be used to > > get correct numbers. > > This explanation is inaccurate, because it mixes two different overlayfs > traits which are unrelated. > It is true that a filesystem *can* manipulate st_dev in a way that will not > match i_ino and it is true that overlayfs may do that in some non-default > configurations (see [1]), but this is not the reason that you are seeing > mismatches ino/dev in /proc/<pid>/maps. > > [1] https://docs.kernel.org/filesystems/overlayfs.html#inode-properties > > The reason is that the vma->vm_file is a special internal backing file > which is not otherwise exposed to userspace. > Please see my suggested fix below. I understand that this is the main root cause of issues that we have seen. But when I was preparing this patch, I found that ovl_getattr manipulates with inode numbers and decided that it can return a different inode number than file_user_inode(vma->vm_file).i_ino. I am glad that I was wrong and we don't need to use vfs_getattr here. > > > > > Cc: Amir Goldstein <amir73il@gmail.com> > > Cc: Alexander Mikhalitsyn <alexander@mihalicyn.com> > > Signed-off-by: Andrei Vagin <avagin@google.com> <snip> > > diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c > index ef2eb12906da..5328266be6b5 100644 > --- a/fs/proc/task_mmu.c > +++ b/fs/proc/task_mmu.c > @@ -273,7 +273,8 @@ show_map_vma(struct seq_file *m, struct vm_area_struct *vma) > const char *name = NULL; > > if (file) { > - struct inode *inode = file_inode(vma->vm_file); > + struct inode *inode = file_user_inode(vma->vm_file); > + > dev = inode->i_sb->s_dev; > ino = inode->i_ino; > pgoff = ((loff_t)vma->vm_pgoff) << PAGE_SHIFT; > diff --git a/include/linux/fs.h b/include/linux/fs.h > index 900d0cd55b50..d78412c6fd47 100644 > --- a/include/linux/fs.h > +++ b/include/linux/fs.h > @@ -2581,20 +2581,28 @@ struct file *backing_file_open(const struct > path *user_path, int flags, > struct path *backing_file_user_path(struct file *f); > > /* > - * file_user_path - get the path to display for memory mapped file > - * > * When mmapping a file on a stackable filesystem (e.g., overlayfs), the file > * stored in ->vm_file is a backing file whose f_inode is on the underlying > - * filesystem. When the mapped file path is displayed to user (e.g. via > - * /proc/<pid>/maps), this helper should be used to get the path to display > - * to the user, which is the path of the fd that user has requested to map. > + * filesystem. When the mapped file path and inode number are displayed to > + * user (e.g. via /proc/<pid>/maps), these helper should be used to get the > + * path and inode number to display to the user, which is the path of the fd > + * that user has requested to map and the inode number that would be returned > + * by fstat() on that same fd. > */ > +/* Get the path to display in /proc/<pid>/maps */ > static inline const struct path *file_user_path(struct file *f) > { > if (unlikely(f->f_mode & FMODE_BACKING)) > return backing_file_user_path(f); > return &f->f_path; > } > +/* Get the inode whose inode number to display in /proc/<pid>/maps */ > +static inline const struct path *file_user_inode(struct file *f) nit: struct inode * > +{ > + if (unlikely(f->f_mode & FMODE_BACKING)) > + return d_inode(backing_file_user_path(f)->dentry); > + return file_inode(f); > +} Thanks, Andrei
On Tue, Dec 12, 2023 at 1:27 AM Christian Brauner <brauner@kernel.org> wrote: > > On Tue, Dec 12, 2023 at 07:51:31AM +0200, Amir Goldstein wrote: > > +fsdevel, +overlayfs, +brauner, +miklos > > > > On Mon, Dec 11, 2023 at 9:30 PM Andrei Vagin <avagin@google.com> wrote: > > > > > > Device and inode numbers in /proc/pid/maps have to match numbers returned by > > > statx for the same files. > > > > That statement may be true for regular files. > > It is not true for block/char as far as I know. > > > > I think that your fix will break that by displaying the ino/dev > > of the block/char reference inode and not their backing rdev inode. > > > > > > > > /proc/pid/maps shows device and inode numbers of vma->vm_file-s. Here is > > > an issue. If a mapped file is on a stackable file system (e.g., > > > overlayfs), vma->vm_file is a backing file whose f_inode is on the > > > underlying filesystem. To show correct numbers, we need to get a user > > > file and shows its numbers. The same trick is used to show file paths in > > > /proc/pid/maps. > > > > For the *same* trick, see my patch below. > > > > > > > > But it isn't the end of this story. A file system can manipulate inode numbers > > > within the getattr callback (e.g., ovl_getattr), so vfs_getattr must be used to > > > get correct numbers. > > > > This explanation is inaccurate, because it mixes two different overlayfs > > traits which are unrelated. > > It is true that a filesystem *can* manipulate st_dev in a way that will not > > match i_ino and it is true that overlayfs may do that in some non-default > > configurations (see [1]), but this is not the reason that you are seeing > > mismatches ino/dev in /proc/<pid>/maps. > > > > [1] https://docs.kernel.org/filesystems/overlayfs.html#inode-properties > > > > The reason is that the vma->vm_file is a special internal backing file > > which is not otherwise exposed to userspace. > > Please see my suggested fix below. > > > > > > > > Cc: Amir Goldstein <amir73il@gmail.com> > > > Cc: Alexander Mikhalitsyn <alexander@mihalicyn.com> > > > Signed-off-by: Andrei Vagin <avagin@google.com> > > > --- > > > fs/proc/task_mmu.c | 20 +++++++++++++++++--- > > > 1 file changed, 17 insertions(+), 3 deletions(-) > > > > > > diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c > > > index 435b61054b5b..abbf96c091ad 100644 > > > --- a/fs/proc/task_mmu.c > > > +++ b/fs/proc/task_mmu.c > > > @@ -273,9 +273,23 @@ show_map_vma(struct seq_file *m, struct vm_area_struct *vma) > > > const char *name = NULL; > > > > > > if (file) { > > > - struct inode *inode = file_inode(vma->vm_file); > > > - dev = inode->i_sb->s_dev; > > > - ino = inode->i_ino; > > > + const struct path *path; > > > + struct kstat stat; > > > + > > > + path = file_user_path(file); > > > + /* > > > + * A file system can manipulate inode numbers within the > > > + * getattr callback (e.g. ovl_getattr). > > > + */ > > > + if (!vfs_getattr_nosec(path, &stat, STATX_INO, AT_STATX_DONT_SYNC)) { > > > > Should you prefer to keep this solution it should be constrained to > > regular files. > > It's also very dicy calling into the filesystem from procfs. You might > hang the system if you end up talking to a hung NFS server or something. > What locks does show_map_vma() hold? And is it safe to call helpers that > might generate io? I had the same thoughts when I was thinking about whether it is safe to use it here or not. Then I found AT_STATX_DONT_SYNC (don't sync attributes with the server) and decided that it should be safe. Anyway, Amir explains that vfs_getattr_nosec isn't needed for overlay files. Thanks, Andrei
On Tue, Dec 12, 2023 at 9:08 PM Andrei Vagin <avagin@google.com> wrote: > > Hi Amir, > > On Mon, Dec 11, 2023 at 9:51 PM Amir Goldstein <amir73il@gmail.com> wrote: > > > > +fsdevel, +overlayfs, +brauner, +miklos > > > > On Mon, Dec 11, 2023 at 9:30 PM Andrei Vagin <avagin@google.com> wrote: > > > > > > Device and inode numbers in /proc/pid/maps have to match numbers returned by > > > statx for the same files. > > > > That statement may be true for regular files. > > It is not true for block/char as far as I know. > > > > I think that your fix will break that by displaying the ino/dev > > of the block/char reference inode and not their backing rdev inode. > > I think it doesn't break anything here. /proc/pid/maps shows dev of a > filesystem where the device file resides. > > 7f336b6c3000-7f336b6c4000 rw-p 00000000 00:05 7 > /dev/zero > $ stat /dev/zero > Device: 0,5 Inode: 7 Links: 1 Device type: 1,5 > > I checked that it works with and without my patch. It doesn't matter, look at > the following comments. > > > > > > > > > /proc/pid/maps shows device and inode numbers of vma->vm_file-s. Here is > > > an issue. If a mapped file is on a stackable file system (e.g., > > > overlayfs), vma->vm_file is a backing file whose f_inode is on the > > > underlying filesystem. To show correct numbers, we need to get a user > > > file and shows its numbers. The same trick is used to show file paths in > > > /proc/pid/maps. > > > > For the *same* trick, see my patch below. > > The patch looks good to me. Thanks! Will you send it? > I can send it, if you want. I wouldn't mind if you send it with my Suggested-by though, as you are already testing it and posting the selftest. Thanks, Amir.
diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c index 435b61054b5b..abbf96c091ad 100644 --- a/fs/proc/task_mmu.c +++ b/fs/proc/task_mmu.c @@ -273,9 +273,23 @@ show_map_vma(struct seq_file *m, struct vm_area_struct *vma) const char *name = NULL; if (file) { - struct inode *inode = file_inode(vma->vm_file); - dev = inode->i_sb->s_dev; - ino = inode->i_ino; + const struct path *path; + struct kstat stat; + + path = file_user_path(file); + /* + * A file system can manipulate inode numbers within the + * getattr callback (e.g. ovl_getattr). + */ + if (!vfs_getattr_nosec(path, &stat, STATX_INO, AT_STATX_DONT_SYNC)) { + dev = stat.dev; + ino = stat.ino; + } else { + struct inode *inode = d_backing_inode(path->dentry); + + dev = inode->i_sb->s_dev; + ino = inode->i_ino; + } pgoff = ((loff_t)vma->vm_pgoff) << PAGE_SHIFT; }