proc: nommu: /proc/<pid>/maps: release mmap read lock

Message ID 20230914163019.4050530-2-ben.wolsieffer@hefring.com
State New
Headers
Series proc: nommu: /proc/<pid>/maps: release mmap read lock |

Commit Message

Ben Wolsieffer Sept. 14, 2023, 4:30 p.m. UTC
  From: Ben Wolsieffer <Ben.Wolsieffer@hefring.com>

The no-MMU implementation of /proc/<pid>/map doesn't normally release
the mmap read lock, because it uses !IS_ERR_OR_NULL(_vml) to determine
whether to release the lock. Since _vml is NULL when the end of the
mappings is reached, the lock is not released.

This code was incorrectly adapted from the MMU implementation, which
at the time released the lock in m_next() before returning the last entry.

The MMU implementation has diverged further from the no-MMU version
since then, so this patch brings their locking and error handling into
sync, fixing the bug and hopefully avoiding similar issues in the
future.

Fixes: 47fecca15c09 ("fs/proc/task_nommu.c: don't use priv->task->mm")
Signed-off-by: Ben Wolsieffer <ben.wolsieffer@hefring.com>
---
 fs/proc/task_nommu.c | 27 +++++++++++++++------------
 1 file changed, 15 insertions(+), 12 deletions(-)
  

Comments

Andrew Morton Sept. 14, 2023, 5:02 p.m. UTC | #1
On Thu, 14 Sep 2023 12:30:20 -0400 Ben Wolsieffer <ben.wolsieffer@hefring.com> wrote:

> The no-MMU implementation of /proc/<pid>/map doesn't normally release
> the mmap read lock, because it uses !IS_ERR_OR_NULL(_vml) to determine
> whether to release the lock. Since _vml is NULL when the end of the
> mappings is reached, the lock is not released.
> 
> This code was incorrectly adapted from the MMU implementation, which
> at the time released the lock in m_next() before returning the last entry.
> 
> The MMU implementation has diverged further from the no-MMU version
> since then, so this patch brings their locking and error handling into
> sync, fixing the bug and hopefully avoiding similar issues in the
> future.

Thanks.  Is this bug demonstrable from userspace?  If so, how?
  
Ben Wolsieffer Sept. 14, 2023, 5:30 p.m. UTC | #2
On Thu, Sep 14, 2023 at 10:02:03AM -0700, Andrew Morton wrote:
> On Thu, 14 Sep 2023 12:30:20 -0400 Ben Wolsieffer <ben.wolsieffer@hefring.com> wrote:
> 
> > The no-MMU implementation of /proc/<pid>/map doesn't normally release
> > the mmap read lock, because it uses !IS_ERR_OR_NULL(_vml) to determine
> > whether to release the lock. Since _vml is NULL when the end of the
> > mappings is reached, the lock is not released.
> > 
> 
> Thanks.  Is this bug demonstrable from userspace?  If so, how?

Yes, run "cat /proc/1/maps" twice. You should observe that the
second run hangs.
  
Oleg Nesterov Sept. 15, 2023, 12:15 p.m. UTC | #3
On 09/14, Ben Wolsieffer wrote:
>
> Fixes: 47fecca15c09 ("fs/proc/task_nommu.c: don't use priv->task->mm")
> Signed-off-by: Ben Wolsieffer <ben.wolsieffer@hefring.com>
> ---
>  fs/proc/task_nommu.c | 27 +++++++++++++++------------
>  1 file changed, 15 insertions(+), 12 deletions(-)

Acked-by: Oleg Nesterov <oleg@redhat.com>


-------------------------------------------------------------------------------
Sorry for the offtopic question. I know NOTHING about nommu and when I tried to
review this patch I was puzzled by

	/* See m_next(). Zero at the start or after lseek. */
	if (addr == -1UL)
		return NULL;

at the start of m_start(). OK, lets look at

	static void *m_next(struct seq_file *m, void *_p, loff_t *pos)
	{
		struct vm_area_struct *vma = _p;

		*pos = vma->vm_end;
		return find_vma(vma->vm_mm, vma->vm_end);
	}

where does this -1UL come from? Does this mean that on nommu

	last_vma->vm_end == -1UL

or what?

fs/proc/task_mmu.c has the same check at the start, but in this case
the "See m_next()" comment actually helps.

Just curious, thanks.

Oleg.
  
Ben Wolsieffer Sept. 15, 2023, 3:42 p.m. UTC | #4
On Thu, Sep 14, 2023 at 01:30:08PM -0400, Ben Wolsieffer wrote:
> On Thu, Sep 14, 2023 at 10:02:03AM -0700, Andrew Morton wrote:
> > On Thu, 14 Sep 2023 12:30:20 -0400 Ben Wolsieffer <ben.wolsieffer@hefring.com> wrote:
> > 
> > > The no-MMU implementation of /proc/<pid>/map doesn't normally release
> > > the mmap read lock, because it uses !IS_ERR_OR_NULL(_vml) to determine
> > > whether to release the lock. Since _vml is NULL when the end of the
> > > mappings is reached, the lock is not released.
> > > 
> > 
> > Thanks.  Is this bug demonstrable from userspace?  If so, how?
> 
> Yes, run "cat /proc/1/maps" twice. You should observe that the
> second run hangs.
 
Hi Andrew,

I apologize because I realized I provided an incorrect reproducer for
this bug. I responded from what I remembered of this bug (I originally
wrote the patch over a year ago) and did not test it.

Reading /proc/1/maps twice doesn't reproduce the bug because it only
takes the read lock, which can be taken multiple times and therefore
doesn't show any problem if the lock isn't released. Instead, you need
to perform some operation that attempts to take the write lock after
reading /proc/<pid>/maps. To actually reproduce the bug, compile the
following code as 'proc_maps_bug':

#include <stdio.h>
#include <unistd.h>
#include <sys/mman.h>

int main(int argc, char *argv[]) {
        void *buf;
        sleep(1);
        buf = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        puts("mmap returned");
        return 0;
}

Then, run:

  ./proc_maps_bug &; cat /proc/$!/maps; fg

Without this patch, mmap() will hang and the command will never
complete.

Additionally, it turns out you cannot reproduce this bug on recent
kernels because 0c563f148043 ("proc: remove VMA rbtree use from nommu")
introduces a second bug that completely breaks /proc/<pid>/maps and
prevents the locking bug from being triggered. I will have a second
patch for that soon.

Thanks, Ben
  
Ben Wolsieffer Sept. 15, 2023, 3:44 p.m. UTC | #5
On Fri, Sep 15, 2023 at 02:15:15PM +0200, Oleg Nesterov wrote:
> Sorry for the offtopic question. I know NOTHING about nommu and when I tried to
> review this patch I was puzzled by
> 
> 	/* See m_next(). Zero at the start or after lseek. */
> 	if (addr == -1UL)
> 		return NULL;
> 
> at the start of m_start(). OK, lets look at
> 
> 	static void *m_next(struct seq_file *m, void *_p, loff_t *pos)
> 	{
> 		struct vm_area_struct *vma = _p;
> 
> 		*pos = vma->vm_end;
> 		return find_vma(vma->vm_mm, vma->vm_end);
> 	}
> 
> where does this -1UL come from? Does this mean that on nommu
> 
> 	last_vma->vm_end == -1UL
> 
> or what?
> 
> fs/proc/task_mmu.c has the same check at the start, but in this case
> the "See m_next()" comment actually helps.

Yes, this is another copying mistake from the MMU implementation. In
fact, it turns out that no-MMU /proc/<pid>/maps is completely broken
after 0c563f148043 ("proc: remove VMA rbtree use from nommu"). It just
returns an empty file.

This happens because find_vma() doesn't do what we want here. It "look[s]
up the first VMA in which addr resides, NULL if none", and the address
will be zero in in m_start(), which makes find_vma() return NULL (unless
presumably the zero address is actually part of the process's address
space).

I didn't run into this because I developed my patch against an older
kernel, and didn't test the latest version until today.

I'm preparing a second patch to fix this bug.

> 
> Just curious, thanks.
> 
> Oleg.
> 

Thanks, Ben
  

Patch

diff --git a/fs/proc/task_nommu.c b/fs/proc/task_nommu.c
index 2c8b62265981..061bd3f82756 100644
--- a/fs/proc/task_nommu.c
+++ b/fs/proc/task_nommu.c
@@ -205,11 +205,16 @@  static void *m_start(struct seq_file *m, loff_t *pos)
 		return ERR_PTR(-ESRCH);
 
 	mm = priv->mm;
-	if (!mm || !mmget_not_zero(mm))
+	if (!mm || !mmget_not_zero(mm)) {
+		put_task_struct(priv->task);
+		priv->task = NULL;
 		return NULL;
+	}
 
 	if (mmap_read_lock_killable(mm)) {
 		mmput(mm);
+		put_task_struct(priv->task);
+		priv->task = NULL;
 		return ERR_PTR(-EINTR);
 	}
 
@@ -218,23 +223,21 @@  static void *m_start(struct seq_file *m, loff_t *pos)
 	if (vma)
 		return vma;
 
-	mmap_read_unlock(mm);
-	mmput(mm);
 	return NULL;
 }
 
-static void m_stop(struct seq_file *m, void *_vml)
+static void m_stop(struct seq_file *m, void *v)
 {
 	struct proc_maps_private *priv = m->private;
+	struct mm_struct *mm = priv->mm;
 
-	if (!IS_ERR_OR_NULL(_vml)) {
-		mmap_read_unlock(priv->mm);
-		mmput(priv->mm);
-	}
-	if (priv->task) {
-		put_task_struct(priv->task);
-		priv->task = NULL;
-	}
+	if (!priv->task)
+		return;
+
+	mmap_read_unlock(mm);
+	mmput(mm);
+	put_task_struct(priv->task);
+	priv->task = NULL;
 }
 
 static void *m_next(struct seq_file *m, void *_p, loff_t *pos)