[1/3] mm: change vma_start_read to fail if VMA got detached from under it

Message ID 20230620235726.3873043-1-surenb@google.com
State New
Series [1/3] mm: change vma_start_read to fail if VMA got detached from under it

Commit Message

Suren Baghdasaryan June 20, 2023, 11:57 p.m. UTC
The current implementation of vma_start_read() checks whether the VMA is
locked before taking vma->vm_lock and then repeats the check afterwards.
This mechanism fails to detect a case in which the VMA gets write-locked,
modified and unlocked after the first check but before vma->vm_lock is
obtained. While this is not strictly a problem (vma_start_read() would not
produce a false unlocked result), it allows vma_start_read() to
successfully lock a VMA that got detached from the VMA tree while
vma_start_read() was locking it. The new condition checks for any change
in vma->vm_lock_seq after we obtain vma->vm_lock and causes
vma_start_read() to fail if the above race occurs.
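
For illustration only (this timeline is not part of the patch), the
interleaving that the old double check misses looks roughly like this,
assuming the writer side uses vma_start_write()/vma_end_write_all() as in
the existing per-VMA lock scheme:

/*
 * reader (old vma_start_read)              writer (e.g. unmap path)
 * ---------------------------              ------------------------
 * vm_lock_seq != mm_lock_seq -> proceed
 *                                          vma_start_write(vma)    // write-lock the VMA
 *                                          ... modify / detach VMA ...
 *                                          vma_end_write_all(mm)   // mm_lock_seq++
 * down_read_trylock(vm_lock) -> succeeds
 * vm_lock_seq == mm_lock_seq -> false, so
 * the old recheck passes and a detached
 * VMA ends up read-locked
 *
 * The new version rechecks vma->vm_lock_seq against the value sampled
 * before the trylock, so a write lock taken in between makes it fail.
 */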

Fixes: 5e31275cc997 ("mm: add per-VMA lock and helper functions to control it")
Suggested-by: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
---
 include/linux/mm.h | 19 ++++++++++---------
 1 file changed, 10 insertions(+), 9 deletions(-)
  

Comments

Suren Baghdasaryan June 21, 2023, 12:05 a.m. UTC | #1
On Tue, Jun 20, 2023 at 4:57 PM Suren Baghdasaryan <surenb@google.com> wrote:
>
> By the time a VMA is freed it has to be detached, with the exception of
> exit_mmap, which destroys the whole VMA tree. Enforce this
> requirement before freeing the VMA. exit_mmap is the only user calling
> __vm_area_free directly, therefore it won't trigger the new check.
> Change VMA initialization to mark new VMAs as detached and change that
> flag once the VMA is added into a tree.
>
> Suggested-by: Linus Torvalds <torvalds@linuxfoundation.org>
> Signed-off-by: Suren Baghdasaryan <surenb@google.com>

My tests did not generate the warning, but the test coverage is far
from perfect, so if someone can run more extensive testing on this one,
that would be greatly appreciated.
Thanks,
Suren.

> ---
>  include/linux/mm.h | 4 ++--
>  kernel/fork.c      | 2 ++
>  mm/internal.h      | 1 +
>  3 files changed, 5 insertions(+), 2 deletions(-)
>
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 74e3033c9fc2..9a10fcdb134e 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -247,7 +247,7 @@ void setup_initial_init_mm(void *start_code, void *end_code,
>  struct vm_area_struct *vm_area_alloc(struct mm_struct *);
>  struct vm_area_struct *vm_area_dup(struct vm_area_struct *);
>  void vm_area_free(struct vm_area_struct *);
> -/* Use only if VMA has no other users */
> +/* Use only if VMA has no other users and might still be attached to a tree */
>  void __vm_area_free(struct vm_area_struct *vma);
>
>  #ifndef CONFIG_MMU
> @@ -751,7 +751,7 @@ static inline void vma_init(struct vm_area_struct *vma, struct mm_struct *mm)
>         vma->vm_mm = mm;
>         vma->vm_ops = &dummy_vm_ops;
>         INIT_LIST_HEAD(&vma->anon_vma_chain);
> -       vma_mark_detached(vma, false);
> +       vma->detached = true;
>         vma_numab_state_init(vma);
>  }
>
> diff --git a/kernel/fork.c b/kernel/fork.c
> index 41c964104b58..000fc429345c 100644
> --- a/kernel/fork.c
> +++ b/kernel/fork.c
> @@ -540,6 +540,7 @@ static void vm_area_free_rcu_cb(struct rcu_head *head)
>
>         /* The vma should not be locked while being destroyed. */
>         VM_BUG_ON_VMA(rwsem_is_locked(&vma->vm_lock->lock), vma);
> +       WARN_ON_ONCE(!vma->detached);
>         __vm_area_free(vma);
>  }
>  #endif
> @@ -549,6 +550,7 @@ void vm_area_free(struct vm_area_struct *vma)
>  #ifdef CONFIG_PER_VMA_LOCK
>         call_rcu(&vma->vm_rcu, vm_area_free_rcu_cb);
>  #else
> +       WARN_ON_ONCE(!vma->detached);
>         __vm_area_free(vma);
>  #endif
>  }
> diff --git a/mm/internal.h b/mm/internal.h
> index 68410c6d97ac..728189e6c703 100644
> --- a/mm/internal.h
> +++ b/mm/internal.h
> @@ -1068,6 +1068,7 @@ static inline void vma_iter_store(struct vma_iterator *vmi,
>         vmi->mas.index = vma->vm_start;
>         vmi->mas.last = vma->vm_end - 1;
>         mas_store_prealloc(&vmi->mas, vma);
> +       vma_mark_detached(vma, false);
>  }
>
>  static inline int vma_iter_store_gfp(struct vma_iterator *vmi,
> --
> 2.41.0.162.gfafddb0af9-goog
>
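
To make the detached-flag invariant described at the top of the quoted
patch concrete, here is a tiny self-contained userspace model. The toy_*
names are made up for illustration; the kernel-side equivalents are the
vma_init(), vma_iter_store() and vm_area_free() hunks quoted above.

#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Toy model of the rule the quoted patch enforces: a VMA has to be
 * detached from the tree before it is freed. Only exit_mmap(), which
 * destroys the whole tree, bypasses the check by calling
 * __vm_area_free() directly. */
struct toy_vma {
	bool detached;
};

static struct toy_vma *toy_vma_alloc(void)
{
	struct toy_vma *vma = calloc(1, sizeof(*vma));

	vma->detached = true;		/* new VMAs start out detached (vma_init) */
	return vma;
}

static void toy_vma_attach(struct toy_vma *vma)
{
	vma->detached = false;		/* cleared once stored into the tree (vma_iter_store) */
}

static void toy_vma_detach(struct toy_vma *vma)
{
	vma->detached = true;		/* set again when removed from the tree */
}

static void toy_vma_free(struct toy_vma *vma)
{
	assert(vma->detached);		/* models WARN_ON_ONCE(!vma->detached) */
	free(vma);
}

int main(void)
{
	struct toy_vma *vma = toy_vma_alloc();

	toy_vma_attach(vma);
	toy_vma_detach(vma);		/* skipping this would trip the assert above */
	toy_vma_free(vma);
	return 0;
}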
  
Suren Baghdasaryan June 26, 2023, 8:51 p.m. UTC | #2
On Tue, Jun 20, 2023 at 11:57 PM Suren Baghdasaryan <surenb@google.com> wrote:
>
> The current implementation of vma_start_read() checks whether the VMA is
> locked before taking vma->vm_lock and then repeats the check afterwards.
> This mechanism fails to detect a case in which the VMA gets write-locked,
> modified and unlocked after the first check but before vma->vm_lock is
> obtained. While this is not strictly a problem (vma_start_read() would not
> produce a false unlocked result), it allows vma_start_read() to
> successfully lock a VMA that got detached from the VMA tree while
> vma_start_read() was locking it. The new condition checks for any change
> in vma->vm_lock_seq after we obtain vma->vm_lock and causes
> vma_start_read() to fail if the above race occurs.

Just a friendly ping for feedback.
I know most people were busy fixing the stack expansion problem and
these patches are not urgent, so no rush. If nobody reviews them, I'll
ping again next week.

>
> Fixes: 5e31275cc997 ("mm: add per-VMA lock and helper functions to control it")
> Suggested-by: Matthew Wilcox <willy@infradead.org>
> Signed-off-by: Suren Baghdasaryan <surenb@google.com>
> ---
>  include/linux/mm.h | 19 ++++++++++---------
>  1 file changed, 10 insertions(+), 9 deletions(-)
>
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 27ce77080c79..8410da79c570 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -639,23 +639,24 @@ static inline void vma_numab_state_free(struct vm_area_struct *vma) {}
>   */
>  static inline bool vma_start_read(struct vm_area_struct *vma)
>  {
> -       /* Check before locking. A race might cause false locked result. */
> -       if (vma->vm_lock_seq == READ_ONCE(vma->vm_mm->mm_lock_seq))
> +       int vm_lock_seq = READ_ONCE(vma->vm_lock_seq);
> +
> +       /*
> +        * Check if VMA is locked before taking vma->vm_lock. A race or
> +        * mm_lock_seq overflow might cause false locked result.
> +        */
> +       if (vm_lock_seq == READ_ONCE(vma->vm_mm->mm_lock_seq))
>                 return false;
>
>         if (unlikely(down_read_trylock(&vma->vm_lock->lock) == 0))
>                 return false;
>
> -       /*
> -        * Overflow might produce false locked result.
> -        * False unlocked result is impossible because we modify and check
> -        * vma->vm_lock_seq under vma->vm_lock protection and mm->mm_lock_seq
> -        * modification invalidates all existing locks.
> -        */
> -       if (unlikely(vma->vm_lock_seq == READ_ONCE(vma->vm_mm->mm_lock_seq))) {
> +       /* Fail if VMA was write-locked after we checked it earlier */
> +       if (unlikely(vm_lock_seq != READ_ONCE(vma->vm_lock_seq))) {
>                 up_read(&vma->vm_lock->lock);
>                 return false;
>         }
> +
>         return true;
>  }
>
> --
> 2.41.0.162.gfafddb0af9-goog
>
  

Patch

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 27ce77080c79..8410da79c570 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -639,23 +639,24 @@  static inline void vma_numab_state_free(struct vm_area_struct *vma) {}
  */
 static inline bool vma_start_read(struct vm_area_struct *vma)
 {
-	/* Check before locking. A race might cause false locked result. */
-	if (vma->vm_lock_seq == READ_ONCE(vma->vm_mm->mm_lock_seq))
+	int vm_lock_seq = READ_ONCE(vma->vm_lock_seq);
+
+	/*
+	 * Check if VMA is locked before taking vma->vm_lock. A race or
+	 * mm_lock_seq overflow might cause false locked result.
+	 */
+	if (vm_lock_seq == READ_ONCE(vma->vm_mm->mm_lock_seq))
 		return false;
 
 	if (unlikely(down_read_trylock(&vma->vm_lock->lock) == 0))
 		return false;
 
-	/*
-	 * Overflow might produce false locked result.
-	 * False unlocked result is impossible because we modify and check
-	 * vma->vm_lock_seq under vma->vm_lock protection and mm->mm_lock_seq
-	 * modification invalidates all existing locks.
-	 */
-	if (unlikely(vma->vm_lock_seq == READ_ONCE(vma->vm_mm->mm_lock_seq))) {
+	/* Fail if VMA was write-locked after we checked it earlier */
+	if (unlikely(vm_lock_seq != READ_ONCE(vma->vm_lock_seq))) {
 		up_read(&vma->vm_lock->lock);
 		return false;
 	}
+
 	return true;
 }
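
For completeness, a hedged sketch of how a caller typically consumes
vma_start_read(); everything except vma_start_read()/vma_end_read() below
is a placeholder and not code from this series (in the kernel the real
consumer is the page-fault path):

/* Illustrative caller, loosely modelled on the per-VMA-lock fault path. */
static struct vm_area_struct *try_read_lock_vma(struct mm_struct *mm,
						unsigned long addr)
{
	struct vm_area_struct *vma;

	rcu_read_lock();
	vma = placeholder_find_vma(mm, addr);	/* hypothetical RCU-safe lookup */
	if (vma && !vma_start_read(vma))
		vma = NULL;	/* racing writer or detached VMA: fall back to mmap_lock */
	rcu_read_unlock();

	/*
	 * On success the caller proceeds under the VMA read lock and drops
	 * it with vma_end_read(vma) when done.
	 */
	return vma;
}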