[09/41] mm: rcu safe VMA freeing

Message ID 20230109205336.3665937-10-surenb@google.com
State New
Headers
Series Per-VMA locks |

Commit Message

Suren Baghdasaryan Jan. 9, 2023, 8:53 p.m. UTC
  From: Michel Lespinasse <michel@lespinasse.org>

This prepares for page faults handling under VMA lock, looking up VMAs
under protection of an rcu read lock, instead of the usual mmap read lock.

Signed-off-by: Michel Lespinasse <michel@lespinasse.org>
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
---
 include/linux/mm_types.h | 13 ++++++++++---
 kernel/fork.c            | 13 +++++++++++++
 2 files changed, 23 insertions(+), 3 deletions(-)
  

Comments

Michal Hocko Jan. 17, 2023, 2:25 p.m. UTC | #1
On Mon 09-01-23 12:53:04, Suren Baghdasaryan wrote:
[...]
>  void vm_area_free(struct vm_area_struct *vma)
>  {
>  	free_anon_vma_name(vma);
> +#ifdef CONFIG_PER_VMA_LOCK
> +	call_rcu(&vma->vm_rcu, __vm_area_free);
> +#else
>  	kmem_cache_free(vm_area_cachep, vma);
> +#endif

Is it safe to have vma with already freed vma_name? I suspect this is
safe because of mmap_lock but is there any reason to split the freeing
process and have this potential UAF lurking?

>  }
>  
>  static void account_kernel_stack(struct task_struct *tsk, int account)
> -- 
> 2.39.0
  
Suren Baghdasaryan Jan. 18, 2023, 2:16 a.m. UTC | #2
On Tue, Jan 17, 2023 at 6:25 AM Michal Hocko <mhocko@suse.com> wrote:
>
> On Mon 09-01-23 12:53:04, Suren Baghdasaryan wrote:
> [...]
> >  void vm_area_free(struct vm_area_struct *vma)
> >  {
> >       free_anon_vma_name(vma);
> > +#ifdef CONFIG_PER_VMA_LOCK
> > +     call_rcu(&vma->vm_rcu, __vm_area_free);
> > +#else
> >       kmem_cache_free(vm_area_cachep, vma);
> > +#endif
>
> Is it safe to have vma with already freed vma_name? I suspect this is
> safe because of mmap_lock but is there any reason to split the freeing
> process and have this potential UAF lurking?

It should be safe because VMA is either locked or has been isolated
while locked, so no page fault handlers should have access to it. But
you are right, moving free_anon_vma_name() into __vm_area_free() does
seem safer. Will make the change in the next rev.

>
> >  }
> >
> >  static void account_kernel_stack(struct task_struct *tsk, int account)
> > --
> > 2.39.0
>
> --
> Michal Hocko
> SUSE Labs
  

Patch

diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 4b6bce73fbb4..d5cdec1314fe 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -535,9 +535,16 @@  struct anon_vma_name {
 struct vm_area_struct {
 	/* The first cache line has the info for VMA tree walking. */
 
-	unsigned long vm_start;		/* Our start address within vm_mm. */
-	unsigned long vm_end;		/* The first byte after our end address
-					   within vm_mm. */
+	union {
+		struct {
+			/* VMA covers [vm_start; vm_end) addresses within mm */
+			unsigned long vm_start;
+			unsigned long vm_end;
+		};
+#ifdef CONFIG_PER_VMA_LOCK
+		struct rcu_head vm_rcu;	/* Used for deferred freeing. */
+#endif
+	};
 
 	struct mm_struct *vm_mm;	/* The address space we belong to. */
 
diff --git a/kernel/fork.c b/kernel/fork.c
index 58aab6c889a4..5986817f393c 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -479,10 +479,23 @@  struct vm_area_struct *vm_area_dup(struct vm_area_struct *orig)
 	return new;
 }
 
+#ifdef CONFIG_PER_VMA_LOCK
+static void __vm_area_free(struct rcu_head *head)
+{
+	struct vm_area_struct *vma = container_of(head, struct vm_area_struct,
+						  vm_rcu);
+	kmem_cache_free(vm_area_cachep, vma);
+}
+#endif
+
 void vm_area_free(struct vm_area_struct *vma)
 {
 	free_anon_vma_name(vma);
+#ifdef CONFIG_PER_VMA_LOCK
+	call_rcu(&vma->vm_rcu, __vm_area_free);
+#else
 	kmem_cache_free(vm_area_cachep, vma);
+#endif
 }
 
 static void account_kernel_stack(struct task_struct *tsk, int account)