[v2] umh: fix out of scope usage when the process is being killed

Message ID 20221214134656.21228-1-schspa@gmail.com
State New
Series [v2] umh: fix out of scope usage when the process is being killed

Commit Message

Schspa Shi Dec. 14, 2022, 1:46 p.m. UTC
  When the process is killed, wait_for_completion_state() returns
-ERESTARTSYS and the caller returns, so the on-stack completion variable
becomes invalid; its memory may even be freed. If the user-mode helper
thread completes at the same time, it races to use an invalid completion.

Please refer to the following scenarios.
            T1                                  T2
------------------------------------------------------------------
call_usermodehelper_exec
                                   call_usermodehelper_exec_async
                                   << do something >>
                                   umh_complete(sub_info);
                                   comp = xchg(&sub_info->complete, NULL);
                                   /* we got the completion */
                                   << context switch >>

    << Being killed >>
	retval = wait_for_completion_state(sub_info->complete, state);
	if (!retval)
		goto wait_done;

	if (wait & UMH_KILLABLE) {
		/* umh_complete() will see NULL and free sub_info */
		if (xchg(&sub_info->complete, NULL))
			goto unlock;
        << we can't get the completion because T2 already took it >>
	}
	....
	return retval;
}

/*
 * The on-stack completion has reached the end of its lifetime here
 * and may already be freed if the process has been reaped.
 */
                                   -------- BUG here----------
                                   if (comp)
                                       complete(comp);

To fix it, add an additional wait_for_completion() so that the completion
object is guaranteed to be unused before it goes out of scope. This is
what kthread_create_on_node does to handle the same race.

Reported-by: syzbot+10d19d528d9755d9af22@syzkaller.appspotmail.com
Reported-by: syzbot+70d5d5d83d03db2c813d@syzkaller.appspotmail.com
Reported-by: syzbot+83cb0411d0fcf0a30fc1@syzkaller.appspotmail.com
Reported-by: syzbot+c92c6a251d49ceceb625@syzkaller.appspotmail.com
Signed-off-by: Schspa Shi <schspa@gmail.com>
---

v1->v2:
  - Use a new way to fix the race, as kthread_create_on_node does.
  - Improve the comments and describe the problem more accurately.

 kernel/umh.c | 4 ++++
 1 file changed, 4 insertions(+)
  

Comments

Luis Chamberlain Dec. 14, 2022, 7:59 p.m. UTC | #1
On Wed, Dec 14, 2022 at 09:46:56PM +0800, Schspa Shi wrote:
> When the process is killed, wait_for_completion_state() returns
> -ERESTARTSYS and the caller returns, so the on-stack completion variable
> becomes invalid; its memory may even be freed. If the user-mode helper
> thread completes at the same time, it races to use an invalid completion.
> 
> Please refer to the following scenarios.
>             T1                                  T2
> ------------------------------------------------------------------
> call_usermodehelper_exec
>                                    call_usermodehelper_exec_async
>                                    << do something >>
>                                    umh_complete(sub_info);
>                                    comp = xchg(&sub_info->complete, NULL);
>                                    /* we got the completion */
>                                    << context switch >>
> 
>     << Being killed >>
> 	retval = wait_for_completion_state(sub_info->complete, state);
> 	if (!retval)
> 		goto wait_done;
> 
> 	if (wait & UMH_KILLABLE) {
> 		/* umh_complete() will see NULL and free sub_info */
> 		if (xchg(&sub_info->complete, NULL))
> 			goto unlock;
>         << we can't get the completion because T2 already took it >>
> 	}
> 	....
> 	return retval;
> }
> 
> /*
>  * The on-stack completion has reached the end of its lifetime here
>  * and may already be freed if the process has been reaped.
>  */
>                                    -------- BUG here----------
>                                    if (comp)
>                                        complete(comp);
> 
> To fix it, add an additional wait_for_completion() so that the completion
> object is guaranteed to be unused before it goes out of scope. This is
> what kthread_create_on_node does to handle the same race.
> 
> Reported-by: syzbot+10d19d528d9755d9af22@syzkaller.appspotmail.com
> Reported-by: syzbot+70d5d5d83d03db2c813d@syzkaller.appspotmail.com
> Reported-by: syzbot+83cb0411d0fcf0a30fc1@syzkaller.appspotmail.com
> Reported-by: syzbot+c92c6a251d49ceceb625@syzkaller.appspotmail.com
> Signed-off-by: Schspa Shi <schspa@gmail.com>
> ---

Please fix the commit log a bit more with the context I provided, *if*
on the other thread the community agrees that the approach should be
compartmentalized. I am still not sure why this would fix the
UAF after thinking about it some more, and the issue would mean
there likely could be a generic fix / issue to consider.

So for now I'd like more review of this race and the proposed fix
as I mentioned in the follow-up thread in your v1 patch. Let's
follow up there and see how that discussion goes.

  Luis
  
Schspa Shi Dec. 15, 2022, 5:11 a.m. UTC | #2
Luis Chamberlain <mcgrof@kernel.org> writes:

> On Wed, Dec 14, 2022 at 09:46:56PM +0800, Schspa Shi wrote:
>> When the process is killed, wait_for_completion_state() returns
>> -ERESTARTSYS and the caller returns, so the on-stack completion variable
>> becomes invalid; its memory may even be freed. If the user-mode helper
>> thread completes at the same time, it races to use an invalid completion.
>> 
>> Please refer to the following scenarios.
>>             T1                                  T2
>> ------------------------------------------------------------------
>> call_usermodehelper_exec
>>                                    call_usermodehelper_exec_async
>>                                    << do something >>
>>                                    umh_complete(sub_info);
>>                                    comp = xchg(&sub_info->complete, NULL);
>>                                    /* we got the completion */
>>                                    << context switch >>
>> 
>>     << Being killed >>
>> 	retval = wait_for_completion_state(sub_info->complete, state);
>> 	if (!retval)
>> 		goto wait_done;
>> 
>> 	if (wait & UMH_KILLABLE) {
>> 		/* umh_complete() will see NULL and free sub_info */
>> 		if (xchg(&sub_info->complete, NULL))
>> 			goto unlock;
>>         << we can't get the completion because T2 already took it >>
>> 	}
>> 	....
>> 	return retval;
>> }
>> 
>> /*
>>  * The on-stack completion has reached the end of its lifetime here
>>  * and may already be freed if the process has been reaped.
>>  */
>>                                    -------- BUG here----------
>>                                    if (comp)
>>                                        complete(comp);
>> 
>> To fix it, add an additional wait_for_completion() so that the completion
>> object is guaranteed to be unused before it goes out of scope. This is
>> what kthread_create_on_node does to handle the same race.
>> 
>> Reported-by: syzbot+10d19d528d9755d9af22@syzkaller.appspotmail.com
>> Reported-by: syzbot+70d5d5d83d03db2c813d@syzkaller.appspotmail.com
>> Reported-by: syzbot+83cb0411d0fcf0a30fc1@syzkaller.appspotmail.com
>> Reported-by: syzbot+c92c6a251d49ceceb625@syzkaller.appspotmail.com
>> Signed-off-by: Schspa Shi <schspa@gmail.com>
>> ---
>
> Please fix the commit log a bit more with the context I provided, *if*
> on the other thread the community agrees that the approach should be
> compartmentalized. I am still not sure why this would fix the
> UAF after thinking about it some more, and the issue would mean
> there likely could be a generic fix / issue to consider.
>

I think a small helper (syntactic sugar) around the completion API could
be added here for generic use.

> So for now I'd like more review of this race and the proposed fix
> as I mentioned in the follow-up thread in your v1 patch. Let's
> follow up there and see how that discussion goes.
>

Ok, let's talk about this on the v1 patch's thread.

>   Luis
  

Patch

diff --git a/kernel/umh.c b/kernel/umh.c
index 850631518665..d8350a195c7f 100644
--- a/kernel/umh.c
+++ b/kernel/umh.c
@@ -452,6 +452,10 @@  int call_usermodehelper_exec(struct subprocess_info *sub_info, int wait)
 		/* umh_complete() will see NULL and free sub_info */
 		if (xchg(&sub_info->complete, NULL))
 			goto unlock;
+		/*
+		 * umh_complete will call complete() shortly.
+		 */
+		wait_for_completion(&done);
 	}
 
 wait_done:
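
For readers following the race outside the kernel tree, here is a rough
userspace sketch of the handoff that the added wait_for_completion() closes.
It is only a model under stated assumptions, not the kernel code: the
pthread-based struct completion, the flags helper_may_run and caller_checked,
and the functions exec_killed() and demo() are all hypothetical names made up
for illustration, and C11 atomic_exchange() stands in for the kernel's xchg().
The flags exist only to force the interleaving from the commit message.

```c
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-in for the kernel's struct completion (assumption). */
struct completion {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	bool done;
};

static void init_completion(struct completion *c)
{
	pthread_mutex_init(&c->lock, NULL);
	pthread_cond_init(&c->cond, NULL);
	c->done = false;
}

static void complete(struct completion *c)
{
	pthread_mutex_lock(&c->lock);
	c->done = true;
	pthread_cond_signal(&c->cond);
	pthread_mutex_unlock(&c->lock);
}

static void wait_for_completion(struct completion *c)
{
	pthread_mutex_lock(&c->lock);
	while (!c->done)
		pthread_cond_wait(&c->cond, &c->lock);
	pthread_mutex_unlock(&c->lock);
}

/* Hypothetical model of sub_info; the two flags only force an ordering. */
struct subprocess_info {
	_Atomic(struct completion *) complete;
	atomic_bool helper_may_run;
	atomic_bool caller_checked;
};

/* T2: models umh_complete() running in the helper thread. */
static void *helper_thread(void *arg)
{
	struct subprocess_info *info = arg;
	struct completion *comp;

	while (!atomic_load(&info->helper_may_run))
		sched_yield();
	comp = atomic_exchange(&info->complete, NULL);
	if (comp) {
		/* Stretch the window between the exchange and complete(). */
		while (!atomic_load(&info->caller_checked))
			sched_yield();
		complete(comp);
	}
	return NULL;
}

/* T1: models the killed caller; returns 1 if it reclaimed the completion. */
static int exec_killed(struct subprocess_info *info, bool helper_first)
{
	struct completion done;	/* lives on this stack frame */
	bool reclaimed;

	init_completion(&done);
	atomic_store(&info->complete, &done);
	if (helper_first) {
		/* Let the helper take the pointer before we are "killed". */
		atomic_store(&info->helper_may_run, true);
		while (atomic_load(&info->complete) != NULL)
			sched_yield();
	}
	/* Pretend the killable wait was interrupted at this point. */
	reclaimed = atomic_exchange(&info->complete, NULL) != NULL;
	atomic_store(&info->caller_checked, true);
	atomic_store(&info->helper_may_run, true);
	if (reclaimed)
		return 1;	/* helper sees NULL and never touches done */
	/* The helper holds &done: wait for its complete() before returning. */
	wait_for_completion(&done);
	return 0;
}

int demo(bool helper_first)
{
	struct subprocess_info info;
	pthread_t tid;
	int ret;

	atomic_init(&info.complete, NULL);
	atomic_init(&info.helper_may_run, false);
	atomic_init(&info.caller_checked, false);
	if (pthread_create(&tid, NULL, helper_thread, &info) != 0)
		return -1;
	ret = exec_killed(&info, helper_first);
	pthread_join(tid, NULL);
	return ret;
}
```

Built with -pthread, demo(true) forces the interleaving from the commit
message (the helper takes the pointer first) and returns 0 only after the
helper has called complete(); demo(false) returns 1 because the caller
reclaims the pointer and the helper never touches the stack object. Removing
the final wait_for_completion() from exec_killed() would let the helper call
complete() on a dead stack frame, which is the reported bug.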