Message ID | 20221115140233.21981-1-schspa@gmail.com |
---|---|
State | New |
Headers |
Return-Path: <linux-kernel-owner@vger.kernel.org> Delivered-To: ouuuleilei@gmail.com Received: by 2002:a5d:6687:0:0:0:0:0 with SMTP id l7csp2747953wru; Tue, 15 Nov 2022 06:06:42 -0800 (PST) X-Google-Smtp-Source: AA0mqf5CSLMEt8Zx6ob2qHbHKHQjw5jfsEhQ0nMAoxMUPclCuuE54mhntab8L/YOTMPIa+wRH0H3 X-Received: by 2002:a63:b5b:0:b0:461:a261:eb50 with SMTP id a27-20020a630b5b000000b00461a261eb50mr16135222pgl.311.1668521196832; Tue, 15 Nov 2022 06:06:36 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1668521196; cv=none; d=google.com; s=arc-20160816; b=YGjzD8UJtGY3i+lwUAySlgl5gQeDGhuuoAvMTteNMJ6fy5YdoNxXS3piZ4E4g8Bcpf YKl8H8p0jfaWo9rMpihBCugrAQ/MJp5YZD1fvklvTE9YrClMFxbk0VphaxFAharavrRH xS5vRzE0p+4wy1Ahlm5E/nRtjWdRkETciyyUd279dAwp7ffIkLQFlmoOxoR8oXQB3vwE rQDRbe9hPjG+FBQzO+waq6HBFusUsjRKm1JCcXgzUIPoLVS8W1hOFhAOM+HXcoQ40YRQ s3qb90wX7k3MrBpXOG1NHlMiFelOOrCQfOrthDof7Exuh5r26DovBVqiIL4nApgJwxuq FQNg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :message-id:date:subject:cc:to:from:dkim-signature; bh=jCMTMQ3bV9ouYbsVyfPhqKIkECYbZdKTE0krZImaYNQ=; b=E0Za7XcGHERCKFB79O1bRDqtjxYu56sSdQvIYH/6OyuqOPjkdh0/z839cZW5963tEo /rzPE0Iu44BY3arGoYkBTJzH9PEjn26TDEvHin6zEhlix8pr6DEjZHLTE5wrsNLhFKSO czCzTio+oS8rwyysk5MqRSTINPOjKOtjyiromfcAtn2KHPr+0xdbWIinO01n29bJrHeG x7iy+TDrmEDPr0r57n2QTXraGn3Xy2eV90vzapPCUWa7xkfTRvs58GTkFRROE5Uh1nPM hteQF422KepLjZLfkCUsWQ5I66jZsXbpajmScnLLgMYVLK8ddAcd94Q6jb45G1REg3tj o59w== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@gmail.com header.s=20210112 header.b=RIUc50f4; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id 7-20020a630907000000b0043ca0a1f7dasi11681135pgj.674.2022.11.15.06.06.07; Tue, 15 Nov 2022 06:06:36 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@gmail.com header.s=20210112 header.b=RIUc50f4; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S238043AbiKOOCs (ORCPT <rfc822;lkml4gm@gmail.com> + 99 others); Tue, 15 Nov 2022 09:02:48 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:59704 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231211AbiKOOCr (ORCPT <rfc822;linux-kernel@vger.kernel.org>); Tue, 15 Nov 2022 09:02:47 -0500 Received: from mail-io1-xd2b.google.com (mail-io1-xd2b.google.com [IPv6:2607:f8b0:4864:20::d2b]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 67AE62790B for <linux-kernel@vger.kernel.org>; Tue, 15 Nov 2022 06:02:46 -0800 (PST) Received: by mail-io1-xd2b.google.com with SMTP id r81so10710051iod.2 for <linux-kernel@vger.kernel.org>; Tue, 15 Nov 2022 06:02:46 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=content-transfer-encoding:mime-version:message-id:date:subject:cc :to:from:from:to:cc:subject:date:message-id:reply-to; bh=jCMTMQ3bV9ouYbsVyfPhqKIkECYbZdKTE0krZImaYNQ=; b=RIUc50f4lWMEMIerZkZiP2SVyqrwDdWHyiMACL97pRFmxXOtmxb/yTHIr/kLfTqJ8G sVLx34JgrjsaLjGHkf43Ao5XqqtnW74dam++S0DJ3yhiHGLRlO2reCgXXpaIDfO1JSHb LHOWrHc/ESMgKjdqAqWwEdcHnEU7i6rAijFa/gZ16d/44h/dXzeBVtIF8XHmgTwXR9Ha MnuvbBzAXlHwbhtZoJxkUEu/SqeglkIjFmaQnbTSzXFmt5dK0JhJvsAkwA4l5cXjo41d ANslkgJayLwsFgFsDxcN92L5HwmMti5LjBP9/j5xxa800sNPJth7BRwZq50VxUAQg8HK Idhw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=content-transfer-encoding:mime-version:message-id:date:subject:cc :to:from:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=jCMTMQ3bV9ouYbsVyfPhqKIkECYbZdKTE0krZImaYNQ=; b=Y6rv1m9Qm4UcyMjzmoaw7EnxyQgAjyg9hpYbcUPKzFC5CrH7wPg0z6WUTcQiSaax8t xAe/K0A3z2Lq3Jkxd4Tut0JXuMIVLWcY2HpCi16kuKIOvYSUf3upOVJK9RwQZ57uXPcW UeH2Rf4wfn4H9HqycXvFOvPGAeSi0eQAXICyVHiJPrh6o/3tLkTEif+etv+8RIK/lw36 416WVV8kd5wHvY47sl1wm0nWhRMnQq52uaT+d4L2SUOiCRNedVxBzT5L1D5tSYyp/lp4 O6JskhZUhaGFmRSvZ6nxozE5gN8+U9BE6H8dhy9bsshXskyiAm31a9ucHBCHXQ6OOrNV Qsyg== X-Gm-Message-State: ANoB5pk3nA+7ZaUW0E1NUP7zkoqj3yT3/cDCjpbE9dv10gcgZVinAwgg TXOnqMUlQEmF8MB8ltpCw+CEX9X86GNXog== X-Received: by 2002:a5d:80d4:0:b0:6bc:ebd:4df9 with SMTP id h20-20020a5d80d4000000b006bc0ebd4df9mr7212232ior.84.1668520965406; Tue, 15 Nov 2022 06:02:45 -0800 (PST) Received: from MBP.lan (ec2-18-117-95-84.us-east-2.compute.amazonaws.com. [18.117.95.84]) by smtp.gmail.com with ESMTPSA id y16-20020a920910000000b002e85e8b8d1dsm5095192ilg.5.2022.11.15.06.02.41 (version=TLS1_2 cipher=ECDHE-ECDSA-AES128-GCM-SHA256 bits=128/128); Tue, 15 Nov 2022 06:02:44 -0800 (PST) From: Schspa Shi <schspa@gmail.com> To: mcgrof@kernel.org Cc: linux-kernel@vger.kernel.org, Schspa Shi <schspa@gmail.com>, syzbot+10d19d528d9755d9af22@syzkaller.appspotmail.com, syzbot+70d5d5d83d03db2c813d@syzkaller.appspotmail.com, syzbot+83cb0411d0fcf0a30fc1@syzkaller.appspotmail.com Subject: [PATCH] umh: fix UAF when the process is being killed Date: Tue, 15 Nov 2022 22:02:33 +0800 Message-Id: <20221115140233.21981-1-schspa@gmail.com> X-Mailer: git-send-email 2.37.3 MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Spam-Status: No, score=-2.1 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,FREEMAIL_FROM, RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: <linux-kernel.vger.kernel.org> X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1749571282373561181?= X-GMAIL-MSGID: =?utf-8?q?1749571282373561181?= |
Series |
umh: fix UAF when the process is being killed
|
|
Commit Message
Schspa Shi
Nov. 15, 2022, 2:02 p.m. UTC
When the process is killed, wait_for_completion_state will return with
-ERESTARTSYS, and the completion variable in the stack will be freed.
If the user-mode thread is complete at the same time, there will be a UAF.
Please refer to the following scenarios.
T1 T2
------------------------------------------------------------------
call_usermodehelper_exec
call_usermodehelper_exec_async
<< do something >>
umh_complete(sub_info);
comp = xchg(&sub_info->complete, NULL);
/* we got the completion */
<< context switch >>
<< Being killed >>
retval = wait_for_completion_state(sub_info->complete, state);
if (!retval)
goto wait_done;
if (wait & UMH_KILLABLE) {
/* umh_complete() will see NULL and free sub_info */
if (xchg(&sub_info->complete, NULL))
goto unlock;
<< we can't got the completion >>
}
....
unlock:
helper_unlock();
return retval;
}
/**
* the completion variable in stack is end of life cycle.
* and maybe freed due to process is recycled.
*/
--------UAF here----------
if (comp)
complete(comp);
To fix it, we can put the completion variable in the subprocess_info
variable.
Reported-by: syzbot+10d19d528d9755d9af22@syzkaller.appspotmail.com
Reported-by: syzbot+70d5d5d83d03db2c813d@syzkaller.appspotmail.com
Reported-by: syzbot+83cb0411d0fcf0a30fc1@syzkaller.appspotmail.com
Signed-off-by: Schspa Shi <schspa@gmail.com>
---
include/linux/umh.h | 1 +
kernel/umh.c | 6 +++---
2 files changed, 4 insertions(+), 3 deletions(-)
Comments
Schspa Shi <schspa@gmail.com> writes: > When the process is killed, wait_for_completion_state will return with > -ERESTARTSYS, and the completion variable in the stack will be freed. > If the user-mode thread is complete at the same time, there will be a UAF. > > Please refer to the following scenarios. > T1 T2 > ------------------------------------------------------------------ > call_usermodehelper_exec > call_usermodehelper_exec_async > << do something >> > umh_complete(sub_info); > comp = xchg(&sub_info->complete, NULL); > /* we got the completion */ > << context switch >> > > << Being killed >> > retval = wait_for_completion_state(sub_info->complete, state); > if (!retval) > goto wait_done; > > if (wait & UMH_KILLABLE) { > /* umh_complete() will see NULL and free sub_info */ > if (xchg(&sub_info->complete, NULL)) > goto unlock; > << we can't got the completion >> > } > .... > unlock: > helper_unlock(); > return retval; > } > > /** > * the completion variable in stack is end of life cycle. > * and maybe freed due to process is recycled. > */ > --------UAF here---------- > if (comp) > complete(comp); > > To fix it, we can put the completion variable in the subprocess_info > variable. > > Reported-by: syzbot+10d19d528d9755d9af22@syzkaller.appspotmail.com > Reported-by: syzbot+70d5d5d83d03db2c813d@syzkaller.appspotmail.com > Reported-by: syzbot+83cb0411d0fcf0a30fc1@syzkaller.appspotmail.com > > Signed-off-by: Schspa Shi <schspa@gmail.com> > --- > include/linux/umh.h | 1 + > kernel/umh.c | 6 +++--- > 2 files changed, 4 insertions(+), 3 deletions(-) > > diff --git a/include/linux/umh.h b/include/linux/umh.h > index 5d1f6129b847..801f7efbc825 100644 > --- a/include/linux/umh.h > +++ b/include/linux/umh.h > @@ -20,6 +20,7 @@ struct file; > struct subprocess_info { > struct work_struct work; > struct completion *complete; > + struct completion done; > const char *path; > char **argv; > char **envp; > diff --git a/kernel/umh.c b/kernel/umh.c > index 850631518665..3ed39956c777 100644 > --- a/kernel/umh.c > +++ b/kernel/umh.c > @@ -380,6 +380,7 @@ struct subprocess_info *call_usermodehelper_setup(const char *path, char **argv, > sub_info->cleanup = cleanup; > sub_info->init = init; > sub_info->data = data; > + init_completion(&sub_info->done); > out: > return sub_info; > } > @@ -405,7 +406,6 @@ EXPORT_SYMBOL(call_usermodehelper_setup); > int call_usermodehelper_exec(struct subprocess_info *sub_info, int wait) > { > unsigned int state = TASK_UNINTERRUPTIBLE; > - DECLARE_COMPLETION_ONSTACK(done); > int retval = 0; > > if (!sub_info->path) { > @@ -431,7 +431,7 @@ int call_usermodehelper_exec(struct subprocess_info *sub_info, int wait) > * This makes it possible to use umh_complete to free > * the data structure in case of UMH_NO_WAIT. > */ > - sub_info->complete = (wait == UMH_NO_WAIT) ? NULL : &done; > + sub_info->complete = (wait == UMH_NO_WAIT) ? NULL : &sub_info->done; > sub_info->wait = wait; > > queue_work(system_unbound_wq, &sub_info->work); > @@ -444,7 +444,7 @@ int call_usermodehelper_exec(struct subprocess_info *sub_info, int wait) > if (wait & UMH_FREEZABLE) > state |= TASK_FREEZABLE; > > - retval = wait_for_completion_state(&done, state); > + retval = wait_for_completion_state(sub_info->complete, state); > if (!retval) > goto wait_done; Hi Luis Chamberlain: Could you help to review this patch? I'm not sure why we define the amount of completion here on the stack. But this UAF can be fixed by moving the completion variable to the heap.
On Mon, Dec 05, 2022 at 07:38:21PM +0800, Schspa Shi wrote: > > Schspa Shi <schspa@gmail.com> writes: > > > When the process is killed, wait_for_completion_state will return with > > -ERESTARTSYS, and the completion variable in the stack will be freed. There is no free'ing here, it's just not availabel anymore, which is different. > > If the user-mode thread is complete at the same time, there will be a UAF. > > > > Please refer to the following scenarios. > > T1 T2 > > ------------------------------------------------------------------ > > call_usermodehelper_exec > > call_usermodehelper_exec_async > > << do something >> > > umh_complete(sub_info); > > comp = xchg(&sub_info->complete, NULL); > > /* we got the completion */ > > << context switch >> > > > > << Being killed >> > > retval = wait_for_completion_state(sub_info->complete, state); > > if (!retval) > > goto wait_done; > > > > if (wait & UMH_KILLABLE) { > > /* umh_complete() will see NULL and free sub_info */ > > if (xchg(&sub_info->complete, NULL)) > > goto unlock; > > << we can't got the completion >> I'm sorry I don't understand what you tried to say here, we can't got? > > } > > .... > > unlock: > > helper_unlock(); > > return retval; > > } > > > > /** > > * the completion variable in stack is end of life cycle. > > * and maybe freed due to process is recycled. > > */ > > --------UAF here---------- > > if (comp) > > complete(comp); > > > > To fix it, we can put the completion variable in the subprocess_info > > variable. > > > > Reported-by: syzbot+10d19d528d9755d9af22@syzkaller.appspotmail.com > > Reported-by: syzbot+70d5d5d83d03db2c813d@syzkaller.appspotmail.com > > Reported-by: syzbot+83cb0411d0fcf0a30fc1@syzkaller.appspotmail.com > > > > Signed-off-by: Schspa Shi <schspa@gmail.com> > > --- > > include/linux/umh.h | 1 + > > kernel/umh.c | 6 +++--- > > 2 files changed, 4 insertions(+), 3 deletions(-) > > > > diff --git a/include/linux/umh.h b/include/linux/umh.h > > index 5d1f6129b847..801f7efbc825 100644 > > --- a/include/linux/umh.h > > +++ b/include/linux/umh.h > > @@ -20,6 +20,7 @@ struct file; > > struct subprocess_info { > > struct work_struct work; > > struct completion *complete; > > + struct completion done; > > const char *path; > > char **argv; > > char **envp; > > diff --git a/kernel/umh.c b/kernel/umh.c > > index 850631518665..3ed39956c777 100644 > > --- a/kernel/umh.c > > +++ b/kernel/umh.c > > @@ -380,6 +380,7 @@ struct subprocess_info *call_usermodehelper_setup(const char *path, char **argv, > > sub_info->cleanup = cleanup; > > sub_info->init = init; > > sub_info->data = data; > > + init_completion(&sub_info->done); > > out: > > return sub_info; > > } > > @@ -405,7 +406,6 @@ EXPORT_SYMBOL(call_usermodehelper_setup); > > int call_usermodehelper_exec(struct subprocess_info *sub_info, int wait) > > { > > unsigned int state = TASK_UNINTERRUPTIBLE; > > - DECLARE_COMPLETION_ONSTACK(done); > > int retval = 0; > > > > if (!sub_info->path) { > > @@ -431,7 +431,7 @@ int call_usermodehelper_exec(struct subprocess_info *sub_info, int wait) > > * This makes it possible to use umh_complete to free > > * the data structure in case of UMH_NO_WAIT. > > */ > > - sub_info->complete = (wait == UMH_NO_WAIT) ? NULL : &done; > > + sub_info->complete = (wait == UMH_NO_WAIT) ? NULL : &sub_info->done; > > sub_info->wait = wait; > > > > queue_work(system_unbound_wq, &sub_info->work); > > @@ -444,7 +444,7 @@ int call_usermodehelper_exec(struct subprocess_info *sub_info, int wait) > > if (wait & UMH_FREEZABLE) > > state |= TASK_FREEZABLE; > > > > - retval = wait_for_completion_state(&done, state); > > + retval = wait_for_completion_state(sub_info->complete, state); > > if (!retval) > > goto wait_done; > > Hi Luis Chamberlain: > > Could you help to review this patch? I'm not sure why we define the > amount of completion here on the stack. But this UAF can be fixed by > moving the completion variable to the heap. It would seem to me that if this is an issue other areas would have similar races as well, so I was hard pressed about the approach / fix. Wouldn't something like this be a bit more explicit about ensuring we don't let the work item race beyond? diff --git a/kernel/umh.c b/kernel/umh.c index 850631518665..55fc698115a7 100644 --- a/kernel/umh.c +++ b/kernel/umh.c @@ -447,6 +447,8 @@ int call_usermodehelper_exec(struct subprocess_info *sub_info, int wait) retval = wait_for_completion_state(&done, state); if (!retval) goto wait_done; + else if (retval == -ERESTARTSYS) + cancel_work_sync(&sub_info->work); if (wait & UMH_KILLABLE) { /* umh_complete() will see NULL and free sub_info */
Luis Chamberlain <mcgrof@kernel.org> writes: > On Mon, Dec 05, 2022 at 07:38:21PM +0800, Schspa Shi wrote: >> >> Schspa Shi <schspa@gmail.com> writes: >> >> > When the process is killed, wait_for_completion_state will return with >> > -ERESTARTSYS, and the completion variable in the stack will be freed. > > There is no free'ing here, it's just not availabel anymore, which is > different. > No, the whole thread stack will be freed when the process died. There will be some cases where it will be released. It will be more accurate to modify it to be unavailable. >> > If the user-mode thread is complete at the same time, there will be a UAF. >> > >> > Please refer to the following scenarios. >> > T1 T2 >> > ------------------------------------------------------------------ >> > call_usermodehelper_exec >> > call_usermodehelper_exec_async >> > << do something >> >> > umh_complete(sub_info); >> > comp = xchg(&sub_info->complete, NULL); >> > /* we got the completion */ >> > << context switch >> The sub_info->complete will be set to NULL. >> > >> > << Being killed >> >> > retval = wait_for_completion_state(sub_info->complete, state); >> > if (!retval) >> > goto wait_done; >> > >> > if (wait & UMH_KILLABLE) { >> > /* umh_complete() will see NULL and free sub_info */ >> > if (xchg(&sub_info->complete, NULL)) >> > goto unlock; >> > << we can't got the completion >> > > I'm sorry I don't understand what you tried to say here, we can't got? > In this scenario, the sub_info->complete will be NULL, at the call_usermodehelper_exec_async, and we will go to the unlock branch here. >> > } >> > .... >> > unlock: >> > helper_unlock(); >> > return retval; >> > } >> > >> > /** >> > * the completion variable in stack is end of life cycle. >> > * and maybe freed due to process is recycled. >> > */ >> > --------UAF here---------- >> > if (comp) >> > complete(comp); >> > >> > To fix it, we can put the completion variable in the subprocess_info >> > variable. >> > >> > Reported-by: syzbot+10d19d528d9755d9af22@syzkaller.appspotmail.com >> > Reported-by: syzbot+70d5d5d83d03db2c813d@syzkaller.appspotmail.com >> > Reported-by: syzbot+83cb0411d0fcf0a30fc1@syzkaller.appspotmail.com >> > >> > Signed-off-by: Schspa Shi <schspa@gmail.com> >> > --- >> > include/linux/umh.h | 1 + >> > kernel/umh.c | 6 +++--- >> > 2 files changed, 4 insertions(+), 3 deletions(-) >> > >> > diff --git a/include/linux/umh.h b/include/linux/umh.h >> > index 5d1f6129b847..801f7efbc825 100644 >> > --- a/include/linux/umh.h >> > +++ b/include/linux/umh.h >> > @@ -20,6 +20,7 @@ struct file; >> > struct subprocess_info { >> > struct work_struct work; >> > struct completion *complete; >> > + struct completion done; >> > const char *path; >> > char **argv; >> > char **envp; >> > diff --git a/kernel/umh.c b/kernel/umh.c >> > index 850631518665..3ed39956c777 100644 >> > --- a/kernel/umh.c >> > +++ b/kernel/umh.c >> > @@ -380,6 +380,7 @@ struct subprocess_info *call_usermodehelper_setup(const char *path, char **argv, >> > sub_info->cleanup = cleanup; >> > sub_info->init = init; >> > sub_info->data = data; >> > + init_completion(&sub_info->done); >> > out: >> > return sub_info; >> > } >> > @@ -405,7 +406,6 @@ EXPORT_SYMBOL(call_usermodehelper_setup); >> > int call_usermodehelper_exec(struct subprocess_info *sub_info, int wait) >> > { >> > unsigned int state = TASK_UNINTERRUPTIBLE; >> > - DECLARE_COMPLETION_ONSTACK(done); >> > int retval = 0; >> > >> > if (!sub_info->path) { >> > @@ -431,7 +431,7 @@ int call_usermodehelper_exec(struct subprocess_info *sub_info, int wait) >> > * This makes it possible to use umh_complete to free >> > * the data structure in case of UMH_NO_WAIT. >> > */ >> > - sub_info->complete = (wait == UMH_NO_WAIT) ? NULL : &done; >> > + sub_info->complete = (wait == UMH_NO_WAIT) ? NULL : &sub_info->done; >> > sub_info->wait = wait; >> > >> > queue_work(system_unbound_wq, &sub_info->work); >> > @@ -444,7 +444,7 @@ int call_usermodehelper_exec(struct subprocess_info *sub_info, int wait) >> > if (wait & UMH_FREEZABLE) >> > state |= TASK_FREEZABLE; >> > >> > - retval = wait_for_completion_state(&done, state); >> > + retval = wait_for_completion_state(sub_info->complete, state); >> > if (!retval) >> > goto wait_done; >> >> Hi Luis Chamberlain: >> >> Could you help to review this patch? I'm not sure why we define the >> amount of completion here on the stack. But this UAF can be fixed by >> moving the completion variable to the heap. > > It would seem to me that if this is an issue other areas would have > similar races as well, so I was hard pressed about the approach / fix. > I think other modules will have similar bugs, but this is a limitation on the use of the DECLARE_COMPLETION_ONSTACK macro, and it has been specifically stated in the completion's documentation. There is the description from completion's documentation: Note that when using completion objects as local variables you must be acutely aware of the short life time of the function stack: the function must not return to a calling context until all activities (such as waiting threads) have ceased and the completion object is completely unused. > Wouldn't something like this be a bit more explicit about ensuring > we don't let the work item race beyond? > > diff --git a/kernel/umh.c b/kernel/umh.c > index 850631518665..55fc698115a7 100644 > --- a/kernel/umh.c > +++ b/kernel/umh.c > @@ -447,6 +447,8 @@ int call_usermodehelper_exec(struct subprocess_info *sub_info, int wait) > retval = wait_for_completion_state(&done, state); > if (!retval) > goto wait_done; > + else if (retval == -ERESTARTSYS) > + cancel_work_sync(&sub_info->work); > I think this modification will make UMH_KILLABLE useless, we have to wait for this task to complete, even if it is killed. > if (wait & UMH_KILLABLE) { > /* umh_complete() will see NULL and free sub_info */
Schspa Shi <schspa@gmail.com> writes: > Luis Chamberlain <mcgrof@kernel.org> writes: > >> On Mon, Dec 05, 2022 at 07:38:21PM +0800, Schspa Shi wrote: >>> >>> Schspa Shi <schspa@gmail.com> writes: >>> >>> > When the process is killed, wait_for_completion_state will return with >>> > -ERESTARTSYS, and the completion variable in the stack will be freed. >> >> There is no free'ing here, it's just not availabel anymore, which is >> different. >> > > No, the whole thread stack will be freed when the process died. There > will be some cases where it will be released. It will be more accurate > to modify it to be unavailable. > >>> > If the user-mode thread is complete at the same time, there will be a UAF. >>> > >>> > Please refer to the following scenarios. >>> > T1 T2 >>> > ------------------------------------------------------------------ >>> > call_usermodehelper_exec >>> > call_usermodehelper_exec_async >>> > << do something >> >>> > umh_complete(sub_info); >>> > comp = xchg(&sub_info->complete, NULL); >>> > /* we got the completion */ >>> > << context switch >> > > The sub_info->complete will be set to NULL. > >>> > >>> > << Being killed >> >>> > retval = wait_for_completion_state(sub_info->complete, state); >>> > if (!retval) >>> > goto wait_done; >>> > >>> > if (wait & UMH_KILLABLE) { >>> > /* umh_complete() will see NULL and free sub_info */ >>> > if (xchg(&sub_info->complete, NULL)) >>> > goto unlock; >>> > << we can't got the completion >> >> >> I'm sorry I don't understand what you tried to say here, we can't got? >> > > In this scenario, the sub_info->complete will be NULL, at the > call_usermodehelper_exec_async, and we will go to the unlock branch here. > >>> > } >>> > .... >>> > unlock: >>> > helper_unlock(); >>> > return retval; >>> > } >>> > >>> > /** >>> > * the completion variable in stack is end of life cycle. >>> > * and maybe freed due to process is recycled. >>> > */ >>> > --------UAF here---------- >>> > if (comp) >>> > complete(comp); >>> > >>> > To fix it, we can put the completion variable in the subprocess_info >>> > variable. >>> > >>> > Reported-by: syzbot+10d19d528d9755d9af22@syzkaller.appspotmail.com >>> > Reported-by: syzbot+70d5d5d83d03db2c813d@syzkaller.appspotmail.com >>> > Reported-by: syzbot+83cb0411d0fcf0a30fc1@syzkaller.appspotmail.com >>> > >>> > Signed-off-by: Schspa Shi <schspa@gmail.com> >>> > --- >>> > include/linux/umh.h | 1 + >>> > kernel/umh.c | 6 +++--- >>> > 2 files changed, 4 insertions(+), 3 deletions(-) >>> > >>> > diff --git a/include/linux/umh.h b/include/linux/umh.h >>> > index 5d1f6129b847..801f7efbc825 100644 >>> > --- a/include/linux/umh.h >>> > +++ b/include/linux/umh.h >>> > @@ -20,6 +20,7 @@ struct file; >>> > struct subprocess_info { >>> > struct work_struct work; >>> > struct completion *complete; >>> > + struct completion done; >>> > const char *path; >>> > char **argv; >>> > char **envp; >>> > diff --git a/kernel/umh.c b/kernel/umh.c >>> > index 850631518665..3ed39956c777 100644 >>> > --- a/kernel/umh.c >>> > +++ b/kernel/umh.c >>> > @@ -380,6 +380,7 @@ struct subprocess_info *call_usermodehelper_setup(const char *path, char **argv, >>> > sub_info->cleanup = cleanup; >>> > sub_info->init = init; >>> > sub_info->data = data; >>> > + init_completion(&sub_info->done); >>> > out: >>> > return sub_info; >>> > } >>> > @@ -405,7 +406,6 @@ EXPORT_SYMBOL(call_usermodehelper_setup); >>> > int call_usermodehelper_exec(struct subprocess_info *sub_info, int wait) >>> > { >>> > unsigned int state = TASK_UNINTERRUPTIBLE; >>> > - DECLARE_COMPLETION_ONSTACK(done); >>> > int retval = 0; >>> > >>> > if (!sub_info->path) { >>> > @@ -431,7 +431,7 @@ int call_usermodehelper_exec(struct subprocess_info *sub_info, int wait) >>> > * This makes it possible to use umh_complete to free >>> > * the data structure in case of UMH_NO_WAIT. >>> > */ >>> > - sub_info->complete = (wait == UMH_NO_WAIT) ? NULL : &done; >>> > + sub_info->complete = (wait == UMH_NO_WAIT) ? NULL : &sub_info->done; >>> > sub_info->wait = wait; >>> > >>> > queue_work(system_unbound_wq, &sub_info->work); >>> > @@ -444,7 +444,7 @@ int call_usermodehelper_exec(struct subprocess_info *sub_info, int wait) >>> > if (wait & UMH_FREEZABLE) >>> > state |= TASK_FREEZABLE; >>> > >>> > - retval = wait_for_completion_state(&done, state); >>> > + retval = wait_for_completion_state(sub_info->complete, state); >>> > if (!retval) >>> > goto wait_done; >>> >>> Hi Luis Chamberlain: >>> >>> Could you help to review this patch? I'm not sure why we define the >>> amount of completion here on the stack. But this UAF can be fixed by >>> moving the completion variable to the heap. >> >> It would seem to me that if this is an issue other areas would have >> similar races as well, so I was hard pressed about the approach / fix. >> > > I think other modules will have similar bugs, but this is a limitation > on the use of the DECLARE_COMPLETION_ONSTACK macro, and it has been > specifically stated in the completion's documentation. > > There is the description from completion's documentation: > > Note that when using completion objects as local variables you must be > acutely aware of the short life time of the function stack: the function > must not return to a calling context until all activities (such as waiting > threads) have ceased and the completion object is completely unused. > >> Wouldn't something like this be a bit more explicit about ensuring >> we don't let the work item race beyond? >> >> diff --git a/kernel/umh.c b/kernel/umh.c >> index 850631518665..55fc698115a7 100644 >> --- a/kernel/umh.c >> +++ b/kernel/umh.c >> @@ -447,6 +447,8 @@ int call_usermodehelper_exec(struct subprocess_info *sub_info, int wait) >> retval = wait_for_completion_state(&done, state); >> if (!retval) >> goto wait_done; >> + else if (retval == -ERESTARTSYS) >> + cancel_work_sync(&sub_info->work); >> > > I think this modification will make UMH_KILLABLE useless, we have to > wait for this task to complete, even if it is killed. > >> if (wait & UMH_KILLABLE) { >> /* umh_complete() will see NULL and free sub_info */ Hi Luis Chamberlain: I checked the code from __kthread_create_on_node, and we can fix this with the following change too. I'd like to upload a V2 patch with the new solution if you prefer the following way. diff --git a/kernel/umh.c b/kernel/umh.c index 850631518665..8023f11fcfc0 100644 --- a/kernel/umh.c +++ b/kernel/umh.c @@ -452,6 +452,11 @@ int call_usermodehelper_exec(struct subprocess_info *sub_info, int wait) /* umh_complete() will see NULL and free sub_info */ if (xchg(&sub_info->complete, NULL)) goto unlock; + /* + * kthreadd (or new kernel thread) will call complete() + * shortly. + */ + wait_for_completion(&done); } wait_done:
On Mon, Dec 12, 2022 at 09:38:31PM +0800, Schspa Shi wrote: > I'd like to upload a V2 patch with the new solution if you prefer the > following way. > > diff --git a/kernel/umh.c b/kernel/umh.c > index 850631518665..8023f11fcfc0 100644 > --- a/kernel/umh.c > +++ b/kernel/umh.c > @@ -452,6 +452,11 @@ int call_usermodehelper_exec(struct subprocess_info *sub_info, int wait) > /* umh_complete() will see NULL and free sub_info */ > if (xchg(&sub_info->complete, NULL)) > goto unlock; > + /* > + * kthreadd (or new kernel thread) will call complete() > + * shortly. > + */ > + wait_for_completion(&done); > } Yes much better. Did you verify it fixes the splat found by the bots? Luis
Luis Chamberlain <mcgrof@kernel.org> writes: > On Mon, Dec 12, 2022 at 09:38:31PM +0800, Schspa Shi wrote: >> I'd like to upload a V2 patch with the new solution if you prefer the >> following way. >> >> diff --git a/kernel/umh.c b/kernel/umh.c >> index 850631518665..8023f11fcfc0 100644 >> --- a/kernel/umh.c >> +++ b/kernel/umh.c >> @@ -452,6 +452,11 @@ int call_usermodehelper_exec(struct subprocess_info *sub_info, int wait) >> /* umh_complete() will see NULL and free sub_info */ >> if (xchg(&sub_info->complete, NULL)) >> goto unlock; >> + /* >> + * kthreadd (or new kernel thread) will call complete() >> + * shortly. >> + */ >> + wait_for_completion(&done); >> } > > Yes much better. Did you verify it fixes the splat found by the bots? > Yes, it will fix it. > Luis
Peter, Ingo, Steven would like you're review. On Tue, Dec 13, 2022 at 03:03:53PM -0800, Luis Chamberlain wrote: > On Mon, Dec 12, 2022 at 09:38:31PM +0800, Schspa Shi wrote: > > I'd like to upload a V2 patch with the new solution if you prefer the > > following way. > > > > diff --git a/kernel/umh.c b/kernel/umh.c > > index 850631518665..8023f11fcfc0 100644 > > --- a/kernel/umh.c > > +++ b/kernel/umh.c > > @@ -452,6 +452,11 @@ int call_usermodehelper_exec(struct subprocess_info *sub_info, int wait) > > /* umh_complete() will see NULL and free sub_info */ > > if (xchg(&sub_info->complete, NULL)) > > goto unlock; > > + /* > > + * kthreadd (or new kernel thread) will call complete() > > + * shortly. > > + */ > > + wait_for_completion(&done); > > } > > Yes much better. Did you verify it fixes the splat found by the bots? Wait, I'm not sure yet why this would fix it... I first started thinking that this may be a good example of a Coccinelle SmPL rule, something like: DECLARE_COMPLETION_ONSTACK(done); foo *foo; ... foo->completion = &done; ... queue_work(system_unbound_wq, &foo->work); .... ret = wait_for_completion_state(&done, state); ... if (!ret) S ... +wait_for_completion(&done); But that is pretty complex, and while it may be useful to know how many patterns we have like this, it begs the question if generalizing this inside the callers is best for -ERESTARTSYS condition is best. What do folks think? The rationale here is that if you queue stuff and give access to the completion variable but its on-stack obviously you can end up with the queued stuff complete() on a on-stack variable. The issue seems to be that wait_for_completion_state() for -ERESTARTSYS still means that the already scheduled queue'd work is *about* to run and the process with the completion on-stack completed. So we race with the end of the routine and the completion on-stack. It makes me wonder if wait_for_completion() above really is doing something more, if it is just helping with timing and is still error prone. The queued work will try the the completion as follows: static void umh_complete(struct subprocess_info *sub_info) { struct completion *comp = xchg(&sub_info->complete, NULL); /* * See call_usermodehelper_exec(). If xchg() returns NULL * we own sub_info, the UMH_KILLABLE caller has gone away * or the caller used UMH_NO_WAIT. */ if (comp) complete(comp); else call_usermodehelper_freeinfo(sub_info); } So the race is getting -ERESTARTSYS on the process with completion on-stack and the above running complete(comp). Why would sprinkling wait_for_completion(&done) *after* wait_for_completion_state(&done, state) fix this UAF? } diff --git a/kernel/sched/completion.c b/kernel/sched/completion.c index d57a5c1c1cd9..aa7031faca04 100644 --- a/kernel/sched/completion.c +++ b/kernel/sched/completion.c @@ -205,8 +205,10 @@ int __sched wait_for_completion_interruptible(struct completion *x) { long t = wait_for_common(x, MAX_SCHEDULE_TIMEOUT, TASK_INTERRUPTIBLE); - if (t == -ERESTARTSYS) + if (t == -ERESTARTSYS) { + wait_for_completion(x); return t; + } return 0; } EXPORT_SYMBOL(wait_for_completion_interruptible); @@ -243,8 +245,10 @@ int __sched wait_for_completion_killable(struct completion *x) { long t = wait_for_common(x, MAX_SCHEDULE_TIMEOUT, TASK_KILLABLE); - if (t == -ERESTARTSYS) + if (t == -ERESTARTSYS) { + wait_for_completion(x); return t; + } return 0; } EXPORT_SYMBOL(wait_for_completion_killable); @@ -253,8 +257,10 @@ int __sched wait_for_completion_state(struct completion *x, unsigned int state) { long t = wait_for_common(x, MAX_SCHEDULE_TIMEOUT, state); - if (t == -ERESTARTSYS) + if (t == -ERESTARTSYS) { + wait_for_completion(x); return t; + } return 0; } EXPORT_SYMBOL(wait_for_completion_state);
Luis Chamberlain <mcgrof@kernel.org> writes: > Peter, Ingo, Steven would like you're review. > > On Tue, Dec 13, 2022 at 03:03:53PM -0800, Luis Chamberlain wrote: >> On Mon, Dec 12, 2022 at 09:38:31PM +0800, Schspa Shi wrote: >> > I'd like to upload a V2 patch with the new solution if you prefer the >> > following way. >> > >> > diff --git a/kernel/umh.c b/kernel/umh.c >> > index 850631518665..8023f11fcfc0 100644 >> > --- a/kernel/umh.c >> > +++ b/kernel/umh.c >> > @@ -452,6 +452,11 @@ int call_usermodehelper_exec(struct subprocess_info *sub_info, int wait) >> > /* umh_complete() will see NULL and free sub_info */ >> > if (xchg(&sub_info->complete, NULL)) >> > goto unlock; >> > + /* >> > + * kthreadd (or new kernel thread) will call complete() >> > + * shortly. >> > + */ >> > + wait_for_completion(&done); >> > } >> >> Yes much better. Did you verify it fixes the splat found by the bots? > > Wait, I'm not sure yet why this would fix it... I first started thinking > that this may be a good example of a Coccinelle SmPL rule, something like: > > DECLARE_COMPLETION_ONSTACK(done); > foo *foo; > ... > foo->completion = &done; > ... > queue_work(system_unbound_wq, &foo->work); > .... > ret = wait_for_completion_state(&done, state); > ... > if (!ret) > S > ... > +wait_for_completion(&done); > > But that is pretty complex, and while it may be useful to know how many > patterns we have like this, it begs the question if generalizing this > inside the callers is best for -ERESTARTSYS condition is best. What > do folks think? > > The rationale here is that if you queue stuff and give access to the > completion variable but its on-stack obviously you can end up with the > queued stuff complete() on a on-stack variable. The issue seems to > be that wait_for_completion_state() for -ERESTARTSYS still means > that the already scheduled queue'd work is *about* to run and > the process with the completion on-stack completed. So we race with > the end of the routine and the completion on-stack. > > It makes me wonder if wait_for_completion() above really is doing > something more, if it is just helping with timing and is still error > prone. > > The queued work will try the the completion as follows: > > static void umh_complete(struct subprocess_info *sub_info) > { > struct completion *comp = xchg(&sub_info->complete, NULL); > /* > * See call_usermodehelper_exec(). If xchg() returns NULL > * we own sub_info, the UMH_KILLABLE caller has gone away > * or the caller used UMH_NO_WAIT. > */ > if (comp) > complete(comp); > else > call_usermodehelper_freeinfo(sub_info); > } > > So the race is getting -ERESTARTSYS on the process with completion > on-stack and the above running complete(comp). Why would sprinkling > wait_for_completion(&done) *after* wait_for_completion_state(&done, state) > fix this UAF? The wait_for_completion(&done) is added when xchg(&sub_info->complete, NULL) return NULL. When it returns NULL, it means the umh_complete was using the completion variable at the same time and will call complete in a very short time. Add wait_for_completion *after* wait_for_completion_state will make the interruptible/timeout version API not working anymore. > } > diff --git a/kernel/sched/completion.c b/kernel/sched/completion.c > index d57a5c1c1cd9..aa7031faca04 100644 > --- a/kernel/sched/completion.c > +++ b/kernel/sched/completion.c > @@ -205,8 +205,10 @@ int __sched wait_for_completion_interruptible(struct completion *x) > { > long t = wait_for_common(x, MAX_SCHEDULE_TIMEOUT, TASK_INTERRUPTIBLE); > > - if (t == -ERESTARTSYS) > + if (t == -ERESTARTSYS) { > + wait_for_completion(x); > return t; > + } > return 0; > } > EXPORT_SYMBOL(wait_for_completion_interruptible); > @@ -243,8 +245,10 @@ int __sched wait_for_completion_killable(struct completion *x) > { > long t = wait_for_common(x, MAX_SCHEDULE_TIMEOUT, TASK_KILLABLE); > > - if (t == -ERESTARTSYS) > + if (t == -ERESTARTSYS) { > + wait_for_completion(x); > return t; > + } > return 0; > } > EXPORT_SYMBOL(wait_for_completion_killable); > @@ -253,8 +257,10 @@ int __sched wait_for_completion_state(struct completion *x, unsigned int state) > { > long t = wait_for_common(x, MAX_SCHEDULE_TIMEOUT, state); > > - if (t == -ERESTARTSYS) > + if (t == -ERESTARTSYS) { > + wait_for_completion(x); > return t; > + } > return 0; > } > EXPORT_SYMBOL(wait_for_completion_state); If we want to make it a generic fix, syntactic sugar can be added to simplify usage for users. Consider the following patch. diff --git a/kernel/sched/completion.c b/kernel/sched/completion.c index d57a5c1c1cd9..67b7d02c0098 100644 --- a/kernel/sched/completion.c +++ b/kernel/sched/completion.c @@ -341,3 +341,33 @@ bool completion_done(struct completion *x) return true; } EXPORT_SYMBOL(completion_done); + +void complete_on_stack(struct completion **x) +{ + struct completion *comp = xchg(*x, NULL); + + if (comp) + complete(comp); +} +EXPORT_SYMBOL(complete_on_stack); + +int __sched wait_for_completion_state_on_stack(struct completion **x, + unsigned int state) +{ + struct completion *comp = *x; + int retval; + + retval = wait_for_completion_state(comp, state); + if (retval) { + if (xchg(*x, NULL)) + return retval; + + /* + * complete_on_stack will call complete shortly. + */ + wait_for_completion(comp); + } + + return retval; +} +EXPORT_SYMBOL(wait_for_completion_state_on_stack);
Schspa Shi <schspa@gmail.com> writes: > Luis Chamberlain <mcgrof@kernel.org> writes: > >> Peter, Ingo, Steven would like you're review. >> >> On Tue, Dec 13, 2022 at 03:03:53PM -0800, Luis Chamberlain wrote: >>> On Mon, Dec 12, 2022 at 09:38:31PM +0800, Schspa Shi wrote: >>> > I'd like to upload a V2 patch with the new solution if you prefer the >>> > following way. >>> > >>> > diff --git a/kernel/umh.c b/kernel/umh.c >>> > index 850631518665..8023f11fcfc0 100644 >>> > --- a/kernel/umh.c >>> > +++ b/kernel/umh.c >>> > @@ -452,6 +452,11 @@ int call_usermodehelper_exec(struct subprocess_info *sub_info, int wait) >>> > /* umh_complete() will see NULL and free sub_info */ >>> > if (xchg(&sub_info->complete, NULL)) >>> > goto unlock; >>> > + /* >>> > + * kthreadd (or new kernel thread) will call complete() >>> > + * shortly. >>> > + */ >>> > + wait_for_completion(&done); >>> > } >>> >>> Yes much better. Did you verify it fixes the splat found by the bots? >> >> Wait, I'm not sure yet why this would fix it... I first started thinking >> that this may be a good example of a Coccinelle SmPL rule, something like: >> >> DECLARE_COMPLETION_ONSTACK(done); >> foo *foo; >> ... >> foo->completion = &done; >> ... >> queue_work(system_unbound_wq, &foo->work); >> .... >> ret = wait_for_completion_state(&done, state); >> ... >> if (!ret) >> S >> ... >> +wait_for_completion(&done); >> >> But that is pretty complex, and while it may be useful to know how many >> patterns we have like this, it begs the question if generalizing this >> inside the callers is best for -ERESTARTSYS condition is best. What >> do folks think? >> >> The rationale here is that if you queue stuff and give access to the >> completion variable but its on-stack obviously you can end up with the >> queued stuff complete() on a on-stack variable. The issue seems to >> be that wait_for_completion_state() for -ERESTARTSYS still means >> that the already scheduled queue'd work is *about* to run and >> the process with the completion on-stack completed. So we race with >> the end of the routine and the completion on-stack. >> >> It makes me wonder if wait_for_completion() above really is doing >> something more, if it is just helping with timing and is still error >> prone. >> >> The queued work will try the the completion as follows: >> >> static void umh_complete(struct subprocess_info *sub_info) >> { >> struct completion *comp = xchg(&sub_info->complete, NULL); >> /* >> * See call_usermodehelper_exec(). If xchg() returns NULL >> * we own sub_info, the UMH_KILLABLE caller has gone away >> * or the caller used UMH_NO_WAIT. >> */ >> if (comp) >> complete(comp); >> else >> call_usermodehelper_freeinfo(sub_info); >> } >> >> So the race is getting -ERESTARTSYS on the process with completion >> on-stack and the above running complete(comp). Why would sprinkling >> wait_for_completion(&done) *after* wait_for_completion_state(&done, state) >> fix this UAF? > > The wait_for_completion(&done) is added when xchg(&sub_info->complete, > NULL) return NULL. When it returns NULL, it means the umh_complete was > using the completion variable at the same time and will call complete > in a very short time. > Hi Luis: Is there any further progress on this problem? Does the above explanation answer your doubts? > Add wait_for_completion *after* wait_for_completion_state will make the > interruptible/timeout version API not working anymore. > >> } >> diff --git a/kernel/sched/completion.c b/kernel/sched/completion.c >> index d57a5c1c1cd9..aa7031faca04 100644 >> --- a/kernel/sched/completion.c >> +++ b/kernel/sched/completion.c >> @@ -205,8 +205,10 @@ int __sched wait_for_completion_interruptible(struct completion *x) >> { >> long t = wait_for_common(x, MAX_SCHEDULE_TIMEOUT, TASK_INTERRUPTIBLE); >> >> - if (t == -ERESTARTSYS) >> + if (t == -ERESTARTSYS) { >> + wait_for_completion(x); >> return t; >> + } >> return 0; >> } >> EXPORT_SYMBOL(wait_for_completion_interruptible); >> @@ -243,8 +245,10 @@ int __sched wait_for_completion_killable(struct completion *x) >> { >> long t = wait_for_common(x, MAX_SCHEDULE_TIMEOUT, TASK_KILLABLE); >> >> - if (t == -ERESTARTSYS) >> + if (t == -ERESTARTSYS) { >> + wait_for_completion(x); >> return t; >> + } >> return 0; >> } >> EXPORT_SYMBOL(wait_for_completion_killable); >> @@ -253,8 +257,10 @@ int __sched wait_for_completion_state(struct completion *x, unsigned int state) >> { >> long t = wait_for_common(x, MAX_SCHEDULE_TIMEOUT, state); >> >> - if (t == -ERESTARTSYS) >> + if (t == -ERESTARTSYS) { >> + wait_for_completion(x); >> return t; >> + } >> return 0; >> } >> EXPORT_SYMBOL(wait_for_completion_state); > > If we want to make it a generic fix, syntactic sugar can be added to > simplify usage for users. > > Consider the following patch. > > diff --git a/kernel/sched/completion.c b/kernel/sched/completion.c > index d57a5c1c1cd9..67b7d02c0098 100644 > --- a/kernel/sched/completion.c > +++ b/kernel/sched/completion.c > @@ -341,3 +341,33 @@ bool completion_done(struct completion *x) > return true; > } > EXPORT_SYMBOL(completion_done); > + > +void complete_on_stack(struct completion **x) > +{ > + struct completion *comp = xchg(*x, NULL); > + > + if (comp) > + complete(comp); > +} > +EXPORT_SYMBOL(complete_on_stack); > + > +int __sched wait_for_completion_state_on_stack(struct completion **x, > + unsigned int state) > +{ > + struct completion *comp = *x; > + int retval; > + > + retval = wait_for_completion_state(comp, state); > + if (retval) { > + if (xchg(*x, NULL)) > + return retval; > + > + /* > + * complete_on_stack will call complete shortly. > + */ > + wait_for_completion(comp); > + } > + > + return retval; > +} > +EXPORT_SYMBOL(wait_for_completion_state_on_stack);
On Thu, Dec 22, 2022 at 01:45:46PM +0800, Schspa Shi wrote: > > Schspa Shi <schspa@gmail.com> writes: > > > Luis Chamberlain <mcgrof@kernel.org> writes: > > > >> Peter, Ingo, Steven would like you're review. > >> > >> On Tue, Dec 13, 2022 at 03:03:53PM -0800, Luis Chamberlain wrote: > >>> On Mon, Dec 12, 2022 at 09:38:31PM +0800, Schspa Shi wrote: > >>> > I'd like to upload a V2 patch with the new solution if you prefer the > >>> > following way. > >>> > > >>> > diff --git a/kernel/umh.c b/kernel/umh.c > >>> > index 850631518665..8023f11fcfc0 100644 > >>> > --- a/kernel/umh.c > >>> > +++ b/kernel/umh.c > >>> > @@ -452,6 +452,11 @@ int call_usermodehelper_exec(struct subprocess_info *sub_info, int wait) > >>> > /* umh_complete() will see NULL and free sub_info */ > >>> > if (xchg(&sub_info->complete, NULL)) > >>> > goto unlock; > >>> > + /* > >>> > + * kthreadd (or new kernel thread) will call complete() > >>> > + * shortly. > >>> > + */ > >>> > + wait_for_completion(&done); > >>> > } > >>> > >>> Yes much better. Did you verify it fixes the splat found by the bots? > >> > >> Wait, I'm not sure yet why this would fix it... I first started thinking > >> that this may be a good example of a Coccinelle SmPL rule, something like: > >> > >> DECLARE_COMPLETION_ONSTACK(done); > >> foo *foo; > >> ... > >> foo->completion = &done; > >> ... > >> queue_work(system_unbound_wq, &foo->work); > >> .... > >> ret = wait_for_completion_state(&done, state); > >> ... > >> if (!ret) > >> S > >> ... > >> +wait_for_completion(&done); > >> > >> But that is pretty complex, and while it may be useful to know how many > >> patterns we have like this, it begs the question if generalizing this > >> inside the callers is best for -ERESTARTSYS condition is best. What > >> do folks think? > >> > >> The rationale here is that if you queue stuff and give access to the > >> completion variable but its on-stack obviously you can end up with the > >> queued stuff complete() on a on-stack variable. The issue seems to > >> be that wait_for_completion_state() for -ERESTARTSYS still means > >> that the already scheduled queue'd work is *about* to run and > >> the process with the completion on-stack completed. So we race with > >> the end of the routine and the completion on-stack. > >> > >> It makes me wonder if wait_for_completion() above really is doing > >> something more, if it is just helping with timing and is still error > >> prone. > >> > >> The queued work will try the the completion as follows: > >> > >> static void umh_complete(struct subprocess_info *sub_info) > >> { > >> struct completion *comp = xchg(&sub_info->complete, NULL); > >> /* > >> * See call_usermodehelper_exec(). If xchg() returns NULL > >> * we own sub_info, the UMH_KILLABLE caller has gone away > >> * or the caller used UMH_NO_WAIT. > >> */ > >> if (comp) > >> complete(comp); > >> else > >> call_usermodehelper_freeinfo(sub_info); > >> } > >> > >> So the race is getting -ERESTARTSYS on the process with completion > >> on-stack and the above running complete(comp). Why would sprinkling > >> wait_for_completion(&done) *after* wait_for_completion_state(&done, state) > >> fix this UAF? > > > > The wait_for_completion(&done) is added when xchg(&sub_info->complete, > > NULL) return NULL. When it returns NULL, it means the umh_complete was > > using the completion variable at the same time and will call complete > > in a very short time. > > > Hi Luis: > > Is there any further progress on this problem? Does the above > explanation answer your doubts? I think it would be useful to proove your work for you to either hunt with SmPL coccinelle a similar flaw / how rampant this issue is and then also try to create the same UAF there and prove how your changes fixes it. Luis
Luis Chamberlain <mcgrof@kernel.org> writes: > On Thu, Dec 22, 2022 at 01:45:46PM +0800, Schspa Shi wrote: >> >> Schspa Shi <schspa@gmail.com> writes: >> >> > Luis Chamberlain <mcgrof@kernel.org> writes: >> > >> >> Peter, Ingo, Steven would like you're review. >> >> >> >> On Tue, Dec 13, 2022 at 03:03:53PM -0800, Luis Chamberlain wrote: >> >>> On Mon, Dec 12, 2022 at 09:38:31PM +0800, Schspa Shi wrote: >> >>> > I'd like to upload a V2 patch with the new solution if you prefer the >> >>> > following way. >> >>> > >> >>> > diff --git a/kernel/umh.c b/kernel/umh.c >> >>> > index 850631518665..8023f11fcfc0 100644 >> >>> > --- a/kernel/umh.c >> >>> > +++ b/kernel/umh.c >> >>> > @@ -452,6 +452,11 @@ int call_usermodehelper_exec(struct subprocess_info *sub_info, int wait) >> >>> > /* umh_complete() will see NULL and free sub_info */ >> >>> > if (xchg(&sub_info->complete, NULL)) >> >>> > goto unlock; >> >>> > + /* >> >>> > + * kthreadd (or new kernel thread) will call complete() >> >>> > + * shortly. >> >>> > + */ >> >>> > + wait_for_completion(&done); >> >>> > } >> >>> >> >>> Yes much better. Did you verify it fixes the splat found by the bots? >> >> >> >> Wait, I'm not sure yet why this would fix it... I first started thinking >> >> that this may be a good example of a Coccinelle SmPL rule, something like: >> >> >> >> DECLARE_COMPLETION_ONSTACK(done); >> >> foo *foo; >> >> ... >> >> foo->completion = &done; >> >> ... >> >> queue_work(system_unbound_wq, &foo->work); >> >> .... >> >> ret = wait_for_completion_state(&done, state); >> >> ... >> >> if (!ret) >> >> S >> >> ... >> >> +wait_for_completion(&done); >> >> >> >> But that is pretty complex, and while it may be useful to know how many >> >> patterns we have like this, it begs the question if generalizing this >> >> inside the callers is best for -ERESTARTSYS condition is best. What >> >> do folks think? >> >> >> >> The rationale here is that if you queue stuff and give access to the >> >> completion variable but its on-stack obviously you can end up with the >> >> queued stuff complete() on a on-stack variable. The issue seems to >> >> be that wait_for_completion_state() for -ERESTARTSYS still means >> >> that the already scheduled queue'd work is *about* to run and >> >> the process with the completion on-stack completed. So we race with >> >> the end of the routine and the completion on-stack. >> >> >> >> It makes me wonder if wait_for_completion() above really is doing >> >> something more, if it is just helping with timing and is still error >> >> prone. >> >> >> >> The queued work will try the the completion as follows: >> >> >> >> static void umh_complete(struct subprocess_info *sub_info) >> >> { >> >> struct completion *comp = xchg(&sub_info->complete, NULL); >> >> /* >> >> * See call_usermodehelper_exec(). If xchg() returns NULL >> >> * we own sub_info, the UMH_KILLABLE caller has gone away >> >> * or the caller used UMH_NO_WAIT. >> >> */ >> >> if (comp) >> >> complete(comp); >> >> else >> >> call_usermodehelper_freeinfo(sub_info); >> >> } >> >> >> >> So the race is getting -ERESTARTSYS on the process with completion >> >> on-stack and the above running complete(comp). Why would sprinkling >> >> wait_for_completion(&done) *after* wait_for_completion_state(&done, state) >> >> fix this UAF? >> > >> > The wait_for_completion(&done) is added when xchg(&sub_info->complete, >> > NULL) return NULL. When it returns NULL, it means the umh_complete was >> > using the completion variable at the same time and will call complete >> > in a very short time. >> > >> Hi Luis: >> >> Is there any further progress on this problem? Does the above >> explanation answer your doubts? > > I think it would be useful to proove your work for you to either > hunt with SmPL coccinelle a similar flaw / how rampant this issue > is and then also try to create the same UAF there and prove how > your changes fixes it. > OK, but it will take some time. > Luis
Schspa Shi <schspa@gmail.com> writes: > Luis Chamberlain <mcgrof@kernel.org> writes: > >> On Thu, Dec 22, 2022 at 01:45:46PM +0800, Schspa Shi wrote: >>> >>> Schspa Shi <schspa@gmail.com> writes: >>> >>> > Luis Chamberlain <mcgrof@kernel.org> writes: >>> > >>> >> Peter, Ingo, Steven would like you're review. >>> >> >>> >> On Tue, Dec 13, 2022 at 03:03:53PM -0800, Luis Chamberlain wrote: >>> >>> On Mon, Dec 12, 2022 at 09:38:31PM +0800, Schspa Shi wrote: >>> >>> > I'd like to upload a V2 patch with the new solution if you prefer the >>> >>> > following way. >>> >>> > >>> >>> > diff --git a/kernel/umh.c b/kernel/umh.c >>> >>> > index 850631518665..8023f11fcfc0 100644 >>> >>> > --- a/kernel/umh.c >>> >>> > +++ b/kernel/umh.c >>> >>> > @@ -452,6 +452,11 @@ int call_usermodehelper_exec(struct subprocess_info *sub_info, int wait) >>> >>> > /* umh_complete() will see NULL and free sub_info */ >>> >>> > if (xchg(&sub_info->complete, NULL)) >>> >>> > goto unlock; >>> >>> > + /* >>> >>> > + * kthreadd (or new kernel thread) will call complete() >>> >>> > + * shortly. >>> >>> > + */ >>> >>> > + wait_for_completion(&done); >>> >>> > } >>> >>> >>> >>> Yes much better. Did you verify it fixes the splat found by the bots? >>> >> >>> >> Wait, I'm not sure yet why this would fix it... I first started thinking >>> >> that this may be a good example of a Coccinelle SmPL rule, something like: >>> >> >>> >> DECLARE_COMPLETION_ONSTACK(done); >>> >> foo *foo; >>> >> ... >>> >> foo->completion = &done; >>> >> ... >>> >> queue_work(system_unbound_wq, &foo->work); >>> >> .... >>> >> ret = wait_for_completion_state(&done, state); >>> >> ... >>> >> if (!ret) >>> >> S >>> >> ... >>> >> +wait_for_completion(&done); >>> >> >>> >> But that is pretty complex, and while it may be useful to know how many >>> >> patterns we have like this, it begs the question if generalizing this >>> >> inside the callers is best for -ERESTARTSYS condition is best. What >>> >> do folks think? >>> >> >>> >> The rationale here is that if you queue stuff and give access to the >>> >> completion variable but its on-stack obviously you can end up with the >>> >> queued stuff complete() on a on-stack variable. The issue seems to >>> >> be that wait_for_completion_state() for -ERESTARTSYS still means >>> >> that the already scheduled queue'd work is *about* to run and >>> >> the process with the completion on-stack completed. So we race with >>> >> the end of the routine and the completion on-stack. >>> >> >>> >> It makes me wonder if wait_for_completion() above really is doing >>> >> something more, if it is just helping with timing and is still error >>> >> prone. >>> >> >>> >> The queued work will try the the completion as follows: >>> >> >>> >> static void umh_complete(struct subprocess_info *sub_info) >>> >> { >>> >> struct completion *comp = xchg(&sub_info->complete, NULL); >>> >> /* >>> >> * See call_usermodehelper_exec(). If xchg() returns NULL >>> >> * we own sub_info, the UMH_KILLABLE caller has gone away >>> >> * or the caller used UMH_NO_WAIT. >>> >> */ >>> >> if (comp) >>> >> complete(comp); >>> >> else >>> >> call_usermodehelper_freeinfo(sub_info); >>> >> } >>> >> >>> >> So the race is getting -ERESTARTSYS on the process with completion >>> >> on-stack and the above running complete(comp). Why would sprinkling >>> >> wait_for_completion(&done) *after* wait_for_completion_state(&done, state) >>> >> fix this UAF? >>> > >>> > The wait_for_completion(&done) is added when xchg(&sub_info->complete, >>> > NULL) return NULL. When it returns NULL, it means the umh_complete was >>> > using the completion variable at the same time and will call complete >>> > in a very short time. >>> > >>> Hi Luis: >>> >>> Is there any further progress on this problem? Does the above >>> explanation answer your doubts? >> >> I think it would be useful to proove your work for you to either >> hunt with SmPL coccinelle a similar flaw / how rampant this issue >> is and then also try to create the same UAF there and prove how >> your changes fixes it. >> > > OK, but it will take some time. Hi Luis: I have made a kernel module to prove this fix. Please check this at: Link: https://github.com/schspa/code_snippet/tree/master/kernel_module/completion There is both bad & ok test case in the README.org. > >> Luis
Attaching the full test program in case anyone wants to add some comments. /* * complete-uaf.c --- UAF test for complete * * Copyright (C) 2022, Schspa Shi, all rights reserved. * * Permission is hereby granted, free of charge, to any person obtaining a copy * of this software and associated documentation files (the "Software"), to deal * in the Software without restriction, including without limitation the rights * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell * copies of the Software, and to permit persons to whom the Software is * furnished to do so, subject to the following conditions: * * The above copyright notice and this permission notice shall be included in * all copies or substantial portions of the Software. * * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN * THE SOFTWARE. */ #define pr_fmt(fmt) "complete-uaf-test:" fmt #include <linux/kernel.h> #include <linux/module.h> #include <linux/errno.h> #include <linux/proc_fs.h> #include <linux/seq_file.h> #include <linux/spinlock.h> #include <linux/random.h> #include <linux/delay.h> #include <linux/atomic.h> #include <linux/kthread.h> #include <linux/slab.h> #include <linux/completion.h> struct test_work { struct completion *complete; struct pid *caller_pid; unsigned long delay_time; int id; }; #define MAX_WAIT_TIMEOUT (50) static atomic_t test_instance_count; static bool use_fix = false; module_param(use_fix, bool, 0444); MODULE_PARM_DESC(use_fix, "Use fix"); static void mdelay_with_yield(unsigned long timeout_ms) { unsigned long start = jiffies; do { yield(); } while (jiffies_to_msecs(jiffies - start) < timeout_ms); return; } static void test_work_complete(struct test_work *workdata) { struct completion *comp = xchg(&workdata->complete, NULL); /* Sleep for 1 millisecond to simulate preemption */ msleep(100); if (comp) complete(comp); kfree(workdata); } static int completion_thread(void *data) { struct test_work *workdata = data; mdelay_with_yield(workdata->delay_time); /* Simulate an external kill signal */ kill_pid(workdata->caller_pid, SIGKILL, 1); test_work_complete(workdata); return 0; } static int complete_uaf_test_proc_show(struct seq_file *m, void *v) { struct task_struct *thread; DECLARE_COMPLETION_ONSTACK(done); struct test_work *workdata; int retval; int id; workdata = kzalloc(sizeof(*workdata), GFP_KERNEL); if (!workdata) { return -ENOMEM; } id = atomic_inc_return(&test_instance_count); workdata->complete = &done; workdata->id = id; workdata->delay_time = get_random_u32() % (MAX_WAIT_TIMEOUT); workdata->caller_pid = get_pid(task_tgid(current)); thread = kthread_run(completion_thread, workdata, "complete_uaf_test_kthread-%d", workdata->id); if (IS_ERR(thread)) { seq_printf(m, "kthread create failed with status %ld", PTR_ERR(thread)); kfree(workdata); return PTR_ERR(thread); } retval = wait_for_completion_killable(&done); if (retval) { if (xchg(&workdata->complete, NULL)) goto exit; if (use_fix) { wait_for_completion(&done); } } seq_printf(m, "test %d success\n", id); exit: return 0; } static int __init complete_uaf_test_init(void) { proc_create_single("complete_uaf_test", 0, NULL, complete_uaf_test_proc_show); return 0; } module_init(complete_uaf_test_init); MODULE_AUTHOR("Schspa <schspa@gmail.com>"); MODULE_LICENSE("GPL v2");
On Thu, Dec 22, 2022 at 08:09:38PM +0800, Schspa Shi wrote: > > Attaching the full test program in case anyone wants to add some > comments. Good stuff. That looks like a kernel sefltest. So you can just add it as an initial selftest for completion so lib/test_completion.c and extend lib/Kconfig.debug for a new kconfig symbol for it, and then just add a script on tools/testing/selftets/completion/ with a simple makefile which references a script which just calls modprobe. You can look at tools/testing/selftests/kmod/ for an example. But I still think you may want an SmPL Coccinelle grammer patch to hunt down other users with this pattern. The beneefit is that then you can use the same Coccinelle patch to also then *fix* the issue in other places. The current uaf on umh is not something I'm terribly concerned to be exploited in the wild. I don't think other use cases would be easier, but, all this work would close the gap completely. Thanks for doing this. Luis
Luis Chamberlain <mcgrof@kernel.org> writes: > On Thu, Dec 22, 2022 at 08:09:38PM +0800, Schspa Shi wrote: >> >> Attaching the full test program in case anyone wants to add some >> comments. > > Good stuff. > > That looks like a kernel sefltest. So you can just add it as an > initial selftest for completion so lib/test_completion.c and extend > lib/Kconfig.debug for a new kconfig symbol for it, and then just add > a script on tools/testing/selftets/completion/ with a simple makefile > which references a script which just calls modprobe. You can look at > tools/testing/selftests/kmod/ for an example. OK, but I want to know, is it enough to add only positive examples for the test items here? Do we need a reverse example to prove that the previous writing is wrong? > > But I still think you may want an SmPL Coccinelle grammer patch to hunt > down other users with this pattern. The beneefit is that then you can > use the same Coccinelle patch to also then *fix* the issue in other > places. > Yes, I'm learning about SmPL, and I'll add this syntax patch later to find more problems. > The current uaf on umh is not something I'm terribly concerned to be > exploited in the wild. I don't think other use cases would be easier, > but, all this work would close the gap completely. > > Thanks for doing this. > > Luis
On Fri, Jan 13, 2023 at 01:42:05PM +0800, Schspa Shi wrote: > > Luis Chamberlain <mcgrof@kernel.org> writes: > > > On Thu, Dec 22, 2022 at 08:09:38PM +0800, Schspa Shi wrote: > >> > >> Attaching the full test program in case anyone wants to add some > >> comments. > > > > Good stuff. > > > > That looks like a kernel sefltest. So you can just add it as an > > initial selftest for completion so lib/test_completion.c and extend > > lib/Kconfig.debug for a new kconfig symbol for it, and then just add > > a script on tools/testing/selftets/completion/ with a simple makefile > > which references a script which just calls modprobe. You can look at > > tools/testing/selftests/kmod/ for an example. > > OK, but I want to know, is it enough to add only positive examples for > the test items here? Do we need a reverse example to prove that the > previous writing is wrong? That would mean adding code which would cause a UAF, perhaps useful if disabled by default. > > But I still think you may want an SmPL Coccinelle grammer patch to hunt > > down other users with this pattern. The beneefit is that then you can > > use the same Coccinelle patch to also then *fix* the issue in other > > places. > > > > Yes, I'm learning about SmPL, and I'll add this syntax patch later to > find more problems. Great thanks. Luis
diff --git a/include/linux/umh.h b/include/linux/umh.h index 5d1f6129b847..801f7efbc825 100644 --- a/include/linux/umh.h +++ b/include/linux/umh.h @@ -20,6 +20,7 @@ struct file; struct subprocess_info { struct work_struct work; struct completion *complete; + struct completion done; const char *path; char **argv; char **envp; diff --git a/kernel/umh.c b/kernel/umh.c index 850631518665..3ed39956c777 100644 --- a/kernel/umh.c +++ b/kernel/umh.c @@ -380,6 +380,7 @@ struct subprocess_info *call_usermodehelper_setup(const char *path, char **argv, sub_info->cleanup = cleanup; sub_info->init = init; sub_info->data = data; + init_completion(&sub_info->done); out: return sub_info; } @@ -405,7 +406,6 @@ EXPORT_SYMBOL(call_usermodehelper_setup); int call_usermodehelper_exec(struct subprocess_info *sub_info, int wait) { unsigned int state = TASK_UNINTERRUPTIBLE; - DECLARE_COMPLETION_ONSTACK(done); int retval = 0; if (!sub_info->path) { @@ -431,7 +431,7 @@ int call_usermodehelper_exec(struct subprocess_info *sub_info, int wait) * This makes it possible to use umh_complete to free * the data structure in case of UMH_NO_WAIT. */ - sub_info->complete = (wait == UMH_NO_WAIT) ? NULL : &done; + sub_info->complete = (wait == UMH_NO_WAIT) ? NULL : &sub_info->done; sub_info->wait = wait; queue_work(system_unbound_wq, &sub_info->work); @@ -444,7 +444,7 @@ int call_usermodehelper_exec(struct subprocess_info *sub_info, int wait) if (wait & UMH_FREEZABLE) state |= TASK_FREEZABLE; - retval = wait_for_completion_state(&done, state); + retval = wait_for_completion_state(sub_info->complete, state); if (!retval) goto wait_done;