Message ID | 20221129162251.90790-1-schspa@gmail.com |
---|---|
State | New |
Headers |
Series |
9p: fix crash when transaction killed
|
|
Commit Message
Schspa Shi
Nov. 29, 2022, 4:22 p.m. UTC
The fd transport layer does not fully support cancelling a request. When a request is in the REQ_STATUS_SENT state, p9_fd_cancelled will forcibly delete the request, while p9_read_work/p9_write_work may still be using it. This causes a use-after-free.
Here is the log from syzbot:
Corrupted memory at 0xffff88807eade00b [ 0xff 0x07 0x00 0x00 0x00 0x00
0x00 0x00 . . . . . . . . ] (in kfence-#110):
p9_fcall_fini net/9p/client.c:248 [inline]
p9_req_put net/9p/client.c:396 [inline]
p9_req_put+0x208/0x250 net/9p/client.c:390
p9_client_walk+0x247/0x540 net/9p/client.c:1165
clone_fid fs/9p/fid.h:21 [inline]
v9fs_fid_xattr_set+0xe4/0x2b0 fs/9p/xattr.c:118
v9fs_xattr_set fs/9p/xattr.c:100 [inline]
v9fs_xattr_handler_set+0x6f/0x120 fs/9p/xattr.c:159
__vfs_setxattr+0x119/0x180 fs/xattr.c:182
__vfs_setxattr_noperm+0x129/0x5f0 fs/xattr.c:216
__vfs_setxattr_locked+0x1d3/0x260 fs/xattr.c:277
vfs_setxattr+0x143/0x340 fs/xattr.c:309
setxattr+0x146/0x160 fs/xattr.c:617
path_setxattr+0x197/0x1c0 fs/xattr.c:636
__do_sys_setxattr fs/xattr.c:652 [inline]
__se_sys_setxattr fs/xattr.c:648 [inline]
__ia32_sys_setxattr+0xc0/0x160 fs/xattr.c:648
do_syscall_32_irqs_on arch/x86/entry/common.c:112 [inline]
__do_fast_syscall_32+0x65/0xf0 arch/x86/entry/common.c:178
do_fast_syscall_32+0x33/0x70 arch/x86/entry/common.c:203
entry_SYSENTER_compat_after_hwframe+0x70/0x82
Below is a similar scenario. The one in the syzbot log looks more
complicated than this, but the root cause appears to be the same.
T21124                               p9_write_work        p9_read_work
======================== first trans =================================
p9_client_walk
  p9_client_rpc
    p9_client_prepare_req
    /* req->refcount == 2 */
    c->trans_mod->request(c, req);
      p9_fd_request
        req move to unsent_req_list
                                     req->status = REQ_STATUS_SENT;
                                     req move to req_list
                                     << send to server >>
    wait_event_killable
    << get kill signal >>
    if (c->trans_mod->cancel(c, req))
      p9_client_flush(c, req);
        /* send flush request */
        req = p9_client_rpc(c, P9_TFLUSH, "w", oldtag);
        if (c->trans_mod->cancelled)
          c->trans_mod->cancelled(c, oldreq);
          /* old req was deleted from req_list */
    /* req->refcount == 1 */
    p9_req_put
    /* req->refcount == 0 */
    << preempted >>
                                          << get response, UAF here >>
                                          m->rreq = p9_tag_lookup(m->client, m->rc.tag);
                                          /* req->refcount == 1 */
                                          << do response >>
                                          p9_client_cb(m->client, m->rreq, REQ_STATUS_RCVD);
                                          /* req->refcount == 0 */
                                          p9_fcall_fini
                                          /* request has been freed */
    p9_fcall_fini
    /* double free */
                                          p9_req_put(m->client, m->rreq);
                                          /* req->refcount == 1 */
To fix it, make p9_fd_cancelled wait until a request still in
REQ_STATUS_SENT state has completed before returning.
Reported-by: syzbot+8f1060e2aaf8ca55220b@syzkaller.appspotmail.com
Signed-off-by: Schspa Shi <schspa@gmail.com>
---
net/9p/client.c | 2 +-
net/9p/trans_fd.c | 12 ++++++++++++
2 files changed, 13 insertions(+), 1 deletion(-)
Comments
Schspa Shi <schspa@gmail.com> writes:

> diff --git a/net/9p/client.c b/net/9p/client.c
> index aaa37b07e30a..963cf91aa0d5 100644
> --- a/net/9p/client.c
> +++ b/net/9p/client.c
> @@ -440,7 +440,7 @@ void p9_client_cb(struct p9_client *c, struct p9_req_t *req, int status)
>  	smp_wmb();
>  	req->status = status;
>
> -	wake_up(&req->wq);
> +	wake_up_all(&req->wq);
>  	p9_debug(P9_DEBUG_MUX, "wakeup: %d\n", req->tc.tag);
>  	p9_req_put(c, req);
>  }
> diff --git a/net/9p/trans_fd.c b/net/9p/trans_fd.c
> index eeea0a6a75b6..ee2d6b231af1 100644
> --- a/net/9p/trans_fd.c
> +++ b/net/9p/trans_fd.c
> @@ -30,6 +30,7 @@
>  #include <net/9p/transport.h>
>
>  #include <linux/syscalls.h> /* killme */
> +#include <linux/wait.h>
>
>  #define P9_PORT 564
>  #define MAX_SOCK_BUF (1024*1024)
> @@ -728,6 +729,17 @@ static int p9_fd_cancelled(struct p9_client *client, struct p9_req_t *req)
>  		return 0;
>  	}
>
> +	/* If the request has been sent to the server, we need to wait
> +	 * for the job to finish.
> +	 */
> +	if (req->status == REQ_STATUS_SENT) {
> +		spin_unlock(&m->req_lock);
> +		p9_debug(P9_DEBUG_TRANS, "client %p req %p wait done\n",
> +			 client, req);
> +		wait_event(req->wq, req->status >= REQ_STATUS_RCVD);
> +
> +		return 0;
> +	}
>  	/* we haven't received a response for oldreq,
>  	 * remove it from the list.
>  	 */

Adding Christian Schoenebeck (the previous mail address had a typo).
On Tuesday, November 29, 2022 5:26:46 PM CET Schspa Shi wrote:
> To fix it, we can wait the request with status REQ_STATUS_SENT returned.

A 9p server might or might not send a reply to a cancelled request. If the server notices the client's Tflush request early enough, it simply discards the old (cancelled) request and does not send any reply for it. If the server notices the Tflush too late, it sends a response to the old request.

http://ericvh.github.io/9p-rfc/rfc9p2000.html#anchor28

However, after sending Tflush the client waits for the corresponding Rflush response, and at that point the situation should be clear: no further response is expected from the server for the old request. And that is what the Linux client does. Which server implementation caused this?

> -	wake_up(&req->wq);
> +	wake_up_all(&req->wq);

Purpose?
Schspa Shi wrote on Wed, Nov 30, 2022 at 12:22:51AM +0800:
> To fix it, we can wait the request with status REQ_STATUS_SENT returned.

Christian replied on this (we cannot wait), but I agree with him -- the scenario you describe is protected by p9_tag_lookup checking the refcount with refcount_inc_not_zero (p9_req_try_get).

The normal scenarios for flush are as follows:
- cancel before the request is sent: no flush, just free
- flush is ignored and the reply comes first: we get the reply from the original request, then the reply from the flush
- flush is handled and the reply never comes: we only get the reply from the flush

Protocol-wise, we can safely reuse the tag after the flush reply has been received; and as far as I can follow the code, we only ever free the tag (last p9_fcall_fini) after the flush has returned, so the entry should be protected.

If we receive a response on the given tag between cancelled and the main thread going out, the request has been marked FLSHD and should be ignored. There is one p9_req_put in p9_read_work() in this case, but it corresponds to the ref obtained by p9_tag_lookup(), so it should be valid.

I'm happy to believe we have a race somewhere (even if no sane server would produce it), but right now I don't see it looking at the code... :/
asmadeus@codewreck.org writes:
> Christian replied on this (we cannot wait) but I agree with him

Yes, this is what I worry about too: this wait may cause a deadlock.

> the scenario you describe is protected by p9_tag_lookup checking for
> refcount with refcount_inc_not_zero (p9_req_try_get).

Thanks for pointing out the zero-value check here; the scenario in the commit message does not hold.

> I'm happy to believe we have a race somewhere (even if no sane server
> would produce it), but right now I don't see it looking at the code.. :/

And I think there is a race too, because syzbot has reported 9p fs memory corruption multiple times.

As for the problem: p9_tag_lookup only takes rcu_read_lock when accessing the IDR -- why doesn't it take p9_client->lock? Maybe the root cause is that a lock is missing here.
Schspa Shi <schspa@gmail.com> writes:
> Yes, this is what I worry about too: this wait may cause a deadlock.

@Christian: It seems we don't need this wait; the problem may be caused by a missing lock in p9_tag_lookup.

Adding Christian Schoenebeck (the previous mail address had a typo).
(fixed Christophe's address, hopefully that will do for good...)

Schspa Shi wrote on Wed, Nov 30, 2022 at 10:22:44AM +0800:
> > I'm happy to believe we have a race somewhere (even if no sane server
> > would produce it), but right now I don't see it looking at the code.. :/
>
> And I think there is a race too, because syzbot has reported 9p fs
> memory corruption multiple times.

Yes, no point in denying that :)

> As for the problem, p9_tag_lookup only takes the rcu_read_lock when
> accessing the IDR; why doesn't it take the p9_client->lock? Maybe the
> root cause is that a lock is missing here.

It shouldn't need to, but happy to try adding it.

For the logic:
- idr_find is RCU-safe (trusting the comment above it)
- reqs are allocated from a kmem_cache created with SLAB_TYPESAFE_BY_RCU.
  This means that if we get a req from idr_find, even if it has just
  been freed, it either is still in the state it was freed at (hence
  refcount 0, we ignore it) or is another req coming from the same cache
  (if the refcount isn't zero, we can check its tag).

The refcount itself is an atomic operation, so it doesn't require a
lock. ... And on the off chance I hadn't considered: if we're already
dealing with a new request with the same tag here, we'll be updating its
status, so another receive for it shouldn't use it?...

I don't think adding the client lock helps with anything here, but it'll
certainly simplify this logic, as we are then guaranteed not to get
obsolete results from idr_find.

Unfortunately adding a lock will slow things down regardless of
correctness, so it might just make the race much harder to hit without
fixing it, and we might not notice that, so it'd be good to understand
the race.
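The guard being relied on here can be sketched in userspace C. `try_get` below is a stand-in for the kernel's `refcount_inc_not_zero()` (as used by `p9_req_try_get`), built from C11 atomics; the scenario names are illustrative, not kernel code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Userspace stand-in for refcount_inc_not_zero(): take a reference
 * only if the object still holds one, via a CAS loop. */
static bool try_get(atomic_int *ref)
{
	int old = atomic_load(ref);
	do {
		if (old == 0)
			return false;	/* freed object: leave it alone */
	} while (!atomic_compare_exchange_weak(ref, &old, old + 1));
	return true;
}

/* A freed req still visible through SLAB_TYPESAFE_BY_RCU has refcount
 * 0, so the lookup skips it; a live one gains an extra reference. */
static int demo(void)
{
	atomic_int freed, live;
	atomic_init(&freed, 0);
	atomic_init(&live, 2);
	if (try_get(&freed))
		return -1;		/* must not happen */
	if (!try_get(&live))
		return -2;		/* must not happen */
	return atomic_load(&live);	/* live ref was bumped */
}
```

The CAS loop is what makes the check-and-increment atomic: a plain `if (refcount != 0) refcount++` would race with the final put.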
asmadeus@codewreck.org writes:

> (fixed Christophe's address, hopefully that will do for good...)
>
> Schspa Shi wrote on Wed, Nov 30, 2022 at 10:22:44AM +0800:
>> > I'm happy to believe we have a race somewhere (even if no sane server
>> > would produce it), but right now I don't see it looking at the code.. :/
>>
>> And I think there is a race too, because syzbot has reported 9p fs
>> memory corruption multiple times.
>
> Yes, no point in denying that :)
>
>> As for the problem, p9_tag_lookup only takes the rcu_read_lock when
>> accessing the IDR; why doesn't it take the p9_client->lock? Maybe the
>> root cause is that a lock is missing here.
>
> It shouldn't need to, but happy to try adding it.
> For the logic:
> - idr_find is RCU-safe (trusting the comment above it)
> - reqs are allocated from a kmem_cache created with SLAB_TYPESAFE_BY_RCU.
>   This means that if we get a req from idr_find, even if it has just
>   been freed, it either is still in the state it was freed at (hence
>   refcount 0, we ignore it) or is another req coming from the same
>   cache (if

If the req was newly allocated (it was in a new page), the refcount may
not be 0; there will be a problem in this case. It seems we can't rely
on this.

We need to set the refcount to zero before adding it to the idr in
p9_tag_alloc.

>   the refcount isn't zero, we can check its tag)

As for the release case, the next request will have the same tag with
high probability. It would be better to make the tag an increasing
sequence, which would avoid most request reuse.

> The refcount itself is an atomic operation, so it doesn't require a
> lock. ... And on the off chance I hadn't considered: if we're already
> dealing with a new request with the same tag here, we'll be updating
> its status, so another receive for it shouldn't use it?...
>
> I don't think adding the client lock helps with anything here, but
> it'll certainly simplify this logic, as we are then guaranteed not to
> get obsolete results from idr_find.
>
> Unfortunately adding a lock will slow things down regardless of
> correctness, so it might just make the race much harder to hit without
> fixing it, and we might not notice that, so it'd be good to understand
> the race.
Schspa Shi wrote on Wed, Nov 30, 2022 at 04:14:32PM +0800:
> > - reqs are allocated from a kmem_cache created with SLAB_TYPESAFE_BY_RCU.
> >   This means that if we get a req from idr_find, even if it has just
> >   been freed, it either is still in the state it was freed at (hence
> >   refcount 0, we ignore it) or is another req coming from the same
> >   cache (if
>
> If the req was newly allocated (it was in a new page), the refcount may
> not be 0; there will be a problem in this case. It seems we can't rely
> on this.
>
> We need to set the refcount to zero before adding it to the idr in
> p9_tag_alloc.

Hmm, if it's reused then it's zero by definition, but if it's a new
allocation (uninitialized) then anything goes; that lookup could find
and increase it before the refcount_set, and we'd have an off-by-one
leading to a use after free. Good catch!

Initializing it to zero will lead to the client busy-looping until after
the refcount is properly set, which should work.
Setting the refcount early might have us use a re-used req before the
tag has been changed, so that one cannot move.

Could you test with just that changed, to see if syzbot still reproduces
this bug? (perhaps add a comment if you send this)

------
diff --git a/net/9p/client.c b/net/9p/client.c
index aaa37b07e30a..aa64724f6a69 100644
--- a/net/9p/client.c
+++ b/net/9p/client.c
@@ -297,6 +297,7 @@ p9_tag_alloc(struct p9_client *c, int8_t type, uint t_size, uint r_size,
 	p9pdu_reset(&req->rc);
 	req->t_err = 0;
 	req->status = REQ_STATUS_ALLOC;
+	refcount_set(&req->refcount, 0);
 	init_waitqueue_head(&req->wq);
 	INIT_LIST_HEAD(&req->req_list);
------

> >   the refcount isn't zero, we can check its tag)
>
> As for the release case, the next request will have the same tag with
> high probability. It would be better to make the tag an increasing
> sequence, which would avoid most request reuse.

I'd love to be able to do this, but it would break some servers that
assume tags are small (e.g. using it as an index for a tag array)
... I thought nfs-ganesha was doing this, but they properly put tags in
buckets, so that's one less server to worry about; still, I wouldn't put
it past some simple servers to do that: having a way to look up a given
tag for flush is an implementation requirement.

That shouldn't be a problem though, as that will just either fail the
guard check after lookup (m->rreq->status != REQ_STATUS_SENT) or be
processed as a normal reply if it's already been sent by the other
thread at this point.
OTOH, that m->rreq->status isn't protected by m->req_lock in trans_fd,
and that is probably another bug...
On Wednesday, November 30, 2022 12:06:31 PM CET asmadeus@codewreck.org wrote:
> Schspa Shi wrote on Wed, Nov 30, 2022 at 04:14:32PM +0800:
> > > - reqs are allocated from a kmem_cache created with SLAB_TYPESAFE_BY_RCU.
> > >   This means that if we get a req from idr_find, even if it has just
> > >   been freed, it either is still in the state it was freed at (hence
> > >   refcount 0, we ignore it) or is another req coming from the same
> > >   cache (if
> >
> > If the req was newly allocated (it was in a new page), the refcount
> > may not be 0; there will be a problem in this case. It seems we can't
> > rely on this.
> >
> > We need to set the refcount to zero before adding it to the idr in
> > p9_tag_alloc.
>
> Hmm, if it's reused then it's zero by definition, but if it's a new
> allocation (uninitialized) then anything goes; that lookup could find
> and increase it before the refcount_set, and we'd have an off-by-one
> leading to a use after free. Good catch!
>
> Initializing it to zero will lead to the client busy-looping until
> after the refcount is properly set, which should work.
> Setting the refcount early might have us use a re-used req before the
> tag has been changed, so that one cannot move.
>
> Could you test with just that changed, to see if syzbot still
> reproduces this bug? (perhaps add a comment if you send this)
>
> ------
> diff --git a/net/9p/client.c b/net/9p/client.c
> index aaa37b07e30a..aa64724f6a69 100644
> --- a/net/9p/client.c
> +++ b/net/9p/client.c
> @@ -297,6 +297,7 @@ p9_tag_alloc(struct p9_client *c, int8_t type, uint t_size, uint r_size,
>  	p9pdu_reset(&req->rc);
>  	req->t_err = 0;
>  	req->status = REQ_STATUS_ALLOC;
> +	refcount_set(&req->refcount, 0);
>  	init_waitqueue_head(&req->wq);
>  	INIT_LIST_HEAD(&req->req_list);
> ------
>
> > >   the refcount isn't zero, we can check its tag)
> >
> > As for the release case, the next request will have the same tag with
> > high probability. It would be better to make the tag an increasing
> > sequence, which would avoid most request reuse.
>
> I'd love to be able to do this, but it would break some servers that
> assume tags are small (e.g. using it as an index for a tag array)
> ... I thought nfs-ganesha was doing this, but they properly put tags in
> buckets, so that's one less server to worry about; still, I wouldn't
> put it past some simple servers to do that: having a way to look up a
> given tag for flush is an implementation requirement.

I really think it's time to emit tag numbers sequentially. If it turns
out that some server is broken by that, we could then simply ignore
replies with old/unknown tag numbers. It would also help a lot when
debugging 9p issues in general to know that tag numbers are not reused
(in the near future).

A 9p server must not make any assumptions about how tag numbers are
generated by the client, whether dense or sparse, or whatever. If it
does, then the server is broken, which is much easier to fix than
synchronization issues like the one we have to deal with here.

> That shouldn't be a problem though, as that will just either fail the
> guard check after lookup (m->rreq->status != REQ_STATUS_SENT) or be
> processed as a normal reply if it's already been sent by the other
> thread at this point.
> OTOH, that m->rreq->status isn't protected by m->req_lock in trans_fd,
> and that is probably another bug...
Christian Schoenebeck wrote on Wed, Nov 30, 2022 at 01:43:20PM +0100:
> > > As for the release case, the next request will have the same tag
> > > with high probability. It would be better to make the tag an
> > > increasing sequence, which would avoid most request reuse.
> >
> > I'd love to be able to do this, but it would break some servers that
> > assume tags are small (e.g. using it as an index for a tag array)
> > ... I thought nfs-ganesha was doing this, but they properly put tags
> > in buckets, so that's one less server to worry about; still, I
> > wouldn't put it past some simple servers to do that: having a way to
> > look up a given tag for flush is an implementation requirement.
>
> I really think it's time to emit tag numbers sequentially. If it turns
> out that some server is broken by that, we could then simply ignore
> replies with old/unknown tag numbers. It would also help a lot when
> debugging 9p issues in general to know that tag numbers are not reused
> (in the near future).
>
> A 9p server must not make any assumptions about how tag numbers are
> generated by the client, whether dense or sparse, or whatever. If it
> does, then the server is broken, which is much easier to fix than
> synchronization issues like the one we have to deal with here.

Well, it's a one-line change: just replace the idr_alloc in the else
branch of p9_tag_alloc with idr_alloc_cyclic.
But Linux has a history of not breaking userspace, even if it's broken.
One could argue that the server side of a networked protocol isn't as
tightly coupled, but I still think we should be careful with it --
adding a new mount option to revert to the old behaviour at the very
least.

I'm also not convinced it'd fix anything here: we're not talking about a
real server but about a potential attacker -- if a reply comes in with
the next tag while we're allocating it, we'll get the exact same problem
as we have right now.
Frankly, 9p has no security at all, so I'm not sure this is something we
really need to worry about, but bugs are bugs, so we might as well fix
them if someone has the time for that...

Anyway, I can appreciate that logs will definitely be easier to read, so
an option to voluntarily switch to cyclic allocation would be more than
welcome as a first step and shouldn't be too hard to do...
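For readers unfamiliar with the idr_alloc vs idr_alloc_cyclic distinction discussed above, the difference can be illustrated with a toy userspace allocator (not the kernel IDR; all names here are made up for the sketch): the dense variant always first-fits from 0, so a freed tag comes back immediately, while the cyclic variant resumes after the previous allocation and wraps.

```c
#include <assert.h>
#include <stdbool.h>

#define NTAGS 8			/* tiny tag space for illustration */

static bool used[NTAGS];
static int next_hint;		/* where the cyclic search resumes */

/* idr_alloc-style: first free slot from 0, so a freed tag is handed
 * out again immediately. */
static int alloc_dense(void)
{
	for (int t = 0; t < NTAGS; t++)
		if (!used[t]) { used[t] = true; return t; }
	return -1;
}

/* idr_alloc_cyclic-style: search starts after the previous allocation
 * and wraps, so a just-freed tag is avoided for a while. */
static int alloc_cyclic(void)
{
	for (int i = 0; i < NTAGS; i++) {
		int t = (next_hint + i) % NTAGS;
		if (!used[t]) {
			used[t] = true;
			next_hint = t + 1;
			return t;
		}
	}
	return -1;
}

static void tag_free(int t) { used[t] = false; }

/* Dense allocation reuses the freed tag at once; cyclic moves on. */
static int demo_reuse_distance(void)
{
	int a = alloc_dense();
	tag_free(a);
	int b = alloc_dense();		/* same tag again */
	int c = alloc_cyclic();
	tag_free(c);
	int d = alloc_cyclic();		/* a different tag */
	return b == a && d != c;
}
```

This also shows why cyclic allocation only shrinks the reuse window rather than closing it: once the hint wraps around the tag space, old tags do come back, which is consistent with the caveat above about an attacker replying with the next tag.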
asmadeus@codewreck.org writes:

> Schspa Shi wrote on Wed, Nov 30, 2022 at 04:14:32PM +0800:
>> > - reqs are allocated from a kmem_cache created with SLAB_TYPESAFE_BY_RCU.
>> >   This means that if we get a req from idr_find, even if it has just
>> >   been freed, it either is still in the state it was freed at (hence
>> >   refcount 0, we ignore it) or is another req coming from the same
>> >   cache (if
>>
>> If the req was newly allocated (it was in a new page), the refcount
>> may not be 0; there will be a problem in this case. It seems we can't
>> rely on this.
>>
>> We need to set the refcount to zero before adding it to the idr in
>> p9_tag_alloc.
>
> Hmm, if it's reused then it's zero by definition, but if it's a new
> allocation (uninitialized) then anything goes; that lookup could find
> and increase it before the refcount_set, and we'd have an off-by-one
> leading to a use after free. Good catch!
>
> Initializing it to zero will lead to the client busy-looping until
> after the refcount is properly set, which should work.

Why? It looks no different from the previous process here. Initializing
it to zero should make no difference.

> Setting the refcount early might have us use a re-used req before the
> tag has been changed, so that one cannot move.
>
> Could you test with just that changed, to see if syzbot still
> reproduces this bug? (perhaps add a comment if you send this)

I have uploaded a new v2 change for this. But I can't easily reproduce
this problem.

> ------
> diff --git a/net/9p/client.c b/net/9p/client.c
> index aaa37b07e30a..aa64724f6a69 100644
> --- a/net/9p/client.c
> +++ b/net/9p/client.c
> @@ -297,6 +297,7 @@ p9_tag_alloc(struct p9_client *c, int8_t type, uint t_size, uint r_size,
>  	p9pdu_reset(&req->rc);
>  	req->t_err = 0;
>  	req->status = REQ_STATUS_ALLOC;
> +	refcount_set(&req->refcount, 0);
>  	init_waitqueue_head(&req->wq);
>  	INIT_LIST_HEAD(&req->req_list);
> ------
>
>> > the refcount isn't zero, we can check its tag)
>>
>> As for the release case, the next request will have the same tag with
>> high probability. It would be better to make the tag an increasing
>> sequence, which would avoid most request reuse.
>
> I'd love to be able to do this, but it would break some servers that
> assume tags are small (e.g. using it as an index for a tag array)
> ... I thought nfs-ganesha was doing this, but they properly put tags in
> buckets, so that's one less server to worry about; still, I wouldn't
> put it past some simple servers to do that: having a way to look up a
> given tag for flush is an implementation requirement.
>
> That shouldn't be a problem though, as that will just either fail the
> guard check after lookup (m->rreq->status != REQ_STATUS_SENT) or be
> processed as a normal reply if it's already been sent by the other
> thread at this point.
> OTOH, that m->rreq->status isn't protected by m->req_lock in trans_fd,
> and that is probably another bug...

Yes, I agree with you: another bug.
On Wednesday, November 30, 2022 1:54:21 PM CET asmadeus@codewreck.org wrote:
> Christian Schoenebeck wrote on Wed, Nov 30, 2022 at 01:43:20PM +0100:
> > > > As for the release case, the next request will have the same tag
> > > > with high probability. It would be better to make the tag an
> > > > increasing sequence, which would avoid most request reuse.
> > >
> > > I'd love to be able to do this, but it would break some servers
> > > that assume tags are small (e.g. using it as an index for a tag
> > > array) ... I thought nfs-ganesha was doing this, but they properly
> > > put tags in buckets, so that's one less server to worry about;
> > > still, I wouldn't put it past some simple servers to do that:
> > > having a way to look up a given tag for flush is an implementation
> > > requirement.
> >
> > I really think it's time to emit tag numbers sequentially. If it
> > turns out that some server is broken by that, we could then simply
> > ignore replies with old/unknown tag numbers. It would also help a lot
> > when debugging 9p issues in general to know that tag numbers are not
> > reused (in the near future).
> >
> > A 9p server must not make any assumptions about how tag numbers are
> > generated by the client, whether dense or sparse, or whatever. If it
> > does, then the server is broken, which is much easier to fix than
> > synchronization issues like the one we have to deal with here.
>
> Well, it's a one-line change: just replace the idr_alloc in the else
> branch of p9_tag_alloc with idr_alloc_cyclic.
> But Linux has a history of not breaking userspace, even if it's broken.
> One could argue that the server side of a networked protocol isn't as
> tightly coupled, but I still think we should be careful with it --
> adding a new mount option to revert to the old behaviour at the very
> least.

+1 for the mount option.

> I'm also not convinced it'd fix anything here: we're not talking about
> a real server but about a potential attacker -- if a reply comes in
> with the next tag while we're allocating it, we'll get the exact same
> problem as we have right now.
> Frankly, 9p has no security at all, so I'm not sure this is something
> we really need to worry about, but bugs are bugs, so we might as well
> fix them if someone has the time for that...
>
> Anyway, I can appreciate that logs will definitely be easier to read,
> so an option to voluntarily switch to cyclic allocation would be more
> than welcome as a first step and shouldn't be too hard to do...

I would actually do it the other way around: generate continuous
sequential tags by default and only revert back to dense tags if
requested by mount option.

Is there any server implementation known to rely on the current dense
tag generation?

If there really is some exotic server somewhere that uses e.g. a simple
constant-size array to look up tags, and nobody is able to replace that
array with a hash table or something for whatever reason, then I am
pretty sure that server is limited at other ends as well (e.g. a small
'msize'). So what we could do is adjust the default behaviour according
to the other side and allow both sequential and dense tags to be set
explicitly by mount option (i.e. not just a boolean mount option).

Best regards,
Christian Schoenebeck
Schspa Shi wrote on Wed, Nov 30, 2022 at 09:15:12PM +0800:
> >> If the req was newly allocated (it was in a new page), the refcount
> >> may not be 0; there will be a problem in this case. It seems we
> >> can't rely on this.
> >>
> >> We need to set the refcount to zero before adding it to the idr in
> >> p9_tag_alloc.
> >
> > Hmm, if it's reused then it's zero by definition, but if it's a new
> > allocation (uninitialized) then anything goes; that lookup could find
> > and increase it before the refcount_set, and we'd have an off-by-one
> > leading to a use after free. Good catch!
> >
> > Initializing it to zero will lead to the client busy-looping until
> > after the refcount is properly set, which should work.
>
> Why? It looks no different from the previous process here. Initializing
> it to zero should make no difference.

I do not understand this remark.
If this is a freed request it will be zero, because we freed the request
as the refcount hit zero; but if it's a newly allocated request then the
memory is uninitialized, and the lookup can get anything.

In that case we want the refcount to be zero, so that the check in
p9_tag_lookup does not use the request until we set the refcount to 2.

> > Setting the refcount early might have us use a re-used req before the
> > tag has been changed, so that one cannot move.
> >
> > Could you test with just that changed, to see if syzbot still
> > reproduces this bug? (perhaps add a comment if you send this)
>
> I have uploaded a new v2 change for this. But I can't easily reproduce
> this problem.

Ah, I read that v2 as you actually having run some tests with this;
sorry for the misunderstanding.

Well, it's a fix anyway, so it cannot hurt to apply...
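The publish-before-initialize hazard described here can be modeled in miniature. In the sketch below (userspace C, hypothetical names), `try_get` plays the role of the `refcount_inc_not_zero` check in `p9_tag_lookup`, and the `leftover` parameter simulates whatever value the uninitialized refcount field happens to contain when the slot becomes visible.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Userspace stand-in for refcount_inc_not_zero(). */
static bool try_get(atomic_int *ref)
{
	int old = atomic_load(ref);
	do {
		if (old == 0)
			return false;	/* count is zero: back off */
	} while (!atomic_compare_exchange_weak(ref, &old, old + 1));
	return true;
}

/* Model a req slot that became visible to lookup before its refcount
 * was set: whatever the memory happens to hold decides whether a
 * concurrent lookup wrongly takes a reference. */
static bool lookup_sees_slot(int leftover)
{
	atomic_int ref;
	atomic_init(&ref, leftover);	/* "uninitialized" contents */
	return try_get(&ref);
}
```

With a stale non-zero value the lookup succeeds on an object that was never handed out (the off-by-one leading to use after free); with the proposed `refcount_set(&req->refcount, 0)` before publication, the lookup backs off until the allocator bumps the count to 2.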
Christian Schoenebeck wrote on Wed, Nov 30, 2022 at 02:25:59PM +0100:
> > I'm also not convinced it'd fix anything here: we're not talking
> > about a real server but about a potential attacker -- if a reply
> > comes in with the next tag while we're allocating it, we'll get the
> > exact same problem as we have right now.
> > Frankly, 9p has no security at all, so I'm not sure this is something
> > we really need to worry about, but bugs are bugs, so we might as well
> > fix them if someone has the time for that...
> >
> > Anyway, I can appreciate that logs will definitely be easier to read,
> > so an option to voluntarily switch to cyclic allocation would be more
> > than welcome as a first step and shouldn't be too hard to do...
>
> I would actually do it the other way around: generate continuous
> sequential tags by default and only revert back to dense tags if
> requested by mount option.
>
> Is there any server implementation known to rely on the current dense
> tag generation?

No -- I thought ganesha did when we discussed it last time, but I
checked just now and it appears to be correct.
I had a quick look at the other servers I have around (diod uses a plain
list, libixp uses a bucket list like ganesha...), but there are so many
9p servers out there that I'm far from keeping track...
Happy to give it a try and see who complains...

> If there really is some exotic server somewhere that uses e.g. a simple
> constant-size array to look up tags, and nobody is able to replace that
> array with a hash table or something for whatever reason, then I am
> pretty sure that server is limited at other ends as well (e.g. a small
> 'msize'). So what we could do is adjust the default behaviour according
> to the other side and allow both sequential and dense tags to be set
> explicitly by mount option (i.e. not just a boolean mount option).

Well, TVERSION doesn't have much negotiation capability aside from
msize; not sure what to suggest here...
asmadeus@codewreck.org writes:

> Schspa Shi wrote on Wed, Nov 30, 2022 at 09:15:12PM +0800:
>> >> If the req was newly allocated (it was in a new page), the refcount
>> >> may not be 0; there will be a problem in this case. It seems we
>> >> can't rely on this.
>> >>
>> >> We need to set the refcount to zero before adding it to the idr in
>> >> p9_tag_alloc.
>> >
>> > Hmm, if it's reused then it's zero by definition, but if it's a new
>> > allocation (uninitialized) then anything goes; that lookup could
>> > find and increase it before the refcount_set, and we'd have an
>> > off-by-one leading to a use after free. Good catch!
>> >
>> > Initializing it to zero will lead to the client busy-looping until
>> > after the refcount is properly set, which should work.
>>
>> Why? It looks no different from the previous process here.
>> Initializing it to zero should make no difference.
>
> I do not understand this remark.
> If this is a freed request it will be zero, because we freed the
> request as the refcount hit zero; but if it's a newly allocated request
> then the memory is uninitialized, and the lookup can get anything.

That was my misunderstanding. I thought you meant that there would be a
loop on the client side waiting for the refcount to become a non-zero
value. Actually, there is no such loop.

> In that case we want the refcount to be zero, so that the check in
> p9_tag_lookup does not use the request until we set the refcount to 2.
>
>> > Setting the refcount early might have us use a re-used req before
>> > the tag has been changed, so that one cannot move.
>> >
>> > Could you test with just that changed, to see if syzbot still
>> > reproduces this bug? (perhaps add a comment if you send this)
>>
>> I have uploaded a new v2 change for this. But I can't easily reproduce
>> this problem.
>
> Ah, I read that v2 as you actually having run some tests with this;
> sorry for the misunderstanding.
>
> Well, it's a fix anyway, so it cannot hurt to apply...
diff --git a/net/9p/client.c b/net/9p/client.c
index aaa37b07e30a..963cf91aa0d5 100644
--- a/net/9p/client.c
+++ b/net/9p/client.c
@@ -440,7 +440,7 @@ void p9_client_cb(struct p9_client *c, struct p9_req_t *req, int status)
 	smp_wmb();
 	req->status = status;
-	wake_up(&req->wq);
+	wake_up_all(&req->wq);
 	p9_debug(P9_DEBUG_MUX, "wakeup: %d\n", req->tc.tag);
 	p9_req_put(c, req);
 }
diff --git a/net/9p/trans_fd.c b/net/9p/trans_fd.c
index eeea0a6a75b6..ee2d6b231af1 100644
--- a/net/9p/trans_fd.c
+++ b/net/9p/trans_fd.c
@@ -30,6 +30,7 @@
 #include <net/9p/transport.h>
 #include <linux/syscalls.h> /* killme */
+#include <linux/wait.h>
 
 #define P9_PORT 564
 #define MAX_SOCK_BUF (1024*1024)
@@ -728,6 +729,17 @@ static int p9_fd_cancelled(struct p9_client *client, struct p9_req_t *req)
 		return 0;
 	}
 
+	/* If the request has been sent to the server, we need to wait
+	 * for the job to finish.
+	 */
+	if (req->status == REQ_STATUS_SENT) {
+		spin_unlock(&m->req_lock);
+		p9_debug(P9_DEBUG_TRANS, "client %p req %p wait done\n",
+			 client, req);
+		wait_event(req->wq, req->status >= REQ_STATUS_RCVD);
+
+		return 0;
+	}
 	/* we haven't received a response for oldreq,
 	 * remove it from the list. */
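For readers less familiar with the kernel wait API used in this patch: the `wait_event(req->wq, req->status >= REQ_STATUS_RCVD)` / `wake_up_all(&req->wq)` pairing is a condition-guarded sleep. The same shape can be sketched in userspace with a pthread condition variable (illustrative names only; this is not the kernel code, and the kernel version also has memory-ordering concerns, e.g. the `smp_wmb()` above, that the mutex handles for us here).

```c
#include <assert.h>
#include <pthread.h>

/* Userspace analogue of wait_event()/wake_up_all() for the patch's
 * "cancelled waits until the request is RCVD" logic. */
enum { REQ_STATUS_SENT = 2, REQ_STATUS_RCVD = 4 };

struct req {
	pthread_mutex_t lock;
	pthread_cond_t  wq;	/* stands in for the wait queue */
	int             status;
};

/* Plays the role of p9_client_cb(): set the status, then wake every
 * waiter (wake_up_all, not wake_up, so no waiter is missed). */
static void *responder(void *arg)
{
	struct req *r = arg;
	pthread_mutex_lock(&r->lock);
	r->status = REQ_STATUS_RCVD;
	pthread_cond_broadcast(&r->wq);
	pthread_mutex_unlock(&r->lock);
	return NULL;
}

/* Plays the role of the patched p9_fd_cancelled(): a SENT request is
 * not torn down until the response path has moved it past RCVD. */
static int cancelled_waits_for_rcvd(void)
{
	struct req r = { PTHREAD_MUTEX_INITIALIZER,
			 PTHREAD_COND_INITIALIZER, REQ_STATUS_SENT };
	pthread_t t;
	pthread_create(&t, NULL, responder, &r);
	pthread_mutex_lock(&r.lock);
	while (r.status < REQ_STATUS_RCVD)	/* wait_event condition */
		pthread_cond_wait(&r.wq, &r.lock);
	pthread_mutex_unlock(&r.lock);
	pthread_join(t, NULL);
	return r.status;
}
```

The while-loop re-check after each wakeup is the condvar equivalent of `wait_event`'s condition re-evaluation; it is also why switching `wake_up` to `wake_up_all` in the patch is safe for multiple waiters.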