Message ID | 20221129162251.90790-1-schspa@gmail.com |
---|---|
State | New |
Headers |
Series |
9p: fix crash when transaction killed
|
|
Commit Message
Schspa Shi
Nov. 29, 2022, 4:22 p.m. UTC
The fd transport layer does not fully support cancelling a request. When a request is in the REQ_STATUS_SENT state, p9_fd_cancelled will forcibly delete the request, while p9_read_work/p9_write_work may still be using it. This causes a use-after-free.
Here is the log from syzbot:
Corrupted memory at 0xffff88807eade00b [ 0xff 0x07 0x00 0x00 0x00 0x00
0x00 0x00 . . . . . . . . ] (in kfence-#110):
p9_fcall_fini net/9p/client.c:248 [inline]
p9_req_put net/9p/client.c:396 [inline]
p9_req_put+0x208/0x250 net/9p/client.c:390
p9_client_walk+0x247/0x540 net/9p/client.c:1165
clone_fid fs/9p/fid.h:21 [inline]
v9fs_fid_xattr_set+0xe4/0x2b0 fs/9p/xattr.c:118
v9fs_xattr_set fs/9p/xattr.c:100 [inline]
v9fs_xattr_handler_set+0x6f/0x120 fs/9p/xattr.c:159
__vfs_setxattr+0x119/0x180 fs/xattr.c:182
__vfs_setxattr_noperm+0x129/0x5f0 fs/xattr.c:216
__vfs_setxattr_locked+0x1d3/0x260 fs/xattr.c:277
vfs_setxattr+0x143/0x340 fs/xattr.c:309
setxattr+0x146/0x160 fs/xattr.c:617
path_setxattr+0x197/0x1c0 fs/xattr.c:636
__do_sys_setxattr fs/xattr.c:652 [inline]
__se_sys_setxattr fs/xattr.c:648 [inline]
__ia32_sys_setxattr+0xc0/0x160 fs/xattr.c:648
do_syscall_32_irqs_on arch/x86/entry/common.c:112 [inline]
__do_fast_syscall_32+0x65/0xf0 arch/x86/entry/common.c:178
do_fast_syscall_32+0x33/0x70 arch/x86/entry/common.c:203
entry_SYSENTER_compat_after_hwframe+0x70/0x82
Below is a similar scenario. The one in the syzbot log looks more
complicated than this, but the root cause appears to be the same.
T21124                               p9_write_work        p9_read_work
======================== first trans =================================
p9_client_walk
  p9_client_rpc
    p9_client_prepare_req
    /* req->refcount == 2 */
    c->trans_mod->request(c, req);
      p9_fd_request
        req move to unsent_req_list
                                     req->status = REQ_STATUS_SENT;
                                     req move to req_list
                                     << send to server >>
    wait_event_killable
    << get kill signal >>
    if (c->trans_mod->cancel(c, req))
      p9_client_flush(c, req);
        /* send flush request */
        req = p9_client_rpc(c, P9_TFLUSH, "w", oldtag);
        if (c->trans_mod->cancelled)
          c->trans_mod->cancelled(c, oldreq);
          /* old req was deleted from req_list */
    /* req->refcount == 1 */
    p9_req_put
    /* req->refcount == 0 */
    << preempted >>
                                          << get response, UAF here >>
                                          m->rreq = p9_tag_lookup(m->client, m->rc.tag);
                                          /* req->refcount == 1 */
                                          << do response >>
                                          p9_client_cb(m->client, m->rreq, REQ_STATUS_RCVD);
                                          /* req->refcount == 0 */
                                          p9_fcall_fini
                                          /* request has been freed */
    p9_fcall_fini
    /* double free */
                                          p9_req_put(m->client, m->rreq);
                                          /* req->refcount == 1 */
To fix it, make p9_fd_cancelled wait until a request still in
REQ_STATUS_SENT state has completed before returning.
Reported-by: syzbot+8f1060e2aaf8ca55220b@syzkaller.appspotmail.com
Signed-off-by: Schspa Shi <schspa@gmail.com>
---
net/9p/client.c | 2 +-
net/9p/trans_fd.c | 12 ++++++++++++
2 files changed, 13 insertions(+), 1 deletion(-)
Comments
Schspa Shi <schspa@gmail.com> writes:

> diff --git a/net/9p/client.c b/net/9p/client.c
> index aaa37b07e30a..963cf91aa0d5 100644
> --- a/net/9p/client.c
> +++ b/net/9p/client.c
> @@ -440,7 +440,7 @@ void p9_client_cb(struct p9_client *c, struct p9_req_t *req, int status)
>  	smp_wmb();
>  	req->status = status;
>
> -	wake_up(&req->wq);
> +	wake_up_all(&req->wq);
>  	p9_debug(P9_DEBUG_MUX, "wakeup: %d\n", req->tc.tag);
>  	p9_req_put(c, req);
>  }
> diff --git a/net/9p/trans_fd.c b/net/9p/trans_fd.c
> index eeea0a6a75b6..ee2d6b231af1 100644
> --- a/net/9p/trans_fd.c
> +++ b/net/9p/trans_fd.c
> @@ -30,6 +30,7 @@
>  #include <net/9p/transport.h>
>
>  #include <linux/syscalls.h> /* killme */
> +#include <linux/wait.h>
>
>  #define P9_PORT 564
>  #define MAX_SOCK_BUF (1024*1024)
> @@ -728,6 +729,17 @@ static int p9_fd_cancelled(struct p9_client *client, struct p9_req_t *req)
>  		return 0;
>  	}
>
> +	/* If the request has been sent to the server, we need to wait
> +	 * for the job to finish.
> +	 */
> +	if (req->status == REQ_STATUS_SENT) {
> +		spin_unlock(&m->req_lock);
> +		p9_debug(P9_DEBUG_TRANS, "client %p req %p wait done\n",
> +			 client, req);
> +		wait_event(req->wq, req->status >= REQ_STATUS_RCVD);
> +
> +		return 0;
> +	}
>  	/* we haven't received a response for oldreq,
>  	 * remove it from the list.
>  	 */

Adding Christian Schoenebeck (the previous mail address had a typo).
On Tuesday, November 29, 2022 5:26:46 PM CET Schspa Shi wrote:
> To fix it, we can wait the request with status REQ_STATUS_SENT returned.

A 9p server might or might not send a reply to a cancelled request. If the server notices the client's Tflush request early enough, it simply discards the old (cancelled) request and does not send any reply for it. If the server notices the Tflush too late, it sends a response to the old request.

http://ericvh.github.io/9p-rfc/rfc9p2000.html#anchor28

However, after sending Tflush the client waits for the corresponding Rflush response, and at that point the situation should be clear: no further response is expected from the server for the old request. And that is what the Linux client does. Which server implementation caused this?

> -	wake_up(&req->wq);
> +	wake_up_all(&req->wq);

Purpose?
Schspa Shi wrote on Wed, Nov 30, 2022 at 12:22:51AM +0800:
> To fix it, we can wait the request with status REQ_STATUS_SENT returned.

Christian replied on this (we cannot wait), but I agree with him -- the scenario you describe is protected by p9_tag_lookup checking the refcount with refcount_inc_not_zero (p9_req_try_get).

The normal scenarios for flush are as follows:
- cancel before the request is sent: no flush, just free
- flush is ignored and the reply comes first: we get the reply from the original request, then the reply from the flush
- flush is handled and the reply never comes: we only get the reply from the flush

Protocol-wise, we can safely reuse the tag after the flush reply has been received; and as far as I can follow the code, we only ever free the tag (last p9_fcall_fini) after the flush has returned, so the entry should be protected.

If we receive a response on the given tag between cancelled and the main thread going out, the request has been marked FLSHD and should be ignored. There is one p9_req_put in p9_read_work() in this case, but it corresponds to the ref obtained by p9_tag_lookup(), so it should be valid.

I'm happy to believe we have a race somewhere (even if no sane server would produce it), but right now I don't see it looking at the code... :/
asmadeus@codewreck.org writes:
> Christian replied on this (we cannot wait) but I agree with him

Yes, this is what I worry about too: this wait may cause a deadlock.

> the scenario you describe is protected by p9_tag_lookup checking for
> refcount with refcount_inc_not_zero (p9_req_try_get).

Thanks for pointing out the zero-value check here; the scenario in the commit message does not hold.

> I'm happy to believe we have a race somewhere (even if no sane server
> would produce it), but right now I don't see it looking at the code.. :/

And I think there is a race too, because syzbot has reported 9p fs memory corruption multiple times.

As for the problem: p9_tag_lookup only takes rcu_read_lock when accessing the IDR -- why doesn't it take p9_client->lock? Maybe the root cause is that a lock is missing here.
Schspa Shi <schspa@gmail.com> writes:
> Yes, this is what I worry about too: this wait may cause a deadlock.

@Christian: It seems we don't need this wait; the problem may be caused by a missing lock in p9_tag_lookup.

Adding Christian Schoenebeck (the previous mail address had a typo).
(fixed Christophe's address, hopefully that will do for good...)

Schspa Shi wrote on Wed, Nov 30, 2022 at 10:22:44AM +0800:
> > I'm happy to believe we have a race somewhere (even if no sane server
> > would produce it), but right now I don't see it looking at the code.. :/
>
> And I think there is a race too, because syzbot has reported 9p fs
> memory corruption multiple times.

Yes, no point in denying that :)

> As for the problem, p9_tag_lookup only takes the rcu_read_lock when
> accessing the IDR; why doesn't it take the p9_client->lock? Maybe the
> root cause is that a lock is missing here.

It shouldn't need to, but happy to try adding it.

For the logic:
- idr_find is RCU-safe (trusting the comment above it)
- reqs are allocated from a kmem_cache created with SLAB_TYPESAFE_BY_RCU.
  This means that if we get a req from idr_find, even if it has just
  been freed, it either is still in the state it was freed at (hence
  refcount 0, we ignore it) or is another req coming from the same cache
  (if the refcount isn't zero, we can check its tag).

The refcount itself is an atomic operation, so it doesn't require a
lock. ... And on the off chance I hadn't considered: if we're already
dealing with a new request with the same tag here, we'll be updating its
status, so another receive for it shouldn't use it?...

I don't think adding the client lock helps with anything here, but it'll
certainly simplify this logic, as we are then guaranteed not to get
obsolete results from idr_find.

Unfortunately adding a lock will slow things down regardless of
correctness, so it might just make the race much harder to hit without
fixing it, and we might not notice that, so it'd be good to understand
the race.
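The guard being relied on here can be sketched in userspace C. `try_get` below is a stand-in for the kernel's `refcount_inc_not_zero()` (as used by `p9_req_try_get`), built from C11 atomics; the scenario names are illustrative, not kernel code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Userspace stand-in for refcount_inc_not_zero(): take a reference
 * only if the object still holds one, via a CAS loop. */
static bool try_get(atomic_int *ref)
{
	int old = atomic_load(ref);
	do {
		if (old == 0)
			return false;	/* freed object: leave it alone */
	} while (!atomic_compare_exchange_weak(ref, &old, old + 1));
	return true;
}

/* A freed req still visible through SLAB_TYPESAFE_BY_RCU has refcount
 * 0, so the lookup skips it; a live one gains an extra reference. */
static int demo(void)
{
	atomic_int freed, live;
	atomic_init(&freed, 0);
	atomic_init(&live, 2);
	if (try_get(&freed))
		return -1;		/* must not happen */
	if (!try_get(&live))
		return -2;		/* must not happen */
	return atomic_load(&live);	/* live ref was bumped */
}
```

The CAS loop is what makes the check-and-increment atomic: a plain `if (refcount != 0) refcount++` would race with the final put.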
asmadeus@codewreck.org writes:

> (fixed Christophe's address, hopefully that will do for good...)
>
> Schspa Shi wrote on Wed, Nov 30, 2022 at 10:22:44AM +0800:
>> > I'm happy to believe we have a race somewhere (even if no sane server
>> > would produce it), but right now I don't see it looking at the code.. :/
>>
>> And I think there is a race too, because syzbot has reported 9p fs
>> memory corruption multiple times.
>
> Yes, no point in denying that :)
>
>> As for the problem, p9_tag_lookup only takes the rcu_read_lock when
>> accessing the IDR; why doesn't it take the p9_client->lock? Maybe the
>> root cause is that a lock is missing here.
>
> It shouldn't need to, but happy to try adding it.
> For the logic:
> - idr_find is RCU-safe (trusting the comment above it)
> - reqs are allocated from a kmem_cache created with SLAB_TYPESAFE_BY_RCU.
>   This means that if we get a req from idr_find, even if it has just
>   been freed, it either is still in the state it was freed at (hence
>   refcount 0, we ignore it) or is another req coming from the same
>   cache (if

If the req was newly allocated (it was in a new page), the refcount may
not be 0; there will be a problem in this case. It seems we can't rely
on this.

We need to set the refcount to zero before adding it to the idr in
p9_tag_alloc.

>   the refcount isn't zero, we can check its tag)

As for the release case, the next request will have the same tag with
high probability. It would be better to make the tag an increasing
sequence, which would avoid most request reuse.

> The refcount itself is an atomic operation, so it doesn't require a
> lock. ... And on the off chance I hadn't considered: if we're already
> dealing with a new request with the same tag here, we'll be updating
> its status, so another receive for it shouldn't use it?...
>
> I don't think adding the client lock helps with anything here, but
> it'll certainly simplify this logic, as we are then guaranteed not to
> get obsolete results from idr_find.
>
> Unfortunately adding a lock will slow things down regardless of
> correctness, so it might just make the race much harder to hit without
> fixing it, and we might not notice that, so it'd be good to understand
> the race.
Schspa Shi wrote on Wed, Nov 30, 2022 at 04:14:32PM +0800:
> > - reqs are allocated from a kmem_cache created with SLAB_TYPESAFE_BY_RCU.
> >   This means that if we get a req from idr_find, even if it has just
> >   been freed, it either is still in the state it was freed at (hence
> >   refcount 0, we ignore it) or is another req coming from the same
> >   cache (if
>
> If the req was newly allocated (it was in a new page), the refcount may
> not be 0; there will be a problem in this case. It seems we can't rely
> on this.
>
> We need to set the refcount to zero before adding it to the idr in
> p9_tag_alloc.

Hmm, if it's reused then it's zero by definition, but if it's a new
allocation (uninitialized) then anything goes; that lookup could find
and increase it before the refcount_set, and we'd have an off-by-one
leading to a use after free. Good catch!

Initializing it to zero will lead to the client busy-looping until after
the refcount is properly set, which should work.
Setting the refcount early might have us use a re-used req before the
tag has been changed, so that one cannot move.

Could you test with just that changed, to see if syzbot still reproduces
this bug? (perhaps add a comment if you send this)

------
diff --git a/net/9p/client.c b/net/9p/client.c
index aaa37b07e30a..aa64724f6a69 100644
--- a/net/9p/client.c
+++ b/net/9p/client.c
@@ -297,6 +297,7 @@ p9_tag_alloc(struct p9_client *c, int8_t type, uint t_size, uint r_size,
 	p9pdu_reset(&req->rc);
 	req->t_err = 0;
 	req->status = REQ_STATUS_ALLOC;
+	refcount_set(&req->refcount, 0);
 	init_waitqueue_head(&req->wq);
 	INIT_LIST_HEAD(&req->req_list);
------

> >   the refcount isn't zero, we can check its tag)
>
> As for the release case, the next request will have the same tag with
> high probability. It would be better to make the tag an increasing
> sequence, which would avoid most request reuse.

I'd love to be able to do this, but it would break some servers that
assume tags are small (e.g. using it as an index for a tag array)
... I thought nfs-ganesha was doing this, but they properly put tags in
buckets, so that's one less server to worry about; still, I wouldn't put
it past some simple servers to do that: having a way to look up a given
tag for flush is an implementation requirement.

That shouldn't be a problem though, as that will just either fail the
guard check after lookup (m->rreq->status != REQ_STATUS_SENT) or be
processed as a normal reply if it's already been sent by the other
thread at this point.
OTOH, that m->rreq->status isn't protected by m->req_lock in trans_fd,
and that is probably another bug...
On Wednesday, November 30, 2022 12:06:31 PM CET asmadeus@codewreck.org wrote:
> Schspa Shi wrote on Wed, Nov 30, 2022 at 04:14:32PM +0800:
> > > - reqs are allocated from a kmem_cache created with SLAB_TYPESAFE_BY_RCU.
> > >   This means that if we get a req from idr_find, even if it has just
> > >   been freed, it either is still in the state it was freed at (hence
> > >   refcount 0, we ignore it) or is another req coming from the same
> > >   cache (if
> >
> > If the req was newly allocated (it was in a new page), the refcount
> > may not be 0; there will be a problem in this case. It seems we can't
> > rely on this.
> >
> > We need to set the refcount to zero before adding it to the idr in
> > p9_tag_alloc.
>
> Hmm, if it's reused then it's zero by definition, but if it's a new
> allocation (uninitialized) then anything goes; that lookup could find
> and increase it before the refcount_set, and we'd have an off-by-one
> leading to a use after free. Good catch!
>
> Initializing it to zero will lead to the client busy-looping until
> after the refcount is properly set, which should work.
> Setting the refcount early might have us use a re-used req before the
> tag has been changed, so that one cannot move.
>
> Could you test with just that changed, to see if syzbot still
> reproduces this bug? (perhaps add a comment if you send this)
>
> ------
> diff --git a/net/9p/client.c b/net/9p/client.c
> index aaa37b07e30a..aa64724f6a69 100644
> --- a/net/9p/client.c
> +++ b/net/9p/client.c
> @@ -297,6 +297,7 @@ p9_tag_alloc(struct p9_client *c, int8_t type, uint t_size, uint r_size,
>  	p9pdu_reset(&req->rc);
>  	req->t_err = 0;
>  	req->status = REQ_STATUS_ALLOC;
> +	refcount_set(&req->refcount, 0);
>  	init_waitqueue_head(&req->wq);
>  	INIT_LIST_HEAD(&req->req_list);
> ------
>
> > >   the refcount isn't zero, we can check its tag)
> >
> > As for the release case, the next request will have the same tag with
> > high probability. It would be better to make the tag an increasing
> > sequence, which would avoid most request reuse.
>
> I'd love to be able to do this, but it would break some servers that
> assume tags are small (e.g. using it as an index for a tag array)
> ... I thought nfs-ganesha was doing this, but they properly put tags in
> buckets, so that's one less server to worry about; still, I wouldn't
> put it past some simple servers to do that: having a way to look up a
> given tag for flush is an implementation requirement.

I really think it's time to emit tag numbers sequentially. If it turns
out that some server is broken by that, we could then simply ignore
replies with old/unknown tag numbers. It would also help a lot when
debugging 9p issues in general to know that tag numbers are not reused
(in the near future).

A 9p server must not make any assumptions about how tag numbers are
generated by the client, whether dense or sparse, or whatever. If it
does, then the server is broken, which is much easier to fix than
synchronization issues like the one we have to deal with here.

> That shouldn't be a problem though, as that will just either fail the
> guard check after lookup (m->rreq->status != REQ_STATUS_SENT) or be
> processed as a normal reply if it's already been sent by the other
> thread at this point.
> OTOH, that m->rreq->status isn't protected by m->req_lock in trans_fd,
> and that is probably another bug...
Christian Schoenebeck wrote on Wed, Nov 30, 2022 at 01:43:20PM +0100:
> > > As for the release case, the next request will have the same tag
> > > with high probability. It would be better to make the tag an
> > > increasing sequence, which would avoid most request reuse.
> >
> > I'd love to be able to do this, but it would break some servers that
> > assume tags are small (e.g. using it as an index for a tag array)
> > ... I thought nfs-ganesha was doing this, but they properly put tags
> > in buckets, so that's one less server to worry about; still, I
> > wouldn't put it past some simple servers to do that: having a way to
> > look up a given tag for flush is an implementation requirement.
>
> I really think it's time to emit tag numbers sequentially. If it turns
> out that some server is broken by that, we could then simply ignore
> replies with old/unknown tag numbers. It would also help a lot when
> debugging 9p issues in general to know that tag numbers are not reused
> (in the near future).
>
> A 9p server must not make any assumptions about how tag numbers are
> generated by the client, whether dense or sparse, or whatever. If it
> does, then the server is broken, which is much easier to fix than
> synchronization issues like the one we have to deal with here.

Well, it's a one-line change: just replace the idr_alloc in the else
branch of p9_tag_alloc with idr_alloc_cyclic.
But Linux has a history of not breaking userspace, even if it's broken.
One could argue that the server side of a networked protocol isn't as
tightly coupled, but I still think we should be careful with it --
adding a new mount option to revert to the old behaviour at the very
least.

I'm also not convinced it'd fix anything here: we're not talking about a
real server but about a potential attacker -- if a reply comes in with
the next tag while we're allocating it, we'll get the exact same problem
as we have right now.
Frankly, 9p has no security at all, so I'm not sure this is something we
really need to worry about, but bugs are bugs, so we might as well fix
them if someone has the time for that...

Anyway, I can appreciate that logs will definitely be easier to read, so
an option to voluntarily switch to cyclic allocation would be more than
welcome as a first step and shouldn't be too hard to do...
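For readers unfamiliar with the idr_alloc vs idr_alloc_cyclic distinction discussed above, the difference can be illustrated with a toy userspace allocator (not the kernel IDR; all names here are made up for the sketch): the dense variant always first-fits from 0, so a freed tag comes back immediately, while the cyclic variant resumes after the previous allocation and wraps.

```c
#include <assert.h>
#include <stdbool.h>

#define NTAGS 8			/* tiny tag space for illustration */

static bool used[NTAGS];
static int next_hint;		/* where the cyclic search resumes */

/* idr_alloc-style: first free slot from 0, so a freed tag is handed
 * out again immediately. */
static int alloc_dense(void)
{
	for (int t = 0; t < NTAGS; t++)
		if (!used[t]) { used[t] = true; return t; }
	return -1;
}

/* idr_alloc_cyclic-style: search starts after the previous allocation
 * and wraps, so a just-freed tag is avoided for a while. */
static int alloc_cyclic(void)
{
	for (int i = 0; i < NTAGS; i++) {
		int t = (next_hint + i) % NTAGS;
		if (!used[t]) {
			used[t] = true;
			next_hint = t + 1;
			return t;
		}
	}
	return -1;
}

static void tag_free(int t) { used[t] = false; }

/* Dense allocation reuses the freed tag at once; cyclic moves on. */
static int demo_reuse_distance(void)
{
	int a = alloc_dense();
	tag_free(a);
	int b = alloc_dense();		/* same tag again */
	int c = alloc_cyclic();
	tag_free(c);
	int d = alloc_cyclic();		/* a different tag */
	return b == a && d != c;
}
```

This also shows why cyclic allocation only shrinks the reuse window rather than closing it: once the hint wraps around the tag space, old tags do come back, which is consistent with the caveat above about an attacker replying with the next tag.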
asmadeus@codewreck.org writes:

> Schspa Shi wrote on Wed, Nov 30, 2022 at 04:14:32PM +0800:
>> > - reqs are allocated from a kmem_cache created with SLAB_TYPESAFE_BY_RCU.
>> >   This means that if we get a req from idr_find, even if it has just
>> >   been freed, it either is still in the state it was freed at (hence
>> >   refcount 0, we ignore it) or is another req coming from the same
>> >   cache (if
>>
>> If the req was newly allocated (it was in a new page), the refcount
>> may not be 0; there will be a problem in this case. It seems we can't
>> rely on this.
>>
>> We need to set the refcount to zero before adding it to the idr in
>> p9_tag_alloc.
>
> Hmm, if it's reused then it's zero by definition, but if it's a new
> allocation (uninitialized) then anything goes; that lookup could find
> and increase it before the refcount_set, and we'd have an off-by-one
> leading to a use after free. Good catch!
>
> Initializing it to zero will lead to the client busy-looping until
> after the refcount is properly set, which should work.

Why? It looks no different from the previous process here. Initializing
it to zero should make no difference.

> Setting the refcount early might have us use a re-used req before the
> tag has been changed, so that one cannot move.
>
> Could you test with just that changed, to see if syzbot still
> reproduces this bug? (perhaps add a comment if you send this)

I have uploaded a new v2 change for this. But I can't easily reproduce
this problem.

> ------
> diff --git a/net/9p/client.c b/net/9p/client.c
> index aaa37b07e30a..aa64724f6a69 100644
> --- a/net/9p/client.c
> +++ b/net/9p/client.c
> @@ -297,6 +297,7 @@ p9_tag_alloc(struct p9_client *c, int8_t type, uint t_size, uint r_size,
>  	p9pdu_reset(&req->rc);
>  	req->t_err = 0;
>  	req->status = REQ_STATUS_ALLOC;
> +	refcount_set(&req->refcount, 0);
>  	init_waitqueue_head(&req->wq);
>  	INIT_LIST_HEAD(&req->req_list);
> ------
>
>> > the refcount isn't zero, we can check its tag)
>>
>> As for the release case, the next request will have the same tag with
>> high probability. It would be better to make the tag an increasing
>> sequence, which would avoid most request reuse.
>
> I'd love to be able to do this, but it would break some servers that
> assume tags are small (e.g. using it as an index for a tag array)
> ... I thought nfs-ganesha was doing this, but they properly put tags in
> buckets, so that's one less server to worry about; still, I wouldn't
> put it past some simple servers to do that: having a way to look up a
> given tag for flush is an implementation requirement.
>
> That shouldn't be a problem though, as that will just either fail the
> guard check after lookup (m->rreq->status != REQ_STATUS_SENT) or be
> processed as a normal reply if it's already been sent by the other
> thread at this point.
> OTOH, that m->rreq->status isn't protected by m->req_lock in trans_fd,
> and that is probably another bug...

Yes, I agree with you: another bug.
On Wednesday, November 30, 2022 1:54:21 PM CET asmadeus@codewreck.org wrote:
> Christian Schoenebeck wrote on Wed, Nov 30, 2022 at 01:43:20PM +0100:
> > > > As for the release case, the next request will have the same tag
> > > > with high probability. It would be better to make the tag an
> > > > increasing sequence, which would avoid most request reuse.
> > >
> > > I'd love to be able to do this, but it would break some servers
> > > that assume tags are small (e.g. using it as an index for a tag
> > > array) ... I thought nfs-ganesha was doing this, but they properly
> > > put tags in buckets, so that's one less server to worry about;
> > > still, I wouldn't put it past some simple servers to do that:
> > > having a way to look up a given tag for flush is an implementation
> > > requirement.
> >
> > I really think it's time to emit tag numbers sequentially. If it
> > turns out that some server is broken by that, we could then simply
> > ignore replies with old/unknown tag numbers. It would also help a lot
> > when debugging 9p issues in general to know that tag numbers are not
> > reused (in the near future).
> >
> > A 9p server must not make any assumptions about how tag numbers are
> > generated by the client, whether dense or sparse, or whatever. If it
> > does, then the server is broken, which is much easier to fix than
> > synchronization issues like the one we have to deal with here.
>
> Well, it's a one-line change: just replace the idr_alloc in the else
> branch of p9_tag_alloc with idr_alloc_cyclic.
> But Linux has a history of not breaking userspace, even if it's broken.
> One could argue that the server side of a networked protocol isn't as
> tightly coupled, but I still think we should be careful with it --
> adding a new mount option to revert to the old behaviour at the very
> least.

+1 for the mount option.

> I'm also not convinced it'd fix anything here: we're not talking about
> a real server but about a potential attacker -- if a reply comes in
> with the next tag while we're allocating it, we'll get the exact same
> problem as we have right now.
> Frankly, 9p has no security at all, so I'm not sure this is something
> we really need to worry about, but bugs are bugs, so we might as well
> fix them if someone has the time for that...
>
> Anyway, I can appreciate that logs will definitely be easier to read,
> so an option to voluntarily switch to cyclic allocation would be more
> than welcome as a first step and shouldn't be too hard to do...

I would actually do it the other way around: generate continuous
sequential tags by default and only revert back to dense tags if
requested by mount option.

Is there any server implementation known to rely on the current dense
tag generation?

If there really is some exotic server somewhere that uses e.g. a simple
constant-size array to look up tags, and nobody is able to replace that
array with a hash table or something for whatever reason, then I am
pretty sure that server is limited at other ends as well (e.g. a small
'msize'). So what we could do is adjust the default behaviour according
to the other side and allow both sequential and dense tags to be set
explicitly by mount option (i.e. not just a boolean mount option).

Best regards,
Christian Schoenebeck
Schspa Shi wrote on Wed, Nov 30, 2022 at 09:15:12PM +0800:
> >> If the req was newly allocated (it was in a new page), the refcount
> >> may not be 0; there will be a problem in this case. It seems we
> >> can't rely on this.
> >>
> >> We need to set the refcount to zero before adding it to the idr in
> >> p9_tag_alloc.
> >
> > Hmm, if it's reused then it's zero by definition, but if it's a new
> > allocation (uninitialized) then anything goes; that lookup could find
> > and increase it before the refcount_set, and we'd have an off-by-one
> > leading to a use after free. Good catch!
> >
> > Initializing it to zero will lead to the client busy-looping until
> > after the refcount is properly set, which should work.
>
> Why? It looks no different from the previous process here. Initializing
> it to zero should make no difference.

I do not understand this remark.
If this is a freed request it will be zero, because we freed the request
as the refcount hit zero; but if it's a newly allocated request then the
memory is uninitialized, and the lookup can get anything.

In that case we want the refcount to be zero, so that the check in
p9_tag_lookup does not use the request until we set the refcount to 2.

> > Setting the refcount early might have us use a re-used req before the
> > tag has been changed, so that one cannot move.
> >
> > Could you test with just that changed, to see if syzbot still
> > reproduces this bug? (perhaps add a comment if you send this)
>
> I have uploaded a new v2 change for this. But I can't easily reproduce
> this problem.

Ah, I read that v2 as you actually having run some tests with this;
sorry for the misunderstanding.

Well, it's a fix anyway, so it cannot hurt to apply...
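The publish-before-initialize hazard described here can be modeled in miniature. In the sketch below (userspace C, hypothetical names), `try_get` plays the role of the `refcount_inc_not_zero` check in `p9_tag_lookup`, and the `leftover` parameter simulates whatever value the uninitialized refcount field happens to contain when the slot becomes visible.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Userspace stand-in for refcount_inc_not_zero(). */
static bool try_get(atomic_int *ref)
{
	int old = atomic_load(ref);
	do {
		if (old == 0)
			return false;	/* count is zero: back off */
	} while (!atomic_compare_exchange_weak(ref, &old, old + 1));
	return true;
}

/* Model a req slot that became visible to lookup before its refcount
 * was set: whatever the memory happens to hold decides whether a
 * concurrent lookup wrongly takes a reference. */
static bool lookup_sees_slot(int leftover)
{
	atomic_int ref;
	atomic_init(&ref, leftover);	/* "uninitialized" contents */
	return try_get(&ref);
}
```

With a stale non-zero value the lookup succeeds on an object that was never handed out (the off-by-one leading to use after free); with the proposed `refcount_set(&req->refcount, 0)` before publication, the lookup backs off until the allocator bumps the count to 2.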
Christian Schoenebeck wrote on Wed, Nov 30, 2022 at 02:25:59PM +0100:
> > I'm also not convinced it'd fix anything here: we're not talking
> > about a real server but about a potential attacker -- if a reply
> > comes in with the next tag while we're allocating it, we'll get the
> > exact same problem as we have right now.
> > Frankly, 9p has no security at all, so I'm not sure this is something
> > we really need to worry about, but bugs are bugs, so we might as well
> > fix them if someone has the time for that...
> >
> > Anyway, I can appreciate that logs will definitely be easier to read,
> > so an option to voluntarily switch to cyclic allocation would be more
> > than welcome as a first step and shouldn't be too hard to do...
>
> I would actually do it the other way around: generate continuous
> sequential tags by default and only revert back to dense tags if
> requested by mount option.
>
> Is there any server implementation known to rely on the current dense
> tag generation?

No -- I thought ganesha did when we discussed it last time, but I
checked just now and it appears to be correct.
I had a quick look at the other servers I have around (diod uses a plain
list, libixp uses a bucket list like ganesha...), but there are so many
9p servers out there that I'm far from keeping track...
Happy to give it a try and see who complains...

> If there really is some exotic server somewhere that uses e.g. a simple
> constant-size array to look up tags, and nobody is able to replace that
> array with a hash table or something for whatever reason, then I am
> pretty sure that server is limited at other ends as well (e.g. a small
> 'msize'). So what we could do is adjust the default behaviour according
> to the other side and allow both sequential and dense tags to be set
> explicitly by mount option (i.e. not just a boolean mount option).

Well, TVERSION doesn't have much negotiation capability aside from
msize; not sure what to suggest here...
asmadeus@codewreck.org writes:

> Schspa Shi wrote on Wed, Nov 30, 2022 at 09:15:12PM +0800:
>> >> If the req was newly allocated (it was in a new page), the refcount
>> >> may not be 0; there will be a problem in this case. It seems we
>> >> can't rely on this.
>> >>
>> >> We need to set the refcount to zero before adding it to the idr in
>> >> p9_tag_alloc.
>> >
>> > Hmm, if it's reused then it's zero by definition, but if it's a new
>> > allocation (uninitialized) then anything goes; that lookup could
>> > find and increase it before the refcount_set, and we'd have an
>> > off-by-one leading to a use after free. Good catch!
>> >
>> > Initializing it to zero will lead to the client busy-looping until
>> > after the refcount is properly set, which should work.
>>
>> Why? It looks no different from the previous process here.
>> Initializing it to zero should make no difference.
>
> I do not understand this remark.
> If this is a freed request it will be zero, because we freed the
> request as the refcount hit zero; but if it's a newly allocated request
> then the memory is uninitialized, and the lookup can get anything.

That was my misunderstanding. I thought you meant that there would be a
loop on the client side waiting for the refcount to become a non-zero
value. Actually, there is no such loop.

> In that case we want the refcount to be zero, so that the check in
> p9_tag_lookup does not use the request until we set the refcount to 2.
>
>> > Setting the refcount early might have us use a re-used req before
>> > the tag has been changed, so that one cannot move.
>> >
>> > Could you test with just that changed, to see if syzbot still
>> > reproduces this bug? (perhaps add a comment if you send this)
>>
>> I have uploaded a new v2 change for this. But I can't easily reproduce
>> this problem.
>
> Ah, I read that v2 as you actually having run some tests with this;
> sorry for the misunderstanding.
>
> Well, it's a fix anyway, so it cannot hurt to apply...
diff --git a/net/9p/client.c b/net/9p/client.c
index aaa37b07e30a..963cf91aa0d5 100644
--- a/net/9p/client.c
+++ b/net/9p/client.c
@@ -440,7 +440,7 @@ void p9_client_cb(struct p9_client *c, struct p9_req_t *req, int status)
 	smp_wmb();
 	req->status = status;
-	wake_up(&req->wq);
+	wake_up_all(&req->wq);
 	p9_debug(P9_DEBUG_MUX, "wakeup: %d\n", req->tc.tag);
 	p9_req_put(c, req);
 }
diff --git a/net/9p/trans_fd.c b/net/9p/trans_fd.c
index eeea0a6a75b6..ee2d6b231af1 100644
--- a/net/9p/trans_fd.c
+++ b/net/9p/trans_fd.c
@@ -30,6 +30,7 @@
 #include <net/9p/transport.h>
 #include <linux/syscalls.h> /* killme */
+#include <linux/wait.h>
 
 #define P9_PORT 564
 #define MAX_SOCK_BUF (1024*1024)
@@ -728,6 +729,17 @@ static int p9_fd_cancelled(struct p9_client *client, struct p9_req_t *req)
 		return 0;
 	}
 
+	/* If the request has been sent to the server, we need to wait
+	 * for the job to finish.
+	 */
+	if (req->status == REQ_STATUS_SENT) {
+		spin_unlock(&m->req_lock);
+		p9_debug(P9_DEBUG_TRANS, "client %p req %p wait done\n",
+			 client, req);
+		wait_event(req->wq, req->status >= REQ_STATUS_RCVD);
+
+		return 0;
+	}
 	/* we haven't received a response for oldreq,
 	 * remove it from the list. */
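For readers less familiar with the kernel wait API used in this patch: the `wait_event(req->wq, req->status >= REQ_STATUS_RCVD)` / `wake_up_all(&req->wq)` pairing is a condition-guarded sleep. The same shape can be sketched in userspace with a pthread condition variable (illustrative names only; this is not the kernel code, and the kernel version also has memory-ordering concerns, e.g. the `smp_wmb()` above, that the mutex handles for us here).

```c
#include <assert.h>
#include <pthread.h>

/* Userspace analogue of wait_event()/wake_up_all() for the patch's
 * "cancelled waits until the request is RCVD" logic. */
enum { REQ_STATUS_SENT = 2, REQ_STATUS_RCVD = 4 };

struct req {
	pthread_mutex_t lock;
	pthread_cond_t  wq;	/* stands in for the wait queue */
	int             status;
};

/* Plays the role of p9_client_cb(): set the status, then wake every
 * waiter (wake_up_all, not wake_up, so no waiter is missed). */
static void *responder(void *arg)
{
	struct req *r = arg;
	pthread_mutex_lock(&r->lock);
	r->status = REQ_STATUS_RCVD;
	pthread_cond_broadcast(&r->wq);
	pthread_mutex_unlock(&r->lock);
	return NULL;
}

/* Plays the role of the patched p9_fd_cancelled(): a SENT request is
 * not torn down until the response path has moved it past RCVD. */
static int cancelled_waits_for_rcvd(void)
{
	struct req r = { PTHREAD_MUTEX_INITIALIZER,
			 PTHREAD_COND_INITIALIZER, REQ_STATUS_SENT };
	pthread_t t;
	pthread_create(&t, NULL, responder, &r);
	pthread_mutex_lock(&r.lock);
	while (r.status < REQ_STATUS_RCVD)	/* wait_event condition */
		pthread_cond_wait(&r.wq, &r.lock);
	pthread_mutex_unlock(&r.lock);
	pthread_join(t, NULL);
	return r.status;
}
```

The while-loop re-check after each wakeup is the condvar equivalent of `wait_event`'s condition re-evaluation; it is also why switching `wake_up` to `wake_up_all` in the patch is safe for multiple waiters.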