Message ID | 20230507091131.23540-1-dinghui@sangfor.com.cn |
---|---|
State | New |
Headers |
From: Ding Hui <dinghui@sangfor.com.cn>
To: chuck.lever@oracle.com, jlayton@kernel.org, trond.myklebust@hammerspace.com, anna@kernel.org
Cc: davem@davemloft.net, edumazet@google.com, kuba@kernel.org, pabeni@redhat.com, bfields@redhat.com, linux-nfs@vger.kernel.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org, dinghui@sangfor.com.cn
Subject: [RFC PATCH] SUNRPC: Fix UAF in svc_tcp_listen_data_ready()
Date: Sun, 7 May 2023 17:11:31 +0800
Message-Id: <20230507091131.23540-1-dinghui@sangfor.com.cn>
List-ID: <linux-kernel.vger.kernel.org> |
Series | [RFC] SUNRPC: Fix UAF in svc_tcp_listen_data_ready() |
Commit Message
Ding Hui
May 7, 2023, 9:11 a.m. UTC
After the listener svc_sock is freed, and before svc_tcp_accept() is invoked
for the established child sock, there is a window in which the newsock
still holds the freed listener svc_sock in sk_user_data, a pointer cloned
from the parent. If data is received on the newsock during this race
window, we observe a use-after-free report in svc_tcp_listen_data_ready().
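A minimal userspace model of the inheritance step may help. It is a sketch, not kernel code. `struct model_sock`, `struct model_svc_sock` and `model_clone_lock()` are hypothetical stand-ins for `struct sock`, `struct svc_sock` and the part of `sk_clone_lock()` that copies the listener's fields into the child. The model shows only one thing: the child ends up holding the same `svc_sock` pointer as the listener. Once that `svc_sock` is freed, the child's copy dangles.

```c
#include <assert.h>

/* Hypothetical stand-ins for struct sock and struct svc_sock; not kernel code. */
enum model_state { MODEL_TCP_LISTEN, MODEL_TCP_ESTABLISHED };

struct model_svc_sock {
	unsigned long xpt_flags;
};

struct model_sock {
	enum model_state sk_state;
	void *sk_user_data;	/* a listener keeps its svc_sock here */
};

/*
 * Models only the relevant effect of sk_clone_lock(): the child starts as a
 * copy of the listener, so sk_user_data is inherited unchanged. When the
 * listener's svc_sock is later freed, nothing updates the child's copy.
 */
static void model_clone_lock(struct model_sock *child,
			     const struct model_sock *parent)
{
	assert(parent->sk_state == MODEL_TCP_LISTEN);
	*child = *parent;
	child->sk_state = MODEL_TCP_ESTABLISHED;
}
```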
Reproduce with two concurrent tasks:
1. while :; do rpc.nfsd 0 ; rpc.nfsd; done
2. while :; do echo "" | ncat -4 127.0.0.1 2049 ; done
KASAN report:
==================================================================
BUG: KASAN: slab-use-after-free in svc_tcp_listen_data_ready+0x1cf/0x1f0 [sunrpc]
Read of size 8 at addr ffff888139d96228 by task nc/102553
CPU: 7 PID: 102553 Comm: nc Not tainted 6.3.0+ #18
Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 11/12/2020
Call Trace:
<IRQ>
dump_stack_lvl+0x33/0x50
print_address_description.constprop.0+0x27/0x310
print_report+0x3e/0x70
kasan_report+0xae/0xe0
svc_tcp_listen_data_ready+0x1cf/0x1f0 [sunrpc]
tcp_data_queue+0x9f4/0x20e0
tcp_rcv_established+0x666/0x1f60
tcp_v4_do_rcv+0x51c/0x850
tcp_v4_rcv+0x23fc/0x2e80
ip_protocol_deliver_rcu+0x62/0x300
ip_local_deliver_finish+0x267/0x350
ip_local_deliver+0x18b/0x2d0
ip_rcv+0x2fb/0x370
__netif_receive_skb_one_core+0x166/0x1b0
process_backlog+0x24c/0x5e0
__napi_poll+0xa2/0x500
net_rx_action+0x854/0xc90
__do_softirq+0x1bb/0x5de
do_softirq+0xcb/0x100
</IRQ>
<TASK>
...
</TASK>
Allocated by task 102371:
kasan_save_stack+0x1e/0x40
kasan_set_track+0x21/0x30
__kasan_kmalloc+0x7b/0x90
svc_setup_socket+0x52/0x4f0 [sunrpc]
svc_addsock+0x20d/0x400 [sunrpc]
__write_ports_addfd+0x209/0x390 [nfsd]
write_ports+0x239/0x2c0 [nfsd]
nfsctl_transaction_write+0xac/0x110 [nfsd]
vfs_write+0x1c3/0xae0
ksys_write+0xed/0x1c0
do_syscall_64+0x38/0x90
entry_SYSCALL_64_after_hwframe+0x72/0xdc
Freed by task 102551:
kasan_save_stack+0x1e/0x40
kasan_set_track+0x21/0x30
kasan_save_free_info+0x2a/0x50
__kasan_slab_free+0x106/0x190
__kmem_cache_free+0x133/0x270
svc_xprt_free+0x1e2/0x350 [sunrpc]
svc_xprt_destroy_all+0x25a/0x440 [sunrpc]
nfsd_put+0x125/0x240 [nfsd]
nfsd_svc+0x2cb/0x3c0 [nfsd]
write_threads+0x1ac/0x2a0 [nfsd]
nfsctl_transaction_write+0xac/0x110 [nfsd]
vfs_write+0x1c3/0xae0
ksys_write+0xed/0x1c0
do_syscall_64+0x38/0x90
entry_SYSCALL_64_after_hwframe+0x72/0xdc
In this RFC patch, I try to fix the UAF by not dereferencing svsk
for any child socket in svc_tcp_listen_data_ready(); the change is
small and easy to backport to stable.
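The control flow of the proposed fix can be sketched with the same kind of hypothetical userspace model. The `model_*` names are invented. The counters stand in for `sk_odata()`, `set_bit(XPT_CONN, ...)` and `svc_xprt_enqueue()`. The `assert()` in `model_use()` plays the role of KASAN: it fires if a freed `svc_sock` is touched. Because the state check runs first, a child whose `sk_user_data` points at a freed object is ignored without being dereferenced.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model of the RFC patch; not kernel code. */
enum model_state { MODEL_TCP_LISTEN, MODEL_TCP_ESTABLISHED };

#define MODEL_XPT_CONN 0x1UL

struct model_svc_sock {
	bool freed;		/* set once svc_xprt_free() would have run */
	unsigned long xpt_flags;
	int odata_calls;	/* counts calls to the saved sk_odata callback */
	int enqueues;		/* counts calls to svc_xprt_enqueue() */
};

struct model_sock {
	enum model_state sk_state;
	void *sk_user_data;
};

/* Poor man's KASAN: trap any access to a "freed" svc_sock. */
static struct model_svc_sock *model_use(struct model_svc_sock *svsk)
{
	assert(!svsk->freed);
	return svsk;
}

/* Same ordering as the patched callback: the state check comes first, so a
 * not-yet-accepted child never dereferences the (possibly stale) svsk. */
static void model_listen_data_ready(struct model_sock *sk)
{
	struct model_svc_sock *svsk = sk->sk_user_data;

	if (sk->sk_state != MODEL_TCP_LISTEN)
		return;

	if (svsk) {
		model_use(svsk)->odata_calls++;
		model_use(svsk)->xpt_flags |= MODEL_XPT_CONN;
		model_use(svsk)->enqueues++;
	}
}
```

The unpatched ordering called the `sk_odata` stand-in before the state check. In the model, that ordering would make the child case trip the assert.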
However, I'm not sure whether there are other potential risks in the race
window, so I also considered another fix that depends on SK_USER_DATA_NOCOPY,
introduced in commit f1ff5ce2cd5e ("net, sk_msg: Clear sk_user_data
pointer on clone if tagged").
Store svsk in sk_user_data with the SK_USER_DATA_NOCOPY tag in
svc_setup_socket(), like this:
__rcu_assign_sk_user_data_with_flags(inet, svsk, SK_USER_DATA_NOCOPY);
Obtain svsk in the callbacks like this:
struct svc_sock *svsk = rcu_dereference_sk_user_data(sk);
This prevents sk_clone_lock() from copying the sunrpc svc_sock pointer in
sk_user_data, so the sk_user_data of a child sock will be NULL until the
child is accepted.
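As I understand it, the kernel keeps such flags in the low bits of the `sk_user_data` pointer and masks them off on read. Per the title of commit f1ff5ce2cd5e, `sk_clone_lock()` clears the child's `sk_user_data` when the no-copy tag is set. The sketch below models that behaviour in userspace. `MODEL_USER_DATA_NOCOPY`, `model_assign_user_data_with_flags()`, `model_deref_user_data()` and `model_clone_lock()` are hypothetical stand-ins for `SK_USER_DATA_NOCOPY`, `__rcu_assign_sk_user_data_with_flags()`, `rcu_dereference_sk_user_data()` and `sk_clone_lock()`. All RCU aspects are omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of pointer tagging with a "no copy" flag in bit 0. */
#define MODEL_USER_DATA_NOCOPY	((uintptr_t)1)
#define MODEL_USER_DATA_PTRMASK	(~MODEL_USER_DATA_NOCOPY)

struct model_svc_sock {
	unsigned long xpt_flags;
};

struct model_sock {
	uintptr_t sk_user_data;	/* pointer bits | flag bits */
};

static void model_assign_user_data_with_flags(struct model_sock *sk,
					      void *ptr, uintptr_t flags)
{
	/* The flag lives in bit 0, so the pointer must be at least 2-byte aligned. */
	assert(((uintptr_t)ptr & MODEL_USER_DATA_NOCOPY) == 0);
	sk->sk_user_data = (uintptr_t)ptr | flags;
}

/* Readers always strip the flag bits before using the pointer. */
static void *model_deref_user_data(const struct model_sock *sk)
{
	return (void *)(sk->sk_user_data & MODEL_USER_DATA_PTRMASK);
}

/* Clone step: copy everything, then drop user data tagged "no copy". */
static void model_clone_lock(struct model_sock *child,
			     const struct model_sock *parent)
{
	*child = *parent;
	if (child->sk_user_data & MODEL_USER_DATA_NOCOPY)
		child->sk_user_data = 0;
}
```

Without the tag, the child still inherits the pointer, as in the plain clone path.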
Any comments and suggestions are appreciated, thanks.
Fixes: fa9251afc33c ("SUNRPC: Call the default socket callbacks instead of open coding")
Signed-off-by: Ding Hui <dinghui@sangfor.com.cn>
---
net/sunrpc/svcsock.c | 23 +++++++++++------------
1 file changed, 11 insertions(+), 12 deletions(-)
Comments
> On May 7, 2023, at 5:11 AM, Ding Hui <dinghui@sangfor.com.cn> wrote:
>
> After the listener svc_sock freed, and before invoking svc_tcp_accept()
> for the established child sock, there is a window that the newsock
> retaining a freed listener svc_sock in sk_user_data which cloning from
> parent. In the race windows if data is received on the newsock, we will
> observe use-after-free report in svc_tcp_listen_data_ready().

My thought is that not calling sk_odata() for the newsock
could potentially result in missing a data_ready event,
resulting in a hung client on that socket.

IMO the preferred approach is to ensure that svsk is always
safe to dereference in tcp_listen_data_ready. I haven't yet
thought carefully about how to do that.

> Reproduce by two tasks:
> ...
>
> diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
> index a51c9b989d58..9aca6e1e78e4 100644
> --- a/net/sunrpc/svcsock.c
> +++ b/net/sunrpc/svcsock.c
> @@ -825,12 +825,6 @@ static void svc_tcp_listen_data_ready(struct sock *sk)
>  
>  	trace_sk_data_ready(sk);
>  
> -	if (svsk) {
> -		/* Refer to svc_setup_socket() for details. */
> -		rmb();
> -		svsk->sk_odata(sk);
> -	}
> -
>  	/*
>  	 * This callback may called twice when a new connection
>  	 * is established as a child socket inherits everything
> @@ -839,13 +833,18 @@ static void svc_tcp_listen_data_ready(struct sock *sk)
>  	 * when one of child sockets become ESTABLISHED.
>  	 * 2) data_ready method of the child socket may be called
>  	 * when it receives data before the socket is accepted.
> -	 * In case of 2, we should ignore it silently.
> +	 * In case of 2, we should ignore it silently and DO NOT
> +	 * dereference svsk.
>  	 */
> -	if (sk->sk_state == TCP_LISTEN) {
> -		if (svsk) {
> -			set_bit(XPT_CONN, &svsk->sk_xprt.xpt_flags);
> -			svc_xprt_enqueue(&svsk->sk_xprt);
> -		}
> +	if (sk->sk_state != TCP_LISTEN)
> +		return;
> +
> +	if (svsk) {
> +		/* Refer to svc_setup_socket() for details. */
> +		rmb();
> +		svsk->sk_odata(sk);
> +		set_bit(XPT_CONN, &svsk->sk_xprt.xpt_flags);
> +		svc_xprt_enqueue(&svsk->sk_xprt);
>  	}
>  }
>  
> -- 
> 2.17.1
>

-- 
Chuck Lever
On 2023/5/7 23:26, Chuck Lever III wrote:
>
>
>> On May 7, 2023, at 5:11 AM, Ding Hui <dinghui@sangfor.com.cn> wrote:
>>
>> After the listener svc_sock freed, and before invoking svc_tcp_accept()
>> for the established child sock, there is a window that the newsock
>> retaining a freed listener svc_sock in sk_user_data which cloning from
>> parent. In the race windows if data is received on the newsock, we will
>> observe use-after-free report in svc_tcp_listen_data_ready().
>
> My thought is that not calling sk_odata() for the newsock
> could potentially result in missing a data_ready event,
> resulting in a hung client on that socket.
>

I checked the vmcore and found that sk_odata points to sock_def_readable(),
and that the sk_wq of the newsock is NULL, because sk_clone_lock() sets it
to NULL unconditionally.

So calling sk_odata() for the newsock probably does not wake up any sleepers.

> IMO the preferred approach is to ensure that svsk is always
> safe to dereference in tcp_listen_data_ready. I haven't yet
> thought carefully about how to do that.
>

Agreed, but I don't have a good way to do that yet.

>
>> Reproduce by two tasks:
>> ...
>
> -- 
> Chuck Lever
> On May 7, 2023, at 6:32 PM, Ding Hui <dinghui@sangfor.com.cn> wrote:
>
> On 2023/5/7 23:26, Chuck Lever III wrote:
>>> On May 7, 2023, at 5:11 AM, Ding Hui <dinghui@sangfor.com.cn> wrote:
>>> ...
>> My thought is that not calling sk_odata() for the newsock
>> could potentially result in missing a data_ready event,
>> resulting in a hung client on that socket.
>
> I checked the vmcore, found that sk_odata points to sock_def_readable(),
> and the sk_wq of newsock is NULL, which be assigned by sk_clone_lock()
> unconditionally.
>
> Calling sk_odata() for the newsock maybe do not wake up any sleepers.
>
>> IMO the preferred approach is to ensure that svsk is always
>> safe to dereference in tcp_listen_data_ready. I haven't yet
>> thought carefully about how to do that.
>
> Agree, but I don't have a good way for now.

Would a smartly-placed svc_xprt_get() hold the listener in place
until accept processing completes?

>>> Reproduce by two tasks:
>>> ...

-- 
Chuck Lever
On 2023/5/8 12:00, Chuck Lever III wrote:
>
>
>> On May 7, 2023, at 6:32 PM, Ding Hui <dinghui@sangfor.com.cn> wrote:
>>
>> On 2023/5/7 23:26, Chuck Lever III wrote:
>> ...
>> Agree, but I don't have a good way for now.
>
> Would a smartly-placed svc_xprt_get() hold the listener in place
> until accept processing completes?
>

That is difficult and complicated for me. I think the newsocks that have
not yet been accepted are somewhat outside SUNRPC's control; for example,
we don't know how many of them there are.

Back to this RFC: I checked the code, and I think it is safe to skip
sk_odata() for not-yet-accepted newsocks in **svc_tcp_listen_data_ready()**,
because the sk_wq of these newsocks must be NULL and is only assigned a new
one in sock_alloc_inode(), called by kernel_accept(). So if the child sock
has not been accepted, there is nothing to wake up.

>
>>>> Reproduce by two tasks:
>>>> ...
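Ding's argument can be modelled the same way. This hypothetical sketch reduces `sock_def_readable()` to its wake-up condition: a wait queue exists and has sleepers. It ignores the async (SIGIO) notification path. `model_clone()` stands in for `sk_clone_lock()` leaving the child's `sk_wq` NULL. `model_accept()` stands in for the accept path attaching the child to a new socket's wait queue. The sketch illustrates the claim made in the message; it is not a verified property of every kernel version.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; not kernel code. */
struct model_wq {
	int sleepers;	/* tasks waiting for data on this socket */
	int wakeups;	/* wake-ups delivered so far */
};

struct model_sock {
	struct model_wq *sk_wq;
};

/* Wake-up condition of sock_def_readable(), minus the async notification. */
static bool model_def_readable(struct model_sock *sk)
{
	struct model_wq *wq = sk->sk_wq;

	if (wq == NULL || wq->sleepers == 0)
		return false;	/* nobody to wake */
	wq->wakeups++;
	return true;
}

/* The child is a copy of the listener, but its wait queue is not inherited. */
static void model_clone(struct model_sock *child,
			const struct model_sock *parent)
{
	*child = *parent;
	child->sk_wq = NULL;
}

/* Accepting attaches the child to the wait queue of a freshly allocated socket. */
static void model_accept(struct model_sock *child, struct model_wq *wq)
{
	assert(child->sk_wq == NULL);
	child->sk_wq = wq;
}
```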
[ Removing the stale address for Bruce from the Cc, as he no longer works at Red Hat. ] > On May 7, 2023, at 9:32 PM, Ding Hui <dinghui@sangfor.com.cn> wrote: > > On 2023/5/7 23:26, Chuck Lever III wrote: >>> On May 7, 2023, at 5:11 AM, Ding Hui <dinghui@sangfor.com.cn> wrote: >>> >>> After the listener svc_sock freed, and before invoking svc_tcp_accept() >>> for the established child sock, there is a window that the newsock >>> retaining a freed listener svc_sock in sk_user_data which cloning from >>> parent. In the race windows if data is received on the newsock, we will >>> observe use-after-free report in svc_tcp_listen_data_ready(). >> My thought is that not calling sk_odata() for the newsock >> could potentially result in missing a data_ready event, >> resulting in a hung client on that socket. > > I checked the vmcore, found that sk_odata points to sock_def_readable(), > and the sk_wq of newsock is NULL, which be assigned by sk_clone_lock() > unconditionally. > > Calling sk_odata() for the newsock maybe do not wake up any sleepers. > >> IMO the preferred approach is to ensure that svsk is always >> safe to dereference in tcp_listen_data_ready. I haven't yet >> thought carefully about how to do that. > > Agree, but I don't have a good way for now. > >>> Reproduce by two tasks: >>> >>> 1. while :; do rpc.nfsd 0 ; rpc.nfsd; done >>> 2. while :; do echo "" | ncat -4 127.0.0.1 2049 ; done I haven't been able to reproduce a crash with this snippet. But I've done some archaeology to understand the problem better. I found that svc_tcp_listen_data_ready is actually invoked /three/ times: once for the listener socket, and /twice/ for the child. The big comment, which pre-dates the git era, appears to be somewhat stale; or perhaps it's the specifics of this particular test that triggers the third call. I reviewed several other tcp_listen_data_ready callbacks. 
They generally do not do anything at all with non-listener sockets, suggesting that approach would likely be safe for NFSD. Prior to commit 939bb7ef901b ("[PATCH] Code cleanups in calbacks in svcsock"), this data_ready callback was a complete no-op for non-listener sockets as well. That commit is described as only a clean-up, but it indeed changes the logic. I also note that most other data_ready callbacks take the sk_callback_lock, and svc_tcp_listen_data_ready does not. Not clear to me whether svc_tcp_listen_data_ready should be taking that lock too. The upshot is that I think it would be reasonable to simply do nothing in svc_tcp_listen_data_ready() if state != TCP_LISTEN. >>> KASAN report: >>> >>> ================================================================== >>> BUG: KASAN: slab-use-after-free in svc_tcp_listen_data_ready+0x1cf/0x1f0 [sunrpc] >>> Read of size 8 at addr ffff888139d96228 by task nc/102553 >>> CPU: 7 PID: 102553 Comm: nc Not tainted 6.3.0+ #18 >>> Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 11/12/2020 >>> Call Trace: >>> <IRQ> >>> dump_stack_lvl+0x33/0x50 >>> print_address_description.constprop.0+0x27/0x310 >>> print_report+0x3e/0x70 >>> kasan_report+0xae/0xe0 >>> svc_tcp_listen_data_ready+0x1cf/0x1f0 [sunrpc] >>> tcp_data_queue+0x9f4/0x20e0 >>> tcp_rcv_established+0x666/0x1f60 >>> tcp_v4_do_rcv+0x51c/0x850 >>> tcp_v4_rcv+0x23fc/0x2e80 >>> ip_protocol_deliver_rcu+0x62/0x300 >>> ip_local_deliver_finish+0x267/0x350 >>> ip_local_deliver+0x18b/0x2d0 >>> ip_rcv+0x2fb/0x370 >>> __netif_receive_skb_one_core+0x166/0x1b0 >>> process_backlog+0x24c/0x5e0 >>> __napi_poll+0xa2/0x500 >>> net_rx_action+0x854/0xc90 >>> __do_softirq+0x1bb/0x5de >>> do_softirq+0xcb/0x100 >>> </IRQ> >>> <TASK> >>> ... 
>>> </TASK> >>> >>> Allocated by task 102371: >>> kasan_save_stack+0x1e/0x40 >>> kasan_set_track+0x21/0x30 >>> __kasan_kmalloc+0x7b/0x90 >>> svc_setup_socket+0x52/0x4f0 [sunrpc] >>> svc_addsock+0x20d/0x400 [sunrpc] >>> __write_ports_addfd+0x209/0x390 [nfsd] >>> write_ports+0x239/0x2c0 [nfsd] >>> nfsctl_transaction_write+0xac/0x110 [nfsd] >>> vfs_write+0x1c3/0xae0 >>> ksys_write+0xed/0x1c0 >>> do_syscall_64+0x38/0x90 >>> entry_SYSCALL_64_after_hwframe+0x72/0xdc >>> >>> Freed by task 102551: >>> kasan_save_stack+0x1e/0x40 >>> kasan_set_track+0x21/0x30 >>> kasan_save_free_info+0x2a/0x50 >>> __kasan_slab_free+0x106/0x190 >>> __kmem_cache_free+0x133/0x270 >>> svc_xprt_free+0x1e2/0x350 [sunrpc] >>> svc_xprt_destroy_all+0x25a/0x440 [sunrpc] >>> nfsd_put+0x125/0x240 [nfsd] >>> nfsd_svc+0x2cb/0x3c0 [nfsd] >>> write_threads+0x1ac/0x2a0 [nfsd] >>> nfsctl_transaction_write+0xac/0x110 [nfsd] >>> vfs_write+0x1c3/0xae0 >>> ksys_write+0xed/0x1c0 >>> do_syscall_64+0x38/0x90 >>> entry_SYSCALL_64_after_hwframe+0x72/0xdc >>> >>> In this RFC patch, I try to fix the UAF by skipping dereferencing >>> svsk for all child socket in svc_tcp_listen_data_ready(), it is >>> easy to backport for stable. >>> >>> However I'm not sure if there are other potential risks in the race >>> window, so I thought another fix which depends on SK_USER_DATA_NOCOPY >>> introduced in commit f1ff5ce2cd5e ("net, sk_msg: Clear sk_user_data >>> pointer on clone if tagged"). >>> >>> Saving svsk into sk_user_data with SK_USER_DATA_NOCOPY tag in >>> svc_setup_socket() like this: >>> >>> __rcu_assign_sk_user_data_with_flags(inet, svsk, SK_USER_DATA_NOCOPY); >>> >>> Obtaining svsk in callbacks like this: >>> >>> struct svc_sock *svsk = rcu_dereference_sk_user_data(sk); >>> >>> This will avoid copying sk_user_data for sunrpc svc_sock in >>> sk_clone_lock(), so the sk_user_data of child sock before accepted >>> will be NULL. >>> >>> Appreciate any comment and suggestion, thanks. 
>>>
>>> Fixes: fa9251afc33c ("SUNRPC: Call the default socket callbacks instead of open coding")
>>> Signed-off-by: Ding Hui <dinghui@sangfor.com.cn>
>>> ---
>>>  net/sunrpc/svcsock.c | 23 +++++++++++------------
>>>  1 file changed, 11 insertions(+), 12 deletions(-)
>>>
>>> diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
>>> index a51c9b989d58..9aca6e1e78e4 100644
>>> --- a/net/sunrpc/svcsock.c
>>> +++ b/net/sunrpc/svcsock.c
>>> @@ -825,12 +825,6 @@ static void svc_tcp_listen_data_ready(struct sock *sk)
>>>
>>>  	trace_sk_data_ready(sk);
>>>
>>> -	if (svsk) {
>>> -		/* Refer to svc_setup_socket() for details. */
>>> -		rmb();
>>> -		svsk->sk_odata(sk);
>>> -	}
>>> -
>>>  	/*
>>>  	 * This callback may called twice when a new connection
>>>  	 * is established as a child socket inherits everything
>>> @@ -839,13 +833,18 @@ static void svc_tcp_listen_data_ready(struct sock *sk)
>>>  	 *    when one of child sockets become ESTABLISHED.
>>>  	 * 2) data_ready method of the child socket may be called
>>>  	 *    when it receives data before the socket is accepted.
>>> -	 * In case of 2, we should ignore it silently.
>>> +	 * In case of 2, we should ignore it silently and DO NOT
>>> +	 * dereference svsk.
>>>  	 */
>>> -	if (sk->sk_state == TCP_LISTEN) {
>>> -		if (svsk) {
>>> -			set_bit(XPT_CONN, &svsk->sk_xprt.xpt_flags);
>>> -			svc_xprt_enqueue(&svsk->sk_xprt);
>>> -		}
>>> +	if (sk->sk_state != TCP_LISTEN)
>>> +		return;
>>> +
>>> +	if (svsk) {
>>> +		/* Refer to svc_setup_socket() for details. */
>>> +		rmb();
>>> +		svsk->sk_odata(sk);
>>> +		set_bit(XPT_CONN, &svsk->sk_xprt.xpt_flags);
>>> +		svc_xprt_enqueue(&svsk->sk_xprt);
>>>  	}
>>>  }
>>>
>>> --
>>> 2.17.1
>>>
>> --
>> Chuck Lever
>
> --
> Thanks,
> - Ding Hui
>

--
Chuck Lever
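The SK_USER_DATA_NOCOPY alternative floated in the RFC works by storing a flag bit next to the svsk pointer, so that cloning a listener into a child socket does not carry the pointer over. The sketch below is a minimal userspace model of that idea only. The model_* names, the single-bit layout, and the plain (non-RCU) accessors are assumptions made for illustration; they are not the kernel's __rcu_assign_sk_user_data_with_flags() or rcu_dereference_sk_user_data() implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag layout for this model only: bit 0 means "do not
 * copy this pointer into a clone"; the remaining bits hold the pointer. */
#define MODEL_USER_DATA_NOCOPY  ((uintptr_t)1)
#define MODEL_USER_DATA_PTRMASK (~MODEL_USER_DATA_NOCOPY)

struct model_svc_sock {
	int id;
};

struct model_sock {
	uintptr_t user_data; /* models sk_user_data: pointer bits | flag bits */
};

/* Store ptr together with flags (the model's analogue of assigning
 * sk_user_data "with flags"). */
static void model_set_user_data(struct model_sock *sk, void *ptr, uintptr_t flags)
{
	/* Bit 0 is only free because the object is at least 2-byte aligned. */
	assert(((uintptr_t)ptr & MODEL_USER_DATA_NOCOPY) == 0);
	sk->user_data = (uintptr_t)ptr | flags;
}

/* Return the stored pointer with the flag bits masked off. */
static void *model_get_user_data(const struct model_sock *sk)
{
	return (void *)(sk->user_data & MODEL_USER_DATA_PTRMASK);
}

/* Models the relevant step of cloning a listener into a child socket:
 * every field is copied, except that a NOCOPY-tagged pointer is cleared. */
static void model_clone(const struct model_sock *parent, struct model_sock *child)
{
	*child = *parent;
	if (child->user_data & MODEL_USER_DATA_NOCOPY)
		child->user_data = 0;
}
```

With the tag set, a child cloned from the listener reads back NULL, so a data_ready callback on a not-yet-accepted child would find no svsk at all instead of a stale pointer to the listener's svc_sock. Without the tag, the child inherits the listener's pointer, which is the situation in the KASAN report.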
On 2023/5/15 2:29, Chuck Lever III wrote:
> [ Removing the stale address for Bruce from the Cc, as he no longer
> works at Red Hat. ]
>
>
>> On May 7, 2023, at 9:32 PM, Ding Hui <dinghui@sangfor.com.cn> wrote:
>>
>> On 2023/5/7 23:26, Chuck Lever III wrote:
>>>> On May 7, 2023, at 5:11 AM, Ding Hui <dinghui@sangfor.com.cn> wrote:
>>>>
>>>> After the listener svc_sock freed, and before invoking svc_tcp_accept()
>>>> for the established child sock, there is a window that the newsock
>>>> retaining a freed listener svc_sock in sk_user_data which cloning from
>>>> parent. In the race windows if data is received on the newsock, we will
>>>> observe use-after-free report in svc_tcp_listen_data_ready().
>>> My thought is that not calling sk_odata() for the newsock
>>> could potentially result in missing a data_ready event,
>>> resulting in a hung client on that socket.
>>
>> I checked the vmcore, found that sk_odata points to sock_def_readable(),
>> and the sk_wq of newsock is NULL, which be assigned by sk_clone_lock()
>> unconditionally.
>>
>> Calling sk_odata() for the newsock maybe do not wake up any sleepers.
>>
>>> IMO the preferred approach is to ensure that svsk is always
>>> safe to dereference in tcp_listen_data_ready. I haven't yet
>>> thought carefully about how to do that.
>>
>> Agree, but I don't have a good way for now.
>>
>>>> Reproduce by two tasks:
>>>>
>>>> 1. while :; do rpc.nfsd 0 ; rpc.nfsd; done
>>>> 2. while :; do echo "" | ncat -4 127.0.0.1 2049 ; done
>
> I haven't been able to reproduce a crash with this snippet.

But a KASAN report should be easier to reproduce than a real crash.

> I've done some archaeology to understand the problem better.
>
> I found that svc_tcp_listen_data_ready is actually invoked /three/
> times: once for the listener socket, and /twice/ for the child.
> The big comment, which pre-dates the git era, appears to be
> somewhat stale; or perhaps it's the specifics of this particular
> test that triggers the third call.
>
> I reviewed several other tcp_listen_data_ready callbacks. They
> generally do not do anything at all with non-listener sockets,
> suggesting that approach would likely be safe for NFSD.
>
> Prior to commit 939bb7ef901b ("[PATCH] Code cleanups in calbacks
> in svcsock"), this data_ready callback was a complete no-op for
> non-listener sockets as well. That commit is described as only
> a clean-up, but it indeed changes the logic.
>
> I also note that most other data_ready callbacks take the
> sk_callback_lock, and svc_tcp_listen_data_ready does not. Not
> clear to me whether svc_tcp_listen_data_ready should be taking
> that lock too.
>

I noticed the lock too. IMO the sk_callback_lock should be used to
protect svsk from being freed while it is in use in the callbacks.
Perhaps that race can be reproduced by increasing the processing time
in svc_tcp_listen_data_ready(), but either way, it would be a separate
issue.

> The upshot is that I think it would be reasonable to simply do
> nothing in svc_tcp_listen_data_ready() if state != TCP_LISTEN.
>

Thanks for the information. I will send the formal patch soon.
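The sk_callback_lock point can be illustrated with a small userspace model in which a POSIX read-write lock stands in for sk_callback_lock: the data_ready path reads and uses the svsk pointer under the read lock, and teardown clears the pointer under the write lock before freeing the object. This is a hypothetical sketch (the model_* names are invented, and it says nothing about how the kernel actually tears down an svc_sock). Note also that it only protects the listener's own pointer; it does not address the copy that a child socket inherits at clone time, which is what the patch targets.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-ins; not the kernel's struct sock / struct svc_sock. */
struct model_svc_sock {
	int enqueued; /* models calls to svc_xprt_enqueue() */
};

struct model_sock {
	pthread_rwlock_t callback_lock;   /* stands in for sk_callback_lock */
	struct model_svc_sock *user_data; /* models sk_user_data */
};

/* data_ready path: look up and use svsk only while holding the read lock. */
static void model_data_ready(struct model_sock *sk)
{
	struct model_svc_sock *svsk;

	assert(sk != NULL);
	pthread_rwlock_rdlock(&sk->callback_lock);
	svsk = sk->user_data;
	if (svsk)
		svsk->enqueued++; /* teardown cannot free svsk while we hold the read lock */
	pthread_rwlock_unlock(&sk->callback_lock);
}

/* Teardown path: unpublish svsk under the write lock, then free it.
 * Acquiring the write lock waits for any callback still holding the
 * read lock, so no callback can be using the old pointer afterwards. */
static void model_teardown(struct model_sock *sk)
{
	struct model_svc_sock *svsk;

	pthread_rwlock_wrlock(&sk->callback_lock);
	svsk = sk->user_data;
	sk->user_data = NULL;
	pthread_rwlock_unlock(&sk->callback_lock);
	free(svsk);
}
```

After model_teardown() returns, a later model_data_ready() on the same socket sees NULL and does nothing, rather than touching freed memory.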
diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
index a51c9b989d58..9aca6e1e78e4 100644
--- a/net/sunrpc/svcsock.c
+++ b/net/sunrpc/svcsock.c
@@ -825,12 +825,6 @@ static void svc_tcp_listen_data_ready(struct sock *sk)
 
 	trace_sk_data_ready(sk);
 
-	if (svsk) {
-		/* Refer to svc_setup_socket() for details. */
-		rmb();
-		svsk->sk_odata(sk);
-	}
-
 	/*
 	 * This callback may called twice when a new connection
 	 * is established as a child socket inherits everything
@@ -839,13 +833,18 @@ static void svc_tcp_listen_data_ready(struct sock *sk)
 	 *    when one of child sockets become ESTABLISHED.
 	 * 2) data_ready method of the child socket may be called
 	 *    when it receives data before the socket is accepted.
-	 * In case of 2, we should ignore it silently.
+	 * In case of 2, we should ignore it silently and DO NOT
+	 * dereference svsk.
 	 */
-	if (sk->sk_state == TCP_LISTEN) {
-		if (svsk) {
-			set_bit(XPT_CONN, &svsk->sk_xprt.xpt_flags);
-			svc_xprt_enqueue(&svsk->sk_xprt);
-		}
+	if (sk->sk_state != TCP_LISTEN)
+		return;
+
+	if (svsk) {
+		/* Refer to svc_setup_socket() for details. */
+		rmb();
+		svsk->sk_odata(sk);
+		set_bit(XPT_CONN, &svsk->sk_xprt.xpt_flags);
+		svc_xprt_enqueue(&svsk->sk_xprt);
 	}
 }
 
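The control flow of the patched svc_tcp_listen_data_ready() can be exercised outside the kernel with a simplified model. In this sketch the model_* types are hypothetical stand-ins for struct sock and struct svc_sock, the rmb() barrier is omitted, and counters replace the calls to sk_odata(), set_bit(XPT_CONN, ...) and svc_xprt_enqueue(). The point it illustrates is the one the patch makes: a non-listener socket returns before svsk is ever dereferenced, even though its inherited pointer is still non-NULL.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; only the fields this model needs. */
enum model_tcp_state { MODEL_TCP_LISTEN, MODEL_TCP_ESTABLISHED };

struct model_svc_sock {
	bool xpt_conn;   /* models XPT_CONN in sk_xprt.xpt_flags */
	int odata_calls; /* models calls to the saved sk_odata() */
	int enqueued;    /* models calls to svc_xprt_enqueue() */
};

struct model_sock {
	enum model_tcp_state state;       /* models sk->sk_state */
	struct model_svc_sock *user_data; /* models sk->sk_user_data */
};

/* Number of times the callback dereferenced svsk. */
static int svsk_derefs;

/* Mirrors the control flow of the patched callback. */
static void model_listen_data_ready(struct model_sock *sk)
{
	struct model_svc_sock *svsk;

	assert(sk != NULL);
	/* Copying the pointer value is harmless; dereferencing a stale one is not. */
	svsk = sk->user_data;

	/* A child socket inherits sk_user_data from its listener, so before
	 * accept() svsk may point at a freed svc_sock: ignore it silently. */
	if (sk->state != MODEL_TCP_LISTEN)
		return;

	if (svsk) {
		svsk_derefs++;
		svsk->odata_calls++;   /* svsk->sk_odata(sk) */
		svsk->xpt_conn = true; /* set_bit(XPT_CONN, ...) */
		svsk->enqueued++;      /* svc_xprt_enqueue(...) */
	}
}
```

For example, calling model_listen_data_ready() once on a MODEL_TCP_LISTEN socket and twice on a MODEL_TCP_ESTABLISHED child that shares the same svsk pointer (the three invocations observed in the thread) leaves enqueued == 1 and svsk_derefs == 1: only the listener's call touches svsk.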