From patchwork Sat Jan 7 09:52:51 2023
X-Patchwork-Submitter: David Howells
X-Patchwork-Id: 3686
Subject: [PATCH net 00/19] rxrpc: Fix race between call connection, data
 transmit and call disconnect
From: David Howells
To: netdev@vger.kernel.org
Cc: Marc Dionne, syzbot+c22650d2844392afdcfd@syzkaller.appspotmail.com,
 linux-afs@lists.infradead.org, dhowells@redhat.com,
 linux-kernel@vger.kernel.org
Date: Sat, 07 Jan 2023 09:52:51 +0000
Message-ID: <167308517118.1538866.3440481802366869065.stgit@warthog.procyon.org.uk>

Here are patches to fix an oops[1] caused by a race between call
connection, initial packet transmission and call disconnection which
results in something like:

	kernel BUG at net/rxrpc/peer_object.c:413!

when the syzbot test is run.
The problem is that the connection procedure is effectively split across
two threads and can get expanded by taking an interrupt, thereby adding
the call to the peer error distribution list *after* it has been
disconnected (say by the rxrpc socket shutting down).

The easiest solution is to look at the fourth set of I/O thread
conversion/SACK table expansion patches that didn't get applied[2] and
take from it those patches that move call connection and disconnection
into the I/O thread.  With everything done in the same thread, the
sequencing is managed by that thread alone and the race can no longer
happen.  This is preferable to introducing an extra lock, which would
make the I/O thread have to wait for the app thread in yet another
place.

The changes can be considered as a number of logical parts:

 (1) Move all of the call state changes into the I/O thread.

 (2) Make the client connection ID space per-local endpoint so that the
     I/O thread doesn't need locks to access it.

 (3) Move actual abort generation into the I/O thread and clean it up.
     If sendmsg or recvmsg want to cause an abort, they have to delegate
     it.

 (4) Offload the setting up of the security context on a connection to
     the thread of one of the apps that's starting a call.  We don't
     want to be doing any sort of crypto in the I/O thread.

 (5) Connect calls (ie. assign them to channel slots on connections) in
     the I/O thread.  Calls are set up by sendmsg/kafs and passed to the
     I/O thread to connect.  Connections are allocated in the I/O thread
     after this.

 (6) Disconnect calls in the I/O thread.

I've also added a patch for an unrelated bug that cropped up during
testing, whereby a race can occur between an incoming call and socket
shutdown.
Note that whilst this fixes the original syzbot bug, another bug may get
triggered if this one is fixed:

	INFO: rcu detected stall in corrupted
	rcu: INFO: rcu_preempt detected expedited stalls on CPUs/tasks: { P5792 } 2657 jiffies s: 2825 root: 0x0/T
	rcu: blocking rcu_node structures (internal RCU debug):

It doesn't look like this should be anything to do with rxrpc, though, as
I've tested an additional patch[3] that removes practically all the RCU
usage from rxrpc and it still occurs.  It seems likely that it is being
caused by something in the tunnelling setup that the syzbot test does,
but there's not enough info to go on.  It also seems unlikely to be
anything to do with the afs driver as the test doesn't use that.

The patches are tagged here:

	git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git
	rxrpc-fixes-20230107

and can also be found on the following branch:

	https://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git/log/?h=rxrpc-fixes

David

Link: https://syzkaller.appspot.com/bug?extid=c22650d2844392afdcfd [1]
Link: https://lore.kernel.org/r/167034231605.1105287.1693064952174322878.stgit@warthog.procyon.org.uk/ [2]
Link: https://lore.kernel.org/r/1278570.1673042093@warthog.procyon.org.uk/ [3]
---

David Howells (19):
      rxrpc: Stash the network namespace pointer in rxrpc_local
      rxrpc: Make the local endpoint hold a ref on a connected call
      rxrpc: Separate call retransmission from other conn events
      rxrpc: Only set/transmit aborts in the I/O thread
      rxrpc: Only disconnect calls in the I/O thread
      rxrpc: Implement a mechanism to send an event notification to a connection
      rxrpc: Clean up connection abort
      rxrpc: Tidy up abort generation infrastructure
      rxrpc: Make the set of connection IDs per local endpoint
      rxrpc: Offload the completion of service conn security to the I/O thread
      rxrpc: Set up a connection bundle from a call, not rxrpc_conn_parameters
      rxrpc: Split out the call state changing functions into their own file
      rxrpc: Wrap accesses to get call state to put the barrier in one place
      rxrpc: Move call state changes from sendmsg to I/O thread
      rxrpc: Move call state changes from recvmsg to I/O thread
      rxrpc: Remove call->state_lock
      rxrpc: Move the client conn cache management to the I/O thread
      rxrpc: Move client call connection to the I/O thread
      rxrpc: Fix incoming call setup race

 Documentation/networking/rxrpc.rst |   4 +-
 fs/afs/cmservice.c                 |   6 +-
 fs/afs/rxrpc.c                     |  24 +-
 include/net/af_rxrpc.h             |   3 +-
 include/trace/events/rxrpc.h       | 160 +++++--
 net/rxrpc/Makefile                 |   1 +
 net/rxrpc/af_rxrpc.c               |  27 +-
 net/rxrpc/ar-internal.h            | 212 ++++++---
 net/rxrpc/call_accept.c            |  57 ++-
 net/rxrpc/call_event.c             |  86 +++-
 net/rxrpc/call_object.c            | 116 +++--
 net/rxrpc/call_state.c             |  69 +++
 net/rxrpc/conn_client.c            | 709 ++++++++---------------------
 net/rxrpc/conn_event.c             | 382 ++++++----------
 net/rxrpc/conn_object.c            |  67 ++-
 net/rxrpc/conn_service.c           |   1 -
 net/rxrpc/input.c                  | 175 +++----
 net/rxrpc/insecure.c               |  20 +-
 net/rxrpc/io_thread.c              | 204 +++++----
 net/rxrpc/local_object.c           |  35 +-
 net/rxrpc/net_ns.c                 |  17 -
 net/rxrpc/output.c                 |  60 ++-
 net/rxrpc/peer_object.c            |  23 +-
 net/rxrpc/proc.c                   |  17 +-
 net/rxrpc/recvmsg.c                | 256 +++--------
 net/rxrpc/rxkad.c                  | 356 ++++++---------
 net/rxrpc/rxperf.c                 |  17 +-
 net/rxrpc/security.c               |  53 +--
 net/rxrpc/sendmsg.c                | 195 ++++----
 29 files changed, 1592 insertions(+), 1760 deletions(-)
 create mode 100644 net/rxrpc/call_state.c