[net,0/9] rxrpc: Fixes for I/O thread conversion/SACK table expansion

Message ID 167112117887.152641.6194213035340041732.stgit@warthog.procyon.org.uk
Headers
Series rxrpc: Fixes for I/O thread conversion/SACK table expansion |

Message

David Howells Dec. 15, 2022, 4:19 p.m. UTC
  Here are some fixes for AF_RXRPC:

 (1) Fix missing unlock in rxrpc's sendmsg.

 (2) Fix (lack of) propagation of security settings to rxrpc_call.

 (3) Fix NULL ptr deref in rxrpc_unuse_local().

 (4) Fix problem with kthread_run() not invoking the I/O thread function if
     the kthread gets stopped first.  Possibly this should actually be
     fixed in the kthread code.

 (5) Fix locking problem as putting a peer (which may be done from RCU) may
     now invoke kthread_stop().

 (6) Fix switched parameters in a couple of trace calls.

 (7) Fix I/O thread's checking for kthread stop to make sure it completes
     all outstanding work before returning so that calls are cleaned up.

 (8) Fix an uninitialised var in the new rxperf test server.

 (9) Fix the return value of rxrpc_new_incoming_call() so that the checks
     on it work correctly.

The patches fix at least one syzbot bug[1] and probably some others that
don't have reproducers[2][3][4].  I think it also fixes another[5], but
that showed another failure during testing that was different to the
original.

There's also an outstanding bug in rxrpc_put_peer()[6] that is fixed by a
combination of several patches in my rxrpc-next branch, but I haven't
included that here.

The patches are tagged here:

	git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git
	rxrpc-fixes-20221215

and can also be found on the following branch:

	https://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git/log/?h=rxrpc-fixes

David

Link: https://syzkaller.appspot.com/bug?extid=3538a6a72efa8b059c38 [1]
Link: https://syzkaller.appspot.com/bug?extid=2a99eae8dc7c754bc16b [2]
Link: https://syzkaller.appspot.com/bug?extid=e1391a5bf3f779e31237 [3]
Link: https://syzkaller.appspot.com/bug?extid=2aea8e1c8e20cb27a01f [4]
Link: https://syzkaller.appspot.com/bug?extid=1eb4232fca28c0a6d1c2 [5]
Link: https://syzkaller.appspot.com/bug?extid=c22650d2844392afdcfd [6]

---
David Howells (9):
      rxrpc: Fix missing unlock in rxrpc_do_sendmsg()
      rxrpc: Fix security setting propagation
      rxrpc: Fix NULL deref in rxrpc_unuse_local()
      rxrpc: Fix I/O thread startup getting skipped
      rxrpc: Fix locking issues in rxrpc_put_peer_locked()
      rxrpc: Fix switched parameters in peer tracing
      rxrpc: Fix I/O thread stop
      rxrpc: rxperf: Fix uninitialised variable
      rxrpc: Fix the return value of rxrpc_new_incoming_call()


 include/trace/events/rxrpc.h |  2 +-
 net/rxrpc/ar-internal.h      |  8 ++++----
 net/rxrpc/call_accept.c      | 18 +++++++++---------
 net/rxrpc/call_object.c      |  1 +
 net/rxrpc/conn_client.c      |  2 --
 net/rxrpc/io_thread.c        | 10 +++++++---
 net/rxrpc/local_object.c     |  5 ++++-
 net/rxrpc/peer_event.c       | 10 +++++++---
 net/rxrpc/peer_object.c      | 23 ++---------------------
 net/rxrpc/rxperf.c           |  2 +-
 net/rxrpc/security.c         |  6 +++---
 net/rxrpc/sendmsg.c          |  2 +-
 12 files changed, 40 insertions(+), 49 deletions(-)
  

Comments

Marc Dionne Dec. 15, 2022, 7:48 p.m. UTC | #1
On Thu, Dec 15, 2022 at 12:20 PM David Howells <dhowells@redhat.com> wrote:
>
>
> Here are some fixes for AF_RXRPC:
>
>  (1) Fix missing unlock in rxrpc's sendmsg.
>
>  (2) Fix (lack of) propagation of security settings to rxrpc_call.
>
>  (3) Fix NULL ptr deref in rxrpc_unuse_local().
>
>  (4) Fix problem with kthread_run() not invoking the I/O thread function if
>      the kthread gets stopped first.  Possibly this should actually be
>      fixed in the kthread code.
>
>  (5) Fix locking problem as putting a peer (which may be done from RCU) may
>      now invoke kthread_stop().
>
>  (6) Fix switched parameters in a couple of trace calls.
>
>  (7) Fix I/O thread's checking for kthread stop to make sure it completes
>      all outstanding work before returning so that calls are cleaned up.
>
>  (8) Fix an uninitialised var in the new rxperf test server.
>
>  (9) Fix the return value of rxrpc_new_incoming_call() so that the checks
>      on it work correctly.
>
> The patches fix at least one syzbot bug[1] and probably some others that
> don't have reproducers[2][3][4].  I think it also fixes another[5], but
> that showed another failure during testing that was different to the
> original.
>
> There's also an outstanding bug in rxrpc_put_peer()[6] that is fixed by a
> combination of several patches in my rxrpc-next branch, but I haven't
> included that here.
>
> The patches are tagged here:
>
>         git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git
>         rxrpc-fixes-20221215
>
> and can also be found on the following branch:
>
>         https://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git/log/?h=rxrpc-fixes
>
> David
>
> Link: https://syzkaller.appspot.com/bug?extid=3538a6a72efa8b059c38 [1]
> Link: https://syzkaller.appspot.com/bug?extid=2a99eae8dc7c754bc16b [2]
> Link: https://syzkaller.appspot.com/bug?extid=e1391a5bf3f779e31237 [3]
> Link: https://syzkaller.appspot.com/bug?extid=2aea8e1c8e20cb27a01f [4]
> Link: https://syzkaller.appspot.com/bug?extid=1eb4232fca28c0a6d1c2 [5]
> Link: https://syzkaller.appspot.com/bug?extid=c22650d2844392afdcfd [6]
>
> ---
> David Howells (9):
>       rxrpc: Fix missing unlock in rxrpc_do_sendmsg()
>       rxrpc: Fix security setting propagation
>       rxrpc: Fix NULL deref in rxrpc_unuse_local()
>       rxrpc: Fix I/O thread startup getting skipped
>       rxrpc: Fix locking issues in rxrpc_put_peer_locked()
>       rxrpc: Fix switched parameters in peer tracing
>       rxrpc: Fix I/O thread stop
>       rxrpc: rxperf: Fix uninitialised variable
>       rxrpc: Fix the return value of rxrpc_new_incoming_call()
>
>
>  include/trace/events/rxrpc.h |  2 +-
>  net/rxrpc/ar-internal.h      |  8 ++++----
>  net/rxrpc/call_accept.c      | 18 +++++++++---------
>  net/rxrpc/call_object.c      |  1 +
>  net/rxrpc/conn_client.c      |  2 --
>  net/rxrpc/io_thread.c        | 10 +++++++---
>  net/rxrpc/local_object.c     |  5 ++++-
>  net/rxrpc/peer_event.c       | 10 +++++++---
>  net/rxrpc/peer_object.c      | 23 ++---------------------
>  net/rxrpc/rxperf.c           |  2 +-
>  net/rxrpc/security.c         |  6 +++---
>  net/rxrpc/sendmsg.c          |  2 +-
>  12 files changed, 40 insertions(+), 49 deletions(-)

For the series:

Tested-by: Marc Dionne <marc.dionne@auristor.com>
Tested-by: kafs-testing+fedora36_64checkkafs-build-164@auristor.com

Marc
  
David Howells Dec. 16, 2022, 6:46 a.m. UTC | #2
Hillf Danton <hdanton@sina.com> wrote:

> > @@ -478,13 +479,14 @@ int rxrpc_io_thread(void *data)
> >  		}
> >  
> >  		set_current_state(TASK_INTERRUPTIBLE);
> > +		should_stop = kthread_should_stop();
> >  		if (!skb_queue_empty(&local->rx_queue) ||
> >  		    !list_empty(&local->call_attend_q)) {
> >  			__set_current_state(TASK_RUNNING);
> >  			continue;
> >  		}
> >  
> > -		if (kthread_should_stop())
> > +		if (should_stop)
> >  			break;
> >  		schedule();
> >  	}
> 
> At the second glance still fail to see the difference this change can
> make.

There is a window here:

  		if (!skb_queue_empty(&local->rx_queue) ...)
			continue;
	--->
		if (kthread_should_stop())
 			break;

in which an event can happen that should be attended to.  For instance the
AF_RXRPC socket being closed, aborting all its calls and stopping the kthread
by doing the final unuse on its rxrpc_local struct - in that order.  The
window can be expanded by an interrupt or softirq handler running.

So once we've observed that we've been asked to stop, we need to check if
there's more work to be done and, if so, do that work first.

So by doing:

		should_stop = kthread_should_stop();
  		if (!skb_queue_empty(&local->rx_queue) ...)
			continue;
		if (should_stop)
 			break;

we do all outstanding cleanup work before exiting the loop to stop the thread.

David
  
David Howells Dec. 18, 2022, 7:59 p.m. UTC | #3
Hillf Danton <hdanton@sina.com> wrote:

> In line with
> 
> 	if (condition)
> 		return;
> 	add to wait queue
> 	if (!condition)
> 		schedule();
> 
> this change should look like
> 
>    		if (!skb_queue_empty(&local->rx_queue) ...)
>  			continue;
> 
>  		if (kthread_should_stop())
>    			if (!skb_queue_empty(&local->rx_queue) ...)
>  				continue;
> 			else
>   				break;
> 
> as checking condition once barely makes sense.

Really, no.  The condition is going to expand to have a whole bunch of things
in it and I don't want to have it twice, e.g.:

                if (!skb_queue_empty(&local->rx_queue) ||
                   READ_ONCE(local->events) ||
                    !list_empty(&local->call_attend_q) ||
                    !list_empty(&local->conn_attend_q) ||
                    !list_empty(&local->new_client_calls) ||
		    test_bit(RXRPC_CLIENT_CONN_REAP_TIMER,
			     &local->client_conn_flags)) {
			...

Hmmm...  I wonder if kthread_should_stop() needs a barrier associated with
it.  It's just a test_bit(), so the compiler can cache the results of all
these tests - or reorder them.

David
  
David Howells Dec. 19, 2022, 12:20 a.m. UTC | #4
Hillf Danton <hdanton@sina.com> wrote:

> > So once we've observed that we've been asked to stop, we need to check if
> > there's more work to be done and, if so, do that work first.
> 
> In line with
> 
> 	if (condition)
> 		return;
> 	add to wait queue
> 	if (!condition)
> 		schedule();
> 
> this change should look like
> 
>    		if (!skb_queue_empty(&local->rx_queue) ...)
>  			continue;
> 
>  		if (kthread_should_stop())
>    			if (!skb_queue_empty(&local->rx_queue) ...)
>  				continue;
> 			else
>   				break;
> 
> as checking condition once barely makes sense.

Note that these are not really analogous.  The add-to-wait-queue step is
significantly more expensive than kthread_should_stop() and requires removal
in the event that the condition becomes true in the window.

In the case of kthread_should_stop(), it's just a test_bit() of a word that's
in a cacheline not going to get changed until the thread is stopped.  Testing
the value first and then checking the condition should be fine as the stop
flag can be shared in the cpu's data cache until it's set.

Also from a code-maintenance PoV, I don't want to write the condition twice if
I can avoid it.  That allows for the two copies to get out of sync.

> Because of a bit complex condition does not mean checking it once is neither
> sane nor correct.

So you agree with me, I think?

David
  
patchwork-bot+netdevbpf@kernel.org Dec. 19, 2022, 10:20 a.m. UTC | #5
Hello:

This series was applied to netdev/net.git (master)
by David S. Miller <davem@davemloft.net>:

On Thu, 15 Dec 2022 16:19:38 +0000 you wrote:
> Here are some fixes for AF_RXRPC:
> 
>  (1) Fix missing unlock in rxrpc's sendmsg.
> 
>  (2) Fix (lack of) propagation of security settings to rxrpc_call.
> 
>  (3) Fix NULL ptr deref in rxrpc_unuse_local().
> 
> [...]

Here is the summary with links:
  - [net,1/9] rxrpc: Fix missing unlock in rxrpc_do_sendmsg()
    https://git.kernel.org/netdev/net/c/4feb2c44629e
  - [net,2/9] rxrpc: Fix security setting propagation
    https://git.kernel.org/netdev/net/c/fdb99487b018
  - [net,3/9] rxrpc: Fix NULL deref in rxrpc_unuse_local()
    https://git.kernel.org/netdev/net/c/eaa02390adb0
  - [net,4/9] rxrpc: Fix I/O thread startup getting skipped
    https://git.kernel.org/netdev/net/c/8fbcc83334a7
  - [net,5/9] rxrpc: Fix locking issues in rxrpc_put_peer_locked()
    https://git.kernel.org/netdev/net/c/608aecd16a31
  - [net,6/9] rxrpc: Fix switched parameters in peer tracing
    https://git.kernel.org/netdev/net/c/c838f1a73d77
  - [net,7/9] rxrpc: Fix I/O thread stop
    https://git.kernel.org/netdev/net/c/743d1768a008
  - [net,8/9] rxrpc: rxperf: Fix uninitialised variable
    https://git.kernel.org/netdev/net/c/11e1706bc84f
  - [net,9/9] rxrpc: Fix the return value of rxrpc_new_incoming_call()
    https://git.kernel.org/netdev/net/c/31d35a02ad5b

You are awesome, thank you!