Message ID | 20230801132441.559222-1-frederic@kernel.org |
---|---|
Headers |
Return-Path: <linux-kernel-owner@vger.kernel.org> Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:918b:0:b0:3e4:2afc:c1 with SMTP id s11csp2695128vqg; Tue, 1 Aug 2023 07:10:00 -0700 (PDT) X-Google-Smtp-Source: APBJJlFRTtgpKM70C60HQW1CeEWUBiR7QMKkyr/UDHGmBjwpapeIcxWX1NQj3O+sx+X3hc+IMCbR X-Received: by 2002:a05:6a00:2d23:b0:66d:514c:cb33 with SMTP id fa35-20020a056a002d2300b0066d514ccb33mr15148613pfb.6.1690899000005; Tue, 01 Aug 2023 07:10:00 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1690898999; cv=none; d=google.com; s=arc-20160816; b=JNkSTkTHAD25St+MQRf+rjkkJ1iTWY0J9yjGl65mfknGQpdeqRSbv+l3n+zorM+T9V 5GlCOVW04cju/AWqo71pxKZq/vrjvHgPv93k6oIqpRmiioHcqlJoHRtlX0TdWBN1YvoF L0SdFKdotWbmr2LX1cscjLE9kg5cuzuHOyFBcwXQpZihJ+sWbKza4RsTl4NMpIzsuhpE XzIe5rvMOwU1FSYoj5qOis805kcgVIpMkx2Ye7CSFeqOHvtrXrY9GITuqPTPtjrCtXIN bf8mD+8Oi/6cdvOLs1/iCCvznGKI6IQwTy6M2nRkbY7gmd4ZidC41/N6uHtaOCsPpg2Y vF3A== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :message-id:date:subject:cc:to:from:dkim-signature; bh=LiLr0PjsZYoE6qcnl3vZSMXxsP+UkQo+lnqiNAc57C0=; fh=JC0FP1KBPXa+8NkvIvZoCUIIa+zGdCFP1xCgvJrJBSE=; b=Qgy+Fn+kya1cRENy7M8R5IEWtyplwkEujdS/OVVWk28O/6asaOWxxqv0gEcV0LFmio r50LbJEREqH1U33qkxrdhiPcyzichUEWuBKa89KDM6wDSP54lbwzVU+WOeXE7ZLPoV/a 0uyFkp+0QkVnJWcp1QHkqLT0ftsCP9zDtA+dz3SoT7KDpYKK7dEHvRqN2DHwssfejw0r yeOg258IxFGJndi4nZpwlvUwdOXnMdmH0/DMyfbo7wC7TKq6OQmsuyVxfW51pZ7h4AWW kVuKQmWE80+VimG8c/+gj9quEReDKB7lxcy2PstmyMyxO8iUskr8FQtNO/OooQ1dpk+X IoPg== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@kernel.org header.s=k20201202 header.b=ZrKDSANa; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id s6-20020a6550c6000000b0055c1fb5a4a9si3987339pgp.661.2023.08.01.07.09.46; Tue, 01 Aug 2023 07:09:59 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@kernel.org header.s=k20201202 header.b=ZrKDSANa; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234062AbjHANYv (ORCPT <rfc822;maxi.paulin@gmail.com> + 99 others); Tue, 1 Aug 2023 09:24:51 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:36802 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230315AbjHANYt (ORCPT <rfc822;linux-kernel@vger.kernel.org>); Tue, 1 Aug 2023 09:24:49 -0400 Received: from dfw.source.kernel.org (dfw.source.kernel.org [139.178.84.217]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id CE4751985 for <linux-kernel@vger.kernel.org>; Tue, 1 Aug 2023 06:24:48 -0700 (PDT) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits)) (No client certificate requested) by dfw.source.kernel.org (Postfix) with ESMTPS id 6D30861592 for <linux-kernel@vger.kernel.org>; Tue, 1 Aug 2023 13:24:48 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id DE704C433C8; Tue, 1 Aug 2023 13:24:45 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1690896287; bh=gi17WKwgWC1yvGrPqagQBV1JsuFx4OzmnPfU0tzhjOs=; h=From:To:Cc:Subject:Date:From; b=ZrKDSANaiRFMP0aOHzjxabn7HIaX8xWTDBeOTIcsPw5bJX8ogS4WP+BE+g+6ZAld9 5zNl5zp3Hd5PUoxahbRgmB3hY1e5a0Ca6qDB8RASF5NOHhMfZveg5wWvtXs5HbvLUd cG8KOOAEVuSzhneLl+uR4vLoRv71t2mqIIqCNO6Tp4T0vuAGjgB+sj1c8Inln2/Te8 x9jJN4fPEtB2olcPE2LUN//VKrRUxeiTSGth2pZ1qGTncvhYtciNlwyrSTJbTlwI7+ mcDyd+9YALri04qhE8ViPLPgjjQTYuc3Tz8PbmIJLIt6YAhPdO3bJFD6UY95sLBMfF AqkRr+RI40QNQ== From: Frederic Weisbecker <frederic@kernel.org> To: LKML <linux-kernel@vger.kernel.org> Cc: Frederic Weisbecker <frederic@kernel.org>, Sebastian Andrzej Siewior <bigeasy@linutronix.de>, Peter Zijlstra <peterz@infradead.org>, Thomas Gleixner <tglx@linutronix.de>, "Paul E . McKenney" <paulmck@kernel.org>, Linus Torvalds <torvalds@linux-foundation.org>, Anna-Maria Behnsen <anna-maria@linutronix.de>, Eric Dumazet <edumazet@google.com> Subject: [RFC PATCH 0/6] softirq: Start pushing down the big softirq lock Date: Tue, 1 Aug 2023 15:24:35 +0200 Message-Id: <20230801132441.559222-1-frederic@kernel.org> X-Mailer: git-send-email 2.34.1 MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Spam-Status: No, score=-7.1 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_HI, SPF_HELO_NONE,SPF_PASS,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: <linux-kernel.vger.kernel.org> X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: INBOX X-GMAIL-THRID: 1773036109862943258 X-GMAIL-MSGID: 1773036109862943258 |
Series |
softirq: Start pushing down the big softirq lock
|
|
Message
Frederic Weisbecker
Aug. 1, 2023, 1:24 p.m. UTC
Networking softirqs can take time, holding the execution of block, tasklets, timers and RCU callbacks. People fight hard through this big softirq lock, proposing more and more hacks over the years to deal with the resulting fundamental unfairness that is not only a problem for RT users. Here is a proposal for an entrypoint to dealing with that issue in the long term. The purpose is to adopt a similar journey to the one we took with the BKL push-down but with timers. Most timers are unrelated to other softirq vectors, those can simply be tagged with the new TIMER_SOFTINTERRUPTIBLE flag that makes a callback soft-interruptible. The others can carry the TIMER_SOFTINTERRUPTIBLE after they get converted to use appropriate synchronization against other vectors callbacks (using spin_lock_bh() for example). Once all timers are dealt with after a few years (famous last words), they can be handled separately from the softirq infrastructure. RCU could follow a similar treatment, if we manage to find room for a flag somewhere... (Only -RT and x86 are supported for now) git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks.git timers/softirq-preemptible HEAD: c233aee141ddb78d07b2f7311be38cfc286de654 Thanks, Frederic --- Frederic Weisbecker (6): softirq: Turn set_softirq_pending() to reset_softirq_pending() softirq: Make softirq handling entry/exit generally available softirq: Introduce softirq disabled mask x86/softirq: Support softirq disabled mask timers: Introduce soft-interruptible timers timers: Make process_timeout() soft-interruptible arch/Kconfig | 3 +++ arch/x86/Kconfig | 1 + arch/x86/include/asm/current.h | 1 + arch/x86/include/asm/hardirq.h | 1 + include/linux/bottom_half.h | 9 +++++++++ include/linux/interrupt.h | 15 ++++++++++++--- include/linux/timer.h | 5 +++-- kernel/softirq.c | 40 ++++++++++++++++++++++++++++++++-------- kernel/time/timer.c | 20 +++++++++++++++++++- 9 files changed, 81 insertions(+), 14 deletions(-)
Comments
On 2023-08-01 15:24:35 [+0200], Frederic Weisbecker wrote: > Networking softirqs can take time, holding the execution of block, > tasklets, timers and RCU callbacks. > > People fight hard through this big softirq lock, proposing more and > more hacks over the years to deal with the resulting fundamental > unfairness that is not only a problem for RT users. > > Here is a proposal for an entrypoint to dealing with that issue in the > long term. The purpose is to adopt a similar journey to the one we took > with the BKL push-down but with timers. Most timers are unrelated to > other softirq vectors, those can simply be tagged with the new > TIMER_SOFTINTERRUPTIBLE flag that makes a callback soft-interruptible. > The others can carry the TIMER_SOFTINTERRUPTIBLE after they get converted > to use appropriate synchronization against other vectors callbacks > (using spin_lock_bh() for example). This doesn't work as proposed because of lock ordering: |====================================================== |WARNING: possible circular locking dependency detected |6.5.0-rc4-rt1+ #220 Not tainted |------------------------------------------------------ |ktimers/0/15 is trying to acquire lock: |ffff88817b41b6d8 ((softirq_ctrl.lock)){+.+.}-{2:2}, at: __local_bh_disable_ip+0xb7/0x1a0 | |but task is already holding lock: |ffff88817b41c820 (&base->expiry_lock){+.+.}-{2:2}, at: run_timer_softirq+0x61/0x3f0 | |which lock already depends on the new lock. | | |the existing dependency chain (in reverse order) is: | |-> #1 (&base->expiry_lock){+.+.}-{2:2}: | lock_acquire+0xd4/0x2f0 | rt_spin_lock+0x21/0xf0 | run_timer_softirq+0x61/0x3f0 | __do_softirq+0x19b/0x4cb | run_timersd+0x92/0xf0 | smpboot_thread_fn+0x211/0x330 | kthread+0x110/0x130 | ret_from_fork+0x2b/0x40 | ret_from_fork_asm+0x1b/0x30 | |-> #0 ((softirq_ctrl.lock)){+.+.}-{2:2}: | check_prev_add+0xe2/0xd60 | __lock_acquire+0x132d/0x1700 | lock_acquire+0xd4/0x2f0 | rt_spin_lock+0x21/0xf0 | __local_bh_disable_ip+0xb7/0x1a0 | call_timer_fn+0x172/0x310 | run_timer_softirq+0x331/0x3f0 | __do_softirq+0x19b/0x4cb | run_timersd+0x92/0xf0 | smpboot_thread_fn+0x211/0x330 | kthread+0x110/0x130 | ret_from_fork+0x2b/0x40 | ret_from_fork_asm+0x1b/0x30 | |other info that might help us debug this: | | Possible unsafe locking scenario: | | CPU0 CPU1 | ---- ---- | lock(&base->expiry_lock); | lock((softirq_ctrl.lock)); | lock(&base->expiry_lock); | lock((softirq_ctrl.lock)); | | *** DEADLOCK *** | |2 locks held by ktimers/0/15: | #0: ffffffff826e9ce0 (rcu_read_lock){....}-{1:2}, at: rt_spin_lock+0x5d/0xf0 | #1: ffff88817b41c820 (&base->expiry_lock){+.+.}-{2:2}, at: run_timer_softirq+0x61/0x3f0 I posted a different series last week where I drop the lock for other reasons at a different spot where it is safe to do so. It needs adopting other level softirq handler (besides TIMER_SOFTIRQ) but there is no need to deal with the individual callback. However, how do you continue here? Assuming all timers are marked TIMER_SOFTINTERRUPTIBLE then you could avoid the BH-lock at the timer-softirq. But when is a timer considered safe? Would the lack of the _bh suffix be that and you would simply add it during your push down? Then you continue the same thing for the remaining softirqs. And once you are done you would remove that RT lock within local_bh_disable()? This isn't something a !RT user would benefit, right? The other idea I have (besides the preemption point in each softirq handler (mentioned earlier)) is to simple drop the BH-lock on RT. Unlike mainline, RT wouldn't deadlock then. The only that would be missing is synchronisation against local_bh_disable() only locking for variables. From what I remember from the various BH-models we have in RT in the past, that was the only thing that exploded. Sebastian
On Mon, Aug 07, 2023 at 02:50:20PM +0200, Sebastian Andrzej Siewior wrote: > On 2023-08-01 15:24:35 [+0200], Frederic Weisbecker wrote: > > Networking softirqs can take time, holding the execution of block, > > tasklets, timers and RCU callbacks. > > > > People fight hard through this big softirq lock, proposing more and > > more hacks over the years to deal with the resulting fundamental > > unfairness that is not only a problem for RT users. > > > > Here is a proposal for an entrypoint to dealing with that issue in the > > long term. The purpose is to adopt a similar journey to the one we took > > with the BKL push-down but with timers. Most timers are unrelated to > > other softirq vectors, those can simply be tagged with the new > > TIMER_SOFTINTERRUPTIBLE flag that makes a callback soft-interruptible. > > The others can carry the TIMER_SOFTINTERRUPTIBLE after they get converted > > to use appropriate synchronization against other vectors callbacks > > (using spin_lock_bh() for example). > > This doesn't work as proposed because of lock ordering: > |====================================================== > |WARNING: possible circular locking dependency detected > |6.5.0-rc4-rt1+ #220 Not tainted > |------------------------------------------------------ > |ktimers/0/15 is trying to acquire lock: > |ffff88817b41b6d8 ((softirq_ctrl.lock)){+.+.}-{2:2}, at: __local_bh_disable_ip+0xb7/0x1a0 > | > |but task is already holding lock: > |ffff88817b41c820 (&base->expiry_lock){+.+.}-{2:2}, at: run_timer_softirq+0x61/0x3f0 > | > |which lock already depends on the new lock. > | > | > |the existing dependency chain (in reverse order) is: > | > |-> #1 (&base->expiry_lock){+.+.}-{2:2}: > | lock_acquire+0xd4/0x2f0 > | rt_spin_lock+0x21/0xf0 > | run_timer_softirq+0x61/0x3f0 > | __do_softirq+0x19b/0x4cb > | run_timersd+0x92/0xf0 > | smpboot_thread_fn+0x211/0x330 > | kthread+0x110/0x130 > | ret_from_fork+0x2b/0x40 > | ret_from_fork_asm+0x1b/0x30 > | > |-> #0 ((softirq_ctrl.lock)){+.+.}-{2:2}: > | check_prev_add+0xe2/0xd60 > | __lock_acquire+0x132d/0x1700 > | lock_acquire+0xd4/0x2f0 > | rt_spin_lock+0x21/0xf0 > | __local_bh_disable_ip+0xb7/0x1a0 > | call_timer_fn+0x172/0x310 > | run_timer_softirq+0x331/0x3f0 > | __do_softirq+0x19b/0x4cb > | run_timersd+0x92/0xf0 > | smpboot_thread_fn+0x211/0x330 > | kthread+0x110/0x130 > | ret_from_fork+0x2b/0x40 > | ret_from_fork_asm+0x1b/0x30 > | > |other info that might help us debug this: > | > | Possible unsafe locking scenario: > | > | CPU0 CPU1 > | ---- ---- > | lock(&base->expiry_lock); > | lock((softirq_ctrl.lock)); > | lock(&base->expiry_lock); > | lock((softirq_ctrl.lock)); > | > | *** DEADLOCK *** > | > |2 locks held by ktimers/0/15: > | #0: ffffffff826e9ce0 (rcu_read_lock){....}-{1:2}, at: rt_spin_lock+0x5d/0xf0 > | #1: ffff88817b41c820 (&base->expiry_lock){+.+.}-{2:2}, at: > |run_timer_softirq+0x61/0x3f0 Right, need to pull that to the caller before it releases the lock. > > I posted a different series last week where I drop the lock for other > reasons at a different spot where it is safe to do so. It needs adopting > other level softirq handler (besides TIMER_SOFTIRQ) but there is no > need to deal with the individual callback. I've seen that yes, I have yet to review deeper but it looks like a good idea in any case to have a preemption point between timer callbacks. RCU already does a similar thing from its rcu boost processing with cond_resched_tasks_rcu_qs(). > > However, how do you continue here? Assuming all timers are marked > TIMER_SOFTINTERRUPTIBLE then you could avoid the BH-lock at the > timer-softirq. > But when is a timer considered safe? Would the lack of the _bh suffix be > that and you would simply add it during your push down? Yeah that requires manual inspection. A timer that obviously doesn't mess up with other softirqs, as is the case most of the time, can simply get the flag. Other timers can be dealt with individually with local_bh_disable() or spin_lock_bh() or critical section. > Then you continue the same thing for the remaining softirqs. And once > you are done you would remove that RT lock within local_bh_disable()? > This isn't something a !RT user would benefit, right? Why not? A long lasting ksoftirqd handling lots of NET_RX could be interrupted by a timer/rcu softirq in !RT for example. Further, there could even be more than one ksoftirqd if necessary, though I doubt it. > The other idea I have (besides the preemption point in each softirq > handler (mentioned earlier)) is to simple drop the BH-lock on RT. Unlike > mainline, RT wouldn't deadlock then. The only that would be missing is > synchronisation against local_bh_disable() only locking for variables. > From what I remember from the various BH-models we have in RT in the > past, that was the only thing that exploded. I thought the issue was about timers fiddling with per-cpu state assuming they wouldn't be disturbed by other vectors and thus they lack local_bh_disable() on appropriate places. Or perhaps I misunderstood? Otherwise all timers can carry TIMER_SOFTINTERRUPTIBLE right away, right? Thanks.
On 2023-08-07 17:22:10 [+0200], Frederic Weisbecker wrote: > > However, how do you continue here? Assuming all timers are marked > > TIMER_SOFTINTERRUPTIBLE then you could avoid the BH-lock at the > > timer-softirq. > > But when is a timer considered safe? Would the lack of the _bh suffix be > > that and you would simply add it during your push down? > > Yeah that requires manual inspection. A timer that obviously doesn't mess > up with other softirqs, as is the case most of the time, can simply get the flag. > > Other timers can be dealt with individually with local_bh_disable() or > spin_lock_bh() or critical section. The question is what keeps _bh() doing if you intend to get rid of the lock. > > Then you continue the same thing for the remaining softirqs. And once > > you are done you would remove that RT lock within local_bh_disable()? > > This isn't something a !RT user would benefit, right? > > Why not? A long lasting ksoftirqd handling lots of NET_RX could be > interrupted by a timer/rcu softirq in !RT for example. Further, there > could even be more than one ksoftirqd if necessary, though I doubt it. That would require more thinking on how to accomplish this. It is generally assumed that a softirq callback makes always progress if you look at it from a different CPU. Therefore a loop like in __timer_delete_sync() is "okay" because the callback will continue and finish. If you drop the BH bits from the preemption pointer and allow preemption then the other CPU could spin until that timer gets back and continues to work. There is more of this kind of logic… But I don't think that you want ksoftirqd handling NET_RX in the first place. However having a long running NET_RX is a problem in RT because it blocks all other force-threaded interrupts. Mainline doesn't have this problem unless it uses threaded interrupts. The other problematic thing is that everything is mixed up. So you can't really distinguish between TASKLET as in USB vs SCSI. Not to mention NET_RX vs TASKLET. If you don't have luxury to move them to different CPU I *think* assigning them a priority would be a good thing. > > The other idea I have (besides the preemption point in each softirq > > handler (mentioned earlier)) is to simple drop the BH-lock on RT. Unlike > > mainline, RT wouldn't deadlock then. The only that would be missing is > > synchronisation against local_bh_disable() only locking for variables. > > From what I remember from the various BH-models we have in RT in the > > past, that was the only thing that exploded. > > I thought the issue was about timers fiddling with per-cpu state assuming > they wouldn't be disturbed by other vectors and thus they lack > local_bh_disable() on appropriate places. Or perhaps I misunderstood? > > Otherwise all timers can carry TIMER_SOFTINTERRUPTIBLE right away, right? The last time I have been looking at it is just the usage of per-CPU variables. For instance assume a timer invokes napi_consume_skb(). > Thanks. Sebastian