Message ID | 20230322162719.wYG1N0hh@linutronix.de |
---|---|
State | New |
Headers |
Return-Path: <linux-kernel-owner@vger.kernel.org> Delivered-To: ouuuleilei@gmail.com Received: by 2002:a5d:604a:0:0:0:0:0 with SMTP id j10csp2438344wrt; Wed, 22 Mar 2023 09:33:22 -0700 (PDT) X-Google-Smtp-Source: AK7set/Pp1yg4IochkkRnB6YsZSiFNps8HfcsRPOkvgrbdJYr2sgw1mrRxKyYVvzLYEJ/z7JdHRR X-Received: by 2002:a05:6a20:354a:b0:cb:98e5:de2c with SMTP id f10-20020a056a20354a00b000cb98e5de2cmr241220pze.8.1679502802359; Wed, 22 Mar 2023 09:33:22 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1679502802; cv=none; d=google.com; s=arc-20160816; b=VcopOUtTD9aiZyCGyzsLgyw0jYNhUpTOS+a3ZmIc8Lm+5eSZ0H0w3g2kGmnv8eGZ5k Vx75QXukYhH5cF0XXnTWi84Pf8UtWgkZQ52pRxhUmE1P8dSMFuZc/qFLMQ93gKR4jyKY 74Lnm/yI0ki6nXJHj2DScL9ddTKo+d4Z+9A5pXw5EDoF0mqXMzXb6i05rsZm2j7scZgG /CCiDANBo5MEvd/vRyxQtQ67CJ1w4ldfgODN7LZcsF+DPCSzaaxj1DDow5LmeIVvOVIR 2A6W4ODS//Knfj4ucTQbEfMs1GxwX0mA5xDfJvQGerKjEibJUyOqBl8xjId/Y11tGseR TJSA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-disposition:mime-version:message-id :subject:cc:to:from:dkim-signature:dkim-signature:date; bh=9Rm915zahe3wkfDA3OoSR/L3qpm80GrhS/bMurOPXUw=; b=hUiBStrSfHJEyZ8y6fsr2z+uDpJMEUick7ORd4H37zEksgqcZAgRHovIGFIpBrvE0a sOt7GPM5HMDsQodkBkYdCwFV2YDdmJgFe66xu5GbUsQ2Wp95hhAl6o/YQxEERn2VXfRu TsucmSUanQBxnwesfSn621atrRAqaNvdwed+In+biPXE1ImZfIP8zRnwdu4wXD31DaDs yGhckGee57MUnnIJsKnaQVIZmNutxbjkqeCzF4nIN6/0I/aD0cxtzleViUn2M+eXVaLk +J89HfmGgGrXFGr09l/bRqE6fgD4wsrHFoz9lo0kpr2jATXE9Ob1Jr+5EI9qtbwgPI9h WXIw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@linutronix.de header.s=2020 header.b=gJD8resC; dkim=neutral (no key) header.i=@linutronix.de; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=linutronix.de Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id j6-20020a056a00130600b00627e8281621si11804121pfu.130.2023.03.22.09.33.08; Wed, 22 Mar 2023 09:33:22 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@linutronix.de header.s=2020 header.b=gJD8resC; dkim=neutral (no key) header.i=@linutronix.de; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=linutronix.de Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230110AbjCVQ1t (ORCPT <rfc822;pusanteemu@gmail.com> + 99 others); Wed, 22 Mar 2023 12:27:49 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:59670 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229788AbjCVQ1n (ORCPT <rfc822;linux-kernel@vger.kernel.org>); Wed, 22 Mar 2023 12:27:43 -0400 Received: from galois.linutronix.de (Galois.linutronix.de [IPv6:2a0a:51c0:0:12e:550::1]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id D36C660D7E for <linux-kernel@vger.kernel.org>; Wed, 22 Mar 2023 09:27:23 -0700 (PDT) Date: Wed, 22 Mar 2023 17:27:19 +0100 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linutronix.de; s=2020; t=1679502441; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type; bh=9Rm915zahe3wkfDA3OoSR/L3qpm80GrhS/bMurOPXUw=; b=gJD8resC1Fv1HxTm3TwZdZQqtLtBZkthWZGKGNlHWKbcAfflfIJgktQ0dLbUiGo/TuIyrx rYRvAfkHk3wjk340s8bMIXiXdbbTEEHS2eDzk3VqeoDOvqnOnU9aWn7bW66AfL1LZQPnxc anjx0PS6Lo9GZ4BbSiPiicx174egLrRxeHTV/UC0UngW7loYP8997KJ9zJ/AXzYSt+sCK2 bzGLXGP2IlUUXtCt25A7MZdeKEyT5ZZJHGMja/nOuOUZXKWm+6+telYXU8EZiJZRX/XkdT VvwrgBrxgPha31sTucOiY1uNEeUkwLwhwBMSoNjhmDS4RtGBIKrM1EiAcMKOMg== DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=linutronix.de; s=2020e; t=1679502441; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type; bh=9Rm915zahe3wkfDA3OoSR/L3qpm80GrhS/bMurOPXUw=; b=xiaieJyN/TwkmbQDmpGFACCEw+FEWUcM2mYF2MOK/zHj4RgxKhdi5ULkvybRSRbfZ5+g9d aU2xtawMQ9yOGjCw== From: Sebastian Andrzej Siewior <bigeasy@linutronix.de> To: Thomas Gleixner <tglx@linutronix.de>, linux-kernel@vger.kernel.org Cc: Crystal Wood <swood@redhat.com>, John Keeping <john@metanate.com>, Boqun Feng <boqun.feng@gmail.com>, Ingo Molnar <mingo@redhat.com>, Peter Zijlstra <peterz@infradead.org>, Waiman Long <longman@redhat.com>, Will Deacon <will@kernel.org> Subject: [PATCH] locking/rtmutex: Flush the plug before entering the slowpath. Message-ID: <20230322162719.wYG1N0hh@linutronix.de> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline X-Spam-Status: No, score=-2.5 required=5.0 tests=DKIM_SIGNED,DKIM_VALID, DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_MED,SPF_HELO_NONE,SPF_PASS, URIBL_BLOCKED autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: <linux-kernel.vger.kernel.org> X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1761086330301756528?= X-GMAIL-MSGID: =?utf-8?q?1761086330301756528?= |
Series |
locking/rtmutex: Flush the plug before entering the slowpath.
|
|
Commit Message
Sebastian Andrzej Siewior
March 22, 2023, 4:27 p.m. UTC
blk_flush_plug() is invoked on schedule() to flush out the IO progress that has been made so far so that it is globally visible. This is important to avoid deadlocks because a lock owner can wait for IO. Therefore the IO must be first flushed before a thread can block on a lock. The plug flush routine can acquire a sleeping lock which is contended. Blocking on a lock requires an assignment to task_struct::pi_blocked_on. If blk_flush_plug() is invoked from the slow path on schedule() then the variable is already set and will be overwritten by the lock in blk_flush_plug(). Therefore it is needed to invoke blk_flush_plug() (and block on potential locks in the process) before the blocking on the actual lock. Invoke blk_flush_plug() before blocking on a sleeping lock. The PREEMPT_RT only sleeping locks (spinlock_t and rwlock_t) are excluded because their slow path does not invoke blk_flush_plug(). Fixes: e17ba59b7e8e1 ("locking/rtmutex: Guard regular sleeping locks specific functions") Reported-by: Crystal Wood <swood@redhat.com> Link: https://lore.kernel.org/4b4ab374d3e24e6ea8df5cadc4297619a6d945af.camel@redhat.com Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> --- On 2023-02-20 19:21:51 [+0100], Thomas Gleixner wrote: > > This still leaves the problem vs. io_wq_worker_sleeping() and it's > running() counterpart after schedule(). io_wq_worker_sleeping() has a kfree() so it probably should be moved, too. io_wq_worker_running() is a OR and INC and is fine. > Aside of that for CONFIG_DEBUG_RT_MUTEXES=y builds it flushes on every > lock operation whether the lock is contended or not. For mutex & ww_mutex operations. rwsem is not affected by CONFIG_DEBUG_RT_MUTEXES. As for mutex it could be mitigated by invoking try_to_take_rt_mutex() before blk_flush_plug(). kernel/locking/rtmutex.c | 7 +++++++ kernel/locking/rwbase_rt.c | 8 ++++++++ kernel/locking/ww_rt_mutex.c | 5 +++++ 3 files changed, 20 insertions(+)
Comments
On 2023-03-22 17:27:21 [+0100], To Thomas Gleixner wrote: > blk_flush_plug() is invoked on schedule() to flush out the IO progress > that has been made so far so that it is globally visible. This is > important to avoid deadlocks because a lock owner can wait for IO. > Therefore the IO must be first flushed before a thread can block on a > lock. > > The plug flush routine can acquire a sleeping lock which is contended. > Blocking on a lock requires an assignment to task_struct::pi_blocked_on. > If blk_flush_plug() is invoked from the slow path on schedule() then the > variable is already set and will be overwritten by the lock in > blk_flush_plug(). > Therefore it is needed to invoke blk_flush_plug() (and block on > potential locks in the process) before the blocking on the actual lock. > > Invoke blk_flush_plug() before blocking on a sleeping lock. The > PREEMPT_RT only sleeping locks (spinlock_t and rwlock_t) are excluded > because their slow path does not invoke blk_flush_plug(). > > Fixes: e17ba59b7e8e1 ("locking/rtmutex: Guard regular sleeping locks specific functions") > Reported-by: Crystal Wood <swood@redhat.com> > Link: https://lore.kernel.org/4b4ab374d3e24e6ea8df5cadc4297619a6d945af.camel@redhat.com > Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> polite ping. Sebastian
On Tue, 2023-04-18 at 17:18 +0200, Sebastian Andrzej Siewior wrote: > On 2023-03-22 17:27:21 [+0100], To Thomas Gleixner wrote: > > blk_flush_plug() is invoked on schedule() to flush out the IO progress > > that has been made so far so that it is globally visible. This is > > important to avoid deadlocks because a lock owner can wait for IO. > > Therefore the IO must be first flushed before a thread can block on a > > lock. > > > > The plug flush routine can acquire a sleeping lock which is contended. > > Blocking on a lock requires an assignment to task_struct::pi_blocked_on. > > If blk_flush_plug() is invoked from the slow path on schedule() then the > > variable is already set and will be overwritten by the lock in > > blk_flush_plug(). > > Therefore it is needed to invoke blk_flush_plug() (and block on > > potential locks in the process) before the blocking on the actual lock. > > > > Invoke blk_flush_plug() before blocking on a sleeping lock. The > > PREEMPT_RT only sleeping locks (spinlock_t and rwlock_t) are excluded > > because their slow path does not invoke blk_flush_plug(). > > > > Fixes: e17ba59b7e8e1 ("locking/rtmutex: Guard regular sleeping locks > > specific functions") > > Reported-by: Crystal Wood <swood@redhat.com> > > Link: > > https://lore.kernel.org/4b4ab374d3e24e6ea8df5cadc4297619a6d945af.camel@redhat.com > > Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> > > polite ping. Sorry, got distracted. It does appear to make the asserts go away but I ran into other possibly related stalls when running rteval -- but if I remove both the asserts and your patch it hangs in a different way, so I need to investigate some more. -Crystal
On 2023-04-18 18:43:48 [-0500], Crystal Wood wrote: > Sorry, got distracted. It does appear to make the asserts go away but I ran > into other possibly related stalls when running rteval -- but if I remove > both the asserts and your patch it hangs in a different way, so I need to > investigate some more. So all is fine with the patch or does it still hang? I tried "make runit" runtime set to 1h and it passed. The filesystem is XFS on a single SATA device in case it matters. > -Crystal Sebastian
On Wed, Mar 22 2023 at 17:27, Sebastian Andrzej Siewior wrote: >> This still leaves the problem vs. io_wq_worker_sleeping() and it's >> running() counterpart after schedule(). > > io_wq_worker_sleeping() has a kfree() so it probably should be moved, > too. > io_wq_worker_running() is a OR and INC and is fine. Why is io_wq_worker_sleeping() not cured in the same way? Just because it did not yet result in a splat? Why not just expose sched_submit_work()? > --- a/kernel/locking/rwbase_rt.c > +++ b/kernel/locking/rwbase_rt.c > @@ -143,6 +143,14 @@ static __always_inline int rwbase_read_lock(struct rwbase_rt *rwb, > if (rwbase_read_trylock(rwb)) > return 0; > > + if (state != TASK_RTLOCK_WAIT) { Bah. That code has explicit rwbase_foo() helpers which are filled in by rwsem and rwlock. Making this conditional on state is creative at best. > + /* > + * If we are going to sleep and we have plugged IO queued, > + * make sure to submit it to avoid deadlocks. > + */ > + blk_flush_plug(current->plug, true); > + } > + > return __rwbase_read_lock(rwb, state); > } > > diff --git a/kernel/locking/ww_rt_mutex.c b/kernel/locking/ww_rt_mutex.c > index d1473c624105c..472e3622abf09 100644 > --- a/kernel/locking/ww_rt_mutex.c > +++ b/kernel/locking/ww_rt_mutex.c > @@ -67,6 +67,11 @@ __ww_rt_mutex_lock(struct ww_mutex *lock, struct ww_acquire_ctx *ww_ctx, > ww_mutex_set_context_fastpath(lock, ww_ctx); > return 0; > } > + /* > + * If we are going to sleep and we have plugged IO queued, make sure to > + * submit it to avoid deadlocks. > + */ > + blk_flush_plug(current->plug, true); > > ret = rt_mutex_slowlock(&rtm->rtmutex, ww_ctx, state); This hunk can be avoided by moving the submit work invocation to rt_mutex_slowlock(). Thanks, tglx
On Wed, 2023-04-19 at 16:04 +0200, Sebastian Andrzej Siewior wrote: > On 2023-04-18 18:43:48 [-0500], Crystal Wood wrote: > > Sorry, got distracted. It does appear to make the asserts go away but I > > ran > > into other possibly related stalls when running rteval -- but if I > > remove > > both the asserts and your patch it hangs in a different way, so I need > > to > > investigate some more. > > So all is fine with the patch or does it still hang? It hangs either way, but with different symptoms. With unmodified v6.3-rc7-rt9 I get this very quickly after starting rteval: [ 178.366305] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks: [ 178.366325] rcu: Tasks blocked on level-1 rcu_node (CPUs 80-95): P819/1:b..l [ 178.366335] rcu: Tasks blocked on level-1 rcu_node (CPUs 128-143): P1220/2:b..l P1292/2:b..l [ 178.366343] rcu: Tasks blocked on level-1 rcu_node (CPUs 224-239): P2063/2:b..l [ 178.366361] rcu: (detected by 148, t=60002 jiffies, g=19421, q=128303 ncpus=256) [ 178.366368] task:irq_work/88 state:R running task stack:0 pid:819 ppid:2 flags:0x00004000 [ 178.366375] Call Trace: [ 178.366378] <TASK> [ 178.366385] __schedule+0x1fe/0x590 [ 178.366400] ? _raw_spin_lock+0x13/0x40 [ 178.366406] preempt_schedule_irq+0x39/0x60 [ 178.366409] asm_sysvec_apic_timer_interrupt+0x16/0x20 [ 178.366418] RIP: 0010:__rcu_read_unlock+0xe/0x30 [ 178.366426] Code: 00 66 66 2e 0f 1f 84 00 00 00 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 0f 1f 44 00 00 65 48 8b 3c 25 40 18 03 00 <83> af 74 08 00 00 01 75 0a 8b 87 78 08 00 00 85 c0 75 05 c3 cc cc [ 178.366428] RSP: 0018:ffffa1218ecfbec0 EFLAGS: 00000246 [ 178.366431] RAX: 0000000000000001 RBX: ffff944c0e032f48 RCX: 0000000000000000 [ 178.366432] RDX: ffff942c173099c0 RSI: ffffffff9b997992 RDI: ffff942c173099c0 [ 178.366433] RBP: ffff942c162e4bb0 R08: ffff942c1730a2f8 R09: ffff942c1730a2f8 [ 178.366434] R10: 0000000000000282 R11: 0000000000000000 R12: 0000000000000000 [ 178.366435] R13: 0000000000000001 R14: 0000000000000058 R15: ffff940cc8229500 [ 178.366441] rcuwait_wake_up+0x2e/0x40 [ 178.366449] run_irq_workd+0x39/0x50 [ 178.366456] smpboot_thread_fn+0x1c3/0x2b0 [ 178.366463] ? __pfx_smpboot_thread_fn+0x10/0x10 [ 178.366465] kthread+0x108/0x130 [ 178.366469] ? __pfx_kthread+0x10/0x10 [ 178.366470] ret_from_fork+0x2c/0x50 [ 178.366480] </TASK> [ 178.366481] task:rcuc/140 state:R running task stack:0 pid:1292 ppid:2 flags:0x00004000 [ 178.366484] Call Trace: [ 178.366485] <TASK> [ 178.366486] __schedule+0x1fe/0x590 [ 178.366489] preempt_schedule_irq+0x39/0x60 [ 178.366491] asm_sysvec_apic_timer_interrupt+0x16/0x20 [ 178.366493] RIP: 0010:rcu_cblist_dequeue+0x15/0x30 [ 178.366498] Code: 0f 1f 44 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 0f 1f 44 00 00 48 8b 07 48 85 c0 74 10 48 83 6f 10 01 48 8b 10 <48> 89 17 48 85 d2 74 05 c3 cc cc cc cc 48 89 7f 08 c3 cc cc cc cc [ 178.366500] RSP: 0018:ffffa1218fc93e10 EFLAGS: 00000202 [ 178.366501] RAX: ffff940f12aedb80 RBX: 00000000000000c6 RCX: 0000000000000000 [ 178.366501] RDX: ffff940f12aec500 RSI: ffffffff9b94e60e RDI: ffffa1218fc93e28 [ 178.366502] RBP: ffff940cc88c3380 R08: 0000000000000001 R09: ffffffff9a5c1ea3 [ 178.366503] R10: 00000000000002c8 R11: 0000000001a6b00f R12: ffff942b8e132f00 [ 178.366504] R13: 00000000000000c5 R14: 0000000000000000 R15: ffff942b8e132f78 [ 178.366505] ? rcu_do_batch+0x1a3/0x4e0 [ 178.366508] rcu_do_batch+0x17f/0x4e0 [ 178.366511] rcu_core+0x2d4/0x4a0 [ 178.366514] rcu_cpu_kthread+0xb4/0x2f0 [ 178.366517] smpboot_thread_fn+0x1c3/0x2b0 [ 178.366519] ? __pfx_smpboot_thread_fn+0x10/0x10 [ 178.366521] kthread+0x108/0x130 [ 178.366522] ? __pfx_kthread+0x10/0x10 [ 178.366523] ret_from_fork+0x2c/0x50 [ 178.366527] </TASK> [ 178.366528] task:rcuc/132 state:R running task stack:0 pid:1220 ppid:2 flags:0x00004000 [ 178.366530] Call Trace: [ 178.366531] <TASK> [ 178.366532] __schedule+0x1fe/0x590 [ 178.366534] preempt_schedule_irq+0x39/0x60 [ 178.366535] asm_sysvec_apic_timer_interrupt+0x16/0x20 [ 178.366537] RIP: 0010:kmem_cache_free+0xf6/0x430 [ 178.366542] Code: 03 0d 76 de 25 01 48 29 c8 89 c1 48 0f af f1 0f b6 4d 24 48 c1 ee 20 29 f0 d3 e8 0f b6 4d 25 01 f0 d3 e8 48 8d 0c c2 4c 8b 39 <4d> 85 ff 74 75 48 c7 01 00 00 00 00 8b 75 18 4c 89 ff 48 83 c6 08 [ 178.366543] RSP: 0018:ffffa1218fa33de0 EFLAGS: 00000212 [ 178.366544] RAX: 0000000000000019 RBX: 0000000000000006 RCX: ffff940fd063c0c8 [ 178.366545] RDX: ffff940fd063c000 RSI: 00000000000012c0 RDI: ffffffc206e93f40 [ 178.366546] RBP: ffff940cc01f6f00 R08: 0000000000000001 R09: ffffffff9a5c1ea3 [ 178.366546] R10: 000000000000004d R11: 00000000000000b5 R12: ffff940e13e69f40 [ 178.366547] R13: ffffffff9a5c1ea3 R14: ffffcdd8894f9a00 R15: ffff940d62b27500 [ 178.366548] ? rcu_do_batch+0x1a3/0x4e0 [ 178.366550] ? rcu_do_batch+0x1a3/0x4e0 [ 178.366553] rcu_do_batch+0x1a3/0x4e0 [ 178.366555] rcu_core+0x2d4/0x4a0 [ 178.366557] rcu_cpu_kthread+0xb4/0x2f0 [ 178.366559] smpboot_thread_fn+0x1c3/0x2b0 [ 178.366561] ? __pfx_smpboot_thread_fn+0x10/0x10 [ 178.366564] kthread+0x108/0x130 [ 178.366565] ? __pfx_kthread+0x10/0x10 [ 178.366566] ret_from_fork+0x2c/0x50 [ 178.366570] </TASK> [ 178.366571] task:rcuc/225 state:R running task stack:0 pid:2063 ppid:2 flags:0x00004000 [ 178.366574] Call Trace: [ 178.366575] <TASK> [ 178.366577] __schedule+0x1fe/0x590 [ 178.366579] preempt_schedule_irq+0x39/0x60 [ 178.366581] asm_sysvec_apic_timer_interrupt+0x16/0x20 [ 178.366583] RIP: 0010:rcu_cblist_dequeue+0x15/0x30 [ 178.366585] Code: 0f 1f 44 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 0f 1f 44 00 00 48 8b 07 48 85 c0 74 10 48 83 6f 10 01 48 8b 10 <48> 89 17 48 85 d2 74 05 c3 cc cc cc cc 48 89 7f 08 c3 cc cc cc cc [ 178.366586] RSP: 0018:ffffa121915ffe10 EFLAGS: 00000216 [ 178.366587] RAX: ffff942d4dd9bd40 RBX: 000000000000001f RCX: 0000000000000000 [ 178.366588] RDX: ffff942d4dd99040 RSI: ffffffff9b94e60e RDI: ffffa121915ffe28 [ 178.366588] RBP: ffff942c18908000 R08: 0000000000000001 R09: ffffffff9a5c1ea3 [ 178.366589] R10: 000000000000005e R11: 000000000000000b R12: ffff944c0f272f00 [ 178.366590] R13: 000000000000001e R14: 0000000000000000 R15: ffff944c0f272f78 [ 178.366591] ? rcu_do_batch+0x1a3/0x4e0 [ 178.366593] rcu_do_batch+0x17f/0x4e0 [ 178.366595] rcu_core+0x2d4/0x4a0 [ 178.366598] rcu_cpu_kthread+0xb4/0x2f0 [ 178.366600] smpboot_thread_fn+0x1c3/0x2b0 [ 178.366602] ? __pfx_smpboot_thread_fn+0x10/0x10 [ 178.366604] kthread+0x108/0x130 [ 178.366605] ? __pfx_kthread+0x10/0x10 [ 178.366606] ret_from_fork+0x2c/0x50 [ 178.366610] </TASK> [ 178.366611] rcu: rcu_preempt kthread timer wakeup didn't happen for 58989 jiffies! g19421 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402 [ 178.366613] rcu: Possible timer handling issue on cpu=144 timer-softirq=91 [ 178.366615] rcu: rcu_preempt kthread starved for 58993 jiffies! g19421 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402 ->cpu=144 [ 178.366617] rcu: Unless rcu_preempt kthread gets sufficient CPU time, OOM is now expected behavior. [ 178.366618] rcu: RCU grace-period kthread stack dump: [ 178.366619] task:rcu_preempt state:I stack:0 pid:19 ppid:2 flags:0x00004000 [ 178.366620] Call Trace: [ 178.366621] <TASK> [ 178.366622] __schedule+0x1fe/0x590 [ 178.366624] schedule+0x5d/0xc0 [ 178.366626] schedule_timeout+0x88/0x190 [ 178.366629] ? __pfx_process_timeout+0x10/0x10 [ 178.366634] rcu_gp_fqs_loop+0x114/0x480 [ 178.366639] rcu_gp_kthread+0xf8/0x1c0 [ 178.366642] ? __pfx_rcu_gp_kthread+0x10/0x10 [ 178.366644] kthread+0x108/0x130 [ 178.366645] ? __pfx_kthread+0x10/0x10 [ 178.366646] ret_from_fork+0x2c/0x50 [ 178.366650] </TASK> [ 178.366650] rcu: Stack dump where RCU GP kthread last ran: [ 178.366651] Sending NMI from CPU 148 to CPUs 144: [ 178.366661] NMI backtrace for cpu 144 skipped: idling at poll_idle+0x5a/0xa4 -Crystal
diff --git a/kernel/locking/rtmutex.c b/kernel/locking/rtmutex.c index 728f434de2bbf..c1bc2cb1522cb 100644 --- a/kernel/locking/rtmutex.c +++ b/kernel/locking/rtmutex.c @@ -23,6 +23,7 @@ #include <linux/sched/rt.h> #include <linux/sched/wake_q.h> #include <linux/ww_mutex.h> +#include <linux/blkdev.h> #include <trace/events/lock.h> @@ -1700,6 +1701,12 @@ static __always_inline int __rt_mutex_lock(struct rt_mutex_base *lock, if (likely(rt_mutex_cmpxchg_acquire(lock, NULL, current))) return 0; + /* + * If we are going to sleep and we have plugged IO queued, make sure to + * submit it to avoid deadlocks. + */ + blk_flush_plug(current->plug, true); + return rt_mutex_slowlock(lock, NULL, state); } #endif /* RT_MUTEX_BUILD_MUTEX */ diff --git a/kernel/locking/rwbase_rt.c b/kernel/locking/rwbase_rt.c index c201aadb93017..70c08ec4ad8af 100644 --- a/kernel/locking/rwbase_rt.c +++ b/kernel/locking/rwbase_rt.c @@ -143,6 +143,14 @@ static __always_inline int rwbase_read_lock(struct rwbase_rt *rwb, if (rwbase_read_trylock(rwb)) return 0; + if (state != TASK_RTLOCK_WAIT) { + /* + * If we are going to sleep and we have plugged IO queued, + * make sure to submit it to avoid deadlocks. + */ + blk_flush_plug(current->plug, true); + } + return __rwbase_read_lock(rwb, state); } diff --git a/kernel/locking/ww_rt_mutex.c b/kernel/locking/ww_rt_mutex.c index d1473c624105c..472e3622abf09 100644 --- a/kernel/locking/ww_rt_mutex.c +++ b/kernel/locking/ww_rt_mutex.c @@ -67,6 +67,11 @@ __ww_rt_mutex_lock(struct ww_mutex *lock, struct ww_acquire_ctx *ww_ctx, ww_mutex_set_context_fastpath(lock, ww_ctx); return 0; } + /* + * If we are going to sleep and we have plugged IO queued, make sure to + * submit it to avoid deadlocks. + */ + blk_flush_plug(current->plug, true); ret = rt_mutex_slowlock(&rtm->rtmutex, ww_ctx, state);