Message ID | 20240104054030.14733-1-xuewen.yan@unisoc.com |
---|---|
State | New |
Headers |
From: Xuewen Yan <xuewen.yan@unisoc.com>
To: peterz@infradead.org, mingo@redhat.com, will@kernel.org
Cc: longman@redhat.com, boqun.feng@gmail.com, ke.wang@unisoc.com, zhiguo.niu@unisoc.com, linux-kernel@vger.kernel.org
Subject: [PATCH] lock/lockdep: Add missing graph_unlock in validate_chain
Date: Thu, 4 Jan 2024 13:40:30 +0800
Message-ID: <20240104054030.14733-1-xuewen.yan@unisoc.com> |
Series | lock/lockdep: Add missing graph_unlock in validate_chain |
Commit Message
Xuewen Yan
Jan. 4, 2024, 5:40 a.m. UTC
lookup_chain_cache_add() takes the graph_lock, but validate_chain() does not release it before returning 0. So add a graph_unlock() before the return.

Signed-off-by: Xuewen Yan <xuewen.yan@unisoc.com>
Signed-off-by: Zhiguo Niu <zhiguo.niu@unisoc.com>
---
 kernel/locking/lockdep.c | 11 +++++++----
 1 file changed, 7 insertions(+), 4 deletions(-)
Comments
Hi,

On Thu, Jan 04, 2024 at 01:40:30PM +0800, Xuewen Yan wrote:
> The lookup_chain_cache_add will get graph_lock, but the
> validate_chain do not unlock before return 0.

Thanks for looking into this, a few comments below:

> So add graph_unlock before return 0.
>
> Signed-off-by: Xuewen Yan <xuewen.yan@unisoc.com>
> Signed-off-by: Zhiguo Niu <zhiguo.niu@unisoc.com>
> ---
>  kernel/locking/lockdep.c | 11 +++++++----
>  1 file changed, 7 insertions(+), 4 deletions(-)
>
> diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
> index 151bd3de5936..24995e1ebc62 100644
> --- a/kernel/locking/lockdep.c
> +++ b/kernel/locking/lockdep.c
> @@ -3855,8 +3855,11 @@ static int validate_chain(struct task_struct *curr,
>  	 */
>  	int ret = check_deadlock(curr, hlock);
>
> -	if (!ret)
> +	if (!ret) {
> +		graph_unlock();

Note that when check_deadlock() returns 0, there is a print_deadlock_bug() before the return, so I think it covers the graph_unlock() (see debug_locks_off_graph_unlock()).

>  		return 0;
> +	}
> +
>  	/*
>  	 * Add dependency only if this lock is not the head
>  	 * of the chain, and if the new lock introduces no more
> @@ -3865,9 +3868,9 @@ static int validate_chain(struct task_struct *curr,
>  	 * serializes nesting locks), see the comments for
>  	 * check_deadlock().
>  	 */
> -	if (!chain_head && ret != 2) {
> -		if (!check_prevs_add(curr, hlock))
> -			return 0;
> +	if (!chain_head && ret != 2 && !check_prevs_add(curr, hlock)) {
> +		graph_unlock();

This part is interesting. Usually when an internal function in lockdep returns 0, it means there is an error (either a deadlock or an internal error), which means a print_*() would be called, and the graph lock will be unlocked in that print_*(). However, in check_prevs_add() there is one condition where it will return 0 without any print_*(), that is, in check_prev_add():

	/* <prev> is not found in <next>::locks_before */
	return 0;

It's an internal error where <next> is in the <prev>::locks_after list but <prev> is not in the <next>::locks_before list, which should seldom happen: it's dead code. If you put a graph_unlock() before that return, I think it covers all the cases, unless I'm missing something subtle.

Are you hitting a real issue, or was this found by code reading?

Regards,
Boqun

> +		return 0;
>  	}
>
>  	graph_unlock();
> --
> 2.25.1
>
Hi

On Fri, Jan 5, 2024 at 3:44 AM Boqun Feng <boqun.feng@gmail.com> wrote:
>
> Hi,
>
> On Thu, Jan 04, 2024 at 01:40:30PM +0800, Xuewen Yan wrote:
> > The lookup_chain_cache_add will get graph_lock, but the
> > validate_chain do not unlock before return 0.
> >
>
> Thanks for looking into this, a few comments below:
>
> > So add graph_unlock before return 0.
> >
> > Signed-off-by: Xuewen Yan <xuewen.yan@unisoc.com>
> > Signed-off-by: Zhiguo Niu <zhiguo.niu@unisoc.com>
> > ---
> >  kernel/locking/lockdep.c | 11 +++++++----
> >  1 file changed, 7 insertions(+), 4 deletions(-)
> >
> > diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
> > index 151bd3de5936..24995e1ebc62 100644
> > --- a/kernel/locking/lockdep.c
> > +++ b/kernel/locking/lockdep.c
> > @@ -3855,8 +3855,11 @@ static int validate_chain(struct task_struct *curr,
> >  	 */
> >  	int ret = check_deadlock(curr, hlock);
> >
> > -	if (!ret)
> > +	if (!ret) {
> > +		graph_unlock();
>
> Note that when check_deadlock() returns 0, there is a
> print_deadlock_bug() before the return, so I think it covers the
> graph_unlock() (see debug_locks_off_graph_unlock()).

Yes, I did not look at check_deadlock's details carefully.

> >  		return 0;
> > +	}
> > +
> >  	/*
> >  	 * Add dependency only if this lock is not the head
> >  	 * of the chain, and if the new lock introduces no more
> > @@ -3865,9 +3868,9 @@ static int validate_chain(struct task_struct *curr,
> >  	 * serializes nesting locks), see the comments for
> >  	 * check_deadlock().
> >  	 */
> > -	if (!chain_head && ret != 2) {
> > -		if (!check_prevs_add(curr, hlock))
> > -			return 0;
> > +	if (!chain_head && ret != 2 && !check_prevs_add(curr, hlock)) {
> > +		graph_unlock();
>
> This part is interesting. Usually when an internal function in lockdep
> returns 0, it means there is an error (either a deadlock or an internal
> error), which means a print_*() would be called, and the graph lock
> will be unlocked in that print_*(). However, in check_prevs_add() there
> is one condition where it will return 0 without any print_*(), that is,
> in check_prev_add():
>
> 	/* <prev> is not found in <next>::locks_before */
> 	return 0;
>
> It's an internal error where <next> is in the <prev>::locks_after list
> but <prev> is not in the <next>::locks_before list, which should seldom
> happen: it's dead code. If you put a graph_unlock() before that return,
> I think it covers all the cases, unless I'm missing something subtle.

If only this condition does not unlock, it is indeed better to put the graph_unlock() there. I will change the patch in v2.

>
> Are you hitting a real issue, or was this found by code reading?

Indeed, we hit a real issue: one CPU did not call graph_unlock and, as a result, deadlocked with the other CPUs, because any CPU calling raw_spin_lock would take the graph_lock first.

Thanks!

--
BR
xuewen

> Regards,
> Boqun
>
> > +		return 0;
> >  	}
> >
> >  	graph_unlock();
> > --
> > 2.25.1
> >
Hello,

kernel test robot noticed "WARNING:at_kernel/locking/lockdep.c:#lock_acquire" on:

commit: 1033f8642b6ca8652763f95013e30ff1c220f3c9 ("[PATCH] lock/lockdep: Add missing graph_unlock in validate_chain")
url: https://github.com/intel-lab-lkp/linux/commits/Xuewen-Yan/lock-lockdep-Add-missing-graph_unlock-in-validate_chain/20240104-134335
base: https://git.kernel.org/cgit/linux/kernel/git/tip/tip.git 67a1723344cfe05430977483d6d3c7a999480143
patch link: https://lore.kernel.org/all/20240104054030.14733-1-xuewen.yan@unisoc.com/
patch subject: [PATCH] lock/lockdep: Add missing graph_unlock in validate_chain

in testcase: boot

compiler: clang-17
test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 16G

(please refer to attached dmesg/kmsg for entire log/backtrace)

+---------------------------------------------------------+------------+------------+
|                                                         | 67a1723344 | 1033f8642b |
+---------------------------------------------------------+------------+------------+
| WARNING:at_kernel/locking/lockdep.c:#lock_acquire       | 0          | 8          |
| EIP:lock_acquire                                        | 0          | 8          |
+---------------------------------------------------------+------------+------------+

If you fix the issue in a separate patch/commit (i.e.
not just a new version of the same patch/commit), kindly add following tags | Reported-by: kernel test robot <oliver.sang@intel.com> | Closes: https://lore.kernel.org/oe-lkp/202401082116.303fea3-oliver.sang@intel.com [ 0.755622][ T0] ------------[ cut here ]------------ [ 0.756871][ T0] WARNING: CPU: 0 PID: 0 at kernel/locking/lockdep.c:467 lock_acquire (kernel/locking/lockdep.c:467) [ 0.757950][ T0] Modules linked in: [ 0.758418][ T0] CPU: 0 PID: 0 Comm: swapper Not tainted 6.7.0-rc8-00007-g1033f8642b6c #1 [ 0.759457][ T0] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014 [ 0.760652][ T0] EIP: lock_acquire (kernel/locking/lockdep.c:467) [ 0.761215][ T0] Code: 03 c1 e3 10 8b 45 0c 83 e0 01 c1 e0 12 01 d8 05 00 40 08 00 89 45 e8 8d 55 d4 89 f9 56 e8 47 00 00 00 83 c4 04 e9 67 ff ff ff <0f> 0b b8 c8 51 6a c2 e8 b7 66 f7 00 c7 05 e4 50 8b c2 00 00 00 00 All code ======== 0: 03 c1 add %ecx,%eax 2: e3 10 jrcxz 0x14 4: 8b 45 0c mov 0xc(%rbp),%eax 7: 83 e0 01 and $0x1,%eax a: c1 e0 12 shl $0x12,%eax d: 01 d8 add %ebx,%eax f: 05 00 40 08 00 add $0x84000,%eax 14: 89 45 e8 mov %eax,-0x18(%rbp) 17: 8d 55 d4 lea -0x2c(%rbp),%edx 1a: 89 f9 mov %edi,%ecx 1c: 56 push %rsi 1d: e8 47 00 00 00 call 0x69 22: 83 c4 04 add $0x4,%esp 25: e9 67 ff ff ff jmp 0xffffffffffffff91 2a:* 0f 0b ud2 <-- trapping instruction 2c: b8 c8 51 6a c2 mov $0xc26a51c8,%eax 31: e8 b7 66 f7 00 call 0xf766ed 36: c7 05 e4 50 8b c2 00 movl $0x0,-0x3d74af1c(%rip) # 0xffffffffc28b5124 3d: 00 00 00 Code starting with the faulting instruction =========================================== 0: 0f 0b ud2 2: b8 c8 51 6a c2 mov $0xc26a51c8,%eax 7: e8 b7 66 f7 00 call 0xf766c3 c: c7 05 e4 50 8b c2 00 movl $0x0,-0x3d74af1c(%rip) # 0xffffffffc28b50fa 13: 00 00 00 [ 0.763535][ T0] EAX: 00000000 EBX: 00200246 ECX: 00000000 EDX: 00000000 [ 0.764339][ T0] ESI: 00000000 EDI: c2c299b4 EBP: c2877f3c ESP: c2877f10 [ 0.765192][ T0] DS: 007b ES: 007b FS: 0000 GS: 0000 SS: 0068 EFLAGS: 
00210097 [ 0.766057][ T0] CR0: 80050033 CR2: ffd96000 CR3: 02e96000 CR4: 00000090 [ 0.766888][ T0] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000 [ 0.767677][ T0] DR6: fffe0ff0 DR7: 00000400 [ 0.768224][ T0] Call Trace: [ 0.768615][ T0] ? show_regs (arch/x86/kernel/dumpstack.c:478) [ 0.769115][ T0] ? __warn (kernel/panic.c:236 kernel/panic.c:677) [ 0.769584][ T0] ? lock_acquire (kernel/locking/lockdep.c:467) [ 0.770147][ T0] ? report_bug (lib/bug.c:199) [ 0.770697][ T0] ? exc_overflow (arch/x86/kernel/traps.c:250) [ 0.771253][ T0] ? handle_bug (arch/x86/kernel/traps.c:237) [ 0.771781][ T0] ? exc_invalid_op (arch/x86/kernel/traps.c:258) [ 0.772375][ T0] ? handle_exception (arch/x86/entry/entry_32.S:1056) [ 0.772953][ T0] ? exc_overflow (arch/x86/kernel/traps.c:250) [ 0.773465][ T0] ? lock_acquire (kernel/locking/lockdep.c:467) [ 0.774005][ T0] ? exc_overflow (arch/x86/kernel/traps.c:250) [ 0.774531][ T0] ? lock_acquire (kernel/locking/lockdep.c:467) [ 0.775079][ T0] ? vprintk_emit (arch/x86/include/asm/preempt.h:104 kernel/printk/printk.c:3899 kernel/printk/printk.c:3914 kernel/printk/printk.c:2310) [ 0.775628][ T0] _raw_spin_lock (include/linux/spinlock_api_smp.h:133 kernel/locking/spinlock.c:154) [ 0.776177][ T0] ? AA_spin (lib/locking-selftest.c:310) [ 0.776678][ T0] AA_spin (lib/locking-selftest.c:310) [ 0.777139][ T0] dotest (lib/locking-selftest.c:1455) [ 0.777627][ T0] ? 
_printk (kernel/printk/printk.c:2331) [ 0.778068][ T0] locking_selftest (lib/locking-selftest.c:2821) [ 0.778571][ T0] start_kernel (init/main.c:1023) [ 0.779072][ T0] i386_start_kernel (arch/x86/kernel/head32.c:79) [ 0.779655][ T0] startup_32_smp (arch/x86/kernel/head_32.S:295) [ 0.780199][ T0] irq event stamp: 221 [ 0.780678][ T0] hardirqs last enabled at (221): console_unlock (arch/x86/include/asm/irqflags.h:19 arch/x86/include/asm/irqflags.h:67 arch/x86/include/asm/irqflags.h:127 kernel/printk/printk.c:341 kernel/printk/printk.c:2706 kernel/printk/printk.c:3038) [ 0.781708][ T0] hardirqs last disabled at (220): console_unlock (kernel/printk/printk.c:339) [ 0.782687][ T0] softirqs last enabled at (0): 0x0 [ 0.783367][ T0] softirqs last disabled at (0): 0x0 [ 0.783995][ T0] ---[ end trace 0000000000000000 ]--- The kernel config and materials to reproduce are available at: https://download.01.org/0day-ci/archive/20240108/202401082116.303fea3-oliver.sang@intel.com
On Fri, Jan 05, 2024 at 12:46:36PM +0800, Xuewen Yan wrote:
[...]
> > Are you hitting a real issue or this is found by code reading?
>
> Indeed, we hit a real issue:
> One cpu did not call graph_unlock, as a result, caused a deadlock with
> other cpus,
> because any cpu calling raw_spin_lock would get the graph_lock first.
>

Could you share more details about the real issue you hit? For example, a serial log? I ask because although the graph_unlock() makes logical sense, that path should really not be hit if lockdep works correctly.

Thanks!

Regards,
Boqun

> Thanks!
Hi boqun On Tue, Jan 9, 2024 at 12:30 AM Boqun Feng <boqun.feng@gmail.com> wrote: > > On Fri, Jan 05, 2024 at 12:46:36PM +0800, Xuewen Yan wrote: > [...] > > > > > > Are you hitting a real issue or this is found by code reading? > > > > Indeed, we hit a real issue: > > One cpu did not call graph_unlock, as a result, caused a deadlock with > > other cpus, > > because any cpu calling raw_spin_lock would get the graph_lock first. > > > > Could you share more details about the real issue you hit? For example, > serial log? I asked because although the graph_unlock() makes logical > sense, but that path should really not hit if lockdep works correctly. The following is our stack: first, there is a scenario in our kernel tree: get mutexA lock-->get rcu lock-->get mutexB lock As a result, there is a warning about the chain: [ 7.344848][ T1@C0] reboot: Restarting system with command 'userrequested,recovery' [ 7.354028][ T1@C0] [ 7.354358][ T1@C0] ============================= [ 7.354967][ T1@C0] [ BUG: Invalid wait context ] [ 7.355580][ T1@C0] 6.6.5-0-g53d4f6ea5e56-dirty-ab000013 #1 Tainted: G W [ 7.356720][ T1@C0] ----------------------------- [ 7.357326][ T1@C0] init/1 is trying to lock: [ 7.357894][ T1@C0] ffffffc07b0b5310 (kmsg_buf_lock){+.+.}-{3:3}, at: last_kmsg_handler+0x60/0xb8 [ 7.359244][ T1@C0] other info that might help us debug this: [ 7.359982][ T1@C0] context-{4:4} [ 7.360420][ T1@C0] 2 locks held by init/1: [ 7.360967][ T1@C0] #0: ffffffc08234d0b8 (system_transition_mutex){+.+.}-{3:3}, at: __arm64_sys_reboot+0x130/0x27c [ 7.362353][ T1@C0] #1: ffffffc0823c9a18 (rcu_read_lock){....}-{1:2}, at: rcu_lock_acquire+0x0/0x38 [ 7.363568][ T1@C0] stack backtrace: [ 7.364038][ T1@C0] CPU: 0 PID: 1 Comm: init Tainted: G W 6.6.5-android14-0-g53d4f6ea5e56-dirty-ab000013 #1 [ 7.365453][ T1@C0] Hardware name: Unisoc ***-base Board (DT) [ 7.366238][ T1@C0] Call trace: [ 7.366652][ T1@C0] dump_backtrace+0xf8/0x148 [ 7.367244][ T1@C0] show_stack+0x20/0x30 [ 7.367779][ 
T1@C0] dump_stack_lvl+0x60/0x84 [ 7.368372][ T1@C0] __lock_acquire+0xc2c/0x3288 [ 7.368982][ T1@C0] lock_acquire+0x124/0x2b0 [ 7.369556][ T1@C0] __mutex_lock+0xa0/0xbfc [ 7.370122][ T1@C0] mutex_lock_nested+0x2c/0x38 [ 7.370730][ T1@C0] last_kmsg_handler+0x60/0xb8 <<<<get mutex B [ 7.371494][ T1@C0] kmsg_dump+0xf0/0x16c <<<<rcu lock [ 7.372028][ T1@C0] kernel_restart+0x100/0x11c [ 7.372626][ T1@C0] __arm64_sys_reboot+0x1a8/0x27c [ 7.373270][ T1@C0] invoke_syscall+0x60/0x11c [ 7.373857][ T1@C0] el0_svc_common+0xb4/0xf0 [ 7.374434][ T1@C0] do_el0_svc+0x24/0x30 [ 7.374967][ T1@C0] el0_svc+0x50/0xe4 [ 7.375467][ T1@C0] el0t_64_sync_handler+0x68/0xbc [ 7.376109][ T1@C0] el0t_64_sync+0x1a8/0x1ac This warning should be caused by the order of taking the lock. But when CONFIG_PREEMPT_RCU is turned on, I think it is reasonable to take the mutex after taking the rcu-lock. Maybe the logic of check_wait_context needs to be modified? And then it occurs the following stack: core0 get rdp_gp->nocb_gp_lock and then try to get graph_lock: -000|queued_spin_lock_slowpath(lock = 0xFFFFFFC08291BBC8, val = 0) -001|queued_spin_lock() -001|lockdep_lock() -001|graph_lock() -002|lookup_chain_cache_add() -002|validate_chain() -003|lock_acquire(lock = 0xFFFFFF817F211D98, subclass = 0, trylock = 0, read = 0, check = 1, nest_lock = 0x0, ip = 18446743800981533176) -004|__raw_spin_lock_irqsave() | lock = 0xFFFFFF817F211D80 -> ( | raw_lock = ( | val = (counter = 0), | locked = 0, | pending = 0, | locked_pending = 0, | tail = 0), | magic = 3735899821, | owner_cpu = 4294967295, | owner = 0xFFFFFFFFFFFFFFFF, | dep_map = (key = 0xFFFFFFC082F12C88, class_cache = (0xFFFFFFC0825378E0, 0x0), name = 0xFFFFFFC081541039, wait_type_outer = 0, wait_type_inner = 2, lock_type = 0)) | flags = 0 | 110| -004|_raw_spin_lock_irqsave(lock = 0xFFFFFF817F211D80) -005|lock_timer_base(inline) -005|__mod_timer.llvm.7968396489078322347(timer = 0xFFFFFF817F2A88D0, expires = 4294893461, options = 0) -006|mod_timer(inline) 
-006|wake_nocb_gp_defer(inline)
-006|__call_rcu_nocb_wake(rdp = 0xFFFFFF817F2A8680, was_alldone = ?, flags = ?)
-007|__call_rcu_common(inline)
-007|call_rcu(head = 0xFFFFFFC0822E0B58, func = ?)
-008|call_rcu_hurry(inline)
-008|rcu_sync_call(inline)
-008|rcu_sync_func(rhp = 0xFFFFFFC0822E0B58)
-009|rcu_do_batch(rdp = 0xFFFFFF817F266680)
-010|nocb_cb_wait(inline)
-010|rcu_nocb_cb_kthread(arg = 0xFFFFFF817F266680)
-011|kthread(_create = 0xFFFFFF8080363740)
-012|ret_from_fork(asm)

However, the graph_lock is owned by core1, which in turn tries to get rdp_gp->nocb_gp_lock, and this caused the deadlock. But we do not see where that core failed to unlock the graph_lock:

-000|queued_spin_lock_slowpath(lock = 0xFFFFFF817F2A8A80, val = ?)
-001|queued_spin_lock(inline)
-001|do_raw_spin_lock(lock = 0xFFFFFF817F2A8A80)
-002|__raw_spin_lock_irqsave(inline)
-002|_raw_spin_lock_irqsave(lock = 0xFFFFFF817F2A8A80)
-003|wake_nocb_gp_defer(inline)
-003|__call_rcu_nocb_wake(rdp = 0xFFFFFF817F30B680, was_alldone = ?, flags = ?)
-004|__call_rcu_common(inline)
-004|call_rcu(head = 0xFFFFFFC082EECC28, func = ?)
-005|call_rcu_zapped(inline)
-005|free_zapped_rcu(ch = ?)
-006|rcu_do_batch(rdp = 0xFFFFFF817F245680)
-007|nocb_cb_wait(inline)
-007|rcu_nocb_cb_kthread(arg = 0xFFFFFF817F245680)
-008|kthread(_create = 0xFFFFFF80803122C0)
-009|ret_from_fork(asm)

Based on this, we read the lockdep code and found check_prev_add's logic. However, we can still reproduce the scenario even after adding the graph_unlock() in check_prev_add(), so this is not the root cause of our problem, and we are still debugging. We would be very grateful if you could give us some suggestions.

Thanks!

---
BRs
xuewen

> Thanks!
>
> Regards,
> Boqun
>
> > Thanks!
On Tue, Jan 09, 2024 at 10:55:11AM +0800, Xuewen Yan wrote: > Hi boqun > > On Tue, Jan 9, 2024 at 12:30 AM Boqun Feng <boqun.feng@gmail.com> wrote: > > > > On Fri, Jan 05, 2024 at 12:46:36PM +0800, Xuewen Yan wrote: > > [...] > > > > > > > > Are you hitting a real issue or this is found by code reading? > > > > > > Indeed, we hit a real issue: > > > One cpu did not call graph_unlock, as a result, caused a deadlock with > > > other cpus, > > > because any cpu calling raw_spin_lock would get the graph_lock first. > > > > > > > Could you share more details about the real issue you hit? For example, > > serial log? I asked because although the graph_unlock() makes logical > > sense, but that path should really not hit if lockdep works correctly. > > The following is our stack: > > first, there is a scenario in our kernel tree: > get mutexA lock-->get rcu lock-->get mutexB lock > As a result, there is a warning about the chain: > > [ 7.344848][ T1@C0] reboot: Restarting system with command > 'userrequested,recovery' > [ 7.354028][ T1@C0] > [ 7.354358][ T1@C0] ============================= > [ 7.354967][ T1@C0] [ BUG: Invalid wait context ] > [ 7.355580][ T1@C0] 6.6.5-0-g53d4f6ea5e56-dirty-ab000013 #1 > Tainted: G W > [ 7.356720][ T1@C0] ----------------------------- > [ 7.357326][ T1@C0] init/1 is trying to lock: > [ 7.357894][ T1@C0] ffffffc07b0b5310 (kmsg_buf_lock){+.+.}-{3:3}, > at: last_kmsg_handler+0x60/0xb8 > [ 7.359244][ T1@C0] other info that might help us debug this: > [ 7.359982][ T1@C0] context-{4:4} > [ 7.360420][ T1@C0] 2 locks held by init/1: > [ 7.360967][ T1@C0] #0: ffffffc08234d0b8 > (system_transition_mutex){+.+.}-{3:3}, at: > __arm64_sys_reboot+0x130/0x27c > [ 7.362353][ T1@C0] #1: ffffffc0823c9a18 > (rcu_read_lock){....}-{1:2}, at: rcu_lock_acquire+0x0/0x38 > [ 7.363568][ T1@C0] stack backtrace: > [ 7.364038][ T1@C0] CPU: 0 PID: 1 Comm: init Tainted: G W > 6.6.5-android14-0-g53d4f6ea5e56-dirty-ab000013 #1 > [ 7.365453][ T1@C0] Hardware name: Unisoc 
***-base Board (DT) > [ 7.366238][ T1@C0] Call trace: > [ 7.366652][ T1@C0] dump_backtrace+0xf8/0x148 > [ 7.367244][ T1@C0] show_stack+0x20/0x30 > [ 7.367779][ T1@C0] dump_stack_lvl+0x60/0x84 > [ 7.368372][ T1@C0] __lock_acquire+0xc2c/0x3288 > [ 7.368982][ T1@C0] lock_acquire+0x124/0x2b0 > [ 7.369556][ T1@C0] __mutex_lock+0xa0/0xbfc > [ 7.370122][ T1@C0] mutex_lock_nested+0x2c/0x38 > [ 7.370730][ T1@C0] last_kmsg_handler+0x60/0xb8 <<<<get mutex B > [ 7.371494][ T1@C0] kmsg_dump+0xf0/0x16c <<<<rcu lock > [ 7.372028][ T1@C0] kernel_restart+0x100/0x11c > [ 7.372626][ T1@C0] __arm64_sys_reboot+0x1a8/0x27c > [ 7.373270][ T1@C0] invoke_syscall+0x60/0x11c > [ 7.373857][ T1@C0] el0_svc_common+0xb4/0xf0 > [ 7.374434][ T1@C0] do_el0_svc+0x24/0x30 > [ 7.374967][ T1@C0] el0_svc+0x50/0xe4 > [ 7.375467][ T1@C0] el0t_64_sync_handler+0x68/0xbc > [ 7.376109][ T1@C0] el0t_64_sync+0x1a8/0x1ac > > This warning should be caused by the order of taking the lock. > But when CONFIG_PREEMPT_RCU is turned on, I think it is reasonable to > take the mutex after taking the rcu-lock. Thanks for the information, but... No, the rule of CONFIG_PREEMPT_RCU is allowing only implicit preemption in RCU read-side critical sections. So calling mutex after taking RCU read_lock() is illegal. See the comments before rcu_read_lock(): * In preemptible RCU implementations (PREEMPT_RCU) in CONFIG_PREEMPTION * kernel builds, RCU read-side critical sections may be preempted, * but explicit blocking is illegal. > Maybe the logic of check_wait_context needs to be modified? 
> > And then it occurs the following stack: > > core0 get rdp_gp->nocb_gp_lock and then try to get graph_lock: > > -000|queued_spin_lock_slowpath(lock = 0xFFFFFFC08291BBC8, val = 0) > -001|queued_spin_lock() > -001|lockdep_lock() > -001|graph_lock() > -002|lookup_chain_cache_add() > -002|validate_chain() > -003|lock_acquire(lock = 0xFFFFFF817F211D98, subclass = 0, trylock = > 0, read = 0, check = 1, nest_lock = 0x0, ip = 18446743800981533176) > -004|__raw_spin_lock_irqsave() > | lock = 0xFFFFFF817F211D80 -> ( > | raw_lock = ( > | val = (counter = 0), > | locked = 0, > | pending = 0, > | locked_pending = 0, > | tail = 0), > | magic = 3735899821, > | owner_cpu = 4294967295, > | owner = 0xFFFFFFFFFFFFFFFF, > | dep_map = (key = 0xFFFFFFC082F12C88, class_cache = > (0xFFFFFFC0825378E0, 0x0), name = 0xFFFFFFC081541039, wait_type_outer > = 0, wait_type_inner = 2, lock_type = 0)) > | flags = 0 > | > 110| > -004|_raw_spin_lock_irqsave(lock = 0xFFFFFF817F211D80) > -005|lock_timer_base(inline) > -005|__mod_timer.llvm.7968396489078322347(timer = 0xFFFFFF817F2A88D0, > expires = 4294893461, options = 0) > -006|mod_timer(inline) > -006|wake_nocb_gp_defer(inline) > -006|__call_rcu_nocb_wake(rdp = 0xFFFFFF817F2A8680, was_alldone = ?, flags = ?) > -007|__call_rcu_common(inline) > -007|call_rcu(head = 0xFFFFFFC0822E0B58, func = ?) > -008|call_rcu_hurry(inline) > -008|rcu_sync_call(inline) > -008|rcu_sync_func(rhp = 0xFFFFFFC0822E0B58) > -009|rcu_do_batch(rdp = 0xFFFFFF817F266680) > -010|nocb_cb_wait(inline) > -010|rcu_nocb_cb_kthread(arg = 0xFFFFFF817F266680) > -011|kthread(_create = 0xFFFFFF8080363740) > -012|ret_from_fork(asm) > > However, the grapg_lock is owned by core1, and it try to get > rdp_gp->nocb_gp_lock, and it caused the deadlock. > But we do not see where the core do not unlock the graph_lock: > Right, this is the key point. 
There is a static variable tracking the owner task of the graph lock:

	static struct task_struct *__owner;

Are you able to find some information from the coredump?

> -000|queued_spin_lock_slowpath(lock = 0xFFFFFF817F2A8A80, val = ?)
> -001|queued_spin_lock(inline)
> -001|do_raw_spin_lock(lock = 0xFFFFFF817F2A8A80)
> -002|__raw_spin_lock_irqsave(inline)
> -002|_raw_spin_lock_irqsave(lock = 0xFFFFFF817F2A8A80)
> -003|wake_nocb_gp_defer(inline)
> -003|__call_rcu_nocb_wake(rdp = 0xFFFFFF817F30B680, was_alldone = ?, flags = ?)
> -004|__call_rcu_common(inline)
> -004|call_rcu(head = 0xFFFFFFC082EECC28, func = ?)
> -005|call_rcu_zapped(inline)
> -005|free_zapped_rcu(ch = ?)
> -006|rcu_do_batch(rdp = 0xFFFFFF817F245680)
> -007|nocb_cb_wait(inline)
> -007|rcu_nocb_cb_kthread(arg = 0xFFFFFF817F245680)
> -008|kthread(_create = 0xFFFFFF80803122C0)
> -009|ret_from_fork(asm)
>
> Based on this, we read the lockdep code, and found the check_prev_add's logic.
>
> But now we reproduce the scenario although we add the graph_unlock in
> check_prev_add().

If you have a reliable reproducer, could you try the following debug code?

diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
index e85b5ad3e206..a313fcc78e8e 100644
--- a/kernel/locking/lockdep.c
+++ b/kernel/locking/lockdep.c
@@ -466,6 +466,8 @@ static __always_inline void lockdep_recursion_finish(void)
 {
 	if (WARN_ON_ONCE(__this_cpu_dec_return(lockdep_recursion)))
 		__this_cpu_write(lockdep_recursion, 0);
+
+	BUG_ON(__owner == current);
 }
 
 void lockdep_set_selftest_task(struct task_struct *task)

This may tell you which code path in lockdep forgot to unlock the graph lock.

> So this is not the root-cause of our problem. And we are also still debugging.

That matches my expectation: that "return 0" should never be hit, so the problem may be somewhere else.

Regards,
Boqun

> We would be very grateful if you could give us some suggestions.
>
> Thanks!
>
> ---
> BRs
> xuewen
>
> > > Thanks!
> >
> > Regards,
> > Boqun
> >
> > > Thanks!
Hi Boqun On Tue, Jan 9, 2024 at 1:35 PM Boqun Feng <boqun.feng@gmail.com> wrote: > > On Tue, Jan 09, 2024 at 10:55:11AM +0800, Xuewen Yan wrote: > > Hi boqun > > > > On Tue, Jan 9, 2024 at 12:30 AM Boqun Feng <boqun.feng@gmail.com> wrote: > > > > > > On Fri, Jan 05, 2024 at 12:46:36PM +0800, Xuewen Yan wrote: > > > [...] > > > > > > > > > > Are you hitting a real issue or this is found by code reading? > > > > > > > > Indeed, we hit a real issue: > > > > One cpu did not call graph_unlock, as a result, caused a deadlock with > > > > other cpus, > > > > because any cpu calling raw_spin_lock would get the graph_lock first. > > > > > > > > > > Could you share more details about the real issue you hit? For example, > > > serial log? I asked because although the graph_unlock() makes logical > > > sense, but that path should really not hit if lockdep works correctly. > > > > The following is our stack: > > > > first, there is a scenario in our kernel tree: > > get mutexA lock-->get rcu lock-->get mutexB lock > > As a result, there is a warning about the chain: > > > > [ 7.344848][ T1@C0] reboot: Restarting system with command > > 'userrequested,recovery' > > [ 7.354028][ T1@C0] > > [ 7.354358][ T1@C0] ============================= > > [ 7.354967][ T1@C0] [ BUG: Invalid wait context ] > > [ 7.355580][ T1@C0] 6.6.5-0-g53d4f6ea5e56-dirty-ab000013 #1 > > Tainted: G W > > [ 7.356720][ T1@C0] ----------------------------- > > [ 7.357326][ T1@C0] init/1 is trying to lock: > > [ 7.357894][ T1@C0] ffffffc07b0b5310 (kmsg_buf_lock){+.+.}-{3:3}, > > at: last_kmsg_handler+0x60/0xb8 > > [ 7.359244][ T1@C0] other info that might help us debug this: > > [ 7.359982][ T1@C0] context-{4:4} > > [ 7.360420][ T1@C0] 2 locks held by init/1: > > [ 7.360967][ T1@C0] #0: ffffffc08234d0b8 > > (system_transition_mutex){+.+.}-{3:3}, at: > > __arm64_sys_reboot+0x130/0x27c > > [ 7.362353][ T1@C0] #1: ffffffc0823c9a18 > > (rcu_read_lock){....}-{1:2}, at: rcu_lock_acquire+0x0/0x38 > > [ 7.363568][ T1@C0] 
> > stack backtrace:
> > [ 7.364038][ T1@C0] CPU: 0 PID: 1 Comm: init Tainted: G        W
> > 6.6.5-android14-0-g53d4f6ea5e56-dirty-ab000013 #1
> > [ 7.365453][ T1@C0] Hardware name: Unisoc ***-base Board (DT)
> > [ 7.366238][ T1@C0] Call trace:
> > [ 7.366652][ T1@C0]  dump_backtrace+0xf8/0x148
> > [ 7.367244][ T1@C0]  show_stack+0x20/0x30
> > [ 7.367779][ T1@C0]  dump_stack_lvl+0x60/0x84
> > [ 7.368372][ T1@C0]  __lock_acquire+0xc2c/0x3288
> > [ 7.368982][ T1@C0]  lock_acquire+0x124/0x2b0
> > [ 7.369556][ T1@C0]  __mutex_lock+0xa0/0xbfc
> > [ 7.370122][ T1@C0]  mutex_lock_nested+0x2c/0x38
> > [ 7.370730][ T1@C0]  last_kmsg_handler+0x60/0xb8  <<<< get mutex B
> > [ 7.371494][ T1@C0]  kmsg_dump+0xf0/0x16c         <<<< rcu lock
> > [ 7.372028][ T1@C0]  kernel_restart+0x100/0x11c
> > [ 7.372626][ T1@C0]  __arm64_sys_reboot+0x1a8/0x27c
> > [ 7.373270][ T1@C0]  invoke_syscall+0x60/0x11c
> > [ 7.373857][ T1@C0]  el0_svc_common+0xb4/0xf0
> > [ 7.374434][ T1@C0]  do_el0_svc+0x24/0x30
> > [ 7.374967][ T1@C0]  el0_svc+0x50/0xe4
> > [ 7.375467][ T1@C0]  el0t_64_sync_handler+0x68/0xbc
> > [ 7.376109][ T1@C0]  el0t_64_sync+0x1a8/0x1ac
> >
> > This warning should be caused by the order of taking the locks.
> > But when CONFIG_PREEMPT_RCU is enabled, I thought it was reasonable to
> > take the mutex after taking the rcu lock.
>
> Thanks for the information, but...
>
> No, the rule of CONFIG_PREEMPT_RCU is to allow only implicit preemption
> in RCU read-side critical sections. So taking a mutex after
> rcu_read_lock() is illegal.
>
> See the comments before rcu_read_lock():
>
>  * In preemptible RCU implementations (PREEMPT_RCU) in CONFIG_PREEMPTION
>  * kernel builds, RCU read-side critical sections may be preempted,
>  * but explicit blocking is illegal.

Thank you for your patient reply! And we will fix it in our tree.

>
> > Maybe the logic of check_wait_context needs to be modified?
> >
> > And then the following deadlock occurred:
> >
> > core0 takes rdp_gp->nocb_gp_lock and then tries to take the graph_lock:
> >
> > -000|queued_spin_lock_slowpath(lock = 0xFFFFFFC08291BBC8, val = 0)
> > -001|queued_spin_lock()
> > -001|lockdep_lock()
> > -001|graph_lock()
> > -002|lookup_chain_cache_add()
> > -002|validate_chain()
> > -003|lock_acquire(lock = 0xFFFFFF817F211D98, subclass = 0, trylock =
> > 0, read = 0, check = 1, nest_lock = 0x0, ip = 18446743800981533176)
> > -004|__raw_spin_lock_irqsave()
> >     |  lock = 0xFFFFFF817F211D80 -> (
> >     |    raw_lock = (
> >     |      val = (counter = 0),
> >     |      locked = 0,
> >     |      pending = 0,
> >     |      locked_pending = 0,
> >     |      tail = 0),
> >     |    magic = 3735899821,
> >     |    owner_cpu = 4294967295,
> >     |    owner = 0xFFFFFFFFFFFFFFFF,
> >     |    dep_map = (key = 0xFFFFFFC082F12C88, class_cache =
> > (0xFFFFFFC0825378E0, 0x0), name = 0xFFFFFFC081541039, wait_type_outer
> > = 0, wait_type_inner = 2, lock_type = 0))
> >     |  flags = 0
> >
> > -004|_raw_spin_lock_irqsave(lock = 0xFFFFFF817F211D80)
> > -005|lock_timer_base(inline)
> > -005|__mod_timer.llvm.7968396489078322347(timer = 0xFFFFFF817F2A88D0,
> > expires = 4294893461, options = 0)
> > -006|mod_timer(inline)
> > -006|wake_nocb_gp_defer(inline)
> > -006|__call_rcu_nocb_wake(rdp = 0xFFFFFF817F2A8680, was_alldone = ?, flags = ?)
> > -007|__call_rcu_common(inline)
> > -007|call_rcu(head = 0xFFFFFFC0822E0B58, func = ?)
> > -008|call_rcu_hurry(inline)
> > -008|rcu_sync_call(inline)
> > -008|rcu_sync_func(rhp = 0xFFFFFFC0822E0B58)
> > -009|rcu_do_batch(rdp = 0xFFFFFF817F266680)
> > -010|nocb_cb_wait(inline)
> > -010|rcu_nocb_cb_kthread(arg = 0xFFFFFF817F266680)
> > -011|kthread(_create = 0xFFFFFF8080363740)
> > -012|ret_from_fork(asm)
> >
> > However, the graph_lock is owned by core1, which in turn tries to take
> > rdp_gp->nocb_gp_lock, and that causes the deadlock.
> > But we cannot see where that core failed to unlock the graph_lock:
> >
>
> Right, this is the key point.
There is a static variable tracking the
> owner task of the graph lock:
>
>     static struct task_struct *__owner;
>
> Are you able to find some information from the coredump?
>
> > -000|queued_spin_lock_slowpath(lock = 0xFFFFFF817F2A8A80, val = ?)
> > -001|queued_spin_lock(inline)
> > -001|do_raw_spin_lock(lock = 0xFFFFFF817F2A8A80)
> > -002|__raw_spin_lock_irqsave(inline)
> > -002|_raw_spin_lock_irqsave(lock = 0xFFFFFF817F2A8A80)
> > -003|wake_nocb_gp_defer(inline)
> > -003|__call_rcu_nocb_wake(rdp = 0xFFFFFF817F30B680, was_alldone = ?, flags = ?)
> > -004|__call_rcu_common(inline)
> > -004|call_rcu(head = 0xFFFFFFC082EECC28, func = ?)
> > -005|call_rcu_zapped(inline)
> > -005|free_zapped_rcu(ch = ?)
> > -006|rcu_do_batch(rdp = 0xFFFFFF817F245680)
> > -007|nocb_cb_wait(inline)
> > -007|rcu_nocb_cb_kthread(arg = 0xFFFFFF817F245680)
> > -008|kthread(_create = 0xFFFFFF80803122C0)
> > -009|ret_from_fork(asm)
> >
> > Based on this, we read the lockdep code and found check_prev_add()'s logic.
> >
> > But we can still reproduce the scenario even though we added the
> > graph_unlock() in check_prev_add().
>
> If you have a reliable reproducer, could you try the following debug
> code?
>
> diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
> index e85b5ad3e206..a313fcc78e8e 100644
> --- a/kernel/locking/lockdep.c
> +++ b/kernel/locking/lockdep.c
> @@ -466,6 +466,8 @@ static __always_inline void lockdep_recursion_finish(void)
>  {
>         if (WARN_ON_ONCE(__this_cpu_dec_return(lockdep_recursion)))
>                 __this_cpu_write(lockdep_recursion, 0);
> +
> +       BUG_ON(__owner == current);
>  }
>
>  void lockdep_set_selftest_task(struct task_struct *task)
>
> This may tell you which code path in lockdep forgot to unlock the graph
> lock.

Thanks! We will try it!

>
> > So this is not the root cause of our problem. And we are also still debugging.
>
> That matches my expectation: that "return 0" should never be hit. So the
> problem may be somewhere else.
We are still reading the lockdep code, and we found that the following
path does not seem to release the lock:

lookup_chain_cache_add()
  add_chain_cache():
        if (lockdep_assert_locked())
                return 0;

On that failure path I do not see a graph_unlock(). Did I miss
something? Or should we also add a graph_unlock() there?

diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
index 8b665ba90ad0..0acb64ede7cd 100644
--- a/kernel/locking/lockdep.c
+++ b/kernel/locking/lockdep.c
@@ -3710,8 +3710,10 @@ static inline int add_chain_cache(struct task_struct *curr,
         * disabled to make this an IRQ-safe lock.. for recursion reasons
         * lockdep won't complain about its own locking errors.
         */
-       if (lockdep_assert_locked())
+       if (lockdep_assert_locked()) {
+               graph_unlock();
                return 0;
+       }

        chain = alloc_lock_chain();
        if (!chain) {

Thanks!

---
BRs
xuewen

>
> Regards,
> Boqun
>
> > We would be very grateful if you could give us some suggestions.
> >
> > Thanks!
> >
> > ---
> > BRs
> > xuewen
> >
> > >
> > > Thanks!
> > >
> > > Regards,
> > > Boqun
> > >
> > > > Thanks!
diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
index 151bd3de5936..24995e1ebc62 100644
--- a/kernel/locking/lockdep.c
+++ b/kernel/locking/lockdep.c
@@ -3855,8 +3855,11 @@ static int validate_chain(struct task_struct *curr,
                 */
                int ret = check_deadlock(curr, hlock);

-               if (!ret)
+               if (!ret) {
+                       graph_unlock();
                        return 0;
+               }
+
                /*
                 * Add dependency only if this lock is not the head
                 * of the chain, and if the new lock introduces no more
@@ -3865,9 +3868,9 @@ static int validate_chain(struct task_struct *curr,
                 * serializes nesting locks), see the comments for
                 * check_deadlock().
                 */
-               if (!chain_head && ret != 2) {
-                       if (!check_prevs_add(curr, hlock))
-                               return 0;
+               if (!chain_head && ret != 2 && !check_prevs_add(curr, hlock)) {
+                       graph_unlock();
+                       return 0;
                }

                graph_unlock();