lock/lockdep: Add missing graph_unlock in validate_chain

Message ID 20240104054030.14733-1-xuewen.yan@unisoc.com
State New
Series lock/lockdep: Add missing graph_unlock in validate_chain

Commit Message

Xuewen Yan Jan. 4, 2024, 5:40 a.m. UTC
The lookup_chain_cache_add() will take the graph_lock, but
validate_chain() does not unlock it before returning 0.

So add a graph_unlock() before the return 0.

Signed-off-by: Xuewen Yan <xuewen.yan@unisoc.com>
Signed-off-by: Zhiguo Niu <zhiguo.niu@unisoc.com>
---
 kernel/locking/lockdep.c | 11 +++++++----
 1 file changed, 7 insertions(+), 4 deletions(-)
  

Comments

Boqun Feng Jan. 4, 2024, 7:41 p.m. UTC | #1
Hi,

On Thu, Jan 04, 2024 at 01:40:30PM +0800, Xuewen Yan wrote:
> The lookup_chain_cache_add() will take the graph_lock, but
> validate_chain() does not unlock it before returning 0.
> 

Thanks for looking into this, a few comments below:

> So add a graph_unlock() before the return 0.
> 
> Signed-off-by: Xuewen Yan <xuewen.yan@unisoc.com>
> Signed-off-by: Zhiguo Niu <zhiguo.niu@unisoc.com>
> ---
>  kernel/locking/lockdep.c | 11 +++++++----
>  1 file changed, 7 insertions(+), 4 deletions(-)
> 
> diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
> index 151bd3de5936..24995e1ebc62 100644
> --- a/kernel/locking/lockdep.c
> +++ b/kernel/locking/lockdep.c
> @@ -3855,8 +3855,11 @@ static int validate_chain(struct task_struct *curr,
>  		 */
>  		int ret = check_deadlock(curr, hlock);
>  
> -		if (!ret)
> +		if (!ret) {
> +			graph_unlock();

Note that when check_deadlock() returns 0, there is a
print_deadlock_bug() before the return, so I think that covers the
graph_unlock() (see debug_locks_off_graph_unlock()).
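
For reference, that helper reads roughly like this in mainline lockdep
(a simplified sketch; the point is that the print_*() error paths drop
the graph lock through it):

static inline int debug_locks_off_graph_unlock(void)
{
	int ret = debug_locks_off();	/* globally disable lockdep */

	graph_unlock();			/* ... and release the graph lock */

	return ret;
}

print_deadlock_bug() calls it on entry, so the graph lock is already
released by the time check_deadlock() returns 0.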

>  			return 0;
> +		}
> +
>  		/*
>  		 * Add dependency only if this lock is not the head
>  		 * of the chain, and if the new lock introduces no more
> @@ -3865,9 +3868,9 @@ static int validate_chain(struct task_struct *curr,
>  		 * serializes nesting locks), see the comments for
>  		 * check_deadlock().
>  		 */
> -		if (!chain_head && ret != 2) {
> -			if (!check_prevs_add(curr, hlock))
> -				return 0;
> +		if (!chain_head && ret != 2 && !check_prevs_add(curr, hlock)) {
> +			graph_unlock();

This part is interesting. Usually, when an internal function in lockdep
returns 0, it means there is an error (either a deadlock or an internal
error), which means a print_*() will be called, and the graph lock
will be unlocked in that print_*(). However, in check_prevs_add() there
is one condition where it will return 0 without any print_*(), namely:


in check_prev_add():

			/* <prev> is not found in <next>::locks_before */
			return 0;

it's an internal error where <next> is in the <prev>::locks_after list
but <prev> is not in the <next>::locks_before list, which should seldom
happen: it's dead code. If you put a graph_unlock() before that return,
I think it covers all the cases, unless I'm missing something subtle.
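
An untested sketch of that change (illustrative only; the hunk context
in check_prev_add() may differ from tree to tree):

--- a/kernel/locking/lockdep.c
+++ b/kernel/locking/lockdep.c
@@ ... @@ static int check_prev_add(...)
+			graph_unlock();
 			/* <prev> is not found in <next>::locks_before */
 			return 0;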

Are you hitting a real issue, or was this found by code reading?

Regards,
Boqun

> +			return 0;
>  		}
>  
>  		graph_unlock();
> -- 
> 2.25.1
>
  
Xuewen Yan Jan. 5, 2024, 4:46 a.m. UTC | #2
Hi

On Fri, Jan 5, 2024 at 3:44 AM Boqun Feng <boqun.feng@gmail.com> wrote:
>
> Hi,
>
> On Thu, Jan 04, 2024 at 01:40:30PM +0800, Xuewen Yan wrote:
> > The lookup_chain_cache_add() will take the graph_lock, but
> > validate_chain() does not unlock it before returning 0.
> >
>
> Thanks for looking into this, a few comments below:
>
> > So add a graph_unlock() before the return 0.
> >
> > Signed-off-by: Xuewen Yan <xuewen.yan@unisoc.com>
> > Signed-off-by: Zhiguo Niu <zhiguo.niu@unisoc.com>
> > ---
> >  kernel/locking/lockdep.c | 11 +++++++----
> >  1 file changed, 7 insertions(+), 4 deletions(-)
> >
> > diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
> > index 151bd3de5936..24995e1ebc62 100644
> > --- a/kernel/locking/lockdep.c
> > +++ b/kernel/locking/lockdep.c
> > @@ -3855,8 +3855,11 @@ static int validate_chain(struct task_struct *curr,
> >                */
> >               int ret = check_deadlock(curr, hlock);
> >
> > -             if (!ret)
> > +             if (!ret) {
> > +                     graph_unlock();
>
> Note that when check_deadlock() returns 0, there is a
> print_deadlock_bug() before the return, so I think that covers the
> graph_unlock() (see debug_locks_off_graph_unlock()).

Yes, I did not read check_deadlock()'s details carefully.

>
> >                       return 0;
> > +             }
> > +
> >               /*
> >                * Add dependency only if this lock is not the head
> >                * of the chain, and if the new lock introduces no more
> > @@ -3865,9 +3868,9 @@ static int validate_chain(struct task_struct *curr,
> >                * serializes nesting locks), see the comments for
> >                * check_deadlock().
> >                */
> > -             if (!chain_head && ret != 2) {
> > -                     if (!check_prevs_add(curr, hlock))
> > -                             return 0;
> > +             if (!chain_head && ret != 2 && !check_prevs_add(curr, hlock)) {
> > +                     graph_unlock();
>
> This part is interesting. Usually, when an internal function in lockdep
> returns 0, it means there is an error (either a deadlock or an internal
> error), which means a print_*() will be called, and the graph lock
> will be unlocked in that print_*(). However, in check_prevs_add() there
> is one condition where it will return 0 without any print_*(), namely:
>
>
> in check_prev_add():
>
>                         /* <prev> is not found in <next>::locks_before */
>                         return 0;
>
> it's an internal error where <next> is in the <prev>::locks_after list
> but <prev> is not in the <next>::locks_before list, which should seldom
> happen: it's dead code. If you put a graph_unlock() before that return,
> I think it covers all the cases, unless I'm missing something subtle.

If only this condition does not unlock, it is indeed better to put the
graph_unlock() there.
I will change the patch in v2.

>
> Are you hitting a real issue, or was this found by code reading?

Indeed, we hit a real issue:
One CPU did not call graph_unlock() and, as a result, caused a deadlock
with the other CPUs, because any CPU calling raw_spin_lock() takes the
graph_lock first.
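
To make the failure mode concrete, here is a minimal userspace model of
that situation (not kernel code, just an illustration of why leaking one
global validation lock wedges every other CPU):

#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

/* Models lockdep's graph lock: one global lock taken on every acquire. */
static pthread_mutex_t graph_lock_model = PTHREAD_MUTEX_INITIALIZER;

/* Models lock_acquire()/validate_chain(); the buggy variant returns
 * from an error path without unlocking. */
static void model_lock_acquire(int buggy_error_path)
{
	pthread_mutex_lock(&graph_lock_model);
	/* ... chain validation would happen here ... */
	if (buggy_error_path)
		return;			/* leak: forgot graph_unlock() */
	pthread_mutex_unlock(&graph_lock_model);
}

static void *other_cpu(void *arg)
{
	model_lock_acquire(0);		/* blocks forever after the leak */
	puts("unreachable: the graph lock was never released");
	return arg;
}

int main(void)
{
	pthread_t t;

	model_lock_acquire(1);		/* "CPU 0" leaks the graph lock */
	pthread_create(&t, NULL, other_cpu, NULL);
	sleep(1);
	puts("every later acquisition now hangs on the leaked graph lock");
	return 0;
}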

Thanks!

--
BR
xuewen

>
> Regards,
> Boqun
>
> > +                     return 0;
> >               }
> >
> >               graph_unlock();
> > --
> > 2.25.1
> >
>
  
kernel test robot Jan. 8, 2024, 1:53 p.m. UTC | #3
Hello,

kernel test robot noticed "WARNING:at_kernel/locking/lockdep.c:#lock_acquire" on:

commit: 1033f8642b6ca8652763f95013e30ff1c220f3c9 ("[PATCH] lock/lockdep: Add missing graph_unlock in validate_chain")
url: https://github.com/intel-lab-lkp/linux/commits/Xuewen-Yan/lock-lockdep-Add-missing-graph_unlock-in-validate_chain/20240104-134335
base: https://git.kernel.org/cgit/linux/kernel/git/tip/tip.git 67a1723344cfe05430977483d6d3c7a999480143
patch link: https://lore.kernel.org/all/20240104054030.14733-1-xuewen.yan@unisoc.com/
patch subject: [PATCH] lock/lockdep: Add missing graph_unlock in validate_chain

in testcase: boot

compiler: clang-17
test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 16G

(please refer to attached dmesg/kmsg for entire log/backtrace)


+---------------------------------------------------------+------------+------------+
|                                                         | 67a1723344 | 1033f8642b |
+---------------------------------------------------------+------------+------------+
| WARNING:at_kernel/locking/lockdep.c:#lock_acquire       | 0          | 8          |
| EIP:lock_acquire                                        | 0          | 8          |
+---------------------------------------------------------+------------+------------+


If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add the following tags
| Reported-by: kernel test robot <oliver.sang@intel.com>
| Closes: https://lore.kernel.org/oe-lkp/202401082116.303fea3-oliver.sang@intel.com


[    0.755622][    T0] ------------[ cut here ]------------
[ 0.756871][ T0] WARNING: CPU: 0 PID: 0 at kernel/locking/lockdep.c:467 lock_acquire (kernel/locking/lockdep.c:467) 
[    0.757950][    T0] Modules linked in:
[    0.758418][    T0] CPU: 0 PID: 0 Comm: swapper Not tainted 6.7.0-rc8-00007-g1033f8642b6c #1
[    0.759457][    T0] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
[ 0.760652][ T0] EIP: lock_acquire (kernel/locking/lockdep.c:467) 
[ 0.761215][ T0] Code: 03 c1 e3 10 8b 45 0c 83 e0 01 c1 e0 12 01 d8 05 00 40 08 00 89 45 e8 8d 55 d4 89 f9 56 e8 47 00 00 00 83 c4 04 e9 67 ff ff ff <0f> 0b b8 c8 51 6a c2 e8 b7 66 f7 00 c7 05 e4 50 8b c2 00 00 00 00
All code
========
   0:	03 c1                	add    %ecx,%eax
   2:	e3 10                	jrcxz  0x14
   4:	8b 45 0c             	mov    0xc(%rbp),%eax
   7:	83 e0 01             	and    $0x1,%eax
   a:	c1 e0 12             	shl    $0x12,%eax
   d:	01 d8                	add    %ebx,%eax
   f:	05 00 40 08 00       	add    $0x84000,%eax
  14:	89 45 e8             	mov    %eax,-0x18(%rbp)
  17:	8d 55 d4             	lea    -0x2c(%rbp),%edx
  1a:	89 f9                	mov    %edi,%ecx
  1c:	56                   	push   %rsi
  1d:	e8 47 00 00 00       	call   0x69
  22:	83 c4 04             	add    $0x4,%esp
  25:	e9 67 ff ff ff       	jmp    0xffffffffffffff91
  2a:*	0f 0b                	ud2		<-- trapping instruction
  2c:	b8 c8 51 6a c2       	mov    $0xc26a51c8,%eax
  31:	e8 b7 66 f7 00       	call   0xf766ed
  36:	c7 05 e4 50 8b c2 00 	movl   $0x0,-0x3d74af1c(%rip)        # 0xffffffffc28b5124
  3d:	00 00 00 

Code starting with the faulting instruction
===========================================
   0:	0f 0b                	ud2
   2:	b8 c8 51 6a c2       	mov    $0xc26a51c8,%eax
   7:	e8 b7 66 f7 00       	call   0xf766c3
   c:	c7 05 e4 50 8b c2 00 	movl   $0x0,-0x3d74af1c(%rip)        # 0xffffffffc28b50fa
  13:	00 00 00 
[    0.763535][    T0] EAX: 00000000 EBX: 00200246 ECX: 00000000 EDX: 00000000
[    0.764339][    T0] ESI: 00000000 EDI: c2c299b4 EBP: c2877f3c ESP: c2877f10
[    0.765192][    T0] DS: 007b ES: 007b FS: 0000 GS: 0000 SS: 0068 EFLAGS: 00210097
[    0.766057][    T0] CR0: 80050033 CR2: ffd96000 CR3: 02e96000 CR4: 00000090
[    0.766888][    T0] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
[    0.767677][    T0] DR6: fffe0ff0 DR7: 00000400
[    0.768224][    T0] Call Trace:
[ 0.768615][ T0] ? show_regs (arch/x86/kernel/dumpstack.c:478) 
[ 0.769115][ T0] ? __warn (kernel/panic.c:236 kernel/panic.c:677) 
[ 0.769584][ T0] ? lock_acquire (kernel/locking/lockdep.c:467) 
[ 0.770147][ T0] ? report_bug (lib/bug.c:199) 
[ 0.770697][ T0] ? exc_overflow (arch/x86/kernel/traps.c:250) 
[ 0.771253][ T0] ? handle_bug (arch/x86/kernel/traps.c:237) 
[ 0.771781][ T0] ? exc_invalid_op (arch/x86/kernel/traps.c:258) 
[ 0.772375][ T0] ? handle_exception (arch/x86/entry/entry_32.S:1056) 
[ 0.772953][ T0] ? exc_overflow (arch/x86/kernel/traps.c:250) 
[ 0.773465][ T0] ? lock_acquire (kernel/locking/lockdep.c:467) 
[ 0.774005][ T0] ? exc_overflow (arch/x86/kernel/traps.c:250) 
[ 0.774531][ T0] ? lock_acquire (kernel/locking/lockdep.c:467) 
[ 0.775079][ T0] ? vprintk_emit (arch/x86/include/asm/preempt.h:104 kernel/printk/printk.c:3899 kernel/printk/printk.c:3914 kernel/printk/printk.c:2310) 
[ 0.775628][ T0] _raw_spin_lock (include/linux/spinlock_api_smp.h:133 kernel/locking/spinlock.c:154) 
[ 0.776177][ T0] ? AA_spin (lib/locking-selftest.c:310) 
[ 0.776678][ T0] AA_spin (lib/locking-selftest.c:310) 
[ 0.777139][ T0] dotest (lib/locking-selftest.c:1455) 
[ 0.777627][ T0] ? _printk (kernel/printk/printk.c:2331) 
[ 0.778068][ T0] locking_selftest (lib/locking-selftest.c:2821) 
[ 0.778571][ T0] start_kernel (init/main.c:1023) 
[ 0.779072][ T0] i386_start_kernel (arch/x86/kernel/head32.c:79) 
[ 0.779655][ T0] startup_32_smp (arch/x86/kernel/head_32.S:295) 
[    0.780199][    T0] irq event stamp: 221
[ 0.780678][ T0] hardirqs last enabled at (221): console_unlock (arch/x86/include/asm/irqflags.h:19 arch/x86/include/asm/irqflags.h:67 arch/x86/include/asm/irqflags.h:127 kernel/printk/printk.c:341 kernel/printk/printk.c:2706 kernel/printk/printk.c:3038) 
[ 0.781708][ T0] hardirqs last disabled at (220): console_unlock (kernel/printk/printk.c:339) 
[ 0.782687][ T0] softirqs last enabled at (0): 0x0 
[ 0.783367][ T0] softirqs last disabled at (0): 0x0 
[    0.783995][    T0] ---[ end trace 0000000000000000 ]---



The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20240108/202401082116.303fea3-oliver.sang@intel.com
  
Boqun Feng Jan. 8, 2024, 4:28 p.m. UTC | #4
On Fri, Jan 05, 2024 at 12:46:36PM +0800, Xuewen Yan wrote:
[...]
> >
> > Are you hitting a real issue, or was this found by code reading?
> 
> Indeed, we hit a real issue:
> One CPU did not call graph_unlock() and, as a result, caused a deadlock
> with the other CPUs, because any CPU calling raw_spin_lock() takes the
> graph_lock first.
> 

Could you share more details about the real issue you hit? For example,
a serial log? I ask because, although the graph_unlock() makes logical
sense, that path should really never be hit if lockdep works correctly.
Thanks!

Regards,
Boqun

> Thanks!
  
Xuewen Yan Jan. 9, 2024, 2:55 a.m. UTC | #5
Hi boqun

On Tue, Jan 9, 2024 at 12:30 AM Boqun Feng <boqun.feng@gmail.com> wrote:
>
> On Fri, Jan 05, 2024 at 12:46:36PM +0800, Xuewen Yan wrote:
> [...]
> > >
> > > Are you hitting a real issue, or was this found by code reading?
> >
> > Indeed, we hit a real issue:
> > One CPU did not call graph_unlock() and, as a result, caused a deadlock
> > with the other CPUs, because any CPU calling raw_spin_lock() takes the
> > graph_lock first.
> >
>
> Could you share more details about the real issue you hit? For example,
> a serial log? I ask because, although the graph_unlock() makes logical
> sense, that path should really never be hit if lockdep works correctly.

The following is our stack:

First, there is a scenario in our kernel tree:
get mutexA lock --> get rcu lock --> get mutexB lock
As a result, there is a warning about the chain:

[    7.344848][ T1@C0] reboot: Restarting system with command
'userrequested,recovery'
[    7.354028][ T1@C0]
[    7.354358][ T1@C0] =============================
[    7.354967][ T1@C0] [ BUG: Invalid wait context ]
[    7.355580][ T1@C0] 6.6.5-0-g53d4f6ea5e56-dirty-ab000013 #1
Tainted: G        W
[    7.356720][ T1@C0] -----------------------------
[    7.357326][ T1@C0] init/1 is trying to lock:
[    7.357894][ T1@C0] ffffffc07b0b5310 (kmsg_buf_lock){+.+.}-{3:3},
at: last_kmsg_handler+0x60/0xb8
[    7.359244][ T1@C0] other info that might help us debug this:
[    7.359982][ T1@C0] context-{4:4}
[    7.360420][ T1@C0] 2 locks held by init/1:
[    7.360967][ T1@C0]  #0: ffffffc08234d0b8
(system_transition_mutex){+.+.}-{3:3}, at:
__arm64_sys_reboot+0x130/0x27c
[    7.362353][ T1@C0]  #1: ffffffc0823c9a18
(rcu_read_lock){....}-{1:2}, at: rcu_lock_acquire+0x0/0x38
[    7.363568][ T1@C0] stack backtrace:
[    7.364038][ T1@C0] CPU: 0 PID: 1 Comm: init Tainted: G        W
      6.6.5-android14-0-g53d4f6ea5e56-dirty-ab000013 #1
[    7.365453][ T1@C0] Hardware name: Unisoc ***-base Board (DT)
[    7.366238][ T1@C0] Call trace:
[    7.366652][ T1@C0]  dump_backtrace+0xf8/0x148
[    7.367244][ T1@C0]  show_stack+0x20/0x30
[    7.367779][ T1@C0]  dump_stack_lvl+0x60/0x84
[    7.368372][ T1@C0]  __lock_acquire+0xc2c/0x3288
[    7.368982][ T1@C0]  lock_acquire+0x124/0x2b0
[    7.369556][ T1@C0]  __mutex_lock+0xa0/0xbfc
[    7.370122][ T1@C0]  mutex_lock_nested+0x2c/0x38
[    7.370730][ T1@C0]  last_kmsg_handler+0x60/0xb8 <<<<get mutex B
[    7.371494][ T1@C0]  kmsg_dump+0xf0/0x16c   <<<<rcu lock
[    7.372028][ T1@C0]  kernel_restart+0x100/0x11c
[    7.372626][ T1@C0]  __arm64_sys_reboot+0x1a8/0x27c
[    7.373270][ T1@C0]  invoke_syscall+0x60/0x11c
[    7.373857][ T1@C0]  el0_svc_common+0xb4/0xf0
[    7.374434][ T1@C0]  do_el0_svc+0x24/0x30
[    7.374967][ T1@C0]  el0_svc+0x50/0xe4
[    7.375467][ T1@C0]  el0t_64_sync_handler+0x68/0xbc
[    7.376109][ T1@C0]  el0t_64_sync+0x1a8/0x1ac

This warning should be caused by the order of taking the locks.
But when CONFIG_PREEMPT_RCU is turned on, I think it is reasonable to
take a mutex after taking the rcu-lock.
Maybe the logic of check_wait_context needs to be modified?

And then the following stack occurs:

core0 takes rdp_gp->nocb_gp_lock and then tries to take the graph_lock:

-000|queued_spin_lock_slowpath(lock = 0xFFFFFFC08291BBC8, val = 0)
-001|queued_spin_lock()
-001|lockdep_lock()
-001|graph_lock()
-002|lookup_chain_cache_add()
-002|validate_chain()
-003|lock_acquire(lock = 0xFFFFFF817F211D98, subclass = 0, trylock =
0, read = 0, check = 1, nest_lock = 0x0, ip = 18446743800981533176)
-004|__raw_spin_lock_irqsave()
    |  lock = 0xFFFFFF817F211D80 -> (
    |    raw_lock = (
    |      val = (counter = 0),
    |      locked = 0,
    |      pending = 0,
    |      locked_pending = 0,
    |      tail = 0),
    |    magic = 3735899821,
    |    owner_cpu = 4294967295,
    |    owner = 0xFFFFFFFFFFFFFFFF,
    |    dep_map = (key = 0xFFFFFFC082F12C88, class_cache =
(0xFFFFFFC0825378E0, 0x0), name = 0xFFFFFFC081541039, wait_type_outer
= 0, wait_type_inner = 2, lock_type = 0))
    |  flags = 0
    |
 110|
-004|_raw_spin_lock_irqsave(lock = 0xFFFFFF817F211D80)
-005|lock_timer_base(inline)
-005|__mod_timer.llvm.7968396489078322347(timer = 0xFFFFFF817F2A88D0,
expires = 4294893461, options = 0)
-006|mod_timer(inline)
-006|wake_nocb_gp_defer(inline)
-006|__call_rcu_nocb_wake(rdp = 0xFFFFFF817F2A8680, was_alldone = ?, flags = ?)
-007|__call_rcu_common(inline)
-007|call_rcu(head = 0xFFFFFFC0822E0B58, func = ?)
-008|call_rcu_hurry(inline)
-008|rcu_sync_call(inline)
-008|rcu_sync_func(rhp = 0xFFFFFFC0822E0B58)
-009|rcu_do_batch(rdp = 0xFFFFFF817F266680)
-010|nocb_cb_wait(inline)
-010|rcu_nocb_cb_kthread(arg = 0xFFFFFF817F266680)
-011|kthread(_create = 0xFFFFFF8080363740)
-012|ret_from_fork(asm)

However, the graph_lock is owned by core1, which tries to take
rdp_gp->nocb_gp_lock, and that causes the deadlock.
But we do not see where that core failed to unlock the graph_lock:

-000|queued_spin_lock_slowpath(lock = 0xFFFFFF817F2A8A80, val = ?)
-001|queued_spin_lock(inline)
-001|do_raw_spin_lock(lock = 0xFFFFFF817F2A8A80)
-002|__raw_spin_lock_irqsave(inline)
-002|_raw_spin_lock_irqsave(lock = 0xFFFFFF817F2A8A80)
-003|wake_nocb_gp_defer(inline)
-003|__call_rcu_nocb_wake(rdp = 0xFFFFFF817F30B680, was_alldone = ?, flags = ?)
-004|__call_rcu_common(inline)
-004|call_rcu(head = 0xFFFFFFC082EECC28, func = ?)
-005|call_rcu_zapped(inline)
-005|free_zapped_rcu(ch = ?)
-006|rcu_do_batch(rdp = 0xFFFFFF817F245680)
-007|nocb_cb_wait(inline)
-007|rcu_nocb_cb_kthread(arg = 0xFFFFFF817F245680)
-008|kthread(_create = 0xFFFFFF80803122C0)
-009|ret_from_fork(asm)

Based on this, we read the lockdep code and found check_prev_add()'s logic.

But we still reproduce the scenario even though we added the
graph_unlock() in check_prev_add().
So this is not the root cause of our problem, and we are still debugging.
We would be very grateful if you could give us some suggestions.

Thanks!

---
BRs
xuewen


> Thanks!
>
> Regards,
> Boqun
>
> > Thanks!
  
Boqun Feng Jan. 9, 2024, 5:35 a.m. UTC | #6
On Tue, Jan 09, 2024 at 10:55:11AM +0800, Xuewen Yan wrote:
> Hi boqun
> 
> On Tue, Jan 9, 2024 at 12:30 AM Boqun Feng <boqun.feng@gmail.com> wrote:
> >
> > On Fri, Jan 05, 2024 at 12:46:36PM +0800, Xuewen Yan wrote:
> > [...]
> > > >
> > > > Are you hitting a real issue, or was this found by code reading?
> > >
> > > Indeed, we hit a real issue:
> > > One CPU did not call graph_unlock() and, as a result, caused a deadlock
> > > with the other CPUs, because any CPU calling raw_spin_lock() takes the
> > > graph_lock first.
> > >
> >
> > Could you share more details about the real issue you hit? For example,
> > a serial log? I ask because, although the graph_unlock() makes logical
> > sense, that path should really never be hit if lockdep works correctly.
> 
> The following is our stack:
> 
> First, there is a scenario in our kernel tree:
> get mutexA lock --> get rcu lock --> get mutexB lock
> As a result, there is a warning about the chain:
> 
> [    7.344848][ T1@C0] reboot: Restarting system with command
> 'userrequested,recovery'
> [    7.354028][ T1@C0]
> [    7.354358][ T1@C0] =============================
> [    7.354967][ T1@C0] [ BUG: Invalid wait context ]
> [    7.355580][ T1@C0] 6.6.5-0-g53d4f6ea5e56-dirty-ab000013 #1
> Tainted: G        W
> [    7.356720][ T1@C0] -----------------------------
> [    7.357326][ T1@C0] init/1 is trying to lock:
> [    7.357894][ T1@C0] ffffffc07b0b5310 (kmsg_buf_lock){+.+.}-{3:3},
> at: last_kmsg_handler+0x60/0xb8
> [    7.359244][ T1@C0] other info that might help us debug this:
> [    7.359982][ T1@C0] context-{4:4}
> [    7.360420][ T1@C0] 2 locks held by init/1:
> [    7.360967][ T1@C0]  #0: ffffffc08234d0b8
> (system_transition_mutex){+.+.}-{3:3}, at:
> __arm64_sys_reboot+0x130/0x27c
> [    7.362353][ T1@C0]  #1: ffffffc0823c9a18
> (rcu_read_lock){....}-{1:2}, at: rcu_lock_acquire+0x0/0x38
> [    7.363568][ T1@C0] stack backtrace:
> [    7.364038][ T1@C0] CPU: 0 PID: 1 Comm: init Tainted: G        W
>       6.6.5-android14-0-g53d4f6ea5e56-dirty-ab000013 #1
> [    7.365453][ T1@C0] Hardware name: Unisoc ***-base Board (DT)
> [    7.366238][ T1@C0] Call trace:
> [    7.366652][ T1@C0]  dump_backtrace+0xf8/0x148
> [    7.367244][ T1@C0]  show_stack+0x20/0x30
> [    7.367779][ T1@C0]  dump_stack_lvl+0x60/0x84
> [    7.368372][ T1@C0]  __lock_acquire+0xc2c/0x3288
> [    7.368982][ T1@C0]  lock_acquire+0x124/0x2b0
> [    7.369556][ T1@C0]  __mutex_lock+0xa0/0xbfc
> [    7.370122][ T1@C0]  mutex_lock_nested+0x2c/0x38
> [    7.370730][ T1@C0]  last_kmsg_handler+0x60/0xb8 <<<<get mutex B
> [    7.371494][ T1@C0]  kmsg_dump+0xf0/0x16c   <<<<rcu lock
> [    7.372028][ T1@C0]  kernel_restart+0x100/0x11c
> [    7.372626][ T1@C0]  __arm64_sys_reboot+0x1a8/0x27c
> [    7.373270][ T1@C0]  invoke_syscall+0x60/0x11c
> [    7.373857][ T1@C0]  el0_svc_common+0xb4/0xf0
> [    7.374434][ T1@C0]  do_el0_svc+0x24/0x30
> [    7.374967][ T1@C0]  el0_svc+0x50/0xe4
> [    7.375467][ T1@C0]  el0t_64_sync_handler+0x68/0xbc
> [    7.376109][ T1@C0]  el0t_64_sync+0x1a8/0x1ac
> 
> This warning should be caused by the order of taking the locks.
> But when CONFIG_PREEMPT_RCU is turned on, I think it is reasonable to
> take a mutex after taking the rcu-lock.

Thanks for the information, but...

No, the rule of CONFIG_PREEMPT_RCU is that it allows only implicit
preemption in RCU read-side critical sections. So acquiring a mutex
after taking rcu_read_lock() is illegal.

See the comments before rcu_read_lock():

 * In preemptible RCU implementations (PREEMPT_RCU) in CONFIG_PREEMPTION
 * kernel builds, RCU read-side critical sections may be preempted,
 * but explicit blocking is illegal.
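
To make the rule concrete, this is the illegal shape (a minimal sketch
in kernel-style C; the mutex here is hypothetical and stands in for
kmsg_buf_lock in your splat):

#include <linux/mutex.h>
#include <linux/rcupdate.h>

static DEFINE_MUTEX(m);		/* hypothetical lock */

static void illegal_pattern(void)
{
	rcu_read_lock();	/* RCU read-side critical section begins */
	mutex_lock(&m);		/* may sleep: illegal inside the RCU
				 * read-side section; this is what
				 * check_wait_context() flags */
	mutex_unlock(&m);
	rcu_read_unlock();
}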

> Maybe the logic of check_wait_context needs to be modified?
> 
> And then the following stack occurs:
> 
> core0 takes rdp_gp->nocb_gp_lock and then tries to take the graph_lock:
> 
> -000|queued_spin_lock_slowpath(lock = 0xFFFFFFC08291BBC8, val = 0)
> -001|queued_spin_lock()
> -001|lockdep_lock()
> -001|graph_lock()
> -002|lookup_chain_cache_add()
> -002|validate_chain()
> -003|lock_acquire(lock = 0xFFFFFF817F211D98, subclass = 0, trylock =
> 0, read = 0, check = 1, nest_lock = 0x0, ip = 18446743800981533176)
> -004|__raw_spin_lock_irqsave()
>     |  lock = 0xFFFFFF817F211D80 -> (
>     |    raw_lock = (
>     |      val = (counter = 0),
>     |      locked = 0,
>     |      pending = 0,
>     |      locked_pending = 0,
>     |      tail = 0),
>     |    magic = 3735899821,
>     |    owner_cpu = 4294967295,
>     |    owner = 0xFFFFFFFFFFFFFFFF,
>     |    dep_map = (key = 0xFFFFFFC082F12C88, class_cache =
> (0xFFFFFFC0825378E0, 0x0), name = 0xFFFFFFC081541039, wait_type_outer
> = 0, wait_type_inner = 2, lock_type = 0))
>     |  flags = 0
>     |
>  110|
> -004|_raw_spin_lock_irqsave(lock = 0xFFFFFF817F211D80)
> -005|lock_timer_base(inline)
> -005|__mod_timer.llvm.7968396489078322347(timer = 0xFFFFFF817F2A88D0,
> expires = 4294893461, options = 0)
> -006|mod_timer(inline)
> -006|wake_nocb_gp_defer(inline)
> -006|__call_rcu_nocb_wake(rdp = 0xFFFFFF817F2A8680, was_alldone = ?, flags = ?)
> -007|__call_rcu_common(inline)
> -007|call_rcu(head = 0xFFFFFFC0822E0B58, func = ?)
> -008|call_rcu_hurry(inline)
> -008|rcu_sync_call(inline)
> -008|rcu_sync_func(rhp = 0xFFFFFFC0822E0B58)
> -009|rcu_do_batch(rdp = 0xFFFFFF817F266680)
> -010|nocb_cb_wait(inline)
> -010|rcu_nocb_cb_kthread(arg = 0xFFFFFF817F266680)
> -011|kthread(_create = 0xFFFFFF8080363740)
> -012|ret_from_fork(asm)
> 
> However, the graph_lock is owned by core1, which tries to take
> rdp_gp->nocb_gp_lock, and that causes the deadlock.
> But we do not see where that core failed to unlock the graph_lock:
> 

Right, this is the key point. There is a static variable tracking the
owner task of graph-lock:

static struct task_struct *__owner;

Are you able to find some information from the coredump?
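
For context, __owner is set and cleared by lockdep's internal lock
helpers, roughly like below (a simplified sketch of mainline around this
time; details may vary by tree). In a coredump it therefore names the
task that last took the graph lock without releasing it:

static inline void lockdep_lock(void)
{
	__this_cpu_inc(lockdep_recursion);
	arch_spin_lock(&__lock);
	__owner = current;
}

static inline void lockdep_unlock(void)
{
	if (debug_locks && DEBUG_LOCKS_WARN_ON(__owner != current))
		return;

	__owner = NULL;
	arch_spin_unlock(&__lock);
	__this_cpu_dec(lockdep_recursion);
}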

> -000|queued_spin_lock_slowpath(lock = 0xFFFFFF817F2A8A80, val = ?)
> -001|queued_spin_lock(inline)
> -001|do_raw_spin_lock(lock = 0xFFFFFF817F2A8A80)
> -002|__raw_spin_lock_irqsave(inline)
> -002|_raw_spin_lock_irqsave(lock = 0xFFFFFF817F2A8A80)
> -003|wake_nocb_gp_defer(inline)
> -003|__call_rcu_nocb_wake(rdp = 0xFFFFFF817F30B680, was_alldone = ?, flags = ?)
> -004|__call_rcu_common(inline)
> -004|call_rcu(head = 0xFFFFFFC082EECC28, func = ?)
> -005|call_rcu_zapped(inline)
> -005|free_zapped_rcu(ch = ?)
> -006|rcu_do_batch(rdp = 0xFFFFFF817F245680)
> -007|nocb_cb_wait(inline)
> -007|rcu_nocb_cb_kthread(arg = 0xFFFFFF817F245680)
> -008|kthread(_create = 0xFFFFFF80803122C0)
> -009|ret_from_fork(asm)
> 
> Based on this, we read the lockdep code and found check_prev_add()'s logic.
> 
> But we still reproduce the scenario even though we added the
> graph_unlock() in check_prev_add().

If you have a reliable reproducer, could you try the following debug
code?

diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
index e85b5ad3e206..a313fcc78e8e 100644
--- a/kernel/locking/lockdep.c
+++ b/kernel/locking/lockdep.c
@@ -466,6 +466,8 @@ static __always_inline void lockdep_recursion_finish(void)
 {
        if (WARN_ON_ONCE(__this_cpu_dec_return(lockdep_recursion)))
                __this_cpu_write(lockdep_recursion, 0);
+
+       BUG_ON(__owner == current);
 }

 void lockdep_set_selftest_task(struct task_struct *task)

This may tell you which code path in lockdep forgot to unlock the graph
lock.

> So this is not the root cause of our problem, and we are still debugging.

That matches my expectation: that "return 0" should never be hit. So
the problem may be somewhere else.

Regards,
BOqun

> We would be very grateful if you could give us some suggestions.
> 
> Thanks!
> 
> ---
> BRs
> xuewen
> 
> 
> > Thanks!
> >
> > Regards,
> > Boqun
> >
> > > Thanks!
  
Xuewen Yan Jan. 9, 2024, 6:11 a.m. UTC | #7
Hi Boqun

On Tue, Jan 9, 2024 at 1:35 PM Boqun Feng <boqun.feng@gmail.com> wrote:
>
> On Tue, Jan 09, 2024 at 10:55:11AM +0800, Xuewen Yan wrote:
> > Hi boqun
> >
> > On Tue, Jan 9, 2024 at 12:30 AM Boqun Feng <boqun.feng@gmail.com> wrote:
> > >
> > > On Fri, Jan 05, 2024 at 12:46:36PM +0800, Xuewen Yan wrote:
> > > [...]
> > > > >
> > > > > Are you hitting a real issue, or was this found by code reading?
> > > >
> > > > Indeed, we hit a real issue:
> > > > One CPU did not call graph_unlock() and, as a result, caused a deadlock
> > > > with the other CPUs, because any CPU calling raw_spin_lock() takes the
> > > > graph_lock first.
> > > >
> > >
> > > Could you share more details about the real issue you hit? For example,
> > > a serial log? I ask because, although the graph_unlock() makes logical
> > > sense, that path should really never be hit if lockdep works correctly.
> >
> > The following is our stack:
> >
> > First, there is a scenario in our kernel tree:
> > get mutexA lock --> get rcu lock --> get mutexB lock
> > As a result, there is a warning about the chain:
> >
> > [    7.344848][ T1@C0] reboot: Restarting system with command
> > 'userrequested,recovery'
> > [    7.354028][ T1@C0]
> > [    7.354358][ T1@C0] =============================
> > [    7.354967][ T1@C0] [ BUG: Invalid wait context ]
> > [    7.355580][ T1@C0] 6.6.5-0-g53d4f6ea5e56-dirty-ab000013 #1
> > Tainted: G        W
> > [    7.356720][ T1@C0] -----------------------------
> > [    7.357326][ T1@C0] init/1 is trying to lock:
> > [    7.357894][ T1@C0] ffffffc07b0b5310 (kmsg_buf_lock){+.+.}-{3:3},
> > at: last_kmsg_handler+0x60/0xb8
> > [    7.359244][ T1@C0] other info that might help us debug this:
> > [    7.359982][ T1@C0] context-{4:4}
> > [    7.360420][ T1@C0] 2 locks held by init/1:
> > [    7.360967][ T1@C0]  #0: ffffffc08234d0b8
> > (system_transition_mutex){+.+.}-{3:3}, at:
> > __arm64_sys_reboot+0x130/0x27c
> > [    7.362353][ T1@C0]  #1: ffffffc0823c9a18
> > (rcu_read_lock){....}-{1:2}, at: rcu_lock_acquire+0x0/0x38
> > [    7.363568][ T1@C0] stack backtrace:
> > [    7.364038][ T1@C0] CPU: 0 PID: 1 Comm: init Tainted: G        W
> >       6.6.5-android14-0-g53d4f6ea5e56-dirty-ab000013 #1
> > [    7.365453][ T1@C0] Hardware name: Unisoc ***-base Board (DT)
> > [    7.366238][ T1@C0] Call trace:
> > [    7.366652][ T1@C0]  dump_backtrace+0xf8/0x148
> > [    7.367244][ T1@C0]  show_stack+0x20/0x30
> > [    7.367779][ T1@C0]  dump_stack_lvl+0x60/0x84
> > [    7.368372][ T1@C0]  __lock_acquire+0xc2c/0x3288
> > [    7.368982][ T1@C0]  lock_acquire+0x124/0x2b0
> > [    7.369556][ T1@C0]  __mutex_lock+0xa0/0xbfc
> > [    7.370122][ T1@C0]  mutex_lock_nested+0x2c/0x38
> > [    7.370730][ T1@C0]  last_kmsg_handler+0x60/0xb8 <<<<get mutex B
> > [    7.371494][ T1@C0]  kmsg_dump+0xf0/0x16c   <<<<rcu lock
> > [    7.372028][ T1@C0]  kernel_restart+0x100/0x11c
> > [    7.372626][ T1@C0]  __arm64_sys_reboot+0x1a8/0x27c
> > [    7.373270][ T1@C0]  invoke_syscall+0x60/0x11c
> > [    7.373857][ T1@C0]  el0_svc_common+0xb4/0xf0
> > [    7.374434][ T1@C0]  do_el0_svc+0x24/0x30
> > [    7.374967][ T1@C0]  el0_svc+0x50/0xe4
> > [    7.375467][ T1@C0]  el0t_64_sync_handler+0x68/0xbc
> > [    7.376109][ T1@C0]  el0t_64_sync+0x1a8/0x1ac
> >
> > This warning should be caused by the order of taking the locks.
> > But when CONFIG_PREEMPT_RCU is turned on, I think it is reasonable to
> > take a mutex after taking the rcu-lock.
>
> Thanks for the information, but...
>
> No, the rule of CONFIG_PREEMPT_RCU is that it allows only implicit
> preemption in RCU read-side critical sections. So acquiring a mutex
> after taking rcu_read_lock() is illegal.
>
> See the comments before rcu_read_lock():
>
>  * In preemptible RCU implementations (PREEMPT_RCU) in CONFIG_PREEMPTION
>  * kernel builds, RCU read-side critical sections may be preempted,
>  * but explicit blocking is illegal.

Thank you for your patient reply! And we will fix it in our tree.

>
> > Maybe the logic of check_wait_context needs to be modified?
> >
> > And then the following stack occurs:
> >
> > core0 takes rdp_gp->nocb_gp_lock and then tries to take the graph_lock:
> >
> > -000|queued_spin_lock_slowpath(lock = 0xFFFFFFC08291BBC8, val = 0)
> > -001|queued_spin_lock()
> > -001|lockdep_lock()
> > -001|graph_lock()
> > -002|lookup_chain_cache_add()
> > -002|validate_chain()
> > -003|lock_acquire(lock = 0xFFFFFF817F211D98, subclass = 0, trylock =
> > 0, read = 0, check = 1, nest_lock = 0x0, ip = 18446743800981533176)
> > -004|__raw_spin_lock_irqsave()
> >     |  lock = 0xFFFFFF817F211D80 -> (
> >     |    raw_lock = (
> >     |      val = (counter = 0),
> >     |      locked = 0,
> >     |      pending = 0,
> >     |      locked_pending = 0,
> >     |      tail = 0),
> >     |    magic = 3735899821,
> >     |    owner_cpu = 4294967295,
> >     |    owner = 0xFFFFFFFFFFFFFFFF,
> >     |    dep_map = (key = 0xFFFFFFC082F12C88, class_cache =
> > (0xFFFFFFC0825378E0, 0x0), name = 0xFFFFFFC081541039, wait_type_outer
> > = 0, wait_type_inner = 2, lock_type = 0))
> >     |  flags = 0
> >     |
> >  110|
> > -004|_raw_spin_lock_irqsave(lock = 0xFFFFFF817F211D80)
> > -005|lock_timer_base(inline)
> > -005|__mod_timer.llvm.7968396489078322347(timer = 0xFFFFFF817F2A88D0,
> > expires = 4294893461, options = 0)
> > -006|mod_timer(inline)
> > -006|wake_nocb_gp_defer(inline)
> > -006|__call_rcu_nocb_wake(rdp = 0xFFFFFF817F2A8680, was_alldone = ?, flags = ?)
> > -007|__call_rcu_common(inline)
> > -007|call_rcu(head = 0xFFFFFFC0822E0B58, func = ?)
> > -008|call_rcu_hurry(inline)
> > -008|rcu_sync_call(inline)
> > -008|rcu_sync_func(rhp = 0xFFFFFFC0822E0B58)
> > -009|rcu_do_batch(rdp = 0xFFFFFF817F266680)
> > -010|nocb_cb_wait(inline)
> > -010|rcu_nocb_cb_kthread(arg = 0xFFFFFF817F266680)
> > -011|kthread(_create = 0xFFFFFF8080363740)
> > -012|ret_from_fork(asm)
> >
> > However, the graph_lock is owned by core1, which tries to take
> > rdp_gp->nocb_gp_lock, and that causes the deadlock.
> > But we do not see where that core failed to unlock the graph_lock:
> >
>
> Right, this is the key point. There is a static variable tracking the
> owner task of graph-lock:
>
> static struct task_struct *__owner;
>
> Are you able to find some information from the coredump?
>
> > -000|queued_spin_lock_slowpath(lock = 0xFFFFFF817F2A8A80, val = ?)
> > -001|queued_spin_lock(inline)
> > -001|do_raw_spin_lock(lock = 0xFFFFFF817F2A8A80)
> > -002|__raw_spin_lock_irqsave(inline)
> > -002|_raw_spin_lock_irqsave(lock = 0xFFFFFF817F2A8A80)
> > -003|wake_nocb_gp_defer(inline)
> > -003|__call_rcu_nocb_wake(rdp = 0xFFFFFF817F30B680, was_alldone = ?, flags = ?)
> > -004|__call_rcu_common(inline)
> > -004|call_rcu(head = 0xFFFFFFC082EECC28, func = ?)
> > -005|call_rcu_zapped(inline)
> > -005|free_zapped_rcu(ch = ?)
> > -006|rcu_do_batch(rdp = 0xFFFFFF817F245680)
> > -007|nocb_cb_wait(inline)
> > -007|rcu_nocb_cb_kthread(arg = 0xFFFFFF817F245680)
> > -008|kthread(_create = 0xFFFFFF80803122C0)
> > -009|ret_from_fork(asm)
> >
> > Based on this, we read the lockdep code and found check_prev_add()'s logic.
> >
> > But we still reproduce the scenario even though we added the
> > graph_unlock() in check_prev_add().
>
> If you have a reliable reproducer, could you try the following debug
> code?
>
> diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
> index e85b5ad3e206..a313fcc78e8e 100644
> --- a/kernel/locking/lockdep.c
> +++ b/kernel/locking/lockdep.c
> @@ -466,6 +466,8 @@ static __always_inline void lockdep_recursion_finish(void)
>  {
>         if (WARN_ON_ONCE(__this_cpu_dec_return(lockdep_recursion)))
>                 __this_cpu_write(lockdep_recursion, 0);
> +
> +       BUG_ON(__owner == current);
>  }
>
>  void lockdep_set_selftest_task(struct task_struct *task)
>
> This may tell you which code path in lockdep forgot to unlock the graph
> lock.

Thanks! We will try it!

>
> > So this is not the root cause of our problem, and we are still debugging.
>
> That matches my expectation: that "return 0" should never be hit. So
> the problem may be somewhere else.

We are still reading the lockdep code, and we found that the following
path does not seem to release the lock:

lookup_chain_cache_add
    add_chain_cache():
        if (lockdep_assert_locked())
                return 0;
In lockdep_assert_locked(), I do not see a graph_unlock(); did I
miss something?
Or should we also add a graph_unlock() there?

diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
index 8b665ba90ad0..0acb64ede7cd 100644
--- a/kernel/locking/lockdep.c
+++ b/kernel/locking/lockdep.c
@@ -3710,8 +3710,10 @@ static inline int add_chain_cache(struct
task_struct *curr,
         * disabled to make this an IRQ-safe lock.. for recursion reasons
         * lockdep won't complain about its own locking errors.
         */
-       if (lockdep_assert_locked())
+       if (lockdep_assert_locked()) {
+               graph_unlock();
                return 0;
+       }

        chain = alloc_lock_chain();
        if (!chain) {
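
For reference, lockdep_assert_locked() reads roughly like this in
mainline around this time (a sketch; please check your tree):

static inline bool lockdep_assert_locked(void)
{
	/* Fires (returns true) when current does NOT hold the graph lock. */
	return DEBUG_LOCKS_WARN_ON(__owner != current);
}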


Thanks!
---
BRs
xuewen

>
> Regards,
> BOqun
>
> > We would be very grateful if you could give us some suggestions.
> >
> > Thanks!
> >
> > ---
> > BRs
> > xuewen
> >
> >
> > > Thanks!
> > >
> > > Regards,
> > > Boqun
> > >
> > > > Thanks!
  

Patch

diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
index 151bd3de5936..24995e1ebc62 100644
--- a/kernel/locking/lockdep.c
+++ b/kernel/locking/lockdep.c
@@ -3855,8 +3855,11 @@  static int validate_chain(struct task_struct *curr,
 		 */
 		int ret = check_deadlock(curr, hlock);
 
-		if (!ret)
+		if (!ret) {
+			graph_unlock();
 			return 0;
+		}
+
 		/*
 		 * Add dependency only if this lock is not the head
 		 * of the chain, and if the new lock introduces no more
@@ -3865,9 +3868,9 @@  static int validate_chain(struct task_struct *curr,
 		 * serializes nesting locks), see the comments for
 		 * check_deadlock().
 		 */
-		if (!chain_head && ret != 2) {
-			if (!check_prevs_add(curr, hlock))
-				return 0;
+		if (!chain_head && ret != 2 && !check_prevs_add(curr, hlock)) {
+			graph_unlock();
+			return 0;
 		}
 
 		graph_unlock();