workqueue: Add RCU lock check after work execution ends

Message ID 20240109111014.2689-1-xuewen.yan@unisoc.com
State New
Series workqueue: Add RCU lock check after work execution ends

Commit Message

Xuewen Yan Jan. 9, 2024, 11:10 a.m. UTC
  Currently the workqueue only checks for atomic context and
leaked locks after a work item finishes executing. However, a
driver's work function may call rcu_read_lock() without a
matching rcu_read_unlock(). This leads to an RCU stall, but the
RCU stall warning cannot dump the work function, because the
work has already finished.

In order to quickly identify work items that do not call
rcu_read_unlock() after rcu_read_lock(), add an RCU lock check.

Use rcu_preempt_depth() to check the work item's RCU nesting
depth. Normally this value is 0; if it is greater than 0, the
RCU read lock is still held after the work ends. In that case,
print an error message along with the work function.

Signed-off-by: Xuewen Yan <xuewen.yan@unisoc.com>
---
 kernel/workqueue.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)
  

Comments

Waiman Long Jan. 9, 2024, 4:39 p.m. UTC | #1
On 1/9/24 06:10, Xuewen Yan wrote:
> Currently the workqueue only checks for atomic context and
> leaked locks after a work item finishes executing. However, a
> driver's work function may call rcu_read_lock() without a
> matching rcu_read_unlock(). This leads to an RCU stall, but the
> RCU stall warning cannot dump the work function, because the
> work has already finished.
>
> In order to quickly identify work items that do not call
> rcu_read_unlock() after rcu_read_lock(), add an RCU lock check.
>
> Use rcu_preempt_depth() to check the work item's RCU nesting
> depth. Normally this value is 0; if it is greater than 0, the
> RCU read lock is still held after the work ends. In that case,
> print an error message along with the work function.
>
> Signed-off-by: Xuewen Yan <xuewen.yan@unisoc.com>
> ---
>   kernel/workqueue.c | 9 +++++----
>   1 file changed, 5 insertions(+), 4 deletions(-)
>
> diff --git a/kernel/workqueue.c b/kernel/workqueue.c
> index 2989b57e154a..a5a0df824df1 100644
> --- a/kernel/workqueue.c
> +++ b/kernel/workqueue.c
> @@ -2634,11 +2634,12 @@ __acquires(&pool->lock)
>   	lock_map_release(&lockdep_map);
>   	lock_map_release(&pwq->wq->lockdep_map);
>   
> -	if (unlikely(in_atomic() || lockdep_depth(current) > 0)) {
> -		pr_err("BUG: workqueue leaked lock or atomic: %s/0x%08x/%d\n"
> +	if (unlikely(in_atomic() || lockdep_depth(current) > 0) ||
> +		rcu_preempt_depth() > 0) {

The rcu_preempt_depth() check should be within the unlikely() helper. 
Other than that, it looks good to me.

Cheers,
Longman

> +		pr_err("BUG: workqueue leaked lock or atomic: %s/0x%08x/%d/%d\n"
>   		       "     last function: %ps\n",
> -		       current->comm, preempt_count(), task_pid_nr(current),
> -		       worker->current_func);
> +		       current->comm, preempt_count(), rcu_preempt_depth(),
> +		       task_pid_nr(current), worker->current_func);
>   		debug_show_held_locks(current);
>   		dump_stack();
>   	}
  

Patch

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 2989b57e154a..a5a0df824df1 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -2634,11 +2634,12 @@  __acquires(&pool->lock)
 	lock_map_release(&lockdep_map);
 	lock_map_release(&pwq->wq->lockdep_map);
 
-	if (unlikely(in_atomic() || lockdep_depth(current) > 0)) {
-		pr_err("BUG: workqueue leaked lock or atomic: %s/0x%08x/%d\n"
+	if (unlikely(in_atomic() || lockdep_depth(current) > 0) ||
+		rcu_preempt_depth() > 0) {
+		pr_err("BUG: workqueue leaked lock or atomic: %s/0x%08x/%d/%d\n"
 		       "     last function: %ps\n",
-		       current->comm, preempt_count(), task_pid_nr(current),
-		       worker->current_func);
+		       current->comm, preempt_count(), rcu_preempt_depth(),
+		       task_pid_nr(current), worker->current_func);
 		debug_show_held_locks(current);
 		dump_stack();
 	}