[v2] tracing: relax trace_event_eval_update() execution with cond_resched()

Message ID 20230929191637.416931-1-cleger@rivosinc.com
State New
Headers
Series [v2] tracing: relax trace_event_eval_update() execution with cond_resched() |

Commit Message

Clément Léger Sept. 29, 2023, 7:16 p.m. UTC
  When kernel is compiled without preemption, the eval_map_work_func()
(which calls trace_event_eval_update()) will not be preempted up to its
complete execution. This can actually cause a problem since if another
CPU call stop_machine(), the call will have to wait for the
eval_map_work_func() function to finish executing in the workqueue
before being able to be scheduled. This problem was observe on a SMP
system at boot time, when the CPU calling the initcalls executed
clocksource_done_booting() which in the end calls stop_machine(). We
observed a 1 second delay because one CPU was executing
eval_map_work_func() and was not preempted by the stop_machine() task.

Adding a call to cond_resched() in trace_event_eval_update() allows
other tasks to be executed and thus continue working asynchronously
like before without blocking any pending task at boot time.

Signed-off-by: Clément Léger <cleger@rivosinc.com>
---

Changes in v2:
 - Use cond_resched() instead of schedule()

 kernel/trace/trace_events.c | 1 +
 1 file changed, 1 insertion(+)
  

Comments

Atish Patra Sept. 29, 2023, 8:14 p.m. UTC | #1
On Fri, Sep 29, 2023 at 12:17 PM Clément Léger <cleger@rivosinc.com> wrote:
>
> When kernel is compiled without preemption, the eval_map_work_func()
> (which calls trace_event_eval_update()) will not be preempted up to its
> complete execution. This can actually cause a problem since if another
> CPU call stop_machine(), the call will have to wait for the
> eval_map_work_func() function to finish executing in the workqueue
> before being able to be scheduled. This problem was observe on a SMP
> system at boot time, when the CPU calling the initcalls executed
> clocksource_done_booting() which in the end calls stop_machine(). We
> observed a 1 second delay because one CPU was executing
> eval_map_work_func() and was not preempted by the stop_machine() task.
>
> Adding a call to cond_resched() in trace_event_eval_update() allows
> other tasks to be executed and thus continue working asynchronously
> like before without blocking any pending task at boot time.
>
> Signed-off-by: Clément Léger <cleger@rivosinc.com>
> ---
>
> Changes in v2:
>  - Use cond_resched() instead of schedule()
>
>  kernel/trace/trace_events.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c
> index 91951d038ba4..f49d6ddb6342 100644
> --- a/kernel/trace/trace_events.c
> +++ b/kernel/trace/trace_events.c
> @@ -2770,6 +2770,7 @@ void trace_event_eval_update(struct trace_eval_map **map, int len)
>                                 update_event_fields(call, map[i]);
>                         }
>                 }
> +               cond_resched();
>         }
>         up_write(&trace_event_sem);
>  }

(First email got rejected as gmail decided to switch back the format
to html :(). Sorry for the noise.
Thanks for the fix. The boot time improvement is evident in both
spike/Qemu. FWIW
Tested-by: Atish Patra <atishp@rivosinc.com>
Reviewed-by: Atish Patra <atishp@rivosinc.com>

> --
> 2.40.1
>
  

Patch

diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c
index 91951d038ba4..f49d6ddb6342 100644
--- a/kernel/trace/trace_events.c
+++ b/kernel/trace/trace_events.c
@@ -2770,6 +2770,7 @@  void trace_event_eval_update(struct trace_eval_map **map, int len)
 				update_event_fields(call, map[i]);
 			}
 		}
+		cond_resched();
 	}
 	up_write(&trace_event_sem);
 }