[1/2] exit: add a tracepoint for profiling a task that is starting to exit

Message ID tencent_09CF49556CD442411A93D0E92ACC2B7E5D08@qq.com
State New

Commit Message

Wen Yang Feb. 22, 2024, 4:04 p.m. UTC
  From: Wen Yang <wenyang.linux@foxmail.com>

Currently, coredump_task_exit() can take some time while it waits for the
dump file to be generated. This is inconvenient if user space wants to be
notified of the exit as soon as possible.

Commit 2d4bcf886e42 ("exit: Remove profile_task_exit & profile_munmap")
simplified the code, but also removed profile_task_exit(), which may
prevent third-party kernel modules from detecting process exits in a
timely manner.

Add a new tracepoint, trace_sched_profile_task_exit(), so that a user-space
monitor can detect exits early and potentially make some preparations in
advance.

Signed-off-by: Wen Yang <wenyang.linux@foxmail.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: linux-kernel@vger.kernel.org
---
 include/trace/events/sched.h | 28 ++++++++++++++++++++++++++++
 kernel/exit.c                |  1 +
 2 files changed, 29 insertions(+)
  

Comments

Mathieu Desnoyers Feb. 22, 2024, 4:25 p.m. UTC | #1
On 2024-02-22 11:04, wenyang.linux@foxmail.com wrote:
> From: Wen Yang <wenyang.linux@foxmail.com>
> 
> Currently, coredump_task_exit() can take some time while it waits for the
> dump file to be generated. This is inconvenient if user space wants to be
> notified of the exit as soon as possible.
> 
> Commit 2d4bcf886e42 ("exit: Remove profile_task_exit & profile_munmap")
> simplified the code, but also removed profile_task_exit(), which may
> prevent third-party kernel modules from detecting process exits in a
> timely manner.
> 
> Add a new tracepoint, trace_sched_profile_task_exit(), so that a user-space
> monitor can detect exits early and potentially make some preparations in
> advance.

I don't see any explanation justifying adding an extra tracepoint
rather than just moving trace_sched_process_exit() earlier in do_exit().

Why is moving trace_sched_process_exit() earlier in do_exit() an issue,
considering that any tracer interested in knowing the point where a task
is really reclaimed (from zombie state) can use trace_sched_process_free(),
which is called from delayed_put_task_struct()?

Thanks,

Mathieu

Wen Yang Feb. 23, 2024, 5:17 a.m. UTC | #2
On 2024/2/23 00:25, Mathieu Desnoyers wrote:
> On 2024-02-22 11:04, wenyang.linux@foxmail.com wrote:
>> From: Wen Yang <wenyang.linux@foxmail.com>
>>
>> Currently, coredump_task_exit() can take some time while it waits for the
>> dump file to be generated. This is inconvenient if user space wants to be
>> notified of the exit as soon as possible.
>>
>> Commit 2d4bcf886e42 ("exit: Remove profile_task_exit & profile_munmap")
>> simplified the code, but also removed profile_task_exit(), which may
>> prevent third-party kernel modules from detecting process exits in a
>> timely manner.
>>
>> Add a new tracepoint, trace_sched_profile_task_exit(), so that a user-space
>> monitor can detect exits early and potentially make some preparations in
>> advance.
> 
> I don't see any explanation justifying adding an extra tracepoint
> rather than just moving trace_sched_process_exit() earlier in do_exit().
> 
> Why is moving trace_sched_process_exit() earlier in do_exit() an issue,
> considering that any tracer interested in knowing the point where a task
> is really reclaimed (from zombie state) can use trace_sched_process_free(),
> which is called from delayed_put_task_struct()?
> 
> Thanks,
> 
> Mathieu
> 

Thanks.
We will make the modifications according to your suggestions.

--
Best wishes,
Wen


Patch

diff --git a/include/trace/events/sched.h b/include/trace/events/sched.h
index dbb01b4b7451..750b2f0bdf69 100644
--- a/include/trace/events/sched.h
+++ b/include/trace/events/sched.h
@@ -341,6 +341,34 @@  DEFINE_EVENT(sched_process_template, sched_wait_task,
 	TP_PROTO(struct task_struct *p),
 	TP_ARGS(p));
 
+/*
+ * Tracepoint for profiling a task that is starting to exit:
+ */
+TRACE_EVENT(sched_profile_task_exit,
+
+	TP_PROTO(struct task_struct *task, long code),
+
+	TP_ARGS(task, code),
+
+	TP_STRUCT__entry(
+		__array(	char,	comm,	TASK_COMM_LEN	)
+		__field(	pid_t,	pid			)
+		__field(	int,	prio			)
+		__field(	long,	code			)
+	),
+
+	TP_fast_assign(
+		memcpy(__entry->comm, task->comm, TASK_COMM_LEN);
+		__entry->pid		= task->pid;
+		__entry->prio		= task->prio;
+		__entry->code		= code;
+	),
+
+	TP_printk("comm=%s pid=%d prio=%d exit_code=0x%lx",
+		  __entry->comm, __entry->pid, __entry->prio,
+		  __entry->code)
+);
+
 /*
  * Tracepoint for a waiting task:
  */
diff --git a/kernel/exit.c b/kernel/exit.c
index 493647fd7c07..f675f879a1b2 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -826,6 +826,7 @@  void __noreturn do_exit(long code)
 
 	WARN_ON(tsk->plug);
 
+	trace_sched_profile_task_exit(tsk, code);
 	kcov_task_exit(tsk);
 	kmsan_task_exit(tsk);
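
For readers who want to see what a user-space consumer of this event might look like, here is a minimal sketch in C. It assumes the monitor reads text lines whose payload follows the TP_printk() format in the patch ("comm=%s pid=%d prio=%d exit_code=0x%lx"). The struct exit_event type and the helper names (find_last, parse_exit_event, exit_status, term_signal, core_dumped) are hypothetical and exist only for this illustration. The decoding helpers further assume that the value passed to do_exit() uses the usual wait-status layout (signal number in the low 7 bits, core-dump flag in bit 0x80, exit status in bits 8-15).

```c
#include <stdio.h>
#include <string.h>

#define COMM_LEN 16 /* mirrors the kernel's TASK_COMM_LEN */

/* Hypothetical record for this sketch; not a kernel or library type. */
struct exit_event {
	char comm[COMM_LEN];
	int pid;
	int prio;
	unsigned long code;
};

/* Return the last occurrence of needle in hay, or NULL if absent. */
static const char *find_last(const char *hay, const char *needle)
{
	const char *last = NULL;
	const char *p = hay;

	while ((p = strstr(p, needle)) != NULL) {
		last = p;
		p++;
	}
	return last;
}

/*
 * Parse "comm=<name> pid=<n> prio=<n> exit_code=0x<hex>".
 * Returns 0 on success, -1 if the line does not match.
 */
static int parse_exit_event(const char *line, struct exit_event *ev)
{
	const char *comm = strstr(line, "comm=");
	const char *rest;
	size_t len;

	if (!comm)
		return -1;
	comm += strlen("comm=");

	/* comm may contain spaces, so split on the last " pid=". */
	rest = find_last(comm, " pid=");
	if (!rest)
		return -1;

	len = (size_t)(rest - comm);
	if (len >= COMM_LEN)
		len = COMM_LEN - 1;
	memcpy(ev->comm, comm, len);
	ev->comm[len] = '\0';

	if (sscanf(rest, " pid=%d prio=%d exit_code=0x%lx",
		   &ev->pid, &ev->prio, &ev->code) != 3)
		return -1;
	return 0;
}

/* Assumed wait-status style decoding of the do_exit() code. */
static int exit_status(unsigned long code) { return (int)((code >> 8) & 0xff); }
static int term_signal(unsigned long code) { return (int)(code & 0x7f); }
static int core_dumped(unsigned long code) { return (code & 0x80) != 0; }
```

A monitor would call parse_exit_event() on each line it reads and then use term_signal() and core_dumped() to decide whether a core dump is likely in progress. If trace_sched_process_exit() is moved earlier instead, as suggested in the review, the same approach would apply to that event's own output format, which differs from the one parsed here.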