sched/tracing: correct the task blocking state

Message ID 20240102073351.1527503-1-alexs@kernel.org
State New
Headers
Series sched/tracing: correct the task blocking state |

Commit Message

alexs@kernel.org Jan. 2, 2024, 7:33 a.m. UTC
  From: Alex Shi <alexs@kernel.org>

commit 80ed87c8a9ca ("sched/wait: Introduce TASK_NOLOAD and TASK_IDLE")
stopped the idle kthreads contribution to loadavg. Also task idle should
separated from blocked state too, otherwise we will get incorrect task
blocking state from event tracing sched:sched_stat_blocked.

Originally-from: Curu Wong <curuwang@tencent.com>
Signed-off-by: Alex Shi <alexs@kernel.org>
To: linux-kernel@vger.kernel.org
To: Valentin Schneider <vschneid@redhat.com>
To: Daniel Bristot de Oliveira <bristot@redhat.com>
To: Mel Gorman <mgorman@suse.de>
To: Ben Segall <bsegall@google.com>
To: Steven Rostedt <rostedt@goodmis.org>
To: Dietmar Eggemann <dietmar.eggemann@arm.com>
To: Vincent Guittot <vincent.guittot@linaro.org>
To: Juri Lelli <juri.lelli@redhat.com>
To: Peter Zijlstra <peterz@infradead.org>
To: Ingo Molnar <mingo@redhat.com>
---
 include/linux/sched.h   | 4 ++++
 kernel/sched/deadline.c | 2 +-
 kernel/sched/fair.c     | 2 +-
 kernel/sched/rt.c       | 2 +-
 4 files changed, 7 insertions(+), 3 deletions(-)
  

Comments

Valentin Schneider Jan. 2, 2024, 10:18 a.m. UTC | #1
On 02/01/24 15:33, alexs@kernel.org wrote:
> From: Alex Shi <alexs@kernel.org>
>
> commit 80ed87c8a9ca ("sched/wait: Introduce TASK_NOLOAD and TASK_IDLE")
> stopped the idle kthreads contribution to loadavg. Also task idle should
> separated from blocked state too, otherwise we will get incorrect task
> blocking state from event tracing sched:sched_stat_blocked.
>

Why is that incorrect? AFAICT we have mapped the (schedstat) 'blocked'
meaning to TASK_UNINTERRUPTIBLE. TASK_IDLE tasks don't contribute to
loadavg yes, but they are still in an UNINTERRUPTIBLE wait.

> diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
> index b28114478b82..b6afa596f071 100644
> --- a/kernel/sched/deadline.c
> +++ b/kernel/sched/deadline.c
> @@ -1570,7 +1570,7 @@ update_stats_dequeue_dl(struct dl_rq *dl_rq, struct sched_dl_entity *dl_se,
>                       __schedstat_set(p->stats.sleep_start,
>                                       rq_clock(rq_of_dl_rq(dl_rq)));
>
> -		if (state & TASK_UNINTERRUPTIBLE)
> +		if (is_blocked_task_state(state))
>                       __schedstat_set(p->stats.block_start,
>                                       rq_clock(rq_of_dl_rq(dl_rq)));

This change makes it so tasks waiting in TASK_IDLE have their waiting
ignored by schedstat (they are seen as neither INTERRUPTIBLE nor UNINTERRUPTIBLE).
  
kuiliang Shi Jan. 2, 2024, 1 p.m. UTC | #2
On Tue, Jan 2, 2024 at 6:19 PM Valentin Schneider <vschneid@redhat.com> wrote:
>
> On 02/01/24 15:33, alexs@kernel.org wrote:
> > From: Alex Shi <alexs@kernel.org>
> >
> > commit 80ed87c8a9ca ("sched/wait: Introduce TASK_NOLOAD and TASK_IDLE")
> > stopped the idle kthreads contribution to loadavg. Also task idle should
> > separated from blocked state too, otherwise we will get incorrect task
> > blocking state from event tracing sched:sched_stat_blocked.
> >
>
> Why is that incorrect? AFAICT we have mapped the (schedstat) 'blocked'
> meaning to TASK_UNINTERRUPTIBLE. TASK_IDLE tasks don't contribute to
> loadavg yes, but they are still in an UNINTERRUPTIBLE wait.


Hi Valentin,
Thanks a lot for the reply.

I agree with you the current usage, but if so, we account for the idle task into
blocked state. And it's better to distinguish between idle and block.

>
> > diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
> > index b28114478b82..b6afa596f071 100644
> > --- a/kernel/sched/deadline.c
> > +++ b/kernel/sched/deadline.c
> > @@ -1570,7 +1570,7 @@ update_stats_dequeue_dl(struct dl_rq *dl_rq, struct sched_dl_entity *dl_se,
> >                       __schedstat_set(p->stats.sleep_start,
> >                                       rq_clock(rq_of_dl_rq(dl_rq)));
> >
> > -             if (state & TASK_UNINTERRUPTIBLE)
> > +             if (is_blocked_task_state(state))
> >                       __schedstat_set(p->stats.block_start,
> >                                       rq_clock(rq_of_dl_rq(dl_rq)));
>
> This change makes it so tasks waiting in TASK_IDLE have their waiting
> ignored by schedstat (they are seen as neither INTERRUPTIBLE nor UNINTERRUPTIBLE).
>

Right, maybe it's time to add a new state for TASK_IDLE?

Many thanks!
Alex Shi
  
Valentin Schneider Jan. 2, 2024, 4:16 p.m. UTC | #3
On 02/01/24 21:00, Alex Shi wrote:
> On Tue, Jan 2, 2024 at 6:19 PM Valentin Schneider <vschneid@redhat.com> wrote:
>>
>> On 02/01/24 15:33, alexs@kernel.org wrote:
>> > From: Alex Shi <alexs@kernel.org>
>> >
>> > commit 80ed87c8a9ca ("sched/wait: Introduce TASK_NOLOAD and TASK_IDLE")
>> > stopped the idle kthreads contribution to loadavg. Also task idle should
>> > separated from blocked state too, otherwise we will get incorrect task
>> > blocking state from event tracing sched:sched_stat_blocked.
>> >
>>
>> Why is that incorrect? AFAICT we have mapped the (schedstat) 'blocked'
>> meaning to TASK_UNINTERRUPTIBLE. TASK_IDLE tasks don't contribute to
>> loadavg yes, but they are still in an UNINTERRUPTIBLE wait.
>
>
> Hi Valentin,
> Thanks a lot for the reply.
>
> I agree with you the current usage, but if so, we account for the idle task into
> blocked state. And it's better to distinguish between idle and block.
>

Why is that an issue? If those tasks didn't have to be
TASK_UNINTERRUPTIBLE (via TASK_IDLE), we'd make them TASK_INTERRUPTIBLE and
they'd also inflate the 'sleeping' schedstat (rather than the 'blocked').

What problem are you facing with those tasks being flagged as blocked during
their wait?
  
kuiliang Shi Jan. 3, 2024, 3:14 a.m. UTC | #4
On Wed, Jan 3, 2024 at 12:16 AM Valentin Schneider <vschneid@redhat.com> wrote:
>
> On 02/01/24 21:00, Alex Shi wrote:
> > On Tue, Jan 2, 2024 at 6:19 PM Valentin Schneider <vschneid@redhat.com> wrote:
> >>
> >> On 02/01/24 15:33, alexs@kernel.org wrote:
> >> > From: Alex Shi <alexs@kernel.org>
> >> >
> >> > commit 80ed87c8a9ca ("sched/wait: Introduce TASK_NOLOAD and TASK_IDLE")
> >> > stopped the idle kthreads contribution to loadavg. Also task idle should
> >> > separated from blocked state too, otherwise we will get incorrect task
> >> > blocking state from event tracing sched:sched_stat_blocked.
> >> >
> >>
> >> Why is that incorrect? AFAICT we have mapped the (schedstat) 'blocked'
> >> meaning to TASK_UNINTERRUPTIBLE. TASK_IDLE tasks don't contribute to
> >> loadavg yes, but they are still in an UNINTERRUPTIBLE wait.
> >
> >
> > Hi Valentin,
> > Thanks a lot for the reply.
> >
> > I agree with you the current usage, but if so, we account for the idle task into
> > blocked state. And it's better to distinguish between idle and block.
> >
>
> Why is that an issue? If those tasks didn't have to be
> TASK_UNINTERRUPTIBLE (via TASK_IDLE), we'd make them TASK_INTERRUPTIBLE and
> they'd also inflate the 'sleeping' schedstat (rather than the 'blocked').
>
> What problem are you facing with those tasks being flagged as blocked during
> their wait?
>

Uh, Tencent cloud has some latency sensitive services, a blocked state
means the service has
 some trouble, but with IDLE state involved, it's failed on this judgement.
and 2nd, if a service has abnormal, we want to check if it's hanging
on io or sth else, but the top
3 D tasks are often queuework in our system, and even a task in
blocked state we have no
quick way to figure out if it's IDLE or BLOCKED.  2 different states
will help us a lot.

Thanks!
  
kuiliang Shi Jan. 3, 2024, 7:20 a.m. UTC | #5
On Tue, Jan 2, 2024 at 6:19 PM Valentin Schneider <vschneid@redhat.com> wrote:
>
> On 02/01/24 15:33, alexs@kernel.org wrote:
> > From: Alex Shi <alexs@kernel.org>
> >
> > commit 80ed87c8a9ca ("sched/wait: Introduce TASK_NOLOAD and TASK_IDLE")
> > stopped the idle kthreads contribution to loadavg. Also task idle should
> > separated from blocked state too, otherwise we will get incorrect task
> > blocking state from event tracing sched:sched_stat_blocked.
> >
>
> Why is that incorrect? AFAICT we have mapped the (schedstat) 'blocked'
> meaning to TASK_UNINTERRUPTIBLE. TASK_IDLE tasks don't contribute to
> loadavg yes, but they are still in an UNINTERRUPTIBLE wait.
>
> > diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
> > index b28114478b82..b6afa596f071 100644
> > --- a/kernel/sched/deadline.c
> > +++ b/kernel/sched/deadline.c
> > @@ -1570,7 +1570,7 @@ update_stats_dequeue_dl(struct dl_rq *dl_rq, struct sched_dl_entity *dl_se,
> >                       __schedstat_set(p->stats.sleep_start,
> >                                       rq_clock(rq_of_dl_rq(dl_rq)));
> >
> > -             if (state & TASK_UNINTERRUPTIBLE)
> > +             if (is_blocked_task_state(state))
> >                       __schedstat_set(p->stats.block_start,
> >                                       rq_clock(rq_of_dl_rq(dl_rq)));
>
> This change makes it so tasks waiting in TASK_IDLE have their waiting
> ignored by schedstat (they are seen as neither INTERRUPTIBLE nor UNINTERRUPTIBLE).

Right, I will fix it by adding idle time into sleep. will send the 2nd
version patch.

Thanks
Alex
  
Valentin Schneider Jan. 3, 2024, 1:56 p.m. UTC | #6
On 03/01/24 11:14, Alex Shi wrote:
> On Wed, Jan 3, 2024 at 12:16 AM Valentin Schneider <vschneid@redhat.com> wrote:
>>
>> Why is that an issue? If those tasks didn't have to be
>> TASK_UNINTERRUPTIBLE (via TASK_IDLE), we'd make them TASK_INTERRUPTIBLE and
>> they'd also inflate the 'sleeping' schedstat (rather than the 'blocked').
>>
>> What problem are you facing with those tasks being flagged as blocked during
>> their wait?
>>
>
> Uh, Tencent cloud has some latency sensitive services, a blocked state
> means the service has
>  some trouble, but with IDLE state involved, it's failed on this judgement.
> and 2nd, if a service has abnormal, we want to check if it's hanging
> on io or sth else, but the top
> 3 D tasks are often queuework in our system, and even a task in
> blocked state we have no
> quick way to figure out if it's IDLE or BLOCKED.  2 different states
> will help us a lot.
>

That's useful information - generally it's good to add the motivation for a
patch in the changelog.

> Thanks!
  
kuiliang Shi Jan. 4, 2024, 7:56 a.m. UTC | #7
On Wed, Jan 3, 2024 at 9:56 PM Valentin Schneider <vschneid@redhat.com> wrote:
>
> On 03/01/24 11:14, Alex Shi wrote:
> > On Wed, Jan 3, 2024 at 12:16 AM Valentin Schneider <vschneid@redhat.com> wrote:
> >>
> >> Why is that an issue? If those tasks didn't have to be
> >> TASK_UNINTERRUPTIBLE (via TASK_IDLE), we'd make them TASK_INTERRUPTIBLE and
> >> they'd also inflate the 'sleeping' schedstat (rather than the 'blocked').
> >>
> >> What problem are you facing with those tasks being flagged as blocked during
> >> their wait?
> >>
> >
> > Uh, Tencent cloud has some latency sensitive services, a blocked state
> > means the service has
> >  some trouble, but with IDLE state involved, it's failed on this judgement.
> > and 2nd, if a service has abnormal, we want to check if it's hanging
> > on io or sth else, but the top
> > 3 D tasks are often queuework in our system, and even a task in
> > blocked state we have no
> > quick way to figure out if it's IDLE or BLOCKED.  2 different states
> > will help us a lot.
> >
>
> That's useful information - generally it's good to add the motivation for a
> patch in the changelog.

Thanks a lot for coaching, I changed the commit log in the v2 patch:
https://lore.kernel.org/lkml/20240103081042.1549189-1-alexs@kernel.org/
Btw, which way is usual prefered, send v2 patch in a separate thread
or add '--in-reply-to' to follow this thread?

Thanks
Alex

>
> > Thanks!
>
  
Valentin Schneider Jan. 4, 2024, 9:05 a.m. UTC | #8
On 04/01/24 15:56, Alex Shi wrote:
> On Wed, Jan 3, 2024 at 9:56 PM Valentin Schneider <vschneid@redhat.com> wrote:
>>
>> That's useful information - generally it's good to add the motivation for a
>> patch in the changelog.
>
> Thanks a lot for coaching, I changed the commit log in the v2 patch:
> https://lore.kernel.org/lkml/20240103081042.1549189-1-alexs@kernel.org/
> Btw, which way is usual prefered, send v2 patch in a separate thread
> or add '--in-reply-to' to follow this thread?
>

New threads are usually better, it makes it clear that it is a new version
of the patch and not just a reply to the previous version's
thread.

Reviewers/Maintainers can also jump more easily on the v(n+1) when it's a
separate thread, in case they missed / didn't have time to review the
previous version.

> Thanks
> Alex
>
>>
>> > Thanks!
>>
  

Patch

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 292c31697248..341e62255ea7 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -140,6 +140,10 @@  struct user_event_mm;
 #define is_special_task_state(state)				\
 	((state) & (__TASK_STOPPED | __TASK_TRACED | TASK_PARKED | TASK_DEAD))
 
+/* blocked task is UNINTERRUPTIBLE but not NOLOAD */
+#define is_blocked_task_state(state)				\
+	((state) & TASK_UNINTERRUPTIBLE && (!((state) & TASK_NOLOAD)))
+
 #ifdef CONFIG_DEBUG_ATOMIC_SLEEP
 # define debug_normal_state_change(state_value)				\
 	do {								\
diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index b28114478b82..b6afa596f071 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -1570,7 +1570,7 @@  update_stats_dequeue_dl(struct dl_rq *dl_rq, struct sched_dl_entity *dl_se,
 			__schedstat_set(p->stats.sleep_start,
 					rq_clock(rq_of_dl_rq(dl_rq)));
 
-		if (state & TASK_UNINTERRUPTIBLE)
+		if (is_blocked_task_state(state))
 			__schedstat_set(p->stats.block_start,
 					rq_clock(rq_of_dl_rq(dl_rq)));
 	}
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index d7a3c63a2171..349b0c5104b6 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -1281,7 +1281,7 @@  update_stats_dequeue_fair(struct cfs_rq *cfs_rq, struct sched_entity *se, int fl
 		if (state & TASK_INTERRUPTIBLE)
 			__schedstat_set(tsk->stats.sleep_start,
 				      rq_clock(rq_of(cfs_rq)));
-		if (state & TASK_UNINTERRUPTIBLE)
+		if (is_blocked_task_state(state))
 			__schedstat_set(tsk->stats.block_start,
 				      rq_clock(rq_of(cfs_rq)));
 	}
diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index 6aaf0a3d6081..2fdf3d71428d 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -1375,7 +1375,7 @@  update_stats_dequeue_rt(struct rt_rq *rt_rq, struct sched_rt_entity *rt_se,
 			__schedstat_set(p->stats.sleep_start,
 					rq_clock(rq_of_rt_rq(rt_rq)));
 
-		if (state & TASK_UNINTERRUPTIBLE)
+		if (is_blocked_task_state(state))
 			__schedstat_set(p->stats.block_start,
 					rq_clock(rq_of_rt_rq(rt_rq)));
 	}