[PATCHSET v2 0/2] Split iowait into two states

Message ID 20240227211152.1099534-1-axboe@kernel.dk


Jens Axboe Feb. 27, 2024, 9:06 p.m. UTC
  Hi,

This is v2 of the patch posted yesterday, where the current in_iowait
state is split into two parts:

1) The "task is sleeping waiting on IO", and would like cpufreq goodness
   in terms of sleep and wakeup latencies.
2) The above, and also accounted as such in the iowait stats.

The current ->in_iowait covers both; with this series, ->in_iowait covers
case 1 above and ->in_iowait_acct covers case 2. A task cannot have
->in_iowait_acct set without also having ->in_iowait set.

Patch 1 is a prep patch that turns rq->nr_iowait into an unsigned int
rather than an atomic_t; the reasons are given in that patch.

Patch 2 adds the ->in_iowait_acct state inside the current ->in_iowait
setting.

I haven't been able to properly benchmark patch 1, as the atomics are
noise in any workload that approximates normality. I can certainly
concoct a synthetic test case if folks are interested. My gut says that
we're trading three fast-path atomics for none, with the fourth case,
where we grab the rq lock, _probably_ being far less likely.

Comments welcome! Peter, CC'ing you since I did on the previous posting;
feel free to ignore.

Since v1:
- Add prep patch 1, switching nr_iowait to an unsigned int
- Modify patch 2 to not use atomic_t as well, no changes otherwise

 arch/s390/appldata/appldata_base.c |  2 +-
 arch/s390/appldata/appldata_os.c   |  2 +-
 fs/proc/stat.c                     |  2 +-
 include/linux/sched.h              |  6 ++++
 include/linux/sched/stat.h         | 10 ++++--
 kernel/sched/core.c                | 55 +++++++++++++++++++++++-------
 kernel/sched/cputime.c             |  2 +-
 kernel/sched/sched.h               |  3 +-
 kernel/time/tick-sched.c           |  6 ++--
 9 files changed, 66 insertions(+), 22 deletions(-)
  

Comments

Jens Axboe Feb. 28, 2024, 2:21 a.m. UTC | #1
On 2/27/24 2:06 PM, Jens Axboe wrote:
> I haven't been able to properly benchmark patch 1, as the atomics are
> noise in any workloads that approximate normality. I can certainly
> concoct a synthetic test case if folks are interested. My gut says that
> we're trading 3 fast path atomics for none, and with the 4th case
> _probably_ being way less likely. There we grab the rq lock.

OK, so on Chris's suggestion, I tried his schbench to exercise the
scheduling side. It's very futex intensive, so I hacked up futex to set
iowait state when sleeping. I also added simple accounting to that path
so I knew how many times it ran. A run of:

./schbench -m 60 -t 10 -p 8

on a 2-socket Intel(R) Xeon(R) Platinum 8458P with 176 threads shows no
regression in performance, and try_to_wake_up() locking the rq of the
task being scheduled in from another CPU doesn't seem to register much.
On the previous run I saw 2.21% there, and now it's 2.36%. But it was
also a better performing run, which may have led to the increase.

Each run takes 30 seconds, and during that time I see around 290-310M
hits of that path, or roughly 10M/sec. Without modifying futex to use
iowait, we obviously rarely hit it: about 200 times per run, which
makes sense as we're not really doing IO.

Anyway, just some data on this. If I leave the futex/pipe iowait in and
run the same test, I see no discernible difference in profiles. In fact,
the highest cost across the tests is bringing in the task->in_iowait
cacheline.