From patchwork Tue Jul 25 19:30:48 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Mathieu Desnoyers X-Patchwork-Id: 125795 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:9010:0:b0:3e4:2afc:c1 with SMTP id l16csp2713228vqg; Tue, 25 Jul 2023 13:27:24 -0700 (PDT) X-Google-Smtp-Source: APBJJlEvlewEPbNJKTHnoWnliENHfVf/UtFO5BmQ3ub1lhLu4cYkxOq3wMpQvPjkswlmYhFmpyO8 X-Received: by 2002:aa7:c50c:0:b0:522:56e7:bafb with SMTP id o12-20020aa7c50c000000b0052256e7bafbmr3774edq.19.1690316844454; Tue, 25 Jul 2023 13:27:24 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1690316844; cv=none; d=google.com; s=arc-20160816; b=f+uZvPW9+dsOJH7OrFAp5Zmqhc+saKWw2wEVS7D9+la4kaVlyrV1mGbjShDApJAIUT ak9mD7NCDaJEj7RiGd9nP6LT9ZhetXI1T23/U3nEAn9ytM4JVuKQJoBdSK7JdMW4g0aJ HuoKEdfU5XYfMzWqNsefHibjUKpmB5LZPZwB+d9rlnRt9SuSbJ1MwIZerHm/BpP41Wp9 38CbtW626Kdv/vczYr0Yd2+2AunSv+r/w3z9HZAFMCr1xAKrCw8dp83O4J44aHVUhNZU Kh5ziXrzho1c5PvMkNRAp7wW4xk04oQR2clrubO1v/mDb2gQ1fAIKa1da2XdjOJw9X0n mQvQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :message-id:date:subject:cc:to:from:dkim-signature; bh=ccP1ir6jefWXqE7qXOrwt2wVtdr+z1qhzq9gZj4Wddc=; fh=6jUvvwNgggoeLCB3PgvGfBZFdr+LWzDd6oFaQWdUTrM=; b=zCLiQLfOsLZgEOBwQ5p7rv/tV2ow5Qc/S5ApXSOhio+Fdz/glo6Hn5+ZE8V681OANA tq3tKkMKjWYadz5u1KH4RQ0NfDg2fYg7SJ/swFThE1Pxm2ChkZeqmdht9KpS44vYhqV6 CMv9a8DBpUJvpihtJd91OsBK7BxWTv56eLlh/oqEy31lBjhhZHWgd+XL3CxQPwFmbfpf i+5w/HyB6qJTfnV//+Fkmy4d+z+GJTaRWPIZKfgRwGymdOtUtpY628ZVllaOvZkQ/tqE HYrthcNHA9xZZrblgdMbnigbyCF+50LMYBb8vo4LX9yF/CaBw7xmkiusyWcsMHojnbg/ acFA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@efficios.com header.s=smtpout1 header.b=eXeG+43P; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=efficios.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id f19-20020a056402069300b0051e16fd4d9asi8361830edy.223.2023.07.25.13.27.00; Tue, 25 Jul 2023 13:27:24 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@efficios.com header.s=smtpout1 header.b=eXeG+43P; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=efficios.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229907AbjGYTaO (ORCPT + 99 others); Tue, 25 Jul 2023 15:30:14 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:32842 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230379AbjGYTaN (ORCPT ); Tue, 25 Jul 2023 15:30:13 -0400 Received: from smtpout.efficios.com (unknown [IPv6:2607:5300:203:b2ee::31e5]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 78F0B1FFC for ; Tue, 25 Jul 2023 12:30:10 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=efficios.com; s=smtpout1; t=1690313409; bh=eksg/xSQaqbvhzxjIvfY0VWqJKo9vV4O1aJahYJV9EU=; h=From:To:Cc:Subject:Date:From; b=eXeG+43Pt5Ybsmy61t2JpU5U9Bbulf/sp2lq3j+akg+AtWvpxqBhpUJ23PVSw1+at oJObQ7aUzj7wNJGZU/M9yv2YsVMMFxwNCJkmcHpiGWTNFNPz4UbYTSr8X7pxZfuooL kP6JK1GFJF1L1jT4f0LLW5gPNf8GBFf9NIESdlCjAswZaFycQT0bYLqt3hTDuYCAHB ztXFfdObz06oinFRCUFIvpx37fn7opjlH3JZTtGnaBiXlF4/vyq9r3ko5yGFs6hTtA u9OToijt0yeBlshWrDePz6KdFwRcpxL6+y/AKoI/dELDqNQ/V9c8yQ85R4IPbno8zr ZtfK3Jd2cw/6Q== Received: from thinkos.internal.efficios.com (192-222-143-198.qc.cable.ebox.net [192.222.143.198]) by smtpout.efficios.com (Postfix) with ESMTPSA id 4R9Rv463Hcz1JZF; Tue, 25 Jul 2023 15:30:08 -0400 (EDT) From: Mathieu Desnoyers To: Peter Zijlstra Cc: linux-kernel@vger.kernel.org, Mathieu Desnoyers , Ingo Molnar , Valentin Schneider , Steven Rostedt , Ben Segall , Mel Gorman , Daniel Bristot de Oliveira , Vincent Guittot , Juri Lelli , Swapnil Sapkal , Aaron Lu , x86@kernel.org Subject: [RFC PATCH 1/1] sched: Extend cpu idle state for 1ms Date: Tue, 25 Jul 2023 15:30:48 -0400 Message-Id: <20230725193048.124796-1-mathieu.desnoyers@efficios.com> X-Mailer: git-send-email 2.39.2 MIME-Version: 1.0 X-Spam-Status: No, score=-1.3 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_BLOCKED,RDNS_NONE, SPF_HELO_NONE,SPF_PASS,T_SCC_BODY_TEXT_LINE,URIBL_BLOCKED autolearn=no autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: INBOX X-GMAIL-THRID: 1772425675469755676 X-GMAIL-MSGID: 1772425675469755676 Allow select_task_rq to consider a cpu as idle for 1ms after that cpu has exited the idle loop. This speeds up the following hackbench workload on a 192 cores AMD EPYC 9654 96-Core Processor (over 2 sockets): hackbench -g 32 -f 20 --threads --pipe -l 480000 -s 100 from 49s to 34s. (30% speedup) My working hypothesis for why this helps is: queuing more than a single task on the runqueue of a cpu which just exited idle rather than spreading work over other idle cpus helps power efficiency on systems with large number of cores. This was developed as part of the investigation into a weird regression reported by AMD where adding a raw spinlock in the scheduler context switch accelerated hackbench. It turned out that changing this raw spinlock for a loop of 10000x cpu_relax within do_idle() had similar benefits. This patch achieve a similar effect without the busy-waiting by introducing a runqueue state sampling the sched_clock() when exiting idle, which allows select_task_rq to consider "as idle" a cpu which has recently exited idle. This patch should be considered "food for thoughts", and I would be glad to hear feedback on whether it causes regressions on _other_ workloads, and whether it helps with the hackbench workload on large Intel system as well. Link: https://lore.kernel.org/r/09e0f469-a3f7-62ef-75a1-e64cec2dcfc5@amd.com Signed-off-by: Mathieu Desnoyers Cc: Ingo Molnar Cc: Peter Zijlstra Cc: Valentin Schneider Cc: Steven Rostedt Cc: Ben Segall Cc: Mel Gorman Cc: Daniel Bristot de Oliveira Cc: Vincent Guittot Cc: Juri Lelli Cc: Swapnil Sapkal Cc: Aaron Lu Cc: x86@kernel.org --- kernel/sched/core.c | 4 ++++ kernel/sched/sched.h | 3 +++ 2 files changed, 7 insertions(+) diff --git a/kernel/sched/core.c b/kernel/sched/core.c index a68d1276bab0..d40e3a0a5ced 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -6769,6 +6769,7 @@ void __sched schedule_idle(void) * TASK_RUNNING state. */ WARN_ON_ONCE(current->__state); + WRITE_ONCE(this_rq()->idle_end_time, sched_clock()); do { __schedule(SM_NONE); } while (need_resched()); @@ -7300,6 +7301,9 @@ int idle_cpu(int cpu) { struct rq *rq = cpu_rq(cpu); + if (sched_clock() < READ_ONCE(rq->idle_end_time) + IDLE_CPU_DELAY_NS) + return 1; + if (rq->curr != rq->idle) return 0; diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h index 81ac605b9cd5..8932e198a33a 100644 --- a/kernel/sched/sched.h +++ b/kernel/sched/sched.h @@ -97,6 +97,8 @@ # define SCHED_WARN_ON(x) ({ (void)(x), 0; }) #endif +#define IDLE_CPU_DELAY_NS 1000000 /* 1ms */ + struct rq; struct cpuidle_state; @@ -1010,6 +1012,7 @@ struct rq { struct task_struct __rcu *curr; struct task_struct *idle; + u64 idle_end_time; struct task_struct *stop; unsigned long next_balance; struct mm_struct *prev_mm;