From patchwork Fri Oct 20 01:40:26 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Joel Fernandes
X-Patchwork-Id: 155751
From: "Joel Fernandes (Google)"
To: linux-kernel@vger.kernel.org, Ingo Molnar, Peter Zijlstra, Juri Lelli,
 Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
 Daniel Bristot de Oliveira, Valentin Schneider
Cc: "Joel Fernandes (Google)", Suleiman Souhlal, Frederic Weisbecker,
 "Paul E . McKenney", Vineeth Pillai
Subject: [PATCH 1/3] sched/nohz: Update nohz.next_balance directly without IPIs (v2)
Date: Fri, 20 Oct 2023 01:40:26 +0000
Message-ID: <20231020014031.919742-1-joel@joelfernandes.org>
X-Mailer: git-send-email 2.42.0.655.g421f12c284-goog
X-Mailing-List: linux-kernel@vger.kernel.org

Whenever a CPU stops its tick, it now requires another idle CPU to handle
the balancing for it because it can't perform its own periodic load
balancing. This means it might need to update 'nohz.next_balance' to
'rq->next_balance' if the upcoming nohz-idle load balancing is too distant
in the future. This update process is done by triggering an ILB, as the
general ILB handler (_nohz_idle_balance) that manages regular nohz
balancing also refreshes 'nohz.next_balance' by looking at the
'rq->next_balance' of all other idle CPUs and selecting the smallest value.

Triggering this ILB is achieved in current mainline by setting the
NOHZ_NEXT_KICK flag. This primarily results in the ILB handler updating
'nohz.next_balance' while possibly not doing any load balancing at all.
However, sending an IPI merely to refresh 'nohz.next_balance' seems
excessive. This patch therefore directly sets nohz.next_balance from the
CPU stopping the tick.
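
The direct update described above can be sketched in userspace as follows.
This is only an illustration, assuming C11 atomics in place of the kernel's
READ_ONCE()/try_cmpxchg(); the names time_after_ul, shared_next_balance,
pull_next_balance_earlier and demo_earliest_wins are hypothetical and do not
appear in the patch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Userspace stand-in for the kernel's wraparound-safe time_after(a, b). */
static bool time_after_ul(unsigned long a, unsigned long b)
{
	return (long)(b - a) < 0;
}

/* Hypothetical stand-in for the shared nohz.next_balance field. */
static _Atomic unsigned long shared_next_balance;

/* Pull the shared deadline earlier; never push it later. */
static void pull_next_balance_earlier(unsigned long candidate)
{
	unsigned long cur = atomic_load(&shared_next_balance);

	/* A failed CAS reloads 'cur', so the check is redone on fresh data. */
	while (time_after_ul(cur, candidate)) {
		if (atomic_compare_exchange_weak(&shared_next_balance,
						 &cur, candidate))
			break;
	}
}

/* Returns the value left after a later and then an earlier candidate. */
static unsigned long demo_earliest_wins(void)
{
	atomic_store(&shared_next_balance, 1000UL);
	pull_next_balance_earlier(1200UL);	/* later: ignored */
	assert(atomic_load(&shared_next_balance) == 1000UL);
	pull_next_balance_earlier(900UL);	/* earlier: stored */
	return atomic_load(&shared_next_balance);
}
```

The point of the compare-and-swap loop is that two CPUs stopping their tick
at the same time cannot overwrite an earlier deadline with a later one;
whichever candidate is earliest is the one that remains.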

Testing shows a considerable reduction in IPIs when doing this:

Running "cyclictest -i 100 -d 100 --latency=1000 -t -m" on a 4-vCPU VM,
the IPI call count profiled over a 10s period is as follows:
without fix: ~10500
with fix: ~1000

Also just to note, without this patch we observe the following pattern:

1. A CPU is about to stop its tick.
2. It sets nohz.needs_update to 1.
3. It then stops its tick and goes idle.
4. The scheduler tick on another CPU checks this flag and decides an ILB
   kick is needed.
5. The ILB CPU ends up being the one that just stopped its tick!
6. This results in an IPI to the tick-stopped CPU which ends up waking it
   up and disturbing it!

Finally, this patch also avoids having to go through all the idle CPUs
just to update nohz.next_balance when the tick is stopped.

The previous version of this patch had some issues, which are now addressed:
https://lore.kernel.org/all/20231005161727.1855004-1-joel@joelfernandes.org/

Cc: Suleiman Souhlal
Cc: Steven Rostedt
Cc: Frederic Weisbecker
Cc: Paul E. McKenney
Fixes: 7fd7a9e0caba ("sched/fair: Trigger nohz.next_balance updates when a CPU goes NOHZ-idle")
Co-developed-by: Vineeth Pillai (Google)
Signed-off-by: Vineeth Pillai (Google)
Signed-off-by: Joel Fernandes (Google)
---
 kernel/sched/fair.c  | 44 +++++++++++++++++++++++++++++---------------
 kernel/sched/sched.h |  5 +----
 2 files changed, 30 insertions(+), 19 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index cb225921bbca..965c30fbbe5c 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6627,7 +6627,6 @@ static struct {
 	cpumask_var_t idle_cpus_mask;
 	atomic_t nr_cpus;
 	int has_blocked;		/* Idle CPUS has blocked load */
-	int needs_update;		/* Newly idle CPUs need their next_balance collated */
 	unsigned long next_balance;     /* in jiffy units */
 	unsigned long next_blocked;	/* Next update of blocked load in jiffies */
 } nohz ____cacheline_aligned;
@@ -11687,9 +11686,6 @@ static void nohz_balancer_kick(struct rq *rq)
 unlock:
 	rcu_read_unlock();
 out:
-	if (READ_ONCE(nohz.needs_update))
-		flags |= NOHZ_NEXT_KICK;
-
 	if (flags)
 		kick_ilb(flags);
 }
@@ -11740,6 +11736,20 @@ static void set_cpu_sd_state_idle(int cpu)
 	rcu_read_unlock();
 }
 
+static inline void
+update_nohz_next_balance(unsigned long next_balance)
+{
+	unsigned long nohz_next_balance;
+
+	/* In event of a race, only update with the earliest next_balance. */
+	do {
+		nohz_next_balance = READ_ONCE(nohz.next_balance);
+		if (!time_after(nohz_next_balance, next_balance))
+			break;
+	} while (!try_cmpxchg(&nohz.next_balance, &nohz_next_balance,
+			     next_balance));
+}
+
 /*
  * This routine will record that the CPU is going idle with tick stopped.
  * This info will be used in performing idle load balancing in the future.
@@ -11786,13 +11796,13 @@ void nohz_balance_enter_idle(int cpu)
 	/*
 	 * Ensures that if nohz_idle_balance() fails to observe our
 	 * @idle_cpus_mask store, it must observe the @has_blocked
-	 * and @needs_update stores.
+	 * store.
 	 */
 	smp_mb__after_atomic();
 
 	set_cpu_sd_state_idle(cpu);
 
-	WRITE_ONCE(nohz.needs_update, 1);
+	update_nohz_next_balance(rq->next_balance);
 out:
 	/*
 	 * Each time a cpu enter idle, we assume that it has blocked load and
@@ -11829,6 +11839,7 @@ static void _nohz_idle_balance(struct rq *this_rq, unsigned int flags)
 	/* Earliest time when we have to do rebalance again */
 	unsigned long now = jiffies;
 	unsigned long next_balance = now + 60*HZ;
+	unsigned long old_nohz_next_balance;
 	bool has_blocked_load = false;
 	int update_next_balance = 0;
 	int this_cpu = this_rq->cpu;
@@ -11837,6 +11848,8 @@ static void _nohz_idle_balance(struct rq *this_rq, unsigned int flags)
 
 	SCHED_WARN_ON((flags & NOHZ_KICK_MASK) == NOHZ_BALANCE_KICK);
 
+	old_nohz_next_balance = READ_ONCE(nohz.next_balance);
+
 	/*
 	 * We assume there will be no idle load after this update and clear
 	 * the has_blocked flag. If a cpu enters idle in the mean time, it will
@@ -11844,13 +11857,9 @@ static void _nohz_idle_balance(struct rq *this_rq, unsigned int flags)
 	 * Because a cpu that becomes idle, is added to idle_cpus_mask before
 	 * setting the flag, we are sure to not clear the state and not
 	 * check the load of an idle cpu.
-	 *
-	 * Same applies to idle_cpus_mask vs needs_update.
 	 */
 	if (flags & NOHZ_STATS_KICK)
 		WRITE_ONCE(nohz.has_blocked, 0);
-	if (flags & NOHZ_NEXT_KICK)
-		WRITE_ONCE(nohz.needs_update, 0);
 
 	/*
 	 * Ensures that if we miss the CPU, we must see the has_blocked
@@ -11874,8 +11883,6 @@ static void _nohz_idle_balance(struct rq *this_rq, unsigned int flags)
 		if (need_resched()) {
 			if (flags & NOHZ_STATS_KICK)
 				has_blocked_load = true;
-			if (flags & NOHZ_NEXT_KICK)
-				WRITE_ONCE(nohz.needs_update, 1);
 			goto abort;
 		}
 
@@ -11906,12 +11913,19 @@ static void _nohz_idle_balance(struct rq *this_rq, unsigned int flags)
 	}
 
 	/*
-	 * next_balance will be updated only when there is a need.
+	 * nohz.next_balance will be updated only when there is a need.
 	 * When the CPU is attached to null domain for ex, it will not be
 	 * updated.
+	 *
+	 * Also, if it changed since we scanned the nohz CPUs above, do nothing as:
+	 * 1. A concurrent call to _nohz_idle_balance() moved nohz.next_balance forward.
+	 * 2. nohz_balance_enter_idle moved it backward.
 	 */
-	if (likely(update_next_balance))
-		nohz.next_balance = next_balance;
+	if (likely(update_next_balance)) {
+		/* Pairs with the smp_mb() above. */
+		cmpxchg_release(&nohz.next_balance, old_nohz_next_balance,
+				next_balance);
+	}
 
 	if (flags & NOHZ_STATS_KICK)
 		WRITE_ONCE(nohz.next_blocked,
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 04846272409c..cf3597d91977 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -2874,7 +2874,6 @@ extern void cfs_bandwidth_usage_dec(void);
 #define NOHZ_BALANCE_KICK_BIT	0
 #define NOHZ_STATS_KICK_BIT	1
 #define NOHZ_NEWILB_KICK_BIT	2
-#define NOHZ_NEXT_KICK_BIT	3
 
 /* Run rebalance_domains() */
 #define NOHZ_BALANCE_KICK	BIT(NOHZ_BALANCE_KICK_BIT)
@@ -2882,10 +2881,8 @@ extern void cfs_bandwidth_usage_dec(void);
 #define NOHZ_STATS_KICK		BIT(NOHZ_STATS_KICK_BIT)
 /* Update blocked load when entering idle */
 #define NOHZ_NEWILB_KICK	BIT(NOHZ_NEWILB_KICK_BIT)
-/* Update nohz.next_balance */
-#define NOHZ_NEXT_KICK		BIT(NOHZ_NEXT_KICK_BIT)
 
-#define NOHZ_KICK_MASK	(NOHZ_BALANCE_KICK | NOHZ_STATS_KICK | NOHZ_NEXT_KICK)
+#define NOHZ_KICK_MASK	(NOHZ_BALANCE_KICK | NOHZ_STATS_KICK)
 
 #define nohz_flags(cpu)	(&cpu_rq(cpu)->nohz_flags)
 

From patchwork Fri Oct 20 01:40:27 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Joel Fernandes
X-Patchwork-Id: 155752
From: "Joel Fernandes (Google)"
To: linux-kernel@vger.kernel.org, Ingo Molnar, Peter Zijlstra, Juri Lelli,
 Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
 Daniel Bristot de Oliveira, Valentin Schneider
Cc: "Joel Fernandes (Google)", Suleiman Souhlal, Frederic Weisbecker,
 "Paul E . McKenney"
Subject: [PATCH 2/3] sched/nohz: Update comments about NEWILB_KICK
Date: Fri, 20 Oct 2023 01:40:27 +0000
Message-ID: <20231020014031.919742-2-joel@joelfernandes.org>
X-Mailer: git-send-email 2.42.0.655.g421f12c284-goog
In-Reply-To: <20231020014031.919742-1-joel@joelfernandes.org>
References: <20231020014031.919742-1-joel@joelfernandes.org>
X-Mailing-List: linux-kernel@vger.kernel.org

How ILB is triggered without IPIs is cryptic. Out of mercy for future
code readers, document it in code comments.

The comments are derived from a discussion with Vincent in a past
review.

Cc: Suleiman Souhlal
Cc: Steven Rostedt
Cc: Frederic Weisbecker
Cc: Paul E. McKenney
Signed-off-by: Joel Fernandes (Google)
---
 kernel/sched/fair.c | 15 +++++++++++++--
 1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 965c30fbbe5c..8e276d12c3cb 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -11959,8 +11959,19 @@ static bool nohz_idle_balance(struct rq *this_rq, enum cpu_idle_type idle)
 }
 
 /*
- * Check if we need to run the ILB for updating blocked load before entering
- * idle state.
+ * Check if we need to directly run the ILB for updating blocked load before
+ * entering idle state. Here we run ILB directly without issuing IPIs.
+ *
+ * Note that when this function is called, the tick may not yet be stopped on
+ * this CPU. nohz.idle_cpus_mask is updated only when tick is stopped and
+ * cleared on the next busy tick. In other words, nohz.idle_cpus_mask updates
+ * don't align with CPUs enter/exit idle to avoid bottlenecks due to high idle
+ * entry/exit rate (usec). So it is possible that _nohz_idle_balance() is
+ * called from this function on (this) CPU that's not yet in the mask. That's
+ * OK because the goal of nohz_run_idle_balance() is to run ILB only for
+ * updating the blocked load of already idle CPUs without waking up one of
+ * those idle CPUs and outside the preempt disable / irq off phase of the local
+ * cpu about to enter idle, because it can take a long time.
  */
 void nohz_run_idle_balance(int cpu)
 {

From patchwork Fri Oct 20 01:40:28 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Joel Fernandes
X-Patchwork-Id: 155753
From: "Joel Fernandes (Google)"
To: linux-kernel@vger.kernel.org, Ingo Molnar, Peter Zijlstra, Juri Lelli,
 Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
 Daniel Bristot de Oliveira, Valentin Schneider
Cc: "Vineeth Pillai (Google)", Suleiman Souhlal, Frederic Weisbecker,
 "Paul E . McKenney", Joel Fernandes
Subject: [PATCH 3/3] sched: Update ->next_balance correctly during newidle balance
Date: Fri, 20 Oct 2023 01:40:28 +0000
Message-ID: <20231020014031.919742-3-joel@joelfernandes.org>
X-Mailer: git-send-email 2.42.0.655.g421f12c284-goog
In-Reply-To: <20231020014031.919742-1-joel@joelfernandes.org>
References: <20231020014031.919742-1-joel@joelfernandes.org>
X-Mailing-List: linux-kernel@vger.kernel.org

From: "Vineeth Pillai (Google)"

When newidle balancing triggers, we see that it constantly clobbers
rq->next_balance even when there is no newidle balance happening due to
the cost estimates. Due to this, we see that periodic load balance
(rebalance_domains) may trigger way more often when the CPU is going in
and out of idle at a high rate but is not really idle. Repeatedly
triggering load balance there is a bad idea as it is a heavy operation.
It also causes increases in softirq.

Another issue is ->last_balance is not updated after newidle balance
causing mistakes in the ->next_balance calculations.

Fix by updating last_balance when a newidle load balance actually happens
and then updating next_balance. This is also how it is done in other load
balance paths.

Testing shows a significant drop in softirqs when running:
cyclictest -i 100 -d 100 --latency=1000 -D 5 -t -m -q

Goes from ~6k to ~800.

Cc: Suleiman Souhlal
Cc: Steven Rostedt
Cc: Frederic Weisbecker
Cc: Paul E. McKenney
Signed-off-by: Vineeth Pillai (Google)
Co-developed-by: Joel Fernandes (Google)
Signed-off-by: Joel Fernandes (Google)
---
 kernel/sched/fair.c | 8 ++------
 1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 8e276d12c3cb..b147ad09126a 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -12076,11 +12076,7 @@ static int newidle_balance(struct rq *this_rq, struct rq_flags *rf)
 
 	if (!READ_ONCE(this_rq->rd->overload) ||
 	    (sd && this_rq->avg_idle < sd->max_newidle_lb_cost)) {
-
-		if (sd)
-			update_next_balance(sd, &next_balance);
 		rcu_read_unlock();
-
 		goto out;
 	}
 	rcu_read_unlock();
@@ -12095,8 +12091,6 @@ static int newidle_balance(struct rq *this_rq, struct rq_flags *rf)
 		int continue_balancing = 1;
 		u64 domain_cost;
 
-		update_next_balance(sd, &next_balance);
-
 		if (this_rq->avg_idle < curr_cost + sd->max_newidle_lb_cost)
 			break;
 
@@ -12109,6 +12103,8 @@ static int newidle_balance(struct rq *this_rq, struct rq_flags *rf)
 			t1 = sched_clock_cpu(this_cpu);
 			domain_cost = t1 - t0;
 			update_newidle_cost(sd, domain_cost);
+			sd->last_balance = jiffies;
+			update_next_balance(sd, &next_balance);
 
 			curr_cost += domain_cost;
 			t0 = t1;
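
The behaviour this last patch aims for can be modelled with a small
userspace sketch. It is a simplified illustration under stated assumptions,
not the kernel code: struct toy_domain, toy_update_next_balance() and
toy_newidle_balance() are hypothetical names, the balance attempt itself is
elided, and the per-domain cost is taken to be max_cost rather than measured.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, reduced stand-in for struct sched_domain. */
struct toy_domain {
	unsigned long last_balance;	/* time of the last balance (jiffies) */
	unsigned long interval;		/* balance interval (jiffies) */
	unsigned long long max_cost;	/* stand-in for max_newidle_lb_cost */
};

/* Userspace stand-in for the kernel's wraparound-safe time_after(a, b). */
static bool time_after_ul(unsigned long a, unsigned long b)
{
	return (long)(b - a) < 0;
}

/* Keep *next_balance at the earliest last_balance + interval seen so far. */
static void toy_update_next_balance(const struct toy_domain *sd,
				    unsigned long *next_balance)
{
	unsigned long next = sd->last_balance + sd->interval;

	if (time_after_ul(*next_balance, next))
		*next_balance = next;
}

/*
 * Walk the domains and stop once the expected idle time cannot cover the
 * cost. Only domains that were actually balanced refresh last_balance and
 * contribute to *next_balance. Returns the number of domains balanced.
 */
static int toy_newidle_balance(struct toy_domain *sds, int n,
			       unsigned long long avg_idle, unsigned long now,
			       unsigned long *next_balance)
{
	unsigned long long curr_cost = 0;
	int balanced = 0;

	for (int i = 0; i < n; i++) {
		if (avg_idle < curr_cost + sds[i].max_cost)
			break;	/* skipped: *next_balance is left untouched */

		/* A real balance attempt would run here. */
		curr_cost += sds[i].max_cost;
		sds[i].last_balance = now;
		toy_update_next_balance(&sds[i], next_balance);
		balanced++;
	}
	return balanced;
}
```

In this model a CPU whose idle periods are too short to balance any domain
leaves next_balance alone, which corresponds to the case the commit message
describes as clobbering rq->next_balance without a newidle balance happening.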