From patchwork Tue Feb 7 04:58:34 2023
X-Patchwork-Submitter: Ricardo Neri
X-Patchwork-Id: 53663
From: Ricardo Neri
To: "Peter Zijlstra (Intel)", Juri Lelli, Vincent Guittot
Cc: Ricardo Neri, "Ravi V. Shankar", Ben Segall,
    Daniel Bristot de Oliveira, Dietmar Eggemann, Len Brown, Mel Gorman,
    "Rafael J. Wysocki", Srinivas Pandruvada, Steven Rostedt, Tim Chen,
    Valentin Schneider, Ionela Voinescu, x86@kernel.org,
    linux-kernel@vger.kernel.org, Ricardo Neri, "Tim C . Chen"
Subject: [PATCH v3 06/10] sched/fair: Use the prefer_sibling flag of the current sched domain
Date: Mon, 6 Feb 2023 20:58:34 -0800
Message-Id: <20230207045838.11243-7-ricardo.neri-calderon@linux.intel.com>
X-Mailer: git-send-email 2.17.1
In-Reply-To: <20230207045838.11243-1-ricardo.neri-calderon@linux.intel.com>
References: <20230207045838.11243-1-ricardo.neri-calderon@linux.intel.com>

SD_PREFER_SIBLING is set from the SMT scheduling domain up to the first
non-NUMA domain (the exception is systems with SD_ASYM_CPUCAPACITY). Above
the SMT sched domain, all domains have a child. The SD_PREFER_SIBLING flag
is always honored, regardless of the scheduling domain at which the load
balance takes place.
There are cases, however, in which the busiest CPU's sched domain has a
child but the destination CPU's does not. Consider, for instance, a non-SMT
core (or an SMT core with only one online sibling) doing load balance with
an SMT core at the MC level. SD_PREFER_SIBLING will not be honored. We are
left with a fully busy SMT core and an idle non-SMT core.

Avoid this inconsistent behavior. Use the prefer_sibling behavior of the
current scheduling domain, not its child.

The NUMA sched domain does not have the SD_PREFER_SIBLING flag. Thus, we
will not spread load among NUMA sched groups, as desired.

Cc: Ben Segall
Cc: Daniel Bristot de Oliveira
Cc: Dietmar Eggemann
Cc: Len Brown
Cc: Mel Gorman
Cc: Rafael J. Wysocki
Cc: Srinivas Pandruvada
Cc: Steven Rostedt
Cc: Tim C. Chen
Cc: Valentin Schneider
Cc: x86@kernel.org
Cc: linux-kernel@vger.kernel.org
Suggested-by: Valentin Schneider
Signed-off-by: Ricardo Neri
---
Changes since v2:
 * Introduced this patch.

Changes since v1:
 * N/A
---
 kernel/sched/fair.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index df7bcbf634a8..a37ad59f20ea 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -10004,7 +10004,6 @@ static void update_idle_cpu_scan(struct lb_env *env,
 static inline void update_sd_lb_stats(struct lb_env *env, struct sd_lb_stats *sds)
 {
-	struct sched_domain *child = env->sd->child;
 	struct sched_group *sg = env->sd->groups;
 	struct sg_lb_stats *local = &sds->local_stat;
 	struct sg_lb_stats tmp_sgs;
@@ -10045,9 +10044,11 @@ static inline void update_sd_lb_stats(struct lb_env *env, struct sd_lb_stats *sd
 		sg = sg->next;
 	} while (sg != env->sd->groups);

-	/* Tag domain that child domain prefers tasks go to siblings first */
-	sds->prefer_sibling = child && child->flags & SD_PREFER_SIBLING;
-
+	/*
+	 * Tag domain that @env::sd prefers to spread excess tasks among
+	 * sibling sched groups.
+	 */
+	sds->prefer_sibling = env->sd->flags & SD_PREFER_SIBLING;
 	if (env->sd->flags & SD_NUMA)
 		env->fbq_type = fbq_classify_group(&sds->busiest_stat);

@@ -10346,7 +10347,6 @@ static struct sched_group *find_busiest_group(struct lb_env *env)
 		goto out_balanced;
 	}

-	/* Try to move all excess tasks to child's sibling domain */
 	if (sds.prefer_sibling && local->group_type == group_has_spare &&
 	    busiest->sum_nr_running > local->sum_nr_running + 1)
 		goto force_balance;