From patchwork Wed May 24 07:20:18 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Yicong Yang
X-Patchwork-Id: 98349
From: Yicong Yang
Subject: [PATCH] sched/fair: Don't balance task to its current running CPU
Date: Wed, 24 May 2023 15:20:18 +0800
Message-ID: <20230524072018.62204-1-yangyicong@huawei.com>
X-Mailer: git-send-email 2.31.0
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org

From: Yicong Yang

We've run into a case where the load balancer tries to balance a
migration-disabled task and triggers the warning in set_task_cpu(),
as shown below:

------------[ cut here ]------------
WARNING: CPU: 7 PID: 0 at kernel/sched/core.c:3115 set_task_cpu+0x188/0x240
Modules linked in: hclgevf xt_CHECKSUM ipt_REJECT nf_reject_ipv4 <...snip>
CPU: 7 PID: 0 Comm: swapper/7 Kdump: loaded Tainted: G O 6.1.0-rc4+ #1
Hardware name: Huawei TaiShan 2280 V2/BC82AMDC, BIOS 2280-V2 CS V5.B221.01 12/09/2021
pstate: 604000c9 (nZCv daIF +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : set_task_cpu+0x188/0x240
lr : load_balance+0x5d0/0xc60
sp : ffff80000803bc70
x29: ffff80000803bc70 x28: ffff004089e190e8 x27: ffff004089e19040
x26: ffff007effcabc38 x25: 0000000000000000 x24: 0000000000000001
x23: ffff80000803be84 x22: 000000000000000c x21: ffffb093e79e2a78
x20: 000000000000000c x19: ffff004089e19040 x18: 0000000000000000
x17: 0000000000001fad x16: 0000000000000030 x15: 0000000000000000
x14: 0000000000000003 x13: 0000000000000000 x12: 0000000000000000
x11: 0000000000000001 x10: 0000000000000400 x9 : ffffb093e4cee530
x8 : 00000000fffffffe x7 : 0000000000ce168a x6 : 000000000000013e
x5 : 00000000ffffffe1 x4 : 0000000000000001 x3 : 0000000000000b2a
x2 : 0000000000000b2a x1 : ffffb093e6d6c510 x0 : 0000000000000001
Call trace:
 set_task_cpu+0x188/0x240
 load_balance+0x5d0/0xc60
 rebalance_domains+0x26c/0x380
 _nohz_idle_balance.isra.0+0x1e0/0x370
 run_rebalance_domains+0x6c/0x80
 __do_softirq+0x128/0x3d8
 ____do_softirq+0x18/0x24
 call_on_irq_stack+0x2c/0x38
 do_softirq_own_stack+0x24/0x3c
 __irq_exit_rcu+0xcc/0xf4
 irq_exit_rcu+0x18/0x24
 el1_interrupt+0x4c/0xe4
 el1h_64_irq_handler+0x18/0x2c
 el1h_64_irq+0x74/0x78
 arch_cpu_idle+0x18/0x4c
 default_idle_call+0x58/0x194
 do_idle+0x244/0x2b0
 cpu_startup_entry+0x30/0x3c
 secondary_start_kernel+0x14c/0x190
 __secondary_switched+0xb0/0xb4
---[ end trace 0000000000000000 ]---

Further investigation shows that the warning is superfluous: the
migration-disabled task is only going to be migrated to the CPU it is
currently on. This happens because, during load balance, if the task
is not allowed to run on dst_cpu, we re-select a new_dst_cpu as a
candidate. If no task can be balanced to dst_cpu, we try to balance
the task to new_dst_cpu instead. When the migration-disabled task is
not on CPU, it is only allowed to run on its current CPU, so load
balance selects that CPU as new_dst_cpu and later triggers the warning
above.

Fix this by not selecting the task's current CPU as new_dst_cpu
during load balance.

Signed-off-by: Yicong Yang
Reviewed-by: Valentin Schneider
---
Thanks to Valentin for the explanation of migration disable.
Previous discussion can be found at
https://lore.kernel.org/all/20230313065759.39698-1-yangyicong@huawei.com/

 kernel/sched/fair.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 7a1b1f855b96..3c4f3a244c1d 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -8456,7 +8456,8 @@ int can_migrate_task(struct task_struct *p, struct lb_env *env)
 
 		/* Prevent to re-select dst_cpu via env's CPUs: */
 		for_each_cpu_and(cpu, env->dst_grpmask, env->cpus) {
-			if (cpumask_test_cpu(cpu, p->cpus_ptr)) {
+			if (cpumask_test_cpu(cpu, p->cpus_ptr) &&
+			    cpu != env->src_cpu) {
 				env->flags |= LBF_DST_PINNED;
 				env->new_dst_cpu = cpu;
 				break;
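
For readers who want to experiment with the selection logic outside the
kernel, here is a minimal user-space sketch of the candidate loop touched
by the hunk above. It is not kernel code: cpuset_t, cpu_in() and
pick_new_dst_cpu() are hypothetical names, a CPU set is modelled as a
plain 64-bit mask instead of struct cpumask, and the skip_src flag exists
only to compare the behaviour before and after the change. It also
assumes, as the scenario in the commit message implies, that src_cpu can
appear among the candidate CPUs.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model: bit n of the mask stands for CPU n (0..63). */
typedef uint64_t cpuset_t;

static bool cpu_in(cpuset_t set, int cpu)
{
	return (set >> cpu) & 1u;
}

/*
 * Simplified model of the fallback loop in can_migrate_task(): walk the
 * CPUs present in both dst_grpmask and env_cpus and return the first one
 * the task is allowed to run on. With skip_src set (modelling the patched
 * check), src_cpu is never returned. Returns -1 if there is no candidate.
 */
static int pick_new_dst_cpu(cpuset_t dst_grpmask, cpuset_t env_cpus,
			    cpuset_t task_allowed, int src_cpu, bool skip_src)
{
	for (int cpu = 0; cpu < 64; cpu++) {
		if (!cpu_in(dst_grpmask & env_cpus, cpu))
			continue;
		if (!cpu_in(task_allowed, cpu))
			continue;
		if (skip_src && cpu == src_cpu)
			continue;
		return cpu;
	}
	return -1;
}
```

In this model, a task restricted to CPU 3 that currently sits on CPU 3
gets CPU 3 back as its "new" destination when skip_src is false, which
corresponds to the situation that triggered the warning. With skip_src
set, no candidate is found, so no fallback destination is recorded.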