From: Will Deacon
To: linux-kernel@vger.kernel.org
Cc: kernel-team@android.com, Will Deacon, Peter Zijlstra, Waiman Long,
    Zefan Li, Tejun Heo, Johannes Weiner, cgroups@vger.kernel.org
Subject: [PATCH 1/2] cpuset: Fix cpuset_cpus_allowed() to not filter offline CPUs
Date: Tue, 31 Jan 2023 22:17:18 +0000
Message-Id: <20230131221719.3176-2-will@kernel.org>
In-Reply-To: <20230131221719.3176-1-will@kernel.org>
References: <20230131221719.3176-1-will@kernel.org>

From: Peter Zijlstra

There is a difference in behaviour between CPUSET={y,n} that is now
wreaking havoc with {relax,force}_compatible_cpus_allowed_ptr().

Specifically, since commit 8f9ea86fdf99 ("sched: Always preserve the
user requested cpumask") relax_compatible_cpus_allowed_ptr() is
calling __sched_setaffinity() unconditionally.

But the underlying problem goes back a lot further, possibly to
commit: ae1c802382f7 ("cpuset: apply cs->effective_{cpus,mems}") which
switched cpuset_cpus_allowed() from cs->cpus_allowed to
cs->effective_cpus.

The problem is that for CPUSET=y cpuset_cpus_allowed() will filter out
all offline CPUs.
For tasks that are part of a (!root) cpuset this is
then later fixed up by the cpuset hotplug notifiers that re-evaluate
and re-apply cs->effective_cpus, but for (normal) tasks in the root
cpuset this does not happen and they will forever after be excluded
from CPUs onlined later.

As such, rewrite cpuset_cpus_allowed() to return a wider mask,
including the offline CPUs.

Fixes: 8f9ea86fdf99 ("sched: Always preserve the user requested cpumask")
Reported-by: Will Deacon
Signed-off-by: Peter Zijlstra (Intel)
Link: https://lkml.kernel.org/r/20230117160825.GA17756@willie-the-truck
Signed-off-by: Will Deacon
---
 kernel/cgroup/cpuset.c | 39 ++++++++++++++++++++++++++++++++++-----
 1 file changed, 34 insertions(+), 5 deletions(-)

diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index a29c0b13706b..8552cc2c586a 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -3683,23 +3683,52 @@ void __init cpuset_init_smp(void)
 	BUG_ON(!cpuset_migrate_mm_wq);
 }
 
+static const struct cpumask *__cs_cpus_allowed(struct cpuset *cs)
+{
+	const struct cpumask *cs_mask = cs->cpus_allowed;
+	if (!parent_cs(cs))
+		cs_mask = cpu_possible_mask;
+	return cs_mask;
+}
+
+static void cs_cpus_allowed(struct cpuset *cs, struct cpumask *pmask)
+{
+	do {
+		cpumask_and(pmask, pmask, __cs_cpus_allowed(cs));
+		cs = parent_cs(cs);
+	} while (cs);
+}
+
 /**
  * cpuset_cpus_allowed - return cpus_allowed mask from a tasks cpuset.
  * @tsk: pointer to task_struct from which to obtain cpuset->cpus_allowed.
  * @pmask: pointer to struct cpumask variable to receive cpus_allowed set.
  *
- * Description: Returns the cpumask_var_t cpus_allowed of the cpuset
- * attached to the specified @tsk.  Guaranteed to return some non-empty
- * subset of cpu_online_mask, even if this means going outside the
- * tasks cpuset.
+ * Description: Returns the cpumask_var_t cpus_allowed of the cpuset attached
+ * to the specified @tsk.  Guaranteed to return some non-empty intersection
+ * with cpu_online_mask, even if this means going outside the tasks cpuset.
  **/
 
 void cpuset_cpus_allowed(struct task_struct *tsk, struct cpumask *pmask)
 {
 	unsigned long flags;
+	struct cpuset *cs;
 
 	spin_lock_irqsave(&callback_lock, flags);
-	guarantee_online_cpus(tsk, pmask);
+	rcu_read_lock();
+
+	cs = task_cs(tsk);
+	do {
+		cpumask_copy(pmask, task_cpu_possible_mask(tsk));
+		cs_cpus_allowed(cs, pmask);
+
+		if (cpumask_intersects(pmask, cpu_online_mask))
+			break;
+
+		cs = parent_cs(cs);
+	} while (cs);
+
+	rcu_read_unlock();
 	spin_unlock_irqrestore(&callback_lock, flags);
 }
 

From: Will Deacon
To: linux-kernel@vger.kernel.org
Cc: kernel-team@android.com, Will Deacon, Peter Zijlstra, Waiman Long,
    Zefan Li, Tejun Heo, Johannes Weiner, cgroups@vger.kernel.org
Subject: [PATCH 2/2] cpuset: Call set_cpus_allowed_ptr() with appropriate mask for task
Date: Tue, 31 Jan 2023 22:17:19 +0000
Message-Id: <20230131221719.3176-3-will@kernel.org>
In-Reply-To: <20230131221719.3176-1-will@kernel.org>
References: <20230131221719.3176-1-will@kernel.org>

set_cpus_allowed_ptr() will fail with -EINVAL if the requested
affinity mask is not a subset of the task_cpu_possible_mask() for the
task being updated. Consequently, on a heterogeneous system with cpusets
spanning the different CPU types, updates to the cgroup hierarchy can
silently fail to update task affinities when the effective affinity
mask for the cpuset is expanded.

For example, consider an arm64 system with 4 CPUs, where CPUs 2-3 are
the only cores capable of executing 32-bit tasks. Attaching a 32-bit
task to a cpuset containing CPUs 0-2 will correctly affine the task to
CPU 2. Extending the cpuset to CPUs 0-3, however, will fail to extend
the affinity mask of the 32-bit task because update_tasks_cpumask() will
pass the full 0-3 mask to set_cpus_allowed_ptr().

Extend update_tasks_cpumask() to take a temporary 'cpumask' parameter
and use it to mask the 'effective_cpus' mask with the possible mask for
each task being updated.

Fixes: 431c69fac05b ("cpuset: Honour task_cpu_possible_mask() in guarantee_online_cpus()")
Signed-off-by: Will Deacon
Acked-by: Waiman Long
---

Note: We wondered whether it was worth calling guarantee_online_cpus()
if the cpumask_and() returns 0 in update_tasks_cpumask(), but given
that this path is only called when the effective mask changes, it didn't
seem appropriate. Ultimately, if you have 32-bit tasks attached to a
cpuset containing only 64-bit cpus, then the affinity is going to be
forced.

 kernel/cgroup/cpuset.c | 18 +++++++++++-------
 1 file changed, 11 insertions(+), 7 deletions(-)

diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index 8552cc2c586a..f15fb0426707 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -1205,12 +1205,13 @@ void rebuild_sched_domains(void)
 /**
  * update_tasks_cpumask - Update the cpumasks of tasks in the cpuset.
  * @cs: the cpuset in which each task's cpus_allowed mask needs to be changed
+ * @new_cpus: the temp variable for the new effective_cpus mask
  *
  * Iterate through each task of @cs updating its cpus_allowed to the
  * effective cpuset's.  As this function is called with cpuset_rwsem held,
  * cpuset membership stays stable.
  */
-static void update_tasks_cpumask(struct cpuset *cs)
+static void update_tasks_cpumask(struct cpuset *cs, struct cpumask *new_cpus)
 {
 	struct css_task_iter it;
 	struct task_struct *task;
@@ -1224,7 +1225,10 @@ static void update_tasks_cpumask(struct cpuset *cs)
 		if (top_cs && (task->flags & PF_KTHREAD) &&
 		    kthread_is_per_cpu(task))
 			continue;
-		set_cpus_allowed_ptr(task, cs->effective_cpus);
+
+		cpumask_and(new_cpus, cs->effective_cpus,
+			    task_cpu_possible_mask(task));
+		set_cpus_allowed_ptr(task, new_cpus);
 	}
 	css_task_iter_end(&it);
 }
@@ -1509,7 +1513,7 @@ static int update_parent_subparts_cpumask(struct cpuset *cs, int cmd,
 	spin_unlock_irq(&callback_lock);
 
 	if (adding || deleting)
-		update_tasks_cpumask(parent);
+		update_tasks_cpumask(parent, tmp->new_cpus);
 
 	/*
 	 * Set or clear CS_SCHED_LOAD_BALANCE when partcmd_update, if necessary.
@@ -1661,7 +1665,7 @@ static void update_cpumasks_hier(struct cpuset *cs, struct tmpmasks *tmp,
 		WARN_ON(!is_in_v2_mode() &&
 			!cpumask_equal(cp->cpus_allowed, cp->effective_cpus));
 
-		update_tasks_cpumask(cp);
+		update_tasks_cpumask(cp, tmp->new_cpus);
 
 		/*
 		 * On legacy hierarchy, if the effective cpumask of any non-
@@ -2309,7 +2313,7 @@ static int update_prstate(struct cpuset *cs, int new_prs)
 		}
 	}
 
-	update_tasks_cpumask(parent);
+	update_tasks_cpumask(parent, tmpmask.new_cpus);
 
 	if (parent->child_ecpus_count)
 		update_sibling_cpumasks(parent, cs, &tmpmask);
@@ -3347,7 +3351,7 @@ hotplug_update_tasks_legacy(struct cpuset *cs,
 	 * as the tasks will be migrated to an ancestor.
 	 */
 	if (cpus_updated && !cpumask_empty(cs->cpus_allowed))
-		update_tasks_cpumask(cs);
+		update_tasks_cpumask(cs, new_cpus);
 	if (mems_updated && !nodes_empty(cs->mems_allowed))
 		update_tasks_nodemask(cs);
 
@@ -3384,7 +3388,7 @@ hotplug_update_tasks(struct cpuset *cs,
 	spin_unlock_irq(&callback_lock);
 
 	if (cpus_updated)
-		update_tasks_cpumask(cs);
+		update_tasks_cpumask(cs, new_cpus);
 	if (mems_updated)
 		update_tasks_nodemask(cs);
 }