Message ID | 20230701065049.1758266-1-linmiaohe@huawei.com |
---|---|
State | New |
Series | cgroup/cpuset: update parent subparts cpumask while holding css refcnt |
Commit Message
Miaohe Lin
July 1, 2023, 6:50 a.m. UTC
update_parent_subparts_cpumask() is called outside the RCU read-side critical
section without holding an extra css refcnt on cp. In theory, cp could be
freed at any time. Hold an extra css refcnt to ensure cp stays valid while
updating the parent subparts cpumask.
Fixes: d7c8142d5a55 ("cgroup/cpuset: Make partition invalid if cpumask change violates exclusivity rule")
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
---
kernel/cgroup/cpuset.c | 3 +++
1 file changed, 3 insertions(+)
Comments
On 7/1/23 02:50, Miaohe Lin wrote:
> update_parent_subparts_cpumask() is called outside RCU read-side critical
> section without holding extra css refcnt of cp. In theroy, cp could be
> freed at any time. Holding extra css refcnt to ensure cp is valid while
> updating parent subparts cpumask.
>
> Fixes: d7c8142d5a55 ("cgroup/cpuset: Make partition invalid if cpumask change violates exclusivity rule")
> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
> ---
>  kernel/cgroup/cpuset.c | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
> index 58e6f18f01c1..632a9986d5de 100644
> --- a/kernel/cgroup/cpuset.c
> +++ b/kernel/cgroup/cpuset.c
> @@ -1806,9 +1806,12 @@ static int update_cpumask(struct cpuset *cs, struct cpuset *trialcs,
>  	cpuset_for_each_child(cp, css, parent)
>  		if (is_partition_valid(cp) &&
>  		    cpumask_intersects(trialcs->cpus_allowed, cp->cpus_allowed)) {
> +			if (!css_tryget_online(&cp->css))
> +				continue;
>  			rcu_read_unlock();
>  			update_parent_subparts_cpumask(cp, partcmd_invalidate, NULL, &tmp);
>  			rcu_read_lock();
> +			css_put(&cp->css);
>  		}
>  	rcu_read_unlock();
>  	retval = 0;

Thanks for finding that. It looks good to me.

Reviewed-by: Waiman Long <longman@redhat.com>
On 7/1/23 19:38, Waiman Long wrote:
> On 7/1/23 02:50, Miaohe Lin wrote:
>> update_parent_subparts_cpumask() is called outside RCU read-side
>> critical section without holding extra css refcnt of cp. In theroy,
>> cp could be freed at any time. Holding extra css refcnt to ensure cp
>> is valid while updating parent subparts cpumask.
>> [...]
>
> Thanks for finding that. It looks good to me.
>
> Reviewed-by: Waiman Long <longman@redhat.com>

Though, I will say that an offline cpuset cannot be a valid partition root, so it is not really a problem. For correctness' sake and consistency with other similar code, I am in favor of getting it merged.

Cheers,
Longman
On 2023/7/2 7:46, Waiman Long wrote:
> On 7/1/23 19:38, Waiman Long wrote:
>> On 7/1/23 02:50, Miaohe Lin wrote:
>>> update_parent_subparts_cpumask() is called outside RCU read-side critical
>>> section without holding extra css refcnt of cp. In theroy, cp could be
>>> freed at any time. Holding extra css refcnt to ensure cp is valid while
>>> updating parent subparts cpumask.
>>> [...]
>>
>> Thanks for finding that. It looks good to me.
>>
>> Reviewed-by: Waiman Long <longman@redhat.com>
>
> Though, I will say that an offline cpuset cannot be a valid partition root. So it is not really a problem. For correctness sake and consistency with other similar code, I am in favor of getting it merged.

Yes, cpuset_mutex will prevent the cpuset from going offline while the cpumask is being updated. And as you mentioned, this patch at least makes the code more consistent. Thanks for your review and comment.
On Mon, Jul 03, 2023 at 10:58:19AM +0800, Miaohe Lin wrote:
> On 2023/7/2 7:46, Waiman Long wrote:
>> [...]
>> Though, I will say that an offline cpuset cannot be a valid partition root. So it is not really a problem. For correctness sake and consistency with other similar code, I am in favor of getting it merged.
>
> Yes, cpuset_mutex will prevent cpuset from being offline while update cpumask. And as you mentioned, this patch makes code more consistency at least.

Can you update the patch description to note that this isn't required for correctness? Thanks.
Hello.

On Sat, Jul 01, 2023 at 02:50:49PM +0800, Miaohe Lin <linmiaohe@huawei.com> wrote:
> --- a/kernel/cgroup/cpuset.c
> +++ b/kernel/cgroup/cpuset.c
> @@ -1806,9 +1806,12 @@ static int update_cpumask(struct cpuset *cs, struct cpuset *trialcs,
>  	cpuset_for_each_child(cp, css, parent)
>  		if (is_partition_valid(cp) &&
>  		    cpumask_intersects(trialcs->cpus_allowed, cp->cpus_allowed)) {
> +			if (!css_tryget_online(&cp->css))
> +				continue;
>  			rcu_read_unlock();
>  			update_parent_subparts_cpumask(cp, partcmd_invalidate, NULL, &tmp);
>  			rcu_read_lock();
> +			css_put(&cp->css);

Apologies for a possibly noob question -- why is the RCU read lock temporarily dropped within the loop?
(Is it only because of callback_lock or cgroup_file_kn_lock (via notify_partition_change()) on PREEMPT_RT?)

[
OT question:

	cpuset_for_each_child(cp, css, parent)                    (1)
		if (is_partition_valid(cp) &&
		    cpumask_intersects(trialcs->cpus_allowed, cp->cpus_allowed)) {
			if (!css_tryget_online(&cp->css))
				continue;
			rcu_read_unlock();
			update_parent_subparts_cpumask(cp, partcmd_invalidate, NULL, &tmp);
			  ...
			  update_tasks_cpumask(cp->parent)
			    ...
			    css_task_iter_start(&cp->parent->css, 0, &it);  (2)
			  ...
			rcu_read_lock();
			css_put(&cp->css);
		}

May this touch each task the same number of times as its depth within the hierarchy?
]

Thanks,
Michal
On 7/10/23 11:11, Michal Koutný wrote:
> Hello.
>
> On Sat, Jul 01, 2023 at 02:50:49PM +0800, Miaohe Lin <linmiaohe@huawei.com> wrote:
>> [...]
>
> Apologies for a possibly noob question -- why is RCU read lock
> temporarily dropped within the loop?
> (Is it only because of callback_lock or cgroup_file_kn_lock (via
> notify_partition_change()) on PREEMPT_RT?)
> [...]
> May this touch each task same number of times as its depth within
> herarchy?

I believe the primary reason is that update_parent_subparts_cpumask() can potentially run for quite a while, so we don't want to hold the rcu_read_lock for too long. There is also the potential that schedule() may be called.

Cheers,
Longman
On Mon, Jul 10, 2023 at 11:40:36AM -0400, Waiman Long <longman@redhat.com> wrote:
> I believe the primary reason is because update_parent_subparts_cpumask() can
> potential run for quite a while. So we don't want to hold the rcu_read_lock
> for too long.

But holding cpuset_mutex is even worse than rcu_read_lock()? IOW, is the relief for this reason worth it?

> There may also be a potential that schedule() may be called.

Do you mean the spinlocks with PREEMPT_RT, or anything else? (That seems like the actual reason, IIUC.)

Thanks,
Michal
On 2023/7/10 23:40, Waiman Long wrote:
> On 7/10/23 11:11, Michal Koutný wrote:
>> Apologies for a possibly noob question -- why is RCU read lock
>> temporarily dropped within the loop?
>> (Is it only because of callback_lock or cgroup_file_kn_lock (via
>> notify_partition_change()) on PREEMPT_RT?)
>> [...]
>
> I believe the primary reason is because update_parent_subparts_cpumask() can potential run for quite a while. So we don't want to hold the rcu_read_lock for too long. There may also be a potential that schedule() may be called.

IMHO, the reason should be the same as in the commit below:

commit 2bdfd2825c9662463371e6691b1a794e97fa36b4
Author: Waiman Long <longman@redhat.com>
Date:   Wed Feb 2 22:31:03 2022 -0500

    cgroup/cpuset: Fix "suspicious RCU usage" lockdep warning

    It was found that a "suspicious RCU usage" lockdep warning was issued
    with the rcu_read_lock() call in update_sibling_cpumasks().  It is
    because the update_cpumasks_hier() function may sleep. So we have
    to release the RCU lock, call update_cpumasks_hier() and reacquire
    it afterward.

    Also add a percpu_rwsem_assert_held() in update_sibling_cpumasks()
    instead of stating that in the comment.

Thanks both.
On Tue, Jul 11, 2023 at 10:52:02AM +0800, Miaohe Lin <linmiaohe@huawei.com> wrote:
> commit 2bdfd2825c9662463371e6691b1a794e97fa36b4
> Author: Waiman Long <longman@redhat.com>
> Date:   Wed Feb 2 22:31:03 2022 -0500
>
>     cgroup/cpuset: Fix "suspicious RCU usage" lockdep warning

Aha, thanks for the pointer.

I've also found a paragraph in [1]:
> In addition, the -rt patchset turns spinlocks into a sleeping locks so
> that the corresponding critical sections can be preempted, which also
> means that these sleeplockified spinlocks (but not other sleeping
> locks!) may be acquire within -rt-Linux-kernel RCU read-side critical
> sections.

That suggests (together with practical use) that the discussed spinlocks should be fine in an RCU read section. And the possible reason is deeper, in generate_sched_domains(), which does kmalloc(..., GFP_KERNEL).

Alas, update_cpumask_hier() still calls generate_sched_domains(); OTOH, update_parent_subparts_cpumask() doesn't seem to.

The idea of not releasing rcu_read_lock() in the update_cpumask() iteration (instead of the technically unneeded refcnt bump) would have to be verified with CONFIG_PROVE_RCU && CONFIG_LOCKDEP. WDYT?

Michal

[1] https://www.kernel.org/doc/html/latest/RCU/Design/Requirements/Requirements.html?highlight=rcu+read+section#specialization
On 2023/7/11 19:52, Michal Koutný wrote:
> On Tue, Jul 11, 2023 at 10:52:02AM +0800, Miaohe Lin <linmiaohe@huawei.com> wrote:
>> commit 2bdfd2825c9662463371e6691b1a794e97fa36b4
>> Author: Waiman Long <longman@redhat.com>
>> Date:   Wed Feb 2 22:31:03 2022 -0500
>>
>>     cgroup/cpuset: Fix "suspicious RCU usage" lockdep warning
>
> Aha, thanks for the pointer.
> [...]
> That suggests (together with practical use) that dicussed spinlocks
> should be fine in RCU read section. And the possible reason is deeper in
> generate_sched_domains() that do kmalloc(..., GFP_KERNEL).

update_parent_subparts_cpumask() would call update_flag(), which does kmemdup(..., GFP_KERNEL)?

> Alas update_cpumask_hier() still calls generate_sched_domains(), OTOH,
> update_parent_subparts_cpumask() doesn't seem so.

It seems update_parent_subparts_cpumask() doesn't call generate_sched_domains().

> The idea to not relieve rcu_read_lock() in update_cpumask() iteration
> (instead of the technically unneeded refcnt bump) would have to be
> verified with CONFIG_PROVE_RCU && CONFIG_LOCKDEP. WDYT?

The idea of releasing rcu_read_lock() in the update_cpumask() iteration was initially introduced via the commit below:

commit d7c8142d5a5534c3c7de214e35a40a493a32b98e
Author: Waiman Long <longman@redhat.com>
Date:   Thu Sep 1 16:57:43 2022 -0400

    cgroup/cpuset: Make partition invalid if cpumask change violates exclusivity rule

    Currently, changes in "cpuset.cpus" of a partition root is not allowed
    if it violates the sibling cpu exclusivity rule when the check is done
    in the validate_change() function. That is inconsistent with the other
    cpuset changes that are always allowed but may make a partition
    invalid.

    Update the cpuset code to allow cpumask change even if it violates the
    sibling cpu exclusivity rule, but invalidate the partition instead
    just like the other changes. However, other sibling partitions with
    conflicting cpumask will also be invalidated in order to not violating
    the exclusivity rule. This behavior is specific to this partition rule
    violation.

    Note that a previous commit has made sibling cpu exclusivity rule check
    the last check of validate_change(). So if -EINVAL is returned, we can
    be sure that sibling cpu exclusivity rule violation is the only rule
    that is broken.

It would be really helpful if @Waiman can figure this out. Thanks both.
diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index 58e6f18f01c1..632a9986d5de 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -1806,9 +1806,12 @@ static int update_cpumask(struct cpuset *cs, struct cpuset *trialcs,
 	cpuset_for_each_child(cp, css, parent)
 		if (is_partition_valid(cp) &&
 		    cpumask_intersects(trialcs->cpus_allowed, cp->cpus_allowed)) {
+			if (!css_tryget_online(&cp->css))
+				continue;
 			rcu_read_unlock();
 			update_parent_subparts_cpumask(cp, partcmd_invalidate, NULL, &tmp);
 			rcu_read_lock();
+			css_put(&cp->css);
 		}
 	rcu_read_unlock();
 	retval = 0;