Message ID | 20230125101334.1069060-4-peternewman@google.com |
---|---|
State | New |
Headers |
Return-Path: <linux-kernel-owner@vger.kernel.org> Delivered-To: ouuuleilei@gmail.com Received: by 2002:adf:eb09:0:0:0:0:0 with SMTP id s9csp198964wrn; Wed, 25 Jan 2023 02:23:23 -0800 (PST) X-Google-Smtp-Source: AMrXdXuhXeW7vBlhR9iEjf9hxVymiM6cBGiV1lif5BWF8U9El+RUGHjxNOa2C0E3HfHvKEd2k1zu X-Received: by 2002:a05:6402:10d4:b0:48f:fcc9:665e with SMTP id p20-20020a05640210d400b0048ffcc9665emr41666596edu.0.1674642203574; Wed, 25 Jan 2023 02:23:23 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1674642203; cv=none; d=google.com; s=arc-20160816; b=Mcdm0GTi/9gHqAt8Ove7RaiwH2rSHSqPMSaglrdlQTm1q48Qh0eiBuFgi/aUMT2hrU dakAaE0MsvfpKjuT6yC39+g5ANSZSTgq71/LJgxmOBBTtwLR0ZcxbIGql7cKjGRA1IlQ uDeHHhXTcECN+9+xl+CwGhVRiPX5TtRRVWgu7Pb0qXYoD93HLM86KF/5JjiVNpPmKEij 9yI0bAHJWfNhpuDn9qJOZnPlupZzFjVLsbCopRJhroUfMk8u/MrHAVUrN4N6BLbtdmWT F28zHjvNr+AO8VeQLrf290b2k6Lan/ldMZPu/Q6hI8RCG84vZ7MlAgZOjTS8qQ/qeQIB jf+Q== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:cc:to:from:subject:message-id:references :mime-version:in-reply-to:date:dkim-signature; bh=tFANQsplcfYmN7Cu3ZOpiMje75MfpJ9o5dr0Y/I/sbo=; b=GK1FN82GT2wYYWp/7Pm9fGh0GCgoscLwYM4bNFb/ozJ01gjoMQsN8MED1IwbeAoYJd W16I7EDQXVeLSzWeIVW8GxctFnvBuUvjIuA8mLvKXMyvUuUfsabEpOlF6M2Np9xelsZe SfqyCqnASmkjD49RYaSvO7UVwk2T/PHFnGhmM+UYKKlx8onc7lv8L/0K0IuJ7GfEe2on i+lDAZnN73BFyl8qT0D0KFlgbQ65kIGEASYy0dvo2hbi3GqWCvWpsMp3xeDoYcTtnf0y Woxx07Z05ciqp51F7Yut4NOmcmo8wxyiMzTJHgX+WCgH818PP8RI5rKevxxof/3oN5xD Zxdw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@google.com header.s=20210112 header.b=O8VIBCPk; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id ec27-20020a0564020d5b00b0045731196587si3459260edb.64.2023.01.25.02.22.59; Wed, 25 Jan 2023 02:23:23 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@google.com header.s=20210112 header.b=O8VIBCPk; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234431AbjAYKOG (ORCPT <rfc822;rust.linux@gmail.com> + 99 others); Wed, 25 Jan 2023 05:14:06 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:53368 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S235487AbjAYKOC (ORCPT <rfc822;linux-kernel@vger.kernel.org>); Wed, 25 Jan 2023 05:14:02 -0500 Received: from mail-yw1-x114a.google.com (mail-yw1-x114a.google.com [IPv6:2607:f8b0:4864:20::114a]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 0306A13502 for <linux-kernel@vger.kernel.org>; Wed, 25 Jan 2023 02:13:54 -0800 (PST) Received: by mail-yw1-x114a.google.com with SMTP id 00721157ae682-4cddba76f55so185675157b3.23 for <linux-kernel@vger.kernel.org>; Wed, 25 Jan 2023 02:13:53 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=tFANQsplcfYmN7Cu3ZOpiMje75MfpJ9o5dr0Y/I/sbo=; b=O8VIBCPkSHVXBUTdp+6dObEdarVkoyg8VZeXFQryzghwGEbgSgaG51+9JdcnG5iYBg MUfKU4+EXidCxnEa0lc1wn6DDGQzCx9pwBFR6nufNT3tbXnavgu8XFN3j5y+R0clCaQJ RkmcsXkJVg7j5ytpKWgXYWxvyjwHcRIdfMFDUHZC8xMV7Z5baskFowkjOHxfzh4PzrhQ OTNhAtUBiE78GJUyrieidEbeN/NLfKPNOsAhO9It64x87uTm+NgA9D3gAA2NGWyOObGq pR9/yyvDIcWntbPuWjso3LfIXtTTtNxCJZTCOcbsbkj0w8HBwKbz1XUxtOovk3Lj4R7D B2Fg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=tFANQsplcfYmN7Cu3ZOpiMje75MfpJ9o5dr0Y/I/sbo=; b=JUamtVDqNByxvm4NNgTBAR/g4gvEBpqCZBA+KurhthlGVFYzXZU/ZM2A7bpu1damo0 YcH82Z8/uCPkCH5UzvfEQ9n2rfrAgGWVCTBvhNpsK+U3WSmACQqNSoyTT0QJxn/56C00 1FEG9RKOj/wYqIBifEeLNdloSWGQxKHH6hF1S7Sw61MimnZfJwCdpGfhn6X/W+AUcYzc qNabXjuXABywUd8OwVz5hMy4pknBb+mV3Pv0sfcuqPDbvE/RoYy5QDsWsZcYF7puwTGX gETYKN1Le6gzNx8NY/0377VqaZ1zvZ81Etud2xCEmCFuhxSA9BL2wp5LuuDTEVgKPaWF pk8g== X-Gm-Message-State: AFqh2krfCynZyXWl9Uee3W6Ednm/p3LEpx+7fPoqi17B2N6VbdupHFCx okaA4lgZMem5I/ITeCbzMbm510bcRQylKAjY+g== X-Received: from peternewman10.zrh.corp.google.com ([2a00:79e0:9d:6:e533:80e6:38fe:22c]) (user=peternewman job=sendgmr) by 2002:a25:7dc7:0:b0:766:2e0a:55ff with SMTP id y190-20020a257dc7000000b007662e0a55ffmr3693462ybc.325.1674641633188; Wed, 25 Jan 2023 02:13:53 -0800 (PST) Date: Wed, 25 Jan 2023 11:13:34 +0100 In-Reply-To: <20230125101334.1069060-1-peternewman@google.com> Mime-Version: 1.0 References: <20230125101334.1069060-1-peternewman@google.com> X-Mailer: git-send-email 2.39.1.405.gd4c25cc71f-goog Message-ID: <20230125101334.1069060-4-peternewman@google.com> Subject: [PATCH v3 3/3] x86/resctrl: Implement rename op for mon groups From: Peter Newman <peternewman@google.com> To: reinette.chatre@intel.com, fenghua.yu@intel.com Cc: Babu.Moger@amd.com, bp@alien8.de, dave.hansen@linux.intel.com, eranian@google.com, gupasani@google.com, hpa@zytor.com, james.morse@arm.com, linux-kernel@vger.kernel.org, mingo@redhat.com, skodak@google.com, tony.luck@intel.com, tglx@linutronix.de, x86@kernel.org, Peter Newman <peternewman@google.com> Content-Type: text/plain; charset="UTF-8" X-Spam-Status: No, score=-9.6 required=5.0 tests=BAYES_00,DKIMWL_WL_MED, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, SPF_HELO_NONE,SPF_PASS,USER_IN_DEF_DKIM_WL autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: <linux-kernel.vger.kernel.org> X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1755989623104742307?= X-GMAIL-MSGID: =?utf-8?q?1755989623104742307?= |
Series |
x86/resctrl: Implement rename to help move containers' tasks
|
|
Commit Message
Peter Newman
Jan. 25, 2023, 10:13 a.m. UTC
To change the class of service for a large group of tasks, such as an
application container, a container manager must write all of the tasks'
IDs into the tasks file interface of the new control group.
If a container manager is tracking containers' bandwidth usage by
placing tasks from each into their own monitoring group, it must first
move the tasks to the default monitoring group of the new control group
before it can move the tasks into their new monitoring groups. This is
undesirable because it makes bandwidth usage during the move
unattributable to the correct tasks and resets monitoring event counters
and cache usage information for the group.
To address this, implement the rename operation for resctrlfs mon groups
to effect a change in CLOSID for a MON group while otherwise leaving the
monitoring group intact.
Signed-off-by: Peter Newman <peternewman@google.com>
---
arch/x86/kernel/cpu/resctrl/rdtgroup.c | 75 ++++++++++++++++++++++++++
1 file changed, 75 insertions(+)
Comments
Hi Peter, On 1/25/2023 2:13 AM, Peter Newman wrote: > To change the class of service for a large group of tasks, such as an > application container, a container manager must write all of the tasks' > IDs into the tasks file interface of the new control group. > > If a container manager is tracking containers' bandwidth usage by > placing tasks from each into their own monitoring group, it must first > move the tasks to the default monitoring group of the new control group > before it can move the tasks into their new monitoring groups. This is > undesirable because it makes bandwidth usage during the move > unattributable to the correct tasks and resets monitoring event counters > and cache usage information for the group. > > To address this, implement the rename operation for resctrlfs mon groups > to effect a change in CLOSID for a MON group while otherwise leaving the > monitoring group intact. > > Signed-off-by: Peter Newman <peternewman@google.com> > --- > arch/x86/kernel/cpu/resctrl/rdtgroup.c | 75 ++++++++++++++++++++++++++ > 1 file changed, 75 insertions(+) > > diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c > index b2081bc1bbfd..595f83a517c6 100644 > --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c > +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c > @@ -3238,6 +3238,80 @@ static int rdtgroup_rmdir(struct kernfs_node *kn) > return ret; > } > > +static void mongrp_move(struct rdtgroup *rdtgrp, struct rdtgroup *new_prdtgrp, > + cpumask_var_t cpus) Could you please add some function comments on what is moved and how it is accomplished? > +{ > + struct rdtgroup *prdtgrp = rdtgrp->mon.parent; > + struct task_struct *p, *t; > + > + WARN_ON(list_empty(&prdtgrp->mon.crdtgrp_list)); > + list_del(&rdtgrp->mon.crdtgrp_list); > + > + list_add_tail(&rdtgrp->mon.crdtgrp_list, > + &new_prdtgrp->mon.crdtgrp_list); > + rdtgrp->mon.parent = new_prdtgrp; > + > + read_lock(&tasklist_lock); > + for_each_process_thread(p, t) { > + if (is_closid_match(t, prdtgrp) && is_rmid_match(t, rdtgrp)) > + rdt_move_one_task(t, new_prdtgrp->closid, t->rmid, > + cpus); > + } > + read_unlock(&tasklist_lock); Can rdt_move_group_tasks() be used here? > + > + update_closid_rmid(cpus, NULL); > +} I see the tasks in a monitor group handled. There is another way to create/manage a monitor group. For example, by not writing tasks to the tasks file but instead to write CPU ids to the CPU file. All tasks on a particular CPU will be monitored by that group. One rule is that a MON group may only have CPUs that are owned by the CTRL_MON group. It is not clear to me how such a group is handled in this work. > + > +static int rdtgroup_rename(struct kernfs_node *kn, > + struct kernfs_node *new_parent, const char *new_name) > +{ > + struct rdtgroup *new_prdtgrp; > + struct rdtgroup *rdtgrp; > + cpumask_var_t tmpmask; > + int ret; > + > + if (!zalloc_cpumask_var(&tmpmask, GFP_KERNEL)) > + return -ENOMEM; > + > + rdtgrp = kernfs_to_rdtgroup(kn); > + new_prdtgrp = kernfs_to_rdtgroup(new_parent); > + if (!rdtgrp || !new_prdtgrp) { > + free_cpumask_var(tmpmask); > + return -EPERM; > + } > + How robust is this against user space attempting to move files? > + /* Release both kernfs active_refs before obtaining rdtgroup mutex. */ > + rdtgroup_kn_get(rdtgrp, kn); > + rdtgroup_kn_get(new_prdtgrp, new_parent); > + > + mutex_lock(&rdtgroup_mutex); > + > + if ((rdtgrp->flags & RDT_DELETED) || (new_prdtgrp->flags & RDT_DELETED)) { > + ret = -ESRCH; > + goto out; > + } > + > + /* Only a mon group can be moved to a new mon_groups directory. */ > + if (rdtgrp->type != RDTMON_GROUP || > + !is_mon_groups(new_parent, kn->name)) { > + ret = -EPERM; > + goto out; > + } > + Should in-place moves be allowed? > + ret = kernfs_rename(kn, new_parent, new_name); > + if (ret) > + goto out; > + > + mongrp_move(rdtgrp, new_prdtgrp, tmpmask); > + Can tmpmask allocation/free be done in mongrp_move()? > +out: > + mutex_unlock(&rdtgroup_mutex); > + rdtgroup_kn_put(rdtgrp, kn); > + rdtgroup_kn_put(new_prdtgrp, new_parent); > + free_cpumask_var(tmpmask); > + return ret; > +} > + > static int rdtgroup_show_options(struct seq_file *seq, struct kernfs_root *kf) > { > if (resctrl_arch_get_cdp_enabled(RDT_RESOURCE_L3)) > @@ -3255,6 +3329,7 @@ static int rdtgroup_show_options(struct seq_file *seq, struct kernfs_root *kf) > static struct kernfs_syscall_ops rdtgroup_kf_syscall_ops = { > .mkdir = rdtgroup_mkdir, > .rmdir = rdtgroup_rmdir, > + .rename = rdtgroup_rename, > .show_options = rdtgroup_show_options, > }; > Reinette
Hi Reinette, On Sat, Feb 11, 2023 at 12:21 AM Reinette Chatre <reinette.chatre@intel.com> wrote: > On 1/25/2023 2:13 AM, Peter Newman wrote: > > --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c > > +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c > > @@ -3238,6 +3238,80 @@ static int rdtgroup_rmdir(struct kernfs_node *kn) > > return ret; > > } > > > > +static void mongrp_move(struct rdtgroup *rdtgrp, struct rdtgroup *new_prdtgrp, > > + cpumask_var_t cpus) > > Could you please add some function comments on what is moved and how it is accomplished? Sure, I think I should also make the name more descriptive. I think "move" is too high-level here. > > + for_each_process_thread(p, t) { > > + if (is_closid_match(t, prdtgrp) && is_rmid_match(t, rdtgrp)) > > + rdt_move_one_task(t, new_prdtgrp->closid, t->rmid, > > + cpus); > > + } > > + read_unlock(&tasklist_lock); > > Can rdt_move_group_tasks() be used here? As it is today, rdt_move_group_tasks() would move too many tasks. mongrp_move() needs both the CLOSID and RMID to match, while rdt_move_group_tasks() only needs 0-1 of the two to match. I tried adding more parameters to rdt_move_group_tasks() to change the move condition, but I couldn't make the resulting code not look gross and after factoring the tricky stuff into rdt_move_one_task(), rdt_move_group_tasks() didn't look interesting enough to reuse. > > > + > > + update_closid_rmid(cpus, NULL); > > +} > > I see the tasks in a monitor group handled. There is another way to create/manage > a monitor group. For example, by not writing tasks to the tasks file but instead > to write CPU ids to the CPU file. All tasks on a particular CPU will be monitored > by that group. One rule is that a MON group may only have CPUs that are owned by > the CTRL_MON group. > It is not clear to me how such a group is handled in this work. Right, I overlooked this form of monitoring. I don't think it makes sense to move such a monitoring group, because a CPU can only be assigned to one CTRL_MON group. Therefore a move of a MON group with an assigned CPU will always violate the parent CTRL_MON group rule after the move. Based on this, I think rename of a MON group should fail when rdtgrp->cpu_mask is non-zero. > > > > + > > +static int rdtgroup_rename(struct kernfs_node *kn, > > + struct kernfs_node *new_parent, const char *new_name) > > +{ > > + struct rdtgroup *new_prdtgrp; > > + struct rdtgroup *rdtgrp; > > + cpumask_var_t tmpmask; > > + int ret; > > + > > + if (!zalloc_cpumask_var(&tmpmask, GFP_KERNEL)) > > + return -ENOMEM; > > + > > + rdtgrp = kernfs_to_rdtgroup(kn); > > + new_prdtgrp = kernfs_to_rdtgroup(new_parent); > > + if (!rdtgrp || !new_prdtgrp) { > > + free_cpumask_var(tmpmask); > > + return -EPERM; > > + } > > + > > How robust is this against user space attempting to move files? I'm not sure I understand the question. Can you be more specific? > > > + /* Release both kernfs active_refs before obtaining rdtgroup mutex. */ > > + rdtgroup_kn_get(rdtgrp, kn); > > + rdtgroup_kn_get(new_prdtgrp, new_parent); > > + > > + mutex_lock(&rdtgroup_mutex); > > + > > + if ((rdtgrp->flags & RDT_DELETED) || (new_prdtgrp->flags & RDT_DELETED)) { > > + ret = -ESRCH; > > + goto out; > > + } > > + > > + /* Only a mon group can be moved to a new mon_groups directory. */ > > + if (rdtgrp->type != RDTMON_GROUP || > > + !is_mon_groups(new_parent, kn->name)) { > > + ret = -EPERM; > > + goto out; > > + } > > + > > Should in-place moves be allowed? I don't think it's useful in the context of the intended use case. Also, it looks like kernfs_rename() would fail with EEXIST if I tried. If it were important to someone, I could make it succeed before calling into kernfs_rename(). > > > + ret = kernfs_rename(kn, new_parent, new_name); > > + if (ret) > > + goto out; > > + > > + mongrp_move(rdtgrp, new_prdtgrp, tmpmask); > > + > > Can tmpmask allocation/free be done in mongrp_move()? Yes, but it looked like most other functions in this file allocate the cpumask up front before validating parameters. If you have a preference for internalizing its allocation within mongrp_move(), I can try it. Thank you for your review. -Peter
Hi Peter, On 3/2/2023 6:26 AM, Peter Newman wrote: > On Sat, Feb 11, 2023 at 12:21 AM Reinette Chatre > <reinette.chatre@intel.com> wrote: > >> On 1/25/2023 2:13 AM, Peter Newman wrote: > >>> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c >>> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c >>> @@ -3238,6 +3238,80 @@ static int rdtgroup_rmdir(struct kernfs_node *kn) >>> return ret; >>> } >>> >>> +static void mongrp_move(struct rdtgroup *rdtgrp, struct rdtgroup *new_prdtgrp, >>> + cpumask_var_t cpus) >> >> Could you please add some function comments on what is moved and how it is accomplished? > > Sure, I think I should also make the name more descriptive. I think > "move" is too high-level here. > > >>> + for_each_process_thread(p, t) { >>> + if (is_closid_match(t, prdtgrp) && is_rmid_match(t, rdtgrp)) >>> + rdt_move_one_task(t, new_prdtgrp->closid, t->rmid, >>> + cpus); >>> + } >>> + read_unlock(&tasklist_lock); >> >> Can rdt_move_group_tasks() be used here? > > As it is today, rdt_move_group_tasks() would move too many tasks. > mongrp_move() needs both the CLOSID and RMID to match, while > rdt_move_group_tasks() only needs 0-1 of the two to match. > > I tried adding more parameters to rdt_move_group_tasks() to change the > move condition, but I couldn't make the resulting code not look gross > and after factoring the tricky stuff into rdt_move_one_task(), > rdt_move_group_tasks() didn't look interesting enough to reuse. Could it be made readable by adding a compare function as parameter to rdt_move_group_tasks() that is used to determine if a task should be moved? >>> + >>> + update_closid_rmid(cpus, NULL); >>> +} >> >> I see the tasks in a monitor group handled. There is another way to create/manage >> a monitor group. For example, by not writing tasks to the tasks file but instead >> to write CPU ids to the CPU file. All tasks on a particular CPU will be monitored >> by that group. One rule is that a MON group may only have CPUs that are owned by >> the CTRL_MON group. >> It is not clear to me how such a group is handled in this work. > > Right, I overlooked this form of monitoring. > > I don't think it makes sense to move such a monitoring group, because a > CPU can only be assigned to one CTRL_MON group. Therefore a move of a MON > group with an assigned CPU will always violate the parent CTRL_MON group > rule after the move. > > Based on this, I think rename of a MON group should fail when > rdtgrp->cpu_mask is non-zero. I think this is fair. Also please write something useful in the last command status buffer. >>> + >>> +static int rdtgroup_rename(struct kernfs_node *kn, >>> + struct kernfs_node *new_parent, const char *new_name) >>> +{ >>> + struct rdtgroup *new_prdtgrp; >>> + struct rdtgroup *rdtgrp; >>> + cpumask_var_t tmpmask; >>> + int ret; >>> + >>> + if (!zalloc_cpumask_var(&tmpmask, GFP_KERNEL)) >>> + return -ENOMEM; >>> + >>> + rdtgrp = kernfs_to_rdtgroup(kn); >>> + new_prdtgrp = kernfs_to_rdtgroup(new_parent); >>> + if (!rdtgrp || !new_prdtgrp) { >>> + free_cpumask_var(tmpmask); >>> + return -EPERM; >>> + } >>> + >> >> How robust is this against user space attempting to move files? > > I'm not sure I understand the question. Can you be more specific? This commit adds support for rename to resctrl. I thus expect this function to be called when user space attempts to move _any_ of the files and/or directories within resctrl. This could be out of curiosity, buggy, or maliciousness. I would like to understand how robust this code would be against such attempts because I do not see much checking before the preparation to do the move is started. >>> + /* Release both kernfs active_refs before obtaining rdtgroup mutex. */ >>> + rdtgroup_kn_get(rdtgrp, kn); >>> + rdtgroup_kn_get(new_prdtgrp, new_parent); >>> + >>> + mutex_lock(&rdtgroup_mutex); >>> + >>> + if ((rdtgrp->flags & RDT_DELETED) || (new_prdtgrp->flags & RDT_DELETED)) { >>> + ret = -ESRCH; >>> + goto out; >>> + } >>> + >>> + /* Only a mon group can be moved to a new mon_groups directory. */ >>> + if (rdtgrp->type != RDTMON_GROUP || >>> + !is_mon_groups(new_parent, kn->name)) { >>> + ret = -EPERM; >>> + goto out; >>> + } >>> + >> >> Should in-place moves be allowed? > > I don't think it's useful in the context of the intended use case. > Also, it looks like kernfs_rename() would fail with EEXIST if I tried. > From what I can tell kernfs_rename() will return EEXIST if there is an attempt to create file/directory with the same name at the same place. What I am asking about is if user space requests to change the name of a monitoring group without moving it to a new resource group. This seems to be supported by this code but if it is supported it could likely be done more efficiently since no tasks need to be moved because neither closid nor rmid needs to change. > If it were important to someone, I could make it succeed before calling > into kernfs_rename(). > > >> >>> + ret = kernfs_rename(kn, new_parent, new_name); >>> + if (ret) >>> + goto out; >>> + >>> + mongrp_move(rdtgrp, new_prdtgrp, tmpmask); >>> + >> >> Can tmpmask allocation/free be done in mongrp_move()? > > Yes, but it looked like most other functions in this file allocate the > cpumask up front before validating parameters. If you have a preference > for internalizing its allocation within mongrp_move(), I can try it. Could you please elaborate what the concern is? From what I can tell mongrp_move() is the only user of the cpumask so it is not clear to me why it cannot take care of the allocation and free. When referring to existing code I assume you mean rdtgroup_rmdir(). This is the only code I could find in this file with this pattern. I looked back at the history and indeed the cpumask was allocated where it was used but the flow was changed to the current when support for monitoring groups were added. See f9049547f7e7 ("x86/intel_rdt: Separate the ctrl bits from rmdir") I do not see a requirement for doing the allocations in that way. Reinette
Hi Reinette, On Thu, Mar 2, 2023 at 11:27 PM Reinette Chatre <reinette.chatre@intel.com> wrote: > On 3/2/2023 6:26 AM, Peter Newman wrote: > > On Sat, Feb 11, 2023 at 12:21 AM Reinette Chatre > > <reinette.chatre@intel.com> wrote: > > > >> On 1/25/2023 2:13 AM, Peter Newman wrote: > >>> + for_each_process_thread(p, t) { > >>> + if (is_closid_match(t, prdtgrp) && is_rmid_match(t, rdtgrp)) > >>> + rdt_move_one_task(t, new_prdtgrp->closid, t->rmid, > >>> + cpus); > >>> + } > >>> + read_unlock(&tasklist_lock); > >> > >> Can rdt_move_group_tasks() be used here? > > > > As it is today, rdt_move_group_tasks() would move too many tasks. > > mongrp_move() needs both the CLOSID and RMID to match, while > > rdt_move_group_tasks() only needs 0-1 of the two to match. > > > > I tried adding more parameters to rdt_move_group_tasks() to change the > > move condition, but I couldn't make the resulting code not look gross > > and after factoring the tricky stuff into rdt_move_one_task(), > > rdt_move_group_tasks() didn't look interesting enough to reuse. > > Could it be made readable by adding a compare function as parameter to > rdt_move_group_tasks() that is used to determine if a task should be moved? Yes, I think that would be ok in this case. That shouldn't have any cost if these are all static functions. As long as we have an rdt_move_group_tasks() function, it's a liability to have a separate task-moving loop for someone to miss in the future. Should I still bother with factoring out rdt_move_one_task() in the parent patch? It was motivated by my creating a new task-moving loop in this patch. > >>> + > >>> + rdtgrp = kernfs_to_rdtgroup(kn); > >>> + new_prdtgrp = kernfs_to_rdtgroup(new_parent); > >>> + if (!rdtgrp || !new_prdtgrp) { > >>> + free_cpumask_var(tmpmask); > >>> + return -EPERM; > >>> + } > >>> + > >> > >> How robust is this against user space attempting to move files? > > > > I'm not sure I understand the question. Can you be more specific? > > This commit adds support for rename to resctrl. I thus expect this > function to be called when user space attempts to move _any_ of > the files and/or directories within resctrl. This could be out of > curiosity, buggy, or maliciousness. I would like to understand how > robust this code would be against such attempts because I do not see > much checking before the preparation to do the move is started. Now I see, thanks. kernfs_to_rdtgroup() will return the parent rdtgroup when kn or new_parent is a file, which will lead to kernfs_rename() moving a file out of a group or clobbering another file node. I'll need to enforce that kn and new_parent are rdtgroup directories and not file nodes. Assuming that the paths of kn and new_parent exactly match their rdtgroup directories, I believe the checks below are sufficient to constrain the operation to only moving MON groups to existing mon_groups directories. > >> Should in-place moves be allowed? > > > > I don't think it's useful in the context of the intended use case. > > Also, it looks like kernfs_rename() would fail with EEXIST if I tried. > > > > From what I can tell kernfs_rename() will return EEXIST if there > is an attempt to create file/directory with the same name at the same place. > What I am asking about is if user space requests to change the name > of a monitoring group without moving it to a new resource group. This seems > to be supported by this code but if it is supported it could likely be > done more efficiently since no tasks need to be moved because neither > closid nor rmid needs to change. Yes, I see now. I'll try skipping the mongrp_move() call below when new_parent is already the parent of rdtgrp to optimize the simple rename use case. > >> Can tmpmask allocation/free be done in mongrp_move()? > > > > Yes, but it looked like most other functions in this file allocate the > > cpumask up front before validating parameters. If you have a preference > > for internalizing its allocation within mongrp_move(), I can try it. > > Could you please elaborate what the concern is? From what I can tell > mongrp_move() is the only user of the cpumask so it is not clear to > me why it cannot take care of the allocation and free. > > When referring to existing code I assume you mean rdtgroup_rmdir(). This > is the only code I could find in this file with this pattern. I looked > back at the history and indeed the cpumask was allocated where it was > used but the flow was changed to the current when support for monitoring > groups were added. See f9049547f7e7 ("x86/intel_rdt: Separate the ctrl bits from rmdir") > I do not see a requirement for doing the allocations in that way. I looked over this again in more detail... I need to choose whether to call kernfs_rename() or mongrp_move() first. If the second call fails, the first needs to be reverted. It's feasible to ensure that a mongrp_move() call will be successful before calling kernfs_rename(), but not the other way around. If I allow mongrp_move() to fail, kernfs_rename() should be reversible thanks to the prior checks validating this use case, but I would prefer to eliminate the need for a revert on cleanup entirely. -Peter
Hi Peter, On 3/3/2023 7:10 AM, Peter Newman wrote: > On Thu, Mar 2, 2023 at 11:27 PM Reinette Chatre > <reinette.chatre@intel.com> wrote: >> On 3/2/2023 6:26 AM, Peter Newman wrote: >>> On Sat, Feb 11, 2023 at 12:21 AM Reinette Chatre >>> <reinette.chatre@intel.com> wrote: >>> >>>> On 1/25/2023 2:13 AM, Peter Newman wrote: >>>>> + for_each_process_thread(p, t) { >>>>> + if (is_closid_match(t, prdtgrp) && is_rmid_match(t, rdtgrp)) >>>>> + rdt_move_one_task(t, new_prdtgrp->closid, t->rmid, >>>>> + cpus); >>>>> + } >>>>> + read_unlock(&tasklist_lock); >>>> >>>> Can rdt_move_group_tasks() be used here? >>> >>> As it is today, rdt_move_group_tasks() would move too many tasks. >>> mongrp_move() needs both the CLOSID and RMID to match, while >>> rdt_move_group_tasks() only needs 0-1 of the two to match. >>> >>> I tried adding more parameters to rdt_move_group_tasks() to change the >>> move condition, but I couldn't make the resulting code not look gross >>> and after factoring the tricky stuff into rdt_move_one_task(), >>> rdt_move_group_tasks() didn't look interesting enough to reuse. >> >> Could it be made readable by adding a compare function as parameter to >> rdt_move_group_tasks() that is used to determine if a task should be moved? > > Yes, I think that would be ok in this case. That shouldn't have any > cost if these are all static functions. > > As long as we have an rdt_move_group_tasks() function, it's a liability > to have a separate task-moving loop for someone to miss in the future. Agreed. > Should I still bother with factoring out rdt_move_one_task() in the > parent patch? It was motivated by my creating a new task-moving loop > in this patch. I do not think that refactoring is necessary if you go this route. >>>>> + >>>>> + rdtgrp = kernfs_to_rdtgroup(kn); >>>>> + new_prdtgrp = kernfs_to_rdtgroup(new_parent); >>>>> + if (!rdtgrp || !new_prdtgrp) { >>>>> + free_cpumask_var(tmpmask); >>>>> + return -EPERM; >>>>> + } >>>>> + >>>> >>>> How robust is this against user space attempting to move files? >>> >>> I'm not sure I understand the question. Can you be more specific? >> >> This commit adds support for rename to resctrl. I thus expect this >> function to be called when user space attempts to move _any_ of >> the files and/or directories within resctrl. This could be out of >> curiosity, buggy, or maliciousness. I would like to understand how >> robust this code would be against such attempts because I do not see >> much checking before the preparation to do the move is started. > > Now I see, thanks. > > kernfs_to_rdtgroup() will return the parent rdtgroup when > kn or new_parent is a file, which will lead to kernfs_rename() moving > a file out of a group or clobbering another file node. I'll need to > enforce that kn and new_parent are rdtgroup directories and not file > nodes. Perhaps additionally that the source directories have mon_groups as parent, similar to the destination check you already have. Reinette
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c index b2081bc1bbfd..595f83a517c6 100644 --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c @@ -3238,6 +3238,80 @@ static int rdtgroup_rmdir(struct kernfs_node *kn) return ret; } +static void mongrp_move(struct rdtgroup *rdtgrp, struct rdtgroup *new_prdtgrp, + cpumask_var_t cpus) +{ + struct rdtgroup *prdtgrp = rdtgrp->mon.parent; + struct task_struct *p, *t; + + WARN_ON(list_empty(&prdtgrp->mon.crdtgrp_list)); + list_del(&rdtgrp->mon.crdtgrp_list); + + list_add_tail(&rdtgrp->mon.crdtgrp_list, + &new_prdtgrp->mon.crdtgrp_list); + rdtgrp->mon.parent = new_prdtgrp; + + read_lock(&tasklist_lock); + for_each_process_thread(p, t) { + if (is_closid_match(t, prdtgrp) && is_rmid_match(t, rdtgrp)) + rdt_move_one_task(t, new_prdtgrp->closid, t->rmid, + cpus); + } + read_unlock(&tasklist_lock); + + update_closid_rmid(cpus, NULL); +} + +static int rdtgroup_rename(struct kernfs_node *kn, + struct kernfs_node *new_parent, const char *new_name) +{ + struct rdtgroup *new_prdtgrp; + struct rdtgroup *rdtgrp; + cpumask_var_t tmpmask; + int ret; + + if (!zalloc_cpumask_var(&tmpmask, GFP_KERNEL)) + return -ENOMEM; + + rdtgrp = kernfs_to_rdtgroup(kn); + new_prdtgrp = kernfs_to_rdtgroup(new_parent); + if (!rdtgrp || !new_prdtgrp) { + free_cpumask_var(tmpmask); + return -EPERM; + } + + /* Release both kernfs active_refs before obtaining rdtgroup mutex. */ + rdtgroup_kn_get(rdtgrp, kn); + rdtgroup_kn_get(new_prdtgrp, new_parent); + + mutex_lock(&rdtgroup_mutex); + + if ((rdtgrp->flags & RDT_DELETED) || (new_prdtgrp->flags & RDT_DELETED)) { + ret = -ESRCH; + goto out; + } + + /* Only a mon group can be moved to a new mon_groups directory. */ + if (rdtgrp->type != RDTMON_GROUP || + !is_mon_groups(new_parent, kn->name)) { + ret = -EPERM; + goto out; + } + + ret = kernfs_rename(kn, new_parent, new_name); + if (ret) + goto out; + + mongrp_move(rdtgrp, new_prdtgrp, tmpmask); + +out: + mutex_unlock(&rdtgroup_mutex); + rdtgroup_kn_put(rdtgrp, kn); + rdtgroup_kn_put(new_prdtgrp, new_parent); + free_cpumask_var(tmpmask); + return ret; +} + static int rdtgroup_show_options(struct seq_file *seq, struct kernfs_root *kf) { if (resctrl_arch_get_cdp_enabled(RDT_RESOURCE_L3)) @@ -3255,6 +3329,7 @@ static int rdtgroup_show_options(struct seq_file *seq, struct kernfs_root *kf) static struct kernfs_syscall_ops rdtgroup_kf_syscall_ops = { .mkdir = rdtgroup_mkdir, .rmdir = rdtgroup_rmdir, + .rename = rdtgroup_rename, .show_options = rdtgroup_show_options, };