Message ID: 20230116071246.97717-1-shawnwang@linux.alibaba.com
State: New
Headers:
From: Shawn Wang <shawnwang@linux.alibaba.com>
To: fenghua.yu@intel.com, reinette.chatre@intel.com
Cc: tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, dave.hansen@linux.intel.com, james.morse@arm.com, x86@kernel.org, hpa@zytor.com, linux-kernel@vger.kernel.org
Subject: [PATCH] x86/resctrl: Only show tasks' pids in current pid namespace
Date: Mon, 16 Jan 2023 15:12:46 +0800
Message-Id: <20230116071246.97717-1-shawnwang@linux.alibaba.com>
Series: x86/resctrl: Only show tasks' pids in current pid namespace
Commit Message
Shawn Wang
Jan. 16, 2023, 7:12 a.m. UTC
When writing a task id to the "tasks" file in an rdtgroup,
rdtgroup_tasks_write() treats the pid as a number in the current pid
namespace. But when reading the "tasks" file, rdtgroup_tasks_show() shows
the list of global pids from the init namespace. If the current pid
namespace is not the init namespace, the pids shown in "tasks" will be
confusing and incorrect.

To be more robust, let the "tasks" file only show pids in the current pid
namespace.
Signed-off-by: Shawn Wang <shawnwang@linux.alibaba.com>
---
arch/x86/kernel/cpu/resctrl/rdtgroup.c | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)
Comments
Hi Shawn,

On 1/15/2023 11:12 PM, Shawn Wang wrote:
> When writing a task id to the "tasks" file in an rdtgroup,
> rdtgroup_tasks_write() treats the pid as a number in the current pid
> namespace. But when reading the "tasks" file, rdtgroup_tasks_show() shows
> the list of global pids from the init namespace. If current pid namespace
> is not the init namespace, pids in "tasks" will be confusing and incorrect.
>
> To be more robust, let the "tasks" file only show pids in the current pid
> namespace.

Is it possible to elaborate more on the use case that this is aiming to
address? It is unexpected to me that resource management is approached from
within a container. My expectation is that the resource management and
monitoring is done from the host.

> Signed-off-by: Shawn Wang <shawnwang@linux.alibaba.com>
> ---
>  arch/x86/kernel/cpu/resctrl/rdtgroup.c | 8 ++++++--
>  1 file changed, 6 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> index 5993da21d822..9e97ae24c159 100644
> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> @@ -718,11 +718,15 @@ static ssize_t rdtgroup_tasks_write(struct kernfs_open_file *of,
>  static void show_rdt_tasks(struct rdtgroup *r, struct seq_file *s)
>  {
>  	struct task_struct *p, *t;
> +	pid_t pid;
>
>  	rcu_read_lock();
>  	for_each_process_thread(p, t) {
> -		if (is_closid_match(t, r) || is_rmid_match(t, r))
> -			seq_printf(s, "%d\n", t->pid);
> +		if (is_closid_match(t, r) || is_rmid_match(t, r)) {
> +			pid = task_pid_vnr(t);
> +			if (pid)
> +				seq_printf(s, "%d\n", pid);
> +		}
>  	}
>  	rcu_read_unlock();
>  }

This looks like it would solve the stated problem. Does it slow down
reading a tasks file in a measurable way?

Reinette
Hi Reinette,

On 2/16/23 5:43 AM, Reinette Chatre wrote:
> Hi Shawn,
>
> On 1/15/2023 11:12 PM, Shawn Wang wrote:
>> When writing a task id to the "tasks" file in an rdtgroup,
>> rdtgroup_tasks_write() treats the pid as a number in the current pid
>> namespace. But when reading the "tasks" file, rdtgroup_tasks_show() shows
>> the list of global pids from the init namespace. If current pid namespace
>> is not the init namespace, pids in "tasks" will be confusing and incorrect.
>>
>> To be more robust, let the "tasks" file only show pids in the current pid
>> namespace.
>
> Is it possible to elaborate more on the use case that this is aiming to
> address? It is unexpected to me that resource management is approached from
> within a container. My expectation is that the resource management and
> monitoring is done from the host.

We have a scenario where we only want to mount the resctrl filesystem under
a specific container. And we found that the pids in the tasks file under
resctrl are inconsistent with the pids obtained by top.

Besides, the current rdtgroup_move_task() uses find_task_by_vpid() to get
the real task from a pid. Our modification also maintains symmetry with
rdtgroup_move_task().

>> ---
>>  arch/x86/kernel/cpu/resctrl/rdtgroup.c | 8 ++++++--
>>  1 file changed, 6 insertions(+), 2 deletions(-)
>>
>> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> index 5993da21d822..9e97ae24c159 100644
>> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> @@ -718,11 +718,15 @@ static ssize_t rdtgroup_tasks_write(struct kernfs_open_file *of,
>>  static void show_rdt_tasks(struct rdtgroup *r, struct seq_file *s)
>>  {
>>  	struct task_struct *p, *t;
>> +	pid_t pid;
>>
>>  	rcu_read_lock();
>>  	for_each_process_thread(p, t) {
>> -		if (is_closid_match(t, r) || is_rmid_match(t, r))
>> -			seq_printf(s, "%d\n", t->pid);
>> +		if (is_closid_match(t, r) || is_rmid_match(t, r)) {
>> +			pid = task_pid_vnr(t);
>> +			if (pid)
>> +				seq_printf(s, "%d\n", pid);
>> +		}
>>  	}
>>  	rcu_read_unlock();
>>  }
>
> This looks like it would solve the stated problem. Does it slow down
> reading a tasks file in a measurable way?

We didn't test it, but the cost is proportional to the number of pids in
the group. In addition, only an if statement is added here, and the tasks
interface will not be read frequently, so it should not be a bottleneck.

Thanks,
Shawn
Hi Shawn,

On 3/15/2023 8:06 AM, Shawn Wang wrote:
> On 2/16/23 5:43 AM, Reinette Chatre wrote:
>> On 1/15/2023 11:12 PM, Shawn Wang wrote:
>>> When writing a task id to the "tasks" file in an rdtgroup,
>>> rdtgroup_tasks_write() treats the pid as a number in the current pid
>>> namespace. But when reading the "tasks" file, rdtgroup_tasks_show() shows
>>> the list of global pids from the init namespace. If current pid namespace
>>> is not the init namespace, pids in "tasks" will be confusing and incorrect.
>>>
>>> To be more robust, let the "tasks" file only show pids in the current pid
>>> namespace.
>>
>> Is it possible to elaborate more on the use case that this is aiming to
>> address? It is unexpected to me that resource management is approached from
>> within a container. My expectation is that the resource management and
>> monitoring is done from the host.
>
> We have a scenario where we only want to mount the resctrl filesystem under
> a specific container.

This scenario is interesting to me. My assumption has always been that the
resource management is done from the host and not a container, especially
since a container can only add its own tasks to resource groups.

> And we found that the pids in the tasks under resctrl are inconsistent
> with the pids obtained by top.

Indeed.

> Besides, current rdtgroup_move_task() uses the find_task_by_vpid() to get
> the real pid. Our modification is also to maintain symmetry with the
> rdtgroup_move_task().

I understand, thank you for looking into this.

>>> ---
>>>  arch/x86/kernel/cpu/resctrl/rdtgroup.c | 8 ++++++--
>>>  1 file changed, 6 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>>> index 5993da21d822..9e97ae24c159 100644
>>> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>>> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>>> @@ -718,11 +718,15 @@ static ssize_t rdtgroup_tasks_write(struct kernfs_open_file *of,
>>>  static void show_rdt_tasks(struct rdtgroup *r, struct seq_file *s)
>>>  {
>>>  	struct task_struct *p, *t;
>>> +	pid_t pid;
>>>
>>>  	rcu_read_lock();
>>>  	for_each_process_thread(p, t) {
>>> -		if (is_closid_match(t, r) || is_rmid_match(t, r))
>>> -			seq_printf(s, "%d\n", t->pid);
>>> +		if (is_closid_match(t, r) || is_rmid_match(t, r)) {
>>> +			pid = task_pid_vnr(t);
>>> +			if (pid)
>>> +				seq_printf(s, "%d\n", pid);
>>> +		}
>>>  	}
>>>  	rcu_read_unlock();
>>>  }
>>
>> This looks like it would solve the stated problem. Does it slow down
>> reading a tasks file in a measurable way?
>
> We didn't test it, but it is proportional to the number of pids in the
> group. In addition, only an if statement is added here, and actually the
> reading of the tasks interface will not be called frequently, so it will
> not be a bottleneck.

It adds more than an if statement: for the default root group,
task_pid_vnr() will be called for every task on the host. I am not familiar
with namespaces, so my concern was the additional task_pid_vnr() call. This
does seem to be the custom, though, and does what's needed to return the
correct data.

I did test this and can confirm that when bind mounting /sys/fs/resctrl
into the container, the container's view of /sys/fs/resctrl/tasks only
shows its own tasks, with the pids as seen by it. Without this patch both
the container and the host show the same data, which are the pids from the
host namespace.

Tested-by: Reinette Chatre <reinette.chatre@intel.com>
Acked-by: Reinette Chatre <reinette.chatre@intel.com>

When you no longer expect any more feedback, I'd recommend that you
resubmit this patch with the new tags to make it easier for the next-level
maintainers to notice it and pick it up. To ensure accurate references to
discussions, you can add a "Link:" to this email.

Thank you very much
Reinette
Hi Shawn,

Thinking about this more, this could probably also do with a:

Fixes: e02737d5b826 ("x86/intel_rdt: Add tasks files")

Reinette
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index 5993da21d822..9e97ae24c159 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -718,11 +718,15 @@ static ssize_t rdtgroup_tasks_write(struct kernfs_open_file *of,
 static void show_rdt_tasks(struct rdtgroup *r, struct seq_file *s)
 {
 	struct task_struct *p, *t;
+	pid_t pid;
 
 	rcu_read_lock();
 	for_each_process_thread(p, t) {
-		if (is_closid_match(t, r) || is_rmid_match(t, r))
-			seq_printf(s, "%d\n", t->pid);
+		if (is_closid_match(t, r) || is_rmid_match(t, r)) {
+			pid = task_pid_vnr(t);
+			if (pid)
+				seq_printf(s, "%d\n", pid);
+		}
 	}
 	rcu_read_unlock();
 }