Message ID | 20230713163207.219710-4-tony.luck@intel.com |
---|---|
State | New |
Headers |
From: Tony Luck <tony.luck@intel.com>
To: Fenghua Yu <fenghua.yu@intel.com>, Reinette Chatre <reinette.chatre@intel.com>, Peter Newman <peternewman@google.com>, Jonathan Corbet <corbet@lwn.net>, Shuah Khan <skhan@linuxfoundation.org>, x86@kernel.org
Cc: Shaopeng Tan <tan.shaopeng@fujitsu.com>, James Morse <james.morse@arm.com>, Jamie Iles <quic_jiles@quicinc.com>, Babu Moger <babu.moger@amd.com>, Randy Dunlap <rdunlap@infradead.org>, linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, patches@lists.linux.dev, Tony Luck <tony.luck@intel.com>
Subject: [PATCH v3 3/8] x86/resctrl: Add a new node-scoped resource to rdt_resources_all[]
Date: Thu, 13 Jul 2023 09:32:02 -0700
Message-Id: <20230713163207.219710-4-tony.luck@intel.com>
In-Reply-To: <20230713163207.219710-1-tony.luck@intel.com>
References: <20230713163207.219710-1-tony.luck@intel.com>
Series | x86/resctrl: Add support for Sub-NUMA cluster (SNC) systems |
Commit Message
Luck, Tony
July 13, 2023, 4:32 p.m. UTC
Add a placeholder in the array of struct rdt_hw_resource to be used
for event monitoring of systems with Sub-NUMA Cluster enabled.

Update get_domain_id() to handle SCOPE_NODE.

Signed-off-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Peter Newman <peternewman@google.com>
---
 arch/x86/kernel/cpu/resctrl/internal.h | 4 +++-
 arch/x86/kernel/cpu/resctrl/core.c     | 12 ++++++++++++
 2 files changed, 15 insertions(+), 1 deletion(-)
Comments
Hi Tony,

On 7/13/2023 9:32 AM, Tony Luck wrote:
> Add a placeholder in the array of struct rdt_hw_resource to be used
> for event monitoring of systems with Sub-NUMA Cluster enabled.

Could you please elaborate why a new resource is required?

> Update get_domain_id() to handle SCOPE_NODE.
>
> Signed-off-by: Tony Luck <tony.luck@intel.com>
> Reviewed-by: Peter Newman <peternewman@google.com>
> ---
>  arch/x86/kernel/cpu/resctrl/internal.h | 4 +++-
>  arch/x86/kernel/cpu/resctrl/core.c     | 12 ++++++++++++
>  2 files changed, 15 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
> index 8275b8a74f7e..243017096ddf 100644
> --- a/arch/x86/kernel/cpu/resctrl/internal.h
> +++ b/arch/x86/kernel/cpu/resctrl/internal.h
> @@ -435,6 +435,7 @@ enum resctrl_res_level {
>  	RDT_RESOURCE_L2,
>  	RDT_RESOURCE_MBA,
>  	RDT_RESOURCE_SMBA,
> +	RDT_RESOURCE_NODE,
>
>  	/* Must be the last */
>  	RDT_NUM_RESOURCES,
> @@ -442,7 +443,8 @@ enum resctrl_res_level {
>
>  enum resctrl_scope {
>  	SCOPE_L2_CACHE = 2,
> -	SCOPE_L3_CACHE = 3
> +	SCOPE_L3_CACHE = 3,
> +	SCOPE_NODE,
>  };

A new resource _and_ a new scope is added. Could changelog please
explain why this is required?

>  static inline struct rdt_resource *resctrl_inc(struct rdt_resource *res)
> diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
> index 6571514752f3..e4bd3072927c 100644
> --- a/arch/x86/kernel/cpu/resctrl/core.c
> +++ b/arch/x86/kernel/cpu/resctrl/core.c
> @@ -112,6 +112,16 @@ struct rdt_hw_resource rdt_resources_all[] = {
>  		.fflags = RFTYPE_RES_MB,
>  		},
>  	},
> +	[RDT_RESOURCE_NODE] =
> +	{
> +		.r_resctrl = {
> +			.rid = RDT_RESOURCE_NODE,
> +			.name = "L3",
> +			.scope = SCOPE_NODE,
> +			.domains = domain_init(RDT_RESOURCE_NODE),
> +			.fflags = 0,
> +		},
> +	},
>  };

So the new resource has the same name, from user perspective, as
RDT_RESOURCE_L3. From this perspective it thus seems to be a shadow of
RDT_RESOURCE_L3 that is used as alternative for some properties of the
actual RDT_RESOURCE_L3? This is starting to look as though this solution
is wrenching itself into current architecture.

From what I can tell the monitoring in SNC environment needs a different
domain list because of the change in scope. What else is needed in the
resource that is different from the existing L3 resource? Could the
monitoring scope of a resource not instead be made distinct from its
allocation scope? By default monitoring and allocation scope will be
the same and thus use the same domain list but when SNC is enabled
then monitoring uses a different domain list.

>  /*
> @@ -489,6 +499,8 @@ static int arch_domain_mbm_alloc(u32 num_rmid, struct rdt_hw_domain *hw_dom)
>
>  static int get_domain_id(int cpu, enum resctrl_scope scope)
>  {
> +	if (scope == SCOPE_NODE)
> +		return cpu_to_node(cpu);
>  	return get_cpu_cacheinfo_id(cpu, scope);
>  }

Reinette
On Tue, Jul 18, 2023 at 01:40:32PM -0700, Reinette Chatre wrote:
> > +	[RDT_RESOURCE_NODE] =
> > +	{
> > +		.r_resctrl = {
> > +			.rid = RDT_RESOURCE_NODE,
> > +			.name = "L3",
> > +			.scope = SCOPE_NODE,
> > +			.domains = domain_init(RDT_RESOURCE_NODE),
> > +			.fflags = 0,
> > +		},
> > +	},
> > };
>
> So the new resource has the same name, from user perspective,
> as RDT_RESOURCE_L3. From this perspective it thus seems to be a
> shadow of RDT_RESOURCE_L3 that is used as alternative for some properties
> of the actual RDT_RESOURCE_L3? This is starting to look as though this
> solution is wrenching itself into current architecture.
>
> From what I can tell the monitoring in SNC environment needs a different
> domain list because of the change in scope. What else is needed in the
> resource that is different from the existing L3 resource? Could the
> monitoring scope of a resource not instead be made distinct from its
> allocation scope? By default monitoring and allocation scope will be
> the same and thus use the same domain list but when SNC is enabled
> then monitoring uses a different domain list.

Answering this part first, because my choice here affects a bunch
of the code that also raised comments from you.

The crux of the issue is that when SNC mode is enabled the scope
for L3 monitoring functions changes to "node" scope, while the
scope of L3 control functions (CAT, CDP) remains at L3 cache scope.

My solution was to just create a new resource. But you have an
interesting alternate solution. Add an extra domain list to the
resource structure to allow creation of distinct domain lists
for this case where the scope for control and monitor functions
differs.

So change the resource structure like this:

diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
index 8334eeacfec5..01590aa59a67 100644
--- a/include/linux/resctrl.h
+++ b/include/linux/resctrl.h
@@ -168,10 +168,12 @@ struct rdt_resource {
 	bool			alloc_capable;
 	bool			mon_capable;
 	int			num_rmid;
-	int			cache_level;
+	int			ctrl_scope;
+	int			mon_scope;
 	struct resctrl_cache	cache;
 	struct resctrl_membw	membw;
-	struct list_head	domains;
+	struct list_head	ctrl_domains;
+	struct list_head	mon_domains;
 	char			*name;
 	int			data_width;
 	u32			default_ctrl;

and build/use separate domain lists for when this resource is
being referenced for allocation/monitoring. E.g. domain_add_cpu()
would check "r->alloc_capable" and add a cpu to the ctrl_domains
list based on the ctrl_scope value. It would do the same with
mon_capable / mon_domains / mon_scope.

If ctrl_scope == mon_scope, just build one list as you suggest above.

Maybe there are more places that walk the list of control domains than
walk the list of monitor domains. Need to audit this set:

$ git grep list_for_each.*domains -- arch/x86/kernel/cpu/resctrl
arch/x86/kernel/cpu/resctrl/core.c:        list_for_each_entry(d, &r->domains, list) {
arch/x86/kernel/cpu/resctrl/core.c:        list_for_each(l, &r->domains) {
arch/x86/kernel/cpu/resctrl/ctrlmondata.c: list_for_each_entry(d, &r->domains, list) {
arch/x86/kernel/cpu/resctrl/ctrlmondata.c: list_for_each_entry(d, &r->domains, list) {
arch/x86/kernel/cpu/resctrl/ctrlmondata.c: list_for_each_entry(dom, &r->domains, list) {
arch/x86/kernel/cpu/resctrl/monitor.c:     list_for_each_entry(d, &r->domains, list) {
arch/x86/kernel/cpu/resctrl/pseudo_lock.c: list_for_each_entry(d_i, &r->domains, list) {
arch/x86/kernel/cpu/resctrl/rdtgroup.c:    list_for_each_entry(dom, &r->domains, list)
arch/x86/kernel/cpu/resctrl/rdtgroup.c:    list_for_each_entry(dom, &r->domains, list) {
arch/x86/kernel/cpu/resctrl/rdtgroup.c:    list_for_each_entry(d, &r->domains, list) {
arch/x86/kernel/cpu/resctrl/rdtgroup.c:    list_for_each_entry(d, &r->domains, list) {
arch/x86/kernel/cpu/resctrl/rdtgroup.c:    list_for_each_entry(dom, &r->domains, list) {
arch/x86/kernel/cpu/resctrl/rdtgroup.c:    list_for_each_entry(d, &r->domains, list) {
arch/x86/kernel/cpu/resctrl/rdtgroup.c:    list_for_each_entry(d, &r_l->domains, list) {
arch/x86/kernel/cpu/resctrl/rdtgroup.c:    list_for_each_entry(d, &r->domains, list) {
arch/x86/kernel/cpu/resctrl/rdtgroup.c:    list_for_each_entry(dom, &r->domains, list)
arch/x86/kernel/cpu/resctrl/rdtgroup.c:    list_for_each_entry(d, &r->domains, list) {
arch/x86/kernel/cpu/resctrl/rdtgroup.c:    list_for_each_entry(dom, &r->domains, list) {
arch/x86/kernel/cpu/resctrl/rdtgroup.c:    list_for_each_entry(d, &s->res->domains, list) {
arch/x86/kernel/cpu/resctrl/rdtgroup.c:    list_for_each_entry(d, &r->domains, list) {

Maybe "domains" can keep its name and make a
"list_for_each_monitor_domain()" macro to pick the right list to walk?

I don't think this will reduce the amount of code change in a
significant way. But it may be conceptually easier to follow
what is going on.

-Tony
Hi Tony,

On 7/18/2023 3:57 PM, Tony Luck wrote:
> On Tue, Jul 18, 2023 at 01:40:32PM -0700, Reinette Chatre wrote:
>>> +	[RDT_RESOURCE_NODE] =
>>> +	{
>>> +		.r_resctrl = {
>>> +			.rid = RDT_RESOURCE_NODE,
>>> +			.name = "L3",
>>> +			.scope = SCOPE_NODE,
>>> +			.domains = domain_init(RDT_RESOURCE_NODE),
>>> +			.fflags = 0,
>>> +		},
>>> +	},
>>> };
>>
>> So the new resource has the same name, from user perspective,
>> as RDT_RESOURCE_L3. From this perspective it thus seems to be a
>> shadow of RDT_RESOURCE_L3 that is used as alternative for some properties
>> of the actual RDT_RESOURCE_L3? This is starting to look as though this
>> solution is wrenching itself into current architecture.
>>
>> From what I can tell the monitoring in SNC environment needs a different
>> domain list because of the change in scope. What else is needed in the
>> resource that is different from the existing L3 resource? Could the
>> monitoring scope of a resource not instead be made distinct from its
>> allocation scope? By default monitoring and allocation scope will be
>> the same and thus use the same domain list but when SNC is enabled
>> then monitoring uses a different domain list.
>
> Answering this part first, because my choice here affects a bunch
> of the code that also raised comments from you.

Indeed.

> The crux of the issue is that when SNC mode is enabled the scope
> for L3 monitoring functions changes to "node" scope, while the
> scope of L3 control functions (CAT, CDP) remains at L3 cache scope.
>
> My solution was to just create a new resource. But you have an
> interesting alternate solution. Add an extra domain list to the
> resource structure to allow creation of distinct domain lists
> for this case where the scope for control and monitor functions
> differs.
>
> So change the resource structure like this:
>
> diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
> index 8334eeacfec5..01590aa59a67 100644
> --- a/include/linux/resctrl.h
> +++ b/include/linux/resctrl.h
> @@ -168,10 +168,12 @@ struct rdt_resource {
>  	bool			alloc_capable;
>  	bool			mon_capable;
>  	int			num_rmid;
> -	int			cache_level;
> +	int			ctrl_scope;
> +	int			mon_scope;

I am not sure about getting rid of cache_level so fast. I see regarding
the current problem being solved that ctrl_scope would have the same
values as cache_level but I find that adding this level of indirection
while keeping the comparison with cacheinfo->level to create a trap for
future mistakes.

>  	struct resctrl_cache	cache;
>  	struct resctrl_membw	membw;
> -	struct list_head	domains;
> +	struct list_head	ctrl_domains;
> +	struct list_head	mon_domains;
>  	char			*name;
>  	int			data_width;
>  	u32			default_ctrl;
>
> and build/use separate domain lists for when this resource is
> being referenced for allocation/monitoring. E.g. domain_add_cpu()
> would check "r->alloc_capable" and add a cpu to the ctrl_domains
> list based on the ctrl_scope value. It would do the same with
> mon_capable / mon_domains / mon_scope.
>
> If ctrl_scope == mon_scope, just build one list as you suggest above.

Yes, this is the idea. Thank you for considering it. Something else
to consider that may make this even cleaner/simpler would be to review
struct rdt_domain and struct rdt_hw_domain members for "monitor" vs
"control" usage. These structs could potentially be split further into
separate "control" and "monitor" variants. For example, "struct rdt_domain"
split into "struct rdt_ctrl_domain" and "struct rdt_mon_domain". If there
is a clean split then resctrl can always create two lists with the
unnecessary duplication eliminated when two domain lists are created.
This would also eliminate the need to scatter ctrl_scope == mon_scope
checks throughout.

> Maybe there are more places that walk the list of control domains than
> walk the list of monitor domains. Need to audit this set:
>
> $ git grep list_for_each.*domains -- arch/x86/kernel/cpu/resctrl
> arch/x86/kernel/cpu/resctrl/core.c:        list_for_each_entry(d, &r->domains, list) {
> arch/x86/kernel/cpu/resctrl/core.c:        list_for_each(l, &r->domains) {
> arch/x86/kernel/cpu/resctrl/ctrlmondata.c: list_for_each_entry(d, &r->domains, list) {
> arch/x86/kernel/cpu/resctrl/ctrlmondata.c: list_for_each_entry(d, &r->domains, list) {
> arch/x86/kernel/cpu/resctrl/ctrlmondata.c: list_for_each_entry(dom, &r->domains, list) {
> arch/x86/kernel/cpu/resctrl/monitor.c:     list_for_each_entry(d, &r->domains, list) {
> arch/x86/kernel/cpu/resctrl/pseudo_lock.c: list_for_each_entry(d_i, &r->domains, list) {
> arch/x86/kernel/cpu/resctrl/rdtgroup.c:    list_for_each_entry(dom, &r->domains, list)
> arch/x86/kernel/cpu/resctrl/rdtgroup.c:    list_for_each_entry(dom, &r->domains, list) {
> arch/x86/kernel/cpu/resctrl/rdtgroup.c:    list_for_each_entry(d, &r->domains, list) {
> arch/x86/kernel/cpu/resctrl/rdtgroup.c:    list_for_each_entry(d, &r->domains, list) {
> arch/x86/kernel/cpu/resctrl/rdtgroup.c:    list_for_each_entry(dom, &r->domains, list) {
> arch/x86/kernel/cpu/resctrl/rdtgroup.c:    list_for_each_entry(d, &r->domains, list) {
> arch/x86/kernel/cpu/resctrl/rdtgroup.c:    list_for_each_entry(d, &r_l->domains, list) {
> arch/x86/kernel/cpu/resctrl/rdtgroup.c:    list_for_each_entry(d, &r->domains, list) {
> arch/x86/kernel/cpu/resctrl/rdtgroup.c:    list_for_each_entry(dom, &r->domains, list)
> arch/x86/kernel/cpu/resctrl/rdtgroup.c:    list_for_each_entry(d, &r->domains, list) {
> arch/x86/kernel/cpu/resctrl/rdtgroup.c:    list_for_each_entry(dom, &r->domains, list) {
> arch/x86/kernel/cpu/resctrl/rdtgroup.c:    list_for_each_entry(d, &s->res->domains, list) {
> arch/x86/kernel/cpu/resctrl/rdtgroup.c:    list_for_each_entry(d, &r->domains, list) {
>
> Maybe "domains" can keep its name and make a
> "list_for_each_monitor_domain()" macro to pick the right list to walk?

It is not clear to me how "domains" can keep its name. If I understand
the macro would be useful if scope always needs to be considered. I
wonder if the list walkers may not mostly just walk the appropriate
list directly if resctrl always creates separate "control domain" and
"monitor domain" lists.

> I don't think this will reduce the amount of code change in a
> significant way. But it may be conceptually easier to follow
> what is going on.

Reducing the amount of code changed is not a goal to me. If I understand
correctly I think that adapting resctrl to support different monitor and
control scope could create a foundation into which SNC can slot in
smoothly.

Reinette
> Yes, this is the idea. Thank you for considering it. Something else
> to consider that may make this even cleaner/simpler would be to review
> struct rdt_domain and struct rdt_hw_domain members for "monitor" vs "control"
> usage. These structs could potentially be split further into separate
> "control" and "monitor" variants. For example, "struct rdt_domain" split into
> "struct rdt_ctrl_domain" and "struct rdt_mon_domain". If there is a clean
> split then resctrl can always create two lists with the unnecessary duplication
> eliminated when two domain lists are created. This would also
> eliminate the need to scatter ctrl_scope == mon_scope checks throughout.

You might like what I'm doing in the "resctrl2" re-write[1].

Arch independent code that maintains the domain lists for a resource via
a cpuhp notifier just has this for the domain structure:

struct resctrl_domain {
	struct list_head	list;
	struct cpumask		cpu_mask;
	int			id;
	int			cache_size;
};

Each module managing a resource decides what extra information it wants
to carry in the domain. So the above structure is common to all, but it
is followed by whatever the resource module wants. E.g. the CBM masks
for each CLOSid for the CAT module. The module tells core code the size
to allocate.

"cache_size" is only there because the cache topology bits needed to
discover sizes of caches aren't exported. Both the "size" file and
pseudo-locking need to know the size.

It's also possible that you may hate it. There is zero sharing of
resource structures even if they have the same scope. This is because
all modules are independently loadable.

-Tony

[1] WIP snapshot at git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux.git
branch resctrl2_v65rc1. That doesn't have pseudo-locking, but most of the
rest of existing resctrl functionality is there.
Here's a quick hack to see how things might look with separate domain
lists in the "L3" resource.

For testing purposes on a non-SNC system I set ->mon_scope =
MON_SCOPE_NODE, but made domain_add_cpu() allocate the mondomains list
based on L3 scope ... just so I could check that I found all the places
where monitoring needs to use the mondomains list. The kernel doesn't
crash when running tools/testing/selftests/resctrl, and the tests all
pass. But that doesn't mean I didn't miss something.

Some restructuring of control vs. monitoring initialization might avoid
some of the code I duplicated in domain_add_cpu(). But this is intended
just as a "Is this what you meant?" before I dig deeper.

Overall, I think it is a cleaner approach than making a new "L3"
resource with different scope just for the SNC monitoring.

-Tony

---
diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
index 8334eeacfec5..e4b653088a22 100644
--- a/include/linux/resctrl.h
+++ b/include/linux/resctrl.h
@@ -151,9 +151,11 @@ struct resctrl_schema;
  * @mon_capable:	Is monitor feature available on this machine
  * @num_rmid:		Number of RMIDs available
  * @cache_level:	Which cache level defines scope of this resource
+ * @mon_scope:		Scope of this resource if different from cache_level
  * @cache:		Cache allocation related data
  * @membw:		If the component has bandwidth controls, their properties.
  * @domains:		All domains for this resource
+ * @mondomains:		Monitor domains for this resource (if mon_scope != 0)
  * @name:		Name to use in "schemata" file.
  * @data_width:		Character width of data when displaying
  * @default_ctrl:	Specifies default cache cbm or memory B/W percent.
@@ -169,9 +171,11 @@ struct rdt_resource {
 	bool			mon_capable;
 	int			num_rmid;
 	int			cache_level;
+	int			mon_scope;
 	struct resctrl_cache	cache;
 	struct resctrl_membw	membw;
 	struct list_head	domains;
+	struct list_head	mondomains;
 	char			*name;
 	int			data_width;
 	u32			default_ctrl;
@@ -184,6 +188,8 @@ struct rdt_resource {
 	bool			cdp_capable;
 };

+#define MON_SCOPE_NODE	1
+
 /**
  * struct resctrl_schema - configuration abilities of a resource presented to
  *			   user-space
@@ -217,8 +223,8 @@ int resctrl_arch_update_one(struct rdt_resource *r, struct rdt_domain *d,
 u32 resctrl_arch_get_config(struct rdt_resource *r, struct rdt_domain *d,
 			    u32 closid, enum resctrl_conf_type type);
-int resctrl_online_domain(struct rdt_resource *r, struct rdt_domain *d);
-void resctrl_offline_domain(struct rdt_resource *r, struct rdt_domain *d);
+int resctrl_online_domain(struct rdt_resource *r, struct rdt_domain *d, bool mon_setup);
+void resctrl_offline_domain(struct rdt_resource *r, struct rdt_domain *d, bool mon_teardown);

 /**
  * resctrl_arch_rmid_read() - Read the eventid counter corresponding to rmid
diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index 85ceaf9a31ac..c5e2ac2a60cf 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -511,7 +511,7 @@ void rdtgroup_kn_unlock(struct kernfs_node *kn);
 int rdtgroup_kn_mode_restrict(struct rdtgroup *r, const char *name);
 int rdtgroup_kn_mode_restore(struct rdtgroup *r, const char *name,
 			     umode_t mask);
-struct rdt_domain *rdt_find_domain(struct rdt_resource *r, int id,
+struct rdt_domain *rdt_find_domain(struct list_head *h, int id,
 				   struct list_head **pos);
 ssize_t rdtgroup_schemata_write(struct kernfs_open_file *of, char *buf,
 				size_t nbytes, loff_t off);
diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index 030d3b409768..545d563ba956 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -57,7 +57,7 @@ static void mba_wrmsr_amd(struct rdt_domain *d, struct msr_param *m,
 			  struct rdt_resource *r);

-#define domain_init(id) LIST_HEAD_INIT(rdt_resources_all[id].r_resctrl.domains)
+#define domain_init(id, field) LIST_HEAD_INIT(rdt_resources_all[id].r_resctrl.field)

 struct rdt_hw_resource rdt_resources_all[] = {
 	[RDT_RESOURCE_L3] =
@@ -66,7 +66,9 @@ struct rdt_hw_resource rdt_resources_all[] = {
 			.rid = RDT_RESOURCE_L3,
 			.name = "L3",
 			.cache_level = 3,
-			.domains = domain_init(RDT_RESOURCE_L3),
+			.mon_scope = MON_SCOPE_NODE, //FAKE
+			.domains = domain_init(RDT_RESOURCE_L3, domains),
+			.mondomains = domain_init(RDT_RESOURCE_L3, mondomains),
 			.parse_ctrlval = parse_cbm,
 			.format_str = "%d=%0*x",
 			.fflags = RFTYPE_RES_CACHE,
@@ -80,7 +82,7 @@ struct rdt_hw_resource rdt_resources_all[] = {
 			.rid = RDT_RESOURCE_L2,
 			.name = "L2",
 			.cache_level = 2,
-			.domains = domain_init(RDT_RESOURCE_L2),
+			.domains = domain_init(RDT_RESOURCE_L2, domains),
 			.parse_ctrlval = parse_cbm,
 			.format_str = "%d=%0*x",
 			.fflags = RFTYPE_RES_CACHE,
@@ -94,7 +96,7 @@ struct rdt_hw_resource rdt_resources_all[] = {
 			.rid = RDT_RESOURCE_MBA,
 			.name = "MB",
 			.cache_level = 3,
-			.domains = domain_init(RDT_RESOURCE_MBA),
+			.domains = domain_init(RDT_RESOURCE_MBA, domains),
 			.parse_ctrlval = parse_bw,
 			.format_str = "%d=%*u",
 			.fflags = RFTYPE_RES_MB,
@@ -106,7 +108,7 @@ struct rdt_hw_resource rdt_resources_all[] = {
 			.rid = RDT_RESOURCE_SMBA,
 			.name = "SMBA",
 			.cache_level = 3,
-			.domains = domain_init(RDT_RESOURCE_SMBA),
+			.domains = domain_init(RDT_RESOURCE_SMBA, domains),
 			.parse_ctrlval = parse_bw,
 			.format_str = "%d=%*u",
 			.fflags = RFTYPE_RES_MB,
@@ -384,14 +386,15 @@ void rdt_ctrl_update(void *arg)
 }

 /*
- * rdt_find_domain - Find a domain in a resource that matches input resource id
+ * rdt_find_domain - Find a domain in one of the lists for a resource that
+ * matches input resource id
  *
  * Search resource r's domain list to find the resource id. If the resource
  * id is found in a domain, return the domain. Otherwise, if requested by
  * caller, return the first domain whose id is bigger than the input id.
  * The domain list is sorted by id in ascending order.
  */
-struct rdt_domain *rdt_find_domain(struct rdt_resource *r, int id,
+struct rdt_domain *rdt_find_domain(struct list_head *h, int id,
 				   struct list_head **pos)
 {
 	struct rdt_domain *d;
@@ -400,7 +403,7 @@ struct rdt_domain *rdt_find_domain(struct rdt_resource *r, int id,
 	if (id < 0)
 		return ERR_PTR(-ENODEV);

-	list_for_each(l, &r->domains) {
+	list_for_each(l, h) {
 		d = list_entry(l, struct rdt_domain, list);
 		/* When id is found, return its domain. */
 		if (id == d->id)
@@ -508,7 +511,7 @@ static void domain_add_cpu(int cpu, struct rdt_resource *r)
 	struct rdt_domain *d;
 	int err;

-	d = rdt_find_domain(r, id, &add_pos);
+	d = rdt_find_domain(&r->domains, id, &add_pos);
 	if (IS_ERR(d)) {
 		pr_warn("Couldn't find cache id for CPU %d\n", cpu);
 		return;
@@ -536,6 +539,44 @@ static void domain_add_cpu(int cpu, struct rdt_resource *r)
 		return;
 	}

+	if (!r->mon_scope && r->mon_capable && arch_domain_mbm_alloc(r->num_rmid, hw_dom)) {
+		domain_free(hw_dom);
+		return;
+	}
+
+	list_add_tail(&d->list, add_pos);
+
+	err = resctrl_online_domain(r, d, r->mon_scope == 0);
+	if (err) {
+		list_del(&d->list);
+		domain_free(hw_dom);
+	}
+
+	if (r->mon_scope != MON_SCOPE_NODE)
+		return;
+
+	//id = cpu_to_node(cpu);
+	id = get_cpu_cacheinfo_id(cpu, r->cache_level); // FAKE
+	add_pos = NULL;
+	d = rdt_find_domain(&r->mondomains, id, &add_pos);
+	if (IS_ERR(d)) {
+		pr_warn("Couldn't find node id for CPU %d\n", cpu);
+		return;
+	}
+
+	if (d) {
+		cpumask_set_cpu(cpu, &d->cpu_mask);
+		return;
+	}
+
+	hw_dom = kzalloc_node(sizeof(*hw_dom), GFP_KERNEL, cpu_to_node(cpu));
+	if (!hw_dom)
+		return;
+
+	d = &hw_dom->d_resctrl;
+	d->id = id;
+	cpumask_set_cpu(cpu, &d->cpu_mask);
+
 	if (r->mon_capable && arch_domain_mbm_alloc(r->num_rmid, hw_dom)) {
 		domain_free(hw_dom);
 		return;
@@ -543,7 +584,7 @@ static void domain_add_cpu(int cpu, struct rdt_resource *r)

 	list_add_tail(&d->list, add_pos);

-	err = resctrl_online_domain(r, d);
+	err = resctrl_online_domain(r, d, true);
 	if (err) {
 		list_del(&d->list);
 		domain_free(hw_dom);
@@ -556,7 +597,7 @@ static void domain_remove_cpu(int cpu, struct rdt_resource *r)
 	struct rdt_hw_domain *hw_dom;
 	struct rdt_domain *d;

-	d = rdt_find_domain(r, id, NULL);
+	d = rdt_find_domain(&r->domains, id, NULL);
 	if (IS_ERR_OR_NULL(d)) {
 		pr_warn("Couldn't find cache id for CPU %d\n", cpu);
 		return;
@@ -565,7 +606,7 @@ static void domain_remove_cpu(int cpu, struct rdt_resource *r)

 	cpumask_clear_cpu(cpu, &d->cpu_mask);
 	if (cpumask_empty(&d->cpu_mask)) {
-		resctrl_offline_domain(r, d);
+		resctrl_offline_domain(r, d, r->mon_scope == 0);
 		list_del(&d->list);

 		/*
@@ -579,7 +620,7 @@ static void domain_remove_cpu(int cpu, struct rdt_resource *r)
 		return;
 	}

-	if (r == &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl) {
+	if (r->mon_scope == 0 && r == &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl) {
 		if (is_mbm_enabled() && cpu == d->mbm_work_cpu) {
 			cancel_delayed_work(&d->mbm_over);
 			mbm_setup_overflow_handler(d, 0);
@@ -590,6 +631,23 @@ static void domain_remove_cpu(int cpu, struct rdt_resource *r)
 			cqm_setup_limbo_handler(d, 0);
 		}
 	}
+
+	if (r->mon_scope != MON_SCOPE_NODE)
+		return;
+
+	id = cpu_to_node(cpu);
+	d = rdt_find_domain(&r->mondomains, id, NULL);
+	if (IS_ERR_OR_NULL(d)) {
+		pr_warn("Couldn't find node id for CPU %d\n", cpu);
+		return;
+	}
+
+	cpumask_clear_cpu(cpu, &d->cpu_mask);
+	if (cpumask_empty(&d->cpu_mask)) {
+		resctrl_offline_domain(r, d, true);
+		list_del(&d->list);
+		domain_free(hw_dom);
+	}
 }

 static void clear_closid_rmid(int cpu)
diff --git a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
index b44c487727d4..80033cb698d0 100644
--- a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
+++ b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
@@ -545,6 +545,7 @@ int rdtgroup_mondata_show(struct seq_file *m, void *arg)
 	struct rdt_resource *r;
 	union mon_data_bits md;
 	struct rdt_domain *d;
+	struct list_head *h;
 	struct rmid_read rr;
 	int ret = 0;

@@ -560,7 +561,8 @@ int rdtgroup_mondata_show(struct seq_file *m, void *arg)
 	evtid = md.u.evtid;

 	r = &rdt_resources_all[resid].r_resctrl;
-	d = rdt_find_domain(r, domid, NULL);
+	h = r->mon_scope ? &r->mondomains : &r->domains;
+	d = rdt_find_domain(h, domid, NULL);
 	if (IS_ERR_OR_NULL(d)) {
 		ret = -ENOENT;
 		goto out;
diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index ded1fc7cb7cb..08085202582a 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -335,12 +335,14 @@ static void add_rmid_to_limbo(struct rmid_entry *entry)
 {
 	struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
 	struct rdt_domain *d;
+	struct list_head *h;
 	int cpu, err;
 	u64 val = 0;

 	entry->busy = 0;
 	cpu = get_cpu();
-	list_for_each_entry(d, &r->domains, list) {
+	h = r->mon_scope ? &r->mondomains : &r->domains;
+	list_for_each_entry(d, h, list) {
 		if (cpumask_test_cpu(cpu, &d->cpu_mask)) {
 			err = resctrl_arch_rmid_read(r, d, entry->rmid,
 						     QOS_L3_OCCUP_EVENT_ID,
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index 725344048f85..fb5b23fcb6d4 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -1492,11 +1492,13 @@ static int mbm_config_show(struct seq_file *s, struct rdt_resource *r, u32 evtid
 {
 	struct mon_config_info mon_info = {0};
 	struct rdt_domain *dom;
+	struct list_head *h;
 	bool sep = false;

 	mutex_lock(&rdtgroup_mutex);

-	list_for_each_entry(dom, &r->domains, list) {
+	h = r->mon_scope ? &r->mondomains : &r->domains;
+	list_for_each_entry(dom, h, list) {
 		if (sep)
 			seq_puts(s, ";");

@@ -1599,6 +1601,7 @@ static int mon_config_write(struct rdt_resource *r, char *tok, u32 evtid)
 	char *dom_str = NULL, *id_str;
 	unsigned long dom_id, val;
 	struct rdt_domain *d;
+	struct list_head *h;
 	int ret = 0;

 next:
@@ -1619,7 +1622,8 @@ static int mon_config_write(struct rdt_resource *r, char *tok, u32 evtid)
 		return -EINVAL;
 	}

-	list_for_each_entry(d, &r->domains, list) {
+	h = r->mon_scope ? &r->mondomains : &r->domains;
+	list_for_each_entry(d, h, list) {
 		if (d->id == dom_id) {
 			ret = mbm_config_write_domain(r, d, evtid, val);
 			if (ret)
@@ -2465,6 +2469,7 @@ static int rdt_get_tree(struct fs_context *fc)
 	struct rdt_fs_context *ctx = rdt_fc2context(fc);
 	struct rdt_domain *dom;
 	struct rdt_resource *r;
+	struct list_head *h;
 	int ret;

 	cpus_read_lock();
@@ -2525,7 +2530,8 @@ static int rdt_get_tree(struct fs_context *fc)

 	if (is_mbm_enabled()) {
 		r = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
-		list_for_each_entry(dom, &r->domains, list)
+		h = r->mon_scope ? &r->mondomains : &r->domains;
+		list_for_each_entry(dom, h, list)
 			mbm_setup_overflow_handler(dom, MBM_OVERFLOW_INTERVAL);
 	}

@@ -2917,9 +2923,11 @@ static int mkdir_mondata_subdir_alldom(struct kernfs_node *parent_kn,
 				       struct rdtgroup *prgrp)
 {
 	struct rdt_domain *dom;
+	struct list_head *h;
 	int ret;

-	list_for_each_entry(dom, &r->domains, list) {
+	h = r->mon_scope ?
&r->mondomains : &r->domains; + list_for_each_entry(dom, h, list) { ret = mkdir_mondata_subdir(parent_kn, dom, r, prgrp); if (ret) return ret; @@ -3708,14 +3716,14 @@ static void domain_destroy_mon_state(struct rdt_domain *d) kfree(d->mbm_local); } -void resctrl_offline_domain(struct rdt_resource *r, struct rdt_domain *d) +void resctrl_offline_domain(struct rdt_resource *r, struct rdt_domain *d, bool mon_teardown) { lockdep_assert_held(&rdtgroup_mutex); if (supports_mba_mbps() && r->rid == RDT_RESOURCE_MBA) mba_sc_domain_destroy(r, d); - if (!r->mon_capable) + if (!mon_teardown || !r->mon_capable) return; /* @@ -3773,7 +3781,7 @@ static int domain_setup_mon_state(struct rdt_resource *r, struct rdt_domain *d) return 0; } -int resctrl_online_domain(struct rdt_resource *r, struct rdt_domain *d) +int resctrl_online_domain(struct rdt_resource *r, struct rdt_domain *d, bool mon_setup) { int err; @@ -3783,7 +3791,7 @@ int resctrl_online_domain(struct rdt_resource *r, struct rdt_domain *d) /* RDT_RESOURCE_MBA is never mon_capable */ return mba_sc_domain_allocate(r, d); - if (!r->mon_capable) + if (!mon_setup || !r->mon_capable) return 0; err = domain_setup_mon_state(r, d);
Hi Tony,

On 7/18/2023 5:11 PM, Luck, Tony wrote:
>> Yes, this is the idea. Thank you for considering it. Something else
>> to consider that may make this even cleaner/simpler would be to review
>> struct rdt_domain and struct rdt_hw_domain members for "monitor" vs "control"
>> usage. These structs could potentially be split further into separate
>> "control" and "monitor" variants. For example, "struct rdt_domain" split into
>> "struct rdt_ctrl_domain" and "struct rdt_mon_domain". If there is a clean
>> split then resctrl can always create two lists with the unnecessary duplication
>> eliminated when two domain lists are created. This would also
>> eliminate the need to scatter ctrl_scope == mon_scope checks throughout.
>
> You might like what I'm doing in the "resctrl2" re-write[1]. Arch independent code
> that maintains the domain lists for a resource via a cpuhp notifier just has this
> for the domain structure:
>
> struct resctrl_domain {
> 	struct list_head	list;
> 	struct cpumask		cpu_mask;
> 	int			id;
> 	int			cache_size;
> };
>
> Each module managing a resource decides what extra information it wants to
> carry in the domain. So the above structure is common to all, but it is followed
> by whatever the resource module wants. E.g. the CBM masks for each CLOSid
> for the CAT module. The module tells core code the size to allocate.

hmmm ... what I am *hearing* you say is that the goodness from the
rewrite can be added to resctrl? :)

> "cache_size" is only there because the cache topology bits needed to discover
> sizes of caches aren't exported. Both the "size" file and pseudo-locking need
> to know the size.
>
> It's also possible that you may hate it. There is zero sharing of resource
> structures even if they have the same scope. This is because all modules are
> independently loadable.

Apologies but I am still unable to understand the problem statement that
motivates the rewrite.

Reinette
Hi Tony,

On 7/19/2023 5:20 PM, Tony Luck wrote:
> Here's a quick hack to see how things might look with
> separate domain lists in the "L3" resource.
>
> For testing purposes on a non-SNC system I set ->mon_scope =
> MON_SCOPE_NODE, but made domain_add_cpu() allocate the mondomains
> list based on L3 scope ... just so I could check that I found all
> the places where monitoring needs to use the mondomains list.
> The kernel doesn't crash when running tools/testing/selftests/resctrl,
> and the tests all pass. But that doesn't mean I didn't miss something.
>
> Some restructuring of control vs. monitoring initialization might
> avoid some of the code I duplicated in domain_add_cpu(). But this
> is intended just as a "Is this what you meant?" before I dig deeper.

Thank you for considering the approach. I find that this sample moves
towards the idea while also highlighting what else can be considered.
I do not know if you already considered these ideas and found them
flawed, so I will try to make them explicit so that you can point out
to me where things will fall apart.

The sample code introduces a new list "mondomains" that is intended to
be used when the monitoring scope is different from the allocation
scope. This introduces duplication when the monitoring and allocation
scope is different. Each list, "domains" and "mondomains", will host
structures that can accommodate both monitoring and allocation data,
with the data not relevant to the list going unused as it is
unnecessarily duplicated.

Additionally this forces significant portions of resctrl to now always
consider whether the monitoring and allocation scope is different ...
note how this sample now has code like below scattered throughout:

	h = r->mon_scope ? &r->mondomains : &r->domains;

I also find that domain_add_cpu() becomes intricate as it needs to
navigate all the different scenarios.
This unnecessary duplication, the new environment checks needed
throughout, and the additional code complexity are red flags to me that
this solution is not well integrated.

To deal with these complexities I would like to consider whether it may
make things simpler to always (irrespective of allocation and
monitoring scope) maintain separate allocation and monitoring domain
lists. Each list need only carry data appropriate to its use ... the
allocation list only has data relevant to allocation, the monitoring
list only has data relevant to monitoring. This is the struct
rdt_domain related split I mentioned previously.

Code could become something like:

	resctrl_online_cpu()
	{
		...
		for_each_alloc_capable_rdt_resource(r)
			alloc_domain_add_cpu(...)
		for_each_mon_capable_rdt_resource(r)
			mon_domain_add_cpu(...)
		...
	}

This would reduce complication in domain_add_cpu() since each domain
list need only concern itself with monitoring or allocation. Even
resctrl_online_domain() can be simplified significantly by making it
specific to allocation or monitoring. For example,
resctrl_online_mon_domain() would only and always just run the
monitoring related code.

With separate allocation and monitoring domain lists there may no
longer be a need to scatter the code with checks like:

	h = r->mon_scope ? &r->mondomains : &r->domains;

This would be because the code can directly pick the domain list it is
operating on.

What do you think? The above is just refactoring of existing code and
from what I can tell this would make supporting SNC straightforward.

Reinette
> To deal with these complexities I would like to consider if it may
> make things simpler to always (irrespective of allocation and
> monitoring scope) maintain allocation and monitoring domain lists.
> Each list need only carry data appropriate to its use ... the allocation
> list only has data relevant to allocation, the monitoring list only
> has data relevant to monitoring. This is the struct rdt_domain related
> split I mentioned previously.
>
> Code could become something like:
>
> 	resctrl_online_cpu()
> 	{
> 		...
> 		for_each_alloc_capable_rdt_resource(r)
> 			alloc_domain_add_cpu(...)
> 		for_each_mon_capable_rdt_resource(r)
> 			mon_domain_add_cpu(...)
> 		...
> 	}
>
> This would reduce complication in domain_add_cpu() since each domain list
> only need to concern itself with monitoring or allocation.

This does seem a worthy target.

I started on a patch to do this ... but I'm not sure I have the stamina
or the time to see it through.

I split struct rdt_domain into rdt_ctrl_domain and rdt_mon_domain. But
that led to also splitting the rdt_hw_domain structure into two, and
then splitting the resctrl_to_arch_dom() function, and then another and
another.

That process will eventually converge (there are a finite number of
lines of code) ... but it will be a big patch. I don't see how to stage
it a piece at a time.

-Tony
On Thu, Jul 20, 2023 at 09:56:50PM +0000, Luck, Tony wrote:
> This does seem a worthy target.
>
> I started on a patch to do this ... but I'm not sure I have the stamina
> or the time to see it through.

I was being a wuss. I came back at this on Friday from a slightly
different perspective, and it all came together fairly easily.

New series posted here:

https://lore.kernel.org/all/20230722190740.326190-1-tony.luck@intel.com/

-Tony
diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index 8275b8a74f7e..243017096ddf 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -435,6 +435,7 @@ enum resctrl_res_level {
	RDT_RESOURCE_L2,
	RDT_RESOURCE_MBA,
	RDT_RESOURCE_SMBA,
+	RDT_RESOURCE_NODE,

	/* Must be the last */
	RDT_NUM_RESOURCES,
@@ -442,7 +443,8 @@ enum resctrl_res_level {

 enum resctrl_scope {
	SCOPE_L2_CACHE = 2,
-	SCOPE_L3_CACHE = 3
+	SCOPE_L3_CACHE = 3,
+	SCOPE_NODE,
 };

 static inline struct rdt_resource *resctrl_inc(struct rdt_resource *res)
diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index 6571514752f3..e4bd3072927c 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -112,6 +112,16 @@ struct rdt_hw_resource rdt_resources_all[] = {
			.fflags		= RFTYPE_RES_MB,
		},
	},
+	[RDT_RESOURCE_NODE] =
+	{
+		.r_resctrl = {
+			.rid		= RDT_RESOURCE_NODE,
+			.name		= "L3",
+			.scope		= SCOPE_NODE,
+			.domains	= domain_init(RDT_RESOURCE_NODE),
+			.fflags		= 0,
+		},
+	},
 };

 /*
@@ -489,6 +499,8 @@ static int arch_domain_mbm_alloc(u32 num_rmid, struct rdt_hw_domain *hw_dom)

 static int get_domain_id(int cpu, enum resctrl_scope scope)
 {
+	if (scope == SCOPE_NODE)
+		return cpu_to_node(cpu);
	return get_cpu_cacheinfo_id(cpu, scope);
 }