[4/7] x86/resctrl: Add code to setup monitoring at L3 or NODE scope.

Message ID 20230126184157.27626-5-tony.luck@intel.com
State New
Series x86/resctrl: Add support for Sub-NUMA cluster (SNC) systems

Commit Message

Luck, Tony Jan. 26, 2023, 6:41 p.m. UTC
When Sub-NUMA Cluster (SNC) mode is enabled (snc_ways > 1), use RDT_RESOURCE_NODE
instead of RDT_RESOURCE_L3 for all monitoring operations.

The mon_scale and num_rmid values from CPUID(0xf,0x1),(EBX,ECX) must be
scaled down by the number of SNC nodes that share each L3 cache (snc_ways).

A subsequent change will detect Sub-NUMA Cluster mode and set
"snc_ways". For now, it is set to one (meaning each L3 cache spans one
node).

Signed-off-by: Tony Luck <tony.luck@intel.com>
---
 arch/x86/kernel/cpu/resctrl/internal.h |  2 ++
 arch/x86/kernel/cpu/resctrl/core.c     | 13 ++++++++++++-
 arch/x86/kernel/cpu/resctrl/monitor.c  |  4 ++--
 arch/x86/kernel/cpu/resctrl/rdtgroup.c |  5 ++++-
 4 files changed, 20 insertions(+), 4 deletions(-)
  

Comments

James Morse Feb. 28, 2023, 5:13 p.m. UTC | #1
Hi Tony,

On 26/01/2023 18:41, Tony Luck wrote:
> When Sub-NUMA cluster is enabled (snc_ways > 1) use the RDT_RESOURCE_NODE
> instead of RDT_RESOURCE_L3 for all monitoring operations.
> 
> The mon_scale and num_rmid values from CPUID(0xf,0x1),(EBX,ECX) must be
> scaled down by the number of Sub-NUMA Clusters.
> 
> A subsequent change will detect sub-NUMA cluster mode and set
> "snc_ways". For now set to one (meaning each L3 cache spans one
> node).

(I'm looking at decoupling "monitoring is always on RDT_RESOURCE_L3" as a separate thing
from enabling SNC ... just in case my comments seem strange!)


> diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
> index 19be6fe42ef3..53b2ab37af2f 100644
> --- a/arch/x86/kernel/cpu/resctrl/core.c
> +++ b/arch/x86/kernel/cpu/resctrl/core.c
> @@ -786,7 +791,13 @@ static __init bool get_rdt_alloc_resources(void)
>  
>  static __init bool get_rdt_mon_resources(void)
>  {
> -	struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
> +	struct rdt_resource *r;
> +
> +	/* When SNC enabled, monitor functions at node instead of L3 cache scope */
> +	if (snc_ways > 1)
> +		r = &rdt_resources_all[RDT_RESOURCE_NODE].r_resctrl;
> +	else
> +		r = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;

Could this be hidden in a helper with some name like resctrl_arch_get_mbm_resource()?
You have the same pattern again in rdt_get_tree(). If this gets more complex in the
future, it means it's outside the filesystem code, and all in one place.


Thanks,

James


>  	if (rdt_cpu_has(X86_FEATURE_CQM_OCCUP_LLC))
>  		rdt_mon_features |= (1 << QOS_L3_OCCUP_EVENT_ID);

> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> index a6ba3080e5db..a0dc64a70d01 100644
> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> @@ -2238,7 +2238,10 @@ static int rdt_get_tree(struct fs_context *fc)
>  		static_branch_enable_cpuslocked(&rdt_enable_key);
>  
>  	if (is_mbm_enabled()) {
> -		r = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
> +		if (snc_ways > 1)
> +			r = &rdt_resources_all[RDT_RESOURCE_NODE].r_resctrl;
> +		else
> +			r = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
>  		list_for_each_entry(dom, &r->domains, list)
>  			mbm_setup_overflow_handler(dom, MBM_OVERFLOW_INTERVAL);
>  	}
  
Luck, Tony Feb. 28, 2023, 5:28 p.m. UTC | #2
>> +	/* When SNC enabled, monitor functions at node instead of L3 cache scope */
>> +	if (snc_ways > 1)
>> +		r = &rdt_resources_all[RDT_RESOURCE_NODE].r_resctrl;
>> +	else
>> +		r = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
>
> Could this be hidden in a helper with some name like resctrl_arch_get_mbm_resource()?
> You have the same pattern again in rdt_get_tree(). If this gets more complex in the
> future, it means its outside the filesystem code, and all in one place.

Sounds like a good idea.  Thanks.

-Tony
  

Patch

diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index 39a62babd60b..ad26d008dafa 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -405,6 +405,8 @@ DECLARE_STATIC_KEY_FALSE(rdt_alloc_enable_key);
 
 extern struct dentry *debugfs_resctrl;
 
+extern int snc_ways;
+
 enum resctrl_res_level {
 	RDT_RESOURCE_L3,
 	RDT_RESOURCE_L2,
diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index 19be6fe42ef3..53b2ab37af2f 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -48,6 +48,11 @@ int max_name_width, max_data_width;
  */
 bool rdt_alloc_capable;
 
+/*
+ * How many Sub-NUMA Cluster nodes share a single L3 cache
+ */
+int snc_ways = 1;
+
 static void
 mba_wrmsr_intel(struct rdt_domain *d, struct msr_param *m,
 		struct rdt_resource *r);
@@ -786,7 +791,13 @@ static __init bool get_rdt_alloc_resources(void)
 
 static __init bool get_rdt_mon_resources(void)
 {
-	struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
+	struct rdt_resource *r;
+
+	/* When SNC enabled, monitor functions at node instead of L3 cache scope */
+	if (snc_ways > 1)
+		r = &rdt_resources_all[RDT_RESOURCE_NODE].r_resctrl;
+	else
+		r = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
 
 	if (rdt_cpu_has(X86_FEATURE_CQM_OCCUP_LLC))
 		rdt_mon_features |= (1 << QOS_L3_OCCUP_EVENT_ID);
diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index d05bbd4f6b2d..3fc63aa68130 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -777,8 +777,8 @@ int rdt_get_mon_l3_config(struct rdt_resource *r)
 	int ret;
 
 	resctrl_rmid_realloc_limit = boot_cpu_data.x86_cache_size * 1024;
-	hw_res->mon_scale = boot_cpu_data.x86_cache_occ_scale;
-	r->num_rmid = boot_cpu_data.x86_cache_max_rmid + 1;
+	hw_res->mon_scale = boot_cpu_data.x86_cache_occ_scale / snc_ways;
+	r->num_rmid = (boot_cpu_data.x86_cache_max_rmid + 1) / snc_ways;
 	hw_res->mbm_width = MBM_CNTR_WIDTH_BASE;
 
 	if (mbm_offset > 0 && mbm_offset <= MBM_CNTR_WIDTH_OFFSET_MAX)
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index a6ba3080e5db..a0dc64a70d01 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -2238,7 +2238,10 @@ static int rdt_get_tree(struct fs_context *fc)
 		static_branch_enable_cpuslocked(&rdt_enable_key);
 
 	if (is_mbm_enabled()) {
-		r = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
+		if (snc_ways > 1)
+			r = &rdt_resources_all[RDT_RESOURCE_NODE].r_resctrl;
+		else
+			r = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
 		list_for_each_entry(dom, &r->domains, list)
 			mbm_setup_overflow_handler(dom, MBM_OVERFLOW_INTERVAL);
 	}