[3/7] x86/resctrl: Add a new node-scoped resource to rdt_resources_all[]

Message ID 20230126184157.27626-4-tony.luck@intel.com
State New
Headers
Series x86/resctrl: Add support for Sub-NUMA cluster (SNC) systems |

Commit Message

Luck, Tony Jan. 26, 2023, 6:41 p.m. UTC
  Add a placeholder in the array of struct rdt_hw_resource to be used
for event monitoring of systems with Sub-NUMA Cluster enabled.

Update get_domain_id() to handle SCOPE_NODE.

Signed-off-by: Tony Luck <tony.luck@intel.com>
---
 arch/x86/kernel/cpu/resctrl/internal.h |  2 ++
 arch/x86/kernel/cpu/resctrl/core.c     | 12 ++++++++++++
 2 files changed, 14 insertions(+)
  

Comments

Fenghua Yu Jan. 27, 2023, 5:24 a.m. UTC | #1
Hi, Tony,

On 1/26/23 10:41, Tony Luck wrote:
> Add a placeholder in the array of struct rdt_hw_resource to be used
> for event monitoring of systems with Sub-NUMA Cluster enabled.
> 
> Update get_domain_id() to handle SCOPE_NODE.
> 
> Signed-off-by: Tony Luck <tony.luck@intel.com>
> ---
>   arch/x86/kernel/cpu/resctrl/internal.h |  2 ++
>   arch/x86/kernel/cpu/resctrl/core.c     | 12 ++++++++++++
>   2 files changed, 14 insertions(+)
> 
> diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
> index 15cea517efaa..39a62babd60b 100644
> --- a/arch/x86/kernel/cpu/resctrl/internal.h
> +++ b/arch/x86/kernel/cpu/resctrl/internal.h
> @@ -409,12 +409,14 @@ enum resctrl_res_level {
>   	RDT_RESOURCE_L3,
>   	RDT_RESOURCE_L2,
>   	RDT_RESOURCE_MBA,
> +	RDT_RESOURCE_NODE,
>   
>   	/* Must be the last */
>   	RDT_NUM_RESOURCES,
>   };
>   
>   enum resctrl_scope {
> +	SCOPE_NODE,
>   	SCOPE_L2_CACHE = 2,
>   	SCOPE_L3_CACHE = 3
>   };
> diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
> index 6914232acf84..19be6fe42ef3 100644
> --- a/arch/x86/kernel/cpu/resctrl/core.c
> +++ b/arch/x86/kernel/cpu/resctrl/core.c
> @@ -100,6 +100,16 @@ struct rdt_hw_resource rdt_resources_all[] = {
>   			.fflags			= RFTYPE_RES_MB,
>   		},
>   	},
> +	[RDT_RESOURCE_NODE] =
> +	{
> +		.r_resctrl = {
> +			.rid			= RDT_RESOURCE_NODE,
> +			.name			= "L3",
"L3" was named as RDT_RESOURCE_L3 already. The duplicate name here may
cause duplicate file names in info dir. Maybe rename it as "L3_NODE"?

> +			.scope			= SCOPE_NODE,
> +			.domains		= domain_init(RDT_RESOURCE_NODE),
> +			.fflags			= RFTYPE_RES_MB,

I'm not sure if fflags is RFTYPE_RES_MB | RFTYPE_RES_L3 for both cache
and MB?

> +		},
> +	},
>   };
>   
>   /*
> @@ -464,6 +474,8 @@ static int arch_domain_mbm_alloc(u32 num_rmid, struct rdt_hw_domain *hw_dom)
>   
>   static int get_domain_id(int cpu, enum resctrl_scope scope)
>   {
> +	if (scope == SCOPE_NODE)
> +		return cpu_to_node(cpu);
>   	return get_cpu_cacheinfo_id(cpu, scope);
>   }
>   

Thanks.

-Fenghua
  
Peter Newman Jan. 27, 2023, 4:02 p.m. UTC | #2
Hi Tony,

On Fri, Jan 27, 2023 at 6:25 AM Yu, Fenghua <fenghua.yu@intel.com> wrote:
>
> On 1/26/23 10:41, Tony Luck wrote:
> > diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
> > index 6914232acf84..19be6fe42ef3 100644
> > --- a/arch/x86/kernel/cpu/resctrl/core.c
> > +++ b/arch/x86/kernel/cpu/resctrl/core.c
> > @@ -100,6 +100,16 @@ struct rdt_hw_resource rdt_resources_all[] = {
> >                       .fflags                 = RFTYPE_RES_MB,
> >               },
> >       },
> > +     [RDT_RESOURCE_NODE] =
> > +     {
> > +             .r_resctrl = {
> > +                     .rid                    = RDT_RESOURCE_NODE,
> > +                     .name                   = "L3",
> "L3" was named as RDT_RESOURCE_L3 already. The duplicate name here may
> cause duplicate file names in info dir. Maybe rename it as "L3_NODE"?

I'm trying to get some feedback from our own users on whether changing
the directory names would bother them. At least from my own testing, I
did learn to appreciate the interface change a bit more: I needed an
SNC and non-SNC case to correctly predict which mon_data subdirectory
the data would appear in.

I was able to confirm that this change allows bandwidth to be counted
on RMID/CPU combos where it didn't work before on an SNC2
configuration.

If I'm understanding this correctly, it might be helpful to highlight
that the extra resource is needed to allow a different number of L3
domains in L3 monitoring vs allocation.

Thanks!
-Peter

Tested-By: Peter Newman <peternewman@google.com>
Reviewed-By: Peter Newman <peternewman@google.com>
  
Luck, Tony Jan. 27, 2023, 6:23 p.m. UTC | #3
>> +	[RDT_RESOURCE_NODE] =
>> +	{
>> +		.r_resctrl = {
>> +			.rid			= RDT_RESOURCE_NODE,
>> +			.name			= "L3",

> "L3" was named as RDT_RESOURCE_L3 already. The duplicate name here may
> cause duplicate file names in info dir. Maybe rename it as "L3_NODE"?

I thought the same, and my first implementation used a different string here (I picked
"NODE" rather than "L3_NODE").

But my testers complained that this broke all their existing infrastructure that reads
cache occupancy and memory bandwidth. This string is not just used in the info/
directory, it is also the basis for the directory names in mon_data/

$ tree /sys/fs/resctrl/mon_data
/sys/fs/resctrl/mon_data
├── mon_L3_00
│   ├── llc_occupancy
│   ├── mbm_local_bytes
│   └── mbm_total_bytes
├── mon_L3_01
│   ├── llc_occupancy
│   ├── mbm_local_bytes
│   └── mbm_total_bytes
├── mon_L3_02
│   ├── llc_occupancy
│   ├── mbm_local_bytes
│   └── mbm_total_bytes
└── mon_L3_03
    ├── llc_occupancy
    ├── mbm_local_bytes
    └── mbm_total_bytes

The name using "L3" is still appropriate and accurate.

There isn't a "duplicate file names" problem in the info/ directory because a system
either has SNC disabled, and uses the L3-scoped resource, or has SNC enabled and
uses the node-scoped resource.

-Tony
  
Fenghua Yu Jan. 28, 2023, 2:22 a.m. UTC | #4
Hi, Tony,

On 1/26/23 10:41, Tony Luck wrote:
> diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
> index 6914232acf84..19be6fe42ef3 100644
> --- a/arch/x86/kernel/cpu/resctrl/core.c
> +++ b/arch/x86/kernel/cpu/resctrl/core.c
> @@ -100,6 +100,16 @@ struct rdt_hw_resource rdt_resources_all[] = {
>   			.fflags			= RFTYPE_RES_MB,
>   		},
>   	},
> +	[RDT_RESOURCE_NODE] =
> +	{
> +		.r_resctrl = {
> +			.rid			= RDT_RESOURCE_NODE,
> +			.name			= "L3",
> +			.scope			= SCOPE_NODE,
> +			.domains		= domain_init(RDT_RESOURCE_NODE),
> +			.fflags			= RFTYPE_RES_MB,

RFTYPE_RES_MB is for the resource to add some files in info/MB.
But the NODE resource doesn't have any files to show in info/MB.

Only shown file for the NODE resource is info/L3_MON/snc_ways. But this
file doesn't need to set fflags.

Maybe

Thanks.

-Fenghua
  
Fenghua Yu Jan. 28, 2023, 2:25 a.m. UTC | #5
Hi, Tony,

On 1/27/23 10:23, Luck, Tony wrote:
>>> +	[RDT_RESOURCE_NODE] =
>>> +	{
>>> +		.r_resctrl = {
>>> +			.rid			= RDT_RESOURCE_NODE,
>>> +			.name			= "L3",
> 
>> "L3" was named as RDT_RESOURCE_L3 already. The duplicate name here may
>> cause duplicate file names in info dir. Maybe rename it as "L3_NODE"?
> 
> I thought the same, and my first implementation used a different string here (I picked
> "NODE" rather than "L3_NODE").
> 
> But my testers complained that this broke all their existing infrastructure that reads
> cache occupancy and memory bandwidth. This string is not just used in the info/
> directory, it is also the basis for the directory names in mon_data/
> 
> $ tree /sys/fs/resctrl/mon_data
> /sys/fs/resctrl/mon_data
> ├── mon_L3_00
> │   ├── llc_occupancy
> │   ├── mbm_local_bytes
> │   └── mbm_total_bytes
> ├── mon_L3_01
> │   ├── llc_occupancy
> │   ├── mbm_local_bytes
> │   └── mbm_total_bytes
> ├── mon_L3_02
> │   ├── llc_occupancy
> │   ├── mbm_local_bytes
> │   └── mbm_total_bytes
> └── mon_L3_03
>      ├── llc_occupancy
>      ├── mbm_local_bytes
>      └── mbm_total_bytes
> 
> The name using "L3" is still appropriate and accurate.
> 
> There isn't a "duplicate file names" problem in the info/ directory because a system
> either has SNC disabled, and uses the L3-scoped resource, or has SNC enabled and
> uses the node-scoped resource.

That's right.

Thank you for your info!

-Fenghua
  
Fenghua Yu Jan. 28, 2023, 2:36 a.m. UTC | #6
Hi, Tony,

On 1/26/23 10:41, Tony Luck wrote:
> diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
> index 6914232acf84..19be6fe42ef3 100644
> --- a/arch/x86/kernel/cpu/resctrl/core.c
> +++ b/arch/x86/kernel/cpu/resctrl/core.c
> @@ -100,6 +100,16 @@ struct rdt_hw_resource rdt_resources_all[] = {
>   			.fflags			= RFTYPE_RES_MB,
>   		},
>   	},
> +	[RDT_RESOURCE_NODE] =
> +	{
> +		.r_resctrl = {
> +			.rid			= RDT_RESOURCE_NODE,
> +			.name			= "L3",
> +			.scope			= SCOPE_NODE,
> +			.domains		= domain_init(RDT_RESOURCE_NODE),
> +			.fflags			= RFTYPE_RES_MB,

RFTYPE_RES_MB is for the resource to add some files in info/MB.
But the NODE resource doesn't have any files to show in info/MB.

Only shown file for the NODE resource is info/L3_MON/snc_ways. But this 
file doesn't need to set fflags.

Seems no need to set fflags or fflags=0 to eliminate confusion?

Thanks.

-Fenghua
  
Luck, Tony Jan. 30, 2023, 7:04 p.m. UTC | #7
>> +			.domains		= domain_init(RDT_RESOURCE_NODE),
>> +			.fflags			= RFTYPE_RES_MB,

> RFTYPE_RES_MB is for the resource to add some files in info/MB.
> But the NODE resource doesn't have any files to show in info/MB.
>
> Only shown file for the NODE resource is info/L3_MON/snc_ways. But this 
> file doesn't need to set fflags.
>
> Seems no need to set fflags or fflags=0 to eliminate confusion?

I just cut & pasted from the L3 ... oops. I think you may be right that
an explicit ".fflags = 0" would be best here.

-Tony
  
Moger, Babu Feb. 28, 2023, 2:27 p.m. UTC | #8
Hi Tony,
Sorry for the late response. I was looking at your patches.

Do you really need a new resource [RDT_RESOURCE_NODE] to handle this 
feature?

As far as I can see, all that matters is writing/reading the MSRs 
IA32_PQR_ASSOC and IA32_QM_EVTSEL based on cpu index. I think that can 
be done without having the new resource. Let me know if I have 
misunderstood something.

Thanks
Babu


> -----Original Message-----
> From: Tony Luck <tony.luck@intel.com>
> Sent: Thursday, January 26, 2023 12:42 PM
> To: Fenghua Yu <fenghua.yu@intel.com>; Reinette Chatre
> <reinette.chatre@intel.com>; Peter Newman <peternewman@google.com>;
> Jonathan Corbet <corbet@lwn.net>; x86@kernel.org
> Cc: Shaopeng Tan <tan.shaopeng@fujitsu.com>; James Morse
> <james.morse@arm.com>; Jamie Iles <quic_jiles@quicinc.com>; Moger, Babu
> <Babu.Moger@amd.com>; linux-kernel@vger.kernel.org; linux-
> doc@vger.kernel.org; patches@lists.linux.dev; Tony Luck
> <tony.luck@intel.com>
> Subject: [PATCH 3/7] x86/resctrl: Add a new node-scoped resource to
> rdt_resources_all[]
> 
> Add a placeholder in the array of struct rdt_hw_resource to be used for event
> monitoring of systems with Sub-NUMA Cluster enabled.
> 
> Update get_domain_id() to handle SCOPE_NODE.
> 
> Signed-off-by: Tony Luck <tony.luck@intel.com>
> ---
>  arch/x86/kernel/cpu/resctrl/internal.h |  2 ++
>  arch/x86/kernel/cpu/resctrl/core.c     | 12 ++++++++++++
>  2 files changed, 14 insertions(+)
> 
> diff --git a/arch/x86/kernel/cpu/resctrl/internal.h
> b/arch/x86/kernel/cpu/resctrl/internal.h
> index 15cea517efaa..39a62babd60b 100644
> --- a/arch/x86/kernel/cpu/resctrl/internal.h
> +++ b/arch/x86/kernel/cpu/resctrl/internal.h
> @@ -409,12 +409,14 @@ enum resctrl_res_level {
>  	RDT_RESOURCE_L3,
>  	RDT_RESOURCE_L2,
>  	RDT_RESOURCE_MBA,
> +	RDT_RESOURCE_NODE,
> 
>  	/* Must be the last */
>  	RDT_NUM_RESOURCES,
>  };
> 
>  enum resctrl_scope {
> +	SCOPE_NODE,
>  	SCOPE_L2_CACHE = 2,
>  	SCOPE_L3_CACHE = 3
>  };
> diff --git a/arch/x86/kernel/cpu/resctrl/core.c
> b/arch/x86/kernel/cpu/resctrl/core.c
> index 6914232acf84..19be6fe42ef3 100644
> --- a/arch/x86/kernel/cpu/resctrl/core.c
> +++ b/arch/x86/kernel/cpu/resctrl/core.c
> @@ -100,6 +100,16 @@ struct rdt_hw_resource rdt_resources_all[] = {
>  			.fflags			= RFTYPE_RES_MB,
>  		},
>  	},
> +	[RDT_RESOURCE_NODE] =
> +	{
> +		.r_resctrl = {
> +			.rid			= RDT_RESOURCE_NODE,
> +			.name			= "L3",
> +			.scope			= SCOPE_NODE,
> +			.domains		=
> domain_init(RDT_RESOURCE_NODE),
> +			.fflags			= RFTYPE_RES_MB,
> +		},
> +	},
>  };
> 
>  /*
> @@ -464,6 +474,8 @@ static int arch_domain_mbm_alloc(u32 num_rmid,
> struct rdt_hw_domain *hw_dom)
> 
>  static int get_domain_id(int cpu, enum resctrl_scope scope)  {
> +	if (scope == SCOPE_NODE)
> +		return cpu_to_node(cpu);
>  	return get_cpu_cacheinfo_id(cpu, scope);  }
> 
> --
> 2.39.1
  
Luck, Tony Feb. 28, 2023, 5:05 p.m. UTC | #9
Babu,

Thanks for looking at my patches.

> Do you really need a new resource [RDT_RESOURCE_NODE] to handle this 
> feature?

Yes. When sub-numa cluster mode is enabled, there are separate counts for
each "node" on the socket. This new resource is the key to creating extra
directories in the /sys/fs/resctrl/mon_data/ area so that the memory bandwidth
and cache occupancy can be read for each node, instead of just for each socket.

But there are some other issues with this patch series. New version will be posted
once they are fixed up.

-Tony
  

Patch

diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index 15cea517efaa..39a62babd60b 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -409,12 +409,14 @@  enum resctrl_res_level {
 	RDT_RESOURCE_L3,
 	RDT_RESOURCE_L2,
 	RDT_RESOURCE_MBA,
+	RDT_RESOURCE_NODE,
 
 	/* Must be the last */
 	RDT_NUM_RESOURCES,
 };
 
 enum resctrl_scope {
+	SCOPE_NODE,
 	SCOPE_L2_CACHE = 2,
 	SCOPE_L3_CACHE = 3
 };
diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index 6914232acf84..19be6fe42ef3 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -100,6 +100,16 @@  struct rdt_hw_resource rdt_resources_all[] = {
 			.fflags			= RFTYPE_RES_MB,
 		},
 	},
+	[RDT_RESOURCE_NODE] =
+	{
+		.r_resctrl = {
+			.rid			= RDT_RESOURCE_NODE,
+			.name			= "L3",
+			.scope			= SCOPE_NODE,
+			.domains		= domain_init(RDT_RESOURCE_NODE),
+			.fflags			= RFTYPE_RES_MB,
+		},
+	},
 };
 
 /*
@@ -464,6 +474,8 @@  static int arch_domain_mbm_alloc(u32 num_rmid, struct rdt_hw_domain *hw_dom)
 
 static int get_domain_id(int cpu, enum resctrl_scope scope)
 {
+	if (scope == SCOPE_NODE)
+		return cpu_to_node(cpu);
 	return get_cpu_cacheinfo_id(cpu, scope);
 }