[v6,22/24] x86/resctrl: Add cpu offline callback for resctrl work
Commit Message
The resctrl architecture-specific code may need to free a domain when
a CPU goes offline; it also needs to reset the CPU's PQR_ASSOC register.
Amongst other things, the resctrl filesystem code needs to clear this
CPU from the cpu_mask of any control and monitor groups.
Currently this is all done in core.c and called from
resctrl_offline_cpu(), making the split between architecture and
filesystem code unclear.
Move the filesystem work to remove the CPU from the control and monitor
groups into a filesystem helper called resctrl_offline_cpu(), and rename
the one in core.c resctrl_arch_offline_cpu().
The rdtgroup_mutex is unlocked and locked again in the call in
preparation for changing the locking rules for the architecture
code.
Reviewed-by: Shaopeng Tan <tan.shaopeng@fujitsu.com>
Tested-by: Shaopeng Tan <tan.shaopeng@fujitsu.com>
Tested-by: Peter Newman <peternewman@google.com>
Signed-off-by: James Morse <james.morse@arm.com>
---
arch/x86/kernel/cpu/resctrl/core.c | 25 +++++--------------------
arch/x86/kernel/cpu/resctrl/rdtgroup.c | 24 ++++++++++++++++++++++++
include/linux/resctrl.h | 1 +
3 files changed, 30 insertions(+), 20 deletions(-)
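The split described in the commit message can be modelled in userspace as a rough sketch. Everything named here (group_lock(), fs_offline_cpu(), arch_offline_cpu(), the bitmask arrays) is a hypothetical stand-in, not the kernel code: a boolean plays the role of rdtgroup_mutex, an assert() plays the role of lockdep_assert_held(), and CPU numbers are assumed to be below 64.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Userspace stand-ins for illustration; not the kernel's definitions. */
#define NR_GROUPS 3

static bool group_lock_held;                /* models rdtgroup_mutex ownership */
static uint64_t group_cpu_mask[NR_GROUPS];  /* one bit per CPU, per group */
static uint64_t domain_cpu_mask;            /* CPUs in a single mock domain */

static void group_lock(void)   { assert(!group_lock_held); group_lock_held = true; }
static void group_unlock(void) { assert(group_lock_held);  group_lock_held = false; }

/* Filesystem side: only touches group membership, expects the lock held. */
void fs_offline_cpu(unsigned int cpu)
{
	assert(group_lock_held);	/* plays the role of lockdep_assert_held() */

	for (int i = 0; i < NR_GROUPS; i++) {
		if (group_cpu_mask[i] & (1ULL << cpu)) {
			group_cpu_mask[i] &= ~(1ULL << cpu);
			break;		/* stop at the first match, as the patch's loop does */
		}
	}
}

/* Architecture side: the hotplug callback; takes the lock, then calls out. */
int arch_offline_cpu(unsigned int cpu)
{
	group_lock();
	fs_offline_cpu(cpu);			/* filesystem work first */
	domain_cpu_mask &= ~(1ULL << cpu);	/* then arch-owned domain state */
	group_unlock();
	return 0;
}
```

The point of the shape is that the filesystem helper never takes the lock itself and never touches domain state, so the caller stays responsible for both.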
Comments
Hi James,
On 9/14/2023 10:21 AM, James Morse wrote:
> The resctrl architecture-specific code may need to free a domain when
> a CPU goes offline; it also needs to reset the CPU's PQR_ASSOC register.
> Amongst other things, the resctrl filesystem code needs to clear this
> CPU from the cpu_mask of any control and monitor groups.
>
> Currently this is all done in core.c and called from
> resctrl_offline_cpu(), making the split between architecture and
> filesystem code unclear.
>
> Move the filesystem work to remove the CPU from the control and monitor
> groups into a filesystem helper called resctrl_offline_cpu(), and rename
> the one in core.c resctrl_arch_offline_cpu().
>
> The rdtgroup_mutex is unlocked and locked again in the call in
> preparation for changing the locking rules for the architecture
> code.
This last paragraph may cause some confusion since this refactoring
is not changing any current locking. I'll defer to you if you prefer
to keep it.
>
> Reviewed-by: Shaopeng Tan <tan.shaopeng@fujitsu.com>
> Tested-by: Shaopeng Tan <tan.shaopeng@fujitsu.com>
> Tested-By: Peter Newman <peternewman@google.com>
> Signed-off-by: James Morse <james.morse@arm.com>
> ---
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Reinette
Hi Reinette,
On 03/10/2023 22:23, Reinette Chatre wrote:
> On 9/14/2023 10:21 AM, James Morse wrote:
>> The resctrl architecture-specific code may need to free a domain when
>> a CPU goes offline; it also needs to reset the CPU's PQR_ASSOC register.
>> Amongst other things, the resctrl filesystem code needs to clear this
>> CPU from the cpu_mask of any control and monitor groups.
>>
>> Currently this is all done in core.c and called from
>> resctrl_offline_cpu(), making the split between architecture and
>> filesystem code unclear.
>>
>> Move the filesystem work to remove the CPU from the control and monitor
>> groups into a filesystem helper called resctrl_offline_cpu(), and rename
>> the one in core.c resctrl_arch_offline_cpu().
>>
>> The rdtgroup_mutex is unlocked and locked again in the call in
>> preparation for changing the locking rules for the architecture
>> code.
>
> This last paragraph may cause some confusion since this refactoring
> is not changing any current locking. I'll defer to you if you prefer
> to keep it.
Hmm, that is referring to an earlier version that looked funny and I felt needed
explanation. I've removed that paragraph.
>> Reviewed-by: Shaopeng Tan <tan.shaopeng@fujitsu.com>
>> Tested-by: Shaopeng Tan <tan.shaopeng@fujitsu.com>
>> Tested-By: Peter Newman <peternewman@google.com>
>> Signed-off-by: James Morse <james.morse@arm.com>
>> ---
>
> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Thanks!
James
diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -623,31 +623,15 @@ static int resctrl_arch_online_cpu(unsigned int cpu)
 	return 0;
 }
 
-static void clear_childcpus(struct rdtgroup *r, unsigned int cpu)
+static int resctrl_arch_offline_cpu(unsigned int cpu)
 {
-	struct rdtgroup *cr;
-
-	list_for_each_entry(cr, &r->mon.crdtgrp_list, mon.crdtgrp_list) {
-		if (cpumask_test_and_clear_cpu(cpu, &cr->cpu_mask)) {
-			break;
-		}
-	}
-}
-
-static int resctrl_offline_cpu(unsigned int cpu)
-{
-	struct rdtgroup *rdtgrp;
 	struct rdt_resource *r;
 
 	mutex_lock(&rdtgroup_mutex);
+	resctrl_offline_cpu(cpu);
+
 	for_each_capable_rdt_resource(r)
 		domain_remove_cpu(cpu, r);
-	list_for_each_entry(rdtgrp, &rdt_all_groups, rdtgroup_list) {
-		if (cpumask_test_and_clear_cpu(cpu, &rdtgrp->cpu_mask)) {
-			clear_childcpus(rdtgrp, cpu);
-			break;
-		}
-	}
 	clear_closid_rmid(cpu);
 	mutex_unlock(&rdtgroup_mutex);
 
@@ -970,7 +954,8 @@ static int __init resctrl_late_init(void)
 
 	state = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN,
 				  "x86/resctrl/cat:online:",
-				  resctrl_arch_online_cpu, resctrl_offline_cpu);
+				  resctrl_arch_online_cpu,
+				  resctrl_arch_offline_cpu);
 	if (state < 0)
 		return state;
 
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -3878,6 +3878,30 @@ void resctrl_online_cpu(unsigned int cpu)
 	cpumask_set_cpu(cpu, &rdtgroup_default.cpu_mask);
 }
 
+static void clear_childcpus(struct rdtgroup *r, unsigned int cpu)
+{
+	struct rdtgroup *cr;
+
+	list_for_each_entry(cr, &r->mon.crdtgrp_list, mon.crdtgrp_list) {
+		if (cpumask_test_and_clear_cpu(cpu, &cr->cpu_mask))
+			break;
+	}
+}
+
+void resctrl_offline_cpu(unsigned int cpu)
+{
+	struct rdtgroup *rdtgrp;
+
+	lockdep_assert_held(&rdtgroup_mutex);
+
+	list_for_each_entry(rdtgrp, &rdt_all_groups, rdtgroup_list) {
+		if (cpumask_test_and_clear_cpu(cpu, &rdtgrp->cpu_mask)) {
+			clear_childcpus(rdtgrp, cpu);
+			break;
+		}
+	}
+}
+
 /*
  * rdtgroup_init - rdtgroup initialization
  *
diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
--- a/include/linux/resctrl.h
+++ b/include/linux/resctrl.h
@@ -226,6 +226,7 @@ u32 resctrl_arch_get_config(struct rdt_resource *r, struct rdt_domain *d,
 int resctrl_online_domain(struct rdt_resource *r, struct rdt_domain *d);
 void resctrl_offline_domain(struct rdt_resource *r, struct rdt_domain *d);
 void resctrl_online_cpu(unsigned int cpu);
+void resctrl_offline_cpu(unsigned int cpu);
 
 /**
  * resctrl_arch_rmid_read() - Read the eventid counter corresponding to rmid
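
For readers who want to experiment with the two-level walk that resctrl_offline_cpu() and clear_childcpus() perform, here is a minimal userspace sketch. The struct layouts, the fixed-size child array and the names groups_offline_cpu(), clear_child_cpus() and test_and_clear_cpu() are assumptions made for illustration; the kernel uses linked lists and struct cpumask, and CPU numbers here are assumed to be below 64.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_CHILDREN 4

/* Hypothetical stand-ins for a monitor group and its parent control group. */
struct mon_group { uint64_t cpu_mask; };

struct ctrl_group {
	uint64_t cpu_mask;
	struct mon_group children[MAX_CHILDREN];
	size_t nr_children;
};

/* Clear bit `cpu` in *mask and report whether it was set beforehand. */
static bool test_and_clear_cpu(unsigned int cpu, uint64_t *mask)
{
	uint64_t bit = 1ULL << cpu;
	bool was_set = (*mask & bit) != 0;

	*mask &= ~bit;
	return was_set;
}

/* Drop the CPU from the first child monitor group that contains it. */
static void clear_child_cpus(struct ctrl_group *g, unsigned int cpu)
{
	for (size_t i = 0; i < g->nr_children; i++) {
		if (test_and_clear_cpu(cpu, &g->children[i].cpu_mask))
			break;
	}
}

/* Returns true if the CPU was found in (and removed from) a control group. */
bool groups_offline_cpu(struct ctrl_group *groups, size_t n, unsigned int cpu)
{
	for (size_t i = 0; i < n; i++) {
		if (test_and_clear_cpu(cpu, &groups[i].cpu_mask)) {
			clear_child_cpus(&groups[i], cpu);
			return true;	/* stop after the first owning group */
		}
	}
	return false;
}
```

Unlike the patch, the sketch returns a bool so that a caller or test can tell whether any group owned the CPU; the kernel helper returns void.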