[v7,21/24] x86/resctrl: Allow overflow/limbo handlers to be scheduled on any-but cpu

Message ID 20231025180345.28061-22-james.morse@arm.com
State New
Series x86/resctrl: monitored closid+rmid together, separate arch/fs locking

Commit Message

James Morse Oct. 25, 2023, 6:03 p.m. UTC
  When a CPU is taken offline, resctrl may need to move the overflow or
limbo handlers to run on a different CPU.

Once the offline callbacks have been split, cqm_setup_limbo_handler()
will be called while the CPU that is going offline is still present
in the cpu_mask.

Pass the CPU to exclude to cqm_setup_limbo_handler() and
mbm_setup_overflow_handler(). These functions can use a variant of
cpumask_any_but() when selecting the CPU. -1 is used to indicate no CPUs
need excluding.

A subsequent patch moves these calls to be before CPUs have been removed,
so this exclude_cpus behaviour is temporary.

Tested-by: Shaopeng Tan <tan.shaopeng@fujitsu.com>
Tested-by: Peter Newman <peternewman@google.com>
Reviewed-by: Shaopeng Tan <tan.shaopeng@fujitsu.com>
Signed-off-by: James Morse <james.morse@arm.com>
---
Changes since v2:
 * Rephrased a comment to avoid a two letter bad-word. (we)
 * Avoid assigning mbm_work_cpu if the domain is going to be free()d
 * Added cpumask_any_housekeeping_but(), I dislike the name

Changes since v3:
 * Marked an explanatory comment as temporary as the subsequent patch is
   no longer adjacent.

Changes since v4:
 * Check against RESCTRL_PICK_ANY_CPU instead of -1.
 * Leave cqm_work_cpu as nr_cpu_ids when no CPU is available.
 * Made cpumask_any_housekeeping_but() more readable.

Changes since v5:
 * Changes in capitalisation, and a typo.
 * Merged cpumask helpers.

Changes since v6:
 * Added the missing dom parameter to some kernel doc.
 * Re-added use of cpumask_any_but(),
 * Expanded comment above cpumask_any_housekeeping(),
 * Added some more comments for good measure.
 * Added explicit IS_ENABLED() check as gcc-12 doesn't seem to work this out.
---
 arch/x86/kernel/cpu/resctrl/core.c        |  8 +++--
 arch/x86/kernel/cpu/resctrl/ctrlmondata.c |  2 +-
 arch/x86/kernel/cpu/resctrl/internal.h    | 33 ++++++++++++++----
 arch/x86/kernel/cpu/resctrl/monitor.c     | 42 ++++++++++++++++++-----
 arch/x86/kernel/cpu/resctrl/rdtgroup.c    |  6 ++--
 include/linux/resctrl.h                   |  2 ++
 6 files changed, 72 insertions(+), 21 deletions(-)
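The exclude_cpu convention the commit message describes (pick any CPU from a mask, optionally excluding one, with a return value >= nr_cpu_ids signalling failure, and -1 meaning "no exclusion") can be modelled in user space. The following is a hypothetical sketch using a plain bitmask in place of struct cpumask; the helper names only mirror the kernel API, they are not it:

```c
#include <assert.h>

#define NR_CPUS 8
#define PICK_ANY_CPU (-1)	/* stands in for RESCTRL_PICK_ANY_CPU */

/* Mirrors cpumask_any(): lowest set bit, or NR_CPUS if the mask is empty. */
static int mask_any(unsigned int mask)
{
	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		if (mask & (1u << cpu))
			return cpu;
	return NR_CPUS;
}

/* Mirrors cpumask_any_but(): any set bit except exclude_cpu. */
static int mask_any_but(unsigned int mask, int exclude_cpu)
{
	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		if (cpu != exclude_cpu && (mask & (1u << cpu)))
			return cpu;
	return NR_CPUS;
}

/* The -1 sentinel means "no CPU needs excluding", so existing callers keep working. */
static int pick_cpu(unsigned int mask, int exclude_cpu)
{
	if (exclude_cpu == PICK_ANY_CPU)
		return mask_any(mask);
	return mask_any_but(mask, exclude_cpu);
}
```

Note the error case: when the only CPU set in the mask is the excluded one, the result is >= NR_CPUS, which is exactly the condition the callers in this patch check before queueing work.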
  

Comments

Reinette Chatre Nov. 9, 2023, 5:48 p.m. UTC | #1
Hi James,

On 10/25/2023 11:03 AM, James Morse wrote:
> When a CPU is taken offline resctrl may need to move the overflow or
> limbo handlers to run on a different CPU.
> 
> Once the offline callbacks have been split, cqm_setup_limbo_handler()
> will be called while the CPU that is going offline is still present
> in the cpu_mask.
> 
> Pass the CPU to exclude to cqm_setup_limbo_handler() and
> mbm_setup_overflow_handler(). These functions can use a variant of
> cpumask_any_but() when selecting the CPU. -1 is used to indicate no CPUs
> need excluding.
> 
> A subsequent patch moves these calls to be before CPUs have been removed,
> so this exclude_cpus behaviour is temporary.

Note "A subsequent patch". Please do go over your entire series; I may not
have noticed all of them.

> 
> Tested-by: Shaopeng Tan <tan.shaopeng@fujitsu.com>
> Tested-by: Peter Newman <peternewman@google.com>
> Reviewed-by: Shaopeng Tan <tan.shaopeng@fujitsu.com>
> Signed-off-by: James Morse <james.morse@arm.com>
> ---
> Changes since v2:
>  * Rephrased a comment to avoid a two letter bad-word. (we)
>  * Avoid assigning mbm_work_cpu if the domain is going to be free()d
>  * Added cpumask_any_housekeeping_but(), I dislike the name
> 
> Changes since v3:
>  * Marked an explanatory comment as temporary as the subsequent patch is
>    no longer adjacent.
> 
> Changes since v4:
>  * Check against RESCTRL_PICK_ANY_CPU instead of -1.
>  * Leave cqm_work_cpu as nr_cpu_ids when no CPU is available.
>  * Made cpumask_any_housekeeping_but() more readable.
> 
> Changes since v5:
>  * Changes in captialisation, and a typo.
>  * Merged cpumask helpers.
> 
> Changes since v6:
>  * Added the missing dom parameter to some kernel doc.
>  * Re-added use of cpumask_any_but(),
>  * Expanded comment above cpumask_any_housekeeping(),
>  * Added some more comments for good measure.
>  * Added explicit IS_ENABLED() check as gcc-12 doesn't seem to work this out.
> ---
>  arch/x86/kernel/cpu/resctrl/core.c        |  8 +++--
>  arch/x86/kernel/cpu/resctrl/ctrlmondata.c |  2 +-
>  arch/x86/kernel/cpu/resctrl/internal.h    | 33 ++++++++++++++----
>  arch/x86/kernel/cpu/resctrl/monitor.c     | 42 ++++++++++++++++++-----
>  arch/x86/kernel/cpu/resctrl/rdtgroup.c    |  6 ++--
>  include/linux/resctrl.h                   |  2 ++
>  6 files changed, 72 insertions(+), 21 deletions(-)
> 
> diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
> index 1a74e9c47416..7e44f2c40897 100644
> --- a/arch/x86/kernel/cpu/resctrl/core.c
> +++ b/arch/x86/kernel/cpu/resctrl/core.c
> @@ -586,12 +586,16 @@ static void domain_remove_cpu(int cpu, struct rdt_resource *r)
>  	if (r == &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl) {
>  		if (is_mbm_enabled() && cpu == d->mbm_work_cpu) {
>  			cancel_delayed_work(&d->mbm_over);
> -			mbm_setup_overflow_handler(d, 0);
> +			/*
> +			 * temporary: exclude_cpu=-1 as this CPU has already
> +			 * been removed by cpumask_clear_cpu()
> +			 */
> +			mbm_setup_overflow_handler(d, 0, RESCTRL_PICK_ANY_CPU);
>  		}
>  		if (is_llc_occupancy_enabled() && cpu == d->cqm_work_cpu &&
>  		    has_busy_rmid(d)) {
>  			cancel_delayed_work(&d->cqm_limbo);
> -			cqm_setup_limbo_handler(d, 0);
> +			cqm_setup_limbo_handler(d, 0, RESCTRL_PICK_ANY_CPU);
>  		}
>  	}
>  }
> diff --git a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
> index a033e8e32108..64db51455df3 100644
> --- a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
> +++ b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
> @@ -552,7 +552,7 @@ void mon_event_read(struct rmid_read *rr, struct rdt_resource *r,
>  		return;
>  	}
>  
> -	cpu = cpumask_any_housekeeping(&d->cpu_mask);
> +	cpu = cpumask_any_housekeeping(&d->cpu_mask, RESCTRL_PICK_ANY_CPU);
>  
>  	/*
>  	 * cpumask_any_housekeeping() prefers housekeeping CPUs, but
> diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
> index c4c1e1909058..f5fff2f0d866 100644
> --- a/arch/x86/kernel/cpu/resctrl/internal.h
> +++ b/arch/x86/kernel/cpu/resctrl/internal.h
> @@ -61,19 +61,36 @@
>   * cpumask_any_housekeeping() - Choose any CPU in @mask, preferring those that
>   *			        aren't marked nohz_full
>   * @mask:	The mask to pick a CPU from.
> + * @exclude_cpu:The CPU to avoid picking.
>   *
> - * Returns a CPU in @mask. If there are housekeeping CPUs that don't use
> - * nohz_full, these are preferred.
> + * Returns a CPU from @mask, but not @exclude_cpu. If there are housekeeping
> + * CPUs that don't use nohz_full, these are preferred. Pass
> + * RESCTRL_PICK_ANY_CPU to avoid excluding any CPUs.
> + *
> + * When a CPU is excluded, returns >= nr_cpu_ids if no CPUs are available.
>   */
> -static inline unsigned int cpumask_any_housekeeping(const struct cpumask *mask)
> +static inline unsigned int
> +cpumask_any_housekeeping(const struct cpumask *mask, int exclude_cpu)
>  {
>  	unsigned int cpu, hk_cpu;
>  
> -	cpu = cpumask_any(mask);
> -	if (!tick_nohz_full_cpu(cpu))
> +	if (exclude_cpu == RESCTRL_PICK_ANY_CPU)
> +		cpu = cpumask_any(mask);
> +	else
> +		cpu = cpumask_any_but(mask, exclude_cpu);
> +
> +	if (!IS_ENABLED(CONFIG_NO_HZ_FULL))
> +		return cpu;

It is not clear to me how cpumask_any_but() failure is handled.

cpumask_any_but() returns ">= nr_cpu_ids if no cpus set" ...

> +
> +	/* If the CPU picked isn't marked nohz_full, we're done */

Please don't impersonate code.

> +	if (cpu <= nr_cpu_ids && !tick_nohz_full_cpu(cpu))
>  		return cpu;

Is this intended to be "cpu < nr_cpu_ids"? But that would have
code continue ... so maybe it needs explicit error check of
cpumask_any_but() failure with an earlier exit?

>  
> +	/* Try to find a CPU that isn't nohz_full to use in preference */
>  	hk_cpu = cpumask_nth_andnot(0, mask, tick_nohz_full_mask);
> +	if (hk_cpu == exclude_cpu)
> +		hk_cpu = cpumask_nth_andnot(1, mask, tick_nohz_full_mask);
> +
>  	if (hk_cpu < nr_cpu_ids)
>  		cpu = hk_cpu;
>  

Reinette
  
Moger, Babu Nov. 9, 2023, 8:51 p.m. UTC | #2
On 10/25/23 13:03, James Morse wrote:
> When a CPU is taken offline resctrl may need to move the overflow or
> limbo handlers to run on a different CPU.
> 
> Once the offline callbacks have been split, cqm_setup_limbo_handler()
> will be called while the CPU that is going offline is still present
> in the cpu_mask.
> 
> Pass the CPU to exclude to cqm_setup_limbo_handler() and
> mbm_setup_overflow_handler(). These functions can use a variant of
> cpumask_any_but() when selecting the CPU. -1 is used to indicate no CPUs
> need excluding.
> 
> A subsequent patch moves these calls to be before CPUs have been removed,
> so this exclude_cpus behaviour is temporary.
> 
> Tested-by: Shaopeng Tan <tan.shaopeng@fujitsu.com>
> Tested-by: Peter Newman <peternewman@google.com>
> Reviewed-by: Shaopeng Tan <tan.shaopeng@fujitsu.com>
> Signed-off-by: James Morse <james.morse@arm.com>

Reviewed-by: Babu Moger <babu.moger@amd.com>

  
James Morse Dec. 14, 2023, 11:38 a.m. UTC | #3
Hi Babu,

On 09/11/2023 20:51, Moger, Babu wrote:
> On 10/25/23 13:03, James Morse wrote:
>> When a CPU is taken offline resctrl may need to move the overflow or
>> limbo handlers to run on a different CPU.
>>
>> Once the offline callbacks have been split, cqm_setup_limbo_handler()
>> will be called while the CPU that is going offline is still present
>> in the cpu_mask.
>>
>> Pass the CPU to exclude to cqm_setup_limbo_handler() and
>> mbm_setup_overflow_handler(). These functions can use a variant of
>> cpumask_any_but() when selecting the CPU. -1 is used to indicate no CPUs
>> need excluding.
>>
>> A subsequent patch moves these calls to be before CPUs have been removed,
>> so this exclude_cpus behaviour is temporary.

> Reviewed-by: Babu Moger <babu.moger@amd.com>

Thanks!


James
  
James Morse Dec. 14, 2023, 11:38 a.m. UTC | #4
Hi Reinette,

On 09/11/2023 17:48, Reinette Chatre wrote:
> On 10/25/2023 11:03 AM, James Morse wrote:
>> When a CPU is taken offline resctrl may need to move the overflow or
>> limbo handlers to run on a different CPU.
>>
>> Once the offline callbacks have been split, cqm_setup_limbo_handler()
>> will be called while the CPU that is going offline is still present
>> in the cpu_mask.
>>
>> Pass the CPU to exclude to cqm_setup_limbo_handler() and
>> mbm_setup_overflow_handler(). These functions can use a variant of
>> cpumask_any_but() when selecting the CPU. -1 is used to indicate no CPUs
>> need excluding.
>>
>> A subsequent patch moves these calls to be before CPUs have been removed,
>> so this exclude_cpus behaviour is temporary.
> 
> Note "A subsequent patch". Please do go over your entire series. I may not
> have noticed all.

Yup, I've searched the git-log and removed those paragraphs from the x86 patches in my tree.


>> diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
>> index c4c1e1909058..f5fff2f0d866 100644
>> --- a/arch/x86/kernel/cpu/resctrl/internal.h
>> +++ b/arch/x86/kernel/cpu/resctrl/internal.h
>> @@ -61,19 +61,36 @@
>>   * cpumask_any_housekeeping() - Choose any CPU in @mask, preferring those that
>>   *			        aren't marked nohz_full
>>   * @mask:	The mask to pick a CPU from.
>> + * @exclude_cpu:The CPU to avoid picking.
>>   *
>> - * Returns a CPU in @mask. If there are housekeeping CPUs that don't use
>> - * nohz_full, these are preferred.
>> + * Returns a CPU from @mask, but not @exclude_cpu. If there are housekeeping
>> + * CPUs that don't use nohz_full, these are preferred. Pass
>> + * RESCTRL_PICK_ANY_CPU to avoid excluding any CPUs.
>> + *
>> + * When a CPU is excluded, returns >= nr_cpu_ids if no CPUs are available.
>>   */
>> -static inline unsigned int cpumask_any_housekeeping(const struct cpumask *mask)
>> +static inline unsigned int
>> +cpumask_any_housekeeping(const struct cpumask *mask, int exclude_cpu)
>>  {
>>  	unsigned int cpu, hk_cpu;
>>  
>> -	cpu = cpumask_any(mask);
>> -	if (!tick_nohz_full_cpu(cpu))
>> +	if (exclude_cpu == RESCTRL_PICK_ANY_CPU)
>> +		cpu = cpumask_any(mask);
>> +	else
>> +		cpu = cpumask_any_but(mask, exclude_cpu);
>> +
>> +	if (!IS_ENABLED(CONFIG_NO_HZ_FULL))
>> +		return cpu;
> 
> It is not clear to me how cpumask_any_but() failure is handled.
> 
> cpumask_any_but() returns ">= nr_cpu_ids if no cpus set" ...

It wasn't a satisfiable request: there are no CPUs for this domain other than the one that
was excluded. cpumask_any_but() also describes its errors as "returns >= nr_cpu_ids if no
CPUs are available".

The places this can happen in resctrl are:
cqm_setup_limbo_handler(), where it causes the schedule_delayed_work_on() call to be skipped.
mbm_setup_overflow_handler(), which does similar.

These two cases are triggered from resctrl_offline_cpu() when the last CPU in a domain is
going offline, and the domain is about to be free()d. This is how the limbo/overflow
'threads' stop.
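The stopping behaviour described here can be sketched in user space as follows; the struct and helper names are illustrative stand-ins for the kernel's, not the real API:

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS 4

/* Illustrative stand-in for struct rdt_domain's worker bookkeeping. */
struct domain {
	unsigned int cpu_mask;	/* plain bitmask in place of struct cpumask */
	int work_cpu;
	bool work_queued;
};

/* Mirrors cpumask_any_but(): any set bit except exclude_cpu, NR_CPUS on failure. */
static int any_but(unsigned int mask, int exclude_cpu)
{
	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		if (cpu != exclude_cpu && (mask & (1u << cpu)))
			return cpu;
	return NR_CPUS;
}

/* Models cqm_setup_limbo_handler(): record the pick, queue work only if valid. */
static void setup_limbo_handler(struct domain *d, int exclude_cpu)
{
	int cpu = any_but(d->cpu_mask, exclude_cpu);

	d->work_cpu = cpu;		/* left >= NR_CPUS when nothing was available */
	if (cpu < NR_CPUS)
		d->work_queued = true;	/* schedule_delayed_work_on(cpu, ...) */
	else
		d->work_queued = false;	/* last CPU leaving: the 'thread' stops */
}
```

When the last CPU of a domain is excluded, no work is queued and the worker never reschedules itself, which is the intended shutdown path.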


>> +
>> +	/* If the CPU picked isn't marked nohz_full, we're done */
> 
> Please don't impersonate code.
> 
>> +	if (cpu <= nr_cpu_ids && !tick_nohz_full_cpu(cpu))
>>  		return cpu;
> 
> Is this intended to be "cpu < nr_cpu_ids"?

Yes, fixed - thanks!


> But that would have
> code continue ... so maybe it needs explicit error check of
> cpumask_any_but() failure with an earlier exit?

I'm not sure what the problem you refer to here is.
If 'cpu' is valid, and not marked nohz_full, nothing more needs doing.
If 'cpu' is invalid or a CPU marked nohz_full, then a second attempt is made to find a
housekeeping CPU into 'hk_cpu'. If the second attempt is valid, it's used in preference.

An error is returned if the request couldn't be satisfied, i.e. an empty mask was passed,
or the only CPU set in the mask was excluded.
There is a second attempt in this case for a housekeeping CPU - but that will fail too.
As above, this only happens when CPUs are going offline, and this error is handled by the
caller.
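The fallthrough described above can be modelled in user space like this (hypothetical helpers over plain bitmasks; an invalid first pick simply falls through to a housekeeping search that fails the same way, so the error propagates without an explicit early exit):

```c
#include <assert.h>

#define NR_CPUS 8

/* n-th set bit of (mask & ~andnot), or NR_CPUS — mirrors cpumask_nth_andnot(). */
static int nth_andnot(int n, unsigned int mask, unsigned int andnot)
{
	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		if (((mask & ~andnot) & (1u << cpu)) && n-- == 0)
			return cpu;
	return NR_CPUS;
}

/* Mirrors cpumask_any_but(): any set bit except exclude_cpu, NR_CPUS on failure. */
static int any_but(unsigned int mask, int exclude_cpu)
{
	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		if (cpu != exclude_cpu && (mask & (1u << cpu)))
			return cpu;
	return NR_CPUS;
}

/* Sketch of the cpumask_any_housekeeping() control flow under discussion. */
static int pick(unsigned int mask, unsigned int nohz_full, int exclude_cpu)
{
	int cpu = any_but(mask, exclude_cpu);
	int hk;

	if (cpu < NR_CPUS && !(nohz_full & (1u << cpu)))
		return cpu;			/* valid and not nohz_full: done */

	/* Try to find a CPU that isn't nohz_full to use in preference. */
	hk = nth_andnot(0, mask, nohz_full);
	if (hk == exclude_cpu)
		hk = nth_andnot(1, mask, nohz_full);
	if (hk < NR_CPUS)
		cpu = hk;

	return cpu;				/* may be >= NR_CPUS on failure */
}
```

If the first pick fails, the housekeeping search runs over a subset of the same mask and also returns >= NR_CPUS, so the caller sees a single consistent error value.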


Thanks,

James
  
Reinette Chatre Dec. 14, 2023, 6:53 p.m. UTC | #5
Hi James,

On 12/14/2023 3:38 AM, James Morse wrote:
> On 09/11/2023 17:48, Reinette Chatre wrote:
>> On 10/25/2023 11:03 AM, James Morse wrote:

>>> diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
>>> index c4c1e1909058..f5fff2f0d866 100644
>>> --- a/arch/x86/kernel/cpu/resctrl/internal.h
>>> +++ b/arch/x86/kernel/cpu/resctrl/internal.h
>>> @@ -61,19 +61,36 @@
>>>   * cpumask_any_housekeeping() - Choose any CPU in @mask, preferring those that
>>>   *			        aren't marked nohz_full
>>>   * @mask:	The mask to pick a CPU from.
>>> + * @exclude_cpu:The CPU to avoid picking.
>>>   *
>>> - * Returns a CPU in @mask. If there are housekeeping CPUs that don't use
>>> - * nohz_full, these are preferred.
>>> + * Returns a CPU from @mask, but not @exclude_cpu. If there are housekeeping
>>> + * CPUs that don't use nohz_full, these are preferred. Pass
>>> + * RESCTRL_PICK_ANY_CPU to avoid excluding any CPUs.
>>> + *
>>> + * When a CPU is excluded, returns >= nr_cpu_ids if no CPUs are available.
>>>   */
>>> -static inline unsigned int cpumask_any_housekeeping(const struct cpumask *mask)
>>> +static inline unsigned int
>>> +cpumask_any_housekeeping(const struct cpumask *mask, int exclude_cpu)
>>>  {
>>>  	unsigned int cpu, hk_cpu;
>>>  
>>> -	cpu = cpumask_any(mask);
>>> -	if (!tick_nohz_full_cpu(cpu))
>>> +	if (exclude_cpu == RESCTRL_PICK_ANY_CPU)
>>> +		cpu = cpumask_any(mask);
>>> +	else
>>> +		cpu = cpumask_any_but(mask, exclude_cpu);
>>> +
>>> +	if (!IS_ENABLED(CONFIG_NO_HZ_FULL))
>>> +		return cpu;
>>
>> It is not clear to me how cpumask_any_but() failure is handled.
>>
>> cpumask_any_but() returns ">= nr_cpu_ids if no cpus set" ...
> 
> It wasn't a satisfiable request, there are no CPUs for this domain other than the one that
> was excluded. cpumask_any_but() also describes its errors as "returns >= nr_cpu_ids if no
> CPUs are available".
> 
> The places this can happen in resctrl are:
> cqm_setup_limbo_handler(), where it causes the schedule_delayed_work_on() call to be skipped.
> mbm_setup_overflow_handler(), which does similar.
> 
> These two cases are triggered from resctrl_offline_cpu() when the last CPU in a domain is
> going offline, and the domain is about to be free()d. This is how the limbo/overflow
> 'threads' stop.

Right ... yet this is a generic function; if there are any requirements on when/how it should
be called then they need to be specified in the function comments. I do not expect this to
be the case for this function.

>>> +
>>> +	/* If the CPU picked isn't marked nohz_full, we're done */
>>
>> Please don't impersonate code.
>>
>>> +	if (cpu <= nr_cpu_ids && !tick_nohz_full_cpu(cpu))
>>>  		return cpu;
>>
>> Is this intended to be "cpu < nr_cpu_ids"?
> 
> Yes, fixed - thanks!
> 
> 
>> But that would have
>> code continue ... so maybe it needs explicit error check of
>> cpumask_any_but() failure with an earlier exit?
> 
> I'm not sure what the problem you refer to here is.
> If 'cpu' is valid, and not marked nohz_full, nothing more needs doing.
> If 'cpu' is invalid or a CPU marked nohz_full, then a second attempt is made to find a
> housekeeping CPU into 'hk_cpu'. If the second attempt is valid, it's used in preference.

Considering that the second attempt can only be on the same or smaller set of CPUs,
how could the second attempt ever succeed if the first attempt failed? I do not see
why it is worth continuing.

> An error is returned if the request couldn't be satisfied, i.e. an empty mask was passed,
> or the only CPU set in the mask was excluded.
> There is a second attempt in this case for a housekeeping CPU - but that will fail too.
> As above, this only happens when CPUs are going offline, and this error is handled by the
> caller.

Reinette
  
James Morse Dec. 15, 2023, 5:41 p.m. UTC | #6
Hi Reinette,

On 12/14/23 18:53, Reinette Chatre wrote:
> On 12/14/2023 3:38 AM, James Morse wrote:
>> On 09/11/2023 17:48, Reinette Chatre wrote:
>>> On 10/25/2023 11:03 AM, James Morse wrote:

>>>> diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
>>>> index c4c1e1909058..f5fff2f0d866 100644
>>>> --- a/arch/x86/kernel/cpu/resctrl/internal.h
>>>> +++ b/arch/x86/kernel/cpu/resctrl/internal.h
>>>> @@ -61,19 +61,36 @@
>>>>    * cpumask_any_housekeeping() - Choose any CPU in @mask, preferring those that
>>>>    *			        aren't marked nohz_full
>>>>    * @mask:	The mask to pick a CPU from.
>>>> + * @exclude_cpu:The CPU to avoid picking.
>>>>    *
>>>> - * Returns a CPU in @mask. If there are housekeeping CPUs that don't use
>>>> - * nohz_full, these are preferred.
>>>> + * Returns a CPU from @mask, but not @exclude_cpu. If there are housekeeping
>>>> + * CPUs that don't use nohz_full, these are preferred. Pass
>>>> + * RESCTRL_PICK_ANY_CPU to avoid excluding any CPUs.
>>>> + *
>>>> + * When a CPU is excluded, returns >= nr_cpu_ids if no CPUs are available.
>>>>    */
>>>> -static inline unsigned int cpumask_any_housekeeping(const struct cpumask *mask)
>>>> +static inline unsigned int
>>>> +cpumask_any_housekeeping(const struct cpumask *mask, int exclude_cpu)
>>>>   {
>>>>   	unsigned int cpu, hk_cpu;
>>>>   
>>>> -	cpu = cpumask_any(mask);
>>>> -	if (!tick_nohz_full_cpu(cpu))
>>>> +	if (exclude_cpu == RESCTRL_PICK_ANY_CPU)
>>>> +		cpu = cpumask_any(mask);
>>>> +	else
>>>> +		cpu = cpumask_any_but(mask, exclude_cpu);
>>>> +
>>>> +	if (!IS_ENABLED(CONFIG_NO_HZ_FULL))
>>>> +		return cpu;
>>>
>>> It is not clear to me how cpumask_any_but() failure is handled.
>>>
>>> cpumask_any_but() returns ">= nr_cpu_ids if no cpus set" ...
>>
>> It wasn't a satisfiable request, there are no CPUs for this domain other than the one that
>> was excluded. cpumask_any_but() also describes its errors as "returns >= nr_cpu_ids if no
>> CPUs are available".
>>
>> The places this can happen in resctrl are:
>> cqm_setup_limbo_handler(), where it causes the schedule_delayed_work_on() call to be skipped.
>> mbm_setup_overflow_handler(), which does similar.
>>
>> These two cases are triggered from resctrl_offline_cpu() when the last CPU in a domain is
>> going offline, and the domain is about to be free()d. This is how the limbo/overflow
>> 'threads' stop.

> Right ... yet this is a generic function, if there are any requirements on when/how it should
> be called then it needs to be specified in the function comments. I do not expect this to
> be the case for this function.

There are no special requirements, like all the other cpumask_foo() helpers, you can feed it
an empty bitmap and it will return '>= nr_cpu_ids' as an error.

[...]

>>> But that would have
>>> code continue ... so maybe it needs explicit error check of
>>> cpumask_any_but() failure with an earlier exit?
>>
>> I'm not sure what the problem you refer to here is.
>> If 'cpu' is valid, and not marked nohz_full, nothing more needs doing.
>> If 'cpu' is invalid or a CPU marked nohz_full, then a second attempt is made to find a
>> housekeeping CPU into 'hk_cpu'. If the second attempt is valid, it's used in preference.
> 
> Considering that the second attempt can only be on the same or smaller set of CPUs,
> how could the second attempt ever succeed if the first attempt failed? I do not see
> why it is worth continuing.

It's harmless: it's not on a performance-sensitive path, and it would take extra logic in the
more common cases to detect this and return early.


Thanks,

James
  

Patch

diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index 1a74e9c47416..7e44f2c40897 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -586,12 +586,16 @@  static void domain_remove_cpu(int cpu, struct rdt_resource *r)
 	if (r == &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl) {
 		if (is_mbm_enabled() && cpu == d->mbm_work_cpu) {
 			cancel_delayed_work(&d->mbm_over);
-			mbm_setup_overflow_handler(d, 0);
+			/*
+			 * Temporary: exclude_cpu=-1 as this CPU has already
+			 * been removed by cpumask_clear_cpu()
+			 */
+			mbm_setup_overflow_handler(d, 0, RESCTRL_PICK_ANY_CPU);
 		}
 		if (is_llc_occupancy_enabled() && cpu == d->cqm_work_cpu &&
 		    has_busy_rmid(d)) {
 			cancel_delayed_work(&d->cqm_limbo);
-			cqm_setup_limbo_handler(d, 0);
+			cqm_setup_limbo_handler(d, 0, RESCTRL_PICK_ANY_CPU);
 		}
 	}
 }
diff --git a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
index a033e8e32108..64db51455df3 100644
--- a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
+++ b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
@@ -552,7 +552,7 @@  void mon_event_read(struct rmid_read *rr, struct rdt_resource *r,
 		return;
 	}
 
-	cpu = cpumask_any_housekeeping(&d->cpu_mask);
+	cpu = cpumask_any_housekeeping(&d->cpu_mask, RESCTRL_PICK_ANY_CPU);
 
 	/*
 	 * cpumask_any_housekeeping() prefers housekeeping CPUs, but
diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index c4c1e1909058..f5fff2f0d866 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -61,19 +61,36 @@ 
  * cpumask_any_housekeeping() - Choose any CPU in @mask, preferring those that
  *			        aren't marked nohz_full
  * @mask:	The mask to pick a CPU from.
+ * @exclude_cpu: The CPU to avoid picking.
  *
- * Returns a CPU in @mask. If there are housekeeping CPUs that don't use
- * nohz_full, these are preferred.
+ * Returns a CPU from @mask, but not @exclude_cpu. If there are housekeeping
+ * CPUs that don't use nohz_full, these are preferred. Pass
+ * RESCTRL_PICK_ANY_CPU to avoid excluding any CPUs.
+ *
+ * When a CPU is excluded, returns >= nr_cpu_ids if no CPUs are available.
  */
-static inline unsigned int cpumask_any_housekeeping(const struct cpumask *mask)
+static inline unsigned int
+cpumask_any_housekeeping(const struct cpumask *mask, int exclude_cpu)
 {
 	unsigned int cpu, hk_cpu;
 
-	cpu = cpumask_any(mask);
-	if (!tick_nohz_full_cpu(cpu))
+	if (exclude_cpu == RESCTRL_PICK_ANY_CPU)
+		cpu = cpumask_any(mask);
+	else
+		cpu = cpumask_any_but(mask, exclude_cpu);
+
+	if (!IS_ENABLED(CONFIG_NO_HZ_FULL))
+		return cpu;
+
+	/* If the CPU picked isn't marked nohz_full, we're done */
+	if (cpu < nr_cpu_ids && !tick_nohz_full_cpu(cpu))
 		return cpu;
 
+	/* Try to find a CPU that isn't nohz_full to use in preference */
 	hk_cpu = cpumask_nth_andnot(0, mask, tick_nohz_full_mask);
+	if (hk_cpu == exclude_cpu)
+		hk_cpu = cpumask_nth_andnot(1, mask, tick_nohz_full_mask);
+
 	if (hk_cpu < nr_cpu_ids)
 		cpu = hk_cpu;
 
@@ -575,11 +592,13 @@  void mon_event_read(struct rmid_read *rr, struct rdt_resource *r,
 		    struct rdt_domain *d, struct rdtgroup *rdtgrp,
 		    int evtid, int first);
 void mbm_setup_overflow_handler(struct rdt_domain *dom,
-				unsigned long delay_ms);
+				unsigned long delay_ms,
+				int exclude_cpu);
 void mbm_handle_overflow(struct work_struct *work);
 void __init intel_rdt_mbm_apply_quirk(void);
 bool is_mba_sc(struct rdt_resource *r);
-void cqm_setup_limbo_handler(struct rdt_domain *dom, unsigned long delay_ms);
+void cqm_setup_limbo_handler(struct rdt_domain *dom, unsigned long delay_ms,
+			     int exclude_cpu);
 void cqm_handle_limbo(struct work_struct *work);
 bool has_busy_rmid(struct rdt_domain *d);
 void __check_limbo(struct rdt_domain *d, bool force_free);
diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index 87379d2a339c..241b3dd8646c 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -481,7 +481,8 @@  static void add_rmid_to_limbo(struct rmid_entry *entry)
 		 * setup up the limbo worker.
 		 */
 		if (!has_busy_rmid(d))
-			cqm_setup_limbo_handler(d, CQM_LIMBOCHECK_INTERVAL);
+			cqm_setup_limbo_handler(d, CQM_LIMBOCHECK_INTERVAL,
+						RESCTRL_PICK_ANY_CPU);
 		set_bit(idx, d->rmid_busy_llc);
 		entry->busy++;
 	}
@@ -808,7 +809,8 @@  void cqm_handle_limbo(struct work_struct *work)
 	__check_limbo(d, false);
 
 	if (has_busy_rmid(d)) {
-		d->cqm_work_cpu = cpumask_any_housekeeping(&d->cpu_mask);
+		d->cqm_work_cpu = cpumask_any_housekeeping(&d->cpu_mask,
+							   RESCTRL_PICK_ANY_CPU);
 		schedule_delayed_work_on(d->cqm_work_cpu, &d->cqm_limbo,
 					 delay);
 	}
@@ -816,15 +818,25 @@  void cqm_handle_limbo(struct work_struct *work)
 	mutex_unlock(&rdtgroup_mutex);
 }
 
-void cqm_setup_limbo_handler(struct rdt_domain *dom, unsigned long delay_ms)
+/**
+ * cqm_setup_limbo_handler() - Schedule the limbo handler to run for this
+ *                             domain.
+ * @dom:           The domain the limbo handler should run for.
+ * @delay_ms:      How far in the future the handler should run.
+ * @exclude_cpu:   Which CPU the handler should not run on,
+ *		   RESCTRL_PICK_ANY_CPU to pick any CPU.
+ */
+void cqm_setup_limbo_handler(struct rdt_domain *dom, unsigned long delay_ms,
+			     int exclude_cpu)
 {
 	unsigned long delay = msecs_to_jiffies(delay_ms);
 	int cpu;
 
-	cpu = cpumask_any_housekeeping(&dom->cpu_mask);
+	cpu = cpumask_any_housekeeping(&dom->cpu_mask, exclude_cpu);
 	dom->cqm_work_cpu = cpu;
 
-	schedule_delayed_work_on(cpu, &dom->cqm_limbo, delay);
+	if (cpu < nr_cpu_ids)
+		schedule_delayed_work_on(cpu, &dom->cqm_limbo, delay);
 }
 
 void mbm_handle_overflow(struct work_struct *work)
@@ -862,14 +874,24 @@  void mbm_handle_overflow(struct work_struct *work)
 	 * Re-check for housekeeping CPUs. This allows the overflow handler to
 	 * move off a nohz_full CPU quickly.
 	 */
-	d->mbm_work_cpu = cpumask_any_housekeeping(&d->cpu_mask);
+	d->mbm_work_cpu = cpumask_any_housekeeping(&d->cpu_mask,
+						   RESCTRL_PICK_ANY_CPU);
 	schedule_delayed_work_on(d->mbm_work_cpu, &d->mbm_over, delay);
 
 out_unlock:
 	mutex_unlock(&rdtgroup_mutex);
 }
 
-void mbm_setup_overflow_handler(struct rdt_domain *dom, unsigned long delay_ms)
+/**
+ * mbm_setup_overflow_handler() - Schedule the overflow handler to run for this
+ *                                domain.
+ * @dom:           The domain the overflow handler should run for.
+ * @delay_ms:      How far in the future the handler should run.
+ * @exclude_cpu:   Which CPU the handler should not run on,
+ *		   RESCTRL_PICK_ANY_CPU to pick any CPU.
+ */
+void mbm_setup_overflow_handler(struct rdt_domain *dom, unsigned long delay_ms,
+				int exclude_cpu)
 {
 	unsigned long delay = msecs_to_jiffies(delay_ms);
 	int cpu;
@@ -880,9 +902,11 @@  void mbm_setup_overflow_handler(struct rdt_domain *dom, unsigned long delay_ms)
 	 */
 	if (!resctrl_mounted || !resctrl_arch_mon_capable())
 		return;
-	cpu = cpumask_any_housekeeping(&dom->cpu_mask);
+	cpu = cpumask_any_housekeeping(&dom->cpu_mask, exclude_cpu);
 	dom->mbm_work_cpu = cpu;
-	schedule_delayed_work_on(cpu, &dom->mbm_over, delay);
+
+	if (cpu < nr_cpu_ids)
+		schedule_delayed_work_on(cpu, &dom->mbm_over, delay);
 }
 
 static int dom_data_init(struct rdt_resource *r)
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index ab9db7ce706f..e22e0f6adeb3 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -2683,7 +2683,8 @@  static int rdt_get_tree(struct fs_context *fc)
 	if (is_mbm_enabled()) {
 		r = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
 		list_for_each_entry(dom, &r->domains, list)
-			mbm_setup_overflow_handler(dom, MBM_OVERFLOW_INTERVAL);
+			mbm_setup_overflow_handler(dom, MBM_OVERFLOW_INTERVAL,
+						   RESCTRL_PICK_ANY_CPU);
 	}
 
 	goto out;
@@ -3994,7 +3995,8 @@  int resctrl_online_domain(struct rdt_resource *r, struct rdt_domain *d)
 
 	if (is_mbm_enabled()) {
 		INIT_DELAYED_WORK(&d->mbm_over, mbm_handle_overflow);
-		mbm_setup_overflow_handler(d, MBM_OVERFLOW_INTERVAL);
+		mbm_setup_overflow_handler(d, MBM_OVERFLOW_INTERVAL,
+					   RESCTRL_PICK_ANY_CPU);
 	}
 
 	if (is_llc_occupancy_enabled())
diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
index 4c4bad3c34e4..ccbbbe5d18d3 100644
--- a/include/linux/resctrl.h
+++ b/include/linux/resctrl.h
@@ -10,6 +10,8 @@ 
 #define RESCTRL_RESERVED_CLOSID		0
 #define RESCTRL_RESERVED_RMID		0
 
+#define RESCTRL_PICK_ANY_CPU		-1
+
 #ifdef CONFIG_PROC_CPU_RESCTRL
 
 int proc_resctrl_show(struct seq_file *m,