[3/4] workqueue: add schedule_on_each_cpumask helper

Message ID: 20230530145335.930262644@redhat.com
State: New
Series: vmstat bug fixes for nohz_full CPUs

Commit Message

Marcelo Tosatti May 30, 2023, 2:52 p.m. UTC
  Add a schedule_on_each_cpumask function, equivalent to
schedule_on_each_cpu but accepting a cpumask to operate on.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

---
  

Comments

Andrew Morton May 30, 2023, 8:09 p.m. UTC | #1
On Tue, 30 May 2023 11:52:37 -0300 Marcelo Tosatti <mtosatti@redhat.com> wrote:

> Add a schedule_on_each_cpumask function, equivalent to
> schedule_on_each_cpu but accepting a cpumask to operate on.
> 
> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
> 
> ---
> 
> Index: linux-vmstat-remote/kernel/workqueue.c
> ===================================================================
> --- linux-vmstat-remote.orig/kernel/workqueue.c
> +++ linux-vmstat-remote/kernel/workqueue.c
> @@ -3455,6 +3455,56 @@ int schedule_on_each_cpu(work_func_t fun
>  	return 0;
>  }
>  
> +
> +/**
> + * schedule_on_each_cpumask - execute a function synchronously on each
> + * CPU in "cpumask", for those which are online.
> + *
> + * @func: the function to call
> + * @cpumask: the CPUs on which to call the function
> + *
> + * schedule_on_each_cpumask() executes @func on each specified CPU that is online,
> + * using the system workqueue and blocks until all such CPUs have completed.
> + * schedule_on_each_cpumask() is very slow.
> + *
> + * Return:
> + * 0 on success, -errno on failure.
> + */
> +int schedule_on_each_cpumask(work_func_t func, cpumask_t *cpumask)
> +{
> +	int cpu;
> +	struct work_struct __percpu *works;
> +	cpumask_var_t effmask;
> +
> +	works = alloc_percpu(struct work_struct);
> +	if (!works)
> +		return -ENOMEM;
> +
> +	if (!alloc_cpumask_var(&effmask, GFP_KERNEL)) {
> +		free_percpu(works);
> +		return -ENOMEM;
> +	}
> +
> +	cpumask_and(effmask, cpumask, cpu_online_mask);
> +
> +	cpus_read_lock();
> +
> +	for_each_cpu(cpu, effmask) {

Should we check here that the cpu is still online?

> +		struct work_struct *work = per_cpu_ptr(works, cpu);
> +
> +		INIT_WORK(work, func);
> +		schedule_work_on(cpu, work);
> +	}
> +
> +	for_each_cpu(cpu, effmask)
> +		flush_work(per_cpu_ptr(works, cpu));
> +
> +	cpus_read_unlock();
> +	free_percpu(works);
> +	free_cpumask_var(effmask);
> +	return 0;
> +}
> +
>  /**
>   * execute_in_process_context - reliably execute the routine with user context
>   * @fn:		the function to execute
> --- linux-vmstat-remote.orig/include/linux/workqueue.h
> +++ linux-vmstat-remote/include/linux/workqueue.h
> @@ -450,6 +450,7 @@ extern void __flush_workqueue(struct wor
>  extern void drain_workqueue(struct workqueue_struct *wq);
>  
>  extern int schedule_on_each_cpu(work_func_t func);
> +extern int schedule_on_each_cpumask(work_func_t func, cpumask_t *cpumask);

May as well make schedule_on_each_cpu() call
schedule_on_each_cpumask()?  Save a bit of text, and they're hardly
performance-critical to that extent.
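
A minimal sketch of that wrapper (this is what v2 below ends up doing,
with cpu_possible_mask as the argument):

	int schedule_on_each_cpu(work_func_t func)
	{
		return schedule_on_each_cpumask(func, cpu_possible_mask);
	}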
  
Marcelo Tosatti June 1, 2023, 6:13 p.m. UTC | #2
On Tue, May 30, 2023 at 01:09:47PM -0700, Andrew Morton wrote:
> On Tue, 30 May 2023 11:52:37 -0300 Marcelo Tosatti <mtosatti@redhat.com> wrote:
> 
> > Add a schedule_on_each_cpumask function, equivalent to
> > schedule_on_each_cpu but accepting a cpumask to operate on.
> > 
> > Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
> > 
> > ---
> > 
> > Index: linux-vmstat-remote/kernel/workqueue.c
> > ===================================================================
> > --- linux-vmstat-remote.orig/kernel/workqueue.c
> > +++ linux-vmstat-remote/kernel/workqueue.c
> > @@ -3455,6 +3455,56 @@ int schedule_on_each_cpu(work_func_t fun
> >  	return 0;
> >  }
> >  
> > +
> > +/**
> > + * schedule_on_each_cpumask - execute a function synchronously on each
> > + * CPU in "cpumask", for those which are online.
> > + *
> > + * @func: the function to call
> > + * @cpumask: the CPUs on which to call the function
> > + *
> > + * schedule_on_each_cpumask() executes @func on each specified CPU that is online,
> > + * using the system workqueue and blocks until all such CPUs have completed.
> > + * schedule_on_each_cpumask() is very slow.
> > + *
> > + * Return:
> > + * 0 on success, -errno on failure.
> > + */
> > +int schedule_on_each_cpumask(work_func_t func, cpumask_t *cpumask)
> > +{
> > +	int cpu;
> > +	struct work_struct __percpu *works;
> > +	cpumask_var_t effmask;
> > +
> > +	works = alloc_percpu(struct work_struct);
> > +	if (!works)
> > +		return -ENOMEM;
> > +
> > +	if (!alloc_cpumask_var(&effmask, GFP_KERNEL)) {
> > +		free_percpu(works);
> > +		return -ENOMEM;
> > +	}
> > +
> > +	cpumask_and(effmask, cpumask, cpu_online_mask);
> > +
> > +	cpus_read_lock();
> > +
> > +	for_each_cpu(cpu, effmask) {
> 
> Should we check here that the cpu is still online?
> 
> > +		struct work_struct *work = per_cpu_ptr(works, cpu);
> > +
> > +		INIT_WORK(work, func);
> > +		schedule_work_on(cpu, work);
> > +	}
> > +
> > +	for_each_cpu(cpu, effmask)
> > +		flush_work(per_cpu_ptr(works, cpu));
> > +
> > +	cpus_read_unlock();
> > +	free_percpu(works);
> > +	free_cpumask_var(effmask);
> > +	return 0;
> > +}
> > +
> >  /**
> >   * execute_in_process_context - reliably execute the routine with user context
> >   * @fn:		the function to execute
> > --- linux-vmstat-remote.orig/include/linux/workqueue.h
> > +++ linux-vmstat-remote/include/linux/workqueue.h
> > @@ -450,6 +450,7 @@ extern void __flush_workqueue(struct wor
> >  extern void drain_workqueue(struct workqueue_struct *wq);
> >  
> >  extern int schedule_on_each_cpu(work_func_t func);
> > +extern int schedule_on_each_cpumask(work_func_t func, cpumask_t *cpumask);
> 
> May as well make schedule_on_each_cpu() call
> schedule_on_each_cpumask()?  Save a bit of text, and they're hardly
> performance-critical to that extent.

Agree, will wait for Michal's review before resending -v2.


workqueue: add schedule_on_each_cpumask helper

Add a schedule_on_each_cpumask function, equivalent to
schedule_on_each_cpu but accepting a cpumask to operate on.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

---

v2: - cpu_online_mask reference should happen with cpus_read_lock held	(Andrew Morton)
    - have schedule_on_each_cpu call _cpumask variant			(Andrew Morton)

Index: linux-vmstat-remote/kernel/workqueue.c
===================================================================
--- linux-vmstat-remote.orig/kernel/workqueue.c
+++ linux-vmstat-remote/kernel/workqueue.c
@@ -3431,27 +3431,56 @@ EXPORT_SYMBOL(cancel_delayed_work_sync);
  */
 int schedule_on_each_cpu(work_func_t func)
 {
+	return schedule_on_each_cpumask(func, cpu_possible_mask);
+}
+
+
+/**
+ * schedule_on_each_cpumask - execute a function synchronously on each
+ * CPU in "cpumask", for those which are online.
+ *
+ * @func: the function to call
+ * @cpumask: the CPUs on which to call the function
+ *
+ * schedule_on_each_cpumask() executes @func on each specified CPU that is online,
+ * using the system workqueue and blocks until all such CPUs have completed.
+ * schedule_on_each_cpumask() is very slow.
+ *
+ * Return:
+ * 0 on success, -errno on failure.
+ */
+int schedule_on_each_cpumask(work_func_t func, cpumask_t *cpumask)
+{
 	int cpu;
 	struct work_struct __percpu *works;
+	cpumask_var_t effmask;
 
 	works = alloc_percpu(struct work_struct);
 	if (!works)
 		return -ENOMEM;
 
+	if (!alloc_cpumask_var(&effmask, GFP_KERNEL)) {
+		free_percpu(works);
+		return -ENOMEM;
+	}
+
 	cpus_read_lock();
 
-	for_each_online_cpu(cpu) {
+	cpumask_and(effmask, cpumask, cpu_online_mask);
+
+	for_each_cpu(cpu, effmask) {
 		struct work_struct *work = per_cpu_ptr(works, cpu);
 
 		INIT_WORK(work, func);
 		schedule_work_on(cpu, work);
 	}
 
-	for_each_online_cpu(cpu)
+	for_each_cpu(cpu, effmask)
 		flush_work(per_cpu_ptr(works, cpu));
 
 	cpus_read_unlock();
 	free_percpu(works);
+	free_cpumask_var(effmask);
 	return 0;
 }
 
Index: linux-vmstat-remote/include/linux/workqueue.h
===================================================================
--- linux-vmstat-remote.orig/include/linux/workqueue.h
+++ linux-vmstat-remote/include/linux/workqueue.h
@@ -450,6 +450,7 @@ extern void __flush_workqueue(struct wor
 extern void drain_workqueue(struct workqueue_struct *wq);
 
 extern int schedule_on_each_cpu(work_func_t func);
+extern int schedule_on_each_cpumask(work_func_t func, cpumask_t *cpumask);
 
 int execute_in_process_context(work_func_t fn, struct execute_work *);
  
Michal Hocko June 2, 2023, 10:48 a.m. UTC | #3
You should be CCing WQ maintainers on changes like this one (now added).

On Tue 30-05-23 11:52:37, Marcelo Tosatti wrote:
> Add a schedule_on_each_cpumask function, equivalent to
> schedule_on_each_cpu but accepting a cpumask to operate on.

IMHO it is preferable to add a new function along with its user so that
the usecase is more clear.
 
> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
> 
> ---
> 
> Index: linux-vmstat-remote/kernel/workqueue.c
> ===================================================================
> --- linux-vmstat-remote.orig/kernel/workqueue.c
> +++ linux-vmstat-remote/kernel/workqueue.c
> @@ -3455,6 +3455,56 @@ int schedule_on_each_cpu(work_func_t fun
>  	return 0;
>  }
>  
> +
> +/**
> + * schedule_on_each_cpumask - execute a function synchronously on each
> + * CPU in "cpumask", for those which are online.
> + *
> + * @func: the function to call
> + * @cpumask: the CPUs on which to call the function
> + *
> + * schedule_on_each_cpumask() executes @func on each specified CPU that is online,
> + * using the system workqueue and blocks until all such CPUs have completed.
> + * schedule_on_each_cpumask() is very slow.
> + *
> + * Return:
> + * 0 on success, -errno on failure.
> + */
> +int schedule_on_each_cpumask(work_func_t func, cpumask_t *cpumask)
> +{
> +	int cpu;
> +	struct work_struct __percpu *works;
> +	cpumask_var_t effmask;
> +
> +	works = alloc_percpu(struct work_struct);
> +	if (!works)
> +		return -ENOMEM;
> +
> +	if (!alloc_cpumask_var(&effmask, GFP_KERNEL)) {
> +		free_percpu(works);
> +		return -ENOMEM;
> +	}
> +
> +	cpumask_and(effmask, cpumask, cpu_online_mask);
> +
> +	cpus_read_lock();
> +
> +	for_each_cpu(cpu, effmask) {

Is the cpu_online_mask dance really necessary? Why cannot you simply do
for_each_online_cpu here? flush_work on unqueued work item should just
return, no?

Also there is no synchronization with cpu hotplug, so cpu_online_mask
can change under your feet; this construct seems unsafe to me.

> +		struct work_struct *work = per_cpu_ptr(works, cpu);
> +
> +		INIT_WORK(work, func);
> +		schedule_work_on(cpu, work);
> +	}
> +
> +	for_each_cpu(cpu, effmask)
> +		flush_work(per_cpu_ptr(works, cpu));
> +
> +	cpus_read_unlock();
> +	free_percpu(works);
> +	free_cpumask_var(effmask);
> +	return 0;
> +}
> +
>  /**
>   * execute_in_process_context - reliably execute the routine with user context
>   * @fn:		the function to execute
> Index: linux-vmstat-remote/include/linux/workqueue.h
> ===================================================================
> --- linux-vmstat-remote.orig/include/linux/workqueue.h
> +++ linux-vmstat-remote/include/linux/workqueue.h
> @@ -450,6 +450,7 @@ extern void __flush_workqueue(struct wor
>  extern void drain_workqueue(struct workqueue_struct *wq);
>  
>  extern int schedule_on_each_cpu(work_func_t func);
> +extern int schedule_on_each_cpumask(work_func_t func, cpumask_t *cpumask);
>  
>  int execute_in_process_context(work_func_t fn, struct execute_work *);
>  
>
  
Marcelo Tosatti June 2, 2023, 5:04 p.m. UTC | #4
On Fri, Jun 02, 2023 at 12:48:23PM +0200, Michal Hocko wrote:
> You should be CCing WQ maintainers on changes like this one (now added).
> 
> On Tue 30-05-23 11:52:37, Marcelo Tosatti wrote:
> > Add a schedule_on_each_cpumask function, equivalent to
> > schedule_on_each_cpu but accepting a cpumask to operate on.
> 
> IMHO it is preferable to add a new function along with its user so that
> the usecase is more clear.
>  
> > Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
> > 
> > ---
> > 
> > Index: linux-vmstat-remote/kernel/workqueue.c
> > ===================================================================
> > --- linux-vmstat-remote.orig/kernel/workqueue.c
> > +++ linux-vmstat-remote/kernel/workqueue.c
> > @@ -3455,6 +3455,56 @@ int schedule_on_each_cpu(work_func_t fun
> >  	return 0;
> >  }
> >  
> > +
> > +/**
> > + * schedule_on_each_cpumask - execute a function synchronously on each
> > + * CPU in "cpumask", for those which are online.
> > + *
> > + * @func: the function to call
> > + * @cpumask: the CPUs on which to call the function
> > + *
> > + * schedule_on_each_cpumask() executes @func on each specified CPU that is online,
> > + * using the system workqueue and blocks until all such CPUs have completed.
> > + * schedule_on_each_cpumask() is very slow.
> > + *
> > + * Return:
> > + * 0 on success, -errno on failure.
> > + */
> > +int schedule_on_each_cpumask(work_func_t func, cpumask_t *cpumask)
> > +{
> > +	int cpu;
> > +	struct work_struct __percpu *works;
> > +	cpumask_var_t effmask;
> > +
> > +	works = alloc_percpu(struct work_struct);
> > +	if (!works)
> > +		return -ENOMEM;
> > +
> > +	if (!alloc_cpumask_var(&effmask, GFP_KERNEL)) {
> > +		free_percpu(works);
> > +		return -ENOMEM;
> > +	}
> > +
> > +	cpumask_and(effmask, cpumask, cpu_online_mask);
> > +
> > +	cpus_read_lock();
> > +
> > +	for_each_cpu(cpu, effmask) {
> 
> Is the cpu_online_mask dance really necessary? 

> Why cannot you simply do for_each_online_cpu here? 

Are you suggesting to do: 

	for_each_online_cpu(cpu) {
		if (!cpumask_test_cpu(cpu, cpumask))
			continue;
		...
	}

This does not seem efficient.

> flush_work on unqueued work item should just
> return, no?

Apparently not:

commit 0e8d6a9336b487a1dd6f1991ff376e669d4c87c6
Author: Thomas Gleixner <tglx@linutronix.de>
Date:   Wed Apr 12 22:07:28 2017 +0200

    workqueue: Provide work_on_cpu_safe()
    
    work_on_cpu() is not protected against CPU hotplug. For code which requires
    to be either executed on an online CPU or to fail if the CPU is not
    available the callsite would have to protect against CPU hotplug.
    
    Provide a function which does get/put_online_cpus() around the call to
    work_on_cpu() and fails the call with -ENODEV if the target CPU is not
    online.

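For context, the guard that commit added follows this pattern (a sketch:
modern kernels spell the guard cpus_read_lock()/cpus_read_unlock(); the
original commit used get_online_cpus()/put_online_cpus()):

	long work_on_cpu_safe(int cpu, long (*fn)(void *), void *arg)
	{
		long ret = -ENODEV;

		cpus_read_lock();
		/* Only dispatch if the target CPU is still online. */
		if (cpu_online(cpu))
			ret = work_on_cpu(cpu, fn, arg);
		cpus_read_unlock();
		return ret;
	}
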
> Also there is no synchronization with cpu hotplug, so cpu_online_mask
> can change under your feet; this construct seems unsafe to me.

Yes, fixed by patch in response to Andrew's comment.
  
Michal Hocko June 5, 2023, 7:12 a.m. UTC | #5
On Fri 02-06-23 14:04:28, Marcelo Tosatti wrote:
> On Fri, Jun 02, 2023 at 12:48:23PM +0200, Michal Hocko wrote:
[...]
> > > +	if (!alloc_cpumask_var(&effmask, GFP_KERNEL)) {
> > > +		free_percpu(works);
> > > +		return -ENOMEM;
> > > +	}
> > > +
> > > +	cpumask_and(effmask, cpumask, cpu_online_mask);
> > > +
> > > +	cpus_read_lock();
> > > +
> > > +	for_each_cpu(cpu, effmask) {
> > 
> > Is the cpu_online_mask dance really necessary? 
> 
> > Why cannot you simply do for_each_online_cpu here? 
> 
> Are you suggesting to do: 
> 
> 	for_each_online_cpu(cpu) {
> 		if (!cpumask_test_cpu(cpu, cpumask))
> 			continue;
> 		...
> 	}
> 
> This does not seem efficient.

Are you sure this is less efficient than a memory allocation?
  

Patch

Index: linux-vmstat-remote/kernel/workqueue.c
===================================================================
--- linux-vmstat-remote.orig/kernel/workqueue.c
+++ linux-vmstat-remote/kernel/workqueue.c
@@ -3455,6 +3455,56 @@  int schedule_on_each_cpu(work_func_t fun
 	return 0;
 }
 
+
+/**
+ * schedule_on_each_cpumask - execute a function synchronously on each
+ * CPU in "cpumask", for those which are online.
+ *
+ * @func: the function to call
+ * @cpumask: the CPUs on which to call the function
+ *
+ * schedule_on_each_cpumask() executes @func on each specified CPU that is online,
+ * using the system workqueue and blocks until all such CPUs have completed.
+ * schedule_on_each_cpumask() is very slow.
+ *
+ * Return:
+ * 0 on success, -errno on failure.
+ */
+int schedule_on_each_cpumask(work_func_t func, cpumask_t *cpumask)
+{
+	int cpu;
+	struct work_struct __percpu *works;
+	cpumask_var_t effmask;
+
+	works = alloc_percpu(struct work_struct);
+	if (!works)
+		return -ENOMEM;
+
+	if (!alloc_cpumask_var(&effmask, GFP_KERNEL)) {
+		free_percpu(works);
+		return -ENOMEM;
+	}
+
+	cpumask_and(effmask, cpumask, cpu_online_mask);
+
+	cpus_read_lock();
+
+	for_each_cpu(cpu, effmask) {
+		struct work_struct *work = per_cpu_ptr(works, cpu);
+
+		INIT_WORK(work, func);
+		schedule_work_on(cpu, work);
+	}
+
+	for_each_cpu(cpu, effmask)
+		flush_work(per_cpu_ptr(works, cpu));
+
+	cpus_read_unlock();
+	free_percpu(works);
+	free_cpumask_var(effmask);
+	return 0;
+}
+
 /**
  * execute_in_process_context - reliably execute the routine with user context
  * @fn:		the function to execute
Index: linux-vmstat-remote/include/linux/workqueue.h
===================================================================
--- linux-vmstat-remote.orig/include/linux/workqueue.h
+++ linux-vmstat-remote/include/linux/workqueue.h
@@ -450,6 +450,7 @@  extern void __flush_workqueue(struct wor
 extern void drain_workqueue(struct workqueue_struct *wq);
 
 extern int schedule_on_each_cpu(work_func_t func);
+extern int schedule_on_each_cpumask(work_func_t func, cpumask_t *cpumask);
 
 int execute_in_process_context(work_func_t fn, struct execute_work *);