[2/4] vmstat: skip periodic vmstat update for nohz full CPUs

Message ID 20230530145335.828634764@redhat.com
State New
Series: vmstat bug fixes for nohz_full CPUs

Commit Message

Marcelo Tosatti May 30, 2023, 2:52 p.m. UTC
The interruption caused by vmstat_update is undesirable
for certain applications:

oslat   1094.456862: sys_mlock(start: 7f7ed0000b60, len: 1000)
oslat   1094.456971: workqueue_queue_work: ... function=vmstat_update ...
oslat   1094.456974: sched_switch: prev_comm=oslat ... ==> next_comm=kworker/5:1 ...
kworker 1094.456978: sched_switch: prev_comm=kworker/5:1 ==> next_comm=oslat ...

The example above shows an additional 7us for the

       	oslat -> kworker -> oslat

switches. For a virtualized CPU, when the vmstat_update
interruption hits the host CPU running a qemu-kvm vCPU, the latency
penalty observed in the guest is higher than 50us, violating the
acceptable latency threshold.

Skip periodic updates for nohz full CPUs. Any callers who
need precise values should use a snapshot of the per-CPU
counters, or use the global counters with measures to 
handle errors up to thresholds (see calculate_normal_threshold).

Suggested by Michal Hocko.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

---
  

Comments

Michal Hocko June 2, 2023, 10:39 a.m. UTC | #1
On Tue 30-05-23 11:52:36, Marcelo Tosatti wrote:
> @@ -2022,6 +2023,16 @@ static void vmstat_shepherd(struct work_
>  	for_each_online_cpu(cpu) {
>  		struct delayed_work *dw = &per_cpu(vmstat_work, cpu);
>  
> +		/*
> +		 * Skip periodic updates for nohz full CPUs.
> +		 * Any callers who need precise values should use
> +		 * a snapshot of the per-CPU counters, or use the global
> +		 * counters with measures to handle errors up to
> +		 * thresholds (see calculate_normal_threshold).
> +		 */
> +		if (tick_nohz_full_cpu(cpu))
> +			continue;

In other code paths we have used cpu_is_isolated; is there any reason to
diverge from that here? Isn't this effectively the same kind of
problem?

> +
>  		if (!delayed_work_pending(dw) && need_update(cpu))
>  			queue_delayed_work_on(cpu, mm_percpu_wq, dw, 0);
>  
>
  
Marcelo Tosatti June 2, 2023, 4:57 p.m. UTC | #2
On Fri, Jun 02, 2023 at 12:39:04PM +0200, Michal Hocko wrote:
> On Tue 30-05-23 11:52:36, Marcelo Tosatti wrote:
> > @@ -2022,6 +2023,16 @@ static void vmstat_shepherd(struct work_
> >  	for_each_online_cpu(cpu) {
> >  		struct delayed_work *dw = &per_cpu(vmstat_work, cpu);
> >  
> > +		/*
> > +		 * Skip periodic updates for nohz full CPUs.
> > +		 * Any callers who need precise values should use
> > +		 * a snapshot of the per-CPU counters, or use the global
> > +		 * counters with measures to handle errors up to
> > +		 * thresholds (see calculate_normal_threshold).
> > +		 */
> > +		if (tick_nohz_full_cpu(cpu))
> > +			continue;
> 
> In other code paths we have used cpu_is_isolated; is there any reason to
> diverge from that here? Isn't this effectively the same kind of
> problem?

Changed to cpu_is_isolated, thanks.
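
With that change, the check in vmstat_shepherd() would read roughly as
below (a sketch of the follow-up revision, not the posted v2 hunk;
cpu_is_isolated() covers both nohz_full and domain-isolated CPUs):

```c
		if (cpu_is_isolated(cpu))
			continue;
```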
  

Patch

Index: linux-vmstat-remote/mm/vmstat.c
===================================================================
--- linux-vmstat-remote.orig/mm/vmstat.c
+++ linux-vmstat-remote/mm/vmstat.c
@@ -28,6 +28,7 @@ 
 #include <linux/mm_inline.h>
 #include <linux/page_ext.h>
 #include <linux/page_owner.h>
+#include <linux/tick.h>
 
 #include "internal.h"
 
@@ -2022,6 +2023,16 @@  static void vmstat_shepherd(struct work_
 	for_each_online_cpu(cpu) {
 		struct delayed_work *dw = &per_cpu(vmstat_work, cpu);
 
+		/*
+		 * Skip periodic updates for nohz full CPUs.
+		 * Any callers who need precise values should use
+		 * a snapshot of the per-CPU counters, or use the global
+		 * counters with measures to handle errors up to
+		 * thresholds (see calculate_normal_threshold).
+		 */
+		if (tick_nohz_full_cpu(cpu))
+			continue;
+
 		if (!delayed_work_pending(dw) && need_update(cpu))
 			queue_delayed_work_on(cpu, mm_percpu_wq, dw, 0);