Message ID | 20230602190115.521067386@redhat.com |
---|---|
State | New |
Headers |
Series | vmstat bug fixes for nohz_full and isolated CPUs |
Commit Message
Marcelo Tosatti
June 2, 2023, 6:57 p.m. UTC
The interruption caused by vmstat_update is undesirable
for certain applications:
oslat 1094.456862: sys_mlock(start: 7f7ed0000b60, len: 1000)
oslat 1094.456971: workqueue_queue_work: ... function=vmstat_update ...
oslat 1094.456974: sched_switch: prev_comm=oslat ... ==> next_comm=kworker/5:1 ...
kworker 1094.456978: sched_switch: prev_comm=kworker/5:1 ==> next_comm=oslat ...
The example above shows an additional 7us for the
oslat -> kworker -> oslat
switches. In the case of a virtualized CPU, and the vmstat_update
interruption in the host (of a qemu-kvm vcpu), the latency penalty
observed in the guest is higher than 50us, violating the acceptable
latency threshold.
Skip periodic updates for nohz full CPUs. Any callers who
need precise values should use a snapshot of the per-CPU
counters, or use the global counters with measures to
handle errors up to thresholds (see calculate_normal_threshold).
Suggested by Michal Hocko.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
---
v2: use cpu_is_isolated (Michal Hocko)
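
The "snapshot of the per-CPU counters" that the changelog points precise
readers at is the zone_page_state_snapshot() interface. A sketch of its
shape, approximating include/linux/vmstat.h from kernels of this era
(field names vary between versions, so treat the details as assumptions):

static inline unsigned long zone_page_state_snapshot(struct zone *zone,
					enum zone_stat_item item)
{
	long x = atomic_long_read(&zone->vm_stat[item]);

#ifdef CONFIG_SMP
	int cpu;

	/* Fold in the not-yet-flushed per-CPU deltas for a precise read. */
	for_each_online_cpu(cpu)
		x += per_cpu_ptr(zone->per_cpu_zonestats, cpu)->vm_stat_diff[item];

	if (x < 0)
		x = 0;
#endif
	return x;
}

Callers that can tolerate drift instead read the global counter directly
and accept an error bounded by the per-zone sync threshold.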
Comments
On Fri 02-06-23 15:57:59, Marcelo Tosatti wrote:
> The interruption caused by vmstat_update is undesirable
> for certain applications:
>
> oslat 1094.456862: sys_mlock(start: 7f7ed0000b60, len: 1000)
> oslat 1094.456971: workqueue_queue_work: ... function=vmstat_update ...
> oslat 1094.456974: sched_switch: prev_comm=oslat ... ==> next_comm=kworker/5:1 ...
> kworker 1094.456978: sched_switch: prev_comm=kworker/5:1 ==> next_comm=oslat ...
>
> The example above shows an additional 7us for the
>
> oslat -> kworker -> oslat
>
> switches. In the case of a virtualized CPU, and the vmstat_update
> interruption in the host (of a qemu-kvm vcpu), the latency penalty
> observed in the guest is higher than 50us, violating the acceptable
> latency threshold.

I personally find the above problem description insufficient. I have
asked several times and only got piece by piece information each time.
Maybe there is a reason to be secretive but it would be great to get at
least some basic expectations described and what they are based on.

E.g. workloads are running on isolated cpus with nohz full mode to
shield off any kernel interruption. Yet there are operations that update
counters (like mlock, but not mlock alone) that update per cpu counters
that will eventually get flushed and that will cause some interference.
Now the host/guest transition and interference. How does that happen when
the guest is running on an isolated and dedicated cpu?

> Skip periodic updates for nohz full CPUs. Any callers who
> need precise values should use a snapshot of the per-CPU
> counters, or use the global counters with measures to
> handle errors up to thresholds (see calculate_normal_threshold).

I would rephrase this paragraph.
In-kernel users of vmstat counters either require the precise value and
they are using the zone_page_state_snapshot interface, or they can live with
an imprecision as the regular flushing can happen at an arbitrary time and
the cumulative error can grow (see calculate_normal_threshold).

From that POV the regular flushing can be postponed for CPUs that have
been isolated from the kernel interference without critical
infrastructure ever noticing. Skip regular flushing from vmstat_shepherd
for all isolated CPUs to avoid interference with the isolated workload.

> Suggested by Michal Hocko.
>
> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

Acked-by: Michal Hocko <mhocko@suse.com>

>
> ---
>
> v2: use cpu_is_isolated (Michal Hocko)
>
> Index: linux-vmstat-remote/mm/vmstat.c
> ===================================================================
> --- linux-vmstat-remote.orig/mm/vmstat.c
> +++ linux-vmstat-remote/mm/vmstat.c
> @@ -28,6 +28,7 @@
>  #include <linux/mm_inline.h>
>  #include <linux/page_ext.h>
>  #include <linux/page_owner.h>
> +#include <linux/sched/isolation.h>
>
>  #include "internal.h"
>
> @@ -2022,6 +2023,16 @@ static void vmstat_shepherd(struct work_
>  	for_each_online_cpu(cpu) {
>  		struct delayed_work *dw = &per_cpu(vmstat_work, cpu);
>
> +		/*
> +		 * Skip periodic updates for isolated CPUs.
> +		 * Any callers who need precise values should use
> +		 * a snapshot of the per-CPU counters, or use the global
> +		 * counters with measures to handle errors up to
> +		 * thresholds (see calculate_normal_threshold).
> +		 */
> +		if (cpu_is_isolated(cpu))
> +			continue;
> +
>  		if (!delayed_work_pending(dw) && need_update(cpu))
>  			queue_delayed_work_on(cpu, mm_percpu_wq, dw, 0);
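For context on the bounded "cumulative error" Michal refers to: the per-CPU
deltas are only flushed once they exceed a per-zone sync threshold. A sketch
of how that threshold is sized, approximating calculate_normal_threshold()
in mm/vmstat.c of this era (treat the exact formula and cap as
version-dependent assumptions, not the tree this patch was posted against):

static int calculate_normal_threshold(struct zone *zone)
{
	int threshold;
	int mem;	/* memory in 128 MB units */

	/*
	 * The threshold scales with the number of processors and the
	 * amount of memory per zone: more memory allows deferring
	 * updates longer, more processors could lead to more contention.
	 */
	mem = zone_managed_pages(zone) >> (27 - PAGE_SHIFT);

	threshold = 2 * fls(num_online_cpus()) * (1 + fls(mem));

	/* Cap the per-CPU delta before a forced flush. */
	return min(125, threshold);
}

So a global counter read can be stale by at most roughly
threshold * num_online_cpus() events per item, which is the imprecision
that tolerant callers already accept.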
On Mon, Jun 05, 2023 at 09:55:57AM +0200, Michal Hocko wrote:
> On Fri 02-06-23 15:57:59, Marcelo Tosatti wrote:
> > The interruption caused by vmstat_update is undesirable
> > for certain applications:
> >
> > oslat 1094.456862: sys_mlock(start: 7f7ed0000b60, len: 1000)
> > oslat 1094.456971: workqueue_queue_work: ... function=vmstat_update ...
> > oslat 1094.456974: sched_switch: prev_comm=oslat ... ==> next_comm=kworker/5:1 ...
> > kworker 1094.456978: sched_switch: prev_comm=kworker/5:1 ==> next_comm=oslat ...
> >
> > The example above shows an additional 7us for the
> >
> > oslat -> kworker -> oslat
> >
> > switches. In the case of a virtualized CPU, and the vmstat_update
> > interruption in the host (of a qemu-kvm vcpu), the latency penalty
> > observed in the guest is higher than 50us, violating the acceptable
> > latency threshold.
>
> I personally find the above problem description insufficient. I have
> asked several times and only got piece by piece information each time.
> Maybe there is a reason to be secretive but it would be great to get at
> least some basic expectations described and what they are based on.

There is no reason to be secretive.

> E.g. workloads are running on isolated cpus with nohz full mode to
> shield off any kernel interruption. Yet there are operations that update
> counters (like mlock, but not mlock alone) that update per cpu counters
> that will eventually get flushed and that will cause some interference.
> Now the host/guest transition and interference. How does that happen when
> the guest is running on an isolated and dedicated cpu?

Follows the updated changelog. Does it contain the information
requested?

----

Performance details for the kworker interruption:

With workloads that are running on isolated cpus with nohz full mode to
shield off any kernel interruption. For example, a VM running a
time sensitive application with a 50us maximum acceptable interruption
(use case: soft PLC).

oslat 1094.456862: sys_mlock(start: 7f7ed0000b60, len: 1000)
oslat 1094.456971: workqueue_queue_work: ... function=vmstat_update ...
oslat 1094.456974: sched_switch: prev_comm=oslat ... ==> next_comm=kworker/5:1 ...
kworker 1094.456978: sched_switch: prev_comm=kworker/5:1 ==> next_comm=oslat ...

The example above shows an additional 7us for the

oslat -> kworker -> oslat

switches. In the case of a virtualized CPU, and the vmstat_update
interruption in the host (of a qemu-kvm vcpu), the latency penalty
observed in the guest is higher than 50us, violating the acceptable
latency threshold.

The isolated vCPU can perform operations that modify per-CPU page counters,
for example to complete I/O operations:

CPU 11/KVM-9540 [001] dNh1. 2314.248584: mod_zone_page_state <-__folio_end_writeback
CPU 11/KVM-9540 [001] dNh1. 2314.248585: <stack trace>
=> 0xffffffffc042b083
=> mod_zone_page_state
=> __folio_end_writeback
=> folio_end_writeback
=> iomap_finish_ioend
=> blk_mq_end_request_batch
=> nvme_irq
=> __handle_irq_event_percpu
=> handle_irq_event
=> handle_edge_irq
=> __common_interrupt
=> common_interrupt
=> asm_common_interrupt
=> vmx_do_interrupt_nmi_irqoff
=> vmx_handle_exit_irqoff
=> vcpu_enter_guest
=> vcpu_run
=> kvm_arch_vcpu_ioctl_run
=> kvm_vcpu_ioctl
=> __x64_sys_ioctl
=> do_syscall_64
=> entry_SYSCALL_64_after_hwframe

> > Skip periodic updates for nohz full CPUs. Any callers who
> > need precise values should use a snapshot of the per-CPU
> > counters, or use the global counters with measures to
> > handle errors up to thresholds (see calculate_normal_threshold).
>
> I would rephrase this paragraph.
> In-kernel users of vmstat counters either require the precise value and
> they are using the zone_page_state_snapshot interface, or they can live with
> an imprecision as the regular flushing can happen at an arbitrary time and
> the cumulative error can grow (see calculate_normal_threshold).
>
> From that POV the regular flushing can be postponed for CPUs that have
> been isolated from the kernel interference without critical
> infrastructure ever noticing. Skip regular flushing from vmstat_shepherd
> for all isolated CPUs to avoid interference with the isolated workload.
>
> > Suggested by Michal Hocko.
> >
> > Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
>
> Acked-by: Michal Hocko <mhocko@suse.com>

OK, updated comment, thanks.
On Mon 05-06-23 11:53:56, Marcelo Tosatti wrote:
> On Mon, Jun 05, 2023 at 09:55:57AM +0200, Michal Hocko wrote:
> > On Fri 02-06-23 15:57:59, Marcelo Tosatti wrote:
> > > The interruption caused by vmstat_update is undesirable
> > > for certain applications:
> > >
> > > oslat 1094.456862: sys_mlock(start: 7f7ed0000b60, len: 1000)
> > > oslat 1094.456971: workqueue_queue_work: ... function=vmstat_update ...
> > > oslat 1094.456974: sched_switch: prev_comm=oslat ... ==> next_comm=kworker/5:1 ...
> > > kworker 1094.456978: sched_switch: prev_comm=kworker/5:1 ==> next_comm=oslat ...
> > >
> > > The example above shows an additional 7us for the
> > >
> > > oslat -> kworker -> oslat
> > >
> > > switches. In the case of a virtualized CPU, and the vmstat_update
> > > interruption in the host (of a qemu-kvm vcpu), the latency penalty
> > > observed in the guest is higher than 50us, violating the acceptable
> > > latency threshold.
> >
> > I personally find the above problem description insufficient. I have
> > asked several times and only got piece by piece information each time.
> > Maybe there is a reason to be secretive but it would be great to get at
> > least some basic expectations described and what they are based on.
>
> There is no reason to be secretive.
>
> > E.g. workloads are running on isolated cpus with nohz full mode to
> > shield off any kernel interruption. Yet there are operations that update
> > counters (like mlock, but not mlock alone) that update per cpu counters
> > that will eventually get flushed and that will cause some interference.
> > Now the host/guest transition and interference. How does that happen when
> > the guest is running on an isolated and dedicated cpu?
>
> Follows the updated changelog. Does it contain the information
> requested?
>
> ----
>
> Performance details for the kworker interruption:
>
> With workloads that are running on isolated cpus with nohz full mode to
> shield off any kernel interruption. For example, a VM running a
> time sensitive application with a 50us maximum acceptable interruption
> (use case: soft PLC).
>
> oslat 1094.456862: sys_mlock(start: 7f7ed0000b60, len: 1000)
> oslat 1094.456971: workqueue_queue_work: ... function=vmstat_update ...
> oslat 1094.456974: sched_switch: prev_comm=oslat ... ==> next_comm=kworker/5:1 ...
> kworker 1094.456978: sched_switch: prev_comm=kworker/5:1 ==> next_comm=oslat ...
>
> The example above shows an additional 7us for the
>
> oslat -> kworker -> oslat
>
> switches. In the case of a virtualized CPU, and the vmstat_update
> interruption in the host (of a qemu-kvm vcpu), the latency penalty
> observed in the guest is higher than 50us, violating the acceptable
> latency threshold.
>
> The isolated vCPU can perform operations that modify per-CPU page counters,
> for example to complete I/O operations:
>
> CPU 11/KVM-9540 [001] dNh1. 2314.248584: mod_zone_page_state <-__folio_end_writeback
> CPU 11/KVM-9540 [001] dNh1. 2314.248585: <stack trace>
> => 0xffffffffc042b083
> => mod_zone_page_state
> => __folio_end_writeback
> => folio_end_writeback
> => iomap_finish_ioend
> => blk_mq_end_request_batch
> => nvme_irq
> => __handle_irq_event_percpu
> => handle_irq_event
> => handle_edge_irq
> => __common_interrupt
> => common_interrupt
> => asm_common_interrupt
> => vmx_do_interrupt_nmi_irqoff
> => vmx_handle_exit_irqoff
> => vcpu_enter_guest
> => vcpu_run
> => kvm_arch_vcpu_ioctl_run
> => kvm_vcpu_ioctl
> => __x64_sys_ioctl
> => do_syscall_64
> => entry_SYSCALL_64_after_hwframe

OK, this is really useful. It is just not really clear whether the IO
triggered here is from the guest or it is a host activity. Overall this
is much better!
On Mon, Jun 05, 2023 at 05:55:49PM +0200, Michal Hocko wrote:
> > The example above shows an additional 7us for the
> >
> > oslat -> kworker -> oslat
> >
> > switches. In the case of a virtualized CPU, and the vmstat_update
> > interruption in the host (of a qemu-kvm vcpu), the latency penalty
> > observed in the guest is higher than 50us, violating the acceptable
> > latency threshold.
> >
> > The isolated vCPU can perform operations that modify per-CPU page counters,
> > for example to complete I/O operations:
> >
> > CPU 11/KVM-9540 [001] dNh1. 2314.248584: mod_zone_page_state <-__folio_end_writeback
> > CPU 11/KVM-9540 [001] dNh1. 2314.248585: <stack trace>
> > => 0xffffffffc042b083
> > => mod_zone_page_state
> > => __folio_end_writeback
> > => folio_end_writeback
> > => iomap_finish_ioend
> > => blk_mq_end_request_batch
> > => nvme_irq
> > => __handle_irq_event_percpu
> > => handle_irq_event
> > => handle_edge_irq
> > => __common_interrupt
> > => common_interrupt
> > => asm_common_interrupt
> > => vmx_do_interrupt_nmi_irqoff
> > => vmx_handle_exit_irqoff
> > => vcpu_enter_guest
> > => vcpu_run
> > => kvm_arch_vcpu_ioctl_run
> > => kvm_vcpu_ioctl
> > => __x64_sys_ioctl
> > => do_syscall_64
> > => entry_SYSCALL_64_after_hwframe
>
> OK, this is really useful. It is just not really clear whether the IO
> triggered here is from the guest or it is a host activity.

Guest initiated I/O, since the host CPU is isolated.
On Mon 05-06-23 14:35:56, Marcelo Tosatti wrote:
> On Mon, Jun 05, 2023 at 05:55:49PM +0200, Michal Hocko wrote:
> > > The example above shows an additional 7us for the
> > >
> > > oslat -> kworker -> oslat
> > >
> > > switches. In the case of a virtualized CPU, and the vmstat_update
> > > interruption in the host (of a qemu-kvm vcpu), the latency penalty
> > > observed in the guest is higher than 50us, violating the acceptable
> > > latency threshold.
> > >
> > > The isolated vCPU can perform operations that modify per-CPU page counters,
> > > for example to complete I/O operations:
> > >
> > > CPU 11/KVM-9540 [001] dNh1. 2314.248584: mod_zone_page_state <-__folio_end_writeback
> > > CPU 11/KVM-9540 [001] dNh1. 2314.248585: <stack trace>
> > > => 0xffffffffc042b083
> > > => mod_zone_page_state
> > > => __folio_end_writeback
> > > => folio_end_writeback
> > > => iomap_finish_ioend
> > > => blk_mq_end_request_batch
> > > => nvme_irq
> > > => __handle_irq_event_percpu
> > > => handle_irq_event
> > > => handle_edge_irq
> > > => __common_interrupt
> > > => common_interrupt
> > > => asm_common_interrupt
> > > => vmx_do_interrupt_nmi_irqoff
> > > => vmx_handle_exit_irqoff
> > > => vcpu_enter_guest
> > > => vcpu_run
> > > => kvm_arch_vcpu_ioctl_run
> > > => kvm_vcpu_ioctl
> > > => __x64_sys_ioctl
> > > => do_syscall_64
> > > => entry_SYSCALL_64_after_hwframe
> >
> > OK, this is really useful. It is just not really clear whether the IO
> > triggered here is from the guest or it is a host activity.
>
> Guest initiated I/O, since the host CPU is isolated.

Make it explicit in the changelog. I am just wondering how you can
achieve your strict deadlines when IO is involved but that is another
story I guess.
On Mon, Jun 05, 2023 at 08:57:15PM +0200, Michal Hocko wrote:
> > Guest initiated I/O, since the host CPU is isolated.
>
> Make it explicit in the changelog.

I think a better use of our time would be to focus on

https://lkml.iu.edu/hypermail/linux/kernel/2209.1/01263.html

1) Operate in terms of add CPU, remove CPU on sysfs (to avoid races).
2) Don't allow all CPUs to be marked as "block_interf".
3) Remove percpu rwsem lock.

> I am just wondering how you can achieve your strict deadlines when IO is
> involved but that is another story I guess.

IO can be submitted when the ELF binary and libraries are read from the
virtual disk, for example.
Index: linux-vmstat-remote/mm/vmstat.c
===================================================================
--- linux-vmstat-remote.orig/mm/vmstat.c
+++ linux-vmstat-remote/mm/vmstat.c
@@ -28,6 +28,7 @@
 #include <linux/mm_inline.h>
 #include <linux/page_ext.h>
 #include <linux/page_owner.h>
+#include <linux/sched/isolation.h>
 
 #include "internal.h"
 
@@ -2022,6 +2023,16 @@ static void vmstat_shepherd(struct work_
 	for_each_online_cpu(cpu) {
 		struct delayed_work *dw = &per_cpu(vmstat_work, cpu);
 
+		/*
+		 * Skip periodic updates for isolated CPUs.
+		 * Any callers who need precise values should use
+		 * a snapshot of the per-CPU counters, or use the global
+		 * counters with measures to handle errors up to
+		 * thresholds (see calculate_normal_threshold).
+		 */
+		if (cpu_is_isolated(cpu))
+			continue;
+
 		if (!delayed_work_pending(dw) && need_update(cpu))
 			queue_delayed_work_on(cpu, mm_percpu_wq, dw, 0);
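
For context, cpu_is_isolated() comes from <linux/sched/isolation.h>. Around
the time of this series it reduces to a pair of housekeeping tests, roughly
as sketched below (an assumption based on kernels of this era; later kernels
also consult cpuset isolated partitions). This is why the v2 check covers
both nohz_full= and isolcpus= CPUs, as requested in review:

/* Sketch of include/linux/sched/isolation.h, circa v6.4 (assumption). */
static inline bool cpu_is_isolated(int cpu)
{
	/* Not a housekeeping CPU for domains (isolcpus=) or the tick (nohz_full=). */
	return !housekeeping_test_cpu(cpu, HK_TYPE_DOMAIN) ||
	       !housekeeping_test_cpu(cpu, HK_TYPE_TICK);
}

With the guard in place, vmstat_shepherd keeps queueing vmstat_update only
on housekeeping CPUs, and isolated CPUs rely on the bounded-error and
snapshot mechanisms described above.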