From patchwork Wed Dec 21 16:58:02 2022
X-Patchwork-Submitter: Marcelo Tosatti
X-Patchwork-Id: 35442
Message-ID: <20221221170436.252896271@redhat.com>
User-Agent: quilt/0.66
Date: Wed, 21 Dec 2022 13:58:02 -0300
From: Marcelo Tosatti
To: atomlin@atomlin.com, frederic@kernel.org
Cc: cl@linux.com, tglx@linutronix.de, mingo@kernel.org,
 peterz@infradead.org, pauld@redhat.com, neelx@redhat.com,
 oleksandr@natalenko.name, linux-kernel@vger.kernel.org,
 linux-mm@kvack.org, Marcelo Tosatti
Subject: [PATCH v11 1/6] mm/vmstat: Add CPU-specific variable to track a vmstat discrepancy
References: <20221221165801.362118576@redhat.com>

From: Aaron Tomlin

Introduce a CPU-specific variable, namely vmstat_dirty, to indicate
whether a vmstat imbalance is present for a given CPU.
Therefore, at the appropriate time, we can fold all the remaining
differentials.

This patch also provides trivial helpers for modification and testing.

Signed-off-by: Aaron Tomlin
Signed-off-by: Marcelo Tosatti
---
 mm/vmstat.c | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

Index: linux-2.6/mm/vmstat.c
===================================================================
--- linux-2.6.orig/mm/vmstat.c
+++ linux-2.6/mm/vmstat.c
@@ -194,6 +194,22 @@ void fold_vm_numa_events(void)
 #endif
 
 #ifdef CONFIG_SMP
+static DEFINE_PER_CPU_ALIGNED(bool, vmstat_dirty);
+
+static inline void vmstat_mark_dirty(void)
+{
+	this_cpu_write(vmstat_dirty, true);
+}
+
+static inline void vmstat_clear_dirty(void)
+{
+	this_cpu_write(vmstat_dirty, false);
+}
+
+static inline bool is_vmstat_dirty(void)
+{
+	return this_cpu_read(vmstat_dirty);
+}
 
 int calculate_pressure_threshold(struct zone *zone)
 {

From patchwork Wed Dec 21 16:58:03 2022
X-Patchwork-Submitter: Marcelo Tosatti
X-Patchwork-Id: 35439
Message-ID: <20221221170436.292370701@redhat.com>
User-Agent: quilt/0.66
Date: Wed, 21 Dec 2022 13:58:03 -0300
From: Marcelo Tosatti
To: atomlin@atomlin.com, frederic@kernel.org
Cc: cl@linux.com, tglx@linutronix.de, mingo@kernel.org,
 peterz@infradead.org, pauld@redhat.com, neelx@redhat.com,
 oleksandr@natalenko.name, linux-kernel@vger.kernel.org,
 linux-mm@kvack.org, Marcelo Tosatti
Subject: [PATCH v11 2/6] mm/vmstat: Use vmstat_dirty to track CPU-specific vmstat discrepancies
References: <20221221165801.362118576@redhat.com>

From: Aaron Tomlin

Use the previously introduced CPU-specific variable, vmstat_dirty, to
indicate whether a vmstat differential (or imbalance) is present for a
given CPU, so that vmstat processing can be initiated at the
appropriate time.

The hope is that this approach is "cheaper" than need_update(), which
scans the per-CPU zone and node differentials with memchr_inv();
testing vmstat_dirty is a single per-CPU read.

The idea is based on Marcelo's patch [1].
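To illustrate the difference between the two checks, below is a minimal
userspace model. It is a sketch under stated assumptions, not kernel
code: plain arrays stand in for the per-CPU differentials, and every
model_* name is hypothetical. It shows that the flag is a conservative
substitute for the scan: it can remain set after the differentials
cancel out, but in this single-threaded model it is never clear while a
differential is non-zero.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_NR_CPUS  4
#define MODEL_NR_ITEMS 8

/* Hypothetical stand-in for one CPU's differentials plus its dirty flag. */
struct model_cpu {
	int  diff[MODEL_NR_ITEMS];
	bool dirty;
};

static struct model_cpu model_cpus[MODEL_NR_CPUS];
static long model_global[MODEL_NR_ITEMS];

/* Update path: record the delta locally and mark the CPU dirty. */
static void model_mod(int cpu, int item, int delta)
{
	assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
	assert(item >= 0 && item < MODEL_NR_ITEMS);

	model_cpus[cpu].diff[item] += delta;
	model_cpus[cpu].dirty = true;
}

/* Scan-style check: look at every differential for a non-zero value. */
static bool model_scan_needs_update(int cpu)
{
	for (int i = 0; i < MODEL_NR_ITEMS; i++)
		if (model_cpus[cpu].diff[i] != 0)
			return true;
	return false;
}

/* Flag-style check: a single read. */
static bool model_is_dirty(int cpu)
{
	return model_cpus[cpu].dirty;
}

/* Fold: clear the flag first, then move the differentials to the globals. */
static void model_fold(int cpu)
{
	model_cpus[cpu].dirty = false;
	for (int i = 0; i < MODEL_NR_ITEMS; i++) {
		model_global[i] += model_cpus[cpu].diff[i];
		model_cpus[cpu].diff[i] = 0;
	}
}
```

For example, model_mod(2, 0, 1) followed by model_mod(2, 0, -1) leaves
model_is_dirty(2) true while model_scan_needs_update(2) is false; the
next model_fold(2) then does redundant but harmless work.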
[1]: https://lore.kernel.org/lkml/20220204173554.763888172@fedora.localdomain/

Signed-off-by: Aaron Tomlin
Signed-off-by: Marcelo Tosatti
---
 mm/vmstat.c | 48 ++++++++++++++----------------------------------
 1 file changed, 14 insertions(+), 34 deletions(-)

Index: linux-2.6/mm/vmstat.c
===================================================================
--- linux-2.6.orig/mm/vmstat.c
+++ linux-2.6/mm/vmstat.c
@@ -381,6 +381,7 @@ void __mod_zone_page_state(struct zone *
 		x = 0;
 	}
 	__this_cpu_write(*p, x);
+	vmstat_mark_dirty();
 
 	preempt_enable_nested();
 }
@@ -417,6 +418,7 @@ void __mod_node_page_state(struct pglist
 		x = 0;
 	}
 	__this_cpu_write(*p, x);
+	vmstat_mark_dirty();
 
 	preempt_enable_nested();
 }
@@ -606,6 +608,7 @@ static inline void mod_zone_state(struct
 
 	if (z)
 		zone_page_state_add(z, zone, item);
+	vmstat_mark_dirty();
 }
 
 void mod_zone_page_state(struct zone *zone, enum zone_stat_item item,
@@ -674,6 +677,7 @@ static inline void mod_node_state(struct
 
 	if (z)
 		node_page_state_add(z, pgdat, item);
+	vmstat_mark_dirty();
 }
 
 void mod_node_page_state(struct pglist_data *pgdat, enum node_stat_item item,
@@ -828,6 +832,14 @@ static int refresh_cpu_vm_stats(bool do_
 	int global_node_diff[NR_VM_NODE_STAT_ITEMS] = { 0, };
 	int changes = 0;
 
+	/*
+	 * Clear vmstat_dirty before clearing the percpu vmstats.
+	 * If interrupts are enabled, it is possible that an interrupt
+	 * or another task modifies a percpu vmstat, which will
+	 * set vmstat_dirty to true.
+	 */
+	vmstat_clear_dirty();
+
 	for_each_populated_zone(zone) {
 		struct per_cpu_zonestat __percpu *pzstats = zone->per_cpu_zonestats;
 #ifdef CONFIG_NUMA
@@ -1957,35 +1969,6 @@ static void vmstat_update(struct work_st
 }
 
 /*
- * Check if the diffs for a certain cpu indicate that
- * an update is needed.
- */
-static bool need_update(int cpu)
-{
-	pg_data_t *last_pgdat = NULL;
-	struct zone *zone;
-
-	for_each_populated_zone(zone) {
-		struct per_cpu_zonestat *pzstats = per_cpu_ptr(zone->per_cpu_zonestats, cpu);
-		struct per_cpu_nodestat *n;
-
-		/*
-		 * The fast way of checking if there are any vmstat diffs.
-		 */
-		if (memchr_inv(pzstats->vm_stat_diff, 0, sizeof(pzstats->vm_stat_diff)))
-			return true;
-
-		if (last_pgdat == zone->zone_pgdat)
-			continue;
-		last_pgdat = zone->zone_pgdat;
-		n = per_cpu_ptr(zone->zone_pgdat->per_cpu_nodestats, cpu);
-		if (memchr_inv(n->vm_node_stat_diff, 0, sizeof(n->vm_node_stat_diff)))
-			return true;
-	}
-	return false;
-}
-
-/*
  * Switch off vmstat processing and then fold all the remaining differentials
  * until the diffs stay at zero. The function is used by NOHZ and can only be
  * invoked when tick processing is not active.
@@ -1995,10 +1978,7 @@ void quiet_vmstat(void)
 	if (system_state != SYSTEM_RUNNING)
 		return;
 
-	if (!delayed_work_pending(this_cpu_ptr(&vmstat_work)))
-		return;
-
-	if (!need_update(smp_processor_id()))
+	if (!is_vmstat_dirty())
 		return;
 
 	/*
@@ -2029,7 +2009,7 @@ static void vmstat_shepherd(struct work_
 	for_each_online_cpu(cpu) {
 		struct delayed_work *dw = &per_cpu(vmstat_work, cpu);
 
-		if (!delayed_work_pending(dw) && need_update(cpu))
+		if (!delayed_work_pending(dw) && per_cpu(vmstat_dirty, cpu))
 			queue_delayed_work_on(cpu, mm_percpu_wq, dw, 0);
 
 		cond_resched();

From patchwork Wed Dec 21 16:58:04 2022
X-Patchwork-Submitter: Marcelo Tosatti
X-Patchwork-Id: 35444
Message-ID: <20221221170436.330627967@redhat.com>
User-Agent: quilt/0.66
Date: Wed, 21 Dec 2022 13:58:04 -0300
From: Marcelo Tosatti
To: atomlin@atomlin.com, frederic@kernel.org
Cc: cl@linux.com, tglx@linutronix.de, mingo@kernel.org,
 peterz@infradead.org, pauld@redhat.com, neelx@redhat.com,
 oleksandr@natalenko.name, linux-kernel@vger.kernel.org,
 linux-mm@kvack.org, Marcelo Tosatti
Subject: [PATCH v11 3/6] mm/vmstat: manage per-CPU stats from CPU context when NOHZ full
References: <20221221165801.362118576@redhat.com>

For nohz full CPUs, we'd like the per-CPU vm statistics to be
synchronized when userspace is executing. Otherwise, the
vmstat_shepherd might queue a work item to synchronize them, which is
undesired interference for isolated CPUs.
This means that it is necessary to check for, and possibly sync, the
statistics when returning to userspace. As a result, there are now two
execution contexts, on different CPUs, which require awareness of each
other: context switch and the vmstat shepherd kernel thread.

To avoid shared variables between these two contexts (which would
require atomic accesses), delegate the responsibility for statistics
synchronization from vmstat_shepherd to the local CPU context, for
nohz_full CPUs.

Do that by queueing a delayed work item when marking per-CPU vmstat
dirty.

When returning to userspace, fold the stats and cancel the delayed work.

When entering idle, only fold the stats.

Signed-off-by: Marcelo Tosatti
---
 include/linux/vmstat.h   |  4 ++--
 kernel/time/tick-sched.c |  2 +-
 mm/vmstat.c              | 41 ++++++++++++++++++++++++++++++++---------
 3 files changed, 35 insertions(+), 12 deletions(-)

Index: linux-2.6/mm/vmstat.c
===================================================================
--- linux-2.6.orig/mm/vmstat.c
+++ linux-2.6/mm/vmstat.c
@@ -28,6 +28,7 @@
 #include
 #include
 #include
+#include
 
 #include "internal.h"
 
@@ -194,21 +195,50 @@ void fold_vm_numa_events(void)
 #endif
 
 #ifdef CONFIG_SMP
-static DEFINE_PER_CPU_ALIGNED(bool, vmstat_dirty);
+
+struct vmstat_dirty {
+	bool dirty;
+	bool cpuhotplug;
+};
+
+static DEFINE_PER_CPU_ALIGNED(struct vmstat_dirty, vmstat_dirty_pcpu);
+static DEFINE_PER_CPU(struct delayed_work, vmstat_work);
+int sysctl_stat_interval __read_mostly = HZ;
 
 static inline void vmstat_mark_dirty(void)
 {
-	this_cpu_write(vmstat_dirty, true);
+	struct vmstat_dirty *vms = this_cpu_ptr(&vmstat_dirty_pcpu);
+
+#ifdef CONFIG_FLUSH_WORK_ON_RESUME_USER
+	int cpu = smp_processor_id();
+
+	if (tick_nohz_full_cpu(cpu) && !vms->dirty) {
+		struct delayed_work *dw;
+
+		dw = this_cpu_ptr(&vmstat_work);
+		if (!delayed_work_pending(dw) && !vms->cpuhotplug) {
+			unsigned long delay;
+
+			delay = round_jiffies_relative(sysctl_stat_interval);
+			queue_delayed_work_on(cpu, mm_percpu_wq, dw, delay);
+		}
+	}
+#endif
+	vms->dirty = true;
 }
 
 static inline void vmstat_clear_dirty(void)
 {
-	this_cpu_write(vmstat_dirty, false);
+	struct vmstat_dirty *vms = this_cpu_ptr(&vmstat_dirty_pcpu);
+
+	vms->dirty = false;
 }
 
 static inline bool is_vmstat_dirty(void)
 {
-	return this_cpu_read(vmstat_dirty);
+	struct vmstat_dirty *vms = this_cpu_ptr(&vmstat_dirty_pcpu);
+
+	return vms->dirty;
 }
 
 int calculate_pressure_threshold(struct zone *zone)
@@ -1886,9 +1916,6 @@ static const struct seq_operations vmsta
 #endif /* CONFIG_PROC_FS */
 
 #ifdef CONFIG_SMP
-static DEFINE_PER_CPU(struct delayed_work, vmstat_work);
-int sysctl_stat_interval __read_mostly = HZ;
-
 #ifdef CONFIG_PROC_FS
 static void refresh_vm_stats(struct work_struct *work)
 {
@@ -1973,7 +2000,7 @@ static void vmstat_update(struct work_st
  * until the diffs stay at zero. The function is used by NOHZ and can only be
  * invoked when tick processing is not active.
  */
-void quiet_vmstat(void)
+void quiet_vmstat(bool user)
 {
 	if (system_state != SYSTEM_RUNNING)
 		return;
@@ -1981,13 +2008,18 @@ void quiet_vmstat(void)
 	if (!is_vmstat_dirty())
 		return;
 
+	refresh_cpu_vm_stats(false);
+
+#ifdef CONFIG_FLUSH_WORK_ON_RESUME_USER
+	if (!user)
+		return;
 	/*
-	 * Just refresh counters and do not care about the pending delayed
-	 * vmstat_update. It doesn't fire that often to matter and canceling
-	 * it would be too expensive from this path.
-	 * vmstat_shepherd will take care about that for us.
+	 * If the tick is stopped, cancel any delayed work to avoid
+	 * interruptions to this CPU in the future.
 	 */
-	refresh_cpu_vm_stats(false);
+	if (delayed_work_pending(this_cpu_ptr(&vmstat_work)))
+		cancel_delayed_work(this_cpu_ptr(&vmstat_work));
+#endif
 }
 
 /*
@@ -2008,8 +2040,15 @@ static void vmstat_shepherd(struct work_
 	/* Check processors whose vmstat worker threads have been disabled */
 	for_each_online_cpu(cpu) {
 		struct delayed_work *dw = &per_cpu(vmstat_work, cpu);
+		struct vmstat_dirty *vms = per_cpu_ptr(&vmstat_dirty_pcpu, cpu);
 
-		if (!delayed_work_pending(dw) && per_cpu(vmstat_dirty, cpu))
+#ifdef CONFIG_FLUSH_WORK_ON_RESUME_USER
+		/* NOHZ full CPUs manage their own vmstat flushing */
+		if (tick_nohz_full_cpu(cpu))
+			continue;
+#endif
+
+		if (!delayed_work_pending(dw) && vms->dirty)
 			queue_delayed_work_on(cpu, mm_percpu_wq, dw, 0);
 
 		cond_resched();
@@ -2044,6 +2083,25 @@ static void __init init_cpu_node_state(v
 
 static int vmstat_cpu_online(unsigned int cpu)
 {
+#ifdef CONFIG_FLUSH_WORK_ON_RESUME_USER
+	struct vmstat_dirty *vms = per_cpu_ptr(&vmstat_dirty_pcpu, cpu);
+
+	if (tick_nohz_full_cpu(cpu)) {
+		struct delayed_work *dw;
+
+		vms->cpuhotplug = false;
+		vms->dirty = true;
+
+		dw = this_cpu_ptr(&vmstat_work);
+		if (!delayed_work_pending(dw)) {
+			unsigned long delay;
+
+			delay = round_jiffies_relative(sysctl_stat_interval);
+			queue_delayed_work_on(cpu, mm_percpu_wq, dw, delay);
+		}
+	}
+#endif
+
 	refresh_zone_stat_thresholds();
 
 	if (!node_state(cpu_to_node(cpu), N_CPU)) {
@@ -2053,8 +2111,15 @@ static int vmstat_cpu_online(unsigned in
 	return 0;
 }
 
+/*
+ * ONLINE: The callbacks are invoked on the hotplugged CPU from the per CPU
+ * hotplug thread with interrupts and preemption enabled.
+ */
 static int vmstat_cpu_down_prep(unsigned int cpu)
 {
+	struct vmstat_dirty *vms = per_cpu_ptr(&vmstat_dirty_pcpu, cpu);
+
+	vms->cpuhotplug = true;
 	cancel_delayed_work_sync(&per_cpu(vmstat_work, cpu));
 	return 0;
 }
Index: linux-2.6/include/linux/vmstat.h
===================================================================
--- linux-2.6.orig/include/linux/vmstat.h
+++ linux-2.6/include/linux/vmstat.h
@@ -290,7 +290,7 @@ extern void dec_zone_state(struct zone *
 extern void __dec_zone_state(struct zone *, enum zone_stat_item);
 extern void __dec_node_state(struct pglist_data *, enum node_stat_item);
 
-void quiet_vmstat(void);
+void quiet_vmstat(bool user);
 void cpu_vm_stats_fold(int cpu);
 void refresh_zone_stat_thresholds(void);
 
@@ -403,7 +403,7 @@ static inline void __dec_node_page_state
 
 static inline void refresh_zone_stat_thresholds(void) { }
 static inline void cpu_vm_stats_fold(int cpu) { }
-static inline void quiet_vmstat(void) { }
+static inline void quiet_vmstat(bool user) { }
 
 static inline void drain_zonestat(struct zone *zone,
 			struct per_cpu_zonestat *pzstats) { }
Index: linux-2.6/kernel/time/tick-sched.c
===================================================================
--- linux-2.6.orig/kernel/time/tick-sched.c
+++ linux-2.6/kernel/time/tick-sched.c
@@ -911,7 +911,7 @@ static void tick_nohz_stop_tick(struct t
 	 */
 	if (!ts->tick_stopped) {
 		calc_load_nohz_start();
-		quiet_vmstat();
+		quiet_vmstat(false);
 
 		ts->last_tick = hrtimer_get_expires(&ts->sched_timer);
 		ts->tick_stopped = 1;
Index: linux-2.6/mm/Kconfig
===================================================================
--- linux-2.6.orig/mm/Kconfig
+++ linux-2.6/mm/Kconfig
@@ -1124,6 +1124,19 @@ config PTE_MARKER_UFFD_WP
 	  purposes. It is required to enable userfaultfd write protection on
 	  file-backed memory types like shmem and hugetlbfs.
 
+config FLUSH_WORK_ON_RESUME_USER
+	bool "Flush per-CPU vmstats on user return (for nohz full CPUs)"
+	depends on NO_HZ_FULL
+	default y
+
+	help
+	  By default, nohz full CPUs flush per-CPU vm statistics on return
+	  to userspace (to avoid additional interferences when executing
+	  userspace code). This has a small but measurable impact on
+	  system call performance. You can disable this to improve system call
+	  performance, at the expense of potential interferences to userspace
+	  execution.
+
 # multi-gen LRU {
 config LRU_GEN
 	bool "Multi-Gen LRU"

From patchwork Wed Dec 21 16:58:05 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Marcelo Tosatti
X-Patchwork-Id: 35441
Message-ID: <20221221170436.370028855@redhat.com>
User-Agent: quilt/0.66
Date: Wed, 21 Dec 2022 13:58:05 -0300
From: Marcelo Tosatti
To: atomlin@atomlin.com, frederic@kernel.org
Cc: cl@linux.com, tglx@linutronix.de, mingo@kernel.org,
 peterz@infradead.org, pauld@redhat.com, neelx@redhat.com,
 oleksandr@natalenko.name, linux-kernel@vger.kernel.org,
 linux-mm@kvack.org, Marcelo Tosatti
Subject: [PATCH v11 4/6] tick/nohz_full: Ensure quiet_vmstat() is called on
 exit to user-mode when the idle tick is stopped
References: <20221221165801.362118576@redhat.com>
X-Mailing-List: linux-kernel@vger.kernel.org

From: Aaron Tomlin

For nohz full CPUs, we'd like the per-CPU vm statistics to be
synchronized when userspace is executing. Otherwise, the
vmstat_shepherd might queue a work item to synchronize them, which is
undesired interference for isolated CPUs.

This patch syncs CPU-specific vmstat differentials, on return to
userspace, if CONFIG_FLUSH_WORK_ON_RESUME_USER is enabled and the tick
is stopped.

A trivial test program was used to compare the proposed changes
against a vanilla kernel. The mlock(2) and munlock(2) system calls
were used solely to modify vmstat item 'NR_MLOCK'.
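For illustration only, a measurement loop in the spirit of that test
could be sketched as below. This is not the program behind the numbers
in this patch: the helper name avg_ns_per_mlock_pair() is hypothetical,
and it reports wall-clock nanoseconds from clock_gettime(2) as a
portable stand-in for a CPU-cycle count.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/mman.h>
#include <time.h>
#include <unistd.h>

/*
 * Hypothetical helper (an assumption, not the original test program).
 * Locks and unlocks one anonymous page 'iterations' times, so that
 * NR_MLOCK is modified on every call, and returns the average time in
 * nanoseconds per mlock()/munlock() pair, or -1.0 on any failure.
 */
static double avg_ns_per_mlock_pair(unsigned long iterations)
{
	long page = sysconf(_SC_PAGESIZE);
	struct timespec start, end;
	unsigned long i;
	void *buf;

	if (page <= 0 || iterations == 0)
		return -1.0;

	buf = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE,
		   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (buf == MAP_FAILED)
		return -1.0;

	clock_gettime(CLOCK_MONOTONIC, &start);
	for (i = 0; i < iterations; i++) {
		/* Each call enters and leaves the kernel once. */
		if (mlock(buf, (size_t)page) || munlock(buf, (size_t)page)) {
			munmap(buf, (size_t)page);
			return -1.0;
		}
	}
	clock_gettime(CLOCK_MONOTONIC, &end);
	munmap(buf, (size_t)page);

	return ((end.tv_sec - start.tv_sec) * 1e9 +
		(end.tv_nsec - start.tv_nsec)) / (double)iterations;
}
```

Running such a loop pinned to a nohz_full CPU, once on each kernel,
would give two figures to compare; the pinning is left to the caller.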
The following is an average count of CPU-cycles across the
aforementioned system calls:

                          Vanilla    Modified

  Cycles per syscall      8461       8690 (+2.6%)

Signed-off-by: Aaron Tomlin
Signed-off-by: Marcelo Tosatti
---
 include/linux/tick.h     |  5 +++--
 kernel/time/tick-sched.c | 15 +++++++++++++++
 2 files changed, 18 insertions(+), 2 deletions(-)

Index: linux-2.6/include/linux/tick.h
===================================================================
--- linux-2.6.orig/include/linux/tick.h
+++ linux-2.6/include/linux/tick.h
@@ -11,7 +11,6 @@
 #include
 #include
 #include
-#include
 
 #ifdef CONFIG_GENERIC_CLOCKEVENTS
 extern void __init tick_init(void);
@@ -272,6 +271,7 @@ static inline void tick_dep_clear_signal
 
 extern void tick_nohz_full_kick_cpu(int cpu);
 extern void __tick_nohz_task_switch(void);
+void __tick_nohz_user_enter_prepare(void);
 extern void __init tick_nohz_full_setup(cpumask_var_t cpumask);
 #else
 static inline bool tick_nohz_full_enabled(void) { return false; }
@@ -296,6 +296,7 @@ static inline void tick_dep_clear_signal
 
 static inline void tick_nohz_full_kick_cpu(int cpu) { }
 static inline void __tick_nohz_task_switch(void) { }
+static inline void __tick_nohz_user_enter_prepare(void) { }
 static inline void tick_nohz_full_setup(cpumask_var_t cpumask) { }
 #endif
 
@@ -308,7 +309,7 @@ static inline void tick_nohz_task_switch
 static inline void tick_nohz_user_enter_prepare(void)
 {
 	if (tick_nohz_full_cpu(smp_processor_id()))
-		rcu_nocb_flush_deferred_wakeup();
+		__tick_nohz_user_enter_prepare();
 }
 
 #endif
Index: linux-2.6/kernel/time/tick-sched.c
===================================================================
--- linux-2.6.orig/kernel/time/tick-sched.c
+++ linux-2.6/kernel/time/tick-sched.c
@@ -26,6 +26,7 @@
 #include
 #include
 #include
+#include
 
 #include
 
@@ -519,6 +520,22 @@ void __tick_nohz_task_switch(void)
 	}
 }
 
+void __tick_nohz_user_enter_prepare(void)
+{
+	if (tick_nohz_full_cpu(smp_processor_id())) {
+#ifdef CONFIG_FLUSH_WORK_ON_RESUME_USER
+		struct tick_sched *ts;
+
+		ts = this_cpu_ptr(&tick_cpu_sched);
+
+		if (ts->tick_stopped)
+			quiet_vmstat(true);
+#endif
+		rcu_nocb_flush_deferred_wakeup();
+	}
+}
+EXPORT_SYMBOL_GPL(__tick_nohz_user_enter_prepare);
+
 /* Get the boot-time nohz CPU list from the kernel parameters. */
 void __init tick_nohz_full_setup(cpumask_var_t cpumask)
 {

From patchwork Wed Dec 21 16:58:06 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Marcelo Tosatti
X-Patchwork-Id: 35440
Message-ID: <20221221170436.409732339@redhat.com>
User-Agent: quilt/0.66
Date: Wed, 21 Dec 2022 13:58:06 -0300
From: Marcelo Tosatti
To: atomlin@atomlin.com, frederic@kernel.org
Cc: cl@linux.com, tglx@linutronix.de, mingo@kernel.org,
 peterz@infradead.org, pauld@redhat.com, neelx@redhat.com,
 oleksandr@natalenko.name, linux-kernel@vger.kernel.org,
 linux-mm@kvack.org, Marcelo Tosatti
Subject: [PATCH v11 5/6] tick/sched: Ensure quiet_vmstat() is called when
 the idle tick was stopped too
References: <20221221165801.362118576@redhat.com>
X-Mailing-List: linux-kernel@vger.kernel.org

From: Aaron Tomlin

In the context of the idle task on an adaptive-tick (nohz_full) CPU,
quiet_vmstat() can be called before stopping the idle tick, on entering
an idle state, and on exit. In the last case, when the idle task is
required to reschedule, the idle tick can remain stopped with the timer
expiration time left endless, i.e. KTIME_MAX. Before a nohz_full CPU
enters an idle state, CPU-specific vmstat counters should be processed
to ensure the respective values have been reset and folded into the
zone-specific 'vm_stat[]'. That said, the fold can be missed only when
the idle tick was previously stopped and reprogramming of the timer is
not required.

A customer provided evidence indicating that the idle tick was stopped
while CPU-specific vmstat counters remained populated. One can
therefore only assume that quiet_vmstat() was not invoked on return to
the idle loop. If I understand correctly, I suspect this divergence
might erroneously prevent a reclaim attempt by kswapd. If the number of
zone-specific free pages is below its per-cpu drift value, then
zone_page_state_snapshot() is used to compute a more accurate view of
that statistic. Thus any task blocked on the NUMA node specific
pfmemalloc_wait queue will be unable to make significant progress via
direct reclaim unless it is killed after being woken up by kswapd
(see throttle_direct_reclaim()).

Consider the following theoretical scenario:

 - Note: CPU X is part of 'tick_nohz_full_mask'

    1.  CPU Y migrated running task A to CPU X, which was in an idle
        state, i.e. waiting for an IRQ; marked the current task on
        CPU X as requiring a reschedule, i.e. set TIF_NEED_RESCHED,
        and sent a reschedule IPI to CPU X (see sched_move_task())

    2.  CPU X acknowledged the reschedule IPI. Generic idle loop code
        noticed the TIF_NEED_RESCHED flag set on the idle task, exited
        the loop and called the main scheduler function, i.e.
        __schedule().

        Since the idle tick was previously stopped, no scheduling-clock
        tick would occur, so no deferred timers would be handled

    3.  After the transition to kernel execution, task A running on
        CPU X indirectly released a few pages (e.g. see
        __free_one_page()); CPU X's 'vm_stat_diff[NR_FREE_PAGES]' was
        updated and the zone-specific 'vm_stat[]' update was deferred
        as per the CPU-specific stat threshold

    4.  Task A invoked exit(2) and the kernel removed the task from
        the run-queue; the idle task was selected to execute next
        since there are no other runnable tasks assigned to the given
        CPU (see pick_next_task() and pick_next_task_idle())

    5.  On return to the idle loop, since the idle tick was already
        stopped and can remain so (see (1) below), e.g. there are no
        pending soft IRQs, no attempt is made to zero and fold CPU X's
        vmstat counters because reprogramming of the scheduling-clock
        tick is not required (see (2))

  ...
    do_idle
    {

      __current_set_polling()
      tick_nohz_idle_enter()

      while (!need_resched()) {

        local_irq_disable()

        ...

        /* No polling or broadcast event */
        cpuidle_idle_call()
        {

          if (cpuidle_not_available(drv, dev)) {
            tick_nohz_idle_stop_tick()

              __tick_nohz_idle_stop_tick(this_cpu_ptr(&tick_cpu_sched))
              {
                int cpu = smp_processor_id()

                if (ts->timer_expires_base)
                  expires = ts->timer_expires
                else if (can_stop_idle_tick(cpu, ts))
      (1) ------->  expires = tick_nohz_next_event(ts, cpu)
                else
                  return

                ts->idle_calls++

                if (expires > 0LL) {

                  tick_nohz_stop_tick(ts, cpu)
                  {

                    if (ts->tick_stopped && (expires == ts->next_tick)) {
      (2) ------->    if (tick == KTIME_MAX || ts->next_tick ==
                          hrtimer_get_expires(&ts->sched_timer))
                        return
                    }
                    ...
                  }

So, the idea of this patch is to ensure that refresh_cpu_vm_stats(false)
is called, when appropriate, on return to the idle loop even if the
idle tick was already stopped.
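
The effect of moving the call can be pictured with a small user-space
model. It is only a sketch: struct model_cpu, MODEL_THRESHOLD and the
model_*() helpers are hypothetical names, not kernel code, and the
early return marked (2) in the trace above is deliberately left out, so
the model covers nothing more than where the fold sits relative to the
tick_stopped check.

```c
#include <stdbool.h>

#define MODEL_THRESHOLD 32	/* hypothetical per-CPU stat threshold */

/* Hypothetical stand-in for one CPU and one zone counter. */
struct model_cpu {
	long global;		/* plays the role of a vm_stat[] entry */
	int diff;		/* plays the role of a vm_stat_diff[] entry */
	bool tick_stopped;
};

/* Fold the pending per-CPU differential into the global counter. */
static void model_fold(struct model_cpu *c)
{
	c->global += c->diff;
	c->diff = 0;
}

/* Counter update: the global value only moves past the threshold. */
static void model_mod_state(struct model_cpu *c, int delta)
{
	c->diff += delta;
	if (c->diff > MODEL_THRESHOLD || c->diff < -MODEL_THRESHOLD)
		model_fold(c);
}

/*
 * fold_always == false: fold only on the transition to a stopped tick.
 * fold_always == true:  fold on every call, stopped or not.
 */
static void model_stop_tick(struct model_cpu *c, bool fold_always)
{
	if (!c->tick_stopped) {
		if (!fold_always)
			model_fold(c);
		c->tick_stopped = true;
	}
	if (fold_always)
		model_fold(c);
}
```

In this model, stopping the tick, applying a small delta with
model_mod_state() and then calling model_stop_tick() again leaves the
delta stranded in 'diff' when fold_always is false, and folds it into
'global' when fold_always is true.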

A trivial test program was used to compare the proposed changes
against a vanilla kernel. The nanosleep(2) system call was used several
times to suspend execution for a period of time, in order to
approximate the number of CPU-cycles spent in the idle code path. The
following is an average count of CPU-cycles:

                          Vanilla    Modified

  Cycles per idle loop    151858     153258 (+1.0%)

Signed-off-by: Aaron Tomlin
Signed-off-by: Marcelo Tosatti
---
 kernel/time/tick-sched.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

Index: linux-2.6/kernel/time/tick-sched.c
===================================================================
--- linux-2.6.orig/kernel/time/tick-sched.c
+++ linux-2.6/kernel/time/tick-sched.c
@@ -928,13 +928,14 @@ static void tick_nohz_stop_tick(struct t
 	 */
 	if (!ts->tick_stopped) {
 		calc_load_nohz_start();
-		quiet_vmstat(false);
 
 		ts->last_tick = hrtimer_get_expires(&ts->sched_timer);
 		ts->tick_stopped = 1;
 		trace_tick_stop(1, TICK_DEP_MASK_NONE);
 	}
 
+	/* Attempt to fold when the idle tick is stopped or not */
+	quiet_vmstat(false);
 	ts->next_tick = tick;
 
 	/*

From patchwork Wed Dec 21 16:58:07 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Marcelo Tosatti
X-Patchwork-Id: 35438
Message-ID: <20221221170436.449941687@redhat.com>
User-Agent: quilt/0.66
Date: Wed, 21 Dec 2022 13:58:07 -0300
From: Marcelo Tosatti
To: atomlin@atomlin.com, frederic@kernel.org
Cc: cl@linux.com, tglx@linutronix.de, mingo@kernel.org,
 peterz@infradead.org, pauld@redhat.com, neelx@redhat.com,
 oleksandr@natalenko.name, linux-kernel@vger.kernel.org,
 linux-mm@kvack.org, Marcelo Tosatti
Subject: [PATCH v11 6/6] mm/vmstat: avoid queueing work item if cpu stats
 are clean
References: <20221221165801.362118576@redhat.com>
X-Mailing-List: linux-kernel@vger.kernel.org

It is not necessary to queue a work item to run refresh_vm_stats on a
remote CPU if that CPU has no dirty stats and no per-CPU allocations
for remote nodes.

This fixes a sosreport hang (sosreport uses vmstat_refresh) when a
spinning SCHED_FIFO process is present.
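
For reference, vmstat_refresh is the handler behind
/proc/sys/vm/stat_refresh, so a user-space tool can reach it with a
plain read (root only). A minimal sketch of such a read follows;
trigger_stat_refresh() is a hypothetical helper name, and nothing here
is taken from sosreport itself.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: ask the kernel to fold all per-CPU vm
 * statistics into their global totals. Returns 0 on success or a
 * positive errno value (typically EACCES when not run as root).
 */
static int trigger_stat_refresh(void)
{
	char buf[64];
	int fd, saved;

	fd = open("/proc/sys/vm/stat_refresh", O_RDONLY);
	if (fd < 0)
		return errno;

	/* The read itself invokes the handler; the data is not used. */
	if (read(fd, buf, sizeof(buf)) < 0) {
		saved = errno;
		close(fd);
		return saved;
	}

	close(fd);
	return 0;
}
```

The read returns only after the handler has finished flushing whatever
work it queued, which is why a CPU monopolized by a SCHED_FIFO task can
stall the caller when work is queued there unconditionally.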

Signed-off-by: Marcelo Tosatti

Index: linux-2.6/mm/vmstat.c
===================================================================
--- linux-2.6.orig/mm/vmstat.c
+++ linux-2.6/mm/vmstat.c
@@ -1917,6 +1917,31 @@ static const struct seq_operations vmsta
 
 #ifdef CONFIG_SMP
 #ifdef CONFIG_PROC_FS
+static bool need_drain_remote_zones(int cpu)
+{
+#ifdef CONFIG_NUMA
+	struct zone *zone;
+
+	for_each_populated_zone(zone) {
+		struct per_cpu_pages *pcp;
+
+		pcp = per_cpu_ptr(zone->per_cpu_pageset, cpu);
+		if (!pcp->count)
+			continue;
+
+		if (!pcp->expire)
+			continue;
+
+		if (zone_to_nid(zone) == cpu_to_node(cpu))
+			continue;
+
+		return true;
+	}
+#endif
+
+	return false;
+}
+
 static void refresh_vm_stats(struct work_struct *work)
 {
 	refresh_cpu_vm_stats(true);
@@ -1926,8 +1951,12 @@ int vmstat_refresh(struct ctl_table *tab
 		   void *buffer, size_t *lenp, loff_t *ppos)
 {
 	long val;
-	int err;
-	int i;
+	int i, cpu;
+	struct work_struct __percpu *works;
+
+	works = alloc_percpu(struct work_struct);
+	if (!works)
+		return -ENOMEM;
 
 	/*
 	 * The regular update, every sysctl_stat_interval, may come later
@@ -1941,9 +1970,21 @@ int vmstat_refresh(struct ctl_table *tab
 	 * transiently negative values, report an error here if any of
 	 * the stats is negative, so we know to go looking for imbalance.
 	 */
-	err = schedule_on_each_cpu(refresh_vm_stats);
-	if (err)
-		return err;
+	cpus_read_lock();
+	for_each_online_cpu(cpu) {
+		struct work_struct *work = per_cpu_ptr(works, cpu);
+		struct vmstat_dirty *vms = per_cpu_ptr(&vmstat_dirty_pcpu, cpu);
+
+		INIT_WORK(work, refresh_vm_stats);
+
+		if (vms->dirty || need_drain_remote_zones(cpu))
+			schedule_work_on(cpu, work);
+	}
+	for_each_online_cpu(cpu)
+		flush_work(per_cpu_ptr(works, cpu));
+	cpus_read_unlock();
+	free_percpu(works);
+
 	for (i = 0; i < NR_VM_ZONE_STAT_ITEMS; i++) {
 		/*
 		 * Skip checking stats known to go negative occasionally.