From patchwork Thu Jan 5 12:52:19 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Marcelo Tosatti X-Patchwork-Id: 39522 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a5d:4e01:0:0:0:0:0 with SMTP id p1csp286531wrt; Thu, 5 Jan 2023 04:59:03 -0800 (PST) X-Google-Smtp-Source: AMrXdXtBd6Mk5coU0QVRvGk3xIIMqRm8EAVFl9EJjjkoJvGCdL+RiqfLJXrWmbGp1ONfm/EI2RrP X-Received: by 2002:a05:6a20:2d06:b0:ad:e5e8:cfe8 with SMTP id g6-20020a056a202d0600b000ade5e8cfe8mr64981427pzl.48.1672923542958; Thu, 05 Jan 2023 04:59:02 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1672923542; cv=none; d=google.com; s=arc-20160816; b=xDBQvlOUgph2aFppSOI5DIT2Iu6W1pPH7N9Ei2ww13khkqRGHfurWsesyZAkDDg1/T 3+Up0CNURu8tQmTvA6mH6U/nIwH+Ib3p9batfXpaDwa83Q2XD998s2WC6RVQLl4insh+ O0idePrn93h/ISx4FRcPbPLSqKJEmAgRdnClkOiHVSu9FfoIXmsBsi+tmzdwvOsmrz2r CQ2xQM5x9J9qXYIk7z8hUIuVGcbT4Wc09TaC5wtehGSFDZP57S2/FUkrWfzIIAdD7lwK UyzuXcDVWwhXCFxYQTGbahZh0oUN4fDEbF/S2yTQS3octOJ3LmMAxKUE+mZUFbs0jYUN dzxg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:mime-version:references:subject:cc:to:from:date :user-agent:message-id:dkim-signature; bh=A8ptA+mtJpfx4WDFilLLKvFD1fvtEmfC0YGkopBqCFA=; b=OG9zIHL0k7szY/EvR+s4w2ZIPP//vFIczOuLIl/onFwJkSQdZQWVpA5cFZi0DZu6y3 3QhPataciveVVv7kb4Mg1/HPlr3Y+tuisX+peVhrdllFY8M2I2ITYa2LAVlCpwHi95cF kA5WoZcdWn1dH9ixlIR6xLC43v1Wb/qulz6NjhodWSYIZOaQWYC/SgcSfEpT+pxCmnE6 YDee9ka5F0auBwa8PsRPR7qfkYnTutVnMBSVHykc/vkjTQyBqNwLvlvCfG2fuWCNHlCT DoXArycie2gRhyMS2Dy//fbTLOOJb4P205G09SOqeJ1rAGfXV1NLwf5Ru3fLj8y6CSvd O9Ww== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=RGVgT3Zb; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com 
Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id ls4-20020a17090b350400b002194156ef24si1926848pjb.189.2023.01.05.04.58.49; Thu, 05 Jan 2023 04:59:02 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=RGVgT3Zb; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233665AbjAEM5u (ORCPT + 99 others); Thu, 5 Jan 2023 07:57:50 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:45898 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233394AbjAEM5d (ORCPT ); Thu, 5 Jan 2023 07:57:33 -0500 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 16C1050E62 for ; Thu, 5 Jan 2023 04:56:46 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1672923405; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: references:references; bh=A8ptA+mtJpfx4WDFilLLKvFD1fvtEmfC0YGkopBqCFA=; b=RGVgT3ZbuP2Sl2+C8qpCsCT2u2IJbIFO3/wPtpCkE7JOvY290IjFrkRO9vFSoyDZygGtk1 rhGHA5TqkFtRnG7HrBhj+TYgA1mKefv/h1rAkRceeOsxE9elkZImzY6t0+d5M2ha3DQN2R sz6BrsnhUWNGvAHO9BlhZgCqFHcRc6c= Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 
us-mta-115-ihzkEaxQNQebANYux16DrQ-1; Thu, 05 Jan 2023 07:56:41 -0500 X-MC-Unique: ihzkEaxQNQebANYux16DrQ-1 Received: from smtp.corp.redhat.com (int-mx10.intmail.prod.int.rdu2.redhat.com [10.11.54.10]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 9D881101A52E; Thu, 5 Jan 2023 12:56:40 +0000 (UTC) Received: from tpad.localdomain (ovpn-112-2.gru2.redhat.com [10.97.112.2]) by smtp.corp.redhat.com (Postfix) with ESMTPS id 2F20349BB6A; Thu, 5 Jan 2023 12:56:40 +0000 (UTC) Received: by tpad.localdomain (Postfix, from userid 1000) id 9695340502F36; Thu, 5 Jan 2023 09:54:47 -0300 (-03) Message-ID: <20230105125248.772766288@redhat.com> User-Agent: quilt/0.66 Date: Thu, 05 Jan 2023 09:52:19 -0300 From: Marcelo Tosatti To: atomlin@atomlin.com, frederic@kernel.org Cc: cl@linux.com, tglx@linutronix.de, mingo@kernel.org, peterz@infradead.org, pauld@redhat.com, neelx@redhat.com, oleksandr@natalenko.name, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Marcelo Tosatti Subject: [PATCH v13 1/6] mm/vmstat: Add CPU-specific variable to track a vmstat discrepancy References: <20230105125218.031928326@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.10 X-Spam-Status: No, score=-2.1 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, RCVD_IN_MSPIKE_H2,SPF_HELO_NONE,SPF_NONE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1754187476860322142?= X-GMAIL-MSGID: =?utf-8?q?1754187476860322142?=

From: Aaron Tomlin

Introduce a CPU-specific variable, vmstat_dirty, to indicate whether a vmstat imbalance is present for a given CPU.
Therefore, at the appropriate time, we can fold all the remaining differentials. This patch also provides trivial helpers for modification and testing. Signed-off-by: Aaron Tomlin Signed-off-by: Marcelo Tosatti --- mm/vmstat.c | 16 ++++++++++++++++ 1 file changed, 16 insertions(+) Index: linux-2.6/mm/vmstat.c =================================================================== --- linux-2.6.orig/mm/vmstat.c +++ linux-2.6/mm/vmstat.c @@ -194,6 +194,22 @@ void fold_vm_numa_events(void) #endif #ifdef CONFIG_SMP +static DEFINE_PER_CPU_ALIGNED(bool, vmstat_dirty); + +static inline void vmstat_mark_dirty(void) +{ + this_cpu_write(vmstat_dirty, true); +} + +static inline void vmstat_clear_dirty(void) +{ + this_cpu_write(vmstat_dirty, false); +} + +static inline bool is_vmstat_dirty(void) +{ + return this_cpu_read(vmstat_dirty); +} int calculate_pressure_threshold(struct zone *zone) { From patchwork Thu Jan 5 12:52:20 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Marcelo Tosatti X-Patchwork-Id: 39519 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a5d:4e01:0:0:0:0:0 with SMTP id p1csp286222wrt; Thu, 5 Jan 2023 04:58:07 -0800 (PST) X-Google-Smtp-Source: AMrXdXtCFR2qOOZpG1ruO4LUVHIK9ywxqt5Vo2I9NYCHzPhwIS+nzW81EjAxmcSs0Jgax1mFxMKj X-Received: by 2002:a17:902:bd98:b0:188:8cfc:6ba7 with SMTP id q24-20020a170902bd9800b001888cfc6ba7mr54649688pls.68.1672923487152; Thu, 05 Jan 2023 04:58:07 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1672923487; cv=none; d=google.com; s=arc-20160816; b=nZprgZs+kVvLnj+7KENjFWNpfYIYu6SVjq05UPJOO4WJ0eP9C5HakyL/IxuQqQ5d/+ 6nWp8vYoe6c4kif2PuSNbhRt/WOJ8Qyu1+LttcIRotg2WYHfvn9kL4N62/dVNOLj5eDm oINsjqYHVUv4CftRjVoeEZdYbAEHLgrepz9iGy2zgkgdSq4Q7Pc6q/M0HIQghX4vlblp NjO/W1vSJBGiTHVAaIsC+hAjCzXFgYwoCWb/87o90fCnu0vFIljrqN21faXMLlXlfBZT QukMTqHqtJmCRQZyciiGbEjzvIKutlJA/IumaAo/y0Rg3y/3INM6JPlanAQPcYCv0rUY OREw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; 
d=google.com; s=arc-20160816; h=list-id:precedence:mime-version:references:subject:cc:to:from:date :user-agent:message-id:dkim-signature; bh=AWWNy27PnDV2BzrRi18CQGy2pR4xt8zt+F15YzdMHas=; b=BsI9znL5uijtsK1CBAp6ww8o7JNsl2w45BlGUjMPWW1vMTZl4h0KiTYQ8ql4ydhtS+ SS62QkLB/1JZ0u6oPPkA9A+ZE0CzODkN1ZkNfmXIxTZwtY34qF2EWhlt8jZkcczbNd8w 6+WaUVI6C5BQDQiMJATjIVxN+uGyydN0IgToEOeDctN/aVpWpChE0Ia4pmr7DxIfQWSc weq9G+ipTmY3B1OYuvgl2oX1aSEKOG8MLSrCnlOGU/McEjFZUBvgdazFHyTigFAdFP9Q kPp0mV8VGk8MsqAkW6e/gv8FT/lIMk/ua8OcMFPyPu7yC2ZgdBf0udsOtACHUqaw0sf8 XPHg== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=e0KcPP9w; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id z6-20020a170902834600b00188f9534a59si34594150pln.306.2023.01.05.04.57.53; Thu, 05 Jan 2023 04:58:07 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=e0KcPP9w; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233519AbjAEM5i (ORCPT + 99 others); Thu, 5 Jan 2023 07:57:38 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:45920 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233061AbjAEM52 (ORCPT ); Thu, 5 Jan 2023 07:57:28 -0500 Received: from 
us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 1156951310 for ; Thu, 5 Jan 2023 04:56:46 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1672923406; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: references:references; bh=AWWNy27PnDV2BzrRi18CQGy2pR4xt8zt+F15YzdMHas=; b=e0KcPP9wqxdWVV0zbR0iSSuB6efDhXeY/iNAAryZCN7Ikg4x5q2t7g7rCJSS++XRJVX6JE UIAnpXmkoPH2vqJ3I2lZSHaCBUnEMysbNioGxwxFzU1U5DfMSJ4X3V/9lPlC+YQpO6rD68 45Ptab/DoQ1UKXnF+ZLrWuH2efBEo7Q= Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-191--cpSVXu9NI2oKV4L1YUfng-1; Thu, 05 Jan 2023 07:56:43 -0500 X-MC-Unique: -cpSVXu9NI2oKV4L1YUfng-1 Received: from smtp.corp.redhat.com (int-mx05.intmail.prod.int.rdu2.redhat.com [10.11.54.5]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id AD5AD80234E; Thu, 5 Jan 2023 12:56:42 +0000 (UTC) Received: from tpad.localdomain (ovpn-112-2.gru2.redhat.com [10.97.112.2]) by smtp.corp.redhat.com (Postfix) with ESMTPS id 4916153A0; Thu, 5 Jan 2023 12:56:42 +0000 (UTC) Received: by tpad.localdomain (Postfix, from userid 1000) id 98FF240502F3A; Thu, 5 Jan 2023 09:54:47 -0300 (-03) Message-ID: <20230105125248.813825852@redhat.com> User-Agent: quilt/0.66 Date: Thu, 05 Jan 2023 09:52:20 -0300 From: Marcelo Tosatti To: atomlin@atomlin.com, frederic@kernel.org Cc: cl@linux.com, tglx@linutronix.de, mingo@kernel.org, peterz@infradead.org, pauld@redhat.com, neelx@redhat.com, oleksandr@natalenko.name, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Marcelo Tosatti Subject: [PATCH v13 2/6] mm/vmstat: Use vmstat_dirty to 
track CPU-specific vmstat discrepancies References: <20230105125218.031928326@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.5 X-Spam-Status: No, score=-2.1 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, RCVD_IN_MSPIKE_H2,SPF_HELO_NONE,SPF_NONE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1754187418355573011?= X-GMAIL-MSGID: =?utf-8?q?1754187418355573011?=

From: Aaron Tomlin

Use the previously introduced CPU-specific variable, vmstat_dirty, to indicate whether a vmstat differential (imbalance) is present for a given CPU, so that vmstat processing can be initiated at the appropriate time. The hope is that this approach is cheaper than need_update(), which scans the per-CPU diff arrays of every populated zone and node with memchr_inv(). The idea is based on Marcelo's patch [1].
[1]: https://lore.kernel.org/lkml/20220204173554.763888172@fedora.localdomain/ Signed-off-by: Aaron Tomlin Signed-off-by: Marcelo Tosatti --- mm/vmstat.c | 48 ++++++++++++++---------------------------------- 1 file changed, 14 insertions(+), 34 deletions(-) Index: linux-2.6/mm/vmstat.c =================================================================== --- linux-2.6.orig/mm/vmstat.c +++ linux-2.6/mm/vmstat.c @@ -381,6 +381,7 @@ void __mod_zone_page_state(struct zone * x = 0; } __this_cpu_write(*p, x); + vmstat_mark_dirty(); preempt_enable_nested(); } @@ -417,6 +418,7 @@ void __mod_node_page_state(struct pglist x = 0; } __this_cpu_write(*p, x); + vmstat_mark_dirty(); preempt_enable_nested(); } @@ -577,6 +579,9 @@ static inline void mod_zone_state(struct s8 __percpu *p = pcp->vm_stat_diff + item; long o, n, t, z; + /* cmpxchg and vmstat_mark_dirty should happen on the same CPU */ + preempt_disable(); + do { z = 0; /* overflow to zone counters */ @@ -606,6 +611,8 @@ static inline void mod_zone_state(struct if (z) zone_page_state_add(z, zone, item); + vmstat_mark_dirty(); + preempt_enable(); } void mod_zone_page_state(struct zone *zone, enum zone_stat_item item, @@ -645,6 +652,8 @@ static inline void mod_node_state(struct delta >>= PAGE_SHIFT; } + /* cmpxchg and vmstat_mark_dirty should happen on the same CPU */ + preempt_disable(); do { z = 0; /* overflow to node counters */ @@ -674,6 +683,8 @@ static inline void mod_node_state(struct if (z) node_page_state_add(z, pgdat, item); + vmstat_mark_dirty(); + preempt_enable(); } void mod_node_page_state(struct pglist_data *pgdat, enum node_stat_item item, @@ -828,6 +839,14 @@ static int refresh_cpu_vm_stats(bool do_ int global_node_diff[NR_VM_NODE_STAT_ITEMS] = { 0, }; int changes = 0; + /* + * Clear vmstat_dirty before clearing the percpu vmstats. + * If interrupts are enabled, it is possible that an interrupt + * or another task modifies a percpu vmstat, which will + * set vmstat_dirty to true. 
+ */ + vmstat_clear_dirty(); + for_each_populated_zone(zone) { struct per_cpu_zonestat __percpu *pzstats = zone->per_cpu_zonestats; #ifdef CONFIG_NUMA @@ -1957,35 +1976,6 @@ static void vmstat_update(struct work_st } /* - * Check if the diffs for a certain cpu indicate that - * an update is needed. - */ -static bool need_update(int cpu) -{ - pg_data_t *last_pgdat = NULL; - struct zone *zone; - - for_each_populated_zone(zone) { - struct per_cpu_zonestat *pzstats = per_cpu_ptr(zone->per_cpu_zonestats, cpu); - struct per_cpu_nodestat *n; - - /* - * The fast way of checking if there are any vmstat diffs. - */ - if (memchr_inv(pzstats->vm_stat_diff, 0, sizeof(pzstats->vm_stat_diff))) - return true; - - if (last_pgdat == zone->zone_pgdat) - continue; - last_pgdat = zone->zone_pgdat; - n = per_cpu_ptr(zone->zone_pgdat->per_cpu_nodestats, cpu); - if (memchr_inv(n->vm_node_stat_diff, 0, sizeof(n->vm_node_stat_diff))) - return true; - } - return false; -} - -/* * Switch off vmstat processing and then fold all the remaining differentials * until the diffs stay at zero. The function is used by NOHZ and can only be * invoked when tick processing is not active. 
@@ -1995,10 +1985,7 @@ void quiet_vmstat(void) if (system_state != SYSTEM_RUNNING) return; - if (!delayed_work_pending(this_cpu_ptr(&vmstat_work))) - return; - - if (!need_update(smp_processor_id())) + if (!is_vmstat_dirty()) return; /* @@ -2029,7 +2016,7 @@ static void vmstat_shepherd(struct work_ for_each_online_cpu(cpu) { struct delayed_work *dw = &per_cpu(vmstat_work, cpu); - if (!delayed_work_pending(dw) && need_update(cpu)) + if (!delayed_work_pending(dw) && per_cpu(vmstat_dirty, cpu)) queue_delayed_work_on(cpu, mm_percpu_wq, dw, 0); cond_resched(); From patchwork Thu Jan 5 12:52:21 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Marcelo Tosatti X-Patchwork-Id: 39523 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a5d:4e01:0:0:0:0:0 with SMTP id p1csp286567wrt; Thu, 5 Jan 2023 04:59:10 -0800 (PST) X-Google-Smtp-Source: AMrXdXtkKE1FT48CvMWVkwRbJVM5EZN7298gsauRyBqb3T5obaQlFtt45suDIDp/YFC/wtoF3TgH X-Received: by 2002:a05:6a21:3a45:b0:9d:efbe:e607 with SMTP id zu5-20020a056a213a4500b0009defbee607mr61552661pzb.35.1672923549924; Thu, 05 Jan 2023 04:59:09 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1672923549; cv=none; d=google.com; s=arc-20160816; b=FgLN0M+JrUcuFCjbf5KRps2rH29jt2UHMhfTXrun/E32Mkfb8ecptaguRyz7Kof2Kn V0D44LI8Z3Bkt+f71avAk53FpLpGesJ0wDILwpYNWW1jCdctJUnDNA0ceChE+MvO7VTm AUHmWES0ci56veIHwFfZa/fzI+Z5Rr1sXISzyYVbDB8ZrIOHdwa47QMybCY2gp/R6QNV xhqlREkSRXc3ES2Ijb1yV0OEQwv09CUds+93KVf48q+uZmchYVvTnnp1j9CWkg97cJb6 H1X513Hrxcdnj6Eh6yBvcloZl+X//pAv+CvfsAaN/lAQ/S1mxGyFtlNDizy7hW/BgZnQ maVw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:mime-version:references:subject:cc:to:from:date :user-agent:message-id:dkim-signature; bh=WjySYZGPU70OrR1lv/LMrVt9W5iPl/r1manuHwTbfNg=; b=OJ6djXmgW4Df5R+G2tKTvhwDSH6NkGENMqBd3oPHr/zw6ZKF5OJAAnXoB+wq0Fao46 FvqzeaQvGWWE1QjIJR0vixriDQIvlr32VDALzJd3Pi3M6aLrv//MzptSAZcDvQ9wnjJH 
3XXHHfO1wMaHuxFkNPOn3thZq04Jfx4pW/QnEwu9eh/v/cSHWmJehllTSNcIhhoAqv3i c++F2D52K+j9G4aFhmds6Q/VPUJznTbzp2fqLhdMlr1XEdd9DtRbgo40dV7/ZsiWItxt fAbegx5iztoU8HYNrv0zNSQLi1u3thRJ/29ay6s6H7SHaaSL0XgJOcG23DOqZqBdtwKo 8kmA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=Zjdxxyks; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id w70-20020a638249000000b00478dfd40ef6si11237216pgd.768.2023.01.05.04.58.57; Thu, 05 Jan 2023 04:59:09 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=Zjdxxyks; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233400AbjAEM54 (ORCPT + 99 others); Thu, 5 Jan 2023 07:57:56 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:45954 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233567AbjAEM5m (ORCPT ); Thu, 5 Jan 2023 07:57:42 -0500 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id B7B345014D for ; Thu, 5 Jan 2023 04:56:52 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1672923411; 
h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: references:references; bh=WjySYZGPU70OrR1lv/LMrVt9W5iPl/r1manuHwTbfNg=; b=Zjdxxyks8lAOkeKtuT/FxW1h2yppKUJcyu+5Y4iNNSoqgSTdqylhHp3Q92RE9fYNF/JPtE qiPhLJkUOKlRve/cYQ7D1uOJCjNSMu2StodUXUpr+KoBGhmeq05tF8+mqN3YNEBxpcHKOD WEkzd9B9yb60rAqTQtNlPNcpTFUNsrQ= Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-669-kZwppUtgOjWtuR5rl6C4SQ-1; Thu, 05 Jan 2023 07:56:43 -0500 X-MC-Unique: kZwppUtgOjWtuR5rl6C4SQ-1 Received: from smtp.corp.redhat.com (int-mx10.intmail.prod.int.rdu2.redhat.com [10.11.54.10]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id ACFB78588E1; Thu, 5 Jan 2023 12:56:42 +0000 (UTC) Received: from tpad.localdomain (ovpn-112-2.gru2.redhat.com [10.97.112.2]) by smtp.corp.redhat.com (Postfix) with ESMTPS id 45C36492D8B; Thu, 5 Jan 2023 12:56:42 +0000 (UTC) Received: by tpad.localdomain (Postfix, from userid 1000) id 9B65340502F3C; Thu, 5 Jan 2023 09:54:47 -0300 (-03) Message-ID: <20230105125248.853465707@redhat.com> User-Agent: quilt/0.66 Date: Thu, 05 Jan 2023 09:52:21 -0300 From: Marcelo Tosatti To: atomlin@atomlin.com, frederic@kernel.org Cc: cl@linux.com, tglx@linutronix.de, mingo@kernel.org, peterz@infradead.org, pauld@redhat.com, neelx@redhat.com, oleksandr@natalenko.name, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Marcelo Tosatti Subject: [PATCH v13 3/6] mm/vmstat: manage per-CPU stats from CPU context when NOHZ full References: <20230105125218.031928326@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.10 X-Spam-Status: No, score=-2.1 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, 
RCVD_IN_MSPIKE_H2,SPF_HELO_NONE,SPF_NONE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1754187484523745312?= X-GMAIL-MSGID: =?utf-8?q?1754187484523745312?=

For nohz_full CPUs, we'd like the per-CPU vm statistics to be synchronized while userspace is executing. Otherwise, vmstat_shepherd might queue a work item to synchronize them, which is undesired interference for isolated CPUs.

It is therefore necessary to check for, and possibly sync, the statistics when returning to userspace. As a result, there are now two execution contexts, on different CPUs, which need to be aware of each other: context switch and the vmstat_shepherd kernel thread.

To avoid sharing variables between these two contexts (which would require atomic accesses), delegate the responsibility for statistics synchronization from vmstat_shepherd to the local CPU context, for nohz_full CPUs.

Do that by queueing a delayed work item when marking the per-CPU vmstat dirty. When returning to userspace, fold the stats and cancel the delayed work. When entering idle, only fold the stats.
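
As an illustration only, the flow described above can be modelled in plain userspace C. This is a rough sketch, not the kernel code: struct cpu_vmstat, mark_dirty(), mod_state(), fold() and quiet() are hypothetical stand-ins for the per-CPU state, vmstat_mark_dirty(), the mod_*_state() helpers, refresh_cpu_vm_stats() and quiet_vmstat(bool user). It assumes CONFIG_FLUSH_WORK_ON_RESUME_USER=y and ignores preemption, interrupts and CPU hotplug.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_ITEMS 4

/* Hypothetical stand-in for one CPU's vmstat state. */
struct cpu_vmstat {
	int  diff[NR_ITEMS];  /* per-CPU differentials not yet folded */
	bool dirty;           /* set whenever a differential is modified */
	bool work_pending;    /* models the delayed vmstat_work item */
	bool nohz_full;       /* models tick_nohz_full_cpu(cpu) */
};

/* Models the global counters the differentials are folded into. */
static long global_stat[NR_ITEMS];

/* Models vmstat_mark_dirty(): queue local work on the clean -> dirty edge. */
static void mark_dirty(struct cpu_vmstat *c)
{
	if (c->nohz_full && !c->dirty && !c->work_pending)
		c->work_pending = true;
	c->dirty = true;
}

/* Models a counter update: change the local differential, then mark dirty. */
static void mod_state(struct cpu_vmstat *c, int item, int delta)
{
	assert(item >= 0 && item < NR_ITEMS);
	c->diff[item] += delta;
	mark_dirty(c);
}

/* Models refresh_cpu_vm_stats(): clear the flag first, then fold. */
static int fold(struct cpu_vmstat *c)
{
	int changes = 0;

	c->dirty = false;
	for (int i = 0; i < NR_ITEMS; i++) {
		if (c->diff[i]) {
			global_stat[i] += c->diff[i];
			c->diff[i] = 0;
			changes++;
		}
	}
	return changes;
}

/* Models quiet_vmstat(user): always fold, cancel work only on user return. */
static void quiet(struct cpu_vmstat *c, bool user)
{
	if (!c->dirty)
		return;

	fold(c);

	if (!user)
		return;

	c->work_pending = false;
}
```

In this model, quiet(&c, false) corresponds to entering idle (the stats are folded and the delayed work stays queued), while quiet(&c, true) corresponds to returning to userspace (the stats are folded and the delayed work is cancelled).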
Signed-off-by: Marcelo Tosatti --- include/linux/vmstat.h | 4 ++-- kernel/time/tick-sched.c | 2 +- mm/vmstat.c | 41 ++++++++++++++++++++++++++++++++--------- 3 files changed, 35 insertions(+), 12 deletions(-) Index: linux-2.6/mm/vmstat.c =================================================================== --- linux-2.6.orig/mm/vmstat.c +++ linux-2.6/mm/vmstat.c @@ -28,6 +28,7 @@ #include #include #include +#include #include "internal.h" @@ -194,21 +195,57 @@ void fold_vm_numa_events(void) #endif #ifdef CONFIG_SMP -static DEFINE_PER_CPU_ALIGNED(bool, vmstat_dirty); + +struct vmstat_dirty { + bool dirty; +#ifdef CONFIG_FLUSH_WORK_ON_RESUME_USER + bool cpu_offline; +#endif +}; + +static DEFINE_PER_CPU_ALIGNED(struct vmstat_dirty, vmstat_dirty_pcpu); +static DEFINE_PER_CPU(struct delayed_work, vmstat_work); +int sysctl_stat_interval __read_mostly = HZ; + +#ifdef CONFIG_FLUSH_WORK_ON_RESUME_USER +static inline void vmstat_queue_local_work(void) +{ + bool vmstat_dirty = this_cpu_read(vmstat_dirty_pcpu.dirty); + bool cpu_offline = this_cpu_read(vmstat_dirty_pcpu.cpu_offline); + int cpu = smp_processor_id(); + + if (tick_nohz_full_cpu(cpu) && !vmstat_dirty) { + struct delayed_work *dw; + + dw = this_cpu_ptr(&vmstat_work); + if (!delayed_work_pending(dw) && !cpu_offline) { + unsigned long delay; + + delay = round_jiffies_relative(sysctl_stat_interval); + queue_delayed_work_on(cpu, mm_percpu_wq, dw, delay); + } + } +} +#else +static inline void vmstat_queue_local_work(void) +{ +} +#endif static inline void vmstat_mark_dirty(void) { - this_cpu_write(vmstat_dirty, true); + vmstat_queue_local_work(); + this_cpu_write(vmstat_dirty_pcpu.dirty, true); } static inline void vmstat_clear_dirty(void) { - this_cpu_write(vmstat_dirty, false); + this_cpu_write(vmstat_dirty_pcpu.dirty, false); } static inline bool is_vmstat_dirty(void) { - return this_cpu_read(vmstat_dirty); + return this_cpu_read(vmstat_dirty_pcpu.dirty); } int calculate_pressure_threshold(struct zone *zone) @@ -1893,9 
+1930,6 @@ static const struct seq_operations vmsta #endif /* CONFIG_PROC_FS */ #ifdef CONFIG_SMP -static DEFINE_PER_CPU(struct delayed_work, vmstat_work); -int sysctl_stat_interval __read_mostly = HZ; - #ifdef CONFIG_PROC_FS static void refresh_vm_stats(struct work_struct *work) { @@ -1980,7 +2014,7 @@ static void vmstat_update(struct work_st * until the diffs stay at zero. The function is used by NOHZ and can only be * invoked when tick processing is not active. */ -void quiet_vmstat(void) +void quiet_vmstat(bool user) { if (system_state != SYSTEM_RUNNING) return; @@ -1988,13 +2022,19 @@ void quiet_vmstat(void) if (!is_vmstat_dirty()) return; + refresh_cpu_vm_stats(false); + + if (!IS_ENABLED(CONFIG_FLUSH_WORK_ON_RESUME_USER)) + return; + + if (!user) + return; /* - * Just refresh counters and do not care about the pending delayed - * vmstat_update. It doesn't fire that often to matter and canceling - * it would be too expensive from this path. - * vmstat_shepherd will take care about that for us. + * If the tick is stopped, cancel any delayed work to avoid + * interruptions to this CPU in the future. 
*/ - refresh_cpu_vm_stats(false); + if (delayed_work_pending(this_cpu_ptr(&vmstat_work))) + cancel_delayed_work(this_cpu_ptr(&vmstat_work)); } /* @@ -2015,8 +2055,14 @@ static void vmstat_shepherd(struct work_ /* Check processors whose vmstat worker threads have been disabled */ for_each_online_cpu(cpu) { struct delayed_work *dw = &per_cpu(vmstat_work, cpu); + struct vmstat_dirty *vms = per_cpu_ptr(&vmstat_dirty_pcpu, cpu); + + if (IS_ENABLED(CONFIG_FLUSH_WORK_ON_RESUME_USER)) + /* NOHZ full CPUs manage their own vmstat flushing */ + if (tick_nohz_full_cpu(cpu)) + continue; - if (!delayed_work_pending(dw) && per_cpu(vmstat_dirty, cpu)) + if (!delayed_work_pending(dw) && vms->dirty) queue_delayed_work_on(cpu, mm_percpu_wq, dw, 0); cond_resched(); @@ -2049,8 +2095,36 @@ static void __init init_cpu_node_state(v } } +#ifdef CONFIG_FLUSH_WORK_ON_RESUME_USER +static void vmstat_cpu_online_rearm(unsigned int cpu) +{ + struct vmstat_dirty *vms = per_cpu_ptr(&vmstat_dirty_pcpu, cpu); + + if (tick_nohz_full_cpu(cpu)) { + struct delayed_work *dw; + + vms->cpu_offline = false; + vms->dirty = true; + + dw = this_cpu_ptr(&vmstat_work); + if (!delayed_work_pending(dw)) { + unsigned long delay; + + delay = round_jiffies_relative(sysctl_stat_interval); + queue_delayed_work_on(cpu, mm_percpu_wq, dw, delay); + } + } +} +#else +static void vmstat_cpu_online_rearm(unsigned int cpu) +{ +} +#endif + static int vmstat_cpu_online(unsigned int cpu) { + vmstat_cpu_online_rearm(cpu); + refresh_zone_stat_thresholds(); if (!node_state(cpu_to_node(cpu), N_CPU)) { @@ -2060,8 +2134,28 @@ static int vmstat_cpu_online(unsigned in return 0; } + +#ifdef CONFIG_FLUSH_WORK_ON_RESUME_USER +static void vmstat_mark_cpu_offline(unsigned int cpu) +{ + struct vmstat_dirty *vms = per_cpu_ptr(&vmstat_dirty_pcpu, cpu); + + vms->cpu_offline = true; +} +#else +static void vmstat_mark_cpu_offline(unsigned int cpu) +{ +} +#endif + +/* + * Callbacks in the ONLINE section (CPUHP_AP_ONLINE_DYN is in this section), + * 
are invoked on the hotplugged CPU from the per CPU + * hotplug thread with interrupts and preemption enabled. + */ static int vmstat_cpu_down_prep(unsigned int cpu) { + vmstat_mark_cpu_offline(cpu); cancel_delayed_work_sync(&per_cpu(vmstat_work, cpu)); return 0; } Index: linux-2.6/include/linux/vmstat.h =================================================================== --- linux-2.6.orig/include/linux/vmstat.h +++ linux-2.6/include/linux/vmstat.h @@ -290,7 +290,7 @@ extern void dec_zone_state(struct zone * extern void __dec_zone_state(struct zone *, enum zone_stat_item); extern void __dec_node_state(struct pglist_data *, enum node_stat_item); -void quiet_vmstat(void); +void quiet_vmstat(bool user); void cpu_vm_stats_fold(int cpu); void refresh_zone_stat_thresholds(void); @@ -403,7 +403,7 @@ static inline void __dec_node_page_state static inline void refresh_zone_stat_thresholds(void) { } static inline void cpu_vm_stats_fold(int cpu) { } -static inline void quiet_vmstat(void) { } +static inline void quiet_vmstat(bool user) { } static inline void drain_zonestat(struct zone *zone, struct per_cpu_zonestat *pzstats) { } Index: linux-2.6/kernel/time/tick-sched.c =================================================================== --- linux-2.6.orig/kernel/time/tick-sched.c +++ linux-2.6/kernel/time/tick-sched.c @@ -911,7 +911,7 @@ static void tick_nohz_stop_tick(struct t */ if (!ts->tick_stopped) { calc_load_nohz_start(); - quiet_vmstat(); + quiet_vmstat(false); ts->last_tick = hrtimer_get_expires(&ts->sched_timer); ts->tick_stopped = 1; Index: linux-2.6/init/Kconfig =================================================================== --- linux-2.6.orig/init/Kconfig +++ linux-2.6/init/Kconfig @@ -678,6 +678,19 @@ config CPU_ISOLATION Say Y if unsure. 
+config FLUSH_WORK_ON_RESUME_USER + bool "Flush per-CPU vmstats on user return (for nohz full CPUs)" + depends on NO_HZ_FULL + default y + + help + By default, nohz full CPUs flush per-CPU vm statistics on return + to userspace (to avoid additional interferences when executing + userspace code). This has a small but measurable impact on + system call performance. You can disable this to improve system call + performance, at the expense of potential interferences to userspace + execution. + source "kernel/rcu/Kconfig" config BUILD_BIN2C From patchwork Thu Jan 5 12:52:22 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Marcelo Tosatti X-Patchwork-Id: 39521 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a5d:4e01:0:0:0:0:0 with SMTP id p1csp286332wrt; Thu, 5 Jan 2023 04:58:25 -0800 (PST) X-Google-Smtp-Source: AMrXdXt/AeLnnqTKnLB3KNw6x+Rfu6BC38OuWlUkSI7riA+ouxuYGnr+kod6c15uR8cJ7GT5u4Aq X-Received: by 2002:a05:6a21:2d8e:b0:a7:866d:40b5 with SMTP id ty14-20020a056a212d8e00b000a7866d40b5mr59962281pzb.6.1672923505020; Thu, 05 Jan 2023 04:58:25 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1672923505; cv=none; d=google.com; s=arc-20160816; b=Llj41r7eB1A380vFTxHTTlgNHKKNk4+iGunPd/+ciGTLWLds7joW5291PU2gJh1kg2 bWmWoXyfSOUIFWo1OUSTdzLxSAdcMO7EDwzjYV1g8Qf1XrMRG/DBUqEZGVeaeV9VxPXB h60/BtS6PNnUOijJC2qhAXYqbSmIqhUOsyB2Tj9R1/BovDfh6WHQB8umLd7Sv+mOnRQb PfZ0upQV5P8tTABfwvWstEzMSOyiBCNLpWjdyTOcRpF9mZInub4iEL594bLZPKyOMty6 jEsOIp/7jCjTaD9vimvwetLkY3Xn4UbNQrsAg7wYBRT9hPyXqKdI6IDPLC9kUzzRrHnH vR3A== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:mime-version:references:subject:cc:to:from:date :user-agent:message-id:dkim-signature; bh=hIc0vPvNI1VykGTxKET9i/kMr1uqEk68mXFfOdiRry8=; b=WCc0FPqXE3uakc4kExCbx2V8m0AUBikIsPPRMYB0U4OXDp4LDq6Qx9SCa3Ra4zVxTW weYu05pBJNuTnKx7V5yz7FKBgD5YH1gplYMvewRUEHU+5oGBZsWG/f4wgqIqgvVJbaOE 
SS3NiTocLGGT58FHY+OHvTC5zuli0xSz/XTJ3iyovfFtB0uekYkwdZ9dYnVU2M6Tm9El QstGvwAhz+iMh9HXmtOOvJ88mXWJmS4PG5Hgb+irmxXtFbm/kJ70iqqFswHbBxYxpQBZ 96AHe8mNtxfxQi0CRQerFUEZ9cJnWER+Elfc7b3frSKzm3aoeFUi+TkrVLQ5jGJOA4Ik YDLw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=b63X5ef8; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id a24-20020a63e418000000b004792f347556si37034543pgi.623.2023.01.05.04.58.12; Thu, 05 Jan 2023 04:58:24 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=b63X5ef8; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233607AbjAEM5q (ORCPT + 99 others); Thu, 5 Jan 2023 07:57:46 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:45922 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233380AbjAEM5c (ORCPT ); Thu, 5 Jan 2023 07:57:32 -0500 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id E704E5017D for ; Thu, 5 Jan 2023 04:56:46 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1672923406; 
h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: references:references; bh=hIc0vPvNI1VykGTxKET9i/kMr1uqEk68mXFfOdiRry8=; b=b63X5ef82Ne7rvWCJKbYJ6G5r8rK0A9jN4LP6vmchMg03NJNTP97JhZH1515paae4gurkZ 5wEZ2yNiHSlWB1JUjGniwpbeWDy1kehLFnU9qFpDiluRaQZCbPA458hXwbF92sivkp2WTA VFToAYtLcoWE5cfz4HqXBWcTt2RjzFM= Received: from mimecast-mx02.redhat.com (mx3-rdu2.redhat.com [66.187.233.73]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-564-weQCRA3ZMGSGIUvUlcJqnw-1; Thu, 05 Jan 2023 07:56:43 -0500 X-MC-Unique: weQCRA3ZMGSGIUvUlcJqnw-1 Received: from smtp.corp.redhat.com (int-mx02.intmail.prod.int.rdu2.redhat.com [10.11.54.2]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 7B67B3C0F425; Thu, 5 Jan 2023 12:56:42 +0000 (UTC) Received: from tpad.localdomain (ovpn-112-2.gru2.redhat.com [10.97.112.2]) by smtp.corp.redhat.com (Postfix) with ESMTPS id 45D0340C1141; Thu, 5 Jan 2023 12:56:42 +0000 (UTC) Received: by tpad.localdomain (Postfix, from userid 1000) id 9F2C340502F3F; Thu, 5 Jan 2023 09:54:47 -0300 (-03) Message-ID: <20230105125248.892336104@redhat.com> User-Agent: quilt/0.66 Date: Thu, 05 Jan 2023 09:52:22 -0300 From: Marcelo Tosatti To: atomlin@atomlin.com, frederic@kernel.org Cc: cl@linux.com, tglx@linutronix.de, mingo@kernel.org, peterz@infradead.org, pauld@redhat.com, neelx@redhat.com, oleksandr@natalenko.name, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Marcelo Tosatti Subject: [PATCH v13 4/6] tick/nohz_full: Ensure quiet_vmstat() is called on exit to user-mode when the idle tick is stopped References: <20230105125218.031928326@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.2 X-Spam-Status: No, score=-2.1 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, 
DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, RCVD_IN_MSPIKE_H2,SPF_HELO_NONE,SPF_NONE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1754187437274622575?= X-GMAIL-MSGID: =?utf-8?q?1754187437274622575?=

From: Aaron Tomlin

For nohz full CPUs, we'd like the per-CPU vm statistics to be synchronized when userspace is executing. Otherwise, the vmstat_shepherd might queue a work item to synchronize them, which is undesired interference for isolated CPUs.

This patch syncs CPU-specific vmstat differentials on return to userspace, if CONFIG_FLUSH_WORK_ON_RESUME_USER is enabled and the tick is stopped.

A trivial test program was used to measure the impact of the proposed changes against a vanilla kernel. The mlock(2) and munlock(2) system calls were used solely to modify the vmstat item 'NR_MLOCK'.
The following is an average count of CPU-cycles across the aforementioned system calls: Vanilla Modified Cycles per syscall 8461 8690 (+2.6%) Signed-off-by: Aaron Tomlin Signed-off-by: Marcelo Tosatti --- include/linux/tick.h | 5 +++-- kernel/time/tick-sched.c | 15 +++++++++++++++ 2 files changed, 18 insertions(+), 2 deletions(-) Index: linux-2.6/include/linux/tick.h =================================================================== --- linux-2.6.orig/include/linux/tick.h +++ linux-2.6/include/linux/tick.h @@ -11,7 +11,6 @@ #include #include #include -#include #ifdef CONFIG_GENERIC_CLOCKEVENTS extern void __init tick_init(void); @@ -272,6 +271,7 @@ static inline void tick_dep_clear_signal extern void tick_nohz_full_kick_cpu(int cpu); extern void __tick_nohz_task_switch(void); +void __tick_nohz_user_enter_prepare(void); extern void __init tick_nohz_full_setup(cpumask_var_t cpumask); #else static inline bool tick_nohz_full_enabled(void) { return false; } @@ -296,6 +296,7 @@ static inline void tick_dep_clear_signal static inline void tick_nohz_full_kick_cpu(int cpu) { } static inline void __tick_nohz_task_switch(void) { } +static inline void __tick_nohz_user_enter_prepare(void) { } static inline void tick_nohz_full_setup(cpumask_var_t cpumask) { } #endif @@ -308,7 +309,7 @@ static inline void tick_nohz_task_switch static inline void tick_nohz_user_enter_prepare(void) { if (tick_nohz_full_cpu(smp_processor_id())) - rcu_nocb_flush_deferred_wakeup(); + __tick_nohz_user_enter_prepare(); } #endif Index: linux-2.6/kernel/time/tick-sched.c =================================================================== --- linux-2.6.orig/kernel/time/tick-sched.c +++ linux-2.6/kernel/time/tick-sched.c @@ -26,6 +26,7 @@ #include #include #include +#include #include @@ -519,6 +520,23 @@ void __tick_nohz_task_switch(void) } } +void __tick_nohz_user_enter_prepare(void) +{ + if (tick_nohz_full_cpu(smp_processor_id())) { + if (IS_ENABLED(CONFIG_FLUSH_WORK_ON_RESUME_USER)) { + struct tick_sched 
*ts; + + ts = this_cpu_ptr(&tick_cpu_sched); + + if (ts->tick_stopped) + quiet_vmstat(true); + } + + rcu_nocb_flush_deferred_wakeup(); + } +} +EXPORT_SYMBOL_GPL(__tick_nohz_user_enter_prepare); + /* Get the boot-time nohz CPU list from the kernel parameters. */ void __init tick_nohz_full_setup(cpumask_var_t cpumask) { From patchwork Thu Jan 5 12:52:23 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Marcelo Tosatti X-Patchwork-Id: 39520 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a5d:4e01:0:0:0:0:0 with SMTP id p1csp286234wrt; Thu, 5 Jan 2023 04:58:10 -0800 (PST) X-Google-Smtp-Source: AMrXdXsxGm8BVPQloLWCcOg4wTBdVLVLH6Qz0AXR74SmI71SjSwuh0gWdHUjJtclWvbmNMBAWZeZ X-Received: by 2002:a17:90b:2643:b0:223:2865:73aa with SMTP id pa3-20020a17090b264300b00223286573aamr59206376pjb.2.1672923490222; Thu, 05 Jan 2023 04:58:10 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1672923490; cv=none; d=google.com; s=arc-20160816; b=h50FRr5q8Tx/oJ4r9FR757mslO2f0kjq3JTHUVp6eHRjDjGy00wasDyhCNHOe+3Ump aM7PcJVP5tB0c1vJX0r4x0yfVHkj6ErEn/QX6z5riKFBy2lxdOazLtvBGddPjT9KyFD9 av8lmBVgUdOAfjS1V9/Rv1ey+QWduiFzfSeiTMyzlpdN6GD3JcFAw2pXC9XfN3Yybh+e h6jGR2RhgTV/mwB7rLRqBsvUomvQMfVJ8D+Hg2ficHp1TpfUVbMZj9QEZvDw3iji3a8X OFKsBChWMYhN6HAt38/E6Ed9F2gvxLgZDY91zZrdrxodfJcfs/qJztWDQRI+c+A6FKhx lDVQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:mime-version:references:subject:cc:to:from:date :user-agent:message-id:dkim-signature; bh=iri/rWZtdhFxtRyLVTm3Z4b/XpaXLxRLi7TX4havHys=; b=kBZE7HCSYoqw1VSGPPnsXGCsHGFXT1mIA72vIw5bqO69D9QZAsAg0qA7YLEmB4e0RH R5WAFDUAWbrhRTlByc3nc4xRsHGaFA5xkGIavU/YPjlTGOrWR+KcGNVRJGK/rxmO9kD+ 85Pcic+LZ7l8sqFILdSrC2Jnsuw/ZVzDCGC5fVN+OjL5QpyYXuT4F0MDGUylVRngmPQj KlqOl5cHnkNzojanhYdcA31ksbaN1tYtbxU1st5In/oaHiY5Mwmqmvm//RK5IJcz51gB 72G+wmk1oOI93ixJ3s7pGDqTxX2ply+dqpBD5mQHEe7SiHRBSrJnLna8fs8Xp0ppBm6Y rl/w== ARC-Authentication-Results: i=1; 
mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=FuLdNDLR; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id oa6-20020a17090b1bc600b002194ca43255si1929250pjb.50.2023.01.05.04.57.57; Thu, 05 Jan 2023 04:58:10 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=FuLdNDLR; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233582AbjAEM5n (ORCPT + 99 others); Thu, 5 Jan 2023 07:57:43 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:45900 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233383AbjAEM5c (ORCPT ); Thu, 5 Jan 2023 07:57:32 -0500 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 128B73D9EC for ; Thu, 5 Jan 2023 04:56:45 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1672923405; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: references:references; bh=iri/rWZtdhFxtRyLVTm3Z4b/XpaXLxRLi7TX4havHys=; b=FuLdNDLR5hW/MyI07MB3+HQL6PqvcXGkgUzhxuAR04rhyuuSbHnhnhhMz5UyHY8RvrxeVO 
qCnezUyYB6XcLwLEe/HrnncoIZoPWbdrP7/mpZQSP5/P4xzdeJ0B4ADuHCkx8AUaAN7ffj P/5V5qVoDNP7rNaENeo3WusZfqFA064= Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-63-GnKdm2ZoO8ub5NKyh93f9A-1; Thu, 05 Jan 2023 07:56:41 -0500 X-MC-Unique: GnKdm2ZoO8ub5NKyh93f9A-1 Received: from smtp.corp.redhat.com (int-mx02.intmail.prod.int.rdu2.redhat.com [10.11.54.2]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id D0030811E6E; Thu, 5 Jan 2023 12:56:40 +0000 (UTC) Received: from tpad.localdomain (ovpn-112-2.gru2.redhat.com [10.97.112.2]) by smtp.corp.redhat.com (Postfix) with ESMTPS id 5B937400E40A; Thu, 5 Jan 2023 12:56:40 +0000 (UTC) Received: by tpad.localdomain (Postfix, from userid 1000) id A376E40502F44; Thu, 5 Jan 2023 09:54:47 -0300 (-03) Message-ID: <20230105125248.932725463@redhat.com> User-Agent: quilt/0.66 Date: Thu, 05 Jan 2023 09:52:23 -0300 From: Marcelo Tosatti To: atomlin@atomlin.com, frederic@kernel.org Cc: cl@linux.com, tglx@linutronix.de, mingo@kernel.org, peterz@infradead.org, pauld@redhat.com, neelx@redhat.com, oleksandr@natalenko.name, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Marcelo Tosatti Subject: [PATCH v13 5/6] tick/sched: Ensure quiet_vmstat() is called when the idle tick was stopped too References: <20230105125218.031928326@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.2 X-Spam-Status: No, score=-2.1 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, RCVD_IN_MSPIKE_H2,SPF_HELO_NONE,SPF_NONE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org 
X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1754187421241987270?= X-GMAIL-MSGID: =?utf-8?q?1754187421241987270?=

From: Aaron Tomlin

In the context of the idle task and an adaptive-tick (nohz_full) CPU, quiet_vmstat() can be called before stopping the idle tick, on entering an idle state, and on exit. In particular, for the latter case, when the idle task is required to reschedule, the idle tick can remain stopped and the timer expiration time endless, i.e. KTIME_MAX. Now, before a nohz_full CPU enters an idle state, CPU-specific vmstat counters should be processed to ensure the respective values have been reset and folded into the zone-specific 'vm_stat[]'. That being said, it can only occur when the idle tick was previously stopped and reprogramming of the timer is not required.

A customer provided some evidence which indicates that the idle tick was stopped, yet CPU-specific vmstat counters still remained populated. Thus one can only assume quiet_vmstat() was not invoked on return to the idle loop. If I understand correctly, I suspect this divergence might erroneously prevent a reclaim attempt by kswapd. If the number of zone-specific free pages is below the per-CPU drift value, then zone_page_state_snapshot() is used to compute a more accurate view of the aforementioned statistic. Thus any task blocked on the NUMA node specific pfmemalloc_wait queue will be unable to make significant progress via direct reclaim unless it is killed after being woken up by kswapd (see throttle_direct_reclaim()).

Consider the following theoretical scenario:

 - Note: CPU X is part of 'tick_nohz_full_mask'

    1. CPU Y migrated running task A to CPU X, which was in an idle state, i.e. waiting for an IRQ; marked the current task on CPU X as needing a reschedule, i.e. set TIF_NEED_RESCHED; and sent a reschedule IPI to CPU X (see sched_move_task())

    2. CPU X acknowledged the reschedule IPI. Generic idle loop code noticed the TIF_NEED_RESCHED flag against the idle task, exited the loop and called the main scheduler function, i.e. __schedule(). Since the idle tick was previously stopped, no scheduling-clock tick would occur, so no deferred timers would be handled

    3. Post transition to kernel execution, task A running on CPU X indirectly released a few pages (e.g. see __free_one_page()); CPU X's 'vm_stat_diff[NR_FREE_PAGES]' was updated and the zone-specific 'vm_stat[]' update was deferred as per the CPU-specific stat threshold

    4. Task A invoked exit(2) and the kernel removed the task from the run-queue; the idle task was selected to execute next since there were no other runnable tasks assigned to the given CPU (see pick_next_task() and pick_next_task_idle())

    5. On return to the idle loop, since the idle tick was already stopped and can remain so (see (1) below), e.g. no pending soft IRQs, no attempt is made to zero and fold CPU X's vmstat counters since reprogramming of the scheduling-clock tick is not required (see (2))

 ...
   do_idle
   {

     __current_set_polling()
     tick_nohz_idle_enter()

     while (!need_resched()) {

       local_irq_disable()

       ...

       /* No polling or broadcast event */
       cpuidle_idle_call()
       {

         if (cpuidle_not_available(drv, dev)) {
           tick_nohz_idle_stop_tick()
             __tick_nohz_idle_stop_tick(this_cpu_ptr(&tick_cpu_sched))
             {
               int cpu = smp_processor_id()

               if (ts->timer_expires_base)
                 expires = ts->timer_expires
               else if (can_stop_idle_tick(cpu, ts))
   (1) ------->  expires = tick_nohz_next_event(ts, cpu)
               else
                 return

               ts->idle_calls++

               if (expires > 0LL) {

                 tick_nohz_stop_tick(ts, cpu)
                 {

                   if (ts->tick_stopped && (expires == ts->next_tick)) {
   (2) ------->      if (tick == KTIME_MAX || ts->next_tick ==
                         hrtimer_get_expires(&ts->sched_timer))
                       return
                   }
                   ...
                 }

So, the idea of this patch is to ensure refresh_cpu_vm_stats(false) is called, when appropriate, on return to the idle loop when the idle tick was previously stopped too.
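
For illustration only, the stale-differential situation in steps 3 to 5 can be approximated with a small userspace model. This is a sketch under stated assumptions, not kernel code: the model_* names, the two-CPU array and the threshold of 4 are all hypothetical, and only the general "accumulate locally, fold when the threshold is exceeded" behaviour is taken from the description above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model constants; these are not kernel values. */
#define MODEL_NR_CPUS	2
#define MODEL_THRESHOLD	4

/* Stands in for the zone-specific 'vm_stat[]' entry. */
static long model_vm_stat;
/* Stands in for each CPU's 'vm_stat_diff[]' entry. */
static long model_vm_stat_diff[MODEL_NR_CPUS];

/*
 * Accumulate a delta in the per-CPU differential. The global counter is
 * only updated once the differential exceeds the threshold.
 */
static void model_mod_state(int cpu, long delta)
{
	long x = model_vm_stat_diff[cpu] + delta;

	if (x > MODEL_THRESHOLD || x < -MODEL_THRESHOLD) {
		model_vm_stat += x;
		x = 0;
	}
	model_vm_stat_diff[cpu] = x;
}

/*
 * Fold whatever is still pending on this CPU into the global counter.
 * Returns true if a non-zero differential was folded.
 */
static bool model_quiet_vmstat(int cpu)
{
	long x = model_vm_stat_diff[cpu];

	if (!x)
		return false;

	model_vm_stat += x;
	model_vm_stat_diff[cpu] = 0;
	return true;
}
```

In this model, calling model_mod_state(1, 3) leaves model_vm_stat unchanged because the differential stays below the threshold. The global value only catches up once model_quiet_vmstat(1) runs, which corresponds to the fold that this patch tries to guarantee on return to the idle loop.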
A trivial test program was used to determine the impact of the proposed changes and under vanilla. The nanosleep(2) system call was used several times to suspend execution for a period of time to approximately compute the number of CPU-cycles in the idle code path. The following is an average count of CPU-cycles: Vanilla Modified Cycles per idle loop 151858 153258 (+1.0%) Signed-off-by: Aaron Tomlin Signed-off-by: Marcelo Tosatti --- kernel/time/tick-sched.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) Index: linux-2.6/kernel/time/tick-sched.c =================================================================== --- linux-2.6.orig/kernel/time/tick-sched.c +++ linux-2.6/kernel/time/tick-sched.c @@ -929,13 +929,14 @@ static void tick_nohz_stop_tick(struct t */ if (!ts->tick_stopped) { calc_load_nohz_start(); - quiet_vmstat(false); ts->last_tick = hrtimer_get_expires(&ts->sched_timer); ts->tick_stopped = 1; trace_tick_stop(1, TICK_DEP_MASK_NONE); } + /* Attempt to fold when the idle tick is stopped or not */ + quiet_vmstat(false); ts->next_tick = tick; /* From patchwork Thu Jan 5 12:52:24 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Marcelo Tosatti X-Patchwork-Id: 39518 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a5d:4e01:0:0:0:0:0 with SMTP id p1csp286208wrt; Thu, 5 Jan 2023 04:58:03 -0800 (PST) X-Google-Smtp-Source: AMrXdXutpM4IWoAagik/iWrzQPqb5tggbHraJnn/pXBrmfVvJw5xKf5BY8yTZ95YpXyJf3TuS/ar X-Received: by 2002:a17:902:7c89:b0:188:59e2:5f91 with SMTP id y9-20020a1709027c8900b0018859e25f91mr50198120pll.59.1672923482957; Thu, 05 Jan 2023 04:58:02 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1672923482; cv=none; d=google.com; s=arc-20160816; b=ubMLqEPLz3PwdQmkIeRd26a/rqcB/CHyzmsaredjgG8YWqG/4vbFkbQNBSbhABH/Iy pLNWLGs7xr9t5Yyn4E5NYRY33iR7aKfVPJs5ivzyeHK3szCZZQqhI+ke2ZP/01VnKN7z dxncfR0eQ+o9zem26yT03GeZ/SjrYGqyB0sTMqY9nvpNx1haSu4i8anvAqmOSCp2DclI 
Ua5sfNEleI6I6kCKqGK6hb1TNN7dMcKM4XhFri0TBG3PIVVJOu4zUcBx17fuR3+LZaTj NN7P//5t4OVoATllb0c/T8dFx3HKhqWUf23nCeWZy2Rt+CyqsoqANjrhRO2Z2X7EEn+z hmrg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:mime-version:references:subject:cc:to:from:date :user-agent:message-id:dkim-signature; bh=C8n9DE0ruCcCCK7cY6ZY83QBYvGpdJ8qqUIL+bP6IWU=; b=NCQ7sHE8QKyR+eLc3PkuZaoYUU091HHv+9Xco10R8hLXKYVrpCVTsJd3M9jnnBWjPA 89wDb4Cm+LdIrBGcWscbCEgyUkXZIkl4g8qEXtWUadiCncGbsD5OjFlOEu2Ls7QHRi+Y o6j/dlzeGCpZNE1P1AH5rKerhHIHh3It9OyjtBS/wG7BF4/Q0EE0CrEcpI75Cz1clKp4 dqoSl9lRSm+5rS83Ns6Ru/GCh865gFdm4pAwvLGnYEz4r06E5aTVYverJRCXRQ3nEbAV sUSCTiDC1qZegc7COaHrphcc7wZWk2JCAHK4iF7Gdg3SoB8Pyn27UcAiRNXpI+B4EFpN 2g5Q== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b="gb/ZNANV"; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: from out1.vger.email (out1.vger.email. 
[2620:137:e000::1:20]) by mx.google.com with ESMTP id z15-20020a170902d54f00b001868a25da0dsi24002750plf.40.2023.01.05.04.57.49; Thu, 05 Jan 2023 04:58:02 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b="gb/ZNANV"; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232952AbjAEM51 (ORCPT + 99 others); Thu, 5 Jan 2023 07:57:27 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:45864 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232958AbjAEM5Y (ORCPT ); Thu, 5 Jan 2023 07:57:24 -0500 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 672B51081 for ; Thu, 5 Jan 2023 04:56:44 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1672923403; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: references:references; bh=C8n9DE0ruCcCCK7cY6ZY83QBYvGpdJ8qqUIL+bP6IWU=; b=gb/ZNANVaX//SnP38qcHV9CxNSKD1LwWgAx3xjgkLUe4DALRnhpUvZJKNEQVUv1RTFNHMs bbyfGGIVstryDsNhXirltawYbqhVjdJJw/EUNPXoVzGsz+lhiU6jYcJHuOJrKxTgVP1kiv eDrPi2xDmTK899hM10zOmpqjeNL4wUw= Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-120-zOn_0ctvMduyru0HaogF3g-1; Thu, 05 Jan 2023 07:56:41 -0500 X-MC-Unique: 
zOn_0ctvMduyru0HaogF3g-1 Received: from smtp.corp.redhat.com (int-mx05.intmail.prod.int.rdu2.redhat.com [10.11.54.5]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id A49B8802C1C; Thu, 5 Jan 2023 12:56:40 +0000 (UTC) Received: from tpad.localdomain (ovpn-112-2.gru2.redhat.com [10.97.112.2]) by smtp.corp.redhat.com (Postfix) with ESMTPS id 3529E53A0; Thu, 5 Jan 2023 12:56:40 +0000 (UTC) Received: by tpad.localdomain (Postfix, from userid 1000) id A7E8340502F46; Thu, 5 Jan 2023 09:54:47 -0300 (-03) Message-ID: <20230105125248.971432211@redhat.com> User-Agent: quilt/0.66 Date: Thu, 05 Jan 2023 09:52:24 -0300 From: Marcelo Tosatti To: atomlin@atomlin.com, frederic@kernel.org Cc: cl@linux.com, tglx@linutronix.de, mingo@kernel.org, peterz@infradead.org, pauld@redhat.com, neelx@redhat.com, oleksandr@natalenko.name, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Marcelo Tosatti Subject: [PATCH v13 6/6] mm/vmstat: avoid queueing work item if cpu stats are clean References: <20230105125218.031928326@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.5 X-Spam-Status: No, score=-2.1 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, RCVD_IN_MSPIKE_H2,SPF_HELO_NONE,SPF_NONE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1754187413825267059?= X-GMAIL-MSGID: =?utf-8?q?1754187413825267059?=

It is not necessary to queue a work item to run refresh_vm_stats on a remote CPU if that CPU has no dirty stats and no per-CPU allocations for remote nodes.

This fixes a sosreport hang (sosreport uses vmstat_refresh) in the presence of a spinning SCHED_FIFO process.
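
As a rough illustration of the selection rule described above, the following userspace sketch decides which CPUs would get a work item. It is only a model under stated assumptions: struct model_cpu, its fields and model_select_refresh_cpus() are hypothetical names, and the 'remote_pcp' flag merely stands in for whatever the real per-CPU page check reports.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-CPU summary used only by this model. */
struct model_cpu {
	bool online;		/* CPU is online */
	bool dirty;		/* CPU has unfolded vmstat differentials */
	bool remote_pcp;	/* CPU holds per-CPU pages for a remote node */
};

/*
 * Mark in 'queued' every CPU that would need a refresh work item:
 * it must be online and either have dirty stats or remote per-CPU pages.
 * Returns the number of CPUs selected.
 */
static size_t model_select_refresh_cpus(const struct model_cpu *cpus,
					size_t nr, bool *queued)
{
	size_t i, count = 0;

	for (i = 0; i < nr; i++) {
		queued[i] = cpus[i].online &&
			    (cpus[i].dirty || cpus[i].remote_pcp);
		if (queued[i])
			count++;
	}

	return count;
}
```

A clean, online CPU is skipped entirely in this model, which is the property that matters for a CPU monopolised by a spinning SCHED_FIFO task: no work item is queued there, so nothing has to wait for it to run.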
Signed-off-by: Marcelo Tosatti Index: linux-2.6/mm/vmstat.c =================================================================== --- linux-2.6.orig/mm/vmstat.c +++ linux-2.6/mm/vmstat.c @@ -1931,6 +1931,31 @@ static const struct seq_operations vmsta #ifdef CONFIG_SMP #ifdef CONFIG_PROC_FS +static bool need_drain_remote_zones(int cpu) +{ +#ifdef CONFIG_NUMA + struct zone *zone; + + for_each_populated_zone(zone) { + struct per_cpu_pages *pcp; + + pcp = per_cpu_ptr(zone->per_cpu_pageset, cpu); + if (!pcp->count) + continue; + + if (!pcp->expire) + continue; + + if (zone_to_nid(zone) == cpu_to_node(cpu)) + continue; + + return true; + } +#endif + + return false; +} + static void refresh_vm_stats(struct work_struct *work) { refresh_cpu_vm_stats(true); @@ -1940,8 +1965,12 @@ int vmstat_refresh(struct ctl_table *tab void *buffer, size_t *lenp, loff_t *ppos) { long val; - int err; - int i; + int i, cpu; + struct work_struct __percpu *works; + + works = alloc_percpu(struct work_struct); + if (!works) + return -ENOMEM; /* * The regular update, every sysctl_stat_interval, may come later @@ -1955,9 +1984,21 @@ int vmstat_refresh(struct ctl_table *tab * transiently negative values, report an error here if any of * the stats is negative, so we know to go looking for imbalance. */ - err = schedule_on_each_cpu(refresh_vm_stats); - if (err) - return err; + cpus_read_lock(); + for_each_online_cpu(cpu) { + struct work_struct *work = per_cpu_ptr(works, cpu); + struct vmstat_dirty *vms = per_cpu_ptr(&vmstat_dirty_pcpu, cpu); + + INIT_WORK(work, refresh_vm_stats); + + if (vms->dirty || need_drain_remote_zones(cpu)) + schedule_work_on(cpu, work); + } + for_each_online_cpu(cpu) + flush_work(per_cpu_ptr(works, cpu)); + cpus_read_unlock(); + free_percpu(works); + for (i = 0; i < NR_VM_ZONE_STAT_ITEMS; i++) { /* * Skip checking stats known to go negative occasionally.