From patchwork Thu Mar 30 10:54:15 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Domenico Cerasuolo <cerasuolodomenico@gmail.com>
X-Patchwork-Id: 77110
Return-Path: <linux-kernel-owner@vger.kernel.org>
Delivered-To: ouuuleilei@gmail.com
Received: by 2002:a59:b0ea:0:b0:3b6:4342:cba0 with SMTP id b10csp1058977vqo;
        Thu, 30 Mar 2023 04:39:46 -0700 (PDT)
X-Google-Smtp-Source: 
 AKy350aoS3P410DylROFPYQk2Ek6mURcdq79qbW2vazENATFvyUg0JnLqf/SiSGbyEwYtEjilvMX
X-Received: by 2002:a17:90b:33c2:b0:23d:3913:bc26 with SMTP id
 lk2-20020a17090b33c200b0023d3913bc26mr25397211pjb.2.1680176386724;
        Thu, 30 Mar 2023 04:39:46 -0700 (PDT)
ARC-Seal: i=1; a=rsa-sha256; t=1680176386; cv=none;
        d=google.com; s=arc-20160816;
        b=Euk06zQdxp97H7h7tyPkDwovTQwEZGlRlUylujikLzzcQOb2jf2bP4RIqLE3kSXPxX
         5jVSwixW8tcWFQJrZICsPnKVTz024TPmy0XpJBtv/fLrPtJ/OCY/f1RMUmND5oHyQ4lE
         iFDNZdvIABTp9v8KIBRUzanbPqvQYSsJoTwb8sB4xaVVJ6ZkIYdTeDVHirpQ4q7QG2AR
         8lVxy9Dhct1V2uNEA+7jiPFasPS3wWxAN3OzeFOvZcm3JYpC2PTk8+S2MQWR6pvUftkk
         ErNYemxh1Mt98H6OAvayeTDPHK1zuIkL0IPgUquXEdx3rXCc08/R5dff6c84aRa2mzIz
         GKfg==
ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com;
 s=arc-20160816;
        h=list-id:precedence:content-transfer-encoding:mime-version
         :references:in-reply-to:message-id:date:subject:cc:to:from
         :dkim-signature;
        bh=Sgclq3CwPySlcIw6eNZ6zuLOimqag1tJf0PxPQK2QJo=;
        b=k6NFd2K2m14X6x6FTHiGa0ce51cMuW91iGH1is4Fq/Jh042rCmCGJYUdifaOuKOSBi
         dQlWRXLKr6XGBhNUbYSs9XhVCSLmjPQX8p9ylIgm05gDxaG557aHAPOObUndAxafxAFB
         /XsSnZDXtlBIW8KV3CXmIoAgYx2tSXsLv7bxZ85Zi4AAOJARatIY7AOMsNkjPcRMlzoM
         pJCeXvo3J0GIGFp/I4Kn+dudxreERd4FJIxjp15ZHXWW/Nj1Cv6D8FIy2TXW/l1uTxf6
         SK0l4Ukh/9bNx/X9/SpG5IOOK60U5yVlhCz0r5q4wgOuF2929rSN5iM0Ly7/6a0gTyY8
         li2Q==
ARC-Authentication-Results: i=1; mx.google.com;
       dkim=pass header.i=@gmail.com header.s=20210112 header.b=QsJO48Xh;
       spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org
 designates 2620:137:e000::1:20 as permitted sender)
 smtp.mailfrom=linux-kernel-owner@vger.kernel.org;
       dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com
Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20])
        by mx.google.com with ESMTP id
 e1-20020a17090a77c100b0023d21d8cd42si3826445pjs.27.2023.03.30.04.39.33;
        Thu, 30 Mar 2023 04:39:46 -0700 (PDT)
Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org
 designates 2620:137:e000::1:20 as permitted sender)
 client-ip=2620:137:e000::1:20;
Authentication-Results: mx.google.com;
       dkim=pass header.i=@gmail.com header.s=20210112 header.b=QsJO48Xh;
       spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org
 designates 2620:137:e000::1:20 as permitted sender)
 smtp.mailfrom=linux-kernel-owner@vger.kernel.org;
       dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S231182AbjC3Kym (ORCPT <rfc822;rua109.linux@gmail.com>
        + 99 others); Thu, 30 Mar 2023 06:54:42 -0400
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:50884 "EHLO
        lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S231154AbjC3Kyj (ORCPT
        <rfc822;linux-kernel@vger.kernel.org>);
        Thu, 30 Mar 2023 06:54:39 -0400
Received: from mail-ed1-x533.google.com (mail-ed1-x533.google.com
 [IPv6:2a00:1450:4864:20::533])
        by lindbergh.monkeyblade.net (Postfix) with ESMTPS id CB50A7A8F
        for <linux-kernel@vger.kernel.org>;
 Thu, 30 Mar 2023 03:54:35 -0700 (PDT)
Received: by mail-ed1-x533.google.com with SMTP id ew6so74698381edb.7
        for <linux-kernel@vger.kernel.org>;
 Thu, 30 Mar 2023 03:54:35 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=gmail.com; s=20210112; t=1680173674;
        h=content-transfer-encoding:mime-version:references:in-reply-to
         :message-id:date:subject:cc:to:from:from:to:cc:subject:date
         :message-id:reply-to;
        bh=Sgclq3CwPySlcIw6eNZ6zuLOimqag1tJf0PxPQK2QJo=;
        b=QsJO48Xho5QjgAiE08zrxXR7Kp+SSklxZ5jznLJLljSe+f3v+/MKKRZ0fLrpe9zC06
         UzFARBsHzgpPi6R6mQMJM8ZRmRU79GXDh8Ek04XhRt45SYpcsXPQ/SHow58eUmItxdn1
         YfWE8/QlC7xkXEliizKoH9XgGA7kG0qdhpK4RtYXKm2HcfBcdv4anxZ+fPfcvaoiK26I
         ExruaIIRqypGnIVYbnTdWKKsrq3R2ZTlV8EFMYXZslDMS+CNDGEo76vh/L/tI7a0Zjkh
         o83VDqVslSG4JXNqFxo44FlT/GzRyGtOrCd0CS2lFEySVGkNGjrLH7NV0G87dG43MpVX
         FUpw==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=1e100.net; s=20210112; t=1680173674;
        h=content-transfer-encoding:mime-version:references:in-reply-to
         :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc
         :subject:date:message-id:reply-to;
        bh=Sgclq3CwPySlcIw6eNZ6zuLOimqag1tJf0PxPQK2QJo=;
        b=k9EIG/XQQi5tdwXbkso4Sr1/tZH+Ymw8JGwqbMr6p9LK4ArXFmPRaESgRikUV0AJkB
         rYjF+Y0i7LgfLQtColTFL8H2ozd4p8yKLcZAFeLSmCUkAMCXOxRu7S12lQVqaN2dPDb4
         ORc4i/CzR5HuzKVf/c0dKKF7tuu587tqNSBKII6vC0rq4VlwfxEg0eZyuXyg46gqBs2C
         cN1w3ruowBWZywkjJmKelykBSDz3BaQKbv3St1iwb6f6d5LgxvR+pUsr0mtXQ8felvDN
         xDkYikJimU2OZv3T01lgwf0Ohgf5pCQ+wkVP67oiypB4mc820Bw4St1fsVcE8pmzk6IM
         29kw==
X-Gm-Message-State: AAQBX9e7aelHYl+whL91VDEpO6vxmSmKn3xJ5W9tu9kxbb++q79y8/HP
        9N2R8McjfxIrqg0UB8HZI/vqumLSujkJog==
X-Received: by 2002:a17:907:d483:b0:932:ac6c:7ef9 with SMTP id
 vj3-20020a170907d48300b00932ac6c7ef9mr25052246ejc.22.1680173673563;
        Thu, 30 Mar 2023 03:54:33 -0700 (PDT)
Received: from lelloman-5950.. (host-80-182-134-1.pool80182.interbusiness.it.
 [80.182.134.1])
        by smtp.gmail.com with ESMTPSA id
 hb6-20020a170906b88600b0093120a11a5dsm17639628ejb.92.2023.03.30.03.54.32
        (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256);
        Thu, 30 Mar 2023 03:54:33 -0700 (PDT)
From: Domenico Cerasuolo <cerasuolodomenico@gmail.com>
To: linux-kernel@vger.kernel.org
Cc: peterz@infradead.org, surenb@google.com, brauner@kernel.org,
        chris@chrisdown.name, hannes@cmpxchg.org,
        Domenico Cerasuolo <cerasuolodomenico@gmail.com>
Subject: [PATCH v6 1/4] sched/psi: rearrange polling code in preparation
Date: Thu, 30 Mar 2023 12:54:15 +0200
Message-Id: <20230330105418.77061-2-cerasuolodomenico@gmail.com>
X-Mailer: git-send-email 2.34.1
In-Reply-To: <20230330105418.77061-1-cerasuolodomenico@gmail.com>
References: <20230330105418.77061-1-cerasuolodomenico@gmail.com>
MIME-Version: 1.0
X-Spam-Status: No, score=-0.2 required=5.0 tests=DKIM_SIGNED,DKIM_VALID,
        DKIM_VALID_AU,DKIM_VALID_EF,FREEMAIL_FROM,RCVD_IN_DNSWL_NONE,
        SPF_HELO_NONE,SPF_PASS autolearn=unavailable autolearn_force=no
        version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on
        lindbergh.monkeyblade.net
Precedence: bulk
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org
X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?=
X-GMAIL-THRID: =?utf-8?q?1761792635082643749?=
X-GMAIL-MSGID: =?utf-8?q?1761792635082643749?=

Move a few functions up in the file to avoid forward declaration needed
in the patch implementing unprivileged PSI triggers.

Suggested-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Domenico Cerasuolo <cerasuolodomenico@gmail.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Suren Baghdasaryan <surenb@google.com>
---
 kernel/sched/psi.c | 196 ++++++++++++++++++++++-----------------------
 1 file changed, 98 insertions(+), 98 deletions(-)

diff --git a/kernel/sched/psi.c b/kernel/sched/psi.c
index 02e011cabe91..fe9269f1d2a4 100644
--- a/kernel/sched/psi.c
+++ b/kernel/sched/psi.c
@@ -384,92 +384,6 @@ static void collect_percpu_times(struct psi_group *group,
 		*pchanged_states = changed_states;
 }
 
-static u64 update_averages(struct psi_group *group, u64 now)
-{
-	unsigned long missed_periods = 0;
-	u64 expires, period;
-	u64 avg_next_update;
-	int s;
-
-	/* avgX= */
-	expires = group->avg_next_update;
-	if (now - expires >= psi_period)
-		missed_periods = div_u64(now - expires, psi_period);
-
-	/*
-	 * The periodic clock tick can get delayed for various
-	 * reasons, especially on loaded systems. To avoid clock
-	 * drift, we schedule the clock in fixed psi_period intervals.
-	 * But the deltas we sample out of the per-cpu buckets above
-	 * are based on the actual time elapsing between clock ticks.
-	 */
-	avg_next_update = expires + ((1 + missed_periods) * psi_period);
-	period = now - (group->avg_last_update + (missed_periods * psi_period));
-	group->avg_last_update = now;
-
-	for (s = 0; s < NR_PSI_STATES - 1; s++) {
-		u32 sample;
-
-		sample = group->total[PSI_AVGS][s] - group->avg_total[s];
-		/*
-		 * Due to the lockless sampling of the time buckets,
-		 * recorded time deltas can slip into the next period,
-		 * which under full pressure can result in samples in
-		 * excess of the period length.
-		 *
-		 * We don't want to report non-sensical pressures in
-		 * excess of 100%, nor do we want to drop such events
-		 * on the floor. Instead we punt any overage into the
-		 * future until pressure subsides. By doing this we
-		 * don't underreport the occurring pressure curve, we
-		 * just report it delayed by one period length.
-		 *
-		 * The error isn't cumulative. As soon as another
-		 * delta slips from a period P to P+1, by definition
-		 * it frees up its time T in P.
-		 */
-		if (sample > period)
-			sample = period;
-		group->avg_total[s] += sample;
-		calc_avgs(group->avg[s], missed_periods, sample, period);
-	}
-
-	return avg_next_update;
-}
-
-static void psi_avgs_work(struct work_struct *work)
-{
-	struct delayed_work *dwork;
-	struct psi_group *group;
-	u32 changed_states;
-	u64 now;
-
-	dwork = to_delayed_work(work);
-	group = container_of(dwork, struct psi_group, avgs_work);
-
-	mutex_lock(&group->avgs_lock);
-
-	now = sched_clock();
-
-	collect_percpu_times(group, PSI_AVGS, &changed_states);
-	/*
-	 * If there is task activity, periodically fold the per-cpu
-	 * times and feed samples into the running averages. If things
-	 * are idle and there is no data to process, stop the clock.
-	 * Once restarted, we'll catch up the running averages in one
-	 * go - see calc_avgs() and missed_periods.
-	 */
-	if (now >= group->avg_next_update)
-		group->avg_next_update = update_averages(group, now);
-
-	if (changed_states & PSI_STATE_RESCHEDULE) {
-		schedule_delayed_work(dwork, nsecs_to_jiffies(
-				group->avg_next_update - now) + 1);
-	}
-
-	mutex_unlock(&group->avgs_lock);
-}
-
 /* Trigger tracking window manipulations */
 static void window_reset(struct psi_window *win, u64 now, u64 value,
 			 u64 prev_growth)
@@ -516,18 +430,6 @@ static u64 window_update(struct psi_window *win, u64 now, u64 value)
 	return growth;
 }
 
-static void init_triggers(struct psi_group *group, u64 now)
-{
-	struct psi_trigger *t;
-
-	list_for_each_entry(t, &group->triggers, node)
-		window_reset(&t->win, now,
-				group->total[PSI_POLL][t->state], 0);
-	memcpy(group->polling_total, group->total[PSI_POLL],
-		   sizeof(group->polling_total));
-	group->polling_next_update = now + group->poll_min_period;
-}
-
 static u64 update_triggers(struct psi_group *group, u64 now)
 {
 	struct psi_trigger *t;
@@ -590,6 +492,104 @@ static u64 update_triggers(struct psi_group *group, u64 now)
 	return now + group->poll_min_period;
 }
 
+static u64 update_averages(struct psi_group *group, u64 now)
+{
+	unsigned long missed_periods = 0;
+	u64 expires, period;
+	u64 avg_next_update;
+	int s;
+
+	/* avgX= */
+	expires = group->avg_next_update;
+	if (now - expires >= psi_period)
+		missed_periods = div_u64(now - expires, psi_period);
+
+	/*
+	 * The periodic clock tick can get delayed for various
+	 * reasons, especially on loaded systems. To avoid clock
+	 * drift, we schedule the clock in fixed psi_period intervals.
+	 * But the deltas we sample out of the per-cpu buckets above
+	 * are based on the actual time elapsing between clock ticks.
+	 */
+	avg_next_update = expires + ((1 + missed_periods) * psi_period);
+	period = now - (group->avg_last_update + (missed_periods * psi_period));
+	group->avg_last_update = now;
+
+	for (s = 0; s < NR_PSI_STATES - 1; s++) {
+		u32 sample;
+
+		sample = group->total[PSI_AVGS][s] - group->avg_total[s];
+		/*
+		 * Due to the lockless sampling of the time buckets,
+		 * recorded time deltas can slip into the next period,
+		 * which under full pressure can result in samples in
+		 * excess of the period length.
+		 *
+		 * We don't want to report non-sensical pressures in
+		 * excess of 100%, nor do we want to drop such events
+		 * on the floor. Instead we punt any overage into the
+		 * future until pressure subsides. By doing this we
+		 * don't underreport the occurring pressure curve, we
+		 * just report it delayed by one period length.
+		 *
+		 * The error isn't cumulative. As soon as another
+		 * delta slips from a period P to P+1, by definition
+		 * it frees up its time T in P.
+		 */
+		if (sample > period)
+			sample = period;
+		group->avg_total[s] += sample;
+		calc_avgs(group->avg[s], missed_periods, sample, period);
+	}
+
+	return avg_next_update;
+}
+
+static void psi_avgs_work(struct work_struct *work)
+{
+	struct delayed_work *dwork;
+	struct psi_group *group;
+	u32 changed_states;
+	u64 now;
+
+	dwork = to_delayed_work(work);
+	group = container_of(dwork, struct psi_group, avgs_work);
+
+	mutex_lock(&group->avgs_lock);
+
+	now = sched_clock();
+
+	collect_percpu_times(group, PSI_AVGS, &changed_states);
+	/*
+	 * If there is task activity, periodically fold the per-cpu
+	 * times and feed samples into the running averages. If things
+	 * are idle and there is no data to process, stop the clock.
+	 * Once restarted, we'll catch up the running averages in one
+	 * go - see calc_avgs() and missed_periods.
+	 */
+	if (now >= group->avg_next_update)
+		group->avg_next_update = update_averages(group, now);
+
+	if (changed_states & PSI_STATE_RESCHEDULE) {
+		schedule_delayed_work(dwork, nsecs_to_jiffies(
+				group->avg_next_update - now) + 1);
+	}
+
+	mutex_unlock(&group->avgs_lock);
+}
+
+static void init_triggers(struct psi_group *group, u64 now)
+{
+	struct psi_trigger *t;
+
+	list_for_each_entry(t, &group->triggers, node)
+		window_reset(&t->win, now,
+				group->total[PSI_POLL][t->state], 0);
+	memcpy(group->polling_total, group->total[PSI_POLL],
+		   sizeof(group->polling_total));
+	group->polling_next_update = now + group->poll_min_period;
+}
+
 /* Schedule polling if it's not already scheduled or forced. */
 static void psi_schedule_poll_work(struct psi_group *group, unsigned long delay,
 				   bool force)