From patchwork Wed Mar 29 15:33:24 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Domenico Cerasuolo
X-Patchwork-Id: 76659
From: Domenico Cerasuolo
To: linux-kernel@vger.kernel.org
Cc: peterz@infradead.org, surenb@google.com, brauner@kernel.org,
 chris@chrisdown.name, hannes@cmpxchg.org, Domenico Cerasuolo
Subject: [PATCH v4 1/4] sched/psi: rearrange polling code in preparation
Date: Wed, 29 Mar 2023 17:33:24 +0200
Message-Id: <20230329153327.140215-2-cerasuolodomenico@gmail.com>
X-Mailer: git-send-email 2.34.1
In-Reply-To: <20230329153327.140215-1-cerasuolodomenico@gmail.com>
References: <20230329153327.140215-1-cerasuolodomenico@gmail.com>
X-Mailing-List: linux-kernel@vger.kernel.org

Move a few functions up in the file to avoid the forward declarations
that the patch implementing unprivileged PSI triggers would otherwise
need.
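
For readers unfamiliar with the constraint, the following is a minimal
sketch of why definition order matters in C. All names in it are
hypothetical and it is not code from kernel/sched/psi.c. A static
function has to be declared before its first use, so a caller that
sits above its helper needs a forward declaration, while defining the
helper first needs none.

```c
/* Hypothetical names throughout; not taken from kernel/sched/psi.c. */
#include <stdint.h>

/*
 * Layout A: the helper is defined below its caller, so the caller
 * only compiles if a forward declaration precedes it.
 */
static uint64_t next_deadline_a(uint64_t now, uint64_t period);

static uint64_t schedule_a(uint64_t now)
{
	return next_deadline_a(now, 10);
}

static uint64_t next_deadline_a(uint64_t now, uint64_t period)
{
	return now + period;
}

/*
 * Layout B: the helper is moved above its caller, so no forward
 * declaration is required. Behavior is identical to layout A.
 */
static uint64_t next_deadline_b(uint64_t now, uint64_t period)
{
	return now + period;
}

static uint64_t schedule_b(uint64_t now)
{
	return next_deadline_b(now, 10);
}
```

Both layouts compute the same result; layout B simply keeps the file
free of declarations whose only purpose is to satisfy ordering.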

Suggested-by: Johannes Weiner
Signed-off-by: Domenico Cerasuolo
Acked-by: Johannes Weiner
---
 kernel/sched/psi.c | 196 ++++++++++++++++++++++-----------------------
 1 file changed, 98 insertions(+), 98 deletions(-)

diff --git a/kernel/sched/psi.c b/kernel/sched/psi.c
index 02e011cabe91..fe9269f1d2a4 100644
--- a/kernel/sched/psi.c
+++ b/kernel/sched/psi.c
@@ -384,92 +384,6 @@ static void collect_percpu_times(struct psi_group *group,
 		*pchanged_states = changed_states;
 }
 
-static u64 update_averages(struct psi_group *group, u64 now)
-{
-	unsigned long missed_periods = 0;
-	u64 expires, period;
-	u64 avg_next_update;
-	int s;
-
-	/* avgX= */
-	expires = group->avg_next_update;
-	if (now - expires >= psi_period)
-		missed_periods = div_u64(now - expires, psi_period);
-
-	/*
-	 * The periodic clock tick can get delayed for various
-	 * reasons, especially on loaded systems. To avoid clock
-	 * drift, we schedule the clock in fixed psi_period intervals.
-	 * But the deltas we sample out of the per-cpu buckets above
-	 * are based on the actual time elapsing between clock ticks.
-	 */
-	avg_next_update = expires + ((1 + missed_periods) * psi_period);
-	period = now - (group->avg_last_update + (missed_periods * psi_period));
-	group->avg_last_update = now;
-
-	for (s = 0; s < NR_PSI_STATES - 1; s++) {
-		u32 sample;
-
-		sample = group->total[PSI_AVGS][s] - group->avg_total[s];
-		/*
-		 * Due to the lockless sampling of the time buckets,
-		 * recorded time deltas can slip into the next period,
-		 * which under full pressure can result in samples in
-		 * excess of the period length.
-		 *
-		 * We don't want to report non-sensical pressures in
-		 * excess of 100%, nor do we want to drop such events
-		 * on the floor. Instead we punt any overage into the
-		 * future until pressure subsides. By doing this we
-		 * don't underreport the occurring pressure curve, we
-		 * just report it delayed by one period length.
-		 *
-		 * The error isn't cumulative. As soon as another
-		 * delta slips from a period P to P+1, by definition
-		 * it frees up its time T in P.
-		 */
-		if (sample > period)
-			sample = period;
-		group->avg_total[s] += sample;
-		calc_avgs(group->avg[s], missed_periods, sample, period);
-	}
-
-	return avg_next_update;
-}
-
-static void psi_avgs_work(struct work_struct *work)
-{
-	struct delayed_work *dwork;
-	struct psi_group *group;
-	u32 changed_states;
-	u64 now;
-
-	dwork = to_delayed_work(work);
-	group = container_of(dwork, struct psi_group, avgs_work);
-
-	mutex_lock(&group->avgs_lock);
-
-	now = sched_clock();
-
-	collect_percpu_times(group, PSI_AVGS, &changed_states);
-	/*
-	 * If there is task activity, periodically fold the per-cpu
-	 * times and feed samples into the running averages. If things
-	 * are idle and there is no data to process, stop the clock.
-	 * Once restarted, we'll catch up the running averages in one
-	 * go - see calc_avgs() and missed_periods.
-	 */
-	if (now >= group->avg_next_update)
-		group->avg_next_update = update_averages(group, now);
-
-	if (changed_states & PSI_STATE_RESCHEDULE) {
-		schedule_delayed_work(dwork, nsecs_to_jiffies(
-				group->avg_next_update - now) + 1);
-	}
-
-	mutex_unlock(&group->avgs_lock);
-}
-
 /* Trigger tracking window manipulations */
 static void window_reset(struct psi_window *win, u64 now, u64 value,
 			 u64 prev_growth)
@@ -516,18 +430,6 @@ static u64 window_update(struct psi_window *win, u64 now, u64 value)
 	return growth;
 }
 
-static void init_triggers(struct psi_group *group, u64 now)
-{
-	struct psi_trigger *t;
-
-	list_for_each_entry(t, &group->triggers, node)
-		window_reset(&t->win, now,
-				group->total[PSI_POLL][t->state], 0);
-	memcpy(group->polling_total, group->total[PSI_POLL],
-		   sizeof(group->polling_total));
-	group->polling_next_update = now + group->poll_min_period;
-}
-
 static u64 update_triggers(struct psi_group *group, u64 now)
 {
 	struct psi_trigger *t;
@@ -590,6 +492,104 @@ static u64 update_triggers(struct psi_group *group, u64 now)
 	return now + group->poll_min_period;
 }
 
+static u64 update_averages(struct psi_group *group, u64 now)
+{
+	unsigned long missed_periods = 0;
+	u64 expires, period;
+	u64 avg_next_update;
+	int s;
+
+	/* avgX= */
+	expires = group->avg_next_update;
+	if (now - expires >= psi_period)
+		missed_periods = div_u64(now - expires, psi_period);
+
+	/*
+	 * The periodic clock tick can get delayed for various
+	 * reasons, especially on loaded systems. To avoid clock
+	 * drift, we schedule the clock in fixed psi_period intervals.
+	 * But the deltas we sample out of the per-cpu buckets above
+	 * are based on the actual time elapsing between clock ticks.
+	 */
+	avg_next_update = expires + ((1 + missed_periods) * psi_period);
+	period = now - (group->avg_last_update + (missed_periods * psi_period));
+	group->avg_last_update = now;
+
+	for (s = 0; s < NR_PSI_STATES - 1; s++) {
+		u32 sample;
+
+		sample = group->total[PSI_AVGS][s] - group->avg_total[s];
+		/*
+		 * Due to the lockless sampling of the time buckets,
+		 * recorded time deltas can slip into the next period,
+		 * which under full pressure can result in samples in
+		 * excess of the period length.
+		 *
+		 * We don't want to report non-sensical pressures in
+		 * excess of 100%, nor do we want to drop such events
+		 * on the floor. Instead we punt any overage into the
+		 * future until pressure subsides. By doing this we
+		 * don't underreport the occurring pressure curve, we
+		 * just report it delayed by one period length.
+		 *
+		 * The error isn't cumulative. As soon as another
+		 * delta slips from a period P to P+1, by definition
+		 * it frees up its time T in P.
+		 */
+		if (sample > period)
+			sample = period;
+		group->avg_total[s] += sample;
+		calc_avgs(group->avg[s], missed_periods, sample, period);
+	}
+
+	return avg_next_update;
+}
+
+static void psi_avgs_work(struct work_struct *work)
+{
+	struct delayed_work *dwork;
+	struct psi_group *group;
+	u32 changed_states;
+	u64 now;
+
+	dwork = to_delayed_work(work);
+	group = container_of(dwork, struct psi_group, avgs_work);
+
+	mutex_lock(&group->avgs_lock);
+
+	now = sched_clock();
+
+	collect_percpu_times(group, PSI_AVGS, &changed_states);
+	/*
+	 * If there is task activity, periodically fold the per-cpu
+	 * times and feed samples into the running averages. If things
+	 * are idle and there is no data to process, stop the clock.
+	 * Once restarted, we'll catch up the running averages in one
+	 * go - see calc_avgs() and missed_periods.
+	 */
+	if (now >= group->avg_next_update)
+		group->avg_next_update = update_averages(group, now);
+
+	if (changed_states & PSI_STATE_RESCHEDULE) {
+		schedule_delayed_work(dwork, nsecs_to_jiffies(
+				group->avg_next_update - now) + 1);
+	}
+
+	mutex_unlock(&group->avgs_lock);
+}
+
+static void init_triggers(struct psi_group *group, u64 now)
+{
+	struct psi_trigger *t;
+
+	list_for_each_entry(t, &group->triggers, node)
+		window_reset(&t->win, now,
+				group->total[PSI_POLL][t->state], 0);
+	memcpy(group->polling_total, group->total[PSI_POLL],
+		   sizeof(group->polling_total));
+	group->polling_next_update = now + group->poll_min_period;
+}
+
 /* Schedule polling if it's not already scheduled or forced. */
 static void psi_schedule_poll_work(struct psi_group *group, unsigned long delay,
 				   bool force)

From patchwork Wed Mar 29 15:33:25 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Domenico Cerasuolo
X-Patchwork-Id: 76658
From: Domenico Cerasuolo
To: linux-kernel@vger.kernel.org
Cc: peterz@infradead.org, surenb@google.com, brauner@kernel.org,
 chris@chrisdown.name, hannes@cmpxchg.org, Domenico Cerasuolo
Subject: [PATCH v4 2/4] sched/psi: rename existing poll members in preparation
Date: Wed, 29 Mar 2023 17:33:25 +0200
Message-Id: <20230329153327.140215-3-cerasuolodomenico@gmail.com>
X-Mailer: git-send-email 2.34.1
In-Reply-To: <20230329153327.140215-1-cerasuolodomenico@gmail.com>
References: <20230329153327.140215-1-cerasuolodomenico@gmail.com>
X-Mailing-List: linux-kernel@vger.kernel.org

Rename the existing polling members and functions in the PSI
implementation to clearly distinguish the privileged trigger code from
the unprivileged trigger code that the next patch implements.

Suggested-by: Johannes Weiner
Signed-off-by: Domenico Cerasuolo
Acked-by: Johannes Weiner
---
 include/linux/psi_types.h |  36 ++++-----
 kernel/sched/psi.c        | 163 +++++++++++++++++++-------------------
 2 files changed, 100 insertions(+), 99 deletions(-)

diff --git a/include/linux/psi_types.h b/include/linux/psi_types.h
index 1e0a0d7ace3a..1819afa8b198 100644
--- a/include/linux/psi_types.h
+++ b/include/linux/psi_types.h
@@ -175,26 +175,26 @@ struct psi_group {
 	u64 total[NR_PSI_AGGREGATORS][NR_PSI_STATES - 1];
 	unsigned long avg[NR_PSI_STATES - 1][3];
 
-	/* Monitor work control */
-	struct task_struct __rcu *poll_task;
-	struct timer_list poll_timer;
-	wait_queue_head_t poll_wait;
-	atomic_t poll_wakeup;
-	atomic_t poll_scheduled;
+	/* Monitor RT polling work control */
+	struct task_struct __rcu *rtpoll_task;
+	struct timer_list rtpoll_timer;
+	wait_queue_head_t rtpoll_wait;
+	atomic_t rtpoll_wakeup;
+	atomic_t rtpoll_scheduled;
 
 	/* Protects data used by the monitor */
-	struct mutex trigger_lock;
-
-	/* Configured polling triggers */
-	struct list_head triggers;
-	u32 nr_triggers[NR_PSI_STATES - 1];
-	u32 poll_states;
-	u64 poll_min_period;
-
-	/* Total stall times at the start of monitor activation */
-	u64 polling_total[NR_PSI_STATES - 1];
-	u64 polling_next_update;
-	u64 polling_until;
+	struct mutex rtpoll_trigger_lock;
+
+	/* Configured RT polling triggers */
+	struct list_head rtpoll_triggers;
+	u32 rtpoll_nr_triggers[NR_PSI_STATES - 1];
+	u32 rtpoll_states;
+	u64 rtpoll_min_period;
+
+	/* Total stall times at the start of RT polling monitor activation */
+	u64 rtpoll_total[NR_PSI_STATES - 1];
+	u64 rtpoll_next_update;
+	u64 rtpoll_until;
 };
 
 #else /* CONFIG_PSI */
diff --git a/kernel/sched/psi.c b/kernel/sched/psi.c
index fe9269f1d2a4..a3d0b5cf797a 100644
--- a/kernel/sched/psi.c
+++ b/kernel/sched/psi.c
@@ -189,14 +189,14 @@ static void group_init(struct psi_group *group)
 	INIT_DELAYED_WORK(&group->avgs_work, psi_avgs_work);
 	mutex_init(&group->avgs_lock);
 	/* Init trigger-related members */
-	atomic_set(&group->poll_scheduled, 0);
-	mutex_init(&group->trigger_lock);
-	INIT_LIST_HEAD(&group->triggers);
-	group->poll_min_period = U32_MAX;
-	group->polling_next_update = ULLONG_MAX;
-	init_waitqueue_head(&group->poll_wait);
-	timer_setup(&group->poll_timer, poll_timer_fn, 0);
-	rcu_assign_pointer(group->poll_task, NULL);
+	atomic_set(&group->rtpoll_scheduled, 0);
+	mutex_init(&group->rtpoll_trigger_lock);
+	INIT_LIST_HEAD(&group->rtpoll_triggers);
+	group->rtpoll_min_period = U32_MAX;
+	group->rtpoll_next_update = ULLONG_MAX;
+	init_waitqueue_head(&group->rtpoll_wait);
+	timer_setup(&group->rtpoll_timer, poll_timer_fn, 0);
+	rcu_assign_pointer(group->rtpoll_task, NULL);
 }
 
 void __init psi_init(void)
@@ -440,11 +440,11 @@ static u64 update_triggers(struct psi_group *group, u64 now)
 	 * On subsequent updates, calculate growth deltas and let
 	 * watchers know when their specified thresholds are exceeded.
 	 */
-	list_for_each_entry(t, &group->triggers, node) {
+	list_for_each_entry(t, &group->rtpoll_triggers, node) {
 		u64 growth;
 		bool new_stall;
 
-		new_stall = group->polling_total[t->state] != total[t->state];
+		new_stall = group->rtpoll_total[t->state] != total[t->state];
 
 		/* Check for stall activity or a previous threshold breach */
 		if (!new_stall && !t->pending_event)
@@ -486,10 +486,10 @@ static u64 update_triggers(struct psi_group *group, u64 now)
 	}
 
 	if (update_total)
-		memcpy(group->polling_total, total,
-				sizeof(group->polling_total));
+		memcpy(group->rtpoll_total, total,
+				sizeof(group->rtpoll_total));
 
-	return now + group->poll_min_period;
+	return now + group->rtpoll_min_period;
 }
 
 static u64 update_averages(struct psi_group *group, u64 now)
@@ -582,53 +582,53 @@ static void init_triggers(struct psi_group *group, u64 now)
 {
 	struct psi_trigger *t;
 
-	list_for_each_entry(t, &group->triggers, node)
+	list_for_each_entry(t, &group->rtpoll_triggers, node)
 		window_reset(&t->win, now,
 				group->total[PSI_POLL][t->state], 0);
-	memcpy(group->polling_total, group->total[PSI_POLL],
-		   sizeof(group->polling_total));
-	group->polling_next_update = now + group->poll_min_period;
+	memcpy(group->rtpoll_total, group->total[PSI_POLL],
+		   sizeof(group->rtpoll_total));
+	group->rtpoll_next_update = now + group->rtpoll_min_period;
 }
 
 /* Schedule polling if it's not already scheduled or forced. */
-static void psi_schedule_poll_work(struct psi_group *group, unsigned long delay,
+static void psi_schedule_rtpoll_work(struct psi_group *group, unsigned long delay,
 				   bool force)
 {
 	struct task_struct *task;
 
 	/*
 	 * atomic_xchg should be called even when !force to provide a
-	 * full memory barrier (see the comment inside psi_poll_work).
+	 * full memory barrier (see the comment inside psi_rtpoll_work).
 	 */
-	if (atomic_xchg(&group->poll_scheduled, 1) && !force)
+	if (atomic_xchg(&group->rtpoll_scheduled, 1) && !force)
 		return;
 
 	rcu_read_lock();
 
-	task = rcu_dereference(group->poll_task);
+	task = rcu_dereference(group->rtpoll_task);
 	/*
 	 * kworker might be NULL in case psi_trigger_destroy races with
 	 * psi_task_change (hotpath) which can't use locks
 	 */
 	if (likely(task))
-		mod_timer(&group->poll_timer, jiffies + delay);
+		mod_timer(&group->rtpoll_timer, jiffies + delay);
 	else
-		atomic_set(&group->poll_scheduled, 0);
+		atomic_set(&group->rtpoll_scheduled, 0);
 
 	rcu_read_unlock();
 }
 
-static void psi_poll_work(struct psi_group *group)
+static void psi_rtpoll_work(struct psi_group *group)
 {
 	bool force_reschedule = false;
 	u32 changed_states;
 	u64 now;
 
-	mutex_lock(&group->trigger_lock);
+	mutex_lock(&group->rtpoll_trigger_lock);
 
 	now = sched_clock();
 
-	if (now > group->polling_until) {
+	if (now > group->rtpoll_until) {
 		/*
 		 * We are either about to start or might stop polling if no
 		 * state change was recorded. Resetting poll_scheduled leaves
@@ -638,7 +638,7 @@ static void psi_poll_work(struct psi_group *group)
 		 * should be negligible and polling_next_update still keeps
 		 * updates correctly on schedule.
 		 */
-		atomic_set(&group->poll_scheduled, 0);
+		atomic_set(&group->rtpoll_scheduled, 0);
 		/*
 		 * A task change can race with the poll worker that is supposed to
 		 * report on it. To avoid missing events, ensure ordering between
@@ -667,9 +667,9 @@ static void psi_poll_work(struct psi_group *group)
 
 	collect_percpu_times(group, PSI_POLL, &changed_states);
 
-	if (changed_states & group->poll_states) {
+	if (changed_states & group->rtpoll_states) {
 		/* Initialize trigger windows when entering polling mode */
-		if (now > group->polling_until)
+		if (now > group->rtpoll_until)
 			init_triggers(group, now);
 
 		/*
@@ -677,50 +677,50 @@ static void psi_poll_work(struct psi_group *group)
 		 * minimum tracking window as long as monitor states are
 		 * changing.
 		 */
-		group->polling_until = now +
-			group->poll_min_period * UPDATES_PER_WINDOW;
+		group->rtpoll_until = now +
+			group->rtpoll_min_period * UPDATES_PER_WINDOW;
 	}
 
-	if (now > group->polling_until) {
-		group->polling_next_update = ULLONG_MAX;
+	if (now > group->rtpoll_until) {
+		group->rtpoll_next_update = ULLONG_MAX;
 		goto out;
 	}
 
-	if (now >= group->polling_next_update)
-		group->polling_next_update = update_triggers(group, now);
+	if (now >= group->rtpoll_next_update)
+		group->rtpoll_next_update = update_triggers(group, now);
 
-	psi_schedule_poll_work(group,
-		nsecs_to_jiffies(group->polling_next_update - now) + 1,
+	psi_schedule_rtpoll_work(group,
+		nsecs_to_jiffies(group->rtpoll_next_update - now) + 1,
 		force_reschedule);
 
 out:
-	mutex_unlock(&group->trigger_lock);
+	mutex_unlock(&group->rtpoll_trigger_lock);
 }
 
-static int psi_poll_worker(void *data)
+static int psi_rtpoll_worker(void *data)
 {
 	struct psi_group *group = (struct psi_group *)data;
 
 	sched_set_fifo_low(current);
 
 	while (true) {
-		wait_event_interruptible(group->poll_wait,
-				atomic_cmpxchg(&group->poll_wakeup, 1, 0) ||
+		wait_event_interruptible(group->rtpoll_wait,
+				atomic_cmpxchg(&group->rtpoll_wakeup, 1, 0) ||
 				kthread_should_stop());
 		if (kthread_should_stop())
 			break;
 
-		psi_poll_work(group);
+		psi_rtpoll_work(group);
 	}
 	return 0;
 }
 
 static void poll_timer_fn(struct timer_list *t)
 {
-	struct psi_group *group = from_timer(group, t, poll_timer);
+	struct psi_group *group = from_timer(group, t, rtpoll_timer);
 
-	atomic_set(&group->poll_wakeup, 1);
-	wake_up_interruptible(&group->poll_wait);
+	atomic_set(&group->rtpoll_wakeup, 1);
+	wake_up_interruptible(&group->rtpoll_wait);
 }
 
 static void record_times(struct psi_group_cpu *groupc, u64 now)
@@ -851,8 +851,8 @@ static void psi_group_change(struct psi_group *group, int cpu,
 
 	write_seqcount_end(&groupc->seq);
 
-	if (state_mask & group->poll_states)
-		psi_schedule_poll_work(group, 1, false);
+	if (state_mask & group->rtpoll_states)
+		psi_schedule_rtpoll_work(group, 1, false);
 
 	if (wake_clock && !delayed_work_pending(&group->avgs_work))
 		schedule_delayed_work(&group->avgs_work, PSI_FREQ);
@@ -1005,8 +1005,8 @@ void psi_account_irqtime(struct task_struct *task, u32 delta)
 
 		write_seqcount_end(&groupc->seq);
 
-		if (group->poll_states & (1 << PSI_IRQ_FULL))
-			psi_schedule_poll_work(group, 1, false);
+		if (group->rtpoll_states & (1 << PSI_IRQ_FULL))
+			psi_schedule_rtpoll_work(group, 1, false);
 	} while ((group = group->parent));
 }
 #endif
@@ -1101,7 +1101,7 @@ void psi_cgroup_free(struct cgroup *cgroup)
 	cancel_delayed_work_sync(&cgroup->psi->avgs_work);
 	free_percpu(cgroup->psi->pcpu);
 	/* All triggers must be removed by now */
-	WARN_ONCE(cgroup->psi->poll_states, "psi: trigger leak\n");
+	WARN_ONCE(cgroup->psi->rtpoll_states, "psi: trigger leak\n");
 	kfree(cgroup->psi);
 }
 
@@ -1302,29 +1302,29 @@ struct psi_trigger *psi_trigger_create(struct psi_group *group,
 	init_waitqueue_head(&t->event_wait);
 	t->pending_event = false;
 
-	mutex_lock(&group->trigger_lock);
+	mutex_lock(&group->rtpoll_trigger_lock);
 
-	if (!rcu_access_pointer(group->poll_task)) {
+	if (!rcu_access_pointer(group->rtpoll_task)) {
 		struct task_struct *task;
 
-		task = kthread_create(psi_poll_worker, group, "psimon");
+		task = kthread_create(psi_rtpoll_worker, group, "psimon");
 		if (IS_ERR(task)) {
 			kfree(t);
-			mutex_unlock(&group->trigger_lock);
+			mutex_unlock(&group->rtpoll_trigger_lock);
 			return ERR_CAST(task);
 		}
-		atomic_set(&group->poll_wakeup, 0);
+		atomic_set(&group->rtpoll_wakeup, 0);
 		wake_up_process(task);
-		rcu_assign_pointer(group->poll_task, task);
+		rcu_assign_pointer(group->rtpoll_task, task);
 	}
 
-	list_add(&t->node, &group->triggers);
-	group->poll_min_period = min(group->poll_min_period,
+	list_add(&t->node, &group->rtpoll_triggers);
+	group->rtpoll_min_period = min(group->rtpoll_min_period,
 		div_u64(t->win.size, UPDATES_PER_WINDOW));
-	group->nr_triggers[t->state]++;
-	group->poll_states |= (1 << t->state);
+	group->rtpoll_nr_triggers[t->state]++;
+	group->rtpoll_states |= (1 << t->state);
 
-	mutex_unlock(&group->trigger_lock);
+	mutex_unlock(&group->rtpoll_trigger_lock);
 
 	return t;
 }
@@ -1349,51 +1349,52 @@ void psi_trigger_destroy(struct psi_trigger *t)
 	 */
 	wake_up_pollfree(&t->event_wait);
 
-	mutex_lock(&group->trigger_lock);
+	mutex_lock(&group->rtpoll_trigger_lock);
 
 	if (!list_empty(&t->node)) {
 		struct psi_trigger *tmp;
 		u64 period = ULLONG_MAX;
 
 		list_del(&t->node);
-		group->nr_triggers[t->state]--;
-		if (!group->nr_triggers[t->state])
-			group->poll_states &= ~(1 << t->state);
+		group->rtpoll_nr_triggers[t->state]--;
+		if (!group->rtpoll_nr_triggers[t->state])
+			group->rtpoll_states &= ~(1 << t->state);
 		/* reset min update period for the remaining triggers */
-		list_for_each_entry(tmp, &group->triggers, node)
+		list_for_each_entry(tmp, &group->rtpoll_triggers, node)
 			period = min(period, div_u64(tmp->win.size,
 					UPDATES_PER_WINDOW));
-		group->poll_min_period = period;
-		/* Destroy poll_task when the last trigger is destroyed */
-		if (group->poll_states == 0) {
-			group->polling_until = 0;
+		group->rtpoll_min_period = period;
+		/* Destroy rtpoll_task when the last trigger is destroyed */
+		if (group->rtpoll_states == 0) {
+			group->rtpoll_until = 0;
 			task_to_destroy = rcu_dereference_protected(
-					group->poll_task,
-					lockdep_is_held(&group->trigger_lock));
-			rcu_assign_pointer(group->poll_task, NULL);
-			del_timer(&group->poll_timer);
+					group->rtpoll_task,
+					lockdep_is_held(&group->rtpoll_trigger_lock));
+			rcu_assign_pointer(group->rtpoll_task, NULL);
+			del_timer(&group->rtpoll_timer);
 		}
 	}
 
-	mutex_unlock(&group->trigger_lock);
+	mutex_unlock(&group->rtpoll_trigger_lock);
 
 	/*
-	 * Wait for psi_schedule_poll_work RCU to complete its read-side
+	 * Wait for psi_schedule_rtpoll_work RCU to complete its read-side
 	 * critical section before destroying the trigger and optionally the
-	 * poll_task.
+	 * rtpoll_task.
 	 */
 	synchronize_rcu();
 	/*
-	 * Stop kthread 'psimon' after releasing trigger_lock to prevent a
-	 * deadlock while waiting for psi_poll_work to acquire trigger_lock
+	 * Stop kthread 'psimon' after releasing rtpoll_trigger_lock to prevent
+	 * a deadlock while waiting for psi_rtpoll_work to acquire
+	 * rtpoll_trigger_lock
 	 */
 	if (task_to_destroy) {
 		/*
 		 * After the RCU grace period has expired, the worker
-		 * can no longer be found through group->poll_task.
+		 * can no longer be found through group->rtpoll_task.
*/ kthread_stop(task_to_destroy); - atomic_set(&group->poll_scheduled, 0); + atomic_set(&group->rtpoll_scheduled, 0); } kfree(t); } From patchwork Wed Mar 29 15:33:26 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Domenico Cerasuolo X-Patchwork-Id: 76662 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:b0ea:0:b0:3b6:4342:cba0 with SMTP id b10csp506881vqo; Wed, 29 Mar 2023 08:42:59 -0700 (PDT) X-Google-Smtp-Source: AK7set8b4a93cG26QFHSfYXrjGJKAQazUAMb/tesZqaoQGOSKMIsHq6sUhKWyGDmWc2KMILgi0xO X-Received: by 2002:a05:6a20:6628:b0:d4:9d94:8e7c with SMTP id n40-20020a056a20662800b000d49d948e7cmr13859351pzh.2.1680104578892; Wed, 29 Mar 2023 08:42:58 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1680104578; cv=none; d=google.com; s=arc-20160816; b=saXPNXpY5iqP436bNP16H5WsYvL61ddeP5jDAHBPiM4TgWXyw1s8rOwHdIWsIGyGy/ J+WlURdnZNzOoW+TyAJE+eFD6ag9r9OlimmJfexwpa/yKJy1nMCBQws9zTdVfrHqR0NE 6nhq2KpRWw8jgUNNXEU57GeVK6cga5xiy2zEw/lkjgR4Zpqk7NyLnH7HNjZyiHlDLoAl sHT74GRT7TnBHt3P5aV2qE/EZPJzRrMFfJAE3tlzkpOWFyqLVWS7YKg/BFhtn78zUF6e cWDgh1Apq1NM7UA64rVVGKammRpPfoOmKLXHllsPwTRzXBXQ9+OTNk+b9RKVjhi50bfc 5hpg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=eGGlb2745yxdkPMgr58M3JFNd1mBkXZRVV0HG70eWBQ=; b=efwGircTVb5RJ0CllVaqPcGm5QdbcTzDyaVkKKktjriYZJNyTaJk51hTAyRpdtbjpP qX6C2zNtAl36F/vzL+ZOsrRyncXrgmalRp/PYhw8XbX8IjL7TNhs/tj65hYlDMP1tvEG Na824DyjinlOoqvv1/9a+Pp1B4mPOgUYNmO8v3uzBS2kTazz1X1JDih3f/hUvcmKMohd lzPU/Xw0ChEWMjLtmZ2d1c+plYv0tqf0cJYEM3RAA22IF3K3ohX7Q8wJYHVsIBn+hF8k 4/YJSkqBOQqB0HEDyVI48x8Y7OaFPoVe4Z+dGIgSOU7N7FUu13NpKroWN6Yzhi199qKL kBcQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@gmail.com header.s=20210112 header.b=Nx+s+uU+; spf=pass (google.com: domain of 
linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id y71-20020a638a4a000000b0050fa9bc63cbsi20636948pgd.432.2023.03.29.08.42.35; Wed, 29 Mar 2023 08:42:58 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@gmail.com header.s=20210112 header.b=Nx+s+uU+; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230094AbjC2Pd6 (ORCPT + 99 others); Wed, 29 Mar 2023 11:33:58 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:42340 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230037AbjC2Pdn (ORCPT ); Wed, 29 Mar 2023 11:33:43 -0400 Received: from mail-ed1-x52f.google.com (mail-ed1-x52f.google.com [IPv6:2a00:1450:4864:20::52f]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id C292B101 for ; Wed, 29 Mar 2023 08:33:36 -0700 (PDT) Received: by mail-ed1-x52f.google.com with SMTP id ew6so64921287edb.7 for ; Wed, 29 Mar 2023 08:33:36 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; t=1680104015; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=eGGlb2745yxdkPMgr58M3JFNd1mBkXZRVV0HG70eWBQ=; b=Nx+s+uU+6QMulxybZtm8C/77FYeBzRpE1hztS7IJ2L2udTwU8UEGovabFRLAG9/4wt 
I0M2gCGNgND9zwstbITdZlOZafGL8jYbQdt2zbHjg5OMIev4NUEOcumgIGsYea5u/itb YOjldUkyDn5m0tll11aTzCoG8FKeUoF1BY9CKIsGl6Gm2VfZjWghotwX2apEEYKoEhLZ QMNd1f+HKCuZa+wWcJVselSM1I+3FxGIIxLw3O/SveuRSX84diw5DnYuwt+W3qMwY/W7 +hbteaGGTGBVej6YDew4vNRTH/NEcKYX1k+Wb3xbe2LnI0wPSCJHw3MwmTIdhsPXtbAw FKFQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; t=1680104015; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=eGGlb2745yxdkPMgr58M3JFNd1mBkXZRVV0HG70eWBQ=; b=7qNDlFKXsIgMZgtF8RGOKi94tRAzkZ61dmvZisU4VDz6ppEsJ56KDOBQfSTEPj8NLW j9a0W6qLgE+KEiuMaHy76x3wCi5FKm2UXGP7U3Azg9xQn9kv1FFD/GPdyp0caYzlIKjl JltPwqUbbY7rdjHu0aD+GxADXIDrUy/BR0Vhqvch8xImgXRXDptiI7OomKwoqwYYEch6 mSTeaQDx5dRtNHmWLAyfUK3m3v6O7hNM849UkidQBj+jciVxkYbsdQnlWY6MAcmCK3WY w7vEc5mTSrFL8O/ZTjPgBTE9K8YILFanhXwLOJZ0jY5rcSlFVVKNf7f3W2D6HBOMsImK 1ZEA== X-Gm-Message-State: AAQBX9cRda8OkWtKoxRi9tXSuXtYil3WMx+cHUS4UCue3oDx/i3O0NKO XsMgEhqWwK0cQ5Xpat5C4VJR4EMQPJg6UhOr X-Received: by 2002:a17:907:6d0c:b0:944:18e0:6ef2 with SMTP id sa12-20020a1709076d0c00b0094418e06ef2mr15946875ejc.73.1680104014707; Wed, 29 Mar 2023 08:33:34 -0700 (PDT) Received: from lelloman-5950.. (host-79-13-135-230.retail.telecomitalia.it. 
[79.13.135.230]) by smtp.gmail.com with ESMTPSA id g10-20020a17090670ca00b0093b8c0952e4sm10193735ejk.219.2023.03.29.08.33.33 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 29 Mar 2023 08:33:34 -0700 (PDT)
From: Domenico Cerasuolo
To: linux-kernel@vger.kernel.org
Cc: peterz@infradead.org, surenb@google.com, brauner@kernel.org, chris@chrisdown.name, hannes@cmpxchg.org, Domenico Cerasuolo
Subject: [PATCH v4 3/4] sched/psi: extract update_triggers side effect
Date: Wed, 29 Mar 2023 17:33:26 +0200
Message-Id: <20230329153327.140215-4-cerasuolodomenico@gmail.com>
X-Mailer: git-send-email 2.34.1
In-Reply-To: <20230329153327.140215-1-cerasuolodomenico@gmail.com>
References: <20230329153327.140215-1-cerasuolodomenico@gmail.com>
MIME-Version: 1.0
X-Spam-Status: No, score=-0.2 required=5.0 tests=DKIM_SIGNED,DKIM_VALID, DKIM_VALID_AU,DKIM_VALID_EF,FREEMAIL_FROM,RCVD_IN_DNSWL_NONE, SPF_HELO_NONE,SPF_PASS autolearn=unavailable autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net
Precedence: bulk
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?=
X-GMAIL-THRID: =?utf-8?q?1761717338609730001?=
X-GMAIL-MSGID: =?utf-8?q?1761717338609730001?=

This change moves the update_total flag out of the update_triggers
function, which is currently called only in psi_rtpoll_work. In the next
patch, update_triggers will also be called in psi_avgs_work, but the
total update information is specific to psi_rtpoll_work. Returning the
update_total value to the caller lets us avoid differentiating the
implementation of update_triggers for different aggregators.
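The shape of this refactor can be illustrated outside the kernel. What
follows is only a minimal userspace sketch under simplifying
assumptions: struct mock_group, mock_update_triggers() and
mock_rtpoll_update() are hypothetical stand-ins invented for
illustration, not kernel symbols, and the per-trigger window and
threshold logic is left out. It shows the helper reporting "new stall
activity seen" through an out-parameter while the caller decides whether
to refresh its snapshot of the totals.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define NR_STATES 3

/* Hypothetical, heavily simplified stand-in for struct psi_group. */
struct mock_group {
	uint64_t total[NR_STATES];        /* latest aggregated totals */
	uint64_t rtpoll_total[NR_STATES]; /* snapshot from the last update */
	uint64_t min_period;              /* interval until the next update */
};

/*
 * Sets *update_total when any state saw new stall time since the last
 * snapshot. The snapshot itself is left untouched: copying it is the
 * caller's job, which keeps this helper aggregator-agnostic.
 */
static uint64_t mock_update_triggers(struct mock_group *g, uint64_t now,
				     bool *update_total)
{
	*update_total = false;

	for (int i = 0; i < NR_STATES; i++) {
		if (g->rtpoll_total[i] != g->total[i])
			*update_total = true;
	}

	return now + g->min_period;
}

/* Caller side: the snapshot copy now lives here. */
static uint64_t mock_rtpoll_update(struct mock_group *g, uint64_t now)
{
	bool update_total;
	uint64_t next = mock_update_triggers(g, now, &update_total);

	if (update_total)
		memcpy(g->rtpoll_total, g->total, sizeof(g->rtpoll_total));

	return next;
}
```

A second caller that has no use for the snapshot can call the helper and
ignore the flag, which is presumably how the averages worker is meant to
use it in the next patch.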
Suggested-by: Johannes Weiner Signed-off-by: Domenico Cerasuolo Acked-by: Johannes Weiner --- kernel/sched/psi.c | 19 ++++++++++--------- 1 file changed, 10 insertions(+), 9 deletions(-) diff --git a/kernel/sched/psi.c b/kernel/sched/psi.c index a3d0b5cf797a..f3df6a8ff493 100644 --- a/kernel/sched/psi.c +++ b/kernel/sched/psi.c @@ -430,11 +430,11 @@ static u64 window_update(struct psi_window *win, u64 now, u64 value) return growth; } -static u64 update_triggers(struct psi_group *group, u64 now) +static u64 update_triggers(struct psi_group *group, u64 now, bool *update_total) { struct psi_trigger *t; - bool update_total = false; u64 *total = group->total[PSI_POLL]; + *update_total = false; /* * On subsequent updates, calculate growth deltas and let @@ -462,7 +462,7 @@ static u64 update_triggers(struct psi_group *group, u64 now) * been through all of them. Also remember to extend the * polling time if we see new stall activity. */ - update_total = true; + *update_total = true; /* Calculate growth since last update */ growth = window_update(&t->win, now, total[t->state]); @@ -485,10 +485,6 @@ static u64 update_triggers(struct psi_group *group, u64 now) t->pending_event = false; } - if (update_total) - memcpy(group->rtpoll_total, total, - sizeof(group->rtpoll_total)); - return now + group->rtpoll_min_period; } @@ -622,6 +618,7 @@ static void psi_rtpoll_work(struct psi_group *group) { bool force_reschedule = false; u32 changed_states; + bool update_total; u64 now; mutex_lock(&group->rtpoll_trigger_lock); @@ -686,8 +683,12 @@ static void psi_rtpoll_work(struct psi_group *group) goto out; } - if (now >= group->rtpoll_next_update) - group->rtpoll_next_update = update_triggers(group, now); + if (now >= group->rtpoll_next_update) { + group->rtpoll_next_update = update_triggers(group, now, &update_total); + if (update_total) + memcpy(group->rtpoll_total, group->total[PSI_POLL], + sizeof(group->rtpoll_total)); + } psi_schedule_rtpoll_work(group, 
nsecs_to_jiffies(group->rtpoll_next_update - now) + 1, From patchwork Wed Mar 29 15:33:27 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Domenico Cerasuolo X-Patchwork-Id: 76663 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:b0ea:0:b0:3b6:4342:cba0 with SMTP id b10csp511662vqo; Wed, 29 Mar 2023 08:51:04 -0700 (PDT) X-Google-Smtp-Source: AKy350Yzscfx4D3Ac2gK6RCamxB8TjG9WYCkmVolB2be5TjpjpB+8a6vomqKxfr5KgQhi+o6J+CT X-Received: by 2002:a17:906:6b8b:b0:93f:9b68:a0f4 with SMTP id l11-20020a1709066b8b00b0093f9b68a0f4mr2957416ejr.26.1680105064618; Wed, 29 Mar 2023 08:51:04 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1680105064; cv=none; d=google.com; s=arc-20160816; b=YUncIqvc3e2Gm83VjHmpark/SdiZVDLacAtu6foVtrlgjlwFTWviy/+Uxd7yULtWvw CToFJyyYJJpxgEiV4NVywN5AbU3l1AHJvbKeRauuZIHKYkVXTVnI6nsjUhZDYVrrGEOF lvhsbrdntz+s2yIU5eg7Tg9bikLteODC9raooqGvlB7bFEJp2wKB5sMMPK7wgYGNLuAu OyerxhWvXdwFp1Zfor34mvDmyEKLBuS2f5KkeJYKqWjbbbRvGJxzALGVTFGZ2zs3nUaK f6joJ2Qbqrh70j+ktIGuIcQ39ZhJNQ8PkCo0hYRHzDdieEm9xhgvdZihrx3qNWbDK+zj q9uw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=yyx1bJqCQCciR5OiAx/Y/6wLtU8qwBaslYTnmEUxQsk=; b=gbtN83OYS5b66XO7Qb6VsKvowpSgADJfe9MHwNk2mNruGRDPZwtxQSQn1LL8XRqUIL dExIIYl0OQWklTXVbQaIYHVnZpY3izlgKtRBpGYICKjpIdIwzIyEmZvXxBwBVByhDC+w wDzeNvuRcrH4rYxmNtLKQ+Tr2rHWdyVQ54Nyn6Dylix0ZoZIbQ0RIFdjkXen0fQ2vPnW +XWyqckYUON0qd0usC+QdY4cJ1/8cZQJ/i4pUDqW8iscqhBCpZlCQ9A84DcJwMRjzXLL ScmCmct9fje/hWKHIUWsgDAjJGSElpxH19IsCFA/pjJWADVkcdLUeIUPujBaAhkctOWt hEDg== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@gmail.com header.s=20210112 header.b=SMSezu5w; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) 
smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id s24-20020a1709064d9800b009314fd54ad1si31220154eju.703.2023.03.29.08.50.41; Wed, 29 Mar 2023 08:51:04 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@gmail.com header.s=20210112 header.b=SMSezu5w; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229994AbjC2PeG (ORCPT + 99 others); Wed, 29 Mar 2023 11:34:06 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:42738 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229954AbjC2Pdq (ORCPT ); Wed, 29 Mar 2023 11:33:46 -0400 Received: from mail-ed1-x532.google.com (mail-ed1-x532.google.com [IPv6:2a00:1450:4864:20::532]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id E568D59E1 for ; Wed, 29 Mar 2023 08:33:37 -0700 (PDT) Received: by mail-ed1-x532.google.com with SMTP id cn12so65002334edb.4 for ; Wed, 29 Mar 2023 08:33:37 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; t=1680104016; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=yyx1bJqCQCciR5OiAx/Y/6wLtU8qwBaslYTnmEUxQsk=; b=SMSezu5wGs255+AnzlJsEosRUdhjku7F3sOqbpyi4V0zvwd0K0xl3w9z+tU6PvSPz1 DCxwCBaKDMQXWNnB0BfUPmHHnNG2MbKrctu3XB/xrAXLGC/4uC9UcPm1M3qgEEh7njMd 
EsNeigSCnCCNbcwtbSsrZ5KEFzK9ACl3t/n11H0U8cTHTpkOYrIuR5FIAw/6a5VLqqPf 4iEt9Oa7rF9IxeAikX73ER6kWdKal3/D/ZcQkXAuiAUjailQyyc2ZH/0BYQbrBu/zKQD U7p1jW3S+TW/B/lnh+4bTySxCAfX+yynu7xDjumVBIsXuA9hS8bJaxthO7lfVT4ojTcg oDWg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; t=1680104016; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=yyx1bJqCQCciR5OiAx/Y/6wLtU8qwBaslYTnmEUxQsk=; b=SYurE3ud3WMWYOx4mvBTInse20U06XHjvhl4ab3msdaVjZiAv5jRISBCqKYUhvxwTH YXHiNL10jfhByq/G6sALuSsAPVtFgjboDglFfcGrYNRDYDMjX60YOA1mIt4I8IlHzlRk U2Nl68f8Tw/8JX2WhCYf263rPAFmQFDXhaGNyXGSWdAAtcXnYi0gc0EUfixWkKsHy9j5 TdmIqMVah1bPJk8rfrd2qbvOwFr5XL3xuFPXQSMX4cdQzv6Y0W2e0djxP62X1SfKjmhi U4wiHiqTXUG18LzyHxU18GV7kYjL9DiPs0ihMkeAO+Xr2cHeqHL7HAqltTb0WL3UjT3C arSg== X-Gm-Message-State: AAQBX9dv8hJnQ1DtYEJTVG7gH8G83fpDA7J7/uJ5N29qaGaR6baK0r3W 5WII6dPkPPw+Q27tvqMtXRWUp6xBpvAYisOg X-Received: by 2002:a05:6402:2693:b0:4bc:edde:150d with SMTP id w19-20020a056402269300b004bcedde150dmr3320429edd.0.1680104016066; Wed, 29 Mar 2023 08:33:36 -0700 (PDT) Received: from lelloman-5950.. (host-79-13-135-230.retail.telecomitalia.it. 
[79.13.135.230]) by smtp.gmail.com with ESMTPSA id g10-20020a17090670ca00b0093b8c0952e4sm10193735ejk.219.2023.03.29.08.33.34 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 29 Mar 2023 08:33:35 -0700 (PDT)
From: Domenico Cerasuolo
To: linux-kernel@vger.kernel.org
Cc: peterz@infradead.org, surenb@google.com, brauner@kernel.org, chris@chrisdown.name, hannes@cmpxchg.org, Domenico Cerasuolo
Subject: [PATCH v4 4/4] sched/psi: allow unprivileged polling of N*2s period
Date: Wed, 29 Mar 2023 17:33:27 +0200
Message-Id: <20230329153327.140215-5-cerasuolodomenico@gmail.com>
X-Mailer: git-send-email 2.34.1
In-Reply-To: <20230329153327.140215-1-cerasuolodomenico@gmail.com>
References: <20230329153327.140215-1-cerasuolodomenico@gmail.com>
MIME-Version: 1.0
X-Spam-Status: No, score=-0.2 required=5.0 tests=DKIM_SIGNED,DKIM_VALID, DKIM_VALID_AU,DKIM_VALID_EF,FREEMAIL_FROM,RCVD_IN_DNSWL_NONE, SPF_HELO_NONE,SPF_PASS autolearn=unavailable autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net
Precedence: bulk
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?=
X-GMAIL-THRID: =?utf-8?q?1761717848259897333?=
X-GMAIL-MSGID: =?utf-8?q?1761717848259897333?=

PSI offers two mechanisms to get information about pressure on a
specific resource. One is reading from /proc/pressure/, which gives
average pressures aggregated every 2s. The other is creating a pollable
fd for a specific resource and cgroup.

Creating a trigger requires CAP_SYS_RESOURCE and lets the caller pick a
specific time window and threshold, spawning an RT thread to aggregate
the data.

Systemd would like to give containers the option to monitor pressure on
their own cgroup and sub-cgroups. For example, if systemd launches a
container that itself then launches services, the container should have
the ability to poll() for pressure in individual services.
But neither the container nor the services are privileged.

This patch implements a mechanism that allows unprivileged users to
create pressure triggers. The difference from privileged trigger
creation is that unprivileged triggers must have a time window that is a
multiple of 2s. This avoids unrestricted spawning of RT threads and
instead reuses the aggregation mechanism already used for the averages,
which runs independently of any triggers.

Suggested-by: Johannes Weiner
Signed-off-by: Domenico Cerasuolo
Acked-by: Johannes Weiner
---
 Documentation/accounting/psi.rst |   4 ++
 include/linux/psi.h              |   2 +-
 include/linux/psi_types.h        |   7 ++
 kernel/cgroup/cgroup.c           |   2 +-
 kernel/sched/psi.c               | 120 +++++++++++++++++++------------
 5 files changed, 87 insertions(+), 48 deletions(-)

diff --git a/Documentation/accounting/psi.rst b/Documentation/accounting/psi.rst
index 5e40b3f437f9..df6062eb3abb 100644
--- a/Documentation/accounting/psi.rst
+++ b/Documentation/accounting/psi.rst
@@ -105,6 +105,10 @@ prevent overly frequent polling. Max limit is chosen as a high enough number
 after which monitors are most likely not needed and psi averages can be used
 instead.
 
+Unprivileged users can also create monitors, with the only limitation that the
+window size must be a multiple of 2s, in order to prevent excessive resource
+usage.
+
 When activated, psi monitor stays active for at least the duration of one
 tracking window to avoid repeated activations/deactivations when system is
 bouncing in and out of the stall state.
diff --git a/include/linux/psi.h b/include/linux/psi.h index b029a847def1..ab26200c2803 100644 --- a/include/linux/psi.h +++ b/include/linux/psi.h @@ -24,7 +24,7 @@ void psi_memstall_leave(unsigned long *flags); int psi_show(struct seq_file *s, struct psi_group *group, enum psi_res res); struct psi_trigger *psi_trigger_create(struct psi_group *group, - char *buf, enum psi_res res); + char *buf, enum psi_res res, struct file *file); void psi_trigger_destroy(struct psi_trigger *t); __poll_t psi_trigger_poll(void **trigger_ptr, struct file *file, diff --git a/include/linux/psi_types.h b/include/linux/psi_types.h index 1819afa8b198..bc998471976a 100644 --- a/include/linux/psi_types.h +++ b/include/linux/psi_types.h @@ -151,6 +151,9 @@ struct psi_trigger { /* Deferred event(s) from previous ratelimit window */ bool pending_event; + + /* Used to differentiate destruction action*/ + enum psi_aggregators aggregator; }; struct psi_group { @@ -171,6 +174,10 @@ struct psi_group { /* Aggregator work control */ struct delayed_work avgs_work; + /* Unprivileged triggers against N*PSI_FREQ windows */ + struct list_head avg_triggers; + u32 avg_nr_triggers[NR_PSI_STATES - 1]; + /* Total stall times and sampled pressure averages */ u64 total[NR_PSI_AGGREGATORS][NR_PSI_STATES - 1]; unsigned long avg[NR_PSI_STATES - 1][3]; diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c index 935e8121b21e..dead36969bba 100644 --- a/kernel/cgroup/cgroup.c +++ b/kernel/cgroup/cgroup.c @@ -3761,7 +3761,7 @@ static ssize_t pressure_write(struct kernfs_open_file *of, char *buf, } psi = cgroup_psi(cgrp); - new = psi_trigger_create(psi, buf, res); + new = psi_trigger_create(psi, buf, res, of->file); if (IS_ERR(new)) { cgroup_put(cgrp); return PTR_ERR(new); diff --git a/kernel/sched/psi.c b/kernel/sched/psi.c index f3df6a8ff493..b36c3b71e8f2 100644 --- a/kernel/sched/psi.c +++ b/kernel/sched/psi.c @@ -186,9 +186,9 @@ static void group_init(struct psi_group *group) 
seqcount_init(&per_cpu_ptr(group->pcpu, cpu)->seq); group->avg_last_update = sched_clock(); group->avg_next_update = group->avg_last_update + psi_period; - INIT_DELAYED_WORK(&group->avgs_work, psi_avgs_work); mutex_init(&group->avgs_lock); - /* Init trigger-related members */ + + /* Init rtpoll trigger-related members */ atomic_set(&group->rtpoll_scheduled, 0); mutex_init(&group->rtpoll_trigger_lock); INIT_LIST_HEAD(&group->rtpoll_triggers); @@ -197,6 +197,11 @@ static void group_init(struct psi_group *group) init_waitqueue_head(&group->rtpoll_wait); timer_setup(&group->rtpoll_timer, poll_timer_fn, 0); rcu_assign_pointer(group->rtpoll_task, NULL); + + /* Init avg trigger-related members */ + INIT_LIST_HEAD(&group->avg_triggers); + memset(group->avg_nr_triggers, 0, sizeof(group->avg_nr_triggers)); + INIT_DELAYED_WORK(&group->avgs_work, psi_avgs_work); } void __init psi_init(void) @@ -430,21 +435,25 @@ static u64 window_update(struct psi_window *win, u64 now, u64 value) return growth; } -static u64 update_triggers(struct psi_group *group, u64 now, bool *update_total) +static u64 update_triggers(struct psi_group *group, u64 now, bool *update_total, + enum psi_aggregators aggregator) { struct psi_trigger *t; - u64 *total = group->total[PSI_POLL]; + u64 *total = group->total[aggregator]; + struct list_head *triggers = aggregator == PSI_AVGS ? &group->avg_triggers + : &group->rtpoll_triggers; + u64 *aggregator_total = aggregator == PSI_AVGS ? group->avg_total : group->rtpoll_total; *update_total = false; /* * On subsequent updates, calculate growth deltas and let * watchers know when their specified thresholds are exceeded. 
*/ - list_for_each_entry(t, &group->rtpoll_triggers, node) { + list_for_each_entry(t, triggers, node) { u64 growth; bool new_stall; - new_stall = group->rtpoll_total[t->state] != total[t->state]; + new_stall = aggregator_total[t->state] != total[t->state]; /* Check for stall activity or a previous threshold breach */ if (!new_stall && !t->pending_event) @@ -546,6 +555,7 @@ static void psi_avgs_work(struct work_struct *work) struct delayed_work *dwork; struct psi_group *group; u32 changed_states; + bool update_total; u64 now; dwork = to_delayed_work(work); @@ -563,8 +573,10 @@ static void psi_avgs_work(struct work_struct *work) * Once restarted, we'll catch up the running averages in one * go - see calc_avgs() and missed_periods. */ - if (now >= group->avg_next_update) + if (now >= group->avg_next_update) { + update_triggers(group, now, &update_total, PSI_AVGS); group->avg_next_update = update_averages(group, now); + } if (changed_states & PSI_STATE_RESCHEDULE) { schedule_delayed_work(dwork, nsecs_to_jiffies( @@ -574,7 +586,7 @@ static void psi_avgs_work(struct work_struct *work) mutex_unlock(&group->avgs_lock); } -static void init_triggers(struct psi_group *group, u64 now) +static void init_rtpoll_triggers(struct psi_group *group, u64 now) { struct psi_trigger *t; @@ -667,7 +679,7 @@ static void psi_rtpoll_work(struct psi_group *group) if (changed_states & group->rtpoll_states) { /* Initialize trigger windows when entering polling mode */ if (now > group->rtpoll_until) - init_triggers(group, now); + init_rtpoll_triggers(group, now); /* * Keep the monitor active for at least the duration of the @@ -684,7 +696,7 @@ static void psi_rtpoll_work(struct psi_group *group) } if (now >= group->rtpoll_next_update) { - group->rtpoll_next_update = update_triggers(group, now, &update_total); + group->rtpoll_next_update = update_triggers(group, now, &update_total, PSI_POLL); if (update_total) memcpy(group->rtpoll_total, group->total[PSI_POLL], sizeof(group->rtpoll_total)); @@ 
-1254,16 +1266,23 @@ int psi_show(struct seq_file *m, struct psi_group *group, enum psi_res res) } struct psi_trigger *psi_trigger_create(struct psi_group *group, - char *buf, enum psi_res res) + char *buf, enum psi_res res, struct file *file) { struct psi_trigger *t; enum psi_states state; u32 threshold_us; + bool privileged; u32 window_us; if (static_branch_likely(&psi_disabled)) return ERR_PTR(-EOPNOTSUPP); + /* + * Checking the privilege here on file->f_cred implies that a privileged user + * could open the file and delegate the write to an unprivileged one. + */ + privileged = cap_raised(file->f_cred->cap_effective, CAP_SYS_RESOURCE); + if (sscanf(buf, "some %u %u", &threshold_us, &window_us) == 2) state = PSI_IO_SOME + res * 2; else if (sscanf(buf, "full %u %u", &threshold_us, &window_us) == 2) @@ -1283,6 +1302,13 @@ struct psi_trigger *psi_trigger_create(struct psi_group *group, window_us > WINDOW_MAX_US) return ERR_PTR(-EINVAL); + /* + * Unprivileged users can only use 2s windows so that averages aggregation + * work is used, and no RT threads need to be spawned. + */ + if (!privileged && window_us % 2000000) + return ERR_PTR(-EINVAL); + /* Check threshold */ if (threshold_us == 0 || threshold_us > window_us) return ERR_PTR(-EINVAL); @@ -1302,10 +1328,11 @@ struct psi_trigger *psi_trigger_create(struct psi_group *group, t->last_event_time = 0; init_waitqueue_head(&t->event_wait); t->pending_event = false; + t->aggregator = privileged ? 
PSI_POLL : PSI_AVGS; mutex_lock(&group->rtpoll_trigger_lock); - if (!rcu_access_pointer(group->rtpoll_task)) { + if (privileged && !rcu_access_pointer(group->rtpoll_task)) { struct task_struct *task; task = kthread_create(psi_rtpoll_worker, group, "psimon"); @@ -1319,11 +1346,16 @@ struct psi_trigger *psi_trigger_create(struct psi_group *group, rcu_assign_pointer(group->rtpoll_task, task); } - list_add(&t->node, &group->rtpoll_triggers); - group->rtpoll_min_period = min(group->rtpoll_min_period, - div_u64(t->win.size, UPDATES_PER_WINDOW)); - group->rtpoll_nr_triggers[t->state]++; - group->rtpoll_states |= (1 << t->state); + if (privileged) { + list_add(&t->node, &group->rtpoll_triggers); + group->rtpoll_min_period = min(group->rtpoll_min_period, + div_u64(t->win.size, UPDATES_PER_WINDOW)); + group->rtpoll_nr_triggers[t->state]++; + group->rtpoll_states |= (1 << t->state); + } else { + list_add(&t->node, &group->avg_triggers); + group->avg_nr_triggers[t->state]++; + } mutex_unlock(&group->rtpoll_trigger_lock); @@ -1357,22 +1389,26 @@ void psi_trigger_destroy(struct psi_trigger *t) u64 period = ULLONG_MAX; list_del(&t->node); - group->rtpoll_nr_triggers[t->state]--; - if (!group->rtpoll_nr_triggers[t->state]) - group->rtpoll_states &= ~(1 << t->state); - /* reset min update period for the remaining triggers */ - list_for_each_entry(tmp, &group->rtpoll_triggers, node) - period = min(period, div_u64(tmp->win.size, - UPDATES_PER_WINDOW)); - group->rtpoll_min_period = period; - /* Destroy rtpoll_task when the last trigger is destroyed */ - if (group->rtpoll_states == 0) { - group->rtpoll_until = 0; - task_to_destroy = rcu_dereference_protected( - group->rtpoll_task, - lockdep_is_held(&group->rtpoll_trigger_lock)); - rcu_assign_pointer(group->rtpoll_task, NULL); - del_timer(&group->rtpoll_timer); + if (t->aggregator == PSI_AVGS) { + group->avg_nr_triggers[t->state]--; + } else { + group->rtpoll_nr_triggers[t->state]--; + if (!group->rtpoll_nr_triggers[t->state]) + 
group->rtpoll_states &= ~(1 << t->state); + /* reset min update period for the remaining triggers */ + list_for_each_entry(tmp, &group->rtpoll_triggers, node) + period = min(period, div_u64(tmp->win.size, + UPDATES_PER_WINDOW)); + group->rtpoll_min_period = period; + /* Destroy rtpoll_task when the last trigger is destroyed */ + if (group->rtpoll_states == 0) { + group->rtpoll_until = 0; + task_to_destroy = rcu_dereference_protected( + group->rtpoll_task, + lockdep_is_held(&group->rtpoll_trigger_lock)); + rcu_assign_pointer(group->rtpoll_task, NULL); + del_timer(&group->rtpoll_timer); + } } } @@ -1437,27 +1473,19 @@ static int psi_cpu_show(struct seq_file *m, void *v) return psi_show(m, &psi_system, PSI_CPU); } -static int psi_open(struct file *file, int (*psi_show)(struct seq_file *, void *)) -{ - if (file->f_mode & FMODE_WRITE && !capable(CAP_SYS_RESOURCE)) - return -EPERM; - - return single_open(file, psi_show, NULL); -} - static int psi_io_open(struct inode *inode, struct file *file) { - return psi_open(file, psi_io_show); + return single_open(file, psi_io_show, NULL); } static int psi_memory_open(struct inode *inode, struct file *file) { - return psi_open(file, psi_memory_show); + return single_open(file, psi_memory_show, NULL); } static int psi_cpu_open(struct inode *inode, struct file *file) { - return psi_open(file, psi_cpu_show); + return single_open(file, psi_cpu_show, NULL); } static ssize_t psi_write(struct file *file, const char __user *user_buf, @@ -1491,7 +1519,7 @@ static ssize_t psi_write(struct file *file, const char __user *user_buf, return -EBUSY; } - new = psi_trigger_create(&psi_system, buf, res); + new = psi_trigger_create(&psi_system, buf, res, file); if (IS_ERR(new)) { mutex_unlock(&seq->lock); return PTR_ERR(new); @@ -1571,7 +1599,7 @@ static int psi_irq_show(struct seq_file *m, void *v) static int psi_irq_open(struct inode *inode, struct file *file) { - return psi_open(file, psi_irq_show); + return single_open(file, psi_irq_show, NULL); } 
static ssize_t psi_irq_write(struct file *file, const char __user *user_buf,