From patchwork Thu Mar 9 17:07:53 2023
X-Patchwork-Submitter: Domenico Cerasuolo
X-Patchwork-Id: 67060
From: Domenico Cerasuolo
To: linux-kernel@vger.kernel.org
Cc: peterz@infradead.org, surenb@google.com, brauner@kernel.org,
    chris@chrisdown.name, hannes@cmpxchg.org, Domenico Cerasuolo
Subject: [PATCH 1/4] sched/psi: rearrange polling code in preparation
Date: Thu, 9 Mar 2023 18:07:53 +0100
Message-Id: <20230309170756.52927-2-cerasuolodomenico@gmail.com>
In-Reply-To: <20230309170756.52927-1-cerasuolodomenico@gmail.com>
References: <20230309170756.52927-1-cerasuolodomenico@gmail.com>

Move a few functions up in the file to avoid the forward declarations that
would otherwise be needed by the patch implementing unprivileged PSI
triggers.

Suggested-by: Johannes Weiner
Signed-off-by: Domenico Cerasuolo
---
 kernel/sched/psi.c | 196 ++++++++++++++++++++++-----------------------
 1 file changed, 98 insertions(+), 98 deletions(-)

diff --git a/kernel/sched/psi.c b/kernel/sched/psi.c
index 02e011cabe91..fe9269f1d2a4 100644
--- a/kernel/sched/psi.c
+++ b/kernel/sched/psi.c
@@ -384,92 +384,6 @@ static void collect_percpu_times(struct psi_group *group,
 	*pchanged_states = changed_states;
 }
 
-static u64 update_averages(struct psi_group *group, u64 now)
-{
-	unsigned long missed_periods = 0;
-	u64 expires, period;
-	u64 avg_next_update;
-	int s;
-
-	/* avgX= */
-	expires = group->avg_next_update;
-	if (now - expires >= psi_period)
-		missed_periods = div_u64(now - expires, psi_period);
-
-	/*
-	 * The periodic clock tick can get delayed for various
-	 * reasons, especially on loaded systems. To avoid clock
-	 * drift, we schedule the clock in fixed psi_period intervals.
-	 * But the deltas we sample out of the per-cpu buckets above
-	 * are based on the actual time elapsing between clock ticks.
-	 */
-	avg_next_update = expires + ((1 + missed_periods) * psi_period);
-	period = now - (group->avg_last_update + (missed_periods * psi_period));
-	group->avg_last_update = now;
-
-	for (s = 0; s < NR_PSI_STATES - 1; s++) {
-		u32 sample;
-
-		sample = group->total[PSI_AVGS][s] - group->avg_total[s];
-		/*
-		 * Due to the lockless sampling of the time buckets,
-		 * recorded time deltas can slip into the next period,
-		 * which under full pressure can result in samples in
-		 * excess of the period length.
-		 *
-		 * We don't want to report non-sensical pressures in
-		 * excess of 100%, nor do we want to drop such events
-		 * on the floor. Instead we punt any overage into the
-		 * future until pressure subsides. By doing this we
-		 * don't underreport the occurring pressure curve, we
-		 * just report it delayed by one period length.
-		 *
-		 * The error isn't cumulative. As soon as another
-		 * delta slips from a period P to P+1, by definition
-		 * it frees up its time T in P.
-		 */
-		if (sample > period)
-			sample = period;
-		group->avg_total[s] += sample;
-		calc_avgs(group->avg[s], missed_periods, sample, period);
-	}
-
-	return avg_next_update;
-}
-
-static void psi_avgs_work(struct work_struct *work)
-{
-	struct delayed_work *dwork;
-	struct psi_group *group;
-	u32 changed_states;
-	u64 now;
-
-	dwork = to_delayed_work(work);
-	group = container_of(dwork, struct psi_group, avgs_work);
-
-	mutex_lock(&group->avgs_lock);
-
-	now = sched_clock();
-
-	collect_percpu_times(group, PSI_AVGS, &changed_states);
-	/*
-	 * If there is task activity, periodically fold the per-cpu
-	 * times and feed samples into the running averages. If things
-	 * are idle and there is no data to process, stop the clock.
-	 * Once restarted, we'll catch up the running averages in one
-	 * go - see calc_avgs() and missed_periods.
-	 */
-	if (now >= group->avg_next_update)
-		group->avg_next_update = update_averages(group, now);
-
-	if (changed_states & PSI_STATE_RESCHEDULE) {
-		schedule_delayed_work(dwork, nsecs_to_jiffies(
-				group->avg_next_update - now) + 1);
-	}
-
-	mutex_unlock(&group->avgs_lock);
-}
-
 /* Trigger tracking window manipulations */
 static void window_reset(struct psi_window *win, u64 now,
 			 u64 value, u64 prev_growth)
@@ -516,18 +430,6 @@ static u64 window_update(struct psi_window *win, u64 now, u64 value)
 	return growth;
 }
 
-static void init_triggers(struct psi_group *group, u64 now)
-{
-	struct psi_trigger *t;
-
-	list_for_each_entry(t, &group->triggers, node)
-		window_reset(&t->win, now,
-				group->total[PSI_POLL][t->state], 0);
-	memcpy(group->polling_total, group->total[PSI_POLL],
-		   sizeof(group->polling_total));
-	group->polling_next_update = now + group->poll_min_period;
-}
-
 static u64 update_triggers(struct psi_group *group, u64 now)
 {
 	struct psi_trigger *t;
@@ -590,6 +492,104 @@ static u64 update_triggers(struct psi_group *group, u64 now)
 	return now + group->poll_min_period;
 }
 
+static u64 update_averages(struct psi_group *group, u64 now)
+{
+	unsigned long missed_periods = 0;
+	u64 expires, period;
+	u64 avg_next_update;
+	int s;
+
+	/* avgX= */
+	expires = group->avg_next_update;
+	if (now - expires >= psi_period)
+		missed_periods = div_u64(now - expires, psi_period);
+
+	/*
+	 * The periodic clock tick can get delayed for various
+	 * reasons, especially on loaded systems. To avoid clock
+	 * drift, we schedule the clock in fixed psi_period intervals.
+	 * But the deltas we sample out of the per-cpu buckets above
+	 * are based on the actual time elapsing between clock ticks.
+	 */
+	avg_next_update = expires + ((1 + missed_periods) * psi_period);
+	period = now - (group->avg_last_update + (missed_periods * psi_period));
+	group->avg_last_update = now;
+
+	for (s = 0; s < NR_PSI_STATES - 1; s++) {
+		u32 sample;
+
+		sample = group->total[PSI_AVGS][s] - group->avg_total[s];
+		/*
+		 * Due to the lockless sampling of the time buckets,
+		 * recorded time deltas can slip into the next period,
+		 * which under full pressure can result in samples in
+		 * excess of the period length.
+		 *
+		 * We don't want to report non-sensical pressures in
+		 * excess of 100%, nor do we want to drop such events
+		 * on the floor. Instead we punt any overage into the
+		 * future until pressure subsides. By doing this we
+		 * don't underreport the occurring pressure curve, we
+		 * just report it delayed by one period length.
+		 *
+		 * The error isn't cumulative. As soon as another
+		 * delta slips from a period P to P+1, by definition
+		 * it frees up its time T in P.
+		 */
+		if (sample > period)
+			sample = period;
+		group->avg_total[s] += sample;
+		calc_avgs(group->avg[s], missed_periods, sample, period);
+	}
+
+	return avg_next_update;
+}
+
+static void psi_avgs_work(struct work_struct *work)
+{
+	struct delayed_work *dwork;
+	struct psi_group *group;
+	u32 changed_states;
+	u64 now;
+
+	dwork = to_delayed_work(work);
+	group = container_of(dwork, struct psi_group, avgs_work);
+
+	mutex_lock(&group->avgs_lock);
+
+	now = sched_clock();
+
+	collect_percpu_times(group, PSI_AVGS, &changed_states);
+	/*
+	 * If there is task activity, periodically fold the per-cpu
+	 * times and feed samples into the running averages. If things
+	 * are idle and there is no data to process, stop the clock.
+	 * Once restarted, we'll catch up the running averages in one
+	 * go - see calc_avgs() and missed_periods.
+	 */
+	if (now >= group->avg_next_update)
+		group->avg_next_update = update_averages(group, now);
+
+	if (changed_states & PSI_STATE_RESCHEDULE) {
+		schedule_delayed_work(dwork, nsecs_to_jiffies(
+				group->avg_next_update - now) + 1);
+	}
+
+	mutex_unlock(&group->avgs_lock);
+}
+
+static void init_triggers(struct psi_group *group, u64 now)
+{
+	struct psi_trigger *t;
+
+	list_for_each_entry(t, &group->triggers, node)
+		window_reset(&t->win, now,
+				group->total[PSI_POLL][t->state], 0);
+	memcpy(group->polling_total, group->total[PSI_POLL],
+		   sizeof(group->polling_total));
+	group->polling_next_update = now + group->poll_min_period;
+}
+
 /* Schedule polling if it's not already scheduled or forced. */
 static void psi_schedule_poll_work(struct psi_group *group, unsigned long delay,
 				   bool force)

From patchwork Thu Mar 9 17:07:54 2023
X-Patchwork-Submitter: Domenico Cerasuolo
X-Patchwork-Id: 67058
From: Domenico Cerasuolo
To: linux-kernel@vger.kernel.org
Cc: peterz@infradead.org, surenb@google.com, brauner@kernel.org,
    chris@chrisdown.name, hannes@cmpxchg.org, Domenico Cerasuolo
Subject: [PATCH 2/4] sched/psi: rename existing poll members in preparation
Date: Thu, 9 Mar 2023 18:07:54 +0100
Message-Id: <20230309170756.52927-3-cerasuolodomenico@gmail.com>
In-Reply-To: <20230309170756.52927-1-cerasuolodomenico@gmail.com>
References: <20230309170756.52927-1-cerasuolodomenico@gmail.com>

Rename the existing polling members in the PSI implementation to make a
clear distinction between the privileged and the unprivileged trigger code
to be implemented in the next patch.

Suggested-by: Johannes Weiner
Signed-off-by: Domenico Cerasuolo
---
 include/linux/psi_types.h |  36 ++++-----
 kernel/sched/psi.c        | 163 +++++++++++++++++++-------------------
 2 files changed, 100 insertions(+), 99 deletions(-)

diff --git a/include/linux/psi_types.h b/include/linux/psi_types.h
index 1e0a0d7ace3a..1819afa8b198 100644
--- a/include/linux/psi_types.h
+++ b/include/linux/psi_types.h
@@ -175,26 +175,26 @@ struct psi_group {
 	u64 total[NR_PSI_AGGREGATORS][NR_PSI_STATES - 1];
 	unsigned long avg[NR_PSI_STATES - 1][3];
 
-	/* Monitor work control */
-	struct task_struct __rcu *poll_task;
-	struct timer_list poll_timer;
-	wait_queue_head_t poll_wait;
-	atomic_t poll_wakeup;
-	atomic_t poll_scheduled;
+	/* Monitor RT polling work control */
+	struct task_struct __rcu *rtpoll_task;
+	struct timer_list rtpoll_timer;
+	wait_queue_head_t rtpoll_wait;
+	atomic_t rtpoll_wakeup;
+	atomic_t rtpoll_scheduled;
 
 	/* Protects data used by the monitor */
-	struct mutex trigger_lock;
-
-	/* Configured polling triggers */
-	struct list_head triggers;
-	u32 nr_triggers[NR_PSI_STATES - 1];
-	u32 poll_states;
-	u64 poll_min_period;
-
-	/* Total stall times at the start of monitor activation */
-	u64 polling_total[NR_PSI_STATES - 1];
-	u64 polling_next_update;
-	u64 polling_until;
+	struct mutex rtpoll_trigger_lock;
+
+	/* Configured RT polling triggers */
+	struct list_head rtpoll_triggers;
+	u32 rtpoll_nr_triggers[NR_PSI_STATES - 1];
+	u32 rtpoll_states;
+	u64 rtpoll_min_period;
+
+	/* Total stall times at the start of RT polling monitor activation */
+	u64 rtpoll_total[NR_PSI_STATES - 1];
+	u64 rtpoll_next_update;
+	u64 rtpoll_until;
 };
 
 #else /* CONFIG_PSI */
diff --git a/kernel/sched/psi.c b/kernel/sched/psi.c
index fe9269f1d2a4..a3d0b5cf797a 100644
--- a/kernel/sched/psi.c
+++ b/kernel/sched/psi.c
@@ -189,14 +189,14 @@ static void group_init(struct psi_group *group)
 	INIT_DELAYED_WORK(&group->avgs_work, psi_avgs_work);
 	mutex_init(&group->avgs_lock);
 	/* Init trigger-related members */
-	atomic_set(&group->poll_scheduled, 0);
-	mutex_init(&group->trigger_lock);
-	INIT_LIST_HEAD(&group->triggers);
-	group->poll_min_period = U32_MAX;
-	group->polling_next_update = ULLONG_MAX;
-	init_waitqueue_head(&group->poll_wait);
-	timer_setup(&group->poll_timer, poll_timer_fn, 0);
-	rcu_assign_pointer(group->poll_task, NULL);
+	atomic_set(&group->rtpoll_scheduled, 0);
+	mutex_init(&group->rtpoll_trigger_lock);
+	INIT_LIST_HEAD(&group->rtpoll_triggers);
+	group->rtpoll_min_period = U32_MAX;
+	group->rtpoll_next_update = ULLONG_MAX;
+	init_waitqueue_head(&group->rtpoll_wait);
+	timer_setup(&group->rtpoll_timer, poll_timer_fn, 0);
+	rcu_assign_pointer(group->rtpoll_task, NULL);
 }
 
 void __init psi_init(void)
@@ -440,11 +440,11 @@ static u64 update_triggers(struct psi_group *group, u64 now)
 	 * On subsequent updates, calculate growth deltas and let
 	 * watchers know when their specified thresholds are exceeded.
 	 */
-	list_for_each_entry(t, &group->triggers, node) {
+	list_for_each_entry(t, &group->rtpoll_triggers, node) {
 		u64 growth;
 		bool new_stall;
 
-		new_stall = group->polling_total[t->state] != total[t->state];
+		new_stall = group->rtpoll_total[t->state] != total[t->state];
 
 		/* Check for stall activity or a previous threshold breach */
 		if (!new_stall && !t->pending_event)
@@ -486,10 +486,10 @@ static u64 update_triggers(struct psi_group *group, u64 now)
 	}
 
 	if (update_total)
-		memcpy(group->polling_total, total,
-			sizeof(group->polling_total));
+		memcpy(group->rtpoll_total, total,
+			sizeof(group->rtpoll_total));
 
-	return now + group->poll_min_period;
+	return now + group->rtpoll_min_period;
 }
 
 static u64 update_averages(struct psi_group *group, u64 now)
@@ -582,53 +582,53 @@ static void init_triggers(struct psi_group *group, u64 now)
 {
 	struct psi_trigger *t;
 
-	list_for_each_entry(t, &group->triggers, node)
+	list_for_each_entry(t, &group->rtpoll_triggers, node)
 		window_reset(&t->win, now,
 				group->total[PSI_POLL][t->state], 0);
-	memcpy(group->polling_total, group->total[PSI_POLL],
-		   sizeof(group->polling_total));
-	group->polling_next_update = now + group->poll_min_period;
+	memcpy(group->rtpoll_total, group->total[PSI_POLL],
+		   sizeof(group->rtpoll_total));
+	group->rtpoll_next_update = now + group->rtpoll_min_period;
 }
 
 /* Schedule polling if it's not already scheduled or forced. */
-static void psi_schedule_poll_work(struct psi_group *group, unsigned long delay,
+static void psi_schedule_rtpoll_work(struct psi_group *group, unsigned long delay,
 				   bool force)
 {
 	struct task_struct *task;
 
 	/*
 	 * atomic_xchg should be called even when !force to provide a
-	 * full memory barrier (see the comment inside psi_poll_work).
+	 * full memory barrier (see the comment inside psi_rtpoll_work).
 	 */
-	if (atomic_xchg(&group->poll_scheduled, 1) && !force)
+	if (atomic_xchg(&group->rtpoll_scheduled, 1) && !force)
 		return;
 
 	rcu_read_lock();
 
-	task = rcu_dereference(group->poll_task);
+	task = rcu_dereference(group->rtpoll_task);
 	/*
 	 * kworker might be NULL in case psi_trigger_destroy races with
 	 * psi_task_change (hotpath) which can't use locks
 	 */
 	if (likely(task))
-		mod_timer(&group->poll_timer, jiffies + delay);
+		mod_timer(&group->rtpoll_timer, jiffies + delay);
 	else
-		atomic_set(&group->poll_scheduled, 0);
+		atomic_set(&group->rtpoll_scheduled, 0);
 
 	rcu_read_unlock();
 }
 
-static void psi_poll_work(struct psi_group *group)
+static void psi_rtpoll_work(struct psi_group *group)
 {
 	bool force_reschedule = false;
 	u32 changed_states;
 	u64 now;
 
-	mutex_lock(&group->trigger_lock);
+	mutex_lock(&group->rtpoll_trigger_lock);
 
 	now = sched_clock();
 
-	if (now > group->polling_until) {
+	if (now > group->rtpoll_until) {
 		/*
 		 * We are either about to start or might stop polling if no
 		 * state change was recorded. Resetting poll_scheduled leaves
@@ -638,7 +638,7 @@ static void psi_poll_work(struct psi_group *group)
 		 * should be negligible and polling_next_update still keeps
 		 * updates correctly on schedule.
 		 */
-		atomic_set(&group->poll_scheduled, 0);
+		atomic_set(&group->rtpoll_scheduled, 0);
 		/*
 		 * A task change can race with the poll worker that is supposed to
 		 * report on it. To avoid missing events, ensure ordering between
@@ -667,9 +667,9 @@ static void psi_poll_work(struct psi_group *group)
 
 	collect_percpu_times(group, PSI_POLL, &changed_states);
 
-	if (changed_states & group->poll_states) {
+	if (changed_states & group->rtpoll_states) {
 		/* Initialize trigger windows when entering polling mode */
-		if (now > group->polling_until)
+		if (now > group->rtpoll_until)
 			init_triggers(group, now);
 
 		/*
@@ -677,50 +677,50 @@ static void psi_poll_work(struct psi_group *group)
 		 * Keep the monitor active for at least the duration of the
 		 * minimum tracking window as long as monitor states are
 		 * changing.
 		 */
-		group->polling_until = now +
-			group->poll_min_period * UPDATES_PER_WINDOW;
+		group->rtpoll_until = now +
+			group->rtpoll_min_period * UPDATES_PER_WINDOW;
 	}
 
-	if (now > group->polling_until) {
-		group->polling_next_update = ULLONG_MAX;
+	if (now > group->rtpoll_until) {
+		group->rtpoll_next_update = ULLONG_MAX;
 		goto out;
 	}
 
-	if (now >= group->polling_next_update)
-		group->polling_next_update = update_triggers(group, now);
+	if (now >= group->rtpoll_next_update)
+		group->rtpoll_next_update = update_triggers(group, now);
 
-	psi_schedule_poll_work(group,
-		nsecs_to_jiffies(group->polling_next_update - now) + 1,
+	psi_schedule_rtpoll_work(group,
+		nsecs_to_jiffies(group->rtpoll_next_update - now) + 1,
 		force_reschedule);
 
 out:
-	mutex_unlock(&group->trigger_lock);
+	mutex_unlock(&group->rtpoll_trigger_lock);
 }
 
-static int psi_poll_worker(void *data)
+static int psi_rtpoll_worker(void *data)
 {
 	struct psi_group *group = (struct psi_group *)data;
 
 	sched_set_fifo_low(current);
 
 	while (true) {
-		wait_event_interruptible(group->poll_wait,
-				atomic_cmpxchg(&group->poll_wakeup, 1, 0) ||
+		wait_event_interruptible(group->rtpoll_wait,
+				atomic_cmpxchg(&group->rtpoll_wakeup, 1, 0) ||
 				kthread_should_stop());
 		if (kthread_should_stop())
 			break;
 
-		psi_poll_work(group);
+		psi_rtpoll_work(group);
 	}
 	return 0;
 }
 
 static void poll_timer_fn(struct timer_list *t)
 {
-	struct psi_group *group = from_timer(group, t, poll_timer);
+	struct psi_group *group = from_timer(group, t, rtpoll_timer);
 
-	atomic_set(&group->poll_wakeup, 1);
-	wake_up_interruptible(&group->poll_wait);
+	atomic_set(&group->rtpoll_wakeup, 1);
+	wake_up_interruptible(&group->rtpoll_wait);
 }
 
 static void record_times(struct psi_group_cpu *groupc, u64 now)
@@ -851,8 +851,8 @@ static void psi_group_change(struct psi_group *group, int cpu,
 
 	write_seqcount_end(&groupc->seq);
 
-	if (state_mask & group->poll_states)
-		psi_schedule_poll_work(group, 1, false);
+	if (state_mask & group->rtpoll_states)
+		psi_schedule_rtpoll_work(group, 1, false);
 
 	if (wake_clock && !delayed_work_pending(&group->avgs_work))
 		schedule_delayed_work(&group->avgs_work, PSI_FREQ);
@@ -1005,8 +1005,8 @@ void psi_account_irqtime(struct task_struct *task, u32 delta)
 
 		write_seqcount_end(&groupc->seq);
 
-		if (group->poll_states & (1 << PSI_IRQ_FULL))
-			psi_schedule_poll_work(group, 1, false);
+		if (group->rtpoll_states & (1 << PSI_IRQ_FULL))
+			psi_schedule_rtpoll_work(group, 1, false);
 	} while ((group = group->parent));
 }
 #endif
@@ -1101,7 +1101,7 @@ void psi_cgroup_free(struct cgroup *cgroup)
 	cancel_delayed_work_sync(&cgroup->psi->avgs_work);
 	free_percpu(cgroup->psi->pcpu);
 	/* All triggers must be removed by now */
-	WARN_ONCE(cgroup->psi->poll_states, "psi: trigger leak\n");
+	WARN_ONCE(cgroup->psi->rtpoll_states, "psi: trigger leak\n");
 	kfree(cgroup->psi);
 }
 
@@ -1302,29 +1302,29 @@ struct psi_trigger *psi_trigger_create(struct psi_group *group,
 	init_waitqueue_head(&t->event_wait);
 	t->pending_event = false;
 
-	mutex_lock(&group->trigger_lock);
+	mutex_lock(&group->rtpoll_trigger_lock);
 
-	if (!rcu_access_pointer(group->poll_task)) {
+	if (!rcu_access_pointer(group->rtpoll_task)) {
 		struct task_struct *task;
 
-		task = kthread_create(psi_poll_worker, group, "psimon");
+		task = kthread_create(psi_rtpoll_worker, group, "psimon");
 		if (IS_ERR(task)) {
 			kfree(t);
-			mutex_unlock(&group->trigger_lock);
+			mutex_unlock(&group->rtpoll_trigger_lock);
 			return ERR_CAST(task);
 		}
-		atomic_set(&group->poll_wakeup, 0);
+		atomic_set(&group->rtpoll_wakeup, 0);
 		wake_up_process(task);
-		rcu_assign_pointer(group->poll_task, task);
+		rcu_assign_pointer(group->rtpoll_task, task);
 	}
 
-	list_add(&t->node, &group->triggers);
-	group->poll_min_period = min(group->poll_min_period,
+	list_add(&t->node, &group->rtpoll_triggers);
+	group->rtpoll_min_period = min(group->rtpoll_min_period,
 		div_u64(t->win.size, UPDATES_PER_WINDOW));
-	group->nr_triggers[t->state]++;
-	group->poll_states |= (1 << t->state);
+	group->rtpoll_nr_triggers[t->state]++;
+	group->rtpoll_states |= (1 << t->state);
 
-	mutex_unlock(&group->trigger_lock);
+	mutex_unlock(&group->rtpoll_trigger_lock);
 
 	return t;
 }
@@ -1349,51 +1349,52 @@ void psi_trigger_destroy(struct psi_trigger *t)
 	 */
 	wake_up_pollfree(&t->event_wait);
 
-	mutex_lock(&group->trigger_lock);
+	mutex_lock(&group->rtpoll_trigger_lock);
 
 	if (!list_empty(&t->node)) {
 		struct psi_trigger *tmp;
 		u64 period = ULLONG_MAX;
 
 		list_del(&t->node);
-		group->nr_triggers[t->state]--;
-		if (!group->nr_triggers[t->state])
-			group->poll_states &= ~(1 << t->state);
+		group->rtpoll_nr_triggers[t->state]--;
+		if (!group->rtpoll_nr_triggers[t->state])
+			group->rtpoll_states &= ~(1 << t->state);
 		/* reset min update period for the remaining triggers */
-		list_for_each_entry(tmp, &group->triggers, node)
+		list_for_each_entry(tmp, &group->rtpoll_triggers, node)
 			period = min(period, div_u64(tmp->win.size,
 					UPDATES_PER_WINDOW));
-		group->poll_min_period = period;
-		/* Destroy poll_task when the last trigger is destroyed */
-		if (group->poll_states == 0) {
-			group->polling_until = 0;
+		group->rtpoll_min_period = period;
+		/* Destroy rtpoll_task when the last trigger is destroyed */
+		if (group->rtpoll_states == 0) {
+			group->rtpoll_until = 0;
 			task_to_destroy = rcu_dereference_protected(
-					group->poll_task,
-					lockdep_is_held(&group->trigger_lock));
-			rcu_assign_pointer(group->poll_task, NULL);
-			del_timer(&group->poll_timer);
+					group->rtpoll_task,
+					lockdep_is_held(&group->rtpoll_trigger_lock));
+			rcu_assign_pointer(group->rtpoll_task, NULL);
+			del_timer(&group->rtpoll_timer);
 		}
 	}
 
-	mutex_unlock(&group->trigger_lock);
+	mutex_unlock(&group->rtpoll_trigger_lock);
 
 	/*
-	 * Wait for psi_schedule_poll_work RCU to complete its read-side
+	 * Wait for psi_schedule_rtpoll_work RCU to complete its read-side
	 * critical section before destroying the trigger and optionally the
-	 * poll_task.
+	 * rtpoll_task.
	 */
 	synchronize_rcu();
 	/*
-	 * Stop kthread 'psimon' after releasing trigger_lock to prevent a
-	 * deadlock while waiting for psi_poll_work to acquire trigger_lock
+	 * Stop kthread 'psimon' after releasing rtpoll_trigger_lock to prevent
+	 * a deadlock while waiting for psi_rtpoll_work to acquire
+	 * rtpoll_trigger_lock
	 */
 	if (task_to_destroy) {
 		/*
 		 * After the RCU grace period has expired, the worker
-		 * can no longer be found through group->poll_task.
+		 * can no longer be found through group->rtpoll_task.
 		 */
 		kthread_stop(task_to_destroy);
-		atomic_set(&group->poll_scheduled, 0);
+		atomic_set(&group->rtpoll_scheduled, 0);
 	}
 	kfree(t);
 }

From patchwork Thu Mar 9 17:07:55 2023
X-Patchwork-Submitter: Domenico Cerasuolo
X-Patchwork-Id: 67059
From: Domenico Cerasuolo
To: linux-kernel@vger.kernel.org
Cc: peterz@infradead.org, surenb@google.com, brauner@kernel.org,
    chris@chrisdown.name, hannes@cmpxchg.org, Domenico Cerasuolo
Subject: [PATCH 3/4] sched/psi: extract update_triggers side effect
Date: Thu, 9 Mar 2023 18:07:55 +0100
Message-Id: <20230309170756.52927-4-cerasuolodomenico@gmail.com>
In-Reply-To: <20230309170756.52927-1-cerasuolodomenico@gmail.com>
References: <20230309170756.52927-1-cerasuolodomenico@gmail.com>

The update of rtpoll_total inside update_triggers() can be moved out of
the function, since changed_states carries the same information as the
update_total flag used inside the function. Besides simplifying the
function, with the next patch the update would become an unwanted side
effect needed only for PSI_POLL.

Suggested-by: Johannes Weiner
Signed-off-by: Domenico Cerasuolo
---
 kernel/sched/psi.c | 20 +++++--------------
 1 file changed, 5 insertions(+), 15 deletions(-)

diff --git a/kernel/sched/psi.c b/kernel/sched/psi.c
index a3d0b5cf797a..476941c1cbea 100644
--- a/kernel/sched/psi.c
+++ b/kernel/sched/psi.c
@@ -433,7 +433,6 @@ static u64 window_update(struct psi_window *win, u64 now, u64 value)
 static u64 update_triggers(struct psi_group *group, u64 now)
 {
 	struct psi_trigger *t;
-	bool update_total = false;
 	u64 *total = group->total[PSI_POLL];
 
 	/*
@@ -456,14 +455,6 @@ static u64 update_triggers(struct psi_group *group, u64 now)
 		 * events without dropping any).
 		 */
 		if (new_stall) {
-			/*
-			 * Multiple triggers might be looking at the same state,
-			 * remember to update group->polling_total[] once we've
-			 * been through all of them. Also remember to extend the
-			 * polling time if we see new stall activity.
-			 */
-			update_total = true;
-
 			/* Calculate growth since last update */
 			growth = window_update(&t->win, now, total[t->state]);
 			if (!t->pending_event) {
@@ -484,11 +475,6 @@ static u64 update_triggers(struct psi_group *group, u64 now)
 			/* Reset threshold breach flag once event got generated */
 			t->pending_event = false;
 		}
-
-	if (update_total)
-		memcpy(group->rtpoll_total, total,
-			sizeof(group->rtpoll_total));
-
 	return now + group->rtpoll_min_period;
 }
 
@@ -686,8 +672,12 @@ static void psi_rtpoll_work(struct psi_group *group)
 		goto out;
 	}
 
-	if (now >= group->rtpoll_next_update)
+	if (now >= group->rtpoll_next_update) {
 		group->rtpoll_next_update = update_triggers(group, now);
+		if (changed_states & group->rtpoll_states)
+			memcpy(group->rtpoll_total, group->total[PSI_POLL],
+				   sizeof(group->rtpoll_total));
+	}
 
 	psi_schedule_rtpoll_work(group,
 		nsecs_to_jiffies(group->rtpoll_next_update - now) + 1,

From patchwork Thu Mar 9 17:07:56 2023
X-Patchwork-Submitter: Domenico Cerasuolo
X-Patchwork-Id: 67057
From: Domenico Cerasuolo
To: linux-kernel@vger.kernel.org
Cc: peterz@infradead.org, surenb@google.com, brauner@kernel.org,
    chris@chrisdown.name, hannes@cmpxchg.org, Domenico Cerasuolo
Subject: [PATCH 4/4] sched/psi: allow unprivileged polling of N*2s period
Date: Thu, 9 Mar 2023 18:07:56 +0100
Message-Id: <20230309170756.52927-5-cerasuolodomenico@gmail.com>
In-Reply-To: <20230309170756.52927-1-cerasuolodomenico@gmail.com>
References: <20230309170756.52927-1-cerasuolodomenico@gmail.com>

PSI offers two mechanisms to get information about a specific resource
pressure. One is reading from /proc/pressure/, which gives average
pressures aggregated every 2s. The other is creating a pollable fd for a
specific resource and cgroup.

The trigger creation requires CAP_SYS_RESOURCE, and allows picking a
specific time window and threshold, spawning an RT thread to aggregate
the data.

Systemd would like to provide containers the option to monitor pressure
on their own cgroup and sub-cgroups. For example, if systemd launches a
container that itself then launches services, the container should have
the ability to poll() for pressure in individual services. But neither
the container nor the services are privileged.

This patch implements a mechanism that allows unprivileged users to
create pressure triggers. The difference from privileged trigger creation
is that unprivileged triggers must use a time window that is a multiple
of 2s. This is so that we can avoid unrestricted spawning of RT threads,
and instead reuse the same aggregation mechanism already used for the
averages, which runs independently of any triggers.

Suggested-by: Johannes Weiner
Signed-off-by: Domenico Cerasuolo
---
 Documentation/accounting/psi.rst |   4 ++
 include/linux/psi.h              |   2 +-
 include/linux/psi_types.h        |   7 +++
 kernel/cgroup/cgroup.c           |   2 +-
 kernel/sched/psi.c               | 105 ++++++++++++++++++++-----------
 5 files changed, 83 insertions(+), 37 deletions(-)

diff --git a/Documentation/accounting/psi.rst b/Documentation/accounting/psi.rst
index 5e40b3f437f9..df6062eb3abb 100644
--- a/Documentation/accounting/psi.rst
+++ b/Documentation/accounting/psi.rst
@@ -105,6 +105,10 @@ prevent overly frequent polling. Max limit is chosen as a high enough number
 after which monitors are most likely not needed and psi averages can be used
 instead.
 
+Unprivileged users can also create monitors, with the only limitation that the
+window size must be a multiple of 2s, in order to prevent excessive resource
+usage.
+
 When activated, psi monitor stays active for at least the duration of one
 tracking window to avoid repeated activations/deactivations when system is
 bouncing in and out of the stall state.
diff --git a/include/linux/psi.h b/include/linux/psi.h
index b029a847def1..ab26200c2803 100644
--- a/include/linux/psi.h
+++ b/include/linux/psi.h
@@ -24,7 +24,7 @@ void psi_memstall_leave(unsigned long *flags);
 int psi_show(struct seq_file *s, struct psi_group *group, enum psi_res res);
 
 struct psi_trigger *psi_trigger_create(struct psi_group *group,
-			char *buf, enum psi_res res);
+			char *buf, enum psi_res res, struct file *file);
 void psi_trigger_destroy(struct psi_trigger *t);
 
 __poll_t psi_trigger_poll(void **trigger_ptr, struct file *file,
diff --git a/include/linux/psi_types.h b/include/linux/psi_types.h
index 1819afa8b198..e439f411c23b 100644
--- a/include/linux/psi_types.h
+++ b/include/linux/psi_types.h
@@ -151,6 +151,9 @@ struct psi_trigger {
 
 	/* Deferred event(s) from previous ratelimit window */
 	bool pending_event;
+
+	/* Used to differentiate destruction action*/
+	enum psi_aggregators aggregator;
 };
 
 struct psi_group {
@@ -171,6 +174,10 @@ struct psi_group {
 	/* Aggregator work control */
 	struct delayed_work avgs_work;
 
+	/* Unprivileged triggers against N*PSI_FREQ windows */
+	struct list_head triggers;
+	u32 nr_triggers[NR_PSI_STATES - 1];
+
 	/* Total stall times and sampled pressure averages */
 	u64 total[NR_PSI_AGGREGATORS][NR_PSI_STATES - 1];
 	unsigned long avg[NR_PSI_STATES - 1][3];
diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
index 935e8121b21e..dead36969bba 100644
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -3761,7 +3761,7 @@ static ssize_t pressure_write(struct kernfs_open_file *of, char *buf,
 	}
 
 	psi = cgroup_psi(cgrp);
-	new = psi_trigger_create(psi, buf, res);
+	new = psi_trigger_create(psi, buf, res, of->file);
 	if (IS_ERR(new)) {
 		cgroup_put(cgrp);
 		return PTR_ERR(new);
diff --git a/kernel/sched/psi.c b/kernel/sched/psi.c
index 476941c1cbea..fde91aa4e55f 100644
--- a/kernel/sched/psi.c
+++ b/kernel/sched/psi.c
@@ -186,9 +186,9 @@ static void group_init(struct psi_group *group)
 		seqcount_init(&per_cpu_ptr(group->pcpu, cpu)->seq);
 	group->avg_last_update = sched_clock();
 	group->avg_next_update = group->avg_last_update + psi_period;
-	INIT_DELAYED_WORK(&group->avgs_work, psi_avgs_work);
 	mutex_init(&group->avgs_lock);
-	/* Init trigger-related members */
+
+	/* Init rtpoll trigger-related members */
 	atomic_set(&group->rtpoll_scheduled, 0);
 	mutex_init(&group->rtpoll_trigger_lock);
 	INIT_LIST_HEAD(&group->rtpoll_triggers);
@@ -197,6 +197,11 @@ static void group_init(struct psi_group *group)
 	init_waitqueue_head(&group->rtpoll_wait);
 	timer_setup(&group->rtpoll_timer, poll_timer_fn, 0);
 	rcu_assign_pointer(group->rtpoll_task, NULL);
+
+	/* Init avg trigger-related members */
+	INIT_LIST_HEAD(&group->triggers);
+	memset(group->nr_triggers, 0, sizeof(group->nr_triggers));
+	INIT_DELAYED_WORK(&group->avgs_work, psi_avgs_work);
 }
 
 void __init psi_init(void)
@@ -430,20 +435,23 @@ static u64 window_update(struct psi_window *win, u64 now, u64 value)
 	return growth;
 }
 
-static u64 update_triggers(struct psi_group *group, u64 now)
+static u64 update_triggers(struct psi_group *group, u64 now, enum psi_aggregators aggregator)
 {
 	struct psi_trigger *t;
-	u64 *total = group->total[PSI_POLL];
+	u64 *total = group->total[aggregator];
+	struct list_head *triggers = aggregator == PSI_AVGS ? &group->triggers
+		: &group->rtpoll_triggers;
+	u64 *aggregator_total = aggregator == PSI_AVGS ? group->avg_total : group->rtpoll_total;
 
 	/*
 	 * On subsequent updates, calculate growth deltas and let
 	 * watchers know when their specified thresholds are exceeded.
 	 */
-	list_for_each_entry(t, &group->rtpoll_triggers, node) {
+	list_for_each_entry(t, triggers, node) {
 		u64 growth;
 		bool new_stall;
 
-		new_stall = group->rtpoll_total[t->state] != total[t->state];
+		new_stall = aggregator_total[t->state] != total[t->state];
 
 		/* Check for stall activity or a previous threshold breach */
 		if (!new_stall && !t->pending_event)
@@ -553,8 +561,10 @@ static void psi_avgs_work(struct work_struct *work)
 	 * Once restarted, we'll catch up the running averages in one
 	 * go - see calc_avgs() and missed_periods.
 	 */
-	if (now >= group->avg_next_update)
+	if (now >= group->avg_next_update) {
+		update_triggers(group, now, PSI_AVGS);
 		group->avg_next_update = update_averages(group, now);
+	}
 
 	if (changed_states & PSI_STATE_RESCHEDULE) {
 		schedule_delayed_work(dwork, nsecs_to_jiffies(
@@ -571,9 +581,17 @@ static void init_triggers(struct psi_group *group, u64 now)
 	list_for_each_entry(t, &group->rtpoll_triggers, node)
 		window_reset(&t->win, now,
 				group->total[PSI_POLL][t->state], 0);
+
+	list_for_each_entry(t, &group->triggers, node)
+		window_reset(&t->win, now,
+				group->total[PSI_AVGS][t->state], 0);
+
 	memcpy(group->rtpoll_total, group->total[PSI_POLL],
 		   sizeof(group->rtpoll_total));
 	group->rtpoll_next_update = now + group->rtpoll_min_period;
+
+	memcpy(group->avg_total, group->total[PSI_AVGS],
+		   sizeof(group->avg_total));
 }
 
 /* Schedule polling if it's not already scheduled or forced. */
@@ -673,7 +691,7 @@ static void psi_rtpoll_work(struct psi_group *group)
 	}
 
 	if (now >= group->rtpoll_next_update) {
-		group->rtpoll_next_update = update_triggers(group, now);
+		group->rtpoll_next_update = update_triggers(group, now, PSI_POLL);
 		if (changed_states & group->rtpoll_states)
 			memcpy(group->rtpoll_total, group->total[PSI_POLL],
 				   sizeof(group->rtpoll_total));
@@ -1243,16 +1261,19 @@ int psi_show(struct seq_file *m, struct psi_group *group, enum psi_res res)
 }
 
 struct psi_trigger *psi_trigger_create(struct psi_group *group,
-			char *buf, enum psi_res res)
+			char *buf, enum psi_res res, struct file *file)
 {
 	struct psi_trigger *t;
 	enum psi_states state;
 	u32 threshold_us;
+	bool privileged;
 	u32 window_us;
 
 	if (static_branch_likely(&psi_disabled))
 		return ERR_PTR(-EOPNOTSUPP);
 
+	privileged = cap_raised(file->f_cred->cap_effective, CAP_SYS_RESOURCE);
+
 	if (sscanf(buf, "some %u %u", &threshold_us, &window_us) == 2)
 		state = PSI_IO_SOME + res * 2;
 	else if (sscanf(buf, "full %u %u", &threshold_us, &window_us) == 2)
@@ -1272,6 +1293,13 @@ struct psi_trigger *psi_trigger_create(struct psi_group *group,
 	    window_us > WINDOW_MAX_US)
 		return ERR_PTR(-EINVAL);
 
+	/*
+	 * Unprivileged users can only use 2s windows so that averages aggregation
+	 * work is used, and no RT threads need to be spawned.
+	 */
+	if (!privileged && window_us % 2000000)
+		return ERR_PTR(-EINVAL);
+
 	/* Check threshold */
 	if (threshold_us == 0 || threshold_us > window_us)
 		return ERR_PTR(-EINVAL);
@@ -1291,10 +1319,11 @@ struct psi_trigger *psi_trigger_create(struct psi_group *group,
 	t->last_event_time = 0;
 	init_waitqueue_head(&t->event_wait);
 	t->pending_event = false;
+	t->aggregator = privileged ? PSI_POLL : PSI_AVGS;
 
 	mutex_lock(&group->rtpoll_trigger_lock);
 
-	if (!rcu_access_pointer(group->rtpoll_task)) {
+	if (privileged && !rcu_access_pointer(group->rtpoll_task)) {
 		struct task_struct *task;
 
 		task = kthread_create(psi_rtpoll_worker, group, "psimon");
@@ -1308,11 +1337,16 @@ struct psi_trigger *psi_trigger_create(struct psi_group *group,
 		rcu_assign_pointer(group->rtpoll_task, task);
 	}
 
-	list_add(&t->node, &group->rtpoll_triggers);
-	group->rtpoll_min_period = min(group->rtpoll_min_period,
-		div_u64(t->win.size, UPDATES_PER_WINDOW));
-	group->rtpoll_nr_triggers[t->state]++;
-	group->rtpoll_states |= (1 << t->state);
+	if (privileged) {
+		list_add(&t->node, &group->rtpoll_triggers);
+		group->rtpoll_min_period = min(group->rtpoll_min_period,
+			div_u64(t->win.size, UPDATES_PER_WINDOW));
+		group->rtpoll_nr_triggers[t->state]++;
+		group->rtpoll_states |= (1 << t->state);
+	} else {
+		list_add(&t->node, &group->triggers);
+		group->nr_triggers[t->state]++;
+	}
 
 	mutex_unlock(&group->rtpoll_trigger_lock);
 
@@ -1346,22 +1380,26 @@ void psi_trigger_destroy(struct psi_trigger *t)
 		u64 period = ULLONG_MAX;
 
 		list_del(&t->node);
-		group->rtpoll_nr_triggers[t->state]--;
-		if (!group->rtpoll_nr_triggers[t->state])
-			group->rtpoll_states &= ~(1 << t->state);
-		/* reset min update period for the remaining triggers */
-		list_for_each_entry(tmp, &group->rtpoll_triggers, node)
-			period = min(period, div_u64(tmp->win.size,
-					UPDATES_PER_WINDOW));
-		group->rtpoll_min_period = period;
-		/* Destroy rtpoll_task when the last trigger is destroyed */
-		if (group->rtpoll_states == 0) {
-			group->rtpoll_until = 0;
-			task_to_destroy = rcu_dereference_protected(
-					group->rtpoll_task,
-					lockdep_is_held(&group->rtpoll_trigger_lock));
-			rcu_assign_pointer(group->rtpoll_task, NULL);
-			del_timer(&group->rtpoll_timer);
+		if (t->aggregator == PSI_AVGS) {
+			group->nr_triggers[t->state]--;
+		} else {
+			group->rtpoll_nr_triggers[t->state]--;
+			if (!group->rtpoll_nr_triggers[t->state])
+				group->rtpoll_states &= ~(1 << t->state);
+			/* reset min update period for the remaining triggers */
+			list_for_each_entry(tmp, &group->rtpoll_triggers, node)
+				period = min(period, div_u64(tmp->win.size,
+						UPDATES_PER_WINDOW));
+			group->rtpoll_min_period = period;
+			/* Destroy rtpoll_task when the last trigger is destroyed */
+			if (group->rtpoll_states == 0) {
+				group->rtpoll_until = 0;
+				task_to_destroy = rcu_dereference_protected(
+						group->rtpoll_task,
+						lockdep_is_held(&group->rtpoll_trigger_lock));
+				rcu_assign_pointer(group->rtpoll_task, NULL);
+				del_timer(&group->rtpoll_timer);
+			}
 		}
 	}
 
@@ -1428,9 +1466,6 @@ static int psi_cpu_show(struct seq_file *m, void *v)
 
 static int psi_open(struct file *file, int (*psi_show)(struct seq_file *, void *))
 {
-	if (file->f_mode & FMODE_WRITE && !capable(CAP_SYS_RESOURCE))
-		return -EPERM;
-
 	return single_open(file, psi_show, NULL);
 }
 
@@ -1480,7 +1515,7 @@ static ssize_t psi_write(struct file *file, const char __user *user_buf,
 		return -EBUSY;
 	}
 
-	new = psi_trigger_create(&psi_system, buf, res);
+	new = psi_trigger_create(&psi_system, buf, res, file);
 	if (IS_ERR(new)) {
 		mutex_unlock(&seq->lock);
 		return PTR_ERR(new);
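
For readers evaluating the series, here is a minimal userspace sketch of how
an unprivileged monitor could be consumed once this lands. It follows the
trigger string format documented in Documentation/accounting/psi.rst ("some"
or "full", threshold and window in microseconds) and uses a 2s window as
required for unprivileged callers; the target file and the 150ms threshold
are illustrative assumptions, not part of the patch:

/*
 * Sketch: register a PSI trigger with a 2s window (the multiple-of-2s
 * requirement for unprivileged callers in this series) and wait for
 * POLLPRI notifications.
 */
#include <fcntl.h>
#include <poll.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
	/* Illustrative target; a container would use its own cgroup's
	 * memory.pressure file instead. */
	const char *path = "/proc/pressure/memory";
	/* 150ms of "some" memory stall within a 2s window. */
	const char *trig = "some 150000 2000000";
	struct pollfd fds;
	int fd;

	fd = open(path, O_RDWR | O_NONBLOCK);
	if (fd < 0) {
		perror("open");
		return 1;
	}
	if (write(fd, trig, strlen(trig) + 1) < 0) {
		perror("write trigger");
		close(fd);
		return 1;
	}

	fds.fd = fd;
	fds.events = POLLPRI;

	for (;;) {
		int n = poll(&fds, 1, -1);

		if (n < 0) {
			perror("poll");
			break;
		}
		if (fds.revents & POLLERR) {
			fprintf(stderr, "event source went away\n");
			break;
		}
		if (fds.revents & POLLPRI)
			printf("memory pressure threshold breached\n");
	}
	close(fd);
	return 0;
}

Windows that are not multiples of 2000000us would still require
CAP_SYS_RESOURCE and the RT polling path.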