From patchwork Thu Mar 23 10:33:48 2023
X-Patchwork-Submitter: Domenico Cerasuolo
X-Patchwork-Id: 73993
From: Domenico Cerasuolo
To: linux-kernel@vger.kernel.org
Cc: peterz@infradead.org, surenb@google.com, brauner@kernel.org,
    chris@chrisdown.name, hannes@cmpxchg.org, Domenico Cerasuolo
Subject: [PATCH v2 1/3] sched/psi: rearrange polling code in preparation
Date: Thu, 23 Mar 2023 11:33:48 +0100
Message-Id: <20230323103350.40569-2-cerasuolodomenico@gmail.com>
In-Reply-To: <20230323103350.40569-1-cerasuolodomenico@gmail.com>
References: <20230323103350.40569-1-cerasuolodomenico@gmail.com>

Move a few functions up in the file to avoid the forward declarations
that would otherwise be needed by the patch implementing unprivileged
PSI triggers.

Suggested-by: Johannes Weiner
Signed-off-by: Domenico Cerasuolo
---
 kernel/sched/psi.c | 196 ++++++++++++++++++++++-----------------------
 1 file changed, 98 insertions(+), 98 deletions(-)

diff --git a/kernel/sched/psi.c b/kernel/sched/psi.c
index 02e011cabe91..fe9269f1d2a4 100644
--- a/kernel/sched/psi.c
+++ b/kernel/sched/psi.c
@@ -384,92 +384,6 @@ static void collect_percpu_times(struct psi_group *group,
 	*pchanged_states = changed_states;
 }
 
-static u64 update_averages(struct psi_group *group, u64 now)
-{
-	unsigned long missed_periods = 0;
-	u64 expires, period;
-	u64 avg_next_update;
-	int s;
-
-	/* avgX= */
-	expires = group->avg_next_update;
-	if (now - expires >= psi_period)
-		missed_periods = div_u64(now - expires, psi_period);
-
-	/*
-	 * The periodic clock tick can get delayed for various
-	 * reasons, especially on loaded systems. To avoid clock
-	 * drift, we schedule the clock in fixed psi_period intervals.
-	 * But the deltas we sample out of the per-cpu buckets above
-	 * are based on the actual time elapsing between clock ticks.
-	 */
-	avg_next_update = expires + ((1 + missed_periods) * psi_period);
-	period = now - (group->avg_last_update + (missed_periods * psi_period));
-	group->avg_last_update = now;
-
-	for (s = 0; s < NR_PSI_STATES - 1; s++) {
-		u32 sample;
-
-		sample = group->total[PSI_AVGS][s] - group->avg_total[s];
-		/*
-		 * Due to the lockless sampling of the time buckets,
-		 * recorded time deltas can slip into the next period,
-		 * which under full pressure can result in samples in
-		 * excess of the period length.
-		 *
-		 * We don't want to report non-sensical pressures in
-		 * excess of 100%, nor do we want to drop such events
-		 * on the floor. Instead we punt any overage into the
-		 * future until pressure subsides. By doing this we
-		 * don't underreport the occurring pressure curve, we
-		 * just report it delayed by one period length.
-		 *
-		 * The error isn't cumulative. As soon as another
-		 * delta slips from a period P to P+1, by definition
-		 * it frees up its time T in P.
-		 */
-		if (sample > period)
-			sample = period;
-		group->avg_total[s] += sample;
-		calc_avgs(group->avg[s], missed_periods, sample, period);
-	}
-
-	return avg_next_update;
-}
-
-static void psi_avgs_work(struct work_struct *work)
-{
-	struct delayed_work *dwork;
-	struct psi_group *group;
-	u32 changed_states;
-	u64 now;
-
-	dwork = to_delayed_work(work);
-	group = container_of(dwork, struct psi_group, avgs_work);
-
-	mutex_lock(&group->avgs_lock);
-
-	now = sched_clock();
-
-	collect_percpu_times(group, PSI_AVGS, &changed_states);
-	/*
-	 * If there is task activity, periodically fold the per-cpu
-	 * times and feed samples into the running averages. If things
-	 * are idle and there is no data to process, stop the clock.
-	 * Once restarted, we'll catch up the running averages in one
-	 * go - see calc_avgs() and missed_periods.
-	 */
-	if (now >= group->avg_next_update)
-		group->avg_next_update = update_averages(group, now);
-
-	if (changed_states & PSI_STATE_RESCHEDULE) {
-		schedule_delayed_work(dwork, nsecs_to_jiffies(
-				group->avg_next_update - now) + 1);
-	}
-
-	mutex_unlock(&group->avgs_lock);
-}
-
 /* Trigger tracking window manipulations */
 static void window_reset(struct psi_window *win, u64 now,
 			 u64 value, u64 prev_growth)
@@ -516,18 +430,6 @@ static u64 window_update(struct psi_window *win, u64 now, u64 value)
 	return growth;
 }
 
-static void init_triggers(struct psi_group *group, u64 now)
-{
-	struct psi_trigger *t;
-
-	list_for_each_entry(t, &group->triggers, node)
-		window_reset(&t->win, now,
-				group->total[PSI_POLL][t->state], 0);
-	memcpy(group->polling_total, group->total[PSI_POLL],
-		   sizeof(group->polling_total));
-	group->polling_next_update = now + group->poll_min_period;
-}
-
 static u64 update_triggers(struct psi_group *group, u64 now)
 {
 	struct psi_trigger *t;
@@ -590,6 +492,104 @@ static u64 update_triggers(struct psi_group *group, u64 now)
 	return now + group->poll_min_period;
 }
 
+static u64 update_averages(struct psi_group *group, u64 now)
+{
+	unsigned long missed_periods = 0;
+	u64 expires, period;
+	u64 avg_next_update;
+	int s;
+
+	/* avgX= */
+	expires = group->avg_next_update;
+	if (now - expires >= psi_period)
+		missed_periods = div_u64(now - expires, psi_period);
+
+	/*
+	 * The periodic clock tick can get delayed for various
+	 * reasons, especially on loaded systems. To avoid clock
+	 * drift, we schedule the clock in fixed psi_period intervals.
+	 * But the deltas we sample out of the per-cpu buckets above
+	 * are based on the actual time elapsing between clock ticks.
+	 */
+	avg_next_update = expires + ((1 + missed_periods) * psi_period);
+	period = now - (group->avg_last_update + (missed_periods * psi_period));
+	group->avg_last_update = now;
+
+	for (s = 0; s < NR_PSI_STATES - 1; s++) {
+		u32 sample;
+
+		sample = group->total[PSI_AVGS][s] - group->avg_total[s];
+		/*
+		 * Due to the lockless sampling of the time buckets,
+		 * recorded time deltas can slip into the next period,
+		 * which under full pressure can result in samples in
+		 * excess of the period length.
+		 *
+		 * We don't want to report non-sensical pressures in
+		 * excess of 100%, nor do we want to drop such events
+		 * on the floor. Instead we punt any overage into the
+		 * future until pressure subsides. By doing this we
+		 * don't underreport the occurring pressure curve, we
+		 * just report it delayed by one period length.
+		 *
+		 * The error isn't cumulative. As soon as another
+		 * delta slips from a period P to P+1, by definition
+		 * it frees up its time T in P.
+		 */
+		if (sample > period)
+			sample = period;
+		group->avg_total[s] += sample;
+		calc_avgs(group->avg[s], missed_periods, sample, period);
+	}
+
+	return avg_next_update;
+}
+
+static void psi_avgs_work(struct work_struct *work)
+{
+	struct delayed_work *dwork;
+	struct psi_group *group;
+	u32 changed_states;
+	u64 now;
+
+	dwork = to_delayed_work(work);
+	group = container_of(dwork, struct psi_group, avgs_work);
+
+	mutex_lock(&group->avgs_lock);
+
+	now = sched_clock();
+
+	collect_percpu_times(group, PSI_AVGS, &changed_states);
+	/*
+	 * If there is task activity, periodically fold the per-cpu
+	 * times and feed samples into the running averages. If things
+	 * are idle and there is no data to process, stop the clock.
+	 * Once restarted, we'll catch up the running averages in one
+	 * go - see calc_avgs() and missed_periods.
+	 */
+	if (now >= group->avg_next_update)
+		group->avg_next_update = update_averages(group, now);
+
+	if (changed_states & PSI_STATE_RESCHEDULE) {
+		schedule_delayed_work(dwork, nsecs_to_jiffies(
+				group->avg_next_update - now) + 1);
+	}
+
+	mutex_unlock(&group->avgs_lock);
+}
+
+static void init_triggers(struct psi_group *group, u64 now)
+{
+	struct psi_trigger *t;
+
+	list_for_each_entry(t, &group->triggers, node)
+		window_reset(&t->win, now,
+				group->total[PSI_POLL][t->state], 0);
+	memcpy(group->polling_total, group->total[PSI_POLL],
+		   sizeof(group->polling_total));
+	group->polling_next_update = now + group->poll_min_period;
+}
+
 /* Schedule polling if it's not already scheduled or forced. */
 static void psi_schedule_poll_work(struct psi_group *group, unsigned long delay,
 				   bool force)
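The comment in update_averages() above describes how a per-period delta that
exceeds the period is clamped rather than dropped, with the excess reported in
the following period. A minimal standalone simulation of that behaviour
(illustrative only, not kernel code; all names here are made up):

/*
 * Simulate two 2s periods where 2.5s of stall time lands in the first
 * period: period 1 reports 100%, the 0.5s overage carries into period 2.
 */
#include <stdio.h>
#include <stdint.h>

int main(void)
{
	uint64_t period = 2000000;	/* fixed period length in us (2s) */
	uint64_t total = 0;		/* monotonically growing stall time */
	uint64_t avg_total = 0;		/* portion already folded into averages */
	uint64_t deltas[2] = { 2500000, 1000000 };

	for (int p = 0; p < 2; p++) {
		total += deltas[p];
		uint64_t sample = total - avg_total;

		if (sample > period)	/* clamp: never report more than 100% */
			sample = period;
		avg_total += sample;	/* excess stays pending for the next period */
		printf("period %d: reported %llu of %llu us (%.0f%%)\n",
		       p + 1, (unsigned long long)sample,
		       (unsigned long long)period, 100.0 * sample / period);
	}
	return 0;
}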
From patchwork Thu Mar 23 10:33:49 2023
X-Patchwork-Submitter: Domenico Cerasuolo
X-Patchwork-Id: 73994
From: Domenico Cerasuolo
To: linux-kernel@vger.kernel.org
Cc: peterz@infradead.org, surenb@google.com, brauner@kernel.org,
    chris@chrisdown.name, hannes@cmpxchg.org, Domenico Cerasuolo
Subject: [PATCH v2 2/3] sched/psi: extract update_triggers side effect
Date: Thu, 23 Mar 2023 11:33:49 +0100
Message-Id: <20230323103350.40569-3-cerasuolodomenico@gmail.com>
In-Reply-To: <20230323103350.40569-1-cerasuolodomenico@gmail.com>
References: <20230323103350.40569-1-cerasuolodomenico@gmail.com>

This change moves the update_total flag out of the update_triggers
function, which is currently called only from psi_poll_work. In the next
patch, update_triggers will also be called from psi_avgs_work, but the
total-update bookkeeping is specific to psi_poll_work. Returning the
update_total value to the caller lets us avoid differentiating the
implementation of update_triggers for the different aggregators.

Suggested-by: Johannes Weiner
Signed-off-by: Domenico Cerasuolo
---
 kernel/sched/psi.c | 19 ++++++++++---------
 1 file changed, 10 insertions(+), 9 deletions(-)

diff --git a/kernel/sched/psi.c b/kernel/sched/psi.c
index fe9269f1d2a4..17d71ef07751 100644
--- a/kernel/sched/psi.c
+++ b/kernel/sched/psi.c
@@ -430,11 +430,11 @@ static u64 window_update(struct psi_window *win, u64 now, u64 value)
 	return growth;
 }
 
-static u64 update_triggers(struct psi_group *group, u64 now)
+static u64 update_triggers(struct psi_group *group, u64 now, bool *update_total)
 {
 	struct psi_trigger *t;
-	bool update_total = false;
 	u64 *total = group->total[PSI_POLL];
+	*update_total = false;
 
 	/*
 	 * On subsequent updates, calculate growth deltas and let
@@ -462,7 +462,7 @@ static u64 update_triggers(struct psi_group *group, u64 now)
 		 * been through all of them. Also remember to extend the
 		 * polling time if we see new stall activity.
 		 */
-		update_total = true;
+		*update_total = true;
 
 		/* Calculate growth since last update */
 		growth = window_update(&t->win, now, total[t->state]);
@@ -485,10 +485,6 @@ static u64 update_triggers(struct psi_group *group, u64 now)
 		t->pending_event = false;
 	}
 
-	if (update_total)
-		memcpy(group->polling_total, total,
-				sizeof(group->polling_total));
-
 	return now + group->poll_min_period;
 }
 
@@ -622,6 +618,7 @@ static void psi_poll_work(struct psi_group *group)
 {
 	bool force_reschedule = false;
 	u32 changed_states;
+	bool update_total;
 	u64 now;
 
 	mutex_lock(&group->trigger_lock);
@@ -686,8 +683,12 @@ static void psi_poll_work(struct psi_group *group)
 		goto out;
 	}
 
-	if (now >= group->polling_next_update)
-		group->polling_next_update = update_triggers(group, now);
+	if (now >= group->polling_next_update) {
+		group->polling_next_update = update_triggers(group, now, &update_total);
+		if (update_total)
+			memcpy(group->polling_total, group->total[PSI_POLL],
+				   sizeof(group->polling_total));
+	}
 
 	psi_schedule_poll_work(group,
 		nsecs_to_jiffies(group->polling_next_update - now) + 1,
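A schematic sketch of the call-site pattern this refactor enables (hypothetical
names, not the kernel implementation): the shared helper only reports whether
the totals changed, and each caller decides whether to fold them into its own
snapshot.

#include <stdbool.h>
#include <stdio.h>
#include <string.h>

struct group {
	unsigned long long total[2];		/* live stall totals */
	unsigned long long polling_total[2];	/* snapshot owned by the polling path */
};

/* Shared helper: reports whether there is new stall activity to fold. */
static unsigned long long check_triggers(struct group *g, unsigned long long now,
					 bool *update_total)
{
	*update_total = (g->total[0] != g->polling_total[0]);
	return now + 1;	/* next update time */
}

static void polling_caller(struct group *g, unsigned long long now)
{
	bool update_total;

	check_triggers(g, now, &update_total);
	if (update_total)	/* the side effect is applied only here */
		memcpy(g->polling_total, g->total, sizeof(g->polling_total));
}

static void averages_caller(struct group *g, unsigned long long now)
{
	bool unused;

	check_triggers(g, now, &unused);	/* no polling snapshot to maintain */
}

int main(void)
{
	struct group g = { .total = { 5, 0 }, .polling_total = { 0, 0 } };

	averages_caller(&g, 1);
	polling_caller(&g, 2);
	printf("polling_total[0] = %llu\n", g.polling_total[0]);
	return 0;
}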
From patchwork Thu Mar 23 10:33:50 2023
X-Patchwork-Submitter: Domenico Cerasuolo
X-Patchwork-Id: 73990
From: Domenico Cerasuolo
To: linux-kernel@vger.kernel.org
Cc: peterz@infradead.org, surenb@google.com, brauner@kernel.org,
    chris@chrisdown.name, hannes@cmpxchg.org, Domenico Cerasuolo
Subject: [PATCH v2 3/3] sched/psi: allow unprivileged polling of N*2s period
Date: Thu, 23 Mar 2023 11:33:50 +0100
Message-Id: <20230323103350.40569-4-cerasuolodomenico@gmail.com>
In-Reply-To: <20230323103350.40569-1-cerasuolodomenico@gmail.com>
References: <20230323103350.40569-1-cerasuolodomenico@gmail.com>

PSI offers two mechanisms to get information about pressure on a
specific resource. One is reading from /proc/pressure/, which gives
average pressures aggregated every 2s. The other is creating a pollable
fd for a specific resource and cgroup. Trigger creation requires
CAP_SYS_RESOURCE, allows picking a specific time window and threshold,
and spawns an RT thread to aggregate the data.

Systemd would like to give containers the option to monitor pressure on
their own cgroup and sub-cgroups. For example, if systemd launches a
container that itself then launches services, the container should be
able to poll() for pressure in individual services. But neither the
container nor the services are privileged.

This patch implements a mechanism that allows unprivileged users to
create pressure triggers. The difference from privileged trigger
creation is that unprivileged triggers must use a time window that is a
multiple of 2s. This way we avoid unrestricted spawning of RT threads
and instead reuse the aggregation mechanism already used for the
averages, which runs independently of any triggers.

Suggested-by: Johannes Weiner
Signed-off-by: Domenico Cerasuolo
---
 Documentation/accounting/psi.rst |   4 ++
 include/linux/psi.h              |   2 +-
 include/linux/psi_types.h        |  11 +++-
 kernel/cgroup/cgroup.c           |   2 +-
 kernel/sched/psi.c               | 110 ++++++++++++++++++-------------
 5 files changed, 78 insertions(+), 51 deletions(-)

diff --git a/Documentation/accounting/psi.rst b/Documentation/accounting/psi.rst
index 5e40b3f437f9..df6062eb3abb 100644
--- a/Documentation/accounting/psi.rst
+++ b/Documentation/accounting/psi.rst
@@ -105,6 +105,10 @@ prevent overly frequent polling. Max limit is chosen as a high enough number
 after which monitors are most likely not needed and psi averages can be used
 instead.
 
+Unprivileged users can also create monitors, with the only limitation that the
+window size must be a multiple of 2s, in order to prevent excessive resource
+usage.
+
 When activated, psi monitor stays active for at least the duration of one
 tracking window to avoid repeated activations/deactivations when system is
 bouncing in and out of the stall state.
diff --git a/include/linux/psi.h b/include/linux/psi.h
index b029a847def1..ab26200c2803 100644
--- a/include/linux/psi.h
+++ b/include/linux/psi.h
@@ -24,7 +24,7 @@ void psi_memstall_leave(unsigned long *flags);
 int psi_show(struct seq_file *s, struct psi_group *group, enum psi_res res);
 
 struct psi_trigger *psi_trigger_create(struct psi_group *group,
-			char *buf, enum psi_res res);
+			char *buf, enum psi_res res, struct file *file);
 void psi_trigger_destroy(struct psi_trigger *t);
 
 __poll_t psi_trigger_poll(void **trigger_ptr, struct file *file,
diff --git a/include/linux/psi_types.h b/include/linux/psi_types.h
index 1e0a0d7ace3a..eaee30f54670 100644
--- a/include/linux/psi_types.h
+++ b/include/linux/psi_types.h
@@ -151,6 +151,14 @@ struct psi_trigger {
 
 	/* Deferred event(s) from previous ratelimit window */
 	bool pending_event;
+
+	/* Used to differentiate destruction action*/
+	enum psi_aggregators aggregator;
+};
+
+struct trigger_info {
+	struct list_head triggers;
+	u32 nr_triggers[NR_PSI_STATES - 1];
 };
 
 struct psi_group {
@@ -186,8 +194,7 @@ struct psi_group {
 	struct mutex trigger_lock;
 
 	/* Configured polling triggers */
-	struct list_head triggers;
-	u32 nr_triggers[NR_PSI_STATES - 1];
+	struct trigger_info trig_info[NR_PSI_AGGREGATORS];
 	u32 poll_states;
 	u64 poll_min_period;
 
diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
index 935e8121b21e..dead36969bba 100644
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -3761,7 +3761,7 @@ static ssize_t pressure_write(struct kernfs_open_file *of, char *buf,
 	}
 
 	psi = cgroup_psi(cgrp);
-	new = psi_trigger_create(psi, buf, res);
+	new = psi_trigger_create(psi, buf, res, of->file);
 	if (IS_ERR(new)) {
 		cgroup_put(cgrp);
 		return PTR_ERR(new);
diff --git a/kernel/sched/psi.c b/kernel/sched/psi.c
index 17d71ef07751..f15d92819fe5 100644
--- a/kernel/sched/psi.c
+++ b/kernel/sched/psi.c
@@ -180,18 +180,20 @@ static void poll_timer_fn(struct timer_list *t);
 static void group_init(struct psi_group *group)
 {
 	int cpu;
+	int i;
 
 	group->enabled = true;
 	for_each_possible_cpu(cpu)
 		seqcount_init(&per_cpu_ptr(group->pcpu, cpu)->seq);
 	group->avg_last_update = sched_clock();
 	group->avg_next_update = group->avg_last_update + psi_period;
-	INIT_DELAYED_WORK(&group->avgs_work, psi_avgs_work);
 	mutex_init(&group->avgs_lock);
 	/* Init trigger-related members */
 	atomic_set(&group->poll_scheduled, 0);
 	mutex_init(&group->trigger_lock);
-	INIT_LIST_HEAD(&group->triggers);
+	for (i = 0; i < NR_PSI_AGGREGATORS; i++)
+		INIT_LIST_HEAD(&group->trig_info[i].triggers);
+	INIT_DELAYED_WORK(&group->avgs_work, psi_avgs_work);
 	group->poll_min_period = U32_MAX;
 	group->polling_next_update = ULLONG_MAX;
 	init_waitqueue_head(&group->poll_wait);
@@ -430,21 +432,24 @@ static u64 window_update(struct psi_window *win, u64 now, u64 value)
 	return growth;
 }
 
-static u64 update_triggers(struct psi_group *group, u64 now, bool *update_total)
+static u64 update_triggers(struct psi_group *group, u64 now, bool *update_total,
+			   enum psi_aggregators aggregator)
 {
 	struct psi_trigger *t;
-	u64 *total = group->total[PSI_POLL];
+	u64 *total = group->total[aggregator];
+	struct list_head *triggers = &group->trig_info[aggregator].triggers;
+	u64 *aggregator_total = aggregator == PSI_AVGS ? group->avg_total : group->polling_total;
 	*update_total = false;
 
 	/*
 	 * On subsequent updates, calculate growth deltas and let
 	 * watchers know when their specified thresholds are exceeded.
 	 */
-	list_for_each_entry(t, &group->triggers, node) {
+	list_for_each_entry(t, triggers, node) {
 		u64 growth;
 		bool new_stall;
 
-		new_stall = group->polling_total[t->state] != total[t->state];
+		new_stall = aggregator_total[t->state] != total[t->state];
 
 		/* Check for stall activity or a previous threshold breach */
 		if (!new_stall && !t->pending_event)
@@ -545,6 +550,7 @@ static void psi_avgs_work(struct work_struct *work)
 {
 	struct delayed_work *dwork;
 	struct psi_group *group;
+	bool update_total;
 	u32 changed_states;
 	u64 now;
 
@@ -563,8 +569,10 @@ static void psi_avgs_work(struct work_struct *work)
 	 * Once restarted, we'll catch up the running averages in one
 	 * go - see calc_avgs() and missed_periods.
 	 */
-	if (now >= group->avg_next_update)
+	if (now >= group->avg_next_update) {
+		update_triggers(group, now, &update_total, PSI_AVGS);
 		group->avg_next_update = update_averages(group, now);
+	}
 
 	if (changed_states & PSI_STATE_RESCHEDULE) {
 		schedule_delayed_work(dwork, nsecs_to_jiffies(
@@ -574,11 +582,11 @@ static void psi_avgs_work(struct work_struct *work)
 	mutex_unlock(&group->avgs_lock);
 }
 
-static void init_triggers(struct psi_group *group, u64 now)
+static void init_poll_triggers(struct psi_group *group, u64 now)
 {
 	struct psi_trigger *t;
 
-	list_for_each_entry(t, &group->triggers, node)
+	list_for_each_entry(t, &group->trig_info[PSI_POLL].triggers, node)
 		window_reset(&t->win, now,
 				group->total[PSI_POLL][t->state], 0);
 	memcpy(group->polling_total, group->total[PSI_POLL],
@@ -667,7 +675,7 @@ static void psi_poll_work(struct psi_group *group)
 	if (changed_states & group->poll_states) {
 		/* Initialize trigger windows when entering polling mode */
 		if (now > group->polling_until)
-			init_triggers(group, now);
+			init_poll_triggers(group, now);
 
 		/*
 		 * Keep the monitor active for at least the duration of the
@@ -684,7 +692,7 @@ static void psi_poll_work(struct psi_group *group)
 	}
 
 	if (now >= group->polling_next_update) {
-		group->polling_next_update = update_triggers(group, now, &update_total);
+		group->polling_next_update = update_triggers(group, now, &update_total, PSI_POLL);
 		if (update_total)
 			memcpy(group->polling_total, group->total[PSI_POLL],
 				   sizeof(group->polling_total));
@@ -1254,16 +1262,19 @@ int psi_show(struct seq_file *m, struct psi_group *group, enum psi_res res)
 }
 
 struct psi_trigger *psi_trigger_create(struct psi_group *group,
-			char *buf, enum psi_res res)
+			char *buf, enum psi_res res, struct file *file)
 {
 	struct psi_trigger *t;
 	enum psi_states state;
 	u32 threshold_us;
+	bool privileged;
 	u32 window_us;
 
 	if (static_branch_likely(&psi_disabled))
 		return ERR_PTR(-EOPNOTSUPP);
 
+	privileged = cap_raised(file->f_cred->cap_effective, CAP_SYS_RESOURCE);
+
 	if (sscanf(buf, "some %u %u", &threshold_us, &window_us) == 2)
 		state = PSI_IO_SOME + res * 2;
 	else if (sscanf(buf, "full %u %u", &threshold_us, &window_us) == 2)
@@ -1283,6 +1294,13 @@ struct psi_trigger *psi_trigger_create(struct psi_group *group,
 		window_us > WINDOW_MAX_US)
 		return ERR_PTR(-EINVAL);
 
+	/*
+	 * Unprivileged users can only use 2s windows so that averages aggregation
+	 * work is used, and no RT threads need to be spawned.
+	 */
+	if (!privileged && window_us % 2000000)
+		return ERR_PTR(-EINVAL);
+
 	/* Check threshold */
 	if (threshold_us == 0 || threshold_us > window_us)
 		return ERR_PTR(-EINVAL);
@@ -1302,10 +1320,11 @@ struct psi_trigger *psi_trigger_create(struct psi_group *group,
 	t->last_event_time = 0;
 	init_waitqueue_head(&t->event_wait);
 	t->pending_event = false;
+	t->aggregator = privileged ? PSI_POLL : PSI_AVGS;
 
 	mutex_lock(&group->trigger_lock);
 
-	if (!rcu_access_pointer(group->poll_task)) {
+	if (privileged && !rcu_access_pointer(group->poll_task)) {
 		struct task_struct *task;
 
 		task = kthread_create(psi_poll_worker, group, "psimon");
@@ -1319,12 +1338,14 @@ struct psi_trigger *psi_trigger_create(struct psi_group *group,
 		rcu_assign_pointer(group->poll_task, task);
 	}
 
-	list_add(&t->node, &group->triggers);
-	group->poll_min_period = min(group->poll_min_period,
-		div_u64(t->win.size, UPDATES_PER_WINDOW));
-	group->nr_triggers[t->state]++;
-	group->poll_states |= (1 << t->state);
+	list_add(&t->node, &group->trig_info[t->aggregator].triggers);
+	group->trig_info[t->aggregator].nr_triggers[t->state]++;
+	if (privileged) {
+		group->poll_min_period = min(group->poll_min_period,
+			div_u64(t->win.size, UPDATES_PER_WINDOW));
+		group->poll_states |= (1 << t->state);
+	}
 
 	mutex_unlock(&group->trigger_lock);
 
 	return t;
@@ -1357,22 +1378,25 @@ void psi_trigger_destroy(struct psi_trigger *t)
 		u64 period = ULLONG_MAX;
 
 		list_del(&t->node);
-		group->nr_triggers[t->state]--;
-		if (!group->nr_triggers[t->state])
-			group->poll_states &= ~(1 << t->state);
-		/* reset min update period for the remaining triggers */
-		list_for_each_entry(tmp, &group->triggers, node)
-			period = min(period, div_u64(tmp->win.size,
-					UPDATES_PER_WINDOW));
-		group->poll_min_period = period;
-		/* Destroy poll_task when the last trigger is destroyed */
-		if (group->poll_states == 0) {
-			group->polling_until = 0;
-			task_to_destroy = rcu_dereference_protected(
-					group->poll_task,
-					lockdep_is_held(&group->trigger_lock));
-			rcu_assign_pointer(group->poll_task, NULL);
-			del_timer(&group->poll_timer);
+		group->trig_info[t->aggregator].nr_triggers[t->state]--;
+
+		if (t->aggregator == PSI_POLL) {
+			if (!group->trig_info[t->aggregator].nr_triggers[t->state])
+				group->poll_states &= ~(1 << t->state);
+			/* reset min update period for the remaining triggers */
+			list_for_each_entry(tmp, &group->trig_info[t->aggregator].triggers, node)
+				period = min(period, div_u64(tmp->win.size,
+						UPDATES_PER_WINDOW));
+			group->poll_min_period = period;
+			/* Destroy poll_task when the last trigger is destroyed */
+			if (group->poll_states == 0) {
+				group->polling_until = 0;
+				task_to_destroy = rcu_dereference_protected(
+						group->poll_task,
+						lockdep_is_held(&group->trigger_lock));
+				rcu_assign_pointer(group->poll_task, NULL);
+				del_timer(&group->poll_timer);
+			}
 		}
 	}
 
@@ -1436,27 +1460,19 @@ static int psi_cpu_show(struct seq_file *m, void *v)
 	return psi_show(m, &psi_system, PSI_CPU);
 }
 
-static int psi_open(struct file *file, int (*psi_show)(struct seq_file *, void *))
-{
-	if (file->f_mode & FMODE_WRITE && !capable(CAP_SYS_RESOURCE))
-		return -EPERM;
-
-	return single_open(file, psi_show, NULL);
-}
-
 static int psi_io_open(struct inode *inode, struct file *file)
 {
-	return psi_open(file, psi_io_show);
+	return single_open(file, psi_io_show, NULL);
}
 
 static int psi_memory_open(struct inode *inode, struct file *file)
 {
-	return psi_open(file, psi_memory_show);
+	return single_open(file, psi_memory_show, NULL);
 }
 
 static int psi_cpu_open(struct inode *inode, struct file *file)
 {
-	return psi_open(file, psi_cpu_show);
+	return single_open(file, psi_cpu_show, NULL);
 }
 
 static ssize_t psi_write(struct file *file, const char __user *user_buf,
@@ -1490,7 +1506,7 @@ static ssize_t psi_write(struct file *file, const char __user *user_buf,
 		return -EBUSY;
 	}
 
-	new = psi_trigger_create(&psi_system, buf, res);
+	new = psi_trigger_create(&psi_system, buf, res, file);
 	if (IS_ERR(new)) {
 		mutex_unlock(&seq->lock);
 		return PTR_ERR(new);
@@ -1570,7 +1586,7 @@ static int psi_irq_show(struct seq_file *m, void *v)
 
 static int psi_irq_open(struct inode *inode, struct file *file)
 {
-	return psi_open(file, psi_irq_show);
+	return single_open(file, psi_irq_show, NULL);
 }
 
 static ssize_t psi_irq_write(struct file *file, const char __user *user_buf,
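For reference, a userspace sketch of how a trigger created through this
interface is used, adapted from the monitor example in
Documentation/accounting/psi.rst. With this series applied, a process without
CAP_SYS_RESOURCE is assumed to need a window that is a multiple of 2 seconds
(here "some 500000 2000000": 500ms of "some" memory stall per 2s window);
other window sizes are expected to be rejected with EINVAL.

#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
	const char trig[] = "some 500000 2000000";
	struct pollfd fds;
	int n;

	fds.fd = open("/proc/pressure/memory", O_RDWR | O_NONBLOCK);
	if (fds.fd < 0) {
		perror("open /proc/pressure/memory");
		return 1;
	}
	/* Register the trigger; unprivileged callers need a 2s-multiple window. */
	if (write(fds.fd, trig, strlen(trig) + 1) < 0) {
		perror("write trigger");
		return 1;
	}
	fds.events = POLLPRI;

	while (1) {
		n = poll(&fds, 1, -1);
		if (n < 0) {
			perror("poll");
			return 1;
		}
		if (fds.revents & POLLERR) {
			fprintf(stderr, "got POLLERR, monitored resource went away\n");
			return 1;
		}
		if (fds.revents & POLLPRI)
			printf("memory pressure event\n");
	}
	return 0;
}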