From patchwork Tue Feb 13 18:44:27 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: James Morse
X-Patchwork-Id: 200557
From: James Morse
To: x86@kernel.org, linux-kernel@vger.kernel.org
Cc: Fenghua Yu, Reinette Chatre, Thomas Gleixner, Ingo Molnar,
 Borislav Petkov, H Peter Anvin, Babu Moger, James Morse,
 shameerali.kolothum.thodi@huawei.com, D Scott Phillips OS,
 carl@os.amperecomputing.com, lcherian@marvell.com,
 bobo.shaobowang@huawei.com, tan.shaopeng@fujitsu.com,
 baolin.wang@linux.alibaba.com, Jamie Iles, Xin Hao,
 peternewman@google.com, dfustini@baylibre.com, amitsinght@marvell.com,
 David Hildenbrand, Babu Moger
Subject: [PATCH v9 13/24] x86/resctrl: Queue mon_event_read() instead of sending an IPI
Date: Tue, 13 Feb 2024 18:44:27 +0000
Message-Id: <20240213184438.16675-14-james.morse@arm.com>
X-Mailer: git-send-email 2.20.1
In-Reply-To: <20240213184438.16675-1-james.morse@arm.com>
References: <20240213184438.16675-1-james.morse@arm.com>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org

Intel is blessed with an abundance of monitors, one per RMID, that can be
read from any CPU in the domain. MPAM's monitors reside in the MMIO MSC,
and the number implemented is up to the manufacturer. This means that when
there are fewer monitors than needed, they need to be allocated and freed.

MPAM's CSU monitors are used to back the 'llc_occupancy' monitor file. The
CSU counter is allowed to return 'not ready' for a small number of
microseconds after programming. To allow one CSU hardware monitor to be
used for multiple control or monitor groups, the CPU accessing the
monitor needs to be able to block when configuring and reading the
counter.

Worse, the domain may be broken up into slices, and the MMIO accesses
for each slice may need performing from different CPUs.

These two details mean MPAM's monitor code needs to be able to sleep, and
IPI another CPU in the domain to read from a resource that has been sliced.

mon_event_read() already invokes mon_event_count() via IPI, which means
neither is possible: the IPI handler runs in interrupt context. On systems
using nohz_full, some CPUs need to be interrupted to run kernel work as
they otherwise stay in user-space running realtime workloads. Interrupting
these CPUs should be avoided, and scheduling work on them may never
complete.

Change mon_event_read() to pick a housekeeping CPU (one that is not using
nohz_full), schedule mon_event_count() on it and wait. If all the CPUs
in a domain are using nohz_full, then an IPI is used as the fallback.

This function is only used in response to a user-space filesystem request
(not the timing sensitive overflow code).

This allows MPAM to hide the slice behaviour from resctrl, and to keep
the monitor-allocation in monitor.c. When the IPI fallback is used on
machines where MPAM needs to make an access on multiple CPUs, the counter
read will always fail.

Signed-off-by: James Morse
Tested-by: Shaopeng Tan
Tested-by: Peter Newman
Tested-by: Babu Moger
Tested-by: Carl Worth # arm64
Reviewed-by: Shaopeng Tan
Reviewed-by: Peter Newman
Reviewed-by: Reinette Chatre
Reviewed-by: Babu Moger
---
Changes since v2:
 * Use cpumask_any_housekeeping() and fallback to an IPI if needed.

Changes since v3:
 * Actually include the IPI fallback code.

Changes since v4:
 * Tinkered with existing capitalisation.

Changes since v5:
 * Added a newline.

Changes since v6:
 * Moved lockdep annotations to a later patch.
---
 arch/x86/kernel/cpu/resctrl/ctrlmondata.c | 26 +++++++++++++++++++++--
 arch/x86/kernel/cpu/resctrl/monitor.c     |  2 +-
 2 files changed, 25 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
index beccb0e87ba7..d07f99245851 100644
--- a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
+++ b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c
@@ -19,6 +19,8 @@
 #include
 #include
 #include
+#include
+
 #include "internal.h"
 
 /*
@@ -522,12 +524,21 @@ int rdtgroup_schemata_show(struct kernfs_open_file *of,
 	return ret;
 }
 
+static int smp_mon_event_count(void *arg)
+{
+	mon_event_count(arg);
+
+	return 0;
+}
+
 void mon_event_read(struct rmid_read *rr, struct rdt_resource *r,
 		    struct rdt_domain *d, struct rdtgroup *rdtgrp,
 		    int evtid, int first)
 {
+	int cpu;
+
 	/*
-	 * setup the parameters to send to the IPI to read the data.
+	 * Setup the parameters to pass to mon_event_count() to read the data.
 	 */
 	rr->rgrp = rdtgrp;
 	rr->evtid = evtid;
@@ -536,7 +547,18 @@ void mon_event_read(struct rmid_read *rr, struct rdt_resource *r,
 	rr->val = 0;
 	rr->first = first;
 
-	smp_call_function_any(&d->cpu_mask, mon_event_count, rr, 1);
+	cpu = cpumask_any_housekeeping(&d->cpu_mask);
+
+	/*
+	 * cpumask_any_housekeeping() prefers housekeeping CPUs, but
+	 * are all the CPUs nohz_full? If yes, pick a CPU to IPI.
+	 * MPAM's resctrl_arch_rmid_read() is unable to read the
+	 * counters on some platforms if its called in irq context.
+	 */
+	if (tick_nohz_full_cpu(cpu))
+		smp_call_function_any(&d->cpu_mask, mon_event_count, rr, 1);
+	else
+		smp_call_on_cpu(cpu, smp_mon_event_count, rr, false);
 }
 
 int rdtgroup_mondata_show(struct seq_file *m, void *arg)
diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index 38f85e53ca93..fd060ef86f38 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -585,7 +585,7 @@ static void mbm_bw_count(u32 closid, u32 rmid, struct rmid_read *rr)
 }
 
 /*
- * This is called via IPI to read the CQM/MBM counters
+ * This is scheduled by mon_event_read() to read the CQM/MBM counters
  * on a domain.
  */
 void mon_event_count(void *info)
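
Illustration only, not part of the patch: the dispatch decision made in
mon_event_read() can be modelled in plain userspace C. This is a rough
sketch under stated assumptions. A domain is a 64-bit mask, and every name
below (cpu_mask_t, any_housekeeping(), plan_counter_read(), the enum) is
hypothetical and only stands in for the kernel helpers. In particular,
picking the lowest-numbered CPU is a simplification and is not claimed to
match how cpumask_any_housekeeping() chooses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical userspace model: bit N set means CPU N is in the mask. */
typedef uint64_t cpu_mask_t;

/* How the counter read would be dispatched to the chosen CPU. */
enum dispatch {
	DISPATCH_SCHEDULE,	/* queue work and wait: callee may sleep */
	DISPATCH_IPI,		/* interrupt the CPU: callee must not sleep */
};

struct read_plan {
	int cpu;
	enum dispatch how;
};

/* Return the lowest-numbered CPU in the mask, or -1 if it is empty. */
static int mask_first(cpu_mask_t mask)
{
	for (int cpu = 0; cpu < 64; cpu++)
		if (mask & ((cpu_mask_t)1 << cpu))
			return cpu;
	return -1;
}

static bool cpu_is_nohz_full(int cpu, cpu_mask_t nohz_full)
{
	return (nohz_full & ((cpu_mask_t)1 << cpu)) != 0;
}

/*
 * Prefer a housekeeping CPU (in the domain, not nohz_full). If the
 * domain has none, fall back to any CPU in the domain.
 */
static int any_housekeeping(cpu_mask_t domain, cpu_mask_t nohz_full)
{
	int cpu = mask_first(domain & ~nohz_full);

	return cpu >= 0 ? cpu : mask_first(domain);
}

/*
 * Mirror the shape of the if/else added to mon_event_read(): schedule
 * on a housekeeping CPU when one exists, otherwise fall back to an IPI.
 */
static struct read_plan plan_counter_read(cpu_mask_t domain,
					  cpu_mask_t nohz_full)
{
	struct read_plan plan;

	assert(domain != 0);	/* a domain always has at least one CPU */

	plan.cpu = any_housekeeping(domain, nohz_full);
	plan.how = cpu_is_nohz_full(plan.cpu, nohz_full) ?
			DISPATCH_IPI : DISPATCH_SCHEDULE;
	return plan;
}
```

With CPUs 0-3 in the domain and CPUs 0-1 marked nohz_full, the model picks
CPU 2 and schedules the read. With all four marked nohz_full it returns a
CPU from the domain and reports the IPI fallback, which is the case the
commit message says will always fail on MPAM machines that need accesses
from multiple CPUs.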