Message ID | 166990903030.17806.5106229901730558377.stgit@bmoger-ubuntu |
---|---|
State | New |
Headers |
Return-Path: <linux-kernel-owner@vger.kernel.org> Delivered-To: ouuuleilei@gmail.com Received: by 2002:adf:f944:0:0:0:0:0 with SMTP id q4csp330208wrr; Thu, 1 Dec 2022 07:38:42 -0800 (PST) X-Google-Smtp-Source: AA0mqf6ibGJ0X90qg8eCbPzy6moJygb+ttmZRonJMbWFYvAsiOVT82+sd78G1g/OHMdTaEQBaXUG X-Received: by 2002:a17:90b:3e8b:b0:1fb:825c:af8a with SMTP id rj11-20020a17090b3e8b00b001fb825caf8amr31394074pjb.104.1669909122523; Thu, 01 Dec 2022 07:38:42 -0800 (PST) ARC-Seal: i=2; a=rsa-sha256; t=1669909122; cv=pass; d=google.com; s=arc-20160816; b=XN/WCikBVb6OKLoO04PseKc6J6lc7r6pc4luAV9/DSWtklSNEGu9zbvy+IWWyKXPAd 0EEM6ui3OT3zHZI5FiDqidVjnm1vf4mRBgVK7rOEDVG0RXkQ0BQN8cOm13BwoSHVgQRA Dr4HQwrbrrvVdYCwdDMGfeTFGNzWAGtiIQzaTzeA4ubckNEWTbOa4LUyb7m+92//V0v1 KHldnGZ0/N8xK59hDJLqiWip7dGPoBBneVNQCem5UmYActkdzN7EXv+gkOpL+5ifJJUA w2i/6wKPrqRexAc3+TQoIG4ckmD+bk3arIxQgjR57xWT2A3IwwSlCetdBWkmBESCbp6H GCbQ== ARC-Message-Signature: i=2; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :user-agent:references:in-reply-to:message-id:date:cc:to:from :subject:dkim-signature; bh=+uzowYfQobwSEZJ6ftLgb5iSVQSyJuZ59EqCdWfiPf4=; b=p2d8rAVX34ocjoA3hQpQJ+8L2xBcnyLbBVaavQhBSmJNhexKQCYYowqCoTsRu9xhFd fuxYUVSyObLbimfh0b/VVq3+AGt1f1Z/zLzKr3Xj+WeMKoVGxFQWv//RJUsBHiUeL+Lj J6hXXc8dtZ8xdEZw8iS83dSjk0OsNgfNco+3W5Ux0ZLv+LQ5MWEE2kP0azeOL6zL610a pbuK9LzWtGuTdQ4lIgsXP0ZykIcPCNgQRryTUzkIRaPqjyCMCa5rglCwFgEzMjHX9GK2 K5ogxGB/hEX4nnJTDNkOOnnF3OVxWShaTIyIyVbfI8NRm+yjqu/q84l2nnHfqjb8FMrl VHHg== ARC-Authentication-Results: i=2; mx.google.com; dkim=pass header.i=@amd.com header.s=selector1 header.b=eH36igBJ; arc=pass (i=1 spf=pass spfdomain=amd.com dmarc=pass fromdomain=amd.com); spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=QUARANTINE sp=QUARANTINE dis=NONE) header.from=amd.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id u6-20020a634546000000b00470693437b8si4845432pgk.550.2022.12.01.07.38.29; Thu, 01 Dec 2022 07:38:42 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@amd.com header.s=selector1 header.b=eH36igBJ; arc=pass (i=1 spf=pass spfdomain=amd.com dmarc=pass fromdomain=amd.com); spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=QUARANTINE sp=QUARANTINE dis=NONE) header.from=amd.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232020AbiLAPhr (ORCPT <rfc822;heyuhang3455@gmail.com> + 99 others); Thu, 1 Dec 2022 10:37:47 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:48590 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232005AbiLAPhY (ORCPT <rfc822;linux-kernel@vger.kernel.org>); Thu, 1 Dec 2022 10:37:24 -0500 Received: from NAM11-DM6-obe.outbound.protection.outlook.com (mail-dm6nam11on2083.outbound.protection.outlook.com [40.107.223.83]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id DBBC98D649; Thu, 1 Dec 2022 07:37:22 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=Sq3oCYo1/V8pNXGb9WTQC3r4mwyr+JjE6iV8RFYXEEi+Gb7cv19ytnyftI9uwvz6izpPWLkPLV8nn1J3nI8P3QPAoDnoEyEjb/D/EUuNP3XlmhPkR6XGwXmA6KyxGIAQkeJbgNjJP7xFz4tHJzdcKNmXHAyNRI5JNMqaFC7o+YRc7+traT7CAlbxo/yV1epfQLAwFiRAeBIcWZcNX9atu9zVG70r96YKhdw0s41fT5uqXt4LDjTHF7NQr292jAPxL+bwIYMqsPlcq1uoRFTz/YP2IwoAds7W5MgUf5ljT1pIVpOto1iQXNchLdk6PiX4ljnOsNNWIaln5AyxGlhgHA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=+uzowYfQobwSEZJ6ftLgb5iSVQSyJuZ59EqCdWfiPf4=; b=HkXC1YqasvEeLrlxAgdjBsCg/UOqhMYgYSiNCd8k8e1c2Qx8RrzkV/ZOQEfq3tYVDzECLIFzMjhG0GVs75tm0CrAtKbWo5PmMv/TFU1JgFJs8sqajp2NKKTkZ4eje+n0Hgl062kCEFcEm1Y0qVKKF65V16dbiAcs8fjbSpBq44ZoBG7UnJCwtyC3FSkb8hOSvK96rVuvxpTt+ot1Pavzxc2gwpyUAqpz7iOoA9wb/tBu7NYx4UQ2nuBmyPKVql3Cnu7vpui9hYag9LLG/d5juZFiQ6uzJSRpRnAyEveVuAAeHYY22nCb333wJsZIv9HONbKKDyMJMTL4aI9uLGEKFQ== ARC-Authentication-Results: i=1; mx.microsoft.com 1; spf=pass (sender ip is 165.204.84.17) smtp.rcpttodomain=linutronix.de smtp.mailfrom=amd.com; dmarc=pass (p=quarantine sp=quarantine pct=100) action=none header.from=amd.com; dkim=none (message not signed); arc=none DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=amd.com; s=selector1; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=+uzowYfQobwSEZJ6ftLgb5iSVQSyJuZ59EqCdWfiPf4=; b=eH36igBJfDm8lYZIbKG/Gp1jLhhOJA9wXi0tcAVLMmtsFRbBmuf3XzH8asAmn6KG0fn0+oUZMbjEw5hbtt2ZyQUwQyq5gVwFBXn/swM7VgX55k2gxZa/OHUWFJKWN31zZai1hzcjiugUck/eXTQyhaYtdu22lBItp4T7XHdkuew= Received: from MW4PR03CA0107.namprd03.prod.outlook.com (2603:10b6:303:b7::22) by DS0PR12MB7677.namprd12.prod.outlook.com (2603:10b6:8:136::13) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.5857.23; Thu, 1 Dec 2022 15:37:20 +0000 Received: from CO1NAM11FT009.eop-nam11.prod.protection.outlook.com (2603:10b6:303:b7:cafe::d7) by MW4PR03CA0107.outlook.office365.com (2603:10b6:303:b7::22) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.5880.8 via Frontend Transport; Thu, 1 Dec 2022 15:37:20 +0000 X-MS-Exchange-Authentication-Results: spf=pass (sender IP is 165.204.84.17) smtp.mailfrom=amd.com; dkim=none (message not signed) header.d=none;dmarc=pass action=none header.from=amd.com; Received-SPF: Pass (protection.outlook.com: domain of amd.com designates 165.204.84.17 as permitted sender) receiver=protection.outlook.com; client-ip=165.204.84.17; helo=SATLEXMB04.amd.com; pr=C Received: from SATLEXMB04.amd.com (165.204.84.17) by CO1NAM11FT009.mail.protection.outlook.com (10.13.175.61) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.20.5880.8 via Frontend Transport; Thu, 1 Dec 2022 15:37:19 +0000 Received: from [127.0.1.1] (10.180.168.240) by SATLEXMB04.amd.com (10.181.40.145) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2375.34; Thu, 1 Dec 2022 09:37:15 -0600 Subject: [PATCH v9 10/13] x86/resctrl: Add sysfs interface to write mbm_total_bytes_config From: Babu Moger <babu.moger@amd.com> To: <corbet@lwn.net>, <reinette.chatre@intel.com>, <tglx@linutronix.de>, <mingo@redhat.com>, <bp@alien8.de> CC: <fenghua.yu@intel.com>, <dave.hansen@linux.intel.com>, <x86@kernel.org>, <hpa@zytor.com>, <paulmck@kernel.org>, <akpm@linux-foundation.org>, <quic_neeraju@quicinc.com>, <rdunlap@infradead.org>, <damien.lemoal@opensource.wdc.com>, <songmuchun@bytedance.com>, <peterz@infradead.org>, <jpoimboe@kernel.org>, <pbonzini@redhat.com>, <babu.moger@amd.com>, <chang.seok.bae@intel.com>, <pawan.kumar.gupta@linux.intel.com>, <jmattson@google.com>, <daniel.sneddon@linux.intel.com>, <sandipan.das@amd.com>, <tony.luck@intel.com>, <james.morse@arm.com>, <linux-doc@vger.kernel.org>, <linux-kernel@vger.kernel.org>, <bagasdotme@gmail.com>, <eranian@google.com>, <christophe.leroy@csgroup.eu>, <pawan.kumar.gupta@linux.intel.com>, <jarkko@kernel.org>, <adrian.hunter@intel.com>, <quic_jiles@quicinc.com>, <peternewman@google.com> Date: Thu, 1 Dec 2022 09:37:10 -0600 Message-ID: <166990903030.17806.5106229901730558377.stgit@bmoger-ubuntu> In-Reply-To: <166990882621.17806.16780480657453071426.stgit@bmoger-ubuntu> References: <166990882621.17806.16780480657453071426.stgit@bmoger-ubuntu> User-Agent: StGit/1.1.dev103+g5369f4c MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable X-Originating-IP: [10.180.168.240] X-ClientProxiedBy: SATLEXMB03.amd.com (10.181.40.144) To SATLEXMB04.amd.com (10.181.40.145) X-EOPAttributedMessage: 0 X-MS-PublicTrafficType: Email X-MS-TrafficTypeDiagnostic: CO1NAM11FT009:EE_|DS0PR12MB7677:EE_ X-MS-Office365-Filtering-Correlation-Id: 0fea3cfc-724e-483c-192a-08dad3b1ef22 X-MS-Exchange-SenderADCheck: 1 X-MS-Exchange-AntiSpam-Relay: 0 X-Microsoft-Antispam: BCL:0; X-Microsoft-Antispam-Message-Info: Lzb603UZF5EcBkYVBnp2ADGjTZn+TRdB+TDYW2HkOU7CIAv3HGoBHAzXqcXhLYUQ1neQ59f8UIbta4RKhL6xqK8MNFF0kCAuh85hMhD37f0YOvVN42F634eBSUteC9WQJbu8tyx3iYTShfT5P0GgMi1nE8MFW2Sft+Ya1VfcGs8ZsyVvd8vIvEqveBn/W0Y0rntU0H/Atrs/TUP+JUwcDyJqS4fEiu+xDr8Em4H8d235ZsIhvTb9y9PFXgxc3pUbm76XEY/xxlHncedNS30ydByEmEvu/GeVa60UuTRrhjKSEwzICGZGugHNpVk0kqJNasPH/E6GAGrdlABt3fCjlsqamC16Xkt25i6k15uSzSfIm5sRJim0rdmAU1acfUOHGqkvEltUC18ePFixlH3gOZPr1GtWwseXCyUg4hXften9QuDTQndwjgxSC7JOY+z3m1vXJCgseBv0rurviTfAZ6OB41dQyyM2Zj5nJDrRHYHS3RmM6+97SotBSWV7pYHb4DqtxeJS72+QM0I1kTA5WaGlR0Ak8F/LuHhI6QAAyDBc787cQ3MKhTL/ayq5E5atoTgxMgcR86RKqHQZpn+VbUwL6oGKxNUoPezqiac7sGBwUdFi9qNwwPfDz8hzRfCgGlFfT3fb32uc5fzID1jmTjjX4j5AQFc8nBT1PPV2Ho+veNrWrrcd9XE9wzNVZvd18kXzUKM7dADjVDWVy1RYdZGnTmAVLTdGJXH7ZCxbdyzyAOpyZpXqri2gj+4HBtX57BPDri0LYtYLSf3cyzbq2g== X-Forefront-Antispam-Report: CIP:165.204.84.17;CTRY:US;LANG:en;SCL:1;SRV:;IPV:CAL;SFV:NSPM;H:SATLEXMB04.amd.com;PTR:InfoDomainNonexistent;CAT:NONE;SFS:(13230022)(4636009)(7916004)(136003)(396003)(376002)(346002)(39860400002)(451199015)(40470700004)(36840700001)(46966006)(41300700001)(5660300002)(86362001)(7406005)(81166007)(110136005)(7416002)(26005)(8936002)(16576012)(54906003)(316002)(40460700003)(426003)(336012)(47076005)(103116003)(186003)(83380400001)(40480700001)(82310400005)(4326008)(8676002)(33716001)(70586007)(70206006)(44832011)(36860700001)(478600001)(16526019)(9686003)(82740400003)(6666004)(356005)(2906002)(71626007)(36900700001);DIR:OUT;SFP:1101; X-OriginatorOrg: amd.com X-MS-Exchange-CrossTenant-OriginalArrivalTime: 01 Dec 2022 15:37:19.9445 (UTC) X-MS-Exchange-CrossTenant-Network-Message-Id: 0fea3cfc-724e-483c-192a-08dad3b1ef22 X-MS-Exchange-CrossTenant-Id: 3dd8961f-e488-4e60-8e11-a82d994e183d X-MS-Exchange-CrossTenant-OriginalAttributedTenantConnectingIp: TenantId=3dd8961f-e488-4e60-8e11-a82d994e183d;Ip=[165.204.84.17];Helo=[SATLEXMB04.amd.com] X-MS-Exchange-CrossTenant-AuthSource: CO1NAM11FT009.eop-nam11.prod.protection.outlook.com X-MS-Exchange-CrossTenant-AuthAs: Anonymous X-MS-Exchange-CrossTenant-FromEntityHeader: HybridOnPrem X-MS-Exchange-Transport-CrossTenantHeadersStamped: DS0PR12MB7677 X-Spam-Status: No, score=-2.1 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, RCVD_IN_MSPIKE_H2,SPF_HELO_PASS,SPF_PASS autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: <linux-kernel.vger.kernel.org> X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1751026628057914050?= X-GMAIL-MSGID: =?utf-8?q?1751026628057914050?= |
Series |
Support for AMD QoS new features
|
|
Commit Message
Moger, Babu
Dec. 1, 2022, 3:37 p.m. UTC
The current event configuration for mbm_total_bytes can be changed by
the user by writing to the file
/sys/fs/resctrl/info/L3_MON/mbm_total_bytes_config.
The event configuration settings are domain specific and will affect all
the CPUs in the domain.
Following are the types of events supported:
==== ===========================================================
Bits Description
==== ===========================================================
6 Dirty Victims from the QOS domain to all types of memory
5 Reads to slow memory in the non-local NUMA domain
4 Reads to slow memory in the local NUMA domain
3 Non-temporal writes to non-local NUMA domain
2 Non-temporal writes to local NUMA domain
1 Reads to memory in the non-local NUMA domain
0 Reads to memory in the local NUMA domain
==== ===========================================================
For example:
To change the mbm_total_bytes to count only reads on domain 0, the bits
0, 1, 4 and 5 needs to be set, which is 110011b (in hex 0x33). Run the
command.
$echo 0=0x33 > /sys/fs/resctrl/info/L3_MON/mbm_total_bytes_config
To change the mbm_total_bytes to count all the slow memory reads on
domain 1, the bits 4 and 5 needs to be set which is 110000b (in hex 0x30).
Run the command.
$echo 1=0x30 > /sys/fs/resctrl/info/L3_MON/mbm_total_bytes_config
Signed-off-by: Babu Moger <babu.moger@amd.com>
---
arch/x86/kernel/cpu/resctrl/monitor.c | 13 +++
arch/x86/kernel/cpu/resctrl/rdtgroup.c | 127 ++++++++++++++++++++++++++++++++
include/linux/resctrl.h | 10 +++
3 files changed, 149 insertions(+), 1 deletion(-)
Comments
Hi Babu, On 12/1/2022 7:37 AM, Babu Moger wrote: > The current event configuration for mbm_total_bytes can be changed by > the user by writing to the file > /sys/fs/resctrl/info/L3_MON/mbm_total_bytes_config. Please drop "current" from above > > The event configuration settings are domain specific and will affect all > the CPUs in the domain. please drop "will" > > Following are the types of events supported: > > ==== =========================================================== > Bits Description > ==== =========================================================== > 6 Dirty Victims from the QOS domain to all types of memory > 5 Reads to slow memory in the non-local NUMA domain > 4 Reads to slow memory in the local NUMA domain > 3 Non-temporal writes to non-local NUMA domain > 2 Non-temporal writes to local NUMA domain > 1 Reads to memory in the non-local NUMA domain > 0 Reads to memory in the local NUMA domain > ==== =========================================================== > > For example: > To change the mbm_total_bytes to count only reads on domain 0, the bits > 0, 1, 4 and 5 needs to be set, which is 110011b (in hex 0x33). Run the > command. > $echo 0=0x33 > /sys/fs/resctrl/info/L3_MON/mbm_total_bytes_config > > To change the mbm_total_bytes to count all the slow memory reads on > domain 1, the bits 4 and 5 needs to be set which is 110000b (in hex 0x30). > Run the command. > $echo 1=0x30 > /sys/fs/resctrl/info/L3_MON/mbm_total_bytes_config > > Signed-off-by: Babu Moger <babu.moger@amd.com> > --- > arch/x86/kernel/cpu/resctrl/monitor.c | 13 +++ > arch/x86/kernel/cpu/resctrl/rdtgroup.c | 127 ++++++++++++++++++++++++++++++++ > include/linux/resctrl.h | 10 +++ > 3 files changed, 149 insertions(+), 1 deletion(-) > > diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c > index 7c8a3a745041..b265856835de 100644 > --- a/arch/x86/kernel/cpu/resctrl/monitor.c > +++ b/arch/x86/kernel/cpu/resctrl/monitor.c > @@ -176,6 +176,19 @@ void resctrl_arch_reset_rmid(struct rdt_resource *r, struct rdt_domain *d, > memset(am, 0, sizeof(*am)); > } > > +void resctrl_arch_reset_rmid_all(struct rdt_resource *r, struct rdt_domain *d) > +{ > + struct rdt_hw_domain *hw_dom = resctrl_to_arch_dom(d); > + > + if (is_mbm_total_enabled()) > + memset(hw_dom->arch_mbm_total, 0, > + sizeof(*hw_dom->arch_mbm_total) * r->num_rmid); > + > + if (is_mbm_local_enabled()) > + memset(hw_dom->arch_mbm_local, 0, > + sizeof(*hw_dom->arch_mbm_local) * r->num_rmid); > +} > + We learned a lot more about this area after Peter's discovery: https://lore.kernel.org/lkml/20221207112924.3602960-1-peternewman@google.com/ Since this is a new generic function it should be clear in which scenarios it is valid. Could you please add a function comment that warns future developers about consequences if a new usage is considered? Something like: /* * Assumes that hardware counters are also reset and thus that there is no need * to record initial non-zero counts. */ > static u64 mbm_overflow_count(u64 prev_msr, u64 cur_msr, unsigned int width) > { > u64 shift = 64 - width, chunks; > diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c > index 580f3cce19e2..8a22a652a6e8 100644 > --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c > +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c > @@ -1517,6 +1517,130 @@ static int mbm_local_bytes_config_show(struct kernfs_open_file *of, > return 0; > } > > +static void mon_event_config_write(void *info) > +{ > + struct mon_config_info *mon_info = info; > + u32 index; > + index does not need to be u32 ... mon_event_config_index_get() returns "unsigned int" and wrmsr expects "unsigned int", it can also just be "unsigned int". > + index = mon_event_config_index_get(mon_info->evtid); > + if (index == INVALID_CONFIG_INDEX) { > + pr_warn_once("Invalid event id %d\n", mon_info->evtid); > + return; > + } > + wrmsr(MSR_IA32_EVT_CFG_BASE + index, mon_info->mon_config, 0); > +} > + > +static int mbm_config_write_domain(struct rdt_resource *r, > + struct rdt_domain *d, u32 evtid, u32 val) > +{ > + struct mon_config_info mon_info = {0}; > + int ret = 0; > + > + /* mon_config cannot be more than the supported set of events */ > + if (val > MAX_EVT_CONFIG_BITS) { > + rdt_last_cmd_puts("Invalid event configuration\n"); > + return -EINVAL; > + } > + > + /* > + * Read the current config value first. If both are the same then > + * no need to write it again. > + */ > + mon_info.evtid = evtid; > + mondata_config_read(d, &mon_info); > + if (mon_info.mon_config == val) > + goto out; > + > + mon_info.mon_config = val; > + > + /* > + * Update MSR_IA32_EVT_CFG_BASE MSRs on all the CPUs in the > + * domain. The MSRs offset from MSR MSR_IA32_EVT_CFG_BASE > + * are scoped at the domain level. Writing any of these MSRs > + * on one CPU is supposed to be observed by all CPUs in the > + * domain. However, the hardware team recommends to update > + * these MSRs on all the CPUs in the domain. > + */ > + on_each_cpu_mask(&d->cpu_mask, mon_event_config_write, &mon_info, 1); > + > + /* > + * When an Event Configuration is changed, the bandwidth counters > + * for all RMIDs and Events will be cleared by the hardware. The > + * hardware also sets MSR_IA32_QM_CTR.Unavailable (bit 62) for > + * every RMID on the next read to any event for every RMID. > + * Subsequent reads will have MSR_IA32_QM_CTR.Unavailable (bit 62) > + * cleared while it is tracked by the hardware. Clear the > + * mbm_local and mbm_total counts for all the RMIDs. > + */ > + resctrl_arch_reset_rmid_all(r, d); If I understand correctly the expectation is that when user space read counters (via mon_data files) right after the configuration was changed then this read will return "Unavailable" and then the next read will return data. If this is the case then I think a snippet about this user experience would be helpful to add to the documentation. Have you considered doing a preemptive read on the RMIDs that are in use to avoid users encountering "Unavailable"? I assume doing so on a busy system could potentially involve hundreds of register reads/writes. > + > +out: > + return ret; > +} > + > +static int mon_config_write(struct rdt_resource *r, char *tok, u32 evtid) > +{ > + char *dom_str = NULL, *id_str; > + unsigned long dom_id, val; > + struct rdt_domain *d; > + int ret = 0; > + > +next: > + if (!tok || tok[0] == '\0') > + return 0; > + > + /* Start processing the strings for each domain */ > + dom_str = strim(strsep(&tok, ";")); > + id_str = strsep(&dom_str, "="); > + > + if (!id_str || kstrtoul(id_str, 10, &dom_id)) { > + rdt_last_cmd_puts("Missing '=' or non-numeric domain id\n"); > + return -EINVAL; > + } > + > + if (!dom_str || kstrtoul(dom_str, 16, &val)) { > + rdt_last_cmd_puts("Non-numeric event configuration value\n"); > + return -EINVAL; > + } > + > + list_for_each_entry(d, &r->domains, list) { > + if (d->id == dom_id) { > + ret = mbm_config_write_domain(r, d, evtid, val); > + if (ret) > + return -EINVAL; > + goto next; > + } > + } > + > + return -EINVAL; > +} > + > +static ssize_t mbm_total_bytes_config_write(struct kernfs_open_file *of, > + char *buf, size_t nbytes, > + loff_t off) > +{ > + struct rdt_resource *r = of->kn->parent->priv; > + int ret; > + > + /* Valid input requires a trailing newline */ > + if (nbytes == 0 || buf[nbytes - 1] != '\n') > + return -EINVAL; > + > + cpus_read_lock(); > + mutex_lock(&rdtgroup_mutex); > + > + rdt_last_cmd_clear(); > + > + buf[nbytes - 1] = '\0'; > + > + ret = mon_config_write(r, buf, QOS_L3_MBM_TOTAL_EVENT_ID); > + > + mutex_unlock(&rdtgroup_mutex); > + cpus_read_unlock(); > + > + return ret ?: nbytes; > +} > + > /* rdtgroup information files for one cache resource. */ > static struct rftype res_common_files[] = { > { > @@ -1617,9 +1741,10 @@ static struct rftype res_common_files[] = { > }, > { > .name = "mbm_total_bytes_config", > - .mode = 0444, > + .mode = 0644, > .kf_ops = &rdtgroup_kf_single_ops, > .seq_show = mbm_total_bytes_config_show, > + .write = mbm_total_bytes_config_write, > }, > { > .name = "mbm_local_bytes_config", > diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h > index 0cee154abc9f..e4dc65892446 100644 > --- a/include/linux/resctrl.h > +++ b/include/linux/resctrl.h > @@ -250,6 +250,16 @@ int resctrl_arch_rmid_read(struct rdt_resource *r, struct rdt_domain *d, > void resctrl_arch_reset_rmid(struct rdt_resource *r, struct rdt_domain *d, > u32 rmid, enum resctrl_event_id eventid); > > +/** > + * resctrl_arch_reset_rmid_all() - Reset any private state associated with > + * all the rmids. It could be more explicit: "Reset all private state associated with all rmids and eventids." > + * @r: The domain's resource. > + * @d: The rmid's domain. This copy&paste needs some changes to match this new utility. How about: @r: The resctrl resource. @d: The domain for which all architectural counter state will be cleared. I think it can be improved more but the above could be a start (please do not copy verbatim but ensure style is correct.) Keep in mind that this utility does not clear the non-architectural counter state. This does not apply to AMD since that state is used by the software controller, but it needs to be kept in mind if another usage for this utility arises. > + * > + * This can be called from any CPU. > + */ > +void resctrl_arch_reset_rmid_all(struct rdt_resource *r, struct rdt_domain *d); > + > extern unsigned int resctrl_rmid_realloc_threshold; > extern unsigned int resctrl_rmid_realloc_limit; > > > The above hunk fails the "no spaces before tabs" checkpatch check. Reinette
[AMD Official Use Only - General] Hi Reinette, > -----Original Message----- > From: Reinette Chatre <reinette.chatre@intel.com> > Sent: Thursday, December 15, 2022 12:25 PM > To: Moger, Babu <Babu.Moger@amd.com>; corbet@lwn.net; > tglx@linutronix.de; mingo@redhat.com; bp@alien8.de > Cc: fenghua.yu@intel.com; dave.hansen@linux.intel.com; x86@kernel.org; > hpa@zytor.com; paulmck@kernel.org; akpm@linux-foundation.org; > quic_neeraju@quicinc.com; rdunlap@infradead.org; > damien.lemoal@opensource.wdc.com; songmuchun@bytedance.com; > peterz@infradead.org; jpoimboe@kernel.org; pbonzini@redhat.com; > chang.seok.bae@intel.com; pawan.kumar.gupta@linux.intel.com; > jmattson@google.com; daniel.sneddon@linux.intel.com; Das1, Sandipan > <Sandipan.Das@amd.com>; tony.luck@intel.com; james.morse@arm.com; > linux-doc@vger.kernel.org; linux-kernel@vger.kernel.org; > bagasdotme@gmail.com; eranian@google.com; christophe.leroy@csgroup.eu; > jarkko@kernel.org; adrian.hunter@intel.com; quic_jiles@quicinc.com; > peternewman@google.com > Subject: Re: [PATCH v9 10/13] x86/resctrl: Add sysfs interface to write > mbm_total_bytes_config > > Hi Babu, > > On 12/1/2022 7:37 AM, Babu Moger wrote: > > The current event configuration for mbm_total_bytes can be changed by > > the user by writing to the file > > /sys/fs/resctrl/info/L3_MON/mbm_total_bytes_config. > > Please drop "current" from above Sure. > > > > > The event configuration settings are domain specific and will affect > > all the CPUs in the domain. > > please drop "will" Ok > > > > > Following are the types of events supported: > > > > ==== > =========================================================== > > Bits Description > > ==== > =========================================================== > > 6 Dirty Victims from the QOS domain to all types of memory > > 5 Reads to slow memory in the non-local NUMA domain > > 4 Reads to slow memory in the local NUMA domain > > 3 Non-temporal writes to non-local NUMA domain > > 2 Non-temporal writes to local NUMA domain > > 1 Reads to memory in the non-local NUMA domain > > 0 Reads to memory in the local NUMA domain > > ==== > =========================================================== > > > > For example: > > To change the mbm_total_bytes to count only reads on domain 0, the > > bits 0, 1, 4 and 5 needs to be set, which is 110011b (in hex 0x33). > > Run the command. > > $echo 0=0x33 > /sys/fs/resctrl/info/L3_MON/mbm_total_bytes_config > > > > To change the mbm_total_bytes to count all the slow memory reads on > > domain 1, the bits 4 and 5 needs to be set which is 110000b (in hex 0x30). > > Run the command. > > $echo 1=0x30 > /sys/fs/resctrl/info/L3_MON/mbm_total_bytes_config > > > > Signed-off-by: Babu Moger <babu.moger@amd.com> > > --- > > arch/x86/kernel/cpu/resctrl/monitor.c | 13 +++ > > arch/x86/kernel/cpu/resctrl/rdtgroup.c | 127 > ++++++++++++++++++++++++++++++++ > > include/linux/resctrl.h | 10 +++ > > 3 files changed, 149 insertions(+), 1 deletion(-) > > > > diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c > > b/arch/x86/kernel/cpu/resctrl/monitor.c > > index 7c8a3a745041..b265856835de 100644 > > --- a/arch/x86/kernel/cpu/resctrl/monitor.c > > +++ b/arch/x86/kernel/cpu/resctrl/monitor.c > > @@ -176,6 +176,19 @@ void resctrl_arch_reset_rmid(struct rdt_resource *r, > struct rdt_domain *d, > > memset(am, 0, sizeof(*am)); > > } > > > > +void resctrl_arch_reset_rmid_all(struct rdt_resource *r, struct > > +rdt_domain *d) { > > + struct rdt_hw_domain *hw_dom = resctrl_to_arch_dom(d); > > + > > + if (is_mbm_total_enabled()) > > + memset(hw_dom->arch_mbm_total, 0, > > + sizeof(*hw_dom->arch_mbm_total) * r->num_rmid); > > + > > + if (is_mbm_local_enabled()) > > + memset(hw_dom->arch_mbm_local, 0, > > + sizeof(*hw_dom->arch_mbm_local) * r->num_rmid); } > > + > > We learned a lot more about this area after Peter's discovery: > https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flore.kern > el.org%2Flkml%2F20221207112924.3602960-1- > peternewman%40google.com%2F&data=05%7C01%7Cbabu.moger%40am > d.com%7Cc9ca23a7b3c643d2385508dadec9f83d%7C3dd8961fe4884e608e11a8 > 2d994e183d%7C0%7C0%7C638067256273943013%7CUnknown%7CTWFpbGZsb > 3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0% > 3D%7C3000%7C%7C%7C&sdata=Pmvi2L4L727GqALGaeEO0MbpcnuRcqKc > opNNXuu%2BbFM%3D&reserved=0 > > Since this is a new generic function it should be clear in which scenarios it is > valid. > Could you please add a function comment that warns future developers about > consequences if a new usage is considered? Something like: > > /* > * Assumes that hardware counters are also reset and thus that there is no > need > * to record initial non-zero counts. > */ Ok. Sure. > > > static u64 mbm_overflow_count(u64 prev_msr, u64 cur_msr, unsigned int > > width) { > > u64 shift = 64 - width, chunks; > > diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c > > b/arch/x86/kernel/cpu/resctrl/rdtgroup.c > > index 580f3cce19e2..8a22a652a6e8 100644 > > --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c > > +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c > > @@ -1517,6 +1517,130 @@ static int mbm_local_bytes_config_show(struct > kernfs_open_file *of, > > return 0; > > } > > > > +static void mon_event_config_write(void *info) { > > + struct mon_config_info *mon_info = info; > > + u32 index; > > + > > index does not need to be u32 ... mon_event_config_index_get() returns > "unsigned int" > and wrmsr expects "unsigned int", it can also just be "unsigned int". Ok. > > > > + index = mon_event_config_index_get(mon_info->evtid); > > + if (index == INVALID_CONFIG_INDEX) { > > + pr_warn_once("Invalid event id %d\n", mon_info->evtid); > > + return; > > + } > > + wrmsr(MSR_IA32_EVT_CFG_BASE + index, mon_info->mon_config, 0); > } > > + > > +static int mbm_config_write_domain(struct rdt_resource *r, > > + struct rdt_domain *d, u32 evtid, u32 val) { > > + struct mon_config_info mon_info = {0}; > > + int ret = 0; > > + > > + /* mon_config cannot be more than the supported set of events */ > > + if (val > MAX_EVT_CONFIG_BITS) { > > + rdt_last_cmd_puts("Invalid event configuration\n"); > > + return -EINVAL; > > + } > > + > > + /* > > + * Read the current config value first. If both are the same then > > + * no need to write it again. > > + */ > > + mon_info.evtid = evtid; > > + mondata_config_read(d, &mon_info); > > + if (mon_info.mon_config == val) > > + goto out; > > + > > + mon_info.mon_config = val; > > + > > + /* > > + * Update MSR_IA32_EVT_CFG_BASE MSRs on all the CPUs in the > > + * domain. The MSRs offset from MSR MSR_IA32_EVT_CFG_BASE > > + * are scoped at the domain level. Writing any of these MSRs > > + * on one CPU is supposed to be observed by all CPUs in the > > + * domain. However, the hardware team recommends to update > > + * these MSRs on all the CPUs in the domain. > > + */ > > + on_each_cpu_mask(&d->cpu_mask, mon_event_config_write, > &mon_info, > > +1); > > + > > + /* > > + * When an Event Configuration is changed, the bandwidth counters > > + * for all RMIDs and Events will be cleared by the hardware. The > > + * hardware also sets MSR_IA32_QM_CTR.Unavailable (bit 62) for > > + * every RMID on the next read to any event for every RMID. > > + * Subsequent reads will have MSR_IA32_QM_CTR.Unavailable (bit 62) > > + * cleared while it is tracked by the hardware. Clear the > > + * mbm_local and mbm_total counts for all the RMIDs. > > + */ > > + resctrl_arch_reset_rmid_all(r, d); > > If I understand correctly the expectation is that when user space read counters > (via mon_data files) right after the configuration was changed then this read > will return "Unavailable" and then the next read will return data. > > If this is the case then I think a snippet about this user experience would be > helpful to add to the documentation. Ok. How about this in the documentation? "When an event configuration is changed, the bandwidth counters for all the RMIDs and the events will be cleared for that domain. The next read for every RMID will report "Unavailable" and subsequent reads will report the valid value." > > Have you considered doing a preemptive read on the RMIDs that are in use to > avoid users encountering "Unavailable"? I assume doing so on a busy system > could potentially involve hundreds of register reads/writes. No. I have not tried that. > > > + > > +out: > > + return ret; > > +} > > + > > +static int mon_config_write(struct rdt_resource *r, char *tok, u32 > > +evtid) { > > + char *dom_str = NULL, *id_str; > > + unsigned long dom_id, val; > > + struct rdt_domain *d; > > + int ret = 0; > > + > > +next: > > + if (!tok || tok[0] == '\0') > > + return 0; > > + > > + /* Start processing the strings for each domain */ > > + dom_str = strim(strsep(&tok, ";")); > > + id_str = strsep(&dom_str, "="); > > + > > + if (!id_str || kstrtoul(id_str, 10, &dom_id)) { > > + rdt_last_cmd_puts("Missing '=' or non-numeric domain id\n"); > > + return -EINVAL; > > + } > > + > > + if (!dom_str || kstrtoul(dom_str, 16, &val)) { > > + rdt_last_cmd_puts("Non-numeric event configuration > value\n"); > > + return -EINVAL; > > + } > > + > > + list_for_each_entry(d, &r->domains, list) { > > + if (d->id == dom_id) { > > + ret = mbm_config_write_domain(r, d, evtid, val); > > + if (ret) > > + return -EINVAL; > > + goto next; > > + } > > + } > > + > > + return -EINVAL; > > +} > > + > > +static ssize_t mbm_total_bytes_config_write(struct kernfs_open_file *of, > > + char *buf, size_t nbytes, > > + loff_t off) > > +{ > > + struct rdt_resource *r = of->kn->parent->priv; > > + int ret; > > + > > + /* Valid input requires a trailing newline */ > > + if (nbytes == 0 || buf[nbytes - 1] != '\n') > > + return -EINVAL; > > + > > + cpus_read_lock(); > > + mutex_lock(&rdtgroup_mutex); > > + > > + rdt_last_cmd_clear(); > > + > > + buf[nbytes - 1] = '\0'; > > + > > + ret = mon_config_write(r, buf, QOS_L3_MBM_TOTAL_EVENT_ID); > > + > > + mutex_unlock(&rdtgroup_mutex); > > + cpus_read_unlock(); > > + > > + return ret ?: nbytes; > > +} > > + > > /* rdtgroup information files for one cache resource. */ static > > struct rftype res_common_files[] = { > > { > > @@ -1617,9 +1741,10 @@ static struct rftype res_common_files[] = { > > }, > > { > > .name = "mbm_total_bytes_config", > > - .mode = 0444, > > + .mode = 0644, > > .kf_ops = &rdtgroup_kf_single_ops, > > .seq_show = mbm_total_bytes_config_show, > > + .write = mbm_total_bytes_config_write, > > }, > > { > > .name = "mbm_local_bytes_config", > > diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h index > > 0cee154abc9f..e4dc65892446 100644 > > --- a/include/linux/resctrl.h > > +++ b/include/linux/resctrl.h > > @@ -250,6 +250,16 @@ int resctrl_arch_rmid_read(struct rdt_resource > > *r, struct rdt_domain *d, void resctrl_arch_reset_rmid(struct rdt_resource > *r, struct rdt_domain *d, > > u32 rmid, enum resctrl_event_id eventid); > > > > +/** > > + * resctrl_arch_reset_rmid_all() - Reset any private state associated with > > + * all the rmids. > > It could be more explicit: > "Reset all private state associated with all rmids and eventids." Sure. > > > + * @r: The domain's resource. > > + * @d: The rmid's domain. > > This copy&paste needs some changes to match this new utility. > How about: > @r: The resctrl resource. > @d: The domain for which all architectural counter state will be cleared. Sure. > > I think it can be improved more but the above could be a start (please do not > copy verbatim but ensure style is correct.) > > Keep in mind that this utility does not clear the non-architectural counter state. > This does not apply to AMD since that state is used by the software controller, > but it needs to be kept in mind if another usage for this utility arises. Sure. Will add this text in the description. > > > + * > > + * This can be called from any CPU. > > + */ > > +void resctrl_arch_reset_rmid_all(struct rdt_resource *r, struct > > +rdt_domain *d); > > + > > extern unsigned int resctrl_rmid_realloc_threshold; extern unsigned > > int resctrl_rmid_realloc_limit; > > > > > > > > The above hunk fails the "no spaces before tabs" checkpatch check. Sure. Will check. Thanks Babu > > Reinette
[AMD Official Use Only - General] Hi Reinette, > -----Original Message----- > From: Reinette Chatre <reinette.chatre@intel.com> > Sent: Thursday, December 15, 2022 12:25 PM > To: Moger, Babu <Babu.Moger@amd.com>; corbet@lwn.net; > tglx@linutronix.de; mingo@redhat.com; bp@alien8.de > Cc: fenghua.yu@intel.com; dave.hansen@linux.intel.com; x86@kernel.org; > hpa@zytor.com; paulmck@kernel.org; akpm@linux-foundation.org; > quic_neeraju@quicinc.com; rdunlap@infradead.org; > damien.lemoal@opensource.wdc.com; songmuchun@bytedance.com; > peterz@infradead.org; jpoimboe@kernel.org; pbonzini@redhat.com; > chang.seok.bae@intel.com; pawan.kumar.gupta@linux.intel.com; > jmattson@google.com; daniel.sneddon@linux.intel.com; Das1, Sandipan > <Sandipan.Das@amd.com>; tony.luck@intel.com; james.morse@arm.com; > linux-doc@vger.kernel.org; linux-kernel@vger.kernel.org; > bagasdotme@gmail.com; eranian@google.com; christophe.leroy@csgroup.eu; > jarkko@kernel.org; adrian.hunter@intel.com; quic_jiles@quicinc.com; > peternewman@google.com > Subject: Re: [PATCH v9 10/13] x86/resctrl: Add sysfs interface to write > mbm_total_bytes_config > > Hi Babu, > > On 12/1/2022 7:37 AM, Babu Moger wrote: > > The current event configuration for mbm_total_bytes can be changed by > > the user by writing to the file > > /sys/fs/resctrl/info/L3_MON/mbm_total_bytes_config. > > Please drop "current" from above > > > > > The event configuration settings are domain specific and will affect > > all the CPUs in the domain. > > please drop "will" > > > > > Following are the types of events supported: > > > > ==== > =========================================================== > > Bits Description > > ==== > =========================================================== > > 6 Dirty Victims from the QOS domain to all types of memory > > 5 Reads to slow memory in the non-local NUMA domain > > 4 Reads to slow memory in the local NUMA domain > > 3 Non-temporal writes to non-local NUMA domain > > 2 Non-temporal writes to local NUMA domain > > 1 Reads to memory in the non-local NUMA domain > > 0 Reads to memory in the local NUMA domain > > ==== > =========================================================== > > > > For example: > > To change the mbm_total_bytes to count only reads on domain 0, the > > bits 0, 1, 4 and 5 needs to be set, which is 110011b (in hex 0x33). > > Run the command. > > $echo 0=0x33 > /sys/fs/resctrl/info/L3_MON/mbm_total_bytes_config > > > > To change the mbm_total_bytes to count all the slow memory reads on > > domain 1, the bits 4 and 5 needs to be set which is 110000b (in hex 0x30). > > Run the command. > > $echo 1=0x30 > /sys/fs/resctrl/info/L3_MON/mbm_total_bytes_config > > > > Signed-off-by: Babu Moger <babu.moger@amd.com> > > --- > > arch/x86/kernel/cpu/resctrl/monitor.c | 13 +++ > > arch/x86/kernel/cpu/resctrl/rdtgroup.c | 127 > ++++++++++++++++++++++++++++++++ > > include/linux/resctrl.h | 10 +++ > > 3 files changed, 149 insertions(+), 1 deletion(-) > > > > diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c > > b/arch/x86/kernel/cpu/resctrl/monitor.c > > index 7c8a3a745041..b265856835de 100644 > > --- a/arch/x86/kernel/cpu/resctrl/monitor.c > > +++ b/arch/x86/kernel/cpu/resctrl/monitor.c > > @@ -176,6 +176,19 @@ void resctrl_arch_reset_rmid(struct rdt_resource *r, > struct rdt_domain *d, > > memset(am, 0, sizeof(*am)); > > } > > > > +void resctrl_arch_reset_rmid_all(struct rdt_resource *r, struct > > +rdt_domain *d) { > > + struct rdt_hw_domain *hw_dom = resctrl_to_arch_dom(d); > > + > > + if (is_mbm_total_enabled()) > > + memset(hw_dom->arch_mbm_total, 0, > > + sizeof(*hw_dom->arch_mbm_total) * r->num_rmid); > > + > > + if (is_mbm_local_enabled()) > > + memset(hw_dom->arch_mbm_local, 0, > > + sizeof(*hw_dom->arch_mbm_local) * r->num_rmid); } > > + > > We learned a lot more about this area after Peter's discovery: > https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flore.kern > el.org%2Flkml%2F20221207112924.3602960-1- > peternewman%40google.com%2F&data=05%7C01%7Cbabu.moger%40am > d.com%7Cc9ca23a7b3c643d2385508dadec9f83d%7C3dd8961fe4884e608e11a8 > 2d994e183d%7C0%7C0%7C638067256273943013%7CUnknown%7CTWFpbGZsb > 3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0% > 3D%7C3000%7C%7C%7C&sdata=Pmvi2L4L727GqALGaeEO0MbpcnuRcqKc > opNNXuu%2BbFM%3D&reserved=0 > > Since this is a new generic function it should be clear in which scenarios it is > valid. > Could you please add a function comment that warns future developers about > consequences if a new usage is considered? Something like: > > /* > * Assumes that hardware counters are also reset and thus that there is no > need > * to record initial non-zero counts. > */ > > > static u64 mbm_overflow_count(u64 prev_msr, u64 cur_msr, unsigned int > > width) { > > u64 shift = 64 - width, chunks; > > diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c > > b/arch/x86/kernel/cpu/resctrl/rdtgroup.c > > index 580f3cce19e2..8a22a652a6e8 100644 > > --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c > > +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c > > @@ -1517,6 +1517,130 @@ static int mbm_local_bytes_config_show(struct > kernfs_open_file *of, > > return 0; > > } > > > > +static void mon_event_config_write(void *info) { > > + struct mon_config_info *mon_info = info; > > + u32 index; > > + > > index does not need to be u32 ... mon_event_config_index_get() returns > "unsigned int" > and wrmsr expects "unsigned int", it can also just be "unsigned int". > > > > + index = mon_event_config_index_get(mon_info->evtid); > > + if (index == INVALID_CONFIG_INDEX) { > > + pr_warn_once("Invalid event id %d\n", mon_info->evtid); > > + return; > > + } > > + wrmsr(MSR_IA32_EVT_CFG_BASE + index, mon_info->mon_config, 0); > } > > + > > +static int mbm_config_write_domain(struct rdt_resource *r, > > + struct rdt_domain *d, u32 evtid, u32 val) { > > + struct mon_config_info mon_info = {0}; > > + int ret = 0; > > + > > + /* mon_config cannot be more than the supported set of events */ > > + if (val > MAX_EVT_CONFIG_BITS) { > > + rdt_last_cmd_puts("Invalid event configuration\n"); > > + return -EINVAL; > > + } > > + > > + /* > > + * Read the current config value first. If both are the same then > > + * no need to write it again. > > + */ > > + mon_info.evtid = evtid; > > + mondata_config_read(d, &mon_info); > > + if (mon_info.mon_config == val) > > + goto out; > > + > > + mon_info.mon_config = val; > > + > > + /* > > + * Update MSR_IA32_EVT_CFG_BASE MSRs on all the CPUs in the > > + * domain. The MSRs offset from MSR MSR_IA32_EVT_CFG_BASE > > + * are scoped at the domain level. Writing any of these MSRs > > + * on one CPU is supposed to be observed by all CPUs in the > > + * domain. However, the hardware team recommends to update > > + * these MSRs on all the CPUs in the domain. > > + */ > > + on_each_cpu_mask(&d->cpu_mask, mon_event_config_write, > &mon_info, > > +1); Forgot about this. This snippet is going to change. I have tested and works fine. How about this? /* * Update MSR_IA32_EVT_CFG_BASE MSR on one of the CPUs in the * domain. The MSRs offset from MSR MSR_IA32_EVT_CFG_BASE * are scoped at the domain level. Writing any of these MSRs * on one CPU is supposed to be observed by all CPUs in the domain. */ smp_call_function_any(&d->cpu_mask, mon_event_config_write, &mon_info, 1); /* > + * Update MSR_IA32_EVT_CFG_BASE MSRs on all the CPUs in the > + * domain. The MSRs offset from MSR MSR_IA32_EVT_CFG_BASE > + * are scoped at the domain level. Writing any of these MSRs > + * on one CPU is supposed to be observed by all CPUs in the > + * domain. However, the hardware team recommends to update > + * these MSRs on all the CPUs in the domain. > + */ > + on_each_cpu_mask(&d->cpu_mask, mon_event_config_write, &mon_info, 1); > > + > > + /* > > + * When an Event Configuration is changed, the bandwidth counters > > + * for all RMIDs and Events will be cleared by the hardware. The > > + * hardware also sets MSR_IA32_QM_CTR.Unavailable (bit 62) for > > + * every RMID on the next read to any event for every RMID. > > + * Subsequent reads will have MSR_IA32_QM_CTR.Unavailable (bit 62) > > + * cleared while it is tracked by the hardware. Clear the > > + * mbm_local and mbm_total counts for all the RMIDs. > > + */ > > + resctrl_arch_reset_rmid_all(r, d); > > If I understand correctly the expectation is that when user space read counters > (via mon_data files) right after the configuration was changed then this read > will return "Unavailable" and then the next read will return data. > > If this is the case then I think a snippet about this user experience would be > helpful to add to the documentation. > > Have you considered doing a preemptive read on the RMIDs that are in use to > avoid users encountering "Unavailable"? I assume doing so on a busy system > could potentially involve hundreds of register reads/writes. > > > + > > +out: > > + return ret; > > +} > > + > > +static int mon_config_write(struct rdt_resource *r, char *tok, u32 > > +evtid) { > > + char *dom_str = NULL, *id_str; > > + unsigned long dom_id, val; > > + struct rdt_domain *d; > > + int ret = 0; > > + > > +next: > > + if (!tok || tok[0] == '\0') > > + return 0; > > + > > + /* Start processing the strings for each domain */ > > + dom_str = strim(strsep(&tok, ";")); > > + id_str = strsep(&dom_str, "="); > > + > > + if (!id_str || kstrtoul(id_str, 10, &dom_id)) { > > + rdt_last_cmd_puts("Missing '=' or non-numeric domain id\n"); > > + return -EINVAL; > > + } > > + > > + if (!dom_str || kstrtoul(dom_str, 16, &val)) { > > + rdt_last_cmd_puts("Non-numeric event configuration > value\n"); > > + return -EINVAL; > > + } > > + > > + list_for_each_entry(d, &r->domains, list) { > > + if (d->id == dom_id) { > > + ret = mbm_config_write_domain(r, d, evtid, val); > > + if (ret) > > + return -EINVAL; > > + goto next; > > + } > > + } > > + > > + return -EINVAL; > > +} > > + > > +static ssize_t mbm_total_bytes_config_write(struct kernfs_open_file *of, > > + char *buf, size_t nbytes, > > + loff_t off) > > +{ > > + struct rdt_resource *r = of->kn->parent->priv; > > + int ret; > > + > > + /* Valid input requires a trailing newline */ > > + if (nbytes == 0 || buf[nbytes - 1] != '\n') > > + return -EINVAL; > > + > > + cpus_read_lock(); > > + mutex_lock(&rdtgroup_mutex); > > + > > + rdt_last_cmd_clear(); > > + > > + buf[nbytes - 1] = '\0'; > > + > > + ret = mon_config_write(r, buf, QOS_L3_MBM_TOTAL_EVENT_ID); > > + > > + mutex_unlock(&rdtgroup_mutex); > > + cpus_read_unlock(); > > + > > + return ret ?: nbytes; > > +} > > + > > /* rdtgroup information files for one cache resource. */ static > > struct rftype res_common_files[] = { > > { > > @@ -1617,9 +1741,10 @@ static struct rftype res_common_files[] = { > > }, > > { > > .name = "mbm_total_bytes_config", > > - .mode = 0444, > > + .mode = 0644, > > .kf_ops = &rdtgroup_kf_single_ops, > > .seq_show = mbm_total_bytes_config_show, > > + .write = mbm_total_bytes_config_write, > > }, > > { > > .name = "mbm_local_bytes_config", > > diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h index > > 0cee154abc9f..e4dc65892446 100644 > > --- a/include/linux/resctrl.h > > +++ b/include/linux/resctrl.h > > @@ -250,6 +250,16 @@ int resctrl_arch_rmid_read(struct rdt_resource > > *r, struct rdt_domain *d, void resctrl_arch_reset_rmid(struct rdt_resource > *r, struct rdt_domain *d, > > u32 rmid, enum resctrl_event_id eventid); > > > > +/** > > + * resctrl_arch_reset_rmid_all() - Reset any private state associated with > > + * all the rmids. > > It could be more explicit: > "Reset all private state associated with all rmids and eventids." > > > + * @r: The domain's resource. > > + * @d: The rmid's domain. > > This copy&paste needs some changes to match this new utility. > How about: > @r: The resctrl resource. > @d: The domain for which all architectural counter state will be cleared. > > I think it can be improved more but the above could be a start (please do not > copy verbatim but ensure style is correct.) > > Keep in mind that this utility does not clear the non-architectural counter state. > This does not apply to AMD since that state is used by the software controller, > but it needs to be kept in mind if another usage for this utility arises. > > > + * > > + * This can be called from any CPU. > > + */ > > +void resctrl_arch_reset_rmid_all(struct rdt_resource *r, struct > > +rdt_domain *d); > > + > > extern unsigned int resctrl_rmid_realloc_threshold; extern unsigned > > int resctrl_rmid_realloc_limit; > > > > > > > > The above hunk fails the "no spaces before tabs" checkpatch check. > > Reinette
Hi Babu, On 12/19/2022 11:28 AM, Moger, Babu wrote: > [AMD Official Use Only - General] > > Hi Reinette, > >> -----Original Message----- >> From: Reinette Chatre <reinette.chatre@intel.com> >> Sent: Thursday, December 15, 2022 12:25 PM >> To: Moger, Babu <Babu.Moger@amd.com>; corbet@lwn.net; >> tglx@linutronix.de; mingo@redhat.com; bp@alien8.de >> Cc: fenghua.yu@intel.com; dave.hansen@linux.intel.com; x86@kernel.org; >> hpa@zytor.com; paulmck@kernel.org; akpm@linux-foundation.org; >> quic_neeraju@quicinc.com; rdunlap@infradead.org; >> damien.lemoal@opensource.wdc.com; songmuchun@bytedance.com; >> peterz@infradead.org; jpoimboe@kernel.org; pbonzini@redhat.com; >> chang.seok.bae@intel.com; pawan.kumar.gupta@linux.intel.com; >> jmattson@google.com; daniel.sneddon@linux.intel.com; Das1, Sandipan >> <Sandipan.Das@amd.com>; tony.luck@intel.com; james.morse@arm.com; >> linux-doc@vger.kernel.org; linux-kernel@vger.kernel.org; >> bagasdotme@gmail.com; eranian@google.com; christophe.leroy@csgroup.eu; >> jarkko@kernel.org; adrian.hunter@intel.com; quic_jiles@quicinc.com; >> peternewman@google.com >> Subject: Re: [PATCH v9 10/13] x86/resctrl: Add sysfs interface to write >> mbm_total_bytes_config >> >> Hi Babu, >> >> On 12/1/2022 7:37 AM, Babu Moger wrote: ... >>> + /* >>> + * When an Event Configuration is changed, the bandwidth counters >>> + * for all RMIDs and Events will be cleared by the hardware. The >>> + * hardware also sets MSR_IA32_QM_CTR.Unavailable (bit 62) for >>> + * every RMID on the next read to any event for every RMID. >>> + * Subsequent reads will have MSR_IA32_QM_CTR.Unavailable (bit 62) >>> + * cleared while it is tracked by the hardware. Clear the >>> + * mbm_local and mbm_total counts for all the RMIDs. >>> + */ >>> + resctrl_arch_reset_rmid_all(r, d); >> >> If I understand correctly the expectation is that when user space read counters >> (via mon_data files) right after the configuration was changed then this read >> will return "Unavailable" and then the next read will return data. >> >> If this is the case then I think a snippet about this user experience would be >> helpful to add to the documentation. > > Ok. How about this in the documentation? > > "When an event configuration is changed, the bandwidth counters for all the RMIDs and the events will be cleared for that domain. > The next read for every RMID will report "Unavailable" and subsequent reads will report the valid value." > > Thinking about this more ... why are the counters for all eventids cleared when only one eventid's configuration is changed? Reinette
Hi Babu, On 12/19/2022 11:50 AM, Moger, Babu wrote: > > Forgot about this. This snippet is going to change. I have tested and works fine. > How about this? > > /* > * Update MSR_IA32_EVT_CFG_BASE MSR on one of the CPUs in the > * domain. The MSRs offset from MSR MSR_IA32_EVT_CFG_BASE > * are scoped at the domain level. Writing any of these MSRs > * on one CPU is supposed to be observed by all CPUs in the domain. > */ > smp_call_function_any(&d->cpu_mask, mon_event_config_write, &mon_info, 1); > > It looks good but please drop the "supposed to be". If there is any uncertainty then the data should be written to all CPUs, if not then that uncertain text should be dropped. Reinette
Hi Reinette, On 12/20/22 11:32, Reinette Chatre wrote: > Hi Babu, > > On 12/19/2022 11:50 AM, Moger, Babu wrote: >> Forgot about this. This snippet is going to change. I have tested and works fine. >> How about this? >> >> /* >> * Update MSR_IA32_EVT_CFG_BASE MSR on one of the CPUs in the >> * domain. The MSRs offset from MSR MSR_IA32_EVT_CFG_BASE >> * are scoped at the domain level. Writing any of these MSRs >> * on one CPU is supposed to be observed by all CPUs in the domain. >> */ >> smp_call_function_any(&d->cpu_mask, mon_event_config_write, &mon_info, 1); >> >> > It looks good but please drop the "supposed to be". If there is any uncertainty > then the data should be written to all CPUs, if not then that uncertain text should > be dropped. There is no uncertainty. I will drop "supposed to be". Thanks Babu
Hi Reinette, On 12/20/22 11:32, Reinette Chatre wrote: > Hi Babu, > > On 12/19/2022 11:28 AM, Moger, Babu wrote: >> [AMD Official Use Only - General] >> >> Hi Reinette, >> >>> -----Original Message----- >>> From: Reinette Chatre <reinette.chatre@intel.com> >>> Sent: Thursday, December 15, 2022 12:25 PM >>> To: Moger, Babu <Babu.Moger@amd.com>; corbet@lwn.net; >>> tglx@linutronix.de; mingo@redhat.com; bp@alien8.de >>> Cc: fenghua.yu@intel.com; dave.hansen@linux.intel.com; x86@kernel.org; >>> hpa@zytor.com; paulmck@kernel.org; akpm@linux-foundation.org; >>> quic_neeraju@quicinc.com; rdunlap@infradead.org; >>> damien.lemoal@opensource.wdc.com; songmuchun@bytedance.com; >>> peterz@infradead.org; jpoimboe@kernel.org; pbonzini@redhat.com; >>> chang.seok.bae@intel.com; pawan.kumar.gupta@linux.intel.com; >>> jmattson@google.com; daniel.sneddon@linux.intel.com; Das1, Sandipan >>> <Sandipan.Das@amd.com>; tony.luck@intel.com; james.morse@arm.com; >>> linux-doc@vger.kernel.org; linux-kernel@vger.kernel.org; >>> bagasdotme@gmail.com; eranian@google.com; christophe.leroy@csgroup.eu; >>> jarkko@kernel.org; adrian.hunter@intel.com; quic_jiles@quicinc.com; >>> peternewman@google.com >>> Subject: Re: [PATCH v9 10/13] x86/resctrl: Add sysfs interface to write >>> mbm_total_bytes_config >>> >>> Hi Babu, >>> >>> On 12/1/2022 7:37 AM, Babu Moger wrote: > ... > >>>> + /* >>>> + * When an Event Configuration is changed, the bandwidth counters >>>> + * for all RMIDs and Events will be cleared by the hardware. The >>>> + * hardware also sets MSR_IA32_QM_CTR.Unavailable (bit 62) for >>>> + * every RMID on the next read to any event for every RMID. >>>> + * Subsequent reads will have MSR_IA32_QM_CTR.Unavailable (bit 62) >>>> + * cleared while it is tracked by the hardware. Clear the >>>> + * mbm_local and mbm_total counts for all the RMIDs. >>>> + */ >>>> + resctrl_arch_reset_rmid_all(r, d); >>> If I understand correctly the expectation is that when user space read counters >>> (via mon_data files) right after the configuration was changed then this read >>> will return "Unavailable" and then the next read will return data. >>> >>> If this is the case then I think a snippet about this user experience would be >>> helpful to add to the documentation. >> Ok. How about this in the documentation? >> >> "When an event configuration is changed, the bandwidth counters for all the RMIDs and the events will be cleared for that domain. >> The next read for every RMID will report "Unavailable" and subsequent reads will report the valid value." >> >> > Thinking about this more ... why are the counters for all eventids cleared > when only one eventid's configuration is changed? Its because of the way U-Bit tracking is implemented. The U-bit is tracked per RMID basis, not on a Per-Event + Per RMID basis. Therefore, resetting the U-Bit for 1 will reset it for both. This is what I got from h/w team. Thanks Babu > > Reinette > >
diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c index 7c8a3a745041..b265856835de 100644 --- a/arch/x86/kernel/cpu/resctrl/monitor.c +++ b/arch/x86/kernel/cpu/resctrl/monitor.c @@ -176,6 +176,19 @@ void resctrl_arch_reset_rmid(struct rdt_resource *r, struct rdt_domain *d, memset(am, 0, sizeof(*am)); } +void resctrl_arch_reset_rmid_all(struct rdt_resource *r, struct rdt_domain *d) +{ + struct rdt_hw_domain *hw_dom = resctrl_to_arch_dom(d); + + if (is_mbm_total_enabled()) + memset(hw_dom->arch_mbm_total, 0, + sizeof(*hw_dom->arch_mbm_total) * r->num_rmid); + + if (is_mbm_local_enabled()) + memset(hw_dom->arch_mbm_local, 0, + sizeof(*hw_dom->arch_mbm_local) * r->num_rmid); +} + static u64 mbm_overflow_count(u64 prev_msr, u64 cur_msr, unsigned int width) { u64 shift = 64 - width, chunks; diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c index 580f3cce19e2..8a22a652a6e8 100644 --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c @@ -1517,6 +1517,130 @@ static int mbm_local_bytes_config_show(struct kernfs_open_file *of, return 0; } +static void mon_event_config_write(void *info) +{ + struct mon_config_info *mon_info = info; + u32 index; + + index = mon_event_config_index_get(mon_info->evtid); + if (index == INVALID_CONFIG_INDEX) { + pr_warn_once("Invalid event id %d\n", mon_info->evtid); + return; + } + wrmsr(MSR_IA32_EVT_CFG_BASE + index, mon_info->mon_config, 0); +} + +static int mbm_config_write_domain(struct rdt_resource *r, + struct rdt_domain *d, u32 evtid, u32 val) +{ + struct mon_config_info mon_info = {0}; + int ret = 0; + + /* mon_config cannot be more than the supported set of events */ + if (val > MAX_EVT_CONFIG_BITS) { + rdt_last_cmd_puts("Invalid event configuration\n"); + return -EINVAL; + } + + /* + * Read the current config value first. If both are the same then + * no need to write it again. + */ + mon_info.evtid = evtid; + mondata_config_read(d, &mon_info); + if (mon_info.mon_config == val) + goto out; + + mon_info.mon_config = val; + + /* + * Update MSR_IA32_EVT_CFG_BASE MSRs on all the CPUs in the + * domain. The MSRs offset from MSR MSR_IA32_EVT_CFG_BASE + * are scoped at the domain level. Writing any of these MSRs + * on one CPU is supposed to be observed by all CPUs in the + * domain. However, the hardware team recommends to update + * these MSRs on all the CPUs in the domain. + */ + on_each_cpu_mask(&d->cpu_mask, mon_event_config_write, &mon_info, 1); + + /* + * When an Event Configuration is changed, the bandwidth counters + * for all RMIDs and Events will be cleared by the hardware. The + * hardware also sets MSR_IA32_QM_CTR.Unavailable (bit 62) for + * every RMID on the next read to any event for every RMID. + * Subsequent reads will have MSR_IA32_QM_CTR.Unavailable (bit 62) + * cleared while it is tracked by the hardware. Clear the + * mbm_local and mbm_total counts for all the RMIDs. + */ + resctrl_arch_reset_rmid_all(r, d); + +out: + return ret; +} + +static int mon_config_write(struct rdt_resource *r, char *tok, u32 evtid) +{ + char *dom_str = NULL, *id_str; + unsigned long dom_id, val; + struct rdt_domain *d; + int ret = 0; + +next: + if (!tok || tok[0] == '\0') + return 0; + + /* Start processing the strings for each domain */ + dom_str = strim(strsep(&tok, ";")); + id_str = strsep(&dom_str, "="); + + if (!id_str || kstrtoul(id_str, 10, &dom_id)) { + rdt_last_cmd_puts("Missing '=' or non-numeric domain id\n"); + return -EINVAL; + } + + if (!dom_str || kstrtoul(dom_str, 16, &val)) { + rdt_last_cmd_puts("Non-numeric event configuration value\n"); + return -EINVAL; + } + + list_for_each_entry(d, &r->domains, list) { + if (d->id == dom_id) { + ret = mbm_config_write_domain(r, d, evtid, val); + if (ret) + return -EINVAL; + goto next; + } + } + + return -EINVAL; +} + +static ssize_t mbm_total_bytes_config_write(struct kernfs_open_file *of, + char *buf, size_t nbytes, + loff_t off) +{ + struct rdt_resource *r = of->kn->parent->priv; + int ret; + + /* Valid input requires a trailing newline */ + if (nbytes == 0 || buf[nbytes - 1] != '\n') + return -EINVAL; + + cpus_read_lock(); + mutex_lock(&rdtgroup_mutex); + + rdt_last_cmd_clear(); + + buf[nbytes - 1] = '\0'; + + ret = mon_config_write(r, buf, QOS_L3_MBM_TOTAL_EVENT_ID); + + mutex_unlock(&rdtgroup_mutex); + cpus_read_unlock(); + + return ret ?: nbytes; +} + /* rdtgroup information files for one cache resource. */ static struct rftype res_common_files[] = { { @@ -1617,9 +1741,10 @@ static struct rftype res_common_files[] = { }, { .name = "mbm_total_bytes_config", - .mode = 0444, + .mode = 0644, .kf_ops = &rdtgroup_kf_single_ops, .seq_show = mbm_total_bytes_config_show, + .write = mbm_total_bytes_config_write, }, { .name = "mbm_local_bytes_config", diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h index 0cee154abc9f..e4dc65892446 100644 --- a/include/linux/resctrl.h +++ b/include/linux/resctrl.h @@ -250,6 +250,16 @@ int resctrl_arch_rmid_read(struct rdt_resource *r, struct rdt_domain *d, void resctrl_arch_reset_rmid(struct rdt_resource *r, struct rdt_domain *d, u32 rmid, enum resctrl_event_id eventid); +/** + * resctrl_arch_reset_rmid_all() - Reset any private state associated with + * all the rmids. + * @r: The domain's resource. + * @d: The rmid's domain. + * + * This can be called from any CPU. + */ +void resctrl_arch_reset_rmid_all(struct rdt_resource *r, struct rdt_domain *d); + extern unsigned int resctrl_rmid_realloc_threshold; extern unsigned int resctrl_rmid_realloc_limit;