Message ID | 20231201005720.235639-1-babu.moger@amd.com |
---|---|
Headers | From: Babu Moger <babu.moger@amd.com> To: <corbet@lwn.net>, <fenghua.yu@intel.com>, <reinette.chatre@intel.com>, <tglx@linutronix.de>, <mingo@redhat.com>, <bp@alien8.de>, <dave.hansen@linux.intel.com> CC: <x86@kernel.org>, <hpa@zytor.com>, <paulmck@kernel.org>, <rdunlap@infradead.org>, <tj@kernel.org>, <peterz@infradead.org>, <seanjc@google.com>, <kim.phillips@amd.com>, <babu.moger@amd.com>, <jmattson@google.com>, <ilpo.jarvinen@linux.intel.com>, <jithu.joseph@intel.com>, <kan.liang@linux.intel.com>, <nikunj@amd.com>, <daniel.sneddon@linux.intel.com>, <pbonzini@redhat.com>, <rick.p.edgecombe@intel.com>, <rppt@kernel.org>, <maciej.wieczor-retman@intel.com>, <linux-doc@vger.kernel.org>, <linux-kernel@vger.kernel.org>, <eranian@google.com>, <peternewman@google.com>, <dhagiani@amd.com> Subject: [PATCH 00/15] x86/resctrl : Support AMD QoS RMID Pinning feature Date: Thu, 30 Nov 2023 18:57:05 -0600 Message-ID: <20231201005720.235639-1-babu.moger@amd.com> X-Mailer: git-send-email 2.34.1 List-ID: <linux-kernel.vger.kernel.org> |
Series |
x86/resctrl : Support AMD QoS RMID Pinning feature
|
|
Message
Moger, Babu
Dec. 1, 2023, 12:57 a.m. UTC
These series adds the support for AMD QoS RMID Pinning feature. It is also
called ABMC (Assignable Bandwidth Monitoring Counters) feature.

The feature details are available in APM listed below [1].
[1] AMD64 Architecture Programmer's Manual Volume 2: System Programming
Publication # 24593 Revision 3.41 section 19.3.3.3 Assignable Bandwidth
Monitoring (ABMC). The documentation is available at
Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537

The patches are based on top of commit
346887b65d89ae987698bc1efd8e5536bd180b3f (tip/master)

# Introduction

AMD hardware can support 256 or more RMIDs. However, the bandwidth monitoring
feature only guarantees that RMIDs currently assigned to a processor will
be tracked by hardware. The counters of any other RMIDs which are no
longer being tracked will be reset to zero. The MBM event counters return
"Unavailable" for the RMIDs that are not active.

Users can create 256 or more monitor groups. But only a limited number of
groups can be given guaranteed monitoring numbers. With an ever-changing
system configuration, there is no way to know for certain which of these
groups will be active at a given point in time. Users do not have the
option to monitor a group or set of groups for a certain period of time
without worrying about the RMID being reset in between.

The ABMC feature provides an option to pin (or assign) the RMID to the
hardware counter and monitor the bandwidth for a longer duration. The
pinned RMID will be active until the user unpins (or unassigns) it. There
is no need to worry about counters being reset during this period.
Additionally, the user can specify a bitmask identifying the specific
bandwidth types from the given source to track with the counter.

# Linux Implementation

Hardware provides a total of 32 counters available for assignment.
Each Linux resctrl group can be assigned a maximum of 2 counters: one for
mbm_total_bytes and one for mbm_local_bytes. Users also have the option to
assign only one counter to the group. If the system runs out of assignable
counters, the kernel will display an error when the user attempts a new
counter assignment. Users need to unassign already used counters for new
assignments.

# Examples

a. Check if ABMC support is available

    #mount -t resctrl resctrl /sys/fs/resctrl/

    #cat /sys/fs/resctrl/info/L3_MON/mon_features
    llc_occupancy
    mbm_total_bytes
    mbm_total_bytes_config
    mbm_local_bytes
    mbm_local_bytes_config
    abmc_capable ← Linux kernel detected ABMC feature.

b. Mount with ABMC support

    #umount /sys/fs/resctrl/
    #mount -o abmc -t resctrl resctrl /sys/fs/resctrl/

c. Read the monitor states. There will be a new file "monitor_state"
   in each monitor group when the ABMC feature is enabled. By default,
   both total and local MBM events are in the "unassign" state.

    #cat /sys/fs/resctrl/monitor_state
    total=unassign;local=unassign

d. Read the events mbm_total_bytes and mbm_local_bytes. Note that MBM
   events are not available until the user assigns the events explicitly.
   Users need to assign the counters to monitor the events in this mode.

    #cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_total_bytes
    Unavailable

    #cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_bytes
    Unavailable

e. Assign a h/w counter to the total event and read the monitor_state.

    #echo total=assign > /sys/fs/resctrl/monitor_state
    #cat /sys/fs/resctrl/monitor_state
    total=assign;local=unassign

f. Now that the total event is assigned, read the total event.

    #cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_total_bytes
    6136000

g. Assign a h/w counter to both total and local events and read the monitor_state.

    #echo "total=assign;local=assign" > /sys/fs/resctrl/monitor_state
    #cat /sys/fs/resctrl/monitor_state
    total=assign;local=assign

h. Now that both total and local events are assigned, read the events.

    #cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_total_bytes
    6136000
    #cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_bytes
    58694

i. Check the bandwidth configuration for the group. Note that bandwidth
   configuration has a domain scope. Total event defaults to 0x7F (to
   count all the events) and local event defaults to 0x15
   (to count all the local numa events). The event bitmap decoding is
   available in https://www.kernel.org/doc/Documentation/x86/resctrl.rst
   in section "mbm_total_bytes_config", "mbm_local_bytes_config":

    #cat /sys/fs/resctrl/info/L3_MON/mbm_total_bytes_config
    0=0x7f;1=0x7f

    #cat /sys/fs/resctrl/info/L3_MON/mbm_local_bytes_config
    0=0x15;1=0x15

j. Change the bandwidth source for domain 0 for the total event to count only reads.
   Note that this change affects events on domain 0.

    #echo total=0x33 > /sys/fs/resctrl/info/L3_MON/mbm_total_bytes_config
    #cat /sys/fs/resctrl/info/L3_MON/mbm_total_bytes_config
    0=0x33;1=0x7F

k. Now read the total event again. The mbm_total_bytes should display
   only the read events.

    #cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_total_bytes
    6136000

l. Unmount the resctrl

    #umount /sys/fs/resctrl/

NOTE: For simplicity these examples are run on the default resctrl group.
Similar experiments can be run on non-default groups.
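The counter budget described above (32 assignable counters, at most two per group) can be tracked from user space with a small script. The following is only a sketch under stated assumptions: it relies on the "total=...;local=..." string format shown in examples c, e and g of this cover letter, and the function names count_assigned and counters_left are hypothetical helpers, not part of the series.

```shell
#!/bin/sh
# Hypothetical helpers; the monitor_state format is assumed from the
# cover letter examples ("total=assign;local=unassign").

# Print how many events in one monitor_state string hold a counter.
# The body runs in a subshell so the IFS change does not leak out.
count_assigned() (
    IFS=';'
    count=0
    for pair in $1; do
        # Strip everything up to and including '=' to get the state.
        if [ "${pair#*=}" = "assign" ]; then
            count=$((count + 1))
        fi
    done
    echo "$count"
)

# Given the monitor_state strings of several groups, print how many of
# the 32 hardware counters mentioned in the cover letter remain free.
counters_left() (
    used=0
    for state in "$@"; do
        used=$((used + $(count_assigned "$state")))
    done
    echo $((32 - used))
)

count_assigned "total=assign;local=unassign"                             # prints 1
counters_left "total=assign;local=assign" "total=assign;local=unassign"  # prints 29
```

On a kernel carrying this series, the string argument could presumably come from the file itself, for example "$(cat /sys/fs/resctrl/monitor_state)".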

---
Babu Moger (15):
  x86/resctrl: Remove hard-coded memory bandwidth limit
  x86/resctrl: Remove hard-coded memory bandwidth event configuration
  x86/resctrl: Add support for Assignable Bandwidth Monitoring Counters (ABMC)
  x86/resctrl: Add ABMC feature in the command line options
  x86/resctrl: Detect ABMC feature details
  x86/resctrl: Add the mount option for ABMC feature
  x86/resctrl: Add support to enable/disable ABMC feature
  x86/resctrl: Introduce interface to display number of ABMC counters
  x86/resctrl: Add interface to display monitor state of the group
  x86/resctrl: Initialize ABMC counters bitmap
  x86/resctrl: Add data structures for ABMC assignment
  x86/resctrl: Introduce mbm_total_cfg and mbm_local_cfg
  x86/resctrl: Add the interface to assign a ABMC counter
  x86/resctrl: Add interface unassign a ABMC counter
  x86/resctrl: Update ABMC assignment on event configuration changes

 .../admin-guide/kernel-parameters.txt         |   2 +-
 Documentation/arch/x86/resctrl.rst            |  52 +++
 arch/x86/include/asm/cpufeatures.h            |   1 +
 arch/x86/include/asm/msr-index.h              |   2 +
 arch/x86/kernel/cpu/cpuid-deps.c              |   2 +
 arch/x86/kernel/cpu/resctrl/core.c            |  23 +-
 arch/x86/kernel/cpu/resctrl/internal.h        |  49 ++-
 arch/x86/kernel/cpu/resctrl/monitor.c         |  22 +
 arch/x86/kernel/cpu/resctrl/rdtgroup.c        | 415 +++++++++++++++++-
 arch/x86/kernel/cpu/scattered.c               |   1 +
 include/linux/resctrl.h                       |   2 +
 11 files changed, 562 insertions(+), 9 deletions(-)
Comments
[+James]

Hi James,

On Thu, Nov 30, 2023 at 4:57 PM Babu Moger <babu.moger@amd.com> wrote:
> [...]
>
> h. Now that both total and local events are assigned, read the events.
>
> #cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_total_bytes
> 6136000
> #cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_bytes
> 58694

We had briefly discussed this topic of explicit counter assignment in
resctrl earlier this year[1], but you didn't want it to be unique to
MPAM.

Now that a similar capability exists on AMD and an interface is being
proposed, we can talk about this in the context of MPAM again. With
some generalization and refinements, I expect this proposal could be
applied to assigning a limited number of MBWU monitors to monitoring
groups.

Also, I had proposed in another thread[2] applying such an interface
to previous AMD hardware where the monitor assignments cannot be
directly controlled to avoid or reduce the overhead in my soft RMID
proposal.

Thanks!
-Peter

[1] https://lore.kernel.org/all/f8a25b5f-4a7d-0891-1152-33f349059b5d@arm.com/
[2] https://lore.kernel.org/all/CALPaoCjg-W3w8OKLHP_g6Evoo03fbgaOQZrGTLX6vdSLp70=SA@mail.gmail.com/
(+James)

Hi Babu,

On 11/30/2023 4:57 PM, Babu Moger wrote:
> These series adds the support for AMD QoS RMID Pinning feature. It is also

"These series" - is this series part of a bigger work?

> called ABMC (Assignable Bandwidth Monitoring Counters) feature.
>
> [...]
>
> # Examples
>
> a. Check if ABMC support is available
> #mount -t resctrl resctrl /sys/fs/resctrl/
>
> #cat /sys/fs/resctrl/info/L3_MON/mon_features
> llc_occupancy
> mbm_total_bytes
> mbm_total_bytes_config
> mbm_local_bytes
> mbm_local_bytes_config
> abmc_capable ← Linux kernel detected ABMC feature.

(Please start thinking about a new name that is not the AMD feature
name. This is added to resctrl filesystem that is the generic interface
used to work with different architectures. This thus needs to be generalized
to what user requires and how it can be accommodated by the hardware ...
this is already expected to be needed by MPAM and having an AMD feature
name could cause confusion.)

>
> b. Mount with ABMC support
> #umount /sys/fs/resctrl/
> #mount -o abmc -t resctrl resctrl /sys/fs/resctrl/
>

hmmm ... so this requires the user to mount resctrl, determine if the
feature is supported, unmount resctrl, remount resctrl with feature enabled.
Could you please elaborate what prevents this feature from being enabled
without needing to remount resctrl?

> c. Read the monitor states. There will be new file "monitor_state"
> for each monitor group when ABMC feature is enabled. By default,
> both total and local MBM events are in "unassign" state.
>
> #cat /sys/fs/resctrl/monitor_state
> total=unassign;local=unassign
>
> d. Read the event mbm_total_bytes and mbm_local_bytes. Note that MBM
> events are not available until the user assigns the events explicitly.
> Users need to assign the counters to monitor the events in this mode.
>
> #cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_total_bytes
> Unavailable

How is the llc_occupancy event impacted when ABMC is enabled? Can all RMIDs
still be used to track cache occupancy?

>
> #cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_bytes
> Unavailable

I believe that "Unavailable" already has an accepted meaning within current
interface and is associated with temporary failure. Even the AMD spec states "This
is generally a temporary condition and subsequent reads may succeed". In the
scenario above there is no chance that this counter would produce a value later.
I do not think it is ideal to overload existing interface with different meanings
associated with a new hardware specific feature ... something like "Disabled" seems
more appropriate.

Considering this we may even consider using these files themselves as a
way to enable the counters if they are disabled. For example, just
"echo 1 > /sys/fs/resctrl/mon_data/mon_L3_00/mbm_total_bytes" can be used
to enable this counter. No need for a new "monitor_state". Please note that this
is not an official proposal since there are two other use cases that still need to
be considered as we await James's feedback on how this may work for MPAM and
also how this may be useful on AMD hardware that does not support ABMC but
users may want to get similar benefits ([1])

>
> e. Assign a h/w counter to the total event and read the monitor_state.
>
> #echo total=assign > /sys/fs/resctrl/monitor_state
> #cat /sys/fs/resctrl/monitor_state
> total=assign;local=unassign
>
> f. Now that the total event is assigned. Read the total event.
>
> #cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_total_bytes
> 6136000
>
> g. Assign a h/w counter to both total and local events and read the monitor_state.
>
> #echo "total=assign;local=assign" > /sys/fs/resctrl/monitor_state
> #cat /sys/fs/resctrl/monitor_state
> total=assign;local=assign
>
> h. Now that both total and local events are assigned, read the events.
>
> #cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_total_bytes
> 6136000
> #cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_bytes
> 58694

It looks like if not all RMIDs associated with parent and child groups have counters
then the accumulated counters would just treat the "unassigned" as zero?

>
> i. Check the bandwidth configuration for the group. Note that bandwidth
> configuration has a domain scope. Total event defaults to 0x7F (to
> count all the events) and local event defaults to 0x15
> (to count all the local numa events). The event bitmap decoding is
> available in https://www.kernel.org/doc/Documentation/x86/resctrl.rst
> in section "mbm_total_bytes_config", "mbm_local_bytes_config":
>
> #cat /sys/fs/resctrl/info/L3_MON/mbm_total_bytes_config
> 0=0x7f;1=0x7f
>
> #cat /sys/fs/resctrl/info/L3_MON/mbm_local_bytes_config
> 0=0x15;1=0x15

These would not be available if system does not support BMEC. From patch #3 it
does not seem as though ABMC is dependent on BMEC. Is ABMC dependent on
BMEC or are they just using the same config bits?

>
> j. Change the bandwidth source for domain 0 for the total event to count only reads.
> Note that this change affects events on the domain 0.
>
> #echo total=0x33 > /sys/fs/resctrl/info/L3_MON/mbm_total_bytes_config

typo?

> #cat /sys/fs/resctrl/info/L3_MON/mbm_total_bytes_config
> 0=0x33;1=0x7F
>
> k. Now read the total event again. The mbm_total_bytes should display
> only the read events.
>
> #cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_total_bytes
> 6136000

hmmm ... seems like there is a need to make the MBM events configurable even
if BMEC is not supported.

Reinette

[1] https://lore.kernel.org/lkml/CALPaoCjg-W3w8OKLHP_g6Evoo03fbgaOQZrGTLX6vdSLp70=SA@mail.gmail.com/
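Since the discussion above turns on what the config values 0x7f, 0x15 and 0x33 select, a small decoder may help when following along. This is a sketch only: the bit meanings are paraphrased from the "mbm_total_bytes_config" section of the kernel's resctrl documentation referenced in the cover letter, and decode_mbm_config is a hypothetical helper name, not something the series provides.

```shell
#!/bin/sh
# Hypothetical helper: list the memory transaction types selected by an
# MBM event configuration bitmask. Bit names are paraphrased from the
# resctrl documentation (bits 0-6), lowest bit first.
decode_mbm_config() (
    mask=$(($1))    # accepts hex such as 0x15
    bit=0
    for name in \
        "local reads" \
        "remote reads" \
        "local non-temporal writes" \
        "remote non-temporal writes" \
        "local slow-memory reads" \
        "remote slow-memory reads" \
        "dirty victims to all memory types"
    do
        if [ $(( (mask >> bit) & 1 )) -eq 1 ]; then
            echo "bit $bit: $name"
        fi
        bit=$((bit + 1))
    done
)

decode_mbm_config 0x15    # prints bits 0, 2 and 4 (the local events)
decode_mbm_config 0x33    # prints bits 0, 1, 4 and 5 (the read events)
```

Under this decoding, 0x15 covers only local traffic and 0x33 covers only reads, which matches how the cover letter describes the two values.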
Hi Reinette, On 12/5/23 17:17, Reinette Chatre wrote: > (+James) > > Hi Babu, > > On 11/30/2023 4:57 PM, Babu Moger wrote: >> These series adds the support for AMD QoS RMID Pinning feature. It is also > > "These series" - is this series part of a bigger work? No. There are some some plans to optimize rmid_reads. Peter is planning to work on that. But both are independent of each other. > >> called ABMC (Assignable Bandwidth Monitoring Counters) feature. >> >> The feature details are available in APM listed below [1]. >> [1] AMD64 Architecture Programmer's Manual Volume 2: System Programming >> Publication # 24593 Revision 3.41 section 19.3.3.3 Assignable Bandwidth >> Monitoring (ABMC). The documentation is available at >> Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537 >> >> The patches are based on top of commit >> 346887b65d89ae987698bc1efd8e5536bd180b3f (tip/master) >> >> # Introduction >> >> AMD hardware can support 256 or more RMIDs. However, bandwidth monitoring >> feature only guarantees that RMIDs currently assigned to a processor will >> be tracked by hardware. The counters of any other RMIDs which are no >> longer being tracked will be reset to zero. The MBM event counters return >> "Unavailable" for the RMIDs that are not active. >> >> Users can create 256 or more monitor groups. But there can be only limited >> number of groups that can be give guaranteed monitoring numbers. With ever >> changing system configuration, there is no way to definitely know which of >> these groups will be active for certain point of time. Users do not have >> the option to monitor a group or set of groups for certain period of time >> without worrying about RMID being reset in between. >> >> The ABMC feature provides an option to pin (or assign) the RMID to the >> hardware counter and monitor the bandwidth for a longer duration. The >> pinned RMID will be active until the user unpins (or unassigns) it. 
>> There is no need to worry about counters being reset during this period.
>> Additionally, the user can specify a bitmask identifying the specific
>> bandwidth types from the given source to track with the counter.
>>
>> # Linux Implementation
>>
>> Hardware provides total of 32 counters available for assignment.
>> Each Linux resctrl group can be assigned a maximum of 2 counters. One for
>> mbm_total_bytes and one for mbm_local_bytes. Users also have the option to
>> assign only one counter to the group. If the system runs out of assignable
>> counters, the kernel will display the error when the user attempts a new
>> counter assignment. Users need to unassign already used counters for new
>> assignments.
>>
>> # Examples
>>
>> a. Check if ABMC support is available
>> #mount -t resctrl resctrl /sys/fs/resctrl/
>> #cat /sys/fs/resctrl/info/L3_MON/mon_features
>> llc_occupancy
>> mbm_total_bytes
>> mbm_total_bytes_config
>> mbm_local_bytes
>> mbm_local_bytes_config
>> abmc_capable ← Linux kernel detected ABMC feature.
>
> (Please start thinking about a new name that is not the AMD feature
> name. This is added to resctrl filesystem that is the generic interface
> used to work with different architectures. This thus needs to be generalized
> to what user requires and how it can be accommodated by the hardware ...
> this is already expected to be needed by MPAM and having a AMD feature
> name could cause confusion.)

Yes. Agree.

How about "assign_capable"?

>
>>
>> b. Mount with ABMC support
>> #umount /sys/fs/resctrl/
>> #mount -o abmc -t resctrl resctrl /sys/fs/resctrl/
>>
>
> hmmm ... so this requires the user to mount resctrl, determine if the
> feature is supported, unmount resctrl, remount resctrl with feature enabled.
> Could you please elaborate what prevents this feature from being enabled
> without needing to remount resctrl?

Spec says
"Enabling ABMC: ABMC is enabled by setting L3_QOS_EXT_CFG.ABMC_En=1 (see
Figure 19-7).
When the state of ABMC_En is changed, it must be changed to
the updated value on all logical processors in the QOS Domain.
Upon transitions of the ABMC_En the following actions take place:
All ABMC assignable bandwidth counters are reset to 0.
The L3 default mode bandwidth counters are reset to 0.
The L3_QOS_ABMC_CFG MSR is reset to 0."

So, all the monitoring group counters will be reset.

It is technically possible to enable without remount. But ABMC mode
requires few new files(in each group) which I added when mounted with "-o
abmc". Thought it is a better option.

Otherwise we need to add these files when ABMC is supported(not when
enabled). Need to add another file in /sys/fs/resctrl/info/L3_MON to
enable the feature on the fly.

Both are acceptable options. Any thoughts?

>
>> c. Read the monitor states. There will be new file "monitor_state"
>> for each monitor group when ABMC feature is enabled. By default,
>> both total and local MBM events are in "unassign" state.
>>
>> #cat /sys/fs/resctrl/monitor_state
>> total=unassign;local=unassign
>>
>> d. Read the event mbm_total_bytes and mbm_local_bytes. Note that MBA
>> events are not available until the user assigns the events explicitly.
>> Users need to assign the counters to monitor the events in this mode.
>>
>> #cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_total_bytes
>> Unavailable
>
> How is the llc_occupancy event impacted when ABMC is enabled? Can all RMIDs
> still be used to track cache occupancy?

llc_occupancy event is not impacted by ABMC mode. It can be still used to
track cache occupancy.

>
>>
>> #cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_bytes
>> Unavailable
>
> I believe that "Unavailable" already has an accepted meaning within current
> interface and is associated with temporary failure. Even the AMD spec states "This
> is generally a temporary condition and subsequent reads may succeed". In the
> scenario above there is no chance that this counter would produce a value later.
> I do not think it is ideal to overload existing interface with different meanings
> associated with a new hardware specific feature ... something like "Disabled" seems
> more appropriate.

Hardware still reports it as unavailable. Also, there are some error cases
hardware can report unavailable. We may not be able to differentiate that.

>
> Considering this we may even consider using these files themselves as a
> way to enable the counters if they are disabled. For example, just
> "echo 1 > /sys/fs/resctrl/mon_data/mon_L3_00/mbm_total_bytes" can be used

I am not sure about this. This is specific to domain 0. This group can
have cpus from multiple domains. I think we should have the interface for
all the domains(not for specific domain).

> to enable this counter. No need for a new "monitor_state". Please note that this
> is not an official proposal since there are two other use cases that still need to
> be considered as we await James's feedback on how this may work for MPAM and
> also how this may be useful on AMD hardware that does not support ABMC but
> users may want to get similar benefits ([1])

Ok. Lets wait for James comments.

>
>>
>> e. Assign a h/w counter to the total event and read the monitor_state.
>>
>> #echo total=assign > /sys/fs/resctrl/monitor_state
>> #cat /sys/fs/resctrl/monitor_state
>> total=assign;local=unassign
>>
>> f. Now that the total event is assigned. Read the total event.
>>
>> #cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_total_bytes
>> 6136000
>>
>> g. Assign a h/w counter to both total and local events and read the monitor_state.
>>
>> #echo "total=assign;local=assign" > /sys/fs/resctrl/monitor_state
>> #cat /sys/fs/resctrl/monitor_state
>> total=assign;local=assign
>>
>> h. Now that both total and local events are assigned, read the events.
>>
>> #cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_total_bytes
>> 6136000
>> #cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_bytes
>> 58694
>
> It looks like if not all RMIDs associated with parent and child groups
> have counters then the accumulated counters would just treat the "unassigned"
> as zero?

That is correct.

>
>>
>> i. Check the bandwidth configuration for the group. Note that bandwidth
>> configuration has a domain scope. Total event defaults to 0x7F (to
>> count all the events) and local event defaults to 0x15
>> (to count all the local numa events). The event bitmap decoding is
>> available in https://www.kernel.org/doc/Documentation/x86/resctrl.rst
>> in section "mbm_total_bytes_config", "mbm_local_bytes_config":
>>
>> #cat /sys/fs/resctrl/info/L3_MON/mbm_total_bytes_config
>> 0=0x7f;1=0x7f
>>
>> #cat /sys/fs/resctrl/info/L3_MON/mbm_local_bytes_config
>> 0=0x15;1=0x15
>
>
> These would not be available if system does not support BMEC. From
> patch #3 it does not seem as though ABMC is dependent on BMEC.
>
> Is ABMC dependent on BMEC or are they just using the same
> config bits?

Good question. They dont have to be dependent on each other. To keep the
rmid_read interface same, we made it dependent on each other. I will add
the dependency in patch 3. I have added explanation in patch 15.
https://lore.kernel.org/lkml/20231201005720.235639-16-babu.moger@amd.com/

>
>>
>> j. Change the bandwidth source for domain 0 for the total event to count only reads.
>> Note that this change effects events on the domain 0.
>>
>> #echo total=0x33 > /sys/fs/resctrl/info/L3_MON/mbm_total_bytes_config
>
> typo?

Yes. Cut paste mistake. Will fix it.

>
>> #cat /sys/fs/resctrl/info/L3_MON/mbm_total_bytes_config
>> 0=0x33;1=0x7F
>>
>> k. Now read the total event again. The mbm_total_bytes should display
>> only the read events.
>>
>> #cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_total_bytes
>> 6136000
>
> hmmm ...
> seems like there is a need to make the MBM events configurable even
> if BMEC is not supported.

Yes, in ABMC mode. Will add the dependency. Will use the standard mode if
BMEC and ABMC are not available.

>
> Reinette
>
>
> [1] https://lore.kernel.org/lkml/CALPaoCjg-W3w8OKLHP_g6Evoo03fbgaOQZrGTLX6vdSLp70=SA@mail.gmail.com/
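
The "monitor_state" format used in steps c, e and g (for example "total=assign;local=unassign") is simple to consume from a script. The following is a minimal sketch, assuming the file keeps the semicolon-separated event=state layout shown in the cover letter. The file is only the interface proposed in this series, and the helper name is hypothetical.

```shell
# Print the state of one MBM event from a monitor_state string.
# $1: file contents, such as "total=assign;local=unassign"
# $2: event name, "total" or "local"
# Prints nothing when the event is not listed.
monitor_event_state() {
    printf '%s\n' "$1" | tr ';' '\n' | while IFS='=' read -r event state; do
        if [ "$event" = "$2" ]; then
            printf '%s\n' "$state"
        fi
    done
}
```

On a kernel carrying this series, a call such as `monitor_event_state "$(cat /sys/fs/resctrl/monitor_state)" total` would report whether the total event currently holds a counter.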
Hi Babu,

On 12/6/2023 7:40 AM, Moger, Babu wrote:
> Hi Reinette,
>
> On 12/5/23 17:17, Reinette Chatre wrote:
>> (+James)
>>
>> Hi Babu,
>>
>> On 11/30/2023 4:57 PM, Babu Moger wrote:
>>> These series adds the support for AMD QoS RMID Pinning feature. It is also
>>
>> "These series" - is this series part of a bigger work?
>
> No.
> There are some plans to optimize rmid_reads. Peter is planning to
> work on that. But both are independent of each other.

I would propose that you use "This series" instead to avoid creating
wrong impression.

>>> a. Check if ABMC support is available
>>> #mount -t resctrl resctrl /sys/fs/resctrl/
>>> #cat /sys/fs/resctrl/info/L3_MON/mon_features
>>> llc_occupancy
>>> mbm_total_bytes
>>> mbm_total_bytes_config
>>> mbm_local_bytes
>>> mbm_local_bytes_config
>>> abmc_capable ← Linux kernel detected ABMC feature.
>>
>> (Please start thinking about a new name that is not the AMD feature
>> name. This is added to resctrl filesystem that is the generic interface
>> used to work with different architectures. This thus needs to be generalized
>> to what user requires and how it can be accommodated by the hardware ...
>> this is already expected to be needed by MPAM and having a AMD feature
>> name could cause confusion.)
>
> Yes. Agree.
>
> How about "assign_capable"?

Let's wait to learn more about other use case.

>
>>
>>>
>>> b. Mount with ABMC support
>>> #umount /sys/fs/resctrl/
>>> #mount -o abmc -t resctrl resctrl /sys/fs/resctrl/
>>>
>>
>> hmmm ... so this requires the user to mount resctrl, determine if the
>> feature is supported, unmount resctrl, remount resctrl with feature enabled.
>> Could you please elaborate what prevents this feature from being enabled
>> without needing to remount resctrl?
>
> Spec says
> "Enabling ABMC: ABMC is enabled by setting L3_QOS_EXT_CFG.ABMC_En=1 (see
> Figure 19-7). When the state of ABMC_En is changed, it must be changed to
> the updated value on all logical processors in the QOS Domain.
> Upon transitions of the ABMC_En the following actions take place:
> All ABMC assignable bandwidth counters are reset to 0.
> The L3 default mode bandwidth counters are reset to 0.
> The L3_QOS_ABMC_CFG MSR is reset to 0."
>
> So, all the monitoring group counters will be reset.
>
> It is technically possible to enable without remount. But ABMC mode
> requires few new files(in each group) which I added when mounted with "-o
> abmc". Thought it is a better option.
>
> Otherwise we need to add these files when ABMC is supported(not when
> enabled). Need to add another file in /sys/fs/resctrl/info/L3_MON to
> enable the feature on the fly.
>
> Both are acceptable options. Any thoughts?

The new resctrl files in info/ could always be present. For example,
user space may want to know how many counters are available before
enabling the feature.

It is not yet obvious to me that this feature requires new files
in monitor groups.

>>> c. Read the monitor states. There will be new file "monitor_state"
>>> for each monitor group when ABMC feature is enabled. By default,
>>> both total and local MBM events are in "unassign" state.
>>>
>>> #cat /sys/fs/resctrl/monitor_state
>>> total=unassign;local=unassign
>>>
>>> d. Read the event mbm_total_bytes and mbm_local_bytes. Note that MBA
>>> events are not available until the user assigns the events explicitly.
>>> Users need to assign the counters to monitor the events in this mode.
>>>
>>> #cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_total_bytes
>>> Unavailable
>>
>> How is the llc_occupancy event impacted when ABMC is enabled? Can all RMIDs
>> still be used to track cache occupancy?
>
> llc_occupancy event is not impacted by ABMC mode. It can be still used to
> track cache occupancy.
>
>>
>>>
>>> #cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_bytes
>>> Unavailable
>>
>> I believe that "Unavailable" already has an accepted meaning within current
>> interface and is associated with temporary failure.
>> Even the AMD spec states "This is generally a temporary condition and
>> subsequent reads may succeed". In the scenario above there is no chance
>> that this counter would produce a value later.
>> I do not think it is ideal to overload existing interface with different meanings
>> associated with a new hardware specific feature ... something like "Disabled" seems
>> more appropriate.
>
> Hardware still reports it as unavailable. Also, there are some error cases
> hardware can report unavailable. We may not be able to differentiate that.

This highlights that this resctrl feature is currently latched to AMD's
ABMC. I do not think we should require that this resctrl feature is backed
by hardware that can support reads of counters that are disabled. A counter
read really only needs to be sent to hardware if it is enabled.

>> Considering this we may even consider using these files themselves as a
>> way to enable the counters if they are disabled. For example, just
>> "echo 1 > /sys/fs/resctrl/mon_data/mon_L3_00/mbm_total_bytes" can be used
>
> I am not sure about this. This is specific to domain 0. This group can
> have cpus from multiple domains. I think we should have the interface for
> all the domains(not for specific domain).

Are the ABMC registers not per CPU? This is unclear to me at this time
since changelog of patch #13 states it is per-CPU but yet the code
uses smp_call_function_any().

Even so, this needs to take other use cases into account. So far Peter
mentioned the scenario where enabling of one counter would do so for all
events associated with that counter and then there could also be a global
enable/disable.

>
>> to enable this counter. No need for a new "monitor_state".
>> Please note that this is not an official proposal since there are two other
>> use cases that still need to be considered as we await James's feedback on
>> how this may work for MPAM and also how this may be useful on AMD hardware
>> that does not support ABMC but users may want to get similar benefits ([1])
>
> Ok. Lets wait for James comments.

Reinette
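
The concern about overloading "Unavailable" can be seen from the point of view of a script that reads an MBM event file. The sketch below is illustrative only: it assumes the strings "Unavailable" and "Error" that resctrl already returns for failed reads, and it treats anything else that is not a plain number (such as the "Disabled" string suggested above, which is not an existing interface) as unknown. The helper name is hypothetical.

```shell
# Classify the text read from an MBM event file such as mbm_total_bytes.
# $1: file contents
# Prints one of: value, unavailable, error, unknown.
classify_mbm_reading() {
    case "$1" in
        Unavailable) echo "unavailable" ;;  # transient failure or, in the proposed ABMC mode, no counter assigned
        Error)       echo "error" ;;
        ''|*[!0-9]*) echo "unknown" ;;      # empty or not a plain byte count
        *)           echo "value" ;;
    esac
}
```

With the behavior proposed in the cover letter, both a transient hardware condition and an unassigned counter produce "unavailable" here, so a caller cannot tell whether retrying the read is worthwhile.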
Hi Reinette,

On 12/6/23 12:49, Reinette Chatre wrote:
> Hi Babu,
>
> On 12/6/2023 7:40 AM, Moger, Babu wrote:
>> Hi Reinette,
>>
>> On 12/5/23 17:17, Reinette Chatre wrote:
>>> (+James)
>>>
>>> Hi Babu,
>>>
>>> On 11/30/2023 4:57 PM, Babu Moger wrote:
>>>> These series adds the support for AMD QoS RMID Pinning feature. It is also
>>>
>>> "These series" - is this series part of a bigger work?
>>
>> No.
>> There are some plans to optimize rmid_reads. Peter is planning to
>> work on that. But both are independent of each other.
>
> I would propose that you use "This series" instead to avoid creating
> wrong impression.

Sure.

>
>>>> a. Check if ABMC support is available
>>>> #mount -t resctrl resctrl /sys/fs/resctrl/
>>>> #cat /sys/fs/resctrl/info/L3_MON/mon_features
>>>> llc_occupancy
>>>> mbm_total_bytes
>>>> mbm_total_bytes_config
>>>> mbm_local_bytes
>>>> mbm_local_bytes_config
>>>> abmc_capable ← Linux kernel detected ABMC feature.
>>>
>>> (Please start thinking about a new name that is not the AMD feature
>>> name. This is added to resctrl filesystem that is the generic interface
>>> used to work with different architectures. This thus needs to be generalized
>>> to what user requires and how it can be accommodated by the hardware ...
>>> this is already expected to be needed by MPAM and having a AMD feature
>>> name could cause confusion.)
>>
>> Yes. Agree.
>>
>> How about "assign_capable"?
>
> Let's wait to learn more about other use case.
>
>>
>>>
>>>>
>>>> b. Mount with ABMC support
>>>> #umount /sys/fs/resctrl/
>>>> #mount -o abmc -t resctrl resctrl /sys/fs/resctrl/
>>>>
>>>
>>> hmmm ... so this requires the user to mount resctrl, determine if the
>>> feature is supported, unmount resctrl, remount resctrl with feature enabled.
>>> Could you please elaborate what prevents this feature from being enabled
>>> without needing to remount resctrl?
>>
>> Spec says
>> "Enabling ABMC: ABMC is enabled by setting L3_QOS_EXT_CFG.ABMC_En=1 (see
>> Figure 19-7).
>> When the state of ABMC_En is changed, it must be changed to
>> the updated value on all logical processors in the QOS Domain.
>> Upon transitions of the ABMC_En the following actions take place:
>> All ABMC assignable bandwidth counters are reset to 0.
>> The L3 default mode bandwidth counters are reset to 0.
>> The L3_QOS_ABMC_CFG MSR is reset to 0."
>>
>> So, all the monitoring group counters will be reset.
>>
>> It is technically possible to enable without remount. But ABMC mode
>> requires few new files(in each group) which I added when mounted with "-o
>> abmc". Thought it is a better option.
>>
>> Otherwise we need to add these files when ABMC is supported(not when
>> enabled). Need to add another file in /sys/fs/resctrl/info/L3_MON to
>> enable the feature on the fly.
>>
>> Both are acceptable options. Any thoughts?
>
> The new resctrl files in info/ could always be present. For example,
> user space may want to know how many counters are available before
> enabling the feature.
>
> It is not yet obvious to me that this feature requires new files
> in monitor groups.

There are two MBM events(total and local) in each group.
We should provide an interface to assign each event independently.
User can assign only one event in a group. We should also provide an
option assign both the events in the group. This needs to be done at
resctrl group level.

>
>>>> c. Read the monitor states. There will be new file "monitor_state"
>>>> for each monitor group when ABMC feature is enabled. By default,
>>>> both total and local MBM events are in "unassign" state.
>>>>
>>>> #cat /sys/fs/resctrl/monitor_state
>>>> total=unassign;local=unassign
>>>>
>>>> d. Read the event mbm_total_bytes and mbm_local_bytes. Note that MBA
>>>> events are not available until the user assigns the events explicitly.
>>>> Users need to assign the counters to monitor the events in this mode.
>>>>
>>>> #cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_total_bytes
>>>> Unavailable
>>>
>>> How is the llc_occupancy event impacted when ABMC is enabled? Can all RMIDs
>>> still be used to track cache occupancy?
>>
>> llc_occupancy event is not impacted by ABMC mode. It can be still used to
>> track cache occupancy.
>>
>>>
>>>>
>>>> #cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_bytes
>>>> Unavailable
>>>
>>> I believe that "Unavailable" already has an accepted meaning within current
>>> interface and is associated with temporary failure. Even the AMD spec states "This
>>> is generally a temporary condition and subsequent reads may succeed". In the
>>> scenario above there is no chance that this counter would produce a value later.
>>> I do not think it is ideal to overload existing interface with different meanings
>>> associated with a new hardware specific feature ... something like "Disabled" seems
>>> more appropriate.
>>
>> Hardware still reports it as unavailable. Also, there are some error cases
>> hardware can report unavailable. We may not be able to differentiate that.
>
> This highlights that this resctrl feature is currently latched to AMD's
> ABMC. I do not think we should require that this resctrl feature is backed
> by hardware that can support reads of counters that are disabled. A counter
> read really only needs to be sent to hardware if it is enabled.
>
>>> Considering this we may even consider using these files themselves as a
>>> way to enable the counters if they are disabled. For example, just
>>> "echo 1 > /sys/fs/resctrl/mon_data/mon_L3_00/mbm_total_bytes" can be used
>>
>> I am not sure about this. This is specific to domain 0. This group can
>> have cpus from multiple domains. I think we should have the interface for
>> all the domains(not for specific domain).
>
> Are the ABMC registers not per CPU? This is unclear to me at this time
> since changelog of patch #13 states it is per-CPU but yet the code
> uses smp_call_function_any().
Here are the clarifications from hardware engineer about this.

# While configuring the counter, should we have to write (L3_QOS_ABMC_CFG)
on all the logical processors in a domain?

No. In order to configure a specific counter, you only need to write it
on a single logical processor in a domain. Configuring the actual ABMC
counter is a side-effect of the write to this register. And the actual
ABMC counter configuration is a global state.

"Each logical processor implements a separate copy of these registers"
identifies that if you write a 5 to L3_QOS_ABMC_CFG on C0T0, you will not
read a 5 from the L3_QOS_ABMC_CFG register on C1T0.

>
> Even so, this needs to take other use cases into account. So far Peter
> mentioned the scenario where enabling of one counter would do so for all
> events associated with that counter and then there could also be a global
> enable/disable.
>
>>
>>> to enable this counter. No need for a new "monitor_state". Please note that this
>>> is not an official proposal since there are two other use cases that still need to
>>> be considered as we await James's feedback on how this may work for MPAM and
>>> also how this may be useful on AMD hardware that does not support ABMC but
>>> users may want to get similar benefits ([1])
>>
>> Ok. Lets wait for James comments.
>
> Reinette
>
Hi Babu,

On 12/7/2023 8:12 AM, Moger, Babu wrote:
> On 12/6/23 12:49, Reinette Chatre wrote:
>> On 12/6/2023 7:40 AM, Moger, Babu wrote:
>>> On 12/5/23 17:17, Reinette Chatre wrote:
>>>> On 11/30/2023 4:57 PM, Babu Moger wrote:

>>>>> b. Mount with ABMC support
>>>>> #umount /sys/fs/resctrl/
>>>>> #mount -o abmc -t resctrl resctrl /sys/fs/resctrl/
>>>>>
>>>>
>>>> hmmm ... so this requires the user to mount resctrl, determine if the
>>>> feature is supported, unmount resctrl, remount resctrl with feature enabled.
>>>> Could you please elaborate what prevents this feature from being enabled
>>>> without needing to remount resctrl?
>>>
>>> Spec says
>>> "Enabling ABMC: ABMC is enabled by setting L3_QOS_EXT_CFG.ABMC_En=1 (see
>>> Figure 19-7). When the state of ABMC_En is changed, it must be changed to
>>> the updated value on all logical processors in the QOS Domain.
>>> Upon transitions of the ABMC_En the following actions take place:
>>> All ABMC assignable bandwidth counters are reset to 0.
>>> The L3 default mode bandwidth counters are reset to 0.
>>> The L3_QOS_ABMC_CFG MSR is reset to 0."
>>>
>>> So, all the monitoring group counters will be reset.
>>>
>>> It is technically possible to enable without remount. But ABMC mode
>>> requires few new files(in each group) which I added when mounted with "-o
>>> abmc". Thought it is a better option.
>>>
>>> Otherwise we need to add these files when ABMC is supported(not when
>>> enabled). Need to add another file in /sys/fs/resctrl/info/L3_MON to
>>> enable the feature on the fly.
>>>
>>> Both are acceptable options. Any thoughts?
>>
>> The new resctrl files in info/ could always be present. For example,
>> user space may want to know how many counters are available before
>> enabling the feature.
>>
>> It is not yet obvious to me that this feature requires new files
>> in monitor groups.
>
> There are two MBM events(total and local) in each group.
> We should provide an interface to assign each event independently.
> User can assign only one event in a group. We should also provide an
> option assign both the events in the group. This needs to be done at
> resctrl group level.

Understood. I would like to start by considering how (if at all) existing
files may be used, thus my example of using mbm_total_bytes, before adding
more files.

...

>>>>> #cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_bytes
>>>>> Unavailable
>>>>
>>>> I believe that "Unavailable" already has an accepted meaning within current
>>>> interface and is associated with temporary failure. Even the AMD spec states "This
>>>> is generally a temporary condition and subsequent reads may succeed". In the
>>>> scenario above there is no chance that this counter would produce a value later.
>>>> I do not think it is ideal to overload existing interface with different meanings
>>>> associated with a new hardware specific feature ... something like "Disabled" seems
>>>> more appropriate.
>>>
>>> Hardware still reports it as unavailable. Also, there are some error cases
>>> hardware can report unavailable. We may not be able to differentiate that.
>>
>> This highlights that this resctrl feature is currently latched to AMD's
>> ABMC. I do not think we should require that this resctrl feature is backed
>> by hardware that can support reads of counters that are disabled. A counter
>> read really only needs to be sent to hardware if it is enabled.
>>
>>>> Considering this we may even consider using these files themselves as a
>>>> way to enable the counters if they are disabled. For example, just
>>>> "echo 1 > /sys/fs/resctrl/mon_data/mon_L3_00/mbm_total_bytes" can be used
>>>
>>> I am not sure about this. This is specific to domain 0. This group can
>>> have cpus from multiple domains. I think we should have the interface for
>>> all the domains(not for specific domain).
>>
>> Are the ABMC registers not per CPU?
>> This is unclear to me at this time
>> since changelog of patch #13 states it is per-CPU but yet the code
>> uses smp_call_function_any().
>
> Here are the clarifications from hardware engineer about this.
>
> # While configuring the counter, should we have to write (L3_QOS_ABMC_CFG)
> on all the logical processors in a domain?
>
> No. In order to configure a specific counter, you only need to write it
> on a single logical processor in a domain. Configuring the actual ABMC
> counter is a side-effect of the write to this register. And the actual
> ABMC counter configuration is a global state.
>
> "Each logical processor implements a separate copy of these registers"
> identifies that if you write a 5 to L3_QOS_ABMC_CFG on C0T0, you will not
> read a 5 from the L3_QOS_ABMC_CFG register on C1T0.

Thank you for this information. Would reading L3_QOS_ABMC_DSC register on
C1T0 return the configuration written to L3_QOS_ABMC_CFG on C0T0 ?

Even so, you do confirm that the counter configuration is per domain. If I
understand correctly the implementation in this series assumes the counters
are programmed identically on all domains, but theoretically the system can support
domains with different counter configurations. For example, if a resource group
is limited to CPUs in one domain it would be unnecessary to consume the other
domain's counters.

This also ties into what this feature may morph into when considering the
non-ABMC AMD hardware needing similar interface as well as MPAM. I understand
for MPAM that resources are required for a counter but I do not know their
scope.

Reinette
Hi Reinette,

On 12/7/2023 1:29 PM, Reinette Chatre wrote:
> Hi Babu,
>
> On 12/7/2023 8:12 AM, Moger, Babu wrote:
>> On 12/6/23 12:49, Reinette Chatre wrote:
>>> On 12/6/2023 7:40 AM, Moger, Babu wrote:
>>>> On 12/5/23 17:17, Reinette Chatre wrote:
>>>>> On 11/30/2023 4:57 PM, Babu Moger wrote:
>
>>>>>> b. Mount with ABMC support
>>>>>> #umount /sys/fs/resctrl/
>>>>>> #mount -o abmc -t resctrl resctrl /sys/fs/resctrl/
>>>>>>
>>>>> hmmm ... so this requires the user to mount resctrl, determine if the
>>>>> feature is supported, unmount resctrl, remount resctrl with feature enabled.
>>>>> Could you please elaborate what prevents this feature from being enabled
>>>>> without needing to remount resctrl?
>>>> Spec says
>>>> "Enabling ABMC: ABMC is enabled by setting L3_QOS_EXT_CFG.ABMC_En=1 (see
>>>> Figure 19-7). When the state of ABMC_En is changed, it must be changed to
>>>> the updated value on all logical processors in the QOS Domain.
>>>> Upon transitions of the ABMC_En the following actions take place:
>>>> All ABMC assignable bandwidth counters are reset to 0.
>>>> The L3 default mode bandwidth counters are reset to 0.
>>>> The L3_QOS_ABMC_CFG MSR is reset to 0."
>>>>
>>>> So, all the monitoring group counters will be reset.
>>>>
>>>> It is technically possible to enable without remount. But ABMC mode
>>>> requires few new files(in each group) which I added when mounted with "-o
>>>> abmc". Thought it is a better option.
>>>>
>>>> Otherwise we need to add these files when ABMC is supported(not when
>>>> enabled). Need to add another file in /sys/fs/resctrl/info/L3_MON to
>>>> enable the feature on the fly.
>>>>
>>>> Both are acceptable options. Any thoughts?
>>> The new resctrl files in info/ could always be present. For example,
>>> user space may want to know how many counters are available before
>>> enabling the feature.
>>>
>>> It is not yet obvious to me that this feature requires new files
>>> in monitor groups.
>> There are two MBM events(total and local) in each group.
>> We should provide an interface to assign each event independently.
>> User can assign only one event in a group. We should also provide an
>> option assign both the events in the group. This needs to be done at
>> resctrl group level.
> Understood. I would like to start by considering how (if at all) existing
> files may be used, thus my example of using mbm_total_bytes, before adding
> more files.
>
>
> ...
>
>>>>>> #cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_bytes
>>>>>> Unavailable
>>>>> I believe that "Unavailable" already has an accepted meaning within current
>>>>> interface and is associated with temporary failure. Even the AMD spec states "This
>>>>> is generally a temporary condition and subsequent reads may succeed". In the
>>>>> scenario above there is no chance that this counter would produce a value later.
>>>>> I do not think it is ideal to overload existing interface with different meanings
>>>>> associated with a new hardware specific feature ... something like "Disabled" seems
>>>>> more appropriate.
>>>> Hardware still reports it as unavailable. Also, there are some error cases
>>>> hardware can report unavailable. We may not be able to differentiate that.
>>> This highlights that this resctrl feature is currently latched to AMD's
>>> ABMC. I do not think we should require that this resctrl feature is backed
>>> by hardware that can support reads of counters that are disabled. A counter
>>> read really only needs to be sent to hardware if it is enabled.
>>>
>>>>> Considering this we may even consider using these files themselves as a
>>>>> way to enable the counters if they are disabled. For example, just
>>>>> "echo 1 > /sys/fs/resctrl/mon_data/mon_L3_00/mbm_total_bytes" can be used
>>>> I am not sure about this. This is specific to domain 0. This group can
>>>> have cpus from multiple domains.
>>>> I think we should have the interface for
>>>> all the domains(not for specific domain).
>>> Are the ABMC registers not per CPU? This is unclear to me at this time
>>> since changelog of patch #13 states it is per-CPU but yet the code
>>> uses smp_call_function_any().
>> Here are the clarifications from hardware engineer about this.
>>
>> # While configuring the counter, should we have to write (L3_QOS_ABMC_CFG)
>> on all the logical processors in a domain?
>>
>> No. In order to configure a specific counter, you only need to write it
>> on a single logical processor in a domain. Configuring the actual ABMC
>> counter is a side-effect of the write to this register. And the actual
>> ABMC counter configuration is a global state.
>>
>> "Each logical processor implements a separate copy of these registers"
>> identifies that if you write a 5 to L3_QOS_ABMC_CFG on C0T0, you will not
>> read a 5 from the L3_QOS_ABMC_CFG register on C1T0.
> Thank you for this information. Would reading L3_QOS_ABMC_DSC register on
> C1T0 return the configuration written to L3_QOS_ABMC_CFG on C0T0 ?

Yes. Because the counter configuration is global. Reading
L3_QOS_ABMC_DSC will return the configuration of the counter specified by
QOS_ABMC_CFG[CtrID].

>
> Even so, you do confirm that the counter configuration is per domain. If I
> understand correctly the implementation in this series assumes the counters
> are programmed identically on all domains, but theoretically the system can support
> domains with different counter configurations. For example, if a resource group
> is limited to CPUs in one domain it would be unnecessary to consume the other
> domain's counters.

Yes. It is programmed on all the domains. Separating the domain
configuration will require more changes. I am not planning to address in
this series.

>
> This also ties into what this feature may morph into when considering the
> non-ABMC AMD hardware needing similar interface as well as MPAM.
I understand > for MPAM that resources are required for a counter but I do not know their > scope. > > Reinette
Hi Babu, On 12/7/2023 3:07 PM, Moger, Babu wrote: > On 12/7/2023 1:29 PM, Reinette Chatre wrote: >> On 12/7/2023 8:12 AM, Moger, Babu wrote: >>> On 12/6/23 12:49, Reinette Chatre wrote: >>>> On 12/6/2023 7:40 AM, Moger, Babu wrote: >>>>> On 12/5/23 17:17, Reinette Chatre wrote: >>>>>> On 11/30/2023 4:57 PM, Babu Moger wrote: >> >>>>>>> b. Mount with ABMC support >>>>>>> #umount /sys/fs/resctrl/ >>>>>>> #mount -o abmc -t resctrl resctrl /sys/fs/resctrl/ >>>>>>> >>>>>> hmmm ... so this requires the user to mount resctrl, determine if the >>>>>> feature is supported, unmount resctrl, remount resctrl with feature enabled. >>>>>> Could you please elaborate what prevents this feature from being enabled >>>>>> without needing to remount resctrl? >>>>> Spec says >>>>> "Enabling ABMC: ABMC is enabled by setting L3_QOS_EXT_CFG.ABMC_En=1 (see >>>>> Figure 19-7). When the state of ABMC_En is changed, it must be changed to >>>>> the updated value on all logical processors in the QOS Domain. >>>>> Upon transitions of the ABMC_En the following actions take place: >>>>> All ABMC assignable bandwidth counters are reset to 0. >>>>> The L3 default mode bandwidth counters are reset to 0. >>>>> The L3_QOS_ABMC_CFG MSR is reset to 0." >>>>> >>>>> So, all the monitoring group counters will be reset. >>>>> >>>>> It is technically possible to enable without remount. But ABMC mode >>>>> requires few new files(in each group) which I added when mounted with "-o >>>>> abmc". Thought it is a better option. >>>>> >>>>> Otherwise we need to add these files when ABMC is supported(not when >>>>> enabled). Need to add another file in /sys/fs/resctrl/info/L3_MON to >>>>> enable the feature on the fly. >>>>> >>>>> Both are acceptable options. Any thoughts? >>>> The new resctrl files in info/ could always be present. For example, >>>> user space may want to know how many counters are available before >>>> enabling the feature. 
>>>> >>>> It is not yet obvious to me that this feature requires new files >>>> in monitor groups. >>> There are two MBM events(total and local) in each group. >>> We should provide an interface to assign each event independently. >>> User can assign only one event in a group. We should also provide an >>> option assign both the events in the group. This needs to be done at >>> resctrl group level. >> Understood. I would like to start by considering how (if at all) existing >> files may be used, thus my example of using mbm_total_bytes, before adding >> more files. >> >> >> ... >> >>>>>>> #cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_bytes >>>>>>> Unavailable >>>>>> I believe that "Unavailable" already has an accepted meaning within current >>>>>> interface and is associated with temporary failure. Even the AMD spec states "This >>>>>> is generally a temporary condition and subsequent reads may succeed". In the >>>>>> scenario above there is no chance that this counter would produce a value later. >>>>>> I do not think it is ideal to overload existing interface with different meanings >>>>>> associated with a new hardware specific feature ... something like "Disabled" seems >>>>>> more appropriate. >>>>> Hardware still reports it as unavailable. Also, there are some error cases >>>>> hardware can report unavailable. We may not be able to differentiate that. >>>> This highlights that this resctrl feature is currently latched to AMD's >>>> ABMC. I do not think we should require that this resctrl feature is backed >>>> by hardware that can support reads of counters that are disabled. A counter >>>> read really only needs to be sent to hardware if it is enabled. >>>> >>>>>> Considering this we may even consider using these files themselves as a >>>>>> way to enable the counters if they are disabled. For example, just >>>>>> "echo 1 > /sys/fs/resctrl/mon_data/mon_L3_00/mbm_total_bytes" can be used >>>>> I am not sure about this. This is specific to domain 0. 
This group can >>>>> have cpus from multiple domains. I think we should have the interface for >>>>> all the domains(not for specific domain). >>>> Are the ABMC registers not per CPU? This is unclear to me at this time >>>> since changelog of patch #13 states it is per-CPU but yet the code >>>> uses smp_call_function_any(). >>> Here are the clarifications from hardware engineer about this. >>> >>> # While configuring the counter, should we have to write (L3_QOS_ABMC_CFG) >>> on all the logical processors in a domain? >>> >>> No. In order to configure a specific counter, you only need to write it >>> on a single logical processor in a domain. Configuring the actual ABMC >>> counter is a side-effect of the write to this register. And the actual >>> ABMC counter configuration is a global state. >>> >>> "Each logical processor implements a separate copy of these registers" >>> identifies that if you write a 5 to L3_QOS_ABMC_CFG on C0T0, you will not >>> read a 5 from the L3_QOS_ABMC_CFG register on C1T0. >> Thank you for this information. Would reading L3_QOS_ABMC_DSC register on >> C1T0 return the configuration written to L3_QOS_ABMC_CFG on C0T0 ? > > Yes. Because the counter counter configuration is global. Reading L3_QOS_ABMC_DSC will return the configuration of the counter specified by > > QOS_ABMC_CFG[CtrID]. To confirm, when you say "global" you mean within a domain? > >> >> Even so, you do confirm that the counter configuration is per domain. If I >> understand correctly the implementation in this series assumes the counters >> are programmed identically on all domains, but theoretically the system can support >> domains with different counter configurations. For example, if a resource group >> is limited to CPUs in one domain it would be unnecessary to consume the other >> domain's counters. > Yes. It is programmed on all the domains. Separating the domain > configuration will require more changes. I am not planning to address > in this series. That may be ok. 
The priority is to consider how users want to interact with this feature and create a suitable interface to support this. This version may not separate domain configuration, but we do not want to create an interface that prevents such an enhancement in the future, especially since it is already known that the hardware supports it.

Reinette
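The spec text quoted in this message describes what happens when ABMC_En changes state. As a rough illustration only, the toy model below (Python, with hypothetical names such as `MonitorDomain` and `set_abmc`) mirrors the three listed resets. It assumes that writing the same value again is not a "transition" and therefore resets nothing; that is an interpretation of the quoted wording, not something the thread confirms.

```python
class MonitorDomain:
    """Toy model of one QOS domain's monitoring state (hypothetical names)."""

    def __init__(self, num_abmc_counters):
        self.abmc_enabled = False
        # Assignable bandwidth counters available in ABMC mode.
        self.abmc_counters = [0] * num_abmc_counters
        # Default-mode bandwidth counters, keyed by RMID.
        self.default_counters = {}
        # Stand-in for the L3_QOS_ABMC_CFG MSR contents.
        self.abmc_cfg = 0

    def set_abmc(self, enable):
        """Change ABMC_En; return True if a transition (and reset) happened."""
        if enable == self.abmc_enabled:
            # Assumption: no state change means no transition and no reset.
            return False
        self.abmc_enabled = enable
        # Per the quoted spec text, a transition resets all three items.
        self.abmc_counters = [0] * len(self.abmc_counters)
        self.default_counters = {rmid: 0 for rmid in self.default_counters}
        self.abmc_cfg = 0
        return True


if __name__ == "__main__":
    dom = MonitorDomain(num_abmc_counters=4)
    dom.default_counters = {1: 1000}
    print(dom.set_abmc(True), dom.default_counters)
```

In this model every switch between modes discards the accumulated counts, which is the user-visible cost to weigh when deciding between a mount option and a runtime toggle.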
Hi Reinette, On 12/7/2023 5:26 PM, Reinette Chatre wrote: > Hi Babu, > > On 12/7/2023 3:07 PM, Moger, Babu wrote: >> On 12/7/2023 1:29 PM, Reinette Chatre wrote: >>> On 12/7/2023 8:12 AM, Moger, Babu wrote: >>>> On 12/6/23 12:49, Reinette Chatre wrote: >>>>> On 12/6/2023 7:40 AM, Moger, Babu wrote: >>>>>> On 12/5/23 17:17, Reinette Chatre wrote: >>>>>>> On 11/30/2023 4:57 PM, Babu Moger wrote: >>>>>>>> b. Mount with ABMC support >>>>>>>> #umount /sys/fs/resctrl/ >>>>>>>> #mount -o abmc -t resctrl resctrl /sys/fs/resctrl/ >>>>>>>> >>>>>>> hmmm ... so this requires the user to mount resctrl, determine if the >>>>>>> feature is supported, unmount resctrl, remount resctrl with feature enabled. >>>>>>> Could you please elaborate what prevents this feature from being enabled >>>>>>> without needing to remount resctrl? >>>>>> Spec says >>>>>> "Enabling ABMC: ABMC is enabled by setting L3_QOS_EXT_CFG.ABMC_En=1 (see >>>>>> Figure 19-7). When the state of ABMC_En is changed, it must be changed to >>>>>> the updated value on all logical processors in the QOS Domain. >>>>>> Upon transitions of the ABMC_En the following actions take place: >>>>>> All ABMC assignable bandwidth counters are reset to 0. >>>>>> The L3 default mode bandwidth counters are reset to 0. >>>>>> The L3_QOS_ABMC_CFG MSR is reset to 0." >>>>>> >>>>>> So, all the monitoring group counters will be reset. >>>>>> >>>>>> It is technically possible to enable without remount. But ABMC mode >>>>>> requires few new files(in each group) which I added when mounted with "-o >>>>>> abmc". Thought it is a better option. >>>>>> >>>>>> Otherwise we need to add these files when ABMC is supported(not when >>>>>> enabled). Need to add another file in /sys/fs/resctrl/info/L3_MON to >>>>>> enable the feature on the fly. >>>>>> >>>>>> Both are acceptable options. Any thoughts? >>>>> The new resctrl files in info/ could always be present. 
For example, >>>>> user space may want to know how many counters are available before >>>>> enabling the feature. >>>>> >>>>> It is not yet obvious to me that this feature requires new files >>>>> in monitor groups. >>>> There are two MBM events(total and local) in each group. >>>> We should provide an interface to assign each event independently. >>>> User can assign only one event in a group. We should also provide an >>>> option assign both the events in the group. This needs to be done at >>>> resctrl group level. >>> Understood. I would like to start by considering how (if at all) existing >>> files may be used, thus my example of using mbm_total_bytes, before adding >>> more files. >>> >>> >>> ... >>> >>>>>>>> #cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_bytes >>>>>>>> Unavailable >>>>>>> I believe that "Unavailable" already has an accepted meaning within current >>>>>>> interface and is associated with temporary failure. Even the AMD spec states "This >>>>>>> is generally a temporary condition and subsequent reads may succeed". In the >>>>>>> scenario above there is no chance that this counter would produce a value later. >>>>>>> I do not think it is ideal to overload existing interface with different meanings >>>>>>> associated with a new hardware specific feature ... something like "Disabled" seems >>>>>>> more appropriate. >>>>>> Hardware still reports it as unavailable. Also, there are some error cases >>>>>> hardware can report unavailable. We may not be able to differentiate that. >>>>> This highlights that this resctrl feature is currently latched to AMD's >>>>> ABMC. I do not think we should require that this resctrl feature is backed >>>>> by hardware that can support reads of counters that are disabled. A counter >>>>> read really only needs to be sent to hardware if it is enabled. >>>>> >>>>>>> Considering this we may even consider using these files themselves as a >>>>>>> way to enable the counters if they are disabled. 
For example, just >>>>>>> "echo 1 > /sys/fs/resctrl/mon_data/mon_L3_00/mbm_total_bytes" can be used >>>>>> I am not sure about this. This is specific to domain 0. This group can >>>>>> have cpus from multiple domains. I think we should have the interface for >>>>>> all the domains(not for specific domain). >>>>> Are the ABMC registers not per CPU? This is unclear to me at this time >>>>> since changelog of patch #13 states it is per-CPU but yet the code >>>>> uses smp_call_function_any(). >>>> Here are the clarifications from hardware engineer about this. >>>> >>>> # While configuring the counter, should we have to write (L3_QOS_ABMC_CFG) >>>> on all the logical processors in a domain? >>>> >>>> No. In order to configure a specific counter, you only need to write it >>>> on a single logical processor in a domain. Configuring the actual ABMC >>>> counter is a side-effect of the write to this register. And the actual >>>> ABMC counter configuration is a global state. >>>> >>>> "Each logical processor implements a separate copy of these registers" >>>> identifies that if you write a 5 to L3_QOS_ABMC_CFG on C0T0, you will not >>>> read a 5 from the L3_QOS_ABMC_CFG register on C1T0. >>> Thank you for this information. Would reading L3_QOS_ABMC_DSC register on >>> C1T0 return the configuration written to L3_QOS_ABMC_CFG on C0T0 ? >> Yes. Because the counter counter configuration is global. Reading L3_QOS_ABMC_DSC will return the configuration of the counter specified by >> >> QOS_ABMC_CFG[CtrID]. > > To confirm, when you say "global" you mean within a domain? Yes. That is correct. > >>> Even so, you do confirm that the counter configuration is per domain. If I >>> understand correctly the implementation in this series assumes the counters >>> are programmed identically on all domains, but theoretically the system can support >>> domains with different counter configurations. 
For example, if a resource group
>>> is limited to CPUs in one domain it would be unnecessary to consume the other
>>> domain's counters.
>> Yes. It is programmed on all the domains. Separating the domain
>> configuration will require more changes. I am not planning to address
>> in this series.
> That may be ok. The priority is to consider how users want to interact with this
> feature and create a suitable interface to support this. This version may not
> separate domain configuration, but we do not want to create an interface that
> prevents such an enhancement in the future. Especially since it is already known
> that hardware supports it.

Yes. Understood.

Thanks
Babu
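The register semantics clarified in this exchange can be pictured with a small toy model. The sketch below is Python written only for illustration; `QosDomain`, `write_cfg`, `read_cfg` and `read_dsc` are hypothetical names, the configuration payload is an opaque string, and the `ctr_id` argument of `read_dsc` stands in for QOS_ABMC_CFG[CtrID]. It models the thread's description, not the actual MSR layout or the kernel code.

```python
class QosDomain:
    """Toy model of one L3 QOS domain (all names are hypothetical)."""

    def __init__(self, cpus):
        # Each logical processor has its own copy of L3_QOS_ABMC_CFG.
        self.cfg_copy = {cpu: None for cpu in cpus}
        # The counter configuration is shared by every CPU in the domain.
        self.counter_cfg = {}

    def write_cfg(self, cpu, ctr_id, cfg):
        """Model a write to L3_QOS_ABMC_CFG on a single logical processor."""
        self.cfg_copy[cpu] = (ctr_id, cfg)
        # Side effect of the write: the domain-wide counter is configured.
        self.counter_cfg[ctr_id] = cfg

    def read_cfg(self, cpu):
        """Return only what was written on this particular CPU."""
        return self.cfg_copy[cpu]

    def read_dsc(self, cpu, ctr_id):
        """Model L3_QOS_ABMC_DSC: configuration of the selected counter."""
        if cpu not in self.cfg_copy:
            raise KeyError(f"{cpu} is not part of this domain")
        return self.counter_cfg.get(ctr_id)


if __name__ == "__main__":
    domain0 = QosDomain(["C0T0", "C1T0"])
    domain1 = QosDomain(["C8T0"])
    domain0.write_cfg("C0T0", ctr_id=5, cfg="rmid=1,total")
    print(domain0.read_cfg("C1T0"))            # separate per-CPU copy
    print(domain0.read_dsc("C1T0", ctr_id=5))  # shared within the domain
    print(domain1.read_dsc("C8T0", ctr_id=5))  # other domain is untouched
```

Keeping one `QosDomain` object per domain reflects the confirmation that "global" means within a domain, so different domains could in principle carry different counter configurations.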
On Tue, Dec 5, 2023 at 3:17 PM Reinette Chatre <reinette.chatre@intel.com> wrote:
> On 11/30/2023 4:57 PM, Babu Moger wrote:
> > c. Read the monitor states. There will be new file "monitor_state"
> > for each monitor group when ABMC feature is enabled. By default,
> > both total and local MBM events are in "unassign" state.
> >
> > #cat /sys/fs/resctrl/monitor_state
> > total=unassign;local=unassign
> >
> > d. Read the event mbm_total_bytes and mbm_local_bytes. Note that MBA
> > events are not available until the user assigns the events explicitly.
> > Users need to assign the counters to monitor the events in this mode.
> >
> > #cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_total_bytes
> > Unavailable
>
> How is the llc_occupancy event impacted when ABMC is enabled? Can all RMIDs
> still be used to track cache occupancy?
>
> >
> > #cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_bytes
> > Unavailable
>
> I believe that "Unavailable" already has an accepted meaning within current
> interface and is associated with temporary failure. Even the AMD spec states "This
> is generally a temporary condition and subsequent reads may succeed". In the
> scenario above there is no chance that this counter would produce a value later.
> I do not think it is ideal to overload existing interface with different meanings
> associated with a new hardware specific feature ... something like "Disabled" seems
> more appropriate.

Could we hide event counter files if they're not enabled? Is there value in displaying the value of a non-running counter that will be reset the next time it's enabled?

>
> Considering this we may even consider using these files themselves as a
> way to enable the counters if they are disabled. For example, just
> "echo 1 > /sys/fs/resctrl/mon_data/mon_L3_00/mbm_total_bytes" can be used
> to enable this counter. No need for a new "monitor_state". Please note that this
> is not an official proposal since there are two other use cases that still need to
> be considered as we await James's feedback on how this may work for MPAM and
> also how this may be useful on AMD hardware that does not support ABMC but
> users may want to get similar benefits ([1])

We plan to use the ABMC counters as a window to sample the MB/s rate of a very large number of groups, so there's a serious concern about the number of write syscalls this will take, as they will add up quickly for a large RMID*domain count.

To that end, the ideal would be the ability to re-assign all ABMC counters on all domains in a single system call.

-Peter
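To make the syscall concern concrete, here is a back-of-the-envelope sketch in Python. It assumes one write per (group, domain, event) file, as in the "echo 1 > .../mbm_total_bytes" idea quoted above, and compares that with a single batched write. The function name and the example sizes (256 groups, 16 domains) are hypothetical and chosen only for illustration; writes needed to unassign counters would come on top.

```python
def writes_per_pass(groups, domains, events_per_group=2, batched=False):
    """Count write() calls needed to assign counters once for every group.

    Assumes one write per (group, domain, event) file unless a single
    batched interface exists, in which case one write covers everything.
    """
    if batched:
        return 1
    return groups * domains * events_per_group


if __name__ == "__main__":
    # Hypothetical system: 256 monitor groups spread over 16 domains.
    print(writes_per_pass(256, 16))
    print(writes_per_pass(256, 16, batched=True))
```

The per-file count grows with the product of groups and domains, which is why a single-call interface matters for a sampling workload.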
Hi Peter,

On 12/8/2023 11:45 AM, Peter Newman wrote:
> On Tue, Dec 5, 2023 at 3:17 PM Reinette Chatre
> <reinette.chatre@intel.com> wrote:
>> On 11/30/2023 4:57 PM, Babu Moger wrote:
>>> c. Read the monitor states. There will be new file "monitor_state"
>>> for each monitor group when ABMC feature is enabled. By default,
>>> both total and local MBM events are in "unassign" state.
>>>
>>> #cat /sys/fs/resctrl/monitor_state
>>> total=unassign;local=unassign
>>>
>>> d. Read the event mbm_total_bytes and mbm_local_bytes. Note that MBA
>>> events are not available until the user assigns the events explicitly.
>>> Users need to assign the counters to monitor the events in this mode.
>>>
>>> #cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_total_bytes
>>> Unavailable
>>
>> How is the llc_occupancy event impacted when ABMC is enabled? Can all RMIDs
>> still be used to track cache occupancy?
>>
>>>
>>> #cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_bytes
>>> Unavailable
>>
>> I believe that "Unavailable" already has an accepted meaning within current
>> interface and is associated with temporary failure. Even the AMD spec states "This
>> is generally a temporary condition and subsequent reads may succeed". In the
>> scenario above there is no chance that this counter would produce a value later.
>> I do not think it is ideal to overload existing interface with different meanings
>> associated with a new hardware specific feature ... something like "Disabled" seems
>> more appropriate.
>
> Could we hide event counter files if they're not enabled? Is there
> value in displaying the value of a non-running counter that will be
> reset the next time it's enabled?

It may be possible to hide the counter file when it is disabled, but in that case it is not clear to me how to communicate to user space that it is an available counter that can be enabled, and by hiding the file one mechanism to actually enable the counter is lost. It is not required to display a stale value when a counter is disabled; text like "Disabled" can be used.

>> Considering this we may even consider using these files themselves as a
>> way to enable the counters if they are disabled. For example, just
>> "echo 1 > /sys/fs/resctrl/mon_data/mon_L3_00/mbm_total_bytes" can be used
>> to enable this counter. No need for a new "monitor_state". Please note that this
>> is not an official proposal since there are two other use cases that still need to
>> be considered as we await James's feedback on how this may work for MPAM and
>> also how this may be useful on AMD hardware that does not support ABMC but
>> users may want to get similar benefits ([1])
>
> We plan to use the ABMC counters as a window to sample the MB/s rate
> of a very large number of groups, so there's a serious concern about
> the number of write syscalls this will take, as they will add up
> quickly for a large RMID*domain count.
>
> To that end, the ideal would be the ability to re-assign all ABMC
> counters on all domains in a single system call.

Understood. I've already pointed out that this is a use case needing to be considered. Please see [1] - search for "global enable/disable".

Reinette

[1] https://lore.kernel.org/lkml/e36699cf-c73e-401b-b770-63eba708df38@intel.com/
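The distinction between a temporary "Unavailable" and a permanent "Disabled" matters to user-space tools that poll these files. The Python sketch below shows how a reader might tell them apart. Note that "Disabled" is only the text suggested in this thread, not something resctrl prints today, and the function name and returned labels are hypothetical.

```python
def parse_mbm_reading(text):
    """Classify the contents of an mbm_*_bytes file.

    Returns a (kind, value) tuple where kind is one of:
      "value"    - a byte count was read
      "retry"    - "Unavailable": temporary, a later read may succeed
      "disabled" - "Disabled": proposed text, no counter is assigned
      "error"    - anything else, returned verbatim for the caller
    """
    value = text.strip()
    if value.isdigit():
        return ("value", int(value))
    if value == "Unavailable":
        return ("retry", None)
    if value == "Disabled":  # hypothetical, as proposed in this thread
        return ("disabled", None)
    return ("error", value)


if __name__ == "__main__":
    for sample in ("123456\n", "Unavailable\n", "Disabled\n", "Error\n"):
        print(repr(sample), parse_mbm_reading(sample))
```

With a single overloaded "Unavailable", the "retry" and "disabled" branches would collapse into one and a tool could not know whether polling again is worthwhile.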
Hi Reinette/Peter,

> -----Original Message-----
> From: Reinette Chatre <reinette.chatre@intel.com>
> Sent: Thursday, December 7, 2023 1:29 PM
> To: Moger, Babu <Babu.Moger@amd.com>; corbet@lwn.net;
> fenghua.yu@intel.com; tglx@linutronix.de; mingo@redhat.com;
> bp@alien8.de; dave.hansen@linux.intel.com; James Morse
> <james.morse@arm.com>
> Cc: x86@kernel.org; hpa@zytor.com; paulmck@kernel.org;
> rdunlap@infradead.org; tj@kernel.org; peterz@infradead.org;
> seanjc@google.com; Phillips, Kim <kim.phillips@amd.com>;
> jmattson@google.com; ilpo.jarvinen@linux.intel.com;
> jithu.joseph@intel.com; kan.liang@linux.intel.com; Dadhania, Nikunj
> <nikunj.dadhania@amd.com>; daniel.sneddon@linux.intel.com;
> pbonzini@redhat.com; rick.p.edgecombe@intel.com; rppt@kernel.org;
> maciej.wieczor-retman@intel.com; linux-doc@vger.kernel.org;
> linux-kernel@vger.kernel.org; eranian@google.com; peternewman@google.com;
> Giani, Dhaval <Dhaval.Giani@amd.com>
> Subject: Re: [PATCH 00/15] x86/resctrl : Support AMD QoS RMID Pinning
> feature
>
> Hi Babu,
>
> On 12/7/2023 8:12 AM, Moger, Babu wrote:
> > On 12/6/23 12:49, Reinette Chatre wrote:
> >> On 12/6/2023 7:40 AM, Moger, Babu wrote:
> >>> On 12/5/23 17:17, Reinette Chatre wrote:
> >>>> On 11/30/2023 4:57 PM, Babu Moger wrote:
> >
> >>>>> b. Mount with ABMC support
> >>>>> #umount /sys/fs/resctrl/
> >>>>> #mount -o abmc -t resctrl resctrl /sys/fs/resctrl/
> >>>>>
> >>>>
> >>>> hmmm ... so this requires the user to mount resctrl, determine if
> >>>> the feature is supported, unmount resctrl, remount resctrl with feature enabled.
> >>>> Could you please elaborate what prevents this feature from being
> >>>> enabled without needing to remount resctrl?
> >>>
> >>> Spec says
> >>> "Enabling ABMC: ABMC is enabled by setting L3_QOS_EXT_CFG.ABMC_En=1
> >>> (see Figure 19-7). When the state of ABMC_En is changed, it must be
> >>> changed to the updated value on all logical processors in the QOS Domain.
> >>> Upon transitions of the ABMC_En the following actions take place:
> >>> All ABMC assignable bandwidth counters are reset to 0.
> >>> The L3 default mode bandwidth counters are reset to 0.
> >>> The L3_QOS_ABMC_CFG MSR is reset to 0."
> >>>
> >>> So, all the monitoring group counters will be reset.
> >>>
> >>> It is technically possible to enable without remount. But ABMC mode
> >>> requires few new files(in each group) which I added when mounted
> >>> with "-o abmc". Thought it is a better option.
> >>>
> >>> Otherwise we need to add these files when ABMC is supported(not when
> >>> enabled). Need to add another file in /sys/fs/resctrl/info/L3_MON to
> >>> enable the feature on the fly.
> >>>
> >>> Both are acceptable options. Any thoughts?

I don't think we have concluded on this yet. I will remove the requirement to remount the filesystem to use ABMC. That way users can move back and forth between the modes without having to remount. We will need extra cleanup of state (data structures) when the user switches modes. I should be able to take care of that.

Thanks
Babu