Message ID: 20240130222034.37181-1-tony.luck@intel.com
From: Tony Luck <tony.luck@intel.com>
To: Fenghua Yu <fenghua.yu@intel.com>, Reinette Chatre <reinette.chatre@intel.com>, Peter Newman <peternewman@google.com>, Jonathan Corbet <corbet@lwn.net>, Shuah Khan <skhan@linuxfoundation.org>, x86@kernel.org
Cc: Shaopeng Tan <tan.shaopeng@fujitsu.com>, James Morse <james.morse@arm.com>, Jamie Iles <quic_jiles@quicinc.com>, Babu Moger <babu.moger@amd.com>, Randy Dunlap <rdunlap@infradead.org>, Drew Fustini <dfustini@baylibre.com>, linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, patches@lists.linux.dev, Tony Luck <tony.luck@intel.com>
Subject: [PATCH v15-RFC 0/8] Add support for Sub-NUMA cluster (SNC) systems
Date: Tue, 30 Jan 2024 14:20:26 -0800
In-Reply-To: <20240126223837.21835-1-tony.luck@intel.com>
References: <20240126223837.21835-1-tony.luck@intel.com>
Series: Add support for Sub-NUMA cluster (SNC) systems
Message
Luck, Tony
Jan. 30, 2024, 10:20 p.m. UTC
This is the re-worked version of this series that I promised to post
yesterday. Check that e-mail for the arguments for this alternate
approach.
https://lore.kernel.org/all/ZbhLRDvZrxBZDv2j@agluck-desk3/
Apologies to Drew Fustini, whom I'd somehow dropped from later versions
of this series. Drew: you had made a comment at one point that having
different scopes within a single resource may be useful on RISC-V.
Version 14 included that, but it's gone here. Maybe multiple resctrl
"struct resource" instances for a single h/w entity like L3, as I'm
doing in this version, could work for you too?
Patches 1-5 are almost completely rewritten based around the new
idea to give CMT and MBM their own "resource" instead of sharing
one with L3 CAT. This removes the need for separate domain lists,
and thus most of the churn of the previous version of this series.
Patches 6-8 are largely unchanged, but I removed all the Reviewed-by
and Tested-by tags since the patches are now built on a completely
different base.
Patches are against tip x86/cache:
fc747eebef73 ("x86/resctrl: Remove redundant variable in mbm_config_write_domain()")
The re-work doesn't affect the v14 cover letter, so I'm pasting it here:
The Sub-NUMA cluster feature on some Intel processors partitions the CPUs
that share an L3 cache into two or more sets. This plays havoc with the
Resource Director Technology (RDT) monitoring features. Prior to this
patch Intel has advised that SNC and RDT are incompatible.
Some of these CPUs support an MSR that can partition the RMID counters
in the same way. This allows the monitoring features to be used, with
the caveat that users must be aware that Linux may migrate tasks more
frequently between SNC nodes than between "regular" NUMA nodes, so
reading counters from all SNC nodes may be needed to get a complete
picture of activity for a task.
Cache and memory bandwidth allocation features continue to operate at
the scope of the L3 cache.
Signed-off-by: Tony Luck <tony.luck@intel.com>
Tony Luck (8):
x86/resctrl: Split the RDT_RESOURCE_L3 resource
x86/resctrl: Move all monitoring functions to RDT_RESOURCE_L3_MON
x86/resctrl: Prepare for non-cache-scoped resources
x86/resctrl: Add helper function to look up domain_id from scope
x86/resctrl: Add "NODE" as an option for resource scope
x86/resctrl: Introduce snc_nodes_per_l3_cache
x86/resctrl: Sub NUMA Cluster detection and enable
x86/resctrl: Update documentation with Sub-NUMA cluster changes
Documentation/arch/x86/resctrl.rst | 25 ++-
include/linux/resctrl.h | 10 +-
arch/x86/include/asm/msr-index.h | 1 +
arch/x86/kernel/cpu/resctrl/internal.h | 3 +
arch/x86/kernel/cpu/resctrl/core.c | 181 +++++++++++++++++++++-
arch/x86/kernel/cpu/resctrl/monitor.c | 28 ++--
arch/x86/kernel/cpu/resctrl/pseudo_lock.c | 6 +-
arch/x86/kernel/cpu/resctrl/rdtgroup.c | 12 +-
8 files changed, 236 insertions(+), 30 deletions(-)
base-commit: fc747eebef734563cf68a512f57937c8f231834a
Comments
Hi Tony,

On 1/30/2024 2:20 PM, Tony Luck wrote:
> This is the re-worked version of this series that I promised to post
> yesterday. Check that e-mail for the arguments for this alternate
> approach.
>
> https://lore.kernel.org/all/ZbhLRDvZrxBZDv2j@agluck-desk3/
>
> Apologies to Drew Fustini who I'd somehow dropped from later versions
> of this series. Drew: you had made a comment at one point that having
> different scopes within a single resource may be useful on RISC-V.
> Version 14 included that, but it's gone here. Maybe multiple resctrl
> "struct resource" for a single h/w entity like L3 as I'm doing in this
> version could work for you too?
>
> Patches 1-5 are almost completely rewritten based around the new
> idea to give CMT and MBM their own "resource" instead of sharing
> one with L3 CAT. This removes the need for separate domain lists,
> and thus most of the churn of the previous version of this series.

I do not see it as removing the need for separate domain lists, but
instead as keeping the idea of separate domain lists while duplicating
the resource in order to host the second domain list. This solution
also keeps the same structures for control and monitoring that the
previous version cleaned up [1]. To me this thus seems like a similar
solution to v14, but with additional duplication due to an extra
resource and without the cleanup.

Reinette

[1] https://lore.kernel.org/lkml/20240126223837.21835-5-tony.luck@intel.com/
On Tue, Jan 30, 2024 at 02:20:26PM -0800, Tony Luck wrote:
> This is the re-worked version of this series that I promised to post
> yesterday. Check that e-mail for the arguments for this alternate
> approach.
>
> https://lore.kernel.org/all/ZbhLRDvZrxBZDv2j@agluck-desk3/
>
> Apologies to Drew Fustini who I'd somehow dropped from later versions
> of this series. Drew: you had made a comment at one point that having
> different scopes within a single resource may be useful on RISC-V.
> Version 14 included that, but it's gone here. Maybe multiple resctrl
> "struct resource" for a single h/w entity like L3 as I'm doing in this
> version could work for you too?

Sorry for the latency. The RISC-V CBQRI specification [1] describes a
bandwidth controller register interface [2]. It allows a controller to
implement both bandwidth allocation and bandwidth usage monitoring.

The proof-of-concept resctrl implementation [3] that I worked on
created two domains for each memory controller in the example SoC. One
domain would contain the MBA resource and the other would contain the
L3 resource to represent MBM files like local_bytes:

  # cat /sys/fs/resctrl/schemata
  MB:4=80;6=80;8=80
  L2:0=0fff;1=0fff
  L3:2=ffff;3=0000;5=0000;7=0000

Where:

  Domain 0 is L2 cache controller 0 capacity allocation
  Domain 1 is L2 cache controller 1 capacity allocation
  Domain 2 is L3 cache controller capacity allocation
  Domain 4 is Memory controller 0 bandwidth allocation
  Domain 6 is Memory controller 1 bandwidth allocation
  Domain 8 is Memory controller 2 bandwidth allocation
  Domain 3 is Memory controller 0 bandwidth monitoring
  Domain 5 is Memory controller 1 bandwidth monitoring
  Domain 7 is Memory controller 2 bandwidth monitoring

I think this scheme is confusing but I wasn't able to find a better way
to do it at the time.

> Patches 1-5 are almost completely rewritten based around the new
> idea to give CMT and MBM their own "resource" instead of sharing
> one with L3 CAT. This removes the need for separate domain lists,
> and thus most of the churn of the previous version of this series.

Very interesting. Do you think I would be able to create MBM files for
each memory controller without creating pointless L3 domains that show
up in schemata?

Thanks,
Drew

[1] https://github.com/riscv-non-isa/riscv-cbqri/releases/tag/v1.0-rc1
[2] https://github.com/riscv-non-isa/riscv-cbqri/blob/main/qos_bandwidth.adoc
[3] https://lore.kernel.org/linux-riscv/20230419111111.477118-1-dfustini@baylibre.com/
> > Patches 1-5 are almost completely rewritten based around the new
> > idea to give CMT and MBM their own "resource" instead of sharing
> > one with L3 CAT. This removes the need for separate domain lists,
> > and thus most of the churn of the previous version of this series.
>
> Very interesting. Do you think I would be able to create MBM files for
> each memory controller without creating pointless L3 domains that show
> up in schemata?

Entries only show up in the schemata file for resources that are
"alloc_capable". So you should be able to make as many rdt_hw_resource
structures as you need that are "mon_capable", but not "alloc_capable"
... though making more than one such resource may explore untested
areas of the code since there has historically only been one
mon_capable resource. It looks like the resource id from the "rid"
field is passed through to the "show" functions for MBM and CQM.

This patch series splits the one resource that is marked as both
mon_capable and alloc_capable into two. Maybe that's a useful cleanup,
but maybe not a requirement for what you need.

-Tony
Hi Tony,

On 1/30/24 16:20, Tony Luck wrote:
> This is the re-worked version of this series that I promised to post
> yesterday. Check that e-mail for the arguments for this alternate
> approach.

To be honest, I like this series more than the previous series. I
always thought RDT_RESOURCE_L3_MON should have been a separate resource
by itself.

You need to separate the domain lists for RDT_RESOURCE_L3 and
RDT_RESOURCE_L3_MON if you are going this route. I didn't see that in
this series. Also I have a few other comments as well.

Thanks,
Babu
On Fri, Feb 09, 2024 at 09:27:56AM -0600, Moger, Babu wrote:
> Hi Tony,
>
> On 1/30/24 16:20, Tony Luck wrote:
> > This is the re-worked version of this series that I promised to post
> > yesterday. Check that e-mail for the arguments for this alternate
> > approach.
>
> To be honest, I like this series more than the previous series. I always
> thought RDT_RESOURCE_L3_MON should have been a separate resource by itself.
>
> You need to separate the domain lists for RDT_RESOURCE_L3 and
> RDT_RESOURCE_L3_MON if you are going this route. I didn't see that in this
> series. Also I have few other comments as well.

They are separated. Each "struct rdt_resource" has its own domain list.

Or do you mean break up the struct rdt_domain into the control and
monitor versions as was done in the previous series?

> Thanks
> Babu
On 2/9/24 12:31, Tony Luck wrote:
> On Fri, Feb 09, 2024 at 09:27:56AM -0600, Moger, Babu wrote:
>> Hi Tony,
>>
>> On 1/30/24 16:20, Tony Luck wrote:
>>> This is the re-worked version of this series that I promised to post
>>> yesterday. Check that e-mail for the arguments for this alternate
>>> approach.
>>
>> To be honest, I like this series more than the previous series. I always
>> thought RDT_RESOURCE_L3_MON should have been a separate resource by itself.
>>
>> You need to separate the domain lists for RDT_RESOURCE_L3 and
>> RDT_RESOURCE_L3_MON if you are going this route. I didn't see that in this
>> series. Also I have few other comments as well.
>
> They are separated. Each "struct rdt_resource" has its own domain list.

Yea. You are right.

> Or do you mean break up the struct rdt_domain into the control and
> monitor versions as was done in the previous series?

No. Not required. Each resource has its own domain list. So, it is
separated already as far as I can see.

Reinette seems to have some concerns about this series, but I am fine
with both these approaches. I feel this is the cleaner approach.
On 2/9/2024 11:38 AM, Moger, Babu wrote:
> On 2/9/24 12:31, Tony Luck wrote:
>> On Fri, Feb 09, 2024 at 09:27:56AM -0600, Moger, Babu wrote:
>>> On 1/30/24 16:20, Tony Luck wrote:
>
> Reinette seem to have some concerns about this series. But, I am fine with
> both these approaches. I feel this is more clean approach.

I questioned the motivation but never received a response.

Reinette
>> Reinette seem to have some concerns about this series. But, I am fine with
>> both these approaches. I feel this is more clean approach.
>
> I questioned the motivation but never received a response.

Reinette,

Sorry. My motivation was to reduce the amount of code churn done by the
previous incarnation:

  9 files changed, 629 insertions(+), 282 deletions(-)

Vast amounts of that just added "_mon" or "_ctrl" to structure or
variable names.

-Tony
Hi Tony,

On 2/9/2024 1:36 PM, Luck, Tony wrote:
>>> Reinette seem to have some concerns about this series. But, I am fine with
>>> both these approaches. I feel this is more clean approach.
>>
>> I questioned the motivation but never received a response.
>
> Reinette,
>
> Sorry. My motivation was to reduce the amount of code churn that
> was done in the by the previous incarnation.
>
> 9 files changed, 629 insertions(+), 282 deletions(-)
>
> Vast amounts of that just added "_mon" or "_ctrl" to structure
> or variable names.

I actually had specific points that this response also ignores.
Let me repeat and highlight the same points:

1) You claim that this series "removes the need for separate domain
lists" ... but then this series does just that (create a separate
domain list), but in an obfuscated way (duplicate the resource to
have the monitoring domain list in there).

2) You claim this series "reduces amount of code churn", but this is
because this series keeps using the same original data structures
for separate monitoring and control usages. The previous series made
an effort to separate the structures for the different usages
but this series does not. What makes it ok in this series to
use the same data structures for different usages?

Additionally:

Regarding "Vast amounts of that just added "_mon" or "_ctrl" to
structure or variable names." ... that is because the structures are
actually split, no? It is not just renaming for unnecessary churn.

What is the benefit of keeping the data structures to be shared
between monitor and control usages?

If there is a benefit to keeping these data structures, why not just
address this aspect in previous solution?

Reinette
> I actually had specific points that this response also ignores.
> Let me repeat and highlight the same points:
>
> 1) You claim that this series "removes the need for separate domain
> lists" ... but then this series does just that (create a separate
> domain list), but in an obfuscated way (duplicate the resource to
> have the monitoring domain list in there).

That was poorly worded on my part. I should have said "removes the need
for separate domain lists within a single rdt_resource".

Adding an extra domain list to a resource may be the start of a
slippery slope. What if there is some additional "L3"-like resctrl
operation that acts at the socket level (Intel has made products with
multiple L3 instances per socket before)? Would you be OK adding a
third domain list to every struct rdt_resource to handle this? Or would
it be simpler to just add a new rdt_resource structure with
socket-scoped domains?

> 2) You claim this series "reduces amount of code churn", but this is
> because this series keeps using the same original data structures
> for separate monitoring and control usages. The previous series made
> an effort to separate the structures for the different usages
> but this series does not. What makes it ok in this series to
> use the same data structures for different usages?

Legacy resctrl has been using the same rdt_domain structure for both
usages since the dawn of time. So it has been OK up until now.

> Additionally:
>
> Regarding "Vast amounts of that just added "_mon" or "_ctrl" to
> structure or variable names." ... that is because the structures are
> actually split, no? It is not just renaming for unnecessary churn.

Perhaps not "unnecessary" churn. But certainly a lot of code change for
what I perceive as very little real gain.

> What is the benefit of keeping the data structures to be shared
> between monitor and control usages?

Benefit is no code changes. Cost is continuing to waste memory with
structures that are slightly bigger than they need to be.

> If there is a benefit to keeping these data structures, why not just
> address this aspect in previous solution?

The previous solution evolved to splitting these structures. But this
happened incrementally (remember that at an early stage the monitor
structures all got the "_mon" addition to their names, but the control
structures kept the original names). Only when I got to the end of this
process did I look at the magnitude of the change.

-Tony
Hi Tony,

On 2/9/2024 3:44 PM, Luck, Tony wrote:
>> I actually had specific points that this response also ignores.
>> Let me repeat and highlight the same points:
>>
>> 1) You claim that this series "removes the need for separate domain
>> lists" ... but then this series does just that (create a separate
>> domain list), but in an obfuscated way (duplicate the resource to
>> have the monitoring domain list in there).
>
> That was poorly worded on my part. I should have said "removes the
> need for separate domain lists within a single rdt_resource".
>
> Adding an extra domain list to a resource may be the start of a slippery
> slope. What if there is some additional "L3"-like resctrl operation that
> acts at the socket level (Intel has made products with multiple L3
> instances per socket before). Would you be OK add a third domain
> list to every struct rdt_resource to handle this? Or would it be simpler
> to just add a new rdt_resource structure with socket scoped domains?

This should not be about what is simplest to patch into current
resctrl.

There is no need to support a new domain list for a new scope. The
domain lists support the functionality: control or monitoring. If
control has socket scope the existing implementation supports that.

If there is another operation supported by a resource apart from
control or monitoring then we can consider how to support it when we
know what it is. That would also be a great point to decide if the same
data structure should just grow to support an operation that not all
resources may support. That may depend on the amount of data needed to
support this hypothetical operation.

>> 2) You claim this series "reduces amount of code churn", but this is
>> because this series keeps using the same original data structures
>> for separate monitoring and control usages. The previous series made
>> an effort to separate the structures for the different usages
>> but this series does not. What makes it ok in this series to
>> use the same data structures for different usages?
>
> Legacy resctrl has been using the same rdt_domain structure for both
> usages since the dawn of time. So it has been OK up until now.

This is not the same.

Legacy resctrl uses the same data structure in the same list for both
control and monitoring usages so it is fine to have both monitoring and
control data in the data structure.

What you are doing in both solutions is to place the same data
structure in separate lists for control and monitoring usages. In the
one list only the control data is used, on the other only the
monitoring data is used.

>> Additionally:
>>
>> Regarding "Vast amounts of that just added "_mon" or "_ctrl" to structure
>> or variable names." ... that is because the structures are actually split,
>> no? It is not just renaming for unnecessary churn.
>
> Perhaps not "unnecessary" churn. But certainly a lot of code change for
> what I perceive as very little real gain.

ok. There may be little gain wrt saving space. One complication with
this single data structure is that its content may only be decided
based on which list it is part of. It should be obvious to developers
when which members are valid. Perhaps this can be addressed with clear
documentation of the data structures.

>> What is the benefit of keeping the data structures to be shared
>> between monitor and control usages?
>
> Benefit is no code changes. Cost is continuing to waste memory with
> structures that are slightly bigger than they need to be.
>
>> If there is a benefit to keeping these data structures, why not just
>> address this aspect in previous solution?
>
> The previous solution evolved to splitting these structures. But this
> happened incrementally (remember that at an early stage the monitor
> structures all got the "_mon" addition to their names, but the control
> structures kept the original names). Only when I got to the end of this
> process did I look at the magnitude of the change.

Not answering my question.

Reinette
> >> I actually had specific points that this response also ignores.
> >> Let me repeat and highlight the same points:
> >>
> >> 1) You claim that this series "removes the need for separate domain
> >> lists" ... but then this series does just that (create a separate
> >> domain list), but in an obfuscated way (duplicate the resource to
> >> have the monitoring domain list in there).
> >
> > That was poorly worded on my part. I should have said "removes the
> > need for separate domain lists within a single rdt_resource".
> >
> > Adding an extra domain list to a resource may be the start of a slippery
> > slope. What if there is some additional "L3"-like resctrl operation that
> > acts at the socket level (Intel has made products with multiple L3
> > instances per socket before). Would you be OK add a third domain
> > list to every struct rdt_resource to handle this? Or would it be simpler
> > to just add a new rdt_resource structure with socket scoped domains?
>
> This should not be about what is simplest to patch into current resctrl.

I wanted to offer this in case Boris also thought that the previous
version was too much churn to support an obscure Intel-only (so far)
feature. But if you are going to Nack this new version on the grounds
that it muddies the water about usage of the rdt_domain structure, then
I will abandon it.

> There is no need to support a new domain list for a new scope. The domain
> lists support the functionality: control or monitoring. If control has
> socket scope the existing implementation supports that.
>
> If there is another operation supported by a resource apart from
> control or monitoring then we can consider how to support it when
> we know what it is. That would also be a great point to decide if
> the same data structure should just grow to support an operation that
> not all resources may support. That may depend on the amount of data
> needed to support this hypothetical operation.
>
> >> 2) You claim this series "reduces amount of code churn", but this is
> >> because this series keeps using the same original data structures
> >> for separate monitoring and control usages. The previous series made
> >> an effort to separate the structures for the different usages
> >> but this series does not. What makes it ok in this series to
> >> use the same data structures for different usages?
> >
> > Legacy resctrl has been using the same rdt_domain structure for both
> > usages since the dawn of time. So it has been OK up until now.
>
> This is not the same.
>
> Legacy resctrl uses the same data structure in the same list for both control
> and monitoring usages so it is fine to have both monitoring and control data
> in the data structure.
>
> What you are doing in both solutions is to place the same data structure
> in separate lists for control and monitoring usages. In the one list only the
> control data is used, on the other only the monitoring data is used.
>
> >> Additionally:
> >>
> >> Regarding "Vast amounts of that just added "_mon" or "_ctrl" to structure
> >> or variable names." ... that is because the structures are actually split,
> >> no? It is not just renaming for unnecessary churn.
> >
> > Perhaps not "unnecessary" churn. But certainly a lot of code change for
> > what I perceive as very little real gain.
>
> ok. There may be little gain wrt saving space. One complication with
> this single data structure is that its content may only be decided based
> on which list it is part of. It should be obvious to developers when
> which members are valid. Perhaps this can be addressed with clear
> documentation of the data structures.
>
> >> What is the benefit of keeping the data structures to be shared
> >> between monitor and control usages?
> >
> > Benefit is no code changes. Cost is continuing to waste memory with
> > structures that are slightly bigger than they need to be.
> >
> >> If there is a benefit to keeping these data structures, why not just
> >> address this aspect in previous solution?
> >
> > The previous solution evolved to splitting these structures. But this
> > happened incrementally (remember that at an early stage the monitor
> > structures all got the "_mon" addition to their names, but the control
> > structures kept the original names). Only when I got to the end of this
> > process did I look at the magnitude of the change.
>
> Not answering my question.

I'm not exactly sure what "aspect" you thought could be addressed in
the previous series. But the point is moot now. This diversion from the
series has come to a dead end, and I hope that Boris will look at v14
(either before the next group of ARM patches, or after).

-Tony
Hi Babu,

On 2/9/2024 7:27 AM, Moger, Babu wrote:
> To be honest, I like this series more than the previous series. I always
> thought RDT_RESOURCE_L3_MON should have been a separate resource by itself.

Would you prefer that your "Reviewed-by" tag be removed from the
previous series?

Reinette
>> To be honest, I like this series more than the previous series. I always
>> thought RDT_RESOURCE_L3_MON should have been a separate resource by itself.
>
> Would you prefer that your "Reviewed-by" tag be removed from the
> previous series?

I'm thinking that I could continue splitting things and break "struct
rdt_resource" into separate "ctrl" and "mon" structures. Then we'd have a
clean split from top to bottom.

Doing that would get rid of the rdt_resources_all[] array, replacing it with
individual rdt_hw_ctrl_resource and rdt_hw_mon_resource declarations for
each feature.

Features found on a system would be added to a list of ctrl or a list of mon
resources.

-Tony
On 2/12/24 13:44, Reinette Chatre wrote:
> Hi Babu,
>
> On 2/9/2024 7:27 AM, Moger, Babu wrote:
>
>> To be honest, I like this series more than the previous series. I always
>> thought RDT_RESOURCE_L3_MON should have been a separate resource by itself.
>
> Would you prefer that your "Reviewed-by" tag be removed from the
> previous series?
>

Sure. I plan to review the new series again when Tony submits v16.
Hi Tony,

On 2/12/2024 11:57 AM, Luck, Tony wrote:
>>> To be honest, I like this series more than the previous series. I always
>>> thought RDT_RESOURCE_L3_MON should have been a separate resource by itself.
>>
>> Would you prefer that your "Reviewed-by" tag be removed from the
>> previous series?
>
> I'm thinking that I could continue splitting things and break "struct rdt_resource" into
> separate "ctrl" and "mon" structures. Then we'd have a clean split from top to bottom.

It is not obvious what you mean with "continue splitting things". Are you
speaking about "continue splitting from v14" or "continue splitting from
v15-RFC"?

I think that any solution needs to consider what makes sense for resctrl
as a whole instead of how to support SNC with the smallest patch possible.
There should not be any changes that make resctrl harder to understand
and maintain, as exemplified by the confusion introduced by something as
simple as the resource name choice [1].

> Doing that would get rid of the rdt_resources_all[] array. Replacing with individual
> rdt_hw_ctrl_resource and rdt_hw_mon_resource declarations for each feature.
>
> Features found on a system would be added to a list of ctrl or list of mon resources.

Could you please elaborate on what is architecturally wrong with v14 and how
this new proposal addresses that?

Reinette

[1] https://lore.kernel.org/lkml/ZcZyqs5hnQqZ5ZV0@agluck-desk3/
On Mon, Feb 12, 2024 at 01:43:56PM -0800, Reinette Chatre wrote:
> Hi Tony,
>
> On 2/12/2024 11:57 AM, Luck, Tony wrote:
> >>> To be honest, I like this series more than the previous series. I always
> >>> thought RDT_RESOURCE_L3_MON should have been a separate resource by itself.
> >>
> >> Would you prefer that your "Reviewed-by" tag be removed from the
> >> previous series?
> >
> > I'm thinking that I could continue splitting things and break "struct rdt_resource" into
> > separate "ctrl" and "mon" structures. Then we'd have a clean split from top to bottom.
>
> It is not obvious what you mean with "continue splitting things". Are you
> speaking about "continue splitting from v14" or "continue splitting from v15-RFC"?

I'm speaking of some future potential changes. Not proposing to
do this now.

> I think that any solution needs to consider what makes sense for resctrl
> as a whole instead of how to support SNC with smallest patch possible.

I am officially abandoning my v15-RFC patches. I wasn't clear enough
in my e-mail earlier today.

https://lore.kernel.org/all/SJ1PR11MB608378D1304224D9E8A9016FFC482@SJ1PR11MB6083.namprd11.prod.outlook.com/

> There should not be any changes that makes resctrl harder to understand
> and maintain, as exemplified by confusion introduced by a simple thing as
> resource name choice [1].
>
> > Doing that would get rid of the rdt_resources_all[] array. Replacing with individual
> > rdt_hw_ctrl_resource and rdt_hw_mon_resource declarations for each feature.
> >
> > Features found on a system would be added to a list of ctrl or list of mon resources.
>
> Could you please elaborate what is architecturally wrong with v14 and how this
> new proposal addresses that?

There is nothing architecturally wrong with v14. I thought it was more
complex than it needed to be. You have convinced me that my v15-RFC
series, while simpler, is not a reasonable path for long-term resctrl
maintainability.

> Reinette
>
> [1] https://lore.kernel.org/lkml/ZcZyqs5hnQqZ5ZV0@agluck-desk3/

-Tony
Hello,

On 12/02/2024 22:05, Tony Luck wrote:
> On Mon, Feb 12, 2024 at 01:43:56PM -0800, Reinette Chatre wrote:
>> On 2/12/2024 11:57 AM, Luck, Tony wrote:
>>>>> To be honest, I like this series more than the previous series. I always
>>>>> thought RDT_RESOURCE_L3_MON should have been a separate resource by itself.
>>>>
>>>> Would you prefer that your "Reviewed-by" tag be removed from the
>>>> previous series?
>>>
>>> I'm thinking that I could continue splitting things and break "struct rdt_resource" into
>>> separate "ctrl" and "mon" structures. Then we'd have a clean split from top to bottom.
>>
>> It is not obvious what you mean with "continue splitting things". Are you
>> speaking about "continue splitting from v14" or "continue splitting from v15-RFC"?
>
> I'm speaking of some future potential changes. Not proposing to
> do this now.
>
>> I think that any solution needs to consider what makes sense for resctrl
>> as a whole instead of how to support SNC with smallest patch possible.
>> There should not be any changes that makes resctrl harder to understand
>> and maintain, as exemplified by confusion introduced by a simple thing as
>> resource name choice [1].
>>
>>> Doing that would get rid of the rdt_resources_all[] array. Replacing with individual
>>> rdt_hw_ctrl_resource and rdt_hw_mon_resource declarations for each feature.
>>>
>>> Features found on a system would be added to a list of ctrl or list of mon resources.
>>
>> Could you please elaborate what is architecturally wrong with v14 and how this
>> new proposal addresses that?
>
> There is nothing architecturally wrong with v14. I thought it was more
> complex than it needed to be. You have convinced me that my v15-RFC
> series, while simpler, is not a reasonable path for long-term resctrl
> maintainability.
I'm not sure if it's helpful to describe a third approach at this point -
but on the off chance it's useful:

With SNC enabled, the L3 monitors are unaffected, but the controls behave
as if they were part of some other component in the system.

ACPI describes something called "memory side caches" [0] in the HMAT table,
which are outside the CPU cache hierarchy, and are associated with a
Proximity-Domain. I've heard that one of Arm's partners has built a system
with MPAM controls on something like this. How would we support this - and
would this be a better fit for the way SNC behaves?

I think this would be a new resource and schema, 'MSC'(?), with domain-ids
using the NUMA nid. As these aren't CPU caches, they wouldn't appear in the
same part of the sysfs hierarchy, and wouldn't necessarily have a cache-id.

For SNC systems, I think this would look like CMT on the L3, and CAT on the
'MSC'. Existing software wouldn't know to use the new schema, but equally
wouldn't be surprised by the domain-ids being something other than the
cache-id, and the controls and monitors not lining up.

Where it's not quite right for SNC is that sysfs may not describe a memory
side cache, but one would be present in resctrl. I don't think that's a
problem - unless these systems do also have a memory-side-cache that behaves
differently. (Wherever the controls are being applied at the 'near' side of
the link - I don't think the difference matters.)

I'm a little nervous that the SNC support looks strange if we ever add
support for something like the above. Given it's described in ACPI, I assume
there are plenty of machines out there that look like this.

(Why aren't memory-side-caches a CPU cache? They live near the memory
controller and cache based on the PA, not the CPU that issued the
transaction.)

Thanks,

James

[0] https://uefi.org/specs/ACPI/6.5/05_ACPI_Software_Programming_Model.html#memory-side-cache-overview
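[Editor's note] To make James's proposal concrete, here is what a resctrl group might look like on a 2-node SNC socket under his scheme. This is purely illustrative: no 'MSC' resource or schema exists in resctrl today, and the mask values, domain numbering, and mon_data entry name are invented for the example.

```
# Hypothetical resctrl group on a 2-node SNC socket, IF allocation
# moved to an 'MSC' resource keyed by NUMA node id (illustrative only):
$ cat /sys/fs/resctrl/schemata
MSC:0=fff;1=fff        # CAT-style masks, one control domain per NUMA node
$ ls /sys/fs/resctrl/mon_data
mon_L3_00              # CMT/MBM still reported against the one L3 cache
```

The point of the split is visible here: domain-ids under 'MSC' are NUMA nids, domain-ids under L3 monitoring remain cache-ids, and software is never tempted to assume the two line up.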
> With SNC enabled, the L3 monitors are unaffected, but the controls behave as if they were
> part of some other component in the system.

I don't think of it like that. See attached picture of a single socket
divided in two by SNC. [If the attachment is stripped off for those reading
this via mailing lists and you want the picture, just send me an e-mail.]

Everything in blue is node 0. Yellow for node 1. The rectangles in the
middle represent the L3 cache (12-way associative). When cores in node 0
access memory in node 0, it will be cached using the "top" half of the
cache indices. Similarly for node 1 using the "bottom" half.

Here's how each of the Intel L3 resctrl functions operates with SNC enabled:

CQM: Reports how much of your half of the L3 cache is occupied.

MBM: Reports on memory traffic from your half of the cache to your memory
controllers.

CAT: Still controls which ways of the cache are available for allocation
(but each way has half the capacity).

MBA: The same throttling levels applied to "blue" and "yellow" traffic
(because there are only socket-level controls).

> I'm a little nervous that the SNC support looks strange if we ever add support for
> something like the above. Given its described in ACPI, I assume there are plenty of
> machines out there that look like this.

I'm also nervous, as h/w designers find various ways to diverge from the
old paradigm of socket scope == L3 cache scope == NUMA node scope.

-Tony