Message ID | 20230928191350.205703-1-tony.luck@intel.com |
---|---|
Headers |
From: Tony Luck <tony.luck@intel.com>
To: Fenghua Yu <fenghua.yu@intel.com>, Reinette Chatre <reinette.chatre@intel.com>, Peter Newman <peternewman@google.com>, Jonathan Corbet <corbet@lwn.net>, Shuah Khan <skhan@linuxfoundation.org>, x86@kernel.org
Cc: Shaopeng Tan <tan.shaopeng@fujitsu.com>, James Morse <james.morse@arm.com>, Jamie Iles <quic_jiles@quicinc.com>, Babu Moger <babu.moger@amd.com>, Randy Dunlap <rdunlap@infradead.org>, linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, patches@lists.linux.dev, Tony Luck <tony.luck@intel.com>
Subject: [PATCH v6 0/8] Add support for Sub-NUMA cluster (SNC) systems
Date: Thu, 28 Sep 2023 12:13:41 -0700
Message-ID: <20230928191350.205703-1-tony.luck@intel.com>
In-Reply-To: <20230829234426.64421-1-tony.luck@intel.com>
References: <20230829234426.64421-1-tony.luck@intel.com>
List-ID: <linux-kernel.vger.kernel.org> |
Series | Add support for Sub-NUMA cluster (SNC) systems |
Message
Luck, Tony
Sept. 28, 2023, 7:13 p.m. UTC
The Sub-NUMA cluster (SNC) feature on some Intel processors partitions
the CPUs that share an L3 cache into two or more sets. This plays
havoc with the Resource Director Technology (RDT) monitoring features.
Prior to this patch series, Intel advised that SNC and RDT are incompatible.
Some of these CPUs support an MSR that can partition the RMID
counters in the same way. This allows the monitoring features
to be used, with the caveat that memory accesses between different
SNC NUMA nodes may still not be counted accurately.
Note that this patch series improves resctrl reporting considerably
on systems with SNC enabled, but there will still be some anomalies
for processes accessing memory from other sub-NUMA nodes.
Signed-off-by: Tony Luck <tony.luck@intel.com>
---
Summary of changes since v5 (see each patch's commit message for specifics):

Rebased to v6.6-rc3

0001  Define "scope" enum with values 2, 3 for caches to simplify some
      code (but sanity check before each such usage).
      Better warning messages when scope lookup fails.
0002  New patch so that some code can be shared between looking up
      control and monitor domains.
0003  Spell "mondomains" as "mon_domains" and be consistent with all
      the other "mon" identifiers also having a similar "_".
      Don't leave control stuff with old names; change those too,
      so there are now ctrl_scope, ctrl_domains, etc.
0004  Use infrastructure from 0002 to have a common rdt_find_domain()
      function for both types of domain structure.
      0003 was using the same "rdt_domain" structure for both control
      and monitor domains. Divide it into rdt_ctrl_domain and
      rdt_mon_domain structures with just the fields they need.
      Ditto for rdt_hw_domain. Also split and rename many support
      functions and macros.
      Lots of "fir tree local declaration order" changes because
      lengths of typenames changed.
0005  Better commit description.
0006  Better commit and code comments.
0007  More explanations in commit and code comments.
      Use consistent naming for "snc_*()" functions.

Patch to update selftests dropped from this series. Someone else
has taken over that work.
Tony Luck (8):
  x86/resctrl: Prepare for new domain scope
  x86/resctrl: Prepare to split rdt_domain structure
  x86/resctrl: Prepare for different scope for control/monitor
    operations
  x86/resctrl: Split the rdt_domain and rdt_hw_domain structures
  x86/resctrl: Add node-scope to the options for feature scope
  x86/resctrl: Introduce snc_nodes_per_l3_cache
  x86/resctrl: Sub NUMA Cluster detection and enable
  x86/resctrl: Update documentation with Sub-NUMA cluster changes

 Documentation/arch/x86/resctrl.rst        |  34 +-
 include/linux/resctrl.h                   |  78 +++--
 arch/x86/include/asm/msr-index.h          |   1 +
 arch/x86/kernel/cpu/resctrl/internal.h    |  66 ++--
 arch/x86/kernel/cpu/resctrl/core.c        | 380 +++++++++++++++++-----
 arch/x86/kernel/cpu/resctrl/ctrlmondata.c |  52 +--
 arch/x86/kernel/cpu/resctrl/monitor.c     |  58 ++--
 arch/x86/kernel/cpu/resctrl/pseudo_lock.c |  14 +-
 arch/x86/kernel/cpu/resctrl/rdtgroup.c    | 131 ++++----
 9 files changed, 567 insertions(+), 247 deletions(-)
base-commit: 6465e260f48790807eef06b583b38ca9789b6072
Comments
Hi Tony,

On Thu, Sep 28, 2023 at 9:14 PM Tony Luck <tony.luck@intel.com> wrote:
>
> Currently supported resctrl features are all domain scoped the same as the
> scope of the L2 or L3 caches.
>
> Add RESCTRL_NODE as a new option for features that are scoped at the
> same granularity as NUMA nodes. This is needed for Intel's Sub-NUMA
> Cluster (SNC) feature where monitoring features are node scoped.
>
> Signed-off-by: Tony Luck <tony.luck@intel.com>
> ---
>
> Changes since v5:
>
> Updates to commit message.
>
>  include/linux/resctrl.h            | 1 +
>  arch/x86/kernel/cpu/resctrl/core.c | 2 ++
>  2 files changed, 3 insertions(+)
>
> diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
> index 1c925e3db2ea..18ed787f9798 100644
> --- a/include/linux/resctrl.h
> +++ b/include/linux/resctrl.h
> @@ -165,6 +165,7 @@ struct resctrl_schema;
>  enum resctrl_scope {
>          RESCTRL_L2_CACHE = 2,
>          RESCTRL_L3_CACHE = 3,
> +        RESCTRL_NODE,
>  };
>
>  /**
> diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
> index 726f00c01079..e61bf919ac78 100644
> --- a/arch/x86/kernel/cpu/resctrl/core.c
> +++ b/arch/x86/kernel/cpu/resctrl/core.c
> @@ -511,6 +511,8 @@ static int get_domain_id_from_scope(int cpu, enum resctrl_scope scope)
>          case RESCTRL_L2_CACHE:
>          case RESCTRL_L3_CACHE:
>                  return get_cpu_cacheinfo_id(cpu, scope);
> +        case RESCTRL_NODE:
> +                return cpu_to_node(cpu);
>          default:
>                  break;
>          }
> --
> 2.41.0
>

Looks fine.

Reviewed-by: Peter Newman <peternewman@google.com>
Hi Tony,

On Thu, Sep 28, 2023 at 9:14 PM Tony Luck <tony.luck@intel.com> wrote:
>
> The Sub-NUMA cluster feature on some Intel processors partitions
> the CPUs that share an L3 cache into two or more sets. This plays
> havoc with the Resource Director Technology (RDT) monitoring features.
> Prior to this patch Intel has advised that SNC and RDT are incompatible.
>
> Some of these CPU support an MSR that can partition the RMID
> counters in the same way. This allows for monitoring features
> to be used (with the caveat that memory accesses between different
> SNC NUMA nodes may still not be counted accuratlely.

Is an "SNC NUMA node" a "sub-NUMA node", or a NUMA node on which SNC
has been enabled?

Thanks!
-Peter
On Fri, Sep 29, 2023 at 04:33:17PM +0200, Peter Newman wrote:
> Hi Tony,
>
> On Thu, Sep 28, 2023 at 9:14 PM Tony Luck <tony.luck@intel.com> wrote:
> >
> > The Sub-NUMA cluster feature on some Intel processors partitions
> > the CPUs that share an L3 cache into two or more sets. This plays
> > havoc with the Resource Director Technology (RDT) monitoring features.
> > Prior to this patch Intel has advised that SNC and RDT are incompatible.
> >
> > Some of these CPU support an MSR that can partition the RMID
> > counters in the same way. This allows for monitoring features
> > to be used (with the caveat that memory accesses between different
> > SNC NUMA nodes may still not be counted accuratlely.
>
> Is an "SNC NUMA node" a "sub-NUMA node", or a NUMA node on which SNC
> has been enabled?

It would be architecturally possible to enable SNC mode on a subset
of CPU sockets. But there isn't a BIOS setup option to do that. You
either have SNC everywhere, or nowhere.

I prefer "SNC NUMA node" == "sub-NUMA node".

The other reading, "NUMA node on which SNC has been enabled", makes it
sound like there is a control on a NUMA node that can be switched. The
control is on the CPU socket. That's often equivalent to a NUMA node,
but Intel has had CPUs in the past where this isn't the case (e.g.
Cascade Lake-AP and Cooper Lake).

>
> Thanks!
> -Peter

Thanks for the review of the series. I've applied changes to my local
tree. Will post v7 of the series early next week if no other reviews
come in.

-Tony