[v6,0/8] Add support for Sub-NUMA cluster (SNC) systems

Message ID 20230928191350.205703-1-tony.luck@intel.com

Message

Luck, Tony Sept. 28, 2023, 7:13 p.m. UTC
  The Sub-NUMA cluster feature on some Intel processors partitions
the CPUs that share an L3 cache into two or more sets. This plays
havoc with the Resource Director Technology (RDT) monitoring features.
Prior to this patch Intel has advised that SNC and RDT are incompatible.

Some of these CPUs support an MSR that can partition the RMID
counters in the same way. This allows the monitoring features
to be used (with the caveat that memory accesses between different
SNC NUMA nodes may still not be counted accurately).

Note that this patch series improves resctrl reporting considerably
on systems with SNC enabled, but there will still be some anomalies
for processes accessing memory from other sub-NUMA nodes.

Signed-off-by: Tony Luck <tony.luck@intel.com>

---

Summary of changes since v5 - see each patch commit for more specifics

Rebased to v6.6-rc3

0001	Define "scope" enum with values 2, 3 for caches to simplify some
	code (but sanity check before each such usage).
	Better warning messages when scope lookup fails

0002	New patch so that some code can be shared between looking up
	control and monitor domains

0003	Spell "mondomains" as "mon_domains" and be consistent with all
	the other "mon" identifiers also having similar "_".
	Don't leave the control identifiers with their old names; rename
	those too, so there are now ctrl_scope, ctrl_domains, etc.

0004	Use infrastructure from 0002 to have a common rdt_find_domain()
	function for both types of domain structure.
	0003 was using the same "rdt_domain" structure for both control
	and monitor domains. Divide it into rdt_ctrl_domain and
	rdt_mon_domain structures with just the fields they need.
	Ditto for rdt_hw_domain. Also split and rename many support
	functions and macros.
	Lots of "fir tree local declaration order" changes because
	lengths of typenames changed.

0005	Better commit description

0006	Better commit and code comments

0007	More explanations in commit and code comments.
	Use consistent naming for "snc_*()" functions.
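
Editorial illustration of the 0002/0004 items above: a minimal standalone
sketch of how two domain structures can embed one shared header so that a
single lookup routine serves both. All names here (toy_domain_hdr,
toy_ctrl_domain, toy_mon_domain, toy_find_domain, toy_container_of) and all
fields are hypothetical and are not taken from the patches; the real
structures carry different fields and use kernel list helpers.

```c
#include <stddef.h>

/* Hypothetical common header embedded in both domain types. */
struct toy_domain_hdr {
	struct toy_domain_hdr *next;	/* simple singly linked list */
	int id;				/* domain id (cache id or node id) */
};

/* Control domain: only the state that control operations need. */
struct toy_ctrl_domain {
	struct toy_domain_hdr hdr;
	unsigned int ctrl_val;
};

/* Monitor domain: only the state that monitoring needs. */
struct toy_mon_domain {
	struct toy_domain_hdr hdr;
	unsigned long long mbm_total;
};

/* Recover the enclosing structure from a pointer to its embedded header. */
#define toy_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/*
 * One lookup for both domain types: it only touches the shared header,
 * so it does not need to know which kind of domain it is walking.
 * Returns NULL when no domain with the requested id is on the list.
 */
static struct toy_domain_hdr *toy_find_domain(struct toy_domain_hdr *head, int id)
{
	struct toy_domain_hdr *h;

	for (h = head; h; h = h->next) {
		if (h->id == id)
			return h;
	}
	return NULL;
}
```

A caller that knows it is walking a monitor list would convert the result
with toy_container_of(h, struct toy_mon_domain, hdr); a control list would
use struct toy_ctrl_domain in the same way.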

Patch to update selftests dropped from this series. Someone else
has taken over that work.

Tony Luck (8):
  x86/resctrl: Prepare for new domain scope
  x86/resctrl: Prepare to split rdt_domain structure
  x86/resctrl: Prepare for different scope for control/monitor
    operations
  x86/resctrl: Split the rdt_domain and rdt_hw_domain structures
  x86/resctrl: Add node-scope to the options for feature scope
  x86/resctrl: Introduce snc_nodes_per_l3_cache
  x86/resctrl: Sub NUMA Cluster detection and enable
  x86/resctrl: Update documentation with Sub-NUMA cluster changes

 Documentation/arch/x86/resctrl.rst        |  34 +-
 include/linux/resctrl.h                   |  78 +++--
 arch/x86/include/asm/msr-index.h          |   1 +
 arch/x86/kernel/cpu/resctrl/internal.h    |  66 ++--
 arch/x86/kernel/cpu/resctrl/core.c        | 380 +++++++++++++++++-----
 arch/x86/kernel/cpu/resctrl/ctrlmondata.c |  52 +--
 arch/x86/kernel/cpu/resctrl/monitor.c     |  58 ++--
 arch/x86/kernel/cpu/resctrl/pseudo_lock.c |  14 +-
 arch/x86/kernel/cpu/resctrl/rdtgroup.c    | 131 ++++----
 9 files changed, 567 insertions(+), 247 deletions(-)


base-commit: 6465e260f48790807eef06b583b38ca9789b6072
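
Editorial illustration of the cover letter's description of SNC: when an L3
cache is partitioned into several sub-NUMA nodes, the number of nodes per L3
can be estimated by comparing the count of CPU-bearing NUMA nodes with the
count of distinct L3 caches. The sketch below is one plausible way to compute
such a ratio and is not claimed to be the detection logic used by the series.
The function name and both input arrays are hypothetical stand-ins for
topology information the kernel would gather itself.

```c
/*
 * node_has_cpus[i]: nonzero if NUMA node i contains at least one CPU.
 * node_l3_id[i]:    id of the L3 cache used by the CPUs of node i
 *                   (ignored for memory-only nodes).
 *
 * Returns the estimated number of SNC nodes per L3 cache, or 1 when the
 * topology does not divide evenly (treated here as "SNC not enabled").
 */
static int toy_snc_nodes_per_l3(const int *node_has_cpus,
				const int *node_l3_id, int nr_nodes)
{
	int cpu_nodes = 0, l3_caches = 0;
	int i, j;

	for (i = 0; i < nr_nodes; i++) {
		int seen = 0;

		if (!node_has_cpus[i])
			continue;	/* memory-only nodes do not count */
		cpu_nodes++;

		/* Count each L3 id once, the first time a node reports it. */
		for (j = 0; j < i; j++) {
			if (node_has_cpus[j] && node_l3_id[j] == node_l3_id[i])
				seen = 1;
		}
		if (!seen)
			l3_caches++;
	}

	if (!l3_caches || cpu_nodes % l3_caches)
		return 1;
	return cpu_nodes / l3_caches;
}
```

With four CPU-bearing nodes spread over two L3 caches the function reports
two nodes per L3; with one node per L3 it reports one, meaning no sub-NUMA
partitioning is visible.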
  

Comments

Peter Newman Sept. 29, 2023, 2:21 p.m. UTC | #1
Hi Tony,

On Thu, Sep 28, 2023 at 9:14 PM Tony Luck <tony.luck@intel.com> wrote:
>
> Currently supported resctrl features are all domain scoped the same as the
> scope of the L2 or L3 caches.
>
> Add RESCTRL_NODE as a new option for features that are scoped at the
> same granularity as NUMA nodes. This is needed for Intel's Sub-NUMA
> Cluster (SNC) feature where monitoring features are node scoped.
>
> Signed-off-by: Tony Luck <tony.luck@intel.com>
> ---
>
> Changes since v5:
>
> Updates to commit message.
>
>  include/linux/resctrl.h            | 1 +
>  arch/x86/kernel/cpu/resctrl/core.c | 2 ++
>  2 files changed, 3 insertions(+)
>
> diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
> index 1c925e3db2ea..18ed787f9798 100644
> --- a/include/linux/resctrl.h
> +++ b/include/linux/resctrl.h
> @@ -165,6 +165,7 @@ struct resctrl_schema;
>  enum resctrl_scope {
>         RESCTRL_L2_CACHE = 2,
>         RESCTRL_L3_CACHE = 3,
> +       RESCTRL_NODE,
>  };
>
>  /**
> diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
> index 726f00c01079..e61bf919ac78 100644
> --- a/arch/x86/kernel/cpu/resctrl/core.c
> +++ b/arch/x86/kernel/cpu/resctrl/core.c
> @@ -511,6 +511,8 @@ static int get_domain_id_from_scope(int cpu, enum resctrl_scope scope)
>         case RESCTRL_L2_CACHE:
>         case RESCTRL_L3_CACHE:
>                 return get_cpu_cacheinfo_id(cpu, scope);
> +       case RESCTRL_NODE:
> +               return cpu_to_node(cpu);
>         default:
>                 break;
>         }
> --
> 2.41.0
>

Looks fine.

Reviewed-by: Peter Newman <peternewman@google.com>
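
Editorial illustration of the quoted diff: a minimal standalone sketch of the
scope-to-domain-id switch, showing why the cache scopes are numbered 2 and 3
(they double as the cache level passed to the cache lookup) while the node
scope takes a different path. The enum is copied from the diff; the toy_*
helpers and the 8-CPU topology are assumptions standing in for the kernel's
get_cpu_cacheinfo_id() and cpu_to_node().

```c
/* Scope values 2 and 3 deliberately match the cache level they describe. */
enum resctrl_scope {
	RESCTRL_L2_CACHE = 2,
	RESCTRL_L3_CACHE = 3,
	RESCTRL_NODE,		/* implicitly 4; not a cache level */
};

/*
 * Assumed toy topology: 8 CPUs, each pair of CPUs shares an L2,
 * all 8 share one L3, and the L3 is split into two nodes of 4 CPUs.
 */
static int toy_cacheinfo_id(int cpu, int level)
{
	return level == 2 ? cpu / 2 : 0;
}

static int toy_cpu_to_node(int cpu)
{
	return cpu / 4;
}

/* Mirrors the shape of the switch in the diff, using the toy helpers. */
static int toy_domain_id_from_scope(int cpu, enum resctrl_scope scope)
{
	switch (scope) {
	case RESCTRL_L2_CACHE:
	case RESCTRL_L3_CACHE:
		/* The scope value is used directly as the cache level. */
		return toy_cacheinfo_id(cpu, scope);
	case RESCTRL_NODE:
		return toy_cpu_to_node(cpu);
	default:
		break;
	}

	return -1;	/* unknown scope */
}
```

In this toy topology CPU 5 maps to L2 domain 2, L3 domain 0 and node
domain 1, so a monitor resource scoped at RESCTRL_NODE sees two domains where
an L3-scoped resource sees one.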
  
Peter Newman Sept. 29, 2023, 2:33 p.m. UTC | #2
Hi Tony,

On Thu, Sep 28, 2023 at 9:14 PM Tony Luck <tony.luck@intel.com> wrote:
>
> The Sub-NUMA cluster feature on some Intel processors partitions
> the CPUs that share an L3 cache into two or more sets. This plays
> havoc with the Resource Director Technology (RDT) monitoring features.
> Prior to this patch Intel has advised that SNC and RDT are incompatible.
>
> Some of these CPUs support an MSR that can partition the RMID
> counters in the same way. This allows the monitoring features
> to be used (with the caveat that memory accesses between different
> SNC NUMA nodes may still not be counted accurately).

Is an "SNC NUMA node" a "sub-NUMA node", or a NUMA node on which SNC
has been enabled?

Thanks!
-Peter
  
Luck, Tony Sept. 29, 2023, 9:06 p.m. UTC | #3
On Fri, Sep 29, 2023 at 04:33:17PM +0200, Peter Newman wrote:
> Hi Tony,
> 
> On Thu, Sep 28, 2023 at 9:14 PM Tony Luck <tony.luck@intel.com> wrote:
> >
> > The Sub-NUMA cluster feature on some Intel processors partitions
> > the CPUs that share an L3 cache into two or more sets. This plays
> > havoc with the Resource Director Technology (RDT) monitoring features.
> > Prior to this patch Intel has advised that SNC and RDT are incompatible.
> >
> > Some of these CPUs support an MSR that can partition the RMID
> > counters in the same way. This allows the monitoring features
> > to be used (with the caveat that memory accesses between different
> > SNC NUMA nodes may still not be counted accurately).
> 
> Is an "SNC NUMA node" a "sub-NUMA node", or a NUMA node on which SNC
> has been enabled?

It would be architecturally possible to enable SNC mode on
a subset of CPU sockets. But there isn't a BIOS setup option
to do that. You either have SNC everywhere, or nowhere.

I prefer "SNC NUMA node" == "sub-NUMA node".

The alternative, "NUMA node on which SNC has been enabled",
makes it sound like there is a control on a NUMA node
that can be switched. The control is on the CPU socket.
That's often equivalent to a NUMA node, but Intel has
had CPUs in the past where this isn't the case (e.g.
Cascade Lake-AP and Cooper Lake).
> 
> Thanks!
> -Peter


Thanks for the review of the series. I've applied changes
to my local tree. Will post v7 of the series early next
week if no other reviews come in.

-Tony