[v8,00/24] x86/resctrl: monitored closid+rmid together, separate arch/fs locking

Message ID 20231215174343.13872-1-james.morse@arm.com
Series x86/resctrl: monitored closid+rmid together, separate arch/fs locking

Message

James Morse Dec. 15, 2023, 5:43 p.m. UTC
Some of the changes in this version are:
 * Fixed a bounds-checking bug in cpumask_any_housekeeping(),
 * Moved the kfree() of rmid_ptrs[] later.

Changes are noted in each patch. I've not added 'no changes' notes, as
these need double checking anyway. I'll try again next series.

~

This series does two things: it changes resctrl to call resctrl_arch_rmid_read()
in a way that works for MPAM, and it separates the locking so that the arch code
and filesystem code don't have to share a mutex. I tried to split this into two
series, but both parts touch similar call sites, so splitting would create more work.

(What's MPAM? See the cover letter of the first series. [1])

On x86, the RMID is an independent number. MPAM's equivalent is PMG, but this
isn't an independent number - it extends the PARTID (same as CLOSID) space
with bits that aren't used to select the configuration. The monitors can
then be told to match specific PMG values, allowing monitor-groups to be
created.

However, MPAM expects the monitors to always monitor by PARTID. The
Cache-storage-utilisation counters can only work this way.
(In the MPAM spec, not setting the MATCH_PARTID bit is CONSTRAINED
UNPREDICTABLE, which is Arm's term for behaviour that portable software
can't rely on.)

It gets worse, as some SoCs may have very few PMG bits. I've seen the
datasheet for one that has a single bit of PMG space.

To be usable, MPAM's counters always need the PARTID and the PMG.
For resctrl, this means always making the CLOSID available when the RMID
is used.
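To make the ambiguity concrete, here is a small user-space C sketch of how a counter filter might decide whether to count a transaction. The struct and function names (`mon_filter`, `mon_matches`) are hypothetical and do not model the real MPAM register layout; the only assumption taken from the text above is that PMG values are reused within every PARTID.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a monitor's filter; not the real MPAM registers. */
struct mon_filter {
	uint16_t partid;
	uint8_t pmg;
	bool match_partid;	/* MPAM expects this set; clearing it is CONSTRAINED UNPREDICTABLE */
	bool match_pmg;
};

/* Return true if a transaction tagged with (partid, pmg) should be counted. */
static bool mon_matches(const struct mon_filter *f, uint16_t partid, uint8_t pmg)
{
	if (f->match_partid && f->partid != partid)
		return false;
	if (f->match_pmg && f->pmg != pmg)
		return false;
	return true;
}
```

A filter on PMG 0 alone would count traffic from PARTID 1 and PARTID 2 together. Only with PARTID matching enabled does it isolate a single monitor group, which is why resctrl needs to pass the CLOSID alongside the RMID.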

To ensure RMIDs are always unique, this series combines the CLOSID and RMID
into an index, and manages RMIDs based on that. For x86, the index and RMID
would always be the same.
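One possible encoding can be sketched as follows, assuming every PARTID has the same number of PMG values. The function names and the `num_pmg` parameter are hypothetical illustrations, not the helpers added by the series.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helpers: combine a CLOSID (PARTID on MPAM) and an RMID
 * (PMG on MPAM) into one dense index for looking up per-RMID state.
 * num_pmg is the number of PMG values available to each PARTID.
 */
static uint32_t idx_encode(uint32_t closid, uint32_t rmid, uint32_t num_pmg)
{
	assert(rmid < num_pmg);
	return closid * num_pmg + rmid;
}

static void idx_decode(uint32_t idx, uint32_t num_pmg, uint32_t *closid, uint32_t *rmid)
{
	*closid = idx / num_pmg;
	*rmid = idx % num_pmg;
}

/* x86 flavour: the RMID is already unique, so the CLOSID is ignored. */
static uint32_t x86_idx_encode(uint32_t closid, uint32_t rmid)
{
	(void)closid;
	return rmid;
}
```

With a single PMG bit (`num_pmg = 2`), CLOSID 3 with RMID 1 encodes to index 7, and each CLOSID owns just two consecutive slots, so per-index arrays stay dense while RMID 1 under different CLOSIDs no longer collides.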


Currently the architecture-specific code in the cpuhp callbacks takes the
rdtgroup_mutex. This means the filesystem code would have to export this
lock, resulting in an ill-defined interface between the two, and the possibility
of cross-architecture lock-ordering headaches.

The second part of this series adds a domain_list_lock to protect writes to the
domain list, and protects the domain list with RCU - or cpus_read_lock().

RCU is used to allow lockless readers of the domain list. To get MPAM's monitors
working, it's very likely they'll need to be plumbed up to perf. An uncore PMU
driver would need to be a lockless reader of the domain list.
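The writer/reader split can be illustrated in user space with C11 atomics. This is an analogy, not kernel code: `domain_list_lock` here is a hypothetical spinning flag standing in for the mutex, the acquire load stands in for `rcu_dereference()`, and grace-period reclamation is omitted entirely. In the kernel this would normally be expressed with the RCU list primitives (for example `list_add_rcu()`, and `list_for_each_entry_rcu()` inside `rcu_read_lock()`).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical snapshot of the domain list: an immutable array of ids. */
struct domain_list {
	size_t n;
	int ids[];
};

/* Readers load this pointer without taking any lock. */
static _Atomic(struct domain_list *) domains;

/* Stand-in for domain_list_lock: serialises writers only. */
static atomic_flag domain_list_lock = ATOMIC_FLAG_INIT;

/* Writer: copy the current list, append a domain, publish the new copy. */
static int domain_add(int id)
{
	while (atomic_flag_test_and_set_explicit(&domain_list_lock, memory_order_acquire))
		;	/* spin until no other writer holds the lock */

	struct domain_list *old = atomic_load_explicit(&domains, memory_order_relaxed);
	size_t n = old ? old->n : 0;
	struct domain_list *new = malloc(sizeof(*new) + (n + 1) * sizeof(int));

	if (!new) {
		atomic_flag_clear_explicit(&domain_list_lock, memory_order_release);
		return -1;
	}
	if (old)
		memcpy(new->ids, old->ids, n * sizeof(int));
	new->ids[n] = id;
	new->n = n + 1;

	/* Release ordering: a reader that sees 'new' also sees its contents. */
	atomic_store_explicit(&domains, new, memory_order_release);
	atomic_flag_clear_explicit(&domain_list_lock, memory_order_release);

	/*
	 * Real RCU would wait for a grace period (synchronize_rcu()) before
	 * freeing 'old'. This sketch has no grace-period tracking, so it
	 * deliberately never frees old snapshots.
	 */
	return 0;
}

/* Lockless reader: take one snapshot and use only that snapshot. */
static int domain_exists(int id)
{
	struct domain_list *snap = atomic_load_explicit(&domains, memory_order_acquire);

	for (size_t i = 0; snap && i < snap->n; i++)
		if (snap->ids[i] == id)
			return 1;
	return 0;
}
```

Readers never block writers or each other; the trade-off is that a reader may briefly see the previous snapshot, which is the usual RCU behaviour.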



This series is based on v6.7-rc2, and can be retrieved from:
https://git.kernel.org/pub/scm/linux/kernel/git/morse/linux.git mpam/monitors_and_locking/v8

Bugs welcome,

Thanks,

James

[1] https://lore.kernel.org/lkml/20210728170637.25610-1-james.morse@arm.com/
[v1] https://lore.kernel.org/all/20221021131204.5581-1-james.morse@arm.com/
[v2] https://lore.kernel.org/lkml/20230113175459.14825-1-james.morse@arm.com/
[v3] https://lore.kernel.org/r/20230320172620.18254-1-james.morse@arm.com/
[v4] https://lore.kernel.org/r/20230525180209.19497-1-james.morse@arm.com/
[v5] https://lore.kernel.org/lkml/20230728164254.27562-1-james.morse@arm.com/
[v6] https://lore.kernel.org/all/20230914172138.11977-1-james.morse@arm.com/
[v7] https://lore.kernel.org/r/20231025180345.28061-1-james.morse@arm.com/

James Morse (24):
  tick/nohz: Move tick_nohz_full_mask declaration outside the #ifdef
  x86/resctrl: kfree() rmid_ptrs from resctrl_exit()
  x86/resctrl: Create helper for RMID allocation and mondata dir
    creation
  x86/resctrl: Move rmid allocation out of mkdir_rdt_prepare()
  x86/resctrl: Track the closid with the rmid
  x86/resctrl: Access per-rmid structures by index
  x86/resctrl: Allow RMID allocation to be scoped by CLOSID
  x86/resctrl: Track the number of dirty RMID a CLOSID has
  x86/resctrl: Use __set_bit()/__clear_bit() instead of open coding
  x86/resctrl: Allocate the cleanest CLOSID by searching
    closid_num_dirty_rmid
  x86/resctrl: Move CLOSID/RMID matching and setting to use helpers
  x86/resctrl: Add cpumask_any_housekeeping() for limbo/overflow
  x86/resctrl: Queue mon_event_read() instead of sending an IPI
  x86/resctrl: Allow resctrl_arch_rmid_read() to sleep
  x86/resctrl: Allow arch to allocate memory needed in
    resctrl_arch_rmid_read()
  x86/resctrl: Make resctrl_mounted checks explicit
  x86/resctrl: Move alloc/mon static keys into helpers
  x86/resctrl: Make rdt_enable_key the arch's decision to switch
  x86/resctrl: Add helpers for system wide mon/alloc capable
  x86/resctrl: Add CPU online callback for resctrl work
  x86/resctrl: Allow overflow/limbo handlers to be scheduled on any-but
    cpu
  x86/resctrl: Add CPU offline callback for resctrl work
  x86/resctrl: Move domain helper migration into resctrl_offline_cpu()
  x86/resctrl: Separate arch and fs resctrl locks

 arch/x86/include/asm/resctrl.h            |  90 +++++
 arch/x86/kernel/cpu/resctrl/core.c        | 102 ++---
 arch/x86/kernel/cpu/resctrl/ctrlmondata.c |  48 ++-
 arch/x86/kernel/cpu/resctrl/internal.h    |  67 +++-
 arch/x86/kernel/cpu/resctrl/monitor.c     | 449 +++++++++++++++++-----
 arch/x86/kernel/cpu/resctrl/pseudo_lock.c |  15 +-
 arch/x86/kernel/cpu/resctrl/rdtgroup.c    | 359 ++++++++++++-----
 include/linux/resctrl.h                   |  48 ++-
 include/linux/tick.h                      |   9 +-
 9 files changed, 911 insertions(+), 276 deletions(-)
  

Comments

Reinette Chatre Dec. 16, 2023, 5:02 a.m. UTC | #1
Hi James,

On 12/15/2023 9:43 AM, James Morse wrote:
> When a CPU is taken offline resctrl may need to move the overflow or
> limbo handlers to run on a different CPU.
> 
> Once the offline callbacks have been split, cqm_setup_limbo_handler()
> will be called while the CPU that is going offline is still present
> in the cpu_mask.
> 
> Pass the CPU to exclude to cqm_setup_limbo_handler() and
> mbm_setup_overflow_handler(). These functions can use a variant of
> cpumask_any_but() when selecting the CPU. -1 is used to indicate no CPUs
> need excluding.
> 
> Signed-off-by: James Morse <james.morse@arm.com>
> Tested-by: Shaopeng Tan <tan.shaopeng@fujitsu.com>
> Tested-by: Peter Newman <peternewman@google.com>
> Tested-by: Babu Moger <babu.moger@amd.com>
> Reviewed-by: Shaopeng Tan <tan.shaopeng@fujitsu.com>
> Reviewed-by: Babu Moger <babu.moger@amd.com>
> ---

Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>

Reinette
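The kernel's `cpumask_any_but()` returns a CPU from a mask other than the one given. The "-1 means exclude nothing" convention described in the quoted commit message can be modelled in user space as below. The 64-bit bitmap and `pick_cpu_any_but()` are hypothetical; unlike the kernel helper, which returns a value >= nr_cpu_ids when no CPU qualifies, this sketch returns -1.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical model: pick the lowest CPU in 'mask' other than exclude_cpu.
 * exclude_cpu == -1 means no CPU is excluded.
 * Returns the chosen CPU, or -1 if no eligible CPU remains.
 */
static int pick_cpu_any_but(uint64_t mask, int exclude_cpu)
{
	if (exclude_cpu >= 0 && exclude_cpu < 64)
		mask &= ~(UINT64_C(1) << exclude_cpu);

	for (int cpu = 0; cpu < 64; cpu++)
		if (mask & (UINT64_C(1) << cpu))
			return cpu;
	return -1;
}
```

For example, if a domain contains CPUs 1 and 2 and CPU 1 is going offline, passing 1 selects CPU 2. If CPU 1 is the only CPU in the domain, nothing is eligible, and a caller could then decide not to schedule the handler at all.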
  
Carl Worth Dec. 22, 2023, 10:43 p.m. UTC | #2
James Morse <james.morse@arm.com> writes:
> This series does two things, it changes resctrl to call resctrl_arch_rmid_read()
> in a way that works for MPAM, and it separates the locking so that the arch code
> and filesystem code don't have to share a mutex. I tried to split this as two
> series, but these touch similar call sites, so it would create more work.
>
> (What's MPAM? See the cover letter of the first series. [1])

Thanks, James. This is really useful for us at Ampere since it enables
the MPAM driver on top of this series.

I've tested this series on an Ampere implementation, by successfully
using resctrl to configure and exercise MPAM functionality. I can't
speak to the effects of the refactor on x86 since I have not tested that
at all.

For the series:

Tested-by: Carl Worth <carl@os.amperecomputing.com>

-Carl
  
Moger, Babu Jan. 3, 2024, 7:42 p.m. UTC | #3
Hi James,
Tested the series. Looks good.
Thanks
Babu

  
James Morse Jan. 22, 2024, 6:06 p.m. UTC | #4
Hi Reinette,

On 16/12/2023 05:02, Reinette Chatre wrote:
> On 12/15/2023 9:43 AM, James Morse wrote:
>> When a CPU is taken offline resctrl may need to move the overflow or
>> limbo handlers to run on a different CPU.
>>
>> Once the offline callbacks have been split, cqm_setup_limbo_handler()
>> will be called while the CPU that is going offline is still present
>> in the cpu_mask.
>>
>> Pass the CPU to exclude to cqm_setup_limbo_handler() and
>> mbm_setup_overflow_handler(). These functions can use a variant of
>> cpumask_any_but() when selecting the CPU. -1 is used to indicate no CPUs
>> need excluding.

> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>


Thanks!

James
  
James Morse Jan. 22, 2024, 6:06 p.m. UTC | #5
Hi Babu,

On 03/01/2024 19:42, Moger, Babu wrote:
> Hi James,
> Tested the series. Looks good.

Thanks - this was on an AMD machine, right? (I've not got access to one of those, so I'm
always nervous about something I may have missed!)


Thanks,

James


> Thanks
> Babu
  
James Morse Jan. 22, 2024, 6:06 p.m. UTC | #6
Hi Carl,

On 22/12/2023 22:43, Carl Worth wrote:
> James Morse <james.morse@arm.com> writes:
>> This series does two things, it changes resctrl to call resctrl_arch_rmid_read()
>> in a way that works for MPAM, and it separates the locking so that the arch code
>> and filesystem code don't have to share a mutex. I tried to split this as two
>> series, but these touch similar call sites, so it would create more work.
>>
>> (What's MPAM? See the cover letter of the first series. [1])
> 
> Thanks, James. This is really useful for us at Ampere since it enables
> the MPAM driver on top of this series.
> 
> I've tested this series on an Ampere implementation, by successfully
> using resctrl to configure and exercise MPAM functionality.

Great! Thanks for this.


> I can't
> speak to the effects of the refactor on x86 since I have not tested that
> at all.

> For the series:
> Tested-by: Carl Worth <carl@os.amperecomputing.com>

Thanks, I'll add an "# arm64" on the end of that to preserve your above comment about not
testing this on x86.


Thanks,

James