[00/15] x86/resctrl : Support AMD QoS RMID Pinning feature

Message ID 20231201005720.235639-1-babu.moger@amd.com
Headers
Series x86/resctrl : Support AMD QoS RMID Pinning feature |

Message

Moger, Babu Dec. 1, 2023, 12:57 a.m. UTC
  These series adds the support for AMD QoS RMID Pinning feature. It is also
called ABMC (Assignable Bandwidth Monitoring Counters) feature.

The feature details are available in APM listed below [1].
[1] AMD64 Architecture Programmer's Manual Volume 2: System Programming
Publication # 24593 Revision 3.41 section 19.3.3.3 Assignable Bandwidth
Monitoring (ABMC). The documentation is available at
Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537

The patches are based on top of commit 
346887b65d89ae987698bc1efd8e5536bd180b3f (tip/master)

# Introduction

AMD hardware can support 256 or more RMIDs. However, bandwidth monitoring
feature only guarantees that RMIDs currently assigned to a processor will
be tracked by hardware. The counters of any other RMIDs which are no
longer being tracked will be reset to zero. The MBM event counters return
"Unavailable" for the RMIDs that are not active.

Users can create 256 or more monitor groups. But there can be only limited
number of groups that can be give guaranteed monitoring numbers. With ever
changing system configuration, there is no way to definitely know which of
these groups will be active for certain point of time. Users do not have
the option to monitor a group or set of groups for certain period of time
without worrying about RMID being reset in between.

The ABMC feature provides an option to pin (or assign) the RMID to the
hardware counter and monitor the bandwidth for a longer duration. The
pinned RMID will be active until the user unpins (or unassigns) it.  There
is no need to worry about counters being reset during this period.
Additionally, the user can specify a bitmask identifying the specific
bandwidth types from the given source to track with the counter.

# Linux Implementation

Hardware provides total of 32 counters available for assignment.
Each Linux resctrl group can be assigned a maximum of 2 counters. One for
mbm_total_bytes and one for mbm_local_bytes. Users also have the option to
assign only one counter to the group. If the system runs out of assignable
counters, the kernel will display the error when the user attempts a new
counter assignment. Users need to unassign already used counters for new
assignments.

# Examples

a. Check if ABMC support is available
	#mount -t resctrl resctrl /sys/fs/resctrl/
	#cat /sys/fs/resctrl/info/L3_MON/mon_features 
	llc_occupancy
	mbm_total_bytes
	mbm_total_bytes_config
	mbm_local_bytes
	mbm_local_bytes_config
	abmc_capable ←  Linux kernel detected ABMC feature.

b. Mount with ABMC support
	#umount /sys/fs/resctrl/
	#mount  -o abmc -t resctrl resctrl /sys/fs/resctrl/
	
c. Read the monitor states. There will be new file "monitor_state"
   for each monitor group when ABMC feature is enabled. By default,
   both total and local MBM events are in "unassign" state.
	
	#cat /sys/fs/resctrl/monitor_state 
	total=unassign;local=unassign
	
d. Read the event mbm_total_bytes and mbm_local_bytes. Note that MBA
   events are not available until the user assigns the events explicitly.
   Users need to assign the counters to monitor the events in this mode.
	
	#cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_total_bytes
	Unavailable
	
	#cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_bytes 
	Unavailable
	
e. Assign a h/w counter to the total event and read the monitor_state.
	
	#echo total=assign > /sys/fs/resctrl/monitor_state
	#cat /sys/fs/resctrl/monitor_state 
	total=assign;local=unassign
	
f. Now that the total event is assigned. Read the total event.
	
	#cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_total_bytes
	6136000
	
g. Assign a h/w counter to both total and local events and read the monitor_state.
	
	#echo "total=assign;local=assign" > /sys/fs/resctrl/monitor_state
	#cat /sys/fs/resctrl/monitor_state
	total=assign;local=assign
	
h. Now that both total and local events are  assigned, read the events.
	
	#cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_total_bytes
	6136000
	#cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_bytes
	58694
	
i. Check the bandwidth configuration for the group. Note that bandwidth
   configuration has a domain scope. Total event defaults to 0x7F (to
   count all the events) and local event defaults to 0x15
   (to count all the local numa events). The event bitmap decoding is
   available in https://www.kernel.org/doc/Documentation/x86/resctrl.rst
   in section "mbm_total_bytes_config", "mbm_local_bytes_config":
	
	#cat /sys/fs/resctrl/info/L3_MON/mbm_total_bytes_config 
	0=0x7f;1=0x7f
	
	#cat /sys/fs/resctrl/info/L3_MON/mbm_local_bytes_config 
	0=0x15;1=0xi15
	
j. Change the bandwidth source for domain 0 for the total event to count only reads.
   Note that this change effects events on the domain 0.
	
	#echo total=0x33 > /sys/fs/resctrl/info/L3_MON/mbm_total_bytes_config 
	#cat /sys/fs/resctrl/info/L3_MON/mbm_total_bytes_config 
	0=0x33;1=0x7F
	
k. Now read the total event again. The mbm_total_bytes should display
   only the read events.
	
	#cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_total_bytes
	6136000
	
l. Unmount the resctrl
	 
	#umount /sys/fs/resctrl/

NOTE: For simplicity these examples are run on a default resctrl group.
Similar experiments are can be run non-defaults groups.
---

Babu Moger (15):
  x86/resctrl: Remove hard-coded memory bandwidth limit
  x86/resctrl: Remove hard-coded memory bandwidth event configuration
  x86/resctrl: Add support for Assignable Bandwidth Monitoring Counters
    (ABMC)
  x86/resctrl: Add ABMC feature in the command line options
  x86/resctrl: Detect ABMC feature details
  x86/resctrl: Add the mount option for ABMC feature
  x86/resctrl: Add support to enable/disable ABMC feature
  x86/resctrl: Introduce interface to display number of ABMC counters
  x86/resctrl: Add interface to display monitor state of the group
  x86/resctrl: Initialize ABMC counters bitmap
  x86/resctrl: Add data structures for ABMC assignment
  x86/resctrl: Introduce mbm_total_cfg and mbm_local_cfg
  x86/resctrl: Add the interface to assign a ABMC counter
  x86/resctrl: Add interface unassign a ABMC counter
  x86/resctrl: Update ABMC assignment on event configuration changes

 .../admin-guide/kernel-parameters.txt         |   2 +-
 Documentation/arch/x86/resctrl.rst            |  52 +++
 arch/x86/include/asm/cpufeatures.h            |   1 +
 arch/x86/include/asm/msr-index.h              |   2 +
 arch/x86/kernel/cpu/cpuid-deps.c              |   2 +
 arch/x86/kernel/cpu/resctrl/core.c            |  23 +-
 arch/x86/kernel/cpu/resctrl/internal.h        |  49 ++-
 arch/x86/kernel/cpu/resctrl/monitor.c         |  22 +
 arch/x86/kernel/cpu/resctrl/rdtgroup.c        | 415 +++++++++++++++++-
 arch/x86/kernel/cpu/scattered.c               |   1 +
 include/linux/resctrl.h                       |   2 +
 11 files changed, 562 insertions(+), 9 deletions(-)
  

Comments

Peter Newman Dec. 5, 2023, 12:13 a.m. UTC | #1
[+James]

Hi James,

On Thu, Nov 30, 2023 at 4:57 PM Babu Moger <babu.moger@amd.com> wrote:
>
> These series adds the support for AMD QoS RMID Pinning feature. It is also
> called ABMC (Assignable Bandwidth Monitoring Counters) feature.
>
> The feature details are available in APM listed below [1].
> [1] AMD64 Architecture Programmer's Manual Volume 2: System Programming
> Publication # 24593 Revision 3.41 section 19.3.3.3 Assignable Bandwidth
> Monitoring (ABMC). The documentation is available at
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537
>
> The patches are based on top of commit
> 346887b65d89ae987698bc1efd8e5536bd180b3f (tip/master)
>
> # Introduction
>
> AMD hardware can support 256 or more RMIDs. However, bandwidth monitoring
> feature only guarantees that RMIDs currently assigned to a processor will
> be tracked by hardware. The counters of any other RMIDs which are no
> longer being tracked will be reset to zero. The MBM event counters return
> "Unavailable" for the RMIDs that are not active.
>
> Users can create 256 or more monitor groups. But there can be only limited
> number of groups that can be give guaranteed monitoring numbers. With ever
> changing system configuration, there is no way to definitely know which of
> these groups will be active for certain point of time. Users do not have
> the option to monitor a group or set of groups for certain period of time
> without worrying about RMID being reset in between.
>
> The ABMC feature provides an option to pin (or assign) the RMID to the
> hardware counter and monitor the bandwidth for a longer duration. The
> pinned RMID will be active until the user unpins (or unassigns) it.  There
> is no need to worry about counters being reset during this period.
> Additionally, the user can specify a bitmask identifying the specific
> bandwidth types from the given source to track with the counter.
>
> # Linux Implementation
>
> Hardware provides total of 32 counters available for assignment.
> Each Linux resctrl group can be assigned a maximum of 2 counters. One for
> mbm_total_bytes and one for mbm_local_bytes. Users also have the option to
> assign only one counter to the group. If the system runs out of assignable
> counters, the kernel will display the error when the user attempts a new
> counter assignment. Users need to unassign already used counters for new
> assignments.
>
> # Examples
>
> a. Check if ABMC support is available
>         #mount -t resctrl resctrl /sys/fs/resctrl/
>         #cat /sys/fs/resctrl/info/L3_MON/mon_features
>         llc_occupancy
>         mbm_total_bytes
>         mbm_total_bytes_config
>         mbm_local_bytes
>         mbm_local_bytes_config
>         abmc_capable ←  Linux kernel detected ABMC feature.
>
> b. Mount with ABMC support
>         #umount /sys/fs/resctrl/
>         #mount  -o abmc -t resctrl resctrl /sys/fs/resctrl/
>
> c. Read the monitor states. There will be new file "monitor_state"
>    for each monitor group when ABMC feature is enabled. By default,
>    both total and local MBM events are in "unassign" state.
>
>         #cat /sys/fs/resctrl/monitor_state
>         total=unassign;local=unassign
>
> d. Read the event mbm_total_bytes and mbm_local_bytes. Note that MBA
>    events are not available until the user assigns the events explicitly.
>    Users need to assign the counters to monitor the events in this mode.
>
>         #cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_total_bytes
>         Unavailable
>
>         #cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_bytes
>         Unavailable
>
> e. Assign a h/w counter to the total event and read the monitor_state.
>
>         #echo total=assign > /sys/fs/resctrl/monitor_state
>         #cat /sys/fs/resctrl/monitor_state
>         total=assign;local=unassign
>
> f. Now that the total event is assigned. Read the total event.
>
>         #cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_total_bytes
>         6136000
>
> g. Assign a h/w counter to both total and local events and read the monitor_state.
>
>         #echo "total=assign;local=assign" > /sys/fs/resctrl/monitor_state
>         #cat /sys/fs/resctrl/monitor_state
>         total=assign;local=assign
>
> h. Now that both total and local events are  assigned, read the events.
>
>         #cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_total_bytes
>         6136000
>         #cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_bytes
>         58694

We had briefly discussed this topic of explicit counter assignment in
resctrl earlier this year[1], but you didn't want it to be unique to
MPAM.

Now that a similar capability exists on AMD and an interface is being
proposed, we can talk about this in the context of MPAM again.

With some generalization and refinements, I expect this proposal could
be applied to assigning a limited number of MBWU monitors to
monitoring groups.

Also, I had proposed in another thread[2] applying such an interface
to previous AMD hardware where the monitor assignments cannot be
directly controlled to avoid or reduce the overhead in my soft RMID
proposal.

Thanks!
-Peter

[1] https://lore.kernel.org/all/f8a25b5f-4a7d-0891-1152-33f349059b5d@arm.com/
[2] https://lore.kernel.org/all/CALPaoCjg-W3w8OKLHP_g6Evoo03fbgaOQZrGTLX6vdSLp70=SA@mail.gmail.com/
  
Reinette Chatre Dec. 5, 2023, 11:17 p.m. UTC | #2
(+James)

Hi Babu,

On 11/30/2023 4:57 PM, Babu Moger wrote:
> These series adds the support for AMD QoS RMID Pinning feature. It is also

"These series" - is this series part of a bigger work?

> called ABMC (Assignable Bandwidth Monitoring Counters) feature.
> 
> The feature details are available in APM listed below [1].
> [1] AMD64 Architecture Programmer's Manual Volume 2: System Programming
> Publication # 24593 Revision 3.41 section 19.3.3.3 Assignable Bandwidth
> Monitoring (ABMC). The documentation is available at
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537
> 
> The patches are based on top of commit 
> 346887b65d89ae987698bc1efd8e5536bd180b3f (tip/master)
> 
> # Introduction
> 
> AMD hardware can support 256 or more RMIDs. However, bandwidth monitoring
> feature only guarantees that RMIDs currently assigned to a processor will
> be tracked by hardware. The counters of any other RMIDs which are no
> longer being tracked will be reset to zero. The MBM event counters return
> "Unavailable" for the RMIDs that are not active.
> 
> Users can create 256 or more monitor groups. But there can be only limited
> number of groups that can be give guaranteed monitoring numbers. With ever
> changing system configuration, there is no way to definitely know which of
> these groups will be active for certain point of time. Users do not have
> the option to monitor a group or set of groups for certain period of time
> without worrying about RMID being reset in between.
> 
> The ABMC feature provides an option to pin (or assign) the RMID to the
> hardware counter and monitor the bandwidth for a longer duration. The
> pinned RMID will be active until the user unpins (or unassigns) it.  There
> is no need to worry about counters being reset during this period.
> Additionally, the user can specify a bitmask identifying the specific
> bandwidth types from the given source to track with the counter.
> 
> # Linux Implementation
> 
> Hardware provides total of 32 counters available for assignment.
> Each Linux resctrl group can be assigned a maximum of 2 counters. One for
> mbm_total_bytes and one for mbm_local_bytes. Users also have the option to
> assign only one counter to the group. If the system runs out of assignable
> counters, the kernel will display the error when the user attempts a new
> counter assignment. Users need to unassign already used counters for new
> assignments.
> 
> # Examples
> 
> a. Check if ABMC support is available
> 	#mount -t resctrl resctrl /sys/fs/resctrl/
> 	#cat /sys/fs/resctrl/info/L3_MON/mon_features 
> 	llc_occupancy
> 	mbm_total_bytes
> 	mbm_total_bytes_config
> 	mbm_local_bytes
> 	mbm_local_bytes_config
> 	abmc_capable ←  Linux kernel detected ABMC feature.

(Please start thinking about a new name that is not the AMD feature
name. This is added to resctrl filesystem that is the generic interface
used to work with different architectures. This thus needs to be generalized
to what user requires and how it can be accommodated by the hardware ...
this is already expected to be needed by MPAM and having a AMD feature
name could cause confusion.)

> 
> b. Mount with ABMC support
> 	#umount /sys/fs/resctrl/
> 	#mount  -o abmc -t resctrl resctrl /sys/fs/resctrl/
> 	

hmmm ... so this requires the user to mount resctrl, determine if the
feature is supported, unmount resctrl, remount resctrl with feature enabled.
Could you please elaborate what prevents this feature from being enabled
without needing to remount resctrl?

> c. Read the monitor states. There will be new file "monitor_state"
>    for each monitor group when ABMC feature is enabled. By default,
>    both total and local MBM events are in "unassign" state.
> 	
> 	#cat /sys/fs/resctrl/monitor_state 
> 	total=unassign;local=unassign
> 	
> d. Read the event mbm_total_bytes and mbm_local_bytes. Note that MBA
>    events are not available until the user assigns the events explicitly.
>    Users need to assign the counters to monitor the events in this mode.
> 	
> 	#cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_total_bytes
> 	Unavailable

How is the llc_occupancy event impacted when ABMC is enabled? Can all RMIDs
still be used to track cache occupancy?

> 	
> 	#cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_bytes 
> 	Unavailable

I believe that "Unavailable" already has an accepted meaning within current
interface and is associated with temporary failure. Even the AMD spec states "This
is generally a temporary condition and subsequent reads may succeed". In the
scenario above there is no chance that this counter would produce a value later.
I do not think it is ideal to overload existing interface with different meanings
associated with a new hardware specific feature ... something like "Disabled" seems
more appropriate.

Considering this we may even consider using these files themselves as a
way to enable the counters if they are disabled. For example, just
"echo 1 > /sys/fs/resctrl/mon_data/mon_L3_00/mbm_total_bytes" can be used
to enable this counter. No need for a new "monitor_state". Please note that this
is not an official proposal since there are two other use cases that still need to
be considered as we await James's feedback on how this may work for MPAM and
also how this may be useful on AMD hardware that does not support ABMC but
users may want to get similar benefits ([1])

> 	
> e. Assign a h/w counter to the total event and read the monitor_state.
> 	
> 	#echo total=assign > /sys/fs/resctrl/monitor_state
> 	#cat /sys/fs/resctrl/monitor_state 
> 	total=assign;local=unassign
> 	
> f. Now that the total event is assigned. Read the total event.
> 	
> 	#cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_total_bytes
> 	6136000
> 	
> g. Assign a h/w counter to both total and local events and read the monitor_state.
> 	
> 	#echo "total=assign;local=assign" > /sys/fs/resctrl/monitor_state
> 	#cat /sys/fs/resctrl/monitor_state
> 	total=assign;local=assign
> 	
> h. Now that both total and local events are  assigned, read the events.
> 	
> 	#cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_total_bytes
> 	6136000
> 	#cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_bytes
> 	58694

It looks like if not all RMIDs asssociated with parent and child groups
have counters then the accumulated counters would just treat the "unassigned"
as zero?

> 	
> i. Check the bandwidth configuration for the group. Note that bandwidth
>    configuration has a domain scope. Total event defaults to 0x7F (to
>    count all the events) and local event defaults to 0x15
>    (to count all the local numa events). The event bitmap decoding is
>    available in https://www.kernel.org/doc/Documentation/x86/resctrl.rst
>    in section "mbm_total_bytes_config", "mbm_local_bytes_config":
> 	
> 	#cat /sys/fs/resctrl/info/L3_MON/mbm_total_bytes_config 
> 	0=0x7f;1=0x7f
> 	
> 	#cat /sys/fs/resctrl/info/L3_MON/mbm_local_bytes_config 
> 	0=0x15;1=0xi15


These would not be available if system does not support BMEC. From
patch #3 it does not seem as though ABMC is dependent on BMEC.

Is ABMC dependent on BMEC or are they just using the same
config bits?

> 	
> j. Change the bandwidth source for domain 0 for the total event to count only reads.
>    Note that this change effects events on the domain 0.
> 	
> 	#echo total=0x33 > /sys/fs/resctrl/info/L3_MON/mbm_total_bytes_config 

typo?

> 	#cat /sys/fs/resctrl/info/L3_MON/mbm_total_bytes_config 
> 	0=0x33;1=0x7F
> 	
> k. Now read the total event again. The mbm_total_bytes should display
>    only the read events.
> 	
> 	#cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_total_bytes
> 	6136000

hmmm ... seems like there is a need to make the MBM events configurable even
if BMEC is not supported.

Reinette


[1] https://lore.kernel.org/lkml/CALPaoCjg-W3w8OKLHP_g6Evoo03fbgaOQZrGTLX6vdSLp70=SA@mail.gmail.com/
  
Moger, Babu Dec. 6, 2023, 3:40 p.m. UTC | #3
Hi Reinette,

On 12/5/23 17:17, Reinette Chatre wrote:
> (+James)
> 
> Hi Babu,
> 
> On 11/30/2023 4:57 PM, Babu Moger wrote:
>> These series adds the support for AMD QoS RMID Pinning feature. It is also
> 
> "These series" - is this series part of a bigger work?

No.
There are some some plans to optimize rmid_reads. Peter is planning to
work on that. But both are independent of each other.

> 
>> called ABMC (Assignable Bandwidth Monitoring Counters) feature.
>>
>> The feature details are available in APM listed below [1].
>> [1] AMD64 Architecture Programmer's Manual Volume 2: System Programming
>> Publication # 24593 Revision 3.41 section 19.3.3.3 Assignable Bandwidth
>> Monitoring (ABMC). The documentation is available at
>> Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537
>>
>> The patches are based on top of commit 
>> 346887b65d89ae987698bc1efd8e5536bd180b3f (tip/master)
>>
>> # Introduction
>>
>> AMD hardware can support 256 or more RMIDs. However, bandwidth monitoring
>> feature only guarantees that RMIDs currently assigned to a processor will
>> be tracked by hardware. The counters of any other RMIDs which are no
>> longer being tracked will be reset to zero. The MBM event counters return
>> "Unavailable" for the RMIDs that are not active.
>>
>> Users can create 256 or more monitor groups. But there can be only limited
>> number of groups that can be give guaranteed monitoring numbers. With ever
>> changing system configuration, there is no way to definitely know which of
>> these groups will be active for certain point of time. Users do not have
>> the option to monitor a group or set of groups for certain period of time
>> without worrying about RMID being reset in between.
>>
>> The ABMC feature provides an option to pin (or assign) the RMID to the
>> hardware counter and monitor the bandwidth for a longer duration. The
>> pinned RMID will be active until the user unpins (or unassigns) it.  There
>> is no need to worry about counters being reset during this period.
>> Additionally, the user can specify a bitmask identifying the specific
>> bandwidth types from the given source to track with the counter.
>>
>> # Linux Implementation
>>
>> Hardware provides total of 32 counters available for assignment.
>> Each Linux resctrl group can be assigned a maximum of 2 counters. One for
>> mbm_total_bytes and one for mbm_local_bytes. Users also have the option to
>> assign only one counter to the group. If the system runs out of assignable
>> counters, the kernel will display the error when the user attempts a new
>> counter assignment. Users need to unassign already used counters for new
>> assignments.
>>
>> # Examples
>>
>> a. Check if ABMC support is available
>> 	#mount -t resctrl resctrl /sys/fs/resctrl/
>> 	#cat /sys/fs/resctrl/info/L3_MON/mon_features 
>> 	llc_occupancy
>> 	mbm_total_bytes
>> 	mbm_total_bytes_config
>> 	mbm_local_bytes
>> 	mbm_local_bytes_config
>> 	abmc_capable ←  Linux kernel detected ABMC feature.
> 
> (Please start thinking about a new name that is not the AMD feature
> name. This is added to resctrl filesystem that is the generic interface
> used to work with different architectures. This thus needs to be generalized
> to what user requires and how it can be accommodated by the hardware ...
> this is already expected to be needed by MPAM and having a AMD feature
> name could cause confusion.)

Yes. Agree.

How about "assign_capable"?

> 
>>
>> b. Mount with ABMC support
>> 	#umount /sys/fs/resctrl/
>> 	#mount  -o abmc -t resctrl resctrl /sys/fs/resctrl/
>> 	
> 
> hmmm ... so this requires the user to mount resctrl, determine if the
> feature is supported, unmount resctrl, remount resctrl with feature enabled.
> Could you please elaborate what prevents this feature from being enabled
> without needing to remount resctrl?

Spec says
"Enabling ABMC: ABMC is enabled by setting L3_QOS_EXT_CFG.ABMC_En=1 (see
Figure 19-7). When the state of ABMC_En is changed, it must be changed to
the updated value on all logical processors in the QOS Domain.
Upon transitions of the ABMC_En the following actions take place:
All ABMC assignable bandwidth counters are reset to 0.
The L3 default mode bandwidth counters are reset to 0.
The L3_QOS_ABMC_CFG MSR is reset to 0."

So, all the monitoring group counters will be reset.

It is technically possible to enable without remount. But ABMC mode
requires few new files(in each group) which I added when mounted with "-o
abmc". Thought it is a better option.

Otherwise we need to add these files when ABMC is supported(not when
enabled). Need to add another file in /sys/fs/resctrl/info/L3_MON to
enable the feature on the fly.

Both are acceptable options. Any thoughts?


> 
>> c. Read the monitor states. There will be new file "monitor_state"
>>    for each monitor group when ABMC feature is enabled. By default,
>>    both total and local MBM events are in "unassign" state.
>> 	
>> 	#cat /sys/fs/resctrl/monitor_state 
>> 	total=unassign;local=unassign
>> 	
>> d. Read the event mbm_total_bytes and mbm_local_bytes. Note that MBA
>>    events are not available until the user assigns the events explicitly.
>>    Users need to assign the counters to monitor the events in this mode.
>> 	
>> 	#cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_total_bytes
>> 	Unavailable
> 
> How is the llc_occupancy event impacted when ABMC is enabled? Can all RMIDs
> still be used to track cache occupancy?

llc_occupancy event is not impacted by ABMC mode. It can be still used to
track cache occupancy.

> 
>> 	
>> 	#cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_bytes 
>> 	Unavailable
> 
> I believe that "Unavailable" already has an accepted meaning within current
> interface and is associated with temporary failure. Even the AMD spec states "This
> is generally a temporary condition and subsequent reads may succeed". In the
> scenario above there is no chance that this counter would produce a value later.
> I do not think it is ideal to overload existing interface with different meanings
> associated with a new hardware specific feature ... something like "Disabled" seems
> more appropriate.

Hardware still reports it as unavailable. Also, there are some error cases
hardware can report unavailable. We may not be able to differentiate that.

> 
> Considering this we may even consider using these files themselves as a
> way to enable the counters if they are disabled. For example, just
> "echo 1 > /sys/fs/resctrl/mon_data/mon_L3_00/mbm_total_bytes" can be used

I am not sure about this. This is specific to domain 0. This group can
have cpus from multiple domains. I think we should have the interface for
all the domains(not for specific domain).

> to enable this counter. No need for a new "monitor_state". Please note that this
> is not an official proposal since there are two other use cases that still need to
> be considered as we await James's feedback on how this may work for MPAM and
> also how this may be useful on AMD hardware that does not support ABMC but
> users may want to get similar benefits ([1])

Ok. Lets wait for James comments.
> 
>> 	
>> e. Assign a h/w counter to the total event and read the monitor_state.
>> 	
>> 	#echo total=assign > /sys/fs/resctrl/monitor_state
>> 	#cat /sys/fs/resctrl/monitor_state 
>> 	total=assign;local=unassign
>> 	
>> f. Now that the total event is assigned. Read the total event.
>> 	
>> 	#cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_total_bytes
>> 	6136000
>> 	
>> g. Assign a h/w counter to both total and local events and read the monitor_state.
>> 	
>> 	#echo "total=assign;local=assign" > /sys/fs/resctrl/monitor_state
>> 	#cat /sys/fs/resctrl/monitor_state
>> 	total=assign;local=assign
>> 	
>> h. Now that both total and local events are  assigned, read the events.
>> 	
>> 	#cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_total_bytes
>> 	6136000
>> 	#cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_bytes
>> 	58694
> 
> It looks like if not all RMIDs asssociated with parent and child groups
> have counters then the accumulated counters would just treat the "unassigned"
> as zero?

That is correct.

> 
>> 	
>> i. Check the bandwidth configuration for the group. Note that bandwidth
>>    configuration has a domain scope. Total event defaults to 0x7F (to
>>    count all the events) and local event defaults to 0x15
>>    (to count all the local numa events). The event bitmap decoding is
>>    available in https://www.kernel.org/doc/Documentation/x86/resctrl.rst
>>    in section "mbm_total_bytes_config", "mbm_local_bytes_config":
>> 	
>> 	#cat /sys/fs/resctrl/info/L3_MON/mbm_total_bytes_config 
>> 	0=0x7f;1=0x7f
>> 	
>> 	#cat /sys/fs/resctrl/info/L3_MON/mbm_local_bytes_config 
>> 	0=0x15;1=0xi15
> 
> 
> These would not be available if system does not support BMEC. From
> patch #3 it does not seem as though ABMC is dependent on BMEC.
> 
> Is ABMC dependent on BMEC or are they just using the same
> config bits?

Good question. They dont have to be dependent on each other. To keep the
rmid_read interface same, we made it dependent on each other. I will add
the dependency in patch 3.

I have added explanation in patch 15.
https://lore.kernel.org/lkml/20231201005720.235639-16-babu.moger@amd.com/


> 
>> 	
>> j. Change the bandwidth source for domain 0 for the total event to count only reads.
>>    Note that this change effects events on the domain 0.
>> 	
>> 	#echo total=0x33 > /sys/fs/resctrl/info/L3_MON/mbm_total_bytes_config 
> 
> typo?

Yes. Cut paste mistake. Will fix it.

> 
>> 	#cat /sys/fs/resctrl/info/L3_MON/mbm_total_bytes_config 
>> 	0=0x33;1=0x7F
>> 	
>> k. Now read the total event again. The mbm_total_bytes should display
>>    only the read events.
>> 	
>> 	#cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_total_bytes
>> 	6136000
> 
> hmmm ... seems like there is a need to make the MBM events configurable even
> if BMEC is not supported.

Yes, in ABMC mode. Will add the dependency. Will use the standard mode if
BMEC and ABMC  are not available.

> 
> Reinette
> 
> 
> [1] https://lore.kernel.org/lkml/CALPaoCjg-W3w8OKLHP_g6Evoo03fbgaOQZrGTLX6vdSLp70=SA@mail.gmail.com/
  
Reinette Chatre Dec. 6, 2023, 6:49 p.m. UTC | #4
Hi Babu,

On 12/6/2023 7:40 AM, Moger, Babu wrote:
> Hi Reinette,
> 
> On 12/5/23 17:17, Reinette Chatre wrote:
>> (+James)
>>
>> Hi Babu,
>>
>> On 11/30/2023 4:57 PM, Babu Moger wrote:
>>> These series adds the support for AMD QoS RMID Pinning feature. It is also
>>
>> "These series" - is this series part of a bigger work?
> 
> No.
> There are some some plans to optimize rmid_reads. Peter is planning to
> work on that. But both are independent of each other.

I would propose that you use "This series" instead to avoid creating
wrong impression.

>>> a. Check if ABMC support is available
>>> 	#mount -t resctrl resctrl /sys/fs/resctrl/
>>> 	#cat /sys/fs/resctrl/info/L3_MON/mon_features 
>>> 	llc_occupancy
>>> 	mbm_total_bytes
>>> 	mbm_total_bytes_config
>>> 	mbm_local_bytes
>>> 	mbm_local_bytes_config
>>> 	abmc_capable ←  Linux kernel detected ABMC feature.
>>
>> (Please start thinking about a new name that is not the AMD feature
>> name. This is added to resctrl filesystem that is the generic interface
>> used to work with different architectures. This thus needs to be generalized
>> to what user requires and how it can be accommodated by the hardware ...
>> this is already expected to be needed by MPAM and having a AMD feature
>> name could cause confusion.)
> 
> Yes. Agree.
> 
> How about "assign_capable"?

Let's wait to learn more about other use case.

> 
>>
>>>
>>> b. Mount with ABMC support
>>> 	#umount /sys/fs/resctrl/
>>> 	#mount  -o abmc -t resctrl resctrl /sys/fs/resctrl/
>>> 	
>>
>> hmmm ... so this requires the user to mount resctrl, determine if the
>> feature is supported, unmount resctrl, remount resctrl with feature enabled.
>> Could you please elaborate what prevents this feature from being enabled
>> without needing to remount resctrl?
> 
> Spec says
> "Enabling ABMC: ABMC is enabled by setting L3_QOS_EXT_CFG.ABMC_En=1 (see
> Figure 19-7). When the state of ABMC_En is changed, it must be changed to
> the updated value on all logical processors in the QOS Domain.
> Upon transitions of the ABMC_En the following actions take place:
> All ABMC assignable bandwidth counters are reset to 0.
> The L3 default mode bandwidth counters are reset to 0.
> The L3_QOS_ABMC_CFG MSR is reset to 0."
> 
> So, all the monitoring group counters will be reset.
> 
> It is technically possible to enable without remount. But ABMC mode
> requires few new files(in each group) which I added when mounted with "-o
> abmc". Thought it is a better option.
> 
> Otherwise we need to add these files when ABMC is supported(not when
> enabled). Need to add another file in /sys/fs/resctrl/info/L3_MON to
> enable the feature on the fly.
> 
> Both are acceptable options. Any thoughts?

The new resctrl files in info/ could always be present. For example,
user space may want to know how many counters are available before
enabling the feature.

It is not yet obvious to me that this feature requires new files
in monitor groups.

>>> c. Read the monitor states. There will be new file "monitor_state"
>>>    for each monitor group when ABMC feature is enabled. By default,
>>>    both total and local MBM events are in "unassign" state.
>>> 	
>>> 	#cat /sys/fs/resctrl/monitor_state 
>>> 	total=unassign;local=unassign
>>> 	
>>> d. Read the event mbm_total_bytes and mbm_local_bytes. Note that MBA
>>>    events are not available until the user assigns the events explicitly.
>>>    Users need to assign the counters to monitor the events in this mode.
>>> 	
>>> 	#cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_total_bytes
>>> 	Unavailable
>>
>> How is the llc_occupancy event impacted when ABMC is enabled? Can all RMIDs
>> still be used to track cache occupancy?
> 
> llc_occupancy event is not impacted by ABMC mode. It can be still used to
> track cache occupancy.
> 
>>
>>> 	
>>> 	#cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_bytes 
>>> 	Unavailable
>>
>> I believe that "Unavailable" already has an accepted meaning within current
>> interface and is associated with temporary failure. Even the AMD spec states "This
>> is generally a temporary condition and subsequent reads may succeed". In the
>> scenario above there is no chance that this counter would produce a value later.
>> I do not think it is ideal to overload existing interface with different meanings
>> associated with a new hardware specific feature ... something like "Disabled" seems
>> more appropriate.
> 
> Hardware still reports it as unavailable. Also, there are some error cases
> hardware can report unavailable. We may not be able to differentiate that.

This highlights that this resctrl feature is currently latched to AMD's 
ABMC. I do not think we should require that this resctrl feature is backed
by hardware that can support reads of counters that are disabled. A counter
read really only needs to be sent to hardware if it is enabled.

>> Considering this we may even consider using these files themselves as a
>> way to enable the counters if they are disabled. For example, just
>> "echo 1 > /sys/fs/resctrl/mon_data/mon_L3_00/mbm_total_bytes" can be used
> 
> I am not sure about this. This is specific to domain 0. This group can
> have cpus from multiple domains. I think we should have the interface for
> all the domains(not for specific domain).

Are the ABMC registers not per CPU? This is unclear to me at this time
since changelog of patch #13 states it is per-CPU but yet the code
uses smp_call_function_any().

Even so, this needs to take other use cases into account. So far Peter
mentioned the scenario where enabling of one counter would do so for all
events associated with that counter and then there could also be a global
enable/disable.

> 
>> to enable this counter. No need for a new "monitor_state". Please note that this
>> is not an official proposal since there are two other use cases that still need to
>> be considered as we await James's feedback on how this may work for MPAM and
>> also how this may be useful on AMD hardware that does not support ABMC but
>> users may want to get similar benefits ([1])
> 
> Ok. Lets wait for James comments.

Reinette
  
Moger, Babu Dec. 7, 2023, 4:12 p.m. UTC | #5
Hi Reinette,

On 12/6/23 12:49, Reinette Chatre wrote:
> Hi Babu,
> 
> On 12/6/2023 7:40 AM, Moger, Babu wrote:
>> Hi Reinette,
>>
>> On 12/5/23 17:17, Reinette Chatre wrote:
>>> (+James)
>>>
>>> Hi Babu,
>>>
>>> On 11/30/2023 4:57 PM, Babu Moger wrote:
>>>> These series adds the support for AMD QoS RMID Pinning feature. It is also
>>>
>>> "These series" - is this series part of a bigger work?
>>
>> No.
>> There are some some plans to optimize rmid_reads. Peter is planning to
>> work on that. But both are independent of each other.
> 
> I would propose that you use "This series" instead to avoid creating
> wrong impression.

Sure.

> 
>>>> a. Check if ABMC support is available
>>>> 	#mount -t resctrl resctrl /sys/fs/resctrl/
>>>> 	#cat /sys/fs/resctrl/info/L3_MON/mon_features 
>>>> 	llc_occupancy
>>>> 	mbm_total_bytes
>>>> 	mbm_total_bytes_config
>>>> 	mbm_local_bytes
>>>> 	mbm_local_bytes_config
>>>> 	abmc_capable ←  Linux kernel detected ABMC feature.
>>>
>>> (Please start thinking about a new name that is not the AMD feature
>>> name. This is added to resctrl filesystem that is the generic interface
>>> used to work with different architectures. This thus needs to be generalized
>>> to what user requires and how it can be accommodated by the hardware ...
>>> this is already expected to be needed by MPAM and having a AMD feature
>>> name could cause confusion.)
>>
>> Yes. Agree.
>>
>> How about "assign_capable"?
> 
> Let's wait to learn more about other use case.
> 
>>
>>>
>>>>
>>>> b. Mount with ABMC support
>>>> 	#umount /sys/fs/resctrl/
>>>> 	#mount  -o abmc -t resctrl resctrl /sys/fs/resctrl/
>>>> 	
>>>
>>> hmmm ... so this requires the user to mount resctrl, determine if the
>>> feature is supported, unmount resctrl, remount resctrl with feature enabled.
>>> Could you please elaborate what prevents this feature from being enabled
>>> without needing to remount resctrl?
>>
>> Spec says
>> "Enabling ABMC: ABMC is enabled by setting L3_QOS_EXT_CFG.ABMC_En=1 (see
>> Figure 19-7). When the state of ABMC_En is changed, it must be changed to
>> the updated value on all logical processors in the QOS Domain.
>> Upon transitions of the ABMC_En the following actions take place:
>> All ABMC assignable bandwidth counters are reset to 0.
>> The L3 default mode bandwidth counters are reset to 0.
>> The L3_QOS_ABMC_CFG MSR is reset to 0."
>>
>> So, all the monitoring group counters will be reset.
>>
>> It is technically possible to enable without remount. But ABMC mode
>> requires few new files(in each group) which I added when mounted with "-o
>> abmc". Thought it is a better option.
>>
>> Otherwise we need to add these files when ABMC is supported(not when
>> enabled). Need to add another file in /sys/fs/resctrl/info/L3_MON to
>> enable the feature on the fly.
>>
>> Both are acceptable options. Any thoughts?
> 
> The new resctrl files in info/ could always be present. For example,
> user space may want to know how many counters are available before
> enabling the feature.
> 
> It is not yet obvious to me that this feature requires new files
> in monitor groups.

There are two MBM events(total and local) in each group.
We should provide an interface to assign each event independently.
User can assign only one event in a group. We should also provide an
option assign both the events in the group. This needs to be done at
resctrl group level.

> 
>>>> c. Read the monitor states. There will be new file "monitor_state"
>>>>    for each monitor group when ABMC feature is enabled. By default,
>>>>    both total and local MBM events are in "unassign" state.
>>>> 	
>>>> 	#cat /sys/fs/resctrl/monitor_state 
>>>> 	total=unassign;local=unassign
>>>> 	
>>>> d. Read the event mbm_total_bytes and mbm_local_bytes. Note that MBA
>>>>    events are not available until the user assigns the events explicitly.
>>>>    Users need to assign the counters to monitor the events in this mode.
>>>> 	
>>>> 	#cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_total_bytes
>>>> 	Unavailable
>>>
>>> How is the llc_occupancy event impacted when ABMC is enabled? Can all RMIDs
>>> still be used to track cache occupancy?
>>
>> llc_occupancy event is not impacted by ABMC mode. It can be still used to
>> track cache occupancy.
>>
>>>
>>>> 	
>>>> 	#cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_bytes 
>>>> 	Unavailable
>>>
>>> I believe that "Unavailable" already has an accepted meaning within current
>>> interface and is associated with temporary failure. Even the AMD spec states "This
>>> is generally a temporary condition and subsequent reads may succeed". In the
>>> scenario above there is no chance that this counter would produce a value later.
>>> I do not think it is ideal to overload existing interface with different meanings
>>> associated with a new hardware specific feature ... something like "Disabled" seems
>>> more appropriate.
>>
>> Hardware still reports it as unavailable. Also, there are some error cases
>> hardware can report unavailable. We may not be able to differentiate that.
> 
> This highlights that this resctrl feature is currently latched to AMD's 
> ABMC. I do not think we should require that this resctrl feature is backed
> by hardware that can support reads of counters that are disabled. A counter
> read really only needs to be sent to hardware if it is enabled.
> 
>>> Considering this we may even consider using these files themselves as a
>>> way to enable the counters if they are disabled. For example, just
>>> "echo 1 > /sys/fs/resctrl/mon_data/mon_L3_00/mbm_total_bytes" can be used
>>
>> I am not sure about this. This is specific to domain 0. This group can
>> have cpus from multiple domains. I think we should have the interface for
>> all the domains(not for specific domain).
> 
> Are the ABMC registers not per CPU? This is unclear to me at this time
> since changelog of patch #13 states it is per-CPU but yet the code
> uses smp_call_function_any().

Here are the clarifications from hardware engineer about this.

# While configuring the counter, should we have to write (L3_QOS_ABMC_CFG)
on all the logical processors in a domain?

No.  In order to configure a specific counter, you only need to write it
on a  single logical processor in a domain.  Configuring the actual ABMC
counter is a side-effect of the write to this register.  And the actual
ABMC counter configuration is a  global state.

"Each logical processor implements a separate copy of these registers"
identifies that if you write a 5 to L3_QOS_ABMC_CFG on C0T0, you will not
read a 5 from the L3_QOS_ABMC_CFG register on C1T0.


> 
> Even so, this needs to take other use cases into account. So far Peter
> mentioned the scenario where enabling of one counter would do so for all
> events associated with that counter and then there could also be a global
> enable/disable.
> 
>>
>>> to enable this counter. No need for a new "monitor_state". Please note that this
>>> is not an official proposal since there are two other use cases that still need to
>>> be considered as we await James's feedback on how this may work for MPAM and
>>> also how this may be useful on AMD hardware that does not support ABMC but
>>> users may want to get similar benefits ([1])
>>
>> Ok. Lets wait for James comments.
> 
> Reinette
>
  
Reinette Chatre Dec. 7, 2023, 7:29 p.m. UTC | #6
Hi Babu,

On 12/7/2023 8:12 AM, Moger, Babu wrote:
> On 12/6/23 12:49, Reinette Chatre wrote:
>> On 12/6/2023 7:40 AM, Moger, Babu wrote:
>>> On 12/5/23 17:17, Reinette Chatre wrote:
>>>> On 11/30/2023 4:57 PM, Babu Moger wrote:


>>>>> b. Mount with ABMC support
>>>>> 	#umount /sys/fs/resctrl/
>>>>> 	#mount  -o abmc -t resctrl resctrl /sys/fs/resctrl/
>>>>> 	
>>>>
>>>> hmmm ... so this requires the user to mount resctrl, determine if the
>>>> feature is supported, unmount resctrl, remount resctrl with feature enabled.
>>>> Could you please elaborate what prevents this feature from being enabled
>>>> without needing to remount resctrl?
>>>
>>> Spec says
>>> "Enabling ABMC: ABMC is enabled by setting L3_QOS_EXT_CFG.ABMC_En=1 (see
>>> Figure 19-7). When the state of ABMC_En is changed, it must be changed to
>>> the updated value on all logical processors in the QOS Domain.
>>> Upon transitions of the ABMC_En the following actions take place:
>>> All ABMC assignable bandwidth counters are reset to 0.
>>> The L3 default mode bandwidth counters are reset to 0.
>>> The L3_QOS_ABMC_CFG MSR is reset to 0."
>>>
>>> So, all the monitoring group counters will be reset.
>>>
>>> It is technically possible to enable without remount. But ABMC mode
>>> requires few new files(in each group) which I added when mounted with "-o
>>> abmc". Thought it is a better option.
>>>
>>> Otherwise we need to add these files when ABMC is supported(not when
>>> enabled). Need to add another file in /sys/fs/resctrl/info/L3_MON to
>>> enable the feature on the fly.
>>>
>>> Both are acceptable options. Any thoughts?
>>
>> The new resctrl files in info/ could always be present. For example,
>> user space may want to know how many counters are available before
>> enabling the feature.
>>
>> It is not yet obvious to me that this feature requires new files
>> in monitor groups.
> 
> There are two MBM events(total and local) in each group.
> We should provide an interface to assign each event independently.
> User can assign only one event in a group. We should also provide an
> option assign both the events in the group. This needs to be done at
> resctrl group level.

Understood. I would like to start by considering how (if at all) existing
files may be used, thus my example of using mbm_total_bytes, before adding
more files.


...

>>>>> 	#cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_bytes 
>>>>> 	Unavailable
>>>>
>>>> I believe that "Unavailable" already has an accepted meaning within current
>>>> interface and is associated with temporary failure. Even the AMD spec states "This
>>>> is generally a temporary condition and subsequent reads may succeed". In the
>>>> scenario above there is no chance that this counter would produce a value later.
>>>> I do not think it is ideal to overload existing interface with different meanings
>>>> associated with a new hardware specific feature ... something like "Disabled" seems
>>>> more appropriate.
>>>
>>> Hardware still reports it as unavailable. Also, there are some error cases
>>> hardware can report unavailable. We may not be able to differentiate that.
>>
>> This highlights that this resctrl feature is currently latched to AMD's 
>> ABMC. I do not think we should require that this resctrl feature is backed
>> by hardware that can support reads of counters that are disabled. A counter
>> read really only needs to be sent to hardware if it is enabled.
>>
>>>> Considering this we may even consider using these files themselves as a
>>>> way to enable the counters if they are disabled. For example, just
>>>> "echo 1 > /sys/fs/resctrl/mon_data/mon_L3_00/mbm_total_bytes" can be used
>>>
>>> I am not sure about this. This is specific to domain 0. This group can
>>> have cpus from multiple domains. I think we should have the interface for
>>> all the domains(not for specific domain).
>>
>> Are the ABMC registers not per CPU? This is unclear to me at this time
>> since changelog of patch #13 states it is per-CPU but yet the code
>> uses smp_call_function_any().
> 
> Here are the clarifications from hardware engineer about this.
> 
> # While configuring the counter, should we have to write (L3_QOS_ABMC_CFG)
> on all the logical processors in a domain?
> 
> No.  In order to configure a specific counter, you only need to write it
> on a  single logical processor in a domain.  Configuring the actual ABMC
> counter is a side-effect of the write to this register.  And the actual
> ABMC counter configuration is a  global state.
> 
> "Each logical processor implements a separate copy of these registers"
> identifies that if you write a 5 to L3_QOS_ABMC_CFG on C0T0, you will not
> read a 5 from the L3_QOS_ABMC_CFG register on C1T0.

Thank you for this information. Would reading L3_QOS_ABMC_DSC register on
C1T0 return the configuration written to L3_QOS_ABMC_CFG on C0T0 ?

Even so, you do confirm that the counter configuration is per domain. If I
understand correctly the implementation in this series assumes the counters
are programmed identically on all domains, but theoretically the system can support
domains with different counter configurations. For example, if a resource group
is limited to CPUs in one domain it would be unnecessary to consume the other
domain's counters.

This also ties into what this feature may morph into when considering the
non-ABMC AMD hardware needing similar interface as well as MPAM. I understand
for MPAM that resources are required for a counter but I do not know their
scope.

Reinette
  
Moger, Babu Dec. 7, 2023, 11:07 p.m. UTC | #7
Hi Reinette,

On 12/7/2023 1:29 PM, Reinette Chatre wrote:
> Hi Babu,
>
> On 12/7/2023 8:12 AM, Moger, Babu wrote:
>> On 12/6/23 12:49, Reinette Chatre wrote:
>>> On 12/6/2023 7:40 AM, Moger, Babu wrote:
>>>> On 12/5/23 17:17, Reinette Chatre wrote:
>>>>> On 11/30/2023 4:57 PM, Babu Moger wrote:
>
>>>>>> b. Mount with ABMC support
>>>>>> 	#umount /sys/fs/resctrl/
>>>>>> 	#mount  -o abmc -t resctrl resctrl /sys/fs/resctrl/
>>>>>> 	
>>>>> hmmm ... so this requires the user to mount resctrl, determine if the
>>>>> feature is supported, unmount resctrl, remount resctrl with feature enabled.
>>>>> Could you please elaborate what prevents this feature from being enabled
>>>>> without needing to remount resctrl?
>>>> Spec says
>>>> "Enabling ABMC: ABMC is enabled by setting L3_QOS_EXT_CFG.ABMC_En=1 (see
>>>> Figure 19-7). When the state of ABMC_En is changed, it must be changed to
>>>> the updated value on all logical processors in the QOS Domain.
>>>> Upon transitions of the ABMC_En the following actions take place:
>>>> All ABMC assignable bandwidth counters are reset to 0.
>>>> The L3 default mode bandwidth counters are reset to 0.
>>>> The L3_QOS_ABMC_CFG MSR is reset to 0."
>>>>
>>>> So, all the monitoring group counters will be reset.
>>>>
>>>> It is technically possible to enable without remount. But ABMC mode
>>>> requires few new files(in each group) which I added when mounted with "-o
>>>> abmc". Thought it is a better option.
>>>>
>>>> Otherwise we need to add these files when ABMC is supported(not when
>>>> enabled). Need to add another file in /sys/fs/resctrl/info/L3_MON to
>>>> enable the feature on the fly.
>>>>
>>>> Both are acceptable options. Any thoughts?
>>> The new resctrl files in info/ could always be present. For example,
>>> user space may want to know how many counters are available before
>>> enabling the feature.
>>>
>>> It is not yet obvious to me that this feature requires new files
>>> in monitor groups.
>> There are two MBM events(total and local) in each group.
>> We should provide an interface to assign each event independently.
>> User can assign only one event in a group. We should also provide an
>> option assign both the events in the group. This needs to be done at
>> resctrl group level.
> Understood. I would like to start by considering how (if at all) existing
> files may be used, thus my example of using mbm_total_bytes, before adding
> more files.
>
>
> ...
>
>>>>>> 	#cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_bytes
>>>>>> 	Unavailable
>>>>> I believe that "Unavailable" already has an accepted meaning within current
>>>>> interface and is associated with temporary failure. Even the AMD spec states "This
>>>>> is generally a temporary condition and subsequent reads may succeed". In the
>>>>> scenario above there is no chance that this counter would produce a value later.
>>>>> I do not think it is ideal to overload existing interface with different meanings
>>>>> associated with a new hardware specific feature ... something like "Disabled" seems
>>>>> more appropriate.
>>>> Hardware still reports it as unavailable. Also, there are some error cases
>>>> hardware can report unavailable. We may not be able to differentiate that.
>>> This highlights that this resctrl feature is currently latched to AMD's
>>> ABMC. I do not think we should require that this resctrl feature is backed
>>> by hardware that can support reads of counters that are disabled. A counter
>>> read really only needs to be sent to hardware if it is enabled.
>>>
>>>>> Considering this we may even consider using these files themselves as a
>>>>> way to enable the counters if they are disabled. For example, just
>>>>> "echo 1 > /sys/fs/resctrl/mon_data/mon_L3_00/mbm_total_bytes" can be used
>>>> I am not sure about this. This is specific to domain 0. This group can
>>>> have cpus from multiple domains. I think we should have the interface for
>>>> all the domains(not for specific domain).
>>> Are the ABMC registers not per CPU? This is unclear to me at this time
>>> since changelog of patch #13 states it is per-CPU but yet the code
>>> uses smp_call_function_any().
>> Here are the clarifications from hardware engineer about this.
>>
>> # While configuring the counter, should we have to write (L3_QOS_ABMC_CFG)
>> on all the logical processors in a domain?
>>
>> No.  In order to configure a specific counter, you only need to write it
>> on a  single logical processor in a domain.  Configuring the actual ABMC
>> counter is a side-effect of the write to this register.  And the actual
>> ABMC counter configuration is a  global state.
>>
>> "Each logical processor implements a separate copy of these registers"
>> identifies that if you write a 5 to L3_QOS_ABMC_CFG on C0T0, you will not
>> read a 5 from the L3_QOS_ABMC_CFG register on C1T0.
> Thank you for this information. Would reading L3_QOS_ABMC_DSC register on
> C1T0 return the configuration written to L3_QOS_ABMC_CFG on C0T0 ?

Yes. Because the counter counter configuration is global. Reading 
L3_QOS_ABMC_DSC will return the configuration of the counter specified by

QOS_ABMC_CFG[CtrID].

>
> Even so, you do confirm that the counter configuration is per domain. If I
> understand correctly the implementation in this series assumes the counters
> are programmed identically on all domains, but theoretically the system can support
> domains with different counter configurations. For example, if a resource group
> is limited to CPUs in one domain it would be unnecessary to consume the other
> domain's counters.
Yes. It is programmed on all the domains. Separating the domain 
configuration will require more changes. I am not planning to address in 
this series.
>
> This also ties into what this feature may morph into when considering the
> non-ABMC AMD hardware needing similar interface as well as MPAM. I understand
> for MPAM that resources are required for a counter but I do not know their
> scope.
>
> Reinette
  
Reinette Chatre Dec. 7, 2023, 11:26 p.m. UTC | #8
Hi Babu,

On 12/7/2023 3:07 PM, Moger, Babu wrote:
> On 12/7/2023 1:29 PM, Reinette Chatre wrote:
>> On 12/7/2023 8:12 AM, Moger, Babu wrote:
>>> On 12/6/23 12:49, Reinette Chatre wrote:
>>>> On 12/6/2023 7:40 AM, Moger, Babu wrote:
>>>>> On 12/5/23 17:17, Reinette Chatre wrote:
>>>>>> On 11/30/2023 4:57 PM, Babu Moger wrote:
>>
>>>>>>> b. Mount with ABMC support
>>>>>>>     #umount /sys/fs/resctrl/
>>>>>>>     #mount  -o abmc -t resctrl resctrl /sys/fs/resctrl/
>>>>>>>     
>>>>>> hmmm ... so this requires the user to mount resctrl, determine if the
>>>>>> feature is supported, unmount resctrl, remount resctrl with feature enabled.
>>>>>> Could you please elaborate what prevents this feature from being enabled
>>>>>> without needing to remount resctrl?
>>>>> Spec says
>>>>> "Enabling ABMC: ABMC is enabled by setting L3_QOS_EXT_CFG.ABMC_En=1 (see
>>>>> Figure 19-7). When the state of ABMC_En is changed, it must be changed to
>>>>> the updated value on all logical processors in the QOS Domain.
>>>>> Upon transitions of the ABMC_En the following actions take place:
>>>>> All ABMC assignable bandwidth counters are reset to 0.
>>>>> The L3 default mode bandwidth counters are reset to 0.
>>>>> The L3_QOS_ABMC_CFG MSR is reset to 0."
>>>>>
>>>>> So, all the monitoring group counters will be reset.
>>>>>
>>>>> It is technically possible to enable without remount. But ABMC mode
>>>>> requires few new files(in each group) which I added when mounted with "-o
>>>>> abmc". Thought it is a better option.
>>>>>
>>>>> Otherwise we need to add these files when ABMC is supported(not when
>>>>> enabled). Need to add another file in /sys/fs/resctrl/info/L3_MON to
>>>>> enable the feature on the fly.
>>>>>
>>>>> Both are acceptable options. Any thoughts?
>>>> The new resctrl files in info/ could always be present. For example,
>>>> user space may want to know how many counters are available before
>>>> enabling the feature.
>>>>
>>>> It is not yet obvious to me that this feature requires new files
>>>> in monitor groups.
>>> There are two MBM events(total and local) in each group.
>>> We should provide an interface to assign each event independently.
>>> User can assign only one event in a group. We should also provide an
>>> option assign both the events in the group. This needs to be done at
>>> resctrl group level.
>> Understood. I would like to start by considering how (if at all) existing
>> files may be used, thus my example of using mbm_total_bytes, before adding
>> more files.
>>
>>
>> ...
>>
>>>>>>>     #cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_bytes
>>>>>>>     Unavailable
>>>>>> I believe that "Unavailable" already has an accepted meaning within current
>>>>>> interface and is associated with temporary failure. Even the AMD spec states "This
>>>>>> is generally a temporary condition and subsequent reads may succeed". In the
>>>>>> scenario above there is no chance that this counter would produce a value later.
>>>>>> I do not think it is ideal to overload existing interface with different meanings
>>>>>> associated with a new hardware specific feature ... something like "Disabled" seems
>>>>>> more appropriate.
>>>>> Hardware still reports it as unavailable. Also, there are some error cases
>>>>> hardware can report unavailable. We may not be able to differentiate that.
>>>> This highlights that this resctrl feature is currently latched to AMD's
>>>> ABMC. I do not think we should require that this resctrl feature is backed
>>>> by hardware that can support reads of counters that are disabled. A counter
>>>> read really only needs to be sent to hardware if it is enabled.
>>>>
>>>>>> Considering this we may even consider using these files themselves as a
>>>>>> way to enable the counters if they are disabled. For example, just
>>>>>> "echo 1 > /sys/fs/resctrl/mon_data/mon_L3_00/mbm_total_bytes" can be used
>>>>> I am not sure about this. This is specific to domain 0. This group can
>>>>> have cpus from multiple domains. I think we should have the interface for
>>>>> all the domains(not for specific domain).
>>>> Are the ABMC registers not per CPU? This is unclear to me at this time
>>>> since changelog of patch #13 states it is per-CPU but yet the code
>>>> uses smp_call_function_any().
>>> Here are the clarifications from hardware engineer about this.
>>>
>>> # While configuring the counter, should we have to write (L3_QOS_ABMC_CFG)
>>> on all the logical processors in a domain?
>>>
>>> No.  In order to configure a specific counter, you only need to write it
>>> on a  single logical processor in a domain.  Configuring the actual ABMC
>>> counter is a side-effect of the write to this register.  And the actual
>>> ABMC counter configuration is a  global state.
>>>
>>> "Each logical processor implements a separate copy of these registers"
>>> identifies that if you write a 5 to L3_QOS_ABMC_CFG on C0T0, you will not
>>> read a 5 from the L3_QOS_ABMC_CFG register on C1T0.
>> Thank you for this information. Would reading L3_QOS_ABMC_DSC register on
>> C1T0 return the configuration written to L3_QOS_ABMC_CFG on C0T0 ?
> 
> Yes. Because the counter counter configuration is global. Reading L3_QOS_ABMC_DSC will return the configuration of the counter specified by
> 
> QOS_ABMC_CFG[CtrID].


To confirm, when you say "global" you mean within a domain?

> 
>>
>> Even so, you do confirm that the counter configuration is per domain. If I
>> understand correctly the implementation in this series assumes the counters
>> are programmed identically on all domains, but theoretically the system can support
>> domains with different counter configurations. For example, if a resource group
>> is limited to CPUs in one domain it would be unnecessary to consume the other
>> domain's counters.
> Yes. It is programmed on all the domains. Separating the domain
> configuration will require more changes. I am not planning to address
> in this series.

That may be ok. The priority is to consider how users want to interact with this
feature and create a suitable interface to support this. This version may not
separate domain configuration, but we do not want to create an the interface that
prevents such an enhancement in the future. Especially since it is already known
that hardware supports it.

Reinette
  
Moger, Babu Dec. 7, 2023, 11:34 p.m. UTC | #9
Hi Reinette,

On 12/7/2023 5:26 PM, Reinette Chatre wrote:
> Hi Babu,
>
> On 12/7/2023 3:07 PM, Moger, Babu wrote:
>> On 12/7/2023 1:29 PM, Reinette Chatre wrote:
>>> On 12/7/2023 8:12 AM, Moger, Babu wrote:
>>>> On 12/6/23 12:49, Reinette Chatre wrote:
>>>>> On 12/6/2023 7:40 AM, Moger, Babu wrote:
>>>>>> On 12/5/23 17:17, Reinette Chatre wrote:
>>>>>>> On 11/30/2023 4:57 PM, Babu Moger wrote:
>>>>>>>> b. Mount with ABMC support
>>>>>>>>      #umount /sys/fs/resctrl/
>>>>>>>>      #mount  -o abmc -t resctrl resctrl /sys/fs/resctrl/
>>>>>>>>      
>>>>>>> hmmm ... so this requires the user to mount resctrl, determine if the
>>>>>>> feature is supported, unmount resctrl, remount resctrl with feature enabled.
>>>>>>> Could you please elaborate what prevents this feature from being enabled
>>>>>>> without needing to remount resctrl?
>>>>>> Spec says
>>>>>> "Enabling ABMC: ABMC is enabled by setting L3_QOS_EXT_CFG.ABMC_En=1 (see
>>>>>> Figure 19-7). When the state of ABMC_En is changed, it must be changed to
>>>>>> the updated value on all logical processors in the QOS Domain.
>>>>>> Upon transitions of the ABMC_En the following actions take place:
>>>>>> All ABMC assignable bandwidth counters are reset to 0.
>>>>>> The L3 default mode bandwidth counters are reset to 0.
>>>>>> The L3_QOS_ABMC_CFG MSR is reset to 0."
>>>>>>
>>>>>> So, all the monitoring group counters will be reset.
>>>>>>
>>>>>> It is technically possible to enable without remount. But ABMC mode
>>>>>> requires few new files(in each group) which I added when mounted with "-o
>>>>>> abmc". Thought it is a better option.
>>>>>>
>>>>>> Otherwise we need to add these files when ABMC is supported(not when
>>>>>> enabled). Need to add another file in /sys/fs/resctrl/info/L3_MON to
>>>>>> enable the feature on the fly.
>>>>>>
>>>>>> Both are acceptable options. Any thoughts?
>>>>> The new resctrl files in info/ could always be present. For example,
>>>>> user space may want to know how many counters are available before
>>>>> enabling the feature.
>>>>>
>>>>> It is not yet obvious to me that this feature requires new files
>>>>> in monitor groups.
>>>> There are two MBM events(total and local) in each group.
>>>> We should provide an interface to assign each event independently.
>>>> User can assign only one event in a group. We should also provide an
>>>> option assign both the events in the group. This needs to be done at
>>>> resctrl group level.
>>> Understood. I would like to start by considering how (if at all) existing
>>> files may be used, thus my example of using mbm_total_bytes, before adding
>>> more files.
>>>
>>>
>>> ...
>>>
>>>>>>>>      #cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_bytes
>>>>>>>>      Unavailable
>>>>>>> I believe that "Unavailable" already has an accepted meaning within current
>>>>>>> interface and is associated with temporary failure. Even the AMD spec states "This
>>>>>>> is generally a temporary condition and subsequent reads may succeed". In the
>>>>>>> scenario above there is no chance that this counter would produce a value later.
>>>>>>> I do not think it is ideal to overload existing interface with different meanings
>>>>>>> associated with a new hardware specific feature ... something like "Disabled" seems
>>>>>>> more appropriate.
>>>>>> Hardware still reports it as unavailable. Also, there are some error cases
>>>>>> hardware can report unavailable. We may not be able to differentiate that.
>>>>> This highlights that this resctrl feature is currently latched to AMD's
>>>>> ABMC. I do not think we should require that this resctrl feature is backed
>>>>> by hardware that can support reads of counters that are disabled. A counter
>>>>> read really only needs to be sent to hardware if it is enabled.
>>>>>
>>>>>>> Considering this we may even consider using these files themselves as a
>>>>>>> way to enable the counters if they are disabled. For example, just
>>>>>>> "echo 1 > /sys/fs/resctrl/mon_data/mon_L3_00/mbm_total_bytes" can be used
>>>>>> I am not sure about this. This is specific to domain 0. This group can
>>>>>> have cpus from multiple domains. I think we should have the interface for
>>>>>> all the domains(not for specific domain).
>>>>> Are the ABMC registers not per CPU? This is unclear to me at this time
>>>>> since changelog of patch #13 states it is per-CPU but yet the code
>>>>> uses smp_call_function_any().
>>>> Here are the clarifications from hardware engineer about this.
>>>>
>>>> # While configuring the counter, should we have to write (L3_QOS_ABMC_CFG)
>>>> on all the logical processors in a domain?
>>>>
>>>> No.  In order to configure a specific counter, you only need to write it
>>>> on a  single logical processor in a domain.  Configuring the actual ABMC
>>>> counter is a side-effect of the write to this register.  And the actual
>>>> ABMC counter configuration is a  global state.
>>>>
>>>> "Each logical processor implements a separate copy of these registers"
>>>> identifies that if you write a 5 to L3_QOS_ABMC_CFG on C0T0, you will not
>>>> read a 5 from the L3_QOS_ABMC_CFG register on C1T0.
>>> Thank you for this information. Would reading L3_QOS_ABMC_DSC register on
>>> C1T0 return the configuration written to L3_QOS_ABMC_CFG on C0T0 ?
>> Yes. Because the counter counter configuration is global. Reading L3_QOS_ABMC_DSC will return the configuration of the counter specified by
>>
>> QOS_ABMC_CFG[CtrID].
>
> To confirm, when you say "global" you mean within a domain?

Yes. That is correct.


>
>>> Even so, you do confirm that the counter configuration is per domain. If I
>>> understand correctly the implementation in this series assumes the counters
>>> are programmed identically on all domains, but theoretically the system can support
>>> domains with different counter configurations. For example, if a resource group
>>> is limited to CPUs in one domain it would be unnecessary to consume the other
>>> domain's counters.
>> Yes. It is programmed on all the domains. Separating the domain
>> configuration will require more changes. I am not planning to address
>> in this series.
> That may be ok. The priority is to consider how users want to interact with this
> feature and create a suitable interface to support this. This version may not
> separate domain configuration, but we do not want to create an the interface that
> prevents such an enhancement in the future. Especially since it is already known
> that hardware supports it.

Yes. Understood.

Thanks

Babu
  
Peter Newman Dec. 8, 2023, 7:45 p.m. UTC | #10
On Tue, Dec 5, 2023 at 3:17 PM Reinette Chatre
<reinette.chatre@intel.com> wrote:
> On 11/30/2023 4:57 PM, Babu Moger wrote:
> > c. Read the monitor states. There will be new file "monitor_state"
> >    for each monitor group when ABMC feature is enabled. By default,
> >    both total and local MBM events are in "unassign" state.
> >
> >       #cat /sys/fs/resctrl/monitor_state
> >       total=unassign;local=unassign
> >
> > d. Read the event mbm_total_bytes and mbm_local_bytes. Note that MBA
> >    events are not available until the user assigns the events explicitly.
> >    Users need to assign the counters to monitor the events in this mode.
> >
> >       #cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_total_bytes
> >       Unavailable
>
> How is the llc_occupancy event impacted when ABMC is enabled? Can all RMIDs
> still be used to track cache occupancy?
>
> >
> >       #cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_bytes
> >       Unavailable
>
> I believe that "Unavailable" already has an accepted meaning within current
> interface and is associated with temporary failure. Even the AMD spec states "This
> is generally a temporary condition and subsequent reads may succeed". In the
> scenario above there is no chance that this counter would produce a value later.
> I do not think it is ideal to overload existing interface with different meanings
> associated with a new hardware specific feature ... something like "Disabled" seems
> more appropriate.

Could we hide event counter files if they're not enabled? Is there
value in displaying the value of a non-running counter that will be
reset the next time it's enabled?


>
> Considering this we may even consider using these files themselves as a
> way to enable the counters if they are disabled. For example, just
> "echo 1 > /sys/fs/resctrl/mon_data/mon_L3_00/mbm_total_bytes" can be used
> to enable this counter. No need for a new "monitor_state". Please note that this
> is not an official proposal since there are two other use cases that still need to
> be considered as we await James's feedback on how this may work for MPAM and
> also how this may be useful on AMD hardware that does not support ABMC but
> users may want to get similar benefits ([1])

We plan to use the ABMC counters as a window to sample the MB/s rate
of a very large number of groups, so there's a serious concern about
the number of write syscalls this will take, as they will add up
quickly for a large RMID*domain count.

To that end, the ideal would be the ability to re-assign all ABMC
counters on all domains in a single system call.

-Peter
  
Reinette Chatre Dec. 8, 2023, 8:09 p.m. UTC | #11
Hi Peter,

On 12/8/2023 11:45 AM, Peter Newman wrote:
> On Tue, Dec 5, 2023 at 3:17 PM Reinette Chatre
> <reinette.chatre@intel.com> wrote:
>> On 11/30/2023 4:57 PM, Babu Moger wrote:
>>> c. Read the monitor states. There will be new file "monitor_state"
>>>    for each monitor group when ABMC feature is enabled. By default,
>>>    both total and local MBM events are in "unassign" state.
>>>
>>>       #cat /sys/fs/resctrl/monitor_state
>>>       total=unassign;local=unassign
>>>
>>> d. Read the event mbm_total_bytes and mbm_local_bytes. Note that MBA
>>>    events are not available until the user assigns the events explicitly.
>>>    Users need to assign the counters to monitor the events in this mode.
>>>
>>>       #cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_total_bytes
>>>       Unavailable
>>
>> How is the llc_occupancy event impacted when ABMC is enabled? Can all RMIDs
>> still be used to track cache occupancy?
>>
>>>
>>>       #cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_bytes
>>>       Unavailable
>>
>> I believe that "Unavailable" already has an accepted meaning within current
>> interface and is associated with temporary failure. Even the AMD spec states "This
>> is generally a temporary condition and subsequent reads may succeed". In the
>> scenario above there is no chance that this counter would produce a value later.
>> I do not think it is ideal to overload existing interface with different meanings
>> associated with a new hardware specific feature ... something like "Disabled" seems
>> more appropriate.
> 
> Could we hide event counter files if they're not enabled? Is there
> value in displaying the value of a non-running counter that will be
> reset the next time it's enabled?

It may be possible to hide the counter file when it is disabled but in this
case it is not clear to me how to communicate to user space that it is
an available counter that can be enabled and by hiding the file one mechanism
to actually enable the counter is lost. It is not required to
display a stale value when a counter is disabled, text like "Disabled"
can be used.

>> Considering this we may even consider using these files themselves as a
>> way to enable the counters if they are disabled. For example, just
>> "echo 1 > /sys/fs/resctrl/mon_data/mon_L3_00/mbm_total_bytes" can be used
>> to enable this counter. No need for a new "monitor_state". Please note that this
>> is not an official proposal since there are two other use cases that still need to
>> be considered as we await James's feedback on how this may work for MPAM and
>> also how this may be useful on AMD hardware that does not support ABMC but
>> users may want to get similar benefits ([1])
> 
> We plan to use the ABMC counters as a window to sample the MB/s rate
> of a very large number of groups, so there's a serious concern about
> the number of write syscalls this will take, as they will add up
> quickly for a large RMID*domain count.
> 
> To that end, the ideal would be the ability to re-assign all ABMC
> counters on all domains in a single system call.

Understood. I've already pointed out that this is a use case needing
to be considered. Please see [1] - search for "global enable/disable".

Reinette

[1] https://lore.kernel.org/lkml/e36699cf-c73e-401b-b770-63eba708df38@intel.com/
  
Moger, Babu Dec. 8, 2023, 10:58 p.m. UTC | #12
Hi Reinette/Peter,

> -----Original Message-----
> From: Reinette Chatre <reinette.chatre@intel.com>
> Sent: Thursday, December 7, 2023 1:29 PM
> To: Moger, Babu <Babu.Moger@amd.com>; corbet@lwn.net;
> fenghua.yu@intel.com; tglx@linutronix.de; mingo@redhat.com;
> bp@alien8.de; dave.hansen@linux.intel.com; James Morse
> <james.morse@arm.com>
> Cc: x86@kernel.org; hpa@zytor.com; paulmck@kernel.org;
> rdunlap@infradead.org; tj@kernel.org; peterz@infradead.org;
> seanjc@google.com; Phillips, Kim <kim.phillips@amd.com>;
> jmattson@google.com; ilpo.jarvinen@linux.intel.com;
> jithu.joseph@intel.com; kan.liang@linux.intel.com; Dadhania, Nikunj
> <nikunj.dadhania@amd.com>; daniel.sneddon@linux.intel.com;
> pbonzini@redhat.com; rick.p.edgecombe@intel.com; rppt@kernel.org;
> maciej.wieczor-retman@intel.com; linux-doc@vger.kernel.org; linux-
> kernel@vger.kernel.org; eranian@google.com; peternewman@google.com;
> Giani, Dhaval <Dhaval.Giani@amd.com>
> Subject: Re: [PATCH 00/15] x86/resctrl : Support AMD QoS RMID Pinning
> feature
> 
> Hi Babu,
> 
> On 12/7/2023 8:12 AM, Moger, Babu wrote:
> > On 12/6/23 12:49, Reinette Chatre wrote:
> >> On 12/6/2023 7:40 AM, Moger, Babu wrote:
> >>> On 12/5/23 17:17, Reinette Chatre wrote:
> >>>> On 11/30/2023 4:57 PM, Babu Moger wrote:
> 
> 
> >>>>> b. Mount with ABMC support
> >>>>> 	#umount /sys/fs/resctrl/
> >>>>> 	#mount  -o abmc -t resctrl resctrl /sys/fs/resctrl/
> >>>>>
> >>>>
> >>>> hmmm ... so this requires the user to mount resctrl, determine if
> >>>> the feature is supported, unmount resctrl, remount resctrl with feature
> enabled.
> >>>> Could you please elaborate what prevents this feature from being
> >>>> enabled without needing to remount resctrl?
> >>>
> >>> Spec says
> >>> "Enabling ABMC: ABMC is enabled by setting
> L3_QOS_EXT_CFG.ABMC_En=1
> >>> (see Figure 19-7). When the state of ABMC_En is changed, it must be
> >>> changed to the updated value on all logical processors in the QOS Domain.
> >>> Upon transitions of the ABMC_En the following actions take place:
> >>> All ABMC assignable bandwidth counters are reset to 0.
> >>> The L3 default mode bandwidth counters are reset to 0.
> >>> The L3_QOS_ABMC_CFG MSR is reset to 0."
> >>>
> >>> So, all the monitoring group counters will be reset.
> >>>
> >>> It is technically possible to enable without remount. But ABMC mode
> >>> requires few new files(in each group) which I added when mounted
> >>> with "-o abmc". Thought it is a better option.
> >>>
> >>> Otherwise we need to add these files when ABMC is supported(not when
> >>> enabled). Need to add another file in /sys/fs/resctrl/info/L3_MON to
> >>> enable the feature on the fly.
> >>>
> >>> Both are acceptable options. Any thoughts?

I think we didn’t conclude on this yet.  I will remove the requirement to
remount the filesystem to use ABMC.  That way users can move back and
forth between the modes without having to remount. We need to take care of
extra cleanup of states(data structure) when user moves back and forth.
Hopefully, I should be able to take care of that.

Thanks
Babu