[v1,7/9] x86/resctrl: Assign HW RMIDs to CPUs for soft RMID

Message ID 20230421141723.2405942-8-peternewman@google.com
State New
Series x86/resctrl: Use soft RMIDs for reliable MBM on AMD

Commit Message

Peter Newman April 21, 2023, 2:17 p.m. UTC
  To implement soft RMIDs, each CPU needs a HW RMID that is unique within
its L3 cache domain. This is the minimum number of RMIDs needed to
monitor all CPUs.

This is accomplished by determining the rank of each CPU's mask bit
within its L3 shared_cpu_mask in resctrl_online_cpu().
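
As a sketch only (cpu_rank_in_mask() is an illustrative name, not an
existing helper and not part of this patch), the same rank could be
computed with bitmap_weight() over the bits below the CPU's position:

	/* Rank of @cpu within @mask: the number of set bits below it. */
	static u32 cpu_rank_in_mask(int cpu, const struct cpumask *mask)
	{
		return bitmap_weight(cpumask_bits(mask), cpu);
	}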

Signed-off-by: Peter Newman <peternewman@google.com>
---
 arch/x86/kernel/cpu/resctrl/core.c | 39 +++++++++++++++++++++++++++++-
 1 file changed, 38 insertions(+), 1 deletion(-)
  

Comments

Reinette Chatre May 11, 2023, 9:39 p.m. UTC | #1
Hi Peter,

On 4/21/2023 7:17 AM, Peter Newman wrote:
> To implement soft RMIDs, each CPU needs a HW RMID that is unique within
> its L3 cache domain. This is the minimum number of RMIDs needed to
> monitor all CPUs.
> 
> This is accomplished by determining the rank of each CPU's mask bit
> within its L3 shared_cpu_mask in resctrl_online_cpu().
> 
> Signed-off-by: Peter Newman <peternewman@google.com>
> ---
>  arch/x86/kernel/cpu/resctrl/core.c | 39 +++++++++++++++++++++++++++++-
>  1 file changed, 38 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
> index 47b1c37a81f8..b0d873231b1e 100644
> --- a/arch/x86/kernel/cpu/resctrl/core.c
> +++ b/arch/x86/kernel/cpu/resctrl/core.c
> @@ -596,6 +596,38 @@ static void domain_remove_cpu(int cpu, struct rdt_resource *r)
>  	}
>  }
>  
> +/* Assign each CPU an RMID that is unique within its cache domain. */
> +static u32 determine_hw_rmid_for_cpu(int cpu)

This code tends to use the verb "get"; something like "get_hw_rmid()"
could work.

> +{
> +	struct cpu_cacheinfo *ci = get_cpu_cacheinfo(cpu);
> +	struct cacheinfo *l3ci = NULL;
> +	u32 rmid;
> +	int i;
> +
> +	/* Locate the cacheinfo for this CPU's L3 cache. */
> +	for (i = 0; i < ci->num_leaves; i++) {
> +		if (ci->info_list[i].level == 3 &&
> +		    (ci->info_list[i].attributes & CACHE_ID)) {
> +			l3ci = &ci->info_list[i];
> +			break;
> +		}
> +	}
> +	WARN_ON(!l3ci);
> +
> +	if (!l3ci)
> +		return 0;

You can use "if (WARN_ON(..))"
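
Applied here, the two statements would collapse into, e.g.:

	if (WARN_ON(!l3ci))
		return 0;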

> +
> +	/* Use the position of cpu in its shared_cpu_mask as its RMID. */

(please use "CPU" instead of "cpu" in comments and changelogs)

> +	rmid = 0;
> +	for_each_cpu(i, &l3ci->shared_cpu_map) {
> +		if (i == cpu)
> +			break;
> +		rmid++;
> +	}
> +
> +	return rmid;
> +}

I do not see any impact to the (soft) RMIDs that can be assigned to monitor
groups, yet from what I understand a generic "RMID" is used as index to MBM state.
Is this correct? A hardware RMID and software RMID would thus share the
same MBM state. If this is correct I think we need to work on making
the boundaries between hard and soft RMID more clear.

> +
>  static void clear_closid_rmid(int cpu)
>  {
>  	struct resctrl_pqr_state *state = this_cpu_ptr(&pqr_state);
> @@ -604,7 +636,12 @@ static void clear_closid_rmid(int cpu)
>  	state->default_rmid = 0;
>  	state->cur_closid = 0;
>  	state->cur_rmid = 0;
> -	wrmsr(MSR_IA32_PQR_ASSOC, 0, 0);
> +	state->hw_rmid = 0;
> +
> +	if (static_branch_likely(&rdt_soft_rmid_enable_key))
> +		state->hw_rmid = determine_hw_rmid_for_cpu(cpu);
> +
> +	wrmsr(MSR_IA32_PQR_ASSOC, state->hw_rmid, 0);
>  }
>  
>  static int resctrl_online_cpu(unsigned int cpu)

Reinette
  
Peter Newman May 16, 2023, 2:49 p.m. UTC | #2
Hi Reinette,

On Thu, May 11, 2023 at 11:40 PM Reinette Chatre
<reinette.chatre@intel.com> wrote:
> On 4/21/2023 7:17 AM, Peter Newman wrote:
> > +     /* Locate the cacheinfo for this CPU's L3 cache. */
> > +     for (i = 0; i < ci->num_leaves; i++) {
> > +             if (ci->info_list[i].level == 3 &&
> > +                 (ci->info_list[i].attributes & CACHE_ID)) {
> > +                     l3ci = &ci->info_list[i];
> > +                     break;
> > +             }
> > +     }
> > +     WARN_ON(!l3ci);
> > +
> > +     if (!l3ci)
> > +             return 0;
>
> You can use "if (WARN_ON(..))"

Thanks, I'll look for the other changes in the series which would
benefit from this.


> > +     rmid = 0;
> > +     for_each_cpu(i, &l3ci->shared_cpu_map) {
> > +             if (i == cpu)
> > +                     break;
> > +             rmid++;
> > +     }
> > +
> > +     return rmid;
> > +}
>
> I do not see any impact to the (soft) RMIDs that can be assigned to monitor
> groups, yet from what I understand a generic "RMID" is used as index to MBM state.
> Is this correct? A hardware RMID and software RMID would thus share the
> same MBM state. If this is correct I think we need to work on making
> the boundaries between hard and soft RMID more clear.

The only RMID-indexed state used by soft RMIDs right now is
mbm_state::soft_rmid_bytes. The other aspect of the boundary is
ensuring that nothing will access the hard RMID-specific state for a
soft RMID.

The remainder of the mbm_state is only accessed by the software
controller, which you suggested that I disable.

The arch_mbm_state is accessed only through resctrl_arch_rmid_read()
and resctrl_arch_reset_rmid(), which are called by __mon_event_count()
or the limbo handler.

__mon_event_count() is aware of soft RMIDs, so I would just need to
ensure the software controller is disabled and never put any RMIDs on
the limbo list. To be safe, I can also add
WARN_ON_ONCE(rdt_mon_soft_rmid) to the rmid-indexing of the mbm_state
arrays in the software controller and before the
resctrl_arch_rmid_read() call in the limbo handler to catch if they're
ever using soft RMIDs.
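
A sketch of the guard I have in mind (placement is illustrative):

	/* Soft RMIDs must never reach hard RMID-specific state. */
	if (WARN_ON_ONCE(rdt_mon_soft_rmid))
		return;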

-Peter



>
> > +
> >  static void clear_closid_rmid(int cpu)
> >  {
> >       struct resctrl_pqr_state *state = this_cpu_ptr(&pqr_state);
> > @@ -604,7 +636,12 @@ static void clear_closid_rmid(int cpu)
> >       state->default_rmid = 0;
> >       state->cur_closid = 0;
> >       state->cur_rmid = 0;
> > -     wrmsr(MSR_IA32_PQR_ASSOC, 0, 0);
> > +     state->hw_rmid = 0;
> > +
> > +     if (static_branch_likely(&rdt_soft_rmid_enable_key))
> > +             state->hw_rmid = determine_hw_rmid_for_cpu(cpu);
> > +
> > +     wrmsr(MSR_IA32_PQR_ASSOC, state->hw_rmid, 0);
> >  }
> >
> >  static int resctrl_online_cpu(unsigned int cpu)
>
> Reinette
  
Reinette Chatre May 17, 2023, 12:06 a.m. UTC | #3
Hi Peter,

On 5/16/2023 7:49 AM, Peter Newman wrote:
> On Thu, May 11, 2023 at 11:40 PM Reinette Chatre
> <reinette.chatre@intel.com> wrote:
>> On 4/21/2023 7:17 AM, Peter Newman wrote:
>>> +     rmid = 0;
>>> +     for_each_cpu(i, &l3ci->shared_cpu_map) {
>>> +             if (i == cpu)
>>> +                     break;
>>> +             rmid++;
>>> +     }
>>> +
>>> +     return rmid;
>>> +}
>>
>> I do not see any impact to the (soft) RMIDs that can be assigned to monitor
>> groups, yet from what I understand a generic "RMID" is used as index to MBM state.
>> Is this correct? A hardware RMID and software RMID would thus share the
>> same MBM state. If this is correct I think we need to work on making
>> the boundaries between hard and soft RMID more clear.
> 
> The only RMID-indexed state used by soft RMIDs right now is
> mbm_state::soft_rmid_bytes. The other aspect of the boundary is
> ensuring that nothing will access the hard RMID-specific state for a
> soft RMID.
> 
> The remainder of the mbm_state is only accessed by the software
> controller, which you suggested that I disable.
> 
> The arch_mbm_state is accessed only through resctrl_arch_rmid_read()
> and resctrl_arch_reset_rmid(), which are called by __mon_event_count()
> or the limbo handler.
> 
> __mon_event_count() is aware of soft RMIDs, so I would just need to
> ensure the software controller is disabled and never put any RMIDs on
> the limbo list. To be safe, I can also add
> WARN_ON_ONCE(rdt_mon_soft_rmid) to the rmid-indexing of the mbm_state
> arrays in the software controller and before the
> resctrl_arch_rmid_read() call in the limbo handler to catch if they're
> ever using soft RMIDs.

I understand and trust that you can ensure that this implementation is
done safely. Please also consider how future changes to resctrl may stumble
if there are not clear boundaries. You may be able to "ensure the software
controller is disabled and never put any RMIDs on the limbo list", but
consider if these rules will be clear to somebody who comes along in a year
or more.

Documenting the data structures with these unique usages will help. 
Specific accessors can sometimes be useful to make it obvious in which state
the data is being accessed and what data can be accessed. Using WARN
as you suggest is a useful tool.
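
For example (the names below are purely illustrative):

	/* Hypothetical accessor: valid only for hardware RMIDs. */
	static struct mbm_state *get_hard_rmid_state(struct rdt_domain *d,
						     u32 rmid)
	{
		if (WARN_ON_ONCE(rdt_mon_soft_rmid))
			return NULL;

		return &d->mbm_total[rmid];
	}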

Reinette
  
Peter Newman June 6, 2023, 1:31 p.m. UTC | #4
Hi Reinette,

On Wed, May 17, 2023 at 2:06 AM Reinette Chatre
<reinette.chatre@intel.com> wrote:
> On 5/16/2023 7:49 AM, Peter Newman wrote:
> > On Thu, May 11, 2023 at 11:40 PM Reinette Chatre
> > <reinette.chatre@intel.com> wrote:
> >> I do not see any impact to the (soft) RMIDs that can be assigned to monitor
> >> groups, yet from what I understand a generic "RMID" is used as index to MBM state.
> >> Is this correct? A hardware RMID and software RMID would thus share the
> >> same MBM state. If this is correct I think we need to work on making
> >> the boundaries between hard and soft RMID more clear.
> >
> > The only RMID-indexed state used by soft RMIDs right now is
> > mbm_state::soft_rmid_bytes. The other aspect of the boundary is
> > ensuring that nothing will access the hard RMID-specific state for a
> > soft RMID.
> >
> > The remainder of the mbm_state is only accessed by the software
> > controller, which you suggested that I disable.
> >
> > The arch_mbm_state is accessed only through resctrl_arch_rmid_read()
> > and resctrl_arch_reset_rmid(), which are called by __mon_event_count()
> > or the limbo handler.
> >
> > __mon_event_count() is aware of soft RMIDs, so I would just need to
> > ensure the software controller is disabled and never put any RMIDs on
> > the limbo list. To be safe, I can also add
> > WARN_ON_ONCE(rdt_mon_soft_rmid) to the rmid-indexing of the mbm_state
> > arrays in the software controller and before the
> > resctrl_arch_rmid_read() call in the limbo handler to catch if they're
> > ever using soft RMIDs.
>
> I understand and trust that you can ensure that this implementation is
> done safely. Please also consider how future changes to resctrl may stumble
> if there are not clear boundaries. You may be able to "ensure the software
> controller is disabled and never put any RMIDs on the limbo list", but
> consider if these rules will be clear to somebody who comes along in a year
> or more.
>
> Documenting the data structures with these unique usages will help.
> Specific accessors can sometimes be useful to make it obvious in which state
> the data is being accessed and what data can be accessed. Using WARN
> as you suggest is a useful tool.

After studying the present usage of RMID values some more, I've
concluded that I can cleanly move all knowledge of the soft RMID
implementation to be within resctrl_arch_rmid_read() and that none of
the FS-layer code should need to be aware of them. However, doing this
would require James's patch to allow resctrl_arch_rmid_read() to
block[1], since resctrl_arch_rmid_read() would be the first
opportunity architecture-dependent code has to IPI the other CPUs in
the domain.

The alternative to blocking in resctrl_arch_rmid_read() would be
introducing an arch hook to mon_event_read(), where blocking can be
done today without James's patches, so that architecture-dependent
code can IPI all CPUs in the target domain to flush their event counts
to memory before calling mon_event_count() to total their MBM event
counts.
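
Roughly, with hypothetical names (neither function below exists today):

	/* Per-CPU: fold this CPU's hardware counts into the soft RMID
	 * totals (body elided). */
	static void flush_cpu_mbm_counts(void *info)
	{
	}

	/* Hook called from mon_event_read() in a blocking context. */
	void resctrl_arch_pre_mon_event_read(struct rdt_domain *d)
	{
		on_each_cpu_mask(&d->cpu_mask, flush_cpu_mbm_counts,
				 NULL, 1);
	}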

The remaining special case for soft RMIDs would be knowing that they
should never go on the limbo list. Right now I've hard-coded the soft
RMID read to always return 0 bytes for occupancy events, but this
answer is only correct in the context of deciding whether RMIDs are
dirty, so I have to prevent the events from being presented to the
user. If returning an error wasn't considered "dirty", maybe that
would work too.

Maybe the cleanest approach would be to cause enabling soft RMIDs to
somehow cause is_llc_occupancy_enabled() to return false, but this is
difficult as long as soft RMIDs are configured at mount time and
rdt_mon_features is set at boot time. If soft RMIDs move completely
into the arch layer, is it preferable to configure them with an rdt
boot option instead of adding an architecture-dependent mount option?
I recall James being opposed to adding a boot option for this.

Thanks!
-Peter

[1] https://lore.kernel.org/lkml/20230525180209.19497-15-james.morse@arm.com/
  
Peter Newman June 6, 2023, 1:36 p.m. UTC | #5
On Fri, Apr 21, 2023 at 4:18 PM Peter Newman <peternewman@google.com> wrote:
>  static void clear_closid_rmid(int cpu)
>  {
>         struct resctrl_pqr_state *state = this_cpu_ptr(&pqr_state);
> @@ -604,7 +636,12 @@ static void clear_closid_rmid(int cpu)
>         state->default_rmid = 0;
>         state->cur_closid = 0;
>         state->cur_rmid = 0;
> -       wrmsr(MSR_IA32_PQR_ASSOC, 0, 0);
> +       state->hw_rmid = 0;
> +
> +       if (static_branch_likely(&rdt_soft_rmid_enable_key))
> +               state->hw_rmid = determine_hw_rmid_for_cpu(cpu);

clear_closid_rmid() isn't run at mount time, so hw_rmid will be
uninitialized on any CPUs that were already online. The static key
was originally set at boot.

(the consequence was that domain bandwidth was the amount recorded on
the first CPU in the domain multiplied by the number of CPUs in the
domain)
  

Patch

diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index 47b1c37a81f8..b0d873231b1e 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -596,6 +596,38 @@  static void domain_remove_cpu(int cpu, struct rdt_resource *r)
 	}
 }
 
+/* Assign each CPU an RMID that is unique within its cache domain. */
+static u32 determine_hw_rmid_for_cpu(int cpu)
+{
+	struct cpu_cacheinfo *ci = get_cpu_cacheinfo(cpu);
+	struct cacheinfo *l3ci = NULL;
+	u32 rmid;
+	int i;
+
+	/* Locate the cacheinfo for this CPU's L3 cache. */
+	for (i = 0; i < ci->num_leaves; i++) {
+		if (ci->info_list[i].level == 3 &&
+		    (ci->info_list[i].attributes & CACHE_ID)) {
+			l3ci = &ci->info_list[i];
+			break;
+		}
+	}
+	WARN_ON(!l3ci);
+
+	if (!l3ci)
+		return 0;
+
+	/* Use the position of cpu in its shared_cpu_mask as its RMID. */
+	rmid = 0;
+	for_each_cpu(i, &l3ci->shared_cpu_map) {
+		if (i == cpu)
+			break;
+		rmid++;
+	}
+
+	return rmid;
+}
+
 static void clear_closid_rmid(int cpu)
 {
 	struct resctrl_pqr_state *state = this_cpu_ptr(&pqr_state);
@@ -604,7 +636,12 @@  static void clear_closid_rmid(int cpu)
 	state->default_rmid = 0;
 	state->cur_closid = 0;
 	state->cur_rmid = 0;
-	wrmsr(MSR_IA32_PQR_ASSOC, 0, 0);
+	state->hw_rmid = 0;
+
+	if (static_branch_likely(&rdt_soft_rmid_enable_key))
+		state->hw_rmid = determine_hw_rmid_for_cpu(cpu);
+
+	wrmsr(MSR_IA32_PQR_ASSOC, state->hw_rmid, 0);
 }
 
 static int resctrl_online_cpu(unsigned int cpu)