[v1,8/9] x86/resctrl: Use mbm_update() to push soft RMID counts

Message ID 20230421141723.2405942-9-peternewman@google.com
State New
Series x86/resctrl: Use soft RMIDs for reliable MBM on AMD

Commit Message

Peter Newman April 21, 2023, 2:17 p.m. UTC
__mon_event_count() only reads the current software count and does not
cause CPUs in the domain to flush. For mbm_update() to be effective in
preventing overflow in hardware counters with soft RMIDs, it needs to
flush the domain CPUs so that all of the HW RMIDs are read.

When RMIDs are soft, mbm_update() is intended to push bandwidth counts
to the software counters rather than pulling the counts from hardware
when userspace reads event counts, as this is a lot more efficient when
the number of HW RMIDs is fixed.

When RMIDs are soft, mbm_handle_overflow() only calls mbm_flush_cpu_handler()
on each CPU in the domain rather than calling mbm_update() to read all RMIDs.

Signed-off-by: Peter Newman <peternewman@google.com>
---
 arch/x86/kernel/cpu/resctrl/monitor.c | 28 +++++++++++++++++++++++----
 1 file changed, 24 insertions(+), 4 deletions(-)
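To make the overflow-prevention argument concrete, here is a minimal
user-space model of the push scheme. This is an illustration only, not
the series' code: the 24-bit counter width and every name below are
assumptions.

#include <stdint.h>
#include <stdio.h>

/*
 * Toy model: a narrow, free-running hardware counter is periodically
 * flushed into a 64-bit software count so that no wraparound is missed.
 */
#define HW_COUNTER_BITS	24	/* assumed width, for illustration */
#define HW_COUNTER_MASK	((1u << HW_COUNTER_BITS) - 1)

struct soft_count {
	uint32_t prev_hw;	/* last raw hardware value observed */
	uint64_t sw_bytes;	/* accumulated software count */
};

/* Must run often enough that the counter wraps at most once in between. */
static void flush(struct soft_count *s, uint32_t hw_now)
{
	uint32_t delta = (hw_now - s->prev_hw) & HW_COUNTER_MASK;

	s->prev_hw = hw_now;
	s->sw_bytes += delta;
}

int main(void)
{
	struct soft_count s = { 0, 0 };

	flush(&s, 0x00ffff00);	/* near the top of the 24-bit range */
	flush(&s, 0x00000100);	/* the hardware counter wrapped */
	printf("software count: %llu\n", (unsigned long long)s.sw_bytes);
	return 0;
}

The masked subtraction makes a single wraparound harmless; the overflow
handler's job is to bound how long any CPU goes without a flush.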
  

Comments

Reinette Chatre May 11, 2023, 9:40 p.m. UTC | #1
Hi Peter,

On 4/21/2023 7:17 AM, Peter Newman wrote:
> @@ -806,12 +811,27 @@ void mbm_handle_overflow(struct work_struct *work)
>  	r = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
>  	d = container_of(work, struct rdt_domain, mbm_over.work);
>  
> +	if (rdt_mon_soft_rmid) {
> +		/*
> +		 * HW RMIDs are permanently assigned to CPUs, so only a per-CPU
> +		 * flush is needed.
> +		 */
> +		on_each_cpu_mask(&d->cpu_mask, mbm_flush_cpu_handler, NULL,
> +				 false);
> +	}
> +
>  	list_for_each_entry(prgrp, &rdt_all_groups, rdtgroup_list) {
> -		mbm_update(r, d, prgrp->mon.rmid);
> +		/*
> +		 * mbm_update() on every RMID would result in excessive IPIs
> +		 * when RMIDs are soft.
> +		 */
> +		if (!rdt_mon_soft_rmid) {
> +			mbm_update(r, d, prgrp->mon.rmid);
>  
> -		head = &prgrp->mon.crdtgrp_list;
> -		list_for_each_entry(crgrp, head, mon.crdtgrp_list)
> -			mbm_update(r, d, crgrp->mon.rmid);
> +			head = &prgrp->mon.crdtgrp_list;
> +			list_for_each_entry(crgrp, head, mon.crdtgrp_list)
> +				mbm_update(r, d, crgrp->mon.rmid);
> +		}
>  
>  		if (is_mba_sc(NULL))
>  			update_mba_bw(prgrp, d);


hmmm ... I think that update_mba_bw() relies on mbm_update() to call
mbm_bw_count() to update the data it depends on. Keeping update_mba_bw()
while dropping mbm_update() thus seems problematic. AMD does not support
the software controller, though, so it may make things simpler if support
for software RMIDs disables the software controller (in a clear way).

Reinette
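
For context, the dependency described above looks roughly like the
following condensed sketch. It is simplified from the monitor.c logic,
not the exact kernel code; the field names, the MiB conversion, and the
return convention are all invented for illustration.

#include <stdint.h>

struct mbm_state_sketch {
	uint64_t prev_bytes;	/* raw count at the previous overflow tick */
	uint64_t prev_bw;	/* bandwidth computed over the last interval */
};

/* In the kernel, this is reached only via mbm_update(). */
static void mbm_bw_count_sketch(struct mbm_state_sketch *m, uint64_t cur_bytes)
{
	m->prev_bw = (cur_bytes - m->prev_bytes) >> 20;	/* bytes -> MiB */
	m->prev_bytes = cur_bytes;
}

/* Consumes prev_bw, which goes stale if mbm_update() is skipped. */
static int update_mba_bw_sketch(const struct mbm_state_sketch *m,
				uint64_t target_mib)
{
	return m->prev_bw > target_mib ? -1 /* throttle */ : 1 /* relax */;
}

int main(void)
{
	struct mbm_state_sketch m = { 0, 0 };

	mbm_bw_count_sketch(&m, 512u << 20);	/* 512 MiB this interval */
	return update_mba_bw_sketch(&m, 256) < 0;
}

If mbm_handle_overflow() skips mbm_update() for soft RMIDs, the software
controller keeps steering MBA from a stale prev_bw, which is exactly the
problem being raised.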
  
Peter Newman June 2, 2023, 12:42 p.m. UTC | #2
Hi Reinette,

On Thu, May 11, 2023 at 11:40 PM Reinette Chatre
<reinette.chatre@intel.com> wrote:
> On 4/21/2023 7:17 AM, Peter Newman wrote:
> >       list_for_each_entry(prgrp, &rdt_all_groups, rdtgroup_list) {
> > -             mbm_update(r, d, prgrp->mon.rmid);
> > +             /*
> > +              * mbm_update() on every RMID would result in excessive IPIs
> > +              * when RMIDs are soft.
> > +              */
> > +             if (!rdt_mon_soft_rmid) {
> > +                     mbm_update(r, d, prgrp->mon.rmid);
> >
> > -             head = &prgrp->mon.crdtgrp_list;
> > -             list_for_each_entry(crgrp, head, mon.crdtgrp_list)
> > -                     mbm_update(r, d, crgrp->mon.rmid);
> > +                     head = &prgrp->mon.crdtgrp_list;
> > +                     list_for_each_entry(crgrp, head, mon.crdtgrp_list)
> > +                             mbm_update(r, d, crgrp->mon.rmid);
> > +             }
> >
> >               if (is_mba_sc(NULL))
> >                       update_mba_bw(prgrp, d);
>
>
> hmmm ... I think that update_mba_bw() relies on mbm_update() to call
> mbm_bw_count() to update the data it depends on. Keeping update_mba_bw()
> while dropping mbm_update() thus seems problematic. AMD does not support
> the software controller, though, so it may make things simpler if support
> for software RMIDs disables the software controller (in a clear way).

I looked over this again and realized that the rationale for skipping
mbm_update() in this patch is incorrect.
__mon_event_count_soft_rmid() does not send any IPIs, so it's
perfectly fine to call mbm_update() and update_mba_bw() when using
soft RMIDs.

Even if we don't use the software controller on AMD, it seems
conceptually cleaner to just allow soft and hard RMIDs to be used
interchangeably wherever they work.

-Peter
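
The "no IPIs" point can be shown with a small stand-alone sketch. This
is not the series' actual function or data layout; the per-CPU array and
all names are invented for illustration. A soft-RMID event read only
sums values already in memory, whereas a hard-RMID read has to execute
an MSR read on a CPU in the target domain.

#include <stdatomic.h>
#include <stdint.h>
#include <stdio.h>

#define NR_CPUS_SKETCH 8

/* Software counts pushed by each CPU's flush; readable from anywhere. */
static _Atomic uint64_t soft_bytes[NR_CPUS_SKETCH];

static uint64_t soft_rmid_read(void)
{
	uint64_t sum = 0;
	int cpu;

	for (cpu = 0; cpu < NR_CPUS_SKETCH; cpu++)
		sum += atomic_load_explicit(&soft_bytes[cpu],
					    memory_order_relaxed);
	return sum;	/* plain memory loads; no cross-CPU interrupt */
}

int main(void)
{
	atomic_store(&soft_bytes[0], 4096);
	printf("%llu\n", (unsigned long long)soft_rmid_read());
	return 0;
}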
  
Peter Newman June 6, 2023, 1:48 p.m. UTC | #3
On Fri, Apr 21, 2023 at 4:18 PM Peter Newman <peternewman@google.com> wrote:
>
> __mon_event_count() only reads the current software count and does not
> cause CPUs in the domain to flush. For mbm_update() to be effective in
> preventing overflow in hardware counters with soft RMIDs, it needs to
> flush the domain CPUs so that all of the HW RMIDs are read.
>
> When RMIDs are soft, mbm_update() is intended to push bandwidth counts
> to the software counters rather than pulling the counts from hardware
> when userspace reads event counts, as this is a lot more efficient when
> the number of HW RMIDs is fixed.

The low frequency at which the overflow handler runs would introduce
too much error into bandwidth calculations, and running it more
frequently, regardless of whether the user is requesting event count
reads, would not be a good use of CPU time.

mon_event_read() needs to pull fresh event count values from hardware.

> When RMIDs are soft, mbm_handle_overflow() only calls mbm_flush_cpu_handler()
> on each CPU in the domain rather than calling mbm_update() to read all RMIDs.

I'll try going back to Stephane's original approach of rate-limiting
how often the domain's CPUs need to be flushed, with a user-configurable
time threshold. This will allow mbm_update() to read all of the RMIDs
without triggering lots of redundant IPIs (redundant because only the
current RMID on each CPU can change when RMIDs are soft).
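
A sketch of that rate-limiting idea follows. It is hypothetical, not
Stephane's patch or the eventual series code; the threshold variable,
the nanosecond bookkeeping, and the function names are invented.

#include <stdbool.h>
#include <stdint.h>

/* Hypothetical user-tunable threshold (e.g. exposed through resctrl). */
static uint64_t mbm_flush_threshold_ns = 100 * 1000 * 1000ULL;

struct domain_sketch {
	uint64_t last_flush_ns;
};

/*
 * Called once before mbm_update() walks the RMIDs: true means IPI the
 * domain's CPUs to flush, false means the software counts are recent
 * enough to reuse.
 */
static bool mbm_flush_is_due(struct domain_sketch *d, uint64_t now_ns)
{
	if (now_ns - d->last_flush_ns < mbm_flush_threshold_ns)
		return false;
	d->last_flush_ns = now_ns;
	return true;
}

int main(void)
{
	struct domain_sketch d = { 0 };
	bool first = mbm_flush_is_due(&d, 200000000ULL);	/* flushes */
	bool again = mbm_flush_is_due(&d, 200000001ULL);	/* skipped */

	return (first && !again) ? 0 : 1;
}

This keeps mbm_update() reading every RMID, so update_mba_bw() stays
fed, while collapsing the per-RMID IPIs into at most one burst per
threshold interval.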
  

Patch

diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index 3d54a634471a..9575cb79b8ee 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -487,6 +487,11 @@ void resctrl_mbm_flush_cpu(void)
 		__mbm_flush(QOS_L3_MBM_TOTAL_EVENT_ID, r, d);
 }
 
+static void mbm_flush_cpu_handler(void *p)
+{
+	resctrl_mbm_flush_cpu();
+}
+
 static int __mon_event_count_soft_rmid(u32 rmid, struct rmid_read *rr)
 {
 	struct mbm_state *m;
@@ -806,12 +811,27 @@ void mbm_handle_overflow(struct work_struct *work)
 	r = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
 	d = container_of(work, struct rdt_domain, mbm_over.work);
 
+	if (rdt_mon_soft_rmid) {
+		/*
+		 * HW RMIDs are permanently assigned to CPUs, so only a per-CPU
+		 * flush is needed.
+		 */
+		on_each_cpu_mask(&d->cpu_mask, mbm_flush_cpu_handler, NULL,
+				 false);
+	}
+
 	list_for_each_entry(prgrp, &rdt_all_groups, rdtgroup_list) {
-		mbm_update(r, d, prgrp->mon.rmid);
+		/*
+		 * mbm_update() on every RMID would result in excessive IPIs
+		 * when RMIDs are soft.
+		 */
+		if (!rdt_mon_soft_rmid) {
+			mbm_update(r, d, prgrp->mon.rmid);
 
-		head = &prgrp->mon.crdtgrp_list;
-		list_for_each_entry(crgrp, head, mon.crdtgrp_list)
-			mbm_update(r, d, crgrp->mon.rmid);
+			head = &prgrp->mon.crdtgrp_list;
+			list_for_each_entry(crgrp, head, mon.crdtgrp_list)
+				mbm_update(r, d, crgrp->mon.rmid);
+		}
 
 		if (is_mba_sc(NULL))
 			update_mba_bw(prgrp, d);