[v4] x86/resctrl: Add mount option to pick total MBM event

Message ID 20231128231439.81691-1-tony.luck@intel.com
State New
Headers
Series [v4] x86/resctrl: Add mount option to pick total MBM event |

Commit Message

Luck, Tony Nov. 28, 2023, 11:14 p.m. UTC
  Add a "total" mount option to be used in conjunction with "mba_MBps"
to request use of the total memory bandwidth event as the feedback
input to the control loop.

Also fall back to using the total event if the local event is not
supported by the CPU.

Update the once-per-second polling code to use the event (local
or total memory bandwidth).

Signed-off-by: Tony Luck <tony.luck@intel.com>
---

Changes since v3:

Reinette suggested that users might like the option to use the total
memory bandwidth event. I tried out some code to make the event runtime
selectable via a r/w file in the resctrl/info directories. But that
got complicated because of the amount of state that needs to be updated
when switching events. Since there isn't a firm use case for user
selectable event, this latest version falls back to the far simpler
case of using a mount option.

 Documentation/arch/x86/resctrl.rst     |  3 +++
 arch/x86/kernel/cpu/resctrl/internal.h |  3 +++
 arch/x86/kernel/cpu/resctrl/monitor.c  | 20 +++++++++-----------
 arch/x86/kernel/cpu/resctrl/rdtgroup.c | 15 ++++++++++++++-
 4 files changed, 29 insertions(+), 12 deletions(-)
  

Comments

Reinette Chatre Nov. 29, 2023, 11:48 p.m. UTC | #1
Hi Tony,

On 11/28/2023 3:14 PM, Tony Luck wrote:
> Add a "total" mount option to be used in conjunction with "mba_MBps"
> to request use of the total memory bandwidth event as the feedback
> input to the control loop.

"total" is very generic. It is also not clear to me why users
would need to use two mount options. What if the new mount option
is "mba_MBps_total" instead, without user needing to also provide
"mba_MBps"? 

> 
> Also fall back to using the total event if the local event is not
> supported by the CPU.
> 
> Update the once-per-second polling code to use the event (local
> or total memory bandwidth).

Please take care to describe why this change is needed, not just
what it does. This is required by x86. For confirmation:
https://lore.kernel.org/lkml/20231009172517.GRZSQ3fT05LGgpcW35@fat_crate.local/

> 
> Signed-off-by: Tony Luck <tony.luck@intel.com>
> ---
> 
> Changes since v3:
> 
> Reinette suggested that users might like the option to use the total
> memory bandwidth event. I tried out some code to make the event runtime
> selectable via a r/w file in the resctrl/info directories. But that
> got complicated because of the amount of state that needs to be updated
> when switching events. Since there isn't a firm use case for user
> selectable event, this latest version falls back to the far simpler
> case of using a mount option.

(I did not realize that that discussion was over.)

> 
>  Documentation/arch/x86/resctrl.rst     |  3 +++
>  arch/x86/kernel/cpu/resctrl/internal.h |  3 +++
>  arch/x86/kernel/cpu/resctrl/monitor.c  | 20 +++++++++-----------
>  arch/x86/kernel/cpu/resctrl/rdtgroup.c | 15 ++++++++++++++-
>  4 files changed, 29 insertions(+), 12 deletions(-)
> 
> diff --git a/Documentation/arch/x86/resctrl.rst b/Documentation/arch/x86/resctrl.rst
> index a6279df64a9d..29c3e7137eb8 100644
> --- a/Documentation/arch/x86/resctrl.rst
> +++ b/Documentation/arch/x86/resctrl.rst
> @@ -46,6 +46,9 @@ mount options are:
>  "mba_MBps":
>  	Enable the MBA Software Controller(mba_sc) to specify MBA
>  	bandwidth in MBps
> +"total":
> +	Use total instead of local memory bandwidth to drive the
> +	MBA Software Controller
>  "debug":
>  	Make debug files accessible. Available debug files are annotated with
>  	"Available only with debug option".
> diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
> index a4f1aa15f0a2..f98fc9adc2da 100644
> --- a/arch/x86/kernel/cpu/resctrl/internal.h
> +++ b/arch/x86/kernel/cpu/resctrl/internal.h
> @@ -59,6 +59,7 @@ struct rdt_fs_context {
>  	bool				enable_cdpl2;
>  	bool				enable_cdpl3;
>  	bool				enable_mba_mbps;
> +	bool				use_mbm_total;
>  	bool				enable_debug;
>  };

Why did you choose new member to not follow existing custom of having
an enable_ prefix?

>  
> @@ -428,6 +429,8 @@ extern struct rdt_hw_resource rdt_resources_all[];
>  extern struct rdtgroup rdtgroup_default;
>  DECLARE_STATIC_KEY_FALSE(rdt_alloc_enable_key);
>  
> +extern enum resctrl_event_id mba_mbps_evt_id;
> +

This global seems unnecessary. struct resctrl_membw.mba_sc indicates if
the software controller is enabled. Creating this global fragments
related information.

One option could be to change the type of struct resctrl_membw.mba_sc to
enum resctrl_event_id. I assume that 0 would never be a valid event ID and
can thus be used to know if the software controller is disabled. If this
is done then enum resctrl_event_id's documentation should be updated
with this assumption/usage.

Reinette
  
Luck, Tony Dec. 1, 2023, 8:45 p.m. UTC | #2
On Wed, Nov 29, 2023 at 03:48:37PM -0800, Reinette Chatre wrote:
> Hi Tony,
> 
> On 11/28/2023 3:14 PM, Tony Luck wrote:
> > Add a "total" mount option to be used in conjunction with "mba_MBps"
> > to request use of the total memory bandwidth event as the feedback
> > input to the control loop.
> 
> "total" is very generic. It is also not clear to me why users
> would need to use two mount options. What if the new mount option
> is "mba_MBps_total" instead, without user needing to also provide
> "mba_MBps"? 

I wasn't attached to "total". I tried to change the type of the existing
"mba_MBps" option to fsparam_string_emtpy() in the hope that existing users
would be able to keep using "mba_MBps", and users who want to use total
bandwidth could use "mba_MBps=total". But that type requires the "="
in the string provided by the user for the "empty" case.

So now I'm experimenting with adding a new option:

	fsparam_string("mba_MBps_event", Opt_mba_mbps_event)

so a user would specify "mba_MBps_event=total" instead of "mba_MBps".
My code also allow for "mba_MBps_event=local" if a user want to ensure
they use the local bandwidth event (failing the mount if it is not
available).

Existing code using the legacy "mba_MBps" option will get local by
default, failing over to total if needed.

> > 
> > Also fall back to using the total event if the local event is not
> > supported by the CPU.
> > 
> > Update the once-per-second polling code to use the event (local
> > or total memory bandwidth).
> 
> Please take care to describe why this change is needed, not just
> what it does. This is required by x86. For confirmation:
> https://lore.kernel.org/lkml/20231009172517.GRZSQ3fT05LGgpcW35@fat_crate.local/

Yes. I had some justification in earlier versions, but lost it along the
way. I will include in next version.

> > 
> > Signed-off-by: Tony Luck <tony.luck@intel.com>
> > ---
> > 
> > Changes since v3:
> > 
> > Reinette suggested that users might like the option to use the total
> > memory bandwidth event. I tried out some code to make the event runtime
> > selectable via a r/w file in the resctrl/info directories. But that
> > got complicated because of the amount of state that needs to be updated
> > when switching events. Since there isn't a firm use case for user
> > selectable event, this latest version falls back to the far simpler
> > case of using a mount option.
> 
> (I did not realize that that discussion was over.)

I'd like to avoid too much feature creep in this series. I'd like to
finish solving the original problem (systems without local bandwidth
monitoring should have a way to use total bandwidth) before tackling
additional issues in a separate patch series. Taking on a simple way
for users to request total bandwidth isn't much extra code. Making
it possible to switch events at runtime isn't. Fixing the corner cases
where the feedback loop may get stuck is also a separate issue.

> > 
> >  Documentation/arch/x86/resctrl.rst     |  3 +++
> >  arch/x86/kernel/cpu/resctrl/internal.h |  3 +++
> >  arch/x86/kernel/cpu/resctrl/monitor.c  | 20 +++++++++-----------
> >  arch/x86/kernel/cpu/resctrl/rdtgroup.c | 15 ++++++++++++++-
> >  4 files changed, 29 insertions(+), 12 deletions(-)
> > 
> > diff --git a/Documentation/arch/x86/resctrl.rst b/Documentation/arch/x86/resctrl.rst
> > index a6279df64a9d..29c3e7137eb8 100644
> > --- a/Documentation/arch/x86/resctrl.rst
> > +++ b/Documentation/arch/x86/resctrl.rst
> > @@ -46,6 +46,9 @@ mount options are:
> >  "mba_MBps":
> >  	Enable the MBA Software Controller(mba_sc) to specify MBA
> >  	bandwidth in MBps
> > +"total":
> > +	Use total instead of local memory bandwidth to drive the
> > +	MBA Software Controller
> >  "debug":
> >  	Make debug files accessible. Available debug files are annotated with
> >  	"Available only with debug option".
> > diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
> > index a4f1aa15f0a2..f98fc9adc2da 100644
> > --- a/arch/x86/kernel/cpu/resctrl/internal.h
> > +++ b/arch/x86/kernel/cpu/resctrl/internal.h
> > @@ -59,6 +59,7 @@ struct rdt_fs_context {
> >  	bool				enable_cdpl2;
> >  	bool				enable_cdpl3;
> >  	bool				enable_mba_mbps;
> > +	bool				use_mbm_total;
> >  	bool				enable_debug;
> >  };
> 
> Why did you choose new member to not follow existing custom of having
> an enable_ prefix?

That does look awful. Next version will do this:

-       bool                            enable_mba_mbps;
+       bool                            enable_mba_mbps_local;
+       bool                            enable_mba_mbps_total;

> 
> >  
> > @@ -428,6 +429,8 @@ extern struct rdt_hw_resource rdt_resources_all[];
> >  extern struct rdtgroup rdtgroup_default;
> >  DECLARE_STATIC_KEY_FALSE(rdt_alloc_enable_key);
> >  
> > +extern enum resctrl_event_id mba_mbps_evt_id;
> > +
> 
> This global seems unnecessary. struct resctrl_membw.mba_sc indicates if
> the software controller is enabled. Creating this global fragments
> related information.

Global is now gone.

> One option could be to change the type of struct resctrl_membw.mba_sc to
> enum resctrl_event_id. I assume that 0 would never be a valid event ID and
> can thus be used to know if the software controller is disabled. If this
> is done then enum resctrl_event_id's documentation should be updated
> with this assumption/usage.

But I'm not fond of this overloading the meaning of resctrl_membw.mba_sc
by making that assumption that "0" will never be a valid event.

I've left the mba_sc field as a boolean that indicates enabled and added
a new field for the event:

@@ -138,6 +139,7 @@ struct resctrl_membw {
        bool                            arch_needs_linear;
        enum membw_throttle_mode        throttle_mode;
        bool                            mba_sc;
+       enum resctrl_event_id           mba_mbps_event;
        u32                             *mb_map;
 };

New version will be posted soon.

-Tony
  

Patch

diff --git a/Documentation/arch/x86/resctrl.rst b/Documentation/arch/x86/resctrl.rst
index a6279df64a9d..29c3e7137eb8 100644
--- a/Documentation/arch/x86/resctrl.rst
+++ b/Documentation/arch/x86/resctrl.rst
@@ -46,6 +46,9 @@  mount options are:
 "mba_MBps":
 	Enable the MBA Software Controller(mba_sc) to specify MBA
 	bandwidth in MBps
+"total":
+	Use total instead of local memory bandwidth to drive the
+	MBA Software Controller
 "debug":
 	Make debug files accessible. Available debug files are annotated with
 	"Available only with debug option".
diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index a4f1aa15f0a2..f98fc9adc2da 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -59,6 +59,7 @@  struct rdt_fs_context {
 	bool				enable_cdpl2;
 	bool				enable_cdpl3;
 	bool				enable_mba_mbps;
+	bool				use_mbm_total;
 	bool				enable_debug;
 };
 
@@ -428,6 +429,8 @@  extern struct rdt_hw_resource rdt_resources_all[];
 extern struct rdtgroup rdtgroup_default;
 DECLARE_STATIC_KEY_FALSE(rdt_alloc_enable_key);
 
+extern enum resctrl_event_id mba_mbps_evt_id;
+
 extern struct dentry *debugfs_resctrl;
 
 enum resctrl_res_level {
diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index f136ac046851..230297603836 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -431,9 +431,10 @@  static int __mon_event_count(u32 rmid, struct rmid_read *rr)
  */
 static void mbm_bw_count(u32 rmid, struct rmid_read *rr)
 {
-	struct mbm_state *m = &rr->d->mbm_local[rmid];
 	u64 cur_bw, bytes, cur_bytes;
+	struct mbm_state *m;
 
+	m = get_mbm_state(rr->d, rmid, rr->evtid);
 	cur_bytes = rr->val;
 	bytes = cur_bytes - m->prev_bw_bytes;
 	m->prev_bw_bytes = cur_bytes;
@@ -518,6 +519,7 @@  void mon_event_count(void *info)
  */
 static void update_mba_bw(struct rdtgroup *rgrp, struct rdt_domain *dom_mbm)
 {
+	enum resctrl_event_id evt_id = mba_mbps_evt_id;
 	u32 closid, rmid, cur_msr_val, new_msr_val;
 	struct mbm_state *pmbm_data, *cmbm_data;
 	u32 cur_bw, delta_bw, user_bw;
@@ -526,14 +528,14 @@  static void update_mba_bw(struct rdtgroup *rgrp, struct rdt_domain *dom_mbm)
 	struct list_head *head;
 	struct rdtgroup *entry;
 
-	if (!is_mbm_local_enabled())
+	if (!is_mbm_enabled())
 		return;
 
 	r_mba = &rdt_resources_all[RDT_RESOURCE_MBA].r_resctrl;
 
 	closid = rgrp->closid;
 	rmid = rgrp->mon.rmid;
-	pmbm_data = &dom_mbm->mbm_local[rmid];
+	pmbm_data = get_mbm_state(dom_mbm, rmid, evt_id);
 
 	dom_mba = get_domain_from_cpu(smp_processor_id(), r_mba);
 	if (!dom_mba) {
@@ -553,7 +555,7 @@  static void update_mba_bw(struct rdtgroup *rgrp, struct rdt_domain *dom_mbm)
 	 */
 	head = &rgrp->mon.crdtgrp_list;
 	list_for_each_entry(entry, head, mon.crdtgrp_list) {
-		cmbm_data = &dom_mbm->mbm_local[entry->mon.rmid];
+		cmbm_data = get_mbm_state(dom_mbm, entry->mon.rmid, evt_id);
 		cur_bw += cmbm_data->prev_bw;
 		delta_bw += cmbm_data->delta_bw;
 	}
@@ -616,18 +618,14 @@  static void mbm_update(struct rdt_resource *r, struct rdt_domain *d, int rmid)
 		rr.evtid = QOS_L3_MBM_TOTAL_EVENT_ID;
 		rr.val = 0;
 		__mon_event_count(rmid, &rr);
+		if (is_mba_sc(NULL) && rr.evtid == mba_mbps_evt_id)
+			mbm_bw_count(rmid, &rr);
 	}
 	if (is_mbm_local_enabled()) {
 		rr.evtid = QOS_L3_MBM_LOCAL_EVENT_ID;
 		rr.val = 0;
 		__mon_event_count(rmid, &rr);
-
-		/*
-		 * Call the MBA software controller only for the
-		 * control groups and when user has enabled
-		 * the software controller explicitly.
-		 */
-		if (is_mba_sc(NULL))
+		if (is_mba_sc(NULL) && rr.evtid == mba_mbps_evt_id)
 			mbm_bw_count(rmid, &rr);
 	}
 }
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index 69a1de92384a..39a5b73af4ef 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -57,6 +57,8 @@  static char last_cmd_status_buf[512];
 static int rdtgroup_setup_root(struct rdt_fs_context *ctx);
 static void rdtgroup_destroy_root(void);
 
+enum resctrl_event_id mba_mbps_evt_id;
+
 struct dentry *debugfs_resctrl;
 
 static bool resctrl_debug;
@@ -2294,7 +2296,7 @@  static bool supports_mba_mbps(void)
 {
 	struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_MBA].r_resctrl;
 
-	return (is_mbm_local_enabled() &&
+	return (is_mbm_enabled() &&
 		r->alloc_capable && is_mba_linear());
 }
 
@@ -2470,6 +2472,10 @@  static int rdt_enable_ctx(struct rdt_fs_context *ctx)
 		ret = set_mba_sc(true);
 		if (ret)
 			goto out_cdpl3;
+		if (ctx->use_mbm_total || !is_mbm_local_enabled())
+			mba_mbps_evt_id = QOS_L3_MBM_TOTAL_EVENT_ID;
+		else
+			mba_mbps_evt_id = QOS_L3_MBM_LOCAL_EVENT_ID;
 	}
 
 	if (ctx->enable_debug)
@@ -2683,6 +2689,7 @@  enum rdt_param {
 	Opt_cdp,
 	Opt_cdpl2,
 	Opt_mba_mbps,
+	Opt_mba_mbps_total,
 	Opt_debug,
 	nr__rdt_params
 };
@@ -2691,6 +2698,7 @@  static const struct fs_parameter_spec rdt_fs_parameters[] = {
 	fsparam_flag("cdp",		Opt_cdp),
 	fsparam_flag("cdpl2",		Opt_cdpl2),
 	fsparam_flag("mba_MBps",	Opt_mba_mbps),
+	fsparam_flag("total",		Opt_mba_mbps_total),
 	fsparam_flag("debug",		Opt_debug),
 	{}
 };
@@ -2717,6 +2725,11 @@  static int rdt_parse_param(struct fs_context *fc, struct fs_parameter *param)
 			return -EINVAL;
 		ctx->enable_mba_mbps = true;
 		return 0;
+	case Opt_mba_mbps_total:
+		if (!is_mbm_total_enabled())
+			return -EINVAL;
+		ctx->use_mbm_total = true;
+		return 0;
 	case Opt_debug:
 		ctx->enable_debug = true;
 		return 0;