Message ID | 20230923030657.16148-2-haitao.huang@linux.intel.com |
---|---|
State | New |
From | Haitao Huang <haitao.huang@linux.intel.com> |
To | jarkko@kernel.org, dave.hansen@linux.intel.com, tj@kernel.org, linux-kernel@vger.kernel.org, linux-sgx@vger.kernel.org, x86@kernel.org, cgroups@vger.kernel.org, tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, hpa@zytor.com, sohil.mehta@intel.com |
Cc | zhiquan1.li@intel.com, kristen@linux.intel.com, seanjc@google.com, zhanb@microsoft.com, anakrish@microsoft.com, mikko.ylinen@linux.intel.com, yangjie@microsoft.com |
Subject | [PATCH v5 01/18] cgroup/misc: Add per resource callbacks for CSS events |
Date | Fri, 22 Sep 2023 20:06:40 -0700 |
In-Reply-To | <20230923030657.16148-1-haitao.huang@linux.intel.com> |
Series | Add Cgroup support for SGX EPC memory |
Commit Message
Haitao Huang
Sept. 23, 2023, 3:06 a.m. UTC
From: Kristen Carlson Accardi <kristen@linux.intel.com>

The misc cgroup controller (subsystem) currently does not perform
resource type specific action for Cgroups Subsystem State (CSS) events:
the 'css_alloc' event when a cgroup is created and the 'css_free' event
when a cgroup is destroyed, or in event of user writing the max value to
the misc.max file to set the usage limit of a specific resource
[admin-guide/cgroup-v2.rst, 5-9. Misc].

Define callbacks for those events and allow resource providers to
register the callbacks per resource type as needed. This will be
utilized later by the EPC misc cgroup support implemented in the SGX
driver:
- On css_alloc, allocate and initialize necessary structures for EPC
  reclaiming, e.g., LRU list, work queue, etc.
- On css_free, cleanup and free those structures created in alloc.
- On max_write, trigger EPC reclaiming if the new limit is at or below
  current usage.

Signed-off-by: Kristen Carlson Accardi <kristen@linux.intel.com>
Signed-off-by: Haitao Huang <haitao.huang@linux.intel.com>
---
V5:
- Remove prefixes from the callback names (tj)
- Update commit message (Jarkko)

V4:
- Moved this to the front of the series.
- Applies on cgroup/for-6.6 with the overflow fix for misc.

V3:
- Removed the released() callback
---
 include/linux/misc_cgroup.h |  5 +++++
 kernel/cgroup/misc.c        | 32 +++++++++++++++++++++++++++++---
 2 files changed, 34 insertions(+), 3 deletions(-)
Comments
On Sat Sep 23, 2023 at 6:06 AM EEST, Haitao Huang wrote:
> From: Kristen Carlson Accardi <kristen@linux.intel.com>
>
> The misc cgroup controller (subsystem) currently does not perform
> resource type specific action for Cgroups Subsystem State (CSS) events:
> the 'css_alloc' event when a cgroup is created and the 'css_free' event
> when a cgroup is destroyed, or in event of user writing the max value to
> the misc.max file to set the usage limit of a specific resource
> [admin-guide/cgroup-v2.rst, 5-9. Misc].
>
> Define callbacks for those events and allow resource providers to
> register the callbacks per resource type as needed. This will be
> utilized later by the EPC misc cgroup support implemented in the SGX
> driver:
> - On css_alloc, allocate and initialize necessary structures for EPC
>   reclaiming, e.g., LRU list, work queue, etc.
> - On css_free, cleanup and free those structures created in alloc.
> - On max_write, trigger EPC reclaiming if the new limit is at or below
>   current usage.
>
> Signed-off-by: Kristen Carlson Accardi <kristen@linux.intel.com>
> Signed-off-by: Haitao Huang <haitao.huang@linux.intel.com>
> ---
> V5:
> - Remove prefixes from the callback names (tj)
> - Update commit message (Jarkko)
>
> V4:
> - Moved this to the front of the series.
> - Applies on cgroup/for-6.6 with the overflow fix for misc.
>
> V3:
> - Removed the released() callback
> ---
>  include/linux/misc_cgroup.h |  5 +++++
>  kernel/cgroup/misc.c        | 32 +++++++++++++++++++++++++++++---
>  2 files changed, 34 insertions(+), 3 deletions(-)
>
> diff --git a/include/linux/misc_cgroup.h b/include/linux/misc_cgroup.h
> index e799b1f8d05b..96a88822815a 100644
> --- a/include/linux/misc_cgroup.h
> +++ b/include/linux/misc_cgroup.h
> @@ -37,6 +37,11 @@ struct misc_res {
>  	u64 max;
>  	atomic64_t usage;
>  	atomic64_t events;
> +
> +	/* per resource callback ops */
> +	int (*alloc)(struct misc_cg *cg);
> +	void (*free)(struct misc_cg *cg);
> +	void (*max_write)(struct misc_cg *cg);
>  };
>
>  /**
> diff --git a/kernel/cgroup/misc.c b/kernel/cgroup/misc.c
> index 79a3717a5803..62c9198dee21 100644
> --- a/kernel/cgroup/misc.c
> +++ b/kernel/cgroup/misc.c
> @@ -276,10 +276,13 @@ static ssize_t misc_cg_max_write(struct kernfs_open_file *of, char *buf,
>
>  	cg = css_misc(of_css(of));
>
> -	if (READ_ONCE(misc_res_capacity[type]))
> +	if (READ_ONCE(misc_res_capacity[type])) {
>  		WRITE_ONCE(cg->res[type].max, max);
> -	else
> +		if (cg->res[type].max_write)
> +			cg->res[type].max_write(cg);
> +	} else {
>  		ret = -EINVAL;
> +	}
>
>  	return ret ? ret : nbytes;
>  }
> @@ -383,23 +386,39 @@ static struct cftype misc_cg_files[] = {
>  static struct cgroup_subsys_state *
>  misc_cg_alloc(struct cgroup_subsys_state *parent_css)
>  {
> +	struct misc_cg *parent_cg;
>  	enum misc_res_type i;
>  	struct misc_cg *cg;
> +	int ret;
>
>  	if (!parent_css) {
>  		cg = &root_cg;
> +		parent_cg = &root_cg;
>  	} else {
>  		cg = kzalloc(sizeof(*cg), GFP_KERNEL);
>  		if (!cg)
>  			return ERR_PTR(-ENOMEM);
> +		parent_cg = css_misc(parent_css);
>  	}
>
>  	for (i = 0; i < MISC_CG_RES_TYPES; i++) {
>  		WRITE_ONCE(cg->res[i].max, MAX_NUM);
>  		atomic64_set(&cg->res[i].usage, 0);
> +		if (parent_cg->res[i].alloc) {
> +			ret = parent_cg->res[i].alloc(cg);
> +			if (ret)
> +				goto alloc_err;
> +		}
>  	}
>
>  	return &cg->css;
> +
> +alloc_err:
> +	for (i = 0; i < MISC_CG_RES_TYPES; i++)
> +		if (parent_cg->res[i].free)
> +			cg->res[i].free(cg);
> +	kfree(cg);
> +	return ERR_PTR(ret);
>  }
>
>  /**
> @@ -410,7 +429,14 @@ misc_cg_alloc(struct cgroup_subsys_state *parent_css)
>   */
>  static void misc_cg_free(struct cgroup_subsys_state *css)
>  {
> -	kfree(css_misc(css));
> +	struct misc_cg *cg = css_misc(css);
> +	enum misc_res_type i;
> +
> +	for (i = 0; i < MISC_CG_RES_TYPES; i++)
> +		if (cg->res[i].free)
> +			cg->res[i].free(cg);
> +
> +	kfree(cg);
>  }
>
>  /* Cgroup controller callbacks */
> --
> 2.25.1

Since the only existing client feature requires all callbacks, should
this not have that as an invariant?

I.e. it might be better to fail unless *all* ops are non-nil (e.g. to
catch issues in the kernel code).

BR, Jarkko
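The invariant suggested above could be enforced at the point where a resource provider hands its callbacks to the controller. The following is only a userspace sketch of that idea, not code from the series: `struct misc_res_ops`, `misc_res_ops_validate()`, and the `demo_*` callbacks are hypothetical names, and `struct misc_cg` is a stand-in for the kernel structure.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Stand-in for the kernel's struct misc_cg; only here so the sketch compiles. */
struct misc_cg {
	int id;
};

/* Hypothetical table holding the three callbacks the patch adds. */
struct misc_res_ops {
	int (*alloc)(struct misc_cg *cg);
	void (*free)(struct misc_cg *cg);
	void (*max_write)(struct misc_cg *cg);
};

/*
 * Hypothetical check: accept a table only if every callback is set,
 * so an incomplete provider is rejected up front instead of being
 * silently skipped by NULL checks later.
 */
static int misc_res_ops_validate(const struct misc_res_ops *ops)
{
	if (!ops || !ops->alloc || !ops->free || !ops->max_write)
		return -EINVAL;
	return 0;
}

/* Illustrative callbacks. */
static int demo_alloc(struct misc_cg *cg) { cg->id = 1; return 0; }
static void demo_free(struct misc_cg *cg) { cg->id = 0; }
static void demo_max_write(struct misc_cg *cg) { (void)cg; }

/* A complete table passes validation. */
static const struct misc_res_ops complete_ops = {
	.alloc = demo_alloc,
	.free = demo_free,
	.max_write = demo_max_write,
};

/* A table with missing callbacks is rejected. */
static const struct misc_res_ops partial_ops = {
	.alloc = demo_alloc,
};
```

With such a check, the call sites would no longer need per-callback NULL tests, at the cost of forcing every provider to supply all three callbacks.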
Hi Jarkko

On Mon, 25 Sep 2023 12:09:21 -0500, Jarkko Sakkinen <jarkko@kernel.org> wrote:

> On Sat Sep 23, 2023 at 6:06 AM EEST, Haitao Huang wrote:
>> From: Kristen Carlson Accardi <kristen@linux.intel.com>
...
>
> Since the only existing client feature requires all callbacks, should
> this not have that as an invariant?
>
> I.e. it might be better to fail unless *all* ops are non-nil (e.g. to
> catch issues in the kernel code).
>

These callbacks are chained from cgroup_subsys, and they are defined
separately so it'd be better to follow the same pattern. Or change together
with cgroup_subsys if we want to do that.

Reasonable?

Thanks
Haitao
On Tue Sep 26, 2023 at 6:04 AM EEST, Haitao Huang wrote:
> Hi Jarkko
>
> On Mon, 25 Sep 2023 12:09:21 -0500, Jarkko Sakkinen <jarkko@kernel.org>
> wrote:
>
...
> > Since the only existing client feature requires all callbacks, should
> > this not have that as an invariant?
> >
> > I.e. it might be better to fail unless *all* ops are non-nil (e.g. to
> > catch issues in the kernel code).
> >
>
> These callbacks are chained from cgroup_subsys, and they are defined
> separately so it'd be better to follow the same pattern. Or change together
> with cgroup_subsys if we want to do that. Reasonable?

I noticed this one later:

It would be better to create a separate ops struct and declare the instance
as const at minimum.

Then there is no need for dynamic assignment of ops, and all of that is in
rodata. This improves security and also makes static analysis a bit
easier.

Now you have to dynamically trace the struct instance, e.g. in case of
a bug. If this were done, it would already be in the vmlinux.

BR, Jarkko
On Tue Sep 26, 2023 at 4:10 PM EEST, Jarkko Sakkinen wrote:
> On Tue Sep 26, 2023 at 6:04 AM EEST, Haitao Huang wrote:
...
> > These callbacks are chained from cgroup_subsys, and they are defined
> > separately so it'd be better to follow the same pattern. Or change together
> > with cgroup_subsys if we want to do that. Reasonable?
>
> I noticed this one later:
>
> It would be better to create a separate ops struct and declare the instance
> as const at minimum.
>
> Then there is no need for dynamic assignment of ops, and all of that is in
> rodata. This improves security and also makes static analysis a bit
> easier.
>
> Now you have to dynamically trace the struct instance, e.g. in case of
> a bug. If this were done, it would already be in the vmlinux.

I.e. then in the driver you can have a static const struct declaration
with *all* pointers pre-assigned.

Not sure if cgroups follows this or not, but it is *objectively*
better. Previous work is not always the best possible work...

BR, Jarkko
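As a rough illustration of the suggestion above, the three callbacks can be grouped into one ops struct, with a single `static const` instance per provider and one pointer to it stored in each resource. This is a userspace sketch under stated assumptions, not the eventual kernel code: `misc_res_ops`, `epc_ops`, the `epc_*` callbacks, `misc_res_set_max()`, and the simplified `struct misc_cg` and `struct misc_res` are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for the kernel's struct misc_cg (assumption). */
struct misc_cg {
	int lru_ready;		/* set by alloc, cleared by free */
	int reclaim_requests;	/* incremented by max_write */
};

/* Hypothetical ops struct grouping the per-resource callbacks. */
struct misc_res_ops {
	int (*alloc)(struct misc_cg *cg);
	void (*free)(struct misc_cg *cg);
	void (*max_write)(struct misc_cg *cg);
};

/* Illustrative provider callbacks, loosely modeled on the EPC use case. */
static int epc_alloc(struct misc_cg *cg) { cg->lru_ready = 1; return 0; }
static void epc_free(struct misc_cg *cg) { cg->lru_ready = 0; }
static void epc_max_write(struct misc_cg *cg) { cg->reclaim_requests++; }

/*
 * One const instance with every pointer assigned at build time.
 * A const object like this is normally placed in read-only data,
 * so the pointers are not writable at run time.
 */
static const struct misc_res_ops epc_ops = {
	.alloc = epc_alloc,
	.free = epc_free,
	.max_write = epc_max_write,
};

/* The resource keeps one pointer to the const table. */
struct misc_res {
	unsigned long max;
	const struct misc_res_ops *ops;
};

/* Store the new limit, then notify the provider if it registered ops. */
static void misc_res_set_max(struct misc_res *res, struct misc_cg *cg,
			     unsigned long max)
{
	res->max = max;
	if (res->ops && res->ops->max_write)
		res->ops->max_write(cg);
}
```

The design trade-off is that the resource still needs its `ops` pointer assigned once, but the function pointers themselves become fixed and visible to static analysis.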
On Tue, 26 Sep 2023 08:13:18 -0500, Jarkko Sakkinen <jarkko@kernel.org> wrote:

...
>> > >>  /**
>> > >> @@ -410,7 +429,14 @@ misc_cg_alloc(struct cgroup_subsys_state *parent_css)
>> > >>   */
>> > >>  static void misc_cg_free(struct cgroup_subsys_state *css)
>> > >>  {
>> > >> -	kfree(css_misc(css));
>> > >> +	struct misc_cg *cg = css_misc(css);
>> > >> +	enum misc_res_type i;
>> > >> +
>> > >> +	for (i = 0; i < MISC_CG_RES_TYPES; i++)
>> > >> +		if (cg->res[i].free)
>> > >> +			cg->res[i].free(cg);
>> > >> +
>> > >> +	kfree(cg);
>> > >>  }
>> > >>
>> > >>  /* Cgroup controller callbacks */
>> > >> --
>> > >> 2.25.1
>> > >
>> > > Since the only existing client feature requires all callbacks, should
>> > > this not have that as an invariant?
>> > >
>> > > I.e. it might be better to fail unless *all* ops are non-nil (e.g. to
>> > > catch issues in the kernel code).
>> > >
>> >
>> > These callbacks are chained from cgroup_subsys, and they are defined
>> > separately so it'd be better to follow the same pattern. Or change together
>> > with cgroup_subsys if we want to do that. Reasonable?
>>
>> I noticed this one later:
>>
>> It would be better to create a separate ops struct and declare the instance
>> as const at minimum.
>>
>> Then there is no need for dynamic assignment of ops, and all of that is in
>> rodata. This improves security and also makes static analysis a bit
>> easier.
>>
>> Now you have to dynamically trace the struct instance, e.g. in case of
>> a bug. If this were done, it would already be in the vmlinux.
> I.e. then in the driver you can have a static const struct declaration
> with *all* pointers pre-assigned.
>
> Not sure if cgroups follows this or not, but it is *objectively*
> better. Previous work is not always the best possible work...
>

IIUC, like the vm_ops field in vma structs. Although the function pointers in
vm_ops are assigned statically, you still need to dynamically assign vm_ops
for each instance of vma. So the code will look like this:

	if (parent_cg->res[i].misc_ops && parent_cg->res[i].misc_ops->alloc) {
		...
	}

I don't see this pattern used in cgroups and have no strong opinion either
way.

TJ, do you have a preference on this?

Thanks
Haitao
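The two-level check described above can be sketched as a loop over resource types. This is a userspace illustration only: the `misc_ops` field name follows the snippet in the message, while `RES_A`/`RES_B`/`RES_C`, `misc_cg_run_alloc()`, the `state` array, and the choice to copy the parent's ops pointer into the child and to unwind only the types whose alloc already succeeded are assumptions of this sketch, not behavior taken from the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical resource types; MISC_CG_RES_TYPES is the count. */
enum misc_res_type { RES_A, RES_B, RES_C, MISC_CG_RES_TYPES };

struct misc_cg;

/* Hypothetical ops struct (max_write omitted for brevity). */
struct misc_res_ops {
	int (*alloc)(struct misc_cg *cg);
	void (*free)(struct misc_cg *cg);
};

struct misc_res {
	const struct misc_res_ops *misc_ops;	/* NULL: no callbacks */
};

/* Simplified stand-in for the kernel's struct misc_cg. */
struct misc_cg {
	struct misc_res res[MISC_CG_RES_TYPES];
	int state[MISC_CG_RES_TYPES];		/* 1 while "allocated" */
};

static int a_alloc(struct misc_cg *cg) { cg->state[RES_A] = 1; return 0; }
static void a_free(struct misc_cg *cg) { cg->state[RES_A] = 0; }
static int c_alloc(struct misc_cg *cg) { (void)cg; return -ENOMEM; }

static const struct misc_res_ops a_ops = { .alloc = a_alloc, .free = a_free };
static const struct misc_res_ops c_ops = { .alloc = c_alloc };

/*
 * Run the per-resource alloc callbacks found on the parent.
 * Both the ops pointer and the individual callback are checked,
 * as in the snippet from the message.
 */
static int misc_cg_run_alloc(const struct misc_cg *parent_cg,
			     struct misc_cg *cg)
{
	int i, ret;

	for (i = 0; i < MISC_CG_RES_TYPES; i++) {
		const struct misc_res_ops *ops = parent_cg->res[i].misc_ops;

		/* Assumption: the child inherits the parent's ops pointer. */
		cg->res[i].misc_ops = ops;
		if (ops && ops->alloc) {
			ret = ops->alloc(cg);
			if (ret)
				goto unwind;
		}
	}
	return 0;

unwind:
	/* Undo only the types handled before the failing one. */
	while (--i >= 0) {
		const struct misc_res_ops *ops = cg->res[i].misc_ops;

		if (ops && ops->free)
			ops->free(cg);
	}
	return ret;
}
```

Here a resource type with no registered ops, or with a partially filled table, is simply skipped, which is the cost of not enforcing the all-callbacks invariant discussed earlier in the thread.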
On Fri, 2023-09-22 at 20:06 -0700, Haitao Huang wrote:
> From: Kristen Carlson Accardi <kristen@linux.intel.com>
>
> The misc cgroup controller (subsystem) currently does not perform
> resource type specific action for Cgroups Subsystem State (CSS) events:
> the 'css_alloc' event when a cgroup is created and the 'css_free' event
> when a cgroup is destroyed, or in event of user writing the max value to
> the misc.max file to set the usage limit of a specific resource
> [admin-guide/cgroup-v2.rst, 5-9. Misc].
>
> Define callbacks for those events and allow resource providers to
> register the callbacks per resource type as needed. This will be
> utilized later by the EPC misc cgroup support implemented in the SGX
> driver:
> - On css_alloc, allocate and initialize necessary structures for EPC
> reclaiming, e.g., LRU list, work queue, etc.
> - On css_free, cleanup and free those structures created in alloc.
> - On max_write, trigger EPC reclaiming if the new limit is at or below
> current usage.

Nit:

Wondering why we should trigger EPC reclaiming if the new limit is *at* current usage?

I actually don't quite care about why here, but writing these details in the changelog may bring unnecessary confusion. I guess you can just remove all the details about what SGX driver needs to do on these callbacks.

>
> Signed-off-by: Kristen Carlson Accardi <kristen@linux.intel.com>
> Signed-off-by: Haitao Huang <haitao.huang@linux.intel.com>
> ---
> V5:
> - Remove prefixes from the callback names (tj)
> - Update commit message (Jarkko)
>
> V4:
> - Moved this to the front of the series.
> - Applies on cgroup/for-6.6 with the overflow fix for misc.
>
> V3:
> - Removed the released() callback
> ---
>  include/linux/misc_cgroup.h |  5 +++++
>  kernel/cgroup/misc.c        | 32 +++++++++++++++++++++++++++++---
>  2 files changed, 34 insertions(+), 3 deletions(-)
>
> diff --git a/include/linux/misc_cgroup.h b/include/linux/misc_cgroup.h
> index e799b1f8d05b..96a88822815a 100644
> --- a/include/linux/misc_cgroup.h
> +++ b/include/linux/misc_cgroup.h
> @@ -37,6 +37,11 @@ struct misc_res {
>  	u64 max;
>  	atomic64_t usage;
>  	atomic64_t events;
> +
> +	/* per resource callback ops */

Nit:

This comment isn't quite useful IMHO. And it seems you should just expand the existing comment for the 'struct misc_res', which already covers the existing members.

Or as Jarkko suggested, maybe you can introduce another structure 'misc_res_ops' and comment more details for all these callbacks just like 'struct misc_res'.

Anyway it's cgroup maintainer's call.

> +	int (*alloc)(struct misc_cg *cg);
> +	void (*free)(struct misc_cg *cg);
> +	void (*max_write)(struct misc_cg *cg);
>  };
>
>  /**
> diff --git a/kernel/cgroup/misc.c b/kernel/cgroup/misc.c
> index 79a3717a5803..62c9198dee21 100644
> --- a/kernel/cgroup/misc.c
> +++ b/kernel/cgroup/misc.c
> @@ -276,10 +276,13 @@ static ssize_t misc_cg_max_write(struct kernfs_open_file *of, char *buf,
>
>  	cg = css_misc(of_css(of));
>
> -	if (READ_ONCE(misc_res_capacity[type]))
> +	if (READ_ONCE(misc_res_capacity[type])) {
>  		WRITE_ONCE(cg->res[type].max, max);
> -	else
> +		if (cg->res[type].max_write)
> +			cg->res[type].max_write(cg);
> +	} else {
>  		ret = -EINVAL;
> +	}
>
>  	return ret ? ret : nbytes;
>  }
> @@ -383,23 +386,39 @@ static struct cftype misc_cg_files[] = {
>  static struct cgroup_subsys_state *
>  misc_cg_alloc(struct cgroup_subsys_state *parent_css)
>  {
> +	struct misc_cg *parent_cg;

Nit:

The below variable '*cg' can be moved here together with 'parent_cg'.

>  	enum misc_res_type i;
>  	struct misc_cg *cg;
> +	int ret;
>
>  	if (!parent_css) {
>  		cg = &root_cg;
> +		parent_cg = &root_cg;

Nit:

	parent_cg = cg = &root_cg;

?

>  	} else {
>  		cg = kzalloc(sizeof(*cg), GFP_KERNEL);
>  		if (!cg)
>  			return ERR_PTR(-ENOMEM);
> +		parent_cg = css_misc(parent_css);
>  	}
>
>  	for (i = 0; i < MISC_CG_RES_TYPES; i++) {
>  		WRITE_ONCE(cg->res[i].max, MAX_NUM);
>  		atomic64_set(&cg->res[i].usage, 0);
> +		if (parent_cg->res[i].alloc) {
> +			ret = parent_cg->res[i].alloc(cg);
> +			if (ret)
> +				goto alloc_err;
> +		}
>  	}
>
>  	return &cg->css;
> +
> +alloc_err:
> +	for (i = 0; i < MISC_CG_RES_TYPES; i++)
> +		if (parent_cg->res[i].free)
> +			cg->res[i].free(cg);
> +	kfree(cg);
> +	return ERR_PTR(ret);
>  }
>
>  /**
> @@ -410,7 +429,14 @@ misc_cg_alloc(struct cgroup_subsys_state *parent_css)
>   */
>  static void misc_cg_free(struct cgroup_subsys_state *css)
>  {
> -	kfree(css_misc(css));
> +	struct misc_cg *cg = css_misc(css);
> +	enum misc_res_type i;
> +
> +	for (i = 0; i < MISC_CG_RES_TYPES; i++)
> +		if (cg->res[i].free)
> +			cg->res[i].free(cg);
> +
> +	kfree(cg);
>  }
>
>  /* Cgroup controller callbacks */
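
For readers following the allocation path quoted above, here is a minimal userspace sketch of the same idea: per-resource hooks looked up on the parent, with an error path that releases only what was already set up. Every name in it (demo_cg, demo_res, demo_cg_create, the demo_* callbacks and counters) is hypothetical and exists only for illustration. It is not the kernel code, and unlike the posted patch it unwinds only the resource types whose alloc() had already succeeded, which is an assumption about what a stricter error path could look like.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

enum demo_res_type { DEMO_RES_A, DEMO_RES_B, DEMO_RES_TYPES };

struct demo_cg;

/* Per-resource hooks stored inline, mirroring the layout in the posted patch. */
struct demo_res {
	int  (*alloc)(struct demo_cg *cg);
	void (*free)(struct demo_cg *cg);
	void (*max_write)(struct demo_cg *cg);
};

struct demo_cg {
	struct demo_res res[DEMO_RES_TYPES];
};

/* Counters so a caller can observe which hooks ran. */
static int demo_allocs;
static int demo_frees;

static int demo_ok_alloc(struct demo_cg *cg)   { (void)cg; demo_allocs++; return 0; }
static int demo_fail_alloc(struct demo_cg *cg) { (void)cg; return -1; }
static void demo_free(struct demo_cg *cg)      { (void)cg; demo_frees++; }

/*
 * Create a child group. Hooks are taken from the parent. If one alloc()
 * fails, free() runs only for the resource types that were already set up.
 */
static struct demo_cg *demo_cg_create(const struct demo_cg *parent)
{
	struct demo_cg *cg = calloc(1, sizeof(*cg));
	int i;

	if (!cg)
		return NULL;

	for (i = 0; i < DEMO_RES_TYPES; i++) {
		if (parent->res[i].alloc && parent->res[i].alloc(cg))
			goto unwind;
	}
	return cg;

unwind:
	/* i is the index that failed; walk back over the earlier ones. */
	while (--i >= 0)
		if (parent->res[i].free)
			parent->res[i].free(cg);
	free(cg);
	return NULL;
}
```

The loop index is a plain int here so that the backwards walk can terminate at -1; an enum-typed index may be unsigned on some compilers.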
On Wed Sep 27, 2023 at 4:56 AM EEST, Haitao Huang wrote:
> On Tue, 26 Sep 2023 08:13:18 -0500, Jarkko Sakkinen <jarkko@kernel.org>
> wrote:
>
> ...
> >> > >>  /**
> >> > >> @@ -410,7 +429,14 @@ misc_cg_alloc(struct cgroup_subsys_state *parent_css)
> >> > >>   */
> >> > >>  static void misc_cg_free(struct cgroup_subsys_state *css)
> >> > >>  {
> >> > >> -	kfree(css_misc(css));
> >> > >> +	struct misc_cg *cg = css_misc(css);
> >> > >> +	enum misc_res_type i;
> >> > >> +
> >> > >> +	for (i = 0; i < MISC_CG_RES_TYPES; i++)
> >> > >> +		if (cg->res[i].free)
> >> > >> +			cg->res[i].free(cg);
> >> > >> +
> >> > >> +	kfree(cg);
> >> > >>  }
> >> > >>
> >> > >>  /* Cgroup controller callbacks */
> >> > >> --
> >> > >> 2.25.1
> >> > >
> >> > > Since the only existing client feature requires all callbacks, should
> >> > > this not have that as an invariant?
> >> > >
> >> > > I.e. it might be better to fail unless *all* ops are non-nil (e.g. to
> >> > > catch issues in the kernel code).
> >> > >
> >> >
> >> > These callbacks are chained from cgroup_subsys, and they are defined
> >> > separately so it'd be better follow the same pattern. Or change together
> >> > with cgroup_subsys if we want to do that. Reasonable?
> >>
> >> I noticed this one later:
> >>
> >> It would better to create a separate ops struct and declare the instance
> >> as const at minimum.
> >>
> >> Then there is no need for dynamic assigment of ops and all that is in
> >> rodata. This is improves both security and also allows static analysis
> >> bit better.
> >>
> >> Now you have to dynamically trace the struct instance, e.g. in case of
> >> a bug. If this one done, it would be already in the vmlinux.
> >I.e. then in the driver you can have static const struct declaration
> > with *all* pointers pre-assigned.
> >
> > Not sure if cgroups follows this or not but it is *objectively*
> > better. Previous work is not always best possible work...
> >
>
> IIUC, like vm_ops field in vma structs. Although function pointers in
> vm_ops are assigned statically, but you still need dynamically assign
> vm_ops for each instance of vma.
>
> So the code will look like this:
>
> if (parent_cg->res[i].misc_ops && parent_cg->res[i].misc_ops->alloc)
> {
> ...
> }
>
> I don't see this is the pattern used in cgroups and no strong opinion
> either way.
>
> TJ, do you have preference on this?

I do have a strong opinion on this. On the client side we want as many things declared statically as we can, because it gives more tools for static analysis.

I don't want to see dynamic assignments in the SGX driver when they are not actually needed, no matter how things are done in cgroups.

> Thanks
> Haitao

BR, Jarkko
On Tue Oct 3, 2023 at 1:47 AM EEST, Jarkko Sakkinen wrote:
> On Wed Sep 27, 2023 at 4:56 AM EEST, Haitao Huang wrote:
> > On Tue, 26 Sep 2023 08:13:18 -0500, Jarkko Sakkinen <jarkko@kernel.org>
> > wrote:
> >
> > ...
> > >> > >>  /**
> > >> > >> @@ -410,7 +429,14 @@ misc_cg_alloc(struct cgroup_subsys_state *parent_css)
> > >> > >>   */
> > >> > >>  static void misc_cg_free(struct cgroup_subsys_state *css)
> > >> > >>  {
> > >> > >> -	kfree(css_misc(css));
> > >> > >> +	struct misc_cg *cg = css_misc(css);
> > >> > >> +	enum misc_res_type i;
> > >> > >> +
> > >> > >> +	for (i = 0; i < MISC_CG_RES_TYPES; i++)
> > >> > >> +		if (cg->res[i].free)
> > >> > >> +			cg->res[i].free(cg);
> > >> > >> +
> > >> > >> +	kfree(cg);
> > >> > >>  }
> > >> > >>
> > >> > >>  /* Cgroup controller callbacks */
> > >> > >> --
> > >> > >> 2.25.1
> > >> > >
> > >> > > Since the only existing client feature requires all callbacks, should
> > >> > > this not have that as an invariant?
> > >> > >
> > >> > > I.e. it might be better to fail unless *all* ops are non-nil (e.g. to
> > >> > > catch issues in the kernel code).
> > >> > >
> > >> >
> > >> > These callbacks are chained from cgroup_subsys, and they are defined
> > >> > separately so it'd be better follow the same pattern. Or change together
> > >> > with cgroup_subsys if we want to do that. Reasonable?
> > >>
> > >> I noticed this one later:
> > >>
> > >> It would better to create a separate ops struct and declare the instance
> > >> as const at minimum.
> > >>
> > >> Then there is no need for dynamic assigment of ops and all that is in
> > >> rodata. This is improves both security and also allows static analysis
> > >> bit better.
> > >>
> > >> Now you have to dynamically trace the struct instance, e.g. in case of
> > >> a bug. If this one done, it would be already in the vmlinux.
> > >I.e. then in the driver you can have static const struct declaration
> > > with *all* pointers pre-assigned.
> > >
> > > Not sure if cgroups follows this or not but it is *objectively*
> > > better. Previous work is not always best possible work...
> > >
> >
> > IIUC, like vm_ops field in vma structs. Although function pointers in
> > vm_ops are assigned statically, but you still need dynamically assign
> > vm_ops for each instance of vma.
> >
> > So the code will look like this:
> >
> > if (parent_cg->res[i].misc_ops && parent_cg->res[i].misc_ops->alloc)
> > {
> > ...
> > }
> >
> > I don't see this is the pattern used in cgroups and no strong opinion
> > either way.
> >
> > TJ, do you have preference on this?
>
> I do have strong opinion on this. In the client side we want as much
> things declared statically as we can because it gives more tools for
> statical analysis.
>
> I don't want to see dynamic assignments in the SGX driver, when they
> are not actually needed, no matter things are done in cgroups.

I.e. I don't really even care what crazy things cgroups subsystem might do or not do. It's not my problem.

All I care is that we *do not* have any use for assigning those pointers at run-time. So do whatever you want with cgroups side as long as this is not the case.

BR, Jarkko
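
To make the two suggestions in this sub-thread concrete (a constant ops table with every pointer pre-assigned, and an "all ops must be non-nil" invariant), here is a small userspace sketch. The types and names (demo_cg, demo_res_ops, demo_epc_ops, demo_ops_valid) are hypothetical stand-ins, not kernel identifiers, and the validity check is an assumed registration-time helper that the thread only describes in words.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct demo_cg {
	int state;
};

/* All per-resource callbacks grouped in one table. */
struct demo_res_ops {
	int  (*alloc)(struct demo_cg *cg);
	void (*free)(struct demo_cg *cg);
	void (*max_write)(struct demo_cg *cg);
};

static int  demo_epc_alloc(struct demo_cg *cg)     { cg->state = 1; return 0; }
static void demo_epc_free(struct demo_cg *cg)      { cg->state = 0; }
static void demo_epc_max_write(struct demo_cg *cg) { cg->state = 2; }

/*
 * Fully populated at build time and never written at run time. A const
 * object with static storage is typically placed in a read-only section.
 */
static const struct demo_res_ops demo_epc_ops = {
	.alloc     = demo_epc_alloc,
	.free      = demo_epc_free,
	.max_write = demo_epc_max_write,
};

/* Hypothetical invariant check: accept a table only if every hook is set. */
static bool demo_ops_valid(const struct demo_res_ops *ops)
{
	return ops && ops->alloc && ops->free && ops->max_write;
}
```

With such a check done once when the table is registered, call sites would only need to test whether the table pointer itself is set, not each individual hook.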
On Wed, 27 Sep 2023 04:20:55 -0500, Huang, Kai <kai.huang@intel.com> wrote:

> On Fri, 2023-09-22 at 20:06 -0700, Haitao Huang wrote:
>> From: Kristen Carlson Accardi <kristen@linux.intel.com>
>>
>> The misc cgroup controller (subsystem) currently does not perform
>> resource type specific action for Cgroups Subsystem State (CSS) events:
>> the 'css_alloc' event when a cgroup is created and the 'css_free' event
>> when a cgroup is destroyed, or in event of user writing the max value to
>> the misc.max file to set the usage limit of a specific resource
>> [admin-guide/cgroup-v2.rst, 5-9. Misc].
>>
>> Define callbacks for those events and allow resource providers to
>> register the callbacks per resource type as needed. This will be
>> utilized later by the EPC misc cgroup support implemented in the SGX
>> driver:
>> - On css_alloc, allocate and initialize necessary structures for EPC
>> reclaiming, e.g., LRU list, work queue, etc.
>> - On css_free, cleanup and free those structures created in alloc.
>> - On max_write, trigger EPC reclaiming if the new limit is at or below
>> current usage.
>
> Nit:
>
> Wondering why we should trigger EPC reclaiming if the new limit is *at*
> current usage?
>
> I actually don't quite care about why here, but writing these details in
> the changelog may bring unnecessary confusion. I guess you can just remove
> all the details about what SGX driver needs to do on these callbacks.
>
>

Okay, I'll remove the three bullets on the SGX driver implementation details.

Thanks
Haitao
Hi Jarkko

On Mon, 02 Oct 2023 17:55:14 -0500, Jarkko Sakkinen <jarkko@kernel.org> wrote:

...
>> > >> I noticed this one later:
>> > >>
>> > >> It would better to create a separate ops struct and declare the instance
>> > >> as const at minimum.
>> > >>
>> > >> Then there is no need for dynamic assigment of ops and all that is in
>> > >> rodata. This is improves both security and also allows static analysis
>> > >> bit better.
>> > >>
>> > >> Now you have to dynamically trace the struct instance, e.g. in case of
>> > >> a bug. If this one done, it would be already in the vmlinux.
>> > >I.e. then in the driver you can have static const struct declaration
>> > > with *all* pointers pre-assigned.
>> > >
>> > > Not sure if cgroups follows this or not but it is *objectively*
>> > > better. Previous work is not always best possible work...
>> > >
>> >
>> > IIUC, like vm_ops field in vma structs. Although function pointers in
>> > vm_ops are assigned statically, but you still need dynamically assign
>> > vm_ops for each instance of vma.
>> >
>> > So the code will look like this:
>> >
>> > if (parent_cg->res[i].misc_ops && parent_cg->res[i].misc_ops->alloc)
>> > {
>> > ...
>> > }
>> >
>> > I don't see this is the pattern used in cgroups and no strong opinion
>> > either way.
>> >
>> > TJ, do you have preference on this?
>>
>> I do have strong opinion on this. In the client side we want as much
>> things declared statically as we can because it gives more tools for
>> statical analysis.
>>
>> I don't want to see dynamic assignments in the SGX driver, when they
>> are not actually needed, no matter things are done in cgroups.
>
> I.e. I don't really even care what crazy things cgroups subsystem
> might do or not do. It's not my problem.
>
> All I care is that we *do not* have any use for assigning those
> pointers at run-time. So do whatever you want with cgroups side
> as long as this is not the case.
>

So I will update to something like following. Let me know if that's correct understanding.
@tj, I'd appreciate for your input on whether this is acceptable from cgroups side.

--- a/include/linux/misc_cgroup.h
+++ b/include/linux/misc_cgroup.h
@@ -31,22 +31,26 @@ struct misc_cg;

 #include <linux/cgroup.h>

+/* per resource callback ops */
+struct misc_operations_struct {
+	int (*alloc)(struct misc_cg *cg);
+	void (*free)(struct misc_cg *cg);
+	void (*max_write)(struct misc_cg *cg);
+};
 /**
  * struct misc_res: Per cgroup per misc type resource
  * @max: Maximum limit on the resource.
  * @usage: Current usage of the resource.
  * @events: Number of times, the resource limit exceeded.
+ * @priv: resource specific data.
+ * @misc_ops: resource specific operations.
  */
 struct misc_res {
 	u64 max;
 	atomic64_t usage;
 	atomic64_t events;
 	void *priv;
-
-	/* per resource callback ops */
-	int (*alloc)(struct misc_cg *cg);
-	void (*free)(struct misc_cg *cg);
-	void (*max_write)(struct misc_cg *cg);
+	const struct misc_operations_struct *misc_ops;
 };

...
diff --git a/kernel/cgroup/misc.c b/kernel/cgroup/misc.c
index 4633b8629e63..500415087643 100644
--- a/kernel/cgroup/misc.c
+++ b/kernel/cgroup/misc.c
@@ -277,8 +277,8 @@ static ssize_t misc_cg_max_write(struct kernfs_open_file *of, char *buf,

 	if (READ_ONCE(misc_res_capacity[type])) {
 		WRITE_ONCE(cg->res[type].max, max);
-		if (cg->res[type].max_write)
-			cg->res[type].max_write(cg);
+		if (cg->res[type].misc_ops && cg->res[type].misc_ops->max_write)
+			cg->res[type].misc_ops->max_write(cg);

[skip other similar changes in misc.c]

And on SGX side, it'll be updated like this:

--- a/arch/x86/kernel/cpu/sgx/epc_cgroup.c
+++ b/arch/x86/kernel/cpu/sgx/epc_cgroup.c
@@ -376,6 +376,14 @@ static void sgx_epc_cgroup_max_write(struct misc_cg *cg)
 	queue_work(sgx_epc_cg_wq, &rc.epc_cg->reclaim_work);
 }

+static int sgx_epc_cgroup_alloc(struct misc_cg *cg);
+
+const struct misc_operations_struct sgx_epc_cgroup_ops = {
+	.alloc = sgx_epc_cgroup_alloc,
+	.free = sgx_epc_cgroup_free,
+	.max_write = sgx_epc_cgroup_max_write,
+};
+
 static int sgx_epc_cgroup_alloc(struct misc_cg *cg)
 {
 	struct sgx_epc_cgroup *epc_cg;
@@ -386,12 +394,7 @@ static int sgx_epc_cgroup_alloc(struct misc_cg *cg)
 	sgx_lru_init(&epc_cg->lru);
 	INIT_WORK(&epc_cg->reclaim_work, sgx_epc_cgroup_reclaim_work_func);
-	cg->res[MISC_CG_RES_SGX_EPC].alloc = sgx_epc_cgroup_alloc;
-	cg->res[MISC_CG_RES_SGX_EPC].free = sgx_epc_cgroup_free;
-	cg->res[MISC_CG_RES_SGX_EPC].max_write = sgx_epc_cgroup_max_write;
-	cg->res[MISC_CG_RES_SGX_EPC].priv = epc_cg;
-	epc_cg->cg = cg;
-
+	cg->res[MISC_CG_RES_SGX_EPC].misc_ops = &sgx_epc_cgroup_ops;
 	return 0;
 }

Thanks again to all of you for feedback.
Haitao
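
One consequence of the proposal above is that the ops table pointer travels from parent to child: the alloc hook is looked up on the parent and then stores the same constant table in the group being created. The userspace sketch below models just that hand-off. All names (demo_cg, demo_ops, demo_epc_ops, demo_child_init) are hypothetical, and how the root group first receives its pointer is not shown in the thread, so the usage note simply assumes it is set once up front.

```c
#include <assert.h>
#include <stddef.h>

struct demo_cg;

struct demo_ops {
	int (*alloc)(struct demo_cg *cg);
};

struct demo_cg {
	const struct demo_ops *ops;
};

/* Forward declaration so the constant table can name the hook. */
static int demo_epc_alloc(struct demo_cg *cg);

static const struct demo_ops demo_epc_ops = {
	.alloc = demo_epc_alloc,
};

/* The hook hands the same constant table to the group being created. */
static int demo_epc_alloc(struct demo_cg *cg)
{
	cg->ops = &demo_epc_ops;
	return 0;
}

/* Look the hook up on the parent, with the two-level NULL check from the thread. */
static int demo_child_init(const struct demo_cg *parent, struct demo_cg *child)
{
	child->ops = NULL;
	if (parent->ops && parent->ops->alloc)
		return parent->ops->alloc(child);
	return 0;
}
```

If a root group is seeded with `&demo_epc_ops`, every descendant created through `demo_child_init()` ends up pointing at the same table; a parent without a table leaves its children without one.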
Hello,

On Wed, Oct 04, 2023 at 10:45:08AM -0500, Haitao Huang wrote:
> So I will update to something like following. Let me know if that's correct
> understanding.
> @tj, I'd appreciate for your input on whether this is acceptable from
> cgroups side.

Yeah, that's fine by me and I can't tell what actual differences the two would have in practice.

Thanks.
On Fri, Sep 22, 2023 at 08:06:40PM -0700, Haitao Huang <haitao.huang@linux.intel.com> wrote:
> @@ -276,10 +276,13 @@ static ssize_t misc_cg_max_write(struct kernfs_open_file *of, char *buf,
>
>  	cg = css_misc(of_css(of));
>
> -	if (READ_ONCE(misc_res_capacity[type]))
> +	if (READ_ONCE(misc_res_capacity[type])) {
>  		WRITE_ONCE(cg->res[type].max, max);
> -	else
> +		if (cg->res[type].max_write)
> +			cg->res[type].max_write(cg);
> +	} else {
>  		ret = -EINVAL;
>
> +	}

Is it time for a misc_cg_mutex? This gives no synchronization guarantees to implementors of max_write.
(Alternatively, document that the callback must implement its own synchronization.)

Thanks,
Michal
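
As an illustration of the first option raised here, the sketch below serializes the limit update and the max_write hook under one lock, so a hook implementation always observes the limit that triggered it. It is a userspace model using a pthread mutex; demo_cg, demo_cg_mutex, demo_set_max and demo_hook are hypothetical names, and nothing in the thread says the kernel side would be implemented this way. In kernel code the analogous primitive would be a `struct mutex` taken with `mutex_lock()` and `mutex_unlock()`.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

struct demo_cg {
	uint64_t max;
	uint64_t seen_by_hook;	/* last limit the hook observed */
	void (*max_write)(struct demo_cg *cg);
};

/* One lock for all limit writes, standing in for the suggested mutex. */
static pthread_mutex_t demo_cg_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Example hook: records the limit it was called for. */
static void demo_hook(struct demo_cg *cg)
{
	cg->seen_by_hook = cg->max;
}

/*
 * Update the limit and run the hook as one critical section, so two
 * concurrent writers cannot interleave their store and callback.
 */
static void demo_set_max(struct demo_cg *cg, uint64_t max)
{
	pthread_mutex_lock(&demo_cg_mutex);
	cg->max = max;
	if (cg->max_write)
		cg->max_write(cg);
	pthread_mutex_unlock(&demo_cg_mutex);
}
```

The trade-off is that every hook then runs with the lock held, so a hook must not try to take the same lock again. The alternative mentioned in the mail leaves the core lock-free and moves that responsibility into each callback.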
diff --git a/include/linux/misc_cgroup.h b/include/linux/misc_cgroup.h
index e799b1f8d05b..96a88822815a 100644
--- a/include/linux/misc_cgroup.h
+++ b/include/linux/misc_cgroup.h
@@ -37,6 +37,11 @@ struct misc_res {
 	u64 max;
 	atomic64_t usage;
 	atomic64_t events;
+
+	/* per resource callback ops */
+	int (*alloc)(struct misc_cg *cg);
+	void (*free)(struct misc_cg *cg);
+	void (*max_write)(struct misc_cg *cg);
 };
 
 /**
diff --git a/kernel/cgroup/misc.c b/kernel/cgroup/misc.c
index 79a3717a5803..62c9198dee21 100644
--- a/kernel/cgroup/misc.c
+++ b/kernel/cgroup/misc.c
@@ -276,10 +276,13 @@ static ssize_t misc_cg_max_write(struct kernfs_open_file *of, char *buf,
 
 	cg = css_misc(of_css(of));
 
-	if (READ_ONCE(misc_res_capacity[type]))
+	if (READ_ONCE(misc_res_capacity[type])) {
 		WRITE_ONCE(cg->res[type].max, max);
-	else
+		if (cg->res[type].max_write)
+			cg->res[type].max_write(cg);
+	} else {
 		ret = -EINVAL;
+	}
 
 	return ret ? ret : nbytes;
 }
@@ -383,23 +386,39 @@ static struct cftype misc_cg_files[] = {
 static struct cgroup_subsys_state *
 misc_cg_alloc(struct cgroup_subsys_state *parent_css)
 {
+	struct misc_cg *parent_cg;
 	enum misc_res_type i;
 	struct misc_cg *cg;
+	int ret;
 
 	if (!parent_css) {
 		cg = &root_cg;
+		parent_cg = &root_cg;
 	} else {
 		cg = kzalloc(sizeof(*cg), GFP_KERNEL);
 		if (!cg)
 			return ERR_PTR(-ENOMEM);
+		parent_cg = css_misc(parent_css);
 	}
 
 	for (i = 0; i < MISC_CG_RES_TYPES; i++) {
 		WRITE_ONCE(cg->res[i].max, MAX_NUM);
 		atomic64_set(&cg->res[i].usage, 0);
+		if (parent_cg->res[i].alloc) {
+			ret = parent_cg->res[i].alloc(cg);
+			if (ret)
+				goto alloc_err;
+		}
 	}
 
 	return &cg->css;
+
+alloc_err:
+	for (i = 0; i < MISC_CG_RES_TYPES; i++)
+		if (parent_cg->res[i].free)
+			cg->res[i].free(cg);
+	kfree(cg);
+	return ERR_PTR(ret);
 }
 
 /**
@@ -410,7 +429,14 @@ misc_cg_alloc(struct cgroup_subsys_state *parent_css)
  */
 static void misc_cg_free(struct cgroup_subsys_state *css)
 {
-	kfree(css_misc(css));
+	struct misc_cg *cg = css_misc(css);
+	enum misc_res_type i;
+
+	for (i = 0; i < MISC_CG_RES_TYPES; i++)
+		if (cg->res[i].free)
+			cg->res[i].free(cg);
+
+	kfree(cg);
 }
 
 /* Cgroup controller callbacks */