From patchwork Tue Jan 30 02:09:24 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Haitao Huang
X-Patchwork-Id: 193836
From: Haitao Huang
To: jarkko@kernel.org, dave.hansen@linux.intel.com, tj@kernel.org,
 mkoutny@suse.com, linux-kernel@vger.kernel.org, linux-sgx@vger.kernel.org,
 x86@kernel.org, cgroups@vger.kernel.org, tglx@linutronix.de,
 mingo@redhat.com, bp@alien8.de, hpa@zytor.com, sohil.mehta@intel.com
Cc: zhiquan1.li@intel.com, kristen@linux.intel.com, seanjc@google.com,
 zhanb@microsoft.com, anakrish@microsoft.com, mikko.ylinen@linux.intel.com,
 yangjie@microsoft.com, chrisyan@microsoft.com
Subject: [PATCH v8 01/15] cgroup/misc: Add per resource callbacks for CSS events
Date: Mon, 29 Jan 2024 18:09:24 -0800
Message-Id: <20240130020938.10025-2-haitao.huang@linux.intel.com>
X-Mailer: git-send-email 2.25.1
In-Reply-To: <20240130020938.10025-1-haitao.huang@linux.intel.com>
References: <20240130020938.10025-1-haitao.huang@linux.intel.com>

From: Kristen Carlson Accardi

The misc cgroup controller (subsystem) currently does not perform
resource-type-specific actions for Cgroup Subsystem State (CSS) events:
the 'css_alloc' event when a cgroup is created and the 'css_free' event
when a cgroup is destroyed.

Define callbacks for those events and allow resource providers to
register the callbacks per resource type as needed. This will be
utilized later by the EPC misc cgroup support implemented in the SGX
driver.

Signed-off-by: Kristen Carlson Accardi
Co-developed-by: Haitao Huang
Signed-off-by: Haitao Huang
Reviewed-by: Jarkko Sakkinen
---
V8:
- Abstract out _misc_cg_res_free() and _misc_cg_res_alloc() (Jarkko)

V7:
- Make ops one per resource type and store them in array (Michal)
- Rename the ops struct to misc_res_ops, and enforce the constraints of
  required callback functions (Jarkko)
- Moved addition of priv field to patch 4 where it was used first. (Jarkko)

V6:
- Create ops struct for per resource callbacks (Jarkko)
- Drop max_write callback (Dave, Michal)
- Style fixes (Kai)
---
 include/linux/misc_cgroup.h | 11 +++++
 kernel/cgroup/misc.c        | 84 +++++++++++++++++++++++++++++++++----
 2 files changed, 87 insertions(+), 8 deletions(-)

diff --git a/include/linux/misc_cgroup.h b/include/linux/misc_cgroup.h
index e799b1f8d05b..0806d4436208 100644
--- a/include/linux/misc_cgroup.h
+++ b/include/linux/misc_cgroup.h
@@ -27,6 +27,16 @@ struct misc_cg;
 
 #include <linux/cgroup.h>
 
+/**
+ * struct misc_res_ops: per resource type callback ops.
+ * @alloc: invoked for resource specific initialization when cgroup is allocated.
+ * @free: invoked for resource specific cleanup when cgroup is deallocated.
+ */
+struct misc_res_ops {
+	int (*alloc)(struct misc_cg *cg);
+	void (*free)(struct misc_cg *cg);
+};
+
 /**
  * struct misc_res: Per cgroup per misc type resource
  * @max: Maximum limit on the resource.
@@ -56,6 +66,7 @@ struct misc_cg {
 
 u64 misc_cg_res_total_usage(enum misc_res_type type);
 int misc_cg_set_capacity(enum misc_res_type type, u64 capacity);
+int misc_cg_set_ops(enum misc_res_type type, const struct misc_res_ops *ops);
 int misc_cg_try_charge(enum misc_res_type type, struct misc_cg *cg, u64 amount);
 void misc_cg_uncharge(enum misc_res_type type, struct misc_cg *cg, u64 amount);
 
diff --git a/kernel/cgroup/misc.c b/kernel/cgroup/misc.c
index 79a3717a5803..14ab13ef3bc7 100644
--- a/kernel/cgroup/misc.c
+++ b/kernel/cgroup/misc.c
@@ -39,6 +39,9 @@ static struct misc_cg root_cg;
  */
 static u64 misc_res_capacity[MISC_CG_RES_TYPES];
 
+/* Resource type specific operations */
+static const struct misc_res_ops *misc_res_ops[MISC_CG_RES_TYPES];
+
 /**
  * parent_misc() - Get the parent of the passed misc cgroup.
  * @cgroup: cgroup whose parent needs to be fetched.
@@ -105,6 +108,36 @@ int misc_cg_set_capacity(enum misc_res_type type, u64 capacity)
 }
 EXPORT_SYMBOL_GPL(misc_cg_set_capacity);
 
+/**
+ * misc_cg_set_ops() - set resource specific operations.
+ * @type: Type of the misc res.
+ * @ops: Operations for the given type.
+ *
+ * Context: Any context.
+ * Return:
+ * * %0 - Successfully registered the operations.
+ * * %-EINVAL - If @type is invalid, or the operations missing any required callbacks.
+ */
+int misc_cg_set_ops(enum misc_res_type type, const struct misc_res_ops *ops)
+{
+	if (!valid_type(type))
+		return -EINVAL;
+
+	if (!ops->alloc) {
+		pr_err("%s: alloc missing\n", __func__);
+		return -EINVAL;
+	}
+
+	if (!ops->free) {
+		pr_err("%s: free missing\n", __func__);
+		return -EINVAL;
+	}
+
+	misc_res_ops[type] = ops;
+	return 0;
+}
+EXPORT_SYMBOL_GPL(misc_cg_set_ops);
+
 /**
  * misc_cg_cancel_charge() - Cancel the charge from the misc cgroup.
  * @type: Misc res type in misc cg to cancel the charge from.
@@ -371,6 +404,33 @@ static struct cftype misc_cg_files[] = {
 	{}
 };
 
+static inline int _misc_cg_res_alloc(struct misc_cg *cg)
+{
+	enum misc_res_type i;
+	int ret;
+
+	for (i = 0; i < MISC_CG_RES_TYPES; i++) {
+		WRITE_ONCE(cg->res[i].max, MAX_NUM);
+		atomic64_set(&cg->res[i].usage, 0);
+		if (misc_res_ops[i]) {
+			ret = misc_res_ops[i]->alloc(cg);
+			if (ret)
+				return ret;
+		}
+	}
+
+	return 0;
+}
+
+static inline void _misc_cg_res_free(struct misc_cg *cg)
+{
+	enum misc_res_type i;
+
+	for (i = 0; i < MISC_CG_RES_TYPES; i++)
+		if (misc_res_ops[i])
+			misc_res_ops[i]->free(cg);
+}
+
 /**
  * misc_cg_alloc() - Allocate misc cgroup.
  * @parent_css: Parent cgroup.
@@ -383,20 +443,25 @@ static struct cftype misc_cg_files[] = {
 static struct cgroup_subsys_state *
 misc_cg_alloc(struct cgroup_subsys_state *parent_css)
 {
-	enum misc_res_type i;
-	struct misc_cg *cg;
+	struct misc_cg *parent_cg, *cg;
+	int ret;
 
-	if (!parent_css) {
-		cg = &root_cg;
+	if (unlikely(!parent_css)) {
+		parent_cg = cg = &root_cg;
 	} else {
 		cg = kzalloc(sizeof(*cg), GFP_KERNEL);
 		if (!cg)
 			return ERR_PTR(-ENOMEM);
+		parent_cg = css_misc(parent_css);
 	}
 
-	for (i = 0; i < MISC_CG_RES_TYPES; i++) {
-		WRITE_ONCE(cg->res[i].max, MAX_NUM);
-		atomic64_set(&cg->res[i].usage, 0);
+	ret = _misc_cg_res_alloc(cg);
+	if (ret) {
+		_misc_cg_res_free(cg);
+		if (likely(parent_css))
+			kfree(cg);
+
+		return ERR_PTR(ret);
 	}
 
 	return &cg->css;
@@ -410,7 +475,10 @@ misc_cg_alloc(struct cgroup_subsys_state *parent_css)
  */
 static void misc_cg_free(struct cgroup_subsys_state *css)
 {
-	kfree(css_misc(css));
+	struct misc_cg *cg = css_misc(css);
+
+	_misc_cg_res_free(cg);
+	kfree(cg);
 }
 
 /* Cgroup controller callbacks */

From patchwork Tue Jan 30 02:09:25 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Haitao Huang
X-Patchwork-Id: 193869
From: Haitao Huang
To: jarkko@kernel.org, dave.hansen@linux.intel.com, tj@kernel.org,
 mkoutny@suse.com, linux-kernel@vger.kernel.org, linux-sgx@vger.kernel.org,
 x86@kernel.org, cgroups@vger.kernel.org, tglx@linutronix.de,
 mingo@redhat.com, bp@alien8.de, hpa@zytor.com, sohil.mehta@intel.com
Cc: zhiquan1.li@intel.com, kristen@linux.intel.com, seanjc@google.com,
 zhanb@microsoft.com, anakrish@microsoft.com, mikko.ylinen@linux.intel.com,
 yangjie@microsoft.com, chrisyan@microsoft.com
Subject: [PATCH v8 02/15] cgroup/misc: Export APIs for SGX driver
Date: Mon, 29 Jan 2024 18:09:25 -0800
Message-Id: <20240130020938.10025-3-haitao.huang@linux.intel.com>
X-Mailer: git-send-email 2.25.1
In-Reply-To: <20240130020938.10025-1-haitao.huang@linux.intel.com>
References: <20240130020938.10025-1-haitao.huang@linux.intel.com>

From: Kristen Carlson Accardi

The SGX EPC cgroup will reclaim EPC pages when the usage in a cgroup
reaches its own limit or an ancestor's limit. This requires a walk from
the current cgroup up to the root, similar to misc_cg_try_charge().
Export misc_cg_parent() to enable this walk.

The SGX driver may also need to start a global-level reclamation from
the root. Export misc_cg_root() for the SGX driver to access.

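For illustration only (not part of this patch): a minimal userspace
sketch of the leaf-to-root walk that misc_cg_parent() is meant to
enable. struct demo_cg, demo_cg_parent() and demo_cg_first_at_limit()
are hypothetical stand-ins, not kernel APIs. The real code would operate
on struct misc_cg and its per-resource res[type] fields, and the actual
reclaim policy is defined by later patches in this series.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct misc_cg: only what the walk needs. */
struct demo_cg {
	struct demo_cg *parent;	/* NULL for the root */
	uint64_t usage;		/* current usage at this level */
	uint64_t max;		/* limit at this level */
};

/* Same shape as misc_cg_parent(): NULL-safe, returns NULL at the root. */
static inline struct demo_cg *demo_cg_parent(struct demo_cg *cg)
{
	return cg ? cg->parent : NULL;
}

/*
 * Walk from @cg up to the root and return the first cgroup whose usage
 * has reached its limit, or NULL if no level on the path is at its limit.
 */
static struct demo_cg *demo_cg_first_at_limit(struct demo_cg *cg)
{
	struct demo_cg *i;

	for (i = cg; i; i = demo_cg_parent(i))
		if (i->usage >= i->max)
			return i;

	return NULL;
}
```

With a leaf below its own limit but a parent at its limit, the walk
returns the parent, which is where a reclaim would have to start in such
a scheme. A global reclaim would instead start from the root, which is
what misc_cg_root() exposes.
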
Signed-off-by: Kristen Carlson Accardi
Co-developed-by: Haitao Huang
Signed-off-by: Haitao Huang
Reviewed-by: Jarkko Sakkinen
---
V6:
- Make commit messages more concise and split the original patch into two (Kai)
---
 include/linux/misc_cgroup.h | 24 ++++++++++++++++++++++++
 kernel/cgroup/misc.c        | 21 ++++++++-------------
 2 files changed, 32 insertions(+), 13 deletions(-)

diff --git a/include/linux/misc_cgroup.h b/include/linux/misc_cgroup.h
index 0806d4436208..541a5611c597 100644
--- a/include/linux/misc_cgroup.h
+++ b/include/linux/misc_cgroup.h
@@ -64,6 +64,7 @@ struct misc_cg {
 	struct misc_res res[MISC_CG_RES_TYPES];
 };
 
+struct misc_cg *misc_cg_root(void);
 u64 misc_cg_res_total_usage(enum misc_res_type type);
 int misc_cg_set_capacity(enum misc_res_type type, u64 capacity);
 int misc_cg_set_ops(enum misc_res_type type, const struct misc_res_ops *ops);
@@ -84,6 +85,20 @@ static inline struct misc_cg *css_misc(struct cgroup_subsys_state *css)
 	return css ? container_of(css, struct misc_cg, css) : NULL;
 }
 
+/**
+ * misc_cg_parent() - Get the parent of the passed misc cgroup.
+ * @cgroup: cgroup whose parent needs to be fetched.
+ *
+ * Context: Any context.
+ * Return:
+ * * struct misc_cg* - Parent of the @cgroup.
+ * * %NULL - If @cgroup is null or the passed cgroup does not have a parent.
+ */
+static inline struct misc_cg *misc_cg_parent(struct misc_cg *cgroup)
+{
+	return cgroup ? css_misc(cgroup->css.parent) : NULL;
+}
+
 /*
  * get_current_misc_cg() - Find and get the misc cgroup of the current task.
  *
@@ -108,6 +123,15 @@ static inline void put_misc_cg(struct misc_cg *cg)
 }
 
 #else /* !CONFIG_CGROUP_MISC */
+static inline struct misc_cg *misc_cg_root(void)
+{
+	return NULL;
+}
+
+static inline struct misc_cg *misc_cg_parent(struct misc_cg *cg)
+{
+	return NULL;
+}
 
 static inline u64 misc_cg_res_total_usage(enum misc_res_type type)
 {
diff --git a/kernel/cgroup/misc.c b/kernel/cgroup/misc.c
index 14ab13ef3bc7..1f0d8e05b36c 100644
--- a/kernel/cgroup/misc.c
+++ b/kernel/cgroup/misc.c
@@ -43,18 +43,13 @@ static u64 misc_res_capacity[MISC_CG_RES_TYPES];
 static const struct misc_res_ops *misc_res_ops[MISC_CG_RES_TYPES];
 
 /**
- * parent_misc() - Get the parent of the passed misc cgroup.
- * @cgroup: cgroup whose parent needs to be fetched.
- *
- * Context: Any context.
- * Return:
- * * struct misc_cg* - Parent of the @cgroup.
- * * %NULL - If @cgroup is null or the passed cgroup does not have a parent.
+ * misc_cg_root() - Return the root misc cgroup.
  */
-static struct misc_cg *parent_misc(struct misc_cg *cgroup)
+struct misc_cg *misc_cg_root(void)
 {
-	return cgroup ? css_misc(cgroup->css.parent) : NULL;
+	return &root_cg;
 }
+EXPORT_SYMBOL_GPL(misc_cg_root);
 
 /**
  * valid_type() - Check if @type is valid or not.
@@ -183,7 +178,7 @@ int misc_cg_try_charge(enum misc_res_type type, struct misc_cg *cg, u64 amount)
 	if (!amount)
 		return 0;
 
-	for (i = cg; i; i = parent_misc(i)) {
+	for (i = cg; i; i = misc_cg_parent(i)) {
 		res = &i->res[type];
 
 		new_usage = atomic64_add_return(amount, &res->usage);
@@ -196,12 +191,12 @@ int misc_cg_try_charge(enum misc_res_type type, struct misc_cg *cg, u64 amount)
 	return 0;
 
 err_charge:
-	for (j = i; j; j = parent_misc(j)) {
+	for (j = i; j; j = misc_cg_parent(j)) {
 		atomic64_inc(&j->res[type].events);
 		cgroup_file_notify(&j->events_file);
 	}
 
-	for (j = cg; j != i; j = parent_misc(j))
+	for (j = cg; j != i; j = misc_cg_parent(j))
 		misc_cg_cancel_charge(type, j, amount);
 	misc_cg_cancel_charge(type, i, amount);
 	return ret;
@@ -223,7 +218,7 @@ void misc_cg_uncharge(enum misc_res_type type, struct misc_cg *cg, u64 amount)
 	if (!(amount && valid_type(type) && cg))
 		return;
 
-	for (i = cg; i; i = parent_misc(i))
+	for (i = cg; i; i = misc_cg_parent(i))
 		misc_cg_cancel_charge(type, i, amount);
 }
 EXPORT_SYMBOL_GPL(misc_cg_uncharge);

From patchwork Tue Jan 30 02:09:26 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Haitao Huang
X-Patchwork-Id: 193825
From: Haitao Huang
To: jarkko@kernel.org, dave.hansen@linux.intel.com, tj@kernel.org,
 mkoutny@suse.com, linux-kernel@vger.kernel.org, linux-sgx@vger.kernel.org,
 x86@kernel.org, cgroups@vger.kernel.org, tglx@linutronix.de,
 mingo@redhat.com, bp@alien8.de, hpa@zytor.com, sohil.mehta@intel.com
Cc: zhiquan1.li@intel.com, kristen@linux.intel.com, seanjc@google.com,
 zhanb@microsoft.com, anakrish@microsoft.com, mikko.ylinen@linux.intel.com,
 yangjie@microsoft.com, chrisyan@microsoft.com
Subject: [PATCH v8 03/15] cgroup/misc: Add SGX EPC resource type
Date: Mon, 29 Jan 2024 18:09:26 -0800
Message-Id: <20240130020938.10025-4-haitao.huang@linux.intel.com>
X-Mailer: git-send-email 2.25.1
In-Reply-To: <20240130020938.10025-1-haitao.huang@linux.intel.com>
References: <20240130020938.10025-1-haitao.huang@linux.intel.com>

From: Kristen Carlson Accardi

Add SGX EPC memory, MISC_CG_RES_SGX_EPC, to be a valid resource type
for the misc controller.


Signed-off-by: Kristen Carlson Accardi
Co-developed-by: Haitao Huang
Signed-off-by: Haitao Huang
Reviewed-by: Jarkko Sakkinen
---
V6:
- Split the original patch into this and the preceding one (Kai)
---
 include/linux/misc_cgroup.h | 4 ++++
 kernel/cgroup/misc.c        | 4 ++++
 2 files changed, 8 insertions(+)

diff --git a/include/linux/misc_cgroup.h b/include/linux/misc_cgroup.h
index 541a5611c597..2f6cc3a0ad23 100644
--- a/include/linux/misc_cgroup.h
+++ b/include/linux/misc_cgroup.h
@@ -17,6 +17,10 @@ enum misc_res_type {
 	MISC_CG_RES_SEV,
 	/* AMD SEV-ES ASIDs resource */
 	MISC_CG_RES_SEV_ES,
+#endif
+#ifdef CONFIG_CGROUP_SGX_EPC
+	/* SGX EPC memory resource */
+	MISC_CG_RES_SGX_EPC,
 #endif
 	MISC_CG_RES_TYPES
 };
diff --git a/kernel/cgroup/misc.c b/kernel/cgroup/misc.c
index 1f0d8e05b36c..e51d6a45007f 100644
--- a/kernel/cgroup/misc.c
+++ b/kernel/cgroup/misc.c
@@ -24,6 +24,10 @@ static const char *const misc_res_name[] = {
 	/* AMD SEV-ES ASIDs resource */
 	"sev_es",
 #endif
+#ifdef CONFIG_CGROUP_SGX_EPC
+	/* Intel SGX EPC memory bytes */
+	"sgx_epc",
+#endif
 };
 
 /* Root misc cgroup */

From patchwork Tue Jan 30 02:09:27 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Haitao Huang
X-Patchwork-Id: 193837
From: Haitao Huang
To: jarkko@kernel.org, dave.hansen@linux.intel.com, tj@kernel.org,
    mkoutny@suse.com, linux-kernel@vger.kernel.org,
    linux-sgx@vger.kernel.org, x86@kernel.org, cgroups@vger.kernel.org,
    tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, hpa@zytor.com,
    sohil.mehta@intel.com
Cc: zhiquan1.li@intel.com, kristen@linux.intel.com, seanjc@google.com,
    zhanb@microsoft.com, anakrish@microsoft.com,
    mikko.ylinen@linux.intel.com, yangjie@microsoft.com,
    chrisyan@microsoft.com
Subject: [PATCH v8 04/15] x86/sgx: Implement basic EPC misc cgroup functionality
Date: Mon, 29 Jan 2024 18:09:27 -0800
Message-Id: <20240130020938.10025-5-haitao.huang@linux.intel.com>
X-Mailer: git-send-email 2.25.1
In-Reply-To: <20240130020938.10025-1-haitao.huang@linux.intel.com>
References: <20240130020938.10025-1-haitao.huang@linux.intel.com>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0

From: Kristen Carlson Accardi

SGX Enclave Page Cache (EPC) memory allocations are separate from normal
RAM allocations, and are managed solely by the SGX subsystem. The
existing cgroup memory controller cannot be used to limit or account for
SGX EPC memory, which is a desirable feature in some environments. For
example, in a Kubernetes environment, a user can request a certain EPC
quota for a pod, but the orchestrator cannot enforce that quota to limit
the pod's runtime EPC usage without an EPC cgroup controller.

Utilize the misc controller [admin-guide/cgroup-v2.rst, 5-9. Misc] to
limit and track EPC allocations per cgroup.
Earlier patches have added the "sgx_epc" resource type in the misc
cgroup subsystem. Add basic support in the SGX driver as the "sgx_epc"
resource provider:

- Set "capacity" of EPC by calling misc_cg_set_capacity()
- Update EPC usage counter, "current", by calling charge and uncharge
  APIs for EPC allocation and deallocation, respectively.
- Set up sgx_epc resource type specific callbacks, which perform
  initialization and cleanup during cgroup allocation and deallocation,
  respectively.

With these changes, the misc cgroup controller enables a user to set a
hard limit for EPC usage in the "misc.max" interface file. It reports
current usage in "misc.current", the total EPC memory available in
"misc.capacity", and the number of times EPC usage reached the max limit
in "misc.events".

For now, the EPC cgroup simply blocks additional EPC allocation in
sgx_alloc_epc_page() when the limit is reached. Reclaimable pages are
still tracked in the global active list, only reclaimed by the global
reclaimer when the total free page count is lower than a threshold.

Later patches will reorganize the tracking and reclamation code in the
global reclaimer and implement per-cgroup tracking and reclaiming.

Co-developed-by: Sean Christopherson
Signed-off-by: Sean Christopherson
Signed-off-by: Kristen Carlson Accardi
Co-developed-by: Haitao Huang
Signed-off-by: Haitao Huang
Reviewed-by: Jarkko Sakkinen
---
V8:
- Remove null checks for epc_cg in try_charge()/uncharge(). (Jarkko)
- Remove extra space, '_INTEL'. (Jarkko)

V7:
- Use a static for root cgroup (Kai)
- Wrap epc_cg field in sgx_epc_page struct with #ifdef (Kai)
- Correct check for charge API return (Kai)
- Start initialization in SGX device driver init (Kai)
- Remove unneeded BUG_ON (Kai)
- Split sgx_get_current_epc_cg() out of sgx_epc_cg_try_charge() (Kai)

V6:
- Split the original large patch "Limit process EPC usage with misc
  cgroup controller" and restructure it (Kai)
---
 arch/x86/Kconfig                     | 13 +++++
 arch/x86/kernel/cpu/sgx/Makefile     |  1 +
 arch/x86/kernel/cpu/sgx/epc_cgroup.c | 73 ++++++++++++++++++++++++++++
 arch/x86/kernel/cpu/sgx/epc_cgroup.h | 73 ++++++++++++++++++++++++++++
 arch/x86/kernel/cpu/sgx/main.c       | 52 +++++++++++++++++++-
 arch/x86/kernel/cpu/sgx/sgx.h        |  5 ++
 include/linux/misc_cgroup.h          |  2 +
 7 files changed, 217 insertions(+), 2 deletions(-)
 create mode 100644 arch/x86/kernel/cpu/sgx/epc_cgroup.c
 create mode 100644 arch/x86/kernel/cpu/sgx/epc_cgroup.h

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 5edec175b9bf..10c3d1d099b2 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1947,6 +1947,19 @@ config X86_SGX
 
 	  If unsure, say N.
 
+config CGROUP_SGX_EPC
+	bool "Miscellaneous Cgroup Controller for Enclave Page Cache (EPC) for Intel SGX"
+	depends on X86_SGX && CGROUP_MISC
+	help
+	  Provides control over the EPC footprint of tasks in a cgroup via
+	  the Miscellaneous cgroup controller.
+
+	  EPC is a subset of regular memory that is usable only by SGX
+	  enclaves and is very limited in quantity, e.g. less than 1%
+	  of total DRAM.
+
+	  Say N if unsure.
+
 config X86_USER_SHADOW_STACK
 	bool "X86 userspace shadow stack"
 	depends on AS_WRUSS
diff --git a/arch/x86/kernel/cpu/sgx/Makefile b/arch/x86/kernel/cpu/sgx/Makefile
index 9c1656779b2a..12901a488da7 100644
--- a/arch/x86/kernel/cpu/sgx/Makefile
+++ b/arch/x86/kernel/cpu/sgx/Makefile
@@ -4,3 +4,4 @@ obj-y += \
 	ioctl.o \
 	main.o
 obj-$(CONFIG_X86_SGX_KVM) += virt.o
+obj-$(CONFIG_CGROUP_SGX_EPC) += epc_cgroup.o
diff --git a/arch/x86/kernel/cpu/sgx/epc_cgroup.c b/arch/x86/kernel/cpu/sgx/epc_cgroup.c
new file mode 100644
index 000000000000..eac8548164de
--- /dev/null
+++ b/arch/x86/kernel/cpu/sgx/epc_cgroup.c
@@ -0,0 +1,73 @@
+// SPDX-License-Identifier: GPL-2.0
+// Copyright(c) 2022 Intel Corporation.
+
+#include
+#include
+#include "epc_cgroup.h"
+
+static struct sgx_epc_cgroup epc_cg_root;
+
+/**
+ * sgx_epc_cgroup_try_charge() - try to charge cgroup for a single EPC page
+ *
+ * @epc_cg: The EPC cgroup to be charged for the page.
+ * Return:
+ * * %0 - If successfully charged.
+ * * -errno - for failures.
+ */
+int sgx_epc_cgroup_try_charge(struct sgx_epc_cgroup *epc_cg)
+{
+	return misc_cg_try_charge(MISC_CG_RES_SGX_EPC, epc_cg->cg, PAGE_SIZE);
+}
+
+/**
+ * sgx_epc_cgroup_uncharge() - uncharge a cgroup for an EPC page
+ * @epc_cg: The charged epc cgroup
+ */
+void sgx_epc_cgroup_uncharge(struct sgx_epc_cgroup *epc_cg)
+{
+	misc_cg_uncharge(MISC_CG_RES_SGX_EPC, epc_cg->cg, PAGE_SIZE);
+}
+
+static void sgx_epc_cgroup_free(struct misc_cg *cg)
+{
+	struct sgx_epc_cgroup *epc_cg;
+
+	epc_cg = sgx_epc_cgroup_from_misc_cg(cg);
+	if (!epc_cg)
+		return;
+
+	kfree(epc_cg);
+}
+
+static int sgx_epc_cgroup_alloc(struct misc_cg *cg);
+
+const struct misc_res_ops sgx_epc_cgroup_ops = {
+	.alloc = sgx_epc_cgroup_alloc,
+	.free = sgx_epc_cgroup_free,
+};
+
+static void sgx_epc_misc_init(struct misc_cg *cg, struct sgx_epc_cgroup *epc_cg)
+{
+	cg->res[MISC_CG_RES_SGX_EPC].priv = epc_cg;
+	epc_cg->cg = cg;
+}
+
+static int sgx_epc_cgroup_alloc(struct misc_cg *cg)
+{
+	struct sgx_epc_cgroup *epc_cg;
+
+	epc_cg = kzalloc(sizeof(*epc_cg), GFP_KERNEL);
+	if (!epc_cg)
+		return -ENOMEM;
+
+	sgx_epc_misc_init(cg, epc_cg);
+
+	return 0;
+}
+
+void sgx_epc_cgroup_init(void)
+{
+	misc_cg_set_ops(MISC_CG_RES_SGX_EPC, &sgx_epc_cgroup_ops);
+	sgx_epc_misc_init(misc_cg_root(), &epc_cg_root);
+}
diff --git a/arch/x86/kernel/cpu/sgx/epc_cgroup.h b/arch/x86/kernel/cpu/sgx/epc_cgroup.h
new file mode 100644
index 000000000000..6b664b4c321f
--- /dev/null
+++ b/arch/x86/kernel/cpu/sgx/epc_cgroup.h
@@ -0,0 +1,73 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/* Copyright(c) 2022 Intel Corporation. */
+#ifndef _SGX_EPC_CGROUP_H_
+#define _SGX_EPC_CGROUP_H_
+
+#include
+#include
+#include
+#include
+#include
+#include
+
+#include "sgx.h"
+
+#ifndef CONFIG_CGROUP_SGX_EPC
+#define MISC_CG_RES_SGX_EPC MISC_CG_RES_TYPES
+struct sgx_epc_cgroup;
+
+static inline struct sgx_epc_cgroup *sgx_get_current_epc_cg(void)
+{
+	return NULL;
+}
+
+static inline void sgx_put_epc_cg(struct sgx_epc_cgroup *epc_cg) { }
+
+static inline int sgx_epc_cgroup_try_charge(struct sgx_epc_cgroup *epc_cg)
+{
+	return 0;
+}
+
+static inline void sgx_epc_cgroup_uncharge(struct sgx_epc_cgroup *epc_cg) { }
+
+static inline void sgx_epc_cgroup_init(void) { }
+#else
+struct sgx_epc_cgroup {
+	struct misc_cg *cg;
+};
+
+static inline struct sgx_epc_cgroup *sgx_epc_cgroup_from_misc_cg(struct misc_cg *cg)
+{
+	return (struct sgx_epc_cgroup *)(cg->res[MISC_CG_RES_SGX_EPC].priv);
+}
+
+/**
+ * sgx_get_current_epc_cg() - get the EPC cgroup of current process.
+ *
+ * Returned cgroup has its ref count increased by 1. Caller must call
+ * sgx_put_epc_cg() to return the reference.
+ *
+ * Return: EPC cgroup to which the current task belongs to.
+ */
+static inline struct sgx_epc_cgroup *sgx_get_current_epc_cg(void)
+{
+	return sgx_epc_cgroup_from_misc_cg(get_current_misc_cg());
+}
+
+/**
+ * sgx_put_epc_cg() - Put the EPC cgroup and reduce its ref count.
+ * @epc_cg - EPC cgroup to put.
+ */
+static inline void sgx_put_epc_cg(struct sgx_epc_cgroup *epc_cg)
+{
+	if (epc_cg)
+		put_misc_cg(epc_cg->cg);
+}
+
+int sgx_epc_cgroup_try_charge(struct sgx_epc_cgroup *epc_cg);
+void sgx_epc_cgroup_uncharge(struct sgx_epc_cgroup *epc_cg);
+void sgx_epc_cgroup_init(void);
+
+#endif
+
+#endif /* _SGX_EPC_CGROUP_H_ */
diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
index 166692f2d501..c32f18b70c73 100644
--- a/arch/x86/kernel/cpu/sgx/main.c
+++ b/arch/x86/kernel/cpu/sgx/main.c
@@ -6,6 +6,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -17,6 +18,7 @@
 #include "driver.h"
 #include "encl.h"
 #include "encls.h"
+#include "epc_cgroup.h"
 
 struct sgx_epc_section sgx_epc_sections[SGX_MAX_EPC_SECTIONS];
 static int sgx_nr_epc_sections;
@@ -558,7 +560,16 @@ int sgx_unmark_page_reclaimable(struct sgx_epc_page *page)
  */
 struct sgx_epc_page *sgx_alloc_epc_page(void *owner, bool reclaim)
 {
+	struct sgx_epc_cgroup *epc_cg;
 	struct sgx_epc_page *page;
+	int ret;
+
+	epc_cg = sgx_get_current_epc_cg();
+	ret = sgx_epc_cgroup_try_charge(epc_cg);
+	if (ret) {
+		sgx_put_epc_cg(epc_cg);
+		return ERR_PTR(ret);
+	}
 
 	for ( ; ; ) {
 		page = __sgx_alloc_epc_page();
@@ -567,8 +578,10 @@ struct sgx_epc_page *sgx_alloc_epc_page(void *owner, bool reclaim)
 			break;
 		}
 
-		if (list_empty(&sgx_active_page_list))
-			return ERR_PTR(-ENOMEM);
+		if (list_empty(&sgx_active_page_list)) {
+			page = ERR_PTR(-ENOMEM);
+			break;
+		}
 
 		if (!reclaim) {
 			page = ERR_PTR(-EBUSY);
@@ -580,10 +593,25 @@ struct sgx_epc_page *sgx_alloc_epc_page(void *owner, bool reclaim)
 			break;
 		}
 
+		/*
+		 * Need to do a global reclamation if cgroup was not full but free
+		 * physical pages run out, causing __sgx_alloc_epc_page() to fail.
+		 */
 		sgx_reclaim_pages();
 		cond_resched();
 	}
 
+#ifdef CONFIG_CGROUP_SGX_EPC
+	if (!IS_ERR(page)) {
+		WARN_ON_ONCE(page->epc_cg);
+		/* sgx_put_epc_cg() in sgx_free_epc_page() */
+		page->epc_cg = epc_cg;
+	} else {
+		sgx_epc_cgroup_uncharge(epc_cg);
+		sgx_put_epc_cg(epc_cg);
+	}
+#endif
+
 	if (sgx_should_reclaim(SGX_NR_LOW_PAGES))
 		wake_up(&ksgxd_waitq);
 
@@ -604,6 +632,14 @@ void sgx_free_epc_page(struct sgx_epc_page *page)
 	struct sgx_epc_section *section = &sgx_epc_sections[page->section];
 	struct sgx_numa_node *node = section->node;
 
+#ifdef CONFIG_CGROUP_SGX_EPC
+	if (page->epc_cg) {
+		sgx_epc_cgroup_uncharge(page->epc_cg);
+		sgx_put_epc_cg(page->epc_cg);
+		page->epc_cg = NULL;
+	}
+#endif
+
 	spin_lock(&node->lock);
 
 	page->owner = NULL;
@@ -643,6 +679,11 @@ static bool __init sgx_setup_epc_section(u64 phys_addr, u64 size,
 		section->pages[i].flags = 0;
 		section->pages[i].owner = NULL;
 		section->pages[i].poison = 0;
+
+#ifdef CONFIG_CGROUP_SGX_EPC
+		section->pages[i].epc_cg = NULL;
+#endif
+
 		list_add_tail(&section->pages[i].list, &sgx_dirty_page_list);
 	}
 
@@ -787,6 +828,7 @@ static void __init arch_update_sysfs_visibility(int nid) {}
 static bool __init sgx_page_cache_init(void)
 {
 	u32 eax, ebx, ecx, edx, type;
+	u64 capacity = 0;
 	u64 pa, size;
 	int nid;
 	int i;
@@ -837,6 +879,7 @@ static bool __init sgx_page_cache_init(void)
 
 		sgx_epc_sections[i].node = &sgx_numa_nodes[nid];
 		sgx_numa_nodes[nid].size += size;
+		capacity += size;
 
 		sgx_nr_epc_sections++;
 	}
@@ -846,6 +889,8 @@ static bool __init sgx_page_cache_init(void)
 		return false;
 	}
 
+	misc_cg_set_capacity(MISC_CG_RES_SGX_EPC, capacity);
+
 	return true;
 }
 
@@ -942,6 +987,9 @@ static int __init sgx_init(void)
 	if (sgx_vepc_init() && ret)
 		goto err_provision;
 
+	/* Setup cgroup if either the native or vepc driver is active */
+	sgx_epc_cgroup_init();
+
 	return 0;
 
 err_provision:
diff --git a/arch/x86/kernel/cpu/sgx/sgx.h b/arch/x86/kernel/cpu/sgx/sgx.h
index d2dad21259a8..a898d86dead0 100644
--- a/arch/x86/kernel/cpu/sgx/sgx.h
+++ b/arch/x86/kernel/cpu/sgx/sgx.h
@@ -29,12 +29,17 @@
 /* Pages on free list */
 #define SGX_EPC_PAGE_IS_FREE BIT(1)
 
+struct sgx_epc_cgroup;
+
 struct sgx_epc_page {
 	unsigned int section;
 	u16 flags;
 	u16 poison;
 	struct sgx_encl_page *owner;
 	struct list_head list;
+#ifdef CONFIG_CGROUP_SGX_EPC
+	struct sgx_epc_cgroup *epc_cg;
+#endif
 };
 
 /*
diff --git a/include/linux/misc_cgroup.h b/include/linux/misc_cgroup.h
index 2f6cc3a0ad23..1a16efdfcd3d 100644
--- a/include/linux/misc_cgroup.h
+++ b/include/linux/misc_cgroup.h
@@ -46,11 +46,13 @@ struct misc_res_ops {
  * @max: Maximum limit on the resource.
  * @usage: Current usage of the resource.
  * @events: Number of times, the resource limit exceeded.
+ * @priv: resource specific data.
  */
 struct misc_res {
 	u64 max;
 	atomic64_t usage;
 	atomic64_t events;
+	void *priv;
 };
 
 /**

From patchwork Tue Jan 30 02:09:28 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Haitao Huang
X-Patchwork-Id: 193876
From: Haitao Huang
To: jarkko@kernel.org, dave.hansen@linux.intel.com, tj@kernel.org,
    mkoutny@suse.com, linux-kernel@vger.kernel.org,
    linux-sgx@vger.kernel.org, x86@kernel.org, cgroups@vger.kernel.org,
    tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, hpa@zytor.com,
    sohil.mehta@intel.com
Cc: zhiquan1.li@intel.com, kristen@linux.intel.com, seanjc@google.com,
    zhanb@microsoft.com, anakrish@microsoft.com,
    mikko.ylinen@linux.intel.com, yangjie@microsoft.com,
    chrisyan@microsoft.com
Subject: [PATCH v8 05/15] x86/sgx: Add sgx_epc_lru_list to encapsulate LRU list
Date: Mon, 29 Jan 2024 18:09:28 -0800
Message-Id: <20240130020938.10025-6-haitao.huang@linux.intel.com>
X-Mailer: git-send-email 2.25.1
In-Reply-To: <20240130020938.10025-1-haitao.huang@linux.intel.com>
References: <20240130020938.10025-1-haitao.huang@linux.intel.com>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0

From: Sean Christopherson

Introduce a data structure to wrap the existing reclaimable list and its
spinlock. Each cgroup will later have one instance of this structure to
track EPC pages allocated for processes associated with the same cgroup.
Just like the global SGX reclaimer (ksgxd), an EPC cgroup reclaims pages
from the reclaimable list in this structure when its usage approaches
its limit.

Use this structure to encapsulate the LRU list and its lock used by the
global reclaimer.
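
As an illustration only, the wrapper described above can be mimicked in
user space. The sketch below is not kernel code: struct lru_list, struct
list_node, struct page and the lru_*() helpers are hypothetical names, a
C11 atomic_flag stands in for spinlock_t, and a hand-rolled circular list
stands in for struct list_head.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's struct list_head. */
struct list_node {
	struct list_node *prev, *next;
};

/* Hypothetical user-space analogue of struct sgx_epc_lru_list. */
struct lru_list {
	atomic_flag lock;             /* stands in for spinlock_t */
	struct list_node reclaimable; /* list head, oldest entry first */
};

/* Hypothetical stand-in for struct sgx_epc_page. */
struct page {
	int id;
	struct list_node list;
};

static void lru_lock(struct lru_list *lru)
{
	/* Busy-wait until the flag was previously clear. */
	while (atomic_flag_test_and_set_explicit(&lru->lock, memory_order_acquire))
		;
}

static void lru_unlock(struct lru_list *lru)
{
	atomic_flag_clear_explicit(&lru->lock, memory_order_release);
}

/* Counterpart of sgx_lru_init(): unlocked, empty list. */
static void lru_init(struct lru_list *lru)
{
	atomic_flag_clear(&lru->lock);
	lru->reclaimable.prev = lru->reclaimable.next = &lru->reclaimable;
}

/* Append at the tail under the lock, like list_add_tail() in the patch. */
static void lru_add_tail(struct lru_list *lru, struct page *p)
{
	lru_lock(lru);
	p->list.prev = lru->reclaimable.prev;
	p->list.next = &lru->reclaimable;
	lru->reclaimable.prev->next = &p->list;
	lru->reclaimable.prev = &p->list;
	lru_unlock(lru);
}

/* Detach and return the oldest page, or NULL if the list is empty. */
static struct page *lru_pop_first(struct lru_list *lru)
{
	struct page *p = NULL;

	lru_lock(lru);
	if (lru->reclaimable.next != &lru->reclaimable) {
		struct list_node *n = lru->reclaimable.next;

		n->prev->next = n->next;
		n->next->prev = n->prev;
		n->next = n->prev = n;
		/* Recover the containing page from its embedded node. */
		p = (struct page *)((char *)n - offsetof(struct page, list));
	}
	lru_unlock(lru);
	return p;
}
```

Keeping the lock and the list head in one structure means a caller can
never take the right lock for the wrong list, which is what lets each
cgroup later own an independent instance.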

Signed-off-by: Sean Christopherson
Co-developed-by: Kristen Carlson Accardi
Signed-off-by: Kristen Carlson Accardi
Co-developed-by: Haitao Huang
Signed-off-by: Haitao Huang
Cc: Sean Christopherson
Reviewed-by: Jarkko Sakkinen
---
V6:
- removed introduction to unreclaimables in commit message.

V4:
- Removed unneeded comments for the spinlock and the non-reclaimables.
  (Kai, Jarkko)
- Revised the commit to add introduction comments for unreclaimables and
  multiple LRU lists.(Kai)
- Reordered the patches: delay all changes for unreclaimables to
  later, and this one becomes the first change in the SGX subsystem.

V3:
- Removed the helper functions and revised commit messages.
---
 arch/x86/kernel/cpu/sgx/main.c | 39 +++++++++++++++++-----------------
 arch/x86/kernel/cpu/sgx/sgx.h  | 15 +++++++++++++
 2 files changed, 35 insertions(+), 19 deletions(-)

diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
index c32f18b70c73..912959c7ecc9 100644
--- a/arch/x86/kernel/cpu/sgx/main.c
+++ b/arch/x86/kernel/cpu/sgx/main.c
@@ -28,10 +28,9 @@ static DEFINE_XARRAY(sgx_epc_address_space);
 
 /*
  * These variables are part of the state of the reclaimer, and must be accessed
- * with sgx_reclaimer_lock acquired.
+ * with sgx_global_lru.lock acquired.
  */
-static LIST_HEAD(sgx_active_page_list);
-static DEFINE_SPINLOCK(sgx_reclaimer_lock);
+static struct sgx_epc_lru_list sgx_global_lru;
 
 static atomic_long_t sgx_nr_free_pages = ATOMIC_LONG_INIT(0);
 
@@ -306,13 +305,13 @@ static void sgx_reclaim_pages(void)
 	int ret;
 	int i;
 
-	spin_lock(&sgx_reclaimer_lock);
+	spin_lock(&sgx_global_lru.lock);
 	for (i = 0; i < SGX_NR_TO_SCAN; i++) {
-		if (list_empty(&sgx_active_page_list))
+		epc_page = list_first_entry_or_null(&sgx_global_lru.reclaimable,
+						    struct sgx_epc_page, list);
+		if (!epc_page)
 			break;
 
-		epc_page = list_first_entry(&sgx_active_page_list,
-					    struct sgx_epc_page, list);
 		list_del_init(&epc_page->list);
 		encl_page = epc_page->owner;
 
@@ -324,7 +323,7 @@ static void sgx_reclaim_pages(void)
 			 */
 			epc_page->flags &= ~SGX_EPC_PAGE_RECLAIMER_TRACKED;
 	}
-	spin_unlock(&sgx_reclaimer_lock);
+	spin_unlock(&sgx_global_lru.lock);
 
 	for (i = 0; i < cnt; i++) {
 		epc_page = chunk[i];
@@ -347,9 +346,9 @@ static void sgx_reclaim_pages(void)
 		continue;
 
 skip:
-		spin_lock(&sgx_reclaimer_lock);
-		list_add_tail(&epc_page->list, &sgx_active_page_list);
-		spin_unlock(&sgx_reclaimer_lock);
+		spin_lock(&sgx_global_lru.lock);
+		list_add_tail(&epc_page->list, &sgx_global_lru.reclaimable);
+		spin_unlock(&sgx_global_lru.lock);
 
 		kref_put(&encl_page->encl->refcount, sgx_encl_release);
 
@@ -380,7 +379,7 @@ static void sgx_reclaim_pages(void)
 static bool sgx_should_reclaim(unsigned long watermark)
 {
 	return atomic_long_read(&sgx_nr_free_pages) < watermark &&
-	       !list_empty(&sgx_active_page_list);
+	       !list_empty(&sgx_global_lru.reclaimable);
 }
 
 /*
@@ -432,6 +431,8 @@ static bool __init sgx_page_reclaimer_init(void)
 
 	ksgxd_tsk = tsk;
 
+	sgx_lru_init(&sgx_global_lru);
+
 	return true;
 }
 
@@ -507,10 +508,10 @@ struct sgx_epc_page *__sgx_alloc_epc_page(void)
  */
 void sgx_mark_page_reclaimable(struct sgx_epc_page *page)
 {
-	spin_lock(&sgx_reclaimer_lock);
+	spin_lock(&sgx_global_lru.lock);
 	page->flags |= SGX_EPC_PAGE_RECLAIMER_TRACKED;
-	list_add_tail(&page->list, &sgx_active_page_list);
-	spin_unlock(&sgx_reclaimer_lock);
+	list_add_tail(&page->list, &sgx_global_lru.reclaimable);
+	spin_unlock(&sgx_global_lru.lock);
 }
 
 /**
@@ -525,18 +526,18 @@ void sgx_mark_page_reclaimable(struct sgx_epc_page *page)
  */
 int sgx_unmark_page_reclaimable(struct sgx_epc_page *page)
 {
-	spin_lock(&sgx_reclaimer_lock);
+	spin_lock(&sgx_global_lru.lock);
 	if (page->flags & SGX_EPC_PAGE_RECLAIMER_TRACKED) {
 		/* The page is being reclaimed. */
 		if (list_empty(&page->list)) {
-			spin_unlock(&sgx_reclaimer_lock);
+			spin_unlock(&sgx_global_lru.lock);
 			return -EBUSY;
 		}
 
 		list_del(&page->list);
 		page->flags &= ~SGX_EPC_PAGE_RECLAIMER_TRACKED;
 	}
-	spin_unlock(&sgx_reclaimer_lock);
+	spin_unlock(&sgx_global_lru.lock);
 
 	return 0;
 }
@@ -578,7 +579,7 @@ struct sgx_epc_page *sgx_alloc_epc_page(void *owner, bool reclaim)
 			break;
 		}
 
-		if (list_empty(&sgx_active_page_list)) {
+		if (list_empty(&sgx_global_lru.reclaimable)) {
 			page = ERR_PTR(-ENOMEM);
 			break;
 		}
diff --git a/arch/x86/kernel/cpu/sgx/sgx.h b/arch/x86/kernel/cpu/sgx/sgx.h
index a898d86dead0..0e99e9ae3a67 100644
--- a/arch/x86/kernel/cpu/sgx/sgx.h
+++ b/arch/x86/kernel/cpu/sgx/sgx.h
@@ -88,6 +88,21 @@ static inline void *sgx_get_epc_virt_addr(struct sgx_epc_page *page)
 	return section->virt_addr + index * PAGE_SIZE;
 }
 
+/*
+ * Contains EPC pages tracked by the global reclaimer (ksgxd) or an EPC
+ * cgroup.
+ */ +struct sgx_epc_lru_list { + spinlock_t lock; + struct list_head reclaimable; +}; + +static inline void sgx_lru_init(struct sgx_epc_lru_list *lru) +{ + spin_lock_init(&lru->lock); + INIT_LIST_HEAD(&lru->reclaimable); +} + struct sgx_epc_page *__sgx_alloc_epc_page(void); void sgx_free_epc_page(struct sgx_epc_page *page); From patchwork Tue Jan 30 02:09:29 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Haitao Huang X-Patchwork-Id: 193838 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a05:7301:2087:b0:106:209c:c626 with SMTP id gs7csp1007537dyb; Mon, 29 Jan 2024 21:12:20 -0800 (PST) X-Google-Smtp-Source: AGHT+IFZZlbiYr78yd+3JyJ5czDvJyELE6cPP7+JcJuPlPiyVAgWZ2A9yamPos2qZEBPuyvdf330 X-Received: by 2002:a17:90b:241:b0:295:ab4b:cfc2 with SMTP id fz1-20020a17090b024100b00295ab4bcfc2mr758799pjb.34.1706591540699; Mon, 29 Jan 2024 21:12:20 -0800 (PST) ARC-Seal: i=2; a=rsa-sha256; t=1706591540; cv=pass; d=google.com; s=arc-20160816; b=MLNgsaohXWr5xGqWloBOU8idOpfiR5Q9N5ZMI4JTt2TFZgUsHtOu6e/G7UwwjQ0BxR QtuFhYlpym0N+kk7acbd3RFjHjndwC4DXd/P1F9I7EaCcQ4djAxzhIuA1zk5U4lj911H Ut7xVrIH4dYLuaHuLWea7gozMYGTKx4n+ucgGJbQ9gpWkwyopz8rOt783M6paTM5vLTQ m8NecbO4uWQat/LZnDzv1O6QpJRyuPyl3rrIZCFwOdzG0UihS6ju/DtASOsmMs2g59ab DEXG85DbCw0Pxy1CNPKgSooMveRQfzAsiOl457RmIaR8tAOqkSL2MZ0f++++Ky4QmIb5 w2Jg== ARC-Message-Signature: i=2; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=content-transfer-encoding:mime-version:list-unsubscribe :list-subscribe:list-id:precedence:references:in-reply-to:message-id :date:subject:cc:to:from:dkim-signature; bh=05G0N7/0nyHKuQd7Pt/MZTSp0rQF8knxWgY1c/kVV7w=; fh=dtGru4wMzkfh5N4HjO7hjgtPmzQheD7FyA8qGNiECvw=; b=c5bTa6EtW86COH3oaBspaOIUQ5CfOhJ0PRgIf1fA6mVrLLpnOhcImoME4NWVPEK5/d QI7eriSZV+2/undsXpnw3xj7BcXGFcxJ3Mnf2j8OdbBExOMw0XW6vdsCatIB0Quc3kOO eJSmQUKVVg6xZDGd9CjTAGQi4z4HtihnFfjVj2gcvcKeOsj5n57EooZo0HlSRBQsu3gF 
6zQqvKtdny+l6s62NGgXWw2IYpiCjrJv6DvTvfHEyHqqihtzPvrGqyEZNHDEYvcukyIL 5w0FxV5+fqjIDjCnQtQpqKYvO557DXl1nVN6HY1aOko6LMXJ85yHXiyMC1GNZrph0X5k gk/A== ARC-Authentication-Results: i=2; mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=E6NIUZBM; arc=pass (i=1 dkim=pass dkdomain=intel.com dmarc=pass fromdomain=linux.intel.com); spf=pass (google.com: domain of linux-kernel+bounces-43786-ouuuleilei=gmail.com@vger.kernel.org designates 139.178.88.99 as permitted sender) smtp.mailfrom="linux-kernel+bounces-43786-ouuuleilei=gmail.com@vger.kernel.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: from sv.mirrors.kernel.org (sv.mirrors.kernel.org. [139.178.88.99]) by mx.google.com with ESMTPS id y7-20020a17090a644700b00290ef34bc86si6878963pjm.85.2024.01.29.21.12.20 for (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 29 Jan 2024 21:12:20 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel+bounces-43786-ouuuleilei=gmail.com@vger.kernel.org designates 139.178.88.99 as permitted sender) client-ip=139.178.88.99; Authentication-Results: mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=E6NIUZBM; arc=pass (i=1 dkim=pass dkdomain=intel.com dmarc=pass fromdomain=linux.intel.com); spf=pass (google.com: domain of linux-kernel+bounces-43786-ouuuleilei=gmail.com@vger.kernel.org designates 139.178.88.99 as permitted sender) smtp.mailfrom="linux-kernel+bounces-43786-ouuuleilei=gmail.com@vger.kernel.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: from smtp.subspace.kernel.org (wormhole.subspace.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by sv.mirrors.kernel.org (Postfix) with ESMTPS id 301422882D1 for ; Tue, 30 Jan 2024 02:12:44 +0000 (UTC) Received: from localhost.localdomain (localhost.localdomain [127.0.0.1]) by smtp.subspace.kernel.org (Postfix) with ESMTP id B70FF5465D; Tue, 30 Jan 
2024 02:09:50 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="E6NIUZBM" Received: from mgamail.intel.com (mgamail.intel.com [198.175.65.10]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 79E5A37171; Tue, 30 Jan 2024 02:09:43 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=198.175.65.10 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1706580585; cv=none; b=FEmKOyfU3/hR1eYGSfgWCKPLowbWfrh3XNbh4T0dm9f4wiHMlSaeqSmoFO0zanmWYHShrvBCbzGfMdJzgsu28rmBxw+1AmBNv9x80BWh1Utoi2lpqWRsC/+p5e+GJKQEQwzbFJkDigL0i/dnH12+tb78dgQuIT7cYB2azvq4how= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1706580585; c=relaxed/simple; bh=riG0vI3pb2HvvMY7hIMcLvUSRS9WHe6ioFIvFSpXVKo=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=jJkqUhETbNyrdhGVn3gJtAtPu8YFeoOb9A4EK+YZ1Kzs+kDnKeuVoxJCj+4N127oOVx/DPWdgg8Rv7AZ5StEZjI/bUEB/QjOZUYE0zLXG+6v6qz6Ftp3zj2oRqAbyNE9hcbjToTI6UsKfjP+zyDx2mCbvM9EvPycKov2FYOsLLA= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com; spf=none smtp.mailfrom=linux.intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=E6NIUZBM; arc=none smtp.client-ip=198.175.65.10 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com Authentication-Results: smtp.subspace.kernel.org; spf=none smtp.mailfrom=linux.intel.com DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1706580584; x=1738116584; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=riG0vI3pb2HvvMY7hIMcLvUSRS9WHe6ioFIvFSpXVKo=; 
b=E6NIUZBMX941xXlFOw6bk2EFf6NJrz7q2+u3S+IRyQfw6DjTgogyA3V2 DvL0zYzRhXiCil6Z+trmKW4eKKbXTBCFP+2T5lVk74irPdE3/GNn1XXJh 6EhUiiyTfyH+Xq4nGS+gQhfe1pY2EIkMbMhjtJTzW1F20zNmoRQatTcjl 81397iMisG+pX6BU2RmN5Z624b1JUAy59vjP9RpWe4cdKAhNYW5PG+gac foZiHVuZ075zha48gtS/cSXKXGvPw5KccE7p9nwCXdp7H7VmogorvT17D RDXSqQ8m/8Xmk7xhsIYwV6E01QNYDlQHBEWeV9ntcPC1ogwwk69/NkeyE g==; X-IronPort-AV: E=McAfee;i="6600,9927,10968"; a="16530959" X-IronPort-AV: E=Sophos;i="6.05,707,1701158400"; d="scan'208";a="16530959" Received: from orsmga001.jf.intel.com ([10.7.209.18]) by orvoesa102.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 29 Jan 2024 18:09:40 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10968"; a="822042336" X-IronPort-AV: E=Sophos;i="6.05,707,1701158400"; d="scan'208";a="822042336" Received: from b4969161e530.jf.intel.com ([10.165.56.46]) by orsmga001.jf.intel.com with ESMTP; 29 Jan 2024 18:09:39 -0800 From: Haitao Huang To: jarkko@kernel.org, dave.hansen@linux.intel.com, tj@kernel.org, mkoutny@suse.com, linux-kernel@vger.kernel.org, linux-sgx@vger.kernel.org, x86@kernel.org, cgroups@vger.kernel.org, tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, hpa@zytor.com, sohil.mehta@intel.com Cc: zhiquan1.li@intel.com, kristen@linux.intel.com, seanjc@google.com, zhanb@microsoft.com, anakrish@microsoft.com, mikko.ylinen@linux.intel.com, yangjie@microsoft.com, chrisyan@microsoft.com Subject: [PATCH v8 06/15] x86/sgx: Abstract tracking reclaimable pages in LRU Date: Mon, 29 Jan 2024 18:09:29 -0800 Message-Id: <20240130020938.10025-7-haitao.huang@linux.intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20240130020938.10025-1-haitao.huang@linux.intel.com> References: <20240130020938.10025-1-haitao.huang@linux.intel.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-getmail-retrieved-from-mailbox: INBOX X-GMAIL-THRID: 1789490931319167934 X-GMAIL-MSGID: 1789490931319167934 From: Kristen Carlson 
Accardi The functions, sgx_{mark,unmark}_page_reclaimable(), manage the tracking of reclaimable EPC pages: sgx_mark_page_reclaimable() adds a newly allocated page into the global LRU list while sgx_unmark_page_reclaimable() does the opposite. Abstract the hard coded global LRU references in these functions to make them reusable when pages are tracked in per-cgroup LRUs. Create a helper, sgx_lru_list(), that returns the LRU that tracks a given EPC page. It simply returns the global LRU now, and will later return the LRU of the cgroup within which the EPC page was allocated. Replace the hard coded global LRU with a call to this helper. Next patches will first get the cgroup reclamation flow ready while keeping pages tracked in the global LRU and reclaimed by ksgxd before we make the switch in the end for sgx_lru_list() to return per-cgroup LRU. Co-developed-by: Sean Christopherson Signed-off-by: Sean Christopherson Signed-off-by: Kristen Carlson Accardi Co-developed-by: Haitao Huang Signed-off-by: Haitao Huang Reviewed-by: Jarkko Sakkinen --- V7: - Split this out from the big patch, #10 in V6. (Dave, Kai) --- arch/x86/kernel/cpu/sgx/main.c | 30 ++++++++++++++++++------------ 1 file changed, 18 insertions(+), 12 deletions(-) diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c index 912959c7ecc9..a131aa985c95 100644 --- a/arch/x86/kernel/cpu/sgx/main.c +++ b/arch/x86/kernel/cpu/sgx/main.c @@ -32,6 +32,11 @@ static DEFINE_XARRAY(sgx_epc_address_space); */ static struct sgx_epc_lru_list sgx_global_lru; +static inline struct sgx_epc_lru_list *sgx_lru_list(struct sgx_epc_page *epc_page) +{ + return &sgx_global_lru; +} + static atomic_long_t sgx_nr_free_pages = ATOMIC_LONG_INIT(0); /* Nodes with one or more EPC sections. */ @@ -500,25 +505,24 @@ struct sgx_epc_page *__sgx_alloc_epc_page(void) } /** - * sgx_mark_page_reclaimable() - Mark a page as reclaimable + * sgx_mark_page_reclaimable() - Mark a page as reclaimable and track it in a LRU. 
  * @page: EPC page
- *
- * Mark a page as reclaimable and add it to the active page list. Pages
- * are automatically removed from the active list when freed.
  */
 void sgx_mark_page_reclaimable(struct sgx_epc_page *page)
 {
-	spin_lock(&sgx_global_lru.lock);
+	struct sgx_epc_lru_list *lru = sgx_lru_list(page);
+
+	spin_lock(&lru->lock);
 	page->flags |= SGX_EPC_PAGE_RECLAIMER_TRACKED;
-	list_add_tail(&page->list, &sgx_global_lru.reclaimable);
-	spin_unlock(&sgx_global_lru.lock);
+	list_add_tail(&page->list, &lru->reclaimable);
+	spin_unlock(&lru->lock);
 }
 
 /**
- * sgx_unmark_page_reclaimable() - Remove a page from the reclaim list
+ * sgx_unmark_page_reclaimable() - Remove a page from its tracking LRU
  * @page: EPC page
  *
- * Clear the reclaimable flag and remove the page from the active page list.
+ * Clear the reclaimable flag if set and remove the page from its LRU.
  *
  * Return:
  *   0 on success,
@@ -526,18 +530,20 @@ void sgx_mark_page_reclaimable(struct sgx_epc_page *page)
  */
 int sgx_unmark_page_reclaimable(struct sgx_epc_page *page)
 {
-	spin_lock(&sgx_global_lru.lock);
+	struct sgx_epc_lru_list *lru = sgx_lru_list(page);
+
+	spin_lock(&lru->lock);
 	if (page->flags & SGX_EPC_PAGE_RECLAIMER_TRACKED) {
 		/* The page is being reclaimed. */
 		if (list_empty(&page->list)) {
-			spin_unlock(&sgx_global_lru.lock);
+			spin_unlock(&lru->lock);
 			return -EBUSY;
 		}
 
 		list_del(&page->list);
 		page->flags &= ~SGX_EPC_PAGE_RECLAIMER_TRACKED;
 	}
-	spin_unlock(&sgx_global_lru.lock);
+	spin_unlock(&lru->lock);
 
 	return 0;
 }

From patchwork Tue Jan 30 02:09:30 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Haitao Huang
X-Patchwork-Id: 193834
From: Haitao Huang
To: jarkko@kernel.org, dave.hansen@linux.intel.com, tj@kernel.org,
 mkoutny@suse.com, linux-kernel@vger.kernel.org, linux-sgx@vger.kernel.org,
 x86@kernel.org, cgroups@vger.kernel.org, tglx@linutronix.de,
 mingo@redhat.com, bp@alien8.de, hpa@zytor.com, sohil.mehta@intel.com
Cc: zhiquan1.li@intel.com, kristen@linux.intel.com, seanjc@google.com,
 zhanb@microsoft.com, anakrish@microsoft.com, mikko.ylinen@linux.intel.com,
 yangjie@microsoft.com, chrisyan@microsoft.com
Subject: [PATCH v8 07/15] x86/sgx: Expose sgx_reclaim_pages() for cgroup
Date: Mon, 29 Jan 2024 18:09:30 -0800
Message-Id: <20240130020938.10025-8-haitao.huang@linux.intel.com>
In-Reply-To: <20240130020938.10025-1-haitao.huang@linux.intel.com>
References: <20240130020938.10025-1-haitao.huang@linux.intel.com>

From: Sean Christopherson

Each EPC cgroup will have an LRU structure to track reclaimable EPC pages.
When a cgroup's usage reaches its limit, the cgroup needs to reclaim pages
from its own LRU or the LRUs of its descendants to make room for new
allocations.

To prepare for per-cgroup reclamation, expose the top-level reclamation
function, sgx_reclaim_pages(), in the header file for reuse. Add a
parameter for passing in an LRU so that cgroups can pass in different
tracking LRUs later. Add another parameter for passing in the number of
pages to scan, and make the function return the number of pages reclaimed,
because a cgroup reclaimer may need to track reclamation progress across
its descendants and adjust the number of pages to scan in subsequent calls.

Create a wrapper for the global reclaimer, sgx_reclaim_pages_global(),
that simply calls this function with the global LRU. When per-cgroup LRUs
are added later, the wrapper will perform global reclamation from the root
cgroup.

Signed-off-by: Sean Christopherson
Co-developed-by: Kristen Carlson Accardi
Signed-off-by: Kristen Carlson Accardi
Co-developed-by: Haitao Huang
Signed-off-by: Haitao Huang
Reviewed-by: Jarkko Sakkinen
---
V8:
- Use width of 80 characters in text paragraphs. (Jarkko)

V7:
- Reworked from patch 9 of V6, "x86/sgx: Restructure top-level EPC reclaim
  function". Do not split the top level function (Kai)
- Dropped patches 7 and 8 of V6.
---
 arch/x86/kernel/cpu/sgx/main.c | 53 +++++++++++++++++++++++----------
 arch/x86/kernel/cpu/sgx/sgx.h  |  1 +
 2 files changed, 37 insertions(+), 17 deletions(-)

diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
index a131aa985c95..4f5824c4751d 100644
--- a/arch/x86/kernel/cpu/sgx/main.c
+++ b/arch/x86/kernel/cpu/sgx/main.c
@@ -286,11 +286,13 @@ static void sgx_reclaimer_write(struct sgx_epc_page *epc_page,
 	mutex_unlock(&encl->lock);
 }
 
-/*
- * Take a fixed number of pages from the head of the active page pool and
- * reclaim them to the enclave's private shmem files. Skip the pages, which have
- * been accessed since the last scan. Move those pages to the tail of active
- * page pool so that the pages get scanned in LRU like fashion.
+/**
+ * sgx_reclaim_pages() - Reclaim a fixed number of pages from an LRU
+ *
+ * Take a fixed number of pages from the head of a given LRU and reclaim them to
+ * the enclave's private shmem files. Skip the pages, which have been accessed
+ * since the last scan. Move those pages to the tail of the list so that the
+ * pages get scanned in LRU like fashion.
  *
  * Batch process a chunk of pages (at the moment 16) in order to degrade amount
  * of IPI's and ETRACK's potentially required. sgx_encl_ewb() does degrade a bit
@@ -298,8 +300,13 @@ static void sgx_reclaimer_write(struct sgx_epc_page *epc_page,
  * + EWB) but not sufficiently. Reclaiming one page at a time would also be
  * problematic as it would increase the lock contention too much, which would
  * halt forward progress.
+ *
+ * @lru:	The LRU from which pages are reclaimed.
+ * @nr_to_scan: Pointer to the target number of pages to scan, must be less than
+ *		SGX_NR_TO_SCAN.
+ * Return:	Number of pages reclaimed.
*/ -static void sgx_reclaim_pages(void) +unsigned int sgx_reclaim_pages(struct sgx_epc_lru_list *lru, unsigned int *nr_to_scan) { struct sgx_epc_page *chunk[SGX_NR_TO_SCAN]; struct sgx_backing backing[SGX_NR_TO_SCAN]; @@ -310,10 +317,10 @@ static void sgx_reclaim_pages(void) int ret; int i; - spin_lock(&sgx_global_lru.lock); - for (i = 0; i < SGX_NR_TO_SCAN; i++) { - epc_page = list_first_entry_or_null(&sgx_global_lru.reclaimable, - struct sgx_epc_page, list); + spin_lock(&lru->lock); + + for (; *nr_to_scan > 0; --(*nr_to_scan)) { + epc_page = list_first_entry_or_null(&lru->reclaimable, struct sgx_epc_page, list); if (!epc_page) break; @@ -328,7 +335,8 @@ static void sgx_reclaim_pages(void) */ epc_page->flags &= ~SGX_EPC_PAGE_RECLAIMER_TRACKED; } - spin_unlock(&sgx_global_lru.lock); + + spin_unlock(&lru->lock); for (i = 0; i < cnt; i++) { epc_page = chunk[i]; @@ -351,9 +359,9 @@ static void sgx_reclaim_pages(void) continue; skip: - spin_lock(&sgx_global_lru.lock); - list_add_tail(&epc_page->list, &sgx_global_lru.reclaimable); - spin_unlock(&sgx_global_lru.lock); + spin_lock(&lru->lock); + list_add_tail(&epc_page->list, &lru->reclaimable); + spin_unlock(&lru->lock); kref_put(&encl_page->encl->refcount, sgx_encl_release); @@ -366,6 +374,7 @@ static void sgx_reclaim_pages(void) sgx_reclaimer_block(epc_page); } + ret = 0; for (i = 0; i < cnt; i++) { epc_page = chunk[i]; if (!epc_page) @@ -378,7 +387,10 @@ static void sgx_reclaim_pages(void) epc_page->flags &= ~SGX_EPC_PAGE_RECLAIMER_TRACKED; sgx_free_epc_page(epc_page); + ret++; } + + return (unsigned int)ret; } static bool sgx_should_reclaim(unsigned long watermark) @@ -387,6 +399,13 @@ static bool sgx_should_reclaim(unsigned long watermark) !list_empty(&sgx_global_lru.reclaimable); } +static void sgx_reclaim_pages_global(void) +{ + unsigned int nr_to_scan = SGX_NR_TO_SCAN; + + sgx_reclaim_pages(&sgx_global_lru, &nr_to_scan); +} + /* * sgx_reclaim_direct() should be called (without enclave's mutex held) * in locations 
where SGX memory resources might be low and might be @@ -395,7 +414,7 @@ static bool sgx_should_reclaim(unsigned long watermark) void sgx_reclaim_direct(void) { if (sgx_should_reclaim(SGX_NR_LOW_PAGES)) - sgx_reclaim_pages(); + sgx_reclaim_pages_global(); } static int ksgxd(void *p) @@ -418,7 +437,7 @@ static int ksgxd(void *p) sgx_should_reclaim(SGX_NR_HIGH_PAGES)); if (sgx_should_reclaim(SGX_NR_HIGH_PAGES)) - sgx_reclaim_pages(); + sgx_reclaim_pages_global(); cond_resched(); } @@ -604,7 +623,7 @@ struct sgx_epc_page *sgx_alloc_epc_page(void *owner, bool reclaim) * Need to do a global reclamation if cgroup was not full but free * physical pages run out, causing __sgx_alloc_epc_page() to fail. */ - sgx_reclaim_pages(); + sgx_reclaim_pages_global(); cond_resched(); } diff --git a/arch/x86/kernel/cpu/sgx/sgx.h b/arch/x86/kernel/cpu/sgx/sgx.h index 0e99e9ae3a67..2593c013d091 100644 --- a/arch/x86/kernel/cpu/sgx/sgx.h +++ b/arch/x86/kernel/cpu/sgx/sgx.h @@ -110,6 +110,7 @@ void sgx_reclaim_direct(void); void sgx_mark_page_reclaimable(struct sgx_epc_page *page); int sgx_unmark_page_reclaimable(struct sgx_epc_page *page); struct sgx_epc_page *sgx_alloc_epc_page(void *owner, bool reclaim); +unsigned int sgx_reclaim_pages(struct sgx_epc_lru_list *lru, unsigned int *nr_to_scan); void sgx_ipi_cb(void *info); From patchwork Tue Jan 30 02:09:31 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Haitao Huang X-Patchwork-Id: 193835 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a05:7301:2087:b0:106:209c:c626 with SMTP id gs7csp999935dyb; Mon, 29 Jan 2024 20:49:41 -0800 (PST) X-Google-Smtp-Source: AGHT+IGWDIfWDggZ8QG0+TTGCstvNSAEwjBwhSadpDA0EcyfmcpnlOXGKkEvTT/RXEI1lzG/6TkX X-Received: by 2002:aca:2102:0:b0:3bd:c19f:2fa3 with SMTP id 2-20020aca2102000000b003bdc19f2fa3mr8503675oiz.51.1706590181126; Mon, 29 Jan 2024 20:49:41 -0800 (PST) ARC-Seal: i=2; a=rsa-sha256; t=1706590181; cv=pass; 
d=google.com; s=arc-20160816; b=V9sjNSvO5JFp3YAqIu2SI76SHV8NAf1OiGJ2gbRBOxbtfgTghrpqAYGGG/zwHOKbNZ 3DjQDfhD6Qnl5YMPlHe9KpDVQw3xNErFcOVATPWBOdFUPKC4xUv48Wzqw1GVD1RzGp2c 1SC10UKbbJJYfIXCinH7p73f620WWEu7a7lRyJrtxbBLIyifLHVZZwKbLkNep89U6ltD zNw89/+U1GqJHgLFUszcuGHWy5EjnmFrReRaUk6C+hKQ9n2HwtQ7KXkwZ7PlTJPYYt1k 4/S64AiExXTjSkLzF/GT5CQNyv4lGD2Yc/eJaQGLRz78X9PYPUX9Du+29iZCc70fscOd 0sgA== ARC-Message-Signature: i=2; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=content-transfer-encoding:mime-version:list-unsubscribe :list-subscribe:list-id:precedence:references:in-reply-to:message-id :date:subject:cc:to:from:dkim-signature; bh=xy18kDQjaXZL4MWx+jhp89HjpO/Erx6yaw9znfwFLcU=; fh=dtGru4wMzkfh5N4HjO7hjgtPmzQheD7FyA8qGNiECvw=; b=Ue9T3ffdAFat+4oOnpUU1xs/o+MkKlUUUQ+FBJ1x4GnK8u3TBCC46wUMn+bZeUDFxf BM+5dLzYOnNYqhraLUSbNKYfYkn/o3crWGQtWiRTcByhD7T/FtvoF9yrL5Rk92iK31N0 vn1QNBGeSLgiYlRJHiTLHdCGb0b+RDO0/4a9Q2tD69sGz1AKJlIETdHOF/YoX1W/nwO2 CMgz9pH+LG2nvbNo/j3VJmOwHR3v3Q7RLWINHeLRaRbtnj0QKM67sOIR7YUWzJpTayIx Ea70Jg8E2N4sZ8Cwok4xdSbpGa5kuenKNr7dwBxtmHInQr+lSPW3n/T9Jt7U+TWJ4ax4 xylg== ARC-Authentication-Results: i=2; mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=kAh1HGoK; arc=pass (i=1 dkim=pass dkdomain=intel.com dmarc=pass fromdomain=linux.intel.com); spf=pass (google.com: domain of linux-kernel+bounces-43789-ouuuleilei=gmail.com@vger.kernel.org designates 147.75.48.161 as permitted sender) smtp.mailfrom="linux-kernel+bounces-43789-ouuuleilei=gmail.com@vger.kernel.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: from sy.mirrors.kernel.org (sy.mirrors.kernel.org. 
[147.75.48.161]) by mx.google.com with ESMTPS id k64-20020a638443000000b005cec9fd8061si361123pgd.511.2024.01.29.20.49.40 for (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 29 Jan 2024 20:49:41 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel+bounces-43789-ouuuleilei=gmail.com@vger.kernel.org designates 147.75.48.161 as permitted sender) client-ip=147.75.48.161; Authentication-Results: mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=kAh1HGoK; arc=pass (i=1 dkim=pass dkdomain=intel.com dmarc=pass fromdomain=linux.intel.com); spf=pass (google.com: domain of linux-kernel+bounces-43789-ouuuleilei=gmail.com@vger.kernel.org designates 147.75.48.161 as permitted sender) smtp.mailfrom="linux-kernel+bounces-43789-ouuuleilei=gmail.com@vger.kernel.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: from smtp.subspace.kernel.org (wormhole.subspace.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by sy.mirrors.kernel.org (Postfix) with ESMTPS id CE4FBB244B9 for ; Tue, 30 Jan 2024 02:13:27 +0000 (UTC) Received: from localhost.localdomain (localhost.localdomain [127.0.0.1]) by smtp.subspace.kernel.org (Postfix) with ESMTP id F297A364DC; Tue, 30 Jan 2024 02:09:51 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="kAh1HGoK" Received: from mgamail.intel.com (mgamail.intel.com [198.175.65.10]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 53DFA37705; Tue, 30 Jan 2024 02:09:45 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=198.175.65.10 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1706580587; cv=none; 
b=F0aD4ZBDujwhYucbVohyoIULaBiRkAdds9zaVkOScagIb7LPV5PrAQkoQj70cWT5FwIZuB+TAAKjNDuMHHRRxJi5u4mwIevI4rzzXIagE4Yfl6MeJjI4k+pHl0NAIJ0uT5fa533PgEHFU9JqVDsFMzdA+K27QdmI/iQpVyVVIxg= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1706580587; c=relaxed/simple; bh=yBZnVsYKUJTdPa6SgYXfS7808tsLgToE9zfhOWgI4aw=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=Ol69ckkxKAFES4BDuzI54CCSvvCfx+jzPcW1MAmXJrxNjXaHL/vi1ihWH/IBImsaEbRlWuk6rxbZZBk7U1vMoVmr5KMvgJMsLLWiqP/6ZfIdlLLzrLufLSoVsp7laa+mqxyI3sHbkLr+wIfOZFyKRYXLrDGB00APwj94zQSuXeg= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com; spf=none smtp.mailfrom=linux.intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=kAh1HGoK; arc=none smtp.client-ip=198.175.65.10 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com Authentication-Results: smtp.subspace.kernel.org; spf=none smtp.mailfrom=linux.intel.com DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1706580586; x=1738116586; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=yBZnVsYKUJTdPa6SgYXfS7808tsLgToE9zfhOWgI4aw=; b=kAh1HGoKUmJy6jet5nN9NVnaZP++IEnxcKH0drppK4vvz1VhZL7gdOty 845M4r3BNQqk0ywGthCHLCkgEbMs2ElWVelBKHZezYqzNZg/OhW4w9OKt ULAGhucA9U82z3WK4Qb2B2ZQsuMSFYoe31aCcDnZcgM8TVRYUbq6V+VKL FZG7djGUH2vfyGNx3b7KJnr3/hVxZKOWEUzqsSqAcPsfX2kyBlFqn7HzM OSbIUm2uaOiyFRjdm7TA92LpJYAEiL9ZNGbXwQqiRHBnUvbkC9Fo4L5NI np9gocehhLRa/iWnLMEC3Y9eCtcwXa0hsU7AG5yObQsj1ap2D6nZO3tgt w==; X-IronPort-AV: E=McAfee;i="6600,9927,10968"; a="16530977" X-IronPort-AV: E=Sophos;i="6.05,707,1701158400"; d="scan'208";a="16530977" Received: from orsmga001.jf.intel.com ([10.7.209.18]) by orvoesa102.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 29 Jan 2024 18:09:41 -0800 X-ExtLoop1: 1 
X-IronPort-AV: E=McAfee;i="6600,9927,10968"; a="822042342"
X-IronPort-AV: E=Sophos;i="6.05,707,1701158400"; d="scan'208";a="822042342"
Received: from b4969161e530.jf.intel.com ([10.165.56.46]) by orsmga001.jf.intel.com with ESMTP; 29 Jan 2024 18:09:40 -0800
From: Haitao Huang
To: jarkko@kernel.org, dave.hansen@linux.intel.com, tj@kernel.org, mkoutny@suse.com, linux-kernel@vger.kernel.org, linux-sgx@vger.kernel.org, x86@kernel.org, cgroups@vger.kernel.org, tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, hpa@zytor.com, sohil.mehta@intel.com
Cc: zhiquan1.li@intel.com, kristen@linux.intel.com, seanjc@google.com, zhanb@microsoft.com, anakrish@microsoft.com, mikko.ylinen@linux.intel.com, yangjie@microsoft.com, chrisyan@microsoft.com
Subject: [PATCH v8 08/15] x86/sgx: Implement EPC reclamation flows for cgroup
Date: Mon, 29 Jan 2024 18:09:31 -0800
Message-Id: <20240130020938.10025-9-haitao.huang@linux.intel.com>
X-Mailer: git-send-email 2.25.1
In-Reply-To: <20240130020938.10025-1-haitao.huang@linux.intel.com>
References: <20240130020938.10025-1-haitao.huang@linux.intel.com>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id:
List-Subscribe:
List-Unsubscribe:
MIME-Version: 1.0
X-getmail-retrieved-from-mailbox: INBOX
X-GMAIL-THRID: 1789489505627406718
X-GMAIL-MSGID: 1789489505627406718

From: Kristen Carlson Accardi

Implement the reclamation flow for cgroup, encapsulated in the top-level
function sgx_epc_cgroup_reclaim_pages(). It does a pre-order walk on its
subtree, and makes calls to sgx_reclaim_pages() at each node, passing in
the LRU of that node. It keeps track of the total reclaimed pages and the
pages left to attempt. It stops the walk once the desired number of pages
has been attempted.

In some contexts, e.g. page fault handling, only asynchronous
reclamation is allowed. Create a work-queue, corresponding work item and
function definitions to support the asynchronous reclamation.

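The walk described above can be illustrated with a small user-space sketch.
This is not the kernel code from this patch: struct node, reclaim_node(),
walk(), reclaim_tree(), NR_TO_SCAN and MAX_CHILDREN are hypothetical names
chosen for illustration, and the "young" counter is an assumed stand-in for
pages that get scanned but cannot be reclaimed. It only shows the shape of the
flow: a pre-order visit, one scan budget shared by the whole subtree, and an
early stop once that budget is used up.

```c
#include <assert.h>
#include <stddef.h>

#define NR_TO_SCAN	16	/* hypothetical batch size, stand-in for SGX_NR_TO_SCAN */
#define MAX_CHILDREN	4	/* hypothetical fan-out limit for this model */

/* Hypothetical model of one cgroup node and its LRU. */
struct node {
	unsigned int lru_pages;	/* pages currently on this node's LRU */
	unsigned int young;	/* of those, pages too recently used to reclaim */
	struct node *child[MAX_CHILDREN];
};

/*
 * Scan up to *budget pages on one LRU. Young pages use up budget but are not
 * reclaimed; they only lose their "young" status. Returns pages reclaimed.
 */
static unsigned int reclaim_node(struct node *n, unsigned int *budget)
{
	unsigned int scanned = n->lru_pages < *budget ? n->lru_pages : *budget;
	unsigned int skipped = n->young < scanned ? n->young : scanned;

	*budget -= scanned;
	n->young -= skipped;
	n->lru_pages -= scanned - skipped;
	return scanned - skipped;
}

/* Pre-order: visit the node itself first, then each child subtree. */
static unsigned int walk(struct node *n, unsigned int *budget)
{
	unsigned int cnt;
	size_t i;

	assert(budget != NULL);
	if (!n || !*budget)
		return 0;	/* nothing here, or the scan budget is used up */

	cnt = reclaim_node(n, budget);
	for (i = 0; i < MAX_CHILDREN; i++)
		cnt += walk(n->child[i], budget);
	return cnt;
}

/* Top-level entry: one fixed scan budget shared by the whole subtree. */
static unsigned int reclaim_tree(struct node *root)
{
	unsigned int budget = NR_TO_SCAN;

	return walk(root, &budget);
}
```

Calling reclaim_tree() on a root whose own pages are all young still consumes
part of the budget at the root, so fewer pages can be attempted in the
children. That is why the sketch tracks pages attempted and pages reclaimed
separately.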
Both synchronous and asynchronous flows invoke the same top level reclaim
function, and will be triggered later by sgx_epc_cgroup_try_charge()
when usage of the cgroup is at or near its limit.

Co-developed-by: Sean Christopherson
Signed-off-by: Sean Christopherson
Signed-off-by: Kristen Carlson Accardi
Co-developed-by: Haitao Huang
Signed-off-by: Haitao Huang
---
V8:
- Remove alignment for substructure variables. (Jarkko)

V7:
- Split this out from the big patch, #10 in V6. (Dave, Kai)
---
 arch/x86/kernel/cpu/sgx/epc_cgroup.c | 174 ++++++++++++++++++++++++++-
 arch/x86/kernel/cpu/sgx/epc_cgroup.h | 3 +
 2 files changed, 176 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/cpu/sgx/epc_cgroup.c b/arch/x86/kernel/cpu/sgx/epc_cgroup.c
index eac8548164de..8858a0850f8a 100644
--- a/arch/x86/kernel/cpu/sgx/epc_cgroup.c
+++ b/arch/x86/kernel/cpu/sgx/epc_cgroup.c
@@ -7,9 +7,173 @@
 
 static struct sgx_epc_cgroup epc_cg_root;
 
+static struct workqueue_struct *sgx_epc_cg_wq;
+
+static inline u64 sgx_epc_cgroup_page_counter_read(struct sgx_epc_cgroup *epc_cg)
+{
+	return atomic64_read(&epc_cg->cg->res[MISC_CG_RES_SGX_EPC].usage) / PAGE_SIZE;
+}
+
+static inline u64 sgx_epc_cgroup_max_pages(struct sgx_epc_cgroup *epc_cg)
+{
+	return READ_ONCE(epc_cg->cg->res[MISC_CG_RES_SGX_EPC].max) / PAGE_SIZE;
+}
+
+/*
+ * Get the lower bound of limits of a cgroup and its ancestors. Used in
+ * sgx_epc_cgroup_reclaim_work_func() to determine if EPC usage of a cgroup is
+ * over its limit or its ancestors' hence reclamation is needed.
+ */
+static inline u64 sgx_epc_cgroup_max_pages_to_root(struct sgx_epc_cgroup *epc_cg)
+{
+	struct misc_cg *i = epc_cg->cg;
+	u64 m = U64_MAX;
+
+	while (i) {
+		m = min(m, READ_ONCE(i->res[MISC_CG_RES_SGX_EPC].max));
+		i = misc_cg_parent(i);
+	}
+
+	return m / PAGE_SIZE;
+}
+
 /**
- * sgx_epc_cgroup_try_charge() - try to charge cgroup for a single EPC page
+ * sgx_epc_cgroup_lru_empty() - check if a cgroup tree has no pages on its LRUs
+ * @root: Root of the tree to check
  *
+ * Return: %true if all cgroups under the specified root have empty LRU lists.
+ * Used to avoid livelocks due to a cgroup having a non-zero charge count but
+ * no pages on its LRUs, e.g. due to a dead enclave waiting to be released or
+ * because all pages in the cgroup are unreclaimable.
+ */
+bool sgx_epc_cgroup_lru_empty(struct misc_cg *root)
+{
+	struct cgroup_subsys_state *css_root;
+	struct cgroup_subsys_state *pos;
+	struct sgx_epc_cgroup *epc_cg;
+	bool ret = true;
+
+	/*
+	 * Caller ensure css_root ref acquired
+	 */
+	css_root = &root->css;
+
+	rcu_read_lock();
+	css_for_each_descendant_pre(pos, css_root) {
+		if (!css_tryget(pos))
+			break;
+
+		rcu_read_unlock();
+
+		epc_cg = sgx_epc_cgroup_from_misc_cg(css_misc(pos));
+
+		spin_lock(&epc_cg->lru.lock);
+		ret = list_empty(&epc_cg->lru.reclaimable);
+		spin_unlock(&epc_cg->lru.lock);
+
+		rcu_read_lock();
+		css_put(pos);
+		if (!ret)
+			break;
+	}
+
+	rcu_read_unlock();
+
+	return ret;
+}
+
+/**
+ * sgx_epc_cgroup_reclaim_pages() - walk a cgroup tree and scan LRUs to reclaim pages
+ * @root: Root of the tree to start walking
+ * Return: Number of pages reclaimed.
+ */
+unsigned int sgx_epc_cgroup_reclaim_pages(struct misc_cg *root)
+{
+	/*
+	 * Attempting to reclaim only a few pages will often fail and is
+	 * inefficient, while reclaiming a huge number of pages can result in
+	 * soft lockups due to holding various locks for an extended duration.
+	 */
+	unsigned int nr_to_scan = SGX_NR_TO_SCAN;
+	struct cgroup_subsys_state *css_root;
+	struct cgroup_subsys_state *pos;
+	struct sgx_epc_cgroup *epc_cg;
+	unsigned int cnt;
+
+	/* Caller ensure css_root ref acquired */
+	css_root = &root->css;
+
+	cnt = 0;
+	rcu_read_lock();
+	css_for_each_descendant_pre(pos, css_root) {
+		if (!css_tryget(pos))
+			break;
+		rcu_read_unlock();
+
+		epc_cg = sgx_epc_cgroup_from_misc_cg(css_misc(pos));
+		cnt += sgx_reclaim_pages(&epc_cg->lru, &nr_to_scan);
+
+		rcu_read_lock();
+		css_put(pos);
+		if (!nr_to_scan)
+			break;
+	}
+
+	rcu_read_unlock();
+	return cnt;
+}
+
+/*
+ * Scheduled by sgx_epc_cgroup_try_charge() to reclaim pages from the cgroup
+ * when the cgroup is at/near its maximum capacity
+ */
+static void sgx_epc_cgroup_reclaim_work_func(struct work_struct *work)
+{
+	struct sgx_epc_cgroup *epc_cg;
+	u64 cur, max;
+
+	epc_cg = container_of(work, struct sgx_epc_cgroup, reclaim_work);
+
+	for (;;) {
+		max = sgx_epc_cgroup_max_pages_to_root(epc_cg);
+
+		/*
+		 * Adjust the limit down by one page, the goal is to free up
+		 * pages for fault allocations, not to simply obey the limit.
+		 * Conditionally decrementing max also means the cur vs. max
+		 * check will correctly handle the case where both are zero.
+		 */
+		if (max)
+			max--;
+
+		/*
+		 * Unless the limit is extremely low, in which case forcing
+		 * reclaim will likely cause thrashing, force the cgroup to
+		 * reclaim at least once if it's operating *near* its maximum
+		 * limit by adjusting @max down by half the min reclaim size.
+		 * This work func is scheduled by sgx_epc_cgroup_try_charge
+		 * when it cannot directly reclaim due to being in an atomic
+		 * context, e.g. EPC allocation in a fault handler. Waiting
+		 * to reclaim until the cgroup is actually at its limit is less
+		 * performant as it means the faulting task is effectively
+		 * blocked until a worker makes its way through the global work
+		 * queue.
+		 */
+		if (max > SGX_NR_TO_SCAN * 2)
+			max -= (SGX_NR_TO_SCAN / 2);
+
+		cur = sgx_epc_cgroup_page_counter_read(epc_cg);
+
+		if (cur <= max || sgx_epc_cgroup_lru_empty(epc_cg->cg))
+			break;
+
+		/* Keep reclaiming until above condition is met. */
+		sgx_epc_cgroup_reclaim_pages(epc_cg->cg);
+	}
+}
+
+/**
+ * sgx_epc_cgroup_try_charge() - try to charge cgroup for a single EPC page
  * @epc_cg: The EPC cgroup to be charged for the page.
  * Return:
  * * %0 - If successfully charged.
@@ -37,6 +201,7 @@ static void sgx_epc_cgroup_free(struct misc_cg *cg)
 	if (!epc_cg)
 		return;
 
+	cancel_work_sync(&epc_cg->reclaim_work);
 	kfree(epc_cg);
 }
 
@@ -49,6 +214,8 @@ const struct misc_res_ops sgx_epc_cgroup_ops = {
 
 static void sgx_epc_misc_init(struct misc_cg *cg, struct sgx_epc_cgroup *epc_cg)
 {
+	sgx_lru_init(&epc_cg->lru);
+	INIT_WORK(&epc_cg->reclaim_work, sgx_epc_cgroup_reclaim_work_func);
 	cg->res[MISC_CG_RES_SGX_EPC].priv = epc_cg;
 	epc_cg->cg = cg;
 }
@@ -68,6 +235,11 @@ static int sgx_epc_cgroup_alloc(struct misc_cg *cg)
 
 void sgx_epc_cgroup_init(void)
 {
+	sgx_epc_cg_wq = alloc_workqueue("sgx_epc_cg_wq",
+					WQ_UNBOUND | WQ_FREEZABLE,
+					WQ_UNBOUND_MAX_ACTIVE);
+	BUG_ON(!sgx_epc_cg_wq);
+
 	misc_cg_set_ops(MISC_CG_RES_SGX_EPC, &sgx_epc_cgroup_ops);
 	sgx_epc_misc_init(misc_cg_root(), &epc_cg_root);
 }
diff --git a/arch/x86/kernel/cpu/sgx/epc_cgroup.h b/arch/x86/kernel/cpu/sgx/epc_cgroup.h
index 6b664b4c321f..e3c6a08f0ee8 100644
--- a/arch/x86/kernel/cpu/sgx/epc_cgroup.h
+++ b/arch/x86/kernel/cpu/sgx/epc_cgroup.h
@@ -34,6 +34,8 @@ static inline void sgx_epc_cgroup_init(void) { }
 #else
 struct sgx_epc_cgroup {
 	struct misc_cg *cg;
+	struct sgx_epc_lru_list lru;
+	struct work_struct reclaim_work;
 };
 
 static inline struct sgx_epc_cgroup *sgx_epc_cgroup_from_misc_cg(struct misc_cg *cg)
@@ -66,6 +68,7 @@ static inline void sgx_put_epc_cg(struct sgx_epc_cgroup *epc_cg)
 
 int sgx_epc_cgroup_try_charge(struct sgx_epc_cgroup *epc_cg);
 void sgx_epc_cgroup_uncharge(struct sgx_epc_cgroup *epc_cg);
+bool
sgx_epc_cgroup_lru_empty(struct misc_cg *root); void sgx_epc_cgroup_init(void); #endif From patchwork Tue Jan 30 02:09:32 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Haitao Huang X-Patchwork-Id: 193862 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a05:7301:2087:b0:106:209c:c626 with SMTP id gs7csp1047724dyb; Mon, 29 Jan 2024 23:14:39 -0800 (PST) X-Google-Smtp-Source: AGHT+IGK0Mr4QZ1OvSbtIz47xK6Wg85Bdi8tycY7J3OYvOOWAgrl02UHvcdvW/HQXRrcVtkkUFYY X-Received: by 2002:a17:90b:104:b0:294:777e:2fd4 with SMTP id p4-20020a17090b010400b00294777e2fd4mr5336070pjz.8.1706598878913; Mon, 29 Jan 2024 23:14:38 -0800 (PST) ARC-Seal: i=2; a=rsa-sha256; t=1706598878; cv=pass; d=google.com; s=arc-20160816; b=yCGyOdnI70L5p3hHkcbm30c5Y5RBQ0LgBzlYvcYuxjp1Jq1fViM0/3bY0EWsrdxgfp GBg2zTWV6S2UX1iSPFyrOl4qyU88ex37udtazE+wvfV/bMKZIyVPJKd1c9/jnxbuZCaK HILQIwxLs+XNWiMQF3eObXZ0cAfmS93VTiZ54bsIaeV07cYRxRUad8v9+ViWBIp5ne8p xzjR7EGOD8jy34Dq9HLRUNXxzAPUiDDz3xH6NotLTq0OPxLjY4+XUUdE4ucyg1r71miQ 7diRNyKDWUML8C7wuah+9gq+dlQsN7DAlewMI1ufrEPWooeGesiAfYrEflyyWR7GYtG6 prSA== ARC-Message-Signature: i=2; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=content-transfer-encoding:mime-version:list-unsubscribe :list-subscribe:list-id:precedence:references:in-reply-to:message-id :date:subject:cc:to:from:dkim-signature; bh=XcU4flRufYGlfTv5qzk5cjWlfLAEgA6bVGPzdmuBvgs=; fh=dtGru4wMzkfh5N4HjO7hjgtPmzQheD7FyA8qGNiECvw=; b=NVMtZzYM2elxuh2jZRdKCabbTduhiD303ClCmWhb7LFw3XmvPtjf1kY//zYND40Vr9 BvzX6W9+OYmPqDEyY5xAfexMpO5rc4JXFTzg9Qe4ExqvPWfkREf7MjVupNZ2qXxRM4iF bcLJLBHIsKMaUJXKksWfYip7dxALjaQzROD9NqaWj9uJhLJajFuprXQ5eKiUz7dxuSsY mthNHqkaGtfioPM/IbMeVClMKnTlRGThaOEXR8t+RZzhqIG5MpjwnwbsFpcXwLPZZE6D 90cd7IEXZYJ8Ir56Xpuyux3vTk/PLOJEBtHnCSj8zT7IPcdM1Xg6oWC9NfciO2wVHoPC rtmg== ARC-Authentication-Results: i=2; mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=aPQpumpv; arc=pass (i=1 dkim=pass 
dkdomain=intel.com dmarc=pass fromdomain=linux.intel.com); spf=pass (google.com: domain of linux-kernel+bounces-43791-ouuuleilei=gmail.com@vger.kernel.org designates 2604:1380:40f1:3f00::1 as permitted sender) smtp.mailfrom="linux-kernel+bounces-43791-ouuuleilei=gmail.com@vger.kernel.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: from sy.mirrors.kernel.org (sy.mirrors.kernel.org. [2604:1380:40f1:3f00::1]) by mx.google.com with ESMTPS id gz8-20020a17090b0ec800b00295ada9d9e9si757365pjb.107.2024.01.29.23.14.38 for (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 29 Jan 2024 23:14:38 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel+bounces-43791-ouuuleilei=gmail.com@vger.kernel.org designates 2604:1380:40f1:3f00::1 as permitted sender) client-ip=2604:1380:40f1:3f00::1; Authentication-Results: mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=aPQpumpv; arc=pass (i=1 dkim=pass dkdomain=intel.com dmarc=pass fromdomain=linux.intel.com); spf=pass (google.com: domain of linux-kernel+bounces-43791-ouuuleilei=gmail.com@vger.kernel.org designates 2604:1380:40f1:3f00::1 as permitted sender) smtp.mailfrom="linux-kernel+bounces-43791-ouuuleilei=gmail.com@vger.kernel.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: from smtp.subspace.kernel.org (wormhole.subspace.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by sy.mirrors.kernel.org (Postfix) with ESMTPS id A5C25B248AD for ; Tue, 30 Jan 2024 02:14:07 +0000 (UTC) Received: from localhost.localdomain (localhost.localdomain [127.0.0.1]) by smtp.subspace.kernel.org (Postfix) with ESMTP id B045115AAA1; Tue, 30 Jan 2024 02:09:53 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="aPQpumpv" Received: from mgamail.intel.com (mgamail.intel.com [198.175.65.10]) (using 
TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 41A88381B6; Tue, 30 Jan 2024 02:09:46 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=198.175.65.10 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1706580588; cv=none; b=VEZ4oxSo2KxWxMQdFsbkPo+i/B6lehn5KZa1uMjfCS+0yyQUbxaINuDxS7tXHEHPGNkn5nSrmDRWkzoW1EU2GB08dUpHGDnAfiDImiDmx1kdKR9aM4wi/o1ln806XgxCy3C1RRt5BVHPJwjISwtfRpQ5k5S1KBhv9B+kkV7OwdQ= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1706580588; c=relaxed/simple; bh=JfM/ZcJB3Qb0I3vm5nhyOVFVq3TPAFx7TTYvObPoqg4=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=RpWcf9VZjDSby6RX0b3GPHEyYSWa/J+LmvrriIL10MxpwGBU3CuYi7fSn2akRuRN4ttDNNsoxFtVJAXm8aNFHy5anL4W0GwHRT24qL5ILkL8m0STDOJxU2xb3KDaZ7QFWI8fjxtJUeTHGjXsBn9MLmSv/AysF9CgcDZtNK/74Ao= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com; spf=none smtp.mailfrom=linux.intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=aPQpumpv; arc=none smtp.client-ip=198.175.65.10 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com Authentication-Results: smtp.subspace.kernel.org; spf=none smtp.mailfrom=linux.intel.com DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1706580587; x=1738116587; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=JfM/ZcJB3Qb0I3vm5nhyOVFVq3TPAFx7TTYvObPoqg4=; b=aPQpumpvbZb2KkRIDDiz4cDar3QaBUctAFdS/7AOiLeED9nMLpax6bxn VBW/fWAZsu7/CufoVrKldbylmWhbxXm8EjbGAGUR1OH14k41gvr2RYo22 xFt+9SFX5bh797B30NghkcrrH2/Irle3TscYXVRRtqz9tZZuq+0narB+j SBfrlirf9DxD6Nwhl9EUw5qHW+73l3txcCOpEvLuXg3E+4aJW+RZhqMqi 
OMGRlsIWJVRmV2znkPdd5zotUWnEgbFRJehFpGouVv72rOBmuB4hh/Pyg CRDgl3acxoJ02+aavP/0g6+b/uPwRBaTUfSVD/HO9GZxL/To9ZfbnBW9X A==;
X-IronPort-AV: E=McAfee;i="6600,9927,10968"; a="16530986"
X-IronPort-AV: E=Sophos;i="6.05,707,1701158400"; d="scan'208";a="16530986"
Received: from orsmga001.jf.intel.com ([10.7.209.18]) by orvoesa102.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 29 Jan 2024 18:09:41 -0800
X-ExtLoop1: 1
X-IronPort-AV: E=McAfee;i="6600,9927,10968"; a="822042345"
X-IronPort-AV: E=Sophos;i="6.05,707,1701158400"; d="scan'208";a="822042345"
Received: from b4969161e530.jf.intel.com ([10.165.56.46]) by orsmga001.jf.intel.com with ESMTP; 29 Jan 2024 18:09:40 -0800
From: Haitao Huang
To: jarkko@kernel.org, dave.hansen@linux.intel.com, tj@kernel.org, mkoutny@suse.com, linux-kernel@vger.kernel.org, linux-sgx@vger.kernel.org, x86@kernel.org, cgroups@vger.kernel.org, tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, hpa@zytor.com, sohil.mehta@intel.com
Cc: zhiquan1.li@intel.com, kristen@linux.intel.com, seanjc@google.com, zhanb@microsoft.com, anakrish@microsoft.com, mikko.ylinen@linux.intel.com, yangjie@microsoft.com, chrisyan@microsoft.com
Subject: [PATCH v8 09/15] x86/sgx: Charge mem_cgroup for per-cgroup reclamation
Date: Mon, 29 Jan 2024 18:09:32 -0800
Message-Id: <20240130020938.10025-10-haitao.huang@linux.intel.com>
X-Mailer: git-send-email 2.25.1
In-Reply-To: <20240130020938.10025-1-haitao.huang@linux.intel.com>
References: <20240130020938.10025-1-haitao.huang@linux.intel.com>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id:
List-Subscribe:
List-Unsubscribe:
MIME-Version: 1.0
X-getmail-retrieved-from-mailbox: INBOX
X-GMAIL-THRID: 1789498626252897523
X-GMAIL-MSGID: 1789498626252897523

Enclave Page Cache (EPC) memory can be swapped out to regular system
memory, and the consumed memory should be charged to a proper
mem_cgroup. Currently the selection of mem_cgroup to charge is done in
sgx_encl_get_mem_cgroup().
But it only considers two contexts in which the swapping can be done:
normal tasks and the ksgxd kthread. With the new EPC cgroup
implementation, the swapping can also happen in EPC cgroup work-queue
threads. In those cases, it improperly selects the root mem_cgroup to
charge for the RAM usage.

Change sgx_encl_get_mem_cgroup() to handle non-task contexts only and
return the mem_cgroup of an mm_struct associated with the enclave. The
returned mem_cgroup is used to charge for EPC backing pages in all
kthread cases.

Pass a flag into the top level reclamation function,
sgx_reclaim_pages(), to explicitly indicate whether it is called from a
background kthread. Internally, if the flag is true, switch the active
mem_cgroup to the one returned from sgx_encl_get_mem_cgroup(), prior to
any backing page allocation, in order to ensure that shmem page
allocations are charged to the enclave's cgroup.

Remove current_is_ksgxd() as it is no longer needed.

Signed-off-by: Haitao Huang
Reported-by: Mikko Ylinen
---
V8:
- Limit text paragraphs to 80 characters wide. (Jarkko)
---
 arch/x86/kernel/cpu/sgx/encl.c | 43 ++++++++++++++--------------
 arch/x86/kernel/cpu/sgx/encl.h | 3 +-
 arch/x86/kernel/cpu/sgx/epc_cgroup.c | 7 +++--
 arch/x86/kernel/cpu/sgx/main.c | 27 ++++++++---------
 arch/x86/kernel/cpu/sgx/sgx.h | 3 +-
 5 files changed, 40 insertions(+), 43 deletions(-)

diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c
index 279148e72459..348e8b58abeb 100644
--- a/arch/x86/kernel/cpu/sgx/encl.c
+++ b/arch/x86/kernel/cpu/sgx/encl.c
@@ -993,9 +993,7 @@ static int __sgx_encl_get_backing(struct sgx_encl *encl, unsigned long page_inde
 }
 
 /*
- * When called from ksgxd, returns the mem_cgroup of a struct mm stored
- * in the enclave's mm_list. When not called from ksgxd, just returns
- * the mem_cgroup of the current task.
+ * Returns the mem_cgroup of a struct mm stored in the enclave's mm_list.
  */
 static struct mem_cgroup *sgx_encl_get_mem_cgroup(struct sgx_encl *encl)
 {
@@ -1003,14 +1001,6 @@ static struct mem_cgroup *sgx_encl_get_mem_cgroup(struct sgx_encl *encl)
 	struct sgx_encl_mm *encl_mm;
 	int idx;
 
-	/*
-	 * If called from normal task context, return the mem_cgroup
-	 * of the current task's mm. The remainder of the handling is for
-	 * ksgxd.
-	 */
-	if (!current_is_ksgxd())
-		return get_mem_cgroup_from_mm(current->mm);
-
 	/*
 	 * Search the enclave's mm_list to find an mm associated with
 	 * this enclave to charge the allocation to.
@@ -1047,29 +1037,38 @@ static struct mem_cgroup *sgx_encl_get_mem_cgroup(struct sgx_encl *encl)
  * @encl: an enclave pointer
  * @page_index: enclave page index
  * @backing: data for accessing backing storage for the page
+ * @indirect: in ksgxd or EPC cgroup work queue context
+ *
+ * Create a backing page for loading data back into an EPC page with ELDU. This
+ * function takes a reference on a new backing page which must be dropped with a
+ * corresponding call to sgx_encl_put_backing().
  *
- * When called from ksgxd, sets the active memcg from one of the
- * mms in the enclave's mm_list prior to any backing page allocation,
- * in order to ensure that shmem page allocations are charged to the
- * enclave. Create a backing page for loading data back into an EPC page with
- * ELDU. This function takes a reference on a new backing page which
- * must be dropped with a corresponding call to sgx_encl_put_backing().
+ * When @indirect is true, sets the active memcg from one of the mms in the
+ * enclave's mm_list prior to any backing page allocation, in order to ensure
+ * that shmem page allocations are charged to the enclave.
  *
  * Return:
  *   0 on success,
  *   -errno otherwise.
  */
 int sgx_encl_alloc_backing(struct sgx_encl *encl, unsigned long page_index,
-			   struct sgx_backing *backing)
+			   struct sgx_backing *backing, bool indirect)
 {
-	struct mem_cgroup *encl_memcg = sgx_encl_get_mem_cgroup(encl);
-	struct mem_cgroup *memcg = set_active_memcg(encl_memcg);
+	struct mem_cgroup *encl_memcg;
+	struct mem_cgroup *memcg;
 	int ret;
 
+	if (indirect) {
+		encl_memcg = sgx_encl_get_mem_cgroup(encl);
+		memcg = set_active_memcg(encl_memcg);
+	}
+
 	ret = __sgx_encl_get_backing(encl, page_index, backing);
 
-	set_active_memcg(memcg);
-	mem_cgroup_put(encl_memcg);
+	if (indirect) {
+		set_active_memcg(memcg);
+		mem_cgroup_put(encl_memcg);
+	}
 
 	return ret;
 }
diff --git a/arch/x86/kernel/cpu/sgx/encl.h b/arch/x86/kernel/cpu/sgx/encl.h
index f94ff14c9486..549cd2e8d98b 100644
--- a/arch/x86/kernel/cpu/sgx/encl.h
+++ b/arch/x86/kernel/cpu/sgx/encl.h
@@ -103,12 +103,11 @@ static inline int sgx_encl_find(struct mm_struct *mm, unsigned long addr,
 int sgx_encl_may_map(struct sgx_encl *encl, unsigned long start,
 		     unsigned long end, unsigned long vm_flags);
 
-bool current_is_ksgxd(void);
 void sgx_encl_release(struct kref *ref);
 int sgx_encl_mm_add(struct sgx_encl *encl, struct mm_struct *mm);
 const cpumask_t *sgx_encl_cpumask(struct sgx_encl *encl);
 int sgx_encl_alloc_backing(struct sgx_encl *encl, unsigned long page_index,
-			   struct sgx_backing *backing);
+			   struct sgx_backing *backing, bool indirect);
 void sgx_encl_put_backing(struct sgx_backing *backing);
 int sgx_encl_test_and_clear_young(struct mm_struct *mm,
 				  struct sgx_encl_page *page);
diff --git a/arch/x86/kernel/cpu/sgx/epc_cgroup.c b/arch/x86/kernel/cpu/sgx/epc_cgroup.c
index 8858a0850f8a..cbcb7b0de3fe 100644
--- a/arch/x86/kernel/cpu/sgx/epc_cgroup.c
+++ b/arch/x86/kernel/cpu/sgx/epc_cgroup.c
@@ -85,9 +85,10 @@ bool sgx_epc_cgroup_lru_empty(struct misc_cg *root)
 /**
  * sgx_epc_cgroup_reclaim_pages() - walk a cgroup tree and scan LRUs to reclaim pages
  * @root: Root of the tree to start walking
+ * @indirect: In ksgxd or EPC cgroup work queue context.
  * Return: Number of pages reclaimed.
  */
-unsigned int sgx_epc_cgroup_reclaim_pages(struct misc_cg *root)
+static unsigned int sgx_epc_cgroup_reclaim_pages(struct misc_cg *root, bool indirect)
 {
 	/*
 	 * Attempting to reclaim only a few pages will often fail and is
@@ -111,7 +112,7 @@ unsigned int sgx_epc_cgroup_reclaim_pages(struct misc_cg *root)
 		rcu_read_unlock();
 
 		epc_cg = sgx_epc_cgroup_from_misc_cg(css_misc(pos));
-		cnt += sgx_reclaim_pages(&epc_cg->lru, &nr_to_scan);
+		cnt += sgx_reclaim_pages(&epc_cg->lru, &nr_to_scan, indirect);
 
 		rcu_read_lock();
 		css_put(pos);
@@ -168,7 +169,7 @@ static void sgx_epc_cgroup_reclaim_work_func(struct work_struct *work)
 			break;
 
 		/* Keep reclaiming until above condition is met. */
-		sgx_epc_cgroup_reclaim_pages(epc_cg->cg);
+		sgx_epc_cgroup_reclaim_pages(epc_cg->cg, true);
 	}
 }
 
diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
index 4f5824c4751d..51904f191b97 100644
--- a/arch/x86/kernel/cpu/sgx/main.c
+++ b/arch/x86/kernel/cpu/sgx/main.c
@@ -254,7 +254,7 @@ static void sgx_encl_ewb(struct sgx_epc_page *epc_page,
 }
 
 static void sgx_reclaimer_write(struct sgx_epc_page *epc_page,
-				struct sgx_backing *backing)
+				struct sgx_backing *backing, bool indirect)
 {
 	struct sgx_encl_page *encl_page = epc_page->owner;
 	struct sgx_encl *encl = encl_page->encl;
@@ -270,7 +270,7 @@ static void sgx_reclaimer_write(struct sgx_epc_page *epc_page,
 
 	if (!encl->secs_child_cnt && test_bit(SGX_ENCL_INITIALIZED, &encl->flags)) {
 		ret = sgx_encl_alloc_backing(encl, PFN_DOWN(encl->size),
-					     &secs_backing);
+					     &secs_backing, indirect);
 		if (ret)
 			goto out;
 
@@ -304,9 +304,11 @@ static void sgx_reclaimer_write(struct sgx_epc_page *epc_page,
  * @lru: The LRU from which pages are reclaimed.
  * @nr_to_scan: Pointer to the target number of pages to scan, must be less than
  *		SGX_NR_TO_SCAN.
+ * @indirect: In ksgxd or EPC cgroup work queue contexts.
  * Return: Number of pages reclaimed.
  */
-unsigned int sgx_reclaim_pages(struct sgx_epc_lru_list *lru, unsigned int *nr_to_scan)
+unsigned int sgx_reclaim_pages(struct sgx_epc_lru_list *lru, unsigned int *nr_to_scan,
+			       bool indirect)
 {
 	struct sgx_epc_page *chunk[SGX_NR_TO_SCAN];
 	struct sgx_backing backing[SGX_NR_TO_SCAN];
@@ -348,7 +350,7 @@ unsigned int sgx_reclaim_pages(struct sgx_epc_lru_list *lru, unsigned int *nr_to
 		page_index = PFN_DOWN(encl_page->desc - encl_page->encl->base);
 
 		mutex_lock(&encl_page->encl->lock);
-		ret = sgx_encl_alloc_backing(encl_page->encl, page_index, &backing[i]);
+		ret = sgx_encl_alloc_backing(encl_page->encl, page_index, &backing[i], indirect);
 		if (ret) {
 			mutex_unlock(&encl_page->encl->lock);
 			goto skip;
@@ -381,7 +383,7 @@ unsigned int sgx_reclaim_pages(struct sgx_epc_lru_list *lru, unsigned int *nr_to
 			continue;
 
 		encl_page = epc_page->owner;
-		sgx_reclaimer_write(epc_page, &backing[i]);
+		sgx_reclaimer_write(epc_page, &backing[i], indirect);
 
 		kref_put(&encl_page->encl->refcount, sgx_encl_release);
 		epc_page->flags &= ~SGX_EPC_PAGE_RECLAIMER_TRACKED;
@@ -399,11 +401,11 @@ static bool sgx_should_reclaim(unsigned long watermark)
 	       !list_empty(&sgx_global_lru.reclaimable);
 }
 
-static void sgx_reclaim_pages_global(void)
+static void sgx_reclaim_pages_global(bool indirect)
 {
 	unsigned int nr_to_scan = SGX_NR_TO_SCAN;
 
-	sgx_reclaim_pages(&sgx_global_lru, &nr_to_scan);
+	sgx_reclaim_pages(&sgx_global_lru, &nr_to_scan, indirect);
 }
 
 /*
@@ -414,7 +416,7 @@ static void sgx_reclaim_pages_global(void)
 void sgx_reclaim_direct(void)
 {
 	if (sgx_should_reclaim(SGX_NR_LOW_PAGES))
-		sgx_reclaim_pages_global();
+		sgx_reclaim_pages_global(false);
 }
 
 static int ksgxd(void *p)
@@ -437,7 +439,7 @@ static int ksgxd(void *p)
 				     sgx_should_reclaim(SGX_NR_HIGH_PAGES));
 
 		if (sgx_should_reclaim(SGX_NR_HIGH_PAGES))
-			sgx_reclaim_pages_global();
+			sgx_reclaim_pages_global(true);
 
 		cond_resched();
 	}
@@ -460,11 +462,6 @@ static bool __init sgx_page_reclaimer_init(void)
 	return true;
 }
 
-bool current_is_ksgxd(void)
-{
-	return current == ksgxd_tsk;
-}
-
 static struct sgx_epc_page *__sgx_alloc_epc_page_from_node(int nid)
 {
 	struct sgx_numa_node *node = &sgx_numa_nodes[nid];
@@ -623,7 +620,7 @@ struct sgx_epc_page *sgx_alloc_epc_page(void *owner, bool reclaim)
 		 * Need to do a global reclamation if cgroup was not full but free
 		 * physical pages run out, causing __sgx_alloc_epc_page() to fail.
 		 */
-		sgx_reclaim_pages_global();
+		sgx_reclaim_pages_global(false);
 		cond_resched();
 	}
 
diff --git a/arch/x86/kernel/cpu/sgx/sgx.h b/arch/x86/kernel/cpu/sgx/sgx.h
index 2593c013d091..cfe906054d85 100644
--- a/arch/x86/kernel/cpu/sgx/sgx.h
+++ b/arch/x86/kernel/cpu/sgx/sgx.h
@@ -110,7 +110,8 @@ void sgx_reclaim_direct(void);
 void sgx_mark_page_reclaimable(struct sgx_epc_page *page);
 int sgx_unmark_page_reclaimable(struct sgx_epc_page *page);
 struct sgx_epc_page *sgx_alloc_epc_page(void *owner, bool reclaim);
-unsigned int sgx_reclaim_pages(struct sgx_epc_lru_list *lru, unsigned int *nr_to_scan);
+unsigned int sgx_reclaim_pages(struct sgx_epc_lru_list *lru, unsigned int *nr_to_scan,
+			       bool indirect);
 
 void sgx_ipi_cb(void *info);
 
From patchwork Tue Jan 30 02:09:33 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Haitao Huang X-Patchwork-Id: 193840 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a05:7301:2087:b0:106:209c:c626 with SMTP id gs7csp1018905dyb; Mon, 29 Jan 2024 21:51:49 -0800 (PST) X-Google-Smtp-Source: AGHT+IH/3sjHDNseJ0ZZZDw1KPwFtzLAHtP74Vi7vkaiNRlYtYrM06qMzIY4uOOqk3x9JuJdHZyb X-Received: by 2002:a2e:2243:0:b0:2cf:175e:612a with SMTP id i64-20020a2e2243000000b002cf175e612amr5360238lji.28.1706593909049; Mon, 29 Jan 2024 21:51:49 -0800 (PST) ARC-Seal: i=2; a=rsa-sha256; t=1706593909; cv=pass; d=google.com; s=arc-20160816; b=s9bcG04l69loL4WsC1oWUMpM08/I2Cwm4S8uJq8LSSz74fRoFnoyul2CGWQ/0SOuB7 gBh1noCJoS9W4+Zd19lXVxYu09cjvLJXX+1+yVYZvpghTCC5luUQ1wxC3448W5KlRha8
From: Haitao Huang
To: jarkko@kernel.org, dave.hansen@linux.intel.com, tj@kernel.org,
    mkoutny@suse.com, linux-kernel@vger.kernel.org,
    linux-sgx@vger.kernel.org, x86@kernel.org, cgroups@vger.kernel.org,
    tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, hpa@zytor.com,
    sohil.mehta@intel.com
Cc: zhiquan1.li@intel.com, kristen@linux.intel.com, seanjc@google.com,
    zhanb@microsoft.com, anakrish@microsoft.com,
    mikko.ylinen@linux.intel.com, yangjie@microsoft.com,
    chrisyan@microsoft.com
Subject: [PATCH v8 10/15] x86/sgx: Add EPC reclamation in cgroup try_charge()
Date: Mon, 29 Jan 2024 18:09:33 -0800
Message-Id: <20240130020938.10025-11-haitao.huang@linux.intel.com>
X-Mailer: git-send-email 2.25.1
In-Reply-To: <20240130020938.10025-1-haitao.huang@linux.intel.com>
References: <20240130020938.10025-1-haitao.huang@linux.intel.com>
X-Mailing-List: linux-kernel@vger.kernel.org

From: Kristen Carlson Accardi

When the EPC usage of a cgroup is near its limit, the cgroup needs to
reclaim pages used in the same cgroup to make room for new allocations.
This is analogous to the global reclaimer being triggered when global
usage is close to the total available EPC.

Add a Boolean parameter to sgx_epc_cgroup_try_charge() to indicate
whether synchronous reclaim is allowed, and trigger the synchronous or
asynchronous reclamation flow accordingly.

Note that at this point all reclaimable EPC pages are still tracked in
the global LRU and the per-cgroup LRUs are empty, so no per-cgroup
reclamation is activated yet.

Co-developed-by: Sean Christopherson
Signed-off-by: Sean Christopherson
Signed-off-by: Kristen Carlson Accardi
Co-developed-by: Haitao Huang
Signed-off-by: Haitao Huang
---
V7:
- Split this out from the big patch, #10 in V6. (Dave, Kai)
---
 arch/x86/kernel/cpu/sgx/epc_cgroup.c | 26 ++++++++++++++++++++++++--
 arch/x86/kernel/cpu/sgx/epc_cgroup.h |  4 ++--
 arch/x86/kernel/cpu/sgx/main.c       |  2 +-
 3 files changed, 27 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kernel/cpu/sgx/epc_cgroup.c b/arch/x86/kernel/cpu/sgx/epc_cgroup.c
index cbcb7b0de3fe..127f515ffccf 100644
--- a/arch/x86/kernel/cpu/sgx/epc_cgroup.c
+++ b/arch/x86/kernel/cpu/sgx/epc_cgroup.c
@@ -176,13 +176,35 @@ static void sgx_epc_cgroup_reclaim_work_func(struct work_struct *work)
 /**
  * sgx_epc_cgroup_try_charge() - try to charge cgroup for a single EPC page
  * @epc_cg:	The EPC cgroup to be charged for the page.
+ * @reclaim:	Whether or not synchronous reclaim is allowed
  * Return:
  * * %0 - If successfully charged.
  * * -errno - for failures.
  */
-int sgx_epc_cgroup_try_charge(struct sgx_epc_cgroup *epc_cg)
+int sgx_epc_cgroup_try_charge(struct sgx_epc_cgroup *epc_cg, bool reclaim)
 {
-	return misc_cg_try_charge(MISC_CG_RES_SGX_EPC, epc_cg->cg, PAGE_SIZE);
+	for (;;) {
+		if (!misc_cg_try_charge(MISC_CG_RES_SGX_EPC, epc_cg->cg,
+					PAGE_SIZE))
+			break;
+
+		if (sgx_epc_cgroup_lru_empty(epc_cg->cg))
+			return -ENOMEM;
+
+		if (signal_pending(current))
+			return -ERESTARTSYS;
+
+		if (!reclaim) {
+			queue_work(sgx_epc_cg_wq, &epc_cg->reclaim_work);
+			return -EBUSY;
+		}
+
+		if (!sgx_epc_cgroup_reclaim_pages(epc_cg->cg, false))
+			/* All pages were too young to reclaim, try again a little later */
+			schedule();
+	}
+
+	return 0;
 }
 
 /**
diff --git a/arch/x86/kernel/cpu/sgx/epc_cgroup.h b/arch/x86/kernel/cpu/sgx/epc_cgroup.h
index e3c6a08f0ee8..d061cd807b45 100644
--- a/arch/x86/kernel/cpu/sgx/epc_cgroup.h
+++ b/arch/x86/kernel/cpu/sgx/epc_cgroup.h
@@ -23,7 +23,7 @@ static inline struct sgx_epc_cgroup *sgx_get_current_epc_cg(void)
 
 static inline void sgx_put_epc_cg(struct sgx_epc_cgroup *epc_cg) { }
 
-static inline int sgx_epc_cgroup_try_charge(struct sgx_epc_cgroup *epc_cg)
+static inline int sgx_epc_cgroup_try_charge(struct sgx_epc_cgroup *epc_cg, bool reclaim)
 {
 	return 0;
 }
@@ -66,7 +66,7 @@ static inline void sgx_put_epc_cg(struct sgx_epc_cgroup *epc_cg)
 	put_misc_cg(epc_cg->cg);
 }
 
-int sgx_epc_cgroup_try_charge(struct sgx_epc_cgroup *epc_cg);
+int sgx_epc_cgroup_try_charge(struct sgx_epc_cgroup *epc_cg, bool reclaim);
 void sgx_epc_cgroup_uncharge(struct sgx_epc_cgroup *epc_cg);
 bool sgx_epc_cgroup_lru_empty(struct misc_cg *root);
 void sgx_epc_cgroup_init(void);
diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
index 51904f191b97..2279ae967707 100644
--- a/arch/x86/kernel/cpu/sgx/main.c
+++ b/arch/x86/kernel/cpu/sgx/main.c
@@ -588,7 +588,7 @@ struct sgx_epc_page *sgx_alloc_epc_page(void *owner, bool reclaim)
 	int ret;
 
 	epc_cg = sgx_get_current_epc_cg();
-	ret = sgx_epc_cgroup_try_charge(epc_cg);
+	ret = sgx_epc_cgroup_try_charge(epc_cg, reclaim);
 	if (ret) {
 		sgx_put_epc_cg(epc_cg);
 		return ERR_PTR(ret);

From patchwork Tue Jan 30 02:09:34 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Haitao Huang
X-Patchwork-Id: 193833
From: Haitao Huang
To: jarkko@kernel.org, dave.hansen@linux.intel.com, tj@kernel.org,
    mkoutny@suse.com, linux-kernel@vger.kernel.org,
    linux-sgx@vger.kernel.org, x86@kernel.org, cgroups@vger.kernel.org,
    tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, hpa@zytor.com,
    sohil.mehta@intel.com
Cc: zhiquan1.li@intel.com, kristen@linux.intel.com, seanjc@google.com,
    zhanb@microsoft.com, anakrish@microsoft.com,
    mikko.ylinen@linux.intel.com, yangjie@microsoft.com,
    chrisyan@microsoft.com
Subject: [PATCH v8 11/15] x86/sgx: Abstract check for global reclaimable pages
Date: Mon, 29 Jan 2024 18:09:34 -0800
Message-Id: <20240130020938.10025-12-haitao.huang@linux.intel.com>
X-Mailer: git-send-email 2.25.1
In-Reply-To: <20240130020938.10025-1-haitao.huang@linux.intel.com>
References: <20240130020938.10025-1-haitao.huang@linux.intel.com>
X-Mailing-List: linux-kernel@vger.kernel.org

From: Kristen Carlson Accardi

To determine whether any page is available for reclamation at the
global level, checking only whether the global LRU is empty is not
adequate when pages are tracked in multiple LRUs, one per cgroup.

For this purpose, create a new helper, sgx_can_reclaim(). It currently
checks only the global LRU; later it will check the LRUs of all cgroups
once per-cgroup tracking is turned on. Replace all checks of the global
LRU, list_empty(&sgx_global_lru.reclaimable), with calls to
sgx_can_reclaim().

Co-developed-by: Sean Christopherson
Signed-off-by: Sean Christopherson
Signed-off-by: Kristen Carlson Accardi
Co-developed-by: Haitao Huang
Signed-off-by: Haitao Huang
---
v7:
- Split this out from the big patch, #10 in V6. (Dave, Kai)
---
 arch/x86/kernel/cpu/sgx/main.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
index 2279ae967707..6b0c26cac621 100644
--- a/arch/x86/kernel/cpu/sgx/main.c
+++ b/arch/x86/kernel/cpu/sgx/main.c
@@ -37,6 +37,11 @@ static inline struct sgx_epc_lru_list *sgx_lru_list(struct sgx_epc_page *epc_pag
 	return &sgx_global_lru;
 }
 
+static inline bool sgx_can_reclaim(void)
+{
+	return !list_empty(&sgx_global_lru.reclaimable);
+}
+
 static atomic_long_t sgx_nr_free_pages = ATOMIC_LONG_INIT(0);
 
 /* Nodes with one or more EPC sections. */
@@ -398,7 +403,7 @@ unsigned int sgx_reclaim_pages(struct sgx_epc_lru_list *lru, unsigned int *nr_to
 static bool sgx_should_reclaim(unsigned long watermark)
 {
 	return atomic_long_read(&sgx_nr_free_pages) < watermark &&
-	       !list_empty(&sgx_global_lru.reclaimable);
+	       sgx_can_reclaim();
 }
 
 static void sgx_reclaim_pages_global(bool indirect)
@@ -601,7 +606,7 @@ struct sgx_epc_page *sgx_alloc_epc_page(void *owner, bool reclaim)
 			break;
 		}
 
-		if (list_empty(&sgx_global_lru.reclaimable)) {
+		if (!sgx_can_reclaim()) {
 			page = ERR_PTR(-ENOMEM);
 			break;
 		}

From patchwork Tue Jan 30 02:09:35 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Haitao Huang
X-Patchwork-Id: 193951
From: Haitao Huang
To: jarkko@kernel.org, dave.hansen@linux.intel.com, tj@kernel.org,
    mkoutny@suse.com, linux-kernel@vger.kernel.org,
    linux-sgx@vger.kernel.org, x86@kernel.org, cgroups@vger.kernel.org,
    tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, hpa@zytor.com,
    sohil.mehta@intel.com
Cc: zhiquan1.li@intel.com, kristen@linux.intel.com, seanjc@google.com,
    zhanb@microsoft.com, anakrish@microsoft.com,
    mikko.ylinen@linux.intel.com, yangjie@microsoft.com,
    chrisyan@microsoft.com
Subject: [PATCH v8 12/15] x86/sgx: Expose sgx_epc_cgroup_reclaim_pages() for global reclaimer
Date: Mon, 29 Jan 2024 18:09:35 -0800
Message-Id: <20240130020938.10025-13-haitao.huang@linux.intel.com>
X-Mailer: git-send-email 2.25.1
In-Reply-To: <20240130020938.10025-1-haitao.huang@linux.intel.com>
References: <20240130020938.10025-1-haitao.huang@linux.intel.com>
X-Mailing-List: linux-kernel@vger.kernel.org

From: Kristen Carlson Accardi

When the EPC cgroup is enabled, all reclaimable pages will be tracked
in per-cgroup LRUs, so the global reclaimer needs to start reclamation
from the root cgroup. Expose the top-level cgroup reclamation function
so that the global reclaimer can reuse it.

Co-developed-by: Sean Christopherson
Signed-off-by: Sean Christopherson
Signed-off-by: Kristen Carlson Accardi
Co-developed-by: Haitao Huang
Signed-off-by: Haitao Huang
---
V8:
- Remove unneeded breaks in function declarations. (Jarkko)

V7:
- Split this out from the big patch, #10 in V6. (Dave, Kai)
---
 arch/x86/kernel/cpu/sgx/epc_cgroup.c | 2 +-
 arch/x86/kernel/cpu/sgx/epc_cgroup.h | 7 +++++++
 2 files changed, 8 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/cpu/sgx/epc_cgroup.c b/arch/x86/kernel/cpu/sgx/epc_cgroup.c
index 127f515ffccf..e08425b1faa5 100644
--- a/arch/x86/kernel/cpu/sgx/epc_cgroup.c
+++ b/arch/x86/kernel/cpu/sgx/epc_cgroup.c
@@ -88,7 +88,7 @@ bool sgx_epc_cgroup_lru_empty(struct misc_cg *root)
  * @indirect:	In ksgxd or EPC cgroup work queue context.
  * Return:	Number of pages reclaimed.
  */
-static unsigned int sgx_epc_cgroup_reclaim_pages(struct misc_cg *root, bool indirect)
+unsigned int sgx_epc_cgroup_reclaim_pages(struct misc_cg *root, bool indirect)
 {
 	/*
 	 * Attempting to reclaim only a few pages will often fail and is
diff --git a/arch/x86/kernel/cpu/sgx/epc_cgroup.h b/arch/x86/kernel/cpu/sgx/epc_cgroup.h
index d061cd807b45..5b3e8e1b8630 100644
--- a/arch/x86/kernel/cpu/sgx/epc_cgroup.h
+++ b/arch/x86/kernel/cpu/sgx/epc_cgroup.h
@@ -31,6 +31,11 @@ static inline int sgx_epc_cgroup_try_charge(struct sgx_epc_cgroup *epc_cg, bool
 static inline void sgx_epc_cgroup_uncharge(struct sgx_epc_cgroup *epc_cg) { }
 
 static inline void sgx_epc_cgroup_init(void) { }
+
+static inline unsigned int sgx_epc_cgroup_reclaim_pages(struct misc_cg *root, bool indirect)
+{
+	return 0;
+}
 #else
 struct sgx_epc_cgroup {
 	struct misc_cg *cg;
@@ -69,6 +74,8 @@ static inline void sgx_put_epc_cg(struct sgx_epc_cgroup *epc_cg)
 int sgx_epc_cgroup_try_charge(struct sgx_epc_cgroup *epc_cg, bool reclaim);
 void sgx_epc_cgroup_uncharge(struct sgx_epc_cgroup *epc_cg);
 bool sgx_epc_cgroup_lru_empty(struct misc_cg *root);
+unsigned int sgx_epc_cgroup_reclaim_pages(struct misc_cg *root, bool indirect);
+
 void sgx_epc_cgroup_init(void);
 
 #endif

From patchwork Tue Jan 30 02:09:36 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Haitao Huang
X-Patchwork-Id: 193827
X-IronPort-AV: E=McAfee;i="6600,9927,10968"; a="822042357"
X-IronPort-AV: E=Sophos;i="6.05,707,1701158400"; d="scan'208";a="822042357"
Received: from b4969161e530.jf.intel.com ([10.165.56.46]) by orsmga001.jf.intel.com with ESMTP; 29 Jan 2024 18:09:40 -0800
From: Haitao Huang
To: jarkko@kernel.org, dave.hansen@linux.intel.com, tj@kernel.org, mkoutny@suse.com, linux-kernel@vger.kernel.org, linux-sgx@vger.kernel.org, x86@kernel.org, cgroups@vger.kernel.org, tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, hpa@zytor.com, sohil.mehta@intel.com
Cc: zhiquan1.li@intel.com, kristen@linux.intel.com, seanjc@google.com, zhanb@microsoft.com, anakrish@microsoft.com, mikko.ylinen@linux.intel.com, yangjie@microsoft.com, chrisyan@microsoft.com
Subject: [PATCH v8 13/15] x86/sgx: Turn on per-cgroup EPC reclamation
Date: Mon, 29 Jan 2024 18:09:36 -0800
Message-Id: <20240130020938.10025-14-haitao.huang@linux.intel.com>
X-Mailer: git-send-email 2.25.1
In-Reply-To: <20240130020938.10025-1-haitao.huang@linux.intel.com>
References: <20240130020938.10025-1-haitao.huang@linux.intel.com>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id:
List-Subscribe:
List-Unsubscribe:
MIME-Version: 1.0
X-getmail-retrieved-from-mailbox: INBOX
X-GMAIL-THRID: 1789484520616190461
X-GMAIL-MSGID: 1789484520616190461

From: Kristen Carlson Accardi

Previous patches have implemented all infrastructure needed for per-cgroup
EPC page tracking and reclaiming. But all reclaimable EPC pages are still
tracked in the global LRU, as sgx_lru_list() returns a hard-coded reference
to the global LRU.

Change sgx_lru_list() to return the LRU of the cgroup in which the given
EPC page is allocated.

This makes all EPC pages tracked in per-cgroup LRUs, so the global
reclaimer (ksgxd) will not be able to reclaim any pages from the global
LRU. However, in cases of over-committing, i.e., when the sum of cgroup
limits is greater than the total capacity, individual cgroups may never
hit their limits and so never reclaim, while the total usage can still be
near the capacity. Therefore global reclamation is still needed in those
cases, and it should reclaim from the root cgroup.

Modify sgx_reclaim_pages_global() to reclaim from the root EPC cgroup when
the EPC cgroup is enabled, and from the global LRU otherwise. Similarly,
modify sgx_can_reclaim() to check the emptiness of the LRUs of all cgroups
when the EPC cgroup is enabled, and to check only the global LRU otherwise.

With these changes, global reclamation and per-cgroup reclamation both
work properly with all pages tracked in per-cgroup LRUs.

Co-developed-by: Sean Christopherson
Signed-off-by: Sean Christopherson
Signed-off-by: Kristen Carlson Accardi
Co-developed-by: Haitao Huang
Signed-off-by: Haitao Huang
---
V7:
- Split this out from the big patch, #10 in V6. (Dave, Kai)
---
 arch/x86/kernel/cpu/sgx/main.c | 16 +++++++++++++++-
 1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
index 6b0c26cac621..d4265a390ba9 100644
--- a/arch/x86/kernel/cpu/sgx/main.c
+++ b/arch/x86/kernel/cpu/sgx/main.c
@@ -34,12 +34,23 @@ static struct sgx_epc_lru_list sgx_global_lru;
 
 static inline struct sgx_epc_lru_list *sgx_lru_list(struct sgx_epc_page *epc_page)
 {
+#ifdef CONFIG_CGROUP_SGX_EPC
+	if (epc_page->epc_cg)
+		return &epc_page->epc_cg->lru;
+
+	/* This should not happen if kernel is configured correctly */
+	WARN_ON_ONCE(1);
+#endif
 	return &sgx_global_lru;
 }
 
 static inline bool sgx_can_reclaim(void)
 {
+#ifdef CONFIG_CGROUP_SGX_EPC
+	return !sgx_epc_cgroup_lru_empty(misc_cg_root());
+#else
 	return !list_empty(&sgx_global_lru.reclaimable);
+#endif
 }
 
 static atomic_long_t sgx_nr_free_pages = ATOMIC_LONG_INIT(0);
@@ -410,7 +421,10 @@ static void sgx_reclaim_pages_global(bool indirect)
 {
 	unsigned int nr_to_scan = SGX_NR_TO_SCAN;
 
-	sgx_reclaim_pages(&sgx_global_lru, &nr_to_scan,
indirect); + if (IS_ENABLED(CONFIG_CGROUP_SGX_EPC)) + sgx_epc_cgroup_reclaim_pages(misc_cg_root(), indirect); + else + sgx_reclaim_pages(&sgx_global_lru, &nr_to_scan, indirect); } /* From patchwork Tue Jan 30 02:09:37 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Haitao Huang X-Patchwork-Id: 193826 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a05:7301:2087:b0:106:209c:c626 with SMTP id gs7csp977085dyb; Mon, 29 Jan 2024 19:30:14 -0800 (PST) X-Google-Smtp-Source: AGHT+IH+45PHViufjjV1lQ4INe5d7R/o/zb/36VzlM0O+gEzrcw6IvDKqtDUW5fc+ljKp8o+jiWL X-Received: by 2002:a05:622a:1792:b0:42a:83d4:2db3 with SMTP id s18-20020a05622a179200b0042a83d42db3mr7366510qtk.121.1706585413864; Mon, 29 Jan 2024 19:30:13 -0800 (PST) ARC-Seal: i=2; a=rsa-sha256; t=1706585413; cv=pass; d=google.com; s=arc-20160816; b=uVa4AGkpqHaVCR3vP1X17TZxqmH7RhZlgJQqDlYfU0sWBr86/1cvAiU0i5JdmmduCA baIQM65sOgzC6C2gt3u0HJBnIliQ72TzAJRszfONksTsYoMND5QTXqLLaTdUXhHD9bfq f4xanZa6N6Jdm5Rfn7fg42bggBEWLXwx6+o4oM6oHbc5qfSHtc47upL7q+oXMoudUBtk LZoiqycjXu4e1snzuCDgHbJvu0ZFNQ+1dbWS8RTP+wBO196kvuXw0MhcfxmQmml6GR9J gcYGi7R0IWAUzSVpCWaGW5pOZMamrIHa7Id4MvYYqDELa2utWbT6s9JozhJDduAcvVw6 D/4g== ARC-Message-Signature: i=2; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=content-transfer-encoding:mime-version:list-unsubscribe :list-subscribe:list-id:precedence:references:in-reply-to:message-id :date:subject:cc:to:from:dkim-signature; bh=IK1zZD9GQLiLQlBRFrkRhjEGngByj19LCkee3XOSOjQ=; fh=dtGru4wMzkfh5N4HjO7hjgtPmzQheD7FyA8qGNiECvw=; b=LYfXYuQvH4bvr1veWgpNZLUh+hdGjz0Zv3/rRVN7rwfEpIQ5235kYUyPLu1aNgfkWp 9g8FSIwPwkGvHpgk0ufPeVSHTUq4LdBxyTT7EUH755oCZ23YKWRM5eJQvLAzf/6YmimI amT0f4k3nICy+nLJnMSbqRxY4fKnbdOZiJAi+54G+zuoQ1s5iF5McopnW9m2Bea2w6d/ vUKlbdfgYODjuNT1hvx1Yn1VunzcNPBFeiFF+Fw9g3roFSzadiwv0C13NK9yPq8nFbF4 Pa8Wnfl29EWz1qIDwtSM5hj/ldu38IUZYHo7sSzaTyFkdRO4vfFzR2oXdSHhS2zJAbhZ QqJw== ARC-Authentication-Results: i=2; mx.google.com; 
dkim=pass header.i=@intel.com header.s=Intel header.b=XmuJfFn5; arc=pass (i=1 dkim=pass dkdomain=intel.com dmarc=pass fromdomain=linux.intel.com); spf=pass (google.com: domain of linux-kernel+bounces-43796-ouuuleilei=gmail.com@vger.kernel.org designates 2604:1380:45d1:ec00::1 as permitted sender) smtp.mailfrom="linux-kernel+bounces-43796-ouuuleilei=gmail.com@vger.kernel.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: from ny.mirrors.kernel.org (ny.mirrors.kernel.org. [2604:1380:45d1:ec00::1]) by mx.google.com with ESMTPS id r19-20020ac85c93000000b0042a1138b665si8982661qta.783.2024.01.29.19.30.13 for (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 29 Jan 2024 19:30:13 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel+bounces-43796-ouuuleilei=gmail.com@vger.kernel.org designates 2604:1380:45d1:ec00::1 as permitted sender) client-ip=2604:1380:45d1:ec00::1; Authentication-Results: mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=XmuJfFn5; arc=pass (i=1 dkim=pass dkdomain=intel.com dmarc=pass fromdomain=linux.intel.com); spf=pass (google.com: domain of linux-kernel+bounces-43796-ouuuleilei=gmail.com@vger.kernel.org designates 2604:1380:45d1:ec00::1 as permitted sender) smtp.mailfrom="linux-kernel+bounces-43796-ouuuleilei=gmail.com@vger.kernel.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: from smtp.subspace.kernel.org (wormhole.subspace.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ny.mirrors.kernel.org (Postfix) with ESMTPS id 450371C24D8A for ; Tue, 30 Jan 2024 02:14:29 +0000 (UTC) Received: from localhost.localdomain (localhost.localdomain [127.0.0.1]) by smtp.subspace.kernel.org (Postfix) with ESMTP id E936515AADE; Tue, 30 Jan 2024 02:09:55 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com 
header.b="XmuJfFn5" Received: from mgamail.intel.com (mgamail.intel.com [198.175.65.10]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 6C48553E22; Tue, 30 Jan 2024 02:09:50 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=198.175.65.10 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1706580592; cv=none; b=iaa5fDttlwQnIwFYvxAUO+B4UaHEGXrtce3YtiS92y2INPHGmhR63kC3cXgRl3r3pDx7430ZOYPTx3UVEiQCyt9x0pzbGRIS+EPy4SV5PxS7jgXR8zp2+BXoo5wqwYE0QTE4NfajgwAczc0d1eZiy/vJW8nbFxmk3XZd4f9h9kY= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1706580592; c=relaxed/simple; bh=9w5lQ2pOZdanN4uPSlw3FqAR9K1n/EpTv9COUE47BCo=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=t+Sq7J8EJe2HowbyY1q+sVs952PxBUHo2M1AfUl//WI81avVLHBDX8c00MnHE/dBQsdQ0EBOq7XyyR3HaefUGgRCNDWKqCKeLCBbYpkTKOtZim1TfdzR6RVXeBGYanwTylEKsfNSkWRS9FSsWRL43Xn97jF9Q/5FiIj2xWu8sds= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com; spf=none smtp.mailfrom=linux.intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=XmuJfFn5; arc=none smtp.client-ip=198.175.65.10 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com Authentication-Results: smtp.subspace.kernel.org; spf=none smtp.mailfrom=linux.intel.com DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1706580591; x=1738116591; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=9w5lQ2pOZdanN4uPSlw3FqAR9K1n/EpTv9COUE47BCo=; b=XmuJfFn5sR0gIfTeqXdJANuhrkcM5p4430vksCsV2y/J1IazHJK09rTK b92HkrTtd6kkhlRlbC6MqPi5arwX/umA3hwjkWsKaJi1Uj4iFcs6LP8pM 52KolE/nldSlsjTjIvqrbE4E9lZ4pGhlrfNN2YTWh9AEDANM2RwH5myua 
EtiLv3kBE42+s1TjLKW8gWpsctQctV9Ftgp7mn+qZZEerlbXime7FW8Eg trLu8iTae9p09mvmrnJIVtBmipsdLPo+miKTrIWX5ek3oFuB5nRc5Auuw A+kyZZIlBa8FItow2duuCjr4jC3IzEsh4hS7hhT5u9hN7Rz0O9xRNCAB8 Q==; X-IronPort-AV: E=McAfee;i="6600,9927,10968"; a="16531031" X-IronPort-AV: E=Sophos;i="6.05,707,1701158400"; d="scan'208";a="16531031" Received: from orsmga001.jf.intel.com ([10.7.209.18]) by orvoesa102.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 29 Jan 2024 18:09:42 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10968"; a="822042360" X-IronPort-AV: E=Sophos;i="6.05,707,1701158400"; d="scan'208";a="822042360" Received: from b4969161e530.jf.intel.com ([10.165.56.46]) by orsmga001.jf.intel.com with ESMTP; 29 Jan 2024 18:09:41 -0800 From: Haitao Huang To: jarkko@kernel.org, dave.hansen@linux.intel.com, tj@kernel.org, mkoutny@suse.com, linux-kernel@vger.kernel.org, linux-sgx@vger.kernel.org, x86@kernel.org, cgroups@vger.kernel.org, tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, hpa@zytor.com, sohil.mehta@intel.com Cc: zhiquan1.li@intel.com, kristen@linux.intel.com, seanjc@google.com, zhanb@microsoft.com, anakrish@microsoft.com, mikko.ylinen@linux.intel.com, yangjie@microsoft.com, chrisyan@microsoft.com Subject: [PATCH v8 14/15] Docs/x86/sgx: Add description for cgroup support Date: Mon, 29 Jan 2024 18:09:37 -0800 Message-Id: <20240130020938.10025-15-haitao.huang@linux.intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20240130020938.10025-1-haitao.huang@linux.intel.com> References: <20240130020938.10025-1-haitao.huang@linux.intel.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-getmail-retrieved-from-mailbox: INBOX X-GMAIL-THRID: 1789484507027653440 X-GMAIL-MSGID: 1789484507027653440 From: Sean Christopherson Add initial documentation of how to regulate the distribution of SGX Enclave Page Cache (EPC) memory via the Miscellaneous cgroup controller. 
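As a rough illustration of the interface this patch documents, the sketch
below writes and reads an sgx_epc limit in the flat-keyed "sgx_epc <bytes>"
form and mirrors the documented rounding of unaligned values down to a page
multiple. The helper names (misc_get, page_align_down, misc_set_epc_max) and
the 4096-byte default page size are assumptions made for this example, not
part of the series, and the demo runs against a scratch file rather than a
real cgroup, so it needs neither root nor SGX hardware.

```shell
#!/bin/bash
# Hypothetical helpers for illustration only; not part of this series.

# Print the value stored for key $2 in the flat-keyed file $1 (e.g. misc.max).
misc_get() {
    awk -v key="$2" '$1 == key { print $2 }' "$1"
}

# Round byte count $1 down to a multiple of page size $2.
# The 4096 default is an assumption; the kernel uses PAGE_SIZE.
page_align_down() {
    local bytes=$1 page=${2:-4096}
    echo $(( bytes / page * page ))
}

# Write an sgx_epc limit ($2: a byte count or "max") to the misc.max file $1.
misc_set_epc_max() {
    echo "sgx_epc $2" > "$1"
}

# Demo on a scratch file standing in for <cgroup>/misc.max.
demo_dir=$(mktemp -d)
misc_set_epc_max "$demo_dir/misc.max" 1000000
epc_limit=$(misc_get "$demo_dir/misc.max" sgx_epc)
effective_limit=$(page_align_down "$epc_limit")
echo "requested=$epc_limit effective=$effective_limit"
rm -rf "$demo_dir"
```

On a real system the same write would target a non-root cgroup's misc.max,
for example a path under /sys/fs/cgroup when cgroup v2 is mounted there.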
Signed-off-by: Sean Christopherson
Co-developed-by: Kristen Carlson Accardi
Signed-off-by: Kristen Carlson Accardi
Co-developed-by: Haitao Huang
Signed-off-by: Haitao Huang
Cc: Sean Christopherson
---
V8:
- Limit text width to 80 characters to be consistent.
V6:
- Remove mentioning of VMM specific behavior on handling SIGBUS
- Remove statement of forced reclamation, add statement to specify ENOMEM
  returned when no reclamation possible.
- Added statements on the non-preemptive nature for the max limit
- Dropped Reviewed-by tag because of changes
V4:
- Fix indentation (Randy)
- Change misc.events file to be read-only
- Fix a typo for 'subsystem'
- Add behavior when VMM overcommit EPC with a cgroup (Mikko)
---
 Documentation/arch/x86/sgx.rst | 83 ++++++++++++++++++++++++++++++++++
 1 file changed, 83 insertions(+)

diff --git a/Documentation/arch/x86/sgx.rst b/Documentation/arch/x86/sgx.rst
index d90796adc2ec..c537e6a9aa65 100644
--- a/Documentation/arch/x86/sgx.rst
+++ b/Documentation/arch/x86/sgx.rst
@@ -300,3 +300,86 @@ to expected failures and handle them as follows:
    first call. It indicates a bug in the kernel or the userspace client
    if any of the second round of ``SGX_IOC_VEPC_REMOVE_ALL`` calls has
    a return code other than 0.
+
+
+Cgroup Support
+==============
+
+The "sgx_epc" resource within the Miscellaneous cgroup controller regulates
+distribution of SGX EPC memory, which is a subset of system RAM that is used to
+provide SGX-enabled applications with protected memory, and is otherwise
+inaccessible, i.e. shows up as reserved in /proc/iomem and cannot be
+read/written outside of an SGX enclave.
+
+Although current systems implement EPC by stealing memory from RAM, for all
+intents and purposes the EPC is independent from normal system memory, e.g. must
+be reserved at boot from RAM and cannot be converted between EPC and normal
+memory while the system is running. The EPC is managed by the SGX subsystem and
+is not accounted by the memory controller. Note that this is true only for EPC
+memory itself, i.e. normal memory allocations related to SGX and EPC memory,
+e.g. the backing memory for evicted EPC pages, are accounted, limited and
+protected by the memory controller.
+
+Much like normal system memory, EPC memory can be overcommitted via virtual
+memory techniques and pages can be swapped out of the EPC to their backing store
+(normal system memory allocated via shmem). The SGX EPC subsystem is analogous
+to the memory subsystem, and it implements limit and protection models for EPC
+memory.
+
+SGX EPC Interface Files
+-----------------------
+
+For a generic description of the Miscellaneous controller interface files,
+please see Documentation/admin-guide/cgroup-v2.rst.
+
+All SGX EPC memory amounts are in bytes unless explicitly stated otherwise. If
+a value which is not PAGE_SIZE aligned is written, the actual value used by the
+controller will be rounded down to the closest PAGE_SIZE multiple.
+
+  misc.capacity
+	A read-only flat-keyed file shown only in the root cgroup. The sgx_epc
+	resource will show the total amount of EPC memory available on the
+	platform.
+
+  misc.current
+	A read-only flat-keyed file shown in the non-root cgroups. The sgx_epc
+	resource will show the current active EPC memory usage of the cgroup and
+	its descendants. EPC pages that are swapped out to backing RAM are not
+	included in the current count.
+
+  misc.max
+	A read-write flat-keyed file which exists on non-root cgroups. The
+	sgx_epc resource will show the EPC usage hard limit. The default is
+	"max".
+
+	If a cgroup's EPC usage reaches this limit, EPC allocations, e.g., for
+	page fault handling, will be blocked until EPC can be reclaimed from the
+	cgroup. If there are no pages left that are reclaimable within the same
+	cgroup, the kernel returns ENOMEM.
+
+	The EPC pages allocated for a guest VM by the virtual EPC driver are not
+	reclaimable by the host kernel. If the guest cgroup's limit is reached
+	and no reclaimable pages are left in the same cgroup, the virtual EPC
+	driver sends SIGBUS to the user space process to indicate failure on
+	new EPC allocation requests.
+
+	The misc.max limit is non-preemptive. If a user writes a limit lower
+	than the current usage to this file, the cgroup will not preemptively
+	deallocate pages currently in use, and will only start blocking the next
+	allocation and reclaiming EPC at that time.
+
+  misc.events
+	A read-only flat-keyed file which exists on non-root cgroups.
+	A value change in this file generates a file modified event.
+
+	  max
+		The number of times the cgroup has triggered a reclaim due to
+		its EPC usage approaching (or exceeding) its max EPC boundary.
+
+Migration
+---------
+
+Once an EPC page is charged to a cgroup (during allocation), it remains charged
+to the original cgroup until the page is released or reclaimed. Migrating a
+process to a different cgroup doesn't move the EPC charges that it incurred
+while in the previous cgroup to its new cgroup.

From patchwork Tue Jan 30 02:09:38 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Haitao Huang
X-Patchwork-Id: 193797
Return-Path:
Delivered-To: ouuuleilei@gmail.com
Received: by 2002:a05:7301:2087:b0:106:209c:c626 with SMTP id gs7csp953395dyb; Mon, 29 Jan 2024 18:14:32 -0800 (PST)
X-Google-Smtp-Source: AGHT+IGubSuHwWWKFyNt/diAsK/ORt5EfZhvJ63EbhAqodcoI3KC7/xJkgnhaOVdOESSCQ1FyMO5
X-Received: by 2002:a17:906:c211:b0:a35:70d8:89e9 with SMTP id d17-20020a170906c21100b00a3570d889e9mr4614775ejz.63.1706580872536; Mon, 29 Jan 2024 18:14:32 -0800 (PST)
Received: from am.mirrors.kernel.org (am.mirrors.kernel.org.
[147.75.80.249]) by mx.google.com with ESMTPS id o10-20020a170906288a00b00a2d0a8f21b8si4104579ejd.396.2024.01.29.18.14.32 for (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 29 Jan 2024 18:14:32 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel+bounces-43795-ouuuleilei=gmail.com@vger.kernel.org designates 147.75.80.249 as permitted sender) client-ip=147.75.80.249; Authentication-Results: mx.google.com; dkim=neutral (body hash did not verify) header.i=@intel.com header.s=Intel header.b=VWvn92VY; arc=fail (body hash mismatch); spf=pass (google.com: domain of linux-kernel+bounces-43795-ouuuleilei=gmail.com@vger.kernel.org designates 147.75.80.249 as permitted sender) smtp.mailfrom="linux-kernel+bounces-43795-ouuuleilei=gmail.com@vger.kernel.org"; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: from smtp.subspace.kernel.org (wormhole.subspace.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by am.mirrors.kernel.org (Postfix) with ESMTPS id F21B71F2700D for ; Tue, 30 Jan 2024 02:14:31 +0000 (UTC) Received: from localhost.localdomain (localhost.localdomain [127.0.0.1]) by smtp.subspace.kernel.org (Postfix) with ESMTP id 3309915B0E2; Tue, 30 Jan 2024 02:09:56 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dkim=fail reason="signature verification failed" (2048-bit key) header.d=intel.com header.i=@intel.com header.b="VWvn92VY" Received: from mgamail.intel.com (mgamail.intel.com [198.175.65.10]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 30C2F381DE; Tue, 30 Jan 2024 02:09:49 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=198.175.65.10 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1706580591; cv=none; 
b=rSi44zI6pIcycLwT9BzCybbpaP+RC+qPUpVew8RTCxVVOCVqrj04O60BInp0MoOFLonh2cfz79A7UJyadbCY329L5rMf+rhPqL3/4VnHMGZsunkgSLA1UCpXMhMknho6dfFdyQlo+lF5jYgTtSW06/nd7JRDtSzDJpb2ZW1NRhg= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1706580591; c=relaxed/simple; bh=yOWHglLZ1B+fQRkP+rpuHciUQYdwgtTPy4iyogRW/R4=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=IyFEy2JyLg7ze1fgeq+fKHddwEt+EwCgwlR/nqls+cNFrSnyM/nYNvjkezW5PkDi5xCxP+h6Gb+xf8NhdCSxG0R2LPCP+NnAS7miG0RXfxXLqTMwMDMeTiJXwwmfkxRVSvxcGdTTByKysXxrd0tKNX3IqmmVFhtfbrkoWd48Hw0= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com; spf=none smtp.mailfrom=linux.intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=VWvn92VY; arc=none smtp.client-ip=198.175.65.10 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com Authentication-Results: smtp.subspace.kernel.org; spf=none smtp.mailfrom=linux.intel.com DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1706580590; x=1738116590; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=yOWHglLZ1B+fQRkP+rpuHciUQYdwgtTPy4iyogRW/R4=; b=VWvn92VYj14HhW7hnnQfa9CE8XxqnEhgM+Uys+6M0rwUOHwR8onjBrUK KtICj4S3nkv0oA1PnUZWSti83cGnpAoXqjbP32OzneW0O3aMd7mA50uwU PJCdnzKmQc6Xb/8hdUXLr+B09CwgCJdHan/LJ7AwEKiZAhLCyeocJzvxa /mKHePNNWQSAzthKTIL4m84KBcNvRpnoC0DiMsLMUWFItopsBqn/aikT1 NrQ0QfBnHtbM1DJkN70MO9F2fxCAVWMXSGf+bve1ht/Y1vbfKAEt078Ts lgCKczfJ278HCknMiYDeh05ubN3iNbUSKk5v+v528lCMG8qmHYA9ViY6a w==; X-IronPort-AV: E=McAfee;i="6600,9927,10968"; a="16531040" X-IronPort-AV: E=Sophos;i="6.05,707,1701158400"; d="scan'208";a="16531040" Received: from orsmga001.jf.intel.com ([10.7.209.18]) by orvoesa102.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 29 Jan 2024 18:09:42 -0800 X-ExtLoop1: 1 
X-IronPort-AV: E=McAfee;i="6600,9927,10968"; a="822042363"
X-IronPort-AV: E=Sophos;i="6.05,707,1701158400"; d="scan'208";a="822042363"
Received: from b4969161e530.jf.intel.com ([10.165.56.46]) by orsmga001.jf.intel.com with ESMTP; 29 Jan 2024 18:09:41 -0800
From: Haitao Huang
To: jarkko@kernel.org, dave.hansen@linux.intel.com, tj@kernel.org, mkoutny@suse.com, linux-kernel@vger.kernel.org, linux-sgx@vger.kernel.org, x86@kernel.org, cgroups@vger.kernel.org, tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, hpa@zytor.com, sohil.mehta@intel.com
Cc: zhiquan1.li@intel.com, kristen@linux.intel.com, seanjc@google.com, zhanb@microsoft.com, anakrish@microsoft.com, mikko.ylinen@linux.intel.com, yangjie@microsoft.com, chrisyan@microsoft.com
Subject: [PATCH v8 15/15] selftests/sgx: Add scripts for EPC cgroup testing
Date: Mon, 29 Jan 2024 18:09:38 -0800
Message-Id: <20240130020938.10025-16-haitao.huang@linux.intel.com>
X-Mailer: git-send-email 2.25.1
In-Reply-To: <20240130020938.10025-1-haitao.huang@linux.intel.com>
References: <20240130020938.10025-1-haitao.huang@linux.intel.com>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id:
List-Subscribe:
List-Unsubscribe:
MIME-Version: 1.0
X-getmail-retrieved-from-mailbox: INBOX
X-GMAIL-THRID: 1789479744909801748
X-GMAIL-MSGID: 1789479744909801748

The scripts rely on the cgroup-tools package from libcgroup [1].

To run the selftests for the EPC cgroup:

sudo ./run_epc_cg_selftests.sh

To watch misc cgroup 'current' changes during testing, run this in a
separate terminal:

./watch_misc_for_tests.sh current

In each cgroup, the script starts one or more concurrent SGX selftests,
each running one unclobbered_vdso_oversubscribed test. Each such test
tries to load an enclave whose EPC size equals the EPC capacity available
on the platform. The script checks results against the expectation set
for each cgroup and reports success or failure.

The script creates 3 different cgroups at the beginning with the following
expectations:

1) SMALL - intentionally small enough to fail the test loading an enclave
   of size equal to the capacity.
2) LARGE - large enough to run up to 4 concurrent tests but fail some if
   more than 4 concurrent tests are run. The script starts 4 expecting at
   least one test to pass, and then starts 5 expecting at least one test
   to fail.
3) LARGER - limit is the same as the capacity, large enough to run lots of
   concurrent tests. The script starts 8 of them and expects all to pass.
   Then it reruns the same test with one process randomly killed and
   checks that usage is zero after all processes exit.

The script also includes a test with a low mem_cg limit and the LARGE
sgx_epc limit to verify that the RAM used for per-cgroup reclamation is
charged to the proper mem_cg.

[1] https://github.com/libcgroup/libcgroup/blob/main/README

Signed-off-by: Haitao Huang
---
V7:
- Added memcontrol test.
V5:
- Added script with automatic results checking, remove the interactive
  script.
- The script can run independent from the series below.
---
 .../selftests/sgx/run_epc_cg_selftests.sh | 246 ++++++++++++++++++
 .../selftests/sgx/watch_misc_for_tests.sh | 13 +
 2 files changed, 259 insertions(+)
 create mode 100755 tools/testing/selftests/sgx/run_epc_cg_selftests.sh
 create mode 100755 tools/testing/selftests/sgx/watch_misc_for_tests.sh

diff --git a/tools/testing/selftests/sgx/run_epc_cg_selftests.sh b/tools/testing/selftests/sgx/run_epc_cg_selftests.sh
new file mode 100755
index 000000000000..e027bf39f005
--- /dev/null
+++ b/tools/testing/selftests/sgx/run_epc_cg_selftests.sh
@@ -0,0 +1,246 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+# Copyright(c) 2023 Intel Corporation.
+
+TEST_ROOT_CG=selftest
+cgcreate -g misc:$TEST_ROOT_CG
+if [ $? -ne 0 ]; then
+    echo "# Please make sure cgroup-tools is installed, and misc cgroup is mounted."
+    exit 1
+fi
+TEST_CG_SUB1=$TEST_ROOT_CG/test1
+TEST_CG_SUB2=$TEST_ROOT_CG/test2
+# We will only set limit in test1 and run tests in test3
+TEST_CG_SUB3=$TEST_ROOT_CG/test1/test3
+TEST_CG_SUB4=$TEST_ROOT_CG/test4
+
+cgcreate -g misc:$TEST_CG_SUB1
+cgcreate -g misc:$TEST_CG_SUB2
+cgcreate -g misc:$TEST_CG_SUB3
+cgcreate -g misc:$TEST_CG_SUB4
+
+# Default to V2
+CG_MISC_ROOT=/sys/fs/cgroup
+CG_MEM_ROOT=/sys/fs/cgroup
+CG_V1=0
+if [ ! -d "/sys/fs/cgroup/misc" ]; then
+    echo "# cgroup V2 is in use."
+else
+    echo "# cgroup V1 is in use."
+    CG_MISC_ROOT=/sys/fs/cgroup/misc
+    CG_MEM_ROOT=/sys/fs/cgroup/memory
+    CG_V1=1
+fi
+
+CAPACITY=$(grep "sgx_epc" "$CG_MISC_ROOT/misc.capacity" | awk '{print $2}')
+# This is below number of VA pages needed for enclave of capacity size. So
+# should fail oversubscribed cases
+SMALL=$(( CAPACITY / 512 ))
+
+# At least load one enclave of capacity size successfully, maybe up to 4.
+# But some may fail if we run more than 4 concurrent enclaves of capacity size.
+LARGE=$(( SMALL * 4 ))
+
+# Load lots of enclaves
+LARGER=$CAPACITY
+echo "# Setting up limits."
+echo "sgx_epc $SMALL" > $CG_MISC_ROOT/$TEST_CG_SUB1/misc.max
+echo "sgx_epc $LARGE" > $CG_MISC_ROOT/$TEST_CG_SUB2/misc.max
+echo "sgx_epc $LARGER" > $CG_MISC_ROOT/$TEST_CG_SUB4/misc.max
+
+timestamp=$(date +%Y%m%d_%H%M%S)
+
+test_cmd="./test_sgx -t unclobbered_vdso_oversubscribed"
+
+wait_check_process_status() {
+    local pid=$1
+    local check_for_success=$2 # If 1, check for success;
+                               # If 0, check for failure
+    wait "$pid"
+    local status=$?
+
+    if [[ $check_for_success -eq 1 && $status -eq 0 ]]; then
+        echo "# Process $pid succeeded."
+        return 0
+    elif [[ $check_for_success -eq 0 && $status -ne 0 ]]; then
+        echo "# Process $pid returned failure."
+        return 0
+    fi
+    return 1
+}
+
+wait_and_detect_for_any() {
+    local pids=("$@")
+    local check_for_success=$1 # If 1, check for success;
+                               # If 0, check for failure
+    local detected=1 # 0 for success detection
+
+    for pid in "${pids[@]:1}"; do
+        if wait_check_process_status "$pid" "$check_for_success"; then
+            detected=0
+            # Wait for other processes to exit
+        fi
+    done
+
+    return $detected
+}
+
+echo "# Start unclobbered_vdso_oversubscribed with SMALL limit, expecting failure..."
+# Always use leaf node of misc cgroups so it works for both v1 and v2
+# these may fail on OOM
+cgexec -g misc:$TEST_CG_SUB3 $test_cmd >cgtest_small_$timestamp.log 2>&1
+if [[ $? -eq 0 ]]; then
+    echo "# Fail on SMALL limit, not expecting any test passes."
+    cgdelete -r -g misc:$TEST_ROOT_CG
+    exit 1
+else
+    echo "# Test failed as expected."
+fi
+
+echo "# PASSED SMALL limit."
+
+echo "# Start 4 concurrent unclobbered_vdso_oversubscribed tests with LARGE limit,
+        expecting at least one success...."
+
+pids=()
+for i in {1..4}; do
+    (
+        cgexec -g misc:$TEST_CG_SUB2 $test_cmd >cgtest_large_positive_$timestamp.$i.log 2>&1
+    ) &
+    pids+=($!)
+done
+
+
+if wait_and_detect_for_any 1 "${pids[@]}"; then
+    echo "# PASSED LARGE limit positive testing."
+else
+    echo "# Failed on LARGE limit positive testing, no test passes."
+    cgdelete -r -g misc:$TEST_ROOT_CG
+    exit 1
+fi
+
+echo "# Start 5 concurrent unclobbered_vdso_oversubscribed tests with LARGE limit,
+        expecting at least one failure...."
+pids=()
+for i in {1..5}; do
+    (
+        cgexec -g misc:$TEST_CG_SUB2 $test_cmd >cgtest_large_negative_$timestamp.$i.log 2>&1
+    ) &
+    pids+=($!)
+done
+
+if wait_and_detect_for_any 0 "${pids[@]}"; then
+    echo "# PASSED LARGE limit negative testing."
+else
+    echo "# Failed on LARGE limit negative testing, no test fails."
+    cgdelete -r -g misc:$TEST_ROOT_CG
+    exit 1
+fi
+
+echo "# Start 8 concurrent unclobbered_vdso_oversubscribed tests with LARGER limit,
+        expecting no failure...."
+pids=()
+for i in {1..8}; do
+    (
+        cgexec -g misc:$TEST_CG_SUB4 $test_cmd >cgtest_larger_$timestamp.$i.log 2>&1
+    ) &
+    pids+=($!)
+done
+
+if wait_and_detect_for_any 0 "${pids[@]}"; then
+    echo "# Failed on LARGER limit, at least one test fails."
+    cgdelete -r -g misc:$TEST_ROOT_CG
+    exit 1
+else
+    echo "# PASSED LARGER limit tests."
+fi
+
+echo "# Start 8 concurrent unclobbered_vdso_oversubscribed tests with LARGER limit,
+        randomly kill one, expecting no failure...."
+pids=()
+for i in {1..8}; do
+    (
+        cgexec -g misc:$TEST_CG_SUB4 $test_cmd >cgtest_larger_kill_$timestamp.$i.log 2>&1
+    ) &
+    pids+=($!)
+done
+
+sleep $((RANDOM % 10 + 5))
+
+# Randomly select a PID to kill
+RANDOM_INDEX=$((RANDOM % 8))
+PID_TO_KILL=${pids[RANDOM_INDEX]}
+
+kill $PID_TO_KILL
+echo "# Killed process with PID: $PID_TO_KILL"
+
+any_failure=0
+for pid in "${pids[@]}"; do
+    wait "$pid"
+    status=$?
+    if [ "$pid" != "$PID_TO_KILL" ]; then
+        if [[ $status -ne 0 ]]; then
+            echo "# Process $pid returned failure."
+            any_failure=1
+        fi
+    fi
+done
+
+if [[ $any_failure -ne 0 ]]; then
+    echo "# Failed on random killing, at least one test fails."
+    cgdelete -r -g misc:$TEST_ROOT_CG
+    exit 1
+fi
+echo "# PASSED LARGER limit test with a process randomly killed."
+
+cgcreate -g memory:$TEST_CG_SUB2
+if [ $? -ne 0 ]; then
+    echo "# Failed creating memory controller."
+    cgdelete -r -g misc:$TEST_ROOT_CG
+    exit 1
+fi
+MEM_LIMIT_TOO_SMALL=$((CAPACITY - 2 * LARGE))
+
+if [[ $CG_V1 -eq 0 ]]; then
+    echo "$MEM_LIMIT_TOO_SMALL" > $CG_MEM_ROOT/$TEST_CG_SUB2/memory.max
+else
+    echo "$MEM_LIMIT_TOO_SMALL" > $CG_MEM_ROOT/$TEST_CG_SUB2/memory.limit_in_bytes
+    echo "$MEM_LIMIT_TOO_SMALL" > $CG_MEM_ROOT/$TEST_CG_SUB2/memory.memsw.limit_in_bytes
+fi
+
+echo "# Start 4 concurrent unclobbered_vdso_oversubscribed tests with LARGE EPC limit,
+        and too small RAM limit, expecting all failures...."
+pids=()
+for i in {1..4}; do
+    (
+        cgexec -g memory:$TEST_CG_SUB2 -g misc:$TEST_CG_SUB2 $test_cmd \
+               >cgtest_large_oom_$timestamp.$i.log 2>&1
+    ) &
+    pids+=($!)
+done
+
+if wait_and_detect_for_any 1 "${pids[@]}"; then
+    echo "# Failed on tests with memcontrol, some tests did not fail."
+    cgdelete -r -g misc:$TEST_ROOT_CG
+    if [[ $CG_V1 -ne 0 ]]; then
+        cgdelete -r -g memory:$TEST_ROOT_CG
+    fi
+    exit 1
+else
+    echo "# PASSED LARGE limit tests with memcontrol."
+fi
+
+sleep 2
+
+USAGE=$(grep '^sgx_epc' "$CG_MISC_ROOT/$TEST_ROOT_CG/misc.current" | awk '{print $2}')
+if [ "$USAGE" -ne 0 ]; then
+    echo "# Failed: Final usage is $USAGE, not 0."
+else
+    echo "# PASSED leakage check."
+    echo "# PASSED ALL cgroup limit tests, cleanup cgroups..."
+fi
+cgdelete -r -g misc:$TEST_ROOT_CG
+if [[ $CG_V1 -ne 0 ]]; then
+    cgdelete -r -g memory:$TEST_ROOT_CG
+fi
+echo "# done."
diff --git a/tools/testing/selftests/sgx/watch_misc_for_tests.sh b/tools/testing/selftests/sgx/watch_misc_for_tests.sh
new file mode 100755
index 000000000000..dbd38f346e7b
--- /dev/null
+++ b/tools/testing/selftests/sgx/watch_misc_for_tests.sh
@@ -0,0 +1,13 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+# Copyright(c) 2023 Intel Corporation.
+
+if [ -z "$1" ]
+  then
+    echo "No argument supplied, please provide 'max', 'current' or 'events'"
+    exit 1
+fi
+
+watch -n 1 "find /sys/fs/cgroup -wholename */test*/misc.$1 -exec sh -c \
+    'echo \"\$1:\"; cat \"\$1\"' _ {} \;"
+
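
The limit arithmetic used by run_epc_cg_selftests.sh can be tried in
isolation with the sketch below. The compute_epc_limits wrapper and the
64 MiB capacity are assumptions made for this example; the real script reads
the capacity from misc.capacity and computes the same values inline. The
sketch makes explicit that LARGE works out to CAPACITY / 128 and that the
memory limit in the memcontrol test is CAPACITY minus twice LARGE.

```shell
#!/bin/bash
# Sketch only: mirrors the limit arithmetic of run_epc_cg_selftests.sh.
# compute_epc_limits is a hypothetical wrapper, not part of the patch.
compute_epc_limits() {
    local capacity=$1

    # Too small to hold the VA pages of a capacity-sized enclave.
    SMALL=$(( capacity / 512 ))
    # Four times SMALL, i.e. capacity / 128.
    LARGE=$(( SMALL * 4 ))
    # Same as the platform capacity.
    LARGER=$capacity
    # RAM limit used by the memcontrol test.
    MEM_LIMIT_TOO_SMALL=$(( capacity - 2 * LARGE ))
}

# Assumed example capacity of 64 MiB; not read from real hardware.
compute_epc_limits $(( 64 * 1024 * 1024 ))
printf 'SMALL=%s LARGE=%s LARGER=%s MEM_LIMIT_TOO_SMALL=%s\n' \
    "$SMALL" "$LARGE" "$LARGER" "$MEM_LIMIT_TOO_SMALL"
```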