From patchwork Mon Oct 30 18:20:02 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Haitao Huang X-Patchwork-Id: 159837 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:d641:0:b0:403:3b70:6f57 with SMTP id cy1csp2415297vqb; Mon, 30 Oct 2023 11:21:21 -0700 (PDT) X-Google-Smtp-Source: AGHT+IEkXfddfsBAWbQ/wmAKCU4jqpjoVGSiRP30hxa51QCIXyzpWkirtq00DRtLojhBPC2SrhS+ X-Received: by 2002:a17:90b:38ca:b0:280:1de0:8a86 with SMTP id nn10-20020a17090b38ca00b002801de08a86mr5197458pjb.20.1698690080853; Mon, 30 Oct 2023 11:21:20 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1698690080; cv=none; d=google.com; s=arc-20160816; b=sDEwge16mm3SYBTbioJ8fmJ+37TX6WwYedfyuvoQ6trfr2VnGCIA3SdQiv/O1aqSds om5sJ6SCrAip37WIOoXEAGIeJUEnGPGs3h0kKYV+yMjNtwvHfGny4va8JQ4RI7nmUPpI a4WwvYGKDzCiBEFAfO/eh868pPhsmAYPEyjpQhX7rNZx25P74SIWsv+1JSGLwaIj/jDw y/dz5w2dP9X/P8D4HR0jaJ8L5//fZIYzHlqmw+QHlzdMITCw/fd0W2dxfavhnURPLuD2 TNW+7Pn+LQ6QiHZF4tbrtVIAf23SCYpeKr+jtXeK9TNX1t4edBj4Hvy89nB4IlMKsTS8 dyMA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=Pe+sBBU2XxaVJRSSBOPIoXqkDo1ImRgF5DvLiYV96kU=; fh=sNqM0rzpS3sahstSbhU1PMo5LJZgUrUMK8BDlDCJMTE=; b=Jf/w4xBqgSNFkCYax77gXDVSjh9U76+Wv2tVZcGtJq7HmSMNsIYTwaeHCbV0OvoaMf RwZEJ8NdLy/j15QvsrIJInK7fj5ewfutgS4XJ7nlGDo2ljcfND+odS5xfi6BiQJtdhjt j7K36+htolpsRPBJRsiweB9zbTSSrr1AICk2vdxHVPtfp4vMoxO+rFwOoAuUuQWTgO+F FLJr0iCwhq22iWqNOeJVJ7q+sulL7TiAysUDCFGYddLlc6Ti7p+BYZPd5MrkVLjaeA46 C9l1LeUFrk5aEt8i19VeQNEFgqGl9y56rrheIBYHiIDxAgD/t4Vdm98HjYblQTvQb/bq n1cw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=m1sarAKo; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.35 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: from groat.vger.email (groat.vger.email. 
[23.128.96.35]) by mx.google.com with ESMTPS id h18-20020a17090acf1200b0028031758019si3312757pju.32.2023.10.30.11.21.19 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 30 Oct 2023 11:21:20 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.35 as permitted sender) client-ip=23.128.96.35; Authentication-Results: mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=m1sarAKo; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.35 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: from out1.vger.email (depot.vger.email [IPv6:2620:137:e000::3:0]) by groat.vger.email (Postfix) with ESMTP id B2EAA8058A3B; Mon, 30 Oct 2023 11:21:07 -0700 (PDT) X-Virus-Status: Clean X-Virus-Scanned: clamav-milter 0.103.10 at groat.vger.email Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232013AbjJ3SUn (ORCPT + 32 others); Mon, 30 Oct 2023 14:20:43 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:33594 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231827AbjJ3SUe (ORCPT ); Mon, 30 Oct 2023 14:20:34 -0400 Received: from mgamail.intel.com (mgamail.intel.com [192.55.52.136]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 32848C2; Mon, 30 Oct 2023 11:20:30 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1698690030; x=1730226030; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=530RSKYG0q88gdwnkVm1khu5QkPGWZ8B6AWmOKTXhCE=; b=m1sarAKoduGVRWMmATKMuXdQxiZ3mLGqae29t6wqjr2Ha/sKk8rJ+JeQ KuQKmBJjJqNDzstteoQVeBpYlvuDP/e4pW9VfqSnUrCg1Pk3yYuTC05C8 f2pSHYntvfgrHTXSPv+U03cCqBqPaPh5qfKRMWpETHYju6aw6mTq0GFOs Keyi7qmB72Mdvz9Jvn/IkzHhXdQWuikbE31V+NZCdKIzZaQCqIPW6Q4I9 1UikWyzUaft832Sa6oE+nEiwS3Zrc+lcCmfd8n7ia/lB6x61wLho/aROn T+grI1hBVIw7go/R3n1oQkAx35mya4fSVZo9tHn38JMWxI96QI/MtFxU5 g==; X-IronPort-AV: E=McAfee;i="6600,9927,10879"; a="367479523" X-IronPort-AV: E=Sophos;i="6.03,263,1694761200"; d="scan'208";a="367479523" Received: from orsmga008.jf.intel.com ([10.7.209.65]) by fmsmga106.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 30 Oct 2023 11:20:28 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10879"; a="789529497" X-IronPort-AV: E=Sophos;i="6.03,263,1694761200"; d="scan'208";a="789529497" Received: from b4969161e530.jf.intel.com ([10.165.56.46]) by orsmga008.jf.intel.com with ESMTP; 30 Oct 2023 11:20:28 -0700 From: Haitao Huang To: jarkko@kernel.org, dave.hansen@linux.intel.com, tj@kernel.org, mkoutny@suse.com, linux-kernel@vger.kernel.org, linux-sgx@vger.kernel.org, x86@kernel.org, cgroups@vger.kernel.org, tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, hpa@zytor.com, sohil.mehta@intel.com Cc: zhiquan1.li@intel.com, kristen@linux.intel.com, seanjc@google.com, zhanb@microsoft.com, anakrish@microsoft.com, mikko.ylinen@linux.intel.com, yangjie@microsoft.com, Haitao Huang Subject: [PATCH v6 01/12] cgroup/misc: Add per resource callbacks for CSS events Date: Mon, 30 Oct 2023 11:20:02 -0700 Message-Id: <20231030182013.40086-2-haitao.huang@linux.intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20231030182013.40086-1-haitao.huang@linux.intel.com> References: <20231030182013.40086-1-haitao.huang@linux.intel.com> MIME-Version: 1.0 X-Spam-Status: No, score=-1.2 required=5.0 
tests=DKIMWL_WL_HIGH,DKIM_SIGNED, DKIM_VALID,HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI, SPF_HELO_NONE,SPF_PASS autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on groat.vger.email Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-Greylist: Sender passed SPF test, not delayed by milter-greylist-4.6.4 (groat.vger.email [0.0.0.0]); Mon, 30 Oct 2023 11:21:07 -0700 (PDT) X-getmail-retrieved-from-mailbox: INBOX X-GMAIL-THRID: 1781205650617352005 X-GMAIL-MSGID: 1781205650617352005 From: Kristen Carlson Accardi The misc cgroup controller (subsystem) currently does not perform resource type specific action for Cgroups Subsystem State (CSS) events: the 'css_alloc' event when a cgroup is created and the 'css_free' event when a cgroup is destroyed. Define callbacks for those events and allow resource providers to register the callbacks per resource type as needed. This will be utilized later by the EPC misc cgroup support implemented in the SGX driver. Also add per resource type private data for those callbacks to store and access resource specific data. Signed-off-by: Kristen Carlson Accardi Co-developed-by: Haitao Huang Signed-off-by: Haitao Huang --- V6: - Create ops struct for per resource callbacks (Jarkko) - Drop max_write callback (Dave, Michal) - Style fixes (Kai) --- include/linux/misc_cgroup.h | 14 ++++++++++++++ kernel/cgroup/misc.c | 27 ++++++++++++++++++++++++--- 2 files changed, 38 insertions(+), 3 deletions(-) diff --git a/include/linux/misc_cgroup.h b/include/linux/misc_cgroup.h index e799b1f8d05b..5dc509c27c3d 100644 --- a/include/linux/misc_cgroup.h +++ b/include/linux/misc_cgroup.h @@ -27,16 +27,30 @@ struct misc_cg; #include +/** + * struct misc_operations_struct: per resource callback ops. + * @alloc: invoked for resource specific initialization when cgroup is allocated. + * @free: invoked for resource specific cleanup when cgroup is deallocated. + */ +struct misc_operations_struct { + int (*alloc)(struct misc_cg *cg); + void (*free)(struct misc_cg *cg); +}; + /** * struct misc_res: Per cgroup per misc type resource * @max: Maximum limit on the resource. * @usage: Current usage of the resource. * @events: Number of times, the resource limit exceeded. + * @priv: resource specific data. + * @misc_ops: resource specific operations. 
*/ struct misc_res { u64 max; atomic64_t usage; atomic64_t events; + void *priv; + const struct misc_operations_struct *misc_ops; }; /** diff --git a/kernel/cgroup/misc.c b/kernel/cgroup/misc.c index 79a3717a5803..d971ede44ebf 100644 --- a/kernel/cgroup/misc.c +++ b/kernel/cgroup/misc.c @@ -383,23 +383,37 @@ static struct cftype misc_cg_files[] = { static struct cgroup_subsys_state * misc_cg_alloc(struct cgroup_subsys_state *parent_css) { + struct misc_cg *parent_cg, *cg; enum misc_res_type i; - struct misc_cg *cg; + int ret; if (!parent_css) { - cg = &root_cg; + parent_cg = cg = &root_cg; } else { cg = kzalloc(sizeof(*cg), GFP_KERNEL); if (!cg) return ERR_PTR(-ENOMEM); + parent_cg = css_misc(parent_css); } for (i = 0; i < MISC_CG_RES_TYPES; i++) { WRITE_ONCE(cg->res[i].max, MAX_NUM); atomic64_set(&cg->res[i].usage, 0); + if (parent_cg->res[i].misc_ops && parent_cg->res[i].misc_ops->alloc) { + ret = parent_cg->res[i].misc_ops->alloc(cg); + if (ret) + goto alloc_err; + } } return &cg->css; + +alloc_err: + for (i = 0; i < MISC_CG_RES_TYPES; i++) + if (parent_cg->res[i].misc_ops && parent_cg->res[i].misc_ops->free) + cg->res[i].misc_ops->free(cg); + kfree(cg); + return ERR_PTR(ret); } /** @@ -410,7 +424,14 @@ misc_cg_alloc(struct cgroup_subsys_state *parent_css) */ static void misc_cg_free(struct cgroup_subsys_state *css) { - kfree(css_misc(css)); + struct misc_cg *cg = css_misc(css); + enum misc_res_type i; + + for (i = 0; i < MISC_CG_RES_TYPES; i++) + if (cg->res[i].misc_ops && cg->res[i].misc_ops->free) + cg->res[i].misc_ops->free(cg); + + kfree(cg); } /* Cgroup controller callbacks */ From patchwork Mon Oct 30 18:20:03 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Haitao Huang X-Patchwork-Id: 159839 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:d641:0:b0:403:3b70:6f57 with SMTP id cy1csp2415705vqb; Mon, 30 Oct 2023 11:21:54 -0700 (PDT) X-Google-Smtp-Source: AGHT+IFzaXOtBn6IIGeSkviZXiAzng+fB9h2G9hpPEaxnRCOeXDmIZqPRROgsKV0NizXN2Tt9vkg X-Received: by 2002:a17:90a:f015:b0:27d:114e:d4a3 with SMTP id bt21-20020a17090af01500b0027d114ed4a3mr7678920pjb.14.1698690114636; Mon, 30 Oct 2023 11:21:54 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1698690114; cv=none; d=google.com; s=arc-20160816; b=iTJcA4ib5Z8KkTr51nc+cLUgmVVNVnMztTrj6tuWXsDPrm9o7spcBUI5trbHD+UFnR qMkUut04sEsXVzp68ahV7K0vth7jO/lB1NoA6Zzddi3OfAaE9NMIM7M+SXtV34yOHtQa 8+puz7jtRjTSTUTo+Or2Zi4HZ5NcUtjqAap3F7EuUT43Wjfo0x/fBXJ0+5dMVDFbanae zsj1Rre7J6a4UwjX7WIJD3RIwWRGdzFeFrCOQ8JlFBiDCNTbR+ifBEEaSbMbLop/EJF0 KmvcmymzdzP25P3+kmyQPi/KFPOssVurD9zgQY0j0dKqrB6mZ1Ih3e3PmFYVWgjXPdzZ qTpQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=9T/4o0DpIXp2UMK9rr5U9/mFJ3MSvaBuQqv59vNsCWM=; fh=sNqM0rzpS3sahstSbhU1PMo5LJZgUrUMK8BDlDCJMTE=; b=okSohTx0QCWYdNbZf5pbxGZECVRDMMUNIroqLiWqpqE7LEaYjgMylPwNNguZAl5fdS g8Yb/mB8VCqjhP6NOsmcM8KKgPoSZ1jIYi8wQWyB1/vtFwuNQwPVpTzV/Eja4xYHWt9/ Vyasr8PMjIkJtXNhWWw4RG0Pjx1nf3rAl1FJ6sxAz/LAXETOeoRx0QGXiGKIXNt7My/9 0fvYuiPAPY7VhPRCPDQNWJDCJjqbkpgfxx6fk+ZKEnXT5fMnCixss0n92lBUFTAY9hHK CSw8r80o0pepoWF0dTiD1wxA2cwPX9UbqkMOl5Uvi408hjFQwpnD55zoVygatP/fwt3w JP5A== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=J1uvufC3; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::3:8 
as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: from fry.vger.email (fry.vger.email. [2620:137:e000::3:8]) by mx.google.com with ESMTPS id ca7-20020a17090af30700b002804113621esi2764372pjb.100.2023.10.30.11.21.53 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 30 Oct 2023 11:21:54 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::3:8 as permitted sender) client-ip=2620:137:e000::3:8; Authentication-Results: mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=J1uvufC3; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::3:8 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: from out1.vger.email (depot.vger.email [IPv6:2620:137:e000::3:0]) by fry.vger.email (Postfix) with ESMTP id 23D7A8077829; Mon, 30 Oct 2023 11:21:32 -0700 (PDT) X-Virus-Status: Clean X-Virus-Scanned: clamav-milter 0.103.10 at fry.vger.email Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232404AbjJ3SUs (ORCPT + 32 others); Mon, 30 Oct 2023 14:20:48 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:33616 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230449AbjJ3SUe (ORCPT ); Mon, 30 Oct 2023 14:20:34 -0400 Received: from mgamail.intel.com (mgamail.intel.com [192.55.52.136]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 9BDECC9; Mon, 30 Oct 2023 11:20:31 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1698690031; x=1730226031; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=lbc5z15jjbdRkyhWfBNnPpmSka2dKIEP/U6WNWcCv9k=; b=J1uvufC3k+kkb07KdbRtvHvWUb5KsOcRkkvEjyuvZuEk5AQlI+PsFT+l yz42QpYKA2ECHv2Jm3jAiV92nXdQHyW1aEkabBpjNoDBy0KJBTQtDwMxi 1OWbzK/5TQTMjPIqLK9Y1zDaT4uO4wTLRPUgPx2JnYO/fZGt/OJEHqF/Z 10Oflc3p8AWKJbV8XJEcN0VxAV8ODXBrevU+RW4Vs8QV2Q4LOP5yBPQve VifZB7S6ML9buXGosA4YSSHdQ26DzhRB+Zl8piVHauZ1jJLeWTrtIsuBd N1hImS3m7ozgwkT21Um1+d/Ga31MISgxRMU0gvRxrWYOTYjqO/d/hvwQL w==; X-IronPort-AV: E=McAfee;i="6600,9927,10879"; a="367479531" X-IronPort-AV: E=Sophos;i="6.03,263,1694761200"; d="scan'208";a="367479531" Received: from orsmga008.jf.intel.com ([10.7.209.65]) by fmsmga106.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 30 Oct 2023 11:20:28 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10879"; a="789529501" X-IronPort-AV: E=Sophos;i="6.03,263,1694761200"; d="scan'208";a="789529501" Received: from b4969161e530.jf.intel.com ([10.165.56.46]) by orsmga008.jf.intel.com with ESMTP; 30 Oct 2023 11:20:28 -0700 From: Haitao Huang To: jarkko@kernel.org, dave.hansen@linux.intel.com, tj@kernel.org, mkoutny@suse.com, linux-kernel@vger.kernel.org, linux-sgx@vger.kernel.org, x86@kernel.org, cgroups@vger.kernel.org, tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, hpa@zytor.com, sohil.mehta@intel.com Cc: zhiquan1.li@intel.com, kristen@linux.intel.com, seanjc@google.com, zhanb@microsoft.com, anakrish@microsoft.com, mikko.ylinen@linux.intel.com, yangjie@microsoft.com, Haitao Huang Subject: [PATCH v6 02/12] cgroup/misc: Export APIs for SGX driver Date: Mon, 30 Oct 2023 11:20:03 -0700 Message-Id: <20231030182013.40086-3-haitao.huang@linux.intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: 
<20231030182013.40086-1-haitao.huang@linux.intel.com> References: <20231030182013.40086-1-haitao.huang@linux.intel.com> MIME-Version: 1.0 X-Spam-Status: No, score=-1.2 required=5.0 tests=DKIMWL_WL_HIGH,DKIM_SIGNED, DKIM_VALID,HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI, SPF_HELO_NONE,SPF_PASS autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on fry.vger.email Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-Greylist: Sender passed SPF test, not delayed by milter-greylist-4.6.4 (fry.vger.email [0.0.0.0]); Mon, 30 Oct 2023 11:21:32 -0700 (PDT) X-getmail-retrieved-from-mailbox: INBOX X-GMAIL-THRID: 1781205685881666278 X-GMAIL-MSGID: 1781205685881666278 From: Kristen Carlson Accardi Export misc_cg_root() so the SGX EPC cgroup can access and do extra setup during initialization, e.g., set callbacks and private data previously defined. The SGX EPC cgroup will reclaim EPC pages when a usage in a cgroup reaches its or ancestor's limit. This requires a walk from the current cgroup up to the root similar to misc_cg_try_charge(). Export misc_cg_parent() to enable this walk. Signed-off-by: Kristen Carlson Accardi Co-developed-by: Haitao Huang Signed-off-by: Haitao Huang --- V6: - Make commit messages more concise and split the original patch into two(Kai) --- include/linux/misc_cgroup.h | 24 ++++++++++++++++++++++++ kernel/cgroup/misc.c | 21 ++++++++------------- 2 files changed, 32 insertions(+), 13 deletions(-) diff --git a/include/linux/misc_cgroup.h b/include/linux/misc_cgroup.h index 5dc509c27c3d..2a3b1f8dc669 100644 --- a/include/linux/misc_cgroup.h +++ b/include/linux/misc_cgroup.h @@ -68,6 +68,7 @@ struct misc_cg { struct misc_res res[MISC_CG_RES_TYPES]; }; +struct misc_cg *misc_cg_root(void); u64 misc_cg_res_total_usage(enum misc_res_type type); int misc_cg_set_capacity(enum misc_res_type type, u64 capacity); int misc_cg_try_charge(enum misc_res_type type, struct misc_cg *cg, u64 amount); @@ -87,6 +88,20 @@ static inline struct misc_cg *css_misc(struct cgroup_subsys_state *css) return css ? container_of(css, struct misc_cg, css) : NULL; } +/** + * misc_cg_parent() - Get the parent of the passed misc cgroup. + * @cgroup: cgroup whose parent needs to be fetched. + * + * Context: Any context. + * Return: + * * struct misc_cg* - Parent of the @cgroup. + * * %NULL - If @cgroup is null or the passed cgroup does not have a parent. + */ +static inline struct misc_cg *misc_cg_parent(struct misc_cg *cgroup) +{ + return cgroup ? css_misc(cgroup->css.parent) : NULL; +} + /* * get_current_misc_cg() - Find and get the misc cgroup of the current task. * @@ -111,6 +126,15 @@ static inline void put_misc_cg(struct misc_cg *cg) } #else /* !CONFIG_CGROUP_MISC */ +static inline struct misc_cg *misc_cg_root(void) +{ + return NULL; +} + +static inline struct misc_cg *misc_cg_parent(struct misc_cg *cg) +{ + return NULL; +} static inline u64 misc_cg_res_total_usage(enum misc_res_type type) { diff --git a/kernel/cgroup/misc.c b/kernel/cgroup/misc.c index d971ede44ebf..fa464324ccf8 100644 --- a/kernel/cgroup/misc.c +++ b/kernel/cgroup/misc.c @@ -40,18 +40,13 @@ static struct misc_cg root_cg; static u64 misc_res_capacity[MISC_CG_RES_TYPES]; /** - * parent_misc() - Get the parent of the passed misc cgroup. - * @cgroup: cgroup whose parent needs to be fetched. - * - * Context: Any context. - * Return: - * * struct misc_cg* - Parent of the @cgroup. - * * %NULL - If @cgroup is null or the passed cgroup does not have a parent. 
+ * misc_cg_root() - Return the root misc cgroup. */ -static struct misc_cg *parent_misc(struct misc_cg *cgroup) +struct misc_cg *misc_cg_root(void) { - return cgroup ? css_misc(cgroup->css.parent) : NULL; + return &root_cg; } +EXPORT_SYMBOL_GPL(misc_cg_root); /** * valid_type() - Check if @type is valid or not. @@ -150,7 +145,7 @@ int misc_cg_try_charge(enum misc_res_type type, struct misc_cg *cg, u64 amount) if (!amount) return 0; - for (i = cg; i; i = parent_misc(i)) { + for (i = cg; i; i = misc_cg_parent(i)) { res = &i->res[type]; new_usage = atomic64_add_return(amount, &res->usage); @@ -163,12 +158,12 @@ int misc_cg_try_charge(enum misc_res_type type, struct misc_cg *cg, u64 amount) return 0; err_charge: - for (j = i; j; j = parent_misc(j)) { + for (j = i; j; j = misc_cg_parent(j)) { atomic64_inc(&j->res[type].events); cgroup_file_notify(&j->events_file); } - for (j = cg; j != i; j = parent_misc(j)) + for (j = cg; j != i; j = misc_cg_parent(j)) misc_cg_cancel_charge(type, j, amount); misc_cg_cancel_charge(type, i, amount); return ret; @@ -190,7 +185,7 @@ void misc_cg_uncharge(enum misc_res_type type, struct misc_cg *cg, u64 amount) if (!(amount && valid_type(type) && cg)) return; - for (i = cg; i; i = parent_misc(i)) + for (i = cg; i; i = misc_cg_parent(i)) misc_cg_cancel_charge(type, i, amount); } EXPORT_SYMBOL_GPL(misc_cg_uncharge); From patchwork Mon Oct 30 18:20:04 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Haitao Huang X-Patchwork-Id: 159844 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:d641:0:b0:403:3b70:6f57 with SMTP id cy1csp2416295vqb; Mon, 30 Oct 2023 11:22:44 -0700 (PDT) X-Google-Smtp-Source: AGHT+IEp8Bbh5/fE559tFUt9WqvueoEIpW5+dHBGRoT/AZrKlJmlHV0Yxoong0y9b8On9omfSKxf X-Received: by 2002:a17:902:fb45:b0:1cc:379b:3505 with SMTP id lf5-20020a170902fb4500b001cc379b3505mr3362083plb.49.1698690163745; Mon, 30 Oct 2023 11:22:43 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1698690163; cv=none; d=google.com; s=arc-20160816; b=VyGd+eJLNNvSbpKVaoauNcbwVE19XN5tqwUc/MuGoTGJUQAu6RVTWFQnvhCbw4jInZ ItkhJv4wlJnM6Qt20JZAazV8N5m5WlNdzlTQDyTVxkWO8ekQjg1bbnwJh1VghOiVYQFN Q8VXLowpRn06aWC5IaRuIIPoHBzWq3DA9A5ycYm0vzTAS+agU7y18F2e9auZNzYYNDyZ poUQY23QfPO0ZwQGat2u92Qbue2aMJ4yh2hGk1bHRQcZPATY5F6wgRR5l8phFSMlMyfT Bz5xEf9PqawiVrg75fKm1UI82OI/ftlygaelszL8v6fclfFC5gcqdsZumGVuAZNIqevr 6esg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=nNZ70ezK8IHHtcw37/VaTy0+dw4rKgKwLF7fJv7J+js=; fh=sNqM0rzpS3sahstSbhU1PMo5LJZgUrUMK8BDlDCJMTE=; b=LOvpLfXH+GE+TMyqEpCi4K0UkajKm2eB6jbAYMkP18j41+RQOSuOvL7ovVjFwv0w2E e3J4ZO1Ulx3SH8fQdKGhIkh3elaeNHjNbk9kEmoADK3HEqY++k5Ak/wxA5V5x8ecN4B/ dBNQe6J7JfP7a2NVN/UCNvqyO8afXwO9MkgCq2r8LM3we7CCC0+pgU/1lGO11Yu/fUfu xX+Ijn5Z9lsXMtAQMbJb+1MCXsN24hBaIqZ5EE+ABne2fok8hgiEQNsAtDuDZBw0mmPy CJXBI1D3y4HaFdiUq3dpwMwSe3AtUuIAID78G8NW6n8yqfPAF1hvjCSJYjJgN/igZ0Nc F1Fw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=f8cwO1yQ; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.32 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: from agentk.vger.email (agentk.vger.email. 
[23.128.96.32]) by mx.google.com with ESMTPS id i8-20020a170902c94800b001cc3a66229dsi3137252pla.507.2023.10.30.11.22.43 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 30 Oct 2023 11:22:43 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.32 as permitted sender) client-ip=23.128.96.32; Authentication-Results: mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=f8cwO1yQ; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.32 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: from out1.vger.email (depot.vger.email [IPv6:2620:137:e000::3:0]) by agentk.vger.email (Postfix) with ESMTP id 35D3580870E7; Mon, 30 Oct 2023 11:21:20 -0700 (PDT) X-Virus-Status: Clean X-Virus-Scanned: clamav-milter 0.103.10 at agentk.vger.email Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232989AbjJ3SUw (ORCPT + 32 others); Mon, 30 Oct 2023 14:20:52 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:33604 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231157AbjJ3SUe (ORCPT ); Mon, 30 Oct 2023 14:20:34 -0400 Received: from mgamail.intel.com (mgamail.intel.com [192.55.52.136]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id E6650D3; Mon, 30 Oct 2023 11:20:31 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1698690032; x=1730226032; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=5GmAti5pT59JXbFdfEtdxPlG/vuMlTMqJEpR0/hTUEE=; b=f8cwO1yQUlpWvuzvVvzPEi7mNi+mvTT7xdK+LENuSuXEXUB6/1VnZh3y lwmMqhrJ3OwCGA2K4H0C7Sa96U5n+3D9Ip+P+MU6UWIaRdUxQX4fCNZH7 4ITMCfaXEco7uKEdBW0qWuEp7QGP/YmWdGWSdM9t9Ten5UqECXecR37vS WaMXJB1Q4d8Q11gKBVFSuyhA3ru/t5rtknGhpN3Nv84rZgV/pil3KteT1 8vybgFyfYWutZ1CtpgixEGMPx3Nra0XPAps3Ij6oHMtnzM0JbzBZZz89T 39kme88kaOxlWoGjKbGtZe9gWSNHG4UuHYrMZJv9Xv24xcvLwCzukPCUg Q==; X-IronPort-AV: E=McAfee;i="6600,9927,10879"; a="367479542" X-IronPort-AV: E=Sophos;i="6.03,263,1694761200"; d="scan'208";a="367479542" Received: from orsmga008.jf.intel.com ([10.7.209.65]) by fmsmga106.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 30 Oct 2023 11:20:28 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10879"; a="789529504" X-IronPort-AV: E=Sophos;i="6.03,263,1694761200"; d="scan'208";a="789529504" Received: from b4969161e530.jf.intel.com ([10.165.56.46]) by orsmga008.jf.intel.com with ESMTP; 30 Oct 2023 11:20:28 -0700 From: Haitao Huang To: jarkko@kernel.org, dave.hansen@linux.intel.com, tj@kernel.org, mkoutny@suse.com, linux-kernel@vger.kernel.org, linux-sgx@vger.kernel.org, x86@kernel.org, cgroups@vger.kernel.org, tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, hpa@zytor.com, sohil.mehta@intel.com Cc: zhiquan1.li@intel.com, kristen@linux.intel.com, seanjc@google.com, zhanb@microsoft.com, anakrish@microsoft.com, mikko.ylinen@linux.intel.com, yangjie@microsoft.com, Haitao Huang Subject: [PATCH v6 03/12] cgroup/misc: Add SGX EPC resource type Date: Mon, 30 Oct 2023 11:20:04 -0700 Message-Id: <20231030182013.40086-4-haitao.huang@linux.intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20231030182013.40086-1-haitao.huang@linux.intel.com> References: <20231030182013.40086-1-haitao.huang@linux.intel.com> MIME-Version: 1.0 X-Spam-Status: No, score=-1.2 required=5.0 
tests=DKIMWL_WL_HIGH,DKIM_SIGNED, DKIM_VALID,HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI, SPF_HELO_NONE,SPF_PASS autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on agentk.vger.email Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-Greylist: Sender passed SPF test, not delayed by milter-greylist-4.6.4 (agentk.vger.email [0.0.0.0]); Mon, 30 Oct 2023 11:21:20 -0700 (PDT) X-getmail-retrieved-from-mailbox: INBOX X-GMAIL-THRID: 1781205737629148778 X-GMAIL-MSGID: 1781205737629148778 From: Kristen Carlson Accardi Add SGX EPC memory, MISC_CG_RES_SGX_EPC, to be a valid resource type for the misc controller. Signed-off-by: Kristen Carlson Accardi Co-developed-by: Haitao Huang Signed-off-by: Haitao Huang --- V6: - Split the original patch into this and the preceding one (Kai) --- include/linux/misc_cgroup.h | 4 ++++ kernel/cgroup/misc.c | 4 ++++ 2 files changed, 8 insertions(+) diff --git a/include/linux/misc_cgroup.h b/include/linux/misc_cgroup.h index 2a3b1f8dc669..368f6c5fccae 100644 --- a/include/linux/misc_cgroup.h +++ b/include/linux/misc_cgroup.h @@ -17,6 +17,10 @@ enum misc_res_type { MISC_CG_RES_SEV, /* AMD SEV-ES ASIDs resource */ MISC_CG_RES_SEV_ES, +#endif +#ifdef CONFIG_CGROUP_SGX_EPC + /* SGX EPC memory resource */ + MISC_CG_RES_SGX_EPC, #endif MISC_CG_RES_TYPES }; diff --git a/kernel/cgroup/misc.c b/kernel/cgroup/misc.c index fa464324ccf8..a22500851fe8 100644 --- a/kernel/cgroup/misc.c +++ b/kernel/cgroup/misc.c @@ -24,6 +24,10 @@ static const char *const misc_res_name[] = { /* AMD SEV-ES ASIDs resource */ "sev_es", #endif +#ifdef CONFIG_CGROUP_SGX_EPC + /* Intel SGX EPC memory bytes */ + "sgx_epc", +#endif }; /* Root misc cgroup */ From patchwork Mon Oct 30 18:20:05 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Haitao Huang X-Patchwork-Id: 159845 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:d641:0:b0:403:3b70:6f57 with SMTP id cy1csp2416668vqb; Mon, 30 Oct 2023 11:23:17 -0700 (PDT) X-Google-Smtp-Source: AGHT+IF56aDbwHlBiOCmGksYFIRbO6RRGUEA855C4+ntLh88tY1tTjZlm9R0wsPSx7d9DulbaGKA X-Received: by 2002:a05:6a00:1a4a:b0:68a:582b:6b62 with SMTP id h10-20020a056a001a4a00b0068a582b6b62mr487244pfv.7.1698690197496; Mon, 30 Oct 2023 11:23:17 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1698690197; cv=none; d=google.com; s=arc-20160816; b=XETrtJjyfvIuHui23N6jJhjNYkSCQ+dWU2x9uebsTZcLcDzM6baxnY+nLPQl6uMAvK FaPkFCw/ap6C8j9ZUkTtE3faZ4scI3wM+sFS8eaY6UA1LT6tQeURpdzGD+MGFpEuVH6a DdGh5HAK9W67s6/+FzmTAzu/D7bE17pHX3lQhBPjoGIfvZIQ8f+pSqvHO9OuBuRBiyVn Wjj8E6jieR754HuDzsIHfRRM+I21/WJ6ZgOitgVIBA7LYudnL0Eh9xII2jjG5TAhsF9o dQfNwrsMtszI/RClyW+d7Bu5O4mcg89JW4CwUqttOYk38waNQ/tte2rq4VifOmRY3VlC JQuA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=hPS3r7hMkBFhA7BsgKWYKs42I+N3tjkMHUicVBUhylE=; fh=CTa835q0iyfQFiV5yjYPzrfvG/ulw1To85Pi/STRmhA=; b=FAlLE43jGW9/3o2fDQ5aks/SEOhLoikIbC4fBKKOJKm6sZV2x2GuEsaJVQxPHww+lD dTRoH/zhmeNm/x41SjAHJOXc5Sf+s7NpgOqUwFQU8u0zO3mUw7Lcp0UYSFzPQc8SCgO5 I2YFODKyhgcilOTyPtgqolzAIZyXDWPp5RcL115wZtte4UROxZ9UApzeILEXDUhjGNXF yH70oO5T+0XL6refcXgWtHjuIoIeKVggNascA0S1dWU+LEmLGopCELttO/Q2uUeLsn5h cj1aS7U5A9RtvWR6FgqrR+4ilwtQi/pAUNVOi6kU+8caMXyymw6cVpHTycJHMDvq0dEY H/0A== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass 
header.i=@intel.com header.s=Intel header.b=BxGkbJnS; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.34 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: from howler.vger.email (howler.vger.email. [23.128.96.34]) by mx.google.com with ESMTPS id g22-20020a056a001a1600b006c081a3fe4dsi5271763pfv.259.2023.10.30.11.23.17 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 30 Oct 2023 11:23:17 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.34 as permitted sender) client-ip=23.128.96.34; Authentication-Results: mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=BxGkbJnS; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.34 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: from out1.vger.email (depot.vger.email [IPv6:2620:137:e000::3:0]) by howler.vger.email (Postfix) with ESMTP id 94D7F8049049; Mon, 30 Oct 2023 11:23:10 -0700 (PDT) X-Virus-Status: Clean X-Virus-Scanned: clamav-milter 0.103.10 at howler.vger.email Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233321AbjJ3SVB (ORCPT + 32 others); Mon, 30 Oct 2023 14:21:01 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:33650 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232019AbjJ3SUk (ORCPT ); Mon, 30 Oct 2023 14:20:40 -0400 Received: from mgamail.intel.com (mgamail.intel.com [192.55.52.136]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 61187C5; Mon, 30 Oct 2023 11:20:33 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1698690033; x=1730226033; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=CZ5TypyOV7qEcBmcoHmI4JyV2l1WURD+EvzLJqnisRk=; b=BxGkbJnSVC6PO6eq2+s60R7uyqZgKL0mo1Xr/6N6Wvcjk8Wpeq15xqRu 4zIAbWTXB3AU5jkWZmACee8rQ4d2aUmBmPj+Y8YMREhGLFGUW2zhRAl4z 0VFXCtLOVW6580Lo6P90AxoT9vGyZbdSBzTcX5YJzZHw1yUgrBf+vCdWV FPPMkLA53f1nhYriy/nZmsmNaEAKXgM/0pX9rkWHNFVWowaHo1OokxJe5 hZsQ+takPBNFByRkXU/TGggqeuzSOkGrjPexknCeTX7H6u+7nZKNj1CEB GPJCEFsv8hhZX/UrfavftewHnmPQmzaTzWTNFfY8K6iWiu05m1GnTUR1V A==; X-IronPort-AV: E=McAfee;i="6600,9927,10879"; a="367479553" X-IronPort-AV: E=Sophos;i="6.03,263,1694761200"; d="scan'208";a="367479553" Received: from orsmga008.jf.intel.com ([10.7.209.65]) by fmsmga106.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 30 Oct 2023 11:20:29 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10879"; a="789529507" X-IronPort-AV: E=Sophos;i="6.03,263,1694761200"; d="scan'208";a="789529507" Received: from b4969161e530.jf.intel.com ([10.165.56.46]) by orsmga008.jf.intel.com with ESMTP; 30 Oct 2023 11:20:28 -0700 From: Haitao Huang To: jarkko@kernel.org, dave.hansen@linux.intel.com, tj@kernel.org, mkoutny@suse.com, linux-kernel@vger.kernel.org, linux-sgx@vger.kernel.org, x86@kernel.org, cgroups@vger.kernel.org, tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, hpa@zytor.com, sohil.mehta@intel.com Cc: zhiquan1.li@intel.com, kristen@linux.intel.com, seanjc@google.com, zhanb@microsoft.com, anakrish@microsoft.com, mikko.ylinen@linux.intel.com, yangjie@microsoft.com, Sean Christopherson , Haitao Huang Subject: [PATCH v6 04/12] x86/sgx: Implement basic EPC misc cgroup 
functionality Date: Mon, 30 Oct 2023 11:20:05 -0700 Message-Id: <20231030182013.40086-5-haitao.huang@linux.intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20231030182013.40086-1-haitao.huang@linux.intel.com> References: <20231030182013.40086-1-haitao.huang@linux.intel.com> MIME-Version: 1.0 X-Spam-Status: No, score=-1.2 required=5.0 tests=DKIMWL_WL_HIGH,DKIM_SIGNED, DKIM_VALID,HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI, SPF_HELO_NONE,SPF_PASS autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on howler.vger.email Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-Greylist: Sender passed SPF test, not delayed by milter-greylist-4.6.4 (howler.vger.email [0.0.0.0]); Mon, 30 Oct 2023 11:23:10 -0700 (PDT) X-getmail-retrieved-from-mailbox: INBOX X-GMAIL-THRID: 1781205772382245417 X-GMAIL-MSGID: 1781205772382245417 From: Kristen Carlson Accardi Implement support for cgroup control of SGX Enclave Page Cache (EPC) memory using the misc cgroup controller. EPC memory is independent from normal system memory, e.g. must be reserved at boot from RAM and cannot be converted between EPC and normal memory while the system is running. EPC is managed by the SGX subsystem and is not accounted by the memory controller. Much like normal system memory, EPC memory can be overcommitted via virtual memory techniques and pages can be swapped out of the EPC to their backing store (normal system memory, e.g. shmem). The SGX EPC subsystem is analogous to the memory subsystem and the SGX EPC controller is in turn analogous to the memory controller; it implements limit and protection models for EPC memory. The misc controller provides a mechanism to set a hard limit of EPC usage via the "sgx_epc" resource in "misc.max". The total EPC memory available on the system is reported via the "sgx_epc" resource in "misc.capacity". This patch was modified from the previous version to only add basic EPC cgroup structure, accounting allocations for cgroup usage (charge/uncharge), setup misc cgroup callbacks, set total EPC capacity. For now, the EPC cgroup simply blocks additional EPC allocation in sgx_alloc_epc_page() when the limit is reached. Reclaimable pages are still tracked in the global active list, only reclaimed by the global reclaimer when the total free page count is lower than a threshold. Later patches will reorganize the tracking and reclamation code in the globale reclaimer and implement per-cgroup tracking and reclaiming. Co-developed-by: Sean Christopherson Signed-off-by: Sean Christopherson Signed-off-by: Kristen Carlson Accardi Co-developed-by: Haitao Huang Signed-off-by: Haitao Huang --- V6: - Split the original large patch"Limit process EPC usage with misc cgroup controller" and restructure it (Kai) --- arch/x86/Kconfig | 13 ++++ arch/x86/kernel/cpu/sgx/Makefile | 1 + arch/x86/kernel/cpu/sgx/epc_cgroup.c | 103 +++++++++++++++++++++++++++ arch/x86/kernel/cpu/sgx/epc_cgroup.h | 36 ++++++++++ arch/x86/kernel/cpu/sgx/main.c | 28 ++++++++ arch/x86/kernel/cpu/sgx/sgx.h | 3 + 6 files changed, 184 insertions(+) create mode 100644 arch/x86/kernel/cpu/sgx/epc_cgroup.c create mode 100644 arch/x86/kernel/cpu/sgx/epc_cgroup.h diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index 66bfabae8814..e17c5dc3aea4 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -1921,6 +1921,19 @@ config X86_SGX If unsure, say N. 
+config CGROUP_SGX_EPC + bool "Miscellaneous Cgroup Controller for Enclave Page Cache (EPC) for Intel SGX" + depends on X86_SGX && CGROUP_MISC + help + Provides control over the EPC footprint of tasks in a cgroup via + the Miscellaneous cgroup controller. + + EPC is a subset of regular memory that is usable only by SGX + enclaves and is very limited in quantity, e.g. less than 1% + of total DRAM. + + Say N if unsure. + config X86_USER_SHADOW_STACK bool "X86 userspace shadow stack" depends on AS_WRUSS diff --git a/arch/x86/kernel/cpu/sgx/Makefile b/arch/x86/kernel/cpu/sgx/Makefile index 9c1656779b2a..12901a488da7 100644 --- a/arch/x86/kernel/cpu/sgx/Makefile +++ b/arch/x86/kernel/cpu/sgx/Makefile @@ -4,3 +4,4 @@ obj-y += \ ioctl.o \ main.o obj-$(CONFIG_X86_SGX_KVM) += virt.o +obj-$(CONFIG_CGROUP_SGX_EPC) += epc_cgroup.o diff --git a/arch/x86/kernel/cpu/sgx/epc_cgroup.c b/arch/x86/kernel/cpu/sgx/epc_cgroup.c new file mode 100644 index 000000000000..500627d0563f --- /dev/null +++ b/arch/x86/kernel/cpu/sgx/epc_cgroup.c @@ -0,0 +1,103 @@ +// SPDX-License-Identifier: GPL-2.0 +// Copyright(c) 2022 Intel Corporation. + +#include +#include +#include "epc_cgroup.h" + +static inline struct sgx_epc_cgroup *sgx_epc_cgroup_from_misc_cg(struct misc_cg *cg) +{ + return (struct sgx_epc_cgroup *)(cg->res[MISC_CG_RES_SGX_EPC].priv); +} + +static inline bool sgx_epc_cgroup_disabled(void) +{ + return !cgroup_subsys_enabled(misc_cgrp_subsys); +} + +/** + * sgx_epc_cgroup_try_charge() - hierarchically try to charge a single EPC page + * + * Returns EPC cgroup or NULL on success, -errno on failure. + */ +struct sgx_epc_cgroup *sgx_epc_cgroup_try_charge(void) +{ + struct sgx_epc_cgroup *epc_cg; + int ret; + + if (sgx_epc_cgroup_disabled()) + return NULL; + + epc_cg = sgx_epc_cgroup_from_misc_cg(get_current_misc_cg()); + ret = misc_cg_try_charge(MISC_CG_RES_SGX_EPC, epc_cg->cg, PAGE_SIZE); + + if (!ret) { + /* No epc_cg returned, release ref from get_current_misc_cg() */ + put_misc_cg(epc_cg->cg); + return ERR_PTR(-ENOMEM); + } + + /* Ref released in sgx_epc_cgroup_uncharge() */ + return epc_cg; +} + +/** + * sgx_epc_cgroup_uncharge() - hierarchically uncharge EPC pages + * @epc_cg: the charged epc cgroup + */ +void sgx_epc_cgroup_uncharge(struct sgx_epc_cgroup *epc_cg) +{ + if (sgx_epc_cgroup_disabled()) + return; + + misc_cg_uncharge(MISC_CG_RES_SGX_EPC, epc_cg->cg, PAGE_SIZE); + + /* Ref got from sgx_epc_cgroup_try_charge() */ + put_misc_cg(epc_cg->cg); +} + +static void sgx_epc_cgroup_free(struct misc_cg *cg) +{ + struct sgx_epc_cgroup *epc_cg; + + epc_cg = sgx_epc_cgroup_from_misc_cg(cg); + if (!epc_cg) + return; + + kfree(epc_cg); +} + +static int sgx_epc_cgroup_alloc(struct misc_cg *cg); + +const struct misc_operations_struct sgx_epc_cgroup_ops = { + .alloc = sgx_epc_cgroup_alloc, + .free = sgx_epc_cgroup_free, +}; + +static int sgx_epc_cgroup_alloc(struct misc_cg *cg) +{ + struct sgx_epc_cgroup *epc_cg; + + epc_cg = kzalloc(sizeof(*epc_cg), GFP_KERNEL); + if (!epc_cg) + return -ENOMEM; + + cg->res[MISC_CG_RES_SGX_EPC].misc_ops = &sgx_epc_cgroup_ops; + cg->res[MISC_CG_RES_SGX_EPC].priv = epc_cg; + epc_cg->cg = cg; + return 0; +} + +static int __init sgx_epc_cgroup_init(void) +{ + struct misc_cg *cg; + + if (!boot_cpu_has(X86_FEATURE_SGX)) + return 0; + + cg = misc_cg_root(); + BUG_ON(!cg); + + return sgx_epc_cgroup_alloc(cg); +} +subsys_initcall(sgx_epc_cgroup_init); diff --git a/arch/x86/kernel/cpu/sgx/epc_cgroup.h b/arch/x86/kernel/cpu/sgx/epc_cgroup.h new file mode 100644 index 000000000000..c3abfe82be15 
--- /dev/null +++ b/arch/x86/kernel/cpu/sgx/epc_cgroup.h @@ -0,0 +1,36 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* Copyright(c) 2022 Intel Corporation. */ +#ifndef _INTEL_SGX_EPC_CGROUP_H_ +#define _INTEL_SGX_EPC_CGROUP_H_ + +#include +#include +#include +#include +#include +#include + +#include "sgx.h" + +#ifndef CONFIG_CGROUP_SGX_EPC +#define MISC_CG_RES_SGX_EPC MISC_CG_RES_TYPES +struct sgx_epc_cgroup; + +static inline struct sgx_epc_cgroup *sgx_epc_cgroup_try_charge(void) +{ + return NULL; +} + +static inline void sgx_epc_cgroup_uncharge(struct sgx_epc_cgroup *epc_cg) { } +#else +struct sgx_epc_cgroup { + struct misc_cg *cg; +}; + +struct sgx_epc_cgroup *sgx_epc_cgroup_try_charge(void); +void sgx_epc_cgroup_uncharge(struct sgx_epc_cgroup *epc_cg); +bool sgx_epc_cgroup_lru_empty(struct misc_cg *root); + +#endif + +#endif /* _INTEL_SGX_EPC_CGROUP_H_ */ diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c index 166692f2d501..07606f391540 100644 --- a/arch/x86/kernel/cpu/sgx/main.c +++ b/arch/x86/kernel/cpu/sgx/main.c @@ -6,6 +6,7 @@ #include #include #include +#include #include #include #include @@ -17,6 +18,7 @@ #include "driver.h" #include "encl.h" #include "encls.h" +#include "epc_cgroup.h" struct sgx_epc_section sgx_epc_sections[SGX_MAX_EPC_SECTIONS]; static int sgx_nr_epc_sections; @@ -559,6 +561,11 @@ int sgx_unmark_page_reclaimable(struct sgx_epc_page *page) struct sgx_epc_page *sgx_alloc_epc_page(void *owner, bool reclaim) { struct sgx_epc_page *page; + struct sgx_epc_cgroup *epc_cg; + + epc_cg = sgx_epc_cgroup_try_charge(); + if (IS_ERR(epc_cg)) + return ERR_CAST(epc_cg); for ( ; ; ) { page = __sgx_alloc_epc_page(); @@ -580,10 +587,21 @@ struct sgx_epc_page *sgx_alloc_epc_page(void *owner, bool reclaim) break; } + /* + * Need to do a global reclamation if cgroup was not full but free + * physical pages run out, causing __sgx_alloc_epc_page() to fail. 
+ */ sgx_reclaim_pages(); cond_resched(); } + if (!IS_ERR(page)) { + WARN_ON_ONCE(page->epc_cg); + page->epc_cg = epc_cg; + } else { + sgx_epc_cgroup_uncharge(epc_cg); + } + if (sgx_should_reclaim(SGX_NR_LOW_PAGES)) wake_up(&ksgxd_waitq); @@ -604,6 +622,11 @@ void sgx_free_epc_page(struct sgx_epc_page *page) struct sgx_epc_section *section = &sgx_epc_sections[page->section]; struct sgx_numa_node *node = section->node; + if (page->epc_cg) { + sgx_epc_cgroup_uncharge(page->epc_cg); + page->epc_cg = NULL; + } + spin_lock(&node->lock); page->owner = NULL; @@ -643,6 +666,7 @@ static bool __init sgx_setup_epc_section(u64 phys_addr, u64 size, section->pages[i].flags = 0; section->pages[i].owner = NULL; section->pages[i].poison = 0; + section->pages[i].epc_cg = NULL; list_add_tail(§ion->pages[i].list, &sgx_dirty_page_list); } @@ -787,6 +811,7 @@ static void __init arch_update_sysfs_visibility(int nid) {} static bool __init sgx_page_cache_init(void) { u32 eax, ebx, ecx, edx, type; + u64 capacity = 0; u64 pa, size; int nid; int i; @@ -837,6 +862,7 @@ static bool __init sgx_page_cache_init(void) sgx_epc_sections[i].node = &sgx_numa_nodes[nid]; sgx_numa_nodes[nid].size += size; + capacity += size; sgx_nr_epc_sections++; } @@ -846,6 +872,8 @@ static bool __init sgx_page_cache_init(void) return false; } + misc_cg_set_capacity(MISC_CG_RES_SGX_EPC, capacity); + return true; } diff --git a/arch/x86/kernel/cpu/sgx/sgx.h b/arch/x86/kernel/cpu/sgx/sgx.h index d2dad21259a8..b1786774b8d2 100644 --- a/arch/x86/kernel/cpu/sgx/sgx.h +++ b/arch/x86/kernel/cpu/sgx/sgx.h @@ -29,12 +29,15 @@ /* Pages on free list */ #define SGX_EPC_PAGE_IS_FREE BIT(1) +struct sgx_epc_cgroup; + struct sgx_epc_page { unsigned int section; u16 flags; u16 poison; struct sgx_encl_page *owner; struct list_head list; + struct sgx_epc_cgroup *epc_cg; }; /* From patchwork Mon Oct 30 18:20:06 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Haitao Huang X-Patchwork-Id: 159838 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:d641:0:b0:403:3b70:6f57 with SMTP id cy1csp2415580vqb; Mon, 30 Oct 2023 11:21:44 -0700 (PDT) X-Google-Smtp-Source: AGHT+IG6Xp8sxDX52Owbu74RkFX5Ru2VawLC5QfLRJKsF+TNkCDFV4b6jOgw/VPfncKhjfAMcPn/ X-Received: by 2002:a05:6a20:da91:b0:17a:2f1:ce1 with SMTP id iy17-20020a056a20da9100b0017a02f10ce1mr11769809pzb.31.1698690104645; Mon, 30 Oct 2023 11:21:44 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1698690104; cv=none; d=google.com; s=arc-20160816; b=HvW9i+303p51VqxNec863/D9rB1Vi9ykxOYDLhxJN265BI53dOBNbaQ6bCFIn4crqY T7ePJN0z3tN+yyChHoUGB5gP+t22B8t+nOKGaD2UaAueHn1Wy/PKBlY6rlPbrWM4ERMo sZJ1IZHIALDAyUSYrVxp0IS/XNAcdu4Hk4nwwYEmdiXd1FNX3cmBTP/P+6MVMCtSgjNc ZEvxGbfrQkHaFEyZrtiJHWJxkf4AFP3ngxd0RAYbl+SaZLc2v3nNwN4fSU5Z/n6AK/BC /WmefgCEEEtegDhU+armfEkhaAKor2/uTnNJyaixboqkydGL2ahyW5hJETisTmlc99R2 gzWA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=fAvwFEpnT51WGj1WZhC61t+ZUkrrIxyKbPugqE1gF9c=; fh=CTa835q0iyfQFiV5yjYPzrfvG/ulw1To85Pi/STRmhA=; b=r4drr2wKMeLoPDN6WrXH1vLBsKogeYKojVgGAGmBGUgmMEbjtB3tL8enKkvzg2J/1q DCEGG6hO0NvNqqFsefLKhATwwgNQGvLXs5oqGwPEUOOeKbyL+RWAgxahD7d4J5RStB/A kCpy4UgH91fk2n70zWJcrt2Aw4bgIstp5yaBtS8rkYAi80xzFokJ+T7dzgWsNE0UcV6y vOmvzKNwbspPe0lfDWvTL1uJtj8LM7EiT/PMztlBCK6l0TmaQknuVHzZ7ZWyrxjQ50I+ 
fIIo3DY4emNlWvBuOjaVbw6WLrSrnwyTi3aXgRiSIG6o8kC8O+Zjb9U2mOJRCkWwCHVL L+bA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=nLiGgZCM; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::3:4 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: from howler.vger.email (howler.vger.email. [2620:137:e000::3:4]) by mx.google.com with ESMTPS id z9-20020a656649000000b00580e32f778csi5146190pgv.506.2023.10.30.11.21.43 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 30 Oct 2023 11:21:44 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::3:4 as permitted sender) client-ip=2620:137:e000::3:4; Authentication-Results: mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=nLiGgZCM; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::3:4 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: from out1.vger.email (depot.vger.email [IPv6:2620:137:e000::3:0]) by howler.vger.email (Postfix) with ESMTP id AB37E80AEB0F; Mon, 30 Oct 2023 11:21:32 -0700 (PDT) X-Virus-Status: Clean X-Virus-Scanned: clamav-milter 0.103.10 at howler.vger.email Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233142AbjJ3SU5 (ORCPT + 32 others); Mon, 30 Oct 2023 14:20:57 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:33676 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232084AbjJ3SUg (ORCPT ); Mon, 30 Oct 2023 14:20:36 -0400 Received: from mgamail.intel.com (mgamail.intel.com [192.55.52.136]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id BB750F9; Mon, 30 Oct 2023 11:20:33 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1698690033; x=1730226033; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=DcjHPzwKWtx4S8uR0MSwAzSJIgVUMVFldYzeDNnxAeU=; b=nLiGgZCMIfR0uuDBm7McZ61esFs2vx0PnR+DIYTOt44IEgQ/yJ6fgqSd n4vUdgQyI/MDTATd+Di569FtYoJz2K7+Vjv66NxBsFpfh2sZzlq778Oe1 Z/w/FZaF/TffvXIQhpQ7OE8Mmc46VscuhOpByAUfhQ+MViJ+EDNVPppD0 cAKsbfHnz7FoJPG0taUtnlgeQoCSeB4Gu7w35DG1K+VJ6emPSDJUsz+vx J3VAVhhbdiujvs+LhHKarMB8n26pQ+FYL2+Sn0i/+wRjmmKYMno1fPw52 YZE5BR1hjMsFNeO6A163ho3b7lUEXTdpjEyNOFmEwyuTKRZ2EEtiBh3E5 w==; X-IronPort-AV: E=McAfee;i="6600,9927,10879"; a="367479560" X-IronPort-AV: E=Sophos;i="6.03,263,1694761200"; d="scan'208";a="367479560" Received: from orsmga008.jf.intel.com ([10.7.209.65]) by fmsmga106.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 30 Oct 2023 11:20:29 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10879"; a="789529514" X-IronPort-AV: E=Sophos;i="6.03,263,1694761200"; d="scan'208";a="789529514" Received: from b4969161e530.jf.intel.com ([10.165.56.46]) by orsmga008.jf.intel.com with ESMTP; 30 Oct 2023 11:20:28 -0700 From: Haitao Huang To: jarkko@kernel.org, dave.hansen@linux.intel.com, tj@kernel.org, mkoutny@suse.com, linux-kernel@vger.kernel.org, linux-sgx@vger.kernel.org, x86@kernel.org, cgroups@vger.kernel.org, tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, hpa@zytor.com, sohil.mehta@intel.com Cc: zhiquan1.li@intel.com, kristen@linux.intel.com, seanjc@google.com, zhanb@microsoft.com, 
anakrish@microsoft.com, mikko.ylinen@linux.intel.com, yangjie@microsoft.com, Sean Christopherson , Haitao Huang Subject: [PATCH v6 05/12] x86/sgx: Add sgx_epc_lru_list to encapsulate LRU list Date: Mon, 30 Oct 2023 11:20:06 -0700 Message-Id: <20231030182013.40086-6-haitao.huang@linux.intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20231030182013.40086-1-haitao.huang@linux.intel.com> References: <20231030182013.40086-1-haitao.huang@linux.intel.com> MIME-Version: 1.0 X-Spam-Status: No, score=-1.2 required=5.0 tests=DKIMWL_WL_HIGH,DKIM_SIGNED, DKIM_VALID,HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI, SPF_HELO_NONE,SPF_PASS autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on howler.vger.email Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-Greylist: Sender passed SPF test, not delayed by milter-greylist-4.6.4 (howler.vger.email [0.0.0.0]); Mon, 30 Oct 2023 11:21:32 -0700 (PDT) X-getmail-retrieved-from-mailbox: INBOX X-GMAIL-THRID: 1781205674980946956 X-GMAIL-MSGID: 1781205674980946956 From: Sean Christopherson Introduce a data structure to wrap the existing reclaimable list and its spinlock. Each cgroup later will have one instance of this structure to track EPC pages allocated for processes associated with the same cgroup. Just like the global SGX reclaimer (ksgxd), an EPC cgroup reclaims pages from the reclaimable list in this structure when its usage reaches near its limit. Signed-off-by: Sean Christopherson Co-developed-by: Kristen Carlson Accardi Signed-off-by: Kristen Carlson Accardi Co-developed-by: Haitao Huang Signed-off-by: Haitao Huang Cc: Sean Christopherson --- V6: - removed introduction to unreclaimables in commit message. V4: - Removed unneeded comments for the spinlock and the non-reclaimables. (Kai, Jarkko) - Revised the commit to add introduction comments for unreclaimables and multiple LRU lists.(Kai) - Reordered the patches: delay all changes for unreclaimables to later, and this one becomes the first change in the SGX subsystem. V3: - Removed the helper functions and revised commit messages. --- arch/x86/kernel/cpu/sgx/sgx.h | 15 +++++++++++++++ 1 file changed, 15 insertions(+) diff --git a/arch/x86/kernel/cpu/sgx/sgx.h b/arch/x86/kernel/cpu/sgx/sgx.h index b1786774b8d2..0fbe6a2a159b 100644 --- a/arch/x86/kernel/cpu/sgx/sgx.h +++ b/arch/x86/kernel/cpu/sgx/sgx.h @@ -86,6 +86,21 @@ static inline void *sgx_get_epc_virt_addr(struct sgx_epc_page *page) return section->virt_addr + index * PAGE_SIZE; } +/* + * Contains EPC pages tracked by the global reclaimer (ksgxd) or an EPC + * cgroup. 
+ */ +struct sgx_epc_lru_list { + spinlock_t lock; + struct list_head reclaimable; +}; + +static inline void sgx_lru_init(struct sgx_epc_lru_list *lru) +{ + spin_lock_init(&lru->lock); + INIT_LIST_HEAD(&lru->reclaimable); +} + struct sgx_epc_page *__sgx_alloc_epc_page(void); void sgx_free_epc_page(struct sgx_epc_page *page); From patchwork Mon Oct 30 18:20:07 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Haitao Huang X-Patchwork-Id: 159840 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:d641:0:b0:403:3b70:6f57 with SMTP id cy1csp2415752vqb; Mon, 30 Oct 2023 11:21:59 -0700 (PDT) X-Google-Smtp-Source: AGHT+IEgM5HGroGACJbzyjKJF6zrEUeoB1Oywz1mm4Z5e04SaH4N94UO1Cc1WgQQf3o5oz9u/v9Y X-Received: by 2002:a05:6a00:1348:b0:693:3c11:4293 with SMTP id k8-20020a056a00134800b006933c114293mr9488744pfu.14.1698690119151; Mon, 30 Oct 2023 11:21:59 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1698690119; cv=none; d=google.com; s=arc-20160816; b=B6aOVdZtqGKMD84u58IxYL6DzZOmfksiZJh3FsFPKtuboEnK6XvRBJ8za4gPP8/zWB UWY+xKGA8zS28ZCcQaAuIlFEoQ4qIwBNWLEmgp+le8SyDXZyJdJ9LkhQ4LRNFax60d21 wsd6ymrZ59L/mjmCnhj7CwkpOc0HE5N+jefRgGsn1QdCyRQbVapHEFsBW8kEh1olAuzp r5Z+VPjdPPr/8vYxcTknrxbj5/Aqw6sQYx0rOIIW+XPWL4EU4yP8+s0tFLWbke4Bmp5r gufoqmXU3oLl7fqvD92iybuvmkJJ1pfPSTzwh29l6djqjHyOOj6BSjT8LlsgRAA3gj5S FApw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=sbGYrmfIoiAmWVQwv3OJD08Ob0eCcSzxZVCVsh5ykTg=; fh=CTa835q0iyfQFiV5yjYPzrfvG/ulw1To85Pi/STRmhA=; b=uNQ7HZusZr2NFeFLRH7cBCNtByHn9PTisz13ph+D1IpobXF5kIsG8+JlEEeiDy2iAd iU7k2eaaIrr4prgezgGvdvyvzHkHzwNkOZU6I73vZKpK2YyzpZ8E+fZFFztke6RNLT+c sWLwiq33au0ViNkOttqJPNXAlYDSdONGv7RGO2BtrZoegAf95rjI2gBeOrpAUD+Ti8Rn np12Nz2xcJRUW4q39ow2Mjqu8xGwQlqTmytJgEqdG2nVvermtvoLkeEPW4Occ2M05rOu F09jFvTXR1Jt1oCD0gKof4dMfBi46Bafn6G61RSZSSmmaisX7PCb0ivAteFHzcphS5+L AOPQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=APdDxlNd; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.35 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: from groat.vger.email (groat.vger.email. 
[23.128.96.35]) by mx.google.com with ESMTPS id k18-20020a056a00135200b006be062e4ecbsi5189228pfu.32.2023.10.30.11.21.58 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 30 Oct 2023 11:21:59 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.35 as permitted sender) client-ip=23.128.96.35; Authentication-Results: mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=APdDxlNd; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.35 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: from out1.vger.email (depot.vger.email [IPv6:2620:137:e000::3:0]) by groat.vger.email (Postfix) with ESMTP id 605EB805B203; Mon, 30 Oct 2023 11:21:43 -0700 (PDT) X-Virus-Status: Clean X-Virus-Scanned: clamav-milter 0.103.10 at groat.vger.email Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233070AbjJ3SUz (ORCPT + 32 others); Mon, 30 Oct 2023 14:20:55 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:53252 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232146AbjJ3SUh (ORCPT ); Mon, 30 Oct 2023 14:20:37 -0400 Received: from mgamail.intel.com (mgamail.intel.com [192.55.52.136]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 1C0CFF5; Mon, 30 Oct 2023 11:20:34 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1698690034; x=1730226034; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=XIxXbhwXmB5hOPvlgZIYcW3lql519F0tLhTnN6zai0s=; b=APdDxlNdZB1f7M5kGUsK2yrTPYYLB2wfafYkYj9yTa8h7/gWN77H71N/ gOsJgqZzytZLvTZP4qFJAEzYTogj9b9fzd55rOREzguqmBtsZiaY1M2ZV 1QJO6+dBu8tnWaqDKRX29sDfM968iQPWwQpJJU8EUNmgBpXWKEbs9Ac7s O9bC18PFnuZrS5uMNF2Is/k3WF9ZAb0fX1AqAug2utUXJUugD7mwNrmXy 889ZhmxsCCrDT7mbLKO/4Bl9IwheHgiqkEB19ThyAOSQB+Uh7yajaArNi lNgc46hkPUbePUht/wpQRO3nq8Rje5ohyu2hjNCtzszvX1pmUsxNKUkMi w==; X-IronPort-AV: E=McAfee;i="6600,9927,10879"; a="367479572" X-IronPort-AV: E=Sophos;i="6.03,263,1694761200"; d="scan'208";a="367479572" Received: from orsmga008.jf.intel.com ([10.7.209.65]) by fmsmga106.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 30 Oct 2023 11:20:29 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10879"; a="789529517" X-IronPort-AV: E=Sophos;i="6.03,263,1694761200"; d="scan'208";a="789529517" Received: from b4969161e530.jf.intel.com ([10.165.56.46]) by orsmga008.jf.intel.com with ESMTP; 30 Oct 2023 11:20:29 -0700 From: Haitao Huang To: jarkko@kernel.org, dave.hansen@linux.intel.com, tj@kernel.org, mkoutny@suse.com, linux-kernel@vger.kernel.org, linux-sgx@vger.kernel.org, x86@kernel.org, cgroups@vger.kernel.org, tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, hpa@zytor.com, sohil.mehta@intel.com Cc: zhiquan1.li@intel.com, kristen@linux.intel.com, seanjc@google.com, zhanb@microsoft.com, anakrish@microsoft.com, mikko.ylinen@linux.intel.com, yangjie@microsoft.com, Sean Christopherson , Haitao Huang Subject: [PATCH v6 06/12] x86/sgx: Use sgx_epc_lru_list for existing active page list Date: Mon, 30 Oct 2023 11:20:07 -0700 Message-Id: <20231030182013.40086-7-haitao.huang@linux.intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20231030182013.40086-1-haitao.huang@linux.intel.com> References: <20231030182013.40086-1-haitao.huang@linux.intel.com> MIME-Version: 1.0 X-Spam-Status: No, 
From: Sean Christopherson

In the future, each cgroup will need an LRU list to track its reclaimable EPC pages. For now, just replace the existing sgx_active_page_list in the reclaimer and its spinlock with a global sgx_epc_lru_list struct.

Signed-off-by: Sean Christopherson
Co-developed-by: Kristen Carlson Accardi
Signed-off-by: Kristen Carlson Accardi
Co-developed-by: Haitao Huang
Signed-off-by: Haitao Huang
Cc: Sean Christopherson
---
V5:
- Spelled out SECS, VA (Jarkko)
V4:
- No change, only reordered the patch.
V3:
- Remove usage of list wrapper
---
 arch/x86/kernel/cpu/sgx/main.c | 39 +++++++++++++++++-----------------
 1 file changed, 20 insertions(+), 19 deletions(-)

diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
index 07606f391540..d347acd717fd 100644
--- a/arch/x86/kernel/cpu/sgx/main.c
+++ b/arch/x86/kernel/cpu/sgx/main.c
@@ -28,10 +28,9 @@ static DEFINE_XARRAY(sgx_epc_address_space);
 
 /*
  * These variables are part of the state of the reclaimer, and must be accessed
- * with sgx_reclaimer_lock acquired.
+ * with sgx_global_lru.lock acquired.
  */
-static LIST_HEAD(sgx_active_page_list);
-static DEFINE_SPINLOCK(sgx_reclaimer_lock);
+static struct sgx_epc_lru_list sgx_global_lru;
 
 static atomic_long_t sgx_nr_free_pages = ATOMIC_LONG_INIT(0);
 
@@ -306,13 +305,13 @@ static void sgx_reclaim_pages(void)
 	int ret;
 	int i;
 
-	spin_lock(&sgx_reclaimer_lock);
+	spin_lock(&sgx_global_lru.lock);
 	for (i = 0; i < SGX_NR_TO_SCAN; i++) {
-		if (list_empty(&sgx_active_page_list))
+		epc_page = list_first_entry_or_null(&sgx_global_lru.reclaimable,
+						    struct sgx_epc_page, list);
+		if (!epc_page)
 			break;
 
-		epc_page = list_first_entry(&sgx_active_page_list,
-					    struct sgx_epc_page, list);
 		list_del_init(&epc_page->list);
 		encl_page = epc_page->owner;
 
@@ -324,7 +323,7 @@ static void sgx_reclaim_pages(void)
 		 */
 		epc_page->flags &= ~SGX_EPC_PAGE_RECLAIMER_TRACKED;
 	}
-	spin_unlock(&sgx_reclaimer_lock);
+	spin_unlock(&sgx_global_lru.lock);
 
 	for (i = 0; i < cnt; i++) {
 		epc_page = chunk[i];
@@ -347,9 +346,9 @@ static void sgx_reclaim_pages(void)
 		continue;
 
 skip:
-		spin_lock(&sgx_reclaimer_lock);
-		list_add_tail(&epc_page->list, &sgx_active_page_list);
-		spin_unlock(&sgx_reclaimer_lock);
+		spin_lock(&sgx_global_lru.lock);
+		list_add_tail(&epc_page->list, &sgx_global_lru.reclaimable);
+		spin_unlock(&sgx_global_lru.lock);
 
 		kref_put(&encl_page->encl->refcount, sgx_encl_release);
 
@@ -380,7 +379,7 @@ static void sgx_reclaim_pages(void)
 static bool sgx_should_reclaim(unsigned long watermark)
 {
 	return atomic_long_read(&sgx_nr_free_pages) < watermark &&
-	       !list_empty(&sgx_active_page_list);
+	       !list_empty(&sgx_global_lru.reclaimable);
 }
 
 /*
@@ -432,6 +431,8 @@ static bool __init sgx_page_reclaimer_init(void)
 
 	ksgxd_tsk = tsk;
 
+	sgx_lru_init(&sgx_global_lru);
+
 	return true;
 }
 
@@ -507,10 +508,10 @@ struct sgx_epc_page *__sgx_alloc_epc_page(void)
  */
 void sgx_mark_page_reclaimable(struct sgx_epc_page *page)
 {
-	spin_lock(&sgx_reclaimer_lock);
+	spin_lock(&sgx_global_lru.lock);
 	page->flags |= SGX_EPC_PAGE_RECLAIMER_TRACKED;
-	list_add_tail(&page->list, &sgx_active_page_list);
-	spin_unlock(&sgx_reclaimer_lock);
+	list_add_tail(&page->list, &sgx_global_lru.reclaimable);
+	spin_unlock(&sgx_global_lru.lock);
 }
 
 /**
@@ -525,18 +526,18 @@ void sgx_mark_page_reclaimable(struct sgx_epc_page *page)
  */
 int sgx_unmark_page_reclaimable(struct sgx_epc_page *page)
 {
-	spin_lock(&sgx_reclaimer_lock);
+	spin_lock(&sgx_global_lru.lock);
 	if (page->flags & SGX_EPC_PAGE_RECLAIMER_TRACKED) {
 		/* The page is being reclaimed. */
 		if (list_empty(&page->list)) {
-			spin_unlock(&sgx_reclaimer_lock);
+			spin_unlock(&sgx_global_lru.lock);
 			return -EBUSY;
 		}
 
 		list_del(&page->list);
 		page->flags &= ~SGX_EPC_PAGE_RECLAIMER_TRACKED;
 	}
-	spin_unlock(&sgx_reclaimer_lock);
+	spin_unlock(&sgx_global_lru.lock);
 
 	return 0;
 }
 
@@ -574,7 +575,7 @@ struct sgx_epc_page *sgx_alloc_epc_page(void *owner, bool reclaim)
 			break;
 		}
 
-		if (list_empty(&sgx_active_page_list))
+		if (list_empty(&sgx_global_lru.reclaimable))
 			return ERR_PTR(-ENOMEM);
 
 		if (!reclaim) {
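The conversion above boils down to one pattern: scan from the head of the LRU under its lock. A condensed, illustrative sketch of that pattern (example_scan_lru_head() is hypothetical, simplified from sgx_reclaim_pages() above):

/* Illustrative only: the head-scan pattern the reclaimer now uses. */
static unsigned int example_scan_lru_head(struct sgx_epc_lru_list *lru, unsigned int nr)
{
	struct sgx_epc_page *epc_page;
	unsigned int taken = 0;

	spin_lock(&lru->lock);
	while (taken < nr) {
		/* Oldest pages sit at the head of the reclaimable list. */
		epc_page = list_first_entry_or_null(&lru->reclaimable,
						    struct sgx_epc_page, list);
		if (!epc_page)
			break;

		list_del_init(&epc_page->list);
		taken++;
	}
	spin_unlock(&lru->lock);

	return taken;
}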
From patchwork Mon Oct 30 18:20:08 2023
X-Patchwork-Submitter: Haitao Huang
X-Patchwork-Id: 159843
From: Haitao Huang
Subject: [PATCH v6 07/12] x86/sgx: Introduce EPC page states
Date: Mon, 30 Oct 2023 11:20:08 -0700
Message-Id: <20231030182013.40086-8-haitao.huang@linux.intel.com>
In-Reply-To: <20231030182013.40086-1-haitao.huang@linux.intel.com>
Use the lower 2 bits in the flags field of the sgx_epc_page struct to track EPC page states, and define an enum of the possible states for EPC pages tracked for reclamation.

Add the RECLAIM_IN_PROGRESS state to explicitly mark a page that has been identified as a candidate for reclaiming but has not yet been reclaimed, instead of relying on list_empty(&epc_page->list). A later patch will replace the on-stack array with a temporary list to store the candidate pages, so list_empty() can no longer be used for this purpose.

Co-developed-by: Sean Christopherson
Signed-off-by: Sean Christopherson
Co-developed-by: Kristen Carlson Accardi
Signed-off-by: Kristen Carlson Accardi
Signed-off-by: Haitao Huang
Cc: Sean Christopherson
---
V6:
- Drop UNRECLAIMABLE and use only 2 bits for states (Kai)
- Combine the patch for RECLAIM_IN_PROGRESS
- Style fixes (Jarkko and Kai)
---
 arch/x86/kernel/cpu/sgx/encl.c |  2 +-
 arch/x86/kernel/cpu/sgx/main.c | 33 +++++++++---------
 arch/x86/kernel/cpu/sgx/sgx.h  | 62 +++++++++++++++++++++++++++++++---
 3 files changed, 76 insertions(+), 21 deletions(-)

diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c
index 279148e72459..17dc108d3ff7 100644
--- a/arch/x86/kernel/cpu/sgx/encl.c
+++ b/arch/x86/kernel/cpu/sgx/encl.c
@@ -1315,7 +1315,7 @@ void sgx_encl_free_epc_page(struct sgx_epc_page *page)
 {
 	int ret;
 
-	WARN_ON_ONCE(page->flags & SGX_EPC_PAGE_RECLAIMER_TRACKED);
+	WARN_ON_ONCE(page->flags & SGX_EPC_PAGE_STATE_MASK);
 
 	ret = __eremove(sgx_get_epc_virt_addr(page));
 	if (WARN_ONCE(ret, EREMOVE_ERROR_MESSAGE, ret, ret))
diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
index d347acd717fd..e27ac73d8843 100644
--- a/arch/x86/kernel/cpu/sgx/main.c
+++ b/arch/x86/kernel/cpu/sgx/main.c
@@ -315,13 +315,14 @@ static void sgx_reclaim_pages(void)
 		list_del_init(&epc_page->list);
 		encl_page = epc_page->owner;
 
-		if (kref_get_unless_zero(&encl_page->encl->refcount) != 0)
+		if (kref_get_unless_zero(&encl_page->encl->refcount) != 0) {
+			sgx_epc_page_set_state(epc_page, SGX_EPC_PAGE_RECLAIM_IN_PROGRESS);
 			chunk[cnt++] = epc_page;
-		else
+		} else
 			/* The owner is freeing the page. No need to add the
 			 * page back to the list of reclaimable pages.
*/ - epc_page->flags &= ~SGX_EPC_PAGE_RECLAIMER_TRACKED; + sgx_epc_page_reset_state(epc_page); } spin_unlock(&sgx_global_lru.lock); @@ -347,6 +348,7 @@ static void sgx_reclaim_pages(void) skip: spin_lock(&sgx_global_lru.lock); + sgx_epc_page_set_state(epc_page, SGX_EPC_PAGE_RECLAIMABLE); list_add_tail(&epc_page->list, &sgx_global_lru.reclaimable); spin_unlock(&sgx_global_lru.lock); @@ -370,7 +372,7 @@ static void sgx_reclaim_pages(void) sgx_reclaimer_write(epc_page, &backing[i]); kref_put(&encl_page->encl->refcount, sgx_encl_release); - epc_page->flags &= ~SGX_EPC_PAGE_RECLAIMER_TRACKED; + sgx_epc_page_reset_state(epc_page); sgx_free_epc_page(epc_page); } @@ -509,7 +511,8 @@ struct sgx_epc_page *__sgx_alloc_epc_page(void) void sgx_mark_page_reclaimable(struct sgx_epc_page *page) { spin_lock(&sgx_global_lru.lock); - page->flags |= SGX_EPC_PAGE_RECLAIMER_TRACKED; + WARN_ON_ONCE(sgx_epc_page_reclaimable(page->flags)); + page->flags |= SGX_EPC_PAGE_RECLAIMABLE; list_add_tail(&page->list, &sgx_global_lru.reclaimable); spin_unlock(&sgx_global_lru.lock); } @@ -527,16 +530,13 @@ void sgx_mark_page_reclaimable(struct sgx_epc_page *page) int sgx_unmark_page_reclaimable(struct sgx_epc_page *page) { spin_lock(&sgx_global_lru.lock); - if (page->flags & SGX_EPC_PAGE_RECLAIMER_TRACKED) { - /* The page is being reclaimed. */ - if (list_empty(&page->list)) { - spin_unlock(&sgx_global_lru.lock); - return -EBUSY; - } - - list_del(&page->list); - page->flags &= ~SGX_EPC_PAGE_RECLAIMER_TRACKED; + if (sgx_epc_page_reclaim_in_progress(page->flags)) { + spin_unlock(&sgx_global_lru.lock); + return -EBUSY; } + + list_del(&page->list); + sgx_epc_page_reset_state(page); spin_unlock(&sgx_global_lru.lock); return 0; @@ -623,6 +623,7 @@ void sgx_free_epc_page(struct sgx_epc_page *page) struct sgx_epc_section *section = &sgx_epc_sections[page->section]; struct sgx_numa_node *node = section->node; + WARN_ON_ONCE(page->flags & (SGX_EPC_PAGE_STATE_MASK)); if (page->epc_cg) { sgx_epc_cgroup_uncharge(page->epc_cg); page->epc_cg = NULL; @@ -635,7 +636,7 @@ void sgx_free_epc_page(struct sgx_epc_page *page) list_add(&page->list, &node->sgx_poison_page_list); else list_add_tail(&page->list, &node->free_page_list); - page->flags = SGX_EPC_PAGE_IS_FREE; + page->flags = SGX_EPC_PAGE_FREE; spin_unlock(&node->lock); atomic_long_inc(&sgx_nr_free_pages); @@ -737,7 +738,7 @@ int arch_memory_failure(unsigned long pfn, int flags) * If the page is on a free list, move it to the per-node * poison page list. */ - if (page->flags & SGX_EPC_PAGE_IS_FREE) { + if (page->flags == SGX_EPC_PAGE_FREE) { list_move(&page->list, &node->sgx_poison_page_list); goto out; } diff --git a/arch/x86/kernel/cpu/sgx/sgx.h b/arch/x86/kernel/cpu/sgx/sgx.h index 0fbe6a2a159b..dd7ab65b5b27 100644 --- a/arch/x86/kernel/cpu/sgx/sgx.h +++ b/arch/x86/kernel/cpu/sgx/sgx.h @@ -23,11 +23,44 @@ #define SGX_NR_LOW_PAGES 32 #define SGX_NR_HIGH_PAGES 64 -/* Pages, which are being tracked by the page reclaimer. */ -#define SGX_EPC_PAGE_RECLAIMER_TRACKED BIT(0) +enum sgx_epc_page_state { + /* + * Allocated but not tracked by the reclaimer. + * + * Pages allocated for virtual EPC which are never tracked by the host + * reclaimer; pages just allocated from free list but not yet put in + * use; pages just reclaimed, but not yet returned to the free list. + * Becomes FREE after sgx_free_epc(). + * Becomes RECLAIMABLE after sgx_mark_page_reclaimable(). + */ + SGX_EPC_PAGE_NOT_TRACKED = 0, + + /* + * Page is in the free list, ready for allocation. 
+	 *
+	 * Becomes NOT_TRACKED after sgx_alloc_epc_page().
+	 */
+	SGX_EPC_PAGE_FREE = 1,
+
+	/*
+	 * Page is in use and tracked in a reclaimable LRU list.
+	 *
+	 * Becomes NOT_TRACKED after sgx_unmark_page_reclaimable().
+	 * Becomes RECLAIM_IN_PROGRESS in sgx_reclaim_pages() when identified
+	 * for reclaiming.
+	 */
+	SGX_EPC_PAGE_RECLAIMABLE = 2,
+
+	/*
+	 * Page is in the middle of reclamation.
+	 *
+	 * Back to RECLAIMABLE if reclamation fails for any reason.
+	 * Becomes NOT_TRACKED if reclaimed successfully.
+	 */
+	SGX_EPC_PAGE_RECLAIM_IN_PROGRESS = 3,
+};
 
-/* Pages on free list */
-#define SGX_EPC_PAGE_IS_FREE BIT(1)
+#define SGX_EPC_PAGE_STATE_MASK GENMASK(1, 0)
 
 struct sgx_epc_cgroup;
 
@@ -40,6 +73,27 @@ struct sgx_epc_page {
 	struct sgx_epc_cgroup *epc_cg;
 };
 
+static inline void sgx_epc_page_reset_state(struct sgx_epc_page *page)
+{
+	page->flags &= ~SGX_EPC_PAGE_STATE_MASK;
+}
+
+static inline void sgx_epc_page_set_state(struct sgx_epc_page *page, unsigned long flags)
+{
+	page->flags &= ~SGX_EPC_PAGE_STATE_MASK;
+	page->flags |= (flags & SGX_EPC_PAGE_STATE_MASK);
+}
+
+static inline bool sgx_epc_page_reclaim_in_progress(unsigned long flags)
+{
+	return SGX_EPC_PAGE_RECLAIM_IN_PROGRESS == (flags & SGX_EPC_PAGE_STATE_MASK);
+}
+
+static inline bool sgx_epc_page_reclaimable(unsigned long flags)
+{
+	return SGX_EPC_PAGE_RECLAIMABLE == (flags & SGX_EPC_PAGE_STATE_MASK);
+}
+
 /*
  * Contains the tracking data for NUMA nodes having EPC pages. Most importantly,
  * the free page list local to the node is stored here.
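Taken together, the enum and helpers above define a small state machine. An illustrative walk through the documented transitions (example_page_state_walk() is hypothetical; the real transitions happen inside the reclaimer and the mark/unmark helpers):

/* Illustrative only: exercises the state helpers and transitions documented above. */
static void example_page_state_walk(struct sgx_epc_page *page)
{
	/* Freshly allocated pages have the state bits clear, i.e. NOT_TRACKED. */
	sgx_epc_page_reset_state(page);

	/* sgx_mark_page_reclaimable() moves the page to RECLAIMABLE. */
	sgx_epc_page_set_state(page, SGX_EPC_PAGE_RECLAIMABLE);
	WARN_ON_ONCE(!sgx_epc_page_reclaimable(page->flags));

	/* The reclaimer tags candidates RECLAIM_IN_PROGRESS while working on them. */
	sgx_epc_page_set_state(page, SGX_EPC_PAGE_RECLAIM_IN_PROGRESS);
	WARN_ON_ONCE(!sgx_epc_page_reclaim_in_progress(page->flags));

	/* Successful reclaim (or the owner freeing the page) clears the state again. */
	sgx_epc_page_reset_state(page);
}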
From patchwork Mon Oct 30 18:20:09 2023
X-Patchwork-Submitter: Haitao Huang
X-Patchwork-Id: 159841
From: Haitao Huang
Subject: [PATCH v6 08/12] x86/sgx: Use a list to track to-be-reclaimed pages
Date: Mon, 30 Oct 2023 11:20:09 -0700
Message-Id: <20231030182013.40086-9-haitao.huang@linux.intel.com>
In-Reply-To: <20231030182013.40086-1-haitao.huang@linux.intel.com>
From: Sean Christopherson

Change sgx_reclaim_pages() to use a list rather than an array for storing the EPC pages that will be reclaimed. This change is needed to transition to the LRU implementation for EPC cgroup support.

When the EPC cgroup is implemented, the reclaiming process will do a pre-order walk of the subtree starting from the limit-violating cgroup. When each node is visited, candidate pages are selected from its "reclaimable" LRU list and moved into this temporary list. Passing a list from node to node for temporary storage in this walk is more straightforward than using an array.

Signed-off-by: Sean Christopherson
Co-developed-by: Kristen Carlson Accardi
Signed-off-by: Kristen Carlson Accardi
Co-developed-by: Haitao Huang
Signed-off-by: Haitao Huang
Cc: Sean Christopherson
---
V6:
- Remove extra list_del_init and style fix (Kai)
V4:
- Changes needed for patch reordering
- Revised commit message
V3:
- Removed list wrappers
---
 arch/x86/kernel/cpu/sgx/main.c | 35 +++++++++++++++-------------------
 1 file changed, 15 insertions(+), 20 deletions(-)

diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
index e27ac73d8843..33bcba313d40 100644
--- a/arch/x86/kernel/cpu/sgx/main.c
+++ b/arch/x86/kernel/cpu/sgx/main.c
@@ -296,12 +296,11 @@ static void sgx_reclaimer_write(struct sgx_epc_page *epc_page,
  */
 static void sgx_reclaim_pages(void)
 {
-	struct sgx_epc_page *chunk[SGX_NR_TO_SCAN];
 	struct sgx_backing backing[SGX_NR_TO_SCAN];
+	struct sgx_epc_page *epc_page, *tmp;
 	struct sgx_encl_page *encl_page;
-	struct sgx_epc_page *epc_page;
 	pgoff_t page_index;
-	int cnt = 0;
+	LIST_HEAD(iso);
 	int ret;
 	int i;
 
@@ -317,7 +316,7 @@ static void sgx_reclaim_pages(void)
 
 		if (kref_get_unless_zero(&encl_page->encl->refcount) != 0) {
 			sgx_epc_page_set_state(epc_page, SGX_EPC_PAGE_RECLAIM_IN_PROGRESS);
-			chunk[cnt++] = epc_page;
+			list_move_tail(&epc_page->list, &iso);
 		} else
 			/* The owner is freeing the page. No need to add the
 			 * page back to the list of reclaimable pages.
@@ -326,8 +325,11 @@ static void sgx_reclaim_pages(void)
 	}
 	spin_unlock(&sgx_global_lru.lock);
 
-	for (i = 0; i < cnt; i++) {
-		epc_page = chunk[i];
+	if (list_empty(&iso))
+		return;
+
+	i = 0;
+	list_for_each_entry_safe(epc_page, tmp, &iso, list) {
 		encl_page = epc_page->owner;
 
 		if (!sgx_reclaimer_age(epc_page))
@@ -342,6 +344,7 @@ static void sgx_reclaim_pages(void)
 			goto skip;
 		}
 
+		i++;
 		encl_page->desc |= SGX_ENCL_PAGE_BEING_RECLAIMED;
 		mutex_unlock(&encl_page->encl->lock);
 		continue;
@@ -349,27 +352,19 @@ static void sgx_reclaim_pages(void)
 skip:
 		spin_lock(&sgx_global_lru.lock);
 		sgx_epc_page_set_state(epc_page, SGX_EPC_PAGE_RECLAIMABLE);
-		list_add_tail(&epc_page->list, &sgx_global_lru.reclaimable);
+		list_move_tail(&epc_page->list, &sgx_global_lru.reclaimable);
 		spin_unlock(&sgx_global_lru.lock);
 
 		kref_put(&encl_page->encl->refcount, sgx_encl_release);
-
-		chunk[i] = NULL;
-	}
-
-	for (i = 0; i < cnt; i++) {
-		epc_page = chunk[i];
-		if (epc_page)
-			sgx_reclaimer_block(epc_page);
 	}
 
-	for (i = 0; i < cnt; i++) {
-		epc_page = chunk[i];
-		if (!epc_page)
-			continue;
+	list_for_each_entry(epc_page, &iso, list)
+		sgx_reclaimer_block(epc_page);
 
+	i = 0;
+	list_for_each_entry_safe(epc_page, tmp, &iso, list) {
 		encl_page = epc_page->owner;
-		sgx_reclaimer_write(epc_page, &backing[i]);
+		sgx_reclaimer_write(epc_page, &backing[i++]);
 
 		kref_put(&encl_page->encl->refcount, sgx_encl_release);
 		sgx_epc_page_reset_state(epc_page);
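The change above is essentially a switch to the "isolate onto a local list, then process that list" idiom. A condensed, illustrative sketch of the idiom (example_isolate_and_process() is hypothetical):

/* Illustrative only: the "isolate onto a local list, then process it" idiom. */
static void example_isolate_and_process(struct sgx_epc_lru_list *lru)
{
	struct sgx_epc_page *epc_page, *tmp;
	LIST_HEAD(iso);
	int i;

	spin_lock(&lru->lock);
	for (i = 0; i < SGX_NR_TO_SCAN; i++) {
		epc_page = list_first_entry_or_null(&lru->reclaimable,
						    struct sgx_epc_page, list);
		if (!epc_page)
			break;
		/* Keep the page linked, just move it to the private list. */
		list_move_tail(&epc_page->list, &iso);
	}
	spin_unlock(&lru->lock);

	/* Safe variant because entries may be unlinked while walking. */
	list_for_each_entry_safe(epc_page, tmp, &iso, list)
		list_del_init(&epc_page->list);	/* stand-in for the real per-page work */
}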
From patchwork Mon Oct 30 18:20:10 2023
X-Patchwork-Submitter: Haitao Huang
X-Patchwork-Id: 159846
From: Haitao Huang
Subject: [PATCH v6 09/12] x86/sgx: Restructure top-level EPC reclaim function
Date: Mon, 30 Oct 2023 11:20:10 -0700
Message-Id: <20231030182013.40086-10-haitao.huang@linux.intel.com>
In-Reply-To: <20231030182013.40086-1-haitao.huang@linux.intel.com>
From: Sean Christopherson

To prepare for per-cgroup reclamation, separate the top-level reclaim function, sgx_reclaim_epc_pages(), into two separate functions:

- sgx_isolate_epc_pages() scans and isolates reclaimable pages from a given LRU list.
- sgx_do_epc_reclamation() performs the real reclamation for the already isolated pages.

Create a new function, sgx_reclaim_epc_pages_global(), calling those two in succession, to replace the original sgx_reclaim_epc_pages(). The above two functions will serve as building blocks for the reclamation flows in the later EPC cgroup implementation.

sgx_do_epc_reclamation() returns the number of reclaimed pages. The EPC cgroup will use the result to track reclaiming progress. sgx_isolate_epc_pages() returns the number of pages remaining to scan for the current epoch of reclamation. The EPC cgroup will use the result to determine whether more scanning needs to be done in the LRUs of its child groups.

Signed-off-by: Sean Christopherson
Co-developed-by: Kristen Carlson Accardi
Signed-off-by: Kristen Carlson Accardi
Co-developed-by: Haitao Huang
Signed-off-by: Haitao Huang
Cc: Sean Christopherson
---
V6:
- Restructure patches to make it easier to review. (Kai)
- Fix unused nr_to_scan (Kai)
---
 arch/x86/kernel/cpu/sgx/main.c | 97 ++++++++++++++++++++++------------
 arch/x86/kernel/cpu/sgx/sgx.h  |  8 +++
 2 files changed, 72 insertions(+), 33 deletions(-)

diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
index 33bcba313d40..e8848b493eb7 100644
--- a/arch/x86/kernel/cpu/sgx/main.c
+++ b/arch/x86/kernel/cpu/sgx/main.c
@@ -281,33 +281,23 @@ static void sgx_reclaimer_write(struct sgx_epc_page *epc_page,
 	mutex_unlock(&encl->lock);
 }
 
-/*
- * Take a fixed number of pages from the head of the active page pool and
- * reclaim them to the enclave's private shmem files. Skip the pages, which have
- * been accessed since the last scan. Move those pages to the tail of active
- * page pool so that the pages get scanned in LRU like fashion.
+/**
+ * sgx_isolate_epc_pages() - Isolate pages from an LRU for reclaim
+ * @lru:	LRU from which to reclaim
+ * @nr_to_scan:	Number of pages to scan for reclaim
+ * @dst:	Destination list to hold the isolated pages
  *
- * Batch process a chunk of pages (at the moment 16) in order to degrade amount
- * of IPI's and ETRACK's potentially required. sgx_encl_ewb() does degrade a bit
- * among the HW threads with three stage EWB pipeline (EWB, ETRACK + EWB and IPI
- * + EWB) but not sufficiently. Reclaiming one page at a time would also be
- * problematic as it would increase the lock contention too much, which would
- * halt forward progress.
+ * Return: remaining pages to scan, i.e, @nr_to_scan minus the number of pages scanned.
*/ -static void sgx_reclaim_pages(void) +unsigned int sgx_isolate_epc_pages(struct sgx_epc_lru_list *lru, unsigned int nr_to_scan, + struct list_head *dst) { - struct sgx_backing backing[SGX_NR_TO_SCAN]; - struct sgx_epc_page *epc_page, *tmp; struct sgx_encl_page *encl_page; - pgoff_t page_index; - LIST_HEAD(iso); - int ret; - int i; + struct sgx_epc_page *epc_page; - spin_lock(&sgx_global_lru.lock); - for (i = 0; i < SGX_NR_TO_SCAN; i++) { - epc_page = list_first_entry_or_null(&sgx_global_lru.reclaimable, - struct sgx_epc_page, list); + spin_lock(&lru->lock); + for (; nr_to_scan > 0; --nr_to_scan) { + epc_page = list_first_entry_or_null(&lru->reclaimable, struct sgx_epc_page, list); if (!epc_page) break; @@ -316,23 +306,53 @@ static void sgx_reclaim_pages(void) if (kref_get_unless_zero(&encl_page->encl->refcount) != 0) { sgx_epc_page_set_state(epc_page, SGX_EPC_PAGE_RECLAIM_IN_PROGRESS); - list_move_tail(&epc_page->list, &iso); + list_move_tail(&epc_page->list, dst); } else /* The owner is freeing the page. No need to add the * page back to the list of reclaimable pages. */ sgx_epc_page_reset_state(epc_page); } - spin_unlock(&sgx_global_lru.lock); + spin_unlock(&lru->lock); + + return nr_to_scan; +} + +/** + * sgx_do_epc_reclamation() - Perform reclamation for isolated EPC pages. + * @iso: List of isolated pages for reclamation + * + * Take a list of EPC pages and reclaim them to the enclave's private shmem files. Do not + * reclaim the pages that have been accessed since the last scan, and move each of those pages + * to the tail of its tracking LRU list. + * + * Limit the number of pages to be processed up to SGX_NR_TO_SCAN_MAX per call in order to + * degrade amount of IPI's and ETRACK's potentially required. sgx_encl_ewb() does degrade a bit + * among the HW threads with three stage EWB pipeline (EWB, ETRACK + EWB and IPI + EWB) but not + * sufficiently. Reclaiming one page at a time would also be problematic as it would increase + * the lock contention too much, which would halt forward progress. + * + * Extra pages in the list beyond the SGX_NR_TO_SCAN_MAX limit are skipped and returned back to + * their tracking LRU lists. + * + * Return: number of pages successfully reclaimed. 
+ */ +unsigned int sgx_do_epc_reclamation(struct list_head *iso) +{ + struct sgx_backing backing[SGX_NR_TO_SCAN_MAX]; + struct sgx_epc_page *epc_page, *tmp; + struct sgx_encl_page *encl_page; + pgoff_t page_index; + size_t ret, i; - if (list_empty(&iso)) - return; + if (list_empty(iso)) + return 0; i = 0; - list_for_each_entry_safe(epc_page, tmp, &iso, list) { + list_for_each_entry_safe(epc_page, tmp, iso, list) { encl_page = epc_page->owner; - if (!sgx_reclaimer_age(epc_page)) + if (i == SGX_NR_TO_SCAN_MAX || !sgx_reclaimer_age(epc_page)) goto skip; page_index = PFN_DOWN(encl_page->desc - encl_page->encl->base); @@ -358,11 +378,11 @@ static void sgx_reclaim_pages(void) kref_put(&encl_page->encl->refcount, sgx_encl_release); } - list_for_each_entry(epc_page, &iso, list) + list_for_each_entry(epc_page, iso, list) sgx_reclaimer_block(epc_page); i = 0; - list_for_each_entry_safe(epc_page, tmp, &iso, list) { + list_for_each_entry_safe(epc_page, tmp, iso, list) { encl_page = epc_page->owner; sgx_reclaimer_write(epc_page, &backing[i++]); @@ -371,6 +391,17 @@ static void sgx_reclaim_pages(void) sgx_free_epc_page(epc_page); } + + return i; +} + +static void sgx_reclaim_epc_pages_global(void) +{ + LIST_HEAD(iso); + + sgx_isolate_epc_pages(&sgx_global_lru, SGX_NR_TO_SCAN, &iso); + + sgx_do_epc_reclamation(&iso); } static bool sgx_should_reclaim(unsigned long watermark) @@ -387,7 +418,7 @@ static bool sgx_should_reclaim(unsigned long watermark) void sgx_reclaim_direct(void) { if (sgx_should_reclaim(SGX_NR_LOW_PAGES)) - sgx_reclaim_pages(); + sgx_reclaim_epc_pages_global(); } static int ksgxd(void *p) @@ -410,7 +441,7 @@ static int ksgxd(void *p) sgx_should_reclaim(SGX_NR_HIGH_PAGES)); if (sgx_should_reclaim(SGX_NR_HIGH_PAGES)) - sgx_reclaim_pages(); + sgx_reclaim_epc_pages_global(); cond_resched(); } @@ -587,7 +618,7 @@ struct sgx_epc_page *sgx_alloc_epc_page(void *owner, bool reclaim) * Need to do a global reclamation if cgroup was not full but free * physical pages run out, causing __sgx_alloc_epc_page() to fail. */ - sgx_reclaim_pages(); + sgx_reclaim_epc_pages_global(); cond_resched(); } diff --git a/arch/x86/kernel/cpu/sgx/sgx.h b/arch/x86/kernel/cpu/sgx/sgx.h index dd7ab65b5b27..6a40f70ed96f 100644 --- a/arch/x86/kernel/cpu/sgx/sgx.h +++ b/arch/x86/kernel/cpu/sgx/sgx.h @@ -19,6 +19,11 @@ #define SGX_MAX_EPC_SECTIONS 8 #define SGX_EEXTEND_BLOCK_SIZE 256 + +/* + * Maximum number of pages to scan for reclaiming. 
+ */
+#define SGX_NR_TO_SCAN_MAX	32U
 #define SGX_NR_TO_SCAN		16
 #define SGX_NR_LOW_PAGES	32
 #define SGX_NR_HIGH_PAGES	64
 
@@ -162,6 +167,9 @@ void sgx_reclaim_direct(void);
 void sgx_mark_page_reclaimable(struct sgx_epc_page *page);
 int sgx_unmark_page_reclaimable(struct sgx_epc_page *page);
 struct sgx_epc_page *sgx_alloc_epc_page(void *owner, bool reclaim);
+unsigned int sgx_do_epc_reclamation(struct list_head *iso);
+unsigned int sgx_isolate_epc_pages(struct sgx_epc_lru_list *lru, unsigned int nr_to_scan,
+				   struct list_head *dst);
 
 void sgx_ipi_cb(void *info);
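With the split in place, any LRU can be reclaimed by chaining the two building blocks declared above. An illustrative sketch of such a caller (example_reclaim_lru() is hypothetical; sgx_reclaim_epc_pages_global() in the hunk above is the real in-tree user):

/* Illustrative only: chain the two building blocks for an arbitrary LRU. */
static unsigned int example_reclaim_lru(struct sgx_epc_lru_list *lru, unsigned int nr_to_scan)
{
	LIST_HEAD(iso);

	/* The return value (the unused scan budget) is ignored in this sketch. */
	sgx_isolate_epc_pages(lru, nr_to_scan, &iso);

	/* Returns the number of pages actually reclaimed. */
	return sgx_do_epc_reclamation(&iso);
}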
From patchwork Mon Oct 30 18:20:11 2023
X-Patchwork-Submitter: Haitao Huang
X-Patchwork-Id: 159842
From: Haitao Huang
Subject: [PATCH v6 10/12] x86/sgx: Implement EPC reclamation for cgroup
Date: Mon, 30 Oct 2023 11:20:11 -0700
Message-Id: <20231030182013.40086-11-haitao.huang@linux.intel.com>
In-Reply-To: <20231030182013.40086-1-haitao.huang@linux.intel.com>
From: Kristen Carlson Accardi

Currently, all reclaimable pages are tracked only in the global LRU list, and only the global reclaimer (ksgxd) performs reclamation, when the global free page count falls below a threshold. When a cgroup limit is reached, the cgroup also needs to try to reclaim pages allocated within that group. This patch enables per-cgroup reclamation.

Add a helper function, sgx_lru_list(), that returns, for a given EPC page, the LRU list of the cgroup assigned to that page at allocation time. This helper is used to replace the hard-coded global LRU wherever appropriate: modify sgx_mark/unmark_page_reclaimable() to track EPC pages in the LRU list of the appropriate cgroup; modify sgx_do_epc_reclamation() to return unreclaimed pages back to the proper cgroup's LRU.

Implement the reclamation flow for a cgroup, encapsulated in the top-level function sgx_epc_cgroup_reclaim_pages(). Just like the global reclaimer, the cgroup reclaimer first isolates candidate pages for reclaim, then invokes sgx_do_epc_reclamation(). The only difference is that a cgroup does a pre-order walk on its subtree to scan for candidate pages from its own LRU and the LRUs of its descendants.

In some contexts, e.g. page fault handling, only asynchronous reclamation is allowed. Create a workqueue, 'sgx_epc_cg_wq', plus the corresponding work item and function definitions to support asynchronous reclamation. Add a Boolean parameter to sgx_epc_cgroup_try_charge() to indicate whether synchronous reclaim is allowed or not. Both the synchronous and asynchronous flows invoke the same top-level reclaim function, sgx_epc_cgroup_reclaim_pages().

All reclaimable pages are tracked in per-cgroup LRUs when the cgroup controller is enabled. Update the original global reclaimer to reclaim from the root cgroup in that case, also by calling sgx_epc_cgroup_reclaim_pages().

Co-developed-by: Sean Christopherson
Signed-off-by: Sean Christopherson
Signed-off-by: Kristen Carlson Accardi
Co-developed-by: Haitao Huang
Signed-off-by: Haitao Huang
---
V6:
- Drop EPC OOM killer. (Dave, Michal)
- Patch restructuring: this includes the part split from the patch "Limit process EPC usage with misc cgroup controller", combined with "Prepare for multiple LRUs"
- Removed force reclamation ignoring 'youngness' of the pages
- Removed checking for capacity in reclamation loop.
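As a rough orientation before the diff, a simplified outline of the try-charge flow described above (example_try_charge_outline() is hypothetical and omits the ENOMEM, signal, and livelock handling that the real __sgx_epc_cgroup_try_charge() below implements):

/* Simplified outline of the charge/reclaim loop described above; see the diff below. */
static int example_try_charge_outline(struct sgx_epc_cgroup *epc_cg, bool reclaim)
{
	for (;;) {
		if (!misc_cg_try_charge(MISC_CG_RES_SGX_EPC, epc_cg->cg, PAGE_SIZE))
			return 0;		/* the charge fits under every limit */

		if (!reclaim) {
			/* Cannot reclaim synchronously: defer to the cgroup's worker. */
			queue_work(sgx_epc_cg_wq, &epc_cg->reclaim_work);
			return -EBUSY;
		}

		/* Synchronous path: reclaim from this cgroup's subtree and retry. */
		sgx_epc_cgroup_reclaim_pages(1, epc_cg->cg);
	}
}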
--- arch/x86/kernel/cpu/sgx/epc_cgroup.c | 224 ++++++++++++++++++++++++++- arch/x86/kernel/cpu/sgx/epc_cgroup.h | 19 ++- arch/x86/kernel/cpu/sgx/main.c | 71 ++++++--- 3 files changed, 289 insertions(+), 25 deletions(-) diff --git a/arch/x86/kernel/cpu/sgx/epc_cgroup.c b/arch/x86/kernel/cpu/sgx/epc_cgroup.c index 500627d0563f..110d44c0ef7c 100644 --- a/arch/x86/kernel/cpu/sgx/epc_cgroup.c +++ b/arch/x86/kernel/cpu/sgx/epc_cgroup.c @@ -5,6 +5,38 @@ #include #include "epc_cgroup.h" +#define SGX_EPC_RECLAIM_MIN_PAGES 16U + +static struct workqueue_struct *sgx_epc_cg_wq; + +static inline u64 sgx_epc_cgroup_page_counter_read(struct sgx_epc_cgroup *epc_cg) +{ + return atomic64_read(&epc_cg->cg->res[MISC_CG_RES_SGX_EPC].usage) / PAGE_SIZE; +} + +static inline u64 sgx_epc_cgroup_max_pages(struct sgx_epc_cgroup *epc_cg) +{ + return READ_ONCE(epc_cg->cg->res[MISC_CG_RES_SGX_EPC].max) / PAGE_SIZE; +} + +/* + * Get the lower bound of limits of a cgroup and its ancestors. Used in + * sgx_epc_cgroup_reclaim_work_func() to determine if EPC usage of a cgroup is over its limit + * or its ancestors' hence reclamation is needed. + */ +static inline u64 sgx_epc_cgroup_max_pages_to_root(struct sgx_epc_cgroup *epc_cg) +{ + struct misc_cg *i = epc_cg->cg; + u64 m = U64_MAX; + + while (i) { + m = min(m, READ_ONCE(i->res[MISC_CG_RES_SGX_EPC].max)); + i = misc_cg_parent(i); + } + + return m / PAGE_SIZE; +} + static inline struct sgx_epc_cgroup *sgx_epc_cgroup_from_misc_cg(struct misc_cg *cg) { return (struct sgx_epc_cgroup *)(cg->res[MISC_CG_RES_SGX_EPC].priv); @@ -15,12 +47,188 @@ static inline bool sgx_epc_cgroup_disabled(void) return !cgroup_subsys_enabled(misc_cgrp_subsys); } +/** + * sgx_epc_cgroup_lru_empty() - check if a cgroup tree has no pages on its LRUs + * @root: root of the tree to check + * + * Return: %true if all cgroups under the specified root have empty LRU lists. + * Used to avoid livelocks due to a cgroup having a non-zero charge count but + * no pages on its LRUs, e.g. due to a dead enclave waiting to be released or + * because all pages in the cgroup are unreclaimable. 
+ */ +bool sgx_epc_cgroup_lru_empty(struct misc_cg *root) +{ + struct cgroup_subsys_state *css_root; + struct cgroup_subsys_state *pos; + struct sgx_epc_cgroup *epc_cg; + bool ret = true; + + /* + * Caller ensure css_root ref acquired + */ + css_root = &root->css; + + rcu_read_lock(); + css_for_each_descendant_pre(pos, css_root) { + if (!css_tryget(pos)) + break; + + rcu_read_unlock(); + + epc_cg = sgx_epc_cgroup_from_misc_cg(css_misc(pos)); + + spin_lock(&epc_cg->lru.lock); + ret = list_empty(&epc_cg->lru.reclaimable); + spin_unlock(&epc_cg->lru.lock); + + rcu_read_lock(); + css_put(pos); + if (!ret) + break; + } + + rcu_read_unlock(); + + return ret; +} + +/** + * sgx_epc_cgroup_isolate_pages() - walk a cgroup tree and scan LRUs to select pages for + * reclamation + * @root: root of the tree to start walking + * @nr_to_scan: The number of pages to scan + * @dst: Destination list to hold the isolated pages + */ +void sgx_epc_cgroup_isolate_pages(struct misc_cg *root, + unsigned int nr_to_scan, struct list_head *dst) +{ + struct cgroup_subsys_state *css_root; + struct cgroup_subsys_state *pos; + struct sgx_epc_cgroup *epc_cg; + + if (!nr_to_scan) + return; + + /* Caller ensure css_root ref acquired */ + css_root = &root->css; + + rcu_read_lock(); + css_for_each_descendant_pre(pos, css_root) { + if (!css_tryget(pos)) + break; + rcu_read_unlock(); + + epc_cg = sgx_epc_cgroup_from_misc_cg(css_misc(pos)); + nr_to_scan = sgx_isolate_epc_pages(&epc_cg->lru, nr_to_scan, dst); + + rcu_read_lock(); + css_put(pos); + if (!nr_to_scan) + break; + } + + rcu_read_unlock(); +} + +static unsigned int sgx_epc_cgroup_reclaim_pages(unsigned int nr_pages, + struct misc_cg *root) +{ + LIST_HEAD(iso); + /* + * Attempting to reclaim only a few pages will often fail and is inefficient, while + * reclaiming a huge number of pages can result in soft lockups due to holding various + * locks for an extended duration. + */ + nr_pages = max(nr_pages, SGX_EPC_RECLAIM_MIN_PAGES); + nr_pages = min(nr_pages, SGX_NR_TO_SCAN_MAX); + sgx_epc_cgroup_isolate_pages(root, nr_pages, &iso); + + return sgx_do_epc_reclamation(&iso); +} + +/* + * Scheduled by sgx_epc_cgroup_try_charge() to reclaim pages from the cgroup when the cgroup is + * at/near its maximum capacity + */ +static void sgx_epc_cgroup_reclaim_work_func(struct work_struct *work) +{ + struct sgx_epc_cgroup *epc_cg; + u64 cur, max; + + epc_cg = container_of(work, struct sgx_epc_cgroup, reclaim_work); + + for (;;) { + max = sgx_epc_cgroup_max_pages_to_root(epc_cg); + + /* + * Adjust the limit down by one page, the goal is to free up + * pages for fault allocations, not to simply obey the limit. + * Conditionally decrementing max also means the cur vs. max + * check will correctly handle the case where both are zero. + */ + if (max) + max--; + + /* + * Unless the limit is extremely low, in which case forcing + * reclaim will likely cause thrashing, force the cgroup to + * reclaim at least once if it's operating *near* its maximum + * limit by adjusting @max down by half the min reclaim size. + * This work func is scheduled by sgx_epc_cgroup_try_charge + * when it cannot directly reclaim due to being in an atomic + * context, e.g. EPC allocation in a fault handler. Waiting + * to reclaim until the cgroup is actually at its limit is less + * performant as it means the faulting task is effectively + * blocked until a worker makes its way through the global work + * queue. 
+ */ + if (max > SGX_NR_TO_SCAN_MAX) + max -= (SGX_EPC_RECLAIM_MIN_PAGES / 2); + + cur = sgx_epc_cgroup_page_counter_read(epc_cg); + + if (cur <= max || sgx_epc_cgroup_lru_empty(epc_cg->cg)) + break; + + /* Keep reclaiming until above condition is met. */ + sgx_epc_cgroup_reclaim_pages((unsigned int)(cur - max), epc_cg->cg); + } +} + +static int __sgx_epc_cgroup_try_charge(struct sgx_epc_cgroup *epc_cg, + bool reclaim) +{ + for (;;) { + if (!misc_cg_try_charge(MISC_CG_RES_SGX_EPC, epc_cg->cg, + PAGE_SIZE)) + break; + + if (sgx_epc_cgroup_lru_empty(epc_cg->cg)) + return -ENOMEM; + + if (signal_pending(current)) + return -ERESTARTSYS; + + if (!reclaim) { + queue_work(sgx_epc_cg_wq, &epc_cg->reclaim_work); + return -EBUSY; + } + + if (!sgx_epc_cgroup_reclaim_pages(1, epc_cg->cg)) + /* All pages were too young to reclaim, try again */ + schedule(); + } + + return 0; +} + /** * sgx_epc_cgroup_try_charge() - hierarchically try to charge a single EPC page + * @reclaim: whether or not synchronous reclaim is allowed * * Returns EPC cgroup or NULL on success, -errno on failure. */ -struct sgx_epc_cgroup *sgx_epc_cgroup_try_charge(void) +struct sgx_epc_cgroup *sgx_epc_cgroup_try_charge(bool reclaim) { struct sgx_epc_cgroup *epc_cg; int ret; @@ -29,12 +237,12 @@ struct sgx_epc_cgroup *sgx_epc_cgroup_try_charge(void) return NULL; epc_cg = sgx_epc_cgroup_from_misc_cg(get_current_misc_cg()); - ret = misc_cg_try_charge(MISC_CG_RES_SGX_EPC, epc_cg->cg, PAGE_SIZE); + ret = __sgx_epc_cgroup_try_charge(epc_cg, reclaim); - if (!ret) { + if (ret) { /* No epc_cg returned, release ref from get_current_misc_cg() */ put_misc_cg(epc_cg->cg); - return ERR_PTR(-ENOMEM); + return ERR_PTR(ret); } /* Ref released in sgx_epc_cgroup_uncharge() */ @@ -64,6 +272,7 @@ static void sgx_epc_cgroup_free(struct misc_cg *cg) if (!epc_cg) return; + cancel_work_sync(&epc_cg->reclaim_work); kfree(epc_cg); } @@ -82,6 +291,8 @@ static int sgx_epc_cgroup_alloc(struct misc_cg *cg) if (!epc_cg) return -ENOMEM; + sgx_lru_init(&epc_cg->lru); + INIT_WORK(&epc_cg->reclaim_work, sgx_epc_cgroup_reclaim_work_func); cg->res[MISC_CG_RES_SGX_EPC].misc_ops = &sgx_epc_cgroup_ops; cg->res[MISC_CG_RES_SGX_EPC].priv = epc_cg; epc_cg->cg = cg; @@ -95,6 +306,11 @@ static int __init sgx_epc_cgroup_init(void) if (!boot_cpu_has(X86_FEATURE_SGX)) return 0; + sgx_epc_cg_wq = alloc_workqueue("sgx_epc_cg_wq", + WQ_UNBOUND | WQ_FREEZABLE, + WQ_UNBOUND_MAX_ACTIVE); + BUG_ON(!sgx_epc_cg_wq); + cg = misc_cg_root(); BUG_ON(!cg); diff --git a/arch/x86/kernel/cpu/sgx/epc_cgroup.h b/arch/x86/kernel/cpu/sgx/epc_cgroup.h index c3abfe82be15..ddc1b89f2805 100644 --- a/arch/x86/kernel/cpu/sgx/epc_cgroup.h +++ b/arch/x86/kernel/cpu/sgx/epc_cgroup.h @@ -16,20 +16,33 @@ #define MISC_CG_RES_SGX_EPC MISC_CG_RES_TYPES struct sgx_epc_cgroup; -static inline struct sgx_epc_cgroup *sgx_epc_cgroup_try_charge(void) +static inline struct sgx_epc_cgroup *sgx_epc_cgroup_try_charge(bool reclaim) { return NULL; } static inline void sgx_epc_cgroup_uncharge(struct sgx_epc_cgroup *epc_cg) { } + +static inline void sgx_epc_cgroup_isolate_pages(struct misc_cg *root, + unsigned int nr_to_scan, + struct list_head *dst) { } + +static bool sgx_epc_cgroup_lru_empty(struct misc_cg *root) +{ + return true; +} #else struct sgx_epc_cgroup { - struct misc_cg *cg; + struct misc_cg *cg; + struct sgx_epc_lru_list lru; + struct work_struct reclaim_work; }; -struct sgx_epc_cgroup *sgx_epc_cgroup_try_charge(void); +struct sgx_epc_cgroup *sgx_epc_cgroup_try_charge(bool reclaim); void sgx_epc_cgroup_uncharge(struct 
sgx_epc_cgroup *epc_cg); bool sgx_epc_cgroup_lru_empty(struct misc_cg *root); +void sgx_epc_cgroup_isolate_pages(struct misc_cg *root, unsigned int nr_to_scan, + struct list_head *dst); #endif diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c index e8848b493eb7..c496b8f15b54 100644 --- a/arch/x86/kernel/cpu/sgx/main.c +++ b/arch/x86/kernel/cpu/sgx/main.c @@ -32,6 +32,31 @@ static DEFINE_XARRAY(sgx_epc_address_space); */ static struct sgx_epc_lru_list sgx_global_lru; +#ifndef CONFIG_CGROUP_SGX_EPC +static inline struct sgx_epc_lru_list *sgx_lru_list(struct sgx_epc_page *epc_page) +{ + return &sgx_global_lru; +} +#else +static inline struct sgx_epc_lru_list *sgx_lru_list(struct sgx_epc_page *epc_page) +{ + if (epc_page->epc_cg) + return &epc_page->epc_cg->lru; + + /* This should not happen if kernel is configured correctly */ + WARN_ON_ONCE(1); + return &sgx_global_lru; +} +#endif + +static inline bool sgx_can_reclaim(void) +{ + if (IS_ENABLED(CONFIG_CGROUP_SGX_EPC)) + return !sgx_epc_cgroup_lru_empty(misc_cg_root()); + + return !list_empty(&sgx_global_lru.reclaimable); +} + static atomic_long_t sgx_nr_free_pages = ATOMIC_LONG_INIT(0); /* Nodes with one or more EPC sections. */ @@ -342,6 +367,7 @@ unsigned int sgx_do_epc_reclamation(struct list_head *iso) struct sgx_backing backing[SGX_NR_TO_SCAN_MAX]; struct sgx_epc_page *epc_page, *tmp; struct sgx_encl_page *encl_page; + struct sgx_epc_lru_list *lru; pgoff_t page_index; size_t ret, i; @@ -370,10 +396,11 @@ unsigned int sgx_do_epc_reclamation(struct list_head *iso) continue; skip: - spin_lock(&sgx_global_lru.lock); + lru = sgx_lru_list(epc_page); + spin_lock(&lru->lock); sgx_epc_page_set_state(epc_page, SGX_EPC_PAGE_RECLAIMABLE); - list_move_tail(&epc_page->list, &sgx_global_lru.reclaimable); - spin_unlock(&sgx_global_lru.lock); + list_move_tail(&epc_page->list, &lru->reclaimable); + spin_unlock(&lru->lock); kref_put(&encl_page->encl->refcount, sgx_encl_release); } @@ -397,9 +424,13 @@ unsigned int sgx_do_epc_reclamation(struct list_head *iso) static void sgx_reclaim_epc_pages_global(void) { + unsigned int nr_to_scan = SGX_NR_TO_SCAN; LIST_HEAD(iso); - sgx_isolate_epc_pages(&sgx_global_lru, SGX_NR_TO_SCAN, &iso); + if (IS_ENABLED(CONFIG_CGROUP_SGX_EPC)) + sgx_epc_cgroup_isolate_pages(misc_cg_root(), nr_to_scan, &iso); + else + sgx_isolate_epc_pages(&sgx_global_lru, nr_to_scan, &iso); sgx_do_epc_reclamation(&iso); } @@ -407,7 +438,7 @@ static void sgx_reclaim_epc_pages_global(void) static bool sgx_should_reclaim(unsigned long watermark) { return atomic_long_read(&sgx_nr_free_pages) < watermark && - !list_empty(&sgx_global_lru.reclaimable); + sgx_can_reclaim(); } /* @@ -528,26 +559,26 @@ struct sgx_epc_page *__sgx_alloc_epc_page(void) } /** - * sgx_mark_page_reclaimable() - Mark a page as reclaimable + * sgx_mark_page_reclaimable() - Mark a page as reclaimable and add it to an appropriate LRU * @page: EPC page * - * Mark a page as reclaimable and add it to the active page list. Pages - * are automatically removed from the active list when freed. 
*/ void sgx_mark_page_reclaimable(struct sgx_epc_page *page) { - spin_lock(&sgx_global_lru.lock); + struct sgx_epc_lru_list *lru = sgx_lru_list(page); + + spin_lock(&lru->lock); WARN_ON_ONCE(sgx_epc_page_reclaimable(page->flags)); page->flags |= SGX_EPC_PAGE_RECLAIMABLE; - list_add_tail(&page->list, &sgx_global_lru.reclaimable); - spin_unlock(&sgx_global_lru.lock); + list_add_tail(&page->list, &lru->reclaimable); + spin_unlock(&lru->lock); } /** * sgx_unmark_page_reclaimable() - Remove a page from the reclaim list * @page: EPC page * - * Clear the reclaimable flag and remove the page from the active page list. + * Clear the reclaimable flag if set and remove the page from its LRU. * * Return: * 0 on success, @@ -555,15 +586,17 @@ void sgx_mark_page_reclaimable(struct sgx_epc_page *page) */ int sgx_unmark_page_reclaimable(struct sgx_epc_page *page) { - spin_lock(&sgx_global_lru.lock); + struct sgx_epc_lru_list *lru = sgx_lru_list(page); + + spin_lock(&lru->lock); if (sgx_epc_page_reclaim_in_progress(page->flags)) { - spin_unlock(&sgx_global_lru.lock); + spin_unlock(&lru->lock); return -EBUSY; } list_del(&page->list); sgx_epc_page_reset_state(page); - spin_unlock(&sgx_global_lru.lock); + spin_unlock(&lru->lock); return 0; } @@ -590,7 +623,7 @@ struct sgx_epc_page *sgx_alloc_epc_page(void *owner, bool reclaim) struct sgx_epc_page *page; struct sgx_epc_cgroup *epc_cg; - epc_cg = sgx_epc_cgroup_try_charge(); + epc_cg = sgx_epc_cgroup_try_charge(reclaim); if (IS_ERR(epc_cg)) return ERR_CAST(epc_cg); @@ -601,8 +634,10 @@ struct sgx_epc_page *sgx_alloc_epc_page(void *owner, bool reclaim) break; } - if (list_empty(&sgx_global_lru.reclaimable)) - return ERR_PTR(-ENOMEM); + if (!sgx_can_reclaim()) { + page = ERR_PTR(-ENOMEM); + break; + } if (!reclaim) { page = ERR_PTR(-EBUSY); From patchwork Mon Oct 30 18:20:12 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Haitao Huang X-Patchwork-Id: 159848 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:d641:0:b0:403:3b70:6f57 with SMTP id cy1csp2416865vqb; Mon, 30 Oct 2023 11:23:35 -0700 (PDT) X-Google-Smtp-Source: AGHT+IGA8V/K3OAcc9YrzZTj+6+FHgjuewLmONUjpDhsL0L77XjsiRl0RQZ/K97onsf61IBioWBZ X-Received: by 2002:a05:6a00:21d0:b0:68e:2f6e:b4c0 with SMTP id t16-20020a056a0021d000b0068e2f6eb4c0mr9061503pfj.28.1698690214774; Mon, 30 Oct 2023 11:23:34 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1698690214; cv=none; d=google.com; s=arc-20160816; b=eUjM8LjamNdoXxGY+SZr5Uj+mT69WPn19YWbeErwI7jCwx3d0+hGtjGNGVh3yC7gcY BYBDUAzvdiL/Gqt8Jc3TYgr1XP5jA67le7/BXXlBLZ+uUHfRp+v9Gc79hR55HAn+G0y5 Qz48ToDglHebadq92XNj1SCCCQHTuakHPj5Gm4h6RBmcDYTU7VFDiBibSaSNokXDddpx ODlg3f9zftvrtCZUu0LLeISajl610ct/rxLiEK4ZH+congAhbv6/xZRDzqRfKU9Hs//X 8PtHTwSdOPU9oZyUu+imAP5CK7cQTqb04MakVcJWYGLBXQWDyLbpYuVGNTcZPI5JMd4S ya7Q== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=k5ADfNm9pzqDxA7yGCXld5DiJo52D2lEIMAEYaTAoEE=; fh=CTa835q0iyfQFiV5yjYPzrfvG/ulw1To85Pi/STRmhA=; b=X4QOGFkAF+TAjkjOl8fW2+Ow5Vd47sOoUN2YftTRqPW429+SnXvNTfhLUEjPSsyQKT XvlGQDa4aTivYN6nppVlyErKd+Yfp1VOdrz6b39Y+xgQvlKBQrtQSh/dNR8lQ6ZH5nCZ j4bBNHDsEomJzDewj3T1jGtmyq/wb6hdUMx9uqNY+jewkZiaZbamUumQUiW6waesY0SG rngkgYcpxj5iw883/zPZBJASOCLh3vZxc4RRyADN6Jx4Uh9dy+h4abcsBQLNZ3hC6l9N Ty0c1UuJM0qYGX7mnMJtOjFhDpe+5hAtUhAWEUvlcNRcHOtDsoPg0qYJ1Y46ZnOckt/h E+WA== 
ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=dKJ3Ocih; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::3:5 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: from groat.vger.email (groat.vger.email. [2620:137:e000::3:5]) by mx.google.com with ESMTPS id c3-20020a056a000ac300b006c1222d68e2si1373571pfl.196.2023.10.30.11.23.33 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 30 Oct 2023 11:23:34 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::3:5 as permitted sender) client-ip=2620:137:e000::3:5; Authentication-Results: mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=dKJ3Ocih; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::3:5 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: from out1.vger.email (depot.vger.email [IPv6:2620:137:e000::3:0]) by groat.vger.email (Postfix) with ESMTP id 8EDA5805B205; Mon, 30 Oct 2023 11:23:26 -0700 (PDT) X-Virus-Status: Clean X-Virus-Scanned: clamav-milter 0.103.10 at groat.vger.email Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233761AbjJ3SVY (ORCPT + 32 others); Mon, 30 Oct 2023 14:21:24 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:53272 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232645AbjJ3SUq (ORCPT ); Mon, 30 Oct 2023 14:20:46 -0400 Received: from mgamail.intel.com (mgamail.intel.com [192.55.52.136]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 0287210B; Mon, 30 Oct 2023 11:20:36 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1698690037; x=1730226037; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=zF19XFFiqPbLLTSJfpDVbR2CVLpYKiYAcawYZXkE6BY=; b=dKJ3OcihbkND6gY5GqP9gflXvAXxIzl8V6KEUSDbh3Pvm04SP6tdpyHi ft9Y3Tgdr5HPsDrz9kvWTi12q+1fJvzdGagCHiHJHarie9o19R6VXEDOJ vVpVC8HPpeFyBwQSvO0qB25N0B9nHTW7R59RxBaLfegI4JZ+1tq8mWG62 +VEz5FFzQl1UAabuyZW/g3Jontb7zi3MMEV2hNASaYlOErl+dKxhtfZuk OmwGIN+oMG7/sz0H8FcReMnfAmJ6pUf3VDFHcBvC31s8DCqBuvnjuDw2f ftk958JBVY/VULcsu6IPRemuqeKtkxXaZr9yfCOJf1O8eMR5gFTgE8Lun Q==; X-IronPort-AV: E=McAfee;i="6600,9927,10879"; a="367479640" X-IronPort-AV: E=Sophos;i="6.03,263,1694761200"; d="scan'208";a="367479640" Received: from orsmga008.jf.intel.com ([10.7.209.65]) by fmsmga106.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 30 Oct 2023 11:20:31 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10879"; a="789529534" X-IronPort-AV: E=Sophos;i="6.03,263,1694761200"; d="scan'208";a="789529534" Received: from b4969161e530.jf.intel.com ([10.165.56.46]) by orsmga008.jf.intel.com with ESMTP; 30 Oct 2023 11:20:30 -0700 From: Haitao Huang To: jarkko@kernel.org, dave.hansen@linux.intel.com, tj@kernel.org, mkoutny@suse.com, linux-kernel@vger.kernel.org, linux-sgx@vger.kernel.org, x86@kernel.org, cgroups@vger.kernel.org, tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, hpa@zytor.com, sohil.mehta@intel.com Cc: zhiquan1.li@intel.com, kristen@linux.intel.com, seanjc@google.com, zhanb@microsoft.com, anakrish@microsoft.com, mikko.ylinen@linux.intel.com, yangjie@microsoft.com, Sean Christopherson 
, Haitao Huang Subject: [PATCH v6 11/12] Docs/x86/sgx: Add description for cgroup support Date: Mon, 30 Oct 2023 11:20:12 -0700 Message-Id: <20231030182013.40086-12-haitao.huang@linux.intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20231030182013.40086-1-haitao.huang@linux.intel.com> References: <20231030182013.40086-1-haitao.huang@linux.intel.com> MIME-Version: 1.0 X-Spam-Status: No, score=-1.2 required=5.0 tests=DKIMWL_WL_HIGH,DKIM_SIGNED, DKIM_VALID,HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI, SPF_HELO_NONE,SPF_PASS autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on groat.vger.email Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-Greylist: Sender passed SPF test, not delayed by milter-greylist-4.6.4 (groat.vger.email [0.0.0.0]); Mon, 30 Oct 2023 11:23:26 -0700 (PDT) X-getmail-retrieved-from-mailbox: INBOX X-GMAIL-THRID: 1781205790770613937 X-GMAIL-MSGID: 1781205790770613937 From: Sean Christopherson Add initial documentation of how to regulate the distribution of SGX Enclave Page Cache (EPC) memory via the Miscellaneous cgroup controller. Signed-off-by: Sean Christopherson Co-developed-by: Kristen Carlson Accardi Signed-off-by: Kristen Carlson Accardi Co-developed-by: Haitao Huang Signed-off-by: Haitao Huang Cc: Sean Christopherson --- V6: - Remove mentioning of VMM specific behavior on handling SIGBUS - Remove statement of forced reclamation, add statement to specify ENOMEM returned when no reclamation possible. - Added statements on the non-preemptive nature for the max limit - Dropped Reviewed-by tag because of changes V4: - Fix indentation (Randy) - Change misc.events file to be read-only - Fix a typo for 'subsystem' - Add behavior when VMM overcommit EPC with a cgroup (Mikko) --- Documentation/arch/x86/sgx.rst | 74 ++++++++++++++++++++++++++++++++++ 1 file changed, 74 insertions(+) diff --git a/Documentation/arch/x86/sgx.rst b/Documentation/arch/x86/sgx.rst index d90796adc2ec..dfc8fac13ab2 100644 --- a/Documentation/arch/x86/sgx.rst +++ b/Documentation/arch/x86/sgx.rst @@ -300,3 +300,77 @@ to expected failures and handle them as follows: first call. It indicates a bug in the kernel or the userspace client if any of the second round of ``SGX_IOC_VEPC_REMOVE_ALL`` calls has a return code other than 0. + + +Cgroup Support +============== + +The "sgx_epc" resource within the Miscellaneous cgroup controller regulates distribution of SGX +EPC memory, which is a subset of system RAM that is used to provide SGX-enabled applications +with protected memory, and is otherwise inaccessible, i.e. shows up as reserved in /proc/iomem +and cannot be read/written outside of an SGX enclave. + +Although current systems implement EPC by stealing memory from RAM, for all intents and +purposes the EPC is independent from normal system memory, e.g. must be reserved at boot from +RAM and cannot be converted between EPC and normal memory while the system is running. The EPC +is managed by the SGX subsystem and is not accounted by the memory controller. Note that this +is true only for EPC memory itself, i.e. normal memory allocations related to SGX and EPC +memory, e.g. the backing memory for evicted EPC pages, are accounted, limited and protected by +the memory controller. + +Much like normal system memory, EPC memory can be overcommitted via virtual memory techniques +and pages can be swapped out of the EPC to their backing store (normal system memory allocated +via shmem). 
The SGX EPC subsystem is analogous to the memory subsystem, and it implements
+limit and protection models for EPC memory.
+
+SGX EPC Interface Files
+-----------------------
+
+For a generic description of the Miscellaneous controller interface files, please see
+Documentation/admin-guide/cgroup-v2.rst
+
+All SGX EPC memory amounts are in bytes unless explicitly stated otherwise. If a value which
+is not PAGE_SIZE aligned is written, the actual value used by the controller will be rounded
+down to the closest PAGE_SIZE multiple.
+
+  misc.capacity
+        A read-only flat-keyed file shown only in the root cgroup. The sgx_epc resource will
+        show the total amount of EPC memory available on the platform.
+
+  misc.current
+        A read-only flat-keyed file shown in non-root cgroups. The sgx_epc resource will
+        show the current active EPC memory usage of the cgroup and its descendants. EPC pages
+        that are swapped out to backing RAM are not included in the current count.
+
+  misc.max
+        A read-write single value file which exists on non-root cgroups. The sgx_epc resource
+        will show the EPC usage hard limit. The default is "max".
+
+        If a cgroup's EPC usage reaches this limit, EPC allocations, e.g. for page fault
+        handling, will be blocked until EPC can be reclaimed from the cgroup. If there are no
+        pages left that are reclaimable within the same group, the kernel returns ENOMEM.
+
+        The EPC pages allocated for a guest VM by the virtual EPC driver are not reclaimable by
+        the host kernel. If the guest cgroup's limit is reached and no reclaimable pages are
+        left in the same cgroup, the virtual EPC driver sends SIGBUS to the user space
+        process to indicate failure of new EPC allocation requests.
+
+        The misc.max limit is non-preemptive. If a user writes a limit lower than the current
+        usage to this file, the cgroup will not preemptively deallocate pages currently in use,
+        and will only start blocking and reclaiming EPC when the next allocation is attempted.
+
+  misc.events
+        A read-only flat-keyed file which exists on non-root cgroups.
+        A value change in this file generates a file modified event.
+
+          max
+                The number of times the cgroup has triggered a reclaim
+                due to its EPC usage approaching (or exceeding) its max
+                EPC boundary.
+
+Migration
+---------
+
+Once an EPC page is charged to a cgroup (during allocation), it remains charged to the original
+cgroup until the page is released or reclaimed. Migrating a process to a different cgroup
+doesn't move the EPC charges that it incurred while in the previous cgroup to its new cgroup.
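
[Editor's note: as a quick illustration of the interface files documented above, here is a minimal shell sketch. It assumes cgroup v2 is mounted at /sys/fs/cgroup with the misc controller available; the group name "sgx_demo" and the 64 MiB limit are arbitrary values chosen for illustration, not taken from the patch.]

    # enable the misc controller for children of the root group
    echo "+misc" > /sys/fs/cgroup/cgroup.subtree_control
    # create a child group for the SGX workload
    mkdir /sys/fs/cgroup/sgx_demo
    # total EPC available on the platform (misc.capacity exists only in the root group)
    grep sgx_epc /sys/fs/cgroup/misc.capacity
    # cap the group's EPC usage at 64 MiB; values are rounded down to a PAGE_SIZE multiple
    echo "sgx_epc 67108864" > /sys/fs/cgroup/sgx_demo/misc.max
    # after starting an SGX workload in the group, read back its current EPC usage
    grep sgx_epc /sys/fs/cgroup/sgx_demo/misc.current

The selftest script added in the next patch exercises the same files, using cgroup-tools (cgcreate/cgexec) to create the groups and launch the tests.
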
From patchwork Mon Oct 30 18:20:13 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Haitao Huang X-Patchwork-Id: 159847 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:d641:0:b0:403:3b70:6f57 with SMTP id cy1csp2416816vqb; Mon, 30 Oct 2023 11:23:30 -0700 (PDT) X-Google-Smtp-Source: AGHT+IGbSLeEAaeXQTYq0xGYjaz9md+0807Sqet6coGBy249Zk0btCOcjyV89SE0CIgNrrgZdIyY X-Received: by 2002:a17:90b:3601:b0:27d:839:52ae with SMTP id ml1-20020a17090b360100b0027d083952aemr8059570pjb.32.1698690210516; Mon, 30 Oct 2023 11:23:30 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1698690210; cv=none; d=google.com; s=arc-20160816; b=ief4LyOnWtjVFqDnv7ciLE0R8XnD2AO//7teiZ2HQo9QeWSsgnOj4wK2WM8t4RWWQq UTBJprYF54ZcIb3QIvBfkmp25oFHM5AJyX6rkl5RuNTeC9nAlRu90X+Qynqmu3+f0Y5m 5p9mlZI1UuAjRqFRl2xLEsdZueYVLsMuZNHBivw3M33hVnR64t68Pl91G0Uyf3dVndel fYQchCUbw+Fpka3+40d173ppRrGIey+HXjHK7gCqRmyqjp5Gc6jLbEs8DIPv2KzEZRRW NzFzgFzfhiGWZld9gscXSc5pvOwkuTupvwQnNVtuqb6NypeuRe98Bfj2sO+iGBUB9zUr RJbA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=K4FI2feDOco5ghbKPka0S/BXj64ERzYv5rfAibUo/tM=; fh=sNqM0rzpS3sahstSbhU1PMo5LJZgUrUMK8BDlDCJMTE=; b=TfqsBBVADxazJKchFkJrXjOro18sg/NnNeW+kmKEW6p79bB6DRRE2MF16eKqrm0fi9 hu4lFWbPjM+5lIJssW7Yc875AbJGSojrm5UlpBh0Gp0yvVexYBY/IBB52h4IfiZU5cmz 0zD+0YKKh5r7WctPJeQJdijxUxRgM1ZLX2W4c2K1UpbvEwPbu0zu+WYCbPkq9OA6A98T tgSFuv/4t1NsEZO+5bEgsULnpn0w6NRApVXRmbgfdS3qR45X+J32GZ16/NBm6FoWJQqH V40jJoAHZJNbcLY8DOw0iPwyB8xDnk9fgJnjXfNj2b3dtf7XrGeScVv8U962iO7BODqB qqXw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=oIPxdGh2; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.38 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: from fry.vger.email (fry.vger.email. 
[23.128.96.38]) by mx.google.com with ESMTPS id lk9-20020a17090b33c900b002800879f482si5713902pjb.87.2023.10.30.11.23.30 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 30 Oct 2023 11:23:30 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.38 as permitted sender) client-ip=23.128.96.38; Authentication-Results: mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=oIPxdGh2; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.38 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: from out1.vger.email (depot.vger.email [IPv6:2620:137:e000::3:0]) by fry.vger.email (Postfix) with ESMTP id 9BB8E8051A1C; Mon, 30 Oct 2023 11:23:26 -0700 (PDT) X-Virus-Status: Clean X-Virus-Scanned: clamav-milter 0.103.10 at fry.vger.email Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233632AbjJ3SVV (ORCPT + 32 others); Mon, 30 Oct 2023 14:21:21 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:33662 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232169AbjJ3SUm (ORCPT ); Mon, 30 Oct 2023 14:20:42 -0400 Received: from mgamail.intel.com (mgamail.intel.com [192.55.52.136]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 09D59E1; Mon, 30 Oct 2023 11:20:38 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1698690038; x=1730226038; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=vwoNizkiHcFYzYVMWoNqF3D7lmeva/dKQ41pgY/Z5Eo=; b=oIPxdGh2L7axnKHvFARPmeUYlAfGCBBp76ONANHG6iod16Vgykf5YmdT jRhPPmhS3obzO+FKJ6NQ0jeV+IMAMe0jLVizCoY+wddaS+FbedQS6opF5 xFHkAmSkZ96bBNFSWZXVgqzu5frODhmv1oj0PmIxfUT1pmxQIx07fYD5r ySHpVDLB2CcIA1WU+dwBhtab+P2QgBOKIZrHfSVM06i8jpgYnlUPiiZ/z KqEHTIGv6m6i4R2e5+Yj20yiPQPkZr6j6a3o+e/M9KAYszAVH4JRbAVS7 gfsox4wImxFise3DadxLaDmE/jdoQ+c3IlAIQzvYaxPN/UbPHpnHnQVGF A==; X-IronPort-AV: E=McAfee;i="6600,9927,10879"; a="367479647" X-IronPort-AV: E=Sophos;i="6.03,263,1694761200"; d="scan'208";a="367479647" Received: from orsmga008.jf.intel.com ([10.7.209.65]) by fmsmga106.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 30 Oct 2023 11:20:31 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10879"; a="789529539" X-IronPort-AV: E=Sophos;i="6.03,263,1694761200"; d="scan'208";a="789529539" Received: from b4969161e530.jf.intel.com ([10.165.56.46]) by orsmga008.jf.intel.com with ESMTP; 30 Oct 2023 11:20:30 -0700 From: Haitao Huang To: jarkko@kernel.org, dave.hansen@linux.intel.com, tj@kernel.org, mkoutny@suse.com, linux-kernel@vger.kernel.org, linux-sgx@vger.kernel.org, x86@kernel.org, cgroups@vger.kernel.org, tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, hpa@zytor.com, sohil.mehta@intel.com Cc: zhiquan1.li@intel.com, kristen@linux.intel.com, seanjc@google.com, zhanb@microsoft.com, anakrish@microsoft.com, mikko.ylinen@linux.intel.com, yangjie@microsoft.com, Haitao Huang Subject: [PATCH v6 12/12] selftests/sgx: Add scripts for EPC cgroup testing Date: Mon, 30 Oct 2023 11:20:13 -0700 Message-Id: <20231030182013.40086-13-haitao.huang@linux.intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20231030182013.40086-1-haitao.huang@linux.intel.com> References: <20231030182013.40086-1-haitao.huang@linux.intel.com> MIME-Version: 1.0 X-Spam-Status: No, score=-1.2 required=5.0 
tests=DKIMWL_WL_HIGH,DKIM_SIGNED, DKIM_VALID,HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI, SPF_HELO_NONE,SPF_PASS autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on fry.vger.email Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-Greylist: Sender passed SPF test, not delayed by milter-greylist-4.6.4 (fry.vger.email [0.0.0.0]); Mon, 30 Oct 2023 11:23:26 -0700 (PDT) X-getmail-retrieved-from-mailbox: INBOX X-GMAIL-THRID: 1781205786453129306 X-GMAIL-MSGID: 1781205786453129306 The scripts rely on cgroup-tools package from libcgroup [1]. To run selftests for epc cgroup: sudo ./run_epc_cg_selftests.sh With different cgroups, the script starts one or multiple concurrent SGX selftests, each to run one unclobbered_vdso_oversubscribed test. Each of such test tries to load an enclave of EPC size equal to the EPC capacity available on the platform. The script checks results against the expectation set for each cgroup and reports success or failure. The script creates 3 different cgroups at the beginning with following expectations: 1) SMALL - intentionally small enough to fail the test loading an enclave of size equal to the capacity. 2) LARGE - large enough to run up to 4 concurrent tests but fail some if more than 4 concurrent tests are run. The script starts 4 expecting at least one test to pass, and then starts 5 expecting at least one test to fail. 3) LARGER - limit is the same as the capacity, large enough to run lots of concurrent tests. The script starts 10 of them and expects all pass. Then it reruns the same test with one process randomly killed and usage checked to be zero after all process exit. To watch misc cgroup 'current' changes during testing, run this in a separate terminal: ./watch_misc_for_tests.sh current [1] https://github.com/libcgroup/libcgroup/blob/main/README Signed-off-by: Haitao Huang --- V5: - Added script with automatic results checking, remove the interactive script. - The script can run independent from the series below. --- .../selftests/sgx/run_epc_cg_selftests.sh | 196 ++++++++++++++++++ .../selftests/sgx/watch_misc_for_tests.sh | 13 ++ 2 files changed, 209 insertions(+) create mode 100755 tools/testing/selftests/sgx/run_epc_cg_selftests.sh create mode 100755 tools/testing/selftests/sgx/watch_misc_for_tests.sh diff --git a/tools/testing/selftests/sgx/run_epc_cg_selftests.sh b/tools/testing/selftests/sgx/run_epc_cg_selftests.sh new file mode 100755 index 000000000000..72b93f694753 --- /dev/null +++ b/tools/testing/selftests/sgx/run_epc_cg_selftests.sh @@ -0,0 +1,196 @@ +#!/bin/bash +# SPDX-License-Identifier: GPL-2.0 +# Copyright(c) 2023 Intel Corporation. + +TEST_ROOT_CG=selftest +cgcreate -g misc:$TEST_ROOT_CG +if [ $? -ne 0 ]; then + echo "# Please make sure cgroup-tools is installed, and misc cgroup is mounted." + exit 1 +fi +TEST_CG_SUB1=$TEST_ROOT_CG/test1 +TEST_CG_SUB2=$TEST_ROOT_CG/test2 +TEST_CG_SUB3=$TEST_ROOT_CG/test1/test3 +TEST_CG_SUB4=$TEST_ROOT_CG/test4 + +cgcreate -g misc:$TEST_CG_SUB1 +cgcreate -g misc:$TEST_CG_SUB2 +cgcreate -g misc:$TEST_CG_SUB3 +cgcreate -g misc:$TEST_CG_SUB4 + +# Default to V2 +CG_ROOT=/sys/fs/cgroup +if [ ! -d "/sys/fs/cgroup/misc" ]; then + echo "# cgroup V2 is in use." +else + echo "# cgroup V1 is in use." + CG_ROOT=/sys/fs/cgroup/misc +fi + +CAPACITY=$(grep "sgx_epc" "$CG_ROOT/misc.capacity" | awk '{print $2}') +# This is below number of VA pages needed for enclave of capacity size. 
So +# should fail oversubscribed cases +SMALL=$(( CAPACITY / 512 )) + +# At least load one enclave of capacity size successfully, maybe up to 4. +# But some may fail if we run more than 4 concurrent enclaves of capacity size. +LARGE=$(( SMALL * 4 )) + +# Load lots of enclaves +LARGER=$CAPACITY +echo "# Setting up limits." +echo "sgx_epc $SMALL" | tee $CG_ROOT/$TEST_CG_SUB1/misc.max +echo "sgx_epc $LARGE" | tee $CG_ROOT/$TEST_CG_SUB2/misc.max +echo "sgx_epc $LARGER" | tee $CG_ROOT/$TEST_CG_SUB4/misc.max + +timestamp=$(date +%Y%m%d_%H%M%S) + +test_cmd="./test_sgx -t unclobbered_vdso_oversubscribed" + +echo "# Start unclobbered_vdso_oversubscribed with SMALL limit, expecting failure..." +# Always use leaf node of misc cgroups so it works for both v1 and v2 +# these may fail on OOM +cgexec -g misc:$TEST_CG_SUB3 $test_cmd >cgtest_small_$timestamp.log 2>&1 +if [[ $? -eq 0 ]]; then + echo "# Fail on SMALL limit, not expecting any test passes." + cgdelete -r -g misc:$TEST_ROOT_CG + exit 1 +else + echo "# Test failed as expected." +fi + +echo "# PASSED SMALL limit." + +echo "# Start 4 concurrent unclobbered_vdso_oversubscribed tests with LARGE limit, + expecting at least one success...." +pids=() +for i in {1..4}; do + ( + cgexec -g misc:$TEST_CG_SUB2 $test_cmd >cgtest_large_positive_$timestamp.$i.log 2>&1 + ) & + pids+=($!) +done + +any_success=0 +for pid in "${pids[@]}"; do + wait "$pid" + status=$? + if [[ $status -eq 0 ]]; then + any_success=1 + echo "# Process $pid returned successfully." + fi +done + +if [[ $any_success -eq 0 ]]; then + echo "# Failed on LARGE limit positive testing, no test passes." + cgdelete -r -g misc:$TEST_ROOT_CG + exit 1 +fi + +echo "# PASSED LARGE limit positive testing." + +echo "# Start 5 concurrent unclobbered_vdso_oversubscribed tests with LARGE limit, + expecting at least one failure...." +pids=() +for i in {1..5}; do + ( + cgexec -g misc:$TEST_CG_SUB2 $test_cmd >cgtest_large_negative_$timestamp.$i.log 2>&1 + ) & + pids+=($!) +done + +any_failure=0 +for pid in "${pids[@]}"; do + wait "$pid" + status=$? + if [[ $status -ne 0 ]]; then + echo "# Process $pid returned failure." + any_failure=1 + fi +done + +if [[ $any_failure -eq 0 ]]; then + echo "# Failed on LARGE limit negative testing, no test fails." + cgdelete -r -g misc:$TEST_ROOT_CG + exit 1 +fi + +echo "# PASSED LARGE limit negative testing." + +echo "# Start 10 concurrent unclobbered_vdso_oversubscribed tests with LARGER limit, + expecting no failure...." +pids=() +for i in {1..10}; do + ( + cgexec -g misc:$TEST_CG_SUB4 $test_cmd >cgtest_larger_$timestamp.$i.log 2>&1 + ) & + pids+=($!) +done + +any_failure=0 +for pid in "${pids[@]}"; do + wait "$pid" + status=$? + if [[ $status -ne 0 ]]; then + echo "# Process $pid returned failure." + any_failure=1 + fi +done + +if [[ $any_failure -ne 0 ]]; then + echo "# Failed on LARGER limit, at least one test fails." + cgdelete -r -g misc:$TEST_ROOT_CG + exit 1 +fi + +echo "# PASSED LARGER limit tests." + + +echo "# Start 10 concurrent unclobbered_vdso_oversubscribed tests with LARGER limit, + randomly kill one, expecting no failure...." +pids=() +for i in {1..10}; do + ( + cgexec -g misc:$TEST_CG_SUB4 $test_cmd >cgtest_larger_$timestamp.$i.log 2>&1 + ) & + pids+=($!) +done + +sleep $((RANDOM % 10 + 5)) + +# Randomly select a PID to kill +RANDOM_INDEX=$((RANDOM % 10)) +PID_TO_KILL=${pids[RANDOM_INDEX]} + +kill $PID_TO_KILL +echo "# Killed process with PID: $PID_TO_KILL" + +any_failure=0 +for pid in "${pids[@]}"; do + wait "$pid" + status=$? 
+ if [ "$pid" != "$PID_TO_KILL" ]; then + if [[ $status -ne 0 ]]; then + echo "# Process $pid returned failure." + any_failure=1 + fi + fi +done + +if [[ $any_failure -ne 0 ]]; then + echo "# Failed on random killing, at least one test fails." + cgdelete -r -g misc:$TEST_ROOT_CG + exit 1 +fi + +sleep 1 + +USAGE=$(grep '^sgx_epc' "$CG_ROOT/$TEST_ROOT_CG/misc.current" | awk '{print $2}') +if [ "$USAGE" -ne 0 ]; then + echo "# Failed: Final usage is $USAGE, not 0." +else + echo "# PASSED leakage check." + echo "# PASSED ALL cgroup limit tests, cleanup cgroups..." +fi +cgdelete -r -g misc:$TEST_ROOT_CG +echo "# done." diff --git a/tools/testing/selftests/sgx/watch_misc_for_tests.sh b/tools/testing/selftests/sgx/watch_misc_for_tests.sh new file mode 100755 index 000000000000..dbd38f346e7b --- /dev/null +++ b/tools/testing/selftests/sgx/watch_misc_for_tests.sh @@ -0,0 +1,13 @@ +#!/bin/bash +# SPDX-License-Identifier: GPL-2.0 +# Copyright(c) 2023 Intel Corporation. + +if [ -z "$1" ] + then + echo "No argument supplied, please provide 'max', 'current' or 'events'" + exit 1 +fi + +watch -n 1 "find /sys/fs/cgroup -wholename */test*/misc.$1 -exec sh -c \ + 'echo \"\$1:\"; cat \"\$1\"' _ {} \;" +