From patchwork Sat Sep 23 03:06:40 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Haitao Huang
X-Patchwork-Id: 143805
From: Haitao Huang
To: jarkko@kernel.org, dave.hansen@linux.intel.com, tj@kernel.org,
 linux-kernel@vger.kernel.org, linux-sgx@vger.kernel.org, x86@kernel.org,
 cgroups@vger.kernel.org, tglx@linutronix.de, mingo@redhat.com, bp@alien8.de,
 hpa@zytor.com, sohil.mehta@intel.com
Cc: zhiquan1.li@intel.com, kristen@linux.intel.com, seanjc@google.com,
 zhanb@microsoft.com, anakrish@microsoft.com, mikko.ylinen@linux.intel.com,
 yangjie@microsoft.com
Subject: [PATCH v5 01/18] cgroup/misc: Add per resource callbacks for CSS events
Date: Fri, 22 Sep 2023 20:06:40 -0700
Message-Id: <20230923030657.16148-2-haitao.huang@linux.intel.com>
X-Mailer: git-send-email 2.25.1
In-Reply-To: <20230923030657.16148-1-haitao.huang@linux.intel.com>
References: <20230923030657.16148-1-haitao.huang@linux.intel.com>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org

From: Kristen Carlson Accardi

The misc cgroup controller (subsystem) currently does not perform
resource-type-specific actions for Cgroup Subsystem State (CSS) events
(the 'css_alloc' event when a cgroup is created and the 'css_free' event
when a cgroup is destroyed), nor when a user writes a new max value to
the misc.max file to set the usage limit of a specific resource
[admin-guide/cgroup-v2.rst, 5-9. Misc].

Define callbacks for those events and allow resource providers to
register the callbacks per resource type as needed. This will be
utilized later by the EPC misc cgroup support implemented in the SGX
driver:

- On css_alloc, allocate and initialize the structures needed for EPC
  reclaiming, e.g., the LRU list and work queue.
- On css_free, clean up and free the structures created in alloc.
- On max_write, trigger EPC reclaiming if the new limit is at or below
  current usage.

Signed-off-by: Kristen Carlson Accardi
Signed-off-by: Haitao Huang
---
V5:
- Remove prefixes from the callback names (tj)
- Update commit message (Jarkko)
V4:
- Moved this to the front of the series.
- Applies on cgroup/for-6.6 with the overflow fix for misc.
V3:
- Removed the released() callback
---
 include/linux/misc_cgroup.h |  5 +++++
 kernel/cgroup/misc.c        | 32 +++++++++++++++++++++++++++++---
 2 files changed, 34 insertions(+), 3 deletions(-)

diff --git a/include/linux/misc_cgroup.h b/include/linux/misc_cgroup.h
index e799b1f8d05b..96a88822815a 100644
--- a/include/linux/misc_cgroup.h
+++ b/include/linux/misc_cgroup.h
@@ -37,6 +37,11 @@ struct misc_res {
 	u64 max;
 	atomic64_t usage;
 	atomic64_t events;
+
+	/* per resource callback ops */
+	int (*alloc)(struct misc_cg *cg);
+	void (*free)(struct misc_cg *cg);
+	void (*max_write)(struct misc_cg *cg);
 };
 
 /**
diff --git a/kernel/cgroup/misc.c b/kernel/cgroup/misc.c
index 79a3717a5803..62c9198dee21 100644
--- a/kernel/cgroup/misc.c
+++ b/kernel/cgroup/misc.c
@@ -276,10 +276,13 @@ static ssize_t misc_cg_max_write(struct kernfs_open_file *of, char *buf,
 
 	cg = css_misc(of_css(of));
 
-	if (READ_ONCE(misc_res_capacity[type]))
+	if (READ_ONCE(misc_res_capacity[type])) {
 		WRITE_ONCE(cg->res[type].max, max);
-	else
+		if (cg->res[type].max_write)
+			cg->res[type].max_write(cg);
+	} else {
 		ret = -EINVAL;
+	}
 
 	return ret ? ret : nbytes;
 }
@@ -383,23 +386,39 @@ static struct cftype misc_cg_files[] = {
 static struct cgroup_subsys_state *
 misc_cg_alloc(struct cgroup_subsys_state *parent_css)
 {
+	struct misc_cg *parent_cg;
 	enum misc_res_type i;
 	struct misc_cg *cg;
+	int ret;
 
 	if (!parent_css) {
 		cg = &root_cg;
+		parent_cg = &root_cg;
 	} else {
 		cg = kzalloc(sizeof(*cg), GFP_KERNEL);
 		if (!cg)
 			return ERR_PTR(-ENOMEM);
+		parent_cg = css_misc(parent_css);
 	}
 
 	for (i = 0; i < MISC_CG_RES_TYPES; i++) {
 		WRITE_ONCE(cg->res[i].max, MAX_NUM);
 		atomic64_set(&cg->res[i].usage, 0);
+		if (parent_cg->res[i].alloc) {
+			ret = parent_cg->res[i].alloc(cg);
+			if (ret)
+				goto alloc_err;
+		}
 	}
 
 	return &cg->css;
+
+alloc_err:
+	for (i = 0; i < MISC_CG_RES_TYPES; i++)
+		if (cg->res[i].free)
+			cg->res[i].free(cg);
+	kfree(cg);
+	return ERR_PTR(ret);
 }
 
 /**
@@ -410,7 +429,14 @@ misc_cg_alloc(struct cgroup_subsys_state *parent_css)
  */
 static void misc_cg_free(struct cgroup_subsys_state *css)
 {
-	kfree(css_misc(css));
+	struct misc_cg *cg = css_misc(css);
+	enum misc_res_type i;
+
+	for (i = 0; i < MISC_CG_RES_TYPES; i++)
+		if (cg->res[i].free)
+			cg->res[i].free(cg);
+
+	kfree(cg);
 }
 
 /* Cgroup controller callbacks */

From patchwork Sat Sep 23 03:06:41 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Haitao Huang
X-Patchwork-Id: 143825
From: Haitao Huang
To: jarkko@kernel.org, dave.hansen@linux.intel.com, tj@kernel.org,
 linux-kernel@vger.kernel.org, linux-sgx@vger.kernel.org, x86@kernel.org,
 cgroups@vger.kernel.org, tglx@linutronix.de, mingo@redhat.com, bp@alien8.de,
 hpa@zytor.com, sohil.mehta@intel.com
Cc: zhiquan1.li@intel.com, kristen@linux.intel.com, seanjc@google.com,
 zhanb@microsoft.com, anakrish@microsoft.com, mikko.ylinen@linux.intel.com,
 yangjie@microsoft.com
Subject: [PATCH v5 02/18] cgroup/misc: Add SGX EPC resource type and export APIs for SGX driver
Date: Fri, 22 Sep 2023 20:06:41 -0700
Message-Id: <20230923030657.16148-3-haitao.huang@linux.intel.com>
X-Mailer: git-send-email 2.25.1
In-Reply-To: <20230923030657.16148-1-haitao.huang@linux.intel.com>
References: <20230923030657.16148-1-haitao.huang@linux.intel.com>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org

From: Kristen Carlson Accardi

Add SGX EPC memory, MISC_CG_RES_SGX_EPC, as a valid resource type for
the misc controller.

Add per resource type private data so that SGX can store additional per
cgroup data in misc_cg->res[MISC_CG_RES_SGX_EPC].priv.

Export misc_cg_root() so the SGX driver can initialize and add those
additional structures to the root misc cgroup as part of initialization
for EPC cgroup support. This bootstraps the same additional
initialization for non-root cgroups in the 'alloc()' callback added in
the previous patch.

The SGX driver, as the EPC memory provider, will have a background
worker that reclaims EPC pages to make room for new allocations in the
same cgroup when its usage counter nears the limit controlled by the
cgroup and its ancestors. The worker therefore needs to walk from the
current cgroup up to the root. To enable this walk, move parent_misc()
into misc_cgroup.h and make it inline so that it is available to SGX,
rename it to misc_cg_parent(), and update kernel/cgroup/misc.c to use
the new name.

Signed-off-by: Kristen Carlson Accardi
Signed-off-by: Haitao Huang
Acked-by: Tejun Heo
---
V5:
- Revised commit message (Jarkko)
V4:
- Moved this to the second in the series.
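
Illustration only, not part of this patch: a minimal userspace sketch of
how a resource provider might combine the priv pointer, the root cgroup
accessor and the per-resource callbacks described above. Every toy_*
name is a hypothetical stand-in for the kernel type or function it
resembles, the model has a single resource type, and having the alloc
callback copy the callbacks into the new cgroup is an assumption of this
sketch rather than something shown in this patch.

```c
#include <stdlib.h>

struct toy_cg;

/* Stand-in for struct misc_res: private data plus callbacks. */
struct toy_res {
	void *priv;
	int (*alloc)(struct toy_cg *cg);
	void (*free)(struct toy_cg *cg);
};

/* Stand-in for struct misc_cg, reduced to one resource type. */
struct toy_cg {
	struct toy_cg *parent;	/* NULL for the root */
	struct toy_res res;
};

/* Hypothetical per-cgroup provider state kept behind priv. */
struct toy_epc_state {
	int nr_pages;
};

static void toy_epc_free(struct toy_cg *cg)
{
	free(cg->res.priv);
	cg->res.priv = NULL;
}

static int toy_epc_alloc(struct toy_cg *cg)
{
	struct toy_epc_state *st = calloc(1, sizeof(*st));

	if (!st)
		return -1;
	cg->res.priv = st;
	/* Assumption: propagate the callbacks so descendants are covered. */
	cg->res.alloc = toy_epc_alloc;
	cg->res.free = toy_epc_free;
	return 0;
}

static struct toy_cg toy_root;

/* Mirrors the role of misc_cg_root(). */
static struct toy_cg *toy_cg_root(void)
{
	return &toy_root;
}

/* Provider init: bootstrap the root with the same alloc routine. */
static int toy_provider_init(void)
{
	return toy_epc_alloc(toy_cg_root());
}

/* Controller side: create a child and run the parent's alloc callback. */
static struct toy_cg *toy_cg_create(struct toy_cg *parent)
{
	struct toy_cg *cg = calloc(1, sizeof(*cg));

	if (!cg)
		return NULL;
	cg->parent = parent;
	if (parent->res.alloc && parent->res.alloc(cg)) {
		free(cg);
		return NULL;
	}
	return cg;
}

static void toy_cg_destroy(struct toy_cg *cg)
{
	if (cg->res.free)
		cg->res.free(cg);
	free(cg);
}

/* Walk from @cg up to the root, as a misc_cg_parent() loop would. */
static int toy_cg_depth(struct toy_cg *cg)
{
	int depth = 0;

	for (cg = cg->parent; cg; cg = cg->parent)
		depth++;
	return depth;
}
```

In this model, calling toy_provider_init() once and then toy_cg_create()
for each new cgroup gives every level its own priv state, and the depth
helper shows the child-to-root walk that a reclaimer would perform.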
---
 include/linux/misc_cgroup.h | 29 +++++++++++++++++++++++++++++
 kernel/cgroup/misc.c        | 25 ++++++++++++-------------
 2 files changed, 41 insertions(+), 13 deletions(-)

diff --git a/include/linux/misc_cgroup.h b/include/linux/misc_cgroup.h
index 96a88822815a..87f29f8597e1 100644
--- a/include/linux/misc_cgroup.h
+++ b/include/linux/misc_cgroup.h
@@ -17,6 +17,10 @@ enum misc_res_type {
 	MISC_CG_RES_SEV,
 	/* AMD SEV-ES ASIDs resource */
 	MISC_CG_RES_SEV_ES,
+#endif
+#ifdef CONFIG_CGROUP_SGX_EPC
+	/* SGX EPC memory resource */
+	MISC_CG_RES_SGX_EPC,
 #endif
 	MISC_CG_RES_TYPES
 };
@@ -37,6 +41,7 @@ struct misc_res {
 	u64 max;
 	atomic64_t usage;
 	atomic64_t events;
+	void *priv;
 
 	/* per resource callback ops */
 	int (*alloc)(struct misc_cg *cg);
@@ -59,6 +64,7 @@ struct misc_cg {
 	struct misc_res res[MISC_CG_RES_TYPES];
 };
 
+struct misc_cg *misc_cg_root(void);
 u64 misc_cg_res_total_usage(enum misc_res_type type);
 int misc_cg_set_capacity(enum misc_res_type type, u64 capacity);
 int misc_cg_try_charge(enum misc_res_type type, struct misc_cg *cg, u64 amount);
@@ -78,6 +84,20 @@ static inline struct misc_cg *css_misc(struct cgroup_subsys_state *css)
 	return css ? container_of(css, struct misc_cg, css) : NULL;
 }
 
+/**
+ * misc_cg_parent() - Get the parent of the passed misc cgroup.
+ * @cgroup: cgroup whose parent needs to be fetched.
+ *
+ * Context: Any context.
+ * Return:
+ * * struct misc_cg* - Parent of the @cgroup.
+ * * %NULL - If @cgroup is null or the passed cgroup does not have a parent.
+ */
+static inline struct misc_cg *misc_cg_parent(struct misc_cg *cgroup)
+{
+	return cgroup ? css_misc(cgroup->css.parent) : NULL;
+}
+
 /*
  * get_current_misc_cg() - Find and get the misc cgroup of the current task.
  *
@@ -102,6 +122,15 @@ static inline void put_misc_cg(struct misc_cg *cg)
 }
 
 #else /* !CONFIG_CGROUP_MISC */
+static inline struct misc_cg *misc_cg_root(void)
+{
+	return NULL;
+}
+
+static inline struct misc_cg *misc_cg_parent(struct misc_cg *cg)
+{
+	return NULL;
+}
 
 static inline u64 misc_cg_res_total_usage(enum misc_res_type type)
 {
diff --git a/kernel/cgroup/misc.c b/kernel/cgroup/misc.c
index 62c9198dee21..4633b8629e63 100644
--- a/kernel/cgroup/misc.c
+++ b/kernel/cgroup/misc.c
@@ -24,6 +24,10 @@ static const char *const misc_res_name[] = {
 	/* AMD SEV-ES ASIDs resource */
 	"sev_es",
 #endif
+#ifdef CONFIG_CGROUP_SGX_EPC
+	/* Intel SGX EPC memory bytes */
+	"sgx_epc",
+#endif
 };
 
 /* Root misc cgroup */
@@ -40,18 +44,13 @@ static struct misc_cg root_cg;
 static u64 misc_res_capacity[MISC_CG_RES_TYPES];
 
 /**
- * parent_misc() - Get the parent of the passed misc cgroup.
- * @cgroup: cgroup whose parent needs to be fetched.
- *
- * Context: Any context.
- * Return:
- * * struct misc_cg* - Parent of the @cgroup.
- * * %NULL - If @cgroup is null or the passed cgroup does not have a parent.
+ * misc_cg_root() - Return the root misc cgroup.
  */
-static struct misc_cg *parent_misc(struct misc_cg *cgroup)
+struct misc_cg *misc_cg_root(void)
 {
-	return cgroup ? css_misc(cgroup->css.parent) : NULL;
+	return &root_cg;
 }
+EXPORT_SYMBOL_GPL(misc_cg_root);
 
 /**
  * valid_type() - Check if @type is valid or not.
@@ -150,7 +149,7 @@ int misc_cg_try_charge(enum misc_res_type type, struct misc_cg *cg, u64 amount)
 	if (!amount)
 		return 0;
 
-	for (i = cg; i; i = parent_misc(i)) {
+	for (i = cg; i; i = misc_cg_parent(i)) {
 		res = &i->res[type];
 
 		new_usage = atomic64_add_return(amount, &res->usage);
@@ -163,12 +162,12 @@ int misc_cg_try_charge(enum misc_res_type type, struct misc_cg *cg, u64 amount)
 	return 0;
 
 err_charge:
-	for (j = i; j; j = parent_misc(j)) {
+	for (j = i; j; j = misc_cg_parent(j)) {
 		atomic64_inc(&j->res[type].events);
 		cgroup_file_notify(&j->events_file);
 	}
 
-	for (j = cg; j != i; j = parent_misc(j))
+	for (j = cg; j != i; j = misc_cg_parent(j))
 		misc_cg_cancel_charge(type, j, amount);
 	misc_cg_cancel_charge(type, i, amount);
 	return ret;
@@ -190,7 +189,7 @@ void misc_cg_uncharge(enum misc_res_type type, struct misc_cg *cg, u64 amount)
 	if (!(amount && valid_type(type) && cg))
 		return;
 
-	for (i = cg; i; i = parent_misc(i))
+	for (i = cg; i; i = misc_cg_parent(i))
 		misc_cg_cancel_charge(type, i, amount);
 }
 EXPORT_SYMBOL_GPL(misc_cg_uncharge);

From patchwork Sat Sep 23 03:06:42 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Haitao Huang
X-Patchwork-Id: 143890
From: Haitao Huang
To: jarkko@kernel.org, dave.hansen@linux.intel.com, tj@kernel.org,
 linux-kernel@vger.kernel.org, linux-sgx@vger.kernel.org, x86@kernel.org,
 cgroups@vger.kernel.org, tglx@linutronix.de, mingo@redhat.com, bp@alien8.de,
 hpa@zytor.com, sohil.mehta@intel.com
Cc: zhiquan1.li@intel.com, kristen@linux.intel.com, seanjc@google.com,
 zhanb@microsoft.com, anakrish@microsoft.com, mikko.ylinen@linux.intel.com,
 yangjie@microsoft.com
Subject: [PATCH v5 03/18] x86/sgx: Add sgx_epc_lru_lists to encapsulate LRU lists
Date: Fri, 22 Sep 2023 20:06:42 -0700
Message-Id: <20230923030657.16148-4-haitao.huang@linux.intel.com>
X-Mailer: git-send-email 2.25.1
In-Reply-To: <20230923030657.16148-1-haitao.huang@linux.intel.com>
References: <20230923030657.16148-1-haitao.huang@linux.intel.com>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org

From: Sean Christopherson

Introduce a data structure to wrap the existing reclaimable list and its
spinlock. Each cgroup will later have one instance of this structure to
track EPC pages allocated for processes associated with the same cgroup.
Just like the global SGX reclaimer (ksgxd), an EPC cgroup reclaims pages
from the reclaimable list in this structure when its usage nears its
limit.

Currently, ksgxd does not track VA and SECS pages. They are considered
'unreclaimable' pages that are only deallocated when their respective
owning enclaves are destroyed and all associated resources are released.

When an EPC cgroup cannot reclaim any more reclaimable EPC pages to
reduce its usage below its limit, the cgroup must also reclaim those
unreclaimable pages by killing their owning enclaves. The VA and SECS
pages will later be tracked in an 'unreclaimable' list added to this
structure to support this OOM killing of enclaves.

Signed-off-by: Sean Christopherson
Co-developed-by: Kristen Carlson Accardi
Signed-off-by: Kristen Carlson Accardi
Co-developed-by: Haitao Huang
Signed-off-by: Haitao Huang
Cc: Sean Christopherson
---
V4:
- Removed unneeded comments for the spinlock and the non-reclaimables.
  (Kai, Jarkko)
- Revised the commit to add introduction comments for unreclaimables and
  multiple LRU lists. (Kai)
- Reordered the patches: delay all changes for unreclaimables to later,
  and this one becomes the first change in the SGX subsystem.
V3:
- Removed the helper functions and revised commit messages.
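
Illustration only, not part of this patch: a minimal userspace sketch of
the "list plus lock" wrapper described above, with pages queued at the
tail and taken for reclaim from the head. The toy_* names are
hypothetical; a C11 atomic_flag stands in for the kernel spinlock_t and
a hand-rolled circular list stands in for struct list_head, so this
models the shape of sgx_epc_lru_lists rather than its real behaviour.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Minimal circular doubly linked list, modelled on list_head. */
struct toy_list {
	struct toy_list *prev, *next;
};

static void toy_list_init(struct toy_list *head)
{
	head->prev = head->next = head;
}

static void toy_list_add_tail(struct toy_list *node, struct toy_list *head)
{
	node->prev = head->prev;
	node->next = head;
	head->prev->next = node;
	head->prev = node;
}

static void toy_list_del(struct toy_list *node)
{
	node->prev->next = node->next;
	node->next->prev = node->prev;
	node->prev = node->next = node;
}

/* Stand-in for an EPC page descriptor. */
struct toy_page {
	int id;
	struct toy_list link;
};

/* Stand-in for struct sgx_epc_lru_lists: one lock, one list. */
struct toy_lru_lists {
	atomic_flag lock;		/* busy-wait lock, not a kernel spinlock */
	struct toy_list reclaimable;
};

static void toy_lru_lock(struct toy_lru_lists *lrus)
{
	while (atomic_flag_test_and_set_explicit(&lrus->lock,
						 memory_order_acquire))
		;	/* spin until the flag was previously clear */
}

static void toy_lru_unlock(struct toy_lru_lists *lrus)
{
	atomic_flag_clear_explicit(&lrus->lock, memory_order_release);
}

/* Same role as sgx_lru_init(): initialise the lock and the list. */
static void toy_lru_init(struct toy_lru_lists *lrus)
{
	atomic_flag_clear(&lrus->lock);
	toy_list_init(&lrus->reclaimable);
}

/* Queue a page at the tail, i.e. as the most recently added entry. */
static void toy_lru_push(struct toy_lru_lists *lrus, struct toy_page *page)
{
	toy_lru_lock(lrus);
	toy_list_add_tail(&page->link, &lrus->reclaimable);
	toy_lru_unlock(lrus);
}

/* Detach and return the oldest page, or NULL if the list is empty. */
static struct toy_page *toy_lru_pop(struct toy_lru_lists *lrus)
{
	struct toy_page *page = NULL;

	toy_lru_lock(lrus);
	if (lrus->reclaimable.next != &lrus->reclaimable) {
		struct toy_list *first = lrus->reclaimable.next;

		toy_list_del(first);
		page = (struct toy_page *)((char *)first -
					   offsetof(struct toy_page, link));
	}
	toy_lru_unlock(lrus);
	return page;
}
```

Keeping the lock and the list in one struct means each cgroup can later
own an independent instance, and callers never take one without the
other.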
---
 arch/x86/kernel/cpu/sgx/sgx.h | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/arch/x86/kernel/cpu/sgx/sgx.h b/arch/x86/kernel/cpu/sgx/sgx.h
index d2dad21259a8..018414b2abe8 100644
--- a/arch/x86/kernel/cpu/sgx/sgx.h
+++ b/arch/x86/kernel/cpu/sgx/sgx.h
@@ -83,6 +83,20 @@ static inline void *sgx_get_epc_virt_addr(struct sgx_epc_page *page)
 	return section->virt_addr + index * PAGE_SIZE;
 }
 
+/*
+ * Tracks EPC pages reclaimable by the reclaimer (ksgxd).
+ */
+struct sgx_epc_lru_lists {
+	spinlock_t lock;
+	struct list_head reclaimable;
+};
+
+static inline void sgx_lru_init(struct sgx_epc_lru_lists *lrus)
+{
+	spin_lock_init(&lrus->lock);
+	INIT_LIST_HEAD(&lrus->reclaimable);
+}
+
 struct sgx_epc_page *__sgx_alloc_epc_page(void);
 void sgx_free_epc_page(struct sgx_epc_page *page);
 

From patchwork Sat Sep 23 03:06:43 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Haitao Huang
X-Patchwork-Id: 143867
From: Haitao Huang
To: jarkko@kernel.org, dave.hansen@linux.intel.com, tj@kernel.org,
 linux-kernel@vger.kernel.org, linux-sgx@vger.kernel.org, x86@kernel.org,
 cgroups@vger.kernel.org, tglx@linutronix.de, mingo@redhat.com, bp@alien8.de,
 hpa@zytor.com, sohil.mehta@intel.com
Cc: zhiquan1.li@intel.com, kristen@linux.intel.com, seanjc@google.com,
 zhanb@microsoft.com, anakrish@microsoft.com, mikko.ylinen@linux.intel.com,
 yangjie@microsoft.com
Subject: [PATCH v5 04/18] x86/sgx: Use sgx_epc_lru_lists for existing active page list
Date: Fri, 22 Sep 2023 20:06:43 -0700
Message-Id: <20230923030657.16148-5-haitao.huang@linux.intel.com>
X-Mailer: git-send-email 2.25.1
In-Reply-To: <20230923030657.16148-1-haitao.huang@linux.intel.com>
References: <20230923030657.16148-1-haitao.huang@linux.intel.com>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org

From: Sean Christopherson

All EPC pages of enclaves, including Version Array (VA) and SGX Enclave
Control Structure (SECS) pages, will be tracked in sgx_epc_lru_lists
structs, one per cgroup. For now, just replace the existing
sgx_active_page_list in the reclaimer and its spinlock with a global
sgx_epc_lru_lists struct. VA and SECS pages are still not tracked at
this point, but they will be once an unreclaimable LRU list is added to
the sgx_epc_lru_lists struct.

Signed-off-by: Sean Christopherson
Co-developed-by: Kristen Carlson Accardi
Signed-off-by: Kristen Carlson Accardi
Co-developed-by: Haitao Huang
Signed-off-by: Haitao Huang
Cc: Sean Christopherson
---
V5:
- Spelled out SECS, VA (Jarkko)
V4:
- No change, only reordered the patch.

V3:
- Remove usage of list wrapper
---
 arch/x86/kernel/cpu/sgx/main.c | 39 +++++++++++++++++-----------------
 1 file changed, 20 insertions(+), 19 deletions(-)

diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
index 166692f2d501..afce51d6e94a 100644
--- a/arch/x86/kernel/cpu/sgx/main.c
+++ b/arch/x86/kernel/cpu/sgx/main.c
@@ -26,10 +26,9 @@ static DEFINE_XARRAY(sgx_epc_address_space);
 
 /*
  * These variables are part of the state of the reclaimer, and must be accessed
- * with sgx_reclaimer_lock acquired.
+ * with sgx_global_lru.lock acquired.
  */
-static LIST_HEAD(sgx_active_page_list);
-static DEFINE_SPINLOCK(sgx_reclaimer_lock);
+static struct sgx_epc_lru_lists sgx_global_lru;
 
 static atomic_long_t sgx_nr_free_pages = ATOMIC_LONG_INIT(0);
 
@@ -304,13 +303,13 @@ static void sgx_reclaim_pages(void)
 	int ret;
 	int i;
 
-	spin_lock(&sgx_reclaimer_lock);
+	spin_lock(&sgx_global_lru.lock);
 	for (i = 0; i < SGX_NR_TO_SCAN; i++) {
-		if (list_empty(&sgx_active_page_list))
+		epc_page = list_first_entry_or_null(&sgx_global_lru.reclaimable,
+						    struct sgx_epc_page, list);
+		if (!epc_page)
 			break;
 
-		epc_page = list_first_entry(&sgx_active_page_list,
-					    struct sgx_epc_page, list);
 		list_del_init(&epc_page->list);
 		encl_page = epc_page->owner;
 
@@ -322,7 +321,7 @@ static void sgx_reclaim_pages(void)
 			 */
 			epc_page->flags &= ~SGX_EPC_PAGE_RECLAIMER_TRACKED;
 	}
-	spin_unlock(&sgx_reclaimer_lock);
+	spin_unlock(&sgx_global_lru.lock);
 
 	for (i = 0; i < cnt; i++) {
 		epc_page = chunk[i];
@@ -345,9 +344,9 @@ static void sgx_reclaim_pages(void)
 		continue;
 
 skip:
-		spin_lock(&sgx_reclaimer_lock);
-		list_add_tail(&epc_page->list, &sgx_active_page_list);
-		spin_unlock(&sgx_reclaimer_lock);
+		spin_lock(&sgx_global_lru.lock);
+		list_add_tail(&epc_page->list, &sgx_global_lru.reclaimable);
+		spin_unlock(&sgx_global_lru.lock);
 
 		kref_put(&encl_page->encl->refcount, sgx_encl_release);
 
@@ -378,7 +377,7 @@ static void sgx_reclaim_pages(void)
 static bool sgx_should_reclaim(unsigned long watermark)
 {
 	return atomic_long_read(&sgx_nr_free_pages) < watermark &&
-	       !list_empty(&sgx_active_page_list);
+	       !list_empty(&sgx_global_lru.reclaimable);
 }
 
 /*
@@ -430,6 +429,8 @@ static bool __init sgx_page_reclaimer_init(void)
 
 	ksgxd_tsk = tsk;
 
+	sgx_lru_init(&sgx_global_lru);
+
 	return true;
 }
 
@@ -505,10 +506,10 @@ struct sgx_epc_page *__sgx_alloc_epc_page(void)
  */
 void sgx_mark_page_reclaimable(struct sgx_epc_page *page)
 {
-	spin_lock(&sgx_reclaimer_lock);
+	spin_lock(&sgx_global_lru.lock);
 	page->flags |= SGX_EPC_PAGE_RECLAIMER_TRACKED;
-	list_add_tail(&page->list, &sgx_active_page_list);
-	spin_unlock(&sgx_reclaimer_lock);
+	list_add_tail(&page->list, &sgx_global_lru.reclaimable);
+	spin_unlock(&sgx_global_lru.lock);
 }
 
 /**
@@ -523,18 +524,18 @@ void sgx_mark_page_reclaimable(struct sgx_epc_page *page)
  */
 int sgx_unmark_page_reclaimable(struct sgx_epc_page *page)
 {
-	spin_lock(&sgx_reclaimer_lock);
+	spin_lock(&sgx_global_lru.lock);
 	if (page->flags & SGX_EPC_PAGE_RECLAIMER_TRACKED) {
 		/* The page is being reclaimed. */
 		if (list_empty(&page->list)) {
-			spin_unlock(&sgx_reclaimer_lock);
+			spin_unlock(&sgx_global_lru.lock);
 			return -EBUSY;
 		}
 
 		list_del(&page->list);
 		page->flags &= ~SGX_EPC_PAGE_RECLAIMER_TRACKED;
 	}
-	spin_unlock(&sgx_reclaimer_lock);
+	spin_unlock(&sgx_global_lru.lock);
 
 	return 0;
 }
@@ -567,7 +568,7 @@ struct sgx_epc_page *sgx_alloc_epc_page(void *owner, bool reclaim)
 			break;
 		}
 
-		if (list_empty(&sgx_active_page_list))
+		if (list_empty(&sgx_global_lru.reclaimable))
 			return ERR_PTR(-ENOMEM);
 
 		if (!reclaim) {

From patchwork Sat Sep 23 03:06:44 2023
X-Patchwork-Submitter: Haitao Huang
X-Patchwork-Id: 143862
From: Haitao Huang
To: jarkko@kernel.org, dave.hansen@linux.intel.com, tj@kernel.org,
 linux-kernel@vger.kernel.org, linux-sgx@vger.kernel.org, x86@kernel.org,
 cgroups@vger.kernel.org, tglx@linutronix.de, mingo@redhat.com,
 bp@alien8.de, hpa@zytor.com, sohil.mehta@intel.com
Cc: zhiquan1.li@intel.com, kristen@linux.intel.com, seanjc@google.com,
 zhanb@microsoft.com, anakrish@microsoft.com,
 mikko.ylinen@linux.intel.com, yangjie@microsoft.com
Subject: [PATCH v5 05/18] x86/sgx: Store reclaimable EPC pages in sgx_epc_lru_lists
Date: Fri, 22 Sep 2023 20:06:44 -0700
Message-Id: <20230923030657.16148-6-haitao.huang@linux.intel.com>
In-Reply-To: <20230923030657.16148-1-haitao.huang@linux.intel.com>
References: <20230923030657.16148-1-haitao.huang@linux.intel.com>

From: Sean Christopherson

Replace sgx_mark_page_reclaimable() and sgx_unmark_page_reclaimable()
with sgx_record_epc_page() and sgx_drop_epc_page(). The
sgx_record_epc_page() function adds the epc_page to the "reclaimable"
list in the sgx_epc_lru_lists struct, while sgx_drop_epc_page() removes
the page from the LRU list.

For now, this change is a straightforward replacement of the two
functions for pages tracked by the reclaimer. When the unreclaimable
list is added to track VA and SECS pages for cgroups, these functions
will be updated to also add such pages to, and remove them from, the
unreclaimable list.

Signed-off-by: Sean Christopherson
Co-developed-by: Kristen Carlson Accardi
Signed-off-by: Kristen Carlson Accardi
Co-developed-by: Haitao Huang
Signed-off-by: Haitao Huang
Cc: Sean Christopherson
---
V5:
- style fixes (Jarkko)

V4:
- Code update needed for patch reordering
- Revised commit message.
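
Note for readers: the record/drop contract described above can be tried
outside the kernel with the user-space sketch below. It is only a model
under stated assumptions, not the kernel code: struct model_epc_page,
struct model_lru, model_record_page(), model_drop_page() and
model_isolate_first() are hypothetical names, the spinlock is left out
(the model is single-threaded), and the small list helpers merely mimic
list_add_tail() and list_del_init().

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's struct list_head. */
struct list_head {
	struct list_head *next, *prev;
};

static void list_init(struct list_head *h) { h->next = h; h->prev = h; }
static int list_is_empty(const struct list_head *h) { return h->next == h; }

static void list_add_tail_model(struct list_head *entry, struct list_head *head)
{
	entry->prev = head->prev;
	entry->next = head;
	head->prev->next = entry;
	head->prev = entry;
}

/* Unlink and re-initialise the entry, in the spirit of list_del_init(). */
static void list_del_init_model(struct list_head *entry)
{
	entry->prev->next = entry->next;
	entry->next->prev = entry->prev;
	list_init(entry);
}

#define MODEL_RECLAIMER_TRACKED 0x1UL	/* models SGX_EPC_PAGE_RECLAIMER_TRACKED */

struct model_epc_page {
	unsigned long flags;
	struct list_head list;
};

struct model_lru {
	/* The kernel struct also carries a spinlock; omitted in this model. */
	struct list_head reclaimable;
};

/* Models sgx_record_epc_page(): set the flags, queue only tracked pages. */
static void model_record_page(struct model_lru *lru,
			      struct model_epc_page *page, unsigned long flags)
{
	page->flags |= flags;
	if (flags & MODEL_RECLAIMER_TRACKED)
		list_add_tail_model(&page->list, &lru->reclaimable);
}

/*
 * Models sgx_drop_epc_page(): a page that is still flagged as tracked but
 * is no longer on the list has been taken by the reclaimer, so report
 * -EBUSY. (The kernel uses list_del() here; the model re-initialises the
 * entry instead to stay self-contained.)
 */
static int model_drop_page(struct model_epc_page *page)
{
	if (page->flags & MODEL_RECLAIMER_TRACKED) {
		if (list_is_empty(&page->list))
			return -EBUSY;
		list_del_init_model(&page->list);
		page->flags &= ~MODEL_RECLAIMER_TRACKED;
	}
	return 0;
}

/* Models the reclaimer taking the oldest page; the flag stays set. */
static struct model_epc_page *model_isolate_first(struct model_lru *lru)
{
	struct list_head *first = lru->reclaimable.next;

	if (first == &lru->reclaimable)
		return NULL;
	list_del_init_model(first);
	return (struct model_epc_page *)((char *)first -
					 offsetof(struct model_epc_page, list));
}
```

In this model, a page isolated by model_isolate_first() keeps its tracked
flag while its list entry is empty, which is exactly the condition the
drop path uses to return -EBUSY to its caller.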
---
 arch/x86/kernel/cpu/sgx/encl.c  |  6 +++---
 arch/x86/kernel/cpu/sgx/ioctl.c |  8 ++++----
 arch/x86/kernel/cpu/sgx/main.c  | 22 ++++++++++++----------
 arch/x86/kernel/cpu/sgx/sgx.h   |  4 ++--
 4 files changed, 21 insertions(+), 19 deletions(-)

diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c
index 279148e72459..97a53e34a8b4 100644
--- a/arch/x86/kernel/cpu/sgx/encl.c
+++ b/arch/x86/kernel/cpu/sgx/encl.c
@@ -272,7 +272,7 @@ static struct sgx_encl_page *__sgx_encl_load_page(struct sgx_encl *encl,
 		return ERR_CAST(epc_page);
 
 	encl->secs_child_cnt++;
-	sgx_mark_page_reclaimable(entry->epc_page);
+	sgx_record_epc_page(epc_page, SGX_EPC_PAGE_RECLAIMER_TRACKED);
 
 	return entry;
 }
@@ -398,7 +398,7 @@ static vm_fault_t sgx_encl_eaug_page(struct vm_area_struct *vma,
 	encl_page->type = SGX_PAGE_TYPE_REG;
 	encl->secs_child_cnt++;
 
-	sgx_mark_page_reclaimable(encl_page->epc_page);
+	sgx_record_epc_page(epc_page, SGX_EPC_PAGE_RECLAIMER_TRACKED);
 
 	phys_addr = sgx_get_epc_phys_addr(epc_page);
 	/*
@@ -714,7 +714,7 @@ void sgx_encl_release(struct kref *ref)
 			 * The page and its radix tree entry cannot be freed
 			 * if the page is being held by the reclaimer.
 			 */
-			if (sgx_unmark_page_reclaimable(entry->epc_page))
+			if (sgx_drop_epc_page(entry->epc_page))
 				continue;
 
 			sgx_encl_free_epc_page(entry->epc_page);
diff --git a/arch/x86/kernel/cpu/sgx/ioctl.c b/arch/x86/kernel/cpu/sgx/ioctl.c
index 5d390df21440..a75eb44022a3 100644
--- a/arch/x86/kernel/cpu/sgx/ioctl.c
+++ b/arch/x86/kernel/cpu/sgx/ioctl.c
@@ -322,7 +322,7 @@ static int sgx_encl_add_page(struct sgx_encl *encl, unsigned long src,
 			goto err_out;
 	}
 
-	sgx_mark_page_reclaimable(encl_page->epc_page);
+	sgx_record_epc_page(epc_page, SGX_EPC_PAGE_RECLAIMER_TRACKED);
 	mutex_unlock(&encl->lock);
 	mmap_read_unlock(current->mm);
 	return ret;
@@ -961,7 +961,7 @@ static long sgx_enclave_modify_types(struct sgx_encl *encl,
 			 * Prevent page from being reclaimed while mutex
 			 * is released.
 			 */
-			if (sgx_unmark_page_reclaimable(entry->epc_page)) {
+			if (sgx_drop_epc_page(entry->epc_page)) {
 				ret = -EAGAIN;
 				goto out_entry_changed;
 			}
@@ -976,7 +976,7 @@ static long sgx_enclave_modify_types(struct sgx_encl *encl,
 
 			mutex_lock(&encl->lock);
 
-			sgx_mark_page_reclaimable(entry->epc_page);
+			sgx_record_epc_page(entry->epc_page, SGX_EPC_PAGE_RECLAIMER_TRACKED);
 		}
 
 		/* Change EPC type */
@@ -1133,7 +1133,7 @@ static long sgx_encl_remove_pages(struct sgx_encl *encl,
 			goto out_unlock;
 		}
 
-		if (sgx_unmark_page_reclaimable(entry->epc_page)) {
+		if (sgx_drop_epc_page(entry->epc_page)) {
 			ret = -EBUSY;
 			goto out_unlock;
 		}
diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
index afce51d6e94a..dec1d57cbff6 100644
--- a/arch/x86/kernel/cpu/sgx/main.c
+++ b/arch/x86/kernel/cpu/sgx/main.c
@@ -268,7 +268,6 @@ static void sgx_reclaimer_write(struct sgx_epc_page *epc_page,
 			goto out;
 
 		sgx_encl_ewb(encl->secs.epc_page, &secs_backing);
-
 		sgx_encl_free_epc_page(encl->secs.epc_page);
 		encl->secs.epc_page = NULL;
 
@@ -498,31 +497,34 @@ struct sgx_epc_page *__sgx_alloc_epc_page(void)
 }
 
 /**
- * sgx_mark_page_reclaimable() - Mark a page as reclaimable
+ * sgx_record_epc_page() - Add a page to the appropriate LRU list
  * @page:	EPC page
+ * @flags:	The type of page that is being recorded
  *
- * Mark a page as reclaimable and add it to the active page list. Pages
- * are automatically removed from the active list when freed.
+ * Mark a page with the specified flags and add it to the appropriate
+ * list.
  */
-void sgx_mark_page_reclaimable(struct sgx_epc_page *page)
+void sgx_record_epc_page(struct sgx_epc_page *page, unsigned long flags)
 {
 	spin_lock(&sgx_global_lru.lock);
-	page->flags |= SGX_EPC_PAGE_RECLAIMER_TRACKED;
-	list_add_tail(&page->list, &sgx_global_lru.reclaimable);
+	WARN_ON_ONCE(page->flags & SGX_EPC_PAGE_RECLAIMER_TRACKED);
+	page->flags |= flags;
+	if (flags & SGX_EPC_PAGE_RECLAIMER_TRACKED)
+		list_add_tail(&page->list, &sgx_global_lru.reclaimable);
 	spin_unlock(&sgx_global_lru.lock);
 }
 
 /**
- * sgx_unmark_page_reclaimable() - Remove a page from the reclaim list
+ * sgx_drop_epc_page() - Remove a page from a LRU list
  * @page:	EPC page
  *
- * Clear the reclaimable flag and remove the page from the active page list.
+ * Clear the reclaimable flag if set and remove the page from its LRU.
  *
  * Return:
  *   0 on success,
  *   -EBUSY if the page is in the process of being reclaimed
  */
-int sgx_unmark_page_reclaimable(struct sgx_epc_page *page)
+int sgx_drop_epc_page(struct sgx_epc_page *page)
 {
 	spin_lock(&sgx_global_lru.lock);
 	if (page->flags & SGX_EPC_PAGE_RECLAIMER_TRACKED) {
diff --git a/arch/x86/kernel/cpu/sgx/sgx.h b/arch/x86/kernel/cpu/sgx/sgx.h
index 018414b2abe8..113d930fd087 100644
--- a/arch/x86/kernel/cpu/sgx/sgx.h
+++ b/arch/x86/kernel/cpu/sgx/sgx.h
@@ -101,8 +101,8 @@ struct sgx_epc_page *__sgx_alloc_epc_page(void);
 void sgx_free_epc_page(struct sgx_epc_page *page);
 
 void sgx_reclaim_direct(void);
-void sgx_mark_page_reclaimable(struct sgx_epc_page *page);
-int sgx_unmark_page_reclaimable(struct sgx_epc_page *page);
+void sgx_record_epc_page(struct sgx_epc_page *page, unsigned long flags);
+int sgx_drop_epc_page(struct sgx_epc_page *page);
 struct sgx_epc_page *sgx_alloc_epc_page(void *owner, bool reclaim);
 
 void sgx_ipi_cb(void *info);

From patchwork Sat Sep 23 03:06:45 2023
X-Patchwork-Submitter: Haitao Huang
X-Patchwork-Id: 143847
ouuuleilei@gmail.com Received: by 2002:a59:cae8:0:b0:403:3b70:6f57 with SMTP id r8csp57101vqu; Sat, 23 Sep 2023 01:36:21 -0700 (PDT) X-Google-Smtp-Source: AGHT+IE18fPmBmlstapteKBAL6v2lR+FQ5eHXx/nF+TdcqlJkQHft32YJ85Lu7OPaD8z4hBonb4G X-Received: by 2002:a05:6358:2607:b0:13a:a85b:c373 with SMTP id l7-20020a056358260700b0013aa85bc373mr1881855rwc.18.1695458181645; Sat, 23 Sep 2023 01:36:21 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1695458181; cv=none; d=google.com; s=arc-20160816; b=03d2y3DxA52JWj355wOZPUFEDjbE3to6SENtWi7ejjNzj5PfS1U7Rnme3nkooN0Xcv +dfKV3d+rD8G9DRa6e2rxdxwrBkEx3Dp41PnB9bVR+qM+ydTOD/bMwEGVbwbp+5D5lzi ZvWs8NbZ18zUI8oG9G2ET9CVjsclTcEdnpVT3ZTF9MfXn35r99Fb7y1B1928kGwu8MhD qjsPSfnhQL/NOdKNqNoNvSCqFX+BNufQNW361ayI0pF+Zp8DUlUADGsdQF0wrKIdrDOO JWMeDBdc2/njGD9ofwMqUVUkBXE/vKF5zop4HkqcGhGuv2onEtKJbCrxbZEf1rFJjm7m YAiQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=sBfPvCap5w9OVlfKdI7l3tcRWL+peW2Xre2Zgbxh2Co=; fh=j8PE345l5Ydlo3KwK7JeWnjqRgjiq4AteUoOZeOwa0I=; b=kal9WR5MRPTfyjuirjqpcEa99Qxc0MtBkk1/z2XCO8HfqFJj0ujG9R3UkfPmaDgV+5 uruVOeQCtIE4Cesrezv0dwd/npit+zHUfv+gKjyYs4nqzqj7MDRobRUbN6jCITo99ZEn 2Yc3rZXLfQU2IBTJJti2ICaLXEv4VAqKYxflAADbsWz1QOg7AO7CL0h/dWeEFNZco1yG LAsWUfw6O4c7LAgRPn6oHEmMQwWkZcFCawX/UQJY7a1eZfdVrhnoBHxdICrkv/GzjGEB I1kn8iYNNNhvoiSJ8Kded5YCR9fDyibeEXe8o5541lGs9U3m8HUmYXVYUWFtDyMllSB5 rSUw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=coxVdvqj; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.31 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: from morse.vger.email (morse.vger.email. 
[23.128.96.31]) by mx.google.com with ESMTPS id w2-20020a637b02000000b00573fb2f7537si5612142pgc.586.2023.09.23.01.36.21 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sat, 23 Sep 2023 01:36:21 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.31 as permitted sender) client-ip=23.128.96.31; Authentication-Results: mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=coxVdvqj; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.31 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: from out1.vger.email (depot.vger.email [IPv6:2620:137:e000::3:0]) by morse.vger.email (Postfix) with ESMTP id 2691883C23AE; Fri, 22 Sep 2023 20:08:48 -0700 (PDT) X-Virus-Status: Clean X-Virus-Scanned: clamav-milter 0.103.10 at morse.vger.email Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230272AbjIWDHd (ORCPT + 28 others); Fri, 22 Sep 2023 23:07:33 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:38402 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229815AbjIWDHJ (ORCPT ); Fri, 22 Sep 2023 23:07:09 -0400 Received: from mgamail.intel.com (mgamail.intel.com [134.134.136.100]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 7432E180; Fri, 22 Sep 2023 20:07:03 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1695438423; x=1726974423; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=W4lWkvNcyZmnYgi3wRQrFdf4W2G8XS4+lTbHa80YL7g=; b=coxVdvqjtsJqYXoR1GpmC8dy9qPiHAu7cej4KsqB173KXZv7qI8fcjC2 UB7Nmv/0aPkgItMbJS7ypKk1ursE9Ly3frXtekXDnvL79A/b70eqT7kzR 8BVTcpu6gRGJAMHMWlcuYsMcAa4GsCjGhNrNBZNo75/jVJTQb9acIxrqw phlwbof/bVvXjELvc9i1XuKwAvtAkLS8pVVaVbv/X0o/Gs19xJ+O9vQBg 
+c8q3rn8gSg8ZsUqZZ5rokite65sizX/A1XMh94z+WwT/Ls0flY1emS2o pfpZGHtt5Ea+ouZSLEXRL8VgbgtTGgbQIvDFLFFEJDFdg4gOXITeKVd0t Q==; X-IronPort-AV: E=McAfee;i="6600,9927,10841"; a="447466762" X-IronPort-AV: E=Sophos;i="6.03,169,1694761200"; d="scan'208";a="447466762" Received: from fmsmga001.fm.intel.com ([10.253.24.23]) by orsmga105.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 22 Sep 2023 20:07:03 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10841"; a="891048541" X-IronPort-AV: E=Sophos;i="6.03,169,1694761200"; d="scan'208";a="891048541" Received: from b4969161e530.jf.intel.com ([10.165.56.46]) by fmsmga001.fm.intel.com with ESMTP; 22 Sep 2023 20:06:06 -0700 From: Haitao Huang To: jarkko@kernel.org, dave.hansen@linux.intel.com, tj@kernel.org, linux-kernel@vger.kernel.org, linux-sgx@vger.kernel.org, x86@kernel.org, cgroups@vger.kernel.org, tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, hpa@zytor.com, sohil.mehta@intel.com Cc: zhiquan1.li@intel.com, kristen@linux.intel.com, seanjc@google.com, zhanb@microsoft.com, anakrish@microsoft.com, mikko.ylinen@linux.intel.com, yangjie@microsoft.com Subject: [PATCH v5 06/18] x86/sgx: Introduce EPC page states Date: Fri, 22 Sep 2023 20:06:45 -0700 Message-Id: <20230923030657.16148-7-haitao.huang@linux.intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20230923030657.16148-1-haitao.huang@linux.intel.com> References: <20230923030657.16148-1-haitao.huang@linux.intel.com> MIME-Version: 1.0 X-Spam-Status: No, score=2.8 required=5.0 tests=DKIMWL_WL_HIGH,DKIM_SIGNED, DKIM_VALID,HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI, RCVD_IN_SBL_CSS,SPF_HELO_NONE,SPF_PASS autolearn=no autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on morse.vger.email Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-Greylist: Sender passed SPF test, not delayed by milter-greylist-4.6.4 (morse.vger.email [0.0.0.0]); Fri, 22 Sep 2023 20:08:48 -0700 (PDT) X-Spam-Level: ** 
X-getmail-retrieved-from-mailbox: INBOX X-GMAIL-THRID: 1777816758040167097 X-GMAIL-MSGID: 1777816758040167097 Use the lower 3 bits in the flags field of sgx_epc_page struct to track EPC states in its life cycle and define an enum for possible states. More state(s) will be added later. Signed-off-by: Haitao Huang --- V4: - No changes other than required for patch reordering. V3: - This is new in V3 to replace the bit mask based approach (requested by Jarkko) --- arch/x86/kernel/cpu/sgx/encl.c | 14 +++++++--- arch/x86/kernel/cpu/sgx/ioctl.c | 7 +++-- arch/x86/kernel/cpu/sgx/main.c | 19 +++++++------ arch/x86/kernel/cpu/sgx/sgx.h | 49 ++++++++++++++++++++++++++++++--- 4 files changed, 71 insertions(+), 18 deletions(-) diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c index 97a53e34a8b4..f5afc8d65e22 100644 --- a/arch/x86/kernel/cpu/sgx/encl.c +++ b/arch/x86/kernel/cpu/sgx/encl.c @@ -244,8 +244,12 @@ static struct sgx_epc_page *sgx_encl_load_secs(struct sgx_encl *encl) { struct sgx_epc_page *epc_page = encl->secs.epc_page; - if (!epc_page) + if (!epc_page) { epc_page = sgx_encl_eldu(&encl->secs, NULL); + if (!IS_ERR(epc_page)) + sgx_record_epc_page(epc_page, + SGX_EPC_PAGE_UNRECLAIMABLE); + } return epc_page; } @@ -272,7 +276,7 @@ static struct sgx_encl_page *__sgx_encl_load_page(struct sgx_encl *encl, return ERR_CAST(epc_page); encl->secs_child_cnt++; - sgx_record_epc_page(epc_page, SGX_EPC_PAGE_RECLAIMER_TRACKED); + sgx_record_epc_page(epc_page, SGX_EPC_PAGE_RECLAIMABLE); return entry; } @@ -398,7 +402,7 @@ static vm_fault_t sgx_encl_eaug_page(struct vm_area_struct *vma, encl_page->type = SGX_PAGE_TYPE_REG; encl->secs_child_cnt++; - sgx_record_epc_page(epc_page, SGX_EPC_PAGE_RECLAIMER_TRACKED); + sgx_record_epc_page(epc_page, SGX_EPC_PAGE_RECLAIMABLE); phys_addr = sgx_get_epc_phys_addr(epc_page); /* @@ -1256,6 +1260,8 @@ struct sgx_epc_page *sgx_alloc_va_page(bool reclaim) sgx_encl_free_epc_page(epc_page); return ERR_PTR(-EFAULT); } + 
sgx_record_epc_page(epc_page, + SGX_EPC_PAGE_UNRECLAIMABLE); return epc_page; } @@ -1315,7 +1321,7 @@ void sgx_encl_free_epc_page(struct sgx_epc_page *page) { int ret; - WARN_ON_ONCE(page->flags & SGX_EPC_PAGE_RECLAIMER_TRACKED); + WARN_ON_ONCE(page->flags & SGX_EPC_PAGE_STATE_MASK); ret = __eremove(sgx_get_epc_virt_addr(page)); if (WARN_ONCE(ret, EREMOVE_ERROR_MESSAGE, ret, ret)) diff --git a/arch/x86/kernel/cpu/sgx/ioctl.c b/arch/x86/kernel/cpu/sgx/ioctl.c index a75eb44022a3..9a32bf5a1070 100644 --- a/arch/x86/kernel/cpu/sgx/ioctl.c +++ b/arch/x86/kernel/cpu/sgx/ioctl.c @@ -113,6 +113,9 @@ static int sgx_encl_create(struct sgx_encl *encl, struct sgx_secs *secs) encl->attributes = secs->attributes; encl->attributes_mask = SGX_ATTR_UNPRIV_MASK; + sgx_record_epc_page(encl->secs.epc_page, + SGX_EPC_PAGE_UNRECLAIMABLE); + /* Set only after completion, as encl->lock has not been taken. */ set_bit(SGX_ENCL_CREATED, &encl->flags); @@ -322,7 +325,7 @@ static int sgx_encl_add_page(struct sgx_encl *encl, unsigned long src, goto err_out; } - sgx_record_epc_page(epc_page, SGX_EPC_PAGE_RECLAIMER_TRACKED); + sgx_record_epc_page(epc_page, SGX_EPC_PAGE_RECLAIMABLE); mutex_unlock(&encl->lock); mmap_read_unlock(current->mm); return ret; @@ -976,7 +979,7 @@ static long sgx_enclave_modify_types(struct sgx_encl *encl, mutex_lock(&encl->lock); - sgx_record_epc_page(entry->epc_page, SGX_EPC_PAGE_RECLAIMER_TRACKED); + sgx_record_epc_page(entry->epc_page, SGX_EPC_PAGE_RECLAIMABLE); } /* Change EPC type */ diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c index dec1d57cbff6..b26860399402 100644 --- a/arch/x86/kernel/cpu/sgx/main.c +++ b/arch/x86/kernel/cpu/sgx/main.c @@ -318,7 +318,7 @@ static void sgx_reclaim_pages(void) /* The owner is freeing the page. No need to add the * page back to the list of reclaimable pages. 
 			 */
-			epc_page->flags &= ~SGX_EPC_PAGE_RECLAIMER_TRACKED;
+			sgx_epc_page_reset_state(epc_page);
 	}
 	spin_unlock(&sgx_global_lru.lock);
 
@@ -344,6 +344,7 @@ static void sgx_reclaim_pages(void)
 
 skip:
 		spin_lock(&sgx_global_lru.lock);
+		sgx_epc_page_set_state(epc_page, SGX_EPC_PAGE_RECLAIMABLE);
 		list_add_tail(&epc_page->list, &sgx_global_lru.reclaimable);
 		spin_unlock(&sgx_global_lru.lock);
 
@@ -367,7 +368,7 @@ static void sgx_reclaim_pages(void)
 		sgx_reclaimer_write(epc_page, &backing[i]);
 
 		kref_put(&encl_page->encl->refcount, sgx_encl_release);
-		epc_page->flags &= ~SGX_EPC_PAGE_RECLAIMER_TRACKED;
+		sgx_epc_page_reset_state(epc_page);
 
 		sgx_free_epc_page(epc_page);
 	}
@@ -507,9 +508,9 @@ struct sgx_epc_page *__sgx_alloc_epc_page(void)
 void sgx_record_epc_page(struct sgx_epc_page *page, unsigned long flags)
 {
 	spin_lock(&sgx_global_lru.lock);
-	WARN_ON_ONCE(page->flags & SGX_EPC_PAGE_RECLAIMER_TRACKED);
+	WARN_ON_ONCE(sgx_epc_page_reclaimable(page->flags));
 	page->flags |= flags;
-	if (flags & SGX_EPC_PAGE_RECLAIMER_TRACKED)
+	if (sgx_epc_page_reclaimable(flags))
 		list_add_tail(&page->list, &sgx_global_lru.reclaimable);
 	spin_unlock(&sgx_global_lru.lock);
 }
@@ -527,7 +528,7 @@ void sgx_record_epc_page(struct sgx_epc_page *page, unsigned long flags)
 int sgx_drop_epc_page(struct sgx_epc_page *page)
 {
 	spin_lock(&sgx_global_lru.lock);
-	if (page->flags & SGX_EPC_PAGE_RECLAIMER_TRACKED) {
+	if (sgx_epc_page_reclaimable(page->flags)) {
 		/* The page is being reclaimed. */
 		if (list_empty(&page->list)) {
 			spin_unlock(&sgx_global_lru.lock);
@@ -535,7 +536,7 @@ int sgx_drop_epc_page(struct sgx_epc_page *page)
 		}
 
 		list_del(&page->list);
-		page->flags &= ~SGX_EPC_PAGE_RECLAIMER_TRACKED;
+		sgx_epc_page_reset_state(page);
 	}
 	spin_unlock(&sgx_global_lru.lock);
 
@@ -607,6 +608,8 @@ void sgx_free_epc_page(struct sgx_epc_page *page)
 	struct sgx_epc_section *section = &sgx_epc_sections[page->section];
 	struct sgx_numa_node *node = section->node;
 
+	WARN_ON_ONCE(page->flags & (SGX_EPC_PAGE_STATE_MASK));
+
 	spin_lock(&node->lock);
 
 	page->owner = NULL;
@@ -614,7 +617,7 @@ void sgx_free_epc_page(struct sgx_epc_page *page)
 		list_add(&page->list, &node->sgx_poison_page_list);
 	else
 		list_add_tail(&page->list, &node->free_page_list);
-	page->flags = SGX_EPC_PAGE_IS_FREE;
+	page->flags = SGX_EPC_PAGE_FREE;
 
 	spin_unlock(&node->lock);
 	atomic_long_inc(&sgx_nr_free_pages);
@@ -715,7 +718,7 @@ int arch_memory_failure(unsigned long pfn, int flags)
 	 * If the page is on a free list, move it to the per-node
 	 * poison page list.
 	 */
-	if (page->flags & SGX_EPC_PAGE_IS_FREE) {
+	if (page->flags == SGX_EPC_PAGE_FREE) {
 		list_move(&page->list, &node->sgx_poison_page_list);
 		goto out;
 	}
diff --git a/arch/x86/kernel/cpu/sgx/sgx.h b/arch/x86/kernel/cpu/sgx/sgx.h
index 113d930fd087..2faeb40b345f 100644
--- a/arch/x86/kernel/cpu/sgx/sgx.h
+++ b/arch/x86/kernel/cpu/sgx/sgx.h
@@ -23,11 +23,36 @@
 #define SGX_NR_LOW_PAGES		32
 #define SGX_NR_HIGH_PAGES		64
 
-/* Pages, which are being tracked by the page reclaimer. */
-#define SGX_EPC_PAGE_RECLAIMER_TRACKED	BIT(0)
+enum sgx_epc_page_state {
+	/* Not tracked by the reclaimer:
+	 * Pages allocated for virtual EPC which are never tracked by the host
+	 * reclaimer; pages just allocated from free list but not yet put in
+	 * use; pages just reclaimed, but not yet returned to the free list.
+	 * Becomes FREE after sgx_free_epc()
+	 * Becomes RECLAIMABLE or UNRECLAIMABLE after sgx_record_epc()
+	 */
+	SGX_EPC_PAGE_NOT_TRACKED = 0,
+
+	/* Page is in the free list, ready for allocation
+	 * Becomes NOT_TRACKED after sgx_alloc_epc_page()
+	 */
+	SGX_EPC_PAGE_FREE = 1,
+
+	/* Page is in use and tracked in a reclaimable LRU list
+	 * Becomes NOT_TRACKED after sgx_drop_epc()
+	 */
+	SGX_EPC_PAGE_RECLAIMABLE = 2,
+
+	/* Page is in use but tracked in an unreclaimable LRU list. These are
+	 * only reclaimable when the whole enclave is OOM killed or the enclave
+	 * is released, e.g., VA, SECS pages
+	 * Becomes NOT_TRACKED after sgx_drop_epc()
+	 */
+	SGX_EPC_PAGE_UNRECLAIMABLE = 3,
 
-/* Pages on free list */
-#define SGX_EPC_PAGE_IS_FREE		BIT(1)
+};
+
+#define SGX_EPC_PAGE_STATE_MASK GENMASK(2, 0)
 
 struct sgx_epc_page {
 	unsigned int section;
@@ -37,6 +62,22 @@ struct sgx_epc_page {
 	struct list_head list;
 };
 
+static inline void sgx_epc_page_reset_state(struct sgx_epc_page *page)
+{
+	page->flags &= ~SGX_EPC_PAGE_STATE_MASK;
+}
+
+static inline void sgx_epc_page_set_state(struct sgx_epc_page *page, unsigned long flags)
+{
+	page->flags &= ~SGX_EPC_PAGE_STATE_MASK;
+	page->flags |= (flags & SGX_EPC_PAGE_STATE_MASK);
+}
+
+static inline bool sgx_epc_page_reclaimable(unsigned long flags)
+{
+	return SGX_EPC_PAGE_RECLAIMABLE == (flags & SGX_EPC_PAGE_STATE_MASK);
+}
+
 /*
  * Contains the tracking data for NUMA nodes having EPC pages. Most importantly,
  * the free page list local to the node is stored here.
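
The state encoding introduced by this patch can be tried outside the kernel. The following is a minimal user-space sketch, not kernel code: it assumes the state occupies the low three bits of flags, as SGX_EPC_PAGE_STATE_MASK suggests. The names struct demo_epc_page, demo_reset_state(), demo_set_state(), demo_reclaimable(), demo_transitions() and DEMO_EPC_PAGE_OTHER_FLAG are hypothetical, and 0x7u is a stand-in for GENMASK(2, 0).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* State values copied from the patch; they live in the low bits of flags. */
enum sgx_epc_page_state {
	SGX_EPC_PAGE_NOT_TRACKED   = 0,
	SGX_EPC_PAGE_FREE          = 1,
	SGX_EPC_PAGE_RECLAIMABLE   = 2,
	SGX_EPC_PAGE_UNRECLAIMABLE = 3,
};

/* User-space stand-in for GENMASK(2, 0): bits 2..0 set. */
#define SGX_EPC_PAGE_STATE_MASK 0x7u

/* Hypothetical non-state flag above the mask; not part of the patch. */
#define DEMO_EPC_PAGE_OTHER_FLAG (1u << 3)

/* Hypothetical, reduced page descriptor holding only the flags word. */
struct demo_epc_page {
	uint16_t flags;
};

/* Clear the state bits, leaving the page NOT_TRACKED. */
static inline void demo_reset_state(struct demo_epc_page *page)
{
	page->flags = (uint16_t)(page->flags & ~SGX_EPC_PAGE_STATE_MASK);
}

/* Replace the state bits and keep every bit outside the mask. */
static inline void demo_set_state(struct demo_epc_page *page, unsigned int state)
{
	demo_reset_state(page);
	page->flags = (uint16_t)(page->flags | (state & SGX_EPC_PAGE_STATE_MASK));
}

/* True only when the masked state equals RECLAIMABLE. */
static inline bool demo_reclaimable(unsigned int flags)
{
	return (flags & SGX_EPC_PAGE_STATE_MASK) == SGX_EPC_PAGE_RECLAIMABLE;
}

/* Walk NOT_TRACKED -> RECLAIMABLE -> NOT_TRACKED and return the final flags. */
static inline uint16_t demo_transitions(void)
{
	struct demo_epc_page page = { .flags = DEMO_EPC_PAGE_OTHER_FLAG };

	demo_set_state(&page, SGX_EPC_PAGE_RECLAIMABLE);
	assert(demo_reclaimable(page.flags));

	demo_reset_state(&page);
	assert(!demo_reclaimable(page.flags));

	return page.flags;
}
```

Because only the masked bits are rewritten, a flag stored above bit 2 survives every state change; demo_transitions() returns the flags word with just that extra bit still set.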
From patchwork Sat Sep 23 03:06:46 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Haitao Huang X-Patchwork-Id: 143800 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:910f:0:b0:403:3b70:6f57 with SMTP id r15csp61799vqg; Fri, 22 Sep 2023 20:35:23 -0700 (PDT) X-Google-Smtp-Source: AGHT+IEEBF8+arSr9AdmAfG8261vHLpYA7OJ08CAAbff/k5hN/vY3f63G3x/WsRAcp/kzjly+Crt X-Received: by 2002:a05:6a20:42a4:b0:14c:446c:b188 with SMTP id o36-20020a056a2042a400b0014c446cb188mr1423726pzj.37.1695440122938; Fri, 22 Sep 2023 20:35:22 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1695440122; cv=none; d=google.com; s=arc-20160816; b=IUDtSpOXEvu1B/zEpbjnig6VDVRAh9I/9apzGnk3hkhSMCm+wv83iaVTiiGWxnJtHP B2G4GY3q4kPnLXBZDyNAKHm6K1MmF44001u1dBJ5KHNoq+r8CR1TmHrzlS19RaSCPEqY +0xw1lf90Oo3ByxZ/1WH1cgZB8l2o9CBwFENRStdVnYIn3hY8USgg+x5O4caHzlY9juv oQLK6GZfEddRm69MOLg1hegBJ9+/K7+jSTBVm7+20b19r8wazTXhBd9Ays3e18+hHlMg ITM8IJ/waq/38HKdrSyQeQM/Ji2FziDIYF7VfNU3krcX6MqscZhNK9sExIbDpIT8D7+S BG+w== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=lQOEyM52ivoijOL67lESWRbO5vdP+lqfDE+gG68Q5l0=; fh=j8PE345l5Ydlo3KwK7JeWnjqRgjiq4AteUoOZeOwa0I=; b=SHP+qjcRZilNpHrCVrWJGEUFfCTyiBtiVJtC098xJ/7KLJO2hl1Dr8AakrCuvt2t1m TzKX17vGzRjPleoW9iXDE9PFe8FMfJvLr4TVBvuzFzhQwDb1/WpXVyO//ZyDI4jISOjK EY8A4udR950+wKUGsHcAx7271B5/7nxN96ZPbFtYJLc+neXEe+1kkjjlT5+s6T4nZLA3 1JknoGKnWZuX1UtuRYpVxbOyJ5R/mqhtLwE5S31pA+bNuCB6evy44PWkxeKjFZqqx6ko c3xZUJbAlCsnkn4B4m7otRh1uvKcLBse7aCkfQ+LFooOwINQsmCSO2C2auRWNwkexloO qpEw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=az+qzZHS; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::3:8 as permitted sender) 
smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: from fry.vger.email (fry.vger.email. [2620:137:e000::3:8]) by mx.google.com with ESMTPS id kq7-20020a170903284700b001bdd1f48f91si4896290plb.564.2023.09.22.20.35.22 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 22 Sep 2023 20:35:22 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::3:8 as permitted sender) client-ip=2620:137:e000::3:8; Authentication-Results: mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=az+qzZHS; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::3:8 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: from out1.vger.email (depot.vger.email [IPv6:2620:137:e000::3:0]) by fry.vger.email (Postfix) with ESMTP id 2A9CB83AD1F7; Fri, 22 Sep 2023 20:07:57 -0700 (PDT) X-Virus-Status: Clean X-Virus-Scanned: clamav-milter 0.103.10 at fry.vger.email Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230208AbjIWDHh (ORCPT + 28 others); Fri, 22 Sep 2023 23:07:37 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:38416 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229865AbjIWDHK (ORCPT ); Fri, 22 Sep 2023 23:07:10 -0400 Received: from mgamail.intel.com (mgamail.intel.com [134.134.136.100]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 30E551A5; Fri, 22 Sep 2023 20:07:04 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1695438424; x=1726974424; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=6gXh1rHtrkHSzkPHJy21La79GKdM/hChwBw4+Gb9Byg=; b=az+qzZHSkN9xnfdSF71mjpM2TUPQN0TGexiyvj4zL+WcE6kRFBwSOY1x 
bDU+ZLFwz0NEEQ0y0tbdeZglo5kInAk8tYe3j8PFw47LS0P8Q8JseTUk6 2WZHQybn2y7ISBSgkyDPbnlRgv08CT865sWRTvH59hGOtunMyY2dEzs6X /8Ky8jzwrirtx4lquAJLsM2F12ekoI5hleeHLrfkqylgosCoU1fvbprrB wLertwiScRz7zQu/SVId8G8eVBD/myYC16jwg6pgKt+xgQ23IKt1hwVhA ea8pFbeOkQiCZznr9jNEd4ipGx5FBmPMtVdmr6EfNXJ0vG8IPSSarC1x0 A==; X-IronPort-AV: E=McAfee;i="6600,9927,10841"; a="447466770" X-IronPort-AV: E=Sophos;i="6.03,169,1694761200"; d="scan'208";a="447466770" Received: from fmsmga001.fm.intel.com ([10.253.24.23]) by orsmga105.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 22 Sep 2023 20:07:03 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10841"; a="891048548" X-IronPort-AV: E=Sophos;i="6.03,169,1694761200"; d="scan'208";a="891048548" Received: from b4969161e530.jf.intel.com ([10.165.56.46]) by fmsmga001.fm.intel.com with ESMTP; 22 Sep 2023 20:06:07 -0700 From: Haitao Huang To: jarkko@kernel.org, dave.hansen@linux.intel.com, tj@kernel.org, linux-kernel@vger.kernel.org, linux-sgx@vger.kernel.org, x86@kernel.org, cgroups@vger.kernel.org, tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, hpa@zytor.com, sohil.mehta@intel.com Cc: zhiquan1.li@intel.com, kristen@linux.intel.com, seanjc@google.com, zhanb@microsoft.com, anakrish@microsoft.com, mikko.ylinen@linux.intel.com, yangjie@microsoft.com Subject: [PATCH v5 07/18] x86/sgx: Introduce RECLAIM_IN_PROGRESS state Date: Fri, 22 Sep 2023 20:06:46 -0700 Message-Id: <20230923030657.16148-8-haitao.huang@linux.intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20230923030657.16148-1-haitao.huang@linux.intel.com> References: <20230923030657.16148-1-haitao.huang@linux.intel.com> MIME-Version: 1.0 X-Spam-Status: No, score=2.8 required=5.0 tests=DKIMWL_WL_HIGH,DKIM_SIGNED, DKIM_VALID,HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI, RCVD_IN_SBL_CSS,SPF_HELO_NONE,SPF_PASS autolearn=no autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on fry.vger.email Precedence: bulk List-ID: X-Mailing-List: 
linux-kernel@vger.kernel.org X-Greylist: Sender passed SPF test, not delayed by milter-greylist-4.6.4 (fry.vger.email [0.0.0.0]); Fri, 22 Sep 2023 20:07:57 -0700 (PDT) X-Spam-Level: ** X-getmail-retrieved-from-mailbox: INBOX X-GMAIL-THRID: 1777797822783661209 X-GMAIL-MSGID: 1777797822783661209

From: Sean Christopherson

Add a RECLAIM_IN_PROGRESS state so that sgx_drop_epc_page() no longer
relies on list_empty(&epc_page->list) to determine whether an EPC page
has been selected as a reclaiming candidate.

When a page is being reclaimed from the page pool (sgx_global_lru),
there is an intermediate stage where a page may have been identified as
a candidate for reclaiming, but has not yet been reclaimed. Currently
such pages are list_del_init()'d from the global LRU list and stored in
an array on the stack. To prevent another thread from dropping the same
page in the middle of reclaiming, sgx_drop_epc_page() checks for
list_empty(&epc_page->list).

A later patch will replace the on-stack array with a temporary list to
store the candidate pages, so list_empty() can no longer be used for
this purpose.

Signed-off-by: Sean Christopherson
Co-developed-by: Kristen Carlson Accardi
Signed-off-by: Kristen Carlson Accardi
Co-developed-by: Haitao Huang
Signed-off-by: Haitao Huang
Cc: Sean Christopherson
---
V4:
- Fixed some typos.
- Revised commit message.

V3:
- Extend the sgx_epc_page_state enum introduced earlier to replace the
  flag based approach.
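
The reasoning in this commit message can be illustrated with a small user-space sketch. It is not the kernel implementation: locking is omitted, the kernel list_head is replaced by a hypothetical on_list boolean, and struct demo_epc_page, demo_reclaim_in_progress() and demo_drop_epc_page() are made-up names. The point it shows is that a candidate page sitting on a temporary list would look droppable to a list_empty() check, while the explicit state still reports it as busy.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* State values as they stand after this patch. */
enum sgx_epc_page_state {
	SGX_EPC_PAGE_NOT_TRACKED         = 0,
	SGX_EPC_PAGE_FREE                = 1,
	SGX_EPC_PAGE_RECLAIMABLE         = 2,
	SGX_EPC_PAGE_UNRECLAIMABLE       = 3,
	SGX_EPC_PAGE_RECLAIM_IN_PROGRESS = 4,
};

/* User-space stand-in for GENMASK(2, 0). */
#define SGX_EPC_PAGE_STATE_MASK 0x7u

/* Hypothetical, simplified page: on_list replaces the kernel list_head. */
struct demo_epc_page {
	unsigned int flags;
	bool on_list;
};

static bool demo_reclaim_in_progress(unsigned int flags)
{
	return (flags & SGX_EPC_PAGE_STATE_MASK) ==
	       SGX_EPC_PAGE_RECLAIM_IN_PROGRESS;
}

/* Follows the shape of sgx_drop_epc_page() after the patch, minus locking. */
static int demo_drop_epc_page(struct demo_epc_page *page)
{
	/* The reclaimer owns the page: refuse regardless of list membership. */
	if (demo_reclaim_in_progress(page->flags))
		return -EBUSY;

	page->on_list = false;				/* list_del() */
	page->flags &= ~SGX_EPC_PAGE_STATE_MASK;	/* back to NOT_TRACKED */
	return 0;
}
```

A page in RECLAIM_IN_PROGRESS with on_list set is rejected with -EBUSY and left untouched, whereas a RECLAIMABLE page is unlinked and returned to NOT_TRACKED.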
---
 arch/x86/kernel/cpu/sgx/main.c | 21 ++++++++++-----------
 arch/x86/kernel/cpu/sgx/sgx.h  | 16 ++++++++++++++++
 2 files changed, 26 insertions(+), 11 deletions(-)

diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
index b26860399402..c1ae19a154d0 100644
--- a/arch/x86/kernel/cpu/sgx/main.c
+++ b/arch/x86/kernel/cpu/sgx/main.c
@@ -312,13 +312,15 @@ static void sgx_reclaim_pages(void)
 		list_del_init(&epc_page->list);
 		encl_page = epc_page->owner;
 
-		if (kref_get_unless_zero(&encl_page->encl->refcount) != 0)
+		if (kref_get_unless_zero(&encl_page->encl->refcount) != 0) {
+			sgx_epc_page_set_state(epc_page, SGX_EPC_PAGE_RECLAIM_IN_PROGRESS);
 			chunk[cnt++] = epc_page;
-		else
+		} else {
 			/* The owner is freeing the page. No need to add the
 			 * page back to the list of reclaimable pages.
 			 */
 			sgx_epc_page_reset_state(epc_page);
+		}
 	}
 	spin_unlock(&sgx_global_lru.lock);
 
@@ -528,16 +530,13 @@ void sgx_record_epc_page(struct sgx_epc_page *page, unsigned long flags)
 int sgx_drop_epc_page(struct sgx_epc_page *page)
 {
 	spin_lock(&sgx_global_lru.lock);
-	if (sgx_epc_page_reclaimable(page->flags)) {
-		/* The page is being reclaimed. */
-		if (list_empty(&page->list)) {
-			spin_unlock(&sgx_global_lru.lock);
-			return -EBUSY;
-		}
-
-		list_del(&page->list);
-		sgx_epc_page_reset_state(page);
+	if (sgx_epc_page_reclaim_in_progress(page->flags)) {
+		spin_unlock(&sgx_global_lru.lock);
+		return -EBUSY;
 	}
+
+	list_del(&page->list);
+	sgx_epc_page_reset_state(page);
 	spin_unlock(&sgx_global_lru.lock);
 
 	return 0;
diff --git a/arch/x86/kernel/cpu/sgx/sgx.h b/arch/x86/kernel/cpu/sgx/sgx.h
index 2faeb40b345f..764cec23f4e5 100644
--- a/arch/x86/kernel/cpu/sgx/sgx.h
+++ b/arch/x86/kernel/cpu/sgx/sgx.h
@@ -40,6 +40,8 @@ enum sgx_epc_page_state {
 
 	/* Page is in use and tracked in a reclaimable LRU list
 	 * Becomes NOT_TRACKED after sgx_drop_epc()
+	 * Becomes RECLAIM_IN_PROGRESS in sgx_reclaim_pages() when identified
+	 * for reclaiming
 	 */
 	SGX_EPC_PAGE_RECLAIMABLE = 2,
 
@@ -50,6 +52,14 @@ enum sgx_epc_page_state {
 	 */
 	SGX_EPC_PAGE_UNRECLAIMABLE = 3,
 
+	/* Page is being prepared for reclamation, tracked in a temporary
+	 * isolated list by the reclaimer.
+	 * Changes in sgx_reclaim_pages() back to RECLAIMABLE if preparation
+	 * fails for any reason.
+	 * Becomes NOT_TRACKED if reclaimed successfully in sgx_reclaim_pages()
+	 * and immediately sgx_free_epc() is called to make it FREE.
+	 */
+	SGX_EPC_PAGE_RECLAIM_IN_PROGRESS = 4,
 };
 
 #define SGX_EPC_PAGE_STATE_MASK GENMASK(2, 0)
@@ -73,6 +83,12 @@ static inline void sgx_epc_page_set_state(struct sgx_epc_page *page, unsigned lo
 	page->flags |= (flags & SGX_EPC_PAGE_STATE_MASK);
 }
 
+static inline bool sgx_epc_page_reclaim_in_progress(unsigned long flags)
+{
+	return SGX_EPC_PAGE_RECLAIM_IN_PROGRESS == (flags &
+						    SGX_EPC_PAGE_STATE_MASK);
+}
+
 static inline bool sgx_epc_page_reclaimable(unsigned long flags)
 {
 	return SGX_EPC_PAGE_RECLAIMABLE == (flags & SGX_EPC_PAGE_STATE_MASK);

From patchwork Sat Sep 23 03:06:47 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Haitao Huang X-Patchwork-Id: 143794 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:910f:0:b0:403:3b70:6f57 with SMTP id r15csp54483vqg; Fri, 22 Sep 2023 20:11:45 -0700 (PDT) X-Google-Smtp-Source: AGHT+IFYLdpiVHbZfHt/ddubAcMVDlF6sLZCA8dGoYNTIjwWjKVV0pO1oU0wSw2Z45QR1gj3JxUY X-Received: by 2002:a05:6a20:2589:b0:152:be08:b013 with SMTP id k9-20020a056a20258900b00152be08b013mr1128313pzd.42.1695438705410; Fri, 22 Sep 2023 20:11:45 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1695438705; cv=none; d=google.com; s=arc-20160816; b=rE8SMvAnP7XoKat3mGkJi4sG5TK2g6tlDapn86LylSUx9okeIaCgQCxu4QrauackjD 1Iej7T6gpiPh2W8u5kmSzYrQsge3RK4U0ibZQCeTwDYcY0wZgZHvSHAUtWMgYWjHFfJz d/s0Yg3IrjhY4Js+aesI5CMi7vO0a6I2fPi1hiOHYzFHH/3EQWVU/i4MpSTtcvIOlJWK d7JVRvSQo8IzX6L5mrGY6TLD/so03fy8JcdH8TWVnHkyn+9H2NjTk6ka8pHHh/bazCd1 jYnpvvmWxetDdTGg+GFhQZdorixdcf1AcCC/ZKBgZWz+mrKRNacvWIapdu2bdOsY0rpe vZqg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=uBH2wj+FdMrkff9jSQcAMSwVTgJ0ymfYgKif0ih1XPM=; fh=j8PE345l5Ydlo3KwK7JeWnjqRgjiq4AteUoOZeOwa0I=; b=ZcSaGpabBv7LbqvUE2+PFgrl2cCMqu5G9yLAhabJDXQhkDoLt9MNJXBfv/AV8WLEld
NOyqiL898NQmMGWDzN0NtWp7SJH8TRLJTOhV0yM3BQ5wZHPKT11+cXA1imtucakK8VSw 6u08e5KQFG2vvChAf2Jdq1WDpN1CIEelFL0ItQSMryNggzexDg8fQnTvQVjUxpAVgZE3 6nA+XvcCOQPWl/VkjHNOn8Y1fSnveEvxJzS/O7vQAfIGZ6fo3PNaFMB4D1YiX8wESOqb d17dhYpy/MhMaewJVWMkCoVv0BdVGgMmzQ1Lt1R9nnSy0kGDNVQyF2dA2gJx5Blywaav +ilg== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=aqMQTtLg; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.32 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: from agentk.vger.email (agentk.vger.email. [23.128.96.32]) by mx.google.com with ESMTPS id ld5-20020a170902fac500b001bb6d711625si5046304plb.279.2023.09.22.20.11.45 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 22 Sep 2023 20:11:45 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.32 as permitted sender) client-ip=23.128.96.32; Authentication-Results: mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=aqMQTtLg; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.32 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: from out1.vger.email (depot.vger.email [IPv6:2620:137:e000::3:0]) by agentk.vger.email (Postfix) with ESMTP id BD1E482EA171; Fri, 22 Sep 2023 20:08:41 -0700 (PDT) X-Virus-Status: Clean X-Virus-Scanned: clamav-milter 0.103.10 at agentk.vger.email Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231143AbjIWDHk (ORCPT + 28 others); Fri, 22 Sep 2023 23:07:40 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:40108 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229887AbjIWDHL (ORCPT ); Fri, 22 Sep 2023 23:07:11 -0400 Received: from 
mgamail.intel.com (mgamail.intel.com [134.134.136.100]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 0718B1A7; Fri, 22 Sep 2023 20:07:05 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1695438425; x=1726974425; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=mUkcYfLatdROP3kNi+VVBQokEDDPoz07w/WU/sAEnUA=; b=aqMQTtLgGuf15Tse8w9MPy4+p26ouclTOBIzA+hYGIu+gIGEohogC+CP /5q+y3K6kdY/5ZQTc3g/WD02temprhNtd0bdlc7Jq7fQ+fyqtfpuG26JA kbapsrrtre+ISa5PlDVXzWI/k0SS9i+mZaT6DkBL80zrPb6uvla85c2eY BWSfT6iO/qGp6GGC0PyCLW/pUxnVgmZW7lskREAsFzXTJRcsO1BupQOB0 bEUmW88gVGv/APd+sinwJFAik+EwevVFJvDIjtLbfKKkSRmVWHDwtTOLg es3spATI39p3kx3MY+zzjsCfdeW6l/IbqzhPOjKMT7AK3UZLbqYaeWf6d g==; X-IronPort-AV: E=McAfee;i="6600,9927,10841"; a="447466778" X-IronPort-AV: E=Sophos;i="6.03,169,1694761200"; d="scan'208";a="447466778" Received: from fmsmga001.fm.intel.com ([10.253.24.23]) by orsmga105.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 22 Sep 2023 20:07:04 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10841"; a="891048551" X-IronPort-AV: E=Sophos;i="6.03,169,1694761200"; d="scan'208";a="891048551" Received: from b4969161e530.jf.intel.com ([10.165.56.46]) by fmsmga001.fm.intel.com with ESMTP; 22 Sep 2023 20:06:08 -0700 From: Haitao Huang To: jarkko@kernel.org, dave.hansen@linux.intel.com, tj@kernel.org, linux-kernel@vger.kernel.org, linux-sgx@vger.kernel.org, x86@kernel.org, cgroups@vger.kernel.org, tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, hpa@zytor.com, sohil.mehta@intel.com Cc: zhiquan1.li@intel.com, kristen@linux.intel.com, seanjc@google.com, zhanb@microsoft.com, anakrish@microsoft.com, mikko.ylinen@linux.intel.com, yangjie@microsoft.com Subject: [PATCH v5 08/18] x86/sgx: Use a list to track to-be-reclaimed pages Date: Fri, 22 Sep 2023 20:06:47 -0700 Message-Id: <20230923030657.16148-9-haitao.huang@linux.intel.com> X-Mailer: git-send-email 
2.25.1 In-Reply-To: <20230923030657.16148-1-haitao.huang@linux.intel.com> References: <20230923030657.16148-1-haitao.huang@linux.intel.com> MIME-Version: 1.0 X-Spam-Status: No, score=2.8 required=5.0 tests=DKIMWL_WL_HIGH,DKIM_SIGNED, DKIM_VALID,HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI, RCVD_IN_SBL_CSS,SPF_HELO_NONE,SPF_PASS autolearn=no autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on agentk.vger.email Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-Greylist: Sender passed SPF test, not delayed by milter-greylist-4.6.4 (agentk.vger.email [0.0.0.0]); Fri, 22 Sep 2023 20:08:42 -0700 (PDT) X-Spam-Level: ** X-getmail-retrieved-from-mailbox: INBOX X-GMAIL-THRID: 1777796335810469919 X-GMAIL-MSGID: 1777796335810469919

From: Sean Christopherson

Change sgx_reclaim_pages() to use a list rather than an array for
storing the epc_pages which will be reclaimed. This change is needed to
transition to the LRU implementation for EPC cgroup support.

When the EPC cgroup is implemented, the reclaiming process will do a
pre-order tree walk for the subtree starting from the limit-violating
cgroup. When each node is visited, candidate pages are selected from
its "reclaimable" LRU list and moved into this temporary list. Passing a
list from node to node for temporary storage in this walk is more
straightforward than using an array.

Signed-off-by: Sean Christopherson
Co-developed-by: Kristen Carlson Accardi
Signed-off-by: Kristen Carlson Accardi
Co-developed-by: Haitao Huang
Signed-off-by: Haitao Huang
Cc: Sean Christopherson
---
V4:
- Changes needed for patch reordering
- Revised commit message

V3:
- Removed list wrappers
---
 arch/x86/kernel/cpu/sgx/main.c | 40 +++++++++++++++-------------------
 1 file changed, 18 insertions(+), 22 deletions(-)

diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
index c1ae19a154d0..fba06dc5abfe 100644
--- a/arch/x86/kernel/cpu/sgx/main.c
+++ b/arch/x86/kernel/cpu/sgx/main.c
@@ -293,12 +293,11 @@ static void sgx_reclaimer_write(struct sgx_epc_page *epc_page,
  */
 static void sgx_reclaim_pages(void)
 {
-	struct sgx_epc_page *chunk[SGX_NR_TO_SCAN];
 	struct sgx_backing backing[SGX_NR_TO_SCAN];
+	struct sgx_epc_page *epc_page, *tmp;
 	struct sgx_encl_page *encl_page;
-	struct sgx_epc_page *epc_page;
 	pgoff_t page_index;
-	int cnt = 0;
+	LIST_HEAD(iso);
 	int ret;
 	int i;
 
@@ -314,18 +313,22 @@ static void sgx_reclaim_pages(void)
 
 		if (kref_get_unless_zero(&encl_page->encl->refcount) != 0) {
 			sgx_epc_page_set_state(epc_page, SGX_EPC_PAGE_RECLAIM_IN_PROGRESS);
-			chunk[cnt++] = epc_page;
+			list_move_tail(&epc_page->list, &iso);
 		} else {
-			/* The owner is freeing the page. No need to add the
-			 * page back to the list of reclaimable pages.
+			/* The owner is freeing the page, remove it from the
+			 * LRU list
 			 */
 			sgx_epc_page_reset_state(epc_page);
+			list_del_init(&epc_page->list);
 		}
 	}
 	spin_unlock(&sgx_global_lru.lock);
 
-	for (i = 0; i < cnt; i++) {
-		epc_page = chunk[i];
+	if (list_empty(&iso))
+		return;
+
+	i = 0;
+	list_for_each_entry_safe(epc_page, tmp, &iso, list) {
 		encl_page = epc_page->owner;
 
 		if (!sgx_reclaimer_age(epc_page))
@@ -340,6 +343,7 @@ static void sgx_reclaim_pages(void)
 			goto skip;
 		}
 
+		i++;
 		encl_page->desc |= SGX_ENCL_PAGE_BEING_RECLAIMED;
 		mutex_unlock(&encl_page->encl->lock);
 		continue;
@@ -347,27 +351,19 @@ static void sgx_reclaim_pages(void)
 skip:
 		spin_lock(&sgx_global_lru.lock);
 		sgx_epc_page_set_state(epc_page, SGX_EPC_PAGE_RECLAIMABLE);
-		list_add_tail(&epc_page->list, &sgx_global_lru.reclaimable);
+		list_move_tail(&epc_page->list, &sgx_global_lru.reclaimable);
 		spin_unlock(&sgx_global_lru.lock);
 
 		kref_put(&encl_page->encl->refcount, sgx_encl_release);
-
-		chunk[i] = NULL;
-	}
-
-	for (i = 0; i < cnt; i++) {
-		epc_page = chunk[i];
-		if (epc_page)
-			sgx_reclaimer_block(epc_page);
 	}
 
-	for (i = 0; i < cnt; i++) {
-		epc_page = chunk[i];
-		if (!epc_page)
-			continue;
+	list_for_each_entry(epc_page, &iso, list)
+		sgx_reclaimer_block(epc_page);
 
+	i = 0;
+	list_for_each_entry_safe(epc_page, tmp, &iso, list) {
 		encl_page = epc_page->owner;
-		sgx_reclaimer_write(epc_page, &backing[i]);
+		sgx_reclaimer_write(epc_page, &backing[i++]);
 
 		kref_put(&encl_page->encl->refcount, sgx_encl_release);
 		sgx_epc_page_reset_state(epc_page);

From patchwork Sat Sep 23 03:06:48 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Haitao Huang X-Patchwork-Id: 143854 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:cae8:0:b0:403:3b70:6f57 with SMTP id r8csp72113vqu; Sat, 23 Sep 2023 02:20:23 -0700 (PDT) X-Google-Smtp-Source: AGHT+IFuIRSOGiZEtXLf2CfT5qqnGfXkZ1B3sSxExiPTO+34h6/ZYB2iwPi26+MWclNVzZpw2D+H X-Received: by
2002:a05:6a00:1142:b0:690:1720:aa83 with SMTP id b2-20020a056a00114200b006901720aa83mr2170187pfm.21.1695460823177; Sat, 23 Sep 2023 02:20:23 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1695460823; cv=none; d=google.com; s=arc-20160816; b=xhsnYi7roMR4sYBv0yPXt/VnBvJUMtPkNcNFkBZxr9YPmj0dZm5MElE+6NChh6gWid bI0PJ5iXAbEYjvpXyT6efKzCbixmK7eK6lkK2LLFl4PsrgviGyERO2HnR2TkBEDywqUD 3OK6sGXzMzIWDybCrKCtKU4r2++ElT2DLZUiJ7jxs+LBoUOw1k42oBv8hHslRa32MUjE jsZdzE9iahfGIGlvKW1KuNv9nx/lhuY6iq9eEUjGm0mdficvMIAsJ3dIEJ4qHWzc2cPZ L/judD1N7cK4lasUf4de8Oxb0YIIqREor71b1IUdmy97VqBdi7giOQvsJgztGREshN8t u1tg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=EDB3EXCsqLvDYTGlYabBHLAt2PxBptC/EVFXVM0Jsq8=; fh=j8PE345l5Ydlo3KwK7JeWnjqRgjiq4AteUoOZeOwa0I=; b=xVjmc+gJGZtTRFKNbUCWqRN1VqUcQtlezrCeCsBA1UovzkuKlNkNiyYqKj4sZmA+Dh x1dN60X6AelIW1i8o4e6ZIdc6KiNn6B7xKB8Fy43h3EkZ2ZRJtOEvgaTO1z7jv/BM0ki icaLZCZk1rMq90h4oVjWK5a7mkGVuRoMzzYQHgQ4TCHw2vwwGC1FmyrWzf7CAFoqTQkX NCRYNQGIXnfbcQSxJfhdfAhEah90HF9NWUpp56vrZ/kqOpb8wbBAEMUmZF3JZ6H9PLJu oavr+MHQDjWnqavShTS59XQmB5QkozOKBZ8+7YJjR1cDC9FdbmYv2wBPmjvADbycIEtz DQyw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=UGcAPNMK; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::3:7 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: from snail.vger.email (snail.vger.email. 
[2620:137:e000::3:7]) by mx.google.com with ESMTPS id w8-20020a63f508000000b0055391572218si5485544pgh.26.2023.09.23.02.20.22 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sat, 23 Sep 2023 02:20:23 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::3:7 as permitted sender) client-ip=2620:137:e000::3:7; Authentication-Results: mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=UGcAPNMK; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::3:7 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: from out1.vger.email (depot.vger.email [IPv6:2620:137:e000::3:0]) by snail.vger.email (Postfix) with ESMTP id B79FE80ADF22; Fri, 22 Sep 2023 20:08:07 -0700 (PDT) X-Virus-Status: Clean X-Virus-Scanned: clamav-milter 0.103.10 at snail.vger.email Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229919AbjIWDHp (ORCPT + 28 others); Fri, 22 Sep 2023 23:07:45 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:40124 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229892AbjIWDHM (ORCPT ); Fri, 22 Sep 2023 23:07:12 -0400 Received: from mgamail.intel.com (mgamail.intel.com [134.134.136.100]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 87DAE1A8; Fri, 22 Sep 2023 20:07:05 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1695438425; x=1726974425; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=CPVq7hMKhTxLnKfdXR7Q7yMpcir8DY5XsL5VtYGW0ew=; b=UGcAPNMKOqBSrrtcEHqIYv7pf84DCxGXAqwrmd0+u9QtO/Aw004XgPWr Ue7jVJvi1e1eSbT3G7YBJQp4BN5Oo9RR6JJi+i9lPrI9JyHT+YoKNWB4g 3C3mPZ6MCG+8n2RYMFKmYZdBgpaX8ZyrEx+h1hHwPLDlwwBJ1OHitT2MC 
kxCjQSBIF1ImeZqh+IHyqCdc5cm6bOEYicIUEJp7A12grMLWY7YqfVvt4 cZhoJrwykFjoGWBrKLi0SfW0rkxImBWeBptIgnmD4E40Lm2qn40TNAWQ8 MKBLRtdOgylAzZMErwENispd5wHnBXczNPZeeKZxekJQpo/FWB5CSpoiZ g==; X-IronPort-AV: E=McAfee;i="6600,9927,10841"; a="447466787" X-IronPort-AV: E=Sophos;i="6.03,169,1694761200"; d="scan'208";a="447466787" Received: from fmsmga001.fm.intel.com ([10.253.24.23]) by orsmga105.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 22 Sep 2023 20:07:05 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10841"; a="891048555" X-IronPort-AV: E=Sophos;i="6.03,169,1694761200"; d="scan'208";a="891048555" Received: from b4969161e530.jf.intel.com ([10.165.56.46]) by fmsmga001.fm.intel.com with ESMTP; 22 Sep 2023 20:06:08 -0700 From: Haitao Huang To: jarkko@kernel.org, dave.hansen@linux.intel.com, tj@kernel.org, linux-kernel@vger.kernel.org, linux-sgx@vger.kernel.org, x86@kernel.org, cgroups@vger.kernel.org, tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, hpa@zytor.com, sohil.mehta@intel.com Cc: zhiquan1.li@intel.com, kristen@linux.intel.com, seanjc@google.com, zhanb@microsoft.com, anakrish@microsoft.com, mikko.ylinen@linux.intel.com, yangjie@microsoft.com Subject: [PATCH v5 09/18] x86/sgx: Store struct sgx_encl when allocating new VA pages Date: Fri, 22 Sep 2023 20:06:48 -0700 Message-Id: <20230923030657.16148-10-haitao.huang@linux.intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20230923030657.16148-1-haitao.huang@linux.intel.com> References: <20230923030657.16148-1-haitao.huang@linux.intel.com> MIME-Version: 1.0 X-Spam-Status: No, score=-2.0 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_EF,RCVD_IN_DNSWL_BLOCKED, SPF_HELO_NONE,SPF_NONE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-Greylist: Sender passed SPF test, not delayed by milter-greylist-4.6.4 
(snail.vger.email [0.0.0.0]); Fri, 22 Sep 2023 20:08:07 -0700 (PDT) X-getmail-retrieved-from-mailbox: INBOX X-GMAIL-THRID: 1777819528350105035 X-GMAIL-MSGID: 1777819528350105035

From: Sean Christopherson

In a later patch, when a cgroup has exceeded the max capacity for EPC
pages, it may need to identify and OOM kill a less active enclave to
make room for other enclaves within the same group. Such a victim
enclave would have no active pages other than the unreclaimable Version
Array (VA) and SECS pages. Therefore, the cgroup needs to examine its
unreclaimable page list and find the enclave that owns a given SECS or
VA page. This requires a backpointer from a page to an enclave, which
is not available for VA pages.

Because struct sgx_epc_page instances of VA pages are not owned by an
sgx_encl_page instance, mark their owner as sgx_encl: pass the struct
sgx_encl of the enclave allocating the VA page to sgx_alloc_epc_page(),
which will store this value in the owner field of the struct
sgx_epc_page. In a later patch, VA pages will be placed in an
unreclaimable queue that can be examined by the cgroup to select the
enclave to OOM kill.
Signed-off-by: Sean Christopherson Co-developed-by: Kristen Carlson Accardi Signed-off-by: Kristen Carlson Accardi Co-developed-by: Haitao Huang Signed-off-by: Haitao Huang Cc: Sean Christopherson --- V5: - Fixed some comments in code (Jarkko) V4: - Changes needed for patch reordering - Revised commit messages (Jarkko) --- arch/x86/kernel/cpu/sgx/encl.c | 5 +++-- arch/x86/kernel/cpu/sgx/encl.h | 2 +- arch/x86/kernel/cpu/sgx/ioctl.c | 2 +- arch/x86/kernel/cpu/sgx/main.c | 20 ++++++++++---------- arch/x86/kernel/cpu/sgx/sgx.h | 7 ++++++- 5 files changed, 21 insertions(+), 15 deletions(-) diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c index f5afc8d65e22..ec3402d41b63 100644 --- a/arch/x86/kernel/cpu/sgx/encl.c +++ b/arch/x86/kernel/cpu/sgx/encl.c @@ -1236,6 +1236,7 @@ void sgx_zap_enclave_ptes(struct sgx_encl *encl, unsigned long addr) /** * sgx_alloc_va_page() - Allocate a Version Array (VA) page + * @encl: The new owner of the page. * @reclaim: Reclaim EPC pages directly if none available. Enclave * mutex should not be held if this is set. 
* @@ -1245,12 +1246,12 @@ void sgx_zap_enclave_ptes(struct sgx_encl *encl, unsigned long addr) * a VA page, * -errno otherwise */ -struct sgx_epc_page *sgx_alloc_va_page(bool reclaim) +struct sgx_epc_page *sgx_alloc_va_page(struct sgx_encl *encl, bool reclaim) { struct sgx_epc_page *epc_page; int ret; - epc_page = sgx_alloc_epc_page(NULL, reclaim); + epc_page = sgx_alloc_epc_page(encl, reclaim); if (IS_ERR(epc_page)) return ERR_CAST(epc_page); diff --git a/arch/x86/kernel/cpu/sgx/encl.h b/arch/x86/kernel/cpu/sgx/encl.h index f94ff14c9486..831d63f80f5a 100644 --- a/arch/x86/kernel/cpu/sgx/encl.h +++ b/arch/x86/kernel/cpu/sgx/encl.h @@ -116,7 +116,7 @@ struct sgx_encl_page *sgx_encl_page_alloc(struct sgx_encl *encl, unsigned long offset, u64 secinfo_flags); void sgx_zap_enclave_ptes(struct sgx_encl *encl, unsigned long addr); -struct sgx_epc_page *sgx_alloc_va_page(bool reclaim); +struct sgx_epc_page *sgx_alloc_va_page(struct sgx_encl *encl, bool reclaim); unsigned int sgx_alloc_va_slot(struct sgx_va_page *va_page); void sgx_free_va_slot(struct sgx_va_page *va_page, unsigned int offset); bool sgx_va_page_full(struct sgx_va_page *va_page); diff --git a/arch/x86/kernel/cpu/sgx/ioctl.c b/arch/x86/kernel/cpu/sgx/ioctl.c index 9a32bf5a1070..164256ea18d0 100644 --- a/arch/x86/kernel/cpu/sgx/ioctl.c +++ b/arch/x86/kernel/cpu/sgx/ioctl.c @@ -30,7 +30,7 @@ struct sgx_va_page *sgx_encl_grow(struct sgx_encl *encl, bool reclaim) if (!va_page) return ERR_PTR(-ENOMEM); - va_page->epc_page = sgx_alloc_va_page(reclaim); + va_page->epc_page = sgx_alloc_va_page(encl, reclaim); if (IS_ERR(va_page->epc_page)) { err = ERR_CAST(va_page->epc_page); kfree(va_page); diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c index fba06dc5abfe..ed813288af44 100644 --- a/arch/x86/kernel/cpu/sgx/main.c +++ b/arch/x86/kernel/cpu/sgx/main.c @@ -107,7 +107,7 @@ static unsigned long __sgx_sanitize_pages(struct list_head *dirty_page_list) static bool sgx_reclaimer_age(struct 
sgx_epc_page *epc_page) { - struct sgx_encl_page *page = epc_page->owner; + struct sgx_encl_page *page = epc_page->encl_page; struct sgx_encl *encl = page->encl; struct sgx_encl_mm *encl_mm; bool ret = true; @@ -139,7 +139,7 @@ static bool sgx_reclaimer_age(struct sgx_epc_page *epc_page) static void sgx_reclaimer_block(struct sgx_epc_page *epc_page) { - struct sgx_encl_page *page = epc_page->owner; + struct sgx_encl_page *page = epc_page->encl_page; unsigned long addr = page->desc & PAGE_MASK; struct sgx_encl *encl = page->encl; int ret; @@ -196,7 +196,7 @@ void sgx_ipi_cb(void *info) static void sgx_encl_ewb(struct sgx_epc_page *epc_page, struct sgx_backing *backing) { - struct sgx_encl_page *encl_page = epc_page->owner; + struct sgx_encl_page *encl_page = epc_page->encl_page; struct sgx_encl *encl = encl_page->encl; struct sgx_va_page *va_page; unsigned int va_offset; @@ -249,7 +249,7 @@ static void sgx_encl_ewb(struct sgx_epc_page *epc_page, static void sgx_reclaimer_write(struct sgx_epc_page *epc_page, struct sgx_backing *backing) { - struct sgx_encl_page *encl_page = epc_page->owner; + struct sgx_encl_page *encl_page = epc_page->encl_page; struct sgx_encl *encl = encl_page->encl; struct sgx_backing secs_backing; int ret; @@ -309,7 +309,7 @@ static void sgx_reclaim_pages(void) break; list_del_init(&epc_page->list); - encl_page = epc_page->owner; + encl_page = epc_page->encl_page; if (kref_get_unless_zero(&encl_page->encl->refcount) != 0) { sgx_epc_page_set_state(epc_page, SGX_EPC_PAGE_RECLAIM_IN_PROGRESS); @@ -329,7 +329,7 @@ static void sgx_reclaim_pages(void) i = 0; list_for_each_entry_safe(epc_page, tmp, &iso, list) { - encl_page = epc_page->owner; + encl_page = epc_page->encl_page; if (!sgx_reclaimer_age(epc_page)) goto skip; @@ -362,7 +362,7 @@ static void sgx_reclaim_pages(void) i = 0; list_for_each_entry_safe(epc_page, tmp, &iso, list) { - encl_page = epc_page->owner; + encl_page = epc_page->encl_page; sgx_reclaimer_write(epc_page, &backing[i++]); 
kref_put(&encl_page->encl->refcount, sgx_encl_release); @@ -562,7 +562,7 @@ struct sgx_epc_page *sgx_alloc_epc_page(void *owner, bool reclaim) for ( ; ; ) { page = __sgx_alloc_epc_page(); if (!IS_ERR(page)) { - page->owner = owner; + page->encl_page = owner; break; } @@ -607,7 +607,7 @@ void sgx_free_epc_page(struct sgx_epc_page *page) spin_lock(&node->lock); - page->owner = NULL; + page->encl_page = NULL; if (page->poison) list_add(&page->list, &node->sgx_poison_page_list); else @@ -642,7 +642,7 @@ static bool __init sgx_setup_epc_section(u64 phys_addr, u64 size, for (i = 0; i < nr_pages; i++) { section->pages[i].section = index; section->pages[i].flags = 0; - section->pages[i].owner = NULL; + section->pages[i].encl_page = NULL; section->pages[i].poison = 0; list_add_tail(&section->pages[i].list, &sgx_dirty_page_list); } diff --git a/arch/x86/kernel/cpu/sgx/sgx.h b/arch/x86/kernel/cpu/sgx/sgx.h index 764cec23f4e5..5110dd433b80 100644 --- a/arch/x86/kernel/cpu/sgx/sgx.h +++ b/arch/x86/kernel/cpu/sgx/sgx.h @@ -68,7 +68,12 @@ struct sgx_epc_page { unsigned int section; u16 flags; u16 poison; - struct sgx_encl_page *owner; + + /* Possible owner types */ + union { + struct sgx_encl_page *encl_page; + struct sgx_encl *encl; + }; struct list_head list; }; From patchwork Sat Sep 23 03:06:49 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Haitao Huang X-Patchwork-Id: 143919 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:cae8:0:b0:403:3b70:6f57 with SMTP id r8csp209559vqu; Sat, 23 Sep 2023 07:30:16 -0700 (PDT) X-Google-Smtp-Source: AGHT+IEzh5CfLlc0SYwQZ1hLARRXJeiQTSmiBZcNIaCsziOKle1MBL8pQPiYwbnFo2df4sltKsod X-Received: by 2002:a05:6358:9320:b0:13a:4120:ce2e with SMTP id x32-20020a056358932000b0013a4120ce2emr2952383rwa.20.1695479416108; Sat, 23 Sep 2023 07:30:16 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1695479416; cv=none; d=google.com; s=arc-20160816;
b=BY5mL1JeA5mpd+pF0ge0FEGtoqR2dE3g7KH8YcDWp1+Q6uKbizjwEHrRu+GjzJtcjb R3c1+1lb/Jd8waZ3kQN7ySKEYmbYJfFmQRY2GFHe2ydsa/grY5HW31gYGeCB9p3MJQ9D M75zOHhKXj+eX0ndrLIQ3rpHXAKrGzorp3BaHbNvwkGZOvNtg3eWIcwh/640a2K7+c6I 6TzRveccJx5B2dTA/aR/ycTZOQMBZDuWCpBA4ISeJkS+Lz1rScUwYFsQ86gq5lYxaTFH pA6UmobT4LODOthfOMqo9wsKrXQQIsryvoafdtZS0QjW+OE1odhI0bhVmZD0n5GimwTa XaJw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=izbdcrGLbqCNLEY8Fiam4brrWR+nP2g4UE9Ir99aJrY=; fh=j8PE345l5Ydlo3KwK7JeWnjqRgjiq4AteUoOZeOwa0I=; b=P3ftq5UZfuZUv2FD1mlQQhP0ZWxe6rwVhY62tUfXIGtiVdFTWMka1jVtx0jZ5ZgWek lw83viP2vcaK3jZ/oBNhgfWfjJUokiSVixXtZbXY6t+FjmTVdLot6gbzBvXI1fOva7Sa iJlb8lXBj62rwltz+YHIFeGQ2MsLZraq9VyzIkdKMGjjfUQLB/6jAIgCZqqZbxM50SGc iT0XwV3QzR1gugkdcAOd/aqzuA5LyBzZ1cdBPMauHY7A4dOLug/dYLlc+DkapiHrqkq9 WFU/G/snfaKrP6H+aHyBtQVVFT0BAQ8qQawv/jiyX7JUvJu/rCeMNRwf0rby0Su4wEms GFGw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=NQAKpaPC; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.32 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: from agentk.vger.email (agentk.vger.email. 
[23.128.96.32]) by mx.google.com with ESMTPS id w2-20020a637b02000000b00573fb2f7537si6114551pgc.586.2023.09.23.07.30.15 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sat, 23 Sep 2023 07:30:16 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.32 as permitted sender) client-ip=23.128.96.32; Authentication-Results: mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=NQAKpaPC; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.32 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: from out1.vger.email (depot.vger.email [IPv6:2620:137:e000::3:0]) by agentk.vger.email (Postfix) with ESMTP id 36534823911E; Fri, 22 Sep 2023 20:08:39 -0700 (PDT) X-Virus-Status: Clean X-Virus-Scanned: clamav-milter 0.103.10 at agentk.vger.email Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231164AbjIWDHt (ORCPT + 28 others); Fri, 22 Sep 2023 23:07:49 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:40130 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229897AbjIWDHM (ORCPT ); Fri, 22 Sep 2023 23:07:12 -0400 Received: from mgamail.intel.com (mgamail.intel.com [134.134.136.100]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 2BF361AC; Fri, 22 Sep 2023 20:07:06 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1695438426; x=1726974426; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=6rGwrEe4Ukw7FcCDvhuspXyNUODXLObGeH1s03Sjd5k=; b=NQAKpaPCKs6WV2O14ecofFZZ9esdSnu0asaZdOVT+XQageN1ztqicilL rzAY1Oc6Dj3M1jWLYOwxjOf003wyh9gJpnluU4FkJiYW5BPqxk9+TVxpI Uyukf/dregEBk1zKYgAO1z//YfiwxErw/Si3OBB/C9mBM64Ht/Yernpur HDErU2BNxrok2XAhulmiKK7Y8dL+xGUCwVSG9gEJOn2GMNzgdjvswrEU1 
avvk1vtjlIOzvXABdLE2yrt07Jhu8RPC0f8CwtJyO4cZ7swcDqY33gc2m 3ohVt24lWqYDZrkNv/JFTWdTwaoeU47on21VzB/EkYFlGnQpnRv1CQCLX A==; X-IronPort-AV: E=McAfee;i="6600,9927,10841"; a="447466795" X-IronPort-AV: E=Sophos;i="6.03,169,1694761200"; d="scan'208";a="447466795" Received: from fmsmga001.fm.intel.com ([10.253.24.23]) by orsmga105.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 22 Sep 2023 20:07:05 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10841"; a="891048558" X-IronPort-AV: E=Sophos;i="6.03,169,1694761200"; d="scan'208";a="891048558" Received: from b4969161e530.jf.intel.com ([10.165.56.46]) by fmsmga001.fm.intel.com with ESMTP; 22 Sep 2023 20:06:09 -0700 From: Haitao Huang To: jarkko@kernel.org, dave.hansen@linux.intel.com, tj@kernel.org, linux-kernel@vger.kernel.org, linux-sgx@vger.kernel.org, x86@kernel.org, cgroups@vger.kernel.org, tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, hpa@zytor.com, sohil.mehta@intel.com Cc: zhiquan1.li@intel.com, kristen@linux.intel.com, seanjc@google.com, zhanb@microsoft.com, anakrish@microsoft.com, mikko.ylinen@linux.intel.com, yangjie@microsoft.com Subject: [PATCH v5 10/18] x86/sgx: Add EPC page flags to identify owner types Date: Fri, 22 Sep 2023 20:06:49 -0700 Message-Id: <20230923030657.16148-11-haitao.huang@linux.intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20230923030657.16148-1-haitao.huang@linux.intel.com> References: <20230923030657.16148-1-haitao.huang@linux.intel.com> MIME-Version: 1.0 X-Spam-Status: No, score=2.8 required=5.0 tests=DKIMWL_WL_HIGH,DKIM_SIGNED, DKIM_VALID,HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI, RCVD_IN_SBL_CSS,SPF_HELO_NONE,SPF_PASS autolearn=no autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on agentk.vger.email Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-Greylist: Sender passed SPF test, not delayed by milter-greylist-4.6.4 (agentk.vger.email [0.0.0.0]); Fri, 22 Sep 2023 20:08:40 -0700 (PDT) 
X-Spam-Level: ** X-getmail-retrieved-from-mailbox: INBOX X-GMAIL-THRID: 1777839024018853974 X-GMAIL-MSGID: 1777839024018853974 From: Sean Christopherson Two types of owners of struct sgx_epc_page, 'sgx_encl' for VA pages and 'sgx_encl_page' can be stored in the previously introduced union field. OOM support for cgroups requires that the owner needs to be identified when selecting pages from the unreclaimable list. Address this by adding flags for the owner type. Signed-off-by: Sean Christopherson Co-developed-by: Kristen Carlson Accardi Signed-off-by: Kristen Carlson Accardi Co-developed-by: Haitao Huang Signed-off-by: Haitao Huang Cc: Sean Christopherson --- V4: - Updates for patch reordering. - Rename SGX_EPC_OWNER_ENCL_PAGE to SGX_EPC_OWNER_PAGE. (Jarkko) - Commit message changes. (Jarkko) --- arch/x86/kernel/cpu/sgx/encl.c | 9 +++++---- arch/x86/kernel/cpu/sgx/ioctl.c | 6 ++++-- arch/x86/kernel/cpu/sgx/sgx.h | 6 ++++++ 3 files changed, 15 insertions(+), 6 deletions(-) diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c index ec3402d41b63..da1657813fce 100644 --- a/arch/x86/kernel/cpu/sgx/encl.c +++ b/arch/x86/kernel/cpu/sgx/encl.c @@ -248,6 +248,7 @@ static struct sgx_epc_page *sgx_encl_load_secs(struct sgx_encl *encl) epc_page = sgx_encl_eldu(&encl->secs, NULL); if (!IS_ERR(epc_page)) sgx_record_epc_page(epc_page, + SGX_EPC_OWNER_PAGE | SGX_EPC_PAGE_UNRECLAIMABLE); } @@ -276,7 +277,7 @@ static struct sgx_encl_page *__sgx_encl_load_page(struct sgx_encl *encl, return ERR_CAST(epc_page); encl->secs_child_cnt++; - sgx_record_epc_page(epc_page, SGX_EPC_PAGE_RECLAIMABLE); + sgx_record_epc_page(epc_page, SGX_EPC_OWNER_PAGE | SGX_EPC_PAGE_RECLAIMABLE); return entry; } @@ -402,7 +403,7 @@ static vm_fault_t sgx_encl_eaug_page(struct vm_area_struct *vma, encl_page->type = SGX_PAGE_TYPE_REG; encl->secs_child_cnt++; - sgx_record_epc_page(epc_page, SGX_EPC_PAGE_RECLAIMABLE); + sgx_record_epc_page(epc_page, SGX_EPC_OWNER_PAGE | 
SGX_EPC_PAGE_RECLAIMABLE); phys_addr = sgx_get_epc_phys_addr(epc_page); /* @@ -1261,8 +1262,8 @@ struct sgx_epc_page *sgx_alloc_va_page(struct sgx_encl *encl, bool reclaim) sgx_encl_free_epc_page(epc_page); return ERR_PTR(-EFAULT); } - sgx_record_epc_page(epc_page, - SGX_EPC_PAGE_UNRECLAIMABLE); + sgx_record_epc_page(epc_page, SGX_EPC_OWNER_ENCL | + SGX_EPC_PAGE_UNRECLAIMABLE); return epc_page; } diff --git a/arch/x86/kernel/cpu/sgx/ioctl.c b/arch/x86/kernel/cpu/sgx/ioctl.c index 164256ea18d0..cd338e93acc1 100644 --- a/arch/x86/kernel/cpu/sgx/ioctl.c +++ b/arch/x86/kernel/cpu/sgx/ioctl.c @@ -114,6 +114,7 @@ static int sgx_encl_create(struct sgx_encl *encl, struct sgx_secs *secs) encl->attributes_mask = SGX_ATTR_UNPRIV_MASK; sgx_record_epc_page(encl->secs.epc_page, + SGX_EPC_OWNER_PAGE | SGX_EPC_PAGE_UNRECLAIMABLE); /* Set only after completion, as encl->lock has not been taken. */ @@ -325,7 +326,7 @@ static int sgx_encl_add_page(struct sgx_encl *encl, unsigned long src, goto err_out; } - sgx_record_epc_page(epc_page, SGX_EPC_PAGE_RECLAIMABLE); + sgx_record_epc_page(epc_page, SGX_EPC_OWNER_PAGE | SGX_EPC_PAGE_RECLAIMABLE); mutex_unlock(&encl->lock); mmap_read_unlock(current->mm); return ret; @@ -979,7 +980,8 @@ static long sgx_enclave_modify_types(struct sgx_encl *encl, mutex_lock(&encl->lock); - sgx_record_epc_page(entry->epc_page, SGX_EPC_PAGE_RECLAIMABLE); + sgx_record_epc_page(entry->epc_page, + SGX_EPC_OWNER_PAGE | SGX_EPC_PAGE_RECLAIMABLE); } /* Change EPC type */ diff --git a/arch/x86/kernel/cpu/sgx/sgx.h b/arch/x86/kernel/cpu/sgx/sgx.h index 5110dd433b80..51aba1cd1937 100644 --- a/arch/x86/kernel/cpu/sgx/sgx.h +++ b/arch/x86/kernel/cpu/sgx/sgx.h @@ -64,6 +64,12 @@ enum sgx_epc_page_state { #define SGX_EPC_PAGE_STATE_MASK GENMASK(2, 0) +/* flag for pages owned by a sgx_encl_page */ +#define SGX_EPC_OWNER_PAGE BIT(3) + +/* flag for pages owned by a sgx_encl struct */ +#define SGX_EPC_OWNER_ENCL BIT(4) + struct sgx_epc_page { unsigned int section; u16 flags; 
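Reviewer-style illustration, not part of the patch: a minimal userspace sketch of how the owner-type flag bits could be used to pick the valid member of the owner union and resolve the owning enclave. Only the flag names and bit positions are taken from the diff; the struct layouts are simplified stand-ins, and epc_page_to_encl() is a hypothetical helper name assumed here for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bit positions as in the patch; everything else below is simplified. */
#define SGX_EPC_OWNER_PAGE (1u << 3) /* owner is a struct sgx_encl_page */
#define SGX_EPC_OWNER_ENCL (1u << 4) /* owner is a struct sgx_encl (VA pages) */

/* Simplified stand-ins for the kernel structures (assumption). */
struct sgx_encl {
	int id;
};

struct sgx_encl_page {
	struct sgx_encl *encl;
};

struct sgx_epc_page {
	uint16_t flags;
	/* Possible owner types; the flags say which member is valid. */
	union {
		struct sgx_encl_page *encl_page;
		struct sgx_encl *encl;
	};
};

/*
 * Hypothetical helper: resolve the owning enclave from the owner-type
 * flag. Returns NULL when neither owner flag is set.
 */
static struct sgx_encl *epc_page_to_encl(const struct sgx_epc_page *page)
{
	if (page->flags & SGX_EPC_OWNER_PAGE)
		return page->encl_page->encl;
	if (page->flags & SGX_EPC_OWNER_ENCL)
		return page->encl;
	return NULL;
}
```

Without the flag, a reader of the union could not tell whether the stored pointer refers to an enclave page or to the enclave itself, which is presumably why the owner type is recorded when the page is added to an LRU list.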
From patchwork Sat Sep 23 03:06:50 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Haitao Huang X-Patchwork-Id: 143900 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:cae8:0:b0:403:3b70:6f57 with SMTP id r8csp174933vqu; Sat, 23 Sep 2023 06:22:03 -0700 (PDT) X-Google-Smtp-Source: AGHT+IFIzGBHEjvzw3BJNE2y5HnjKIltKHeIINQutRcueHhIYt58latiUsZi7EBKprJAuQN1z8Bo X-Received: by 2002:a17:903:1cb:b0:1b7:e86f:7631 with SMTP id e11-20020a17090301cb00b001b7e86f7631mr2268552plh.19.1695475323372; Sat, 23 Sep 2023 06:22:03 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1695475323; cv=none; d=google.com; s=arc-20160816; b=NEioZ/7hIvwVXGA62ZZdnRIuVpOsVjfbgA6z9l+oqkt/jw2mqYEC84kQK4pYn/SCHQ DXMhcq6nHjj9vsenglyB/9HYqTHoOOhazCvzet1qEJXDNaVIeNZ8GDU5bIsBOssjhvIe AFZ2LEuyMNmggf4wlvQxSbGepRU1Z0tlnUWxIVSzgHNuYEUe9Wo3B3GZhCDOdEV5JhPz +FbTd/NxnCHzcazuj2aoXoL0xAF07b8eqbPkBUgHZ18d3YAYGhW132OB1zeO4DRJeiuN wvPwUW2LiThBQjBNmRTTdoSrGqvrf0W7yvgVq+7fosjM9Lbmmzhg/xAP/mWOtMuOXWjY Fspw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=XAAahNJuaZvLnZHRwJOUt9YprSvVS4ZGoBZaOqYfw58=; fh=j8PE345l5Ydlo3KwK7JeWnjqRgjiq4AteUoOZeOwa0I=; b=X3PIpkErwrncWVhq4RG2cWL+eOtpFe7tEq89KHjrGxn9Nxa1tkpEl0JPnEq1DPoULn 8BfcHDM+rEXgZLTY3cGb2abgghUDAL/PDIxHJl8O1U6+ZZJyiL46I+Y0fzGIaRVlHz7R JcvwPVGvyRgxezBu52xpMqQolbwkcz9GVxshDLgul7BPbYkDDNc+KCAKBcpx718/ZB0e HPyCRNDh4CsSP9sRXUiMBDnFAzGE6Rw6JnkY7f9QwIdOHqK5jozEMT+v9fbX9+R+t/iK 3rkOW7dheoNjHtodeA48he2MYNMJcAgCO5DzDhmTIyB3QTry8KFROOMdSCsaTtyiJ3Vw nq8w== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=Dpj3LOxn; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.38 as permitted sender) 
smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: from fry.vger.email (fry.vger.email. [23.128.96.38]) by mx.google.com with ESMTPS id s3-20020a170902ea0300b001bdb34b67basi6499779plg.369.2023.09.23.06.22.03 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sat, 23 Sep 2023 06:22:03 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.38 as permitted sender) client-ip=23.128.96.38; Authentication-Results: mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=Dpj3LOxn; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.38 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: from out1.vger.email (depot.vger.email [IPv6:2620:137:e000::3:0]) by fry.vger.email (Postfix) with ESMTP id 2EE1B83ADC83; Fri, 22 Sep 2023 20:08:10 -0700 (PDT) X-Virus-Status: Clean X-Virus-Scanned: clamav-milter 0.103.10 at fry.vger.email Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231205AbjIWDH5 (ORCPT + 28 others); Fri, 22 Sep 2023 23:07:57 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:58350 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229999AbjIWDHV (ORCPT ); Fri, 22 Sep 2023 23:07:21 -0400 Received: from mgamail.intel.com (mgamail.intel.com [134.134.136.100]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 154551B1; Fri, 22 Sep 2023 20:07:07 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1695438427; x=1726974427; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=C63qJ730rbliOEJWIvRszebH9fI5wJTL885JlaOiRVw=; b=Dpj3LOxn5sbyqXNxDgx8ly+nkJEviBSKrGMtBUpvgyuRW1on8C9y1F6H 
BJ9Gn3ddYnvhqnOo2OcmQkAbH1AM4JMO1jR1V+xKOqfH4fUmQuXxI5IyS QxXNM7JHutMbfPL1r++5BR1xtCWtgjqLeRRv4RRrUdxdlnF1+GZKDfwU2 RZ52hluiazsvjy3RmwJfoV7mkKNxs7e8LSSI+RznArLUjkmf3bn5bS6Fg I4aEjLL5lCgy3VaGAWF8+o4/NsJaL3FbpMHRL7yDPO2G3qEfni/QtB1Ib V3nmaUyX7uk+Az8O0bU9EpsZvFg/5EPLWRhpbg3a9mfjEwVM0d8K3N67n w==; X-IronPort-AV: E=McAfee;i="6600,9927,10841"; a="447466805" X-IronPort-AV: E=Sophos;i="6.03,169,1694761200"; d="scan'208";a="447466805" Received: from fmsmga001.fm.intel.com ([10.253.24.23]) by orsmga105.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 22 Sep 2023 20:07:06 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10841"; a="891048565" X-IronPort-AV: E=Sophos;i="6.03,169,1694761200"; d="scan'208";a="891048565" Received: from b4969161e530.jf.intel.com ([10.165.56.46]) by fmsmga001.fm.intel.com with ESMTP; 22 Sep 2023 20:06:10 -0700 From: Haitao Huang To: jarkko@kernel.org, dave.hansen@linux.intel.com, tj@kernel.org, linux-kernel@vger.kernel.org, linux-sgx@vger.kernel.org, x86@kernel.org, cgroups@vger.kernel.org, tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, hpa@zytor.com, sohil.mehta@intel.com Cc: zhiquan1.li@intel.com, kristen@linux.intel.com, seanjc@google.com, zhanb@microsoft.com, anakrish@microsoft.com, mikko.ylinen@linux.intel.com, yangjie@microsoft.com Subject: [PATCH v5 11/18] x86/sgx: store unreclaimable pages in LRU lists Date: Fri, 22 Sep 2023 20:06:50 -0700 Message-Id: <20230923030657.16148-12-haitao.huang@linux.intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20230923030657.16148-1-haitao.huang@linux.intel.com> References: <20230923030657.16148-1-haitao.huang@linux.intel.com> MIME-Version: 1.0 X-Spam-Status: No, score=2.8 required=5.0 tests=DKIMWL_WL_HIGH,DKIM_SIGNED, DKIM_VALID,HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI, RCVD_IN_SBL_CSS,SPF_HELO_NONE,SPF_PASS autolearn=no autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on fry.vger.email Precedence: bulk List-ID: X-Mailing-List: 
linux-kernel@vger.kernel.org X-Greylist: Sender passed SPF test, not delayed by milter-greylist-4.6.4 (fry.vger.email [0.0.0.0]); Fri, 22 Sep 2023 20:08:10 -0700 (PDT) X-Spam-Level: ** X-getmail-retrieved-from-mailbox: INBOX X-GMAIL-THRID: 1777834732426329391 X-GMAIL-MSGID: 1777834732426329391 From: Sean Christopherson When an OOM event occurs, all pages associated with an enclave will need to be freed, including pages that are not currently tracked by the cgroup LRU lists. Add a new "unreclaimable" list to the sgx_epc_lru_lists struct and update the "sgx_record/drop_epc_pages()" functions for adding/removing VA and SECS pages to/from this "unreclaimable" list. Signed-off-by: Sean Christopherson Co-developed-by: Kristen Carlson Accardi Signed-off-by: Kristen Carlson Accardi Co-developed-by: Haitao Huang Signed-off-by: Haitao Huang Cc: Sean Christopherson --- V4: - Updates for patch reordering. - Revised commit messages. - Revised comments for the list. V3: - Removed tracking virtual EPC pages in unreclaimable list as host kernel does not reclaim them. The EPC cgroups implemented later only blocks allocating for a guest if the limit is reached by returning -ENOMEM from sgx_alloc_epc_page() called by virt_epc, and does nothing else. Therefore, no need to track those in LRU lists. 
--- arch/x86/kernel/cpu/sgx/encl.c | 2 ++ arch/x86/kernel/cpu/sgx/ioctl.c | 1 + arch/x86/kernel/cpu/sgx/main.c | 3 +++ arch/x86/kernel/cpu/sgx/sgx.h | 8 +++++++- 4 files changed, 13 insertions(+), 1 deletion(-) diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c index da1657813fce..a8617e6a4b4e 100644 --- a/arch/x86/kernel/cpu/sgx/encl.c +++ b/arch/x86/kernel/cpu/sgx/encl.c @@ -746,6 +746,7 @@ void sgx_encl_release(struct kref *ref) xa_destroy(&encl->page_array); if (!encl->secs_child_cnt && encl->secs.epc_page) { + sgx_drop_epc_page(encl->secs.epc_page); sgx_encl_free_epc_page(encl->secs.epc_page); encl->secs.epc_page = NULL; } @@ -754,6 +755,7 @@ void sgx_encl_release(struct kref *ref) va_page = list_first_entry(&encl->va_pages, struct sgx_va_page, list); list_del(&va_page->list); + sgx_drop_epc_page(va_page->epc_page); sgx_encl_free_epc_page(va_page->epc_page); kfree(va_page); } diff --git a/arch/x86/kernel/cpu/sgx/ioctl.c b/arch/x86/kernel/cpu/sgx/ioctl.c index cd338e93acc1..50ddd8988452 100644 --- a/arch/x86/kernel/cpu/sgx/ioctl.c +++ b/arch/x86/kernel/cpu/sgx/ioctl.c @@ -48,6 +48,7 @@ void sgx_encl_shrink(struct sgx_encl *encl, struct sgx_va_page *va_page) encl->page_cnt--; if (va_page) { + sgx_drop_epc_page(va_page->epc_page); sgx_encl_free_epc_page(va_page->epc_page); list_del(&va_page->list); kfree(va_page); diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c index ed813288af44..f3a3ed894616 100644 --- a/arch/x86/kernel/cpu/sgx/main.c +++ b/arch/x86/kernel/cpu/sgx/main.c @@ -268,6 +268,7 @@ static void sgx_reclaimer_write(struct sgx_epc_page *epc_page, goto out; sgx_encl_ewb(encl->secs.epc_page, &secs_backing); + sgx_drop_epc_page(encl->secs.epc_page); sgx_encl_free_epc_page(encl->secs.epc_page); encl->secs.epc_page = NULL; @@ -510,6 +511,8 @@ void sgx_record_epc_page(struct sgx_epc_page *page, unsigned long flags) page->flags |= flags; if (sgx_epc_page_reclaimable(flags)) list_add_tail(&page->list, 
&sgx_global_lru.reclaimable); + else + list_add_tail(&page->list, &sgx_global_lru.unreclaimable); spin_unlock(&sgx_global_lru.lock); } diff --git a/arch/x86/kernel/cpu/sgx/sgx.h b/arch/x86/kernel/cpu/sgx/sgx.h index 51aba1cd1937..337747bef7c2 100644 --- a/arch/x86/kernel/cpu/sgx/sgx.h +++ b/arch/x86/kernel/cpu/sgx/sgx.h @@ -152,17 +152,23 @@ static inline void *sgx_get_epc_virt_addr(struct sgx_epc_page *page) } /* - * Tracks EPC pages reclaimable by the reclaimer (ksgxd). + * Contains EPC pages tracked by the reclaimer (ksgxd). */ struct sgx_epc_lru_lists { spinlock_t lock; struct list_head reclaimable; + /* + * Tracks SECS, VA pages,etc., pages only freeable after all its + * dependent reclaimables are freed. + */ + struct list_head unreclaimable; }; static inline void sgx_lru_init(struct sgx_epc_lru_lists *lrus) { spin_lock_init(&lrus->lock); INIT_LIST_HEAD(&lrus->reclaimable); + INIT_LIST_HEAD(&lrus->unreclaimable); } struct sgx_epc_page *__sgx_alloc_epc_page(void); From patchwork Sat Sep 23 03:06:51 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Haitao Huang X-Patchwork-Id: 143868 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:cae8:0:b0:403:3b70:6f57 with SMTP id r8csp97809vqu; Sat, 23 Sep 2023 03:34:58 -0700 (PDT) X-Google-Smtp-Source: AGHT+IFG3asnC0BMDOW90Gvi4cPm0m7/VZRkJwiljIGzNOw60UryL0S5dtuWUiu/h7k210P4W6RO X-Received: by 2002:a17:902:ea0f:b0:1c0:9b7c:f82a with SMTP id s15-20020a170902ea0f00b001c09b7cf82amr2163702plg.53.1695465298166; Sat, 23 Sep 2023 03:34:58 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1695465298; cv=none; d=google.com; s=arc-20160816; b=J9Raqhkubp4whBE2ZDrA3G0YTQF0xAGPx87Elj1LugRa4xgNd+WoNrZErRFmphWlAu Lc4HNpyRIb6A3r+e6Nw5LQOfE1NC4PdqQLOpi9Nm56o8aTyMZcejyVRJGJGOKRSHh/D9 z4VrqUdn3QPGriHHxa2Ty5mOpdexyALh1zezsqn/acL9QmUD+nOt09hvPfU1bNGfLpON XZFC0Xz0/qJ28RBjppr8BiPX4RKAjUT5IHH3TKkDk6DgGeZQCThQn752En2an7NZGf6s 
f9i+uVC8TyexusAS0vnqKcWnze3fBuuTSiLIT3L6JRUjEy7UA9hXixOPslpWFjqLTBUz ARHA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=+d4Zlwng/sU2s6pT91qFAWROKIAQwXOAwcoMCKJVljs=; fh=j8PE345l5Ydlo3KwK7JeWnjqRgjiq4AteUoOZeOwa0I=; b=ca07YP5GCkRb/4ge2q3AhWBUOv7iHUYGbmsSE5303a6knZ2eE3iSZf7xfu08BT7CC+ YZeowHcJAGQuA4Ttik+8rTzTaEQwlFdZr0OrzbCNf25x1r2AGHxmZm1EWR/TJx+V4l/B RP8Y5WmJ8GQyEABZGXGeti+zMkmC/fkFf5yzbY5OYKdE5dw9EoijnA9gpQQsJm5wg5fY nK2CdtAvF0xLtdkUvUMs/b9WC1PAzhBK6cinKJzhY3YjG1XDq1a+Vh31VJWpfyzyaEVO Amljx4uVxtBcFfds1Yx49zNF2GvXQZWnTD1V3+O6st0gYZ9fVdJ3N9sW9PhcVGZV358z 2X/g== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=UJ2WzrC9; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.32 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: from agentk.vger.email (agentk.vger.email. 
[23.128.96.32]) by mx.google.com with ESMTPS id u18-20020a170903309200b001b9e9b21249si5236728plc.649.2023.09.23.03.34.57 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sat, 23 Sep 2023 03:34:58 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.32 as permitted sender) client-ip=23.128.96.32; Authentication-Results: mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=UJ2WzrC9; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.32 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: from out1.vger.email (depot.vger.email [IPv6:2620:137:e000::3:0]) by agentk.vger.email (Postfix) with ESMTP id A273382DB2DF; Fri, 22 Sep 2023 20:08:34 -0700 (PDT) X-Virus-Status: Clean X-Virus-Scanned: clamav-milter 0.103.10 at agentk.vger.email Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230369AbjIWDID (ORCPT + 28 others); Fri, 22 Sep 2023 23:08:03 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:58370 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230032AbjIWDHW (ORCPT ); Fri, 22 Sep 2023 23:07:22 -0400 Received: from mgamail.intel.com (mgamail.intel.com [134.134.136.100]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 3FD4B1B7; Fri, 22 Sep 2023 20:07:08 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1695438428; x=1726974428; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=UA/9J/zQf4eOo4qQvjhwmYZhqKa4R4W6o4TLEyujPbk=; b=UJ2WzrC9LgSZ2iMJTLGaI1esoNazhmkK/wxndmfkM84dSxgD6JV3hXLE Mkf41/Ks7+5vKcL6+6aXDcaYWpNCvoXrtUEVxA3bni+CJKZouR15+jpJk FpH97v2kHMx5aIFcq0ovdAs+CNU534dSf+oA9DnkSU5nJwAzc7xXJy30i cZuEf6Tafp0WoNI1Kr0OFTEYFXGXTISHX70HGaN+tqnEdC/UnMEkRvbs4 
2qdSYahzC3JWwHha6Lxh9zineiGbL/q2KiXzU2DhviacDDnHibGBr2j/d +b6QYG/H979dNgR9XDPnKCi0u70nG0z+z6jILh7eO6ylcFwrW5mC94IwI w==; X-IronPort-AV: E=McAfee;i="6600,9927,10841"; a="447466815" X-IronPort-AV: E=Sophos;i="6.03,169,1694761200"; d="scan'208";a="447466815" Received: from fmsmga001.fm.intel.com ([10.253.24.23]) by orsmga105.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 22 Sep 2023 20:07:07 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10841"; a="891048570" X-IronPort-AV: E=Sophos;i="6.03,169,1694761200"; d="scan'208";a="891048570" Received: from b4969161e530.jf.intel.com ([10.165.56.46]) by fmsmga001.fm.intel.com with ESMTP; 22 Sep 2023 20:06:10 -0700 From: Haitao Huang To: jarkko@kernel.org, dave.hansen@linux.intel.com, tj@kernel.org, linux-kernel@vger.kernel.org, linux-sgx@vger.kernel.org, x86@kernel.org, cgroups@vger.kernel.org, tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, hpa@zytor.com, sohil.mehta@intel.com Cc: zhiquan1.li@intel.com, kristen@linux.intel.com, seanjc@google.com, zhanb@microsoft.com, anakrish@microsoft.com, mikko.ylinen@linux.intel.com, yangjie@microsoft.com Subject: [PATCH v5 12/18] x86/sgx: Add EPC OOM path to forcefully reclaim EPC Date: Fri, 22 Sep 2023 20:06:51 -0700 Message-Id: <20230923030657.16148-13-haitao.huang@linux.intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20230923030657.16148-1-haitao.huang@linux.intel.com> References: <20230923030657.16148-1-haitao.huang@linux.intel.com> MIME-Version: 1.0 X-Spam-Status: No, score=2.8 required=5.0 tests=DKIMWL_WL_HIGH,DKIM_SIGNED, DKIM_VALID,HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI, RCVD_IN_SBL_CSS,SPF_HELO_NONE,SPF_PASS autolearn=no autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on agentk.vger.email Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-Greylist: Sender passed SPF test, not delayed by milter-greylist-4.6.4 (agentk.vger.email [0.0.0.0]); Fri, 22 Sep 2023 20:08:34 -0700 (PDT) 
X-Spam-Level: ** X-getmail-retrieved-from-mailbox: INBOX X-GMAIL-THRID: 1777824220441401020 X-GMAIL-MSGID: 1777824220441401020 From: Sean Christopherson Introduce the OOM path for killing an enclave with a reclaimer that is no longer able to reclaim enough EPC pages. Find a victim enclave, which will be an enclave with only "unreclaimable" EPC pages left in the cgroup LRU lists. Once a victim is identified, mark the enclave as OOM and zap the enclave's entire page range, and drain all mm references in encl->mm_list. Block allocating any EPC pages in #PF handler, or reloading any pages in all paths, or creating any new mappings. The OOM killing path may race with the reclaimers: in some cases, the victim enclave is in the process of reclaiming the last EPC pages when OOM happens, that is, all pages other than SECS and VA pages are in RECLAIMING_IN_PROGRESS state. The reclaiming process requires access to the enclave backing, VA pages as well as SECS. So the OOM killer does not directly release those enclave resources, instead, it lets all reclaiming in progress to finish, and relies (as currently done) on kref_put on encl->refcount to trigger sgx_encl_release() to do the final cleanup. Signed-off-by: Sean Christopherson Co-developed-by: Kristen Carlson Accardi Signed-off-by: Kristen Carlson Accardi Co-developed-by: Haitao Huang Signed-off-by: Haitao Huang Cc: Sean Christopherson --- V5: - Rename SGX_ENCL_OOM to SGX_ENCL_NO_MEMORY V4: - Updates for patch reordering and typo fixes. V3: - Rebased to use the new VMA_ITERATOR to zap VMAs. - Fixed the racing cases by blocking new page allocation/mapping and reloading when enclave is marked for OOM. And do not release any enclave resources other than draining mm_list entries, and let pages in RECLAIMING_IN_PROGRESS to be reaped by reclaimers. - Due to above changes, also removed the no-longer needed encl->lock in the OOM path which was causing deadlocks reported by the lock prover. 
---
 arch/x86/kernel/cpu/sgx/driver.c |  27 +-----
 arch/x86/kernel/cpu/sgx/encl.c   |  48 ++++++++++-
 arch/x86/kernel/cpu/sgx/encl.h   |   2 +
 arch/x86/kernel/cpu/sgx/ioctl.c  |   9 ++
 arch/x86/kernel/cpu/sgx/main.c   | 140 +++++++++++++++++++++++++++++++
 arch/x86/kernel/cpu/sgx/sgx.h    |   1 +
 6 files changed, 200 insertions(+), 27 deletions(-)

diff --git a/arch/x86/kernel/cpu/sgx/driver.c b/arch/x86/kernel/cpu/sgx/driver.c
index 262f5fb18d74..ff42d649c7b6 100644
--- a/arch/x86/kernel/cpu/sgx/driver.c
+++ b/arch/x86/kernel/cpu/sgx/driver.c
@@ -44,7 +44,6 @@ static int sgx_open(struct inode *inode, struct file *file)
 static int sgx_release(struct inode *inode, struct file *file)
 {
 	struct sgx_encl *encl = file->private_data;
-	struct sgx_encl_mm *encl_mm;

 	/*
 	 * Drain the remaining mm_list entries. At this point the list contains
@@ -52,31 +51,7 @@ static int sgx_release(struct inode *inode, struct file *file)
 	 * not exited yet. The processes, which have exited, are gone from the
 	 * list by sgx_mmu_notifier_release().
 	 */
-	for ( ; ; ) {
-		spin_lock(&encl->mm_lock);
-
-		if (list_empty(&encl->mm_list)) {
-			encl_mm = NULL;
-		} else {
-			encl_mm = list_first_entry(&encl->mm_list,
-						   struct sgx_encl_mm, list);
-			list_del_rcu(&encl_mm->list);
-		}
-
-		spin_unlock(&encl->mm_lock);
-
-		/* The enclave is no longer mapped by any mm. */
-		if (!encl_mm)
-			break;
-
-		synchronize_srcu(&encl->srcu);
-		mmu_notifier_unregister(&encl_mm->mmu_notifier, encl_mm->mm);
-		kfree(encl_mm);
-
-		/* 'encl_mm' is gone, put encl_mm->encl reference: */
-		kref_put(&encl->refcount, sgx_encl_release);
-	}
-
+	sgx_encl_mm_drain(encl);
 	kref_put(&encl->refcount, sgx_encl_release);
 	return 0;
 }
diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c
index a8617e6a4b4e..3c91a705e720 100644
--- a/arch/x86/kernel/cpu/sgx/encl.c
+++ b/arch/x86/kernel/cpu/sgx/encl.c
@@ -451,6 +451,9 @@ static vm_fault_t sgx_vma_fault(struct vm_fault *vmf)
 	if (unlikely(!encl))
 		return VM_FAULT_SIGBUS;

+	if (test_bit(SGX_ENCL_NO_MEMORY, &encl->flags))
+		return VM_FAULT_SIGBUS;
+
 	/*
 	 * The page_array keeps track of all enclave pages, whether they
 	 * are swapped out or not. If there is no entry for this page and
@@ -649,7 +652,8 @@ static int sgx_vma_access(struct vm_area_struct *vma, unsigned long addr,
 	if (!encl)
 		return -EFAULT;

-	if (!test_bit(SGX_ENCL_DEBUG, &encl->flags))
+	if (!test_bit(SGX_ENCL_DEBUG, &encl->flags) ||
+	    test_bit(SGX_ENCL_NO_MEMORY, &encl->flags))
 		return -EFAULT;

 	for (i = 0; i < len; i += cnt) {
@@ -774,6 +778,45 @@ void sgx_encl_release(struct kref *ref)
 	kfree(encl);
 }

+/**
+ * sgx_encl_mm_drain - drain all mm_list entries
+ * @encl: address of the sgx_encl to drain
+ *
+ * Used during oom kill to empty the mm_list entries after they have been
+ * zapped. Or used by sgx_release to drain the remaining mm_list entries when
+ * the enclave fd is closing. After this call, sgx_encl_release will be called
+ * with kref_put.
+ */
+void sgx_encl_mm_drain(struct sgx_encl *encl)
+{
+	struct sgx_encl_mm *encl_mm;
+
+	for ( ; ; ) {
+		spin_lock(&encl->mm_lock);
+
+		if (list_empty(&encl->mm_list)) {
+			encl_mm = NULL;
+		} else {
+			encl_mm = list_first_entry(&encl->mm_list,
+						   struct sgx_encl_mm, list);
+			list_del_rcu(&encl_mm->list);
+		}
+
+		spin_unlock(&encl->mm_lock);
+
+		/* The enclave is no longer mapped by any mm. */
+		if (!encl_mm)
+			break;
+
+		synchronize_srcu(&encl->srcu);
+		mmu_notifier_unregister(&encl_mm->mmu_notifier, encl_mm->mm);
+		kfree(encl_mm);
+
+		/* 'encl_mm' is gone, put encl_mm->encl reference: */
+		kref_put(&encl->refcount, sgx_encl_release);
+	}
+}
+
 /*
  * 'mm' is exiting and no longer needs mmu notifications.
  */
@@ -845,6 +888,9 @@ int sgx_encl_mm_add(struct sgx_encl *encl, struct mm_struct *mm)
 	struct sgx_encl_mm *encl_mm;
 	int ret;

+	if (test_bit(SGX_ENCL_NO_MEMORY, &encl->flags))
+		return -ENOMEM;
+
 	/*
 	 * Even though a single enclave may be mapped into an mm more than once,
 	 * each 'mm' only appears once on encl->mm_list. This is guaranteed by
diff --git a/arch/x86/kernel/cpu/sgx/encl.h b/arch/x86/kernel/cpu/sgx/encl.h
index 831d63f80f5a..cdb57ecb05c8 100644
--- a/arch/x86/kernel/cpu/sgx/encl.h
+++ b/arch/x86/kernel/cpu/sgx/encl.h
@@ -39,6 +39,7 @@ enum sgx_encl_flags {
 	SGX_ENCL_DEBUG		= BIT(1),
 	SGX_ENCL_CREATED	= BIT(2),
 	SGX_ENCL_INITIALIZED	= BIT(3),
+	SGX_ENCL_NO_MEMORY	= BIT(4),
 };

 struct sgx_encl_mm {
@@ -125,5 +126,6 @@ struct sgx_encl_page *sgx_encl_load_page(struct sgx_encl *encl,
 					 unsigned long addr);
 struct sgx_va_page *sgx_encl_grow(struct sgx_encl *encl, bool reclaim);
 void sgx_encl_shrink(struct sgx_encl *encl, struct sgx_va_page *va_page);
+void sgx_encl_mm_drain(struct sgx_encl *encl);

 #endif /* _X86_ENCL_H */
diff --git a/arch/x86/kernel/cpu/sgx/ioctl.c b/arch/x86/kernel/cpu/sgx/ioctl.c
index 50ddd8988452..e1209e2cf6a3 100644
--- a/arch/x86/kernel/cpu/sgx/ioctl.c
+++ b/arch/x86/kernel/cpu/sgx/ioctl.c
@@ -420,6 +420,9 @@ static long sgx_ioc_enclave_add_pages(struct sgx_encl *encl, void __user *arg)
 	    test_bit(SGX_ENCL_INITIALIZED, &encl->flags))
 		return -EINVAL;

+	if (test_bit(SGX_ENCL_NO_MEMORY, &encl->flags))
+		return -ENOMEM;
+
 	if (copy_from_user(&add_arg, arg, sizeof(add_arg)))
 		return -EFAULT;

@@ -605,6 +608,9 @@ static long sgx_ioc_enclave_init(struct sgx_encl *encl, void __user *arg)
 	    test_bit(SGX_ENCL_INITIALIZED, &encl->flags))
 		return -EINVAL;

+	if (test_bit(SGX_ENCL_NO_MEMORY, &encl->flags))
+		return -ENOMEM;
+
 	if (copy_from_user(&init_arg, arg, sizeof(init_arg)))
 		return -EFAULT;

@@ -681,6 +687,9 @@ static int sgx_ioc_sgx2_ready(struct sgx_encl *encl)
 	if (!test_bit(SGX_ENCL_INITIALIZED, &encl->flags))
 		return -EINVAL;

+	if (test_bit(SGX_ENCL_NO_MEMORY, &encl->flags))
+		return -ENOMEM;
+
 	return 0;
 }

diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
index f3a3ed894616..3b875ab4dcd0 100644
--- a/arch/x86/kernel/cpu/sgx/main.c
+++ b/arch/x86/kernel/cpu/sgx/main.c
@@ -621,6 +621,146 @@ void sgx_free_epc_page(struct sgx_epc_page *page)
 	atomic_long_inc(&sgx_nr_free_pages);
 }

+static bool sgx_oom_get_ref(struct sgx_epc_page *epc_page)
+{
+	struct sgx_encl *encl;
+
+	if (epc_page->flags & SGX_EPC_OWNER_PAGE)
+		encl = epc_page->encl_page->encl;
+	else if (epc_page->flags & SGX_EPC_OWNER_ENCL)
+		encl = epc_page->encl;
+	else
+		return false;
+
+	return kref_get_unless_zero(&encl->refcount);
+}
+
+static struct sgx_epc_page *sgx_oom_get_victim(struct sgx_epc_lru_lists *lru)
+{
+	struct sgx_epc_page *epc_page, *tmp;
+
+	if (list_empty(&lru->unreclaimable))
+		return NULL;
+
+	list_for_each_entry_safe(epc_page, tmp, &lru->unreclaimable, list) {
+		list_del_init(&epc_page->list);
+
+		if (sgx_oom_get_ref(epc_page))
+			return epc_page;
+	}
+	return NULL;
+}
+
+static void sgx_epc_oom_zap(void *owner, struct mm_struct *mm, unsigned long start,
+			    unsigned long end, const struct vm_operations_struct *ops)
+{
+	VMA_ITERATOR(vmi, mm, start);
+	struct vm_area_struct *vma;
+
+	/**
+	 * Use end because start can be zero and not mapped into
+	 * enclave even if encl->base = 0.
+	 */
+	for_each_vma_range(vmi, vma, end) {
+		if (vma->vm_ops == ops && vma->vm_private_data == owner &&
+		    vma->vm_start < end) {
+			zap_vma_pages(vma);
+		}
+	}
+}
+
+static bool sgx_oom_encl(struct sgx_encl *encl)
+{
+	unsigned long mm_list_version;
+	struct sgx_encl_mm *encl_mm;
+	bool ret = false;
+	int idx;
+
+	if (!test_bit(SGX_ENCL_CREATED, &encl->flags))
+		goto out_put;
+
+	/* OOM was already done on this enclave, do not redo it.
+	 * This may happen when the SECS page is still UNRECLAIMABLE because
+	 * another page is in RECLAIM_IN_PROGRESS. Still return true so OOM
+	 * killer can wait until the reclaimer is done with the hold-up page and
+	 * SECS before it moves on to find another victim.
+	 */
+	if (test_bit(SGX_ENCL_NO_MEMORY, &encl->flags))
+		goto out;
+
+	set_bit(SGX_ENCL_NO_MEMORY, &encl->flags);
+
+	do {
+		mm_list_version = encl->mm_list_version;
+
+		/* Pairs with smp_wmb() in sgx_encl_mm_add(). */
+		smp_rmb();
+
+		idx = srcu_read_lock(&encl->srcu);
+
+		list_for_each_entry_rcu(encl_mm, &encl->mm_list, list) {
+			if (!mmget_not_zero(encl_mm->mm))
+				continue;
+
+			mmap_read_lock(encl_mm->mm);
+
+			sgx_epc_oom_zap(encl, encl_mm->mm, encl->base,
+					encl->base + encl->size, &sgx_vm_ops);
+
+			mmap_read_unlock(encl_mm->mm);
+
+			mmput_async(encl_mm->mm);
+		}
+
+		srcu_read_unlock(&encl->srcu, idx);
+	} while (WARN_ON_ONCE(encl->mm_list_version != mm_list_version));
+
+	sgx_encl_mm_drain(encl);
+out:
+	ret = true;
+
+out_put:
+	/*
+	 * This puts the refcount we took when we identified this enclave as
+	 * an OOM victim.
+	 */
+	kref_put(&encl->refcount, sgx_encl_release);
+	return ret;
+}
+
+static inline bool sgx_oom_encl_page(struct sgx_encl_page *encl_page)
+{
+	return sgx_oom_encl(encl_page->encl);
+}
+
+/**
+ * sgx_epc_oom() - invoke EPC out-of-memory handling on target LRU
+ * @lru: LRU that is low
+ *
+ * Return: %true if a victim was found and kicked.
+ */
+bool sgx_epc_oom(struct sgx_epc_lru_lists *lru)
+{
+	struct sgx_epc_page *victim;
+
+	spin_lock(&lru->lock);
+	victim = sgx_oom_get_victim(lru);
+	spin_unlock(&lru->lock);
+
+	if (!victim)
+		return false;
+
+	if (victim->flags & SGX_EPC_OWNER_PAGE)
+		return sgx_oom_encl_page(victim->encl_page);
+
+	if (victim->flags & SGX_EPC_OWNER_ENCL)
+		return sgx_oom_encl(victim->encl);
+
+	/* Will never happen unless we add more owner types in future */
+	WARN_ON_ONCE(1);
+	return false;
+}
+
 static bool __init sgx_setup_epc_section(u64 phys_addr, u64 size,
 					 unsigned long index,
 					 struct sgx_epc_section *section)
diff --git a/arch/x86/kernel/cpu/sgx/sgx.h b/arch/x86/kernel/cpu/sgx/sgx.h
index 337747bef7c2..6c0bfdc209c0 100644
--- a/arch/x86/kernel/cpu/sgx/sgx.h
+++ b/arch/x86/kernel/cpu/sgx/sgx.h
@@ -178,6 +178,7 @@ void sgx_reclaim_direct(void);
 void sgx_record_epc_page(struct sgx_epc_page *page, unsigned long flags);
 int sgx_drop_epc_page(struct sgx_epc_page *page);
 struct sgx_epc_page *sgx_alloc_epc_page(void *owner, bool reclaim);
+bool sgx_epc_oom(struct sgx_epc_lru_lists *lrus);

 void sgx_ipi_cb(void *info);

From patchwork Sat Sep 23 03:06:52 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Haitao Huang
X-Patchwork-Id: 143827
From: Haitao Huang
To: jarkko@kernel.org, dave.hansen@linux.intel.com, tj@kernel.org, linux-kernel@vger.kernel.org, linux-sgx@vger.kernel.org, x86@kernel.org, cgroups@vger.kernel.org, tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, hpa@zytor.com, sohil.mehta@intel.com
Cc: zhiquan1.li@intel.com, kristen@linux.intel.com, seanjc@google.com, zhanb@microsoft.com, anakrish@microsoft.com, mikko.ylinen@linux.intel.com, yangjie@microsoft.com
Subject: [PATCH v5 13/18] x86/sgx: Expose sgx_reclaim_pages() for use by EPC cgroup
Date: Fri, 22 Sep 2023 20:06:52 -0700
Message-Id: <20230923030657.16148-14-haitao.huang@linux.intel.com>
In-Reply-To: <20230923030657.16148-1-haitao.huang@linux.intel.com>
References: <20230923030657.16148-1-haitao.huang@linux.intel.com>
X-Mailing-List: linux-kernel@vger.kernel.org

From: Sean Christopherson

Adjust and expose the top-level reclaim function as
sgx_reclaim_epc_pages() for use by the upcoming EPC cgroup, which will
initiate reclaim to enforce the max limit.

Make the following adjustments to the function signature:

1) Take a parameter that specifies the number of pages to scan for
   reclaiming. Define a max value of 32, but scan 16 in the case of the
   global reclaimer (ksgxd). The EPC cgroup will use it to specify a
   desired number of pages to be reclaimed, up to the max value of 32.

2) Take a flag to force reclaiming a page regardless of its age. The EPC
   cgroup will use the flag to enforce its limits by draining the
   reclaimable lists before resorting to other measures, e.g., forcefully
   killing enclaves.

3) Return the number of reclaimed pages. The EPC cgroup will use the
   result to track reclaiming progress and escalate to a more forceful
   reclaiming mode, e.g., calling this function with the flag to ignore
   the age of pages.

Signed-off-by: Sean Christopherson
Co-developed-by: Kristen Carlson Accardi
Signed-off-by: Kristen Carlson Accardi
Co-developed-by: Haitao Huang
Signed-off-by: Haitao Huang
Cc: Sean Christopherson
---
V4:
- Combined the 3 patches that made the individual changes to the function
  signature.
- Removed 'high' limit in commit message.
---
 arch/x86/kernel/cpu/sgx/main.c | 31 +++++++++++++++++++++----------
 arch/x86/kernel/cpu/sgx/sgx.h  |  1 +
 2 files changed, 22 insertions(+), 10 deletions(-)

diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
index 3b875ab4dcd0..4e1a3e038db5 100644
--- a/arch/x86/kernel/cpu/sgx/main.c
+++ b/arch/x86/kernel/cpu/sgx/main.c
@@ -18,6 +18,11 @@
 #include "encl.h"
 #include "encls.h"

+/*
+ * Maximum number of pages to scan for reclaiming.
+ */
+#define SGX_NR_TO_SCAN_MAX 32
+
 struct sgx_epc_section sgx_epc_sections[SGX_MAX_EPC_SECTIONS];
 static int sgx_nr_epc_sections;
 static struct task_struct *ksgxd_tsk;
@@ -279,7 +284,11 @@ static void sgx_reclaimer_write(struct sgx_epc_page *epc_page,
 	mutex_unlock(&encl->lock);
 }

-/*
+/**
+ * sgx_reclaim_epc_pages() - Reclaim EPC pages from the consumers
+ * @nr_to_scan: Number of EPC pages to scan for reclaim
+ * @ignore_age: Reclaim a page even if it is young
+ *
  * Take a fixed number of pages from the head of the active page pool and
  * reclaim them to the enclave's private shmem files. Skip the pages, which have
  * been accessed since the last scan. Move those pages to the tail of active
@@ -292,15 +301,14 @@ static void sgx_reclaimer_write(struct sgx_epc_page *epc_page,
  * problematic as it would increase the lock contention too much, which would
  * halt forward progress.
  */
-static void sgx_reclaim_pages(void)
+size_t sgx_reclaim_epc_pages(size_t nr_to_scan, bool ignore_age)
 {
-	struct sgx_backing backing[SGX_NR_TO_SCAN];
+	struct sgx_backing backing[SGX_NR_TO_SCAN_MAX];
 	struct sgx_epc_page *epc_page, *tmp;
 	struct sgx_encl_page *encl_page;
 	pgoff_t page_index;
 	LIST_HEAD(iso);
-	int ret;
-	int i;
+	size_t ret, i;

 	spin_lock(&sgx_global_lru.lock);
 	for (i = 0; i < SGX_NR_TO_SCAN; i++) {
@@ -326,13 +334,14 @@ static void sgx_reclaim_pages(void)
 	spin_unlock(&sgx_global_lru.lock);

 	if (list_empty(&iso))
-		return;
+		return 0;

 	i = 0;
 	list_for_each_entry_safe(epc_page, tmp, &iso, list) {
 		encl_page = epc_page->encl_page;

-		if (!sgx_reclaimer_age(epc_page))
+		if (i == SGX_NR_TO_SCAN_MAX ||
+		    (!ignore_age && !sgx_reclaimer_age(epc_page)))
 			goto skip;

 		page_index = PFN_DOWN(encl_page->desc - encl_page->encl->base);
@@ -371,6 +380,8 @@ static void sgx_reclaim_pages(void)

 		sgx_free_epc_page(epc_page);
 	}
+
+	return i;
 }

 static bool sgx_should_reclaim(unsigned long watermark)
@@ -387,7 +398,7 @@ static bool sgx_should_reclaim(unsigned long watermark)
 void sgx_reclaim_direct(void)
 {
 	if (sgx_should_reclaim(SGX_NR_LOW_PAGES))
-		sgx_reclaim_pages();
+		sgx_reclaim_epc_pages(SGX_NR_TO_SCAN, false);
 }

 static int ksgxd(void *p)
@@ -410,7 +421,7 @@ static int ksgxd(void *p)
 				     sgx_should_reclaim(SGX_NR_HIGH_PAGES));

 		if (sgx_should_reclaim(SGX_NR_HIGH_PAGES))
-			sgx_reclaim_pages();
+			sgx_reclaim_epc_pages(SGX_NR_TO_SCAN, false);

 		cond_resched();
 	}
@@ -582,7 +593,7 @@ struct sgx_epc_page *sgx_alloc_epc_page(void *owner, bool reclaim)
 			break;
 		}

-		sgx_reclaim_pages();
+		sgx_reclaim_epc_pages(SGX_NR_TO_SCAN, false);
 		cond_resched();
 	}

diff --git a/arch/x86/kernel/cpu/sgx/sgx.h b/arch/x86/kernel/cpu/sgx/sgx.h
index 6c0bfdc209c0..7e7f1f36d31e 100644
--- a/arch/x86/kernel/cpu/sgx/sgx.h
+++ b/arch/x86/kernel/cpu/sgx/sgx.h
@@ -179,6 +179,7 @@ void sgx_record_epc_page(struct sgx_epc_page *page, unsigned long flags);
 int sgx_drop_epc_page(struct sgx_epc_page *page);
 struct sgx_epc_page *sgx_alloc_epc_page(void *owner, bool reclaim);
 bool sgx_epc_oom(struct sgx_epc_lru_lists *lrus);
+size_t sgx_reclaim_epc_pages(size_t nr_to_scan, bool ignore_age);

 void sgx_ipi_cb(void *info);

From patchwork Sat Sep 23 03:06:53 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Haitao Huang
X-Patchwork-Id: 143832
From: Haitao Huang
To: jarkko@kernel.org, dave.hansen@linux.intel.com, tj@kernel.org, linux-kernel@vger.kernel.org, linux-sgx@vger.kernel.org, x86@kernel.org, cgroups@vger.kernel.org, tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, hpa@zytor.com, sohil.mehta@intel.com
Cc: zhiquan1.li@intel.com, kristen@linux.intel.com, seanjc@google.com, zhanb@microsoft.com, anakrish@microsoft.com, mikko.ylinen@linux.intel.com, yangjie@microsoft.com
Subject: [PATCH v5 14/18] x86/sgx: Add helper to grab pages from an arbitrary EPC LRU
Date: Fri, 22 Sep 2023 20:06:53 -0700
Message-Id: <20230923030657.16148-15-haitao.huang@linux.intel.com>
In-Reply-To: <20230923030657.16148-1-haitao.huang@linux.intel.com>
References: <20230923030657.16148-1-haitao.huang@linux.intel.com>
X-Mailing-List: linux-kernel@vger.kernel.org

From: Sean Christopherson

Move the isolation loop into a helper, sgx_isolate_epc_pages(), in
preparation for the existence of multiple LRUs. Expose the helper to other
SGX code so that it can be called from the EPC cgroup code, e.g., to
isolate pages from a single cgroup LRU. Exposing the isolation loop allows
the cgroup iteration logic to be wholly encapsulated within the cgroup
code.

Signed-off-by: Sean Christopherson
Co-developed-by: Kristen Carlson Accardi
Signed-off-by: Kristen Carlson Accardi
Co-developed-by: Haitao Huang
Signed-off-by: Haitao Huang
Cc: Sean Christopherson
---
V4:
- No changes other than reordering the patches
---
 arch/x86/kernel/cpu/sgx/main.c | 57 +++++++++++++++++++++-------------
 arch/x86/kernel/cpu/sgx/sgx.h  |  2 ++
 2 files changed, 37 insertions(+), 22 deletions(-)

diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
index 4e1a3e038db5..b34ad3574c81 100644
--- a/arch/x86/kernel/cpu/sgx/main.c
+++ b/arch/x86/kernel/cpu/sgx/main.c
@@ -284,6 +284,40 @@ static void sgx_reclaimer_write(struct sgx_epc_page *epc_page,
 	mutex_unlock(&encl->lock);
 }

+/**
+ * sgx_isolate_epc_pages() - Isolate pages from an LRU for reclaim
+ * @lru: LRU from which to reclaim
+ * @nr_to_scan: Number of pages to scan for reclaim
+ * @dst: Destination list to hold the isolated pages
+ */
+void sgx_isolate_epc_pages(struct sgx_epc_lru_lists *lru, size_t nr_to_scan,
+			   struct list_head *dst)
+{
+	struct sgx_encl_page *encl_page;
+	struct sgx_epc_page *epc_page;
+
+	spin_lock(&lru->lock);
+	for (; nr_to_scan > 0; --nr_to_scan) {
+		epc_page = list_first_entry_or_null(&lru->reclaimable, struct sgx_epc_page, list);
+		if (!epc_page)
+			break;
+
+		encl_page = epc_page->encl_page;
+
+		if (kref_get_unless_zero(&encl_page->encl->refcount)) {
+			sgx_epc_page_set_state(epc_page, SGX_EPC_PAGE_RECLAIM_IN_PROGRESS);
+			list_move_tail(&epc_page->list, dst);
+		} else {
+			/* The owner is freeing the page, remove it from the
+			 * LRU list
+			 */
+			sgx_epc_page_reset_state(epc_page);
+			list_del_init(&epc_page->list);
+		}
+	}
+	spin_unlock(&lru->lock);
+}
+
 /**
  * sgx_reclaim_epc_pages() - Reclaim EPC pages from the consumers
  * @nr_to_scan: Number of EPC pages to scan for reclaim
@@ -310,28 +344,7 @@ size_t sgx_reclaim_epc_pages(size_t nr_to_scan, bool ignore_age)
 	LIST_HEAD(iso);
 	size_t ret, i;

-	spin_lock(&sgx_global_lru.lock);
-	for (i = 0; i < SGX_NR_TO_SCAN; i++) {
-		epc_page = list_first_entry_or_null(&sgx_global_lru.reclaimable,
-						    struct sgx_epc_page, list);
-		if (!epc_page)
-			break;
-
-		list_del_init(&epc_page->list);
-		encl_page = epc_page->encl_page;
-
-		if (kref_get_unless_zero(&encl_page->encl->refcount) != 0) {
-			sgx_epc_page_set_state(epc_page, SGX_EPC_PAGE_RECLAIM_IN_PROGRESS);
-			list_move_tail(&epc_page->list, &iso);
-		} else {
-			/* The owner is freeing the page, remove it from the
-			 * LRU list
-			 */
-			sgx_epc_page_reset_state(epc_page);
-			list_del_init(&epc_page->list);
-		}
-	}
-	spin_unlock(&sgx_global_lru.lock);
+	sgx_isolate_epc_pages(&sgx_global_lru, nr_to_scan, &iso);

 	if (list_empty(&iso))
 		return 0;
diff --git a/arch/x86/kernel/cpu/sgx/sgx.h b/arch/x86/kernel/cpu/sgx/sgx.h
index 7e7f1f36d31e..42075762084c 100644
--- a/arch/x86/kernel/cpu/sgx/sgx.h
+++ b/arch/x86/kernel/cpu/sgx/sgx.h
@@ -180,6 +180,8 @@ int sgx_drop_epc_page(struct sgx_epc_page *page);
 struct sgx_epc_page *sgx_alloc_epc_page(void *owner, bool reclaim);
 bool sgx_epc_oom(struct sgx_epc_lru_lists *lrus);
 size_t sgx_reclaim_epc_pages(size_t nr_to_scan, bool ignore_age);
+void sgx_isolate_epc_pages(struct sgx_epc_lru_lists *lrus, size_t nr_to_scan,
+			   struct list_head *dst);

 void sgx_ipi_cb(void *info);

From patchwork Sat Sep 23 03:06:54 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Haitao Huang
X-Patchwork-Id: 143818 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:910f:0:b0:403:3b70:6f57 with SMTP id r15csp95479vqg; Fri, 22 Sep 2023 22:27:55 -0700 (PDT) X-Google-Smtp-Source: AGHT+IGDd/mwkxLCj5RtwecINVJRz3F83lEJVVTsStxG1kTknpCb1Aa4CRKRzxZ6c5kaKvM6WshD X-Received: by 2002:a05:6a00:228a:b0:692:b8b9:f728 with SMTP id f10-20020a056a00228a00b00692b8b9f728mr1779219pfe.30.1695446874813; Fri, 22 Sep 2023 22:27:54 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1695446874; cv=none; d=google.com; s=arc-20160816; b=usFJhpG8RorHg1OHg06NHND9J3g7oIdIsbaLI/sTxBK7Xb7G6mV6YZQexr5Z7M+cld oLFNHvhs+oFc/mN76AN33Aetidxmc9DsrXRZxH533o50GqvdRZVVhNPr8eKfU4h3T8uk fjpkwy3hixgiZ3LmgSrPuZ0DoxvNGFc2BMJKSQ0cSNRn1fU9j7eew3jdc6xuJXNnpTqc idqIuVhM92s+w4oFznzh4sKhTle9jY6somfyZ+burX5gJ7Tq3+jk+4zfmpOmB4WE6MrZ 38/L3e+tmVRRF5MHuMCzrtlHGSSRXG+iCvG1CMajNCwL6nR9fEfNWmTJ/LhAZOpCPpUI kL6Q== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=+0IefuGL3b5EK5lfXOt7h7ncGwNzmWc8b3djH8Hr68I=; fh=j8PE345l5Ydlo3KwK7JeWnjqRgjiq4AteUoOZeOwa0I=; b=OI1Plh7UdudgtME9eKFQ1TySAUcI8xOtNCY2d0wxWDxKyPkHgFt5yveBgUeeA60haY jng57vkF/pxDdu5vatQMG0lNj9aDg1sZ3JAn7UYOog6GEhwhqu5fqQoztKbYTHrDKHZQ Dss1qAU/l27UMTvM0GPTIsj8QyOvdoZVwwkjkUQ2C2nBjsoCCVWYHgP/0iet+3RR4FD2 LZb2QnjN0k40VspxxmTuF8ZmgugOyOpzk3yxO9gw6NdtfraE8KJd6G+Zo1PLopvxxbJL D8jdrFSXObv7E/0U87qNVLFkMeK4cJr2fi3iUE4P7qoymWYDkK2xPSiO1RUAqU0RWUor cR2A== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=FHhYCvCx; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.37 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: from snail.vger.email (snail.vger.email. 
[23.128.96.37]) by mx.google.com with ESMTPS id bq26-20020a056a000e1a00b006826c8d5a31si5204050pfb.21.2023.09.22.22.27.54 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 22 Sep 2023 22:27:54 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.37 as permitted sender) client-ip=23.128.96.37; Authentication-Results: mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=FHhYCvCx; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.37 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: from out1.vger.email (depot.vger.email [IPv6:2620:137:e000::3:0]) by snail.vger.email (Postfix) with ESMTP id 9AB0980C056C; Fri, 22 Sep 2023 20:08:20 -0700 (PDT) X-Virus-Status: Clean X-Virus-Scanned: clamav-milter 0.103.10 at snail.vger.email Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229655AbjIWDIU (ORCPT + 28 others); Fri, 22 Sep 2023 23:08:20 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:58446 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230010AbjIWDH0 (ORCPT ); Fri, 22 Sep 2023 23:07:26 -0400 Received: from mgamail.intel.com (mgamail.intel.com [134.134.136.100]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 1D661CCC; Fri, 22 Sep 2023 20:07:10 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1695438430; x=1726974430; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=qTmH9wj5vjHB4TTXAmAQRTLQut19Pfnv2uStepKZ6Y0=; b=FHhYCvCxI9EotHe6gbI/cvc/9ZfP6PnWHVuLqgeu63dWTopPsphvFp+i OVEX3bz39JKsajKL9lRfm5yAz05rS3QqIeNuivpQ9w+knGR7jlPTcLa37 5Qt8EHYeS03vLqRGEqg22w7LehT3niYlrU2g9IT3gxvDloyXL+aKDw7Ew Ux3XGc685U1W2xhLZMHh3ya7BcVQEmCtOXKfNNbu5Qmz5o/zdjWiHdpcT 
b2YLbFpVjv41bkhwSEQWJNvoVQPu5SKm1X6GHQvTee9/v58oQa2Cx59/W NZjBY57lSY9abhVRvNnY8+Pzh/5nG0LMsBjX01KrDymYijWGGpz5ELH2L A==; X-IronPort-AV: E=McAfee;i="6600,9927,10841"; a="447466842" X-IronPort-AV: E=Sophos;i="6.03,169,1694761200"; d="scan'208";a="447466842" Received: from fmsmga001.fm.intel.com ([10.253.24.23]) by orsmga105.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 22 Sep 2023 20:07:09 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10841"; a="891048579" X-IronPort-AV: E=Sophos;i="6.03,169,1694761200"; d="scan'208";a="891048579" Received: from b4969161e530.jf.intel.com ([10.165.56.46]) by fmsmga001.fm.intel.com with ESMTP; 22 Sep 2023 20:06:13 -0700 From: Haitao Huang To: jarkko@kernel.org, dave.hansen@linux.intel.com, tj@kernel.org, linux-kernel@vger.kernel.org, linux-sgx@vger.kernel.org, x86@kernel.org, cgroups@vger.kernel.org, tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, hpa@zytor.com, sohil.mehta@intel.com Cc: zhiquan1.li@intel.com, kristen@linux.intel.com, seanjc@google.com, zhanb@microsoft.com, anakrish@microsoft.com, mikko.ylinen@linux.intel.com, yangjie@microsoft.com Subject: [PATCH v5 15/18] x86/sgx: Prepare for multiple LRUs Date: Fri, 22 Sep 2023 20:06:54 -0700 Message-Id: <20230923030657.16148-16-haitao.huang@linux.intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20230923030657.16148-1-haitao.huang@linux.intel.com> References: <20230923030657.16148-1-haitao.huang@linux.intel.com> MIME-Version: 1.0 X-Spam-Status: No, score=-2.0 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_EF,RCVD_IN_DNSWL_BLOCKED, SPF_HELO_NONE,SPF_NONE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-Greylist: Sender passed SPF test, not delayed by milter-greylist-4.6.4 (snail.vger.email [0.0.0.0]); Fri, 22 Sep 2023 20:08:20 -0700 (PDT) 
X-getmail-retrieved-from-mailbox: INBOX
X-GMAIL-THRID: 1777804902480044675
X-GMAIL-MSGID: 1777804902480044675

From: Sean Christopherson

Add wrappers around the direct references to the global LRU list in the
reclaimer functions, so that supporting multiple LRU lists (one per EPC
cgroup) later only requires changes inside these wrappers.

Signed-off-by: Sean Christopherson
Co-developed-by: Kristen Carlson Accardi
Signed-off-by: Kristen Carlson Accardi
Co-developed-by: Haitao Huang
Signed-off-by: Haitao Huang
Cc: Sean Christopherson
---
V5:
- Revise commit message to make the purpose more clear.

V4:
- Re-organized this patch to include all changes related to
  encapsulation of the global LRU
- Moved this patch to precede the EPC cgroup patch
---
 arch/x86/kernel/cpu/sgx/main.c | 41 +++++++++++++++++++++++-----------
 1 file changed, 28 insertions(+), 13 deletions(-)

diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
index b34ad3574c81..d37ef0dd865f 100644
--- a/arch/x86/kernel/cpu/sgx/main.c
+++ b/arch/x86/kernel/cpu/sgx/main.c
@@ -35,6 +35,16 @@ static DEFINE_XARRAY(sgx_epc_address_space);
  */
 static struct sgx_epc_lru_lists sgx_global_lru;
 
+static inline struct sgx_epc_lru_lists *sgx_lru_lists(struct sgx_epc_page *epc_page)
+{
+	return &sgx_global_lru;
+}
+
+static inline bool sgx_can_reclaim(void)
+{
+	return !list_empty(&sgx_global_lru.reclaimable);
+}
+
 static atomic_long_t sgx_nr_free_pages = ATOMIC_LONG_INIT(0);
 
 /* Nodes with one or more EPC sections. */
@@ -340,6 +350,7 @@ size_t sgx_reclaim_epc_pages(size_t nr_to_scan, bool ignore_age)
 	struct sgx_backing backing[SGX_NR_TO_SCAN_MAX];
 	struct sgx_epc_page *epc_page, *tmp;
 	struct sgx_encl_page *encl_page;
+	struct sgx_epc_lru_lists *lru;
 	pgoff_t page_index;
 	LIST_HEAD(iso);
 	size_t ret, i;
@@ -372,10 +383,11 @@ size_t sgx_reclaim_epc_pages(size_t nr_to_scan, bool ignore_age)
 			continue;
 
 skip:
-		spin_lock(&sgx_global_lru.lock);
+		lru = sgx_lru_lists(epc_page);
+		spin_lock(&lru->lock);
 		sgx_epc_page_set_state(epc_page, SGX_EPC_PAGE_RECLAIMABLE);
-		list_move_tail(&epc_page->list, &sgx_global_lru.reclaimable);
-		spin_unlock(&sgx_global_lru.lock);
+		list_move_tail(&epc_page->list, &lru->reclaimable);
+		spin_unlock(&lru->lock);
 
 		kref_put(&encl_page->encl->refcount, sgx_encl_release);
 	}
@@ -400,7 +412,7 @@ size_t sgx_reclaim_epc_pages(size_t nr_to_scan, bool ignore_age)
 static bool sgx_should_reclaim(unsigned long watermark)
 {
 	return atomic_long_read(&sgx_nr_free_pages) < watermark &&
-	       !list_empty(&sgx_global_lru.reclaimable);
+	       sgx_can_reclaim();
 }
 
 /*
@@ -530,14 +542,16 @@ struct sgx_epc_page *__sgx_alloc_epc_page(void)
  */
 void sgx_record_epc_page(struct sgx_epc_page *page, unsigned long flags)
 {
-	spin_lock(&sgx_global_lru.lock);
+	struct sgx_epc_lru_lists *lru = sgx_lru_lists(page);
+
+	spin_lock(&lru->lock);
 	WARN_ON_ONCE(sgx_epc_page_reclaimable(page->flags));
 	page->flags |= flags;
 	if (sgx_epc_page_reclaimable(flags))
-		list_add_tail(&page->list, &sgx_global_lru.reclaimable);
+		list_add_tail(&page->list, &lru->reclaimable);
 	else
-		list_add_tail(&page->list, &sgx_global_lru.unreclaimable);
-	spin_unlock(&sgx_global_lru.lock);
+		list_add_tail(&page->list, &lru->unreclaimable);
+	spin_unlock(&lru->lock);
 }
 
 /**
@@ -552,15 +566,16 @@ void sgx_record_epc_page(struct sgx_epc_page *page, unsigned long flags)
  */
 int sgx_drop_epc_page(struct sgx_epc_page *page)
 {
-	spin_lock(&sgx_global_lru.lock);
+	struct sgx_epc_lru_lists *lru = sgx_lru_lists(page);
+
+	spin_lock(&lru->lock);
 	if
(sgx_epc_page_reclaim_in_progress(page->flags)) { - spin_unlock(&sgx_global_lru.lock); + spin_unlock(&lru->lock); return -EBUSY; } - list_del(&page->list); sgx_epc_page_reset_state(page); - spin_unlock(&sgx_global_lru.lock); + spin_unlock(&lru->lock); return 0; } @@ -593,7 +608,7 @@ struct sgx_epc_page *sgx_alloc_epc_page(void *owner, bool reclaim) break; } - if (list_empty(&sgx_global_lru.reclaimable)) + if (!sgx_can_reclaim()) return ERR_PTR(-ENOMEM); if (!reclaim) { From patchwork Sat Sep 23 03:06:55 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Haitao Huang X-Patchwork-Id: 143793 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:910f:0:b0:403:3b70:6f57 with SMTP id r15csp54300vqg; Fri, 22 Sep 2023 20:11:07 -0700 (PDT) X-Google-Smtp-Source: AGHT+IGD/RuyuNZ12g1uJZD0g1bZNS9MJCk9n5Id/EO4XdAb2NgRQ/Vdei5Z8zWQUirou5ecQHTN X-Received: by 2002:a05:6a00:139b:b0:68c:3f2:5ff7 with SMTP id t27-20020a056a00139b00b0068c03f25ff7mr1532726pfg.1.1695438667721; Fri, 22 Sep 2023 20:11:07 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1695438667; cv=none; d=google.com; s=arc-20160816; b=IoxZIDXskyizK9LxnpteL20GCUE6ImwKiEtyMxT6UH03wLSwhvQN/T3eHju2xMqIT8 pjN0nsILPebSDdFHp0g1h/x000dsz4B92sAmqufEAbGrJYBNOf+rQpfW+0gN3YepzTm1 1SEKKnkgR/zWGDpAQIIccHWLPVCmwPsVfXeMiir25ZfrE2ZCjwF+yNO56YHxZDHxHjEU UbpjSa5GqolaLbei9oSd//piFY14RLShevR9ipktUtB8gB1IfICD88vwGAUM6/5iBILa 89UmjY7XKrfvAGmeUfyEnPWunU8qe05d9jPEcFWUQI7dWZ5qVYOlbPNOnhsq3dqf1o4V NE8w== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=npIJRO1Av4QUeN7gcYa5biO1NaIqNTIRbwmcbIRSSUc=; fh=j8PE345l5Ydlo3KwK7JeWnjqRgjiq4AteUoOZeOwa0I=; b=wfDrIhASn6tVc5YCfNvVCv9gOQo4P3/xNgNAhok3B9zGYS0BD9E4E+cWUS1O6/bpiv 1uJjIBfXfDpvub6hD+SFm/s1iKFC0sxug3bM16n2MkWxuYaDfUTTwtLLkm5x1axbbNAz 
GZ49EesuxKMzM2Lk36UPEes4UC0RdGR+qIZCCiNP2daZ1SEYvRA5pTbeetc8Rov6HLNp FdI8kLYR0AeXCKANEsZG1zdYEfSfWOGqElb024kmg16dFl7tzyNDfoJ9kvDcfA6vOgem kPZKm4GuyVILy+hkccLnc7QiWNi0YzG6caSDlycRYBqF8+fk7IK1MT3gvDpzqTTEpSCG 9cWw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=KX7FtJy7; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::3:1 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: from morse.vger.email (morse.vger.email. [2620:137:e000::3:1]) by mx.google.com with ESMTPS id cm22-20020a056a00339600b00690c951d2cesi4963566pfb.191.2023.09.22.20.11.07 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 22 Sep 2023 20:11:07 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::3:1 as permitted sender) client-ip=2620:137:e000::3:1; Authentication-Results: mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=KX7FtJy7; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::3:1 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: from out1.vger.email (depot.vger.email [IPv6:2620:137:e000::3:0]) by morse.vger.email (Postfix) with ESMTP id 182F883C226B; Fri, 22 Sep 2023 20:08:42 -0700 (PDT) X-Virus-Status: Clean X-Virus-Scanned: clamav-milter 0.103.10 at morse.vger.email Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230395AbjIWDI0 (ORCPT + 28 others); Fri, 22 Sep 2023 23:08:26 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:41136 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230387AbjIWDHf (ORCPT ); Fri, 22 Sep 2023 23:07:35 -0400 Received: from mgamail.intel.com (mgamail.intel.com [134.134.136.100]) 
by lindbergh.monkeyblade.net (Postfix) with ESMTPS id BD382CD4; Fri, 22 Sep 2023 20:07:10 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1695438431; x=1726974431; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=DQ/JUEkIygS77EczKPJrnikIT3P/J3nqfsW2/IMCHIA=; b=KX7FtJy7ADLCT95h0IjVzqJvzvp71Enkd6+LTxKcLrU8vkcwLsKABZbA U4Qui29aprcCy/qmnjfw84EB7TAGFk3UbneuKAyb2EPtIHNyxxcoucVeS XB+PyNApwBgDQKgxoVx/20TZZZ/FIVqCP/kYmR4TFa6NJvnwKhruHH4l6 7zGUKh7A1+4J14/7Ff8rHE4flJFNz9mV5TT3MXTjMzn/IJZ/ggmQAdYgp 3grer9nvUEBYVelpC3m9wZfAUxoZPVXZZ61PWnLVYeqx6AfWOZf4t8q0s Bp+pAdjnY+HImEIAhpXsp6CwAGAfjXBW60e6AGRB23dXBVaqYMYW0nRPH w==; X-IronPort-AV: E=McAfee;i="6600,9927,10841"; a="447466850" X-IronPort-AV: E=Sophos;i="6.03,169,1694761200"; d="scan'208";a="447466850" Received: from fmsmga001.fm.intel.com ([10.253.24.23]) by orsmga105.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 22 Sep 2023 20:07:10 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10841"; a="891048582" X-IronPort-AV: E=Sophos;i="6.03,169,1694761200"; d="scan'208";a="891048582" Received: from b4969161e530.jf.intel.com ([10.165.56.46]) by fmsmga001.fm.intel.com with ESMTP; 22 Sep 2023 20:06:13 -0700 From: Haitao Huang To: jarkko@kernel.org, dave.hansen@linux.intel.com, tj@kernel.org, linux-kernel@vger.kernel.org, linux-sgx@vger.kernel.org, x86@kernel.org, cgroups@vger.kernel.org, tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, hpa@zytor.com, sohil.mehta@intel.com Cc: zhiquan1.li@intel.com, kristen@linux.intel.com, seanjc@google.com, zhanb@microsoft.com, anakrish@microsoft.com, mikko.ylinen@linux.intel.com, yangjie@microsoft.com Subject: [PATCH v5 16/18] x86/sgx: Limit process EPC usage with misc cgroup controller Date: Fri, 22 Sep 2023 20:06:55 -0700 Message-Id: <20230923030657.16148-17-haitao.huang@linux.intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: 
 <20230923030657.16148-1-haitao.huang@linux.intel.com>
References: <20230923030657.16148-1-haitao.huang@linux.intel.com>
MIME-Version: 1.0
X-Spam-Status: No, score=2.8 required=5.0 tests=DKIMWL_WL_HIGH,DKIM_SIGNED,
 DKIM_VALID,HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI,
 RCVD_IN_SBL_CSS,SPF_HELO_NONE,SPF_PASS autolearn=no autolearn_force=no
 version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on morse.vger.email
Precedence: bulk
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
X-Greylist: Sender passed SPF test, not delayed by milter-greylist-4.6.4
 (morse.vger.email [0.0.0.0]); Fri, 22 Sep 2023 20:08:42 -0700 (PDT)
X-Spam-Level: **
X-getmail-retrieved-from-mailbox: INBOX
X-GMAIL-THRID: 1777796296232657628
X-GMAIL-MSGID: 1777796296232657628

From: Kristen Carlson Accardi

Implement support for cgroup control of SGX Enclave Page Cache (EPC)
memory using the misc cgroup controller. EPC memory is independent of
normal system memory: it must be reserved from RAM at boot and cannot be
converted between EPC and normal memory while the system is running. EPC
is managed by the SGX subsystem and is not accounted for by the memory
controller.

Much like normal system memory, EPC memory can be overcommitted via
virtual memory techniques, and pages can be swapped out of the EPC to
their backing store (normal system memory, e.g. shmem). The SGX EPC
subsystem is analogous to the memory subsystem, and the SGX EPC
controller is in turn analogous to the memory controller; it implements
limit and protection models for EPC memory.

The misc controller provides a mechanism to set a hard limit on EPC
usage via the "sgx_epc" resource in "misc.max". The total EPC memory
available on the system is reported via the "sgx_epc" resource in
"misc.capacity".

This patch was modified from its original version to use the misc cgroup
controller instead of a custom controller.
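For illustration only, here is a minimal userspace sketch of how a limit
might be written to "misc.max". It is not part of this patch. It assumes
the usual misc controller line format of "<resource> <value>" and that
the "sgx_epc" value is in bytes (the patch divides the limit by
PAGE_SIZE). The helper names sgx_epc_format_limit() and
sgx_epc_set_limit() and the cgroup directory argument are hypothetical.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper, not part of the patch: format one "misc.max"
 * entry for the "sgx_epc" resource. Assumes the limit is in bytes and
 * that the misc controller accepts "<resource> <value>" lines.
 * Returns the number of characters written, or -1 if buf is too small.
 */
int sgx_epc_format_limit(char *buf, size_t len, uint64_t limit_bytes)
{
	int n = snprintf(buf, len, "sgx_epc %" PRIu64 "\n", limit_bytes);

	return (n < 0 || (size_t)n >= len) ? -1 : n;
}

/*
 * Hypothetical helper: write the limit to <cgroup_dir>/misc.max, where
 * cgroup_dir is an assumed cgroup v2 directory chosen by the caller.
 * Returns 0 on success, -1 on any failure.
 */
int sgx_epc_set_limit(const char *cgroup_dir, uint64_t limit_bytes)
{
	char path[4096], line[64];
	FILE *f;
	int n, ok;

	n = snprintf(path, sizeof(path), "%s/misc.max", cgroup_dir);
	if (n < 0 || (size_t)n >= sizeof(path))
		return -1;
	if (sgx_epc_format_limit(line, sizeof(line), limit_bytes) < 0)
		return -1;

	f = fopen(path, "w");
	if (!f)
		return -1;
	ok = fputs(line, f) >= 0;
	if (fclose(f) != 0)
		ok = 0;

	return ok ? 0 : -1;
}
```

A caller would pass the directory of the cgroup to be limited and a byte
count no larger than the value reported for "sgx_epc" in
"misc.capacity"; both choices are left to the administrator.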

Co-developed-by: Sean Christopherson
Signed-off-by: Sean Christopherson
Signed-off-by: Kristen Carlson Accardi
Co-developed-by: Haitao Huang
Signed-off-by: Haitao Huang
Tested-by: Mikko Ylinen
Cc: Sean Christopherson
---
V5:
- kernel-doc fixes (Jarkko)

V4:
- Fix a white space issue in Kconfig (Randy).
- Update comments for LRU list as it can be owned by a cgroup.
- Fix comments for sgx_reclaim_epc_pages() and use IS_ENABLED
  consistently (Mikko)

V3:
1) Use the same maximum number of reclaiming candidate pages to be
   processed, SGX_NR_TO_SCAN_MAX, for each reclaiming iteration in both
   the cgroup worker function and ksgxd. This fixes an overflow in the
   backing store buffer, which has the same fixed size and is allocated
   on the stack in sgx_reclaim_epc_pages().

2) Initialize max for the root EPC cgroup. Otherwise, all
   misc_cg_try_charge() calls would fail, as it checks the limits of all
   ancestors all the way to the root node.

3) Start reclaiming whenever misc_cg_try_charge() fails. Removed all
   re-checks for limits and current usage. For all intents and purposes,
   when misc_cg_try_charge() fails, reclaiming is needed. This also
   corrects an error of not reclaiming when the child limit is larger
   than one of its ancestors.

4) Handle failure on charging to the root EPC cgroup. Failure on
   charging to root means we are at or above capacity, so start
   reclaiming or return an OOM error.

5) Removed the custom cgroup tree walking iterator with epoch tracking
   logic. Replaced it with just the plain css_for_each_descendant_pre
   iterator. The custom iterator implemented a rather complex epoch
   scheme that I believe was intended to prevent extra reclaiming from
   multiple worker threads doing the same walk, but it turned out not to
   matter much, as each thread would only reclaim when usage is above
   the limit. Using the plain css_for_each_descendant_pre iterator
   simplified the code a bit.

6) Do not reclaim synchronously in the misc_max_write callback, which
   would block the user. Instead queue an async work item to run the
   reclaiming loop.

7) Other minor refactoring:
   - Remove unused params in epc_cgroup APIs
   - centralize uncharge into sgx_free_epc_page()
---
 arch/x86/Kconfig                     |  13 +
 arch/x86/kernel/cpu/sgx/Makefile     |   1 +
 arch/x86/kernel/cpu/sgx/epc_cgroup.c | 415 +++++++++++++++++++++++++++
 arch/x86/kernel/cpu/sgx/epc_cgroup.h |  59 ++++
 arch/x86/kernel/cpu/sgx/main.c       |  68 ++++-
 arch/x86/kernel/cpu/sgx/sgx.h        |  17 +-
 6 files changed, 556 insertions(+), 17 deletions(-)
 create mode 100644 arch/x86/kernel/cpu/sgx/epc_cgroup.c
 create mode 100644 arch/x86/kernel/cpu/sgx/epc_cgroup.h

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 66bfabae8814..e17c5dc3aea4 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1921,6 +1921,19 @@ config X86_SGX
 
 	  If unsure, say N.
 
+config CGROUP_SGX_EPC
+	bool "Miscellaneous Cgroup Controller for Enclave Page Cache (EPC) for Intel SGX"
+	depends on X86_SGX && CGROUP_MISC
+	help
+	  Provides control over the EPC footprint of tasks in a cgroup via
+	  the Miscellaneous cgroup controller.
+
+	  EPC is a subset of regular memory that is usable only by SGX
+	  enclaves and is very limited in quantity, e.g. less than 1%
+	  of total DRAM.
+
+	  Say N if unsure.
+
 config X86_USER_SHADOW_STACK
 	bool "X86 userspace shadow stack"
 	depends on AS_WRUSS
diff --git a/arch/x86/kernel/cpu/sgx/Makefile b/arch/x86/kernel/cpu/sgx/Makefile
index 9c1656779b2a..12901a488da7 100644
--- a/arch/x86/kernel/cpu/sgx/Makefile
+++ b/arch/x86/kernel/cpu/sgx/Makefile
@@ -4,3 +4,4 @@ obj-y += \
 	ioctl.o \
 	main.o
 obj-$(CONFIG_X86_SGX_KVM)	+= virt.o
+obj-$(CONFIG_CGROUP_SGX_EPC)	+= epc_cgroup.o
diff --git a/arch/x86/kernel/cpu/sgx/epc_cgroup.c b/arch/x86/kernel/cpu/sgx/epc_cgroup.c
new file mode 100644
index 000000000000..b5da89cf3a4c
--- /dev/null
+++ b/arch/x86/kernel/cpu/sgx/epc_cgroup.c
@@ -0,0 +1,415 @@
+// SPDX-License-Identifier: GPL-2.0
+// Copyright(c) 2022 Intel Corporation.
+ +#include +#include +#include +#include +#include +#include + +#include "epc_cgroup.h" + +#define SGX_EPC_RECLAIM_MIN_PAGES 16UL +#define SGX_EPC_RECLAIM_IGNORE_AGE_THRESHOLD 5 +#define SGX_EPC_RECLAIM_OOM_THRESHOLD 5 + +static struct workqueue_struct *sgx_epc_cg_wq; +static bool sgx_epc_cgroup_oom(struct sgx_epc_cgroup *root); + +struct sgx_epc_reclaim_control { + struct sgx_epc_cgroup *epc_cg; + int nr_fails; + bool ignore_age; +}; + +static inline u64 sgx_epc_cgroup_page_counter_read(struct sgx_epc_cgroup *epc_cg) +{ + return atomic64_read(&epc_cg->cg->res[MISC_CG_RES_SGX_EPC].usage) / PAGE_SIZE; +} + +static inline u64 sgx_epc_cgroup_max_pages(struct sgx_epc_cgroup *epc_cg) +{ + return READ_ONCE(epc_cg->cg->res[MISC_CG_RES_SGX_EPC].max) / PAGE_SIZE; +} + +/* + * Get the lower bound of limits of a cgroup and its ancestors. + */ +static inline u64 sgx_epc_cgroup_max_pages_to_root(struct sgx_epc_cgroup *epc_cg) +{ + struct misc_cg *i = epc_cg->cg; + u64 m = U64_MAX; + + while (i) { + m = min(m, READ_ONCE(i->res[MISC_CG_RES_SGX_EPC].max)); + i = misc_cg_parent(i); + } + + return m / PAGE_SIZE; +} + +static inline struct sgx_epc_cgroup *sgx_epc_cgroup_from_misc_cg(struct misc_cg *cg) +{ + if (cg) + return (struct sgx_epc_cgroup *)(cg->res[MISC_CG_RES_SGX_EPC].priv); + + return NULL; +} + +static inline bool sgx_epc_cgroup_disabled(void) +{ + return !cgroup_subsys_enabled(misc_cgrp_subsys); +} + +/** + * sgx_epc_cgroup_lru_empty() - check if a cgroup tree has no pages on its lrus + * @root: root of the tree to check + * + * Return: %true if all cgroups under the specified root have empty LRU lists. + * Used to avoid livelocks due to a cgroup having a non-zero charge count but + * no pages on its LRUs, e.g. due to a dead enclave waiting to be released or + * because all pages in the cgroup are unreclaimable. 
+ */ +bool sgx_epc_cgroup_lru_empty(struct sgx_epc_cgroup *root) +{ + struct cgroup_subsys_state *css_root; + struct cgroup_subsys_state *pos; + struct sgx_epc_cgroup *epc_cg; + bool ret = true; + + /* + * Caller ensure css_root ref acquired + */ + css_root = root ? &root->cg->css : &(misc_cg_root()->css); + + rcu_read_lock(); + css_for_each_descendant_pre(pos, css_root) { + if (!css_tryget(pos)) + break; + + rcu_read_unlock(); + + epc_cg = sgx_epc_cgroup_from_misc_cg(css_misc(pos)); + + spin_lock(&epc_cg->lru.lock); + ret = list_empty(&epc_cg->lru.reclaimable); + spin_unlock(&epc_cg->lru.lock); + + rcu_read_lock(); + css_put(pos); + if (!ret) + break; + } + + rcu_read_unlock(); + + return ret; +} + +/** + * sgx_epc_cgroup_isolate_pages() - walk a cgroup tree and separate pages + * @root: root of the tree to start walking + * @nr_to_scan: The number of pages that need to be isolated + * @dst: Destination list to hold the isolated pages + * + * Walk the cgroup tree and isolate the pages in the hierarchy + * for reclaiming. + */ +void sgx_epc_cgroup_isolate_pages(struct sgx_epc_cgroup *root, + size_t *nr_to_scan, struct list_head *dst) +{ + struct cgroup_subsys_state *css_root; + struct cgroup_subsys_state *pos; + struct sgx_epc_cgroup *epc_cg; + + if (!*nr_to_scan) + return; + + /* Caller ensure css_root ref acquired */ + css_root = root ? &root->cg->css : &(misc_cg_root()->css); + + rcu_read_lock(); + css_for_each_descendant_pre(pos, css_root) { + if (!css_tryget(pos)) + break; + rcu_read_unlock(); + + epc_cg = sgx_epc_cgroup_from_misc_cg(css_misc(pos)); + sgx_isolate_epc_pages(&epc_cg->lru, nr_to_scan, dst); + + rcu_read_lock(); + css_put(pos); + if (!*nr_to_scan) + break; + } + + rcu_read_unlock(); +} + +static int sgx_epc_cgroup_reclaim_pages(unsigned long nr_pages, + struct sgx_epc_reclaim_control *rc) +{ + /* + * Ensure sgx_reclaim_pages is called with a minimum and maximum + * number of pages. 
Attempting to reclaim only a few pages will + * often fail and is inefficient, while reclaiming a huge number + * of pages can result in soft lockups due to holding various + * locks for an extended duration. + */ + nr_pages = max(nr_pages, SGX_EPC_RECLAIM_MIN_PAGES); + nr_pages = min(nr_pages, SGX_NR_TO_SCAN_MAX); + + return sgx_reclaim_epc_pages(nr_pages, rc->ignore_age, rc->epc_cg); +} + +static int sgx_epc_cgroup_reclaim_failed(struct sgx_epc_reclaim_control *rc) +{ + if (sgx_epc_cgroup_lru_empty(rc->epc_cg)) + return -ENOMEM; + + ++rc->nr_fails; + if (rc->nr_fails > SGX_EPC_RECLAIM_IGNORE_AGE_THRESHOLD) + rc->ignore_age = true; + + return 0; +} + +static inline +void sgx_epc_reclaim_control_init(struct sgx_epc_reclaim_control *rc, + struct sgx_epc_cgroup *epc_cg) +{ + rc->epc_cg = epc_cg; + rc->nr_fails = 0; + rc->ignore_age = false; +} + +/* + * Scheduled by sgx_epc_cgroup_try_charge() to reclaim pages from the + * cgroup when the cgroup is at/near its maximum capacity + */ +static void sgx_epc_cgroup_reclaim_work_func(struct work_struct *work) +{ + struct sgx_epc_reclaim_control rc; + struct sgx_epc_cgroup *epc_cg; + u64 cur, max; + + epc_cg = container_of(work, struct sgx_epc_cgroup, reclaim_work); + + sgx_epc_reclaim_control_init(&rc, epc_cg); + + for (;;) { + max = sgx_epc_cgroup_max_pages_to_root(epc_cg); + + /* + * Adjust the limit down by one page, the goal is to free up + * pages for fault allocations, not to simply obey the limit. + * Conditionally decrementing max also means the cur vs. max + * check will correctly handle the case where both are zero. + */ + if (max) + max--; + + /* + * Unless the limit is extremely low, in which case forcing + * reclaim will likely cause thrashing, force the cgroup to + * reclaim at least once if it's operating *near* its maximum + * limit by adjusting @max down by half the min reclaim size. 
+ * This work func is scheduled by sgx_epc_cgroup_try_charge + * when it cannot directly reclaim due to being in an atomic + * context, e.g. EPC allocation in a fault handler. Waiting + * to reclaim until the cgroup is actually at its limit is less + * performant as it means the faulting task is effectively + * blocked until a worker makes its way through the global work + * queue. + */ + if (max > SGX_NR_TO_SCAN_MAX) + max -= (SGX_EPC_RECLAIM_MIN_PAGES / 2); + + max = min(max, sgx_epc_total_pages); + cur = sgx_epc_cgroup_page_counter_read(epc_cg); + if (cur <= max) + break; + /* Nothing reclaimable */ + if (sgx_epc_cgroup_lru_empty(epc_cg)) { + if (!sgx_epc_cgroup_oom(epc_cg)) + break; + + continue; + } + + if (!sgx_epc_cgroup_reclaim_pages(cur - max, &rc)) { + if (sgx_epc_cgroup_reclaim_failed(&rc)) + break; + } + } +} + +static int __sgx_epc_cgroup_try_charge(struct sgx_epc_cgroup *epc_cg, + bool reclaim) +{ + struct sgx_epc_reclaim_control rc; + unsigned int nr_empty = 0; + + sgx_epc_reclaim_control_init(&rc, epc_cg); + + for (;;) { + if (!misc_cg_try_charge(MISC_CG_RES_SGX_EPC, epc_cg->cg, + PAGE_SIZE)) + break; + + if (sgx_epc_cgroup_lru_empty(epc_cg)) + return -ENOMEM; + + if (signal_pending(current)) + return -ERESTARTSYS; + + if (!reclaim) { + queue_work(sgx_epc_cg_wq, &rc.epc_cg->reclaim_work); + return -EBUSY; + } + + if (!sgx_epc_cgroup_reclaim_pages(1, &rc)) { + if (sgx_epc_cgroup_reclaim_failed(&rc)) { + if (++nr_empty > SGX_EPC_RECLAIM_OOM_THRESHOLD) + return -ENOMEM; + schedule(); + } + } + } + if (epc_cg->cg != misc_cg_root()) + css_get(&epc_cg->cg->css); + + return 0; +} + +/** + * sgx_epc_cgroup_try_charge() - hierarchically try to charge a single EPC page + * @mm: the mm_struct of the process to charge + * @reclaim: whether or not synchronous reclaim is allowed + * + * Returns EPC cgroup or NULL on success, -errno on failure. 
+ */ +struct sgx_epc_cgroup *sgx_epc_cgroup_try_charge(bool reclaim) +{ + struct sgx_epc_cgroup *epc_cg; + int ret; + + if (sgx_epc_cgroup_disabled()) + return NULL; + + epc_cg = sgx_epc_cgroup_from_misc_cg(get_current_misc_cg()); + ret = __sgx_epc_cgroup_try_charge(epc_cg, reclaim); + put_misc_cg(epc_cg->cg); + + if (ret) + return ERR_PTR(ret); + + return epc_cg; +} + +/** + * sgx_epc_cgroup_uncharge() - hierarchically uncharge EPC pages + * @epc_cg: the charged epc cgroup + */ +void sgx_epc_cgroup_uncharge(struct sgx_epc_cgroup *epc_cg) +{ + if (sgx_epc_cgroup_disabled()) + return; + + misc_cg_uncharge(MISC_CG_RES_SGX_EPC, epc_cg->cg, PAGE_SIZE); + + if (epc_cg->cg != misc_cg_root()) + put_misc_cg(epc_cg->cg); +} + +static bool sgx_epc_cgroup_oom(struct sgx_epc_cgroup *root) +{ + struct cgroup_subsys_state *css_root; + struct cgroup_subsys_state *pos; + struct sgx_epc_cgroup *epc_cg; + bool oom = false; + + /* Caller ensure css_root ref acquired */ + css_root = root ? &root->cg->css : &(misc_cg_root()->css); + + rcu_read_lock(); + css_for_each_descendant_pre(pos, css_root) { + /* skip dead ones */ + if (!css_tryget(pos)) + continue; + + rcu_read_unlock(); + + epc_cg = sgx_epc_cgroup_from_misc_cg(css_misc(pos)); + oom = sgx_epc_oom(&epc_cg->lru); + + rcu_read_lock(); + css_put(pos); + if (oom) + break; + } + + rcu_read_unlock(); + + return oom; +} + +static void sgx_epc_cgroup_free(struct misc_cg *cg) +{ + struct sgx_epc_cgroup *epc_cg; + + epc_cg = sgx_epc_cgroup_from_misc_cg(cg); + cancel_work_sync(&epc_cg->reclaim_work); + kfree(epc_cg); +} + +static void sgx_epc_cgroup_max_write(struct misc_cg *cg) +{ + struct sgx_epc_reclaim_control rc; + struct sgx_epc_cgroup *epc_cg; + + epc_cg = sgx_epc_cgroup_from_misc_cg(cg); + + sgx_epc_reclaim_control_init(&rc, epc_cg); + /* Let the reclaimer to do the work so user is not blocked */ + queue_work(sgx_epc_cg_wq, &rc.epc_cg->reclaim_work); +} + +static int sgx_epc_cgroup_alloc(struct misc_cg *cg) +{ + struct 
sgx_epc_cgroup *epc_cg; + + epc_cg = kzalloc(sizeof(*epc_cg), GFP_KERNEL); + if (!epc_cg) + return -ENOMEM; + + sgx_lru_init(&epc_cg->lru); + INIT_WORK(&epc_cg->reclaim_work, sgx_epc_cgroup_reclaim_work_func); + cg->res[MISC_CG_RES_SGX_EPC].alloc = sgx_epc_cgroup_alloc; + cg->res[MISC_CG_RES_SGX_EPC].free = sgx_epc_cgroup_free; + cg->res[MISC_CG_RES_SGX_EPC].max_write = sgx_epc_cgroup_max_write; + cg->res[MISC_CG_RES_SGX_EPC].priv = epc_cg; + epc_cg->cg = cg; + + return 0; +} + +static int __init sgx_epc_cgroup_init(void) +{ + struct misc_cg *cg; + + if (!boot_cpu_has(X86_FEATURE_SGX)) + return 0; + + sgx_epc_cg_wq = alloc_workqueue("sgx_epc_cg_wq", + WQ_UNBOUND | WQ_FREEZABLE, + WQ_UNBOUND_MAX_ACTIVE); + BUG_ON(!sgx_epc_cg_wq); + + cg = misc_cg_root(); + BUG_ON(!cg); + + return sgx_epc_cgroup_alloc(cg); +} +subsys_initcall(sgx_epc_cgroup_init); diff --git a/arch/x86/kernel/cpu/sgx/epc_cgroup.h b/arch/x86/kernel/cpu/sgx/epc_cgroup.h new file mode 100644 index 000000000000..dfc902f4d96f --- /dev/null +++ b/arch/x86/kernel/cpu/sgx/epc_cgroup.h @@ -0,0 +1,59 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* Copyright(c) 2022 Intel Corporation. 
*/ +#ifndef _INTEL_SGX_EPC_CGROUP_H_ +#define _INTEL_SGX_EPC_CGROUP_H_ + +#include +#include +#include +#include +#include +#include + +#include "sgx.h" + +#ifndef CONFIG_CGROUP_SGX_EPC +#define MISC_CG_RES_SGX_EPC MISC_CG_RES_TYPES +struct sgx_epc_cgroup; + +static inline struct sgx_epc_cgroup *sgx_epc_cgroup_try_charge(bool reclaim) +{ + return NULL; +} + +static inline void sgx_epc_cgroup_uncharge(struct sgx_epc_cgroup *epc_cg) { } + +static inline void sgx_epc_cgroup_isolate_pages(struct sgx_epc_cgroup *root, + size_t *nr_to_scan, + struct list_head *dst) { } + +static inline struct sgx_epc_lru_lists *epc_cg_lru(struct sgx_epc_cgroup *epc_cg) +{ + return NULL; +} + +static bool sgx_epc_cgroup_lru_empty(struct sgx_epc_cgroup *root) +{ + return true; +} +#else +struct sgx_epc_cgroup { + struct misc_cg *cg; + struct sgx_epc_lru_lists lru; + struct work_struct reclaim_work; +}; + +struct sgx_epc_cgroup *sgx_epc_cgroup_try_charge(bool reclaim); +void sgx_epc_cgroup_uncharge(struct sgx_epc_cgroup *epc_cg); +bool sgx_epc_cgroup_lru_empty(struct sgx_epc_cgroup *root); +void sgx_epc_cgroup_isolate_pages(struct sgx_epc_cgroup *root, + size_t *nr_to_scan, struct list_head *dst); +static inline struct sgx_epc_lru_lists *epc_cg_lru(struct sgx_epc_cgroup *epc_cg) +{ + if (epc_cg) + return &epc_cg->lru; + return NULL; +} +#endif + +#endif /* _INTEL_SGX_EPC_CGROUP_H_ */ diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c index d37ef0dd865f..0ade7792ff5f 100644 --- a/arch/x86/kernel/cpu/sgx/main.c +++ b/arch/x86/kernel/cpu/sgx/main.c @@ -6,6 +6,7 @@ #include #include #include +#include #include #include #include @@ -17,12 +18,9 @@ #include "driver.h" #include "encl.h" #include "encls.h" +#include "epc_cgroup.h" -/* - * Maximum number of pages to scan for reclaiming. 
- */ -#define SGX_NR_TO_SCAN_MAX 32 - +u64 sgx_epc_total_pages; struct sgx_epc_section sgx_epc_sections[SGX_MAX_EPC_SECTIONS]; static int sgx_nr_epc_sections; static struct task_struct *ksgxd_tsk; @@ -37,11 +35,17 @@ static struct sgx_epc_lru_lists sgx_global_lru; static inline struct sgx_epc_lru_lists *sgx_lru_lists(struct sgx_epc_page *epc_page) { + if (IS_ENABLED(CONFIG_CGROUP_SGX_EPC)) + return epc_cg_lru(epc_page->epc_cg); + return &sgx_global_lru; } static inline bool sgx_can_reclaim(void) { + if (IS_ENABLED(CONFIG_CGROUP_SGX_EPC)) + return !sgx_epc_cgroup_lru_empty(NULL); + return !list_empty(&sgx_global_lru.reclaimable); } @@ -300,14 +304,14 @@ static void sgx_reclaimer_write(struct sgx_epc_page *epc_page, * @nr_to_scan: Number of pages to scan for reclaim * @dst: Destination list to hold the isolated pages */ -void sgx_isolate_epc_pages(struct sgx_epc_lru_lists *lru, size_t nr_to_scan, +void sgx_isolate_epc_pages(struct sgx_epc_lru_lists *lru, size_t *nr_to_scan, struct list_head *dst) { struct sgx_encl_page *encl_page; struct sgx_epc_page *epc_page; spin_lock(&lru->lock); - for (; nr_to_scan > 0; --nr_to_scan) { + for (; *nr_to_scan > 0; --(*nr_to_scan)) { epc_page = list_first_entry_or_null(&lru->reclaimable, struct sgx_epc_page, list); if (!epc_page) break; @@ -332,6 +336,7 @@ void sgx_isolate_epc_pages(struct sgx_epc_lru_lists *lru, size_t nr_to_scan, * sgx_reclaim_epc_pages() - Reclaim EPC pages from the consumers * @nr_to_scan: Number of EPC pages to scan for reclaim * @ignore_age: Reclaim a page even if it is young + * @epc_cg: EPC cgroup from which to reclaim * * Take a fixed number of pages from the head of the active page pool and * reclaim them to the enclave's private shmem files. Skip the pages, which have @@ -345,7 +350,8 @@ void sgx_isolate_epc_pages(struct sgx_epc_lru_lists *lru, size_t nr_to_scan, * problematic as it would increase the lock contention too much, which would * halt forward progress. 
*/ -size_t sgx_reclaim_epc_pages(size_t nr_to_scan, bool ignore_age) +size_t sgx_reclaim_epc_pages(size_t nr_to_scan, bool ignore_age, + struct sgx_epc_cgroup *epc_cg) { struct sgx_backing backing[SGX_NR_TO_SCAN_MAX]; struct sgx_epc_page *epc_page, *tmp; @@ -355,7 +361,15 @@ size_t sgx_reclaim_epc_pages(size_t nr_to_scan, bool ignore_age) LIST_HEAD(iso); size_t ret, i; - sgx_isolate_epc_pages(&sgx_global_lru, nr_to_scan, &iso); + /* + * If a specific cgroup is not being targeted, take from the global + * list first, even when cgroups are enabled. If there are + * pages on the global LRU then they should get reclaimed asap. + */ + if (!IS_ENABLED(CONFIG_CGROUP_SGX_EPC) || !epc_cg) + sgx_isolate_epc_pages(&sgx_global_lru, &nr_to_scan, &iso); + + sgx_epc_cgroup_isolate_pages(epc_cg, &nr_to_scan, &iso); if (list_empty(&iso)) return 0; @@ -423,7 +437,7 @@ static bool sgx_should_reclaim(unsigned long watermark) void sgx_reclaim_direct(void) { if (sgx_should_reclaim(SGX_NR_LOW_PAGES)) - sgx_reclaim_epc_pages(SGX_NR_TO_SCAN, false); + sgx_reclaim_epc_pages(SGX_NR_TO_SCAN, false, NULL); } static int ksgxd(void *p) @@ -446,7 +460,7 @@ static int ksgxd(void *p) sgx_should_reclaim(SGX_NR_HIGH_PAGES)); if (sgx_should_reclaim(SGX_NR_HIGH_PAGES)) - sgx_reclaim_epc_pages(SGX_NR_TO_SCAN, false); + sgx_reclaim_epc_pages(SGX_NR_TO_SCAN, false, NULL); cond_resched(); } @@ -600,6 +614,11 @@ int sgx_drop_epc_page(struct sgx_epc_page *page) struct sgx_epc_page *sgx_alloc_epc_page(void *owner, bool reclaim) { struct sgx_epc_page *page; + struct sgx_epc_cgroup *epc_cg; + + epc_cg = sgx_epc_cgroup_try_charge(reclaim); + if (IS_ERR(epc_cg)) + return ERR_CAST(epc_cg); for ( ; ; ) { page = __sgx_alloc_epc_page(); @@ -608,8 +627,10 @@ struct sgx_epc_page *sgx_alloc_epc_page(void *owner, bool reclaim) break; } - if (!sgx_can_reclaim()) - return ERR_PTR(-ENOMEM); + if (!sgx_can_reclaim()) { + page = ERR_PTR(-ENOMEM); + break; + } if (!reclaim) { page = ERR_PTR(-EBUSY); @@ -621,10 +642,17 @@ 
struct sgx_epc_page *sgx_alloc_epc_page(void *owner, bool reclaim) break; } - sgx_reclaim_epc_pages(SGX_NR_TO_SCAN, false); + sgx_reclaim_epc_pages(SGX_NR_TO_SCAN, false, NULL); cond_resched(); } + if (!IS_ERR(page)) { + WARN_ON_ONCE(page->epc_cg); + page->epc_cg = epc_cg; + } else { + sgx_epc_cgroup_uncharge(epc_cg); + } + if (sgx_should_reclaim(SGX_NR_LOW_PAGES)) wake_up(&ksgxd_waitq); @@ -647,6 +675,11 @@ void sgx_free_epc_page(struct sgx_epc_page *page) WARN_ON_ONCE(page->flags & (SGX_EPC_PAGE_STATE_MASK)); + if (page->epc_cg) { + sgx_epc_cgroup_uncharge(page->epc_cg); + page->epc_cg = NULL; + } + spin_lock(&node->lock); page->encl_page = NULL; @@ -657,6 +690,7 @@ void sgx_free_epc_page(struct sgx_epc_page *page) page->flags = SGX_EPC_PAGE_FREE; spin_unlock(&node->lock); + atomic_long_inc(&sgx_nr_free_pages); } @@ -826,6 +860,7 @@ static bool __init sgx_setup_epc_section(u64 phys_addr, u64 size, section->pages[i].flags = 0; section->pages[i].encl_page = NULL; section->pages[i].poison = 0; + section->pages[i].epc_cg = NULL; list_add_tail(&section->pages[i].list, &sgx_dirty_page_list); } @@ -970,6 +1005,7 @@ static void __init arch_update_sysfs_visibility(int nid) {} static bool __init sgx_page_cache_init(void) { u32 eax, ebx, ecx, edx, type; + u64 capacity = 0; u64 pa, size; int nid; int i; @@ -1020,6 +1056,7 @@ static bool __init sgx_page_cache_init(void) sgx_epc_sections[i].node = &sgx_numa_nodes[nid]; sgx_numa_nodes[nid].size += size; + capacity += size; sgx_nr_epc_sections++; } @@ -1029,6 +1066,9 @@ static bool __init sgx_page_cache_init(void) return false; } + misc_cg_set_capacity(MISC_CG_RES_SGX_EPC, capacity); + sgx_epc_total_pages = capacity >> PAGE_SHIFT; + return true; } diff --git a/arch/x86/kernel/cpu/sgx/sgx.h b/arch/x86/kernel/cpu/sgx/sgx.h index 42075762084c..1b90a905a9e2 100644 --- a/arch/x86/kernel/cpu/sgx/sgx.h +++ b/arch/x86/kernel/cpu/sgx/sgx.h @@ -19,6 +19,11 @@ #define SGX_MAX_EPC_SECTIONS 8 #define SGX_EEXTEND_BLOCK_SIZE 256 + +/* + * Maximum
number of pages to scan for reclaiming. + */ +#define SGX_NR_TO_SCAN_MAX 32UL #define SGX_NR_TO_SCAN 16 #define SGX_NR_LOW_PAGES 32 #define SGX_NR_HIGH_PAGES 64 @@ -70,6 +75,8 @@ enum sgx_epc_page_state { /* flag for pages owned by a sgx_encl struct */ #define SGX_EPC_OWNER_ENCL BIT(4) +struct sgx_epc_cgroup; + struct sgx_epc_page { unsigned int section; u16 flags; @@ -81,6 +88,7 @@ struct sgx_epc_page { struct sgx_encl *encl; }; struct list_head list; + struct sgx_epc_cgroup *epc_cg; }; static inline void sgx_epc_page_reset_state(struct sgx_epc_page *page) @@ -129,6 +137,7 @@ struct sgx_epc_section { struct sgx_numa_node *node; }; +extern u64 sgx_epc_total_pages; extern struct sgx_epc_section sgx_epc_sections[SGX_MAX_EPC_SECTIONS]; static inline unsigned long sgx_get_epc_phys_addr(struct sgx_epc_page *page) @@ -152,7 +161,8 @@ static inline void *sgx_get_epc_virt_addr(struct sgx_epc_page *page) } /* - * Contains EPC pages tracked by the reclaimer (ksgxd). + * Contains EPC pages tracked by the global reclaimer (ksgxd) or an EPC + * cgroup. 
*/ struct sgx_epc_lru_lists { spinlock_t lock; @@ -179,8 +189,9 @@ void sgx_record_epc_page(struct sgx_epc_page *page, unsigned long flags); int sgx_drop_epc_page(struct sgx_epc_page *page); struct sgx_epc_page *sgx_alloc_epc_page(void *owner, bool reclaim); bool sgx_epc_oom(struct sgx_epc_lru_lists *lrus); -size_t sgx_reclaim_epc_pages(size_t nr_to_scan, bool ignore_age); -void sgx_isolate_epc_pages(struct sgx_epc_lru_lists *lrus, size_t nr_to_scan, +size_t sgx_reclaim_epc_pages(size_t nr_to_scan, bool ignore_age, + struct sgx_epc_cgroup *epc_cg); +void sgx_isolate_epc_pages(struct sgx_epc_lru_lists *lrus, size_t *nr_to_scan, struct list_head *dst); void sgx_ipi_cb(void *info); From patchwork Sat Sep 23 03:06:56 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Haitao Huang X-Patchwork-Id: 143839 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:cae8:0:b0:403:3b70:6f57 with SMTP id r8csp37176vqu; Sat, 23 Sep 2023 00:35:32 -0700 (PDT) X-Google-Smtp-Source: AGHT+IH0cTTF36ZziYbAWhlIy4oO5Gv8voh9y/25tv5slZLeP192xuTzd6giSdMQicxvufQtLjoT X-Received: by 2002:a17:903:32cf:b0:1c0:e87e:52ba with SMTP id i15-20020a17090332cf00b001c0e87e52bamr2865410plr.2.1695454532561; Sat, 23 Sep 2023 00:35:32 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1695454532; cv=none; d=google.com; s=arc-20160816; b=U/5rZxnGRfs+l4LNLsTYVvZ07Coe6IDqMcIg/YNFa5Z99kjgZzxwXJOm7goTc673xw jQkxInFgWWQhdrtj/YyNBHZ7d+16cxOm13YaASUXEnWXo3i50y0bQwxpwply8yDwb3Dp RC49Py4ypcJq/bvH+UuNmWEgA+Y17lgtc1cN810swPSSiDedBR9hOPkuTIDy+LDbu2YR 8T0TWO1Coj6qfH6LzDY4N0GlnxN7u3xoIEm0eDZ+GQ5+30tXQfQxvSDKuKiwRNowB9yN gyhfQHORGP4yfvCQOfua0SpriY2sRXtVtZwS74OSMiA95reKSEq6tE4nMhfqgxaHLVKA Z/yg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; 
bh=UtyjPnUqhibrXtyQ75jCsqZfTz4zEi1614f/zLFVCZE=; fh=j8PE345l5Ydlo3KwK7JeWnjqRgjiq4AteUoOZeOwa0I=; b=G36t6lTAP6XketfqIHLD+yfalmnAql15hkGmCD1p7jeZDZ9ilKt+1uDOOBymespIrD cZtEk8aXHpyP8A/deWH6RNNMnExSPuH6mbZrW4YiotZ1O9W4s0EjPzrhtYEDpu47ihCI r8fEaxsSTWtdoep+5b+YtJmk7tIElgWfg+iTd27Qo9q/8l+EsVS18IeDoNcL8m1+fsko bNCYNuzO0Jkq9BpV5087CKDt2fV2v4vKyPQ3nSzruklOfGhrvDNiT9rqV2iIrliL73ob rBxpIVVp00B8klKM6GZUIScw053sQflokWeqkwnbCdM4IT7h7j0kAfSq/ZO9IYP+6tmy Mc5A== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b="RO/1HNJe"; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::3:7 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: from snail.vger.email (snail.vger.email. [2620:137:e000::3:7]) by mx.google.com with ESMTPS id j18-20020a170902f25200b001b674055d72si5041519plc.621.2023.09.23.00.35.32 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sat, 23 Sep 2023 00:35:32 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::3:7 as permitted sender) client-ip=2620:137:e000::3:7; Authentication-Results: mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b="RO/1HNJe"; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::3:7 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: from out1.vger.email (depot.vger.email [IPv6:2620:137:e000::3:0]) by snail.vger.email (Postfix) with ESMTP id BC98C802B40D; Fri, 22 Sep 2023 20:08:22 -0700 (PDT) X-Virus-Status: Clean X-Virus-Scanned: clamav-milter 0.103.10 at snail.vger.email Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230389AbjIWDIX (ORCPT + 28 others); Fri, 22 Sep 2023 23:08:23 -0400 Received: from lindbergh.monkeyblade.net 
([23.128.96.19]:58438 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230397AbjIWDHf (ORCPT ); Fri, 22 Sep 2023 23:07:35 -0400 Received: from mgamail.intel.com (mgamail.intel.com [134.134.136.100]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id B77DACDF; Fri, 22 Sep 2023 20:07:11 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1695438431; x=1726974431; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=Oho//VRT6sgLk83ZBgqkFv56uGXmUurj2kM6E/HQC/w=; b=RO/1HNJeT4qnVSmRa5rjJJk7FNy1Oik86PwAYhgo0OtiiehjLjPxyAqr VmY7ueMLbqHSttxIm9aGfQlWXkHc/pGoc34XKn0EEBP1MZb91qK31wc6P LQH1Junbe41MUA/rOMEWtcZTLrVTbHrOneLkVgCcUL0Sf2DUN8xOffBiT gQ2+H+FH9DpIsDG+LP/EqJdS6c0mGivVf8+4PtU4S7EJJ5jDkB8aN/K/w 0JEBaPXjlEJBb+Q6FW1z6ZRYSH+y3PeClUyBMkJMl+7VLDUKWo3ErUpsx TtYASngDwaJKb+58MpMkkN85B1Taxx7era1tQB1IAcdm7vSvflEAvr4JM Q==; X-IronPort-AV: E=McAfee;i="6600,9927,10841"; a="447466860" X-IronPort-AV: E=Sophos;i="6.03,169,1694761200"; d="scan'208";a="447466860" Received: from fmsmga001.fm.intel.com ([10.253.24.23]) by orsmga105.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 22 Sep 2023 20:07:11 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10841"; a="891048587" X-IronPort-AV: E=Sophos;i="6.03,169,1694761200"; d="scan'208";a="891048587" Received: from b4969161e530.jf.intel.com ([10.165.56.46]) by fmsmga001.fm.intel.com with ESMTP; 22 Sep 2023 20:06:14 -0700 From: Haitao Huang To: jarkko@kernel.org, dave.hansen@linux.intel.com, tj@kernel.org, linux-kernel@vger.kernel.org, linux-sgx@vger.kernel.org, x86@kernel.org, cgroups@vger.kernel.org, tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, hpa@zytor.com, sohil.mehta@intel.com Cc: zhiquan1.li@intel.com, kristen@linux.intel.com, seanjc@google.com, zhanb@microsoft.com, anakrish@microsoft.com, mikko.ylinen@linux.intel.com, yangjie@microsoft.com Subject: [PATCH v5 
17/18] Docs/x86/sgx: Add description for cgroup support Date: Fri, 22 Sep 2023 20:06:56 -0700 Message-Id: <20230923030657.16148-18-haitao.huang@linux.intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20230923030657.16148-1-haitao.huang@linux.intel.com> References: <20230923030657.16148-1-haitao.huang@linux.intel.com> MIME-Version: 1.0 X-Spam-Status: No, score=-2.0 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_EF,RCVD_IN_DNSWL_BLOCKED, SPF_HELO_NONE,SPF_NONE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-Greylist: Sender passed SPF test, not delayed by milter-greylist-4.6.4 (snail.vger.email [0.0.0.0]); Fri, 22 Sep 2023 20:08:22 -0700 (PDT) X-getmail-retrieved-from-mailbox: INBOX X-GMAIL-THRID: 1777812931487608015 X-GMAIL-MSGID: 1777812931487608015

From: Sean Christopherson

Add initial documentation of how to regulate the distribution of
SGX Enclave Page Cache (EPC) memory via the Miscellaneous cgroup
controller.

Signed-off-by: Sean Christopherson
Co-developed-by: Kristen Carlson Accardi
Signed-off-by: Kristen Carlson Accardi
Co-developed-by: Haitao Huang
Signed-off-by: Haitao Huang
Cc: Sean Christopherson
Reviewed-by: Bagas Sanjaya
---
V4:
- Fix indentation (Randy)
- Change misc.events file to be read-only
- Fix a typo for 'subsystem'
- Add behavior when a VMM overcommits EPC with a cgroup (Mikko)
---
Documentation/arch/x86/sgx.rst | 82 ++++++++++++++++++++++++++++++++++ 1 file changed, 82 insertions(+)
diff --git a/Documentation/arch/x86/sgx.rst b/Documentation/arch/x86/sgx.rst index d90796adc2ec..65c211bd5342 100644 --- a/Documentation/arch/x86/sgx.rst +++ b/Documentation/arch/x86/sgx.rst @@ -300,3 +300,85 @@ to expected failures and handle them as follows: first call.
It indicates a bug in the kernel or the userspace client if any of the second round of ``SGX_IOC_VEPC_REMOVE_ALL`` calls has a return code other than 0. + + +Cgroup Support +============== + +The "sgx_epc" resource within the Miscellaneous cgroup controller regulates +distribution of SGX EPC memory, which is a subset of system RAM that +is used to provide SGX-enabled applications with protected memory, +and is otherwise inaccessible, i.e. shows up as reserved in +/proc/iomem and cannot be read/written outside of an SGX enclave. + +Although current systems implement EPC by stealing memory from RAM, +for all intents and purposes the EPC is independent from normal system +memory, e.g. must be reserved at boot from RAM and cannot be converted +between EPC and normal memory while the system is running. The EPC is +managed by the SGX subsystem and is not accounted by the memory +controller. Note that this is true only for EPC memory itself, i.e. +normal memory allocations related to SGX and EPC memory, e.g. the +backing memory for evicted EPC pages, are accounted, limited and +protected by the memory controller. + +Much like normal system memory, EPC memory can be overcommitted via +virtual memory techniques and pages can be swapped out of the EPC +to their backing store (normal system memory allocated via shmem). +The SGX EPC subsystem is analogous to the memory subsystem, and +it implements limit and protection models for EPC memory. + +SGX EPC Interface Files +----------------------- + +For a generic description of the Miscellaneous controller interface +files, please see Documentation/admin-guide/cgroup-v2.rst + +All SGX EPC memory amounts are in bytes unless explicitly stated +otherwise. If a value which is not PAGE_SIZE aligned is written, +the actual value used by the controller will be rounded down to +the closest PAGE_SIZE multiple. + + misc.capacity + A read-only flat-keyed file shown only in the root cgroup. 
+ The sgx_epc resource will show the total amount of EPC + memory available on the platform. + + misc.current + A read-only flat-keyed file shown in the non-root cgroups. + The sgx_epc resource will show the current active EPC memory + usage of the cgroup and its descendants. EPC pages that are + swapped out to backing RAM are not included in the current count. + + misc.max + A read-write flat-keyed file which exists on non-root + cgroups. The sgx_epc resource will show the EPC usage + hard limit. The default is "max". + + If a cgroup's EPC usage reaches this limit, EPC allocations, + e.g. for page fault handling, will be blocked until EPC can + be reclaimed from the cgroup. If EPC cannot be reclaimed in + a timely manner, reclaim will be forced, e.g. by ignoring LRU. + + The EPC pages allocated for KVM guests by the virtual EPC driver + are not reclaimable by the host kernel SGX reclaimers. If a VMM + tries to start a VM within a cgroup whose EPC usage reaches this + limit, the virtual EPC driver will stop allocating more EPC for the + VM, and send SIGBUS to the VMM, which would abort the VM launch. + + misc.events + A read-only flat-keyed file which exists on non-root cgroups. + A value change in this file generates a file modified event. + + max + The number of times the cgroup has triggered a reclaim + due to its EPC usage approaching (or exceeding) its max + EPC boundary. + +Migration +--------- + +Once an EPC page is charged to a cgroup (during allocation), it +remains charged to the original cgroup until the page is released +or reclaimed. Migrating a process to a different cgroup doesn't +move the EPC charges that it incurred while in the previous cgroup +to its new cgroup.
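The misc.max write described in the documentation above can be sketched in shell. This is only an illustration under stated assumptions: the helper names `round_down_to_page` and `epc_limit_line` are hypothetical and not part of the patch, and the cgroup path in the usage comment is a placeholder. The rounding mirrors the documented behavior (values that are not PAGE_SIZE aligned are rounded down), so the value computed here should match what is read back.

```shell
#!/bin/bash
# Sketch only: these helpers are hypothetical and not part of the patch.

# Round a byte count down to a multiple of the page size, mirroring the
# rounding the documentation describes for values written to misc.max.
# The page size defaults to the running system's PAGE_SIZE.
round_down_to_page() {
    local bytes=$1
    local page_size=${2:-$(getconf PAGE_SIZE)}
    echo $(( bytes / page_size * page_size ))
}

# Build the "sgx_epc <bytes>" line in the form the selftest script writes
# to misc.max.
epc_limit_line() {
    echo "sgx_epc $(round_down_to_page "$@")"
}

# Example use (the cgroup path is a placeholder):
#   epc_limit_line 10000 | sudo tee /sys/fs/cgroup/<group>/misc.max
```

Passing the page size as an optional second argument keeps the helpers usable on hosts where the page size differs from the one being reasoned about.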
From patchwork Sat Sep 23 03:06:57 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Haitao Huang X-Patchwork-Id: 143840 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:cae8:0:b0:403:3b70:6f57 with SMTP id r8csp41942vqu; Sat, 23 Sep 2023 00:52:28 -0700 (PDT) X-Google-Smtp-Source: AGHT+IEmsI1ZDVK5agVehMTgfcy9dDQ+/V1oEYy/ka969B3w44BhOvs1NDQJThWFuVsjrv4SEXJi X-Received: by 2002:a05:6e02:152f:b0:34f:d822:baab with SMTP id i15-20020a056e02152f00b0034fd822baabmr2147486ilu.12.1695455547963; Sat, 23 Sep 2023 00:52:27 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1695455547; cv=none; d=google.com; s=arc-20160816; b=ySNhCBZWN2LYonmILTP0YKnaXIMu2W/oWmtQUZ37bUhO5Eswo7mSS2+gDR33ppOECD vsU24KxlZ35UAn+vQICSPytzVOBuLXbFaAOrgF6EuRtcRob2a4DGpf5prWS9eaNMZAZe Pc7l5rLgTBzqvd1nRm9qQBr3xnX8bgrrrf/M38Zc1CNSBVbRRtOUc48uFumLPhUoMiHl Qww+KhmMaEq1rLpn4y0IOEiLzy8HAr5ph7z0Gn5e27Do1J7bWcKB896LX6Y8qOoOA3no JwWN1jVJSWGHTA2bcU2ZLLKELzfFoZ1qPceJ6bpX4y9HfXsG7Rjtp9GnlGyHkilKnHOT FcBw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=8xNt2mBCODQ/BN42IHkMkLmmPDVQ62QgiBUGRzocNjg=; fh=j8PE345l5Ydlo3KwK7JeWnjqRgjiq4AteUoOZeOwa0I=; b=wEprssmBJ6IbPUoTgzS9jhKzuCgPvLkfDG05H42bQUz5Om9+KL3NB2rReggLfaBXWk HBtO+mg3cuinN9g6rsZaODeNgW4FU5y6RhFAUPPs7VIAWCQOOWBEHgPiqXKgyDNnGsbH DLYm9GJKLa605oQ+lxEXx89FNm1KkAaDQLfjC6HiX+BKpDd2VA0YAnKuXo/ThRNj3TLD XoLz60K3FyFLng7Jn3WdxEIuav9s72fLf9ss0bpziGKw1MlEoEERDqCqOuyjGqgaf5Xe k6yKR35HeJKT4GNQWo/R/GeTFjCNhNL55lM17JozbjyOAdZc5Dwb0WwMabUrYmQjU+gt pcqA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b="m0yMOlB/"; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.38 as permitted sender) 
smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: from fry.vger.email (fry.vger.email. [23.128.96.38]) by mx.google.com with ESMTPS id r203-20020a632bd4000000b00573fc6a17dbsi5552483pgr.435.2023.09.23.00.52.27 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sat, 23 Sep 2023 00:52:27 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.38 as permitted sender) client-ip=23.128.96.38; Authentication-Results: mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b="m0yMOlB/"; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.38 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: from out1.vger.email (depot.vger.email [IPv6:2620:137:e000::3:0]) by fry.vger.email (Postfix) with ESMTP id AD05883AD1FE; Fri, 22 Sep 2023 20:08:36 -0700 (PDT) X-Virus-Status: Clean X-Virus-Scanned: clamav-milter 0.103.10 at fry.vger.email Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229899AbjIWDI3 (ORCPT + 28 others); Fri, 22 Sep 2023 23:08:29 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:58460 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230418AbjIWDHj (ORCPT ); Fri, 22 Sep 2023 23:07:39 -0400 Received: from mgamail.intel.com (mgamail.intel.com [134.134.136.100]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 8F13CCF6; Fri, 22 Sep 2023 20:07:14 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1695438434; x=1726974434; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=dZ4dr09FZCb8PLkHgNm+LEZeNQ2boJ7dLsvuHqxpIsY=; b=m0yMOlB/eV24uty8sh2g+qB/nyeDiDeJdkwLRFqUw0uvwLH2QOf+fsnh 
jaiFFfrpz4Gk3ZA9IX4AgVF3HMhPR33Pz6oGdqugzO8LZruHnsd3nGyxg fU9KkVbDW6alVgIx6J7Na+Q7IUUHhTESaiksasagIoSwX0MPCelEywxcT qcyZ4xhJgNgBO36vBQ3Fgh4PfHCZwAk8pJtz4679vzGRjp2pMIueFm408 /9/3IAhIYAHXlTOdMlvrpYN5lXx7dgXFmsO1lHDh1RGOnQKuCv22NYvvJ T9Op0P0IkPdMbC6WWV80yUn93aB4Co9Qemxx3nZ+Qi7uF0l0u/HhWbpM0 w==; X-IronPort-AV: E=McAfee;i="6600,9927,10841"; a="447466868" X-IronPort-AV: E=Sophos;i="6.03,169,1694761200"; d="scan'208";a="447466868" Received: from fmsmga001.fm.intel.com ([10.253.24.23]) by orsmga105.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 22 Sep 2023 20:07:11 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10841"; a="891048591" X-IronPort-AV: E=Sophos;i="6.03,169,1694761200"; d="scan'208";a="891048591" Received: from b4969161e530.jf.intel.com ([10.165.56.46]) by fmsmga001.fm.intel.com with ESMTP; 22 Sep 2023 20:06:15 -0700 From: Haitao Huang To: jarkko@kernel.org, dave.hansen@linux.intel.com, tj@kernel.org, linux-kernel@vger.kernel.org, linux-sgx@vger.kernel.org, x86@kernel.org, cgroups@vger.kernel.org, tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, hpa@zytor.com, sohil.mehta@intel.com Cc: zhiquan1.li@intel.com, kristen@linux.intel.com, seanjc@google.com, zhanb@microsoft.com, anakrish@microsoft.com, mikko.ylinen@linux.intel.com, yangjie@microsoft.com Subject: [PATCH v5 18/18] selftests/sgx: Add scripts for EPC cgroup testing Date: Fri, 22 Sep 2023 20:06:57 -0700 Message-Id: <20230923030657.16148-19-haitao.huang@linux.intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20230923030657.16148-1-haitao.huang@linux.intel.com> References: <20230923030657.16148-1-haitao.huang@linux.intel.com> MIME-Version: 1.0 X-Spam-Status: No, score=2.8 required=5.0 tests=DKIMWL_WL_HIGH,DKIM_SIGNED, DKIM_VALID,HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI, RCVD_IN_SBL_CSS,SPF_HELO_NONE,SPF_PASS autolearn=no autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on fry.vger.email Precedence: bulk List-ID: X-Mailing-List: 
linux-kernel@vger.kernel.org X-Greylist: Sender passed SPF test, not delayed by milter-greylist-4.6.4 (fry.vger.email [0.0.0.0]); Fri, 22 Sep 2023 20:08:36 -0700 (PDT) X-Spam-Level: ** X-getmail-retrieved-from-mailbox: INBOX X-GMAIL-THRID: 1777813997075648265 X-GMAIL-MSGID: 1777813997075648265

The scripts rely on the cgroup-tools package from libcgroup [1].

To run the selftests for the EPC cgroup:

sudo ./run_epc_cg_selftests.sh

With different cgroups, the script starts one or more concurrent SGX
selftests, each running one unclobbered_vdso_oversubscribed test. Each
such test tries to load an enclave whose size equals the EPC capacity
available on the platform. The script checks the results against the
expectation set for each cgroup and reports success or failure.

The script creates 3 different cgroups at the beginning with the
following expectations:

1) SMALL - intentionally small enough to fail the test loading an
   enclave of size equal to the capacity.
2) LARGE - large enough to run up to 4 concurrent tests but fail some if
   more than 4 concurrent tests are run. The script starts 4 expecting
   at least one test to pass, and then starts 5 expecting at least one
   test to fail.
3) LARGER - limit is the same as the capacity, large enough to run lots
   of concurrent tests. The script starts 10 of them and expects all to
   pass.

To watch misc cgroup 'current' changes during testing, run this in a
separate terminal:

./watch_misc_for_tests.sh current

[1] https://github.com/libcgroup/libcgroup/blob/main/README

Signed-off-by: Haitao Huang
---
V5:
- Added script with automatic results checking, removed the interactive script.
- The script can run independently of the series below.
V4: Note: Need to apply on top of this series previously reviewed: https://lore.kernel.org/linux-sgx/20220905020411.17290-1-jarkko@kernel.org/ --- .../selftests/sgx/run_epc_cg_selftests.sh | 147 ++++++++++++++++++ .../selftests/sgx/watch_misc_for_tests.sh | 13 ++ 2 files changed, 160 insertions(+) create mode 100755 tools/testing/selftests/sgx/run_epc_cg_selftests.sh create mode 100755 tools/testing/selftests/sgx/watch_misc_for_tests.sh diff --git a/tools/testing/selftests/sgx/run_epc_cg_selftests.sh b/tools/testing/selftests/sgx/run_epc_cg_selftests.sh new file mode 100755 index 000000000000..410c97ee6e18 --- /dev/null +++ b/tools/testing/selftests/sgx/run_epc_cg_selftests.sh @@ -0,0 +1,147 @@ +#!/bin/bash +# SPDX-License-Identifier: GPL-2.0 +# Copyright(c) 2023 Intel Corporation. + +TEST_ROOT_CG=selftest +cgcreate -g misc:$TEST_ROOT_CG +if [ $? -ne 0 ]; then + echo "# Please make sure cgroup-tools is installed, and misc cgroup is mounted." + exit 1 +fi +TEST_CG_SUB1=$TEST_ROOT_CG/test1 +TEST_CG_SUB2=$TEST_ROOT_CG/test2 +TEST_CG_SUB3=$TEST_ROOT_CG/test1/test3 +TEST_CG_SUB4=$TEST_ROOT_CG/test4 + +cgcreate -g misc:$TEST_CG_SUB1 +cgcreate -g misc:$TEST_CG_SUB2 +cgcreate -g misc:$TEST_CG_SUB3 +cgcreate -g misc:$TEST_CG_SUB4 + +# Default to V2 +CG_ROOT=/sys/fs/cgroup +if [ ! -d "/sys/fs/cgroup/misc" ]; then + echo "# cgroup V2 is in use." +else + echo "# cgroup V1 is in use." + CG_ROOT=/sys/fs/cgroup/misc +fi + +CAPACITY=$(grep "sgx_epc" "$CG_ROOT/misc.capacity" | awk '{print $2}') +# This is below number of VA pages needed for enclave of capacity size. So +# should fail oversubscribed cases +SMALL=$(( CAPACITY / 512 )) + +# At least load one enclave of capacity size successfully, maybe up to 4. +# But some may fail if we run more than 4 concurrent enclaves of capacity size. +LARGE=$(( SMALL * 4 )) + +# Load lots of enclaves +LARGER=$CAPACITY +echo "# Setting up limits." 
+echo "sgx_epc $SMALL" | tee $CG_ROOT/$TEST_CG_SUB1/misc.max +echo "sgx_epc $LARGE" | tee $CG_ROOT/$TEST_CG_SUB2/misc.max +echo "sgx_epc $LARGER" | tee $CG_ROOT/$TEST_CG_SUB4/misc.max + +timestamp=$(date +%Y%m%d_%H%M%S) + +test_cmd="./test_sgx -t unclobbered_vdso_oversubscribed" + +echo "# Start unclobbered_vdso_oversubscribed with SMALL limit, expecting failure..." +# Always use leaf node of misc cgroups so it works for both v1 and v2 +# these may fail on OOM +cgexec -g misc:$TEST_CG_SUB3 $test_cmd >cgtest_small_$timestamp.log 2>&1 +if [[ $? -eq 0 ]]; then + echo "# Fail on SMALL limit, not expecting any test passes." + cgdelete -r -g misc:$TEST_ROOT_CG + exit 1 +else + echo "# Test failed as expected." +fi + +echo "# PASSED SMALL limit." + +echo "# Start 4 concurrent unclobbered_vdso_oversubscribed tests with LARGE limit, expecting at least one success...." +pids=() +for i in {1..4}; do + ( + cgexec -g misc:$TEST_CG_SUB2 $test_cmd >cgtest_large_positive_$timestamp.$i.log 2>&1 + ) & + pids+=($!) +done + +any_success=0 +for pid in "${pids[@]}"; do + wait "$pid" + status=$? + if [[ $status -eq 0 ]]; then + any_success=1 + echo "# Process $pid returned successfully." + fi +done + +if [[ $any_success -eq 0 ]]; then + echo "# Failed on LARGE limit positive testing, no test passes." + cgdelete -r -g misc:$TEST_ROOT_CG + exit 1 +fi + +echo "# PASSED LARGE limit positive testing." + +echo "# Start 5 concurrent unclobbered_vdso_oversubscribed tests with LARGE limit, expecting at least one failure...." +pids=() +for i in {1..5}; do + ( + cgexec -g misc:$TEST_CG_SUB2 $test_cmd >cgtest_large_negative_$timestamp.$i.log 2>&1 + ) & + pids+=($!) +done + +any_failure=0 +for pid in "${pids[@]}"; do + wait "$pid" + status=$? + if [[ $status -ne 0 ]]; then + echo "# Process $pid returned failure." + any_failure=1 + fi +done + +if [[ $any_failure -eq 0 ]]; then + echo "# Failed on LARGE limit negative testing, no test fails." 
+ cgdelete -r -g misc:$TEST_ROOT_CG + exit 1 +fi + +echo "# PASSED LARGE limit negative testing." + +echo "# Start 10 concurrent unclobbered_vdso_oversubscribed tests with LARGER limit, expecting no failure...." +pids=() +for i in {1..10}; do + ( + cgexec -g misc:$TEST_CG_SUB4 $test_cmd >cgtest_larger_$timestamp.$i.log 2>&1 + ) & + pids+=($!) +done + +any_failure=0 +for pid in "${pids[@]}"; do + wait "$pid" + status=$? + if [[ $status -ne 0 ]]; then + echo "# Process $pid returned failure." + any_failure=1 + fi +done + +if [[ $any_failure -ne 0 ]]; then + echo "# Failed on LARGER limit, at least one test fails." + cgdelete -r -g misc:$TEST_ROOT_CG + exit 1 +fi + +echo "# PASSED LARGER limit tests." + +echo "# PASSED ALL cgroup limit tests, cleanup cgroups..." +cgdelete -r -g misc:$TEST_ROOT_CG +echo "# done." diff --git a/tools/testing/selftests/sgx/watch_misc_for_tests.sh b/tools/testing/selftests/sgx/watch_misc_for_tests.sh new file mode 100755 index 000000000000..dbd38f346e7b --- /dev/null +++ b/tools/testing/selftests/sgx/watch_misc_for_tests.sh @@ -0,0 +1,13 @@ +#!/bin/bash +# SPDX-License-Identifier: GPL-2.0 +# Copyright(c) 2023 Intel Corporation. + +if [ -z "$1" ] + then + echo "No argument supplied, please provide 'max', 'current' or 'events'" + exit 1 +fi + +watch -n 1 "find /sys/fs/cgroup -wholename \"*/test*/misc.$1\" -exec sh -c \ + 'echo \"\$1:\"; cat \"\$1\"' _ {} \;" +
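The launch-and-wait loop that run_epc_cg_selftests.sh repeats for each limit could be factored into a single helper. A minimal sketch, assuming a hypothetical function name `count_failures` that is not part of the patch; it starts a given number of copies of a command in the background, waits for each one, and prints how many exited non-zero:

```shell
#!/bin/bash
# Sketch only: count_failures is a hypothetical helper, not part of the patch.

# Usage: count_failures COUNT COMMAND [ARGS...]
# Starts COUNT background copies of COMMAND and prints the number of copies
# that exited with a non-zero status.
count_failures() {
    local count=$1
    shift
    local pids=() failures=0 pid i

    for (( i = 0; i < count; i++ )); do
        # Output is discarded here; the original script logs to files instead.
        "$@" >/dev/null 2>&1 &
        pids+=("$!")
    done

    # wait returns the exit status of the given child, so each failure is
    # counted exactly once.
    for pid in "${pids[@]}"; do
        wait "$pid" || failures=$(( failures + 1 ))
    done

    echo "$failures"
}
```

With such a helper, the LARGER check would reduce to comparing the output of `count_failures 10 cgexec -g misc:$TEST_CG_SUB4 $test_cmd` against 0, at the cost of the per-run log files that the script currently writes.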