Message ID | 20230923030657.16148-10-haitao.huang@linux.intel.com |
---|---|
State | New |
Headers |
Return-Path: <linux-kernel-owner@vger.kernel.org> Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:cae8:0:b0:403:3b70:6f57 with SMTP id r8csp72113vqu; Sat, 23 Sep 2023 02:20:23 -0700 (PDT) X-Google-Smtp-Source: AGHT+IFuIRSOGiZEtXLf2CfT5qqnGfXkZ1B3sSxExiPTO+34h6/ZYB2iwPi26+MWclNVzZpw2D+H X-Received: by 2002:a05:6a00:1142:b0:690:1720:aa83 with SMTP id b2-20020a056a00114200b006901720aa83mr2170187pfm.21.1695460823177; Sat, 23 Sep 2023 02:20:23 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1695460823; cv=none; d=google.com; s=arc-20160816; b=xhsnYi7roMR4sYBv0yPXt/VnBvJUMtPkNcNFkBZxr9YPmj0dZm5MElE+6NChh6gWid bI0PJ5iXAbEYjvpXyT6efKzCbixmK7eK6lkK2LLFl4PsrgviGyERO2HnR2TkBEDywqUD 3OK6sGXzMzIWDybCrKCtKU4r2++ElT2DLZUiJ7jxs+LBoUOw1k42oBv8hHslRa32MUjE jsZdzE9iahfGIGlvKW1KuNv9nx/lhuY6iq9eEUjGm0mdficvMIAsJ3dIEJ4qHWzc2cPZ L/judD1N7cK4lasUf4de8Oxb0YIIqREor71b1IUdmy97VqBdi7giOQvsJgztGREshN8t u1tg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=EDB3EXCsqLvDYTGlYabBHLAt2PxBptC/EVFXVM0Jsq8=; fh=j8PE345l5Ydlo3KwK7JeWnjqRgjiq4AteUoOZeOwa0I=; b=xVjmc+gJGZtTRFKNbUCWqRN1VqUcQtlezrCeCsBA1UovzkuKlNkNiyYqKj4sZmA+Dh x1dN60X6AelIW1i8o4e6ZIdc6KiNn6B7xKB8Fy43h3EkZ2ZRJtOEvgaTO1z7jv/BM0ki icaLZCZk1rMq90h4oVjWK5a7mkGVuRoMzzYQHgQ4TCHw2vwwGC1FmyrWzf7CAFoqTQkX NCRYNQGIXnfbcQSxJfhdfAhEah90HF9NWUpp56vrZ/kqOpb8wbBAEMUmZF3JZ6H9PLJu oavr+MHQDjWnqavShTS59XQmB5QkozOKBZ8+7YJjR1cDC9FdbmYv2wBPmjvADbycIEtz DQyw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=UGcAPNMK; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::3:7 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: from snail.vger.email (snail.vger.email. [2620:137:e000::3:7]) by mx.google.com with ESMTPS id w8-20020a63f508000000b0055391572218si5485544pgh.26.2023.09.23.02.20.22 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sat, 23 Sep 2023 02:20:23 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::3:7 as permitted sender) client-ip=2620:137:e000::3:7; Authentication-Results: mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=UGcAPNMK; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::3:7 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: from out1.vger.email (depot.vger.email [IPv6:2620:137:e000::3:0]) by snail.vger.email (Postfix) with ESMTP id B79FE80ADF22; Fri, 22 Sep 2023 20:08:07 -0700 (PDT) X-Virus-Status: Clean X-Virus-Scanned: clamav-milter 0.103.10 at snail.vger.email Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229919AbjIWDHp (ORCPT <rfc822;pwkd43@gmail.com> + 28 others); Fri, 22 Sep 2023 23:07:45 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:40124 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229892AbjIWDHM (ORCPT <rfc822;linux-kernel@vger.kernel.org>); Fri, 22 Sep 2023 23:07:12 -0400 Received: from mgamail.intel.com (mgamail.intel.com [134.134.136.100]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 87DAE1A8; Fri, 22 Sep 2023 20:07:05 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1695438425; x=1726974425; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=CPVq7hMKhTxLnKfdXR7Q7yMpcir8DY5XsL5VtYGW0ew=; b=UGcAPNMKOqBSrrtcEHqIYv7pf84DCxGXAqwrmd0+u9QtO/Aw004XgPWr Ue7jVJvi1e1eSbT3G7YBJQp4BN5Oo9RR6JJi+i9lPrI9JyHT+YoKNWB4g 3C3mPZ6MCG+8n2RYMFKmYZdBgpaX8ZyrEx+h1hHwPLDlwwBJ1OHitT2MC kxCjQSBIF1ImeZqh+IHyqCdc5cm6bOEYicIUEJp7A12grMLWY7YqfVvt4 cZhoJrwykFjoGWBrKLi0SfW0rkxImBWeBptIgnmD4E40Lm2qn40TNAWQ8 MKBLRtdOgylAzZMErwENispd5wHnBXczNPZeeKZxekJQpo/FWB5CSpoiZ g==; X-IronPort-AV: E=McAfee;i="6600,9927,10841"; a="447466787" X-IronPort-AV: E=Sophos;i="6.03,169,1694761200"; d="scan'208";a="447466787" Received: from fmsmga001.fm.intel.com ([10.253.24.23]) by orsmga105.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 22 Sep 2023 20:07:05 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10841"; a="891048555" X-IronPort-AV: E=Sophos;i="6.03,169,1694761200"; d="scan'208";a="891048555" Received: from b4969161e530.jf.intel.com ([10.165.56.46]) by fmsmga001.fm.intel.com with ESMTP; 22 Sep 2023 20:06:08 -0700 From: Haitao Huang <haitao.huang@linux.intel.com> To: jarkko@kernel.org, dave.hansen@linux.intel.com, tj@kernel.org, linux-kernel@vger.kernel.org, linux-sgx@vger.kernel.org, x86@kernel.org, cgroups@vger.kernel.org, tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, hpa@zytor.com, sohil.mehta@intel.com Cc: zhiquan1.li@intel.com, kristen@linux.intel.com, seanjc@google.com, zhanb@microsoft.com, anakrish@microsoft.com, mikko.ylinen@linux.intel.com, yangjie@microsoft.com Subject: [PATCH v5 09/18] x86/sgx: Store struct sgx_encl when allocating new VA pages Date: Fri, 22 Sep 2023 20:06:48 -0700 Message-Id: <20230923030657.16148-10-haitao.huang@linux.intel.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20230923030657.16148-1-haitao.huang@linux.intel.com> References: <20230923030657.16148-1-haitao.huang@linux.intel.com> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Spam-Status: No, score=-2.0 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_EF,RCVD_IN_DNSWL_BLOCKED, SPF_HELO_NONE,SPF_NONE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: <linux-kernel.vger.kernel.org> X-Mailing-List: linux-kernel@vger.kernel.org X-Greylist: Sender passed SPF test, not delayed by milter-greylist-4.6.4 (snail.vger.email [0.0.0.0]); Fri, 22 Sep 2023 20:08:07 -0700 (PDT) X-getmail-retrieved-from-mailbox: INBOX X-GMAIL-THRID: 1777819528350105035 X-GMAIL-MSGID: 1777819528350105035 |
Series |
Add Cgroup support for SGX EPC memory
|
|
Commit Message
Haitao Huang
Sept. 23, 2023, 3:06 a.m. UTC
From: Sean Christopherson <sean.j.christopherson@intel.com> In a later patch, when a cgroup has exceeded the max capacity for EPC pages, it may need to identify and OOM kill a less active enclave to make room for other enclaves within the same group. Such a victim enclave would have no active pages other than the unreclaimable Version Array (VA) and SECS pages. Therefore, the cgroup needs examine its unreclaimable page list, and finding an enclave given a SECS page or a VA page. This will require a backpointer from a page to an enclave, which is not available for VA pages. Because struct sgx_epc_page instances of VA pages are not owned by an sgx_encl_page instance, mark their owner as sgx_encl: pass the struct sgx_encl of the enclave allocating the VA page to sgx_alloc_epc_page(), which will store this value in the owner field of the struct sgx_epc_page. In a later patch, VA pages will be placed in an unreclaimable queue that can be examined by the cgroup to select the OOM killed enclave. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Co-developed-by: Kristen Carlson Accardi <kristen@linux.intel.com> Signed-off-by: Kristen Carlson Accardi <kristen@linux.intel.com> Co-developed-by: Haitao Huang <haitao.huang@linux.intel.com> Signed-off-by: Haitao Huang <haitao.huang@linux.intel.com> Cc: Sean Christopherson <seanjc@google.com> --- V5: - Fixed some comments in code (Jarkko) V4: - Changes needed for patch reordering - Revised commit messages (Jarkko) --- arch/x86/kernel/cpu/sgx/encl.c | 5 +++-- arch/x86/kernel/cpu/sgx/encl.h | 2 +- arch/x86/kernel/cpu/sgx/ioctl.c | 2 +- arch/x86/kernel/cpu/sgx/main.c | 20 ++++++++++---------- arch/x86/kernel/cpu/sgx/sgx.h | 7 ++++++- 5 files changed, 21 insertions(+), 15 deletions(-)
Comments
On Fri, 2023-09-22 at 20:06 -0700, Haitao Huang wrote: > From: Sean Christopherson <sean.j.christopherson@intel.com> > > In a later patch, when a cgroup has exceeded the max capacity for EPC > pages, it may need to identify and OOM kill a less active enclave to > make room for other enclaves within the same group. Such a victim > enclave would have no active pages other than the unreclaimable Version > Array (VA) and SECS pages. Therefore, the cgroup needs examine its > unreclaimable page list, and finding an enclave given a SECS page or a > VA page. This will require a backpointer from a page to an enclave, > which is not available for VA pages. > > Because struct sgx_epc_page instances of VA pages are not owned by an > sgx_encl_page instance, mark their owner as sgx_encl: pass the struct > sgx_encl of the enclave allocating the VA page to sgx_alloc_epc_page(), > which will store this value in the owner field of the struct > sgx_epc_page. In a later patch, VA pages will be placed in an > unreclaimable queue that can be examined by the cgroup to select the OOM > killed enclave. > > Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> > Co-developed-by: Kristen Carlson Accardi <kristen@linux.intel.com> > Signed-off-by: Kristen Carlson Accardi <kristen@linux.intel.com> > Co-developed-by: Haitao Huang <haitao.huang@linux.intel.com> > Signed-off-by: Haitao Huang <haitao.huang@linux.intel.com> > Cc: Sean Christopherson <seanjc@google.com> > [...] > @@ -562,7 +562,7 @@ struct sgx_epc_page *sgx_alloc_epc_page(void *owner, bool reclaim) > for ( ; ; ) { > page = __sgx_alloc_epc_page(); > if (!IS_ERR(page)) { > - page->owner = owner; > + page->encl_page = owner; Looks using 'encl_page' is arbitrary. Also actually for virtual EPC page the owner is set to the 'sgx_vepc' instance. > break; > } > > @@ -607,7 +607,7 @@ void sgx_free_epc_page(struct sgx_epc_page *page) > > spin_lock(&node->lock); > > - page->owner = NULL; > + page->encl_page = NULL; Ditto. > if (page->poison) > list_add(&page->list, &node->sgx_poison_page_list); > else > @@ -642,7 +642,7 @@ static bool __init sgx_setup_epc_section(u64 phys_addr, u64 size, > for (i = 0; i < nr_pages; i++) { > section->pages[i].section = index; > section->pages[i].flags = 0; > - section->pages[i].owner = NULL; > + section->pages[i].encl_page = NULL; > section->pages[i].poison = 0; > list_add_tail(§ion->pages[i].list, &sgx_dirty_page_list); > } > diff --git a/arch/x86/kernel/cpu/sgx/sgx.h b/arch/x86/kernel/cpu/sgx/sgx.h > index 764cec23f4e5..5110dd433b80 100644 > --- a/arch/x86/kernel/cpu/sgx/sgx.h > +++ b/arch/x86/kernel/cpu/sgx/sgx.h > @@ -68,7 +68,12 @@ struct sgx_epc_page { > unsigned int section; > u16 flags; > u16 poison; > - struct sgx_encl_page *owner; > + > + /* Possible owner types */ > + union { > + struct sgx_encl_page *encl_page; > + struct sgx_encl *encl; > + }; Sadly for virtual EPC page the owner is set to the 'sgx_vepc' instance it belongs to. Given how sgx_{alloc|free}_epc_page() arbitrarily uses encl_page, perhaps we should do below? union { struct sgx_encl_page *encl_page; struct sgx_encl *encl; struct sgx_vepc *vepc; void *owner; }; And in sgx_{alloc|free}_epc_page() we can use 'owner' instead.
On Fri, 2023-09-22 at 20:06 -0700, Haitao Huang wrote: > From: Sean Christopherson <sean.j.christopherson@intel.com> > > In a later patch, when a cgroup has exceeded the max capacity for EPC > pages, it may need to identify and OOM kill a less active enclave to > make room for other enclaves within the same group. Such a victim > enclave would have no active pages other than the unreclaimable Version > Array (VA) and SECS pages. > What does "no active pages" mean? A "less active enclave" doesn't necessarily mean it has "no active pages"? > Therefore, the cgroup needs examine its ^ needs to > unreclaimable page list, and finding an enclave given a SECS page or a ^ find > VA page. This will require a backpointer from a page to an enclave, > which is not available for VA pages. > > Because struct sgx_epc_page instances of VA pages are not owned by an > sgx_encl_page instance, mark their owner as sgx_encl: pass the struct > sgx_encl of the enclave allocating the VA page to sgx_alloc_epc_page(), > which will store this value in the owner field of the struct > sgx_epc_page. > IMHO this paragraph is hard to understand and can be more concise: One VA page can be shared by multiple enclave pages thus cannot be associated with any 'struct sgx_encl_page' instance. Set the owner of VA page to the enclave instead. > In a later patch, VA pages will be placed in an > unreclaimable queue that can be examined by the cgroup to select the OOM > killed enclave. The code to "place the VA page to unreclaimable queue" has been done in earlier patch ("x86/sgx: Introduce EPC page states"). Just the unreclaimable list isn't introduced yet. I think you should just introduce it first then you can get rid of those "in a later patch" staff. And nit: please use "unreclaimable list" consistently (not queue). Btw, probably a dumb question: Theoretically if you only need to find a victim enclave you don't need to put VA pages to the unreclaimable list, because those VA pages will be freed anyway when enclave is killed. So keeping VA pages in the list is for accounting all the pages that the cgroup is having?
Hi Kai, On Wed, 27 Sep 2023 06:14:20 -0500, Huang, Kai <kai.huang@intel.com> wrote: > On Fri, 2023-09-22 at 20:06 -0700, Haitao Huang wrote: >> From: Sean Christopherson <sean.j.christopherson@intel.com> >> >> In a later patch, when a cgroup has exceeded the max capacity for EPC >> pages, it may need to identify and OOM kill a less active enclave to >> make room for other enclaves within the same group. Such a victim >> enclave would have no active pages other than the unreclaimable Version >> Array (VA) and SECS pages. Therefore, the cgroup needs examine its >> unreclaimable page list, and finding an enclave given a SECS page or a >> VA page. This will require a backpointer from a page to an enclave, >> which is not available for VA pages. >> >> Because struct sgx_epc_page instances of VA pages are not owned by an >> sgx_encl_page instance, mark their owner as sgx_encl: pass the struct >> sgx_encl of the enclave allocating the VA page to sgx_alloc_epc_page(), >> which will store this value in the owner field of the struct >> sgx_epc_page. In a later patch, VA pages will be placed in an >> unreclaimable queue that can be examined by the cgroup to select the OOM >> killed enclave. >> >> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> >> Co-developed-by: Kristen Carlson Accardi <kristen@linux.intel.com> >> Signed-off-by: Kristen Carlson Accardi <kristen@linux.intel.com> >> Co-developed-by: Haitao Huang <haitao.huang@linux.intel.com> >> Signed-off-by: Haitao Huang <haitao.huang@linux.intel.com> >> Cc: Sean Christopherson <seanjc@google.com> >> > > [...] > >> @@ -562,7 +562,7 @@ struct sgx_epc_page *sgx_alloc_epc_page(void >> *owner, bool reclaim) >> for ( ; ; ) { >> page = __sgx_alloc_epc_page(); >> if (!IS_ERR(page)) { >> - page->owner = owner; >> + page->encl_page = owner; > > Looks using 'encl_page' is arbitrary. > > Also actually for virtual EPC page the owner is set to the 'sgx_vepc' > instance. > >> break; >> } >> >> @@ -607,7 +607,7 @@ void sgx_free_epc_page(struct sgx_epc_page *page) >> >> spin_lock(&node->lock); >> >> - page->owner = NULL; >> + page->encl_page = NULL; > > Ditto. > >> if (page->poison) >> list_add(&page->list, &node->sgx_poison_page_list); >> else >> @@ -642,7 +642,7 @@ static bool __init sgx_setup_epc_section(u64 >> phys_addr, u64 size, >> for (i = 0; i < nr_pages; i++) { >> section->pages[i].section = index; >> section->pages[i].flags = 0; >> - section->pages[i].owner = NULL; >> + section->pages[i].encl_page = NULL; >> section->pages[i].poison = 0; >> list_add_tail(§ion->pages[i].list, &sgx_dirty_page_list); >> } >> diff --git a/arch/x86/kernel/cpu/sgx/sgx.h >> b/arch/x86/kernel/cpu/sgx/sgx.h >> index 764cec23f4e5..5110dd433b80 100644 >> --- a/arch/x86/kernel/cpu/sgx/sgx.h >> +++ b/arch/x86/kernel/cpu/sgx/sgx.h >> @@ -68,7 +68,12 @@ struct sgx_epc_page { >> unsigned int section; >> u16 flags; >> u16 poison; >> - struct sgx_encl_page *owner; >> + >> + /* Possible owner types */ >> + union { >> + struct sgx_encl_page *encl_page; >> + struct sgx_encl *encl; >> + }; > > Sadly for virtual EPC page the owner is set to the 'sgx_vepc' instance it > belongs to. > > Given how sgx_{alloc|free}_epc_page() arbitrarily uses encl_page, > perhaps we > should do below? > > union { > struct sgx_encl_page *encl_page; > struct sgx_encl *encl; > struct sgx_vepc *vepc; > void *owner; > }; > > And in sgx_{alloc|free}_epc_page() we can use 'owner' instead. > As I mentioned in cover letter and change log in 11/18, this series does not track virtual EPC. We can add vepc field into the union in future if such tracking is needed. Don't think "void *owner" is needed though. Thanks Haitao
On Wed, 2023-09-27 at 10:35 -0500, Haitao Huang wrote: > > > + > > > + /* Possible owner types */ > > > + union { > > > + struct sgx_encl_page *encl_page; > > > + struct sgx_encl *encl; > > > + }; > > > > Sadly for virtual EPC page the owner is set to the 'sgx_vepc' instance it > > belongs to. > > > > Given how sgx_{alloc|free}_epc_page() arbitrarily uses encl_page, > > perhaps we > > should do below? > > > > union { > > struct sgx_encl_page *encl_page; > > struct sgx_encl *encl; > > struct sgx_vepc *vepc; > > void *owner; > > }; > > > > And in sgx_{alloc|free}_epc_page() we can use 'owner' instead. > > > > As I mentioned in cover letter and change log in 11/18, this series does > not track virtual EPC. It's not about how does the cover letter says. We cannot ignore the fact that currently virtual EPC uses owner too. But given the virtual EPC code currently doesn't use the owner, I can live with not having the 'vepc' member in the union now. > We can add vepc field into the union in future if such tracking is needed. > Don't think "void *owner" is needed though. As mentioned, using 'encl_page' arbitrarily in sgx_alloc_epc_page() doesn't look nice. Do you have example in the current kernel code to prove it is acceptable?
On Wed, 27 Sep 2023 16:21:19 -0500, Huang, Kai <kai.huang@intel.com> wrote: > On Wed, 2023-09-27 at 10:35 -0500, Haitao Huang wrote: >> > > + >> > > + /* Possible owner types */ >> > > + union { >> > > + struct sgx_encl_page *encl_page; >> > > + struct sgx_encl *encl; >> > > + }; >> > >> > Sadly for virtual EPC page the owner is set to the 'sgx_vepc' >> instance it >> > belongs to. >> > >> > Given how sgx_{alloc|free}_epc_page() arbitrarily uses encl_page,> >> perhaps we >> > should do below? >> > >> > union { >> > struct sgx_encl_page *encl_page; >> > struct sgx_encl *encl; >> > struct sgx_vepc *vepc; >> > void *owner; >> > }; >> > >> > And in sgx_{alloc|free}_epc_page() we can use 'owner' instead. >> > >> >> As I mentioned in cover letter and change log in 11/18, this series does >> not track virtual EPC. > > It's not about how does the cover letter says. We cannot ignore the > fact that > currently virtual EPC uses owner too. > > But given the virtual EPC code currently doesn't use the owner, I can > live with > not having the 'vepc' member in the union now. Ah, I forgot even though we don't use the owner field assigned by vepc, it is still passed into sgx_alloc/free_epc_page(). Will add back "void* owner" and use it for assignment inside sgx_alloc/free_epc_page(). Thanks Haitao
On Fri, 2023-09-29 at 10:06 -0500, Haitao Huang wrote: > On Wed, 27 Sep 2023 16:21:19 -0500, Huang, Kai <kai.huang@intel.com> wrote: > > > On Wed, 2023-09-27 at 10:35 -0500, Haitao Huang wrote: > > > > > + > > > > > + /* Possible owner types */ > > > > > + union { > > > > > + struct sgx_encl_page *encl_page; > > > > > + struct sgx_encl *encl; > > > > > + }; > > > > > > > > Sadly for virtual EPC page the owner is set to the 'sgx_vepc' > > > instance it > > > > belongs to. > > > > > > > > Given how sgx_{alloc|free}_epc_page() arbitrarily uses encl_page,> > > > perhaps we > > > > should do below? > > > > > > > > union { > > > > struct sgx_encl_page *encl_page; > > > > struct sgx_encl *encl; > > > > struct sgx_vepc *vepc; > > > > void *owner; > > > > }; > > > > > > > > And in sgx_{alloc|free}_epc_page() we can use 'owner' instead. > > > > > > > > > > As I mentioned in cover letter and change log in 11/18, this series does > > > not track virtual EPC. > > > > It's not about how does the cover letter says. We cannot ignore the > > fact that > > currently virtual EPC uses owner too. > > > > But given the virtual EPC code currently doesn't use the owner, I can > > live with > > not having the 'vepc' member in the union now. > > Ah, I forgot even though we don't use the owner field assigned by vepc, it > is still passed into sgx_alloc/free_epc_page(). > > Will add back "void* owner" and use it for assignment inside > sgx_alloc/free_epc_page(). > > And also sgx_setup_epc_section().
On Wed, 27 Sep 2023 06:35:57 -0500, Huang, Kai <kai.huang@intel.com> wrote: > On Fri, 2023-09-22 at 20:06 -0700, Haitao Huang wrote: >> From: Sean Christopherson <sean.j.christopherson@intel.com> >> >> In a later patch, when a cgroup has exceeded the max capacity for EPC >> pages, it may need to identify and OOM kill a less active enclave to >> make room for other enclaves within the same group. Such a victim >> enclave would have no active pages other than the unreclaimable Version >> Array (VA) and SECS pages. > > What does "no active pages" mean? > EPC pages in use. > A "less active enclave" doesn't necessarily mean it has "no active > pages"? > I'll rephrase the above sentences > >> Therefore, the cgroup needs examine its > ^ > needs to > >> unreclaimable page list, and finding an enclave given a SECS page or a > ^ > find > >> VA page. This will require a backpointer from a page to an enclave, >> which is not available for VA pages. >> >> Because struct sgx_epc_page instances of VA pages are not owned by an >> sgx_encl_page instance, mark their owner as sgx_encl: pass the struct >> sgx_encl of the enclave allocating the VA page to sgx_alloc_epc_page(), >> which will store this value in the owner field of the struct >> sgx_epc_page. > > IMHO this paragraph is hard to understand and can be more concise: > > One VA page can be shared by multiple enclave pages thus cannot be > associated > with any 'struct sgx_encl_page' instance. Set the owner of VA page to > the > enclave instead. > > Agreed >> In a later patch, VA pages will be placed in an >> unreclaimable queue that can be examined by the cgroup to select the OOM >> killed enclave. > > The code to "place the VA page to unreclaimable queue" has been done in > earlier > patch ("x86/sgx: Introduce EPC page states"). Just the unreclaimable > list isn't > introduced yet. I think you should just introduce it first then you can > get rid > of those "in a later patch" staff. > I hope I was able to clarify to you in other threads that VA pages are not placed in any queue/list until [PATCH v5 11/18] x86/sgx: store unreclaimable pages in LRU lists. This patch is the first one to implement tracking for unreclaimable pages. I'll add that as a transition hint. > And nit: please use "unreclaimable list" consistently (not queue). > Yes will do > > Btw, probably a dumb question: > > Theoretically if you only need to find a victim enclave you don't need > to put VA > pages to the unreclaimable list, because those VA pages will be freed > anyway > when enclave is killed. So keeping VA pages in the list is for > accounting all > the pages that the cgroup is having? Yes basically tracking them in cgroups as they are allocated. VAs and SECS may also come and go as swapping/unswapping happens. But if a cgroup is OOM, and all reclaimables are gone (swapped out), it'd have to reclaim VAs/SECs in the same cgroup starting from the front of the LRU list. To reclaim a VA/SECS, it identifies the enclave from the owner of the VA/SECS page and kills it, as killing enclave is the only way to reclaim VA/SECS pages.
On Tue, 2023-10-03 at 01:45 -0500, Haitao Huang wrote: > > > > Btw, probably a dumb question: > > > > Theoretically if you only need to find a victim enclave you don't need > > to put VA > > pages to the unreclaimable list, because those VA pages will be freed > > anyway > > when enclave is killed. So keeping VA pages in the list is for > > accounting all > > the pages that the cgroup is having? > > Yes basically tracking them in cgroups as they are allocated. > > VAs and SECS may also come and go as swapping/unswapping happens. But if a > cgroup is OOM, and all reclaimables are gone (swapped out), it'd have to > reclaim VAs/SECs in the same cgroup starting from the front of the LRU > list. To reclaim a VA/SECS, it identifies the enclave from the owner of > the VA/SECS page and kills it, as killing enclave is the only way to > reclaim VA/SECS pages. To kill enclave you just need to track SECS in the unreclaimable list. Only when you want to account the total EPC pages via some list you _probably_ need to track VA as well. But I am not quite sure about this either.
On Tue, 03 Oct 2023 15:07:42 -0500, Huang, Kai <kai.huang@intel.com> wrote: > On Tue, 2023-10-03 at 01:45 -0500, Haitao Huang wrote: >> > >> > Btw, probably a dumb question: >> > >> > Theoretically if you only need to find a victim enclave you don't need >> > to put VA >> > pages to the unreclaimable list, because those VA pages will be freed >> > anyway >> > when enclave is killed. So keeping VA pages in the list is for> >> accounting all >> > the pages that the cgroup is having? >> >> Yes basically tracking them in cgroups as they are allocated. >> >> VAs and SECS may also come and go as swapping/unswapping happens. But >> if acgroup is OOM, and all reclaimables are gone (swapped out), it'd >> have toreclaim VAs/SECs in the same cgroup starting from the front of >> the LRUlist. To reclaim a VA/SECS, it identifies the enclave from the >> owner ofthe VA/SECS page and kills it, as killing enclave is the only >> way toreclaim VA/SECS pages. > > To kill enclave you just need to track SECS in the unreclaimable list. > Only when you want to account the total EPC pages via some list you > _probably_ > need to track VA as well. But I am not quite sure about this either. There is a case where even SECS is paged out for an enclave with all reclaimables out. So cgroup needs to track each page used by an enclave and kill enclave when cgroup needs to lower usage by evicting an VA or SECS page. There were some discussion on paging out VAs without killing enclaves but it'd be complicated and not implemented yet. BTW, I need clarify tracking pages which is done by LRUs vs usage accounting which is done by charge/uncharge to misc. To me tracking is for reclaiming not accounting. Also vEPCs not tracked at all but they are accounted for. Haitao
On Wed, 2023-10-04 at 10:03 -0500, Haitao Huang wrote: > On Tue, 03 Oct 2023 15:07:42 -0500, Huang, Kai <kai.huang@intel.com> wrote: > > > On Tue, 2023-10-03 at 01:45 -0500, Haitao Huang wrote: > > > > > > > > Btw, probably a dumb question: > > > > > > > > Theoretically if you only need to find a victim enclave you don't need > > > > to put VA > > > > pages to the unreclaimable list, because those VA pages will be freed > > > > anyway > > > > when enclave is killed. So keeping VA pages in the list is for> > > > accounting all > > > > the pages that the cgroup is having? > > > > > > Yes basically tracking them in cgroups as they are allocated. > > > > > > VAs and SECS may also come and go as swapping/unswapping happens. But > > > if acgroup is OOM, and all reclaimables are gone (swapped out), it'd > > > have toreclaim VAs/SECs in the same cgroup starting from the front of > > > the LRUlist. To reclaim a VA/SECS, it identifies the enclave from the > > > owner ofthe VA/SECS page and kills it, as killing enclave is the only > > > way toreclaim VA/SECS pages. > > > > To kill enclave you just need to track SECS in the unreclaimable list. > > Only when you want to account the total EPC pages via some list you > > _probably_ > > need to track VA as well. But I am not quite sure about this either. > > There is a case where even SECS is paged out for an enclave with all > reclaimables out. > Yes. But this essentially means these enclaves are not active, thus shouldn't be the victim of OOM? > So cgroup needs to track each page used by an enclave > and kill enclave when cgroup needs to lower usage by evicting an VA or > SECS page. Let's discuss more on tracking SECS on unreclaimable list only. Could we assume that when the OOM wants to pick up a victim to serve the new enclave, there must be at least another one *active* enclave which still has the SECS page in EPC? If yes, that enclave will be selected as victim. If not, then no other enclave will be selected as victim. Instead, only the new enclave which is requesting more EPC will be selected, because it's SECS is on the unreclaimable list. Somehow this is unacceptable, thus we need to track VA pages too in order to kill other inactive enclave? > There were some discussion on paging out VAs without killing enclaves but > it'd be complicated and not implemented yet. No we don't involve swapping VA pages now. It's a separate topic. > > BTW, I need clarify tracking pages which is done by LRUs vs usage > accounting which is done by charge/uncharge to misc. To me tracking is for > reclaiming not accounting. Also vEPCs not tracked at all but they are > accounted for. I'll review the rest patches. Thanks.
On Wed, 04 Oct 2023 16:13:41 -0500, Huang, Kai <kai.huang@intel.com> wrote: > On Wed, 2023-10-04 at 10:03 -0500, Haitao Huang wrote: >> On Tue, 03 Oct 2023 15:07:42 -0500, Huang, Kai <kai.huang@intel.com> >> wrote: >> >> > On Tue, 2023-10-03 at 01:45 -0500, Haitao Huang wrote: >> > > > >> > > > Btw, probably a dumb question: >> > > > >> > > > Theoretically if you only need to find a victim enclave you don't >> need >> > > > to put VA >> > > > pages to the unreclaimable list, because those VA pages will be >> freed >> > > > anyway >> > > > when enclave is killed. So keeping VA pages in the list is for> >> > > accounting all >> > > > the pages that the cgroup is having? >> > > >> > > Yes basically tracking them in cgroups as they are allocated. >> > > >> > > VAs and SECS may also come and go as swapping/unswapping happens. >> But >> > > if acgroup is OOM, and all reclaimables are gone (swapped out), it'd >> > > have toreclaim VAs/SECs in the same cgroup starting from the front >> of >> > > the LRUlist. To reclaim a VA/SECS, it identifies the enclave from >> the >> > > owner ofthe VA/SECS page and kills it, as killing enclave is the >> only >> > > way toreclaim VA/SECS pages. >> > >> > To kill enclave you just need to track SECS in the unreclaimable >> list. >> > Only when you want to account the total EPC pages via some list you >> > _probably_ >> > need to track VA as well. But I am not quite sure about this either. >> >> There is a case where even SECS is paged out for an enclave with all >> reclaimables out. > > Yes. But this essentially means these enclaves are not active, thus > shouldn't > be the victim of OOM? > But there are VA pages for the enclave at that moment. So it can be candidate for OOM victim. >> So cgroup needs to track each page used by an enclave >> and kill enclave when cgroup needs to lower usage by evicting an VA or >> SECS page. > > Let's discuss more on tracking SECS on unreclaimable list only. > > Could we assume that when the OOM wants to pick up a victim to serve the > new > enclave, there must be at least another one *active* enclave which still > has the > SECS page in EPC? > No, at a given instant when OOM happens, "active" enclave's SECS may not be in EPC, but lots of VAs. OOM := "no reclaimable pages left in the cgroup to reclaim and total usage is still at/near limit". > If yes, that enclave will be selected as victim. > > If not, then no other enclave will be selected as victim. Instead, only > the new > enclave which is requesting more EPC will be selected, because it's SECS > is on > the unreclaimable list. > You can't assume the requesting enclave's SECS is in unreclaimable list either. Think the request is from #PF in the scenario we fixed the NULL pointer of SECS by reloading it. > Somehow this is unacceptable, thus we need to track VA pages too in > order to > kill other inactive enclave? > If we know for sure SECS will always be in EPC, thus tracked in unreclaimables, then we probably can do it (see below). I hope the reason given above is clear. >> There were some discussion on paging out VAs without killing enclaves >> but >> it'd be complicated and not implemented yet. > > No we don't involve swapping VA pages now. It's a separate topic. > Only mentioned it as a kind of constraints impacting current design. Another potential alternative: we don't reclaim SECS either until OOM and only track SECS pages for cgroups. But that would change current behavior. And I'm not sure about other consequences, e.g., enclaves theoretically can allocate pages (including VA pages) in different cgroups/processes, so we may still end up tracking all VA pages for cgroups or we track SECS page in all cgroups in which enclave allocated any pages. Let me know your thoughts. >> >> BTW, I need clarify tracking pages which is done by LRUs vs usage >> accounting which is done by charge/uncharge to misc. To me tracking is >> for >> reclaiming not accounting. Also vEPCs not tracked at all but they are >> accounted for. > > I'll review the rest patches. Thanks. Thank you! Haitao
On Wed, 2023-10-04 at 23:22 -0500, Haitao Huang wrote: > On Wed, 04 Oct 2023 16:13:41 -0500, Huang, Kai <kai.huang@intel.com> wrote: > > > On Wed, 2023-10-04 at 10:03 -0500, Haitao Huang wrote: > > > On Tue, 03 Oct 2023 15:07:42 -0500, Huang, Kai <kai.huang@intel.com> > > > wrote: > > > > > > > On Tue, 2023-10-03 at 01:45 -0500, Haitao Huang wrote: > > > > > > > > > > > > Btw, probably a dumb question: > > > > > > > > > > > > Theoretically if you only need to find a victim enclave you don't > > > need > > > > > > to put VA > > > > > > pages to the unreclaimable list, because those VA pages will be > > > freed > > > > > > anyway > > > > > > when enclave is killed. So keeping VA pages in the list is for> > > > > > accounting all > > > > > > the pages that the cgroup is having? > > > > > > > > > > Yes basically tracking them in cgroups as they are allocated. > > > > > > > > > > VAs and SECS may also come and go as swapping/unswapping happens. > > > But > > > > > if acgroup is OOM, and all reclaimables are gone (swapped out), it'd > > > > > have toreclaim VAs/SECs in the same cgroup starting from the front > > > of > > > > > the LRUlist. To reclaim a VA/SECS, it identifies the enclave from > > > the > > > > > owner ofthe VA/SECS page and kills it, as killing enclave is the > > > only > > > > > way toreclaim VA/SECS pages. > > > > > > > > To kill enclave you just need to track SECS in the unreclaimable > > > list. > > > > Only when you want to account the total EPC pages via some list you > > > > _probably_ > > > > need to track VA as well. But I am not quite sure about this either. > > > > > > There is a case where even SECS is paged out for an enclave with all > > > reclaimables out. > > > > Yes. But this essentially means these enclaves are not active, thus > > shouldn't > > be the victim of OOM? > > > > But there are VA pages for the enclave at that moment. So it can be > candidate for OOM victim. Yes. I am not familiar with how does OOM choose victim, but it seems choosing inactive enclaves seems more reasonable. [...] > > > There were some discussion on paging out VAs without killing enclaves > > > but > > > it'd be complicated and not implemented yet. > > > > No we don't involve swapping VA pages now. It's a separate topic. > > > Only mentioned it as a kind of constraints impacting current design. > > Another potential alternative: we don't reclaim SECS either until OOM and > only track SECS pages for cgroups. But that would change current behavior. > And I'm not sure about other consequences, e.g., enclaves theoretically > can allocate pages (including VA pages) in different cgroups/processes, so > we may still end up tracking all VA pages for cgroups or we track SECS > page in all cgroups in which enclave allocated any pages. Let me know your > thoughts. Let's not change current behaviour. I seriously doubt that is needed. So it seems to me that what we need is just some way to let the OOM find some victim enclave. I am not sure whether "tracking EPC pages in some lists" has anything to do with cgroup accounting EPC pages, so will take a look the rest of the patches.
diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c index f5afc8d65e22..ec3402d41b63 100644 --- a/arch/x86/kernel/cpu/sgx/encl.c +++ b/arch/x86/kernel/cpu/sgx/encl.c @@ -1236,6 +1236,7 @@ void sgx_zap_enclave_ptes(struct sgx_encl *encl, unsigned long addr) /** * sgx_alloc_va_page() - Allocate a Version Array (VA) page + * @encl: The new owner of the page. * @reclaim: Reclaim EPC pages directly if none available. Enclave * mutex should not be held if this is set. * @@ -1245,12 +1246,12 @@ void sgx_zap_enclave_ptes(struct sgx_encl *encl, unsigned long addr) * a VA page, * -errno otherwise */ -struct sgx_epc_page *sgx_alloc_va_page(bool reclaim) +struct sgx_epc_page *sgx_alloc_va_page(struct sgx_encl *encl, bool reclaim) { struct sgx_epc_page *epc_page; int ret; - epc_page = sgx_alloc_epc_page(NULL, reclaim); + epc_page = sgx_alloc_epc_page(encl, reclaim); if (IS_ERR(epc_page)) return ERR_CAST(epc_page); diff --git a/arch/x86/kernel/cpu/sgx/encl.h b/arch/x86/kernel/cpu/sgx/encl.h index f94ff14c9486..831d63f80f5a 100644 --- a/arch/x86/kernel/cpu/sgx/encl.h +++ b/arch/x86/kernel/cpu/sgx/encl.h @@ -116,7 +116,7 @@ struct sgx_encl_page *sgx_encl_page_alloc(struct sgx_encl *encl, unsigned long offset, u64 secinfo_flags); void sgx_zap_enclave_ptes(struct sgx_encl *encl, unsigned long addr); -struct sgx_epc_page *sgx_alloc_va_page(bool reclaim); +struct sgx_epc_page *sgx_alloc_va_page(struct sgx_encl *encl, bool reclaim); unsigned int sgx_alloc_va_slot(struct sgx_va_page *va_page); void sgx_free_va_slot(struct sgx_va_page *va_page, unsigned int offset); bool sgx_va_page_full(struct sgx_va_page *va_page); diff --git a/arch/x86/kernel/cpu/sgx/ioctl.c b/arch/x86/kernel/cpu/sgx/ioctl.c index 9a32bf5a1070..164256ea18d0 100644 --- a/arch/x86/kernel/cpu/sgx/ioctl.c +++ b/arch/x86/kernel/cpu/sgx/ioctl.c @@ -30,7 +30,7 @@ struct sgx_va_page *sgx_encl_grow(struct sgx_encl *encl, bool reclaim) if (!va_page) return ERR_PTR(-ENOMEM); - va_page->epc_page = sgx_alloc_va_page(reclaim); + va_page->epc_page = sgx_alloc_va_page(encl, reclaim); if (IS_ERR(va_page->epc_page)) { err = ERR_CAST(va_page->epc_page); kfree(va_page); diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c index fba06dc5abfe..ed813288af44 100644 --- a/arch/x86/kernel/cpu/sgx/main.c +++ b/arch/x86/kernel/cpu/sgx/main.c @@ -107,7 +107,7 @@ static unsigned long __sgx_sanitize_pages(struct list_head *dirty_page_list) static bool sgx_reclaimer_age(struct sgx_epc_page *epc_page) { - struct sgx_encl_page *page = epc_page->owner; + struct sgx_encl_page *page = epc_page->encl_page; struct sgx_encl *encl = page->encl; struct sgx_encl_mm *encl_mm; bool ret = true; @@ -139,7 +139,7 @@ static bool sgx_reclaimer_age(struct sgx_epc_page *epc_page) static void sgx_reclaimer_block(struct sgx_epc_page *epc_page) { - struct sgx_encl_page *page = epc_page->owner; + struct sgx_encl_page *page = epc_page->encl_page; unsigned long addr = page->desc & PAGE_MASK; struct sgx_encl *encl = page->encl; int ret; @@ -196,7 +196,7 @@ void sgx_ipi_cb(void *info) static void sgx_encl_ewb(struct sgx_epc_page *epc_page, struct sgx_backing *backing) { - struct sgx_encl_page *encl_page = epc_page->owner; + struct sgx_encl_page *encl_page = epc_page->encl_page; struct sgx_encl *encl = encl_page->encl; struct sgx_va_page *va_page; unsigned int va_offset; @@ -249,7 +249,7 @@ static void sgx_encl_ewb(struct sgx_epc_page *epc_page, static void sgx_reclaimer_write(struct sgx_epc_page *epc_page, struct sgx_backing *backing) { - struct sgx_encl_page *encl_page = epc_page->owner; + struct sgx_encl_page *encl_page = epc_page->encl_page; struct sgx_encl *encl = encl_page->encl; struct sgx_backing secs_backing; int ret; @@ -309,7 +309,7 @@ static void sgx_reclaim_pages(void) break; list_del_init(&epc_page->list); - encl_page = epc_page->owner; + encl_page = epc_page->encl_page; if (kref_get_unless_zero(&encl_page->encl->refcount) != 0) { sgx_epc_page_set_state(epc_page, SGX_EPC_PAGE_RECLAIM_IN_PROGRESS); @@ -329,7 +329,7 @@ static void sgx_reclaim_pages(void) i = 0; list_for_each_entry_safe(epc_page, tmp, &iso, list) { - encl_page = epc_page->owner; + encl_page = epc_page->encl_page; if (!sgx_reclaimer_age(epc_page)) goto skip; @@ -362,7 +362,7 @@ static void sgx_reclaim_pages(void) i = 0; list_for_each_entry_safe(epc_page, tmp, &iso, list) { - encl_page = epc_page->owner; + encl_page = epc_page->encl_page; sgx_reclaimer_write(epc_page, &backing[i++]); kref_put(&encl_page->encl->refcount, sgx_encl_release); @@ -562,7 +562,7 @@ struct sgx_epc_page *sgx_alloc_epc_page(void *owner, bool reclaim) for ( ; ; ) { page = __sgx_alloc_epc_page(); if (!IS_ERR(page)) { - page->owner = owner; + page->encl_page = owner; break; } @@ -607,7 +607,7 @@ void sgx_free_epc_page(struct sgx_epc_page *page) spin_lock(&node->lock); - page->owner = NULL; + page->encl_page = NULL; if (page->poison) list_add(&page->list, &node->sgx_poison_page_list); else @@ -642,7 +642,7 @@ static bool __init sgx_setup_epc_section(u64 phys_addr, u64 size, for (i = 0; i < nr_pages; i++) { section->pages[i].section = index; section->pages[i].flags = 0; - section->pages[i].owner = NULL; + section->pages[i].encl_page = NULL; section->pages[i].poison = 0; list_add_tail(§ion->pages[i].list, &sgx_dirty_page_list); } diff --git a/arch/x86/kernel/cpu/sgx/sgx.h b/arch/x86/kernel/cpu/sgx/sgx.h index 764cec23f4e5..5110dd433b80 100644 --- a/arch/x86/kernel/cpu/sgx/sgx.h +++ b/arch/x86/kernel/cpu/sgx/sgx.h @@ -68,7 +68,12 @@ struct sgx_epc_page { unsigned int section; u16 flags; u16 poison; - struct sgx_encl_page *owner; + + /* Possible owner types */ + union { + struct sgx_encl_page *encl_page; + struct sgx_encl *encl; + }; struct list_head list; };