[v2,02/18] x86/sgx: Store struct sgx_encl when allocating new VA pages

Message ID 20221202183655.3767674-3-kristen@linux.intel.com
State New
Headers
Series Add Cgroup support for SGX EPC memory |

Commit Message

Kristen Carlson Accardi Dec. 2, 2022, 6:36 p.m. UTC
  From: Sean Christopherson <sean.j.christopherson@intel.com>

When allocating new Version Array (VA) pages, pass the struct sgx_encl
of the enclave that is allocating the page. sgx_alloc_epc_page() will
store this value in the encl_owner field of the struct sgx_epc_page. In
a later patch, VA pages will be placed in an unreclaimable queue,
and then when the cgroup max limit is reached and there are no more
reclaimable pages and the enclave must be oom killed, all the
VA pages associated with that enclave can be uncharged and freed.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Kristen Carlson Accardi <kristen@linux.intel.com>
Cc: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kernel/cpu/sgx/encl.c  | 5 +++--
 arch/x86/kernel/cpu/sgx/encl.h  | 2 +-
 arch/x86/kernel/cpu/sgx/ioctl.c | 2 +-
 arch/x86/kernel/cpu/sgx/sgx.h   | 1 +
 4 files changed, 6 insertions(+), 4 deletions(-)
  

Comments

Dave Hansen Dec. 2, 2022, 9:35 p.m. UTC | #1
On 12/2/22 10:36, Kristen Carlson Accardi wrote:
> When allocating new Version Array (VA) pages, pass the struct sgx_encl
> of the enclave that is allocating the page. sgx_alloc_epc_page() will
> store this value in the encl_owner field of the struct sgx_epc_page. In
> a later patch, VA pages will be placed in an unreclaimable queue,
> and then when the cgroup max limit is reached and there are no more
> reclaimable pages and the enclave must be oom killed, all the
> VA pages associated with that enclave can be uncharged and freed.

What does this have to do with the 'encl' that is being passed, though?

In other words, why is this new sgx_epc_page-to-encl mapping needed for
VA pages now, but it wasn't before?
  
Kristen Carlson Accardi Dec. 2, 2022, 9:40 p.m. UTC | #2
On Fri, 2022-12-02 at 13:35 -0800, Dave Hansen wrote:
> On 12/2/22 10:36, Kristen Carlson Accardi wrote:
> > When allocating new Version Array (VA) pages, pass the struct
> > sgx_encl
> > of the enclave that is allocating the page. sgx_alloc_epc_page()
> > will
> > store this value in the encl_owner field of the struct
> > sgx_epc_page. In
> > a later patch, VA pages will be placed in an unreclaimable queue,
> > and then when the cgroup max limit is reached and there are no more
> > reclaimable pages and the enclave must be oom killed, all the
> > VA pages associated with that enclave can be uncharged and freed.
> 
> What does this have to do with the 'encl' that is being passed,
> though?
> 
> In other words, why is this new sgx_epc_page-to-encl mapping needed
> for
> VA pages now, but it wasn't before?

When we OOM kill an enclave, we want to get rid of all the associated
VA pages too. Prior to this patch, there wasn't a way to easily get the
VA pages associated with an enclave.
  
Dave Hansen Dec. 2, 2022, 9:48 p.m. UTC | #3
On 12/2/22 13:40, Kristen Carlson Accardi wrote:
> On Fri, 2022-12-02 at 13:35 -0800, Dave Hansen wrote:
>> On 12/2/22 10:36, Kristen Carlson Accardi wrote:
>>> When allocating new Version Array (VA) pages, pass the struct
>>> sgx_encl
>>> of the enclave that is allocating the page. sgx_alloc_epc_page()
>>> will
>>> store this value in the encl_owner field of the struct
>>> sgx_epc_page. In
>>> a later patch, VA pages will be placed in an unreclaimable queue,
>>> and then when the cgroup max limit is reached and there are no more
>>> reclaimable pages and the enclave must be oom killed, all the
>>> VA pages associated with that enclave can be uncharged and freed.
>> What does this have to do with the 'encl' that is being passed,
>> though?
>>
>> In other words, why is this new sgx_epc_page-to-encl mapping needed
>> for
>> VA pages now, but it wasn't before?
> When we OOM kill an enclave, we want to get rid of all the associated
> VA pages too. Prior to this patch, there wasn't a way to easily get the
> VA pages associated with an enclave.

Given an enclave, we have encl->va_pages to look up all the VA pages.
Also, this patch's code allows you to go from a va page to an enclave.
That seems like it's going the other direction from what an OOM-kill
would need to do.
  
Sean Christopherson Dec. 2, 2022, 10:35 p.m. UTC | #4
On Fri, Dec 02, 2022, Dave Hansen wrote:
> On 12/2/22 13:40, Kristen Carlson Accardi wrote:
> > On Fri, 2022-12-02 at 13:35 -0800, Dave Hansen wrote:
> >> On 12/2/22 10:36, Kristen Carlson Accardi wrote:
> >>> When allocating new Version Array (VA) pages, pass the struct
> >>> sgx_encl
> >>> of the enclave that is allocating the page. sgx_alloc_epc_page()
> >>> will
> >>> store this value in the encl_owner field of the struct
> >>> sgx_epc_page. In
> >>> a later patch, VA pages will be placed in an unreclaimable queue,
> >>> and then when the cgroup max limit is reached and there are no more
> >>> reclaimable pages and the enclave must be oom killed, all the
> >>> VA pages associated with that enclave can be uncharged and freed.
> >> What does this have to do with the 'encl' that is being passed,
> >> though?
> >>
> >> In other words, why is this new sgx_epc_page-to-encl mapping needed
> >> for
> >> VA pages now, but it wasn't before?
> > When we OOM kill an enclave, we want to get rid of all the associated
> > VA pages too. Prior to this patch, there wasn't a way to easily get the
> > VA pages associated with an enclave.
> 
> Given an enclave, we have encl->va_pages to look up all the VA pages.
> Also, this patch's code allows you to go from a va page to an enclave.

Yep.

> That seems like it's going the other direction from what an OOM-kill
> would need to do.

Providing a backpointer from a VA page to its enclave allows OOM-killing the enclave
if its cgroup is over the limit but there are no reclaimable pages for said cgroup
(for SGX's definition of "reclaimable").  I.e. if all of an enclave's "regular"
pages have been swapped out, the only thing left resident in the EPC will be the
enclave's VA pages, which are not reclaimable in the kernel's current SGX
implementation.
  
Dave Hansen Dec. 2, 2022, 10:47 p.m. UTC | #5
On 12/2/22 14:35, Sean Christopherson wrote:
>> That seems like it's going the other direction from what an OOM-kill
>> would need to do.
> Providing a backpointer from a VA page to its enclave allows OOM-killing the enclave
> if its cgroup is over the limit but there are no reclaimable pages for said cgroup
> (for SGX's definition of "reclaimable").  I.e. if all of an enclave's "regular"
> pages have been swapped out, the only thing left resident in the EPC will be the
> enclave's VA pages, which are not reclaimable in the kernel's current SGX
> implementation.

Ooooooooooooooooooooh.  I'm a dummy.


So, we've got a cgroup.  It's in OOM-kill mode and we're looking at the
*cgroup* LRU lists.  We've done everything we can to the enclave and
swapped everything out that we can.  All we're left with are these
crummy VA pages on the LRU (or equally crummy pages).  We want to
reclaim them but can't swap VA pages.  Our only recourse is to go to the
enclave and kill *it*.

Right now, we can easily find an enclave's VA pages and free them.  We
do that all the time when freeing whole enclaves.  But, what we can't
easily do is find an enclave given a VA page.

A reverse pointer from VA page back to enclave allows the VA page's
enclave to be located and efficiently killed.

Right?

Could we add that context to the changelog, please?
  
Sean Christopherson Dec. 2, 2022, 10:49 p.m. UTC | #6
On Fri, Dec 02, 2022, Dave Hansen wrote:
> On 12/2/22 14:35, Sean Christopherson wrote:
> >> That seems like it's going the other direction from what an OOM-kill
> >> would need to do.
> > Providing a backpointer from a VA page to its enclave allows OOM-killing the enclave
> > if its cgroup is over the limit but there are no reclaimable pages for said cgroup
> > (for SGX's definition of "reclaimable").  I.e. if all of an enclave's "regular"
> > pages have been swapped out, the only thing left resident in the EPC will be the
> > enclave's VA pages, which are not reclaimable in the kernel's current SGX
> > implementation.
> 
> Ooooooooooooooooooooh.  I'm a dummy.
> 
> 
> So, we've got a cgroup.  It's in OOM-kill mode and we're looking at the
> *cgroup* LRU lists.  We've done everything we can to the enclave and
> swapped everything out that we can.  All we're left with are these
> crummy VA pages on the LRU (or equally crummy pages).  We want to
> reclaim them but can't swap VA pages.  Our only recourse is to go to the
> enclave and kill *it*.
> 
> Right now, we can easily find an enclave's VA pages and free them.  We
> do that all the time when freeing whole enclaves.  But, what we can't
> easily do is find an enclave given a VA page.
> 
> A reverse pointer from VA page back to enclave allows the VA page's
> enclave to be located and efficiently killed.
> 
> Right?

Yep, exactly.
  

Patch

diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c
index f40d64206ded..4eaf9d21e71b 100644
--- a/arch/x86/kernel/cpu/sgx/encl.c
+++ b/arch/x86/kernel/cpu/sgx/encl.c
@@ -1193,6 +1193,7 @@  void sgx_zap_enclave_ptes(struct sgx_encl *encl, unsigned long addr)
 
 /**
  * sgx_alloc_va_page() - Allocate a Version Array (VA) page
+ * @encl:    The enclave that this page is allocated to.
  * @reclaim: Reclaim EPC pages directly if none available. Enclave
  *           mutex should not be held if this is set.
  *
@@ -1202,12 +1203,12 @@  void sgx_zap_enclave_ptes(struct sgx_encl *encl, unsigned long addr)
  *   a VA page,
  *   -errno otherwise
  */
-struct sgx_epc_page *sgx_alloc_va_page(bool reclaim)
+struct sgx_epc_page *sgx_alloc_va_page(struct sgx_encl *encl, bool reclaim)
 {
 	struct sgx_epc_page *epc_page;
 	int ret;
 
-	epc_page = sgx_alloc_epc_page(NULL, reclaim);
+	epc_page = sgx_alloc_epc_page(encl, reclaim);
 	if (IS_ERR(epc_page))
 		return ERR_CAST(epc_page);
 
diff --git a/arch/x86/kernel/cpu/sgx/encl.h b/arch/x86/kernel/cpu/sgx/encl.h
index f94ff14c9486..831d63f80f5a 100644
--- a/arch/x86/kernel/cpu/sgx/encl.h
+++ b/arch/x86/kernel/cpu/sgx/encl.h
@@ -116,7 +116,7 @@  struct sgx_encl_page *sgx_encl_page_alloc(struct sgx_encl *encl,
 					  unsigned long offset,
 					  u64 secinfo_flags);
 void sgx_zap_enclave_ptes(struct sgx_encl *encl, unsigned long addr);
-struct sgx_epc_page *sgx_alloc_va_page(bool reclaim);
+struct sgx_epc_page *sgx_alloc_va_page(struct sgx_encl *encl, bool reclaim);
 unsigned int sgx_alloc_va_slot(struct sgx_va_page *va_page);
 void sgx_free_va_slot(struct sgx_va_page *va_page, unsigned int offset);
 bool sgx_va_page_full(struct sgx_va_page *va_page);
diff --git a/arch/x86/kernel/cpu/sgx/ioctl.c b/arch/x86/kernel/cpu/sgx/ioctl.c
index ebe79d60619f..9a1bb3c3211a 100644
--- a/arch/x86/kernel/cpu/sgx/ioctl.c
+++ b/arch/x86/kernel/cpu/sgx/ioctl.c
@@ -30,7 +30,7 @@  struct sgx_va_page *sgx_encl_grow(struct sgx_encl *encl, bool reclaim)
 		if (!va_page)
 			return ERR_PTR(-ENOMEM);
 
-		va_page->epc_page = sgx_alloc_va_page(reclaim);
+		va_page->epc_page = sgx_alloc_va_page(encl, reclaim);
 		if (IS_ERR(va_page->epc_page)) {
 			err = ERR_CAST(va_page->epc_page);
 			kfree(va_page);
diff --git a/arch/x86/kernel/cpu/sgx/sgx.h b/arch/x86/kernel/cpu/sgx/sgx.h
index d16a8baa28d4..39cb15a8abcb 100644
--- a/arch/x86/kernel/cpu/sgx/sgx.h
+++ b/arch/x86/kernel/cpu/sgx/sgx.h
@@ -39,6 +39,7 @@  struct sgx_epc_page {
 		struct sgx_encl_page *encl_owner;
 		/* Use when SGX_EPC_PAGE_KVM_GUEST set in ->flags: */
 		void __user *vepc_vaddr;
+		struct sgx_encl *encl;
 	};
 	struct list_head list;
 };