From patchwork Fri Dec 2 18:36:41 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Kristen Carlson Accardi X-Patchwork-Id: 29063 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:adf:f944:0:0:0:0:0 with SMTP id q4csp1015888wrr; Fri, 2 Dec 2022 10:43:03 -0800 (PST) X-Google-Smtp-Source: AA0mqf5yfHEFoRrP4kQ6Wb4BWxBJRWd1DiIm0hy1bNBntPGRzTD8Q+MHbTNGE9zdkfjkjfr/JVHv X-Received: by 2002:a17:906:924e:b0:782:2d3e:6340 with SMTP id c14-20020a170906924e00b007822d3e6340mr61628555ejx.234.1670006583729; Fri, 02 Dec 2022 10:43:03 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1670006583; cv=none; d=google.com; s=arc-20160816; b=DTAyZin1apGcnNdwtMYqci9WhAEDrywIm5Zw6KLBtzxh7W4UH4HDMf2NsPhKajhJC/ XQSaQjxQp/m/aXxqtxe/I7exotDne5Z4gayOHvOuGwIv/uVoQm2WfQfphosMJ575zHQv A5CsO6TgzseD6UWXcBYS/kd2ICngrZS9nBgZnwvAbyQaAIj4rlX2BltSZQdpEMHP9flc NKyNdShq8buK+rCgOwVV8vMdeG8JpGkhCRsFDpPA+oSUVPed7bvbaRU7Vy09DMbtBVB4 J+0tS2mYjinbD54cZdSci1QDtQIP2e+5ITRz3rQpojl2W3hOK8i5DGDCXAvzA7mABulS YjzA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=WD8d+DtJr2yjtSpKbiynkeVLPIQflIkEjXcyq6P2CxQ=; b=mIqPMWgPwObS2OlNgLfNZjRNfA9l6t5pkXlsz5T0JK4RhDKm2DW1ElekK64Lc3ItTb 0WtRj4H6S59yUU7RL56AKGS4fGUd43js07cSSam7XLNfZIJpUFY+niwJTt5ed0ihhwte uwVD9tmtgu5PwAvU6NuqkuywsD0a7ZDIXQ9ryqixC1RWqxK6AGU39nffFxArVllQMhy0 /0ihwdhob2bcVFINGVuSFySAz6N7M99LDlp+0RmpoZ+P1eykA0IO3Ygr6bLoBZQ1ofpV sbNDyeZmTrfOsJnJSPU+OgJYckGZCMIkfpfvAcYw+2NvAIAMt5tZpIBkESvkaX4r63aZ Aqnw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=hHAV4CT7; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id c6-20020a05640227c600b0046a222b78a5si7357737ede.249.2022.12.02.10.42.40; Fri, 02 Dec 2022 10:43:03 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=hHAV4CT7; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234390AbiLBShc (ORCPT + 99 others); Fri, 2 Dec 2022 13:37:32 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:43956 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234504AbiLBShR (ORCPT ); Fri, 2 Dec 2022 13:37:17 -0500 Received: from mga04.intel.com (mga04.intel.com [192.55.52.120]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 863A4EF0F7; Fri, 2 Dec 2022 10:37:15 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1670006235; x=1701542235; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=BwGWFpwhkXHmY0scYL/UPLZTImaW0jFvH7J012aIUr4=; b=hHAV4CT77h3caWotSLeZ2yWijv+DCzSB+gJrFZjny7ehAtBWPH9BdPs7 Zv5O1olACZ1v2GAXW3yfcHxUDK4YyT/vZkthPYiCQC7EuKyvmvXlWoNuk 0+UXR3un+SvBhqxrGx9U+cGSSqJ61QKPjpPYL+tyQrvLOOTMjqMjGLLFr 50k9ML7A3Vet2Hfisu+3DTpopBEAiSz5UPBd/DC9QEP8sbggNuFNUgwVw lgdVOhdTyi1oPJTjBU2tn93m+OIHIOFgS6BQNoFuQC9u9PP/AVfJyoq73 VPwg4hVFm5AOms7XPbkrQZm8DkcF+p2dga6VqBFBPGmrwismuHX2MiWCR g==; X-IronPort-AV: E=McAfee;i="6500,9779,10549"; a="314724539" X-IronPort-AV: E=Sophos;i="5.96,213,1665471600"; d="scan'208";a="314724539" Received: from fmsmga004.fm.intel.com ([10.253.24.48]) by fmsmga104.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 02 Dec 2022 10:37:15 -0800 X-IronPort-AV: E=McAfee;i="6500,9779,10549"; a="713717416" X-IronPort-AV: E=Sophos;i="5.96,213,1665471600"; d="scan'208";a="713717416" Received: from kcaskeyx-mobl1.amr.corp.intel.com (HELO kcaccard-desk.amr.corp.intel.com) ([10.251.1.207]) by fmsmga004-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 02 Dec 2022 10:37:13 -0800 From: Kristen Carlson Accardi To: jarkko@kernel.org, dave.hansen@linux.intel.com, tj@kernel.org, linux-kernel@vger.kernel.org, linux-sgx@vger.kernel.org, cgroups@vger.kernel.org, Thomas Gleixner , Ingo Molnar , Borislav Petkov , x86@kernel.org, "H. Peter Anvin" Cc: zhiquan1.li@intel.com, Kristen Carlson Accardi , Sean Christopherson Subject: [PATCH v2 05/18] x86/sgx: Track epc pages on reclaimable or unreclaimable lists Date: Fri, 2 Dec 2022 10:36:41 -0800 Message-Id: <20221202183655.3767674-6-kristen@linux.intel.com> X-Mailer: git-send-email 2.38.1 In-Reply-To: <20221202183655.3767674-1-kristen@linux.intel.com> References: <20221202183655.3767674-1-kristen@linux.intel.com> MIME-Version: 1.0 X-Spam-Status: No, score=-4.3 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_EF,RCVD_IN_DNSWL_MED,SPF_HELO_NONE, SPF_NONE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1751128823465158595?= X-GMAIL-MSGID: =?utf-8?q?1751128823465158595?= Replace functions sgx_mark_page_reclaimable() and sgx_unmark_page_reclaimable() with sgx_record_epc_page() and sgx_drop_epc_page(). sgx_record_epc_page() wil add the epc_page to the correct "reclaimable" or "unreclaimable" list in the sgx_epc_lru_lists struct. sgx_drop_epc_page() will delete the page from the LRU list. Tracking pages that are not tracked by the reclaimer in the sgx_epc_lru_lists "unreclaimable" list allows an OOM event to cause all the pages in use by an enclave to be freed, regardless of whether they were reclaimable pages or not. Signed-off-by: Sean Christopherson Signed-off-by: Kristen Carlson Accardi Cc: Sean Christopherson --- arch/x86/kernel/cpu/sgx/encl.c | 10 +++++++--- arch/x86/kernel/cpu/sgx/ioctl.c | 11 +++++++---- arch/x86/kernel/cpu/sgx/main.c | 26 +++++++++++++++----------- arch/x86/kernel/cpu/sgx/sgx.h | 4 ++-- arch/x86/kernel/cpu/sgx/virt.c | 28 ++++++++++++++++++++-------- 5 files changed, 51 insertions(+), 28 deletions(-) diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c index 4eaf9d21e71b..4683da9ef4f1 100644 --- a/arch/x86/kernel/cpu/sgx/encl.c +++ b/arch/x86/kernel/cpu/sgx/encl.c @@ -252,6 +252,7 @@ static struct sgx_encl_page *__sgx_encl_load_page(struct sgx_encl *encl, epc_page = sgx_encl_eldu(&encl->secs, NULL); if (IS_ERR(epc_page)) return ERR_CAST(epc_page); + sgx_record_epc_page(epc_page, 0); } epc_page = sgx_encl_eldu(entry, encl->secs.epc_page); @@ -259,7 +260,7 @@ static struct sgx_encl_page *__sgx_encl_load_page(struct sgx_encl *encl, return ERR_CAST(epc_page); encl->secs_child_cnt++; - sgx_mark_page_reclaimable(entry->epc_page); + sgx_record_epc_page(entry->epc_page, SGX_EPC_PAGE_RECLAIMER_TRACKED); return entry; } @@ -375,7 +376,7 @@ static vm_fault_t sgx_encl_eaug_page(struct vm_area_struct *vma, encl_page->type = SGX_PAGE_TYPE_REG; encl->secs_child_cnt++; - sgx_mark_page_reclaimable(encl_page->epc_page); + sgx_record_epc_page(encl_page->epc_page, SGX_EPC_PAGE_RECLAIMER_TRACKED); phys_addr = sgx_get_epc_phys_addr(epc_page); /* @@ -687,7 +688,7 @@ void sgx_encl_release(struct kref *ref) * The page and its radix tree entry cannot be freed * if the page is being held by the reclaimer. */ - if (sgx_unmark_page_reclaimable(entry->epc_page)) + if (sgx_drop_epc_page(entry->epc_page)) continue; sgx_encl_free_epc_page(entry->epc_page); @@ -703,6 +704,7 @@ void sgx_encl_release(struct kref *ref) xa_destroy(&encl->page_array); if (!encl->secs_child_cnt && encl->secs.epc_page) { + sgx_drop_epc_page(encl->secs.epc_page); sgx_encl_free_epc_page(encl->secs.epc_page); encl->secs.epc_page = NULL; } @@ -711,6 +713,7 @@ void sgx_encl_release(struct kref *ref) va_page = list_first_entry(&encl->va_pages, struct sgx_va_page, list); list_del(&va_page->list); + sgx_drop_epc_page(va_page->epc_page); sgx_encl_free_epc_page(va_page->epc_page); kfree(va_page); } @@ -1218,6 +1221,7 @@ struct sgx_epc_page *sgx_alloc_va_page(struct sgx_encl *encl, bool reclaim) sgx_encl_free_epc_page(epc_page); return ERR_PTR(-EFAULT); } + sgx_record_epc_page(epc_page, 0); return epc_page; } diff --git a/arch/x86/kernel/cpu/sgx/ioctl.c b/arch/x86/kernel/cpu/sgx/ioctl.c index 9a1bb3c3211a..aca80a3f38a1 100644 --- a/arch/x86/kernel/cpu/sgx/ioctl.c +++ b/arch/x86/kernel/cpu/sgx/ioctl.c @@ -48,6 +48,7 @@ void sgx_encl_shrink(struct sgx_encl *encl, struct sgx_va_page *va_page) encl->page_cnt--; if (va_page) { + sgx_drop_epc_page(va_page->epc_page); sgx_encl_free_epc_page(va_page->epc_page); list_del(&va_page->list); kfree(va_page); @@ -113,6 +114,8 @@ static int sgx_encl_create(struct sgx_encl *encl, struct sgx_secs *secs) encl->attributes = secs->attributes; encl->attributes_mask = SGX_ATTR_DEBUG | SGX_ATTR_MODE64BIT | SGX_ATTR_KSS; + sgx_record_epc_page(encl->secs.epc_page, 0); + /* Set only after completion, as encl->lock has not been taken. */ set_bit(SGX_ENCL_CREATED, &encl->flags); @@ -322,7 +325,7 @@ static int sgx_encl_add_page(struct sgx_encl *encl, unsigned long src, goto err_out; } - sgx_mark_page_reclaimable(encl_page->epc_page); + sgx_record_epc_page(encl_page->epc_page, SGX_EPC_PAGE_RECLAIMER_TRACKED); mutex_unlock(&encl->lock); mmap_read_unlock(current->mm); return ret; @@ -958,7 +961,7 @@ static long sgx_enclave_modify_types(struct sgx_encl *encl, * Prevent page from being reclaimed while mutex * is released. */ - if (sgx_unmark_page_reclaimable(entry->epc_page)) { + if (sgx_drop_epc_page(entry->epc_page)) { ret = -EAGAIN; goto out_entry_changed; } @@ -973,7 +976,7 @@ static long sgx_enclave_modify_types(struct sgx_encl *encl, mutex_lock(&encl->lock); - sgx_mark_page_reclaimable(entry->epc_page); + sgx_record_epc_page(entry->epc_page, SGX_EPC_PAGE_RECLAIMER_TRACKED); } /* Change EPC type */ @@ -1130,7 +1133,7 @@ static long sgx_encl_remove_pages(struct sgx_encl *encl, goto out_unlock; } - if (sgx_unmark_page_reclaimable(entry->epc_page)) { + if (sgx_drop_epc_page(entry->epc_page)) { ret = -EBUSY; goto out_unlock; } diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c index 447cf4b8580c..ecd7f8e704cc 100644 --- a/arch/x86/kernel/cpu/sgx/main.c +++ b/arch/x86/kernel/cpu/sgx/main.c @@ -262,7 +262,7 @@ static void sgx_reclaimer_write(struct sgx_epc_page *epc_page, goto out; sgx_encl_ewb(encl->secs.epc_page, &secs_backing); - + sgx_drop_epc_page(encl->secs.epc_page); sgx_encl_free_epc_page(encl->secs.epc_page); encl->secs.epc_page = NULL; @@ -499,31 +499,35 @@ struct sgx_epc_page *__sgx_alloc_epc_page(void) } /** - * sgx_mark_page_reclaimable() - Mark a page as reclaimable + * sgx_record_epc_page() - Add a page to the LRU tracking * @page: EPC page * - * Mark a page as reclaimable and add it to the active page list. Pages - * are automatically removed from the active list when freed. + * Mark a page with the specified flags and add it to the appropriate + * (un)reclaimable list. */ -void sgx_mark_page_reclaimable(struct sgx_epc_page *page) +void sgx_record_epc_page(struct sgx_epc_page *page, unsigned long flags) { spin_lock(&sgx_global_lru.lock); - page->flags |= SGX_EPC_PAGE_RECLAIMER_TRACKED; - sgx_epc_push_reclaimable(&sgx_global_lru, page); + WARN_ON(page->flags & SGX_EPC_PAGE_RECLAIMER_TRACKED); + page->flags |= flags; + if (flags & SGX_EPC_PAGE_RECLAIMER_TRACKED) + sgx_epc_push_reclaimable(&sgx_global_lru, page); + else + sgx_epc_push_unreclaimable(&sgx_global_lru, page); spin_unlock(&sgx_global_lru.lock); } /** - * sgx_unmark_page_reclaimable() - Remove a page from the reclaim list + * sgx_drop_epc_page() - Remove a page from a LRU list * @page: EPC page * - * Clear the reclaimable flag and remove the page from the active page list. + * Clear the reclaimable flag if set and remove the page from its LRU. * * Return: * 0 on success, * -EBUSY if the page is in the process of being reclaimed */ -int sgx_unmark_page_reclaimable(struct sgx_epc_page *page) +int sgx_drop_epc_page(struct sgx_epc_page *page) { spin_lock(&sgx_global_lru.lock); if (page->flags & SGX_EPC_PAGE_RECLAIMER_TRACKED) { @@ -533,9 +537,9 @@ int sgx_unmark_page_reclaimable(struct sgx_epc_page *page) return -EBUSY; } - list_del(&page->list); page->flags &= ~SGX_EPC_PAGE_RECLAIMER_TRACKED; } + list_del(&page->list); spin_unlock(&sgx_global_lru.lock); return 0; diff --git a/arch/x86/kernel/cpu/sgx/sgx.h b/arch/x86/kernel/cpu/sgx/sgx.h index 5e6d88438fae..ba4338b7303f 100644 --- a/arch/x86/kernel/cpu/sgx/sgx.h +++ b/arch/x86/kernel/cpu/sgx/sgx.h @@ -159,8 +159,8 @@ struct sgx_epc_page *__sgx_alloc_epc_page(void); void sgx_free_epc_page(struct sgx_epc_page *page); void sgx_reclaim_direct(void); -void sgx_mark_page_reclaimable(struct sgx_epc_page *page); -int sgx_unmark_page_reclaimable(struct sgx_epc_page *page); +void sgx_record_epc_page(struct sgx_epc_page *page, unsigned long flags); +int sgx_drop_epc_page(struct sgx_epc_page *page); struct sgx_epc_page *sgx_alloc_epc_page(void *owner, bool reclaim); void sgx_ipi_cb(void *info); diff --git a/arch/x86/kernel/cpu/sgx/virt.c b/arch/x86/kernel/cpu/sgx/virt.c index 776ae5c1c032..0eabc4db91d0 100644 --- a/arch/x86/kernel/cpu/sgx/virt.c +++ b/arch/x86/kernel/cpu/sgx/virt.c @@ -64,6 +64,8 @@ static int __sgx_vepc_fault(struct sgx_vepc *vepc, goto err_delete; } + sgx_record_epc_page(epc_page, 0); + return 0; err_delete: @@ -148,6 +150,7 @@ static int sgx_vepc_free_page(struct sgx_epc_page *epc_page) return ret; } + sgx_drop_epc_page(epc_page); sgx_free_epc_page(epc_page); return 0; } @@ -220,8 +223,15 @@ static int sgx_vepc_release(struct inode *inode, struct file *file) * have been removed, the SECS page must have a child on * another instance. */ - if (sgx_vepc_free_page(epc_page)) + if (sgx_vepc_free_page(epc_page)) { + /* + * Drop the page before adding it to the list of SECS + * pages. Moving the page off the unreclaimable list + * needs to be done under the LRU's spinlock. + */ + sgx_drop_epc_page(epc_page); list_add_tail(&epc_page->list, &secs_pages); + } xa_erase(&vepc->page_array, index); } @@ -236,15 +246,17 @@ static int sgx_vepc_release(struct inode *inode, struct file *file) mutex_lock(&zombie_secs_pages_lock); list_for_each_entry_safe(epc_page, tmp, &zombie_secs_pages, list) { /* - * Speculatively remove the page from the list of zombies, - * if the page is successfully EREMOVE'd it will be added to - * the list of free pages. If EREMOVE fails, throw the page - * on the local list, which will be spliced on at the end. + * If EREMOVE fails, throw the page on the local list, which + * will be spliced on at the end. + * + * Note, this abuses sgx_drop_epc_page() to delete the page off + * the list of zombies, but this is a very rare path (probably + * never hit in production). It's not worth special casing the + * free path for this super rare case just to avoid taking the + * LRU's spinlock. */ - list_del(&epc_page->list); - if (sgx_vepc_free_page(epc_page)) - list_add_tail(&epc_page->list, &secs_pages); + list_move_tail(&epc_page->list, &secs_pages); } if (!list_empty(&secs_pages))