From patchwork Wed Feb 14 11:35:09 2024
X-Patchwork-Submitter: Petr Tesarik
X-Patchwork-Id: 200914
From: Petr Tesarik
To: Jonathan Corbet, Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
    x86@kernel.org (maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT)),
    "H. Peter Anvin", Andy Lutomirski, Oleg Nesterov, Peter Zijlstra, Xin Li,
    Arnd Bergmann, Andrew Morton, Rick Edgecombe, Kees Cook,
    "Masami Hiramatsu (Google)", Pengfei Xu, Josh Poimboeuf, Ze Gao,
    "Kirill A. Shutemov", Kai Huang, David Woodhouse, Brian Gerst,
    Jason Gunthorpe, Joerg Roedel, "Mike Rapoport (IBM)", Tina Zhang, Jacob Pan,
    linux-doc@vger.kernel.org (open list:DOCUMENTATION),
    linux-kernel@vger.kernel.org (open list)
Cc: Roberto Sassu, petr@tesarici.cz, Petr Tesarik
Subject: [PATCH v1 1/8] sbm: x86: page table arch hooks
Date: Wed, 14 Feb 2024 12:35:09 +0100
Message-Id: <20240214113516.2307-2-petrtesarik@huaweicloud.com>
In-Reply-To: <20240214113516.2307-1-petrtesarik@huaweicloud.com>
References: <20240214113516.2307-1-petrtesarik@huaweicloud.com>

From: Petr Tesarik

Add arch hooks for the x86 architecture and select CONFIG_HAVE_ARCH_SBM.

Implement arch_sbm_init(): Allocate an arch-specific state page and store it
as SBM instance private data. Set up mappings for kernel text, static data,
the current task and the current thread stack in the sandbox page tables.

Implement arch_sbm_map_readonly() and arch_sbm_map_writable(): Set the PTE
value, allocating additional page tables as necessary.

Implement arch_sbm_destroy(): Walk the page table hierarchy and free all page
tables, including the page global directory.

Provide a trivial implementation of arch_sbm_exec() to avoid build failures,
but do not switch to the constructed page tables yet.
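For orientation, the hooks listed above are driven by the generic SBM core
posted earlier in this series (not included in this excerpt). The sketch
below shows the expected life cycle of one sandbox instance; struct sbm,
struct sbm_buf and sbm_func are simplified stand-ins inferred from the hook
signatures in the diff, not the actual definitions:

/* Illustrative only: simplified stand-in types for the generic SBM layer. */
struct sbm { void *private; };
struct sbm_buf { void *sbm_ptr; unsigned long size; };
typedef int (*sbm_func)(void *args);

int  arch_sbm_init(struct sbm *sbm);
int  arch_sbm_map_readonly(struct sbm *sbm, const struct sbm_buf *buf);
int  arch_sbm_map_writable(struct sbm *sbm, const struct sbm_buf *buf);
int  arch_sbm_exec(struct sbm *sbm, sbm_func func, void *args);
void arch_sbm_destroy(struct sbm *sbm);

static int run_in_sandbox(sbm_func func, void *args, struct sbm_buf *input)
{
        struct sbm sbm = { 0 };
        int err;

        err = arch_sbm_init(&sbm);                        /* build sandbox page tables */
        if (!err)
                err = arch_sbm_map_readonly(&sbm, input); /* expose caller data */
        if (!err)
                err = arch_sbm_exec(&sbm, func, args);    /* run the target function */
        arch_sbm_destroy(&sbm);                           /* free all page tables */
        return err;
}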
Signed-off-by: Petr Tesarik
---
 arch/x86/Kconfig             |   1 +
 arch/x86/include/asm/sbm.h   |  29 ++++
 arch/x86/kernel/Makefile     |   2 +
 arch/x86/kernel/sbm/Makefile |  10 ++
 arch/x86/kernel/sbm/core.c   | 248 +++++++++++++++++++++++++++++++++++
 5 files changed, 290 insertions(+)
 create mode 100644 arch/x86/include/asm/sbm.h
 create mode 100644 arch/x86/kernel/sbm/Makefile
 create mode 100644 arch/x86/kernel/sbm/core.c

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 5edec175b9bf..41fa4ab84c15 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -188,6 +188,7 @@ config X86
 	select HAVE_ARCH_MMAP_RND_COMPAT_BITS	if MMU && COMPAT
 	select HAVE_ARCH_COMPAT_MMAP_BASES	if MMU && COMPAT
 	select HAVE_ARCH_PREL32_RELOCATIONS
+	select HAVE_ARCH_SBM
 	select HAVE_ARCH_SECCOMP_FILTER
 	select HAVE_ARCH_THREAD_STRUCT_WHITELIST
 	select HAVE_ARCH_STACKLEAK
diff --git a/arch/x86/include/asm/sbm.h b/arch/x86/include/asm/sbm.h
new file mode 100644
index 000000000000..01c8d357550b
--- /dev/null
+++ b/arch/x86/include/asm/sbm.h
@@ -0,0 +1,29 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright (C) 2023-2024 Huawei Technologies Duesseldorf GmbH
+ *
+ * Author: Petr Tesarik
+ *
+ * SandBox Mode (SBM) declarations for the x86 architecture.
+ */
+#ifndef __ASM_SBM_H
+#define __ASM_SBM_H
+
+#if defined(CONFIG_HAVE_ARCH_SBM) && defined(CONFIG_SANDBOX_MODE)
+
+#include
+
+/**
+ * struct x86_sbm_state - Run-time state of the environment.
+ * @pgd: Sandbox mode page global directory.
+ *
+ * One instance of this union is allocated for each sandbox and stored as SBM
+ * instance private data.
+ */
+struct x86_sbm_state {
+	pgd_t *pgd;
+};
+
+#endif /* defined(CONFIG_HAVE_ARCH_SBM) && defined(CONFIG_SANDBOX_MODE) */
+
+#endif /* __ASM_SBM_H */
diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index 0000325ab98f..4ad63b7d13ee 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -150,6 +150,8 @@ obj-$(CONFIG_X86_CET)			+= cet.o
 obj-$(CONFIG_X86_USER_SHADOW_STACK)	+= shstk.o
 
+obj-$(CONFIG_SANDBOX_MODE)		+= sbm/
+
 ###
 # 64 bit specific files
 ifeq ($(CONFIG_X86_64),y)
diff --git a/arch/x86/kernel/sbm/Makefile b/arch/x86/kernel/sbm/Makefile
new file mode 100644
index 000000000000..92d368b526cd
--- /dev/null
+++ b/arch/x86/kernel/sbm/Makefile
@@ -0,0 +1,10 @@
+# SPDX-License-Identifier: GPL-2.0
+#
+# Copyright (C) 2023-2024 Huawei Technologies Duesseldorf GmbH
+#
+# Author: Petr Tesarik
+#
+# Makefile for the x86 SandBox Mode (SBM) implementation.
+#
+
+obj-y := core.o
diff --git a/arch/x86/kernel/sbm/core.c b/arch/x86/kernel/sbm/core.c
new file mode 100644
index 000000000000..b775e3b387b1
--- /dev/null
+++ b/arch/x86/kernel/sbm/core.c
@@ -0,0 +1,248 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (C) 2023-2024 Huawei Technologies Duesseldorf GmbH
+ *
+ * Author: Petr Tesarik
+ *
+ * SandBox Mode (SBM) implementation for the x86 architecture.
+ */
+
+#include
+#include
+#include
+#include
+#include
+#include
+
+#define GFP_SBM_PGTABLE	(GFP_KERNEL | __GFP_ZERO)
+#define PGD_ORDER	get_order(sizeof(pgd_t) * PTRS_PER_PGD)
+
+static inline phys_addr_t page_to_ptval(struct page *page)
+{
+	return PFN_PHYS(page_to_pfn(page)) | _PAGE_TABLE;
+}
+
+static int map_page(struct x86_sbm_state *state, unsigned long addr,
+		    unsigned long pfn, pgprot_t prot)
+{
+	struct page *page;
+	pgd_t *pgdp;
+	p4d_t *p4dp;
+	pud_t *pudp;
+	pmd_t *pmdp;
+	pte_t *ptep;
+
+	pgdp = pgd_offset_pgd(state->pgd, addr);
+	if (pgd_none(*pgdp)) {
+		page = alloc_page(GFP_KERNEL | __GFP_ZERO);
+		if (!page)
+			return -ENOMEM;
+		set_pgd(pgdp, __pgd(page_to_ptval(page)));
+		p4dp = (p4d_t *)page_address(page) + p4d_index(addr);
+	} else
+		p4dp = p4d_offset(pgdp, addr);
+
+	if (p4d_none(*p4dp)) {
+		page = alloc_page(GFP_SBM_PGTABLE);
+		if (!page)
+			return -ENOMEM;
+		set_p4d(p4dp, __p4d(page_to_ptval(page)));
+		pudp = (pud_t *)page_address(page) + pud_index(addr);
+	} else
+		pudp = pud_offset(p4dp, addr);
+
+	if (pud_none(*pudp)) {
+		page = alloc_page(GFP_SBM_PGTABLE);
+		if (!page)
+			return -ENOMEM;
+		set_pud(pudp, __pud(page_to_ptval(page)));
+		pmdp = (pmd_t *)page_address(page) + pmd_index(addr);
+	} else
+		pmdp = pmd_offset(pudp, addr);
+
+	if (pmd_none(*pmdp)) {
+		page = alloc_page(GFP_SBM_PGTABLE);
+		if (!page)
+			return -ENOMEM;
+		set_pmd(pmdp, __pmd(page_to_ptval(page)));
+		ptep = (pte_t *)page_address(page) + pte_index(addr);
+	} else
+		ptep = pte_offset_kernel(pmdp, addr);
+
+	set_pte(ptep, pfn_pte(pfn, prot));
+	return 0;
+}
+
+static int map_range(struct x86_sbm_state *state, unsigned long start,
+		     unsigned long end, pgprot_t prot)
+{
+	unsigned long pfn;
+	int err;
+
+	start = PAGE_ALIGN_DOWN(start);
+	while (start < end) {
+		if (is_vmalloc_or_module_addr((void *)start))
+			pfn = vmalloc_to_pfn((void *)start);
+		else
+			pfn = PHYS_PFN(__pa(start));
+		err = map_page(state, start, pfn, prot);
+		if (err)
+			return err;
+		start += PAGE_SIZE;
+	}
+
+	return 0;
+}
+
+int arch_sbm_map_readonly(struct sbm *sbm, const struct sbm_buf *buf)
+{
+	return map_range(sbm->private, (unsigned long)buf->sbm_ptr,
+			 (unsigned long)buf->sbm_ptr + buf->size,
+			 PAGE_READONLY);
+}
+
+int arch_sbm_map_writable(struct sbm *sbm, const struct sbm_buf *buf)
+{
+	return map_range(sbm->private, (unsigned long)buf->sbm_ptr,
+			 (unsigned long)buf->sbm_ptr + buf->size,
+			 PAGE_SHARED);
+}
+
+/* Map kernel text, data, rodata, BSS and static per-cpu sections. */
+static int map_kernel(struct x86_sbm_state *state)
+{
+	int __maybe_unused cpu;
+	int err;
+
+	err = map_range(state, (unsigned long)_stext, (unsigned long)_etext,
+			PAGE_READONLY_EXEC);
+	if (err)
+		return err;
+
+	err = map_range(state, (unsigned long)__entry_text_start,
+			(unsigned long)__entry_text_end, PAGE_KERNEL_ROX);
+	if (err)
+		return err;
+
+	err = map_range(state, (unsigned long)_sdata, (unsigned long)_edata,
+			PAGE_READONLY);
+	if (err)
+		return err;
+	err = map_range(state, (unsigned long)__bss_start,
+			(unsigned long)__bss_stop, PAGE_READONLY);
+	if (err)
+		return err;
+	err = map_range(state, (unsigned long)__start_rodata,
+			(unsigned long)__end_rodata, PAGE_READONLY);
+	if (err)
+		return err;
+
+#ifdef CONFIG_SMP
+	for_each_possible_cpu(cpu) {
+		unsigned long off = per_cpu_offset(cpu);
+
+		err = map_range(state, (unsigned long)__per_cpu_start + off,
+				(unsigned long)__per_cpu_end + off,
+				PAGE_READONLY);
+		if (err)
+			return err;
+	}
+#endif
+
+	return 0;
+}
+
+int arch_sbm_init(struct sbm *sbm)
+{
+	struct x86_sbm_state *state;
+	unsigned long stack;
+	int err;
+
+	BUILD_BUG_ON(sizeof(*state) > PAGE_SIZE);
+	state = (struct x86_sbm_state *)__get_free_page(GFP_KERNEL);
+	if (!state)
+		return -ENOMEM;
+	sbm->private = state;
+
+	state->pgd = (pgd_t *)__get_free_pages(GFP_SBM_PGTABLE, PGD_ORDER);
+	if (!state->pgd)
+		return -ENOMEM;
+
+	err = map_kernel(state);
+	if (err)
+		return err;
+
+	err = map_range(state, (unsigned long)current,
+			(unsigned long)(current + 1), PAGE_READONLY);
+	if (err)
+		return err;
+
+	stack = (unsigned long)task_stack_page(current);
+	err = map_range(state, stack, stack + THREAD_SIZE, PAGE_READONLY);
+	if (err)
+		return err;
+
+	return 0;
+}
+
+static void free_pmd(pmd_t *pmd)
+{
+	pmd_t *pmdp;
+
+	for (pmdp = pmd; pmdp < pmd + PTRS_PER_PMD; ++pmdp)
+		if (!pmd_none(*pmdp))
+			free_page(pmd_page_vaddr(*pmdp));
+	if (PTRS_PER_PMD > 1)
+		free_page((unsigned long)pmd);
+}
+
+static void free_pud(pud_t *pud)
+{
+	pud_t *pudp;
+
+	for (pudp = pud; pudp < pud + PTRS_PER_PUD; ++pudp)
+		if (!pud_none(*pudp))
+			free_pmd(pmd_offset(pudp, 0));
+	if (PTRS_PER_PUD > 1)
+		free_page((unsigned long)pud);
+}
+
+static void free_p4d(p4d_t *p4d)
+{
+	p4d_t *p4dp;
+
+	for (p4dp = p4d; p4dp < p4d + PTRS_PER_P4D; ++p4dp)
+		if (!p4d_none(*p4dp))
+			free_pud(pud_offset(p4dp, 0));
+	if (PTRS_PER_P4D > 1)
+		free_page((unsigned long)p4d);
+}
+
+static void free_pgd(pgd_t *pgd)
+{
+	pgd_t *pgdp;
+
+	for (pgdp = pgd; pgdp < pgd + PTRS_PER_PGD; ++pgdp)
+		if (!pgd_none(*pgdp))
+			free_p4d(p4d_offset(pgdp, 0));
+}
+
+void arch_sbm_destroy(struct sbm *sbm)
+{
+	struct x86_sbm_state *state = sbm->private;
+
+	if (!state)
+		return;
+
+	if (state->pgd) {
+		free_pgd(state->pgd);
+		free_pages((unsigned long)state->pgd, PGD_ORDER);
+	}
+	free_page((unsigned long)state);
+	sbm->private = NULL;
+}
+
+int arch_sbm_exec(struct sbm *sbm, sbm_func func, void *args)
+{
+	return func(args);
+}

From patchwork Wed Feb 14 11:35:10 2024
X-Patchwork-Submitter: Petr Tesarik
X-Patchwork-Id: 200915
From: Petr Tesarik
To: Jonathan Corbet, Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
    x86@kernel.org (maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT)),
    "H. Peter Anvin", Andy Lutomirski, Oleg Nesterov, Peter Zijlstra, Xin Li,
    Arnd Bergmann, Andrew Morton, Rick Edgecombe, Kees Cook,
    "Masami Hiramatsu (Google)", Pengfei Xu, Josh Poimboeuf, Ze Gao,
    "Kirill A. Shutemov", Kai Huang, David Woodhouse, Brian Gerst,
    Jason Gunthorpe, Joerg Roedel, "Mike Rapoport (IBM)", Tina Zhang, Jacob Pan,
    linux-doc@vger.kernel.org (open list:DOCUMENTATION),
    linux-kernel@vger.kernel.org (open list)
Cc: Roberto Sassu, petr@tesarici.cz, Petr Tesarik
Subject: [PATCH v1 2/8] sbm: x86: execute target function on sandbox mode stack
Date: Wed, 14 Feb 2024 12:35:10 +0100
Message-Id: <20240214113516.2307-3-petrtesarik@huaweicloud.com>
In-Reply-To: <20240214113516.2307-1-petrtesarik@huaweicloud.com>
References: <20240214113516.2307-1-petrtesarik@huaweicloud.com>

From: Petr Tesarik

Allocate and map a separate stack for sandbox mode in arch_sbm_init(). Switch
to this stack in arch_sbm_exec(). Store the address of the stack as
arch-specific state.

On X86_64, RSP is never used to locate thread-specific data, so it is safe to
change its value. If the sandbox is preempted by an interrupt, RSP is saved
by switch_to() and restored when the sandbox task is scheduled again. The
original kernel stack pointer is restored when the sandbox function returns.

Since the stack switch mechanism is implemented only for 64-bit, make
CONFIG_HAVE_ARCH_SBM depend on X86_64 for now. Leave it under "config X86",
because it would be possible to implement a 32-bit variant.
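The hand-off implemented by x86_sbm_exec() (see call_64.S below) can be
pictured as follows; this is an illustrative sketch based on the code in this
patch, with state->stack + THREAD_SIZE passed in as the top of the sandbox
stack:

/*
 * Sandbox stack right after the switch (sketch):
 *
 *   state->stack + THREAD_SIZE ->  +--------------------------+
 *                                  | saved kernel RSP         |  <- mov %rsp, (%rcx)
 *                                  +--------------------------+  <- new RSP (mov %rcx, %rsp)
 *                                  | stack frames of func()   |
 *                                  | ...                      |
 *   state->stack              ->   +--------------------------+
 *
 * The final "pop %rsp" reloads the saved kernel RSP, so the caller's stack is
 * restored no matter how much of the sandbox stack func() consumed.
 */
asmlinkage int x86_sbm_exec(struct x86_sbm_state *state, sbm_func func,
                            void *args, unsigned long sbm_tos);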
Signed-off-by: Petr Tesarik
---
 arch/x86/Kconfig              |  2 +-
 arch/x86/include/asm/sbm.h    |  2 ++
 arch/x86/kernel/sbm/Makefile  |  6 ++++++
 arch/x86/kernel/sbm/call_64.S | 40 +++++++++++++++++++++++++++++++++++
 arch/x86/kernel/sbm/core.c    | 17 ++++++++++++++-
 5 files changed, 65 insertions(+), 2 deletions(-)
 create mode 100644 arch/x86/kernel/sbm/call_64.S

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 41fa4ab84c15..090d46c7ee7c 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -188,7 +188,7 @@ config X86
 	select HAVE_ARCH_MMAP_RND_COMPAT_BITS	if MMU && COMPAT
 	select HAVE_ARCH_COMPAT_MMAP_BASES	if MMU && COMPAT
 	select HAVE_ARCH_PREL32_RELOCATIONS
-	select HAVE_ARCH_SBM
+	select HAVE_ARCH_SBM			if X86_64
 	select HAVE_ARCH_SECCOMP_FILTER
 	select HAVE_ARCH_THREAD_STRUCT_WHITELIST
 	select HAVE_ARCH_STACKLEAK
diff --git a/arch/x86/include/asm/sbm.h b/arch/x86/include/asm/sbm.h
index 01c8d357550b..ed214c17af06 100644
--- a/arch/x86/include/asm/sbm.h
+++ b/arch/x86/include/asm/sbm.h
@@ -16,12 +16,14 @@
 /**
  * struct x86_sbm_state - Run-time state of the environment.
  * @pgd: Sandbox mode page global directory.
+ * @stack: Sandbox mode stack.
  *
  * One instance of this union is allocated for each sandbox and stored as SBM
  * instance private data.
  */
 struct x86_sbm_state {
 	pgd_t *pgd;
+	unsigned long stack;
 };
 
 #endif /* defined(CONFIG_HAVE_ARCH_SBM) && defined(CONFIG_SANDBOX_MODE) */
diff --git a/arch/x86/kernel/sbm/Makefile b/arch/x86/kernel/sbm/Makefile
index 92d368b526cd..62c3e85c14a4 100644
--- a/arch/x86/kernel/sbm/Makefile
+++ b/arch/x86/kernel/sbm/Makefile
@@ -8,3 +8,9 @@
 #
 
 obj-y := core.o
+
+###
+# 64 bit specific files
+ifeq ($(CONFIG_X86_64),y)
+	obj-y += call_64.o
+endif
diff --git a/arch/x86/kernel/sbm/call_64.S b/arch/x86/kernel/sbm/call_64.S
new file mode 100644
index 000000000000..245d0dddce73
--- /dev/null
+++ b/arch/x86/kernel/sbm/call_64.S
@@ -0,0 +1,40 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright (C) 2023-2024 Huawei Technologies Duesseldorf GmbH
+ *
+ * Author: Petr Tesarik
+ *
+ * SandBox Mode (SBM) low-level x86_64 assembly.
+ */
+
+#include
+#include
+
+.code64
+.section .entry.text, "ax"
+
+/*
+ * arguments:
+ * rdi .. SBM state (kernel address)
+ * rsi .. func
+ * rdx .. args
+ * rcx .. top of sandbox stack
+ */
+SYM_FUNC_START(x86_sbm_exec)
+	/*
+	 * Set up the sandbox stack:
+	 * 1. Store the old stack pointer at the top of the sandbox stack,
+	 *    where various unwinders can find it and link back to the
+	 *    kernel stack.
+	 */
+	sub	$8, %rcx
+	mov	%rsp, (%rcx)
+	mov	%rcx, %rsp
+
+	mov	%rdx, %rdi	/* args */
+	CALL_NOSPEC rsi
+
+	pop	%rsp
+
+	RET
+SYM_FUNC_END(x86_sbm_exec)
diff --git a/arch/x86/kernel/sbm/core.c b/arch/x86/kernel/sbm/core.c
index b775e3b387b1..de6986801148 100644
--- a/arch/x86/kernel/sbm/core.c
+++ b/arch/x86/kernel/sbm/core.c
@@ -17,6 +17,9 @@
 #define GFP_SBM_PGTABLE	(GFP_KERNEL | __GFP_ZERO)
 #define PGD_ORDER	get_order(sizeof(pgd_t) * PTRS_PER_PGD)
 
+asmlinkage int x86_sbm_exec(struct x86_sbm_state *state, sbm_func func,
+			    void *args, unsigned long sbm_tos);
+
 static inline phys_addr_t page_to_ptval(struct page *page)
 {
 	return PFN_PHYS(page_to_pfn(page)) | _PAGE_TABLE;
@@ -182,6 +185,15 @@ int arch_sbm_init(struct sbm *sbm)
 	if (err)
 		return err;
 
+	state->stack = __get_free_pages(GFP_KERNEL, THREAD_SIZE_ORDER);
+	if (!state->stack)
+		return -ENOMEM;
+
+	err = map_range(state, state->stack, state->stack + THREAD_SIZE,
+			PAGE_SHARED);
+	if (err)
+		return err;
+
 	return 0;
 }
 
@@ -238,11 +250,14 @@ void arch_sbm_destroy(struct sbm *sbm)
 		free_pgd(state->pgd);
 		free_pages((unsigned long)state->pgd, PGD_ORDER);
 	}
+	free_pages(state->stack, THREAD_SIZE_ORDER);
 	free_page((unsigned long)state);
 	sbm->private = NULL;
 }
 
 int arch_sbm_exec(struct sbm *sbm, sbm_func func, void *args)
 {
-	return func(args);
+	struct x86_sbm_state *state = sbm->private;
+
+	return x86_sbm_exec(state, func, args, state->stack + THREAD_SIZE);
 }

From patchwork Wed Feb 14 11:35:11 2024
X-Patchwork-Submitter: Petr Tesarik
X-Patchwork-Id: 200916
From: Petr Tesarik
To: Jonathan Corbet, Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
    x86@kernel.org (maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT)),
    "H. Peter Anvin", Andy Lutomirski, Oleg Nesterov, Peter Zijlstra, Xin Li,
    Arnd Bergmann, Andrew Morton, Rick Edgecombe, Kees Cook,
    "Masami Hiramatsu (Google)", Pengfei Xu, Josh Poimboeuf, Ze Gao,
    "Kirill A. Shutemov", Kai Huang, David Woodhouse, Brian Gerst,
    Jason Gunthorpe, Joerg Roedel, "Mike Rapoport (IBM)", Tina Zhang, Jacob Pan,
    linux-doc@vger.kernel.org (open list:DOCUMENTATION),
    linux-kernel@vger.kernel.org (open list)
Cc: Roberto Sassu, petr@tesarici.cz, Petr Tesarik
Subject: [PATCH v1 3/8] sbm: x86: map system data structures into the sandbox
Date: Wed, 14 Feb 2024 12:35:11 +0100
Message-Id: <20240214113516.2307-4-petrtesarik@huaweicloud.com>
In-Reply-To: <20240214113516.2307-1-petrtesarik@huaweicloud.com>
References: <20240214113516.2307-1-petrtesarik@huaweicloud.com>

From: Petr Tesarik

Map CPU system data structures (GDT, TSS, IDT) read-only into every sandbox
instance. Map interrupt stacks read-write.

The TSS mappings may look confusing. The trick is that TSS pages are mapped
twice in the kernel address space: once read-only and once read-write. The
GDT entry for the TR register uses the read-only address, but since __pa()
does not work for virtual addresses in this range (cpu_entry_area), use the
read-write mapping to get the TSS physical address.
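The physical-address detail mentioned above can be condensed into a short
sketch; tss_phys_sketch() is a hypothetical helper, not part of the patch,
and merely restates what map_cpu_data() in the diff below does:

/* cpu_tss_rw (the per-CPU, read-write alias) lies in a region where __pa()
 * is valid; the read-only alias inside cpu_entry_area does not, so its
 * physical address is taken from the RW alias.  slow_virt_to_phys() is used
 * for the entry stack and IST stacks, which only exist in cpu_entry_area. */
static phys_addr_t tss_phys_sketch(int cpu)
{
        return __pa(&per_cpu(cpu_tss_rw, cpu));
        /* wrong: __pa(&get_cpu_entry_area(cpu)->tss) -- outside __pa()'s range */
}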
Signed-off-by: Petr Tesarik
---
 arch/x86/include/asm/page_64_types.h |  1 +
 arch/x86/kernel/sbm/core.c           | 74 ++++++++++++++++++++++++++++
 2 files changed, 75 insertions(+)

diff --git a/arch/x86/include/asm/page_64_types.h b/arch/x86/include/asm/page_64_types.h
index 06ef25411d62..62f6e40b3361 100644
--- a/arch/x86/include/asm/page_64_types.h
+++ b/arch/x86/include/asm/page_64_types.h
@@ -29,6 +29,7 @@
 #define IST_INDEX_DB	2
 #define IST_INDEX_MCE	3
 #define IST_INDEX_VC	4
+#define IST_INDEX_NUM	7
 
 /*
  * Set __PAGE_OFFSET to the most negative possible address +
diff --git a/arch/x86/kernel/sbm/core.c b/arch/x86/kernel/sbm/core.c
index de6986801148..f3a123d64afc 100644
--- a/arch/x86/kernel/sbm/core.c
+++ b/arch/x86/kernel/sbm/core.c
@@ -7,9 +7,13 @@
  * SandBox Mode (SBM) implementation for the x86 architecture.
  */
 
+#include
+#include
 #include
+#include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -155,6 +159,72 @@ static int map_kernel(struct x86_sbm_state *state)
 	return 0;
 }
 
+/** map_cpu_data() - map CPU system data structures into a sandbox instance
+ * @sbm: Target sandbox instance.
+ *
+ * Create sandbox page tables for:
+ * * Global Descriptor Table (GDT)
+ * * Task State Segment (TSS)
+ * * Interrupt Descriptor Table (IDT).
+ *
+ * Return: Zero on success, negative error code on failure.
+ */
+static int map_cpu_data(struct x86_sbm_state *state)
+{
+	unsigned long off;
+	phys_addr_t paddr;
+	unsigned int ist;
+	void *vaddr;
+	int cpu;
+	int err;
+
+	for_each_possible_cpu(cpu) {
+		struct cpu_entry_area *cea;
+		struct tss_struct *tss;
+
+		err = map_page(state, (unsigned long)get_cpu_gdt_ro(cpu),
+			       PHYS_PFN(get_cpu_gdt_paddr(cpu)),
+			       PAGE_KERNEL_RO);
+		if (err)
+			return err;
+
+		cea = get_cpu_entry_area(cpu);
+
+		tss = &cea->tss;
+		paddr = __pa(&per_cpu(cpu_tss_rw, cpu));
+		for (off = 0; off < sizeof(cpu_tss_rw); off += PAGE_SIZE) {
+			err = map_page(state, (unsigned long)tss + off,
+				       PHYS_PFN(paddr + off), PAGE_KERNEL_RO);
+			if (err)
+				return err;
+		}
+
+		paddr = slow_virt_to_phys(&cea->entry_stack_page);
+		err = map_page(state, (unsigned long)&cea->entry_stack_page,
+			       PHYS_PFN(paddr), PAGE_KERNEL);
+		if (err)
+			return err;
+
+		for (ist = 0; ist < IST_INDEX_NUM; ++ist) {
+			vaddr = (void *)tss->x86_tss.ist[ist];
+			if (!vaddr)
+				continue;
+
+			for (off = EXCEPTION_STKSZ; off; off -= PAGE_SIZE) {
+				paddr = slow_virt_to_phys(vaddr - off);
+				err = map_page(state, (unsigned long)vaddr - off,
+					       PHYS_PFN(paddr), PAGE_KERNEL);
+				if (err)
+					return err;
+			}
+		}
+	}
+
+	paddr = slow_virt_to_phys((void *)CPU_ENTRY_AREA_RO_IDT);
+	return map_page(state, CPU_ENTRY_AREA_RO_IDT, PHYS_PFN(paddr),
+			PAGE_KERNEL_RO);
+}
+
 int arch_sbm_init(struct sbm *sbm)
 {
 	struct x86_sbm_state *state;
@@ -194,6 +264,10 @@ int arch_sbm_init(struct sbm *sbm)
 	if (err)
 		return err;
 
+	err = map_cpu_data(state);
+	if (err)
+		return err;
+
 	return 0;
 }

From patchwork Wed Feb 14 11:35:12 2024
X-Patchwork-Submitter: Petr Tesarik
X-Patchwork-Id: 200917
From: Petr Tesarik
To: Jonathan Corbet, Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
    x86@kernel.org (maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT)),
    "H. Peter Anvin", Andy Lutomirski, Oleg Nesterov, Peter Zijlstra, Xin Li,
    Arnd Bergmann, Andrew Morton, Rick Edgecombe, Kees Cook,
    "Masami Hiramatsu (Google)", Pengfei Xu, Josh Poimboeuf, Ze Gao,
    "Kirill A. Shutemov", Kai Huang, David Woodhouse, Brian Gerst,
    Jason Gunthorpe, Joerg Roedel, "Mike Rapoport (IBM)", Tina Zhang, Jacob Pan,
    linux-doc@vger.kernel.org (open list:DOCUMENTATION),
    linux-kernel@vger.kernel.org (open list)
Cc: Roberto Sassu, petr@tesarici.cz, Petr Tesarik
Subject: [PATCH v1 4/8] sbm: x86: allocate and map an exception stack
Date: Wed, 14 Feb 2024 12:35:12 +0100
Message-Id: <20240214113516.2307-5-petrtesarik@huaweicloud.com>
In-Reply-To: <20240214113516.2307-1-petrtesarik@huaweicloud.com>
References: <20240214113516.2307-1-petrtesarik@huaweicloud.com>

From: Petr Tesarik

Sandbox mode should run with CPL 3. It is treated as user mode by the CPU, so
non-IST interrupts will load RSP from TSS. This is the trampoline stack and
that one is fine. However, the interrupt entry code then moves to the thread
stack, assuming that it cannot be currently in use, since the task was
executing in user mode. This assumption is not valid for sandbox mode, because
the code was originally called from kernel mode, and the thread stack contains
precious data of the sandbox mode callers.

Allocate a separate exception stack for sandbox and use it instead of the
thread stack in interrupt handlers while sandbox mode is active.

To find the sandbox exception stack from interrupt entry, store a pointer to
the state page in struct thread_info. This pointer is non-NULL if the current
task is running in sandbox mode. It is also non-NULL during the transition
from/to sandbox mode. The sandbox exception stack is valid in either case.
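The resulting stack selection can be summarized as follows; this is an
illustrative restatement of top_of_intr_stack() from the diff below, using
the fields added by this patch (the flow comments are descriptive, not code
from the patch):

/*
 * Non-IST interrupt taken while a task runs in sandbox mode (sketch):
 *   hardware: CPL 3 -> 0, RSP loaded from the TSS (trampoline stack)
 *   entry code: sync_regs() asks top_of_intr_stack() where to place pt_regs
 *     - thread_info->sbm_state != NULL -> sandbox exception stack (this patch)
 *     - thread_info->sbm_state == NULL -> thread stack (previous behaviour)
 */
static inline unsigned long top_of_intr_stack_sketch(void)
{
        struct x86_sbm_state *sbm = current_thread_info()->sbm_state;

        if (sbm)        /* in sandbox mode, or transitioning from/to it */
                return sbm->exc_stack + EXCEPTION_STKSZ;
        return current_top_of_stack();
}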
Signed-off-by: Petr Tesarik
---
 arch/x86/include/asm/sbm.h         | 24 ++++++++++++++++++++++
 arch/x86/include/asm/thread_info.h |  3 +++
 arch/x86/kernel/sbm/call_64.S      |  1 +
 arch/x86/kernel/sbm/core.c         | 21 ++++++++++++++++++++-
 arch/x86/kernel/traps.c            |  3 ++-
 5 files changed, 50 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/sbm.h b/arch/x86/include/asm/sbm.h
index ed214c17af06..ca4741b449e8 100644
--- a/arch/x86/include/asm/sbm.h
+++ b/arch/x86/include/asm/sbm.h
@@ -9,6 +9,8 @@
 #ifndef __ASM_SBM_H
 #define __ASM_SBM_H
 
+#include
+
 #if defined(CONFIG_HAVE_ARCH_SBM) && defined(CONFIG_SANDBOX_MODE)
 
 #include
@@ -17,6 +19,7 @@
  * struct x86_sbm_state - Run-time state of the environment.
  * @pgd: Sandbox mode page global directory.
  * @stack: Sandbox mode stack.
+ * @exc_stack: Exception and IRQ stack.
  *
  * One instance of this union is allocated for each sandbox and stored as SBM
  * instance private data.
@@ -24,8 +27,29 @@
 struct x86_sbm_state {
 	pgd_t *pgd;
 	unsigned long stack;
+	unsigned long exc_stack;
 };
 
+/**
+ * top_of_intr_stack() - Get address interrupt stack.
+ *
+ */
+static inline unsigned long top_of_intr_stack(void)
+{
+	struct x86_sbm_state *sbm = current_thread_info()->sbm_state;
+
+	if (sbm)
+		return sbm->exc_stack + EXCEPTION_STKSZ;
+	return current_top_of_stack();
+}
+
+#else /* defined(CONFIG_HAVE_ARCH_SBM) && defined(CONFIG_SANDBOX_MODE) */
+
+static inline unsigned long top_of_intr_stack(void)
+{
+	return current_top_of_stack();
+}
+
 #endif /* defined(CONFIG_HAVE_ARCH_SBM) && defined(CONFIG_SANDBOX_MODE) */
 
 #endif /* __ASM_SBM_H */
diff --git a/arch/x86/include/asm/thread_info.h b/arch/x86/include/asm/thread_info.h
index d63b02940747..95b1acffb78a 100644
--- a/arch/x86/include/asm/thread_info.h
+++ b/arch/x86/include/asm/thread_info.h
@@ -60,6 +60,9 @@ struct thread_info {
 #ifdef CONFIG_SMP
 	u32			cpu;		/* current CPU */
 #endif
+#ifdef CONFIG_SANDBOX_MODE
+	struct x86_sbm_state	*sbm_state;	/* SandBox mode state page */
+#endif
 };
 
 #define INIT_THREAD_INFO(tsk)			\
diff --git a/arch/x86/kernel/sbm/call_64.S b/arch/x86/kernel/sbm/call_64.S
index 245d0dddce73..1b232c8d15b7 100644
--- a/arch/x86/kernel/sbm/call_64.S
+++ b/arch/x86/kernel/sbm/call_64.S
@@ -9,6 +9,7 @@
 
 #include
 #include
+#include
 
 .code64
 .section .entry.text, "ax"
diff --git a/arch/x86/kernel/sbm/core.c b/arch/x86/kernel/sbm/core.c
index f3a123d64afc..81f1b0093537 100644
--- a/arch/x86/kernel/sbm/core.c
+++ b/arch/x86/kernel/sbm/core.c
@@ -264,6 +264,14 @@ int arch_sbm_init(struct sbm *sbm)
 	if (err)
 		return err;
 
+	state->exc_stack = __get_free_pages(GFP_KERNEL, EXCEPTION_STACK_ORDER);
+	if (err)
+		return err;
+	err = map_range(state, state->exc_stack,
+			state->exc_stack + EXCEPTION_STKSZ, PAGE_KERNEL);
+	if (err)
+		return err;
+
 	err = map_cpu_data(state);
 	if (err)
 		return err;
@@ -324,6 +332,7 @@ void arch_sbm_destroy(struct sbm *sbm)
 		free_pgd(state->pgd);
 		free_pages((unsigned long)state->pgd, PGD_ORDER);
 	}
+	free_pages(state->exc_stack, EXCEPTION_STACK_ORDER);
 	free_pages(state->stack, THREAD_SIZE_ORDER);
 	free_page((unsigned long)state);
 	sbm->private = NULL;
@@ -332,6 +341,16 @@ void arch_sbm_destroy(struct sbm *sbm)
 int arch_sbm_exec(struct sbm *sbm, sbm_func func, void *args)
 {
 	struct x86_sbm_state *state = sbm->private;
+	int err;
+
+	/* let interrupt handlers use the sandbox state page */
+	barrier();
+	WRITE_ONCE(current_thread_info()->sbm_state, state);
+
+	err = x86_sbm_exec(state, func, args, state->stack + THREAD_SIZE);
+
+	/* NULLify the state page pointer before it becomes stale */
+	WRITE_ONCE(current_thread_info()->sbm_state, NULL);
 
-	return x86_sbm_exec(state, func, args, state->stack + THREAD_SIZE);
+	return err;
 }
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index c3b2f863acf0..b9c9c74314e7 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -66,6 +66,7 @@
 #include
 #include
 #include
+#include
 
 #ifdef CONFIG_X86_64
 #include
@@ -773,7 +774,7 @@ DEFINE_IDTENTRY_RAW(exc_int3)
  */
 asmlinkage __visible noinstr struct pt_regs *sync_regs(struct pt_regs *eregs)
 {
-	struct pt_regs *regs = (struct pt_regs *)this_cpu_read(pcpu_hot.top_of_stack) - 1;
+	struct pt_regs *regs = (struct pt_regs *)top_of_intr_stack() - 1;
 	if (regs != eregs)
 		*regs = *eregs;
 	return regs;

From patchwork Wed Feb 14 11:35:13 2024
X-Patchwork-Submitter: Petr Tesarik
X-Patchwork-Id: 200918
[139.178.88.99]) by mx.google.com with ESMTPS id x185-20020a6363c2000000b00565db2812a0si3724558pgb.60.2024.02.14.03.38.01 for (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 14 Feb 2024 03:38:01 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel+bounces-65143-ouuuleilei=gmail.com@vger.kernel.org designates 139.178.88.99 as permitted sender) client-ip=139.178.88.99; Authentication-Results: mx.google.com; arc=pass (i=1 spf=pass spfdomain=huaweicloud.com); spf=pass (google.com: domain of linux-kernel+bounces-65143-ouuuleilei=gmail.com@vger.kernel.org designates 139.178.88.99 as permitted sender) smtp.mailfrom="linux-kernel+bounces-65143-ouuuleilei=gmail.com@vger.kernel.org" Received: from smtp.subspace.kernel.org (wormhole.subspace.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by sv.mirrors.kernel.org (Postfix) with ESMTPS id B612A28544A for ; Wed, 14 Feb 2024 11:37:56 +0000 (UTC) Received: from localhost.localdomain (localhost.localdomain [127.0.0.1]) by smtp.subspace.kernel.org (Postfix) with ESMTP id 587F71BC3F; Wed, 14 Feb 2024 11:37:13 +0000 (UTC) Received: from frasgout11.his.huawei.com (frasgout11.his.huawei.com [14.137.139.23]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 5EB431AACE; Wed, 14 Feb 2024 11:37:08 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=14.137.139.23 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1707910630; cv=none; b=plMhWzxq3sYYcPCA8+3dJBmyDr4dZb+CLDIGv1KnQ/skUDjujokH9+o2/0rLEKkIan/Qsfa9Z2RtmyNpML6cDPnCSs/VTvYUSf452UHybc3KFq/X8Y5md+9HhuEnhIMrfGCDrka0FpIbioEgPFpw5gWzX61LisMWw3QfKoe+V4k= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1707910630; c=relaxed/simple; bh=a2vh6qq7VHsS7NIXK2cTE7729KXe3Z0csZimJkYwPdw=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=gW16qYIr0U4wFn2yXViwS1+YlxsB8Yf8W3yaCCzeph21xP1HK/fwDhte/OBy8NjuInL3eB9l5nswJcxBm6cQLqnAmoq9ZJiKHiT1MHu8Qgqy7XfB5zc8xkFt2fIDNGPhAWMmdh0Zal9pbxLcrKRNP9UugLapWCadeliI/PKWFvg= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=huaweicloud.com; spf=pass smtp.mailfrom=huaweicloud.com; arc=none smtp.client-ip=14.137.139.23 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=huaweicloud.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=huaweicloud.com Received: from mail.maildlp.com (unknown [172.18.186.51]) by frasgout11.his.huawei.com (SkyGuard) with ESMTP id 4TZbPT2J9nz9ynSS; Wed, 14 Feb 2024 19:21:49 +0800 (CST) Received: from mail02.huawei.com (unknown [7.182.16.47]) by mail.maildlp.com (Postfix) with ESMTP id DB5521405A2; Wed, 14 Feb 2024 19:36:55 +0800 (CST) Received: from huaweicloud.com (unknown [10.45.156.69]) by APP1 (Coremail) with SMTP id LxC2BwAHshp7pcxlDJx9Ag--.51624S7; Wed, 14 Feb 2024 12:36:55 +0100 (CET) From: Petr Tesarik To: Jonathan Corbet , Thomas Gleixner , Ingo Molnar , Borislav Petkov , Dave Hansen , x86@kernel.org (maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT)), "H. Peter Anvin" , Andy Lutomirski , Oleg Nesterov , Peter Zijlstra , Xin Li , Arnd Bergmann , Andrew Morton , Rick Edgecombe , Kees Cook , "Masami Hiramatsu (Google)" , Pengfei Xu , Josh Poimboeuf , Ze Gao , "Kirill A. 
Shutemov" , Kai Huang , David Woodhouse , Brian Gerst , Jason Gunthorpe , Joerg Roedel , "Mike Rapoport (IBM)" , Tina Zhang , Jacob Pan , linux-doc@vger.kernel.org (open list:DOCUMENTATION), linux-kernel@vger.kernel.org (open list) Cc: Roberto Sassu , petr@tesarici.cz, Petr Tesarik Subject: [PATCH v1 5/8] sbm: x86: handle sandbox mode faults Date: Wed, 14 Feb 2024 12:35:13 +0100 Message-Id: <20240214113516.2307-6-petrtesarik@huaweicloud.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20240214113516.2307-1-petrtesarik@huaweicloud.com> References: <20240214113516.2307-1-petrtesarik@huaweicloud.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-CM-TRANSID: LxC2BwAHshp7pcxlDJx9Ag--.51624S7 X-Coremail-Antispam: 1UD129KBjvJXoW3Xw48KFW8Jr4DKFy5JFyrWFg_yoWDGryxpF 9rAFn5GFZxWa4SvF9xAr4vvrW3Aws5Kw1YkF9rKry5Z3W2q345Xr4v9w1qqr4kZ395W3WY gFW5Zrn5uan8Jw7anT9S1TB71UUUUUUqnTZGkaVYY2UrUUUUjbIjqfuFe4nvWSU5nxnvy2 9KBjDU0xBIdaVrnRJUUUml14x267AKxVWrJVCq3wAFc2x0x2IEx4CE42xK8VAvwI8IcIk0 rVWrJVCq3wAFIxvE14AKwVWUJVWUGwA2048vs2IY020E87I2jVAFwI0_JF0E3s1l82xGYI kIc2x26xkF7I0E14v26ryj6s0DM28lY4IEw2IIxxk0rwA2F7IY1VAKz4vEj48ve4kI8wA2 z4x0Y4vE2Ix0cI8IcVAFwI0_Gr0_Xr1l84ACjcxK6xIIjxv20xvEc7CjxVAFwI0_Gr1j6F 4UJwA2z4x0Y4vEx4A2jsIE14v26r4j6F4UM28EF7xvwVC2z280aVCY1x0267AKxVW8Jr0_ Cr1UM2AIxVAIcxkEcVAq07x20xvEncxIr21l5I8CrVACY4xI64kE6c02F40Ex7xfMcIj6x IIjxv20xvE14v26r106r15McIj6I8E87Iv67AKxVWUJVW8JwAm72CE4IkC6x0Yz7v_Jr0_ Gr1lF7xvr2IYc2Ij64vIr41lF7I21c0EjII2zVCS5cI20VAGYxC7M4IIrI8v6xkF7I0E8c xan2IY04v7MxkF7I0Ew4C26cxK6c8Ij28IcwCF04k20xvY0x0EwIxGrwCFx2IqxVCFs4IE 7xkEbVWUJVW8JwC20s026c02F40E14v26r1j6r18MI8I3I0E7480Y4vE14v26r106r1rMI 8E67AF67kF1VAFwI0_Wrv_Gr1UMIIYrxkI7VAKI48JMIIF0xvE2Ix0cI8IcVAFwI0_Gr0_ Xr1lIxAIcVC0I7IYx2IY6xkF7I0E14v26r4UJVWxJr1lIxAIcVCF04k26cxKx2IYs7xG6r 1j6r1xMIIF0xvEx4A2jsIE14v26r4j6F4UMIIF0xvEx4A2jsIEc7CjxVAFwI0_Gr1j6F4U JbIYCTnIWIevJa73UjIFyTuYvjfUnzVbDUUUU X-CM-SenderInfo: hshw23xhvd2x3n6k3tpzhluzxrxghudrp/ X-getmail-retrieved-from-mailbox: INBOX X-GMAIL-THRID: 1790874151089781656 X-GMAIL-MSGID: 1790874151089781656 From: Petr Tesarik Provide a fault handler for sandbox mode. Set the sandbox mode instance error code, abort the sandbox and return to the caller. To allow graceful return from a fatal fault, save all callee-saved registers (including the stack pointer) just before passing control to the target function. Modify the handlers for #PF and #DF CPU exceptions to call this handler if coming from sandbox mode. The check is based on the saved CS register, which should be modified in the entry path to a value that is otherwise not possible (__SBM_CS). For the page fault handler, make sure that sandbox mode check is placed before do_kern_addr_fault(). That function calls spurious_kernel_fault(), which implements lazy TLB invalidation of kernel pages and it assumes that the faulting instruction ran with kernel-mode page tables; it would produce false positives for sandbox mode. 
Signed-off-by: Petr Tesarik --- arch/x86/include/asm/ptrace.h | 21 +++++++++++++++++++++ arch/x86/include/asm/sbm.h | 24 ++++++++++++++++++++++++ arch/x86/include/asm/segment.h | 7 +++++++ arch/x86/kernel/asm-offsets.c | 5 +++++ arch/x86/kernel/sbm/call_64.S | 21 +++++++++++++++++++++ arch/x86/kernel/sbm/core.c | 26 ++++++++++++++++++++++++++ arch/x86/kernel/traps.c | 11 +++++++++++ arch/x86/mm/fault.c | 6 ++++++ 8 files changed, 121 insertions(+) diff --git a/arch/x86/include/asm/ptrace.h b/arch/x86/include/asm/ptrace.h index f4db78b09c8f..f66f16f037b0 100644 --- a/arch/x86/include/asm/ptrace.h +++ b/arch/x86/include/asm/ptrace.h @@ -164,6 +164,27 @@ static inline bool user_64bit_mode(struct pt_regs *regs) #endif } +/* + * sandbox_mode() - did a register set come from SandBox Mode? + * @regs: register set + */ +static inline bool sandbox_mode(struct pt_regs *regs) +{ +#ifdef CONFIG_X86_64 +#ifdef CONFIG_SANDBOX_MODE + /* + * SandBox Mode always runs in 64-bit and it is not implemented + * on paravirt systems, so this is the only possible value. + */ + return regs->cs == __SBM_CS; +#else /* !CONFIG_SANDBOX_MODE */ + return false; +#endif +#else /* !CONFIG_X86_64 */ + return false; +#endif +} + /* * Determine whether the register set came from any context that is running in * 64-bit mode. diff --git a/arch/x86/include/asm/sbm.h b/arch/x86/include/asm/sbm.h index ca4741b449e8..229b1ac3bbd4 100644 --- a/arch/x86/include/asm/sbm.h +++ b/arch/x86/include/asm/sbm.h @@ -11,23 +11,29 @@ #include +struct pt_regs; + #if defined(CONFIG_HAVE_ARCH_SBM) && defined(CONFIG_SANDBOX_MODE) #include /** * struct x86_sbm_state - Run-time state of the environment. + * @sbm: Link back to the SBM instance. * @pgd: Sandbox mode page global directory. * @stack: Sandbox mode stack. * @exc_stack: Exception and IRQ stack. + * @return_sp: Stack pointer for returning to kernel mode. * * One instance of this union is allocated for each sandbox and stored as SBM * instance private data. */ struct x86_sbm_state { + struct sbm *sbm; pgd_t *pgd; unsigned long stack; unsigned long exc_stack; + unsigned long return_sp; }; /** @@ -43,6 +49,18 @@ static inline unsigned long top_of_intr_stack(void) return current_top_of_stack(); } +/** + * handle_sbm_fault() - Handle a CPU fault in sandbox mode. + * @regs: Saved registers at fault. + * @error_code: CPU error code. + * @address: Fault address (CR2 register). + * + * Handle a sandbox mode fault. The caller should use sandbox_mode() to + * check that @regs came from sandbox mode before calling this function. 
+ */ +void handle_sbm_fault(struct pt_regs *regs, unsigned long error_code, + unsigned long address); + #else /* defined(CONFIG_HAVE_ARCH_SBM) && defined(CONFIG_SANDBOX_MODE) */ static inline unsigned long top_of_intr_stack(void) @@ -50,6 +68,12 @@ static inline unsigned long top_of_intr_stack(void) return current_top_of_stack(); } +static inline void handle_sbm_fault(struct pt_regs *regs, + unsigned long error_code, + unsigned long address) +{ +} + #endif /* defined(CONFIG_HAVE_ARCH_SBM) && defined(CONFIG_SANDBOX_MODE) */ #endif /* __ASM_SBM_H */ diff --git a/arch/x86/include/asm/segment.h b/arch/x86/include/asm/segment.h index 9d6411c65920..966831385d18 100644 --- a/arch/x86/include/asm/segment.h +++ b/arch/x86/include/asm/segment.h @@ -217,6 +217,13 @@ #define __USER_CS (GDT_ENTRY_DEFAULT_USER_CS*8 + 3) #define __CPUNODE_SEG (GDT_ENTRY_CPUNODE*8 + 3) +/* + * Sandbox runs with __USER_CS, but the interrupt entry code sets the RPL + * in the saved selector to zero to avoid user-mode processing (FPU, signal + * delivery, etc.). This is the resulting pseudo-CS. + */ +#define __SBM_CS (GDT_ENTRY_DEFAULT_USER_CS*8) + #endif #define IDT_ENTRIES 256 diff --git a/arch/x86/kernel/asm-offsets.c b/arch/x86/kernel/asm-offsets.c index 6913b372ccf7..44d4f0a0cb19 100644 --- a/arch/x86/kernel/asm-offsets.c +++ b/arch/x86/kernel/asm-offsets.c @@ -20,6 +20,7 @@ #include #include #include +#include #ifdef CONFIG_XEN #include @@ -120,4 +121,8 @@ static void __used common(void) OFFSET(ARIA_CTX_rounds, aria_ctx, rounds); #endif +#if defined(CONFIG_HAVE_ARCH_SBM) && defined(CONFIG_SANDBOX_MODE) + COMMENT("SandBox Mode"); + OFFSET(SBM_return_sp, x86_sbm_state, return_sp); +#endif } diff --git a/arch/x86/kernel/sbm/call_64.S b/arch/x86/kernel/sbm/call_64.S index 1b232c8d15b7..6a615b4f6047 100644 --- a/arch/x86/kernel/sbm/call_64.S +++ b/arch/x86/kernel/sbm/call_64.S @@ -22,6 +22,17 @@ * rcx .. top of sandbox stack */ SYM_FUNC_START(x86_sbm_exec) + /* save all callee-saved registers */ + push %rbp + push %rbx + push %r12 + push %r13 + push %r14 + push %r15 + + /* to be used by sandbox abort */ + mov %rsp, SBM_return_sp(%rdi) + /* * Set up the sandbox stack: * 1. 
Store the old stack pointer at the top of the sandbox stack, @@ -37,5 +48,15 @@ SYM_FUNC_START(x86_sbm_exec) pop %rsp +SYM_INNER_LABEL(x86_sbm_return, SYM_L_GLOBAL) + ANNOTATE_NOENDBR // IRET target via x86_sbm_fault() + + /* restore callee-saved registers and return */ + pop %r15 + pop %r14 + pop %r13 + pop %r12 + pop %rbx + pop %rbp RET SYM_FUNC_END(x86_sbm_exec) diff --git a/arch/x86/kernel/sbm/core.c b/arch/x86/kernel/sbm/core.c index 81f1b0093537..d4c378847e93 100644 --- a/arch/x86/kernel/sbm/core.c +++ b/arch/x86/kernel/sbm/core.c @@ -13,6 +13,8 @@ #include #include #include +#include +#include #include #include #include @@ -23,6 +25,7 @@ asmlinkage int x86_sbm_exec(struct x86_sbm_state *state, sbm_func func, void *args, unsigned long sbm_tos); +extern char x86_sbm_return[]; static inline phys_addr_t page_to_ptval(struct page *page) { @@ -343,6 +346,8 @@ int arch_sbm_exec(struct sbm *sbm, sbm_func func, void *args) struct x86_sbm_state *state = sbm->private; int err; + state->sbm = sbm; + /* let interrupt handlers use the sandbox state page */ barrier(); WRITE_ONCE(current_thread_info()->sbm_state, state); @@ -354,3 +359,24 @@ int arch_sbm_exec(struct sbm *sbm, sbm_func func, void *args) return err; } + +void handle_sbm_fault(struct pt_regs *regs, unsigned long error_code, + unsigned long address) +{ + struct x86_sbm_state *state = current_thread_info()->sbm_state; + + /* + * Force -EFAULT unless the fault was due to a user-mode instruction + * fetch from the designated return address. + */ + if (error_code != (X86_PF_PROT | X86_PF_USER | X86_PF_INSTR) || + address != (unsigned long)x86_sbm_return) + state->sbm->error = -EFAULT; + + /* modify IRET frame to exit from sandbox */ + regs->ip = (unsigned long)x86_sbm_return; + regs->cs = __KERNEL_CS; + regs->flags = X86_EFLAGS_IF; + regs->sp = state->return_sp; + regs->ss = __KERNEL_DS; +} diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c index b9c9c74314e7..8fc5b17b8fb4 100644 --- a/arch/x86/kernel/traps.c +++ b/arch/x86/kernel/traps.c @@ -416,6 +416,12 @@ DEFINE_IDTENTRY_DF(exc_double_fault) irqentry_nmi_enter(regs); instrumentation_begin(); + + if (sandbox_mode(regs)) { + handle_sbm_fault(regs, error_code, 0); + return; + } + notify_die(DIE_TRAP, str, regs, error_code, X86_TRAP_DF, SIGSEGV); tsk->thread.error_code = error_code; @@ -675,6 +681,11 @@ DEFINE_IDTENTRY_ERRORCODE(exc_general_protection) goto exit; } + if (sandbox_mode(regs)) { + handle_sbm_fault(regs, error_code, 0); + return; + } + if (gp_try_fixup_and_notify(regs, X86_TRAP_GP, error_code, desc, 0)) goto exit; diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c index 679b09cfe241..f223b258e53f 100644 --- a/arch/x86/mm/fault.c +++ b/arch/x86/mm/fault.c @@ -34,6 +34,7 @@ #include /* kvm_handle_async_pf */ #include /* fixup_vdso_exception() */ #include +#include #define CREATE_TRACE_POINTS #include @@ -1500,6 +1501,11 @@ handle_page_fault(struct pt_regs *regs, unsigned long error_code, if (unlikely(kmmio_fault(regs, address))) return; + if (sandbox_mode(regs)) { + handle_sbm_fault(regs, error_code, address); + return; + } + /* Was the fault on kernel-controlled part of the address space? 
*/ if (unlikely(fault_in_kernel_space(address))) { do_kern_addr_fault(regs, error_code, address); From patchwork Wed Feb 14 11:35:14 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Petr Tesarik X-Patchwork-Id: 200919 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a05:7300:bc8a:b0:106:860b:bbdd with SMTP id dn10csp1147883dyb; Wed, 14 Feb 2024 03:38:14 -0800 (PST) X-Forwarded-Encrypted: i=3; AJvYcCXjRa7Uv9w5a4ihNoWUzes29acZ0NhZnXtmZDPhGh3s6Cy/Uvs6lEHzhWGioli+pCmaGFOngns6iiz5llwlBh4aEQPMiA== X-Google-Smtp-Source: AGHT+IG491tOGkLrftpzTpZ0EqKKxvTbUYKwUFd8vXQi8Yep28JmVUofj3Ll8ZvtAp3a/7VpXo/H X-Received: by 2002:a9d:6244:0:b0:6e2:e26a:8587 with SMTP id i4-20020a9d6244000000b006e2e26a8587mr2573296otk.33.1707910694608; Wed, 14 Feb 2024 03:38:14 -0800 (PST) ARC-Seal: i=2; a=rsa-sha256; t=1707910694; cv=pass; d=google.com; s=arc-20160816; b=hLjvUHtqUjmM/cMkjVIwan+jJtpJULhJ9132WC4IheUFLEwGAKMgDhe4z+96FeDcKd vy8aQALR3wqWHnne7nzd5z3a0aQHkoSh70RPXMK/MoghfD5y/XO72nCPO7a0jWr9TZQf MBxAFk33lgx+P/yWB+3D/8Fz/aQ9RwZC3y34KF/mnZlzZAyk0/XivfMrfN5Ze0/zp2on Q+aDgC9bJVZm+FLqoQcVL7Z0CRssgFu/DjFxnqcV3DpCXLhrUXYTbmBYfxtyDmfktHsj u6zIWwlcGoKJ902psW4kmio93TDuMfN2JkWImPK2FkuK4KVc9pJI4XYS4gESTwEHNaeK ayFw== ARC-Message-Signature: i=2; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=content-transfer-encoding:mime-version:list-unsubscribe :list-subscribe:list-id:precedence:references:in-reply-to:message-id :date:subject:cc:to:from; bh=T6K2mdMb2xwVy+9I66bZTMyHy7mz8+ayx9XxjipRXkU=; fh=zFccAji+AX/37ga4Y2V1AjH8KfAx4tcvCwGuzoNFICU=; b=XOdMtkdktgYNjOhNYScQT5Qhc8RJ+C0ZlZzg0j9w0czfIVWGLMDpm12cwitdzk0a/q ljJDnao8SUWkQMYYQ9JQVeejibUXo+wIfXAzhc8RifxE7XMCfte4hdo2a9mUBY/XQDEY z27MrTuhv0oTkIuX+FNBtlYRqcexkNrECmkDLE60lQ9FYp6wWrgP0gmxU/JpcCMpsOGB PpuOl3u86o3IoPRVeEHWxn0YBuneDt0i31RItj/zQIE6JmN+YCBhDQYGHmBjfg3neATV wWuJovRRongrT/mr4FOB8x7SczydRhr0k0Xgmn2KonFiVNKTF1ko+DVp6PQQ1lnvP1ZK unqA==; dara=google.com ARC-Authentication-Results: i=2; mx.google.com; arc=pass (i=1 spf=pass spfdomain=huaweicloud.com); spf=pass (google.com: domain of linux-kernel+bounces-65144-ouuuleilei=gmail.com@vger.kernel.org designates 139.178.88.99 as permitted sender) smtp.mailfrom="linux-kernel+bounces-65144-ouuuleilei=gmail.com@vger.kernel.org" X-Forwarded-Encrypted: i=2; AJvYcCUh2HHzHIT8PpGr9qUMMFUpLqgsQZCTImy96ckh9O0cgXePaj7eqqHo4pj5miPhEMJZ+aKblQPKVkAmsgHEFSf/FI3PBw== Received: from sv.mirrors.kernel.org (sv.mirrors.kernel.org. 
[139.178.88.99]) by mx.google.com with ESMTPS id l184-20020a6388c1000000b005dc5061bd7fsi3491064pgd.557.2024.02.14.03.38.14 for (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 14 Feb 2024 03:38:14 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel+bounces-65144-ouuuleilei=gmail.com@vger.kernel.org designates 139.178.88.99 as permitted sender) client-ip=139.178.88.99; Authentication-Results: mx.google.com; arc=pass (i=1 spf=pass spfdomain=huaweicloud.com); spf=pass (google.com: domain of linux-kernel+bounces-65144-ouuuleilei=gmail.com@vger.kernel.org designates 139.178.88.99 as permitted sender) smtp.mailfrom="linux-kernel+bounces-65144-ouuuleilei=gmail.com@vger.kernel.org" Received: from smtp.subspace.kernel.org (wormhole.subspace.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by sv.mirrors.kernel.org (Postfix) with ESMTPS id 4B079285075 for ; Wed, 14 Feb 2024 11:38:14 +0000 (UTC) Received: from localhost.localdomain (localhost.localdomain [127.0.0.1]) by smtp.subspace.kernel.org (Postfix) with ESMTP id DF12E1BF28; Wed, 14 Feb 2024 11:37:19 +0000 (UTC) Received: from frasgout13.his.huawei.com (frasgout13.his.huawei.com [14.137.139.46]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 30E511BC4F; Wed, 14 Feb 2024 11:37:13 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=14.137.139.46 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1707910636; cv=none; b=LyBxqH0u8wsdVDIuQye0gEdkHJM4+2GiRGwhHr3H9/tt478OL295cEb+8nvEQuVRCu9g9q3vXPqyGBiT+BKpssnNUwNMIcIcakXXNnXCyDQRprIn6QRvo3qHUQEskLGkd7UulEYM1ReQ079OQ7qEMJTSwHMyyD1TRNaAay6BueU= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1707910636; c=relaxed/simple; bh=3KIpfTvqNgYyjl5w8d86enZgWo8Yqf3eaTAdxYPmWyU=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=qHBskDOHw529uH7CDI2cmUovzcNC+poIcAtXYpp7W2Y6miDR8eV2uoyiYKrDyiwkVG9/pgAeDvYRfd3plO3rliP06xDqTswQWTMq73jVNeHgWt6sFqHFUe+Y5IdlM00XTnU70Hj59HRltRsiymLuZ9enJ6VnnoBW55e4vCMAtTA= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=huaweicloud.com; spf=pass smtp.mailfrom=huaweicloud.com; arc=none smtp.client-ip=14.137.139.46 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=huaweicloud.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=huaweicloud.com Received: from mail.maildlp.com (unknown [172.18.186.51]) by frasgout13.his.huawei.com (SkyGuard) with ESMTP id 4TZbPc5qBjz9yMLF; Wed, 14 Feb 2024 19:21:56 +0800 (CST) Received: from mail02.huawei.com (unknown [7.182.16.47]) by mail.maildlp.com (Postfix) with ESMTP id F3FBE1404FC; Wed, 14 Feb 2024 19:37:10 +0800 (CST) Received: from huaweicloud.com (unknown [10.45.156.69]) by APP1 (Coremail) with SMTP id LxC2BwAHshp7pcxlDJx9Ag--.51624S8; Wed, 14 Feb 2024 12:37:10 +0100 (CET) From: Petr Tesarik To: Jonathan Corbet , Thomas Gleixner , Ingo Molnar , Borislav Petkov , Dave Hansen , x86@kernel.org (maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT)), "H. Peter Anvin" , Andy Lutomirski , Oleg Nesterov , Peter Zijlstra , Xin Li , Arnd Bergmann , Andrew Morton , Rick Edgecombe , Kees Cook , "Masami Hiramatsu (Google)" , Pengfei Xu , Josh Poimboeuf , Ze Gao , "Kirill A. 
Shutemov" , Kai Huang , David Woodhouse , Brian Gerst , Jason Gunthorpe , Joerg Roedel , "Mike Rapoport (IBM)" , Tina Zhang , Jacob Pan , linux-doc@vger.kernel.org (open list:DOCUMENTATION), linux-kernel@vger.kernel.org (open list) Cc: Roberto Sassu , petr@tesarici.cz, Petr Tesarik Subject: [PATCH v1 6/8] sbm: x86: switch to sandbox mode pages in arch_sbm_exec() Date: Wed, 14 Feb 2024 12:35:14 +0100 Message-Id: <20240214113516.2307-7-petrtesarik@huaweicloud.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20240214113516.2307-1-petrtesarik@huaweicloud.com> References: <20240214113516.2307-1-petrtesarik@huaweicloud.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-CM-TRANSID: LxC2BwAHshp7pcxlDJx9Ag--.51624S8 X-Coremail-Antispam: 1UD129KBjvAXoW3uw4kWFy7Wr47tr1fGFWDXFb_yoW8Gr15Ko WagF43Kr4xJr9I9a4kAr18Ka4FqFyvqw4kX3WYyw4YvF9xJan5Xry8Gan0y34ruF1Ygwsx Z3y3WFy7Kan2qwnxn29KB7ZKAUJUUUUU529EdanIXcx71UUUUU7v73VFW2AGmfu7bjvjm3 AaLaJ3UjIYCTnIWjp_UUUOT7AC8VAFwI0_Wr0E3s1l1xkIjI8I6I8E6xAIw20EY4v20xva j40_Wr0E3s1l1IIY67AEw4v_Jr0_Jr4l82xGYIkIc2x26280x7IE14v26r126s0DM28Irc Ia0xkI8VCY1x0267AKxVW5JVCq3wA2ocxC64kIII0Yj41l84x0c7CEw4AK67xGY2AK021l 84ACjcxK6xIIjxv20xvE14v26r4j6ryUM28EF7xvwVC0I7IYx2IY6xkF7I0E14v26r4UJV WxJr1l84ACjcxK6I8E87Iv67AKxVW8JVWxJwA2z4x0Y4vEx4A2jsIEc7CjxVAFwI0_Gr1j 6F4UJwAS0I0E0xvYzxvE52x082IY62kv0487Mc02F40EFcxC0VAKzVAqx4xG6I80ewAv7V C0I7IYx2IY67AKxVWUGVWUXwAv7VC2z280aVAFwI0_Jr0_Gr1lOx8S6xCaFVCjc4AY6r1j 6r4UM4x0Y48IcxkI7VAKI48JM4x0x7Aq67IIx4CEVc8vx2IErcIFxwACI402YVCY1x0262 8vn2kIc2xKxwCY1x0264kExVAvwVAq07x20xyl42xK82IYc2Ij64vIr41l4I8I3I0E4IkC 6x0Yz7v_Jr0_Gr1lx2IqxVAqx4xG67AKxVWUJVWUGwC20s026x8GjcxK67AKxVWUGVWUWw C2zVAF1VAY17CE14v26rWY6r4UJwCIc40Y0x0EwIxGrwCI42IY6xIIjxv20xvE14v26r4j 6ryUMIIF0xvE2Ix0cI8IcVCY1x0267AKxVW8Jr0_Cr1UMIIF0xvE42xK8VAvwI8IcIk0rV WUJVWUCwCI42IY6I8E87Iv67AKxVW8JVWxJwCI42IY6I8E87Iv6xkF7I0E14v26r4UJVWx JrUvcSsGvfC2KfnxnUUI43ZEXa7VUjyCJPUUUUU== X-CM-SenderInfo: hshw23xhvd2x3n6k3tpzhluzxrxghudrp/ X-getmail-retrieved-from-mailbox: INBOX X-GMAIL-THRID: 1790874164656271588 X-GMAIL-MSGID: 1790874164656271588 From: Petr Tesarik Change CR3 to the sandbox page tables while running in sandbox mode. Since interrupts are enabled, but interrupt handlers cannot run in sandbox mode, modify the interrupt entry and exit paths to leave/reenter sandbox mode as needed. For now, these modifications are not implemented for Xen PV, so add a conflict with CONFIG_XEN_PV. For interrupt entry, save the kernel-mode CR3 value in a dynamically allocated state page and map this page to a fixed virtual address in sandbox mode, so it can be found without relying on any CPU state other than paging. In kernel-mode, this address maps to a zero-filled page in the kernel BSS section. Special care is needed to flush the TLB when entering and leaving sandbox mode, because it changes the mapping of kernel addresses. Kernel page table entries are marked as global and thus normally not flushed when a new value is written to CR3. To flush them, turn off the PGE bit in CR4 when entering sandbox mode (and restore CR4.PGE when leaving sandbox mode). Albeit not very efficient, this method is simple and works. 
Signed-off-by: Petr Tesarik --- arch/x86/Kconfig | 2 +- arch/x86/entry/entry_64.S | 136 ++++++++++++++++++++++++++++++++++ arch/x86/include/asm/sbm.h | 4 + arch/x86/kernel/asm-offsets.c | 5 ++ arch/x86/kernel/sbm/call_64.S | 48 ++++++++++-- arch/x86/kernel/sbm/core.c | 23 +++++- 6 files changed, 209 insertions(+), 9 deletions(-) diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index 090d46c7ee7c..e6ee1d3a273b 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -188,7 +188,7 @@ config X86 select HAVE_ARCH_MMAP_RND_COMPAT_BITS if MMU && COMPAT select HAVE_ARCH_COMPAT_MMAP_BASES if MMU && COMPAT select HAVE_ARCH_PREL32_RELOCATIONS - select HAVE_ARCH_SBM if X86_64 + select HAVE_ARCH_SBM if X86_64 && !XEN_PV select HAVE_ARCH_SECCOMP_FILTER select HAVE_ARCH_THREAD_STRUCT_WHITELIST select HAVE_ARCH_STACKLEAK diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S index c40f89ab1b4c..e1364115408a 100644 --- a/arch/x86/entry/entry_64.S +++ b/arch/x86/entry/entry_64.S @@ -623,6 +623,23 @@ SYM_INNER_LABEL(restore_regs_and_return_to_kernel, SYM_L_GLOBAL) ud2 1: #endif + +#ifdef CONFIG_SANDBOX_MODE + /* Restore CR3 if the exception came from sandbox mode. */ + cmpw $__SBM_CS, CS(%rsp) + jne .Lreturn_cr3_done + + movq PER_CPU_VAR(pcpu_hot + X86_current_task), %rcx + movq TASK_sbm_state(%rcx), %rcx + movq SBM_sbm_cr3(%rcx), %rcx + movq %cr4, %rax + andb $~X86_CR4_PGE, %al + movq %rax, %cr4 + movq %rcx, %cr3 + orb $3, CS(%rsp) +#endif + +.Lreturn_cr3_done: POP_REGS addq $8, %rsp /* skip regs->orig_ax */ /* @@ -867,6 +884,27 @@ SYM_CODE_START(paranoid_entry) PUSH_AND_CLEAR_REGS save_ret=1 ENCODE_FRAME_POINTER 8 +#ifdef CONFIG_SANDBOX_MODE + /* + * If sandbox mode was active, adjust the saved CS, switch to + * kernel CR3 and skip non-sandbox CR3 handling. Save old CR3 + * in %r14 even if not using PAGE_TABLE_ISOLATION. This is + * needed during transition to sandbox mode, when CR3 is already + * set, but CS is still __KERNEL_CS. + */ + movq x86_sbm_state + SBM_kernel_cr3, %rcx + jrcxz .Lparanoid_switch_to_kernel + + movq %cr3, %r14 + andb $~3, CS+8(%rsp) + movq %cr4, %rax + orb $X86_CR4_PGE, %al + movq %rax, %cr4 + movq %rcx, %cr3 + jmp .Lparanoid_gsbase +#endif + +.Lparanoid_switch_to_kernel: /* * Always stash CR3 in %r14. This value will be restored, * verbatim, at exit. Needed if paranoid_entry interrupted @@ -884,6 +922,7 @@ SYM_CODE_START(paranoid_entry) */ SAVE_AND_SWITCH_TO_KERNEL_CR3 scratch_reg=%rax save_reg=%r14 +.Lparanoid_gsbase: /* * Handling GSBASE depends on the availability of FSGSBASE. * @@ -967,6 +1006,22 @@ SYM_CODE_START_LOCAL(paranoid_exit) */ IBRS_EXIT save_reg=%r15 +#ifdef CONFIG_SANDBOX_MODE + /* + * When returning to sandbox mode, the sandbox CR3 is restored in + * restore_regs_and_return_to_kernel. When returning to kernel mode, + * but sandbox mode is active, restore CR3 from %r14 here. + */ + cmpw $__SBM_CS, CS(%rsp) + je .Lparanoid_exit_fsgs + movq PER_CPU_VAR(pcpu_hot + X86_current_task), %rax + cmpq $0, TASK_sbm_state(%rax) + je .Lparanoid_exit_no_sbm + movq %r14, %cr3 + jmp .Lparanoid_exit_fsgs +#endif + +.Lparanoid_exit_no_sbm: /* * The order of operations is important. RESTORE_CR3 requires * kernel GSBASE. 
@@ -977,6 +1032,7 @@ SYM_CODE_START_LOCAL(paranoid_exit) */ RESTORE_CR3 scratch_reg=%rax save_reg=%r14 +.Lparanoid_exit_fsgs: /* Handle the three GSBASE cases */ ALTERNATIVE "jmp .Lparanoid_exit_checkgs", "", X86_FEATURE_FSGSBASE @@ -1007,6 +1063,24 @@ SYM_CODE_START(error_entry) testb $3, CS+8(%rsp) jz .Lerror_kernelspace +#ifdef CONFIG_SANDBOX_MODE + /* + * If sandbox mode was active, adjust the saved CS, + * unconditionally switch to kernel CR3 and continue + * as if the interrupt was from kernel space. + */ + movq x86_sbm_state + SBM_kernel_cr3, %rcx + jrcxz .Lerror_swapgs + + andb $~3, CS+8(%rsp) + movq %cr4, %rax + orb $X86_CR4_PGE, %al + movq %rax, %cr4 + movq %rcx, %cr3 + jmp .Lerror_entry_done_lfence +#endif + +.Lerror_swapgs: /* * We entered from user mode or we're pretending to have entered * from user mode due to an IRET fault. @@ -1149,6 +1223,11 @@ SYM_CODE_START(asm_exc_nmi) testb $3, CS-RIP+8(%rsp) jz .Lnmi_from_kernel +#ifdef CONFIG_SANDBOX_MODE + cmpq $0, x86_sbm_state + SBM_kernel_cr3 + jne .Lnmi_from_sbm +#endif + /* * NMI from user mode. We need to run on the thread stack, but we * can't go through the normal entry paths: NMIs are masked, and @@ -1194,6 +1273,47 @@ SYM_CODE_START(asm_exc_nmi) */ jmp swapgs_restore_regs_and_return_to_usermode +#ifdef CONFIG_SANDBOX_MODE +.Lnmi_from_sbm: + /* + * If NMI from sandbox mode, this cannot be a nested NMI. Adjust + * saved CS, load kernel CR3 and continue on the sandbox exception + * stack. The code is similar to NMI from user mode. + */ + andb $~3, CS-RIP+8(%rsp) + movq %cr4, %rdx + orb $X86_CR4_PGE, %dl + movq %rdx, %cr4 + movq x86_sbm_state + SBM_kernel_cr3, %rdx + movq %rdx, %cr3 + + movq PER_CPU_VAR(pcpu_hot + X86_current_task), %rdx + movq TASK_sbm_state(%rdx), %rdx + movq SBM_exc_stack(%rdx), %rdx + addq $EXCEPTION_STKSZ, %rdx + xchgq %rsp, %rdx + UNWIND_HINT_IRET_REGS base=%rdx offset=8 + pushq 5*8(%rdx) /* pt_regs->ss */ + pushq 4*8(%rdx) /* pt_regs->rsp */ + pushq 3*8(%rdx) /* pt_regs->flags */ + pushq 2*8(%rdx) /* pt_regs->cs */ + pushq 1*8(%rdx) /* pt_regs->rip */ + UNWIND_HINT_IRET_REGS + pushq $-1 /* pt_regs->orig_ax */ + PUSH_AND_CLEAR_REGS rdx=(%rdx) + ENCODE_FRAME_POINTER + + FENCE_SWAPGS_KERNEL_ENTRY + movq %rsp, %rdi + call exc_nmi + + /* + * Take the kernel return path. This will take care of restoring + * CR3 and return CS. + */ + jmp restore_regs_and_return_to_kernel +#endif + .Lnmi_from_kernel: /* * Here's what our stack frame will look like: @@ -1404,6 +1524,22 @@ end_repeat_nmi: /* Always restore stashed SPEC_CTRL value (see paranoid_entry) */ IBRS_EXIT save_reg=%r15 +#ifdef CONFIG_SANDBOX_MODE + /* + * Always restore saved CR3 when sandbox mode is active. This is + * needed if an NMI occurs during transition to sandbox mode. + */ + movq PER_CPU_VAR(pcpu_hot + X86_current_task), %rcx + movq TASK_sbm_state(%rcx), %rcx + jrcxz nmi_no_sbm + + movq %cr4, %rax + andb $~X86_CR4_PGE, %al + movq %rax, %cr4 + movq %r14, %cr3 +#endif + +nmi_no_sbm: /* Always restore stashed CR3 value (see paranoid_entry) */ RESTORE_CR3 scratch_reg=%r15 save_reg=%r14 diff --git a/arch/x86/include/asm/sbm.h b/arch/x86/include/asm/sbm.h index 229b1ac3bbd4..e38dcf9a8017 100644 --- a/arch/x86/include/asm/sbm.h +++ b/arch/x86/include/asm/sbm.h @@ -24,6 +24,8 @@ struct pt_regs; * @stack: Sandbox mode stack. * @exc_stack: Exception and IRQ stack. * @return_sp: Stack pointer for returning to kernel mode. + * @kernel_cr3: Kernel mode CR3 value. + * @sbm_cr3: Sandbox mode CR3 value. 
* * One instance of this union is allocated for each sandbox and stored as SBM * instance private data. @@ -34,6 +36,8 @@ struct x86_sbm_state { unsigned long stack; unsigned long exc_stack; unsigned long return_sp; + unsigned long kernel_cr3; + unsigned long sbm_cr3; }; /** diff --git a/arch/x86/kernel/asm-offsets.c b/arch/x86/kernel/asm-offsets.c index 44d4f0a0cb19..cc2751822532 100644 --- a/arch/x86/kernel/asm-offsets.c +++ b/arch/x86/kernel/asm-offsets.c @@ -17,6 +17,7 @@ #include #include #include +#include #include #include #include @@ -123,6 +124,10 @@ static void __used common(void) #if defined(CONFIG_HAVE_ARCH_SBM) && defined(CONFIG_SANDBOX_MODE) COMMENT("SandBox Mode"); + OFFSET(TASK_sbm_state, task_struct, thread_info.sbm_state); + OFFSET(SBM_exc_stack, x86_sbm_state, exc_stack); OFFSET(SBM_return_sp, x86_sbm_state, return_sp); + OFFSET(SBM_kernel_cr3, x86_sbm_state, kernel_cr3); + OFFSET(SBM_sbm_cr3, x86_sbm_state, sbm_cr3); #endif } diff --git a/arch/x86/kernel/sbm/call_64.S b/arch/x86/kernel/sbm/call_64.S index 6a615b4f6047..8b2b524c5b46 100644 --- a/arch/x86/kernel/sbm/call_64.S +++ b/arch/x86/kernel/sbm/call_64.S @@ -10,6 +10,8 @@ #include #include #include +#include +#include .code64 .section .entry.text, "ax" @@ -20,6 +22,7 @@ * rsi .. func * rdx .. args * rcx .. top of sandbox stack + * r8 .. top of exception stack */ SYM_FUNC_START(x86_sbm_exec) /* save all callee-saved registers */ @@ -29,6 +32,7 @@ SYM_FUNC_START(x86_sbm_exec) push %r13 push %r14 push %r15 + UNWIND_HINT_SAVE /* to be used by sandbox abort */ mov %rsp, SBM_return_sp(%rdi) @@ -38,18 +42,50 @@ SYM_FUNC_START(x86_sbm_exec) * 1. Store the old stack pointer at the top of the sandbox stack, * where various unwinders can find it and link back to the * kernel stack. + * 2. Put a return address at the top of the sandbox stack. Although + * the target code is not executable in sandbox mode, the page + * fault handler can check the fault address to know that the + * target function returned. */ - sub $8, %rcx - mov %rsp, (%rcx) - mov %rcx, %rsp + sub $8 * 2, %rcx + mov %rsp, 8(%rcx) + movq $x86_sbm_return, (%rcx) - mov %rdx, %rdi /* args */ - CALL_NOSPEC rsi + /* + * Move to the sandbox exception stack. + * This stack is mapped as writable supervisor pages both in kernel + * mode and in sandbox mode, so it survives a CR3 change. + */ + sub $8, %r8 + mov %rsp, (%r8) + mov %r8, %rsp + + /* set up the IRET frame */ + pushq $__USER_DS + push %rcx + pushfq + pushq $__USER_CS + push %rsi - pop %rsp + /* + * Switch to sandbox address space. Interrupt handlers cannot cope + * with sandbox CR3 in kernel mode. Disable interrupts before setting + * CR4, because if this task gets preempted, global pages would stay + * disabled, which is really bad for performance. + * The NMI handler takes extra care to restore CR3 and CR4. 
+ */ + mov SBM_sbm_cr3(%rdi), %r11 + mov %cr4, %rax + and $~X86_CR4_PGE, %al + mov %rdx, %rdi /* args */ + cli + mov %rax, %cr4 + mov %r11, %cr3 + iretq SYM_INNER_LABEL(x86_sbm_return, SYM_L_GLOBAL) ANNOTATE_NOENDBR // IRET target via x86_sbm_fault() + UNWIND_HINT_RESTORE /* restore callee-saved registers and return */ pop %r15 diff --git a/arch/x86/kernel/sbm/core.c b/arch/x86/kernel/sbm/core.c index d4c378847e93..0ea193550a83 100644 --- a/arch/x86/kernel/sbm/core.c +++ b/arch/x86/kernel/sbm/core.c @@ -24,9 +24,15 @@ #define PGD_ORDER get_order(sizeof(pgd_t) * PTRS_PER_PGD) asmlinkage int x86_sbm_exec(struct x86_sbm_state *state, sbm_func func, - void *args, unsigned long sbm_tos); + void *args, unsigned long sbm_tos, + unsigned long exc_tos); extern char x86_sbm_return[]; +union { + struct x86_sbm_state state; + char page[PAGE_SIZE]; +} x86_sbm_state __page_aligned_bss; + static inline phys_addr_t page_to_ptval(struct page *page) { return PFN_PHYS(page_to_pfn(page)) | _PAGE_TABLE; @@ -279,6 +285,12 @@ int arch_sbm_init(struct sbm *sbm) if (err) return err; + BUILD_BUG_ON(sizeof(x86_sbm_state) != PAGE_SIZE); + err = map_page(state, (unsigned long)&x86_sbm_state, + PHYS_PFN(__pa(state)), PAGE_KERNEL); + if (err < 0) + return err; + return 0; } @@ -348,11 +360,18 @@ int arch_sbm_exec(struct sbm *sbm, sbm_func func, void *args) state->sbm = sbm; + /* save current (kernel) CR3 for interrupt entry path */ + state->kernel_cr3 = __read_cr3(); + + /* CR3 while running in sandbox */ + state->sbm_cr3 = __sme_pa(state->pgd); + /* let interrupt handlers use the sandbox state page */ barrier(); WRITE_ONCE(current_thread_info()->sbm_state, state); - err = x86_sbm_exec(state, func, args, state->stack + THREAD_SIZE); + err = x86_sbm_exec(state, func, args, state->stack + THREAD_SIZE, + state->exc_stack + EXCEPTION_STKSZ); /* NULLify the state page pointer before it becomes stale */ WRITE_ONCE(current_thread_info()->sbm_state, NULL); From patchwork Wed Feb 14 11:35:15 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Petr Tesarik X-Patchwork-Id: 200920 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a05:7300:bc8a:b0:106:860b:bbdd with SMTP id dn10csp1148028dyb; Wed, 14 Feb 2024 03:38:29 -0800 (PST) X-Forwarded-Encrypted: i=3; AJvYcCVhwNWEbHlPpmWQhAmFCMu0HQtaQKsxl8R6CRZxA6zKHaoweRDPZOcwUXr6fCANHTZ+uAhlrYvjwRHiLP5oUiLU2FKHfg== X-Google-Smtp-Source: AGHT+IFcS3y8S31GB+jxqSF9uokFRoxjvw3lNkAinvorCWkWnJfAl5Bb1q1tyomykL6nQOLDPzmG X-Received: by 2002:a05:6358:3a15:b0:17a:d4ab:d94f with SMTP id g21-20020a0563583a1500b0017ad4abd94fmr2119017rwe.15.1707910709208; Wed, 14 Feb 2024 03:38:29 -0800 (PST) ARC-Seal: i=2; a=rsa-sha256; t=1707910709; cv=pass; d=google.com; s=arc-20160816; b=uJU3BFik6lpOUPxtOdsPGVVTbL/WYhAEi955+3nxnIeM6h6r2pc+7nIegNZ0/zIjsj BNl/GnTckKtt07wuN5m5vbpB2PglUbi2b9yuSu+hU8Y4iEQVg0/KYVk9FhdK1rse+Ta+ s2m0dlHNx2W99cf45KjZVZo471wrO/pWSRoAJeEm+BqCUxTFPZw2ArRRxHbNzg6Rx5OU IsPzWur+HBBy8UYs3fH+krY0lWe8dhGUT/ElmuV/7XHu6nVqTnX9cH4rG4+eCz2S8kKk 83cpK5MIAghxLI7DnqmIhEJ6k8VlzolZApnfxXxvlBRePMhbGw81rYLTzrqcfCOPRFDR dgVA== ARC-Message-Signature: i=2; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=content-transfer-encoding:mime-version:list-unsubscribe :list-subscribe:list-id:precedence:references:in-reply-to:message-id :date:subject:cc:to:from; bh=9wsA5mQIXaxZnhgxbiU6KEyh+iI6a5sZ9nNRRen656Y=; fh=bUgqIeav8vh9CSRH5HaUqOiN9qHy99Nk3xK80eupe/s=; b=gC6vywblIjKE4TjS+VKS3Ms/jQgIqrMt9hNI1yHNClyzzGM+fimmne6JKgR+GQEPyr 
zR9nLdIIHVVEKXGUrZx6ALsNZNvsVNZdCnk7PGjsN8ttSb3SpF5oMyVI4+UM3rPqzkfn GKPEkDEh/icQLxnqBFI20OIdSlbp/yJdoH1YxkJHqxKpky7uqi8FyHPNWrmNVw/fK3fd MweKobcgODf5zH2fi6OhPTsBApUACZhrJOGIW+70uwkVDsk6gB+zNoBOLIyklv3s7Ba4 WtSbV6QUWf9BOmQqXUjC53BPg/G2UUUE8RbszmZttTBON/fzenVhY35iMr6ji763QNrR bIvQ==; dara=google.com ARC-Authentication-Results: i=2; mx.google.com; arc=pass (i=1 spf=pass spfdomain=huaweicloud.com); spf=pass (google.com: domain of linux-kernel+bounces-65145-ouuuleilei=gmail.com@vger.kernel.org designates 139.178.88.99 as permitted sender) smtp.mailfrom="linux-kernel+bounces-65145-ouuuleilei=gmail.com@vger.kernel.org" X-Forwarded-Encrypted: i=2; AJvYcCVP8igEBBmEEuSbSee41DPB9Hg0V+yRC4kfPSeXxvcAw3uYZvZdTZAgdPgzJ5QH3d4OoD4nJ8wdJRS7GE3jt1wQaLfRng== Received: from sv.mirrors.kernel.org (sv.mirrors.kernel.org. [139.178.88.99]) by mx.google.com with ESMTPS id k69-20020a638448000000b005dc894ea130si2276278pgd.384.2024.02.14.03.38.29 for (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 14 Feb 2024 03:38:29 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel+bounces-65145-ouuuleilei=gmail.com@vger.kernel.org designates 139.178.88.99 as permitted sender) client-ip=139.178.88.99; Authentication-Results: mx.google.com; arc=pass (i=1 spf=pass spfdomain=huaweicloud.com); spf=pass (google.com: domain of linux-kernel+bounces-65145-ouuuleilei=gmail.com@vger.kernel.org designates 139.178.88.99 as permitted sender) smtp.mailfrom="linux-kernel+bounces-65145-ouuuleilei=gmail.com@vger.kernel.org" Received: from smtp.subspace.kernel.org (wormhole.subspace.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by sv.mirrors.kernel.org (Postfix) with ESMTPS id EDE6F280E3E for ; Wed, 14 Feb 2024 11:38:28 +0000 (UTC) Received: from localhost.localdomain (localhost.localdomain [127.0.0.1]) by smtp.subspace.kernel.org (Postfix) with ESMTP id AB4DC1B965; Wed, 14 Feb 2024 11:37:32 +0000 (UTC) Received: from frasgout11.his.huawei.com (frasgout11.his.huawei.com [14.137.139.23]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 147951C29E; Wed, 14 Feb 2024 11:37:28 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=14.137.139.23 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1707910651; cv=none; b=bqEJHgZySoSuTAKiiqmvmIpxIaVWXik7WSzLe7cwXoQe6v9dAsKXXUZC6gWwY6GZGwq8t30nli60MBGkcsW5wfx0IDuyGhCL49mkmJpMa+l6puOWoXcS7884oYn23Jdd01kDfIjFqL6FYOP7NtlaROE8UZTj8m6eYykx6Mwa194= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1707910651; c=relaxed/simple; bh=6FpxsbhND52y0CgrNgtNyqX/mYBXOyT7iZFZQdqCIHA=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=tv8FHJDmVK9sZ3mC2JLXIfwWLFTDqoeGIEOegnbQlm9BvsGCnYm8rOFuZJM7sAr4tlcURSi2zjFEOP+wigtdWUqz63flyYhN8Gm2nlfJsJqi+ear33pnzYwrTJy16B5U04rzJxo+OCIbaedUCOfuUr9BQ1u/lpYsKMHbmg2bcZ4= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=huaweicloud.com; spf=pass smtp.mailfrom=huaweicloud.com; arc=none smtp.client-ip=14.137.139.23 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=huaweicloud.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=huaweicloud.com Received: from mail.maildlp.com (unknown [172.18.186.29]) by frasgout11.his.huawei.com (SkyGuard) with 
ESMTP id 4TZbPs4VDPz9ynSb; Wed, 14 Feb 2024 19:22:09 +0800 (CST) Received: from mail02.huawei.com (unknown [7.182.16.47]) by mail.maildlp.com (Postfix) with ESMTP id 17E1B1405A0; Wed, 14 Feb 2024 19:37:26 +0800 (CST) Received: from huaweicloud.com (unknown [10.45.156.69]) by APP1 (Coremail) with SMTP id LxC2BwAHshp7pcxlDJx9Ag--.51624S9; Wed, 14 Feb 2024 12:37:25 +0100 (CET) From: Petr Tesarik To: Jonathan Corbet , Thomas Gleixner , Ingo Molnar , Borislav Petkov , Dave Hansen , x86@kernel.org (maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT)), "H. Peter Anvin" , Andy Lutomirski , Oleg Nesterov , Peter Zijlstra , Xin Li , Arnd Bergmann , Andrew Morton , Rick Edgecombe , Kees Cook , "Masami Hiramatsu (Google)" , Pengfei Xu , Josh Poimboeuf , Ze Gao , "Kirill A. Shutemov" , Kai Huang , David Woodhouse , Brian Gerst , Jason Gunthorpe , Joerg Roedel , "Mike Rapoport (IBM)" , Tina Zhang , Jacob Pan , linux-doc@vger.kernel.org (open list:DOCUMENTATION), linux-kernel@vger.kernel.org (open list) Cc: Roberto Sassu , petr@tesarici.cz, Petr Tesarik Subject: [PATCH v1 7/8] sbm: documentation of the x86-64 SandBox Mode implementation Date: Wed, 14 Feb 2024 12:35:15 +0100 Message-Id: <20240214113516.2307-8-petrtesarik@huaweicloud.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20240214113516.2307-1-petrtesarik@huaweicloud.com> References: <20240214113516.2307-1-petrtesarik@huaweicloud.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-CM-TRANSID: LxC2BwAHshp7pcxlDJx9Ag--.51624S9 X-Coremail-Antispam: 1UD129KBjvJXoW7Kw1DWF48AFy7GrWDJFWrZrb_yoW8CFWxpa 4DCas3WF4kJa42v3Z3Jr47ZryrXay8Gr47GFn3G34UJF9Fg34jyryftF1UtayUGryDCa40 qayjgryxWw4Yy37anT9S1TB71UUUUUUqnTZGkaVYY2UrUUUUjbIjqfuFe4nvWSU5nxnvy2 9KBjDU0xBIdaVrnRJUUUml14x267AKxVWrJVCq3wAFc2x0x2IEx4CE42xK8VAvwI8IcIk0 rVWrJVCq3wAFIxvE14AKwVWUJVWUGwA2048vs2IY020E87I2jVAFwI0_JF0E3s1l82xGYI kIc2x26xkF7I0E14v26ryj6s0DM28lY4IEw2IIxxk0rwA2F7IY1VAKz4vEj48ve4kI8wA2 z4x0Y4vE2Ix0cI8IcVAFwI0_Gr0_Xr1l84ACjcxK6xIIjxv20xvEc7CjxVAFwI0_Gr1j6F 4UJwA2z4x0Y4vEx4A2jsIE14v26r4j6F4UM28EF7xvwVC2z280aVCY1x0267AKxVW8Jr0_ Cr1UM2AIxVAIcxkEcVAq07x20xvEncxIr21l5I8CrVACY4xI64kE6c02F40Ex7xfMcIj6x IIjxv20xvE14v26r106r15McIj6I8E87Iv67AKxVWUJVW8JwAm72CE4IkC6x0Yz7v_Jr0_ Gr1lF7xvr2IYc2Ij64vIr41lF7I21c0EjII2zVCS5cI20VAGYxC7M4IIrI8v6xkF7I0E8c xan2IY04v7MxkF7I0Ew4C26cxK6c8Ij28IcwCF04k20xvY0x0EwIxGrwCFx2IqxVCFs4IE 7xkEbVWUJVW8JwC20s026c02F40E14v26r1j6r18MI8I3I0E7480Y4vE14v26r106r1rMI 8E67AF67kF1VAFwI0_Wrv_Gr1UMIIYrxkI7VAKI48JMIIF0xvE2Ix0cI8IcVAFwI0_Gr0_ Xr1lIxAIcVC0I7IYx2IY6xkF7I0E14v26r4UJVWxJr1lIxAIcVCF04k26cxKx2IYs7xG6r 1j6r1xMIIF0xvEx4A2jsIE14v26r4j6F4UMIIF0xvEx4A2jsIEc7CjxVAFwI0_Gr1j6F4U JbIYCTnIWIevJa73UjIFyTuYvjfUnzVbDUUUU X-CM-SenderInfo: hshw23xhvd2x3n6k3tpzhluzxrxghudrp/ X-getmail-retrieved-from-mailbox: INBOX X-GMAIL-THRID: 1790874179522495644 X-GMAIL-MSGID: 1790874179522495644 From: Petr Tesarik Add a section about the x86-64 implementation. Signed-off-by: Petr Tesarik --- Documentation/security/sandbox-mode.rst | 25 +++++++++++++++++++++++++ 1 file changed, 25 insertions(+) diff --git a/Documentation/security/sandbox-mode.rst b/Documentation/security/sandbox-mode.rst index 4405b8858c4a..84816b6b68de 100644 --- a/Documentation/security/sandbox-mode.rst +++ b/Documentation/security/sandbox-mode.rst @@ -111,6 +111,31 @@ These hooks must be implemented to select HAVE_ARCH_SBM. 
:identifiers: arch_sbm_init arch_sbm_destroy arch_sbm_exec arch_sbm_map_readonly arch_sbm_map_writable +X86_64 Implementation +===================== + +The x86_64 implementation provides strong isolation and recovery from CPU +exceptions. + +Sandbox mode runs in protection ring 3 (same as user mode). This means that: + +* sandbox code cannot execute privileged CPU instructions, +* memory accesses are treated as user accesses. + +The thread stack is readable in sandbox mode, because an on-stack data +structure is used by call helpers and thunks to pass target function +arguments. However, it is not writable, and sandbox code runs on its own +stack. The thread stack is not used by interrupt handlers either. Non-IST +interrupt handlers run on a separate sandbox exception stack. + +The interrupt entry path modifies the saved pt_regs to make it appear as +coming from kernel mode. The CR3 register is then switched to kernel mode. +The interrupt exit path is modified to restore actual pt_regs and switch the +CR3 register back to its sandbox mode value, overriding CR3 changes for page +table isolation. + +Support for paravirtualized kernels is not (yet) provided. + Current Limitations =================== From patchwork Wed Feb 14 11:35:16 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Petr Tesarik X-Patchwork-Id: 200933 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a05:7300:bc8a:b0:106:860b:bbdd with SMTP id dn10csp1157143dyb; Wed, 14 Feb 2024 04:00:00 -0800 (PST) X-Forwarded-Encrypted: i=3; AJvYcCVwJO9CfSqhhs8q35gHtoD3+BmrwzR7pZq6AOORAKYVs0T1FOOpFaFq/nFL3o943nS+VTo9HtGmaPSGEODygNyffzhQZA== X-Google-Smtp-Source: AGHT+IHdWc8VYmmu7dr5bfl3L7OY5LE1NaLE9HWmgijhlBpK4DbYrNtxs2+3kQaqyvovHtOdwHgb X-Received: by 2002:a17:90a:604f:b0:297:ca7:fd10 with SMTP id h15-20020a17090a604f00b002970ca7fd10mr2401575pjm.1.1707912000548; Wed, 14 Feb 2024 04:00:00 -0800 (PST) ARC-Seal: i=2; a=rsa-sha256; t=1707912000; cv=pass; d=google.com; s=arc-20160816; b=0r5pUnff5SAs0Wle1+NclTLKriLkNTnfn8Eu/ftc55+Vcg2tAH4X+r8M59gX8YqQcq rMTJDZxYIg69KmKVdt2uJHXG5f2KnWE6X+BolPslDA/l2GQ+1yzmafV9eqqv/dOzK/zl KMJ+b1PLVOOYcSBmedjtgUxtslFHNFeeHIeS/mr5ILBbRD3AtRNQnxBmwp0BZ6d3zDpn DPr4uwXT18Z9OpkURosM0RNXMyoa0uQtLQKYMHNgiWw1R6PKnzNh+cMxzzzo0cKdTqu9 5HFgIACf2/d55pNvY2e4sCmcnEkn+0ZanAMOJZFNsxTSXcNfkRYCH6Uz8skF0zmlVUtn BTYg== ARC-Message-Signature: i=2; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=content-transfer-encoding:mime-version:list-unsubscribe :list-subscribe:list-id:precedence:references:in-reply-to:message-id :date:subject:cc:to:from; bh=u/Jv5r/D52VsTQU9oilpim+ft8qe9cMGcRaHoqbqgog=; fh=k04hVw+UxaljSNVCQ2DslPkihbUiKFerbpEaxABHBHk=; b=0uyC0TwhD2ACfIp/6+m2AVlIcrT/gx+nMMp9AYHL7b2hCZYRkqJg1KIizUH/JMzvDY ekegUs2kyC02JmG9TeC6VwRTl+QCQhKPX3h2MT93V/mg1da7mLVq8X/nEga7j8/hlcZZ yIZ6KHQ18bIzG2XwPvflBuaFvH31QtYp7oRSCxNGfV3W99ZJUVLOLXEU4etE75KrbRDa yuBNOrXE2exHZSQxLOB9CvfQFJHxwb6G6y43PjYBmD4/7FCiBMObUx4sUm0tzZQb1Q0A zEvGAR/y1TOuas65OUybviagMvidI8XReUYkeehPYWtzhOjHG+P/1QktJLir+csol0+j 9G4Q==; dara=google.com ARC-Authentication-Results: i=2; mx.google.com; arc=pass (i=1 spf=pass spfdomain=huaweicloud.com); spf=pass (google.com: domain of linux-kernel+bounces-65146-ouuuleilei=gmail.com@vger.kernel.org designates 147.75.48.161 as permitted sender) smtp.mailfrom="linux-kernel+bounces-65146-ouuuleilei=gmail.com@vger.kernel.org" X-Forwarded-Encrypted: i=2; 
AJvYcCWidTz7OmH7sDCcGZ3IJlAf3trsufNLEueQjlMuWT+vaaAOc/k01BSQdkNW7Y0k4jLDeZkSnE+rpBlWTT7O8A/ekj+iPA== Received: from sy.mirrors.kernel.org (sy.mirrors.kernel.org. [147.75.48.161]) by mx.google.com with ESMTPS id i10-20020a17090acf8a00b00296d045abe1si1034964pju.91.2024.02.14.04.00.00 for (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 14 Feb 2024 04:00:00 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel+bounces-65146-ouuuleilei=gmail.com@vger.kernel.org designates 147.75.48.161 as permitted sender) client-ip=147.75.48.161; Authentication-Results: mx.google.com; arc=pass (i=1 spf=pass spfdomain=huaweicloud.com); spf=pass (google.com: domain of linux-kernel+bounces-65146-ouuuleilei=gmail.com@vger.kernel.org designates 147.75.48.161 as permitted sender) smtp.mailfrom="linux-kernel+bounces-65146-ouuuleilei=gmail.com@vger.kernel.org" Received: from smtp.subspace.kernel.org (wormhole.subspace.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by sy.mirrors.kernel.org (Postfix) with ESMTPS id B5C3DB29BFE for ; Wed, 14 Feb 2024 11:38:45 +0000 (UTC) Received: from localhost.localdomain (localhost.localdomain [127.0.0.1]) by smtp.subspace.kernel.org (Postfix) with ESMTP id 7C2B61C6B2; Wed, 14 Feb 2024 11:37:48 +0000 (UTC) Received: from frasgout13.his.huawei.com (frasgout13.his.huawei.com [14.137.139.46]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id DD72C1BC31; Wed, 14 Feb 2024 11:37:43 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=14.137.139.46 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1707910666; cv=none; b=BUFovqR8c1NeIYQwTsXWB7UjSkSroyKMETMFUlCYmyn9SflD0eCYGW63e3sKG/CQISQS0z7IVCGGhH9lNjvpv6mXieBRh09KJ5JxJOqDVCC+3ed8NGN0jMHkj40P5odllbpa5iK1SawhKmzqRYmi/NZcNq2QzASb4KFbaTErXfs= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1707910666; c=relaxed/simple; bh=/uyNfbbAGyBoNGO8quoCKc4b5P7cnfMZAnnYhYkf9As=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=eMbKBAH2Doy3r7ctZUGQHVJSZQIoXcHKAy149xNy8ItVLvk4dnRUCG3loYsYFCLAjBF2LwIoLPbtGeEjDVSZiEx2sSOeTcNXMx5AgSUV4QEJBsnqbyUkmzeVHE8fXDQ8YXh8eI1B+ChwvdowzbytftzYq6a7bGeBmBxCfwdsXoo= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=huaweicloud.com; spf=pass smtp.mailfrom=huaweicloud.com; arc=none smtp.client-ip=14.137.139.46 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=huaweicloud.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=huaweicloud.com Received: from mail.maildlp.com (unknown [172.18.186.51]) by frasgout13.his.huawei.com (SkyGuard) with ESMTP id 4TZbQC00hnz9yMLR; Wed, 14 Feb 2024 19:22:27 +0800 (CST) Received: from mail02.huawei.com (unknown [7.182.16.47]) by mail.maildlp.com (Postfix) with ESMTP id 2CCD3140427; Wed, 14 Feb 2024 19:37:41 +0800 (CST) Received: from huaweicloud.com (unknown [10.45.156.69]) by APP1 (Coremail) with SMTP id LxC2BwAHshp7pcxlDJx9Ag--.51624S10; Wed, 14 Feb 2024 12:37:40 +0100 (CET) From: Petr Tesarik To: Jonathan Corbet , Thomas Gleixner , Ingo Molnar , Borislav Petkov , Dave Hansen , x86@kernel.org (maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT)), "H. 
Peter Anvin" , Andy Lutomirski , Oleg Nesterov , Peter Zijlstra , Xin Li , Arnd Bergmann , Andrew Morton , Rick Edgecombe , Kees Cook , "Masami Hiramatsu (Google)" , Pengfei Xu , Josh Poimboeuf , Ze Gao , "Kirill A. Shutemov" , Kai Huang , David Woodhouse , Brian Gerst , Jason Gunthorpe , Joerg Roedel , "Mike Rapoport (IBM)" , Tina Zhang , Jacob Pan , linux-doc@vger.kernel.org (open list:DOCUMENTATION), linux-kernel@vger.kernel.org (open list) Cc: Roberto Sassu , petr@tesarici.cz, Petr Tesarik Subject: [PATCH v1 8/8] sbm: x86: lazy TLB flushing Date: Wed, 14 Feb 2024 12:35:16 +0100 Message-Id: <20240214113516.2307-9-petrtesarik@huaweicloud.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20240214113516.2307-1-petrtesarik@huaweicloud.com> References: <20240214113516.2307-1-petrtesarik@huaweicloud.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-CM-TRANSID: LxC2BwAHshp7pcxlDJx9Ag--.51624S10 X-Coremail-Antispam: 1UD129KBjvJXoW3Xw45Ar4fAFyrZr43XFW8WFg_yoW3Jry3pF n7Ga4kGFs7X34Syws7Xrs5AFn8Za1Dta15JasrKryfZa45Xw45Xr4jkw42qFWrZr95W3Wx KF4avFs5Cwn8Aa7anT9S1TB71UUUUUUqnTZGkaVYY2UrUUUUjbIjqfuFe4nvWSU5nxnvy2 9KBjDU0xBIdaVrnRJUUUmm14x267AKxVWrJVCq3wAFc2x0x2IEx4CE42xK8VAvwI8IcIk0 rVWrJVCq3wAFIxvE14AKwVWUJVWUGwA2048vs2IY020E87I2jVAFwI0_JF0E3s1l82xGYI kIc2x26xkF7I0E14v26ryj6s0DM28lY4IEw2IIxxk0rwA2F7IY1VAKz4vEj48ve4kI8wA2 z4x0Y4vE2Ix0cI8IcVAFwI0_Gr0_Xr1l84ACjcxK6xIIjxv20xvEc7CjxVAFwI0_Gr1j6F 4UJwA2z4x0Y4vEx4A2jsIE14v26r4j6F4UM28EF7xvwVC2z280aVCY1x0267AKxVWxJr0_ GcWle2I262IYc4CY6c8Ij28IcVAaY2xG8wAqx4xG64xvF2IEw4CE5I8CrVC2j2WlYx0E2I x0cI8IcVAFwI0_JrI_JrylYx0Ex4A2jsIE14v26r1j6r4UMcvjeVCFs4IE7xkEbVWUJVW8 JwACjcxG0xvY0x0EwIxGrwACjI8F5VA0II8E6IAqYI8I648v4I1lFIxGxcIEc7CjxVA2Y2 ka0xkIwI1lc7CjxVAKzI0EY4vE52x082I5MxAIw28IcxkI7VAKI48JMxC20s026xCaFVCj c4AY6r1j6r4UMI8I3I0E5I8CrVAFwI0_Jr0_Jr4lx2IqxVCjr7xvwVAFwI0_JrI_JrWlx4 CE17CEb7AF67AKxVWrXVW8Jr1lIxkGc2Ij64vIr41lIxAIcVC0I7IYx2IY67AKxVW8JVW5 JwCI42IY6xIIjxv20xvEc7CjxVAFwI0_Gr1j6F4UJwCI42IY6xAIw20EY4v20xvaj40_Jr 0_JF4lIxAIcVC2z280aVAFwI0_Gr0_Cr1lIxAIcVC2z280aVCY1x0267AKxVW8Jr0_Cr1U YxBIdaVFxhVjvjDU0xZFpf9x0JU3EfOUUUUU= X-CM-SenderInfo: hshw23xhvd2x3n6k3tpzhluzxrxghudrp/ X-getmail-retrieved-from-mailbox: INBOX X-GMAIL-THRID: 1790875533958359519 X-GMAIL-MSGID: 1790875533958359519 From: Petr Tesarik Implement lazy TLB flushing in sandbox mode and keep CR4.PGE enabled. For the transition from sandbox mode to kernel mode: 1. All user page translations (sandbox code and data) are flushed from the TLB, because their page protection bits do not include _PAGE_GLOBAL. 2. Any kernel page translations remain valid after the transition. The SBM state page is an exception; map it without _PAGE_GLOBAL. For the transition from kernel mode to sandbox mode: 1. Kernel page translations become stale. However, any access by code running in sandbox mode (with CPL 3) causes a protection violation. Handle the spurious page faults from such accesses, lazily replacing entries in the TLB. 2. If the TLB contains any user page translations before the switch to sandbox mode, they are flushed, because their page protection bits do not include _PAGE_GLOBAL. This ensures that sandbox mode cannot access user mode pages. Note that the TLB may keep kernel page translations for addresses which are never accessed by sandbox mode. They remain valid after returning to kernel mode. 
Signed-off-by: Petr Tesarik --- arch/x86/entry/entry_64.S | 17 +----- arch/x86/kernel/sbm/call_64.S | 5 +- arch/x86/kernel/sbm/core.c | 100 +++++++++++++++++++++++++++++++++- 3 files changed, 102 insertions(+), 20 deletions(-) diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S index e1364115408a..4ba3eea38102 100644 --- a/arch/x86/entry/entry_64.S +++ b/arch/x86/entry/entry_64.S @@ -632,10 +632,8 @@ SYM_INNER_LABEL(restore_regs_and_return_to_kernel, SYM_L_GLOBAL) movq PER_CPU_VAR(pcpu_hot + X86_current_task), %rcx movq TASK_sbm_state(%rcx), %rcx movq SBM_sbm_cr3(%rcx), %rcx - movq %cr4, %rax - andb $~X86_CR4_PGE, %al - movq %rax, %cr4 movq %rcx, %cr3 + invlpg x86_sbm_state orb $3, CS(%rsp) #endif @@ -897,9 +895,6 @@ SYM_CODE_START(paranoid_entry) movq %cr3, %r14 andb $~3, CS+8(%rsp) - movq %cr4, %rax - orb $X86_CR4_PGE, %al - movq %rax, %cr4 movq %rcx, %cr3 jmp .Lparanoid_gsbase #endif @@ -1073,9 +1068,6 @@ SYM_CODE_START(error_entry) jrcxz .Lerror_swapgs andb $~3, CS+8(%rsp) - movq %cr4, %rax - orb $X86_CR4_PGE, %al - movq %rax, %cr4 movq %rcx, %cr3 jmp .Lerror_entry_done_lfence #endif @@ -1281,9 +1273,6 @@ SYM_CODE_START(asm_exc_nmi) * stack. The code is similar to NMI from user mode. */ andb $~3, CS-RIP+8(%rsp) - movq %cr4, %rdx - orb $X86_CR4_PGE, %dl - movq %rdx, %cr4 movq x86_sbm_state + SBM_kernel_cr3, %rdx movq %rdx, %cr3 @@ -1533,10 +1522,8 @@ end_repeat_nmi: movq TASK_sbm_state(%rcx), %rcx jrcxz nmi_no_sbm - movq %cr4, %rax - andb $~X86_CR4_PGE, %al - movq %rax, %cr4 movq %r14, %cr3 + invlpg x86_sbm_state #endif nmi_no_sbm: diff --git a/arch/x86/kernel/sbm/call_64.S b/arch/x86/kernel/sbm/call_64.S index 8b2b524c5b46..21edce5666bc 100644 --- a/arch/x86/kernel/sbm/call_64.S +++ b/arch/x86/kernel/sbm/call_64.S @@ -10,7 +10,6 @@ #include #include #include -#include #include .code64 @@ -75,12 +74,10 @@ SYM_FUNC_START(x86_sbm_exec) * The NMI handler takes extra care to restore CR3 and CR4. */ mov SBM_sbm_cr3(%rdi), %r11 - mov %cr4, %rax - and $~X86_CR4_PGE, %al mov %rdx, %rdi /* args */ cli - mov %rax, %cr4 mov %r11, %cr3 + invlpg x86_sbm_state iretq SYM_INNER_LABEL(x86_sbm_return, SYM_L_GLOBAL) diff --git a/arch/x86/kernel/sbm/core.c b/arch/x86/kernel/sbm/core.c index 0ea193550a83..296f1fde3c22 100644 --- a/arch/x86/kernel/sbm/core.c +++ b/arch/x86/kernel/sbm/core.c @@ -33,6 +33,11 @@ union { char page[PAGE_SIZE]; } x86_sbm_state __page_aligned_bss; +static inline pgprot_t pgprot_nonglobal(pgprot_t prot) +{ + return __pgprot(pgprot_val(prot) & ~_PAGE_GLOBAL); +} + static inline phys_addr_t page_to_ptval(struct page *page) { return PFN_PHYS(page_to_pfn(page)) | _PAGE_TABLE; @@ -287,7 +292,7 @@ int arch_sbm_init(struct sbm *sbm) BUILD_BUG_ON(sizeof(x86_sbm_state) != PAGE_SIZE); err = map_page(state, (unsigned long)&x86_sbm_state, - PHYS_PFN(__pa(state)), PAGE_KERNEL); + PHYS_PFN(__pa(state)), pgprot_nonglobal(PAGE_KERNEL)); if (err < 0) return err; @@ -379,11 +384,104 @@ int arch_sbm_exec(struct sbm *sbm, sbm_func func, void *args) return err; } +static bool spurious_sbm_fault_check(unsigned long error_code, pte_t *pte) +{ + if ((error_code & X86_PF_WRITE) && !pte_write(*pte)) + return false; + + if ((error_code & X86_PF_INSTR) && !pte_exec(*pte)) + return false; + + return true; +} + +/* + * Handle a spurious fault caused by a stale TLB entry. + * + * This allows us to lazily refresh the TLB when increasing the + * permissions of a kernel page (RO -> RW or NX -> X). 
Doing it + * eagerly is very expensive since that implies doing a full + * cross-processor TLB flush, even if no stale TLB entries exist + * on other processors. + * + * Spurious faults may only occur if the TLB contains an entry with + * fewer permission than the page table entry. Non-present (P = 0) + * and reserved bit (R = 1) faults are never spurious. + * + * There are no security implications to leaving a stale TLB when + * increasing the permissions on a page. + * + * Returns true if a spurious fault was handled, false otherwise. + * + * See Intel Developer's Manual Vol 3 Section 4.10.4.3, bullet 3 + * (Optional Invalidation). + */ +static bool +spurious_sbm_fault(struct x86_sbm_state *state, unsigned long error_code, + unsigned long address) +{ + pgd_t *pgd; + p4d_t *p4d; + pud_t *pud; + pmd_t *pmd; + pte_t *pte; + bool ret; + + if ((error_code & ~(X86_PF_WRITE | X86_PF_INSTR)) != + (X86_PF_USER | X86_PF_PROT)) + return false; + + pgd = __va(state->sbm_cr3 & CR3_ADDR_MASK) + pgd_index(address); + if (!pgd_present(*pgd)) + return false; + + p4d = p4d_offset(pgd, address); + if (!p4d_present(*p4d)) + return false; + + if (p4d_large(*p4d)) + return spurious_sbm_fault_check(error_code, (pte_t *)p4d); + + pud = pud_offset(p4d, address); + if (!pud_present(*pud)) + return false; + + if (pud_large(*pud)) + return spurious_sbm_fault_check(error_code, (pte_t *)pud); + + pmd = pmd_offset(pud, address); + if (!pmd_present(*pmd)) + return false; + + if (pmd_large(*pmd)) + return spurious_sbm_fault_check(error_code, (pte_t *)pmd); + + pte = pte_offset_kernel(pmd, address); + if (!pte_present(*pte)) + return false; + + ret = spurious_sbm_fault_check(error_code, pte); + if (!ret) + return false; + + /* + * Make sure we have permissions in PMD. + * If not, then there's a bug in the page tables: + */ + ret = spurious_sbm_fault_check(error_code, (pte_t *)pmd); + WARN_ONCE(!ret, "PMD has incorrect permission bits\n"); + + return ret; +} + void handle_sbm_fault(struct pt_regs *regs, unsigned long error_code, unsigned long address) { struct x86_sbm_state *state = current_thread_info()->sbm_state; + if (spurious_sbm_fault(state, error_code, address)) + return; + /* * Force -EFAULT unless the fault was due to a user-mode instruction * fetch from the designated return address.