From: Maxwell Bland
To: linux-arm-kernel@lists.infradead.org
Cc: gregkh@linuxfoundation.org, agordeev@linux.ibm.com, akpm@linux-foundation.org,
 andreyknvl@gmail.com, andrii@kernel.org, aneesh.kumar@kernel.org, aou@eecs.berkeley.edu,
 ardb@kernel.org, arnd@arndb.de, ast@kernel.org, borntraeger@linux.ibm.com, bpf@vger.kernel.org,
 brauner@kernel.org, catalin.marinas@arm.com, christophe.leroy@csgroup.eu, cl@linux.com,
 daniel@iogearbox.net, dave.hansen@linux.intel.com, david@redhat.com, dennis@kernel.org,
 dvyukov@google.com, glider@google.com, gor@linux.ibm.com, guoren@kernel.org, haoluo@google.com,
 hca@linux.ibm.com, hch@infradead.org, john.fastabend@gmail.com, jolsa@kernel.org,
 kasan-dev@googlegroups.com, kpsingh@kernel.org, linux-arch@vger.kernel.org, linux@armlinux.org.uk,
 linux-efi@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org,
 linuxppc-dev@lists.ozlabs.org, linux-riscv@lists.infradead.org, linux-s390@vger.kernel.org,
 lstoakes@gmail.com, mark.rutland@arm.com, martin.lau@linux.dev, meted@linux.ibm.com,
 michael.christie@oracle.com, mjguzik@gmail.com, mpe@ellerman.id.au, mst@redhat.com,
 muchun.song@linux.dev, naveen.n.rao@linux.ibm.com, npiggin@gmail.com, palmer@dabbelt.com,
 paul.walmsley@sifive.com, quic_nprakash@quicinc.com, quic_pkondeti@quicinc.com,
 rick.p.edgecombe@intel.com, ryabinin.a.a@gmail.com, ryan.roberts@arm.com, samitolvanen@google.com,
 sdf@google.com, song@kernel.org, surenb@google.com, svens@linux.ibm.com, tj@kernel.org,
 urezki@gmail.com, vincenzo.frascino@arm.com, will@kernel.org, wuqiang.matt@bytedance.com,
 yonghong.song@linux.dev, zlim.lnx@gmail.com, mbland@motorola.com, awheeler@motorola.com
Subject: [PATCH 3/4] arm64: separate code and data virtual memory allocation
Date: Tue, 20 Feb 2024 14:32:55 -0600
Message-Id: <20240220203256.31153-4-mbland@motorola.com>
In-Reply-To: <20240220203256.31153-1-mbland@motorola.com>
References: <20240220203256.31153-1-mbland@motorola.com>

The current BPF and kprobe instruction allocation interfaces do not match
the base kernel: they allocate executable pages from the general vmalloc
region, intermingling code and data pages within the same virtual address
range. In the case of BPF, this appears to be a result of code duplication
between the kernel's JIT compiler and arm64's JIT. However, this is no
longer necessary, given the possibility of overriding the vmalloc wrapper
functions.

arm64's vmalloc_node routines now include a layer of indirection that
splits the vmalloc region into two segments, one on either side of the
module_alloc region chosen by ASLR. A data allocation is first attempted
in the segment below that region and, if that fails, in the segment above
it. To support this, code_region_start and code_region_end are defined to
match the 2GB window chosen by the kernel module ASLR initialization
routine, module_init_limits(). Until that routine runs, they cover the
worst-case range from _end - 2GB to _text + 2GB. kprobe instruction pages
are now allocated from within this window, and the BPF JIT allocates and
frees its executable memory through module_alloc() and module_memfree().

This benefits overall kernel security: code pages now remain within the
region randomized by this ASLR routine, and protections can be defined
linearly over the code region rather than through PTE-level tracking.

Signed-off-by: Maxwell Bland
---
 arch/arm64/include/asm/vmalloc.h   |  3 ++
 arch/arm64/kernel/module.c         |  7 ++++
 arch/arm64/kernel/probes/kprobes.c |  2 +-
 arch/arm64/mm/Makefile             |  3 +-
 arch/arm64/mm/vmalloc.c            | 57 ++++++++++++++++++++++++++++++
 arch/arm64/net/bpf_jit_comp.c      |  5 +--
 6 files changed, 73 insertions(+), 4 deletions(-)
 create mode 100644 arch/arm64/mm/vmalloc.c

diff --git a/arch/arm64/include/asm/vmalloc.h b/arch/arm64/include/asm/vmalloc.h
index 38fafffe699f..dbcf8ad20265 100644
--- a/arch/arm64/include/asm/vmalloc.h
+++ b/arch/arm64/include/asm/vmalloc.h
@@ -31,4 +31,7 @@ static inline pgprot_t arch_vmap_pgprot_tagged(pgprot_t prot)
 	return pgprot_tagged(prot);
 }
 
+extern unsigned long code_region_start __ro_after_init;
+extern unsigned long code_region_end __ro_after_init;
+
 #endif /* _ASM_ARM64_VMALLOC_H */
diff --git a/arch/arm64/kernel/module.c b/arch/arm64/kernel/module.c
index dd851297596e..c4fe753a71a9 100644
--- a/arch/arm64/kernel/module.c
+++ b/arch/arm64/kernel/module.c
@@ -29,6 +29,10 @@
 static u64 module_direct_base __ro_after_init = 0;
 static u64 module_plt_base __ro_after_init = 0;
 
+/* For pre-init vmalloc, assume the worst-case code range */
+unsigned long code_region_start __ro_after_init = (u64) (_end - SZ_2G);
+unsigned long code_region_end __ro_after_init = (u64) (_text + SZ_2G);
+
 /*
  * Choose a random page-aligned base address for a window of 'size' bytes which
  * entirely contains the interval [start, end - 1].
@@ -101,6 +105,9 @@ static int __init module_init_limits(void)
 		module_plt_base = random_bounding_box(SZ_2G, min, max);
 	}
 
+	code_region_start = module_plt_base;
+	code_region_end = module_plt_base + SZ_2G;
+
 	pr_info("%llu pages in range for non-PLT usage",
 		module_direct_base ? (SZ_128M - kernel_size) / PAGE_SIZE : 0);
 	pr_info("%llu pages in range for PLT usage",
diff --git a/arch/arm64/kernel/probes/kprobes.c b/arch/arm64/kernel/probes/kprobes.c
index 70b91a8c6bb3..c9e109d6c8bc 100644
--- a/arch/arm64/kernel/probes/kprobes.c
+++ b/arch/arm64/kernel/probes/kprobes.c
@@ -131,7 +131,7 @@ int __kprobes arch_prepare_kprobe(struct kprobe *p)
 
 void *alloc_insn_page(void)
 {
-	return __vmalloc_node_range(PAGE_SIZE, 1, VMALLOC_START, VMALLOC_END,
+	return __vmalloc_node_range(PAGE_SIZE, 1, code_region_start, code_region_end,
 			GFP_KERNEL, PAGE_KERNEL_ROX, VM_FLUSH_RESET_PERMS,
 			NUMA_NO_NODE, __builtin_return_address(0));
 }
diff --git a/arch/arm64/mm/Makefile b/arch/arm64/mm/Makefile
index dbd1bc95967d..730b805d8388 100644
--- a/arch/arm64/mm/Makefile
+++ b/arch/arm64/mm/Makefile
@@ -2,7 +2,8 @@
 obj-y				:= dma-mapping.o extable.o fault.o init.o \
 				   cache.o copypage.o flush.o \
 				   ioremap.o mmap.o pgd.o mmu.o \
-				   context.o proc.o pageattr.o fixmap.o
+				   context.o proc.o pageattr.o fixmap.o \
+				   vmalloc.o
 obj-$(CONFIG_HUGETLB_PAGE)	+= hugetlbpage.o
 obj-$(CONFIG_PTDUMP_CORE)	+= ptdump.o
 obj-$(CONFIG_PTDUMP_DEBUGFS)	+= ptdump_debugfs.o
diff --git a/arch/arm64/mm/vmalloc.c b/arch/arm64/mm/vmalloc.c
new file mode 100644
index 000000000000..b6d2fa841f90
--- /dev/null
+++ b/arch/arm64/mm/vmalloc.c
@@ -0,0 +1,57 @@
+// SPDX-License-Identifier: GPL-2.0-only
+
+#include
+#include
+
+static void *__vmalloc_node_range_split(unsigned long size, unsigned long align,
+			unsigned long start, unsigned long end,
+			unsigned long exclusion_start, unsigned long exclusion_end, gfp_t gfp_mask,
+			pgprot_t prot, unsigned long vm_flags, int node,
+			const void *caller)
+{
+	void *res = NULL;
+
+	res = __vmalloc_node_range(size, align, start, exclusion_start,
+				gfp_mask, prot, vm_flags, node, caller);
+	if (!res)
+		res = __vmalloc_node_range(size, align, exclusion_end, end,
+				gfp_mask, prot, vm_flags, node, caller);
+
+	return res;
+}
+
+void *__vmalloc_node(unsigned long size, unsigned long align,
+			gfp_t gfp_mask, unsigned long vm_flags, int node,
+			const void *caller)
+{
+	return __vmalloc_node_range_split(size, align, VMALLOC_START,
+				VMALLOC_END, code_region_start, code_region_end,
+				gfp_mask, PAGE_KERNEL, vm_flags, node, caller);
+}
+
+void *vmalloc_huge(unsigned long size, gfp_t gfp_mask)
+{
+	return __vmalloc_node_range_split(size, 1, VMALLOC_START, VMALLOC_END,
+				code_region_start, code_region_end,
+				gfp_mask, PAGE_KERNEL, VM_ALLOW_HUGE_VMAP,
+				NUMA_NO_NODE, __builtin_return_address(0));
+}
+
+void *vmalloc_user(unsigned long size)
+{
+	return __vmalloc_node_range_split(size, SHMLBA, VMALLOC_START, VMALLOC_END,
+				code_region_start, code_region_end,
+				GFP_KERNEL | __GFP_ZERO, PAGE_KERNEL,
+				VM_USERMAP, NUMA_NO_NODE,
+				__builtin_return_address(0));
+}
+
+void *vmalloc_32_user(unsigned long size)
+{
+	return __vmalloc_node_range_split(size, SHMLBA, VMALLOC_START, VMALLOC_END,
+				code_region_start, code_region_end,
+				GFP_VMALLOC32 | __GFP_ZERO, PAGE_KERNEL,
+				VM_USERMAP, NUMA_NO_NODE,
+				__builtin_return_address(0));
+}
+
diff --git a/arch/arm64/net/bpf_jit_comp.c b/arch/arm64/net/bpf_jit_comp.c
index 8955da5c47cf..40426f3a9bdf 100644
--- a/arch/arm64/net/bpf_jit_comp.c
+++ b/arch/arm64/net/bpf_jit_comp.c
@@ -13,6 +13,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 
@@ -1690,12 +1691,12 @@ u64 bpf_jit_alloc_exec_limit(void)
 void *bpf_jit_alloc_exec(unsigned long size)
 {
 	/* Memory is intended to be executable, reset the pointer tag. */
-	return kasan_reset_tag(vmalloc(size));
+	return kasan_reset_tag(module_alloc(size));
 }
 
 void bpf_jit_free_exec(void *addr)
 {
-	return vfree(addr);
+	return module_memfree(addr);
 }
 
 /* Indicate the JIT backend supports mixing bpf2bpf and tailcalls. */
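
The fallback order used by __vmalloc_node_range_split() can be modeled
outside the kernel. The following is a minimal userspace sketch, not part
of the patch: it assumes a toy first-fit page bitmap in place of the real
vmalloc area allocator, and toy_alloc_range() and toy_alloc_split() are
hypothetical helpers standing in for __vmalloc_node_range() and
__vmalloc_node_range_split(). Page indices stand in for virtual
addresses, and [excl_start, excl_end) stands in for
[code_region_start, code_region_end).

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical userspace model: page indices 0..NPAGES-1 stand in for the
 * vmalloc address range, and [excl_start, excl_end) for the code region.
 */
#define NPAGES 16

static bool used[NPAGES];

/*
 * First-fit search for 'pages' contiguous free pages inside [start, end).
 * Hypothetical stand-in for __vmalloc_node_range(); returns the first
 * page index on success or -1 on failure.
 */
static long toy_alloc_range(long pages, long start, long end)
{
	if (pages <= 0 || start < 0 || end > NPAGES)
		return -1;

	for (long base = start; base + pages <= end; base++) {
		bool fits = true;

		for (long i = base; i < base + pages; i++) {
			if (used[i]) {
				fits = false;
				break;
			}
		}
		if (fits) {
			for (long i = base; i < base + pages; i++)
				used[i] = true;
			return base;
		}
	}
	return -1;
}

/*
 * Same fallback order as __vmalloc_node_range_split(): try the segment
 * below the excluded window first, then the segment above it.
 */
static long toy_alloc_split(long pages, long start, long end,
			    long excl_start, long excl_end)
{
	long res = toy_alloc_range(pages, start, excl_start);

	if (res < 0)
		res = toy_alloc_range(pages, excl_end, end);

	/* A successful data allocation never overlaps the code window. */
	assert(res < 0 || res + pages <= excl_start || res >= excl_end);
	return res;
}
```

In this model, a request that fits below the window is always placed
there; the upper segment is used only when the lower one cannot satisfy
the request, and a request that fits in neither segment fails instead of
falling into the code window. The real allocator tracks free virtual
areas and can also fail for reasons other than address space (for
example, page allocation), so the sketch only illustrates the ordering
and the exclusion, not alignment or fragmentation behavior.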